We all have them. They cause us to
worry incessantly, lose sleep and frequently miss precious time
with our families. They are often the bane of processes that
require liquids to be transferred from one location to another.
These mechanical monsters are pumps that fail repeatedly and
are widely and unflatteringly known as bad actors.
By definition, bad actors are pumps
that fail so frequently that they stand apart from the rest of
the pump population. There are bad actors that have failed as
many as 16 times in one year, some even more often. These
troublesome machines sap precious resources from our maintenance departments and prevent
us from achieving world-class reliability performance.
Chances are these troublemakers
were all carefully selected by well-intentioned vendors and project engineers, and installed
dutifully by construction companies. But the
devil is in the details. Fatal flaws, ranging from slender
shafts (poor L/D ratios) to poor operating practices, crept
into these pumping systems. They crippled performance and
forced pumps to lead notorious lives.
The inordinate number of failures
experienced by bad actors tends to skew the plantwide
mean time between repairs (MTBR) dramatically downward. For
this reason, a key strategy for improving plant MTBR starts by
identifying and improving the reliability of one's most
troublesome pumps. This article presents a straightforward
methodology for addressing the most problematic pumps at a plant.
Addressing bad actors
To address bad actors, one must
first define what constitutes such pumps. Usually, definitions
contain a combination of failure rate and repair cost criteria.
For example, one may define a bad actor as any pump that fails
two or more times and has caused more than $10,000 in repair
costs over the previous 12-month period. Of course, these
criteria can be modified to satisfy management preferences.
Some plants also include lost
opportunity costs during the same reporting period. It is
possible to simplify reporting by combining repair cost and
production losses into a single figure called
losses. These multiple criteria tend to cull
nuisance pumps that fail many times each year but do not have a
large annual repair total. By using the multiple criteria of
failure rate and repair costs, one can quickly identify the
pumps having the greatest impact on reliability.
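The two-part screen described above can be sketched in a few lines of Python. This is only an illustration: the record layout, the field names and the pump tags are assumptions, and the thresholds default to the example values of two failures and $10,000 given in the text.

```python
from dataclasses import dataclass


@dataclass
class PumpRecord:
    tag: str                            # hypothetical equipment tag
    failures_12mo: int                  # failures in the previous 12 months
    repair_cost_12mo: float             # repair cost (USD) over the same period
    lost_production_12mo: float = 0.0   # optional lost-opportunity cost (USD)


def is_bad_actor(pump, min_failures=2, min_cost=10_000.0, include_production=False):
    """Apply both criteria: failure count AND cost over the reporting period."""
    cost = pump.repair_cost_12mo
    if include_production:
        # Some plants combine repair cost and production losses into one figure.
        cost += pump.lost_production_12mo
    return pump.failures_12mo >= min_failures and cost > min_cost
```

With these defaults, a pump that fails six times but costs only $4,000 to repair is screened out as a nuisance pump, while a pump with three failures and $14,000 in repairs is flagged.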
Go after the top bad actors. After creating a list
similar to Table 1, one simply sorts it in descending order, from the
most to the least costly pump. The top 10 on this list
represent the bad actors. This list should probably be compiled
quarterly, semi-annually or annually. It is customary to start
by attacking the worst of the bad actors.
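As a rough sketch of the ranking step, assuming the annual cost per pump has already been tabulated (the dictionary layout and pump tags below are invented for illustration):

```python
def top_bad_actors(annual_cost_by_pump, n=10):
    """Rank pumps from most to least costly and keep the first n."""
    ranked = sorted(
        annual_cost_by_pump.items(),
        key=lambda item: item[1],   # sort on annual cost
        reverse=True,               # most costly first
    )
    return ranked[:n]


# Hypothetical annual costs in USD
costs = {"P-101": 14_000, "P-104": 31_000, "P-107": 22_000}
worst_two = top_bad_actors(costs, n=2)
```

Here `worst_two` holds P-104 followed by P-107, so the effort starts with P-104.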
Examine the equipment
history. The next steps describe more closely how to
attack each bad actor. Let's examine a hypothetical
data set for a bad actor. To construct a data format similar to
Table 2, one needs to know the date of each failure and the
repair cost for every past failure in the time frame of
interest. A starting point must be defined, as well.
In the following example, the first
failure occurred 15 months after the defined starting time and
the repair cost was $5,000. The next failure occurred 12 months
after the first failure and resulted in a repair cost of
$5,500. This means that the cumulative time (third column) for
the second failure was 27 months and the cumulative repair cost
(fourth column) for the second was $10,500. For each subsequent
failure, you keep accumulating the failure numbers, time and
repair costs, as seen in the cumulative failure, time and cost
columns in Table 2.
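The accumulation just described might be coded as follows. This is a minimal sketch: the input format (months since the previous event, repair cost) and the function name are assumptions, and only the two failures quoted in the text are used as data.

```python
from itertools import accumulate


def cumulative_history(failures):
    """Build (failure number, cumulative months, cumulative cost) rows.

    `failures` is a chronological list of (months_since_previous_event,
    repair_cost) pairs; the first interval is measured from the defined
    starting point.
    """
    cum_months = accumulate(months for months, _ in failures)
    cum_costs = accumulate(cost for _, cost in failures)
    return [
        (number, months, cost)
        for number, (months, cost) in enumerate(zip(cum_months, cum_costs), start=1)
    ]


# The first two failures from the example in the text
history = cumulative_history([(15, 5_000), (12, 5_500)])
# history == [(1, 15, 5000), (2, 27, 10500)]
```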
Plotting the cumulative failure
number and cumulative repair cost value vs. the cumulative time
will yield a plot similar to the one shown in Fig. 2. One might
call these reliability growth plots because they clearly
illustrate if the failure rate is constant or changing over
time and if the rate of cost to perform maintenance is changing over time. A
constant slope means the failure rate is constant, while a
curving plot means the failure rate is changing. The
reliability growth plot in Fig. 2 shows a constant failure rate
up until months 160 to 170. After that time, the failure rate
and expenditure rate begin to increase and eventually settle
into a new higher failure rate for some undefined reason.
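One way to put numbers on a change in slope is to fit a straight line to the cumulative failure count on each side of a suspected break point. The sketch below uses an ordinary least-squares slope. The data are invented for illustration and are not the values behind Fig. 2, and the break month is something the analyst would pick by eye from the plot.

```python
def slope(xs, ys):
    """Least-squares slope of ys against xs (needs at least two distinct xs)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def rates_around(cum_months, cum_failures, break_month):
    """Failures per year before and after a candidate break point."""
    points = list(zip(cum_months, cum_failures))
    before = [p for p in points if p[0] <= break_month]
    after = [p for p in points if p[0] > break_month]

    def per_year(segment):
        # Slope is failures per month; multiply by 12 for an annual rate.
        return 12 * slope([t for t, _ in segment], [k for _, k in segment])

    return per_year(before), per_year(after)


# Invented data: one failure every 20 months, then one every 10 months
months = [20, 40, 60, 80, 90, 100, 110]
failures = [1, 2, 3, 4, 5, 6, 7]
rate_before, rate_after = rates_around(months, failures, break_month=80)
```

For this invented history the fitted rate doubles, from roughly 0.6 to roughly 1.2 failures per year. The same function can be applied to cumulative repair cost to compare expenditure rates.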
Fig. 1. A troublesome centrifugal pump.
Fig. 2. Reliability growth plot for a hypothetical bad actor.
These reliability growth plots
offer a wealth of information. First, the cumulative failure
plot shows if the failure rate is constant or changing with
time. If the failure rate did change, it tells the analyst when
the change occurred. One can discover if the failure rate was
always bad or if it changed at some time in the past.
Similarly, examining cumulative repair cost data allows the
analyst to determine if something changed in the past or if
failure costs have been constant from the beginning.
If there is a defining moment when
reliability decreased, the analyst might ask what changed.
Interviews with operators and mechanics allow us to find
reasons for the observed change in reliability. Field personnel
very often provide key insights that assist in complex root
cause failure analyses (RCFAs). Among the clues, we may find
mechanical and procedural changes, such as:
The nature of the process has changed
The control scheme was modified in the past
The seal flush source was modified due to process
Interviewing personnel close to the
equipment is a great way to uncover subtle issues that may be
affecting reliability performance. Here, then, is a telling
example involving pumps that were failing every few months. It
was discovered that a production engineer decided to eliminate
the use of an external seal flush because he felt it was
contaminating the process. After convincing him to reinstate
the flush at a lower, friendlier rate, seal life returned to
the anticipated norm.
Suppose the general trends observed
on reliability growth plots are fairly
constant over the operational lives of the pumps in question.
It would then be fair to assume there is something wrong with
the basic design of the pumping system. Possible causes may include:
Poor L/D ratio
Poor pump selection
Excessive piping strain.
The reliability growth plots also
tell reviewers how much the pumps are costing. In this
particular example one can quickly conclude that $126,700 was
spent over a period of 219 months. This equates to an annual
rate of $6,942. The annual rate of expenditure conveys the
value of solving the problem. If one were to assume that annual
repair costs can be reduced to 25% of the starting value, one
might expect to save about $5,200 per year. For a two-year
payback needed to justify capital expenditures, spending about
$10,000 on a solution would be justified.
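The arithmetic behind these figures can be reproduced with a short calculation. The function names are illustrative only; the inputs are the values quoted above.

```python
def annualized_cost(total_cost, months):
    """Convert a total spent over a number of months into an annual rate."""
    return total_cost * 12 / months


def justified_spend(annual_cost, residual_fraction, payback_years):
    """Annual savings and the spend they justify for a simple payback period.

    `residual_fraction` is the share of the original annual cost that
    remains after the fix (0.25 means costs fall to 25% of the start).
    """
    annual_savings = annual_cost * (1 - residual_fraction)
    return annual_savings, annual_savings * payback_years


annual = annualized_cost(126_700, 219)               # about $6,942 per year
savings, budget = justified_spend(annual, 0.25, 2)   # about $5,207 and $10,414
```

Rounded, these match the roughly $5,200 per year in savings and the roughly $10,000 justified spend cited in the text.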
To ensure an acceptable return on
investment, the author tries to avoid working on pumps that
have annual repair and process losses below $10,000. Although
it is often assumed that finding economic justification of
reliability projects for pumps with annual
losses less than $10,000 is next to impossible, this rule will
not hold whenever simple seal improvements, bearing upgrades or
procedural changes are involved.
Conducting detailed design audits and
RCFAs. The next step in dealing with troublesome pumps
requires conducting a design, installation and performance
audit. Such an audit involves:
Reviewing the pump
selection, driver selection, seal design, piping design and
control system design
Conducting a detailed vibration analysis of the
pump, motor and piping system
Reviewing the base plate and foundation
Assessing current hydraulic performance vs. what
was expected or ascertained on earlier occasions.
It can be said that this phase of
an audit includes ascertaining that the correct pump and system
design are used for the service in question. The cold eye
review will often be appropriate. It refers to the fresh
assessment of a system or process by an experienced, unbiased
third party. This party could be another pump engineer or
technician accompanying the audit engineer on his or her field
visit and inspecting the pump in question.
The intent of the cold eye review
is to look for anything that might be considered unacceptable.
Excessive vibration, lack of piping supports, inattention to
thermal growth and absence of pressure gauges are among the
many things noted and requiring remedial action. After living
with a problem pump for a long period, we can become oblivious
to issues right in front of us. The cold eye review can help
uncover potentially important issues that were overlooked by
those living and working close to chronic bad actors.
Once the analyst has reviewed the
failure history and conducted a design audit, some seemingly
elusive contributing factors begin to stand out. The next
analysis step requires us to determine the root causes of
failures. It is important not to stop at a physical root cause,
such as "the pump failed due to a bearing failure or shaft
failure." A good investigative team will uncover any latent root
causes, ones that often lurk beneath the figurative surface.
The key point here is that an investigative team must be
open-minded during the data collection and evaluation. Parts
fail for a reason and the decisions of people led to whatever
issues we now experience. Your goal is to seek the truth and
back it up by science.
Determine a path, then track progress. Once
the root cause and contributing factors are established by the
team, it is time to formulate a plan of attack. It has been
said that less is more. In other words, it is easier to sell
two recommendations to management than 20 recommendations. It
is also easier to implement two recommendations than 20. This
doesn't mean that no more than two recommendations can be
made. It simply means that, by only presenting the highest
priority recommendations to management, one's chances of
securing approval dramatically improve.
Don't be afraid to fail; we
all fail occasionally. The best approach involves gathering
lots of data, analyzing the data in exhaustive detail, and
using a repeatable and structured RCFA approach. The RCFA
process is a process of continuous improvement. Some problems
are so complex that they may take several tries to solve.
After obtaining management
approvals, it is time to implement remedial recommendations in
a timely fashion and to track the benefits of the improvements.
Proof of success will be seen in an updated reliability growth
plot, where, hopefully, reliability improvements are manifested.
Whenever clear improvement is seen,
the news deserves to be published. Management, operating
personnel, and contributors will be motivated to continue
working toward reducing, and even eliminating, bad actors until
plant-wide MTBR targets are reached.
Critically important steps
There are seemingly insignificant
buying decisions and other events that can occur during the
early life of a pump that eventually lead to below-average
reliability performance. However, reliability improvements and
systematic upgrading of weak links can turn things around.
Successful reliability improvement programs require that latent
root causes be identified and corrected. Starting with
one's most troublesome pumps, failure lists must be
systematically reduced until world-class reliability is
achieved. Remember these critical steps in addressing bad actors:
Define, list and compare
Go after the top bad actors
Examine the equipment history
Conduct a detailed audit
Perform an RCFA
Determine a path forward
Track your progress.
There will always be another pump
failure from which to analyze and learn. Every failure should
be considered an opportunity to learn more about equipment,
processes and systems, and improve them.
Robert Perez is the author of
Operator's Guide to Centrifugal Pumps and
co-creator and editor of the PumpCalcs.com website. He
has more than 30 years of rotating equipment experience
in the petrochemical industry and
has numerous machinery reliability articles to his
credit. Mr. Perez holds a BS degree in mechanical
engineering from Texas A&M University at College
Station and an MS degree in mechanical engineering from
the University of Texas at Austin. He holds a Texas PE