Process hazard analysis (PHA) and mechanical integrity (MI)
programs are two essential elements used in the Process Safety
Management (PSM) Standard to prevent or to minimize the
consequences of catastrophic toxic, reactive, flammable or
explosive chemical releases. In instances where process
containment is essential for maintaining process safety,
equipment that does not meet reliability expectations is more
likely to be involved in a PSM failure. When a PSM failure
occurs, the equipment maintenance history often exposes
the failure as an accident that was waiting to happen.
The MI program and related safeguards must control, to an
acceptable level, the consequences of the equipment-failure
process safety hazards identified during a PHA. A quantitative
approach to evaluating equipment failure risk can be used to
determine the reliability needed to adequately
prevent or minimize potential process safety consequences.
Additional safeguards are needed when the MI program alone
cannot realistically achieve an acceptable level of equipment
performance. This article explains how one refinery uses a quantitative
approach to satisfy PSM objectives on potential releases
represented by process pump mechanical-seal failures.
Equipment reliability impacts process safety
OSHA's PSM Standard (29 CFR 1910.119) details the
requirements for preventing or minimizing the consequences of
catastrophic releases of toxic, reactive, flammable or
explosive chemicals. PHA teams assemble to identify and
evaluate hazards that represent the potential release of
dangerous materials as described in subpart e of
the PSM Standard, 29 CFR 1910.119(e). As part of the
PHA process, action items are assigned to manage the
consequences of identified hazards to an acceptable level. It
is not uncommon for historical PHAs to be evaluated in response
to an accidental release of potentially hazardous process
materials. This evaluation is triggered to understand how
previous PHA teams assessed the hazard. It is, therefore,
possible to interpret any failure that results in the
accidental release of potentially hazardous process materials
as a PHA team failure. More precisely, it indicates that the
previous PHA teams may have failed to adequately identify,
evaluate and control the hazard. Its potential consequences
were, therefore, left for discovery by the process safety
failure that exposed them.
Many, if not most, process safety failures are preceded by
lower-consequence repeat failures. These low-consequence
failures can form a failure "rap sheet" documented in
the equipment work order (WO) history of a plant's
computerized maintenance management system (CMMS).
When a process safety failure is investigated, a persistent
history of failures stands out as an obvious warning of greater
risks that should have been identified and controlled. However,
prior to the process safety failure, these lower consequence
failures can easily be seen as a normal part of equipment
operation and maintenance. This type of "normalization of
deviance" received much attention within NASA after the
loss of 14 astronauts and two space shuttles. Evidence of it
can also be found in the hydrocarbon processing and manufacturing
industries. Consider the following examples of catastrophic
releases caused by centrifugal pump failures.
Several case histories demonstrate the need for responsive
risk mitigation efforts.
Alkylate pump fire.
A fire and explosion occurred at a large New Mexico refinery
on April 8, 2004 [1]. The fire ignited upon the
catastrophic release of flammable process liquid following a
centrifugal pump mechanical-seal failure. The failure caused
six injuries as well as extensive property damage and business
interruption.
The fire resulted from loss of process containment from
one of a set of three centrifugal pumps in alkylate
recirculation service. Alkylate is a mixture of light
hydrocarbons typically in the C4 (butane) to
C8 (octane) range. Process material leaking from the
pump at 350°F autoignited upon contact with air. This
failure was attributed to misapplication of energy control
after the pump was removed from service to address a process
fouling problem. More specifically, the release occurred when
mechanics who were scheduled to replace a defective mechanical
seal began disassembling the pump on location.
The three alkylate-recirculation pumps at this facility had
a recurring history of seal failures. In its formal report, the
US Chemical Safety Board (CSB) designated this incident as an
MI program failure. In the report, the CSB cites 23 WOs issued
to address seal failures in the three pumps in a 12-month
period leading up to the catastrophic process release.
(Note: A summary table was published in
Hydrocarbon Processing, May 2010, p. 9). The CSB makes
the valid argument that an effective mechanical integrity
program would have investigated and resolved the problems that
were repeatedly causing the (alkylate) pumps to fail.
Instead, the investigation showed that any opportunity to
prevent the failure through reliability improvement was
substituted with breakdown maintenance. In other
words, maintenance was used to address the problems caused by
pump failures rather than addressing the underlying causes that
would have resulted in satisfactory pump performance,
sufficient to reduce the risk for a process safety failure.
The potential for a process safety failure increases during
shutdowns for maintenance [2]. This case history
illustrates how energy-control defects experienced during
routine maintenance activities can interfere with safe work
execution. For this reason, multiple safe-work practices like
lockout-tagout and confined space entry policies are often used
to mitigate additional risk when equipment is shut down for
maintenance. But it is not unusual for process safety failures
to occur on equipment in continuous operation when these
additional safety precautions typically do not apply. The next
case history illustrates how similar MI defects can be involved
in process safety failures regardless of the equipment's
operating status.
No. 2 fuel-oil pump fire.
A fire occurred in a distillate hydrotreating unit operated
by a large US Midwestern refinery on Nov. 15, 2004 (Fig. 1).
The fire ignited upon the catastrophic release of flammable
process liquid following a centrifugal pump mechanical-seal
failure, as shown in Fig. 2. The failure resulted in one
OSHA-recordable injury along with extensive property damage and
significant business interruption.
Fig. 1. No. 2 fuel-oil
Fig. 2. No. 2 fuel-oil
The fire ignited when the No. 2 fuel oil, a diesel-range
hydrocarbon mixture with carbon-chain lengths ranging between
C10 and C20, leaked from the pump through
a failed mechanical seal. Similar to the pump failure described
previously, the 600°F process material leaked out above its
autoignition temperature (about 500°F) and caught fire
immediately upon contacting oxygen.
The injury occurred during the emergency response to the
fire. The first responder began applying water to the pump fire
without first increasing his personal protective equipment
(PPE) level. At some time during the response, an injury
resulted from smoke inhalation. However, within two hours, the
fire was extinguished and the emergency situation was brought
under control without any further safety consequences.
Reviewing the maintenance record of the pumps involved in
this event revealed a long history of thrust bearing failures
and seal leaks (see Table 1) similar to the pumps examined in
the first case history. Likewise, the cause of the fire was
determined to be a catastrophic mechanical-seal failure.
However, the physical evidence collected at the unit after the
fire indicated that seal damage was a secondary effect and had
been preceded by catastrophic thrust-bearing failure. The
primary failure had caused uncontrolled shaft movement in the
axial (thrust) direction, which then destroyed the mechanical
seal.
Eventually, the investigation team was able to link together
the probable causes of unstable hydraulics at the pump
installation. The failure mechanism was introduced by operating
the No. 2 fuel-oil pumps in continuous parallel service.
Originally, the pumps were designed for single-spare operation.
However, through years of growth and unit debottlenecking
efforts, the pumps were continuously operated in parallel to
overcome rundown piping pressure constraints. In the parallel
operation, the pumps' rotating elements came under
constant stress. Before the fire, this failure mechanism was
adequately managed by condition monitoring and frequent
repairs. But, eventually, a fire in an operating unit and an
OSHA-recordable injury settled any debate over the potential
consequences for accepting poor pump reliability in this
service. Although this particular installation had been
examined twice previously by PHA teams in accordance with OSHA
regulations, the hazard remained hidden until the process
safety failure exposed it.
Although these two separate failures occurred in different
facilities, in different services,
at different times, and under different process operating
conditions, the common thread of below-expectations reliability
runs between them. In both cases, it is easy to look back on
events as an accident waiting to happen. Unfortunately, in
neither case was the MI program able to prevent repeat failures
that eventually resulted in an unacceptable, non-discretionary
process safety failure.
In both cases, the owner-operators of the unreliable
equipment were in full compliance with OSHA 1910.119(e)
governing the use of the PHA program to detect hazards that
could result in the potential release. However, in both cases,
the PHA program failed to identify and adequately control the
hazards that ultimately resulted in the failure. Additionally,
the MI program (OSHA 1910.119(j)) was unable to achieve a level
of equipment reliability sufficient to offset any PHA defect.
The MI program is just as important for the PSM standard to
achieve its objective as any of its other elements.
Conservative, but reasonable?
Typical PHA team members do not take lightly their
responsibility to identify hazards. Rendering their services in
PHA meetings requires a considerable amount of time away from
their normal responsibilities. They participate with the
intention of adding value by detecting, assessing and
controlling any hidden hazards to protect themselves and their
fellow workers. They, therefore, take their PHA performance
very seriously and are committed to learning from their
mistakes. By learning, they can add more value in future PHA
meetings. Should PHA teams or team members be criticized
(publicly, privately or interpersonally) for having failed to
avoid an incident, it is common practice for the teams to
err on the safe side in future PHA meetings.
In some cases, this conservative response may be
appropriate. For example, it is both reasonable and important
to expect drastic changes when a facility learns of an
operation that is contrary to industry policies or standards.
Conversely, it would not be realistic to view all potential
hydrocarbon releases equally. Yet, this is what some teams do
upon recognizing their failure to generate action items
sufficient to mitigate the potential risk of a process safety
failure in previous PHA meetings. Merely piling on more action
items may or may not add value.
Addressing them may create an illusory image of improving
workplace safety while not really making progress on mitigating
hazards that truly represent unacceptable risk.
Recent events in process safety failure show how dangerous
it can be to develop initiatives around safety items that
represent little or no incremental value. This situation was
brought to British Petroleum's (BP's) attention after
the refinery explosion at its Texas City, Texas, facility on
March 23, 2005 [3]. It is not that BP was unconcerned
about, or failing to invest in, process safety improvements. The
unfair truth about process safety is that there is no reward
for hard work. To avoid a process safety failure, the effort
must be properly directed. A safety program will fail if it
focuses employee attention on the wrong things. The illusion of
a safe workplace is destroyed when a catastrophic failure
exposes a persistent, unacceptable risk as an accident waiting
to happen. Working on the wrong things creates a distraction
from the greater and more realistic process safety threats that
should be resolved first.
Although the argument could be made that safer pump
operation results from upgrading with more robust seals,
bearings and monitoring systems, doing so is probably not the
most deliberate way to achieve process safety. In many cases,
upgrades offer no incremental improvement unless they address a
deficiency that causes the pump to perform below justified
life-cycle expectations [4]. Indiscriminately upgrading
pumps can consume a considerable amount of resources with the
intention of making a system safer, while creating a
distraction from other process safety hazards that often
represent even greater risk.
Risk is a function of frequency and consequence.
Not all centrifugal process pump failures represent the same
risk. For example, a hydrocarbon pump operating in the middle
of a congested process unit may not represent the same
potential consequences as a pump moving similar process liquids
in a remote location away from an operating unit. Likewise, the
high-temperature gasoil (GO) fraction that leaked in the second
case history may not represent the same potential consequences
as a leaking GO fraction cooled below 300°F, downstream
from a rundown cooler.
The point here is that assessing the potential consequence
of a catastrophic pump failure is not a binary process. A risk
assessment (RA) is not performed by simply asking whether or
not the pump contains hydrocarbon. The consequences of a
catastrophic pump failure are a function of several critical
factors. Some factors include the type of process material,
leak rate, failure location and temperature. Additionally,
assessing the failure frequency can be aided by determining
what is in the CMMS before a process safety failure triggers an
investigation. This information makes it possible to detect an
accident waiting to happen before it happens.
What does good look like?
The two case histories given here illustrate scenarios where
a high frequency of seal failures preceded a catastrophic
chemical release that defeated PSM objectives. These are
considered MI program failures because the MI program did not
drive the equipment-failure frequency sufficiently low to
mitigate the risk for a process safety failure.
Remember: Risk is a function of frequency and
consequence. Therefore, driving the risk for a process safety
failure down to zero (the goal of a "zero-injury
workplace") simply involves reducing the equipment-failure
frequency to zero. Unfortunately, this can only be achieved by
shutting down equipment for which failure may result in PSM
consequences. Even the most reliable equipment represents risk
as long as it is operating.
Although most companies would immediately shut down
equipment found operating unsafely, few industrial enterprises
would voluntarily shut down a machine to guarantee their
zero-injury workplace goal. After everything is
shut down, nobody gets hurt at work because nobody goes to
work. It is more satisfying to set an acceptable risk tolerance
and understand what exactly needs to be done to achieve it. By
assigning a risk tolerance and a consequence, it becomes possible to
establish equipment-reliability targets based on the
relationship between risk, consequence and failure frequency.
This is a much more rewarding path to achieving safe
equipment operation. It represents an approach that helps
facilities manage their MI program
with performance expectations that are aligned with equipment
failure risk tolerance.
Standardized approach to risk
An RA tool was constructed to evaluate the risk represented
by process releases resulting from catastrophic pump failures.
The guideline was developed to be consistent with, and borrows
heavily from, the approach defined in API Publication 581
Risk-Based Inspection (RBI) Base Resource Document [4].
RBI is a widely accepted method currently practiced across the
refining industry. Although API 581
applies primarily to fixed equipment, the approach has many
parallels that apply to failure RA for rotating machines.
Accordingly, the standard RBI components are supplemented with
data, methods and tools more specific to centrifugal pumps when
appropriate.
A standardized RA approach reduces the inconsistency that
different PHA teams may encounter at different times. More
importantly, a standardized approach adds value by connecting
the reliability of a specific pump installation to process
safety risk tolerance. The benefit comes from determining a
realistic target for the MI program to achieve, instead of
motivating reliability professionals to achieve their safety
goals with nonspecific targets like "work harder,"
"do better" or "fail less." Setting a
tangible reliability target allows a responsible decision to be
made as to whether or not risk tolerance can be achieved
through the MI program alone. If the MI program cannot
realistically achieve the desired level of risk control, then
additional layers of protection must be added to manage the
risk to an acceptable level.
In some cases, the MI program may adequately drive risk to
an acceptable level without requiring any additional safeguards
or improvements. At such a time, the PHA team has a basis to
conclude that no further actions are needed to mitigate the
potential hazards associated with a catastrophic pump failure.
In short, the existing safeguards have been evaluated and are
considered adequate. The process of evaluating the potential
risk associated with catastrophic pump failures begins with
determining an acceptable level of risk. This prevents the PHA
process from defeating its purpose by creating action items
that consume available resources that should be working on
resolving more important process safety risks.
Fig. 3 shows the basic process used to evaluate the risk
represented by a catastrophic centrifugal pump seal failure.
The analysis begins with a technical pump risk
assessment, the risk-based pump analysis (RBPA). This step is
performed according to the consequence analysis and likelihood
analysis methods described in API 581 Sections 7 and 8.
Afterward, a quantitative layer of protection analysis (LOPA)
is used to compare the specific pump risk against an acceptable
risk tolerance. This makes it possible to develop a reliability
plan to operate the pumps within risk tolerance.
Fig. 3. Catastrophic
pump-seal failure risk
assessment method overview.
Catastrophic pump-seal failure and consequences.
OSHA data from 1992 to 2009 contains a record of 36
catastrophic releases of highly hazardous chemicals that
resulted in fatalities.5 These incidents are
responsible for 52 fatalities and 250 employee injuries.
Ninety-eight of these injuries were severe enough to require
hospitalization. One of these incidents involved a process
release that occurred while steaming-out a pump casing. The
pump casing split open, resulting in a hot oil release that
immediately exploded (Jan. 19, 2005, Kern Oil Refinery,
Bakersfield, California). The conditions present during this
failure are similar to those that the CSB documents in the
first case history. However, none of the fatal incidents
contained in the OSHA database resulted from a pump-reliability
issue.
It would not be responsible to conclude that a catastrophic
pump seal failure could not result in a fatality based on these
historical statistics. The second case history illustrates the
potential for pump-failure mechanisms to be directly involved
in a process safety incident capable of causing severe
consequences. Although there is insufficient data for a
straightforward fatality frequency calculation, enough
statistical information exists to estimate a minimum frequency
based on site-specific data and industry averages. A
frequency/consequence diagram, such as the one shown in Fig. 4,
can be constructed using this information along with these
facts and assumptions:
A total estimated 2009 refining capacity of 17.67 million
The relationship of approximately one fire
for every one thousand repairs, as cited by an industry reliability authority [7,8].
This was corroborated by a large US refinery in 2009.
Fig. 4. Catastrophic
According to this analysis, the frequency for a fatality
(highest severity consequence) is estimated to be lower than
1 × 10⁻⁶ (1 in 1 million) per year. This frequency suggests
that a fatality caused by a pump-reliability issue is probably
more likely than an airline fatality but less likely than other
typical US workplace fatality causes [9]. Based on the
industry workplace fatality statistics contained in the OSHA
database, this relative ranking seems reasonable.
This information makes it possible to define risk tolerance.
Risk tolerance (or literally "tolerance to risk")
implies that the choice has been made to operate equipment in a
responsible manner rather than shutting it down to mitigate a
process safety failure risk. Risk tolerance will vary between
different organizations. It is a decision that should be made
under the direction of legal counsel and supported by industry
Risk-based pump analysis.
Fig. 5 shows the primary steps involved in the RBPA. In the
RBPA, results from the consequence analysis are combined with
the likelihood analysis to determine the risk associated with a
catastrophic pump failure. Comparing actual operating risk
against a designated risk tolerance makes it possible to assess
risk reduction options that may adequately control the process
safety hazard. To be effective, the risk reduction options must
directly address the factors governing process safety.
Fig. 5. Risk-based pump
The consequence analysis is covered extensively in API 581
RBI Base Resource Document Section 7. It is used to calculate
the release area that would develop upon a loss of process
containment caused by a catastrophic equipment failure. In this
case, the RBI principles of API 581 Section 7 are being applied
to potential releases caused by a catastrophic pump failure.
Fig. 6 outlines the recommended approach for working through
the consequence analysis using the methods described in API 581
Section 7.
Fig. 6. Consequence
The analysis should be based on a representative fluid.
Because typical refinery pump service is constantly
changing, the process material properties being evaluated
are best described as an estimate of average operating
conditions over a time period. API 581 breaks process fluids
down to a discrete number of representative fluids. This level
of detail is sufficient for the consequence analysis.
The flow area for a major leak is represented by an annular
area between the shaft sleeve and the closest fixed dimension
of the pump casing or packing gland. The OD of the shaft sleeve
and the ID of the closest fixed dimension of the pump casing or
packing gland are determined from the seal manufacturer's
detailed drawing as illustrated in Fig. 7. These dimensions are
then used to calculate a major seal failure leak rate.
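The annular flow area described above can be sketched in a few lines of Python. This is a minimal illustration and not the API 581 procedure: the function name and the example dimensions are hypothetical, and only the area is computed because the leak-rate correlation itself is not reproduced in this article.

```python
import math


def annular_flow_area(sleeve_od: float, bore_id: float) -> float:
    """Annular area between the shaft sleeve OD and the closest fixed bore ID.

    Both dimensions must share one length unit (for example, inches);
    the result is in that unit squared.
    """
    if sleeve_od <= 0 or bore_id <= sleeve_od:
        raise ValueError("bore ID must exceed a positive sleeve OD")
    return math.pi / 4.0 * (bore_id**2 - sleeve_od**2)


# Hypothetical dimensions read from a seal drawing, in inches.
area_in2 = annular_flow_area(sleeve_od=2.500, bore_id=2.560)
print(f"Major-leak flow area: {area_in2:.3f} in^2")
```

The resulting area would then feed whichever release-rate method the consequence analysis calls for, together with the representative fluid properties and operating pressure.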
Fig. 7. Simplified seal
The likelihood analysis is described in detail by API 581
RBI Base Resource Document Section 8. Its purpose is to
generate an initiating event frequency for both the major and
full bore leak scenarios. The likelihood analysis described in
this study makes use of generic initiating event frequencies
(IEFg) that are based on the empirical data
shown in Fig. 8. This figure is based on catastrophic pump
failure data placed into the public domain by multiple
sources [10–18]. This information covers a wide
range of leak rates from minor leaks (low severity) to full
bore leaks (high severity). The middle area of the chart
represents the major leak range.
Fig. 8. Generic
centrifugal pump leak
The likelihood analysis is performed by 1) selecting an
appropriate IEFg based on the analysis
represented in Fig. 9, then 2) adjusting the
IEFg based on the specific pump's
actual reliability history (MTBFa; see
Eq. 1) compared with the standard reliability of a generic
refinery process pump (MTBFg). This
adjustment is made according to Eq. 2, which produces the
initiating event frequency for a specific pump installation
(IEFa).
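Eq. 1 and Eq. 2 are not reproduced in this text, so the sketch below rests on two assumptions that should be read as such: MTBFa is taken as accumulated pump-years divided by the number of failures, and the generic frequency is scaled by the ratio MTBFg/MTBFa. The function names and the IEFg and MTBFg values are hypothetical placeholders. The failure count is the 23 WOs on three pumps in 12 months cited in the first case history.

```python
def actual_mtbf_years(pump_count: int, period_years: float, failures: int) -> float:
    """ASSUMED form of Eq. 1: accumulated pump-years per recorded failure."""
    if failures <= 0:
        raise ValueError("at least one failure is needed to estimate MTBF")
    return pump_count * period_years / failures


def adjusted_ief(ief_generic: float, mtbf_generic: float, mtbf_actual: float) -> float:
    """ASSUMED form of Eq. 2: scale the generic frequency by MTBFg / MTBFa."""
    if mtbf_actual <= 0:
        raise ValueError("actual MTBF must be positive")
    return ief_generic * mtbf_generic / mtbf_actual


# Failure count from the first case history: 23 seal WOs, three pumps, 12 months.
mtbf_a = actual_mtbf_years(pump_count=3, period_years=1.0, failures=23)

# Placeholder generic values, NOT taken from Fig. 8 or any cited source.
IEF_G = 1.0e-3   # hypothetical generic major-leak frequency, per pump-year
MTBF_G = 4.0     # hypothetical generic refinery pump MTBF, years

ief_a = adjusted_ief(IEF_G, MTBF_G, mtbf_a)
print(f"MTBFa = {mtbf_a:.2f} yr, IEFa = {ief_a:.2e} per pump-year")
```

Under these assumptions, a pump group that fails more often than the generic pump receives a proportionally higher initiating event frequency.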
Fig. 9. Creating a
risk-based reliability plan.
The risk analysis takes place as the LOPA that assesses the
pump operating risk against the designated risk tolerance. Its
purpose is to determine if a pump installation meets its
reliability expectations. This is true if the frequency of
mitigated consequences is less than the designated risk
tolerance. If the frequency of mitigated consequences is more
than the designated risk tolerance, then guidance should be
suggested to improve performance to meet process safety
Probability of personnel in affected area.
The probability of personnel in the affected area,
Pp, is a function of the size of the
affected area, Aa, and the amount of time
personnel are likely to be in this area. There are causes of
catastrophic pump failures that increase the probability of
personnel being in the affected area at the time of the event.
An example may be an abnormal process condition (such as flow
loss) where the console operator calls for the outside
personnel to respond. There are also causes that are random in
nature where the probability of personnel in the affected area
is based on the average amount of time that people are in the
area on any given day.
Failure cause distribution estimates for centrifugal pumps
in US process plants indicate that approximately 12% of
failures are caused by improper operation [10–19].
Some of these causes result from chronic poor operating
practices that reduce pump reliability. They may have been
normalized over time and do not result in an operator response.
An example may include cavitation noises caused by low
NPSHa operation or long-term flow outside of
recommended reliability limits [20]. This analysis assumes that 10%
of the causes of major releases result in increased
occupancy of the affected area.
The remaining random occupancy that does not increase the
probability of personnel in the affected area would therefore
be 90%.
An estimated random occupancy of 1 hr/day/1,000
ft² is assumed for normal process areas. This
estimate should be modified if there is evidence of higher or
lower occupancy. Remote areas that are not visited on
multiple rounds per shift will have lower occupancy. Affected areas that
include known high-occupancy zones will have higher occupancy. Any basis
for choosing a different random occupancy should be documented.
This random occupancy is further simplified to a probability of
0.04/1,000 ft². By combining cause-generated
occupancy with random occupancy, an overall probability of
personnel in the affected area can be determined by Eq. 3.
$$P_p = 0.10 + A_a \left(\frac{0.04}{1{,}000\ \mathrm{ft}^2}\right) \qquad (3)$$
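Eq. 3 can be evaluated with a short Python function. This is a sketch only: the function name, the example area and the cap at 1.0 are assumptions added for illustration, while the 0.10 and 0.04/1,000 ft² constants come from the text above.

```python
def probability_personnel_present(
    affected_area_ft2: float,
    cause_generated: float = 0.10,
    random_per_1000_ft2: float = 0.04,
) -> float:
    """Eq. 3: Pp = 0.10 + Aa * (0.04 / 1,000 ft2).

    The cap at 1.0 is an added assumption so that a very large affected
    area cannot return a probability above one.
    """
    if affected_area_ft2 < 0:
        raise ValueError("affected area cannot be negative")
    p = cause_generated + affected_area_ft2 * random_per_1000_ft2 / 1000.0
    return min(p, 1.0)


# The 0.04 constant is 1 hr/day of occupancy as a fraction of a 24-hr day
# (1/24, rounded to two decimal places).

# Hypothetical affected area taken from a consequence analysis.
p_p = probability_personnel_present(affected_area_ft2=5000.0)
print(f"Pp = {p_p:.2f}")
```

If a site documents a different random occupancy, as the text allows, the `random_per_1000_ft2` argument is the value to change.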
Probability of ignition.
API 581 reports probability of ignition,
Pi, for five potential outcomes in tables.
The proper table in API 581 Section 7 should be selected based
upon the process leak assessment made during the consequence
analysis.
Risk-based reliability plan.
The output from the RBPA feeds into a process for managing
the risk of a process safety failure. The frequency of
mitigated consequences, Fm, is the product
of the specific pump's initiating event
frequency, IEFa, the total probability of
failure on demand for each independent layer of protection,
PFDt, the probability of personnel in the
affected area and the probability for ignition as calculated in
Eq. 4. If the frequency of mitigated consequences,
Fm, is higher than the designated risk
tolerance, then a risk-based reliability plan must be developed
to manage the risk for a process safety failure. This can be
accomplished by either increasing the pump's reliability,
MTBF, or by applying safeguards sufficient to mitigate
the consequences of a catastrophic pump failure to an
acceptable level. The basic process used to develop a
risk-based reliability plan is shown in Fig. 9.
$$F_m = IEF_a \times PFD_t \times P_p \times P_i \qquad (4)$$
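The Eq. 4 comparison against risk tolerance can be sketched as follows. The class and function names are hypothetical, and every numeric input, including the risk tolerance, is a placeholder and not a value from the refinery's analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PumpScenario:
    ief_a: float        # specific pump initiating event frequency, per year
    pfd_total: float    # total probability of failure on demand of the layers
    p_personnel: float  # probability of personnel in the affected area (Pp)
    p_ignition: float   # probability of ignition (Pi)

    def mitigated_frequency(self) -> float:
        """Fm as the product of the four factors named in the text."""
        return self.ief_a * self.pfd_total * self.p_personnel * self.p_ignition


def within_tolerance(scenario: PumpScenario, risk_tolerance: float) -> bool:
    """True when Fm is less than the designated risk tolerance."""
    return scenario.mitigated_frequency() < risk_tolerance


def shortfall_factor(scenario: PumpScenario, risk_tolerance: float) -> float:
    """Ratio Fm / tolerance; values above 1 indicate a gap to close."""
    return scenario.mitigated_frequency() / risk_tolerance


# All inputs below are hypothetical.
scenario = PumpScenario(ief_a=3.0e-2, pfd_total=0.1, p_personnel=0.30, p_ignition=0.5)
RISK_TOLERANCE = 1.0e-4  # placeholder; each organization sets its own

print(f"Fm = {scenario.mitigated_frequency():.2e} per year")
print(f"Within tolerance: {within_tolerance(scenario, RISK_TOLERANCE)}")
print(f"Shortfall factor: {shortfall_factor(scenario, RISK_TOLERANCE):.1f}")
```

A shortfall factor above 1 means Fm must be reduced by at least that factor, either by raising MTBF (which lowers IEFa) or by adding safeguards (which lowers PFDt).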
The LOPA results designate a target MTBF for meeting a
designated risk tolerance. MTBF improvements have a number of
advantages. For example, they reduce both maintenance costs and the potential
to introduce some major leak failure modes during repairs like
the one described in the first case history. MTBF improvements
are typically preventive instead of reactive. However, it may
be difficult to quantify the expected MTBF improvement
available through failure analysis and investigation. Failure
analysis skills, training and methods are involved in
developing an effective set of corrective actions to increase
MTBF. This depends greatly upon the failure investigators
Machinery engineers must be consulted to determine if the
MTBF improvement is realistically achievable. Consideration
should be given to proven technology and both industry and
personal experience with the process requirements. MTBF
improvements can be applied together with additional safeguards
to meet the overall risk tolerance criteria. If MTBF
alternatives are selected as a part of the strategy to meet the
risk tolerance, MTBF becomes a part of the process safety risk
management for the pump group under consideration. It should be
managed with the same diligence and priority as defined by safe
Case history in preventive risk mitigation.
An investigation was used to determine the cause for a
series of recurring seal and thrust bearing failures on two
heavy vacuum gasoil (HVGO) service pumps operating side-by-side
in a refinery vacuum crude unit. The maintenance history of
these pumps is shown in Table 2. The investigation determined
that high frequency vibration caused by vortex cavitation
suction recirculation (VCSR) was responsible for the low MTBF.
Based on this diagnosis, an action item was created to increase
the pumps' NPSH margin ratio to reduce the cavitation
forces responsible for excessive stress on the thrust
bearings.
Addressing this action item would require either redesigning
or replacing the pumps at considerable expense. Based on the
resulting maintenance expenses, other competing reliability
improvement projects offered a greater return on
investment. Therefore, it was decided that the risk of
catastrophic HVGO pump failure should continue to be managed by
repairs. The repairs were to be triggered by condition
monitoring until the higher priority reliability improvement projects could be completed.
The potential consequences of HVGO leaks in vacuum crude
unit service are not comforting (Fig. 10). A leak of sufficient
size would likely autoignite upon contacting air. The
consequence for a catastrophic pump failure represents a
potential PSM incident in addition to property damage and
business interruption. But condition monitoring seemed to be an
acceptable approach to managing the risk for a catastrophic
HVGO pump failure based on previous operating history.
Fig. 10. Catastrophic HVGO
Upon developing the RBPA guidance, the HVGO pumps were
reevaluated to verify that the reliability strategy was in
agreement with refinery risk tolerance. The analysis showed
that the pump group was one protective layer short at its
present MTBF, and its reliability would have to be increased to
at least a six-year MTBF to operate within refinery risk
tolerance. This immediately changed the basis for the project
from a reliability improvement opportunity to a process safety
risk mitigation project. The priority of the HVGO pump project was elevated and an
execution date was scheduled.
Disclaimer and conclusions.
The guideline and methodology discussed in this article
attempt to be generally applicable to all centrifugal pumps.
However, good engineering judgment must prevail while applying
this guideline. The approach can be modified as appropriate
following a recommended peer review and documentation of the
technical basis for deviations.
Tolerating repeat failures on machinery that contains
potentially hazardous process materials can have disappointing
consequences. However, it is not uncommon for equipment
failures to be accepted without comparing actual reliability
performance against a designated risk tolerance. In cases where
breakdown maintenance is the option selected to manage the risk
of catastrophic process releases, a definitive and objective
basis is needed to expose a potentially unacceptable process
safety hazard before an incident occurs.
A standard RA method can be developed to evaluate whether a
pump's failure frequency is being kept low enough
to realistically avoid a process safety
incident. However, the MI program by itself may not
sufficiently elevate equipment reliability to a level where
process safety consequences can confidently be prevented. Risk
tolerance ultimately determines the complete plan needed to
fully address a process safety risk. In many cases, a complete
plan represents a combination of reliability (MI program)
improvements and safeguards (layers of protection).
The RBI method described in API 581 Sections 7 and 8
provides a sound engineering basis for risk analysis on
equipment whose failure may represent process safety
consequences. This information can be supplemented with
site-specific failure data and industry statistics to develop RA
criteria for centrifugal pumps operating in the petroleum and
chemical processing industries. Practicing this approach to
process safety is expected to provide more satisfying and
effective results than dedicating resources to random safety
improvements that may ultimately fall short in avoiding a
process safety failure.
This is an updated and refreshed version of the original
paper presented at the American Institute of Chemical Engineers
2011 Spring Meeting, 7th Global Congress on Process Safety,
Chicago, Illinois, March 13-16, 2011.
1 Oil Refinery Fire and Explosion,
2004-08-I-NM, U.S. Chemical Safety and Hazard Investigation
Board, October 2005.
2 Safe Ups and Downs, 1st Ed. (booklet),
Standard Oil Co., 1960.
3 The Report of the BP U.S. Refineries
Independent Safety Review Panel, January 2007.
4 Risk Based Inspection (RBI) Base Resource
Document, API Publication 581, 1st ed., May 2000.
5 OSHA National Emphasis Program
Directive, CPL 03-00-004, June 7, 2007,
http://www.osha.gov/ (accessed December 26, 2010).
6 Refining Capacity Report, January 1,
2009, National Petrochemical & Refiners
Association, August 2009.
7 Bloch, H. P., Understanding canned motor
pumps, Lubrication Management and Technology, September/October
8 Bloch, H. P., Pump statistics should shape
strategies, Maintenance Technology, October
9 Layer of Protection Analysis: Simplified Process Risk
Assessment, Center for Chemical Process Safety of the
American Institute of Chemical Engineers, 2001.
10 OREDA, Offshore Reliability Data, 4th
and earlier eds., SINTEF, 2002.
11 HSE, Offshore Hydrocarbon Releases Statistics and
Analysis, 2002, HSR 2002 02, 2003.
12 Mannan, S., Lees' Loss Prevention in the
Process Industries, 3rd ed., 2005.
13 Cox, A. W., F. P. Lees, and M. L. Ang,
Classification of Hazardous Locations, Institution of
Chemical Engineers, 1990.
14 DNV (for DTI in UK), White Rose DA Volume 5
Part Two (Concept Safety Analysis), July 2000.
15 HSE, Offshore Technology Report OTO
1999 079, January 2000.
16 Spouge, J., New Generic Leak Frequencies
for Process Equipment, Process Safety Progress,
Vol. 24, No. 4, December 2005.
17 Guidelines for Quantitative Risk
Assessment, Purple Book, CPR18E, SDU, Committee
for the Prevention of Disasters (CPR), The Hague, 1999.
18 Flemish Government, LNE Department,
Handbook Failure Frequencies 2009 for drawing up a Safety
Report, May 5, 2009.
19 Bloch, H. P. and F. K. Geitner, Machinery
Failure Analysis and Troubleshooting, Vol. 2, 1999.
20 Schiavello, B., Cavitation and
Recirculation Troubleshooting Methodology, Proceedings of
the 10th International Pump Users Symposium, 1993.
Kenneth Bloch is a PHA/Loss control
engineer at Flint Hills Resources Pine Bend
Refinery in Rosemount, Minnesota. He is responsible for
detecting and addressing potential process safety
failures. He specializes in root-cause analysis and
catastrophic equipment failure investigation. He
publishes articles about equipment failure analysis,
life-cycle extension and reliability improvement, and
speaks regularly at the API/NPRA Operating Practices
Symposium, NPRA National Safety Conference, and AIChE
Loss Prevention Symposium. Mr. Bloch holds a BS degree
(honors) from Lamar University in Beaumont, Texas, as
well as API 510, 570, and 653 inspection
Jeremy Bertsch is a reliability
center manager at the Flint Hills Resources
Pine Bend Refinery in Rosemount, Minnesota. He is
responsible for providing reliable production and
excellent operating team performance for the process
units within his department. Mr. Bertsch has over 15
years of refinery experience, primarily within the
rotating equipment and reliability engineering
disciplines. He holds a BS degree in mechanical
engineering from the South Dakota School of Mines and
Technology in Rapid City,
Doug Dunmire is a refinery consultant with
Western ROPE. His work includes development of risk
analysis and management tools for refining clients. Mr.
Dunmire holds a BS degree in chemical engineering from
the University of California, Davis.