Hydrocarbon Processing Copying and distributing are prohibited without permission of the publisher

Update your reliability performance to meet process safety expectations

06.01.2011  |  Bloch, K., Flint Hills Resources, LP, Rosemount, Minnesota; Bertsch, J., Flint Hills Resources, LP, St. Paul, Minnesota; Dunmire, D., Western ROPE, LLC, Long Beach, California

Better risk assessment can identify root causes for potential site catastrophes before they occur

Keywords: [process safety] [management] [equipment] [reliability] [safety]

Process hazard analysis (PHA) and mechanical integrity (MI) programs are two essential elements used in the Process Safety Management (PSM) Standard to prevent or to minimize the consequences of catastrophic toxic, reactive, flammable or explosive chemical releases. In instances where process containment is essential for maintaining process safety, equipment that does not meet reliability expectations is more likely to be involved in a PSM failure. When a PSM failure occurs, the equipment maintenance history often exposes the failure as an accident that was waiting to happen.

The MI program and related safeguards must control, to an acceptable level, the consequences of the equipment-failure process safety hazards identified during a PHA. A quantitative approach to evaluating equipment failure risk can be used to determine the reliability needed to adequately prevent or minimize potential process safety consequences. Additional safeguards are needed when the MI program alone cannot realistically achieve an acceptable level of equipment performance. This article explains how one refinery uses a quantitative approach to satisfy PSM objectives on potential releases represented by process pump mechanical-seal failures.

Equipment reliability impacts process safety

OSHA’s PSM Standard (29 CFR 1910.119) details the requirements for preventing or minimizing the consequences of catastrophic releases of toxic, reactive, flammable or explosive chemicals. PHA teams assemble to identify and evaluate hazards that represent the potential release of dangerous materials, as described in paragraph (e) of the PSM Standard, 29 CFR 1910.119(e). As part of the PHA process, action items are assigned to manage the consequences of identified hazards to an acceptable level. It is not uncommon for historical PHAs to be evaluated in response to an accidental release of potentially hazardous process materials. This evaluation is triggered to understand how previous PHA teams assessed the hazard. It is, therefore, possible to interpret any failure that results in the accidental release of potentially hazardous process materials as a PHA team failure. More precisely, it indicates that the previous PHA teams may have failed to adequately identify, evaluate and control the hazard. Its potential consequences were, therefore, left for discovery by the process safety failure that exposed them.

Many, if not most, process safety failures are preceded by lower consequence repeat failures. These low consequence failures can form a failure “rap sheet” documented in the equipment work order (WO) history of a plant’s computerized maintenance management system (CMMS).

When a process safety failure is investigated, a persistent history of failures stands out as an obvious warning of greater risks that should have been identified and controlled. However, prior to the process safety failure, these lower consequence failures can easily be seen as a normal part of equipment operation and maintenance. This type of “normalization of deviance” received much attention within NASA after the loss of 14 astronauts and two space shuttles. Evidence of it can also be found in the hydrocarbon processing and manufacturing industries. Consider the following examples of catastrophic releases caused by centrifugal pump failures.


Several case histories demonstrate the need for responsive risk mitigation efforts.

Alkylate pump fire.

A fire and explosion occurred at a large New Mexico refinery on April 8, 2004.1 The fire ignited upon the catastrophic release of flammable process liquid following a centrifugal pump mechanical-seal failure. The failure caused six injuries as well as extensive property damage and business interruption.

The fire resulted from a loss of process containment at one of a set of three centrifugal pumps in alkylate recirculation service. Alkylate is a mixture of light hydrocarbons typically in the C4 (butane) to C8 (octane) range. Process material leaking from the pump at 350°F autoignited upon contact with air. This failure was attributed to misapplication of energy control after the pump was removed from service to address a process fouling problem. More specifically, the release occurred when mechanics who were scheduled to replace a defective mechanical seal began disassembling the pump on location.

The three alkylate-recirculation pumps at this facility had a recurring history of seal failures. In its formal report, the US Chemical Safety Board (CSB) designated this incident as an MI program failure. In the report, the CSB cites 23 WOs issued to address seal failures in the three pumps in a 12-month period leading up to the catastrophic process release. (Note: A summary table was published in Hydrocarbon Processing, May 2010, p. 9). The CSB makes the valid argument that “an effective mechanical integrity program would have investigated and resolved the problems that were repeatedly causing the (alkylate) pumps to fail.” Instead, the investigation showed that any opportunity to prevent the failure through reliability improvement was substituted with “breakdown maintenance.” In other words, maintenance was used to address the problems caused by pump failures rather than addressing the underlying causes that would have resulted in satisfactory pump performance, sufficient to reduce the risk for a process safety failure.

The potential for a process safety failure increases during shutdowns for maintenance.2 This case history illustrates how energy-control defects experienced during routine maintenance activities can interfere with safe work execution. For this reason, multiple safe-work practices like lockout-tagout and confined space entry policies are often used to mitigate additional risk when equipment is shut down for maintenance. But it is not unusual for process safety failures to occur on equipment in continuous operation when these additional safety precautions typically do not apply. The next case history illustrates how similar MI defects can be involved in process safety failures regardless of the equipment’s operating mode.

No. 2 fuel-oil pump fire.

A fire occurred in a distillate hydrotreating unit operated by a large US Midwestern refinery on Nov. 15, 2004 (Fig. 1). The fire ignited upon the catastrophic release of flammable process liquid following a centrifugal pump mechanical-seal failure, as shown in Fig. 2. The failure resulted in one OSHA-recordable injury along with extensive property damage and significant business interruption.


  Fig. 1. No. 2 fuel-oil pump emergency


  Fig. 2. No. 2 fuel-oil pump failure.  

The fire ignited when the No. 2 fuel oil, a diesel-range hydrocarbon mixture with carbon-chain lengths ranging between C10 and C20, leaked from the pump through a failed mechanical seal. Similar to the pump failure described previously, the 600°F process material leaked out above its autoignition temperature (about 500°F) and caught fire immediately upon contacting oxygen.

The injury occurred during the emergency response to the fire. The first responder began applying water to the pump fire without first increasing his personal protective equipment (PPE) level. At some time during the response, an injury resulted from smoke inhalation. However, within two hours, the fire was extinguished and the emergency situation was brought under control without any further safety consequences.

Reviewing the maintenance record of the pumps involved in this event revealed a long history of thrust bearing failures and seal leaks (see Table 1) similar to the pumps examined in the first case history. Likewise, the cause of the fire was determined to be a catastrophic mechanical-seal failure. However, the physical evidence collected at the unit after the fire indicated that seal damage was a secondary effect and had been preceded by catastrophic thrust-bearing failure. The primary failure had caused uncontrolled shaft movement in the axial (thrust) direction, which then destroyed the mechanical seal.


Eventually, the investigation team was able to link together the probable causes of unstable hydraulics at the pump installation. The failure mechanism was introduced by operating the No. 2 fuel-oil pumps in continuous parallel service. Originally, the pumps were designed for single-spare operation. However, through years of growth and unit debottlenecking efforts, the pumps were continuously operated in parallel to overcome rundown piping pressure constraints. In the parallel operation, the pumps’ rotating elements came under constant stress. Before the fire, this failure mechanism was adequately managed by condition monitoring and frequent repairs. But, eventually, a fire in an operating unit and an OSHA-recordable injury settled any debate over the potential consequences for accepting poor pump reliability in this service. Although this particular installation had been examined twice previously by PHA teams in accordance with OSHA regulations, the hazard remained hidden until the process safety failure exposed it.

Although these two separate failures occurred in different facilities, in different services, at different times, and under different process operating conditions, the common thread of below-expectations reliability runs between them. In both cases, it is easy to look back on events as an accident waiting to happen. Unfortunately, in neither case was the MI program able to prevent repeat failures that eventually resulted in an unacceptable, non-discretionary process safety failure.

In both cases, the owner-operators of the unreliable equipment were in full compliance with OSHA 1910.119(e) governing the use of the PHA program to detect hazards that could result in the potential release. However, in both cases, the PHA program failed to identify and adequately control the hazards that ultimately resulted in the failure. Additionally, the MI program (OSHA 1910.119(j)) was unable to achieve a level of equipment reliability sufficient to offset any PHA defect. The MI program is just as important for the PSM standard to achieve its objective as any of its other elements.

Conservative, but reasonable?

Typical PHA team members do not take lightly their responsibility to identify hazards. Rendering their services in PHA meetings requires a considerable amount of time away from their normal responsibilities. They participate with the intention of adding value by detecting, assessing and controlling any hidden hazards to protect themselves and their fellow workers. They, therefore, take their PHA performance very seriously and are committed to learning from their mistakes. By learning, they can add more value in future PHA meetings. Should PHA teams or team members be criticized (publicly, privately or interpersonally) for having failed to avoid an incident, it is common practice for the teams to “err on the safe side” in future PHA meetings.

In some cases, this conservative response may be appropriate. For example, it is both reasonable and important to expect drastic changes when a facility learns of an operation that is contrary to industry policies or standards. Conversely, it would not be realistic to view all potential hydrocarbon releases equally. Yet, this is what some teams do upon recognizing their failure to generate action items sufficient to mitigate the potential risk of a process safety failure in previous PHA meetings. Merely piling on more action items may or may not add value. Addressing them may create an illusory image of improving workplace safety while not really making progress on mitigating hazards that truly represent unacceptable risk.

Recent process safety failures show how dangerous it can be to develop initiatives around safety items that represent little or no incremental value. This situation was brought to British Petroleum’s (BP’s) attention after the refinery explosion at its Texas City, Texas, facility on March 23, 2005.3 It is not that BP was unconcerned about, or failed to invest in, process safety improvements. The unfair truth about process safety is that there is no reward for hard work. To avoid a process safety failure, the effort must be properly directed. A safety program will fail if it focuses employee attention on the wrong things. The illusion of a safe workplace is destroyed when a catastrophic failure exposes a persistent, unacceptable risk as an accident waiting to happen. Working on the wrong things creates a distraction from the greater and more realistic process safety threats that should be resolved first.

Risk basics

Although the argument could be made that safer pump operation results from upgrading with more robust seals, bearings and monitoring systems, doing so is probably not the most deliberate way to achieve process safety. In many cases upgrades offer no incremental improvement unless they address a deficiency that causes the pump to perform below justified life-cycle expectations.4 Indiscriminately upgrading pumps can consume a considerable amount of resources with the intention of making a system safer, while creating a distraction from other process safety hazards that often represent even greater risk.

Risk is a function of frequency and consequence. Not all centrifugal process pump failures represent the same risk. For example, a hydrocarbon pump operating in the middle of a congested process unit may not represent the same potential consequences as a pump moving similar process liquids in a remote location away from an operating unit. Likewise, the high-temperature gasoil (GO) fraction that leaked in the second case history may not represent the same potential consequences as a leaking GO fraction cooled below 300°F, downstream from a rundown cooler.

The point here is that assessing the potential consequence of a catastrophic pump failure is not a binary process. A risk assessment (RA) is not performed by simply asking whether or not the pump contains hydrocarbon. The consequences of a catastrophic pump failure are a function of several critical factors. Some factors include the type of process material, leak rate, failure location and temperature. Additionally, assessing the failure frequency can be aided by determining what is in the CMMS before a process safety failure triggers an investigation. This information makes it possible to detect an “accident waiting to happen” before it happens.
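As a rough illustration of mining the CMMS before an incident forces the question, the sketch below counts failure work orders per pump over a trailing 12-month window and flags repeat offenders. The record layout, pump tags, dates and the threshold of three failures are all hypothetical assumptions made for the example; they are not taken from any of the case histories.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical work-order records: (pump tag, date closed, failure type).
# Tags, dates and failure types are illustrative assumptions only.
work_orders = [
    ("P-101A", date(2010, 2, 3), "seal"),
    ("P-101A", date(2010, 6, 17), "seal"),
    ("P-101A", date(2010, 11, 9), "thrust bearing"),
    ("P-101B", date(2010, 8, 21), "seal"),
    ("P-205", date(2008, 5, 2), "seal"),
]


def repeat_failure_counts(orders, as_of, window_days=365):
    """Count failure work orders per pump inside a trailing window."""
    start = as_of - timedelta(days=window_days)
    return Counter(tag for tag, closed, _ in orders if start < closed <= as_of)


def flag_repeat_offenders(orders, as_of, threshold=3):
    """Return pump tags whose trailing-window failure count meets the threshold.

    The threshold is an assumed screening value, not an industry figure.
    """
    counts = repeat_failure_counts(orders, as_of)
    return sorted(tag for tag, n in counts.items() if n >= threshold)


flagged = flag_repeat_offenders(work_orders, as_of=date(2010, 12, 31))
print(flagged)
```

A screen like this only surfaces candidates. Whether a flagged pump represents unacceptable risk still depends on the consequence side of the assessment.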

What does ‘good’ look like?

The two case histories given here illustrate scenarios where a high frequency of seal failures preceded a catastrophic chemical release that defeated PSM objectives. These are considered MI program failures because the MI program did not drive the equipment-failure frequency sufficiently low to mitigate the risk for a process safety failure. Remember: Risk is a function of frequency and consequence. Therefore, driving the risk for a process safety failure down to zero (the goal of a “zero-injury” workplace) simply involves reducing the equipment-failure frequency to zero. Unfortunately, this can only be achieved by shutting down equipment for which failure may result in PSM consequences. Even the most reliable equipment represents risk as long as it is operating.

Although most companies would immediately shut down equipment found operating unsafely, few industrial enterprises would voluntarily shut down a machine to guarantee their “zero-injury” workplace goal. After everything is shut down, nobody gets hurt at work because nobody goes to work. It is more satisfying to set an acceptable risk tolerance and understand what exactly needs to be done to achieve it. By assigning risk tolerance and consequence, it becomes possible to establish equipment-reliability targets based on the relationship between risk, consequence and failure frequency. This is a much more rewarding way to achieve safe equipment operation. It represents an approach that helps facilities manage their MI program with performance expectations that are aligned with equipment failure risk tolerance.

Standardized approach to risk

An RA tool was constructed to evaluate the risk represented by process releases resulting from catastrophic pump failures. The guideline was developed to be consistent with, and borrows heavily from, the approach defined in API Publication 581 Risk-Based Inspection (RBI) Base Resource Document.4 RBI is a widely accepted method currently practiced across the refining industry. Although API 581 applies primarily to fixed equipment, the approach has many parallels that apply to failure RA for rotating machines. Accordingly, the standard RBI components are supplemented with data, methods and tools more specific to centrifugal pumps when needed.

A standardized RA approach reduces the inconsistency that different PHA teams may encounter at different times. More importantly, a standardized approach adds value by connecting the reliability of a specific pump installation to process safety risk tolerance. The benefit comes from determining a realistic target for the MI program to achieve, instead of motivating reliability professionals to achieve their safety goals with nonspecific targets like “work harder,” or “do better” or “fail less.” Setting a tangible reliability target allows a responsible decision to be made as to whether or not risk tolerance can be achieved through the MI program alone. If the MI program cannot realistically achieve the desired level of risk control, then additional layers of protection must be added to manage the risk to an acceptable level.

In some cases, the MI program may adequately drive risk to an acceptable level without requiring any additional safeguards or improvements. At such a time, the PHA team has a basis to conclude that no further actions are needed to mitigate the potential hazards associated with a catastrophic pump failure. In short, the existing safeguards have been evaluated and are considered adequate. The process of evaluating the potential risk associated with catastrophic pump failures begins with determining an acceptable level of risk. This prevents the PHA process from defeating its purpose by creating action items that consume available resources that should be working on resolving more important process safety risks.

Method overview.

Fig. 3 shows the basic process used to evaluate the risk represented by a catastrophic centrifugal pump seal failure. The analysis begins with a technical pump risk assessment, the risk-based pump analysis (RBPA). This step is performed according to the consequence analysis and likelihood analysis methods described in API 581 Sections 7 and 8. Afterward, a quantitative layer of protection analysis (LOPA) is used to compare the specific pump risk against an acceptable risk tolerance. This makes it possible to develop a reliability plan to operate the pumps within risk tolerance.


  Fig. 3. Catastrophic pump-seal failure risk assessment method overview.

Catastrophic pump-seal failure and consequences.

OSHA data from 1992 to 2009 contains a record of 36 catastrophic releases of highly hazardous chemicals that resulted in fatalities.5 These incidents are responsible for 52 fatalities and 250 employee injuries. Ninety-eight of these injuries were severe enough to require hospitalization. One of these incidents involved a process release that occurred while steaming-out a pump casing. The pump casing split open, resulting in a hot oil release that immediately exploded (Jan. 19, 2005, Kern Oil Refinery, Bakersfield, California). The conditions present during this failure are similar to those that the CSB documents in the first case history. However, none of the fatal incidents contained in the OSHA database resulted from a pump-reliability issue.

It would not be responsible to conclude that a catastrophic pump seal failure could not result in a fatality based on these historical statistics. The second case history illustrates the potential for pump-failure mechanisms to be directly involved in a process safety incident capable of causing severe consequences. Although there is insufficient data for a straightforward fatality frequency calculation, enough statistical information exists to estimate a minimum frequency based on site-specific data and industry averages. A frequency/consequence diagram, such as the one shown in Fig. 4, can be constructed using this information along with these facts and assumptions:
• A total estimated 2009 refining capacity of 17.67 million bpd6
• The relationship of approximately one fire for every one thousand repairs, as cited by an industry reliability authority.7,8 This was corroborated by a large US refinery in 2009.


  Fig. 4. Catastrophic pump failure frequency/consequence plot.

According to this analysis, the frequency for a fatality (highest severity consequence) is estimated to be lower than 1 × 10⁻⁶ per year (one in 1 million years). This frequency suggests that a fatality caused by a pump-reliability issue is probably more likely than an airline fatality but less likely than other typical US workplace fatality causes.9 Based on the industry workplace fatality statistics contained in the OSHA database, this relative ranking seems reasonable.

This information makes it possible to define risk tolerance. Risk tolerance (or literally “tolerance to risk”) implies that the choice has been made to operate equipment in a responsible manner rather than shutting it down to mitigate a process safety failure risk. Risk tolerance will vary between different organizations. It is a decision that should be made under the direction of legal counsel and supported by industry statistics.

Risk-based pump analysis.

Fig. 5 shows the primary steps involved in the RBPA. In the RBPA, results from the consequence analysis are combined with the likelihood analysis to determine the risk associated with a catastrophic pump failure. Comparing actual operating risk against a designated risk tolerance makes it possible to assess risk reduction options that may adequately control the process safety hazard. To be effective, the risk reduction options must directly address the factors governing process safety.


  Fig. 5. Risk-based pump analysis.  

Consequence analysis.

The consequence analysis is covered extensively in API 581 RBI Base Resource Document Section 7. It is used to calculate the release area that would develop upon a loss of process containment caused by a catastrophic equipment failure. In this case, the RBI principles of API 581 Section 7 are being applied to potential releases caused by a catastrophic pump failure. Fig. 6 outlines the recommended approach for working through the consequence analysis using the methods described in API 581 Section 7.


  Fig. 6. Consequence analysis steps.  

The analysis should be based on a representative fluid. Because typical refinery pump service is constantly changing, the process material properties being evaluated are best described as an estimate of average operating conditions over a time period. API 581 breaks process fluids down to a discrete number of representative fluids. This level of detail is sufficient for the consequence analysis.

The flow area for a major leak is represented by an annular area between the shaft sleeve and the closest fixed dimension of the pump casing or packing gland. The OD of the shaft sleeve and the ID of the closest fixed dimension of the pump casing or packing gland are determined from the seal manufacturer’s detailed drawing as illustrated in Fig. 7. These dimensions are then used to calculate a major seal failure leak rate.
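The geometry described above can be sketched in a few lines. The annular-area function follows directly from the sleeve OD and bore ID. The leak-rate function uses a generic incompressible orifice equation with an assumed discharge coefficient of 0.61, which is a common textbook simplification and not necessarily the exact API 581 formulation. The dimensions, pressure and density in the example are hypothetical.

```python
import math


def annular_flow_area_in2(sleeve_od_in, bore_id_in):
    """Annular area (in^2) between the shaft sleeve OD and the closest fixed bore ID."""
    if bore_id_in <= sleeve_od_in:
        raise ValueError("bore ID must exceed sleeve OD")
    return math.pi / 4.0 * (bore_id_in**2 - sleeve_od_in**2)


def liquid_leak_rate_lb_per_s(area_in2, dp_psi, density_lb_ft3, cd=0.61):
    """Generic orifice estimate: W = Cd * A * sqrt(2 * rho * gc * dP).

    Cd = 0.61 is an assumed sharp-edged-orifice value, not a seal-specific one.
    """
    gc = 32.174                    # lbm-ft/(lbf-s^2)
    area_ft2 = area_in2 / 144.0    # in^2 -> ft^2
    dp_lbf_ft2 = dp_psi * 144.0    # psi -> lbf/ft^2
    return cd * area_ft2 * math.sqrt(2.0 * density_lb_ft3 * gc * dp_lbf_ft2)


# Hypothetical dimensions: 2.500-in. sleeve OD, 2.560-in. gland bore ID.
area = annular_flow_area_in2(sleeve_od_in=2.500, bore_id_in=2.560)
rate = liquid_leak_rate_lb_per_s(area, dp_psi=100.0, density_lb_ft3=50.0)
print(f"area = {area:.4f} in^2, leak rate = {rate:.2f} lb/s")
```

Because the rate scales with the square root of differential pressure, doubling the pressure raises the estimated leak rate by about 41%, not 100%.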


  Fig. 7. Simplified seal sketch: Major leak flow path.

Likelihood analysis.

The likelihood analysis is described in detail by API 581 RBI Base Resource Document Section 8. Its purpose is to generate an initiating event frequency for both the major and full bore leak scenarios. The likelihood analysis described in this study makes use of generic initiating event frequencies (IEFg) that are based on the empirical data shown in Fig. 8. This figure is based on catastrophic pump failure data placed into the public domain by multiple sources.10–18 This information covers a wide range of leak rates from minor leaks (low severity) to full bore leaks (high severity). The middle area of the chart represents the major leak range.


  Fig. 8. Generic centrifugal pump leak

The likelihood analysis is performed by 1) selecting an appropriate IEFg based on the analysis represented in Fig. 9 then 2) adjusting the IEFg based on the specific pump’s actual reliability history (MTBFa —see Eq. 1) compared with the standard reliability of a generic refinery process pump (MTBFg). This adjustment is made according to Eq. 2, which produces the initiating event frequency for a specific pump installation (IEFa).
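Eqs. 1 and 2 are not reproduced in this text, so the sketch below should be read as one plausible form of the adjustment rather than the authors' equations. It assumes that MTBFa is the installed pump-years divided by the number of failures, and that the initiating event frequency scales inversely with MTBF. All numerical inputs are hypothetical.

```python
def mtbf_actual_years(pump_count, observation_years, failure_count):
    """ASSUMED form of Eq. 1: installed pump-years divided by failures."""
    if failure_count <= 0:
        raise ValueError("at least one failure is needed to compute an MTBF")
    return pump_count * observation_years / failure_count


def adjusted_ief(ief_generic_per_year, mtbf_generic_years, mtbf_a_years):
    """ASSUMED form of Eq. 2: IEFa = IEFg * (MTBFg / MTBFa).

    A pump that fails more often than the generic pump gets a
    proportionally higher initiating event frequency.
    """
    return ief_generic_per_year * mtbf_generic_years / mtbf_a_years


# Hypothetical inputs: 2 pumps observed for 5 years with 8 failures,
# a generic IEF of 1e-3 per year and a generic MTBF of 5 years.
mtbf_a = mtbf_actual_years(pump_count=2, observation_years=5.0, failure_count=8)
ief_a = adjusted_ief(1.0e-3, mtbf_generic_years=5.0, mtbf_a_years=mtbf_a)
print(mtbf_a, ief_a)
```

Under these assumptions, a pump group running at one quarter of the generic MTBF carries four times the generic initiating event frequency.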





  Fig. 9. Creating a risk-based reliability plan. 

Risk analysis.

The risk analysis takes place as the LOPA that assesses the pump operating risk against the designated risk tolerance. Its purpose is to determine if a pump installation meets its reliability expectations. This is true if the frequency of mitigated consequences is less than the designated risk tolerance. If the frequency of mitigated consequences is more than the designated risk tolerance, then guidance should be suggested to improve performance to meet process safety objectives.

Probability of personnel in affected area.

The probability of personnel in the affected area, Pp, is a function of the size of the affected area, Aa, and the amount of time personnel are likely to be in this area. There are causes of catastrophic pump failures that increase the probability of personnel being in the affected area at the time of the event. An example may be an abnormal process condition (such as flow loss) where the console operator calls for the outside personnel to respond. There are also causes that are random in nature where the probability of personnel in the affected area is based on the average amount of time that people are in the area on any given day.

Failure cause distribution estimates for centrifugal pumps in US process plants indicate that approximately 12% of failures are caused by improper operation.10–19 Some of these causes result from chronic poor operating practices that reduce pump reliability. They may have been normalized over time and do not result in an operator response. Examples include cavitation noise caused by low NPSHa operation or long-term flow outside of recommended reliability limits.20 For this analysis, 10% of the causes of major releases are assumed to increase occupancy of the affected area. The remaining 90% are treated as random, meaning they do not increase the probability of personnel being in the affected area.

An estimated random occupancy of 1 hr/day/1,000 ft² is assumed for normal process areas. This estimate should be modified if there is evidence of higher or lower occupancy. Remote areas that are not visited on multiple rounds per shift will have lower occupancy. Affected areas that include known high-occupancy zones will have higher occupancy. Any basis for choosing a different random occupancy should be documented. This random occupancy is further simplified to a probability of 0.04/1,000 ft² (1 hr out of 24 hr). By combining cause-generated occupancy with random occupancy, an overall probability of personnel in the affected area can be determined by Eq. 3.

Pp = 0.10 + Aa (0.04/1,000 ft²)     (3)
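Eq. 3 is simple enough to evaluate by hand, but a short sketch makes the defaults explicit and easy to override when a different occupancy basis is documented. Capping the result at 1.0 is an added assumption for very large affected areas and is not stated in the article. The 2,500-ft² area is a hypothetical input.

```python
def probability_personnel_present(
    affected_area_ft2,
    cause_generated=0.10,             # share of causes that draw personnel to the area
    random_per_1000_ft2=0.04,         # about 1 hr/day of occupancy per 1,000 ft^2
):
    """Eq. 3: Pp = 0.10 + Aa * (0.04 / 1,000 ft^2), capped at 1.0 (cap is an assumption)."""
    pp = cause_generated + affected_area_ft2 * random_per_1000_ft2 / 1000.0
    return min(pp, 1.0)


# Hypothetical affected area of 2,500 ft^2 from the consequence analysis.
pp = probability_personnel_present(2500.0)
print(round(pp, 3))
```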

Probability of ignition.

API 581 reports probability of ignition, Pi, for five potential outcomes in tables. The proper table in API 581 Section 7 should be selected based upon the process leak assessment made during the consequence analysis.

Risk-based reliability plan.

The output from the RBPA feeds into a process for managing the risk of a process safety failure. The frequency of mitigated consequences, Fm, is the product of the specific pump’s initiating event frequency, IEFa, the total probability of failure on demand for the independent layers of protection, PFDt, the probability of personnel in the affected area, Pp, and the probability of ignition, Pi, as calculated in Eq. 4. If the frequency of mitigated consequences, Fm, is higher than the designated risk tolerance, then a risk-based reliability plan must be developed to manage the risk for a process safety failure. This can be accomplished by either increasing the pump’s reliability, MTBF, or by applying safeguards sufficient to mitigate the consequences of a catastrophic pump failure to an acceptable level. The basic process used to develop a risk-based reliability plan is shown in Fig. 9.

Fm = IEFa (PFDt)(Pp)(Pi)     (4)
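A minimal sketch of Eq. 4 and the tolerance check follows. Treating PFDt as the product of the individual layer PFDs is the usual LOPA convention for independent layers and is assumed here. Every number in the example, including the risk tolerance, is hypothetical.

```python
import math


def total_pfd(layer_pfds):
    """Combined PFD of independent protection layers (product of the individual PFDs).

    An empty list returns 1, meaning no safeguards are credited.
    """
    return math.prod(layer_pfds)


def mitigated_frequency(ief_a, pfd_t, p_personnel, p_ignition):
    """Eq. 4: Fm = IEFa * PFDt * Pp * Pi, in events per year."""
    return ief_a * pfd_t * p_personnel * p_ignition


def within_tolerance(f_m, risk_tolerance):
    """The pump meets expectations when Fm is below the designated risk tolerance."""
    return f_m < risk_tolerance


# Hypothetical case: one credited safeguard with PFD = 0.1.
pfd_t = total_pfd([0.1])
f_m = mitigated_frequency(ief_a=4.0e-3, pfd_t=pfd_t, p_personnel=0.20, p_ignition=0.5)
ok = within_tolerance(f_m, risk_tolerance=1.0e-5)
print(f_m, ok)
```

In this hypothetical case the pump group falls short of the tolerance, so either the MTBF must improve or another layer of protection must be added.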

The LOPA results designate a target MTBF for meeting a designated risk tolerance. MTBF improvements have a number of advantages. For example, they reduce both maintenance costs and the potential to introduce some major leak failure modes during repairs like the one described in the first case history. MTBF improvements are typically preventive instead of reactive. However, it may be difficult to quantify the expected MTBF improvement available through failure analysis and investigation. Failure analysis skills, training and methods are involved in developing an effective set of corrective actions to increase MTBF. This depends greatly upon the failure investigator’s individual capabilities.
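To show how a target MTBF can fall out of the LOPA, the sketch below rearranges Eq. 4 for the largest allowable initiating event frequency and then converts it to an MTBF. The conversion assumes that the initiating event frequency scales inversely with MTBF (IEFa = IEFg × MTBFg / MTBFa). That relationship is an assumption made for this sketch, and all inputs are hypothetical.

```python
def max_allowable_ief(risk_tolerance, pfd_t, p_personnel, p_ignition):
    """Eq. 4 rearranged: the IEFa at which Fm equals the risk tolerance."""
    return risk_tolerance / (pfd_t * p_personnel * p_ignition)


def target_mtbf_years(ief_generic, mtbf_generic_years, ief_allowable):
    """ASSUMPTION: IEFa = IEFg * MTBFg / MTBFa, so MTBF_target = IEFg * MTBFg / IEF_allowable."""
    return ief_generic * mtbf_generic_years / ief_allowable


# Hypothetical inputs: tolerance 1e-5/yr, one safeguard (PFD 0.1), Pp = 0.2, Pi = 0.5,
# generic IEF of 1e-3/yr at a generic MTBF of 5 years.
ief_max = max_allowable_ief(1.0e-5, pfd_t=0.1, p_personnel=0.2, p_ignition=0.5)
mtbf_target = target_mtbf_years(1.0e-3, mtbf_generic_years=5.0, ief_allowable=ief_max)
print(ief_max, mtbf_target)
```

If the resulting target is beyond what machinery engineers consider achievable, the same functions can be rerun with an additional layer of protection to see how far the target relaxes.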

Machinery engineers must be consulted to determine if the MTBF improvement is realistically achievable. Consideration should be given to proven technology and both industry and personal experience with the process requirements. MTBF improvements can be applied together with additional safeguards to meet the overall risk tolerance criteria. If MTBF alternatives are selected as a part of the strategy to meet the risk tolerance, MTBF becomes a part of the process safety risk management for the pump group under consideration. It should be managed with the same diligence and priority as defined by safe operating limits.

Case history in preventive risk mitigation.

An investigation was used to determine the cause for a series of recurring seal and thrust bearing failures on two heavy vacuum gasoil (HVGO) service pumps operating side-by-side in a refinery vacuum crude unit. The maintenance history of these pumps is shown in Table 2. The investigation determined that high frequency vibration caused by vortex cavitation suction recirculation (VCSR) was responsible for the low MTBF. Based on this diagnosis, an action item was created to increase the pumps’ NPSH margin ratio to reduce the cavitation forces responsible for excessive stress on the thrust bearings.


Addressing this action item would require either redesigning or replacing the pumps at considerable expense. Based on the resulting maintenance expenses, other competing reliability improvement projects offered a greater return on investment. Therefore, it was decided that the risk of catastrophic HVGO pump failure should continue to be managed by repairs. The repairs were to be triggered by condition monitoring until the higher priority reliability improvement projects could be completed.

The potential consequences of HVGO leaks in vacuum crude unit service are not comforting (Fig. 10). A leak of sufficient size would likely autoignite upon contacting air. The consequence for a catastrophic pump failure represents a potential PSM incident in addition to property damage and business interruption. But condition monitoring seemed to be an acceptable approach to managing the risk for a catastrophic HVGO pump failure based on previous operating history.


  Fig. 10. Catastrophic HVGO pump seal failure consequences.

Upon developing the RBPA guidance, the HVGO pumps were reevaluated to verify that the reliability strategy was in agreement with refinery risk tolerance. The analysis showed that the pump group was one protective layer short at its present MTBF, and its reliability would have to be increased to at least six years MTBF to operate within refinery risk tolerance. This immediately changed the basis for the project from a reliability improvement opportunity to a process safety risk mitigation project. The priority of the HVGO pump project was elevated and an execution date was scheduled.
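The reasoning behind a finding such as "one protective layer short" can be sketched as follows. The sketch assumes, as is common in layer of protection analysis, that each protective layer is credited with roughly one order of magnitude of risk reduction. It then solves for the MTBF that would close the gap using reliability alone. All inputs are hypothetical and are not the values used in the refinery's RBPA evaluation of the HVGO pumps.

```python
import math


def layers_short(event_frequency_per_year, tolerable_per_year):
    """Number of additional order-of-magnitude layers needed to reach tolerance.

    ASSUMPTION: each protective layer is credited with a factor of 10.
    """
    if event_frequency_per_year <= tolerable_per_year:
        return 0
    gap = math.log10(event_frequency_per_year / tolerable_per_year)
    # Small epsilon keeps an exact factor of 10 from rounding up to two layers
    return math.ceil(gap - 1e-9)


def required_mtbf_years(p_escalation, safeguard_pfds, tolerable_per_year):
    """Smallest MTBF (years) that meets tolerance with the existing safeguards only."""
    return p_escalation * math.prod(safeguard_pfds) / tolerable_per_year


# Hypothetical inputs, for illustration only
present_mtbf = 1.5   # years
p_escalation = 0.1   # assumed fraction of failures that escalate
pfds = (0.1,)        # one existing safeguard, assumed PFD
tolerable = 1e-3     # events per year

present_frequency = (1.0 / present_mtbf) * p_escalation * math.prod(pfds)
print("layers short:", layers_short(present_frequency, tolerable))
print("MTBF needed without a new layer:", required_mtbf_years(p_escalation, pfds, tolerable))
```

The two outputs correspond to the two options available to the site: add a safeguard, or raise the MTBF until the existing safeguards are sufficient.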

Disclaimer and conclusions

The guideline and methodology discussed in this article attempt to be generally applicable to all centrifugal pumps. However, good engineering judgment must prevail while applying this guideline. The approach can be modified as appropriate, provided that the recommended peer review is completed and the technical basis for deviations is documented.

Tolerating repeat failures on machinery that contains potentially hazardous process materials can have disappointing consequences. However, it is not uncommon for equipment failures to be accepted without comparing actual reliability performance against a designated risk tolerance. In cases where breakdown maintenance is the option selected to manage the risk for catastrophic process releases, a definitive and objective basis is needed to expose a potentially unacceptable process safety hazard before an incident occurs.

A standard RA method can be developed to evaluate pump reliability on the basis of keeping its failure frequency low enough to realistically avoid a process safety incident. However, the MI program by itself may not sufficiently elevate equipment reliability to a level where process safety consequences can confidently be prevented. Risk tolerance ultimately determines the complete plan needed to fully address a process safety risk. In many cases, a complete plan represents a combination of reliability (MI program) improvements and safeguards (layers of protection).

The RBI method described in API 581 Sections 7 and 8 provides a sound engineering basis for risk analysis on equipment whose failure may represent process safety consequences. This information can be supplemented with site-specific failure data and industry statistics to develop RA criteria for centrifugal pumps operating in the petroleum and chemical processing industries. Practicing this approach to process safety is expected to provide more satisfying and effective results than dedicating resources to random safety improvements that may ultimately fall short in avoiding a process safety failure. HP


This is an updated and refreshed version of the original paper presented at the American Institute of Chemical Engineers 2011 Spring Meeting, 7th Global Congress on Process Safety, Chicago, Illinois, March 13–16, 2011.


1 “Oil Refinery Fire and Explosion,” 2004-08-I-NM, U.S. Chemical Safety and Hazard Investigation Board, October 2005.
2 Safe Ups and Downs, 1st Ed. (booklet), Standard Oil Co., 1960.
3 “The Report of the BP U.S. Refineries Independent Safety Review Panel,” January 2007.
4 “Risk Based Inspection (RBI) Base Resource Document,” API Publication 581 1st ed., May 2000.
5 “OSHA National Emphasis Program Directive,” CPL 03-00-004, June 7, 2007, http://www.osha.gov/ (accessed December 26, 2010).
6 “Refining Capacity Report, January 1, 2009,” National Petrochemical & Refiners Association, August 2009.
7 Bloch, H. P., “Understanding canned motor pumps,” Lubrication Management and Technology, September/October 2008.
8 Bloch, H. P., “Pump statistics should shape strategies,” Maintenance Technology, October 2008.
9 “Layer of Protection Analysis Simplified Risk Assessment,” Center for Chemical Process Safety of the American Institute of Chemical Engineers, 2001.
10 OREDA, “Offshore Reliability Data,” 4th and earlier eds., SINTEF, 2002.
11 HSE, “Offshore Hydrocarbon Releases Statistics and Analysis, 2002,” HSR 2002 02, 2003.
12 Mannan, S., Lees’ Loss Prevention in the Process Industries, 3rd ed., 2005.
13 Cox, A. W., F. P. Lees, and M. L. Ang, “Classification of Hazardous Locations,” Institution of Chemical Engineers, 1990.
14 DNV (for DTI in UK), “White Rose DA Volume 5 Part Two (Concept Safety Analysis),” July 2000.
15 HSE, “Offshore Technology Report – OTO 1999 079,” January 2000.
16 Spouge, J., “New Generic Leak Frequencies for Process Equipment,” Process Safety Progress, Vol. 24, No. 4, December 2005.
17 “Guidelines for Quantitative Risk Assessment,” Purple Book, CPR18E, SDU, Committee for the Prevention of Disasters (CPR), The Hague, 1999.
18 Flemish Government, LNE Department, “Handbook Failure Frequencies 2009 for drawing up a Safety Report,” May 5, 2009.
19 Bloch, H. P. and F. K. Geitner, Machinery Failure Analysis and Troubleshooting, Vol. 2, 1999.
20 Schiavello, B., “Cavitation and Recirculation Troubleshooting Methodology,” Proceedings of the 10th International Pump Users Symposium, 1993.

The authors 

Kenneth Bloch is a PHA/loss control engineer at Flint Hills Resources’ Pine Bend Refinery in Rosemount, Minnesota. He is responsible for detecting and addressing potential process safety failures. He specializes in root-cause analysis and catastrophic equipment failure investigation. He publishes articles about equipment failure analysis, life-cycle extension and reliability improvement, and speaks regularly at the API/NPRA Operating Practices Symposium, NPRA National Safety Conference, and AIChE Loss Prevention Symposium. Mr. Bloch holds a BS degree (honors) from Lamar University in Beaumont, Texas, as well as API 510, 570, and 653 inspection certifications.

Jeremy Bertsch is a reliability center manager at Flint Hills Resources’ Pine Bend Refinery in Rosemount, Minnesota. He is responsible for providing reliable production and excellent operating team performance for the process units within his department. Mr. Bertsch has over 15 years of refinery experience, primarily within the rotating equipment and reliability engineering disciplines. He holds a BS degree in mechanical engineering from the South Dakota School of Mines and Technology in Rapid City, South Dakota.

Doug Dunmire is a refinery consultant with Western ROPE. His work includes development of risk analysis and management tools for refining clients. Mr. Dunmire holds a BS degree in chemical engineering from the University of California, Davis.
