August 2019

Special Focus: Valves, Pumps and Turbomachinery

Analyze machinery failure data with a spreadsheet

Some basic, yet insightful, machinery reliability tracking tools are introduced here that can be developed with common spreadsheet applications, such as Excel.

Perez, R. X., Consultant

After practicing machinery reliability in the field for many years, the author has become familiar with many analysis methods, ranging from those that were mathematically complex but not very useful to others that were simple but provided immediate and actionable insights. While numerous ways exist to present and interpret machinery failure data, the methods presented here can provide the most information for the least cost and effort.

Why reliability tools are used

Useful reliability analysis tools take available historical failure data and transform them into either visual or concise tabular results that identify significant reliability problems requiring attention. Types of reliability analysis tools reviewed here include:

  • Pareto failure plots
  • Bad actor forced rankings
  • Reliability growth plots
  • Mean time between repairs (MTBR) trends.

A good starting point is reviewing a simple tool that looks at failures on a sitewide basis. TABLE 1 contains a forced ranking of pump failures for various processing units across a site. By listing the MTBR over the last 12 mos, potential areas that may need addressing can be quickly identified.
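As a rough illustration of how such a forced ranking could be produced outside a spreadsheet, the following Python sketch ranks units from least to most reliable. The unit names and counts are hypothetical placeholders, not the values in TABLE 1, and the MTBR formula (pump population multiplied by the time period, divided by the number of repairs) is an assumption made for this sketch.

```python
# Hypothetical sitewide data: unit -> (pumps installed, repairs in last 12 months).
# These numbers are placeholders, not values from TABLE 1.
units = {
    "Unit A": (40, 30),
    "Unit B": (25, 10),
    "Unit C": (60, 12),
}

PERIOD_MONTHS = 12


def mtbr_months(pump_count, repairs, period_months=PERIOD_MONTHS):
    """Assumed MTBR formula: population x period / repairs (in months)."""
    if repairs == 0:
        return float("inf")  # no repairs recorded in the period
    return pump_count * period_months / repairs


# Forced ranking: lowest MTBR (least reliable unit) first.
ranking = sorted(
    ((name, mtbr_months(pumps, repairs)) for name, (pumps, repairs) in units.items()),
    key=lambda row: row[1],
)

for rank, (name, value) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: MTBR = {value:.1f} months")
```

Sorting in ascending order of MTBR puts the units most likely to need attention at the top of the list.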

 

The pump failure data from TABLE 1 can also be converted into a Pareto chart (FIG. 1) to provide a summary of pump reliability at a glance. FIG. 1 shows pump failure frequencies over the last 12 mos for various processing areas plotted in order of decreasing failure frequency from left to right. Pareto charts are extremely useful for identifying issues that should be addressed first. The “cumulative percentage” line helps the reader determine how various groups add to the total failure population. For example, the cat cracker and coker unit failures represent about 35% of total plantwide pump failures. Clearly, the cat cracking area had the most pump repairs over the last 12 mos, and the South terminal area had the fewest repairs over the same time period. The visual results from this Pareto chart suggest that more study of cat cracker pump failures is warranted.
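The arithmetic behind a Pareto chart is simple enough to sketch in a few lines of Python. The failure counts below are invented for illustration; they were chosen only so that the cat cracker and coker together account for 35% of the total, as described above, and they are not the data behind FIG. 1.

```python
# Hypothetical 12-month pump failure counts per processing unit.
# Placeholder values chosen to mirror the proportions described in the text.
failures = {
    "Coker": 15,
    "Tank farm": 9,
    "Cat cracker": 20,
    "Reformer": 13,
    "South terminal": 6,
    "Crude unit": 14,
    "Utilities": 11,
    "Alkylation": 12,
}

# Step 1: sort units by decreasing failure count (left to right on the chart).
ordered = sorted(failures.items(), key=lambda item: item[1], reverse=True)
total = sum(failures.values())

# Step 2: accumulate the running total and convert it to a percentage.
pareto = []
running = 0
for unit, count in ordered:
    running += count
    pareto.append((unit, count, 100.0 * running / total))

for unit, count, cum_pct in pareto:
    print(f"{unit:<15} {count:>3}  {cum_pct:5.1f}%")
```

The third column corresponds to the "cumulative percentage" line on the chart; the bars are simply the sorted counts.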

FIG. 1. Pareto chart of total pump failures over the last 12 mos for various processing units.

 

Narrowing the focus

After it has been determined that most pump failures occurred in the cat cracking unit, the next step is narrowing the focus to those pumps. TABLE 2 shows a forced ranking of the pumps with the most failures. In this hypothetical case, pumps 31-P-09 A&M failed five times in the last 12 mos. If each repair runs approximately $10,000, this worst actor cost the facility $50,000 in the last year.

 

The least reliable pumps onsite may be labeled as “bad actors.” These 5–10 pumps cost the most to maintain and cause the most problems. It makes sense to aggressively address bad actors first.
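A bad actor list can be sketched by counting repair events per pump tag and attaching an approximate cost. In the Python sketch below, the repair log is hypothetical: only the 31-P-09 tag, its five failures and the $10,000 repair cost come from the example above, and the other tags and counts are invented for illustration.

```python
from collections import Counter

REPAIR_COST_USD = 10_000  # approximate cost per repair, from the example above

# Hypothetical repair log: one entry per repair event.
# Tags other than 31-P-09 are invented placeholders.
repair_log = (
    ["31-P-09"] * 5
    + ["31-P-02"] * 3
    + ["31-P-14"] * 2
    + ["31-P-21"] * 1
)

TOP_N = 3  # how many bad actors to report (a site would typically pick 5 to 10)

counts = Counter(repair_log)
bad_actors = [
    (tag, repairs, repairs * REPAIR_COST_USD)
    for tag, repairs in counts.most_common(TOP_N)
]

for tag, repairs, cost in bad_actors:
    print(f"{tag}: {repairs} repairs, approx. ${cost:,}")
```

Using a flat cost per repair is a simplification; actual repair costs, where available, would give a more realistic ranking.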

Cumulative failure trends

Management is always interested in knowing if machinery reliability is getting better or worse. A simple means of visualizing historical failure data is constructing and then analyzing a special trend called a reliability growth plot, which is a plot of cumulative failures vs. time (FIG. 2). These types of graphs are constructed by first creating a table of cumulative (total) failures in a population for consecutive time intervals, and then plotting cumulative failures over the time period of interest. For example, suppose 20 failures occur in a population in the first month, 25 failures occur in the second month and 30 failures occur in the third month. The first three points in the reliability growth plot would then be: in the first month, 20 failures; in the second month, 20 + 25 = 45 failures; and in the third month, 20 + 25 + 30 = 75 failures, or (1,20), (2,45) and (3,75).
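The running total in this example can be reproduced with a short Python sketch; the variable names are illustrative only.

```python
from itertools import accumulate

# Monthly failure counts from the example above.
monthly_failures = [20, 25, 30]

# Running total of failures at the end of each month.
cumulative = list(accumulate(monthly_failures))

# Pair each cumulative total with its month number (starting at 1).
points = list(enumerate(cumulative, start=1))

print(points)  # [(1, 20), (2, 45), (3, 75)]
```

In a spreadsheet, the same result comes from a running-sum column placed next to the monthly counts.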

FIG. 2. Reliability growth plot of pump failures in an operating area.

 

Reliability growth plots allow the easy identification of tendencies in the failure data. FIG. 2 shows three idealized reliability growth plots:

  1. A trend where the slope of the cumulative failures vs. time is essentially constant (constant failure rate)
  2. A trend where the slope of the cumulative failures vs. time sharply increases in October 2016 (deteriorating reliability)
  3. A trend where the slope of the cumulative failures vs. time decreases in October 2016 (improving reliability).

A study of the reliability growth plot indicates whether failures are constant, deteriorating or improving. Reliability growth plots also provide a means of easily spotting any sudden changes in slope, which indicate a change in reliability. In this case, it can be determined in the deteriorating case that something changed after July 2016. Changes should be sought in operating procedures, repair methods, processing rates, etc. to explain changes in pump reliability. Persistent changes in pump reliability may represent a major change relative to the pumps, while a one-off event can simply indicate a measurement error, or some sporadic factor, such as a plant upset.
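Because the slope of a cumulative failure plot is simply the failure rate per time interval, a change in slope can also be checked numerically. The Python sketch below compares the average monthly failure rate before and after a suspected break point. The monthly counts, the break point and the 20% tolerance are all assumptions made for illustration, not values taken from FIG. 2.

```python
from statistics import mean

# Hypothetical monthly failure counts (placeholder data, not taken from FIG. 2).
monthly_failures = [10, 11, 9, 10, 10, 11, 9, 10, 10, 18, 19, 17]


def rate_change(counts, break_index):
    """Average monthly failure rate before and after break_index."""
    if not 0 < break_index < len(counts):
        raise ValueError("break_index must split the data into two non-empty parts")
    before = mean(counts[:break_index])
    after = mean(counts[break_index:])
    return before, after


def classify(before, after, tolerance=0.2):
    """Label the trend; the 20% tolerance is an arbitrary illustrative threshold."""
    if after > before * (1 + tolerance):
        return "deteriorating"
    if after < before * (1 - tolerance):
        return "improving"
    return "roughly constant"


before, after = rate_change(monthly_failures, break_index=9)
print(f"before: {before:.1f}/month, after: {after:.1f}/month -> {classify(before, after)}")
```

A check like this only supplements the plot. The break point still has to be chosen by inspection, and a single unusual month can distort the averages.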

FIG. 3. A trend plot of the plantwide MTBR.

 

The final reliability plot covered here is the MTBR trend plot (FIG. 3). MTBR is a simple calculation that provides insight into the mechanical reliability of a single pump or a group of pumps, as calculated in Eq. 1:

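$$\text{MTBR} = \frac{\text{Number of pumps in the population} \times \text{Time period}}{\text{Number of repairs in the time period}} \qquad (1)$$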

For example, if 20 pumps are repaired in a 12-mos period with a total population of 50 pumps, then the MTBR is calculated in Eq. 2 as: 

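$$\text{MTBR} = \frac{50 \text{ pumps} \times 12 \text{ mos}}{20 \text{ repairs}} = 30 \text{ mos} \qquad (2)$$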

This type of plot can be used to monitor a single machine or a large population of process machines, such as pumps. The MTBR value can be calculated from historical data, then plotted vs. time on a monthly, quarterly or annual basis. Inspection of FIG. 3 reveals that the MTBR of a hypothetical pump population is gradually deteriorating. The next step might be to examine the individual trends of each process unit to determine if all are deteriorating, or if poorly performing unit populations are reducing overall plantwide reliability.
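One way to generate the points for such a trend is a rolling 12-month calculation, sketched below in Python. The sketch assumes MTBR is the pump population multiplied by the window length and divided by the repairs in that window, and it assumes the population stays constant over the whole record. The monthly repair counts are hypothetical and are not the data behind FIG. 3.

```python
def rolling_mtbr(monthly_repairs, population, window=12):
    """MTBR (in months) for each month that has a full trailing window.

    Assumes a constant pump population over the whole record.
    """
    trend = []
    for end in range(window, len(monthly_repairs) + 1):
        repairs = sum(monthly_repairs[end - window:end])
        # Avoid division by zero when no repairs occurred in the window.
        trend.append(population * window / repairs if repairs else float("inf"))
    return trend


# Hypothetical record: 15 months of repair counts for a 50-pump population.
monthly_repairs = [2, 1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 2, 3, 3, 3]
trend = rolling_mtbr(monthly_repairs, population=50)

for month, value in enumerate(trend, start=12):
    print(f"month {month}: MTBR = {value:.1f} months")
```

In this made-up record the repair count rises in the last three months, so each successive MTBR value is lower than the one before, which is the kind of gradual deterioration the trend plot is meant to expose.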

Note: Readers should remember that the MTBR metric is a sitewide metric and provides only limited information about a given machine population. As operators and engineers drill down to the equipment level, more advanced analyses, such as Weibull analyses, may be warranted to examine the failure data and better understand the nature of the failures.

Takeaway

A few reliability tools have been reviewed that allow the visualization of equipment reliability performance. The usefulness of these methods has been validated through years of successful use. In the heat of battle, tools are needed that rely on easy-to-collect data that can be readily analyzed using Excel or similar software applications. Visual results can then be presented to, and interpreted by, colleagues and management.

Suggestions for developing and maintaining a reliability tracking program include:

  • Regularly gather machinery failure data (monthly or quarterly)
  • Review and clean the data to ensure validity
  • Regularly generate reliability plots based on historical data
  • Identify bad actors using the reliability tools presented here, and attack bad actors first
  • Continuously track machinery reliability and look for changes.

These simple yet effective methods can help visualize machinery reliability and identify improvement opportunities. HP
