March 2013

Bonus Report: Safety Developments

Process safety management: Going beyond functional safety

Operating companies are increasing efforts to reduce the risk of catastrophic events such as the release of toxic, reactive or explosive chemicals that can damage the environment or plant assets.

Turk, M. A., Mishra, A., Invensys Operations Management

Modern hydrocarbon processing facilities have become increasingly complex. Likewise, the risks in managing higher-capacity refineries and petrochemical complexes have increased. The importance of ensuring the safety of employees, the environment and physical plant assets in the event of an unexpected process excursion cannot be overstated. New techniques and technologies designed to improve operational safety have evolved to meet these challenges. Operating companies are increasing efforts to reduce the risk of catastrophic events such as the release of toxic, reactive or explosive chemicals that can damage the environment or plant assets, as well as cause injury or death to employees and the general public.


This journey begins with the development of the modern process safety management (PSM) systems and requirements. Efforts to improve plant safety were led by state-of-the-art functional safety systems. These systems enable the orderly shutdown of processing units when abnormal situations occur that are beyond the capabilities of the regulatory control system or operators to correct or to prevent a catastrophe.

While functional safety has proven successful in reducing the probability of catastrophic events and recognizes the role of human factors, it does not explicitly address the key roles of management and business processes in maintaining operational integrity and profitable performance of process plants. In this context, what are the approaches that operating companies should take to go beyond functional safety to proactively measure, monitor and display a plant’s risk profile in near real time so that proper actions can be taken in a more timely manner to improve process safety performance?

Why invest time and resources to go beyond the limitations of functional safety? To answer this question, we must discuss the pivotal concepts of safety-performance indicators and values (plant assets, the environment, the public and employees) at risk from potential catastrophic events. What are the best practices for establishing a PSM culture along with designing, implementing and maintaining a proactive PSM system to complement existing functional safety systems?


As industrialization and technology progressed in the early 20th century, the pattern of intermittent catastrophes began. In 1921, at the BASF plant in Oppau, Germany, explosions destroyed the plant, killing at least 430 people and damaging approximately 700 houses nearby. This explosion occurred as blasting powder was used to break up the storage pile of a 50/50 mixture of ammonium sulfate and ammonium nitrate. This procedure had previously been used 16,000 times without any mishap. In 1947, a fire and explosion aboard the S.S. Grandcamp, docked near the Monsanto Chemical Co. plant in Texas City, Texas, while loading ammonium nitrate fertilizer killed over 430 people. There was no specific legislative response to these incidents.1

Interestingly, the US Center for Chemical Process Safety (CCPS), which provides leadership and infrastructure to promote and advance PSM, suggests process safety was born on the banks of the Brandywine River in the early days of the 19th century at E. I. du Pont’s black powder works. Recognizing that even a small incident could precipitate considerable damage and loss of life, du Pont directed the works to be built and operated under very specific safety conditions.2 Industry has a short memory; here is a brief list of several recent major industrial disasters with dire consequences:

  • 1984—Bhopal, India. A toxic material released caused 2,500 immediate fatalities and many other offsite injuries over time.
  • 1984—Mexico City, Mexico. An LPG explosion caused 300 fatalities (mostly offsite) along with $20 million in damages.
  • 1988—Norco, Louisiana. A hydrocarbon-vapor-cloud explosion resulted in seven onsite fatalities and 42 injuries, as well as over $400 million in damages.
  • 1989—Pasadena, Texas. An ethylene/isobutane explosion and fire caused 23 fatalities, 130 injuries and more than $800 million in damages.

Such catastrophic safety incidents damaged the public and the environment. They also caused significant economic loss. In response, governments continue to enact legislation and impose fines focused on reducing the probability of future events. Likewise, operating companies formed safety-related consortiums that include suppliers of process automation technology. The goal is to identify automation solutions that can enable operating companies to avoid catastrophic safety events through early detection and correction. As evidenced by recent safety-related catastrophes, such solutions have not been entirely successful.

The present state of the art in safety management includes safety studies (HAZID, HAZOP, risk analysis); safety instrumented systems (SISs) for fire and gas detection and for emergency shutdown; abnormal situation management applications; and operator guidance tools. As illustrated in Fig. 1, the first step in implementing a functional safety system is the upfront analysis and conceptual design. It begins with a meeting with all stakeholders to determine possible hazards and hazard characteristics, and to establish the basic scope of the project. Work then proceeds to develop the detailed design for the SIS. The next steps involve:

  • Executing the process hazard analysis (PHA) and layers of protection analysis (LOPA)
  • Specifying the safety instrumented functions (SIFs) and preparing the safety requirements specification (SRS) reports
  • Developing the safety integrity level (SIL) verification worksheet and report.

  Fig. 1. Steps in the FEED of a safety instrumented system.

While these approaches to safety management have produced positive results in reducing the probability of potentially dangerous process upsets or failures, they are either static (e.g., HAZOP studies) or reactive (e.g., emergency shutdown systems) in nature. Their performance is also hampered by complacency. Time passing without an incident is not necessarily an indication that all is well. There is always a succession of failings that lead to an incident, as shown by the Swiss-cheese model (Fig. 2). If unchecked, all systems will deteriorate over time, and major incidents can occur when defects cross a number of risk-control systems concurrently. In effect, the “holes” in the Swiss-cheese model become larger. Without setting leading and lagging indicators for each risk-critical control system, it is unlikely that failings in these barriers will be revealed as they arise before all of the important barriers are defeated.
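The compounding effect of enlarging "holes" can be made concrete with a small sketch. It assumes, purely for illustration, that barriers fail independently and that each has a known probability of failure on demand. The function name, the three barriers and all of the numbers are hypothetical and do not come from any plant data.

```python
from math import prod


def breakthrough_probability(barrier_failure_probs):
    """Probability that a hazard passes every barrier.

    Assumes barriers fail independently (an illustrative simplification),
    so the holes line up with a probability equal to the product of the
    individual failure probabilities.
    """
    for p in barrier_failure_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
    return prod(barrier_failure_probs)


# Hypothetical barriers: alarm response, SIS, relief system.
as_designed = [0.1, 0.01, 0.01]
# The same barriers after unnoticed degradation of each one.
degraded = [0.3, 0.05, 0.05]

p_design = breakthrough_probability(as_designed)
p_degraded = breakthrough_probability(degraded)
risk_multiplier = p_degraded / p_design

print(f"as designed: {p_design:.1e}, degraded: {p_degraded:.1e}")
print(f"risk multiplier: {risk_multiplier:.0f}x")
```

In this made-up case no single barrier looks alarming on its own, yet the chance of a hazard passing all three rises by a factor of about 75, which is why each barrier needs its own indicators.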

  Fig. 2. Swiss-cheese model of how a hazard can propagate and become a harmful event.

Numerous recent high-profile incidents have heightened the awareness that organizations need to pay more attention to process safety. By definition, process safety is a blending of engineering and management skills focused on preventing catastrophic accidents and near hits—particularly, explosions, fires and damaging releases associated with the loss of containment of energy or dangerous substances such as chemicals and petroleum products.

These engineering and management skills exceed those required for managing the workplace. As industrial infrastructures continue to age, the cost of applying process safety incorrectly increases, with escalating consequences such as:

  • Damage to people, the community and environment
  • Litigation against corporations and individuals
  • Increased scrutiny by regulators and governments
  • Undermined investor confidence with resulting loss in stock price.

In some cases, even when executives and managers have prioritized process safety, things still go wrong. Too often, organizations or individuals make process-safety decisions under pressure, or without proper context or sufficient information. What’s missing is the ability to provide plant personnel with real-time, proactive actionable information about the plant’s risk profile via continuous measurement, monitoring and visualization of key operating and safety-related parameters. Result: Potentially hazardous events can be averted without resorting to a plant trip or an emergency shutdown. This is the goal of PSM; it involves next-generation automation solutions aimed at making step-change improvements in safety performance. Such systems can provide a “safety early warning and hazard avoidance system.” This should be an essential component of the modern hydrocarbon enterprise.

By way of definition, PSM is the application of management systems to identify, understand and control process hazards, thus preventing process-related injuries and incidents.3 The goal is to minimize process incidents by evaluating the whole process. PSM came into widespread use after the adoption of OSHA Standard 29 CFR 1910.119 Process Safety Management of Highly Hazardous Chemicals in 1992. PSM covers:

  • Process safety information
  • Employee involvement
  • PHAs
  • Operating procedures
  • Training
  • Contractors
  • Pre-startup safety reviews
  • Mechanical integrity
  • Hot work
  • Management of change
  • Incident investigation
  • Emergency planning and response
  • Compliance audits
  • Trade secrets.

Another definition of PSM is “the proactive and systematic identification, evaluation and mitigation or prevention of chemical releases that could occur as a result of failures in processes, procedures or equipment.”4 PSM is intended to ensure freedom from unacceptable risk due to:

  • Fire
  • Explosion
  • Suffocation
  • Poisoning.

Fig. 3 shows where PSM fits into the overall context of operational integrity (i.e., keeping the process in the pipe), and how functional safety is a key element of PSM.

  Fig. 3. Role of PSM in supporting operational integrity.

Business case for PSM

A cost/benefit analysis is at the center of decision-making on investments. To justify cost, it is necessary to determine if the magnitude of the value delivered justifies the cost in terms of time, effort and money. Investments in safety—functional safety systems, abnormal situation management applications, etc.—have been made largely to satisfy legislative requirements and to maintain the license to operate. There is no legislation that directly defines the requirements for a real-time PSM system or the penalties for not implementing one. Thus, investments in a PSM system may be made if it can be shown that it delivers a significant, tangible reduction in the risk of a catastrophic failure, and that it produces a measurable economic benefit for the plant. Table 1 summarizes estimated annual benefits associated with implementing a PSM system. For a 100,000-bpd petroleum refinery, operating for 330 days/yr at an average refining margin of $5/bbl, the estimated annual PSM benefit is $2.85 million. In addition to the stated benefits from Table 1, the “incremental value-at-risk” can provide ongoing quantified measures of the economic impact from the PSM system.
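To put the quoted benefit in proportion, the refinery figures above can be worked through in a few lines. The inputs are the numbers stated in the text; the two derived ratios are computed here only for scale and are not taken from Table 1, whose breakdown is not reproduced.

```python
# Figures quoted in the text for the example refinery.
capacity_bpd = 100_000          # barrels per day
operating_days = 330            # days per year
margin_usd_per_bbl = 5.0        # average refining margin, $/bbl
psm_benefit_usd = 2_850_000     # estimated annual PSM benefit, $

annual_throughput_bbl = capacity_bpd * operating_days
annual_margin_usd = annual_throughput_bbl * margin_usd_per_bbl

# Derived ratios (not stated in the article; computed here for scale).
benefit_share = psm_benefit_usd / annual_margin_usd
benefit_per_bbl = psm_benefit_usd / annual_throughput_bbl

print(f"annual throughput: {annual_throughput_bbl:,} bbl")
print(f"annual refining margin: ${annual_margin_usd:,.0f}")
print(f"PSM benefit: {benefit_share:.2%} of margin, ${benefit_per_bbl:.3f}/bbl")
```

On these inputs the refinery processes 33 million bbl/yr for an annual margin of $165 million, so the estimated benefit is a little under 2% of margin, or roughly nine cents per barrel.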



It is important to find the right level of balance among the various possible safety indicators so that process-safety decisions accurately reflect the company’s desired operational risk profile. Although risk can never be eliminated, a variety of mechanisms can be put in place to balance desired safety outcomes with day-to-day business imperatives and pressures.

Too often, organizations rely heavily on failure data to monitor performance. The consequence of this approach is that improvements or changes are only determined after something has gone wrong. Often, whether a system failure results in a minor or a catastrophic outcome is purely down to chance. Discovering weaknesses in the quality of managing the process and control systems by having a major incident is too late and too costly. Early warning of dangerous deterioration within critical systems provides an opportunity to avoid major incidents.

Knowing that process risks are successfully controlled has a clear link with business efficiency, as several indicators can be used to show plant availability and optimized operating conditions. Effective management of major hazards requires a proactive approach to risk management. Information to confirm that critical systems are operating as intended is essential. Leading indicators that can confirm that risk controls are continuing to operate are an important step forward in the management of major hazard risks.

Measuring performance

The main reason for measuring process safety performance is to provide ongoing assurance that risks are being adequately controlled. Directors and senior managers need to monitor the effectiveness of internal controls against business risks. For petroleum refineries and petrochemical manufacturers, process safety risks are a significant aspect of business risk, asset integrity and reputation. Many organizations lack good information to show how well they are managing major hazard risks. This is because the information gathered tends to be limited to measuring failures, such as incidents or near misses.

Those involved in managing process safety risks need to ask fundamental questions about their systems, such as:

  • What can go wrong?
  • What controls are in place to prevent major incidents?
  • What does each control deliver in terms of a “safety outcome”?
  • How do we know that the controls continue to operate as intended?

Measuring performance before a catastrophic failure

According to James Reason, (major) accidents result when a series of failings within several critical risk-control systems materialize concurrently.5 Each risk-control system represents an important barrier or safeguard within the PSM system. A significant failing in just one critical barrier may be sufficient to give rise to a major accident. Continuously measuring and monitoring the actual real-time performance of these safety barriers ensures that operational integrity is not compromised due to degradation of barriers.

Leading and lagging indicators are set in a structured and systematic way for each critical risk-control system within the whole PSM system. In tandem, they act as system guardians, providing dual assurance to confirm that the risk-control system is operating as intended or providing a warning that problems are starting to develop.

Leading indicators are an active monitoring form focused on a few critical risk-control systems to ensure continued effectiveness. Leading indicators require a routine systematic check that key actions or activities are undertaken as intended. They can be considered as measures of process or inputs essential to deliver the desired safety outcome. The leading indicators identify failings or “holes” in vital aspects discovered during routine checks on the operation of a critical activity within the risk-control system.

Lagging indicators are reactive monitoring methods requiring the reporting and investigation of specific incidents and events to discover weaknesses within that system. These incidents or events do not have to result in major damage or injury or even loss of containment, providing they represent a failure of a significant control system that guards against or limits the consequences of a major incident. Lagging indicators show when a desired safety outcome has failed or has not been achieved. The lagging indicator reveals failings or “holes” in that barrier discovered following an incident or adverse event. The incident does not necessarily have to result in injury or environmental damage, and it can be a near miss, a precursor event or an undesired outcome attributable to a failing in that risk-control system.
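The pairing of a leading and a lagging indicator for one risk-control system can be sketched as a small data structure. This is a minimal illustration under stated assumptions: the class, field names, thresholds and status messages are all hypothetical, and the way the two signals are combined is one plausible reading of "dual assurance" rather than a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class DualAssurance:
    """Leading and lagging indicator pair for one risk-control system."""
    system: str
    leading_pct: float       # e.g., % of scheduled checks completed on time
    leading_target: float    # minimum acceptable value for the leading indicator
    lagging_count: int       # failures or adverse events found in the period
    lagging_tolerance: int   # maximum tolerated number of such events

    def status(self) -> str:
        leading_ok = self.leading_pct >= self.leading_target
        lagging_ok = self.lagging_count <= self.lagging_tolerance
        if leading_ok and lagging_ok:
            return "operating as intended"
        if leading_ok and not lagging_ok:
            # Activities are being done, yet the outcome is failing:
            # the leading indicator may be measuring the wrong thing.
            return "review indicator: outcome failing despite inputs"
        if not leading_ok and lagging_ok:
            return "early warning: inputs slipping, no failures yet"
        return "barrier degraded: inputs and outcome both failing"


# Hypothetical example: proof testing of safety-critical devices.
proof_tests = DualAssurance(
    system="SIS proof testing",
    leading_pct=88.0, leading_target=95.0,
    lagging_count=0, lagging_tolerance=0,
)
print(proof_tests.system, "->", proof_tests.status())
```

Here the leading indicator has slipped below its target while no failures have yet been recorded, which is the early-warning case that failure data alone would miss.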

Several organizations and standards recommend applying leading and lagging metrics to understand the quality of the PSM system. Several examples are:

  • ISA 84.00.04—Recommended Practices for Guidelines for the Implementation of ANSI/ISA-84.00.01-2004 (IEC 61511 Mod)
  • CCPS
  • The Energy Institute (EI), formerly known as the Institute of Petroleum.

The common theme of these metrics is applying key performance indicators (KPIs) generated from the management of the process/functional safety equipment and the people and processes that are used in terms of their competence, leadership and risk-management capabilities.

For example, the EI has published a Process Safety Management framework, developed by the energy industry, for use by various industry sectors.6 The framework is intended to be applicable worldwide, to all process industries such as power, petroleum, chemicals, refining, etc. The framework encapsulates learning from people with practical experience of developing and implementing PSM as part of an integrated management system. It clearly sets out what needs to be done to ensure the integrity of the operation and define what measures should be in place and how they are performing. Note: It is not intended to replace existing process safety or health, safety and environmental (HSE) management systems.

The EI’s framework consists of three levels: focus areas, elements and expectations. The focus areas set out the high-level components of the PSM framework. Within each of the focus areas are a number of elements. Each element contains expectations defining what organizations need to do properly to meet the intent of that element. The elements are grouped under four focus areas that organizations should address to ensure the integrity of their operations:

  • Process safety leadership
    • Leadership commitment and responsibility
    • Identification and compliance with legislation and industry standards
    • Employee selection, placement, competency and health assurance
    • Workforce involvement
    • Communication with stakeholders
  • Risk identification and assessment
    • Hazard identification and risk assessment
    • Documentation, records and knowledge management
  • Risk management
    • Operating manuals and procedures
    • Process and operational status monitoring, and handover
    • Management of operational interfaces
    • Standards and practices
    • Management of change and project management
    • Operational readiness and process startup
    • Emergency preparedness
    • Inspection and maintenance
    • Management of safety-critical devices
    • Work control, permit to work and task risk management
    • Contractor and supplier selection and management
  • Review and improvement
    • Incident reporting and investigation
    • Audit, assurance, management review and intervention.

Fig. 4 shows the proposed PSM framework—based on industry guidelines—and the associated components of a well-designed PSM system to enable real-time measurement and monitoring of a plant’s risk profile. It provides actionable information that can be used to prevent catastrophic events. Where an organization has an existing HSE or PSM system, it may be useful to benchmark against the framework or to carry out a risk assessment vs. the expectations of each element and identify any aspects of the existing system that may need enhancing.

  Fig. 4. PSM framework and components.

Implementing such a PSM system establishes the foundation of a PSM “control loop,” as illustrated in Fig. 5. The loop is intended to keep complacency from raising the probability of a catastrophic event, which can happen when plant personnel ignore leading and lagging indicators showing that the protection provided by risk-control loops is degrading.

  Fig. 5. PSM control loop.

During plant operations, systems are modified to adapt to the changing system needs. Systems and procedures can deteriorate over time, and system failures discovered following a major incident frequently surprise senior managers, who sincerely believed that the controls were functioning as designed. Used effectively, process safety KPIs can provide an early warning that critical controls have deteriorated to an unacceptable level.

Measuring performance to assess how effectively risks are being controlled is an essential part of an HSE system. This can be accomplished in two ways:

  • Active monitoring. It provides feedback on performance before an accident or incident
  • Reactive monitoring. It involves identifying and reporting on incidents to check that the controls in place are adequate, to identify weaknesses or gaps in control systems and to learn from mistakes.

SPIs and incremental value-at-risk

After a set of KPIs has been adopted, the asset owner’s management is responsible for monitoring these KPIs and responding to deviations from their baselines. At higher management levels, the relevance of the KPIs associated with managing plant equipment can be lost. Therefore, it becomes necessary to translate the individual equipment-level KPIs and their business impact into plant-level safety performance indicators and their business impact. This concept can be extended to any number of facilities, enabling upper management to understand the quality of PSM across the enterprise.

Using the individual equipment KPIs, a new approach allows an asset owner to understand the overall safety state of the plant and its economic impact on the business. In addition, this approach is tied to the existing LOPA and financial impact analysis.

KPI metrics are gathered based on the asset owner’s management of the plant equipment, the capability of employees and the processes followed to manage process safety. Typically, 10–20 key metrics can be covered and include 1) management of safety-related equipment (e.g., completion of periodic field-device proof tests associated with a distillation column), 2) competence of plant personnel (e.g., their level of training and skills testing), 3) adherence to established procedures (e.g., near-miss investigations) and 4) leadership (e.g., involvement of leadership in periodic, formal safety reviews). These metrics can originate from management of the layers of protection (LOPs) associated with the different lines of equipment, either at the LOP level (e.g., SIS) or at the line-of-equipment level (e.g., leadership).

The safety performance indicator (SPI) is an aggregation of the individual KPIs into a single number. The SPI can be calculated at the equipment level (equipment SPI) and at the plant level. Fig. 6 illustrates the owner safety model for an enterprise’s global assets. This model can consist of plants distributed over different geographic regions. A plant is decomposed into lines of equipment (LOE), which have LOPs associated with the plant-safety model, as shown in Fig. 7.

  Fig. 6. Asset-owner safety model.


  Fig. 7. Plant-safety model with KPIs and SPI.


Underlying the plant-safety model is a safety related KPI framework; it addresses the management of process safety related to plant equipment, business processes, and procedures used to manage the equipment and the capabilities of employees applying these processes and procedures.
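The hierarchy described above (enterprise, plant, line of equipment, layer of protection, KPI) maps naturally onto nested records. The sketch below is only one possible representation: the class and field names are hypothetical, the 0 to 100 scoring scale is an assumption, and the sample plant is invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KPI:
    name: str
    value: float      # score, assumed here to be on a 0-100 scale
    weight: float     # a weight of 0 means the KPI is not used


@dataclass
class LayerOfProtection:
    name: str
    rrf: float        # risk-reduction factor taken from the LOPA
    kpis: List[KPI] = field(default_factory=list)


@dataclass
class LineOfEquipment:
    name: str
    lops: List[LayerOfProtection] = field(default_factory=list)


@dataclass
class Plant:
    name: str
    region: str
    loes: List[LineOfEquipment] = field(default_factory=list)


@dataclass
class Enterprise:
    name: str
    plants: List[Plant] = field(default_factory=list)

    def count_kpis(self) -> int:
        """Total number of KPIs tracked across all plants."""
        return sum(
            len(lop.kpis)
            for plant in self.plants
            for loe in plant.loes
            for lop in loe.lops
        )


# Hypothetical single-plant enterprise.
sis = LayerOfProtection("SIS", rrf=100.0, kpis=[
    KPI("proof tests completed", 92.0, 1.0),
    KPI("bypass hours within limit", 98.0, 1.0),
])
column = LineOfEquipment("distillation column", lops=[sis])
enterprise = Enterprise("ExampleCo", plants=[
    Plant("Plant A", region="Gulf Coast", loes=[column]),
])
print(enterprise.count_kpis())
```

Keeping the risk-reduction factor on the layer and the weight on the KPI mirrors where each quantity is used in the aggregation steps that follow.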

Calculating the weighted KPI for a protection layer

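The KPI for a LOP can be calculated as the weighted average of its individual KPIs:

$$\mathrm{KPI}_{\mathrm{LOP},J} = \frac{\sum_{I=1}^{K} w_I \, \mathrm{KPI}_I}{\sum_{I=1}^{K} w_I}$$
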
KPI_LOP = Weighted average KPI of a layer of protection
w = Weight of a KPI 7
KPI = Key performance indicator related to plant, process, people (as applicable)
K = Number of KPIs for an LOP
I = Index for counting number of KPIs
J = Index for counting number of LOPs.
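
The weighted KPI for a single layer of protection can be sketched as a short function. This is an illustration rather than the authors' implementation: the function name and sample scores are hypothetical, and dividing by the sum of the weights is an assumption that follows from the phrase "weighted average."

```python
def weighted_kpi(kpis, weights):
    """Weighted average of KPI scores for one layer of protection.

    A weight of 0 drops the KPI from the average. Normalizing by the
    sum of the weights is an assumption consistent with the phrase
    "weighted average" in the text.
    """
    if len(kpis) != len(weights):
        raise ValueError("kpis and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("at least one KPI must have a positive weight")
    return sum(w * k for w, k in zip(weights, kpis)) / total_weight


# Hypothetical SIS layer: three KPIs scored in percent, the third unused.
kpi_lop = weighted_kpi([90.0, 100.0, 40.0], [2.0, 1.0, 0.0])
print(round(kpi_lop, 2))
```

The third KPI scores poorly but carries a weight of 0, so it has no effect on the result.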

Calculating safety performance index for equipment

Consider that a piece of equipment has a number of LOPs. From a safety perspective, the LOPs are of different importance and risk levels. From the LOPA, each layer has an associated risk-reduction factor (RRF). The weighted KPIs associated with the equipment can be aggregated into an equipment SPI (SPI_EQ), weighting each LOP by its risk-reduction factor:

$$\mathrm{SPI}_{\mathrm{EQ},J} = \frac{\sum_{I=1}^{L} w_{\mathrm{lop},I} \, \mathrm{KPI}_{\mathrm{LOP},I}}{\sum_{I=1}^{L} w_{\mathrm{lop},I}}$$

L = Number of layers of protection
w_lop = Weight of a layer of protection (= RRF for the layer of protection)
I = Index for counting LOP
J = Index for counting number of pieces of equipment.
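
A minimal sketch of the equipment-level aggregation follows. The function name, the three layers and their scores are hypothetical, and normalizing by the sum of the RRFs is an assumption made so that the SPI stays on the same percent scale as the KPIs.

```python
def equipment_spi(lop_kpis, rrfs):
    """Aggregate per-layer KPIs into one equipment SPI.

    lop_kpis: weighted KPI of each layer of protection (percent).
    rrfs: risk-reduction factor of each layer, used as its weight.
    Dividing by the sum of the RRFs (an assumption) keeps the SPI on
    the same percent scale as the KPIs.
    """
    if len(lop_kpis) != len(rrfs):
        raise ValueError("lop_kpis and rrfs must have the same length")
    total_rrf = sum(rrfs)
    if total_rrf <= 0:
        raise ValueError("the RRFs must sum to a positive number")
    return sum(r * k for r, k in zip(rrfs, lop_kpis)) / total_rrf


# Hypothetical layers: name -> (weighted KPI in percent, RRF).
layers = {
    "BPCS": (85.0, 10.0),
    "SIS": (96.0, 100.0),
    "relief valve": (99.0, 100.0),
}
kpis = [kpi for kpi, _ in layers.values()]
rrfs = [rrf for _, rrf in layers.values()]
spi_eq = equipment_spi(kpis, rrfs)
print(f"equipment SPI: {spi_eq:.1f}%")
```

Because the weights are the RRFs, the layers that remove the most risk dominate the result: the weak control-system score barely moves the SPI, while a similar drop in the SIS score would.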

Calculating safety performance index for a facility

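Consider that a facility has a number of LOEs. From a safety perspective, LOEs are of different importance and risk levels. From the LOPA, each LOE has an associated total equipment risk. The equipment SPIs (SPI_EQ) for the LOEs can be aggregated using the total risk calculated from the LOPA as the weight:

$$\mathrm{SPI}_{\mathrm{PLANT}} = \frac{\sum_{I=1}^{E} \mathrm{EQ\_RISK}_I \, \mathrm{SPI}_{\mathrm{EQ},I}}{\sum_{I=1}^{E} \mathrm{EQ\_RISK}_I}$$
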
E = Number of pieces of equipment in a plant
I = Index used to count the pieces of equipment in the plant
EQ_RISK = Total mitigated risk for a piece of equipment 8
SPI_PLANT = SPI for the plant
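
The plant-level roll-up can be sketched in the same way. The function name, equipment names, SPI values and risk figures below are all hypothetical, the risk values are in arbitrary relative units, and the normalization by total risk is an assumption.

```python
def plant_spi(equipment):
    """Risk-weighted average of equipment SPIs.

    equipment: dict mapping an equipment name to a tuple of
    (SPI in percent, total mitigated risk from the LOPA).
    Normalizing by the total risk is an assumption.
    """
    total_risk = sum(risk for _, risk in equipment.values())
    if total_risk <= 0:
        raise ValueError("total mitigated risk must be positive")
    weighted = sum(spi * risk for spi, risk in equipment.values())
    return weighted / total_risk


# Hypothetical plant with three pieces of equipment.
equipment = {
    "crude column": (96.9, 2.0),
    "hydrotreater reactor": (88.0, 5.0),
    "storage tank": (99.0, 1.0),
}
spi_plant = plant_spi(equipment)
print(f"plant SPI: {spi_plant:.1f}%")
```

The highest-risk item also has the lowest SPI, so it pulls the plant figure down to about 91.6% even though the other two items score well.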

Estimated losses associated with LOE risk and plant

Based on the SPI, a safety performance state can be calculated. For example, the SPI can have ranges such as good (> 95%), warning (90% to 95%) and bad (< 90%). Associated with each LOE is an asset impact. For example, the asset impact may be defined as S0 to S5, as shown in Table 2. Incremental estimated asset value-at-risk is a safety performance adjusted metric (expected value) that can be calculated using the SPI, the safety performance state and the asset impact.

For example, the incremental asset value-at-risk can be estimated as follows: 100% of the asset loss value-at-risk if the safety performance state is determined to be “bad”; 50% of the asset loss value-at-risk if the safety performance state is determined to be “warning”; 0% of the asset loss value-at-risk if the safety performance state is determined to be “good”:

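LOE: Estimated incremental asset value-at-risk (IAVaR), where AVaR is the asset loss value-at-risk of the LOE and f is the example fraction for its safety performance state:

$$\mathrm{IAVaR}_{\mathrm{LOE}} = f \cdot \mathrm{AVaR}_{\mathrm{LOE}}, \qquad f = \begin{cases} 1.0 & \text{state is bad} \\ 0.5 & \text{state is warning} \\ 0 & \text{state is good} \end{cases}$$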
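
The example ranges and fractions given above translate directly into a lookup. In this sketch the function and variable names are hypothetical, and the $40 million asset loss figure is invented for illustration; in practice it would come from the asset impact category assigned to the LOE.

```python
def safety_state(spi_pct):
    """Map an SPI (percent) to the example states given in the text."""
    if spi_pct > 95.0:
        return "good"
    if spi_pct >= 90.0:
        return "warning"
    return "bad"


# Example fractions of the asset loss value-at-risk per state (from the text).
STATE_FACTOR = {"good": 0.0, "warning": 0.5, "bad": 1.0}


def incremental_asset_var(spi_pct, asset_loss_var):
    """Incremental asset value-at-risk for one line of equipment."""
    return STATE_FACTOR[safety_state(spi_pct)] * asset_loss_var


# Hypothetical LOE with a $40 million asset loss value-at-risk.
for spi in (97.0, 92.5, 85.0):
    state = safety_state(spi)
    value = incremental_asset_var(spi, 40e6)
    print(f"SPI {spi:.1f}% -> {state}: ${value:,.0f}")
```

A step function like this is simple to explain on a dashboard, at the cost of a jump in reported value-at-risk whenever the SPI crosses a band boundary.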


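The plant-level incremental asset value-at-risk (IAVaR) can be estimated by adding the estimated incremental asset values-at-risk for the LOEs within the facility. Likewise, the plant-level incremental production value-at-risk (IPVaR) can be estimated by adding the incremental production values-at-risk for the underlying lines of equipment. With E as the number of LOEs in the plant:

$$\mathrm{IAVaR}_{\mathrm{PLANT}} = \sum_{I=1}^{E} \mathrm{IAVaR}_{\mathrm{LOE},I}, \qquad \mathrm{IPVaR}_{\mathrm{PLANT}} = \sum_{I=1}^{E} \mathrm{IPVaR}_{\mathrm{LOE},I}$$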


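For a corporation with many plants, the incremental asset values-at-risk (IAVaR) and the incremental production values-at-risk (IPVaR) can be aggregated by summing over its N plants:

$$\mathrm{IAVaR}_{\mathrm{CORP}} = \sum_{P=1}^{N} \mathrm{IAVaR}_{\mathrm{PLANT},P}, \qquad \mathrm{IPVaR}_{\mathrm{CORP}} = \sum_{P=1}^{N} \mathrm{IPVaR}_{\mathrm{PLANT},P}$$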
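
Both roll-ups are plain sums, first over the lines of equipment in each plant and then over the plants. The sketch below uses hypothetical function names, plant names and dollar figures.

```python
def plant_totals(loes):
    """Sum incremental values-at-risk over a plant's lines of equipment.

    loes: list of (incremental asset value-at-risk,
                   incremental production value-at-risk) tuples.
    """
    asset = sum(a for a, _ in loes)
    production = sum(p for _, p in loes)
    return asset, production


def corporate_totals(plants):
    """Sum plant-level totals over all plants in the corporation.

    plants: dict mapping a plant name to its list of LOE tuples.
    """
    totals = [plant_totals(loes) for loes in plants.values()]
    return plant_totals(totals)  # same element-wise sum, one level up


# Hypothetical figures in $ million.
plants = {
    "Plant A": [(0.0, 0.0), (20.0, 3.5)],
    "Plant B": [(40.0, 6.0), (5.0, 0.5), (0.0, 0.0)],
}
for name, loes in plants.items():
    print(name, plant_totals(loes))
print("Corporation", corporate_totals(plants))
```

Keeping the per-LOE tuples alongside the totals is what allows a corporate figure to be traced back to the equipment responsible for it.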



To display the SPI and related incremental asset value-at-risk and incremental production loss, dashboards can be used, as shown in Figs. 8 and 9. The plant-level dashboard could display the plant safety-performance data and provide drill-down capability to the underlying KPIs for analysis of the underlying causes of identified risks. Once identified, corrective action plans can be defined and implemented in a timely manner to avoid costly catastrophic safety events.

  Fig. 8. Example of a corporate dashboard.

  Fig. 9. Example of a plant-level dashboard.

Best practices and lessons learned

As reflected in the name of the American Fuel and Petrochemical Manufacturers’ (AFPM’s) safety conference, the National Occupational and Process Safety Conference, the refining and petrochemical industries are clearly focused on PSM as a key component of their operational strategies. To support these strategies, there are nine steps or best practices to use when implementing and maintaining an effective process safety-performance management system:

Step 1. Establish the organizational arrangements/relationships needed to implement indicators.

Step 2. Decide on the scope of the indicators.

Step 3. Identify the risk-control systems and decide on the outcomes.

Step 4. Identify critical elements of each risk-control system.

Step 5. Establish the data collection and reporting system.

Step 6. Review (benchmark against the EI PSM Framework or equivalent).

Step 7. Deploy the KPI model and SPI calculations.

Step 8. Educate management on the importance of PSM.

Step 9. Establish management roles and actions for review of KPIs, SPIs, estimated asset value-at-risk and estimated production value-at-risk.


1 “A Canadian Perspective of the History of Process Safety Management Legislation,” 8th International Symposium: Programmable Electronic System in Safety-Related Applications, Sept. 2–3, 2008, Cologne, Germany.
2 Center for Chemical Process Safety website:
3 Center for Chemical Process Safety website:
4 H. J. Toups, LSU Department of Chemical Engineering, with significant material from SACHE 2003 Workshop.
5 Managing the Risks of Organizational Accidents, Ashgate Publishing Co., 1997.
6 Energy Institute, London, 1st Ed., December 2010.
7 A weight of 0 signifies that a KPI is not used.
8 This is equal to the sum of all the mitigated risks for an item of equipment.

The authors 
  Martin A. Turk, PhD is the director of Global Industry Solutions for the HPI for Invensys Operations Management at Houston, Texas. For most of his 40+ years of experience, Dr. Turk has been involved in engineering, consulting, sales and marketing activities related to process automation. These activities include process simulation, advanced control and information/automation system strategic planning. Dr. Turk is responsible for definition of industry-specific solutions for downstream petroleum refining and petrochemicals, participation in industry conferences and working with Invensys clients worldwide to identify and quantify automation opportunities in their manufacturing facilities that will provide them with significant returns on investments. He received his BS degree in chemical engineering from the University of Dayton and his PhD in chemical engineering from the University of Notre Dame. Also, he has published technical papers and made presentations at domestic and international seminars on a variety of subjects related to advanced automation solutions for the process industries. 

  Ajay Mishra is the R&D program manager at Invensys. He helps define the detailed features and technology roadmaps for the Triconex branded safety & critical control products. Mr. Mishra holds a BSEE degree from the College of Engineering, Pune, India and an MBA from the UCLA Anderson School of Management. He has over 20 years of experience in safety and critical control systems in process control SIS, and railways systems including product development, project engineering, project management and product management. Mr. Mishra is a TÜV certified Functional Safety Engineer for hardware/software design (IEC 61508) and Safety Instrumented Systems (IEC 61511).  
