October 2017


Roundtable: The reality of process safety risk

Cline, G., Aberdeen Group; Keim, K., ExxonMobil (retired); Neill, M., Petrotechnics North America; Thomas, J., Process Improvement Institute

Unavoidably, process safety risks are often managed in different parts of an organization. Consolidating these risks to view their impact on the operational reality of hydrocarbon assets is a real challenge.

What the industry needs is to make sure everyone assesses risk using the same criteria, and has a practical understanding of how their decisions directly or indirectly influence risk and, ultimately, process safety performance. By making process safety more operational—ensuring frontline personnel are aware of their roles and responsibilities, and are effectively and consistently implementing processes and procedures—we can reduce incidents and improve sustainable production.

What is the reality of risk in the hydrocarbon sector? In this roundtable, senior industry executives discuss what happens when process safety intent meets the reality of operations. The roundtable participants include Greg Cline (GC) of the Aberdeen Group, Kelly Keim (KK) of ExxonMobil (retired), Mike Neill (MN) of Petrotechnics and Jeff Thomas (JT) of the Process Improvement Institute. This roundtable will discuss how we think we manage risk, how we actually manage it, and how we can improve it practically and tangibly.

Industry regulation is at an all-time high. Every operator is committed to safety and risk avoidance. Why do you think incidents and accidents still happen?

JT: There are several reasons why accidents still happen. First, not all countries have process safety regulations. Second, even where good, detailed regulations exist, it is hard to implement all of the processes and procedures they require 100% correctly, all of the time. There are often conflicting priorities, particularly in the field, between safety, production and cost. In addition, there are often no thorough operating and maintenance procedures that cover all modes of operations, such as startup, shutdown and other infrequent tasks. In some cases, companies in countries without regulations have implemented excellent process safety management (PSM) programs; adding regulations may not always be the answer.

GC: Incidents and accidents depend on many things, including the regulatory environment and the overall level of safety awareness. Often, it is just human nature. People try to prepare and create a culture of safety, but slip-ups happen.

JT: Also, people do not always understand all of the hazards or safeguards. They get used to doing things a certain way, and if nothing has happened, they feel it is okay to continue, even if it is not the safest thing to do. In addition, we do not often identify all the hazards, especially those related to infrequent modes of operation—like startup and shutdown—where a majority of incidents occur. Human factors are not generally evaluated and included in most PSM systems, so we often “set the operators up to make errors.”

KK: It is important to note that accident rates for process safety incidents across the refining and petrochemical industries are actually incredibly low.

MN: I would say that most people in the industry think, “I could almost guarantee we will have an accident,” rather than, “I can guarantee that we won’t.” But they do not know when, and they do not know how big. The chances are that if you are a big organization with a lot of operations, you pretty much know that something will happen eventually.

KK: The good news is that the American Petroleum Institute (API), American Fuel and Petrochemical Manufacturers (AFPM) and the American Chemistry Council (ACC) collect information on causes and causal factors on a consistent basis. They are beginning to get a much clearer picture of process safety related issues. Traditionally, the industry looked at facility causes—equipment failure, corrosion, etc. Those risks are still big factors, but the greatest proportion of incidents, based on industry evidence, is related to human performance, which is people failing to execute a procedure properly or missing an operating step.

MN: There is a lot of focus on humans as the weak link in the chain. That is obviously part of it, but as much as we blame individuals when things go wrong, we need to credit them for reacting and recovering from problems. Where poor decisions are seen as the root cause of incidents, we need to examine whether competence was lacking, or if people just did not have the correct information with which to make a decision.

KK: We are only just learning how to classify—let alone improve—human performance. There is still a lot to do. The industry does seek to get better by making reference to the nuclear and airline industries as shining examples. Those industries addressed things like equipment design, maintenance, management systems and people simultaneously, without preference. For example, in the oil and gas industry, the Mexico City and Bhopal disasters sparked PSM regulations in the US, focusing on systems and then eventually people. I think the nuclear and airline industries have been far more successful vs. the oil and gas industry’s phased approach.

Is there a gap between what process safety key performance indicators (KPIs) and operational management systems are telling us, and the feeling on the front line?

MN: I think there is. I have heard, anecdotally, from operators that the KPIs say one thing and the reality at the plant is another.

JT: A lot of people are still trying to figure out the process safety indicators they should focus on. We have had API standards in place for less than 10 yr. There is also a communications gap between field and office personnel, engineers and management who set up process safety indicators and processes. Generally, these indicators are not clearly communicated at an operator level in terms of what they are and their importance. I am not sure actions are taken as a result of the process safety-related data and the KPIs produced. One important KPI mentioned in the Center for Chemical Process Safety (CCPS) book on incident investigation is near-miss data. It is critical to report both incidents and near-misses, and periodically analyze them to determine causal factors and root causes to prevent future incidents.

KK: I do not think we do a great job on KPIs. I know very few sites that make a big deal of reporting their process safety performance to operations. They also do not publicize their safety-critical equipment performance and inspections. If operations are not aware, then performance starts to slip.

GC: There is always a gap, and there should not be. We need to put capabilities in place to minimize gaps and ensure that metrics are available enterprise-wide. Also, it is important that people’s perception of certain metrics matches the reality of operations.

MN: Major accidents are low-frequency but high-consequence. If something happens, you cannot really make a judgement on whether there is a trend, or whether you are particularly vulnerable. Some people try to extrapolate from near-misses and look at other performance indicators, but a lot of KPIs are based on how well an organization implements safety processes.

KK: Evaluating risk is always somewhat subjective. For the most part, companies have not been terribly transparent in the information they use for monitoring process safety risk. Most people can point to their numbers for personnel injuries and behavioral safety observations, but catastrophic events are rare, so they are not front-of-mind, even if the risk is always there.

Does the reality of risk management measure up to the intent of risk management?

JT: I would say most companies probably recognize their process safety performance is not where they want it to be. But we are doing a better job of understanding risk than we did when I started, 30 yr to 40 yr ago.

MN: People are experienced enough to know that hazardous industries mean risky business. I do not think people would publicly admit that risk is unpredictable. However, other industries—nuclear and airline—have managed to eliminate some sources of unpredictable risk. These sectors put a lot of emphasis on training, stop-work authority and redundancies in design so that if a system fails, there is another that would take over. In the process industries, we have become somewhat normalized to risk, and we do not come anywhere close to investing the same level of risk management resources. There is a lot to gain from investing in safety. Typically, with safety comes improved operational performance.

KK: I do think there is an undue confidence at both the executive and field levels that “those things just don’t happen to us.” There is not that everyday sense of caution that should be present in people who are one procedure away from a major catastrophic event. Most plant workers and managers have never experienced a major process safety event, so they believe it will not happen to them. We know that is not true.

GC: Real safety happens on the ground when people internalize it and do not view it as a burden on everyday business. That means risk exposure must be made visible, prominent and available so everyone can understand its impact on the operational reality.

KK: When I first started in the industry 40 yr ago, fires and explosions were relatively common. Most workers had experienced one. There was a belief that these could happen, and people paid attention to avoid them. The downside, perhaps, was that people felt obliged to put their own lives at risk to minimize those events. Thankfully, we have almost eliminated this ‘cowboy approach.’ The industry has the newest and rawest process safety data. We have really only been managing it for approximately 5 yr. With more time and data, we will be able to say whether we are better than we think.

GC: No! I think PSM is always aspirational, and the relationship between process safety and its impact on front-line operations can be better understood.

JT: There are gaps in most cases. There has been a lot of work focused on developing PSM systems, improving risk related practices and developing PSM tools. However, there is often a “disconnect” between what the practices and processes intend and what actually happens at the grassroots operator level. Lots of companies are working on it, but I do not know of any that have a magic bullet.

MN: Process safety is in a different part of the organization, so operations personnel struggle to understand some of the language and how to apply it to their reality. Process safety people sit in a world of scenarios and models in which it is easy to diverge from reality. It is a bit like your house being about to fall because it has wood termites, but I am spending all my time painting it! I focus all my efforts on a process for painting. It is a false sense of security. Operators need to know how to practically apply process safety in the plant.

KK: Operators do not get a good picture of how change affects risk management, or the aspects of the job where they are the critical factor in managing risk. Often, when investigating the failure of an asset, the question to operations is typically: Why weren’t you paying better attention? The challenge back: Pay better attention to what, and how?

MN: Process safety designs safeguards. It does not really look at how risk is managed in real time. Process safety teams are also not a strong voice in the organization. They do not have a significant budget, and are always vying for priority with plenty of other groups in the organization.

KK: Our risk models rely on the operator for 99% execution. We do not often explain where operations teams really need to be at the top of their game. We do not explain that when facilities change, they are potentially operating in a higher-risk environment. The US Chemical Safety Board (CSB) report on the explosion of the electrostatic precipitator in the Torrance refinery pointed out that as operations became focused on the tasks required to complete the shutdown, they became unaware that the situation continued to change. They did not know the importance of the key process safety barriers they controlled.

JT: It takes a lot of hard work and communication between the engineer, the management and the operators on what risk management is all about.

Who is responsible for managing risk?

JT: From the CEO to an operator, mechanic, engineer, supervisor—all levels of management and workers. Everyone has a key and different role to play, but risk management should permeate throughout the organization.

KK: We are a long way from being able to take the operator out of risk management, particularly in refining and petrochemicals. Management is responsible for having systems in place to make operators aware of changing risk patterns. Ultimately, executives must recognize that this is part of managing process safety risk.

MN: Ultimately, it lands at the top of the tree. Executives must make sure the right people are involved in the right processes and that they do the right things. However, I would say operations are in control of the plan. They are at the sharp end, so they should be satisfied personally that the risk level is acceptable. That said, where there are multiple levels of decision-making, it can be confusing when it comes to who owns risk.

GC: In our most recent environmental, health and safety study, about one third of respondents have a formal risk management organization in place. That is presumably how they establish a framework for risk management. Does it build a risk awareness culture across the organization? It can. Whether those companies also have the necessary collaborative approach across business units to make it happen is another question.

What critical process safety information do people who make the daily decisions about operating a plant need?

GC: When we talk about making daily decisions, operational data must correlate with the management of process safety, and vice versa. Management needs to analyze the plant and the processes that relate to PSM. Then, this needs to be incorporated into operational dashboards in an actionable way.

MN: Operators need data that clearly shows if something unexpected is happening, what the impact could be, how that affects the program of work, the threats it creates and the effects of any remedial reaction. Their number-one priority is containment. They need data on the integrity of pipes and vessels and the condition of the actual detection systems themselves.

JT: People need a lot of information to make decisions. KPIs are needed at the management level to help make decisions about operations, resources and priorities. At the engineering level, they need inspection and test data to help determine frequencies of maintenance and repairs. The operator needs data to understand the current state of a process and what the risk is of the tasks they are completing.

One of the key issues is that there is so much data; it is hard to figure out what is meaningful. You need to clearly identify that type of information. The importance and the timing of activities are key, so operators can determine what is urgent and what can wait. You need a whole picture of risk based on data so decisions are not isolated from everything else that is going on in a facility.

KK: That consolidation of information is certainly vital to more rational decision-making. The trouble is we do not provide consolidated systems for operations to effectively assess if they can take one more step in their procedure.

For example, the Deepwater Horizon event needed approximately 11 layers of protection in place to prevent the scenario that happened. One by one, those layers of protection were whittled away. The response was always, “Well that is okay because we have got this other ultimate layer of protection.”

It shows that even a plant with multiple protection layers can experience a major hazard because of an accumulation of relatively harmless decisions. The existing process safety barrier status must be visible to operations, but also to management so appropriate decisions can be made.
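As a rough illustration of the arithmetic behind this point, the sketch below assumes that each protection layer fails independently and with the same hypothetical probability on demand (0.1). Neither the independence assumption nor the figure comes from the roundtable or from any incident investigation; they are placeholders chosen to keep the numbers simple. Under those assumptions, the chance that an initiating event passes every remaining layer is the product of the individual failure probabilities, so each layer taken out of service raises the exposure tenfold.

```python
from math import prod

# Hypothetical values for illustration only: 11 independent layers,
# each assumed to fail on demand 1 time in 10.
ASSUMED_PFD = 0.1
TOTAL_LAYERS = 11


def breach_probability(active_layer_pfds):
    """Probability that an initiating event passes every active layer,
    assuming the layers fail independently of one another."""
    return prod(active_layer_pfds)


def exposure_as_layers_degrade(total_layers=TOTAL_LAYERS, pfd=ASSUMED_PFD):
    """Return (layers_remaining, breach_probability) pairs as layers
    are removed from service one by one."""
    return [
        (remaining, breach_probability([pfd] * remaining))
        for remaining in range(total_layers, 0, -1)
    ]


if __name__ == "__main__":
    for remaining, p in exposure_as_layers_degrade():
        print(f"{remaining:2d} layers in service -> breach probability {p:.0e}")
```

In practice, layers are rarely fully independent and their reliabilities differ, so a real assessment would need site-specific data. The sketch only shows why a series of individually "acceptable" removals compounds.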

MN: Ultimately, operators need data that shows whether it is safe to operate the plant.

How well-informed are front-line leaders and workers about the role of process safety barriers in preventing incidents?

KK: I would say they are only barely aware of the layers of protection. In many cases, operations—even first-line engineers—are not aware of the scenarios that could lead to a catastrophic event in their unit. The scenarios have never really been collated in a useful way for them. I think there is a general failure to really communicate, on a shift-by-shift basis, the status of key barriers on any given day. For example, I spoke to a team recently where there was something wrong with a detection device for a piece of safety-critical equipment. The company said that the operator is going to pay more attention, but nobody translated that into what that meant for the operator, and how they would do it. That is the most important thing when operations are making daily decisions.

JT: It varies by facility. To be frank, some do not have a clue, but some are doing a pretty good job. There is an opportunity for improvement to ensure operators, maintenance technicians and the front line really understand key hazards, safeguards and the ideal state of the process safety barriers. That is critical. I have not seen a lot of facilities where they really have a good handle on the barriers and how they interrelate to prevent an incident.

GC: I think building a culture with the right tools, right attitudes and right training can enhance the awareness of process safety barriers by making them part of the standard operating procedures of front-line leaders and workers.

MN: I think that there is still a lack of information available. The further down the chain you go, the more abstract some of that information is. I am not sure people really understand risk and what it means to them. That can put them in a vulnerable position to be exposed to risk they do not understand. If they did understand it, I think some of their decisions might be different. I think that is the industry’s challenge. We need to give the front line the ability to be better informed about the possible consequences of their actions, even when making minor decisions.

What are the obstacles to accessing this information in a timely manner, and how can they be eliminated?

JT: There are a few. First, we have so much data, particularly with things like distributed control systems (DCSs), safety instrumented systems (SISs), maintenance systems, etc. We get information overload, and it is not always clear what is most important. Second, there can be a lag in the data. We do not always get it when we need it, and things can be missed. Third, maintenance management systems and DCSs do not always make it easy to extract data. And that is just the start!

KK: The information is there, but it is often in lots of different systems—some of which may still be paper-based. Even for a process safety engineer who has been on the site forever, it will take time to pull all of that data together. If it is not consolidated and condensed in useful forms, nobody actually uses it for making critical risk-based decisions.

GC: The Industrial Internet of Things (IIoT) is enabling a new era where we have the capability to monitor and improve processes to ensure that they are safe. Safety must be implicit. I think, to the extent that operators can connect operations with the information needed, via IIoT or another framework, they can overcome risk and help prevent incidents.

MN: We need to connect the data we have. We also need ways of assessing the impact of doing something or—equally important—NOT doing something. From maintenance and asset integrity to drilling and subsea, individuals need multiple viewpoints. That is the source of informed decision-making: using technology to put everyone in a much better position.

How can operators maintain their safety and risk management standards over time?

KK: For the most part, operators do not get feedback on their risk levels, let alone their risk management performance. Even companies that are doing a good job of tracking tier-three and tier-four process safety indicators are basing performance on lagging data. They are certainly not communicating this to operations. If you do not get good feedback, you cannot improve. I am not aware of anyone using any process safety solution, other than the most basic tools, such as work permit systems, to manage it.

JT: I think there is merit to having a tool that shows an overall picture of hazards, operational risk, barriers and safeguards, updated on a real-time basis. Constant communication with operations is key so that they know the impact of any change, for example one handled through management of change (MoC), and how best to adjust.
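One way to picture the kind of overview described here is a simple worst-case roll-up of barrier status. The sketch below is hypothetical throughout: the `Status` levels, the `Barrier` record and the barrier names are assumptions made for illustration and are not drawn from any participant's system or from a specific product. It treats a unit as only as healthy as its least healthy barrier, and lists the barriers that need attention.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    """Hypothetical three-level barrier health scale (higher is worse)."""
    HEALTHY = 0
    DEGRADED = 1
    IMPAIRED = 2


@dataclass
class Barrier:
    name: str
    status: Status


def unit_status(barriers):
    """Worst-case roll-up: the unit takes the status of its least healthy barrier."""
    return max((b.status for b in barriers), default=Status.HEALTHY)


def barriers_needing_attention(barriers):
    """Names of barriers that are not fully healthy, worst first."""
    flagged = [b for b in barriers if b.status != Status.HEALTHY]
    return [b.name for b in sorted(flagged, key=lambda b: b.status, reverse=True)]


if __name__ == "__main__":
    # Illustrative barrier names only.
    unit = [
        Barrier("gas detection", Status.DEGRADED),
        Barrier("relief valve", Status.HEALTHY),
        Barrier("emergency shutdown", Status.IMPAIRED),
    ]
    print(unit_status(unit).name)
    print(barriers_needing_attention(unit))
```

A worst-case roll-up is deliberately conservative; a real tool would also need to reflect how barriers interact and how quickly the underlying data is refreshed.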

GC: Safety and risk factors change all the time, so best practices must be responsive to changing conditions. Creative solutions can help organizations maintain and improve their safety performance over time.

MN: Safety standards define our risk tolerance, and risk tolerance is not an exact science. It is an interpretation of risk and whether certain outcomes are acceptable. Managers would love to have a physical device with traffic signals that tell them they need to do something or prioritize differently; we all would. But it is more about being sure that systems are effective. It is about an attitude of constant vigilance and questioning, giving people confidence in each other and their data, and empowering them with systems they can rely on.
