Managing risk during turnarounds and large capital projects: Experience from the chemical industry
Large capital projects such as turnarounds and shutdowns require the management of a vast number of employees and tasks simultaneously. At chemical sites, where assets are highly complex, ensuring that risks are managed properly is of vital importance both to the safety of the workforce and the success of the project as a whole. Within the framework of turnarounds and shutdowns in the chemical industry, this article looks at the key aspects of risk and outlines how different tools can be used to overcome the challenges of risk management. Using practical experience gained onsite, the most risk-prone aspects in turnarounds as well as ways in which risk management tools can aid project success are highlighted.
1 Introduction
All business activities, regardless of the underlying sector, involve some element of risk. Whether the risk is operational or strategic, concerns market failures or environmental disasters, all business processes will at some point be exposed to risk. The term “risk management” therefore describes the attempt to identify, evaluate, measure, mitigate and monitor risks and their consequences – either on a particular project or on a business as a whole.
Within the context of capital-intensive industries, whose complex assets and processes often require highly technical skills and operational knowhow, the perception of risk is often limited to health and safety rules and regulations. As such, when risk management is discussed with managers at chemical sites, the first notion that tends to spring to their minds is how their teams will be protected from accidents at work.
During large capital projects – such as turnarounds and shutdowns, where an entire section of a plant may be taken offline for scheduled repair or renewal – the sheer number of people needed to conduct work onsite means that managers’ concerns over risks to team safety are valid. During such projects, companies are forced to engage contractors to ensure that the huge volume of work can be completed on time. These contractors are often not familiar with site processes and systems including safety protocols and therefore could be considered to be at higher risk of accident or injury than those who work at the site all year round.
While safety is undoubtedly a crucial factor, this viewpoint excludes a number of other risks at play. The vast number of tasks being carried out by multiple workers at any one time means that one particular action (or lack of it) could completely derail the schedule and have repercussions across the rest of the planned work for that day, week or even the whole project. Managers must understand that a single delay to the schedule could put pressure on some workers to complete their tasks in less time, as well as cause confusion as to what should occur, where and when. Tasks may then either not be conducted at all or be carried out in a hurried or unsafe manner, which could lower efficiency and impact the safety of all employees while onsite. If the potential impact of risks to the schedule on both health and productivity is not taken seriously by managers, it could have a disastrous effect on the success of the turnaround and ultimately, the bottom line.
With specific reference to the chemical industry, this paper will look at the key aspects of risk within the framework of turnarounds and large capital projects. The term “risk management” is therefore used in this context to refer to the process of identifying, evaluating, measuring and mitigating risks to the turnaround before they occur. The risk of not achieving the project’s defined objectives – such as cost, quality, duration and safety – on time or at all is therefore a chief concern.
Beginning by outlining some of the key problems turnaround managers have when attempting to manage risk, the paper will then examine the use of risk management tools in planning, scheduling and project execution as a means of addressing those challenges. Finally, it will make use of experience obtained while onsite at a chemical plant in France to evaluate the benefits and obstacles encountered during a turnaround risk review where such a risk management tool was used. The ways in which risk is addressed and handled within the industry will be outlined and critically assessed.
2 The problem with identifying and cataloging risk
While most managers understand that risk management is important and needs to be addressed, the way in which it should be approached and dealt with is often misunderstood on a number of levels. For example, a risk register listing risks, their causes and consequences is usually compiled before a project starts and includes anything and everything that could threaten the project. This could range from bad weather to unexpected repairs, missing parts or the absence of appropriately qualified personnel. This register is then filed away and seldom referred to again, if at all.
As each event has a unique set of requirements, some risks which were present at previous projects may have disappeared due to mitigation measures, some new risks may arise which were not relevant in the past, or the impact of existing risks may be greater under new circumstances. As a result, the register provides a good starting point from which to begin the risk management process, but if risk is treated as a static “problem” which does not change over time, it will only serve to give the entire team a false sense of security and will not actually help to safeguard the turnaround’s success in any practical sense.
Part of the reason for this is the sheer scope of risk: there are often so many potential events and delays to a project as complex as a turnaround that even trying to identify and quantify those possibilities can seem extremely daunting. It is also perceived to be time consuming and expensive, so managers lean towards the “so far, so good” approach, where no new actions or processes are developed as those risks have not materialized in the past. It is therefore vital that the turnaround manager understands that the process of evaluation cannot be a one-size-fits-all approach: As the nature of risk means that it changes over time and according to location, environment and circumstances, managers must learn to move away from the idea of it as a static obstacle which can be swiftly overcome with a few meetings and a hastily written report. Dealing with risk in an effective way means taking a dynamic approach to a constantly evolving situation, which in practice means that risks need to be continuously evaluated and measured at different stages of the project.
3 The difference between hazards and risks
The key to avoiding the above-mentioned problems is to take an active approach to identifying all of the risks that could occur during the project. This means that the fundamental difference between a risk and a hazard must be understood: a hazard is latent and only develops into a risk when it directly impacts a project. A good example of this is the weather: in regions where heavy rain can stop people from working, the rain only becomes a risk when measures such as temporary roofing have not been organized in advance by management.
The identification of risks should therefore begin much sooner than most managers realize, ideally at the same time the turnaround is being planned. Tools which give the process some structure tend to help, and the usual starting point is a risk register as mentioned above. Some teams begin with a brainstorming workshop to identify potential risks to the project, while others start with a list of common risks. Where possible, using risk registers from previous turnarounds to build a list of what could potentially occur within a new project is perhaps the easiest approach. While this provides a good foundation, it can be misleading. Using only the information from previous projects ignores problems that by pure chance did not occur in the past and lays the project open to delays or even failure.
One way of overcoming this is to begin with a non-project-specific risk register; from experience, there are between 100 and 150 hazards that are applicable to most turnarounds. The hazards could range from the late delivery of materials to more banal items such as a lack of parking spaces and gates. When a site normally operates with 600 workers, accounting for the access of 3,000 during a turnaround is vital.
Managers should then evaluate which hazards are relevant to the project at hand and divide them into themes which allow them to be more easily dealt with. The nine themes to which hazards are typically assigned are: 1. Scope, 2. Organization, 3. Management, 4. Work planning, 5. Capex involvement, 6. Scheduling, 7. Purchasing and sourcing, 8. Environment, safety, health and quality and 9. Execution.
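The filtering and grouping step described above can be sketched in a few lines of Python. This is only an illustration: the `Hazard` record, the `relevant` flag and the theme assigned to each sample hazard are assumptions made for the example, not part of any particular register.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Theme(Enum):
    """The nine themes listed in the text."""
    SCOPE = 1
    ORGANIZATION = 2
    MANAGEMENT = 3
    WORK_PLANNING = 4
    CAPEX_INVOLVEMENT = 5
    SCHEDULING = 6
    PURCHASING_AND_SOURCING = 7
    ESHQ = 8  # environment, safety, health and quality
    EXECUTION = 9


@dataclass(frozen=True)
class Hazard:
    description: str
    theme: Theme
    relevant: bool  # decided by the team for the project at hand


def group_relevant_by_theme(hazards):
    """Drop hazards judged irrelevant, then group the rest by theme."""
    grouped = defaultdict(list)
    for hazard in hazards:
        if hazard.relevant:
            grouped[hazard.theme].append(hazard.description)
    return dict(grouped)


# Hypothetical extract from a generic register; theme choices are assumed.
generic_register = [
    Hazard("Late delivery of materials", Theme.PURCHASING_AND_SOURCING, True),
    Hazard("Too few parking spaces and gates", Theme.ORGANIZATION, True),
    Hazard("Heavy rain stops outdoor work", Theme.EXECUTION, False),
]
by_theme = group_relevant_by_theme(generic_register)
```

Keeping the relevance decision as explicit data, rather than deleting entries, means the generic register can be reused unchanged for the next turnaround.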
4 Using a risk matrix to evaluate risk severity
Evaluating the severity of the identified risks is where many managers stumble, primarily because the process of risk assessment is by nature rather subjective. What is viewed as highly dangerous by one manager might seem only moderately dangerous to another. It is therefore important that the different tools discussed below are created and evaluated by a team involving not just the turnaround manager, but also managers from production, maintenance and operations who all agree on the threat level and try to give as much detail to the definitions as possible. That way, a more balanced and concrete assessment will be conducted, the results of which will be far more helpful to the turnaround manager during execution.
Once the risk register has been agreed, the most straightforward way of evaluating the severity of the risks listed in it is to apply a matrix to each one. On one axis of the matrix, the severity of the impact on the project is assessed; on the other, the probability of occurrence. In figure 1, the left hand side of the table clearly defines the severity of the impact of a particular risk across a number of different fields, from health, safety and environmental to media attention and financial impact. Severity is rated on a scale from one to four, where one (“severe”) is the highest level and four (“significant”) the lowest. Consequently, if a risk evaluation team considers that a risk has the potential to cause serious injury, could be reported by local news stations and has a 50% probability of occurring, it would be placed in the 2C category. However, if the probability of a risk occurring is high – such as between 90 and 100% – and it carries equally high human and financial costs, it would be classified as 1A and require immediate action.
This evaluation is necessary for all of the risks on the risk register and can then be used to prioritize the risks at hand in a structured and coordinated fashion. If a risk is agreed to be in the bottom right, dark green corner, mitigating actions are not necessary, whereas risks classified in the orange, red or dark red areas would need imminent or even immediate attention.
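As a rough sketch of how such a classification could be automated, the Python below maps a severity level and a probability to a matrix cell such as "2C". Only two anchor points come from the text (90 to 100% is band A, and a 50% probability falls in band C); the number of bands, the remaining boundaries and the sort order used for prioritization are assumptions made for illustration and should be replaced by the values of the matrix actually agreed by the team.

```python
# Lower bounds of the probability bands. "A = 90-100%" and "50% is in C"
# follow the text; every other boundary here is an assumption.
PROBABILITY_BANDS = [
    ("A", 0.90),
    ("B", 0.70),
    ("C", 0.40),
    ("D", 0.10),
]


def probability_band(probability):
    """Return the band letter for a probability between 0 and 1."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    for letter, lower_bound in PROBABILITY_BANDS:
        if probability >= lower_bound:
            return letter
    return "E"  # assumed lowest band


def matrix_cell(severity, probability):
    """Combine severity (1 = highest, 4 = lowest) and band into a cell code."""
    if severity not in (1, 2, 3, 4):
        raise ValueError("severity must be 1, 2, 3 or 4")
    return f"{severity}{probability_band(probability)}"


def prioritize(risks):
    """Sort (name, severity, probability) tuples, most urgent first.

    Ordering by severity and then by probability is an assumed rule.
    """
    return sorted(risks, key=lambda risk: (risk[1], -risk[2]))


# Hypothetical register entries.
register = [
    ("Heavy rain", 3, 0.50),
    ("Late spare part", 2, 0.50),
    ("Vessel damage", 1, 0.95),
]
ordered = [name for name, _, _ in prioritize(register)]
```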
5 Preventing risks from occurring: The bow tie model
Understanding probability and the severity of risk is only half of the process. Countermeasures which prevent that risk from occurring must be defined and acted upon. The bow tie model (figure 2) is another tool which can be helpful to managers in the visualization of how the approach to risk should be structured.
On the left hand side, the hazard is described along with “barriers” which can be put in place to stop the hazard becoming a risk. In the middle, the risk is clearly defined and on the right, the consequences and countermeasures are noted – what some might call a “plan B”. To refer back to the original example of poor weather as a hazard, a typical barrier would involve putting temporary tents in place so that, should heavy rain occur, work can continue without interruption. Another example of a barrier would be to have a contractor on standby in case another contractor is unavailable or too slow. Most of the time, relatively simple, mundane things go wrong and therefore, taking the time to address even the most minor risks at an early stage can save significant time, energy and money in the future.
Once these steps have been taken, planners can then work together with maintenance and production to ensure that the most important and dangerous risks are attended to first and that action is taken to prevent them from occurring. Risks and the associated actions must be recorded in a system which tracks their status as well as the nature and date of preventive action. Once that action has been taken, risks can then be reclassified to assess their criticality.
The key to the success of using this model lies in the consistent assessment of risk on a regular basis, which is where the use of risk management tools can be helpful. Beginning with the bow tie model, the onsite team or external risk experts can use detailed questions and answers to establish how critical the impact of a risk would be, usually placing it on a scale of one to five. The risk exposure index (REI) in figure 3 below is an example of this in practice, where the impacts of a documented risk on quality, costs, environment, safety and health and duration are plotted on a graph to show exactly the level of urgency with which preventive measures are needed.
Where the risk matrix helps to highlight the severity of risk – that is, the consequences the occurrence of a risk would have on health, safety, environment and business impact – the REI provides guidance as to the urgency with which a particular risk should be tackled as a function of the time remaining, i.e. the criticality of a risk. Understanding the difference between severity and criticality of risk is vital to making pragmatic decisions in the context of a turnaround. The team is in a constant race against the clock, both in terms of the start date and the duration of the whole project. This may mean that, paradoxically, a risk of moderate severity (in terms of potential consequences) might be highly critical because mitigation measures have a throughput time that may not fit into the time remaining before the turnaround. For example, a spare part may be important for the production process, yet fitting it may be neither difficult nor expensive, nor pose health, safety or environment issues. However, obtaining the part, which may have to be manufactured or imported from the Far East, will take ten weeks with only eleven weeks remaining on the clock. It is therefore critical that the action is taken forthwith, even if the consequences of the risk as such would not rate “severe”.
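The spare part example can be expressed as a simple slack calculation. The sketch below is not the REI itself, whose scoring is not detailed here; it only illustrates the idea that criticality depends on lead time versus time remaining. The `MitigationAction` record, the category labels and the two-week warning margin are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MitigationAction:
    name: str
    lead_time_weeks: float  # throughput time of the mitigation measure


def slack_weeks(action, weeks_remaining):
    """Time left over once the mitigation lead time has been accounted for."""
    return weeks_remaining - action.lead_time_weeks


def criticality(action, weeks_remaining, warning_margin_weeks=2.0):
    """Classify urgency from slack alone; the margin is an assumed threshold."""
    slack = slack_weeks(action, weeks_remaining)
    if slack < 0:
        return "too late"
    if slack <= warning_margin_weeks:
        return "act now"
    return "can wait"


# The example from the text: ten weeks to obtain the part, eleven remaining.
spare_part = MitigationAction("Imported spare part", lead_time_weeks=10)
status = criticality(spare_part, weeks_remaining=11)
```

With one week of slack, the action is flagged for immediate attention even though the consequences of the risk itself would not rate as severe.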
The REI can further be used along with different mathematical models to create a risk threat potential (RTP), which gives a snapshot calculation of impending additional costs. The example in figure 4 is taken from a project where the REI was used to estimate the impact of a risk on days of production lost. The risk was judged to have the potential to extend the turnaround beyond the agreed duration and therefore required action, since the additional production loss would have translated directly into lost revenue. Based on the cost and price information provided by the client, the figure of € 5.23 million reflects a time, site and product mix which is specific to a particular market and economic situation.
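The underlying arithmetic of such an exposure estimate can be sketched as follows. The model behind the RTP is not specified in this article, so the formula (extra days multiplied by daily output and margin) and all input figures are assumptions for illustration; the sketch does not attempt to reproduce the € 5.23 million result.

```python
def production_loss_exposure(extra_days, daily_output_tonnes, margin_per_tonne):
    """Margin forgone if the turnaround overruns by extra_days.

    Assumed model: every lost day costs one full day of output at a
    constant margin per tonne.
    """
    return extra_days * daily_output_tonnes * margin_per_tonne


def mitigation_net_benefit(exposure, mitigation_cost):
    """Positive when mitigating costs less than the exposure it removes."""
    return exposure - mitigation_cost


# Hypothetical inputs, not taken from the project in figure 4.
exposure = production_loss_exposure(
    extra_days=3, daily_output_tonnes=400, margin_per_tonne=250
)
net_benefit = mitigation_net_benefit(exposure, mitigation_cost=60_000)
```

In practice the margin would depend on the product mix and market prices at the time, which is why any such figure is only a snapshot.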
Typically, addressing a clearly identified and correctly weighted risk could entail providing supplementary resources in order to reduce the throughput time of an at-risk activity. When the financial exposure created by not dealing with a risk is known, the decision process is lifted out of the emotional phase and the cost of mitigation measures can be compared to the impact that doing nothing may have on the bottom line. For example, when the activities that will be carried out during a turnaround are known to produce a given quantity of effluent over a given period, this can be set off against the available treatment capacity. Any shortfall can be identified up front and countered beforehand by modifying the pattern of activities to alter the outflow, by bringing in temporary treatment or storage capacity, by moving untreated effluent off-site, or by a combination of these.
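The effluent example lends itself to a small day-by-day balance. The sketch below assumes a constant daily treatment capacity and a single buffer tank; the function name, the units and the sample figures are hypothetical.

```python
def effluent_shortfalls(outflow_per_day, treatment_capacity_per_day, buffer_capacity=0):
    """Return, for each day, the volume that neither treatment nor buffer
    storage can absorb and that must therefore be handled another way.

    Assumes untreated effluent carries over in the buffer to the next day.
    """
    stored = 0
    shortfalls = []
    for outflow in outflow_per_day:
        # Whatever is not treated today is added to the buffer.
        stored = max(stored + outflow - treatment_capacity_per_day, 0)
        overflow = max(stored - buffer_capacity, 0)
        shortfalls.append(overflow)
        stored -= overflow  # overflow leaves the site, e.g. by tanker
    return shortfalls


# Hypothetical plan: daily outflow in cubic metres over four days.
planned_outflow = [80, 150, 200, 60]
shortfalls = effluent_shortfalls(
    planned_outflow, treatment_capacity_per_day=100, buffer_capacity=50
)
```

Here the third day is the only one with a shortfall, which could be countered by rescheduling the activity behind that peak or by arranging temporary storage in advance.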
6 Managing schedule risk
As seen, the process of addressing continuously changing risks can be aided and structured by the use of different tools which clarify and quantify the potential impact that risks can have on a project. The turnaround schedule, which must balance minimum downtime against a realistic time allowance for the work to be executed in a timely and efficient manner, is a particular source of concern to many managers, mainly because a single day of delay leads to a significant loss in production and hence profit margin. The schedule quickly becomes redundant if it is not kept up to date and is instead viewed as a static document which does not change over time. As with project risk management, the schedule itself must be treated as a dynamic tool which aids and structures execution. Thus, the potential impact that risks have on a schedule must also be treated in a dynamic fashion.
A tool that helps address this aspect of risk – on the schedule as separate from the project as a whole – is Monte Carlo simulation, which uses repeated random sampling to evaluate and quantify time and cost risks. By working through a large number of randomly sampled project scenarios and analyzing various types of recorded risks, the simulation works out the potential impact of the combined schedule risks, giving planners concrete information regarding the likelihood of meeting deadlines and the project end date. For example, the simulation might show that a particular risk to the schedule reduces the probability of reaching the planned end date to only 20%, and that an extra five days would be needed to reach the target probability of 80%.
As the simulation uses mathematical data, the quality of the results is dependent on the quality of the input data. The more accurate the timeline and information for expected completion times for tasks entered into the system, the more accurate the results will be. Often, this information is not on hand and needs to be requested from on-site experts, but the effort to obtain detailed information is worth making as the simulation then allows managers to link up work packages in a dynamic plan and assess how much the end date will shift according to different changes to the schedule. As a result, planners can adjust schedules on a continuous basis and are able to react to situations as they occur with a better understanding of the impact of their decisions on the project as a whole.
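A minimal sketch of such a simulation is shown below, using only the Python standard library. It assumes a single serial chain of work packages, each described by an optimistic, most likely and pessimistic duration drawn from a triangular distribution; real turnaround schedules contain parallel paths and dependencies, and commercial tools model these. The task figures are invented for illustration.

```python
import random


def simulate_durations(tasks, runs=10_000, seed=42):
    """Simulate total duration for a serial chain of tasks.

    tasks: list of (optimistic, most_likely, pessimistic) durations in days.
    Returns the simulated totals in ascending order.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    totals = []
    for _ in range(runs):
        total = sum(
            rng.triangular(optimistic, pessimistic, most_likely)
            for optimistic, most_likely, pessimistic in tasks
        )
        totals.append(total)
    totals.sort()
    return totals


def probability_of_meeting(totals, target_days):
    """Share of simulated runs that finish within target_days."""
    return sum(1 for total in totals if total <= target_days) / len(totals)


def duration_at_confidence(totals, confidence):
    """Duration that is not exceeded in the given share of runs."""
    index = min(len(totals) - 1, int(confidence * len(totals)))
    return totals[index]


# Hypothetical work packages: (optimistic, most likely, pessimistic) days.
work_packages = [(4, 5, 9), (8, 10, 15), (2, 3, 6)]
totals = simulate_durations(work_packages)
chance_of_plan = probability_of_meeting(totals, target_days=18)
p80_duration = duration_at_confidence(totals, 0.80)
```

Comparing `p80_duration` with the planned duration gives the number of extra days needed to reach an 80% probability of finishing on time, which is the kind of statement described in the example above.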
7 Risk management in practice
7.1 A case study applying tools in a turnaround project
Risk management tools such as those mentioned above were recently used during a turnaround review at a chemical plant in France. The onsite team was aware of the need to address the potential risks to the project but did not have the internal resources available to properly address the issue. As such, a structured questionnaire was conducted with key personnel at the site using questions which are specifically designed to elicit the appropriate information concerning turnarounds.
Ranging from broad asset management strategy to how work permits would be managed during execution, approximately 90 questions were asked and answers rated from one to five, with five being the best. Although certain topics such as the language skills of contractor personnel, the state of plant documentation and the organizational setup of the turnaround team have demonstrated a particularly high level of risk-sensitivity, the questions were structured in such a way that those topics did not disproportionately dominate the answers in order to gain as realistic a picture as possible.
In order to counteract the subjectivity of the assessment from both sides, model answers to the questionnaire had already been defined. For example, one answer read “80% of the work for the upcoming turnaround has already been carried out in the past by the same contractor” which would rate a 4 on the scale. This means that the process is weighted heavily towards a fact-based assessment and less prone to the interpretation of the interviewee.
Once all scores were gathered, they were then weighted with regard to how much time was remaining before execution in order to get an indication of prioritization. As a result, a clear list of the most risk-prone aspects of the project was produced which was then used along with the value of a day’s worth of production to calculate the potential financial exposure created should a particular risk occur. Following this process, the management team was able to fully understand both the potential impact of risk on the project as well as the importance of addressing it as early as possible.
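The weighting step can be illustrated with a small ranking function. The weighting actually used in the review is not described in detail, so the rule below (gap to the best score of five, divided by the weeks left to resolve the topic) is purely an assumption, as are the per-topic deadlines and sample scores.

```python
def rank_topics(answers):
    """Rank questionnaire topics, most urgent first.

    answers: list of (topic, score, weeks_left) where score runs from 1 to 5
    (5 is best) and weeks_left is the time remaining to resolve the topic.
    Assumed priority rule: (5 - score) / weeks_left.
    """
    ranked = []
    for topic, score, weeks_left in answers:
        if not 1 <= score <= 5:
            raise ValueError("score must be between 1 and 5")
        if weeks_left <= 0:
            raise ValueError("weeks_left must be positive")
        priority = (5 - score) / weeks_left
        ranked.append((priority, topic))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return [topic for _, topic in ranked]


# Hypothetical answers; topics are taken from the text, values are invented.
answers = [
    ("Contractor language skills", 3, 20),
    ("State of plant documentation", 2, 6),
    ("Work permit process", 4, 4),
]
priorities = rank_topics(answers)
```

A well-rated topic with little time left can thus outrank a poorly rated one with plenty of time, which mirrors the distinction between severity and criticality.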
7.2 The most risk-prone aspects of turnarounds
In practice, one of the most frequent and most serious risks to turnarounds is the late availability of actionable information. Most companies carry out major projects on their sites continuously, which may concern civil engineering/construction (e.g. building or demolishing structures or roads, resurfacing roads), utilities (e.g. (re)laying or removing water or sewer conduits, power lines or working on electrical substations, wastewater treatment lines, sometimes even product supply lines), or engineering (e.g. introducing new technology, modifying or expanding the production environment). Most of this work is of no concern to the turnaround, but parts of it will have an impact, either in terms of actual interaction, i.e. the equipment affected by the turnaround changes, or as interference, i.e. by blocking roads, by interrupting power supplies or by putting a crane in the middle of the area where the turnaround is supposed to take place. Often, a small amount of vague information is provided regarding these other projects but the precise detail of what will happen and when is only communicated very late in the day (or not at all), leaving the turnaround team to suddenly realize that many of the assumptions they have worked with are not valid.
Another frequently-observed risk is the scope never reaching freezing point. The theory states that more than a year before the planned start of the turnaround, all possible items should have been selected, challenged and rejected or confirmed, in order to allow planning to move forward. A major part of the turnaround scope is inspection work that is mandated by the government and that cannot be done at any other time, for example, entering into a production vessel to check it for wear and tear. This is normally well-defined on a multiyear calendar. In theory, including all of this work in the scope should be feasible well before the start date, but in practice, it rarely is. Teams often find that even legal inspections can surface quite suddenly and late in the day.
Furthermore, cultural problems – whereby the senior management does not enforce the “scope freeze” practice and allows the late addition of major jobs – can create risks. Failing to restrict the number of tasks to be included in the turnaround is a risk in several respects, e.g. regarding the identification of the resources needed both in terms of numbers and of trades, the selection and contracting of third parties to provide these resources, the definition of what needs to be procured in terms of materials and equipment (if a major vessel or piece of plant needs to be replaced it may take a year or more for it to be fabricated, quality checked and brought to the site), the reservation of cranes and special tools, and so on. As chemical plants are often clustered and they all conduct turnarounds, even getting the right people in sufficient numbers can be quite a challenge.
A final example of a “typical” risk is the organization and management of the actual turnaround execution. An unusually large number of people, many of whom may never have been on site, need to get work permits and access the equipment for the specific job they are scheduled to carry out at a particular time. They are also required to deliver quality work (“first time right”), be able to get onto and off the site, and need a place to eat and wash. As the client organization is ultimately responsible for this, steps need to be taken to manage the extra workforce and its needs. This requires preparation months in advance as well as sufficient qualified personnel, a requirement which is often underestimated and may have adverse consequences. Firstly, in view of the limited length of the turnaround window, any delay has knock-on effects. Secondly, when work turns out on start-up to have been substandard, the whole plant may have to be taken down again and the work started again from scratch.
8 Conclusion
The use of risk management tools to evaluate risk is in no way a new concept, but the way in which they are used often restricts their impact. As turnarounds and large capital projects involve so many variables and are constantly changing, it is vital that managers overcome the idea of risk as a static obstacle and understand that in order to address it properly in this context, a dynamic approach must be used. This means that the tools created to aid risk identification and mitigation must be used on a continuous basis and applied to planning and scheduling processes as the event continues.
Part of the problem with risk management as a whole is that many people have great difficulty understanding the very concept of probability and tend to rate the risk of something spectacular occurring far higher than that of something mundane. As a result, risk evaluation processes tend to focus on extreme risks and the corresponding preventive measures and ignore the likelihood of rather more “standard” risks occurring. This can give managers a false sense of security during execution and expose the project and the team to very real dangers. Really understanding the probability of risks occurring and being able to evaluate both their severity and criticality therefore requires a good degree of experience and judgement.
Within the industry, more must be done to improve the understanding and definition of risk. Companies should be prepared to commit both time and money to in-depth risk evaluation and to ensuring that managers are fully aware of what risk actually means in practice. When such large numbers of workers are involved and the volumes of product and money at stake are vast, getting risk management right is not merely beneficial, but an absolute necessity.