Part 1: Proactive Technical Monitoring of Alarms
Proactive monitoring is a wide-ranging topic with many methodologies, but the outcome we are all looking for is the same: to understand potential problems as early as practical and to act at the appropriate time to prevent failure or lost production.
Effective operations management is critical to every industry. It encompasses multiple processes and people who help manage equipment, execute maintenance and ensure safe, efficient production.
It may seem complex, expensive and aspirational, but it doesn’t have to be. With the right tools, and with relevant data integrated in context, people can navigate easily to the right information and react proactively.
Security needs to be appropriately managed so that data from the control system network is available for analysis against other relevant data by individual discipline engineers and remote/expert operating centres.
The most commonly under-utilized source of data for proactive monitoring is control system alarms. Who has visibility of these alarms, and who responds to them? Generally, only the Control Room Operator.
Control room operators react to alarms based on their current situation. Assets with a high number of low priority alarms can become a nuisance, particularly when the alarms are fleeting and the operators end up ignoring them. KPIs and alarm categorization are therefore necessary, along with the ability to extend those alarms to the discipline engineers who can make proactive decisions. This only works if those engineers also have the relevant data for that asset or process. For example, a slowly trending vibration issue may not receive much attention from the control room operators, but extending the alarm to the rotating equipment engineer prompts that engineer to investigate the problem and prevent a potential failure, downtime, or safety event.
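As a rough illustration of the kind of KPI described above, the sketch below counts total and fleeting alarms per asset from an alarm log. The record layout, the asset names and the 10-second fleeting threshold are all assumptions made for the example; a real alarm historian will have its own schema and site-specific definitions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class AlarmRecord:
    asset: str         # hypothetical asset identifier
    priority: str      # e.g. "low" or "high"
    raised_at: float   # seconds since the start of the log
    cleared_at: float  # seconds since the start of the log


# Assumed threshold: an alarm that clears in under 10 s counts as fleeting.
FLEETING_SECONDS = 10.0


def alarm_kpis(records, fleeting_seconds=FLEETING_SECONDS):
    """Return total and fleeting alarm counts per asset."""
    total = Counter()
    fleeting = Counter()
    for record in records:
        total[record.asset] += 1
        if record.cleared_at - record.raised_at < fleeting_seconds:
            fleeting[record.asset] += 1
    return {
        asset: {"total": total[asset], "fleeting": fleeting[asset]}
        for asset in total
    }


# Made-up sample log for illustration only.
log = [
    AlarmRecord("PUMP-101", "low", 0.0, 3.0),
    AlarmRecord("PUMP-101", "low", 20.0, 22.0),
    AlarmRecord("PUMP-101", "low", 60.0, 400.0),
    AlarmRecord("COMP-201", "high", 100.0, 900.0),
]
kpis = alarm_kpis(log)
```

An asset with a high share of fleeting alarms is a candidate for alarm rationalization, while one with persistent low priority alarms may be worth routing to the relevant discipline engineer.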
Subscribing to alarm categories allows the relevant discipline engineer to take appropriate action as required. Process safety, environmental, production risk and asset risk are key areas, and each group interprets the same alarms differently depending on the impact on its own responsibilities.
For example, the environmental manager should be able to subscribe to alarms that are classified as environmental, with rules that define when they will be notified and what the recommended action is. The notification email might show the details of the alarm, but it may also be necessary to access KPIs or more detailed data by drilling into other operational data or systems. If we want to make the best decisions from our proactive technical monitoring, we need integration of key platforms.
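A minimal sketch of category-based subscription might look like the following. The class names, fields, tag name and the priority rule are hypothetical and stand in for whatever the alarm system and notification service actually expose; the point is only that each subscriber states a category, a rule, and a recommended action.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    tag: str
    category: str   # e.g. "environmental", "process safety"
    priority: int   # assumed convention: 1 is the highest priority
    message: str


@dataclass
class Subscription:
    subscriber: str
    category: str
    max_priority: int        # notify when alarm.priority <= max_priority
    recommended_action: str


def notifications_for(alarm, subscriptions):
    """Build one notification text per subscription that matches the alarm."""
    return [
        f"To {s.subscriber}: [{alarm.category}] {alarm.tag}: {alarm.message}. "
        f"Recommended action: {s.recommended_action}"
        for s in subscriptions
        if s.category == alarm.category and alarm.priority <= s.max_priority
    ]


# Made-up subscriptions and alarm for illustration only.
subscriptions = [
    Subscription("environmental.manager", "environmental", 2,
                 "Check the discharge sampling results"),
    Subscription("rotating.engineer", "asset risk", 3,
                 "Review the vibration trend"),
]
alarm = Alarm("FLARE-PIT-LVL", "environmental", 1, "Level high")
messages = notifications_for(alarm, subscriptions)
```

Here only the environmental manager is notified; the same alarm stays visible to the control room operator as usual.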
Anomaly Detection, Notifications, Alerts, Analytic Based Events, Operator Logs, IoT alerts…
Before an alarm occurs there are likely to be subtle changes that indicate a potential problem. Engineers are time poor, so manually watching trends and data feeds for a needle in a haystack is no longer practical. That is why we have analytics engines and Machine Learning (ML), right? Well, yes, and that helps, but it takes a lot of work and data cleansing just to get to the point where basic ML models work reliably.
We use the analogy of crawl, walk, run. In many cases, potential problems can be found simply by making the right information available to the appropriate people so they can take action. This can be achieved by integrating data in context, from time-series data to logbooks to asset maintenance systems.
Standard alarm management is no longer the only source of clues for problem identification. Sources now include SCADA, DCS, PI, ML, IIoT analytics, maintenance management, and other knowledge capture. Logic such as that in OSIsoft’s PI Event Frames can flag an abnormal deviation, e.g. a motor's current draw running higher than nominal at steady state, or abnormal vibration.
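The motor current example can be sketched in plain Python as below. This is not the PI Event Frames API; it only illustrates the kind of rule such logic might encode. The function name, the 10% tolerance, the speed stability limit and the sample values are all assumptions.

```python
from statistics import mean, pstdev


def current_deviation(speed_rpm, current_amps, nominal_amps,
                      tolerance=0.10, max_speed_std=5.0):
    """Flag a window where the motor is steady but drawing too much current.

    Steady state is assumed when the speed samples vary by no more than
    max_speed_std (population standard deviation, in rpm). The deviation
    is flagged when the mean current exceeds nominal by more than
    tolerance (a fraction of nominal).
    """
    steady = pstdev(speed_rpm) <= max_speed_std
    high_draw = mean(current_amps) > nominal_amps * (1 + tolerance)
    return steady and high_draw


# Made-up sample windows for illustration only.
steady_speed = [1480, 1482, 1479, 1481]
ramp_speed = [0, 500, 1000, 1480]
high_current = [52, 53, 54, 53]
normal_current = [44, 45, 46, 45]

flag_steady_high = current_deviation(steady_speed, high_current, 45)
flag_startup = current_deviation(ramp_speed, high_current, 45)
flag_normal = current_deviation(steady_speed, normal_current, 45)
```

Requiring steady state first avoids flagging the high current that is expected during start-up, which is one way to keep this kind of event from adding to the alert flood.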
Oh, but we are already getting bombarded with a ton of alerts! That is where an integrated operations information solution provides navigation to the relevant data to take appropriate action and it also provides the opportunity to refine the parameters that are used to generate alarms. The right action at the right time can prevent downtime, improve production, and keep the control room operators available to run the plant optimally.
Data that matters – what data?
So, you’re a production engineer and the alarm & event logs are showing several unusual high priority unacknowledged alerts for a key asset.
What relevant data would help quickly determine the problem? Part 2 will examine why it is desirable to integrate key platforms.