The present disclosure relates to the field of intelligent alarm management, particularly in industrial processes.
At least some industrial processes, e.g., in a plant, may be of such complexity that their behavior is not always clear to operators and/or service personnel. In particular, the recognition of abnormal behavior may be difficult and/or error-prone in at least some situations. This may become even more complicated because sometimes so many alarms are raised that it is hard to get a clear understanding of the current criticality of the industrial processes.
In a general aspect, the present disclosure describes an improved alarm management for industrial processes. One aspect of the disclosure relates to a method for finding an abnormal behavior of an industrial process. The method comprises the steps of:
The machine learning model 10 further has score data 30, which comprise a first criticality value 32 and a fourth time-series 34. The first criticality value 32 may be based on and/or may be a function of a current observable process-value PV and/or of a first time-series 21 of observable process-values PV, thus considering a longer time-span of PVs. Said function may be built by a mapping device 62. The mapping device 62 may further output the first criticality value 32 to an alarm display 64. There may be one or more alarm displays 64 in the system. The alarm display(s) 64 may further be fed by other components and/or modules of the system, e.g. by sensor outputs such as temperature, pressure, and many more, depending on the industrial process 50. The fourth time-series 34 comprises at least one predicted observable process-value PPV of the industrial process 50. The prediction of the predicted observable process-value PPV may be based on historical data. The machine learning model 10 outputs an output value 40. The output value 40 comprises a second criticality value 42 and a fifth time-series 44. The fifth time-series 44 may be similar to the fourth time-series 34, and/or may simply “feed forward” the fourth time-series 34, thus making simulation results available as an integral part of the model’s data. The second criticality value 42 is a function of the at least one predicted observable process-value PPV. Hence, the second criticality value 42 is a kind of “condensed knowledge” of the process behavior, possibly including aspects of future development of PVs.
The ANN (sometimes called “ML learning algorithm” or “ML algorithm”) is trained using data samples of process values (PVs) over a time window ranging from time t0 to tn, of control loop setpoints, manipulated variables (MVs), or operator actions over the same time window t0 to tn, and of planned future setpoints from tn to tend. The target (the process variables to be predicted) consists of the process values from tn+1 until tend. During operation, the trained ML model will be fed with data similar to the predictor: process values and setpoints from t0 until tn and the planned future trajectory of setpoints from the operator from tn+1 until tend. The model may output the expected plant behavior in terms of future process variable trajectories.
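The split into predictor and target described above may be illustrated by the following minimal sketch. It is an assumption-laden illustration, not the disclosed implementation: the function name `make_training_sample`, the equal sampling of both series, and the choice to start the planned future setpoints at tn+1 are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_training_sample(
    pv: Sequence[float],
    setpoints: Sequence[float],
    n: int,
) -> Tuple[List[float], List[float], List[float], List[float]]:
    """Split equally sampled recordings at index n (time t_n).

    Returns (past PVs, past setpoints, planned future setpoints, target PVs).
    """
    if len(pv) != len(setpoints):
        raise ValueError("pv and setpoints must have the same length")
    if not 0 <= n < len(pv) - 1:
        raise ValueError("n must leave at least one future sample")
    past_pv = list(pv[: n + 1])          # predictor: PVs from t0 to tn
    past_sp = list(setpoints[: n + 1])   # predictor: setpoints from t0 to tn
    future_sp = list(setpoints[n + 1:])  # predictor: planned setpoints, tn+1 to tend
    target_pv = list(pv[n + 1:])         # target: PVs from tn+1 to tend
    return past_pv, past_sp, future_sp, target_pv
```

In such a sketch, the first three returned lists would form the model input and the fourth the training target; during operation only the first three are available.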
The training samples are used to train a machine learning algorithm, for instance a recurrent neural network or ANN. The trained algorithms, for instance the recurrent neural networks with trained weights, will be the surrogate model or parts of it.
The training data are collected in a knowledge base and are used by an ML-Training system that trains two ML modules: (1) An ML Deferment module: If there exists a combination of setpoints (MV) that leads to a feasible solution of the problem for the specified time-ahead, then the corresponding data are used to train this module. The module outputs a setpoint action for a given time-series that retains the future evolution of the PV in the safe-zone. The setpoint actions are ranked according to their KPIs from most effective to least effective. (2) An ML Delay module: If there exists no combination of setpoints in the knowledge base that provides a feasible solution (remain in the safe-zone) for the specified time-ahead, then the corresponding data are used to train this module. Different setpoint actions are ranked according to the time-delay they can add to the PV before it exits the safe-zone (compared to no action). There may not exist any setpoint action in the current knowledge base that can add a delay to the PV.
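The two rankings described above may be sketched as follows. This is only an illustration under assumptions: the record type `SetpointAction` and its fields (`kpi`, `keeps_pv_safe`, `added_delay_s`) are hypothetical and stand in for whatever the knowledge base actually stores.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SetpointAction:
    name: str
    kpi: float            # hypothetical effectiveness score, higher is better
    keeps_pv_safe: bool   # True if the PV stays in the safe-zone over the time-ahead
    added_delay_s: float  # extra time before the PV leaves the safe-zone vs. no action


def rank_for_deferment(actions: List[SetpointAction]) -> List[SetpointAction]:
    """Feasible actions, ordered from most effective to least effective KPI."""
    feasible = [a for a in actions if a.keeps_pv_safe]
    return sorted(feasible, key=lambda a: a.kpi, reverse=True)


def rank_for_delay(actions: List[SetpointAction]) -> List[SetpointAction]:
    """Non-feasible actions that still add a delay, longest delay first."""
    delaying = [a for a in actions if not a.keeps_pv_safe and a.added_delay_s > 0]
    return sorted(delaying, key=lambda a: a.added_delay_s, reverse=True)
```

An empty result from `rank_for_delay` corresponds to the case in which no setpoint action in the knowledge base can add a delay.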
The human plant operator (PO) can be employed to guide the ML training and facilitate its effort. The main input of the PO consists in specifying which MV is the most suitable for each incident or PV. Moreover, the PO can suggest an approximate setpoint-action, e.g. quantify the MV, according to empirical knowledge to assist the ML training even further. The ML training module prompts the PO with a query in a graphical user interface to facilitate the interaction. The query is prompted while the operator is under low or zero cognitive burden. This way the ML-training system can significantly limit its exploration space and attain good initial conditions for the training of the ML modules.
If a feasible solution exists for this incident the ML Deferment module is called which initiates all the corresponding actions to retain the PV inside the safe-zone. If the actions undertaken are successful, the prediction system stops issuing alarm predictions.
If a feasible solution does not exist in the Knowledge base of the ML Deferment module that can resolve successfully the predicted incident, the ML Delay module is called. The ML Delay module attempts to insert a time-buffer before the actual alarm is issued. If the PO is under a heavy cognitive load (the PO is already processing multiple issued alarms on other incidents) the ML Delay module selects the maximum feasible delay. If the PO is under low or zero cognitive load the module selects a small or zero delay accordingly.
If no feasible solution exists and no delay can be added to the evolution of the predicted alarm, the PO is notified appropriately. The actions of the PO to resolve the alarm are recorded and augment the knowledge base accordingly, so that the incident will be resolved autonomously in a future occurrence, augmenting the problem-solving capacity of the AID.
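The decision sequence of the three preceding paragraphs may be condensed into the following sketch. The function name, the string labels, and the reduction of the operator's cognitive load to a single boolean are assumptions for illustration; choosing a zero delay under low load is one possible reading of "a small or zero delay".

```python
from typing import List, Optional, Tuple


def handle_predicted_alarm(
    deferment_actions: List[str],
    feasible_delays_s: List[float],
    operator_busy: bool,
) -> Tuple[str, Optional[object]]:
    """Return (decision, payload) for one predicted incident."""
    if deferment_actions:
        # A feasible solution exists: apply the best-ranked deferment action.
        return ("defer", deferment_actions[0])
    if feasible_delays_s:
        # No feasible solution: insert a time-buffer before the alarm is issued.
        delay = max(feasible_delays_s) if operator_busy else 0.0
        return ("delay", delay)
    # Neither deferment nor delay is possible: notify the operator.
    return ("notify_operator", None)
```

In the third branch, the operator's subsequent actions would be recorded so that the knowledge base can be augmented.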
In a variant, the alarm logic may still evaluate based on a static threshold but analyses if (a) there is still sufficient time to respond if the alarm is issued at the static threshold, (b) there is more time than required to respond, or (c) the PV may not reach the critical threshold. In case of (a), the logic may activate the alarm earlier than usual and provide the HMI with information why, e.g. projected trajectory. In case of (b), the logic may not suppress the alarm, but add additional information on the HMI that there is still time to respond, e.g. projected trajectory. In case of (c), the logic may not suppress the alarm, but add additional information on the HMI that there may be no need to react at all, e.g. projected trajectory.
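The three cases of this variant may be sketched as a small classification function. The sketch assumes that a prediction supplies the time remaining between the static alarm threshold and the critical threshold (or `None` if the critical threshold is not projected to be reached), and it reads case (a) as "activate earlier when that time is not sufficient"; all names are hypothetical.

```python
from typing import Optional


def assess_static_alarm(
    time_to_critical_s: Optional[float],
    required_response_s: float,
) -> str:
    """Classify a static-threshold alarm using a projected PV trajectory."""
    if time_to_critical_s is None:
        # Case (c): the PV is not projected to reach the critical threshold.
        return "annotate_no_action_needed"
    if time_to_critical_s < required_response_s:
        # Case (a): too little time at the static threshold, so alarm earlier.
        return "activate_early"
    if time_to_critical_s > required_response_s:
        # Case (b): more time than required; keep the alarm, add information.
        return "annotate_time_available"
    # Exactly enough time: issue the alarm at the static threshold as usual.
    return "alarm_at_threshold"
```

In every branch the alarm itself is kept; only its timing or the accompanying HMI information changes.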
In Training, the simulation is executed with a large number of disturbance profiles and combination of disturbance profiles. The simulation produces training data with predictor - e.g., process values, setpoints, alarms and events - and the disturbance profiles used during the simulation, either as continuous signal or just as disturbance identifier. In a second step, a Machine Learning classifier is trained using the disturbance information as label or a machine learning regression is trained to reproduce the disturbance profile. The created model is then used for the RCA task.
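The labeling step may be illustrated with a deliberately simple stand-in classifier. The sketch below uses a nearest-centroid rule instead of the machine learning classifier or regression mentioned above, purely to show how simulated feature vectors labeled with a disturbance identifier could yield a ranked list of probable root causes; the feature layout and label names are hypothetical.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence, Tuple


def train_centroids(
    samples: Sequence[Tuple[Sequence[float], str]],
) -> Dict[str, List[float]]:
    """Average the simulated feature vectors of each disturbance label."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def rank_root_causes(
    features: Sequence[float],
    centroids: Dict[str, List[float]],
) -> List[str]:
    """Order disturbance labels from most to least similar to the plant data."""
    return sorted(centroids, key=lambda label: dist(features, centroids[label]))
```

The first entry of the returned list would be presented as the most probable root cause.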
During RCA, the RCA is requested either by the operator or by a monitoring system, e.g. an anomaly detection system. The data collected from the plant are fed into the machine learning model. The output may then be presented as probable root causes to the operator. If counteractions (recipes) are known for certain types of disturbances, either by definition from experts or from machine learning, the system can recommend these measures to the user or directly trigger the execution of the actions.
A variant may comprise first trying the actions from the disturbance recipes on the surrogate models and evaluating the results. The course of actions - e.g. timing, sequence, values of setpoints, etc. - may be varied in an optimization loop, e.g. using Bayesian Optimization, which may optimize the actions based on an objective, e.g. minimize time-of-execution, maximize throughput during execution, etc.
A variant may be implemented in deployments that run a Digital-Twin of the plant processes. The digital-twin digitally replicates the plant-process using model-based dynamics. However, these dynamics are just an approximation of the real process and slowly deviate from what happens in the plant. The standard practice is to synchronize the digital-twin to the physical plant-process using measurements from the latter. However, these measurements are not always sufficient to distinguish between different internal states that may produce the exact same measurements. Such states may have a different impact on the future evolution of the plant. The digital-twin may therefore not run one instance that conforms with the state of the real plant, but multiple possible scenarios, weighted according to some probability. Keeping, discarding or reweighting these scenarios can be done using the ML Model. Whenever a signature of a disturbance is detected, some of the instances that are running in parallel are discarded or re-weighted. Additionally, if no internal state exists that conforms with the ML Model, the ML may need to augment its training database using the relevant scenario from the digital-twin.
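Keeping, discarding and reweighting parallel scenarios may be sketched as a multiplicative weight update. The sketch assumes that the ML Model provides, per scenario, a likelihood between 0 and 1 that the detected disturbance signature fits that scenario; the function name and the pruning threshold `min_weight` are hypothetical.

```python
from typing import Dict


def reweight_scenarios(
    weights: Dict[str, float],
    likelihoods: Dict[str, float],
    min_weight: float = 0.05,
) -> Dict[str, float]:
    """Update scenario weights and drop scenarios that became implausible.

    An empty result means that no running scenario conforms with the ML Model.
    """
    scored = {name: w * likelihoods.get(name, 0.0) for name, w in weights.items()}
    total = sum(scored.values())
    if total <= 0.0:
        return {}
    normalized = {name: s / total for name, s in scored.items()}
    kept = {name: w for name, w in normalized.items() if w >= min_weight}
    kept_total = sum(kept.values())
    if kept_total <= 0.0:
        return {}
    # Renormalize so that the surviving scenario weights sum to one.
    return {name: w / kept_total for name, w in kept.items()}
```

An empty return value would correspond to the situation in which the training data may need to be augmented from the digital-twin.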
In a step 81, a machine learning model 10 (see
The industrial process may be run in an industrial plant, as used, e.g., in chemical and process engineering. The industrial process may be configured for producing and/or for manufacturing substances, for instance materials and/or compounds. The abnormal behavior may be a behavior that deviates from an intentional behavior of the industrial process and/or of the industrial plant. The abnormal behavior may be indicated by an observable (“external”) value - such as temperature or pressure in a vessel of the plant - and/or may be indicated by a non-observable (“internal”) value, for instance an internal non-intentional disturbance of mixed compounds. The abnormal behavior may lead to an alarm, either in short-term, e.g. immediately, and/or in some temporal distance, e.g. in a couple of seconds, minutes, and/or other time-spans.
The machine learning model is an artificial neural net, ANN, which is used after a training and/or a training phase. The training may be done once or may be repeated during the model’s use. The training may be done by means of input data and score data; however, further data may also be used for the training. The time-series of the input data and/or of the score data may be based on data recordings of the past. Due to this, a “future behavior” of the industrial process may be “known”, i.e. may be part of the time-series. For instance, a fast change of one process-value may have led to a critical situation within a couple of minutes, whereas a fast change of another process-value may have turned out to be uncritical, even if an alarm has been raised. The input data may comprise: observable process-value(s), non-observable internal variable(s), possibly from simulations and/or e.g. from non-observable disturbances, and/or manipulated variable(s), e.g. from an operator who reacts to an alarm and/or other behavior of the industrial process.
The score data may be rewards and/or punishments of the ANN. The score data may comprise: a first criticality value, which may be a function of the observable process-value(s) and/or of the internal variable(s). The function may be a complex and/or a composed function of one or more variable(s). The function may be a simple one, for instance: “if temperature is lower than 32° C., then alarm5=true”. The predicted observable process-value may be based on historical data, which may show “developing” process-values, for instance: “if temperature is higher than 76° C., then alarm8=true”, because the process-behavior became critical within 2 minutes. The temporal distance may be a fixed one, e.g. 5 minutes, it could be more than one distance, and/or a variable distance, possibly influenced by at least one of the historical time-series.
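The two example rules quoted above may be written as a simple mapping from a temperature reading to alarm flags. This is only a sketch: the function name is hypothetical, and the 76° C. rule is hard-coded here although the text describes it as derived from historical data.

```python
from typing import Dict


def evaluate_alarm_rules(temperature_c: float) -> Dict[str, bool]:
    """Map one temperature reading to the two example alarm flags."""
    return {
        # Simple rule on the current observable process-value.
        "alarm5": temperature_c < 32.0,
        # Rule reflecting history in which the process became critical
        # within 2 minutes after this temperature was exceeded.
        "alarm8": temperature_c > 76.0,
    }
```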
The running of the trained machine learning model may be done after the training phase. At least the first time-series is applied, during operation of the process, to the trained machine learning model; however, further data may also be applied to the model.
The outputting, by the trained machine learning model, may comprise an alarm and/or an additional alarm. The alarm may be based on or may comprise the second criticality value. The model’s output may be similar to or different from other alarms, e.g. alarms raised by other components and/or subsystems of the industrial process. In some cases, the model’s output may lead to a re-evaluation of an alarm, e.g. may lead to an “over-weighting” of one alarm and/or may lead to an “under-weighting” of one alarm. This “correction” may contribute to a more efficient alarm management in the industrial process and/or an easing of the operator’s burden. Particularly, this may ease the recognition of an abnormal behavior of the industrial process.
In various embodiments, the output value further comprises a scenario number of the industrial process, dependent on at least one of the first time-series, the second time-series, and/or the third time-series. In cases where no scenario number can be found, an “undefined” scenario number may be output. The scenario number may advantageously contribute to a better understanding of what is currently going on in the industrial process, thus leading to a faster reaction and/or to a further investigation of the current - and/or the related historical - circumstances.
In various embodiments, the output value further comprises a fifth time-series, dependent on at least one of the first time-series, the second time-series, and/or the third time-series. The fifth time-series may be similar to the fourth time-series of predicted observable process-value(s). The fifth time-series may simply “bypass” the fourth time-series, thus advantageously making use of the knowledge base provided by the plurality of historical time-series. Thus, the method may facilitate or contribute to a prediction of the plant’s behavior, based on given past and current process measurements and/or data and, when considering the manipulated variables, also on planned future operator actions. This, further, may be used to train machine learning algorithms to create fast and accurate surrogate models for the method above to be used for online deployment.
In various embodiments, the output value further comprises the first criticality value of the at least one observable process-value, i.e. of the current value. The input may also be used as an output, e.g. simply “forwarded” and/or as a kind of “shortcut” of the first criticality value. This further improves the understanding of the current process behavior.
In various embodiments, the method further comprises the step of outputting a manipulated variable dependent on at least one of the first time-series and/or the third time-series. The implementation may comprise a “simple forwarding or bypass” of this value, i.e. of an observable and/or a non-observable value. This may advantageously lead to a kind of seamless integration of sensor values and simulation results. This may further contribute to an insight into whether a feasible solution - or “standard solution” - exists for this case. Moreover, new situations may be reported to the operator, possibly as an indicator that this situation and/or scenario requires particular attention.
In various embodiments, the method further comprises the step of determining a temporal distance to a second criticality value that exceeds a predefined criticality value. This may advantageously be an answer to a question like: “When will the next alarm happen in this situation/scenario?” or “Will there be an alarm for this situation/scenario?” This may advantageously be used for being able to “shift” an alarm message to relieve the personnel in some situations. Hence, a time-buffer may be inserted for this particular alarm, possibly dependent on the rising-velocity of the criticality value, e.g. in the future. This may further improve the alarm management.
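Determining the temporal distance may be sketched as a search over a predicted series of second criticality values. The sketch assumes equally spaced predictions in which index 0 corresponds to the current time; the function and parameter names are hypothetical.

```python
from typing import Optional, Sequence


def time_to_exceed(
    predicted_criticality: Sequence[float],
    threshold: float,
    step_s: float,
) -> Optional[float]:
    """Seconds until the predicted criticality first exceeds the threshold.

    Returns None if the threshold is not exceeded within the prediction horizon.
    """
    for index, value in enumerate(predicted_criticality):
        if value > threshold:
            return index * step_s
    return None
```

A returned `None` would answer the second question above with "no alarm within the horizon", while a number answers the first.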
In various embodiments, the method further comprises the step of determining an increasing-velocity of the second criticality value; and outputting an alarm when the increasing-velocity exceeds a predefined criticality value. This may raise an alarm as a reaction to an acceleration of some process-values, e.g. of a heating up.
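The increasing-velocity may be approximated by finite differences of successive criticality values, as in the following sketch. The equal sampling interval `step_s`, the use of only the most recent difference, and the name `max_velocity` for the predefined limit are assumptions.

```python
from typing import List, Sequence


def criticality_velocity(values: Sequence[float], step_s: float) -> List[float]:
    """Finite-difference rate of change between successive criticality values."""
    if step_s <= 0:
        raise ValueError("step_s must be positive")
    return [(later - earlier) / step_s for earlier, later in zip(values, values[1:])]


def velocity_alarm(values: Sequence[float], step_s: float, max_velocity: float) -> bool:
    """True if the most recent rate of increase exceeds the predefined limit."""
    rates = criticality_velocity(values, step_s)
    return bool(rates) and rates[-1] > max_velocity
```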
An aspect relates to a computer program product comprising instructions, which, when the program is executed by a computer and/or an artificial neural net, ANN, cause the computer and/or the ANN to carry out the method described above and/or below.
An aspect relates to a computer-readable storage medium on which a computer program or a computer program product as described above is stored.
An aspect relates to a machine learning model, particularly a trained machine learning model, configured for executing a method as described above and/or below.
An aspect relates to a use of a machine learning model for monitoring and/or controlling an industrial process.
An aspect relates to an industrial plant, comprising a computer and/or an ANN (Artificial Neural Net) on which instructions are stored, which, when the program is executed by the computer and/or by the ANN, cause the computer or the industrial plant to carry out the method as described above and/or below.
For further clarification, the invention is described by means of embodiments shown in the figures. These embodiments are to be considered as examples only, but not as limiting.
All references, including publications, patent applications, and patents, cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and were set forth in its entirety herein.
The use of the terms “a” and “an” and “the” and “at least one” and similar referents in the context of describing the invention (especially in the context of the following claims) are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The use of the term “at least one” followed by a list of one or more items (for example, “at least one of A and B”) is to be construed to mean one item selected from the listed items (A or B) or any combination of two or more of the listed items (A and B), unless otherwise indicated herein or clearly contradicted by context. The terms “comprising,” “having,” “including,” and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to,”) unless otherwise noted. Recitation of ranges of values herein are merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.
Preferred embodiments of this invention are described herein, including the best mode known to the inventors for carrying out the invention. Variations of those preferred embodiments may become apparent to those of ordinary skill in the art upon reading the foregoing description. The inventors expect skilled artisans to employ such variations as appropriate, and the inventors intend for the invention to be practiced otherwise than as specifically described herein. Accordingly, this invention includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the invention unless otherwise indicated herein or otherwise clearly contradicted by context.
This patent application claims priority to International Patent Application No. PCT/EP2021/059529, filed on Apr. 13, 2021, which claims priority to International Patent Application No. PCT/EP2020/060755, filed on Apr. 16, 2020, each of which is incorporated herein in its entirety by reference.
 | Number | Date | Country
---|---|---|---
Parent | PCT/EP2021/059529 | Apr 2021 | US
Child | 17966012 | | US