The following relates to a computer-implemented method for a combined machine learning model. Further, the following relates to a corresponding computer program product and technical system.
Currently there is a trend of digitalization in the industry domain. Hence, e.g., a manufacturing process for a product may be digitally controlled. The term “Industry 4.0” (in German: “Industrie 4.0”) is commonly used to describe several facets of this trend. It has not only become a major part of today's industry, but it also drives the business in many industrial domains.
Considering complex industrial plants, the industrial plants usually comprise distinct components, parts, modules or units with a multiplicity of individual functions. Exemplary units include sensors and actuators. The units and their functions have to be controlled and regulated in an interacting manner. They are often monitored, controlled and regulated by automation systems, for example the Simatic system of Siemens AG. The increasing degree of digitalization allows, e.g., the manufacturing or industrial installation of products in a production line of the industrial plant to be performed by robot units or other autonomous units in an automatic and thus efficient manner, at least in part.
A reliable and robust operation of the machinery is of importance in such industrial environments since any uncertainty, defect or anomaly often has a negative impact such as a decrease of quality and throughput. Severe anomalies such as sudden failures might even lead to downtimes of the industrial plant and hence increased costs, or even personal harm in the worst case.
Hence, the machine learning models deployed and operated in such industrial environments for, e.g., control, production or monitoring processes have to be “industrial-grade”. This means that the deployed machine learning models have to be reliable and robust, even though the conditions of their application scenario may change.
Therefore, it is desirable to predict failures or anomalies before they happen. Conventional art data-driven approaches in the context of machine learning are capable of learning to distinguish between normal and anomalous machine behavior based on previously collected sensor data from the machine (historic data). Once the algorithm has learned the intrinsic behavior, it can judge whether new data from the machine is normal or anomalous.
Supervised machine learning uses labelled data such as sensor data, wherein the label indicates in which condition the machine is. By contrast, unsupervised machine learning uses unlabeled data. The latter case is quite common today, since the machines do not fail that often.
The disadvantage of conventional art approaches, however, is that new knowledge or data, e.g., new labels from user feedback or another technical system, cannot be incorporated efficiently and reliably into the existing trained machine learning model. According to conventional art, the trained machine learning model can be retrained solely on the basis of the new data.
Retraining on the basis of the new data brings the risk of running into the problem of catastrophic interference. This means that the trained machine learning model forgets the previously learned data, resulting in unstable behavior. Alternatively, the trained machine learning model can be trained with all the available data (new and old data). This retraining approach is, however, slow and hence time-consuming. The retraining becomes slower the more data is used as input for training. Moreover, in general the old model is disregarded, and a completely separate new model is trained.
An aspect relates to a computer-implemented method for generating a combined machine learning model in an efficient and reliable manner.
This problem is solved, according to one aspect of the embodiment of the invention, by a computer-implemented method for generating a combined machine learning model, comprising the steps:
a. receiving a trained unsupervised machine learning model for anomaly detection;
b. applying the trained unsupervised machine learning model on unlabeled application data, resulting in at least one determined output label;
c. transmitting the at least one determined output label via a user interface to a user for verification;
d. maintaining or processing the at least one determined output label by the user;
e. training at least one additional machine learning model for anomaly detection in accordance with the at least one processed output label or at least one additional data item provided by the user;
f. combining the trained unsupervised machine learning model and the at least one trained additional machine learning model using a connection function; and providing the combined machine learning model as output.
Accordingly, the embodiment of the invention is directed to a computer-implemented method for generating a combined machine learning model.
First, a trained unsupervised machine learning model for anomaly detection is received, e.g., via an input interface. The trained unsupervised machine learning model can equally be referred to as “anomaly detector”, abbreviated AD in the following. In other words, the trained unsupervised machine learning model is configured for binary classification in the context of machine learning.
The unsupervised machine learning model is trained on training data before being applied on application data, as is usual in the context of machine learning. Accordingly, two distinct data sets are used, one for training and one for application.
Thereby, the training data and the application data each comprise data samples or data items, wherein the data samples are not tagged with one or more labels in the case of unsupervised machine learning. In other words, the data samples or data items are unlabeled. Exemplary data samples or data items are images, text files, videos etc.
In a further step, the unsupervised machine learning model is applied on the application data to detect anomalies. The application of the unsupervised machine learning model results in one or more output labels for each data item of the plurality of data items of the unlabeled application data. The output label can equally be referred to as class. The output label is an anomaly or a normal state. An anomaly can be any kind of faulty machine behavior that does not correspond to a normal or a healthy operation. There can be different types of anomalies or faults as well as different types of normal operation, e.g., normal mode 1, normal mode 2, etc. Referring to the example of an induction motor, there exist anomalies such as an imbalance or a broken bearing. The operation at a frequency of 45 Hz and the operation at a frequency of 60 Hz are two types of normal operation modes.
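For illustration only (not part of the claimed method), such an anomaly detector could be sketched with scikit-learn's IsolationForest; the toy data, variable names and the outlier value are assumptions:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Unlabeled training data X, e.g., sensor readings from normal operation.
X = rng.normal(loc=0.0, scale=1.0, size=(500, 3))
# Unlabeled application data: in-distribution samples plus one clear outlier.
application_data = np.vstack([rng.normal(size=(20, 3)), [[8.0, 8.0, 8.0]]])

ad = IsolationForest(random_state=0).fit(X)  # the anomaly detector AD
raw = ad.predict(application_data)           # +1 = normal, -1 = anomaly
output_labels = ["normal" if r == 1 else "anomaly" for r in raw]
```

Each data item of the application data thus receives one output label; the far-off last sample would typically be labelled as an anomaly.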
In an embodiment, the new data items correspond to a new unseen operation mode, therefore they are wrongly classified as an anomaly. The user can correct these incorrect output labels and send them back according to the present embodiment of the invention.
In other words, the class represents either an anomaly or a normal state. Alternatively, the anomaly can be any abnormal state, anomalous state, inconsistent state or erroneous state.
Alternatively, the output label is a first mode or a second mode, such as operating mode.
In further steps, the detected anomalies or output labels are transmitted via a user interface to a user for verification. The user can be a domain expert. After reception of the output labels, the user performs a verification for each received output label. The user decides whether the output label is maintained or processed.
In case of maintaining the at least one determined output label unprocessed, step e. is skipped. Hence, training at least one additional machine learning model for anomaly detection in accordance with the at least one processed output label or the at least one additional data item is not performed in this case.
In the other case of processing the at least one determined output label, step e. is performed using the at least one processed output label or at least one additional data item provided by the user. In this case, training at least one additional machine learning model for anomaly detection in accordance with the at least one processed output label or the at least one additional data item is performed.
The system is constantly evolving with arbitrary many additional machine learning models. The more feedback the system receives, such as correction of the labels and new data items etc., the more additional machine learning models are incorporated. In an embodiment, if the user changes several data items simultaneously and then sends the corrected data items back to the system, all of these data items are comprised in one additional machine learning model.
In the last steps, the trained unsupervised machine learning model and the at least one trained additional machine learning model are combined using a connection function and provided as output.
The present embodiment of the invention provides a computer-implemented method for generating a combined machine learning model efficiently and reliably.
Contrary to conventional art, the combined machine learning model combines the original unsupervised machine learning model (“old”), trained on the original training data, with the additional machine learning model (“new”), trained on the additional or new training data, e.g., an additional data item. Catastrophic interference or catastrophic forgetting, as results from conventional retraining solely on the basis of new training data, is avoided.
Moreover, the user interaction allows the system to interact with the user in order to consolidate, validate or verify the result in the form of output labels. This way, new classes or labels can be incorporated and learned, and wrongly or badly labelled data can be corrected by the user.
The combined machine learning model is significantly improved and leads to an improved prediction result in the sense of anomaly detection after deployment, e.g., on the industrial plant. For example, anomalies are detected sooner and more reliably. This leads to less downtime and fewer injuries.
The combined machine learning model can be deployed. Moreover, the system is thought to be constantly evolving and can change in the future to include several combined machine learning models. The advantage of the present embodiment of the invention is to quickly start with a small and simple system that then grows over time the more the model learns, and the more data is acquired.
In one aspect, the processing comprises at least one processing step, selected from the group comprising: Adapting the determined at least one output label. Accordingly, the at least one output label can be adapted or changed, e.g., corrected by the user. The output label “anomaly” can be changed to the output label “normal state”. The output label “normal state” can be changed to the output label “anomaly”. The adaptation moreover comprises, e.g., the deletion of the output label or the extension of the output label.
In a further aspect, the at least one additional machine learning model is an unsupervised or supervised machine learning model. Accordingly, any machine learning model can be used as an additional machine learning model and combined with the original provided unsupervised machine learning model in a flexible and reliable manner.
In a further aspect, the connection function is a logical function or logical operator, such as an AND or an OR logical operator.
In a further aspect, the connection function is a weighted function.
A further aspect of the embodiment of the invention is a computer program product (non-transitory computer readable storage medium having instructions, which when executed by a processor, perform actions) directly loadable into an internal memory of a computer, comprising software code portions for performing the aforementioned method steps when said computer program product is running on a computer.
A further aspect of the embodiment of the invention is a technical system for performing the aforementioned method steps.
Some of the embodiments will be described in detail, with reference to the following figures, wherein like designations denote like members, wherein:
Generating a combined machine learning model 10
The provided unsupervised machine learning model 20, denoted with f(X) and AD for anomaly detector, is trained based on unlabeled training data X before being applied on the application data S1. The training data and the application data each comprise data items. Then, the unsupervised machine learning model 20 is applied on the application data resulting in output labels 22, S2. The unsupervised machine learning model can, e.g., use a regression or a classification algorithm (outputting, e.g., a score or a probability).
According to an embodiment, the output label can be determined with f(X)<0 for the anomaly and with f(X)>0 for the normal state. Alternatively, the output label can be determined with 0<f(X)<1, thereby f(X)<0.5 corresponds to the anomaly and f(X)>0.5 corresponds to the normal state.
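These two thresholding conventions can be sketched as follows (function name and defaults are assumptions for illustration):

```python
def label_from_score(score: float, threshold: float = 0.0) -> str:
    """Map a raw detector output f(X) to an output label.

    Convention 1: f(X) < 0 -> anomaly, f(X) > 0 -> normal (threshold 0).
    Convention 2: scores in (0, 1), f(X) < 0.5 -> anomaly (threshold 0.5).
    """
    return "anomaly" if score < threshold else "normal"
```

For example, label_from_score(-0.3) yields "anomaly", while label_from_score(0.7, threshold=0.5) yields "normal".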
Exemplary machine learning models are Neural Network, Autoencoder, Random Forest, Gradient Boosting and Logistic Regression etc.
The output labels 22 are outputted to a user 30 for interaction S3. In an embodiment, a domain expert 30 is assigned for verification.
The user 30 can decide to maintain the output label 22 or to process the output label 22, S4. In case of maintenance, the trained unsupervised machine learning model 20 remains constant and no additional machine learning model 40 is trained. In the other case, e.g., an additional data item or an additional output label 32 can be added and provided by the user 30. Alternatively, the output label 22 can be corrected etc. Hence, in this other case, the original unsupervised machine learning model 20 gets extended by at least one additional machine learning model 40, denoted with gi(Z) with i∈I={1, . . . , k} where k is the number of additional models.
Thereby, the at least one additional machine learning model 40 is trained on the input data Z received from the user S4, S5. The trained unsupervised machine learning model 20 remains unchanged. The at least one additional machine learning model 40 can be designed as supervised or unsupervised machine learning model e.g., depending on the use case, technical system or the available data.
The trained unsupervised machine learning model 20 and the at least one trained additional machine learning model 40 are combined using a connection function S6, denoted with h(f(X), g(Z)). In case of a plurality of additional machine learning models 40, g(Z) = g(gi(Z), i ∈ I) is an ensemble of additional machine learning models. The combined machine learning model 10 is provided as output.
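A minimal sketch of this combination (class and parameter names are assumptions, with toy threshold detectors standing in for f and the gi):

```python
from typing import Callable, List

class CombinedModel:
    """Combines the unchanged original model f with additional models g_i."""

    def __init__(self, f: Callable, extra: List[Callable], connect: Callable):
        self.f = f              # trained unsupervised model, kept unchanged
        self.extra = extra      # additional models g_i, i in I = {1, ..., k}
        self.connect = connect  # connection function h(f(X), g(Z))

    def predict(self, y):
        g_outputs = [g(y) for g in self.extra]
        return self.connect(self.f(y), g_outputs)

# One possible connection function: anomaly only if all detectors agree.
model = CombinedModel(
    f=lambda y: y > 3.0,
    extra=[lambda y: y > 2.0],
    connect=lambda f_out, g_outs: f_out and all(g_outs),
)
```

Here model.predict(5.0) reports an anomaly (True), while model.predict(2.5) does not.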
This solution applies if either new data Z with new labels becomes available or new labels for the old data X become available.
The new labels can either become available through new information or if the user adds their own labels to the available data. It is assumed that a new dataset Z is available together with new labels L. If multiple models are learned from a stream of incoming labels, e.g., labels L1 with |L1| = l1 are available for retraining at a given timepoint and later labels L2 with |L2| = l2 become available, multiple models g1 and g2 are learned.
Given labels Li and corresponding data, a supervised algorithm gi is learned. Therefore, any supervised learning algorithm suitable for classification or regression, e.g., a Neural Network Classifier, a Random Forest Classifier, a Gradient Boosted Tree classifier etc. can be used.
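As a sketch (the toy data, labels and names are assumptions), such an additional supervised model gi could be trained on the newly labelled data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
# New dataset Z with user-provided labels L (0 = normal, 1 = anomaly).
Z = np.vstack([rng.normal(0.0, 1.0, size=(30, 3)),
               rng.normal(5.0, 1.0, size=(30, 3))])
L = np.array([0] * 30 + [1] * 30)

# Additional model g1, trained only on the newly labelled data Z.
g1 = RandomForestClassifier(random_state=0).fit(Z, L)
```

The trained g1 can then classify new data items, e.g., g1.predict([[5.0, 5.0, 5.0]]).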
In case an intrinsically interpretable model (e.g., logistic regression, decision tree) is chosen, the additional machine learning model gi can be used to make the result more interpretable. Given the model f and the newly trained model gi, it must be decided how to aggregate their decisions given new data Y, especially if gi(Y) ≠ f(Y). This is achieved by the connection function, which can either be static (see the following section) or a function depending on the number of labels li available (the labels used to train g1(Y)) (see section “Connection function is a weighted function”).
Connection function is a logical function (static)
Add a non-anomaly/normal
A certain situation (that was detected as an anomaly before) is treated as non-anomaly/normal.
All past non-anomalies should remain non-anomalies. More cannot be expected, since it is the anomaly behavior that changes.
The unsupervised machine learning model AD1/f(X) is, e.g., trained with the previous non-anomalies.
The additional machine learning model AD2/g(Z) is trained only with the to-be-added non-anomaly.
The connection function h(f(X), g(Z)) = AD1 and AD2 is used, wherein an anomaly is present only if both detectors detect the anomaly and hence the constraint is fulfilled.
Add an anomaly
A certain situation (that was detected as a non-anomaly/normal before) is treated as an anomaly.
All past anomalies should remain anomalies. More cannot be expected, since it is the non-anomaly/normal behavior that changes.
The unsupervised machine learning model AD1/f(X) is trained with the previous non-anomalies.
The additional machine learning model AD2/g(Z) is trained only with the to-be-added anomaly (note: inverted behavior).
The connection function h(f(X), g(Z)) = AD1 or not AD2 is used.
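Both static connection functions can be written as simple boolean combinations (a sketch; each input is assumed to be True when the respective detector flags an anomaly):

```python
def add_non_anomaly(ad1: bool, ad2: bool) -> bool:
    # "Add a non-anomaly": h = AD1 and AD2 -- an anomaly is reported
    # only if both detectors detect it, so past non-anomalies remain
    # non-anomalies and the newly added normal situation is suppressed.
    return ad1 and ad2

def add_anomaly(ad1: bool, ad2: bool) -> bool:
    # "Add an anomaly": h = AD1 or not AD2 -- AD2 is trained only on the
    # to-be-added anomaly (inverted behavior), so past anomalies remain
    # anomalies and the new situation is additionally reported.
    return ad1 or not ad2
```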
If not only the binary anomaly information is requested, but also an anomaly score, the score can be updated as follows:
If both models indicate an anomaly, the combined score is the minimum of both, i.e., min(SCORE1, SCORE2). If only one of the models indicates an anomaly, the score of this model is the combined score. If both models indicate no anomaly, the maximum of both is taken.
It is assumed that positive values indicate the degree of anomaly, with zero or negative values being the case of “no anomaly”. If this is inverted and negative values indicate the degree of anomaly, a simple multiplicative inversion is applied to the score before and after application of the above rules.
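The score update rules above can be sketched as follows (assuming positive scores indicate the degree of anomaly):

```python
def combined_score(score1: float, score2: float) -> float:
    """Combine two anomaly scores (positive = anomalous, <= 0 = normal)."""
    anomalous1, anomalous2 = score1 > 0, score2 > 0
    if anomalous1 and anomalous2:
        return min(score1, score2)   # both anomalous: take the minimum
    if anomalous1:
        return score1                # only model 1 flags an anomaly
    if anomalous2:
        return score2                # only model 2 flags an anomaly
    return max(score1, score2)       # neither anomalous: take the maximum
```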
Connection function is a weighted function (evolving depending on the new labels)
The connection function can be a function depending on the number of labels li available (the labels used to train g1(Y)).
In the simplest case only one additional model (i.e., l1 = l) is considered:
h(Y) = α(l)·q(f(Y)) + β(l)·g1(Y),
where q(f(Y)) maps the outcomes of the original model f(Y) into the same outcome space as the new model g1(Y), and where α(l) and β(l) are the corresponding weights in the connection function. As an example, if f(Y) outputs a score in ℝ and g1 outputs a probability, choosing q as a sigmoid mapping to (0, 1) would be appropriate, but this depends on the concrete algorithms f(Y) and g1(Y).
A practical example would be α(l) = exp(−ζl) and β(l) = 1 − exp(−ζl) for ζ > 0. For the special case ζ = 0.05, these weights as a function of the number of labels l are shown in the figures.
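With these example weights, the weighted connection function can be sketched as follows (names are assumptions; f_out is assumed to be already mapped by q into the outcome space of g1):

```python
import math

def alpha(l: int, zeta: float = 0.05) -> float:
    # Weight of the original model f; decays as more labels l arrive.
    return math.exp(-zeta * l)

def beta(l: int, zeta: float = 0.05) -> float:
    # Weight of the additional model g1; alpha(l) + beta(l) == 1.
    return 1.0 - math.exp(-zeta * l)

def h(f_out: float, g1_out: float, l: int, zeta: float = 0.05) -> float:
    # Weighted connection: with no labels the original model decides;
    # with many labels the supervised model g1 gradually takes over.
    return alpha(l, zeta) * f_out + beta(l, zeta) * g1_out
```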
In the case of more than one classifier (i.e., k > 1), h(Y) is defined as a weighted average of q(f(Y)) and the individual additional models gi(Y), with the weights depending on the respective numbers of labels li.
Given these definitions, the original model f(X) gets overruled as a function of the available labels. If f(X) is an unsupervised algorithm, it turns gradually into a supervised one, which can be seen in the figures.
Although the present invention has been disclosed in the form of embodiments and variations thereon, it will be understood that numerous additional modifications and variations could be made thereto without departing from the scope of the invention.
For the sake of clarity, it is to be understood that the use of “a” or “an” throughout this application does not exclude a plurality, and “comprising” does not exclude other steps or elements.
| Number | Date | Country | Kind |
|---|---|---|---|
| 21187698.2 | Jul 2021 | EP | regional |
This application claims priority to PCT Application No. PCT/EP2022/068836, having a filing date of Jul. 7, 2022, which claims priority to EP Application No. 21187698.2, having a filing date of Jul. 26, 2021, the entire contents both of which are hereby incorporated by reference.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/EP2022/068836 | 7/7/2022 | WO |