The present invention relates to the technical field of fault diagnosis, and in particular to a deep learning fault diagnosis method incorporating a priori knowledge.
The development of sensor technology has enabled enterprises to collect fault data conveniently, economically, and quickly. Because deep learning is well suited to extracting predictive features from fault data, its application to fault diagnosis has achieved remarkable results.
A fault diagnosis method based on deep learning mainly has the following steps:
Existing deep learning fault diagnosis methods are usually concrete implementations of the above six steps. Among these steps, the ones that most affect the performance of a deep learning fault diagnosis method are step 1), step 2), and step 3). The quality of the data obtained in step 1) directly affects the performance of the final model; suitable preprocessing in step 2) can further improve the quality of the data; and the model architecture designed in step 3) needs to suit the specific problem, since an improper design may lead to over-fitting or under-fitting.
Currently, the fault diagnosis methods based on the deep learning technology have two problems as follows:
The present invention provides a deep learning fault diagnosis method incorporating a priori knowledge, to overcome one or more defects of the prior art.
A deep learning fault diagnosis method incorporating a priori knowledge according to the present invention includes the following steps:
In the present invention, constructing the attention matrix A incorporates the a priori knowledge into the constructed model, so that the method extracts predictive features from the data in a targeted manner, that is, obtains more effective information from a small amount of data. The method can therefore improve the accuracy of fault diagnosis and is suited to application scenarios with a small amount of data. In addition, since incorporating a priori knowledge improves the interpretability of the whole model, the reliability of the fault diagnosis is also improved.
Preferably, in step S1, X={Xi|i=1, ..., M}, Xi∈R^(T×N), where Xi is the multi-attribute time series of the ith class of fault and records the values of the different attributes at different times, M is the total number of fault classes, T is the length of the time series, and N is the total number of attributes. In the present invention, a data set Xi can be created for each fault class, and the data can be obtained from actual production, which facilitates the subsequent model construction, training, and testing.
Preferably, in step S1,
X̃ij represents the jth sample set of the ith class of fault, D is the width of the sliding window, λ is the step size of the sliding window, and ⌊*⌋ is a rounding-down (floor) operation. Since the 2D-CNN is used as the basic deep learning architecture in the present invention, and the fault diagnosis data set X is multi-attribute time series data, the data set cannot be used directly as an input to the 2D-CNN; sliding window processing converts X into picture-like data that can be input to the 2D-CNN.
Preferably, the attention matrix corresponding to each sample set X̃ij is Aij, Aij∈R^(D×N); the attention matrix Aij is a 0-1 matrix and is used to characterize the a priori knowledge. Thus, the a priori knowledge can be better characterized.
Preferably, the attention matrix Aij for each sample set X̃ij is obtained through threshold characterization: for the jth sample set X̃ij of the ith class of fault, each element of X̃ij is examined one by one and denoted as 1 when its value reaches a threshold sufficient to indicate that the ith class of fault has occurred, and as 0 otherwise, which yields the corresponding attention matrix Aij. Thus, the a priori knowledge can be better characterized.
Preferably, in step S2, all the sample sets X̃ij are processed one by one by the 2D-CNN model while keeping the number of rows and the number of columns unchanged; the output for the sample set X̃ij is a feature map Fij, Fij=[Fij1, ..., FijK], where K is the number of channels of the 2D-CNN model. Thus, the final weight matrix can be better obtained.
Preferably, in step S2, for the sample set X̃ij,
Therefore, the corresponding outputs of the average pooling and the maximum pooling can be better obtained.
Preferably, in step S2, for the feature map Fij, a calculation formula of its weight matrix Wij is
Through the above, the model constructed in the present invention can better output the feature map incorporating the a priori knowledge, so as to better incorporate the a priori knowledge.
In order to further understand the content of the present invention, the present invention will be described in detail with reference to the accompanying drawings and the embodiments. It should be understood that the embodiments are only for explaining but not for limiting the present invention.
As shown in
In this embodiment, constructing the attention matrix A incorporates the a priori knowledge into the constructed model, so that the method extracts predictive features from the data in a targeted manner, that is, obtains more effective information from a small amount of data. The method can therefore improve the accuracy of fault diagnosis and is suited to application scenarios with a small amount of data. In addition, since incorporating a priori knowledge improves the interpretability of the whole model, the reliability of the fault diagnosis is also improved.
In addition, in this embodiment, the constructed weight matrix W is composed of the attention matrix A, the output P1 of the average pooling, and the output P2 of the maximum pooling. It therefore takes into account both the a priori knowledge (represented by the attention matrix A) and the features extracted by pooling, which makes the output results more interpretable.
In step S1 of this embodiment, X={Xi|i=1, ..., M}, Xi∈R^(T×N), where Xi is the multi-attribute time series of the ith class of fault and records the values of the different attributes at different times, M is the total number of fault classes, T is the length of the time series, and N is the total number of attributes. In this embodiment, a data set Xi can be created for each fault class, and the data can be obtained from actual production, which facilitates the subsequent model construction, training, and testing.
In step S1 of this embodiment,
X̃ij represents the jth sample set of the ith class of fault, D is the width of the sliding window, λ is the step size of the sliding window, and ⌊*⌋ is a rounding-down (floor) operation. Since the 2D-CNN is used as the basic deep learning architecture in this embodiment, and the fault diagnosis data set X is multi-attribute time series data, the data set cannot be used directly as an input to the 2D-CNN; sliding window processing converts X into picture-like data that can be input to the 2D-CNN.
Chinese patent No. “201811472378.6” discloses a method for performing sliding window processing on multi-attribute time series data, so the method is not repeated in this embodiment.
In this embodiment, the attention matrix corresponding to each sample set X̃ij is Aij, Aij∈R^(D×N); the attention matrix Aij is a 0-1 matrix and is used to characterize the a priori knowledge. Thus, the a priori knowledge can be better characterized.
In this embodiment, the attention matrix Aij for each sample set X̃ij is obtained through threshold characterization: for the jth sample set X̃ij of the ith class of fault, each element of X̃ij is examined one by one and denoted as 1 when its value reaches a threshold sufficient to indicate that the ith class of fault has occurred, and as 0 otherwise, which yields the corresponding attention matrix Aij. Thus, the a priori knowledge can be better characterized.
The a priori knowledge used for fault diagnosis in this embodiment is defined as a characteristic that reflects the mechanism of fault occurrence. For a given system, theory or experience can establish that a certain class of fault has occurred when the value of an observed variable reaches a certain threshold (rises to or above it, or falls to or below it). This threshold can be used to characterize the a priori knowledge.
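As an illustrative, non-limiting sketch, the threshold characterization described above might be written as follows in Python with NumPy. The function name `threshold_attention`, the per-attribute `upper` and `lower` thresholds, and the toy values are assumptions introduced only for illustration; the embodiment does not specify concrete thresholds.

```python
import numpy as np

def threshold_attention(window, upper=None, lower=None):
    """Return a 0-1 matrix with the same shape as `window` (D x N).

    `upper[n]` and `lower[n]` are hypothetical per-attribute fault thresholds;
    np.inf / -np.inf disable the check for an attribute.
    """
    window = np.asarray(window, dtype=float)
    n_attrs = window.shape[1]
    upper = np.full(n_attrs, np.inf) if upper is None else np.asarray(upper, dtype=float)
    lower = np.full(n_attrs, -np.inf) if lower is None else np.asarray(lower, dtype=float)
    # An element is marked 1 when it reaches the upper threshold
    # or falls to/below the lower threshold of its attribute.
    flagged = (window >= upper) | (window <= lower)
    return flagged.astype(np.uint8)

# Toy sample set: D = 3 time steps, N = 2 attributes (made-up values).
window = np.array([[0.2, 5.0],
                   [0.9, 4.0],
                   [1.1, 1.5]])
A = threshold_attention(window, upper=[1.0, np.inf], lower=[-np.inf, 2.0])
print(A)
```

In this toy case only the last row is marked, because attribute 0 reaches its upper threshold and attribute 1 falls below its lower threshold at the final time step.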
Certainly, other available a priori knowledge, such as time domain characteristics, frequency domain characteristics, and the stationarity of the time series, can also be used to construct the a priori knowledge in this embodiment.
In this embodiment, every attention matrix Aij contains only the values 0 and 1, that is, the attention matrix is a 0-1 matrix. This configuration marks the data area that the 2D-CNN needs to pay attention to, so that the a priori knowledge can be better incorporated.
In step S2 of this embodiment, all the sample sets X̃ij are processed one by one by the 2D-CNN model while keeping the number of rows and the number of columns unchanged; the output for the sample set X̃ij is a feature map Fij, Fij=[Fij1, ..., FijK], where K is the number of channels of the 2D-CNN model. Thus, the final weight matrix can be better obtained.
In this embodiment, the 2D-CNN model can be defined as having a convolution kernel size of 3×3 and a step size (stride) of 1.
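As a brief, non-limiting check of how a 3×3 kernel with a step size of 1 can keep the number of rows and columns unchanged, the standard convolution output-size relation can be evaluated as below. The zero padding of 1 on each side is an assumption; the embodiment states only that the size is preserved, not how. The function name `conv_output_size` is hypothetical.

```python
def conv_output_size(n, kernel=3, stride=1, padding=1):
    """Standard output length of a convolution along one axis."""
    return (n + 2 * padding - kernel) // stride + 1

# D = 10 rows and N = 52 columns are the sample dimensions used in the
# verification of this embodiment.
rows, cols = 10, 52
out_shape = (conv_output_size(rows), conv_output_size(cols))
print(out_shape)  # (10, 52)
```

Without padding, the same kernel would shrink each axis by 2, so some form of padding is needed for the feature map Fij to stay aligned with the D×N attention matrix Aij.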
In step S2 of this embodiment, for the sample set X̃ij,
Therefore, the corresponding outputs of the average pooling and the maximum pooling can be better obtained.
In step S2 of this embodiment, for the feature map Fij, a calculation formula of its weight matrix Wij is
Through the above, the model constructed in this embodiment can better output the feature map incorporating the a priori knowledge, so as to better incorporate the a priori knowledge.
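As an illustrative, non-limiting sketch of the attention mechanism described in this embodiment, the following Python code pools a feature map across its K channels, adds the two pooled outputs to the attention matrix, and applies the result to the feature map. Two points are assumptions rather than statements of the embodiment: that the pooling is taken across the channel dimension (so that P1 and P2 have the same D×N shape as Aij), and that the weight matrix is applied by element-wise multiplication. The function name `attention_layer` and the random inputs are hypothetical.

```python
import numpy as np

def attention_layer(F, A):
    """F: feature map of shape (K, D, N); A: 0-1 attention matrix of shape (D, N)."""
    F = np.asarray(F, dtype=float)
    A = np.asarray(A, dtype=float)
    P1 = F.mean(axis=0)   # average pooling across the K channels -> (D, N)
    P2 = F.max(axis=0)    # maximum pooling across the K channels -> (D, N)
    W = A + P1 + P2       # matrix addition of A, P1 and P2
    # Assumption: every channel is re-weighted element-wise by W.
    return W, W[None, :, :] * F

rng = np.random.default_rng(0)
F_demo = rng.normal(size=(4, 10, 52))          # K = 4 channels, D = 10, N = 52
A_demo = (rng.random((10, 52)) > 0.9).astype(float)
W_demo, out_demo = attention_layer(F_demo, A_demo)
print(W_demo.shape, out_demo.shape)
```

Positions where Aij equals 1 receive a weight that is larger by exactly 1 than it would otherwise be, which is one simple way for the a priori knowledge to steer the subsequent layers toward those positions.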
In addition, in this embodiment, after being output, the feature map F based on the attention mechanism can be processed through a fully connected layer to obtain an output result, and the model can be trained and tested based on a conventional method.
In this embodiment, through the incorporation of a priori knowledge, the deep learning technology can extract features from the data in a targeted manner, thereby making it applicable to fault diagnosis with a small amount of data; and in addition, the interpretability of deep learning is improved, thereby making it applicable to fault diagnosis with high reliability requirements.
In conclusion, this embodiment solves the problems of a small amount of fault diagnosis data and high reliability requirements by the deep learning technology incorporating the a priori knowledge, thereby further improving the practical application prospect of the deep learning technology in fault diagnosis.
To verify the method proposed in this embodiment, a Tennessee Eastman (TE) chemical process data set is used.
The Tennessee Eastman chemical process is a simulation process based on an operating process of a real chemical company. It includes 41 measured variables and 11 manipulated variables. In this embodiment, all 52 variables are used to construct the fault diagnosis model. The process includes 21 classes of fault states and one normal state. For each state, there is training set data and test set data acquired from the 52 variables at a frequency of one sample every 3 minutes. Each training set includes 500 samples (the first 20 are normal samples), and each test set includes 960 samples (the first 160 are normal samples).
In the verification of this embodiment, the method disclosed in Chinese patent No. “201811472378.6” is used to complete the data processing in step S1, with the width of the sliding window set to 10 and the step size of the sliding window set to 1, so that each data set is converted into a picture-like data set.
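As an illustrative, non-limiting sketch, a generic sliding window over a T×N series with the width and step size stated above might look as follows in Python. This is not the procedure of the cited Chinese patent, which is not reproduced here; the function name `sliding_windows`, the rule of keeping only complete windows, and the random stand-in data are assumptions.

```python
import numpy as np

def sliding_windows(X, width, step):
    """Cut a (T, N) series into overlapping picture-like samples of shape (width, N)."""
    X = np.asarray(X)
    T = X.shape[0]
    if T < width:
        return np.empty((0, width, X.shape[1]), dtype=X.dtype)
    # Assumption: only complete windows are kept.
    count = (T - width) // step + 1
    return np.stack([X[j * step : j * step + width] for j in range(count)])

# Random stand-in for one training set: T = 500 samples, N = 52 variables.
X_i = np.random.default_rng(0).normal(size=(500, 52))
samples = sliding_windows(X_i, width=10, step=1)
print(samples.shape)  # (491, 10, 52)
```

Under these assumptions a 500-sample training set yields 491 samples of size 10×52, each of which can be treated as a single-channel image by the 2D-CNN.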
In step S1 of this embodiment, a Pearson correlation coefficient can be used to obtain the attention matrix Aij. Details are as follows:
Step S1 can preferably be implemented after the above is completed. After completing step S1, the 2D-CNN model can be constructed.
The 2D-CNN model constructed in this embodiment sequentially includes an input layer, a convolutional layer 1 (Conv-1 #), a pooling layer 1 (MaxPool-1), a convolutional layer 2 (Conv-2 #), a pooling layer 2 (MaxPool-2), an attention mechanism layer 1 (Atten-1), a convolutional layer 3 (Conv-3 #), a pooling layer 3 (MaxPool-3), an attention mechanism layer 2 (Atten-2), a convolutional layer 4 (Conv-4 #), a pooling layer 4 (MaxPool-4), a fully connected layer (FC-1*), and an output layer (Softmax). Among them, the layers from the convolutional layer 1 (Conv-1 #) to the pooling layer 4 (MaxPool-4) constitute a feature extractor, and the fully connected layer (FC-1*) and the output layer (Softmax) constitute a classifier.
In this embodiment, output results of all the convolutional layers are processed by means of batch normalization (BN), which can better speed up the network training.
In this embodiment, the Dropout method is used to process the fully connected layer (FC-1*), so that overfitting can be better avoided. A dropout probability of neurons can be set to p=0.75.
Table 1 shows parameter settings of the 2D-CNN model constructed in this embodiment.
Herein, the hyperparameters of the 2D-CNN model are D=10 and r=0.07, where D is the width of the sliding window. Therefore, the 2D-CNN model in this embodiment can be expressed as M(10,0.07).
It can be learned from
In this embodiment, the inputs to the attention mechanism layer 1 (Atten-1) are the feature map output by the pooling layer 2 (MaxPool-2) and the attention matrix Aij corresponding to the sample set X̃ij. In the attention mechanism layer 1 (Atten-1), a matrix addition is performed on the average pooling output of the feature map Fij, the maximum pooling output of the feature map Fij, and the attention matrix Aij, and the result is output; the convolutional layer 3 (Conv-3 #) and the pooling layer 3 (MaxPool-3) then process the output of the attention mechanism layer 1 (Atten-1) to produce the corresponding feature map.
In this embodiment, the inputs to the attention mechanism layer 2 (Atten-2) are the feature map output by the pooling layer 3 (MaxPool-3) and the attention matrix Aij corresponding to the sample set X̃ij. In the attention mechanism layer 2 (Atten-2), a matrix addition is performed on the average pooling output of the input feature map, the maximum pooling output of the input feature map, and the attention matrix Aij, and the result is output; the feature maps incorporating the a priori knowledge are then output after processing by the convolutional layer 4 (Conv-4 #) and the pooling layer 4 (MaxPool-4).
In this embodiment, after processing by the fully connected layer (FC-1*) and the output layer (Softmax), the probability that the input data belongs to each state is output, thereby completing the fault diagnosis.
In this embodiment, after the 2D-CNN model is constructed, it is trained with the Adam algorithm; the learning rate can be set to 0.0001 and the batch size to 100. In addition, a mean square error (MSE) loss can be used as the objective function to be optimized.
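As an illustrative, non-limiting sketch of the objective described above, the following Python code computes a mean square error between Softmax outputs and one-hot state labels for one batch of 100 samples. Comparing the Softmax probabilities with one-hot labels is an assumption about how the MSE loss is applied; the random logits stand in for the output of the fully connected layer, and the function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mse_loss(probs, labels, num_classes):
    """Mean square error between predicted probabilities and one-hot labels."""
    one_hot = np.eye(num_classes)[labels]
    return float(np.mean((probs - one_hot) ** 2))

num_classes = 22                                   # 21 fault states plus the normal state
rng = np.random.default_rng(0)
logits = rng.normal(size=(100, num_classes))       # stand-in for one batch of FC-1* outputs
labels = rng.integers(0, num_classes, size=100)    # stand-in state labels
probs = softmax(logits)
loss = mse_loss(probs, labels, num_classes)
print(loss)
```

An optimizer such as Adam would then adjust the model parameters to reduce this value over successive batches.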
The change of the mean square error loss (MSE loss) in training and testing processes with the training times is shown in
The result of using the 2D-CNN model in this embodiment to operate on the test set is shown in
In addition, this embodiment provides the comparison results in terms of the fault diagnosis rate (FDR) and false positive rate (FPR) between the algorithm in this embodiment and the existing DL algorithm, EDBN-2 algorithm, MPLS algorithm, and PCA algorithm.
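As an illustrative, non-limiting sketch, the two metrics might be computed per fault class as follows in Python. The embodiment does not state the exact definitions used; the sketch assumes that FDR is the fraction of samples of a given fault class that are assigned to that class, and that FPR is the fraction of samples of all other states that are wrongly assigned to it. The function name `fdr_fpr` and the toy labels are hypothetical.

```python
import numpy as np

def fdr_fpr(y_true, y_pred, fault_class):
    """Per-class fault diagnosis rate and false positive rate (assumed definitions)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    is_fault = y_true == fault_class
    flagged = y_pred == fault_class
    tp = np.sum(is_fault & flagged)     # fault samples diagnosed correctly
    fn = np.sum(is_fault & ~flagged)    # fault samples missed
    fp = np.sum(~is_fault & flagged)    # other samples wrongly assigned to this fault
    tn = np.sum(~is_fault & ~flagged)   # other samples correctly not assigned to it
    fdr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return float(fdr), float(fpr)

# Toy labels: 0 = normal state, 1 = one fault class (made-up values).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 0, 1]
fdr, fpr = fdr_fpr(y_true, y_pred, fault_class=1)
print(fdr, fpr)  # 0.75 0.25
```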
It can be learned from Table 2 that the algorithm in this embodiment not only achieves a significant improvement in FDR and FPR on the general classes, but also accurately classifies the classes that were difficult to identify in previous studies (such as fault 3, fault 9, and fault 15). This result shows that the model proposed in this embodiment performs significantly better than the other fault diagnosis models compared.
The present invention and implementations thereof have been described above schematically, and the description is not restrictive; what is shown in the accompanying drawings is only one of the implementations of the present invention, and the actual structure is not limited thereto. Therefore, if those of ordinary skill in the art, inspired by it, design structural modes and embodiments similar to the technical solution without creative effort and without departing from the inventive purpose of the present invention, all such structural modes and embodiments shall fall within the scope of protection of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
202110266124.4 | Mar 2021 | CN | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2021/126687 | 10/27/2021 | WO |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2022/188425 | 9/15/2022 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
20210102197 | Sabeti | Apr 2021 | A1 |
20210334656 | Sjögren | Oct 2021 | A1 |
20220405480 | Huang | Dec 2022 | A1 |
Number | Date | Country |
---|---|---|
109814523 | May 2019 | CN |
Entry |
---|
Wikipedia CNN page retrieved from https://en.wikipedia.org/wiki/Convolutional_neural_network (Year: 2024). |
Number | Date | Country | |
---|---|---|---|
20240184678 A1 | Jun 2024 | US |