The invention belongs to the field of health management and prediction technology for aero-engines, and relates to a deep learning modeling method based on a multi-scale hybrid attention mechanism for aero-engine remaining useful life prediction.
As an important component of an aircraft, the safety and reliability of the aero-engine are crucial. However, because most of its parts work for long periods in harsh environments involving high temperature, high pressure, and high-speed rotation, the probability of aero-engine failure is high. As usage time increases, each component gradually ages and the failure rate rises, seriously affecting the safe operation of the aircraft. Traditional aero-engine maintenance methods are mainly divided into scheduled maintenance and corrective (post-failure) maintenance, which often lead to two situations: "over-repair" and "under-repair". This not only causes serious waste of resources, but also fails to eliminate potential safety hazards of aero-engines. An effective way to solve this problem is to build a data-driven machine learning or deep learning model on the historical sensor data of aero-engines to predict their remaining useful life. Such predictions provide decision-making support for ground systems, assist ground maintenance personnel in engine maintenance work, ensure aircraft safety, and avoid the waste of manpower and material resources caused by excessive maintenance.
At present, there are several methods for aero-engine remaining useful life prediction:
The first is a prediction method based on convolutional neural networks. In this method, samples are constructed by sliding a time window over the historical sensor data of the aero-engine, features are extracted by a convolutional neural network, and the remaining useful life is finally predicted through a fully connected layer. A convolutional neural network is a feedforward neural network based on convolution operations and is inspired by the receptive field mechanism in biology. It has translation invariance, makes maximal use of local information through its convolution kernels, and preserves the planar structure of the data. However, across all time steps of the historical sensor data, the receptive field is limited by the size of the convolution kernel, so the network cannot mine the correlation between two groups of data that are far apart in the time dimension, and its prediction ability is therefore limited.
The second is a prediction method based on long short-term memory neural networks. This method also uses a sliding time window to construct samples from the aero-engine historical sensor data, extracts features through a long short-term memory neural network, and finally predicts the remaining useful life through a fully connected layer. Long short-term memory neural networks control the flow and forgetting of historical information by introducing a gating mechanism, which alleviates the long-term dependence problem of traditional recurrent neural networks. Although the long short-term memory neural network can make full use of temporal information, the information at each time step is processed serially, so parallelism is poor and training and prediction take a long time. In addition, because the weight of each time step is not considered, there is more redundant information, which ultimately limits the prediction ability.
Based on the above discussion, the multi-scale hybrid attention mechanism deep learning model designed by the invention is capable of accurately predicting the remaining useful life of aero-engines from coupled time series data. This patent is supported by the China Postdoctoral Science Foundation (2022TQ0179) and the National Key Research and Development Program (2022YFF0610900).
The invention provides a multi-scale hybrid attention mechanism model to overcome the limitations of convolutional neural networks and long short-term memory neural networks in predicting the remaining useful life of aero-engines, and achieves better prediction accuracy. Because aero-engines are highly complex and precise aerodynamic-thermodynamic mechanical systems, the time series data generated by their sensors exhibit strong temporal correlation, coupling, and multimodal characteristics. Predicting the remaining useful life of aero-engines in a constantly changing full-envelope environment has therefore always been a challenging problem.
In order to achieve the above objectives, the technical solution adopted by the invention is:
A multi-scale hybrid attention mechanism modeling method for aero-engine remaining useful life prediction (the method flowchart is shown in the accompanying drawings).
The specific steps are as follows:
wherein x is the original time series data generated by the various sensors of the aero-engine, μ is the mean of the original time series data, δ is the standard deviation of the original time series data, and z is the standardized time series data.
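As an illustrative sketch of this standardization step (assuming NumPy arrays and that the per-sensor statistics are computed on the training data, which the text does not state explicitly; the function name is hypothetical):

```python
import numpy as np

def z_score_standardize(train_data, test_data):
    """Column-wise Z-Score standardization; statistics are taken from the training data only (assumption)."""
    mu = train_data.mean(axis=0)            # per-sensor mean
    sigma = train_data.std(axis=0) + 1e-8   # per-sensor standard deviation; epsilon avoids division by zero
    return (train_data - mu) / sigma, (test_data - mu) / sigma
```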
For the last row of the sample X ∈ ℝ^(n×k) constructed in step 1.3 (i.e. the n-th row), selecting the smaller of the difference between the total number of flight cycles Cycle_total and the current flight cycle number Cycle_cur, on the one hand, and the remaining useful life threshold RUL_TH, on the other, and calculating the remaining useful life label:

RUL_label = min(Cycle_total − Cycle_cur, RUL_TH)

RUL_label is taken as the true value of the remaining useful life of the sample and will be used during training in step 4.
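A minimal sketch of this piecewise label rule (the default threshold of 125 is the value used later in the FD001 embodiment; the function name is illustrative):

```python
def rul_label(cycle_total, cycle_cur, rul_th=125):
    """RUL label for a sample: the smaller of the remaining cycles and the threshold RUL_TH."""
    return min(cycle_total - cycle_cur, rul_th)
```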
The network structure of the multi-scale hybrid attention mechanism model (the network structure diagram is shown in the accompanying drawings) is constructed as follows:
Firstly, mapping the sample X ∈ ℝ^(n×k) to a higher-dimensional space Y ∈ ℝ^(n×d) through a linear layer so that the data dimension d can be evenly divided by the number H of subsequent attention heads:

Y = X·W_Y

wherein W_Y ∈ ℝ^(k×d) is a trainable projection matrix;
Then, adding sine-cosine position encoding to obtain U ∈ ℝ^(n×d) as the input of step 3.2, wherein the values of the position encoding matrix P ∈ ℝ^(n×d) are as follows:

P_(i,2j) = sin(i / 10000^(2j/d)),  P_(i,2j+1) = cos(i / 10000^(2j/d)),  U = Y + P

wherein P_(i,2j) is the value of the 2j-th column in the i-th row of the encoding matrix P; P_(i,2j+1) is the value of the (2j+1)-th column in the i-th row of the encoding matrix P; i ∈ [0, n−1] indicates the row index; and j ∈ [0, d/2−1] indicates the column index;
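A possible PyTorch reading of this embedding stage (class and parameter names are illustrative, d is assumed to be even, and the standard Transformer sine-cosine encoding is assumed):

```python
import math
import torch
import torch.nn as nn

class InputEmbedding(nn.Module):
    """Linear projection of the sample X (n x k) to Y (n x d) plus sine-cosine position encoding."""
    def __init__(self, k, d, n):
        super().__init__()
        self.proj = nn.Linear(k, d)                # trainable projection W_Y
        pe = torch.zeros(n, d)
        pos = torch.arange(n, dtype=torch.float32).unsqueeze(1)
        div = torch.exp(torch.arange(0, d, 2, dtype=torch.float32) * (-math.log(10000.0) / d))
        pe[:, 0::2] = torch.sin(pos * div)         # P[i, 2j]   = sin(i / 10000^(2j/d))
        pe[:, 1::2] = torch.cos(pos * div)         # P[i, 2j+1] = cos(i / 10000^(2j/d))
        self.register_buffer("pe", pe)

    def forward(self, x):                           # x: (batch, n, k)
        return self.proj(x) + self.pe               # U = Y + P
```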
The feature extraction layer comprises two parts: a multi-head hybrid attention mechanism and a multi-scale convolutional neural network, and a residual connection and layer normalization are added at the end of each of the two parts to suppress overfitting; the multi-head hybrid attention mechanism is formed by mixing a multi-head self attention mechanism and a multi-head external attention mechanism.
Firstly, mapping the result U ∈ ℝ^(n×d) obtained in step 3.1 as input to 3 subspaces, query Q, key K and value V, through linear layers:

Q = U·W_Q,  K = U·W_K,  V = U·W_V

wherein W_Q, W_K, W_V ∈ ℝ^(d×d) are trainable projection matrices. Then splitting Q, K and V along the feature dimension into H attention heads, wherein Q_i, K_i, V_i ∈ ℝ^(n×(d/H)) are the query, key and value of the i-th attention head;
Then, conducting a dot product operation between the query Q_i and the key K_i in each attention head, scaling the result by dividing by the square root of the data dimension d, applying exponential (softmax) normalization by column, and then multiplying by the value V_i to obtain the result of a single attention head:

SelfAttention(Q_i, K_i, V_i) = softmax(Q_i·K_i^T / √d)·V_i
Finally, concatenating the results of the attention heads to obtain the final result MultiHeadSelfAttention, so as to realize feature extraction of the correlation among data at different time steps by the multi-head self attention mechanism:

MultiHeadSelfAttention = Concat(head_1, …, head_H)·W_O

wherein head_i = SelfAttention(Q_i, K_i, V_i), and W_O ∈ ℝ^(d×d) is a trainable projection matrix;
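A non-authoritative PyTorch sketch of the multi-head self attention described above (scaling by √d as stated in the text; the softmax is applied over the key dimension, which is the usual implementation of the normalization step):

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Q, K, V from U, scaled dot-product attention per head, concatenation, output projection."""
    def __init__(self, d, heads):
        super().__init__()
        assert d % heads == 0
        self.h, self.dk = heads, d // heads
        self.wq, self.wk, self.wv = nn.Linear(d, d), nn.Linear(d, d), nn.Linear(d, d)
        self.wo = nn.Linear(d, d)
        self.scale = d ** 0.5                       # the description scales by sqrt(d)

    def forward(self, u):                           # u: (batch, n, d)
        b, n, d = u.shape
        def split(t):                               # -> (batch, heads, n, d/H)
            return t.view(b, n, self.h, self.dk).transpose(1, 2)
        q, k, v = split(self.wq(u)), split(self.wk(u)), split(self.wv(u))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.scale, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)   # concatenate the heads
        return self.wo(out)
```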
Firstly, mapping the result U ∈ ℝ^(n×d) obtained in step 3.1 as input to the query subspace Q through a linear layer:

Q = U·W_Q

wherein W_Q ∈ ℝ^(d×d) is a trainable projection matrix. Then splitting Q into H attention heads, wherein Q_i ∈ ℝ^(n×(d/H)) is the query of the i-th attention head;
Then, conducting a dot product operation between the query Q_i and the external key memory unit M_k in each attention head, carrying out normalization, and then multiplying by the external value memory unit M_v to obtain the result of a single attention head:

ExternalAttention(Q_i) = Norm(Q_i·M_k^T)·M_v
wherein the normalization is a double normalization, that is, first carrying out exponential (softmax) normalization by column and then carrying out normalization by row:

α̂_(i,j) = exp(α̃_(i,j)) / Σ_k exp(α̃_(k,j)),   α_(i,j) = α̂_(i,j) / Σ_k α̂_(i,k)

wherein α̃_(i,j) is the value in row i and column j of the original attention map, α̂_(i,j) is the intermediate value after column-wise exponential normalization, and α_(i,j) is the value in row i and column j of the doubly normalized attention map.
Finally, concatenating the results of the attention heads to obtain the final result MultiHeadExternalAttention, so as to realize feature extraction of the correlation among data at different time steps by the multi-head external attention mechanism:

MultiHeadExternalAttention = Concat(head_1, …, head_H)·W_O

wherein head_i = ExternalAttention(Q_i), and W_O ∈ ℝ^(d×d) is a trainable projection matrix;
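A sketch of the multi-head external attention under the same assumptions; the size of the external memory units (mem_size) is not specified in the text and is chosen arbitrarily here:

```python
import torch
import torch.nn as nn

class MultiHeadExternalAttention(nn.Module):
    """External attention with shared key/value memory units and double normalization."""
    def __init__(self, d, heads, mem_size=64):
        super().__init__()
        assert d % heads == 0
        self.h, self.dk = heads, d // heads
        self.wq = nn.Linear(d, d)
        # External memory units shared across the whole dataset (one per head); mem_size is an assumption.
        self.mk = nn.Parameter(torch.randn(self.h, mem_size, self.dk) * 0.02)
        self.mv = nn.Parameter(torch.randn(self.h, mem_size, self.dk) * 0.02)
        self.wo = nn.Linear(d, d)

    def forward(self, u):                            # u: (batch, n, d)
        b, n, d = u.shape
        q = self.wq(u).view(b, n, self.h, self.dk).transpose(1, 2)   # (b, H, n, d/H)
        attn = q @ self.mk.transpose(-2, -1)          # (b, H, n, S): dot product with external keys
        attn = torch.softmax(attn, dim=-2)            # exponential normalization by column
        attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-9)        # second normalization by row
        out = (attn @ self.mv).transpose(1, 2).reshape(b, n, d)      # multiply by external values, concat heads
        return self.wo(out)
```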
{circle around (3)} The multi-head hybrid attention mechanism is formed by mixing the multi-head self attention mechanism and the multi-head external attention mechanism. Unlike a traditional single attention mechanism, the multi-head hybrid attention mechanism combines two different attention mechanisms: it retains the excellent temporal-correlation feature extraction ability of the self attention mechanism for single-sample data, and, thanks to the external key memory units and external value memory units shared across the entire dataset, it also takes the correlation between different samples into account, improving the ability of the attention mechanism to summarize temporal data.
Firstly, setting a trainable parameter α ∈ ℝ^(1×2), wherein α = [α_1, α_2] with initial values of 1; then carrying out exponential (softmax) normalization of α; and finally using the normalized parameter to conduct a weighted summation of the feature MultiHeadSelfAttention extracted by the multi-head self attention mechanism and the feature MultiHeadExternalAttention extracted by the multi-head external attention mechanism to obtain the final result HybridAttention:

HybridAttention = α_1·MultiHeadSelfAttention + α_2·MultiHeadExternalAttention
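Reusing the two modules sketched above, the hybrid fusion with the learnable parameter α could look like the following (illustrative only; it assumes the MultiHeadSelfAttention and MultiHeadExternalAttention classes from the previous sketches are in scope):

```python
import torch
import torch.nn as nn

class HybridAttention(nn.Module):
    """Weighted fusion of the self-attention and external-attention features with a learnable alpha."""
    def __init__(self, d, heads):
        super().__init__()
        self.self_attn = MultiHeadSelfAttention(d, heads)
        self.ext_attn = MultiHeadExternalAttention(d, heads)
        self.alpha = nn.Parameter(torch.ones(2))     # alpha = [alpha1, alpha2], initialised to 1

    def forward(self, u):                            # u: (batch, n, d)
        w = torch.softmax(self.alpha, dim=0)         # exponential normalization of alpha
        return w[0] * self.self_attn(u) + w[1] * self.ext_attn(u)
```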
{circle around (4)} The multi-scale convolutional neural network does not contain a pooling layer or a fully connected layer, only uses multiple convolutional kernels of different sizes for feature extraction of time series data, and integrates the results to enhance the local feature extraction capability of the data;
Taking the feature HybridAttention extracted by the multi-head hybrid attention mechanism as input, firstly, using three convolutional kernels of different sizes (1×1, 1×3 and 1×5) to extract features respectively; then, setting a learnable parameter β ∈ ℝ^(1×3) with initial values of 1, which will be subjected to gradient updates during training in step 4; carrying out exponential (softmax) normalization of β; and finally using the normalized parameter to conduct a weighted summation of the features extracted by the three convolutional kernels to obtain the final result MultiScaleConv:

MultiScaleConv = β_1·Conv_(1×1)(HybridAttention) + β_2·Conv_(1×3)(HybridAttention) + β_3·Conv_(1×5)(HybridAttention)

wherein Conv_(1×k) denotes the convolution operation with a kernel of size 1×k;
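One way to realize the 1×1, 1×3 and 1×5 kernels is as 1-D convolutions along the time axis with the feature dimension treated as channels; this interpretation is an assumption, and the sketch below reflects it:

```python
import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    """Three 1-D convolutions (kernel sizes 1, 3, 5) fused by a learnable, softmax-normalized beta."""
    def __init__(self, d):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(d, d, kernel_size=k, padding=k // 2) for k in (1, 3, 5)]
        )
        self.beta = nn.Parameter(torch.ones(3))      # beta initialised to 1, updated by gradient descent

    def forward(self, x):                            # x: (batch, n, d)
        x = x.transpose(1, 2)                        # Conv1d expects (batch, channels, length)
        w = torch.softmax(self.beta, dim=0)
        out = sum(w[i] * conv(x) for i, conv in enumerate(self.convs))
        return out.transpose(1, 2)                   # back to (batch, n, d)
```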
Firstly, expanding (flattening) the result MultiScaleConv ∈ ℝ^(n×d) obtained in step 3.2 into F ∈ ℝ^(1×(n·d)), and then computing the result through a two-layer fully connected neural network to obtain the predicted value RUL of the remaining useful life of the aero-engine:

RUL = Relu(F·W_1 + b_1)·W_2 + b_2

wherein W_1 ∈ ℝ^((n·d)×d_1) is the projection matrix of the first fully connected layer, b_1 ∈ ℝ^(1×d_1) is the bias of the first fully connected layer, W_2 ∈ ℝ^(d_1×1) is the projection matrix of the second fully connected layer, b_2 ∈ ℝ^(1×1) is the bias of the second fully connected layer, d_1 is the hidden dimension of the fully connected network, the projection matrices and the biases are all trainable, and Relu is the activation function;
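A sketch of this prediction layer (the hidden dimension d1 = 64 is taken from the FD001 embodiment described later and is otherwise an assumption):

```python
import torch
import torch.nn as nn

class PredictionHead(nn.Module):
    """Flatten the (n x d) feature map and regress RUL with a two-layer fully connected network."""
    def __init__(self, n, d, d1=64):
        super().__init__()
        self.fc1 = nn.Linear(n * d, d1)              # W1: (n*d) x d1, bias b1: 1 x d1
        self.fc2 = nn.Linear(d1, 1)                  # W2: d1 x 1,    bias b2: 1 x 1
        self.relu = nn.ReLU()

    def forward(self, f):                            # f: (batch, n, d)
        f = f.flatten(start_dim=1)                   # F: (batch, n*d)
        return self.fc2(self.relu(self.fc1(f)))      # RUL = Relu(F W1 + b1) W2 + b2
```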
Gradually reducing the difference between the predicted value RUL output by the model and the true value of the remaining useful life by minimizing the loss function until the stopping criterion is reached, wherein the true value is the RUL label RUL_label set in step 2, and the loss function is the mean square error (MSE) loss function:

MSE = (1/n)·Σ_(i=1)^(n) (RUL_i − RUL̂_i)²

wherein n is the number of samples, RUL_i is the true value of the remaining useful life of the i-th sample, and RUL̂_i is the predicted value of the remaining useful life of the i-th sample;
Firstly, inputting the samples obtained in step 1.3 into the multi-scale hybrid attention mechanism model constructed in step 3 in batches to obtain the predicted value RUL; then calculating the MSE loss value and conducting a gradient update of the model using the adaptive moment estimation (Adam) optimizer to complete one training iteration; and setting the total number of training iterations and iteratively training the model accordingly;
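An illustrative training loop with the Adam optimizer and MSE loss (the hyperparameter defaults correspond to the FD001 embodiment; whether "iterations" means epochs or batches is not fully specified in the text, and epochs are assumed here):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train(model, samples, labels, epochs=50, batch_size=128, lr=3e-4, device="cpu"):
    """MSE training of the RUL model with the Adam (adaptive moment estimation) optimizer."""
    loader = DataLoader(TensorDataset(samples, labels), batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    model.to(device).train()
    for epoch in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            loss = criterion(model(x).squeeze(-1), y)   # predicted RUL vs. RUL label
            optimizer.zero_grad()
            loss.backward()                             # gradient update of all trainable parameters
            optimizer.step()
    return model
```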
At the online testing stage, the real-time data collected by the aero-engine sensors are preprocessed as in step 1 and input into the multi-scale hybrid attention mechanism model trained in step 4, and the output value of the model is the predicted value of the remaining useful life of the aero-engine.
The multi-scale hybrid attention mechanism model adopted by the invention fully considers the natural coupling and mutual influence between aero-engine data. Firstly, the self attention mechanism obtains attention weights by calculating the correlation between query vectors and key vectors, and then weights the value vectors with these attention weights to obtain feature maps, achieving full fusion of information across the different time steps of a single sample. Secondly, the external attention mechanism introduces external key and value memory units that are shared across the entire dataset, allowing the correlation between all samples to be taken into account. The multi-head mechanism not only extracts information features from different subspaces of the data, but also increases the parallelism of the algorithm. Finally, the multi-scale convolutional neural network enhances the ability to extract local features of the data by using convolutional kernels of different sizes. The model can therefore predict the remaining useful life of aero-engines more accurately.
The following will further explain the specific implementation method of the invention in conjunction with the accompanying drawings and technical solutions.
The invention uses the FD001 subset of the C-MAPSS dataset for turbofan engine degradation simulation, which is divided into a training set and a testing set. The training set contains all data from the initial state of each engine to the time of complete failure, while the testing set only contains data from the first part of each engine's life cycle. This dataset contains 26 columns of data: the first column is the engine unit number, the second column is the engine flight cycle number, and the third to fifth columns are the engine operating conditions, namely flight altitude, Mach number and throttle lever angle. The remaining 21 columns are monitoring data from the various sensors of the engine, as listed in Table 1.
Step 1: For the FD001 training set and test set, first analyze the correlation between the original aero-engine sensor data and the remaining useful life. Since the values of sensors 1, 5, 6, 10, 16, 18, and 19 are constant and do not change as the flight cycle number increases, the data of the remaining 14 sensors are selected. Then, Z-Score standardization is performed on each column of sensor data, and finally samples are constructed through the sliding time window. The sliding window size is 30 with a step size of 1, and the final constructed sample has the form X ∈ ℝ^(30×14).
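A minimal sketch of the sliding-window sample construction for one engine's standardized sensor matrix (function name and array layout are assumptions):

```python
import numpy as np

def sliding_window_samples(engine_data, window=30, step=1):
    """Build (window x num_sensors) samples from one engine's standardized sensor matrix."""
    samples = []
    for start in range(0, len(engine_data) - window + 1, step):
        samples.append(engine_data[start:start + window])
    return np.stack(samples)          # shape: (num_samples, 30, 14) for FD001
```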
Step 2: For the last row (i.e. the 30th row) of each sample X ∈ ℝ^(30×14) constructed in Step 1, calculate the remaining useful life label of the sample as the smaller of the difference between the total flight cycle number Cycle_total and the current flight cycle number Cycle_cur and the remaining useful life threshold RUL_TH, wherein RUL_TH is 125.
Step 3: For the FD001 training set, first map the constructed sample X to a higher-dimensional space Y through a linear layer, and then add sine-cosine position encoding to obtain U. Next, use the multi-head self attention mechanism and the multi-head external attention mechanism to extract features describing the correlation of data at different time steps, and weight and sum the features extracted by these two attention mechanisms to form the hybrid attention feature. Then, the multi-scale convolutional neural network is used to further extract features. Finally, the features are flattened and passed through the two-layer fully connected neural network to obtain the predicted value of the aero-engine remaining useful life (RUL), completing the construction of the multi-scale hybrid attention mechanism model. Wherein Y ∈ ℝ^(30×128), the number of attention heads is 8, the projection matrix of the first fully connected layer is W_1 ∈ ℝ^((30·128)×64), the bias of the first fully connected layer is b_1 ∈ ℝ^(1×64), the projection matrix of the second fully connected layer is W_2 ∈ ℝ^(64×1), and the bias of the second fully connected layer is b_2 ∈ ℝ^(1×1).
Step 4: For the FD001 training set, first input the samples constructed in Step 1 into the multi-scale hybrid attention mechanism model constructed in Step 3 in batches and calculate the predicted remaining useful life (RUL) of the aero-engine; then calculate the MSE loss value from the RUL predicted value and the RUL label set in Step 2. Next, use the adaptive moment estimation (Adam) optimizer to perform gradient updates on the model and complete one training iteration. Finally, train the model iteratively with a batch size of 128, a learning rate of 0.0003, and a total of 50 iterations.
Step 5: For the FD001 test set, input the samples constructed in Step 1 into the multi-scale hybrid attention mechanism model trained in Step 4, and calculate the predicted remaining useful life (RUL) of the aircraft engine.
The FD001 subset in the C-MAPSS dataset for turbofan engine degradation simulation is taken as the research object for example analysis. This data set simulates the degradation process of five main turbofan engine components, namely, low-pressure turbine (LPT), high-pressure turbine (HPT), low-pressure compressor (LPC), high-pressure compressor (HPC) and fan (Fan), to obtain the performance degradation data of each flight cycle number of the engine under different working conditions. All data is generated through the thermodynamic simulation model of the turbofan engine, and the specific sensor parameters of the turbofan engine are shown in Table 1. The dataset is divided into a training set and a testing set. The training set is used to train the model, and the testing set is used to verify the prediction accuracy of the model. The evaluation indicators for the prediction of aero-engine remaining useful life (RUL) are root mean square error (RMSE) and Score:
wherein n is the number of samples, i is the sample index, and hi is the difference between the RUL predicted value and the actual value. The RMSE indicator penalizes RUL predictions above and below the true value equally, while the Score indicator penalizes RUL predictions above the true value more heavily, which is more in line with the actual situation, since overestimating RUL often leads to more serious consequences. The smaller the RMSE and Score values of the prediction results, the higher the prediction accuracy.
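A sketch of the two evaluation metrics; the RMSE follows its standard definition, and the Score function below uses the commonly adopted C-MAPSS scoring convention with asymmetric exponential penalties (the constants 13 and 10 come from that convention, not from this text):

```python
import numpy as np

def rmse(rul_pred, rul_true):
    """Root mean square error of the RUL predictions."""
    h = rul_pred - rul_true
    return float(np.sqrt(np.mean(h ** 2)))

def score(rul_pred, rul_true):
    """Asymmetric score: over-estimation (h > 0) is penalised more heavily than under-estimation."""
    h = rul_pred - rul_true
    return float(np.sum(np.where(h < 0, np.exp(-h / 13.0) - 1.0, np.exp(h / 10.0) - 1.0)))
```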
Accurate prediction of remaining useful life can predict the failure time of aero-engines in advance, providing decision-making support for ground systems, assisting ground maintenance personnel in engine maintenance work, ensuring aircraft safety performance, and avoiding the waste of manpower and material resources caused by traditional planned maintenance.
The comparison between the evaluation indicators of the prediction results of the multi-scale hybrid attention mechanism model of the invention on the FD001 dataset and other methods is as follows:
These results conform to the essential characteristics of the multi-scale hybrid attention mechanism model and prove that the model has a more accurate prediction ability for the remaining useful life of aero-engines.
Although an example of the invention has been shown and described above, it should be understood that the above example is only used to illustrate the technical solution of the present invention and cannot be construed as a limitation of the present invention. Those of ordinary skill in the art can modify and replace the above embodiments within the scope of the present invention without departing from its principles and purposes.
Number | Date | Country | Kind
---|---|---|---
202211299946.3 | Oct 2022 | CN | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/CN2022/128100 | 10/28/2022 | WO |