This patent application claims the benefit and priority of Chinese Patent Application No. 202111261992.X, filed with the China National Intellectual Property Administration on Oct. 28, 2021, the disclosure of which is incorporated by reference herein in its entirety as part of the present application.
The present disclosure relates to the field of remaining useful life (RUL) prediction for aero-engines, and in particular to a method for predicting a RUL of an aero-engine based on an automatic differential learning deep neural network (ADLDNN).
An aero-engine is a highly complex and precise thermal machine that provides an aircraft with the power necessary for flight. Its complex internal structure and harsh operating environment make it particularly susceptible to faults. Hence, accurate prediction of the RUL of an aero-engine is of great significance to its operation and maintenance.
With the development of science and technology, the long short-term memory (LSTM) network and the convolutional neural network (CNN) have been widely applied to predict the RUL of rotary machines. However, existing neural networks all process data in a uniform mode, cannot mine different levels of feature information with different feature extraction modes, and therefore achieve poor prediction accuracy.
An objective of the present disclosure is to provide a method for predicting a RUL of an aero-engine based on an ADLDNN, which can be used to predict the RUL of the aero-engine.
The objective of the present disclosure is implemented with the following technical solutions. A method for predicting a RUL of an aero-engine based on an ADLDNN includes the following specific steps:
1) data acquisition: acquiring multidimensional degradation parameters of an aero-engine to be predicted, analyzing a stable trend, and selecting a plurality of parameters capable of reflecting degradation performance of the aero-engine to obtain acquired data;
2) data preprocessing: segmenting the acquired data by a sliding window (SW) to obtain preprocessed data;
3) model construction: constructing a RUL prediction model of the aero-engine based on an ADLDNN, the RUL prediction model including a multibranch convolutional neural network (MBCNN) model, a multicellular bidirectional long short-term memory (MCBLSTM) model, a fully connected (FC) layer FC1, and a regression layer;
4) feature extraction: taking the preprocessed data as input data of the MBCNN model, extracting an output of the MBCNN model, taking the output of the MBCNN model and recursive data as input data of the MCBLSTM model, and extracting an output of the MCBLSTM model; and
5) RUL prediction: taking the output of the MCBLSTM model as an input of the FC layer FC1 to obtain an output of the FC layer FC1, and inputting the output of the FC layer FC1 to the regression layer to predict a RUL.
Further, the MBCNN model includes a level division unit, and a spatial feature alienation-extraction unit; and
the MCBLSTM model includes a bidirectional trend-level division unit, and multicellular update units.
Further, the extracting an output of the MBCNN model in step 4) specifically includes:
4-1-1) level division: taking the preprocessed data in step 2) as the input data, inputting input data xt at time t to the level division unit of the MBCNN model for level division, the level division unit including an FC layer FC2 composed of five neurons, and performing softmax normalization on an output Dt of the FC layer FC2 to obtain a level division result D1t:
$$D_t = \tanh(w_{xd} x_t + b_d) \tag{1}$$

$$D_{1t} = \mathrm{softmax}(D_t) = [d_{11t}\; d_{12t}\; d_{13t}\; d_{14t}\; d_{15t}] \tag{2}$$

where in equations (1) and (2), $w_{xd}$ and $b_d$ respectively are a weight and a bias of the FC layer FC2, and $d_{11t}$, $d_{12t}$, $d_{13t}$, $d_{14t}$, and $d_{15t}$ respectively represent the important level, the relatively important level, the general level, the relatively minor level, and the minor level of the input data; and
4-1-2) feature extraction: inputting, according to the level division result $D_{1t}$ of the input data, the input data to different convolution paths of the spatial feature alienation-extraction unit for convolution, and performing automatic differential processing on the input measured values according to the level division result and the five designed convolution paths to obtain a health feature $h_t^1$:
$$
\begin{aligned}
h_{ti}^{1} &= P_{15}(C_{15}(P_{14}(C_{14}(P_{13}(C_{13}(P_{12}(C_{12}(P_{11}(C_{11}(x_t))))))))))\\
h_{tj}^{1} &= P_{24}(C_{24}(P_{23}(C_{23}(P_{22}(C_{22}(P_{21}(C_{21}(x_t))))))))\\
h_{tk}^{1} &= P_{33}(C_{33}(P_{32}(C_{32}(P_{31}(C_{31}(x_t))))))\\
h_{tl}^{1} &= P_{42}(C_{42}(P_{41}(C_{41}(x_t))))\\
h_{tm}^{1} &= P_{51}(C_{51}(x_t))\\
h_{t}^{1} &= D_{1t}\,[h_{ti}^{1}\; h_{tj}^{1}\; h_{tk}^{1}\; h_{tl}^{1}\; h_{tm}^{1}]^{T}
\end{aligned}
\tag{3}
$$
where in equation (3), $C_{ij}$ and $P_{ij}$ respectively represent the jth convolution operation and the jth pooling operation of the ith convolution path, $h_{ti}^1$ is the convolution output of data of the important level, $h_{tj}^1$ is the convolution output of data of the relatively important level, $h_{tk}^1$ is the convolution output of data of the general level, $h_{tl}^1$ is the convolution output of data of the relatively minor level, and $h_{tm}^1$ is the convolution output of data of the minor level.
Further, the extracting an output of the MCBLSTM model in step 4) specifically includes:
4-2-1) trend division: taking an output $h_t^1$ of the MBCNN model at time t and recursive data $h_{t-1}^2$ of the MCBLSTM model at time t−1 as input data of the MCBLSTM at the time t, and inputting the input data to the bidirectional trend-level division unit for trend division, the bidirectional trend-level division unit including an FC layer FC3 and an FC layer FC4 for dividing a trend level of the input data along forward and backward directions, the FC layer FC3 and the FC layer FC4 each including five neurons, and the FC layer FC3 and the FC layer FC4 respectively having an output $\vec{\tilde{D}}_{2t}$ and an output $\overleftarrow{\tilde{D}}_{2t}$:

$$\vec{\tilde{D}}_{2t} = \tanh(\vec{w}_{dx} h_t^1 + \vec{w}_{dh} \vec{h}_{t-1}^2 + \vec{b}_d),\qquad \overleftarrow{\tilde{D}}_{2t} = \tanh(\overleftarrow{w}_{dx} h_t^1 + \overleftarrow{w}_{dh} \overleftarrow{h}_{t-1}^2 + \overleftarrow{b}_d) \tag{4}$$

where in equation (4), $\vec{w}_{dx}$ and $\vec{w}_{dh}$ each are a weight of the FC layer FC3, $\overleftarrow{w}_{dx}$ and $\overleftarrow{w}_{dh}$ each are a weight of the FC layer FC4, and $\vec{b}_d$ and $\overleftarrow{b}_d$ respectively are biases of the FC layer FC3 and the FC layer FC4;
respectively performing a softmax operation on the $\vec{\tilde{D}}_{2t}$ and the $\overleftarrow{\tilde{D}}_{2t}$ to obtain forward and backward trend levels $\vec{D}_{2t}$ and $\overleftarrow{D}_{2t}$:

$$\vec{D}_{2t} = \mathrm{softmax}(\vec{\tilde{D}}_{2t}) = [\vec{d}_{21t}\; \vec{d}_{22t}\; \vec{d}_{23t}\; \vec{d}_{24t}\; \vec{d}_{25t}],\qquad \overleftarrow{D}_{2t} = \mathrm{softmax}(\overleftarrow{\tilde{D}}_{2t}) = [\overleftarrow{d}_{21t}\; \overleftarrow{d}_{22t}\; \overleftarrow{d}_{23t}\; \overleftarrow{d}_{24t}\; \overleftarrow{d}_{25t}] \tag{5}$$
where in equation (5), $\vec{d}_{21t}$ ($\overleftarrow{d}_{21t}$), $\vec{d}_{22t}$ ($\overleftarrow{d}_{22t}$), $\vec{d}_{23t}$ ($\overleftarrow{d}_{23t}$), $\vec{d}_{24t}$ ($\overleftarrow{d}_{24t}$), and $\vec{d}_{25t}$ ($\overleftarrow{d}_{25t}$) respectively represent a local trend, a medium and short-term trend, a medium-term trend, a medium and long-term trend and a global trend in the bidirectional calculation, and $\vec{d}_{2\max t}$ and $\overleftarrow{d}_{2\max t}$ in $\vec{D}_{2t}$ and $\overleftarrow{D}_{2t}$ represent the trend levels along the two directions at the time t; and
4-2-2) feature extraction: inputting, according to the trend division results $\vec{D}_{2t}$ and $\overleftarrow{D}_{2t}$, data of different trends to the multicellular update units $\vec{lc}$ and $\overleftarrow{lc}$, which perform differential learning along the two directions, for update, the $\vec{lc}$ comprising five subunits $\vec{lc}(i)$, $\vec{lc}(j)$, $\vec{lc}(k)$, $\vec{lc}(l)$, and $\vec{lc}(m)$, and the $\overleftarrow{lc}$ comprising five subunits $\overleftarrow{lc}(i)$, $\overleftarrow{lc}(j)$, $\overleftarrow{lc}(k)$, $\overleftarrow{lc}(l)$, and $\overleftarrow{lc}(m)$:
where in equation (6), the arrows → and ← respectively represent forward and backward processes, $\vec{lc}(m)$ and $\overleftarrow{lc}(m)$ are the data update units corresponding to the global trend in the bidirectional calculation, $\vec{lc}(i)$ and $\overleftarrow{lc}(i)$ are the data update units corresponding to the short-term trend, $\vec{lc}(k)$ and $\overleftarrow{lc}(k)$ are the data update units corresponding to the medium-term trend, $\vec{lc}(l)$ and $\overleftarrow{lc}(l)$ are the data update units corresponding to the medium and long-term trend, $\vec{lc}(j)$ and $\overleftarrow{lc}(j)$ are the data update units corresponding to the medium and short-term trend, σ is a sigmoid activation function, $\vec{w}_{ix}$, $\vec{w}_{ih}$, $\overleftarrow{w}_{ix}$ and $\overleftarrow{w}_{ih}$ are weights of the input gates of the MCBLSTM model, $\vec{w}_{fx}$, $\vec{w}_{fh}$, $\overleftarrow{w}_{fx}$ and $\overleftarrow{w}_{fh}$ are weights of the forget gates of the MCBLSTM model, $\vec{w}_{cx}$, $\vec{w}_{ch}$, $\overleftarrow{w}_{cx}$ and $\overleftarrow{w}_{ch}$ are weights of the cell storage units of the MCBLSTM model, $\vec{b}_i$ and $\overleftarrow{b}_i$ are biases of the input gates of the MCBLSTM model, $\vec{b}_f$ and $\overleftarrow{b}_f$ are biases of the forget gates of the MCBLSTM model, $\vec{b}_c$ and $\overleftarrow{b}_c$ are biases of the cell storage units of the MCBLSTM model, ⊙ is a dot product operation, and $s_1$, $s_2$, $s_3$ and $s_4$ each are a mix proportion factor obtained by learning; and
combining the weighted alienation outputs of the daughter-cell units in the multicellular update units according to the update results of the five alienation units and the trend division results $\vec{D}_{2t}$ and $\overleftarrow{D}_{2t}$ to obtain outputs $\vec{c}_t$ and $\overleftarrow{c}_t$ of the multicellular update units, and controlling output gates $\vec{o}_t$ and $\overleftarrow{o}_t$ of the MCBLSTM model to obtain an output $h_t^2$ of the MCBLSTM model at the time t:
$$
\begin{aligned}
\vec{c}_t &= \vec{D}_{2t}\,[\vec{c}_t(i)\; \vec{c}_t(j)\; \vec{c}_t(k)\; \vec{c}_t(l)\; \vec{c}_t(m)]^T\\
\overleftarrow{c}_t &= \overleftarrow{D}_{2t}\,[\overleftarrow{c}_t(i)\; \overleftarrow{c}_t(j)\; \overleftarrow{c}_t(k)\; \overleftarrow{c}_t(l)\; \overleftarrow{c}_t(m)]^T\\
\vec{o}_t &= \sigma(\vec{w}_{ox} h_t^1 + \vec{w}_{oh} \vec{h}_{t-1}^2 + \vec{b}_o)\\
\overleftarrow{o}_t &= \sigma(\overleftarrow{w}_{ox} h_t^1 + \overleftarrow{w}_{oh} \overleftarrow{h}_{t-1}^2 + \overleftarrow{b}_o)\\
\vec{h}_t^2 &= \vec{o}_t \odot \tanh(\vec{c}_t)\\
\overleftarrow{h}_t^2 &= \overleftarrow{o}_t \odot \tanh(\overleftarrow{c}_t)\\
h_t^2 &= \vec{h}_t^2 \oplus \overleftarrow{h}_t^2
\end{aligned}
\tag{7}
$$
where in equation (7), $\vec{w}_{ox}$, $\vec{w}_{oh}$ and $\overleftarrow{w}_{ox}$, $\overleftarrow{w}_{oh}$ are weights of the output gates of the MCBLSTM model, and σ and tanh each are an activation function.
Further, the predicting a RUL in step 5) specifically includes:
inputting $h_t^2$ to the FC layer FC1, preventing overfitting by Dropout to obtain an output $h_t^3$ of the FC layer FC1, and inputting the $h_t^3$ to the regression layer to obtain a predicted RUL $y_t$:

$$h_t^3 = f(w_h h_t^2 + b_h) \tag{8}$$

$$y_t = w_y h_t^3 + b_y \tag{9}$$

where in equations (8) and (9), $w_h$ and $b_h$ respectively are a weight and a bias of the FC layer FC1, f is an activation function, and $w_y$ and $b_y$ respectively are a weight and a bias of the regression layer.
By adopting the foregoing technical solutions, the present disclosure achieves the following advantages:
1. The present disclosure constructs a deep mining model (the ADLDNN model) according to the different sensitivities of different measured values to mechanical faults in different periods, and automatically screens features through the ADLDNN model in combination with differential learning, thereby improving the accuracy and generalization of RUL prediction.
2. Input data are classified by the level division unit of the MBCNN model. The classified data are input to the MBCNN, in which each branch executes corresponding feature extraction in accordance with the level of its input data. The bidirectional trend-level division unit of the MCBLSTM model is used to classify the output features of the MBCNN into various levels of degradation trends along the forward and backward directions. The multicellular update units are then used to perform corresponding feature learning on the bidirectional trend levels of the input features to output health indexes. The present disclosure can thus better mine different degradation trends in the health state of the aero-engine.
Other advantages, objectives and features of the present disclosure will be illustrated to some degree in the subsequent description, and will be apparent to some degree to those skilled in the art based on study of the following description, or may be taught by practicing the present disclosure. The objectives and other advantages of the present disclosure can be implemented and obtained by the following description and claims.
The present disclosure will be further described below in conjunction with the accompanying drawings and embodiments.
As shown in the accompanying drawings, a method for predicting a RUL of an aero-engine based on an ADLDNN includes the following steps:
1) Data acquisition: Multidimensional degradation parameters of an aero-engine to be predicted are acquired, a stable trend is analyzed, and a plurality of parameters capable of reflecting degradation performance of the aero-engine are selected to obtain acquired data, specifically:
1-1) Degradation data of the aero-engine are simulated by commercial modular aero-propulsion system simulation (C-MAPSS) to acquire the multidimensional degradation parameters of the aero-engine to be predicted, as shown in Table 1:
As shown in Table 2, the C-MAPSS dataset is divided into four sub-datasets according to different operating conditions and fault modes:
Each sub-dataset contains training data, test data and the actual RULs corresponding to the test data. The training data contain complete engine data from a certain health state to the fault, while the test data end before the engine fails. Moreover, the training and test data each contain a certain number of engines with different initial health states.

Due to the different initial health states of the engines, the running cycles of different engines in the same sub-dataset are different. Taking the FD001 dataset as an example, the training dataset includes 100 engines, with a maximum running cycle of 362 and a minimum running cycle of 128. In order to fully prove the superiority of the method, the simplest subset (namely the subset FD001 having a single operating condition and a single fault mode) and the most complex subset (namely the subset FD004 having various operating conditions and various fault modes) are taken as experimental data.
1-2) Some stable trend measurements (the measurement data of sensors 1, 5, 6, 10, 16, 18 and 19) are excluded in advance. These sensors are unsuitable for RUL prediction because their full-life-cycle measurement curves are stable and nearly constant, and thus contain little degradation information of the engine; meanwhile, the operating conditions have a significant impact on the prediction capability of the model. Therefore, the measurements of the remaining 14 sensors and the operating conditions are combined into the original data to obtain the acquired data.
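As an illustration of this screening step, the following minimal sketch (Python/NumPy, not part of the original disclosure) drops the listed near-constant channels from a per-engine measurement matrix; the (cycles × 21 sensors) array layout and the function name are assumptions made for the example.

```python
import numpy as np

def screen_channels(measurements, exclude=(1, 5, 6, 10, 16, 18, 19)):
    """Drop near-constant sensor channels (1-based indices) from a
    (cycles x 21) measurement matrix, keeping the 14 informative ones."""
    keep = [c for c in range(measurements.shape[1]) if (c + 1) not in exclude]
    return measurements[:, keep]
```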
2) Data preprocessing: The acquired data are segmented by an SW to obtain preprocessed data, specifically:
As shown in the accompanying drawings, a sliding window with a size of l slides, with a step size of m, over the full-life-cycle measurement sequence of an engine whose running cycle is T, and each window of l consecutive cycles forms one sample.

When the ith sample is input, the actual RUL is T−l−(i−1)×m.
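A minimal sketch of this segmentation, assuming the measurements of one engine are stored as a (T × features) NumPy array, is given below; the function and variable names are illustrative.

```python
import numpy as np

def sliding_window(seq, l=30, m=1):
    """Segment a (T x features) run-to-failure sequence into an
    (n x l x features) array; the i-th sample (1-based) covers cycles
    (i-1)*m to (i-1)*m + l - 1, so its actual RUL is T - l - (i-1)*m."""
    T = seq.shape[0]
    n = (T - l) // m + 1
    return np.stack([seq[i * m : i * m + l] for i in range(n)])
```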
RUL labels are constructed by a piece-wise linear RUL technology, and are defined as follows:

$$Rul = \begin{cases} Rul_{\max}, & T - l - (i-1)\times m > Rul_{\max} \\ T - l - (i-1)\times m, & \text{otherwise} \end{cases} \tag{10}$$

In Equation (10), $Rul_{\max}$ is a maximum RUL and a preset threshold.
In the example of the present disclosure, for FD001 and FD004, the maximum RUL is 130 cycles and 150 cycles respectively, while the sliding window size l is 30 and the sliding step size m is 1. There are 17,731 and 54,028 training samples for FD001 and FD004. FD001 and FD004 contain 100 and 248 test samples respectively, because only the last measured value of each test engine is used to validate the prediction capability.
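The corresponding piece-wise linear labels of Equation (10) can be sketched as follows; this is an illustrative reading of the label construction, not the original implementation.

```python
import numpy as np

def rul_labels(T, l=30, m=1, rul_max=130):
    """Piece-wise linear RUL targets per Equation (10): the raw label
    T - l - (i-1)*m of the i-th sample is capped at the threshold rul_max."""
    n = (T - l) // m + 1
    raw = T - l - m * np.arange(n, dtype=float)
    return np.minimum(raw, rul_max)
```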
3) Model construction: A RUL prediction model of the aero-engine is constructed based on an ADLDNN, the RUL prediction model including an MBCNN model, an MCBLSTM model, an FC layer FC1, and a regression layer.
The MBCNN model includes a level division unit, and a spatial feature alienation-extraction unit.
The MCBLSTM model includes a bidirectional trend-level division unit, and multicellular update units.
4) Feature extraction: The preprocessed data are taken as input data of the MBCNN model, an output of the MBCNN model is extracted, the output of the MBCNN model and recursive data are taken as input data of the MCBLSTM model, and an output of the MCBLSTM model is extracted, specifically:
4-1) The step of extracting an output of the MBCNN model specifically includes:
4-1-1) Level division: The preprocessed data in Step 2) are taken as the input data, input data xt at time t are input to the level division unit of the MBCNN model for level division, the level division unit including an FC layer FC2 composed of five neurons, and softmax normalization is performed on an output Dt of the FC layer FC2 to obtain a level division result D1t:
$$D_t = \tanh(w_{xd} x_t + b_d) \tag{11}$$

$$D_{1t} = \mathrm{softmax}(D_t) = [d_{11t}\; d_{12t}\; d_{13t}\; d_{14t}\; d_{15t}] \tag{12}$$

In Equations (11) and (12), $w_{xd}$ and $b_d$ respectively are a weight and a bias of the FC layer FC2, and $d_{11t}$, $d_{12t}$, $d_{13t}$, $d_{14t}$, and $d_{15t}$ respectively represent the important level, the relatively important level, the general level, the relatively minor level, and the minor level of the input data.
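A minimal PyTorch sketch of this level division unit follows; treating the input window as a flattened feature vector is an assumption made for the example.

```python
import torch
import torch.nn as nn

class LevelDivision(nn.Module):
    """Level division unit of Equations (11)-(12): the five-neuron FC2,
    a tanh, and a softmax yield the level weights D_1t."""
    def __init__(self, in_features):
        super().__init__()
        self.fc2 = nn.Linear(in_features, 5)

    def forward(self, x_t):                # x_t: (batch, in_features)
        d_t = torch.tanh(self.fc2(x_t))    # Equation (11)
        return torch.softmax(d_t, dim=-1)  # Equation (12): D_1t
```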
4-1-2) Feature extraction: According to the level division result $D_{1t}$ of the input data, the input data are input to different convolution paths of the spatial feature alienation-extraction unit for convolution, and automatic differential processing is performed on the input measured values according to the level division result and the five designed convolution paths to obtain a health feature $h_t^1$:
$$
\begin{aligned}
h_{ti}^{1} &= P_{15}(C_{15}(P_{14}(C_{14}(P_{13}(C_{13}(P_{12}(C_{12}(P_{11}(C_{11}(x_t))))))))))\\
h_{tj}^{1} &= P_{24}(C_{24}(P_{23}(C_{23}(P_{22}(C_{22}(P_{21}(C_{21}(x_t))))))))\\
h_{tk}^{1} &= P_{33}(C_{33}(P_{32}(C_{32}(P_{31}(C_{31}(x_t))))))\\
h_{tl}^{1} &= P_{42}(C_{42}(P_{41}(C_{41}(x_t))))\\
h_{tm}^{1} &= P_{51}(C_{51}(x_t))\\
h_{t}^{1} &= D_{1t}\,[h_{ti}^{1}\; h_{tj}^{1}\; h_{tk}^{1}\; h_{tl}^{1}\; h_{tm}^{1}]^{T}
\end{aligned}
\tag{13}
$$
In Equation (13), $C_{ij}$ and $P_{ij}$ respectively represent the jth convolution operation and the jth pooling operation of the ith convolution path, $h_{ti}^1$ is the convolution output of data of the important level, $h_{tj}^1$ is the convolution output of data of the relatively important level, $h_{tk}^1$ is the convolution output of data of the general level, $h_{tl}^1$ is the convolution output of data of the relatively minor level, and $h_{tm}^1$ is the convolution output of data of the minor level.
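The five-path structure of Equation (13) can be sketched in PyTorch as below; the channel width, kernel sizes, pooling configuration and the final global pooling are illustrative assumptions, since the disclosure specifies these separately in the hyper-parameter listing.

```python
import torch
import torch.nn as nn

class MBCNNBranches(nn.Module):
    """Sketch of the five convolution paths of Equation (13), of depths
    5/4/3/2/1 (each C_ij followed by P_ij), whose pooled outputs are
    weighted by the level-division result D_1t."""
    def __init__(self, in_ch, width=8):
        super().__init__()
        def make_path(depth):
            layers, c = [], in_ch
            for _ in range(depth):
                layers += [nn.Conv1d(c, width, kernel_size=2, padding=1),
                           nn.MaxPool1d(kernel_size=2, ceil_mode=True)]
                c = width
            layers.append(nn.AdaptiveAvgPool1d(1))  # one feature per channel
            return nn.Sequential(*layers)
        self.paths = nn.ModuleList([make_path(d) for d in (5, 4, 3, 2, 1)])

    def forward(self, x_t, d1_t):   # x_t: (batch, in_ch, l); d1_t: (batch, 5)
        h = torch.stack([p(x_t).flatten(1) for p in self.paths], dim=1)
        return (d1_t.unsqueeze(-1) * h).sum(dim=1)   # h_t^1 of Equation (13)
```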
4-2) The step of extracting an output of the MCBLSTM model specifically includes:
4-2-1) Trend division: An output $h_t^1$ of the MBCNN model at time t and recursive data $h_{t-1}^2$ of the MCBLSTM model at time t−1 are taken as input data of the MCBLSTM at the time t, and are input to the bidirectional trend-level division unit for trend division. The bidirectional trend-level division unit includes an FC layer FC3 and an FC layer FC4 for dividing a trend level of the input data along the forward and backward directions, the FC layer FC3 and the FC layer FC4 each include five neurons, and the FC layer FC3 and the FC layer FC4 respectively have an output $\vec{\tilde{D}}_{2t}$ and an output $\overleftarrow{\tilde{D}}_{2t}$:

$$\vec{\tilde{D}}_{2t} = \tanh(\vec{w}_{dx} h_t^1 + \vec{w}_{dh} \vec{h}_{t-1}^2 + \vec{b}_d),\qquad \overleftarrow{\tilde{D}}_{2t} = \tanh(\overleftarrow{w}_{dx} h_t^1 + \overleftarrow{w}_{dh} \overleftarrow{h}_{t-1}^2 + \overleftarrow{b}_d) \tag{14}$$

In Equation (14), $\vec{w}_{dx}$ and $\vec{w}_{dh}$ each are a weight of the FC layer FC3, $\overleftarrow{w}_{dx}$ and $\overleftarrow{w}_{dh}$ each are a weight of the FC layer FC4, and $\vec{b}_d$ and $\overleftarrow{b}_d$ respectively are biases of the FC layer FC3 and the FC layer FC4.
A softmax operation is respectively performed on the $\vec{\tilde{D}}_{2t}$ and the $\overleftarrow{\tilde{D}}_{2t}$ to obtain forward and backward trend levels $\vec{D}_{2t}$ and $\overleftarrow{D}_{2t}$:

$$\vec{D}_{2t} = \mathrm{softmax}(\vec{\tilde{D}}_{2t}) = [\vec{d}_{21t}\; \vec{d}_{22t}\; \vec{d}_{23t}\; \vec{d}_{24t}\; \vec{d}_{25t}],\qquad \overleftarrow{D}_{2t} = \mathrm{softmax}(\overleftarrow{\tilde{D}}_{2t}) = [\overleftarrow{d}_{21t}\; \overleftarrow{d}_{22t}\; \overleftarrow{d}_{23t}\; \overleftarrow{d}_{24t}\; \overleftarrow{d}_{25t}] \tag{15}$$
In Equation (15), $\vec{d}_{21t}$ ($\overleftarrow{d}_{21t}$), $\vec{d}_{22t}$ ($\overleftarrow{d}_{22t}$), $\vec{d}_{23t}$ ($\overleftarrow{d}_{23t}$), $\vec{d}_{24t}$ ($\overleftarrow{d}_{24t}$), and $\vec{d}_{25t}$ ($\overleftarrow{d}_{25t}$) respectively represent a local trend, a medium and short-term trend, a medium-term trend, a medium and long-term trend and a global trend in the bidirectional calculation, and $\vec{d}_{2\max t}$ and $\overleftarrow{d}_{2\max t}$ in $\vec{D}_{2t}$ and $\overleftarrow{D}_{2t}$ represent the trend levels along the two directions at the time t.
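A minimal sketch of the bidirectional trend-level division follows, using the tanh-then-softmax form of Equations (14)-(15); concatenating $h_t^1$ with the direction-specific recursive state is an assumption made for the example.

```python
import torch
import torch.nn as nn

class TrendDivision(nn.Module):
    """Bidirectional trend-level division of Equations (14)-(15): FC3 and
    FC4 (five neurons each) grade [h_t^1, h_{t-1}^2] along the forward
    and backward directions."""
    def __init__(self, x_dim, h_dim):
        super().__init__()
        self.fc3 = nn.Linear(x_dim + h_dim, 5)   # forward direction
        self.fc4 = nn.Linear(x_dim + h_dim, 5)   # backward direction

    def forward(self, h1_t, h2_prev_fwd, h2_prev_bwd):
        d_fwd = torch.softmax(
            torch.tanh(self.fc3(torch.cat([h1_t, h2_prev_fwd], dim=-1))), dim=-1)
        d_bwd = torch.softmax(
            torch.tanh(self.fc4(torch.cat([h1_t, h2_prev_bwd], dim=-1))), dim=-1)
        return d_fwd, d_bwd   # forward / backward trend levels, Equation (15)
```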
4-2-2) Feature extraction: According to the trend division results $\vec{D}_{2t}$ and $\overleftarrow{D}_{2t}$, data of different trends are input to the multicellular update units $\vec{lc}$ and $\overleftarrow{lc}$, which perform differential learning along the two directions, for update, the $\vec{lc}$ comprising five subunits $\vec{lc}(i)$, $\vec{lc}(j)$, $\vec{lc}(k)$, $\vec{lc}(l)$, and $\vec{lc}(m)$, and the $\overleftarrow{lc}$ comprising five subunits $\overleftarrow{lc}(i)$, $\overleftarrow{lc}(j)$, $\overleftarrow{lc}(k)$, $\overleftarrow{lc}(l)$, and $\overleftarrow{lc}(m)$:
In Equation (16), the arrows → and ← respectively represent forward and backward processes, $\vec{lc}(m)$ and $\overleftarrow{lc}(m)$ are the data update units corresponding to the global trend in the bidirectional calculation, $\vec{lc}(i)$ and $\overleftarrow{lc}(i)$ are the data update units corresponding to the short-term trend, $\vec{lc}(k)$ and $\overleftarrow{lc}(k)$ are the data update units corresponding to the medium-term trend, $\vec{lc}(l)$ and $\overleftarrow{lc}(l)$ are the data update units corresponding to the medium and long-term trend, $\vec{lc}(j)$ and $\overleftarrow{lc}(j)$ are the data update units corresponding to the medium and short-term trend, σ is a sigmoid activation function, $\vec{w}_{ix}$, $\vec{w}_{ih}$, $\overleftarrow{w}_{ix}$ and $\overleftarrow{w}_{ih}$ are weights of the input gates of the MCBLSTM model, $\vec{w}_{fx}$, $\vec{w}_{fh}$, $\overleftarrow{w}_{fx}$ and $\overleftarrow{w}_{fh}$ are weights of the forget gates of the MCBLSTM model, $\vec{w}_{cx}$, $\vec{w}_{ch}$, $\overleftarrow{w}_{cx}$ and $\overleftarrow{w}_{ch}$ are weights of the cell storage units of the MCBLSTM model, $\vec{b}_i$ and $\overleftarrow{b}_i$ are biases of the input gates of the MCBLSTM model, $\vec{b}_f$ and $\overleftarrow{b}_f$ are biases of the forget gates of the MCBLSTM model, $\vec{b}_c$ and $\overleftarrow{b}_c$ are biases of the cell storage units of the MCBLSTM model, ⊙ is a dot product operation, and $s_1$, $s_2$, $s_3$ and $s_4$ each are a mix proportion factor obtained by learning.
The weighted alienation outputs of the daughter-cell units in the multicellular update units are combined according to the update results of the five alienation units and the trend division results $\vec{D}_{2t}$ and $\overleftarrow{D}_{2t}$ to obtain outputs $\vec{c}_t$ and $\overleftarrow{c}_t$ of the multicellular update units, and output gates $\vec{o}_t$ and $\overleftarrow{o}_t$ of the MCBLSTM model are controlled to obtain an output $h_t^2$ of the MCBLSTM model at the time t:
$$
\begin{aligned}
\vec{c}_t &= \vec{D}_{2t}\,[\vec{c}_t(i)\; \vec{c}_t(j)\; \vec{c}_t(k)\; \vec{c}_t(l)\; \vec{c}_t(m)]^T\\
\overleftarrow{c}_t &= \overleftarrow{D}_{2t}\,[\overleftarrow{c}_t(i)\; \overleftarrow{c}_t(j)\; \overleftarrow{c}_t(k)\; \overleftarrow{c}_t(l)\; \overleftarrow{c}_t(m)]^T\\
\vec{o}_t &= \sigma(\vec{w}_{ox} h_t^1 + \vec{w}_{oh} \vec{h}_{t-1}^2 + \vec{b}_o)\\
\overleftarrow{o}_t &= \sigma(\overleftarrow{w}_{ox} h_t^1 + \overleftarrow{w}_{oh} \overleftarrow{h}_{t-1}^2 + \overleftarrow{b}_o)\\
\vec{h}_t^2 &= \vec{o}_t \odot \tanh(\vec{c}_t)\\
\overleftarrow{h}_t^2 &= \overleftarrow{o}_t \odot \tanh(\overleftarrow{c}_t)\\
h_t^2 &= \vec{h}_t^2 \oplus \overleftarrow{h}_t^2
\end{aligned}
\tag{17}
$$
In Equation (17), $\vec{w}_{ox}$, $\vec{w}_{oh}$ and $\overleftarrow{w}_{ox}$, $\overleftarrow{w}_{oh}$ are weights of the output gates of the MCBLSTM model, and σ and tanh each are an activation function.
In the example of the present disclosure, in order to keep the global trend as long as possible, the cell units $\vec{lc}(m)$ and $\overleftarrow{lc}(m)$ are updated from the state at the previous time. In order to replace the local trend in a timely manner, the units $\vec{lc}(i)$ and $\overleftarrow{lc}(i)$ are updated from the internal state at the current time. According to the conventional cell update mechanism in the BLSTM, the units $\vec{lc}(k)$ and $\overleftarrow{lc}(k)$ in the medium-term trend are updated with the units in the global trend as well as the units in the local trend, the units $\vec{lc}(l)$ and $\overleftarrow{lc}(l)$ in the medium and long-term trend are updated with the units in the global trend as well as the units in the medium-term trend, and the units $\vec{lc}(j)$ and $\overleftarrow{lc}(j)$ in the medium and short-term trend are updated with the units in the medium-term trend as well as the units in the local trend.
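For one direction, the weighted combination and gating of Equation (17) can be sketched as follows; reading the ⊕ of Equation (17) as feature concatenation is an assumption.

```python
import torch

def combine_direction(d2_t, cells, o_t):
    """One direction of Equation (17): weight the five subunit cell states
    (i, j, k, l, m) by the trend levels in d2_t, then gate with o_t."""
    c_t = (d2_t.unsqueeze(-1) * torch.stack(cells, dim=1)).sum(dim=1)
    h_t = o_t * torch.tanh(c_t)   # elementwise (dot) product
    return c_t, h_t

# The bidirectional output then reads, taking ⊕ as concatenation:
# h2_t = torch.cat([h_fwd, h_bwd], dim=-1)
```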
5) RUL prediction: The output of the MCBLSTM model is taken as an input of the FC layer FC1 to obtain an output of the FC layer FC1, and the output of the FC layer FC1 is input to the regression layer to predict a RUL, specifically:
$h_t^2$ is input to the FC layer FC1, and overfitting is prevented by Dropout to obtain an output $h_t^3$ of the FC layer FC1; the $h_t^3$ is then input to the regression layer to obtain a predicted RUL $y_t$:

$$h_t^3 = f(w_h h_t^2 + b_h) \tag{18}$$

$$y_t = w_y h_t^3 + b_y \tag{19}$$

In Equations (18) and (19), $w_h$ and $b_h$ respectively are a weight and a bias of the FC layer FC1, f is an activation function, and $w_y$ and $b_y$ respectively are a weight and a bias of the regression layer.
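A minimal PyTorch sketch of this prediction head follows; the tanh activation of FC1 is an assumption (the disclosure leaves f unspecified), while the 30-neuron width and 0.5 dropout rate follow the hyper-parameters given below.

```python
import torch
import torch.nn as nn

class PredictionHead(nn.Module):
    """FC1 with Dropout and a one-neuron regression layer, following
    Equations (18)-(19)."""
    def __init__(self, in_features, hidden=30, p=0.5):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden)
        self.drop = nn.Dropout(p)
        self.reg = nn.Linear(hidden, 1)

    def forward(self, h2_t):
        h3_t = self.drop(torch.tanh(self.fc1(h2_t)))  # Equation (18)
        return self.reg(h3_t)                          # Equation (19): y_t
```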
In the example of the present disclosure, there are N samples in training. A mean square error (MSE) is defined as the loss function and calculated by:

$$MSE = \frac{1}{N}\sum_{i=1}^{N}\left(Rul_i - y_i\right)^2 \tag{20}$$

In Equation (20), $Rul_i$ and $y_i$ respectively are the actual RUL and the predicted RUL of the ith training sample.
Hyper-parameters of the ADLDNN are selected by a grid search method:
C11, C12, C13, C14, C15, C21, C22, C23, C24, C31, C32, C33, C41, C42, and C51 respectively have a kernel size of 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7, 2, 2, 2, and 9.
P11, P12, P13, P14, P15, P21, P22, P23, P24, P31, P32, P33, P41, P42, and P51 respectively have a maximum pooling size of 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, and 2.
It is assumed that the convolution kernel has a step size of 1, the MCBLSTM has 30 neurons, the FC layer FC1 has 30 neurons, and the regression layer has one neuron. The Dropout is set as 0.5, and the window size and the step size are respectively set as 30 and 1.
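A schematic grid search over assumed hyper-parameter ranges is sketched below; `train_and_validate` is a hypothetical helper standing in for one training run, and the ranges are illustrative, not those of the disclosure.

```python
import random
from itertools import product

def train_and_validate(lr, hidden, dropout):
    """Hypothetical stand-in for training the ADLDNN with the given
    settings and returning the validation RMSE (randomized here
    purely so the sketch runs)."""
    return random.uniform(10, 30)

# Illustrative (assumed) search ranges.
best_rmse, best_cfg = min(
    (train_and_validate(lr, h, p), (lr, h, p))
    for lr, h, p in product([1e-3, 1e-4], [20, 30, 40], [0.3, 0.5])
)
```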
6) Experimental validation:
6-1) Evaluation indexes: The score function of the IEEE PHM data challenge and a root-mean-square error (RMSE) are taken as evaluation indexes to quantitatively characterize the RUL prediction performance. The evaluation indexes can be respectively calculated by:

$$s_i = \begin{cases} e^{-\frac{d_i}{13}} - 1, & d_i < 0 \\ e^{\frac{d_i}{10}} - 1, & d_i \ge 0 \end{cases} \tag{21}$$

$$Score = \sum_{i=1}^{n} s_i \tag{22}$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} d_i^2} \tag{23}$$

In Equations (21), (22), and (23), $Rul_i$ and $\widehat{Rul}_i$ respectively are the actual RUL and the predicted RUL of the ith test engine, $d_i = \widehat{Rul}_i - Rul_i$ is the prediction error, and n is the number of test samples.
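These evaluation indexes can be computed as follows; the constants 13 and 10 follow the standard C-MAPSS scoring function, which the reconstruction above assumes.

```python
import numpy as np

def score_and_rmse(rul_true, rul_pred):
    """Equations (21)-(23): asymmetric score (late predictions, d >= 0,
    are penalized more heavily) and RMSE over the n test engines."""
    d = np.asarray(rul_pred, float) - np.asarray(rul_true, float)
    s = np.where(d < 0, np.exp(-d / 13.0) - 1.0, np.exp(d / 10.0) - 1.0)
    return s.sum(), float(np.sqrt(np.mean(d ** 2)))
```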
6-2) RUL prediction and comparison: The proposed ADLDNN is first trained on the FD001, FD002, FD003 and FD004 training sets, and then tested on the corresponding test sets. The predicted results on the four subsets are shown in the accompanying drawings.
The engine has a relatively simple degradation trend under a single operating condition, and there is a large degree of overlap between the training set and the test set. Hence, the predicted results on FD001 and FD003, obtained under a single operating condition, are superior to those on FD002 and FD004, obtained under various operating conditions. In addition, the predicted result on FD001 is more accurate than that on FD003, and the predicted result on FD002 is more accurate than that on FD004; therefore, the prediction accuracy in the single-fault mode is higher than that in the multi-fault mode. It can further be seen that the predicted result on FD003 is superior to that on FD002, which means that the number of fault modes has less impact on RUL prediction than the number of operating conditions.
In order to further show the superiority of the ADLDNN in RUL prediction, comparisons are made between the proposed method and various typical methods based on statistical models, shallow learning models, classic DL models and several recently published DL models. The scores and RMSEs calculated from the predicted results of all the above methods are shown in Table 3. As can be seen from the table, all methods show the best predictive effect on FD001 and the worst on FD004. This is because FD001 is the simplest subset, while FD004 has the most complex operating conditions and fault types and more test engines than the other subsets. All methods are more accurate on FD003 than on FD002, which further proves that the number of operating conditions and the number of engines have a greater impact on the accuracy of RUL prediction than the fault type.
As can be seen from Table 3, for the simplest subset FD001, the score and the RMSE of the result predicted by the proposed method are smaller than those of all existing methods except the Acyclic Graph Network. For complex datasets such as FD002 and FD004, however, the method shows a stronger prediction capability than the other typical methods. In addition, since the score is more practical than the RMSE in actual engineering, the ADLDNN is considered to be superior to the Acyclic Graph Network on FD003. Compared with existing typical methods, the ADLDNN is more suitable for processing complex datasets involving various operating conditions and fault types. In conclusion, the ADLDNN shows high overall performance, and can be well applied to predict the machine RUL.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may use a form of hardware only embodiments, software only embodiments, or embodiments with a combination of software and hardware. Moreover, the present disclosure may be in a form of a computer program product that is implemented on one or more computer-usable storage media (including but not limited to a magnetic disk memory, a CD-ROM, an optical memory, and the like) that include computer-usable program code.
The present disclosure is described with reference to the flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to the embodiments of the present disclosure. It should be understood that computer program instructions may be used to implement each process and/or each block in the flowcharts and/or the block diagrams and a combination of a process and/or a block in the flowcharts and/or the block diagrams. These computer program instructions may be provided for a general-purpose computer, a dedicated computer, an embedded processor, or a processor of another programmable data processing device to generate a machine, such that the instructions executed by a computer or a processor of another programmable data processing device generate an apparatus for implementing a specific function in one or more processes in the flowcharts and/or in one or more blocks in the block diagrams.
These computer program instructions may be stored in a computer-readable memory that can instruct the computer or any other programmable data processing device to work in a specific manner, such that the instructions stored in the computer-readable memory generate an artifact that includes an instruction apparatus. The instruction apparatus implements a specific function in one or more processes in the flowcharts and/or in one or more blocks in the block diagrams.
These computer program instructions may be loaded onto a computer or another programmable data processing device, such that a series of operations and steps are performed on the computer or the another programmable device, thereby generating computer-implemented processing. Therefore, the instructions executed on the computer or the another programmable device provide steps for implementing a specific function in one or more processes in the flowcharts and/or in one or more blocks in the block diagrams.
Finally, it should be noted that the above embodiments are merely intended to describe the technical solutions of the present disclosure, rather than to limit them. Although the present disclosure is described in detail with reference to the above embodiments, it is to be appreciated by those of ordinary skill in the art that modifications or equivalent substitutions may still be made to the specific implementations of the present disclosure, and any modifications or equivalent substitutions made without departing from the spirit and scope of the present disclosure shall fall within the protection scope of the claims of the present disclosure.