The present disclosure generally relates to systems and methods of detecting drift in data collected over time.
Data is collected over time from sensors on aircraft and other machines and vehicles for a variety of purposes. Classifiers, unsupervised or supervised, may be used to analyze such data. Common uses of classifiers may include, but are not limited to, analysis of performance of a component (e.g., an aircraft component), machine or system, as well as identification of a need for inspection, maintenance or repair of such. The use of a classifier is based on the assumption that the incoming data points are statistically similar to the data set that was used to train the classifier.
Drift is a phenomenon by which data collected from sensors monitoring a component, machine, or the like starts to look different over time than what is expected. This may happen for a number of reasons, such as aging sensors, aging of the parts that the sensors monitor, or a change in the operating environment of the part or machine relative to the environment under which the system that monitors the sensor data was trained.
Typically, there are two different approaches to addressing data drift. The first analyzes the distribution of the data and determines whether parameters of the distribution have changed. For example, the statistics of each individual sensor that contributes to the data stream are examined separately for drift. Once a statistically significant change has been detected, the classifier is retrained to update the classifier operation. The disadvantage of this approach is that it requires as many thresholds as there are sensors and does not account for the correlation among sensor variables. The approach can also be unwieldy because of the number of distribution parameters that must be monitored (which can grow up to the number of sensor variables).
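Purely for illustration, a minimal sketch of this first, per-sensor approach (not the method of the present disclosure), assuming each sensor's mean is checked against its own baseline with a simple standard-deviation threshold:

```python
import numpy as np

def per_sensor_drift_flags(baseline, current, n_sigma=3.0):
    """Flag drift independently for each sensor column.

    baseline, current: arrays of shape (num_samples, num_sensors). A sensor is
    flagged when the mean of its current data falls more than n_sigma baseline
    standard deviations away from its baseline mean.
    """
    mu = baseline.mean(axis=0)
    sigma = baseline.std(axis=0) + 1e-9          # avoid division by zero
    shift = np.abs(current.mean(axis=0) - mu) / sigma
    return shift > n_sigma                       # one threshold per sensor

# With 24 sensors this already implies 24 separate thresholds, and the check
# ignores any correlation among the sensor variables.
rng = np.random.default_rng(0)
flags = per_sensor_drift_flags(rng.normal(size=(1000, 24)),
                               rng.normal(loc=0.5, size=(200, 24)))
print(int(flags.sum()), "of 24 sensors flagged")
```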
A second method of assessing data drift analyzes the error rate or error-driven statistics of the classification, for example by tracking the changing statistics of multiple elements of a multi-class confusion matrix and detecting drift based on significantly changed statistics of any of the elements as a function of time. The disadvantage of this methodology is that it only applies to supervised learning classifiers (not unsupervised learning classifiers), since computing the confusion matrix requires labeled data.
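Again purely for illustration of this second, error-driven approach (and its reliance on labels), a sketch that tracks how the per-class rates of a confusion matrix change between a reference window and a current window; the 5% tolerance is an arbitrary assumption:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def drifted_elements(reference_cm, current_cm, tolerance=0.05):
    """Flag confusion-matrix cells whose per-class rate changed by more than
    `tolerance` between the reference and current windows. Labeled data is
    required in every monitoring window."""
    ref_rates = reference_cm / reference_cm.sum(axis=1, keepdims=True)
    cur_rates = current_cm / current_cm.sum(axis=1, keepdims=True)
    return np.abs(cur_rates - ref_rates) > tolerance

rng = np.random.default_rng(1)
y_true = rng.integers(0, 3, size=500)
reference = confusion_matrix(y_true, y_true, 3)                 # reference window
current = confusion_matrix(y_true, rng.integers(0, 3, 500), 3)  # degraded window
print("drift flagged:", bool(drifted_elements(reference, current).any()))
```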
Undetected drift in the data stream received from sensors affects the analysis and possibly the conclusions drawn from the sensor data. An effective method of detecting drift in data streams received by a supervised or unsupervised classifier is therefore desired.
In accordance with one aspect of the disclosure, a data drift detection system is disclosed. The data drift detection system may comprise: an autoencoder configured to receive a first set of sensor data and a second set of sensor data; a training controller; and a testing controller configured to test the autoencoder with the second set of sensor data. The training controller may be configured to: train the autoencoder based on a first portion of the first set of sensor data; set an initial threshold to a value x at a percentile threshold of an empirical distribution of the reconstruction errors of the first portion of the first set of sensor data after decoding by a decoder layer of the autoencoder; and determine a final threshold based on a comparison of an empirical distribution of the reconstruction errors of a second portion of the first set of sensor data to the empirical distribution of the reconstruction errors of the first portion of the first set of sensor data.
In accordance with another aspect of the disclosure, a method for detecting data drift is disclosed. The method may comprise: training an autoencoder, and testing the autoencoder with a second set of sensor data detected by one or more sensors. The training may include: initializing the autoencoder, the autoencoder including an input layer, an encoder layer and a decoder layer; training the autoencoder based on a first portion of a first set of sensor data; setting an initial threshold to a value x at a percentile threshold of an empirical distribution of the reconstruction errors of the first portion of the first set of sensor data after decoding by the decoder layer; and determining a final threshold based on a comparison of an empirical distribution of the reconstruction errors of a second portion of the first set of sensor data to the empirical distribution of the reconstruction errors of the first portion of the first set of sensor data. The testing of the autoencoder with a second set of sensor data detected by one or more sensors may comprise: for an empirical distribution of the reconstruction errors of the second set of sensor data after decoding by the decoder layer, determining a value of a reconstruction error at the percentile threshold; determining that data drift is not present when the reconstruction error of the second set of sensor data is less than the final threshold; and calculating a deviation output for at least one of the one or more sensors.
In accordance with a further aspect of the disclosure, a method for detecting drift in data captured by a plurality of sensors monitoring an operation of an aircraft system is disclosed. The method may comprise: training a three-layer autoencoder; testing the autoencoder with a second set of sensor data from a second plurality of sensors; and, after the training and the testing of the autoencoder, detecting whether data drift is present in a third set of sensor data received from a third plurality of sensors. The training may include initializing the autoencoder, the autoencoder including an input layer, an encoder layer and a decoder layer. The training may further include: training the autoencoder based on a first portion of a first set of sensor data, the first set of sensor data detected by a first plurality of sensors; setting an initial threshold to a value x at a percentile threshold of an empirical distribution of the reconstruction errors of the first portion of the first set of sensor data after decoding by the decoder layer, wherein the percentile threshold is in a range of 90 to 99.7; comparing an empirical distribution of the reconstruction errors of a second portion of the first set of sensor data to the empirical distribution of the reconstruction errors of the first portion of the first set of sensor data; and determining a final threshold based on a result of the comparing. The testing of the autoencoder with the second set of sensor data from the second plurality of sensors may comprise: receiving, encoding and decoding the second set of sensor data with the autoencoder; for an empirical distribution of the reconstruction errors of the second set of sensor data after decoding by the decoder layer, determining a value of a reconstruction error at the percentile threshold; comparing the reconstruction error of the second set of sensor data with the final threshold; determining that data drift is not present in the second set of sensor data when the reconstruction error of the second set of sensor data is less than the final threshold; and calculating a deviation output for one or more sensors in the second plurality of sensors.
The autoencoder 104 is configured to be an unsupervised neural network 110 that includes an input layer 112, an encoder 113 and a decoder layer 116. The autoencoder 104 is configured to learn efficient data codings in an unsupervised manner. More specifically, the autoencoder 104 learns a representation (encoding) for a set of data, typically for dimensionality reduction, by training the neural network 110 to ignore signal noise.
The input layer 112 receives data from one or more sensors 106 (“sensor data”) and includes a plurality of neurons 122.
Each sensor 106 is configured to measure an operating or performance parameter of the monitored component, machine or system.
The encoder 113 includes one or more encoder layers 114. Each encoder layer 114 includes a plurality of neurons 122.
The decoder layer 116 is configured to reconstruct the data received from the encoder 113 into the form of the data received by the input layer 112. The size of the decoder layer 116 is set equal to the size of the input layer 112 so that the reconstructed output has the same dimension as the input. This allows a direct comparison of the input with the reconstructed output and, from such comparison, a measurement of the drift (if any) that may be present. The difference between the input and the reconstructed output may be used as a measure of how well the encoder 113 has learned the structure in the sensor data.
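As a minimal sketch of this input-versus-reconstruction comparison, assuming a per-window mean-squared error as the reconstruction-error metric (the disclosure does not prescribe a specific metric):

```python
import numpy as np

def reconstruction_errors(inputs, reconstructions):
    """Per-window reconstruction error between the data received by the input
    layer and the output of the decoder layer; both arrays have shape
    (num_windows, input_dim). Mean-squared error is assumed here."""
    return np.mean((inputs - reconstructions) ** 2, axis=1)

# Example: 100 windows of 240 values each (24 sensors x 10 seconds); the
# decoder output is faked with small noise so the sketch stays runnable.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 240))
x_hat = x + rng.normal(scale=0.1, size=x.shape)
e = reconstruction_errors(x, x_hat)
print(e.shape)   # (100,) -- one error value per input window
```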
The training controller 102a is configured to train the autoencoder 104 based on a first set of sensor data, which is divided into a training set (a first portion of the first set of sensor data) and a holdout set (a second portion of the first set of sensor data).
The training controller 102a may be configured to determine empirical distributions of the reconstruction errors of the training set of sensor data after decoding by the decoder layer 116 and to determine empirical distributions of the reconstruction errors of the holdout set of sensor data after decoding by the decoder layer 116, as described later herein. The controller 102a is configured to set an initial threshold to a value “x” that is the value at a selected percentile (“percentile threshold”) of an empirical distribution of the reconstruction errors of the training set. The training controller 102a is configured to assess the initial threshold x using the holdout set of sensor data and to determine a final threshold for the autoencoder 104, as described later herein.
The testing controller 102b is configured to determine for an empirical distribution of the reconstruction errors of a sample set of sensor data (after decoding by the decoder layer 116) a value of the reconstruction error (e) at the percentile threshold that was previously determined for the autoencoder 104 during training, as discussed later herein. The testing controller 102b is configured to compare the value of the reconstruction error (e) of the sample set of sensor data with the value of the final threshold that was determined with the training data and to determine if data drift is present or not, as described later herein. The testing controller 102b may be configured to calculate a deviation output, if any, for one or more of the sensors 106 in the plurality of sensors 106 from which the sample set of sensor data was received and may be configured to transmit the result (data drift present or not present) and/or the deviation output, if any, to the output interface 108.
The testing controller 102c is configured to determine for an empirical distribution of the reconstruction errors of a set of sensor data (after decoding by the decoder layer) a value of the reconstruction error (e) at the percentile threshold that was previously determined for the autoencoder 104, as discussed later herein. The testing controller 102c is configured to compare the value of the reconstruction error (e) of the set of sensor data with the value of the final threshold that was set for the autoencoder 104 and to determine if data drift is present or not, as described later herein. The testing controller 102c may be configured to calculate a deviation output, if any, for one or more of the sensors 106 in the plurality of sensors 106 from which the set of sensor data was received and may be configured to transmit the result (data drift present or not present) and/or the deviation output, if any, to the output interface 108.
Each controller 102a, 102b, 102c (collectively, controller 102) may include a processor 118 and a memory component 120. The processor 118 may be a microcontroller, a digital signal processor (DSP), an electronic control module (ECM), an electronic control unit (ECU), a microprocessor or any other suitable processor 118 as known in the art. The processor 118 may execute instructions and generate control signals for executing appropriate blocks of the methods described herein. Such instructions may be read into or incorporated into a computer readable medium, such as the memory component 120 or provided external to the processor 118. In alternative examples, hard wired circuitry may be used in place of, or in combination with, software instructions to implement a control method.
The term “computer readable medium” as used herein refers to any non-transitory medium or combination of media that participates in providing instructions to the processor 118 for execution. Such a medium may comprise all computer readable media except for a transitory, propagating signal. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, or any other computer readable medium.
Each controller 102a, 102b, 102c is not limited to one processor 118 and memory component 120. The controller 102a, 102b, 102c may include several processors 118 and memory components 120. In an example, the processors 118 may be parallel processors that have access to a shared memory component(s) 120. In another example, the processors 118 may be part of a distributed computing system in which a processor 118 (and its associated memory component 120) may be located remotely from one or more other processor(s) 118 (and associated memory components 120) that are part of the distributed computing system. The controller 102a, 102b, 102c may also be configured to retrieve from the memory component 120 formulas and other data necessary for the calculations discussed herein.
Referring now to the exemplary method 200 for training the autoencoder 104 to detect data drift, the blocks of the method 200 are described below.
Block 210 includes initializing the autoencoder 104. Initializing includes setting up a z-layer autoencoder 104. In the exemplary embodiment described here, z=3 and thus, the autoencoder 104 is a three-layer autoencoder. Namely, the autoencoder 104 includes an input layer 112, an encoder layer 114 (i.e., the encoder 113 includes a single encoder layer) and a decoder layer 116. As mentioned earlier, a larger number of layers may be utilized (e.g., the encoder 113 may include multiple encoder layers 114).
The input data may be captured by each of “i” sensors for a sliding n-second window of time. For example, in the exemplary embodiment, i=24, n=10 and the interval is one (1) second; thus, there are twenty-four (24) sensors 106 (also known as sensor variables) configured to measure operating or performance parameters of the component, machine or system (in the exemplary embodiment, cabin air compressors of an aircraft during flight), and input was captured by each of the sensors 106 for each second (interval) of a 10-second window of time. Thus, the input to the input layer 112 is a vector of two-hundred forty (240) data points (24 sensors × 10 intervals). In other embodiments, a different interval may be utilized.
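For illustration, a sketch of how the 240-element input vectors might be assembled from 24 sensors sampled once per second over a sliding 10-second window; the exact ordering of sensor values within each vector is an assumption:

```python
import numpy as np

NUM_SENSORS = 24      # i
WINDOW_SECONDS = 10   # n (with a 1-second interval)

def sliding_windows(samples):
    """samples: array of shape (num_seconds, NUM_SENSORS), one row per interval.

    Returns an array of shape (num_windows, NUM_SENSORS * WINDOW_SECONDS);
    each row is one flattened 240-element input vector for the autoencoder.
    """
    num_windows = samples.shape[0] - WINDOW_SECONDS + 1
    return np.stack([samples[t:t + WINDOW_SECONDS].reshape(-1)
                     for t in range(num_windows)])

rng = np.random.default_rng(0)
raw = rng.normal(size=(3600, NUM_SENSORS))   # one hour of 1 Hz sensor data
windows = sliding_windows(raw)
print(windows.shape)                         # (3591, 240)
```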
The size of the input layer 112 should be set equal to the size of the input dimension to be received. In the exemplary embodiment, the size of the input layer 112 is set equal to two-hundred forty (240) neurons. The quantity of neurons 122 in the encoder layer 114 may be set in a range that allows the output of the decoder layer 116 to closely reproduce the sensor data received by the input layer 112. In the exemplary embodiment, the quantity of neurons 122 in the encoder layer 114 may be in the range of 60-75% of the size of the input dimension, inclusive of the endpoints of the range. For example, in the exemplary embodiment the quantity of neurons 122 in the encoder layer 114 was one hundred fifty (150). The size of the decoder layer 116 is set equal to the size of the input layer 112. For example, in the exemplary embodiment, the size of the decoder layer 116 is set equal to two-hundred forty (240) neurons.
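A minimal sketch of the three-layer architecture described above, with a 240-neuron input layer, a 150-neuron encoder layer and a 240-neuron decoder layer; the use of PyTorch and the specific activation, loss and optimizer choices are assumptions, not requirements of the disclosure:

```python
import torch
from torch import nn

INPUT_DIM = 240     # 24 sensors x 10 one-second intervals
ENCODER_DIM = 150   # within the 60-75% range of the input dimension

class DriftAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(INPUT_DIM, ENCODER_DIM), nn.ReLU())
        self.decoder = nn.Linear(ENCODER_DIM, INPUT_DIM)   # same size as input layer

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_autoencoder(model, training_set, epochs=50, lr=1e-3):
    """Unsupervised training: the target is the input itself."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(training_set), training_set)
        loss.backward()
        optimizer.step()
    return model

model = train_autoencoder(DriftAutoencoder(), torch.randn(1000, INPUT_DIM))
```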
Referring back to the method 200, block 220 includes training the autoencoder 104 with a training set of sensor data, the training set being a first portion of a first set of sensor data received from the sensors 106; a second portion of the first set of sensor data is reserved as a holdout set.
Block 230 includes setting an initial threshold to a value “x” that is the value at a selected percentile (“percentile threshold”) of an empirical distribution of the reconstruction errors of the training set. The empirical distribution of the reconstruction errors of the training set may be generated and determined by the training controller 102a. In one embodiment, the percentile threshold may be set to a high percentile, indicative of a value “x” that is unlikely to occur. In some embodiments it may be appropriate to set the initial threshold value “x” equal to the value at a percentile threshold in the range of the 90th to the 99.7th percentile of the empirical distribution of the reconstruction errors of the training set after decoding by the decoder layer 116. In another exemplary embodiment, the initial threshold value “x” may be set equal to the value at the 99.7th percentile of the empirical distribution of the reconstruction errors of the training set after decoding by the decoder layer 116. In embodiments in which the empirical distribution of the reconstruction errors of the training set is assumed to be a normal distribution, the 99.7th percentile corresponds approximately to the mean of the reconstruction error for the decoder layer 116 (of the autoencoder 104) plus three standard deviations.
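A sketch of block 230 under the assumptions already noted (per-window mean-squared errors, NumPy's empirical percentile routine); the 99.7th percentile used below is one of the percentile thresholds named above:

```python
import numpy as np

PERCENTILE_THRESHOLD = 99.7   # any value in the 90th-99.7th percentile range

def initial_threshold(training_errors, percentile=PERCENTILE_THRESHOLD):
    """Block 230: the value x at the selected percentile of the empirical
    distribution of the training-set reconstruction errors."""
    return float(np.percentile(training_errors, percentile))

rng = np.random.default_rng(0)
training_errors = rng.gamma(shape=2.0, scale=0.05, size=10_000)   # stand-in errors
x = initial_threshold(training_errors)
print(f"initial threshold x = {x:.4f}")
```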
Block 240 includes assessing by the training controller 102a the initial threshold x using the holdout set of sensor data. In block 240, the holdout set of sensor data is input to the input layer of the autoencoder 104. The encoder layer 114 receives the holdout set from the input layer 112, encodes the holdout set and provides the encoded holdout set to the decoder layer 116. The decoder layer 116 decodes the encoded holdout set of sensor data. An empirical distribution of the reconstruction errors of the holdout set after decoding by the decoder layer is determined by the training controller 102a. The controller 102a compares the reconstruction errors of the holdout set of sensor data to the reconstruction errors of the training set. More specifically, the training controller 102a calculates the percentage “y” of the reconstruction errors of the holdout set that are greater than the value that was set for the initial threshold “x”.
Block 250 includes determining by the training controller 102a a final threshold for the autoencoder 104 based on the result of block 240 and Equation (1) below:
100 − y = d      Equation (1)
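A sketch of blocks 240-250 under stated assumptions. The percentage y and the quantity d of Equation (1) follow directly from the text; however, the rule that turns d into the final threshold is only summarized here, so the decision below (keep x when d meets the selected percentile, otherwise fall back to that percentile of the holdout errors) is an assumption made for illustration, not the disclosed rule:

```python
import numpy as np

def assess_and_finalize(x, holdout_errors, percentile=99.7):
    """Blocks 240-250, sketched under stated assumptions.

    y: percentage of holdout reconstruction errors greater than the initial
       threshold x (block 240).
    d: 100 - y, per Equation (1).
    """
    y = 100.0 * np.mean(holdout_errors > x)
    d = 100.0 - y                                    # Equation (1)
    if d >= percentile:
        return x                                     # holdout agrees with training
    # Assumed fallback when too many holdout errors exceed x:
    return float(np.percentile(holdout_errors, percentile))

rng = np.random.default_rng(1)
holdout_errors = rng.gamma(shape=2.0, scale=0.05, size=2_000)    # stand-in errors
final_threshold = assess_and_finalize(x=0.35, holdout_errors=holdout_errors)
print(f"final threshold = {final_threshold:.4f}")
```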
Referring now to the method 300 of testing the autoencoder 104 with a sample set of sensor data, the blocks of the method 300 are described below.
Block 310 includes receiving, encoding and decoding a sample set of sensor data with the autoencoder 104. The sample set of data is received in the same format as the training data. For example, in the exemplary embodiment, the sample set of data is from twenty-four (24) sensors 106 configured to measure operating parameters of cabin air compressors of an aircraft during flight and input was captured by each of the sensors 106 for each second (interval) of a 10-second window of time, yielding a 240 datapoint vector input to the input layer.
Block 320 includes determining, by the testing controller 102b, for an empirical distribution of the reconstruction errors of the sample set (after decoding by the decoder layer) a value of the reconstruction error (e) at the percentile threshold that was previously determined for the autoencoder 104 during training.
Block 330 includes comparing by the testing controller 102b the value of the reconstruction error (e) of the sample set of sensor data determined in block 320 with the value of the final threshold that was determined with the training data. If the value of the reconstruction error (e) is less than the final threshold, the testing controller 102b determines that data drift is not present and the autoencoder 104 is trained appropriately (block 334). When the reconstruction error (e) is greater than or equal to the final threshold, the testing controller 102b determines that data drift is present in the sample set of sensor data (block 336) and the autoencoder 104 should be retrained according to the method 200.
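A sketch of blocks 320-330, assuming the same per-window reconstruction errors and the same percentile threshold used during training:

```python
import numpy as np

def drift_present(sample_errors, final_threshold, percentile=99.7):
    """Blocks 320-330: take the reconstruction error (e) at the selected
    percentile of the sample-set error distribution and compare it with the
    final threshold. Returns True when data drift is deemed present."""
    e = np.percentile(sample_errors, percentile)
    return bool(e >= final_threshold)

rng = np.random.default_rng(2)
sample_errors = rng.gamma(shape=2.0, scale=0.05, size=1_000)   # stand-in errors
if drift_present(sample_errors, final_threshold=0.35):
    print("data drift present -- retrain the autoencoder (method 200)")
else:
    print("no data drift -- autoencoder trained appropriately")
```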
Block 340 includes calculating by the testing controller 102b a deviation output, if any, for one or more sensors in the plurality of sensors from which the sample set of sensor data was received.
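The calculation behind the deviation output of block 340 is not detailed in this excerpt; one plausible sketch, consistent with the windowing assumed earlier, attributes the reconstruction error to individual sensors by averaging the squared error over each sensor's positions within the 240-element window:

```python
import numpy as np

NUM_SENSORS = 24
WINDOW_SECONDS = 10

def per_sensor_deviation(inputs, reconstructions):
    """Hypothetical deviation output: the mean squared reconstruction error
    attributed to each sensor's positions within the flattened window. This
    attribution scheme is an assumption, not the disclosed calculation."""
    err = (inputs - reconstructions) ** 2
    err = err.reshape(err.shape[0], WINDOW_SECONDS, NUM_SENSORS)
    return err.mean(axis=(0, 1))      # one deviation value per sensor

rng = np.random.default_rng(3)
x = rng.normal(size=(200, 240))
x_hat = x + rng.normal(scale=0.1, size=x.shape)   # stand-in decoder output
deviation = per_sensor_deviation(x, x_hat)
print("sensor with largest deviation:", int(deviation.argmax()))
```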
Block 350 includes transmitting the result (data drift present or not present) and/or the deviation output, if any, from the testing controller 102b to the output interface 108.
Block 510 includes receiving, encoding and decoding the set of data with the autoencoder 104. The set of data is received in the same format as the training data. For example, in the exemplary embodiment, the set of data is from twenty-four (24) sensors 106 configured to measure operating or performance parameters of cabin air compressors of an aircraft during flight and input was captured by each of the sensors 106 for each interval (in the exemplary embodiment the interval is a second) of a 10-second window of time, yielding a 240 datapoint vector input to the input layer 112.
Block 520 includes determining, by a testing controller 102c, for an empirical distribution of the reconstruction errors of the set of sensor data (after decoding by the decoder layer 116) a value of the reconstruction error (e) at the percentile threshold that was previously determined for the autoencoder 104 during training.
Block 530 includes comparing the value of the reconstruction error (e) of the set of sensor data with the value of the final threshold set for the autoencoder 104. If the value of the reconstruction error (e) is less than the final threshold, the testing controller 102c determines, in block 534, that data drift is not present, and the method 500 proceeds to block 540. When the reconstruction error (e) is greater than or equal to the final threshold, the testing controller 102c determines, in block 536, that data drift is present in the set of sensor data, and the method 500 proceeds to block 540.
Block 540 includes calculating, by the testing controller 102c, a deviation output, if any, for one or more sensors 106 in the plurality of sensors 106 from which the set of sensor data was received.
Block 550 includes transmitting the result (data drift present or not present) and/or the deviation output, if any, from the testing controller 102c to the output interface 108.
Also disclosed is a method 200, 300 for training 200 and testing 300 an autoencoder 104 to detect data drift. The method may comprise training 200 the autoencoder 104, as described above with respect to the method 200, and testing 300 the autoencoder 104 with a second set of sensor data detected by one or more sensors 106, as described above with respect to the method 300.
Also disclosed is a method 200, 300, 500 for detecting drift in data captured by a plurality of sensors 106 monitoring an operation of an aircraft system 132. In an embodiment, the method may comprise: training 200 a three-layer autoencoder 104 with a first set of sensor data detected by a first plurality of sensors 106; testing 300 the autoencoder 104 with a second set of sensor data from a second plurality of sensors 106; and, after the training and the testing, detecting 500 whether data drift is present in a third set of sensor data received from a third plurality of sensors 106, as described above with respect to the methods 200, 300 and 500.
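Tying the preceding sketches together, a compact end-to-end illustration of the train/assess/detect flow; the degenerate stand-in "autoencoder" and the final-threshold fallback rule are assumptions made only to keep the sketch self-contained and runnable:

```python
import numpy as np

PERCENTILE = 99.7

def errors(data, autoencoder):
    """Per-window mean-squared reconstruction error (assumed metric)."""
    return np.mean((data - autoencoder(data)) ** 2, axis=1)

def train_thresholds(first_set, autoencoder):
    train, holdout = first_set[:800], first_set[800:]            # first / second portion
    x = np.percentile(errors(train, autoencoder), PERCENTILE)    # initial threshold
    y = 100.0 * np.mean(errors(holdout, autoencoder) > x)
    d = 100.0 - y                                                # Equation (1)
    if d >= PERCENTILE:
        return float(x)
    return float(np.percentile(errors(holdout, autoencoder), PERCENTILE))  # assumed fallback

def drift_detected(data, autoencoder, final_threshold):
    return bool(np.percentile(errors(data, autoencoder), PERCENTILE) >= final_threshold)

# Degenerate stand-in "autoencoder" that reconstructs every window as the
# training mean (zero); a real trained autoencoder would be used in practice.
mean_predictor = lambda z: np.zeros_like(z)

rng = np.random.default_rng(4)
final_threshold = train_thresholds(rng.normal(size=(1000, 240)), mean_predictor)
shifted = rng.normal(loc=0.4, size=(500, 240))                   # drifted sensor data
print("drift detected:", drift_detected(shifted, mean_predictor, final_threshold))
```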
In general, the foregoing disclosure finds utility in applications relating to automatic detection of data drift in data collected by sensors or the like. In particular, use of the teachings herein improves the ability to detect when a machine or a component may need maintenance or replacement by providing a tool that automatically detects changes in the values of the parameters monitored by sensors. The use of an unsupervised neural network (autoencoder) obviates the need for classifier labels to assess change in accuracy for drift analysis. The use of the reconstruction error (e) as a way to assess the amount of deviation (if any) in specific sensors enables identification of machines or components that may need maintenance, or of sensors that may need to be replaced. The method disclosed herein is data-driven and does not rely on the physics of the data-generating device or machine. This is an advantage because the method can be applied to any data-generating machine and is not restricted to any particular one. Because the method is data-driven, it is not necessary to identify, prior to drift detection, the environmental conditions under which drift appears. Furthermore, it is not necessary to compute the statistics of each individual sensor and set a separate threshold for each sensor in order to detect drift. The method may use a single threshold (the final threshold) to detect drift. Some data-drift detection methods detect drift by watching how classifier results change over time. The method disclosed herein does not rely on classifier results and thus can be used in conjunction with both supervised classifiers and unsupervised classifiers (that is, classifiers that do not require labels) for quicker detection of data drift.
While the preceding text sets forth a detailed description of numerous different examples, it should be understood that the legal scope of protection is defined by the words of the claims set forth at the end of this patent. The detailed description is to be construed as exemplary only and does not describe every possible embodiment since describing every possible embodiment would be impractical, if not impossible. Numerous alternative embodiments could be implemented, using either current technology or technology developed after the filing date of this patent, which would still fall within the scope of the claims defining the scope of protection.
It should also be understood that, unless a term was expressly defined herein, there is no intent to limit the meaning of that term, either expressly or by implication, beyond its plain or ordinary meaning, and such term should not be interpreted to be limited in scope based on any statement made in any section of this patent (other than the language of the claims). To the extent that any term recited in the claims at the end of this patent is referred to herein in a manner consistent with a single meaning, that is done for sake of clarity only so as to not confuse the reader, and it is not intended that such claim term be limited, by implication or otherwise, to that single meaning.
This application claims priority to, and the benefit of, U.S. Provisional Patent Application No. 63/127,378 filed Dec. 18, 2020, the disclosure of which is hereby incorporated by reference in its entirety.