Effective data-driven analytics is made possible by advances in sensor technology and networked industrial machinery design. Industrial assets often have multiple sensors monitoring their operation. Through connection to the Internet of Things (IoT), the sensor data can be accessed as data streams in near real time. This increasing availability of streaming time series data has a practical purpose in detecting anomalies in the operation of the industrial asset.
An industrial asset can be, among other things and without limitation, a generator, gas turbine, power plant, manufacturing equipment on a production line, aircraft engine, wind turbine generator, locomotive, imaging device (e.g., X-ray, MRI, CT, PET, SPECT systems), or mining operation drilling equipment. Each instance of a time series data set is recorded at a certain timestamp for an asset. An event is a failure case that occurs at a certain timestamp within the time series data.
An anomaly in the time series data can indicate a change in the industrial asset's status (e.g., a change in turbine rotation). Identification of the anomaly can be beneficial in predicting faults and/or updating maintenance schedules.
Embodying systems and methods provide detection of temporal anomaly(ies) in multivariate time series data. An embodying temporal anomaly detection (TAD) algorithm can be an unsupervised, high-dimensional detector that incorporates manifold learning and provides kernel construction options. An embodying TAD algorithm can provide an anomaly score for each input sample, and also a corresponding feature/variable ranking.
A temporal anomaly can be difficult to detect, but the temporal anomaly can serve as an early warning that there could be an underlying problem with the industrial asset. An embodying TAD algorithm can provide an anomaly score for each input sample (stream of sensor data). The anomaly score indicates the likelihood that the corresponding sample is the anomaly; for example, a higher anomaly score means the sample is more likely to be the anomaly source. It should be readily understood that the invention is not so limited, and that other scales can be applied to the anomaly score.
In accordance with embodiments, along with an anomaly score, an embodying TAD algorithm can support further decision-making by performing root cause analysis. Also, the TAD algorithm can provide a corresponding feature/variable ranking, which can rank the most contributing feature(s) to the detected anomaly(ies).
An embodying TAD algorithm can have one or more of the following characteristics:
(1) Unsupervised: Anomaly detection can be unsupervised (i.e., no anomaly label is required). In this implementation, the algorithm can detect anomalies that are not well understood or quantified. The algorithm's training set is assumed to be normal. The training set need not be purely normal, as long as the abnormality represents a small portion of the training data. Any sample with a strongly deviating pattern in a testing set would be assigned a high anomaly score.
(2) High-dimensional: In this implementation there is an assumption that on the occurrence of an anomaly (e.g., component failure), an anomalous pattern appears in the time series data for multiple variables simultaneously. This approach can be effective when many variables are possibly related to the anomalies. The algorithm can effectively detect anomalous patterns from a large number of input tags (raw variables or derived variables), where the input data can have many dimensions (e.g., hundreds or even thousands).
(3) Multi-kernels: The algorithm can operate with a selection of differing options of kernel construction. The algorithm can include more than one kernel to measure the degree of anomaly, built not only upon Euclidean space (e.g., a Gaussian kernel) but also upon other linear and non-linear kernel spaces.
Options of kernel construction can include, but are not limited to, “braycurtis”, “canberra”, “chebyshev”, “cityblock”, “correlation”, “cosine”, “dice”, “euclidean”, “hamming”, “jaccard”, “kulsinski”, “mahalanobis”, “matching”, “minkowski”, “rogerstanimoto”, “russellrao”, “sokalmichener”, “sokalsneath”, “sqeuclidean”, “yule”. The kernel selection can depend on the differentiability of the data in that kernel space. The distribution of the training data set (which can be normal or near normal) should be differentiable in the selected kernel space.
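As an illustrative sketch (not the patented implementation), the multi-kernel similarity construction can be mocked up with SciPy's pairwise distance metrics, whose names match the kernel options listed above. The Gaussian-style distance-to-similarity transform and the `gamma` bandwidth are assumptions not fixed by the text:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def similarity_matrices(X, metrics=("euclidean", "cityblock", "cosine"), gamma=1.0):
    """Build one similarity matrix per kernel option.

    Pairwise distances from each metric are mapped to similarities with a
    Gaussian-style transform exp(-gamma * d**2); gamma is a hypothetical
    bandwidth parameter, not specified in the source.
    """
    mats = []
    for metric in metrics:
        d = squareform(pdist(X, metric=metric))  # pairwise distances, shape (n, n)
        mats.append(np.exp(-gamma * d ** 2))     # similarities in (0, 1]
    return mats
```

Each returned matrix corresponds to one Ai in the similarity matrix list A1:t described below.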
The raw sensor reading data variables can be transformed, step 115, to time series features (transformed variables) using temporal feature engineering techniques. In some implementations, feature transformation is calculated with a sliding window of a certain length l. Although not limiting, two types of feature transformations can be univariate and pair-wise. The input is a vector b related to one or two raw features, with length(b)=l. The output is one scalar that describes the statistics of b in a certain way.
Given a time-series vector b ∈ Rl×1, the univariate feature transformation options can include: Moving-average: mean(b); Standard deviation: std(b); Level-shift: lsf(b)=max(b)−min(b); Autocorrelation: autocorr(b); Standard deviation of delta: sdn(b)=std(diff(b)); Vibration degree: vbr(b)=std(b)×sdn(b); and Spike.
For a two-dimensional time-series vector b ∈ Rl×2, the pair-wise feature transformation options can include: Covariance: cov(b)=covariance(b:,1, b:,2); and Correlation: crl(b)=correlation(b:,1, b:,2). The transformed data set can be projected, step 120, onto a low embedding space using multi-kernel-based projection method(s).
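A minimal sketch of these sliding-window transformations follows; the function names are illustrative, and the autocorrelation and spike transformations are omitted because the excerpt does not fully specify them:

```python
import numpy as np

def univariate_features(b):
    """Scalar statistics of one window b (length l) of a single raw variable."""
    delta = np.diff(b)
    return {
        "moving_average": np.mean(b),
        "std": np.std(b),
        "level_shift": np.max(b) - np.min(b),            # lsf(b)
        "std_of_delta": np.std(delta),                   # sdn(b)
        "vibration_degree": np.std(b) * np.std(delta),   # vbr(b)
    }

def pairwise_features(b):
    """Scalar statistics of one window b (shape (l, 2)) of two raw variables."""
    return {
        "covariance": np.cov(b[:, 0], b[:, 1])[0, 1],    # cov(b)
        "correlation": np.corrcoef(b[:, 0], b[:, 1])[0, 1],  # crl(b)
    }

def sliding_windows(x, l):
    """Yield consecutive windows of length l over a time series x."""
    for start in range(len(x) - l + 1):
        yield x[start:start + l]
```

Applying these functions to every window yields the transformed feature matrix consumed in step 120.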
The training data set can be expressed as Xtrn ∈ Rn1×m and the testing data set as Xtst ∈ Rn2×m, where n1 (n2) is the number of training (or testing) samples, and m is the number of transformed features. The number of low embedding kernels is represented by k. MKP algorithm 200 can provide an anomaly score output s ∈ Rn2×1 for the testing data set.
A similarity matrix (A1:t=[A1, A2, . . . , At] for Xtrn) is constructed, step 210; where t is the number of the chosen kernel options and Ai ∈ Rn1×n1 is the similarity matrix based on each kernel (1≤i≤t). For each element of the similarity matrix (Ai in A1:t), a projection matrix is calculated, step 215.
Calculation of the projection matrix includes first calculating
Li=Di−Ai (EQ. 1)
where Di is the degree matrix of Ai; then calculating the top k eigenvectors Ψi ∈ Rn1×k with the smallest eigenvalues
λi ∈ R1×k of Li (EQ. 2)
The projection matrix
Pi ∈ Rm×k (EQ. 3)
can then be calculated from Xtrn to Ψi using elastic net regression.
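EQs. 1-3 can be sketched as follows, assuming a standard graph-Laplacian eigendecomposition and scikit-learn's ElasticNet for the regression step; the penalty settings are hypothetical and not fixed by the text:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def projection_matrix(X_trn, A, k, alpha=0.1):
    """Compute one projection matrix P_i (EQs. 1-3) for similarity matrix A.

    alpha and l1_ratio are hypothetical elastic-net penalties; the source
    does not specify them.
    """
    D = np.diag(A.sum(axis=1))          # degree matrix D_i
    L = D - A                           # graph Laplacian, EQ. 1
    eigvals, eigvecs = np.linalg.eigh(L)  # ascending eigenvalues
    psi = eigvecs[:, :k]                # top-k eigenvectors (smallest eigenvalues), EQ. 2
    lam = eigvals[:k]
    # Elastic net regression from X_trn (n1 x m) to psi (n1 x k) gives P_i (m x k), EQ. 3
    model = ElasticNet(alpha=alpha, l1_ratio=0.5, fit_intercept=False)
    model.fit(X_trn, psi)
    return model.coef_.T, lam           # P_i has shape (m, k)
```

The regression makes the embedding a linear function of the transformed features, so it can later be applied to unseen testing samples.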
After a projection matrix is calculated for each element of the similarity matrix, step 220, the MKP algorithm proceeds to steps 225-235 to calculate projected embeddings and an anomaly score matrix.
At step 225, for each projection matrix Pi, projected embeddings are calculated by applying Pi to Xtst to get
ϕi ∈Rn2×k (EQ. 4)
Corresponding elements of an anomaly score matrix are calculated, step 230, and can be expressed as
Si,j=Σp e−λi,p ϕi,j,p2 (EQ. 5)
where j is the index of the testing sample and p is the index of the eigenvector/eigenvalue pair.
The projected embeddings and corresponding anomaly score matrix elements are calculated for all elements of the projection matrix, step 235. Once all elements are calculated, a final anomaly score vector (s, where sj=Σi Si,j and j is the index of each testing sample) is computed, step 240. The anomaly score is calculated by measuring the neighborhood density of each sample in the low embedding space. The results of MKP algorithm 200 are returned to TAD algorithm 100.
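Steps 225-240 can be sketched as below; the exponential eigenvalue weighting of the embedding coordinates is one reading of the score formula in the source and should be treated as an assumption:

```python
import numpy as np

def mkp_scores(X_tst, projections):
    """Aggregate anomaly scores over all kernels (steps 225-240).

    `projections` is a list of (P_i, lambda_i) pairs, where P_i maps the m
    transformed features onto a k-dimensional embedding and lambda_i holds
    the k corresponding eigenvalues.  The per-kernel score weights each
    squared embedding coordinate by exp(-lambda); the exact weighting is an
    assumption, not confirmed by the source.
    """
    s = np.zeros(X_tst.shape[0])
    for P, lam in projections:
        phi = X_tst @ P                              # projected embeddings, EQ. 4
        S_i = (np.exp(-lam) * phi ** 2).sum(axis=1)  # per-sample scores S_{i,j}
        s += S_i                                     # s_j = sum_i S_{i,j}, step 240
    return s
```

Summing over kernels means a sample must look normal in every kernel space to receive a low final score.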
With reference again to
The data store can include sensor data records 326 that contain operational data monitored by sensor suite 355 in industrial asset 350. Only one industrial asset is depicted; however, there can be multiple industrial assets, each including a sensor suite that provides monitored data across electronic communication network 340 to data store 320. The data store can also include TAD algorithm 324, MKP algorithm 330, training data set 330, and testing data set 332.
In some embodiments, the anomaly score and variable ranking outputs of TAD algorithm 100 (step 125) can be presented graphically.
TAD output curve 410 represents monitored sensor data over a time period from a single sensor. This output curve depicts the data in time series feature space after processing by TAD algorithm 100. Failure spike 440 represents a failure in an industrial asset (e.g., a lean blowout (LBO) in a turbine generator). Operator effect spike 450 indicates failure propagation.
Alert spikes 430, 432 each represent a failure event in the industrial asset. These spikes occur at different times, and each exceeds user-predefined threshold 420. Analysis of TAD output curve 410 shows that the TAD algorithm successfully detected a problem prior to the occurrence of the LBO failure event. This early detection can provide an opportunity for an early response to prevent failure events and their subsequent failure propagation.
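The thresholding that produces the alert spikes can be sketched simply; the threshold value is user-defined, as stated above:

```python
import numpy as np

def alert_indices(scores, threshold):
    """Return the indices (timestamps) whose anomaly score exceeds the
    user-predefined threshold, i.e., the alert spikes."""
    return np.flatnonzero(np.asarray(scores) > threshold)
```

Each returned index marks a point on the TAD output curve where an alert would be raised.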
In accordance with some embodiments, a computer program application stored in non-volatile memory or computer-readable medium (e.g., register memory, processor cache, RAM, ROM, hard drive, flash memory, CD ROM, magnetic media, etc.) may include code or executable program instructions that when executed may instruct and/or cause a controller or processor to perform methods discussed herein such as a method of temporal anomaly detection and fault analysis, as disclosed above.
The computer-readable medium may be a non-transitory computer-readable media including all forms and types of memory and all computer-readable media except for a transitory, propagating signal. In one implementation, the non-volatile memory or computer-readable medium may be external memory.
Although specific hardware and methods have been described herein, note that any number of other configurations may be provided in accordance with embodiments of the invention. Thus, while there have been shown, described, and pointed out fundamental novel features of the invention, it will be understood that various omissions, substitutions, and changes in the form and details of the illustrated embodiments, and in their operation, may be made by those skilled in the art without departing from the spirit and scope of the invention. Substitutions of elements from one embodiment to another are also fully intended and contemplated. The invention is defined solely with regard to the claims appended hereto, and equivalents of the recitations therein.
Number | Date | Country
---|---|---
20200133253 A1 | Apr 2020 | US