Embodiments of the invention relate to anomaly detection in various systems using sensor data.
Sensors are often used in systems, such as power systems, for various purposes. For example, sensors are attached to a wind turbine to take measurements including real-time power outputs, air pressure, air temperature, etc. These measurements are used for monitoring the operating conditions of a power system device. Analyzing the data measured by the sensors and detecting anomalies in the sensor data are the basis for early warning of potential faults of the device.
Anomalies are abnormal and minor patterns emerging in the measurements that distinguish themselves from normal and major patterns. Anomalies can have a variety of lengths, magnitudes, and shapes. In terms of their durations, these anomalies can be broadly classified into two major categories: 1) anomalous points, where the measured values at these points are considerably away from normal values, and 2) anomalous intervals, where the measured values look normal if investigated point-wise, while the interval as a whole presents abnormal patterns.
Effective methods are needed for automatically detecting anomalies in the sensor data, especially when many devices in the system need to be monitored simultaneously. Successful methods for anomaly detection rely on accurate models of the system under consideration to capture the discrepancy between the actual sensor measurements and the model outputs, for all possible operating conditions, and thereby to detect unanticipated events. These methods capture unexpected signatures, and suggest which residuals are normal and which result from abnormal conditions.
A variety of techniques have been proposed for anomaly detection based on estimation theory, failure sensitive filters, multiple hypothesis filter detection, generalized likelihood ratio tests, model-based approaches, statistical analysis, and information theory.
The process of building a system and program for detecting anomalies in the sensor data for monitoring the running conditions of power system devices generally consists of the following stages: 1) the stage of collecting data measured by the sensors attached to the devices and storing the collected data in a database, 2) the stage of exploring the collected data and choosing a proper technique or model to be used for the task, 3) the stage of selecting or computing the best structure of the chosen model, 4) the stage of determining or computing the best parameters of the chosen model with the determined structure, and 5) the stage of deploying the built system and program to the power system to monitor the running conditions of the devices.
The relationship between the effectiveness and performance of the chosen model for anomaly detection and its structure and parameters can be complex and generally nonlinear. Therefore, there is a need for an effective technique to improve the performance of anomaly detection in the running conditions of power system devices.
According to one embodiment of the invention, a computer-implemented method is provided for detecting an anomaly condition of a device having attached sensors. The method includes: building one or more models to establish normal behaviors of the device by analyzing historical sensor data of the device; applying the one or more models to target sensor data of the device to compute one or more anomaly scores of the device; and reporting a condition of the device based on an analysis of the one or more anomaly scores. Building the one or more models further comprises: identifying at least one optimization problem for each of the models; constructing a dynamical system such that stable equilibrium points (SEPs) of the dynamical system have one-to-one correspondence with local optimal solutions of the at least one optimization problem; finding the local optimal solutions by computing the SEPs of the dynamical system; and identifying a global optimal solution to the at least one optimization problem among the local optimal solutions.
In another embodiment, a system is provided for detecting an anomaly condition of a device having attached sensors. The system includes data storage to store historical sensor data of the device; a data analysis module coupled to the data storage and adapted to: build one or more models to establish normal behaviors of the device by analyzing the historical sensor data, and apply the one or more models to target sensor data of the device to compute one or more anomaly scores of the device; and a condition reporting module coupled to the data storage and adapted to report a condition of the device based on an analysis of the one or more anomaly scores. The data analysis module further includes a model building unit adapted to: identify at least one optimization problem for each of the models; construct a dynamical system such that SEPs of the dynamical system have one-to-one correspondence with local optimal solutions of the at least one optimization problem; find the local optimal solutions by computing the SEPs of the dynamical system; and identify a global optimal solution to the at least one optimization problem among the local optimal solutions.
In yet another embodiment, a non-transitory computer readable storage medium includes instructions that, when executed by a computer system, cause the computer system to perform the aforementioned method for detecting an anomaly condition of a device having attached sensors.
Embodiments are illustrated by way of example and not limitation in the Figures of the accompanying drawings:
In the following description, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures and techniques have not been shown in detail in order not to obscure the understanding of this description. It will be appreciated, however, by one skilled in the art, that the invention may be practiced without such specific details. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.
To realize a system and method of improved performance for detecting anomalies in the sensor data for monitoring the running conditions of a device, it is desirable to incorporate in the process of model building a deterministic optimization method that can not only escape from a local optimal solution, but also compute multiple local optimal solutions to the involved optimization problem.
A method, system, apparatus and computer programs encoded on computer storage media, for detecting anomalies in various systems are described herein. Although power system devices are mentioned as examples in the following description, it is understood that embodiments of the invention can be applied to any devices having attached sensors. In one embodiment, the method includes receiving and storing a plurality of measured values from a plurality of sensors monitoring the performance of a power system device. The method includes building a plurality of models to establish normal behaviors of the power system device by analyzing the stored data. The models include a predictive model, a clustering model, and a statistical model. The method includes executing the plurality of normal models on the received sensor data to compute scores regarding the condition of the device. The method includes assessing the condition of the device by analyzing the computed scores. The method includes reporting the condition of the device.
In one embodiment, a plurality of TRUST-TECH enhanced models are built to establish normal behaviors of the power system device by analyzing the plurality of data stored. In one embodiment, the models include a TRUST-TECH enhanced neural network model, a TRUST-TECH enhanced clustering model, and a TRUST-TECH enhanced statistical model. The TRUST-TECH methodology, also referred to as the dynamical trajectory based methodology, has been described in U.S. Pat. No. 7,050,953 and U.S. Pat. No. 7,277,832. Further details of the TRUST-TECH enhanced methods are described below in connection with
In one embodiment, the system described herein monitors devices by building optimal models, namely a predictive model, a clustering model, and a statistical model. A TRUST-TECH enhanced neural network is developed for the optimal predictive model. A TRUST-TECH enhanced affinity propagation model is developed for the optimal clustering model. Furthermore, a TRUST-TECH enhanced probability density estimation model is developed for the optimal statistical model.
The time-stamped signals transferred to the device monitoring system 106 are collected by a data acquisition unit 107. The collected sensor signal data is transferred to a data storage 111 via a system data bus 112, and stored in the data storage 111. The data storage 111 can be any volatile or non-volatile memory device. Using the sensor signal data, a data analysis unit 108 performs data analysis by building and training a plurality of models on the aggregated data (i.e., historical sensor data) to model normal behaviors of the device 101. The data analysis unit 108 then applies the built and trained models to the target sensor data, which may be the most-recently acquired data, real-time sensor data (also referred to as online sensor data), or sensor data that is not part of the historical sensor data used for constructing the models. The condition of the device is computed by using the plurality of models. A condition assessment unit 109 assesses the condition of the device 101 by inspecting the computed anomaly score to determine if the score is within the normal range that indicates the device is under a normal condition, or is outside the normal range that indicates the device is under an abnormal condition. A condition reporting unit 110 reports the assessment to a system operator or other administrative entities. A warning is issued when abnormal behaviors are detected in the target sensor data, since they indicate abnormal behaviors of the device 101.
In one embodiment, the problem of building and training the device models can be formulated as an optimization problem of the form:
$$\min_{x \in M} f(x) \qquad (1)$$
In one embodiment, the objective function f(x) for building a predictive model is the mean squared error (MSE) between the model outputs and the stored historical sensor data, the objective function f(x) for building a statistical model is the integrated squared error (ISE), and the objective function f(x) for building a clustering model is the within-cluster sum of differences (WCSD). Each of these objective functions f(x) can be nonlinear and nonconvex over a specified domain M, to which the values of x are confined, and can have multiple local optimal solutions. The optimization problem (1) is a global optimization problem: it seeks the global optimal solution, namely the values of x that minimize f(x) over the domain M. The model building and training therefore include optimizing the objective functions with a global optimization engine.
The output of model building and training is a set of models (block 303) that model normal behaviors of the device. In one embodiment, the set of models includes a predictive model, a statistical model, and a clustering model.
In one embodiment, the transformation function can be the arctangent function
In another embodiment of the invention, the transformation function can be the hyperbolic tangent sigmoid function
In yet another embodiment of the invention, the transformation function can be
The deviation calculator 520 also calculates the amount that the target sensor data 507 deviates from the statistical model 502. The amount of deviation, referred to as the statistical deviation 505, is normalized by the normalizer 530, or more specifically, a statistical deviation normalizer 506. The statistical deviation normalizer 506 applies a transformation function to the statistical deviation 505 and produces a normalized value between 0 and 1. The value 0 indicates that the model output exactly matches the target sensor data 507, and thus that the device's behavior is normal. The larger the normalized value is, the higher the level of anomaly in the target sensor data 507 and in the device's behavior. In one embodiment, the transformation function can be the arctangent function (2). In another embodiment, the transformation function can be the hyperbolic tangent sigmoid function (3). In yet another embodiment of the invention, the transformation function can be (4).
In one embodiment, the normalized predictive difference and the normalized statistical deviation are combined to generate a point anomaly score 510. In one embodiment, the point anomaly score 510 is the average of the normalized predictive difference and the normalized statistical deviation.
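The normalization and averaging described above can be sketched in Python as follows. The transformation functions (2) to (4) are not reproduced in this text, so the scaled arctangent `(2/pi)*atan(x)` and the plain `tanh(x)` used here are assumptions chosen only because they map a non-negative deviation into the range 0 to 1 with 0 at an exact match; the function names are hypothetical.

```python
import math

def normalize_arctan(deviation):
    """Map a deviation to [0, 1) with a scaled arctangent (assumed form of (2))."""
    return (2.0 / math.pi) * math.atan(abs(deviation))

def normalize_tanh(deviation):
    """Map a deviation to [0, 1) with the hyperbolic tangent (assumed form of (3))."""
    return math.tanh(abs(deviation))

def point_anomaly_score(predictive_difference, statistical_deviation,
                        normalize=normalize_arctan):
    """Average of the two normalized values, as in the averaging embodiment."""
    return 0.5 * (normalize(predictive_difference)
                  + normalize(statistical_deviation))

# An exact match gives 0; large deviations push the score towards 1.
score_normal = point_anomaly_score(0.0, 0.0)
score_anomalous = point_anomaly_score(25.0, 40.0)
```

Either normalizer can be passed through the `normalize` argument; both are monotonic, so the ordering of scores is preserved while the scale is bounded.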
In one embodiment, the deviation calculator 520 further computes the difference between the target sensor data 507 and the output of the clustering model 503. The difference, referred to as the clustering difference 511, comprises the distances between the target sensor data 507 and the data clusters U1, U2, . . . , UK, each of which contains a plurality of data points computed by the clustering model 503. In one embodiment, the distance is
where Ui is the i-th cluster, i=1,2, . . . , K, and d(·) is the distance between two vectors. In one embodiment, the distance can be
In another embodiment, the distance can be
where
The clustering difference normalizer 512 applies a transformation function to the ratio dn/da between the distance dn to the normal cluster(s) and the distance da to the abnormal cluster(s) and produces a value between 0 and 1. The value 0 indicates that the model output exactly matches the target sensor data 507, and thus that the device's behavior is normal. The larger the normalized value is, the higher the level of anomaly in the target sensor data 507 and in the device's behavior.
The normalized value produced by the clustering difference normalizer 512 is also referred to as an interval anomaly score 513. In one embodiment, the point anomaly score 510 and the interval anomaly score 513 are combined to obtain the final anomaly score 514. In one embodiment, the combination can be realized as the average score of the point anomaly score 510 and the interval anomaly score 513. In another embodiment, the combination can be realized as the maximum score of the point anomaly score 510 and the interval anomaly score 513.
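As a rough illustration of the interval score and of the two combination embodiments (average and maximum), consider the sketch below. The scaled arctangent used to normalize the ratio dn/da is an assumption, as is the handling of a zero distance to the abnormal cluster(s); the function names are hypothetical.

```python
import math

def interval_anomaly_score(d_normal, d_abnormal):
    """Normalize the ratio dn/da into [0, 1] (scaled arctangent is an assumption)."""
    if d_abnormal == 0.0:
        # Assumed convention: the data sits on an abnormal cluster.
        return 1.0
    return (2.0 / math.pi) * math.atan(d_normal / d_abnormal)

def final_anomaly_score(point_score, interval_score, mode="average"):
    """Combine the point and interval scores by average or by maximum."""
    if mode == "average":
        return 0.5 * (point_score + interval_score)
    if mode == "max":
        return max(point_score, interval_score)
    raise ValueError("mode must be 'average' or 'max'")

# Data equally far from normal and abnormal clusters scores about 0.5.
balanced = interval_anomaly_score(1.0, 1.0)
combined_avg = final_anomaly_score(0.2, 0.6, mode="average")
combined_max = final_anomaly_score(0.2, 0.6, mode="max")
```

The maximum is the more conservative choice: a single high score is enough to raise the final score, whereas the average lets a low score offset a high one.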
The model building unit 600 further includes an auto regression learning unit 606 that uses the historical sensor data 601 to build the statistical model 502. The model building unit 600 further includes a clustering feature extraction unit 607 that performs feature extraction on the historical sensor data 601 to produce another set of feature vectors. The model building unit 600 further includes an affinity propagation clustering unit 608 that uses the extracted feature vectors to build the clustering model 503.
The problem of building device models can be formulated as an optimization problem (1). One reliable way of finding the global optimal solution for the optimization problem (1) is to find first all the local optimal solutions, and then find, from the local optimal solutions, the global optimal solution. In one embodiment, the global optimal solution can be found through a procedure that includes the following two steps:
Step 1: Start from an arbitrary point and compute a local optimal solution to the optimization problem (1).
Step 2: Move away from the local optimal solution and approach another local optimal solution of the optimization problem (1).
TRUST-TECH based methods realize these two steps using some trajectories of a particular class of nonlinear dynamical systems. More specifically, TRUST-TECH based methods accomplish this task by the following steps:
(i) Construct a dynamical system such that there is a one-to-one correspondence between the set of local optimal solutions to the optimization problem (1) and the set of stable equilibrium points (SEPs) of the dynamical system. In other words, for each local optimal solution to the problem (1), there is a distinct SEP of the dynamical system that corresponds to it.
(ii) Then the task of finding all local optimal solutions can be accomplished by finding all SEPs of the constructed dynamical system and finding a complete set of local optimal solutions to the problem (1) among the complete set of SEPs.
(iii) Find the global optimal solution from the complete set of local optimal solutions.
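The steps above can be illustrated with a deliberately simplified one-dimensional sketch. It is not the TRUST-TECH method of the cited patents: the objective `f`, the domain [-4, 4], the forward-Euler step size, and the crude "walk until f stops rising" exit-point test are all assumptions made for illustration. It only shows the pattern of settling at an SEP, leaving its stability region, and settling at a neighboring SEP.

```python
import math

def f(x):
    # Hypothetical nonconvex objective with several local minima on M = [-4, 4].
    return math.sin(3.0 * x) + 0.1 * x * x

def grad_f(x):
    return 3.0 * math.cos(3.0 * x) + 0.2 * x

def descend(x, step=0.01, tol=1e-10, max_iter=100000):
    """Forward-Euler integration of dx/dt = -f'(x) until it settles at an SEP."""
    for _ in range(max_iter):
        g = grad_f(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x

def escape(x_sep, direction, step=0.01, lo=-4.0, hi=4.0):
    """Walk away from an SEP until f starts to fall, i.e. just past an exit point."""
    prev = f(x_sep)
    x = x_sep + direction * step
    while lo <= x <= hi:
        cur = f(x)
        if cur < prev:
            return x
        prev = cur
        x += direction * step
    return None  # left the domain without crossing an exit point

def find_local_minima(x0, lo=-4.0, hi=4.0):
    """Collect the SEPs reachable from x0 by repeated escape-and-descend moves."""
    seps = [descend(x0)]
    pending = list(seps)
    while pending:
        x_sep = pending.pop()
        for direction in (-1.0, 1.0):
            x_exit = escape(x_sep, direction, lo=lo, hi=hi)
            if x_exit is None:
                continue
            x_new = descend(x_exit)
            is_new = all(abs(x_new - s) > 1e-4 for s in seps)
            if lo <= x_new <= hi and is_new:
                seps.append(x_new)
                pending.append(x_new)
    return sorted(seps)

local_minima = find_local_minima(x0=3.0)
global_minimum = min(local_minima, key=f)
```

In one dimension there are only two search directions per SEP; in higher dimensions the choice of search directions becomes a design decision of its own, which this sketch does not address.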
In the embodiment of
The performance of a neural network is usually gauged by measuring the mean square error (MSE) of its output. The goal of optimal training is to find a set of parameters that achieves the global minimum MSE. The optimization problem (1) for optimal neural network model building can be formulated as minimizing the MSE over Q samples in the training set and is given by:
$$\min_{x} f(x) = \frac{1}{Q}\sum_{i=1}^{Q}\left(t_i - y(v_i, x)\right)^2 \qquad (8)$$
where $t_i$ is the target output for the i-th feature $v_i$, $x$ is the vector of weights of the neural network to be trained, and $y(\cdot)$ is the network output function. The MSE as a function of the network parameters usually contains multiple local optimal solutions.
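To make the MSE objective concrete, the following sketch evaluates it as a function of the weight vector x for a very small network. The architecture (one input, two tanh hidden units, one linear output), the packing of weights into `x`, and the sample data are all hypothetical; the source does not specify the network structure.

```python
import math

def network_output(v, x):
    """Hypothetical 1-input, 2-hidden-unit tanh network; x packs all seven weights."""
    w1, b1, w2, b2, c1, c2, c0 = x
    h1 = math.tanh(w1 * v + b1)
    h2 = math.tanh(w2 * v + b2)
    return c1 * h1 + c2 * h2 + c0

def mse(x, features, targets):
    """Mean squared error over the Q training samples for weight vector x."""
    q = len(features)
    return sum((t - network_output(v, x)) ** 2
               for v, t in zip(features, targets)) / q

features = [0.0, 1.0, 2.0]
targets = [1.0, 2.0, 3.0]

# With all weights at zero the network outputs 0, so the MSE is mean(t^2).
mse_zero = mse([0.0] * 7, features, targets)
# Adjusting only the output bias c0 already lowers the error.
mse_bias = mse([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0], features, targets)
```

Because the hidden units enter through tanh, `mse` is nonconvex in `x`, which is why a purely local training method can stall at a local optimal solution.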
The TRUST-TECH optimization engine 609 solves the optimization problem (8) by first constructing a dynamical system such that the SEPs in the dynamical system have one-to-one correspondence with local optimal solutions of the optimization problem (8). Because of such correspondence, the problem of computing multiple local optimal solutions of the optimization problem is then transformed to finding multiple stability regions in the defined dynamical system, each of which contains a distinct SEP. An SEP can be computed with the trajectory method or using a local method with a trajectory point in its stability region as the initial point. To solve the optimization problem (8), the desired dynamical system can be defined as the following negative gradient system:
where R(x) is a positive definite symmetric matrix (also known as the Riemannian metric).
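A short derivation may help explain why a negative gradient system is a natural choice. The system itself is not reproduced in this text; the form assumed here, $\dot{x} = -R(x)^{-1}\nabla f(x)$, is the usual way a gradient system is written with respect to a Riemannian metric $R(x)$ and should be read as an assumption. Under it, the objective never increases along a trajectory:

```latex
\[
\dot{x}(t) = -R(x)^{-1}\,\nabla f(x)
\quad\Longrightarrow\quad
\frac{d}{dt} f\bigl(x(t)\bigr)
  = \nabla f(x)^{T}\,\dot{x}(t)
  = -\,\nabla f(x)^{T} R(x)^{-1}\,\nabla f(x) \;\le\; 0 .
\]
```

The inequality holds because the inverse of a positive definite symmetric matrix is itself positive definite, with equality only where $\nabla f(x) = 0$. Trajectories therefore descend on $f$ and can come to rest only at stationary points of $f$, which is consistent with the stated correspondence between SEPs and local optimal solutions.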
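at time stamp t of the sensor data within a time window of size k, where $w_1=\sum_{i=1}^{k} a_{1i}(g_{t-i}-\mu_1)$ and $x_1=(a_{11}, \ldots, a_{1k}, \mu_1, \sigma_1)^T$.
$$v_1(g_t) = -\log p_{t-1}(g_t \mid g_{t-1}) \qquad (11)$$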
The unit 800 includes yet another unit 804 to calculate the moving average of the first statistical index data through
The unit 800 includes yet another probability density learning unit 805 receiving the moving average data 804 to calculate another probability density of the moving average data
at time stamp t of the sensor data within a time window of size k, where $w_2=\sum_{i=1}^{k} a_{2i}(h_{t-i}-\mu_2)$ and $x_2=(a_{21}, \ldots, a_{2k}, \mu_2, \sigma_2)^T$.
The optimization problem (1) for optimal statistical model building, namely to compute the optimal vectors of parameter values $x_1=(a_{11}, \ldots, a_{1k}, \mu_1, \sigma_1)^T$ in (10), can be formulated as an optimization problem:
Furthermore, the computation of the optimal vectors of parameter values $x_2=(a_{21}, \ldots, a_{2k}, \mu_2, \sigma_2)^T$ in (13) can be formulated as another optimization problem:
The parameter estimation objective functions (14) and (15), as functions of the statistical parameters, namely $x_1=(a_{11}, \ldots, a_{1k}, \mu_1, \sigma_1)^T$ for (14) and $x_2=(a_{21}, \ldots, a_{2k}, \mu_2, \sigma_2)^T$ for (15), are usually nonlinear and nonconvex, and thus can contain many local optimal solutions.
The unit 800 includes a TRUST-TECH enhanced regression unit 806, comprising the affinity auto regression model learning unit 808 and the TRUST-TECH optimization unit 807 to compute optimal parameters for the probability densities (10) and (13) by solving the associated optimization problems (14) and (15). The probability density functions (10) and (13), defined by the computed optimal parameters $x_1=(a_{11}, \ldots, a_{1k}, \mu_1, \sigma_1)^T$ and $x_2=(a_{21}, \ldots, a_{2k}, \mu_2, \sigma_2)^T$, respectively, constitute the statistical model 502 for modeling normal behaviors of a power system device.
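The role of the parameters (a, μ, σ) can be illustrated with a small sketch. The density (10) is not reproduced in this text, so the Gaussian form below, centered on μ plus the autoregressive term w, is an assumption; only the negative-log index follows the surviving expression (11). Function names and sample values are hypothetical.

```python
import math

def ar_gaussian_density(g_t, history, a, mu, sigma):
    """Assumed Gaussian density of g_t centered on mu + w.

    w = sum_i a_i * (g_{t-i} - mu); history[0] is g_{t-1}, history[1] is g_{t-2}, ...
    """
    w = sum(a_i * (g - mu) for a_i, g in zip(a, history))
    z = (g_t - mu - w) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def statistical_index(g_t, history, a, mu, sigma):
    """Negative log density: small for expected values, large for surprising ones."""
    return -math.log(ar_gaussian_density(g_t, history, a, mu, sigma))

a = [0.5, 0.2]        # hypothetical autoregressive coefficients (k = 2)
mu, sigma = 1.0, 0.1  # hypothetical mean and standard deviation
history = [1.0, 1.0]  # g_{t-1}, g_{t-2}

index_typical = statistical_index(1.0, history, a, mu, sigma)
index_outlier = statistical_index(2.0, history, a, mu, sigma)
```

Under this assumed form, a measurement ten standard deviations from its predicted value raises the index by 50, which shows how the index can separate an anomalous point from ordinary noise.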
The TRUST-TECH optimization unit 807 solves the optimization problems (14) and (15) by first constructing a dynamical system such that the SEPs in the dynamical system have one-to-one correspondence with local optimal solutions of the optimization problems (14) and (15). Because of such correspondence, the problem of computing multiple local optimal solutions of the optimization problem is then transformed to finding multiple stability regions in the defined dynamical system, each of which contains a distinct SEP. An SEP can be computed with the trajectory method or using a local method with a trajectory point in its stability region as the initial point. To solve the optimization problems (14) and (15), the desired dynamical system can be defined as the following negative gradient system:
where R(x) is a positive definite symmetric matrix (also known as the Riemannian metric).
between a pair of feature vectors $b_i$ and $b_j$ with $i=1, \ldots, N$ and $j=1, \ldots, N$, where $\bar{b}_i$ and $\bar{b}_j$ are the mean values of $b_i$ and $b_j$, respectively.
The inter-feature difference metrics unit 903 includes a differences of mean unit 905 calculating the difference
$$m_{ij} = |\bar{b}_i - \bar{b}_j| \qquad (18)$$
between the mean values of a pair of vectors $b_i$ and $b_j$ with $i=1, \ldots, N$ and $j=1, \ldots, N$.
The inter-feature difference metrics unit 903 includes a differences of standard deviation unit 906 calculating the difference
$$d_{ij} = |\sigma_i - \sigma_j| \qquad (19)$$
between the standard deviation values of a pair of vectors $b_i$ and $b_j$ with $i=1, \ldots, N$ and $j=1, \ldots, N$, where $\sigma_i$ and $\sigma_j$ are the standard deviation values of $b_i$ and $b_j$, respectively.
The module 900 includes a composite difference matrix unit 907 calculating the composite difference matrix
where $s_{ij}=w_1 c_{ij}+w_2 m_{ij}+w_3 d_{ij}$ with $i=1, \ldots, N$ and $j=1, \ldots, N$, and $w_1$, $w_2$ and $w_3$ are the weighting factors for the three difference metrics, respectively. This difference matrix provides the difference values between each pair of samples in the dataset.
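The weighted combination of the three difference metrics can be sketched as follows. The definition of the first metric is not reproduced in this text, so `correlation_difference` (one minus the Pearson correlation) is a hypothetical stand-in; the mean and standard deviation differences follow the descriptions of units 905 and 906, and the default weights are arbitrary.

```python
import math
import statistics

def correlation_difference(bi, bj):
    """Hypothetical stand-in for c_ij: one minus the Pearson correlation."""
    mi, mj = statistics.mean(bi), statistics.mean(bj)
    cov = sum((p - mi) * (q - mj) for p, q in zip(bi, bj))
    norm = math.sqrt(sum((p - mi) ** 2 for p in bi)
                     * sum((q - mj) ** 2 for q in bj))
    return 1.0 - cov / norm if norm > 0 else 1.0

def composite_difference_matrix(vectors, w1=1.0, w2=1.0, w3=1.0):
    """Build s_ij = w1*c_ij + w2*m_ij + w3*d_ij for every pair of feature vectors."""
    n = len(vectors)
    means = [statistics.mean(b) for b in vectors]
    stds = [statistics.pstdev(b) for b in vectors]
    s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            c_ij = correlation_difference(vectors[i], vectors[j])
            m_ij = abs(means[i] - means[j])   # difference of means
            d_ij = abs(stds[i] - stds[j])     # difference of standard deviations
            s[i][j] = w1 * c_ij + w2 * m_ij + w3 * d_ij
    return s

vectors = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 2.0, 1.0]]
S = composite_difference_matrix(vectors)
```

In this example the second vector differs from the first only by a shift in mean, and the third only by a reversed trend, so each pair is separated by a different metric; the weights control how much each kind of difference contributes to the clustering.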
The module 900 includes a TRUST-TECH enhanced clustering unit 908, which further includes the affinity propagation clustering unit 608 and the TRUST-TECH optimization engine 609. The TRUST-TECH enhanced clustering unit 908 receives the composite difference matrix 907, builds and trains the clustering model 503 (e.g., an affinity propagation based clustering model) to model normal behaviors of the device using the plurality of feature vectors extracted in the clustering feature extraction unit 607.
The performance of a clustering model is usually gauged by measuring the within-cluster sum of differences (WCSD) between the plurality of feature vectors and a plurality of center vectors. The goal of optimal clustering is to find an optimal number of center vectors and optimal values for each center vector that jointly achieve the global minimum WCSD. The optimization problem (1) for optimal clustering model building can be formulated as minimizing the WCSD over N samples in the training set and is given by:
where $x=(u_1, \ldots, u_K, K)^T$ is the vector of optimization variables, $K$ is the number of clusters, $U_1, \ldots, U_K$ are the clusters with cluster center vectors $u_1, \ldots, u_K$, respectively, and svu
The TRUST-TECH optimization engine 609 solves the optimization problem (21) by first constructing a dynamical system such that the stable equilibrium points (SEPs) in the dynamical system have one-to-one correspondence with local optimal solutions of the optimization problem (21). Because of such correspondence, the problem of computing multiple local optimal solutions of the optimization problem is then transformed to finding multiple stability regions in the dynamical system, each of which contains a distinct SEP. An SEP can be computed with a trajectory method, such as the backward Euler method, the forward Euler method, the trapezoidal method and the Runge-Kutta methods, or using a local method, such as Newton's method, the trust-region method, sequential quadratic programming (SQP) and the interior point method (IPM), with a trajectory point in its stability region as the initial point. To solve the optimization problem (21), the desired dynamical system can be defined as the following negative gradient system:
where R(x) is a positive definite symmetric matrix (also known as the Riemannian metric).
While the method 1200 of
Referring to
The computer system 1300 includes a processing device 1302. The processing device 1302 represents one or more general-purpose processors, or one or more special-purpose processors, or any combination of general-purpose and special-purpose processors. In one embodiment, the processing device 1302 is adapted to execute the operations of the data monitoring system 106 of
In one embodiment, the processing device 1302 is coupled, via one or more buses or interconnects 1330, to one or more memory devices such as: a main memory 1304 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM)), a secondary memory 1318 (e.g., a magnetic data storage device, an optical magnetic data storage device, etc.), and other forms of computer-readable media, which communicate with each other via a bus or interconnect. The memory devices may also include different forms of read-only memories (ROMs), different forms of random access memories (RAMs), static random access memory (SRAM), or any type of media suitable for storing electronic instructions. In one embodiment, the memory devices may store the code and data of the data monitoring system 106, which may be stored in one or more of the locations shown as dotted boxes and labeled as data monitoring logic 1322.
The computer system 1300 may further include a network interface device 1308. A part or all of the data and code of the data monitoring system 106 may be transmitted or received over a network 1320 via the network interface device 1308. Although not shown in
In one embodiment, the computer system 1300 may store and transmit (internally and/or with other electronic devices over a network) code (composed of software instructions) and data using computer-readable media, such as non-transitory tangible computer-readable media (e.g., computer-readable storage media such as magnetic disks; optical disks; read only memory; flash memory devices) and transitory computer-readable transmission media (e.g., electrical, optical, acoustical or other form of propagated signals—such as carrier waves, infrared signals).
In one embodiment, a non-transitory computer-readable medium stores thereon instructions that, when executed on one or more processors of the computer system 1300, cause the computer system 1300 to perform the method 1200 of
While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described, and can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.