The decreasing cost of sensors has led to their deployment in large numbers for such purposes as monitoring and managing infrastructure and resources. Data acquired by sensors may be monitored to detect problems with the sensors, such as hardware failure, for example.
Sensor data may be processed for purposes of detecting one or multiple outliers, or anomalies, in the sensor data so that corrective action may be taken to repair, reconfigure, replace or remove (as examples) the affected sensor(s) that are identified as providing anomalous data (i.e., data that represents an outlier). The anomaly detection may be beneficial for such purposes as identifying abnormal conditions that may result from errant operation of a sensor, such as a failure (or impending failure) of a sensor, a misconfiguration of a sensor or malicious activity involving a sensor.
One way to detect a failed sensor is to use threshold-based outlier detection, in which a sensor value (provided by the sensor) is compared to a threshold. Another way to detect outliers is to represent a network of sensors with a statistical model and use the sensor values that are predicted by the model to identify outliers. Such a statistical model may be especially beneficial for a sensor network that has a relatively large number (hundreds or even tens of millions, for example) of sensors.
In accordance with example techniques and systems that are disclosed herein, a statistical model that takes into account a global dependency structure of the sensor network is used to predict sensor values for the network (assuming the network is properly functioning). The predicted sensor values, in turn, may be used to identify anomalous sensor data (i.e., outlier sensor data). In this manner, the predicted sensor values may be compared to the observed sensor values to identify any outliers. A particular advantage of using a global dependency structure-based model for outlier detection is that the model may be relatively more precise and robust in the presence of outliers and missing sensor values than a model that is based only on local dependencies among the sensors. Moreover, the global dependency structure-based model allows detection of anomalous sensor data that may not be achievable using a threshold-based outlier detection method, in which a sensor value is merely compared to a predetermined threshold level, or using a model that is based only on local features or variables.
Referring to
As another example, the sensor network 120 may be used in a smart building, where the sensors 110 measure such parameters as air temperature, humidity, building occupancy, lighting, and so forth, for purposes of managing heating, ventilation, air conditioning and lighting systems and optimizing the use of resources, such as electricity, gas and water.
As yet another example, the sensor network 120 may be used in a utility infrastructure, where the sensors 110 acquire data that monitor power, water, and so forth for efficient resource management.
For such purposes as ensuring proper operation of the sensor network 120 and estimating, or predicting, missing sensor data, the system 100 includes a sensor analysis engine 130. In general, sensor analysis engine 130 monitors observed value data 124, i.e., data acquired by the sensors 110, for purposes of detecting outliers, or anomalies, in the sensor data. In this regard, a given sensor 110 may provide anomalous data due to errant operation of the sensor, such as (as examples) the failure of the sensor 110, the impending failure of the sensor 110, errant operation of the sensor 110 due to its misconfiguration, and errant operation of the sensor 110 due to malicious activity involving the sensor 110 or sensor network 120.
In accordance with example techniques and systems that are disclosed herein, the sensor analysis engine 130 uses a sensor model 150 for purposes of recognizing anomalous sensor data. As described herein, the sensor model 150, in accordance with example implementations, predicts the behavior of a proper functioning sensor network and takes into account global dependencies among the sensors 110. In particular, in accordance with example implementations, the states of the sensors 110 are modeled using random variables of an undirected graphical model; and in accordance with some example implementations, the sensor model 150 is a Markov Random Field (MRF)-based graphical model.
In accordance with example implementations, the sensor analysis engine 130 monitors the observed value data 124 and uses the sensor model 150 to generate sensor status data 154, which identifies any individual sensor(s) 110 that are providing anomalous data so that the appropriate corrective action may be taken for the affected sensor(s) 110. For example, the affected sensor(s) 110 may be replaced, repaired, reconfigured, and so forth. Moreover, in accordance with example implementations, the sensor analysis engine 130 may also use the sensor model 150 for purposes of providing estimated missing observed data 156 for any failed sensor(s) 110 or any sensor(s) 110 in which communication with the sensor(s) 110 has otherwise failed.
Referring to
As noted above, in accordance with example implementations, the sensor model 150 may be a Markov Random Field (MRF)-based graphical model. In general, an MRF graphical model is an undirected probabilistic graphical model that contains nodes, which are interconnected by edges: each node of the graphical model represents a random variable, and the edges represent the dependencies among the random variables. The dependencies associated with the edges are referred to as “edge factors” herein. An MRF graphical model may explicitly represent the interdependencies in the joint distribution of all of the random variables, which helps to model the underlying statistical processes.
In accordance with example implementations, an MRF-based graphical model in which each edge factor represents the dependencies between a pair of random variables, or nodes, may be used to model the sensor network 120.
The joint distribution of all of the random variables may be factorized into the product of the pairwise edge factors. More specifically, assuming there are n random variables in the MRF graphical model, "E" represents the edge set, ϕij represents the pairwise edge factor between nodes xi and xj, and Z represents the partition function, then the joint distribution (called "P( )") may be described as follows:

P(x1, . . . , xn)=(1/Z)·Π(i,j)∈E ϕij(xi, xj)
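The factorization above can be illustrated with a minimal sketch: the graph, the variable states, and the factor values below are illustrative assumptions (not taken from the disclosure), chosen only to show that the product of pairwise edge factors, divided by the partition function Z, yields a proper joint distribution.

```python
# Minimal sketch of a pairwise-factorized joint distribution P( ).
# The graph, states, and factor values are made-up illustrations.
from itertools import product

# Three binary random variables; edge set E with a pairwise edge factor
# phi_ij for each edge, stored as {(state_i, state_j): factor_value}.
states = [0, 1]
edges = {
    (0, 1): {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0},
    (1, 2): {(0, 0): 1.5, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 1.5},
}

def unnormalized(x):
    """Product of the pairwise edge factors for a full assignment x."""
    p = 1.0
    for (i, j), phi in edges.items():
        p *= phi[(x[i], x[j])]
    return p

# The partition function Z sums the unnormalized product over all assignments.
Z = sum(unnormalized(x) for x in product(states, repeat=3))

def joint(x):
    return unnormalized(x) / Z

# The probabilities over all assignments sum to one.
total = sum(joint(x) for x in product(states, repeat=3))
print(round(total, 6))  # 1.0
```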
In general, the systems and techniques that are described herein use the above-described pairwise edge factor dependencies for purposes of determining the edge factors for the sensor model 150.
As an example implementation, the sensor model 150 may (at least in the initial stages of building the model, as described herein) have a pairwise MRF topology 300 that is depicted in
As a more specific example, the sensor 310-1 has an associated observed value node 312-1 and an associated true value node 313-0. As another example, the sensor 310-3 has an associated observed value node 312-5 and an associated true value node 313-4. The values for the true value nodes 313-0, 313-2, 313-4 and 313-6 are "hidden" because they are unknown, i.e., not available from the historical data 209. It is noted that some of the values for the observed value nodes 312 may also be hidden, in that the corresponding observed values may not be available from the historical data 209.
Because each sensor 310 is represented by two nodes (an observed value node 312 and a true value node 313), each sensor 310 is represented by two random variables in the MRF graphical model.
In accordance with example implementations, the sensor model building engine 210 (
In accordance with example implementations, the sensor model building engine 210 uses the available, observed sensor data to construct a dependency graph, which identifies any dependencies between pairs of the sensors 310. It is noted that dependencies may not exist for every sensor pair. In accordance with example implementations, if the dependency graph identifies a dependency between two sensor nodes in the dependency graph, the sensor model building engine 210 adds an edge 320 between the corresponding true value nodes 313 in the MRF topology 300.
More specifically, in accordance with example implementations, for each sensor pair, the sensor model building engine 210 determines the frequencies of co-occurring observations for the pair using the historical data; and the engine 210 normalizes the frequencies, such as normalizing the frequencies to a scale that spans from “0” to “1,” for example.
The normalized frequency for a given co-occurring observation may be called a "dependency score." In accordance with example implementations, for each sensor pair, a corresponding vector of dependency scores is produced. If the maximum value in the dependency score vector exceeds a certain threshold, then, in accordance with example implementations, the engine 210 adds an edge 320 between the corresponding true value nodes 313. Otherwise, in accordance with example implementations, the sensor model building engine 210 does not add an edge between the true value nodes 313.
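The edge-selection rule described above may be sketched as follows. The sensor readings, the discretization into co-occurring value pairs, and the threshold value are illustrative assumptions; the disclosure leaves the exact normalization and cutoff open.

```python
# Hypothetical sketch: per sensor pair, count co-occurring observations in
# historical data, normalize the counts to a 0..1 scale, and add an edge
# only if the maximum dependency score exceeds a threshold.
from collections import Counter

THRESHOLD = 0.5  # assumed cutoff for adding an edge

def dependency_scores(readings_a, readings_b):
    """Normalized frequencies of co-occurring (value_a, value_b) pairs."""
    counts = Counter(zip(readings_a, readings_b))
    total = len(readings_a)
    # Normalize the co-occurrence frequencies to the 0..1 scale.
    return {pair: c / total for pair, c in counts.items()}

def should_add_edge(readings_a, readings_b, threshold=THRESHOLD):
    # Compare the maximum value of the dependency score vector to the threshold.
    return max(dependency_scores(readings_a, readings_b).values()) > threshold

# Two sensors whose discretized readings frequently co-occur:
a = ["low", "low", "high", "low", "high", "low"]
b = ["cold", "cold", "hot", "cold", "hot", "cold"]
print(should_add_edge(a, b))  # True
```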
In accordance with further example implementations, the dependency score may be determined using a different metric. For example, in place of a co-occurrence frequency, correlation or mutual information may be used. As another example, instead of comparing the maximum value of a dependency score vector to a threshold, a median or average value derived from the vector may be compared to a threshold. Thus, many variations are contemplated, which are within the scope of the appended claims.
In accordance with example implementations, the edge factor for the edge 320 is the dependency score vector between the pair of sensor nodes. For example, if the observed value nodes 312-1 and 312-3 have a given dependency score vector, then, in accordance with example implementations, that dependency score vector is used as the edge factor for an edge 320 between the corresponding true value nodes 313-0 and 313-2. As another example, for a given dependency between the observed value nodes 312-3 and 312-5, a corresponding edge factor represents the dependency, and this edge factor is used as the edge factor for the edge 320 between the true value nodes 313-2 and 313-4.
The above-described edge factor assignment implies that the true states are related according to the learned dependency graph, and the observed state of every sensor 310 depends on the true state of that location. For every sensor node in the original dependency graph, the sensor model building engine 210 also adds an edge 322 between the true value node 313 and the observed value node 312. Thus, if there are N nodes and E edges in the original graph, the MRF topology 300 contains 2N nodes and E+N edges.
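The node and edge counting described above (2N nodes and E+N edges) can be sketched as follows. The node naming scheme ("t3" for a true value node, "o3" for an observed value node) is an assumption made for clarity; it does not appear in the disclosure.

```python
# Sketch of the topology construction: from a learned dependency graph over
# N sensors with E edges, build a pairwise MRF with a true-value node and an
# observed-value node per sensor, an edge between true nodes for each learned
# dependency, and an edge linking each sensor's true node to its observed node.

def build_mrf_topology(num_sensors, dependency_edges):
    nodes = set()
    edges = set()
    for s in range(num_sensors):
        nodes.add(f"t{s}")  # hidden true-value node
        nodes.add(f"o{s}")  # observed-value node
        edges.add((f"t{s}", f"o{s}"))  # the observation depends on the true state
    for i, j in dependency_edges:
        edges.add((f"t{i}", f"t{j}"))  # learned dependency between true states
    return nodes, edges

# Example: 4 sensors and 3 dependencies -> 2N = 8 nodes and E + N = 7 edges.
nodes, edges = build_mrf_topology(4, [(0, 1), (1, 2), (2, 3)])
print(len(nodes), len(edges))  # 8 7
```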
In accordance with example implementations, the sensor model building engine 210 may assign a potential, or factor, to the edges 322 that extend between the observed value nodes 312 and the true value nodes 313. A relatively high probability (a probability of 0.99 or even 1, for example) may be used, in accordance with example implementations. The factors that are assigned to these edges 322 may be learned from data (if available), in accordance with further example implementations.
In accordance with example implementations, after construction of the MRF topology 300, the sensor model building engine 210 may apply a graphical model inference algorithm (a message passing-based algorithm, such as a belief propagation algorithm; a variable elimination algorithm; a Markov chain Monte Carlo (MCMC) algorithm; a variational method; and so forth) for purposes of determining the states of the hidden node values. A relatively large number of values may be hidden: none of the values for the true value nodes 313 are available, in accordance with example implementations, and a relatively large number of observed node values may be unavailable, or hidden, as well. The goal of the graphical model inference algorithm is to infer the states of the hidden nodes. In accordance with example implementations, the sensor model building engine 210 runs graphical model inference on the MRF topology 300 until convergence occurs.
In accordance with further example implementations, the model building engine 210 may transform the original pairwise MRF topology 300 of
In general, the bipartite MRF topology 400 groups the nodes 312 and 313 into two groups: a group 410 of the observed value nodes 312 and a group 414 of the true value nodes 313. For the bipartite MRF topology 400, each true value node 313 that was connected to one or more true value nodes 313 (in the original MRF topology 300) is instead connected to one or more observed value nodes 312. It is noted that, as in the pairwise MRF topology 300, no observed value nodes 312 are connected to each other in the bipartite MRF topology. Thus, in the bipartite MRF topology, there are no connections within the group 414 of true value nodes 313 or within the group 410 of observed value nodes 312.
In accordance with example implementations, the graphical model inference algorithm may be a belief propagation algorithm, and performing the belief propagation algorithm involves the following steps. First, it is assumed that an MRF-based graph already exists. The observed values for all of the sensors are obtained; a relatively large number of these values may be missing, and none of the true values may be available, in accordance with example implementations. The nodes for which no value is available are referred to as "hidden nodes" herein. Thus, all of the true value nodes, and possibly a large number of observed value nodes (depending on the amount of missing data), are hidden nodes. The goal of the belief propagation is to infer the states of all of the hidden nodes, and the belief propagation is run on the MRF until convergence. In general, belief propagation is a message passing algorithm for inference in graphical models, which involves the following steps: at each node, messages are read from neighboring nodes, a marginal belief is updated, and update messages are sent to the neighbors; this process is repeated until convergence. The values of the observed nodes may then be compared with the inferred values of the true nodes.
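The read-messages / update-belief / send-updates loop described above can be sketched for a small pairwise MRF. Everything here is an illustrative assumption rather than the disclosed model: the tiny graph has two sensors (true nodes "t0", "t1" and observed nodes "o0", "o1"), one observed value is present and one is missing, and the true-to-observed factors use the high probability (0.99) mentioned earlier.

```python
# Highly simplified loopy belief-propagation sketch for a pairwise MRF with
# binary variables. The graph, factors, and evidence are made-up examples.
states = [0, 1]
# Undirected edges with pairwise factors phi[edge][(x_i, x_j)].
phi = {
    ("t0", "t1"): {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0},
    ("t0", "o0"): {(0, 0): 0.99, (0, 1): 0.01, (1, 0): 0.01, (1, 1): 0.99},
    ("t1", "o1"): {(0, 0): 0.99, (0, 1): 0.01, (1, 0): 0.01, (1, 1): 0.99},
}
evidence = {"o0": 1}  # o1 is missing (hidden); true nodes are always hidden

nodes = {n for e in phi for n in e}
neighbors = {n: [m for e in phi for m in e if n in e and m != n] for n in nodes}

def factor(a, b, xa, xb):
    # Look up the pairwise factor regardless of edge orientation.
    return phi[(a, b)][(xa, xb)] if (a, b) in phi else phi[(b, a)][(xb, xa)]

def unary(n, x):
    # Clamp observed nodes to their evidence value; hidden nodes are uniform.
    if n in evidence:
        return 1.0 if x == evidence[n] else 1e-9
    return 1.0

# Messages msg[(src, dst)][state], initialized uniformly.
msg = {(a, b): {s: 1.0 for s in states} for a in nodes for b in neighbors[a]}

for _ in range(20):  # iterate until (assumed) convergence
    new = {}
    for (a, b) in msg:
        out = {}
        for xb in states:
            total = 0.0
            for xa in states:
                incoming = 1.0
                for c in neighbors[a]:
                    if c != b:  # read messages from all neighbors except b
                        incoming *= msg[(c, a)][xa]
                total += unary(a, xa) * factor(a, b, xa, xb) * incoming
            out[xb] = total
        z = sum(out.values())
        new[(a, b)] = {s: v / z for s, v in out.items()}
    msg = new  # send the updated messages to the neighbors

def belief(n):
    """Normalized marginal belief of node n from its incoming messages."""
    bel = {x: unary(n, x) for x in states}
    for c in neighbors[n]:
        for x in states:
            bel[x] *= msg[(c, n)][x]
    z = sum(bel.values())
    return {x: v / z for x, v in bel.items()}

bt = belief("t0")
print(max(bt, key=bt.get))  # 1: the hidden true node tracks the evidence o0 = 1
```

Because this toy graph is a tree, the message passing converges to exact marginals; on the loopy graphs a real sensor network would produce, the same loop runs as (approximate) loopy belief propagation until convergence.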
The sensor analysis engine 130 (
Referring to
Referring to
In accordance with some example implementations, the sensor model building engine 210 may perform a technique 600 that is depicted in
In accordance with example implementations, the sensor analysis engine 130 may be executed by a processor of a processor-based machine, or computer. For example, in accordance with some implementations, a physical machine 700 that is depicted in
In general, the hardware 710 may include one or multiple central processing units (CPUs) 714, a non-transitory memory 716 and a network interface 720. As examples, the memory 716 may be formed from semiconductor storage devices, magnetic storage devices, memristors, phase change memory devices, and so forth, depending on the particular implementations. In general, the memory 716 may store machine executable instructions, which are executed by the CPU(s) 714 for purposes of forming one or more components of the machine executable instructions 760. The memory 716 may further store data describing the sensor model 150, as well as other data.
For the example of
In further example implementations, the same physical machine may provide the physical platform for both engines 130 and 210. Moreover, the engine 130 and/or 210 may be formed inside a virtual machine of a physical platform in accordance with further example implementations. Thus, many implementations are contemplated, which are within the scope of the appended claims.
While the present techniques have been described with respect to a number of embodiments, it will be appreciated that numerous modifications and variations may be applicable therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the scope of the present techniques.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2015/013303 | 1/28/2015 | WO | 00