The present disclosure relates generally to the use of a recurrent Gaussian mixture model for sensor state estimation in condition monitoring. The techniques described herein may be applied, for example, to perform condition monitoring of machines in an industrial automation system.
Condition monitoring relates to the observation and analysis of one or more sensors that sense key parameters of machinery. By closely observing the sensor data, a potential failure or inefficiency may be detected and remedial action may be taken, often before a major system failure occurs. Effective condition monitoring may allow for increased uptime, reduced costs associated with failures, and a decreased need for prophylactic replacement of machine components.
Condition monitoring may be applied to a wide variety of industrial machinery such as capital equipment, factories, and power plants; however, condition monitoring may also be applied to other mechanical equipment such as automobiles and to non-mechanical equipment such as computers. In fact, principles of condition monitoring may be applied more generally to any system or organization. For example, principles of condition monitoring may be used to monitor the vital signs of a patient to detect potential health problems. As another example, principles of condition monitoring may be applied to monitor performance and/or economic indicators to detect potential problems with a corporation or an economy.
In condition monitoring, one or more sensors may be used. Examples of commonly used sensors include vibration sensors for analyzing a level of vibration and/or the frequency spectrum of vibration. Other examples of sensors include temperature sensors, pressure sensors, spectrographic oil analysis, ultrasound, and image recognition devices. A sensor may be a physical sensory device that may be mounted on or near a monitored machine component or a sensor may more generally refer to a source of data.
Sensor state estimation is a critical step in condition monitoring: based on the observed sensor values y, the values x that the sensors should have if the machine is operating normally may be estimated. If the residual r = y − x for certain sensors is too large, this may indicate a failure. A typical sensor state estimation algorithm needs to address two problems. The first is how to model the normal operating range, or the probabilistic distribution of normal data P(x). The second is how to map from a given observation y to x, or compute the probability of x conditioned on y.
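By way of illustration, the residual check described above may be sketched as follows. The threshold and sensor values here are hypothetical, chosen only to show the comparison of observed values y against estimated normal values x:

```python
import numpy as np

def flag_faults(y, x_hat, threshold):
    """Flag sensors whose residual r = y - x_hat exceeds a per-sensor threshold."""
    r = np.asarray(y, dtype=float) - np.asarray(x_hat, dtype=float)
    return np.abs(r) > threshold

# Observed values y vs. estimated normal values x; sensor 1 deviates strongly.
y = [10.2, 55.0, 3.1]
x_hat = [10.0, 50.0, 3.0]
faults = flag_faults(y, x_hat, threshold=1.0)
print(faults)  # → [False  True False]
```

In practice the per-sensor thresholds would be tuned to each sensor's normal noise level rather than set to a single constant.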
One conventional technique is to use a Gaussian mixture model (GMM) to model P(x):
P(s_t = i) = p_i, (1)
P(x_t | s_t = i) = N(x_t | m_i, C_i) (2)
where i = 1, 2, . . . , K (the total number of components in the GMM). p_i is the probability of the i-th component. Given component s_t = i, x_t has a Gaussian distribution with mean m_i and covariance C_i. The observed signal y is modeled by another Gaussian distribution
P(y_t | x_t) = N(y_t | x_t, θ), (3)
with mean x_t and diagonal covariance θ. θ limits the magnitude of deviation. For normal values of y, θ is small, so y is restricted to be close to x. For faulty values of y (i.e., those outside of the normal range), θ is large to allow a large deviation of y from x. During training, an Expectation-Maximization (EM) algorithm is used to estimate the parameters p_i, m_i, and C_i, where i = 1, 2, . . . , K. During monitoring, another EM algorithm is used to compute P(x_t | y_t) and estimate θ simultaneously.
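As a rough sketch of the training step, the following fits a one-dimensional Gaussian mixture with EM, in the spirit of Equations (1) and (2). The scalar data, quantile-based initialization, and fixed iteration count are illustrative simplifications, not part of the disclosed method:

```python
import numpy as np

def fit_gmm_em(x, K=2, n_iter=50):
    """Fit p_i, m_i, C_i of a 1-D Gaussian mixture with EM (cf. Equations (1)-(2))."""
    x = np.asarray(x, dtype=float)
    p = np.full(K, 1.0 / K)                        # component probabilities p_i
    m = np.quantile(x, (np.arange(K) + 0.5) / K)   # spread initial means m_i over the data
    C = np.full(K, x.var())                        # component variances C_i
    for _ in range(n_iter):
        # E-step: responsibilities P(s_t = i | x_t) for each point and component
        dens = np.exp(-0.5 * (x[:, None] - m) ** 2 / C) / np.sqrt(2 * np.pi * C)
        resp = p * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate p_i, m_i, C_i from the responsibilities
        Nk = resp.sum(axis=0)
        p = Nk / len(x)
        m = (resp * x[:, None]).sum(axis=0) / Nk
        C = (resp * (x[:, None] - m) ** 2).sum(axis=0) / Nk
    return p, m, C

# Two well-separated normal operating modes, centered near 0 and 8.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0, 1, 500), rng.normal(8, 1, 500)])
p, m, C = fit_gmm_em(data)
```

A multivariate implementation would replace the scalar variances C_i with full covariance matrices, as in Equation (2).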
The main drawback of the GMM is that it ignores the temporal dependency between sensor signals. This may be overcome by a stationary switching autoregressive (SSAR) model.
P(s_t = j | s_{t-1} = i) = Z_ij, (4.1)
P(s_1 = i) = p_i. (4.2)
Z is a K by K transition probability matrix. The first component s_1 is sampled independently, as in the GMM, because there is no previous component. The normal sensor signal x_t also depends on the previous signal x_{t-1}:

P(x_t | x_{t-1}, s_t = j, s_{t-1} = i) = N(x_t | A_j x_{t-1}, Q_j) if i = j, and N(x_t | m_j, C_j) otherwise. (5)
Equation (5) is similar to Equations (4.1) and (4.2); however, it is more complicated because it makes predictions about continuous time-series data. Equation (5) uses a Gaussian distribution (as denoted by the N in the equation).
In Equation (5), if the signal stays in the same component (i = j), the sensor value at the current time, x_t, follows a vector autoregressive model. This model is autoregressive because x_t is predicted from its past value x_{t-1}. Because the model is linear, the relationship between x_{t-1} and x_t is represented by matrix multiplication (with A_j). Q_j denotes the covariance of the error that cannot be described by the model. If the signal switches from component i to a different component j, x_t is generated independently of x_{t-1}, as in the GMM case. This is because, under different operating modes, signals can be quite different. In SSAR, signals at different times are correlated due to the component transitions in Equations (4.1) and (4.2) and the autoregression in Equation (5). The observed signal y is modeled in the same way as in the GMM, using Equation (3).
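The generative process of Equations (4.1)-(5) can be sketched by sampling from the model. The parameter values below are toy choices for illustration only; the dynamics follow the description above (autoregression on a self-transition, an independent Gaussian draw on a switch):

```python
import numpy as np

def sample_ssar(T, p, Z, m, C, A, Q, seed=0):
    """Sample a switching autoregressive sequence (cf. Equations (4.1)-(5)).

    On a self-transition (i == j) the signal follows x_t = A_j x_{t-1} + noise;
    on a switch it is redrawn from the component's Gaussian, as in the GMM case.
    """
    rng = np.random.default_rng(seed)
    K, D = len(p), m.shape[1]
    s = np.empty(T, dtype=int)
    x = np.empty((T, D))
    s[0] = rng.choice(K, p=p)                    # P(s_1 = i) = p_i
    x[0] = rng.multivariate_normal(m[s[0]], C[s[0]])
    for t in range(1, T):
        s[t] = rng.choice(K, p=Z[s[t - 1]])      # P(s_t = j | s_{t-1} = i) = Z_ij
        if s[t] == s[t - 1]:
            x[t] = A[s[t]] @ x[t - 1] + rng.multivariate_normal(np.zeros(D), Q[s[t]])
        else:
            x[t] = rng.multivariate_normal(m[s[t]], C[s[t]])
    return s, x

# Toy parameters: K = 2 components, D = 1 sensor, sticky transitions.
p = np.array([0.5, 0.5])
Z = np.array([[0.95, 0.05], [0.05, 0.95]])
m = np.array([[0.0], [10.0]])
C = np.array([np.eye(1), np.eye(1)])
A = np.array([[[0.9]], [[0.9]]])
Q = np.array([0.1 * np.eye(1), 0.1 * np.eye(1)])
s, x = sample_ssar(200, p, Z, m, C, A, Q)
```

Fitting the model reverses this process: given an observed sequence x, EM estimates p, Z, m, C, A, and Q.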
Although SSAR takes temporal dependency into consideration, it captures only linear dependency. For complex machines, however, temporal dependency is usually nonlinear. Recurrent neural networks (RNNs) may be applied to address this nonlinearity. The idea of an RNN is to model the temporal dependency with a neural network that is able to handle nonlinearity. However, an RNN typically assumes smooth dependency between adjacent signals and cannot handle component switching (as the GMM and SSAR can).
Embodiments of the present invention address and overcome one or more of the above shortcomings and drawbacks by providing methods, systems, and apparatuses related to the use of a recurrent Gaussian mixture model for sensor state estimation in condition monitoring.
According to some embodiments, a computer-implemented method for monitoring a system includes training a recurrent Gaussian mixture model to model a probability distribution for each sensor of the system from among a plurality of sensors of the system based on a set of training data. In one embodiment, the training data is recorded from the sensors during a period of fault-free operation of the system. The recurrent Gaussian mixture model applies a Gaussian process to each sensor dimension to estimate current sensor values based on previous sensor values. Measured sensor data is received from the sensors of the system and an expectation-maximization technique is performed to determine an expected value for a particular sensor based on the recurrent Gaussian mixture model and the measured sensor data. A measured sensor value is identified for the particular sensor in the measured sensor data. If the measured sensor value and the expected sensor value deviate by more than a predetermined amount, a fault detection alarm is generated to indicate that the system is not operating within a normal operating range.
Various enhancements, refinements, and other modifications may be made to the aforementioned method in different embodiments. For example, in some embodiments, the recurrent Gaussian mixture model utilizes a plurality of mixture components and each component follows a Markov chain from a previous corresponding component. In some embodiments, each mixture component corresponds to one of a plurality of machine states. These states may comprise, for example, a sleeping state, a stand-by state, and a running state. The fault detection alarm may comprise, for example, an audible alarm generated by a speaker associated with the system. In one embodiment, the fault detection alarm comprises a visual alarm presented on a display associated with the system.
In some embodiments, the recurrent Gaussian mixture model in the aforementioned method is trained by first training a stationary switching autoregressive model to obtain initial estimates for parameters comprising (a) a component probability; (b) a mean value for a Gaussian distribution of expected sensor values; (c) covariance for the Gaussian distribution of expected sensor values; and (d) a component transition probability matrix. An iterative re-estimation process is then performed until convergence of one or more of the parameters. This re-estimation process includes assigning each sensor value in the set of training data to one of the plurality of components in the component transition probability matrix based on the component probability. In one embodiment, the sensor value is assigned to the one of the plurality of components in the component transition matrix by making a hard decision. For each component, the sensor values assigned to the component are used to train the Gaussian process corresponding to the component. Additionally, for each component, an expectation-maximization technique is performed to re-estimate the parameters for the component based on the Gaussian process corresponding to the component.
Various techniques may be used to parallelize the aforementioned methods to perform computations faster. Such parallelization may be performed using a system comprising the system sensors and a plurality of processors configured to perform at least a portion of the activities discussed above. For example, in one embodiment, the plurality of processors is used to train the Gaussian process for multiple components in parallel. In another embodiment, the processors are used to perform the expectation-maximization technique for multiple components in parallel.
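One way such per-component parallelism might look in a high-level sketch is shown below. Threads stand in for the plurality of processors, and train_component_gp is a hypothetical stand-in that computes only per-dimension statistics rather than fitting an actual Gaussian process; the segmentation into components is likewise assumed:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def train_component_gp(component_data):
    """Hypothetical stand-in for fitting one component's Gaussian process.

    Returns per-dimension mean and standard deviation; a real implementation
    would fit the regression function f and the sigma_d values instead.
    """
    data = np.asarray(component_data)
    return data.mean(axis=0), data.std(axis=0)

# Sensor values already assigned to components (hypothetical segmentation).
segments = {
    0: np.random.default_rng(0).normal(0.0, 1.0, size=(100, 3)),
    1: np.random.default_rng(1).normal(5.0, 1.0, size=(100, 3)),
}

# One task per component: all components are trained in parallel.
with ThreadPoolExecutor() as pool:
    results = dict(zip(segments, pool.map(train_component_gp, segments.values())))
```

Because the per-component training tasks are independent, the same dispatch pattern applies whether the workers are CPU threads, separate processes, or GPU kernels.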
Additional features and advantages of the invention will be made apparent from the following detailed description of illustrative embodiments that proceeds with reference to the accompanying drawings.
The foregoing and other aspects of the present invention are best understood from the following detailed description when read in connection with the accompanying drawings. For the purpose of illustrating the invention, there are shown in the drawings embodiments that are presently preferred, it being understood, however, that the invention is not limited to the specific instrumentalities disclosed. Included in the drawings are the following Figures:
Systems, methods, and apparatuses are described herein which relate generally to the use of a recurrent Gaussian mixture model (RGMM) for sensor state estimation in condition monitoring. More specifically, an RGMM is used to address the sensor state estimation problem. Inspired by the RNN, the linear autoregressive part in Equation (5) is replaced with a nonlinear regression function while keeping the other parts of the model intact. The term "recurrent," as used herein, means that the past is reused as part of a new model; without recurrence, time dependency is ignored. Therefore, the RGMM can not only describe nonlinear temporal dependency but also model component switching.
The key ingredient of the recurrent Gaussian mixture model (RGMM) is to introduce nonlinearity by replacing Equation (5) with

P(x_t | x_{t-1}, s_t = j, s_{t-1} = i) = N(x_t | f(x_{t-1}, w), Q_j) if i = j, and N(x_t | m_j, C_j) otherwise. (6)
In Equation (6), f(x, w) can be any regression function with w as its parameter. f may utilize various kernels, neurons, etc., that make it nonlinear. The simplest form is the autoregressive function used in SSAR. The techniques described herein use a Gaussian process, rather than an autoregressive function, to model the dependency between x_{t-1} and x_t.
As is generally understood in the art, a Gaussian process is a collection of random variables, any finite subset of which follows a multivariate Gaussian distribution. Suppose there are D sensors. The Gaussian process is applied to each sensor dimension. In other words, the d-th sensor value x_{t,d} at the current time t is regressed on all previous sensor values x_{t-1}. σ_d is the standard deviation for the d-th regression function. It is assumed that, given the previous signals x_{t-1}, the values x_{t,d} are independent of each other. Additional information on Gaussian processes is described in C. E. Rasmussen and C. K. I. Williams, "Gaussian Processes for Machine Learning", The MIT Press, 2006. Because the Gaussian process is powerful at fitting nonlinear data, the techniques described herein do not require a different Gaussian process for each component. However, in some embodiments, different processes may be used for each component. The RGMM is complete after integrating Equations (3), (4.1), and (4.2) with Equation (6).
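The per-dimension regression can be sketched with a minimal Gaussian-process posterior-mean predictor. This is standard GP regression with a squared-exponential kernel, not the specific kernel or hyperparameters of the disclosed method; the smooth two-sensor series is synthetic:

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0):
    """Squared-exponential kernel k(x, x') = exp(-||x - x'||^2 / (2 l^2))."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_predict(X_train, y_train, X_test, noise=1e-2):
    """Posterior mean of standard Gaussian-process regression."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    return rbf_kernel(X_test, X_train) @ np.linalg.solve(K, y_train)

# A smooth 2-sensor series; regress sensor d at time t on both sensors at t-1.
rng = np.random.default_rng(0)
x = np.cumsum(0.1 * rng.normal(size=(300, 2)), axis=0)
X_prev, X_next = x[:-1], x[1:]
idx = rng.permutation(len(X_prev))
train, test = idx[:200], idx[200:]
d = 0  # predict the d-th sensor dimension
pred = gp_predict(X_prev[train], X_next[train, d], X_prev[test])
```

Repeating this regression for d = 1, 2, . . . , D yields one predictor per sensor dimension, with σ_d estimated from the residual spread of the d-th regression.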
After step 210, the data is segmented based on time. That is, the state at different time periods is known. At step 215, for each component i, every x_t assigned to i is identified and used to train a Gaussian process. Conceptually, this may be understood as concatenating the data from all of the corresponding time periods. For example, if component i indicates that a machine is in a "sleeping" state between 9 am-10 am and 1 pm-2 pm, the corresponding sensor values are concatenated to provide two hours of "sleeping" sensor data. This is repeated for every component/state. Thus, for every component (i.e., every state) a single, nonlinear model is learned. Using this process, w and σ_d are obtained, where d = 1, 2, . . . , D. In some embodiments, each component i can be processed independently in parallel using a computing platform such as that illustrated below.
Then, at step 220, with the Gaussian process fixed, the EM algorithm is applied to re-estimate p_i, m_i, C_i, and Z, where i = 1, 2, . . . , K. Recall that in step 205, an SSAR model is used to obtain initial estimates for p_i, m_i, C_i, and Z. In step 220, the accuracy of these initial estimates is improved through re-estimation. Following step 220, a check determines whether the algorithm has converged. This may be performed by computing the difference between the values of p_i, m_i, C_i, and Z and their corresponding values from the prior iteration of the process 200. If the difference is below a predetermined value, the process finishes. This predetermined value can be set based on the type and granularity of the underlying data; for example, values of 0.1, 0.01, or 0.001 may be used. Otherwise, steps 210-220 are repeated until convergence.
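The iterate-until-converged control flow of steps 210-220 can be sketched generically. The re-estimation step here is a toy function that moves parameters halfway toward fixed targets; in the actual process it would be the assignment, Gaussian-process training, and EM re-estimation described above:

```python
import numpy as np

def run_until_converged(params, reestimate, tol=1e-3, max_iter=100):
    """Repeat re-estimation until the largest parameter change falls below tol."""
    for it in range(max_iter):
        new_params = reestimate(params)
        delta = max(np.max(np.abs(np.asarray(n) - np.asarray(o)))
                    for n, o in zip(new_params, params))
        params = new_params
        if delta < tol:  # converged: every parameter changed by less than tol
            return params, it + 1
    return params, max_iter

# Toy re-estimation step: each update moves parameters halfway toward fixed targets.
targets = (np.array([0.3, 0.7]), np.array([1.0, 5.0]))
step = lambda ps: tuple(p + 0.5 * (t - p) for p, t in zip(ps, targets))
final, n_iter = run_until_converged((np.zeros(2), np.zeros(2)), step)
```

The tolerance tol plays the role of the predetermined value (e.g., 0.1, 0.01, or 0.001) discussed above, and max_iter guards against re-estimation steps that fail to converge.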
The method 300 shown in
Starting at step 305, a recurrent Gaussian mixture model is trained to model a probability distribution for each sensor of the system based on a set of training data. This training may take place generally as described above with respect to
At step 310, the computing system receives measured sensor data from the plurality of sensors of the system. Where the system is directly connected to the sensors (e.g., in the controller context), the sensor values may be received directly. However, it should be noted that the method 300 may alternatively be implemented using a computer not directly connected to the sensors. For example, a controller or other computing device can pass data to the computing system over a network to perform condition monitoring, and possibly other monitoring tasks.
At step 315, an EM technique is performed to determine an expected value for a particular sensor based on the recurrent Gaussian mixture model and the measured sensor data. That is, the EM algorithm is applied to compute P(x_t | y_t). Additionally, the noise variance θ may be estimated simultaneously. With the estimated value determined, at step 320, the measured sensor value for the particular sensor is identified in the data that was received at step 310. The measured and estimated values are then compared.
If the values deviate by more than a predetermined amount, a fault detection alarm is generated at step 325 indicating that the system is not operating within a normal operating range. The exact value of the predetermined amount may be preset by the system operator and may depend on the type of data. For example, consider a gas pressure sensor providing readings in kilopascals (kPa). A deviation of 100 pascals (Pa) may be ignored, while a deviation of 1 or more kPa may trigger the alarm. Various techniques may be used for producing the alarm. For example, in some embodiments, the fault detection alarm comprises an audible alarm generated by a speaker associated with the system (e.g., a speaker on a human-machine-interface computer within an automation system). In other embodiments, the fault detection alarm comprises a visual alarm presented on a display associated with the system (e.g., a computer monitor connected to a human-machine-interface computer within an automation system).
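The gas-pressure example above amounts to a simple threshold comparison, sketched below with the same illustrative values (a 1 kPa threshold preset by the operator):

```python
def check_fault(measured_kpa, expected_kpa, threshold_kpa=1.0):
    """Raise the fault flag only when the deviation meets or exceeds the threshold."""
    return abs(measured_kpa - expected_kpa) >= threshold_kpa

print(check_fault(101.4, 101.3))  # 100 Pa deviation → False (ignored)
print(check_fault(103.5, 101.3))  # 2.2 kPa deviation → True (alarm)
```

The returned flag would then drive whichever alarm mechanism the embodiment uses, audible or visual.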
To illustrate the benefit of the techniques described herein, the RGMM was compared with GMM and SSAR using a real data set with 6 sensors. An artificial deviation was added to one sensor to simulate faults. Part of the data, without the deviation, was used to train all models, which were then tested on the remaining part with the deviation. The mean absolute error (MAE) was used to evaluate performance on the test data. In addition, the error was broken down across different aspects of the data: MAE of all sensors during normal time (E_n), MAE of the faulty sensor during faulty time (E_ff), and MAE of normal sensors during faulty time (E_nf).
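The three error scores can be computed with a straightforward split of the MAE, sketched below on toy data (the arrays, the choice of faulty sensor, and the fault period are illustrative, not the evaluation data from the comparison):

```python
import numpy as np

def mae_scores(y_true, y_pred, fault_sensor, fault_mask):
    """Split MAE into E_n, E_ff, and E_nf as described above."""
    err = np.abs(y_true - y_pred)
    normal_sensors = np.arange(y_true.shape[1]) != fault_sensor
    E_n = err[~fault_mask].mean()                     # all sensors, normal time
    E_ff = err[fault_mask][:, fault_sensor].mean()    # faulty sensor, faulty time
    E_nf = err[fault_mask][:, normal_sensors].mean()  # normal sensors, faulty time
    return E_n, E_ff, E_nf

# Toy data: 4 time steps, 3 sensors; sensor 0 is faulty during steps 2-3.
y_true = np.zeros((4, 3))
y_pred = np.full((4, 3), 0.1)
y_pred[2:, 0] = 2.0   # large error on the faulty sensor during faulty time
y_pred[2:, 1:] = 0.2  # small error on the normal sensors during faulty time
fault_mask = np.array([False, False, True, True])
E_n, E_ff, E_nf = mae_scores(y_true, y_pred, fault_sensor=0, fault_mask=fault_mask)
```

A good estimator keeps E_n and E_nf small (it tracks normal behavior and is not dragged off by the fault) while E_ff stays large (it does not follow the faulty readings).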
Table 1 shows the error scores for the different algorithms. The RGMM described herein produces the lowest errors. SSAR produces worse results on this dataset. There are two likely reasons for this. First, the temporal dependency in this case is nonlinear. Second, SSAR overfits the data (the sensors are highly correlated).
Parallel portions of a deep learning application may be executed on the architecture 400 as “device kernels” or simply “kernels.” A kernel comprises parameterized code configured to perform a particular function. The parallel computing platform is configured to execute these kernels in an optimal manner across the architecture 400 based on parameters, settings, and other selections provided by the user. Additionally, in some embodiments, the parallel computing platform may include additional functionality to allow for automatic processing of kernels in an optimal manner with minimal input provided by the user.
The processing required for each kernel is performed by a grid of thread blocks (described in greater detail below). Using concurrent kernel execution, streams, and synchronization with lightweight events, the architecture 400 of
The device 410 includes one or more thread blocks 430 which represent the computation unit of the device 410. The term thread block refers to a group of threads that can cooperate via shared memory and synchronize their execution to coordinate memory accesses. For example, in
Continuing with reference to
Each thread can have one or more levels of memory access. For example, in the architecture 400 of
The embodiments of the present disclosure may be implemented with any combination of hardware and software. For example, aside from the parallel processing architecture presented in
While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.
An executable application, as used herein, comprises code or machine readable instructions for conditioning the processor to implement predetermined functions, such as those of an operating system, a context data acquisition system or other information processing system, for example, in response to user command or input. An executable procedure is a segment of code or machine readable instruction, sub-routine, or other distinct section of code or portion of an executable application for performing one or more particular processes. These processes may include receiving input data and/or parameters, performing operations on received input data and/or performing functions in response to received input parameters, and providing resulting output data and/or parameters.
A graphical user interface (GUI), as used herein, comprises one or more display images, generated by a display processor and enabling user interaction with a processor or other device and associated data acquisition and processing functions. The GUI also includes an executable procedure or executable application. The executable procedure or executable application conditions the display processor to generate signals representing the GUI display images. These signals are supplied to a display device which displays the image for viewing by the user. The processor, under control of an executable procedure or executable application, manipulates the GUI display images in response to signals received from the input devices. In this way, the user may interact with the display image using the input devices, enabling user interaction with the processor or other device.
The functions and process steps herein may be performed automatically or wholly or partially in response to user command. An activity (including a step) performed automatically is performed in response to one or more executable instructions or device operation without user direct initiation of the activity.
The system and processes of the figures are not exclusive. Other systems, processes and menus may be derived in accordance with the principles of the invention to accomplish the same objectives. Although this invention has been described with reference to particular embodiments, it is to be understood that the embodiments and variations shown and described herein are for illustration purposes only. Modifications to the current design may be implemented by those skilled in the art, without departing from the scope of the invention. As described herein, the various systems, subsystems, agents, managers and processes can be implemented using hardware components, software components, and/or combinations thereof. No claim element herein is to be construed under the provisions of 35 U.S.C. 112, sixth paragraph, unless the element is expressly recited using the phrase “means for.”
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2017/049242 | 8/30/2017 | WO | 00 |