Effective data-driven analytics is possible using advancements in sensor technologies and networked industrial machinery design. Fault detection and the corresponding root cause analysis (diagnosis) are especially critical to today's industrial society. Conventional approaches monitor dynamic sensor networks and detect short-term and/or long-term changes.
In modern complex systems, a fault often starts in a few components and propagates to other components. Such propagation often lasts for a certain time with a relatively stable pattern. A time series graph of a dynamic network can represent the components and their interrelationships at certain timestamps. A graph node represents a system component, and an edge represents the recent relationship between its two connected nodes. Early failure warning and system diagnosis can be achieved by analyzing the time series graph to identify 1) any stable changes (fault propagation) in the graph structure across time; and 2) the root cause of such a change.
Conventional systems and methods focus too much attention on a sudden change in the graph sequence (which can result from a mode switch that is usually of no interest), or fail to capture any stable propagation pattern in the presence of noise. What is missing from the art is a system that can analyze the time series graph to measure the degree of stable change and its corresponding root cause.
Embodying systems and methods provide for the suppression of noise effects to capture a time series graph change occurring during a certain period, and its corresponding root cause (i.e., contributing component(s)). Embodying approaches optimize a matrix-based Taylor expansion equation to suppress the noise effect. Embodying systems and methods can determine a temporal fault propagation model (if present in the time series data) by analyzing the time series graph. In extensive experimentation using both synthetic datasets and real world (i.e., captured/monitored) datasets, an embodying Robust Fault Detection and Diagnosis (RoFaD) algorithm was consistently more effective than conventional approaches.
An embodying RoFaD algorithm can detect a fault occurrence from its starting timestamp to the occurrence's end, and identify the fault's root cause component. Because correlations between components are usually dynamic, an embodying RoFaD algorithm is adaptive and operates in near real time to detect and analyze faults. An embodying RoFaD algorithm can find underlying propagation patterns in real world data despite the presence of constant noise in the data.
The following notations are used in this disclosure:
For a matrix $Z \in \mathbb{R}^{m \times n}$:
$Z^{j}$ (with $j>0$) denotes the matrix product of $j$ copies of $Z$;
$Z^{-1}$ denotes the matrix's (pseudo)inverse;
$Z(i,j)$ denotes the element in the i-th row and j-th column; and
$Z(i,:)$ and $Z(:,j)$ denote the matrix's i-th row and j-th column respectively.
Entry-wise norms are denoted by $\|Z\|_p$:
$p=2$ gives the Frobenius norm $\|Z\|_F=\sqrt{\sum_{i,j} Z(i,j)^2}=\sqrt{\operatorname{tr}(ZZ^{T})}$; and
$p=(2,1)$ gives the 2,1 norm $\|Z\|_{2,1}=\sum_{i=1}^{m}\|Z(i,:)\|_2$.
The infinity norm of Z is denoted by $\|Z\|_\infty$, where $\|Z\|_\infty=\max_{1\le i\le m}\sum_{j=1}^{n}|Z(i,j)|$.
For a vector $[w_1,\ldots,w_m]\in\mathbb{R}^{m\times 1}$:
$\operatorname{diag}([w_1,\ldots,w_m])\in\mathbb{R}^{m\times m}$ denotes a diagonal matrix with $w_1,\ldots,w_m$ as its diagonal entries;
diag(Z) for a given square matrix Z is a square matrix that has the same diagonal entries as Z, with all off-diagonal entries equal to zero.
$I_m$ is an identity matrix of dimension $m\times m$.
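The norm and diag notations above can be checked numerically. The following is a minimal NumPy sketch; the function names and the example matrix are illustrative assumptions and do not come from the disclosure.

```python
import numpy as np

def frobenius_norm(Z):
    # Square root of the sum of squared entries, equal to sqrt(tr(Z Z^T)).
    return np.sqrt(np.sum(Z ** 2))

def l21_norm(Z):
    # Sum over rows of each row's Euclidean (2-)norm.
    return np.sum(np.linalg.norm(Z, axis=1))

def infinity_norm(Z):
    # Maximum absolute row sum.
    return np.max(np.sum(np.abs(Z), axis=1))

def diag_part(Z):
    # Square matrix with the diagonal of Z and zeros elsewhere.
    return np.diag(np.diag(Z))

# Illustrative matrix (an assumption, chosen for easy hand checking).
Z = np.array([[3.0, 4.0],
              [0.0, -5.0]])

print(frobenius_norm(Z), l21_norm(Z), infinity_norm(Z))
print(diag_part(Z))
```

For this example the row norms are 5 and 5, so the 2,1 norm is 10, and the largest absolute row sum is 7.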
In accordance with embodiments, the RoFaD algorithm can operate with the following assumptions:
(1) Given a time series of graphs $A[t-h+1:t]=[A_{t-h+1}, A_{t-h+2}, \ldots, A_t]$ ($A_j\in\mathbb{R}^{m\times m}$) in which a fault propagation starts at $t-h+1$ and ends at $t$, there exists a symmetric and full rank fault propagation matrix $P\in\mathbb{R}^{m\times m}$ that is stable across time $[t-h+1, t]$. This propagation matrix can be expressed by Equation 1.
$A_j = A_{j-1}P$, or $(A_{j-1})^{-1}A_j = P$, (EQ. 1)
Where t−h≤j≤t. Note that there is no normalization performed on Aj or P.
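In the noise-free case, Equation 1 implies that every consecutive pair of graphs yields the same propagation matrix. The following minimal NumPy sketch illustrates this on synthetic data; the matrix sizes, the random seed, and the way P and the first graph are generated are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, h = 5, 4  # hypothetical number of components and series length

# Hypothetical symmetric, full-rank propagation matrix P (close to identity).
B = rng.normal(size=(m, m))
P = np.eye(m) + 0.05 * (B + B.T)

# Well-conditioned starting graph, then A_j = A_{j-1} P as in EQ. 1.
G = rng.normal(size=(m, m))
series = [G @ G.T + np.eye(m)]
for _ in range(h - 1):
    series.append(series[-1] @ P)

# Recover P from each consecutive pair: (A_{j-1})^{-1} A_j.
estimates = [np.linalg.pinv(series[j - 1]) @ series[j] for j in range(1, h)]

for E in estimates:
    print(np.max(np.abs(E - P)))  # deviation from the true P, near machine precision
```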
(2) Because P is symmetric, a rank-k eigen-vector matrix $U\in\mathbb{R}^{m\times k}$ and an eigen-value matrix $S\in\mathbb{R}^{k\times k}$ can be obtained from P by application of Equation 2.
$P = USU^{-1}$, or $[U,S]=\operatorname{eig}(P,k)$, (EQ. 2)
The corresponding root cause indicator $r\in\mathbb{R}^{m\times 1}$ can be measured by Equation 3.
$r_j=\sum_{i=1}^{k}|US|_{ji}$, (EQ. 3)
Where rj indicates the degree of fault of the j-th component. Furthermore, given that the pattern of fault propagation is usually stable within a certain time period, the fault score for the whole graph series A[t−h+1:t] can be measured by Equation 4.
An embodying RoFaD algorithm can obtain both root cause indicator metric r and fault score c from the graph series A[t−h+1:t]. An embodying RoFaD algorithm can overcome the real world effect of ever-present noise. This noise can affect the real measurement of every graph Aj in the time series.
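Equations 2 and 3 can be sketched in NumPy as follows. This is a minimal illustration that assumes P is already known and that eig(P,k) selects the k eigenpairs of largest absolute eigenvalue; that selection rule, the function names, and the example matrix are assumptions and are not stated in the disclosure.

```python
import numpy as np

def top_k_eig(P, k):
    # P is symmetric, so eigh returns real eigenvalues and orthonormal eigenvectors.
    vals, vecs = np.linalg.eigh(P)
    # Assumption: keep the k eigenpairs with the largest absolute eigenvalue.
    order = np.argsort(-np.abs(vals))[:k]
    return vecs[:, order], np.diag(vals[order])

def root_cause_indicator(U, S):
    # EQ. 3: r_j is the sum over i of |US|_{ji}, one value per component j.
    return np.sum(np.abs(U @ S), axis=1)

# Illustrative propagation matrix in which components 0 and 1 dominate.
P = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 0.5]])

U, S = top_k_eig(P, k=1)
r = root_cause_indicator(U, S)
print(r)
```

Here the leading eigenvalue is 3 with eigenvector proportional to (1, 1, 0), so the indicator is concentrated on the first two components and is zero for the third.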
Assume that an observation of $A_j$ is $\tilde{A}_j$, and that $\tilde{A}_j = A_j + Q_j$ (EQ. 5);
Where $Q_j\in\mathbb{R}^{m\times m}$ is a local noise matrix at time j that is independent and identically distributed. Since $\tilde{A}[t-h+1:t]=[\tilde{A}_{t-h+1}, \tilde{A}_{t-h+2}, \ldots, \tilde{A}_t]$ are symmetric, $Q[t-h+1:t]=[Q_{t-h+1}, Q_{t-h+2}, \ldots, Q_t]$ is also symmetric.
Due to the existence of unknown noise, there is no straightforward way to obtain P from the temporal observation $\tilde{A}[t-h+1:t]$ (since $\tilde{A}_{j-1}^{-1}\tilde{A}_j$ is not stable across time). Embodying approaches can implement an alternative way to optimize P (Equation 6).
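The instability of the pairwise estimate under noise can be illustrated on synthetic data. The following minimal NumPy sketch compares the spread of the pairwise estimates with and without additive symmetric noise as in Equation 5; the Gaussian noise model, its level, and all sizes are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, h, sigma = 4, 5, 0.05  # hypothetical sizes and noise level

# Hypothetical symmetric, full-rank propagation matrix P.
B = rng.normal(size=(m, m))
P = np.eye(m) + 0.05 * (B + B.T)

# Noise-free series A_j = A_{j-1} P (EQ. 1) from a well-conditioned start.
G = rng.normal(size=(m, m))
clean = [G @ G.T + np.eye(m)]
for _ in range(h - 1):
    clean.append(clean[-1] @ P)

# Observations with symmetric i.i.d. noise Q_j added to each graph (EQ. 5).
noisy = []
for A in clean:
    N = sigma * rng.normal(size=(m, m))
    noisy.append(A + (N + N.T) / 2)

def pairwise_estimates(series):
    # One estimate of P per consecutive pair of graphs.
    return [np.linalg.pinv(series[j - 1]) @ series[j] for j in range(1, len(series))]

def spread(estimates):
    # Largest Frobenius distance between any estimate and the first one.
    return max(np.linalg.norm(E - estimates[0]) for E in estimates)

clean_spread = spread(pairwise_estimates(clean))
noisy_spread = spread(pairwise_estimates(noisy))
print(clean_spread, noisy_spread)
```

Without noise the estimates coincide up to rounding error; with noise they differ from pair to pair, which is why a direct inversion is not used.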
Given the existence of noise, when obtaining the most recent value of P at time t by Equation 6 (assuming that P is stable from time t−h+1 to t), an intuition is that the most recent graphs (e.g., $\tilde{A}_t$) are more reliable than the earlier graphs (e.g., $\tilde{A}_{t-h+1}$) in the time series. A mathematical combination of Equation 1 and Equation 5 results in Equation 7.
The error magnitude is in direct relationship with j (i.e., the error increases as j increases). To suppress the error that accumulates over time, instead of optimizing P using Equation 6, embodying approaches can implement Equation 8.
Where f(j) is a smoothing/weighting function that is decreasing as j increases. By setting
Equation 8 can be rewritten as Equation 9.
According to the power series theorem,
by combining with the matrix exponential of P, Equation 9 can be expressed as
If it is assumed that h is chosen such that
where ϵ is an error bound.
In accordance with embodiments,
can be minimized so that instead of solving for the propagation matrix P, embodying systems and methods can solve eigen-vector matrices U and S directly to obtain root cause indicator r and fault score c. Because propagation matrix P is a symmetric, full-rank matrix, there is a Jordan canonical form of P:
$e^{P} = U e^{S} U^{-1}$ (EQ. 12)
The objective loss function represented by Equation 13 can be designed by letting
Because it is verifiable that the loss function is jointly convex in U and s, a global optimal solution exists. To reduce cumulative error when solving EQ. 13, in some implementations the length of the time series is restricted to fewer than 10 units of time.
The optimization of both U and s simultaneously is difficult. In accordance with embodiments, Equation 13 can be solved with an iterative solution. Embodying approaches adopt an alternating and iterative optimization procedure to solve practical optimization problems.
Given s, the optimal U can be computed by minimizing the objective function of Equation 14.
For a constant value of s, each iteration of U can be obtained by Equation 15.
$U(i,:) = R(i,:)\,(T+\alpha D_U(i,i)\, I_m)^{-1},\ 1\le i\le m$ (EQ. 15)
Where $R=\hat{A}\operatorname{diag}(s)$;
$T=\operatorname{diag}(s)^2$; and
$D_U$ is a diagonal matrix with the i-th diagonal element as
Given U, the optimal s can be computed by minimizing the objective function of Equation 16.
For a constant value of U, each iteration of s can be obtained by Equation 17.
In accordance with embodiments, the time complexity of each iteration is $O(m^2 h)$ and the space complexity is $O(m^2 h)$, where m is the number of components in the system (i.e., number of nodes in each graph), and h is the length of the time series section. An embodying RoFaD algorithm outputs one fault score c and one root cause indication r for each input time series section. The elements in r are positive, where a higher value indicates a greater possibility of a corresponding component being the fault's root cause.
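Producing one output per time series section suggests iterating over sections of length h. The following minimal NumPy sketch cuts a graph series into overlapping sections; the array layout (components × components × time), the stride of one time step, and the function name are assumptions that the disclosure does not specify.

```python
import numpy as np

def sliding_sections(graphs, h):
    """Yield (t, section) pairs, where section holds the h graphs ending at time index t.

    graphs: array of shape (m, m, T), one m-by-m graph per time step (assumed layout).
    """
    T = graphs.shape[2]
    for t in range(h - 1, T):
        yield t, graphs[:, :, t - h + 1 : t + 1]

# Illustrative series: 3 components, 6 time steps.
graphs = np.zeros((3, 3, 6))
sections = list(sliding_sections(graphs, h=4))

for t, section in sections:
    print(t, section.shape)
```

Each yielded section would then be passed to the fault analysis, giving one fault score and one root cause indicator per section.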
An embodying RoFaD algorithm can be expressed as shown in TABLE 1.
In accordance with embodiments, the value of h can be selected. In Equation 10, the Taylor truncation error is bounded by ϵ. With E denoting
the following inequality is expressed:
Where $\|E\|_\infty$ is bounded by
Thus, in implementations of an embodying RoFaD algorithm, a choice for h should not be smaller than $\sqrt{m}\,\|P\|_\infty \log(1/\epsilon)$.
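The lower bound on h can be evaluated directly once an estimate of the infinity norm of P is available. The following minimal sketch assumes the logarithm is the natural logarithm and rounds the bound up to an integer; both choices, the function name, and the example values are assumptions.

```python
import math
import numpy as np

def min_window_length(P, eps):
    # Lower bound sqrt(m) * ||P||_inf * log(1/eps), rounded up to a whole time step.
    # Assumption: log is the natural logarithm.
    m = P.shape[0]
    inf_norm = np.max(np.sum(np.abs(P), axis=1))  # maximum absolute row sum
    return math.ceil(math.sqrt(m) * inf_norm * math.log(1.0 / eps))

# Illustrative propagation matrix and error bound.
P = np.array([[0.5, 0.25],
              [0.25, 0.5]])
h_min = min_window_length(P, eps=0.1)
print(h_min)
```

For this example the bound evaluates to about 2.44, so the smallest admissible integer h is 3.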
An embodying RoFaD algorithm underwent experimental testing using a real world dataset of sensor data. This real world dataset was collected over a sixty day period from sensors on a complicated machine with forty different components. For each data collection day, the correlation between every pair of components was measured and used to build a graph on the forty components. Therefore the input dataset is a time series graph $\tilde{A}\in\mathbb{R}^{40\times 40\times 60}$.
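Building one correlation graph per day can be sketched as follows. This minimal NumPy example uses Pearson correlation on randomly generated readings; the correlation measure, the number of samples per day, and the input layout are assumptions, and the random data stands in for the real sensor dataset, which is not reproduced here.

```python
import numpy as np

def build_graph_series(readings):
    """Turn readings of shape (days, samples, components) into graphs of
    shape (components, components, days), one correlation matrix per day."""
    days, _, m = readings.shape
    graphs = np.empty((m, m, days))
    for d in range(days):
        # rowvar=False treats each column (component) as a variable.
        graphs[:, :, d] = np.corrcoef(readings[d], rowvar=False)
    return graphs

# Synthetic stand-in: 60 days, 200 samples per day (assumed), 40 components.
rng = np.random.default_rng(0)
readings = rng.normal(size=(60, 200, 40))
graphs = build_graph_series(readings)
print(graphs.shape)
```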
For this analysis, the following were the parameter settings: h=4, k=1, α=1 and β=1.
The results of the embodying RoFaD algorithm were verified by an expert in the domain of the machine used to collect the experimental dataset. After performing an analysis of the experimental dataset, the domain expert verified that the embodying RoFaD algorithm successfully detected all faults (i.e., 4) and their root cause components (i.e., 5) within the time series dataset over the sixty day period.
Embodying approaches apply an alternating and iterative optimization procedure to a rank fault propagation matrix representing the time series data. For each monitored component represented in the time series data, (step 415), an element in eigen-value matrix S is calculated (EQ. 17), step 420. For each data point in the selected portion of the time series graph, process 400 calculates rank-k eigen-vector U (EQ. 15), step 425. At step 430, a determination is made whether there are more components. If there are more components, process 400 returns to step 420, else it proceeds to step 435.
At step 435, this alternating and iterative optimization procedure continues for each element of the rank-k eigen-vector U. An iteration of eigen-value matrix S is calculated, step 440. At step 445, a determination is made whether there are more elements in U. If there are more elements, process 400 returns to step 435, else it proceeds to step 450.
At step 450, a determination is made whether there are more monitored component data points in the selected time series data. If there are more monitored component data points, process 400 returns to step 415, else it proceeds to steps 455 and 460.
A fault score for the selected period of interest is calculated (EQ. 4), step 455. Also, elements of root cause indicator vector r are calculated (EQ. 3), step 460. Elements of the root cause indicator vector above a predetermined fault score threshold value are identified, step 465. Graphical representations of the fault score and the root cause indicator are generated.
The fault score graphical representation can include the selected time series data and a visual indication of any fault regions within the selected time series data that are above a predetermined threshold.
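The thresholding steps described above can be sketched as follows. This minimal example marks contiguous runs of fault scores above a threshold as fault regions and lists the components whose root cause indicator exceeds a threshold; the function names, the example values, and the thresholds are illustrative assumptions.

```python
import numpy as np

def fault_regions(scores, threshold):
    """Return inclusive (start, end) index pairs of contiguous runs above threshold."""
    regions, start = [], None
    for i, above in enumerate(np.asarray(scores) > threshold):
        if above and start is None:
            start = i                      # a new region opens here
        elif not above and start is not None:
            regions.append((start, i - 1)) # the open region closed at i - 1
            start = None
    if start is not None:
        regions.append((start, len(scores) - 1))
    return regions

def flagged_components(r, threshold):
    # Indices of components whose root cause indicator exceeds the threshold.
    return np.flatnonzero(np.asarray(r) > threshold).tolist()

scores = [0.1, 0.2, 0.9, 0.8, 0.1, 0.7]
regions = fault_regions(scores, threshold=0.5)
flagged = flagged_components([0.05, 1.2, 0.3, 0.9], threshold=0.5)
print(regions, flagged)
```

The resulting index ranges could be used to shade fault regions in a plot of the fault score over time.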
An embodying RoFaD algorithm implements an online learning method, as the algorithm can adapt to concept drift and/or changes in the running environment. The RoFaD algorithm is based on graph theory: it analyzes time series data to find gradual changes that start from certain components/nodes of the graph and then propagate to the rest of the graph's nodes, where such a change remains stable within a certain period of time. The RoFaD algorithm is robust to transient change or noise, and it operates on sensor readings without the need for controller signals or any information from a controller.
In accordance with some embodiments, a computer program application stored in non-volatile memory or computer-readable medium (e.g., register memory, processor cache, RAM, ROM, hard drive, flash memory, CD ROM, magnetic media, etc.) may include code or executable program instructions that when executed may instruct and/or cause a controller or processor to perform methods discussed herein, such as a method of analyzing time series data to determine a fault score and identify a root cause indicator, as disclosed above.
The computer-readable medium may be a non-transitory computer-readable medium, including all forms and types of memory and all computer-readable media except for a transitory, propagating signal. In one implementation, the non-volatile memory or computer-readable medium may be external memory.
Although specific hardware and methods have been described herein, note that any number of other configurations may be provided in accordance with embodiments of the invention. Thus, while there have been shown, described, and pointed out fundamental novel features of the invention, it will be understood that various omissions, substitutions, and changes in the form and details of the illustrated embodiments, and in their operation, may be made by those skilled in the art without departing from the spirit and scope of the invention. Substitutions of elements from one embodiment to another are also fully intended and contemplated. The invention is defined solely with regard to the claims appended hereto, and equivalents of the recitations therein.