1. Field of the Invention
The present invention relates generally to the field of early detection and diagnosis of incipient machine failure or process upset. More particularly, the invention is directed to model-based monitoring of processes and machines, and experience-based diagnostics.
2. Brief Description of the Related Art
A variety of new and advanced techniques have emerged in industrial process control, machine control, system surveillance, and condition based monitoring to address drawbacks of traditional sensor-threshold-based control and alarms. The traditional techniques did little more than provide responses to gross changes in individual metrics of a process or machine, often failing to provide adequate warning to prevent unexpected shutdowns, equipment damage, loss of product quality or catastrophic safety hazards.
According to one branch of the new techniques, empirical models of the monitored process or machine are used in failure detection and in control. Such models effectively leverage an aggregate view of surveillance sensor data to achieve much earlier incipient failure detection and finer process control. By modeling the many sensors on a process or machine simultaneously and in view of one another, the surveillance system can provide more information about how each sensor (and its measured parameter) ought to behave. Additionally, these approaches have the advantage that no additional instrumentation is typically needed, and sensors in place on the process or machine can be used.
An example of such an empirical surveillance system is described in U.S. Pat. No. 5,764,509 to Gross et al., the teachings of which are incorporated herein by reference. Therein is described an empirical model using a similarity operator against a reference library of known states of the monitored process, and an estimation engine for generating estimates of current process states based on the similarity operation, coupled with a sensitive statistical hypothesis test to determine if the current process state is a normal or abnormal state. The role of the similarity operator in the above empirical surveillance system is to determine a metric of the similarity of a current set of sensor readings to any of the snapshots of sensor readings contained in the reference library. The similarity metric thusly rendered is used to generate an estimate of what the sensor readings ought to be, from a weighted composite of the reference library snapshots. The estimate can then be compared to the current readings for monitoring differences indicating incipient process upset, sensor failure or the like. Other empirical model-based monitoring systems known in the art employ neural networks to model the process or machine being monitored.
Early detection of sensor failure, process upset or machine fault is afforded in such monitoring systems by sensitive statistical tests such as the sequential probability ratio test, also described in the aforementioned patent to Gross et al. The result of such a test, when applied to the residual of the difference of the actual sensor signal and estimated sensor signal, is a decision as to whether the actual and estimated signals are the same or different, with user-selectable statistical confidence. While this is useful information in itself, directing thinly stretched maintenance resources only to those process locations or machine subcomponents that evidence a change from normal, there is a need to advance monitoring to a diagnostic result, and thereby provide a likely failure mode, rather than just an alert that the signal is not behaving as normal. A monitoring or control system that couples a sensitive early-detection statistical test with an easy-to-build empirical model, and that provides not only early warning but also a diagnostic indication of the likely cause of a change, is enormously valuable and is much sought after in a variety of industries.
Due to the inherent complexity of many processes and machines, the task of diagnosing a fault is very difficult. A great deal of effort has been spent on developing diagnostic systems. One approach to diagnosis has been to employ an expert system, that is, a rule-based system for analyzing process or machine parameters according to rules, developed by an expert, that describe the dynamics of the monitored or controlled system. An expert system requires an intense learning process by a human expert to understand the system and to codify that knowledge into a set of rules. Thus, expert system development takes a large amount of time and resources. Nor is an expert system responsive to frequent design changes to a process or machine: a change in design changes the rules, which requires the expert to determine the new rules and to redesign the system.
What is needed is a diagnostic approach that can be combined with model-based monitoring and control of a process or machine, wherein an expert is not required to spend months developing rules to be implemented in software for diagnosing machine or process fault. A diagnostic system that could be built on the domain knowledge of the industrial user of the monitoring or control system would be ideal. Furthermore, a diagnostic approach is needed that is easily adapted to changing uses of a machine, or changing parameters of a process, as well as design changes to both.
What is further needed is a way to match precursors of impending failure to past patterns of precursors to known failures rapidly, accurately and without significant human expert time and effort.
The present invention provides unique diagnostic capabilities in a model-based monitoring system for machines and processes. A library of diagnostic conditions is provided as part of routine on-line monitoring of a machine or process via physical parameters instrumented with sensors of any type. Outputs created by the on-line monitoring are compared to the diagnostic conditions library, and if a signature of one or more diagnostic conditions is recognized in these outputs, the system provides a diagnosis of a possible impending failure mode.
The diagnostic capabilities are preferably coupled to a non-parametric empirical-model based system that generates estimates of sensor values in response to receiving actual sensor values from the sensors on the machine or process being monitored. The estimated sensor values generated by the model are subtracted from the actual sensor values to provide residual signals for sensors on the machine or process. When everything is working normally, as modeled by the empirical model, the residual signals are essentially zero with some noise from the underlying physical parameters and the sensor noise. When the process or machine deviates from any recognized and modeled state of operation, that is, when its operation becomes abnormal, these residuals become non-zero. A sensitive statistical test such as the sequential probability ratio test (SPRT) is applied to the residuals to provide the earliest possible decision whether the residuals are remaining around zero or not, often at such an early stage that the residual trend away from zero is still buried in the noise level. For any sensor where a decision is made that the residual is non-zero, an alert is generated on that sensor for the time snapshot in question. An alternative way to generate an alert is to enforce thresholds on the residual itself for each parameter, alerting on that parameter when the thresholds are exceeded. The diagnostic conditions library can be referenced using the residual data itself, or alternatively using the SPRT alert information or the residual threshold alert information. Failure modes are stored in the diagnostic conditions library, along with explanatory descriptions, suggested investigative steps, and suggested repair steps. When the pattern of SPRT alerts or residual threshold alerts matches the signature in the library, the failure mode is recognized, and the diagnosis made. 
Alternatively, when the residual data pattern is similar to a residual data pattern in the library using a similarity engine, the corresponding failure mode is recognized and the diagnosis made.
Advantageously, the use of a nonparametric-type empirical model, in contrast to a first-principles model or a parametric model, results in estimates and residuals that are uniquely effective in the diagnostic process, especially with respect to personalized modeling of individual instantiations of monitored machines. The present invention is ideal for advanced diagnostic condition monitoring of expensive fleet assets such as aircraft, rental cars, locomotives, tractors, and the like.
The inventive system can comprise software running on a computer, with a memory for storing empirical model information and the diagnostic conditions library. Furthermore, it has data acquisition means for receiving data from sensors on the process or machine being monitored. Typically, the system can be connected to or integrated into a process control system in an industrial setting and acquire data from that system over a network connection. No new sensors need to be installed in order to use the inventive system. The diagnostic outputs of the software can be displayed, or transmitted to a pager, fax or other remote device, or output to a control system that may be disposed to act on the diagnoses for automatic process or machine control. Alternatively, due to the small computing requirements of the present invention, the inventive system can be reduced to an instruction set on a memory chip resident with a processor and additional memory for storing the model and library, and located physically on the process or equipment monitored, such as an automobile or aircraft.
The diagnostic conditions library of the present invention can be empirical, based on machine and process failure autopsies and their associated lead-in sensor data. The number of failure modes in the library is entirely selectable by the user, and the library can be added to in operation in the event that a new failure is encountered that is previously unknown in the library.
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as the preferred mode of use, further objectives and advantages thereof, is best understood by reference to the following detailed description of the embodiments in conjunction with the accompanying drawings, wherein:
Turning now to the drawings, and particularly
The data preprocessing module can be any type of monitoring system, typically model-based, and more preferably empirical model-based, and most preferably non-parametric empirical model-based. In particular, kernel-based non-parametric models are preferred. In contrast to “first principle” modeling methods, there is no need to determine the equations of the physics that govern the monitored system. In contrast to parametric methods, which assume the “form” or equation type for a model of the monitored system and then fit the assumed form to empirical data by setting fitting parameters, a non-parametric model essentially reconstitutes the model “on-the-fly” using the input observation, making it much more data-driven and easier to use. This is best understood with reference to
Turning to
An empirical model-based monitoring system for use in the present diagnostic invention requires historic data from which to “learn” normal states of operation, in order to generate sensor estimates. Generally, a large amount of data is accumulated from an instrumented machine or process running normally and through all its acceptable dynamic ranges. The large set of representative data can be used for modeling, or in the interests of computational speed, the large set can be down-sampled to contain a subset of observations characteristic of the operational states, through a “training” process. Characteristic observations may also be determined from the large set by clustering methods of determining average observations, or “centers”. A method for selecting training set snapshots is graphically depicted in
In this example, each snapshot represents a vector of five elements, one reading for each sensor in that snapshot. Of all the collected sensor data from all snapshots, according to this training method, only those five-element snapshots are included in the representative training set that contain either a global minimum or a global maximum value for any given sensor. Therefore, the global maximum 416 for sensor 402 justifies the inclusion of the five sensor values at the intersections of line 418 with each sensor signal 402, 404, 406, 408, 410, including global maximum 416, in the representative training set, as a vector of five elements. Similarly, the global minimum 420 for sensor 402 justifies the inclusion of the five sensor values at the intersections of line 422 with each sensor signal 402, 404, 406, 408, 410. Collections of such snapshots represent states the system has taken on. The pre-collected sensor data is filtered to produce a “training” subset that reflects all states that the system takes on while operating “normally” or “acceptably” or “preferably.” This training set forms a matrix, having as many rows as there are sensors of interest, and as many columns (snapshots) as necessary to capture all the acceptable states without redundancy.
Selection of representative data is further depicted in the flow chart of
In Step 510, if the sensor value of sensor i at snapshot t in X is greater than the maximum yet seen for that sensor in the collected data, max(i) is updated and set to equal the sensor value, while Tmax(i) stores the number t of the observation, as shown in Step 515. If the sensor value is not greater than the maximum, a similar test is done for the minimum for that sensor, as illustrated in Steps 520 and 525. The observation counter t is then incremented in Step 530. As shown in Step 535, if all the observations have been reviewed for a given sensor (i.e., when the observation counter t equals the number of snapshots, L) then the observation counter t is reset to one and the counter i is incremented, as shown in Step 540. At this point, the program continues to Step 510 to find the maximum and minimum for the next sensor. Once the last sensor has been finished, at which point i=n, as shown in Step 545, then any redundancies are removed and an array D is created from a subset of vectors from Array X. This creation process is discussed below.
In Step 550, counters i and j are both initialized to one. As illustrated by Step 555, arrays Tmax and Tmin are concatenated to form a single vector Ttmp. Preferably, Ttmp has 2N elements, sorted into ascending (or descending) order, as shown in Step 560 to form Array T. As shown in Step 565, holder tmp is set to the first value in T (an observation number that contains a sensor minimum or maximum). Additionally, the first column of Array D is set to be equal to the column of Array X corresponding to the observation number that is the first element of T. In the loop starting with the decision box of Step 570, the ith element of T is compared to the value of tmp that contains the previous element of T. If they are equal (i.e., the corresponding observation vector is a minimum or maximum for more than one sensor), that vector has already been included in Array D and need not be included again. Counter i is then incremented, as shown in Step 575. If the comparison is not equal, Array D is updated to include the column from X that corresponds to the observation number of T(i), as shown in Step 580, and tmp is updated with the value at T(i). Counter j is then incremented, as shown in Step 585, in addition to counter i (Step 575). In Step 590, if all the elements of T have been checked, and counter i equals twice the number of elements, N, then the distillation into training set or Array D has finished.
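The min/max selection and de-duplication steps described above can be sketched as follows. This is a minimal illustration rather than the flow chart itself: the function name `select_training_set` is hypothetical, snapshots are held as rows of a list (the text stores them as columns of Array D), and ties are resolved by keeping the first occurrence of an extreme value, which is an assumption.

```python
def select_training_set(X):
    """Return the snapshots of X that hold a global min or max for any sensor.

    X is a list of snapshots; each snapshot is a list with one reading per
    sensor. The result keeps time order and contains no duplicate snapshots.
    """
    n_sensors = len(X[0])
    t_max = [0] * n_sensors  # observation index of the maximum per sensor
    t_min = [0] * n_sensors  # observation index of the minimum per sensor

    for i in range(n_sensors):
        for t, snapshot in enumerate(X):
            if snapshot[i] > X[t_max[i]][i]:
                t_max[i] = t
            elif snapshot[i] < X[t_min[i]][i]:
                t_min[i] = t

    # Concatenate and sort the observation indices (Ttmp -> T in the text).
    T = sorted(t_max + t_min)

    # Walk the sorted indices, skipping any index equal to the previous one.
    D = []
    previous = None
    for t in T:
        if t != previous:
            D.append(X[t])
            previous = t
    return D
```

Because the indices are sorted before the scan, a snapshot that is an extreme for several sensors appears as adjacent repeats in T and is added only once.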
Signal data may be gathered from any machine, process or living system that is monitored with sensors. Generally, the number of sensors used is not a limiting factor, other than with respect to computational overhead. Moreover, the methods described herein are highly scalable. However, the sensors should capture at least some of the primary “drivers” of the underlying system. Furthermore, all sensors inputted to the underlying system should be interrelated in some fashion (e.g., linearly or non-linearly).
Preferably, the signal data appear as vectors, with as many elements as there are sensors. A given vector represents a “snapshot” of the underlying system at a particular moment in time. Additional processing may be done if it is necessary to insert a “delay” between the cause and effect nature of consecutive sensors. That is, if sensor A detects a change that will be monitored by sensor B three “snapshots” later, the vectors can be reorganized such that a given snapshot contains a reading for sensor A at a first moment, and a reading for sensor B three moments later.
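The delay reorganization described above can be illustrated with a short sketch. The function name `align_with_delays` and the representation of each sensor as a separate list of readings are assumptions made for the example; delays are expressed in whole snapshots.

```python
def align_with_delays(signals, delays):
    """Build snapshots in which sensor s is read delays[s] samples later.

    signals: list of per-sensor reading lists, all sampled on the same clock.
    delays:  list of non-negative integer lags, one per sensor.
    """
    # Only as many snapshots as the most-delayed sensor can still supply.
    length = min(len(sig) - d for sig, d in zip(signals, delays))
    return [
        [sig[t + d] for sig, d in zip(signals, delays)]
        for t in range(length)
    ]


# Sensor B responds three snapshots after sensor A.
sensor_a = [0, 1, 2, 3, 4, 5]
sensor_b = [10, 11, 12, 13, 14, 15]
snapshots = align_with_delays([sensor_a, sensor_b], [0, 3])
```

Each resulting snapshot pairs a reading of sensor A with the reading of sensor B taken three moments later, at the cost of dropping the last three samples.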
Further, each snapshot can be thought of as a “state” of the underlying system. Thus, collections of such snapshots preferably represent a plurality of states of the system. As described above, any previously collected sensor data can be filtered to produce a smaller “training” subset (the reference set D) that characterizes all states that the system takes on while operating “normally” or “acceptably” or “preferably.” This training set forms a matrix, having as many rows as there are sensors of interest, and as many columns (snapshots) as necessary to capture the acceptable states without redundancy. The matrix can be determined offline as part of model training, or can even be done on-line, prior to rendering estimates for any particular input observation, and may be determined in part on the basis of characteristics of the input observation.
According to a preferred form of the invention, a non-parametric modeling approach is used that is uniquely capable of rendering estimates of variables of a complex system in operation, thus providing unique residuals and alerts between the actual values and the estimates. More preferably, a kernel-based non-parametric approach is used where a function, or “kernel”, is used to combine learned observations in a weighted fashion based on the input observation to generate model results. The similarity-based approach is a kernel-based non-parametric model, capable of rendering useful estimates over a wide range of operation in contrast to parametric approaches like linear regression or neural networks, which tend to be only locally accurate. Kernel regression provides another kernel-based non-parametric estimator for use in the invention. Using a non-parametric model provides for purely data-driven modeling which avoids an investment in first-principles modeling and in tuning parametric estimators (such as neural networks), and provides for novel residual and alert precursors of failures for diagnostic purposes. A suitable kernel-based non-parametric model for use in the present invention is generally described by the equation:
$$\vec{Y}_{estimated} = C \cdot K(\vec{X}_{in}, D) \qquad (A)$$
where estimated sensor readings Yestimated are determined from the results of the kernel function K operating on the input observation vector Xin and the set of learned observations in D, weighted according to some weight matrix C. In an alternative form, the kernel responses can be normalized to account for non-normalized data:
$$\vec{Y}_{estimated} = \frac{C \cdot K(\vec{X}_{in}, D)}{M} \qquad (B)$$
where M is some normalization factor.
According to the similarity operator-based empirical modeling technique, for a given set of contemporaneous sensor data from the monitored process or machine running in real-time, the estimates for the sensors can be generated according to:
$$\vec{Y}_{estimated} = D \cdot \vec{W} \qquad (1)$$
where the vector Y of estimated values for the sensors is equal to the contributions from each of the snapshots of contemporaneous sensor values arranged to comprise matrix D (the reference library or reference set). These contributions are determined by weight vector W (not to be confused with weights C in equations A and B above). The multiplication operation is the standard matrix/vector multiplication operator. The vector Y has as many elements as there are sensors of interest in the monitored process or machine. W has as many elements as there are reference snapshots in D. W is determined by:
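$$\vec{W} = \frac{\hat{W}}{\sum_{j=1}^{N} \hat{W}(j)} \qquad (2)$$
$$\hat{W} = \left(D^{T} \otimes D\right)^{-1} \cdot \left(D^{T} \otimes \vec{Y}_{in}\right) \qquad (3)$$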
or in terms of equation B:
where the T superscript denotes transpose of the matrix, and Yin is the current snapshot of actual, real-time sensor data from the underlying system. The similarity operator is symbolized in Equation 3, above, as the circle with the “X” disposed therein. Moreover, D is again the reference library as a matrix, and D^T represents the standard transpose of that matrix (i.e., rows become columns). As mentioned above, the step of normalizing the W values in Equation 2 can be performed to improve modeling when the input data and training data have not been converted to normalized ranges. Furthermore, the similarity-based modeling approach can be used in an inferential mode, where estimates are made for variables which are not present as inputs, or in an autoassociative mode, where estimates are made for the inputs. In the inferential case, the D matrix can be separated into two parts, the first part of which corresponds to the inputs and is used in the kernel K, and the second part of which corresponds to the inferred variables and is in the numerator of C.
As stated above, the symbol ⊗ represents the “similarity” operator, and can be chosen from a wide variety of operators for use in the present invention. Preferably, the similarity operation used in the present invention should provide a quantified measure of likeness or difference between two state vectors, and more preferably yields a number that approaches one (1) with increasing sameness, and approaches zero (0) with decreasing sameness. In the context of the invention, this symbol should not be confused with the conventional meaning of ⊗ in other mathematical contexts. In other words, for purposes of the present invention the meaning of ⊗ is that of a “similarity” operation.
Generally, similarity as used herein is best understood to be a vector-to-vector comparison that reaches a highest value of one when the vectors are identical and are separated by zero distance, and diminishes as the vectors become increasingly distant (different). In general, the following guidelines help to define similarity operators:
Accordingly, for example, an effective similarity operator for use in the present invention can generate a similarity of ten (10) when the inputs are identical, and a similarity that diminishes toward zero as the inputs become more different. Alternatively, a bias or translation can be used, so that the similarity is 12 for identical inputs, and diminishes toward 2 as the inputs become more different. Further, a scaling can be used, so that the similarity is 100 for identical inputs, and diminishes toward zero with increasing difference. Moreover, the scaling factor can also be a negative number, so that the similarity for identical inputs is −100 and approaches zero from the negative side with increasing difference of the inputs. The similarity can be rendered for the elements of two vectors being compared, and summed, averaged or otherwise statistically combined to yield an overall vector-to-vector similarity, or the similarity operator can operate on the vectors themselves (as in Euclidean distance).
The similarity operator, ⊗, works much as regular matrix multiplication operations do, on a row-to-column basis. The similarity operation yields a scalar value for each pair of corresponding nth elements of a row and a column, and an overall similarity value for the comparison of the row to the column as a whole. This is performed over all row-to-column combinations for two matrices (as in the similarity operation on D and its transpose above).
By way of example, one similarity operator that can be used compares the two vectors (the ith row and jth column) on an element-by-element basis. Only corresponding elements are compared, e.g., element (i,m) with element (m,j) but not element (i,m) with element (n,j). For each such comparison, the similarity is equal to the absolute value of the smaller of the two values divided by the larger of the two values.
Hence, if the values are identical, the similarity is equal to one, and if the values are grossly unequal, the similarity approaches zero. When all the elemental similarities are computed, the overall similarity of the two vectors is equal to the average of the elemental similarities. A different statistical combination of the elemental similarities can also be used in place of averaging, e.g., median.
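A minimal sketch of the smaller-over-larger operator just described follows. The function names are hypothetical. The sketch assumes sensor readings are positive, since the ratio can exceed one for values of mixed sign; treating a pair whose larger value is zero as having similarity zero is likewise an assumption, not something stated in the text.

```python
def elemental_ratio_similarity(x, y):
    """Similarity of two corresponding readings: |smaller / larger|."""
    if x == y:
        return 1.0  # identical values, including the 0/0 case
    smaller, larger = min(x, y), max(x, y)
    if larger == 0:
        return 0.0  # assumption: avoids division by zero for non-positive data
    return abs(smaller / larger)


def ratio_vector_similarity(u, v):
    """Average of the elemental similarities of two snapshots."""
    sims = [elemental_ratio_similarity(x, y) for x, y in zip(u, v)]
    return sum(sims) / len(sims)
```

Replacing the average with a median, as the text permits, would only change the last line of `ratio_vector_similarity`.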
Another example of a similarity operator that can be used can be understood with reference to
Line segments 658 and 660 drawn to the locations of X0 and X1 on the base 622 form an angle θ. The ratio of angle θ to angle Ω gives a measure of the difference between X0 and X1 over the range of values in the training set for the sensor in question. Subtracting this ratio, or some algorithmically modified version of it, from the value of one yields a number between zero and one that is the measure of the similarity of X0 and X1.
Yet another example of a similarity operator that can be used determines an elemental similarity between two corresponding elements of two observation vectors or snapshots, by subtracting from one a quantity with the absolute difference of the two elements in the numerator, and the expected range for the elements in the denominator. The expected range can be determined, for example, by the difference of the maximum and minimum values for that element to be found across all the reference library data. The vector similarity is then determined by averaging the elemental similarities.
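The range-normalized operator above can be sketched as shown here, under the assumption that every sensor has a non-zero expected range in the reference library. The names `expected_ranges` and `range_similarity` are illustrative only.

```python
def expected_ranges(library):
    """Per-sensor range (max minus min) across all reference snapshots."""
    return [max(column) - min(column) for column in zip(*library)]


def range_similarity(u, v, ranges):
    """Average over sensors of 1 - |u_k - v_k| / range_k."""
    sims = [1.0 - abs(x - y) / r for x, y, r in zip(u, v, ranges)]
    return sum(sims) / len(sims)


# Reference library of three snapshots, two sensors each.
library = [[0, 10], [4, 30], [2, 20]]
ranges = expected_ranges(library)
s = range_similarity([1, 10], [3, 20], ranges)
```

Readings that differ by more than the expected range produce a negative elemental similarity in this sketch; a practical implementation might clip at zero, which the text neither requires nor excludes.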
In yet another similarity operator that can be used in the present invention, the vector similarity of two observation vectors is equal to the inverse of the quantity of one plus the magnitude of the Euclidean distance between the two vectors in n-dimensional space, where n is the number of elements in each observation. In fact, with regard to vector similarity, the similarity of two observation vectors can be equal to a receptive field function h of the Euclidean norm, such as the Gaussian or exponentially localized function, or a linear function. The value of similarity drops off monotonically in all directions in n-space as the Euclidean norm between the two vectors grows, making each training vector in D a receptive field. This form of similarity-based modeling is known as a radial basis function network.
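Both distance-based forms mentioned above can be sketched in a few lines. The function names and the `width` parameter of the Gaussian receptive field are assumptions introduced for illustration.

```python
import math


def euclidean_similarity(u, v):
    """Inverse of one plus the Euclidean distance between two snapshots."""
    return 1.0 / (1.0 + math.dist(u, v))


def gaussian_similarity(u, v, width=1.0):
    """Gaussian receptive field of the Euclidean norm (width is hypothetical)."""
    d = math.dist(u, v)
    return math.exp(-(d * d) / (2.0 * width * width))
```

Both functions return one for identical vectors and fall monotonically toward zero as the vectors move apart, which matches the general behavior described for similarity operators.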
Elemental similarities are calculated for each corresponding pair of elements of the two snapshots being compared. Then, the elemental similarities are combined in some statistical fashion to generate a single similarity scalar value for the vector-to-vector comparison. Preferably, this overall similarity, S, of two snapshots is equal to the average of the number N (the element count) of sc values:
$$S = \frac{1}{N} \sum_{c=1}^{N} s_c$$
Other similarity operators are known or may become known to those skilled in the art, and can be employed in the present invention as described herein. The recitation of the above operators is exemplary and not meant to limit the scope of the claimed invention. The similarity operator is also used in this invention as described below for calculation of similarity values between snapshots of residuals and the diagnostic library of residual snapshots that indicate an incipient failure mode, and it should be understood that the description above of the similarity operation likewise applies to the failure mode signature recognition using residuals.
Turning to
In step 708, the element-to-element similarity operation is performed between the kth element of yin and the (ith, kth) element in D. These elements are corresponding sensor values, one from actual input, and one from an observation in the training history D. The similarity operation returns a measure of similarity of the two values, usually a value between zero (no similarity) and one (identical) which is assigned to the temporary variable r. In step 710, r divided by the number of sensors M is added to the ith value in the one-dimensional array A. Thus, the ith element in A holds the average similarity for the elemental similarities of yin to the ith observation in D. In step 712, counter k is incremented.
In step 714, if all the sensors in a particular observation in D have been compared to corresponding elements of yin, then k will now be greater than M, and i can be incremented in step 716. If not, then the next element in yin is compared for similarity to its corresponding element in D.
When all the elements of the current actual snapshot yin have been compared to all elements of an observation in D, a test is made in step 718 whether this is the last of the observations in D. If so, then counter i is now more than the number of observations N in D, and processing moves to step 720. Otherwise, it moves back to step 706, where the array A is reset to zeroes, and the element (sensor) counter k is reset to one. In step 720, a weight vector Ŵ is computed from the equation shown therein, where ⊗ represents a similarity operation, typically the same similarity operator as is used in step 708. In step 722, Ŵ is normalized using the sum of all the weight elements in Ŵ, which ameliorates the effects in subsequent steps of any particularly large elements in Ŵ, producing normalized weight vector W. In step 724, W is used to produce the estimated output yout using D.
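The estimation flow of steps 708 through 724 can be sketched end to end in plain Python. This is an illustrative sketch under several assumptions: the range-normalized similarity is used as the operator, reference snapshots are stored as rows of `D` rather than columns, the matrix of similarities among reference snapshots is invertible, and the helper names `similarity`, `solve` and `sbm_estimate` are hypothetical.

```python
def similarity(u, v, ranges):
    """Average over sensors of 1 - |u_k - v_k| / range_k (assumed operator)."""
    sims = [1.0 - abs(x - y) / r for x, y, r in zip(u, v, ranges)]
    return sum(sims) / len(sims)


def solve(G, b):
    """Solve G w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(G, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def sbm_estimate(D, y_in, ranges):
    """Similarity-based estimate of y_in from reference snapshots D."""
    # Similarity of the input to each reference snapshot (array A).
    A = [similarity(d, y_in, ranges) for d in D]
    # Similarity of every reference snapshot to every other.
    G = [[similarity(di, dj, ranges) for dj in D] for di in D]
    # Unnormalized weights, then normalization by their sum.
    w_hat = solve(G, A)
    total = sum(w_hat)
    W = [w / total for w in w_hat]
    # Estimate is the weighted composite of the reference snapshots.
    return [sum(W[i] * D[i][k] for i in range(len(D)))
            for k in range(len(y_in))]


D = [[1.0, 10.0], [2.0, 20.0], [3.0, 25.0]]
ranges = [2.0, 15.0]
estimate = sbm_estimate(D, [2.0, 20.0], ranges)
```

When the input coincides with a reference snapshot, the weights collapse onto that snapshot and the estimate reproduces it, so the residual is zero. Solving the linear system directly avoids forming the explicit inverse, which is a numerical convenience rather than a change to the method.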
Another example of a kernel-based non-parametric empirical modeling method that can be used in the present invention to generate estimates of the process or machine being monitored is kernel regression, or kernel smoothing. A kernel regression can be used to generate an estimate based on a current observation in much the same way as the similarity-based model, which can then be used to generate a residual as detailed elsewhere herein. Accordingly, the following Nadaraya-Watson estimator can be used:
$$\hat{y}(\vec{X}, h) = \frac{\sum_{i=1}^{n} y_i \, K_h(\vec{X} - \vec{X}_i)}{\sum_{i=1}^{n} K_h(\vec{X} - \vec{X}_i)}$$
where in this case a single scalar inferred parameter y-hat is estimated as a sum of weighted exemplars yi from training data, where the weight is determined by a kernel K of width h acting on the difference between the current observation X and the exemplar observations Xi corresponding to the yi from training data. The independent variables Xi can be scalars or vectors. Alternatively, the estimate can be a vector, instead of a scalar:
$$\hat{\vec{Y}}(\vec{X}, h) = \frac{\sum_{i=1}^{n} \vec{Y}_i \, K_h(\vec{X} - \vec{X}_i)}{\sum_{i=1}^{n} K_h(\vec{X} - \vec{X}_i)}$$
Here, the scalar kernel multiplies the vector Yi to yield the estimated vector. Put into terms of equation A above:
where matrix YD is the collection of learned output observations Yi and matrix XD is the collection of learned input observations Xi.
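The scalar Nadaraya-Watson estimate can be sketched as follows. The sketch assumes a Gaussian kernel applied to the Euclidean norm of the difference between the current observation and each exemplar; the function names are hypothetical, and the kernel's normalizing constant is omitted because it cancels between numerator and denominator.

```python
import math


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the constant cancels in the ratio."""
    return math.exp(-0.5 * u * u)


def nadaraya_watson(x, X_train, y_train, h):
    """Kernel-weighted average of the exemplar outputs y_train.

    x:       current observation (list of inputs)
    X_train: exemplar input observations
    y_train: exemplar outputs, one per row of X_train
    h:       kernel bandwidth (tuning parameter)
    """
    weights = [gaussian_kernel(math.dist(x, xi) / h) for xi in X_train]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, y_train)) / total


X_train = [[0.0], [1.0], [2.0]]
y_train = [0.0, 10.0, 20.0]
y_hat = nadaraya_watson([1.0], X_train, y_train, h=0.5)
```

A smaller bandwidth makes the estimate follow the nearest exemplars more closely, while a larger one averages over more of the training data.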
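A wide variety of kernels are known in the art and may be used. One well-known kernel, by way of example, is the Epanechnikov kernel:
$$K_h(u) = \frac{3}{4h}\left(1 - \frac{u^2}{h^2}\right) \text{ for } -h \le u \le h, \text{ and } 0 \text{ otherwise}$$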
where h is the bandwidth of the kernel, a tuning parameter, and u can be obtained from the difference between the current observation and the exemplar observations as in Equation 13. Another kernel of the countless kernels that can be used in remote monitoring according to the invention is the common Gaussian kernel (like the Gaussian kernel of the abovementioned radial basis function):
$$K_h(u) = \frac{1}{h\sqrt{2\pi}} \, e^{-u^2 / (2h^2)}$$
Examples of various preprocessed data that can be used for diagnostics as a consequence of monitoring the process or machine as described in detail herein are shown in connection with
One decision technique that can be used according to the present invention to determine whether or not to alert on a given sensor estimate is to employ thresholds for the residual for that sensor. Thresholds as used in the prior art are typically used on the gross value of a sensor, and therefore must be set sufficiently wide or high to avoid alerting as the measured parameter moves through its normal dynamic range. A residual threshold is vastly more sensitive and accurate, and is made possible by the use of the sensor value estimate. Since the residual is the difference between the actual observed sensor value and the estimate of that value based on the values of other sensors in the system (using an empirical model like the similarity engine described herein), the residual threshold is set around the expected zero-mean residual, and at a level potentially significantly narrower than the dynamic range of the parameter measured by that sensor. According to the invention, residual thresholds can be set separately for each sensor. The residual thresholds can be determined and fixed prior to entering real-time monitoring mode. A typical residual threshold can be set as a multiple of the empirically determined variance or standard deviation of the residual itself. For example, the threshold for a given residual signal can be set at two times the standard deviation determined for the residual over a window of residual data generated for normal operation. Alternatively, the threshold can be determined “on-the-fly” for each residual, based on a multiplier of the variance or standard deviation determined from a moving window of a selected number of prior samples. Thus, the threshold applied instantly to a given residual can be two times the standard deviation determined from the past hundred residual data values.
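The fixed and moving-window residual thresholds described above can be sketched as shown here. The function names are hypothetical, the population standard deviation is used (the text does not say which estimator is intended), and suppressing alerts until at least two prior samples exist is an assumption.

```python
import statistics


def fixed_threshold_alerts(residuals, normal_residuals, multiplier=2.0):
    """Alert where |residual| exceeds a multiple of the normal-operation sigma."""
    limit = multiplier * statistics.pstdev(normal_residuals)
    return [abs(r) > limit for r in residuals]


def moving_threshold_alerts(residuals, window=100, multiplier=2.0):
    """Alert using a threshold recomputed from the preceding window of samples."""
    alerts = []
    for t, r in enumerate(residuals):
        history = residuals[max(0, t - window):t]
        if len(history) < 2:
            alerts.append(False)  # assumption: too little history to decide
            continue
        limit = multiplier * statistics.pstdev(history)
        alerts.append(abs(r) > limit)
    return alerts


normal = [0.1, -0.1, 0.1, -0.1]
fixed = fixed_threshold_alerts([0.05, -0.3, 0.19], normal)
moving = moving_threshold_alerts([0.1, -0.1, 0.1, -0.1, 1.0], window=4)
```

Because the threshold is centered on a zero-mean residual rather than on the raw sensor value, it can be far narrower than the parameter's normal dynamic range.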
Another decision technique that can be employed to determine whether or not to alert on a given sensor estimate is called a sequential probability ratio test (SPRT), and is described in the aforementioned U.S. Pat. No. 5,764,509 to Gross et al. It is also known in the art, from the theory of Wald and Wolfowitz, “Optimum Character of the Sequential Probability Ratio Test”, Ann. Math. Stat. 19, 326 (1948). Broadly, for a sequence of estimates for a particular sensor, the test is capable of deciding with preselected missed and false alarm rates whether the estimates and actuals are statistically the same or different, that is, belong to the same or to two different probability distributions.
The basic approach of the SPRT technique is to analyze successive observations of a sampled parameter. A sequence of sampled differences between the estimate and the actual for a monitored parameter should be distributed according to some kind of distribution function around a mean of zero. Typically, this will be a Gaussian distribution, but it may be a different distribution, as for example a binomial distribution for a parameter that takes on only two discrete values (this can be common in telecommunications and networking machines and processes). Then, with each observation, a test statistic is calculated and compared to one or more decision limits or thresholds. The SPRT test statistic generally is the likelihood ratio $l_n$, which is the ratio of the probability that a hypothesis H1 is true to the probability that a hypothesis H0 is true:

$$l_n = \frac{L(y_1, y_2, \ldots, y_n \mid H_1)}{L(y_1, y_2, \ldots, y_n \mid H_0)} \qquad (5)$$

where $y_n$ are the individual observations and $H_n$ are the probability distributions for those hypotheses. This general SPRT test ratio can be compared to a decision threshold to reach a decision with any observation. For example, if the outcome is greater than 0.80, then decide H1 is the case, if less than 0.20 then decide H0 is the case, and if in between then make no decision.
The SPRT test can be applied to various statistical measures of the respective distributions. Thus, for a Gaussian distribution, a first SPRT test can be applied to the mean and a second SPRT test can be applied to the variance. For example, there can be a positive mean test and a negative mean test for data such as residuals that should distribute around zero. The positive mean test involves the ratio of the likelihood that a sequence of values belongs to a distribution H0 around zero, versus belonging to a distribution H1 around a positive value, typically one standard deviation above zero. The negative mean test is similar, except H1 is around zero minus one standard deviation. Furthermore, the variance SPRT test can determine whether the sequence of values belongs to a first distribution H0 having a known variance, or a second distribution H2 having a variance equal to a multiple of the known variance.
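For residuals derived from known normal operation, the mean is zero, and the variance can be determined. Then in run-time monitoring mode, for the mean SPRT test, the likelihood that H0 is true (mean is zero and variance is $\sigma^2$) is given by:

$$L(y_1, y_2, \ldots, y_n \mid H_0) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{k=1}^{n} y_k^2\right] \qquad (6)$$

and similarly, for H1, where the mean is $M$ (typically one standard deviation below or above zero, using the variance determined for the residuals from normal operation) and the variance is again $\sigma^2$ (variance is assumed the same):

$$L(y_1, y_2, \ldots, y_n \mid H_1) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{k=1}^{n} y_k^2 - 2M\sum_{k=1}^{n} y_k + nM^2\right)\right] \qquad (7)$$

The ratio $l_n$ from Equations 6 and 7 then becomes:

$$l_n = \exp\left[-\frac{1}{2\sigma^2}\sum_{k=1}^{n} M\,(M - 2y_k)\right] \qquad (8)$$

A SPRT statistic can be defined for the mean test to be the exponent in Equation 8:

$$\mathrm{SPRT}_{mean} = -\frac{1}{2\sigma^2}\sum_{k=1}^{n} M\,(M - 2y_k) \qquad (9)$$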
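The SPRT test is advantageous because a user-selectable false alarm probability $\alpha$ and a missed alarm probability $\beta$ can provide thresholds against which $\mathrm{SPRT}_{mean}$ can be tested to produce a decision:

1. If $\mathrm{SPRT}_{mean} \le \ln\left(\beta/(1-\alpha)\right)$, then accept hypothesis H0 as true;
2. If $\mathrm{SPRT}_{mean} \ge \ln\left((1-\beta)/\alpha\right)$, then accept hypothesis H1 as true; and
3. If $\ln\left(\beta/(1-\alpha)\right) < \mathrm{SPRT}_{mean} < \ln\left((1-\beta)/\alpha\right)$, then make no decision and continue sampling.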
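For the variance SPRT test, hypothesis H2 is that the residuals follow a Gaussian distribution with a mean of zero and a variance of $V\sigma^2$, where $V$ is the chosen multiple of the known variance $\sigma^2$. For $n$ residual observations $y_k$, the likelihood that H2 is true is:

$$L(y_1, y_2, \ldots, y_n \mid H_2) = \frac{1}{(2\pi V\sigma^2)^{n/2}} \exp\left[-\frac{1}{2V\sigma^2}\sum_{k=1}^{n} y_k^2\right] \qquad (10)$$

The ratio $l_n$ is then provided for the variance SPRT test as the ratio of Equation 10 over Equation 6, to provide:

$$l_n = V^{-n/2} \exp\left[-\frac{1}{2\sigma^2}\,\frac{1-V}{V}\sum_{k=1}^{n} y_k^2\right] \qquad (11)$$

and the SPRT statistic for the variance test is then:

$$\mathrm{SPRT}_{variance} = \frac{1}{2\sigma^2}\,\frac{V-1}{V}\sum_{k=1}^{n} y_k^2 - \frac{n}{2}\ln V \qquad (12)$$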
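Thereafter, the above tests (1) through (3) can be applied as above:

1. If $\mathrm{SPRT}_{variance} \le \ln\left(\beta/(1-\alpha)\right)$, then accept hypothesis H0 as true;
2. If $\mathrm{SPRT}_{variance} \ge \ln\left((1-\beta)/\alpha\right)$, then accept hypothesis H2 as true; and
3. If $\ln\left(\beta/(1-\alpha)\right) < \mathrm{SPRT}_{variance} < \ln\left((1-\beta)/\alpha\right)$, then make no decision and continue sampling.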
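The mean test described above can be sketched in a few lines for Gaussian residuals. This is an illustrative sketch only. It assumes the standard Wald decision limits ln(β/(1−α)) and ln((1−β)/α), and it assumes the running statistic is reset to zero after each decision, which is a common practice but is not stated in the text. The function names are hypothetical.

```python
import math


def sprt_limits(alpha, beta):
    """Wald decision limits for false alarm rate alpha and missed alarm rate beta."""
    lower = math.log(beta / (1.0 - alpha))   # at or below: accept H0
    upper = math.log((1.0 - beta) / alpha)   # at or above: accept H1
    return lower, upper


def sprt_mean_test(residuals, sigma2, M, alpha=0.01, beta=0.01):
    """Run the Gaussian mean SPRT over a residual sequence.

    Each observation y adds -M * (M - 2*y) / (2 * sigma2) to the running
    log-likelihood ratio. Returns one entry per observation: "H0", "H1",
    or None when no decision is reached yet.
    """
    lower, upper = sprt_limits(alpha, beta)
    stat = 0.0
    decisions = []
    for y in residuals:
        stat += -M * (M - 2.0 * y) / (2.0 * sigma2)
        if stat <= lower:
            decisions.append("H0")
            stat = 0.0  # assumption: restart the test after a decision
        elif stat >= upper:
            decisions.append("H1")
            stat = 0.0  # assumption: restart the test after a decision
        else:
            decisions.append(None)
    return decisions
```

A negative mean test follows by passing a negative `M`. A variance test would use the same decision limits with a different per-observation increment.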
In yet another form of preprocessed output from model estimation that can be used as input to the failure mode signature recognition module 120 of
For example, in a system to be monitored having 12 instrumented variables for modeling, residuals may be generated from the difference of the estimates and the raw signals for several of the 12 variables, and one or more of these may be quantized. Quantization may be based on multiples of the standard deviation in a window of the residual data for a given variable. For example, a window of 1000 samples can provide the standard deviation for that residual (which may thereafter be used as a fixed number). Residuals of less than one standard deviation can then be assigned a quantized value of zero, residuals between one and three standard deviations can be assigned a quantized value of one (or negative one for negative residuals), and residuals above three standard deviations can be assigned a quantized value of two. Quantization can also be based on multi-observation persistence, such that the quantization level assigned to the current residual observation is based on the median of the window of the last three residual observations, to obviate issues of extreme spiking.
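The quantization scheme just described can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and assigning negative two to large negative residuals is an assumption made by symmetry with the stated negative one.

```python
from statistics import median


def quantize(residual, sigma):
    """Map a residual to a level of 0, 1 or 2 by multiples of sigma, keeping the sign."""
    magnitude = abs(residual)
    if magnitude < sigma:
        level = 0
    elif magnitude <= 3 * sigma:
        level = 1
    else:
        level = 2
    # Assumption: negative residuals mirror the positive levels (-1, -2).
    return level if residual >= 0 else -level


def quantize_with_persistence(residuals, sigma, window=3):
    """Quantize the median of the last `window` residuals to suppress isolated spikes."""
    levels = []
    for i in range(len(residuals)):
        recent = residuals[max(0, i - window + 1): i + 1]
        levels.append(quantize(median(recent), sigma))
    return levels
```

In the persistence variant, a single spike surrounded by small residuals never becomes the median of its window, so it does not change the quantized level.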
Turning now to the diagnostic function coupled to the model-based monitoring system, depicted in
In the generalized model of
Turning to
In step 1031, the captured data is processed to isolate precursor data for each failure mode. Failure modes are selected by the user of the invention, and are logical groupings of the specific findings from autopsies of each machine failure. The logical groupings of autopsied results into “modes” of failure should be sensible, and should comport with the likelihood that the precursor data leading to that failure mode will be the same or similar each time. However, beyond this requirement, the user is free to group them as seen fit. Thus, for example, a manufacturer of an electric motor may choose to run 50 motors to failure, and upon autopsy, group the results into three major failure modes, related to stator problems, mechanical rotating pieces, and insulation winding breakdown. If these account for a substantial majority of the failure modes of the motor, the manufacturer may choose not to recognize other failure modes, and will accept SPRT or residual threshold alerts from monitoring with no accompanying failure mode recognition as essentially a recognition of some uncommon failure.
According to another method of the invention, commonly available analysis methods known to those in the art may be used to self-organize the precursor data for each instance of failure into logical groupings according to how similar the precursor data streams are. For example, if the user divines a distinct autopsy result for each of 50 failed motors, but analysis of the alerts shows that 45 of the failures clearly have one of three distinct alert patterns leading to failure (for example 12 failures in one pattern, 19 in another pattern and 14 in the third pattern, with the remaining 5 of the 50 belonging to and defining no recognized pattern), the three distinct patterns may be treated as failure modes. The user then must decide in what way the autopsy results match the failure modes, and what investigative and resolution actions can be suggested for the groups based thereon. These suggested actions are stored with the failure mode signature information.
For determining precursor diagnostic data in step 1031, the normal data of 1020 should be trained and distilled down to a reference library and used offline to generate estimates, residuals and alerts in response to input of the precursor data streams.
Finally, in step 1042, the diagnostic precursor signatures, the user input regarding failure mode groupings of those signatures and suggested actions, and the empirical model reference library (if an empirical model will be used) are loaded into the onboard memory store of a computing device accompanying each machine of the production run. Thus, a machine can be provided that may have a display of self-diagnostic results using the experience and empirical data of the autopsied failed machines.
Turning to
In all cases of populating a failure mode database, the user designates the existence, type, and time stamp of a failure. The designation that a process or machine has failed is subject to the criteria of the user in any case. A failure may be deemed to have occurred at a first time for a user having stringent performance requirements, and may be deemed to have occurred at a later second time for a user willing to expend the machine or process machinery. Alternatively, the designation of a failure may also be accomplished using an automated system. For example, a gross threshold applied to the actual sensor signal as is known in the art, may be used to designate the time of a failure. The alerts of the present invention can also be thresholded or compared to some baseline in order to determine a failure. Thus, according to the invention, the failure time stamp is provided by the user, or by a separate automatic system monitoring a parameter against a failure threshold.
Three general possibilities may be provided for failure mode signature analysis, namely residual (raw or quantized) snapshot similarity, actual (raw or quantized) snapshot similarity, and alert pattern correlation. The residual snapshot similarity discussed herein provides for a library of prior residual snapshots, i.e., the difference signals obtained preceding identified failure modes, which may be compared using the above-described similarity engine and Equation 4 with a current residual snapshot to determine the development of a known failure mode. Using residual diagnosis, the residual snapshots are identified and stored as precursors to known failure modes. Various criteria may be employed for selecting snapshots representative of the failure mode residuals for use in the library, and for determining the defining characteristics of the failure modes.
The actual snapshot similarity used for diagnosis is performed in a manner identical with the residual snapshot similarity. Instead of using residual snapshots, actual snapshots are used as precursor data. Then actual snapshots are compared to the failure mode database of precursor actuals and similarities between them indicate incipient failure modes, as described in further detail below.
The alert module output will represent decisions for each monitored sensor input, as to whether the estimate for it is different or the same. These can in turn be used for diagnosis of the state of the process or equipment being monitored. The occurrence of some difference decisions (alerts on a sensor) in conjunction with other sameness decisions (no alerts on a sensor) can be used as an indicator of likely machine or process states. A diagnostic lookup database can be indexed into by means of the alert decisions to diagnose the condition of the process or equipment being monitored with the inventive system. By way of example, if a machine is monitored with seven sensors, and based on previous autopsy experience, a particular failure mode is evidenced by alerts appearing at first on sensors #1 and #3, compounded after some generally bounded time by alerts appearing on sensor #4 additionally, then the occurrence of this pattern can be matched to the stored pattern and the failure mode identified. One means for matching the failure modes according to developing sensor alert patterns such as these is the use of Bayesian Belief Networks, which are known to those skilled in the art for use in quantifying the propagation of probabilities through a certain chain of events. However, simpler than that, the matching can be done merely by examining how many alerting sensors correspond to sensor alerts in the database, and outputting the best matches as identified failure mode possibilities. According to yet another method for matching the alert pattern to stored alert patterns, the alerts can be treated as a two-dimensional array of pixels, and the pattern analyzed for likeness to stored patterns using character recognition techniques known in the art.
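The simpler matching approach mentioned above, counting how many alerting sensors correspond to stored sensor alerts and reporting the best matches, might look like the following sketch. The database layout (a mapping from failure mode name to a set of sensor numbers) and the failure mode names are hypothetical, chosen only for illustration.

```python
def rank_failure_modes(alerting_sensors, database, top_n=3):
    """Rank stored failure modes by how many currently alerting sensors they share.

    `database` maps a failure mode name to the set of sensors that alert for it.
    Modes with no overlap are omitted; ties are broken alphabetically by name.
    """
    alerting = set(alerting_sensors)
    scored = []
    for name, pattern in database.items():
        overlap = len(alerting & set(pattern))
        if overlap > 0:
            scored.append((name, overlap))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Hypothetical lookup database for a machine monitored with seven sensors.
example_database = {
    "bearing_wear": {1, 3, 4},
    "seal_leak": {2, 5},
    "winding_short": {3, 6, 7},
}
best = rank_failure_modes({1, 3}, example_database)
```

This sketch ignores the order in which alerts appear. Capturing the "sensors #1 and #3 first, then sensor #4" progression would require storing time-ordered patterns, for which a technique such as the Bayesian Belief Networks mentioned above may be better suited.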
Turning to
According to another method of determining the length of range 1224, the location in
The range 1224 of residual or actual snapshots, each snapshot comprising a residual value or actual value for each sensor, is then distilled to a representative set for the identified failure mode. This distillation process is essentially the same as the training method described in
This precursor data is processed to provide representative data and the associated failure mode, appropriate to the inventive technique chosen from the three prior mentioned techniques for diagnosing failures. This data is added to any existing data on the failure mode, and the system is set back into monitoring mode. Now, the system has more intelligence on precursor data leading up to the particular failure mode.
As with commodity machines, the failure mode granularity is entirely user-selectable. The failure modes can be strictly user defined, where the user must do the autopsy and determine cause. The user must furthermore supply a name and/or ID for the failure mode. The software product of the invention preferably provides an empty data structure for storing:
Turning to
The failure mode similarity engine 1324 of
In order to determine one or more failure modes to indicate as output of the diagnostic system of the present invention when employing residual similarity or actual signal similarity, one way of selecting such identified or likely failure mode(s) is shown with respect to
Other methods of statistically combining the similarities across the set of all stored residual or actual snapshots in the signature library for a given failure mode may be used to get the “average”, such as using only the middle 2 quartiles and averaging them (thus throwing away extreme matches and extreme mismatches); or only using the top quartile; and so on. Regardless of the test used to determine the one or more indicated “winning” failure modes in each snapshot, “bins” accumulate “votes” for indicated failure modes for each current snapshot, accumulating over a moving window of dozens to hundreds of snapshots, as appropriate. A threshold may also be used such that the failure mode “latches” and gets indicated to the human operator as an exception condition.
Alternatively, it is possible to not use any such threshold, but to simply indicate for the moving window which failure mode has the highest count of being designated the indicated failure mode snapshot over snapshot. Another useful output of the system that may be displayed to the user is to indicate the counts for each failure mode, and let the user determine from this information when a particular failure mode seems to be dominating. Under normal operation, it is likely all the failure modes will have approximately equal counts over the window, with some amount of noise. But as a failure mode is properly recognized, the count for that failure mode should rise, and for the other failure modes drop, providing a metric for the user to gauge how likely each failure mode is compared to the others.
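The similarity averaging and vote accumulation described in the two preceding paragraphs can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the quartile boundaries use simple integer division, and a latch threshold of `None` stands for the variant in which no threshold is used and the highest count is always reported.

```python
from collections import Counter, deque


def middle_quartile_mean(similarities):
    """Average the middle two quartiles, discarding extreme matches and mismatches."""
    ordered = sorted(similarities)
    n = len(ordered)
    middle = ordered[n // 4: n - n // 4]
    return sum(middle) / len(middle)


def winning_mode(similarities_by_mode):
    """Pick the failure mode whose stored snapshots are, on average, most similar."""
    return max(similarities_by_mode,
               key=lambda mode: middle_quartile_mean(similarities_by_mode[mode]))


def accumulate_votes(winners, window=100, latch_threshold=None):
    """Accumulate per-snapshot winners over a moving window.

    Returns, per snapshot, the mode with the highest count in the window, or
    None when a latch threshold is set and that count has not yet reached it.
    """
    recent = deque(maxlen=window)
    indicated = []
    for winner in winners:
        recent.append(winner)
        mode, count = Counter(recent).most_common(1)[0]
        if latch_threshold is None or count >= latch_threshold:
            indicated.append(mode)
        else:
            indicated.append(None)
    return indicated
```

The per-mode counts held in `Counter(recent)` are the same values that could be displayed to the user to show which failure mode is beginning to dominate.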
Turning to
The pattern match for any of the above alert patterns can be selected from a number of techniques. For example, a complete match may be required, such that a match is not indicated unless each and every alert in the stored pattern is also found in the instant pattern, and no extraneous alerts are found in the instant pattern. Alternatively, a substantial match can be employed, such that at least, say, 75% of the sensors showing alerts in the stored pattern are also found alerting in the instant pattern, and no more than 10% of the instant alerts are not found in the stored pattern. The exact thresholds for matching and extraneous alerts can be set globally, or can be set for each stored pattern, such that one failure mode may tolerate just 65% matching and no more than 10% extraneous alerts, while a second failure mode may be indicated when at least 80% of the stored alerts are matched, and no more than 5% of the alerts in the instant pattern are extraneous to the stored pattern. These limits may be set empirically, as is necessary to sufficiently differentiate the failure modes that are desirably recognized, and with sufficient forewarning to provide benefit.
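The complete and substantial matching rules above can be sketched as a single function with per-pattern limits. This is a minimal illustration: patterns are represented simply as sets of alerting sensor numbers, and the function names, signature layout and failure mode names are hypothetical.

```python
def pattern_matches(stored, instant, min_match=0.75, max_extraneous=0.10):
    """Test an instant alert pattern against a stored one.

    `min_match` is the minimum fraction of stored alerts that must appear in
    the instant pattern; `max_extraneous` is the maximum fraction of instant
    alerts allowed to be absent from the stored pattern. A complete match
    corresponds to min_match=1.0 and max_extraneous=0.0.
    """
    stored, instant = set(stored), set(instant)
    if not stored or not instant:
        return False
    matched_fraction = len(stored & instant) / len(stored)
    extraneous_fraction = len(instant - stored) / len(instant)
    return matched_fraction >= min_match and extraneous_fraction <= max_extraneous


def indicated_failure_modes(signatures, instant):
    """Return every failure mode whose stored pattern matches under its own limits."""
    return [name
            for name, (stored, min_match, max_extraneous) in signatures.items()
            if pattern_matches(stored, instant, min_match, max_extraneous)]


# Hypothetical signatures with per-pattern limits, as in the text's example.
example_signatures = {
    "mode_a": ({1, 3, 4, 6}, 0.65, 0.10),
    "mode_b": ({1, 2, 3, 4, 5}, 0.80, 0.05),
}
```

Because `indicated_failure_modes` returns a list, more than one potential failure mode can be reported when several stored patterns match.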
According to the invention, it is also permissible to indicate more than one potential failure mode, if pattern matching has these results. Techniques are known in the art for matching patterns and providing probabilities of the likelihood of the match, and any and all of these may be employed within the scope of the present invention.
Generally, the failure mode data store can be in any conventional memory device, such as a hard disk drive, nonvolatile or volatile memory, or on-chip memory. The data store for the empirical modeling data that is used to generate the estimates of parameters in response to actual parameter values can be separate from or the same as the data store which contains failure mode signature information. Further, failure mode action suggestions can also be stored either together with or separately from the other aforementioned data. Such may be the case where the present invention comprises combining a failure mode signature recognition system with an existing maintenance operations resource planning system that automatically generates maintenance requests and schedules them. The computational programs for performing similarity-based residual or actual sensor snapshot failure mode signature recognition; alert pattern-based failure mode signature recognition; process modeling and sensor value estimation; residual generation from actual and estimated values; and alert testing can be carried out on one processor, or distributed as separate tasks across multiple processors that are in synchronous or asynchronous communications with one another. In this way, it is entirely within the inventive scope for the diagnostic system of the present invention to be carried out using a single microprocessor on-board a monitored machine, or using a number of separately located computers communicating over the internet and possibly remotely located from the monitored process or machine. The computational program that comprises the similarity engine that generates estimates in response to live data can also be the same programmed similarity engine that generates similarity scores for use in matching a residual snapshot or actual snapshot to stored snapshots associated with failure modes.
It will be appreciated by those skilled in the art, that modifications to the foregoing preferred embodiments may be made in various aspects. Other variations clearly would also work, and are within the scope and spirit of the invention. The present invention is set forth with particularity in the appended claims. It is deemed that the spirit and scope of that invention encompasses such modifications and alterations to the preferred embodiment as would be apparent to one of ordinary skill in the art and familiar with the teachings of the present application.
This application is a continuation-in-part of application Ser. No. 10/277,307 filed 22 Oct. 2002, now abandoned; which is a continuation-in-part of application Ser. No. 09/832,166 filed 10 Apr. 2001, now abandoned.
Number | Name | Date | Kind |
---|---|---|---|
3045221 | Roop | Jul 1962 | A |
4060716 | Pekrul et al. | Nov 1977 | A |
4336595 | Adams et al. | Jun 1982 | A |
4402054 | Osborne et al. | Aug 1983 | A |
RE31750 | Morrow | Nov 1984 | E |
4480480 | Scott et al. | Nov 1984 | A |
4639882 | Keats | Jan 1987 | A |
4707796 | Calabro et al. | Nov 1987 | A |
4761748 | Le Rat et al. | Aug 1988 | A |
4796205 | Ishii et al. | Jan 1989 | A |
4823290 | Fasack et al. | Apr 1989 | A |
4937763 | Mott | Jun 1990 | A |
4965513 | Haynes et al. | Oct 1990 | A |
4978909 | Hendrix et al. | Dec 1990 | A |
4985857 | Bajpai et al. | Jan 1991 | A |
5003950 | Kato et al. | Apr 1991 | A |
5005142 | Lipchak et al. | Apr 1991 | A |
5025499 | Inoue et al. | Jun 1991 | A |
5052630 | Hinsey et al. | Oct 1991 | A |
5093792 | Taki et al. | Mar 1992 | A |
5113483 | Keeler et al. | May 1992 | A |
5119287 | Nakamura et al. | Jun 1992 | A |
5123017 | Simpkins et al. | Jun 1992 | A |
5195046 | Gerardi et al. | Mar 1993 | A |
5210704 | Husseiny | May 1993 | A |
5223207 | Gross et al. | Jun 1993 | A |
5251285 | Inoue et al. | Oct 1993 | A |
5285494 | Sprecher et al. | Feb 1994 | A |
5309351 | McCain et al. | May 1994 | A |
5311562 | Palusamy et al. | May 1994 | A |
5325304 | Aoki | Jun 1994 | A |
5327349 | Hoste | Jul 1994 | A |
5386373 | Keeler et al. | Jan 1995 | A |
5414632 | Mochizuki et al. | May 1995 | A |
5420571 | Coleman et al. | May 1995 | A |
5421204 | Svaty, Jr. | Jun 1995 | A |
5445347 | Ng | Aug 1995 | A |
5446671 | Weaver et al. | Aug 1995 | A |
5446672 | Boldys | Aug 1995 | A |
5455777 | Fujiyama et al. | Oct 1995 | A |
5459675 | Gross et al. | Oct 1995 | A |
5463768 | Cuddihy et al. | Oct 1995 | A |
5463769 | Cuddihy et al. | Oct 1995 | A |
5465321 | Smyth | Nov 1995 | A |
5481647 | Brody et al. | Jan 1996 | A |
5486997 | Reismiller et al. | Jan 1996 | A |
5496450 | Blumenthal et al. | Mar 1996 | A |
5500940 | Skeie | Mar 1996 | A |
5502543 | Aboujaoude | Mar 1996 | A |
5539638 | Keeler et al. | Jul 1996 | A |
5548528 | Keeler et al. | Aug 1996 | A |
5553239 | Heath et al. | Sep 1996 | A |
5559710 | Shahraray et al. | Sep 1996 | A |
5566092 | Wang et al. | Oct 1996 | A |
5579232 | Tong et al. | Nov 1996 | A |
5586066 | White et al. | Dec 1996 | A |
5596507 | Jones et al. | Jan 1997 | A |
5600726 | Morgan et al. | Feb 1997 | A |
5602733 | Rogers et al. | Feb 1997 | A |
5608845 | Ohtsuka et al. | Mar 1997 | A |
5612886 | Weng | Mar 1997 | A |
5617342 | Elazouni | Apr 1997 | A |
5623109 | Uchida et al. | Apr 1997 | A |
5629878 | Kobrosly | May 1997 | A |
5638413 | Uematsu et al. | Jun 1997 | A |
5657245 | Hecht et al. | Aug 1997 | A |
5663894 | Seth et al. | Sep 1997 | A |
5668944 | Berry | Sep 1997 | A |
5671635 | Nadeau et al. | Sep 1997 | A |
5680409 | Qin et al. | Oct 1997 | A |
5680541 | Kurosu et al. | Oct 1997 | A |
5682317 | Keeler et al. | Oct 1997 | A |
5708780 | Levergood et al. | Jan 1998 | A |
5710723 | Hoth et al. | Jan 1998 | A |
5727144 | Brady et al. | Mar 1998 | A |
5737228 | Ishizuka et al. | Apr 1998 | A |
5748469 | Pyotsia | May 1998 | A |
5748496 | Takahashi et al. | May 1998 | A |
5751580 | Chi | May 1998 | A |
5754451 | Williams | May 1998 | A |
5754965 | Hagenbuch | May 1998 | A |
5761090 | Gross et al. | Jun 1998 | A |
5764509 | Gross et al. | Jun 1998 | A |
5774379 | Gross et al. | Jun 1998 | A |
5787138 | Ocieczek et al. | Jul 1998 | A |
5817958 | Uchida et al. | Oct 1998 | A |
5818716 | Chin et al. | Oct 1998 | A |
5822212 | Tanaka et al. | Oct 1998 | A |
5841677 | Yang et al. | Nov 1998 | A |
5842157 | Wehhofer et al. | Nov 1998 | A |
5864773 | Barna et al. | Jan 1999 | A |
5895177 | Iwai et al. | Apr 1999 | A |
5905989 | Biggs | May 1999 | A |
5909368 | Nixon et al. | Jun 1999 | A |
5913911 | Beck et al. | Jun 1999 | A |
5930156 | Kennedy | Jul 1999 | A |
5930779 | Knoblock et al. | Jul 1999 | A |
5933352 | Salut | Aug 1999 | A |
5933818 | Kasravi et al. | Aug 1999 | A |
5940298 | Pan et al. | Aug 1999 | A |
5946661 | Rothschild et al. | Aug 1999 | A |
5946662 | Ettl et al. | Aug 1999 | A |
5950147 | Sarangapani et al. | Sep 1999 | A |
5956487 | Venkatraman et al. | Sep 1999 | A |
5961560 | Kemner | Oct 1999 | A |
5987399 | Wegerich et al. | Nov 1999 | A |
5993041 | Toba | Nov 1999 | A |
5995916 | Nixon et al. | Nov 1999 | A |
6006192 | Cheng et al. | Dec 1999 | A |
6006260 | Barrick, Jr. et al. | Dec 1999 | A |
6014598 | Duyar et al. | Jan 2000 | A |
6021396 | Ramaswamy et al. | Feb 2000 | A |
6029097 | Branicky et al. | Feb 2000 | A |
6049741 | Kawamura | Apr 2000 | A |
6049827 | Sugauchi et al. | Apr 2000 | A |
6088626 | Lilly et al. | Jul 2000 | A |
6104965 | Lim et al. | Aug 2000 | A |
6110214 | Klimasauskas | Aug 2000 | A |
6115653 | Berstrom et al. | Sep 2000 | A |
6125351 | Kauffman | Sep 2000 | A |
6128540 | Van Der Vegt et al. | Oct 2000 | A |
6128543 | Hitchner | Oct 2000 | A |
6141647 | Meijer et al. | Oct 2000 | A |
6144893 | Van Der Vegt et al. | Nov 2000 | A |
6278962 | Klimasauskas et al. | Aug 2001 | B1 |
6393373 | Duyar et al. | May 2002 | B1 |
6480810 | Cardella et al. | Nov 2002 | B1 |
6519552 | Sampath et al. | Feb 2003 | B1 |
6526356 | DiMaggio et al. | Feb 2003 | B1 |
6532426 | Hooks et al. | Mar 2003 | B1 |
6556939 | Wegerich | Apr 2003 | B1 |
6590362 | Parlos et al. | Jul 2003 | B2 |
6609036 | Bickford | Aug 2003 | B1 |
6625569 | James et al. | Sep 2003 | B2 |
6687654 | Smith et al. | Feb 2004 | B2 |
6853920 | Hsiung et al. | Feb 2005 | B2 |
6898554 | Jaw et al. | May 2005 | B2 |
6975962 | Wegerich et al. | Dec 2005 | B2 |
7027953 | Klein | Apr 2006 | B2 |
7050875 | Cribbs et al. | May 2006 | B2 |
20020055826 | Wegerich et al. | May 2002 | A1 |
20020152056 | Herzog et al. | Oct 2002 | A1 |
20060036403 | Wegerich et al. | Feb 2006 | A1 |
Number | Date | Country |
---|---|---|
06-278179 | Oct 1994 | JP |
08-220279 | Aug 1996 | JP |
WO0067412 | Nov 2000 | WO |
WO0167262 | Sep 2001 | WO |

Number | Date | Country |
---|---|---|
20040078171 A1 | Apr 2004 | US |

Relation | Number | Date | Country |
---|---|---|---|
Parent | 10277307 | Oct 2002 | US |
Child | 10681888 | | US |
Parent | 09832166 | Apr 2001 | US |
Child | 10277307 | | US |