Method for estimating residual life of industrial equipment

Abstract
In a method for estimating the residual life of machinery statistically by calculating conditional expectations, it is disclosed that different statistical approaches are required for defining the behavior of failure processes related to repairable and non-repairable systems and that diagnostic information should be combined with event information to significantly improve the accuracy of predictions.
Description


1 FIELD OF THE INVENTION

[0001] This invention relates to industrial equipment on which diagnostic inspections are performed such as rotating machinery and more particularly to a method for estimating the residual life, μ(x) or μ(t) of such equipment.



2 THE PRIOR ART

[0002] “I know my machine is not in good condition, but when can I expect it to fail?” This question on the residual life of equipment is asked every day in manufacturing industries worldwide but the method currently used in practice provides surprisingly few accurate answers to the question. Thus, accurate information about the residual life of machines is extremely useful in practice since it could prevent future, unexpected failures of the machines.


[0003] Residual life is defined as the difference between the conditional expectation of an event and the current operating time, within statistical bounds. This is determined in practice by observing several interarrival times of a machine, Xi, and calculating a parametric function for its Force of Mortality (FOM) [see Ascher, H E and Feingold, H (1984), Repairable systems reliability: Modeling, inference, misconceptions and their causes, Chapters 1, 2 and 3, Marcel Dekker (Ascher and Feingold)], h(x) on the interarrival times from which the conditional expectation is calculated. This method is simple but often yields broad, inaccurate results that cannot be used with authority. Even though this approach is theoretically correct, it only applies to special cases and cannot be used generally to analyze failure time data.


[0004] Different machines have different failure behaviors and the method outlined above does not apply for both different machines and different failure behaviors simultaneously. A failure of a machine occurs when an instant of unsatisfactory performance arises on that machine because of a failure event such as for example destructive breakdown of the machine. A destructive breakdown of the machine is not necessarily required to record a failure event on a machine as what is an instant of unsatisfactory performance on the machine depends on the definition of unsatisfactory performance for every situation. For example, a failure on an electrical motor could be recorded whenever it causes a production interruption, or it causes a certain reduction in production quality or it exceeds a particular diagnostic measurement.


[0005] In some cases, machines or components, such as for example, bearings, seals and gears, are run until failure and then replaced by completely new units while the old unit is discarded. These machines or components that are replaced when by new units the first time they fail are referred to as non-repairable systems. Modeling of non-repairable systems involve characterizing its FOM.


[0006] Larger machines, such as for example, electrical motors, gearboxes and pumps, are usually repaired after failure to the same state as just prior to failure and reintroduced into operation. These larger machines that are repaired after a failure are referred to as repairable systems. As is described by Ascher and Feingold the modeling of repairable systems involve characterizing its Rate of Occurrence of Failure (ROCOF) which is the time derivative of the expected number of failures up to a certain instant.


[0007] From a statistical viewpoint it is fundamentally incorrect to treat data from non-repairable and repairable systems in the same manner. In most cases, non-repairable systems produce data where the interarrival times are independent and identically distributed, i.e. Xi's are random. Repairable systems mostly generate data where the interarrival times have a tendency to increase or decrease with time. In practice the differences in the types of data from non-repairable and repairable systems are frequently totally ignored.


[0008] Another important reason for the inaccurate results of the method above is because no concomitant information about the length of interarrival times (Xii's) is available. For example, if several events of a bearing is observed for a system of 100±20 days but one event of only 30 days, there is probably contamination in the data set which could have been explained by concomitant information such as condition monitoring. Including an observation such as the one of only 100 days, often severely influences the results.


[0009] Lastly consider the method currently used in practice. In that method only data generated from one machine or system is used and before statistical modeling is possible, time has to pass until a sufficient number of events is available (typically 10). This shortcoming causes the method to be unappealing to maintenance practitioners since an immediately implementable solution is desired.


[0010] To solve the problems described above with the method currently in use, there is invented a method for estimating the residual life of industrial equipment that is based on actuarial statistics. This method is analogous to the methodology used by insurance companies that have to estimate the residual life of a client at his or her current age in order to set appropriate life insurance premiums. These companies compile data of the ages of people (with certain similarities) at death and their associated life styles, e.g. the number of cigarettes smoked per day, number of beers drank per week, stress levels, etc. This data is represented by a Proportional Intensity Model (PIM) and it is hence possible to estimate the residual life of a new client at a certain age with a certain life style.


[0011] The methodology of the present invention to estimate the residual life for machines makes use of the ages of machines at failure and associated diagnostic information that was collected during the machine's lifetime. Data of similar failed machines is compiled and a PIM specially developed for reliability data is fitted to observations. From that PIM it is possible to estimate the residual life of a machine currently in operation, at a certain age and with certain diagnostic measurements.


[0012] The present invention recognizes the fact that data produced from non-repairable systems should be treated differently than that of repairable systems. It further includes concomitant information such as condition monitoring information in residual life estimates to avoid possible contamination of data. It lastly also allows (because of the inclusion of covariates) for statistical stratification of data which implies that data from similar nominally similar machines could be used to estimate the residual life of any specific machine. This aspect allows for the invention to be implementable virtually immediately in most situations.



SUMMARY OF THE INVENTION

[0013] A method for estimating the residual life of rotating machinery within statistical bounds based on event data and corresponding diagnostic measurements. The method comprises:


[0014] processing of formulation of combined Proportional Intensity Models (PIMs) in the form of Force of Mortality (FOM) for non-repairable systems and Rate of Occurrence of Failure (ROCOF) for repairable systems;


[0015] processing of formulation of said Force of Mortality (FOM) for non-repairable systems or processing of formulation of said Rate of Occurrence of Failure (ROCOF) for repairable systems;


[0016] deriving the probability density from said Force of Mortality (FOM) or said Rate of Occurrence of Failure (ROCOF); and


[0017] calculating a conditional expectation of a failure event of said rotating machinery within statistical bounds from the probability density.







DESCRIPTION OF THE DRAWING

[0018]
FIG. 1 shows appropriate time scales to measure life times of systems by means of an example sample path of a failure process.


[0019]
FIG. 2 is a schematic presentation of the method of the present invention for non-repairable systems and shows that the method as applied to non-repairable systems can be reduced easily to simpler situations.


[0020]
FIG. 3 is a schematic presentation of the method of the present invention for repairable systems and shows that the method as applied to repairable systems can be reduced easily to simpler situations.







DESCRIPTION OF THE PREFERRED EMBODIMENT(S)

[0021] The objective of the present invention is to model the FOM, h(x) of non-repairable systems and the ROCOF, p(t) of repairable systems. Models for both cases are constructed in such a manner that it allows for sufficient parametric freedom to be able to adapt to most data sets. It is important to note that the parametric freedom is not created by simple arbitrary addition of more parameters but by adding stratified parameters that have physical meaning. For example, both the models for the FOM and ROCOF use only one stratified parameter to allow for acceleration, deceleration or shifts in the time scale.


[0022] The models are proposed below in an over-parameterized format to prevent the loss of generality. This format has the advantage however that unwanted parameters can simply be set to zero to suit a particular data set. The model for non-repairable systems are considered first.


[0023] Suppose k=1, 2, . . . , w nominally similar system copies with l=1, 2, . . . , n parts in series are studied and the event times on each system are recorded. It is assumed that tests confirmed the validity of non-repairable systems theory on all l parts of each of the k systems. On failure of any of the n parts, all the parts are renewed or replaced before the system is put back into service. The event history of every part in every system is recorded in a qlk×2 matrix, consisting of event times,
1Xikl


[0024] and event type indicators,
2Cikl


[0025] Every system has a similar event history matrix that can be deduced from the part histories, i.e.
3Xik=min(Xik)forl=1,2,,


[0026] n and Cik corresponds to the event type of min
4(Xikl).


[0027] On each part in each system, ml time-dependent covariates are measured, i.e.
5zikl=[zi1klzi2klzimlkl]


[0028] (“(x)” is suppressed for notational convenience), where x denotes the local time during lifetime i of system k. Event data is categorized in s=1, 2, . . . , rl strata, where rl is the highest stratum of any part l. A general model for the FOM in such a situation is given by,
6h(x,θ)=l=1nζsklgskl(x,τskl,ψskl)·λ(γskl·zskl)+l=1nζskl·v(αskl·zskl)(1)


[0029] where θ consists of k, the system copy indicator, s, the current stratum indicator, ζsk, a random variable that acts as a frailty in the model that could be system copy- and stratum-specific, gsk, a fully parametric baseline function that could be system copy- and stratum-specific, τsk, a factor that acts additively on x in gsk to represent a time jump or time setback that could be system copy- and stratum specific, ψsk, a factor that acts multiplicatively on x in gsk to result in an acceleration or deceleration of time that could be system copy- and stratum-specific, λsk, a multiplicative functional term that is determined by zsk and that acts on gsk, vsk, an additive functional term determined by zsk and zsk, a vector of time-dependent covariates.


[0030] If the Weibull distribution is used to parameterize the proposed model of the FOM, g for any item l, associated with system k in stratum s becomes,
7gskl(x,θ)=βsklηskl(ψskl(x-τskl)ηskl)βskl-1(2)


[0031] The functional terms become,
8λskl(x,θ)=exp(j=1mlγsjkl·zijkl)(3)vskl(x,θ)=exp(j=1mlαsjkl·zijkl)(4)


[0032] The FOM of the entire system is found by substituting the functions above in the proposed FOM model. Regression coefficients can be solved for by maximum likelihood methods. This yields a considerably more accurate representation of the FOM as compared to the FOM resulting from the method of the prior art. This particular representation of the FOM allows for concomitant information as well as multiple system copies which solves some of the short-comings such as the handling of multiple systems copies, allowing for concomitant information and modeling a non-repairable system with appropriate non-repairable system theory described in the method of the prior art.


[0033] The repairable case will be considered. Suppose a multiple-component system is considered with parts l=1, 2, . . . , n where the success of the system is dependent on the success of each individual part. Event data from each part in each system is recorded in qkl×2 matrices, consisting of event times, Tikl and event type indicators, Cikl. Every system has a similar event history matrix that can be deduced from the part histories, i.e. Tik=min(Tikl) for l=1, 2, . . . , n and Cikl corresponds to the event type of min(Tikl). On each part ml time-dependent covariates are measured, i.e.
9zkl=[z1klz2klzmlkl]


[0034] (“(t)” is suppressed for notational convenience), where t refers to the global time of system k. Event data is categorized in s=1, 2, . . . , rl different strata, where rl is the highest stratum of any part l. A general representation of the described system's ROCOF is given by,
10ρ(t,θ)=l=1nζsklgskl(t,τskl,ψskl)·λ(γskl·zikl)+l=1nζsklv(αskl·zikl)(5)


[0035] where θ consists of k, the system copy indicator, s, the current stratum indicator, ζsk, a random variable that acts as a frailty in the model that could be system copy- and stratum-specific, gsk, a fully parametric baseline function that could be system copy- and stratum-specific, Tsk, a factor that acts additively on t in gsk to represent a time jump or time setback that could be system copy- and stratum-specific, ψsk, a factor that acts multiplicatively on t in gsk to result in an acceleration or deceleration of time that could be system copy- and stratum-specific, λsk, a multiplicative functional term that is determined by zsk and that acts on gsk, vsk, an additive functional term determined by zsk and zsk, a vector of time-dependent covariates.


[0036] If the above model is parameterized by a log-linear function, the baseline function g for any item l, associated with system k in stratum s becomes,
11gskl(t,θ)=exp(Γskl+ψsklΥskl(t-τskl))(6)


[0037] The coefficients
12βskl,ηskl,ψskl,andτskl


[0038] can not be represented as matrices because different systems could be in different strata. The functional terms become,
13λskl(t,θ)=exp(j=1mlγsjkl·zjkl)(7)vskl(t,θ)=exp(j=1mlαsjkl·zjkl)(8)


[0039] The ROCOF for the entire system is obtained by substituting the above relationships into the proposed model for repairable systems' ROCOF. Regression coefficients can be solved for by maximum likelihood methods. This yields a considerably more accurate representation of the ROCOF as compared to the ROCOF resulting from the method of the prior art. This particular representation of the ROCOF allows for concomitant information as well as multiple system copies which solves some of the shortcomings such as the handling of multiple systems copies, allowing for concomitant information and modeling a non-repairable system with appropriate repairable system theory described in the method of the prior art.


[0040] Referring now to FIG. 1, there is shown appropriate time scales to measure life times of systems. It is required to define time scales explicitly because the scale used for non-repairable systems is different than the scale used for repairable systems. The time scales in FIG. 1 are defined by means of an example sample path of a failure process.


[0041] Xi, i=1, 2, 3 . . . , refers to the interarrival time between the (i−1)th failure and ith failure. Xi is a random variable with X0=0. This is referred to as local time and is convenient to use when analyzing non-repairable systems. The real variable xi measures the time elapsed since the most recent failure. Ti, i=1, 2, 3 . . . , measures time from 0 to the ith failure time. Ti is also called the arrival time to the ith failure and is mostly used to analyze repairable systems. This time scale is referred to as global time. Clearly, Tk=X1+X2+ . . . +Xk. From this, a random variable N(t) can be defined as the maximum value of k for which Tk<t, i.e. N(t) is the number of failures that have occurred in (0, t]. N(t), t≧0 is the integer valued counting process that includes both the number of failures in (0, t], N(t), and the instants of occurrence, T1, T2, . . . ;


[0042] A schematic presentation of the general scenario for non-repairable systems is shown in FIG. 2. This scheme shows that the completely general model can be reduced easily to simpler situations. Once the underlying failure process has been defined it is possible to predict the arrival time of the next event. Suppose a system has been in operation for x time units and a maintenance policy exists where the system is replaced preventively at time Xp or at failure, whichever comes first. The conditional expectation of Xr+1, (where Xr+1≦Xp), is given by
14E[Xr+1&RightBracketingBar;Xr+1Xp]=xXpx·f(x,θ)xxXpf(x,θ)x(9)


[0043] Hence, the residual life to the (r+1)th failure is expected to be,


μr+1=E[Xr+1|Xr+1≦Xp]−x  (10)


[0044] It is also possible to calculate confidence levels around the residual life estimate. The lower limit, {tilde under (X)}r+1−x, can be obtained by solving numerically for {tilde under (X)}r+1 in,
15xXr+1f(x,θ)x1-0xf(x,θ)x-Xpf(x,θ)x=0.025(11)


[0045] Similarly is the upper limit {tilde over (X)}r+1−x and and can be obtained by solving for {tilde over (X)}r+1 in,
16xX~r+1f(x,θ)x1-0xf(x,θ)x-Xpf(x,θ)x=0.975(12)


[0046] In the case where there is no preventive maintenance rule, i.e. Xp=∞, and it is desired to calculate the residual life shortly after Xr, i.e. x≈0, equation (9) becomes,
17E[Xr+1]=0x·f(x,θ)x0f(x,θ)x(13)


[0047] and the corresponding residual life to the (r+1)th failure is expected to be,


μr+1=E[Xr+1−x  (14)


[0048] The confidence limits now reduce to,
180Xr+1f(x,θ)x=0.025(15)


[0049] and for the upper limit, {tilde over (X)}r+1, in,
19xX~r+1f(x,θ)x=0.975(16)


[0050] A schematic presentation of the scenario for repairable systems is shown in FIG. 3. This scheme shows that the completely general model can be reduced easily to simpler situations. Once the failure characteristics have been quantified, future failure can be predicted. Suppose a system has been in operation for t time units, where Tr≦t≦Tr+1 and it is required to know when the next failure will occur. Further assume that a decision rule applies that the system will not be allowed to operate beyond Tp=Tr+v. If the conditional probability density is defined as f(t,θ)=ρ(t,θ)·R(t), the conditional expectation of Tr+1 is given by,
20E[Tr+1&RightBracketingBar;Tr+1Tp]=tTps·f(s,θ)stTpf(s,θ)s(17)


[0051] Hence, the residual life to the (r+1)th failure is expected to be,


μr+1=E[Tr+1|Tr+1≦Tp]−t  (18)


[0052] It is also possible to calculate confidence levels around the residual life estimate. The lower limit, {tilde under (T)}r+1−t, can be obtained by solving numerically for {tilde under (T)}r+1 in,
21tTr+1ρ(s,θ)s1-Trtρ(s,θ)s-Tpρ(s,θ)s=0.025(19)


[0053] Similarly is the upper limit {tilde over (T)}r+1−t and and can be obtained by solving for {tilde over (T)}r+1 in,
22xT~r+1ρ(s,θ)s1-0xρ(s,θ)s-Tpρ(s,θ)s=0.975(20)


[0054] Only in very few cases in practice preventive maintenance rules exist for repairable systems. This simplifies the calculations above significantly. Suppose no preventive maintenance rule is applied, i.e. Tp=∞, and we would like to calculate the residual life shortly after Tr, i.e. t≈Tr, equation (17) becomes,
23E[Tr+1]=Trt·ρ1(t)·R(Trt)t=ln[(r+1)α1+exp(α0+α1Tr]-α0α1(21)


[0055] and the corresponding residual life to the (r+1)th failure is expected to be,


μr+1=E[Tr+1]−Tr  (22)


[0056] The confidence limits now reduce to,
24TrTr+1ρ1(s)·R(ts)s=0.025(23)


[0057] Similarly is the upper limit {tilde over (T)}r+1−Tr and and can be obtained by solving for {tilde over (T)}r+1 in,
25TrT~r+1ρ1(s)·R(ts)s=0.975(24)


[0058] It is to be understood that the description of the preferred embodiment(s) is (are) intended to be only illustrative, rather than exhaustive, of the present invention. Those of ordinary skill will be able to make certain additions, deletions, and/or modifications to the embodiment(s) of the disclosed subject matter without departing from the spirit of the invention or its scope, as defined by the appended claims.


Claims
  • 1. A method for estimating the residual life of rotating machinery within statistical bounds based on event data and corresponding diagnostic measurements, comprising: processing of formulation of combined Proportional Intensity Models (PIMs) in the form of Force of Mortality (FOM) for non-repairable systems and Rate of Occurrence of Failure (ROCOF) for repairable systems; processing of formulation of said Force of Mortality (FOM) for non-repairable systems or processing of formulation of said Rate of Occurrence of Failure (ROCOF) for repairable systems; deriving the probability density from said Force of Mortality (FOM) or said Rate of Occurrence of Failure (ROCOF); and calculating a conditional expectation of a failure event of said rotating machinery within statistical bounds from the probability density.
  • 2. The method of claim 1, wherein the formulation of combined Proportional Intensity Models (PIMs) for non-repairable systems is such that said Force of Mortality (FOM) is given by,
  • 3. The method of claim 1, wherein the formulation of combined Proportional Intensity Models (PIMs) for repairable systems is such that said Rate of Occurrence of Failure (ROCOF) is given by,