Method of fault diagnosis based on propagation model

Information

  • Patent Grant
  • 6374196
  • Patent Number
    6,374,196
  • Date Filed
    Tuesday, March 16, 1999
  • Date Issued
    Tuesday, April 16, 2002
Abstract
An alarm propagation model of a transmission line is expressed by an equation. A fault portion on the transmission line is estimated by using the alarm propagation model. As a result, the fault portion can be estimated with a minimum observation time and without loss of precision.
Description




FIELD OF THE INVENTION




This invention relates to a method of fault diagnosis based on a propagation model. Specifically, it relates to a method for diagnosing a fault by using a time series of alarms as observed information.




BACKGROUND




Regarding a fault diagnosis of a transmission line, prior art will be described.




The transmission line is composed of tandem-connected multiplexers and/or terminating sets, etc. For monitoring the transmission line system, the ITU-T series G recommendations prescribe how to generate an alarm in each device which composes the system.




According to the alarm generation method prescribed by the ITU-T series G recommendations, when a communication is stopped by a fault at an arbitrary point within the transmission line system, alarms are notified to a monitoring apparatus from all of the devices which detected the stop of communication. Therefore, as the effect of the fault spreads through the communication line system, the number of alarms which are notified to and arrive at the monitoring apparatus increases.




To execute a fault diagnosis in the transmission line by using such alarms, it is necessary to consider the following conditions (1) and (2).




(1) There exist a time delay in alarm propagation and a time delay in alarm detection, and these are not constant. Consequently, the order of the alarms observed by the monitoring apparatus is not constant.




(2) There are plural fault hypotheses which can cause the observed alarms. Therefore, it may not be possible to decide whether only one fault has occurred and spread over the communication line or plural faults have occurred simultaneously.




In the prior art of fault diagnosis based on alarms whose observation order is not stable or constant, it is general practice to collect a set of alarms within a time window after a fault has occurred, and to decide the most probable fault hypothesis which explains the set of alarms within the time window.




However, the prior art is not able to select the most probable fault hypothesis when there is a loss in the observed data. In that case, the prior art has to decide a priority between the hypotheses by trial and error.




SUMMARY OF THE INVENTION




An object of the present invention is to provide a method of fault diagnosis which can minimize the necessary time for observing alarms without a lack of precision.




Another object of the present invention is to provide a method of fault diagnosis which can estimate the most probable fault portion even if there is a loss in the observed alarm data.




To achieve these objects and perform a fault diagnosis in the present invention, a fault hypothesis is expressed by a time-series model based on an alarm propagation model, and the problem of selecting the fault hypothesis is converted into a problem of judging the likelihood of the time-series model.




In a preferred embodiment, the present invention is directed to a method of fault diagnosis based on using, as an observed information, a model of alarm propagation comprising the steps of:




presuming a fault hypothesis as a time-series model prescribed by a parameter of time delay;




defining a likelihood of the fault hypothesis by a product of a probability density of an observation delay time about alarm which is observed at a fault occurrence;




deciding the most probable fault hypothesis by comparing likelihood between the observed alarm time-series and the fault hypotheses; and




estimating a fault portion based on the decision.




In a further preferred embodiment, the present invention is directed to a method further comprising the steps of:




obtaining AIC (Akaike's Information-theoretic Criterion) of each fault hypothesis;




arranging the fault hypotheses in ascending order of AIC; and




identifying, as the most probable fault hypothesis, the fault hypothesis whose AIC is the minimum.




In a still further preferred embodiment, the present invention is directed to a method further comprising the steps of:




constructing a fault tree in which each node corresponds to a device and each link corresponds to a connection between the devices;




defining a probability distribution and a probability density function of the observation delay time of alarm, by giving alarm detection delay to each node and giving alarm propagation delay to the link, respectively as alarm delay parameter associated with a route of the fault tree;




estimating, before alarm arrival, a likelihood of the fault hypothesis at a time t when alarm is observed, by the probability density function;




defining, after alarm arrival, as the likelihood of the fault hypothesis at the time t when alarm is observed, a value of the probability density function when alarm arrived; and




defining the likelihood of each fault hypothesis by the product of the probability density function of the occurred alarm.




The objects of the present application will become more readily apparent from the detailed description given hereafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.











BRIEF DESCRIPTION OF THE DRAWINGS




The present invention will become more fully understood from the detailed description and the accompanying drawings which are given by way of illustration only and wherein:





FIG. 1 shows a topology of a transmission line in the preferred embodiment of the present invention;

FIG. 2 shows a fault tree corresponding to a fault of the link 17 only;

FIG. 3 shows a fault tree corresponding to a fault of the link 20 only;

FIG. 4 shows a fault tree corresponding to a fault of the link 21 only;

FIG. 5 shows a fault tree corresponding to simultaneous faults of the links 20 and 21;

FIG. 6 shows a process for making an alarm propagation model and an evaluation function;

FIG. 7 shows an algorithm for estimating a fault portion based on an alarm propagation model;

FIG. 8 shows changes in time of Akaike's Information-theoretic Criterion (AIC) of each fault hypothesis for an observed alarm sequence.











DESCRIPTION OF THE PREFERRED EMBODIMENT




A preferred embodiment of the present invention will be explained.




First, the process of making an alarm propagation model by using a fault tree will be explained.




A fault tree is a knowledge representation in which a node corresponds to a parameter indicating a state of a system, and a directed link represents a cause-and-effect relationship between the nodes. Such fault trees are used for fault diagnosis of chemical plants, and their usefulness is recognised.




Therefore, in the present invention, a fault tree is constructed according to the connection relationships between the devices which constitute the system to be monitored. Further, an alarm propagation model is defined by associating the time delay until alarm observation with the fault tree.




In this embodiment, the topology of a transmission line shown in FIG. 1 is assumed.




In FIG. 1, devices a, b, c, d, e, f, g and h exist in a transmitting side P0, and devices a′, b′, c′, d′, e′, f′, g′ and h′ exist in a receiving side P15. Between the devices in the side P0 and the devices in the side P15, the devices having the same alphabetic character are connected to each other as a-a′, b-b′, c-c′, d-d′, e-e′, f-f′, g-g′ and h-h′. P1, P2, P3, P4, P5, P6 and P7 denote multiplex devices (multiplexers). De-multiplexers (de-multiplex devices) P8, P9, P10, P11, P12, P13 and P14 correspond to those multiplexers P1, P2, P3, P4, P5, P6 and P7 respectively.




Each device a, b, c, d, e, f, g, h in the transmitting side P0 has a unit for generating an alarm. Each of 0a, 0b, 0c, 0d, 0e, 0f, 0g and 0h denotes the alarm generating unit of each device, corresponding to a, b, c, d, e, f, g and h respectively.




Further, each multiplexer has an alarm generating unit (c). Each of 1c, 2c, 3c, 4c, 5c, 6c and 7c denotes the alarm generating unit of each multiplexer, corresponding to P1, P2, P3, P4, P5, P6 and P7 respectively. Each de-multiplexer has an alarm generating unit (a) and an alarm generating unit (b). Each of 8a, 8b, 9a, 9b, 10a, 10b, 11a, 11b, 12a, 12b, 13a, 13b, 14a and 14b denotes an alarm generating unit of each de-multiplexer, corresponding to P8, P9, P10, P11, P12, P13 and P14, respectively.




The numerals 1 to 29 are used for identifying the links between the devices, which include the multiplexers and the de-multiplexers.




It is assumed that an alarm is generated by the device on the downstream side.




Further, as delays from the occurrence of a fault to the detection of an alarm in each device (a, b, c, d, e, f, g, h, a′, b′, c′, d′, e′, f′, g′, h′, P1, P2, P3, P4, P5, P6, P7, P8, P9, P10, P11, P12, P13, P14) on the transmission line, a detection delay and a propagation delay are defined. The detection delay is the time from the detection of an abnormality by the alarm generating unit until the monitoring apparatus recognises the abnormality upon arrival of the alarm. The propagation delay is the time taken for a fault in a device to affect the adjacent device.




As shown in FIG. 2, a fault tree describing the effect of a fault is made by associating each alarm detection delay and each alarm propagation delay with the topology shown in FIG. 1. Namely, the alarm detection delay is associated with each node and the alarm propagation delay is associated with each link.




The fault tree shown in FIG. 2 corresponds to a fault only in the link 17. In FIG. 2, among the alarm delay parameters, each of λ8b, λ7c, λ5c, λ6c, λ1c, λ3a, λ4c, λ0b, λ0f, λ0g and λ0h denotes an alarm detection delay, and each of δ17, δ15, δ14, δ13, δ11, δ9, δ8, δ7, δ6 and δ2 denotes an alarm propagation delay.




In the same manner, FIG. 3 shows a fault tree for a fault of the link 20 only, FIG. 4 shows a fault tree for a fault of the link 21 only, and FIG. 5 shows a fault tree for simultaneous faults of both the links 20 and 21.




Next, a probability distribution of the alarm observation delay is decided for each fault tree. The process will be explained, with symbols defined as follows.




(1) Li: a fault hypothesis of the link i




(2) L: a set {Li} of fault hypotheses Li




(3) FTi: a fault tree being defined and accompanied with the fault hypothesis Li




(4) FT: a set {FTi } of the fault tree FTi




(5) ak: alarm notified from a node Nk




(6) A: a set of all alarms




(7) Ai: a set of alarms which can be observed under the fault hypothesis Li




The relationship of the set Ai and the set A is indicated by the following equation 1.






Ai⊂A  [equation 1]






The probability distribution of alarm observation delay, when a certain alarm ak is observed at a time t, is indicated by P{ak}(t|θik, t0). Here, the alarm ak is raised by a fault which occurs in the link i at a time t0, the relationship between the alarm ak and the set Ai is indicated by the following equation 2, and the symbol θik indicates the sequence of the delay parameters along the route of the fault tree from the link i, where the fault occurred, to the node Nk.






ak∈Ai  [equation 2]






For example, in a case where a fault occurred in the link 17, the probability distribution of the alarm observation delay is given by the following equations 3 to 7. In these equations, Δi(t) denotes a distribution function of the alarm propagation delay δi, Λi(t) denotes a distribution function of the alarm detection delay λi, and the symbol ∘ denotes a convolution.






P{a8b}(t|δ17, λ8b, t0)=Λ8b∘Δ17(t)  [equation 3]

P{a7c}(t|δ17, δ15, λ7c, t0)=Λ7c∘Δ15∘Δ17(t)  [equation 4]

P{a6c}(t|δ17, δ15, δ14, λ6c, t0)=Λ6c∘Δ14∘Δ15∘Δ17(t)  [equation 5]

P{a4c}(t|δ17, δ15, δ14, δ12, λ4c, t0)=Λ4c∘Δ12∘Δ14∘Δ15∘Δ17(t)  [equation 6]








P{a0h}(t|δ17, δ15, δ14, δ12, δ8, λ0h, t0)=Λ0h∘Δ8∘Δ12∘Δ14∘Δ15∘Δ17(t)  [equation 7]
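The convolutions in equations 3 to 7 can be illustrated numerically. The following is only a sketch under stated assumptions: the delays are taken to be independent and exponentially distributed, the delay means are made-up values, and the helper names (`exp_pmf`, `convolve`, `observation_delay_pmf`) are hypothetical. It works with discretised probabilities per time bin rather than with the distribution functions themselves.

```python
import math

def exp_pmf(mean, dt, n):
    """Probability that an exponential delay with the given mean falls in bin k of width dt."""
    return [math.exp(-k * dt / mean) - math.exp(-(k + 1) * dt / mean) for k in range(n)]

def convolve(f, g):
    """Distribution of the sum of two independent discretised delays, truncated to len(f) bins."""
    return [sum(f[j] * g[k - j] for j in range(k + 1)) for k in range(len(f))]

def observation_delay_pmf(link_means, node_mean, dt=0.05, n=400):
    """Chain the propagation delays along a route with the detection delay of the node."""
    total = exp_pmf(node_mean, dt, n)
    for m in link_means:
        total = convolve(total, exp_pmf(m, dt, n))
    return total

# Route of equation 4: links 17 and 15, then detection at node 7c (means are assumed values).
DT = 0.05
pmf_7c = observation_delay_pmf(link_means=[0.5, 0.3], node_mean=0.2, dt=DT)
mass = sum(pmf_7c)                                            # total probability on the grid
mean = sum((k + 0.5) * DT * p for k, p in enumerate(pmf_7c))  # approximate mean delay
```

Because the delays add along the route, the mean of the resulting distribution should be close to the sum of the individual means, up to the discretisation error of the grid.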






Based on the alarm propagation model, an evaluation function is set as follows for estimating a fault portion.




A likelihood of the fault hypothesis Li is defined by the following equations 9 and 10, where equation 8 represents an alarm time sequence S, each element of which pairs an alarm ak with the time xk at which it is observed (written tk elsewhere in this description), and the probability density function with which the alarm ak is observed at the time t under the fault hypothesis Li is p{ak}(t|θi, t0). Further, a log-likelihood LL(t|θi, t0) of the fault hypothesis Li is defined by the following equation 11, and Akaike's Information-theoretic Criterion AIC(t|θi, t0) is defined, as an evaluation function for comparing the fault hypothesis models, by the following equation 12.






S={<ak,xk>|ak∈A, 1<k<N, t0<xk, x(k−1)≦xk}  [equation 8]






















h{ak}(t|θi, t0)≡p{ak}(t|θi, t0), wherein t0<t<xk
h{ak}(t|θi, t0)≡p{ak}(xk|θi, t0), wherein xk≦t  [equation 9]




















h(t|θi, t0)≡Π{ak∈Ai} h{ak}(t|θi, t0)  [equation 10]









LL(t|θi, t0)≡log h(t|θi, t0)  [equation 11]

AIC(t|θi, t0)≡−2×LL(t|θi, t0)+2×|θi|, wherein |θi| is the number of delay parameters θi.  [equation 12]
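Equations 9 to 12 can be sketched in a few lines. This is an illustration, not the patented implementation: the exponential observation delay density, the dictionary layout and the function names are all assumptions made for the example, and the fault time t0 is taken as known and earlier than t.

```python
import math

def h_alarm(p, t, t0, xk=None):
    """Equation 9: density at the current time t before arrival, frozen at xk after arrival.

    p is the observation delay density of one alarm as a function of the delay since
    the fault (assumed form); xk is None while the alarm has not been observed.
    """
    if xk is not None and xk <= t:
        return p(xk - t0)
    return p(t - t0)

def log_likelihood(densities, arrivals, t, t0):
    """Equations 10 and 11: log of the product of h over all alarms in Ai."""
    return sum(math.log(h_alarm(p, t, t0, arrivals.get(a))) for a, p in densities.items())

def aic(densities, arrivals, t, t0, n_params):
    """Equation 12: AIC = -2 * LL + 2 * |theta_i|."""
    return -2.0 * log_likelihood(densities, arrivals, t, t0) + 2.0 * n_params

def expo(mean):
    """Exponential delay density, used here only as a stand-in model."""
    return lambda d: math.exp(-d / mean) / mean

densities = {"8b": expo(1.0), "7c": expo(2.0)}  # alarms expected under the hypothesis
arrivals = {"8b": 1.5}                          # alarm 7c has not arrived yet
value = aic(densities, arrivals, t=2.0, t0=1.0, n_params=4)
```

In this example the alarm 8b contributes its density at the arrival time 1.5, while the alarm 7c, which has not yet arrived, contributes its density at the current time 2.0.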






It is necessary to estimate the time t0 when the fault occurred, because the evaluation function AIC(t|θi, t0) includes the time t0. The estimate t0(t|θi), made at the time t, of the time when the fault occurred is calculated by the following equation 13 as a maximum likelihood estimate, wherein the observed alarm sequence S is given by S={<ak, tk>|0<k<N} and the sequence of the observation delay parameters is θi.















t0(t|θi)=arg max{t0} LL(t|θi, t0)  [equation 13]
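One simple way to evaluate equation 13 is a grid search over candidate fault times. The sketch below assumes, purely for illustration, that every alarm in Ai has already arrived and that each observation delay follows an Erlang-2 density; the function names and the search window are hypothetical choices, not taken from the description.

```python
import math

def erlang2_log_density(delay, scale):
    """Log of an Erlang-2 density, chosen only as an example of a peaked delay model."""
    return math.log(delay) - 2.0 * math.log(scale) - delay / scale

def estimate_t0(arrivals, scales, span=5.0, steps=500):
    """Equation 13 by grid search: the t0 before the first alarm that maximises LL."""
    first = min(arrivals.values())

    def ll(t0):
        return sum(erlang2_log_density(x - t0, scales[a]) for a, x in arrivals.items())

    # Candidate fault times lie strictly before the first observed alarm.
    candidates = [first - span * (i + 1) / steps for i in range(steps)]
    return max(candidates, key=ll)

t0_hat = estimate_t0({"8b": 3.0, "7c": 3.5}, {"8b": 1.0, "7c": 1.0})
```

A finer grid or a one-dimensional optimiser could replace the grid search; the grid is used here only because it needs no extra dependencies.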













Next, a simultaneous fault hypothesis L{i+j} is considered. The simultaneous fault hypothesis L{i+j} is synthesized from the fault hypothesis Li and the fault hypothesis Lj.




A likelihood of the simultaneous fault hypothesis L {i+j} is defined by the following equations 14 to 16, dependent on whether the alarm ak is included in the fault hypothesis Li or the fault hypothesis Lj or both of them.












If ak∈Ai and ak∈Aj;

H{ak}(θ{i+j})=H{ak}(θi)+H{ak}(θj)−H{ak}(θi)H{ak}(θj)  [equation 14]

then,

h{ak}(θ{i+j})=h{ak}(θi)+h{ak}(θj)−(h{ak}(θi)H{ak}(θj)+H{ak}(θi)h{ak}(θj))










 If ak∈Ai and ak∉Aj;






h{ak}(θ{i+j})=h{ak}(θi)  [equation 15]








If ak∉Ai and ak∈Aj;








h{ak}(θ{i+j})=h{ak}(θj)  [equation 16]
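The three cases of equations 14 to 16 can be written as one small function. This sketch assumes that H denotes the cumulative distribution corresponding to the density h, which the text does not state explicitly; the function name and the exponential example values are illustrative only.

```python
import math

def combined_density(h_i, H_i, h_j, H_j, in_i, in_j):
    """Density of alarm ak under the simultaneous hypothesis L{i+j}.

    h_i, h_j are density values and H_i, H_j the matching cumulative values at the
    same time; in_i and in_j say whether ak belongs to Ai and Aj.
    """
    if in_i and in_j:
        # Derivative of H_i + H_j - H_i * H_j (equation 14).
        return h_i + h_j - (h_i * H_j + H_i * h_j)
    if in_i:
        return h_i  # equation 15
    if in_j:
        return h_j  # equation 16
    raise ValueError("alarm is not explained by either hypothesis")

# Example with exponential delays of rate 1 and rate 2, evaluated at delay d = 1.
d = 1.0
h_i, H_i = math.exp(-d), 1.0 - math.exp(-d)
h_j, H_j = 2.0 * math.exp(-2.0 * d), 1.0 - math.exp(-2.0 * d)
both = combined_density(h_i, H_i, h_j, H_j, True, True)
```

Under this reading, equation 14 is the distribution of whichever of the two faults raises the alarm first, so for two exponential delays the combined density is again exponential with the summed rate.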






An evaluation function, which represents the likelihood of each fault hypothesis when the alarm observation sequence is obtained, is required for estimating the fault hypothesis based on the alarm observation sequence. The evaluation function is required to have the property that the evaluation becomes higher as the alarm observation delay is closer to its expectation value, and becomes lower as the alarm observation delay is further from the expectation value. The above-mentioned evaluation function AIC(t|θi, t0) satisfies this requirement.




Considering that the fault hypothesis is a model prescribed by the set θi of the delay parameters defined on each fault tree FTi, the fault hypothesis most appropriate to the observed alarm data is estimated as follows by using the evaluation function AIC(t|θi, t0).




The number of parameters which prescribe the time sequence model is the number |θi| of the delay parameters. Since the number |θi| depends on the fault hypothesis, it differs from one hypothesis to another. However, models having different numbers of parameters can be compared by using the AIC. Therefore, the most probable model θ* can be obtained by the following equation 17.






θ*=arg min{θi} AIC(θi)  [equation 17]
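As a small worked illustration of equation 17, the sketch below selects the hypothesis with the minimum AIC. The log-likelihoods and parameter counts are made-up numbers, and the labels and function names are hypothetical.

```python
def aic(log_likelihood, n_params):
    """AIC = -2 * LL + 2 * number of delay parameters (equation 12)."""
    return -2.0 * log_likelihood + 2.0 * n_params

def most_probable(aic_by_hypothesis):
    """Equation 17: return the hypothesis label with the smallest AIC."""
    return min(aic_by_hypothesis, key=aic_by_hypothesis.get)

# Assumed values: the simultaneous hypothesis has more parameters but fits much better.
scores = {
    "L20": aic(-7.0, 3),
    "L21": aic(-6.5, 3),
    "L{20+21}": aic(-3.0, 6),
}
best = most_probable(scores)
```

With these numbers the simultaneous hypothesis is selected even though it has twice as many parameters, because its gain in log-likelihood outweighs the parameter penalty.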







FIG. 6 shows the above-mentioned process from making the alarm propagation model to making the evaluation function.




Next, referring to steps ① to ④ below, an algorithm for estimating the most probable fault hypothesis at the observation time will be explained, wherein the time sequence S={<ak, tk>|0<k<N} of the observed alarms is given as a list of alarms in order of the observation time.




In the algorithm, as shown in FIG. 7, the time sequence S={<ak, tk>|0<k<N} of the observed alarms is input. Then, the most probable fault hypothesis is output by using the evaluation function made by the process shown in FIG. 6. Here, the symbol pop(S) represents an operation by which the first element of the time sequence S is taken out of the sequence S.




① Initial state




(a) n:=0




(b) t:=t1 (a time when the first alarm is observed)




(c) a list of models to be possible:=a set of fault models based on a single device.




(d) a list of models to be deleted:={ }




② <ak, tk>=pop(S)




③ while (n<Nmax) do {




(a) n:=n+1;




(b) t:=t+Δt;




(c) if(t<tk) then {




Fault occurrence time t0(t|θi) is calculated by equation 13.




AIC(t|θi) is calculated about all θi which are included within the list of models to be possible.




} else {




(1) [Deletion of the model from the list to be possible] About all θi which are included within the list of models to be possible;




if (ak is not included in Ai) then {




(i) θi is added to the list of the models to be deleted,




(ii) θi is deleted from the list of the models to be possible.




}




(2) [Addition of models to the list to be possible] About all θi which are included within the list of models to be possible, and all θi which are included within the list of models to be deleted;




{




(i) θi, of which ak is included in Ai, is selected, then θ{i+j} is added to the list of models to be possible. Further, fault occurrence time t0(t|θi) is calculated by equation 13.




(ii) AIC(t|θ{i+j}) is calculated.}




}




(d) The list of models to be possible is sorted in ascending order of AIC.




(e) if (n=Nmax) then the list of models to be possible is output and the process is stopped.




(f) if (t≧tk) then <ak, tk>=pop(S)




}




④ The models are output in order of the list of models to be possible.
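The loop above can be sketched as follows. This is a simplified reading, not the algorithm of FIG. 7 itself: a hypothesis is represented as a set of faulty links, the AIC is supplied as a callable, each rejected hypothesis is combined only with single-link faults that explain the new alarm, and all names (`diagnose`, `ALARM_SETS`, `toy_aic`) and alarm sets are hypothetical.

```python
def diagnose(observed, alarm_sets, aic_fn, dt=1.0, n_max=100):
    """Rank fault hypotheses for a time-ordered list of (alarm, time) pairs.

    observed   : list of (alarm_name, observation_time), sorted by time
    alarm_sets : dict mapping a link id to the set of alarms a fault there can raise
    aic_fn     : callable(hypothesis, seen, t) -> float, lower is better (assumed given)
    """
    def explains(hyp, alarm):
        return any(alarm in alarm_sets[link] for link in hyp)

    pending = list(observed)                            # the sequence S
    if not pending:
        return []
    possible = {frozenset([l]) for l in alarm_sets}     # single-fault models
    deleted = set()
    seen = []
    alarm, tk = pending.pop(0)                          # pop(S)
    t = tk                                              # time of the first alarm
    ranked = []
    for _ in range(n_max):
        t += dt
        if alarm is not None and t >= tk:               # the alarm has arrived
            seen.append((alarm, tk))
            rejected = {h for h in possible if not explains(h, alarm)}
            possible -= rejected
            deleted |= rejected
            # Combine each rejected model with single faults that explain the alarm.
            for h in rejected:
                for link in alarm_sets:
                    if alarm in alarm_sets[link]:
                        possible.add(h | {link})
            alarm, tk = pending.pop(0) if pending else (None, None)
        ranked = sorted(possible, key=lambda h: (aic_fn(h, seen, t), sorted(h)))
    return ranked

ALARM_SETS = {17: {"8b", "7c"}, 20: {"0f"}, 21: {"0g"}}  # assumed alarm sets

def toy_aic(hyp, seen, t):
    # Stand-in score, NOT equation 12: two per link plus one per expected but unseen alarm.
    expected = set().union(*(ALARM_SETS[l] for l in hyp))
    return 2 * len(hyp) + len(expected - {a for a, _ in seen})

ranking = diagnose([("0f", 1.0), ("0g", 2.0)], ALARM_SETS, toy_aic)
```

In practice `aic_fn` would evaluate equations 9 to 13 for each hypothesis; the toy score is used here only to keep the example self-contained.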





FIG. 8 shows changes over time of Akaike's Information-theoretic Criterion (AIC) of several fault hypotheses when a fault of the link 20 and a fault of the link 21 occurred simultaneously. For example, the following are considered: a fault hypothesis of a single fault in the link 17, a fault hypothesis of a single fault in the link 20, a fault hypothesis of a single fault in the link 21, and fault hypotheses of simultaneous faults in plural links among the links 17, 20 and 21.




According to FIG. 8, a fault hypothesis which cannot explain the observed alarms is deleted in the course of time from the list of models to be possible. For example, the fault hypothesis of the single fault in the link 20 is rejected when the alarm 0g is observed. Since the fault hypothesis whose AIC is the minimum is the most probable, a lower fault hypothesis in FIG. 8 is a more probable hypothesis.




While the above-mentioned embodiment describes a method for estimating a fault portion of the transmission line, this invention can also be applied to a method for estimating or correlating a fault portion in a case where plural alarms occur simultaneously in an arbitrary system, and to a method for estimating a fault portion in a network, such as management of congestion information about roads.




The present invention has the following effects.




(1) The present invention can arrange the results of diagnosis in order of the likelihood of the fault hypotheses, whereas the prior art cannot clarify the probability of a diagnosis error when diagnosis is executed in a constant time window.




(2) The present invention can easily select the proper fault hypothesis by defining the time sequence model as the fault hypothesis and selecting the time sequence model of the fault hypothesis which is nearest to the time sequence of the observed alarms, whereas the prior art needs a specific process and complicated rules for diagnosis when a part of the data is lost.




(3) When plural faults occur, the present invention can select, based on AIC, a model whose likelihood is high and whose complexity is small. In the prior art, on the other hand, only the fault hypothesis with the smallest number of fault portions is selected when plural faults occur. Therefore, there is a problem that a simple model is selected even when a more complex model has a higher likelihood.



Claims
  • 1. A method of fault diagnosis based on using, as observed information, a model of alarm propagation comprising the steps of: presuming a fault hypothesis as a time-series model prescribed by a parameter of time delay; defining a likelihood of the fault hypothesis by a product of a probability density of an observation delay time about an alarm which is observed at a fault occurrence; deciding the most probable fault hypothesis by comparing a likelihood between the observed alarm time-series and the fault hypotheses; and estimating a fault portion based on the decision.
  • 2. The method claimed in claim 1 further comprising the steps of: obtaining AIC (Akaike's Information-theoretic Criterion) of each fault hypothesis; arranging the fault hypotheses in order of small AIC; and identifying, as the most probable fault hypothesis, the fault hypothesis whose AIC is the minimum.
  • 3. The method claimed in claim 1 further comprising the steps of: constructing a fault tree which node corresponds to each device and which link corresponds to each connection between the devices; defining a probability distribution and a probability density function of the observation delay time of alarm, by giving alarm detection delay to each node and giving alarm propagation delay to the link, respectively as alarm delay parameter associated with a route of the fault tree; estimating, before alarm arrival, a likelihood of the fault hypothesis at a time t when alarm is observed, by the probability density function; defining, after alarm arrival, as the likelihood of the fault hypothesis at the time t when alarm is observed, a value of the probability density function when alarm arrived; and defining the likelihood of each fault hypothesis by the product of the probability density function of the occurred alarm.
  • 4. A method of fault diagnosis in a network, which has a plurality of nodes which define communication devices and a plurality of links which define lines between the nodes, based on using an alarm notified to a monitoring device from an alarm generating unit provided for each of the nodes, said method comprising steps of: pre-determining, by each fault hypothesis, an alarm propagation model with an alarm detecting delay and an alarm propagation delay according to a topology of the network; wherein said alarm detection delay being a random variable of probability density function of a required time to send an alarm notification from each of the alarm generating units to the monitoring device and said alarm propagation delay being a random variable of probability density function of a required time for a fault to propagate from one of the nodes to another of the nodes along the link; obtaining a likelihood of each fault hypothesis by convoluting said alarm detection delay and said alarm propagation delay at a certain fault occurrence; obtaining Akaike's Information-theoretic Criterion of each obtained likelihood; arranging the fault hypotheses in order of small Akaike's Information-theoretic Criterion; identifying, as the most probable fault hypothesis, the fault hypothesis whose Akaike's Information-theoretic Criterion is the minimum; and estimating a fault portion in the network based on said identification.
Priority Claims (3)
Number Date Country Kind
10-065501 Mar 1998 JP
10-145945 May 1998 JP
10-268030 Sep 1998 JP