The present invention is related to fault diagnosis of a plant. In particular, the present invention is related to real-time distributed diagnosis for a plant.
Basic research in fault diagnosis has progressed significantly over the past four decades with well-established theoretical developments including consistency-based diagnosis [1], [2] from the artificial intelligence community and analytical redundancy techniques from the automatic control community [3].
In the automotive industry, motor vehicles can contain over 100 electronic control units (ECUs) and even more sensors, actuators, etc. These components have to communicate with one another, e.g. through a Controller Area Network (CAN) bus, and the architecture of such electronic systems is transforming into a network of real-time distributed embedded control systems.
In such networks, a global diagnosis method, which collects the diagnostic information from all the subsystem controllers, is not practical due to high communication requirements and time delays induced by centralized diagnosis. The maintenance of such networks is also tedious for new prototypes, because they do not support plug-and-play capability. Consequently, different systems and methods of diagnosis are needed in order for development of improved machines to continue. As such, a system and/or method that provides improved real-time distributed diagnosis for a machine, plant and the like would be desirable.
A system for real-time distributed diagnosis of a plant is provided. The system is operable with a plant that has a plurality of subsystems, each of the subsystems having one or more components that participate in the operation of the plant by performing one or more plant operations. In addition, the plant can have one or more sensors that detect the one or more plant operations and transmit data related thereto. The system can include an agent-based plant diagnostic network that has a plurality of subsystem resident agents, each of the subsystem resident agents being assigned to one of the plurality of subsystems. In addition, each of the subsystem resident agents is operable to run a test on one or more of the components of the assigned subsystem and provide an outcome of the test.
In addition to the subsystem resident agents, a plurality of diagnostic inference algorithms can be included. Each of the diagnostic inference algorithms can be assigned to one of the subsystem resident agents and be operable to monitor and assign a fault state to the outcome of the test run by the assigned subsystem resident agent. The fault state can be selected from “good”, “bad”, “suspect”, and “unknown”.
In some instances, the system can include a multi-signal digraph model of the plant and each of the diagnostic inference algorithms can be a function of the digraph model. In addition, the diagnostic inference algorithms can decompose a problem with the plant into domain specific subsystem models.
The present invention discloses a system for real-time distributed diagnosis of a plant that has a plurality of subsystems. As such, the present invention has utility as a diagnostic system.
The system can be used for the plant having the plurality of subsystems, each of the plurality of subsystems having one or more components that participate in the operation of the plant by performing one or more plant operations. In addition, one or more sensors can be included that detect the one or more plant operations and are operable to transmit data related thereto. The system can include an agent-based plant diagnostic network that has a plurality of subsystem resident agents (SRAs), each of the SRAs assigned to one of the subsystems and being operable to run a test on one or more of the components of the assigned subsystem. In addition, each of the SRAs can provide an outcome of the test that has been run.
In addition to the agent-based plant diagnostic network, a plurality of diagnostic inference algorithms can be included. Each of the diagnostic inference algorithms can be assigned to one of the SRAs and be operable to monitor and assign a fault state to the outcome of the test that is run by the particular SRA. The fault states assigned by the diagnostic inference algorithm can be selected from “Good”, “Bad”, “Suspect” and “Unknown”. Each of the SRAs is operable to receive data that is related to the detected plant operation from one or more of the sensors and transmit the data to a plant expert agent. In addition, each of the SRAs is also operable to transmit the fault state assigned by the diagnostic inference algorithm to the plant expert agent.
In some instances, the system can include a multi-signal digraph model of the plant. If such a model is provided, the diagnostic inference algorithms can be a function of the multi-signal digraph model. It is appreciated that each of the plurality of diagnostic inference algorithms can be assigned to a different subsystem resident agent of the plurality of subsystem resident agents and thus the diagnostic inference algorithms can decompose a problem with the plant into domain specific subsystem models. It is appreciated that the plant can be any type of machine or vehicle that has a plurality of subsystems, for example a motor vehicle.
A process for providing real-time distributed diagnosis to a plant can include providing a plant having a plurality of subsystems, the plurality of subsystems constituting at least part of the plant. Each of the subsystems can have at least one component that is operable to perform one or more predefined plant operations and at least one sensor that is operable to detect at least one of the one or more predefined plant operations performed by the at least one component. The at least one sensor is operable to transmit data related to the detected plant operation. An agent-based plant diagnostic network that has a plurality of SRAs is also provided, along with the plurality of distributed diagnostic algorithms as described above. The plant is operating with the subsystems contributing to the operation of the plant and the sensors detecting and transmitting information related to the operation of one or more components. The SRAs receive the information transmitted by the sensors, and may or may not run a test on at least one component. A given distributed diagnostic algorithm that is assigned to a particular subsystem and/or SRA can assign a fault state to the outcome of the test run by the SRA, the fault state transmitted to the plant expert agent.
After a fault state has been assigned to the data received by the SRA, the fault state can be transmitted to the PEA. It is appreciated that the PEA can run a plant-wide diagnostic analysis, thereby producing a plant-wide diagnosis.
In order to better illustrate the inventive distributed diagnosis algorithm, development of an example distributed diagnosis algorithm is described below, along with comparison of the developed algorithm with a centralized diagnosis algorithm using real-world examples.
Referring to
1) a set of m potential failure sources FS={fs1, . . . , fsm};
2) a set of n available binary tests (or events, alarms, features) T={t1, . . . , tn}; and
3) a D-matrix, D, having dimension m×n, describing the diagnostic capabilities of tests. Each test tj for 1≦j≦n, corresponds to a column in the D-matrix: djT=[d1j, d2j, . . . , dmj]. In addition, dij=1 implies that test tj fails (alarm j rings) if failure source fsi is the cause of failure. Conversely, dij=0 indicates that failure source fsi is not detected by test tj. For example and for illustrative purposes only, the D-matrix for entities 411, 421, . . . 433 (C1-C6) in
A diagnosis problem can be defined by a triple (D, T, X), where D={dij|i=1 . . . m, j=1 . . . n} is the dependency matrix, T={tj|j=1 . . . n} is the set of test outcomes and X={Good, Bad, Suspected, Unknown} is the set of four distinct states associated with each component in the system. In addition, a support SPj of a test tj can be a set of failure sources (rows of the D-matrix) with a nonzero element in the column corresponding to tj. As such, a real-time centralized (RTC) diagnosis inference algorithm [5] for the diagnosis problem (D, T, X) can be stated as:
In this algorithm, G represents the set of Good components, B represents the set of Bad components, S represents the set of Suspected components and U represents the set of Unknown components.
A diagnosis for the diagnosis problem can be defined as D=∪Di with a minimal cardinality diagnosis defined as Dmc={Dj∈D : |Dj|=min Di∈D |Di|} as taught in reference [1]. It is appreciated that the diagnosis D=∪Di follows the principle of parsimony (or Occam's razor), i.e. a minimum set of faulty components can explain the observed findings. Such an approach implies multiple fault diagnosis by computing the hitting sets using conflict sets. However, for large complex systems with thousands of failure sources (components) and tests, the number of hitting sets will be enormous. As such, all unique diagnoses are included in the set B with the remaining multiple diagnoses included in the set S, excluding failure sources in the set B.
A scope SCi (signature) of a failure source fsi can be a set of tests (columns of the D-matrix) with a nonzero element in the row corresponding to fsi. In addition, a set E of tests is said to be D-Complete if E is finite and for any component failing, ∃tj∈E such that tj=1. Such a “D-completeness” property guarantees detectability, i.e. any component fault will cause some test(s) to fail. It is appreciated that if a set of tests for a D-matrix is D-complete, there will be no Unknown components in the diagnosis. Furthermore, the Unknown failure source set from algorithm RTC will contain all the failure sources which have empty scope (SC=Ø). Stated differently, the Unknown failure source set from the algorithm RTC will have all the failure sources with undetectable faults.
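For illustrative purposes only, the definitions of support, scope and D-completeness given above can be sketched in Python. The small D-matrix below is hypothetical and is not the D-matrix of the network shown in the figures; the function names are likewise assumptions introduced for this sketch.

```python
from typing import List, Set

# Hypothetical 4x3 D-matrix (rows: failure sources, columns: tests).
# It is NOT the D-matrix of the network described in the text.
D = [
    [1, 0, 0],  # fs1 is detected by t1
    [1, 1, 0],  # fs2 is detected by t1 and t2
    [0, 1, 1],  # fs3 is detected by t2 and t3
    [0, 0, 0],  # fs4 is detected by no test
]

def support(d: List[List[int]], j: int) -> Set[int]:
    """Support SPj: row indices with a nonzero entry in column j."""
    return {i for i, row in enumerate(d) if row[j] != 0}

def scope(d: List[List[int]], i: int) -> Set[int]:
    """Scope SCi: column indices with a nonzero entry in row i."""
    return {j for j, entry in enumerate(d[i]) if entry != 0}

def undetectable(d: List[List[int]]) -> Set[int]:
    """Failure sources with empty scope; these end up in the Unknown set."""
    return {i for i in range(len(d)) if not scope(d, i)}

def is_d_complete(d: List[List[int]]) -> bool:
    """True if every failure source causes at least one test to fail."""
    return not undetectable(d)
```

With this hypothetical matrix, the fourth failure source has an empty scope, so the test set is not D-complete and that failure source would be reported as Unknown.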
Given the above, the diagnosis problem can be summarized by the following theorem.
Theorem 1: The diagnosis solution using RTC for the diagnosis problem (D, T, X) is:
B=∪{Di : |Di|=1}
S=∪{Di : |Di|>1}\B
U=∪{fsi : SCi=Ø}
G=FS\(B∪S∪U)
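For illustrative purposes only, the four set expressions of Theorem 1 can be sketched in Python. The sketch assumes that the individual diagnoses Di and the scope of each failure source are already available as inputs; how they are obtained is not shown. The function name, the component names and the example data are hypothetical and do not correspond to the network in the figures.

```python
from typing import Dict, Iterable, List, Set, Tuple

def classify(
    failure_sources: Iterable[str],
    diagnoses: List[Set[str]],
    scopes: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str], Set[str]]:
    """Return (G, B, S, U) from the set expressions of Theorem 1."""
    fs = set(failure_sources)
    # B: union of all diagnoses of cardinality one.
    bad = set().union(*(d for d in diagnoses if len(d) == 1))
    # S: union of all diagnoses of cardinality greater than one, excluding B.
    suspect = set().union(*(d for d in diagnoses if len(d) > 1)) - bad
    # U: failure sources whose scope is empty.
    unknown = {f for f in fs if not scopes.get(f)}
    # G: everything that is left.
    good = fs - (bad | suspect | unknown)
    return good, bad, suspect, unknown

# Hypothetical example data (assumed, not taken from the figures).
example_fs = ["C1", "C2", "C3", "C4", "C5"]
example_diagnoses = [{"C2"}, {"C2", "C3"}, {"C3", "C4"}]
example_scopes = {"C1": {"t1"}, "C2": {"t2"}, "C3": {"t3"}, "C4": {"t3"}, "C5": set()}
G, B, S, U = classify(example_fs, example_diagnoses, example_scopes)
```

In this hypothetical case, C2 is the only single-element diagnosis and is therefore Bad, C3 and C4 remain Suspected, C5 has an empty scope and is Unknown, and C1 is Good.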
As an illustration of the theorem, taking test results (0,0,0,1,1,0) for the network illustrated in
It is appreciated that a possible solution to the distributed diagnosis problem is to keep a D-matrix for an entire system in a central diagnosis processing unit. In this manner, the central unit can collect test results from all the agents in the system and obtain a global diagnosis directly using the algorithm RTC. Although this approach can reduce the memory and computational requirements for local agents, it has major shortcomings. For example, it can be difficult to keep the overall model up to date as engineering design changes occur, particularly since such a solution does not support plug-and-play for future control and monitoring developments and/or enhancements.
In the alternative, the inventive distributed diagnostic approach does not require a central unit. Since the overall digraph model is decomposed into individual digraph models for each subsystem node, the local agent does not know, at least initially, the input(s)/output(s) status of its local model. However, two modifications to each local diagnosis agent can be made to handle this uncertainty. In particular, each subsystem input is treated as a potential failure source and each subsystem output is treated as a test point.
Treating each subsystem input as a potential failure source accounts for the fact that a bad output from upstream can cause tests within the local subsystem to fail. As such, all the subsystem inputs are listed as failure sources and the D-matrix has additional rows corresponding to input faults. When each subsystem output is treated as a test point, the test points are pseudo-tests, whose outcomes are inferred by the local diagnosis agent. Therefore, the D-matrix has the pseudo-tests added as columns. In addition, and unlike regular tests, there are four outcomes for pseudo-tests: Pass, Fail, Suspect and Unknown. After making these two modifications, the D-matrices for the three local models/agents Agent1 410, Agent2 420 and Agent3 430 become as shown in Tables 2, 3 and 4, respectively.
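For illustrative purposes only, the two modifications can be sketched in Python as an augmentation of a local D-matrix with input rows and pseudo-test columns. The sketch assumes that the reachability information (which tests an input can cause to fail, and which components or inputs reach an output) is supplied by the caller. All names and the example data are hypothetical and do not reproduce Tables 2, 3 or 4.

```python
from typing import Dict, List, Set, Tuple

def augment_d_matrix(
    rows: List[str],
    cols: List[str],
    d: List[List[int]],
    inputs: Dict[str, Set[str]],   # input -> real tests it can cause to fail
    outputs: Dict[str, Set[str]],  # output -> components/inputs that reach it
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Add one row per subsystem input and one column per subsystem output."""
    new_rows = rows + list(inputs)
    new_cols = cols + list(outputs)
    matrix = []
    for i, r in enumerate(new_rows):
        line = []
        for j, c in enumerate(new_cols):
            if j >= len(cols):
                # Pseudo-test column: 1 if this row reaches the output.
                line.append(int(r in outputs[c]))
            elif i < len(rows):
                # Original component row and real test column.
                line.append(d[i][j])
            else:
                # Input row and real test column.
                line.append(int(c in inputs[r]))
        matrix.append(line)
    return new_rows, new_cols, matrix

# Hypothetical local model: two components, one test, one input, one output.
aug_rows, aug_cols, aug = augment_d_matrix(
    rows=["C3", "C4"],
    cols=["T1"],
    d=[[1], [0]],
    inputs={"I1": {"T1"}},
    outputs={"O1": {"C3", "C4", "I1"}},
)
```

In this hypothetical case the augmented matrix has three rows (C3, C4, I1) and two columns (T1, O1), with the input row marking every test and pseudo-test that a bad upstream output could affect.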
It is appreciated that the basic principle of distributed diagnosis is that each agent performs its own diagnosis first using the centralized diagnosis algorithm RTC based on its D-matrix after the modifications. Then, each agent computes the output status. Finally, each local agent broadcasts all changed output/input status results to downstream and upstream agents. After obtaining information from neighboring agents, each agent iteratively revises its diagnosis based on information from upstream (input status) and downstream (output status) agents until it can no longer update the input/output status information, i.e. convergence occurs.
For illustrative purposes, an example step-by-step process is described.
To start the distributed diagnosis process, the following initializations can be performed for each agent.
To avoid storing a digraph in each agent, the pseudo-test list Ltj, which contains all the outputs reached by test tj, can be pre-computed via reachability analysis and stored in the memory. For example, the pseudo-test list LT(A2,1) for test T(A2,1) of Agent2 is LT(A2,1)={O(A2,1), O(A2,2)}.
For input Ii, there is an output list LIi that contains all the pseudo tests (outputs) from the upstream agents that are linked to this input. All the pseudo tests in this list are initialized as Unknown. For example, the input list LI(A3,1) for input I(A3,1) of Agent3 is LI(A3,1)={O(A2,1)(U), O(A1,1)(U)}.
Each agent Ai tracks the local diagnosis status of itself and its neighboring agents. The local diagnosis (LD) status list LDi contains the agents' names and their LD statuses. There are two status indicators for local diagnosis: ‘1’ if the local diagnosis (or local diagnosis update) and the computation of pseudo-test status at the local agent are done, and ‘−1’ if the local diagnosis is still running. All LD statuses are initialized as −1.
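For illustrative purposes only, the three initializations above can be sketched in Python. The local digraph for Agent2 is hypothetical (including the intermediate node "C_x") and is chosen only so that the reachability result matches the pseudo-test list LT(A2,1) stated above; the function and variable names are assumptions.

```python
from collections import deque
from typing import Dict, List, Set

def reachable_outputs(graph: Dict[str, List[str]], start: str, outputs: Set[str]) -> Set[str]:
    """Breadth-first reachability: outputs that lie downstream of `start`."""
    seen, queue, found = {start}, deque([start]), set()
    while queue:
        node = queue.popleft()
        if node in outputs:
            found.add(node)
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found

# Hypothetical digraph fragment for Agent2 (assumed, not from the figures).
graph_a2 = {"T(A2,1)": ["O(A2,1)", "C_x"], "C_x": ["O(A2,2)"]}
outputs_a2 = {"O(A2,1)", "O(A2,2)"}

# Pseudo-test list, pre-computed once so the digraph need not be stored.
pseudo_test_list = {"T(A2,1)": reachable_outputs(graph_a2, "T(A2,1)", outputs_a2)}

# Input list for Agent3: every linked upstream output starts as Unknown ("U").
input_list_a3 = {"I(A3,1)": {"O(A2,1)": "U", "O(A1,1)": "U"}}

# LD status list kept by Agent3: -1 means the local diagnosis is still running.
ld_status_a3 = {"Agent1": -1, "Agent2": -1, "Agent3": -1}
```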
After the two modifications of each local agent detailed above are made, the D-matrix for the local agent will add rows for inputs (potential failure sources) and columns for outputs (pseudo-tests). For the inputs (potential failure sources), the inference will be performed using algorithm RTC. For the outputs (pseudo-tests), the initial values are set to Unknown.
The status of pseudo-tests after the local diagnosis is computed as follows:
Rule (9): If a test tj in the subsystem fails, set all the output pseudo-tests in Ltj that are reachable downstream from this test as having Failed. For example, the downstream pseudo-test list for test T(A2,1) is {O(A2,1),O(A2,2)}. If T(A2,1) fails, the pseudo-tests {O(A2,1),O(A2,2)} will be set to fail.
Rule (10): If all components within a subsystem reaching an output test are good, declare this pseudo-test as Pass. This rule can be executed using the D-matrix as follows: for each pseudo-test tj, corresponding to a column djT of the D-matrix, if fsi∈G for every i with dij=1, set this pseudo-test to Pass. For example, for Agent2 in
G, set pseudo-test O(A2,2) to Pass.
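For illustrative purposes only, Rules (9) and (10) can be sketched in Python. The sketch assumes that a Fail from Rule (9) takes precedence over a Pass from Rule (10) when both apply, and that pseudo-tests covered by neither rule stay Unknown; this ordering is an assumption of the sketch. The component names "C_a" and "C_b" are hypothetical.

```python
from typing import Dict, Set

def pseudo_test_status(
    failed_tests: Set[str],
    pseudo_test_list: Dict[str, Set[str]],  # test -> outputs reachable from it
    output_column: Dict[str, Set[str]],     # output -> failure sources with dij = 1
    good: Set[str],
) -> Dict[str, str]:
    """Infer the status of each output pseudo-test after local diagnosis."""
    status = {o: "Unknown" for o in output_column}
    # Rule (10): every failure source reaching the output is Good -> Pass.
    for o, sources in output_column.items():
        if sources <= good:
            status[o] = "Pass"
    # Rule (9): a failed test marks all outputs downstream of it as Fail.
    for t in failed_tests:
        for o in pseudo_test_list.get(t, ()):
            status[o] = "Fail"
    return status

ltj = {"T(A2,1)": {"O(A2,1)", "O(A2,2)"}}
columns = {"O(A2,1)": {"C_a"}, "O(A2,2)": {"C_a", "C_b"}}

# Rule (9): T(A2,1) fails, so both downstream pseudo-tests fail.
after_failure = pseudo_test_status({"T(A2,1)"}, ltj, columns, good=set())
# Rule (10): no test fails and both hypothetical components are Good.
after_pass = pseudo_test_status(set(), ltj, columns, good={"C_a", "C_b"})
```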
After the agents perform their local diagnoses, they coordinate with each other to update their input/output status and local diagnoses. By the nature of the digraph, a downstream agent receives the outputs of its upstream agents. Therefore, it is appropriate that the input/output analysis be performed at the downstream agent. Illustrative examples of the input/output analysis under different scenarios can be:
It is appreciated that input/output status changes occur during the input/output analysis and that these changes will affect the analysis if it is not sequenced correctly. Accordingly, an input/output analysis (IOA) algorithm that guarantees the correct execution order can be:
LIi, Oj ≠ P to P (Rule 5)
LIi, set Ii = B (Rule (2,4))
LIi, Oj ≠ P, set Oj = S (Rule (7))
LIi, Oj = U to S (Rule (8))
After an agent receives changes for the inputs/outputs, the local diagnosis is updated. Since input changes may trigger output changes, inputs are processed first. For all inputs (IG) that change to Good, the set of Good components is updated as G←G∪IG. For each input that changes to Bad, the pseudo-tests that have a 1 in the row of this input are updated to Fail (if they have not already Failed). Thereafter, the agent treats the changed outputs (pseudo-tests) as new test results and updates the local diagnosis according to Steps 2-4 of algorithm RTC as described above.
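For illustrative purposes only, the input-processing part of the local diagnosis update can be sketched in Python. Only the handling of inputs that change to Good or Bad is shown; the subsequent re-run of the RTC steps is not shown. The function name, argument names and example data are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

def apply_input_changes(
    good: Set[str],
    pseudo_status: Dict[str, str],
    input_rows: Dict[str, Set[str]],  # input -> pseudo-tests with a 1 in its row
    inputs_now_good: Iterable[str],
    inputs_now_bad: Iterable[str],
) -> Tuple[Set[str], Dict[str, str], Set[str]]:
    """Process input changes first; return new G, new statuses, changed outputs."""
    new_good = set(good) | set(inputs_now_good)  # G <- G U IG
    new_status = dict(pseudo_status)
    changed = set()
    for inp in inputs_now_bad:
        for out in input_rows.get(inp, ()):
            if new_status.get(out) != "Fail":    # only update if not already Failed
                new_status[out] = "Fail"
                changed.add(out)
    return new_good, new_status, changed

# Hypothetical agent state: I1 becomes Good, I2 becomes Bad.
g_new, status_new, changed_outputs = apply_input_changes(
    good={"C1"},
    pseudo_status={"O1": "Unknown", "O2": "Fail"},
    input_rows={"I1": {"O1"}, "I2": {"O1", "O2"}},
    inputs_now_good=["I1"],
    inputs_now_bad=["I2"],
)
```

The set of changed outputs returned here is what the agent would then treat as new test results and, per the description, broadcast to its neighboring agents.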
The asynchronous communication among agents is triggered by input/output status changes and LD status changes. The real-time global (RTG) diagnosis algorithm can be summarized as:
To prove the correctness of the RTG algorithm, two lemmas are first introduced.
Lemma 1: The Local diagnoses {Gi, Bi, Si, Ui}, Ai∈A for each agent at each iteration are disjoint.
Proof: Intuitively, the local diagnosis in the diagnosis problem (D, T, X) provides the status (Good/Bad/Suspected/Unknown) for each failure source and these failure sources at each agent are independent of failure sources associated with other agents. Also, ∪FSi=FS. Therefore, all the local diagnoses are disjoint.
Lemma 2: Given current local diagnosis {Giold, Biold, Siold, Uiold} and input/output changes, the result of local diagnosis update {Ginew, Binew, Sinew, Uinew} is such that |Gnew|≧|Gold|, |Bnew|≧|Bold|, |Unew|≦|Uold| and |Snew|≦|Sold|. The local diagnosis update always generates the status for a failure source in the following order: Unknown→Suspected→Good/Bad.
Proof: From the algorithm RTG and according to Lemma 1, the local diagnoses are disjoint, which means that Gi⊆Gig and Bi⊆Big. For an input status change, each input (potential failure source) will move from Unknown/Suspected→Bad/Good. For a pseudo-test (output) change, the failure sources will change their status from Unknown→Suspected/Bad/Good or Suspected→Bad/Good.
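For illustrative purposes only, the ordering property of Lemma 2 can be sketched in Python as a check that an update only moves a failure source forward through Unknown, Suspected and Good/Bad. The numeric ranks and the function name are assumptions introduced for this sketch.

```python
from typing import Dict

# Assumed ranks: Good and Bad are both final states.
RANK = {"Unknown": 0, "Suspected": 1, "Good": 2, "Bad": 2}

def is_monotone_update(old: Dict[str, str], new: Dict[str, str]) -> bool:
    """True if every failure source keeps its status or moves to a higher rank."""
    return all(
        new[f] == state or RANK[new[f]] > RANK[state]
        for f, state in old.items()
    )

before = {"C1": "Unknown", "C2": "Suspected", "C3": "Good"}
valid_after = {"C1": "Suspected", "C2": "Bad", "C3": "Good"}
invalid_after = {"C1": "Suspected", "C2": "Bad", "C3": "Suspected"}
```

Here `is_monotone_update(before, valid_after)` holds, whereas the second update is rejected because it would move a Good failure source back to Suspected.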
The result of algorithm RTG is given by the following theorem.
Theorem 2: Given the global diagnosis {Gg, Bg, Sg, Ug} based on the entire system D-matrix and test outcomes, if all local diagnoses {Gi, Bi, Si, Ui} for all agents Ai∈A are available and LDi=1, then, when Algorithm RTG converges, {Gg, Bg, Sg, Ug}=∪{Gi, Bi, Si, Ui}.
Proof: From Lemma 1, it follows that all the local diagnoses are disjoint. Lemma 2 states that each local diagnosis update will always improve the diagnosis for failure sources in the order Unknown→Suspected→Good/Bad. When there is no input/output change and all the local diagnoses have converged (LD=1), the global diagnosis is the union of all the local diagnoses of failure sources (excluding the inputs).
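For illustrative purposes only, the union stated in Theorem 2 can be sketched in Python. The sketch assumes that each agent reports its local diagnosis as four sets and that the subsystem inputs (which are pseudo failure sources) are known so they can be excluded. The function name and the example data are hypothetical.

```python
from typing import Dict, Optional, Set

def merge_local_diagnoses(
    local: Dict[str, Dict[str, Set[str]]],  # agent -> {"G","B","S","U"} sets
    ld_status: Dict[str, int],              # agent -> 1 (done) or -1 (running)
    inputs: Set[str],                       # subsystem inputs to exclude
) -> Optional[Dict[str, Set[str]]]:
    """Union of converged local diagnoses; None if any agent is still running."""
    if any(ld_status.get(agent) != 1 for agent in local):
        return None
    merged: Dict[str, Set[str]] = {k: set() for k in ("G", "B", "S", "U")}
    for diagnosis in local.values():
        for key in merged:
            merged[key] |= diagnosis[key] - inputs
    return merged

# Hypothetical local results for two agents.
local_results = {
    "Agent1": {"G": {"C1"}, "B": {"C2"}, "S": set(), "U": set()},
    "Agent2": {"G": {"C3", "I(A2,1)"}, "B": set(), "S": {"C4"}, "U": set()},
}
global_diagnosis = merge_local_diagnoses(
    local_results, {"Agent1": 1, "Agent2": 1}, inputs={"I(A2,1)"}
)
```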
A summary of the step-by-step process is shown in
In order to test the real-time global (RTG) diagnosis algorithm and compare it with the real-time centralized (RTC) algorithm a number of real-world models with reliable tests were evaluated. The sizes of the real-world models (m, n) varied from (8, 4) to (5206, 3720) in the number of failure sources, m, and number of tests, n. The real-world models included were:
Early External Thermal Control System is a model of a temporary thermal system, which is needed until the components of the permanent ETCS (External Thermal Control System) are launched and activated in a space station. It consists of radiators, heat exchangers, pumps, lines, valves, etc.
Detailed information about the models, such as the density of the D-matrix, average in-degree and average out-degree of each model is listed in Table 5.
(a) Approximate execution time T for each agent = execution time / number of subsystems
As shown in Table 5, the RTC algorithm and the RTG algorithm work with the same efficiency on models with fewer than 500 failure sources, where the approximate execution time of the RTG algorithm is comparable to the execution time of the RTC algorithm. However, for larger systems, the approximate execution time for the RTG algorithm (seconds in parentheses) is significantly shorter than the execution time for the RTC algorithm.
In summary, an agent-based real-time distributed diagnosis algorithm is provided to support a networked embedded distributed system. A distributed diagnosis algorithm (RTG) decomposes the vehicle diagnosis problem into domain-specific subsystems (digraph models). By communicating input/output status indicators between neighboring agents, each agent acquires information about its neighbors and improves its local diagnosis iteratively. The algorithm converges after all the local agents finish their local diagnosis (or update) and there is no communication (“silence”) over the network. In addition, a proof of the correctness of the algorithm is provided. The distributed diagnosis algorithm RTG has been evaluated on several real-world examples and the algorithm was found to be superior to the centralized diagnosis algorithm RTC for large systems with many subsystems.