The subject matter disclosed herein relates to maintenance data systems, and in particular to latency tolerant fault isolation in a maintenance data system.
Real-time health or maintenance monitoring in a complex system can involve monitoring thousands of inputs as evidence of a potential fault or maintenance issue. A complex system can involve many subsystems which may have individual failure modes and cross-subsystem failure modes. Simple fault identification provided by built-in tests can be helpful in identifying localized issues but may also represent symptoms of larger-scale issues that involve other subsystems or components. For example, detecting a temperature fault in a hydraulic line could result from a sensor error, an electrical connector issue, a hydraulic fluid leak, environmental factors, an actuator fault, or other factors. Isolating and identifying the most likely source of a fault and associated maintenance actions to address the fault can be challenging in a complex system, particularly when performed as a real-time process.
According to one aspect of the invention, a method of latency tolerant fault isolation is provided. The method includes receiving, by a maintenance data computer, evidence associated with a test failure. The maintenance data computer accesses metadata to identify a system failure mode associated with the evidence and other potential evidence associated with the system failure mode. The maintenance data computer determines a maximum predicted latency to receive the potential evidence associated with the system failure mode based on the metadata. The method also includes waiting up to the maximum predicted latency to determine whether one or more instances of the potential evidence associated with the system failure mode are received as additional evidence. The maintenance data computer diagnoses the system failure mode as a fault based on the evidence and the additional evidence.
According to another aspect of the invention, a system for latency tolerant fault isolation is provided. The system includes a plurality of monitored subsystems and a maintenance data computer coupled to the monitored subsystems. The maintenance data computer includes a processing circuit configured to receive evidence associated with a test failure. Metadata is accessed to identify a system failure mode associated with the evidence and other potential evidence associated with the system failure mode. A maximum predicted latency to receive the potential evidence associated with the system failure mode based on the metadata is determined. The processing circuit is further configured to wait up to the maximum predicted latency to determine whether one or more instances of the potential evidence associated with the system failure mode are received as additional evidence. The system failure mode is diagnosed as a fault based on the evidence and the additional evidence.
Another aspect includes a non-transitory computer-readable medium, having stored thereon program code which, when executed, controls a maintenance data computer to perform a method. The method includes receiving evidence associated with a test failure. The maintenance data computer accesses metadata to identify a system failure mode associated with the evidence and other potential evidence associated with the system failure mode. The maintenance data computer determines a maximum predicted latency to receive the potential evidence associated with the system failure mode based on the metadata. The method also includes waiting up to the maximum predicted latency to determine whether one or more instances of the potential evidence associated with the system failure mode are received as additional evidence. The maintenance data computer diagnoses the system failure mode as a fault based on the evidence and the additional evidence.
These and other advantages and features will become more apparent from the following description taken in conjunction with the drawings.
The subject matter, which is regarded as the invention, is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other features and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
The detailed description explains embodiments of the invention, together with advantages and features, by way of example with reference to the drawings.
In exemplary embodiments, a dependency model bigraph metadata model is used to identify relationships between evidence provided by monitored subsystems and potential system failure modes. The evidence may be provided by built-in tests which can run over a period of time. In order to diagnose a system failure mode as a fault, multiple pieces of evidence may be needed. Each piece of evidence may not arrive at the same time, as some failures are rapidly detected, while others have greater latency. Rather than simply looking at test results for other related failures upon identifying a failure, embodiments analyze an associated dependency matrix to determine a maximum predicted latency from the failure to additional evidence generation. A weighted bigraph can be traversed to allow all applicable latencies to elapse prior to a failure resolution decision. Once a sufficient period of time has elapsed for all potential evidence to be received, a maintenance decision can be made with a higher likelihood of accuracy. The period of time may be reduced if all evidence is received prior to reaching the maximum predicted latency. Although embodiments herein are described in terms of a vehicle-based maintenance data system, with a specific example of a rotorcraft depicted, it will be understood that embodiments can include any type of maintenance data system.
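As a rough illustration of the latency lookup described above, the following sketch models the weighted bigraph as a mapping from each potential system failure mode to its potential evidence, with each link weighted by a predicted latency. The names `BIGRAPH`, `failure_modes_for`, and `max_predicted_latency`, the evidence labels, and the latency values are all hypothetical and are not taken from the disclosure. The sketch also assumes that the wait is simply the largest latency among the other potential evidence linked to the failure mode.

```python
# Hypothetical weighted bigraph: failure mode -> {potential evidence: latency in seconds}.
# All labels and latency values are illustrative assumptions.
BIGRAPH = {
    "FM_hydraulic_leak": {
        "E_temp_high": 0.5,
        "E_pressure_low": 4.0,
        "E_fluid_level": 30.0,
    },
    "FM_sensor_fault": {"E_temp_high": 0.5},
}


def failure_modes_for(evidence, bigraph=BIGRAPH):
    """Return the failure modes linked to a piece of received evidence."""
    return sorted(fm for fm, edges in bigraph.items() if evidence in edges)


def max_predicted_latency(evidence, failure_mode, bigraph=BIGRAPH):
    """Return the longest latency among the other potential evidence of a failure mode.

    Returns 0.0 when the failure mode has no other potential evidence,
    in which case no waiting is needed.
    """
    others = {e: w for e, w in bigraph[failure_mode].items() if e != evidence}
    return max(others.values(), default=0.0)
```

Under these assumed values, evidence `E_temp_high` maps to two candidate failure modes, and only the hydraulic leak candidate requires a wait before a decision.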
The processing circuit 202 is configured to execute program code 212 that performs a method of latency tolerant fault isolation. The program code 212 may be stored on the non-volatile memory 204 as a non-transitory computer-readable medium and executed directly from the non-volatile memory 204 or copied to the volatile memory 206 and/or to the processing circuit 202 for execution by the processing circuit 202. The processing circuit 202 executes the program code 212 that performs the functionality as previously described and further described herein.
The non-volatile memory 204 may hold metadata 214 that includes one or more sparse matrices 216 for a dependency model bigraph metadata model which relates evidence to potential system failure modes. The one or more sparse matrices 216 can be partitioned to separate data of the monitored subsystems 104 of
The one or more full matrices 218 are each a dependency model bigraph metadata model linking test failures of the monitored subsystems 104 of
Certain instances of potential evidence can impact multiple subsystem failure modes. In this example, potential evidence 302c is linked to both potential system failure mode 306b of subsystem failure 304a via link 308c and to the potential system failure mode 306c of subsystem failure 304b via link 308d. Accordingly, the subsystem failures 304a and 304b are related and can be analyzed using one full matrix, while failures and evidence associated with the subsystem failures 304c may be partitioned into a separate matrix of the one or more full matrices 218 of
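One possible way to derive such partitions is to treat the links as edges of a bigraph and group them by connected component, so that failure modes sharing any potential evidence fall into the same matrix. The sketch below is an assumption about how this could be done, not the disclosed implementation. The function name `partition_links` is hypothetical, and the link from evidence 302f to failure mode 306e is an invented placeholder standing in for an unrelated subsystem.

```python
from collections import defaultdict


def partition_links(links):
    """Group (evidence, failure_mode) links into independent partitions.

    Two links share a partition when they are connected through common
    evidence or a common failure mode (connected components of the bigraph).
    """
    adjacency = defaultdict(set)
    for evidence, failure_mode in links:
        adjacency[("E", evidence)].add(("F", failure_mode))
        adjacency[("F", failure_mode)].add(("E", evidence))

    seen = set()
    partitions = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first walk collecting every node reachable from `start`.
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        partitions.append(sorted(l for l in links if ("E", l[0]) in component))
    return partitions


links = [
    ("302c", "306b"),  # link 308c
    ("302c", "306c"),  # link 308d
    ("302d", "306b"),  # link 308e
    ("302f", "306e"),  # hypothetical link for an unrelated subsystem
]
partitions = partition_links(links)
```

Here the first three links end up in one partition because evidence 302c bridges failure modes 306b and 306c, while the placeholder link forms a partition of its own.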
If evidence 301 associated with a test failure is received that maps to potential evidence 302d, an association with the potential system failure mode 306b can be determined based on the link 308e by accessing the one or more full matrices 218 in the metadata 215 of
Where there is no other potential evidence needed for a system failure mode, the evidence is classified as strong evidence; otherwise, the evidence can be classified as weak evidence. For weak evidence, waiting up to a maximum predicted latency may be needed to determine whether one or more instances of the potential evidence associated with the system failure mode are received as additional evidence before diagnosing the system failure mode as a fault.
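The strong/weak distinction can be sketched as a simple set test. The example assumes the metadata is available as a mapping from each failure mode to the set of potential evidence linked to it. The function name `classify_evidence` and the model contents are hypothetical.

```python
def classify_evidence(evidence, failure_mode, model):
    """Classify received evidence relative to one candidate failure mode.

    Evidence is "strong" when the failure mode has no other potential
    evidence, and "weak" when other potential evidence may still arrive.
    """
    other_potential_evidence = set(model[failure_mode]) - {evidence}
    return "weak" if other_potential_evidence else "strong"


# Hypothetical model: failure mode -> set of potential evidence.
model = {
    "FM_sensor_fault": {"E_temp_high"},
    "FM_hydraulic_leak": {"E_temp_high", "E_pressure_low"},
}
```

Note that the same piece of evidence can be strong for one failure mode and weak for another, which is why the classification takes the failure mode as an argument.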
At block 404, the maintenance data computer 102 determines whether new evidence exists. The maintenance data computer 102 can receive evidence 301 associated with a test failure, for example, from one of the monitored subsystems 104. The maintenance data computer 102 accesses the metadata 215 to identify a system failure mode 305 associated with the evidence 301 and other potential evidence associated with the system failure mode 305.
At block 406, if new evidence is not received, then flow returns to block 404; otherwise, latency processing is performed at block 408. The maintenance data computer 102 determines a maximum predicted latency to receive the potential evidence associated with the system failure mode 305 based on the metadata 215. Using the timer 208, the maintenance data computer 102 can wait up to the maximum predicted latency to determine whether one or more instances 303 of the potential evidence associated with the system failure mode 305 are received as additional evidence.
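The bounded wait can be sketched with a monotonic deadline and a blocking queue that stands in for the evidence feed from the monitored subsystems. This is a minimal sketch under stated assumptions: the function name `collect_additional_evidence`, the use of `queue.Queue` as the evidence source, and the evidence labels are all hypothetical, and the timer 208 is approximated by `time.monotonic()`.

```python
import queue
import time


def collect_additional_evidence(inbox, expected, max_latency_s):
    """Wait up to max_latency_s for the expected evidence to arrive on inbox.

    Returns (received, still_pending). The wait ends early as soon as every
    expected item has been received.
    """
    pending = set(expected)
    received = set()
    deadline = time.monotonic() + max_latency_s
    while pending:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # maximum predicted latency has elapsed
        try:
            item = inbox.get(timeout=remaining)
        except queue.Empty:
            break  # nothing more arrived before the deadline
        if item in pending:
            pending.discard(item)
            received.add(item)
    return received, pending


# Usage: evidence that has already arrived is consumed without waiting.
inbox = queue.Queue()
inbox.put("E_pressure_low")
inbox.put("E_fluid_level")
received, pending = collect_additional_evidence(
    inbox, {"E_pressure_low", "E_fluid_level"}, max_latency_s=5.0
)
```

Recomputing the remaining time on each pass keeps the total wait bounded by the maximum predicted latency even when unrelated evidence arrives in between.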
The evidence 301 may be classified as strong evidence based on determining that there is no potential evidence associated with the system failure mode 305 based on the metadata 215. The evidence 301 may be classified as weak evidence based on determining that there is potential evidence associated with the system failure mode 305 based on the metadata 215.
At block 410, strong evidence is processed. Multiple instances of the strong evidence can be processed in parallel as there is no time dependency. At block 412, if there is only strong evidence, then the maintenance action is resolved, and flow proceeds to block 414. At block 414, the system failure mode 305 is diagnosed as a fault 310 by the maintenance data computer 102 based on the evidence 301 and a corresponding maintenance work order is generated. Flow then returns to block 404.
At block 412, if there is weak evidence, then the weak evidence is processed at block 416 after processing any instances of the strong evidence. If the weak evidence can be resolved because all corresponding instances 303 of potential evidence have been received as additional evidence, then the maintenance action is resolved at block 418 and the flow continues to block 414; otherwise, the flow returns to block 404. For weak evidence, the system failure mode 305 can be diagnosed as the fault 310 before the maximum predicted latency has elapsed, upon receiving all of the instances 303 of the potential evidence associated with the system failure mode 305.
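The decision portion of this flow (blocks 410 through 418) can be approximated by a single pass over the evidence observed so far. The sketch below is illustrative only: the function name `process_evidence` and the model contents are hypothetical, the latency wait of block 408 is omitted, and failure modes that cannot yet be resolved are simply reported as pending so that a caller could revisit them on the next pass through block 404.

```python
def process_evidence(observed, model):
    """Resolve failure modes from the evidence observed so far.

    observed: set of evidence labels received.
    model: failure mode -> set of potential evidence (hypothetical metadata).
    Returns (diagnosed, pending) lists of failure mode names.
    """
    candidates = sorted(fm for fm, required in model.items() if required & observed)
    strong = [fm for fm in candidates if len(model[fm]) == 1]
    weak = [fm for fm in candidates if len(model[fm]) > 1]

    diagnosed, pending = [], []
    # Strong evidence has no time dependency, so it is resolved first.
    for fm in strong:
        diagnosed.append(fm)
    # Weak evidence resolves only once all of its potential evidence is in.
    for fm in weak:
        if model[fm] <= observed:
            diagnosed.append(fm)
        else:
            pending.append(fm)
    return diagnosed, pending


model = {
    "FM_a": {"E1"},
    "FM_b": {"E2", "E3"},
    "FM_c": {"E4", "E5"},
}
diagnosed, pending = process_evidence({"E1", "E2", "E3", "E4"}, model)
```

In this example `FM_a` resolves on strong evidence, `FM_b` resolves because both of its evidence items have arrived, and `FM_c` stays pending until `E5` is received.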
Technical effects include providing enhanced fault isolation by accounting for variations in latency between identifying evidence and other related instances of potential evidence associated with a system failure mode before declaring a fault. Embodiments of the invention encompass performing latency tolerant fault isolation on a maintenance data computer. Embodiments also relate to computer-readable media, such as memory, flash chips, flash drives, hard disks, optical disks, magnetic disks, or any other type of computer-readable media capable of storing a computer program to perform latency tolerant fault isolation on a maintenance data computer.
While the invention has been described in detail in connection with only a limited number of embodiments, it should be readily understood that the invention is not limited to such disclosed embodiments. Rather, the invention can be modified to incorporate any number of variations, alterations, substitutions or equivalent arrangements not heretofore described, but which are commensurate with the spirit and scope of the invention. Additionally, while various embodiments of the invention have been described, it is to be understood that aspects of the invention may include only some of the described embodiments. Accordingly, the invention is not to be seen as limited by the foregoing description, but is only limited by the scope of the appended claims.
This invention was made with government support under contract number N00019-06-C-0061 awarded by the United States Navy. The government has certain rights in the invention.