Claims
- 1. A method for diagnosis of a system made up of a plurality of interlinked modules, comprising:
receiving an alarm from the system indicative of a fault in one of the modules; responsive to the alarm, constructing a causal network associating the fault with malfunctions in one or more of the modules that may have led to the fault and relating a conditional probability of the fault to respective probabilities of the malfunctions; based on the alarm and the causal network, updating at least one of the probabilities of the malfunctions; and proposing a diagnosis of the alarm responsive to the updated probabilities.
- 2. A method according to claim 1, wherein receiving the alarm comprises gathering event reports from the plurality of the modules in the system, and extracting the alarm from the event reports.
- 3. A method according to claim 2, wherein gathering the event reports comprises receiving a report of a change in configuration of the system, and wherein constructing the causal network comprises constructing the causal network based on the changed configuration.
- 4. A method according to claim 3, wherein constructing the causal network based on the changed configuration comprises maintaining a database in which the configuration is recorded, and updating the database responsive to the report of the change in the configuration, for use in constructing the causal network.
- 5. A method according to claim 2, wherein extracting the alarm comprises extracting a sequence of alarms occurring at mutually proximal times, including the alarm indicative of the fault in the one of the modules, and wherein updating the at least one of the probabilities comprises processing the sequence of the alarms so as to update the probabilities.
- 6. A method according to claim 5, wherein extracting the sequence of the alarms comprises defining respective lifetimes for the alarms, responsive to expected delays in receiving the alarms from the system, and selecting the alarms to extract from the sequence responsive to the respective lifetimes.
- 7. A method according to claim 6, wherein selecting the alarms to extract comprises selecting the alarms that occurred within their respective lifetimes of a time of occurrence of the alarm indicative of the fault in the one of the modules responsive to which the causal network is constructed.
- 8. A method according to claim 5, wherein constructing the causal network comprises defining an expected alarm that would be caused by one of the malfunctions in the one or more of the modules, and wherein processing the sequence of the alarms comprises updating the probabilities responsive to an occurrence of the expected alarm in the extracted sequence of alarms.
- 9. A method according to claim 5, wherein constructing the causal network comprises defining a template comprising a group of nodes in the network corresponding to a category of the modules in the system and an expected alarm that would be caused by one of the malfunctions in the modules in the category, and instantiating the template in the causal network responsive to an occurrence of the expected alarm in the extracted sequence of alarms.
- 10. A method according to claim 1, wherein the plurality of interlinked modules comprises multiple instances of a given one of the modules interlinked in a regular pattern, and wherein constructing the causal network comprises defining a template comprising a group of nodes in the network corresponding to the given one of the modules, and instantiating the template with respect to one or more of the modules responsive to the alarm.
- 11. A method according to claim 10, wherein defining the template comprises identifying an expected alarm that would be caused by one of the malfunctions in one of the instances of the given one of the modules, and wherein instantiating the template comprises adding an instance of the template to the network responsive to an occurrence of the expected alarm.
- 12. A method according to claim 1, wherein constructing the causal network comprises identifying a local fault condition in the one of the modules in which the fault occurred, and responsive to the local fault condition, linking the fault in the causal network to one of the malfunctions occurring in the one of the modules.
- 13. A method according to claim 1, wherein constructing the causal network comprises identifying a first fault condition occurring in a first one of the modules due to a connection with a second one of the modules in the system, and responsive to the first fault condition, linking the fault in the causal network with a second fault condition occurring in the second one of the modules.
- 14. A method according to claim 13, wherein linking the fault comprises determining that a possible cause of the second fault condition is due to a further connection between the second one of the modules and a third one of the modules in the system, and responsive to the further connection, linking the fault in the causal network with a third fault condition occurring in the third one of the modules.
- 15. A method according to claim 1, wherein constructing the causal network comprises adding to the causal network multiple occurrences of one of the malfunctions responsive to the respective probabilities of the malfunctions, and linking the fault in the causal network to the multiple occurrences.
- 16. A method according to claim 15, wherein linking the fault to the multiple occurrences comprises determining one or more fault conditions that are caused by each of the occurrences, and linking at least some of the fault conditions to the fault.
- 17. A method according to claim 1, wherein updating the at least one of the probabilities of the malfunctions comprises assessing a mean time between failures of the one or more of the modules.
- 18. A method according to claim 1, wherein the probabilities of the malfunctions are defined in terms of a probability distribution having a mean and a moment, and wherein updating the at least one of the probabilities comprises reassessing the mean and the moment of the distribution.
- 19. A method according to claim 18, wherein the probability distribution comprises a failure rate distribution, and wherein reassessing the mean and the moment comprises updating the failure rate distribution using a Bayesian Reliability Theory model.
- 20. A method according to claim 1, wherein proposing the diagnosis comprises comparing one or more of the updated probabilities to a predetermined threshold, and invoking diagnostic action when the one of the probabilities exceeds the threshold.
- 21. A method according to claim 20, wherein invoking the diagnostic action comprises notifying a user of the system of the diagnosis.
- 22. A method according to claim 21, wherein notifying the user comprises providing an explanation of the diagnosis based on the causal network.
- 23. A method according to claim 20, wherein invoking the diagnostic action comprises performing a diagnostic test to verify the malfunctions, wherein the test is selected responsive to the one of the probabilities exceeding the threshold.
- 24. A method according to claim 23, and comprising modifying the causal network responsive to a result of the diagnostic test.
- 25. A method for diagnosis of a system made up of a plurality of interlinked modules, comprising:
constructing a causal network associating a fault in one of the modules with malfunctions in two or more of the modules that may have led to the fault and relating a conditional probability of the fault to respective probability distributions of the malfunctions; responsive to an alarm from the system indicative of the fault, updating the probability distributions of the malfunctions; and proposing a diagnosis of the alarm responsive to the updated probabilities.
- 26. A method according to claim 25, wherein updating the probability distributions comprises assessing a mean time between failures of the two or more of the modules.
- 27. A method according to claim 25, wherein the probability distributions have a mean and a moment, and wherein updating the probability distributions comprises reassessing the mean and the moment of the distribution.
- 28. A method according to claim 27, wherein the probability distributions comprise failure rate distributions, and wherein reassessing the mean and the moment comprises updating the failure rate distributions using a Bayesian Reliability Theory model.
- 29. A method according to claim 25, wherein the two or more of the modules comprise the one of the modules in which the fault occurred, and wherein constructing the causal network comprises identifying a local fault condition in the one of the modules, and responsive to the local fault condition, linking the fault in the causal network to one of the malfunctions occurring in the one of the modules.
- 30. A method according to claim 25, wherein the two or more of the modules comprise first and second modules, and wherein constructing the causal network comprises identifying a first fault condition occurring in the first module due to a connection in the system with the second module, and responsive to the first fault condition, linking the fault in the causal network with a second fault condition occurring in the second module.
- 31. A method according to claim 30, wherein the two or more of the modules comprise a third module, and wherein linking the fault comprises determining that a possible cause of the second fault condition is due to a further connection in the system between the second module and the third module, and responsive to the further connection, linking the fault in the causal network with a third fault condition occurring in the third module.
- 32. Apparatus for diagnosis of a system made up of a plurality of interlinked modules, the apparatus comprising a diagnostic processor, which is coupled to receive an alarm from the system indicative of a fault in one of the modules and which is arranged, responsive to the alarm, to construct a causal network associating the fault with malfunctions in one or more of the modules that may have led to the fault and relating a conditional probability of the fault to respective probabilities of the malfunctions, and based on the alarm and the causal network, to update at least one of the probabilities of the malfunctions so as to propose a diagnosis of the alarm responsive to the updated probabilities.
- 33. Apparatus according to claim 32, wherein the processor is linked to receive event reports from the plurality of the modules in the system, and to extract the alarm from the event reports.
- 34. Apparatus according to claim 33, wherein the event reports comprise a report of a change in configuration of the system, and wherein the processor is arranged to construct the causal network based on the changed configuration.
- 35. Apparatus according to claim 34, and comprising a memory, containing a database in which the configuration is recorded, and wherein the processor is coupled to update the database responsive to the report of the change in the configuration, for use in constructing the causal network.
- 36. Apparatus according to claim 33, wherein the processor is coupled to extract a sequence of alarms occurring at mutually proximal times, including the alarm indicative of the fault in the one of the modules, and to process the sequence of the alarms so as to update the probabilities.
- 37. Apparatus according to claim 36, wherein respective lifetimes are defined for the alarms, responsive to expected delays in receiving the alarms from the system, and wherein the processor is arranged to select the alarms to extract from the sequence responsive to the respective lifetimes.
- 38. Apparatus according to claim 37, wherein the processor is arranged to select the alarms that occurred within their respective lifetimes of a time of occurrence of the alarm indicative of the fault in the one of the modules responsive to which the causal network is constructed.
- 39. Apparatus according to claim 36, wherein in constructing the causal network, the processor is arranged to define an expected alarm that would be caused by one of the malfunctions in the one or more of the modules, and wherein the processor is further arranged to update the probabilities responsive to an occurrence of the expected alarm in the extracted sequence of alarms.
- 40. Apparatus according to claim 36, wherein a template is defined comprising a group of nodes in the network corresponding to a category of the modules in the system and an expected alarm that would be caused by one of the malfunctions in the modules in the category, and wherein the processor is arranged to instantiate the template in the causal network responsive to an occurrence of the expected alarm in the extracted sequence of alarms.
- 41. Apparatus according to claim 32, wherein the plurality of interlinked modules comprises multiple instances of a given one of the modules interlinked in a regular pattern, and wherein a template is defined comprising a group of nodes in the network corresponding to the given one of the modules, and wherein the processor is arranged to instantiate the template with respect to one or more of the modules responsive to the alarm.
- 42. Apparatus according to claim 41, wherein the template comprises an expected alarm that would be caused by one of the malfunctions in one of the instances of the given one of the modules, and wherein the processor is arranged to instantiate the template by adding an instance of the template to the network responsive to an occurrence of the expected alarm.
- 43. Apparatus according to claim 32, wherein the processor is arranged to identify a local fault condition in the one of the modules in which the fault occurred, and responsive to the local fault condition, to link the fault in the causal network to one of the malfunctions occurring in the one of the modules.
- 44. Apparatus according to claim 32, wherein the processor is arranged to identify a first fault condition occurring in a first one of the modules due to a connection with a second one of the modules in the system, and responsive to the first fault condition, to link the fault in the causal network with a second fault condition occurring in the second one of the modules.
- 45. Apparatus according to claim 44, wherein the processor is arranged to determine that a possible cause of the second fault condition is due to a further connection between the second one of the modules and a third one of the modules in the system, and responsive to the further connection, to link the fault in the causal network with a third fault condition occurring in the third one of the modules.
- 46. Apparatus according to claim 32, wherein the processor is arranged to add to the causal network multiple occurrences of one of the malfunctions responsive to the respective probabilities of the malfunctions, and to link the fault in the causal network to the multiple occurrences.
- 47. Apparatus according to claim 46, wherein the processor is arranged to determine one or more fault conditions that are caused by each of the occurrences, and to link at least some of the fault conditions to the fault.
- 48. Apparatus according to claim 32, wherein the at least one of the probabilities of the malfunctions is expressed as a mean time between failures of the one or more of the modules.
- 49. Apparatus according to claim 32, wherein the probabilities of the malfunctions are defined in terms of a probability distribution having a mean and a moment, and wherein the processor is arranged to update the mean and the moment of the distribution.
- 50. Apparatus according to claim 49, wherein the probability distribution comprises a failure rate distribution, and wherein the processor is arranged to update the failure rate distribution using a Bayesian Reliability Theory model.
- 51. Apparatus according to claim 32, wherein the processor is arranged to compare one or more of the updated probabilities to a predetermined threshold, and to invoke diagnostic action when the one of the probabilities exceeds the threshold.
- 52. Apparatus according to claim 51, and comprising a user interface, wherein the processor is coupled to notify a user of the system of the diagnosis via the user interface.
- 53. Apparatus according to claim 52, wherein the processor is arranged to provide via the user interface an explanation of the diagnosis based on the causal network.
- 54. Apparatus according to claim 51, wherein the diagnostic action comprises a diagnostic test that is performed to verify the malfunctions, wherein the test is selected responsive to the one of the probabilities exceeding the threshold.
- 55. Apparatus according to claim 54, wherein the processor is arranged to modify the causal network responsive to a result of the diagnostic test.
- 56. Apparatus for diagnosis of a system made up of a plurality of interlinked modules, the apparatus comprising a diagnostic processor, which is arranged to construct a causal network associating a fault in one of the modules with malfunctions in two or more of the modules that may have led to the fault and relating a conditional probability of the fault to respective probability distributions of the malfunctions, and responsive to an alarm from the system indicative of the fault, to update the probability distributions of the malfunctions so as to propose a diagnosis of the alarm responsive to the updated probabilities.
- 57. Apparatus according to claim 56, wherein the probability distributions are indicative of a mean time between failures of the two or more of the modules.
- 58. Apparatus according to claim 56, wherein the probability distributions have a mean and a moment, and wherein the processor is arranged to reassess the mean and the moment of the distribution responsive to the alarm.
- 59. Apparatus according to claim 58, wherein the probability distributions comprise failure rate distributions, and wherein the processor is arranged to update the failure rate distributions using a Bayesian Reliability Theory model.
- 60. Apparatus according to claim 56, wherein the two or more of the modules comprise the one of the modules in which the fault occurred, and wherein the processor is arranged to identify a local fault condition in the one of the modules, and responsive to the local fault condition, to link the fault in the causal network to one of the malfunctions occurring in the one of the modules.
- 61. Apparatus according to claim 56, wherein the two or more of the modules comprise first and second modules, and wherein the processor is arranged to identify a first fault condition occurring in the first module due to a connection in the system with the second module, and responsive to the first fault condition, to link the fault in the causal network with a second fault condition occurring in the second module.
- 62. Apparatus according to claim 61, wherein the two or more of the modules comprise a third module, and wherein the processor is arranged to determine that a possible cause of the second fault condition is due to a further connection in the system between the second module and the third module, and responsive to the further connection, to link the fault in the causal network with a third fault condition occurring in the third module.
- 63. A computer software product for diagnosis of a system made up of a plurality of interlinked modules, the product comprising a computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to receive an alarm from the system indicative of a fault in one of the modules and, responsive to the alarm, to construct a causal network associating the fault with malfunctions in one or more of the modules that may have led to the fault and relating a conditional probability of the fault to respective probabilities of the malfunctions, and based on the alarm and the causal network, to update at least one of the probabilities of the malfunctions so as to propose a diagnosis of the alarm responsive to the updated probabilities.
- 64. A product according to claim 63, wherein the instructions cause the computer to receive event reports from the plurality of the modules in the system, and to extract the alarm from the event reports.
- 65. A product according to claim 64, wherein the event reports comprise a report of a change in configuration of the system, and wherein the instructions cause the computer to construct the causal network based on the changed configuration.
- 66. A product according to claim 65, wherein the instructions cause the computer, responsive to the report of the change in the configuration, to update a database in which the configuration is recorded, for use in constructing the causal network.
- 67. A product according to claim 64, wherein the instructions cause the computer to extract a sequence of alarms occurring at mutually proximal times, including the alarm indicative of the fault in the one of the modules, and to process the sequence of the alarms so as to update the probabilities.
- 68. A product according to claim 67, wherein respective lifetimes are defined for the alarms, responsive to expected delays in receiving the alarms from the system, and wherein the instructions cause the computer to select the alarms to extract from the sequence responsive to the respective lifetimes.
- 69. A product according to claim 68, wherein the instructions cause the computer to select the alarms that occurred within their respective lifetimes of a time of occurrence of the alarm indicative of the fault in the one of the modules responsive to which the causal network is constructed.
- 70. A product according to claim 67, wherein the instructions cause the computer, in constructing the causal network, to define an expected alarm that would be caused by one of the malfunctions in the one or more of the modules, and to update the probabilities responsive to an occurrence of the expected alarm in the extracted sequence of alarms.
- 71. A product according to claim 67, wherein a template is defined comprising a group of nodes in the network corresponding to a category of the modules in the system and an expected alarm that would be caused by one of the malfunctions in the modules in the category, and wherein the instructions cause the computer to instantiate the template in the causal network responsive to an occurrence of the expected alarm in the extracted sequence of alarms.
- 72. A product according to claim 63, wherein the plurality of interlinked modules comprises multiple instances of a given one of the modules interlinked in a regular pattern, and wherein a template is defined comprising a group of nodes in the network corresponding to the given one of the modules, and wherein the instructions cause the computer to instantiate the template with respect to one or more of the modules responsive to the alarm.
- 73. A product according to claim 72, wherein the template comprises an expected alarm that would be caused by one of the malfunctions in one of the instances of the given one of the modules, and wherein the instructions cause the computer to instantiate the template by adding an instance of the template to the network responsive to an occurrence of the expected alarm.
- 74. A product according to claim 63, wherein the instructions cause the computer to identify a local fault condition in the one of the modules in which the fault occurred, and responsive to the local fault condition, to link the fault in the causal network to one of the malfunctions occurring in the one of the modules.
- 75. A product according to claim 63, wherein the instructions cause the computer to identify a first fault condition occurring in a first one of the modules due to a connection with a second one of the modules in the system, and responsive to the first fault condition, to link the fault in the causal network with a second fault condition occurring in the second one of the modules.
- 76. A product according to claim 75, wherein the instructions cause the computer to determine that a possible cause of the second fault condition is due to a further connection between the second one of the modules and a third one of the modules in the system, and responsive to the further connection, to link the fault in the causal network with a third fault condition occurring in the third one of the modules.
- 77. A product according to claim 63, wherein the instructions cause the computer to add to the causal network multiple occurrences of one of the malfunctions responsive to the respective probabilities of the malfunctions, and to link the fault in the causal network to the multiple occurrences.
- 78. A product according to claim 77, wherein the instructions cause the computer to determine one or more fault conditions that are caused by each of the occurrences, and to link at least some of the fault conditions to the fault.
- 79. A product according to claim 63, wherein the at least one of the probabilities of the malfunctions is expressed as a mean time between failures of the one or more of the modules.
- 80. A product according to claim 63, wherein the probabilities of the malfunctions are defined in terms of a probability distribution having a mean and a moment, and wherein the instructions cause the computer to update the mean and the moment of the distribution.
- 81. A product according to claim 80, wherein the probability distribution comprises a failure rate distribution, and wherein the instructions cause the computer to update the failure rate distribution using a Bayesian Reliability Theory model.
- 82. A product according to claim 63, wherein the instructions cause the computer to compare one or more of the updated probabilities to a predetermined threshold, and to invoke diagnostic action when the one of the probabilities exceeds the threshold.
- 83. A product according to claim 82, wherein the instructions cause the computer to notify a user of the system of the diagnosis.
- 84. A product according to claim 83, wherein the instructions cause the computer to provide to the user an explanation of the diagnosis based on the causal network.
- 85. A product according to claim 82, wherein the diagnostic action comprises a diagnostic test that is performed to verify the malfunctions, wherein the test is selected responsive to the one of the probabilities exceeding the threshold.
- 86. A product according to claim 85, wherein the instructions cause the computer to modify the causal network responsive to a result of the diagnostic test.
- 87. A computer software product for diagnosis of a system made up of a plurality of interlinked modules, the product comprising a computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to construct a causal network associating a fault in one of the modules with malfunctions in two or more of the modules that may have led to the fault and relating a conditional probability of the fault to respective probability distributions of the malfunctions, and responsive to an alarm from the system indicative of the fault, to update the probability distributions of the malfunctions so as to propose a diagnosis of the alarm responsive to the updated probabilities.
- 88. A product according to claim 87, wherein the probability distributions are indicative of a mean time between failures of the two or more of the modules.
- 89. A product according to claim 87, wherein the probability distributions have a mean and a moment, and wherein the instructions cause the computer to reassess the mean and the moment of the distribution responsive to the alarm.
- 90. A product according to claim 89, wherein the probability distributions comprise failure rate distributions, and wherein the instructions cause the computer to update the failure rate distributions using a Bayesian Reliability Theory model.
- 91. A product according to claim 87, wherein the two or more of the modules comprise the one of the modules in which the fault occurred, and wherein the instructions cause the computer to identify a local fault condition in the one of the modules, and responsive to the local fault condition, to link the fault in the causal network to one of the malfunctions occurring in the one of the modules.
- 92. A product according to claim 87, wherein the two or more of the modules comprise first and second modules, and wherein the instructions cause the computer to identify a first fault condition occurring in the first module due to a connection in the system with the second module, and responsive to the first fault condition, to link the fault in the causal network with a second fault condition occurring in the second module.
- 93. A product according to claim 92, wherein the two or more of the modules comprise a third module, and wherein the instructions cause the computer to determine that a possible cause of the second fault condition is due to a further connection in the system between the second module and the third module, and responsive to the further connection, to link the fault in the causal network with a third fault condition occurring in the third module.
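Claims 5 to 7 (and their counterparts, claims 36 to 38 and 67 to 69) select a sequence of alarms by giving each alarm a lifetime that reflects its expected reporting delay. Each alarm is then kept if it occurred within that lifetime of the triggering alarm. The minimal sketch below shows one way to read that rule. The `Alarm` record and its fields, the module names, and the lifetime values are all hypothetical. It also assumes lifetimes are non-negative and measured symmetrically around the trigger time.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Alarm:
    source: str      # module that raised the alarm (hypothetical naming)
    kind: str        # alarm type, e.g. "LOS" (hypothetical)
    time: float      # reported time of occurrence, in seconds
    lifetime: float  # assumed max reporting delay for this alarm type, in seconds (>= 0)

def extract_sequence(alarms, trigger):
    """Alarms that occurred within their own lifetime of the trigger alarm, in time order.

    The trigger itself is always kept, because its distance from itself is 0.
    """
    related = [a for a in alarms if abs(a.time - trigger.time) <= a.lifetime]
    return sorted(related, key=lambda a: a.time)

trigger = Alarm("card_A", "LOS", time=100.0, lifetime=5.0)
alarms = [
    trigger,
    Alarm("card_B", "AIS", time=103.0, lifetime=2.0),    # 3 s away, lifetime 2 s: dropped
    Alarm("link_AB", "LOF", time=97.0, lifetime=5.0),    # 3 s away, lifetime 5 s: kept
    Alarm("mgmt", "CONFIG", time=110.0, lifetime=30.0),  # slow reporting path: kept
]
print([a.source for a in extract_sequence(alarms, trigger)])
```

Giving each alarm its own window lets a slow-to-report alarm (the `mgmt` entry) still join the sequence, while a fast-reporting alarm that arrives too far from the trigger is excluded.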
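Claims 3, 4 and 10 to 14 describe the construction steps:

- keep the system configuration in a database that is updated from configuration-change reports;
- instantiate a per-category template of nodes for each module involved;
- link a local fault to the module's own malfunctions;
- follow connections to further modules whose fault conditions may explain it.

The sketch below illustrates these steps under stated assumptions. The `TEMPLATES` table, the change-report dictionaries, the node naming scheme, and the `max_hops` cutoff are all hypothetical. It builds only the graph structure; attaching probabilities to the edges is a separate step.

```python
from collections import defaultdict, deque

# Hypothetical templates: module category -> malfunction nodes instantiated per module.
TEMPLATES = {
    "line_card": ["laser_failure", "power_failure"],
    "fiber": ["cut"],
}

class ConfigDB:
    """Configuration database kept current from change reports (report format assumed)."""

    def __init__(self):
        self.category = {}                # module -> category
        self.upstream = defaultdict(set)  # module -> modules feeding it

    def apply_change(self, report):
        op = report["op"]
        if op == "add_module":
            self.category[report["module"]] = report["category"]
        elif op == "connect":
            self.upstream[report["downstream"]].add(report["upstream"])
        elif op == "disconnect":
            self.upstream[report["downstream"]].discard(report["upstream"])
        else:
            raise ValueError(f"unknown change report: {op}")

def build_causal_network(db, faulty_module, max_hops=2):
    """Return (cause, effect) edges explaining a fault in `faulty_module`."""
    edges = set()
    seen = {faulty_module}
    queue = deque([(faulty_module, 0)])
    while queue:
        module, hops = queue.popleft()
        # Instantiate the template: local malfunctions -> this module's fault condition.
        for malfunction in TEMPLATES[db.category[module]]:
            edges.add((f"{module}/{malfunction}", f"{module}/fault"))
        if hops == max_hops:
            continue
        # A fault upstream can cause this module's fault via the connection.
        for up in sorted(db.upstream[module]):
            edges.add((f"{up}/fault", f"{module}/fault"))
            if up not in seen:
                seen.add(up)
                queue.append((up, hops + 1))
    return edges

db = ConfigDB()
for report in [
    {"op": "add_module", "module": "card_A", "category": "line_card"},
    {"op": "add_module", "module": "fiber_1", "category": "fiber"},
    {"op": "add_module", "module": "card_B", "category": "line_card"},
    {"op": "connect", "downstream": "card_A", "upstream": "fiber_1"},
    {"op": "connect", "downstream": "fiber_1", "upstream": "card_B"},
]:
    db.apply_change(report)

before = build_causal_network(db, "card_A")
db.apply_change({"op": "disconnect", "downstream": "fiber_1", "upstream": "card_B"})
after = build_causal_network(db, "card_A")
print(len(before), len(after))
```

Because the network is rebuilt from the database, a configuration change immediately alters which upstream modules can be blamed. Here, disconnecting `card_B` removes it from the explanation.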
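Claims 1 and 20 relate the conditional probability of a fault to the probabilities of its candidate malfunctions. They update those probabilities when the alarm arrives and invoke diagnostic action when an updated probability exceeds a threshold. The claims do not fix the form of the conditional probability. The sketch below assumes a noisy-OR model, a common choice for this kind of causal link that is swapped in here for illustration. It computes the posteriors by exact enumeration, which is practical only for a handful of malfunctions. The function names, module names and numbers are hypothetical.

```python
from itertools import product

def fault_given(active, link_probs, leak=0.0):
    """P(fault | which malfunctions are active) under an assumed noisy-OR model."""
    p_no_fault = 1.0 - leak
    for name, is_active in active.items():
        if is_active:
            p_no_fault *= 1.0 - link_probs[name]
    return 1.0 - p_no_fault

def update_on_alarm(priors, link_probs, leak=0.0):
    """Posterior P(malfunction | fault alarm) by exact enumeration (small networks only)."""
    names = list(priors)
    evidence = 0.0                   # P(fault)
    joint = {n: 0.0 for n in names}  # P(malfunction n active AND fault)
    for states in product((False, True), repeat=len(names)):
        active = dict(zip(names, states))
        p = 1.0
        for n in names:
            p *= priors[n] if active[n] else 1.0 - priors[n]
        p *= fault_given(active, link_probs, leak)
        evidence += p
        for n in names:
            if active[n]:
                joint[n] += p
    if evidence == 0.0:
        raise ValueError("the observed fault has zero probability under this network")
    return {n: joint[n] / evidence for n in names}

def propose_diagnosis(posteriors, threshold):
    """Malfunctions whose updated probability exceeds the threshold, most likely first."""
    hits = [n for n, p in posteriors.items() if p > threshold]
    return sorted(hits, key=lambda n: posteriors[n], reverse=True)

# A fault on a card may come from the card itself or from the link feeding it.
priors = {"card_A": 0.01, "link_AB": 0.05}
link_probs = {"card_A": 0.9, "link_AB": 0.6}
posteriors = update_on_alarm(priors, link_probs)
print(posteriors)
print(propose_diagnosis(posteriors, threshold=0.5))
```

With these assumed numbers, the link's posterior rises to about 0.78 and the card's to about 0.23. Only the link crosses the 0.5 threshold, which is the point at which diagnostic action would be invoked.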
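Claims 17 to 19 (and their counterparts) update failure-rate distributions that have a mean and a further moment, using a Bayesian Reliability Theory model, and relate them to mean time between failures. The claims do not name the model. The sketch below assumes the gamma prior with Poisson failure counts, a standard conjugate pair in Bayesian reliability, used here only as an illustration. Observing `n` failures over `T` hours turns Gamma(alpha, beta) into Gamma(alpha + n, beta + T). The class name and the prior values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FailureRate:
    """Assumed Gamma(shape, rate) belief about a module's failure rate, in failures/hour."""
    shape: float  # alpha
    rate: float   # beta, in hours of exposure

    def mean(self) -> float:
        return self.shape / self.rate

    def variance(self) -> float:
        # Second central moment of the gamma distribution.
        return self.shape / self.rate ** 2

    def update(self, failures: int, exposure_hours: float) -> "FailureRate":
        """Conjugate update after `failures` over `exposure_hours` (Poisson likelihood)."""
        return FailureRate(self.shape + failures, self.rate + exposure_hours)

    def mtbf_point_estimate(self) -> float:
        """Reciprocal of the mean rate: a point estimate, not E[1/lambda]."""
        return self.rate / self.shape

prior = FailureRate(shape=2.0, rate=20_000.0)  # mean 1e-4 failures/hour
posterior = prior.update(failures=1, exposure_hours=10_000.0)
print(prior.mean(), prior.variance())
print(posterior.mean(), posterior.variance(), posterior.mtbf_point_estimate())
```

In this example the mean rate is unchanged by the observation, but the variance drops from 5e-9 to about 3.3e-9. This illustrates why tracking a second moment alongside the mean carries information that the mean alone does not.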
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Patent Application No. 60/214,971, filed Jun. 29, 2000, which is incorporated herein by reference.
Provisional Applications (1)

| Number | Date | Country |
|---|---|---|
| 60214971 | Jun 2000 | US |