Probabilistic risk assessments (PRAs) are known within the risk assessment community. A PRA is defined as a systematic and comprehensive methodology to evaluate risks associated with a complex engineered technological entity. PRAs are generally in the form of time independent analyses, such as fault tree analyses (FTAs) and event tree analyses (ETAs).
In FTAs, elements representing various faults are connected through logic gates (AND gates, OR gates, etc.) and assigned a probability of failure. In ETAs, elements represent various failure events that logically branch into effects caused by those failures. Due to the unique combination of events and logic, a failure probability can be determined for that particular configuration. However, these techniques fail when system complexity is increased beyond a certain threshold, when common cause failures occur, or when probabilities of failure rely on variables that are intrinsically time dependent.
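By way of illustration only, the gate arithmetic described above can be sketched in Python for a small, hypothetical fault tree. The sketch assumes that the basic events are statistically independent, which is the assumption that breaks down when common cause failures occur. The function names and probability values are illustrative and are not taken from any particular tool.

```python
from math import prod

def and_gate(probabilities):
    """Output event occurs only if every input event occurs (independence assumed)."""
    return prod(probabilities)

def or_gate(probabilities):
    """Output event occurs if at least one input event occurs (independence assumed)."""
    return 1.0 - prod(1.0 - p for p in probabilities)

# Hypothetical system: power is lost if the main feed AND the backup both fail;
# the top event occurs if power is lost OR the controller fails.
p_main_feed = 0.01
p_backup = 0.05
p_controller = 0.001

p_power_lost = and_gate([p_main_feed, p_backup])
p_top = or_gate([p_power_lost, p_controller])
```

The result is a single number for one fixed configuration; nothing in the calculation depends on when a failure occurs or how long a repair takes.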
Other risk assessment techniques, which may or may not be probabilistic, include Failure Mode and Effect Analyses (FMEAs) and Failure Mode, Effect and Criticality Analyses (FMECAs). These analysis techniques use detailed information about the various parts of a system to determine what can fail, how it can fail, and the effect on the overall system when the various parts fail in particular fashions. Sometimes these analyses will include probability of failure as one of the attributes of the analysis. However, these analyses only occasionally take time variables into account, and when they do so, only in the most rudimentary fashion. The main purpose of an FMEA or FMECA is to ensure that each possible failure mode is discovered and analyzed. Failure probabilities can be determined with FMEAs or FMECAs, but the analysis is inefficient. More usually, FMEAs or FMECAs are used to develop fault trees that are then used to determine probability of failure.
A further type of failure analysis is a Functional Hazard Analysis (FHA). This analysis is a top-down analysis that develops the generic functions that a system performs, and then delineates the system failures that could cause those functions to fail. This type of analysis is not probabilistic in nature, nor is the analysis performed in the time domain.
Recovery simulation is generally abstracted from historic event records. Techniques include using ‘mean time to recover’ (MTTR) data from maintenance records, synthesizing recovery times based on surveys, and other techniques designed to determine the recovery time based on past performance. These techniques are generally useful for situations where repairs are conducted as part of general maintenance schemes or where there are no unusual situations that could affect the repair times. These techniques do not provide good results in situations where unforeseen events affect the repair operation, in situations where the base assumptions on which the data is collected are not valid, nor in situations that have not occurred in the past.
The nuclear power industry has used a technique in which recovery events and probability of recovery events are added to fault trees to simulate repair actions and those events' effect on the operation of the system. It has also used a rules-based heuristic to allow for the deletion of parts of the fault tree or certain cut sets. While these techniques improve on a straight MTTR recovery analysis approach, they are still based on historic data and operator action based on known past events. The nuclear power industry also uses a time series Monte Carlo simulation approach to determine recovery (or non-recovery) times for certain conditions. This simulation compares non-probabilistic recovery times to mission completion times to determine if the repair impacts the total time to recovery. The shortfalls of this approach are that it requires historic recovery time benchmarks and that it is not truly conducted in the time domain.
According to one aspect of the invention, there is provided a risk assessment system, which includes a plurality of elements each having an attribute for determining if an event causes the respective element to fail. The risk assessment system also includes a repair component configured to repair each of the plurality of elements that has failed. The risk assessment system further includes an event generation component configured to generate an event to effect repair of the plurality of elements that have failed. The repair component performs a particular repair of each of the failed elements based on the event generated by the event generation component.
According to another aspect of the invention, there is provided a risk assessment method for performing risk assessment on a system. The method includes assigning an attribute to a plurality of elements, for determining if an event causes the respective element to fail. The method also includes determining whether or not the event has occurred in the system, and which of the plurality of elements has failed. The method further includes repairing, by a repair component, each of the plurality of elements that has failed. The repairing step includes generating a particular event to effect repair of the plurality of elements that have failed.
According to yet another aspect of the invention, there is provided a computer program product executable on a general purpose computer, the computer program product being stored in a computer readable medium, and, when executed on the general purpose computer, causing the general purpose computer to perform steps of assigning an attribute to a plurality of elements, for determining if an event causes the respective element to fail; determining whether or not the event has occurred in the system, and which of the plurality of elements has failed; repairing, by a repair component, each of the plurality of elements that has failed, wherein the repairing step includes generating a particular event to effect repair of the plurality of elements that have failed.
The present invention generally relates to risk assessment tools used in risk modeling. More particularly, the present invention relates to introducing time domain aspects into the analysis of probabilistic risk assessments, and the treatment of highly complex relationships, including the interaction of disparate network types, and the interaction of repair simulation activities, in software tools for performing a time domain probabilistic risk assessment.
Highly complex models typically are analyzed by separating the simulation into several parts that are generally considered to be independent. Further, typical risk analyses rely on the assumption that the systems under analysis are deterministic, that is, that a single perturbation to the system results in a particular effect on that system, independent of other variables, including time. This analysis method is flawed in that the parts of many highly complex systems cannot be considered independently if the analysis is to produce realistic results.
Time is often the variable of interest in determining system failure and reconstitution effects. Typical risk analyses, including techniques such as fault tree analyses, common cause analyses, zonal analyses, and failure modes and effects analyses, do not take time into account, as discussed earlier. Often, the user must overcome this deficit by performing the analyses several times to account for time domain variation. While this approach may yield rough estimates of event differences based on time variation, in many cases this is insufficient to produce results with the requisite accuracy.
A further problem with typical system analysis techniques is that many complex systems are not deterministic. Several variables and levels of variables must be considered simultaneously before an accurate determination of the effect of a single variable change can be performed. Current analysis techniques often require a system model to be simplified into a deterministic configuration before analysis can proceed, often through the use of cut sets or other techniques that simplify the analysis to the point that it can be mathematically analyzed.
Furthermore, when highly complex systems fail in unusual or unexpected manners, pre-existing algorithms or processes for determining repair times can fail to accurately analyze the repair times required to reconstitute a complex system to an operational state. Typical repair algorithms rely on historically derived means, such as mean time between failures (MTBF) and mean time to repair (MTTR) lists, to determine repair times. However, these figures produce flawed results when the scope of the repair falls outside the bounds of the environment in which the time figures were collected. Many serious emergency situations where repair assets are outside those bounds need to be analyzed. In many cases, repair and reconstitution times will be the most important outcomes of such analyses.
The following detailed description of embodiments of the present system refers to the accompanying drawings that illustrate exemplary embodiments. Other embodiments are possible, and modifications may be made to the embodiments within the spirit and scope of the invention.
The present system may be implemented in many different embodiments of hardware, software, firmware, and/or the entities illustrated in the figures. Any actual software code with specialized, controlled hardware to implement the present system is not intended to limit the scope of the present invention which includes all alternatives, variations, and modifications that would be known to those skilled in the art. The operation and behavior of the present system will be described with the understanding that all such modifications and variations of the embodiments would be recognized by those skilled in the art.
Repair Capabilities:
In certain embodiments of the present system, the architecture of the Repair feature depends heavily on the Event architecture. A failure is indicated by a special attribute of an element, but the attribute is set as a result of an event. The modeler may specify a time event or a condition event to set the failure attribute.
After a failure occurs, the repair is also controlled by events. Whether the element self-repairs or a repair agent is required, an event (or events) must be generated to effect the repair.
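By way of illustration only, the relationship between events and the failure attribute can be sketched as follows. The class name, the event kinds ("FAIL", "REPAIR") and the event times are hypothetical and are chosen only to show that the attribute changes exclusively in response to events.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    name: str
    failed: bool = False                      # the special failure attribute
    log: list = field(default_factory=list)   # (time, event kind, failed) records

    def on_event(self, kind, time):
        """React to an event; only events change the failure attribute."""
        if kind == "FAIL":
            self.failed = True
        elif kind == "REPAIR":
            self.failed = False
        self.log.append((time, kind, self.failed))

pump = Element("pump")
# Hypothetical time events: the pump fails at t=5 and a repair event arrives at t=8.
for time, kind in [(5.0, "FAIL"), (8.0, "REPAIR")]:
    pump.on_event(kind, time)
```

In this sketch the repair event could equally be generated by the element itself (self-repair) or by a repair agent; the element does not need to know which.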
In certain embodiments, the repair agent is an Element that is not visible in the model. The repair agent can be flowed through ports (which allow input/output to elements) to the failed Element or to follow diagnosis trees to other elements.
As shown in
These processes are driven by events. The processes are represented by a collection of the events, their destination objects and the resulting actions taken by the destination object. Therefore, an event (or an event caused by a trigger) received by a particular object causes that object to perform some action(s). An example of event-object interaction in a risk assessment system of the first embodiment is shown in
The following code fragment describes one possible implementation of what the RepairAgent does on receipt of a DIAGNOSE event.
The RepairManager.doExhaustive flag is settable by the modeler and indicates whether a RepairAgent will stop at intermediate Elements when returning from a fixed Element to the original fixed Element.
One node of a diagnosis tree can be a command to flow to a second, connected Element and to execute the diagnosis tree there. The RepairAgent returns to the first Element, to the same node, if executing the remote second diagnosis tree did not fix the first Element.
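By way of illustration only, the diagnosis behavior described above can be sketched as follows. This is a simplified sketch under stated assumptions and not the implementation of the RepairAgent: the diagnosis tree is flattened to an ordered list of nodes, the node names "FIX_LOCAL" and "FLOW_UPSTREAM" are hypothetical, an Element is assumed to be failed if it has a local fault or depends on a failed Element, and the doExhaustive flag is not modeled.

```python
class Element:
    def __init__(self, name):
        self.name = name
        self.local_fault = False   # fault that can be fixed at this Element
        self.upstream = None       # connected Element this one depends on
        self.tree = []             # diagnosis tree, flattened to ordered nodes

    @property
    def failed(self):
        upstream_failed = self.upstream is not None and self.upstream.failed
        return self.local_fault or upstream_failed

def diagnose(element, trail):
    """Handle a DIAGNOSE event at `element`; return True if it ends up fixed."""
    trail.append(element.name)
    for node in element.tree:
        if node == "FIX_LOCAL":
            element.local_fault = False
        elif node == "FLOW_UPSTREAM" and element.upstream is not None:
            diagnose(element.upstream, trail)   # execute the remote diagnosis tree
            trail.append(element.name)          # agent returns to the same node
        if not element.failed:
            return True                         # stop as soon as the Element is fixed
    return False

breaker = Element("breaker")
breaker.local_fault = True
breaker.tree = ["FIX_LOCAL"]

motor = Element("motor")
motor.upstream = breaker
motor.tree = ["FIX_LOCAL", "FLOW_UPSTREAM"]

trail = []
fixed = diagnose(motor, trail)
```

Here the motor has no local fault, so the agent flows to the breaker, fixes it there, and returns to the motor, which is then found to be working.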
Flow
To flow an element between two elements connected by a pair of ports, use is now made of an underlying architecture (for example, the EDS architecture), rather than [pre-flow|flow|post-flow] reconciliation. A list of actions associated with a Flow is shown in the table below, followed by code fragments illustrating one implementation of the primitives for implementing the Flow object (elements and ports).
flowObject(IElement obj) implementation
There is a modeling mechanism that allows elements to communicate their attributes. This will utilize the EDS subsystem by defining an extension to the event types.
This feature represents the realization of two outstanding requirements, namely instantaneous flows (Δt=0) and attribute interrogation (inter-element communication).
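By way of illustration only, attribute interrogation through events with zero delay can be sketched as follows. The `Bus` class is a small stand-in for an event dispatch subsystem and is not the EDS architecture itself; the event kinds "QUERY_ATTRIBUTE" and "ATTRIBUTE_VALUE" are hypothetical extensions to the event types.

```python
import heapq
from itertools import count

class Bus:
    """Tiny stand-in for an event dispatch subsystem (not the EDS itself)."""
    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._seq = count()   # tie-breaker so equal-time events keep posting order

    def post(self, delay, target, kind, payload):
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), target, kind, payload))

    def run(self):
        while self._queue:
            self.now, _, target, kind, payload = heapq.heappop(self._queue)
            target.on_event(self, kind, payload)

class Element:
    def __init__(self, name, **attributes):
        self.name = name
        self.attributes = attributes
        self.answers = []

    def on_event(self, bus, kind, payload):
        if kind == "QUERY_ATTRIBUTE":
            asker, key = payload
            # Reply with zero delay: an instantaneous flow of information.
            bus.post(0.0, asker, "ATTRIBUTE_VALUE", (self.name, key, self.attributes.get(key)))
        elif kind == "ATTRIBUTE_VALUE":
            self.answers.append((bus.now, payload))

bus = Bus()
tank = Element("tank", level=0.75)
valve = Element("valve")
bus.post(3.0, tank, "QUERY_ATTRIBUTE", (valve, "level"))
bus.run()
```

The reply reaches the valve at the same simulation time as the query (t=3), so the exchange consumes no simulated time even though it passes through the ordinary event mechanism.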
Examples such as these are important to establish a simulation, rather than merely an animation, of a model: they utilize the actual modeling infrastructure to achieve desirable executable artefacts (network traversal, instantaneous flows, attribute interrogation), rather than simply leaning on the facilities of the construction medium (the programming language).
Repair Agent Mobility
Riding on the back of the flow mechanism (and, in turn, on the EDS architecture), there is an additional requirement for repair agents to be able to navigate/traverse a model independent of the actual network itself.
For example, an electrical network may connect a variety of devices, but the decisions and movement of a repair agent may not necessarily follow ‘the wires’, although they are likely to be guided by knowledge of the patterns of connection in the network.
To this end, repair agents are granted the ability to construct transient connections between elements for them to traverse a network. The actual traversal is accomplished via the standard flow mechanism, through these transient connections. Each connection will last at least as long as it takes for the flow to complete, after which it may be discarded. Repair agents will uniquely type these connections by defining their network type to be “REPAIR” as illustrated by the diagram of
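By way of illustration only, a transient connection of network type "REPAIR" can be sketched as follows. The `Model` and `Connection` classes and the element names are hypothetical; the sketch shows only that the connection exists for the duration of the flow and is discarded afterwards, without altering the permanent network.

```python
from contextlib import contextmanager
from dataclasses import dataclass

@dataclass(frozen=True)
class Connection:
    source: str
    target: str
    network_type: str

class Model:
    def __init__(self):
        self.connections = set()
        self.agent_location = {}

    @contextmanager
    def transient_repair_link(self, source, target):
        """Create a REPAIR-typed connection that lasts only for one flow."""
        link = Connection(source, target, "REPAIR")
        self.connections.add(link)
        try:
            yield link
        finally:
            self.connections.discard(link)   # discarded once the flow completes

    def flow_agent(self, agent, link):
        """Move an agent across an existing connection (stand-in for the flow mechanism)."""
        if link not in self.connections:
            raise ValueError("cannot flow over a connection that does not exist")
        if self.agent_location.get(agent) != link.source:
            raise ValueError("agent is not at the source of the connection")
        self.agent_location[agent] = link.target

model = Model()
model.connections.add(Connection("substation", "pump_house", "ELECTRICAL"))
model.agent_location["crew_1"] = "depot"

# The repair crew travels depot -> pump_house, a route no wire follows.
with model.transient_repair_link("depot", "pump_house") as link:
    model.flow_agent("crew_1", link)
```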
An event is defined by the following properties:
The simulation method then includes the following processing logic:
Events may also be allowed to be triggered by conditions other than time: A trigger is defined by the following properties
Event conditions may include attribute values, and may include, as a minimum:
The software of the present system uses a run-time architecture concerned with the management of “Events” and “Triggers”.
“Execution Lifespan” (or “Runtime”) is defined as being the length of the event queue as processed from top (earliest) to bottom (latest). This need not be (in fact, it is highly unlikely to be) constant, as both events and triggers can add events to the queue.
A “Scheduler” manages both the event queue and trigger bucket. It is responsible for the processing and management of both events and triggers. Typically, a user specifies at design-time
As shown generally in
A “Scheduler” component is responsible for the maintenance (registration and updating) of events and triggers, and provides a consistent interface for doing so. Elements, in turn, implement a common interface for this scheduler to use to notify them of events.
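By way of illustration only, a scheduler that manages an event queue and a trigger bucket can be sketched as follows. The method names (`add_event`, `add_trigger`, `notify`) are hypothetical, and the sketch assumes that triggers are re-evaluated after every event and fire at most once; other policies are equally possible.

```python
import heapq
from itertools import count

class Scheduler:
    def __init__(self):
        self.now = 0.0
        self._events = []      # event queue, processed earliest first
        self._triggers = []    # trigger bucket: (condition, target, kind)
        self._seq = count()    # tie-breaker for events at the same time

    def add_event(self, time, target, kind):
        heapq.heappush(self._events, (time, next(self._seq), target, kind))

    def add_trigger(self, condition, target, kind):
        self._triggers.append((condition, target, kind))

    def run(self):
        while self._events:
            self.now, _, target, kind = heapq.heappop(self._events)
            target.notify(self, kind)          # common element interface
            waiting = []
            for condition, trig_target, trig_kind in self._triggers:
                if condition():
                    # A fired trigger adds an event, so the queue grows at run time.
                    self.add_event(self.now, trig_target, trig_kind)
                else:
                    waiting.append((condition, trig_target, trig_kind))
            self._triggers = waiting

class Tank:
    def __init__(self):
        self.level = 3
        self.failed = False
        self.history = []

    def notify(self, scheduler, kind):
        if kind == "DRAIN":
            self.level -= 1
        elif kind == "FAIL":
            self.failed = True
        self.history.append((scheduler.now, kind))

tank = Tank()
scheduler = Scheduler()
for t in (1.0, 2.0, 3.0):
    scheduler.add_event(t, tank, "DRAIN")
# Condition event: the tank fails once its level drops to 1 or below.
scheduler.add_trigger(lambda: tank.level <= 1, tank, "FAIL")
scheduler.run()
```

Three events are queued at design time but four are processed, which illustrates why the execution lifespan is not constant.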
To localize the creation and registration of events (with the scheduler), and to remain consistent with the ObjectFactory architecture, the “factory” pattern can be followed to design an “EventFactory” component that can be initialized from a separate library, so that this component can be optionally included with any application built upon the present system, should event processing be required.
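By way of illustration only, a factory that localizes both the creation and the registration of events can be sketched as follows. The `Event` fields and the `define` and `create` methods are hypothetical; the registration callback stands in for whatever interface the scheduler exposes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str
    time: float
    target: str

class EventFactory:
    def __init__(self, register):
        self._register = register   # callback supplied by the scheduler
        self._kinds = set()         # event kinds known to this factory

    def define(self, kind):
        self._kinds.add(kind)

    def create(self, kind, time, target):
        """Create an event of a known kind and register it in one step."""
        if kind not in self._kinds:
            raise ValueError(f"unknown event kind: {kind}")
        event = Event(kind, time, target)
        self._register(event)
        return event

registered = []                     # stands in for the scheduler's event queue
factory = EventFactory(registered.append)
factory.define("FAIL")
factory.define("REPAIR")
factory.create("FAIL", 5.0, "pump")
factory.create("REPAIR", 8.0, "pump")
```

Because callers never construct or register events directly, an application that does not need event processing can omit the factory without changes elsewhere.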
The present system's object hierarchy may be restructured to resemble the following, where indented sub-items indicate ‘containment’ and pluralization indicates a ‘one-to-many’ relationship with the parent container:
Some of the features of the present system include a software based simulation program using time domain probabilistic risk analysis techniques.
Other features include carrying one or more sequences of one or more instructions for execution by one or more processors, the instructions when executed by the one or more processors, cause the one or more processors to perform a time domain probabilistic risk analysis on a network model stored in a computer memory and to perform a multiple step state analysis wherein the state of the previous step is updated based on effects determined from the previous step and the current network information flows; automatically update and record the state of the elements of the network model based on the outcome of each step of the analysis; record and save to a storage medium variables or states of interest; and output to a user readable form variables or states of interest.
Another feature is providing a software based simulation program including detailed repair asset simulation capabilities.
Another feature includes carrying one or more sequences of one or more instructions for execution by one or more processors, the instructions when executed by the one or more processors: provide detailed element level simulation capability of repair assets; allow for the automatic use of detailed repair assets in a time domain probabilistic risk assessment wherein the variability of values used by the repair assets is accounted for automatically; wherein the aforementioned variability of values used by the repair assets can be manipulated by the user as desired; and wherein the repair assets provide for a realistic repair time determination based on known repair quantities and procedures.
Another feature includes a software based simulation program including element to element level interactions between differing but interconnected infrastructure models (for example, between a model of an electrical grid and a model of a water supply system). Interactions from two or more interconnected infrastructure models may be obtained using the software based simulation program.
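By way of illustration only, an element to element interaction between two infrastructure models can be sketched as follows. The element names are hypothetical, and the sketch assumes the simplest possible interaction, in which a failure propagates immediately to every dependent element regardless of the network to which that element belongs.

```python
class Element:
    def __init__(self, name, network):
        self.name = name
        self.network = network     # e.g. "ELECTRICAL" or "WATER"
        self.failed = False
        self.dependents = []       # elements that need this one, in any network

    def fail(self, time, log):
        if self.failed:
            return                 # already failed; avoid repeated propagation
        self.failed = True
        log.append((time, self.network, self.name))
        for dependent in self.dependents:
            dependent.fail(time, log)

feeder = Element("feeder_12", "ELECTRICAL")
pump = Element("pump_station_A", "WATER")
main = Element("water_main_3", "WATER")
feeder.dependents.append(pump)   # cross-infrastructure link
pump.dependents.append(main)     # link within the water model

log = []
feeder.fail(4.0, log)
```

A failure in the electrical model at t=4 thus appears in the water model at the element level, without the two models being analyzed as independent parts.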
Another feature provides for carrying out one or more sequences of one or more instructions for execution by one or more processors, the instructions when executed by the one or more processors, cause the one or more processors to perform a time domain probabilistic risk analysis on a network model stored in a computer memory and to: provide network elements that can be assembled into recognizable facsimiles of existing physical and/or non-physical networks; and provide interaction capability between existing physical and/or non-physical networks that are generally considered to belong to different infrastructure sets.
One skilled in the art would recognize that a typical computer system connected to an electronic network could be used to implement the system described herein. It should be appreciated that many other similar configurations are within the abilities of one skilled in the art and it is contemplated that all of these configurations could be used with the methods and systems of the present invention. Furthermore, it should be appreciated that it is within the abilities of one skilled in the art to program and configure a networked computer system to implement the method steps and system of the present invention, discussed earlier herein.
The present invention also contemplates providing computer readable data storage means with program code recorded thereon (i.e., software) for implementing the method steps and system features described earlier herein. Programming the method steps or system features discussed herein using custom and packaged software is within the abilities of those skilled in the art in view of the teachings disclosed herein.
Other embodiments of the invention will be apparent to those skilled in the art from a consideration of the specification and the practice of the invention disclosed herein. It is intended that the specification be considered as exemplary only, with such other embodiments also being considered as a part of the invention in light of the specification and the features of the invention disclosed herein.
This application claims priority to U.S. provisional patent application Ser. Nos. 60/717,581, filed Sep. 15, 2005, and 60/799,338, filed May 11, 2006, both of which are incorporated herein by reference in their entirety.
Number | Date | Country
---|---|---
60/717,581 | Sep 2005 | US
60/799,338 | May 2006 | US