The technology herein relates to systems fault determination, and more particularly to automated systems and methods for monitoring the health of a system and automatically detecting and analyzing faults. Still more particularly, the example non-limiting technology relates to automated intervention computing systems and processes based on system intended functions, and to an integration framework for organizing and modifying procedures according to current context, which selects between different intervention definition processes using simulation models as references.
The Qantas Flight 32 accident as described in https://www.atsb.gov.au/publications/investigation_reports/2010/aair/ao-2010-089.aspx and “In-flight uncontained engine failure Airbus A380-842, VH-OQA” (Australian Government ATSB Transport Safety Report Occurrence Investigation AO-2010-089, 27 Jun. 2013) is an example of what can happen when multiple aircraft systems fail simultaneously. In that accident, which occurred in early November 2010 while climbing through 7,000 ft after departing from Changi Airport, Singapore, the flight crew heard two ‘bangs’. The aircraft had sustained an uncontained engine rotor failure (UERF) of the No. 2 engine due to a fire caused by a crack that had developed in the oil feed pipe, causing the No. 2 engine to catch fire and begin leaking fuel. Debris from the UERF impacted other parts of the aircraft, resulting in significant structural and systems damage. For example, a turbine disc from the damaged engine rotor detached and punched a huge hole in the wing.
A number of warnings and cautions were displayed on the electronic centralized aircraft monitor (ECAM). The pilot's display indicated twenty-one of the plane's twenty-two major systems were damaged or completely disabled. As the plane's problems cascaded, the step-by-step instructions the ECAM display provided became so overwhelming that no one was certain how to prioritize or where to focus. Because so many systems were damaged, some instructions seemed to contradict other instructions.
Luckily, there happened to be additional crew on the flight deck as part of a check and training exercise, and this additional crew helped in dealing with the failure. Meanwhile, instead of trying to understand the full complexity of the failures, the captain instead began focusing his attention on a simplified mental model of the aircraft. Transcripts of the voice recorder show that the captain said at a certain point: “So forget the pumps, forget the other eight tanks, forget the total fuel quantity gauge. We need to stop focusing on what's wrong, and start paying attention to what's still working.” This was a crucial turning point in the decision-making process. Under the captain's command, the expanded flight crew managed the situation and, after completing the required actions for the multitude of system failures, safely returned to and landed at Changi Airport with no injuries.
Some in the past have tried to address the issue of automatically diagnosing complex failures such as those experienced by Qantas Flight 32, but generally speaking, none of them provides a usable automation method to run multiple possibilities in parallel and select the best possibility or possibilities to provide a safety net for non-deterministic processes.
Complex safety critical systems have procedures for operator-intervention in case of failures of specific subsystems or components. Those procedures are usually defined per subsystem or component failure, such as aircraft quick reference handbooks (“QRHs”) that contain procedures such as “Engine Failure”, “Battery 1 Failure” and so on. See
This is because for a large and/or complex system, in case of complex failures involving multiple subsystems/components or unexpected operation scenarios, it is usually impossible to define procedures for each case due to rapid combinatorial explosion. This makes it difficult for operators to intervene and also makes it difficult to automate the intervention process, even with current artificial intelligence techniques, due to concerns with potential illogical and non-deterministic output(s).
The following shows an example prior art failure response protocol to demonstrate limitations of typical prior art approaches.
Example: Aircraft Environmental Control System
The atmospheric environment outside an aircraft flying at 30,000 feet might be −48 degrees Fahrenheit and only on the order of 4 pounds per square inch. Despite this hostile environment, the aircraft's air handling system components maintain pressurization of about 8 pounds per square inch and 68 degrees Fahrenheit (regulated by the flight crew) with a proper mix of oxygen to other gases including water vapor within the pressurized cabin.
In a typical aircraft, the aircraft fuselage 101 defines a flight deck 103 and cabin zones (106a-106g). The cabin zones 106 are occupied by passengers and flight deck 103 is occupied by crew. The number of occupants typically is a factor used to determine air handling system demand and ventilation requirements.
While the aircraft is flying, the engines 102, 104 provide a convenient source of pressurized hot “bleed” air to maintain cabin temperature and pressure. The normal operation of a gas turbine jet engine 102, 104 produces air that is both compressed (high pressure) and heated (high temperature). A typical gas turbine engine 102, 104 uses an initial stage air compressor to feed the engine with compressed air. Some of this compressed heated air can be “bled” from the engine compressor stages and used for cabin pressurization and temperature maintenance without adversely affecting engine operation and efficiency.
During flight operation of the aircraft, bleed air sources include, but are not limited to, left engine(s) 102, right engine(s) 104, and the auxiliary power unit (APU) 116. During ground operation of the aircraft, bleed air sources include, but are not limited to, APU 116 and ground pneumatic sources 118.
Bleed air provided by the APU 116, the left engine(s) 102, and the right engine(s) 104 is supplied via bleed airflow manifold and associated pressure regulators and temperature limiters to the air conditioning units 108 of the aircraft. In this context, the term “air conditioning” is not limited to cooling but refers to preparing air for introduction into the interior of the aircraft fuselage 101. Air conditioning units 108 may also mix recirculated air from the cabin zones 106a-106g and flight deck 103 with bleed air from the previously mentioned sources. An environmental control unit controller 110 controls flow control valve(s) 114 to regulate the amount of bleed air supplied to the air conditioning units 108. Bleed valve(s) 125 are used to select the bleed sources.
Each air conditioning unit 108 typically includes a dual heat exchanger, an air cycle machine (compressor, turbine, and fan), a condenser, a water separator and related control and protective devices. Air is cooled in the primary heat exchanger and passes through the compressor, causing a pressure increase. The cooled air then goes to the secondary heat exchanger where it is cooled again. After leaving the secondary heat exchanger, the high-pressure cooled air passes through a condenser and a water separator for condensed water removal. The main bleed airstream is ducted to the turbine and expanded to provide cold airflow and power for the compressor and cooling fan. The cold airflow is mixed with warm air supplied by the recirculation fan and/or with the hot bypass bleed air immediately upon leaving the turbine.
The environmental control unit controller 110 receives input from the sensors 120 in the cabin zones 106a-106g and the flight deck 103. The pilot or crew also inputs parameters such as number of occupants and desired cabin temperature. Based on these and other parameters, the environmental control unit controller 110 calculates a proper ECS airflow target to control flow control valves 125. The ECU controller 110 provides the air conditioning unit 108 with instructions/commands/control signals 111 to control the flow control valves 125 and other aspects of the system operation. The system typically includes necessary circuitry and additional processing to provide necessary drive signals to the flow control valves 125.
Prior art
“Part 1” is directly related to the component—it is ontologically a “Component Reset”, a set of actions with the goal of restoring the state of a particular component or sub-system. When bleed air has failed, the example procedure instructs the flight crew to “push out” the affected bleed button (bleed button 1 or bleed button 2), wait one minute and then push the affected bleed button back in. The goal is to reset the bleed air valve 125 and associated support systems. The flight crew then is instructed to determine whether the “Bleed x Fail” message has been extinguished.
“Part 2” is related to a multiple failure scenario in which both bleeds 1 and 2 are affected. Part 2.1 (and Part 3 below) are ontologically “Components Isolation”, a set of actions with the goal of isolating the component or sub-system after it has been declared inoperative. Part 2.1 instructs the flight crew to push out both bleed button 1 and bleed button 2. Notice that with the component mindset, every separate combination must be analyzed and treated individually, thus making it very difficult to deal with multiple failures in large systems due to combinatorial explosion.
Part 2.2 instructs the flight crew to “exit/avoid” any icing conditions (because the bleed air used to melt ice building up on the wings and fuselage is now presumably inoperative) and also instructs the flight crew to fly at an altitude of no more than 10,000 feet or the minimum enroute altitude (MEA), whichever is higher, to compensate for the loss of ice protection and of cabin pressure/temperature control (each of which can depend on bleed air). As is well known, MEA is the altitude for an enroute segment that provides adequate reception of relevant navigation facilities and ATS communications, complies with the airspace structure and provides the required obstacle clearance. Part 2.2 is thus ontologically linked to the loss of function, and not to the component itself, in this case the loss of the functions “Ice protection” and “Cabin Pressure/Temperature Control”.
Part 2.3 addresses the possible use of the APU to provide bleed air in lieu of the engines. Part 2.3 states that, if the APU is available, the maximum altitude for an APU in-flight start is 31,000 feet; the flight crew should push the APU on/off button in; and the flight crew should then push the APU START button in, thereby activating the auxiliary power unit 116. Part 2.3 is also not related to the bleed subsystem itself, but to the use of a redundant sub-system that can provide a function that has been lost, in this case the APU 116, which can also provide bleed air to pressurize and control temperature in the cabin. Ontologically it is a “Component Activation”.
Part 2.4 and part 4 are ontologically “Operational limitations” related to the new configuration of the system (APU 116 providing bleed air for 2.4 and Single Bleed for 4). Part 2.4 defines a maximum operating altitude of 20,000 feet when the APU 116 is being relied on to provide bleed air. There is also a caveat concerning landing configuration when relying on the APU 116 for bleed air.
Part 3 instructs the flight crew to push out certain buttons (i.e., the affected bleed button), and it is also a Component Isolation. Part 4 specifies a maximum altitude (e.g., 35,000 feet) and asks the flight crew to determine whether icing conditions are present. If icing conditions are present, Part 4 instructs that an Anti Ice (AI) single bleed procedure be accomplished. Thus Part 4 is ontologically a set of operational limitations due to the loss of a function.
The
As illustrated in the
Additionally, prior automated approaches generally do not capture the tacit knowledge of the operator. Rather, prior approaches often have a different focus, address the problem differently or do not have the same coverage (e.g., some address only limited problems such as fire/smoke events). For example:
This patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
The following detailed description of exemplary non-limiting illustrative embodiments is to be read in conjunction with the drawings of which:
Example non-limiting embodiments of improved aircraft automated diagnostic and fault detection systems and methods provide the following features and advantages:
Example non-limiting embodiments propose a display or other output that is intended to help manage abnormal situations, and use its structure as a means to allow automated intervention and artificial intelligence training. The kind of tacit knowledge that will be used in specific parts of example methods of embodiments defines heuristics. In this case, a “functional based” model may be used by the pilot in order to define the intervention in complex scenarios. Other models are possible, such as the architectural model or the energy based model.
This application is technology agnostic and may be applied to any complex system subject to failures that needs intervention in emergency situations. Example non-limiting embodiments are structured in an agnostic manner, and therefore are applicable to any kind of complex system, such as submarines, air carriers, satellites, rockets, etc.
When this specification uses the term “function”, it is referring to a functional capability of a complex system as defined in the systems engineering field of knowledge. Examples of system functions are:
Providing Protection from Explosions, Preventing the Release of Radioactive Material, etc.
To provide a better understanding of the non-limiting improved technology, a non-limiting application example in the aeronautical industry (an aircraft) will be described.
Example Integration Framework Overall Description
As one specific simplified example, in the case of an aircraft environmental control system of the type shown in
In the example shown,
An example first step in or function of the System Manager Intervention Process is to identify the failure. This is done by the block number (1) in
The second step is to define the intervention procedure to be applied to the system during a failure event. This is depicted in
Block 3 is the Context Identification 303. It reads context information and applies rules extracted from experienced operators to map special situations in which some actions on the system are forbidden not only because of the system itself, but also because of the current context. For example, in an aircraft during a left turn, it is not recommended to shut down the left engine, because the yawing moment from the right engine might be too large to counteract with the rudder alone. Thus, during a left engine fire, it is recommended to level the aircraft wings prior to shutting the left engine down. This kind of action (level the wings prior to shutting down the engine) would normally not be on any kind of checklist, because it is situation specific. As another example, assume the action is to descend to 10,000 ft following aircraft depressurization. If the aircraft is currently over the Himalaya mountain range with 29,000 ft ground height, the aircraft should exit this geographical area prior to descending to avoid controlled flight into terrain. This kind of rule is implemented in the Context ID block, which will later modify the procedures proposed by block 2.
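The two context rules described above can be illustrated with a minimal sketch. It assumes that a procedure is an ordered list of step names and that each rule inserts a prerequisite step ahead of a triggering step when a context predicate holds. The names `ContextRule`, `apply_context_rules` and `RULES`, the context keys, the step strings and the numeric thresholds are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, float]


@dataclass
class ContextRule:
    """Hypothetical rule: when `applies(context)` is true, `prerequisite`
    must be performed before `trigger_step`."""
    applies: Callable[[Context], bool]
    trigger_step: str
    prerequisite: str


def apply_context_rules(procedure: List[str], context: Context,
                        rules: List[ContextRule]) -> List[str]:
    """Return a copy of `procedure` with context-driven prerequisites inserted."""
    modified: List[str] = []
    for step in procedure:
        for rule in rules:
            if step == rule.trigger_step and rule.applies(context):
                modified.append(rule.prerequisite)
        modified.append(step)
    return modified


# Assumed conventions: negative bank angle means a left turn; the 5 degree and
# 10,000 ft thresholds are illustrative placeholders only.
RULES = [
    ContextRule(lambda c: c["bank_angle_deg"] <= -5.0,
                "shut down left engine", "level the wings"),
    ContextRule(lambda c: c["terrain_elevation_ft"] >= 10_000.0,
                "descend to 10,000 ft", "exit high-terrain area"),
]
```

With a context of a 20 degree left bank over 29,000 ft terrain, the procedure "shut down left engine, descend to 10,000 ft" would come back with "level the wings" and "exit high-terrain area" inserted ahead of the respective steps. In level flight over low terrain it would come back unchanged.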
Block 4 (“outcome prediction intervention definition” 304) consists of a model of the system and a reward function. The procedures provided by block 2 and modified by block 3 are simulated and the results of the simulation are compared. The best procedure in this specific scenario is chosen through the reward function. Again, the functional ontology may be used to define a suitable reward function, since the goal of the intervention is to maximize system functionality.
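As a rough illustration of this simulate-and-compare step, the sketch below runs each candidate procedure through a system model and keeps the one with the highest reward. It is a toy under stated assumptions, not the specification's implementation: the state is a flat dictionary, the model is a step function, and `simulate`, `select_best`, `toy_model` and `toy_reward` are hypothetical names with made-up dynamics and weights.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

State = Dict[str, float]
Procedure = Sequence[str]


def simulate(model: Callable[[State, str], State], state: State,
             procedure: Procedure) -> State:
    """Apply each step of `procedure` to a copy of `state` using `model`."""
    current = dict(state)
    for step in procedure:
        current = model(current, step)
    return current


def select_best(model: Callable[[State, str], State],
                reward: Callable[[State], float],
                state: State,
                candidates: Sequence[Procedure]) -> Tuple[Optional[Procedure], float]:
    """Simulate every candidate and return the highest-reward one with its score."""
    best, best_score = None, float("-inf")
    for procedure in candidates:
        score = reward(simulate(model, state, procedure))
        if score > best_score:
            best, best_score = procedure, score
    return best, best_score


def toy_model(state: State, step: str) -> State:
    """Hypothetical dynamics: two steps, each changing one state variable."""
    new = dict(state)
    if step == "start APU":
        new["bleed_air"] = 1.0
    elif step == "descend":
        new["altitude_ft"] = 10_000.0
    return new


def toy_reward(state: State) -> float:
    """Hypothetical reward: bleed air counts double, safe altitude counts once."""
    return 2.0 * state["bleed_air"] + (1.0 if state["altitude_ft"] <= 10_000.0 else 0.0)
```

Starting from no bleed air at 30,000 ft, the candidates "descend", "start APU" and "start APU, descend" score 1.0, 2.0 and 3.0 under this toy reward, so the combined procedure is selected.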
It is worth mentioning that when using the functional ontology for training an artificial intelligence, machine learning system or neural network, or for defining a reward function for selecting the best intervention, it is useful to use a slightly different (but conceptually equivalent) structure than the one used in the System State Graph (SSG). This improves the independence of the solutions: an optimization algorithm will try to maximize the function and may find an illogical solution, so testing and training should have independent metrics. Also, in addition to terms related to the system functionality, other operationally related terms are included in the reward function. Examples of such terms for an aircraft are fuel consumption, time taken to reach the landing site, the relationship between landing distance capability in each configuration versus the runway distances of the potential landing airports, etc. The procedure steps and the expected system behavior after each step are passed to block 5 for execution. See, for example, Krotkiewicz et al, “Conceptual Ontological Object Knowledge Base and Language”, Computer Recognition Systems pp 227-234, Advances in Soft Computing book series (AINSC, volume 30); Cali et al, New Expressive Languages for Ontological Query Answering, Twenty-Fifth AAAI Conference on Artificial Intelligence (2011); Welty, C. (2003). Ontology Research. AI Magazine, 24(3), 11. https://doi.org/10.1609/aimag.v24i3.1714 (all incorporated herein by reference).
In the example shown, Block 5 (“Procedure Application and Outcome Matching” 305) applies the procedure on the system step by step, and after each step checks whether the system behavior is as expected by the simulation. If yes, the execution continues; otherwise, an alert is issued to a human operator (who can be onboard or at a remote location) and the execution is halted, waiting for human action. In some non-limiting embodiments, block 5 serves as a safety net against internal failure in the system manager, since it checks whether its own premises and control actions/responses are being satisfied in the real system under control 310. Depending on system design, not all system parameters may need to be checked in this stage; a select group, or a custom group depending on which kind of action is being taken, may be checked instead. Also, for continuous values (such as temperatures, pressures, etc.), acceptable margins of error may be included. Notice that if more than one possible failure was detected in block 1 “Failure identification”, more than one procedure may be passed by Block 2 “Intervention definition”, with more than one possible outcome. Block 5 is responsible for trying the possible procedures and, through outcome matching, determining which failure has occurred. This is done by first trying the procedure for the most probable failure (informed by Block 1) and, in case the outcomes do not match, reverting the actions and trying the next one.
Block 6 (“Simulation Station Engine” 306) is an optional part of the framework that is designed in some instances to be used only when the framework is configured to be operated by a human operator, not in autonomous use. Its function is explained in the next section.
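The apply, check, revert and retry behavior described above might be sketched as follows. This is a simplified illustration under several assumptions: each step carries the readings the simulation expects after it, only the parameters listed in the expectation are checked, and the operator alert is raised once every candidate has been tried. The names `matches`, `run_procedure` and `identify_failure`, and the callback signatures, are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Reading = Dict[str, float]
Step = Tuple[str, Reading]  # (action, readings expected after the action)


def matches(actual: Reading, expected: Reading, tolerance: Dict[str, float]) -> bool:
    """Compare only the parameters named in `expected`, within per-parameter margins."""
    return all(abs(actual[key] - value) <= tolerance.get(key, 0.0)
               for key, value in expected.items())


def run_procedure(steps: List[Step], act: Callable[[str], None],
                  read: Callable[[], Reading], revert: Callable[[str], None],
                  tolerance: Dict[str, float]) -> bool:
    """Apply steps one by one; on the first mismatch, undo them in reverse order."""
    done: List[str] = []
    for action, expected in steps:
        act(action)
        done.append(action)
        if not matches(read(), expected, tolerance):
            for performed in reversed(done):
                revert(performed)
            return False
    return True


def identify_failure(ranked: List[Tuple[str, List[Step]]],
                     act: Callable[[str], None], read: Callable[[], Reading],
                     revert: Callable[[str], None], tolerance: Dict[str, float],
                     alert: Callable[[str], None]) -> Optional[str]:
    """Try procedures from most to least probable failure; alert if none matches."""
    for failure, steps in ranked:
        if run_procedure(steps, act, read, revert, tolerance):
            return failure
    alert("no candidate procedure matched the predicted outcome; awaiting operator")
    return None
```

Passing `act`, `read` and `revert` as callbacks keeps the sketch independent of any particular system under control. The same functions could drive a simulator in testing and, in principle, real actuators in operation.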
Example Use of the Integration Framework for Autonomous Operation or as an Operation Assistant
The Integration framework can be used basically in two ways:
In some applications, it may be best if the non-limiting technology is used as an autonomous agent only after its development is mature and well tested. Minor operator intervention will be requested in cases where block 4 “Outcome prediction” does not find any suitable intervention, or where block 5 “Procedure application and Outcome Matching” finds a mismatch between expected result and actual result.
Before the non-limiting technology matures, or if the designer so chooses, the non-limiting technology may be implemented to function as an advisor to the human operator. In this case, the direct link from the system manager to the system under control will be removed, and several displays and functionalities will be provided to serve as the system's Human-Machine-Interface (HMI). The human will have the responsibility of interacting with this HMI, reasoning and then manually interacting with the system under control. Some possible HMI functionalities are described below.
The next section will describe an example non-limiting Integration framework that can be used with one or more defined intervention methods.
Example Intervention Method Integration Framework
In order to implement a solution to manage the operation of a complex system, an integration framework is provided in order to guarantee the correct system function. The
Example Function Based Intervention Method—Ontology
The function-based Intervention method is a system ontology that can be applied to any system to manage failures. Consider that a “System” is a combination of “Sub-Systems” and “Components”, that work together to perform “Functions”. “Sub-Systems” can also be defined as a combination of “lower level subsystems” and “components”. Notice that different abstraction levels can be represented and used when making partitions, and the level(s) used will depend on design characteristics and domain expertise, but more than one division may be applicable to the same system.
In order to implement a Function Based Intervention, it is helpful to divide the system into one suitable abstraction of System, Sub-Systems and Components, and link the behaviors of those parts together with the functions they perform. The system may then be modeled with a data structure (that can be a matrix, a graph or other suitable structure) having “abstract functional” elements such as functions, and also physical concrete elements as the components. The data structure may be stored in non-transitory memory in a conventional form such as nodes as objects and edges as pointers; a matrix containing all edge weights between identified nodes; and a list of edges between identified nodes. The data structure may be manipulated, updated and searched using one or more processors.
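The three storage forms mentioned above (nodes as objects with edges as references, a matrix of edge weights, and a list of edges) can be illustrated with a small sketch. The `Node` class, the helper names and the function node name "Provide bleed air" are hypothetical, and the unit edge weight is only a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    name: str
    kind: str  # "function" (abstract) or "component" (physical)
    children: List["Node"] = field(default_factory=list)  # edges held as references


# Form 1: nodes as objects, edges as references to child objects.
bleed1 = Node("Bleed 1", "component")
bleed2 = Node("Bleed 2", "component")
bleed_air = Node("Provide bleed air", "function", [bleed1, bleed2])


def to_edge_list(root: Node) -> List[Tuple[str, str]]:
    """Form 3: list of (parent, child) edges reachable from `root`."""
    edges: List[Tuple[str, str]] = []
    stack, seen = [root], set()
    while stack:
        node = stack.pop()
        if node.name in seen:
            continue
        seen.add(node.name)
        for child in node.children:
            edges.append((node.name, child.name))
            stack.append(child)
    return edges


def to_matrix(names: List[str], edges: List[Tuple[str, str]],
              weight: float = 1.0) -> List[List[float]]:
    """Form 2: square matrix where entry [i][j] is the weight of edge i -> j."""
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0.0] * len(names) for _ in names]
    for parent, child in edges:
        matrix[index[parent]][index[child]] = weight
    return matrix
```

The object form is convenient for recursive traversal, while the matrix and edge-list forms are easier to serialize or to hand to numerical tools. Which one suits a given system is a design choice.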
After this relationship or these relationships have been mapped, suitable interventions may be defined for each element. These interventions are, in example non-limiting embodiments, ontologically linked to their elements and their own states, and do not extrapolate the boundaries of the elements (in some cases the procedures may refer to actions on other components due to the nature of the system, but this should be minimized). This ontological link enables the method to work well in different scenarios of multiple failures. In traditional “pure component based” intervention definitions, the procedures contain elements that are related to the component itself, to the function it performs, to redundant systems and so on. As a result, the sum of multiple interventions very easily becomes useless in a complex multiple failure scenario, since there is too much mixed information in each procedure.
Taking the
Example System State Graph Method
This section describes a way of implementing the Function based intervention, herein referred to as System State Graph (abbreviated as “SSG”), since it relies on a representation of the system that is similar to a fault tree, in which each node of the graph has a type and a current state that are used to guide the execution of the interventions. The word “System” in SSG has the meaning commonly found in systems theory and systems engineering (see, for example, Bertalanffy, L. von, General System Theory (New York 1969)), where a system is considered as an arrangement of components that perform functions. Only a top-level description is shown here; details are omitted for the sake of readability.
Example SSG Modeling
The first step to implement the SSG method is modeling the system SSG, which in one example non-limiting embodiment is a directed graph wherein the nodes have the following attributes (in addition to a “Name” attribute) as shown in Table I below:
As is well known, a directed graph is a graph that is made up of a set of vertices or nodes connected by edges, where the edges have a direction associated with them.
In example non-limiting embodiments, the system is classified into the elementary parts and their relationships mapped in a directed graph.
Note how the diamonds divide the functional (upper) and architectural (lower) domains.
The upper functional domain of the graph comprises function nodes, and the lower architectural domain of the graph comprises component nodes. Thus, in the lower “architectural” domain shown in
In the functional domain of
As noted above, the diamonds 270 between the architectural domain and the functional domain represent functional thresholds. Note further that the functional domain (top of figure) is abstracted from the architectural domain (bottom of figure) so that the functional domain is not specific to or dependent on any particular components the architectural domain describes, but instead depends in this case on logic outputs and one degradation input the architectural domain outputs. In some embodiments, the functional domain is independent of the particular aircraft or other platform, and different specific architectural domains can be used depending on different aircraft configurations (e.g., twin engine, four engine, etc.)
Example Types of Procedures
After modeling the SSG, the procedures for each node state are defined. Those procedures are executed at nodes transitions or when requested by a monitoring algorithm. Those procedures are ontologically different from the ones defined with an architectural mindset, as explained previously. Examples of such procedures are shown in Table II below:
Example Non-Limiting SSG Search Algorithm
In example embodiments, the SSG search algorithm is a monitoring routine that monitors the SSG states and calls the procedures when applicable. With a simple solution, it is able to search through the SSG and reconfigure the system according to different situations. It monitors all states at a (polling or other reporting) frequency defined depending on system dynamics and does the following:
Execute any (Loss Of Function—Expeditious)
SSG Top-Down Functional Search Description
In one example embodiment, a search is initiated at every functional threshold, and goes down the SSG to try to recover a lost or degraded function.
In example embodiments, the search has the following simplified routine:
Notice that the top-down search is recursive, and in case it finds (not available) components, it will go down the graph and continue to try to restore the state of the nodes above by following the same rules.
Notice also that this is only one possible search algorithm. Many others may be developed over the same structure. One possible solution is to have the search being started from the failed component and try to restore the system from bottom-up. In other embodiments, a mixed approach may be applied. In addition, the example non-limiting embodiments are not limited to AND and OR Boolean logic, but can use any type of combinatorial logic such as NAND, NOR, and multiple-input logic functions.
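To make the idea of a recursive top-down search concrete, the sketch below plans which standby components to activate so that a lost function becomes available again. It is only one possible reading, under assumptions not stated in the specification: leaves are components with `failed` and `active` flags, inner nodes combine their children with AND or OR only, and an OR node prefers the feasible branch needing the fewest activations. The names `SSGNode`, `plan_restore` and `example_graph`, and the small graph itself, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SSGNode:
    name: str
    gate: str = "OR"         # "AND" or "OR" over children (ignored for leaves)
    children: List["SSGNode"] = field(default_factory=list)
    failed: bool = False      # leaf only: component is inoperative
    active: bool = True       # leaf only: False means healthy but on standby


def plan_restore(node: SSGNode) -> Optional[List[str]]:
    """Return the activations needed to make `node` available, or None if impossible."""
    if not node.children:  # component leaf
        if node.failed:
            return None
        return [] if node.active else [f"activate {node.name}"]
    child_plans = [plan_restore(child) for child in node.children]
    if node.gate == "AND":  # every child must be recoverable
        if any(plan is None for plan in child_plans):
            return None
        return [action for plan in child_plans for action in plan]
    # OR gate: any recoverable child suffices; prefer the fewest actions
    feasible = [plan for plan in child_plans if plan is not None]
    return min(feasible, key=len) if feasible else None


def example_graph(engine2_failed: bool = False, apu_failed: bool = False) -> SSGNode:
    """Hypothetical fragment: bleed air from Engine 2 via Bleed 2, or from a standby APU."""
    engine_path = SSGNode("Engine 2 bleed path", gate="AND", children=[
        SSGNode("Engine 2", failed=engine2_failed),
        SSGNode("Bleed 2"),
    ])
    apu = SSGNode("APU", failed=apu_failed, active=False)
    return SSGNode("Bleed air supply", gate="OR", children=[engine_path, apu])
```

In this toy graph, a healthy system needs no action, an Engine 2 failure yields a plan to activate the APU, and a simultaneous APU failure yields no plan at all, which would be a case to hand over to the operator. The function only plans and does not mutate the graph, so a caller can inspect or simulate the plan before applying it.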
Example SSG Method Sample Execution
This section presents a sample of the method execution to illustrate how it works, on the graph of
In the
The following example SSG traversal and analysis is explained in conjunction with a flipbook animation of
Example Pack Failure
Example Non-Limiting Pack Failure with Subsequent Bleed 2 Failure
Example Pack Failure with subsequent Bleed 2 Failure and Subsequent OFV failure
With the above three examples, it becomes easy to see the power of the example non-limiting method and system, and how example embodiments would adapt in different situations. If, for example, in the second example the Engine 2 had failed instead of the Bleed 2, the algorithm would activate the APU to provide Bleed air.
Notice also that in this example the SSG was modeled to a certain point (finishing on the engines and APU). When the system gets bigger, the method may be applied with different graphs for different major functions, or with only one single integrated graph connecting all the systems and subsystems.
As can be seen, the SSG method is agnostic and can be applied to any system composed of sub-systems and components that interact to perform given functions, by modelling the correct system state graph and applying the same algorithm. As a non-limiting embodiment
Example Use of the Function Ontology for Artificial Intelligence Training
As shown in the previous sections, the Function system ontology is a powerful way of describing the system and its desired states. This means that it is also an efficient way to design reward functions to train artificial intelligence algorithms to perform systems intervention by maximizing this function.
The SSG, for example, can be easily converted into a mathematical equation in which the states of each function, sub-function and component are given weighted values depending on their importance for the safe continuation of the flight (using the criticality of losing each function as per the system safety assessment is a good driver for those weights; see FAA AC 25.1309), and thus can be used as a reference to train an artificial intelligence.
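One simple way to picture such an equation is a normalized weighted sum of node states. The sketch below assumes each state is a number between 0 (lost) and 1 (fully available) and that weights are positive. The name `functional_score`, the normalization and the example weights are hypothetical illustrations, not values from any safety assessment.

```python
from typing import Dict


def functional_score(states: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of node states; 1.0 means every weighted node is fully available.

    Nodes missing from `states` are treated as lost (0.0).
    """
    total = sum(weights.values())
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    return sum(weight * states.get(name, 0.0)
               for name, weight in weights.items()) / total


# Hypothetical weights: a higher weight stands for a more critical function.
EXAMPLE_WEIGHTS = {
    "Cabin pressure control": 10.0,
    "Ice protection": 5.0,
    "Cabin temperature control": 2.0,
}
```

With pressure control fully available, ice protection lost and temperature control half degraded, the score under these example weights is 11/17, about 0.65. A learning algorithm rewarded by this value would be pushed toward interventions that restore the most heavily weighted functions first.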
Example Displays
Example—Predicted Failures 1004
The list of predicted failures can be shown. If more than one possibility is generated by the algorithm, the options can be shown and ranked according to probability.
Example—Recommended Procedure 1006
The Recommended procedure can be shown on a display either for manual execution by a human operator (if the system is in a passive mode) or for the human operator awareness of what the system is doing. The list of forbidden or recommended actions due to the current context can be shown together with the boundary conditions that they are related to.
Example—SSG Display 1008 and Functional Status Display 1002
The SSG structure and current node statuses can be plotted on a display for the operator to immediately gain situation awareness of the system's current status. This is shown in section 1008. In some embodiments, such information could be presented in forms other than or in addition to graphical display, such as aurally.
In addition to the SSG structure, other information can also be plotted such as the overall scores for the functions if such weights for the functions have been given and implemented. See section 1002 and
In one example embodiment, those 3 values are plotted for the operator in a functional status display. A sample design of this display is shown in
Note that the functional display of example non-limiting embodiments provides exactly the information about what is still working as described above in connection with the Qantas flight. It is thus an alternative resource for information gathering and immediate awareness. The ATSB report indicates on page 176 and in figure All that the crew spent more than 25 minutes progressing through a number of different systems, and records their recollection of seeking to understand what damage had occurred and what systems functionality remained. A functional display such as the one proposed would give this information in an instant.
Example List of Possible Interventions 1006
The list of possible interventions can be shown so the operator can choose which one to use according to his own internal mental models. The scores for each one can also be shown to guide this process.
Example Simulation Station
In addition to displays, a dynamic simulation environment can be made available to the human operator so that she can simulate possible interventions and check the outcome. This is represented by block 6 in
Depending on the system and human factors analysis, the simulation station may not be suitable to have on board due to the possibility of attention tunneling or other human factors issues. But it may be very suitable for remote stations assisting the operation with larger teams (for example in a scenario where a single pilot of an aircraft is assisted by a ground station).
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiments, it is to be understood that the invention is not to be limited to the disclosed embodiments, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IB2019/061307 | 12/23/2019 | WO |