1. Technical Field
This invention relates to the field of software fault mitigation and more specifically to methods of recovering a software generic fault in a flight control system.
2. Background Art
Any discussion of the prior art throughout the specification should in no way be considered as an admission that such prior art is widely known or forms part of common general knowledge in the field.
It is known in the field of redundant flight computing to run three flight control computers in parallel so that the failure of one computer, or of two, does not cause a catastrophic failure, such as the loss of an aircraft. Within each of these computers, there is typically a set of processors that run in parallel so that an erroneous output signal is not produced. In the art, the redundant computers are referred to as ‘channels’ and the processors and associated redundant input/output circuitry within each computer are referred to as ‘lanes’.
It is also known to interconnect the several computers in a multiple computer system with a set of ‘cross-channel data links’ 5A, 5B, and 5C, where these data links allow each of the computer channels 10, 20, and 30 to share data with the other computer channels. As shown, the cross-channel data link 5A from channel A provides data contained within computing channel A 10 to both computing channel B 20 and computing channel C 30. The cross-channel data links 5B and 5C, from computing channels B and C respectively, function in a similar manner.
A major concern in the implementation of redundant computational systems is the occurrence of generic faults. This class of failure could, with a single fault, disable an entire system if the system included only two processors per channel, because the fault would be common in all channels. This generic failure could be either a ‘design fault’ or a ‘manufacturing fault’. A design fault can occur in either hardware or software. A manufacturing fault is where a particular batch of hardware or a particular release of software includes an inherent defect. The design for a typical system is validated by performing hardware simulations at extreme tolerances and by qualification tests performed on prototype hardware. Hardware manufacturing faults are detected by acceptance test procedures (ATP) that validate that the produced article is as designed.
Extensive and exhaustive testing of the particular source code for a typical system validates the software design. This software testing may be executed on the target hardware or on an emulation of that hardware. An alternate validation approach that has proven to be extremely expensive is where a second software team develops a package for real-time comparison on the target hardware.
The software development environment (autocode mechanisms, compilers, assemblers, loaders, etc.) can introduce software “manufacturing” faults. Extensive testing of the software on the target hardware may not be sufficient to detect all faults, as some data-dependent combinatorial paths may be missed.
During operation of a flight control computer, a generic software fault can manifest itself in two different ways. In the first, the operational flight program (OFP) software in all channels “gets lost” and there is a total loss of the system. In the second, the OFP in all channels produces an erroneous output, but the system continues to appear to operate normally because no miscomparisons have occurred between channels. Either scenario should be detected by extensive testing of the binary code on the target hardware. However, if sufficient testing is not performed, the generic fault could occur and lead to a potentially hazardous condition.
The art has progressed to the point where both dissimilar processors and dissimilar software are used in each flight control computer lane. A leading example of this approach is described in Hay (U.S. Pat. No. 5,550,736) which shows a monitoring method for a fail-operational fault tolerant flight critical computer architecture. Such an architecture, for three redundant computing channels, is shown in
Accordingly, the method and architecture according to Hay require at least three different processor types, such as from different processor families. However, only two processor families, the x486 processor and the PowerPC®, are currently experiencing sufficient commercial success to ensure technical currency and development. It does not appear that a third processor family will be developed and enjoy large production numbers.
The present mitigation method for a triplex channel dual processor lane architecture that ‘gets lost’ is to sense the simultaneous lost situation in all three channels based on, for example, a simultaneous loss of three watchdog timers and a resultant restart of a computational frame in each channel. This method allows the processing to recover from a specific “gets lost” scenario, but it does not address an erroneous calculation scenario, nor does it protect against the recurrence of a generic “gets lost” failure. The present mitigation method for the erroneous calculation failure mode is to have a processor of a different type (e.g., Pentium vs. PowerPC) monitor the main processor. This monitor processor would use the same source code as the main processor, but since its development environment is different, failures introduced by either environment would be detected in the real-time application. Unfortunately, the failure would be detected simultaneously in all three channels of a triplex channel system, and the embedded redundancy management scheme would drop the entire system. This situation has been mitigated in the past by the introduction of a third dissimilar processor, as discussed previously. If two of the three processors were to disagree, this third processor would control the system.
There is therefore a need for a fault tolerant computer architecture based on two rather than three distinct processor types.
There is a long-felt need for detecting a simultaneous fault in a system that includes two or more processors.
The following summary of the invention is provided to facilitate an understanding of some of the innovative features unique to the present invention. A full appreciation of the various aspects of the invention can only be gained by taking the entire specification, claims, drawings, and abstract as a whole.
The present invention is advantageously used with multi-computer real-time systems such as aircraft flight control systems. According to an aspect of my invention, the occurrence of a simultaneous fault will cause each channel of the system to revert to a “Get Home” mode. The “Get Home” mode is a software package consisting of a minimal, simple Operational Flight Program (OFP) that is capable of getting the aircraft home. This package would have been 100% tested, such as by deterministic mathematical methods, on the target hardware and is guaranteed to have no generic software or generic hardware faults.
Accordingly, my invention involves a system and a method of using two dissimilar processors, in which detection of a simultaneous fault causes reversion to a minimal-complexity, 100% tested backup operational mode.
My invention seeks to overcome or at least ameliorate one or more of several problems, including but not limited to providing a minimal fly-home capability for a fly-by-wire aircraft after a generic software fault. Further, as used in a multi-channel computer system for an airplane, my invention reduces the number of processors as compared to prior flight control computer systems.
Further advantages and embodiments of the present invention will become apparent from the following description and drawings.
The accompanying figures further illustrate the present invention.
The following is a list of the major elements in the drawings in numerical
Carrying Out the Invention
Each of the three computing channels 11, 21, and 31 separately receives aircraft sensor input data, processes this data, and outputs commands to aircraft actuators. The three computing channels 11, 21, and 31 of the present invention are intended to function in a similar manner as the three computing channels 10, 20, and 30 such as is shown in
Each of the three computing channels includes a main processor 113, identified in
Refer now to
Next, the monitor processor 114 reads the input data 121 from the shared memory 112, processes this data (step 43) to produce outputs, and places the resultant monitor processor outputs 123 back into the shared memory 112. The main processor 113 compares (step 44) its resultant data 122 with the resultant data 123 from the monitor processor 114. If a difference between the main processor resultant data 122 and the monitor processor resultant data 123 exceeds a predetermined threshold and persists, then the main processor 113 outputs a main processor “miscompare” discrete 124.
Next, the monitor processor 114 compares its resultant data 123 with the resultant data 122 from the main processor 113. If a difference between the monitor processor resultant data 123 and the main processor resultant data 122 exceeds a predetermined threshold and persists, then the monitor processor 114 outputs a monitor processor “miscompare” discrete 125.
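The threshold-and-persistence comparison that each processor applies to the other's resultant data can be sketched as follows. This is a minimal illustration only, not the flight implementation. The class name `MiscompareMonitor`, the parameter names, the use of an absolute difference on a single scalar output, the frame-count form of persistence, and the latching of the discrete are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class MiscompareMonitor:
    """Hypothetical model of one processor's miscompare check."""

    threshold: float         # assumed: largest tolerated absolute difference
    persistence_frames: int  # assumed: consecutive bad frames before tripping
    _count: int = 0          # consecutive frames over threshold so far
    tripped: bool = False    # models the "miscompare" discrete (latched here)

    def update(self, own_result: float, other_result: float) -> bool:
        """Compare one frame of data and return the miscompare discrete."""
        if abs(own_result - other_result) > self.threshold:
            self._count += 1
        else:
            # The difference did not persist, so the counter restarts.
            self._count = 0
        if self._count >= self.persistence_frames:
            self.tripped = True
        return self.tripped


# Main processor 113 checking its data 122 against monitor data 123.
main_check = MiscompareMonitor(threshold=0.5, persistence_frames=3)
frames = [(1.0, 2.0), (1.0, 2.0), (1.0, 2.0)]
history = [main_check.update(own, other) for own, other in frames]
```

In this sketch a transient difference that clears before the persistence count is reached does not raise the discrete, which is the purpose of the "persists" condition in the description. The monitor processor 114 would run the same check with the roles of the two data sets exchanged.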
If either the main processor 113 or the monitor processor 114 has issued a miscompare discrete, then the affected computing channel, for example computing channel A 11, issues a “Failure A” discrete 131. The “Failure A” discrete 131 is transmitted to the other two computing channels 21 and 31 and also arms the AND gate 134 for a possible “Generic Failure” discrete 132 (step 46). The cross-channel transmission of these discretes is preferably by hardwired discrete signals, such as +28 VDC/Ground.
During normal operation, the other two computing channels 21 and 31 perform a similar operation. If the “Failure B” discrete 141 and the “Failure C” discrete 142 are received at AND gate 134 from the other two computing channels 21 and 31, then the “Generic Failure” discrete 132 is issued. The “Generic Failure” discrete 132 issues (step 47) a program interrupt 133, which vectors the main processor 113 in each of the computing channels to run (step 48) a minimal “get home” software package 150. The “get home” software package 150 executes on the main processor 113 or, in other embodiments, on a separate processor. Because the package has been 100% tested, no further generic software or hardware faults can occur. In certain embodiments, the “get home” software is tested using deterministic mathematical methods.
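The discrete logic described above, as seen from computing channel A, can be modeled in a few lines. This is a hedged sketch under stated assumptions. The function names and the program labels `"get_home"` and `"normal_ofp"` are hypothetical, and the handling of a failure confined to one or two channels is left to the redundancy management scheme and is not modeled.

```python
def channel_failure(main_miscompare: bool, monitor_miscompare: bool) -> bool:
    """A channel issues its failure discrete if either processor miscompares."""
    return main_miscompare or monitor_miscompare


def generic_failure(failure_a: bool, failure_b: bool, failure_c: bool) -> bool:
    """Model of AND gate 134: armed by the local failure discrete, it fires
    only when the failure discretes of both other channels are also present."""
    return failure_a and failure_b and failure_c


def select_program(failure_a: bool, failure_b: bool, failure_c: bool) -> str:
    """Return the (hypothetical) label of the program the main processor runs."""
    if generic_failure(failure_a, failure_b, failure_c):
        # Models program interrupt 133 vectoring to package 150.
        return "get_home"
    # Assumption: anything short of a generic failure is handled elsewhere.
    return "normal_ofp"


# Channel A: the main processor miscompares, the monitor does not.
failure_a = channel_failure(main_miscompare=True, monitor_miscompare=False)

# All three channels report failure at once: a generic fault is declared.
all_fail = select_program(failure_a, failure_b=True, failure_c=True)

# Only channels A and B report failure: no generic fault is declared.
two_fail = select_program(failure_a, failure_b=True, failure_c=False)
```

Requiring all three failure discretes separates a generic fault, which is common to every channel, from an ordinary fault in one channel.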
The monitor processor 114 is powered on (step 61), hardware associated with the monitor processor is initialized (step 62), and the operating system, such as VxWorks®, associated with the monitor processor is invoked (step 63) prior to normal operation. In preferred embodiments, the method of the present invention is performed concurrently with normal operation. The monitor processor 114 function of executing the application program (step 64), shown in
Advantageously, my invention requires a total of six processors running in three independent computing channels. This contrasts with the prior art, which requires a total of twelve processors running in three independent computing channels to achieve similar functionality. This is achieved by taking advantage of extremely well-tested, commercially available processors that have literally billions of hours of cumulative operation in such devices as home computers.
Alternate embodiments may be devised without departing from the spirit or the scope of the invention.