Present and future high-reliability (i.e., space) missions require significant increases in on-board signal processing. Presently, the data generated on board cannot be transmitted over downlink channels in a reasonable time. As users of the generated data demand faster access, increasingly more data reduction or feature extraction processing is performed directly on the high-reliability vehicle (e.g., spacecraft) involved. Increasing the processing power on the high-reliability vehicle provides an opportunity to narrow the bandwidth required for the generated data and/or to increase the number of independent user channels.
In signal processing applications, traditional instruction-based processor approaches are unable to compete with million-gate, field-programmable gate array (FPGA)-based processing solutions. Systems with multiple FPGA-based processors are required to meet computing needs for Space Based Radar (SBR), next-generation adaptive beam forming, and adaptive-modulation space-based communication programs. As the name implies, an FPGA-based system is easily reconfigured to meet new requirements. FPGA-based reconfigurable processing architectures are also reusable and able to support multiple space programs with relatively simple changes to their unique data interfaces.
Reconfigurable processing solutions come at an economic cost. For instance, existing commercial-off-the-shelf (COTS), static random-access memory (SRAM)-based FPGAs are sensitive to radiation-induced upsets. Consequently, a traditional COTS-based reconfigurable system approach is unreliable for operating in high-radiation environments. In addition, existing brute-force approaches for detecting and mitigating susceptibility to a single event upset (SEU) or a single event functional interrupt (SEFI) have several disadvantages, such as reduced efficiency per processor and wasted system processing capacity.
Embodiments of the present invention address problems with determining single event fault tolerance in an electronic circuit and will be understood by reading and studying the following specification. Particularly, in one embodiment, a system for tolerating a single event fault in an electronic circuit is provided. The system includes a main processor that controls the operation of the system, a fault detection processor (e.g., an application-specific integrated circuit or ASIC) responsive to the main processor, and three or more field programmable logic devices (e.g., three or more FPGAs) responsive to the fault detection processor. The three or more programmable logic devices periodically issue independent input signals to the fault detection processor for determination of one or more single event fault conditions.
Like reference numbers and designations in the various drawings indicate like elements.
In the following detailed description, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration specific illustrative embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that logical, mechanical, and electrical changes may be made without departing from the spirit and scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense.
Although the examples of embodiments in this specification are described in terms of determining single event fault tolerance for high-reliability applications, embodiments of the present invention are not limited to such applications. Embodiments of the present invention are applicable to any fault tolerance determination activity in electronic circuits that requires a high level of reliability. Alternate embodiments of the present invention utilize external triple modular redundancy (TMR) with three or more programmable logic devices operated synchronously with one another. When the number of single event faults detected in one of the devices exceeds an adjustable threshold, the device is automatically reconfigured and the three or more devices are resynchronized within a minimum allowable time frame.
Fault detection processor 106 is any programmable logic device (e.g., an ASIC) with a configuration manager, the ability to host TMR voter logic, and an interface providing at least one output to a distributed processing application system controller, similar to system controller 110. TMR requires each of logic devices 104A to 104C to operate synchronously with respect to one another. Control and data signals from each of logic devices 104A to 104C are voted against each other in fault detection processor 106 to determine the legitimacy of the control and data signals. Each of logic devices 104A to 104C is a programmable logic device such as a field-programmable gate array (FPGA), a complex programmable logic device (CPLD), a field-programmable object array (FPOA), or the like.
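The two-of-three vote performed on the control and data signals can be sketched as a bitwise majority function. The sketch below is illustrative only; the function name and word widths are not taken from the specification.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    # Bitwise two-of-three majority: each output bit takes the value
    # held by at least two of the three input words, masking a
    # single-bit upset in any one device.
    return (a & b) | (a & c) | (b & c)

# Example: one device suffers a single-bit upset; the vote masks it.
word = 0b1011_0101
upset = word ^ 0b0000_1000  # bit 3 flipped in one device
assert tmr_vote(word, upset, word) == word
```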
System 100 can form part of a larger distributed processing application (not shown) using multiple processor assemblies similar to fault detection processor assembly 102. Fault detection processor assembly 102 and system controller 110 are coupled for data communications via distributed processing application interface 112. Distributed processing application interface 112 is a high-speed, low-power data transmission interface such as Low Voltage Differential Signaling (LVDS), a high-speed serial interface, or the like. Also, distributed processing application interface 112 transfers at least one set of default configuration software machine-coded instructions for each of logic devices 104A to 104C from system controller 110 to fault detection processor 106 for storage in logic device configuration memory 108. Logic device configuration memory 108 is a double-data rate synchronous dynamic random-access memory (DDR SDRAM) or the like.
In operation, logic device configuration memory 108 is loaded during initialization with the at least one set of default configuration software machine-coded instructions. Fault detection processor 106 continuously monitors each of logic devices 104A to 104C for one or more single event fault conditions. The monitoring of one or more single event fault conditions is accomplished by TMR voter logic 202, described in further detail below.
TMR voter logic 202 and configuration manager 204 are coupled for data communications to register bus control logic 210 by voter logic interface 220 and configuration manager interface 224. Voter logic interface 220 and configuration manager interface 224 are bi-directional communication links used by fault detection processor 106 to transfer commands between control registers within TMR voter logic 202 and configuration manager 204. Register bus control logic 210 provides system controller 110 with access to these control registers.
Memory controller 206 receives the at least one set of default configuration software machine-coded instructions for storage in logic device configuration memory 108 via bus arbiter interface 228, SOC bus arbiter 208, and memory controller interface 216. Bus arbiter interface 228 provides a bi-directional, inter-processor communication interface between SOC bus arbiter 208 and inter-processor network interface 212. SOC bus arbiter 208 transfers memory data to and from memory controller 206 via memory controller interface 216. Memory controller interface 216 provides a bi-directional, inter-processor communication interface between memory controller 206 and SOC bus arbiter 208. The set of default configuration software machine-coded instructions discussed above with respect to logic device configuration memory 108 is used to reconfigure each of logic devices 104A to 104C. SOC bus arbiter 208 provides access to memory controller 206 based on instructions received from TMR voter logic 202 on voter logic interface 218. Voter logic interface 218 provides a bi-directional, inter-processor communication interface between TMR voter logic 202 and SOC bus arbiter 208. SOC bus arbiter 208 is further communicatively coupled to configuration manager 204 via configuration interface 222. Configuration interface 222 provides a bi-directional, inter-processor communication interface between configuration manager 204 and SOC bus arbiter 208. The primary function of SOC bus arbiter 208 is to provide equal access to memory controller 206 and logic device configuration memory 108 for both TMR voter logic 202 and configuration manager 204.
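The "equal access" behavior of SOC bus arbiter 208 resembles an alternating-priority (round-robin) grant between its two requesters. The model below is a hypothetical sketch; the class and method names are invented for illustration and do not appear in the specification.

```python
class BusArbiter:
    """Alternating-priority grant between two requesters (e.g., the
    TMR voter logic and the configuration manager). Illustrative only."""

    def __init__(self):
        self.last = 1  # requester granted most recently

    def grant(self, req0: bool, req1: bool):
        # When both request, alternate so neither requester can
        # starve the other; otherwise grant the sole requester.
        if req0 and req1:
            self.last = 1 - self.last
        elif req0:
            self.last = 0
        elif req1:
            self.last = 1
        else:
            return None  # bus idle
        return self.last
```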
In operation, configuration manager 204 performs several functions with minimal interaction from system controller 110.
Each of word synchronizers 304A to 304C receives one or more original input signals from each of device interface paths 230A to 230C, respectively, as described above with respect to FIG. 2. Each of the one or more original input signals includes a clock signal in addition to input data and control signals from each of logic devices 104A to 104C.
In an exemplary embodiment, the synchronized outputs from logic devices 104A to 104C are transferred into TMR/DMR word voter 310. TMR/DMR word voter 310 incorporates combinational logic to compare each synchronized output from one of logic devices 104A to 104C against corresponding synchronized outputs from the remaining two of logic devices 104A to 104C. When two of three corresponding synchronized outputs are a logic one (zero), TMR/DMR word voter 310 produces a logic one (zero). Fault detection block 311 inside TMR/DMR word voter 310 determines which of logic devices 104A to 104C is miscomparing (i.e., disagreeing). An output pattern from fault detection block 311 contains three signals, all logic ones, when each of logic devices 104A to 104C is in agreement. If one of logic devices 104A to 104C miscompares, two of the signals within the output pattern become logic zero, while the remaining signal stays a logic one and identifies the two logic devices that still agree. The two agreeing logic devices of logic devices 104A to 104C continue to operate in a self-checking pair (SCP) or DMR mode. Once one of logic devices 104A to 104C is determined to be at fault, miscompares between the two remaining logic devices in SCP mode signal a fatal error. In this embodiment, the fatal error is reported to system controller 110.
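One way to read the three-signal output pattern is as pairwise agreement flags (A-B, B-C, C-A): all ones when the devices agree, and a single surviving one that names the pair still in agreement, isolating the faulty device. The sketch below models that interpretation; the encoding and function names are assumptions for illustration, not the specification's exact representation.

```python
def fault_pattern(a, b, c):
    # Pairwise agreement flags (ab, bc, ca): 1 when the pair matches.
    return (int(a == b), int(b == c), int(c == a))

def faulty_device(pattern):
    # Map a pattern with a single agreeing pair to the odd device out.
    # None -> full agreement; "fatal" -> no pair agrees (SCP miscompare).
    return {
        (1, 1, 1): None,  # all three devices agree
        (0, 1, 0): "A",   # B and C still agree
        (0, 0, 1): "B",   # C and A still agree
        (1, 0, 0): "C",   # A and B still agree
    }.get(pattern, "fatal")
```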
In a different embodiment, the synchronized outputs contain an instruction from one of logic devices 104A to 104C informing TMR voter logic 202 to switch into an auxiliary mode. Auxiliary mode does not incorporate the triple modular redundancy features described in the present application. In auxiliary mode, the synchronized outputs from each of logic devices 104A to 104C are transferred into auxiliary mode arbiter 306 to compete for eventual access to the inter-processor SOC bus along voter logic interface 218. Auxiliary mode multiplexer 308 selects which of the synchronized outputs from a selected logic device (i.e., one of logic devices 104A to 104C) is routed to SOC multiplexer 312 along auxiliary mode output interface 320.
Once it is determined which of logic devices 104A to 104C has been substantially modified by one or more single event faults, a reconfigure request is made to SOC bus arbiter 208 via TMR/DMR voter output interface 322 and SOC multiplexer 312. SOC multiplexer 312 selects the affected logic device of logic devices 104A to 104C for access to the SOC bus along voter logic interface 218. Once the affected logic device is granted access, reconfiguration of the affected logic device is handled automatically by configuration manager 204 of fault detection processor 106, as described above.
At step 406, a determination is made about whether the adjustable threshold level needs to be changed from a previous or default level. This determination is made in the system controller described above.
At step 408, the method receives a logic reading from each of the three or more programmable logic devices in the electronic circuit. Once each of the three or more logic readings is obtained, the method proceeds to step 410. At step 410, each of the three or more logic readings received is compared with at least two other readings. Once the comparison is made, the method proceeds to step 412. At step 412, the method determines whether all of the three or more logic readings are sufficiently in agreement. Determining whether all of the three or more logic readings are sufficiently in agreement involves determining which, if any, of the three or more programmable devices changed state. When all of the three or more logic readings are sufficiently in agreement, the method returns to step 404. When one of the three or more logic readings is not in agreement with the at least two remaining readings, a single event fault has been detected and the method proceeds to step 414. At step 414, the method updates an error-rate counter to indicate that at least one additional single event fault has occurred before proceeding to step 416. The error-rate counter determines when more than an acceptable number of disagreeing logic readings have occurred sequentially. At step 416, the method determines whether the detection of the at least one additional single event fault has caused the error-rate counter to exceed the threshold level. If the threshold level is exceeded, the method proceeds to step 418. If the threshold level is not exceeded, the method returns to step 404.
At this point, the at least two remaining logic devices compensate for the one of the three or more logic readings not in agreement. At step 418, the logic readings of the at least two remaining logic devices are compared with each other before the method proceeds to step 420. At step 420, the method determines whether the at least two remaining logic readings are sufficiently in agreement with each other. If the at least two remaining logic readings are sufficiently in agreement with each other, the method proceeds to step 422. At step 422, a first logic device that was determined not to be sufficiently in agreement with the at least two remaining logic devices is automatically reconfigured. Otherwise, if the method determines at step 420 that the at least two remaining logic readings are not in agreement with each other, each of the three or more logic devices is automatically reconfigured at step 424. If method 400 reaches step 424, system 100 is notified that more than one of the logic devices requires reconfiguration.
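Steps 408 through 424 can be condensed into a single evaluation pass over one set of readings. The sketch below is hypothetical: the function name and return codes are invented for illustration, and resetting the counter on agreement is an assumption drawn from the "sequentially" wording at step 414.

```python
def monitor_step(readings, counter, threshold):
    """One pass of steps 408-424 for three devices A, B, C.
    Returns (action, new_counter). Illustrative sketch only;
    the counter is assumed to reset when all readings agree."""
    a, b, c = readings
    if a == b == c:                     # step 412: all in agreement
        return "agree", 0               # return to step 404
    counter += 1                        # step 414: count the fault
    if counter <= threshold:            # step 416: below threshold
        return "tolerate", counter      # return to step 404
    # Threshold exceeded: isolate the faulty device (steps 418-422).
    if b == c:
        return "reconfigure_A", counter
    if a == c:
        return "reconfigure_B", counter
    if a == b:
        return "reconfigure_C", counter
    # No pair agrees: reconfigure everything (step 424).
    return "reconfigure_all", counter
```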
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. These embodiments were chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
The U.S. Government may have certain rights in the present invention as provided for by the terms of a restricted government contract.