The present invention relates generally to the computer processing field, and more specifically, but not exclusively, to a system and method for dynamically optimizing the performance and reliability of redundant processing systems that can be used, for example, in space applications.
In space applications, there is a significant need for smaller, lighter, lower-power, high-performance systems with increased reliability and higher processing speeds. To be cost-effective, these systems are typically designed to minimize their size and weight, because size and weight are typically the overriding “costs” in space missions. Nevertheless, mission-critical components of these systems are duplicated in order to increase their reliability and tolerance to faults. For example, multiple processors operating as a redundant set are designed to receive the same input data, perform the same mission-critical computations, and transmit the same output commands. In addition to the need for increased reliability and fault tolerance, there is also a significant need for increased throughput or processing speed. However, the processing speeds of the hardware on existing space systems are relatively slow, and (due partly to their need for redundancy and fault tolerance) these systems are relatively expensive. Therefore, there is a significant need for a technique that can optimize the performance and reliability of redundant processing systems, for example in space applications, without incurring significant additional costs. As described in detail below, the present invention provides such a technique, with a system and method that dynamically optimize performance and reliability in redundant processing systems.
The present invention provides an improved system and method for dynamically optimizing the performance and reliability of redundant processing systems (e.g., for use in space applications). In accordance with a preferred embodiment of the present invention, a Field Programmable Gate Array (FPGA) is provided that includes a plurality of processors. Based on mission specific modes or environmental conditions, the processing system can dynamically and safely transition between the high performance of, for example, a general purpose, quad Symmetric Multiprocessor (SMP) and the high reliability of a redundant set of processors (e.g., Triple Modular Redundancy (TMR) system). This architecture allows the use of a single FPGA with multiple processors to take advantage of the maximum processing throughput available when sufficient mission conditions are met, and can also safely transition to a lower throughput, high reliability mode when needed. In other words, at particular points during a mission, high throughput or processing capacity can be obtained at the expense of reliability or dependability as the mission conditions allow. If the mission conditions can support a reduced level of dependability at a particular point in time, then the processors can be adapted to run in a single string (e.g., triple or quad string) to produce three to four times the processing capacity of the redundant set.
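The capacity trade-off described above can be illustrated with a minimal sketch. The names `Mode` and `effective_capacity` are hypothetical and do not appear in the disclosure, and the model is idealized: it assumes every processor in the redundant set repeats the same computation, and it ignores any synchronization or comparison overhead.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical labels for the two operating modes described above."""
    HIGH_PERFORMANCE = "independent strings (e.g., SMP)"
    HIGH_RELIABILITY = "redundant set (e.g., TMR)"


def effective_capacity(num_processors: int, mode: Mode) -> int:
    """Return useful capacity in units of one processor's throughput.

    Idealized assumption: in the redundant set all processors perform the
    same computation, so the set delivers one processor's worth of work.
    """
    if num_processors < 1:
        raise ValueError("at least one processor is required")
    if mode is Mode.HIGH_RELIABILITY:
        return 1
    # Each processor runs its own work, so capacity scales with the count.
    return num_processors


if __name__ == "__main__":
    for n in (3, 4):
        gain = (effective_capacity(n, Mode.HIGH_PERFORMANCE)
                // effective_capacity(n, Mode.HIGH_RELIABILITY))
        print(f"{n} processors: {gain}x the capacity of the redundant set")
```

Under these assumptions, three or four processors yield three or four times the capacity of the redundant set, which matches the range stated above.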
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
With reference now to the figures,
For this illustrative example, system 100 includes a plurality of processing units 102a, 102b, 102c, . . . 102n (wherein the suffix “n” denotes the total number of processing units being used), a comparator unit 104, and a control unit 106. Although four processing units 102a-102n are shown in this example, this particular number is for illustrative purposes only, and any suitable number of processing units may be used in system 100. However, if processing units 102a-102n are intended for use in a redundant, fault tolerant architecture, then it is preferable that system 100 include at least three redundant processing units. For example, as disclosed in the related application entitled “REDUNDANT PROCESSING ARCHITECTURE FOR SINGLE FAULT TOLERANCE”, the inclusion of a third processing unit provides a tie-breaking vote in determining a faulty processing unit. In any event, an example of a suitable logic device that includes a plurality of processing units (e.g., processing units 102a-102n) and a comparator unit (e.g., 104), and that can be used to implement at least a portion of system 100, is the Virtex-II Pro® FPGA manufactured by Xilinx, Inc. The Virtex-II Pro FPGA is a Programmable Logic Device (PLD), which can include up to four on-chip 300-400 MHz, 420+ DMIPS IBM PowerPC® 405 processors, with on-chip memory and programmable logic resources appropriately coupled to maximize performance.
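The tie-breaking role of a third processing unit can be sketched with a simple majority vote. This is an illustrative software model only, not the voting scheme of the related application; the function name `vote` and its return convention are assumptions made for this example.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def vote(outputs: Sequence[Hashable]) -> Tuple[Optional[Hashable], List[int]]:
    """Majority-vote over the outputs of redundant processing units.

    Returns (agreed_value, faulty_indices). If no value is produced by a
    strict majority of the units, returns (None, []) because no unit can
    be singled out as the faulty one.
    """
    if not outputs:
        raise ValueError("at least one output is required")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        return None, []
    faulty = [i for i, out in enumerate(outputs) if out != value]
    return value, faulty


if __name__ == "__main__":
    # Two units that disagree: the fault is detected but cannot be located.
    print(vote([7, 9]))
    # A third unit breaks the tie and identifies unit 2 as faulty.
    print(vote([7, 7, 9]))
```

With only two units, a mismatch shows that something is wrong but not which unit is wrong; the third output supplies the majority needed to isolate the faulty unit.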
Notably, the present invention is not intended to be limited to a single logic device including four processing units and a comparator unit; in a different embodiment it can be arranged as, for example, two logic devices that each include two processors and one comparator. In such an arrangement, the two comparators can be combined to perform the comparison function in a distributed architecture. In one embodiment, both comparators can perform substantially the same comparison function. In another embodiment, the two comparators can complement each other and together perform a single comparison function.
For this example embodiment, an output of each processing unit 102a-102n is coupled to a respective input of comparator unit 104. Also, an output of comparator unit 104 is coupled to an input of each processing unit 102a-102n. For this example, comparator unit 104 is implemented advantageously as a hardware comparator, as opposed to being implemented in software (e.g., because a hardware implementation is significantly faster than a software implementation). Thus, comparator unit 104 can perform a comparison function with respect to the input data received from each processing unit 102a-102n, and responsive to the results of the comparison functions performed, comparator unit 104 can output one or more suitable signals to control an operation of each processing unit 102a-102n. Additionally, comparator unit 104 can output suitable signals to control the operation of each processing unit 102a-102n responsive to one or more control signals received from an output of the control unit 106.
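The coupling described above (processing unit outputs into the comparator, comparator signals back to each unit, and a control input from the control unit) can be modeled in software as follows. This is a hedged sketch only: the disclosure describes a hardware comparator, and the signal names "RUN" and "HALT", the `compare_enabled` flag, and the function `comparator_signals` are all hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def comparator_signals(unit_outputs: Sequence[int],
                       compare_enabled: bool) -> List[str]:
    """Return one hypothetical control signal per processing unit.

    compare_enabled stands in for the control signal received from the
    control unit; when it is False, the outputs are not compared.
    """
    if not unit_outputs:
        return []
    if not compare_enabled:
        # Comparison disabled by the control unit: every unit keeps running.
        return ["RUN"] * len(unit_outputs)
    value, count = Counter(unit_outputs).most_common(1)[0]
    if count * 2 <= len(unit_outputs):
        # No strict majority: no unit can be trusted over another.
        return ["HALT"] * len(unit_outputs)
    # Units that agree with the majority continue; the others are halted.
    return ["RUN" if out == value else "HALT" for out in unit_outputs]


if __name__ == "__main__":
    print(comparator_signals([3, 3, 4, 3], compare_enabled=True))
    print(comparator_signals([3, 4, 5, 6], compare_enabled=False))
```

The same function returns different signals for the same inputs depending on the control input, which mirrors the two signal sources described above: the comparison results and the control unit.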
In operation, for this example embodiment, each processing unit 102a-102n in
Notably, however, in accordance with the principles of the present invention, control logic unit 206 can also generate a control signal to trigger selector 204 to choose a suitable output for broadcaster 208, which is responsive to an input signal received from the external control unit (e.g., control unit 106 in
As such,
For example, a particular software task in a mission application may not require maximum dependability, so control unit 106 can be directed to output suitable control signals (e.g., via comparator 104) to reconfigure processing units 102a-102n responsive to the reduced need for dependability (e.g., to increase throughput for this task). Thus, in accordance with the principles of the present invention, system 100 can dynamically reconfigure the redundant set of processing units 102a-102n in order to optimize the dependability or reliability and capacity and throughput of the processing units responsive to changing mission conditions, with a relatively small and readily configurable logic device.
If (at step 404), however, the mission processor (or the control unit) determines that the projected mission conditions for the predetermined time period are such that a reduced level of processor dependability is acceptable, then the mission processor retrieves capacity and/or throughput requirements for the mission application(s) and/or processing tasks that are to be (or are being) run during the predetermined time period (step 406). The mission processor (or control unit) then determines whether or not additional processor capacity and/or throughput are desired for the predetermined time period (step 408). If not, then the flow is stopped.
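The decision flow of steps 404 through 408 can be sketched as follows. The function and parameter names are hypothetical, and two assumptions go beyond the text: capacity is modeled as a single number, and additional capacity is treated as "desired" whenever the retrieved requirement exceeds the current capacity. The negative branch of step 404 is not covered in this passage, so the sketch simply reports that no additional capacity is requested.

```python
from typing import Callable


def more_capacity_desired(reduced_dependability_ok: bool,
                          get_required_capacity: Callable[[], float],
                          current_capacity: float) -> bool:
    """Model steps 404-408 for one predetermined time period."""
    # Step 404: is a reduced level of processor dependability acceptable?
    if not reduced_dependability_ok:
        return False
    # Step 406: retrieve the capacity/throughput requirements
    # (only reached when step 404 passes).
    required = get_required_capacity()
    # Step 408: is additional capacity desired? (assumed: a simple
    # comparison against the current capacity). If not, the flow stops.
    return required > current_capacity


if __name__ == "__main__":
    # Requirement of 3 units against a redundant set delivering 1 unit.
    print(more_capacity_desired(True, lambda: 3.0, 1.0))
    # Requirement already met: the flow stops.
    print(more_capacity_desired(True, lambda: 1.0, 1.0))
```

Passing the requirement lookup as a callable keeps step 406 from running unless step 404 has already allowed reduced dependability, which follows the ordering of the steps in the text.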
If (at step 408), however, the mission processor (or the control unit) determines that additional processor capacity and/or throughput are desired for the predetermined time period, then the mission processor (or the control unit) determines what amount of additional capacity and/or throughput is desired (step 410). The mission processor (or the control unit itself) then generates a (mode) control signal that includes appropriate control data for reconfiguring the arrangement of the processing units involved (e.g., processing units 102a-102n), in order to attain the desired increase in processing capacity and/or throughput (or at least as much additional processing capacity and/or throughput as possible). The (mode) control signal is then sent to the (mode) control unit (e.g., control unit 106) or, in the embodiment illustrated by
It is important to note that while the present invention has been described in the context of a fully functioning processing system, those of ordinary skill in the art will appreciate that the processes of the present invention are capable of being distributed in the form of a computer readable medium of instructions and a variety of forms and that the present invention applies equally regardless of the particular type of signal bearing media actually used to carry out the distribution. Examples of computer readable media include recordable-type media, such as a floppy disk, a hard disk drive, a RAM, CD-ROMs, DVD-ROMs, and transmission-type media, such as digital and analog communications links, wired or wireless communications links using transmission forms, such as, for example, radio frequency and light wave transmissions. The computer readable media may take the form of coded formats that are decoded for actual use in a particular processing system.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. These embodiments were chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
The present application is related to commonly assigned U.S. patent application Ser. No. 10/867,894 (Attorney Docket No. H0006620-1628) entitled “REDUNDANT PROCESSING ARCHITECTURE FOR SINGLE FAULT TOLERANCE”, filed on Jun. 15, 2004, which is incorporated herein by reference in its entirety.