The present disclosure relates to microprocessors, and in particular, the arithmetic logic units that a microprocessor may employ.
Microprocessors generally include one or more arithmetic logic units (ALUs) in the execution pipeline to perform arithmetic and logical operations. ALUs may be characterized by the number of input operands and/or the number of mathematical and logical operations that they support. Some combinations of mathematical operations occur sufficiently often to justify the inclusion of a customized data path in an ALU to accommodate a specific operation. For example, an ALU may accommodate a fused multiply-add (FMA) operation in which the product of two floating point values is added to an accumulated floating point value using a single operation and a single rounding. Determining whether to implement a specific mathematical operation in a special purpose or complex ALU involves a cost/performance tradeoff. Factors that may influence any such determination include the extent to which a complex ALU may be utilized to perform simpler operations at times when no pending operation requires the full functionality of the complex ALU and/or the extent to which an underutilized ALU may be employed to improve reliability via redundant execution of less complex instructions.
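The effect of the single rounding in an FMA operation can be illustrated with a minimal software sketch. The helper names `fma` and `mul_then_add` are hypothetical and do not appear in the disclosure; the sketch models the fused operation with exact rational arithmetic followed by one rounding, and it assumes finite double-precision operands.

```python
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """Model a fused multiply-add: compute a*b + c exactly, then round once.

    Hypothetical helper for illustration; finite operands only.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # the single rounding step


def mul_then_add(a: float, b: float, c: float) -> float:
    """Unfused version: the product is rounded before the addition."""
    return a * b + c


# Operands chosen so that the exact product 1 - 2**-60 is not representable.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

fused = fma(a, b, c)             # keeps the low-order term of the product
unfused = mul_then_add(a, b, c)  # loses it when the product rounds to 1.0
```

With these operands, the unfused sequence returns zero because the intermediate product rounds to 1.0, whereas the fused model preserves the small residual.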
Embodiments of disclosed inventions pertain to improving computational reliability in computing systems generally and large scale computing systems particularly. In at least one embodiment, a disclosed method increases computational reliability by leveraging resources in a complex ALU to perform redundant computations during times when the full functionality of the ALU is not required. Depending upon the specific instruction being executed and a mode of execution, the resources of the complex ALU may be used to perform a relatively less demanding operation redundantly, a relatively complex operation without redundancy, or the relatively complex operation redundantly using temporal redundancy.
In at least one embodiment, the complex ALU includes resources to perform two floating point, fused multiply-add (FMA) operations independently. In these embodiments, the complex ALU may be referred to as a SuperFMA ALU to denote that the ALU includes sufficient resources to perform an initial FMA operation and a dependent FMA operation based on the results of the initial FMA operation. In some of these embodiments, the SuperFMA ALU may be invoked to perform a simple FMA using spatial redundancy, to perform a complex FMA operation, also referred to herein as a SuperFMA operation, without redundancy, or to perform the SuperFMA operation redundantly using temporal redundancy by generating first and second computational results and comparing the two results. If the first and second results match, the computational result is confirmed, whereas if the first and second results do not match, an error signal is generated.
In at least one embodiment, a redundant execution mode is determined from a redundant execution signal. The redundant execution signal identifies a preferred redundant execution mode specified by a reliability controller. The redundant execution mode may determine the manner in which the ALU performs operations.
In at least one embodiment, several different modes of execution provide various degrees of redundant execution support. Some embodiments may include a mandatory mode, in which all operations are executed redundantly, either spatially or temporally. In the mandatory mode, SuperFMA operations or other complex operations that cannot be executed with spatial redundancy in the ALU are required to execute using temporal redundancy.
At least one embodiment includes an opportunistic execution mode, in which all operations that can be executed with spatial redundancy are always executed redundantly. In this mode, operations that cannot be executed using spatial redundancy are executed without redundancy. At least one embodiment further supports a reluctant execution mode, in which operations that can be executed with spatial redundancy may be executed with spatial redundancy subject to satisfaction of additional criteria. The additional criteria may include, but are not limited to, criteria pertaining to power consumption and/or a power management state, junction temperature, performance, and so forth. In the reluctant mode, if an operation does not support redundant execution, the operation will execute without redundancy.
In at least one embodiment, a disclosed processor includes multiple execution cores and associated cache memories. In at least one embodiment, the execution cores include an ALU, sometimes referred to herein as a SuperFMA ALU, to receive multiple inputs and perform a SuperFMA computation during an execution stage. In at least one embodiment, dispatch logic determines whether the operation to be performed by the ALU can be executed with spatially-based redundant execution support. If the ALU cannot perform the operation with redundant execution support, at least one embodiment of the ALU performs the operation without redundant execution and generates a computational result. In some embodiments, if the ALU is capable of executing the operation with redundant support, the ALU may do so depending upon a current state of a redundant execution signal indicating the current redundant execution mode.
In at least one embodiment, the ALU performs a SuperFMA computation with temporal redundancy and generates first and second results. In at least one embodiment, responsive to the first and second results matching, the ALU generates a confirmed computational result. In at least one embodiment, responsive to the first and second results of the redundant execution not matching, an error is generated.
In another embodiment, a disclosed multiprocessor system includes a first processor and storage accessible to the first processor. The storage includes an operating system. The operating system may include a processor-executable resume module with instructions to reduce latency associated with transitioning from a power conservation state. The operating system may also include a processor-executable connect module with instructions to maintain a currency of a dynamic application during the power conservation state.
In the following description, details are set forth by way of example to facilitate discussion of the disclosed subject matter. It should be apparent to a person of ordinary skill in the field, however, that the disclosed embodiments are exemplary and not exhaustive of all possible embodiments.
Throughout this disclosure, a hyphenated form of a reference numeral refers to a specific instance of an element and the un-hyphenated form of the reference numeral refers to the element generically or collectively. Thus, for example, widget 12-1 refers to an instance of a widget class, which may be referred to collectively as widgets 12 and any one of which may be referred to generically as a widget 12.
Referring now to
In the
Execution pipeline 106 may be responsible for scheduling and executing micro-operations and may include buffers for reordering micro-operations and a number of execution ports (not shown in
The
In the
Referring now to
In the
In the
As indicated above, the use of unused resources to perform redundant execution can be implemented in various degrees, and a representative embodiment that employs three levels of redundant execution will be described. In at least one embodiment, execution pipeline 106 supports three different reliability modes, namely, a mandatory mode, an opportunistic mode, and a reluctant mode. In the mandatory mode, all operations are executed redundantly. If SuperFMA ALU 108 can execute an operation using spatial redundancy, it does so. When SuperFMA ALU 108 cannot perform the operation using spatial redundancy, SuperFMA ALU 108 may perform the operation using temporal redundancy. Temporal redundancy refers to a procedure in which an operation is performed multiple times by the same hardware to determine if each instance of performing the operation produces the same result.
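Temporal redundancy as described above may be sketched in software as follows. The sketch is illustrative only: `run_temporally_redundant` and `FaultyUnit` are hypothetical names, and the unit is a plain multiply-then-add stand-in for hardware, with an injected fault used to show how a transient error produces differing results.

```python
from typing import Callable, Tuple


def run_temporally_redundant(
    unit: Callable[..., float], *operands: float
) -> Tuple[float, float]:
    """Execute the same operation twice on the same unit, back to back.

    Returns both results so that a downstream comparator can check them.
    """
    first = unit(*operands)
    second = unit(*operands)
    return first, second


class FaultyUnit:
    """Hypothetical stand-in for an execution unit with one injected fault."""

    def __init__(self, fail_on_call: int = 0) -> None:
        self.calls = 0
        self.fail_on_call = fail_on_call  # 0 means never fail

    def __call__(self, a: float, b: float, c: float) -> float:
        self.calls += 1
        result = a * b + c
        if self.calls == self.fail_on_call:
            result += 1.0  # simulated transient error on this call only
        return result


clean = run_temporally_redundant(FaultyUnit(), 2.0, 3.0, 4.0)
upset = run_temporally_redundant(FaultyUnit(fail_on_call=2), 2.0, 3.0, 4.0)
```

In this sketch, the pair in `clean` matches, while the pair in `upset` differs because the second execution was disturbed.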
In at least one embodiment of the opportunistic mode, operations that may be executed in a spatially redundant manner are executed redundantly while operations that cannot be executed redundantly or operations that can only be executed with temporal redundancy are executed without redundancy. Finally, in an embodiment of the reluctant mode, operations that support spatially redundant execution may execute redundantly subject to additional criteria while operations that cannot be executed using spatial redundancy are executed without redundancy. In this mode, the additional criteria that influence whether an operation is executed redundantly may include, but are not limited to, criteria pertaining to power consumption, device temperature, and so forth. For example, a reluctant policy might execute applicable operations redundantly as long as power consumption has been averaging below a specified threshold. Similarly, redundant execution criteria may include criteria specifying a particular power management mode, e.g., a device in a power conservation mode may prohibit or discourage redundant execution. As another example, criteria influencing whether to execute an instruction redundantly may include a simple percentage indicating approximately what percentage of operations that are eligible for redundant execution are executed redundantly.
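The three reliability modes may be summarized by a small decision function. The following is a minimal sketch under stated assumptions: the names `Mode`, `Redundancy`, `select_redundancy`, and `average_power_ok` are hypothetical, and the reluctant-mode criterion shown (average power below a threshold) is only one of the example criteria mentioned above.

```python
from enum import Enum
from typing import Sequence


class Mode(Enum):
    MANDATORY = "mandatory"
    OPPORTUNISTIC = "opportunistic"
    RELUCTANT = "reluctant"


class Redundancy(Enum):
    NONE = "none"
    SPATIAL = "spatial"
    TEMPORAL = "temporal"


def average_power_ok(samples: Sequence[float], threshold: float) -> bool:
    """Example reluctant-mode criterion: average power below a threshold."""
    return sum(samples) / len(samples) < threshold


def select_redundancy(
    mode: Mode, supports_spatial: bool, criteria_met: bool = True
) -> Redundancy:
    """Choose how an operation executes under the given reliability mode."""
    if supports_spatial:
        # Reluctant mode uses spatial redundancy only when criteria are met.
        if mode is Mode.RELUCTANT and not criteria_met:
            return Redundancy.NONE
        return Redundancy.SPATIAL
    # Only mandatory mode falls back to temporal redundancy.
    if mode is Mode.MANDATORY:
        return Redundancy.TEMPORAL
    return Redundancy.NONE
```

For example, `select_redundancy(Mode.RELUCTANT, True, average_power_ok([3.0, 5.0], 4.5))` selects spatial redundancy because the average of the two hypothetical power samples is below the threshold.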
Returning to
In at least one embodiment of write back stage 206, redundant execution control signal 232 from reliability controller 220 is provided to a result comparator 240 to indicate whether result comparator 240 is needed to compare two results generated by redundant executions of the same operation by SuperFMA ALU 108. When redundant execution control signal 232 indicates that SuperFMA ALU 108 is being operated in redundant execution mode, result comparator 240 compares the redundant results from SuperFMA ALU 108 to determine if they match. If the results do not match, an error signal 242 is generated.
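The behavior of a result comparator gated by a redundant execution signal may be modeled as follows. This is a sketch only: the function names are hypothetical, and the choice to compare raw 64-bit patterns (so that +0.0 and -0.0 differ and identical NaN encodings match) is an assumption about how a hardware comparator might behave rather than a detail taken from the disclosure.

```python
import struct
from typing import Optional, Tuple


def bit_pattern(value: float) -> int:
    """Return the IEEE 754 double-precision encoding of value as an integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def compare_results(
    first: float, second: float, redundant_mode: bool
) -> Tuple[Optional[float], bool]:
    """Return (result, error_signal).

    When redundant_mode is False the comparator is bypassed and the first
    result passes through. Otherwise the two results are compared bit for
    bit (assumption) and a mismatch raises the error signal.
    """
    if not redundant_mode:
        return first, False
    error = bit_pattern(first) != bit_pattern(second)
    return (None if error else first), error
```

For instance, `compare_results(1.5, 1.5, True)` confirms the result, whereas `compare_results(1.5, 2.5, True)` reports an error and withholds the result.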
Referring now to
In the
In the
Thus, by integrating four multiplexers and a comparator with the pair of FMA units 330 and 350, SuperFMA ALU 108 is operable not only to perform SuperFMA operations, but also to perform less complex operations using spatial redundancy by executing one instance of an operation in FMA 330 and another instance of the operation in FMA 350. The cost of these additional logic components is relatively low with respect to the added functional benefit. No additional ports need to be added to the register files and the required changes are concentrated in the ALU itself.
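A behavioral sketch of the two uses of the paired FMA units is given below. It is not a description of the actual data path: the class and method names are hypothetical, the five-operand SuperFMA form and the routing of the first result into the multiplicand of the second unit are assumptions, and the `inject_fault` parameter exists only to demonstrate a mismatch.

```python
from fractions import Fraction
from typing import Tuple


def fma(a: float, b: float, c: float) -> float:
    """Software model of one FMA unit: exact a*b + c, rounded once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


class SuperFmaAluModel:
    """Hypothetical behavioral model of an ALU with two FMA units."""

    def super_fma(self, a: float, b: float, c: float, d: float, e: float) -> float:
        # Chained mode: the first unit's result feeds the second unit.
        # The operand position of the forwarded result is an assumption.
        intermediate = fma(a, b, c)      # first unit (FMA 330 in the text)
        return fma(intermediate, d, e)   # second unit (FMA 350 in the text)

    def redundant_fma(
        self, a: float, b: float, c: float, inject_fault: float = 0.0
    ) -> Tuple[float, bool]:
        # Spatially redundant mode: both units receive the same operands.
        result_330 = fma(a, b, c)
        result_350 = fma(a, b, c) + inject_fault  # fault is for testing only
        error = result_330 != result_350          # comparator output
        return result_330, error


alu = SuperFmaAluModel()
chained = alu.super_fma(2.0, 3.0, 4.0, 5.0, 6.0)      # (2*3 + 4)*5 + 6
checked = alu.redundant_fma(2.0, 3.0, 4.0)            # matching results
faulted = alu.redundant_fma(2.0, 3.0, 4.0, inject_fault=1.0)
```

The model reflects the point made above: the same pair of units serves either a chained operation or two copies of a simple operation, with only selection and comparison logic added around them.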
Referring now to
As depicted in
If it is determined in operation 420 that the ALU operation cannot be executed with redundant execution support, the flow continues to process block 450 where the operation is performed in the SuperFMA ALU without redundant execution and the computational result is generated in 470.
Embodiments of processor 101 (
In
In the
In
Processor 101 may also communicate with other elements of processor system 500, such as near hub 590 and far hub 518, which are also collectively referred to as a chipset that supports processor 101. P-P interface 576 may be used by processor 101 to communicate with near hub 590 via interconnect link 552. In certain embodiments, P-P interfaces 576, 594 and interconnect link 552 are implemented using Intel QuickPath Interconnect architecture.
As shown in
Second bus 520 may support expanded functionality for processor system 500 with I/O devices 512 and touchscreen controller 514, and may be a PCI-type computer bus. Third bus 522 may be a peripheral bus for end-user consumer devices, represented by desktop devices 524 and communication devices 526, which may include various types of keyboards, computer mice, communication devices, data storage devices, bus expansion devices, etc. In certain embodiments, third bus 522 represents a Universal Serial Bus (USB) or similar peripheral interconnect bus. Fourth bus 521 may represent a computer interface bus for connecting mass storage devices, such as hard disk drives, optical drives, and disk arrays, which are generically represented by persistent storage 528 that may be accessible to processor 101.
The
The
Sensor API 542 provides application program access to one or more sensors (not depicted) that may be included in system 500. Examples of sensors that system 500 might have include an accelerometer, a global positioning system (GPS) device, a gyro meter, an inclinometer, and a light sensor. The resume module 544 may be implemented as software that, when executed, performs operations for reducing latency when transitioning system 500 from a power conservation state to an operating state. Resume module 544 may work in conjunction with the solid state drive (SSD) 550 to reduce the amount of SSD storage required when system 500 enters a power conservation mode. Resume module 544 may, for example, flush standby and temporary memory pages before transitioning to a sleep mode. By reducing the amount of system memory space that system 500 is required to preserve upon entering a low power state, resume module 544 beneficially reduces the amount of time required to perform the transition from the low power state to an operating state. The connect module 546 may include software instructions that, when executed, perform complementary functions for conserving power while reducing the amount of latency or delay associated with traditional “wake up” sequences. For example, connect module 546 may periodically update certain “dynamic” applications, such as email and social network applications, so that, when system 500 wakes from a low power mode, the applications that are often most likely to require refreshing are up to date. The touchscreen user interface 548 supports a touchscreen controller 514 that enables user input via touchscreens traditionally reserved for handheld applications. In the
Referring now to
Additionally, a circuit level model with logic and/or transistor gates may be produced at some stages of the design process. This model may be similarly simulated, sometimes by dedicated hardware simulators that form the model using programmable logic. This type of simulation, taken a degree further, may be an emulation technique. In any case, re-configurable hardware is another embodiment that may involve a tangible machine readable storage medium 610 storing a model of processor 101 and SuperFMA ALU 108.
Furthermore, most designs, at some stage, reach a level of data representing the physical placement of various devices in the hardware model. In the case where conventional semiconductor fabrication techniques are used, the data representing the hardware model may be the data specifying the presence or absence of various features on different mask layers for masks used to produce the integrated circuit. Again, this data representing the integrated circuit embodies the techniques disclosed in that the circuitry or logic in the data can be simulated or fabricated to perform these techniques.
In any representation of the design, the data may be stored in any form of a tangible machine readable medium. An optical or electrical wave 640 modulated or otherwise generated to transmit such information, a memory 630, or a magnetic or optical storage 620 such as a disc may be the tangible machine readable medium. Any of these mediums may “carry” the design information. The term “carry” (e.g., a tangible machine readable medium carrying information) thus covers information stored on a storage device or information encoded or modulated into or on to a carrier wave. The set of bits describing the design or the particular part of the design is (when embodied in a machine readable medium such as a carrier or storage medium) an article that may be sold in and of itself or used by others for further design or fabrication.
To the maximum extent allowed by law, the scope of the present disclosure is to be determined by the broadest permissible interpretation of the following claims and their equivalents, and shall not be restricted or limited to the specific embodiments described in the foregoing detailed description.
Number | Date | Country | |
---|---|---|---|
20140189305 A1 | Jul 2014 | US |