1. Field
This disclosure relates generally to error detection for an execution unit of a processor and, more particularly, to residue-based error detection of exponents for a processor execution unit that supports floating point operations.
2. Related Art
Today, it is common for processors to be designed to detect errors. For example, one known processor design has implemented two identical processor pipelines, and processor errors are detected by comparing the results of the two pipelines. While duplicating processor pipelines improves error detection, it is relatively expensive in terms of integrated circuit (chip) area and chip power consumption. Residue checking has been employed as a less expensive technique (e.g., in terms of chip area and chip power consumption) for detecting errors in an execution unit of a processor.
Residue-based error detection (or residue checking) has been widely employed in various applications. For example, U.S. Pat. No. 3,816,728 (hereinafter “the '728 patent”) discloses a modulo 9 residue checking circuit for detecting errors in decimal addition operations. As another example, U.S. Pat. No. 4,926,374 (hereinafter “the '374 patent”) discloses a residue checking apparatus that is configured to detect errors in addition, subtraction, multiplication, division, and square root operations. As yet another example, U.S. Pat. No. 7,555,692 (hereinafter “the '692 patent”) discloses logic for computing residues for full-sized data and reduced-sized data. Typically, the operand provided to the input of a residue generator has not included all of the floating-point bits: floating-point data includes a mantissa (or significand), which has typically been handled by the residue generator, and an exponent, which has been extracted and handled separately. However, U.S. Pat. No. 7,769,795 (hereinafter “the '795 patent”) discloses checking floating-point data as a whole (i.e., mantissa, exponent, and sign) using a residue-based approach.
According to one aspect of the present disclosure, a technique for checking an exponent calculation for an execution unit that supports floating point operations includes generating, using an exponent calculation circuit, a result exponent for a floating point operation. The technique also includes generating, using a residue prediction circuit, a predicted exponent residue for the result exponent and generating, using the residue prediction circuit, a result exponent residue for the result exponent. Finally, the technique includes comparing the predicted exponent residue to the result exponent residue to determine whether the result exponent generated by the exponent calculation circuit is correct and, if not, signaling an error.
The present invention is illustrated by way of example and is not intended to be limited by the accompanying figures, in which like references indicate similar elements. Elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale.
As will be appreciated by one of ordinary skill in the art, the present invention may be embodied as a method, system, device, or computer program product. Accordingly, the present invention may take the form of an embodiment including hardware, an embodiment including software (including firmware, resident software, microcode, etc.), or an embodiment combining software and hardware aspects that may all generally be referred to herein as a circuit, module, or system. The present invention may, for example, take the form of a computer program product on a computer-usable storage medium having computer-usable program code, e.g., in the form of one or more design files, embodied in the medium.
Any suitable computer-usable or computer-readable storage medium may be utilized. The computer-usable or computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or flash memory, a portable compact disc read-only memory (CD-ROM), an optical storage device, or a magnetic storage device.
As used herein, the term “coupled” includes a direct electrical connection between elements or blocks and an indirect electrical connection between elements or blocks achieved using one or more intervening elements or blocks. The term “residue checking,” as used herein, refers to the use of the mathematical residues of operands, results, and remainders to verify the result of a mathematical operation. As used herein, the term “residue” refers to the remainder produced by modulo-N division of a number.
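The definitions above can be illustrated with a short Python sketch (illustrative only; the helper names `residue`, `predict`, and `check` are hypothetical and are not part of the disclosed circuitry). It predicts the residue of a sum, difference, or product from the operand residues alone and compares that prediction with the residue of the delivered result. With N = 15 or N = 3, flipping any single bit changes a value by 2^k, whose residue is never zero, so a single-bit error is always detected; an error that changes the value by a multiple of N would not be.

```python
import operator

def residue(x: int, n: int) -> int:
    """Remainder of modulo-n division (Python's % is non-negative for n > 0)."""
    return x % n

def predict(op, a: int, b: int, n: int) -> int:
    """Predict residue(op(a, b)) using only the residues of the operands."""
    return residue(op(residue(a, n), residue(b, n)), n)

def check(op, a: int, b: int, result: int, n: int) -> bool:
    """Residue check: does the residue of the delivered result match the prediction?"""
    return predict(op, a, b, n) == residue(result, n)

if __name__ == "__main__":
    a, b, n = 0x1234, 0x0FED, 15
    for op in (operator.add, operator.sub, operator.mul):
        good = op(a, b)
        bad = good ^ (1 << 7)  # inject a single-bit error into the result
        print(op.__name__, check(op, a, b, good, n), check(op, a, b, bad, n))
```

Each line printed shows `True` for the correct result and `False` for the corrupted one.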
While the discussion herein focuses on a residue prediction circuit for a floating-point unit (FPU), it is contemplated that a residue prediction circuit configured according to the present disclosure has broad application to other types of execution units (e.g., vectorized execution units such as single-instruction multiple-data (SIMD) execution units). While the discussion herein focuses on modulo 15 and modulo 3 residue generation trees for calculating residues for operand mantissas and operand exponents, respectively, it should be appreciated that other moduli may be utilized in a residue prediction circuit configured according to the present disclosure. While the discussion herein focuses on an operand register with thirty-two bits, it should be appreciated that the techniques disclosed herein are applicable to operand registers with more or fewer than thirty-two bits. Additionally, while the discussion herein focuses on short format operands with twelve bits, it should be appreciated that the techniques disclosed herein are applicable to short format operands with more or fewer than twelve bits (e.g., twenty-three bits). In addition, while the discussion herein focuses on long format operands with thirty-two bits, it should be appreciated that the techniques disclosed herein are applicable to long format operands with more or fewer than thirty-two bits (e.g., a floating point format that employs fifty-two bits).
According to various aspects of the present disclosure, a residue prediction circuit is disclosed that checks whether exponent calculations of various floating-point operations (e.g., addition, subtraction, multiplication, division, square root, and conversion) are correct. It should be appreciated that exponent flows are more difficult to check than data (mantissa) flows, as exponent flows have more special cases, and exponent data may be modified in many ways that are difficult to predict, so checking an exponent flow usually requires more stages.
With reference to
In a first stage 110 of flow 102, the residue modulos of the mantissas of operands A and C are multiplied by modulo multiplier 116. In a second stage 111 of flow 102, the residue modulo from the mantissa of operand B is added to the product-residue modulo from stage 110 using modulo adder 117. In a third stage 112 of flow 102, the residue modulo of bits lost at aligner 21 is subtracted by modulo subtractor 118 from the sum of second stage 111. During the residue checking operation, residue corrections to the actual residue value corresponding to the manipulated data in flow 101 may be necessary. For example, a normalization shift correction may be necessary. As such, in a fourth stage 113 of checking flow 102, residue correction of the normalization shift is performed by modulo multiplier 119. Then, in a fifth stage 114 of flow 102, a subtraction of the bits lost at normalizer 22 is performed by modulo subtractor 120. Finally, in a sixth stage 115 of flow 102, a check operation is performed by comparator 109. That is, comparator 109 compares the result provided by modulo subtractor 120 with the residue modulo of the result provided by result register 5 of flow 101.
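Checking flow 102 can be sketched in software as follows. This is a minimal model under stated assumptions, not the disclosed hardware: mantissas are unsigned integers, the operation is A·C + B, the bits of B lost at the aligner are modeled as its `align_drop` low-order bits being zeroed in place, and the normalizer is modeled as a left shift by `norm_shift` followed by zeroing the `norm_drop` low-order bits. The function names are hypothetical. Real aligners and normalizers (relative shifting of B, sticky bits, and so on) are more involved, but the residue algebra of each stage is the same.

```python
MOD = 15  # modulus used for the mantissa residues in this description

def res(x: int) -> int:
    return x % MOD

def low_bits(x: int, count: int) -> int:
    """The `count` least-significant bits of x (the bits a stage discards)."""
    return x & ((1 << count) - 1)

def fma_data_flow(a, b, c, align_drop, norm_shift, norm_drop):
    """Hypothetical integer model of data flow 101 computing A*C + B."""
    lost_align = low_bits(b, align_drop)          # bits of B lost at the aligner
    acc = a * c + (b - lost_align)
    normalized = acc << norm_shift                # normalization shift
    lost_norm = low_bits(normalized, norm_drop)   # bits lost at the normalizer
    return normalized - lost_norm, lost_align, lost_norm

def residue_check(a, b, c, norm_shift, lost_align, lost_norm, result) -> bool:
    """Checking flow 102, one line per stage."""
    r = res(a) * res(c) % MOD                     # stage 110: multiply residues of A and C
    r = (r + res(b)) % MOD                        # stage 111: add residue of B
    r = (r - res(lost_align)) % MOD               # stage 112: subtract aligner loss
    r = r * pow(2, norm_shift, MOD) % MOD         # stage 113: normalization-shift correction
    r = (r - res(lost_norm)) % MOD                # stage 114: subtract normalizer loss
    return r == res(result)                       # stage 115: compare with result residue

a, b, c = 0xB5, 0x3C7, 0x9E
result, la, ln = fma_data_flow(a, b, c, align_drop=3, norm_shift=5, norm_drop=4)
print(residue_check(a, b, c, 5, la, ln, result))          # True
print(residue_check(a, b, c, 5, la, ln, result ^ 0x100))  # False (single-bit error)
```

The normalization correction multiplies by 2^shift modulo 15 because a left shift by s bits multiplies the value by 2^s.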
With reference to
Decoders 26 transform coded signals into decoded signals that are modulo remainders. Modulo adders 28, positioned at different levels, receive the decoded numerical data from decoders 26. Adders 28 may, for example, be replaced with a series of decoders and multiplexers that perform residue condensing. Outputs of each adjacent pair of decoders 26 are coupled to inputs of a different adder 28 in a first condenser stage. Inputs of each adder 28 in a second condenser stage are coupled to respective outputs of two adders 28 in the first condenser stage. An output of each adder 28 in the second condenser stage may be configured to generate a different residue for a short format operand or may be coupled to respective inputs of an adder 28 in a third condenser stage. In this case, an output of an adder 28 in the third condenser stage is configured to generate a residue for a long format operand. In residue generation tree 200, an operand provided to register 24 may not use all of the input bits. In this case, register bits of an operand in operand register 24 that are not used may be filled with logical zeros (or other bits that do not affect the residue) by unillustrated control logic.
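A software model of residue generation tree 200 is sketched below. It assumes (the figure details are not reproduced here) a 32-bit operand register split into eight 4-bit groups, each feeding one decoder. Because 16 ≡ 1 (mod 15), the modulo-15 residue of a 4-bit group is simply its value (with 15 mapping to 0), and the residue of a word is the modulo-15 sum of its group residues. A pairwise tree of modulo-15 adders therefore yields one residue per 16-bit section after the second condenser stage and the residue of the full word after the third. The function names are hypothetical.

```python
MOD = 15

def decode(word: int, width: int = 32) -> list[int]:
    """Stand-in for decoders 26: residue of each 4-bit group, least significant first."""
    return [((word >> shift) & 0xF) % MOD for shift in range(0, width, 4)]

def condense(residues: list[int]) -> list[int]:
    """One condenser stage: each modulo-15 adder combines an adjacent pair of inputs."""
    return [(residues[i] + residues[i + 1]) % MOD for i in range(0, len(residues), 2)]

def residue_tree(word: int) -> tuple[list[int], int]:
    stage1 = condense(decode(word))   # four adders, one per 8-bit slice
    stage2 = condense(stage1)         # two adders, one per 16-bit section (short format)
    stage3 = condense(stage2)         # one adder covering all 32 bits (long format)
    return stage2, stage3[0]

short_residues, long_residue = residue_tree(0xDEADBEEF)
print(short_residues, long_residue)   # [9, 5] 14
```

Unused register bits filled with zeros contribute nothing to any group sum, which is why zero fill does not disturb the residues.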
Right-aligning short format operands within their respective register sections of a dataflow (as shown in
As short format operands do not necessarily fill a section of an operand register, various criteria may be taken into consideration when determining how to position short format data in an operand register. For example, to make best use of existing logic that services an operand register for long format operands, short format operands may be aligned within sections of an operand register to facilitate maximum re-use of the existing logic (e.g., decoders, counters, and comparators). As one example, it may be advantageous to position short format operands asymmetrically within an operand register to pass middle bits of the operand register.
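One arithmetic property that may bear on where short format operands are placed is sketched below, assuming (for illustration only) two 12-bit short operands right-aligned in 16-bit sections of a 32-bit register with zero fill. Each section then has the same modulo-15 residue as the operand it holds. Moving an operand by a whole number of 4-bit groups leaves its residue unchanged, because 2^4 ≡ 1 (mod 15), whereas other offsets scale the residue by a power of two. The helper names are hypothetical.

```python
MOD = 15
SECTION = 16  # assumed width of one register section, in bits

def pack_right_aligned(operands: list[int], width: int) -> int:
    """Right-align each short operand in its own section; unused bits stay zero."""
    word = 0
    for index, op in enumerate(operands):
        if not 0 <= op < (1 << width) or width > SECTION:
            raise ValueError("operand does not fit its section")
        word |= op << (index * SECTION)
    return word

def section_residues(word: int, count: int) -> list[int]:
    """Modulo-15 residue of each section, least significant section first."""
    mask = (1 << SECTION) - 1
    return [((word >> (i * SECTION)) & mask) % MOD for i in range(count)]

operands = [0xABC, 0x5F1]          # two hypothetical 12-bit short format operands
word = pack_right_aligned(operands, width=12)
print(section_residues(word, 2) == [op % MOD for op in operands])  # True
# A nibble-aligned offset preserves the residue because 2**4 % 15 == 1 ...
print((0xABC << 4) % MOD == 0xABC % MOD)   # True
# ... while a 1-bit offset scales it by 2 (here 3 becomes 6).
print((0xABC << 1) % MOD, 0xABC % MOD)     # 6 3
```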
With reference to
With reference to
With reference to
In a first stage 508 of flow 502, the residue modulos of the exponents for operands A and C are multiplied by modulo p multiplier 516. In a second stage 509 of flow 502, the residue modulo of the exponent for operand B is added to the product-residue modulo from stage 508 using modulo p adder 517. In a third stage 510 of flow 502, the residue modulo of subtract information (provided by an aligner) is subtracted by modulo p subtractor 518 from the sum of second stage 509. In a fourth stage 511 of flow 502, an appropriate constant (see
During the residue checking operation, residue corrections to the actual residue value corresponding to the manipulated data in flow 502 may be necessary. For example, a normalization shift correction may be necessary. As such, in a fifth stage 512 of checking flow 502, residue correction of the normalization shift is performed by modulo p decrementer 520 on the sum provided by fourth stage 511. Then, in a sixth stage 513 of flow 502, an increment by one is performed by modulo p incrementer 521 if required (to account for the exponent increment that occurs when rounding causes a fraction overflow). Next, in a seventh stage 514 of flow 502, a correction for overflow or underflow (caused by an intermediate result exponent that does not fit into the target format representation) is performed by modulo p corrector 522 if required. Finally, in an eighth stage 515 of flow 502, a check operation is performed by comparator 523. That is, comparator 523 compares the result provided by modulo p corrector 522 with the residue, generated by residue generator 524, of the exponent result delivered by EXCC 150. When the two residues are equal, a signal indicates a pass condition; when they differ, a signal indicates a fail condition. As noted above, EXCC 150 may be constructed in a manner similar to the circuitry illustrated in data flow 101 (with residue generator 106 at the output of result register 5 being omitted, as residue generator 524 is implemented to generate the result residue of the exponent result).
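Stages 509 through 515 of checking flow 502 can be sketched as a chain of modulo-p operations. This is a minimal sketch under stated assumptions: p = 3 (the exponent modulus named earlier); the output of stage 508 is taken as a given input (`stage508_res`) rather than modeled; and, only to exercise the checker, a hypothetical EXCC model (`excc_model`) forms the result exponent as a signed sum of the same quantities. All names and numeric values are illustrative and do not describe the actual EXCC 150 logic.

```python
P = 3  # modulus assumed for exponent residues

def res(x: int) -> int:
    return x % P

def predict_exponent_residue(stage508_res, exp_b, align_info, const,
                             norm_shift, round_inc, wrap):
    """Stages 509-514 of checking flow 502, starting from stage 508's output."""
    r = (stage508_res + res(exp_b)) % P   # 509: add residue of operand B's exponent
    r = (r - res(align_info)) % P         # 510: subtract aligner subtract information
    r = (r + res(const)) % P              # 511: add the instruction constant
    r = (r - res(norm_shift)) % P         # 512: decrement by the normalization shift
    r = (r + res(round_inc)) % P          # 513: +1 when rounding overflows the fraction
    r = (r + res(wrap)) % P               # 514: overflow/underflow wrap correction
    return r

def compare(predicted: int, result_exponent: int) -> str:
    """515: residue generator 524 feeding comparator 523."""
    return "pass" if predicted == res(result_exponent) else "fail"

# Hypothetical EXCC model used only to exercise the checker: the result
# exponent is written as a signed sum of the same terms.
def excc_model(e_prod, exp_b, align_info, const, norm_shift, round_inc, wrap):
    return e_prod + exp_b - align_info + const - norm_shift + round_inc + wrap

args = dict(exp_b=140, align_info=9, const=-127, norm_shift=2, round_inc=1, wrap=0)
e_prod = 250                       # hypothetical value whose residue stage 508 delivers
good = excc_model(e_prod, **args)
pred = predict_exponent_residue(res(e_prod), **args)
print(compare(pred, good), compare(pred, good + 1), compare(pred, good + 3))
# pass fail pass  (an error that is a multiple of P goes undetected)
```

The last comparison shows the inherent limit of a modulo 3 check: an exponent error of ±3 (or any multiple of 3) leaves the residue unchanged.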
With reference to
With reference to table 800 of
With reference to
With reference to table 1100 of
With reference to
Then, in block 1208, an aligner residue correction associated with the third operand exponent is subtracted, by modulo p subtractor 518 of RPC 160, from the second intermediate exponent residue to generate a third intermediate exponent residue. For example, the third intermediate exponent residue may be determined by: selecting a subrange of variable bits for generation of an aligner residue for the third operand exponent based on an associated event; generating the aligner residue based on the selected subrange of variable bits and a residue constant that is based on constant bits for the associated event; and subtracting the generated aligner residue from the second intermediate exponent residue to provide the third intermediate exponent residue. With reference to
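The aligner-residue step of block 1208 can be read as in the sketch below, under assumptions not stated in the text: for each event, some bit positions of the aligner information hold known constant values and a contiguous subrange holds the variable bits. Because the constant bits and the variable subrange occupy disjoint positions, the word equals their sum, so its modulo-3 residue is a precomputed residue constant for the event plus the residue of the variable subrange weighted by its bit position (2^lo modulo 3). The event names and bit layouts are hypothetical.

```python
P = 3

# Hypothetical events: (constant bits, low bit of variable subrange, subrange width)
EVENTS = {
    "small_shift": (0b0000_0000, 0, 6),   # all variable bits in the low 6 positions
    "large_shift": (0b1100_0000, 2, 4),   # bits 6-7 forced to 1, bits 2-5 variable
}

# Residue constants precomputed once per event from the constant bits.
RESIDUE_CONST = {name: const % P for name, (const, _, _) in EVENTS.items()}

def aligner_residue(event: str, info: int) -> int:
    """Residue of the aligner information from the selected variable subrange."""
    _, lo, width = EVENTS[event]
    field = (info >> lo) & ((1 << width) - 1)   # selected subrange of variable bits
    weight = pow(2, lo, P)                      # positional weight 2**lo (mod P)
    return (RESIDUE_CONST[event] + field * weight) % P

def third_intermediate(second_res: int, event: str, info: int) -> int:
    """Block 1208: subtract the aligner residue from the second intermediate residue."""
    return (second_res - aligner_residue(event, info)) % P

info = 0b1100_0000 | (0b1011 << 2)              # constant bits plus variable field 11
print(aligner_residue("large_shift", info) == info % P)  # True
```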
Next, in block 1210, an instruction dependent exponent constant is added, if required, by modulo p constant adder 519 of RPC 160, to the third intermediate exponent residue to provide a fourth intermediate exponent residue. In this case, a value of the instruction dependent constant is based on the first, second, and third operand exponents (see
Next, in block 1214, a rounding value is added, if required, by modulo p incrementer 521 of RPC 160, to the fifth intermediate exponent residue to generate a sixth intermediate exponent residue. Then, in block 1216, an exponent wrap constant is added, if required, by modulo p corrector 522 of RPC 160 to the sixth intermediate exponent residue to generate the predicted exponent residue. In general, the exponent wrap constant compensates for underflow or overflow. As one example, when a modulo 3 residue is employed, the residue correction value for the exponent wrap constant is zero. As noted above, blocks 1218 and 1220 execute in parallel with blocks 1204-1216. In block 1218, EXCC 150 calculates an exponent result that provides a result exponent for the floating point operation. In block 1220, residue generator 524 of RPC 160 generates a result exponent residue for the result exponent.
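One way to see why the wrap correction can vanish for a modulo 3 residue is sketched below. It assumes (the text does not say so) that the wrap constant is the exponent adjustment used by IEEE 754 for trapped overflow and underflow, 3·2^(w-2) for an exponent field of width w: 192 for binary32 and 1536 for binary64, with the same formula giving 24576 for binary128. Each value is a multiple of 3, so its modulo-3 residue is zero, whereas its modulo-15 residue is not.

```python
def wrap_adjustment(exp_width: int) -> int:
    """Exponent wrap amount 3 * 2**(w - 2) for an exponent field of width w."""
    return 3 * 2 ** (exp_width - 2)

for name, w in (("binary32", 8), ("binary64", 11), ("binary128", 15)):
    alpha = wrap_adjustment(w)
    print(name, alpha, alpha % 3, alpha % 15)
# binary32 192 0 12
# binary64 1536 0 6
# binary128 24576 0 6
```

Under this assumption, modulo p corrector 522 would need a nonzero correction only if a modulus that does not divide the wrap amount (such as 15) were used for the exponent.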
Next, in block 1222, comparator 523 of RPC 160 compares the predicted and result exponent residues. Then, in decision block 1224, comparator 523 determines whether the predicted and result exponent residues are equal. In response to the predicted and result exponent residues being equal in block 1224, control passes to block 1228, where comparator 523 provides a ‘pass’ check indication. In response to the predicted and result exponent residues not being equal in block 1224, control passes to block 1226, where comparator 523 provides a ‘fail’ check indication. In the event that an error is detected, processor 100 may log the error and cause the computation to be performed again. Following blocks 1226 and 1228, control passes to block 1230, where process 1200 terminates until a next exponent calculation is initiated.
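The pass/fail handling of blocks 1222 through 1230, together with the log-and-retry response, might look like the following sketch. The `compute` callback, the retry limit, and the use of Python's `logging` module are hypothetical stand-ins; the text states only that processor 100 may log the error and cause the computation to be performed again.

```python
import logging

P = 3

def checked_exponent(compute, predicted_residue: int, max_attempts: int = 2) -> int:
    """Return an exponent whose residue matches the prediction, retrying on 'fail'."""
    for attempt in range(1, max_attempts + 1):
        exponent = compute()                       # block 1218: EXCC result
        if exponent % P == predicted_residue:      # blocks 1220-1224: generate and compare
            return exponent                        # block 1228: 'pass'
        logging.warning("exponent residue mismatch on attempt %d", attempt)  # block 1226
    raise RuntimeError("exponent check failed after retries")

results = iter([254, 253])   # hypothetical: first attempt corrupted, second correct
print(checked_exponent(lambda: next(results), predicted_residue=253 % P))  # 253
```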
Design flow 1300 may vary depending on the type of representation being designed. For example, a design flow 1300 for building an application specific IC (ASIC) may differ from a design flow 1300 for designing a standard component or from a design flow 1300 for instantiating the design into a programmable array, for example a programmable gate array (PGA) or a field programmable gate array (FPGA) offered by Altera® Inc. or Xilinx® Inc.
Design process 1310 preferably employs and incorporates hardware and/or software modules for synthesizing, translating, or otherwise processing a design/simulation functional equivalent of the components, circuits, devices, or logic structures shown in
Design process 1310 may include hardware and software modules for processing a variety of input data structure types including netlist 1380. Such data structure types may reside, for example, within library elements 1330 and include a set of commonly used elements, circuits, and devices, including models, layouts, and symbolic representations, for a given manufacturing technology (e.g., different technology nodes, 32 nm, 45 nm, 90 nm, etc.). The data structure types may further include design specifications 1340, characterization data 1350, verification data 1360, design rules 1370, and test data files 1385 which may include input test patterns, output test results, and other testing information. Design process 1310 may further include, for example, standard mechanical design processes such as stress analysis, thermal analysis, mechanical event simulation, process simulation for operations such as casting, molding, and die press forming, etc. One of ordinary skill in the art of mechanical design can appreciate the extent of possible mechanical design tools and applications used in design process 1310 without deviating from the scope and spirit of the invention. Design process 1310 may also include modules for performing standard circuit design processes such as timing analysis, verification, design rule checking, place and route operations, etc.
Design process 1310 employs and incorporates logic and physical design tools such as HDL compilers and simulation model build tools to process design structure 1320 together with some or all of the depicted supporting data structures along with any additional mechanical design or data (if applicable), to generate a second design structure 1390. Design structure 1390 resides on a storage medium or programmable gate array in a data format used for the exchange of data of mechanical devices and structures (e.g., information stored in an IGES, DXF, Parasolid XT, JT, DRG, or any other suitable format for storing or rendering such mechanical design structures). Similar to design structure 1320, design structure 1390 preferably comprises one or more files, data structures, or other computer-encoded data or instructions that reside on transmission or data storage media and that when processed by an ECAD system generate a logically or otherwise functionally equivalent form of one or more of the embodiments of the invention shown in
Design structure 1390 may also employ a data format used for the exchange of layout data of integrated circuits and/or symbolic data format (e.g., information stored in a GDSII (GDS2), GL1, OASIS, map files, or any other suitable format for storing such design data structures). Design structure 1390 may comprise information such as, for example, symbolic data, map files, test data files, design content files, manufacturing data, layout parameters, wires, levels of metal, vias, shapes, data for routing through the manufacturing line, and any other data required by a manufacturer or other designer/developer to produce a device or structure as described above and shown in
Accordingly, residue generation techniques for operand exponents have been disclosed herein that can be advantageously employed on execution units that support floating point operations.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” (and similar terms, such as includes, including, has, having, etc.) are open-ended when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below, if any, are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Having thus described the invention of the present application in detail and by reference to preferred embodiments thereof, it will be apparent that modifications and variations are possible without departing from the scope of the invention defined in the appended claims.