1. Field of the Invention
The present invention relates to a method for converting a hexadecimal floating point number into a binary floating point number by using a Floating Point Unit with fused multiply-add. The invention further relates to a Floating Point Unit with fused multiply-add.
2. Background Art
A floating point unit with a fused multiply-add dataflow is described in G. Gerwig et al., “The IBM eServer z990 floating point unit”, IBM J. Res. & Dev., Vol. 48, No. 3/4, 2004. This floating point unit provides a convert instruction to convert the traditional S/390 HFP (Hex Floating-Point) format into the BFP (Binary Floating-Point) format according to IEEE Standard 754. Good performance is important when older existing HFP-based programs have to exchange data with newer BFP-based programs. The BFP format is compliant with the IEEE 754 Standard and is used more often in new workloads such as C++ and Java.
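Floating point numbers are described in the form:
Sign × Base^Exponent × Mantissa. More precisely, according to the different binary and hexadecimal formats, the operands are built up in the hexadecimal HFP format as:
$(-1)^{S} \times 16^{(E-64)} \times 0.F$
and in the binary BFP format as:
$(-1)^{S} \times 2^{(E-1023)} \times 1.F$
where S is the sign bit, E is the biased exponent and F is the fraction.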
The main difference between the formats is the base of the exponent E. It is 16 for HFP and 2 for BFP, which leads to a digit width of four for the HFP and a digit width of one for the BFP fraction.
Also the bias of the exponent is different. HFP uses a power of two (64=2**6), while BFP uses a power of two minus one (1023=2**10−1).
The fraction width is 56 for HFP and 52 for BFP.
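The two formats described above can be illustrated with a minimal Python sketch that decodes a 64-bit word in each format into an exact rational value. The function names `decode_hfp_long` and `decode_bfp_long` are hypothetical, and the bit layout is an assumption based on the long formats (1 sign bit, 7-bit exponent and 56-bit fraction for HFP; 1 sign bit, 11-bit exponent and 52-bit fraction for BFP). The BFP decoder handles normalized numbers only.

```python
from fractions import Fraction


def decode_hfp_long(bits: int) -> Fraction:
    """Exact value of a 64-bit HFP long word: (-1)^S * 16^(E-64) * 0.F."""
    sign = bits >> 63
    exponent = (bits >> 56) & 0x7F        # 7-bit exponent, bias 64
    fraction = bits & ((1 << 56) - 1)     # 56-bit fraction (14 hex digits)
    value = Fraction(fraction, 1 << 56) * Fraction(16) ** (exponent - 64)
    return -value if sign else value


def decode_bfp_long(bits: int) -> Fraction:
    """Exact value of a normalized 64-bit BFP long word: (-1)^S * 2^(E-1023) * 1.F."""
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF       # 11-bit exponent, bias 1023
    fraction = bits & ((1 << 52) - 1)     # 52-bit fraction, implicit leading one
    if exponent in (0, 0x7FF):
        raise ValueError("only normalized numbers are handled in this sketch")
    value = (1 + Fraction(fraction, 1 << 52)) * Fraction(2) ** (exponent - 1023)
    return -value if sign else value


# The same number has different encodings in the two formats.
print(decode_hfp_long(0x4110000000000000))  # HFP encoding of 1
print(decode_bfp_long(0x3FF0000000000000))  # BFP encoding of 1
```

Using `Fraction` keeps the decoded values exact, so the two encodings can be compared without any rounding of their own.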
Since the number range of BFP operands is larger than that of HFP operands, the result can be in the overflow or underflow range of the HFP target format. In that case, depending on the rounding mode and mask bits, a maximum number, a minimum number or infinity needs to be forced as the output result of a transformation from HFP to BFP.
One example of a convert operation is the TBDR instruction (CONVERT HFP TO BFP, mnemonic ‘TBDR’) according to the z/Architecture Principles of Operation (IBM SA 22-7832). This instruction has several special result requirements for result conditions such as ‘Zero Result’, ‘Overflow Condition’ and ‘Underflow Condition’. In these cases the results ‘Maximum Number’, ‘Zero’ and ‘Infinity’ have to be forced.
The state-of-the-art implementation in a Floating Point Unit uses the normalizer to detect the result conditions. Since this cannot be done within the normal pipelined operation, every convert instruction is executed in two pipelined cycles.
The logic needed to decide on the special result is too complex to be evaluated within the running cycle. In addition, the Condition Code for these cases cannot be set in time.
It is therefore an object of the invention to provide a Floating Point Unit with fused multiply-add and a method that convert the traditional HFP (Hex Floating-Point) format into the BFP (Binary Floating-Point) format according to IEEE Standard 754 with improved performance.
The first part of the invention's technical purpose is met accordingly by a method to convert a hexadecimal floating point number operand into a binary floating point number by using a Floating Point Unit (FPU) with fused multiply-add, with an A-register and a B-register for two multiplicand operands and a C-register (21) for an addend operand, wherein a leading zero counting unit (LZC) is associated with the addend C-register, which is characterized by the following steps:
In a preferred embodiment of the invention said transferring of the operand through the multiplier stage is done by multiplying it with a neutral number, in particular ‘one’.
In another preferred embodiment transferring of the operand through the main adder stage is done by adding a neutral number, in particular ‘zero’.
The second part of the invention's technical purpose is met accordingly by the proposed Floating Point Unit (FPU) with fused multiply-add, with an A-register and a B-register for two hexadecimal multiplicand operands and a C-register for a hexadecimal addend operand, wherein a leading zero counting unit (LZC) is associated with the addend C-register, with a multiplier, a main adder stage, a normalizer and a rounder/reformatter, wherein the final result is provided as a binary floating point number by the rounder/reformatter, and which is characterized in that a control unit is provided, which calculates the difference of the leading zero result provided by the LZC and the input exponent with a subtraction unit and which determines, based on the raw-result-exponent, a force signal (F) with special conditions like ‘Exponent Overflow’, ‘Exponent Underflow’, ‘Zero Result’, which force signal is used by the rounder/reformatter to select the output as ‘use Calculated Normalized Result’, ‘force Infinity’, ‘force Maximum Number’ or ‘force Zero Result’.
The invention provides a remarkable performance improvement, since the special conditions are determined in parallel with the transfer of the operand through the multiplier, adder, normalizer and rounder stages of the FPU.
Another benefit of the invention is that very few additional hardware components are needed.
The present invention and its advantages are now described in conjunction with the accompanying drawings.
According to the invention, zero detection, overflow detection and underflow detection are performed ahead of time using the leading-zero count taken from the C-register of the FPU. For that purpose, the operand has to be loaded additionally into the C-register.
The dataflow for the C-register 21 includes a leading zero counting unit (LZC) 6 for the operand, which is already needed for other reasons in the C-operand path. With the LZC and the input exponent, ‘Zero’, ‘Overflow’ and ‘Underflow’ can be detected early and in parallel with the transfer of the operand from the B-register 23, into which the operand was also loaded, through the multiplier stage 32 and the main adder stage 33 of the FPU. In the case of HFP denormal operands, the LZC result has to be subtracted from the input exponent to get the potential result exponent, which indicates overflow and underflow conditions. This calculation of the difference of the leading zero result provided by the LZC 6 and the input exponent E is done by a control unit 7, which determines, based on the raw-result-exponent, a force signal F with special conditions like ‘Exponent Overflow’, ‘Exponent Underflow’ and ‘Zero Result’.
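The early detection performed by the control unit can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the hardware logic itself: the names `Force`, `leading_zeros` and `force_signal` are hypothetical, the raw result exponent is expressed as an unbiased binary exponent for a 1.F significand, and the limits `emin` and `emax` are parameters of the target BFP format (the defaults shown are the IEEE 754 single-precision limits, chosen only for the demonstration). A possible exponent increment caused by rounding is ignored.

```python
from enum import Enum

FRACTION_BITS = 56  # width of the HFP long fraction


class Force(Enum):
    USE_NORMALIZED_RESULT = "use normalized result"
    ZERO_RESULT = "zero result"
    EXPONENT_OVERFLOW = "exponent overflow"
    EXPONENT_UNDERFLOW = "exponent underflow"


def leading_zeros(fraction: int) -> int:
    """Leading zero count of the 56-bit fraction (the role of the LZC)."""
    return FRACTION_BITS - fraction.bit_length()


def force_signal(exponent: int, fraction: int,
                 emin: int = -126, emax: int = 127) -> Force:
    """Classify an HFP operand from its biased exponent and fraction alone."""
    if fraction == 0:
        return Force.ZERO_RESULT
    # 0.F * 16^(E-64) equals 1.x * 2^(4*(E-64) - 1 - lzc): the leading zero
    # count is subtracted from the scaled input exponent.
    raw_result_exponent = 4 * (exponent - 64) - 1 - leading_zeros(fraction)
    if raw_result_exponent > emax:
        return Force.EXPONENT_OVERFLOW
    if raw_result_exponent < emin:
        return Force.EXPONENT_UNDERFLOW
    return Force.USE_NORMALIZED_RESULT


print(force_signal(65, 0x10000000000000))    # HFP encoding of 1
print(force_signal(0x7F, 0xFFFFFFFFFFFFFF))  # largest HFP magnitude
```

Because the classification needs only the exponent field and the leading zero count, it does not have to wait for the normalizer, which is the point of taking the count from the C-register path.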
The special results (Maximum Number, Zero and Infinity) can be forced in time for the rounder to allow a pipelined execution in one cycle. In addition, the Condition Code is provided one cycle earlier, which allows a subsequent instruction that depends on the Condition Code of this conversion instruction (e.g., BC, Branch on Condition) to be executed earlier.
The operand of the B-register 23 travels through the multiplier stage 32 (in particular by multiplying it with a neutral number like one) and the main adder stage 33 (in particular by adding a neutral number like zero) and is then normalized in the normalizer stage 4. In parallel, the control unit 7 determines, based on the special conditions, how the final result is to be selected by a rounder/reformatter 5 according to ‘use Normalized Result’, ‘force Infinity’, ‘force Maximum Number’ and ‘force Zero Result’. After the normalization the operand is transferred to the rounder/reformatter 5, which outputs the final result as a binary floating point number according to the control selection.
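The path of the operand through the stages can be followed in a short Python sketch. It is a software illustration under assumptions, not the FPU dataflow: the function name `convert_hfp_long_to_bfp_long` is hypothetical, the target is assumed to be the BFP long format (for which every nonzero HFP long value lies in the normalized range, so only the zero result is forced here), and round to nearest even is assumed as the rounding mode.

```python
def convert_hfp_long_to_bfp_long(bits: int) -> int:
    """Convert a 64-bit HFP long word to a 64-bit BFP long word (sketch)."""
    sign = bits >> 63
    exponent = (bits >> 56) & 0x7F
    operand = bits & ((1 << 56) - 1)

    # Multiplier stage and main adder stage with neutral operands:
    # the fraction passes through unchanged.
    passed = operand * 1 + 0

    # Control selection 'force Zero Result'.
    if passed == 0:
        return sign << 63

    # Normalizer: shift out leading zeros so the leading one sits at bit 55.
    lzc = 56 - passed.bit_length()
    normalized = passed << lzc
    binary_exponent = 4 * (exponent - 64) - 1 - lzc

    # Rounder: keep 53 significant bits, round the 3 dropped bits
    # to nearest even (assumed rounding mode).
    kept, dropped = normalized >> 3, normalized & 0b111
    if dropped > 4 or (dropped == 4 and kept & 1):
        kept += 1
        if kept >> 53:            # rounding carried into a new bit position
            kept >>= 1
            binary_exponent += 1

    # Reformatter: apply the BFP bias and drop the implicit leading one.
    return (sign << 63) | ((binary_exponent + 1023) << 52) | (kept & ((1 << 52) - 1))


print(hex(convert_hfp_long_to_bfp_long(0x4110000000000000)))  # HFP encoding of 1
```

For a target format with a narrower exponent range, the selection step would additionally have to force infinity or the maximum number, depending on the rounding mode; that mapping is not reproduced here.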
As a result, the instruction can be executed in a single cycle in every case, which doubles the performance compared with the state-of-the-art methods.
Unlike the normal BFP instructions, TBDR needs to set the Condition Code depending on the special results of the instruction. With the ‘pre-detection’ of the overflow conditions, the Condition Code can also be set in time for a pipelined execution.
The table of
The first column shows the result conditions:
The next columns show the final results with different rounding conditions, whereby RM is the rounding mode, which can be
The last column shows the condition code to be set, which can be
The benefits of the invention are:
The performance is doubled for the TBDR execution, which means the CPI (Cycles per Instruction) value decreases from 2 to 1.
No modification to the fraction dataflow is required. This dataflow is typically implemented in custom hardware, where design and design modification are very expensive.
Only limited effort is needed in the control logic, which mainly consists of some staging latches and a small subtractor. The design of such control logic is comparatively cheap, since the logic can be synthesized.
The invention is commercially applicable particularly in the production, test and operation of integrated chips across a wide range of applications in integrated chip technology, since speeding up calculations is a needed technique.
Number | Date | Country | Kind |
---|---|---|---|
05106817 | Jul 2005 | EP | regional |
Number | Name | Date | Kind |
---|---|---|---|
5251321 | Boothroyd et al. | Oct 1993 | A |
5742535 | Schwarz et al. | Apr 1998 | A |
5742536 | Schwarz et al. | Apr 1998 | A |
5875123 | Dao Trong et al. | Feb 1999 | A |
5889980 | Smith, Jr. | Mar 1999 | A |
6813626 | Chng et al. | Nov 2004 | B1 |
20040059762 | Yang | Mar 2004 | A1 |
20060190708 | Schwarz et al. | Aug 2006 | A1 |
Number | Date | Country | |
---|---|---|---|
20070022152 A1 | Jan 2007 | US |