1. Field of the Invention
The present invention relates to an arithmetic device which performs addition, subtraction, and multiplication of numbers represented by floating points and an arithmetic method thereof.
2. Description of the Related Art
In recent years, with the rapid spread of multimedia, TV games with detailed graphics, and the like, there has been a growing demand for high-quality computer graphics in such applications.
In order to meet such a demand, realization of a high-speed floating point multiplication and addition arithmetic unit is desired. A configuration of a conventional floating point multiplication and addition arithmetic unit (referred to below as “FMA arithmetic unit”) is explained below in a concrete manner.
As shown in
Of the configuration, the register file/other arithmetic unit result register 10 is a storing device which temporarily stores data for an arithmetic operation (such data is referred to below as “operand”), and the selectors 20 to 22 are devices which select operands from the register file/other arithmetic unit result register 10 or the result register 140 (result register 140 is a storing device which stores results of arithmetic operations) and store the selected operands in the operand registers 30 to 32, respectively.
The operand registers 30 to 32 are devices which store the operands selected by the selectors 20 to 22, respectively. The selectors 23 to 25 are devices which select an operand stored in the operand registers 30 to 32 or the result register 140, and input the selected operand to the format converters 40 to 42, respectively.
The format converters 40 to 42 are devices which convert the format of the operand that is input by the selectors 23 to 25, into a format for an execution of the floating point multiplication and addition arithmetic operation (i.e., the format converters are devices which convert an external format into an internal format of the FMA arithmetic unit). The format converters 40 to 42 store the operands whose format has been converted (such operand is referred to below as “format converted operand”) in the intermediate registers 50 to 52, respectively. The intermediate registers 50 to 60 are devices which temporarily store data (the intermediate registers 50 to 52 store a format converted operand).
The Booth encode circuit 70 is a device which acquires the format converted operand stored in the intermediate register 51 and performs a second-order Booth encode according to Booth's algorithm on the format converted operand (the format converted operand stored in the intermediate register 51 is set as a multiplier). Then, the Booth encode circuit 70 stores the format converted operand, on which the second-order Booth encode has been performed, in the intermediate register 54.
The CSA (Carry Save Adder) arithmetic unit 80 is a device which acquires, as a multiplicand, the format converted operand stored in the intermediate register 53 (the format converted operand stored in the intermediate register 50 is subsequently stored in the intermediate register 53), and also acquires the data stored in the intermediate register 54, on which the second-order Booth encode has been performed. The CSA arithmetic unit 80 then calculates partial products (when the multiplier and the multiplicand are each 64 bits wide, 32 partial products are calculated) and adds the calculated partial products together.
The adder 90 is a device which adds the sum of partial products calculated by the CSA arithmetic unit 80, and the value of a carry given by the addition of each partial product (the adder 90 is a device which absorbs the carry of the CSA arithmetic unit 80). Then, the adder 90 stores the result of addition in the intermediate register 57. In short, the multiplication of the multiplicand stored in the intermediate register 50 and the multiplier stored in the intermediate register 51 is performed through the Booth encode circuit 70, the CSA arithmetic unit 80, and the adder 90.
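As an illustration only, the multiplication path described above (second-order Booth encode, partial products, carry-save addition, and a final carry-absorbing addition) can be sketched in Python as follows. The function names, the use of signed two's-complement integers, and the order of the 3:2 reduction steps are assumptions made for this sketch and are not taken from the described circuit.

```python
def booth_radix4_digits(multiplier, bits=64):
    """Recode a signed two's-complement multiplier into radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2}; bits // 2 digits are produced.
    """
    assert bits % 2 == 0
    y = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of bit 0
    for i in range(0, bits, 2):
        low = (y >> i) & 1
        high = (y >> (i + 1)) & 1
        digits.append(low + prev - 2 * high)
        prev = high
    return digits


def carry_save_reduce(terms):
    """Reduce many addends to at most two words using 3:2 carry-save steps."""
    terms = list(terms)
    while len(terms) > 2:
        a, b, c = terms.pop(), terms.pop(), terms.pop()
        terms.append(a ^ b ^ c)                           # sum bits, no carry propagation
        terms.append(((a & b) | (b & c) | (a & c)) << 1)  # saved carry bits
    return terms


def booth_multiply(multiplicand, multiplier, bits=64):
    """Multiply via Booth digits, carry-save reduction, and one final addition."""
    digits = booth_radix4_digits(multiplier, bits)
    partial_products = [(d * multiplicand) << (2 * k) for k, d in enumerate(digits)]
    # The final sum plays the role of the adder that absorbs the saved carries.
    return sum(carry_save_reduce(partial_products))
```

For a 64-bit multiplier, `booth_radix4_digits` yields 32 digits, which corresponds to the 32 partial products mentioned above. The multiplier must fit in the signed range of `bits` bits for the recoding to be exact.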
The digit adjusting shifter 100 is a device which acquires the format converted operand stored in the intermediate register 52, and performs digit adjusting of the acquired format converted operand. The digit adjusting shifter 100 stores the format converted operand after the digit adjusting in the intermediate register 55 (the data stored in the intermediate register 55 is subsequently stored in the intermediate register 56). The digit adjusting shifter 100 performs the digit adjusting of the format converted operand stored in the intermediate register 52, which allows the values stored in the intermediate register 57 and the intermediate register 56 to be properly added.
The absolute value adder 110 is a device which adds the value stored in the intermediate register 56 and the value stored in the intermediate register 57. Further, the absolute value adder 110 stores the result of addition in the intermediate register 58.
The normalization shifter 120 is a device which normalizes the value stored in the intermediate register 58. Further, the normalization shifter 120 stores the normalized value in the intermediate register 59. The rounding arithmetic unit 130 is a device which acquires the value stored in the intermediate register 59, and performs a rounding operation (e.g., round-off, round-up, round-down and the like) on the acquired value. Further, the rounding arithmetic unit 130 stores the value, on which the rounding operation has been performed, in the intermediate register 60.
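The digit adjusting, addition, normalization, and rounding steps described above can be illustrated with a toy model in which a value is `sig * 2**exp`. This is a sketch under stated assumptions: the 8-bit significand width (`PRECISION`), the restriction to non-negative significands, and the choice of round-to-nearest with ties to even are all hypothetical and are not specified by the text.

```python
PRECISION = 8  # hypothetical significand width in bits, chosen for readability


def align(sig_a, exp_a, sig_b, exp_b):
    """Digit adjusting: rewrite both significands against the smaller exponent."""
    exp = min(exp_a, exp_b)
    return sig_a << (exp_a - exp), sig_b << (exp_b - exp), exp


def normalize_and_round(sig, exp):
    """Normalize to PRECISION bits, then round to nearest (ties to even)."""
    if sig == 0:
        return 0, 0
    shift = sig.bit_length() - PRECISION
    if shift <= 0:
        # Normalization only: move the leading 1 up, no bits are lost.
        return sig << -shift, exp + shift
    kept = sig >> shift
    remainder = sig & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if remainder > half or (remainder == half and kept & 1):
        kept += 1
        if kept.bit_length() > PRECISION:  # rounding carried out of the top bit
            kept >>= 1
            shift += 1
    return kept, exp + shift


def fp_add(sig_a, exp_a, sig_b, exp_b):
    """Align, add, normalize, and round two non-negative toy operands."""
    a, b, exp = align(sig_a, exp_a, sig_b, exp_b)
    return normalize_and_round(a + b, exp)
```

For example, adding 128 and 0.75 (`fp_add(128, 0, 3, -2)`) gives the exact sum 128.75, which does not fit in 8 significand bits and is rounded to `(129, 0)`.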
The format converter 43 is a device which converts the format of the data (i.e. the value) stored in the intermediate register 60 into the format to be stored in the result register 140 (in other words, the format converter 43 is a device which converts the internal format into the external format). The format converter 43 performs an inverse conversion of the format conversion performed by the format converters 40 to 42. The format converter 43 stores the data, the format of which is converted, i.e., the result of the FMA arithmetic operation, in the result register 140.
Conventionally, floating point addition/subtraction and floating point multiplication are performed with the use of the FMA arithmetic unit described above. Now, floating point addition/subtraction and floating point multiplication are described below with reference to
Since “1” is stored in the operand register 31, the operand stored in the operand register 30, after being subjected to format conversion in the format converter 40, is stored as it is in the intermediate register 57. Thus, the FMA arithmetic unit can perform the floating point addition/subtraction by adding the value stored in the intermediate register 57 and the value stored in the intermediate register 56 by the absolute value adder 110.
On the other hand, when the floating point multiplication is performed with the use of the FMA arithmetic unit, the operand of the multiplicand is stored in the operand register 30, the multiplier is stored in the operand register 31, and “0” is stored in the operand register 32.
When “0” is stored in the operand register 32, “0” is added to the result of multiplication of the multiplicand stored in the operand register 30 and the multiplier stored in the operand register 31 (in other words, “0” is added to the result of multiplication by the absolute value adder 110). Thus, the FMA arithmetic unit can perform the floating point multiplication.
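The way the conventional unit reuses the multiply-add data path for plain addition and plain multiplication can be sketched as below. The function names are hypothetical, and `Fraction` is used only so that the arithmetic in this illustration is exact; it does not model the hardware number format.

```python
from fractions import Fraction


def fma(a, b, c):
    """Exact a * b + c; stands in for the full multiply-add data path."""
    return Fraction(a) * Fraction(b) + Fraction(c)


def add_via_fma(a, c):
    """Addition computed as a * 1 + c, mirroring "1" in operand register 31."""
    return fma(a, 1, c)


def mul_via_fma(a, b):
    """Multiplication computed as a * b + 0, mirroring "0" in operand register 32."""
    return fma(a, b, 0)
```

In both cases the whole multiply-add path is traversed even though one half of it contributes nothing to the result.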
Meanwhile, according to a technology described in Japanese Patent Application Laid-Open No. S59-106043, a register arranged between combinational logic circuits is bypassed in an execution of a one-time arithmetic operation, so that the register is substantially eliminated and an arithmetic operation time is shortened.
When the floating point addition/subtraction or the floating point multiplication is performed in the FMA arithmetic unit described with reference to
Specifically, when the floating point addition/subtraction is performed, the arithmetic operations in the Booth encode circuit 70, the CSA arithmetic unit 80, and the adder 90 are redundant. On the other hand, when the floating point multiplication is performed, the arithmetic operations in the digit adjusting shifter 100, the absolute value adder 110, and the normalization shifter 120 are redundant.
It is an object of the present invention to at least partially solve the problems in the conventional technology.
According to one aspect of the present invention, an arithmetic device which performs one of addition/subtraction and multiplication of numbers represented by floating points, includes an addition/subtraction unit that performs addition/subtraction of numbers, a multiplication unit that performs multiplication of numbers, and a selection unit that selects one of the addition/subtraction unit and the multiplication unit based on a type of an arithmetic operation on numbers.
Further, according to another aspect of the present invention, an arithmetic method for performing one of addition/subtraction and multiplication of numbers represented by floating points, includes acquiring information on a type of an arithmetic operation performed on the numbers, and selecting one of an addition/subtraction unit that performs addition/subtraction of numbers and a multiplication unit that performs multiplication of numbers based on the type of the arithmetic operation.
The above and other objects, features, advantages and technical and industrial significance of this invention will be better understood by reading the following detailed description of presently preferred embodiments of the invention, when considered in connection with the accompanying drawings.
Exemplary embodiments of an arithmetic device and an arithmetic method according to the present invention are described below in detail with reference to the drawings. The present invention is not limited to the embodiments.
The present invention shortens an arithmetic latency in floating point addition/subtraction and floating point multiplication in a floating point multiplication and addition arithmetic unit (i.e., FMA arithmetic unit) and in executing an arithmetic operation using a result of previous arithmetic operation as an operand in the FMA arithmetic unit, by bypassing a redundant part of the FMA arithmetic unit.
The command control unit 3 is a device which acquires a command stored in the memory/cache 1, interprets the command, and issues a predetermined arithmetic command to the arithmetic unit 4. The arithmetic unit 4 is a device which executes a predetermined arithmetic operation in response to the arithmetic command from the command control unit 3. The FMA arithmetic unit according to the present embodiment is included in the arithmetic unit 4.
Of the above configuration, the register file/other arithmetic unit result register 10, the selectors 20 to 25, the operand registers 30 to 32, the format converters 40 to 43, the intermediate registers 50 to 60, the Booth encode circuit 70, the CSA arithmetic unit 80, the adder 90, the digit adjusting shifter 100, the absolute value adder 110, the normalization shifter 120, the rounding arithmetic unit 130, and the result register 140 are the same as the corresponding elements in the FMA arithmetic unit shown in
The bypass selectors 150 to 154, and 156 are devices which select/acquire data according to the command from the timing control circuit 170. The bypasses 160 to 163 are bypasses used by the bypass selectors 150 to 154, and 156 in order to eliminate redundant operations in the FMA arithmetic unit.
The timing control circuit 170 is a device which controls the bypass selectors 150 to 154, and 156 to bypass redundant parts of the FMA arithmetic unit based on the contents of arithmetic operation (i.e., depending on whether the FMA arithmetic unit performs the floating point addition/subtraction or the floating point multiplication, or, whether the FMA arithmetic unit uses a result of previous arithmetic operation in a subsequent arithmetic operation). Further, the timing control circuit 170 acquires information indicating contents of arithmetic operation from the command control unit 3 shown in
Firstly, the process performed by the timing control circuit 170 when the FMA arithmetic unit performs the floating point addition/subtraction is described. When the conventional technique is employed for the floating point addition/subtraction, an arithmetic latency is as long as that in the FMA arithmetic operation. Here, the Booth encode circuit 70, the CSA arithmetic unit 80, and the adder 90 are not necessary for the floating point addition/subtraction in the FMA arithmetic unit. Therefore, the timing control circuit 170, when performing the floating point addition/subtraction, controls the bypass selectors 153 and 154 to bypass the intermediate registers 53 and 55.
The bypass selector 154 acquires a format converted operand stored in the intermediate register 50 via the bypass 160 and stores the acquired format converted operand in the intermediate register 57, whereas the bypass selector 153 acquires a format converted operand for which the digit adjusting is performed by the digit adjusting shifter 100 via the bypass 161 and stores the acquired format converted operand as it is in the intermediate register 56.
As described above, in the execution of floating point addition/subtraction, the timing control circuit 170 controls the bypass selectors 153, 154 and bypasses the intermediate registers 53, 55, thereby making it possible to shorten the arithmetic latency. Further, as the operand stored in the intermediate register 50 (the operand stored in the operand register 30) can be selected by the bypass selector 154, it is not necessary to store “1” in the operand register 31 in the execution of floating point addition/subtraction, whereby the selection logic of the operand register can be simplified.
1: Intermediate registers 50, 51, 52
2: Intermediate registers 53, 54, 55
3: Intermediate registers 56, 57
4: Intermediate register 58
5: Intermediate register 59
6: Intermediate register 60
7: Result register 140
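The timing list above, combined with the addition/subtraction bypass of the intermediate registers 53 and 55, can be modelled as a small schedule. This is a sketch only: it assumes that bypassing one register stage shortens the sequence by exactly one timing slot, and the names `CONVENTIONAL_TIMING` and `schedule` are hypothetical.

```python
# Timing slot -> registers loaded in that slot, copied from the list above.
CONVENTIONAL_TIMING = {
    1: "intermediate registers 50, 51, 52",
    2: "intermediate registers 53, 54, 55",
    3: "intermediate registers 56, 57",
    4: "intermediate register 58",
    5: "intermediate register 59",
    6: "intermediate register 60",
    7: "result register 140",
}


def schedule(bypassed_slot=None):
    """Renumber the timing slots after removing one bypassed register stage."""
    remaining = [
        registers
        for slot, registers in sorted(CONVENTIONAL_TIMING.items())
        if slot != bypassed_slot
    ]
    return dict(enumerate(remaining, start=1))


fma_schedule = schedule()                     # full multiply-add path
add_sub_schedule = schedule(bypassed_slot=2)  # registers 53 and 55 are bypassed
```

Under this assumption, the intermediate registers 56 and 57 are loaded one slot earlier in the addition/subtraction case, and every later stage moves up accordingly.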
As shown in
Secondly, the process performed by the timing control circuit 170 when the FMA arithmetic unit performs the floating point multiplication is described. When the conventional technique is employed for the floating point multiplication, an arithmetic latency is as long as that in the FMA arithmetic operation. Here, the digit adjusting shifter 100, the absolute value adder 110, and the normalization shifter 120 are not necessary for the floating point multiplication in the FMA arithmetic unit. Therefore, the timing control circuit 170, when performing the floating point multiplication, controls the bypass selector 156 to bypass the intermediate register 58.
The bypass selector 156 acquires data (i.e. result of multiplication) stored in the intermediate register 57 through the bypass 162, and stores the acquired data in the intermediate register 59.
As described above, in the execution of floating point multiplication, the timing control circuit 170 controls the bypass selector 156 and bypasses the intermediate register 58, thereby making it possible to shorten the arithmetic latency. Further, the bypass selector 156 acquires the data of the result of multiplication stored in the intermediate register 57, and does not acquire the result of addition in the absolute value adder 110. Thus it is not necessary to store “0” in the operand register 32, whereby the selection logic of the operand register can be simplified.
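The role of the bypass selector 156 can be pictured as a two-input multiplexer placed in front of the intermediate register 59. The sketch below is a behavioural model with hypothetical function names; it uses exact `Fraction` values, so the normalization step is not modelled, and it only shows that the addend has no influence once the bypass is selected.

```python
from fractions import Fraction


def mux(select_bypass, normal_input, bypass_input):
    """Two-input selector: forwards the bypass input when it is selected."""
    return bypass_input if select_bypass else normal_input


def multiply_path(multiplicand, multiplier, addend, bypass_register_58):
    """Return the value loaded into intermediate register 59."""
    ir57 = Fraction(multiplicand) * Fraction(multiplier)  # product held in register 57
    ir58 = ir57 + Fraction(addend)                        # absolute value adder 110
    # Bypass selector 156 takes register 57 through bypass 162 when selected.
    return mux(bypass_register_58, ir58, ir57)
```

With the bypass selected, `multiply_path(1.5, 4, 999, True)` returns 6 regardless of the addend, which is why "0" no longer has to be placed in the operand register 32.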
Thirdly, the process performed by the timing control circuit 170 when a result of previous FMA arithmetic operation is employed in a subsequent arithmetic operation, i.e., in successive FMA arithmetic operations, is described. Even in the successive FMA arithmetic operations, according to the conventional technique, the subsequent FMA arithmetic operation is executed after data is transferred to the format converters 40 to 42 from the result register 140 through the register file/other arithmetic unit result register 10, or through the selectors 20 to 22 and operand registers 30 to 32, or through the selectors 23 to 25.
In these cases, the format converter 43 converts the internal format into the external format in the first arithmetic operation, and the format converters 40 to 42 convert the external format back into the internal format in the subsequent arithmetic operation. To eliminate such a redundant operation, the timing control circuit 170, in performing successive FMA arithmetic operations, controls the bypass selectors 150 to 152 to bypass the register file/other arithmetic unit result register 10 and the operand registers 30 to 32.
Then, the bypass selectors 150 to 152 acquire the data stored in the intermediate register 60 via the bypass 163, and store the acquired data as it is in the intermediate registers 50 to 52, respectively.
Thus, in the execution of successive FMA arithmetic operations, the timing control circuit 170 controls the bypass selectors 150 to 152 and bypasses the register file/other arithmetic unit result register 10, the operand registers 30 to 32, thereby making it possible to shorten the arithmetic latency.
The timing control circuit 170 controls the bypass selectors 150 to 152 to select the bypass 163 at the timing “7” in the first cycle. When the FMA arithmetic operations continue further, the timing control circuit 170 controls the bypass selectors 150 to 152 to select the bypass 163 at the timing “7” in the second, the third, and up to the n-th cycle.
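The saving in successive operations can be illustrated by counting the format conversions applied to the forwarded operand between one operation and the next. This is a minimal sketch with assumed formats: the external format is modelled as a Python `float` and the internal format as a `Fraction`, neither of which is specified by the text, and `chained_fma` is a hypothetical helper.

```python
from fractions import Fraction


def to_internal(x):
    """Stand-in for the format converters 40 to 42 (external to internal)."""
    return Fraction(x)


def to_external(x):
    """Stand-in for the format converter 43 (internal to external)."""
    return float(x)


def chained_fma(start, steps, forward_internal):
    """Compute acc = acc * b + c for each (b, c) in steps.

    Returns (result, conversions), where conversions counts the format
    conversions applied to the forwarded operand between operations.
    """
    acc = to_internal(start)
    conversions = 0
    for index, (b, c) in enumerate(steps):
        if index > 0 and not forward_internal:
            # Conventional path: the result is converted to the external
            # format and then converted back before it is reused.
            acc = to_internal(to_external(acc))
            conversions += 2
        acc = acc * to_internal(b) + to_internal(c)
    return to_external(acc), conversions


steps = [(2.0, 0.25), (0.5, 1.0), (4.0, -3.0)]
conventional = chained_fma(1.5, steps, forward_internal=False)
forwarded = chained_fma(1.5, steps, forward_internal=True)
```

Both variants produce the same value for these exactly representable inputs; only the number of conversions on the forwarded operand differs.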
The technique of bypassing the register file/other arithmetic unit result register 10 and the operand registers 30 to 32 in the successive FMA arithmetic operations may be used in the floating point addition/subtraction or the floating point multiplication as described above.
For example, when arithmetic operations of the floating point addition/subtraction are performed successively, the timing “7” as well as the timing “2” shown in
As described above, the FMA arithmetic unit according to the present embodiment has the timing control circuit 170, which controls the bypass selectors 153, 154 to bypass the intermediate registers 53, 55 in the execution of floating point addition/subtraction, the bypass selector 156 to bypass the intermediate register 58 in the execution of floating point multiplication, and the bypass selectors 150 to 152 to bypass the register file/other arithmetic unit result register 10 and the operand registers 30 to 32 in the execution of successive FMA operations, thereby shortening the arithmetic latency and enabling an effective execution of floating point addition/subtraction, floating point multiplication and the like.
According to the embodiment of the present invention, one of the addition/subtraction unit and the multiplication unit is selected based on the type of arithmetic operation performed on numbers represented by floating points, and the arithmetic operation is executed on the numbers with the use of the selected unit, whereby the arithmetic latency can be shortened.
Although the invention has been described with respect to a specific embodiment for a complete and clear disclosure, the appended claims are not to be thus limited but are to be construed as embodying all modifications and alternative constructions that may occur to one skilled in the art that fairly fall within the basic teaching herein set forth.
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/JP2006/302534 | Feb 2006 | US |
| Child | 12222521 | | US |