1.1. Field of the Invention
The present invention relates to a method and circuit for performing multiply-operations in an arithmetic unit of a computer processor.
1.2. Description and Disadvantages of Prior Art
When performing a multiply operation with a multiplicand A and a multiplier C, the product P=A*C is calculated by adding up a plurality of partial products, for example in a Wallace-tree-based procedure and architecture. A schematic overview of such an exemplary prior art multiplier implementation is given in
In
P=A*C.
The prior art Wallace tree 14 is schematically depicted in
In order to be able to perform a proper setting of condition code and overflow control signals in the above multiplication scheme, it is required to detect zeros in the end result of the multiplication.
As shown in
Then, with the aid of the zero-detect signals, the condition code and overflow setting in step 130 can be performed.
Since this logic 19 is slow, it either adds one cycle to the condition code and overflow setting or lengthens the pipeline cycle time. In high-performance computers this cannot be accepted, because every cycle is “squeezed” to the limit.
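The prior art flow described above can be sketched in software as follows. This is only an illustrative model, not the circuit itself: the function names, the 32-bit operand width, and the `field_mask` parameter selecting the bits to be zero-checked are assumptions made for the example. The point it shows is that the zero check can only start once the final addition has produced its result.

```python
def carry_save_add(x, y, z):
    """3:2 compression: three operands in, a sum word and a carry word out."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


def partial_products(a, c, width=32):
    """One shifted copy of multiplicand a for every set bit of multiplier c."""
    return [(a << i) if (c >> i) & 1 else 0 for i in range(width)]


def wallace_reduce(operands):
    """Reduce the operands in groups of three until two add-operands remain."""
    ops = list(operands)
    while len(ops) > 2:
        full = len(ops) - len(ops) % 3
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(carry_save_add(ops[i], ops[i + 1], ops[i + 2]))
        nxt.extend(ops[full:])  # leftover operands pass to the next level
        ops = nxt
    while len(ops) < 2:
        ops.append(0)
    return ops[0], ops[1]


def prior_art_multiply(a, c, width=32, field_mask=0xFFFFFFFF):
    """Multiply non-negative a and c; zero-detect the masked field afterwards."""
    s, cy = wallace_reduce(partial_products(a, c, width))
    product = s + cy                      # final carry-propagate addition
    field_is_zero = (product & field_mask) == 0  # starts only after the add
    return product, field_is_zero
```

For example, `prior_art_multiply(0x10000, 0x10000)` yields a product of 2^32 whose low 32 bits are all zero, so the zero flag is set, but only after the full addition has completed.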
1.3. Objectives of the Invention
It is thus an objective of the present invention to provide a method and respective electronic circuit, wherein the zero detection is completed earlier.
This objective of the invention is achieved by the features stated in the enclosed independent claims. Further advantageous arrangements and embodiments of the invention are set forth in the respective subclaims. Reference should now be made to the appended claims.
The present invention is based on the idea of using existing leading zero anticipation (LZA) hardware (i.e., an LZA circuit, which usually exists in floating point processor adders for calculating the number of leading zeros for operand normalization purposes) also for performing a partial sum zero detection in multiply operations.
More precisely, the invention discloses a method and a corresponding circuit for performing a multiply operation in an arithmetic unit of a computer processor, wherein zeros in the product bit string must be detected and the product bit string is built by adding two respective add-operands, wherein a) an LZA circuit is fed with two corresponding substrings of the add-operands, excluding their two most significant margin bits and their least significant margin bit, b) said two most significant margin bits and said least significant margin bit are read directly from the addition result of the two add-operands, and c) an all-zero product bit substring is detected when both the LZA circuit and said margin bits from the addition result yield zero.
According to the invention, the zero detection in partial sums is essentially performed by an LZA circuit that is otherwise dedicated to a different purpose, namely operand normalization, and it can be started in parallel with the addition of the above-mentioned partial sums. This is advantageous because the LZA algorithm and hardware exist on the chip anyway, being needed for the normalization of floating point numbers. The LZA circuit operates in parallel with the addition of the Wallace tree partial results. One drawback of this algorithm is that the number of leading zeros may be imprecise by one, i.e., a final correction is needed by checking the MSB of the final result.
Using the LZA output string, the partial zero result detection can be generated at almost the same time as the result of the addition. The LZA algorithm is disclosed, for example, in “Proceedings of the 15th IEEE Symposium on Computer Arithmetic (Arith'01)”, 1063-6889/01, 2001, IEEE, hereby incorporated by reference.
The resulting advantage is that the addition in the final adder and the zero detection may be performed concurrently rather than sequentially.
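That the zero property of a sum can be decided from the two add-operands, without waiting for the carries of the addition, can be illustrated with a different and well-known operand-level identity. The sketch below is not the LZA circuit of the invention; it is a swapped-in textbook check, and the function name and the default width are assumptions made for the example. It relies on the fact that a + b is zero modulo 2^width exactly when, at every bit position, the XOR of the operand bits equals the OR of the operand bits one position below.

```python
def sum_is_zero_carry_free(a, b, width=32):
    """True if (a + b) mod 2**width == 0, decided without adding a and b."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    # For a zero sum, the carry into bit i must equal a_i XOR b_i, and that
    # carry is then a_(i-1) OR b_(i-1); the carry into bit 0 is zero.
    return (a ^ b) == (((a | b) << 1) & mask)
```

Each output bit of the comparison depends only on two neighbouring bit positions of the operands, so in hardware such a check could run alongside the adder instead of after it.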
Of course the LZA output bit string can be used for detecting different cases, as for example only “all 1” cases, with a respective post-connected evaluation logic analogously applied to evaluation logic 42 in
The present invention is illustrated by way of example and is not limited by the shape of the figures of the drawings in which:
With general reference to the figures and with special reference now to
The LZA circuit 40 is output-connected to the input of an OR gate 42, which has as many inputs as the bit width of the LZA-subjected substring. In the example, this substring is assumed to comprise bits 32 to 63, i.e., a width of 32 bits. The single output of the OR gate 42 is fed as one input to a 4-input NOR gate 44. The other three inputs of the NOR gate 44 are taken from the add-result 22 at the output of the adder 18. These bits are the two most significant bits (MSB) and the least significant bit (LSB) of the addition in adder 18.
This LZA circuit logic 40 generally enables the generation of a bit string which can be used to count the number of leading zeros or leading ones in a floating point or fixed point addition without actually having to perform the addition. According to this embodiment of the invention, the LZA circuit 40 is used to speed up the detection of whether a partial result of an addition is zero, for example to determine more quickly whether the partial result bits 32 to 63 are all zero. Note that bits 16 to 31, or any other bit range of interest, could also be subjected to the LZA algorithm.
As shown from
As shown from
This is expressed in the following formula:
res(32−63)=0<=>NOT(res(32) OR res(33) OR lza(33−62) OR res(63))
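The formula above can be modelled in software as follows. This is a minimal sketch of the evaluation logic only: the LZA output string is taken as a given input and is not computed here. The function names, the 64-bit word width, and the convention that bit 0 is the most significant bit (so that bit 63 is the LSB) are assumptions made for the example. The OR over positions 33 to 62 follows the formula as written.

```python
def bit(value, index, word_width=64):
    """Bit `index` of `value`, counting bit 0 as the MSB (assumed numbering)."""
    return (value >> (word_width - 1 - index)) & 1


def partial_zero_detect(res, lza, word_width=64):
    """Evaluate NOT(res(32) OR res(33) OR lza(33-62) OR res(63)).

    res: add-result word; only bits 32, 33 and 63 are read from it.
    lza: LZA output string, assumed aligned to the same bit positions.
    """
    # Models OR gate 42: is any LZA bit set in positions 33..62?
    or_42 = 0
    for i in range(33, 63):
        or_42 |= bit(lza, i, word_width)
    # Models the 4-input NOR gate 44: three result bits plus the OR output.
    any_set = (bit(res, 32, word_width) | bit(res, 33, word_width)
               | or_42 | bit(res, 63, word_width))
    return any_set == 0
```

With an all-zero LZA string, `partial_zero_detect(0, 0)` reports a zero partial result, while setting result bit 63 alone (`res = 1`) clears the signal.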
The MSB result bits 32 and 33—denoted with reference sign 24 (see
If the evaluation of the LZA result in OR gate 42 shows that the LZA bits 33-62 are all zero (see step 320 A), it is further required to check whether the final result bits 32, 33 and 63 are also zero (see steps 320 B, 320 C, 320 D). If all of them are zero, then the partial sum is zero. In
As a person skilled in the art will appreciate, the LZA result bit string can be computed quite fast in relation to the prior art NOR gate 19 (
Because the LZA bit string is available well before the addition result, the logic that remains after the adder can be reduced to a simple 4-way NOR gate, as depicted by gate 44, whereas the prior art requires a 32-way NOR to generate the partial result zero signal. When using the binary tree structure, the prior art needs two stages of a 4-way OR followed by one stage of 4-way NOR (see
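The stage counts mentioned above can be checked with a small calculation. The sketch below assumes an idealized tree of gates with a fixed fan-in and counts only the gate levels that follow the adder result; the function name is hypothetical and gate delays, wiring, and the early LZA path are not modelled.

```python
def gate_stages(inputs, fan_in=4):
    """Number of gate levels needed to combine `inputs` signals into one."""
    stages = 0
    while inputs > 1:
        inputs = -(-inputs // fan_in)  # ceiling division: gates in this level
        stages += 1
    return stages
```

Under these assumptions, `gate_stages(32)` gives three levels for the 32-way zero detect built from 4-way gates (32 to 8 to 2 to 1 signals), and `gate_stages(4)` gives one level for the 4-way NOR gate 44.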
The techniques of the present invention can also be used to detect a partial string of ones, because the LZA algorithm is valid for leading strings of both ones and zeros. A person skilled in the art may use this advantageously wherever bit patterns are analyzed to be all ones or all zeros in a predetermined bit string range. Examples are an XML parser or a pattern detection tool, whether graphics-based or text-based.