The present invention relates generally to a logic circuit for zero detection, and more particularly to a logic circuit and a method for detecting a zero result of an addition without performing the addition for two vectors containing signed integer values.
The SRT algorithm is a division algorithm, also used for square root, named after its originators, Sweeney, Robertson and Tocher. A hardware SRT implementation of a divide and square root algorithm uses a redundant data format to perform the inner loop of the algorithm. The partial remainder of the iteration is represented in two vectors of length N containing signed integer numbers in the 2's complement representation. N is defined by the precision of the operands. Whether the remainder is zero or not is needed for the rounding step of the operation.
The problem of fast detection whether two numbers sum to zero is not unique to SRT engines. It occurs in several corners of floating-point unit design; for example, in the exponent logic when checking for corner cases like overflow and underflow.
One known solution for the zero check is to use an adder. This method adds the two input vectors and checks the result for any non-zero bit. The logic depth of such an implementation is 2*log(n)+3, not counting the OR-reduction. The drawbacks of the method include the additional hardware of an N-bit adder, a deeper logic tree to compute the result, and higher power consumption.
Another known solution of the zero check is to use a leading zero anticipator with an additional compare of the result of the leading zero anticipator. The zero check using a leading zero anticipator uses two input vectors, performs the leading zero anticipation without adding the two vectors, and compares the result against the number of bits of the vectors. The logic depth of this implementation is between log(n)+7 and 1.5*log(n)+5. The drawbacks of the method include the additional leading zero anticipator, the deeper logic tree to compute the result, and more power consumption.
A method for zero detection of a sum of inputs without performing an addition is provided. The method comprises performing, by first one or more XOR gates in a logic circuit, a bitwise XOR operation for a first vector as a first input and a second vector as a second input, wherein the bitwise XOR operation for the first vector and the second vector generates a third vector, wherein the first vector and the second vector are signed N-bit 2's complement vectors. The method further comprises performing, by first one or more OR gates in the logic circuit, a bitwise OR operation for the first vector and the second vector, wherein the bitwise OR operation generates a fourth vector. The method further comprises performing, by second one or more XOR gates in the logic circuit, a bitwise XOR operation for the third vector and the fourth vector, wherein bit positions of the fourth vector are shifted by one bit to the left and the right end bit of the fourth vector is padded with a zero, wherein the bitwise XOR operation for the third vector and the fourth vector generates a fifth vector. The method further comprises performing, by a third XOR gate in the logic circuit, an XOR operation of a sign extension bit of the third vector and a sign extension bit of the fourth vector. The method further comprises performing, by a first AND gate in the logic circuit, an AND operation of a control signal and an output of the third XOR gate, wherein the control signal switches between a true mathematical zero check and a zero check for trailing N-bits.
A method for zero detection of a sum of inputs without performing an addition is provided. The method comprises performing, by first one or more XOR gates in a logic circuit, a bitwise XOR operation for a first vector as a first input and a second vector as a second input, wherein the bitwise XOR operation for the first vector and the second vector generates a third vector, wherein the first vector and the second vector are signed N-bit 2's complement vectors. The method further comprises performing, by first one or more OR gates in the logic circuit, a bitwise OR operation for the first vector and the second vector, wherein the bitwise OR operation generates a fourth vector. The method further comprises performing, by one or more XNOR gates in the logic circuit, a bitwise XNOR operation for the third vector and the fourth vector, wherein bit positions of the fourth vector are shifted by one bit to the left and the right end bit of the fourth vector is padded with a zero, wherein the bitwise XNOR operation for the third vector and the fourth vector generates a fifth vector. The method further comprises performing, by a second XOR gate in the logic circuit, an XOR operation of a sign extension bit of the third vector and a sign extension bit of the fourth vector. The method further comprises performing, by a first AND gate in the logic circuit, an AND operation of a control signal and an output of the second XOR gate, wherein the control signal switches between a true mathematical zero check and a zero check for trailing N-bits.
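The gate-level steps summarized above can be modeled in software as a minimal sketch. The function name `zero_detect` and the parameter `true_zero_check` are hypothetical, the integers use bit 0 as the least significant bit (the text numbers bit 0 as the most significant), and the final combination of the fifth vector with the gated sign-extension bit (an OR followed by an inversion) is an assumption, since the summary does not spell it out.

```python
def zero_detect(a: int, b: int, n: int, true_zero_check: bool) -> bool:
    """Model of the zero detection for two n-bit 2's complement patterns.

    true_zero_check plays the role of the control signal:
    True  -> true mathematical zero check
    False -> zero check for the trailing n bits
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask

    c = a ^ b                      # third vector: bitwise XOR
    d = a | b                      # fourth vector: bitwise OR
    e = c ^ ((d << 1) & mask)      # fifth vector: C XOR (D shifted left, zero padded)

    # The sign-extension bit of C and of D equals the respective sign bit.
    sign_c = (c >> (n - 1)) & 1
    sign_d = (d >> (n - 1)) & 1
    ext = sign_c ^ sign_d          # XOR of the two sign-extension bits

    gated = ext & int(true_zero_check)  # AND with the control signal

    # Assumption: the result is the inverted OR over all bits of E and the gated bit.
    return (e | gated) == 0


print(zero_detect(0b0011, 0b1101, 4, True))   # 3 + (-3)
print(zero_detect(0b1000, 0b1000, 4, True))   # (-8) + (-8), true zero check
print(zero_detect(0b1000, 0b1000, 4, False))  # (-8) + (-8), trailing 4 bits
```

No carry chain appears anywhere in the model: every bit of `e` depends only on two adjacent bit positions of the inputs.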
Embodiments of the present invention take advantage of the fact that two vectors must have a special structure if their addition leads to a zero result. Embodiments of the present invention describe how this special structure can be detected without using an adder or a leading zero anticipator. The mechanism can also be used in other implementations where a timing-critical zero detection of the sum of two vectors in the 2's complement representation is needed. The logic depth of the zero detection in the present invention is 3 plus the OR-reduction, compared to 2*log(n)+3 plus the OR-reduction for the zero check using an adder and log(n)+7 to 1.5*log(n)+5 for the zero check using a leading zero anticipator. The advantages of the zero detection in the present invention are as follows. It has less timing delay than the zero check using an adder and the zero check using a leading zero anticipator. It uses fewer logic gates and therefore needs less area and less power. The advantage grows with the width of the input vectors, because the logic depth ahead of the OR-reduction does not depend on n.
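To make the depth comparison concrete, the following sketch evaluates the quoted formulas for a few vector widths. It assumes that log denotes the base-2 logarithm, which the text does not state, and it leaves out the OR-reduction because its depth is not given. The function names are hypothetical.

```python
import math


def adder_depth(n: int) -> float:
    """Depth of the adder-based check, 2*log(n)+3, without the OR-reduction."""
    return 2 * math.log2(n) + 3  # assumption: log is base 2


def lza_depth_range(n: int) -> tuple[float, float]:
    """Depth range of the leading-zero-anticipator check."""
    return (math.log2(n) + 7, 1.5 * math.log2(n) + 5)


PROPOSED_DEPTH = 3  # constant part of the proposed detection, without the OR-reduction

for n in (16, 64, 128):
    low, high = lza_depth_range(n)
    print(f"n={n:3d}  adder={adder_depth(n):4.1f}  "
          f"LZA={low:4.1f}..{high:4.1f}  proposed={PROPOSED_DEPTH}")
```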
For signed N-bit 2's complement vectors A and B, there are basically two types of zero checks. The first type is a true mathematical zero check, i.e., the numbers A and B mathematically add to zero. The operands get sign-extended prior to the addition; therefore, the sum exceeds the N-bit target width:
S(0:N) = A(0, 0:N−1) + B(0, 0:N−1) → zero = (S(0:N) == 0)
An example of the first type is the zero check in the exponent calculation of floating-point units.
The second type is a zero check for trailing N bits. In the second type, A and B add to zero in the given target width.
S(0:N−1) = A(0:N−1) + B(0:N−1) → zero = (S(0:N−1) == 0)
The difference from the first type is that the sum is considered zero when its trailing N bits are zero, even if the addition overflows. The overflowing case that maps to zero is A = B = 10...0. This type of zero check is used in the divide and square root algorithm.
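The two definitions can be written down as a plain reference model that does perform the addition, which is useful as a baseline when checking an adder-free implementation. This is a sketch only: the function names are hypothetical, and the operands are passed as non-negative integers holding the N-bit patterns.

```python
def to_signed(bits: int, n: int) -> int:
    """Interpret an n-bit pattern as a signed 2's complement integer."""
    bits &= (1 << n) - 1
    return bits - (1 << n) if bits >> (n - 1) else bits


def true_zero_ref(a: int, b: int, n: int) -> bool:
    """First type: the signed values of A and B add to mathematical zero."""
    return to_signed(a, n) + to_signed(b, n) == 0


def trailing_zero_ref(a: int, b: int, n: int) -> bool:
    """Second type: the trailing n bits of A + B are all zero."""
    return ((a + b) & ((1 << n) - 1)) == 0


n = 4
overflow = 0b1000  # A = B = 10...0, i.e. -8 for n = 4
print(true_zero_ref(overflow, overflow, n))      # False: (-8) + (-8) = -16
print(trailing_zero_ref(overflow, overflow, n))  # True: 16 mod 16 = 0
```

For every other pair of N-bit operands the two checks agree.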
Referring to
C(0:N−1) = A(0:N−1) XOR B(0:N−1)
The bitwise XOR operation of the two input vectors (A(0:N−1) and B(0:N−1)) is implemented by one or more XOR gates in a circuit. As shown by block 104, a bitwise OR operation of the two input vectors (A(0:N−1) and B(0:N−1)) results in vector D.
D(0:N−1) = A(0:N−1) OR B(0:N−1)
The bitwise OR operation of the two input vectors (A(0:N−1) and B(0:N−1)) is implemented by one or more OR gates in a circuit.
Referring to
E(0:N−1) = C(0:N−1) XOR [D(1:N−1) & '0']
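A short worked trace may help to see the structure that vectors C, D and E take when the inputs sum to zero. The sketch below uses N = 8 with A = +20 and B = −20 as illustrative values that are not taken from the text, and reads the & in the formula as concatenation, so that D(1:N−1) & '0' is D shifted left by one bit with a zero appended on the right.

```python
n = 8
mask = (1 << n) - 1

a = 0b00010100                 # +20
b = 0b11101100                 # -20 in 8-bit 2's complement

c = a ^ b                      # C = A XOR B
d = a | b                      # D = A OR B
d_shifted = (d << 1) & mask    # D(1:N-1) & '0'
e = c ^ d_shifted              # E = C XOR shifted D

for name, value in [("A", a), ("B", b), ("C", c),
                    ("D", d), ("D<<1", d_shifted), ("E", e)]:
    print(f"{name:>5} = {value:0{n}b}")
```

In this trace C and the shifted D are identical, so E has no bit set, and an OR reduction over E followed by an inversion would report a zero sum.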
Referring to
Referring to
Referring to
Referring to
Referring to
Referring to
E(0:N−1) = C(0:N−1) XNOR [D(1:N−1) & '0']
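Because XNOR is the bitwise complement of XOR, vector E in this variant has every bit set when the trailing N bits of the sum are zero. The sketch below models that case. The function name is hypothetical, and the use of an AND reduction to evaluate E is an assumption about how the XNOR output would be consumed.

```python
def trailing_zero_xnor(a: int, b: int, n: int) -> bool:
    """Trailing-n-bit zero check using the XNOR form of vector E."""
    mask = (1 << n) - 1
    c = (a ^ b) & mask                 # C = A XOR B
    d_shifted = ((a | b) << 1) & mask  # D shifted left by one bit, zero padded
    e = ~(c ^ d_shifted) & mask        # bitwise XNOR, limited to n bits
    return e == mask                   # AND reduction: every bit of E is 1


print(trailing_zero_xnor(0b00010100, 0b11101100, 8))  # +20 and -20
print(trailing_zero_xnor(0b00010100, 0b11101101, 8))  # +20 and -19
```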
Referring to
In another embodiment, referring to
Referring to
In yet another embodiment, the OR reduction operation shown by block 108 followed by the inverter shown by block 110 in
Based on the foregoing, a method and a logic circuit have been disclosed for detecting a zero result of an addition of two vectors without performing the addition. However, numerous modifications and substitutions can be made without deviating from the spirit and scope of the present invention. Therefore, the present invention has been disclosed by way of example and not limitation.
Entry |
---|
Lutz et al., “Early Zero Detection ,” 1996 IEEE International Conference on Computer Design: VLSI in Computers and Processors, Oct. 7-9, 1996, ©1996 IEEE, pp. 545-550. |
Weinberger, Arnold, “High-Speed Zero-Sum Detection,” Proceedings of the 3rd IEEE Symposium on Computer Arithmetic, Southern Methodist University, Dallas, Texas, Nov. 19-20, 1975, pp. 200-207. |
Appendix P List of IBM Patents or Applications Treated as Related Dated Oct. 20, 2017. Two pages. |
Kroener et al. Original U.S. Appl. No. 15/438,831, filed Feb. 22, 2017 as DE920160108US1. |
Number | Date | Country | |
---|---|---|---|
20180239589 A1 | Aug 2018 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 15438831 | Feb 2017 | US |
Child | 15788901 | US |