This application claims priority from GB App. No. 1117318.4, filed on Oct. 6, 2011, which is incorporated by reference in its entirety herein.
The following relates to methods and apparatus for use in the design and manufacture of integrated circuits, and particularly to the design and manufacture of circuits that perform divisions.
When designing and manufacturing ICs, sophisticated synthesis tools such as Synopsys™ Design Compiler are used to convert a desired function which must be implemented in the IC into a set of logic gates to perform that function. Functions which need to be implemented include add, subtract, multiply and divide. The synthesis tools seek to implement the desired functions in an efficient manner in logic gates.
The tools operate by converting a function to be implemented, such as divide by x, to what is known as register transfer level (RTL), which defines a circuit's behavior in terms of the flow of signals between hardware registers and the logical operations performed on these signals. This is then used to generate a high level representation of a circuit from which appropriate gate level representations and the ultimate IC design can be derived for manufacture, and an IC can then be made. If a synthesis tool is presented with division by a constant such as x/d, it will invariably use RTL designed for non-constant division. A designer could note that in the case of constant division an implementation of the form (ax+b)/2^k could potentially make smaller ICs. The designer would then have to work out values for the triple (a,b,k) which would perform the task of x/d. As explained in the Summary below, the present inventors have appreciated that by representing integer division in the form (ax+b)/2^k, rather than the conventional x/d division input to an RTL generator, the division can be implemented using a multiply-add implementation for various rounding modes.
Division is acknowledged to be an expensive operation to perform in hardware. However in the case where the divisor is known to be a constant, efficient hardware implementations can be constructed. Consider the division of an unsigned n bit integer x by a known invariant integer constant d:
For the purposes of the exposition we will assume that d is an odd integer larger than 1; the following schemes can be easily modified for even d by those skilled in the art. We consider an implementation of the form ⌊(ax+b)/2^k⌋:
Where a, b and k are non-negative integers. Note that without loss of generality we can assume that a is odd. The prior art in the case where the rounding used is round towards zero and d is an unsigned m bit number comes from [1] and can be succinctly summarised by setting:
The second piece of prior art comes from [2] where the rounding mode used is round to nearest, d=2^n−1 and x is the result of a multiplication of two unsigned n bit numbers a and b:
When a division is to be performed such as divide by d, the integer triple discussed above is generated and provided to a RTL generation unit, which produces the gate level circuits required as an input to a synthesis tool which then generates the hardware components required for manufacture.
Aspects include methods and apparatus to design an integrated circuit for performing invariant integer division for a desired rounding mode such as round towards zero, round to nearest and faithful rounding, and integrated circuits according to such design.
In an example, the necessary and sufficient conditions for a given integer triple (a,b,k) to give the required answer for a desired rounding mode are produced. For application in a hardware scheme, an algorithm is presented which will fit into a synthesis flow and produce the most efficient hardware. In particular, we have appreciated that by representing integer division in the form (ax+b)/2^k and implementing this, rather than the conventional x/d division input to an RTL generator, the division can be implemented using a multiply-add implementation for various rounding modes. Three rounding modes are described here but the principle can be extended to any rounding mode. Using such an approach results in a hardware implementation for the division which can have up to a 50% decrease in the integrated circuit area required.
In accordance with one aspect, there is provided a method for manufacturing an integrated circuit for performing invariant integer division (x/d), the method comprising: deriving an integer triple (a,b,k) for a desired rounding mode and set of conditions where x/d=(ax+b)/2^k; deriving an RTL representation of the (ax+b)/2^k representation of the division using the integer triple; deriving a minimum value of k for a desired rounding mode and a set of conditions; deriving a hardware layout from the RTL representation; and manufacturing an integrated circuit with the derived hardware layout.
Exemplary aspects of the disclosure are described with reference to three rounding modes. Other rounding modes may also be implemented.
We first present the necessary and sufficient conditions for a given triple of (a,b,k) to implement each of the three following rounding schemes.
Round Towards Zero (RTZ)
In this case we require:
Now the sawtooth function x mod d is discontinuous in x with peaks at x=md−1 where 1≤m≤floor(2^n/d) and troughs at x=md for 0≤m≤floor(2^n/d). It suffices to check that the upper bound error condition is met for md−1 and the lower bound error condition is met for md:
Now given that a and d are odd, ad−2^k≠0. Depending on the sign of ad−2^k, different values of m will stress these inequalities. It follows that the necessary and sufficient conditions for the implementation of round towards zero mode for the IC design are:
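A minimal sketch of how a candidate triple might be checked exhaustively against the round towards zero requirement, as a cross-check on any closed-form condition. The function name `implements_rtz` and the example values (n=8, d=3) are illustrative assumptions and do not come from the text.

```python
def implements_rtz(a: int, b: int, k: int, d: int, n: int) -> bool:
    """True if floor((a*x + b) / 2**k) == floor(x / d) for every n-bit unsigned x."""
    return all((a * x + b) >> k == x // d for x in range(1 << n))


if __name__ == "__main__":
    # Hypothetical example values: n = 8, d = 3.
    print(implements_rtz(171, 0, 9, d=3, n=8))   # a*d - 2**k = +1 -> True
    print(implements_rtz(85, 85, 8, d=3, n=8))   # a*d - 2**k = -1 -> True
    print(implements_rtz(85, 0, 8, d=3, n=8))    # fails at x = 3   -> False
```

The two passing triples fall on opposite sides of ad−2^k=0, which is why the sign of that quantity is treated separately in what follows.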
Round to Nearest (RTN)
In this case we require:
Now the sawtooth function (2x+d) mod 2d is discontinuous in x with peaks at md−(d+1)/2 where 0<m≤floor((2^(n+1)+d−1)/(2d)) and troughs at md−(d−1)/2 for 0<m≤floor((2^(n+1)+d−3)/(2d)). It suffices to check the upper bound error condition is met for the peaks and the lower bound condition is met for the troughs:
Now given that a and d are odd, ad−2^k≠0. Depending on the sign of ad−2^k, different values of m will stress these inequalities. It follows that the necessary and sufficient conditions for implementation of round to nearest for the IC design are:
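For round to nearest, a similar exhaustive check can be sketched. Because d is odd, x/d never falls exactly halfway between two integers, so the nearest integer can be written as floor((2x+d)/(2d)). The helper names and the example triples (n=8, d=3) are assumptions chosen for illustration.

```python
def round_to_nearest(x: int, d: int) -> int:
    """Nearest integer to x/d; for odd d no ties can occur."""
    return (2 * x + d) // (2 * d)


def implements_rtn(a: int, b: int, k: int, d: int, n: int) -> bool:
    """True if floor((a*x + b) / 2**k) is the nearest integer to x/d for every n-bit x."""
    return all((a * x + b) >> k == round_to_nearest(x, d) for x in range(1 << n))


if __name__ == "__main__":
    # Hypothetical example values: n = 8, d = 3.
    print(implements_rtn(171, 171, 9, d=3, n=8))  # True
    print(implements_rtn(85, 170, 8, d=3, n=8))   # True
    print(implements_rtn(171, 0, 9, d=3, n=8))    # False: this triple truncates instead
```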
Faithful Rounding (FR1)
In this case we can return either integer that lies either side of the true answer; if the true answer is an integer we must return that integer:
Now for the second case the sawtooth function x mod d is discontinuous in x with peaks at x=md−1 where 0<m≤floor(2^n/d) and troughs at x=md+1 (note we are assuming x≠md) for 0≤m≤floor(2^n/d). It suffices to check that the upper bound error condition is met for md−1 and the lower bound error condition is met for md+1:
Now given that a and d are odd, ad−2^k≠0. Depending on the sign of ad−2^k, different values of m will stress these inequalities in the two cases. It follows that the necessary and sufficient conditions for implementation of faithful rounding are:
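The faithful rounding requirement can also be expressed as a direct exhaustive check: exact when d divides x, and otherwise either neighbouring integer is accepted. The following is a sketch only; `implements_fr1` and the example triples (n=8, d=3) are illustrative assumptions.

```python
def implements_fr1(a: int, b: int, k: int, d: int, n: int) -> bool:
    """True if floor((a*x + b) / 2**k) is a faithful rounding of x/d for every n-bit x."""
    for x in range(1 << n):
        q, r = divmod(x, d)
        y = (a * x + b) >> k
        if r == 0:
            if y != q:                 # exact quotients must be returned exactly
                return False
        elif y not in (q, q + 1):      # otherwise either neighbour is acceptable
            return False
    return True


if __name__ == "__main__":
    # Hypothetical example values: n = 8, d = 3.
    print(implements_fr1(43, 0, 7, d=3, n=8))    # True, although 43*128 >> 7 == 43 != 128 // 3
    print(implements_fr1(171, 0, 9, d=3, n=8))   # True: a truncating triple is also faithful
    print(implements_fr1(43, 0, 6, d=3, n=8))    # False
```

In this example the relaxed requirement admits k=7, where the truncating triples checked for the same n and d use k=8 or k=9.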
Minimal Hardware Implementation Scheme
Minimal hardware implementations in the IC will result from minimising the number of partial product bits in ax+b. The scheme used achieves this as follows:
1 Minimise k producing kopt.
2 For the range of acceptable values of a for a given kopt choose the one that results in the smallest constant multiplier. This can be accomplished by choosing a value for a which has the smallest number of non zero elements in a Canonical Signed Digit representation of a. This will result in aopt. Define this function as minCSD(x).
3 For the range of valid values for b having fixed kopt and aopt choose the one with smallest Hamming weight, as this minimises the number of partial product bits. If there is a range of numbers that have smallest Hamming weight, we choose the one that has smallest value as this will add 1s into the least significant bits of the array where the height of the array is smallest. Define the function which finds this value for numbers in the interval [a, b] as minHamm(a, b). Note that the minHamm(a, b) function can be computed as follows:
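One possible way to compute such a function is sketched here; it is not necessarily the procedure the authors intended. It relies on the observation that a minimiser is either the lower end of the interval or the lower end rounded up to a multiple of some power of two. The name `min_hamm` and the parameter names `lo` and `hi` are assumptions.

```python
def min_hamm(lo: int, hi: int) -> int:
    """Smallest value among the integers of smallest Hamming weight in [lo, hi]."""
    if not 0 <= lo <= hi:
        raise ValueError("expected 0 <= lo <= hi")
    if lo == 0:
        return 0
    best = lo
    i = 1
    while True:
        # lo rounded up to the next multiple of 2**i (candidates never decrease with i)
        cand = ((lo + (1 << i) - 1) >> i) << i
        if cand > hi:
            break
        # strict comparison keeps the smaller value when weights tie
        if bin(cand).count("1") < bin(best).count("1"):
            best = cand
        i += 1
    return best


if __name__ == "__main__":
    print(min_hamm(5, 7))   # 5: all of 5, 6 have weight 2, 7 has weight 3
    print(min_hamm(5, 9))   # 8: the only value of weight 1 in the interval
```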
Now applying this scheme to the space of allowable (a,b,k) as derived by the rounding mode and set of conditions we can construct a minimal hardware implementation for each of the three rounding schemes:
RTZ Minimal Hardware Implementation when ad−2^k>0
In this case we require:
Now note that the right hand side is strictly decreasing in b. So for any valid a, b and k we can always set b=0 and then the condition will still be met, plus it will cost less hardware to implement. Hence a minimal hardware implementation will have b=0. Thus our condition reduces to:
Given that a must be an integer we have a formula for kopt:
And kopt is the smallest such valid k hence:
Hence a=ceil(2^kopt/d) is valid but a=ceil(2^kopt/d)+1 is not valid. It follows that there is only one valid value for a when k=kopt. We can now state that the design which minimises k and satisfies ad−2^k>0 is unique and is defined by:
RTZ Minimal Hardware Implementation when ad−2^k<0
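As a numerical sanity check on this branch, the smallest k can be found by exhaustive search using the choices stated above, a=ceil(2^k/d) and b=0. This is a brute-force sketch rather than the closed-form expression, and `rtz_positive_branch` is a hypothetical name.

```python
def rtz_positive_branch(d: int, n: int) -> tuple:
    """Search for the smallest k such that a = ceil(2**k / d), b = 0 gives x // d on n-bit x."""
    k = 0
    while True:
        a = -(-(1 << k) // d)   # ceil(2**k / d), so a*d - 2**k > 0 for odd d > 1
        if all((a * x) >> k == x // d for x in range(1 << n)):
            return a, 0, k
        k += 1


if __name__ == "__main__":
    print(rtz_positive_branch(3, 8))   # (171, 0, 9)
```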
In this case we need:
Hence b must necessarily be in the following interval:
b ∈ [(2^k−ad)⌊2^n/d⌋, 2^k+a−ad−1]
This interval must be non empty so:
Given that a must be an integer we have a formula for kopt:
Where kopt is the smallest such valid k hence:
Hence a=floor(2^kopt/d) is valid but a=floor(2^kopt/d)−1 is not valid. It follows that there is only one valid value for a when k=kopt. We can now state that the design which minimises k and satisfies ad−2^k<0 is unique in k and a and is defined by:
Where minHamm(a, b) returns the smallest value among the numbers of smallest Hamming weight found within the interval [a, b].
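The interval for b given earlier in this section suggests a direct search for this branch: take a=floor(2^k/d) and increase k until the interval is non-empty, then pick b by Hamming weight. The sketch below uses a brute-force stand-in for minHamm to stay short; both function names are assumptions.

```python
def min_hamm_bruteforce(lo: int, hi: int) -> int:
    """Smallest value among the integers of smallest Hamming weight in [lo, hi]."""
    return min(range(lo, hi + 1), key=lambda v: (bin(v).count("1"), v))


def rtz_negative_branch(d: int, n: int) -> tuple:
    """Smallest k for which a = floor(2**k / d) leaves a non-empty interval for b."""
    k = 0
    while True:
        a = (1 << k) // d                              # a*d - 2**k < 0 for odd d > 1
        lo = ((1 << k) - a * d) * ((1 << n) // d)      # (2^k - ad) * floor(2^n / d)
        hi = (1 << k) + a - a * d - 1                  # 2^k + a - ad - 1
        if lo <= hi:
            return a, min_hamm_bruteforce(lo, hi), k
        k += 1


if __name__ == "__main__":
    print(rtz_negative_branch(3, 8))   # (85, 85, 8)
```

For n=8 and d=3 this branch reaches k=8, one bit less than the a=171, k=9 triple obtained when ad−2^k>0.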
RTZ Minimal Hardware Design
Summarizing the previous sections we have the following algorithm:
Note that kopt+ is never equal to kopt−; otherwise, if kopt=kopt+=kopt−, then:
Simplifying these two conditions we get:
2((−2^(k−1)) mod d) > (−2^k) mod d
2(2^(k−1) mod d) > 2^k mod d
2(2^(k−1) mod d) > 2^k mod d > 2(2^(k−1) mod d)−d
This is a contradiction as 2k mod d is equal to one of these limits.
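The reason 2^k mod d must coincide with one of the two limits can be spelled out in a short derivation. The symbol r is an auxiliary name introduced here and does not appear in the original text.

```latex
\begin{align*}
r &= 2^{k-1} \bmod d, \qquad 0 \le r < d \\
2^{k} &\equiv 2r \pmod{d}, \qquad 0 \le 2r < 2d \\
2^{k} \bmod d &=
\begin{cases}
2r, & 2r < d \\
2r - d, & 2r \ge d
\end{cases}
\end{align*}
```

So 2^k mod d equals either 2(2^(k−1) mod d) or 2(2^(k−1) mod d)−d, and cannot lie strictly between them.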
RTN Minimal Hardware Implementation when ad−2^k>0
In this case we need:
Hence b must necessarily be in the following interval:
This interval must be non empty so:
Given that a must be an integer we have a formula for kopt:
Where kopt is the smallest such valid k hence:
Hence a=ceil(2^kopt/d) is valid but a=ceil(2^kopt/d)+1 is not valid. It follows that there is only one valid value for a when k=kopt. The design which minimises k and satisfies ad−2^k>0 is unique and is defined by:
Where minHamm(a, b) returns the smallest value among the numbers of smallest Hamming weight found within the interval [a, b].
RTN Minimal Hardware Implementation when ad−2^k<0
In this case we need:
Hence b must necessarily be in the following interval:
This interval must be non empty so:
Given that a must be an integer we have a formula for kopt:
Where kopt is the smallest such valid k hence:
Hence a=floor(2^kopt/d) is valid but a=floor(2^kopt/d)−1 is not valid. It follows that there is only one valid value for a when k=kopt. The design which minimises k and satisfies ad−2^k<0 is unique in k and a and is defined by:
Where minHamm(a, b) returns the smallest value among the numbers of smallest Hamming weight found within the interval [a, b].
RTN Minimal Hardware Design
Summarizing the previous sections results in the following algorithm:
Note that kopt+ is never equal to kopt−; otherwise, if kopt=kopt+=kopt−, then:
Simplifying these two conditions we get:
2((−2^(k−1)) mod d) > (−2^k) mod d
2(2^(k−1) mod d) > 2^k mod d
2(2^(k−1) mod d) > 2^k mod d > 2(2^(k−1) mod d)−d
This is a contradiction as 2k mod d is equal to one of these limits.
FR1 Minimal Hardware Implementation when ad−2^k>0
In this case we require:
⌊2^n/d⌋(ad−2^k) < 2^k−b
Now note that the right hand side is strictly decreasing in b. So for any valid a, b and k we can always set b=0 and then the condition will still be met, plus it will cost less hardware to implement. Hence minimal hardware implementations will have b=0. Thus our condition reduces to:
Given that a must be an integer we have a formula for kopt:
Where kopt is the smallest such valid k hence:
Hence a=ceil(2^kopt/d) is valid but a=ceil(2^kopt/d)+1 is not valid. It follows that there is only one valid value for a when k=kopt. We can now state that the design which minimises k and satisfies ad−2^k>0 is unique and is defined by:
FR1 Minimal Hardware Implementation when ad−2^k<0
In this case we need:
⌊2^n/d⌋(2^k−ad) ≤ b < 2^k
Now b must lie in a non-empty interval so:
Given that a must be an integer we have a formula for kopt:
Where kopt is the smallest such valid k hence:
Hence a=floor(2^kopt/d) is valid but a=floor(2^kopt/d)−1 is not valid. It follows that there is only one valid value for a when k=kopt. We can now state that the design which minimises k and satisfies ad−2^k<0 is unique in k and a and is defined by:
Where minHamm(a, b) returns the smallest value among the numbers of smallest Hamming weight found within the interval [a, b].
FR1 Minimal Hardware Design
Summarising the previous sections we have the following algorithm:
Note that kopt+ is never equal to kopt−; otherwise, if kopt=kopt+=kopt−, then:
Simplifying these two conditions we get:
2((−2^(k−1)) mod d) > (−2^k) mod d
2(2^(k−1) mod d) > 2^k mod d
2(2^(k−1) mod d) > 2^k mod d > 2(2^(k−1) mod d)−d
This is a contradiction as 2k mod d is equal to one of these limits.
Invariant Integer Division Synthesiser
Example structure of a synthesis apparatus according to the disclosure that performs invariant integer division is depicted in
This shows a parameter creation unit 2 which has three inputs n, d and rounding mode. n is the number of bits to be used in the numerator of the division, d is the divisor, and the rounding mode is a selection of one of a plurality of rounding modes. Three examples are given here but others are possible.
The parameter creation unit 2 generates, in dependence on the inputs n, d and rounding mode, the integer triple (a, b, k) required by an RTL generator to generate an appropriate RTL representation of the circuitry for performing the division for the said number of bits n and rounding mode, and for additional conditions provided to the RTL generator. The RTL generator is computer controlled to generate an RTL representation of a division for the integer triple using additional conditions such as ad−2^k<0.
The RTL representation is then output to a synthesis tool 6 which generates the hardware circuits required to implement the division on an appropriate part of an integrated circuit.
The algorithm in the parameter creation unit may be summarised as:
{k, a, b} = (k+ < k−) ? {k+, a+, minHamm(Y+(k+, a+))} : {k−, a−, minHamm(Y−(k−, a−))}
Where
And
An unsigned n bit normalised number x is interpreted as holding the value x/(2^n−1). Multiplication of these numbers thus involves computing the following:
We can apply the previously found results to implementing this design for the three rounding modes. In this case d=2^n−1 and, given that ab≤(2^n−1)^2, 2^n−1 in the previous sections will be replaced by (2^n−1)^2. Substituting these values into the previous sections gives rise to the following for the three rounding modes:
Note that the RTN case gives a generalisation and proof of the formula for such multiplication [2]. Note that the allowable interval for the additive constant is [2^n−1, 2^n+1] in the RTZ case and [2^(n−1)(2^n+1)−2, 2^(n−1)(2^n+1)] in the RTN case, with no freedom for the FR1 case.
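The stated intervals for the additive constant can be checked numerically for n=8. The sketch below assumes a multiplier of 2^n+1 and a shift of 2n. This choice is consistent with the intervals quoted above but is an assumption here, as are all names in the code.

```python
N = 8
D = (1 << N) - 1            # divisor 2^n - 1 = 255
A = (1 << N) + 1            # assumed multiplier 2^n + 1 = 257
K = 2 * N                   # assumed shift 2n = 16
PRODUCTS = range(D * D + 1) # every value a product of two n-bit numbers can take, and those between


def valid_constants(candidates, reference):
    """Additive constants c for which (A*x + c) >> K matches reference(x) over all products."""
    return [c for c in candidates
            if all((A * x + c) >> K == reference(x) for x in PRODUCTS)]


rtz_ok = valid_constants(range(250, 263), lambda x: x // D)
rtn_ok = valid_constants(range(32890, 32901), lambda x: (2 * x + D) // (2 * D))

if __name__ == "__main__":
    print(rtz_ok)   # [255, 256, 257]        i.e. [2^n - 1, 2^n + 1]
    print(rtn_ok)   # [32894, 32895, 32896]  i.e. [2^(n-1)(2^n+1) - 2, 2^(n-1)(2^n+1)]
```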
Alternative Implementations
Further implementations can be realized by those skilled in the art based on the following disclosures, to deal with the following situations:
In summary of the above,
Number | Date | Country | Kind |
---|---|---|---|
1117318.4 | Oct 2011 | GB | national |
Number | Name | Date | Kind |
---|---|---|---|
9753693 | Rose | Sep 2017 | B2 |
20050289209 | Robison | Dec 2005 | A1 |
20060095486 | Ferguson | May 2006 | A1 |
20060095494 | Kumar | May 2006 | A1 |
20080295056 | Wen et al. | Nov 2008 | A1 |
Number | Date | Country |
---|---|---|
0821303 | Jan 1998 | EP |
Entry |
---|
D. J. Magenheimer, L. Peters, K. Pettis, D. Zuras, “Integer multiplication and division on the HP precision architecture”, IEEE Trans. Comput., vol. 37, pp. 980-990, 1988. |
Blinn, “Three Wrongs Make a Right,” IEEE Computer Graphics and Applications, Nov. 1995, pp. 90-93. |
Ugurdag et al, “Hardware Division by Small Integer Constants,” in IEEE Transactions on Computers, vol. 66, No. 12, pp. 2097-2110, Dec. 1, 2017. |
Lee et al, “Accuracy-Guaranteed Bit-Width Optimization,” Oct. 2006, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 25, No. 10, pp. 1900-2000. |
Robison, “N-Bit Unsigned Division Via N-Bit Multiply-Add,” 2005, 17th IEEE Symposium on Computer Arithmetic, pp. 1-9. |
Optimizing Integer Division by a constant Divisor Grappel Dr. Dobbs drdobbs.comlparallel/184408499. |
Correctly Rounded Constant Integer Divison via Multiply-Add cas.ee.ic.ac.uk/people/gac1/pubs/theoiscas12.pdf (Drane). |
Integer Multiplication and Division on the HP Precision Architecture Magenheimer IEEE Transactions on Computers vol. 37 No. 8, Aug. 1988. |
N Bit unsigned division via n-bit multiply-add Robison available on http://arith.polito.it.final/paper-104.pdf. |
Number | Date | Country | |
---|---|---|---|
20180239585 A1 | Aug 2018 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 13626886 | Sep 2012 | US |
Child | 15898455 | US |