Division is one of the fundamental arithmetic operations performed in microprocessors, digital signal processors, and other types of processors. By way of example, such processors may be configured to perform integer division as well as floating-point division. Integer division typically takes more clock cycles to perform than floating-point division, including double precision floating-point division. Furthermore, the number of clock cycles required for integer division can vary depending on the operand values.
As a result, the power consumption associated with performance of integer division operations using conventional circuitry may in some cases be excessive and unpredictable. This can lead to a variety of related issues in the corresponding processors, as well as the computers, mobile telephones and other processing devices in which such processors are incorporated, including reduced battery life, power dissipation that approaches package thermal limits, and power supply regulator performance degradation.
One or more illustrative embodiments of the invention provide improved divider circuitry in which power consumption is reduced by performing quotient prediction based on estimated partial remainders. Such divider circuitry can be implemented, by way of example, in an arithmetic logic unit (ALU) of a microprocessor or other type of processor, or in read channel circuitry of a storage device.
In one embodiment of the invention, an integrated circuit comprises divider circuitry configured to perform a division operation. The divider circuitry iteratively determines bits of a quotient over multiple stages of computation. The divider circuitry is configured to estimate a partial remainder for a given one of the stages and to predict one or more of the quotient bits for one or more subsequent stages based on the estimated partial remainder so as to allow one or more computations to be skipped for said one or more subsequent stages. This reduces the power consumption in the divider circuitry relative to that which would otherwise be required if the computations were not skipped. The integrated circuit may be incorporated into a computer, a mobile telephone, a storage device or other type of processing device.
Embodiments of the invention will be illustrated herein in conjunction with exemplary data processing systems and associated divider circuitry and division algorithms. It should be understood, however, that embodiments of the invention are more generally applicable to any circuitry-implemented arithmetic operations that include division, such as integer division and floating point division, as well as related arithmetic operations such as computation of square roots and cube roots.
The ALU 110 further comprises divider circuitry 125 configured to perform division operations. As will be described in greater detail below, the divider circuitry 125 in this embodiment iteratively determines bits of a quotient over multiple stages of computation, and is configured to estimate a partial remainder for a given one of the stages and to predict one or more of the quotient bits for one or more subsequent stages based on the estimated partial remainder so as to allow one or more computations to be skipped for the one or more subsequent stages.
The particular configuration of data processing system 100 as shown in
Also, the divider circuitry 125 can be implemented in a wide variety of different types of data processing systems. Another embodiment of such a system, comprising a data storage device that incorporates divider circuitry 125, will be described in greater detail below in conjunction with
The divider circuitry 125 in one embodiment is configured to implement an integer division algorithm. The algorithm may be viewed as an example of what is more generally referred to herein as a modified non-restoring integer division algorithm. It allows division operations to be performed by the divider circuitry 125 at lower power consumption relative to a non-restoring integer division algorithm without the modification. More particularly, the modified algorithm utilizes an estimate of the partial remainder at a given stage in the computation process to predict the quotient bits for a certain number of subsequent stages, thereby allowing computations to be skipped for those stages such that power consumption is reduced. Such an arrangement also serves to speed up the computation process for division operations.
In this embodiment, a division operation may be characterized as
$$N = Q \cdot D + R \qquad (1)$$
where N is the dividend, Q is the quotient, D is the divisor and R is the remainder. An exemplary non-restoring division algorithm without the above-noted modification for skipping of stages may comprise the following process in which bits of a quotient are iteratively determined over multiple stages of computation:
Initialization: Quotient Q0=0 and partial remainder R0=2−N+1−D.
Computation: For 0≦i<n recursively compute
In the above process, at least one addition or subtraction operation is required at each stage of the iterative computation in order to update the partial remainder and predict a single bit of the quotient. However, we have determined that if at any stage the magnitude of the partial remainder is small compared with the divisor, then we can predict the quotient bits and the partial remainder for several subsequent stages at once.
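The following Python sketch makes the cost of the unmodified process concrete: it counts one add/subtract per stage. The recursion itself is not reproduced in this text, so the sketch makes its own assumptions. It uses a fractional formulation in which the dividend N is preloaded as the initial partial remainder with 0 ≤ N < D. It applies the update rules used in equations (5) and (7) below: for R ≥ 0, quotient digit +1 and R ← 2R − D; for R < 0, digit −1 and R ← 2R + D. Finally, it converts the ±1 digits to a binary quotient and applies a one-step remainder correction. The function name nonrestoring_divide and its return convention are hypothetical.

```python
def nonrestoring_divide(N: int, D: int, n: int) -> tuple:
    """Plain non-restoring division, one add/subtract per stage (sketch).

    Assumes 0 <= N < D, with N preloaded as the initial partial remainder.
    Returns (Q, R, ops) such that (N << n) == Q * D + R and 0 <= R < D;
    ops counts the add/subtract operations performed.
    """
    assert 0 <= N < D and n >= 1
    R = N        # assumed initialization: R0 = N
    B = 0        # quotient digits as bits: 1 means +1, 0 means -1
    ops = 0
    for _ in range(n):
        if R >= 0:
            B = 2 * B + 1
            R = 2 * R - D    # digit +1
        else:
            B = 2 * B
            R = 2 * R + D    # digit -1
        ops += 1             # every stage needs the adder
    # Digits in {+1, -1} to binary: sum of (2b - 1) * 2**j = 2B - (2**n - 1)
    Q = 2 * B - ((1 << n) - 1)
    if R < 0:                # final correction so that 0 <= R < D
        Q -= 1
        R += D
        ops += 1
    return Q, R, ops
```

For instance, nonrestoring_divide(1, 4, 3) returns (2, 0, 4). This gives 1 · 2^3 = 2 · 4 + 0, computed with three stage operations plus the final correction.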
The divider circuitry 125 in the present embodiment may therefore be configured to determine, for a given stage of the computation, a particular number of subsequent stages for which computations may be skipped, as a function of an estimated partial remainder relative to a divisor. The partial remainder is estimated in this embodiment based on one or more of its most significant bits, which provides additional simplification relative to utilizing the entire partial remainder itself.
By way of example, assume that the partial remainder for stage i is denoted $R_i$, and let t denote the number of leading bits of $R_i$ that are identical, that is, that have the same logic value. We have determined that if $t \ge 2$, then $t-1$ bits of the quotient can be predicted by the divider circuitry 125. In such an arrangement, if the t leading bits of $R_i$ are 0, the $t-1$ predicted quotient bits are a 1 followed by $t-2$ 0s, and if the t leading bits of $R_i$ are 1, the $t-1$ predicted quotient bits are a 0 followed by $t-2$ 1s. Thus, if the leading bits of the partial remainder $R_i$ are 0, the predicted quotient bits are 100 . . . , and if the leading bits are 1, the predicted quotient bits are 011 . . . .
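Here is a minimal Python sketch of this rule. It assumes that the partial remainder is held as an (n+1)-bit two's-complement value whose most significant bit R[n] is the sign bit. The helper names leading_identical_bits and predicted_quotient_bits are illustrative and do not come from the source.

```python
def leading_identical_bits(r: int, width: int) -> int:
    """Count t: leading bits of r's width-bit two's-complement pattern that equal its sign bit."""
    u = r & ((1 << width) - 1)          # two's-complement bit pattern of r
    sign = (u >> (width - 1)) & 1       # R[n], the most significant bit
    t = 0
    for pos in range(width - 1, -1, -1):
        if ((u >> pos) & 1) != sign:
            break
        t += 1
    return t


def predicted_quotient_bits(r: int, width: int) -> str:
    """Return the t-1 quotient bits predicted from partial remainder r."""
    t = leading_identical_bits(r, width)
    if t < 2:
        return ""                        # t - 1 = 0: nothing to predict
    if r >= 0:
        return "1" + "0" * (t - 2)       # leading 0s: a 1 followed by 0s
    return "0" + "1" * (t - 2)           # leading 1s: a 0 followed by 1s
```

With n = 5 (a 6-bit remainder), $R_i = 5$ is 000101, so t = 3 and the prediction is 10. Likewise, $R_i = -3$ is 111101, so t = 4 and the prediction is 011.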
The correctness of this prediction can be shown as follows, using Case 1 for positive partial remainders and Case 2 for negative partial remainders.
Case 1: $R_i \ge 0$
Since $R_i$ has $t$ leading 0 bits, $0 \le 2^{t-1}R_i < 2^n$, and since $D$ is normalized, $D \ge 2^{n-1}$, that is, $2^n \le 2D$. Accordingly,
$$0 \le 2^{t-1}R_i < 2D \qquad (4)$$
The predicted $(t-1)$-bit quotient string 100 . . . is correct if and only if the resulting $R_{i+t-1}$ satisfies $-D \le R_{i+t-1} < D$. It is apparent that
$$R_{i+t-1} = 2^{t-1}R_i - D \qquad (5)$$
Subtracting $D$ from each part of inequality (4) gives $-D \le R_{i+t-1} < D$, which is the required bound on $R_{i+t-1}$.
Case 2: $R_i < 0$
In this case, $R_i$ has $t$ leading 1 bits, giving $-2^n \le 2^{t-1}R_i < 0$. Combining this with the bound on $D$ gives
$$-2D \le 2^{t-1}R_i < 0 \qquad (6)$$
The predicted $(t-1)$-bit quotient string 011 . . . is correct if and only if the resulting $R_{i+t-1}$ satisfies $-D \le R_{i+t-1} < D$. Now,
$$R_{i+t-1} = 2^{t-1}R_i + D \qquad (7)$$
Adding $D$ to each part of inequality (6) gives $-D \le R_{i+t-1} < D$, which is the required bound on $R_{i+t-1}$.
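The two cases can also be checked exhaustively for small word sizes. The following Python sketch is a brute-force check under the same assumptions as the proof: D is normalized, the partial remainder is held in n+1 bits, and $-D \le R_i < D$. It further assumes the stage-by-stage update R ← 2R − D for R ≥ 0 (quotient bit 1) and R ← 2R + D for R < 0 (quotient bit 0), which is the update implied by equations (5) and (7). For each $R_i$ it runs t − 1 ordinary stages and compares the result with the predicted bits and the closed forms (5) and (7). The function names are hypothetical.

```python
def leading_identical_bits(r: int, width: int) -> int:
    """Leading bits of r's width-bit two's-complement pattern equal to its sign bit."""
    u = r & ((1 << width) - 1)
    sign = (u >> (width - 1)) & 1
    t = 0
    for pos in range(width - 1, -1, -1):
        if ((u >> pos) & 1) != sign:
            break
        t += 1
    return t


def check_prediction(n: int) -> bool:
    """Exhaustively check equations (4)-(7) for word size n."""
    width = n + 1
    for D in range(1 << (n - 1), 1 << n):          # normalized: 2**(n-1) <= D < 2**n
        for r_i in range(-D, D):
            t = leading_identical_bits(r_i, width)
            if t < 2:
                continue
            r, bits = r_i, ""
            for _ in range(t - 1):                 # ordinary stage-by-stage recursion
                if r >= 0:
                    bits, r = bits + "1", 2 * r - D
                else:
                    bits, r = bits + "0", 2 * r + D
            if r_i >= 0:                           # Case 1, equation (5)
                expected_bits, closed = "1" + "0" * (t - 2), (r_i << (t - 1)) - D
            else:                                  # Case 2, equation (7)
                expected_bits, closed = "0" + "1" * (t - 2), (r_i << (t - 1)) + D
            if bits != expected_bits or r != closed or not (-D <= r < D):
                return False
    return True
```

For example, with n = 3, D = 4 and $R_i = -4$ (so t = 2), one stage gives $R_{i+1} = -8 + 4 = -4 = -D$, so the lower end of the bound is actually attained.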
Referring now to
The modified non-restoring integer division algorithm in this embodiment is started in block 200 and proceeds to initialization in block 202. For purposes of the
A determination is then made in block 204 as to whether R is positive or negative. If R is less than zero, the algorithm proceeds down the left branch to block 206, and otherwise the algorithm proceeds down the right branch to block 212.
As indicated previously, the partial remainder for a given stage is estimated from its two most significant bits, denoted R[n] and R[n−1] in blocks 206 and 212. An additional bit, denoted R[n−2-step] in those blocks, is also used; it is selected depending on the current step value.
If R<0 and it is determined in block 206 that all of the bits R[n], R[n−1] and R[n−2-step] are equal, the process moves to block 208, in which Q is predicted as 2Q+1 and a skip counter k is incremented by 1, but the partial remainder R is unchanged. Otherwise, the process moves to block 210, in which Q is predicted as 2Q+1, the partial remainder R is updated based on the skip counter k to $2^kR-D$, and then the skip counter k is reset to 1. After either block 208 or block 210, the step value is incremented in block 218. If the incremented step value is determined to be equal to n in block 220, the algorithm ends as indicated in block 222, and otherwise returns to block 204 as indicated.
Thus, if the determination in block 206 is yes, the quotient is updated without making any changes to the partial remainder and the algorithm proceeds back to block 204 and then to block 206 to evaluate the next set of bits R[n], R[n−1] and R[n−2-step] at the new step value for the updated quotient. No further quotient prediction is possible when any one of these bits does not match the others, in which case block 210 is executed.
The operation of the right branch of the algorithm is similar to that of the left branch as described above, starting with block 212. Accordingly, if R≧0 and it is determined in block 212 that all of the bits R[n], R[n−1] and R[n−2-step] are equal, the process moves to block 214, in which Q is predicted as 2Q and the skip counter k is incremented by 1, but the partial remainder R is unchanged. Otherwise, the process moves to block 216, in which Q is predicted as 2Q, the partial remainder R is updated based on the skip counter k to $2^kR+D$, and then the skip counter k is reset to 1. After either block 214 or block 216, the step value is incremented in block 218. If the incremented step value is determined to be equal to n in block 220, the algorithm ends as indicated in block 222, and otherwise returns to block 204 as indicated.
Thus, if the determination in block 212 is yes, the quotient is updated without making any changes to the partial remainder and the algorithm proceeds back to block 204 and then to block 212 to evaluate the next set of bits R[n], R[n−1] and R[n−2-step] at the new step value for the updated quotient. No further quotient prediction is possible when any one of these bits does not match the others, in which case block 216 is executed.
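The following Python sketch illustrates how skipping reduces the number of adder operations. It is not a literal transcription of blocks 200 to 222, and it makes three assumptions of its own. First, it follows the sign convention of equations (5) and (7): for R ≥ 0 the group of quotient bits is 100 . . . and R ← $2^kR - D$; for R < 0 the group is 011 . . . and R ← $2^kR + D$. Second, it computes the number of leading identical bits directly instead of examining one bit per step. Third, it uses a fractional setup (dividend preloaded as the partial remainder, 0 ≤ N < D) with a final remainder correction. The group size k plays the role of the skip counter at the moment the partial remainder is updated. The name skipping_divide is hypothetical.

```python
def leading_identical_bits(r: int, width: int) -> int:
    """Leading bits of r's width-bit two's-complement pattern equal to its sign bit."""
    u = r & ((1 << width) - 1)
    sign = (u >> (width - 1)) & 1
    t = 0
    for pos in range(width - 1, -1, -1):
        if ((u >> pos) & 1) != sign:
            break
        t += 1
    return t


def skipping_divide(N: int, D: int, n: int) -> tuple:
    """Non-restoring division with quotient prediction (illustrative sketch).

    Assumes a normalized divisor 2**(n-1) <= D < 2**n, 0 <= N < D, and an
    (n+1)-bit two's-complement partial remainder. Returns (Q, R, ops) with
    (N << n) == Q * D + R and 0 <= R < D; ops counts add/subtract operations.
    """
    assert n >= 1 and (1 << (n - 1)) <= D < (1 << n) and 0 <= N < D
    width = n + 1
    R, B, ops, step = N, 0, 0, 0
    while step < n:
        t = leading_identical_bits(R, width)
        # Resolve k stages with one add/subtract: k = t-1 predicted bits,
        # at least one stage, and never past the last stage.
        k = max(1, min(t - 1, n - step))
        if R >= 0:
            B = (B << k) | (1 << (k - 1))        # digits 1 0 ... 0
            R = (R << k) - D                     # closed form (5)
        else:
            B = (B << k) | ((1 << (k - 1)) - 1)  # digits 0 1 ... 1
            R = (R << k) + D                     # closed form (7)
        ops += 1
        step += k
    Q = 2 * B - ((1 << n) - 1)    # digits in {+1, -1} to binary
    if R < 0:                     # final correction so that 0 <= R < D
        Q -= 1
        R += D
        ops += 1
    return Q, R, ops
```

For example, skipping_divide(1, 4, 3) returns (2, 0, 3). The first two stages are resolved with a single subtraction because R = 0001 has three leading 0 bits, so three add/subtract operations are used in total, counting the final correction.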
It should be noted that the particular division algorithm shown in
The exemplary modified non-restoring integer division algorithm described above in the context of
With continued reference to
The first multiplexer 306 is configured to select as the value m3 a particular one of a plurality of lower order bit outputs of the remainder register 302 responsive to a first count signal comprising bits c1 and c0 from the first counter 310. The second multiplexer 308 is configured to select a particular shifted version of outputs of the remainder register 302, also responsive to the first count signal from the first counter 310.
The first counter 310 in this embodiment is implemented as a Gray code counter that tracks the number of consecutive stages for which at least one addition or subtraction operation is skipped. As noted above, such skipping of computations in the present embodiment occurs when the estimated partial remainder is small relative to the divisor.
The second counter 312 is a step counter that keeps track of the current step of the division algorithm. The step value that it generates is utilized to select a particular bit position of the quotient register 304 for updating to a predicted value in a corresponding one of the computation stages.
Also included in the divider circuitry 125 is an exclusive-or (XOR) gate 314 having a first input adapted to receive the divisor D and a second input coupled to the most significant bit output m1 of the remainder register 302 via an inverter 315. The second input of the XOR gate 314 therefore receives the complement of m1. The output of the XOR gate 314 is provided to one input of an n+1 bit adder 316. The other input of the adder 316 is coupled to an output of the second multiplexer 308. The output of the adder 316 provides the current partial remainder for storage in the remainder register 302.
The quotient register 304 comprises, for each of the n bits of the quotient, a corresponding AND gate 317 and flip-flop 318. Each AND gate 317 has one of its inputs driven by a corresponding one of the bits step[0], step[1], step[2], . . . , step[n−1] of the step value provided by the step counter 312, and its other input driven by a clock signal. The output of each AND gate 317 is provided to a clock input of the corresponding flip-flop 318.
The quotient register 304 in the present embodiment is therefore configured to include a set of individual flip-flops 318 each controlled by a signal from the step counter 312. The step counter sequence is configured such that it produces one logic transition at every step of the division algorithm. The step value output bits of the step counter 312 are used as gating signals for the quotient register 304 via respective ones of the AND gates 317. This ensures that only the individual quotient bit predicted for a particular step is updated in the quotient register 304 for that step. The update input signal is denoted q_new, and is applied to data inputs of each of the flip-flops 318. The quotient register can be reset by an applied reset signal, as indicated in the figure.
A transparent latch 320 is coupled between an output of the first counter 310 and a select line input of the second multiplexer 308. The transparent latch is controlled by a remainder register enable signal r_en. This signal r_en also gates the clock signal applied to the remainder register 302, via AND gate 322. As a result, the remainder register 302 is clocked only when the signal r_en is enabled.
Updating of the selected bit position of the quotient Q is performed in quotient register 304 as a function of the first count signal from the first counter and the most significant bit output m1 of the remainder register 302. More particularly, the output m1 is applied to one input of an XOR gate 324 that has an output driving the data inputs of the flip-flops 318. The two bits c1 and c0 of the counter 310 are applied to inputs of an OR gate 326 and the resulting output drives the other input of the XOR gate 324. The clock signal applied to counter 310 is gated by an AND gate 328 based on a counter enable signal c_en. The counter 310 is reset using a counter reset signal c_reset.
The partial remainder bits R[n], R[n−1] and R[n−2-step], corresponding to respective signals m1, m2 and m3, are utilized in the magnitude comparison process in the following manner. First, if m1=m2=m3=1, then the magnitude of R is small compared to D, and at least one more bit of the quotient can be predicted in addition to the current predicted bit. When this condition occurs, the skip counter 310 increments by one, and one bit of the quotient register 304 is updated. The contents of the remainder register 302 remain unaltered, because the r_en signal is disabled.
If the remainder register were to be updated in every step, then we would have to examine the same bit positions to see if there is any opportunity to skip any updates to the partial remainder. However, since the contents of the remainder register 302 remain unchanged when the divider circuitry is in a prediction sequence, we need to examine the bit positions R[i-1], R[i-2], R[i-3] if we examined R[i], R[i-1], R[i-2] in the previous step of the algorithm. Accordingly, we need to examine only a single bit to the right of the last bit that was examined in the previous step. This is accomplished by the first multiplexer 306 which generates the signal m3. The first multiplexer 306 taps the bit position R[n−2] when the divider circuitry is in a non-prediction sequence, and taps successive bit positions to the right for successive steps when the divider circuitry is in a prediction sequence. The bit position to tap is controlled by the current count of the skip counter 310.
The second multiplexer 308 at the input of the adder 316 prepares the appropriate operands that need to be fed to the adder during either a non-prediction sequence or at the end of a prediction sequence. Since operands are generated only at these times, the current count of the skip counter 310 is passed through the transparent latch 320 controlled by the same r_en signal that gates the clocking of the remainder register 302.
The second multiplexer 308 is configured to perform a selection that is equivalent to a designated bit shifting operation. For example, the right-most input of the multiplexer 308 corresponds to a left shift of one bit position, which is equivalent to multiplication by two. Similarly, the next adjacent input of multiplexer 308 corresponds to a left shift of two bit positions, which is equivalent to multiplication by four. No actual shift is implemented, but bit positions corresponding to the shift are selected. Thus, for example, if R=011001, then the left shift of R is 110010, which can be achieved by ignoring the most significant bit and selecting all of the bits to the right of the most significant bit. For two left shifts, we ignore the two most significant bits and pick all other bits to the right of the two most significant bits. Accordingly, the multiplexer 308 implements a bit selection process that is equivalent to corresponding left shifts of R. Because a Gray code is used for the skip counter in this embodiment, the inputs of the multiplexer 308 are arranged in Gray code sequential order, such that 1-bit, 2-bit, 3-bit and 4-bit shifts are selected given mux select input bits of 00, 01, 11 and 10, respectively, supplied from counter 310 via latch 320.
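The following Python sketch illustrates two points: choosing bits in place of a shifter, and the Gray-coded select ordering. The table GRAY_SELECT_TO_SHIFT follows the ordering given above, in which 00, 01, 11 and 10 select 1-, 2-, 3- and 4-bit shifts. Representing the register as a bit string and filling vacated low-order positions with 0 are assumptions made for illustration.

```python
# Gray-coded select value from the skip counter -> equivalent left shift,
# using the ordering given in the text: 00, 01, 11, 10 -> 1, 2, 3, 4 bits.
GRAY_SELECT_TO_SHIFT = {0b00: 1, 0b01: 2, 0b11: 3, 0b10: 4}


def gray_code(i: int) -> int:
    """Binary-reflected Gray code of i (00, 01, 11, 10 for i = 0..3)."""
    return i ^ (i >> 1)


def select_shifted(r_bits: str, sel: int) -> str:
    """Emulate multiplexer 308: choose bits instead of shifting them.

    r_bits is the register contents as a bit string, MSB first. Dropping the
    `shift` most significant bits and keeping the rest is the wiring
    equivalent of a left shift; vacated low-order positions are assumed 0.
    """
    shift = GRAY_SELECT_TO_SHIFT[sel]
    return r_bits[shift:] + "0" * shift


# As the skip count advances, the Gray counter changes one select bit at a time.
for count in range(4):
    sel = gray_code(count)
    print(f"count={count} sel={sel:02b} -> {select_shifted('011001', sel)}")
```

With R = 011001, successive counts select 110010, 100100, 001000 and 010000. Because consecutive Gray codes differ in a single bit, only one select line changes as the skip count advances.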
Additional circuitry utilized to generate signals in the divider circuitry 125 is shown in
The circuitry of
The circuitry of
The circuitry of
The additional circuitry of
S1=R[n]R[n−2-step] (8)
S2=c0c1 (9)
S3=c0c1 (10)
S4=S1S2 (11)
c_reset=S3S4 (12)
S5=R[n]R[n−1]R[n−2-step] (13)
S7=
S8=c0c1 (16)
S9=S5S6 (17)
S10=S7S8 (18)
count_inc=S9S10 (19)
S11=c0c1 (20)
S12=S11count_inc (21)
r_en=S12c_resetstart_div (22)
The divider circuitry 125 as illustrated in the
It is to be appreciated that the particular divider circuitry shown in
For example, in another embodiment, the multiplexers 306 and 308 can be implemented as respective two-input multiplexers, and yet computations can still be skipped for any number of stages. Such an embodiment may be particularly desirable if the maximum number of stages for which computations can be skipped is large, but use of large multiplexers is not appropriate for the corresponding application.
In this embodiment, one of the inputs to the two-input version of multiplexer 308 will be the value of R left shifted by one position, and the other input will be the value of a new register referred to as R′. In blocks 210 and 216 in
As indicated previously, divider circuitry 125 can be implemented in a wide variety of different types of data processing systems. Another embodiment of such a system is the data processing system 400 shown in
It should be noted that the term “divider circuitry” as used herein is intended to be generally construed so as to encompass processor circuitry that implements division operations at least in part in the form of software that is executed in the processor. For example, at least a portion of the division algorithm of
As indicated above, embodiments of the invention may be implemented in the form of integrated circuits. In a given such integrated circuit implementation, identical die are typically formed in a repeated pattern on a surface of a semiconductor wafer. Each die includes divider circuitry as described herein, and may include other structures or circuits. The individual die are cut or diced from the wafer, then packaged as an integrated circuit. One skilled in the art would know how to dice wafers and package die to produce integrated circuits. Integrated circuits so manufactured are considered part of this invention.
Again, it should be emphasized that the embodiments of the invention as described herein are intended to be illustrative only. For example, other embodiments of the invention can be implemented using a wide variety of types of divider circuitry and associated division algorithms other than those included in the embodiments described herein. These and numerous other alternative embodiments within the scope of the following claims will be readily apparent to those skilled in the art.