The invention relates to a digital logic circuit, and in particular relates to a digital logic circuit that can be used to efficiently calculate the reciprocal square root of a given value.
There are many known methods or techniques used to calculate the reciprocal square root of a given value, A. For completeness, the reciprocal square root of A is written
$$\frac{1}{\sqrt{A}} \qquad (1)$$
One of these methods applies Newton's method to the equation
$$f(x) = \frac{1}{x^{2}} - A = 0 \qquad (2)$$
After a final iteration, x will be approximately equal to the reciprocal square root of A.
According to the Newton method, in each iteration the following expression is evaluated
$$x_{n+1} = \frac{x_n}{2}\left(3 - A x_n^{2}\right) \qquad (3)$$
After the final iteration has been completed, the square root of A can be calculated, if required, by multiplying the result by A.
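The iteration described above can be illustrated in floating point. This is only a minimal sketch: the function name `rsqrt_newton`, the starting value and the iteration count are assumptions chosen for illustration and do not come from the specification.

```python
def rsqrt_newton(a: float, x0: float, iterations: int = 6) -> float:
    """Estimate 1/sqrt(a) using x_{n+1} = (x_n / 2) * (3 - a * x_n**2)."""
    x = x0
    for _ in range(iterations):
        x = 0.5 * x * (3.0 - a * x * x)
    return x


# Illustrative values only: A = 2 with a starting estimate of 0.5.
estimate = rsqrt_newton(2.0, x0=0.5)

# The square root follows by multiplying the result by A.
root = 2.0 * estimate
```

The starting value must be reasonably close to the true result for the iteration to converge, which is why the choice of initial value matters.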
However, the conventional techniques used to implement this method in a digital logic circuit are relatively inefficient.
According to a first aspect of the invention, there is provided a digital logic circuit for calculating an estimate of the reciprocal square root of a number A using an iterative process, the digital logic circuit comprising circuitry for setting the value of A to have a fixed-point number format QW.Y, where W and Y are positive integers; and circuitry for setting the output value of the iterative process, the estimate of the reciprocal square root of the number A, to have the fixed-point number format Q(Y/2).(W+Y/2).
According to a second aspect of the invention, there is provided a digital logic circuit for calculating an estimate of the reciprocal square root of a number A using an iterative process, the digital logic circuit comprising circuitry for generating a new iteration $x_{n+1}$ of the reciprocal square root of A from the previous iteration $x_n$ by (i) multiplying the previous iteration $x_n$ by the number A; (ii) multiplying the result of (i) by the previous iteration $x_n$; (iii) subtracting the result of (ii) from 3; and (iv) multiplying the result of (iii) by half of the previous iteration $x_n$.
According to a third aspect of the invention, there is provided a calculator unit for determining an initial value for use in an iterative process for calculating an estimate of the reciprocal square root of a number A, the calculator unit comprising circuitry for (a) rounding the number A to the nearest number of the form $2^J$, where J is an integer; (b) if J is odd, rounding J up to the nearest even number to give J′; (c) if J is even, setting J′ equal to J; and (d) calculating $2^{-(J'/2)}$ to determine the initial value for the reciprocal square root of A.
According to a fourth aspect of the invention, there is provided a digital logic circuit comprising circuitry for determining when to stop an iterative process used to calculate an estimate of the reciprocal square root of a number A, the circuitry determining the result of $A \cdot (x_n)^2$, where $x_n$ is the value of the current iteration; and stopping the iterative process if the result is sufficiently close to 1.
According to a fifth aspect of the invention, there is provided a method of calculating an estimate of the reciprocal square root of a number A using an iterative process, the method comprising setting the value of A to have a fixed-point number format QW.Y, where W and Y are positive integers; and setting the output value of the iterative process, the estimate of the reciprocal square root of the number A, to have the fixed-point number format Q(Y/2).(W+Y/2).
The invention will now be described, by way of example only, with reference to the following drawings, in which:
The invention relates to four areas in which the prior art methods of calculating reciprocal square roots have been improved.
The first area to be described will be the derivation of the number format used to represent the output of the iterative method.
Assume that the numbers are represented in the standard fixed-point number format QW.Y, where W is the number of integer bits (for example 6) and Y is the number of fractional bits (for example 26). Thus, the largest number that can be represented is approximately $2^W$ for an unsigned number, and the smallest number that can be represented is $2^{-Y}$. The table in
To ensure that the output does not overflow (its largest value is $2^{Y/2}$, the reciprocal square root of the smallest input $2^{-Y}$) while providing the maximum possible resolution without any increase in the total number of bits in the calculation, in one embodiment of the invention V, the number of fractional bits in the output, is set to W+Y/2, which means that the output format is Q(Y/2).(W+Y/2).
So, if the input format is Q6.26 described above, then the output format will be Q13.19.
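The format rule above can be checked with a few lines of code. This is a sketch only; the helper name `output_format` is hypothetical, and it assumes Y is even so that Y/2 is a whole number of bits.

```python
from typing import Tuple


def output_format(w: int, y: int) -> Tuple[int, int]:
    """Return (integer bits, fractional bits) of the reciprocal square root
    output for an input in QW.Y, following the rule Q(Y/2).(W+Y/2)."""
    if y % 2:
        # Assumption for this sketch: Y is even.
        raise ValueError("Y must be even for Y/2 to be an integer")
    return y // 2, w + y // 2


int_bits, frac_bits = output_format(6, 26)  # Q6.26 input gives Q13.19 output
```

Note that the total width is unchanged: Y/2 + (W + Y/2) = W + Y bits.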
If the square root of A is desired instead of the reciprocal square root of A, the number format for the product $A \cdot \frac{1}{\sqrt{A}}$ will be Q(6+13).(26+19), which is Q19.45.
To restore the original number format, the number needs to be shifted to the right by W+Y/2 bits, and the top Y/2 bits need to be truncated. This gives
(Q19.45)>>(19)=Q19.26
trunc(Q19.26)=Q6.26
Although this involves truncating the integer bits, there is no chance of overflow, as the square root of a number that has the format QW.Y is Q(W/2).V.
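The shift-and-truncate step can be sketched with plain integers standing in for the fixed-point words. The helper names `to_fixed` and `sqrt_from_rsqrt` are hypothetical, and the Q6.26 and Q13.19 formats are the example values from the text rather than a requirement.

```python
W, Y = 6, 26          # input format Q6.26 (example values from the text)
V = W + Y // 2        # 19 fractional bits in the Q13.19 output


def to_fixed(value: float, frac_bits: int) -> int:
    """Convert a real value to a raw fixed-point integer (illustrative helper)."""
    return int(round(value * (1 << frac_bits)))


def sqrt_from_rsqrt(a_raw: int, x_raw: int) -> int:
    """Multiply A (Q6.26) by its reciprocal square root (Q13.19) and
    restore the original Q6.26 format."""
    product = a_raw * x_raw                # Q19.45
    shifted = product >> V                 # shift right by W+Y/2 bits: Q19.26
    return shifted & ((1 << (W + Y)) - 1)  # drop the top Y/2 integer bits: Q6.26


# A = 16 with an exact reciprocal square root of 0.25 gives a square root of 4.
root_raw = sqrt_from_rsqrt(to_fixed(16.0, Y), to_fixed(0.25, V))
root = root_raw / (1 << Y)
```

Masking the upper bits is safe here for the reason given above: the square root of a QW.Y number never needs more than W/2 integer bits.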
The second area in which the invention improves over the prior art is in the order in which the elements in equation (3) are carried out by a multiplier block.
In contrast, in the preferred method (column 2), the current value $x_n$ is multiplied by A in the first stage, the resulting value is multiplied by $x_n$ in the second stage, the result of the second stage is subtracted from 3 in the third stage, and in the fourth stage the value of the next iteration, $x_{n+1}$, is determined by multiplying the output of the third stage by the current value $x_n$ and dividing by 2. A consequence of the preferred order of the first and second stages is that the output of the first stage has the number format QW.Y (since the output of this stage is approximately the square root of A). Again, the output of the second and subsequent stages would be in the format Q(Y/2).(W+Y/2).
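The four stages of the preferred order can be sketched in integer arithmetic. This is an illustration under stated assumptions, not the listing referred to in the specification: the function name `rsqrt_step`, the use of truncating right shifts after each multiplication, and the Q6.26 and Q13.19 formats are choices made for this example.

```python
W, Y = 6, 26              # A is held in Q6.26 (example format)
X_FRAC = W + Y // 2       # x_n is held in Q13.19, so 19 fractional bits


def rsqrt_step(a_raw: int, x_raw: int) -> int:
    """One iteration in the preferred order, on raw fixed-point integers."""
    # Stage 1: t1 = A * x_n, kept in QW.Y (approximately sqrt(A)).
    t1 = (a_raw * x_raw) >> X_FRAC
    # Stage 2: t2 = t1 * x_n, kept in Q(Y/2).(W+Y/2).
    t2 = (t1 * x_raw) >> Y
    # Stage 3: t3 = 3 - t2, with the constant 3 in the same format as t2.
    t3 = (3 << X_FRAC) - t2
    # Stage 4: x_{n+1} = t3 * x_n / 2; the extra shift performs the halving.
    return (t3 * x_raw) >> (X_FRAC + 1)


# Illustrative run: A = 2.0 starting from x_0 = 0.5.
a_raw = 2 << Y
x_raw = 1 << (X_FRAC - 1)
for _ in range(6):
    x_raw = rsqrt_step(a_raw, x_raw)
estimate = x_raw / (1 << X_FRAC)
```

Because the first product is close to the square root of A, it fits the input format without any additional integer bits, which is the point of multiplying by A first.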
Thus, at lines 2 and 3, the number of integer and fractional bits in the input are set, and at lines 6 and 7 the number of integer and fractional bits in the reciprocal square root are determined.
In line 14, the number of iterations to find the reciprocal square root is defined, and at line 17, the initial value (“start_value(a)”) is defined. In lines 22-35, the method shown in column 2 of the table in
The third area in which the invention improves over the prior art is in the initialisation or selection of the initial value (“start_value(a)”). Conventionally, a look-up table can be used to determine an appropriate initial value for a given value of A.
However, it has been recognised that it is relatively simple to calculate the square root, or the reciprocal square root, of a number that is a power of 2. Therefore, a method of determining an initial value that takes advantage of this is shown in
Thus, in step 101, the value of A is determined. In step 103, A is rounded to the nearest number that is a power of 2. This can be given by $2^J$, where $-Y \le J \le W$. This can be implemented as a leading-zero detect circuit, which uses the position of the first “1” bit to determine the number of leading zeroes (Z). Therefore, J will be equal to W−Z−1 for unsigned numbers.
In step 105, if J is an odd number, the value of J is rounded up to the nearest even number to give J′. If J is an even number, J′=J.
In step 107, the initial value for the iteration, $x_0$, is determined from $x_0 = 2^{-J'/2}$. Therefore, as J′ is even, the calculation of the initial value, $x_0$, will comprise determining the reciprocal of a power of 2 (since $2^{-J'/2} = 2^{-K} = 1/2^{K}$, where K is an integer).
This approach to determining the initial value is more efficient in terms of memory than a look-up table, and is well suited to implementation in a field programmable gate array (FPGA) architecture.
It is possible to improve the overall accuracy of the estimated initial value by multiplying the result by √2 (≈1.414) if the value of J was odd (this compensates for rounding J up to the nearest even value in step 105, which lowers the estimate by a factor of 1/√2). However, including this step increases the complexity.
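Steps 101 to 107 can be sketched as follows. The function name `initial_value` is hypothetical, Python's `int.bit_length` stands in for the leading-zero detect circuit, and the optional correction for odd J is left out. The result is returned as a float for readability rather than as a fixed-point word.

```python
def initial_value(a_raw: int, w: int = 6, y: int = 26) -> float:
    """Initial estimate of 1/sqrt(A) for an unsigned A held as a raw QW.Y integer."""
    if a_raw <= 0 or a_raw >= 1 << (w + y):
        raise ValueError("A must be a non-zero unsigned value that fits in QW.Y")
    # Leading-zero detect: Z is the number of zero bits above the first "1".
    z = (w + y) - a_raw.bit_length()
    j = w - z - 1                       # A is approximately 2**J
    # Round an odd J up to the next even number to give J'.
    j_even = j + 1 if j % 2 else j
    # x0 = 2**(-J'/2), the reciprocal of a power of two.
    return 2.0 ** (-(j_even // 2))


x0_for_8 = initial_value(8 << 26)    # A = 8:     J = 3,  J' = 4
x0_for_4 = initial_value(4 << 26)    # A = 4:     J = 2,  J' = 2
x0_small = initial_value(1 << 23)    # A = 0.125: J = -3, J' = -2
```

In hardware the final step reduces to setting a single bit of the output word, which is why no look-up table is needed.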
The final improvement over the prior art methods is to terminate the iterative process early if it is determined that the currently determined value of the reciprocal square root of A is within an acceptable amount of the correct value. This can be determined by examining the value of t2 from the second step of the preferred method in
An additional advantage of the proposed method is that the output of t1 ($t_1 = A \cdot x_n$) gives an estimate of the square root of A based on the current estimate ($x_n$). This will be less accurate than explicitly calculating the square root from the output ($x_{n+1}$), but is provided without any additional calculation.
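The early-termination test and the square-root by-product can be sketched together in floating point. The function name `rsqrt_with_early_stop` and the tolerance value are assumptions made for this example; the specification only requires that t2 be sufficiently close to 1.

```python
def rsqrt_with_early_stop(a, x0, max_iterations=10, tolerance=1e-6):
    """Iterate until t2 = A * x_n**2 is within `tolerance` of 1.

    Returns (estimate of 1/sqrt(A), estimate of sqrt(A), iterations used).
    """
    x = x0
    for n in range(max_iterations):
        t1 = a * x           # by-product: running estimate of sqrt(A)
        t2 = t1 * x          # approaches 1 as x approaches 1/sqrt(A)
        if abs(t2 - 1.0) <= tolerance:
            return x, t1, n  # close enough: stop before the remaining stages
        x = 0.5 * x * (3.0 - t2)
    return x, a * x, max_iterations


# Illustrative values only: A = 2 with a starting estimate of 0.5.
rsqrt, root, used = rsqrt_with_early_stop(2.0, 0.5)
```

The test reuses a value that the iteration already computes, so checking for convergence costs only a comparison.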
It will be appreciated that although the digital logic circuit is shown as comprising various separate components, the digital logic circuit can be implemented in a single component such as a processor or a field-programmable gate array (FPGA) device.
The digital logic circuit 2 comprises an input 4 for receiving the value A for which the reciprocal square root is to be calculated. The input 4 is connected to an initial value calculator unit 6 and an input of a first multiplexer 8. The first multiplexer 8 has three data inputs and is controlled by a signal Cmux1.
The initial value calculator unit 6 comprises circuitry for determining an initial value for the reciprocal square root of A as described above. Namely, the unit 6 performs the method as described with reference to
The output of the second multiplexer 10 is stored in a register 12, and the value in the register 12 is provided to a multiply and shift unit 14, along with the output value of the first multiplexer 8. The multiply and shift unit 14 multiplies two values together to produce an output which is provided to an add unit 16.
In this implementation, the add unit 16 subtracts the output of the multiply and shift unit 14 from 3, and provides the resulting value to one of the inputs of the first multiplexer 8.
The output of the multiply and shift unit 14 is also provided to an input of both the first multiplexer 8 and the second multiplexer 10.
Thus, after initialising the digital logic circuit 2 (i.e. including resetting the register 12), the digital logic circuit 2 receives a value A for which the reciprocal square root is to be calculated, and the initial value calculator unit 6 determines an initial estimate of the reciprocal square root x0 using the method as described with reference to
Once this initial value x0 has been determined, it is passed to the second multiplexer 10, which is controlled by the control signal Cmux2 to output x0 to the register 12 where it is stored.
Using the initial value x0, the digital logic circuit 2 now determines the first iteration of the reciprocal square root of A in accordance with the method shown in
The multiply and shift unit 14 calculates the value of “t1” shown in column 2 of the table in
The first multiplexer 8 is then controlled by Cmux1 to output the value of t_new to the multiply and shift unit 14. However, the second multiplexer 10 is controlled by Cmux2 to maintain the output of the initial value x0 to the register 12 where it is stored and output to the multiply and shift unit 14.
The multiply and shift unit 14 then calculates the value of “t2” shown in column 2 of the table in
The add unit 16 calculates the value of “t3” as shown in column 2 of
The multiply and shift unit 14 then calculates the first iteration of the reciprocal square root of A, x1, as shown in column 2 of the table in
The second multiplexer 10 is controlled by Cmux2 to output the value of t_new (x1) to the register 12, where it is stored.
The procedure can then repeat to determine x2, x3, and so on.
It will be appreciated that the digital logic circuit 2 will comprise some control circuitry (not shown) for generating the multiplexer control signals Cmux1 and Cmux2 and for initialising the calculator unit 6, register 12, multiply and shift unit 14 and add unit 16.
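The way a single multiplier is shared across the stages can be sketched as a small behavioural model. This is not a description of the actual circuit: the class name `RsqrtDatapath`, the select labels "A", "MUL" and "ADD", and the `halve` flag standing in for the shift are all hypothetical, and floating point is used in place of the fixed-point words.

```python
class RsqrtDatapath:
    """Behavioural model: one multiplier, one subtractor, one register, two muxes."""

    def __init__(self, a: float, x0: float):
        self.a = a
        self.register = x0   # register 12, loaded with x0 through the second multiplexer
        self.t_new = 0.0     # latest output of the multiply and shift unit 14

    def _mux1(self, select: str) -> float:
        """First multiplexer 8: choose A, the multiplier output, or the add unit output."""
        if select == "A":
            return self.a
        if select == "MUL":
            return self.t_new
        if select == "ADD":
            return 3.0 - self.t_new   # add unit 16 subtracts the product from 3
        raise ValueError(f"unknown select {select!r}")

    def _multiply(self, select: str, halve: bool = False) -> None:
        """Multiply and shift unit 14: mux output times the register value."""
        product = self._mux1(select) * self.register
        self.t_new = product / 2 if halve else product

    def iterate(self) -> float:
        self._multiply("A")                # t1 = A * x_n
        self._multiply("MUL")              # t2 = t1 * x_n (register still holds x_n)
        self._multiply("ADD", halve=True)  # x_{n+1} = (3 - t2) * x_n / 2
        self.register = self.t_new         # second multiplexer 10 loads x_{n+1}
        return self.register


datapath = RsqrtDatapath(2.0, 0.5)
first = datapath.iterate()
for _ in range(5):
    latest = datapath.iterate()
```

Holding the current estimate in the register for all three multiplications is what lets the same multiplier serve every stage of an iteration.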
There is therefore provided a method and digital logic circuit that can be used to efficiently calculate the reciprocal square root of a given value.