The present invention relates generally to a numerical estimation for use with data processing, and more particularly to performing a logarithmic numerical estimation for use with data processing.
A general purpose processor typically cannot perform a logarithmic function as efficiently as other mathematical operations, such as addition, subtraction, and multiplication. A logarithmic function is likely to require many more processor cycles than a multiplication operation, for example.
According to the format specified by IEEE Standard 754 for Binary Floating Point Arithmetic, a normalized floating-point number, such as x, is represented by three groups of bits, namely, a sign bit, exponent bits, and mantissa bits. The sign bit is the most significant bit of the floating-point number. The next eight less significant bits are the exponent bits, which represent the signed biased exponent of the floating-point number. An unbiased exponent can be computed by subtracting the appropriate bias from the biased exponent. Furthermore, there are different biases for different floating point representations. Those of skill in the art understand that IEEE 754 is just one example of the type of numerical representations usable.
The 23 least significant bits are the fraction bits, where the value of the significand, here referred to as the mantissa, is computed by dividing the unsigned integral value represented by these 23 bits by 2^23 and adding 1 to the quotient. Although the number 23 above is used for single-precision floating point calculations, those of skill in the art understand that other counts of fraction bits can be used with other appropriate precisions.
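By way of a non-limiting illustration, the single-precision field layout described above can be sketched in Python as follows. The function name decode_single is hypothetical and chosen only for this example; the sketch assumes a normalized, single-precision input.

```python
import struct

def decode_single(x: float):
    """Split a value, stored as IEEE 754 single precision, into its fields.

    Assumes x is a normalized number (not zero, subnormal, infinity, or NaN).
    """
    # Reinterpret the 32-bit float pattern as an unsigned integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                      # most significant bit
    biased_exp = (bits >> 23) & 0xFF       # next eight bits
    fraction = bits & 0x7FFFFF             # 23 least significant bits
    unbiased_exp = biased_exp - 127        # single-precision bias
    mantissa = 1 + fraction / 2**23        # value in [1, 2)
    return sign, biased_exp, unbiased_exp, fraction, mantissa

# 6.5 = 1.625 * 2**2
print(decode_single(6.5))
```

For the input 6.5, the unbiased exponent is 2 and the mantissa is 1.625, consistent with 6.5 = 1.625 x 2^2.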
Excluding the sign bit, a floating-point number x can be considered as a product of two parts corresponding to the exponent and the mantissa, respectively. The part corresponding to the exponent of x has the value 2^exp, where exp is the unbiased exponent. Thus, log2(x) can be expressed by the sum of the logs of the above two parts (that is, log2(2^exp) + log2(mantissa)). The log2(2^exp) is the unbiased exponent, exp, itself, which is a signed integer. Because the value of the mantissa is between 1 (inclusive) and 2, log2(mantissa) is the non-negative fractional part yF of the floating-point result, where yF = log2(mantissa) and the value of yF is between 0 (inclusive) and 1. Thus, the floating-point result y can be obtained as follows:
y=exp+log2(mantissa)
where exp is the unbiased exponent of x, and mantissa is the mantissa of x.
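The decomposition above can be checked numerically. The following is a minimal sketch, assuming a positive, normalized input; the name log2_by_parts is hypothetical, and the library call math.log2 stands in for whatever estimation of log2(mantissa) a given implementation uses.

```python
import math

def log2_by_parts(x: float) -> float:
    """Compute log2(x) as exp + log2(mantissa) for a positive normal x."""
    m, e = math.frexp(x)      # x = m * 2**e with 0.5 <= m < 1
    mantissa = 2.0 * m        # rescale so that 1 <= mantissa < 2
    exp = e - 1               # unbiased exponent matching that mantissa
    return exp + math.log2(mantissa)

for value in (0.3, 8.0, 10.0):
    print(value, log2_by_parts(value), math.log2(value))
```

For an exact power of two such as 8.0, the mantissa is 1, its logarithm is 0, and the result is the unbiased exponent alone.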
If a graph of the log2(mantissa) function is compared with a graph of the linear function (mantissa-1) within the range of 1 to 2 for the mantissa, the results from the two functions are identical at the endpoints, while between the endpoints the results from the log2(mantissa) function are slightly greater than the results from the linear function.
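The relationship between the two curves can be illustrated numerically. The sketch below samples the range of 1 to 2 and measures the gap between log2(mantissa) and the linear function; the sample count of 1024 and the helper name linear_estimate are assumptions made only for this example.

```python
import math

def linear_estimate(mantissa: float) -> float:
    """Linear stand-in for log2(mantissa) on the range 1 to 2."""
    return mantissa - 1.0

STEPS = 1024  # assumed sampling density
samples = [1.0 + i / STEPS for i in range(STEPS + 1)]

# Gap between the true logarithm and the linear function at each sample.
gaps = [math.log2(m) - linear_estimate(m) for m in samples]
worst = max(gaps)
worst_at = samples[gaps.index(worst)]

print("gap at endpoints:", gaps[0], gaps[-1])
print("largest gap:", worst, "near mantissa", worst_at)
```

The gap vanishes at both endpoints and peaks at roughly 0.086 near a mantissa of about 1.44, which is the bowed shape that any correction to the linear function has to supply.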
Conventionally, if a logarithmic function with a low-precision estimation is needed, then the low-precision logarithmic function can be obtained simply by making small corrections to the linear function. On the other hand, if a logarithmic function with a higher precision estimation is required, the higher-precision logarithmic function can be obtained by means of a table lookup, sometimes in conjunction with point interpolation, as is well-known to those skilled in the art.
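A table lookup combined with interpolation may be sketched as follows. This is an illustrative software model only, not a description of any particular hardware table; the table size (TABLE_BITS = 6), the use of linear interpolation, and the name log2_mantissa_lookup are all assumptions.

```python
import math

TABLE_BITS = 6  # assumed: 64 intervals across the mantissa range 1 to 2

# Precomputed log2 values at the interval boundaries (65 entries).
TABLE = [math.log2(1 + i / 2**TABLE_BITS) for i in range(2**TABLE_BITS + 1)]

def log2_mantissa_lookup(fraction: int, fraction_bits: int = 23) -> float:
    """Estimate log2(1 + fraction / 2**fraction_bits) by lookup and interpolation."""
    low_bits = fraction_bits - TABLE_BITS
    index = fraction >> low_bits                        # top bits select the entry
    remainder = (fraction & ((1 << low_bits) - 1)) / (1 << low_bits)
    # Linear interpolation between the two neighboring table entries.
    return TABLE[index] + remainder * (TABLE[index + 1] - TABLE[index])

fraction = 1234567
exact = math.log2(1 + fraction / 2**23)
print(log2_mantissa_lookup(fraction), exact)
```

A larger table reduces the interpolation error at the cost of storage, which is the trade-off that motivates alternatives to lookup-based estimation.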
A floating-point number x, in the IEEE 754 format for example, is partitioned into a signed biased exponent part, expbias, and a fraction part, xF. An unbiased exponent, exp, is then obtained, such as by subtracting 127 or other appropriate value from the biased exponent. Next, an unnormalized mantissa is then obtained via a lookup table utilizing fraction part xF as the input.
If the biased exponent part is negative, both the unbiased exponent and the unnormalized mantissa will be complemented. The unbiased exponent is then concatenated with the unnormalized mantissa, with a binary point in between, to form an immediate result. Subsequently, the immediate result is normalized by removing all leading zeros and the leading one, such as via left shifting, to obtain a normalized fraction part of the result y. The exponent part of the result y is then generated by, for example, counting the number of leading digits shifted off and then subtracting that number from 8, or another number, as appropriate to the precision. At this point, the exponent part of the result y is unbiased. Finally, the floating-point result y is formed by combining the unbiased exponent part and the normalized fraction part. A biased exponent can be obtained by adding 127 to the unbiased exponent.
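The concatenate-and-normalize sequence described above may be modeled in software as follows. This is a sketch under stated assumptions rather than a description of any actual circuit: math.log2 stands in for the lookup table, the complement step is modeled as arithmetic negation, the input is assumed positive and normalized, and the name conventional_log2 is hypothetical.

```python
import math

FRACTION_BITS = 23   # single precision
EXPONENT_BITS = 8

def conventional_log2(x: float) -> float:
    """Model the conventional steps for a positive normal x."""
    m, e = math.frexp(x)                   # x = m * 2**e with 0.5 <= m < 1
    exp, mantissa = e - 1, 2.0 * m         # unbiased exponent, mantissa in [1, 2)
    # Stand-in for the lookup table: unnormalized mantissa as a 23-bit integer.
    unnormalized = round(math.log2(mantissa) * 2**FRACTION_BITS)
    # Concatenate exponent and unnormalized mantissa around the binary point.
    combined = exp * 2**FRACTION_BITS + unnormalized
    negative = combined < 0
    magnitude = -combined if negative else combined   # assumed complement step
    if magnitude == 0:
        return 0.0                         # log2(1) has no leading one to find
    # Digits shifted off: the leading zeros plus the leading one.
    shifted_off = (EXPONENT_BITS + FRACTION_BITS) - magnitude.bit_length() + 1
    result_exp = EXPONENT_BITS - shifted_off          # unbiased exponent of y
    lead = 1 << (magnitude.bit_length() - 1)
    result_fraction = (magnitude - lead) / lead       # normalized fraction of y
    y = (1.0 + result_fraction) * 2.0**result_exp
    return -y if negative else y

print(conventional_log2(10.0), math.log2(10.0))
```

For x = 10, seven digits are shifted off, giving an unbiased result exponent of 8 - 7 = 1, consistent with log2(10) being approximately 1.66 x 2^1.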
However, there are problems associated with the above approach. For instance, employment of a look-up table with estimations can add to the complexity of the circuitry, thereby adding to the power consumption and cycle time for the calculations. This can be especially irksome in real-time graphics calculations, as both power consumption and completion time for the value estimation can be critical limiting factors.
Furthermore, the discontinuities and lack of accuracy of conventional hardware estimations can be egregious enough that various software developers refuse to use them. In certain games, for example, software developers may have used slower software lookup tables or other methods rather than the hardware estimations.
Therefore, there is a need for an improved estimation of numerical values in a manner that addresses at least some of the problems associated with conventional technological approaches to the estimations of numerical values.
The present invention provides for determining a floating point estimation of a number. Combinational logic is configured to produce a first value and a second value from an original value. A first adder is configured to accept the first value, wherein the first adder is further configured to add the accepted first value to the original value. A second adder is configured to accept an output of the first adder and the second value from the combinational logic.
For a more complete understanding of the present invention, and the advantages thereof, reference is now made to the following Detailed Description taken in conjunction with the accompanying drawings, in which:
In the following discussion, numerous specific details are set forth to provide a thorough understanding of the present invention. However, those skilled in the art will appreciate that the present invention may be practiced without such specific details. In other instances, well-known elements have been illustrated in schematic or block diagram form in order not to obscure the present invention in unnecessary detail. Additionally, for the most part, details concerning network communications, electromagnetic signaling techniques, digital logic design techniques, and the like, have been omitted inasmuch as such details are not considered necessary to obtain a complete understanding of the present invention, and are considered to be within the understanding of persons of ordinary skill in the relevant art.
In the remainder of this description, a processing unit (PU) may be a sole processor of computations in a device. In such a situation, the PU is typically referred to as an MPU (main processing unit). The processing unit may also be one of many processing units that share the computational load according to some methodology or algorithm developed for a given computational device. For the remainder of this description, all references to processors shall use the term MPU whether the MPU is the sole computational element in the device or whether the MPU is sharing the computational element with other MPUs, unless otherwise indicated.
It is further noted that, unless indicated otherwise, all functions described herein may be performed in either hardware or software, or some combination thereof. In a preferred embodiment, however, the functions are performed by a processor, such as a computer or an electronic data processor, in accordance with code, such as computer program code, software, and/or integrated circuits that are coded to perform such functions, unless indicated otherwise.
Turning to
Turning to
In the system 200, the logarithmic value is still broken into a sign bit, a biased exponent and a fraction value (F). However, when generating the value to be added to the original fraction value by the combinational logic A 110, two numbers are generated, not just one. This creates significantly less discontinuity in the production of estimated logarithmic values.
Generally, combinational logic block A 110 takes as inputs the 11 most significant bits of the fraction part of the input. It produces two outputs, referred to as A and C. The combinational logic is designed such that, as the value of the input fraction F increases, the sum of the values of A and C will be largest near the midpoint of the entire range of F (0000000000 to 1111111111). The sum of A and C will be the smallest (0) at the two endpoints. This allows the characteristic bowed-out curve of a logarithmic function. The added accuracy and continuity of this algorithm are obtained without substantially diminishing performance as defined in clock speed. This is at least in part because of the A and C output configuration. The logic configuration of block A is more streamlined compared to a configuration wherein only one addend is produced. This is in part because portions of the logic to produce A and C only need to be produced once, whereas that same logic involved in producing a single addend would need to be reproduced several times, diminishing performance of that logic 110. Also, some of the logical effort is essentially moved from determining a single addend to the logic that implements the three-way adder, which is blocks 120, 130, 140 and 150 combined. This streamlining also allows this design to be implemented without substantially diminishing performance when compared to conventional technologies. Separating the logic into producing separate A and C values also allows some flexibility in implementation, since logic can be more easily moved across cycle boundaries.
Combinational logic block B 130 is also illustrated. Combinational logic block B 130 passes the leading fraction bit to be placed within the first memory block 151 of the results fraction. A MUX 140 accepts input for bits 2-5 from both the 11-bit adder and the combinational logic block B 130. The output of the MUX 140 is selected as a function of the combinational logic block A, which generates a signal that indicates whether or not the C value is non-zero. If C is zero, then the MUX uses the output of the 11-bit adder. However, if C is non-zero, the MUX uses the output of the combinational logic B.
However, those of skill in the art understand that the A and C outputs and the input mantissa can be added within a single adder, instead of the combination of boxes 120, 130, 140 and 150 in
In step 340, the unnormalized mantissa is generated through combinational logic. The floating point, expressed as 10110000000, is input into combinational logic A, thereby generating the values A=00001001111 and C=00001000000. These values are then combined as illustrated in
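The arithmetic of this example can be checked with a short sketch. It assumes that the input fraction and the A and C values are simply summed as 11-bit quantities, as in the single-adder variant mentioned above, and that the sum is read as a fraction with the binary point to its left; the values of A and C are taken from the example and are not derived here.

```python
import math

FRACTION_BITS = 11

F = 0b10110000000   # input fraction from the example
A = 0b00001001111   # first output of combinational logic A (from the example)
C = 0b00001000000   # second output of combinational logic A (from the example)

# Assumed three-way addition of the fraction and both addends.
total = F + A + C

estimate = total / 2**FRACTION_BITS             # estimated log2(mantissa)
linear = F / 2**FRACTION_BITS                   # uncorrected linear value
exact = math.log2(1 + F / 2**FRACTION_BITS)     # reference value

print(format(total, "011b"))                    # 11000001111
print(estimate, linear, exact)
```

Under these assumptions the sum is 11000001111, or about 0.7573 as a fraction, against a reference value of about 0.7549 and an uncorrected linear value of 0.6875.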
In step 350, in
It is understood that the present invention can take many forms and embodiments. Accordingly, several variations may be made in the foregoing without departing from the spirit or the scope of the invention. The capabilities outlined herein allow for the possibility of a variety of programming models. This disclosure should not be read as preferring any particular programming model, but is instead directed to the underlying mechanisms on which these programming models can be built.
Having thus described the present invention by reference to certain of its preferred embodiments, it is noted that the embodiments disclosed are illustrative rather than limiting in nature and that a wide range of variations, modifications, changes, and substitutions are contemplated in the foregoing disclosure and, in some instances, some features of the present invention may be employed without a corresponding use of the other features. Many such variations and modifications may be considered desirable by those skilled in the art based upon a review of the foregoing description of preferred embodiments. Accordingly, it is appropriate that the appended claims be construed broadly and in a manner consistent with the scope of the invention.