The present invention generally relates to computation devices that calculate the value of a function using hardware-implemented approximation techniques and, in particular, a technique for calculating such approximations based on LaGrange polynomials.
Computation devices that perform arithmetic operations are well known in the art. In order to perform such operations, these computation devices typically comprise an arithmetic logic unit or the like. The arithmetic logic unit or, as it is sometimes referred to, a math engine, implements hardware circuitry used to perform separate arithmetic functions. Such functions range from relatively simple operations, such as addition and multiplication, to more complex operations, such as computing exponents, logarithms, inverses and the like. While a variety of techniques exist in the prior art for approximating the values of more complex functions, a frequently used technique relies on tables of point values and slope values to approximate the output value of a function.
Referring to
In
f′(x)=f(xj)+mΔx (Eq. 1)
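For illustration, the table-based point-and-slope technique of Equation 1 may be sketched in software as follows. This is a hypothetical model, assuming ƒ(x)=1/x over [1, 2) and a 32-entry table; the names and table size are illustrative only.

```python
N = 32
H = 1.0 / N                       # spacing between successive table points

def f(x):
    return 1.0 / x

# Precomputed tables: one point value and one slope value per interval.
points = [f(1.0 + i * H) for i in range(N)]
slopes = [(f(1.0 + (i + 1) * H) - f(1.0 + i * H)) / H for i in range(N)]

def approx(x):
    """Linear point-and-slope approximation (Eq. 1) for x in [1, 2)."""
    i = int((x - 1.0) * N)        # table index from the most significant bits
    dx = x - (1.0 + i * H)        # remaining fraction, i.e., delta-x
    return points[i] + slopes[i] * dx

worst = max(abs(approx(1.0 + k / 4096.0) - f(1.0 + k / 4096.0))
            for k in range(4096))
print(worst)                      # linear error is bounded by (h^2/8)*max|f''|
```

As the check suggests, the error of this linear technique shrinks only quadratically with the table spacing, which motivates the higher-order approach described below.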
The difference between the estimated output value, ƒ′(x), and the true output value, ƒ(x), as shown in
The technique previously described with respect to
X or Numerical Value = (±)Mantissa × 2^Exponent (Eq. 2)
In essence, the mantissa represents the significant digits of a value and the exponent value represents a relative magnitude of the significant digits. A sign bit, labeled S in the figures, indicates whether the mantissa value is positive or negative. In this manner, a very large range of values may be represented depending on the number of bits used. The mantissa, for example x, may be further divided into a first portion, labeled x0, and a second portion, labeled Δx. As shown, the first portion x0 comprises the most significant bits of the mantissa and defines the points as previously described. For example, if the first portion x0 comprises the five most significant binary bits, there are 32 points available. The remaining least significant digits define the second portion illustrated as Δx in
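The decomposition described above may be sketched in software as follows. This is a behavioral model of the bit fields of an IEEE-754 single precision value, using the five index bits of the example in the text; the function name is illustrative.

```python
import struct

def decompose(x, index_bits=5):
    """Split an IEEE-754 single into sign, biased exponent, and a mantissa
    divided into x0 (top index_bits bits) and delta-x (the remaining bits)."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF            # biased by 127
    mantissa = bits & 0x7FFFFF                # 23 stored fraction bits
    x0 = mantissa >> (23 - index_bits)        # point index: 0..31 for 5 bits
    dx = mantissa & ((1 << (23 - index_bits)) - 1)
    return sign, exponent, x0, dx

print(decompose(1.5))   # → (0, 127, 16, 0): 1.5 is +1.1000...b × 2^0
```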
Additionally, sign/exponent processing 310 is performed on the input value sign, s, and exponent in order to provide the output value sign and exponent, as shown in
In order to operate upon the exponent, it therefore becomes necessary to first remove the offset when processing the exponent and, when processing is completed, to add the offset value once again. Additionally, the nature of the function being approximated affects the processing of the exponent. For example, where an inverse function is being implemented, processing of the true value of the exponent can be as simple as inverting each binary bit of the biased exponent value and then subtracting two (one if the input is an exact multiple of 2.0). In another example, implementation of a square root function requires subtracting the biased exponent value from 381 (383 if the input is an exact multiple of 4.0), then dividing by two. A third example would be the logarithm function where the input exponent is simply unbiased and concatenated as an integer value to the fixed point fractional mantissa result. Such sign and exponent processing is well known to those having ordinary skill in the art.
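The inverse-function rule described above (invert each bit of the biased exponent, then subtract two, or one if the input is an exact power of two) can be checked with a short sketch. This model covers single precision only and ignores subnormal, zero, and overflow cases for simplicity.

```python
import struct

def biased_exponent(x):
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    return (bits >> 23) & 0xFF

def is_exact_power_of_two(x):
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    return (bits & 0x7FFFFF) == 0             # mantissa fraction bits are zero

def reciprocal_exponent(x):
    """Biased exponent of 1/x: invert each bit of the biased input exponent,
    then subtract two (one if x is an exact power of two)."""
    inverted = biased_exponent(x) ^ 0xFF
    return inverted - (1 if is_exact_power_of_two(x) else 2)

for x in (0.25, 0.5, 3.0, 10.0):
    assert reciprocal_exponent(x) == biased_exponent(1.0 / x)
```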
Regardless, as can be seen in
The DirectX9 standard typically provides a maximum absolute error equivalent to only 9 to 10 bits of precision when calculating sine and cosine function values. Further, the DirectX9 standard, if implemented in software, typically requires 8 instructions to calculate sine and cosine function values. Since a pipelined hardware implementation of the DirectX9 standard may require up to four twenty-four bit hardware multipliers, such an implementation would be relatively expensive. Additionally, the size of the tables required to achieve this level of precision using the implementation shown in
According to another known technique, the Taylor series approximation may be implemented to produce the sine and cosine functions. However, to provide a high level of precision, such as 8 or 9 bits of precision, a large amount of processing resources is required to implement the Taylor series approximations.
According to another technique, a floating point argument of a function addresses a floating point interpolating memory. A memory and a decoder produce coefficients in response to a floating point exponent. A polynomial evaluator produces a floating point representation of the evaluated function. However, since the floating point argument addresses the memory, additional computations are required in order to convert the floating point argument into a value for addressing the memory. As a result, additional processing and additional complexity are required, resulting in increased cost and/or increased delay in processing and evaluating the polynomial.
The present invention provides a technique for approximating output values of a function based on LaGrange polynomials. More particularly, the present invention takes advantage of the superior accuracy of LaGrange polynomials using a hardware implementation that requires substantially less circuitry than would otherwise be required to directly implement a LaGrange polynomial. To this end, the present invention relies on a factorization of a LaGrange polynomial that results in a representation of the LaGrange polynomial requiring substantially less hardware and a relatively modest amount of memory to implement tables.
With this simplified representation, an output value of a function may be determined based on an input value comprising an input mantissa and an input exponent. Based on a first portion of the input mantissa, a point value is provided. Additionally, at least one slope value based on the first portion of the input mantissa is also provided. Each of the at least one slope value is based on a LaGrange polynomial approximation of the function. Thereafter, the point value and the at least one slope value are combined with a second portion of the input mantissa to provide an output mantissa. Likewise, conventional techniques for processing an exponent value are used to process the input exponent value. In a preferred embodiment, where an exponential, sine or cosine function is being implemented, the input mantissa is first converted to fixed point format such that the input value falls within a valid range of the function. Otherwise, the input mantissa is taken in unchanged, but treated like a fixed point value within the valid range of the function. Based on this technique, a single set of hardware may be used to implement a variety of functions such as a reciprocal function, a reciprocal square root function, an exponential function and a logarithmic function. Furthermore, relatively high precision is achievable using a relatively simple hardware embodiment. This technique may be implemented in a variety of computing platforms and, in particular, in a graphics processing circuit.
The present invention may be more fully described with reference to
Referring now to
In general, LaGrange polynomials offer a technique for interpolating (i.e., approximating) an output value of a function, ƒ(x), based on a set of known input and output values. This is illustrated in
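For illustration, a direct (unfactored) evaluation of a LaGrange interpolating polynomial through four known points may be sketched as follows. This is the computation that the factorization described below simplifies; the function f(x)=1/x and the node values are illustrative.

```python
def lagrange(xs, ys, x):
    """Directly evaluate the LaGrange polynomial through (xs[i], ys[i]) at x."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        basis = 1.0                       # the j-th LaGrange basis polynomial
        for m, xm in enumerate(xs):
            if m != j:
                basis *= (x - xm) / (xj - xm)
        total += yj * basis
    return total

# Third-order example: four equally spaced known points of f(x) = 1/x.
xs = [1.0, 1.25, 1.5, 1.75]
ys = [1.0 / v for v in xs]
print(lagrange(xs, ys, 1.1))              # close to 1/1.1 ≈ 0.9091
```

Note that evaluating the polynomial at any of the known points reproduces the known output value exactly, since every other basis polynomial vanishes there.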
Simple inspection of Equation 4, however, makes it clear that a substantial number of multiplications and additions would be required to directly implement a third-order LaGrange polynomial. In order to simplify the implementation of Equation 4, equivalent values for the sub-points x01, x02 and x1, illustrated in Equations 5-7, are substituted into Equation 4, which is subsequently expanded.
x01=x0+h (Eq. 5)
x02=x0+2h (Eq. 6)
x1=x0+3h (Eq. 7)
The resulting third-order equation is thereafter factored for successive powers of the quantity (x−x0), i.e., 1, (x−x0), (x−x0)² and (x−x0)³. For ease of illustration, it is noted that Δx=x−x0, as illustrated in
As shown in Equation 8, the approximation reduces to a series of constant values multiplied by successive powers of the quantity Δx. The particular values for the constants a0 through a3, illustrated in Equation 8 for a third-order LaGrange polynomial, are shown in Equations 9-12 below.
Recall that the point values (i.e., ƒ(x0), ƒ(x1) and ƒ(x01)) corresponding to the point and sub-points are known values and are dependent upon the particular function being implemented. As a result of this simplification, Equation 8 may be implemented in a relatively straightforward fashion, as is further illustrated in
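The constants of Equation 8 follow from the factorization as forward differences of the four point values. The sketch below (assuming ƒ(x)=1/x; the helper name is illustrative) computes a0 through a3 and confirms that the resulting factored cubic reproduces all four known point values.

```python
def factored_coeffs(f0, f01, f02, f1, h):
    """Constants a0..a3 of Eq. 8 for the cubic through f(x0), f(x01),
    f(x02), f(x1) at spacing h, expressed via forward differences."""
    d1 = f01 - f0
    d2 = f02 - 2 * f01 + f0
    d3 = f1 - 3 * f02 + 3 * f01 - f0
    a0 = f0
    a1 = (d1 - d2 / 2 + d3 / 3) / h
    a2 = (d2 - d3) / (2 * h * h)
    a3 = d3 / (6 * h ** 3)
    return a0, a1, a2, a3

f = lambda x: 1.0 / x
x0, h = 1.0, 0.25
a0, a1, a2, a3 = factored_coeffs(f(x0), f(x0 + h), f(x0 + 2 * h),
                                 f(x0 + 3 * h), h)

# The factored cubic must pass through all four points (dx = 0, h, 2h, 3h),
# since the cubic through four points is unique.
for k in range(4):
    dx = k * h
    p = a0 + a1 * dx + a2 * dx ** 2 + a3 * dx ** 3
    assert abs(p - f(x0 + dx)) < 1e-12
```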
Referring now to
Referring now to
The first portion of the input mantissa, x0, is used to index the tables 704-710 to provide the corresponding constant values. Likewise, the second portion of the input mantissa, Δx, is provided as an input to the multipliers, having reference numerals 712, 714, 718. Multiplier 712 multiplies the first constant, a1, with a first-order power of Δx, as shown. Multiplier 714 provides a second-order power of Δx, which is thereafter multiplied by the second constant value, a2, by multiplier 716, as shown. The output of multiplier 714 is also provided as an input to multiplier 718 which, in turn, provides a third-order power of Δx. The third-order power of Δx is thereafter multiplied by the third constant, a3, by multiplier 720. The resulting products are provided to an adder 722 along with the point value, a0, to provide a sum representative of the mantissa of the approximated output value. The output of the adder 722 is thereafter concatenated to the unbiased input exponent and provided to a fixed-to-float conversion circuit 724, which converts the fixed point value to floating point format using known techniques. Preferably, this conversion to floating point format is used only if the function being implemented is the logarithmic, sine or cosine function and the final result is desired to be in floating point format.
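The table-and-multiplier-tree structure just described can be modeled in software. The sketch below is a behavioral model, not the circuit; it assumes ƒ(x)=1/x on [1, 2), 32 table entries indexed by the top five mantissa bits, and sub-points spaced at one third of the table interval (an assumption; the text requires only equal spacing). It evaluates a0 + a1Δx + a2Δx² + a3Δx³ in the same order as multipliers 712-720 and adder 722.

```python
N = 32                     # table entries: indexed by the top 5 mantissa bits
H = 1.0 / N                # spacing between successive table points

def coeffs(i):
    """a0..a3 for table entry i, from four (sub-)point values of 1/x."""
    x0, h = 1.0 + i * H, H / 3.0          # x1 = x0 + 3h is the next point
    f0, f01, f02, f1 = (1.0 / (x0 + k * h) for k in range(4))
    d1, d2 = f01 - f0, f02 - 2 * f01 + f0
    d3 = f1 - 3 * f02 + 3 * f01 - f0
    return (f0, (d1 - d2 / 2 + d3 / 3) / h,
            (d2 - d3) / (2 * h * h), d3 / (6 * h ** 3))

TABLES = [coeffs(i) for i in range(N)]    # the constant tables 704-710

def approx_recip(x):
    """Evaluate a0 + a1*dx + a2*dx^2 + a3*dx^3 as the multiplier tree does."""
    i = int((x - 1.0) * N)                # top mantissa bits -> table index
    dx = x - (1.0 + i * H)
    a0, a1, a2, a3 = TABLES[i]
    dx2 = dx * dx                         # second-order power (multiplier 714)
    dx3 = dx2 * dx                        # third-order power (multiplier 718)
    return a0 + a1 * dx + a2 * dx2 + a3 * dx3   # summed by adder 722

worst = max(abs(approx_recip(1.0 + k / 8192.0) * (1.0 + k / 8192.0) - 1.0)
            for k in range(8192))
print(worst)               # worst relative error is on the order of 1e-8
```

Even this floating point model illustrates the point of the factorization: a handful of small tables and three multipliers achieve roughly 24-bit accuracy for the reciprocal.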
The present invention overcomes the limitations of prior art techniques by providing a relatively inexpensive implementation of LaGrange polynomials while simultaneously providing accuracy substantially similar to or better than that of these prior art techniques. For example, the implementation of the third-order LaGrange polynomial illustrated in
Of equal importance, the hardware implementation illustrated in
The present invention substantially overcomes the limitations of prior art techniques for approximating function output values. To this end, a simplified form of LaGrange polynomials is used, providing greater accuracy for a variety of functions using a single set of implementation hardware. As a result, a cost benefit is realized because different functions may be implemented using the same hardware, while still achieving the same or better accuracy.
As shown in
The graphics processing circuit 402 may be embodied by a dedicated graphics coprocessing chip, one or more processors such as a host processor or any other similar device that would benefit from an improved technique for approximating the value of arithmetic functions. An example of a suitable circuit that may benefit from the present invention is the “RADEON” family of graphics processors and its successors by ATI Technologies, Inc. According to one embodiment, the present invention may be incorporated into any computing device such as a host processor executing a software application or driver or any suitable circuit that performs arithmetic functions.
The math/logic engine 404 comprises the circuitry used to implement arithmetic functions such as, for example, a scalar engine, 2D engine, 3D engine or a programmable shader engine as is known in the art. For example, the sine and/or cosine function may be used to calculate the shading applied to an object in response to a light source. For example, a shading engine may include a vertex and a pixel shader to support 2D operations such as block moves in a screen or MPEG motion compensation during playback. Alternatively, the math engine may compute the contents required rather than store and recall them in tables within memory 410.
It will be recognized that all or some of the disclosed operations may be useful as applied to printers or other devices. For example, the disclosed processor, circuits or graphic processor(s) may process information and/or output information in any suitable color space including but not limited to Y,U,V, RGB, YPbPr or CMYK (cyan, magenta, yellow, black) color spaces. Suitable considerations should be taken into account when converting from RGB to CMYK or vice versa or between any two color spaces. For example, as is known, the ink type, paper type, brightness settings, and other factors should be considered in converting from or to RGB space and CMYK space as a color displayed on a display screen may be different from that output by a color printing operation.
The CMYK color space relates particularly to subtractive color technologies, where adding more color takes a pixel or dot closer to black, just as RGB relates to additive color technologies (where adding more color takes a pixel or dot closer to white). As such, if desired, pixel information, or dot color information, may be processed and/or output for any suitable display medium including electronic display screens or for printers on display medium such as paper, film or any other suitable article.
As shown in step 1110, the point and slope generator 810 receives data representing an input value 1000 including the fixed point input mantissa 802, wherein the data representing the fixed point input mantissa 830 further comprises the most significant bit portion X0 and the least significant bit portion Δx 832.
As shown in step 1120, the point and slope generator 810 produces data representing a point value 840 produced or alternatively stored in the point generator 602, such as a table in memory, in response to the most significant bit portion of the fixed point input mantissa 830.
As shown in step 1130, the point and slope generator 810 produces data representing at least one slope value 850 produced or alternatively stored in the slope generator 1010, such as a table in memory, in response to the first portion of the fixed point input mantissa. Each of the at least one slope value 850 is based on a LaGrange polynomial approximation of the sine and/or the cosine function.
As shown in step 1140, combiner 820 combines data representing the point value 840, the at least one slope value 850 and the least significant bit portion of the fixed point input mantissa 832 to produce data representing an output mantissa 860.
As shown in step 1150, the combiner 820 stores data representing an output value 1020 including the output mantissa 860. According to one embodiment, the combiner 820 also stores data representing the output exponent 1024 and the sign-out output 1022.
According to the embodiment shown, four generators 704-710, each containing 2⁶ points, generate data representing values of the point value ƒ(x0) and slopes A1, A2, and A3 for the sine function in the range of 0 to π/4, and four generators of similar size generate corresponding values for the cosine function in the same range. Generators 704-710 may be one or more suitably programmed processors having associated memory containing instructions which, when executed, cause the operations described herein. According to an alternative embodiment, generators 704-710 may be implemented as a memory containing look-up tables, as is known in the art.
The upper 5 bits of the fixed point input mantissa 830 represent the points x0, x1, etc., and the lower bits represent Δx 832. As described above, the output mantissa 860 is calculated by taking the 26-bit point function value ƒ(x0) and adding the three interpolation terms A1*Δx, A2*Δx², and A3*Δx³. The three interpolation terms have resolutions of 20 bits, 16 bits, and 12 bits, respectively, and are aligned with the least significant bit of the point function value for the addition. The resulting output mantissa 860 is a fractional value. The output mantissa 860 is normalized so that the leading one is moved to the most significant bit position, and then the floating point output exponent 1024 is calculated. The sign bit 1022 is set as shown in the table below. According to one embodiment, the mantissa, exponent and sign are combined and converted to IEEE biased single precision format, as is known in the art.
The method shown in
As shown in
One advantage of this method over the Taylor series implemented in the DirectX specification is greater precision. The maximum error of a Taylor series is given by the next term in the series beyond what is used for the approximation, resulting in approximately eight bits of precision.
With a step size h of 2⁻⁵, the maximum error for the approximation method described herein is (1/24)h⁴ = (1/24)(2⁻²⁰) ≈ 0.0000000397, or about 24 bits of precision. According to one embodiment, either the sine or cosine is calculated in one clock cycle using a pipelined circuit with 5 multipliers and a 4-input adder. For comparison with the Taylor method described in the DirectX specification, the computations instead could be performed in an 8-clock macro in the following manner:
P(x)=f(x0)+(x−x0)(P1+(x−x0)(P2+(x−x0)P3))
The operation would start with an initial subtraction to find Δx, (x−x0), followed by three multiply-accumulate operations for each function. If the initial subtraction used to move the angle into the first quadrant is added, then there are two subtractions and six multiply accumulates to get both function results. This compares with the three multiply-accumulates, three multiplies, and two adds needed for the Taylor method described in the DirectX specification.
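Both the stated error bound and the equivalence of the 8-clock Horner-form macro to the expanded polynomial of Equation 8 can be checked numerically. In the sketch below, the coefficient values are arbitrary placeholders, not values from any actual table.

```python
import math

# Error bound: (1/24) * h^4 with step size h = 2^-5, as stated above.
h = 2.0 ** -5
max_err = h ** 4 / 24
print(max_err)                    # ≈ 3.97e-8
print(-math.log2(max_err))        # ≈ 24.6 bits of precision

# Horner form P = f0 + dx*(P1 + dx*(P2 + dx*P3)) equals the expanded sum.
def horner(f0, p1, p2, p3, dx):
    return f0 + dx * (p1 + dx * (p2 + dx * p3))   # 3 multiply-accumulates

def expanded(f0, p1, p2, p3, dx):
    return f0 + p1 * dx + p2 * dx ** 2 + p3 * dx ** 3

args = (0.5, -0.25, 0.125, -0.0625, 0.3)          # arbitrary placeholders
assert abs(horner(*args) - expanded(*args)) < 1e-12
```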
The present invention substantially overcomes the limitations of prior art techniques for approximating function output values. To this end, a simplified form of LaGrange polynomials for approximating the sine and cosine functions is used, which provides greater accuracy for a variety of functions using a single set of implementation hardware. As a result, a cost benefit is realized because different functions may be implemented using the same hardware, while still achieving the same or better accuracy.
The foregoing description of a preferred embodiment of the invention has been presented for purposes of illustration and description; it is not intended to be exhaustive or to limit the invention to the precise form disclosed. The description was selected to best explain the principles of the invention and their practical application, to enable others skilled in the art to best utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated. For example, arithmetic functions other than those listed above may be implemented in accordance with the present invention. Thus, it is intended that the scope of the invention not be limited by the specification, but be defined by the claims set forth below.
This is a Continuation-In-Part of application Ser. No. 09/918,346, entitled TECHNIQUE FOR APPROXIMATING FUNCTIONS BASED ON LAGRANGE POLYNOMIALS, having inventor David B. Clifton and a filing date of Jul. 30, 2001, now U.S. Pat. No. 6,976,043, owned by the instant assignee and herein incorporated by reference.
Number | Name | Date | Kind |
---|---|---|---|
4583188 | Cann et al. | Apr 1986 | A |
5041999 | Nakayama | Aug 1991 | A |
5068816 | Noetzel | Nov 1991 | A |
5831878 | Ishida | Nov 1998 | A |
5951629 | Wertheim et al. | Sep 1999 | A |
6115726 | Ignjatovic | Sep 2000 | A |
6240433 | Schmookler et al. | May 2001 | B1 |
6373535 | Shim et al. | Apr 2002 | B1 |
6711601 | Inoue et al. | Mar 2004 | B2 |
Number | Date | Country | |
---|---|---|---|
20050071401 A1 | Mar 2005 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 09918346 | Jul 2001 | US |
Child | 10987487 | US |