This invention relates to the field of data processing systems. More particularly, this invention relates to the field of data processing systems supporting vector floating point arithmetic.
It is known to perform vector normalisation operations upon a floating point vector V to generate a normalised vector that has unit length and points in the same direction as the vector V. This vector normalisation can be performed as the following sequence of calculations:
While the above sequence of operations work well for idealised mathematical real numbers, there is a problem that floating point numbers only represent mathematical real numbers within a limited range and with a limited degree of precision. A particular problem in the context of the above described vector normalisation technique is that the dot-product may overflow or underflow resulting in at least a loss of precision in the final result and potentially an unacceptable error.
It is desirable that whatever approach is taken to address this problem, there should not be an additional inaccuracy introduced in the determination of the normalised vector and that the amount of additional overhead, such as circuit area and processing time, should not increase unduly.
Viewed from one aspect the present invention provides an apparatus for processing data comprising:
processing circuitry configured to perform processing operations upon data values; and
decoder circuitry coupled to said processing circuitry and configured to decode program instructions to generate control signals for controlling said processing circuitry to perform processing operations specified by said program instructions; wherein
said decoder circuitry decodes an approximate reciprocal value generating instruction to generate control signals to control said processing circuitry to perform a processing operation upon a floating point number with an integer exponent value E and a mantissa value M to generate an approximate reciprocal value with an exponent C that is dependent upon E and a mantissa that represents 1.
The present technique recognises that calculating a reciprocal value of a floating point number is useful when performing normalisation as by multiplying each of the vector components by the reciprocal of the magnitude of the largest of the vector components before the normalisation is performed, the likelihood of overflow or underflow is reduced. However, a problem with this approach is that it may be computationally intensive to compute the reciprocal and there may be a loss of precision associated with the manipulation of the mantissa value of the vector components. The approximate reciprocal value generating instruction addresses these problems by determining an approximate reciprocal value with an exponent that is dependent upon the exponent of the component vector and a mantissa that represents a constant value of 1 (e.g. in the IEEE 754 format a mantissa of all zero bits which when combined with the implied leading “1” represents a value of “1”). Such an approximate reciprocal value may be used to scale the vector components such that they have values which are safe from overflow and underflow. Furthermore, the value of the mantissa being one has the result that the mantissa values of the vector components are not altered. Furthermore, the overhead associated with executing such an approximate reciprocal value generating instruction which is dependent upon the exponent value of the input floating point number may be relatively low compared to a full reciprocal value instruction.
In some embodiments, the exponent C of the approximate reciprocal value may be determined as −E. However, it is known that the integer exponent value of the input floating point number may be subject to a predetermined offset (e.g. in accordance with the IEEE754 Standard) and in this context the exponent value of the approximate reciprocal value may be formed as a bitwise inversion of the integer exponent value E. Such a bitwise inversion can be determined with little overhead and produces a value for the approximate reciprocal value that will render safe from underflow and overflow a normalisation operation.
Viewed from another aspect the present invention provides an apparatus for processing data comprising:
processing means for performing processing operations upon data values; and
decoder means for decoding program instructions to generate control signals for controlling said processing circuitry to perform processing operations specified by said program instructions; wherein
said decoder means decodes an approximate reciprocal value generating instruction to generate control signals to control said processing means perform a processing operation upon a floating point number with an integer exponent value E and a mantissa value M to generate an approximate reciprocal value with an exponent C that is dependent upon E and a mantissa that represents 1.
Viewed from a further aspect the present invention provides a method of processing data comprising the steps of:
performing processing operations upon data values; and
decoding program instructions to generate control signals for controlling said processing operations specified by said program instructions; wherein
said decoding step decodes an approximate reciprocal value generating instruction to generate control signals to control said processing step to perform a processing operation upon a floating point number with an integer exponent value E and a mantissa value M to generate an approximate reciprocal value with an exponent C that is dependent upon E and a mantissa that represents 1.
Viewed from a further aspect the present invention provides an apparatus for processing data comprising:
processing circuitry configured to perform processing operations upon data values; and
decoder circuitry coupled to said processing circuitry and configured to decode program instructions to generate control signals for controlling said processing circuitry to perform processing operations specified by said program instructions; wherein
said decoder circuitry decodes a modified multiply instruction that has as input operands two floating point numbers to generate control signals to control said processing circuitry when one of said two floating point numbers is a signed zero value given by (−1)SZ*0, where SZ is a sign value of said signed zero value, and another of said two floating point numbers is a signed infinity value (−1)SI*∞, where SI is a sign value of said signed infinity value, to generate as a modified multiply result value a predetermined value given by (−1)(SZ+SI)*PSV, where PSV is a predetermined substitute value.
When normalising a floating point vector it is necessary to perform multiplication operations, such as when scaling the input vector components to ensure they do not overflow or underflow. In this context, the handling of zero values and infinities becomes significant. When a signed zero value is multiplied by a signed infinity using a modified multiply instruction, the modified multiply result generated is given by an appropriately signed predetermined substitute value. This predetermined substitute value may then be subject to further processing during the normalisation operation facilitating the generation of an appropriate normalised vector.
The predetermined substitute value could have a variety of values. In some embodiments, the predetermined substitute value may be one, but in other embodiments it may be more convenient to generate the predetermined substitute value as two.
The behaviour of the modified multiply instruction described above, deviates from the normal floating point standards in the case of multiplying a signed zero by a signed infinity. However, for other values of the two input floating point numbers, the modified multiply instruction may operate in accordance with the IEEE Standard 754.
The modified multiply instruction may be a scalar instruction, but in other embodiments may be a vector instruction operating on a plurality of sets of input operands as this is useful in improving the speed and code density of normalisation operations at which the modified multiply instruction is targeted.
Viewed from another aspect the present invention provides an apparatus for processing data comprising:
processing means for performing processing operations upon data values; and
decoder means for decoding program instructions to generate control signals for controlling said processing means to perform processing operations specified by said program instructions; wherein
said decoder means decodes a modified multiply instruction that has as input operands two floating point numbers to generate control signals to control said processing means when one of said two floating point numbers is a signed zero value given by (−1)SZ*0, where SZ is a sign value of said signed zero value, and another of said two floating point numbers is a signed infinity value (−1)SI*∞, where SI is a sign value of said signed infinity value, to generate as a modified multiply result value a predetermined value given by (−1)(SZ+SI)*PSV, by where PSV is a predetermined substitute value.
Viewed from a further aspect the present invention provides a method of data comprising the steps of:
performing processing operations upon data values; and
decoding program instructions to generate control signals for controlling said processing operations specified by said program instructions; wherein
said decoding decodes a modified multiply instruction that has as input operands two floating point numbers to generate control signals to control said processing step when one of said two floating point numbers is a signed zero value given by (−1)SZ*0, where SZ is a sign value of said signed zero value, and another of said two floating point numbers is a signed infinity value (−1)SI*∞, where SI is a sign value of said signed infinity value, to generate as a modified multiply result value a predetermined value given by (−1)(SZ+SI)*PSV, where PSV is a predetermined substitute value.
Viewed from a further aspect the present invention provides a method of operating a data processing apparatus to normalise a vector floating point value having a plurality of components, each of said plurality of components including an integer exponent value and a mantissa value, said method comprising the steps of:
calculating a scaling value in dependence upon said vector floating point value;
scaling each of said plurality of components in dependence upon said scaling value to generate a scaled vector floating point value having a plurality of scaled components;
calculating a magnitude of said scaled vector floating point value; and
dividing each of said plurality of scaled components by said magnitude to generate a normalised vector floating point value; wherein
said step of calculating a scaling value generates a scaling value of 2C, where C is an integer value selected such that a sum of squares of said plurality of scaled components is less than a predetermined limit value.
This technique provides a method of operating a data processing apparatus suitable for normalising a vector floating point value having a plurality of components that serves to maintain accuracy of the normalised value while avoiding overflow or underflow and avoiding the introduction of undue additional overhead. In particular, the technique calculates a scaling value by which each of the components of the input vector are scaled to produce a plurality of scaled components before those scaled components are normalised. The scaling value is chosen as 2C where C is an integer value. Selecting such a scaling value avoids manipulation of the mantissa of the components thereby preserving their accuracy and reducing processing overhead. The value of C is selected such that the sum of the squares of the plurality of scaled components is less than a predetermined limit value so as to avoid overflow and underflow.
The predetermined limit value may be a maximum size floating point number that can be represented with the exponent value and the mantissa value of the floating point numbers being manipulated.
The step of calculating the scaling factor may include the step of identifying the highest integer exponent value B of the plurality of component values. The component values may be scaled by a scaling factor dependent upon the largest of the input components identified by such a step. In this context, the scaling factor may set C as equal to −B, where B is the highest integer exponent value of the plurality of components.
In some embodiments the exponent values are subject to a predetermined integer offset (e.g. in accordance with the IEEE Standard 754) and in this context it may be possible to generate the scaling factor C as equal to a bitwise inversion of the largest exponent value B of any of the input components.
When scaling each of the plurality of components, a multiplication may be performed. Such multiplications may identify the case of multiplying a signed zero by a signed infinity and in this circumstance generating a corresponding component within the scaled floating point vector to have a predetermined value which preserves the sign result and uses a predetermined substitute value as the magnitude. This helps preserve the vector direction within the normalised vector.
Viewed from another aspect the present invention provides an apparatus for normalising a vector floating point value having a plurality of components, each of said plurality of components including an integer exponent value and a mantissa value, said apparatus comprising processing circuitry configured to perform the steps of:
calculating a scaling value in dependence upon said vector floating point value;
scaling each of said plurality of components in dependence upon said scaling value to generate a scaled vector floating point value having a plurality of scaled components;
calculating a magnitude of said scaled vector floating point value; and
dividing each of said plurality of scaled components by said magnitude to generate a normalised vector floating point value; wherein
said step of calculating a scaling value generates a scaling value of 2C, where C is an integer value selected such that a sum of squares of said plurality of scaled components is less than a predetermined limit value.
Viewed from a further aspect the present invention provides an apparatus for normalising a vector floating point value having a plurality of components, each of said plurality of components including an integer exponent value and a mantissa value, said apparatus comprising processing means for performing the steps of:
calculating a scaling value in dependence upon said vector floating point value;
scaling each of said plurality of components in dependence upon said scaling value to generate a scaled vector floating point value having a plurality of scaled components;
calculating a magnitude of said scaled vector floating point value; and
dividing each of said plurality of scaled components by said magnitude to generate a normalised vector floating point value; wherein
said step of calculating a scaling value generates a scaling value of 2C, where C is an integer value selected such that a sum of squares of said plurality of scaled components is less than a predetermined limit value.
It will be appreciated that one complementary aspect of the present invention may provide a virtual machine comprising a computer program executing a program to provide an apparatus as set out in one or more of the aspects of the invention discussed above. Another complementary aspect of the invention may be a computer program product having a non-transitory form for storing a computer program for controlling a data processing apparatus to perform data processing in response to program instructions in accordance with the above described techniques.
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:
It will be appreciated by those in the technical field that the central processing unit 102 will typically include many further processing circuits, but these have been omitted from
It is possible to scale the input vector by a scaling vector k before its magnitude is determined. In this case, each of the component values is multiplied by k before it is summed with the other component values and then the square root taken. By selecting the scaling vector k appropriately it is possible to ensure that an out-of-range hazard does not occur in the calculation of the square root of the sum of the squares. Furthermore, if the scaling vector is chosen to have a mantissa value of 1 (i.e. k=2C), then problems of rounding inaccuracies in manipulating mantissa values may also be reduced. The scaling vector k may be selected such that the sum over all the vector components of (2Cai)2 is less than or equal to the maximum value that can be represented using the floating point format concerned.
The exponent value is set to C such that the approximate reciprocal value becomes 2C. C may be chosen to be −E, where E is the exponent value of the input floating point number of which the approximate reciprocal value is being calculated. In this case an additional operation is required to convert +/− infinity values to +/− zero values. However, in other embodiments where the exponent value is subject to a predetermined integer offset as illustrated in
When the scaling value is multiplied by the individual component values, it is important that zero values and infinities should be handled in a manner appropriate to vector normalisation. This may be achieved using a modified multiplication instruction FMULX as illustrated in
At step 116 an approximate reciprocal value generating instruction is applied to the component corresponding to the maximum absolute value selected at step 114 and the approximate reciprocal value generated is used as the scaling value. Step 118 uses a modified multiplication instruction to perform a modified multiplication of all the original input vector components by the approximate reciprocal value so as to generate scaled components. Step 120 multiplies each scaled component by itself to form a scaled square value. Step 122 sums these scaled squares to form a scaled dot product of the original input vector. Step 124 determines a square root of this sum. The output of this square root determination is the magnitude of the scaled input vector. Step 126 then divides each scaled component value by the output of the square root determination to form a normalised component.
The memory 4 stores a graphics program 12 and graphics data 14. In operation, program instructions from the graphics program 12 are fetched by the graphics processing unit core 2 and supplied to the instruction decoder 10. The instruction decoder 10 decodes these program instructions and generates control signals 16 which are applied to the processing circuitry in the form of a floating point arithmetic pipeline 6 and the bank of floating point registers 8 to configure and control this processing circuitry 6, 8 to perform the desired processing operation specified by the program instruction concerned. This processing operation will be performed upon the data values from the graphics data 14 which are loaded to and stored from the bank of floating point registers 8 for manipulation by the floating point arithmetic pipeline 6.
As will be understood by those in this technique field, depending upon the program instruction received, the instruction decoder 10 will generate control signals 16 to configure the processing circuitry 6, 8 to perform a particular desired processing operation. These processing operations could take a wide variety of different forms, such as multiplies, additions, logical operations, vector variants of the preceding operations and others. In accordance with the present techniques, the instruction decoder 10 is responsive to argument reduction instructions fetched from the memory 4 as part of the graphics program 12 to perform processing operations as will be described below. It will be appreciated that the circuits which perform these desired processing operations can have a wide variety of different forms and the present technique encompasses all of these different forms. In particular, a result value described with reference to a particular sequence of mathematical operations could be generated by following a different set of mathematical operations which produce the same result value. These variants are included within the present techniques.
The present techniques exploit the realisation that the numerator and denominator of the expression illustrated in line 24 will both be scaled by the same factor if the input vector is scaled. A mathematically convenient and low power, low overhead form of scaling which may be applied to the input vector 18 is a change in the exponent value corresponding to a scaling of the input vector 18 by a power of two. As this scaling has no effect upon the normalised vector 20, the scaling value selected can be such as to protect the dot-product from overflow or underflow. The exponent shift value C (a number added to or subtracted from the exponent value of all the input vector components) utilised can thus be selected within a range so as to ensure that a dot-product calculated from a vector which has been subject to the argument reduction instruction will result in no overflows or underflows with an adverse effect on the final dot-product result.
The value selected for C in this argument reduction instruction may vary within a permitted range. Any value of C within this permitted range would be acceptable. This range is delimited by identifying a value B which is a maximum exponent value among the input components and then setting C as an integer such that B+C is less than 190 (corresponding to a value Edotmax) and such that B+C is greater than 64 (corresponding to Edotmin). The value 190 in this example corresponds to a first predetermined value and the value 64 corresponds to a second predetermined value. The value of C is chosen to be an integer such that B+C lies between the first predetermined value and the second predetermined value. This sets the magnitude of the largest result component to a range that is safe from overflow and underflow. The end points of the acceptable range may be adjusted in embodiments in which it is desired to protect a dot-product composed of a sum of the multiples of many result components from overflow (this risk increases as the vector length increases).
If the input vector is free from not-a-number components and infinity components as checked at steps 26 and 30, then processing proceeds to step 34 where an upper most P bits of the exponent values of each of the input components is extracted to form values Ehoi. Step 36 then sets a value B to be a maximum of the Ehoi values extracted at step 34. Step 38 sets an exponent shift value C to be 2(P-1)−B. This determined/selected exponent shift (scaling factor) is then applied to all of the input vector components in the remainder of the flow diagram. At step 40 an index value i is set to 0. Step 42 then selects the Ehoi value for the vector component corresponding to the current value of i and adds to this the value of C derived at step 38. Step 44 determines if the updated value of Ehoi is less than zero. If the value is less than zero, then step 46 sets the corresponding result vector component vi to be zero. If the determination at step 44 is that Ehoi is not less than zero or after step 46, then processing proceeds to step 48 where a determination is made as to whether or not there are any more input vector components vi requiring adjustment. If there are further such components, then step 50 increments the value of i and processing returns to step 42.
Step 54 initialise the value of i. Step 56 determines if the input vector component for the current value i is a positive infinity. If a determination at step 56 is that the input vector component is a positive infinity, then step 58 sets the corresponding result vector component to be +1. Processing then proceeds to step 60 where if there are any more input vector components to process, step 62 increments the value of i and processing returns to step 56. If there are no more input vector components to process then the infinity exception handling has completed.
If the determination at step 56 is that the current input vector component vi is not a positive infinity, then step 64 checks to see if this value is a negative infinity. If the value is a negative infinity, then step 66 sets the corresponding result component to −1.
If neither step 56 nor step 64 has detected an infinity value, then step 68 serves to set any non-infinity component within the result vector to have a magnitude of 0.
Step 74 generates a reciprocal square root of the scalar product. Step 76 then multiplies each of the scaled components (result components) by the reciprocal square root value generated at step 76. Comparison of the processing of
Number | Date | Country | Kind |
---|---|---|---|
1016071.1 | Sep 2010 | GB | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/GB2011/050497 | 3/14/2011 | WO | 00 | 5/22/2013 |