This application claims foreign priority under 35 U.S.C. 119 from United Kingdom patent application No. 2316740.6 filed on 1 Nov. 2023, the contents of which are incorporated by reference herein in their entirety.
The invention relates to rounding a floating-point number, and in particular to rounding a number represented in Extended Exponent Range (EER).
Most computing systems use number formats, typically in binary notation or base 2, for performing various computations. These number formats include fixed-point or floating-point number formats. A fixed-point number format can be relatively straightforward to work with, but can only represent a limited range of values. Therefore, floating-point number formats are used in most of the modern computing systems to provide a trade-off between range and precision.
A floating-point number comprises a mantissa (m) having a bit length of ‘b’ bits, an exponent (e) having a bit length of ‘a’ bits and optionally a sign bit (s) to represent a binary number. In some widely used formats such as the IEEE-754 standard, the exponent is biased (i.e. offset) by a value (c) so as to represent numbers smaller than 1, and the exponent is also used to encode exceptional values at its end points. The bias (c) is typically calculated as $2^{k-1}-1$, where k is the number of bits in the exponent. For non-extremal values of e, a normal floating-point number x in IEEE-754 standard format represents $(-1)^{s} \, 2^{e-c} \, (1.m)$. The number is generally normalized and features a nonzero leading bit before the radix point, denoted 1.m in binary and as the significand n. When explicitly representing the whole significand, the first bit (a 1 in the case of normal numbers) may be termed the ‘leading significand bit’ to distinguish it from the mantissa bits also represented in the significand. For a 32-bit floating-point number, the value of the (biased) exponent is limited to a range of −127 to 128. However, in such cases, numbers smaller than $1.0 \times 2^{-127}$ are represented or considered as zero. Hence, for extremal values of e, i.e. when e=0, the floating-point number x in IEEE-754 standard can be represented as a denormal number $(-1)^{s} \, 2^{1-c} \, (0.m)$ (in other words, the leading significand bit of the explicitly represented whole significand is always a 0 for denormal numbers). This includes the value 0 which is obtained by letting m=0. Representing the floating-point number as a denormal number is convenient for enforcing gradual underflow. Thus, floating-point numbers can be used to represent very small or very large numbers precisely using scientific notation, in binary or in some other base. The use of floating-point numbers in arithmetic computations provides varying degrees of precision depending on the bit length or type of floating-point format used.
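By way of illustration only, the normal and denormal interpretations described above can be sketched in Python for the half-precision case (five exponent bits, ten mantissa bits). The function name `decode_f16` is hypothetical, and the sketch deliberately leaves the exceptional encodings at the top end of the exponent range unhandled.

```python
def decode_f16(bits: int) -> float:
    """Decode a 16-bit pattern as a finite IEEE-754 half-precision value."""
    EXP_BITS, MAN_BITS = 5, 10
    bias = (1 << (EXP_BITS - 1)) - 1            # c = 2^(k-1) - 1 = 15
    s = (bits >> (EXP_BITS + MAN_BITS)) & 0x1   # sign bit
    e = (bits >> MAN_BITS) & ((1 << EXP_BITS) - 1)
    m = bits & ((1 << MAN_BITS) - 1)
    if e == (1 << EXP_BITS) - 1:
        # Assumption of this sketch: infinities and NaNs are out of scope.
        raise ValueError("infinity/NaN encodings are not handled")
    fraction = m / (1 << MAN_BITS)              # the value 0.m
    if e == 0:
        # Denormal: (-1)^s * 2^(1-c) * 0.m, leading significand bit is 0.
        magnitude = fraction * 2.0 ** (1 - bias)
    else:
        # Normal: (-1)^s * 2^(e-c) * 1.m, leading significand bit is 1.
        magnitude = (1.0 + fraction) * 2.0 ** (e - bias)
    return -magnitude if s else magnitude
```

For instance, the pattern with e=0 and m=0001010000 decodes to 2^(−18)+2^(−20), and the all-zero pattern decodes to 0.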
However, representing the floating-point number as a denormal number adds latency, area and cost to the critical path of the design, especially for computations in which the denormal number is normalized or rounded. This is because the leading 1 of the denormal number is in an arbitrary position. Consider the case when performing an arithmetic operation such as multiplication of two denormal numbers, or multiplication of a normal number and a denormal number, which may produce an output denormal number. Further, the output denormal number may need to be normalized or rounded. In order to perform operations such as normalizing the output denormal number, extra steps are performed to find the position of the leading 1. This increases the delay on the critical path and therefore such operations are expensive.
The same considerations apply when performing operations such as rounding the denormal number. Rounding is a process of replacing a precise number with an approximate value having a shorter, simpler, or more explicit representation. There are many known techniques or methods of performing rounding. Some of the methods include: Round Up; Round Down; Round Towards Zero; Round Away from Zero; Round To Nearest, Ties To Even and the like. While rounding a number, it is decided whether a number should be rounded down to the lower approximate value or rounded up to the upper approximate value based on the number of bits to which a floating-point number is rounded. The mantissa (m) is either truncated to round down or incremented by 1 and truncated to round up. A combination of guard, round and sticky bits (which is explained later) of the floating-point number is used to choose between the round up or round down option. Normally to save time in the critical path, in the case of normal numbers, the truncated output and the truncated incremented output are generated simultaneously. Further, based on the guard, round and sticky bits, one of the truncated output and the truncated incremented output is mux-ed out as the rounded output number. However, this is not possible in case of a denormal number where in the absence of additional processing the increment by 1 applies in an arbitrary position.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
According to a first aspect there is provided a method of rounding a floating-point number in an Extended Exponent Range, herein “EER”, that would be a denormal floating-point number represented in an Unextended Exponent Range, herein “UER”. The method may comprise the steps of receiving, at an arithmetic unit, a plurality of input numbers in the EER representation, each input number comprising a sign bit (si), exponent bits (ei) and mantissa bits (mi); performing an arithmetic operation to produce an output number in the EER representation comprising a sign bit (sa), exponent bits (ea) and mantissa bits (ma); constructing a rounding mask based on the exponent bits (ea) computed by the arithmetic operation; and applying the rounding mask to the output number in the EER representation to round the output number to the correct position as if rounding in the UER representation.
Optionally, each number among the plurality of input numbers is one of a normal number or a denormal number in EER representation.
Optionally, the output number is one of a normal number or a denormal number in EER representation.
Optionally, the rounding mask is a string of zeros and ones.
Optionally, constructing the rounding mask comprises pre-aligning the rounding mask with a leading 1 at the position of the weight $2^{1-\mathrm{bias}-mw}$, where mw is the number of mantissa bits and bias is the exponent bias in the UER representation.
Optionally, constructing the rounding mask comprises pre-aligning the rounding mask based on the exponent computed by the arithmetic operation (ea).
Optionally, constructing the predetermined rounding mask further comprises a step of normalizing the rounding mask by shifting the rounding mask to the left by the same number of bits required to normalize the output number.
Optionally, the method further comprises generating normalized mantissa bits (mr) of the output number.
Optionally, applying the rounding mask comprises performing a bitwise OR operation between the normalized rounding mask and normalized mantissa bits (mr) of the output number.
Optionally, the method further comprises deriving guard, round and sticky bits based on the normalized mantissa bits (mr) of the output number and the rounding mask.
Optionally, the method further comprises determining and selecting a truncated output number by truncating the normalized mantissa bits (mr) of the output number or a truncated incremented output number by incrementing the normalized mantissa bits (mr) of the output number and truncating the normalized mantissa bits (mr) of the output number.
Optionally, the selection is based on the derived guard, round and sticky bits.
Optionally, determining the truncated output number comprises setting the non-representable trailing bits of the normalized mantissa bits (mr) of the output number to zero.
Optionally, determining the truncated incremented output number comprises incrementing the normalized mantissa bits (mr) at the first representable position and setting all the below trailing bits to zero.
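Purely as an illustrative sketch, and not as the implementation of the first aspect, the effect of a rounding mask on the truncated output and the truncated incremented output can be shown in Python. The sketch assumes a mask with ones in every non-representable trailing position of the normalized mantissa; the exact alignment of the mask may differ from this assumption. The function name `round_with_mask` and the parameters `p` (number of representable mantissa bits) and `width` (mantissa bit length) are hypothetical.

```python
def round_with_mask(mantissa: int, p: int, width: int = 10):
    """Return (truncated, truncated_incremented) for a `width`-bit mantissa
    of which only the top `p` bits are representable (0 <= p <= width)."""
    if not 0 <= p <= width:
        raise ValueError("p must lie between 0 and width")
    # Assumed mask shape: ones over the non-representable trailing bits.
    mask = (1 << (width - p)) - 1
    # Truncation: clear every non-representable trailing bit.
    truncated = mantissa & ~mask
    # OR-ing with the mask fills the trailing bits with ones, so adding 1 at
    # the least significant bit ripples a carry up to the first representable
    # position and leaves all bits below it at zero.
    truncated_incremented = (mantissa | mask) + 1
    return truncated, truncated_incremented
```

With this shape of mask the increment is always applied at the least significant bit, so its position does not depend on the exponent. A result equal to `1 << width` indicates a carry out of the mantissa, which this sketch leaves to the caller.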
According to a second aspect there is provided a hardware implementation for rounding a floating-point number in an Extended Exponent Range, herein “EER”, that would be a denormal floating-point number represented in an Unextended Exponent Range, herein “UER”. The hardware implementation can comprise an arithmetic unit configured to: receive a plurality of input numbers in the EER representation, each input number comprising a sign bit (si), exponent bits (ei) and mantissa bits (mi); and perform an arithmetic operation to produce an output number in the EER representation comprising a sign bit (sa), exponent bits (ea) and mantissa bits (ma); a mask constructing unit configured to construct a rounding mask based on the exponent bits (ea) computed by the arithmetic operation; and a rounding unit configured to apply the rounding mask to the output number in the EER representation to round the output number to the correct position as if rounding in the UER representation.
Optionally, the mask constructing unit is configured to construct the rounding mask by pre-aligning the rounding mask based on the exponent bits (ea) computed by the arithmetic operation.
Optionally, the mask constructing unit is further configured to construct the rounding mask by performing a step of normalizing the rounding mask by shifting the rounding mask to the left by the same number of bits required to normalize the output number.
Optionally, the hardware implementation further comprises a renormalizing unit configured to generate normalized mantissa bits (mr) of the output number.
Optionally, the rounding unit is configured to apply the rounding mask by performing a bitwise OR operation between the normalized rounding mask and normalized mantissa bits (mr) of the output number.
According to a third aspect there is provided computer readable code configured to cause the method of any of the above-mentioned variations of the first aspect to be performed when the code is run.
According to a fourth aspect there is provided a computer readable storage medium having encoded thereon the computer readable code of the third aspect.
According to a fifth aspect there is provided a method of rounding a floating-point number represented in an Unextended Exponent Range, herein “UER”, representation when represented in an Extended Exponent Range, herein “EER”, representation. The method may comprise receiving, at an arithmetic unit, a plurality of input numbers in the EER representation, each input number comprising a sign bit (si), exponent bits (ei) and mantissa bits (mi); performing an iterative arithmetic operation to produce a partial output number of an actual output number in the EER representation comprising a sign bit (sa), exponent bits (ea) and mantissa bits (ma); storing a tracking value in a mask tracking unit; updating a rounding mask in a first register based on the tracking value in the mask tracking unit; and applying the updated rounding mask to each partial output number in each step to round the actual output number to the correct position as if rounding in the UER representation.
Optionally, each number among the plurality of input numbers is one of a normal number or a denormal number in EER representation.
Optionally, applying the updated rounding mask to the partial output number comprises performing a bitwise operation between the updated rounding mask and the partial output number.
Optionally, the rounding mask stored in the first register is determined based on the mantissa bit length and the step size.
Optionally, the rounding mask is a string of zeros and ones.
Optionally, updating the rounding mask comprises: deriving a predetermined tracking value from a memory to the mask tracking unit, wherein the tracking value is predetermined based on the exponent bits (ei) of the input numbers and step size; incrementing the tracking value in the mask tracking unit by the step size; and shifting the bits of the compressed rounding mask in the first register to the left by one bit position every time the tracking value in the mask tracking unit is incremented.
Optionally, updating the rounding mask further comprises indicating to the first register to stop shifting the compressed rounding mask when the tracking value overflows.
Optionally, updating the rounding mask further comprises fanning out the bits of the shifted rounding mask registered in the first register by a factor of step size to be aligned with the mantissa bits of the significand.
Optionally, fanning out the bits of the rounding mask in the first register is performed by padding the bits of the rounding mask to either side to align the rounding mask with the mantissa bits of the significand.
Optionally, updating the rounding mask further comprises shifting the fanned out rounding mask by a fixed amount based on a mask tracking offset of the tracking value.
Optionally, the mask tracking offset of the tracking value is determined based on the step size.
Optionally, the method further comprises deriving guard, round and sticky bits based on the mantissa bits of the actual output number and the rounding mask.
Optionally, the method further comprises determining and selecting a truncated output number by truncating the mantissa bits of the output number or a truncated incremented output number by incrementing the mantissa bits of the output number and truncating the mantissa bits of the output number.
Optionally, the selection is based on the derived guard, round and sticky bits.
Optionally, determining the truncated output number is performed by setting the non-representable trailing bits of the mantissa bits of the output number to zero.
Optionally, determining the truncated incremented output number is performed by incrementing the mantissa bits at the first representable position and setting all the below trailing bits to zero.
According to a sixth aspect there is provided a hardware implementation for rounding a floating-point number represented in an Unextended Exponent Range, herein “UER”, representation when represented in an Extended Exponent Range, herein “EER”, representation.
The hardware implementation can comprise an arithmetic unit configured to: receive a plurality of input numbers in the EER representation, each input number comprising a sign bit (si), exponent bits (ei) and mantissa bits (mi); and perform an iterative arithmetic operation to produce a partial output number of an actual output number in the EER representation comprising a sign bit (sa), exponent bits (ea) and mantissa bits (ma); a mask constructing unit comprising: a first register configured to store a rounding mask; and a mask tracking unit configured to store a tracking value; wherein the mask constructing unit is configured to update the rounding mask based on the tracking value in the mask tracking unit; and a rounding unit configured to apply the rounding mask to each partial output number in each step to round the actual output number to the correct position as if rounding in the UER representation.
Optionally, the rounding unit applies the updated rounding mask to the partial output number by performing a bitwise operation between the updated rounding mask and the partial output number.
Optionally, the first register stores the rounding mask pre-determined based on the mantissa bit length and the step size.
Optionally, the mask construction unit updates the rounding mask by: deriving a predetermined tracking value from a memory to the mask tracking unit, wherein the tracking value is predetermined based on the exponent bits (ei) of the input numbers and step size; incrementing the tracking value in the mask tracking unit by the step size; and shifting the bits of the compressed rounding mask in the first register to the left by one bit position every time the tracking value in the mask tracking unit is incremented.
Optionally, the mask construction unit updates the rounding mask by further indicating to the first register to stop shifting the compressed rounding mask when the tracking value overflows.
Optionally, the mask construction unit further comprises a mask fanout unit configured to update the rounding mask by fanning out the bits of the shifted rounding mask registered in the first register by a factor of step size to be aligned with the mantissa bits of the significand.
Optionally, the mask fanout unit fans out the bits of the rounding mask in the first register by padding the bits of the rounding mask to either side to align the rounding mask with the mantissa bits of the significand.
Optionally, the mask construction unit further comprises a shifter configured to update the rounding mask by shifting the fanned out rounding mask by a fixed amount based on a mask tracking offset of the tracking value.
According to a seventh aspect there is provided computer readable code configured to cause the method of any of the above-mentioned variations of the fifth aspect to be performed when the code is run.
According to an eighth aspect there is provided a computer readable storage medium having encoded thereon the computer readable code of the seventh aspect.
According to a ninth aspect there is provided a method of converting the format of a floating-point number. The method may comprise: receiving an input floating-point number in a first floating-point format, the input floating-point number comprising a sign bit (si), exponent bits (ei) and mantissa bits (mi); constructing a rounding mask based on the exponent bits (ei) of the input number; and applying the rounding mask to the input floating-point number to round the input floating-point number for correct representation in a second floating-point format.
Optionally, the rounding mask is a string of zeros and ones.
Optionally, the rounding mask comprises a leading 1 at the position of the weight $2^{1-\mathrm{bias}-mw}$.
Optionally, the method further comprises deriving guard, round and sticky bits based on the mantissa bits (mi) of the input floating-point number and the rounding mask.
Optionally, applying the rounding mask to the input floating-point number to round the input floating-point number for correct representation in a second floating-point format further comprises determining and selecting either a truncated output number by truncating the mantissa bits (mi) of the input floating-point number or a truncated incremented output number with an incremented least significant bit compared to the truncated output number.
Optionally, the selection is based on the derived guard, round and sticky bits.
Optionally, the mantissa bits (mi) of the input floating-point number comprise trailing bits that are non-representable in the second floating-point format, and determining the truncated output number comprises applying the mask to the mantissa bits (mi) of the input floating-point number to set the non-representable trailing bits to zero.
Optionally, the mantissa bits (mi) of the input floating-point number comprise a bit that is at the least significant representable position in the second floating-point format, and wherein determining the truncated incremented output number comprises applying the mask to the mantissa bits (mi) of the input floating-point number to increment the bit that is at the least significant representable position in the second floating-point format and to set any less significant bits to zero.
Optionally, constructing the rounding mask comprises pre-aligning the rounding mask based on the exponent bits (ei) of the input number.
Optionally, applying the rounding mask to the input floating-point number to round the input floating-point number for correct representation in a second floating-point format comprises: generating normalized mantissa bits in the second floating-point format based on the input floating-point number; normalizing the rounding mask based on a bit shift required to normalize the input floating-point number in the second floating-point format; and applying the normalized rounding mask to the normalized mantissa bits in the second floating-point format.
According to a tenth aspect there is provided a hardware implementation for converting the format of a floating-point number. The hardware implementation can comprise: an input configured to receive an input floating-point number in a first floating-point format, the input floating-point number comprising a sign bit (si), exponent bits (ei) and mantissa bits (mi); a mask alignment unit configured to construct a rounding mask based on the exponent bits (ei) of the input number; and a rounding unit configured to apply the rounding mask to the input floating-point number to round the input floating-point number for correct representation in a second floating-point format.
Optionally, the mask constructing unit is configured to construct the rounding mask by pre-aligning the rounding mask based on the exponent bits (ei) of the input numbers.
Optionally, the mask constructing unit is further configured to perform a step of normalizing the mask by shifting the rounding mask based on a bit shift required to normalize the input floating-point number in the second floating-point format.
Optionally, the rounding unit is configured to generate normalized mantissa bits in the second floating-point format based on the input floating-point number.
Optionally, the rounding unit is configured to apply the normalized rounding mask to the normalized mantissa bits in the second floating-point format.
According to an eleventh aspect there is provided computer readable code configured to cause the method of any of the above-mentioned variations of the ninth aspect to be performed when the code is run.
According to a twelfth aspect there is provided a computer readable storage medium having encoded thereon the computer readable code of the eleventh aspect.
The hardware implementation for rounding a floating-point number, and/or the hardware implementation for converting the format of a floating-point number, may be embodied in hardware on an integrated circuit. There may be provided a method of manufacturing, at an integrated circuit manufacturing system, a hardware implementation for rounding a floating-point number and/or a hardware implementation for converting the format of a floating-point number. There may be provided an integrated circuit definition dataset that, when processed in an integrated circuit manufacturing system, configures the system to manufacture a hardware implementation for rounding a floating-point number and/or a hardware implementation for converting the format of a floating-point number. There may be provided a non-transitory computer readable storage medium having stored thereon a computer readable description of a hardware implementation for rounding a floating-point number that, when processed in an integrated circuit manufacturing system, causes the integrated circuit manufacturing system to manufacture an integrated circuit embodying a hardware implementation for rounding a floating-point number. There may be provided a non-transitory computer readable storage medium having stored thereon a computer readable description of a hardware implementation for converting the format of a floating-point number that, when processed in an integrated circuit manufacturing system, causes the integrated circuit manufacturing system to manufacture an integrated circuit embodying a hardware implementation for converting the format of a floating-point number.
There may be provided an integrated circuit manufacturing system comprising: a non-transitory computer readable storage medium having stored thereon a computer readable description of the hardware implementation for rounding a floating-point number; a layout processing system configured to process the computer readable description so as to generate a circuit layout description of an integrated circuit embodying the hardware implementation for rounding a floating-point number; and an integrated circuit generation system configured to manufacture the hardware implementation for rounding a floating-point number according to the circuit layout description. There may be provided an integrated circuit manufacturing system comprising: a non-transitory computer readable storage medium having stored thereon a computer readable description of the hardware implementation for converting the format of a floating-point number; a layout processing system configured to process the computer readable description so as to generate a circuit layout description of an integrated circuit embodying the hardware implementation for converting the format of a floating-point number; and an integrated circuit generation system configured to manufacture the hardware implementation for converting the format of a floating-point number according to the circuit layout description.
There may be provided computer program code for performing any of the methods described herein. There may be provided a non-transitory computer readable storage medium having stored thereon computer readable instructions that, when executed at a computer system, cause the computer system to perform any of the methods described herein.
The above features may be combined as appropriate, as would be apparent to a skilled person, and may be combined with any of the aspects of the examples described herein.
Examples will now be described in detail with reference to the accompanying drawings in which:
The accompanying drawings illustrate various examples. The skilled person will appreciate that the illustrated element boundaries (e.g., boxes, groups of boxes, or other shapes) in the drawings represent one example of the boundaries. It may be that in some examples, one element may be designed as multiple elements or that multiple elements may be designed as one element. Common reference numerals are used throughout the figures, where appropriate, to indicate similar features.
The following description is presented by way of example to enable a person skilled in the art to make and use the invention. The present invention is not limited to the embodiments described herein and various modifications to the disclosed embodiments will be apparent to those skilled in the art.
Embodiments will now be described by way of example only.
As discussed earlier, when performing arithmetic operations between two denormal numbers or a normal number and a denormal number, to normalize the output number extra steps are performed to find the position of the leading 1. For example, when two normal numbers ‘a’ and ‘b’ are multiplied to produce a normal number, then the result would have a leading 1 bit either at the Most Significant Bit (MSB) position or the second MSB position. Therefore, to normalize the output, the output is shifted by 1 position or 2 positions to the left as required.
However, when two denormal numbers or a normal number and a denormal number are multiplied, the output obtained may be a denormal number and hence will have a leading 1 bit at an arbitrary position between the MSB position and the Least Significant Bit (LSB) position along the mantissa bit length. Hence, in order to normalize the output, the output needs to be shifted by an arbitrary number of positions to the left depending on where the leading 1 bit is.
One way of dealing with this issue is to use a non-standard representation in which the exponent range is extended by 1 bit. This representation is known as Extended Exponent Range (EER) representation. In the EER representation, denormal numbers can be represented as normal numbers. The denormal numbers represented as normal numbers in EER representation are hereinafter also referred to as subnormal numbers. For example, a half-precision (F16) number 0.00000.0001010000 with one sign bit, five exponent bits and 10 mantissa bits, represents the value $2^{1-\mathrm{bias}} \times 0.0001010000_2 = 2^{-14} \times (2^{-4}+2^{-6}) = 2^{-18}+2^{-20}$ (in this format we have $\mathrm{bias} = 2^{5-1}-1 = 15$). This number can be represented in an extended exponent range F16_norm format as 0.001101.0100000000 with one sign bit, six exponent bits (i.e. with an extra exponent bit) and 10 mantissa bits. This format therefore has the larger bias of $\mathrm{bias}_{\mathrm{norm}} = 2^{6-1}-1 = 31$ and can hence represent unbiased exponent values below those of the original F16 format, as demonstrated (i.e. represented in F16_norm, 0.001101.0100000000 $= 2^{13-31} \times (1+2^{-2}) = 2^{-18}+2^{-20}$, which is the same value as the original denormal F16 number). This format can also represent all unbiased exponent values falling within the normal range of the original format. For example, also using half-precision, the number 1.10001.0011100001 $= -2^{17-15} \times 1.0011100001_2 = -100.11100001_2$ in F16 can be represented as 1.100001.0011100001 $= -2^{33-31} \times 1.0011100001_2 = -100.11100001_2$ in F16_norm.
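The re-encoding in the example above can be sketched in Python as follows. This is a minimal illustration only, assuming F16 as the UER format and F16_norm as the EER format; the function names `f16_to_eer` and `eer_value` are hypothetical, and the encoding of zero in the EER format is not covered because it is not specified above.

```python
F16_EXP_BITS, F16_MAN_BITS = 5, 10
F16_BIAS = (1 << (F16_EXP_BITS - 1)) - 1     # 2^(5-1) - 1 = 15
EER_BIAS = (1 << F16_EXP_BITS) - 1           # 2^(6-1) - 1 = 31

def f16_to_eer(s: int, e: int, m: int):
    """Re-encode F16 fields (s, e, m) as F16_norm fields (6-bit exponent)."""
    if e != 0:
        # Normal number: mantissa unchanged, exponent re-biased.
        return s, e - F16_BIAS + EER_BIAS, m
    if m == 0:
        raise ValueError("zero has no leading 1; not covered by this sketch")
    # Denormal: find the leading 1 and shift it into the implicit position.
    shift = F16_MAN_BITS + 1 - m.bit_length()
    m_norm = (m << shift) & ((1 << F16_MAN_BITS) - 1)   # drop the leading 1
    e_norm = (1 - F16_BIAS) - shift + EER_BIAS
    return s, e_norm, m_norm

def eer_value(s: int, e: int, m: int) -> float:
    """Value of an F16_norm number, which always has an implicit leading 1."""
    magnitude = (1.0 + m / (1 << F16_MAN_BITS)) * 2.0 ** (e - EER_BIAS)
    return -magnitude if s else magnitude
```

Applied to the denormal example, `f16_to_eer(0, 0, 0b0001010000)` gives the fields `(0, 0b001101, 0b0100000000)`, whose value is again 2^(−18)+2^(−20).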
Hence, denormal numbers can be represented as normal numbers in EER representation, and the numbers can therefore be normalized without incurring additional cost or latency. One main caveat of representing the denormal numbers as subnormal numbers (i.e., normal numbers in EER representation) is that the process of rounding becomes expensive, as the rounding position varies according to the new exponent value in the EER representation.
The output of an arithmetic operation which is a normal number may be rounded using any of the rounding modes discussed earlier. The output may be required to be rounded to the mantissa bit length that can be stored in a destination memory or register. Rounding of normal numbers is generally performed by choosing one of the two representable values (or options) that are surrounding the exact result obtained after an arithmetic operation. As discussed earlier, whether the number needs to be rounded down to a lower representable value or rounded up to a higher representable value is decided based on the guard, round, and sticky bits of the number. Different conventions for referring to different bits during rounding are known. For the purposes of this document, the following convention is observed: the bit at the rounding position (i.e., the least significant position in the mantissa bit length to which an output number is rounded after the radix point) is called a guard bit, the bit immediately to the right of the rounding position is known as the round bit and the bits further to the right after the round bit are reduced to form a sticky bit. The sticky bit is high if and only if at least one bit after the round bit is high.
Generally, to choose the lower representable value, we find a truncated value/output by truncating the mantissa after the rounding position, i.e. keeping the bits from the radix point up to (and including) the rounding position and removing the remaining bits. To choose the higher representable value, we find a truncated incremented value/output by incrementing the mantissa by 1 at the rounding position and then truncating it in the same way. To save time in the critical path, typically, the truncated output and the truncated incremented output are generated beforehand as the incrementing takes time, and based on the guard, round and sticky bits, one of the truncated output and the truncated incremented output is mux-ed out as the rounded output number. For example, in the ‘Round To Nearest, Ties To Even’ rounding scheme the truncated output is mux-ed out if the round bit is a ‘0’ bit and the truncated incremented output is mux-ed out if the original round bit is a ‘1’ bit and the sticky bit is a ‘1’ bit. When the round bit is a ‘1’ bit and the sticky bit is a ‘0’ bit, the exact result is exactly half-way between the two nearest representable values and the guard bit is used to resolve the tie; if the guard bit is a ‘0’ bit the truncated output is mux-ed out and if the guard bit is a ‘1’ bit the truncated incremented output is mux-ed out as the rounded output number.
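The selection described above can be sketched in Python for the ‘Round To Nearest, Ties To Even’ scheme, using the convention of this document in which the guard bit is the bit at the rounding position. The sketch operates on an integer significand and is illustrative only; the function name `round_nearest_even` and the parameter `drop` (the number of low bits to be discarded) are hypothetical.

```python
def round_nearest_even(sig: int, drop: int) -> int:
    """Round the integer significand `sig` by discarding its `drop` low bits."""
    if drop <= 0:
        return sig
    # Both candidates are formed up front; hardware would do this in parallel.
    truncated = sig >> drop
    truncated_incremented = truncated + 1
    guard = truncated & 1                              # bit at the rounding position
    round_bit = (sig >> (drop - 1)) & 1                # bit just below it
    sticky = int(sig & ((1 << (drop - 1)) - 1) != 0)   # OR of all lower bits
    # Round up when above half-way, or exactly half-way with an odd guard bit.
    round_up = round_bit and (sticky or guard)
    return truncated_incremented if round_up else truncated
```

The sketch does not renormalize the result if the increment carries out of the retained bits; that case would be handled together with the exponent.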
The output of an arithmetic operation which is a subnormal number (i.e., a denormal number represented as a normal number) may be rounded up or rounded down using a method similar to that explained above for normal numbers. However, to produce the same output values after rounding as would have been obtained by rounding the denormal number in its original format, instead of selecting between the truncated output and the truncated incremented output as above, it is necessary to truncate and/or increment at the appropriate arbitrary position (which is a higher position with fewer bits of precision) and to compute the guard, round and sticky bits at that position. This is because the rounding position, and hence the position of the guard, round and sticky bits, is not fixed but depends on the exponent of the output pre-rounding. In other words, whilst use of an extended exponent range simplifies the use of denormal numbers in mathematical operations such as addition or multiplication, rounding remains difficult because the rounding needs to be done to give a consistent result with that which would have been obtained by performing the operation in the original exponent range.
Therefore, in order to perform rounding of the subnormal number the rounding position needs to be determined first. The rounding position could be p bits after radix point, rather than the mantissa bit length of the output number, which is an arbitrary point due to the extended exponent range. Thus, rounding a subnormal number becomes more expensive as extra steps are required to find the arbitrary rounding point and additional hardware requirements are needed to perform rounding as explained below. Further, it is no longer possible to determine the truncated output and the truncated incremented output beforehand as the rounding position could be in any arbitrary point.
The rounding unit 100 in
Thus, as the rounding position is at an arbitrary position, a large general purpose adder is required for incrementing the output at the arbitrary position p, which is an expensive operation in the critical path.
Instead of that, the inventors devised a method of computing/generating a rounding mask, aligning the rounding mask with the mantissa and applying the rounding mask directly on the mantissa such that complex hardware can be eliminated from the critical path. To perform the rounding, the computed mask is directly applied to the output subnormal number (i.e., the number converted to normal number in the EER representation) such that the result is always incremented at a fixed position thereby generating (after truncation) the rounded output number rounded to the correct position. Thus, by using a rounding mask, instead of incrementing and truncating at an arbitrary position (which requires the use of a general-purpose adder), a more efficient and cost effective operation is performed with a much simpler circuit to determine the truncated incremented output. The same rounding mask can be used to determine truncated output number or identify the guard, round and sticky bits.
The generated mask has a bit length equal to at least the bit length of the significand n. The exact position where the strings of ones or zeros starts depends on the type of rounding to be performed. In some cases, the string of ones following a string of zeros may start at a position corresponding to a position p+1 bits from the radix point of the mantissa bits (where the first bit of the mantissa after the radix point is counted as position 1—i.e. a 1-indexed system). In this case, because the mask includes a bit corresponding to the leading 1 of the significand, i.e. the 1 bit before the radix point, the string of ones will begin p+2 bits into the mask. For example, when rounding the output number using a Round Towards Away From Zero (RTA) mode, the rounding mask is generated such that the string of ones starts at a position corresponding to a position p+1 bits from the radix point of the mantissa bits after the radix point. In another case, the string of ones following the string of zeros may start at a position corresponding to a position p+2 bits from the radix point of the mantissa bits. In this case, because the mask includes a bit corresponding to the leading 1 of the significand, i.e. the 1 bit before the radix point, the string of ones will begin p+3 bits into the mask. For example, when a Round to Nearest Even (RNE) rounding method is performed on the output number, the rounding mask is generated such that the string of ones starts at a position corresponding to a position p+2 bits from the radix point of the mantissa bits. This position where the strings of ones starts may vary based on the rounding modes used.
Once the rounding mask is generated, the rounding logic 202 applies the rounding mask directly to the significand n (in other words the mantissa bits m) of the output number by performing a bitwise operation. In an example, a bitwise OR operation is performed. As discussed, the mantissa m is used to refer to fractional part (excluding implicit leading one) of the significand. Using a mask in which a string of 1s begins after the rounding point, the result of performing the bitwise OR between the rounding mask and the mantissa m is that all the bits of the mantissa of the output number after the rounding point from the radix point are set to 1 and all the bits of the mantissa before the truncation point from the radix point are preserved as they are. The output from the rounding logic 202 is provided to the incrementor 204. The incrementor 204 increments the output from the rounding logic 202 by 1 bit at the LSB position such that the carryover from the increment will ripple all the way up to the required point giving the correct rounded number output.
In
Consider that the mantissa of the output number, having a bit length (mw) of 12 bits, is 000011000100. For RNE mode, the rounding mask is generated such that the string of ones starts at a position p+2 bits after the radix point as explained above. As mentioned above, in this example the significand is being shortened by four places, so the rounding position is identified as p=8. Therefore, the string of ones following the string of zeros starts at a position p+2=10 bits after the radix point. The rounding mask is generated as 000000000111 i.e. zeros in positions 1 to 9 after the radix point and ones in position 10 to 12 after the radix point. In other words, the most significant ‘1’ bit in the mask is one place below the round bit. As explained further below, this will allow a rounding increment to propagate to the round bit as part of the RNE rounding. Now, when the bitwise OR operation is performed by the rounding logic 202, an output 000011000111 is generated by preserving the bits before, and including, the position p+1 as they are in the mantissa and setting all the bits after the position p+1 to 1. The output number is now provided to the incrementor 204 which increments the output number at the LSB position. This generates an output number 000011001000. In other words, in this case, the carryover from the increment ripples all the way up to the point p+1, but does not affect the bit at position p, giving the correct rounded number output when the result is truncated to p bits. That truncation is achieved by setting the bits after the required position to zero, to obtain the output rounded at the correct position in RNE mode as 000011000000. Subsequently, and not shown, the additional zeros below the rounding point may be removed or discarded, e.g. as part of a conversion from the EER format to the UER format. However, if the original value had had a 1 bit at position p+1 (i.e. if the original value had been 000011001100), then the result of applying the mask to the mantissa would have been 000011001111, and the increment of the LSB would have resulted in a value of 000011010000. In other words, in that case, the carryover from the increment would have rippled all the way up to the point p, again giving the correct rounded number output 000011010000 when the result is truncated to p bits (in that case, the truncation would not actually change the bit values output, as all the bits after the rounding position are already zero). As such, it can be seen how the use of the mask and a method that increments the LSB every time can give correct results equivalent to the conventional approach of selecting between a truncated number and an incremented (at the rounding position) truncated number based on the rounding mode.
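The two cases worked through above can be reproduced with a short sketch. It assumes the mantissa is held as an unsigned integer of `width` bits and that the mask has its string of ones starting at position p+2; the function name is hypothetical, and the sketch covers only the two cases discussed in the text, without treating the tie case separately.

```python
def mask_round(mantissa: int, width: int, p: int) -> int:
    """Apply a rounding mask with ones from position p+2, increment the LSB, truncate to p bits."""
    mask = (1 << (width - (p + 1))) - 1   # zeros in positions 1..p+1, ones in positions p+2..width
    ored = mantissa | mask                # pull the bits below the round bit to one
    incremented = ored + 1                # fixed-position increment at the LSB; carry ripples upward
    drop = width - p
    return (incremented >> drop) << drop  # truncate by zeroing the bits after position p
```

With `width=12` and `p=8` the mask is 000000000111, and the increment ripples up to position p+1 or position p depending on the original round bit.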
However, in both of the
Moreover, for a generalised system in which it is desirable for different rounding modes to be dynamically set or selected, correct rounding will depend on the selected mode and the relevant criteria for choosing between the different ‘truncated’ and ‘truncated incremented’ outputs. As a result, it has been identified that it is not necessary to use different masks for different rounding modes. Instead, a single mask can be used to derive the ‘truncated’ and ‘truncated incremented’ outputs. This mask can be used to exploit the advantage that the increment can be performed at a fixed position (i.e. the LSB) thereby eliminating the need of a large general purpose adder to perform the increment as in a conventional method. Then, the correct rounded value can be selected on the basis of logic operations performed to find the guard, round and sticky bits using the rounding mask (as explained below).
To achieve this generalised approach, a mask can be created to not only be used to perform the truncation and incrementation at the guard bit but also to compute the guard bit itself (as well as the round and sticky bits). To fulfil these different purposes, the mask is initially generated with a different alignment to that shown in the previous examples, with a string of zeros followed by string of ones where the first one is at a position corresponding to p bits after the radix point in the mantissa. The mask can then be shifted as needed (see below) and applied on the significand n (i.e. since the significand includes the mantissa bits, taking the form 1.m or 0.m depending on whether the number is normal or denormal) to obtain the various required outputs. This can be achieved by different logical approaches. According to one approach, the following equations using logical reduction operations and bitwise operations can be used to obtain the various required outputs.
where & is bitwise AND, | is bitwise OR, ~ is bitwise NOT, >> is right shift, ^ is bitwise XOR, and the unary | operator is OR-reduce.
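For reference, the form of equations (1)-(5) can be reconstructed from the worked steps that follow in this description (the NOT, OR, XOR, AND and OR-reduce operations applied to mask, mask>>1 and mask>>2) and from the later statements that t follows equation (1), t′ follows equation (2) and g, r and s follow equations (3), (4) and (5). The block below is such a reconstruction, offered under that assumption rather than as a verbatim copy; here ⊕ stands for the bitwise XOR written ^ in the text, the prefix | is OR-reduce, and "+1" denotes the increment at the LSB.

```latex
\begin{align}
t  &= n \mathbin{\&} {\sim}(\mathit{mask} \gg 1) \tag{1}\\
t' &= \bigl(n \mathbin{|} (\mathit{mask} \gg 1)\bigr) + 1 \tag{2}\\
g  &= {|}\,\bigl(n \mathbin{\&} (\mathit{mask} \oplus (\mathit{mask} \gg 1))\bigr) \tag{3}\\
r  &= {|}\,\bigl(n \mathbin{\&} ((\mathit{mask} \gg 1) \oplus (\mathit{mask} \gg 2))\bigr) \tag{4}\\
s  &= {|}\,\bigl(n \mathbin{\&} (\mathit{mask} \gg 2)\bigr) \tag{5}
\end{align}
```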
As mentioned above, different logical approaches could be used to obtain the required outputs that can be arrived at using equations (1)-(5) above. For example, the bitwise XOR used in the computation of the guard and round bits is used to turn the monotonic mask, i.e. a string of zeroes followed by a string of ones, into a one-hot mask, i.e. a string of zeroes and ones that only feature one bit set to one. The position of the one in the one-hot mask can be used to read a bit at the corresponding position in the significand, through bitwise AND and OR reduction. The process of deriving a one-hot mask from a monotonic mask can alternatively be implemented using bitwise NOT and bitwise AND through the following equations:
Replacing an XOR gate by an AND gate and an inverter is functionally equivalent due to the fact that the mask is decreasing, and may be advantageous depending on the electrical and physical characteristics of the technology. However, the particular way in which the required outputs (e.g. the guard bit and round bit) are obtained does not affect how they are subsequently used.
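The equivalence of the two ways of deriving a one-hot mask can be checked with a small sketch. It assumes a monotonic mask (a string of zeros followed by a string of ones) held as an unsigned integer; the function names are hypothetical.

```python
def one_hot_xor(mask: int, k: int) -> int:
    """One-hot mask from XOR of the mask shifted by k and by k+1 positions."""
    return (mask >> k) ^ (mask >> (k + 1))


def one_hot_and_not(mask: int, k: int) -> int:
    """Same one-hot mask using bitwise AND with the inverted, further-shifted mask."""
    return (mask >> k) & ~(mask >> (k + 1))


def all_monotonic_masks(width: int):
    """Every mask of the given width consisting of zeros followed by ones."""
    return [(1 << ones) - 1 for ones in range(width + 1)]
```

The two forms agree because every bit set in the further-shifted mask is also set in the less-shifted mask, so the XOR can only clear bits and never set new ones.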
The same rounding mask is used for determining the various required outputs such as the truncated output, the truncated incremented output, the guard bit, the round bit, and the sticky bit. The combination of the various outputs can allow the correct rounded value to be selected based on the rounding mode. Provided below is an example of calculating various outputs by applying the rounding mask to the significand (or mantissa bits).
Consider the number being rounded in
The next step is to perform a NOT operation of the shifted mask. Thus,
The right-shifted and negated mask is also shown in
This result is also shown in
Further the shifted mask is applied to the mantissa bits of the significand by performing a bitwise OR operation. Therefore:
This result is shown in
As explained above, incrementing the LSB after the mask has been suitably applied allows the increment to ripple up to the right position to produce the incremented rounding output. This ripple also has the effect of achieving truncation by setting the bits after the required position to zero (subsequently, and not shown, the additional zeros below the rounding point may be removed or discarded, e.g. as part of a conversion from the EER format to the UER format). Thus, the truncated incremented output, t′, is achieved:
To select between the truncated result and the truncated incremented result, the guard, round and sticky bits can be determined and used to make the selection according to the particular rounding mode being employed.
The shifted mask is XORed with the original mask,
Further the result is applied to the mantissa bits of the significand by performing a bitwise AND operation. Therefore,
Further, the guard bit is obtained by performing an OR reduce operation
The round bit is determined as r=| (n & ((mask>>1) ^ (mask>>2))) based on equation (4).
The mask shifted right by 2 bits, mask>>2=000000000111
The mask shifted right by 1 bit, mask>>1=000000001111
The shifted masks are XORed,
Further the result is applied to the mantissa bits of the significand by performing a bitwise AND operation. Therefore,
Further, the round bit is obtained by performing an OR reduce operation
The sticky bit is determined as s=| (n & (mask>>2)) based on equation (5).
Further the shifted mask is applied to the mantissa bits of the significand by performing a bitwise AND operation. Therefore,
Further, the sticky bit is obtained by performing an OR reduce operation
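The sequence of mask operations described above can be gathered into one sketch. As an assumption made purely for illustration, it uses the 12-bit mantissa 000011000100 and rounding position p=8 from the earlier RNE example, so that the mask with its leading one at position p is 000000011111 (consistent with mask>>1=000000001111 and mask>>2=000000000111); the function name is hypothetical.

```python
def rounding_outputs(n: int, mask: int, width: int = 12):
    """Return (t, t_inc, g, r, s) derived from a single monotonic rounding mask."""
    full = (1 << width) - 1
    m1, m2 = mask >> 1, mask >> 2
    t = n & ~m1 & full                    # truncated output: clear bits below the guard bit
    t_inc = (n | m1) + 1                  # set bits below the guard bit, then increment the LSB
    g = int((n & (mask ^ m1)) != 0)       # one-hot mask at the guard position, OR-reduced
    r = int((n & (m1 ^ m2)) != 0)         # one-hot mask at the round position, OR-reduced
    s = int((n & m2) != 0)                # all bits below the round bit, OR-reduced
    return t, t_inc, g, r, s


MASK = 0b000000011111                     # leading one at position p=8 of 12
```

For the assumed input, the round bit is 0, so RNE selects the truncated output, matching the result of the earlier example.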
The results of these calculations can be used based on known rules for the particular rounding mode to determine whether the correct rounded value should be returned as t or t′. To explain this, Table 1 provided below illustrates how the guard (g), round (r) and sticky (s) bits of the number relate to the rounding position. Table 2 illustrates the various combination of guard, round and sticky bits and whether they correspond to the correct output being the truncated output (0) or the truncated incremented output (1) for the different rounding modes of: Round Towards Zero (RTZ); Round Towards Away from Zero (RTA); Round To Nearest, Ties Towards Zero (RNZ); Round To Nearest, Ties Towards Away From Zero (RNA); Round To Nearest, Ties To Even (RNE); and Round To Nearest, Ties To Odd (RNO).
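The selection between t and t′ for the listed rounding modes can be sketched as below. The rules used are the commonly understood definitions of these modes applied to the magnitude of the number, under the bit convention of this document (g at the rounding position, r immediately below it, s the OR of all lower bits); they are stated here as an assumption and are not copied from Table 2. The function name is hypothetical.

```python
def use_incremented(mode: str, g: int, r: int, s: int) -> bool:
    """Return True if the truncated incremented output should be selected."""
    if mode == "RTZ":
        return False                       # always truncate
    if mode == "RTA":
        return bool(r or s)                # any discarded bit set: move away from zero
    if mode == "RNZ":
        return bool(r and s)               # strictly above half-way only
    if mode == "RNA":
        return bool(r)                     # half-way or above
    if mode == "RNE":
        return bool(r and (s or g))        # tie goes to the value with g == 0 after rounding
    if mode == "RNO":
        return bool(r and (s or not g))    # tie goes to the value with g == 1 after rounding
    raise ValueError(f"unknown rounding mode: {mode}")
```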
In the example of the rounding mode being RNE (equivalent to the
The bitwise operations and/or the reduction operations and the increment-by-1 operation are much simpler than using a general-purpose adder to determine the rounded number. In the new method, although the number must be rounded at an arbitrary position, the increment is always applied at the same place (the LSB), with the mask being used to ripple the increment into the correct position.
As mentioned earlier, the rounding mask can be generated on the fly (based on the required rounding position) in parallel to the arithmetic operation. The rounding mask is cheap to integrate in the critical path. In particular, using the rounding mask uses less area or latency than the rounding methods traditionally used to handle denormal numbers in standard representation. Further, use of rounding masks also creates less latency and uses less area by applying the rounding increment at the same position every time, avoiding the need for a general-purpose adder and shifter for incrementing when the denormal number is represented as normal number in EER representation.
As discussed earlier, the mask can be pre-computed and applied directly to the output of an arithmetic operation. Some examples of these arithmetic operations include format conversion or multiplication operations, where the position of the leading one is already known or can be found amongst a small number of alternatives. When two normal numbers ‘a’ and ‘b’, or a normal number and a denormal number represented as a normal number in EER representation, are multiplied, the output obtained may be a normal number or a subnormal number in EER. The inputs always have a high bit at the MSB position and hence the output has a leading 1 bit at the MSB position or the second MSB position. Hence, to normalize the output, the output is shifted by 1 position or 2 positions to the right. In such cases, the rounding mask can be applied to the result directly, and in a speculative manner when a small number of alternative positions for the leading one exist (e.g. for multiplication, see below). When the rounding mask is applied to a normalized output subnormal number or a normalized normal number, the effect is the same.
It was mentioned above that
For example,
Further, in some arithmetic operations, such as those requiring a large normalization, the mask can be calculated on the fly in parallel with the arithmetic operation. Some examples of these arithmetic operations include large floating-point addition, where a large number of leading zeroes can appear due to cancellations of significant bits. Given below are two methods/solutions for calculating the rounding mask on the fly while performing such arithmetic operations requiring a large renormalisation.
The hardware 400 comprises an arithmetic unit 402, a renormalizing unit 404, mask pre-aligning unit 406, mask renormalizing unit 408, and a rounding unit 410. The hardware also comprises fixed bit shifters 407a, 407b and 417, and a plurality of logic operators such as NOT operator 412, AND operator 414, OR operator 416, a second AND operator 418, OR reduce operator 420 and another OR operator 422.
The arithmetic unit 402 receives a plurality of input numbers which are numbers represented in EER representation. Each input number comprises a sign bit (si), exponent bits (ei) and mantissa bits (mi). The arithmetic unit 402 performs an arithmetic operation, such as floating-point addition, multiplication, or the like, to produce an output number comprising a sign bit (sa), exponent bits (ea) and mantissa bits (ma). Further, the output number is renormalized by the renormalizing unit 404 to obtain a renormalized output number comprising a sign bit (sr), exponent bits (er) and mantissa bits (mr).
The output number from the arithmetic unit 402 is provided to the renormalizing unit 404. As explained above some arithmetic operations on floating-point numbers produce a large number of leading zeroes due to cancellations of significant bits. The renormalizing unit 404 identifies the leading zero count (lzc) of the mantissa bits (ma). The renormalizing unit 404 shifts the mantissa bits (ma) based on the leading zero count (lzc) of the mantissa bits (ma) to generate the renormalized mantissa bits (mr) thereby normalizing the output number from the arithmetic unit. In other words, the renormalizing unit 404 shifts the mantissa bits (ma) to the left by as many positions as required for its leading 1 to end up in the position of the leading significand bit. This is referred to as the normalized significand n (which is the renormalized mantissa bits with the leading 1 before the radix point i.e. 1.mr).
Further, the mask is computed and pre-aligned using the mask pre-aligning unit 406. The exponent bits (ea) computed by the arithmetic unit are fed as an input to the mask pre-aligning unit 406. The rounding mask is generated as a string of zeros and/or a string of ones. In one example the mask may be a string of zeros followed by the string of ones. In another example the mask may be a string of ones followed by a string of zeros and the logic used to process it (compared to the string of zeros followed by a string of ones) is altered accordingly.
Consider a mask having a string of zeros followed by a string of ones, generated by the mask pre-aligning unit 406. The pre-aligned mask is generated such that the leading one falls at a position of weight $2^{1-\mathrm{bias}-\mathrm{mw}}$, where ‘mw’ is the number of mantissa bits in the UER or EER number format, and ‘bias’ is the exponent bias in the number format. This position corresponds to the guard bit of denormal numbers, or in other words the position of the least significant bit retained after rounding. The mask has a bit length at least equal to the bit length of the significand.
In the example of
However, er is not required as an input by the mask pre-aligning unit 406. The mask can be pre-aligned to the output number produced by the arithmetic unit (i.e. before any subsequent normalisation). This can be achieved based on the minimum exponent value calculated from exponent bits (ea) of the number before normalisation and required number of bits of precision. It can be created by shifting to the right a string of ones by
positions (starting from the leading one being in the position of the leading significand bit), where ‘mw’ is the number of mantissa bits.
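One way to picture the pre-alignment is sketched below. The shift amount used here is not the expression from this description; it is derived only from the stated requirement that the leading one of the mask falls at weight 2^(1-bias-mw), under the assumption that the leading significand bit position of the pre-normalisation result has weight 2^E for an unbiased exponent E. The sketch covers only results at or below the minimum normal exponent, and all names are hypothetical.

```python
def prealigned_mask(unbiased_exp: int, bias: int, mw: int, frac_bits: int) -> int:
    """Mask over (1 + frac_bits) significand bits with its leading one at weight 2^(1-bias-mw).

    unbiased_exp is the assumed weight exponent E of the leading significand bit position
    before normalisation. Normal-range results (E above 1-bias) are outside this sketch.
    """
    if unbiased_exp > 1 - bias:
        raise ValueError("sketch only covers the denormal range")
    # Distance in bit positions from the leading significand bit down to the target weight.
    shift = unbiased_exp - (1 - bias - mw)
    if shift < 0:
        raise ValueError("rounding position lies above the leading significand bit")
    width = 1 + frac_bits
    all_ones = (1 << width) - 1           # string of ones starting at the leading significand bit
    return all_ones >> shift              # shift right; zeros enter from the left
```

For example, with bias=127 and mw=23, a result four binades below the minimum normal exponent yields a mask whose leading one is four places above the LSB, reflecting four fewer bits of precision.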
In the example of
In
However, as described above, the mask can be pre-aligned to the output number produced by the arithmetic unit without er. Returning to the equation
that allows us to calculate how far to the right the string of ones in the mask should be shifted, in the example of
Further, the output of the mask pre-aligning unit 406, which is the pre-aligned mask, is provided as an input to the mask renormalizing unit 408. The mask renormalizing unit 408 further receives the mantissa bits (ma) of the output number from the arithmetic unit as another input. The mask renormalizing unit 408 renormalizes the pre-aligned mask based on the leading zero count (lzc) of the mantissa bits (ma) of the output number, i.e., the mask renormalizing unit 408 shifts the mask to the left by as many positions as are required for normalising the mantissa bits (ma) of the output number so that its leading 1 ends up in the leading significand bit position. That is, the mask is shifted to the left by the same number of bit positions as required to generate the normalized significand nr output by the renormalizing unit 404. In other words, for the avoidance of confusion, the mask renormalising unit does not shift the mask so its own leading one is in the leading significand bit position, but rather it shifts the mask to be in the correct rounding position for use with the renormalized output from the renormalizing unit 404. Therefore, the position xp of the leading one in the rounding mask depends on both the exponent value (calculated from the exponent before normalisation) and the leading zero count of the significand before renormalisation. The pre-normalisation and post-normalisation masks can have the same bit length. When renormalising a mask by performing a left-shift, any bits below the mask leading one (i.e. after the leading one) are padded with extra 1s (rather than extra 0s).
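The mask renormalisation described above, including the padding with 1s, can be sketched as follows. The sketch assumes that the significand and the mask are unsigned integers of the same `width`; the function names and the example values in the test are hypothetical.

```python
def leading_zero_count(value: int, width: int) -> int:
    """Number of zero bits above the leading one in a width-bit value."""
    return width - value.bit_length()


def renormalise_mask(mask: int, lzc: int, width: int) -> int:
    """Shift the mask left by lzc positions, padding the vacated low bits with ones."""
    full = (1 << width) - 1
    return ((mask << lzc) | ((1 << lzc) - 1)) & full


def renormalise_significand(value: int, lzc: int, width: int) -> int:
    """Shift the significand left by the same amount, padding with zeros."""
    return (value << lzc) & ((1 << width) - 1)
```

Because both shifts use the same leading zero count, the leading one of the mask stays aligned with the same bit of the significand before and after renormalisation.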
The renormalized mask normalized based on the leading zero count is shown in examples in
In
Similarly, in
As discussed earlier, in the initial consideration of the examples in
Thus, in
Further in order to determine the truncated incremented output (in accordance with equation (2)), as shown in
In the above calculations, the mask is initially pre-aligned to have the leading one aligned to the guard bit, before later shifting the mask so that the leading one is aligned with the round bit. Equivalently, the mask could be directly generated with the leading one aligned to the round bit. This would avoid the need for a subsequent shift when calculating the truncated output t and the incremented truncated output t′. However, it would introduce the need for left shifting when calculating the guard bit (i.e. ‘mask’ in equation 3, based upon a mask aligned at the guard bit, would become ‘mask<<1’ based upon a mask aligned at the round bit), which would introduce non-standard padding with 1's to the right (as opposed to the standard padding by 0's to the left when right-shifting). Nonetheless, it is noted that such equivalent implementations are possible (also for the second solution, as discussed below) and the invention is not limited to the particular embodiments shown.
Finally, the guard, round and sticky bits necessary to select between the truncated output t and the truncated incremented output t′ can be derived by OR-reducing the corresponding parts of the significand selected by the rounding mask or a derived one-hot mask, for example using the equations:
Further, a sticky bit is obtained from the renormalizing unit 404 post renormalization (i.e. unneeded LSBs post-renormalisation are accumulated into a sticky bit by the renormalizing unit). For example, for a mantissa having a bit length of 5 bits, the renormalizing unit 404 would renormalize an input 000101011110011 into 101011 and output sticky bit s=1. Similarly, the renormalizing unit 404 would renormalize an input 000101010000000 into 101010 and output sticky bit s=0. The final sticky bit would then consist of the fixed sticky part associated with the normalized mantissa mr, combined at the OR operator 422 with any further sticky bit associated with the denormal mantissa post renormalization in case the output entered that range. These further bits informing the final sticky bit can be found between the end of the normalized mantissa mr and the denormal rounding position (excluding the round bit).
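The accumulation of discarded LSBs into a sticky bit can be illustrated with a short sketch operating on bit strings. It assumes that the retained field is the leading one followed by mw mantissa bits (six bits for mw=5); the function name is hypothetical.

```python
from typing import Tuple


def renormalise_with_sticky(bits: str, mw: int) -> Tuple[str, int]:
    """Drop leading zeros, keep the leading one plus mw mantissa bits, OR the rest into a sticky bit."""
    stripped = bits.lstrip("0")                    # left shift by the leading zero count
    kept = stripped[: mw + 1].ljust(mw + 1, "0")   # normalised significand 1.mr, zero-padded if short
    rest = stripped[mw + 1:]                       # unneeded LSBs after renormalisation
    sticky = int("1" in rest)                      # OR-reduce of the discarded bits
    return kept, sticky
```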
Once the g, r and s bits have been determined in accordance with the above equations, the rounding unit 410 chooses one of the truncated output t or the truncated incremented output t′ as the rounded output number. The rounded output number is a number of required precision in EER representation comprising a sign bit (sy), exponent bits (ey) and mantissa bits (my).
Thus, by using this method there is minimal overhead cost in rounding a number in EER representation as the rounding increment ripples to appropriate position, rather than incrementing at specific position of the significand using a full adder. Also, time is saved as the mask and normalized significand can be computed in parallel. Further the same mask can be used to select the appropriate bits (g, r and s bits) at a fixed position.
However, using a large mask re-normalizer incurs area overhead. The inventors devised a further different solution of rounding a number in EER representation by avoiding the use of a large renormalizing unit for performing mask renormalization. The solution includes applying the pre-aligned mask to the result obtained after the arithmetic operation directly. Thus, in this solution, the mask is pre-aligned in parallel while performing the arithmetic operation as explained above with respect to
The arithmetic unit 702 receives a plurality of input numbers which are numbers represented in EER representation. Each input number comprises a sign bit (si), exponent bits (ei) and mantissa bits (mi). The arithmetic unit 702 performs an arithmetic operation, such as multiplication, division and the like, to produce an output number comprising a sign bit (sa), exponent bits (ea) and mantissa bits (ma). Further, the output number is normalized by the renormalizing unit 704 to obtain a normalized output number comprising a sign bit (sr), exponent bits (er) and mantissa bits (mr).
Further, in parallel with performing the arithmetic operation using the arithmetic unit 702, the mask is computed and pre-aligned using the mask pre-aligning unit 706. The exponent bits (ea) of the output number before normalisation are fed as an input to the mask pre-aligning unit 706. The rounding mask is constructed based on the exponent value before normalisation. The mask is generated as a string of zeros and a string of ones.
In the second solution, the mask is pre-aligned by the mask pre-aligning unit 706 in the same way as for the first solution. That is, the pre-aligned mask is generated such that the leading one falls at a position of weight $2^{1-\mathrm{bias}-\mathrm{mw}}$, where ‘mw’ is the number of mantissa bits in the UER or EER number format, and ‘bias’ is the exponent bias in the number format.
In the example of
However, as in
positions (starting from the leading one being in the position of the leading significand bit), where mw is the number of mantissa bits. In the example of
To calculate the truncated incremented output t′ to the required precision, in the second solution, the pre-aligned mask is shifted to the right and then applied into the array of the floating-point number (i.e. before normalisation), as explained in more detail below. The method implements the logic for calculating the truncated incremented output t′ based on equation (2) given above.
The second solution determines the truncated incremented option by performing a bitwise OR of the mantissa obtained after performing the arithmetic operation and the right-shifted mask, prior to normalising the obtained result. Thus, to determine the truncated incremented output, the pre-aligned mask is right-shifted by one position at shifter 707a (which, as a fixed shifter, may simply be implemented with minimal cost in hardware by hardwiring the changes in bit position) and is then applied to the significand bits (na) by performing an OR operation using the OR operator 716. The result is fed to the renormalizing unit 704, and the subsequent output from the renormalizing unit 704 is incremented by 1 in the rounding unit 710.
By performing the OR operation of the shifted mask with the mantissa bits (ma) output from the arithmetic unit 702, the non-representable trailing bits of the mantissa bits (ma) (i.e. the bits below the rounding position) are pulled to one as shown in
The renormalizing unit 704 identifies the leading zero count (lzc) of the mantissa bits (ma) after the bitwise OR operation has been performed. The renormalizing unit 704 shifts the OR-ed mantissa bits (ma) based on the leading zero count (lzc) of the mantissa bits (ma) to generate the renormalized mantissa bits (mr) thereby normalizing the output number from the arithmetic unit. In other words, the renormalizing unit 704 shifts the OR-ed mantissa bits (ma) to the left by as many positions as required for its leading 1 to end up in the leading significand bit position of the significand nr. This is referred to as the normalized significand 1.mr also shown in
Further, the normalized significand nr is incremented at the last position and the string of ones in the non-representable trailing bits propagate the increment to the right place flipping all bits below to zero as required, thereby generating the truncated incremented output t′ as shown in
Though the calculation of the truncated incremented output t′ is straightforward, calculating the truncated option i.e., the non-incremented truncated version of the significand, by setting the appropriate number of trailing bits to zero does not work in the same way when the pre-aligned mask is applied to the mantissa bits output by the arithmetic unit (ma) directly. The method for determining the truncated output t is explained below. Again, this method follows the logical procedure for calculating the truncated output t based on the equation (1) given above.
In the method illustrated by
For rounding modes requiring the incremented truncated output t′, the second solution works. However, the calculation of the truncated output t needs the normalized mask to be applied on the renormalized mantissa bit.
The inventors recognised that the extra time taken by the rounding unit to calculate the increment for the truncated incremented output t′ can be used to calculate the truncated output t, thereby hiding the latency. Thus, the pre-aligned mask from the mask pre-aligning unit 706 is shifted by one bit and provided to the circuit 708 to normalize the mask before applying the mask to the normalized mantissa.
Thus, in order to calculate the truncated output t (or non-incremented option) the pre-aligned mask is shifted one bit to the right by shifter 707b (which may be the same shifter as shifter 707a, in practice, although they are shown as separate shifters in
Since the shifter 714 and the incrementors in the rounding unit 710 have comparable delay, and the normalized mask may be obtained slightly earlier than the renormalised significand, the bitwise operations needed to adjust the non-incremented rounded output can be completed in time and with less area, based on equation (1) above.
Thus, to determine the truncated output, the pre-aligned mask is shifted one bit to the right at shifter 707b and negated using a NOT operator 712 and shifted using the shifter 714 to obtain the normalized mask. Further the normalized mask is applied on the normalized significand n (i.e., 1.mr) by performing a bitwise AND operation using the AND operator 724. In other words, the string of zeros followed by string of ones of the pre-aligned mask are shifted by one bit to the right and are then flipped and applied to the renormalized significand n by performing a bitwise AND operation such that non-representable trailing bits of the renormalised significand are pulled to zero thereby generating the truncated output. In this case the OR operator 716 is disabled.
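The final negate-and-AND step can be illustrated with a minimal sketch. It assumes the mask has already been normalized so that its trailing ones line up with the non-representable bits of the significand; the pre-alignment, the one-bit right shift at shifter 707b and the shift at shifter 714 are abstracted away. The function name and parameters are hypothetical and do not come from the description.

```python
def truncate_with_mask(significand: int, trailing_bits: int, width: int) -> int:
    """Clear the non-representable trailing bits of a `width`-bit significand.

    Assumption: the mask is already aligned with the normalized significand.
    """
    full = (1 << width) - 1
    # Rounding mask: a string of zeros followed by `trailing_bits` ones.
    mask = (1 << trailing_bits) - 1
    # Negate the mask (ones followed by zeros), limited to `width` bits.
    negated = ~mask & full
    # Bitwise AND pulls the trailing bits of the significand to zero.
    return significand & negated


if __name__ == "__main__":
    t = truncate_with_mask(0b10110111, trailing_bits=3, width=8)
    print(f"{t:08b}")
```

With an 8-bit significand 10110111 and three non-representable bits, the three least significant bits are cleared and the upper five bits are left unchanged.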
Finally, the guard, round and sticky bits necessary to select between the truncated output t and the truncated incremented output t′ can be derived by OR-reducing the corresponding parts of significand selected by the rounding mask or a derived one-hot mask.
Once the g, r and s bits are known from equations (3), (4) and (5) above, the rounding unit 710 chooses either the truncated output t or the truncated incremented output t′ as the rounded output number. The rounded output number is a number of required precision in EER representation comprising a sign bit (sy), exponent bits (ey) and mantissa bits (my).
Thus, the solutions described above are particularly suited for rounding the output produced by arithmetic operations where large normalization is required, such as operations that contain at least one floating-point addition. An example situation in which the first and second solutions might be employed is following an FMA (fused multiply-add) operation.
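The selection step can be sketched as follows for one rounding mode. This is an illustration under stated assumptions, not a reproduction of equations (3), (4) and (5): it assumes round-to-nearest-even, takes g as the first discarded bit, r as the second discarded bit and s as the OR-reduction of all remaining discarded bits, and ignores any carry out of the significand that would require an exponent adjustment. All names are hypothetical.

```python
def select_rounded(significand: int, trailing_bits: int) -> int:
    """Choose between truncated t and truncated incremented t' (assumed RNE)."""
    k = trailing_bits
    if k == 0:
        return significand  # every bit is representable, nothing to round
    mask = (1 << k) - 1                    # zeros followed by k ones
    t = significand & ~mask                # truncated output
    t_inc = (significand | mask) + 1       # truncated incremented output
    lsb = (significand >> k) & 1           # last representable bit
    g = (significand >> (k - 1)) & 1       # assumed: first discarded bit
    r = (significand >> (k - 2)) & 1 if k >= 2 else 0  # assumed: second discarded bit
    low = significand & ((1 << max(k - 2, 0)) - 1)
    s = 1 if low else 0                    # OR-reduction of the remaining bits
    # Round up above the halfway point, or at the halfway point when lsb is odd.
    round_up = g and (r or s or lsb)
    return t_inc if round_up else t
```

For example, with three discarded bits the value 10110100 is exactly halfway and its last representable bit is 0, so the truncated output is selected, whereas 10111100 is halfway with an odd last representable bit, so the incremented output is selected.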
As mentioned above, the specific examples of
The inventors devised a further, third, solution of rounding a number in EER representation by avoiding the use of a large renormalizing unit where iterative arithmetic operations produce a normalized or nearly normalized result. The solution includes pre-aligning a mask during each iterative step of the arithmetic operation. Thus, in this solution, the mask is pre-aligned in parallel while performing the iterative arithmetic operation. The pre-aligned mask is applied to the output/result obtained after performing each iterative step of the arithmetic operation directly. This is suited to hardware performing iterative digit-by-digit computation, such as units for calculating divisions or square roots.
The arithmetic unit 902 receives a plurality of input numbers which are numbers represented in EER representation. Each number among the plurality of input numbers could be a normal number or a subnormal number. Each input number comprises a sign bit (si), exponent bits (ei) and mantissa bits (mi). The arithmetic unit 902 performs an iterative arithmetic operation such as division or, more generally, calculating a power or a product of powers of the input(s), i.e., an operation without or with very little renormalisation. The arithmetic unit 902 performs the iterative arithmetic operation to produce a partial iterative output number comprising mantissa bits (mp).
For such iterative operations the exponent of the output number (ea) can be pre-determined/derived. For example, when performing a division operation, the exponent of the output number (ea) (i.e., the unbiased floating-point exponent of the quotient) is either the difference of the exponents of the dividend and divisor, or that difference minus one. Thus, the renormalisation of the calculated fixed-point significand is reduced to a 1-bit shifter. Similarly, in a different example, such as a floating-point square-root operation, the exponent of the square root is determined independently of the significand. Thus, while performing an iterative arithmetic operation, there is little or no need to normalize the significand, as the operation produces a normalized or nearly normalized output number.
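The statement about the quotient exponent can be checked with a short sketch. It assumes positive, finite, non-zero inputs and uses ordinary double-precision floats purely for illustration; the function name is hypothetical. The quotient exponent equals the exponent difference when the dividend significand is at least as large as the divisor significand, and the difference minus one otherwise.

```python
import math


def quotient_exponent(dividend: float, divisor: float) -> int:
    """Unbiased exponent of dividend/divisor in 1.m form, without dividing.

    Assumption: both inputs are positive, finite and non-zero.
    """
    # math.frexp returns (m, e) with x == m * 2**e and 0.5 <= m < 1.
    # The same offset applies to both operands, so it cancels in the difference.
    m_a, e_a = math.frexp(dividend)
    m_b, e_b = math.frexp(divisor)
    diff = e_a - e_b
    # The significand ratio lies in (0.5, 2): no shift if >= 1, else a 1-bit shift.
    return diff if m_a >= m_b else diff - 1


if __name__ == "__main__":
    for a, b in [(6.0, 2.0), (2.0, 6.0), (5.0, 8.0)]:
        print(a, b, quotient_exponent(a, b))
```

Because only a comparison of the two significands is needed, the exponent is available before any quotient digit has been produced, which is what allows the mask to be constructed in parallel with the iteration.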
Therefore, there is also no need to perform normalization of the rounding mask prior to applying the mask to the partial output number. Instead, the construction of the rounding mask is reduced to the pre-aligning step. Thus, in parallel to performing the iterative arithmetic operation using arithmetic unit 902, the rounding mask is computed and pre-aligned using the mask constructing unit 904. The initial mask is generated as a string of zeros followed by a string of ones as described in the previous solutions with respect to
The mask constructing unit 904 is implemented without the large pre-aligning shifter used in the first and second solutions described above. Instead, the mask constructing unit 904 uses two registers for constructing/pre-aligning the mask. This is possible because a register can implement shifting by optionally updating the value stored in it with its own bits shifted by a fixed amount. Using registers for pre-aligning the mask thus saves area, as a large pre-alignment shifter could be very expensive relative to the rest of the required hardware, particularly if the hardware for the significand calculation is comparatively small, if the available slack for a single shift operation is very short, or if the output precision, which determines the maximal shift width, is large, such as double precision.
A first register 906 is configured to store the mask being constructed. The initial rounding mask could be predetermined and could be either a string of zeros or a string of ones. The mask stored in the first register 906 is determined based on the bit length of the input mantissa mi and the step size. The step size (in other words, the radix) is the number of bits of the actual output number calculated in each step of the iterative operation. The step size could be, for example, 1, 2, 3 or 4 bits of the significand calculated in each step, and can be chosen by the designer. The bit length of the initial rounding mask is the number of bits of the input mantissa mi divided by the step size r. For example, consider an example as shown in
The initial rounding mask in the first register 906 is shifted to the left by a constant one bit position based on a tracking value (which is explained later). When a rounding mask of a string of ones followed by zeros is generated, the initial mask, which is a string of ones, is simply shifted to the left. When a rounding mask of a string of zeros followed by ones is generated, the initial mask, which is a string of zeros, is shifted to the left and ones are appended at the LSB positions. Thus, initially the predetermined rounding mask is stored in the first register 906 and is updated in each step of the iterative arithmetic operation. The number of times the mask stored in the first register 906 is to be shifted is determined by another register known as the mask tracking unit 908.
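The register-based shifting can be modelled with a minimal sketch of one update step. The function and its parameters (`shift_enable`, `fill_ones`) are hypothetical names introduced for illustration; in particular, `shift_enable` stands in for whatever condition the mask tracking unit derives, which is not modelled here.

```python
def step_mask_register(mask: int, width: int, shift_enable: bool, fill_ones: bool) -> int:
    """Model one clock step of a `width`-bit mask register.

    Assumption: the register either holds its value or replaces it with its
    own contents shifted left by one bit position.
    """
    if not shift_enable:
        return mask  # hold the current value
    shifted = (mask << 1) & ((1 << width) - 1)  # constant 1-bit left shift
    if fill_ones:
        shifted |= 1  # zeros-then-ones mask: append a one at the LSB
    return shifted


if __name__ == "__main__":
    mask = 0  # initial mask: a string of zeros
    for _ in range(3):
        mask = step_mask_register(mask, width=6, shift_enable=True, fill_ones=True)
    print(f"{mask:06b}")
```

Three enabled steps on an all-zero 6-bit register yield three trailing ones, while the same steps applied to an all-ones register with `fill_ones` disabled would yield a string of ones followed by zeros.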
The mask tracking unit 908 is configured to store a tracking value. The initial tracking value can also be pre-determined based on the output exponent (ey) and the step size ‘r’ and stored in a different memory location. This is possible because for iterative arithmetic operations the output exponent (ey) and the step size can be derived without the need of calculating the significand value or the mantissa bits of the actual output number (my). An example equation for calculating the tracking value is
Where exp is the value of minimum exponent min exp, and the cycle count is the number of iterations since the start of the operation. For example, if we were computing a long division one bit at a time then cycle count would be the bit which is currently computed.
Thus, the tracking value is input to the mask tracking unit before performing the iterative arithmetic operation. The bit length of the tracking value comprises a number of bits for representing mantissa bits generated in each iteration and an overflow bit. Therefore, the tracking value comprises an overflow bit, one or more accumulation bits and one or more remainder bits (which is called the mask tracking offset or tracker initialisation offset). For example, in
The tracking value in the mask tracking unit 908 is incremented by the step size until the overflow bit turns high (i.e., one). Thus, in the example in
The shifted mask is further provided to a mask fanout unit 918 where the rounding mask is further updated by fanning out the bits of the shifted mask registered in the first register 906 by a factor of step size to be aligned with the mantissa bits of the significand of the actual output number ma. The mask fanout unit 918 performs fanning out the bits of the rounding mask from the first register 906 by padding each bit of the rounding mask with the same bit based on the step size in order to align the rounding mask with the mantissa bits of the significand. Thus in
The fanned out mask is further provided to a second shifter 920. The second shifter is smaller than the mask pre-aligning unit used in the other two solutions explained with respect to
The tracker initialisation offset indicates the fixed amount of shifting as indicated below.
The tracker initialisation offset 00 (despite adding step size until the overflow bit turns high) indicates that the real exponent after shifting by multiple of step size would still be 3 below the normal range, so the fanned out mask needs to be shifted further by 3 bits if remainder bits are zeros. The tracker initialisation offset 11 after adding step size until the overflow bit turns high (assuming all accumulate bits are low) is the smallest normal exponent i.e. no further shifting of the fanned out mask is required.
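The fan-out of the mask bits and the residual shift amount can be sketched as follows. The function names are hypothetical. The mapping of offsets 00 and 11 to shifts of 3 and 0 bits follows the text above; the values for the intermediate offsets 01 and 10 are an assumption obtained by interpolating between those two end points. The direction of the residual shift is not modelled.

```python
def fan_out(mask: int, mask_width: int, step_size: int) -> int:
    """Replicate each bit of `mask` `step_size` times, preserving bit order."""
    group = (1 << step_size) - 1  # `step_size` adjacent ones
    out = 0
    for i in range(mask_width):
        if (mask >> i) & 1:
            out |= group << (i * step_size)
    return out


def residual_shift(offset: int, offset_bits: int = 2) -> int:
    """Residual shift amount for the fanned-out mask.

    Assumption: the amount decreases linearly from 3 (offset 00) to 0 (offset 11).
    """
    return ((1 << offset_bits) - 1) - offset


if __name__ == "__main__":
    print(f"{fan_out(0b0011, mask_width=4, step_size=2):08b}")
    print(residual_shift(0b00), residual_shift(0b11))
```

Since the residual shift is bounded by the step size minus one, the second shifter only needs to cover a few bit positions, which is consistent with it being much smaller than a full pre-alignment shifter.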
In
The guard, round and sticky bits are derived based on the actual output number and the rounding mask. Further the selection between truncated output number or a truncated incremented output number is performed based on the derived guard, round and sticky bits. The mask and bitwise operations are used to calculate the truncated output number by setting any non-representable trailing bits (i.e. bits in the EER that are not representable in the UER) of the output number to zero, and to calculate the truncated incremented output number by incrementing the output number at the first representable position in the unextended exponent range (i.e. the position of the LSB in the unextended exponent range) and setting any less significant bits (i.e. any bits of the input mantissa at a less significant position than the position of the LSB in the unextended exponent range) to zero.
Thus, the three different architectures are implemented such that there is no requirement of a general purpose adder for adding the increment at the correct position.
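One way to see why no general purpose adder is needed is the OR-then-increment step described for the truncated incremented output: once the trailing non-representable bits are forced to one, a plain increment at the least significant bit carries through them and lands at the first representable position. The sketch below compares this with an explicit addition at that position; the function names are hypothetical and exponent handling on carry out is not modelled.

```python
def truncated_incremented(significand: int, trailing_bits: int) -> int:
    """OR the mask onto the significand, then increment at the LSB only."""
    mask = (1 << trailing_bits) - 1   # zeros followed by ones
    return (significand | mask) + 1   # the carry ripples through the forced ones


def reference_add_at_position(significand: int, trailing_bits: int) -> int:
    """Reference: truncate, then add one unit at the first representable bit."""
    return ((significand >> trailing_bits) + 1) << trailing_bits


if __name__ == "__main__":
    n, k = 0b10110111, 3
    print(f"{truncated_incremented(n, k):08b}", f"{reference_add_at_position(n, k):08b}")
```

Both functions agree for every significand and every number of trailing bits, so a fixed-position incrementer combined with bitwise masking is sufficient in this model.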
The hardware 400, 700 and 900 for rounding a number in EER representation of
The hardware 400, 700 and 900 described herein may be embodied in hardware on an integrated circuit. The hardware 400, 700 and 900 described herein may be configured to perform any of the methods described herein. Generally, any of the functions, methods, techniques or components described above can be implemented in software, firmware, hardware (e.g., fixed logic circuitry), or any combination thereof. The terms “module,” “functionality,” “component”, “element”, “unit”, “block” and “logic” may be used herein to generally represent software, firmware, hardware, or any combination thereof. In the case of a software implementation, the module, functionality, component, element, unit, block or logic represents program code that performs the specified tasks when executed on a processor. The algorithms and methods described herein could be performed by one or more processors executing code that causes the processor(s) to perform the algorithms/methods. Examples of a computer-readable storage medium include a random-access memory (RAM), read-only memory (ROM), an optical disc, flash memory, hard disk memory, and other memory devices that may use magnetic, optical, and other techniques to store instructions or other data and that can be accessed by a machine.
The terms computer program code and computer readable instructions as used herein refer to any kind of executable code for processors, including code expressed in a machine language, an interpreted language or a scripting language. Executable code includes binary code, machine code, bytecode, code defining an integrated circuit (such as a hardware description language or netlist), and code expressed in a programming language such as C, Java or OpenCL. Executable code may be, for example, any kind of software, firmware, script, module or library which, when suitably executed, processed, interpreted, compiled, or executed at a virtual machine or other software environment, causes a processor of the computer system at which the executable code is supported to perform the tasks specified by the code.
A processor, computer, or computer system may be any kind of device, machine or dedicated circuit, or collection or portion thereof, with processing capability such that it can execute instructions. A processor may be or comprise any kind of general purpose or dedicated processor, such as a CPU, GPU, NNA, System-on-chip, state machine, media processor, an application-specific integrated circuit (ASIC), a programmable logic array, a field-programmable gate array (FPGA), or the like. A computer or computer system may comprise one or more processors.
It is also intended to encompass software which defines a configuration of hardware as described herein, such as HDL (hardware description language) software, as is used for designing integrated circuits, or for configuring programmable chips, to carry out desired functions. That is, there may be provided a computer readable storage medium having encoded thereon computer readable program code in the form of an integrated circuit definition dataset that when processed (i.e. run) in an integrated circuit manufacturing system configures the system to manufacture a hardware 400, 700 and 900 configured to perform any of the methods described herein, or to manufacture a hardware 400, 700 and 900 comprising any apparatus described herein. An integrated circuit definition dataset may be, for example, an integrated circuit description.
Therefore, there may be provided a method of manufacturing, at an integrated circuit manufacturing system, a hardware 400, 700 and 900 as described herein. Furthermore, there may be provided an integrated circuit definition dataset that, when processed in an integrated circuit manufacturing system, causes the method of manufacturing a hardware 400, 700 and 900 to be performed.
An integrated circuit definition dataset may be in the form of computer code, for example as a netlist, code for configuring a programmable chip, as a hardware description language defining hardware suitable for manufacture in an integrated circuit at any level, including as register transfer level (RTL) code, as high-level circuit representations such as Verilog or VHDL, and as low-level circuit representations such as OASIS (RTM) and GDSII. Higher level representations which logically define hardware suitable for manufacture in an integrated circuit (such as RTL) may be processed at a computer system configured for generating a manufacturing definition of an integrated circuit in the context of a software environment comprising definitions of circuit elements and rules for combining those elements in order to generate the manufacturing definition of an integrated circuit so defined by the representation. As is typically the case with software executing at a computer system so as to define a machine, one or more intermediate user steps (e.g. providing commands, variables etc.) may be required in order for a computer system configured for generating a manufacturing definition of an integrated circuit to execute code defining an integrated circuit so as to generate the manufacturing definition of that integrated circuit.
An example of processing an integrated circuit definition dataset at an integrated circuit manufacturing system so as to configure the system to manufacture a hardware 400, 700 and 900 will now be described with respect to
The layout processing system 1204 is configured to receive and process the IC definition dataset to determine a circuit layout. Methods of determining a circuit layout from an IC definition dataset are known in the art, and for example may involve synthesising RTL code to determine a gate level representation of a circuit to be generated, e.g. in terms of logical components (e.g. NAND, NOR, AND, OR, MUX and FLIP-FLOP components). A circuit layout can be determined from the gate level representation of the circuit by determining positional information for the logical components. This may be done automatically or with user involvement in order to optimise the circuit layout. When the layout processing system 1204 has determined the circuit layout it may output a circuit layout definition to the IC generation system 1206. A circuit layout definition may be, for example, a circuit layout description.
The IC generation system 1206 generates an IC according to the circuit layout definition, as is known in the art. For example, the IC generation system 1206 may implement a semiconductor device fabrication process to generate the IC, which may involve a multiple-step sequence of photo lithographic and chemical processing steps during which electronic circuits are gradually created on a wafer made of semiconducting material. The circuit layout definition may be in the form of a mask which can be used in a lithographic process for generating an IC according to the circuit definition. Alternatively, the circuit layout definition provided to the IC generation system 1206 may be in the form of computer-readable code which the IC generation system 1206 can use to form a suitable mask for use in generating an IC.
The different processes performed by the IC manufacturing system 1202 may be implemented all in one location, e.g., by one party. Alternatively, the IC manufacturing system 1202 may be a distributed system such that some of the processes may be performed at different locations and may be performed by different parties. For example, some of the stages of: (i) synthesising RTL code representing the IC definition dataset to form a gate level representation of a circuit to be generated, (ii) generating a circuit layout based on the gate level representation, (iii) forming a mask in accordance with the circuit layout, and (iv) fabricating an integrated circuit using the mask, may be performed in different locations and/or by different parties.
In other examples, processing of the integrated circuit definition dataset at an integrated circuit manufacturing system may configure the system to manufacture a hardware 400, 700 and 900 without the IC definition dataset being processed so as to determine a circuit layout. For instance, an integrated circuit definition dataset may define the configuration of a reconfigurable processor, such as an FPGA, and the processing of that dataset may configure an IC manufacturing system to generate a reconfigurable processor having that defined configuration (e.g., by loading configuration data to the FPGA).
In some embodiments, an integrated circuit manufacturing definition dataset, when processed in an integrated circuit manufacturing system, may cause an integrated circuit manufacturing system to generate a device as described herein. For example, the configuration of an integrated circuit manufacturing system in the manner described above with respect to
In some examples, an integrated circuit definition dataset could include software which runs on hardware defined at the dataset or in combination with hardware defined at the dataset. In the example shown in
The implementation of concepts set forth in this application in devices, apparatus, modules, and/or systems (as well as in methods implemented herein) may give rise to performance improvements when compared with known implementations. The performance improvements may include one or more of increased computational performance, reduced latency, increased throughput, and/or reduced power consumption. During manufacture of such devices, apparatus, modules, and systems (e.g., in integrated circuits) performance improvements can be traded-off against the physical implementation, thereby improving the method of manufacture. For example, a performance improvement may be traded against layout area, thereby matching the performance of a known implementation but using less silicon. This may be done, for example, by reusing functional blocks in a serialised fashion or sharing functional blocks between elements of the devices, apparatus, modules and/or systems. Conversely, concepts set forth in this application that give rise to improvements in the physical implementation of the devices, apparatus, modules, and systems (such as reduced silicon area) may be traded for improved performance. This may be done, for example, by manufacturing multiple instances of a module within a predefined area budget.
The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2316740.6 | Nov 2023 | GB | national |