The present disclosure relates generally to processor systems and, more particularly, to methods, apparatus, and articles of manufacture for determining quotient values within processor systems.
It is often desirable to determine quotient values based on integer values that are stored in floating-point registers. Integer-based processor systems often include floating-point registers for use with floating-point instructions. In some instances, integer values are stored in the floating-point registers, which requires that floating-point instructions be used to operate on them. An alternative to using floating-point instructions to operate on integer values stored in floating-point registers involves copying the integer values to an integer register to enable the use of integer-based instructions. However, copying the integer values from one register to another often consumes a significant number of processing cycles, in some cases more than would be consumed by using floating-point instructions to perform the desired operation in the first place.
Base conversion is an example application that often involves determining quotient values based on integer values stored in floating-point registers. In general, processor systems (e.g., computers) store values in a base-two or binary format. The binary values are typically converted to other base formats (e.g., octal format (base-eight), decimal format (base-ten), hexadecimal format (base-sixteen), etc.) prior to being displayed on a monitor. In particular, a processor system is most often used to run applications that display values in a base-ten or decimal format using a binary-to-decimal conversion. Frequent use of such binary-to-decimal conversions in a typical processor system imposes a significant processing load.
A known method for performing a binary-to-decimal conversion repeatedly divides an initial value n (i.e., the value to be converted) by 10 to determine each digit of its base-ten representation. Another known method for performing binary-to-decimal conversion divides the value to be converted by different orders of magnitude. In general, the initial value n may first be divided by $10^6$, followed by a division by $10^5$, followed by divisions by a series of values having successively lower orders of magnitude until a division by $10^1$ is reached and all of the base-ten digits of the initial value n are determined. Of course, the powers of ten used for the divisions may be any sequence of powers such as, for example, a sequence including division by $10^8$, division by $10^4$, division by $10^2$, and division by $10^1$.
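The two known division-based methods can be sketched in Python as follows. This is a minimal illustration using Python's integer `divmod`; the function names are hypothetical, and the second function assumes the number of digits is known in advance. Each digit costs one integer division, which is the cost the techniques described below aim to avoid.

```python
def digits_by_repeated_division(n: int, base: int = 10) -> list[int]:
    """Repeatedly divide by the base; each remainder is one digit (least significant first)."""
    digits = []
    while True:
        n, d = divmod(n, base)
        digits.append(d)
        if n == 0:
            return digits


def digits_by_descending_powers(n: int, num_digits: int, base: int = 10) -> list[int]:
    """Divide by base**(k-1), ..., base**1; each quotient is one digit (most significant first)."""
    digits = []
    for i in range(num_digits - 1, 0, -1):
        d, n = divmod(n, base ** i)  # quotient is the digit, remainder carries forward
        digits.append(d)
    digits.append(n)  # what remains after dividing by base**1 is the ones digit
    return digits


print(digits_by_repeated_division(524))     # [4, 2, 5]
print(digits_by_descending_powers(524, 3))  # [5, 2, 4]
```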
In general, division operations are relatively complex and require a greater number of processing cycles to implement than more basic operations such as, for example, addition operations and multiplication operations. As noted above, using division to perform binary-to-decimal conversions imposes significant processing requirements on a processor system. A known method that is often used to reduce or eliminate division operations is reciprocal multiplication. In particular, reciprocal multiplication determines a quotient value using a multiplication operation based on a dividend value and the reciprocal of a divisor value.
In cases involving integer values stored in integer registers, division by multiplication is often used to determine quotient values. However, implementing division by multiplication based on integer values stored in floating-point registers is typically a cumbersome and inefficient process because known processes require the use of floating-point instructions.
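For integer values held in integer registers, division by multiplication typically replaces 1/m with a scaled integer constant followed by a right shift. The sketch below shows the widely used constant for unsigned 32-bit division by ten, $M = \lceil 2^{35}/10 \rceil$ = 0xCCCCCCCD; the constant, the shift of 35, and the function name `div10` are specific to this illustration and are not part of the floating-point method described herein.

```python
# Division by multiplication for unsigned 32-bit integers and divisor 10.
SHIFT = 35
MAGIC = -(-(1 << SHIFT) // 10)  # ceil(2**35 / 10) == 0xCCCCCCCD


def div10(n: int) -> int:
    """Return n // 10 for 0 <= n < 2**32 using one multiply and one shift."""
    assert 0 <= n < 1 << 32
    return (n * MAGIC) >> SHIFT


print(hex(MAGIC))                     # 0xcccccccd
print(div10(524), div10(2**32 - 1))   # 52 429496729
```

The constant is rounded up, so the scaled product slightly overestimates n/10; for 32-bit operands that overestimate stays below the gap to the next integer, and the shift yields the exact floor.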
Although the following discloses example systems including, among other components, software executed on hardware, it should be noted that such systems are merely illustrative and should not be considered as limiting. For example, it is contemplated that any or all of these hardware and software components could be embodied exclusively in hardware, exclusively in software, or in any combination of hardware and software. Accordingly, while the following describes example systems, persons of ordinary skill in the art will readily appreciate that the examples provided are not the only way to implement such systems.
The example methods and apparatus described herein may be used to determine quotient values based on integer values stored in a floating-point data type format (e.g., in a floating-point register). In particular, the example methods and apparatus use multiplication to determine quotient values by identifying a reciprocal value of a divisor value, identifying a bias value, and determining a biased quotient value based on a dividend value, the reciprocal value, and at least a portion of the bias value (i.e., a truncation bias value described below in connection with
The example methods and apparatus use multiplication and addition operations to determine quotient and remainder values by biasing one or more intermediate values to achieve results that are more accurate and precise than those provided by known techniques based on use of division operations. Further, multiplication and addition operations are less complex and require fewer processor cycles than division operations. In one example, the methods and apparatus are configured to use a fused multiply accumulate (FMA) instruction (i.e., a floating-point multiply add instruction, a floating-point multiply accumulate instruction, etc.) to determine quotient values. The FMA instruction integrates the operations associated with a multiplication and an addition within one instruction that, when executed, performs only one rounding operation as described below. Additionally, the FMA instruction may be implemented by (e.g., executed by) general purpose processors such as, for example, the Intel® Itanium® processor. In another example, the methods and apparatus may be configured to use separate (i.e., non-integrated) multiplication and addition instructions or operations to determine quotient values.
Before describing in detail the example methods and apparatus, it is important to understand the physical representation of a floating-point data type. In particular, the methods and apparatus described herein may be used in combination with floating-point data types that conform with the IEEE Floating-Point Standard 754-1985. Floating-point values may be used to provide a binary representation of decimal values and are typically represented using various standard formats having corresponding precisions. However, one of ordinary skill in the art will recognize that the floating-point data type may be adapted to suit the particular needs of certain applications. In general, the numeric range of a floating-point data type is based on the IEEE Floating-Point Standard 754-1985.
It is well known in the art that the precision of a floating-point data type is related to the mantissa bitfield 106 and a leading implicit bit (i.e., not physically represented). In normalized floating-point values, the leading bit is a binary one (i.e., 1.f), which is separated from the bits in the mantissa bitfield 106 by an implicit radix point or binary point. The leading bit and the binary point are implicit and, thus, are not stored in any of the bit locations of the example binary representation 100 of the floating-point data type of
The floating-point data type also uses a floating-point bias value (i.e., fp-bias value). The fp-bias value of a double-precision floating-point data type is equal to one thousand twenty-three. Additionally, the fp-bias value is equal to half of the range of the exponent bitfield 104 minus one (e.g., $2^{11}/2 - 1 = 1023$ for the 11-bit exponent bitfield of the double-precision data type).
A numeric conversion of a floating-point encoded value (e.g., the example binary representation 100) typically involves determining a numeric equivalent value of the floating-point encoded value. A floating-point encoded value (e.g., an integer value) may be, for example, any value that is encoded according to the example binary representation 100 of the floating-point data type. Determining the equivalent numeric value of a floating-point encoded value involves a numeric conversion process based on the sign bit 102, the exponent bitfield 104, the mantissa bitfield 106, and a floating-point exponent value (i.e., fp-exponent value), which is determined based on the fp-bias value and the value stored in the exponent bitfield 104 as described below.
The sign bit 102 specifies the sign of the equivalent numeric value. The exponent bitfield 104 and the mantissa bitfield 106 specify the magnitude and fractional portion of the equivalent numeric value. In particular, the fp-exponent value is determined by subtracting the fp-bias value from the value stored in the exponent bitfield 104 (i.e., e−fpbias). In general, the fp-bias value is selected so that positive and negative exponent values can be generated based on the value stored in the exponent bitfield 104. More specifically, the numeric conversion of a floating-point encoded value may be performed according to Equation 1 below.
$V_{FP} = (-1)^{\mathit{signbit}} \cdot 2^{e - \mathit{fpbias}} \cdot (1.f)$ Equation 1
As shown in Equation 1, the numeric equivalent value of a floating-point encoded value is represented by the variable $V_{FP}$. As depicted in Equation 1 above, $V_{FP}$ is determined by multiplying the value 1.f (i.e., the implicit leading one followed by the bits stored in the mantissa bitfield 106) by two raised to a power equal to the fp-exponent value, and then multiplying the result by negative one raised to a power equal to the value stored in the sign bit 102.
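The field layout and Equation 1 can be checked with a short Python sketch that unpacks an IEEE 754 double using the standard `struct` module. The helper names are hypothetical, and the sketch only handles normalized values (not zero, subnormals, infinities, or NaNs).

```python
import struct

EXP_BITS, FRAC_BITS = 11, 52            # IEEE 754 double precision
FP_BIAS = (1 << EXP_BITS) // 2 - 1     # 2**11 / 2 - 1 == 1023


def fields(x: float) -> tuple[int, int, int]:
    """Split a double into (sign bit, exponent bitfield, mantissa bitfield)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    mantissa = bits & ((1 << FRAC_BITS) - 1)
    return sign, exponent, mantissa


def value_from_fields(sign: int, exponent: int, mantissa: int) -> float:
    """Equation 1 for normalized values: (-1)**sign * 2**(e - fpbias) * (1.f)."""
    one_point_f = 1 + mantissa / (1 << FRAC_BITS)  # implicit leading 1, then the fraction bits
    return (-1) ** sign * 2.0 ** (exponent - FP_BIAS) * one_point_f


print(FP_BIAS)                             # 1023
print(fields(524.0))                       # (0, 1032, 105553116266496)
print(value_from_fields(*fields(-0.75)))   # -0.75
```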
The FMA instruction, when executed, performs a multiplication operation and an addition operation as one instruction with only one rounding operation. The FMA instruction, which is expressed in the Intel® IA-64 Architecture instruction set as fma.pc.sf f1=f3,f4,f2 is represented mathematically as set forth in Equation 2 below.
f1=(f3×f4)+f2 Equation 2
When executed, the FMA instruction first determines the product of the values f3 and f4 to infinite precision. As is well known in the art, a binary floating-point value may be represented as having infinite precision by generating a value that represents exactly the result of an operation. In other words, infinite precision implies that the resultant value of an operation is not rounded. The value f2 is then added to the product, again producing a result of infinite precision. The resulting value of the addition operation is then rounded to a desired precision and stored as the value f1. In the Intel® IA-64 Architecture, the desired precision is indicated by the parameter pc of the FMA instruction and may be set to single precision or double precision. Because the product of the multiplication operation is not rounded, the FMA instruction maintains an infinitely precise intermediate value throughout its execution and rounds only once. In this manner, the method of
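The single rounding of the FMA instruction can be emulated in Python with exact rational arithmetic from the standard `fractions` module: the product and sum are formed exactly as a `Fraction`, and converting to `float` rounds once to the nearest double. The helper `fma` is hypothetical and stands in for the hardware instruction; the example contrasts it with a discrete multiply followed by a discrete add.

```python
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: (a*b)+c computed exactly, then rounded once to double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # infinite-precision product and sum
    return float(exact)                               # single round-to-nearest-even step


fused = fma(0.1, 10.0, -1.0)  # the exact product 1 + 2**-54 survives until the final rounding
unfused = 0.1 * 10.0 - 1.0    # the product is rounded to 1.0 before the subtraction
print(fused == 2.0 ** -54, unfused)  # True 0.0
```

The stored value 0.1 is slightly larger than one tenth, so its exact product with 10.0 is $1 + 2^{-54}$; only the fused form keeps that excess.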
In contrast to using the FMA instruction, implementing a multiply accumulate operation using a discrete multiplication instruction and a discrete addition instruction results in two rounding operations. In some cases, using this method to generate quotient and remainder values may result in values that are relatively less accurate and precise than those determined using the above-described FMA instruction. However, using discrete multiplication and addition instructions in the manner described below may produce useful results for applications having less stringent accuracy and precision requirements.
The example method may be arranged in two phases including a design phase 202 and a runtime phase 204. The design phase 202 may occur prior to the runtime phase 204 and is associated with the time at which software or firmware code (e.g., machine accessible and executable code or instructions) is developed (e.g., written, generated, etc.). The software or firmware code may be developed by a programmer or a software developer, which may be a person or an application executed on, for example, the processor system 610 (
The runtime phase 204 typically occurs any time after the design phase 202 and is associated with a time at which operations for generating, determining, or otherwise identifying a quotient value may be performed. In particular, the instructions and predetermined values stored in memory during the design phase 202 may be used to determine quotient and remainder values as described below.
Although, the design phase 202 and the runtime phase 204 are shown as being separate in
Now turning in detail to
A bias value S0 is then determined (block 208) based on the floating-point data type format of the registers that are used to store the values (e.g., the reciprocal value 1/m, a dividend value n, etc.) during the runtime phase 204. More specifically, the bias value S0 is selected based on the precision of the floating-point data type to select a truncation bias value S as described below. For example, if the operations of the runtime phase 204 are executed on a processor supporting the double precision floating-point data type described above in connection with
A truncation bias value S is then determined (block 210) based on the bias value S0. The truncation bias value S is used to fix the binary points of the values determined during the runtime phase 204 to force an integer truncation. By way of example, an integer truncation causes a value of 3.9 to be truncated to a value of three. More specifically, the truncation bias value S is set equal to at least a portion of the bias value S0 by subtracting one-half from the bias value S0 (i.e., S=S0−0.5). To subtract the value one-half from the bias value S0, the values are first represented in floating-point format and the value one-half is denormalized. Denormalization is well known in the art and may be used to set the fp-exponent value (described above in connection with
After the exponent values are made equal, the value one-half may be subtracted from the bias value S0. Subtracting one-half from the bias value S0 causes the truncation bias value S to have a floating-point exponent value equal to the exponent value of the bias value S0 minus one (e.g., 51=52−1). Additionally, the bits of the mantissa bitfield 106 (
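A minimal sketch, assuming IEEE 754 double precision and the default round-to-nearest-even mode, of the bias value $S_0 = 2^{52}$ and the truncation bias value $S = S_0 - 0.5$. It shows the exponent values 52 and 51 described above, the all-ones mantissa of S, and the truncation of 3.9 to three. The helper name is hypothetical.

```python
import struct

S0 = 2.0 ** 52   # bias value for a 52-bit mantissa bitfield (double precision)
S = S0 - 0.5     # truncation bias value; exactly representable, so no rounding here


def exponent_and_mantissa(x: float) -> tuple[int, int]:
    """Return (fp-exponent value, mantissa bitfield) of a normalized double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return ((bits >> 52) & 0x7FF) - 1023, bits & ((1 << 52) - 1)


print(exponent_and_mantissa(S0))  # (52, 0)
print(exponent_and_mantissa(S))   # (51, 4503599627370495), i.e. all 52 mantissa bits set
print((3.9 + S) - S0)             # 3.0: adding S rounds 2**52 + 3.4 to 2**52 + 3
```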
After determining the reciprocal value 1/m, the bias value S0, and the truncation bias value S, the values are stored for subsequent use (block 212). For example, the values (i.e., 1/m, S0, and S) may be stored in a memory (e.g., at least one of the system memory 624 and the mass storage memory 625 of
It is then determined if the quotient value is to be determined using a fused multiply accumulate instruction (block 214). If the quotient value is to be determined using an FMA instruction, an FMA instruction is executed (block 216) to determine a biased quotient value Q according to Equation 3 below.
$Q = \left(n \times \frac{1}{m}\right) + S$ Equation 3
As shown in Equation 3 above, the multiplication operation of the FMA instruction determines a product by multiplying a dividend value n by the reciprocal value 1/m to infinite precision. The dividend value n may be any integer value and may be provided during the runtime phase 204. The addition operation of the FMA instruction then adds the truncation bias value S to the product of the multiplication operation (i.e., computes $\left(n \times \frac{1}{m}\right) + S$) to produce a value having infinite precision that is then rounded according to the precision parameter pc of the FMA instruction described above to generate the biased quotient value Q.
The biased quotient value Q includes the quotient value q in the right-most bits of the mantissa bitfield 106 (
The product of the multiplication operation (i.e., $n \times \frac{1}{m}$) is equal to a non-integer intermediate quotient value q′. More specifically, adding the truncation bias value S to the product of the multiplication operation shifts the intermediate quotient value q′ to the right-most bits of the mantissa bitfield 106. In this manner, an integer truncation is performed on the intermediate quotient value q′ to generate the quotient value q in integer form in the right-most bits of the mantissa bitfield 106. Recovering the quotient value q from the biased quotient value Q is described in detail below.
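The design-phase and runtime-phase steps for a single quotient can be sketched in Python as below. This is a sketch under stated assumptions, not the disclosed implementation: the FMA is emulated with `fractions.Fraction` (hypothetical helper `fma`), $S_0 = 2^{52}$ as in the double-precision example, and the reciprocal is rounded upward so that $n \times (1/m)$ always lies slightly above the exact quotient. That last choice is an assumption of the sketch: with an exact reciprocal, an exact quotient k produces the halfway value $S_0 + k - 0.5$, which round-half-to-even resolves to k − 1 whenever k is odd. The sketch also assumes $1 \le m \le n < 2^{32}$, which keeps Q in $[2^{52}, 2^{53})$, where adjacent doubles differ by exactly one.

```python
import math
from fractions import Fraction

S0 = 2.0 ** 52   # bias value (double precision)
S = S0 - 0.5     # truncation bias value


def fma(a, b, c) -> float:
    """Exact a*b + c, rounded once to the nearest double (emulated FMA)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def reciprocal(m: int) -> float:
    """Design phase: a double strictly above 1/m (assumption: reciprocal rounded upward)."""
    r = float(Fraction(1, m))
    if Fraction(r) <= Fraction(1, m):
        r = math.nextafter(r, math.inf)
    return r


def quotient(n: int, recip: float) -> int:
    """Runtime phase: Q = (n * 1/m) + S with one rounding (Equation 3), then q = Q - S0."""
    Q = fma(n, recip, S)
    return int(Q - S0)


u = reciprocal(10)
print(quotient(524, u), quotient(4294967295, u))  # 52 429496729
```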
An alternate method for determining the biased quotient value Q is based on separate multiplication and addition operations. If it is determined at block 214 that the FMA instruction is not used to determine the quotient value q, the non-integer intermediate quotient value q′ is first determined using a multiplication operation (block 218). In particular, the intermediate quotient value q′ is determined by multiplying the dividend value n by the reciprocal value 1/m using a floating-point multiplication instruction. The biased quotient value Q is then determined by adding the intermediate quotient value q′ to the truncation bias value S using a floating-point addition instruction (block 220). As described above, the addition operation performs an integer truncation on the intermediate quotient value q′ and causes the integer quotient value q to be placed in the right-most bits of the biased quotient value Q.
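The difference between the fused path (block 216) and the separate path (blocks 218 and 220) shows up even for 10 divided by 10. In this sketch the reciprocal of ten is the double nearest to 0.1, which lies slightly above 1/10. The separate multiplication rounds 10 × 0.1 to exactly 1.0, so the subsequent addition of S lands on the halfway value $2^{52} + 0.5$ and rounds to the even neighbor $2^{52}$. The FMA is again emulated with `fractions.Fraction`, and the function names are hypothetical.

```python
from fractions import Fraction

S0 = 2.0 ** 52
S = S0 - 0.5
u = 0.1  # reciprocal of m = 10 as stored in a double (slightly above 1/10)


def quotient_fused(n: float) -> float:
    Q = float(Fraction(n) * Fraction(u) + Fraction(S))  # emulated FMA: one rounding
    return Q - S0


def quotient_separate(n: float) -> float:
    q_prime = n * u   # block 218: first rounding
    Q = q_prime + S   # block 220: second rounding
    return Q - S0


print(quotient_fused(10.0), quotient_separate(10.0))  # 1.0 0.0
```

For dividends whose quotient is not an exact integer, such as 524, both paths return the same result; the double rounding matters when the rounded intermediate quotient q′ lands exactly on an integer.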
After the biased quotient value Q is determined at block 216 or block 220, the quotient value q is determined (block 222). In one example, the quotient value q may be recovered using a subtraction operation based on the bias value S0. More specifically, the bias value S0 may be subtracted from the biased quotient value Q using a floating-point subtraction instruction to determine the quotient value q. In another example, the quotient value q may be recovered using a bitfield extraction operation. In particular, the quotient value q may be extracted from the mantissa bitfield 106 (
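Both recovery options for block 222 can be sketched as follows, assuming Q lies in $[2^{52}, 2^{53})$ so that $Q = 2^{52} + q$ and the 52-bit mantissa bitfield holds q directly. The FMA is emulated as before, and the helper names are hypothetical.

```python
import struct
from fractions import Fraction

S0 = 2.0 ** 52
S = S0 - 0.5


def biased_quotient(n: int, recip: float) -> float:
    return float(Fraction(n) * Fraction(recip) + Fraction(S))  # emulated FMA


def quotient_by_subtraction(Q: float) -> int:
    return int(Q - S0)  # exact: both operands lie in [2**52, 2**53)


def quotient_by_extraction(Q: float) -> int:
    bits = struct.unpack(">Q", struct.pack(">d", Q))[0]
    return bits & ((1 << 52) - 1)  # Q = 2**52 + q, so the mantissa bitfield holds q


Q = biased_quotient(524, 0.1)
print(quotient_by_subtraction(Q), quotient_by_extraction(Q))  # 52 52
```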
It is then determined if a remainder value r is to be determined (block 224). If the remainder value r is to be determined, a fused negative multiply accumulate (FNMA) instruction (i.e., a floating-point negative multiply add instruction) may be executed (block 226) based on the quotient value q, the divisor value m, and the dividend value n. The FNMA instruction, which is written in the Intel® IA-64 Architecture instruction set as fnma.pc.sf f1=f3,f4,f2, may be represented mathematically according to Equation 4 below.
f1=−(f3×f4)+f2 Equation 4
As set forth in Equation 4 above, the remainder value r corresponds to the value f1, the dividend value n corresponds to the value f2, the quotient value q corresponds to the value f3, and the divisor value m corresponds to the value f4. In particular, the FNMA instruction first determines the product of the values f3 and f4 to infinite precision. The product is then negated and the value f2 is added to the negated product, again producing a result having infinite precision. The resulting value of the addition operation is then rounded to a desired precision and stored as the value f1 (i.e., the remainder value r). In the Intel® IA-64 Architecture, the desired precision is indicated by the parameter pc of the FNMA instruction and may be set to single precision or double precision. Other instructions or combinations thereof may also be used to recover the remainder value r from the quotient value q such as, for example, a fused multiply subtract (FMS) instruction (i.e., a floating-point multiply subtract instruction), discrete multiplication, negation, addition, and/or subtraction instructions, etc.
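A sketch of the remainder computation of block 226, emulating the single-rounding FNMA of Equation 4 with `fractions.Fraction` (the helper name `fnma` is hypothetical). Because q, m, and n are small integers, the exact result $-(q \times m) + n$ is itself an integer and the final rounding is exact.

```python
from fractions import Fraction


def fnma(f3: float, f4: float, f2: float) -> float:
    """Emulated fused negative multiply-add (Equation 4): -(f3*f4) + f2, rounded once."""
    return float(-(Fraction(f3) * Fraction(f4)) + Fraction(f2))


n, m = 524.0, 10.0
q = 52.0            # quotient recovered from the biased quotient value
r = fnma(q, m, n)   # f3 = q, f4 = m, f2 = n
print(r)            # 4.0
```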
The example method of
Initially, a bias value S0 is determined (block 306). The function of the bias value S0 is identical to the function of the bias value S0 described above in connection with
A truncation bias value S is then determined based on the bias value S0 (block 308). The function of the truncation bias value S is identical to the function of the truncation bias value S described above in connection with
A first reciprocal value $u_i$ may then be determined (block 310). The reciprocal value $u_i$ is determined based on a base value B (i.e., the base to which values are to be converted). For example, if the desired base representation is decimal or base-ten, the base value B is set equal to ten. In particular, the reciprocal value $u_i$ is determined by performing an inverse operation on the base value B raised to the power of a digit index value i (i.e., $u_i = 1/B^i$).
The digit index value i corresponds to a particular digit of a value that is being converted. For example, if the numerical value 524 is being converted from a binary format to a decimal format, the digit index values i of the hundreds position (i.e., 5), the tens position (i.e., 2), and the ones position (i.e., 4) are equal to two, one, and zero, respectively. Therefore, if the hundreds position value is to be converted, the first reciprocal value $u_i$ is set equal to the inverse of $B^2$ (i.e., $u_i = 1/B^2 = 1/100$).
A second reciprocal value $u_{i+1}$ may also be determined based on the base value B (block 312). In particular, the second reciprocal value $u_{i+1}$ may be determined by performing an inverse operation on the base value B raised to the power of the digit index value plus one (i.e., $u_{i+1} = 1/B^{i+1}$).
The values (i.e., the bias value S0, the truncation bias value S, the first reciprocal value $u_i$, and the second reciprocal value $u_{i+1}$) determined in connection with blocks 306, 308, 310, and 312 may then be stored for subsequent use (block 314). For example, if the values are determined during the design phase 302, they may be stored in a memory location for subsequent retrieval during the runtime phase 304. Alternatively, if the values are determined during the runtime phase 304, they may be stored in a memory (e.g., one or both of the system memory 624 and mass storage memory 625 of
A first FMA operation is performed to determine a first biased quotient value $Q_i$ (block 316). More specifically, the first FMA operation is performed according to Equation 5 below.
$Q_i = (n \times u_i) + S$ Equation 5
As shown in Equation 5, the first FMA operation is performed based on a dividend value n, the first reciprocal value $u_i$, and the truncation bias value S. The dividend value n is the numerical value that is to be converted to a different base format (e.g., decimal or base-ten format). For example, the dividend value n may be an integer binary value that is retrieved from memory, stored in a floating-point register, and displayed on, for example, a monitor in base-ten format. The first FMA operation determines a product by multiplying the dividend value n by the first reciprocal value $u_i$ to infinite precision. The truncation bias value S is then added to the product to produce a result having infinite precision. The result is then rounded according to the precision parameter pc of the FMA instruction described above to generate the first biased quotient value $Q_i$.
A second FMA operation is then performed to determine a second biased quotient value $Q_{i+1}$ (block 318). The second FMA operation is performed according to Equation 6 below.
$Q_{i+1} = (n \times u_{i+1}) + S$ Equation 6
As shown in Equation 6, the second FMA operation is performed based on the dividend value n, the second reciprocal value $u_{i+1}$, and the truncation bias value S. The second FMA operation determines a product by multiplying the dividend value n by the second reciprocal value $u_{i+1}$ to infinite precision. The truncation bias value S is then added to the product to produce a result having infinite precision. The result is then rounded to generate the second biased quotient value $Q_{i+1}$.
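A sketch of blocks 310 through 318 for the example value 524 and digit index i = 1, assuming double precision, $S_0 = 2^{52}$, and reciprocals rounded upward (an assumption of this sketch, as in the single-quotient case). The FMA is emulated with `fractions.Fraction`, and the helper names are hypothetical. The two biased quotients carry $\lfloor 524/10 \rfloor = 52$ and $\lfloor 524/100 \rfloor = 5$.

```python
import math
from fractions import Fraction

B = 10
S0 = 2.0 ** 52
S = S0 - 0.5


def reciprocal_up(k: int) -> float:
    """Design phase: a double strictly above 1/k (assumption: reciprocals rounded upward)."""
    r = float(Fraction(1, k))
    if Fraction(r) <= Fraction(1, k):
        r = math.nextafter(r, math.inf)
    return r


def fma(a, b, c) -> float:
    """Emulated FMA: exact a*b + c rounded once to double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


n, i = 524, 1
u_i, u_next = reciprocal_up(B ** i), reciprocal_up(B ** (i + 1))  # 1/B**i, 1/B**(i+1)
Q_i = fma(n, u_i, S)          # Equation 5
Q_next = fma(n, u_next, S)    # Equation 6
print(Q_i - S0, Q_next - S0)  # 52.0 5.0
```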
After the first biased quotient value $Q_i$ and the second biased quotient value $Q_{i+1}$ are determined, a third FMA operation is performed to determine a biased digit value $D_i$ (block 320). The biased digit value $D_i$ includes the value of a digit value d indexed by the digit index value i (i.e., $d_i$). For example, for the numerical value 524, if the index value i is equal to 2, the digit $d_i$ corresponds to the value in the hundreds position (i.e., 5). The third FMA operation is performed according to Equation 7 below.
$D_i = (-B \times Q_{i+1}) + Q_i$ Equation 7
As shown in Equation 7, the third FMA operation is performed based on the base value B, the first biased quotient value $Q_i$, and the second biased quotient value $Q_{i+1}$. Prior to performing the third FMA operation, the base value B may be negated using, for example, a negate operation. The third FMA operation determines a product by multiplying the negated base value −B by the second biased quotient value $Q_{i+1}$. The first biased quotient value $Q_i$ is then added to the product to generate a result that is rounded according to the precision parameter pc of the FMA instruction described above. The rounded value is stored as the biased digit value $D_i$. In an alternate implementation, the biased digit value $D_i$ may be determined using the fused negative multiply accumulate (FNMA) instruction described above in connection with
The digit value $d_i$ may then be determined based on the biased digit value $D_i$ (block 322). In particular, the digit value $d_i$ is determined according to Equation 8 below.
$d_i = D_i + S_0 \cdot (B - 1)$ Equation 8
As shown in Equation 8 above, the digit value $d_i$ may be determined by first subtracting one from the base value B. The result of the subtraction is then multiplied by the bias value S0 to produce a product $S_0 \cdot (B - 1)$. The product is then added to the biased digit value $D_i$ to determine the digit value $d_i$. Equation 8 follows from writing $Q_i = S_0 + \lfloor n/B^i \rfloor$ and $Q_{i+1} = S_0 + \lfloor n/B^{i+1} \rfloor$, so that Equation 7 gives $D_i = \lfloor n/B^i \rfloor - B \lfloor n/B^{i+1} \rfloor - (B - 1) S_0 = d_i - (B - 1) S_0$. The calculation of Equation 8 may be performed using any suitable instruction or instructions such as, for example, the FMA instruction, the FNMA instruction, the FMS instruction, and/or any combination of separate multiplication, addition, negation, and/or subtraction instructions.
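Equations 5 through 8 can be combined into a complete digit extraction, sketched below under several assumptions that go beyond the text. First, with a 53-bit significand the exact value of Equation 7, $d_i - 9 \cdot 2^{52}$ for B = 10, is generally not representable, so this sketch evaluates Equation 7 with a 64-bit significand (the width of the IA-64 floating-point register significand), while Equations 5 and 6 still round to 53 bits. Second, because a quotient of zero can round below $S_0 = 2^{52}$, the leading digit is taken directly as $Q_{N-1} - S_0$ rather than from Equation 7. Reciprocals are rounded upward as before, and `round_to_bits`, `fma`, `reciprocal_up`, and `digits` are hypothetical helpers.

```python
import math
from fractions import Fraction

S0 = Fraction(2 ** 52)          # bias value (double precision)
S = S0 - Fraction(1, 2)         # truncation bias value


def round_to_bits(x: Fraction, p: int) -> Fraction:
    """Round x to the nearest value with a p-bit significand, ties to even."""
    if x == 0:
        return Fraction(0)
    sign, ax = (-1 if x < 0 else 1), abs(x)
    e = ax.numerator.bit_length() - ax.denominator.bit_length()
    if Fraction(2) ** e > ax:   # make 2**e <= ax < 2**(e + 1)
        e -= 1
    scale = Fraction(2) ** (p - 1 - e)
    return sign * Fraction(round(ax * scale)) / scale


def fma(a, b, c, p: int) -> Fraction:
    """Emulated FMA: exact a*b + c, rounded once to a p-bit significand."""
    return round_to_bits(Fraction(a) * Fraction(b) + Fraction(c), p)


def reciprocal_up(k: int) -> Fraction:
    """A 53-bit value strictly above 1/k (assumption: reciprocals rounded upward)."""
    r = float(Fraction(1, k))
    if Fraction(r) <= Fraction(1, k):
        r = math.nextafter(r, math.inf)
    return Fraction(r)


def digits(n: int, B: int = 10) -> list[int]:
    """Digits of n, least significant first, for 1 <= n < 2**32."""
    N = 1
    while B ** N <= n:          # number of digit positions
        N += 1
    Q = [fma(n, reciprocal_up(B ** i), S, 53) for i in range(N)]  # Equations 5 and 6
    out = []
    for i in range(N - 1):
        D = fma(-B, Q[i + 1], Q[i], 64)     # Equation 7 (64-bit significand assumed)
        out.append(int(D + S0 * (B - 1)))   # Equation 8
    out.append(int(Q[N - 1] - S0))          # leading digit: the quotient itself
    return out


print(digits(524))  # [4, 2, 5]
```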
In another example, the biased digit value $D_i$ is biased by the bias value S0, which enables the digit value $d_i$ to be extracted from a bitfield comprising the biased digit value $D_i$. More specifically, the bias value S0 is determined based on the precision of a floating-point data type format and typically results in a bias value S0 having a relatively large magnitude (e.g., $2^{52}$). In general, the relatively large magnitude of the bias value S0 affects the left-most bits of the biased digit value $D_i$. Using the truncation bias value S as shown in Equations 5 and 6 to determine the first and second biased quotient values $Q_i$ and $Q_{i+1}$ causes the digit value $d_i$ to be shifted to the right-most bits of the biased digit value $D_i$. Thus, the bias value S0 does not affect (i.e., change, modify, etc.) the digit value $d_i$, which can be extracted from the right-most bits of the bitfield comprising the biased digit value $D_i$. More specifically, the digit value $d_i$ may be determined by extracting the bits of the mantissa bitfield 106 (
Although, the example method of
The example methods described above in connection with
The data interface 402 may be communicatively coupled to any memory (e.g., one or both of the system memory 624 and the mass storage memory 625 of
The bias value generator 404 is configured to generate the bias value S0 described above in connection with
The truncation bias value generator 406 is configured to generate the truncation bias value S described above in connection with
The reciprocal generator 408 may be configured to generate the reciprocal value 1/m described above in connection with
The fused multiply accumulator 410 is configured to perform fused multiply accumulate operations to generate the biased quotient value Q as described above in connection with block 216 of
The quotient identifier 412 is configured to identify or determine a quotient value q based on the biased quotient value Q as described above in connection with block 222 of
The remainder identifier 414 may be configured to identify or determine the remainder value r based on the quotient value q, the dividend value n, and the divisor value m as described above in connection with block 226 of
The multiplier 502 is configured to determine the intermediate quotient value q′ described above in connection with
The adder 504 is configured to determine the biased quotient value Q described above in connection with
The example systems 400 and 500 described above may be divided into two portions including a design phase (e.g., the design phase 202 of
The processor 612 may be any suitable processor, processing unit or microprocessor such as, for example, a processor from the Intel X-Scale™ family, the Intel Pentium™ family, etc. In the example described in detail below, the processor 612 is a thirty-two bit Intel processor, which is commonly referred to as an IA-32 processor. Although not shown in
The register space 616 may include floating-point registers and/or integer registers. The floating-point registers may be configured to store floating-point values represented in, for example, single precision format, double precision format, and/or any other floating-point format suitable for any particular application.
The processor 612 of
The system memory 624 may include any desired type of volatile and/or non-volatile memory such as, for example, static random access memory (SRAM), dynamic random access memory (DRAM), flash memory, read-only memory (ROM), etc. The mass storage memory 625 may include any desired type of mass storage device including hard disk drives, optical drives, tape storage devices, etc.
The I/O controller 622 performs functions that enable the processor 612 to communicate with peripheral input/output (I/O) devices 626 and 628 via an I/O bus 630. The I/O devices 626 and 628 may be any desired type of I/O device such as, for example, a keyboard, a video display or monitor, a mouse, etc. While the memory controller 620 and the I/O controller 622 are depicted in
The methods described herein may be implemented using instructions stored on a computer readable medium that are executed by the processor 612. The computer readable medium (machine accessible medium) may include any desired combination of solid state, magnetic and/or optical media implemented using any desired combination of mass storage devices (e.g., disk drive), removable storage devices (e.g., floppy disks, memory cards or sticks, etc.) and/or integrated memory devices (e.g., random access memory, flash memory, etc.).
Although certain methods, apparatus, and articles of manufacture have been described herein, the scope of coverage of this patent is not limited thereto. To the contrary, this patent covers all methods, apparatus, and articles of manufacture fairly falling within the scope of the appended claims either literally or under the doctrine of equivalents.