The present invention relates to a system, method, and computer program product for a divider that divides two numbers represented in binary form. The divider may be implemented in an integrated circuit.
Division is an arithmetic operation that may be regarded as the opposite of multiplication, and is required to solve many different problems. Conceptually, a quotient may be regarded as the number of times a dividend may fully contain a divisor. If a dividend is not evenly divided by a divisor, a remainder will be left over, and may be expressed in various ways. Thus generally, DIVIDEND=(QUOTIENT*DIVISOR)+REMAINDER. The sign of a quotient depends on the signs of the dividend and the divisor, i.e. the quotient is positive if the dividend and divisor have the same sign, but the quotient is negative if they have opposite signs. Division by zero is undefined.
The familiar manual operation of dividing two numbers involves repeated subtraction of the divisor from the dividend. The quotient acts essentially as a counter of the number of times that the divisor may be fully subtracted from the dividend. Division of input numbers may stop when the remainder is less than the divisor, but more generally the process may continue to evaluate the remainder as a fractional number.
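The repeated-subtraction view of division may be illustrated by the following minimal sketch. It is restricted to non-negative integers, and the function name is hypothetical. The loop runs once per unit of quotient, which is why the approach may be slow for large quotients.

```python
from typing import Tuple


def divide_by_repeated_subtraction(dividend: int, divisor: int) -> Tuple[int, int]:
    """Return (quotient, remainder) for dividend >= 0 and divisor > 0."""
    if dividend < 0 or divisor <= 0:
        raise ValueError("this sketch handles dividend >= 0 and divisor > 0 only")
    quotient = 0
    remainder = dividend
    # The quotient counts how many times the divisor fits fully.
    while remainder >= divisor:
        remainder -= divisor
        quotient += 1
    return quotient, remainder
```

The returned pair satisfies DIVIDEND=(QUOTIENT*DIVISOR)+REMAINDER with the remainder less than the divisor.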
Division of numbers represented in binary form is also required for a variety of purposes. For example, digital computers have microprocessors that perform binary division, among other operations, in their arithmetic logic units. Digital signal processing circuitry often requires binary division as well. Conventional binary dividers follow the familiar ‘repeated subtraction’ algorithm, and as a result have a number of disadvantages, chiefly that their operation may be very time-consuming.
Accordingly, the inventor has identified a need in the art for an improved binary divider.
Embodiments of the invention provide a system, method, and computer program product for dividing two binary numbers. In some instances, a full quotient may not need to be computed at all, as a partial quotient may be sufficient for some purposes. Fast production of a partial quotient may be of particular utility in certain applications, such as digital signal processing.
A fast divider embodiment is therefore provided that may rapidly calculate a partial quotient that is only an approximation of the full quotient value. For example, only 24 bits of a full 64-bit quotient may be calculated, but those 24 bits are the most significant ones. Such an embodiment may provide a fast result of limited accuracy, yet with a large dynamic range. An alternate embodiment is also described that provides a full quotient in a variety of binary formats.
As will be described below, the fast divider may implement a fixed point division function by using a floating point normalization architecture to yield the closest initial quotient approximation. Briefly, the fast divider may normalize its inputs, perform a 24-bit divide operation, and multiply/shift the output to its correct value. Normalizing the inputs allows a divider core to be highly optimized, since the inputs it handles may be constrained to the range of [0.5, 1.0), i.e. from one-half, inclusive, up to nearly one.
Further, the desired-precision quotient may be computed by a divider core that outputs results in chunks or quotient subsets. Rapid generation of even an initial chunk of a partial quotient may be of great utility in some circumstances. The divider embodiments described may therefore rapidly produce the most significant bits of a quotient subset, and then proceed to generate additional quotient bits of lower significance over further clock cycles until a predetermined quotient precision is reached.
Dividend 102 and divisor 104 may be signed numbers in one embodiment, so the fast divider 100 may next take the absolute value of each input at blocks 106 and 108, and split out a dividend sign bit 110 and a divisor sign bit 112 for subsequent use. The sign bits 110 and 112 may determine the appropriate sign of the quotient, to be described.
The fast divider 100 may include blocks 114 and 116 to normalize the now-unsigned dividend 102 and divisor 104 respectively, so that each number may have a value within a range of [0.5, 1.0), i.e. one-half, inclusive, up to but not including one. In one embodiment, the inputs may be normalized by scaling each number by a necessary factor of two until a most significant bit position of each number is a “1”. Binary numbers may be easily changed in value by a factor of two by shifting digits one bit. Circuit blocks 114 and 116 may thus shift dividend 102 and divisor 104 as needed so the scaled value of each lies within the desired range. Data describing exponents obtained therefrom may be passed to an adder 118. The adder 118 may subtract the exponents (e.g., powers of two) that were utilized in each normalization (e.g., scaling factor 120 for dividend 102 from the scaling factor 122 for divisor 104) to yield a quotient rescaling factor 124, to be described.
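The normalization and exponent bookkeeping described above may be sketched as follows. This is an illustration only: the function names and the `width` parameter are assumptions, and a software left shift stands in for circuit blocks 114 and 116.

```python
from typing import Tuple


def normalize(value: int, width: int) -> Tuple[int, int]:
    """Left-shift a positive integer until bit (width - 1) is a 1.

    Returns (normalized, shift). Read as a fraction with `width` fractional
    bits, the normalized value lies in [0.5, 1.0).
    """
    if value <= 0:
        raise ValueError("value must be positive (take the absolute value first)")
    if value >= (1 << width):
        raise ValueError("value does not fit in the given width")
    shift = width - value.bit_length()
    return value << shift, shift


def rescale_exponent(dividend_shift: int, divisor_shift: int) -> int:
    """Power of two by which the normalized quotient must be multiplied.

    dividend/divisor == (n_norm / d_norm) * 2**(divisor_shift - dividend_shift)
    """
    return divisor_shift - dividend_shift
```

For instance, with `width=8` the value 5 (binary 101) is shifted left by five places to binary 10100000, which reads as 0.625.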
In one embodiment, the normalized dividend may be a 45-bit number, while the normalized divisor may be a 23-bit number. The normalized inputs may be submitted to a divider core 126 that performs division. Different embodiments of divider core 126 are described below in reference to
The output 132 of divider core 126 is a quotient of the normalized dividend and normalized divisor. In one embodiment, the output of the system may be a 24-bit number representing a quotient of the normalized inputs. The quotient may be retrieved from a storage unit within the divider core 126 that stores the quotient. In other embodiments, the divider core output 132 may be established over multiple clock cycles of operation, wherein a predetermined number or “chunk” of the overall number of desired quotient bits may be calculated per clock cycle.
As will be described, a first chunk of the quotient bits to be calculated may be generated by the divider core 126 quite rapidly, e.g. within one bus clock cycle. This subset of the quotient bits may comprise the most significant quotient bits, so that a most accurate approximation of the quotient may be produced as an initial output. Additional bits of less significance may be generated later, a process that may continue until a desired level of quotient precision is achieved. The divider core 126 thus may be designed to meet particular needs by progressively outputting quotient bits in decreasing order of significance. In a system that calculates six quotient bits per clock cycle for example, the six most significant bits would be available after one clock cycle, the twelve most significant bits after two clock cycles and so on. All twenty-four exemplary quotient bits would be available after four clock cycles.
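The progressive, most-significant-first output schedule may be illustrated with the sketch below. A plain compare-and-subtract loop stands in for the divider core, and each group of `chunk_bits` iterations stands in for one clock cycle. The function name, the parameter names, and the restriction to a dividend smaller than the divisor are assumptions made to keep the example short.

```python
from typing import Iterator, Tuple


def quotient_chunks(dividend: int, divisor: int,
                    total_bits: int = 24,
                    chunk_bits: int = 6) -> Iterator[Tuple[int, int]]:
    """Yield (bits_so_far, partial_quotient) once per simulated clock cycle.

    Assumes 0 <= dividend < divisor, so the quotient is a pure fraction and
    partial_quotient / 2**bits_so_far approximates dividend / divisor.
    """
    if not 0 <= dividend < divisor:
        raise ValueError("this sketch assumes 0 <= dividend < divisor")
    remainder = dividend
    quotient = 0
    for bit_index in range(1, total_bits + 1):
        # Produce one quotient bit, most significant first.
        remainder <<= 1
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
        # Report the bits accumulated so far at the end of each chunk.
        if bit_index % chunk_bits == 0 or bit_index == total_bits:
            yield bit_index, quotient


for bits, partial in quotient_chunks(1, 3):
    print(bits, partial / (1 << bits))
```

Each yielded value refines the previous one, so a consumer that needs only a coarse result may stop after the first chunk.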
The output 132 of divider core 126 may be scaled by quotient rescaling factor 124 to compensate for any prior input scaling during normalization. As with the normalizing circuits previously defined, a shift register 134 may scale the divider core output by powers of two, by shifting the output an appropriate number of bits in the appropriate direction. In one embodiment, the scaled output may be a 64-bit number.
The scaled output may be converted to a signed number as necessary by signer 136 based on the dividend sign bit 110 and the divisor sign bit 112. In one embodiment, logic 138 performs an exclusive-OR operation on the dividend sign bit 110 and the divisor sign bit 112 to determine the quotient sign, e.g., if only one of the dividend and divisor is negative then the quotient should be negative, otherwise it is positive. The signed scaled output may be stored in a register 140.
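The exclusive-OR sign rule may be sketched as follows. The function name and the use of integer sign bits (1 for negative, 0 for non-negative) are assumptions for illustration.

```python
def apply_quotient_sign(magnitude: int, dividend_sign: int, divisor_sign: int) -> int:
    """Attach a sign to an unsigned quotient magnitude.

    Each sign argument is a single bit: 1 for negative, 0 for non-negative.
    """
    quotient_sign = dividend_sign ^ divisor_sign  # 1 only when the signs differ
    return -magnitude if quotient_sign else magnitude
```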
The signed scaled output may also be loaded into a saturating accumulator 142 for output as the quotient. Saturating arithmetic limits or “clamps” the values of processed numbers to maximum or minimum range limits during arithmetic operations. This behavior may avoid the often unrealistically drastic changes in numerical value that may result from modular arithmetic “wrap around”. For example, in an 8-bit register, 256 numerical values are available to describe a physical measurement that normally ranges from zero to 255 at most. If the measurement exceeds the expected range, the register may for example “roll over” to zero during overflow, producing a highly misleading numerical description because the true most significant bit is discarded. Similar problems may occur with underflow and the storage of negative numerical values. Saturating accumulator 142 therefore may effectively reformat the signed scaled output to best represent the quotient value with the available register size. The use of saturating accumulators is optional.
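The difference between saturation and wrap-around may be shown with a short sketch. The function names are hypothetical, and Python integers stand in for fixed-width registers.

```python
def saturate(value: int, bits: int, signed: bool = True) -> int:
    """Clamp value to the range representable in a register of `bits` bits."""
    if signed:
        low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        low, high = 0, (1 << bits) - 1
    return max(low, min(high, value))


def wrap(value: int, bits: int) -> int:
    """Unsigned modular behavior: bits above the register width are discarded."""
    return value & ((1 << bits) - 1)
```

In the 8-bit example above, a measurement of 256 wraps to zero but saturates to 255.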
Referring now to
In step 204, the absolute value of the divisor may be taken, with sign information retained for later use in computing the quotient sign. In step 206, the absolute value of the dividend may be taken, with dividend sign information similarly retained. Sign processing may be performed simultaneously in some embodiments.
The divisor may be normalized in step 208, which may comprise bit-shifting the divisor by a sufficient power of two so that its scaled value is in the range of [0.5, 1.0), i.e. from one half, inclusive, up to but not actually including one. Similarly, the dividend may be normalized in step 210; the normalizing of divisor and dividend need not occur in the exemplary order described, and indeed may be performed simultaneously.
In step 212, the divider core may begin performing division of the processed dividend and divisor. The divider core may generate a divide-by-zero flag in step 214, which may trigger special handling of the output to denote an error condition. For example, the divider 200 may output a particular final value to denote the divide-by-zero error condition. The divider core may otherwise generate a quotient or quotient chunk in step 216.
In step 218, the method may combine the dividend and divisor normalizing factors to generate a rescaling or denormalizing factor to be applied to the quotient. In step 220, the quotient may be scaled, for example by an appropriate power of two through bit-shifting. In step 222, the sign of the quotient may be calculated from the sign of the divisor and the sign of the dividend, e.g. different signs will yield a negative quotient. In step 224, the quotient (or quotient “chunk”, e.g., progressively generated quotient subset) with proper sign may be output.
In step 226, the method may determine if all required quotient bits have been calculated. In one embodiment, a first subset of the quotient bits to be calculated may be generated within one bus clock cycle; these may be the most significant quotient bits. Additional required bits, generally of less significance, may be generated by selectively returning operation from step 226 to the divider core for additional computations. This computation process may continue until a desired level of output quotient precision is achieved.
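The overall flow of the method (absolute values, normalization, core division, rescaling, and sign application) may be sketched end to end as follows. This is a simplified software model under stated assumptions, not the described circuit: the function name, the `width`, `qbits`, and `frac_bits` parameters, and the use of Python integer division as the core are all illustrative, and the divide-by-zero case simply raises an exception rather than producing a flagged output value.

```python
def fast_divide(dividend: int, divisor: int, width: int = 32,
                qbits: int = 24, frac_bits: int = 24) -> int:
    """Approximate dividend/divisor as a fixed-point integer.

    The result carries `frac_bits` fractional bits and roughly `qbits`
    significant bits; lower-order bits are not computed.
    """
    if divisor == 0:
        raise ZeroDivisionError("divide-by-zero flag")
    if dividend == 0:
        return 0
    # Take absolute values and retain sign information.
    negative = (dividend < 0) != (divisor < 0)
    n, d = abs(dividend), abs(divisor)
    if n >= (1 << width) or d >= (1 << width):
        raise ValueError("inputs do not fit in the assumed width")
    # Normalize so the top bit of each operand is a 1.
    n_shift = width - n.bit_length()
    d_shift = width - d.bit_length()
    n_norm, d_norm = n << n_shift, d << d_shift
    # Core division of normalized operands, keeping qbits fractional bits.
    q_norm = (n_norm << qbits) // d_norm
    # Combine the normalizing factors and rescale by a power of two.
    exponent = (d_shift - n_shift) + (frac_bits - qbits)
    q = q_norm << exponent if exponent >= 0 else q_norm >> -exponent
    # Apply the quotient sign.
    return -q if negative else q
```

Because only about `qbits` significant bits are produced before rescaling, the low-order bits of a wide result are zero-filled rather than exact, which mirrors the limited-accuracy, large-dynamic-range trade-off described earlier.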
Referring now to
Divider core 300 may comprise a number of shift/add blocks 302. Each shift/add block 302 may perform this operation:
Each shift/add block 302 thus calculates an output carry bit carry_out and an output sum bit sum_out. The carry_out and sum_out bits are calculated from an input carry bit carry_in, an input sum bit sum_in, a dividend bit, and the divisor. If the carry_in bit is equal to one, then the divisor is added to the dividend bit and twice the sum_in bit. Otherwise, if the carry_in bit is equal to zero, then the divisor is subtracted from the dividend bit and twice the sum_in bit.
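The add-or-subtract selection described above resembles textbook non-restoring division, in which each step doubles the running partial remainder, brings in the next dividend bit, and then either adds or subtracts the divisor. The sketch below implements that textbook technique for unsigned integers; it is not a model of shift/add block 302. The function name is hypothetical, the sign of the running remainder is used as the selector, and the polarity of that selector relative to carry_in is an assumption.

```python
from typing import Tuple


def nonrestoring_divide(dividend: int, divisor: int, bits: int) -> Tuple[int, int]:
    """Return (quotient, remainder) for 0 <= dividend < 2**bits, divisor > 0."""
    if divisor <= 0 or not 0 <= dividend < (1 << bits):
        raise ValueError("operands out of range for this sketch")
    remainder = 0
    quotient = 0
    for i in range(bits - 1, -1, -1):
        dividend_bit = (dividend >> i) & 1
        shifted = 2 * remainder + dividend_bit
        # Add the divisor after a negative partial remainder, else subtract it.
        if remainder < 0:
            remainder = shifted + divisor
        else:
            remainder = shifted - divisor
        # One quotient bit per step: 1 when the new remainder is non-negative.
        quotient = (quotient << 1) | (1 if remainder >= 0 else 0)
    if remainder < 0:
        remainder += divisor  # final correction step
    return quotient, remainder
```

Each iteration performs exactly one addition or subtraction, with no restoring step, which is what makes the per-bit operation uniform enough to chain several steps within a clock cycle.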
The carry_out bit from a shift/add block 302 may represent a quotient bit. Thus 64 shift/add blocks would be required for processing a 64-bit quotient in a single circuit. In one exemplary implementation only six shift/add blocks can complete their operation within one clock cycle. Therefore, in that embodiment divider core 300 comprises a sequential machine built with a set of six shift/add blocks 302 as shown, to produce a chunk of six quotient bits during a clock cycle, and registers 308 and 310 for holding the final carry and sum values from the end of each pass.
At the beginning of every subsequent clock period, the previous final carry and sum values may be fed back into the set of shift/add blocks 302 so a further chunk of six more quotient bits may be computed. The correct dividend bits and quotient bits may be sequenced through shift registers 304 and 306. The registers may all be controlled from a simple state machine (not shown).
Referring now to
Input bus signals 402 may be processed by a decoder 404 to yield a dividend and divisor, stored in registers 406 and 408 respectively. The AHB is an exemplary type of bus, though others as may be known in the art are also applicable. The mode of full divider 400 may be selected by control logic 410 according to the order in which its registers are written to, e.g. the most recently written dividend register may determine what data is used for the dividend 412. Similarly, writing to divisor register 408 may trigger a start signal 414 to direct divider core 416 to begin its operations.
Divider core 416 may process the dividend and divisor and yield a quotient as well as a remainder, a division completed signal, and a possible divide-by-zero error signal. The divider core exemplary inputs and outputs, corresponding register names, and data formats may be summarized by the following table:
The multiple-format full divider 400 may operate in fixed point division modes or in an integer division mode. In the fixed point mode, the full divider may support the division of either two single precision (SP) numbers, or the division of a double precision (DP) number by a single precision number, with the results being placed in an extended double precision register (ACC).
When performing single precision division, full divider 400 may support the following format operations, where x.y denotes the number of bits x before the binary point and the number of bits y after the binary point:
8.24=8.24/8.24
8.56=8.24/8.24
24.56=8.24/8.24
When performing double precision division, full divider 400 may support the following format operations:
8.24=8.56/8.24
8.56=8.56/8.24
24.56=8.56/8.24
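The x.y format operations listed above may be illustrated with a sketch that tracks only the number of fractional bits of each operand and of the result. The helper names are hypothetical, integer truncation toward zero is an assumption, and no saturation to the destination width is applied here.

```python
def to_fixed(x: float, frac_bits: int) -> int:
    """Encode a real number as a raw integer with frac_bits fractional bits."""
    return round(x * (1 << frac_bits))


def fixed_divide(a: int, a_frac: int, b: int, b_frac: int, out_frac: int) -> int:
    """Divide raw fixed-point a by b, returning out_frac fractional bits."""
    if b == 0:
        raise ZeroDivisionError("divide-by-zero")
    # a/b on raw integers has (a_frac - b_frac) fractional bits; pre-shift
    # one operand so the integer quotient lands on out_frac fractional bits.
    shift = out_frac - a_frac + b_frac
    num, den = abs(a), abs(b)
    if shift >= 0:
        num <<= shift
    else:
        den <<= -shift
    q = num // den
    return -q if (a < 0) != (b < 0) else q
```

Under these assumptions, 8.24=8.24/8.24 corresponds to `fixed_divide(a, 24, b, 24, 24)` and 8.24=8.56/8.24 corresponds to `fixed_divide(a, 56, b, 24, 24)`.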
Performing an 8.24/8.24 division will potentially result in a 33.28 number. The resulting number may thus be mapped into a saturating accumulator (24.56), a saturated double precision register format (8.56) and a saturated single precision register format (8.24). In the single precision register any extra fractional bits may be truncated.
Performing an 8.56/8.24 division will potentially result in a 33.28 number. As with the case of dividing two single precision numbers, the resulting number may be thus mapped into a saturating accumulator (24.56), a saturated double precision register (8.56) and a saturated single precision register format (8.24). In the single precision register any extra fractional bits may again be truncated. In fixed point mode, there may be a remainder.
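Mapping a wide intermediate result into a narrower saturated format, with truncation of extra fractional bits, may be sketched as follows. The function name is hypothetical, and two assumptions are made: the integer-bit count of a format includes the sign bit (so 8.24 occupies 32 bits), and truncation is toward zero.

```python
def fit_format(raw: int, raw_frac: int, int_bits: int, frac_bits: int) -> int:
    """Convert a signed raw fixed-point value to an int_bits.frac_bits format.

    Extra fractional bits are truncated; out-of-range values are clamped.
    """
    if raw_frac >= frac_bits:
        # Drop low-order fractional bits, truncating toward zero.
        magnitude = abs(raw) >> (raw_frac - frac_bits)
        value = -magnitude if raw < 0 else magnitude
    else:
        # Widen the fraction by appending zero bits.
        value = raw << (frac_bits - raw_frac)
    total_bits = int_bits + frac_bits
    high = (1 << (total_bits - 1)) - 1
    low = -(1 << (total_bits - 1))
    return max(low, min(high, value))
```

A value of 200.0 held with 28 fractional bits, for example, exceeds the 8.24 range and clamps to the largest 32-bit signed value, while the same value fits a 24.56 accumulator unchanged in magnitude.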
In integer division mode, full divider 400 may support the division of a signed 64-bit number by a signed 32-bit number. The result may be a 64-bit number. In integer mode, there will be a 32-bit remainder.
The full divider may be configured as a signed divider. The different modes of the full divider may be mapped as a number of virtual registers that sit on top of a smaller number of real registers.
In the case of a division by zero, the result may be a saturated output quotient of plus or minus a maximum numerical value, with a remainder of zero. In integer mode, such an output may be one of 2^63 − 1 and −2^63. In fixed point mode, such a saturated output may be plus or minus the maximum value of an 8.56 or a 24.56 formatted number.
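Integer-mode division with the saturated divide-by-zero result may be sketched as follows. Several details are assumptions not stated in the description: the function name, truncation of the quotient toward zero, a remainder that takes the sign of the dividend, and the choice of the saturated sign from the sign of the dividend.

```python
from typing import Tuple

INT64_MAX = (1 << 63) - 1
INT64_MIN = -(1 << 63)


def integer_divide(dividend: int, divisor: int) -> Tuple[int, int]:
    """Return (quotient, remainder) with a saturated result on divide-by-zero."""
    if divisor == 0:
        # Assumption: the saturated sign follows the sign of the dividend.
        return (INT64_MIN if dividend < 0 else INT64_MAX), 0
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor
    # Clamp the one overflowing case (INT64_MIN divided by -1).
    quotient = max(INT64_MIN, min(INT64_MAX, quotient))
    return quotient, remainder
```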
The quotient register 418 and remainder register 420 may only be updated when the full divider reports that it has completed its operations. The most recently written dividend data may be retained after a division is complete, so that if the dividend data has not changed the user need not re-write the dividend for the next division operation.
While particular embodiments of the present invention have been described, it is to be understood that various different modifications within the scope and spirit of the invention are possible. The invention is limited only by the scope of the appended claims.
As described above, one aspect of the present invention relates to a fast binary divider. The provided description is presented to enable any person skilled in the art to make and use the invention. For purposes of explanation, specific nomenclature is set forth to provide a thorough understanding of the present invention. Descriptions of specific applications and methods are provided only as examples. Various modifications to the preferred embodiments will be readily apparent to those skilled in the art and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the invention. Thus the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and steps disclosed herein.
As used herein, the terms “a” or “an” shall mean one or more than one. The term “plurality” shall mean two or more than two. The term “another” is defined as a second or more. The terms “including” and/or “having” are open ended (e.g., comprising). Reference throughout this document to “one embodiment”, “certain embodiments”, “an embodiment” or similar term means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of such phrases in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner on one or more embodiments without limitation. The term “or” as used herein is to be interpreted as inclusive or meaning any one or any combination. Therefore, “A, B or C” means “any of the following: A; B; C; A and B; A and C; B and C; A, B and C”. An exception to this definition will occur only when a combination of elements, functions, steps or acts are in some way inherently mutually exclusive.
In accordance with the practices of persons skilled in the art of computer programming, embodiments are described with reference to operations that may be performed by a computer system or a like electronic system. Such operations are sometimes referred to as being computer-executed. It will be appreciated that operations that are symbolically represented include the manipulation by a processor, such as a central processing unit, of electrical signals representing data bits and the maintenance of data bits at memory locations, such as in system memory, as well as other processing of signals. The memory locations where data bits are maintained are physical locations that have particular electrical, magnetic, optical, or organic properties corresponding to the data bits.
When implemented in software, the elements of the embodiments are essentially the code segments to perform the necessary tasks. The non-transitory code segments may be stored in a processor readable medium or computer readable medium, which may include any medium that may store or transfer information. Examples of such media include an electronic circuit, a semiconductor memory device, a read-only memory (ROM), a flash memory or other non-volatile memory, a floppy diskette, a CD-ROM, an optical disk, a hard disk, a fiber optic medium, etc. User input may include any combination of a keyboard, mouse, touch screen, voice command input, etc. User input may similarly be used to direct a browser application executing on a user's computing device to one or more network resources, such as web pages, from which computing resources may be accessed.