Implementing division in a programmable integrated circuit device

Description

BACKGROUND OF THE INVENTION

This invention relates to implementing division in programmable integrated circuit devices such as, e.g., programmable logic devices (PLDs).

As applications for which PLDs are used increase in complexity, it has become more common to design PLDs to include specialized processing blocks in addition to blocks of generic programmable logic resources. Such specialized processing blocks may include a concentration of circuitry on a PLD that has been partly or fully hardwired to perform one or more specific tasks, such as a logical or a mathematical operation. A specialized processing block may also contain one or more specialized structures, such as an array of configurable memory elements. Examples of structures that are commonly implemented in such specialized processing blocks include: multipliers, arithmetic logic units (ALUs), barrel-shifters, various memory elements (such as FIFO/LIFO/SIPO/RAM/ROM/CAM blocks and register files), AND/NAND/OR/NOR arrays, etc., or combinations thereof.

One particularly useful type of specialized processing block that has been provided on PLDs is a digital signal processing (DSP) block, which may be used to process, e.g., audio signals. Such blocks are frequently also referred to as multiply-accumulate (“MAC”) blocks, because they include structures to perform multiplication operations, and sums and/or accumulations of multiplication operations.

For example, PLDs sold by Altera Corporation, of San Jose, Calif., as part of the STRATIX® family, include DSP blocks, each of which may include four 18-by-18 multipliers. Each of those DSP blocks also may include adders and registers, as well as programmable connectors (e.g., multiplexers) that allow the various components to be configured in different ways. In each such block, the multipliers can be configured not only as four individual 18-by-18 multipliers, but also as four smaller multipliers, or as one larger (36-by-36) multiplier. In addition, one 18-by-18 complex multiplication (which decomposes into two 18-by-18 multiplication operations for each of the real and imaginary parts) can be performed.

Larger multiplications can be performed by using more of the 18-by-18 multipliers—e.g., from other DSP blocks. For example, a 54-by-54 multiplier can be decomposed, by linear decomposition, into a 36-by-36 multiplier (which uses the four 18-by-18 multipliers of one DSP block), two 36-by-18 multipliers (each of which uses two 18-by-18 multipliers, for a total of four additional 18-by-18 multipliers, consuming another DSP block), and one 18-by-18 multiplier, consuming a portion of a third DSP block. Thus, using 18-by-18 multipliers, nine multipliers are required to perform a 54-by-54 multiplication.
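The multiplier count above can be checked with a short Python sketch. It is an illustration only, assuming unsigned operands and a flat split into three 18-bit limbs per operand (the names `split_limbs` and `multiply_by_limbs` are hypothetical). The nine 18-by-18 products it forms are the same nine that result from grouping the work as one 36-by-36, two 36-by-18 and one 18-by-18 multiplication.

```python
LIMB_BITS = 18
MASK = (1 << LIMB_BITS) - 1

def split_limbs(value, n_limbs):
    """Split a non-negative integer into 18-bit limbs, least significant first."""
    return [(value >> (LIMB_BITS * i)) & MASK for i in range(n_limbs)]

def multiply_by_limbs(a, b, n_limbs):
    """Multiply using only 18-by-18 products.

    Returns the full product and the number of 18-by-18 products formed.
    """
    total = 0
    count = 0
    for i, a_limb in enumerate(split_limbs(a, n_limbs)):
        for j, b_limb in enumerate(split_limbs(b, n_limbs)):
            # Each partial product is aligned by the combined limb offset.
            total += (a_limb * b_limb) << (LIMB_BITS * (i + j))
            count += 1
    return total, count

# Two arbitrary 54-bit operands (three limbs each).
a = (1 << 54) - 12345
b = (1 << 53) + 6789
product, multipliers_used = multiply_by_limbs(a, b, 3)
```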

One type of mathematical function that heretofore has not been easily implemented in a PLD or other programmable device is division. Division, especially double-precision floating point division, which may be required for High Performance Computing, is expensive and slow on current FPGAs. A common implementation in general-purpose programmable logic of an FPGA uses a network of 64 80-bit adders, typically requiring between 6,000 and 9,000 four-input look-up tables. Moreover, the resulting operation is slow, typically having a 150 MHz system speed and about 57 clock cycles of latency.

SUMMARY OF THE INVENTION

The present invention implements multiplier-based division in a programmable device. For example, convergence-type multiplier-based approaches offer the possibility of higher system speeds (on the order of about 300 MHz), lower latency (on the order of 10-20 clock cycles), and lower logic utilization (as most of the calculations are done in multipliers rather than in general-purpose programmable logic).

As described above, the DSP blocks provided on PLDs from Altera Corporation support, inter alia, a 36-bit-by-36-bit multiplier mode. In accordance with the present invention, such a DSP block may be modified to support also a 72-bit-by-18-bit multiplier mode. The resulting asymmetric multiplier can then be used to implement a recursive algorithm to perform division operations, as described in more detail below.

Therefore, in accordance with the present invention, there is provided a method of configuring a programmable integrated circuit device to use multipliers to perform a division operation that provides a quotient of a dividend input value and a divisor input value, where the quotient has a first precision. The method includes configuring logic of the programmable integrated circuit device to use at least a first of the multipliers to operate on said divisor input value to provide an inverted divisor approximation having a second precision less precise than the first precision; configuring logic of the programmable integrated circuit device to recursively compute a remainder by initializing the remainder to said dividend input value at the first precision and then, in each recursive stage, subtracting from the remainder a product of (a) the remainder represented at the second precision, (b) the divisor input value represented at the first precision, and (c) the inverted divisor approximation. Logic of the programmable integrated circuit device is configured to compute a respective component of the quotient in each of the recursive stages, by computing a product of (1) the remainder represented at the second precision, and (2) the inverted divisor approximation. Logic of the programmable integrated circuit device is further configured to add the respective components of the quotient to provide the quotient.

A programmable logic device so configurable or configured, and a machine-readable data storage medium encoded with software for performing the method, are also provided.

BRIEF DESCRIPTION OF THE DRAWINGS

Further features of the invention, its nature and various advantages, will be apparent upon consideration of the following detailed description, taken in conjunction with the accompanying drawings, in which like reference characters refer to like parts throughout, and in which:

FIG. 1 is a schematic representation of a previously-known specialized processing block in a programmable integrated circuit device;

FIG. 2 is a diagram showing the decomposition of a 36-bit-by-36-bit multiplication to be performed in a specialized processing block such as that of FIG. 1;

FIG. 3 is a diagram of the logic flow, and a circuit configuration with which a specialized processing block such as that of FIG. 1 may be programmed, for performing the multiplication of FIG. 2;

FIG. 4 is a diagram showing the decomposition of a 72-bit-by-18-bit multiplication to be performed in a specialized processing block in accordance with an embodiment of the present invention, for implementing division;

FIG. 5 is a diagram of the logic flow, and a circuit configuration with which a specialized processing block may be programmed, for performing the multiplication of FIG. 4 to implement division in accordance with an embodiment of the present invention;

FIG. 6 is a diagram of the logic flow, and a circuit configuration with which a specialized processing block may be programmed, for implementing division in accordance with an embodiment of the present invention;

FIG. 7 is a diagram of a logical equivalent of the configuration of FIG. 6;

FIG. 8 is a schematic representation of a divider structure in accordance with an embodiment of the present invention;

FIG. 9 is a cross-sectional view of a magnetic data storage medium encoded with a set of machine executable instructions for performing the method according to the present invention;

FIG. 10 is a cross-sectional view of an optically readable data storage medium encoded with a set of machine executable instructions for performing the method according to the present invention; and

FIG. 11 is a simplified block diagram of an illustrative system employing a programmable logic device incorporating the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The following division problem:

$Q = \frac{X}{Y}$

can be broken down into the following recursive problem:

$\begin{matrix} Q_{i + 1} = Q_{i} + {Rh}_{i} \frac{1}{Yh} \\ R_{i + 1} = R_{i} - {Rh}_{i} \frac{1}{Yh} Y \end{matrix}$

where:

Q_i=the partial quotient in the ith iteration, initialized to 0 in the 0th iteration,
R_i=the partial remainder in the ith iteration, initialized to X in the 0th iteration,
Rh_i=some number, h, of significant bits of R_i, and
Yh=some number, h, of significant bits of the divisor Y.

As can be seen, in the first (0th) iteration, the partial quotient becomes the product of h bits of X and the inverse of h bits of Y, which is a close zeroth-order approximation of the result. At the same time, the remainder becomes the difference between (a) X and (b) the product of (i) h bits of X and (ii) the product of (1) Y and (2) the inverse of h bits of Y. Because the product of Y and the inverse of h bits of Y is a number close to 1, this is the difference between (a) X and (b) a number close to h bits of X, which is close to zero. In other words, as expected, the 0th iteration yields a Q₁ that is the product of h bits of X and the inverse of h bits of Y, which is close to the result, and an R₁ that is close to zero. The result converges in subsequent iterations, with Q_i approaching the actual result and R_i approaching zero.
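The recurrence can be explored numerically with the following Python sketch. It is a behavioral model under stated assumptions, not the hardware implementation: exact rational arithmetic stands in for the fixed-point datapath, "h significant bits" is modeled as truncation toward zero to the h leading bits, and the helper names `top_bits` and `divide` are hypothetical. Because both updates use the same term Rh_i(1/Yh), the identity X = Q_i·Y + R_i holds at every step, so the error in Q_i is exactly R_i/Y.

```python
from fractions import Fraction

def top_bits(value, h):
    """Truncate value toward zero so that only its h leading bits remain."""
    value = Fraction(value)
    if value == 0:
        return Fraction(0)
    sign = 1 if value > 0 else -1
    mag = abs(value)
    # Find e with 2**e <= mag < 2**(e + 1).
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** e > mag:
        e -= 1
    step = Fraction(2) ** (e - h + 1)  # weight of the last bit that is kept
    return sign * (mag // step) * step

def divide(x, y, h=18, iterations=4):
    """Run the Q/R recurrence; return the final Q, R and the per-step history."""
    inv_yh = top_bits(1 / top_bits(y, h), h)  # h-bit estimate of 1/Yh
    q, r = Fraction(0), Fraction(x)           # Q_0 = 0, R_0 = X
    history = []
    for _ in range(iterations):
        rh = top_bits(r, h)                   # Rh_i
        q += rh * inv_yh                      # Q_{i+1} = Q_i + Rh_i (1/Yh)
        r -= rh * inv_yh * y                  # R_{i+1} = R_i - Rh_i (1/Yh) Y
        history.append((q, r))
    return q, r, history

# Two arbitrary 53-bit mantissas in [1, 2), chosen only for illustration.
x = Fraction(0x1921FB54442D18, 1 << 52)
y = Fraction(0x16A09E667F3BCD, 1 << 52)
q, r, history = divide(x, y)
```

In this model the remainder can change sign from one step to the next, which is why `top_bits` truncates the magnitude rather than the signed value.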

The number of iterations required for convergence depends on how close to the actual result one wants to be, and on the value chosen for h. The value chosen for h cannot be so large that the inverse of Yh cannot be computed easily. In the 72-bit-by-18-bit embodiment described herein, an 18-bit inverse can be calculated relatively easily using, e.g., a Taylor series expansion. The Taylor series expansion can be performed using one 18-bit-by-18-bit multiplier, along with two lookup tables (which may be provided as read-only memories, or programmed into programmable logic in the case of a programmable device), as well as some additional logic such as adders.
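One plausible arrangement of such a reciprocal unit is sketched below in Python. The details are assumptions made for illustration, since the description does not specify them: the 18-bit input is treated as a value in [1, 2) with 17 fractional bits, its upper bits index two tables holding 1/Yh and 1/Yh², and a first-order Taylor term is formed with a single multiplication of the low input bits by the second table entry. The constants and the function name `reciprocal_estimate` are hypothetical.

```python
from fractions import Fraction

FRAC_IN = 17    # fractional bits of the 18-bit input, which lies in [1, 2)
FRAC_OUT = 18   # fractional bits of the reciprocal estimate
LOW_BITS = 8    # low input bits handled by the first-order correction
TABLE_SIZE = 1 << (FRAC_IN - LOW_BITS)  # 512 entries per table

def _entry(i, power):
    """Table entry: 1 / yh**power in fixed point, for yh = 1 + i / TABLE_SIZE."""
    yh = Fraction(TABLE_SIZE + i, TABLE_SIZE)
    return round((1 << FRAC_OUT) / yh ** power)

INV_TABLE = [_entry(i, 1) for i in range(TABLE_SIZE)]     # 1 / yh
INV_SQ_TABLE = [_entry(i, 2) for i in range(TABLE_SIZE)]  # 1 / yh**2

def reciprocal_estimate(y_int):
    """Estimate 1/y as 1/yh - y_low/yh**2, returned with FRAC_OUT fractional bits."""
    assert (1 << FRAC_IN) <= y_int < (1 << (FRAC_IN + 1))
    index = (y_int >> LOW_BITS) - TABLE_SIZE
    y_low = y_int & ((1 << LOW_BITS) - 1)
    # The only multiplication: low input bits times the 1/yh**2 table entry.
    correction = (y_low * INV_SQ_TABLE[index]) >> FRAC_IN
    return INV_TABLE[index] - correction

y_int = 0b10_1101_0100_0001_0011  # an arbitrary 18-bit input in [1, 2)
estimate = Fraction(reciprocal_estimate(y_int), 1 << FRAC_OUT)
```

With these table sizes the truncated Taylor term and the rounding steps each contribute less than about one unit in the last place, so the estimate is good to roughly 17 bits; a different split between table index and correction bits would trade table size against accuracy.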

In such an embodiment, the R_i partial remainder multiplications can then be 18 bits by the internal precision of the calculation, which may be 64 bits for double-precision arithmetic or 72 bits for extended double-precision arithmetic. In both cases the internal precision exceeds the required mantissa size (52 bits and 64 bits, respectively), so that any errors accumulate below the least-significant-bit position required in the final answer. The Q_i partial quotient multiplications, Rh_i(1/Yh), would be 18 bits by 18 bits in either case. The result can be deemed to have converged when R_i falls below a predetermined value. In a programmable device, that predetermined value may be user-programmable.

FIG. 1 schematically shows a previously-known DSP block 10 of the type described above, available in devices from Altera Corporation. DSP block 10 may have four 18-bit-by-18-bit multipliers 11, whose outputs may be combined by N:2 compressor 12 to provide two partial sums and a carry vector, which are further combined in carry-lookahead adder 13. The total number of signals typically includes 144 input data signals 14, and 72-80 output data signals 15.

FIG. 2 shows how such a structure may be used to perform a 36-bit-by-36-bit multiplication. The two 36-bit numbers 20, 21 are decomposed into two 18-bit numbers each—A|B and C|D. The four multipliers form four 18-bit-by-18-bit products DB, DA, CB and AC. The products DA and CB are left-shifted by 18 bits, and the product AC is left-shifted by 36 bits.
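The decomposition of FIG. 2 can be written out as a short Python sketch, assuming unsigned operands; the function name `mul36x36` is hypothetical, while A, B, C and D follow the labels used above.

```python
MASK18 = (1 << 18) - 1

def mul36x36(first, second):
    """36-by-36 multiplication built from four 18-by-18 products."""
    a, b = first >> 18, first & MASK18     # first  = A|B
    c, d = second >> 18, second & MASK18   # second = C|D
    db, da, cb, ac = d * b, d * a, c * b, a * c
    # DA and CB are left-shifted by 18 bits, AC by 36 bits.
    return db + (da << 18) + (cb << 18) + (ac << 36)

p = (1 << 36) - 3          # arbitrary 36-bit operands
q = 0x9_8765_4321
result = mul36x36(p, q)
```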

FIG. 3 shows the connections in block 10 for performing those multiplications. There are four 18-bit-by-18-bit multipliers 30. As each has 36 (i.e., 18+18) inputs, 36×4=144 inputs 31 are available. However, only 72 unique inputs are required. The 72 inputs can be provided only once, with each input to be used more than once being de-multiplexed to the respective multipliers 30 inside DSP block 10, or inputs can be provided multiple times, once each for every component to use the input, so that up to all 144 inputs are used. The partial products may be left-shifted at 320, 321, 322 using, e.g., a combination of multiplexers and wires (conductive traces). After all partial products have thus been properly aligned, they are compressed using the N:2 compressor 33 into a partial product vector and a carry vector, after which they are added in carry-lookahead adder 34 to make the 72-bit output 35.

A 72-bit-by-18-bit multiplication can use the same number of partial products as a 36-bit-by-36-bit multiplication, except that there are five unique 18-bit numbers. FIG. 4 shows the offsets and combining patterns for the partial products of a 72-bit-by-18-bit multiplication, while FIG. 5 shows how block 10 of FIG. 3 may be modified to provide block 50 capable of performing a 72-bit-by-18-bit multiplication.

As can be seen in block 50, 18-bit left-shifter 322 is replaced with 36-bit left-shifter 522. Preferably, left-shifter 522 is selectable (e.g., using a multiplexer) to shift by either 18 or 36 bits, so that the user can use block 50 in the manner of block 10 if desired.
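The arithmetic of the asymmetric mode can be sketched in Python as follows, assuming unsigned operands. The five unique 18-bit numbers are the four limbs of the wide operand plus the narrow operand. The offsets of 0, 18, 36 and 54 bits used here are simply what the arithmetic requires; how they map onto the individual shifters of FIG. 5 is not modeled, and the name `mul72x18` is hypothetical.

```python
MASK18 = (1 << 18) - 1

def mul72x18(wide, narrow):
    """72-by-18 multiplication built from four 18-by-18 products."""
    limbs = [(wide >> (18 * k)) & MASK18 for k in range(4)]
    partials = [limb * narrow for limb in limbs]  # four 36-bit partial products
    # Align each partial product by the offset of its limb and combine.
    return sum(p << (18 * k) for k, p in enumerate(partials))

wide = (1 << 72) - 0x1234_5678   # arbitrary 72-bit operand
narrow = 0x2_ABCD                # arbitrary 18-bit operand
result = mul72x18(wide, narrow)
```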

Of the 144 input conductors 31, between 82 (in the case of a 64-bit-by-18-bit calculation for double-precision arithmetic) and 90 (in the case of a 72-bit-by-18-bit calculation for extended double-precision arithmetic) are used for inputs, while correspondingly 72 or 80 bits are used as outputs. The 72-bit-by-18-bit multiplication operation actually produces a 90-bit output, which cannot be handled by the routing structure in this embodiment, but as the input of each iteration can handle 72 bits, and as the overall division operation is only an iterative approximation, only the 72 most significant bits need be routed out. The precision lost by discarding the 18 least significant bits will not have much impact. Optionally, adder 34 can include a rounding mode to compensate for the discarding of the least significant bits. For example, rounding can occur at the 52nd bit for double precision calculations or at the 64th bit for extended double precision calculations.
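The effect of routing out only the most significant bits can be illustrated with the Python sketch below. It assumes an unsigned product and, for the optional rounding mode, a round-half-up rule; the description does not specify the rounding rule, so that choice and the name `keep_msbs` are assumptions.

```python
def keep_msbs(product, total_bits, keep_bits, rounding=False):
    """Keep the keep_bits most significant bits of a total_bits-wide product."""
    drop = total_bits - keep_bits
    if rounding:
        # Add half of the last kept bit before discarding the low bits.
        # A product of all ones can carry out into one extra bit here.
        product += 1 << (drop - 1)
    return product >> drop

# Worst-case 72-by-18 product: it fits in 90 bits.
product = ((1 << 72) - 1) * ((1 << 18) - 1)
truncated = keep_msbs(product, 90, 72)
# Discarding the 18 low bits changes the value by less than 2**18.
discarded = product - (truncated << 18)
```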

In the calculation above for the partial remainder R_{i+1}, multiplicative term Rh_i is a subset or truncation of the additive term R_i. Therefore, those h bits (e.g., 18 bits) need not be input twice, but rather simply routed twice within block 50. With 144 inputs, the partial remainder recurrence equation can be supported by the block 50. It is already known to provide additional input terms for compressor 33, which may be used, e.g., for accumulation, chaining or redundancy. In order to include the h bits of Rh_i in the multiplication operation, all that would be needed is some additional multiplexing.

As a reminder, each term of the partial remainder recurrence subtracts (which is a form of addition) a product of Rh_i (which is 18 bits wide) and Y(1/Yh), which itself is a 72-bit product. The structure of a DSP block 60 for performing this calculation is shown in FIG. 6. The logical equivalent is shown in FIG. 7. There are 144 inputs 61 representing 72 bits 62 of R_i and 72 bits 63 of Y(1/Yh). The latter are combined with the 18 bits 64 of Rh_i, which are a subset of bits 62, to provide 90 bits 65. As discussed above, output 66 may be 90 bits wide, but is truncated to its 72 most significant bits, or optionally rounded to 52 or 64 bits, for use by the next iteration.

Chaining a number of these blocks allows calculation of a division operation. With an 18-bit “guess” for 1/Yh, each iteration should give about 15 “good” bits—i.e., bits that can be counted on to be correct. As discussed above, any errors can be expected to accumulate at bit positions less significant than the fifteenth bit of each iteration. Therefore, for double precision, which requires 52 bits, four iterations (60 “good” bits) should be sufficient, while for extended double precision, which requires 64 bits, five iterations (75 “good” bits) should be sufficient.

As shown below in FIG. 8, each iteration requires five 18-bit-by-18-bit multipliers: the four multipliers of a DSP block for the remainder calculation, and one additional multiplier for the quotient calculation (which, as a reminder, is simply Rh_i(1/Yh) added to the previous quotient). Therefore, the four iterations of a double-precision division operation will require twenty multipliers, plus five more to prepare the “constants” 1/Yh (which requires one 18-bit-by-18-bit multiplier as discussed above) and Y(1/Yh) (which requires four 18-bit-by-18-bit multipliers to perform the necessary 72-bit-by-18-bit multiplication), for a total of twenty-five multipliers. By comparison, for example, a double-precision multiplication operation requires eight or nine 18-bit-by-18-bit multipliers. While division according to the present invention thus requires more multipliers than multiplication, it nevertheless requires fewer resources than the 64 adders previously used, as discussed above.
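The iteration and multiplier budget can be tabulated with a small Python sketch. The constants restate the figures given above (about 15 good bits per iteration, five multipliers per iteration, five for the constants); the function names are hypothetical. The description states the total of twenty-five only for double precision, so the extended double-precision total printed by this sketch is derived from the same formula rather than taken from the text.

```python
import math

GOOD_BITS_PER_ITERATION = 15
MULTIPLIERS_PER_ITERATION = 5  # four for the remainder, one for the quotient
SETUP_MULTIPLIERS = 5          # one for 1/Yh, four for Y(1/Yh)

def iterations_needed(mantissa_bits):
    """Smallest iteration count whose good bits cover the mantissa."""
    return math.ceil(mantissa_bits / GOOD_BITS_PER_ITERATION)

def multipliers_needed(mantissa_bits):
    """Total 18-by-18 multipliers for all iterations plus the constants."""
    return (MULTIPLIERS_PER_ITERATION * iterations_needed(mantissa_bits)
            + SETUP_MULTIPLIERS)

for name, bits in (("double", 52), ("extended double", 64)):
    print(name, iterations_needed(bits), multipliers_needed(bits))
```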

An embodiment of a divider structure 80 in accordance with the invention is shown in FIG. 8. Although, as discussed above, a minimum of four iterations ordinarily would be provided, to simplify the drawing only three iterations are included in divider structure 80. Y, the divisor, is input at 81, while X, the dividend, is input at 82.

A first DSP block 801 is used to provide an 18-bit approximation 1/Yh of the inverse of Y, using one 18-bit-by-18-bit multiplier plus additional logic as described above. This value 802 is multiplied by Y in DSP block 803 (configured as a 72-bit-by-18-bit multiplier to perform 64-bit-by-18-bit multiplication in a double-precision embodiment or 72-bit-by-18-bit multiplication in an extended double-precision embodiment) and the result 804, which approximates, but does not quite equal, one, is provided to each of DSP blocks 805, 806, 807 which perform respective stages of the remainder calculation. At each stage, 72 bits of the previous remainder 814 are multiplied by value 804, and that product is subtracted from the same previous remainder 814 by carry-lookahead adder 808. The subtraction can be facilitated either by negating inputs to some of the 18-bit multipliers or it can be done in compressor 11 (not shown in FIG. 8).

For each stage of the quotient, value 802 (1/Yh) is multiplied at respective multiplier 809 by previous remainder 814 as input to that stage. All of these stages are then added together. The addition is represented symbolically at 819. However, while one big adder 819 could be provided, the addition alternatively could be carried out in steps, using, e.g., a chaining mode available in DSP blocks of the Altera Corporation products described above. In addition, because each stage provides about fifteen “good” bits of the final quotient, the result of each subsequent stage (except the first) preferably is right-shifted by about fifteen additional bits. Insofar as shifters are essentially simply wires, the shifters are not explicitly shown in FIG. 8. However, the shifting occurs after each multiplier 809 and before adder 819.
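The two ways of combining the quotient components can be compared with the Python sketch below. It is an illustration under assumptions: each stage's component is modeled as a signed integer, the per-stage offset is taken to be exactly 15 bits, and the relative alignment is expressed by left-shifting earlier stages instead of right-shifting later ones so that no bits are lost in the model. The function names are hypothetical.

```python
STAGE_SHIFT = 15  # each later stage is 15 bits less significant

def sum_single_adder(components):
    """Align every stage's component, then add them all at once."""
    last = len(components) - 1
    return sum(c << (STAGE_SHIFT * (last - k)) for k, c in enumerate(components))

def sum_chained(components):
    """Each stage adds its component to the shifted sum from the preceding stage."""
    acc = 0  # the first stage sees a preceding sum of zero
    for c in components:
        acc = (acc << STAGE_SHIFT) + c
    return acc

# Arbitrary per-stage components; a later one may be negative
# if the remainder overshoots.
components = [123456, -789, 4242]
single = sum_single_adder(components)
chained = sum_chained(components)
```

Both forms produce the same sum, so the choice between one wide adder and stage-by-stage chaining is a matter of resources and timing rather than of the result.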

Thus, the method of the invention configures a programmable integrated circuit device, such as a PLD, to create the structures shown in FIGS. 6 and 8 to perform division operations using multipliers on the device, at a savings as compared to using adders as has been done previously.

Instructions for carrying out the method according to this invention may be encoded on a machine-readable medium, to be executed by a suitable computer or similar device to implement the method of the invention for programming or configuring programmable integrated circuit devices to perform operations as described above. For example, a personal computer may be equipped with an interface to which a programmable integrated circuit device can be connected, and the personal computer can be used by a user to program the programmable integrated circuit device using a suitable software tool, such as the QUARTUS® II software available from Altera Corporation, of San Jose, Calif.

FIG. 9 presents a cross section of a magnetic data storage medium 600 which can be encoded with a machine executable program that can be carried out by systems such as the aforementioned personal computer, or other computer or similar device. Medium 600 can be a floppy diskette or hard disk, or magnetic tape, having a suitable substrate 601, which may be conventional, and a suitable coating 602, which may be conventional, on one or both sides, containing magnetic domains (not visible) whose polarity or orientation can be altered magnetically. Except in the case where it is magnetic tape, medium 600 may also have an opening (not shown) for receiving the spindle of a disk drive or other data storage device.

The magnetic domains of coating 602 of medium 600 are polarized or oriented so as to encode, in a manner which may be conventional, a machine-executable program, for execution by a programming system such as a personal computer or other computer or similar system, having a socket or peripheral attachment into which the PLD to be programmed may be inserted, to configure appropriate portions of the PLD, including its specialized processing blocks, if any, in accordance with the invention.

FIG. 10 shows a cross section of an optically-readable data storage medium 700 which also can be encoded with such a machine-executable program, which can be carried out by systems such as the aforementioned personal computer, or other computer or similar device. Medium 700 can be a conventional compact disk read only memory (CD-ROM) or digital video disk read only memory (DVD-ROM) or a rewriteable medium such as a CD-R, CD-RW, DVD-R, DVD-RW, DVD+R, DVD+RW, or DVD-RAM or a magneto-optical disk which is optically readable and magneto-optically rewriteable. Medium 700 preferably has a suitable substrate 701, which may be conventional, and a suitable coating 702, which may be conventional, usually on one or both sides of substrate 701.

In the case of a CD-based or DVD-based medium, as is well known, coating 702 is reflective and is impressed with a plurality of pits 703, arranged on one or more layers, to encode the machine-executable program. The arrangement of pits is read by reflecting laser light off the surface of coating 702. A protective coating 704, which preferably is substantially transparent, is provided on top of coating 702.

In the case of magneto-optical disk, as is well known, coating 702 has no pits 703, but has a plurality of magnetic domains whose polarity or orientation can be changed magnetically when heated above a certain temperature, as by a laser (not shown). The orientation of the domains can be read by measuring the polarization of laser light reflected from coating 702. The arrangement of the domains encodes the program as described above.

Thus it is seen that a method for efficiently carrying out division in a programmable integrated circuit device, a programmable integrated circuit device programmed to perform the method, and software for carrying out the programming, have been provided.

A PLD 90 programmed according to the present invention may be used in many kinds of electronic devices. One possible use is in a data processing system 900 shown in FIG. 11. Data processing system 900 may include one or more of the following components: a processor 901; memory 902; I/O circuitry 903; and peripheral devices 904. These components are coupled together by a system bus 905 and are populated on a circuit board 906 which is contained in an end-user system 907.

System 900 can be used in a wide variety of applications, such as computer networking, data networking, instrumentation, video processing, digital signal processing, or any other application where the advantage of using programmable or reprogrammable logic is desirable. PLD 90 can be used to perform a variety of different logic functions. For example, PLD 90 can be configured as a processor or controller that works in cooperation with processor 901. PLD 90 may also be used as an arbiter for arbitrating access to a shared resource in system 900. In yet another example, PLD 90 can be configured as an interface between processor 901 and one of the other components in system 900. It should be noted that system 900 is only exemplary, and that the true scope and spirit of the invention should be indicated by the following claims.

Various technologies can be used to implement PLDs 90 as described above and incorporating this invention.

It will be understood that the foregoing is only illustrative of the principles of the invention, and that various modifications can be made by those skilled in the art without departing from the scope and spirit of the invention. For example, the various elements of this invention can be provided on a programmable integrated circuit device in any desired number and/or arrangement. One skilled in the art will appreciate that the present invention can be practiced by other than the described embodiments, which are presented for purposes of illustration and not of limitation, and the present invention is limited only by the claims that follow.

Claims

1. A method of configuring a programmable integrated circuit device to use dedicated symmetrical multipliers to perform a division operation that provides a quotient of a dividend input value and a divisor input value, said quotient having a first precision, said method comprising: configuring logic of said programmable integrated circuit device to use at least a first of said dedicated symmetrical multipliers to operate on said divisor input value to provide an inverted divisor approximation having a second precision less precise than said first precision; configuring logic of said programmable integrated circuit device to recursively compute a remainder by initializing said remainder to said dividend input value at said first precision and then, in each recursive stage, subtracting from said remainder a product, computed by a plurality of said dedicated symmetrical multipliers configured as an asymmetrical multiplier, of (a) said remainder represented at said second precision, (b) said divisor input value represented at said first precision, and (c) said inverted divisor approximation; configuring logic of said programmable integrated circuit device to compute a respective component of said quotient in each said recursive stage, by computing, using at least one of said dedicated symmetrical multipliers, a product of (1) said remainder represented at said second precision, and (2) said inverted divisor approximation; and configuring logic of said programmable integrated circuit device to add said respective components of said quotient to provide said quotient.
2. The method of claim 1 wherein said configuring logic of said programmable integrated circuit device to recursively compute a remainder comprises configuring said logic to shorten said remainder as computed in each said recursive stage to at most a number of bits representing said first precision.
3. The method of claim 2 wherein said configuring said logic to shorten said remainder to at most said first number of bits comprises configuring said logic to truncate said remainder to said number of bits representing said first precision.
4. The method of claim 2 wherein said configuring said logic to shorten said remainder to at most said number of bits representing said first precision comprises configuring said logic to round said remainder to a number of bits determined by a desired level of precision.
5. The method of claim 1 wherein: there are four said recursive stages; and said dividend input value, said divisor input value and said quotient are double-precision values.
6. The method of claim 1 wherein: there are five said recursive stages; and said dividend input value, said divisor input value and said quotient are extended double-precision values.
7. The method of claim 1 wherein said configuring logic of said programmable integrated circuit device to add said respective components of said quotient comprises configuring logic of said programmable integrated circuit device as a single adder for adding said components of said quotient from all said stages.
8. The method of claim 7 further comprising configuring logic of said programmable integrated circuit device to shift each component of said quotient from each successive one of said stages by a successive multiple of a predetermined number of bits.
9. The method of claim 8 wherein said predetermined number of bits is less than a number of bits representing said second precision.
10. The method of claim 1 wherein: said configuring logic of said programmable integrated circuit device to add said respective components of said quotient comprises configuring each said stage to add its respective component of said quotient to a sum of components of said quotient from a preceding stage; and in a first one of said plurality of stages, said sum of components of said quotient from a preceding stage is considered to be zero.
11. The method of claim 10 further comprising configuring each said stage after said first one of said plurality of stages to shift its respective component of said quotient by a successive multiple of a predetermined number of bits.
12. The method of claim 11 wherein said predetermined number of bits is less than a number of bits representing said second precision.
13. A programmable integrated circuit device configurable to perform a division operation that provides a quotient of a dividend input value and a divisor input value, each of which has a first precision, said programmable integrated circuit device comprising: a plurality of specialized processing blocks each having a plurality of dedicated symmetrical multiplier circuits; logic configurable to use at least a first of said dedicated symmetrical multiplier circuits to operate on said divisor input value to provide an inverted divisor approximation having a second precision less precise than said first precision; logic configurable to recursively compute a remainder by initializing said remainder to said dividend input value at said first precision and then, in each recursive stage, subtracting from said remainder a product, computed using a plurality of said dedicated symmetrical multiplier circuits configured as an asymmetrical multiplier circuit, of (a) said remainder represented at said second precision, (b) said divisor input value represented at said first precision, and (c) said inverted divisor approximation; logic configurable to compute a respective component of said quotient in each said recursive stage, by computing, using at least one of said dedicated symmetrical multiplier circuits, a product of (1) said remainder represented at said second precision, and (2) said inverted divisor approximation; and logic configurable to add said respective components of said quotient to provide said quotient.
14. The programmable integrated circuit device of claim 13 wherein said logic configurable to recursively compute a remainder comprises configuring said logic to shorten said remainder as computed in each said recursive stage to at most a number of bits representing said first precision.
15. The programmable integrated circuit device of claim 14 wherein said logic configurable to shorten said remainder to at most said first number of bits comprises configuring said logic to truncate said remainder to said number of bits representing said first precision.
16. The programmable integrated circuit device of claim 14 wherein said logic configurable to shorten said remainder to at most said number of bits representing said first precision comprises configuring said logic to round said remainder to a number of bits determined by a desired level of precision.
17. The programmable integrated circuit device of claim 13 wherein: there are four said recursive stages; and said dividend input value, said divisor input value and said quotient are double-precision values.
18. The programmable integrated circuit device of claim 13 wherein: there are five said recursive stages; and said dividend input value, said divisor input value and said quotient are extended double-precision values.
19. The programmable integrated circuit device of claim 13 wherein said logic configurable to add said respective components of said quotient comprises logic configurable as a single adder for adding said components of said quotient from all said stages.
20. The programmable integrated circuit device of claim 19 further comprising logic configurable to shift each component of said quotient from each successive one of said stages by a successive multiple of a predetermined number of bits.
21. The programmable integrated circuit device of claim 20 wherein said predetermined number of bits is less than a number of bits representing said second precision.
22. The programmable integrated circuit device of claim 13 wherein: said logic configurable to add said respective components of said quotient comprises logic in each said stage configurable to add its respective component of said quotient to a sum of components of said quotient from a preceding stage; and in a first one of said plurality of stages, said sum of components of said quotient from a preceding stage is considered to be zero.
23. The programmable integrated circuit device of claim 22 further comprising logic in each said stage after said first one of said plurality of stages configurable to shift its respective component of said quotient by a successive multiple of a predetermined number of bits.
24. The programmable integrated circuit device of claim 23 wherein said predetermined number of bits is less than a number of bits representing said second precision.
25. The programmable integrated circuit device of claim 13 wherein said programmable integrated circuit device is a programmable logic device.
26. A programmable integrated circuit device configured to perform a division operation that provides a quotient of a dividend input value and a divisor input value, each of which has a first precision, said programmable integrated circuit device comprising: a plurality of specialized processing blocks each having a plurality of dedicated symmetrical multiplier circuits; logic configured to use at least a first of said dedicated symmetrical multiplier circuits to operate on said divisor input value to provide an inverted divisor approximation having a second precision less precise than said first precision; logic configured to recursively compute a remainder by initializing said remainder to said dividend input value at said first precision and then, in each recursive stage, subtracting from said remainder a product, computed using a plurality of said dedicated symmetrical multiplier circuits configured as an asymmetrical multiplier circuit, of (a) said remainder represented at said second precision, (b) said divisor input value represented at said first precision, and (c) said inverted divisor approximation; logic configured to compute a respective component of said quotient in each said recursive stage, by computing, using at least one of said dedicated symmetrical multiplier circuits, a product of (1) said remainder represented at said second precision, and (2) said inverted divisor approximation; and logic configured to add said respective components of said quotient to provide said quotient.
27. The configured programmable integrated circuit device of claim 26 wherein said logic configured to recursively compute a remainder comprises configuring said logic to shorten said remainder as computed in each said recursive stage to at most a number of bits representing said first precision.
28. The configured programmable integrated circuit device of claim 27 wherein said logic configured to shorten said remainder to at most said first number of bits comprises configuring said logic to truncate said remainder to said number of bits representing said first precision.
29. The configured programmable integrated circuit device of claim 27 wherein said logic configured to shorten said remainder to at most said number of bits representing said first precision comprises configuring said logic to round said remainder to a number of bits determined by a desired level of precision.
30. The configured programmable integrated circuit device of claim 26 wherein: there are four said recursive stages; and said dividend input value, said divisor input value and said quotient are double-precision values.
31. The configured programmable integrated circuit device of claim 26 wherein: there are five said recursive stages; and said dividend input value, said divisor input value and said quotient are extended double-precision values.
32. The configured programmable integrated circuit device of claim 26 wherein said logic configured to add said respective components of said quotient comprises logic configured as a single adder for adding said components of said quotient from all said stages.
33. The configured programmable integrated circuit device of claim 32 further comprising logic configured to shift each component of said quotient from each successive one of said stages by a successive multiple of a predetermined number of bits.
34. The configured programmable integrated circuit device of claim 33 wherein said predetermined number of bits is less than a number of bits representing said second precision.
35. The configured programmable integrated circuit device of claim 26 wherein: said logic configured to add said respective components of said quotient comprises logic in each said stage configurable to add its respective component of said quotient to a sum of components of said quotient from a preceding stage; and in a first one of said plurality of stages, said sum of components of said quotient from a preceding stage is considered to be zero.
36. The configured programmable integrated circuit device of claim 35 further comprising logic in each said stage after said first one of said plurality of stages configurable to shift its respective component of said quotient by a successive multiple of a predetermined number of bits.
37. The configured programmable integrated circuit device of claim 36 wherein said predetermined number of bits is less than a number of bits representing said second precision.
38. The configured programmable integrated circuit device of claim 26 wherein said programmable integrated circuit device is a programmable logic device.
39. A machine-readable data storage medium encoded with machine-executable instructions for configuring a programmable integrated circuit device to use dedicated symmetrical multipliers to perform a division operation that provides a quotient of a dividend input value and a divisor input value, said quotient having a first precision, said instructions comprising: instructions to configure logic of said programmable integrated circuit device to use at least a first of said dedicated symmetrical multipliers to operate on said divisor input value to provide an inverted divisor approximation having a second precision less precise than said first precision; instructions to configure logic of said programmable integrated circuit device to recursively compute a remainder by initializing said remainder to said dividend input value at said first precision and then, in each recursive stage, subtracting from said remainder a product, computed using a plurality of said dedicated symmetrical multipliers configured as an asymmetrical multiplier, of (a) said remainder represented at said second precision, (b) said divisor input value represented at said first precision, and (c) said inverted divisor approximation; instructions to configure logic of said programmable integrated circuit device to compute a respective component of said quotient in each said recursive stage, by computing, using at least one of said dedicated symmetrical multipliers, a product of (1) said remainder represented at said second precision, and (2) said inverted divisor approximation; and instructions to configure logic of said programmable integrated circuit device to add said respective components of said quotient to provide said quotient.
40. The machine-readable data storage medium of claim 39 wherein said instructions to configure logic of said programmable integrated circuit device to recursively compute a remainder comprises instructions to configure said logic to shorten said remainder as computed in each said recursive stage to at most a number of bits representing said first precision.
41. The machine-readable data storage medium of claim 40 wherein said instructions to configure said logic to shorten said remainder to at most said first number of bits comprises instructions to configure said logic to truncate said remainder to said number of bits representing said first precision.
42. The machine-readable data storage medium of claim 40 wherein said instructions to configure said logic to shorten said remainder to at most said number of bits representing said first precision comprises instructions to configure said logic to round said remainder to a number of bits determined by a desired level of precision.
43. The machine-readable data storage medium of claim 39 wherein: there are four said recursive stages; and said dividend input value, said divisor input value and said quotient are double-precision values.
44. The machine-readable data storage medium of claim 39 wherein: there are five said recursive stages; and said dividend input value, said divisor input value and said quotient are extended double-precision values.
45. The machine-readable data storage medium of claim 39 wherein said instructions to configure logic of said programmable integrated circuit device to add said respective components of said quotient comprises instructions to configure logic of said programmable integrated circuit device as a single adder for adding said components of said quotient from all said stages.
46. The machine-readable data storage medium of claim 45 further comprising instructions to configure logic of said programmable integrated circuit device to shift each component of said quotient from each successive one of said stages by a successive multiple of a predetermined number of bits.
47. The machine-readable data storage medium of claim 46 wherein said predetermined number of bits is less than a number of bits representing said second precision.
48. The machine-readable data storage medium of claim 39 wherein: said instructions to configure logic of said programmable integrated circuit device to add said respective components of said quotient comprises instructions to configure each said stage to add its respective component of said quotient to a sum of components of said quotient from a preceding stage; and in a first one of said plurality of stages, said sum of components of said quotient from a preceding stage is considered to be zero.
49. The machine-readable data storage medium of claim 48 further comprising instructions to configure each said stage after said first one of said plurality of stages to shift its respective component of said quotient by a successive multiple of a predetermined number of bits.
50. The machine-readable data storage medium of claim 49 wherein said predetermined number of bits is less than a number of bits representing said second precision.
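
By way of illustration only, the recursion recited in the independent claims above may be modeled in software roughly as follows. This is a minimal sketch and not the claimed circuitry: the function names truncate and divide, the parameters k (bits of the second precision) and shift (the predetermined number of bits, chosen smaller than k), and their default values are all assumptions made for this example. Exact rational arithmetic stands in for the dedicated multiplier circuits, the inverted divisor approximation is obtained here by an exact division followed by truncation rather than by any particular hardware method, and both operands are assumed to be mantissas in the range [1, 2).

```python
from fractions import Fraction


def truncate(x: Fraction, frac_bits: int) -> Fraction:
    """Keep only the bits of a non-negative x at or above 2**-frac_bits."""
    scale = 1 << frac_bits
    return Fraction((x.numerator * scale) // x.denominator, scale)


def divide(dividend, divisor, stages=4, k=16, shift=14):
    """Model of the staged division; all parameter values are hypothetical.

    Returns (quotient, remainder) with remainder == dividend - quotient * divisor.
    """
    dividend = Fraction(dividend)
    divisor = Fraction(divisor)

    # Inverted divisor approximation at the lower ("second") precision.
    # Assumption: computed by exact division then truncation, as a stand-in.
    inv = truncate(1 / divisor, k)

    remainder = dividend          # initialized to the dividend at full precision
    quotient = Fraction(0)        # running sum of the quotient components

    for stage in range(stages):
        # Remainder shortened to the second precision. Each stage looks
        # `shift` bits further down, which places its quotient component
        # at a successive multiple of `shift` bits.
        r_short = truncate(remainder, k + stage * shift)

        # Quotient component: shortened remainder times inverted divisor.
        component = r_short * inv
        quotient += component

        # Subtract (shortened remainder * divisor * inverted divisor).
        remainder -= component * divisor

    return quotient, remainder
```

For example, divide(Fraction(3, 2), Fraction(5, 4)) returns a quotient slightly below 6/5 together with a small non-negative remainder. Because the inverted divisor approximation never exceeds the true reciprocal, the remainder in this model stays non-negative and shrinks by roughly shift bits per stage, which is why shift is chosen to be less than k.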

