The present invention relates to programmable devices, such as field-programmable gate arrays (FPGAs), and, in particular, to fast-carry arithmetic circuits for such devices.
A multi-bit adder is a circuit that receives two multi-bit binary input values and generates a multi-bit binary sum value corresponding to the sum of the two input values. Two conventional types of multi-bit adders are ripple-carry adders and look-ahead carry adders.
Ripple-carry adder 100 comprises four 1-bit adders 101-104 connected serially from LSB adder 101 to MSB adder 104. Each 1-bit adder receives three 1-bit input values Ai, Bi, and Ci and generates 1-bit sum bit SUMi and 1-bit carry bit Ci+1, according to Equations (1) and (2) as follows:
SUMi=Ai XOR Bi XOR Ci (1)
Ci+1=Ai·Bi+(Ai XOR Bi)Ci (2)
where the “XOR” operator is the logical “exclusive OR” function, the “·” operator is the logical “AND” function, and the “+” operator is the logical “OR” function. Note that Ci is the ith carry bit, which is received from the previous 1-bit adder, while Ci+1 is the (i+1)th carry bit, which is applied to the subsequent 1-bit adder.
If adder 100 is operated, in a stand-alone manner, as a 4-bit adder, then carry bit C0 is 0, and carry bit C4 is the MSB of the resulting multi-bit sum. Alternatively, one or more instances of 4-bit adder 100 can be connected in series to form a multi-bit adder, in which case, carry bit C0 corresponds to carry bit C4 from the previous instance of 4-bit adder 100 (if there is one), and carry bit C4 is applied as carry bit C0 to the subsequent instance of 4-bit adder 100 (if there is one).
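The serial behavior described by Equations (1) and (2) can be illustrated with a minimal software sketch. The function names (full_adder, ripple_carry_add, to_bits, from_bits) and the LSB-first bit-list representation are assumptions made for illustration only; the sketch models the logic, not the timing, of adder 100.

```python
def full_adder(a, b, c):
    """Model one 1-bit adder: return (SUMi, Ci+1) per Equations (1) and (2)."""
    sum_bit = a ^ b ^ c                   # Equation (1)
    carry_out = (a & b) | ((a ^ b) & c)   # Equation (2)
    return sum_bit, carry_out


def ripple_carry_add(a_bits, b_bits, c0=0):
    """Add two LSB-first bit lists; each carry feeds the next 1-bit adder."""
    carry = c0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bit, carry = full_adder(a, b, carry)
        sum_bits.append(sum_bit)
    return sum_bits, carry


def to_bits(value, width=4):
    """Hypothetical helper: integer to LSB-first bit list."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Hypothetical helper: LSB-first bit list to integer."""
    return sum(bit << i for i, bit in enumerate(bits))
```

For example, ripple_carry_add(to_bits(11), to_bits(6)) returns ([1, 0, 0, 0], 1), that is, a 4-bit sum of 1 with carry bit C4 equal to 1, which together represent 17.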
Adder 100 is referred to as a “ripple-carry” adder, because the carry bits ripple through adder 100 in a serial manner. In particular, the carry-in bit (CIN in
To overcome the processing-speed limitations associated with ripple-carry adders, look-ahead carry adders may be used.
Like ripple-carry adder 100, look-ahead carry adder 200 comprises four 1-bit adders 201-204 connected serially from LSB adder 201 to MSB adder 204, where each 1-bit adder receives three 1-bit input values Ai, Bi, and Ci and two 1-bit values SUMi and Ci+1 are generated according to Equations (1) and (2). Unlike ripple-carry adder 100, however, look-ahead carry adder 200 includes look-ahead carry generation logic 205, which generates carry bits C1-C4 in parallel with the processing of 1-bit adders 201-204.
Look-ahead carry adder 200 takes advantage of the fact that carry bit Ci+1 generated by the ith 1-bit adder has a value of 1 only (i) if both bits Ai and Bi are a 1 or (ii) if only one of bits Ai and Bi is a 1 and carry bit Ci from the previous 1-bit adder is also a 1. Thus, carry bit Ci+1 may be re-defined from Equation (2) according to Equation (3) as follows:
Ci+1=Gi+Pi·Ci (3)
where generate bit Gi and propagate bit Pi are defined according to Equations (4) and (5) as follows:
Gi=Ai·Bi (4)
Pi=(Ai XOR Bi) (5)
Substituting Equation (5) into Equation (1) yields an alternative formula for generating sum bit SUMi, according to Equation (6) as follows:
SUMi=Pi XOR Ci (6)
Substituting Equations (4) and (5) into Equation (2) to generate a formula for carry bit C1 yields Equation (7) as follows:
C1=G0+P0·C0 (7)
Substituting Equations (4), (5), and (7) into Equation (2) to generate a formula for carry bit C2 yields Equation (8) as follows:
C2=G1+P1·G0+P1·P0·C0 (8)
Continuing this pattern, formulas can be generated for carry bits C3 and C4 according to Equations (9) and (10) as follows:
C3=G2+P2·G1+P2·P1·G0+P2·P1·P0·C0 (9)
C4=G3+P3·G2+P3·P2·G1+P3·P2·P1·G0+P3·P2·P1·P0·C0 (10)
Since (as indicated by Equations (4) and (5)) the propagate and generate bits, Pi and Gi, depend only on the input bits, Ai and Bi, and since (as indicated by Equations (7)-(10)) carry bits C1-C4 depend only on the propagate and generate bits, P0-P3 and G0-G3, and carry bit C0, the processing in look-ahead carry adder 200 can be implemented in the following three steps, where the operations within each step are implemented in parallel.
In the first step, each 1-bit adder implements Equations (4) and (5) to generate its propagate and generate bits, Pi and Gi, and provides those values to look-ahead carry generation logic 205. In the second step, look-ahead carry generation logic 205 implements Equations (7)-(10) to generate carry bits C1-C4. In the third step, each 1-bit adder implements Equation (6) to generate its corresponding sum bit SUMi.
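The three steps can be sketched in software as follows. The function name lookahead_add4 and the LSB-first bit-list arguments are assumptions made for illustration; the sketch reproduces the logic of Equations (4)-(10) and (6) but, being sequential code, does not model the parallelism of the hardware.

```python
def lookahead_add4(a_bits, b_bits, c0=0):
    """Add two 4-bit LSB-first bit lists using Equations (4)-(10)."""
    # Step 1: generate and propagate bits, Equations (4) and (5).
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]

    # Step 2: carry bits as flat sums of products, Equations (7)-(10).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    # Step 3: sum bits from propagate and carry bits, Equation (6).
    carries_in = [c0, c1, c2, c3]
    sum_bits = [p_i ^ c_i for p_i, c_i in zip(p, carries_in)]
    return sum_bits, c4
```

Note that each carry expression in the second step depends only on the generate bits, the propagate bits, and C0, never on another computed carry bit, which is what permits the carries to be produced in parallel.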
In this way, 4-bit look-ahead carry adder 200 of
In one embodiment, the present invention is circuitry adapted to selectively operate in a look-up table (LUT) mode or an arithmetic mode. The circuitry comprises a LUT circuit and a control circuit. The LUT circuit has a plurality of memory cells and a decoder connected to receive signals based on data stored in the memory cells and having a plurality of multiplexers (muxes) configured in one or more decoder stages. The control circuit is connected to the LUT circuit and controls whether the circuitry operates in the LUT mode or the arithmetic mode.
In another embodiment, the present invention is circuitry adapted to selectively operate in a LUT mode or an arithmetic mode. The circuitry comprises a LUT circuit, a control circuit, and carry-out circuitry. The LUT circuit has a plurality of memory cells and a decoder connected to receive signals based on data stored in the memory cells. The decoder has a plurality of muxes configured in at least three decoder stages. The control circuit is connected to the LUT circuit and controls whether the circuitry operates in the LUT mode or the arithmetic mode. The carry-out circuitry is connected to a third decoder stage of the LUT circuit and selects the value of a carry-out signal propagated by the LUT circuit if the circuitry is operating in the arithmetic mode.
In yet another embodiment, the present invention is a method for using a LUT circuit to implement an arithmetic function having a plurality of inputs and a plurality of outputs. The LUT circuit comprises a plurality of memory cells and a decoder connected to receive signals based on data stored in the memory cells. The decoder has a plurality of muxes configured in one or more decoder stages. Data corresponding to a first output of the arithmetic function is stored in a first subset of the memory cells, wherein each memory cell in the first subset stores data corresponding to the first output of the arithmetic function for a different set of input values. Data corresponding to a second output of the arithmetic function is stored in a second subset of the memory cells, wherein each memory cell in the second subset stores data corresponding to the second output of the arithmetic function for a different set of input values. Each set of input values for the first subset is identical to a corresponding set of input values for the second subset. A current set of input values is applied as control signals for a plurality of muxes in the decoder to retrieve a first output value from the first subset and a second output value from the second subset and present the first and second output values as outputs for the arithmetic function.
Other aspects, features, and advantages of the present invention will become more fully apparent from the following detailed description, the appended claims, and the accompanying drawings in which like reference numerals identify similar or identical elements.
Reference herein to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments necessarily mutually exclusive of other embodiments.
The layout of FPGA 300 comprises multiple instances of a limited number of different types of blocks of circuitry. For example, I/O ring 304 contains a number of instances of the same basic block of programmable I/O circuitry repeated around the periphery of the device. Similarly, each PLB 306 within logic core 302 may be implemented using a different instance of the same set of programmable logic circuitry. Moreover, among other types of programmable logic circuitry, each PLB may include one or more instances of a particular type of programmable logic referred to as a LUT4 circuit.
Although LUT4 circuit 400 has the three decoder stages shown in
If LUT4 circuit 400 is operated as a 4-bit LUT, then the four 1-bit control signals A, B, C, and D correspond to the 4-bit address (D C B A) of a particular one of the 16 SRAM cells 405, where D is the MSB and A is the LSB of the 4-bit address. As described in connection with
Control SRAM cell 515 stores a 1-bit control signal 516 that determines whether circuit 500 operates in a LUT mode or an arithmetic mode. In particular, if control signal 516 has a value of 0, then circuit 500 operates in the LUT mode, where mux 514 selects 1-bit LUT control signal C, and mux 517 selects 1-bit LUT control signal D. In that case, muxes 411-413 can operate, in combination with muxes 401-404, in the normal 4-bit LUT mode, as previously described in connection with
On the other hand, if control signal 516 has a value of 1, then circuit 500 operates in the arithmetic mode, where different instances of circuit 500 may be used to implement the 1-bit adders of a look-ahead carry adder having the same architecture as 4-bit look-ahead carry adder 200 of
In order to support the operations of a multi-bit look-ahead carry adder, SRAM cells are populated with the specific bit values shown in
Memory cells SRAM_0 through SRAM_7 are filled with the data bits shown in the following logic table corresponding to Equation (1):
Circuit 600 has all of the same elements as circuit 500 of
In the arithmetic mode, memory cells SRAM_0 to SRAM_7 and muxes 401-402, 411, and 413 operate identically to the corresponding elements in circuit 500 of
Equation (2) for the (i+1)th carry bit Ci+1 can be rewritten as Equation (11) as follows:
Ci+1=Ai·Bi+Ai·Ci+Bi·Ci (11)
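The equivalence of Equations (2) and (11) can be confirmed by exhaustive evaluation over the eight possible input combinations. The following is a minimal sketch; the function names carry_eq2, carry_eq11, and equations_agree are hypothetical.

```python
from itertools import product


def carry_eq2(a, b, c):
    """Carry bit per Equation (2)."""
    return (a & b) | ((a ^ b) & c)


def carry_eq11(a, b, c):
    """Carry bit per Equation (11)."""
    return (a & b) | (a & c) | (b & c)


def equations_agree():
    """Return True if both forms match for every (Ai, Bi, Ci) combination."""
    return all(carry_eq2(a, b, c) == carry_eq11(a, b, c)
               for a, b, c in product((0, 1), repeat=3))
```

The two forms differ only when Ai and Bi are both 1, in which case the Ai·Bi term already forces the carry to 1 in both equations.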
According to one embodiment of the present invention, memory cells SRAM_8 to SRAM_11 are programmed with the results of Equation (11) for the four possible combinations of Ai and Bi assuming that carry-in bit CIN is zero. In particular, if CIN is zero (i.e., Ci=0), then Equation (11) can be expressed as Equation (12) as follows:
Ci+1=Ai·Bi (12)
which corresponds to the following logic table, whose values are indicated in
Similarly, memory cells SRAM_12 to SRAM_15 are programmed with the results of Equation (11) for the four possible combinations of Ai and Bi assuming that carry-in bit CIN is one. In particular, if CIN is one (i.e., Ci=1), then Equation (11) can be expressed as Equation (13) as follows:
Ci+1=Ai+Bi (13)
which corresponds to the following logic table, whose values are indicated in
In that case, mux 403 selects the value of carry-out bit Ci+1 for the current Ai and Bi values assuming a carry-in bit value of Ci=0, while mux 404 selects the value of carry-out bit Ci+1 for the current Ai and Bi values assuming a carry-in bit value of Ci=1. In addition to being applied to mux 412, the outputs from muxes 403 and 404 are also applied to mux 610, which uses the actual carry-in bit value CIN to select the appropriate carry-out bit value COUT.
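The arithmetic-mode memory contents and the carry selection can be modeled with the following sketch. It assumes that memory cell SRAM_n is addressed by n = 8·D + 4·C + 2·B + A, consistent with the 4-bit address (D C B A) described above, and that CIN takes the place of control signal C in the arithmetic mode. The function names build_adder_sram and arithmetic_mode are hypothetical, and the sketch reproduces logical behavior only.

```python
def build_adder_sram():
    """Return the 16 cell values for a 1-bit adder (assumed index 8D+4C+2B+A)."""
    sram = [0] * 16
    for cin in (0, 1):
        for b in (0, 1):
            for a in (0, 1):
                ba = 2 * b + a
                # SRAM_0 to SRAM_7: sum bit per Equation (1).
                sram[4 * cin + ba] = a ^ b ^ cin
                # SRAM_8 to SRAM_11: Equation (12) (carry-in assumed 0).
                # SRAM_12 to SRAM_15: Equation (13) (carry-in assumed 1).
                sram[8 + 4 * cin + ba] = (a & b) if cin == 0 else (a | b)
    return sram


def arithmetic_mode(sram, a, b, cin):
    """Return (SUM, COUT) read from the cells for inputs A, B, and CIN."""
    ba = 2 * b + a
    sum_bit = sram[4 * cin + ba]
    carry_if_cin0 = sram[8 + ba]     # value selected by mux 403
    carry_if_cin1 = sram[12 + ba]    # value selected by mux 404
    cout = carry_if_cin1 if cin else carry_if_cin0   # selection by mux 610
    return sum_bit, cout
```

Under these assumptions, build_adder_sram() returns [0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1]. Both candidate carry values are available before CIN arrives, so only the final two-way selection depends on CIN.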
When circuit 600 is operating in arithmetic mode, COUT is generated by mux 610 under the control of CIN. Signal 621, which corresponds to the output of mux 412, is generated under the control of signal 622. Signal 622 corresponds to the output of mux 514, which is controlled by signal 516. When circuit 600 is operating in arithmetic mode, signal 516 corresponds to a 1, resulting in mux 514 selecting CIN for output as signal 622. As such, muxes 412 and 610 are both controlled by CIN when circuit 600 is operating in arithmetic mode. Because the input signals to muxes 412 and 610 are identical, muxes 412 and 610 are logically equivalent structures when circuit 600 is operating in arithmetic mode. In an alternative embodiment, these two muxes may be implemented as a single mux without departing from the spirit and scope of the present invention.
Using circuit 600 of
Moreover, because circuit 600 is programmable, it can be used to implement functions other than 4-input look-up tables and multi-bit binary adders. For example, with a value of 0 stored in control SRAM 515, circuit 600 can be used to implement two different 3-input functions of A, B, and C, where the different results associated with the first function are stored in memory cells SRAM_0 to SRAM_7 and the different results associated with the second function are stored in memory cells SRAM_8 to SRAM_15, and control signal D determines whether the output signal OUT corresponds to the first function or the second function.
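The LUT-mode use of the circuit for two 3-input functions can be sketched as a mux tree. The sketch assumes that muxes 401-404 are 4:1 muxes controlled by A and B, that muxes 411 and 412 are controlled by C, that mux 413 is controlled by D, and that memory cell SRAM_n is addressed by n = 8·D + 4·C + 2·B + A. The function names mux4, mux2, lut4, and pack_two_functions are hypothetical.

```python
def mux4(inputs, b, a):
    """4:1 mux: select one of four inputs using control bits B (MSB) and A."""
    return inputs[2 * b + a]


def mux2(in0, in1, sel):
    """2:1 mux: select in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0


def lut4(sram, a, b, c, d):
    """Decode 16 memory cells through the assumed mux tree."""
    # Muxes 401-404 each read one group of four cells.
    first = [mux4(sram[4 * k:4 * k + 4], b, a) for k in range(4)]
    lower = mux2(first[0], first[1], c)   # mux 411: SRAM_0 to SRAM_7
    upper = mux2(first[2], first[3], c)   # mux 412: SRAM_8 to SRAM_15
    return mux2(lower, upper, d)          # mux 413: D picks the function


def pack_two_functions(first_fn, second_fn):
    """Store first_fn in SRAM_0..7 and second_fn in SRAM_8..15."""
    cells = []
    for fn in (first_fn, second_fn):
        for c in (0, 1):
            for b in (0, 1):
                for a in (0, 1):
                    cells.append(fn(a, b, c))
    return cells
```

For example, packing a 3-input majority function and a 3-input parity function yields a circuit whose output OUT is the majority of A, B, and C when D is 0 and their parity when D is 1.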
Circuit 600 can also be configured in arithmetic mode to implement multi-bit functions of A and B having carry bits other than multi-bit adders, such as multi-bit subtractors, counters, and comparators, where control SRAM 515 stores a value of one (to select the arithmetic mode), memory cells SRAM_0 to SRAM_3 store the output bit values for a carry-in bit value CIN=0, memory cells SRAM_4 to SRAM_7 store the output bit values for a carry-in bit value CIN=1, memory cells SRAM_8 to SRAM_11 store the carry-out bit values COUT for a carry-in bit value CIN=0, and memory cells SRAM_12 to SRAM_15 store the carry-out bit values COUT for a carry-in bit value CIN=1. As in the multi-bit adder of
Circuit 600 can also be programmed to implement four different functions of A and B, where control SRAM 515 stores a value of one. The different results for the first function are stored in memory cells SRAM_0 to SRAM_3, those for the second function in memory cells SRAM_4 to SRAM_7, those for the third function in memory cells SRAM_8 to SRAM_11, and those for the fourth function in memory cells SRAM_12 to SRAM_15. The value of CIN determines whether the first and third functions or the second and fourth functions are selected by muxes 411 and 610, respectively, where output signal OUT corresponds to the first and second functions, while output signal COUT corresponds to the third and fourth functions. Note that COUT may be applied to the CIN input of the next instance of circuit 600.
Circuit 700 receives input signals A, B, C, CIN, and D, as discussed above with reference to
Fast-carry generation signal PROP is defined as (A XOR B), as noted above in Equation (5). This particular function corresponds to the output of mux 401, which selects its output based upon input signals A and B. The output of mux 401 corresponds to the contents of the selected SRAM cell from memory cells SRAM_0 through SRAM_3. As noted above in reference to
Second circuit 802 generates the second bit of the two-bit addition. Input signals A1, B1, and CIN1 are used by second circuit 802 to generate output signals SUM1, PROP1, and COUT1. The SUM1 signal is the output of adder 800 that corresponds to the MSB of the two-bit sum. PROP1 corresponds to the PROP signal generated within second circuit 802. PROP1 and COUT1 are both transmitted to look-ahead carry generation logic 803 for use in generating fast carry-out signal FCOUT. Look-ahead carry generation logic 803 receives the above signals from first circuit 801 and second circuit 802 as well as fast-carry input signal FCIN, which is typically identical to CIN0. Look-ahead carry generation logic 803 uses these signals to generate output signals COUT and FCOUT, which may be used as CIN and FCIN, respectively, by a subsequent instance of two-bit adder 800 in a multi-bit adder.
COUTi=Ai·Bi+(Ai XOR Bi)·CINi (14)
where Ai and Bi correspond to the ith bits of the two numbers being added together and CINi corresponds to the carry-in bit to the ith stage of the multi-bit adder.
From Equation (5), the ith PROP signal is defined as:
PROPi=(Ai XOR Bi) (15)
Substituting Equation (15) into Equation (14) yields the following relationships for the COUT and PROP signals of
COUT0=A0·B0+PROP0·CIN0 (16)
COUT1=A1·B1+PROP1·CIN1 (17)
Because COUT0 is the same signal as CIN1, Equation (17) may be rewritten as:
COUT1=A1·B1+PROP1·(A0·B0+PROP0·CIN0) (18)
COUT1 may then be expanded as follows:
COUT1=A1·B1+PROP1·A0·B0+PROP1·PROP0·CIN0 (19)
CIN0 corresponds to the FCIN signal to look-ahead carry generation logic 803. From Equation (19), if A0≠B0 and A1≠B1, then A0·B0=0, A1·B1=0, and, according to Equation (15), PROP0 and PROP1 are both 1, and Equation (19) reduces to COUT1=CIN0=FCIN.
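The expansion in Equation (19) and its reduction to COUT1=CIN0 can be checked exhaustively over the 32 combinations of A0, B0, A1, B1, and CIN0. The following is a minimal sketch with hypothetical function names (cout_stage, cout1_expanded, check_equation_19).

```python
from itertools import product


def cout_stage(a, b, cin):
    """Carry-out of one stage per Equation (14)."""
    return (a & b) | ((a ^ b) & cin)


def cout1_expanded(a0, b0, a1, b1, cin0):
    """Carry-out of the second stage per Equation (19)."""
    prop0, prop1 = a0 ^ b0, a1 ^ b1                    # Equation (15)
    return (a1 & b1) | (prop1 & a0 & b0) | (prop1 & prop0 & cin0)


def check_equation_19():
    """Return True if Equation (19) matches the cascade of (16) and (17)."""
    for a0, b0, a1, b1, cin0 in product((0, 1), repeat=5):
        cascaded = cout_stage(a1, b1, cout_stage(a0, b0, cin0))
        expanded = cout1_expanded(a0, b0, a1, b1, cin0)
        if cascaded != expanded:
            return False
        # When both propagate signals are 1, COUT1 must equal CIN0.
        if (a0 ^ b0) and (a1 ^ b1) and expanded != cin0:
            return False
    return True
```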
As such, look-ahead carry generation logic 803 speeds up generation of the COUT signal when the PROP0 and PROP1 signals are both 1, since only in that case does the value of the COUT signal depend on the arrival of the CIN0 signal from an earlier two-bit stage in a multi-bit adder. Otherwise, the COUT signal is the COUT1 signal generated by first and second circuits 801-802.
Look-ahead carry generation circuit 803 implements the above logic by controlling mux 901 with the output of AND gate 902. AND gate 902 asserts a logical 1 if both PROP1 and PROP0 signals are a 1. In that case, mux 901 selects the FCIN signal for both FCOUT and COUT (via buffer gate 903). If AND gate 902 asserts a logical 0, then mux 901 selects COUT1 from circuit 802.
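The selection performed by mux 901 under the control of AND gate 902 can be modeled as follows, with two-bit stages chained to form a wider adder. The function names one_bit, two_bit_stage, and add are hypothetical, FCIN is taken to be identical to CIN0, and the adder width is assumed to be even. The sketch reproduces the logical result only; the speed benefit of the bypass path exists in hardware and is not modeled.

```python
def one_bit(a, b, cin):
    """Return (SUM, PROP, COUT) for one bit position."""
    prop = a ^ b                                  # Equation (15)
    return prop ^ cin, prop, (a & b) | (prop & cin)


def two_bit_stage(a0, b0, a1, b1, cin):
    """Model a two-bit adder stage with the carry bypass selection."""
    sum0, prop0, cout0 = one_bit(a0, b0, cin)     # role of first circuit 801
    sum1, prop1, cout1 = one_bit(a1, b1, cout0)   # role of second circuit 802
    bypass = prop0 & prop1                        # AND gate 902
    cout = cin if bypass else cout1               # mux 901
    return sum0, sum1, cout


def add(x, y, width=8):
    """Add two integers using chained two-bit stages (width assumed even)."""
    carry, total = 0, 0
    for i in range(0, width, 2):
        s0, s1, carry = two_bit_stage((x >> i) & 1, (y >> i) & 1,
                                      (x >> (i + 1)) & 1, (y >> (i + 1)) & 1,
                                      carry)
        total |= (s0 << i) | (s1 << (i + 1))
    return total | (carry << width)
```

Because the bypass is taken only when both propagate signals are 1, the value selected by the mux in that case is the same value that COUT1 would eventually settle to, so the result of the addition is unchanged.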
As noted above, input signals CIN0 and FCIN are logically the same signal. These input signals correspond to the carry-in input signal to first circuit 801. Output signals COUT and FCOUT are also logically the same signal. These two output signals correspond to the carry-out signal from second circuit 802. In the embodiment disclosed within
The present invention has been described in the context of circuitry based on a LUT4 circuit. Those skilled in the art will understand that the present invention can also be implemented in the context of circuitry based on other types of LUT circuits, such as a LUT5 circuit having 32 memory cells and 5 input signals. Such LUT circuits can be used to implement analogous functions such as multi-bit adders and the like, where only a portion of the LUT circuit is used. Alternatively, larger LUT circuits can be used to implement even faster functions. For example, larger LUT circuits could be used to implement a multi-bit adder, where each stage in the multi-bit adder adds two 2-bit (or larger) values.
In comparison to the look-ahead carry adder based on circuit 500 of
Although the present invention has been described in the context of LUT circuits having SRAM memory cells, those skilled in the art will understand that the present invention can be implemented using other types of memory cells.
Depending on the particular implementation, a circuit of any of
Although the present invention has been described in the context of FPGAs, those skilled in the art will understand that the present invention can be implemented in the context of other types of programmable devices, such as, without limitation, programmable logic devices (PLDs), mask-programmable gate arrays (MPGAs), simple programmable logic devices (SPLDs), and complex programmable logic devices (CPLDs). More generally, the present invention can be implemented in the context of any kind of electronic device having programmable elements.
In general, the present invention may be implemented as circuit-based processes, including possible implementation as a single integrated circuit (such as an ASIC or an FPGA), a multi-chip module, a single card, or a multi-card circuit pack. As would be apparent to one skilled in the art, various functions of circuit elements may also be implemented as processing blocks in a software program. Such software may be employed in, for example, a digital signal processor, micro-controller, or general-purpose computer.
It will be further understood that various changes in the details, materials, and arrangements of the parts which have been described and illustrated in order to explain the nature of this invention may be made by those skilled in the art without departing from the scope of the invention as expressed in the following claims.
The use of figure numbers and/or figure reference labels in the claims is intended to identify one or more possible embodiments of the claimed subject matter in order to facilitate the interpretation of the claims. Such use is not to be construed as necessarily limiting the scope of those claims to the embodiments shown in the corresponding figures.
Although the elements in the following method claims, if any, are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those elements, those elements are not necessarily intended to be limited to being implemented in that particular sequence.