The present invention relates to logic circuits in user-programmable integrated circuits such as a field programmable gate array (FPGA). More particularly, the present invention relates to logic cell architecture for such programmable integrated circuits to provide improved support for compressors.
FPGAs are integrated circuits composed of logic cells connected by a programmable routing network. Typically, any logic cell output provided to the programmable routing network can be transmitted through the network to any logic cell input connected to the network. A common type of logic cell includes a K-input look-up table (LUT), multiplexers, and carry chain logic. Such a logic cell would have at least K inputs coming from the programmable routing network and a carry input coming from the carry output of a previous cell in a chain of logic cells.
One important property of an FPGA logic cell is how many of its inputs and outputs are provided by or to the programmable routing network. The versatility of a logic cell may increase as the number of inputs and outputs available from or to the programmable routing network increases but so does the complexity and cost of the FPGA. From a cost metric consideration, a logic cell architecture achieving desired functionality with fewer inputs and outputs required to be connected to the programmable routing network may be preferable to a different architecture that achieves the same functionality but requires more inputs and outputs to be connected to the programmable routing network.
Another important property of an FPGA logic cell is the number K of inputs to the LUT. The value K may range from 2 to 10 or more, but the most common values in practice are 3, 4 and 6. Some existing FPGAs use 3-input LUTs (K=3) or 4-input LUTs (K=4), which are generally the best choice for low-cost, low-power FPGAs. Other existing FPGAs use a fracturable 6-input LUT (K=6). These can implement a wider variety of functions but consume more area and power. A more general definition of K, applicable to any logic cell, is the largest number such that the logic cell can be configured to compute every function of K inputs.
FPGA logic cells may be used to form compressors. Compressors are a family of logic circuits that take multiple binary numerical values as inputs and produce their sum represented in fewer bits. For instance, the well-known full adder can also be referred to as a 3:2 compressor because it takes three one-bit inputs and produces a single bit sum and single bit carry-out output. (Collectively the two bits corresponding to the sum and carry-out indicate a count of the number of ones among the inputs). In some cases, compressors may be organized in a chain similar to a carry chain, with each compressor receiving one or more additional inputs from the previous compressor in the chain and generating the same number of additional outputs to the next compressor in the chain. An example is a 4:2 compressor.
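As an illustration of the 3:2 compressor described above, the following minimal Python sketch models a full adder at the bit level. The function name `compressor_3_2` is a hypothetical helper introduced only for this example; it is a software model, not a description of any particular circuit.

```python
from itertools import product
from typing import Tuple


def compressor_3_2(a: int, b: int, c: int) -> Tuple[int, int]:
    """Model a full adder as a 3:2 compressor; returns (sum, carry_out)."""
    s = a ^ b ^ c                        # sum bit: parity of the three inputs
    carry = (a & b) | (a & c) | (b & c)  # carry bit: majority of the three inputs
    return s, carry


# The two output bits together encode the number of ones among the inputs.
for a, b, c in product((0, 1), repeat=3):
    s, carry = compressor_3_2(a, b, c)
    assert 2 * carry + s == a + b + c
```

The loop checks all eight input combinations, confirming that the carry and sum bits, read as a two-bit number, equal the count of ones.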
Some prior art logic cell architectures require two logic cells to implement each 3:2 compressor: one to produce each of the two outputs. Other prior art logic cell architectures can implement a 3:2 compressor with a single logic cell, at the cost of more required connections to the programmable routing network. A logic cell architecture capable of implementing a 3:2 compressor in a single logic cell with fewer required connections to the programmable routing network, and hence better power efficiency, would be beneficial.
Some FPGA applications require large numbers of compressors. For example, binary neural networks are networks that quantize weights and activations with binary values instead of full precision values. Binary neural networks often require implementation of a function called “population count” or “pop count”. This function represents the number of ones among a set of M Boolean inputs as a binary output value of roughly log2(M) bits (more precisely, ⌈log2(M+1)⌉ bits, since the count can range from 0 to M). Pop counters are typically implemented using many compressors.
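To make the link between pop counters and compressors concrete, the sketch below counts ones using nothing but 3:2 compressors, grouping bits by binary weight. The names `full_adder` and `pop_count` are hypothetical, and the sequential reduction order is an assumption chosen for readability; a hardware pop counter would arrange its compressors for speed and area rather than follow this loop.

```python
def full_adder(a, b, c):
    """3:2 compressor: returns (sum, carry_out)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def pop_count(bits):
    """Count the ones in `bits` using only 3:2 compressors."""
    columns = {0: list(bits)}  # binary weight exponent -> bits of that weight
    weight = 0
    result = 0
    while weight in columns:
        column = columns[weight]
        # Compress up to three bits of equal weight into one sum bit of the
        # same weight and one carry bit of the next higher weight.
        while len(column) > 1:
            a = column.pop()
            b = column.pop()
            c = column.pop() if column else 0
            s, carry = full_adder(a, b, c)
            column.append(s)
            columns.setdefault(weight + 1, []).append(carry)
        if column:
            result |= column[0] << weight
        weight += 1
    return result
```

Each pass through the inner loop preserves the weighted total of all bits, so the value assembled from the single remaining bit per column equals the number of ones in the input.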
According to an aspect of the present invention, a logic cell for a programmable logic integrated circuit apparatus is presented. The logic cell includes K-input lookup table (LUT) circuitry including: i) a first (K−1)-input LUT and a second (K−1)-input LUT both sharing in common second through Kth inputs to the K-input LUT, each of the first and second (K−1)-input LUTs having an output; and ii) a first multiplexer having a first input coupled to the output of the first (K−1)-input LUT, a second input coupled to the output of the second (K−1)-input LUT, and a select input coupled to a first input of the K-input LUT circuitry. The first multiplexer provides a primary output Y of the logic cell, wherein Y is any independent function of the K inputs. The logic cell includes a carry circuit with an X multiplexer coupled to provide an output selected from one of a constant logic reference and the primary output Y; and an exclusive-OR gate providing a sum output S, wherein the exclusive-OR gate is coupled to receive the output of the X multiplexer as a first input and a carry-input as a second input.
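The structure summarized above can be sketched in software as follows. This is a behavioral model under stated assumptions, not the circuit itself: the truth-table encoding, the helper names `lut`, `k_input_lut`, and `sum_output`, the choice that a select value of 1 picks the second (K−1)-input LUT, and the use of logic low as the constant reference are all assumptions made for illustration.

```python
def lut(table, inputs):
    """Look up a truth table of length 2**len(inputs); inputs[0] is the MSB."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return table[index]


def k_input_lut(table0, table1, inputs):
    """K-input LUT built from two (K-1)-input LUTs and a 2:1 multiplexer.

    inputs[0] drives the multiplexer select; inputs[1:] are shared by both
    smaller LUTs. Which LUT a select value of 1 picks is an assumption.
    """
    select, shared = inputs[0], inputs[1:]
    f0 = lut(table0, shared)
    f1 = lut(table1, shared)
    return f1 if select else f0


def sum_output(y, ci, x_selects_y):
    """X multiplexer followed by the exclusive-OR gate: S = X xor CI."""
    x = y if x_selects_y else 0  # constant reference assumed to be logic low
    return x ^ ci
```

For example, with K=3 the majority function can be split into `table0 = [0, 0, 0, 1]` (AND of the shared inputs) and `table1 = [0, 1, 1, 1]` (OR of the shared inputs). When the X multiplexer selects the constant, `sum_output` simply returns the carry-in, leaving Y free to carry an unrelated function.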
In one embodiment, the logic cell for a programmable logic integrated circuit includes a programmable routing network, wherein the K-input lookup table circuitry receives the K inputs from the programmable routing network, and wherein the programmable routing network receives the primary output and the sum output.
In one embodiment, the logic cell includes K-input lookup table circuitry for providing the primary output and a carry circuit coupled to receive one or more signals from the lookup table and a carry-in input and which provides a carry-out output and sum output. In one embodiment, the carry-in input is not received from the programmable routing network and the carry-out output is not provided to the programmable routing network. In one embodiment, the carry-in input is received from a previous logic cell in a chain of logic cells, and the carry-out output is provided to a subsequent logic cell in the chain.
In one embodiment, the logic cell can be selectively configured to provide the value of the carry-in input at the carry-out output while also providing any function of the K inputs at the primary output.
In one embodiment, the logic cell can be selectively configured to provide the value of the carry-in input at the sum output while also providing any function of the K inputs at the primary output.
In one embodiment, the logic cell can be selectively configured to provide one bit of an adder with the sum appearing at the sum output and a carry appearing at the carry-out output. In one embodiment, the logic cell can be selectively configured to implement the final bit of a multi-bit adder with the sum appearing at the sum output and the carry-out output selectively driven to a pre-determined value (e.g., 0 or 1).
In various embodiments K≤4 such as K=3 or K=4. In another embodiment, K≥4 such as 4 or 6. K is not limited to these values and may be any non-negative integer value.
A method includes the step of a) providing K-input lookup table (LUT) circuitry comprising: i) a first (K−1)-input LUT and a second (K−1)-input LUT both sharing in common second through Kth inputs to the K-input LUT, each of the first and second (K−1)-input LUTs having an output, and ii) a first multiplexer having a first input coupled to the output of the first (K−1)-input LUT, a second input coupled to the output of the second (K−1)-input LUT, and a select input coupled to a first input of the K-input LUT circuitry, the first multiplexer providing a primary output Y of the logic cell. The method includes the step of b) providing a carry circuit coupled to receive a carry-in input (CI) and to generate a carry-out output (CO) and a sum output (S), wherein the carry-out output is selectively independent of Y, wherein the carry circuit comprises: i) an X multiplexer coupled to provide an output selected from one of a constant logic reference and the primary output; and ii) an exclusive-OR gate providing the sum output, wherein the exclusive-OR gate is coupled to receive the output of the X multiplexer as a first input and the carry-input as a second input. The method includes the step of c) generating the carry-out output selectively independent of Y, and d) propagating the carry-in input to a selected one of the carry-out output and the sum output of the logic cell.
The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings in which like references indicate similar elements and in which:
Persons of ordinary skill in the art will realize that the following description is illustrative only and not in any way limiting. Other aspects will readily suggest themselves to such skilled persons.
The present invention improves the efficiency of existing LUT-based FPGA logic cells at implementing accumulators and counters in parallel with producing primary outputs that are a function of the LUT inputs.
In accordance with one aspect of the present invention, an illustrative LUT-based logic cell 190 that allows improved efficiency for counters and accumulators is shown in
The logic cell 190 of
The logic cell 190 of
A third multiplexer (“P multiplexer”) is shown in
A fourth multiplexer, i.e. a carry-out multiplexer (“CO multiplexer”), is shown in
A fifth multiplexer 86 (“X multiplexer”) in the carry circuit of the logic cell 190 has a first data input receiving the primary output, Y 48, of the first multiplexer 122 and a second data input coupled to a logic-low reference. Configuration circuitry 88 selects which one of the inputs of the X multiplexer will be passed to its data output denoted X 89.
An exclusive-OR gate 46 in the carry circuit 160 of the logic cell 190 has a first input coupled to receive the carry-in input CI 42, and a second input coupled to the data output X 89 of the X multiplexer 86. The output of the exclusive-OR gate is the sum output, S 50, of the logic cell 190.
In some implementations of the invention, either of the F0 or F1 inputs from LUTs 124a and 124b, and one of the logic reference inputs can be omitted from the G or P multiplexers (i.e., the second multiplexer 32 or third multiplexer 36). In various embodiments of a chain of logic cells, the carry circuitry of one or more consecutive logic cells may use a carry-lookahead, carry-select, or similar logic functionally equivalent to the carry function as illustrated in
The logic cell 190 of
The configurations of the P, G, and X multiplexers may vary between the logic cell 212 associated with the least significant bit, the logic cell 216 associated with the most significant bit, and the logic cells such as 214 associated with the intermediate bits of ADDER1 210. The P, G, and X multiplexers are configured by the respective configuration circuitry 38, 34, 88 to ensure the proper signals are output by each logic cell within the adders. In particular, the X multiplexer of logic cells in adders is set to select Y so that the sum output S=EXOR(Y, CI), where “EXOR” is the Boolean exclusive-OR function applied to Y and CI. In all but the last logic cell of an adder, the P multiplexer is set to select Y so that the carry-out output CO=CI if Y=1 and CO=G if Y=0. This results in a ripple-carry chain propagating through the adder. However, in the last logic cell of an adder (such as logic cell 216 of ADDER1), the P multiplexer is configured to prevent the carry from propagating into the first logic cell of the next adder (such as logic cell 222 of ADDER2). In the last logic cell of an adder, the P multiplexer is set to select “0” so that CO=G, and the G multiplexer is set to select either 0 or 1, whichever is the proper constant to initialize the carry chain of the next adder.
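The adder configuration described above can be modeled with a short behavioral sketch. Several details here are assumptions made for illustration rather than taken from the description: the helper names `carry_cell` and `chained_adders`, the choice of Y as the exclusive-OR of the two operand bits, the choice of one operand bit as the G value in non-final cells, and a constant 0 as the G value in each final cell.

```python
def carry_cell(y, g, ci, p_selects_y=True, x_selects_y=True):
    """Behavioral model of the carry circuit: returns (S, CO)."""
    p = y if p_selects_y else 0  # P multiplexer: Y or constant 0
    x = y if x_selects_y else 0  # X multiplexer: Y or constant 0
    co = ci if p else g          # CO = CI when P is 1, otherwise CO = G
    s = x ^ ci                   # S = EXOR(X, CI)
    return s, co


def chained_adders(operand_pairs, width, ci=0):
    """Run several `width`-bit adders back to back on one carry chain."""
    results = []
    for a, b in operand_pairs:
        total = 0
        for i in range(width):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            y = ai ^ bi              # assumed LUT function: propagate signal
            last = i == width - 1
            # When Y = 0 both operand bits are equal, so either one gives the
            # carry (assumed G source). The final cell instead drives the
            # constant 0 that initializes the next adder's carry chain.
            g = 0 if last else ai
            s, ci = carry_cell(y, g, ci, p_selects_y=not last)
            total |= s << i
        results.append(total)
    return results
```

Because the final cell of each adder selects 0 at its P multiplexer, a carry out of one adder never reaches the next, and each result is the sum modulo 2 to the power `width`.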
The P multiplexer and configuration circuitry 310 includes NOR gate 312 receiving the primary output Y and a first configuration bit, CFG0. NOR gate 312 is coupled to NOR gate 314. The output of NOR gate 312 is provided as one of the inputs to NOR gate 314. NOR gate 314 also receives a second configuration bit, CFG1. The output of NOR gate 314 is output P 39. The Boolean function for the P multiplexer and configuration circuitry 310 is P=((Y+CFG0)′+CFG1)′ or any equivalent (wherein the apostrophe indicates logical negation).
The X multiplexer and configuration circuitry 320 includes NAND gate 322 which receives the two configuration bits, CFG0 and CFG1. NAND gate 322 is coupled to NAND gate 324. NAND gate 324 receives the output of NAND gate 322 and the primary output Y as inputs. NAND gate 324 is coupled to inverter gate 326 such that the output of NAND gate 324 is provided to inverter gate 326. The output of inverter gate 326 is output X 89. In the illustrated embodiment, NAND gate 324 is followed by inverter gate 326 to achieve an AND function. In an alternative embodiment, gate 324 is replaced with an AND gate to dispense with the need for the inverter gate 326. The Boolean function for the X multiplexer and configuration circuitry 320 is X=Y·(CFG0·CFG1)′ or any logical equivalent (where the apostrophe indicates logical negation).
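The two Boolean functions given for the P and X circuits can be checked by enumeration. The sketch below evaluates the gate-level expressions for every setting of the two configuration bits and reports whether each output behaves as Y or as a constant. The helper names are hypothetical, and the summary is derived only from the stated formulas, not from the figures.

```python
from itertools import product


def nor(a, b):
    return 1 - (a | b)


def nand(a, b):
    return 1 - (a & b)


def p_output(y, cfg0, cfg1):
    """P = ((Y + CFG0)' + CFG1)', built from two NOR gates."""
    return nor(nor(y, cfg0), cfg1)


def x_output(y, cfg0, cfg1):
    """X = Y . (CFG0 . CFG1)', built from two NAND gates and an inverter."""
    return 1 - nand(nand(cfg0, cfg1), y)


def describe(func, cfg0, cfg1):
    """Classify an output as 'Y', "Y'", or a constant for fixed config bits."""
    at0, at1 = func(0, cfg0, cfg1), func(1, cfg0, cfg1)
    if at0 == at1:
        return str(at0)
    return "Y" if (at0, at1) == (0, 1) else "Y'"


def selection_table():
    """Map (CFG0, CFG1) to the symbolic values of (P, X)."""
    return {
        (c0, c1): (describe(p_output, c0, c1), describe(x_output, c0, c1))
        for c0, c1 in product((0, 1), repeat=2)
    }
```

Under these formulas, (CFG0, CFG1) = (0, 0) gives P=Y and X=Y, (0, 1) gives P=0 and X=Y, (1, 0) gives P=1 and X=Y, and (1, 1) gives P=0 and X=0. Two configuration bits therefore suffice to cover the four P and X combinations.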
Truth table 330 illustrates the P 39 and X 89 outputs of the P multiplexer 36 and the X multiplexer 86, respectively, of
As indicated in the second row of function table 400, the logic cell 190 of the present invention as illustrated in
As indicated in the third row of function table 400, the logic cell 190 of the present invention as illustrated in
As indicated in the fourth row of function table 400, logic cell 190 of
Another example of the versatility of the logic cell 190 of
The top full adder 614 of
The In1 and In2 inputs of logic cells 190-0 through 190-(N−1) are coupled to the N operand inputs U[0] through U[N−1] and V[0] through V[N−1], respectively, for the adder 700. The S outputs of each logic cell 190-0 through 190-N form the sum outputs Z[0] through Z[N] for each of the N+1 output bits of the adder. The logic cell 190-N need only pass its carry-in (CI) input to its sum (S) output and on to the programmable routing network. Advantageously, this leaves its LUT available to implement in parallel any unrelated function of K inputs.
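A behavioral sketch of this N-bit adder with N+1 sum outputs follows. It is an illustration under assumptions: the function name `adder_n_plus_1` is hypothetical, the LUT of each of the first N cells is assumed to compute the exclusive-OR of its two operand bits, the G value is assumed to be one operand bit, and the initial carry-in is assumed to be 0.

```python
def adder_n_plus_1(u, v, n):
    """Add two n-bit operands; returns sum bits Z[0..n], least significant first."""
    ci = 0  # assumed initial carry-in
    z = []
    for i in range(n):
        ui, vi = (u >> i) & 1, (v >> i) & 1
        y = ui ^ vi            # assumed LUT function for cells 0 through n-1
        z.append(y ^ ci)       # X selects Y, so S = Y xor CI
        ci = ci if y else ui   # CO = CI when Y is 1, else G (assumed G = U[i])
    # Cell n: the X multiplexer selects the constant 0, so S = CI. The cell's
    # LUT plays no part in this result and stays free for another function.
    z.append(0 ^ ci)
    return z


def bits_to_int(bits):
    """Helper: convert a least-significant-first bit list to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))
```

For instance, adding 5 and 3 with n=3 yields four sum bits, the top one being the carry that cell N forwards from its carry-in to its sum output.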
The table 702 in
The method of
A logic cell for a programmable logic integrated circuit apparatus includes i) a K-input lookup table (LUT) circuit having a primary output Y and at least one additional output (F), such as F0 or F1 in
While aspects and applications of this invention have been shown and described, it would be apparent to those skilled in the art that many more modifications than mentioned above are possible without departing from the inventive concepts herein.
This application claims priority to U.S. Provisional Patent Application No. 63/191,774 filed on May 21, 2021, which is incorporated by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
20220376693 A1 | Nov 2022 | US |
Number | Date | Country | |
---|---|---|---|
63191774 | May 2021 | US |