Addition is common in digital design, so modern FPGAs include circuitry dedicated to it. Rather than implementing addition purely in lookup tables (LUTs), FPGAs are often augmented with hardened circuitry for the efficient implementation of adders. Typically, full adders (e.g., each having inputs A, B, and carry in, and outputs sum and carry out) are connected in one of two ways to implement wider adders.
One simple way to implement wider adders is to add dedicated routing directly from the carry out of one full adder to the carry in of the next, which can be used to implement a fast ripple carry adder (RCA). The critical path through a ripple carry adder is dominated by the ripple carry path, which grows linearly with the width of the adder, i.e., the bit width of its inputs and output. This type of adder is typically quite fast for low bit widths but can become quite slow for high bit widths because of the long delay through the lengthy ripple carry path.
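The ripple carry structure described above can be illustrated with a short behavioral model. This is a minimal sketch in Python, not an FPGA implementation; the function names `full_adder` and `ripple_carry_add` are hypothetical and chosen only for this example.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    p = a ^ b            # propagate: the carry in passes through
    g = a & b            # generate: a carry out is produced regardless
    return p ^ cin, g | (p & cin)


def ripple_carry_add(x, y, width, cin=0):
    """Add two width-bit unsigned integers one bit at a time.

    Returns (sum, carry_out). The carry visits every bit position in
    turn, which is why the critical path grows linearly with width.
    """
    carry = cin
    total = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry
```

For example, `ripple_carry_add(15, 1, 4)` returns `(0, 1)`: the carry ripples through all four bit positions before the carry out is known.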
An alternative used in FPGAs to implement wider adders is to add dedicated carry lookahead adder (CLA) circuitry with a fixed block size (K) in a logic block cluster. Block size relates to the width or bit width of the block, and more specifically to the bit widths of the inputs and/or output(s) of a block. This carry lookahead circuitry is used to pre-compute whether each group of K full adders will ignore the incoming carry in, propagate the incoming carry in, or generate a carry out regardless of the value of the carry in. The CLA circuitry speeds up the ripple path, whose critical path then scales linearly with (number of bits)/K. The choice of K is a tradeoff that FPGA architects must make up front. A larger value of K will provide better performance for wide adders, but will incur a higher fixed area penalty.
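The three block outcomes named above (ignore, propagate, or generate) can be expressed with two group signals. The following Python sketch is illustrative only; `block_signals` and `block_carry_out` are hypothetical names, and the per-bit propagate is assumed to be the XOR of the operand bits.

```python
def block_signals(x, y, lo, k):
    """Classify the k-bit block that starts at bit position lo.

    Returns (group_generate, group_propagate):
      group_generate  = 1: the block produces a carry out regardless of carry in
      group_propagate = 1: the block passes its carry in through to carry out
    If both are 0, the block ignores the incoming carry.
    """
    gg, gp = 0, 1
    for i in range(lo, lo + k):
        a, b = (x >> i) & 1, (y >> i) & 1
        p, g = a ^ b, a & b
        gg = g | (p & gg)   # generated here, or generated below and passed up
        gp = gp & p         # every bit in the block must propagate
    return gg, gp


def block_carry_out(x, y, lo, k, cin):
    """Carry out of the block, computed from the group signals alone."""
    gg, gp = block_signals(x, y, lo, k)
    return gg | (gp & cin)
```

Because the group signals depend only on the operand bits, they can be computed in parallel with the carry arriving from lower bits, which is what shortens the ripple path.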
Additional work has shown that the LUTs and adders on FPGAs can be used to implement complex parallel prefix adders, which can be faster for very high bit widths. However, because there is no architectural support for these structures, there is significant area overhead to doing this in a typical FPGA.
Embodiments described herein implement a class of fast carry-skip adders using a combination of existing RCA adder circuitry, which is modified to make propagate and generate signals routable, and soft logic. Techniques described herein allow fast carry-skip adders to be created with variable block size with minimal architecture modifications. In one embodiment, the architecture modifications do not dictate the block size, so the block size(s) that form an adder are decided at compile time, as a trade-off between area and speed. Larger block sizes lead to higher area overhead, while lower block sizes lead to lower area overhead. For low bit-width adders, a standard RCA can be implemented to avoid any soft-logic area overhead.
One embodiment disclosed herein is an adder implemented in a field programmable gate array (FPGA). The adder has a first ripple carry block, for least significant bits of the adder. The adder has a plurality of carry skip adder blocks of differing block sizes. Each block size relates to a bit-width of input to a block. The plurality of carry skip adder blocks is for a plurality of bits of the adder. The adder has a second ripple carry adder block, for most significant bits of the adder.
One embodiment disclosed herein is a computer aided design (CAD) method that is practiced by a CAD system. The method includes receiving instruction to implement an adder in a field programmable gate array (FPGA), and generating the adder in a format for programming the FPGA. The adder includes a first ripple carry block, for least significant bits of the adder. The adder includes a plurality of carry skip adder blocks of differing block sizes, for a plurality of bits of the adder. Each block size relates to bit-width of input to a block. The adder includes a second ripple carry block, for most significant bits of the adder.
One embodiment disclosed herein is a tangible, non-transitory, computer-readable medium that has instructions thereupon. When executed by a processor, the instructions cause the processor to perform a method. The method includes receiving instruction to implement an adder in a field programmable gate array (FPGA), and programming the FPGA to implement the adder. The adder includes a first ripple carry adder block, for least significant bits of the adder. The adder includes a plurality of carry skip adder blocks of differing block sizes. Each block size relates to bit-width of input to a block. The plurality of carry skip adder blocks is for a plurality of bits of the adder. The adder includes a second ripple carry adder block, for most significant bits of the adder.
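The adder structure recited in these embodiments (a ripple carry block, then carry skip blocks of differing sizes, then another ripple carry block) can be modeled behaviorally as follows. This is a sketch under stated assumptions: the `layout` list, its `"rca"`/`"skip"` labels, and the function name `carry_skip_add` are hypothetical, and the model captures function only, not timing or FPGA resources.

```python
def carry_skip_add(x, y, layout, cin=0):
    """Behavioral model of an adder assembled from a list of blocks.

    layout is a list of (kind, size) tuples ordered from least to most
    significant bits, where kind is "rca" or "skip" (hypothetical labels).
    Returns (sum, carry_out).
    """
    carry, total, lo = cin, 0, 0
    for kind, size in layout:
        block_cin = carry
        group_p = 1
        for i in range(lo, lo + size):
            a, b = (x >> i) & 1, (y >> i) & 1
            p, g = a ^ b, a & b
            total |= (p ^ carry) << i      # sum bit uses the rippled carry
            carry = g | (p & carry)        # ripple inside the block
            group_p &= p
        if kind == "skip" and group_p:
            # Skip mux: when every bit propagates, the block's carry in is
            # forwarded directly instead of waiting for the ripple path.
            carry = block_cin
        lo += size
    return total, carry


# A 16-bit example with differing (hypothetical) block sizes.
LAYOUT_16 = [("rca", 3), ("skip", 2), ("skip", 4), ("skip", 4), ("rca", 3)]
```

The skip mux does not change the arithmetic result; in hardware its benefit is that the forwarded carry is available earlier than the rippled one.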
In one embodiment, the area/speed tradeoff can be decided as follows:
Adder embodiments disclosed herein have one or more of the following advantages compared to using a hardened carry lookahead adder:
Embodiments described herein will be understood more fully from the detailed description given below and from the accompanying drawings of various embodiments of the invention, which, however, should not be taken to limit the invention to the specific embodiments, but are for explanation and understanding only.
In the following description, numerous details are set forth to provide a more thorough explanation of the present embodiments. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present embodiments.
Techniques are described herein for creating a class of fast carry-skip adder structures on FPGAs with low area overhead versus plain ripple carry adders (RCA) using a modified version of the standard hardened RCA that drives the routing fabric with the propagate and generate signals.
In some embodiments, the full adder, implemented using a 4-LUT 104 in
With reference to the carry skip adder embodiments in
In the embodiment shown in
In one embodiment, the block sizes, and thus the adder structure, can be chosen in any of several ways: by the user specifying whether the CAD tool should focus more on area or on performance (a global option that affects the whole design); by the user instantiating a parameterized adder module in their design (e.g., with parameters that control the structure of the adder); or by physical synthesis techniques that start with the area-optimized adder and then modify the block sizes to target speed only for adders on the critical path.
Thus, as described above, the carry skip adder structure(s) are implemented efficiently on an FPGA using a mix of hardened resources and soft logic/routing. Included in the range of embodiments are at least the following features, and the capability of a CAD system to generate adder implementations that have various combinations of these features.
An adder structure having two or more of the preceding features.
Further features that various embodiments have in various combinations are as follows.
In various embodiments, synthesis creates the entire adder as one block, in a hierarchical structure that has blocks within blocks. For example, if instructed to implement a 32-bit adder, the CAD tool 804 creates all the block sizes that are used to create a carry skip version of the adder. In some embodiments, the CAD tool 804 explores trade-offs; for example, the bigger the block size, the longer it takes to create the group generate and group propagate signals. Returning to the example of a 32-bit adder, the CAD tool 804 could split the design into four groups of eight or eight groups of four, analyze the critical path of each, and then select whichever of the two possibilities is better for the timing of the carry. Similarly, the CAD tool 804 could determine timing for a four-bit ripple carry adder and compare it with timing for a four-bit carry skip adder. Such comparisons can be performed for various stages of an adder, with various combinations of block sizes.
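The comparison between four groups of eight and eight groups of four can be sketched with a simple delay model. Everything in the model is an assumption made for illustration: the worst-case carry ripples through the first block, skips the middle blocks, and ripples through the last block; `t_ripple`, `t_skip`, and `t_group` are hypothetical timing parameters; and the numbers used below are invented, not device data.

```python
def uniform_skip_delay(width, k, t_ripple, t_skip, t_group):
    """Worst-case carry delay for width bits split into blocks of k bits.

    Assumed model: the carry ripples through the first block, skips each
    middle block, then ripples through the last block. A skip cannot start
    before the group signals are valid, which takes t_group(k).
    """
    if width % k:
        raise ValueError("width must be a multiple of k")
    blocks = width // k
    first = max(k * t_ripple, t_group(k))
    return first + max(blocks - 2, 0) * t_skip + k * t_ripple


def pick_block_size(width, candidates, **timing):
    """Return (best_k, delay) for the fastest candidate block size."""
    scored = ((k, uniform_skip_delay(width, k, **timing)) for k in candidates)
    return min(scored, key=lambda kd: kd[1])


# Hypothetical timing in integer picoseconds.
TIMING = dict(t_ripple=100, t_skip=600, t_group=lambda k: 500 + 50 * k)
```

With these invented numbers, `uniform_skip_delay(32, 8, **TIMING)` evaluates to 2900 and `uniform_skip_delay(32, 4, **TIMING)` to 4700, so `pick_block_size(32, [4, 8], **TIMING)` selects the four groups of eight. Different timing values can reverse the choice, which is why the comparison is made per design.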
It has been found that, as the width of the carry skip adder increases, the time it takes to compute the carry scales sub-linearly. By comparison, along the critical path of a ripple carry adder, the time it takes to compute the carry scales linearly with the width of the adder. Accordingly, it has been found that, below a certain bit width, a ripple carry adder is fastest. Such a bit width could be used as a threshold value in the CAD tool 804. Instructed to implement an adder of a bit width below or equal to the threshold value, the CAD tool 804 could implement a ripple carry adder. Above the threshold, the CAD tool 804 can implement an adder that begins and ends with a ripple carry adder, i.e., one ripple carry adder for the lower bits and another ripple carry adder for the upper bits, and has a carry skip adder, or multiple carry skip adder blocks of various block sizes, for the middle bits.
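The contrast between linear and sub-linear scaling can be illustrated with a textbook-style delay model. The model is an assumption made for illustration and is not taken from the embodiments: it uses a single uniform block size $k$ (the embodiments use variable block sizes), a per-bit ripple delay $t_r$, and a per-block skip delay $t_s$, and it ignores the time needed to form the group signals.

```latex
\begin{align*}
T_{\mathrm{RCA}}(n) &= n\,t_r \\
T_{\mathrm{skip}}(n,k) &\approx 2k\,t_r + \left(\frac{n}{k}-2\right)t_s \\
\frac{\partial T_{\mathrm{skip}}}{\partial k} = 0
  &\;\Rightarrow\; k^{*} = \sqrt{\frac{n\,t_s}{2\,t_r}} \\
T_{\mathrm{skip}}(n,k^{*}) &\approx 2\sqrt{2\,n\,t_r\,t_s} - 2\,t_s
\end{align*}
```

Under these assumptions the ripple carry delay grows as $n$ while the carry skip delay grows as $\sqrt{n}$. The model also gives $k^{*} \ge n/2$ whenever $n \le 2\,t_s/t_r$, meaning there are no middle blocks left to skip, which is consistent with using a plain ripple carry adder below a threshold width. An actual threshold would come from measured device timing rather than from this simplified model.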
At the beginning, the CAD tool 804 can start with a low block size, for example a block size of two. Beyond a further threshold, it makes sense, analytically, to increase the block size for the next block(s), because the signals of the larger block still arrive no later than the carry rippling along the critical path. This is what is meant by hiding the general routing delay, in various embodiments. The delay for the carry generate and carry propagate signals of a given carry skip adder block is compared to the delay along the critical path of the carry for the assembled adder, and an acceptable block size for that carry skip adder block (and a sub-critical delay for the block carry generate and block carry propagate signals) is then determined based on this comparison.
At some point, for example about midway through the adder, it is possible that adding a large block would create a new critical path to generate sum bits. Adding a smaller block, which takes less delay to generate the later or last sum bits of the adder, avoids this new critical path. The CAD tool 804 could proceed in this direction, generating smaller block sizes towards the more significant bits of the adder and continuing to reduce the block size, because there is less delay that can be masked by the end of the ripple. The final bits of the adder could then be implemented with another ripple carry adder, which would be faster than another carry skip adder block.
Some embodiments of the CAD tool 804 optimize block sizes of an adder implemented with variable block sizes by balancing the delay through ripple in the carry chain and delay in the block carry generate and block carry propagate signals. Using larger block sizes means fewer stages of ripple in the critical path of the carry, which speeds up the carry propagation but makes the sum generation slower.
One embodiment of the CAD tool 804 looks at each bit of the adder and determines how to compute the sum for the next group of bits, e.g., one bit at a time, two bits at a time, three or four bits at a time, etc. Two factors go into the decision. One is that the delay accumulated thus far on the critical path for the carry must be long enough for the group generate signal to be ready before the carry arrives. The other is the generation of the sum bits, taking into account ripple through a link that passes through general-purpose routing. It is acceptable to make some signals slower because they are not in the critical path, and that dictates how big a block size may be. There is an outward constraint and an input constraint. Towards the more significant bit end of an adder, the sum bits might be slowed down and become the critical path. From an algorithmic point of view, one determination is whether generating a block creates a new critical path; if so, the tool tries a smaller block.
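The block-by-block decision described above can be sketched as a greedy procedure. This is one possible reading, not the CAD tool's actual algorithm. The timing model is an assumption: `t_ripple` is the per-bit delay on the hardened carry chain, `t_skip` is the delay of one block skip through soft logic and routing, `t_group(k)` is the time for a k-bit block's group signals to become valid, and `budget` is a target delay. All names and numbers are hypothetical.

```python
def plan_blocks(width, budget, first_rca, t_ripple, t_skip, t_group):
    """Greedy block sizing, LSB to MSB, for one target delay (budget).

    Returns the list of block sizes, or None if the budget is too tight.
    """
    if first_rca >= width:
        return [width] if width * t_ripple <= budget else None
    if first_rca * t_ripple > budget:
        return None
    sizes = [first_rca]                # leading ripple carry block
    arrival = first_rca * t_ripple     # when the carry reaches the next block
    remaining = width - first_rca
    while remaining:
        best = 0
        for k in range(1, remaining + 1):
            # Input constraint: group signals are valid before the carry
            # arrives. A block that reaches the MSB needs no skip logic
            # (it is the trailing ripple carry block), so it is exempt.
            hidden = t_group(k) <= arrival or k == remaining
            # Output constraint: the block's last sum bit meets the budget.
            on_time = arrival + k * t_ripple <= budget
            if hidden and on_time:
                best = k
        if best == 0:
            return None                # any block here is a new critical path
        sizes.append(best)
        remaining -= best
        arrival += t_skip
    return sizes


def best_plan(width, step, **model):
    """Raise the budget in fixed steps until the greedy plan fits."""
    budget = step
    while True:
        sizes = plan_blocks(width, budget, **model)
        if sizes is not None:
            return budget, sizes
        budget += step


# Hypothetical timing in integer picoseconds.
MODEL = dict(first_rca=2, t_ripple=100, t_skip=300, t_group=lambda k: 100 * k)
```

With these invented numbers, `best_plan(32, 100, **MODEL)` returns `(1900, [2, 2, 5, 8, 8, 5, 2])`, compared with 3200 for a 32-bit ripple carry adder in the same model. The block sizes grow while the input constraint is the limit and shrink once the output constraint takes over, matching the shape described in the text.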
Some portions of the detailed descriptions above are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
The present invention also relates to apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.
A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium includes read only memory (“ROM”); random access memory (“RAM”); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); etc.
Whereas many alterations and modifications of the present invention will no doubt become apparent to a person of ordinary skill in the art after having read the foregoing description, it is to be understood that any particular embodiment shown and described by way of illustration is in no way intended to be considered limiting. Therefore, references to details of various embodiments are not intended to limit the scope of the claims which in themselves recite only those features regarded as essential to the invention.
This application claims benefit of priority from U.S. Provisional Application No. 63/144,875, titled DYNAMIC BLOCK SIZE CARRY-SKIP ADDER CONSTRUCTION ON FPGAS BY COMBINING RIPPLE CARRY ADDERS WITH ROUTABLE PROPAGATE/GENERATE SIGNALS and filed Feb. 2, 2021, which is hereby incorporated by reference.
Number | Date | Country
---|---|---
63144875 | Feb 2021 | US