The present invention relates to semiconductor integrated circuits and more specifically to an arithmetic multiplier.
Semiconductor integrated circuits often include large trees of combinational logic gates for performing Boolean and arithmetic functions, such as a multiplication function. In a multiplication function, a binary multiplicand is multiplied by a binary multiplier to produce a binary product. A typical multiplier circuit includes a partial product generator, an adder tree and a final adder. In a typical partial product generator, for each bit of the multiplier, the multiplicand is shifted the appropriate number of bits and multiplied by the value of the digit in that bit of the multiplier to obtain a partial product. The partial products are then added by the adder tree and the final adder to obtain a final product. If the multiplier or the multiplicand has a large number of bits, the partial products addition stage can become very large and complex. Also, the addition of a large number of partial products can be slow.
In addition, as the sizes of transistors on integrated circuits continue to become smaller with new fabrication technologies, the voltage supply levels that drive the transistors are also reduced to prevent damage to the small transistors. This limits the number of transistors that can be connected in series with one another to perform the logical functions in the partial products addition stage, which limits the maximum number of inputs to each logic gate in this stage. The maximum number of inputs is based on the magnitude of the supply voltage and the voltage drop across each transistor in the gate. For example, a given semiconductor technology may limit the number of inputs to a logic AND gate to three. This significantly increases the complexity of arithmetic circuits having a large number of input bits since small groups of bits must be combined in multiple logic levels.
The complexity of a logic tree significantly increases with the number of input bits and with more complex logical functions, such as those performed in signed and unsigned binary multiplication. Large logic trees therefore consume large areas on integrated circuits, consume large amounts of power and can have long critical path propagation delays.
Simplified multiplier circuits are therefore desired for performing multiple-bit binary multiplication functions with lower complexity and faster computational speed.
One embodiment of the present invention is directed to a method for multiplying a multiplicand by a multiplier. The method includes generating a plurality of partial products, wherein each partial product has a plurality of bits having respective binary weights and wherein each bit can have a first or second logic state. A first set of multiple-bit columns is formed from bits of the plurality of partial products, wherein the bits in each column of the first set have the same binary weight. Each multiple-bit column in the first set is encoded into a respective modified partial product, which represents a number of bits in the column having the first logic state.
Another embodiment of the present invention is directed to a method of adding a plurality of partial products, wherein each partial product has a plurality of bits having respective binary weights and wherein each bit can have a first or second logic state. The method includes forming a first set of multiple-bit columns from bits of the plurality of partial products, wherein the bits in each column of the first set have the same binary weight. Each multiple-bit column in the first set is encoded into a respective modified partial product, which represents a number of bits in the column having the first logic state.
Another embodiment of the present invention is directed to a multiplier circuit having a partial products generator and an adder. The partial products generator has a multiplicand input, a multiplier input and a plurality of partial product outputs. Each partial product output has a plurality of bits having respective binary weights and which can have a first or second logic state. The adder forms a first set of multiple-bit columns from bits of the plurality of partial product outputs, wherein the bits in each column of the first set have the same binary weight. The adder encodes each column in the first set into a respective modified partial product, which represents a number of bits in the column having the first logic state.
In one embodiment of the present invention, a method and apparatus are provided for multiplying binary words using a Radix-4 modified Booth recoding algorithm in which columns of the partial products are recoded to binary numbers in parallel using shifters. Each bit column of the original binary partial products forms a new partial product, which represents the number of logic “1's” in that bit column. This process can be iterated until the number of partial products is reduced to a desired number, which can be added quickly to form the final product. The recoding of each bit column can be performed by packing or shifting all the logic 1's to one end or the other in the bit column and detecting the number of 1's by the logic pattern formed by the packed bit column.
Although embodiments of the present invention are described within the context of a Radix-4 modified Booth recoding algorithm, the recoding and logic reduction functions that are used to add the partial products can be used in any other multiplication algorithm in which multiple partial products are generated. Other Radix values can be used, and Booth recoding is not necessary. Radix-4 multiplication and modified Booth recoding are simply examples of methods that can be used to reduce the number of partial products that are generated.
1. Use of Radix-4 Binary Multiplication
The number of partial products can be reduced by using a higher Radix.
2. Booth Recoding
Booth recoding is based on the simple observation that a binary “1111” is equal to a binary “10000” minus a binary “1”. Thus if the multiplier has a string of 1's, such as 1111, four partial products (one for each bit) are not necessary. Long strings of 1's can be skipped by substituting the equivalent value of the multiplier, such as “10000” and “−1” and then generating only two partial products. This is what is referred to as the Booth Algorithm and has been implemented in software multipliers (multiplier implemented as a software function) and early hardware multipliers, such as the Intel 8087 and 80287 integrated circuit chips.
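The observation above can be checked numerically. The following Python sketch is illustrative only and is not part of the described embodiments; the operand value A is an arbitrary assumption chosen for the example.

```python
# Illustrative check of the observation behind Booth recoding.
A = 45  # hypothetical operand, chosen only for illustration

# Binary 1111 equals binary 10000 minus binary 1.
assert 0b1111 == 0b10000 - 0b1

# Longhand form: one shifted copy of A for each 1 in the string 1111.
longhand = (A << 3) + (A << 2) + (A << 1) + A

# Booth form: one shifted copy (A times 10000) and one subtraction (A times -1).
booth = (A << 4) - A
```

Both forms give the same value, but the second needs only two partial products instead of four.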
As serial multipliers were replaced by parallel multipliers, the Booth Algorithm was modified to be more efficient. With modified Booth recoding, the multiplier is recoded to eliminate more complex calculations, such as multiplying by three. For example, the multiplier can be recoded to multiply by ±1, ±2 or 0. A multiplication by 3 can be recoded into a multiplication by 4 added to a multiplication by −1 (e.g., 4A+−1A=3A, where 4A is simply A shifted to the left by two bits). This recoding reduces the number and complexity of the partial products since the values of ±1A, ±2A and 0A can be calculated by simple inversion and shifting.
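As a minimal sketch of why these multiples are inexpensive, the following Python fragment builds 0, ±A and ±2A using only a left shift and two's-complement negation (inversion followed by adding 1). The function names are hypothetical and serve only this illustration.

```python
def booth_multiples(a):
    """Return 0, +-A and +-2A built only from shifting and negation."""
    two_a = a << 1            # 2A: A shifted left by one bit
    neg_a = ~a + 1            # -A: invert every bit, then add 1
    neg_two_a = ~two_a + 1    # -2A: invert 2A, then add 1
    return {0: 0, 1: a, 2: two_a, -1: neg_a, -2: neg_two_a}


def times_three(a):
    """3A recoded as 4A + (-1A); 4A is A shifted left by two bits."""
    return (a << 2) + booth_multiples(a)[-1]
```

For example, times_three(13) adds 52 and −13. No multiplication by three is ever formed directly.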
The multiplier is divided into overlapping partial multipliers, wherein the multiplier bits are grouped in blocks of three, as shown in
The grouping starts from the least significant bit. In this example, there are two blocks, labeled 40 and 41. The first bit in block 40 is assumed to be “0” since it is the first block, and the second block 41 has a carry of “1”.
When scanning each group (partial multiplier) from right to left, if the bit pattern changes from 1 to 0, the algorithm adds +A to the partial product. If the bit pattern changes from 0 to 1, the algorithm adds −A to the partial product. This algorithm can then be used to generate a truth table for modified Booth recoding.
Table 1 is a truth table, which identifies the recoding of partial products for each 3-bit overlapping block (BITi+1, BITi, BITi−1) for modified Booth recoding.
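Table 1 is not reproduced in this text. The following Python sketch uses the standard Radix-4 modified Booth recoding table, which is assumed (not confirmed by this text) to correspond to Table 1; it is consistent with the statement elsewhere in this description that the segment “011” selects 2A. The names BOOTH_RADIX4 and booth_digit are hypothetical.

```python
# Standard Radix-4 modified Booth recoding, keyed by (BITi+1, BITi, BITi-1).
# Each value is the multiple of the multiplicand A selected for that block.
BOOTH_RADIX4 = {
    (0, 0, 0): 0,
    (0, 0, 1): +1,
    (0, 1, 0): +1,
    (0, 1, 1): +2,
    (1, 0, 0): -2,
    (1, 0, 1): -1,
    (1, 1, 0): -1,
    (1, 1, 1): 0,
}


def booth_digit(hi, mid, lo):
    """Look up the recoded digit for one 3-bit overlapping block."""
    return BOOTH_RADIX4[(hi, mid, lo)]
```

Every entry also satisfies the closed form digit = −2·BITi+1 + BITi + BITi−1, which is one way to see that only the multiples 0, ±1 and ±2 can occur.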
3. Shift and Recode Multiplier
Even with the use of a higher Radix and modified Booth recoding, the number and size of partial products that are generated when multiplying numbers having a greater number of bits can become large. Addition of these partial products can become complex and can consume a large area on the integrated circuit, particularly if the bits can be combined in only small groups. In one embodiment of the present invention, these partial products are added by a shift and recode method, which significantly reduces complexity and speeds computation time.
Partial products generator 56 receives the multiplicand from register 51 and the multiplier from register 52 and generates a plurality of partial products by multiplying the multiplicand by digits of the multiplier. Partial product adder 58 adds the partial products to generate a pair of carry-sum-like values, which are added together by final adder 60 to produce the product on output 56. In an alternative embodiment, the partial products are reduced to a single product in adder 58 such that final adder 60 is not used.
The partial products can be generated in any suitable manner. The number of partial products is reduced in one embodiment of the present invention by using a Radix-4 modified Booth algorithm. However, any other type of partial product generation algorithm can also be used in accordance with the present invention. For example, other Radix values can be used, and the partial products can be generated by other recoding algorithms or methods similar to traditional longhand multiplication, for example.
With a Radix-4 modified Booth recoding algorithm, the number of partial products is reduced by overlapping 3-bit segments of the multiplier in register 52. The segments of the multiplier are generated by concatenating a “0” on the right (least significant bit) side of the multiplier. The resultant digital word is then divided into 3-bit segments, wherein each segment overlaps its neighboring segment by one bit. For example with a 16-bit multiplier there will be eight segments, identified by bits 1:0, 3:1, 5:3, 7:5, 9:7, 11:9, 13:11, and 15:13. Each of these segments is then used to generate a partial product according to Table 1. For example, if a 3-bit segment is “011”, the partial product will be two times the multiplicand (2A), which is simply the multiplicand shifted left by one bit.
As described above, these partial products can be pre-computed. For example, if the following binary values:
Given these pre-computed values, the partial product generator can easily generate the actual partial products by selecting from the pre-computed values based on the 3-bit overlapping segments of the multiplier.
In the above example, the segments are 000 (including the concatenated “0”), 110, 011, 000, 010, 100, 001 and 000.
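The segmentation can be sketched in Python as follows. The function name booth_segments is hypothetical. The 16-bit value 0x091C used in the usage note is not quoted from this text; it was derived backward from the eight segments listed above so that the sketch reproduces that list.

```python
def booth_segments(multiplier, width=16):
    """Split a width-bit multiplier into overlapping 3-bit segments, LSB first."""
    # Concatenate a "0" on the right (least significant) side.
    padded = (multiplier & ((1 << width) - 1)) << 1
    segments = []
    # Step by two bits so each segment overlaps its neighbor by one bit.
    for i in range(0, width, 2):
        segments.append(format((padded >> i) & 0b111, "03b"))
    return segments
```

Calling booth_segments(0x091C) returns the eight strings 000, 110, 011, 000, 010, 100, 001 and 000, in that order.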
The value of the 3-bit multiplier segment determines which of the pre-computed partial products is coupled to multiplexer output 68. The resulting partial products 68 are shifted to the left by an appropriate number of bits, which is determined by the binary weight of the 3-bit multiplier segment on the multiplier. In the case of a 16-bit multiplier, nine partial products will be produced.
Because of the 3-bit overlapping segmentation of the multiplier, partial products 68 are shifted by two additional bits for each successive segment. A “1” is added to the least significant bit of the partial product when an inverted multiplexer input (−A or −2A) is selected. Additional bits are added conditionally at the most significant bit end of the partial products to take care of sign extension.
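A minimal arithmetic sketch of this generation step is given below. It assumes the standard Radix-4 modified Booth digit values and treats the multiplier as a signed two's-complement word, so it yields eight terms for a 16-bit multiplier. It uses Python integers, so the explicit “1” added after inversion and the sign-extension bits are handled implicitly rather than modeled, and the ninth partial product mentioned above is not modeled. The function name is hypothetical.

```python
def booth_partial_products(a, b, width=16):
    """Radix-4 modified Booth partial products of a * b (b is a signed width-bit word)."""
    padded = (b & ((1 << width) - 1)) << 1   # append the "0" on the right
    products = []
    for k in range(width // 2):
        seg = (padded >> (2 * k)) & 0b111
        hi, mid, lo = (seg >> 2) & 1, (seg >> 1) & 1, seg & 1
        digit = -2 * hi + mid + lo           # one of 0, +-1, +-2
        # Each successive segment is shifted left by two additional bits.
        products.append((digit * a) << (2 * k))
    return products
```

Summing the returned list gives the product a·b, which is the job of the partial product addition stage described next.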
Traditionally, the resulting partial products are added together with a full adder tree, such as a Wallace Tree, to reduce the number of partial products down from nine to two.
For multiply and accumulate operations, two additional sum and carry terms, 78 and 79, can be supplied by an accumulator for adding to the partial products 68. An accumulator allows multiply-accumulate operations to be done without having to wait for the final adder. This adds a further layer to tree 70. A final adder can be used to produce the final product from outputs 74 and 76.
In contrast to the full adder approach shown in
The “x's” in rows 100, 101 and 102 represent binary digits in the multiplication process, wherein each “x” can be either a “1” or a “0” depending on the values of the multiplicand and the multiplier. Rows 100, 101 and 102 are aligned horizontally such that the digits in each column 104 have the same binary weight.
In one embodiment of the present invention, each bit column of partial products 100-102 is recoded to a binary number representing a count of the 1's in the corresponding column. Each of these binary numbers becomes a new partial product 110.
For example, bit column 14 forms a binary word “1000001100” having three binary symbols with the value “1”. As described in more detail below, the number of binary 1's in each column is counted in parallel using shifters, for example. The count of three is converted to a binary value “0011”, which becomes one of the partial products 110, represented by arrow 112.
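The recoding of a single bit column can be sketched as follows. This Python fragment counts the 1's directly rather than with shifters, so it shows only the input-to-output relationship, not the circuit. The function name is hypothetical, and the output width is assumed to be the smallest number of bits that can hold a count from zero up to the column length.

```python
def recode_column(column):
    """Encode a bit column (string of '0'/'1') as the binary count of its 1's."""
    ones = column.count("1")
    width = len(column).bit_length()   # bits needed for counts 0..len(column)
    return format(ones, "0{}b".format(width))
```

With the column word from the example above, recode_column("1000001100") returns "0011".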
Bit column 15 has four 1's, therefore modified partial product 114 has the binary value “0100”. Each of the modified partial products 110 is arranged in
Modified partial products 110 can be visually rearranged to form four partial products 115-118, by simply “dropping” the bits in each column as shown by arrow 119. For example, the bits in modified partial product 112 are shown by box 120. The bits can be rearranged in a variety of other manners in alternative embodiments of the present invention.
The process of shifting and recoding is then iterated until the number of partial products is reduced to a desired number, such as two. For a 16-bit multiplier, this will take three stages in one embodiment of the present invention.
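The iteration can be sketched in Python under simplifying assumptions: the partial products are modeled as non-negative integers (sign handling is omitted), the column counts are computed directly rather than by shifters, and the bits of each count are “dropped” into new rows according to their binary weight. The function names are hypothetical.

```python
def recode_stage(rows):
    """One shift-and-recode stage: column counts become new, fewer rows."""
    width = max(r.bit_length() for r in rows)
    # Count the 1's in every bit column of the current rows.
    counts = [sum((r >> j) & 1 for r in rows) for j in range(width)]
    n_out = len(rows).bit_length()     # bits needed to hold the largest count
    # Bit i of the count for column j has binary weight j + i.
    return [sum(((c >> i) & 1) << (j + i) for j, c in enumerate(counts))
            for i in range(n_out)]


def reduce_to_pair(rows):
    """Iterate stages until at most two rows remain; also report row counts."""
    stages = [len(rows)]
    while len(rows) > 2:
        rows = recode_stage(rows)
        stages.append(len(rows))
    return rows, stages
```

Starting from eleven rows, this sketch passes through four and then three rows before reaching two, and the sum of the rows is unchanged at every stage.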
For example, column 16 has three bits “001” with one binary “1”, which are recoded into the binary value “01” to form a further modified partial product 131. Column 17 has four bits “0100” with one binary “1”, which are recoded into the binary value “001” to form a further modified partial product 132. The maximum number of bits in any column of the further modified partial products 130 is now three. The bits in each column can then be visually rearranged to form three partial products 133, 134 and 135. For example, the bits of partial products 130 in column 16 are “dropped” along arrow 136, as can be seen by box 137.
The three further modified partial products 133, 134 and 135 can then be shifted and recoded in a similar fashion to form two partial products. Then, a final adder, such as adder 60 shown in
Other implementations can also be used. For example, instead of using three stages (reductions from 11-to-4, 4-to-3, and 3-to-2), two 4-to-3 stages can be used in parallel, followed by a 6-to-3 stage and then a 3-to-2 stage. Also, one or more of these stages can be mixed with traditional full adder cells, since a full adder cell is fairly efficient in a 3-to-2 conversion.
4. Counting the Number of 1's in Each Bit Column
Any method can be used to count the number of “1's” in each bit column of a plurality of partial products. In one embodiment of the present invention the number of 1's is counted by shifting or “packing” the 1's in each column to one end of the column or the other and then detecting the bit position of the left-most (or right-most) “1”. This bit position represents the number of 1's in the binary word formed by each bit column of the partial products.
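The pack-then-detect idea can be sketched as follows. In this Python fragment, sorting the characters of the column word stands in for the packing circuit (it is an assumption made for brevity, not the shifter logic itself), and the count is recovered from the position of the left-most “1”. The function names are hypothetical.

```python
def pack_right(column):
    """Pack all 1's toward the right end of the word ('0' sorts before '1')."""
    return "".join(sorted(column))


def count_from_packed(packed):
    """Recover the number of 1's from the position of the left-most 1."""
    leftmost = packed.find("1")
    return 0 if leftmost < 0 else len(packed) - leftmost
```

For the word "1000001100", pack_right gives "0000000111", whose left-most 1 is three positions from the right end, so the count is three.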
A variety of methods can be used to shift the bits in each column.
At step 201, word 220 is divided into groups of adjacent bits to form a first level of multiple-bit groups 226. In this example, each of the groups 226 includes two or three bits. The number of bits in each group can be set based on convenience and on a maximum number of transistors that can be connected in series with one another in the technology in which the integrated circuit will be fabricated. Each group 226 can include any number of bits in alternative embodiments of the present invention.
For each group 226 in row 201, a packing circuit packs all of the 1's toward the right end, for example, of that group. This forms a modified first level group 228, shown in row 202, for each of the first level groups 226. For example, the left-most group 226 has a logic pattern “100”. Arrow 229 illustrates the shifting or packing of a “1” in the third bit position of group 226 to the first bit position in group 228.
Row 203 represents the output of a second logic level in the packing circuit. The packing circuit groups adjacent pairs of modified first level groups 228 into larger second-level groups 230. Each second level group 230 has five bits. Again, for each second level group 230, the packing circuit packs any of the bits in that group having a logic “1” state toward the right end of that group, and all bits having a logic “0” toward the left end of that group to form a modified second level group 232 in row 203.
The packing circuit groups the adjacent pair of modified second-level groups 232 into a larger third-level group 234. Row 204 represents the output from the third-level of logic in the packing circuit. Again, all bits having a “1” state are packed toward the right side of group 234.
After the packing operation is complete, bit column word 240 includes a first set 242 of contiguous bit positions having a logic high state, and a second set 243 of bit positions having a logic low state. With all of the logic high states packed to the right, the number of logic high states can be easily detected by detecting the bit position of the left-most “1” in modified data word 240 by an appropriate encoding circuit. In this example, modified data word 240 has three 1's, which is encoded into a binary “0011”, as shown by row 205. This binary value then forms one of the modified partial products, such as modified partial product 112 shown in
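The level-by-level packing can be sketched in Python as shown below. The group sizes (3, 2, 3, 2) and the input word "1000001100" are assumptions chosen to agree with the left-most group “100”, the five-bit second-level groups and the final count of three described above; the actual grouping in the figure may differ. Sorting again stands in for the shifter logic, and the function names are hypothetical.

```python
def pack_group(bits):
    """Pack the 1's of one group toward its right end."""
    return "".join(sorted(bits))


def hierarchical_pack(word, sizes=(3, 2, 3, 2)):
    """Pack small groups first, then merge adjacent pairs and pack again."""
    groups, pos = [], 0
    for size in sizes:                       # first-level groups
        groups.append(pack_group(word[pos:pos + size]))
        pos += size
    levels = [list(groups)]
    while len(groups) > 1:
        merged = [groups[i] + groups[i + 1] for i in range(0, len(groups) - 1, 2)]
        if len(groups) % 2:                  # carry an unpaired group forward
            merged.append(groups[-1])
        groups = [pack_group(g) for g in merged]
        levels.append(list(groups))
    return levels
```

For the assumed word, the three levels are 001/00/011/00, then 00001/00011, then 0000000111, and the left-most 1 of the final word yields the count of three.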
The process shown in
The remaining figures illustrate examples of circuits that can be used by the packing circuit to shift the 1's in each partial product bit column and encode its result into a binary word to form a modified partial product.
First stage outputs B9:B0 are arranged in two 5-bit groups 230 having bits B9:B5 and B4:B0, which are provided as inputs to multiplexers 270 and 271. Multiplexer 270 has three 5-bit inputs 272, which are coupled to bits B4:B0 unshifted, bits B4:B0 shifted to the right by one bit, and bits B4:B0 shifted to the right by two bits, respectively. Select input 273 controls selection between inputs 272 as a function of shift control signals decoded from bits B1 and B0. The selected input 272 is coupled to output 274 to provide a second stage output C4:C0. The shift control inputs are decoded from B1 and B0 according to the following logic equations, for example:
Similarly, multiplexer 271 includes three 5-bit data inputs 275, shift control inputs 276 and output 277. Data inputs 275 are coupled to B9:B5 unshifted, B9:B5 shifted to the right by one bit position, and B9:B5 shifted to the right by two bit positions, respectively. Shift control inputs 276 are decoded from bits B6 and B5 in a similar fashion as the decoding of bits B1 and B0 shown above. Multiplexer output 277 provides second stage outputs C9:C5.
Second stage outputs C9:C0 correspond to groups 232 of row 203 shown in
Output 284 provides 10 bits, D9:D0 having all binary 1's shifted to the right, similar to row 204 in
As discussed with respect to
Similar circuitry can be used in parallel with one another for counting the number of 1's in each bit column of partial products 100-102 to form modified partial products 110. Also, similar circuitry is used in each subsequent stage of the partial product reduction.
For example, similar circuitry is used to reduce the various columns in modified partial products 115-118 in
Reducing the three partial products 133-135 in
A shift and recode multiplier is therefore provided in which N M-bit partial products are considered as M N-bit new partial products, which are recoded to binary numbers in parallel using shifters, where N and M can have any integer value. Each bit column of the original binary partial products is recoded to form one of the new partial products. This process is iterated until a pair of carry-sum-like values is generated.
In the above embodiment, with two M-bit binary numbers A and B, M/2 original partial products are generated using Radix-4 modified Booth recoding. Assuming the proper alignment of these partial products, all of the 1's in each bit column of the partial products are packed to one end or the other using a shift and recode circuit as described above. This eliminates the need for a conventional Wallace Tree to sum the partial products. There are a maximum of M/2 bits in each bit column. Thus, the original M/2 partial products are converted into 2M partial products, each log2(M/2) bits wide (rounded up to the nearest integer). However, each of these 2M partial products has only a maximum of ((log2(M/2))−1) bits in each bit column. Using similar shift and recode circuits, the number of bits per bit column can be reduced until each column has two bits. Thus, the circuit sums the original M/2 partial products to a carry-sum pair as in a conventional multiplier.
As mentioned above, the circuit can be modified to accommodate a multiply-accumulate (MAC) operation. For a MAC operation, an accumulation result is kept in the format of a carry-sum pair. This pair of binary numbers is fed back to the multiplier as if they were part of the original M/2 partial products, similar to rows 101 and 102 shown in
With the above embodiments, partial products can be added together faster while using less hardware. Existing methods combine partial products using 3-to-2 full bit-adders. The above embodiments allow more adder reduction options, such as 3-to-2, 7-to-3, 15-to-4, 31-to-5, etc., and thus can add many partial products together more quickly. The difference becomes more pronounced as the lengths of the multipliers and multiplicands increase and thus involve more partial products.
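The reduction ratios listed above follow from the number of bits needed to represent a count of 1's. As a brief Python sketch (the function name is hypothetical), a column of n input bits can hold a count from 0 to n, which needs the bit length of n as its output width.

```python
def output_bits(n_inputs):
    """Bits needed to encode a count of 1's among n_inputs column bits."""
    return n_inputs.bit_length()


# The reduction options named in the text: 3-to-2, 7-to-3, 15-to-4, 31-to-5.
ratios = [(n, output_bits(n)) for n in (3, 7, 15, 31)]
```

The same rule gives the 11-to-4, 4-to-3 and 6-to-3 reductions mentioned earlier in this description.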
Although the present invention has been described with reference to preferred embodiments, workers skilled in the art will recognize that changes may be made in form and detail without departing from the spirit and scope of the invention. For example, bits can be shifted in any suitable manner. The logic states “1” and “0” are arbitrary and interchangeable symbols. The number of stages can be modified as desired or to accommodate any number of bits.