1. Field of the Invention
The present invention pertains generally to array multipliers, and more particularly to a Booth-encoded array multiplier architecture wherein low transition probability partial-products are generated, and the adder array is re-arranged according to the partial-products' signal transition probabilities.
2. Description of the Background Art
Multiplication is a ubiquitous operation in digital signal processing (DSP) applications. The well-known modified Booth-encoding algorithm reduces the number of partial products that must be added and is widely used in VLSI implementations of multiplication. Referring to
The radix-4 Booth encoding algorithm is commonly used in DSP applications. For multiplication XY, the radix-4 Booth-encoding algorithm encodes an N-bit two's complement number Y, one bit-pair at a time, into a set of signed-digits
according to Table 1. In
is the most-significant digit (MSD). To reduce the switching activity, and hence the power dissipation, in the partial-product generators and the adders, it is understood to be preferable to encode both a string of zeros and a string of ones as +0, as indicated by the bold +0 entries in Table 1. We define such a Booth-encoding method as “+0 Booth-encoding”. See, for example, C. J. Nicol and P. Larsson, “Low power multiplication for FIR filters,” in Proceedings of International Symposium on Low Power Electronics and Design, August 1997, pp. 76–79, incorporated herein by reference.
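As an illustration of the +0 Booth-encoding described above, the following is a minimal Python sketch, not the encoder circuit of the invention. It assumes the usual radix-4 recoding rule (digit value = −2·y[2i+1] + y[2i] + y[2i−1], with y[−1] = 0); the function names and the (negate, magnitude) digit representation are hypothetical choices made for this example. The only point it demonstrates is that a zero digit never asserts the negate signal, whether it arises from a string of zeros or a string of ones.

```python
def booth_radix4_plus0(y, n_bits):
    """Encode an n_bits-wide two's complement integer y into radix-4 Booth digits.

    Returns a list of (negate, magnitude) pairs, least-significant digit first.
    The (negate, magnitude) representation is an assumption of this sketch.
    """
    if n_bits % 2 != 0:
        raise ValueError("n_bits must be even for radix-4 encoding")
    if not -(1 << (n_bits - 1)) <= y < (1 << (n_bits - 1)):
        raise ValueError("y does not fit in n_bits")
    # Python's >> on negative integers is arithmetic, so this yields
    # the two's complement bits of y.
    bits = [(y >> i) & 1 for i in range(n_bits)]
    digits = []
    prev = 0  # implicit bit y[-1] = 0
    for i in range(0, n_bits, 2):
        lo, hi = bits[i], bits[i + 1]
        value = -2 * hi + lo + prev
        # "+0" rule: a zero digit is always emitted with negate = 0,
        # whether it came from the pattern 000 or from 111.
        negate = 1 if value < 0 else 0
        digits.append((negate, abs(value)))
        prev = hi
    return digits


def booth_value(digits):
    """Reconstruct the integer represented by a list of Booth digits."""
    return sum((-m if neg else m) * 4 ** i for i, (neg, m) in enumerate(digits))


# A small negative number: the two most-significant digits are both +0.
print(booth_radix4_plus0(-3, 8))  # [(0, 1), (1, 1), (0, 0), (0, 0)]
```

For the small-magnitude input −3, the sign-extension bits (a string of ones) produce +0 digits in the two most-significant positions, which is the property the later sections rely on.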
The partial-product generator 14 produces a partial-product Qi (generated using a simple shift and complement) according to multiplicand X and a signed-digit Si. In
Referring to
As can be seen, the well-known modified Booth-encoding algorithm reduces the number of partial-products and is widely used in multipliers for DSP applications. The use of +0 Booth-encoding can reduce the transition probability of the partial-products. However, the switching of the partial products causes signal transitions in the partial-product adder array, and spurious transitions and logic races that flow through the adder array are major sources of power dissipation in multipliers. Accordingly, there is a need for an array configuration where unnecessary switching activity in the array is reduced.
In many DSP applications, the magnitude of a digital signal does not always occupy the entire extent of its word-length. For a two's complement number Y with small magnitude, the most-significant bits (MSBs) are strings of zeros or ones, which are Booth-encoded as +0. This implies that the corresponding most-significant-digit (MSD) partial-products have a high probability of being zero and a low switching probability. However, referring to the standard 8×8 Booth-encoded carry-save array multiplier 50 in
In accordance with the present invention, the adder array in an array multiplier should be configured according to a probabilistic property of the partial-products, so that the switching activity in the array is reduced. In other words, the partial-products are added according to their transition probabilities. This “reorganized” partial-product addition arrangement has all the advantages of a traditional adder array but with reduced power dissipation. The architecture of the present invention recognizes a useful feature of the signed-digit encoding property of the +0 Booth-encoding algorithm and uses it to exploit the high dynamic range property typical of many signals in DSP applications. Various addition operations and styles can be employed in the present invention, including but not limited to carry-save adders, 4-to-2 compressors, dual-carry-save arrays, and tree-style adder arrays.
By way of example, and not of limitation, for DSP applications a carry-save array structure according to the present invention is configured to add the partial-products sequentially starting with the MSD partial-product (e.g., see
An advantage of an adder array configured according to the present invention is that its organization opposes the carry-propagation direction. Therefore, few signals need to propagate through the entire adder array. Another advantage is that such a structure reduces the number of long signal paths and hence reduces the switching activity in the adder array. Further advantages of the invention will be brought out in the following portions of the specification, wherein the detailed description is for the purpose of fully disclosing preferred embodiments of the invention without placing limitations thereon.
The invention will be more fully understood by reference to the following drawings which are for illustrative purposes only:
The power dissipation of a high-speed CMOS circuit is dominated by the dynamic switching power, which is proportional to the transition probability in the circuit. For a signal s, we denote by p(s) the probability of s being non-zero and by pΔ(s) its transition probability. See Qing Wu, M. Pedram, and Xunwei Wu, “A note on the relationship between signal probability and switching activity,” in Proc. Asia and South Pacific Design Automation Conf., 1997, pp. 117–120, incorporated herein by reference. For random binary signals,
pΔ(s)=2p(s)(1−p(s)).
Since pΔ(s) achieves its maximum at p(s)=0.5, pΔ(s) is reduced if p(s) is skewed away from 0.5. For p(s)≤0.5, pΔ(s) increases monotonically with p(s), so we seek to reduce p(s).
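The relationship between signal probability and transition probability can be tabulated with a few lines of Python. This is only a sketch of the formula pΔ(s)=2p(s)(1−p(s)) as stated above for random binary signals; the function name and the sample probabilities are hypothetical.

```python
def transition_probability(p):
    """Return 2*p*(1-p), the transition probability of a random binary
    signal whose probability of being non-zero is p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return 2.0 * p * (1.0 - p)


# Hypothetical sample points: the value peaks at p = 0.5 and falls
# toward zero as p is skewed toward 0.
for p in (0.5, 0.25, 0.0):
    print(p, transition_probability(p))
```

The three sample points give 0.5, 0.375, and 0.0 respectively, showing how skewing p(s) below 0.5 lowers the transition probability.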
A transition at a circuit node may cause glitches in the combinational circuits fed by the node. The greater the logic depth, the more capacitance is affected by glitching. To reduce spurious transitions, it is desirable to assign a signal that has a high switching probability to circuits having a short logic depth. The switching activity in an adder array is related to the character of its inputs, the partial products. This relationship forms the intuition behind the proposed structure; it is reflected in two aspects as follows:
First, the switching of a partial-product causes spurious transitions and logic races that flow through the adder array. We seek to assign those partial-products having high switching probability a short signal path. In a carry-save array multiplier, for example, the partial-product first added affects a long logic path in the adder array. Therefore, it is desirable that this partial-product have a low switching probability. This leads to a carry-save array structure that adds the partial-products sequentially in the order of non-decreasing pΔ(Qi).
Second, the one-probability of the partial-products (the probability that a signal is non-zero) affects the transition probability in the adder array. Those skilled in the art will recognize that the sum signals in the adder array have a one-probability close to 0.5 and that the one-probability of a carry signal is given by:
p(C1)=p(Qm)p(Qn)
where Qm and Qn are the two partial-products added in the first adder row, and
p(Ck)=0.5p(C_{k−1})+0.5p(Qi), for k>1,
where Ck is a carry output of the k-th adder row, and Qi is a partial-product added into the k-th adder row. (As shown in
The above two observations give us a guideline on the ordering of the partial-product inputs in an adder array to reduce power dissipation.
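The carry-probability recurrence in the second observation can be evaluated directly. The sketch below assumes a simple list representation in which the first two entries are the partial-products added in the first adder row and each later entry is the partial-product added in the next row; the function name and the sample probabilities are hypothetical and are not measured data.

```python
def carry_probabilities(p_q):
    """Return [p(C1), p(C2), ...] for partial-product one-probabilities p_q.

    p(C1) = p_q[0] * p_q[1]
    p(Ck) = 0.5 * p(C_{k-1}) + 0.5 * p_q[k]   for k > 1
    """
    if len(p_q) < 2:
        raise ValueError("at least two partial-products are required")
    p_c = [p_q[0] * p_q[1]]
    for p in p_q[2:]:
        p_c.append(0.5 * p_c[-1] + 0.5 * p)
    return p_c


# Hypothetical inputs: four fully random partial-products versus a case
# in which the first three added have a low one-probability.
print(carry_probabilities([0.5, 0.5, 0.5, 0.5]))  # [0.25, 0.375, 0.4375]
print(carry_probabilities([0.1, 0.1, 0.1, 0.5]))
```

With these sample values, every carry one-probability in the second case is lower than the corresponding value in the first, which by the formula for pΔ(s) also means a lower carry transition probability.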
It will be appreciated that, in many DSP applications, the magnitude of a digital signal does not always occupy the entire dynamic range of its wordlength. For example, the coefficients of an adaptive filter might vary over a large range during a training period, but only have a small magnitude once converged. Consider how such a signal property affects the partial-product statistics in a Booth-encoded multiplier. It is known that
p(Qi)=p(X)p(Si)
and p(Qi) is reduced as p(Si) decreases. For a two's complement number that has a smaller magnitude than its word-length, the most-significant bits (MSBs) are repeated sign-extension bits, which are strings of zeros or ones. These signal patterns are Booth-encoded as +0. This implies that the MSD of the Booth-encoder output has low p(Si) and the corresponding MSD partial-product has a low p(Qi).
For a random signal X, p(X)=0.5; therefore, p(Qi)=p(Si)p(X)≦0.5. Since p(Qi)≦0.5, low p(Qi) implies reduced pΔ(Qi). Also, for a digital signal Y that varies in a small range, its MSBs switch less frequently, hence, so do the corresponding Booth signed-digit and the partial-product. Therefore, the MSD partial-products have low pΔ(Qi).
From the foregoing, we now know that the MSD partial-products have both a small p(Qi) and a small pΔ(Qi). Combining this fact and our observations on the transition activity in a carry-save array, we conclude for purposes of the present invention that it is desirable to first (not last) add the MSD partial-products in a carry-save array for reduced circuit switching. Accordingly, the present invention comprises a carry-save array structure that adds the partial-products sequentially with decreasing wordlength, starting with the MSD partial-product.
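The claim that MSD digits of a small-magnitude input are rarely non-zero can be checked by exhaustive enumeration. The sketch below assumes 8-bit operands, the standard radix-4 recoding rule (digit value = −2·y[2i+1] + y[2i] + y[2i−1]), and a hypothetical "small" range of −8 to 7; the function names are illustrative only. It estimates p(Si) per digit position and then applies p(Qi)=p(X)p(Si) with p(X)=0.5.

```python
def booth_digits(y, n_bits):
    """Radix-4 Booth digit values (least-significant digit first) of y."""
    bits = [(y >> i) & 1 for i in range(n_bits)]  # two's complement bits
    prev, out = 0, []
    for i in range(0, n_bits, 2):
        out.append(-2 * bits[i + 1] + bits[i] + prev)
        prev = bits[i + 1]
    return out


def nonzero_digit_probability(values, n_bits):
    """Fraction of inputs for which each Booth digit is non-zero."""
    values = list(values)
    counts = [0] * (n_bits // 2)
    for y in values:
        for i, d in enumerate(booth_digits(y, n_bits)):
            if d != 0:
                counts[i] += 1
    return [c / len(values) for c in counts]


# Assumed ranges: Y confined to [-8, 7] versus Y over the full 8-bit range.
p_s_small = nonzero_digit_probability(range(-8, 8), 8)
p_s_full = nonzero_digit_probability(range(-128, 128), 8)
p_q_small = [0.5 * p for p in p_s_small]  # p(Qi) = p(X) p(Si), p(X) = 0.5

print(p_s_small)  # [0.75, 0.75, 0.0, 0.0]
print(p_s_full)   # [0.75, 0.75, 0.75, 0.75]
```

Under these assumptions the two most-significant digits are never non-zero for the small-range input, so the corresponding partial-products are always zero, whereas every digit is non-zero three quarters of the time for a full-range input.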
Referring now to
Note that a carry-save array multiplier according to the present invention is particularly useful for XY when a Booth-encoded input Y is close to zero. In an application where Y is close to its maximum value, the low-power multiplier of the present invention can also be applied with some modifications. Assume Y is an N-bit number {y_{N−1}, . . . , y_0} that represents integer values between −2^{N−1} and 2^{N−1}−1. Define
(a) Y is positive and close to 2^{N−1}−1. We can rewrite Y=(2^{N−1}−1)−Ŷ, where Ŷ is a small positive number. The original multiplication XY becomes
XY=(2^{N−1}−1)X−XŶ=2^{N−1}X−X(1+Ŷ)=2^{N−1}X+X\overline{Ŷ}
where \overline{Ŷ} denotes −(1+Ŷ), and 2^{N−1}X is simply a shift of X, which can be accumulated after the multiplication (according to the present invention) that computes X\overline{Ŷ}.
(b) Y is negative and close to −2^{N−1}. We can rewrite Y=(−2^{N−1}−1)−Ŷ, where Ŷ is a small negative number. Similarly,
XY=(−2^{N−1}−1)X−XŶ=−2^{N−1}X−X(1+Ŷ)=−2^{N−1}X+X\overline{Ŷ}
and again the multiplier of the present invention can be applied for low power.
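The rewriting in cases (a) and (b) can be verified numerically. The sketch below rests on one assumption that should be stated plainly: the final term of each equation is taken to be X multiplied by the bitwise (one's) complement of Ŷ, which in two's complement arithmetic equals −(1+Ŷ). The function names are hypothetical, and Python's unbounded integers stand in for fixed-width hardware words.

```python
def multiply_near_max(x, y, n_bits):
    """Compute x*y for y close to 2**(n_bits-1) - 1 as a shift of x plus a
    product with a small-magnitude operand (case (a))."""
    y_hat = ((1 << (n_bits - 1)) - 1) - y   # small non-negative offset
    y_hat_bar = ~y_hat                      # assumed: complement, = -(1 + y_hat)
    return (x << (n_bits - 1)) + x * y_hat_bar


def multiply_near_min(x, y, n_bits):
    """Compute x*y for y close to -2**(n_bits-1) in the same way (case (b))."""
    y_hat = (-(1 << (n_bits - 1)) - 1) - y  # small negative offset
    y_hat_bar = ~y_hat                      # small non-negative number
    return -(x << (n_bits - 1)) + x * y_hat_bar


# Hypothetical 8-bit operands.
print(multiply_near_max(37, 125, 8) == 37 * 125)    # True
print(multiply_near_min(37, -126, 8) == 37 * -126)  # True
```

In both cases the operand actually multiplied has a small magnitude, so its most-significant bits are a string of ones or zeros and the corresponding Booth digits are +0.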
We also observe that the present invention is not limited to the use of carry-save adder arrays. Other partial-product summation techniques can be applied, for example, the use of 4-to-2 compressors and dual-carry-save arrays. In all cases, for low power dissipation, the partial-products should be added in order of increasing signal transition probability, rather than in order of increasing word-length as is usually the case for conventional methods. At the same time, the +0 Booth-encoding should be applied together with the array re-arrangement for further power reduction.
Several multiplier circuits were implemented in a 0.5 μm technology using Synopsys Design Compiler™ for synthesis and Cadence™ for place and route. We used a set of well-characterized leaf-cells with state-dependent power dissipation, and the interconnect parasitic information was extracted from the layout. The circuits were simulated at a clock speed of 10 MHz in Verilog™, the toggle rates of all of the nodes were calculated, and the power dissipation was reported using Synopsys Design Power™. Our multiplier implementation did not include the final carry-propagation adder because its power characteristic has been well studied and it is not the focus of this invention.
We first simulated the power dissipation of the multipliers when the Booth-encoded input signal had a dynamic range smaller than its word-length. The other input of the multiplier was a random signal. The simulation results are summarized in Table 2 for a conventional multiplier (std.) and a multiplier according to the invention (inv.). As expected, more power savings are achieved when the input signal has a small magnitude. This corresponds to the case when the MSD partial-products are zero, and the first few adder rows in the proposed structure do not switch at all. (By contrast, all the adder cells in the standard carry-save array switch due to the transitions in the LSD partial-products). Table 2 shows that the invention actually has reduced power dissipation for signals of all dynamic ranges. Table 3 shows that the architecture of the present invention does not require extra circuits or routing overhead.
The invention was also applied in a multiplexed FIR filter running at a clock speed of 10 MHz. The filter coefficients were Booth-encoded and the other multiplier input was a random signal. The power dissipation for various multiplier sizes is summarized in Table 3 and, as can be seen, a power reduction of over 18% has been achieved.
Accordingly, the present invention comprises a low-power array multiplier which is preferably Booth-encoded for many applications. Its architecture is suitable for DSP applications where one of the input signals has a large dynamic range. We take advantage of the encoding property of a Booth encoder and modify the standard adder array to reduce the total switching activity inside the array. As a result, power dissipation is reduced.
Although the description above contains many specificities, these should not be construed as limiting the scope of the invention but as merely providing illustrations of some of the presently preferred embodiments of this invention. Thus the scope of this invention should be determined by the appended claims and their legal equivalents. Therefore, it will be appreciated that the scope of the present invention fully encompasses other embodiments which may become obvious to those skilled in the art, and that the scope of the present invention is accordingly to be limited by nothing other than the appended claims, in which reference to an element in the singular is not intended to mean “one and only one” unless explicitly so stated, but rather “one or more.” All structural, chemical, and functional equivalents to the elements of the above-described preferred embodiment that are known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the present claims. Moreover, it is not necessary for a device or method to address each and every problem sought to be solved by the present invention, for it to be encompassed by the present claims. Furthermore, no element, component, or method step in the present disclosure is intended to be dedicated to the public regardless of whether the element, component, or method step is explicitly recited in the claims. No claim element herein is to be construed under the provisions of 35 U.S.C. 112, sixth paragraph, unless the element is expressly recited using the phrase “means for.”
This application claims priority from U.S. provisional application Ser. No. 60/328,365 filed on Oct. 9, 2001, incorporated herein by reference.
This invention was made with Government support under Grant No. MIP-9632698, awarded by the National Science Foundation. The Government has certain rights in this invention.
Number | Date | Country
---|---|---
20030120695 A1 | Jun 2003 | US

Number | Date | Country
---|---|---
60328365 | Oct 2001 | US