IBM® is a registered trademark of International Business Machines Corporation, Armonk, N.Y., U.S.A. Other names used herein may be registered trademarks, trademarks or product names of International Business Machines Corporation or other companies.
1. Field of the Invention
The present invention relates generally to the field of computer arithmetic, and more particularly to a mechanism for sign-extending a multiplication result that is aligned so as not to occupy the most significant bits of the actual multiplier implementation, without a specific sign extension step either prior to or following the multiplication.
2. Description of Background
A multi-format multiplier may be defined as a circuit whose outputs contain the arithmetic product of two input signals, one or more arithmetic products of parts of the input signals, or the sum of one or more arithmetic products of parts of the input signals. The width of a multi-format multiplier is dictated by the widest data type that it is expected to handle. For data operands smaller than the full widths of one or both inputs, the significant result of the multiplication is contained in some bits that are a subset of all of the multiplier output bits. Depending on the particular alignment of the input data, the result may not be left-aligned; that is, the most significant bit of the result may not be in the most significant output bit of the multiplier. Depending on the needs of the subsequent hardware, signed results that are not left-aligned may require sign-extension.
In two's complement binary representation, sign extension of an M-bit number x to N bits (N>M) ensures that the numeric value of the signed N-bit number matches that of the M-bit original number, in spite of the additional bits. For positive x, the N-bit two's complement sign extension is achieved by padding the left side (most significant side) of the M-bit number with N minus M zeros. For negative x, the M-bit number is padded on the left by N minus M ones. Equivalently, since the most significant bit of a two's complement number is the sign (0=positive, 1=negative), the leftmost bit may be simply copied to the left until the data is of the desired width.
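The padding rule described above can be illustrated with a minimal software sketch. The function names sign_extend and to_signed are hypothetical and chosen only for this illustration; the sketch models the bit patterns as Python integers and is not a description of any hardware implementation.

```python
def sign_extend(x: int, m: int, n: int) -> int:
    """Sign-extend the m-bit two's complement pattern x to n bits (n > m)."""
    if not 0 < m < n:
        raise ValueError("require 0 < m < n")
    x &= (1 << m) - 1              # keep only the m significant bits
    sign = (x >> (m - 1)) & 1      # the leftmost bit is the sign
    if sign:
        # Negative: pad on the left with n - m ones.
        x |= ((1 << (n - m)) - 1) << m
    return x                       # Positive: the upper bits are already zero.


def to_signed(x: int, n: int) -> int:
    """Interpret the n-bit pattern x as a two's complement integer."""
    return x - (1 << n) if x & (1 << (n - 1)) else x
```

For example, the 4-bit pattern 1011 (numeric value −5) extends to the 8-bit pattern 11111011, which still has the numeric value −5.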
Typically, sign extension is performed using this “bit copy method”, causing the electrical fanout of the original sign-bit to be large. This, in turn, impacts the speed of any operation requiring sign extension. Assuming the well-known Booth-encoded parallel multiplication technique is used, sign extension of the result is automatically achieved if the inputs are sign-extended. However, this merely pushes the delay associated with sign extension from after the multiplication to before the multiplication.
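The observation that sign-extended inputs yield a sign-extended result can be checked with a short sketch. It models only the arithmetic modulo 2^N of an N-bit multiplier output, not the Booth recoding itself, and all function names are hypothetical.

```python
def sext(x: int, m: int, n: int) -> int:
    """Return the n-bit pattern obtained by sign-extending the m-bit pattern x."""
    x &= (1 << m) - 1
    if x >> (m - 1):                              # sign bit set: pad with ones
        x |= ((1 << n) - 1) ^ ((1 << m) - 1)
    return x


def to_signed(x: int, m: int) -> int:
    """Interpret the m-bit pattern x as a two's complement integer."""
    return x - (1 << m) if x & (1 << (m - 1)) else x


def multiply_extended_inputs(a: int, b: int, m: int, n: int) -> int:
    """Extend both m-bit inputs to n bits first, multiply, keep the low n bits."""
    return (sext(a, m, n) * sext(b, m, n)) & ((1 << n) - 1)


def extend_after_multiply(a: int, b: int, m: int, n: int) -> int:
    """Multiply the signed m-bit values, then sign-extend the 2m-bit product."""
    product = to_signed(a, m) * to_signed(b, m)
    return sext(product & ((1 << (2 * m)) - 1), 2 * m, n)
```

Under these assumptions both orderings produce the same N-bit pattern, which is why extending the inputs only relocates the sign extension delay rather than removing it.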
What is needed, therefore, is a method to perform, within the multiplier's partial product reduction tree, the sign extension of multiplier results that are not left-aligned within the multiplier output bits.
The shortcomings of the prior art are overcome and additional advantages are provided through a method for implementing sign extension within a multi-precision multiplier. The method includes receiving a first input within a multiplier core of the multiplier, receiving a second input within the multiplier core, creating partial products using the first and second inputs, and summing up the partial products in a partial product reduction tree that is part of the multiplier core. The method also includes performing sign extension within the partial product reduction tree of the multiplier core by adding a value to a partial product. The method further includes computing an output from the partial product reduction tree, the output including a final product of the first and second inputs sign-extended to a desired width.
Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention. For a better understanding of the invention with advantages and features, refer to the description and to the drawings.
As a result of the summarized invention, technically we have achieved a solution which performs, within the partial product reduction tree itself, the sign extension of multiplier results that are not left-aligned within the multiplier's outputs. This is achieved by adding a specially chosen constant to the partial products by means of the bits to the left of the significant input data in the most significant partial product, which ensures that the result is sign-extended to the full width of the multiplier. Also, it eliminates the need for separate sign extension of either multiplier input or the multiplier result. Finally, it significantly reduces the fanout required on the input sign bit and allows the fanout to occur in parallel with the partial product generation step of the multiplier, effectively hiding the entire latency of the sign extension.
The subject matter which is regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
The detailed description explains the preferred embodiments of the invention, together with advantages and features, by way of example with reference to the drawings.
In accordance with exemplary embodiments, the invention allows for a sign extension of the result of a multiplication or a sum of a plurality of multiplications within the reduction tree of a multi-format multiplier. This removes the necessity of explicit sign extensions of the outputs (as shown in
The multi-precision multiplier of the invention may be implemented in a fixed-point processor architecture operating on multiple operand widths, e.g., SSE (streaming single instruction multiple data extensions) or VMX (vector media extension). The multi-precision multiplier may reside within one or more execution units of the pipelined architecture. Such fixed-point architectures may require the execution of either the multiplication of two 16-bit inputs with a 32-bit result or the sum of up to four multiplications with 8-bit inputs and a 32-bit result.
Turning now to the drawings in greater detail, it will be seen that in
Before accessing the multiplier core (100), the subblocks A0 (111) and A1 (112) are shifted in a formatting stage (130) in order to compute the correct partial products (101) for computing R=(A0*B0)+(A1*B1). The output (140) of the multiplier core (100) contains an intermediate value R with N0+N1+2 valid bits (141). The final result vector (160) is created by sign extending (150) the intermediate value of R (141), e.g., the most significant bits thereof, to M bits (161). The multiplier core 100 may be implemented as a Booth multiplier core using a Booth-encoding scheme.
Turning now to
In particular, the exemplary embodiment of
The value of the constant (302) may be computed from the type of the multiplication (i.e., the number of multiplications to be summed up), the width of the multiplicands, and the number of valid bits needed in the output as described further herein. Since this information is usually known at an early stage, computing the constant usually does not increase the critical path. Note that the constant used in the multiplier is independent of any of the inputs (e.g., A and B). That is, computing the value of the constant does not require the actual data operands to be multiplied, a significant advantage over prior art implementations which require sign extension of the data inputs.
The multi-precision multiplier described in
The number and alignment of the significant bits of the multiplier output vary according to the specific multiplication operation and input operand alignment, and the sign extension occurs from the variable position of the most significant product bit to a desired position within the multiplier output. This sign extension occurs by means of the inclusion of the operation-specific (but not data-specific) constant 302 in the partial-product reduction.
The exemplary embodiment uses a property of Booth multipliers. For inputs A(0:N0−1) and B(0:N1−1), a Booth multiplier does not compute R(0:N0+N1−1):=A(0:N0−1)*B(0:N1−1) but A(0:N0−1)*B(0:N1−1)+2^{N0+N1+2}. Thus, the result vector of a Booth multiplier is “100,R(0:N0+N1−1)” for R(0:N0+N1−1)>=0 (i.e., R(0)=0; equivalently, a positive result) and “011,R(0:N0+N1−1)” for R(0:N0+N1−1)<0 (i.e., R(0)=1; equivalently, a negative result). By adding the constant 2^{M}−2^{N0+N1+2} in the multiplier tree of the multiplier core (100), the result vector R′(0:M) is equal to “10 . . . 0,R(0:N0+N1−1)” for R(0)=0 and “01 . . . 1,R(0:N0+N1−1)” for R(0)=1, where the 0 (respectively 1) is repeated M−(N0+N1) times. Thus, the vector R′(1:M) equals the result R(0:N0+N1−1) sign extended to M bits, as desired.
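The effect of the added constant can be checked numerically with a minimal behavioral sketch. It assumes, as stated above, that the Booth core output equals the signed product plus a bias of 2^(N0+N1+2); the function names booth_core_model, extension_constant and sign_extended_product are hypothetical, and the sketch models neither partial products nor the reduction tree.

```python
def booth_core_model(a: int, b: int, n0: int, n1: int) -> int:
    """Behavioral assumption: signed product plus the bias 2^(n0+n1+2)."""
    return a * b + (1 << (n0 + n1 + 2))


def extension_constant(m: int, n0: int, n1: int) -> int:
    """Constant 2^m - 2^(n0+n1+2); depends on widths only, not on operand data."""
    if m < n0 + n1 + 2:
        raise ValueError("require m >= n0 + n1 + 2")
    return (1 << m) - (1 << (n0 + n1 + 2))


def sign_extended_product(a: int, b: int, n0: int, n1: int, m: int) -> int:
    """Return R'(1:M): the low m bits after adding the constant to the core sum."""
    total = booth_core_model(a, b, n0, n1) + extension_constant(m, n0, n1)
    # total == a*b + 2^m, so bit m is 1 for a non-negative product and
    # 0 for a negative one; the remaining m bits are the extended result.
    return total & ((1 << m) - 1)
```

With N0=N1=8 and M=32, the inputs −3 and 5 give the 32-bit pattern 0xFFFFFFF1, i.e., −15 sign extended to 32 bits, without any separate extension of the inputs or of the output.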
As the multiplier of
This modification, as described in
The capabilities of the present invention can be implemented in software, firmware, hardware or some combination thereof.
While the preferred embodiment to the invention has been described, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements which fall within the scope of the claims which follow. These claims should be construed to maintain the proper protection for the invention first described.