The present invention relates to a method and apparatus for rounding in a multiplier-accumulator.
In general, in the descriptions that follow, I will italicize the first occurrence of each special term of art that should be familiar to those skilled in the art of integrated circuits (“ICs”) and systems. In addition, when I first introduce a term that I believe to be new or that I will use in a context that I believe to be new, I will bold the term and provide the definition that I intend to apply to that term. In addition, throughout this description, I will sometimes use the terms assert and negate when referring to the rendering of a signal, signal flag, status bit, or similar apparatus into its logically true or logically false state, respectively, and the term toggle to indicate the logical inversion of a signal from one logical state to the other. Alternatively, I may refer to the mutually exclusive boolean states as logic_0 and logic_1. Of course, as is well known, consistent system operation can be obtained by reversing the logic sense of all such signals, such that signals described herein as logically true become logically false and vice versa. Furthermore, it is of no relevance in such systems which specific voltage levels are selected to represent each of the logic states.
Hereinafter, when I refer to a facility I mean a circuit or an associated set of circuits adapted to perform a particular function regardless of the physical layout of an embodiment thereof. Thus, the electronic elements comprising a given facility may be instantiated in the form of a hard macro adapted to be placed as a physically contiguous module, or in the form of a soft macro the elements of which may be distributed in any appropriate way that meets speed path requirements. In general, electronic systems comprise many different types of facilities, each adapted to perform specific functions in accordance with the intended capabilities of each system. Depending on the intended system application, the several facilities comprising the hardware platform may be integrated onto a single IC, or distributed across multiple ICs. Depending on cost and other known considerations, the electronic components, including the facility-instantiating IC(s), may be embodied in one or more single- or multi-chip packages. However, unless I expressly state to the contrary, I consider the form of instantiation of any facility that practices my invention as being purely a matter of design choice.
Further, when I use the term develop I mean any process or method, whether arithmetic or logical or a combination thereof, for creating, calculating, determining, effecting, producing, instantiating or otherwise bringing into existence a particular result. In particular, I intend this process or method to be instantiated, embodied or practiced by a facility or a particular component thereof or a selected set of components thereof, without regard to whether the embodiment is in the form of hardware, firmware, software or any combination thereof.
Shown in
Shown by way of example in
As is known, rounding is needed in digital arithmetic units (“AUs”) to preserve the maximum accuracy whenever the number of bits of precision is reduced. The simplest method of rounding is to add a 1 bit at the bit position just below the least significant bit (“LSB”) of the rounded result, and then to truncate the lower bits. For example, if a 32-bit fixed-point number represented as 16 integer bits and 16 fraction bits is to be rounded to a 16-bit integer, the fraction ½ would be added to the 32-bit number, and then the result would be truncated to 16 bits, selecting the upper 16 bits and dropping the lower 16 bits. If the bits of the 32-bit number are numbered from 0 to 31, 0 being the LSB and 31 being the most significant bit (“MSB”), adding the fraction ½ is the same as adding a 1 bit to bit 15 of the 32-bit number. In this example, at least a 17-bit half-adder would be required to perform the addition prior to the truncation, because the carry may propagate from bit 15 all the way to bit 31.
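The add-half-then-truncate rounding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the value is treated as an unsigned 16.16 fixed-point number, and the names `FRACTION_BITS` and `round_to_integer` are hypothetical, chosen for this example.

```python
FRACTION_BITS = 16  # assumed format: 16 integer bits, 16 fraction bits


def round_to_integer(value_16_16: int) -> int:
    """Round an unsigned 16.16 fixed-point value to a 16-bit integer."""
    half = 1 << (FRACTION_BITS - 1)   # a 1 at bit 15 represents the fraction 1/2
    total = value_16_16 + half        # the carry may ripple through bits 15..31
    # Keep the upper 16 bits and drop the lower 16 (truncation).
    return (total >> FRACTION_BITS) & 0xFFFF


# 0x0002_8000 is 2.5, which rounds up to 3.
# 0x0002_7FFF is just under 2.5, which rounds down to 2.
print(round_to_integer(0x0002_8000), round_to_integer(0x0002_7FFF))
```

Because the result is masked to 16 bits, an input at the very top of the range wraps to zero in this sketch; a real design would typically saturate or carry out instead.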
One application requiring rounding is in a multiplier-accumulator (“MAC”) unit adapted to perform the following operations:
Multiply:
Accumulator<=Multiplier*Multiplicand [Eq. 1]
Multiply-Accumulate:
Accumulator<=Accumulator+Multiplier*Multiplicand [Eq. 2]
In
The following pseudocode illustrates a prior art method of using the MAC 14 of
In all of the prior art known to me, the MAC unit dedicates a half-adder to rounding, and, for wide data words, a large number of circuits may be required to propagate the carry all the way across the half-adder within the available cycle time. I submit that a method is needed to perform the multiply-accumulate functions more effectively and efficiently than the prior art, and with less circuitry.
In accordance with a first embodiment of my invention, I provide a rounding method for use in a multiply-accumulate (“MAC”) facility comprising controlling the MAC to perform the steps of: developing a product by multiplying a selected multiplicand by a selected multiplier; developing a rounded product by adding to the product a selected one of a predetermined rounding value and an accumulator value; developing the accumulator value by storing the rounded product; and developing a rounded result by selectively shifting the accumulator value.
In accordance with one other embodiment of my invention, a MAC facility may be adapted to practice my pre-rounding method.
In accordance with yet another embodiment of my invention, a digital signal processing system may comprise a MAC facility adapted to practice my pre-rounding method.
In accordance with still another embodiment of my invention, a non-transitory computer readable medium may include executable instructions which, when executed in a processing system, cause the processing system to perform the steps of my pre-rounding method.
My invention may be more fully understood by a description of certain preferred embodiments in conjunction with the attached drawings in which:
In the drawings, similar elements will be similarly numbered whenever possible. However, this practice is simply for convenience of reference and to avoid unnecessary proliferation of numbers, and is not intended to imply or suggest that my invention requires identity in either function or structure in the several embodiments.
In accordance with my invention, I provide a method and apparatus for pre-rounding in a multiply-accumulate facility. In
I have noticed that the addition of the rounding bit just below the LSB of the desired result does not need to be done after the series of multiplies or multiply-accumulates—it can be done at any time, so long as the bit is added at the correct bit position with respect to the eventual shift or bit selection. Rounding Logic 30 is configured to provide the correct addend to the Full-Adder 20 in response to signals from Control 28.
By way of example, let us assume that the rounding addition is performed during the first multiply cycle, using the Full-Adder 20. Therefore, the Half-Adder 26 in the prior art MAC 14 is no longer needed to perform the rounding, and may be eliminated. All that is required is for Control 28 to select the correct bits of the final result using the Shifter 24, truncating any lower bits, and the rounding is complete. Furthermore, because the rounding addition uses the Full-Adder 20 during the first multiply cycle, when that adder would otherwise have nothing to accumulate, no additional cycle time is required for rounding.
The following pseudocode illustrates a method of using the MAC 16 of
In this example, the rounding bit is shifted right 2 bits during the first multiply operation so that it will align in the correct position relative to the LSB after a subsequent 2 bit left shift and truncation of the Accumulator contents.
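The alignment argument can be checked with a short Python model. It is a sketch under assumptions that are not taken from the figures: non-negative operands, an accumulator wide enough never to overflow, a result taken from bit 16 upward after a 2-bit left shift, and hypothetical names throughout. `post_round` models a separate rounding addition after the shift; `pre_round` models the full adder receiving the rounding value, already shifted right by 2 bits, in place of the accumulator value during the first multiply.

```python
RESULT_SHIFT = 16  # assumed: the result is taken from bit 16 upward
LEFT_SHIFT = 2     # left shift applied to the accumulator before truncation

ROUND_AFTER_SHIFT = 1 << (RESULT_SHIFT - 1)           # bit 15 of the shifted value
ROUND_BEFORE_SHIFT = ROUND_AFTER_SHIFT >> LEFT_SHIFT  # bit 13 of the accumulator


def post_round(pairs):
    """Accumulate, shift, then round with a separate addition."""
    acc = 0
    for multiplier, multiplicand in pairs:
        acc += multiplier * multiplicand
    shifted = (acc << LEFT_SHIFT) + ROUND_AFTER_SHIFT
    return shifted >> RESULT_SHIFT


def pre_round(pairs):
    """Add the pre-shifted rounding value during the first multiply."""
    acc = 0
    for index, (multiplier, multiplicand) in enumerate(pairs):
        # First cycle: the adder receives the rounding value.
        # Later cycles: the adder receives the accumulator value.
        addend = ROUND_BEFORE_SHIFT if index == 0 else acc
        acc = multiplier * multiplicand + addend
    return (acc << LEFT_SHIFT) >> RESULT_SHIFT  # shift and truncate only


pairs = [(300, 200), (150, 100)]
print(post_round(pairs), pre_round(pairs))  # both print 5
```

The two functions agree because shifting the accumulator left by 2 bits moves the pre-added bit 13 up to bit 15, exactly where the separate rounding addition would have placed it.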
In the embodiment illustrated in
In general, as illustrated in
One application for this method is a volume control that ramps a volume exponentially up or down with a constant ramp factor. For example, to ramp the volume, V, up by a factor (1+Delta), wherein Delta is small, we would calculate:
V′=(1+Delta)*V=V+Delta*V [Eq. 3]
And to ramp the volume down by a factor (1−Delta) we would calculate:
V′=(1−Delta)*V=V−Delta*V [Eq. 4]
In ordinary rounding, we would add ½ to each of these calculations and then truncate, but if V is less than 1/(2*Delta), then the volume will get stuck, because the change in V before rounding will always be less than ½ the value of the LSB. In accordance with my method, my Rounding Logic 30 would be configured to alternate between providing 1 and 0 for rounding, instead of always providing ½. Then, when the rounding value of 1 is used, the value of V will always increment by at least 1 when ramping up, and, when the rounding value of 0 is used, the value of V will always decrement by at least 1 when ramping down.
The following pseudocode illustrates this method of using the MAC 16 of
The round 1 during the first multiply would be normal for a subsequent left shift by 1 bit, but, in this example, the subsequent left shift is by 2 bits. Note that, during the second multiply-and-shift pair, no round bit is added. The volume result after the second multiply-and-shift is guaranteed to have changed from the input volume before the first multiply, even if RAMP is close to 1 and the input volume is close to zero.
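The stuck-volume behavior and the alternating rounding values can be reproduced with a small Python model. This is a sketch, not a reconstruction of the listing referred to above: it assumes an integer volume, a ramp factor held with 15 fraction bits, and a Delta of 1/64, and every name in it is hypothetical.

```python
FRAC_BITS = 15           # assumed fixed-point format for the ramp factor
ONE = 1 << FRAC_BITS     # rounding value of 1 LSB of the result
HALF = ONE >> 1          # ordinary rounding value of 1/2 LSB


def ramp_step(volume, ramp, round_value):
    """One multiply-and-shift: scale the volume, add the rounding value, truncate."""
    return (volume * ramp + round_value) >> FRAC_BITS


def ordinary_pair(volume, ramp):
    """Two steps that both round with 1/2."""
    volume = ramp_step(volume, ramp, HALF)
    return ramp_step(volume, ramp, HALF)


def alternating_pair(volume, ramp):
    """First step rounds with 1, second step rounds with 0."""
    volume = ramp_step(volume, ramp, ONE)
    return ramp_step(volume, ramp, 0)


DELTA = ONE >> 6         # assumed Delta of 1/64
RAMP_UP = ONE + DELTA    # represents 1 + Delta
RAMP_DOWN = ONE - DELTA  # represents 1 - Delta

print(ordinary_pair(10, RAMP_UP), alternating_pair(10, RAMP_UP))      # 10 11
print(ordinary_pair(10, RAMP_DOWN), alternating_pair(10, RAMP_DOWN))  # 10 9
```

With a volume of 10, Delta*V is about 0.16, so ordinary rounding leaves the volume at 10 in both directions, while the alternating pair moves it to 11 when ramping up and to 9 when ramping down.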
In one other embodiment, my MAC facility 16 may comprise a general purpose DSP, such as is shown in
Although I have described my invention in the context of particular embodiments, one of ordinary skill in this art will readily realize that many modifications may be made in such embodiments to adapt them to specific implementations. Thus it is apparent that I have provided a pre-rounding method and apparatus that are both effective and efficient. Further, I submit that my method and apparatus provide performance generally superior to the best prior art techniques.
Number | Date | Country
---|---|---
62662367 | Apr 2018 | US