This application for Patent claims priority to European Patent Application No. EP 09 290 057.0 (attorney docket TI-67084EP) entitled “Multiplier with Shifter” filed Jan. 27, 2009 and incorporated by reference herein.
This invention generally relates to the field of digital signal processing, and more particularly to the implementation of digital filters and more specifically recursive filters such as infinite impulse response filters.
Mobile audio devices are a ubiquitous fixture of modern society. Cellular telephones, personal music players, portable gaming systems, etc. are constant companions for many people. Cell phones continue to increase in computer processing capability and sophistication. The increased memory capacity and computing resources on a cell phone support the installation of various applications, often referred to as “apps” that allow a diverse range of functions to be performed by the cell phone when not being used for conversation.
For example, even when not talking, social networking can continue using various messaging tools and features. A wide circle of friends can be kept current with a twittering app. Shopping venues can be located using navigation apps that provide mapping and global positioning system (GPS) functionality. Various game apps use the keyboard and display to provide a range of gaming opportunities.
Central to the operation of a cell phone and many of the apps placed on a cell phone is digital signal processing. Digital filters are used for modulation, demodulation, frequency separation and extraction, wave shaping and a host of other functions. The general theory and operation of digital filters is well known; for example, see “Digital Signal Processing for Measurement Systems, Theory and Applications,” Gabriele D'Antona and Alessandro Ferrero, 2006.
Many other types of devices, both mobile and fixed, also rely on digital signal processing to implement digital filters for a wide range of functions.
Particular embodiments in accordance with the invention will now be described, by way of example only, and with reference to the accompanying drawings:
Digital signal processing typically involves multiplying two operands together to form a product and then adding the product to a running value that is retained in an accumulator. This common function is referred to as “multiply-accumulate.” In order to prevent overflow, a shifter may be included with the multiplier for scaling the product. In order to allow the amount of shift to be dynamically specified, an encoded shift amount is concatenated with one of the operands. When the operand is received at the multiplier, the encoded shift amount is stripped from the operand and used to control the product shifter. In this manner, the multiply and shift operation may be performed in one clock cycle, as will be described in more detail below.
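Digital filters are often described and implemented in terms of a difference equation that defines how the output signal is related to the input signal:
$$y[n] = \frac{1}{a_0}\left(b_0\,x[n] + b_1\,x[n-1] + \cdots + b_P\,x[n-P] - a_1\,y[n-1] - a_2\,y[n-2] - \cdots - a_Q\,y[n-Q]\right)$$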
where:
P is the feedforward filter order
$b_i$ are the feedforward filter coefficients
Q is the feedback filter order
$a_i$ are the feedback filter coefficients
x[n] is the input signal
y[n] is the output signal.
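A more condensed form of the difference equation is:
$$y[n] = \frac{1}{a_0}\left(\sum_{i=0}^{P} b_i\,x[n-i] - \sum_{j=1}^{Q} a_j\,y[n-j]\right)$$
which, when rearranged, becomes:
$$\sum_{j=0}^{Q} a_j\,y[n-j] = \sum_{i=0}^{P} b_i\,x[n-i]$$
To find the transfer function of the filter, the Z-transform of each side of the above equation is taken, and the time-shift property is used to obtain:
$$\sum_{j=0}^{Q} a_j z^{-j}\,Y(z) = \sum_{i=0}^{P} b_i z^{-i}\,X(z)$$
The transfer function is defined to be:
$$H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{i=0}^{P} b_i z^{-i}}{\sum_{j=0}^{Q} a_j z^{-j}}$$
Considering that in most IIR filter designs the coefficient $a_0$ is normalized to 1, the IIR filter transfer function takes the more traditional form:
$$H(z) = \frac{\sum_{i=0}^{P} b_i z^{-i}}{1 + \sum_{j=1}^{Q} a_j z^{-j}}$$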
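The difference equation can be evaluated directly in floating point. The following Python sketch is a plain direct-form implementation; the function name `iir_direct_form` is ours, not from the text, and zero initial conditions are assumed. It only illustrates how the feedforward and feedback sums combine, not the fixed-point hardware described below.

```python
def iir_direct_form(b, a, x):
    """Evaluate y[n] = (1/a[0]) * (sum_i b[i]*x[n-i] - sum_{j>=1} a[j]*y[n-j]).

    Samples before n = 0 are taken to be zero (initial rest).
    """
    if a[0] == 0:
        raise ValueError("a[0] must be non-zero")
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, b_i in enumerate(b):          # feedforward terms
            if n - i >= 0:
                acc += b_i * x[n - i]
        for j in range(1, len(a)):           # feedback (recursive) terms
            if n - j >= 0:
                acc -= a[j] * y[n - j]
        y.append(acc / a[0])
    return y


# One-pole example: y[n] = x[n] + 0.5*y[n-1], i.e. b = [1], a = [1, -0.5]
print(iir_direct_form([1.0], [1.0, -0.5], [1.0, 0.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25, 0.125]
```

The sign convention follows the equations above: a feedback coefficient $a_j$ enters with a minus sign, so $a_1 = -0.5$ produces $+0.5\,y[n-1]$.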
A problem occurs in the implementation of digital filters, and more specifically of recursive filters, which are used to form infinite impulse response (IIR) filters: IIR filters can have coefficients whose gain causes overflows in the adders 130. This typically occurs in the feedback section.
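For example, consider an elliptic low-pass filter designed with the MATLAB command [b,a]=ellip(6, 0.1, 80, 0.1), that is, sixth order, 0.1 dB of passband ripple, 80 dB of stopband attenuation, and a normalized passband edge frequency of 0.1. The resulting direct-form coefficients are listed in Table 1.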
The denominator operations translate to:
$$Y(z) = X(z) + 5.322\,z^{-1}Y(z) - 11.961\,z^{-2}Y(z) + 14.51\,z^{-3}Y(z) + \cdots$$
The denominator is the recursive part of the filter, and several of its coefficients are larger than one in magnitude. One way to cope with this problem is to scale down the amplitude of the input signal; however, this raises the quantization noise floor. Another way is to reduce the values of the coefficients; however, this degrades filter performance because fewer useful bits remain in each coefficient, and it can lead to instability.
If all of the coefficients are scaled by a common factor, then in this example they would all be divided by 16, which is the smallest power of two larger than the largest coefficient magnitude, |−14.514…|. As mentioned above, this degrades filter performance because four bits of coefficient accuracy are lost to the scaling, and an additional instruction that shifts left by four bits (multiplies by 16) is required to restore the original scale. Alternatively, the input data samples could be scaled down to preserve the coefficient precision, but this would raise the quantization noise floor of the data.
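The common scale factor can be checked with a few lines of Python. This sketch uses only the three feedback coefficients quoted above (the remaining Table 1 entries are not reproduced here) and assumes that a scaled coefficient must have a magnitude below one; `common_scale_shift` is a hypothetical helper name.

```python
def common_scale_shift(coeffs):
    """Smallest k such that |c| / 2**k < 1 for every coefficient c."""
    largest = max(abs(c) for c in coeffs)
    k = 0
    while largest >= 2 ** k:
        k += 1
    return k


# Only the three feedback coefficients quoted in the text; the rest of
# Table 1 is not reproduced here, so this list is deliberately partial.
quoted = [-5.322, 11.961, -14.514]
k = common_scale_shift(quoted)
print(k, 2 ** k)  # 4 16
```

Every coefficient is then stored with four fewer significant bits, and each product needs a separate left shift by four bits to restore the scale.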
Each pair of sample data and coefficient is accessed from RAM 202 and RAM 204, respectively, and received by multiply-shift block 206. The encoded shift amount is separated from the coefficient data. Multiplier 220 multiplies the sample data by the remaining 22-bit coefficient mantissa, and shifter 222 then scales the amplitude of the product to prevent overflow. In this embodiment, multiplier 220 may be implemented as a 24×22 multiplier to save circuitry. The two bits of encoded shift amount that are separated from the coefficient data are decoded and used to control shifter 222. Because every coefficient fetched from RAM carries its own encoded shift amount, the shift amount can be different for each multiply operation. Thus, the multiply and shift may be performed in a single clock cycle with an individually selected shift amount.
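The datapath of multiply-shift block 206 can be modeled in software as: split the coefficient word, multiply, then post-shift. This is a sketch under stated assumptions, not the circuit itself: samples and mantissas are taken to be two's complement, a positive shift amount is taken to mean a left shift, and the code table is the +6, +1, 0, −6 embodiment described next. `multiply_shift`, `to_signed` and `SHIFT_TABLE` are hypothetical names.

```python
# Shift codes of the first embodiment in the text: 11 -> +6, 10 -> +1, 01 -> 0, 00 -> -6.
# A positive amount is taken here to mean a left shift (an assumption).
SHIFT_TABLE = {0b11: 6, 0b10: 1, 0b01: 0, 0b00: -6}


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def multiply_shift(sample, coeff_word):
    """Model one multiply-shift operation.

    Assumptions: `sample` is a 24-bit two's-complement word; `coeff_word` is a
    24-bit word whose upper 22 bits are a two's-complement mantissa and whose
    two LSBs are the encoded shift amount.
    """
    code = coeff_word & 0b11                    # strip the encoded shift amount
    mantissa = to_signed(coeff_word >> 2, 22)   # remaining 22-bit coefficient
    product = to_signed(sample, 24) * mantissa  # 24 x 22 multiply
    shift = SHIFT_TABLE[code]                   # decode the shift amount
    # Post-shifter: Python's >> on negative ints is an arithmetic shift.
    return product << shift if shift >= 0 else product >> -shift


# Mantissa 5 with code 10 (+1): (3 * 5) << 1
print(multiply_shift(3, (5 << 2) | 0b10))  # 30
```

Because the shift code travels with each coefficient word, consecutive calls can use different shifts without any extra instruction.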
For example, in one embodiment, shift values of +6, +1, 0, and −6 may be encoded in two bits as 11, 10, 01, and 00, respectively. The two least significant bits (LSBs) of each coefficient word are used to control the shifter: the 24 bits read from the coefficient memory are split into a 22-bit mantissa for the multiplier and 2 bits for post-shifter 222. In another embodiment, shift values of +4, +1, 0, and −6 may be encoded in two bits as 11, 10, 01, and 00, respectively. In another embodiment, the encodings may be in a different order. In other embodiments, various other combinations of shift amounts may be encoded in two bits, or in three or more bits. In some embodiments, a single bit may be used to select between two shift amounts.
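The 24-bit field layout (a 22-bit mantissa followed by a 2-bit code in the LSBs) can be made concrete with a pair of pack/unpack helpers. This sketch assumes a two's-complement mantissa; `pack`, `unpack`, `TABLE_A` and `TABLE_B` are hypothetical names for the two code tables listed above.

```python
# Two of the 2-bit code tables listed in the text (shift amount -> code).
TABLE_A = {6: 0b11, 1: 0b10, 0: 0b01, -6: 0b00}
TABLE_B = {4: 0b11, 1: 0b10, 0: 0b01, -6: 0b00}

MANTISSA_BITS = 22
CODE_BITS = 2


def pack(mantissa, shift, table):
    """Concatenate a 22-bit two's-complement mantissa with the code for `shift`."""
    if not -(1 << (MANTISSA_BITS - 1)) <= mantissa < (1 << (MANTISSA_BITS - 1)):
        raise ValueError("mantissa does not fit in 22 bits")
    if shift not in table:
        raise ValueError(f"shift {shift} is not encodable with this table")
    return ((mantissa & ((1 << MANTISSA_BITS) - 1)) << CODE_BITS) | table[shift]


def unpack(word, table):
    """Split a 24-bit word back into (mantissa, shift)."""
    decode = {code: s for s, code in table.items()}
    raw = word >> CODE_BITS
    if raw & (1 << (MANTISSA_BITS - 1)):  # sign-extend the mantissa
        raw -= 1 << MANTISSA_BITS
    return raw, decode[word & ((1 << CODE_BITS) - 1)]


word = pack(-12345, 4, TABLE_B)
print(f"{word:024b}", unpack(word, TABLE_B))
```

A table with three or more code bits would only change `CODE_BITS` and the table contents, at the cost of a correspondingly shorter mantissa if the word stays at 24 bits.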
Referring again to the example above, from Table 1 the second denominator coefficient is 5.32262562108146, which may be represented in 24-bit binary as 101010100101001011110010. The two-bit shift codes of the coefficients (the two LSBs) are defined at compilation time. In this example, assume the encoded shifts are +4, +1, 0, and −6, coded respectively as “11”, “10”, “01”, and “00”.
In order to implement Y=Y×C (C=5.32262562108146) without using the concatenated shift amount feature described above, one option is to code C scaled by the smallest power of two larger than C (here 2^3=8, i.e., 1<<3): C0=010101010010100101111001, and an extra instruction is needed to rescale the result by three bits. For an ALU whose accumulator has at least three guard bits, an example instruction sequence may be:
ACC=Y×C0, followed by
ACC=ACC<<3
If a generic filtering subroutine is used, then an independent shift value cannot be used for each coefficient; the worst case must be used instead. In this example, the largest coefficient magnitude is 14.51, so every coefficient is scaled by 2^4=16, a shift of four bits. C is now coded as C1=001010101001010010111100 and the code is:
ACC=Y×C1, followed by
ACC=ACC<<4
The complete filter is a succession of similar code.
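The bit patterns quoted for C, C0 and C1 can be reproduced by truncating the coefficient to a fixed number of fraction bits. In the sketch below, the choice of 21 fraction bits for C and 23 fraction bits for C0 and C1 is our reading of the patterns (it reproduces them exactly), not something the text states; `to_bits` is a hypothetical helper.

```python
import math

C = 5.32262562108146  # second denominator coefficient, as quoted from Table 1


def to_bits(value, frac_bits, width=24):
    """Truncate value * 2**frac_bits toward minus infinity and format as width bits."""
    code = math.floor(value * (1 << frac_bits))
    return format(code & ((1 << width) - 1), f"0{width}b")


print(to_bits(C, 21))       # 101010100101001011110010  (C itself)
print(to_bits(C / 8, 23))   # 010101010010100101111001  (C0 = C / 2**3)
print(to_bits(C / 16, 23))  # 001010101001010010111100  (C1 = C / 2**4)
```

Rounding to nearest instead of truncating would change the last bit of the patterns for C and C1, so truncation appears to be what was used.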
Referring again to
ACC=(Y×C2)<<4
Thus, the coefficients may be individually scaled. The input signal does not need to be scaled, and the quantization noise of the signal is therefore not increased. Using this shifter capability allows the multiplier to be reduced to a 24×22 configuration instead of 24×24 without losing accuracy in the filter computations.
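For a coefficient whose magnitude is below one, the benefit of individual scaling can be seen by comparing truncation error in the two schemes. This sketch assumes truncation, a 24-bit word with 23 fraction bits when all coefficients share a scale of 16, and a 22-bit mantissa with 21 fraction bits and a shift of 0 when the coefficient is scaled on its own; the coefficient 0.3 is hypothetical, and `quantize` is a hypothetical helper.

```python
import math


def quantize(value, frac_bits, prescale_shift):
    """Divide by 2**prescale_shift, truncate to frac_bits fraction bits, then undo the prescale."""
    code = math.floor(value / 2 ** prescale_shift * 2 ** frac_bits)
    return code * 2 ** prescale_shift / 2 ** frac_bits


coeff = 0.3  # hypothetical coefficient with magnitude below one (not from Table 1)
common = quantize(coeff, 23, 4)      # 24-bit word, every coefficient scaled by 16
individual = quantize(coeff, 21, 0)  # 22-bit mantissa with its own shift of 0
print(abs(coeff - common), abs(coeff - individual))
```

With a common scale of 16 the effective step is 2^-19, while the unshifted 22-bit mantissa has a step of 2^-21, so this coefficient is represented more accurately despite the narrower multiplier. Coefficients that need the largest shift code do not gain precision in the same way.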
As each shifted product is produced on the output of multiply-shift unit 206, adder 208 adds it to the running value stored in accumulator 210. At the completion of one filter iteration, the output sample value is stored into sample data RAM 212, which may be the same as RAM 202.
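One filter iteration of this multiply-accumulate loop can be sketched as follows. The shift is assumed to be already decoded, Python's unbounded integers stand in for an accumulator with ample guard bits, and the final rescaling and saturation to a 24-bit sample are assumptions of this sketch rather than details given in the text; `filter_iteration` is a hypothetical name.

```python
def filter_iteration(terms, frac_bits=21, out_bits=24):
    """Accumulate shifted products for one output sample.

    `terms` holds (sample, mantissa, shift) tuples with the shift already
    decoded. Python integers never overflow, so `acc` stands in for an
    accumulator with ample guard bits. Rescaling by `frac_bits` and saturating
    to `out_bits` are assumptions of this sketch, not details from the text.
    """
    acc = 0
    for sample, mantissa, shift in terms:
        product = sample * mantissa
        acc += (product << shift) if shift >= 0 else (product >> -shift)
    out = acc >> frac_bits                    # drop the mantissa's fraction bits
    hi = (1 << (out_bits - 1)) - 1
    lo = -(1 << (out_bits - 1))
    return max(lo, min(hi, out))              # value written back to sample RAM


# 1000 * 1.0 + (-500) * 1.0, with 1.0 encoded as mantissa 1 << 20 and a shift of +1
print(filter_iteration([(1000, 1 << 20, 1), (-500, 1 << 20, 1)]))  # 500
```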
As each shifted product is produced on the output of multiply-shift units 306, 307, adders 308, 309, respectively, add it to the running value stored in accumulators 310, 311, respectively. At the completion of one filter iteration, the output sample values are stored into sample data RAM 312, which may be the same as RAM 302. In this embodiment, an additional set of shifters 314, 315 is provided to allow data normalization, for example.
Each pair of sample data and coefficient is accessed from memory and received 404 by a multiply-shift unit. The coefficient operand includes the encoded shift amount. The encoded shift amount is separated 406 from the coefficient data, and the remaining 22-bit coefficient mantissa is multiplied 406 by the 24-bit sample data operand to form a product.
The product is then shifted 408 according to the encoded shift amount to form a shifted product on an output of the multiply-shift unit. In this manner, amplitude scaling of the product is implemented to prevent overflow.
However, as was described above, the compiler is also configured to determine how much shift is required for each coefficient in Table 1 to prevent overflow. The compiler is configured to encode an amount of shift selected from a set of shift values and to concatenate the encoded shift amount onto each coefficient, as indicated in (nn) at 510, 512. The compiler then generates object code 507, 508 that instructs a multiply-shift unit as described with respect to
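The compiler step can be sketched as a small coefficient encoder. This is one plausible selection rule, not necessarily the one the compiler uses: for each coefficient it picks the smallest shift in the +4, +1, 0, −6 set that keeps the scaled value within an assumed two's-complement mantissa range of [−1, 1), so that as many mantissa bits as possible stay significant. `encode_coefficient` and the truncating quantizer are assumptions.

```python
import math

# Shift amount -> 2-bit code, from the embodiment with shifts +4, +1, 0 and -6.
SHIFT_CODES = {4: 0b11, 1: 0b10, 0: 0b01, -6: 0b00}
MANTISSA_BITS = 22
FRAC_BITS = MANTISSA_BITS - 1  # assumes a two's-complement mantissa in [-1, 1)


def encode_coefficient(c):
    """Return (24-bit word, shift) for coefficient c.

    Picks the smallest available shift that brings c / 2**shift into [-1, 1),
    truncates to a 22-bit mantissa, and concatenates the shift code as the LSBs.
    """
    for shift in sorted(SHIFT_CODES):
        scaled = c / 2 ** shift
        if -1.0 <= scaled < 1.0:
            mantissa = math.floor(scaled * 2 ** FRAC_BITS)
            word = ((mantissa & ((1 << MANTISSA_BITS) - 1)) << 2) | SHIFT_CODES[shift]
            return word, shift
    raise ValueError(f"coefficient {c} needs a larger shift than the set provides")


word, shift = encode_coefficient(5.32262562108146)
print(f"{word:024b}", shift)  # 001010101001010010111111 4
```

For C, the 22-bit mantissa chosen here equals the upper 22 bits of C1 quoted earlier, with the code “11” for a shift of +4 in the two LSBs.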
RF transceiver 1106 is a digital radio processor and includes a receiver for receiving a stream of coded data frames from a cellular base station via antenna 1107 and a transmitter for transmitting a stream of coded data frames to the cellular base station via antenna 1107. RF transceiver 1106 is connected to DBB 1102 which provides processing of the frames of encoded data being received and transmitted by cell phone 1100.
DBB unit 1002 may send data to or receive data from various devices connected to universal serial bus (USB) port 1026. DBB 1002 can be connected to subscriber identity module (SIM) card 1010 and stores and retrieves information used for making calls via the cellular system. DBB 1002 can also be connected to memory 1012 that augments the onboard memory and is used for various processing needs. DBB 1002 can be connected to Bluetooth baseband unit 1030 for wireless connection to a microphone 1032a and headset 1032b for sending and receiving voice data. DBB 1002 can also be connected to display 1020 and can send information to it for interaction with a user of the mobile UE 1000 during a call process. Display 1020 may also display pictures received from the network, from a local camera 1026, or from other sources such as USB 1026. DBB 1002 may also send a video stream to display 1020 that is received from various sources such as the cellular network via RF transceiver 1006 or camera 1026. DBB 1002 may also send a video stream to an external video display unit via encoder 1022 over composite output terminal 1024. Encoder unit 1022 can provide encoding according to PAL/SECAM/NTSC video standards. In some embodiments, audio codec 1109 receives an audio stream from FM Radio tuner 1108 and sends an audio stream to stereo headset 1116 and/or stereo speakers 1118. In other embodiments, there may be other sources of an audio stream, such as a compact disc (CD) player, a solid-state memory module, etc.
As described in more detail above, DBB unit 1002 contains a multiply-shift unit that is configured to receive two operands on respective inputs of the multiply-shift unit, wherein one of the operands includes a concatenated encoded shift amount; to multiply the two operands to form a product after separating the concatenated encoded shift amount from the one operand; and to shift the product according to the encoded shift amount to form a shifted product on an output of the multiply-shift unit.
While the invention has been described with reference to illustrative embodiments, this description is not intended to be construed in a limiting sense. Various other embodiments of the invention will be apparent to persons skilled in the art upon reference to this description. For example, while 24-bit sample data and coefficient sizes were described herein, multiply-shift units that operate on operands and coefficients of other sizes may easily be embodied using the techniques described herein.
While embodiments of the invention were described for implementing IIR filters herein, other types of digital signal processing may make use of various embodiments of a multiply-shift unit responsive to encoded shift amounts as described herein.
The multiply-shift unit may be a scalar multiplier instead of a floating point unit. While one or two units in parallel were illustrated herein, a system with more than two multiply-shift units in parallel may be embodied using the concepts described herein.
While a mobile handset has been described, embodiments of the invention are not limited to cellular phone devices. Various personal devices such as audio players, video players, radios, televisions, and personal digital assistants (PDAs) may use an embodiment of the invention to perform digital signal processing for various applications provided by the device.
Although the invention finds particular application to systems using Digital Signal Processors (DSPs), implemented, for example, in an Application Specific Integrated Circuit (ASIC), it also finds application to other forms of processors. An ASIC may contain one or more megacells which each include custom designed functional circuits combined with pre-designed functional circuits provided by a design library.
An embodiment of the invention may include a system with a processor coupled to a computer readable medium in which a software program is stored that contains instructions that when executed by the processor perform the functions of modules and circuits described herein. The computer readable medium may be memory storage such as dynamic random access memory (DRAM), static RAM (SRAM), read only memory (ROM), Programmable ROM (PROM), erasable PROM (EPROM) or other similar types of memory. The computer readable media may also be in the form of magnetic, optical, semiconductor or other types of discs or other portable memory devices that can be used to distribute the software for downloading to a system for execution by a processor. The computer readable media may also be in the form of magnetic, optical, semiconductor or other types of disc unit coupled to a system that can store the software for downloading or for direct execution by a processor.
As used herein, the terms “applied,” “connected,” and “connection” mean electrically connected, including where additional elements may be in the electrical connection path. “Associated” means a controlling relationship, such as a memory resource that is controlled by an associated port. The terms assert, assertion, de-assert, de-assertion, negate and negation are used to avoid confusion when dealing with a mixture of active high and active low signals. Assert and assertion are used to indicate that a signal is rendered active, or logically true. De-assert, de-assertion, negate, and negation are used to indicate that a signal is rendered inactive, or logically false.
It is therefore contemplated that the appended claims will cover any such modifications of the embodiments as fall within the true scope and spirit of the invention.
Number | Date | Country | Kind |
---|---|---|---|
EP 09 290 057.0 | Jan 2009 | EP | regional |