This invention relates generally to digital signal processing, and more particularly to a configurable device for performing multiply-and-accumulate operations.
Designers of modern digital signal processing systems have typically used application-specific integrated circuits (ASICs) to implement their digital filter designs. A commonly implemented digital filter design is the finite impulse response (FIR) digital filter. For example, the filtering requirements for various wireless telecommunication protocols such as WCDMA, GSM/EDGE, CDMA2000, and TD-SCDMA may be implemented with such devices. Turning now to
The number of digitized samples filtered by FIR filter 100 depends upon the number of taps it possesses. Each tap is represented by a multiplier 140. FIR filter 100 includes an integer number N of taps and thus has N multipliers 140. Buffer 130 provides a corresponding number N of samples to the taps. The number of bits per sample at each tap may be denoted as the precision for FIR filter 100. For example, if each sample is one byte, the precision would be one byte. In FIR filter 100, a first multiplier 140a multiplies a current sample X0 with a corresponding coefficient C0. A second multiplier multiplies a sample X1 (the sample preceding X0) with a corresponding coefficient C1, and so on. Finally, an Nth multiplier 140N multiplies a sample XN-1 with a corresponding coefficient CN-1. A summer 150 sums the tap outputs (from the multipliers) to provide an output sample 160. It will thus be appreciated that FIR filter 100 provides a multiply-and-accumulate (MAC) function.
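As a minimal sketch of the multiply-and-accumulate function just described, the following Python models the N multipliers and the summer in software. The function name `fir_output` and the sample and coefficient values are illustrative assumptions and do not appear in the described hardware.

```python
def fir_output(samples, coefficients):
    """Return one FIR output sample.

    samples[0] is the current sample X0, samples[1] the preceding sample X1,
    and so on; coefficients[i] is the tap coefficient Ci.
    """
    if len(samples) != len(coefficients):
        raise ValueError("one sample is needed per tap")
    # Each product models one multiplier 140; the sum models summer 150.
    return sum(x * c for x, c in zip(samples, coefficients))


# Illustrative 4-tap example (values chosen arbitrarily).
result = fir_output([3, 1, 4, 1], [2, 7, 1, 8])
```

Here N is simply the length of the two lists, which mirrors the point that the number of taps fixes how many samples contribute to each output.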
In an ASIC implementation of FIR filter 100, hardware is provided to implement multipliers 140 and summer 150. However, the filtering needs may vary widely depending upon the desired protocol. For example, a decimation filter for a WCDMA handset may have six taps, each tap having 10 bits of precision, whereas a decimation filter for a TDMA handset may have 10 taps, each tap having 10 bits of precision. In general, the number of taps and bits of precision per tap will depend upon the application. An ASIC-implemented digital filter will typically have a fixed (rather than configurable) number of taps and bits of precision per tap. An ASIC designer having to support multiple digital filtering protocols is thus faced with the excessive die area demands of providing multiply-and-accumulate (MAC) hardware sized for worst-case scenarios (i.e., a large number of taps with high bit precision), much of which may go unused.
As an alternative to an ASIC design, digital filters have been implemented using lookup tables (LUTs) such as provided in field programmable gate arrays and other configurable devices. Such LUT-based implementations use a distributed arithmetic approach to perform the necessary MAC operations. Although LUTs are readily reconfigurable, conventional LUT-based distributed arithmetic implementations of digital filters are awkward with regard to input/output (I/O) signal flow.
Accordingly, there is a need in the art for improved digital filter implementations having both a configurable number of taps and also a configurable number of bits of precision per tap.
In accordance with one aspect of the invention, a configurable device is provided for multiplying a plurality of digital words with a corresponding plurality of coefficients, comprising: a plurality of lookup tables, each lookup table corresponding to at least one of the coefficients and operable to receive at least a portion of a corresponding at least one of the digital samples, each lookup table configured with multiples of the corresponding at least one coefficient such that the lookup table is operable to retrieve an entry to provide an output equaling a multiplication of the portion with the corresponding at least one coefficient.
In accordance with another aspect of the invention, a method of implementing a first digital filter for multiplying a plurality of digital input words with a corresponding plurality of first coefficients is provided. The method includes the acts of: configuring at least one lookup table with multiples of each of the first coefficients; and for each digital input word, retrieving a selected one of the multiples of the first coefficients from the at least one lookup table to provide a tap output of the first digital filter, wherein each selected one of the multiples equals the digital input word multiplied by the corresponding coefficient.
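The two acts of this method can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `build_tap_lut` and `tap_output` are hypothetical, input words are assumed to be unsigned, and a 4-bit word width is chosen for brevity.

```python
def build_tap_lut(coefficient, bits=4):
    """Configure a table with every multiple of `coefficient`.

    Assumes unsigned input words of `bits` bits, giving 2**bits entries.
    """
    return [coefficient * value for value in range(1 << bits)]


def tap_output(lut, word):
    """Retrieve the tap output: the input word addresses the table directly,
    so no multiplication is performed at lookup time."""
    return lut[word]


# Illustrative coefficient of 5 and input word of 9.
lut = build_tap_lut(5)
product = tap_output(lut, 9)
```

The retrieved entry equals the input word multiplied by the coefficient, which is the property the method relies on.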
In accordance with another aspect of the invention, a lookup table group operable to implement at least a tap for a digital filter is provided, wherein the tap corresponds to the multiplication of a digital input word with a coefficient. The lookup table group includes a plurality of lookup tables, each lookup table configured with multiples of the coefficient such that the lookup table is operable to retrieve an entry to provide an output equaling a multiplication of a portion of the digital input word with the coefficient.
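A lookup table group of this kind can be sketched as below. The sketch assumes an unsigned 16-bit input word split into four 4-bit portions, and it assumes the partial products are recombined by shifting each one according to its portion's bit position before summing. The recombination step and the names `build_lut` and `lut_group_tap` are assumptions made for illustration.

```python
def build_lut(coefficient, bits=4):
    """Table of all multiples of `coefficient` for a `bits`-wide unsigned portion."""
    return [coefficient * value for value in range(1 << bits)]


def lut_group_tap(coefficient, word, word_bits=16, portion_bits=4):
    """Multiply `word` by `coefficient` using one small LUT per portion."""
    if word_bits % portion_bits:
        raise ValueError("word_bits must be a multiple of portion_bits")
    if not 0 <= word < (1 << word_bits):
        raise ValueError("word does not fit in word_bits")
    n_portions = word_bits // portion_bits
    # Every LUT in the group holds the same multiples of the one coefficient.
    luts = [build_lut(coefficient, portion_bits) for _ in range(n_portions)]
    mask = (1 << portion_bits) - 1
    total = 0
    for i, lut in enumerate(luts):
        portion = (word >> (i * portion_bits)) & mask
        # Weight the partial product by the portion's position (assumed step).
        total += lut[portion] << (i * portion_bits)
    return total


# Illustrative values: coefficient 37, input word 0xBEEF.
tap = lut_group_tap(37, 0xBEEF)
entries_in_group = 4 * (1 << 4)
```

With these assumed sizes the group stores 64 entries in total, compared with the 65,536 entries a single table addressed by the full 16-bit word would need.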
a illustrates the implementation of a tap for a 4-bit sample using a 4-bit LUT in accordance with an embodiment of the invention.
b is an illustration of a FIR filter implemented using LUTs in accordance with an embodiment of the invention.
Use of the same reference symbols in different figures indicates similar or identical items.
Reference will now be made in detail to one or more embodiments of the invention. While the invention will be described with respect to these embodiments, it should be understood that the invention is not limited to any particular embodiment. On the contrary, the invention includes alternatives, modifications, and equivalents as may come within the spirit and scope of the appended claims. Furthermore, in the following description, numerous specific details are set forth to provide a thorough understanding of the invention. The invention may be practiced without some or all of these specific details. In other instances, well-known structures and principles of operation have not been described in detail to avoid obscuring the invention.
A lookup table (LUT)-based digital filter implementation is provided in which both the number of taps and the bits of precision used per tap are configurable. To provide an efficient implementation, a LUT is configured to implement one or more taps of a digital filter (the multiplication part of the desired MAC function). For example, if each sample has four bits of precision, there are thus sixteen potential values for each sample. In turn, there are thus sixteen potential values for each tap output. Turning now to
Turning now to
It will be appreciated that the required number of entries in each LUT (corresponding to integer m) will increase as the precision is increased. In that regard, communication protocols requiring, for example, 16 bits of precision at each tap are quite common. For example, handsets configured for either WCDMA or CDMA2000 require digital filters having 16 bits of precision at each tap. Each LUT would then require 64K entries to provide such a precision value. To avoid providing memory space for such relatively-large LUTs, each LUT may comprise a group of LUTs such that each group of LUTs implements a tap. For example, turning now to
The coefficient value used to configure LUTs 305 in LUT group 300 will be represented by Ci to indicate that it represents an arbitrary coefficient tap value (such as C0, C1, etc from
As discussed with regard to
Configurable digital filters incorporating the LUT-based approach described herein may be implemented using an arbitrary number of LUTs. In addition, the bit size (number of entries) within each LUT is also arbitrary. The number of LUTs used and their size may thus be adjusted to suit individual design needs. Turning now to
Because each LUT group 405 processes 16 bits of one or more taps at a time, eight LUT groups 405 process 128 bits in parallel. A buffer 420 is thus required to provide these 128 bits. To aid in the retrieval of the appropriate bits, buffer 420 may be organized as a 256-bit wide memory, wherein each line of 256 bits is formed from two logical 128-bit wide memories: an even buffer 425 and an odd buffer 430. Operation of buffer 420 may be better understood with regard to the following example. Suppose a digital filter is being implemented having eight taps with 16-bit precision. Each line of even buffer 425 and of odd buffer 430 comprises eight input samples, which may be considered as being stored in a zeroth word location through a seventh word location as illustrated in
It will thus be appreciated that LUT groups 405 require samples selected from an “even-odd” line across buffer 420 or from an “odd-even” line across buffer 420. Referring back to
Multiple output samples may be produced in parallel by configurable device 400. For example, suppose a digital filter to be implemented has four taps of four-bit precision. Each LUT group 405 may thus implement an instantiation of this filter. In that regard, a first LUT group 405 may process a first through a fourth input sample to provide an output sample. The subsequent output sample may be provided by an adjacent LUT group 405 by processing a second through a fifth input sample, and so on. It will thus be appreciated that each LUT group may be provided the appropriate input bits through selection by shifters 460. With regard to the preceding example, a first shifter 460 would select the first through fourth input samples whereas a second shifter 460 would select the second through fifth input samples, and so on.
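The sliding selection performed for each LUT group can be sketched in Python as follows. This is a software illustration only: the name `parallel_outputs` is hypothetical, samples are assumed to be unsigned 4-bit values, and pairing `coefficients[i]` with the i-th sample of each window is an assumed ordering.

```python
def parallel_outputs(samples, coefficients, n_groups):
    """Return up to `n_groups` filter outputs, one per sliding window."""
    n_taps = len(coefficients)
    # One 16-entry table of coefficient multiples per tap (4-bit precision).
    luts = [[c * value for value in range(16)] for c in coefficients]
    outputs = []
    for group in range(n_groups):
        # Window selection loosely models the role of shifters 460.
        window = samples[group:group + n_taps]
        if len(window) < n_taps:
            break  # not enough samples left for another full window
        outputs.append(sum(lut[x] for lut, x in zip(luts, window)))
    return outputs


# Four taps, three adjacent windows over six illustrative samples.
outs = parallel_outputs([1, 2, 3, 4, 5, 6], [1, 2, 3, 4], n_groups=3)
```

Each loop iteration stands in for one LUT group working on its own window; in the device these windows would be processed at the same time rather than one after another.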
An adder network 470 processes the outputs from LUT groups 405 to provide an output word. In one embodiment, the output word may be a 256-bit wide output word. This output word is then provided to buffer 420. In that regard, buffer 420 comprises an input buffer, an intermediate buffer, and an output buffer (none of which is illustrated). When operating as an input buffer, buffer 420 receives input samples from a source (not illustrated) such as an analog-to-digital converter. Should multiple filters be implemented simultaneously by configurable device 400, the output from adder network 470 is written to the intermediate buffer, which then provides the input word to MUX 440 as discussed above. If all required digital filtering has been completed, the output from adder network 470 may be written to the output buffer. The contents of the output buffer may be provided to a frame buffer (not illustrated). A micro-controller 480 controls operation of configurable device 400. For example, micro-controller 480 controls the loading of the appropriate coefficient multiples into the LUTs within LUT groups 405. In addition, micro-controller 480 controls the retrieval of input samples from buffer 420, and so on.
Operation of adder network 470 may be better understood with regard to
Conversely, should each LUT group 405 correspond to a tap of, for example, a 16-tap filter having 16 bits of precision, the first eight taps may be processed and stored in an accumulator 650. The next eight taps may then be processed and added to the previous tap values through feedback from accumulator 650 in a summer 660. An output 665 of accumulator 650 may then form output word 605a. It will be appreciated that filters having greater than 16 taps of 16-bit precision may be processed analogously through additional tap calculations and corresponding summations at accumulator 650. Adder network 470 has further configurability as well. In one embodiment, output samples from a digital filter may be produced in parallel through appropriate configuration of LUT groups 405 and adder network 470. For example, if each output sample is implemented using two LUT groups, there will be four output samples provided in parallel. These outputs then correspond to output words 605a through 605d, which are formed from the outputs of summers 620. On the other hand, if each output sample is implemented using four LUT groups, there will be two output samples provided in parallel. These outputs then correspond to output words 605a and 605b, which are formed using the outputs of summers 625.
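The configurable grouping and the accumulation across passes can be sketched as follows. The function names `adder_network` and `accumulate_passes` and the numeric values are illustrative assumptions; the sketch models only the arithmetic, not the actual summer topology.

```python
def adder_network(group_outputs, groups_per_output):
    """Sum adjacent LUT-group outputs into output words.

    With eight group outputs, groups_per_output=2 yields four words,
    4 yields two words, and 8 yields a single word.
    """
    if len(group_outputs) % groups_per_output:
        raise ValueError("group count must be a multiple of groups_per_output")
    return [
        sum(group_outputs[i:i + groups_per_output])
        for i in range(0, len(group_outputs), groups_per_output)
    ]


def accumulate_passes(passes):
    """Add the full sum of each pass to a running total, loosely modeling
    the feedback from accumulator 650 through summer 660."""
    accumulator = 0
    for group_outputs in passes:
        accumulator += adder_network(group_outputs, len(group_outputs))[0]
    return accumulator


# Eight illustrative LUT-group outputs.
outputs = [1, 2, 3, 4, 5, 6, 7, 8]
four_words = adder_network(outputs, 2)
two_words = adder_network(outputs, 4)
# A 16-tap filter handled as two passes of eight taps each.
sixteen_tap = accumulate_passes([outputs, outputs])
```

The same eight group outputs therefore serve four, two, or one output sample depending only on how many adjacent groups are summed together.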
Once LUT groups 405 have been loaded with the appropriate coefficient multiples to implement one or more digital filters, these groups must be reloaded with new coefficient sets to implement different digital filters. Moreover, should the digital filter be large (such as one with 16 taps and 16-bit coefficients), these groups would have to be reloaded just to implement a single digital filter. In such a case, a first cycle would process the first eight taps, whereupon LUT groups 405 would require reconfiguration to process the ninth through sixteenth taps in a second cycle. Such reconfigurations require time and thus add overhead to the required processing time.
To avoid this overhead, multiple page lookup tables may be implemented such that switching between filters may be performed in a single calculation cycle. It will be understood that a “calculation cycle” refers to those calculations that may be performed without re-loading the LUTs with new coefficient multiples. Turning now to
After a calculation cycle is finished, LUT pages 705 may be reloaded with new sets of coefficient multiples 830 to implement another digital filter if so desired. An address 820 provided by micro-controller 480 determines where a given coefficient multiple 830 will be written within LUT pages 705. During such configuration, multiplexers 840 select for addresses 820. However, during a calculation cycle, multiplexers 840 select for input portions 805.
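The two roles of the table address, configuration and calculation, can be sketched with a small Python class. This is a hypothetical model: the class name `PagedLUT`, its methods, and the explicit `page` argument are assumptions, since how a page is chosen during a calculation cycle is not specified in this passage.

```python
class PagedLUT:
    """Several pages of coefficient multiples sharing one address path."""

    def __init__(self, n_pages, bits=4):
        self.pages = [[0] * (1 << bits) for _ in range(n_pages)]

    def write(self, page, address, multiple):
        # Configuration: the address comes from the controller
        # (the case where multiplexers 840 select addresses 820).
        self.pages[page][address] = multiple

    def load_coefficient(self, page, coefficient):
        # Fill one page with every multiple of a single coefficient.
        for address in range(len(self.pages[page])):
            self.write(page, address, coefficient * address)

    def read(self, page, input_portion):
        # Calculation: the address is the input portion itself
        # (the case where multiplexers 840 select input portions 805).
        return self.pages[page][input_portion]


# Two pages preloaded with multiples of different illustrative coefficients.
lut = PagedLUT(n_pages=2)
lut.load_coefficient(0, 3)
lut.load_coefficient(1, 7)
first = lut.read(0, 5)
second = lut.read(1, 5)
```

Because both pages are loaded ahead of time, moving from one coefficient set to the other in this model needs only a different page index rather than a reload.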
The above-described embodiments of the present invention are merely meant to be illustrative and not limiting. For example, in addition to supporting a 4-bit LUT table mode, a configurable device such as device 400 of