The present invention relates generally to digital signal processing methods and devices, and more particularly to low latency digital filters.
The finite impulse response (FIR) filter is a basic digital signal processing building block. In its most basic form, a p-tap FIR filter transforms an incoming time domain signal S, formed of symbols S=S(0)S(1) . . . S(j), to produce
$$y(n) = C(0)S(n) + C(1)S(n-1) + C(2)S(n-2) + \cdots + C(p-1)S(n-p+1) \qquad (1)$$
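For reference, equation (1) can be evaluated directly as below. This is a minimal Python sketch of the conventional approach, not the invention: all p multiply-accumulates are performed after S(n) is available. The function name `fir_direct` is hypothetical, and treating samples before the start of the stream as zero is an assumption.

```python
def fir_direct(coeffs, samples, n):
    """Evaluate equation (1) for output index n, all p products at once.

    coeffs[k] is C(k) and samples[i] is S(i). Samples with a negative index
    are taken as zero (an assumption; start-up behaviour is not specified)."""
    acc = 0
    for k, c in enumerate(coeffs):   # p multiply-accumulates, all after S(n) arrives
        if n - k >= 0:
            acc += c * samples[n - k]
    return acc
```

For example, with coefficients [1, 2, 3] and symbols [1, 1, 2, 0], evaluating n = 0..3 gives the outputs 1, 3, 7 and 7.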
C(0), C(1), C(2) . . . C(p−1) are said to be the filter coefficients. FIR filters are detailed generally in A. V. Oppenheim and R. W. Schafer, “Discrete-Time Signal Processing,” Prentice-Hall, Englewood Cliffs, N.J., 1989, the contents of which are hereby incorporated by reference.
Proper choice of filter coefficients C(0), C(1) . . . C(p−1), in turn, allows the filter to transform the incoming signal in a multitude of ways.
As is readily appreciated, each output of a p-tap FIR filter relies on p symbols of the incoming signal S. Typical FIR filter implementations, as for example detailed in U.S. Pat. No. 6,367,003, therefore buffer the p incoming samples and perform the entire calculation of equation (1) to determine the filter output y(n) after arrival of the nth sample S(n).
The delay (or latency) of the filter after arrival of the nth sample is equal to the time required to perform p filter calculations. For many real time applications, significant delay is not tolerable. As such, the rate at which calculations are performed is typically greater than the symbol arrival rate. However, there are practical limits to the rate at which filter calculations are performed, introduced by such things as filter power requirements, electrical interference, and the like.
Accordingly, there is a need for a DSP FIR filter that introduces less delay than conventional DSP FIR filters.
In accordance with the present invention, a FIR filter pre-calculates C(1)*S(n−1), C(2)*S(n−2) . . . C(p−1)*S(n−p+1) prior to the arrival of sample S(n). As such,

$$y(n) = C(0)S(n) + \left[C(1)S(n-1) + C(2)S(n-2) + \cdots + C(p-1)S(n-p+1)\right]$$

may be calculated as a result of a single further multiply and accumulate operation upon arrival of the symbol S(n). This significantly reduces the latency of the filter.
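The pre-calculation idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the described hardware: symbols are assumed to arrive one at a time through a hypothetical `push()` method, and the class and attribute names are illustrative only. The bracketed sum is formed between arrivals, so only C(0)*S(n) remains to be added once S(n) arrives.

```python
class PrecalcFIR:
    """Sketch of the low-latency FIR idea (hypothetical class, not from the text).

    The sum C(1)S(n-1) + ... + C(p-1)S(n-p+1) is formed before S(n) arrives,
    so only one multiply-accumulate is needed once it does."""

    def __init__(self, coeffs):
        self.c = list(coeffs)                    # C(0) .. C(p-1)
        self.history = [0] * (len(self.c) - 1)   # S(n-1), S(n-2), ..., S(n-p+1)
        self.partial = 0                         # pre-calculated sum (zero history at start)

    def _precalculate(self):
        # Performed between symbol arrivals: p-1 multiply-accumulates.
        self.partial = sum(ck * s for ck, s in zip(self.c[1:], self.history))

    def push(self, s_n):
        # Upon arrival of S(n): a single further multiply-accumulate.
        y = self.partial + self.c[0] * s_n
        # Shift S(n) into the history, then pre-calculate for S(n+1).
        if self.history:
            self.history = [s_n] + self.history[:-1]
        self._precalculate()
        return y
```

Feeding the symbols 1, 1, 2, 0 through `PrecalcFIR([1, 2, 3])` returns 1, 3, 7, 7, the same outputs as a direct evaluation of equation (1), but only `self.c[0] * s_n` is computed on the latency-critical path.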
In accordance with a first aspect of the present invention, a method of filtering a digital stream of symbols S(i) using a pth order finite impulse response filter having filter coefficients C(0), C(1) . . . C(p−1) includes pre-calculating the sum

$$X(n) = \sum_{k=1}^{p-1} C(k)S(n-k)$$

between the arrival of the (n−1)th and nth symbols. Upon arrival of the nth of the symbols,

$$y(n) = C(0)S(n) + X(n)$$

is calculated, using the pre-calculated sum, prior to the arrival of the (n+1)th symbol.
In accordance with yet another aspect of the present invention, a digital filter includes a symbol buffer having p−1 storage locations for storing arriving symbols; a coefficient memory having p−1 storage locations; a counter for counting from k=0 to p−1; and a multiply and accumulate (MAC) block for multiplying and accumulating C(k)*S(n−k) from the storage locations of the symbol buffer and the coefficient memory. The counter provides control signals to cause the MAC block to calculate

$$X(n) = \sum_{k=1}^{p-1} C(k)S(n-k)$$

between the arrival of the (n−1)th and nth of the symbols; and, upon arrival of the nth of the symbols, to calculate

$$y(n) = C(0)S(n) + X(n).$$
Other aspects and features of the present invention will become apparent to those of ordinary skill in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures.
In the figures which illustrate by way of example only, embodiments of the present invention,
Filter 10 is a p-tap FIR digital filter, suitable for filtering an incoming stream of symbols S(i) in accordance with the time domain transfer function

$$y(n) = \sum_{i=0}^{p-1} C(i)S(n-i),$$

where C(i) are the coefficients of the FIR filter.
As will be appreciated, coefficients C and symbols S may be real or complex, and may thus be expressed as C(k)=CI(k)+jCQ(k) and S(k)=SI(k)+jSQ(k), with CI, SI and CQ, SQ representing the real and imaginary components, respectively, of C and S. For purposes of illustration only, SI, SQ, CI and CQ are assumed to take m=8 bit values.
As illustrated, filter 10 includes a symbol buffer 12, storing arriving symbols S and providing symbol values to multiply and accumulate (MAC) unit 14, by way of a data selector 26. Coefficient memory 16 is further in communication with MAC unit 14, by way of a data selector 24. Optionally, a coefficient adaptation block 18 may be interposed between selector 24 and MAC unit 14, to make filter 10 adaptive. A symbol clock generator 20 generates clock pulses upon the arrival of symbols S, and is used to reset an internal clock/counter 22 used by filter 10, as detailed below. A latch 28 may latch filter output values calculated by MAC unit 14.
For reasons that will become apparent, clock/counter 22 generates a count k from 0 to p−1, after being reset by symbol clock 20. With each count, a clock pulse is output on line CLK. As count k transitions from 0 to 1, an advance signal is output on line ADV. Clock/counter 22 increments at a rate that is at least p times as great as the average symbol rate R.
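The counter schedule can be illustrated with a cycle-level Python sketch. The modelling choices are assumptions made for illustration, not taken from the text: values are real only, one multiply-accumulate is performed per counter step, the buffer is a Python list whose location 0 receives each arriving symbol, and the ADV strobe at the k: 0 to 1 transition both shifts the buffer right and clears the accumulator. The function name is hypothetical.

```python
def simulate_filter10(coeffs, symbols):
    """Cycle-level sketch of the clock/counter 22 schedule (real values only).

    Returns (outputs, steps), where steps[i] is the number of MAC steps
    executed after symbol i arrived before its output was available."""
    p = len(coeffs)
    if p < 2:
        raise ValueError("sketch assumes p >= 2 so that ADV is generated")
    buf = [0] * p          # buf[k] models storage location k
    acc = 0                # holds the pre-calculated sum when a symbol arrives
    outputs, steps = [], []
    for s in symbols:
        buf[0] = s                          # symbol stored; counter reset to k = 0
        for k in range(p):
            if k == 1:                      # ADV: shift right, clear accumulator
                buf = [0] + buf[:-1]
                acc = 0
            acc += coeffs[k] * buf[k]       # one multiply-accumulate per step
            if k == 0:                      # y(n) = pre-calculated sum + C(0)*S(n)
                outputs.append(acc)
                steps.append(k + 1)
    return outputs, steps
```

With coefficients [1, 2, 3] and symbols [1, 1, 2, 0], the sketch produces outputs 1, 3, 7, 7, each available after a single counter step; steps k=1..p−1 are spent pre-calculating the sum for the next symbol.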
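MAC unit 14 calculates

$$X_I(n) = \sum_{k=1}^{p-1} \left[S_I(n-k)C_I(k) - S_Q(n-k)C_Q(k)\right], \qquad X_Q(n) = \sum_{k=1}^{p-1} \left[S_I(n-k)C_Q(k) + S_Q(n-k)C_I(k)\right].$$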
A schematic block diagram of an example MAC unit 14, buffers 12 and 16 is illustrated in
Adder 32a thus calculates
SI(k)*CI(k)−SQ(k)*CQ(k).
Adder 32b calculates
CQ(k)*SI(k)+SQ(k)*CI(k).
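The two adder outputs amount to a complex multiply built from four real products. A minimal Python sketch follows; the function name is hypothetical, and the assignment of products to individual multipliers 30a to 30d is an assumption, since the text only names the adder outputs.

```python
def complex_mac_step(si, sq, ci, cq):
    """Real and imaginary parts of (SI + jSQ)*(CI + jCQ) from four real products."""
    real = si * ci - sq * cq   # output of adder 32a
    imag = cq * si + sq * ci   # output of adder 32b
    return real, imag
```

For example, `complex_mac_step(3, 4, -2, 5)` returns (-26, 7), matching the complex product (3+4j)*(−2+5j).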
Put another way, adders 32a and 32b output the real and imaginary portions of the complex product (SI+jSQ)*(CI+jCQ). Outputs of adders 32a and 32b are respectively provided to accumulate blocks 34a and 34b. In the example embodiment, with m=8, accumulate blocks 34a and 34b are sixteen-bit accumulate blocks that add the values at their inputs to the previously accumulated values upon each pulse at line CLK. The values of accumulate blocks 34a and 34b may be reset to zero upon receipt of a reset signal on line CLR.
The organization of symbol buffer 12 is also illustrated in
Buffer 12 is a first-in, first-out (FIFO) buffer. A buffer advance input causes the buffer to advance. That is, buffer 12 acts in a manner similar to a shift register: a buffer advance signal at line ADV causes elements within buffer 12 to be shifted right, from one storage element into the adjacent storage element. The value of the right-most storage element may optionally be output by buffer 12, for cascading of buffers/filters as detailed below.
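The shift-register behaviour of buffer 12 can be sketched in Python with a fixed-length `collections.deque`. The class and method names are hypothetical; `advance()` stands in for a pulse on line ADV and returns the value shifted out of the right-most location, which is what would be passed on when cascading.

```python
from collections import deque


class SymbolBuffer:
    """Sketch of buffer 12 as a p-location shift register (hypothetical names)."""

    def __init__(self, p):
        self.cells = deque([0] * p, maxlen=p)   # cells[k] is storage location k

    def store(self, symbol):
        self.cells[0] = symbol                  # arriving symbol goes to location 0

    def advance(self):
        out = self.cells[-1]                    # right-most value, available for cascading
        self.cells.appendleft(0)                # maxlen drops the right-most element
        return out

    def __getitem__(self, k):
        return self.cells[k]
```

Storing 1, 2 and 3 with an advance between each store leaves a three-location buffer holding [3, 2, 1]; one further advance shifts out 1 and leaves [0, 3, 2].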
A data selector 26 interconnects symbol buffer 12 to MAC unit 14. Data selector 26 is an eight (p) sixteen-bit-input, two eight-bit-output data selector that selects which one of the eight storage locations of buffer 12 is provided as outputs SI and SQ.
The organization of coefficient memory 16 is similarly illustrated in
In operation, symbols arrive at buffer 12. The current symbol S(i)=(SI, SQ) is stored in location 0 of buffer 12. Upon the arrival of each symbol, a symbol synch pulse is generated by clock 20 to initialize clock/counter 22. Clock/counter 22 counts from k=0 to p−1. Selectors 24 and 26 are controlled by the value k of clock/counter 22 to provide the contents of the kth location of buffer 12 and of coefficient memory 16 to MAC unit 14.
When the counter value k transitions to a value of 1, the decoder strobes the ADV line and resets accumulators 34a and 34b of MAC unit 14 (
The value k of clock/counter 22 and the generation of a CLK and ADV signals are illustrated in
As will now be appreciated, upon arrival of the next symbol S(n), yI(n) and yQ(n) may be calculated as
yI(n)=XI(n)+SI(n)*CI(0)−SQ(n)*CQ(0),
and
yQ(n)=XQ(n)+SI(n)*CQ(0)+SQ(n)*CI(0).
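The split between pre-calculation and the final step can be checked numerically with a short Python sketch. It assumes, for brevity, that coefficients and symbols are Python `complex` values and that `history[k-1]` holds S(n−k); the function names are hypothetical. The final step adds the real and imaginary parts of C(0)*S(n) to the pre-calculated XI(n) and XQ(n).

```python
def precalc_x(coeffs, history):
    """(XI, XQ): real/imag parts of sum_{k=1}^{p-1} C(k)S(n-k).

    coeffs[k] is C(k); history[k-1] is S(n-k)."""
    pairs = list(zip(coeffs[1:], history))
    xi = sum(c.real * s.real - c.imag * s.imag for c, s in pairs)
    xq = sum(c.imag * s.real + c.real * s.imag for c, s in pairs)
    return xi, xq


def finish_output(xi, xq, s_n, c0):
    """The single post-arrival MAC: add C(0)*S(n) to (XI, XQ)."""
    yi = xi + s_n.real * c0.real - s_n.imag * c0.imag
    yq = xq + s_n.real * c0.imag + s_n.imag * c0.real
    return yi, yq
```

With C = [1+2j, 3−1j, −2+1j], S(n−1) = 2+1j, S(n−2) = −1+3j and S(n) = 1−1j, the pre-calculated sum is (6, −6) and the finished output is (9, −5), i.e. 9−5j, which equals the direct evaluation of the transfer function.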
Conveniently, this requires only a single calculation by MAC unit 14 (i.e. a single calculation by each of multipliers 30a, 30b, 30c and 30d, adders 32a and 32b, and accumulate blocks 34a and 34b).
As noted, buffer 12 stores each arriving symbol S(i) in its 0th location. So, as S(n) arrives, it will be stored in location 0 of buffer 12. S(n) will thus be presented to MAC unit 14 as the value of clock/counter 22 advances from p−1 to 0. MAC unit 14 thus calculates yQ(n) and yI(n), after one transition of clock/counter 22. The output of MAC unit 14 may optionally be latched in storage element 28 (
As counter 22 increments to a value of 1, S(n) is shifted into location 1 of buffer 12, and XI(n+1) and XQ(n+1) may be pre-calculated as clock/counter 22 increments from k=1 to k=p−1. Again, upon the arrival of S(n+1), yI(n+1) and yQ(n+1) are calculated.
For convenience, the contents of buffer 12 for the arrival of symbols S(n−1), S(n) and S(n+1) are illustrated in
Conveniently, at steady state, the latency between the arrival of each symbol S(n) and the filter output y(n) is only a single transition of clock/counter 22.
Optionally, filter 10 could be made adaptive by updating filter coefficients C(k). This may be effected by adaptation block 18 operating on the values of C(k) at the same time as XI and XQ are pre-calculated. Adapted filter coefficient values could be placed within coefficient memory 16 at some time between one use of a filter coefficient and the next use of that filter coefficient.
Filter 10 may easily be modified to operate on only real valued coefficients and symbols.
Buffer 112 is like buffer 12 (
As illustrated, MAC unit 114 includes two multipliers 130a and 130b calculating
C(k)*S(k), and
S(k+8)*C(k+8).
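The benefit of two parallel multipliers can be sketched in Python. The assumptions here are illustrative and not confirmed by the text: a 16-tap real-valued filter, `buf[k]` holding the symbol in storage location k, and the second multiplier handling the upper half of the locations (k+8). The function name is hypothetical. Each counter step produces one term of each partial sum, so a full pass takes 8 steps instead of 16.

```python
def two_multiplier_partial_sums(coeffs, buf):
    """Sketch: two multipliers working on the lower and upper halves of 16 taps.

    Returns (lower_sum, upper_sum, counter_steps)."""
    if len(coeffs) != 16 or len(buf) != 16:
        raise ValueError("sketch assumes a 16-tap filter")
    half = len(coeffs) // 2
    lower = upper = 0
    steps = 0
    for k in range(half):
        lower += coeffs[k] * buf[k]                 # multiplier 130a: location k
        upper += coeffs[k + half] * buf[k + half]   # multiplier 130b: location k + 8
        steps += 1
    return lower, upper, steps
```

With coefficients 0 to 15 and every location holding 1, the partial sums are 28 and 92, their total 120 equals the full 16-tap sum, and only 8 counter steps are used.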
So, MAC unit 114 contemporaneously calculates the partial sums
As explained with reference to filter 10 and
In yet another alternate embodiment, depicted in
Once again, yI(n) and yQ(n) may be calculated a fraction of a symbol clock cycle after the arrival of S(n).
A person of ordinary skill will readily appreciate that filters 10, 10′ and 10″ may actually be formed as a single configurable filter, using conventional large (or very large) scale integration (LSI/VLSI) design and fabrication techniques. That is, configuration inputs (not shown) may select which one of the three configurations is enabled, allowing simple reconfiguration of a single filter to operate in one of the three depicted modes.
The above described filters 10, 10′ and 10″ may be combined to form higher order or higher precision filters.
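One way two p-tap sections can form a 2p-tap filter is sketched below in Python. The arrangement is an assumption for illustration: the second section is fed the value shifted out of the first section's right-most storage location (so it sees the stream delayed by p symbols), and a summer adds the two section outputs. The function name is hypothetical, and for brevity the sketch computes each section's output directly rather than with pre-calculation.

```python
def cascaded_fir(coeffs_a, coeffs_b, symbols):
    """Sketch: two p-tap sections combined into one 2p-tap filter.

    Section A holds S(n)..S(n-p+1); section B holds S(n-p)..S(n-2p+1)."""
    p = len(coeffs_a)
    if len(coeffs_b) != p:
        raise ValueError("sketch assumes equal-length sections")
    buf_a = [0] * p
    buf_b = [0] * p
    out = []
    for s in symbols:
        shifted_out = buf_a[-1]                 # value leaving A's right-most location
        buf_a = [s] + buf_a[:-1]
        buf_b = [shifted_out] + buf_b[:-1]      # B receives A's overflow
        y_a = sum(c * x for c, x in zip(coeffs_a, buf_a))
        y_b = sum(c * x for c, x in zip(coeffs_b, buf_b))
        out.append(y_a + y_b)                   # summer combines the two sections
    return out
```

Splitting the coefficients [1, 2, 3, 4] into sections [1, 2] and [3, 4] and applying an impulse yields the impulse response 1, 2, 3, 4, 0, exactly as a single 4-tap filter would.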
For example,
Likewise, filter 10″ (
Similarly, the summer of
Multiple filters 10′ could be interconnected as illustrated in
Of course, the above described embodiments are intended to be illustrative only and in no way limiting. The described embodiments of carrying out the invention are susceptible to many modifications of form, arrangement of parts, details and order of operation. The invention, rather, is intended to encompass all such modifications within its scope, as defined by the claims.