1. Technical Field
This disclosure relates to the digital filtering of signals, and particularly to the optimization of digital filter computations in a processor.
2. Background
The digital filter is an important building block in the digital signal processing of audio information. As is well known in the art, digital filters can provide high precision processing of audio signals at very low cost, especially for audio applications in which the audio content emanates from a digital source to begin with. The capability of digital filters to precisely process audio signals has especially increased with the high performance digital signal processors (DSPs) that are now available. These advances have also resulted in custom and semi-custom logic circuits that have built-in digital filter blocks.
The infinite-impulse response (IIR) digital filter is an important type of digital filter for audio processing. The second order IIR digital filter, commonly referred to as a “biquad”, is a popular IIR building block, and can be cascaded to provide very high order digital filter functions at low cost and high efficiency.
Modern logic architectures have achieved some efficiencies in the execution of a biquad digital filter by identifying those operations that can be performed in parallel with one another. For example, a conventional biquad architecture can be implemented by way of a single multiply-and-accumulate stage (not illustrated). However, further optimizations are desirable.
The number of clock cycles required for execution of a biquad can become a critical parameter in the implementation of a digital signal processing function. In the audio processing context, the degree or extent to which digital filtering can be performed on an audio channel is limited by the amount of latency that can be tolerated in the system, and by the available clock rate. Conversely, if the desired level of filtering can be accomplished with fewer clock cycles, either the clock rate of the digital filters can be reduced, reducing the cost of the audio processor, or alternatively additional functionality may be implemented within the audio signal flow. In either case, a reduction in the number of clock cycles that are required to carry out digital filtering directly translates into lower cost, or improved functionality, in an audio processing system.
The method disclosed here is adaptable to an integrated-circuit hardware optimization whereby a normally fixed algorithm to calculate a second-order IIR is modified to reduce the number of writes to storage elements that must be performed in order to compute the IIR.
By way of background,
Y(n)=B0·X(n)+B1·X(n−1)+B2·X(n−2)+A1·Y(n−1)+A2·Y(n−2)
where the sample indices n−1, n−2 refer to previous values of the input and output data streams. Referring to
From this representation, one can readily derive the number of digital operations necessary for implementing a biquad digital filter. The necessary operations for conventional realizations (using registers for temporary storage):
These twenty-five operations can readily be seen from the Direct Form I illustration of
There are many ways to compute an IIR using software, hardware, pencil and paper, etc. For integrated circuit designers, this is often done (for many reasons, taking into account residual error, saturation, number of required bits, available storage, MAC operations, etc.) using a Direct Form I architecture, as shown in the figures. With this arrangement, and for each IIR sample calculation, each storage element, labeled J, K, L, M, is both written and read. However, it is possible to cut in half the number of required writes if the inputs and outputs of adjacent storage elements can be alternated on the fly in a specific manner.
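By way of illustration only, the conventional Direct Form I update described above may be sketched in Python as follows. The assignment of J and K to X(n−1) and X(n−2), and of L and M to Y(n−1) and Y(n−2), is an assumption made for this sketch, as are the function name `biquad_df1_conventional` and the write counter. The sketch follows the sign convention of the difference equation given earlier, in which the A1 and A2 terms are added.

```python
def biquad_df1_conventional(samples, b0, b1, b2, a1, a2):
    """Direct Form I biquad in which every storage element is rewritten each sample.

    Assumed mapping (not taken from the figures): J = X(n-1), K = X(n-2),
    L = Y(n-1), M = Y(n-2).
    Returns the output samples and the total number of storage writes.
    """
    J = K = L = M = 0.0
    writes = 0
    out = []
    for x in samples:
        # Four reads: J, K, L and M each contribute one product term.
        y = b0 * x + b1 * J + b2 * K + a1 * L + a2 * M
        # Four writes: shift both delay lines by one position.
        K = J
        J = x
        M = L
        L = y
        writes += 4
        out.append(y)
    return out, writes
```

Under these assumptions, each sample period costs four reads and four writes, which is the baseline against which the halving of writes is measured.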
This can be accomplished by hardware that alternates states for every sample period, called here a “frame.” That is, the hardware switches between the states shown in
The EvenFrame signal should be built into the instruction set such that there is no overhead to execute instructions. A processor having such a signal is the QF3DFX processor, manufactured by Quickfilter Technologies.
Assume that all data samples (J, K, L, and M) are in a single RAM. By convention, we allocate the
The following table is an example of the manipulation of the address pointers:
The code executing the filter reads the EvenFrame signal and, based on its value, either adds 1 to the RAM address pointer, or subtracts 1 from the address pointer. When EvenFrame is 0, the address pointer will access the RAM in the usual way. When EvenFrame is 1, at the point where there would normally be a reference to K, the logic adds 1 to the address pointer, meaning it will access J instead.
At the point where there would normally be a reference to J, the logic subtracts 1 from the address pointer, meaning it will access K instead. A similar sequence is used for L and M.
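The pointer adjustment just described may be sketched as a small Python function. The address values below are hypothetical and chosen only so that J sits one location above K, and L one location above M, as the add-1 and subtract-1 behavior implies; the function name `resolve_address` is likewise an assumption of this sketch.

```python
# Hypothetical address map, assumed for illustration only.
K_ADDR, J_ADDR, M_ADDR, L_ADDR = 0, 1, 2, 3

def resolve_address(nominal_addr, even_frame):
    """Return the RAM address actually accessed for a nominal reference.

    even_frame models the EvenFrame signal (0 or 1).
    """
    if nominal_addr not in (K_ADDR, J_ADDR, M_ADDR, L_ADDR):
        raise ValueError("address is outside the assumed map")
    if not even_frame:
        # EvenFrame = 0: the RAM is accessed in the usual way.
        return nominal_addr
    if nominal_addr in (K_ADDR, M_ADDR):
        # A reference to K (or M) is redirected up by one, reaching J (or L).
        return nominal_addr + 1
    # A reference to J (or L) is redirected down by one, reaching K (or M).
    return nominal_addr - 1
```

In hardware the same redirection would be applied by the addressing logic itself, so the branch shown here stands in for behavior that is intended to cost no additional instructions.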
Assume the address map from above, and that X(0) is already in a variable called R0. The following pseudocode for each sample period shows the alternating pointer created by the EvenFrame signal and its application to the data in RAM:
The equivalent operation could be done in prior-art software, but every software operation would require checking the state of the EvenFrame signal and then determining which addressing variant of the biquad operation to use. Such an operation would consume more clock cycles than the embodiments disclosed, and probably more clock cycles than the standard way of implementing the biquad calculation. With the disclosed embodiments, the number of writes can be cut in half, while the number of reads remains the same. There is no need for the data to be written into each register on every frame. Because the same data is accessed twice, once in frame N and once in frame N+1, it can remain where it is, and the addressing changes so that the data itself does not need to be written twice.
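The data movement behind the halved write count may be modeled in Python as follows. This is a behavioral sketch only, not the pseudocode of the disclosure: the RAM layout, the initial value of the EvenFrame signal, the mapping of J, K, L, M to delayed samples, and the function name `biquad_df1_alternating` are all assumptions. Because the signal is modeled here as an ordinary software variable, the sketch illustrates which locations are read and written, not the cycle savings of a hardware EvenFrame signal.

```python
def biquad_df1_alternating(samples, b0, b1, b2, a1, a2):
    """Direct Form I biquad using two writes per sample via address swapping.

    Assumed RAM layout (hypothetical): K at 0, J at 1, M at 2, L at 3, with
    J = X(n-1), K = X(n-2), L = Y(n-1), M = Y(n-2) when even_frame is 0.
    Returns the output samples and the total number of storage writes.
    """
    K_ADDR, J_ADDR, M_ADDR, L_ADDR = 0, 1, 2, 3
    ram = [0.0, 0.0, 0.0, 0.0]
    even_frame = 0  # models the EvenFrame signal; toggles every sample period
    writes = 0
    out = []
    for x in samples:
        # References to K and M move up by one, and references to J and L
        # move down by one, whenever even_frame is 1.
        k = K_ADDR + even_frame
        j = J_ADDR - even_frame
        m = M_ADDR + even_frame
        l = L_ADDR - even_frame
        # Four reads, as in the conventional form.
        y = b0 * x + b1 * ram[j] + b2 * ram[k] + a1 * ram[l] + a2 * ram[m]
        # Two writes: the new input and output overwrite the oldest samples.
        # The values at j and l stay in place and become the n-2 samples
        # once the addressing swaps in the next frame.
        ram[k] = x
        ram[m] = y
        writes += 2
        even_frame ^= 1
        out.append(y)
    return out, writes
```

For an impulse input and the same coefficients, this model produces the same output sequence as a conventional Direct Form I update while performing two writes per sample instead of four.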
None of the description in this application should be read as implying that any particular element, step, or function is an essential element which must be included in the claim scope; the scope of patented subject matter is defined only by the allowed claims. Moreover, none of these claims are intended to invoke paragraph six of 35 U.S.C. Section 112 unless the exact words “means for” are used, followed by a gerund. The claims as filed are intended to be as comprehensive as possible, and no subject matter is intentionally relinquished, dedicated, or abandoned.