The present invention relates generally to polyphase rate change filters and, more particularly, to a polyphase rate change filter with multiple branches.
In digital hardware, it is possible to modify the sampling rate of a signal, for example from 400 MHz to 600 MHz, by implementing a digital filter called a polyphase finite-impulse response (FIR) rate change filter (RCF). In a polyphase FIR rate change filter, every output sample y(m) is generated by multiplying the input sample stream with a subset of the filter coefficients (also called a phase), and by summing the resulting products. The upsampling and downsampling factors, denoted U and D respectively, are determined by the ratio of the filter input and output sample rates. In the example of a stream being rate changed from 400 MHz to 600 MHz, the U and D factors could be almost any pair of integers whose ratio U/D equals 1.50. In this example, the upsampling and downsampling factors could be: U=150 and D=100.
In advanced communication systems, very fast data rates are sometimes needed to implement a group of signal processing functions. However, these rates may be too fast to be realized in digital hardware using the existing technologies, such as Application Specific Integrated Circuits (ASICs) and Field Programmable Gate Arrays (FPGAs). Therefore, there is a need for new designs for rate change filters using existing technologies that can increase the effective processing speed for high data rate applications.
The present invention relates to a rate change filter having multiple branches. The multi-branch rate change filter of the present invention achieves higher effective output rates by processing the input sample stream in two or more parallel filter branches with offset states and optionally combining the output samples from each branch.
Exemplary embodiments of the invention comprise methods for filtering an input sample stream having a first sample rate to generate an output sample stream having a second sample rate. In one exemplary embodiment, the method comprises inputting the sample stream to a rate change filter with two or more filter branches having offset states, and filtering the input sample stream in parallel filter branches with filter coefficients corresponding to different phases to generate multiple output sample substreams.
Other exemplary embodiments of the invention comprise a rate change filter configured to filter an input sample stream having a first sample rate to generate an output sample stream having a second sample rate. In one embodiment of the invention, the rate change filter comprises two or more parallel filter branches with offset states to filter the input sample stream using filter coefficients corresponding to different phases and to generate multiple output substreams; and a control circuit to control input of the input sample stream to the filter branches and the selection of filter coefficients for the parallel filter branches.
Referring now to the drawings,
The operation of the rate change filter 10 is given by the equation:
where
Lx is the length of the input sample stream x.
The upsampling and downsampling factors, U and D, are determined by the ratio of the filter input and output sample rates. In the example of an input sample stream being converted from 400 MHz to 600 MHz, the U and D factors could be almost any pair of integers whose ratio U/D equals 1.50. In this example, the up and down factors could be: U=150 and D=100. The length of the filter impulse response is determined by the upsampling factor U and the number of filter taps N. One constraint on the design of the rate change filter 10 is the selection of the upsampling factor U so that the rate change filter 10 has a sufficient number of coefficients to provide good performance.
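The relationship between the sample rates and the factors U and D can be illustrated with a short Python sketch. The function name `rate_change_factors` and the `scale` parameter are hypothetical and are introduced only for illustration; the sketch assumes integer sample rates expressed in hertz.

```python
from fractions import Fraction


def rate_change_factors(f_in_hz, f_out_hz, scale=1):
    """Return (U, D) such that U / D equals f_out_hz / f_in_hz.

    With scale=1 the smallest integer pair is returned. A larger scale
    multiplies both factors, which keeps the ratio but yields more phases
    (hypothetical knob, used here only to illustrate the design freedom).
    """
    ratio = Fraction(f_out_hz, f_in_hz)  # reduced automatically
    return ratio.numerator * scale, ratio.denominator * scale


if __name__ == "__main__":
    print(rate_change_factors(400_000_000, 600_000_000))
    print(rate_change_factors(400_000_000, 600_000_000, scale=50))
```

Both pairs describe the same 400 MHz to 600 MHz conversion; the larger pair corresponds to the U=150, D=100 example given above.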
Equation 1 can be rewritten as:
The term Pm functions as a read pointer and the term Rm functions as a phase offset as hereinafter described.
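As a sketch in this description's symbols, the standard polyphase form from the Proakis and Manolakis reference cited in this description can be written as follows. The coefficient array h, the tap index n, and the assignment of the equation numbers are assumptions chosen to agree with the definitions of N, U, D, Rm, and Pm used here.

```latex
\begin{align}
  y(m) &= \sum_{n=0}^{N-1} h\bigl(nU + (mD) \bmod U\bigr)\,
          x\bigl(\lfloor mD/U \rfloor - n\bigr)
          && \text{(assumed form of Equation 1)} \\
  y(m) &= \sum_{n=0}^{N-1} h(nU + R_m)\, x(P_m - n)
          && \text{(assumed form of Equation 2)} \\
  R_m  &= (mD) \bmod U
          && \text{(phase offset, Equation 3)} \\
  P_m  &= \lfloor mD/U \rfloor
          && \text{(read pointer, Equation 4)}
\end{align}
```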
As the m index (the output sample stream index) is incremented from one cycle to another, the value of the phase offset Rm is either incremented by D, or incremented by D and decreased by a multiple of U, by definition of the modulo function. The multiple of U is determined by the expression ⌊mD/U⌋. For implementations where the ratio U/D is greater than one, the D factor will always be smaller than the U factor. With this constraint, the term ⌊mD/U⌋ will either remain constant, or will be incremented by one, from one output clock cycle to another in order to perform the modulo function with respect to U.
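The incremental behavior described above can be sketched in Python. This is a minimal illustration that assumes D is less than or equal to U and that Rm equals (mD) mod U and Pm equals ⌊mD/U⌋; the function name `single_branch_states` is hypothetical.

```python
def single_branch_states(U, D, count):
    """Yield (Rm, Pm) for m = 0 .. count-1 without using modulo or division.

    Assumes D <= U, so at most one subtraction of U is needed per step.
    """
    r, p = 0, 0
    states = []
    for _ in range(count):
        states.append((r, p))
        r += D            # phase offset advances by D every output sample
        if r >= U:        # modulo with respect to U
            r -= U
            p += 1        # read pointer moves to the next input sample
    return states


if __name__ == "__main__":
    for m, (r, p) in enumerate(single_branch_states(3, 2, 6)):
        print(m, r, p)
```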
Conceptually, the input sample stream is scanned by a sliding window of N samples, given by the term
in Equation 2, which is implemented by the shift register 16, as shown in
in Equation 2, which is dependent on the phase offset Rm (given in Equation 3). Thus, output sample 0 is generated using the coefficients of phase 0, output sample 1 is generated using the coefficients of phase 1, and so on until the maximum number of phases is reached, which corresponds to the upsampling factor U. The phase selection process then restarts at phase 0, so that the Uth output sample is generated using the coefficients of phase 0.
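The sliding window and the phase selection can be sketched as a reference model in Python. This is an illustration under stated assumptions, not the hardware design: the coefficients are assumed to be stored in a flat list h of length N×U so that phase r consists of h[r], h[U+r], and so on; the phase index used for output sample m is taken to be the phase offset Rm; and samples before the start of the stream are treated as zero. All function names are hypothetical.

```python
def phase_coefficients(h, U, r):
    """Return the N coefficients of phase r: h[r], h[U + r], h[2U + r], ..."""
    return h[r::U]


def sliding_window(x, N, p):
    """Return x[p], x[p-1], ..., x[p-N+1], using zero outside the stream."""
    return [x[p - n] if 0 <= p - n < len(x) else 0.0 for n in range(N)]


def polyphase_rcf(x, h, U, D):
    """Reference single-branch rate change filter (software model only)."""
    N = len(h) // U
    y = []
    m = 0
    while (m * D) // U < len(x):       # stop once the read pointer passes x
        r = (m * D) % U                # phase offset Rm
        p = (m * D) // U               # read pointer Pm
        coeffs = phase_coefficients(h, U, r)
        window = sliding_window(x, N, p)
        y.append(sum(c * s for c, s in zip(coeffs, window)))
        m += 1
    return y


if __name__ == "__main__":
    print(polyphase_rcf([2, 4], [1, 0.5], U=2, D=1))
```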
A detailed mathematical description of polyphase multi-rate filters can be found in chapter 11 of John G. Proakis and Dimitris G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, 4th edition, Prentice Hall, 2006.
When the sample rate of the input sample stream is increased by the rate change filter 10, the output rate may be too fast to realize with existing technologies using conventional designs for rate change filters. According to various embodiments of the present invention, higher effective output rates can be achieved by processing the input sample stream in a multi-branch rate change filter 20 with two or more parallel branches as conceptually shown in
Control logic (not shown in
In a two-branch rate change filter 20, two output samples are generated every output clock cycle. In the following description, the branches 24 are referred to individually as Branch 0 and Branch 1. Branch 0 produces the even output samples and Branch 1 produces the odd output samples. By definition of odd and even numbers, the output of Branch 0 (y0) and the output of Branch 1 (y1) can be written as:
The variable k in Equations 5 and 6 is the output stream index of each branch 24. By substituting the variable m in Equation 1 by its odd and even representations, the instantaneous output vector of the proposed dual-branch rate change filter 20 can be written as:
Comparing Equations 1 and 7, it may be noted that the number of cycles required to process the input sample stream in the dual branch rate change filter 20 is half the number of cycles required by a single branch rate change filter 10. For practical purposes, the input sample stream must have an even number of input samples. In cases where the input sample stream contains an odd number of samples, an extra zero may be appended to the sample stream for the last sample of Branch 1.
In order to implement Equations 8 and 9 in hardware, two state machines, denoted Sm0 and Sm1, operate in parallel. There is an offset of one state between Sm0 and Sm1. More particularly, the state of Sm0 is given by the expression 2k and the state of Sm1 is given by the expression 2k+1. During every output clock cycle, the states of Sm0 and Sm1 are each incremented by two. State machine Sm0 controls Branch 0 and tracks the state variables Rm0 and Pm0. From Equations 3, 4, and 8, the values of Rm0 and Pm0 are given by:
State machine Sm1 controls Branch 1 and tracks the state variables Rm1 and Pm1. From Equations 3, 4, and 9, the values of Rm1 and Pm1 are given by:
Combining Equations 7-12, Equations 7-9 can be rewritten as:
The values of Lx, D, and U are constants. In the case where U>D, the initial value of Rm1 simplifies to Rm0+D and the initial value of Pm1 equals 0.
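As a worked sketch, substituting the states 2k and 2k+1 into the assumed definitions Rm = (mD) mod U and Pm = ⌊mD/U⌋ gives the per-branch state variables and their initial values at k = 0. The notation with an explicit argument k is an assumption made for readability.

```latex
\begin{align}
  y_0(k) &= y(2k), & y_1(k) &= y(2k+1) \\
  R_{m0}(k) &= (2kD) \bmod U, & P_{m0}(k) &= \lfloor 2kD/U \rfloor \\
  R_{m1}(k) &= \bigl((2k+1)D\bigr) \bmod U, & P_{m1}(k) &= \lfloor (2k+1)D/U \rfloor
\end{align}
% Initial values at k = 0 when U > D:
\begin{align}
  R_{m0}(0) &= 0, & P_{m0}(0) &= 0 \\
  R_{m1}(0) &= D \bmod U = D = R_{m0}(0) + D, & P_{m1}(0) &= \lfloor D/U \rfloor = 0
\end{align}
```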
After the initialization of the state machines Sm0 and Sm1, the multi-branch rate change filter 20 is ready to process the input sample stream. For as long as the read pointers are below Lx−1 (block 104), the multi-branch rate change filter 20 calculates output samples (block 106), updates the state machine variables (block 108), and increments the per-branch output sample stream index (block 110) during each output clock cycle. In the two branch rate change filter 20, two output samples are calculated during each output clock cycle. The output samples are computed according to Equations 15 and 16 respectively. The state machines increment the phase offsets Rm0 and Rm1 and the read pointers Pm0 and Pm1 for Branch 0 and Branch 1 during each clock cycle according to Equations 10-13.
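The per-cycle processing described above can be sketched as a software model in Python and compared with a single-branch reference. The sketch assumes a flat coefficient list h of length N×U, zero-valued samples outside the input stream, and a loop that runs while the Branch 0 read pointer is inside the stream; this loop condition is a simplification and is not claimed to match block 104 exactly. All function names are hypothetical.

```python
def output_sample(x, h, U, r, p):
    """One FIR output: sum over n of h[n*U + r] * x[p - n], zero outside x."""
    N = len(h) // U
    acc = 0.0
    for n in range(N):
        idx = p - n
        if 0 <= idx < len(x):
            acc += h[n * U + r] * x[idx]
    return acc


def single_branch(x, h, U, D):
    """Reference: one output sample per cycle."""
    y, m = [], 0
    while (m * D) // U < len(x):
        y.append(output_sample(x, h, U, (m * D) % U, (m * D) // U))
        m += 1
    return y


def dual_branch(x, h, U, D):
    """Two output samples per cycle: Branch 0 at state 2k, Branch 1 at 2k+1."""
    y0, y1 = [], []
    k = 0
    while (2 * k * D) // U < len(x):
        r0, p0 = (2 * k * D) % U, (2 * k * D) // U
        r1, p1 = ((2 * k + 1) * D) % U, ((2 * k + 1) * D) // U
        y0.append(output_sample(x, h, U, r0, p0))   # even output sample
        y1.append(output_sample(x, h, U, r1, p1))   # odd output sample
        k += 1
    return y0, y1


def interleave(y0, y1):
    """Recombine the two substreams into one output stream."""
    return [s for pair in zip(y0, y1) for s in pair]


if __name__ == "__main__":
    x, h = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y0, y1 = dual_branch(x, h, U=3, D=2)
    print(len(y0), "cycles instead of", len(single_branch(x, h, 3, 2)))
```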
In the situation where two branches 24 do not provide enough processing speed given the hardware operating frequency, a designer can choose to implement the polyphase FIR rate change filter 20 using M branches 24. Note that the value of M can be larger than the total number of phases. In this alternative embodiment, M output samples are generated every output clock cycle. Branch 0 produces every Mth output sample with an offset of 0, Branch 1 produces every Mth output sample with an offset of 1, and so on. The output of the different branches 24 can be written as:
The instantaneous output vector of the multiple-branch polyphase FIR rate change filter can be written as:
In case the term
in Equation 18 is not an integer, some extra zeros will be padded in hardware for the last sample of the last
branches 24. The implementation details of the dual-branch rate change filter as described herein can be easily extended to rate change filters with three or more branches 24.
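The assignment of output samples to M branches, with zero padding when the number of output samples is not a multiple of M, can be sketched in Python. The sketch assumes that branch l produces the output samples with indices kM + l; the function names are hypothetical, and the padding is applied here to a finished list purely for illustration.

```python
def split_into_branches(y, M):
    """Split stream y into M substreams; branch l holds y[l], y[M+l], ...

    Zeros are appended so that every branch has the same number of samples.
    """
    padded = list(y) + [0] * (-len(y) % M)
    return [padded[l::M] for l in range(M)]


def merge_branches(branches):
    """Interleave the substreams back into a single output stream."""
    return [sample for cycle in zip(*branches) for sample in cycle]


if __name__ == "__main__":
    branches = split_into_branches([1, 2, 3, 4, 5], M=3)
    print(branches)
    print(merge_branches(branches))
```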
One constraint on implementing the state machines Sm0 and Sm1 in hardware is that the state machines have to complete their operations within one clock cycle. This constraint means that the while statements in
For hardware implementations, two situations have to be considered:
The U/D ratio of the rate change filter 20 is in the range of [1.0; 2.0].
The U/D ratio of the rate change filter 20 is greater than 2.
In the situation where the U/D ratio of the rate change filter is in the range of [1.0; 2.0], the values of the phase offsets Rm0 and Rm1 are either incremented by 2×D, or incremented by 2×D and decreased by a multiple of U, as the k index is incremented from one cycle to another. The value of the multiple of U is determined by the expression:
where l represents the branch index. By analyzing Equation 19 for the two corner cases (U/D=1 and U/D=2), the following relationships are obtained:
Based on the above corner-case analysis, for U/D ratios in the range of [1.0; 2.0], from one cycle to another (as k is incremented) either one times U or two times U will have to be subtracted from the phase offset Rmx in order to implement the modulo function with respect to U. In Equation 20, the term l simply represents the initial offset of the different branches 24.
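The same conclusion can be reached with a short worked bound, sketched here under the assumption that the phase offset of branch l is ((2k+l)D) mod U. The symbol q, denoting the number of subtractions of U in one cycle, is introduced only for this illustration.

```latex
\begin{align}
  & 0 \le R_{mx}(k) < U, \qquad D \le U \le 2D
      && \text{(ratio } U/D \in [1.0;\, 2.0]\text{)} \\
  & R_{mx}(k) + 2D = R_{mx}(k+1) + qU, \qquad q = P_{mx}(k+1) - P_{mx}(k) \\
  & U \le 2D \le R_{mx}(k) + 2D < U + 2D \le 3U \\
  & \Rightarrow\; q = \bigl\lfloor (R_{mx}(k) + 2D)/U \bigr\rfloor \in \{1, 2\},
      \qquad q = 2 \iff R_{mx}(k) + 2D \ge 2U
\end{align}
```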
The inequality in the IF statement can be written as:
Every time the quantity U is subtracted from the phase offset Rmx, the value of the corresponding read pointer Pmx has to be incremented by one, which is why Pmx is incremented by two in the first condition of the above IF statement.
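The update for U/D ratios in the range of [1.0; 2.0] can be sketched in Python as a single comparison with no loop. The sketch assumes the comparison is made against 2×U after adding 2×D, which is one form consistent with the description; the function name `step_ratio_1_to_2` is hypothetical.

```python
def step_ratio_1_to_2(r, p, U, D):
    """Advance one branch by two output samples when 1 <= U/D <= 2.

    Either 2*U or U is subtracted from the phase offset, and the read
    pointer is incremented by two or by one respectively.
    """
    r += 2 * D
    if r >= 2 * U:
        return r - 2 * U, p + 2   # two subtractions of U, pointer +2
    return r - U, p + 1           # one subtraction of U, pointer +1


if __name__ == "__main__":
    r, p = 0, 0                   # Branch 0 starts at state 0
    for k in range(1, 5):
        r, p = step_ratio_1_to_2(r, p, U=3, D=2)
        print(k, r, p)
```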
For U/D ratios greater than 2.0, the upsampling factor U becomes larger than two times the downsampling factor D so that the subtraction by U is not always necessary. The following relationship is intuitively obtained from Equation 21:
For these ratios, the inequality in the IF statement can be derived from Equations 21 and 22 as follows:
In cases where the ratio U/D is less than or equal to two, the process follows along the left branch of
By breaking down the state machine operation into two steps, it is possible to derive another valid implementation for the state machines Sm0 and Sm1. The idea is to incorporate as a common factor the worst case of subtraction by U, so that the Rmx values in the inequality are compared to zero. Using this approach, the inequality in the IF statement for ratios in the range of [1.0; 2.0] can be written as:
The same transformation is applied for U/D ratios greater than two. Note that this state machine is also able to complete all of its operations within one clock cycle. The only drawback is that the adders/subtractors have to handle negative values, so that they are bigger and slower in hardware.
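The compare-to-zero variant can be sketched in Python for both ratio ranges. The sketch assumes that the worst-case subtraction is 2×U for ratios in [1.0; 2.0] and U for ratios greater than two, and that the correction step adds U back when the intermediate value is negative; the function name `step_compare_to_zero` is hypothetical.

```python
def step_compare_to_zero(r, p, U, D):
    """Advance one branch by two output samples using a sign test.

    Step 1 subtracts the worst-case multiple of U up front, so the
    intermediate value t may be negative. Step 2 tests t against zero.
    """
    if U <= 2 * D:                 # ratio U/D in [1.0; 2.0]
        t = r + 2 * D - 2 * U      # worst case: two subtractions of U
        if t >= 0:
            return t, p + 2
        return t + U, p + 1        # only one subtraction was needed
    t = r + 2 * D - U              # ratio U/D > 2, worst case: one subtraction
    if t >= 0:
        return t, p + 1
    return t + U, p                # no subtraction was needed


if __name__ == "__main__":
    r, p = 2, 0                    # Branch 1 initial state for U=5, D=2
    for k in range(1, 5):
        r, p = step_compare_to_zero(r, p, U=5, D=2)
        print(k, r, p)
```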
In cases where the ratio U/D is less than or equal to two, the process follows along the left branch of
The hardware implementation of the state machine 40 for this alternative embodiment is illustrated in
In order to distribute the input samples to the different branches 24 of the rate change filter 20, the input sample stream is written to two identical input buffers 22. The read pointer Pm0 reads the first input buffer and the pointer Pm1 reads the second input buffer. The read pointer Pm0 is primarily feeding the shift register 26 of Branch 0 and the read pointer Pm1 is primarily feeding the shift register of Branch 1. However, since the state machines Sm0 and Sm1 are incremented by two states every clock cycle (Rmx=Rmx+2×D), it is possible for a given read pointer Pmx to be incremented by two addresses during one clock cycle (see, for example, block 146 in
Because state machine Sm0 is advanced by one state compared to state machine Sm1, there is a delay of one clock cycle between the read pointer Pm1 and the shift register 26 of Branch 0. Another observation is that the samples read by Pm0 always go to the last position of the shift register in Branch 0. Whenever there is a shift by two samples, the sample provided by read pointer Pm1 is input to the second last position in the shift register 26 for Branch 0. The same process applies to Branch 1. Also, since the rate change filter 10 up/down ratios covered by this invention are always greater than one, it is guaranteed that during each clock cycle the read pointers will increment by 1, increment by 2, or remain unchanged.
In the rate change filter 20 with two branches 24, the input sample stream is effectively scanned by two sliding windows of N samples, implemented as shift registers 26 in the different branches 24, as shown on
Instead of writing the input sample stream to two identical input buffers 22, a designer can choose to write the input data to a single buffer. In this situation, the two read pointers Pm0 and Pm1 are still needed, and they are still used in the same way. The only difference is that some control logic is required to ensure that only one pointer has access to the memory location in case of a pointer collision (i.e. both read pointers are reading the same address at the same time).
In case the input rate (input clock domain) is too fast to be implemented in hardware, many parallel write pointers, for example L write pointers, can be used to write to contiguous input buffer addresses. On startup, write pointer 0 is initialized to address 0, write pointer 1 is initialized to address 1, and so on up to write pointer L−1. Every clock cycle, each write pointer is incremented by L addresses, and they wrap around when they reach the maximum input buffer address.
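The parallel write pointer scheme can be sketched in Python. The sketch assumes that the buffer depth is a multiple of L and models the input buffer as a list; the function name `write_with_parallel_pointers` and the parameter `depth` are hypothetical.

```python
def write_with_parallel_pointers(samples, L, depth):
    """Write L samples per clock cycle into a circular buffer of `depth` slots.

    Write pointer j starts at address j and advances by L every cycle,
    wrapping at the end of the buffer. Assumes depth is a multiple of L.
    """
    buffer = [None] * depth
    pointers = list(range(L))                 # pointer j starts at address j
    for start in range(0, len(samples), L):   # one iteration per clock cycle
        for j, sample in enumerate(samples[start:start + L]):
            buffer[pointers[j]] = sample
        pointers = [(q + L) % depth for q in pointers]
    return buffer


if __name__ == "__main__":
    print(write_with_parallel_pointers(list(range(6)), L=2, depth=4))
```

In the usage example, the third cycle wraps around and overwrites addresses 0 and 1 with the newest samples.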
In a polyphase FIR rate change filter, output sample 0 is generated using the coefficients of phase 0, output sample 1 is generated using the coefficients of phase 1, and so on until the maximum number of phases is reached (upsampling factor). Then, the process restarts at phase 0. In the dual-branch rate change filter, for the first output sample, Branch 0 will be provided with the coefficients of phase 0 and Branch 1 will be provided with the coefficients of phase 1. Then for the second output sample, Branch 0 will be provided with the coefficients of phase 2 and Branch 1 will be provided with the coefficients of phase 3. This process goes on until phase U−1 is reached. Then, the process restarts at phase 0. When the upsampling factor U is an even number, Branch 0 is always fed with the coefficients of even phases and Branch 1 is always fed with the coefficients of odd phases. However, when the upsampling factor U is an odd number, each branch 24 is provided in alternation with the coefficients of even and odd phases. Every time phase U−1 is reached, each branch 24 switches between even and odd phases.
One approach to implement the coefficient distribution in hardware is to program a true dual-port RAM with the coefficients already interleaved by phases, in the right order. This approach means that the memory location zero would contain all the coefficients of phase zero, memory location one would contain all the coefficients of phase one, and so on up to phase U−1. The first read pointer provides the coefficients for branch 0 and the second read pointer provides the coefficients for branch 1. Initially, the read pointer of Branch 0 is set to zero, and the read pointer of Branch 1 is set to one. Every clock cycle, both read pointers are incremented by two addresses, and they wrap around when they reach the maximum number of phases.
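The coefficient read pointer sequence described above can be sketched in Python. The sketch assumes one memory location per phase and a wrap-around modulo U; the function name `coefficient_read_pointers` is hypothetical.

```python
def coefficient_read_pointers(U, cycles):
    """Return the (Branch 0, Branch 1) coefficient addresses for each cycle.

    Branch 0 starts at address 0 and Branch 1 at address 1. Both advance by
    two addresses per clock cycle and wrap at the number of phases U.
    """
    ptr0, ptr1 = 0, 1 % U
    schedule = []
    for _ in range(cycles):
        schedule.append((ptr0, ptr1))
        ptr0 = (ptr0 + 2) % U
        ptr1 = (ptr1 + 2) % U
    return schedule


if __name__ == "__main__":
    print(coefficient_read_pointers(4, 4))   # even U: branches keep their parity
    print(coefficient_read_pointers(5, 5))   # odd U: parity alternates on wrap
```

With an even U each branch stays on addresses of one parity, whereas with an odd U each branch alternates between even and odd addresses after every wrap-around.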
The same process can be implemented using two identical single-port RAMs also programmed with the coefficients already interleaved by phases, in the right order. The first coefficient memory provides the coefficients for Branch 0 and the second coefficient memory provides the coefficients for Branch 1. Initially, the coefficient memory read pointer of Branch 0 is set to zero, and the coefficient memory read pointer of Branch 1 is set to one. Every clock cycle, both read pointers are incremented by two addresses, and they wrap around when they reach the maximum number of phases. The drawback of this approach is that half of the memory area is wasted in hardware because of duplications.
Instead of using two identical single-port RAMs as described above, the filter coefficients can be separated into two halves using two single-port RAMs, each filled with half of the filter impulse response.
However, when the upsampling factor U is an odd number, three coefficient memories 62 are needed as well as some extra control logic to coordinate the operations.
Table 2 illustrates this process for an upsampling factor of 5. In this example, phase 0 is used as a separator between odd and even phases.
The particular example presented in this second alternative embodiment for odd upsampling factors considers that phase 0 is stored separately, in a different memory. Note however that a designer could also use phase U−1 as the separator, and re-organize the coefficient selection control logic accordingly. This second alternative embodiment is not very flexible in the sense that it cannot support both even and odd upsampling factors. However, it can be used in applications where the rate change filter ratio is always constant.
The methods and apparatus herein described allow design teams to double, and possibly multiply by some larger factors, the processing speed available in digital hardware for a polyphase FIR rate change filter 20. This invention is technology independent. The implementation algorithm presented in this disclosure enables processing speeds which are not possible to realize using the existing technologies such as ASICs and FPGAs, by solving the problem of coordinating many branches 24 in a polyphase FIR rate change filter 20.
The present invention may, of course, be carried out in other specific ways than those herein set forth without departing from the scope and essential characteristics of the invention.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, and all changes coming within the meaning and equivalency range of the appended claims are intended to be embraced therein.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB10/02148 | 8/13/2010 | WO | 00 | 3/15/2011 |