This application claims the priority benefit of Taiwan application serial no. 103137590, filed on Oct. 30, 2014. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.
The present disclosure relates to a parallel finite impulse response (FIR) filter and a corresponding filtering method.
A finite impulse response filter is commonly used in the transmitter of a wireless communication system to shape the spectrum of a signal pending transmission, so that the signal meets the spectrum mask required by the applicable specification.
In recent years, as communication technologies have developed from the wireless local area network (WLAN) through the fourth generation (4G) technology to the upcoming fifth generation (5G) technology, they have become increasingly complex and diverse. Accordingly, issues of communication systems such as power consumption, transmission speed, and hardware area are receiving more attention.
The present disclosure is directed to a finite impulse response filter and a corresponding filtering method, which are capable of reducing power consumption and hardware area for a communication system while increasing a throughput of the communication system.
The finite impulse response filter of the present disclosure receives an input sequence. The input sequence includes a plurality of input values. The finite impulse response filter includes at least one first adder, at least one multiplier, and a second adder. Each of the at least one first adder performs a plurality of addition operations simultaneously in parallel. Each of the addition operations outputs a sum of two of the input values. The at least one multiplier is coupled to the at least one first adder. Each of the at least one multiplier performs a plurality of multiplication operations simultaneously in parallel. Each of the multiplication operations outputs a product of one of the sums and one of a plurality of coefficients of the finite impulse response filter. The second adder is coupled to the at least one multiplier and outputs a total sum of the products.
The filtering method of the present disclosure includes the following steps: receiving an input sequence, wherein the input sequence comprises a plurality of input values; in each clock cycle of a plurality of clock cycles, performing a plurality of addition operations simultaneously in parallel, wherein each of the addition operations outputs a sum of two of the input values; in each of the clock cycles, performing a plurality of multiplication operations simultaneously in parallel, wherein each of the multiplication operations outputs a product of one of the sums and one of a plurality of coefficients; and outputting a total sum of the products.
The finite impulse response filter and the filtering method described above reduce power consumption while increasing throughput by means of a parallel architecture. Power consumption may be further reduced by disabling a part of the multiplication operations, and hardware area may be reduced by simplifying the multiplication operations.
The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the disclosure and, together with the description, serve to explain the principles of the disclosure.
In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the disclosed embodiments. It will be apparent, however, that one or more embodiments may be practiced without these specific details. In other instances, well-known structures and devices are schematically shown in order to simplify the drawing.
A finite impulse response filter (hereinafter, referred to as the FIR filter) may be expressed as a formula (1) below.
In the formula (1), x(n) represents input values of the FIR filter, y(n) represents output values of the FIR filter, the value of n ranges from 0 to infinity, h(i) are coefficients of the FIR filter, and N is the number of h(i). The output values y(n) are convolutions of the input values x(n) to x(n−(N−1)) and the coefficients h(0) to h(N−1).
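Formula (1) itself is not reproduced in this text. From the definitions just given, it is presumably the standard FIR convolution sum (a reconstruction, not the original typesetting):

```latex
y(n) = \sum_{i=0}^{N-1} h(i)\, x(n-i) \qquad (1)
```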
The coefficients h(i) of the FIR filter are symmetric, which means that the coefficients h(i) conform to a formula (2) below. According to the formula (2), the formula (1) may be simplified to obtain a formula (3) below.
In the formula (3) above, it is assumed that N is an odd number. If N is an even number, the formula (3) should be replaced with a formula (4) below.
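Formulas (2) to (4) are likewise missing here. Written out from the symmetry described above (again a reconstruction), each pair of input values that shares a coefficient is added before the multiplication:

```latex
h(i) = h(N-1-i), \quad i = 0, 1, \ldots, N-1 \qquad (2)

y(n) = \sum_{i=0}^{(N-3)/2} h(i)\,\bigl[x(n-i) + x(n-N+1+i)\bigr]
       + h\!\left(\tfrac{N-1}{2}\right) x\!\left(n-\tfrac{N-1}{2}\right), \quad N \text{ odd} \qquad (3)

y(n) = \sum_{i=0}^{N/2-1} h(i)\,\bigl[x(n-i) + x(n-N+1+i)\bigr], \quad N \text{ even} \qquad (4)
```

Under this form each output needs only (N+1)/2 multiplications for odd N instead of N; for the 51 coefficients h1 to h51 used later, that is the 26 distinct coefficients h1 to h26.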
The delay chain 110 receives the input sequence and groups the input values into a plurality of batches Xn to Xn+13 according to their input order, each batch including 4 input values. For instance, the 4 input values of the batch Xn+1 are represented by Xn+1,1, Xn+1,2, Xn+1,3 and Xn+1,4, and the other input values may be deduced by analogy. The delay chain 110 may include at least one delayer coupled in series, such as delayers 111 to 113. Among the delayers of the delay chain 110, the first delayer receives the batches Xn to Xn+13 one by one directly from the input sequence, and each of the remaining delayers receives the batches one by one from the previous delayer. Each delayer delays the received batches for a predetermined time, which may be one cycle of a clock signal, and then outputs the delayed batches. Each of the delayers of the FIR filter 100 may receive the clock signal as the timing reference for the delay.
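A minimal behavioural sketch of the batching and the delay chain described above. The names `batches` and `DelayChain` are hypothetical, and the model captures only how batches move through the series-coupled delayers on each clock edge, not the hardware timing.

```python
def batches(seq, size=4):
    """Group the input values into consecutive batches of `size` (4 here)."""
    return [tuple(seq[k:k + size]) for k in range(0, len(seq) - size + 1, size)]


class DelayChain:
    """Delayers coupled in series; each holds a batch for one clock cycle."""

    def __init__(self, n_delayers):
        # registers[0] is the first delayer's output, registers[-1] the last one's.
        self.registers = [None] * n_delayers

    def clock(self, batch):
        """One clock edge: the first delayer takes the new batch, the rest shift."""
        self.registers = [batch] + self.registers[:-1]
        return list(self.registers)


chain = DelayChain(3)            # e.g. three delayers such as 111 to 113
for b in batches(list(range(1, 17))):
    taps = chain.clock(b)
print(taps)   # [(13, 14, 15, 16), (9, 10, 11, 12), (5, 6, 7, 8)]
```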
Each of the adders 121 to 127 performs a plurality of addition operations simultaneously in parallel, and each of the addition operations outputs a sum of two of the input values in the input sequence. Each of the adders 121 to 127 directly obtains the input values from the batches outputted by the delayers of the delay chain 110. Each of the multipliers 141 to 147 performs a plurality of multiplication operations simultaneously in parallel, and each of the multiplication operations outputs a product of one of the sums and one of a plurality of coefficients of the FIR filter 100. Table 1 below lists the addition operations performed by the adders 121 to 127 and the multiplication operations performed by the multipliers 141 to 147.
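The add-then-multiply structure can be checked against the plain convolution with a short sketch. It assumes formulas (1), (3) and (4) take the form of a direct convolution and its symmetric folding; the function names are illustrative only.

```python
def fir_direct(x, h, n):
    """Formula (1): y(n) = sum_i h(i) * x(n - i); inputs before x(0) count as 0."""
    return sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)


def fir_folded(x, h, n):
    """Formulas (3)/(4): pre-add symmetric input pairs, then multiply and sum.

    Requires symmetric coefficients, h(i) == h(N-1-i) (formula (2)).
    """
    N = len(h)

    def xv(m):                       # zero history before the first input value
        return x[m] if m >= 0 else 0

    # First adders: each sum pairs the two inputs that share a coefficient.
    sums = [xv(n - i) + xv(n - (N - 1 - i)) for i in range(N // 2)]
    coeffs = h[:N // 2]              # slice copy, so h itself is not modified
    if N % 2:                        # odd N: the middle tap has no partner
        sums.append(xv(n - (N - 1) // 2))
        coeffs.append(h[(N - 1) // 2])
    products = [s * c for s, c in zip(sums, coeffs)]   # multipliers
    return sum(products)                                # second adder


h_odd, h_even = [1, 3, 5, 3, 1], [2, -1, -1, 2]
x = [2, -1, 4, 0, 7, 1, -3]
for h in (h_odd, h_even):
    assert all(fir_direct(x, h, n) == fir_folded(x, h, n) for n in range(len(x)))
print((51 + 1) // 2)   # 26 multiplications per output for N = 51
```

The folded form halves the number of multiplications at the cost of one extra addition per pair, which is the trade the first adders make.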
In the addition operations of Table 1, the operands Xn+1,3 to Xn+13,1 are equivalent to the input values x(n) to x(n−(N−1)) in the formula (3). Each of sn,1 to sn,26 is the sum generated by the addition operation in the same row; for example, sn,24 = Xn+8,4 + Xn+7,4, and the rest may be deduced by analogy. h1 to h26 are equivalent to the coefficients h(i) in the formula (3).
In view of Table 1, each of the adders 121 to 127 may perform at most four addition operations simultaneously, and each of the multipliers 141 to 147 may perform at most four multiplication operations simultaneously. The input value Xn+7,2 is the midpoint of the entire input sequence. For each of the addition operations, the two input values that form the sum lie respectively before and after the midpoint, at locations in the input sequence that are symmetric with respect to the midpoint Xn+7,2. If the number of coefficients of the FIR filter 100 is even, the midpoint of the input sequence falls between the two middle input values. This symmetric arrangement follows from the formulas (3) and (4). Further, in view of Table 1, each of the adders 121 to 127 uses two groups of consecutive input values to perform its addition operations simultaneously. For example, a first group of the consecutive input values used by the adder 125 includes Xn+9,1 to Xn+9,4, whereas a second group includes Xn+5,3 to Xn+6,4.
The delayers 131 to 137 allow each of the adders 121 to 127 and its corresponding multiplier among the multipliers 141 to 147 to operate in different clock cycles. For example, the adder 123 calculates the four sums sn,9 to sn,12 simultaneously in parallel in one clock cycle. After the four sums are delayed by the delayer 133, the multiplier 143 obtains the four sums sn,9 to sn,12 and performs four multiplication operations simultaneously.
The adder 150 includes a plurality of adders 151 to 160 and a plurality of delayers. The adder 151 calculates the sum of the four products generated in parallel by the multiplier 141 and outputs it. The adder 152 calculates the sum of the four products generated in parallel by the multiplier 142 and outputs it, and the rest may be deduced by analogy. The adder 158 calculates and outputs the sum of the output values of the adders 151 to 154. The adder 159 calculates and outputs the sum of the output values of the adders 155 to 157. The adder 160 adds the output values of the adders 158 and 159 and outputs the result. Accordingly, the final output of the adder 150 is the total sum of all the products generated by the multipliers 141 to 147, which is equivalent to y(n) in the formula (3). The delayers in the adder 150 insert a one-clock-cycle buffer between each two consecutive stages of adders.
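The symmetric pairing and the four-lane split can be checked numerically. This sketch numbers the 51 input values of one output window by position 0 to 50 (a hypothetical numbering, not the Xn+k,m labels of Table 1), pairs positions symmetric about the midpoint, treats the midpoint value as an unpaired operand, and deals the resulting operations out to four-lane adders in order. The exact row assignment in Table 1 may differ.

```python
N = 51             # coefficients h1..h51, symmetric, so 26 distinct ones
LANES = 4          # additions each first adder performs per clock cycle
mid = (N - 1) // 2  # window position of the midpoint input value

# Each pre-addition pairs two window positions symmetric about the midpoint.
pairs = [(i, N - 1 - i) for i in range(mid)]
pairs.append((mid,))           # odd N: the midpoint value has no partner

# Split the operations over four-lane adders (hypothetical ordering).
adders = [pairs[k:k + LANES] for k in range(0, len(pairs), LANES)]
print(len(pairs), [len(a) for a in adders])   # 26 [4, 4, 4, 4, 4, 4, 2]
print(adders[0])   # [(0, 50), (1, 49), (2, 48), (3, 47)]: two consecutive groups
```

The split reproduces the counts stated for Table 1: six adders with four operations each and a seventh with two.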
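A behavioural sketch of the adder tree in the adder 150, following the grouping described above. The function name is hypothetical, and the pipeline delayers between stages are left out because they change the latency but not the result.

```python
def adder_150(product_groups):
    """Total of the products from multipliers 141..147 via adders 151..160.

    product_groups: seven lists of products, one per multiplier (4 or 2 each).
    """
    assert len(product_groups) == 7
    stage1 = [sum(g) for g in product_groups]   # adders 151..157, one per multiplier
    a158 = sum(stage1[0:4])                      # adder 158: outputs of 151..154
    a159 = sum(stage1[4:7])                      # adder 159: outputs of 155..157
    return a158 + a159                           # adder 160: total sum, i.e. y(n)


groups = [[1, 2, 3, 4]] * 6 + [[5, 6]]   # six 4-product groups, one 2-product group
print(adder_150(groups))   # 71
```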
The adder 150 of
In view of Table 1, each of the adders 121 to 126 performs four addition operations, and the adder 127 performs two. Likewise, each of the multipliers 141 to 146 performs four multiplication operations, and the multiplier 147 performs two. Accordingly, the FIR filter 100 can achieve almost four times the throughput of a general non-parallel FIR filter. For the same throughput requirement, the operating frequency can therefore be reduced, which lowers power consumption. For example, because the FIR filter 100 adopting the 4-way parallel architecture requires only a quarter of the operating frequency, its power consumption may be reduced significantly.
The number of the coefficients of the FIR filter 100 of
The FIR filter 100 of
The FIR filter 100 of
The parallel architecture of the FIR filter 100 increases the amount and area of hardware. To reduce the hardware area, the coefficients h1 to h26 can be simplified. For instance, if each of the coefficients h1 to h26 is a predetermined 10-bit constant, each of the multiplication operations performed by the multipliers 141 to 147 requires λ shift-and-add operations, where λ is the number of non-zero bits of the corresponding coefficient, at most 10. If each of the coefficients h1 to h26 can be simplified to include only two or three non-zero bits, the multiplication operations, and hence the corresponding hardware area, are significantly simplified.
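To make the cost argument concrete, here is a sketch of multiplying by a fixed non-negative 10-bit integer constant with shifts and adds only, counting one shifted term per non-zero bit. It uses plain binary digits as an illustration; the function name is hypothetical.

```python
def shift_add_multiply(s, coeff, bits=10):
    """Multiply integer s by a non-negative `bits`-bit constant using shifts and adds.

    One shifted copy of s is needed per non-zero bit of the coefficient, so the
    cost grows with the number of non-zero bits (lambda).
    """
    assert 0 <= coeff < (1 << bits)
    terms = [s << b for b in range(bits) if (coeff >> b) & 1]   # one shift per 1-bit
    return sum(terms), len(terms)                                # (product, lambda)


print(shift_add_multiply(7, 0b1111111111))   # (7161, 10): worst case, lambda = 10
print(shift_add_multiply(7, 0b0100000100))   # (1820, 2): only two terms to add
```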
As mentioned above, the coefficients h1 to h26 in Table 1 are equivalent to the coefficients h(i) in formulas (3) and (4). Hereinafter, h(i) is used to represent the coefficients h1 to h26. In an embodiment, a formula (5) may be used to calculate one corresponding simplified coefficient ĥ(i) for each original coefficient h(i).
In the formula (5), λi is equal to 2 or 3, ck,i is equal to −1, 0 or 1, and gk,i is an integer greater than or equal to 0 and less than the number of bits of the original coefficients h(i). ck,i and gk,i are optimal parameters obtained by a tap search in both the time domain and the frequency domain, which is described in detail later. Each multiplication operation in Table 1 and the corresponding hardware circuit may be simplified by replacing the original coefficient h(i) with the corresponding simplified coefficient ĥ(i).
The shifters 201 to 203 receive the sum sn of the corresponding multiplication operation and correspond respectively to the parameters g1,i, g2,i and g3,i of the coefficient h(i) of the multiplication circuit. The shifter 201 shifts the sum sn by g1,i bits and then outputs the shifted sum, which is equivalent to the sum sn multiplied by 2−g
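Formula (5) does not survive in this text. Judging from the shifter description that follows, where each shift multiplies the sum by 2 raised to a negative power, it presumably has the form below; B, the number of bits of the original coefficients, is a label introduced here.

```latex
\hat{h}(i) = \sum_{k=1}^{\lambda_i} c_{k,i}\, 2^{-g_{k,i}}, \qquad
\lambda_i \in \{2, 3\},\; c_{k,i} \in \{-1, 0, 1\},\; g_{k,i} \in \{0, 1, \ldots, B-1\} \qquad (5)
```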
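A sketch of the three-shifter multiplication circuit, assuming ĥ(i) has the sum-of-signed-powers-of-two form described for formula (5). `Fraction` keeps the right shifts exact for illustration; an actual circuit would work on fixed-point words and truncate. The function name is hypothetical.

```python
from fractions import Fraction


def simplified_multiply(s, c, g):
    """s * h_hat(i) with h_hat(i) = sum_k c[k] * 2**(-g[k]), using shifts and adds.

    One shifter per term (like shifters 201 to 203); each shifted sum is added,
    subtracted, or dropped according to c[k] in {-1, 0, 1}.
    """
    assert len(c) == len(g) and all(ck in (-1, 0, 1) for ck in c)
    shifted = [Fraction(s, 2 ** gk) for gk in g]       # shifter k: s * 2**(-g[k])
    return sum(ck * v for ck, v in zip(c, shifted))    # combine the shifted sums


# h_hat = 2**-1 + 2**-3 - 2**-6 = 0.609375, so 64 * h_hat = 39
print(simplified_multiply(64, [1, 1, -1], [1, 3, 6]))   # 39
```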
In another embodiment, if λi corresponding to the multiplication circuit of
The shifter 301 receives the sum sn of the multiplication operation corresponding to the multiplication circuit. In the kth cycle of a clock signal, the shifter 301 shifts the sum sn by gk,i bits and outputs the shifted sum, which is equivalent to the sum sn multiplied by 2−g
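The single-shifter variant trades clock cycles for hardware. Because the description of how the shifted sums are combined is cut off here, this sketch assumes an accumulator that adds or subtracts one term per cycle; the accumulator and the function name are hypothetical.

```python
from fractions import Fraction


def sequential_multiply(s, c, g):
    """One shifter reused over several clock cycles (time-multiplexed variant).

    In cycle k the shifter outputs s * 2**(-g[k]); an assumed accumulator adds
    or subtracts it according to c[k]. Returns the product and the cycle count.
    """
    acc = Fraction(0)
    for ck, gk in zip(c, g):                 # one loop iteration per clock cycle
        acc += ck * Fraction(s, 2 ** gk)     # shifter 301 output, weighted by c[k]
    return acc, len(c)


print(sequential_multiply(64, [1, 1, -1], [1, 3, 6]))   # (Fraction(39, 1), 3)
```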
Description regarding how to search the optimal parameters ck,i and gk,i by using the tap search in an embodiment of the present disclosure is provided as follows. First of all, the corresponding parameters ck,i and gk,i for each of the original coefficients h(i) are searched in the time domain according to a formula (6).
In the formula (6), N is the number of coefficients of the FIR filter 100 of the present embodiment, and N is an odd number in the present embodiment. G is a parameter having a plurality of possible values. For example, G may be any value of an arithmetic sequence: the range from 0.5 to 1 may be divided into 500 equal parts, each of length (1−0.5)/500 = 0.001, and the 501 endpoints of these parts, including 0.5 and 1, form the arithmetic sequence. Q(·) is a quantization function that maps a real number to the element of a domain D closest to that real number. A formula (7) below defines the domain D.
β in the formula (7) represents the elements of the domain D, and R represents the set of all real numbers. β is defined similarly to the simplified coefficient ĥ(i) in the formula (5), with λ in the formula (7) analogous to λi in the formula (5): λ is equal to 2 or 3, ck is equal to −1, 0 or 1, and gk is an integer greater than or equal to 0 and less than the number of bits of the original coefficients h(i). The domain D is the set composed of all real numbers that can be expressed in the manner of the formula
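Formula (7) is not reproduced either. From the description in this paragraph, the domain D is presumably the following set, with λ fixed to 2 or 3 and B denoting the number of bits of the original coefficients (a label introduced here):

```latex
D = \Bigl\{ \beta \in \mathbb{R} \;\Bigm|\; \beta = \sum_{k=1}^{\lambda} c_k\, 2^{-g_k},\;
c_k \in \{-1, 0, 1\},\; g_k \in \{0, 1, \ldots, B-1\} \Bigr\}, \qquad \lambda \in \{2, 3\} \qquad (7)
```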
The formula (6) is equivalent to a calculation of an error value eq(G) between all the original coefficients h(i) and all the simplified coefficients ĥ(i). The definition of the simplified coefficient ĥ(i) in the formula (6) is identical to that in the formula (5). For each simplified coefficient ĥ(i), each of the corresponding parameters ck,i and gk,i has a plurality of possible values. The parameter G also has a plurality of possible values. In the formula (6), each combination of the possible values of (N−1)/2+1 parameters ck,i, (N−1)/2+1 parameters gk,i and one parameter G may be used for calculating one corresponding error value eq(G). By sorting the error values eq(G) obtained from all the combinations, a minimum error value eq(min) among the error values may be obtained, and then a plurality of error values eq less than M*eq(min) may be selected from the error values. The selected error values eq also include the minimum error value eq(min). M is a predetermined constant and M of the present embodiment is equal to 5. In another embodiment, M may be any integer greater than one.
The formula (6) shows that each selected error value eq corresponds to a set of parameters ck,i and gk,i. A frequency response of the FIR filter 100 may be calculated by replacing the original coefficients h(i) with the simplified coefficients ĥ(i) calculated from these parameters, so each selected error value eq corresponds to one frequency response. The next step is a parameter search in the frequency domain: the frequency response corresponding to each of the selected error values eq is compared with the original frequency response of the FIR filter 100 to find the frequency response most similar to the original one, together with its error value eq. The parameters ck,i and gk,i corresponding to that error value eq are the optimal parameters adopted in the formula (5).
There are many existing methods for determining whether two frequency responses are similar, and the aforesaid parameter search in the frequency domain may use any one of them. For example, for each selected error value eq, the mean of the corresponding frequency response over the pass band of the FIR filter 100 may be calculated, and the mean of the original frequency response of the FIR filter 100 over the same pass band may also be calculated. Comparing these means then decides which of the frequency responses corresponding to the selected error values eq is most similar to the original frequency response.
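The two-stage tap search can be sketched as follows, with several loudly stated assumptions, since formula (6) is not reproduced here. (1) The time-domain error is taken to be e_q(G) = Σ (G·h(i) − Q(G·h(i)))² over the (N+1)/2 distinct coefficients of an odd-length filter, with ĥ(i) = Q(G·h(i)). (2) The shortlist keeps every G whose error is at most M·e_q(min). (3) Before the pass-band means are compared, the simplified response is divided by G so that the scaling does not dominate; the text does not say how G is compensated. All function names and the example coefficients are hypothetical.

```python
import bisect
import cmath
import itertools
import math


def build_domain(lam=3, bits=10):
    """Domain D (formula (7)): every sum of `lam` terms c * 2**(-g)."""
    terms = [c * 2.0 ** -g for c in (-1, 0, 1) for g in range(bits)]
    sums = {sum(t) for t in itertools.combinations_with_replacement(terms, lam)}
    return sorted(sums)


def quantize(v, domain):
    """Q(.): the element of the sorted domain closest to v."""
    k = bisect.bisect_left(domain, v)
    return min(domain[max(k - 1, 0):k + 1], key=lambda d: abs(d - v))


def mirror(half):
    """Full symmetric tap list from h(0)..h((N-1)/2), odd N."""
    return half + half[-2::-1]


def passband_mean(taps, wp, points=64):
    """Mean magnitude response |H(e^{jw})| over `points` frequencies in [0, wp]."""
    total = 0.0
    for k in range(points):
        w = wp * k / (points - 1)
        total += abs(sum(t * cmath.exp(-1j * w * i) for i, t in enumerate(taps)))
    return total / points


def tap_search(half, wp, lam=3, bits=10, M=5):
    """Two-stage search; returns (G, simplified half-taps)."""
    domain = build_domain(lam, bits)
    # Stage 1 (time domain), assumed error: sum_i (G*h(i) - Q(G*h(i)))**2.
    results = []
    for step in range(501):                        # G = 0.500, 0.501, ..., 1.000
        G = 0.5 + step * 0.001
        h_hat = [quantize(G * h, domain) for h in half]
        err = sum((G * h - q) ** 2 for h, q in zip(half, h_hat))
        results.append((err, G, h_hat))
    e_min = min(r[0] for r in results)
    shortlist = [r for r in results if r[0] <= M * e_min]   # keeps e_min even if 0
    # Stage 2 (frequency domain): closest pass-band mean after dividing out G.
    target = passband_mean(mirror(half), wp)
    best = min(shortlist,
               key=lambda r: abs(passband_mean(mirror(r[2]), wp) / r[1] - target))
    return best[1], best[2]


half = [0.02, -0.05, 0.1, 0.3, 0.45]        # hypothetical h(0)..h(4), N = 9
G, h_hat = tap_search(half, wp=0.25 * math.pi)
print(G, h_hat)
```

Any other similarity measure mentioned in the text could replace `passband_mean` in the second stage without changing the overall structure.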
The formula (6) is suitable for the circumstance where the number N of the coefficients of the FIR filter 100 is an odd number. If the number N of the coefficients of the FIR filter 100 is an even number, the formula (8) below may be used to replace the formula (6).
The FIR filter 100 is capable of disabling a part of the multiplication operations depending on the desired application, so that the outputs of the disabled multiplication operations are zero. Accordingly, the same FIR filter may be used to satisfy a variety of spectrum masks while avoiding unnecessary power consumption.
More specifically, the coefficients h(i) of the FIR filter 100 may be numbered 0 to N−1 (i.e., h(0) to h(N−1)). The coefficients h(i) may be divided into two sets S1 and S2. The set S1 includes the jth coefficient to the (N−1−j)th coefficient in the coefficients h(i) (i.e., h(j) to h(N−1−j)), and j is a positive integer less than N/2. The other set S2 includes the remaining coefficients h(i). The FIR filter 100 is capable of disabling the multiplication operations corresponding to the coefficients in the set S2, so that outputs of the disabled multiplication operations are zero. As described in the embodiments of
For instance, each device class of the DSRC (Dedicated Short Range Communications) system of IEEE (Institute of Electrical and Electronics Engineers) 802.11p communication standard has a corresponding transmission spectrum mask.
Taking the FIR filter 100 in one embodiment of the present disclosure as an example, assume that the number N of the coefficients h(i) is 71. As shown in
Similarly, the device class C needs to suppress the power outside the operation band to approximately −30 dBr. In this case, the set S1 needs to include the 39 coefficients at the middle of h(i) (i.e., h(16) to h(54)), which also means that only the 20 multiplication operations corresponding to h(16) to h(54) are required. The multiplication circuits corresponding to the remaining multiplication operations may be disabled.
The device class D needs to suppress the power outside the operation band to approximately −45 dBr. In this case, all of the coefficients are to be used and all of the 36 multiplication operations are required. Each of the multiplication circuits is enabled.
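The multiplication counts quoted for N = 71 follow from the symmetric folding: keeping h(j) to h(N−1−j) leaves N−2j coefficients, and one multiplier serves each symmetric pair plus the middle tap. A sketch, with hypothetical function names and j = 0 standing for the case where set S2 is empty:

```python
def active_multiplications(N, j):
    """Multiplications still needed when only h(j)..h(N-1-j) (set S1) are kept."""
    kept = N - 2 * j                 # size of S1
    return (kept + 1) // 2           # one per symmetric pair, plus middle if odd


def apply_mask(h, j):
    """Zero the coefficients in S2 so the matching multiplier outputs are zero."""
    N = len(h)
    return [hi if j <= i <= N - 1 - j else 0 for i, hi in enumerate(h)]


print(active_multiplications(71, 0))    # 36: device class D, all multipliers on
print(active_multiplications(71, 16))   # 20: device class C, h(16)..h(54), 39 taps
print(apply_mask([1, 2, 3, 4, 5], 1))   # [0, 2, 3, 4, 0]
```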
The numbers of the multiplication circuits used by the device classes A and B are only one third of the number of the multiplication circuits used by the device class D. That is to say, two-thirds of the multiplication circuits of the FIR filter 100 may be disabled for the device classes A and B, so as to avoid unnecessary power consumption. The FIR filter 100 may be designed based on a windowing algorithm to benefit from the aforesaid operation of disabling a part of the multiplication circuits.
In another embodiment of the present disclosure, a combination of multiple parallel FIR filters similar to the FIR filter 100 may be used to achieve a higher degree of parallelism. For example, four FIR filters (the FIR filter 100 of
Table 1 shows that the FIR filter 100 calculates the convolution of the input values Xn+1,3 to Xn+13,1 and the coefficients h1 to h51. Because the coefficients h1 to h51 are symmetric, the FIR filter 100 practically only uses the coefficients h1 to h26. Table 2 shows that the FIR filter 500 calculates the convolution of the input values Xn+1,4 to Xn+13,2 and the coefficients h1 to h51, the FIR filter 600 calculates the convolution of the input values Xn,1 to Xn+13,3 and the coefficients h1 to h51, and the FIR filter 700 calculates the convolution of the input values Xn,2 to Xn+13,4 and the coefficients h1 to h51. In this way, there are four FIR filters calculating four different convolutions simultaneously. The combination of the FIR filters 100, 500, 600 and 700 is capable of performing 16 addition operations and 16 multiplication operations simultaneously in parallel in each clock cycle and thereby increasing the throughput to 16 times the throughput of a general non-parallel FIR filter.
In order to describe the aforesaid parallel FIR filter more clearly, the input values in the input sequence may be consecutively numbered. For example, the batch X1 includes input values x(1) to x(4), the batch X2 includes input values x(5) to x(8), and the rest may be deduced by analogy. Table 3 below lists the convolutions calculated in each clock cycle of four clock cycles and the relation between the convolutions and the output values y(n) in the formula (3) under the circumstance where only the FIR filter 100 is used. The other clock cycles may be deduced by analogy. Table 4 below lists the convolutions calculated in each clock cycle of four clock cycles and the relation between the convolutions and the output values y(n) in the formula (3) under the circumstance where the FIR filter composed of the FIR filters 100, 500, 600 and 700 is used. The other clock cycles may be deduced by analogy.
In view of Table 3, if only the FIR filter 100 is used, one input value x(n) may be received and one output value y(n) may be calculated in each clock cycle. In view of Table 4, if the combined parallel FIR filter including the FIR filters 100, 500, 600 and 700 is used, each of the FIR filters 100, 500, 600 and 700 may receive one input value x(n) and calculate one output value y(n) in each clock cycle. As such, the entire combined parallel FIR filter is capable of receiving four input values x(n) and calculating four output values y(n) in each clock cycle. In another embodiment, any number of FIR filters may be combined according to the aforesaid rule in order to achieve a lower or higher degree of parallelism.
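A behavioural sketch of the combined filter: P filters work side by side, and in clock cycle t filter p produces y(P·t + p) from a window shifted by p input values relative to filter 0. It models only which outputs are produced per cycle, not the internal pipelining; the function name is hypothetical.

```python
def parallel_fir(x, h, P=4):
    """P filters side by side: in clock cycle t, filter p emits y(P*t + p).

    Each cycle consumes P new input values and emits P outputs.
    """
    N = len(h)

    def y(n):  # formula (1); inputs before x(0) count as zero
        return sum(h[i] * x[n - i] for i in range(N) if n - i >= 0)

    out = []
    for t in range(len(x) // P):                      # one iteration = one cycle
        out.extend(y(P * t + p) for p in range(P))    # the P filters in parallel
    return out


print(parallel_fir([1, 2, 3, 4, 5, 6, 7, 8], [1, 1]))   # [1, 3, 5, 7, 9, 11, 13, 15]
```

With P = 4 the eight outputs above take two cycles instead of eight, which is the fourfold gain described for the combination of the FIR filters 100, 500, 600 and 700.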
A filtering method is provided according to an embodiment of the present disclosure. The FIR filter 100 of
In summary, the aforesaid FIR filter is capable of reducing the operating frequency of the transmitter of a communication system in order to reduce power consumption. In the aforesaid FIR filter, adders and shifters may replace the multipliers, significantly saving the hardware area of the multiplication circuits. The aforesaid FIR filter is also capable of dynamically disabling a part of the multiplication circuits to reduce power consumption, so that a single FIR filter suffices to satisfy a variety of spectrum masks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed embodiments. It is intended that the specification and examples be considered as exemplary only, with a true scope of the disclosure being indicated by the following claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
103137590 | Oct 2014 | TW | national |