Having thus described the invention in general terms, reference will now be made to the accompanying drawings, which are not necessarily drawn to scale, and wherein:
The invention now will be described more fully hereinafter with reference to the accompanying drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. One skilled in the art may be able to use the various embodiments of the invention.
An interpolation operation increases the sampling rate by filling in samples between those of x(n), for instance with zeros. An interpolation factor L means that L−1 zeros are inserted between consecutive samples of x(n), so as to obtain a signal whose frequency response is scaled and replicated L times over a 2π interval. x(n) is a sequence of discrete input values which are processed by a digital filter or a chain of digital filters to produce a sequence of discrete values y(n).
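The zero-filling step can be illustrated with a minimal sketch. It assumes the common convention in which each input sample is followed by L−1 zeros, so that the output rate is exactly L times the input rate; the function name `zero_stuff` is hypothetical and does not appear in the specification.

```python
def zero_stuff(x, factor):
    """Return x with (factor - 1) zeros inserted after every sample.

    Assumption: this is the conventional zero-filling interpolator, so the
    output holds exactly factor * len(x) samples.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out = []
    for sample in x:
        out.append(sample)               # original sample
        out.extend([0] * (factor - 1))   # in-between zeros
    return out


if __name__ == "__main__":
    print(zero_stuff([3, 5, 7], 2))
```

The zero-filled sequence is then passed through the digital filter, which removes the spectral replicas.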
In order to further reduce the computational requirement, a poly-phase structure can be implemented in combination with an interpolation factor L.
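As an illustrative sketch of why a poly-phase structure saves computation, the following compares a direct interpolator (zero filling followed by a full FIR filter) with a poly-phase one that never multiplies by the inserted zeros. The function names and the integer test coefficients are assumptions made for the example, not values from the specification.

```python
def fir(x, h):
    """Plain FIR filter: y(n) = sum_i h[i] * x(n - i), zero initial state."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def direct_interpolate(x, h, factor):
    """Reference: insert (factor - 1) zeros after each sample, then filter."""
    stuffed = []
    for sample in x:
        stuffed.append(sample)
        stuffed.extend([0] * (factor - 1))
    return fir(stuffed, h)


def polyphase_interpolate(x, h, factor):
    """Poly-phase form: phase p uses coefficients h[p], h[p + L], h[p + 2L], ...

    Each branch runs at the input rate, and the branch outputs are
    interleaved to form the output stream.
    """
    branches = [fir(x, h[p::factor]) for p in range(factor)]
    out = []
    for n in range(len(x)):
        for p in range(factor):
            out.append(branches[p][n])
    return out


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    h = [1, 2, 3, 4, 5]
    print(polyphase_interpolate(x, h, 2) == direct_interpolate(x, h, 2))
```

For a factor of 2, the two branches correspond to the even and odd coefficient sets of the filter.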
For a Finite Impulse Response (FIR) digital filter of “m” order, the poly-phase structure, as shown in
When the FIR filter has linear phase, its coefficients are symmetric such that coefficients b(i)=b(m-i) in an implementation as shown in
Furthermore, by using the half band property of the interpolating filter coefficients, the computational requirement can be further reduced.
In a preferred embodiment as shown in
Table 1 shows an example of different input data rates and corresponding interpolation factors for a chain of transmit digital filters that can be used with the implementation as shown in
As shown in
First in the chain is an Infinite Impulse Response (IIR) filter 106-1 followed by five Finite Impulse Response (FIR) filters 106-2 to 106-6. At the input of each filter DF1 to DF6, referred to as 106-1 to 106-6, there is a configurable interpolation block I1 to I6, referred to as 104-1 to 104-6. According to the requirements of the standards, the digital filters 106-1 to 106-6 are enabled and the corresponding interpolation factors of 104-1 to 104-6 can be configured as given in Table 2, which describes ten modes Mode#1 to Mode#10.
For instance in Mode#1, DF1, which is an IIR digital filter, has an interpolation factor equal to 1 (with an interpolating factor equal to 1, the digital filter can be considered as a non-interpolating digital filter), whereas DF2 to DF5, which are FIR digital filters, respectively have interpolation factors equal to 2, 1, 2 and 2. DF2 to DF5 are standard poly-phase interpolating digital filters which follow the principles as described in
DF6, which is also a FIR digital filter, has an interpolation factor equal to 32. DF6 is a special FIR digital filter where a SINC filter is used for higher order of interpolation whose value can be as high as 32. For each mode from Mode#1 to Mode#10, the total interpolation factor L is obtained by the multiplication of DF1-DF6 interpolation factors. It should be kept in mind that DF1 interpolation factor is always equal to 1 from Mode#1 to Mode#8 such that it can be independently bypassed in any mode.
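The total interpolation factor is simply the product of the per-filter factors. The short sketch below uses only the two sets of factors that are spelled out in this description (Mode#1 above and Mode#4 further below); the dictionary and function names are hypothetical.

```python
from math import prod

# Per-filter interpolation factors DF1..DF6 as quoted in the text.
MODE_FACTORS = {
    "Mode#1": [1, 2, 1, 2, 2, 32],
    "Mode#4": [1, 2, 1, 2, 2, 16],
}


def total_interpolation(factors):
    """Total interpolation factor L of the cascaded chain."""
    return prod(factors)


if __name__ == "__main__":
    for mode, factors in MODE_FACTORS.items():
        print(mode, total_interpolation(factors))
```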
In the preferred embodiment, input datastream X(n) is a sequence of discrete input values which are processed by DF1 to produce a first output datastream Y(n), which is also a sequence of discrete values, after the first cascaded 2nd order biquad IIR filter and to produce a second output datastream Z(n) after the second cascaded 2nd order biquad IIR filter. A first feed-forward path is implemented by multiplier 302 for multiplying current input value X(n) by coefficient a_B[0], multiplier 304 for multiplying once delayed input value X(n−1) from delay stage 310 by coefficient a_B[1] and multiplier 306 for multiplying twice delayed input value X(n−2) from delay stage 320 by coefficient a_B[2]. On the feed-back side of the first cascaded 2nd order biquad IIR filter, multiplier 314 multiplies once delayed first output Y(n−1) from delay stage 330 by coefficient a_A[1]_neg, and multiplier 316 multiplies twice delayed first output Y(n−2) from delay stage 340 by coefficient a_A[2]_neg. The outputs of multipliers 302, 304, 306 and 314, 316 are all applied to inputs of an accumulator 362, whose resulting sum constitutes the first output datastream Y(n) after being divided by a coefficient a_A[0]=2^k in a divider 312 and going through a saturation and rounding operation in block 372.
The first output datastream Y(n) is then used as an input to the second cascaded 2nd order biquad IIR filter. A second feed-forward path is implemented by multiplier 322 for multiplying current first output value Y(n) by coefficient b_B[0], multiplier 324 for multiplying once delayed first output value Y(n−1) from delay stage 330 by coefficient b_B[1] and multiplier 326 for multiplying twice delayed first output value Y(n−2) from delay stage 340 by coefficient b_B[2]. On the feed-back side of the second cascaded 2nd order biquad IIR filter, multiplier 334 multiplies once delayed second output Z(n−1) from delay stage 350 by coefficient b_A[1]_neg, and multiplier 336 multiplies twice delayed second output Z(n−2) from delay stage 360 by coefficient b_A[2]_neg. The outputs of multipliers 322, 324, 326 and 334, 336 are all applied to inputs of an accumulator 382, whose resulting sum constitutes the second output datastream Z(n) after being divided by a coefficient b_A[0]=2^k in a divider 332 and going through a saturation and rounding operation in block 392.
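The two cascaded biquad sections can be sketched in integer arithmetic as follows. This is a minimal illustration under stated assumptions: the divisor is taken to be a power of two (2^k) realised as a right shift, rounding is round-half-up, saturation is to a signed 16-bit range, and the function names and test coefficients are hypothetical. The feed-back coefficients are passed already negated, mirroring the `_neg` naming, so every product is simply added in the accumulator.

```python
def saturate(value, bits=16):
    """Clamp value to the signed range of the given width (assumed 16 bits)."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(low, min(high, value))


def biquad(x, b, a_neg, k, bits=16):
    """One 2nd order section.

    acc  = b[0]*x(n) + b[1]*x(n-1) + b[2]*x(n-2)
           + a_neg[0]*y(n-1) + a_neg[1]*y(n-2)
    y(n) = saturate(round(acc / 2**k))
    """
    half = (1 << (k - 1)) if k > 0 else 0   # rounding offset (assumption)
    x1 = x2 = y1 = y2 = 0
    out = []
    for xn in x:
        acc = b[0] * xn + b[1] * x1 + b[2] * x2 + a_neg[0] * y1 + a_neg[1] * y2
        yn = saturate((acc + half) >> k, bits)
        x2, x1 = x1, xn     # shift the input delay line
        y2, y1 = y1, yn     # shift the output delay line
        out.append(yn)
    return out


def fourth_order_iir(x, section_a, section_b):
    """Cascade: X(n) -> first biquad -> Y(n) -> second biquad -> Z(n)."""
    y = biquad(x, *section_a)
    return biquad(y, *section_b)


if __name__ == "__main__":
    decay = ([2, 0, 0], [1, 0], 1)      # y(n) = (2*x(n) + y(n-1)) / 2, rounded
    unity = ([4, 0, 0], [0, 0], 2)      # pass-through section
    print(fourth_order_iir([4, 0, 0], decay, unity))
```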
In a preferred embodiment, digital filter DF1 106-1 is a 4th order IIR filter, whereas digital filters DF2 to DF5 106-2 to 106-5 are implemented as poly-phase structures in order to save computation. Digital filter DF6 106-6 is a special case where a SINC filter is used for higher order of interpolation. The SINC filter has the property that its filter length is the same as the interpolation factor. Thus, by using a poly-phase structure, DF6 delivers bursts of output samples for the DAC. With the use of a First-In-First-Out (FIFO) buffer, the bursty samples are delivered to the DAC periodically.
The transmitter digital filter logic uses a 423.9 MHz clock; depending on the data rate, the number of clocks available per input sample for the logic to provide the corresponding outputs to the DAC is shown in Table 1. Since the DAC is running at 70.656 MHz, the FIFO generates an output every 6th clock. The input to the FIFO is a burst of 16 samples from DF6 after every 96 clocks. When DF6 is interpolating by an interpolation factor of 32, the output is generated as two bursts of 16 samples separated by 96 clocks. When DF6 is inactive, as in Mode#9 and Mode#10 in Table 2, the input to the FIFO is irregular.
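The timing figures above are mutually consistent, as the following arithmetic sketch shows. It assumes that the logic clock is exactly six times the DAC rate (423.936 MHz, which the text rounds to 423.9 MHz); the constant and function names are hypothetical.

```python
DAC_RATE_KHZ = 70656          # 70.656 MHz DAC rate, from the text
CLOCKS_PER_DAC_SAMPLE = 6     # the FIFO outputs one sample every 6th clock
BURST_SIZE = 16               # samples per burst written by DF6

# Assumption: the logic clock is an exact multiple of the DAC rate.
LOGIC_CLOCK_KHZ = DAC_RATE_KHZ * CLOCKS_PER_DAC_SAMPLE

# Clocks needed by the FIFO to drain one burst at the DAC rate.
BURST_PERIOD_CLOCKS = BURST_SIZE * CLOCKS_PER_DAC_SAMPLE


def bursts_per_df6_input(df6_factor):
    """Number of 16-sample bursts DF6 produces for one of its input samples."""
    return df6_factor // BURST_SIZE


if __name__ == "__main__":
    print(LOGIC_CLOCK_KHZ, BURST_PERIOD_CLOCKS, bursts_per_df6_input(32))
```

With one burst of 16 samples written every 96 clocks and one sample read every 6 clocks, the FIFO fills and drains at the same average rate.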
In order to design digital filter hardware that can operate in the ten modes Mode#1 to Mode#10 of Table 2 (the number of modes can be higher in another example), there is a need to build an optimized structure that can support multiple digital filters simultaneously, wherein each filter property such as filter order, coefficient symmetry, half-band and poly-phase can be programmed independently to comply with the different system requirements and to extract the maximum throughput from the digital filter hardware.
Table 3 shows the programmable control options for digital filters DF2 to DF5 according to a preferred embodiment wherein the FIR filters are implemented in cascade as illustrated in
All digital filters in
As is shown in Table 3, DF1 is inactive, meaning that “DF1_bypass” bit is set to ‘1’. In such case, DF1 does not occupy any location in the coefficients RAM nor does it consume any clocks.
In a preferred embodiment, DF2 to DF5 are active or bypassed depending on the transmitter requirement as shown in Table 3. When active, they work either as non-interpolating filters or as interpolate-by-2 filters (interpolation factor equal to 2), depending on the configuration as shown in Table 2. The output of each of these filters has a scaling block where the data amplitude can be scaled by “4, 2, 1 and ½” by programming a signed value of “−2, −1, 0 and 2” respectively in a 2-bit “DFx_A[0]_shift” register. The filter order of each of the filters DF2 to DF5 is separately programmable through the “DFx_order” register of the respective filter.
When the “DFx_symmetric” bit is set to ‘1’, only half the coefficients need to be programmed in the coefficient RAM. Only odd filter lengths (even orders) are supported when using the coefficient symmetry property. When the filter is in interpolating mode, the coefficients need to be programmed by splitting them into even and odd coefficient sets for the poly-phase structure. Even coefficients are stored first, followed by odd coefficients, in the address range.
When the “DFx_hb” bit is set to ‘1’, the half-band property enables interpolating filters to consume the minimum number of clocks without requiring any change to the way the coefficients are programmed. The digital filters will use only the centre odd coefficient, while the other odd coefficients are neglected and assumed to be zero. Thus, using the half-band property reduces the required number of cycles for the odd coefficients to one.
If the “DFx_symmetric” bit is set in interpolating mode, then only half the coefficients need to be programmed for the poly-phase structure. The “DFx_hb” and “DFx_symmetric” bits can be set to ‘1’ at the same time to use both the symmetric coefficient property and the half-band property together. Inactive filters are bypassed and do not occupy any location in the coefficients RAM 430 and the data RAM 402, and the corresponding filter registers are neglected. However, when the filters are active, they consume hardware resources according to the requirements shown in Table 4.
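A minimal sketch of the half-band shortcut for an interpolate-by-2 filter follows. It assumes a filter whose centre tap has an odd index and whose other odd-index coefficients are zero, as is the case for the illustrative 7-tap coefficient set used here; the coefficient values and function names are assumptions made for the example, not values from the specification.

```python
def direct_interpolate_by_2(x, h):
    """Reference: insert one zero after each sample, then apply the full FIR."""
    stuffed = []
    for sample in x:
        stuffed += [sample, 0]
    return [sum(h[i] * stuffed[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(stuffed))]


def halfband_interpolate_by_2(x, h):
    """Even branch uses all even coefficients; odd branch uses the centre tap only."""
    centre = (len(h) - 1) // 2
    if centre % 2 == 0:
        raise ValueError("centre tap must have an odd index")
    even = h[0::2]
    delay = (centre - 1) // 2    # input delay seen by the centre tap
    out = []
    for n in range(len(x)):
        even_out = sum(c * x[n - k] for k, c in enumerate(even) if n - k >= 0)
        odd_out = h[centre] * x[n - delay] if n - delay >= 0 else 0  # one multiply
        out += [even_out, odd_out]
    return out


if __name__ == "__main__":
    h = [-1, 0, 9, 16, 9, 0, -1]    # illustrative half-band coefficient set
    x = [1, 2, 3, 4, 5]
    print(halfband_interpolate_by_2(x, h) == direct_interpolate_by_2(x, h))
```

The odd branch costs a single multiplication per input sample regardless of the filter order, which is the saving the half-band bit exposes.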
Table 4 shows different combinations of the digital filter parameters: whether the digital filter is an interpolating or non-interpolating filter, whether the “DFx_symmetric” and “DFx_hb” bits are set or not, and whether the minimum number of taps (filter length) is 3 or 7. Table 4 determines the number of locations occupied in DATA RAM 402 as being equal to DFx_order+1 if DFx_order is an even number and equal to fix(DFx_order/2)+1 if DFx_order is an odd number. In the same way, the number of locations occupied in Coefficient RAM 430 is determined as being equal to DFx_order+1 if the DFx_symmetric bit is set to ‘0’ and equal to DFx_order/2+1 if the DFx_symmetric bit is set to ‘1’. The number of cycles required per input sample is determined depending on whether the digital filter is an interpolating or non-interpolating filter. If it is an interpolating filter, the number of cycles required is split into two branches, an even coefficient branch DFx_EVEN and an odd coefficient branch DFx_ODD. Both branches depend on the values of DFx_order and of the DFx_symmetric bit. The number of cycles required must be an integer; therefore the fix function (which returns the largest integer less than or equal to the value) and the ceiling function (which returns the smallest integer not less than the value) are used in the computations.
DF6 is a symmetric FIR filter with a Sinc frequency response and is configured according to the transmitter interpolation requirement. Accordingly, this filter has an interpolation factor of 16 or 32 as shown in Table 2, with the filter length being equal to the interpolation factor. DF6 is implemented as a poly-phase filter to perform one coefficient multiplication per output, and it occupies sixteen or thirty-two 16-bit coefficient locations depending on the length of the filter. Since it is the last filter of the cascaded chain of filters, its 16/32 coefficients are placed after the coefficients of all other filters. Just like DF2 to DF5, the output of DF6 also has a scaling block, where the data amplitude can be scaled by “4, 2, 1 and ½” by programming a signed value of “−2, −1, 0 and 2” respectively in a 2-bit “DF6_A[0]_shift” register.
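The two RAM occupancy rules stated in the preceding paragraph can be transcribed directly, as in the sketch below. It only restates the rules as worded in the text; the cycle counts of Table 4 are not reproduced here, and the function names are hypothetical.

```python
def data_ram_locations(order):
    """Locations in DATA RAM 402, as stated in the text:
    order + 1 for an even order, fix(order / 2) + 1 for an odd order."""
    if order % 2 == 0:
        return order + 1
    return order // 2 + 1


def coeff_ram_locations(order, symmetric):
    """Locations in Coefficient RAM 430, as stated in the text:
    order + 1 when DFx_symmetric is 0, order / 2 + 1 when it is 1."""
    if not symmetric:
        return order + 1
    if order % 2 != 0:
        # The text supports only even orders with the symmetric property.
        raise ValueError("symmetric filters require an even order")
    return order // 2 + 1


if __name__ == "__main__":
    print(data_ram_locations(6), coeff_ram_locations(6, True))
```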
Tables 3 and 4 illustrate the numerous possibilities of programming the digital filters that can be supported by the Digital Filter Processor (DFP). Almost all the plausible parameters of the digital filters are programmable, making the DFP as flexible as the DSP.
As previously mentioned, the block diagram of the transmitter filter is illustrated in
A Multiplexer 401 receives the inputs which are then dispatched to the two instances of DATA RAM 402 whose outputs are added in an Adder 440 before generating outputs which are transmitted to a multiplier 460.
At this stage it should be kept in mind that control logic or Controller 400 is controlling the addressing and/or the accessing of Multiplexer 401, DATA RAM 402, Coefficient RAM 430 and Multiplier 460 by generating control signals depending on the programmed instructions of RAMs and ALU. Controller 400 preferably operates in response to decoded program instructions or other control signals produced elsewhere in the integrated circuit.
Multiplier 460 multiplies the coefficients from Coefficient RAM 430 with the data from DATA RAM 402 to generate a product. The product outputs of Multiplier 460 are added in an Adder 480 to the data in an Accumulator 490, and the sum is stored back in Accumulator 490. That is, the output of Adder 480 is coupled to the input of Accumulator 490, which accumulates the output from Adder 480 with the previously accumulated output when clocked. The output of Accumulator 490 is coupled back to Adder 480.
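The pre-add, multiply and accumulate datapath described above can be sketched for a symmetric FIR filter as follows. The mapping of each step to Adder 440, Multiplier 460, Adder 480 and Accumulator 490 is an interpretation of the text; the handling of the centre tap (a single sample with no partner) and the function names are assumptions.

```python
def tap(x, n):
    """Sample x(n), with zeros before the start of the stream."""
    return x[n] if n >= 0 else 0


def symmetric_fir(x, half_coeffs):
    """FIR with b(i) = b(m - i), using only the stored half b(0)..b(m/2).

    Each coefficient multiplies the sum of the two samples that share it,
    so one multiplication serves two taps.
    """
    order = 2 * (len(half_coeffs) - 1)
    mid = order // 2
    out = []
    for n in range(len(x)):
        acc = 0                                            # accumulator cleared
        for i in range(mid):
            pre_sum = tap(x, n - i) + tap(x, n - order + i)   # pre-adder
            acc += half_coeffs[i] * pre_sum                   # multiply, accumulate
        acc += half_coeffs[mid] * tap(x, n - mid)          # centre tap (assumption)
        out.append(acc)
    return out


def full_fir(x, h):
    """Reference FIR using every coefficient."""
    return [sum(h[i] * tap(x, n - i) for i in range(len(h))) for n in range(len(x))]


if __name__ == "__main__":
    print(symmetric_fir([1, 0, 0, 0, 0, 0], [1, 2, 3]))
```

A filter of order m therefore needs about m/2 + 1 multiplications per output instead of m + 1.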
The data in Accumulator 490 are rounded and saturated in a Round and Saturate block 492 before they are stored back in DATA RAM 402, each time the intermediate filters outputs are ready. The intermediate filters outputs are outputs B, C to F from DF1 to DF5 as shown in the case of an implementation represented in
In a preferred embodiment, the two instances of the single-port DATA RAM 402 support the symmetric coefficient property where two samples are needed for one coefficient multiplication. The data for each filter are stored in continuous locations within a “DATA RAM segment” which is dedicated to the corresponding filter. As shown in
The Coefficient RAM 430 is a single-port RAM that can only be programmed with coefficients of active filters by the DSP. The coefficients of each filter are stored separately in different segments 601, 602, 603, . . . , as shown in
The start and end address of segments are generated internally. When programming the controlling registers, shown in Table 3, the controller deciphers the register settings such as filter length, poly-phase etc. and creates the segmentation structure as shown in Table 4. The values in Table 4 are used to calculate the start address of the coefficient/data RAM and to keep track of the current coefficient RAM address and the current data RAM address when a particular filter state is active.
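One way to derive the segment boundaries is a running sum over the sizes of the active filters, taken in chain order. The sketch below assumes contiguous segments starting at address 0, with inactive filters occupying no locations; the function name and the example sizes are hypothetical.

```python
def segment_addresses(filters):
    """Map each active filter to its (start, end) address, in chain order.

    `filters` is a list of (name, active, locations) tuples. Inactive
    filters occupy nothing, so later segments move up to fill the gap.
    """
    segments = {}
    next_free = 0
    for name, active, locations in filters:
        if not active or locations == 0:
            continue
        segments[name] = (next_free, next_free + locations - 1)
        next_free += locations
    return segments


if __name__ == "__main__":
    chain = [
        ("DF1", False, 0),    # bypassed
        ("DF2", True, 4),
        ("DF3", True, 7),
        ("DF6", True, 16),    # last filter, placed after all others
    ]
    print(segment_addresses(chain))
```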
Depending on the programmed mode set in “MODE” register defined in Table 3, the interpolation factors for different filters are derived from Table 2. In one implementation, a single control register can be used to control all filter interpolation factors. In another implementation, the interpolation factor of each filter can be independently controlled by different control registers.
The “MODE” register also controls a state machine which schedules the filter computations on a single ALU. The sequencer goes through a different number of states for different structures, following a poly-phase splitting strategy. The computation advances from one stage to another, or from one phase of a stage to that of the next stage, based on whether the stages are programmed to be poly-phase or not.
In Mode#4, digital filters DF1 to DF6 are all active and have interpolation factors of 1, 2, 1, 2, 2 and 16 respectively. DF1 and DF3 are simple digital filters. Since DF2, DF4 and DF5 are interpolating digital filters, according to Table 4, the number of cycles required per input sample is to be determined for the even coefficient branch and for the odd coefficient branch. The last digital filter DF6 has an interpolation factor of 16 and will therefore produce 16 output samples. After the computation is completed in DF1, where the input is stored in DATA RAM in the allocated data RAM segment of DF1, the even coefficients DF2_E are processed before the output is passed to DF3. Once the computation is completed in DF3, the output is passed to DF4, where the even coefficients DF4_E are processed before the output is passed to DF5, where again the even coefficients DF5_E are processed. DF6 receives the output from DF5 and performs 16 computations. Once they are completed, odd coefficients DF5_O are processed before an output is passed to DF6, where another 16 computations are performed. Once they are completed, odd coefficients DF4_O are processed before an output is passed to DF5, where even coefficients DF5_E are processed. DF6 receives the output from DF5 and performs 16 computations. Once they are completed, odd coefficients DF5_O are processed before an output is passed to DF6, where another 16 computations are performed for the 4th time. Once they are completed, odd coefficients DF2_O are processed, and so on until all the even and odd coefficients of DF2, DF4 and DF5 have been processed.
For any mode, the state machine moves out of the “IDLE” state after receiving an input sample and moves back to the “IDLE” state before receiving the next input sample. The hardware will ignore samples received in non-IDLE states, whereas the software is programmed so as to make the state machine come back to the “IDLE” state before the next sample is received, by programming the proper values in the registers. Only the sequence of the states for a given mode is hard coded in a design.
In a preferred embodiment, if required, the sequence of the states for any given mode can be made programmable at the cost of area and verification effort. Only the sequence of the states is controlled by the “MODE” register, whereas the operations and the number of clocks required by each state are controlled by other registers as shown in Table 3. For each filter there is separate hardware which computes the different values of the number of locations and clock requirements as given in Table 4. The values of the number of locations occupied in Data RAM and in Coefficient RAM are also used to calculate the start addresses of the data/coefficient RAM and to keep track of the current coefficient and data RAM addresses when a particular filter state is active.
Table 5 shows the number of available clock cycles per input sample for different input data rates. The software has to configure the digital filters such that the entire set of active filters completes its computations within the clocks available between two input samples. If the input sample is received when the state machine is in its last state, then the “IDLE” state is bypassed so that the ALU can use 100% of the clocks for data computation. The software is also programmed so as to control the coefficient and data RAM sizes. The outputs will be invalid if the active digital filters together need more than 192 memory locations to store the coefficients or more than 164 memory locations to store the delay data, since there is no extra hardware.
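The state sequence walked through above follows a depth-first pattern: each output of an interpolating stage is pushed through all later stages before the next phase of that stage is computed. The recursive sketch below reproduces the sequence for Mode#4; the part after DF2_O is an inference from "and so on", and the state labels and function name are hypothetical.

```python
def schedule(factors, stage=0):
    """Return the ordered list of states for one input sample.

    factors[i] is the interpolation factor of DF(i+1). Stages before the
    last one may have a factor of 1 (single state) or 2 (even phase, then
    odd phase). The last stage emits all of its outputs as one state.
    """
    if stage == len(factors):
        return []
    name = f"DF{stage + 1}"
    factor = factors[stage]
    if stage == len(factors) - 1:
        return [f"{name}x{factor}"]          # e.g. 16 computations in DF6
    if factor == 1:
        return [name] + schedule(factors, stage + 1)
    if factor == 2:
        states = []
        for phase in ("E", "O"):             # even coefficients first, then odd
            states.append(f"{name}_{phase}")
            states += schedule(factors, stage + 1)
        return states
    raise ValueError("only factors 1 and 2 are handled before the last stage")


if __name__ == "__main__":
    mode4 = [1, 2, 1, 2, 2, 16]
    print(" -> ".join(schedule(mode4)))
```

For Mode#4 the DF6 state is visited eight times, giving 8 × 16 = 128 output samples per input sample, which matches the total interpolation factor of that mode.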
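The constraints the software must respect can be summarised as a simple check, sketched below. The RAM capacities of 192 and 164 locations come from the text; the per-filter cycle and location counts are inputs that would be derived from Table 4, and the function name and example numbers are hypothetical.

```python
COEFF_RAM_CAPACITY = 192   # coefficient locations, from the text
DATA_RAM_CAPACITY = 164    # delay-data locations, from the text


def configuration_fits(cycles, coeff_locations, data_locations, clocks_available):
    """True if the active filters fit the clock budget and both RAMs.

    Each argument except clocks_available is a list with one entry per
    active filter.
    """
    return (sum(cycles) <= clocks_available
            and sum(coeff_locations) <= COEFF_RAM_CAPACITY
            and sum(data_locations) <= DATA_RAM_CAPACITY)


if __name__ == "__main__":
    # Hypothetical numbers for three active filters.
    print(configuration_fits([20, 30, 16], [40, 60, 16], [50, 70, 1], 96))
```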
According to a preferred embodiment, the Digital Filter Processor is implemented in a 90 nm digital process using a low-area library. The total area of the DFP is 0.095 mm² including the data and coefficient RAMs. If the DSP is running at 360 MHz, the C62x CPU area itself will require 0.83 mm² and the area for RAM, memory controller and logic to transfer data from RAM to DAC will be additional
Table 6 compares the performance, in MIPS/second/mm², of the C62x DSP configuration against the DFP configuration. It is clear that the DFP provides a 10× improvement in performance per unit area of silicon.
Table 7 shows the C62x DSP loading for running the same filters with optimized software. In the fastest modes, the filters would occupy a maximum of 67% of the CPU availability, which is lower than the percentage of the CPU availability obtained with the DFP configuration.
According to the invention, the concept of a Digital Filter Processor (DFP) with a basic structure and a low-area filter design has all the advantages in terms of Power Performance Area (PPA). Its advanced control logic provides all the flexibility needed to program the desired filter parameters in the required configuration and still use 100% of the available multiplier time and RAM area. The programmability provided by a DFP is very close to that provided by a programmable DSP but without the cost of its large hardware. Thus the DFP brings the best of the two configurations: the PPA advantage with complete software controllability of the digital filter parameters.
As shown in
The processor architecture of the invention can be used for wireless technology and can be implemented in a transmitter and/or a receiver of the wireless device.
This processor architecture can be implemented for programmable digital filters which can be used to support multiple standards like G.dmt.992, G.dmt.bis, ADSL2+ and VDSL2. In another implementation, the processor architecture can also fit the use of a chip which consists of a TMS320C62x™ based DSL PHY comprising a Data Converter subsystem and Digital signal processing subsystems.
While the invention has been described according to its preferred embodiments, it is of course contemplated that modifications of, and alternatives to, these embodiments, such modifications and alternatives obtaining the advantages and benefits of this invention, will be apparent to those of ordinary skill in the art having reference to this specification and its drawings. Therefore, it is to be understood that the invention is not to be limited to the specific embodiments disclosed. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
This application claims the benefit of U.S. provisional application No. 60/825,661, filed on Sep. 14, 2006, which is herein incorporated by reference.
Number | Date | Country | |
---|---|---|---|
60825661 | Sep 2006 | US |