The present invention is related to the digital implementation of adaptive equalizers for high-speed communication systems, using finite precision arithmetic, for example, as implemented in a silicon ASIC.
Equalization in a digital receiver is a process whereby multipath, noise, and other interferences incurred in the digital broadcast are removed from the received signal, attempting to restore the original digital transmission. Since the characteristics of the broadcast channel are rarely known a priori to the receiver, and can change dynamically, equalizers are usually implemented using adaptive filters.
Most state-of-the-art digital receivers use some type of decision feedback equalizer (DFE), because it provides superior inter-symbol interference (ISI) cancellation with less noise gain than a finite impulse response (FIR)-only equalizer structure. Austin first proposed a DFE, in a report entitled “Decision feedback equalization for digital communication over dispersive channels,” MIT Lincoln Labs Technical Report No. 437, Lexington, Mass., August 1967. A DFE acts to additively cancel ISI by subtracting filtered symbol estimates from the received waveform. The feedback structure embeds an FIR filter in a feedback loop, and therefore has an overall infinite impulse response (IIR). Most modern DFEs use two adaptive filters: a first, linear forward filter, coupled to the feedback structure, which embeds a second, feedback filter.
For communication systems broadcasting through a channel medium with a long delay spread relative to the symbol period, the adaptive filters in the equalizer must be long enough to cover the channel delay spread, resulting in a large number of equalizer coefficients and a correspondingly high implementation cost to realize them. For example, digital television (DTV) broadcast in the U.S. follows the Advanced Television Systems Committee (ATSC) standard (see ATSC Digital Television Standard (A/53) Revision E) and transmits about 10.76 million digital symbols per second through VHF/UHF channels, which can exhibit delay spreads up to 100 microseconds. Hence, an ATSC receiver may have over a thousand adaptive equalizer coefficients. It is typical for such high-speed receivers to have well over half their silicon real estate dedicated solely to adaptive equalizer circuitry, making the adaptive equalizer the most expensive signal processing function on the chip. Implementation methods which reduce this burden are therefore desirable. The present invention relates to the efficient implementation of adaptive equalizers.
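The coefficient count quoted above follows from simple arithmetic. The sketch below assumes symbol-spaced taps and uses the rounded figures from the text; the function name `taps_to_span` is hypothetical and is used only for illustration.

```python
def taps_to_span(symbol_rate_hz: int, delay_spread_us: int) -> int:
    """Symbol-spaced taps needed to cover a delay spread, rounded up."""
    symbols_times_1e6 = symbol_rate_hz * delay_spread_us
    # Ceiling division on integers avoids floating-point rounding surprises.
    return -(-symbols_times_1e6 // 1_000_000)


ATSC_SYMBOL_RATE_HZ = 10_760_000  # "about 10.76 million symbols per second"
DELAY_SPREAD_US = 100             # "up to 100 microseconds"

n_taps = taps_to_span(ATSC_SYMBOL_RATE_HZ, DELAY_SPREAD_US)
print(n_taps)  # 1076
```

A delay spread of 100 microseconds therefore spans roughly 1076 symbol periods, which is consistent with the statement that an ATSC receiver may have over a thousand coefficients.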
The present invention is related to the efficient implementation of adaptive equalizers for high-speed communication systems, using finite precision arithmetic, for example, as implemented in a silicon ASIC.
Other aspects, features, and advantages of the present invention will become more fully apparent from the following detailed description, the appended claims, and the accompanying drawings in which:
Forward processing block 330 encompasses multiple prior art signal processing functions, and may include, for example, circuitry for adaptive forward filtering, carrier recovery, and error term generation. See “Phase detector in a carrier recovery network for a vestigial sideband signal,” U.S. Pat. No. 5,706,057 issued Jan. 6, 1998, by C. H. Strolle et al., for carrier recovery techniques suitable to VSB signals. For QAM signals, decision-directed carrier estimation techniques are described in Chapter 16 of Digital Communication, Second Edition, Lee and Messerschmitt, Kluwer Academic Publishers, Boston, Mass., 1997. See Theory and Design of Adaptive Filters, New York, John Wiley and Sons, 1987, by Treichler et al., for a description of adaptive filters, including forward adaptive filtering and error term generation.
Forward processing block 330 receives input samples from front end signal processing blocks of the digital receiver, for example, as shown in
Adder 340 sums x(k) with feedback filter 370 output w(k) to provide sample y(k), referred to as the soft-decision sample. Soft decision sample y(k) is provided to slicer 360. Slicer 360 produces a symbol estimate (also referred to as a hard decision sample). Slicer 360 can be a nearest-element decision device, selecting the source symbol with minimum Euclidean distance to the soft decision sample, or can take advantage of the channel coding. For example, a partial trellis decoder is used as slicer 360 in “A method of estimating trellis encoded symbols utilizing simplified trellis decoding,” U.S. Pat. No. 6,178,209, issued Jan. 23, 2001, by S. N. Hulyalkar et al. Slicer 360 may also receive an input signal from forward processing block 330, for example, including sine and cosine terms which may be used for rotation and de-rotation in accordance with previously cited prior art techniques.
The output from slicer 360 is used to form regressor sample z(k) for feedback filter 370. Feedback filter 370 receives regressor samples z(k) and produces output sample w(k) to adder 340. Feedback filter 370 is usually implemented with adaptive coefficients, and is therefore provided error term e(n) for coefficient adjustment. Error term e(n) may be generated in forward processing block 330 or elsewhere in the receiver architecture.
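The loop formed by adder 340, slicer 360, and feedback filter 370 can be sketched as follows. This is a minimal illustration only: it assumes fixed (non-adaptive) feedback coefficients, a nearest-element slicer, a regressor formed directly from the hard decisions, and a toy two-level alphabet. The names `nearest_symbol` and `dfe_loop` are hypothetical.

```python
from collections import deque


def nearest_symbol(y, alphabet):
    """Nearest-element slicer: symbol with minimum distance to soft decision y."""
    return min(alphabet, key=lambda s: abs(y - s))


def dfe_loop(x, feedback_coeffs, alphabet):
    """Run the feedback loop over forward-processed samples x(k).

    Returns (soft decisions y(k), hard decisions). Coefficients are held
    fixed here; an adaptive version would also update them from an error term.
    """
    # Regressor z(k): past symbol estimates, most recent first.
    z = deque([0.0] * len(feedback_coeffs), maxlen=len(feedback_coeffs))
    soft, hard = [], []
    for xk in x:
        wk = sum(c * zi for c, zi in zip(feedback_coeffs, z))  # feedback filter output w(k)
        yk = xk + wk                         # adder: soft decision y(k)
        dk = nearest_symbol(yk, alphabet)    # slicer: hard decision
        z.appendleft(dk)                     # oldest estimate drops off the end
        soft.append(yk)
        hard.append(dk)
    return soft, hard


# Toy channel (an assumption for illustration): each symbol leaks half of
# its amplitude into the following symbol period.
symbols = [1, -1, -1, 1, 1]
received = [s + 0.5 * p for s, p in zip(symbols, [0] + symbols[:-1])]
soft, hard = dfe_loop(received, feedback_coeffs=[-0.5], alphabet=(-1, 1))
print(hard)  # [1, -1, -1, 1, 1]
```

Because the adder sums x(k) and w(k), the feedback coefficient in this toy example is negative so that the post-cursor ISI is cancelled.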
The adaptive filter contained in forward processing block 330 and feedback filter 370 may be comprised of real- or complex-valued coefficients, may process real- or complex-valued data, and may adjust coefficients or blocks of coefficients using real- or complex-valued error.
For an adaptive filter updated with a stochastic gradient descent rule, the adaptation process can generally be written as
$$\vec{f}_{n+1} = \vec{f}_{n} - \vec{f}_{n} \gg \rho - \left( \vec{r}^{\,*}_{n-\partial} \cdot e_{n-\partial} \right) \gg \mu$$
where $\vec{f}$ is a vector of adaptive filter coefficients, $\mu$ is the stepsize (assumed time-invariant for simplicity), $\vec{r}^{\,*}_{n}$ is the conjugated regressor vector of inputs to the adaptive filter, $e_{n}$ is an adaptive error term, and $\rho$ is a leakage term. Both the stepsize and the leakage are assumed to be implemented with a shift instead of a full multiplier; the >> operator denotes a shift down by the designated number of bits. Delay ∂ is ideally zero, though a small delay is usually tolerable and eases timing constraints for implementation.
The stochastic gradient descent update style is perhaps the most commonly implemented adaptive filter architecture, due to its relatively low computational burden. For a succinct but thorough study of this and other adaptive filter theories, see Theory and Design of Adaptive Filters, New York, John Wiley and Sons, 1987, by Treichler et al. See this reference also for descriptions of common adaptive error terms such as the Least Mean Squares algorithm (LMS) and Constant Modulus Algorithm (CMA), as well as for a study of leakage and stepsize selection.
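The shift-based update rule written above can be sketched in integer arithmetic. The example assumes real-valued data, zero delay ∂, and coefficients, regressor samples, and error held as plain Python integers standing in for fixed-point values. The function name `sgd_update` and the numeric values are hypothetical. Python's `>>` on negative integers is an arithmetic shift that rounds toward minus infinity, which matches a two's complement shift down.

```python
def sgd_update(coeffs, regressor, error, mu_shift, rho_shift):
    """One leaky stochastic-gradient update with shifts instead of multiplies.

    f[n+1] = f[n] - (f[n] >> rho) - ((r[n] * e[n]) >> mu), per coefficient.
    """
    return [
        f - (f >> rho_shift) - ((r * error) >> mu_shift)
        for f, r in zip(coeffs, regressor)
    ]


coeffs = [1024, -2048]
regressor = [3, -5]
new_coeffs = sgd_update(coeffs, regressor, error=64, mu_shift=4, rho_shift=8)
print(new_coeffs)  # [1008, -2020]
```

A larger `mu_shift` gives a smaller effective stepsize, and a larger `rho_shift` gives weaker leakage.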
For full-complex arithmetic, i.e., the regressor data in the filter's tapped delay line, the error terms used to update the filter coefficients, and the filter coefficients themselves are all complex-valued, the above update equation for the filter coefficients is split into complex components, and the equation can be re-written as
$$\vec{f}^{\,I}_{n+1} = \vec{f}^{\,I}_{n} - \vec{f}^{\,I}_{n} \gg \rho - \left( (\vec{r}^{\,I}_{n-\partial} \cdot e^{I}_{n-\partial}) + (\vec{r}^{\,Q}_{n-\partial} \cdot e^{Q}_{n-\partial}) \right) \gg \mu$$

$$\vec{f}^{\,Q}_{n+1} = \vec{f}^{\,Q}_{n} - \vec{f}^{\,Q}_{n} \gg \rho - \left( (\vec{r}^{\,I}_{n-\partial} \cdot e^{Q}_{n-\partial}) - (\vec{r}^{\,Q}_{n-\partial} \cdot e^{I}_{n-\partial}) \right) \gg \mu$$
where $(\cdot)^{I}$ denotes the in-phase component and $(\cdot)^{Q}$ denotes the quadrature-phase component.
By separating into in-phase and quadrature-phase components, it is clear that four multipliers and two additions are needed for the inner product of regressor data and error term, which must necessarily be calculated separately for each adaptive filter coefficient. Even with resource sharing possible due to over-clocking, the number of multiplies and adds required for this adaptation still stands as a major contributor to silicon area.
Analogously for the quadrature phase side, adder 440 subtracts the result of multiplier 420 from the result of multiplier 460, which form the products of quadrature-phase regressor data and in-phase error term, and in-phase regressor data and quadrature-phase error term, respectively. The stepsize μ is applied to the output of adder 440 in barrel shift 480 to form the update term for the quadrature-phase adaptive filter coefficient.
Observe that four multipliers and two adders are needed to form the update terms according to this standard prior art technique.
After some algebraic manipulations, however, the adaptation equations can be re-written as
$$\vec{f}^{\,I}_{n+1} = \vec{f}^{\,I}_{n} - \vec{f}^{\,I}_{n} \gg \rho - \left( (\vec{r}^{\,I}_{n-\partial} + \vec{r}^{\,Q}_{n-\partial}) \cdot e^{I}_{n-\partial} + \vec{r}^{\,Q}_{n-\partial} \cdot (e^{Q}_{n-\partial} - e^{I}_{n-\partial}) \right) \gg \mu$$

$$\vec{f}^{\,Q}_{n+1} = \vec{f}^{\,Q}_{n} - \vec{f}^{\,Q}_{n} \gg \rho - \left( -(\vec{r}^{\,I}_{n-\partial} + \vec{r}^{\,Q}_{n-\partial}) \cdot e^{I}_{n-\partial} + \vec{r}^{\,I}_{n-\partial} \cdot (e^{Q}_{n-\partial} + e^{I}_{n-\partial}) \right) \gg \mu$$
In these equations, the number of multiplies needed for the update term (inner product of regressor data and error term) is reduced from four to three for each adaptive filter coefficient, at the expense of one additional adder. Observe that the product of the in-phase error term with the sum of the in-phase and quadrature-phase components of the regressor data is common to both equations. Since a multiplier requires more silicon area than an adder, there is a potential for significant implementation savings. Sums and differences of the error terms are used, rather than just the error terms themselves, and are calculated once, in common for all equalizer coefficients.
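The equivalence of the four-multiplier and three-multiplier update terms can be checked numerically. The sketch below works on one coefficient with integer I/Q components, before the stepsize shift is applied. It uses the error difference (e^Q − e^I) for the in-phase term and the error sum (e^Q + e^I) for the quadrature-phase term, which is the pairing under which the algebra reduces to the four-multiplier form. All function names are hypothetical.

```python
from itertools import product


def update_terms_4mult(r_i, r_q, e_i, e_q):
    """Direct form: four multiplies, two adds per coefficient."""
    up_i = r_i * e_i + r_q * e_q
    up_q = r_i * e_q - r_q * e_i
    return up_i, up_q


def prepare_error(e_i, e_q):
    """Sum and difference errors, computed once and shared by all coefficients."""
    return e_q + e_i, e_q - e_i  # (e_sum, e_diff)


def update_terms_3mult(r_i, r_q, e_i, e_sum, e_diff):
    """Reduced form: three multiplies, three adds per coefficient."""
    shared = (r_i + r_q) * e_i        # multiply 1, common to both outputs
    up_i = shared + r_q * e_diff      # multiply 2
    up_q = r_i * e_sum - shared       # multiply 3
    return up_i, up_q


# Exhaustive check over a small integer grid.
for r_i, r_q, e_i, e_q in product(range(-4, 5), repeat=4):
    e_sum, e_diff = prepare_error(e_i, e_q)
    assert update_terms_3mult(r_i, r_q, e_i, e_sum, e_diff) == \
        update_terms_4mult(r_i, r_q, e_i, e_q)
```

The per-coefficient cost moves from four multipliers and two adders to three multipliers and three adders, while `prepare_error` runs only once per update regardless of filter length.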
To illustrate the implementation of these reduced-complexity equations as for a silicon ASIC, the following finite-precision notation is adopted. A signal is represented in two's complement notation by <m,n,s> where m is the total number of bits used to represent the signal, n is the number of integer bits, and s=0 denotes an unsigned number, while s=1 denotes a signed number. As an example, consider a signal represented as <6,2,1>, which can be written as
$$s\;2^{1}\,2^{0}\;.\;2^{-1}\,2^{-2}\,2^{-3} \;=\; sxx.xxx$$
and has a range of values from $-4$ to $+(4 - 2^{-3})$.
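The range implied by the <m,n,s> notation can be computed directly. The sketch below assumes that the number of fractional bits is m − n − s, which is consistent with the <6,2,1> example; the function name `fixed_point_range` is hypothetical. Exact rational arithmetic is used so that negative integer-bit counts are also handled without rounding.

```python
from fractions import Fraction


def fixed_point_range(m, n, s):
    """Return (min, max, lsb) for a two's complement <m,n,s> signal.

    m: total bits, n: integer bits (may be negative), s: 1 if signed else 0.
    """
    frac_bits = m - n - s
    lsb = Fraction(2) ** (-frac_bits)        # weight of the least significant bit
    top = Fraction(2) ** n                   # 2^n
    lowest = -top if s else Fraction(0)
    highest = top - lsb
    return lowest, highest, lsb


print(fixed_point_range(6, 2, 1))  # (Fraction(-4, 1), Fraction(31, 8), Fraction(1, 8))
```

The <6,2,1> case returns −4 and 31/8 = 4 − 2^−3, matching the range stated above.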
Multiplier 730 multiplies the in-phase part of the regressor sample with the sum error esum(k) from error term preparation module 520 and the result is added to the <25,8,1> output of multiplier 710 in adder 725, producing the <26,9,1> value that will be operated on by stepsize and leakage and used to update the quadrature-phase component of the complex-valued adaptive coefficient.
The update of the in-phase component of the complex-valued coefficient is next described; update of the quadrature-phase component is analogous, based on the output of adder 725 instead of adder 715. The <26,9,1> output of adder 715 is shifted down by an integer number of bits in barrel shift 735 according to the stepsize value μ. The stepsize value μ is an unsigned integer between 0 and 15, represented as <4,4,0>, and assigns a shift value of (μ+11) to barrel shift 735 for nonzero μ, thus accomplishing the shift range of 12 to 26. If μ is zero, the output of barrel shift 735 is zeroed. The output of barrel shift 735 extends to <40,−3,1> to accommodate the complete shift range possible, and is truncated to <29,−3,1> in format 740. This <29,−3,1> output of format 740 is applied to the adder 745 which produces the updated coefficient.
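The bit-width bookkeeping for the barrel shift can be reproduced with a short calculation. The sketch below assumes that a variable shift down must keep the most significant bit position of the smallest shift and the least significant bit position of the largest shift, and that fractional bits equal m − n − s. The helper names are hypothetical.

```python
def shift_amount(control, offset):
    """Map a 4-bit control value to a shift; None means the output is zeroed."""
    return None if control == 0 else control + offset


def format_after_shift_range(fmt, min_shift, max_shift):
    """Format <m,n,s> needed to hold a value shifted down by min_shift..max_shift bits."""
    m, n, s = fmt
    frac_bits = m - n - s
    new_n = n - min_shift             # integer bits left after the smallest shift
    new_frac = frac_bits + max_shift  # fractional bits needed for the largest shift
    return (new_frac + new_n + s, new_n, s)


# Stepsize path: mu in 1..15 gives shifts 12..26 applied to a <26,9,1> input.
shifts = [shift_amount(mu, 11) for mu in range(1, 16)]
stepsize_fmt = format_after_shift_range((26, 9, 1), min(shifts), max(shifts))
print(stepsize_fmt)  # (40, -3, 1)
```

Under these assumptions, the full-range result agrees with the <40,−3,1> format quoted for the barrel shift output, before truncation to <29,−3,1>.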
In this adaptation circuitry, the coefficients are updated and stored at a higher precision than what is used in the filtering process of the adaptive filter; this prior art implementation detail helps save silicon area, compared to constraining the coefficient to the same bit width in filtering and adaptation processes, as studied in “Effects of finite bit precision on the constant modulus algorithm,” by L. Litwin et al., in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Phoenix, Ariz., April, 1999.
The <35,3,1> updated coefficient produced by adder 745 is truncated to <32,1,1> in format 760 and stored in register 765. Format 770 truncates the stored coefficient to <12,1,1>, and this result is used in the filtering process and for leakage in barrel shift 775. Leakage value ρ assigns a shift value of (ρ+16) to barrel shift 775 for nonzero ρ, thus accomplishing the shift range of 17 to 31. If ρ is zero, the output of barrel shift 775 is zeroed. The output of barrel shift 775 extends to <26,−16,1> to accommodate the complete shift range possible, and is truncated to <16,−16,1> in format 755. This <16,−16,1> output of format 755 is summed with the <32,1,1> stored coefficient from register 765 in adder 750, and the <34,2,1> result is supplied to adder 745, which produces the next updated coefficient.
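The adder output formats in this path can be cross-checked with a simple rule. The sketch below assumes a full-precision adder: the result keeps the larger number of fractional bits and grows by one integer bit over the larger input, with fractional bits taken as m − n − s. The function name `add_format` is hypothetical, and the stored coefficient is taken in the <32,1,1> format produced by format 760.

```python
def add_format(a, b):
    """Format <m,n,s> of a full-precision sum of two fixed-point signals."""
    (ma, na, sa), (mb, nb, sb) = a, b
    frac_bits = max(ma - na - sa, mb - nb - sb)  # keep the finer resolution
    n = max(na, nb) + 1                          # one bit of growth for carry
    s = max(sa, sb)                              # signed if either input is signed
    return (frac_bits + n + s, n, s)


# Leakage sum: stored coefficient plus truncated leakage term.
leak_sum = add_format((32, 1, 1), (16, -16, 1))
# Coefficient update: leakage sum plus truncated stepsize term.
updated = add_format(leak_sum, (29, -3, 1))
print(leak_sum, updated)  # (34, 2, 1) (35, 3, 1)
```

Under these assumptions the two sums reproduce the <34,2,1> and <35,3,1> formats given in the text.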
According to the present invention, for an adaptive filter using real-valued signal processing, the error term from error term generation module 510 is split into mantissa and exponent ($e = e^{\mathrm{Mant}} \cdot 2^{-e^{\mathrm{Exp}}}$) to reduce the multiplier sizes needed to update the adaptive filter coefficients, and the adaptation equation is written as
$$\vec{f}_{n+1} = \vec{f}_{n} - \vec{f}_{n} \gg \rho - \left( \vec{x}_{n-\partial} \cdot e^{\mathrm{Mant}}_{n-\partial} \right) \gg \left( \mu + e^{\mathrm{Exp}}_{n-\partial} \right)$$
The mantissa portion of the error term is reduced-precision compared to the error term from error term generation module 510, and the exponent portion of the error term is a shift by a power of two, so is added to the stepsize shift value, and a single barrel shift can accomplish application of stepsize and error term exponent. Thus, silicon area is saved.
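One possible way to form the mantissa and exponent is sketched below. It is an illustration under stated assumptions, not the circuit of the invention: the error is taken as a 16-bit two's complement integer, the exponent is the count of redundant sign bits, and the mantissa is the top 6 bits of the left-justified error, obtained by truncation. The widths and all function names are hypothetical.

```python
def split_error(e_int, width=16, mant_bits=6):
    """Split a signed `width`-bit error into (mantissa, exponent).

    e_int is approximately (mantissa << (width - mant_bits)) >> exponent.
    """
    magnitude_bits = (e_int if e_int >= 0 else ~e_int).bit_length()
    exp = width - 1 - magnitude_bits               # redundant sign bits
    mant = (e_int << exp) >> (width - mant_bits)   # keep the top mant_bits
    return mant, exp


def update_term_full(x_int, e_int, mu, width=16, mant_bits=6):
    """Reference: full-width multiply, then the stepsize shift."""
    return (x_int * e_int) >> (mu + width - mant_bits)


def update_term_split(x_int, e_int, mu, width=16, mant_bits=6):
    """Narrow multiply by the mantissa; a single shift applies mu plus exponent."""
    mant, exp = split_error(e_int, width, mant_bits)
    return (x_int * mant) >> (mu + exp)


print(split_error(300))                  # (18, 6)
print(update_term_full(1000, 300, 2))    # 73
print(update_term_split(1000, 300, 2))   # 70
```

The split form replaces a 16-bit error operand with a 6-bit mantissa at the multiplier and folds the exponent into the existing stepsize shift, at the cost of a small truncation error in the update term (70 versus 73 in this example).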
In this adaptation circuitry, the coefficients are updated and stored at a higher precision than what is used in the filtering process of the adaptive filter; this prior art implementation detail helps save silicon area, compared to constraining the coefficient to the same bit width in filtering and adaptation processes, as studied in “Effects of finite bit precision on the constant modulus algorithm,” by L. Litwin et al., in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Phoenix, Ariz., April, 1999.
The <35,3,1> updated coefficient produced by adder 925 is truncated to <32,1,1> in format 940 and stored in register 945. Format 950 truncates the stored coefficient to <12,1,1>, and this result is used in the filtering process and for leakage in barrel shift 955. Leakage value ρ assigns a shift value of (ρ+16) to barrel shift 955 for nonzero ρ, thus accomplishing the shift range of 17 to 31. If ρ is zero, the output of barrel shift 955 is zeroed. The output of barrel shift 955 extends to <26,−16,1> to accommodate the complete shift range possible, and is truncated to <16,−16,1> in format 935. This <16,−16,1> output of format 935 is summed with the <32,1,1> stored coefficient from register 945 in adder 930, and the <34,2,1> result is supplied to adder 925, which produces the next updated coefficient.
One skilled in the art would understand that the equations described herein may include scaling, change of sign, or similar constant modifications that are not shown for simplicity. One skilled in the art would realize that such modifications can be readily determined or derived for the particular implementation. Thus, the described equations may be subject to such modifications, and are not limited to the exact forms presented herein.
As would be apparent to one skilled in the art, the various functions of equalization, signal combining, carrier correction, and automatic gain control may be implemented with circuit elements or may also be implemented in the digital domain as processing steps in a software program. Such software may be employed in, for example, a digital signal processor, micro-controller, or general-purpose computer.
The present invention can be embodied in the form of methods and apparatuses for practicing those methods. The present invention can also be embodied in the form of program code embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. The present invention can also be embodied in the form of program code, for example, whether stored in a storage medium, loaded into and/or executed by a machine, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits.
It will be further understood that various changes in the details, materials, and arrangements of the parts which have been described and illustrated in order to explain the nature of this invention may be made by those skilled in the art without departing from the principle and scope of the invention as expressed in the following claims.
This application claims the benefit of Provisional Application No. 60/966,620, filed Aug. 29, 2007.
Number | Date | Country
---|---|---
60966620 | Aug 2007 | US