1. Field of the Invention
The present invention relates to filter adaptation, for example in communications such as wireline communications.
2. State of the Art
Broadband communications solutions, such as HDSL2/G.SHDSL (High-Speed Digital Subscriber Line), are increasingly in demand. The ability to achieve high data rates (e.g., 1.5 Mbps and above) between customer premises and the telephone system central office over existing (unconditioned) telephone lines requires exacting performance. Various components of a high-speed modem that contribute to this performance require training, e.g., a timing section (PLL, or phase lock loop), an adaptive equalizer, and an adaptive echo canceller. Typically, these components are all trained in serial fashion, one after another, during an initial training sequence in which known data is transmitted between one end of the line and the other.
Equalization is especially critical for HDSL2/G.SHDSL modems, which are required to operate over various line lengths and wire models and wirelines with and without bridge taps, with extremely divergent cross-talk scenarios. In general, intersymbol interference (ISI), which equalization aims to eliminate, is the limiting factor in XDSL communications. Hence, good equalization, characterized by the ability to accurately compute the optimal channel equalizer coefficients at the start-up phase of the modem and adaptively update those coefficients to accommodate any change in the level of cross-talk, is essential to any HDSL2/G.SHDSL system.
Known training methods for high-speed modems suffer from various disadvantages. Existing commercial products invariably use a Least Mean Squares (LMS) training algorithm, which is assumed to converge to an optimal training solution. The LMS algorithm is well-known and has generally been found to be stable and easy to implement. Conventional wisdom holds that the steady-state performance of LMS cannot be improved upon. Despite the widespread use of LMS and its attendant advantages, the adequacy of performance of LMS is being tested by the performance requirements of high-speed modems.
Nor are the alternatives to LMS particularly appealing. Other proposed algorithms have chiefly been of academic interest. The Recursive Least Squares (RLS) algorithm, for example, requires a far shorter training time than LMS (potentially one tenth the training time needed for LMS), but RLS entails exceedingly greater computational complexity. If N is the total number of taps in an adaptive filter, then the complexity of RLS is roughly N², as compared to 2N for LMS. Also, RLS is less familiar and less tractable, suffering from stability problems.
An improved RLS algorithm (“fast RLS”) considerably reduces the computational complexity of RLS, from N² to 28N. The original fast RLS algorithm is described in Falconer and Ljung, Application of Fast Kalman Estimation to Adaptive Equalization, IEEE Transactions on Communications, Vol. COM-26, No. 10, Oct. 1978, incorporated herein by reference. The fast RLS algorithm, however, requires that training be performed on contiguous data symbols. If training is performed “on-line,” then a high-performance processor is required to perform training computations at a rate sufficient to keep pace with the data rate, e.g., 1.5 Mbps or greater. Although the computational demand (demand for MIPS) “spikes up” during training, once training is completed, computational demands are modest. If training is performed “off-line” using stored data samples, then the processor need not keep up with the data rate, reducing peak performance requirements. However, a potentially long sequence of training data must be stored to satisfy the requirement of the algorithm for contiguous data, requiring a sizable memory. Again, the memory requirement, like training itself, is transient. Once training has been completed, the need for such a large memory is removed.
Apart from training, because communications channels vary over time, continuous or periodic filter adaptation is required. In the case of rapidly varying channel conditions, as in wireless communications and especially mobile wireless communications, and in the case of especially long filters relative to adaptation processing power, the use of RLS is indicated. In wireline communications, these conditions are typically not present. Even in the demanding case of HDSL2/G.SHDSL, filter lengths are moderate and channel variation can be considered to be slow. To applicant's knowledge, all wireline modems use LMS “on-line” for non-training filter adaptation.
Although the error criteria used by the LMS and RLS algorithms differ, the prevalent mathematical analysis of these algorithms suggests that the algorithms converge to the same solution, albeit at different rates. LMS uses mean squared error, a statistical average, as the error criterion. RLS eliminates such statistical averaging. Instead, RLS uses a deterministic approach based on squared error (note the absence of the word mean) as the error criterion. In effect, instead of the statistical averaging of LMS, RLS substitutes temporal averaging, with the result that the filter depends on the number of samples used in the computation. Although the prevalent mathematical analysis predicts equivalent performance for the two algorithms, the mathematical analysis for LMS is approximate only. Although a mathematically exact analysis of LMS has recently been advanced, the overwhelming complexity of that analysis defies any meaningful insight into the behavior of the algorithm and requires numeric solution.
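For reference, the two error criteria contrasted above are commonly written as follows in standard adaptive-filter notation. The symbols are assumptions introduced here for illustration and are not taken from the specification: w is the coefficient vector, u(i) is the vector of filter inputs at time i, d(i) is the desired symbol, and λ (0 < λ ≤ 1) is an exponential forgetting factor.

```latex
% LMS: statistical (ensemble) average of the squared error
J_{\mathrm{MSE}}(\mathbf{w}) = E\!\left[\, \lvert d(n) - \mathbf{w}^{T}\mathbf{u}(n) \rvert^{2} \right]

% RLS: deterministic, exponentially weighted sum over the n samples observed so far
J_{\mathrm{LS}}(\mathbf{w}, n) = \sum_{i=1}^{n} \lambda^{\,n-i}\, \lvert d(i) - \mathbf{w}^{T}\mathbf{u}(i) \rvert^{2}
```

With λ = 1 the second expression reduces to the plain sum of squared errors. In either case the minimizing w depends on n, which is the dependence on the number of samples noted above.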
There remains a need, particularly in high-speed wireline communications, for a filter adaptation solution that overcomes the foregoing disadvantages, i.e., that achieves greater optimality without requiring undue computational resources.
The present invention, generally speaking, uses adaptation based on a least squares error criterion to improve performance of a wireline modem or the like. In accordance with one aspect of the invention, a high-speed, broadband, wireline modem includes an adaptive equalizer having both a training mode and a decision-directed, non-training mode, the adaptive equalizer including a memory for storing received signal samples; a forward path coupled to receive the signal samples, the forward path including a forward filter and a decision element; a feedback path coupled between an output of the decision element and an input of the decision element, the feedback path including a feedback filter; wherein the combined length of the forward filter and the feedback filter is moderate relative to adaptation processing power; and an adaptation circuit or processor for adapting the forward filter and the feedback filter based on a least squares error criterion, as distinguished from a least mean squares error criterion. A lower noise floor is thereby achieved. The resulting improved noise margin may be used to “buy” greater line length, better quality of service (QoS), higher speed using denser symbol constellations, greater robustness in the presence of interference or noise, lower-power operation (improving interference conditions) or any combination of the foregoing. In accordance with another aspect of the invention, an adaptation algorithm based on the least squares error criterion is provided for use during training of a high-speed, broadband modem. The algorithm converges to a more optimal solution than LMS. Furthermore, the algorithm achieves a high level of robustness with decreased computational complexity as compared to known algorithms. The algorithm is well-suited for fixed-point implementation. Significantly, unlike known algorithms, the algorithm allows for reinitialization and the use of non-contiguous data.
This feature allows for a wide spectrum of system initialization strategies to be followed, including strategies in which training of multiple subsystems is interleaved to achieve superior training of multiple subsystems and hence the overall system, strategies tailored to meet a specified computational budget, etc.
The present invention may be further understood from the following description in conjunction with the appended drawings. In the drawings:
Referring to
Since ISI, which equalization aims to eliminate, is typically the limiting factor in XDSL communications, the focus of the following description will be equalizer training. The same principles, however, may be applied to the training of various different communications subsystems.
Referring to
An important, even startling, discovery of the present inventors is that RLS-type algorithms, apart from converging faster, converge to a lower noise floor than the LMS algorithm. That is, better equalization can be performed using the RLS-type algorithms than with LMS. This result is illustrated in FIG. 3. Only in the exacting environment of high-speed, wide-band wireline modems such as HDSL2/G.SHDSL does this important difference come to the fore. In fact, experiments have shown that in this environment, even if an adaptive filter is set to a near-optimal solution obtained using an RLS-type algorithm, if the LMS algorithm is then used, the filter settings will actually diverge from the near-optimal solution. A great incentive therefore exists to use an RLS-type algorithm instead of the prevalent LMS algorithm. Impediments to the use of RLS-type algorithms in this environment include computational complexity and instability.
Although the computational complexity of the fast RLS algorithm is greatly reduced, it remains significant. The computational complexity of adaptation is measured in terms of the number of multiplications and/or divisions required per filter coefficient times the total number of filter coefficients N for the structure. Although the present invention may be used with equalizers of other structures and in other applications of adaptive filters, the invention will be described with respect to the exemplary embodiment of FIG. 2.
Whereas the original fast RLS algorithm requires 28N multiplications and matrix inversion, the computational complexity of the present “RLC-fast” algorithm is 22N multiplications and involves 2 divisions. This improvement in computational efficiency is achieved by efficiently rewriting the original algorithm. Note that there are algorithms with computational complexity as low as 17N; however, they are very susceptible to error accumulation, and are hard to stabilize without the use of additional correction terms. In the case of a fixed-point equalizer implementation, stability is crucial for overall system reliability. The computational complexity of RLC-fast is reduced without significantly degrading the stability of the algorithm.
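The multiplication counts quoted in this description can be compared for a concrete filter length with a short calculation. The following is a minimal sketch only; the function name is hypothetical, the counts are the approximate per-update figures stated in the text (2N, N², 28N, 22N), and divisions and matrix inversion are not counted.

```python
def multiplications_per_update(n_taps):
    """Approximate multiplications per coefficient update, using the figures quoted in the text."""
    return {
        "LMS": 2 * n_taps,
        "RLS": n_taps ** 2,
        "fast RLS": 28 * n_taps,
        "RLC-fast": 22 * n_taps,
    }


# Example filter length of 100 taps (an illustrative choice).
counts = multiplications_per_update(100)
for name, count in counts.items():
    print(f"{name:>9}: {count} multiplications per update")

# Fractional reduction of RLC-fast relative to the original fast RLS.
saving = 1 - counts["RLC-fast"] / counts["fast RLS"]
print(f"RLC-fast saves about {saving:.0%} of the multiplications of fast RLS")
```

For any N the ratio 22N/28N is the same, so the relative saving over the original fast RLS does not depend on the filter length.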
Referring to
Referring to
Derivation of the RLC-fast algorithm from the original algorithm and the computational advantages of the RLC-fast algorithm are described in detail in Appendix B.
A fixed-point implementation of the RLC-fast algorithm is desirable to reduce computational load and hence increase the speed of the algorithm, as well as to avoid the cost and increased power consumption of a floating-point processor. Because of the underlying stability issues of RLS-type algorithms, such a fixed-point implementation must be carefully considered. To avoid saturation of the variables, the binary point cannot be assumed to lie immediately after the sign bit (i.e., all numbers within [−1, 1)), since, for some of the internal variables, the actual values may become larger than 1.
Key elements for successful implementation of the RLC-fast algorithm include: (1) appropriate scaling of the input variables; (2) the position of the binary point for internal variables; (3) efficient internal scaling of the variables after multiplication and division to reduce loss of precision; (4) complete analysis of the dynamic range of the various internal variables; and (5) judicious choice of delta (δi) and lambda (λ) for convergence speed and stability.
A currently preferred implementation assumes 32-bit precision for all the variables, with all the numbers being of signed integer form. The integer numbers are given a floating point interpretation in which the leading bit is the sign bit, followed by a 5-bit integer part and a 26-bit fractional part. Multiplication and division are performed assuming the foregoing interpretation of the integer variables. There occur two divisions per update. Both are computed as 1/(1+x) instead of 1/x to reduce the loss in precision.
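The number format described above (sign bit, 5-bit integer part, 26-bit fractional part in a 32-bit signed word) can be illustrated as follows. This is a sketch under stated assumptions, not the implementation of Appendix A: the helper names are hypothetical, and saturation on overflow is an assumption made here for illustration.

```python
FRAC_BITS = 26                                   # 26-bit fractional part
WORD_MIN, WORD_MAX = -(1 << 31), (1 << 31) - 1   # 32-bit signed word


def saturate(value):
    """Clamp to the 32-bit signed range (assumed overflow behaviour)."""
    return max(WORD_MIN, min(WORD_MAX, value))


def to_fixed(x):
    """Real number -> 32-bit integer with the binary point after bit 26."""
    return saturate(int(round(x * (1 << FRAC_BITS))))


def to_float(q):
    """32-bit integer -> the real number it represents."""
    return q / (1 << FRAC_BITS)


def fixed_mul(a, b):
    """Multiply two fixed-point words; the shift restores the binary point."""
    return saturate((a * b) >> FRAC_BITS)


def fixed_div(a, b):
    """Divide two fixed-point words (b must be non-zero); pre-shifting keeps 26 fractional bits.

    Python's // rounds toward negative infinity, which a DSP may not do.
    """
    return saturate((a << FRAC_BITS) // b)


product = to_float(fixed_mul(to_fixed(1.5), to_fixed(-2.25)))
print(product)  # -3.375
```

With this layout the representable range is [−32, 32) with a resolution of 2⁻²⁶, so internal variables whose magnitude exceeds 1 can be held without saturating.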
A more detailed description of the RLC-fast algorithm is given in Appendix A (implemented in fixed point arithmetic for the DSP TI-C6x).
Due to the high data rate of the HDSL2/G.SHDSL system, for moderate-size problems (N about 100), the RLC-fast algorithm, even with its reduced complexity, poses a high computational burden on a typical processor (say, an X MIPS processor). In many modems, RLC-fast will be executed only once at the start-up phase of the modem and will not be used in the steady-state, which is the normal operating state for the modem.
Hence, although it may be feasible to deploy a high-speed, power-hungry DSP for on-line execution of RLC-fast, such a measure adversely impacts power consumption and may not be cost effective. As a result, off-line implementation of RLC-fast will often be the preferred alternative.
However, off-line implementation itself raises problems. The RLS-type algorithm requires a certain data length in order to converge to a near optimal value. The convergence time is a function of the so-called forgetting factor. An aggressive choice of the forgetting factor can be used to reduce the required data length but at the cost of stability.
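As a rough guide to this trade-off, a widely used rule of thumb for exponentially weighted least squares puts the effective memory of the algorithm at about 1/(1 − λ) samples, where λ is the forgetting factor. The sketch below merely evaluates that rule of thumb; the function name is hypothetical, and the rule is an approximation that is not taken from the specification.

```python
def effective_memory(lam):
    """Approximate number of past samples weighted significantly: 1 / (1 - lambda)."""
    if not 0.0 < lam < 1.0:
        raise ValueError("forgetting factor must lie strictly between 0 and 1")
    return 1.0 / (1.0 - lam)


for lam in (0.9, 0.99, 0.999):
    print(f"lambda = {lam}: about {effective_memory(lam):.0f} samples")
```

A forgetting factor closer to 1 averages over more samples, which calls for a longer data record, whereas a smaller value shortens the record at the cost of a noisier and less stable estimate.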
A reasonable choice for the forgetting factor may require a long data length (say, 100N) for convergence. This in turn implies a large storage requirement even for a moderate size problem. Once again, if this memory is only used during the start-up phase, a straightforward implementation wastes large amounts of silicon and results in inefficient design.
The original fast RLS algorithm offers no solution to the foregoing problem. Referring more particularly to
To circumvent the requirement of a contiguous data stream, RLC-fast uses a re-initialization scheme that allows the use of a non-contiguous data block without restarting the algorithm. At start-up, the algorithm is initialized in the usual way. However, the algorithm can be stopped at any time and started at a later time with a new initialization. This manner of operation is illustrated in FIG. 8. No difference in performance is observed if individual data blocks are not too small (say, no smaller than 10N). Hence, storage requirements may be reduced by an order of magnitude (e.g., 10N instead of 100N).
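The idea of carrying adaptation state across non-contiguous blocks can be sketched with a conventional exponentially weighted RLS recursion. This is not the RLC-fast algorithm of the appendices; the conventional O(N²) recursion is substituted here purely for illustration, and the class name, the parameters lam and delta, and the three-tap test response h are all hypothetical. The coefficient vector and the inverse-correlation estimate survive between blocks, while the input delay line is rebuilt from each new block.

```python
import random


class BlockRLS:
    """Conventional exponentially weighted RLS whose state survives between blocks."""

    def __init__(self, n_taps, lam=0.99, delta=0.01):
        self.n = n_taps
        self.lam = lam
        self.w = [0.0] * n_taps
        # Usual start-up initialization: inverse correlation estimate = I / delta.
        self.P = [[(1.0 / delta if r == c else 0.0) for c in range(n_taps)]
                  for r in range(n_taps)]

    def _update(self, u, d):
        n, lam, P = self.n, self.lam, self.P
        Pu = [sum(P[r][c] * u[c] for c in range(n)) for r in range(n)]
        denom = lam + sum(u[r] * Pu[r] for r in range(n))
        k = [v / denom for v in Pu]                          # gain vector
        err = d - sum(self.w[r] * u[r] for r in range(n))    # a priori error
        self.w = [self.w[r] + k[r] * err for r in range(n)]
        # P is symmetric, so u^T P equals (P u)^T.
        self.P = [[(P[r][c] - k[r] * Pu[c]) / lam for c in range(n)]
                  for r in range(n)]
        return err

    def process_block(self, x, d):
        """Adapt on one stored block; w and P carry over, the delay line does not."""
        for i in range(self.n - 1, len(x)):
            u = [x[i - j] for j in range(self.n)]  # regressor built from this block only
            self._update(u, d[i])


def make_block(rng, h, length):
    """Random two-level input and the noise-free output of the response h."""
    x = [rng.choice((-1.0, 1.0)) for _ in range(length)]
    d = [sum(h[j] * x[i - j] for j in range(len(h)) if i - j >= 0)
         for i in range(length)]
    return x, d


rng = random.Random(0)
h = [0.5, -0.3, 0.2]          # hypothetical response to be identified
rls = BlockRLS(n_taps=3)
for _ in range(2):            # two blocks captured at unrelated times
    x, d = make_block(rng, h, 40)
    rls.process_block(x, d)
print([round(v, 3) for v in rls.w])
```

Because only one block is held in memory at a time, the buffer is sized for a single block rather than for the full record needed for convergence.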
The particulars of re-initialization are illustrated in FIG. 9. Instead of setting the intermediate variables to zero or a scaled identity matrix, the previous values are used for all variables except Xfast. The variables Afast, Ffast, Kfast, bn, Dfast, and Cfast are all stored for this purpose.
The foregoing re-initialization capability allows for a store/process mode of operation. More particularly, even with the reduced complexity of RLC-fast, the amount of computation required for real-time processing of moderate size problems can be prohibitive for most DSPs due to the high data rate of the system. To alleviate this problem, a store/process mode of operation is followed in which, during the first half of a cycle, a small block of data (e.g., size 10N) is stored, and during the second half of the cycle, the data is processed to update the filter coefficients. Instead of operating in real-time, since the data is stored, each update need not be finished within the sample time T. Instead, the computation can be distributed over multiple sample periods.
One approach is to partition the computation of the update for each data sample in small enough segments such that an individual segment can be finished in one sample time. The smaller the partition, the less processing is required each sample period, but the longer the total time to finish the update. Hence, store/process operation, along with partitioning of the update computation, provides a flexible mechanism that allows for trade-off between processing load and total time to process a data block. Without the capability of re-initialization, this flexibility is not obtainable.
The same flexibility may be extended from the adaptive equalizer or other isolated sub-system to the system as a whole, in such a way as to achieve not only great flexibility but also improved performance. In reality, the performance of each sub-system is interdependent on the performance of other sub-systems and should not be viewed in isolation. Referring to
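The trade-off between per-sample processing load and total processing time can be made concrete with a small calculation. The sketch below is a simplification under stated assumptions: the function and parameter names are hypothetical, each update is taken to cost 22N multiplications as quoted earlier, the two divisions and any overhead are ignored, and the update is assumed to split into equal segments.

```python
import math


def store_process_schedule(n_taps, block_len, mults_per_sample_period, mults_per_tap=22):
    """Return (sample periods per update, sample periods to process the whole block)."""
    mults_per_update = mults_per_tap * n_taps
    periods_per_update = math.ceil(mults_per_update / mults_per_sample_period)
    return periods_per_update, periods_per_update * block_len


# N = 100 taps, one stored block of 10N = 1000 samples.
for budget in (2200, 550, 220):
    per_update, total = store_process_schedule(100, 1000, budget)
    print(f"{budget} multiplications/period: {per_update} periods per update, "
          f"{total} periods per block")
```

Halving the per-period budget roughly doubles the time needed to work through a stored block, which is the flexibility described above.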
It will be appreciated by those of ordinary skill in the art that the invention can be embodied in other specific forms without departing from the spirit or essential character thereof. The presently disclosed embodiments are therefore considered in all respects to be illustrative and not restrictive. The scope of the invention is indicated by the appended claims rather than the foregoing description, and all changes which come within the meaning and range of equivalents thereof are intended to be embraced therein.