The present invention relates to digital communications. More specifically, the present invention relates to low latency architectures for high-throughput Viterbi decoders used in, for example and without limitation, digital communication systems, magnetic storage systems, serializer-deserializer (SERDES) applications, backplane transceivers, and high-speed wireless transceivers.
Convolutional codes are widely used in modern digital communication systems, such as satellite communications, mobile communications systems, and magnetic storage systems. Other applications include serializer/deserializers (SERDES), backplane transceivers, and high-speed wireless transceivers. These codes provide relatively low-error rate data transmission. The Viterbi algorithm is an efficient method for maximum-likelihood (ML) decoding of convolutional codes.
In hardware implementation, a conventional Viterbi decoder is composed of three basic computation units: a branch metric unit (BMU), an add-compare-select unit (ACSU), and a survivor path memory unit (SMU). The BMU and the SMU are composed only of feed-forward paths. It is relatively easy to shorten the critical path in these two units by utilizing pipelining techniques. However, the feedback loop in the ACS unit is a major bottleneck for the design of a high-speed Viterbi decoder. A look-ahead technique, which combines several trellis steps into one trellis step in time sequence, has been used for breaking the iteration bound of the Viterbi decoding algorithm. One iteration in an M-step look-ahead ACS unit is equivalent to M iterations in the non-look-ahead, or sequential, implementation. Thus, the speed requirement on the ACS unit for a given decoding data rate is reduced by a factor of M. However, the total number of parallel branch metrics in the trellis increases exponentially as M increases linearly. On the other hand, the latency of the ACS pre-computation is relatively long due to the M-step look-ahead, especially when M is large. Generally, the ACS pre-computation latency increases linearly with respect to M.
What are needed are methods and systems to reduce the complexity and latency of the ACS pre-computation part in high-throughput Viterbi decoders.
Convergence of parallel trellis paths of length K, where K is the encoder constraint length, is taught in U.S. Pat. No. 7,308,640, titled, “Low-Latency Architectures for High-Throughput Viterbi Decoders,” issued Dec. 11, 2007, to Parhi, et al., and U.S. Provisional Patent Application No. 60/496,307, titled, “Low-Latency Architectures for High-Throughput Viterbi Decoders,” filed on Aug. 19, 2003, incorporated herein by reference above (hereinafter, the “'640 patent” and the “'307 application,” respectively).
The '640 patent and the '307 application teach to combine K trellis steps in a first layer into M/K sub-trellis steps, and to combine the resulting sub-trellises in a tree structure. This K-nested layered M-step look-ahead (LLA) method can efficiently reduce latency.
Disclosed herein are methods and systems that combine K1 trellis steps in a first layer of a K-nested layered M-step Viterbi decoder, where K ≤ K1 < M. Parallel paths exist when K1 ≥ K. Thus, ACS pre-computation of K1 stages can be combined in the first layer. Although the increased look-ahead steps in the first layer may increase latency in the first layer, the total number of layers can be reduced, thus decreasing the overall latency. A K1-nested LLA, as disclosed herein, can be implemented to reduce hardware complexity, for example, when K is relatively large.
Further embodiments, features, and advantages of the present invention, along with structure and operation of various embodiments of the present invention, are discussed in detail below with reference to the accompanying figures.
The present invention is described with reference to the accompanying figures. The accompanying figures, which are incorporated herein and form part of the specification, illustrate the present invention and, together with the description, further serve to explain the principles of the invention and to enable a person skilled in the relevant art to make and use the invention.
In conventional M-step look-ahead Viterbi decoding, M look-ahead steps are performed to move the combined branch metrics computation outside of the ACS recursion loop. This allows the ACS loop to be pipelined or computed in parallel. Thus, the decoding throughput rate can be increased.
λ̂_00^n = λ_00^n + λ_00^(n+1)   EQ. (1)
Compared with the original trellis diagram 102 in
γ_0^(n+2) = max{γ_0^n + λ̂_00^n, γ_1^n + λ̂_10^n, γ_2^n + λ̂_20^n, γ_3^n + λ̂_30^n}   EQ. (2)
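The two-step branch-metric combining of EQ. (1) and the four-way add-compare-select of EQ. (2) can be sketched in software as follows. The function names and the dense-matrix representation are illustrative assumptions for clarity, not the patent's hardware mapping; a branch that does not exist in the trellis can be marked with float("-inf") so it never wins a max.

```python
def combine_2step(bm0, bm1):
    """EQ. (1), generalized over intermediate states m:
    lambda_hat[i][j] = max over m of (bm0[i][m] + bm1[m][j]).
    When only one path i -> m -> j exists, this reduces to a
    single sum such as lambda_00^n + lambda_00^(n+1)."""
    S = len(bm0)
    return [[max(bm0[i][m] + bm1[m][j] for m in range(S))
             for j in range(S)] for i in range(S)]

def acs_update(gamma, lam_hat):
    """EQ. (2): gamma_new[j] = max over i of (gamma[i] + lam_hat[i][j]),
    i.e. one add-compare-select per destination state."""
    S = len(gamma)
    return [max(gamma[i] + lam_hat[i][j] for i in range(S))
            for j in range(S)]
```

Because the combined metrics lam_hat depend only on branch metrics, combine_2step can run outside the ACS feedback loop, which is the point of the look-ahead transformation.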
The numbers of required 2-input adders and compare-select (CS) units for this example are 2^(K−1) × 2^(K−1) and 2^K × (2^(K−1) − 1), respectively.
{λ_00^n + λ_00^(n+1) + λ_00^(n+2) + λ_00^(n+3), λ_01^n + λ_12^(n+1) + λ_20^(n+2) + λ_00^(n+3), λ_00^n + λ_01^(n+1) + λ_12^(n+2) + λ_20^(n+3), λ_01^n + λ_13^(n+1) + λ_32^(n+2) + λ_20^(n+3)}
Other branch metrics in the resulting 4-step look-ahead trellis diagram can be obtained by using a similar methodology.
In general, where K=3 and M>3, using a similar strategy as in the example of
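The four parallel 0→0 path sums listed above are exactly the candidates compared by a max-plus (tropical) product of four per-step branch-metric matrices, and an M-step look-ahead combine can be sketched as a fold of such products. This matrix view is a standard way to describe look-ahead combining and is used here as an illustration, not as the patent's literal datapath.

```python
from functools import reduce

def maxplus_mul(A, B):
    """Max-plus product: C[i][j] = max over m of (A[i][m] + B[m][j]).
    Each entry is the best combined metric over all intermediate states."""
    S = len(A)
    return [[max(A[i][m] + B[m][j] for m in range(S))
             for j in range(S)] for i in range(S)]

def lookahead_metrics(step_matrices):
    """Combine M per-step branch-metric matrices into one M-step
    branch-metric matrix (the pre-computed input to the ACS loop)."""
    return reduce(maxplus_mul, step_matrices)
```

Because max-plus multiplication is associative, the M matrices can also be combined pairwise in a balanced tree rather than sequentially, which is the property the layered look-ahead architectures exploit.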
Hardware costs for a conventional M-step look-ahead technique are on the order of O((2^(K−1))^2) = O(4^K), and increase with the look-ahead step M. On the other hand, the latency of the resulting Viterbi decoder increases linearly as M increases. For a high-throughput Viterbi decoder, the look-ahead step M is usually very large. The long latency associated with a high-throughput Viterbi decoder is a drawback. Thus, it would be useful to reduce the latency when M is very large.
The '640 patent and the '307 application, incorporated by reference above, teach to combine K trellis steps in a first layer into M/K sub-trellis steps, and to combine the resulting sub-trellises into a tree structure. Unlike conventional M-step look-ahead methods, the LLA method first combines M steps into M/K groups, and performs K-step look-ahead in parallel for all M/K groups. The resulting M/K sub-trellises are then combined in a tree structure, or a layered manner. The K-nested layered M-step look-ahead (LLA) method can efficiently reduce latency, for example, when M is relatively large.
The methods and systems taught in the '640 patent and the '307 application are suitable for many applications. Disclosed herein are methods and systems that reduce hardware complexity, for example, when K is relatively large, and that reduce ACS pre-computation latency for essentially any level of parallelism, M, including when M is a non-integer multiple of the encoder constraint length K.
As described above, the number of parallel paths from one state to the same state is 2^(M−(K−1)) for an M-step conventional look-ahead. Thus, parallel paths exist when the look-ahead step is larger than the encoder constraint length K. In addition, the resulting trellis diagram is fully connected when M>K. This makes it possible to combine the resulting sub-trellises into a tree structure.
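The parallel-path count can be checked by brute-force enumeration on a small trellis. The sketch below assumes the usual shift-register state model for a constraint-length-K convolutional encoder (state = the last K−1 input bits); this model is an illustrative assumption, not the patent's construction.

```python
def count_parallel_paths(K, M, src=0, dst=0):
    """Count length-M trellis paths from state src to state dst,
    where states are the (K-1)-bit shift-register contents and each
    input bit shifts into the state from the right."""
    mask = (1 << (K - 1)) - 1
    counts = {src: 1}          # paths reaching each state so far
    for _ in range(M):
        nxt = {}
        for state, c in counts.items():
            for bit in (0, 1):
                s2 = ((state << 1) | bit) & mask
                nxt[s2] = nxt.get(s2, 0) + c
        counts = nxt
    return counts.get(dst, 0)
```

For K = 3 and M = 4, this returns 4 = 2^(M−(K−1)), matching the four parallel 0→0 paths enumerated earlier.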
As disclosed herein, K-nested layered M-step look-ahead (LLA) techniques are implemented with a look-ahead step K1, where K ≤ K1 ≤ M, and where M can be an integer multiple of K and/or K1, or can be a non-integer multiple of K and/or K1. The look-ahead step M is effectively decomposed into ⌊M/K1⌋ groups combined with K1 steps, and, where M is not an integer multiple of K1, a remainder group combined with mod(M, K1) steps. The notation mod(M, K1) represents the remainder of dividing M by K1. The resulting ⌈M/K1⌉ sub-trellises are then combined into a tree structure.
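The decomposition arithmetic above can be sketched as follows; the example values of M and K1 in the usage note are hypothetical, chosen only so that mod(M, K1) = 3 as in the non-integer-multiple case.

```python
def decompose(M, K1):
    """Split an M-step look-ahead into floor(M/K1) groups of K1 steps
    plus, when M is not a multiple of K1, one remainder group of
    mod(M, K1) steps. The list length equals ceil(M/K1), the number
    of first-layer sub-trellises."""
    full_groups, remainder = divmod(M, K1)
    return [K1] * full_groups + ([remainder] if remainder else [])
```

For instance, decompose(45, 21) yields two 21-step groups and a 3-step remainder group, i.e. three sub-trellises to be combined in the tree.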
The methods and systems disclosed herein can be applied to high-throughput Viterbi decoding for any M greater than K. Increasing the look-ahead step from K to K1 will typically increase the latency in the first layer. However, the number of sub-trellises is also reduced due to the increased look-ahead step K1 in the first layer. Thus, the overall latency of M-step look-ahead can be reduced. The overall hardware complexity is controllable by adjusting K1. Example embodiments are provided below. The invention is not, however, limited to the examples herein.
In this example, M is a multiple of K1. M can thus be decomposed into 2 groups, and each group is combined with K1 steps.
Table I summarizes hardware complexity and latency for example 1, compared with a conventional design and a K-nested LLA implementation as taught in the '640 patent and the '307 application. It can be seen that the K1-nested LLA implementation of example 1 utilizes 64 fewer adders than the K-nested LLA implementation. Although the latency of example 1 is 3 clock cycles greater than that of the K-nested LLA implementation, it is significantly less than that of a conventional implementation.
In this example, M is a multiple of K, but not a multiple of K1.
M can be decomposed into ⌊M/K1⌋ sub-trellises obtained by performing K1-step look-ahead and a remainder sub-trellis obtained by performing 3-step look-ahead, as mod(M, K1) = 3. Continuing to combine these sub-trellises in a tree manner provides the trellis diagrams 806 (layer 3) and 804 (layer 4) shown in
Table II summarizes hardware complexity and latency for example 2, compared with a conventional design and a K-nested LLA implementation. It can be seen that the K1-nested LLA implementation of example 2 utilizes 64 fewer adders than the K-nested LLA implementation. In addition, latency is reduced by 1 clock cycle.
In this example, M is neither a multiple of K nor of K1.
M can be decomposed into ⌊M/K1⌋ sub-trellises obtained by performing K1-step look-ahead and a remainder sub-trellis obtained by performing 3-step look-ahead, as mod(M, K1) = 3. Combining these sub-trellises in a tree manner leads to trellis diagram 906 (layer 3) shown in
M can be decomposed into a sub-trellis obtained by performing K1-step look-ahead and a remainder sub-trellis obtained by performing 5-step look-ahead, as mod(M, K1) = 5.
Examples 3 and 4 illustrate the effect of K1 on the overall complexity and latency. Table III summarizes hardware complexity and latency for examples 3 and 4, compared with a conventional design. It can be seen that by increasing K1, the overall hardware cost is reduced while latency increases. Thus, an optimum K1 can be selected according to different system requirements on latency and hardware cost.
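The trade-off exercised by Tables I through III can be explored with a simplified model: assume one first layer of K1-step look-ahead produces ⌈M/K1⌉ sub-trellises, which are then merged in a balanced tree of roughly ⌈log2⌈M/K1⌉⌉ combining layers. This layer-count model and the sweep below are assumptions for illustration only; the patent's exact complexity and latency expressions are not reproduced here.

```python
import math

def layers_estimate(M, K1):
    """Rough layer count: one first layer of K1-step look-ahead,
    then a balanced combining tree over ceil(M/K1) sub-trellises."""
    subtrellises = math.ceil(M / K1)
    if subtrellises == 1:
        return 1
    return 1 + math.ceil(math.log2(subtrellises))

def sweep(M, K):
    """Tabulate (K1, sub-trellis count, estimated layers) for every
    admissible first-layer look-ahead step K <= K1 <= M."""
    return [(K1, math.ceil(M / K1), layers_estimate(M, K1))
            for K1 in range(K, M + 1)]
```

A designer could scan the output of sweep for the smallest K1 whose estimated layer count (and hence latency) still meets the system budget, mirroring the selection of an optimum K1 described above.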
In general, for a high-throughput Viterbi decoder design with M-step look-ahead, the hardware complexity of a K1-nested LLA architecture can be computed as:
The latency of the ACS pre-computation for the K1-nested LLA architecture can be computed as:
An example comparison between a K-nested LLA implementation and a K1-nested LLA implementation is provided below for a 4-state 10 Gigabit/second serializer/deserializer (SERDES), having a latency requirement of less than 60 ns. A K-nested LLA architecture, implemented with 0.13-μm technology, and 48-stage ACS pre-computation, has a latency of 15 × 1.5 = 22.5 ns. A K1-nested LLA architecture, implemented with 0.13-μm technology, has an ACS pre-computation latency of 18 × 1.5 = 27 ns. Both implementations meet the latency requirement, while the K1-nested LLA reduces hardware complexity by 9.47%.
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be understood by those skilled in the art that various changes in form and details can be made therein without departing from the spirit and scope of the invention as defined in the appended claims. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
This application is a continuation-in-part of U.S. Utility patent application Ser. No. 10/922,205, titled, “Low-Latency Architectures for High-Throughput Viterbi Decoders,” filed Aug. 19, 2004 (to issue on Dec. 11, 2007 as U.S. Pat. No. 7,308,640), which claims the benefit of U.S. Provisional Patent Application No. 60/496,307, titled, “Low-Latency Architectures for High-Throughput Viterbi Decoders,” filed on Aug. 19, 2003, both of which are incorporated herein by reference in their entirety.
Provisional application:
Number | Date | Country
---|---|---
60/496,307 | Aug. 2003 | US

Parent and child case data:
 | Number | Date | Country
---|---|---|---
Parent | 10/922,205 | Aug. 2004 | US
Child | 11/951,822 | Dec. 2007 | US