Not Applicable
1. Technical Field
This invention relates in general to communications and, more particularly, to a Viterbi decoder using a cascaded add-compare-select (ACS) circuit.
2. Description of the Related Art
Many electronic devices use error correction techniques in conjunction with data transfers between components and/or data storage. Error correction is used in many situations, but is particularly important for wireless data communications, where data can easily be corrupted between the transmitter and the receiver. In some cases, errant data is identified as such and retransmission is requested. Using more robust error correction schemes, however, errant data can be reconstructed without retransmission.
One popular error correction technique uses Viterbi decoding to detect and correct errors in a data stream from a convolution encoder. A Viterbi decoder determines costs associated with multiple possible paths between nodes. After a specified number of stages, the node with the minimum associated cost is chosen, and a path is traced back through the previous stages. The data is decoded based on the selected path. To calculate the path with the lowest cost, add-compare-select (ACS) units are used.
As wireless communication becomes more popular, faster speeds are very desirable. Accordingly, higher speeds are required from the Viterbi decoders. As an example, current 802.1 n wireless LAN devices have data rates of 320 Mbps (mega-bits per second) up to 640 Mbps, while MB-OFDM (Multi-Band Orthogonal Frequency-Division Multiplexing) devices have a current maximum data rate of 480 Mbps. An ACS having a Radix-2 architecture, which processes one bit per clock, requires a clock rate of 320 MHz to maintain a 320 Mbps data stream or a clock rate of 640 MHz to maintain a 640 Mbps data stream. The clock rate can be reduced if a Radix-4 architecture is used, because a Radix-4 architecture processes two bits per clock. Similarly, a Radix-8 architecture processes three bits per clock and a Radix-16 architecture processes four bits per clock. Unfortunately, as the radix is increased, the gate count complexity is exponentially increased, resulting in very complex and costly circuits.
Therefore, a need has arisen for a high-speed Viterbi decoder using an ACS unit with a lower gate count.
In the present invention, a Viterbi decoder includes a branch metric unit for generating branch metrics between two states at two different time periods, a traceback unit, a traceback memory and an add-compare-select circuit. The add-compare-select circuit includes a plurality of cascaded add-compare-select sub-circuits, each add-compare-select sub-circuit calculating a path metric responsive to a plurality of branch metrics from the branch metric unit and a plurality of pre-calculated path metrics, where at least one of the add-compare-select sub-circuits receives a set of pre-calculated path metrics from another one of the other add-compare-select sub-circuits.
The present invention provides an architecture by which the number of information bits processed per clock cycle can be increased without increasing the number of adders/bit processed per clock cycle. This can greatly reduce the cost and complexity of the Viterbi decoder.
For a more complete understanding of the present invention, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
a through 6d are trellis diagrams showing the calculation of path metrics through the trellis diagram;
a through 8d illustrate operation of the prior art Viterbi decoder of
a,
9
b and 9c illustrate block diagrams of Radix-2, Radix-4 and Radix-4 fast ACS units;
The present invention is best understood in relation to
For illustration of convolutional encoding, an example using a k=1, n=2 structure is shown in
The convolutional encoder 12 has a constraint length (K) of 3, meaning that the current output is dependent upon the last three inputs. The dependency on previous values to affect the encoded data output allows the Viterbi decoder to reconstruct the data despite transmission errors. Convolutional decoders are often classified as (n,k,K) encoders; hence the encoder shown in
The “state” of the encoder 12 is defined as the outputs of the flip-flops 18 and 24. Thus the state of encoder 12 can be notated as “(output of FF 18, output of FF 24)”. A state diagram for the encoder of
The state diagram of
The encoded output “11 10 00 01” will be transmitted to a receiving device with a Viterbi decoder. The two-bit encoded outputs are used to reconstruct the data. By convention, a data transmission begins in state “00”. Hence, the first encoded output “11” would signify that the first input data bit was a “1” and the next state was “10”. Assuming no errors in transmission, the data input could be determined by state diagram of
However, in real-world conditions, the encoded data may be corrupted during transmission. In essence, the Viterbi decoder 16 traces all possible paths, maintaining a “path metric” for each path, which accumulates differences (“branch metrics”) between the each of the encoded outputs actually received and the encoded outputs that would be expected for that path. The path with the lowest path metric is the maximum likelihood path.
The Viterbi decoder 16 can also trace all possible paths, accumulating the correlation between the each of the encoded outputs actually received and the encoded outputs that would be expected for that path. If this correlation metric is used, the path with the highest path metric is the maximum likelihood path, but this new metric does not change the ACS circuit data-path and hence the same ACS circuit and sub-circuits can be used.
a illustrates computation of the branch metrics for the transition from the initial state of “00”. In this case, an “11” was received. With two-bit outputs, a “Hamming distance” may be used to calculate the branch metric. The Hamming distance is the sum of exclusive-or operations on respective bits of the received output and the expected output. For the path assuming a “0” input, the branch metric between the received encoded output (“11”) and the expected encoded output (“00”) is two. For the path assuming a “1” input, the branch metric between the received encoded output (“11”) and the expected encoded output (“11”) is zero. Hence the path metric at state “00” at time T1 is two and the path metric at state “10” at time T1 is zero. The path metrics are shown above the states in the diagram.
b illustrates the path through time T2. In this example, it is assumed that there is a data transmission error, and the received encoded output (symbol) is “11” rather than “10”. Hence, at T2, the branch metric between state “00” at T1 and state “00” at T2 is two; when added to the previous path metric of two at state “00” at T1, the path metric is four for state “00” at T2. Similarly, at T2, the path metric is one for state “01”, two for state “10” and one for state “11”.
c illustrates the path through time T3. At this point, two potential paths are entering each state. For each state, the branch metric is computed for each path entering the state, and the path with the lowest path metric is chosen (the “surviving path”). If two paths have the same path metric (such as state “01” at T3), a path can be chosen randomly or deterministically (such as by always choosing the upper path).
d shows the path through time T4. At this point, the actual path through states “10 01 10 11” has the lowest path metric. If the example sequence were longer, the path metrics for all other paths would increase as the path metric for the actual path remained the same (assuming no additional errors). When the end of a path is reached, the most likely path is determined through a process called “traceback”.
As can be seen in
The ACS unit 26 contains a plurality of ACS sub-units. For each clock, an ACS sub-unit determines the path metrics at a given state and selects the optimal path. A Radix-2 ACS sub-unit selects one path from the previous clock (i.e., between times Tz and Tz+1). This is shown diagrammatically in
a and 9b illustrate schematic representations of a conventional Radix-2 ACS sub-unit and a conventional Radix-4 ACS sub-unit 30, respectively. Referring to
b illustrates a Radix-4 ACS sub-unit 40. The Radix-4 ACS sub-unit unit 40 is similar to two Radix-2 units, with an additional adder 42 and multiplexer 44 to choose a path from between the outputs of adders 36. The critical path of Radix-4 ACS sub-unit 40 includes three adders.
c illustrates a Radix-4 “Fast” ACS sub-unit 50 where all path comparisons are made in parallel by adders 36a-f. This design allows the elimination of adder 42, and thus reduces the critical path to two adders, but increases the overall number of adders in the unit and requires a control logic unit 52 to determine the selected path. Control logic 52 selects an output through multiplexer 54. An ACS sub-unit of this type is described in connection with U.S. Ser. No. 10/322876, filed Dec. 18, 2002, entitled “High Speed Add-Compare-Select Circuit For Radix-4 Viterbi Decoder”, to Seok-Jun Lee and Manish Goel, and assigned to Texas Instruments incorporated, which is incorporated by reference herein. A similar architecture can be used for Radix-8 Fast ACS units and Radix-16 Fast ACS units.
Larger radix units can have a substantially longer critical path. Table I summarizes important criteria for various ACS types (where N represents the number of states for a given time period).
In the table above, the adders/bit column indicates how many adders are used in the ACS unit 26 for each bit output per clock cycle. The present invention uses cascaded ACS units, which can be of any design, in order to improve the number of adders/bit relative to the speed of the ACS, which is substantially determined by the number of adders in the critical path.
In operation, the Cascaded ACS unit 64 includes two or more ACS units similar to ACS unit 26 of
On each clock, the path metric will be computed for a number of bits equal to log2(s)+log2(t)+log2(u), where s, t, and u are the radix units of the various ACS units 65 (it being understood that there could be additional ACS units 65). For example, if two Radix-4 ACS units are used, then four bits will be calculated on each clock. In this case, the branch metric unit 62 would need to calculate, in each clock cycle, the branch metric between Tz and Tz+2 (for an arbitrary starting point Tz) for each state of the first Radix-4 ACS unit 65a and the branch metric between Tz+2 and Tz+4 for each state of the second Radix-4 ACS unit 65b. If a Radix-4 and a Radix-8 ACS unit are used in the Cascaded ACS unit 64, then five bits will be calculated on each clock. In this case, the branch metric unit 62 would need to calculate the branch metric between Tz and Tz+2 for each state of the Radix-4 ACS unit 65a and the branch metric between Tz+2 and Tz+5 for each state of the Radix-8 ACS unit 65b.
Advantageously, if, for example, Radix-4 fast ACS units were used for the ACS units 65 of
In contrast, a Radix 16 fast unit, which also processes four bits per clock cycle and also has four adders in its critical path, uses 136 adders, a substantial increase in complexity and die area. A comparison of various ACS complexity using cascaded ACS units is shown in Table II. Thus, the cascaded Radix-4 fast Cascaded ACS unit 64 uses five adders per bit produced each clock cycle whereas the Radix-16 ACS unit uses 8.5 adders per bit produced each clock cycle.
Unlike the geometric increase in gate count due to processing more bits per clock cycle by increasing the Radix of the ACS unit, cascading ACS units in an Cascaded ACS unit is a linear increase in gate count. Hence, the gate count of cascading three Radix-4 ACS units would triple the number of gates relative to a single Radix-4 ACS unit and would triple the number of bits processed per clock cycle.
Accordingly, the present invention provides an architecture by which the number of bits of information processed per clock cycle by the Cascaded ACS unit can be increased without increasing the number of adders/bit processed per clock cycle. This can greatly reduce the cost and complexity of the Viterbi decoder.
Although the Detailed Description of the invention has been directed to certain exemplary embodiments, various modifications of these embodiments, as well as alternative embodiments, will be suggested to those skilled in the art. The invention encompasses any modifications or alternative embodiments that fall within the scope of the Claims.
This application claims the benefit of the filing date of copending provisional application U.S. Ser. No. 60/736,368, filed Nov. 14, 2005, entitled “Cascaded Radix Architecture For High-Speed Viterbi Decoder”
Number | Date | Country | |
---|---|---|---|
60736368 | Nov 2005 | US |