The present invention relates to digital communications. More particularly, the present invention relates to pipelined add-compare-select circuits and methods, and applications thereof.
Communicating information via the Internet and other digital communications systems has become common in the United States and elsewhere. As the number of people using these communications systems has increased, so has the need for transmitting digital data at ever-increasing rates.
As will be understood by persons skilled in the relevant arts, digital communications systems are designed, for example, using conventional pipelining, look-ahead, and parallelism techniques. These conventional design techniques have enabled engineers to build digital communications systems, using available manufacturing technologies, which operate at data rates in excess of 1 Gb/s. Applying these conventional techniques to the design of high-speed digital circuits, however, is difficult, particularly when dealing with feedback and/or recursive operations. Furthermore, many of these conventional techniques will not improve the performance of the digital circuit to which they are applied, and some of these conventional techniques can even degrade circuit performance.
There is a current need for new design techniques and digital logic circuits that can be used to build high-speed digital communications systems. In particular, design techniques and digital logic circuits are needed that improve the throughput of add-compare-select circuits used in digital communications systems.
Digital communications devices having high-speed add-compare-select circuits, and methods for designing the same are provided. The add-compare-select circuits include logic segments separated by delay devices. The separation of the logic segments allows for pipelining of the add-compare-select processes and advantageous circuit retiming. The pipelining and advantageous circuit retiming permit the digital communications devices to be clocked at higher rates than similar digital communications devices having conventional add-compare-select circuits.
In an embodiment, an add-compare-select (ACS) circuit is provided. The ACS circuit includes an adder, two code converters, a maximum or minimum select circuit, two decision logic circuits, and a delay circuit. The adder has an input port, a sum output port, and a carry output port. A first one of the code converters has an input port and an output port. The input port of this code converter is coupled to the sum output port of the adder. The second code converter also has an input port and an output port. The input port of this code converter is coupled to the carry output port of the adder. The maximum or minimum select circuit has a first input port, a second input port, and an output port. The first input port is coupled to the output port of the first code converter. The output is coupled to the input port of the adder. A first one of the decision logic circuits has an input port and an output port. The input port is coupled to the output port of the second code converter. The delay circuit has an input port and an output port. The input port is coupled to the output port of the first decision logic circuit. The second decision logic circuit has an input port and an output port. The input port is coupled to the output port of the delay device. The output port is coupled to the second input port of the maximum or minimum select circuit.
In an embodiment, a method for designing an add-compare-select circuit is provided. A number of bits (B) to be compared is selected. An initial most-significant-bit-first add-compare-select circuit capable of operating on B bits is formed. A critical path in the initial most-significant-bit-first add-compare-select circuit is identified. The processing time of this critical path is designated as T. A sub-circuit of the initial most-significant-bit-first add-compare-select circuit is divided into a first sub-circuit segment and a second sub-circuit segment. This divided sub-circuit forms part of the identified critical path. A delay circuit is added between the first sub-circuit segment and the second sub-circuit segment to form a modified most-significant-bit-first add-compare-select circuit. A clocking circuit is formed to clock the modified most-significant-bit-first add-compare-select circuit. The clocking circuit formed has a clock period less than T.
Further features and advantages of the present invention, as well as the structure and operation of various embodiments of the present invention, are described in detail below.
The present invention is described with reference to the accompanying figures. In the figures, like reference numbers indicate identical or functionally similar elements. Additionally, the leftmost digit or digits of a reference number identify the figure in which the reference number first appears. The accompanying figures, which are incorporated herein and form part of the specification, illustrate the present invention and, together with the description, further serve to explain the principles of the invention and to enable a person skilled in the relevant art to make and use the invention.
The present invention presents add-compare-select circuits and methods, and applications thereof. Add-compare-select circuits and methods are used to implement digital communications systems such as, for example, digital communications systems employing convolutional encoding with Viterbi decoding. Convolutional encoding with Viterbi decoding is a forward error correction technique that improves the capacity of a digital communications channel. Viterbi decoding can be viewed as a process for identifying a most likely transition path through a trellis diagram representing possible state transitions in a digital communications system.
The branch metric unit 104 computes minimum or maximum branch metrics, λij, for a trellis diagram. As described herein, these branch metrics represent the difference between a received symbol and one or more symbols responsible for a state transition in the trellis diagram. Once computed, the branch metrics, λij, are passed to the ACS unit 106.
The ACS unit 106 computes state metrics, γj. This computation is performed using the branch metrics, λij, computed by branch metric unit 104. ACS unit 106 then compares the computed state metrics, γj, and selects maximum or minimum state metrics, γj, associated with survivor paths of the trellis diagram. Survivor paths represent the paths in the trellis diagram that have the best metric (e.g., maximum or minimum state metric) at a point in time under consideration.
The survivor path memory 108 stores the survivor paths selected by ACS unit 106. A final determination of the best path is made from the stored survivor paths residing in the survivor path memory 108.
While only one adder 110, one code converter 112, and one maximum/minimum select circuit 114 are shown in
As would be known to persons skilled in the relevant arts, the Viterbi algorithm implemented by a Viterbi decoder can be used to correct data transmission errors in a digital communication system. The Viterbi algorithm involves, for example, determining the most likely path taken to reach a particular state of a given trellis diagram such as trellis diagram 200. In embodiments, this is achieved by calculating all possible metrics for a particular state of the trellis diagram and selecting the path associated with either the maximum metric or the minimum metric as the most likely path taken to reach the particular state.
The branch metrics λij(n) for the trellis diagram 200 are indicated along each path leading from one state at time index “n” to another state at time index “n+1”. The branch metric λ01(n), for example, represents the metric associated with a transition from state 0 to state 1 along branch 202. The metric associated with the state 1, for a transition along branch 202, is equal to the sum of the metric associated with state 0 (i.e., γ0(n)) and the metric λ01(n). As illustrated in
In an embodiment, the state metrics γ0(n+1), γ1(n+1), γ2(n+1), and γ3(n+1) represent maximum metrics. The maximum metric for each state of trellis diagram 200 at time index “n+1” can be calculated using EQs. 1–4 below.
γ0(n+1)=max[γ0(n)+λ00(n),γ2(n)+λ20(n)] EQ. 1
γ2(n+1)=max[γ1(n)+λ12(n),γ3(n)+λ32(n)] EQ. 2
γ1(n+1)=max[γ0(n)+λ01(n),γ2(n)+λ21(n)] EQ. 3
γ3(n+1)=max[γ1(n)+λ13(n),γ3(n)+λ33(n)] EQ. 4
Where it is desired to identify the minimum metric for each state, the minimum (min) function can be used in place of the maximum (max) function in EQs. 1–4.
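The add-compare-select update of EQs. 1–4 can be illustrated with a short Python sketch. This is a behavioral model only, not the circuit described herein; the names `PREDECESSORS`, `acs_step`, `gamma`, and `lam` are hypothetical, and the metric values in the usage lines are arbitrary.

```python
# Source states of the two branches entering each state, read off EQs. 1-4.
PREDECESSORS = {0: (0, 2), 1: (0, 2), 2: (1, 3), 3: (1, 3)}


def acs_step(gamma, lam, select=max):
    """One add-compare-select update for the four-state trellis.

    gamma:  list of four state metrics at time index n.
    lam:    dict mapping (i, j) to the branch metric of the transition i -> j.
    select: max for EQs. 1-4 as written, min for the minimum-metric variant.
    Returns (state metrics at n+1, selected source state for each state).
    """
    new_gamma, decisions = [], []
    for j in range(4):
        # Add: one candidate metric per branch entering state j.
        candidates = {i: gamma[i] + lam[(i, j)] for i in PREDECESSORS[j]}
        # Compare and select: keep the best candidate and remember its source.
        best_i = select(candidates, key=candidates.get)
        new_gamma.append(candidates[best_i])
        decisions.append(best_i)
    return new_gamma, decisions


# Arbitrary illustrative metrics (not taken from the specification).
gamma = [0, 1, 2, 3]
lam = {(0, 0): 1, (2, 0): 0, (0, 1): 4, (2, 1): 1,
       (1, 2): 2, (3, 2): 1, (1, 3): 0, (3, 3): 2}

print(acs_step(gamma, lam))              # ([2, 4, 4, 5], [2, 0, 3, 3])
print(acs_step(gamma, lam, select=min))  # ([1, 3, 3, 1], [0, 2, 1, 1])
```

The selected source state returned for each state is the kind of decision that a survivor path memory would record.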
As would be known to persons skilled in the relevant arts, the operation of a Viterbi decoder is often limited by speed bottlenecks found in add-compare-select circuits. These speed bottlenecks are created, for example, as a result of applying conventional design techniques to the recursive nature of add-compare-select operations. One technique that can be used to accelerate the operating speed of a Viterbi decoder is to use an N-step look-ahead network, where N is an integer greater than 0, to provide inputs to parallel processing pipelines. An advantage of using an N-step look-ahead network is that it will result in a fully connected trellis diagram such as the one illustrated in
γ0(n+2)=max[γ0(n)+λ′00(n+1),γ1(n)+λ′10(n+1),γ2(n)+λ′20(n+1),γ3(n)+λ′30(n+1)] EQ. 5
where λ′ij(n) is the combined branch metric of the path i-j. The path metric, γj(n+2), for the state “j” of trellis diagram 300 is given by EQ. 6.
γj(n+2)=max_i[γi(n)+λ′ij(n)], i, j=0, 1, 2, 3 EQ. 6
Where it is desired to identify the minimum metric for each state, the minimum (min) function can be used in place of the maximum (max) function in EQ. 6.
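The two-step look-ahead update can be sketched in Python as follows. The sketch assumes that the combined branch metric of a path i-j is the best sum of two consecutive branch metrics through an intermediate state, which the text above does not spell out, and it takes the maximum in EQ. 6 over the source state i. The names `one_step`, `combine`, and `look_ahead_step` are hypothetical, and the metric values are arbitrary.

```python
NEG_INF = float("-inf")


def one_step(gamma, lam):
    """Maximum-metric update over one trellis stage (EQs. 1-4)."""
    return [max(gamma[i] + lam[(i, j)] for i in range(4) if (i, j) in lam)
            for j in range(4)]


def combine(lam_first, lam_second):
    """Assumed combined metric: best two-branch route i -> k -> j."""
    combined = {}
    for i in range(4):
        for j in range(4):
            combined[(i, j)] = max(
                (lam_first[(i, k)] + lam_second[(k, j)]
                 for k in range(4)
                 if (i, k) in lam_first and (k, j) in lam_second),
                default=NEG_INF)
    return combined


def look_ahead_step(gamma, combined):
    """Two-step update: best of gamma[i] + combined[(i, j)] over all i."""
    return [max(gamma[i] + combined[(i, j)] for i in range(4))
            for j in range(4)]


# Arbitrary illustrative branch metrics for two consecutive stages.
lam_a = {(0, 0): 1, (2, 0): 0, (0, 1): 4, (2, 1): 1,
         (1, 2): 2, (3, 2): 1, (1, 3): 0, (3, 3): 2}
lam_b = {(0, 0): 3, (2, 0): 1, (0, 1): 0, (2, 1): 2,
         (1, 2): 1, (3, 2): 4, (1, 3): 2, (3, 3): 0}
gamma = [0, 1, 2, 3]

combined = combine(lam_a, lam_b)
two_single_steps = one_step(one_step(gamma, lam_a), lam_b)
print(look_ahead_step(gamma, combined), two_single_steps)
```

Under this assumption every pair of states is joined by a combined branch, so the two-step trellis is fully connected, and the look-ahead update produces the same state metrics as two consecutive single-step updates.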
The four-state trellis diagrams of
As shown in
In some embodiments of the invention, each adder 110 is replaced by two adders. A first adder is used to perform the carry computation shown in
ACS unit 800 contains a number of loops or paths. These loops or paths are illustrated in
As shown in
In a typical implementation, the computation time for an adder 110 is approximately 0.4 ns, the computation time for a code converter 112 is approximately 0.15 ns, and the computation time for an MS circuit 114 varies with the total number of states being implemented. For example, in a typical 8-state Viterbi decoder, the computation time for an MS circuit 114 is approximately 1.2 ns. A computation time of 1.2 ns is attributable to the decision logic circuit 906 and 0.8 ns is attributable to the maximum/minimum select circuit 904. The larger of these two computation times is the computation time of MS circuit 114. In a typical 4-state Viterbi decoder, the computation time for an MS circuit 114 is approximately 0.7 ns. This is because 0.7 ns is attributable to the decision logic circuit 906 and 0.4 ns is attributable to the maximum/minimum select circuit 904. The increased computation time of the MS circuit 114 in an 8-state Viterbi decoder is due to the extra logic needed to select among a larger number of states.
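The relationship between the component times quoted above can be captured in a few lines of Python. This is only a bookkeeping sketch using the typical values from the text; the names `DECISION_NS`, `SELECT_NS`, and `ms_time_ns` are hypothetical.

```python
# Typical computation times in nanoseconds, keyed by number of decoder states.
DECISION_NS = {4: 0.7, 8: 1.2}  # decision logic circuit 906
SELECT_NS = {4: 0.4, 8: 0.8}    # maximum/minimum select circuit 904


def ms_time_ns(states):
    """Computation time of MS circuit 114: the larger of its two parts."""
    return max(DECISION_NS[states], SELECT_NS[states])


for states in (4, 8):
    print(states, "states:", ms_time_ns(states), "ns")
```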
Using the typical computation times stated above, the settling time of the critical path 1102 in
Using the typical computation times stated above for a 4-state Viterbi decoder, the settling time of the critical path 1102 is 2.2 ns. This time is the computation time of two adders 110 (0.4 ns+0.4 ns=0.8 ns), the computation time of two code converters 112 (0.15 ns+0.15 ns=0.3 ns), the computation time of one maximum/minimum select circuit 904 (0.4 ns), and the computation time of one decision logic circuit 906 (0.7 ns). This is greater than the loop bound of a 4-state Viterbi decoder circuit (i.e., loop 910 shown in
Table 1 below summarizes the iteration bound times and the critical path times of a typical 4-state Viterbi decoder and a typical 8-state Viterbi decoder implemented using the circuits and methods described above.
Using the circuits and methods of the invention described below, the critical path times shown in Table 1 can be further reduced. As described below, the present invention improves the retiming technique applied to ACS unit 800 to form circuit 1100 by pipelining the functions of the ACS unit. In this way, the ACS unit can be retimed to achieve a critical path time that is closer to the iteration bound.
As shown in
As shown in
The computation times Td1 and Td2 represent the time required for each decision logic segment to perform its computation. In an embodiment of the present invention, the computation time Td2 is set equal to a propagation delay time (T). The propagation delay time (T) is used to ensure that the calculations performed by the decision logic segment 1304 are completed at approximately the same time as the calculations performed in the code converter 112. Since decision logic segment 1304 and code converter 112 each provide an input to a decision logic segment 1302, it is advantageous in embodiments to have these input values available for input to decision logic segment 1302 at approximately the same time. Thus, in embodiments, the decision logic segment 1304 is designed to have a computation time approximately equal to the computation time of an adder 110 and code converter 112 (i.e., 0.4 ns+0.15 ns=0.55 ns or approximately 0.6 ns).
Although
Two other paths present in circuit 1420 are path 1424 and path 1426. Path 1424 includes two adders 110, two code converters 112, and two maximum/minimum select circuits 904. Using the typical computation times stated above for an 8-state Viterbi decoder, the settling time of the path 1424 is approximately 2.7 ns. This time is the computation time of two adders 110 (0.4 ns+0.4 ns=0.8 ns), the computation time of two code converters 112 (0.15 ns+0.15 ns=0.3 ns), and the computation time of two maximum/minimum select circuits 904 (0.8 ns+0.8 ns=1.6 ns). Path 1426 includes one decision logic segment 1304, one adder 110, one code converter 112, and two maximum/minimum select circuits 904. Using the typical computation times stated above for an 8-state Viterbi decoder, the settling time of the path 1426 is approximately 2.75 ns. This time is the computation time of one decision logic segment 1304 (0.6 ns), the computation time of one adder 110 (0.4 ns), the computation time of one code converter 112 (0.15 ns), and the computation time of two maximum/minimum select circuits 904 (0.8 ns+0.8 ns=1.6 ns). Thus, based on the above stated computation times, path 1424 is the critical path of circuit 1420.
For the retimed circuit 1420, using the typical computation times stated herein for a 4-state Viterbi decoder, the settling time of the path 1424 is approximately 1.9 ns. This time is the computation time of two adders 110 (0.4 ns+0.4 ns=0.8 ns), the computation time of two code converters 112 (0.15 ns+0.15 ns=0.3 ns), and the computation time of two maximum/minimum select circuits 904 (0.4 ns+0.4 ns=0.8 ns). The settling time of the path 1426 is approximately 1.7 ns. This time is the computation time of one decision logic segment 1304 (0.35 ns or one-half of the total computation time (0.7 ns) of decision logic circuit 906), the computation time of one adder 110 (0.4 ns), the computation time of one code converter 112 (0.15 ns), and the computation time of two maximum/minimum select circuits 904 (0.4 ns+0.4 ns=0.8 ns). Based on these computation times, path 1424 is the critical path for a 4-state Viterbi decoder.
As would be known to persons skilled in the relevant arts, once the critical path of a circuit has been determined, a clock period for the circuit can be set equal to the settling time of the critical path plus a margin factor.
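The settling-time and clock-period arithmetic for path 1424 can be checked with a brief Python sketch. Times are held as integer picoseconds to keep the sums exact. The names `path_1424_ps` and `clock_period_ps` are hypothetical, and the 300 ps margin used in the usage lines is an assumption borrowed from the 0.3 ns clock setup/hold time quoted elsewhere in this description.

```python
# Typical computation times from the text, in picoseconds.
ADDER_PS = 400                    # adder 110
CODE_CONVERTER_PS = 150           # code converter 112
SELECT_PS = {4: 400, 8: 800}      # maximum/minimum select circuit 904


def path_1424_ps(states):
    """Path 1424: two adders, two code converters, two select circuits."""
    return 2 * ADDER_PS + 2 * CODE_CONVERTER_PS + 2 * SELECT_PS[states]


def clock_period_ps(settling_ps, margin_ps):
    """Clock period: settling time of the critical path plus a margin."""
    return settling_ps + margin_ps


MARGIN_PS = 300  # assumed margin (setup/hold time), not mandated by the text

for states in (4, 8):
    settling = path_1424_ps(states)
    print(states, "states:", settling, "ps settling,",
          clock_period_ps(settling, MARGIN_PS), "ps clock period")
```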
Table 2 below shows the iteration bound and critical path results for a 4-state Viterbi decoder and an 8-state Viterbi decoder designed in accordance with both the pipelining and retiming techniques of the present invention described herein.
As shown in Table 2, the present invention achieves critical path computation times that are close to the iteration bound. Such computation times are not possible using conventional design techniques.
Circuit 1600 operates as follows. A maximum select circuit 1602 is used to select the maximum digit of the digits (CB, SB), (CC, SC), and (CD, SD). This maximum digit is shown in
As shown in
The inputs to the decision logic circuit 1604 include the values CiMAX, SiMAX, dif, dip, Cif, and Sif. The digit (CA, SA) is combined with the final decision value dif,0 using AND gates 1614 and 1616 to produce the final digit value (Cif, Sif). Using some or all of these inputs, decision logic circuit 1604 computes two decision state values di−1f,0 and di−1p,0.
Circuit 1700 generates the two decision state values di−1f,0 and di−1p,0 in accordance with the mapping shown in Table 4 below.
The four delays 1802, 1804, 1806, and 1808 in circuit 1800 divide the circuit 1800 into part of a first decision logic segment 1820 and a second decision logic segment 1840. The first decision logic segment 1820 includes the four 2-to-1 multiplexers 1702a–d (shown in
Circuit 2100 generates two decision state values di−1f,0 and di−1p,0 in accordance with the mapping shown in Table 5 below.
The four delays 2202, 2204, 2206, and 2208 in circuit 2200 divide the circuit 2200 into part of a first decision logic segment 2220 and a second decision logic segment 2240. The first decision logic segment 2220 includes the four 2-to-1 multiplexers 2102a–d (shown in
Referring to
As described herein, the present invention can be used to design and implement high-speed digital communications circuits and systems that cannot be designed and implemented using conventional circuits and techniques. This point is illustrated by the following example.
Consider, for a moment, how to implement a 10 Gb/s Viterbi decoder. As would be known to persons skilled in the relevant arts, in order to implement a 10 Gb/s Viterbi decoder some form of parallel Viterbi decoding using look-ahead or a sliding block Viterbi decoder is needed. In a conventional implementation, an 8-state Viterbi decoder requires a clock period of at least 3.4 ns. This is based on a 3.1 ns critical path and a clock setup/hold time of 0.3 ns. Unfortunately, this does not permit a 32-parallel design using conventional MSB-first pipelined operations because a 32-parallel design must be clocked with a clock period of no more than 3.2 ns to achieve a decoding speed of 10 Gb/s. Thus, using conventional circuits and design techniques, a 10 Gb/s Viterbi decoder must be implemented using either a 64-parallel design in a look-ahead Viterbi decoder or a 48-parallel design in a sliding-block Viterbi decoder. In a look-ahead parallel Viterbi decoder, the level of parallelism is constrained to be a power of two (e.g., 2^x). In a sliding-block Viterbi decoder, the level of parallelism is assumed to be a multiple of eight (e.g., 8x).
Using the circuits and methods of the present invention described herein, an 8-state Viterbi decoder can be implemented that has a critical path of only 2.7 ns. How this is achieved is described above. Thus, using a clock setup/hold time of 0.3 ns, an 8-state Viterbi decoder designed and implemented in accordance with the present invention can be clocked with a clock period of 3 ns. In this way, a 32-parallel implementation for achieving a 10 Gb/s Viterbi decoder is feasible.
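The feasibility argument above reduces to simple arithmetic, sketched below in Python. The sketch assumes that an N-parallel decoder produces N decoded bits per clock cycle, which is consistent with the 3.2 ns figure quoted for a 32-parallel design at 10 Gb/s. The names `max_clock_period_ps` and `is_feasible` are hypothetical.

```python
def max_clock_period_ps(parallelism, target_gbps):
    """Longest clock period, in ps, that still reaches the target rate.

    Assumes `parallelism` bits are decoded per clock cycle.
    1 Gb/s corresponds to one bit per 1000 ps.
    """
    return parallelism * 1000 / target_gbps


def is_feasible(critical_path_ps, setup_hold_ps, parallelism, target_gbps):
    """True if critical path plus setup/hold fits in the allowed period."""
    required_period = critical_path_ps + setup_hold_ps
    return required_period <= max_clock_period_ps(parallelism, target_gbps)


TARGET_GBPS = 10
SETUP_HOLD_PS = 300

# Conventional 8-state decoder: 3.1 ns critical path.
print(is_feasible(3100, SETUP_HOLD_PS, 32, TARGET_GBPS))  # False
print(is_feasible(3100, SETUP_HOLD_PS, 48, TARGET_GBPS))  # True
print(is_feasible(3100, SETUP_HOLD_PS, 64, TARGET_GBPS))  # True

# Pipelined and retimed 8-state decoder: 2.7 ns critical path.
print(is_feasible(2700, SETUP_HOLD_PS, 32, TARGET_GBPS))  # True
```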
Further features and advantages of the present invention will become apparent to persons skilled in the relevant arts given the description herein.
Various embodiments of the present invention have been described above. It should be understood that these embodiments have been presented by way of example only, and not limitation. It will be understood by those skilled in the relevant arts that various changes in form and details of the embodiments described above may be made without departing from the spirit and scope of the present invention as defined in the claims. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
Number | Date | Country | |
---|---|---|---|
20040117721 A1 | Jun 2004 | US |