1. Field of the Invention
The present invention relates to decoder circuitry, and in particular, to a high speed add-compare-select (ACS) circuit useful in Viterbi and log-maximum a posteriori (log-MAP) decoders for decoding turbo and low-density parity-check codes (LDPC-codes).
2. Description of the Related Art
ACS units are core elements of Viterbi, turbo and log-MAP decoders. The manner in which ACS units are connected to one another is defined by a specific code's trellis diagram. The ACS operation is a bottleneck arithmetic operation for such trellis-based decoding algorithms as Viterbi and log-MAP. These algorithms are extensively used for decoding convolutional, turbo and LDPC-codes. Viterbi and log-MAP algorithms are organized in such a manner that, if these algorithms are implemented in hardware, each ACS operation appears on a critical path of the corresponding Viterbi and/or log-MAP algorithm implementation. The ACS operation determines a depth of the algorithm and, correspondingly, a maximum operating frequency of the decoder.
The decoding process of a generic trellis-based decoding algorithm is typically an iterative process. Each iteration is processed on a single layer of the trellis. The total number of trellis layers is generally equal to a codeword length. A computational procedure that is performed for every trellis layer includes two steps: (i) branch metrics calculation and (ii) state metrics calculation. These two steps are common to both Viterbi and log-MAP algorithms. Because branch metrics calculation does not reside on the critical path of the hardware implementation of the decoder, branch metrics calculation can be pipelined over trellis layers. In contrast, state metrics calculation includes an internal loop-back structure: the results of the next iteration depend on the results of the previous iteration. Thus, the state metrics calculation resides on the critical path of the decoder and consequently determines the maximum possible operating frequency of the whole decoder design.
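The loop-carried dependency described above can be illustrated with a minimal Python sketch. The trellis layout, the function names, and the use of plain integers are assumptions made for illustration only; they are not taken from any described embodiment.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical trellis description: for each next state, the list of
# (previous_state, branch_label) pairs that lead into it.
Trellis = Dict[int, List[Tuple[int, int]]]


def state_metrics_step(prev_sm: Sequence[int],
                       branch_metrics: Sequence[int],
                       trellis: Trellis) -> List[int]:
    """Compute the state metrics of one trellis layer from the previous layer."""
    return [
        max(prev_sm[p] + branch_metrics[label] for p, label in trellis[s])
        for s in sorted(trellis)
    ]


def run_layers(initial_sm: Sequence[int],
               layer_branch_metrics: Sequence[Sequence[int]],
               trellis: Trellis) -> List[int]:
    """Iterate over trellis layers.

    The branch metrics of every layer are available up front (they can be
    pipelined), but each state-metric update needs the previous result.
    """
    sm = list(initial_sm)
    for bms in layer_branch_metrics:
        sm = state_metrics_step(sm, bms, trellis)  # loop-carried dependency
    return sm


# Two-state example trellis (illustrative values only).
EXAMPLE_TRELLIS: Trellis = {0: [(0, 0), (1, 1)], 1: [(0, 1), (1, 0)]}
```

Because `run_layers` cannot start a layer before the previous one has finished, the per-layer update is the part that limits the clock frequency in a hardware realization.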
In one embodiment, the present invention is a method of iteratively performing an add-compare-select (ACS) operation. The method includes, for an iteration, providing at least two state metrics with carry-save arithmetic to a first ACS layer module having first respective sum components; producing, by the first ACS layer module, a first set of at least two computing state metrics in carry-save arithmetic in response to a first set of at least two respective branch metrics in a single clock cycle; applying the first set of at least two computing state metrics to a second ACS layer module having second respective sum and carry components; producing, by the second ACS layer module, a second set of at least two computing state metrics in carry-save arithmetic in response to a second set of at least two respective branch metrics and the first set of at least two computing state metrics in the clock cycle; storing the second set of at least two computing state metrics as carry components at the second ACS layer module; and providing the second set of at least two computing state metrics to the first ACS layer module for a next iteration.
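As a rough behavioral picture of this method, the following Python sketch chains two combinational ACS layers inside one modeled clock cycle and feeds the result back for the next iteration. It uses ordinary integers rather than carry-save arithmetic, and the branch-metric layout `bm[s][p]` is a hypothetical convention chosen for the example.

```python
from typing import List, Sequence


def acs_layer(sm: Sequence[int], bm: Sequence[Sequence[int]]) -> List[int]:
    """One combinational ACS layer.

    bm[s][p] is the (hypothetical) branch metric from previous state p
    to next state s.
    """
    n = len(sm)
    return [max(sm[p] + bm[s][p] for p in range(n)) for s in range(n)]


def double_speed_cycle(sm: Sequence[int],
                       bm_first: Sequence[Sequence[int]],
                       bm_second: Sequence[Sequence[int]]) -> List[int]:
    """Model of one clock cycle: two ACS layers in series, no register between."""
    return acs_layer(acs_layer(sm, bm_first), bm_second)


def run(sm: Sequence[int], bms: Sequence[Sequence[Sequence[int]]]) -> List[int]:
    """Process an even number of trellis layers, two per modeled clock cycle."""
    state = list(sm)
    for k in range(0, len(bms), 2):
        # The stored result is fed back to the first layer for the next cycle.
        state = double_speed_cycle(state, bms[k], bms[k + 1])
    return state
```

Functionally, one call to `double_speed_cycle` equals two successive single-layer updates; the sketch only shows the data flow, not the timing benefit.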
In another embodiment, the present invention is an apparatus for performing an add-compare-select (ACS) operation including at least two ACS layers coupled in series and configured to form an iterative loop with carry components in a single clock cycle, wherein the ACS layer includes at least two branch metrics represented by a plurality of bits, adders configured to i) generate a plurality of state metrics in accordance with carry-save arithmetic, and a plurality of multiplexers configured to ii) perform a selection of a maximum state metric in carry-save arithmetic, which is stored in the carry components.
In another embodiment, the present invention is an apparatus for performing an add-compare-select (ACS) operation including at least two layers of an ACS module configured to perform state metric computations using carry-save arithmetic, each having corresponding input and output states and corresponding input and output vectors, and carry components of stored state metrics, wherein the output state of a preceding layer of the ACS module is provided to a subsequent layer of the ACS module having an input vector different from the input vector of the preceding layer of the ACS module, the apparatus configured to form an ACS layer computing in a single clock cycle to generate at least a maximum state metric in carry-save arithmetic.
In another embodiment, the present invention is a trellis decoder including a memory including a set of registers, and an add-compare-select (ACS) module including at least two ACS layer modules coupled in series and configured to form a feedback loop with carry components in a single clock cycle, wherein the ACS layer module includes at least two branch metrics represented by a plurality of bits and adders configured to generate a plurality of state metrics using carry-save arithmetic, and a plurality of multiplexers configured to perform a selection of a maximum state metric in carry-save arithmetic stored in memory as the carry components.
Other aspects, features, and advantages of the present invention will become more fully apparent from the following detailed description, the appended claims, and the accompanying drawings in which like reference numerals identify similar or identical elements.
Described embodiments of the present invention relate to a high speed ACS circuit useful in Viterbi and log-MAP decoders for decoding turbo and LDPC-codes. A set of schemes for high speed computation of the ACS operation in accordance with exemplary embodiments of the present invention is developed for two or more trellis layers per clock cycle. The embodiments described below are examples for two trellis layers. These examples, however, might be easily adapted for three or more trellis layers. The developed schemes might use carry-save arithmetic computations, which might provide a specific structure of the ACS circuit. This feature might make it possible to recognize an infringement of designs of the ACS circuit. In addition, the developed schemes might contain two or more identical combinatorial ACS layer submodules, which might help to recognize the infringement of these designs and further increase the calculation speed.
Hereinafter, embodiments of the present invention are described with reference to the drawings.
Note that herein, the terms “ACS design”, “ACS scheme”, “ACS circuit”, “ACS module”, “ACS layer”, “ACS technique” and “ACS operation” might be used interchangeably. It is understood that an ACS design might correspond to, or contain, an ACS scheme of an ACS module, an ACS circuit and an ACS operation, and that the ACS scheme, the ACS module, the ACS layer, the ACS circuit, the ACS technique and the ACS operation might refer to the ACS design.
Referring to
Furthermore, in the described embodiments, carry-save arithmetic might be employed in the combinatorial part of the ACS layers, which might enable a deep optimization of the ACS design with a doubled combinatorial part in terms of maximal operating frequency. Thus, the doubled ACS design might operate at frequencies higher than half of the working frequency of the standard ACS design. For example, a simulation of a standard ACS layer is successfully closed at 1000 MHz and a simulation of an ACS layer with double speed is closed at 650 MHz. First and second layers 302, 304 of double speed ACS module 300 with carry-save arithmetic are described subsequently below in detail.
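The benefit of exceeding half the standard frequency can be checked with a short calculation. The sketch below uses only the two simulation figures quoted above and assumes one trellis layer per cycle for the standard design and two for the doubled design; the function name is illustrative.

```python
def layers_per_second(clock_mhz: float, layers_per_cycle: int) -> float:
    """Trellis layers processed per second at a given clock frequency."""
    return clock_mhz * 1e6 * layers_per_cycle


# Figures quoted in the text: 1000 MHz (standard), 650 MHz (double speed).
standard = layers_per_second(1000, layers_per_cycle=1)
doubled = layers_per_second(650, layers_per_cycle=2)

# Ratio of layer throughput: greater than 1 whenever the doubled design
# runs above half of the standard frequency (here, above 500 MHz).
speedup = doubled / standard
```

Under these assumptions the doubled design processes 1300 million layers per second against 1000 million, a ratio of about 1.3.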
SM=max(BM1+SM1, BM2+SM2) (1)
where “max” denotes a maximum operation.
In some modifications of Viterbi or log-MAP algorithms, a minimum operation might be performed, for example, in relation (1) instead of a maximum operation. However, such modifications generally do not change the design of an ACS significantly. Consequently, one skilled in the art might readily extend the teachings of embodiments of the present invention described herein to embodiments for the minimum operation case(s). The total depth of the scheme might be a depth of an adder (adder 402 or 404) plus a depth of compare-select circuit 406, which might be approximately the depth of the adder for a corresponding number of arguments. Thus, a total depth of a given ACS design might significantly depend on the number of its arguments. In general, the number of arguments of the ACS operation is typically equivalent to the number of states in the trellis layer of the ACS module. Generally, an ACS operation of four operands (ACS4), an ACS operation of eight operands (ACS8) and an ACS operation of sixteen operands (ACS16) are usually employed in modern trellis decoders. Accordingly, ACS4, ACS8 and ACS16 operations might be applied to the disclosed embodiments.
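Relation (1), together with its minimum variant and its extension to more operands, might be sketched in Python as follows. The function name, the `select` parameter, and the returned survivor index are assumptions added for illustration; the text itself only defines the selected metric.

```python
from typing import Callable, Sequence, Tuple


def acs(state_metrics: Sequence[int],
        branch_metrics: Sequence[int],
        select: Callable[[Sequence[int]], int] = max) -> Tuple[int, int]:
    """Add-compare-select over any number of operands.

    Add:     form SM_k + BM_k for every operand pair.
    Compare: pick the extreme value with `select` (max by default, or min).
    Select:  return that value and the index of the operand that produced it
             (first index on ties).
    """
    sums = [sm + bm for sm, bm in zip(state_metrics, branch_metrics)]
    best = select(sums)
    return best, sums.index(best)


# Two operands, as in relation (1): max(3 + 4, 5 + 1)
two_operand = acs([3, 5], [4, 1])

# Same operands with the minimum variant.
two_operand_min = acs([3, 5], [4, 1], select=min)

# Four operands (an ACS4-style call).
four_operand = acs([1, 2, 3, 4], [4, 3, 2, 0])
```

Passing `min` instead of `max` changes only the compare step, which mirrors the observation that the minimum variant does not change the ACS design significantly.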
Since module 400 only includes adders 402, 404 and compare-select circuit 406, as shown in
As shown in
However, the ACS scheme of the described embodiment shown in
First full adder 602 (FA_i) might compute output bit s_i of the result of addition and carry bit c_{i+1}. The carry bit c_{i+1} output from first full adder 602 might be used by the following second full adder 604. Output bit s_i and carry bit c_{i+1} might satisfy the following relations: s_i = a_i ⊕ b_i ⊕ c_i, c_0 = 0, c_{i+1} = (a_i ∧ b_i) ∨ (a_i ∧ c_i) ∨ (b_i ∧ c_i), i = 0, . . . , n−1. Thus, the total depth of ripple carry adder 600 might equal the number of bits n. As the number of bits increases, the depth of ripple carry adder 600 might increase, which might slow the speed of calculations.
For given implementations, the layout of ripple carry adder 600 might be relatively simple, which might allow for fast design time for the implementation; however, ripple carry adder 600 might be relatively slow, since each full adder, for example, first and second full adders 602, 604, waits for the carry bit to be calculated by the previous full adder. The gate delay might easily be calculated from observation of the full adder circuit. Each full adder, for example, first and second full adders 602, 604, might require three levels of logic. A 32-bit ripple carry adder includes 32 full adders, so the critical path (worst case) delay might be calculated as 3 delay-units of time (from input to carry in the first adder) + 31*2 (for carry propagation in the later adders), yielding the equivalent of 65 gate delays.
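A bit-level software model can make the serial carry chain and the delay estimate concrete. The sketch below uses the standard full-adder equations and the delay model stated above (3 units for the first adder, 2 for each later adder); the function names are illustrative and not part of any embodiment.

```python
from typing import Tuple


def full_adder(a: int, b: int, c: int) -> Tuple[int, int]:
    """Standard one-bit full adder: returns (sum bit, carry-out bit)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def ripple_carry_add(a: int, b: int, n: int) -> Tuple[int, int]:
    """Add two n-bit unsigned values; returns (n-bit sum, final carry-out).

    Each iteration needs the carry produced by the previous one, which is
    the serial dependency that makes the adder depth grow with n.
    """
    carry = 0
    result = 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry


def ripple_carry_gate_delays(n: int) -> int:
    """Worst-case delay model from the text: 3 + 2 * (n - 1) gate delays."""
    return 3 + 2 * (n - 1)
```

For example, `ripple_carry_gate_delays(32)` evaluates to 65, matching the figure given for a 32-bit adder.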
Carry-save addition techniques might be employed to reduce the depth of addition scheme shown in
Carry-save adder 700, as shown in
$$A+B=\sum_{i=0}^{n-1} s_i 2^i+\sum_{i=0}^{n-1} c_i 2^i=\sum_{i=0}^{n-1} v_i 2^i,\quad v_i=s_i+c_i\in\{0,1,2\},\quad i=0,\ldots,n-1, \qquad (2)$$
and, as such, the result of the carry-save addition of the numbers A and B might be an array of carry-save bus vi. Accordingly, a depth of the carry save adder might equal the depth of a single full adder, i.e., the depth might be equal to 1.
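The idea of keeping a result as separate sum and carry words can be illustrated with a generic 3:2 carry-save step in Python. This is the textbook construction, offered as a sketch under that assumption, and is not claimed to match the internal wiring of carry-save adder 700; all names are illustrative.

```python
from typing import List, Tuple


def carry_save_add(x: int, y: int, z: int) -> Tuple[int, int]:
    """Generic 3:2 carry-save step on non-negative integers.

    Every bit position is handled by an independent full adder, so no carry
    ripples between positions. Returns (sum word, carry word) such that
    sum word + carry word == x + y + z.
    """
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def carry_save_digits(s: int, c: int, n: int) -> List[int]:
    """Digits v_i = s_i + c_i of the carry-save pair, each in {0, 1, 2}.

    n must be large enough to cover every set bit of s and c.
    """
    return [((s >> i) & 1) + ((c >> i) & 1) for i in range(n)]


def carry_save_value(digits: List[int]) -> int:
    """Evaluate the sum of v_i * 2**i, the right-hand side of relation (2)."""
    return sum(v << i for i, v in enumerate(digits))
```

For instance, `carry_save_add(5, 6, 3)` yields the pair `(0, 14)`, whose digits for `n = 5` are `[0, 1, 1, 1, 0]` and whose value is 14. A conventional carry-propagating addition is needed only when the pair is finally converted to an ordinary binary number.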
Since carry-save adders reduce the depth of the addition scheme to 1, the described embodiments applying carry-save arithmetic might increase the speed of the calculations. Referring to
CSMs 807, 808, 809, 810 might select the largest sum computed using the relation SM=max (BM1+SM1, BM2+SM2), as described in
As shown in
Referring to
Processor 12 and memory 14 might preferably be part of a digital signal processor (DSP) used to implement the double speed decoder. However, it is to be understood that the term “processor” as used herein might be generally intended to include one or more processing devices and/or other processing circuitry (e.g., application-specific integrated circuits or ASICs, GAs, FPGAs, etc.). The term “memory” as used herein might be generally intended to include memory associated with the one or more processing devices and/or circuitry, such as, for example, RAM, ROM, fixed and removable memory devices, etc. Also, in an alternative embodiment, the ACS module might be implemented in accordance with a coprocessor associated with the DSP used to implement the overall turbo decoder. In such a case, the coprocessor might share in use of the memory associated with the DSP.
Accordingly, software components including instructions or code for performing the methodologies of the invention, as described herein, might be stored in the associated memory of the turbo decoder and, when ready to be utilized, loaded in part or in whole and executed by one or more of the processing devices and/or circuitry of the turbo decoder.
Referring to
Reference herein to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments necessarily mutually exclusive of other embodiments. The same applies to the term “implementation.”
As used in this application, the word “exemplary” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word exemplary is intended to present concepts in a concrete fashion.
Additionally, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise, or clear from context, “X employs A or B” is intended to mean any of the natural inclusive permutations. That is, if X employs A; X employs B; or X employs both A and B, then “X employs A or B” is satisfied under any of the foregoing instances. In addition, the articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or clear from context to be directed to a singular form.
Although the subject matter described herein may be described in the context of illustrative implementations to process one or more computing application features/operations for a computing application having user-interactive components, the subject matter is not limited to these particular embodiments. Rather, the techniques described herein can be applied to any suitable type of user-interactive component execution management methods, systems, platforms, and/or apparatus.
The present invention may be implemented as circuit-based processes, including possible implementation as a single integrated circuit (such as an ASIC or an FPGA), a multi-chip module, a single card, or a multi-card circuit pack. As would be apparent to one skilled in the art, various functions of circuit elements may also be implemented as processing blocks in a software program. Such software may be employed in, for example, a digital signal processor, micro-controller, or general-purpose computer.
The present invention can be embodied in the form of methods and apparatuses for practicing those methods. The present invention can also be embodied in the form of program code embodied in tangible media, such as magnetic recording media, optical recording media, solid state memory, floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. The present invention can also be embodied in the form of program code, for example, whether stored in a storage medium, loaded into and/or executed by a machine, or transmitted over some transmission medium or carrier, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits. The present invention can also be embodied in the form of a bitstream or other sequence of signal values electrically or optically transmitted through a medium, stored magnetic-field variations in a magnetic recording medium, etc., generated using a method and/or an apparatus of the present invention.
The use of figure numbers and/or figure reference labels in the claims is intended to identify one or more possible embodiments of the claimed subject matter in order to facilitate the interpretation of the claims. Such use is not to be construed as necessarily limiting the scope of those claims to the embodiments shown in the corresponding figures.
It should be understood that the steps of the exemplary methods set forth herein are not necessarily required to be performed in the order described, and the order of the steps of such methods should be understood to be merely exemplary. Likewise, additional steps may be included in such methods, and certain steps may be omitted or combined, in methods consistent with various embodiments of the present invention.
Although the elements in the following method claims, if any, are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those elements, those elements are not necessarily intended to be limited to being implemented in that particular sequence.
No claim element herein is to be construed under the provisions of 35 U.S.C. §112, sixth paragraph, unless the element is expressly recited using the phrase “means for” or “step for.”
It will be further understood that various changes in the details, materials, and arrangements of the parts which have been described and illustrated in order to explain the nature of this invention may be made by those skilled in the art without departing from the scope of the invention as expressed in the following claims.
Number | Date | Country | Kind |
---|---|---|---|
2012133248 | Aug 2012 | RU | national |