Programmable compute unit with internal register and bit FIFO for executing Viterbi code

Information

  • Patent Grant
  • 8301990
  • Patent Number
    8,301,990
  • Date Filed
    Thursday, September 27, 2007
  • Date Issued
    Tuesday, October 30, 2012
Abstract
A programmable compute unit with an internal register and a bit FIFO for executing Viterbi code is configured to accumulate, in the forward path, the best-path to each state in the internal register and to store the survivor trace back information bit for each state in each stage in the bit FIFO. In the trace back, the unit selects the optimal best-path through the Viterbi trellis by tracing through the survivor trace back information bits, beginning with the survivor bit of the last stage best-path, and generates, in response to the Viterbi constraint length and a current bit FIFO address, the next bit FIFO address and the decoded output bit for the next previous stage.
Description
FIELD OF THE INVENTION

This invention relates to a programmable compute unit with an internal register and bit FIFO for executing Viterbi code.


BACKGROUND OF THE INVENTION

The Viterbi decoding algorithm, known to be a maximum-likelihood algorithm, is widely used to decode convolutional codes. Convolutional coding is a bit-level coding technique, in contrast to block-level techniques such as Reed-Solomon coding. In communication applications convolutional codes are advantageous over block-level codes because the system gain degrades gracefully as the error rate increases, while block codes correct errors up to a point, after which the gain drops rapidly. Convolutional codes can be decoded after an arbitrary length of data, whereas block codes introduce the latency of an entire data block; convolutional codes also do not require any block synchronization. Convolutionally encoded data is decoded through knowledge of the possible state transitions, created from the dependence of the current symbol on the past data. The allowable state transitions are represented by a trellis diagram. The Viterbi decoding algorithm involves the calculation of a Hamming distance between the received signal and the branches leading to each trellis state. At each trellis state, the path metric is stored. The actual decoding is accomplished by tracing the maximum likelihood path backwards through the trellis. A longer sequence results in a more accurate reconstruction of the trellis: in shorter sequences the minimum-length path gives the optimal result, whereas in longer sequences nearly all paths provide a solution as convergence is increasingly attained. After a sequence of about five times the constraint length little accuracy is gained by additional inputs. The survivor path is determined during the trace back, and the output is generated. The number of trellis states amounts to 2^(k-1) for a constraint length of k.
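
As a rough illustration of two quantities mentioned above, the following sketch computes the number of trellis states for a given constraint length and a hard-decision Hamming distance between a received symbol and an expected branch symbol. The function names are illustrative only and do not come from the specification.

```python
def num_trellis_states(k: int) -> int:
    """Number of trellis states, 2**(k-1), for constraint length k."""
    if k < 1:
        raise ValueError("constraint length must be at least 1")
    return 1 << (k - 1)


def hamming_distance(received: int, expected: int) -> int:
    """Count the bit positions in which two symbols differ."""
    return bin(received ^ expected).count("1")
```

For example, a constraint length of 7 gives 64 states, and the two-bit symbols 0b10 and 0b01 are at Hamming distance 2.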


Most digital signal processors are designed to manipulate data having a fixed word size (e.g., 8-bit, 16-bit or 32-bit words). When the processor needs to manipulate non-standard word sizes, the processor efficiency drops due to the pipeline overhead for each retrieved bit. For example, when a 50 Mbit bit stream needs to be Viterbi error corrected, a substantial percentage of the DSP is consumed by this single function.


In Viterbi decoding, on the forward pass, the minimum Hamming distance is accumulated and the survivor bit is stored for each state in each stage. Then the survivor bit path is generated during trace back. Conventional implementations can be in hardware or software. Hardware implementations are fast, able in some cases to accomplish trace back for each stage in a single cycle, but they are generally hardwired to a particular Viterbi application and not easily adapted to other applications. Software implementations are more flexible but much slower, requiring many cycles of operation per stage in trace back. Attempts to increase speed generally resort to rearrangement or re-ordering of the accumulate-compare-select and trace back operations.
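
The accumulate-compare-select operation for a single state can be sketched as follows. This is a minimal illustration assuming two incoming branches per state and a tie-break in favor of the first branch; neither the function name nor the tie-break rule is taken from the specification.

```python
def acs(metric_a, branch_a, metric_b, branch_b):
    """Add-compare-select for one state with two incoming branches.

    Returns (survivor_metric, survivor_bit). The survivor bit is 0 when the
    first path wins (ties included) and 1 when the second path wins.
    """
    candidate_a = metric_a + branch_a   # accumulate along the first branch
    candidate_b = metric_b + branch_b   # accumulate along the second branch
    if candidate_b < candidate_a:       # compare
        return candidate_b, 1           # select the second path
    return candidate_a, 0               # select the first path
```

The survivor bit returned here is the one bit per state per stage that the forward pass has to remember for the later trace back.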


BRIEF SUMMARY OF THE INVENTION

It is therefore an object of this invention to provide an improved programmable compute unit with an internal register and bit FIFO for executing Viterbi decode.


It is a further object of this invention to provide such an improved programmable compute unit which operates with the speed and efficiency of a hardware (e.g. ASIC) implementation and the flexibility of a software implementation.


It is a further object of this invention to provide such an improved programmable compute unit which is easily adapted for a variety of Viterbi parameters.


It is a further object of this invention to provide such an improved programmable compute unit which generates the survivor bit path (Trace back) in a single cycle per stage.


It is a further object of this invention to provide such an improved programmable compute unit which is executable in a conventional compute unit using internal LUT/FIFO(s) for storing survivor bits and generating trace back survivor bit addresses and decoded bits.


The invention results from the realization that an improved programmable compute unit, which operates with the speed and efficiency of a hardware implementation yet the flexibility of a software implementation, can be achieved using a programmable compute unit with an internal register and an internal bit FIFO for executing Viterbi decode. The unit is configured, in the forward path, to accumulate the best-path to each state in the internal register and store the survivor trace back information bit for each state in each stage in the bit FIFO and, in the trace back path, to select the optimal best-path through the Viterbi trellis by tracing through the bit FIFO trace back information survivor bits, beginning with the survivor bit of the last stage best-path, and to generate, in response to the Viterbi constraint length and the current bit FIFO address, the next bit FIFO address and decoded output bit for the next previous stage.


The subject invention, however, in other embodiments, need not achieve all these objectives and the claims hereof should not be limited to structures or methods capable of achieving these objectives.


This invention features a programmable compute unit with an internal register and a bit FIFO for executing Viterbi decode configured to: in the forward path accumulate the best-path to each state in an internal register and store the survivor trace back information bit for each state in each stage in a bit FIFO. In the trace back path the optimal best-path is selected through the Viterbi trellis by tracing through the bit FIFO trace back information survivor bits beginning with the survivor bit of the last stage best-path. In response to the Viterbi constraint length and a current bit FIFO address, there is generated the next bit FIFO address and the decoded output bit for the next previous stage.


In a preferred embodiment the FIFO address may include a stage field and new state field. The stage field will be updated by the number of states per stage to point to the beginning of the next trace back stage. The next previous state may be the current state shifted by one and the next previous survivor trace back information bit may be deposited as the new decoded output bit. The bit FIFO may fill and spill an external memory using 32 bit words. The 32 bit words may be memory aligned. The internal register may be one of the compute unit register files.





BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

Other objects, features and advantages will occur to those skilled in the art from the following description of a preferred embodiment and the accompanying drawings, in which:



FIG. 1 is a schematic diagram of a prior art Viterbi encoder;



FIG. 2 is a schematic block diagram of a prior art Viterbi decoder;



FIG. 3 is a diagram of a portion of a Viterbi trellis and survivor decision word occurring in the forward path;



FIG. 4 is a diagram of a portion of a Viterbi trellis and survivor decision word occurring in the trace back;



FIG. 5 is a schematic block diagram of a programmable compute unit with internal register and bit FIFO according to this invention;



FIG. 6 is a schematic block diagram for compute units performing accumulate-compare-select (ACS) operations and depositing survivor decision words in one or more internal bit FIFO's;



FIG. 7 is a schematic diagram of an address generator of the bit FIFO of FIG. 6;



FIG. 8 is a schematic diagram of two compute units configured for this invention;



FIG. 9 is a schematic diagram of four compute units with their bit FIFO's arranged to fill and spill to external memory; and



FIG. 10 is a schematic diagram of bit FIFO address generation from stage and state address.





DETAILED DESCRIPTION OF THE INVENTION

Aside from the preferred embodiment or embodiments disclosed below, this invention is capable of other embodiments and of being practiced or being carried out in various ways. Thus, it is to be understood that the invention is not limited in its application to the details of construction and the arrangements of components set forth in the following description or illustrated in the drawings. If only one embodiment is described herein, the claims hereof are not to be limited to that embodiment. Moreover, the claims hereof are not to be read restrictively unless there is clear and convincing evidence manifesting a certain exclusion, restriction, or disclaimer.


There is shown in FIG. 1 a conventional Viterbi encoder 10 which is shown simplified using a delay line 12 including just six one-bit delays 14, 16, 18, 20, 22 and 24. Each bit of input data, be it one or zero, is submitted to delay line 12 and propagates through one bit at a time. As each bit arrives two convolutions are performed: one by exclusive OR circuit 26 having inputs 28 to provide one output 30, and another by exclusive OR circuit 32 having inputs 34 to provide a second output 36.
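
A software model of such an encoder might look like the sketch below. The specification does not say which delay-line taps feed exclusive OR circuits 26 and 32, so the tap masks used here (octal 171 and 133, a commonly used pair for a six-delay encoder) are an assumption, as are all names in the code.

```python
G1 = 0o171  # assumed tap mask for the first exclusive OR circuit
G2 = 0o133  # assumed tap mask for the second exclusive OR circuit


def parity(x):
    """Exclusive OR of all bits of x."""
    return bin(x).count("1") & 1


def encode(bits, g1=G1, g2=G2, delays=6):
    """Return one (output1, output2) pair per input bit."""
    state = 0  # contents of the delay line, newest bit in the MSB
    out = []
    for u in bits:
        window = (u << delays) | state   # input bit followed by the delayed bits
        out.append((parity(window & g1), parity(window & g2)))
        state = window >> 1              # advance the delay line by one bit
    return out
```

With these assumed taps, `encode([1, 0])` yields `[(1, 1), (1, 0)]`: the first input bit reaches both exclusive OR circuits, and one step later only the first tap mask still sees it.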


Viterbi encoder 10, FIG. 1, is shown simplified for purposes of explanation and so too is the explanation of Viterbi decoder 40, FIG. 2. A fuller explanation of Viterbi coding and implementations for accomplishing it are well known and can be widely found in the literature, a recent sampling of which includes RECONFIGURABLE VITERBI DECODER FOR MOBILE PLATFORM, by Riswan Rasjeed et al., Mobile Communications Department, Institut Eurecom, Sophia Antipolis, France; VITERBI DECODING TECHNIQUES FOR THE TMS320C54X DSP GENERATION, Texas Instruments, Application Report SPRA071A, January 2002, pgs 1-8; U.S. Pat. No. 7,173,985, Diaz-Manero et al., entitled: METHOD AND APPARATUS FOR IMPLEMENTING A VITERBI DECODER; U.S. Pat. No. 7,187,729, Nagano, entitled: VITERBI DECODER; U.S. Patent Application Publication US2007/0044008A1, Chen et al., entitled: ACS CIRCUIT AND VITERBI DECODER WITH THE CIRCUIT; and U.S. Patent Application Publication US2007/0089043A1, Chae et al., entitled: VITERBI DECODER AND VITERBI DECODING METHOD, each of which is hereby incorporated in its entirety by this reference.


In Viterbi decoder 40 the noisy channel data arrives at input 42 to branch metric unit 44, where the cost to each state is determined and delivered to the add-compare-select (ACS) circuit 46, which accumulates the cost to each state, compares them and selects the least costly in terms of the shortest Hamming distance as the state survivor path. By applying the ACS to all states in a stage the stage survivor decision word is generated. Typically, then, the shortest best-path is chosen as the optimal best-path to use for the trace back operation indicated at 48. In trace back the survivor decision word bits are used to trace backwards the maximum likelihood path through the Viterbi trellis, which reconstructs the bit sequence with the highest probability of matching the transmitted sequence. Typically Viterbi decoding uses a number of stages, each stage including a number of states. The number of states may be 16, 64, 128 or 256. Likewise the number of stages in a decoded window may be in the tens, hundreds, or thousands. When only a few stages are involved the optimal path is typically chosen as the shortest best-path, but when many stages are involved the convergence of the Viterbi approach is such that any of the best-paths, whether it be the shortest best-path or not, will through trace back arrive at the most likely value for the decoded bit.


The forward path operation for k=3 is shown in diagrammatic form in FIG. 3, where there are three stages 60, 62, and 64, each of which contains eight states 0-7. Referring now to FIG. 2 and FIG. 3 together, branch metric unit 44 determines the cost to each state, lines 66, 68, 70, and 72. ACS 46 then determines the shortest or lowest cost to each state, that is, the shortest Hamming distance. In this case, assume that this is 66, and so a zero is placed in the associated bit 74 of survivor decision word 76. This continues for all of the 0-7 states in each stage 60, 62, 64, creating what is known as a Viterbi trellis 77. A survivor decision word 76 is created for each stage so that a path is remembered for each path through the Viterbi trellis 77. At the end of the forward path (ACS) operation the decoder seeks the optimal path, for example, the shortest path of accumulated Hamming distances, and begins the trace back as shown in FIG. 4, starting with the survivor bit 78 of the last survivor decision word 80 and working back from stage to stage to arrive at the most likely decoded data bit by following the path lines 82, 84, 86.
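
The forward pass and trace back just described can be sketched end to end in software. The following is a minimal model, not the circuit of the figures: it assumes eight states per stage (three state bits), a flat Python list standing in for the stored survivor decision words, hard-decision Hamming metrics, and hypothetical generator tap masks (octal 17 and 15) that are not given in the specification.

```python
STATE_BITS = 3                 # assumed: eight states per stage, as in FIG. 3
NUM_STATES = 1 << STATE_BITS
MASK = NUM_STATES - 1
GENERATORS = (0o17, 0o15)      # assumed tap masks, not taken from the source


def parity(x):
    return bin(x).count("1") & 1


def branch_output(prev_state, bit):
    """Encoder output pair for input `bit` leaving `prev_state`."""
    window = (bit << STATE_BITS) | prev_state
    return tuple(parity(window & g) for g in GENERATORS)


def encode(bits):
    state, symbols = 0, []
    for bit in bits:
        symbols.append(branch_output(state, bit))
        state = ((bit << STATE_BITS) | state) >> 1
    return symbols


def forward_pass(symbols):
    """Run ACS over every stage; return final metrics and all survivor bits."""
    inf = float("inf")
    metrics = [0] + [inf] * (NUM_STATES - 1)    # encoder starts in state 0
    survivor_bits = []                           # one decision word per stage
    for rx in symbols:
        new_metrics = []
        for state in range(NUM_STATES):
            bit = state >> (STATE_BITS - 1)      # input bit that leads into `state`
            best, choice = inf, 0
            for b in (0, 1):                     # the two possible predecessors
                prev = ((state << 1) & MASK) | b
                expected = branch_output(prev, bit)
                cost = metrics[prev] + sum(r != e for r, e in zip(rx, expected))
                if cost < best:
                    best, choice = cost, b
            new_metrics.append(best)
            survivor_bits.append(choice)         # one decision bit per state
        metrics = new_metrics
    return metrics, survivor_bits


def trace_back(metrics, survivor_bits):
    """Follow survivor bits from the best final state back to the first stage."""
    stages = len(survivor_bits) // NUM_STATES
    state = min(range(NUM_STATES), key=metrics.__getitem__)
    decoded = []
    for stage in range(stages - 1, -1, -1):
        decoded.append(state >> (STATE_BITS - 1))      # shifted-out bit is the data bit
        bit = survivor_bits[stage * NUM_STATES + state]
        state = ((state << 1) & MASK) | bit            # step to the previous state
    return decoded[::-1]
```

Encoding a short message with `encode`, passing the symbols through `forward_pass`, and then calling `trace_back` recovers the message; with this assumed code a single flipped channel bit early in the window is also corrected when a few zero tail bits follow the data.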


In accordance with this invention branch metric unit 44, FIG. 5, is followed by one or more compute units 90 which include the ACS 46a as well as a bit FIFO for storing survivor decision words 92. In the trace back process a bit FIFO next address generator 94 uses the survivor decision word 96 to generate the next bit FIFO address and also provides the decoded data bit 98.


In accordance with this invention, FIG. 6, a number of compute units, for example, 100, 102, 104, and 106, may be used together and they may all deliver their survivor decision word bits to the bit FIFO 108 in compute unit 100. When that one is full they may use the bit FIFO in compute unit 102 and thence those in 104 and 106. With the availability of a number of compute units the work may be distributed so that compute unit 100 may service states 0 and 1, compute unit 102 states 2 and 3, compute unit 104 states 4 and 5, and compute unit 106 states 6 and 7.
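
The division of states among compute units in this example amounts to giving each unit a contiguous block of states. A small sketch, assuming the states divide evenly among the units (the function name is illustrative only):

```python
def states_for_unit(unit_index, num_states, num_units):
    """Contiguous block of states serviced by one compute unit."""
    if num_states % num_units:
        raise ValueError("states must divide evenly among the compute units")
    per_unit = num_states // num_units
    start = unit_index * per_unit
    return list(range(start, start + per_unit))
```

For eight states and four units this reproduces the assignment given above: states 0 and 1 for the first unit, 2 and 3 for the second, and so on.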


The next address generator 94 and survivor decision word decoding 96, FIG. 5, are shown in more detail in FIG. 7. There are three registers: the output register 110, which receives the previous decoded output bit; the address register 112, which holds the previous address; and the number of states register 114, which holds the Viterbi constraint length. In fact the number of states per stage is two to the power of one less than the Viterbi constraint length. So if there are 8 = 2^3 states, K=3 and the value in register 114 will be 3. This number can be changed as desired, making the system wholly programmable for Viterbi decoding of any constraint length, thus garnering one of the great advantages of software implementations and yet providing the single cycle complete stage processing available only in hardware implementation. The output register 110 is shifted up by one position and the current state shifted-out bit, address bit <0> on line 115, is deposited as the new output decoded bit at register 118. The present address 112 is updated to the beginning of the next previous stage by subtracting from it at 120 two to the power of one less than the Viterbi constraint length 114, or number of states 122 per stage, to obtain the next previous stage address 124. The survivor bit from the previous bit FIFO 126 retrieval is used to create the next new state 128: the next new state is created by shifting the current (k−1) state window (where k is the Viterbi constraint length) by one and depositing the new survivor trace back bit as the new bit. The current state may be shifted up or down in accordance with the hardware implementation. The current state shifted-out bit (the MSB bit of the (k−1) state window in the shift up case) is deposited as the new decoded output bit in register 110. The updated next new state 128 is added to the address 124 to create the bit level new address 130. In fact the address 124 created at 120 is the stage address, whereas the address created at 128 and presented on line 132 is the new state address. The stage and state address combined provide the new bit FIFO address at 130.
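
One trace back step of this address generator can be modeled in a few lines. The sketch below assumes the shift up case, a bit-level address formed as the stage base plus the state, and a parameter `state_bits` giving the width of the state window; these names and the function signature are illustrative, not part of the specification.

```python
def traceback_step(address, survivor_bit, state_bits, out_reg):
    """Return (next_address, out_reg) after one trace back stage.

    address      -- current bit FIFO address (stage base + state)
    survivor_bit -- bit read from the bit FIFO at `address`
    state_bits   -- width of the state window (assumed parameter)
    out_reg      -- decoded bits collected so far, as an integer
    """
    num_states = 1 << state_bits
    mask = num_states - 1
    state = address & mask                 # lower-order bits: current state
    stage_base = address - state           # higher-order bits: start of the stage
    decoded_bit = (state >> (state_bits - 1)) & 1   # MSB shifted out of the state
    out_reg = (out_reg << 1) | decoded_bit          # output register shifted up by one
    new_state = ((state << 1) & mask) | survivor_bit
    next_address = (stage_base - num_states) + new_state
    return next_address, out_reg
```

As a worked case with three state bits: from address 21 (stage 2, state 0b101) and a survivor bit of 1, the decoded bit is 1, the new state is 0b011, and the next address is 8 + 3 = 11. No loop over the states is needed, which is what allows a hardware version of this step to complete in one cycle per stage.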


The invention may be implemented in conventional programmable compute units 150, 152, FIG. 8. Each compute unit responds to the processed state branch metric register 154, 156 and each has a pair of accumulated state cost registers 158, 160, 162, and 164 since each compute unit serves two states. Within each compute unit there are the accumulator functions 166, 168, and the comparator functions 170, 172 for providing to registers 176 and 178 the path with the shortest Hamming distance. The survivor decision bits <0>, <1>, <2> and <3> from the four states 158, 160, 162 and 164 processed by compute units 150 and 152 are collected by bit FIFO 174 in compute unit 150. Typically each compute unit includes such a bit FIFO and both can be used. The accumulated state cost and the branch metrics registers may be implemented with any of the existing compute unit register files. See also co-pending application by one or more of the inventors hereof entitled COMPUTE UNIT WITH AN INTERNAL BIT FIFO CIRCUIT, Ser. No. 11/728,358 filed on Mar. 26, 2007, hereby incorporated in its entirety by this reference.


In the case where the Viterbi decoded window is larger than the bit FIFO (1K of decision words for k=7), the spill and fill functionality of each bit FIFO 108, 108a, 108b, 108c, FIG. 9, of compute units 100, 102, 104, 106, respectively, may be used with reference to an L1 memory 180, which is external to the compute units, to extend the bit FIFO to any required size. The bit FIFO is spilled on the forward pass (ACS) every time it is out of space and filled back during the trace back operation as needed.
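
The spill and fill behavior can be illustrated with a simple software model. This is only a sketch under stated assumptions: the class and method names are hypothetical, the external memory is represented by a dictionary of 32-bit words, words are moved in aligned blocks the size of the internal store, and all writes (forward pass) are assumed to finish before any reads (trace back).

```python
WORD_BITS = 32


class BitFifo:
    """Bit store with a small internal window backed by external memory."""

    def __init__(self, capacity_words):
        self.capacity_words = capacity_words
        self.base_word = 0       # index of the first word held internally
        self.internal = []       # complete words held in the compute unit
        self.external = {}       # word index -> word, stands in for external memory
        self.partial = 0
        self.partial_bits = 0

    def write_bit(self, bit):
        self.partial |= (bit & 1) << self.partial_bits
        self.partial_bits += 1
        if self.partial_bits == WORD_BITS:
            self._commit_partial()

    def finish(self):
        """Commit a trailing partial word at the end of the forward pass."""
        if self.partial_bits:
            self._commit_partial()

    def _commit_partial(self):
        if len(self.internal) == self.capacity_words:
            self._spill()                        # out of space: spill to memory
        self.internal.append(self.partial)
        self.partial, self.partial_bits = 0, 0

    def _spill(self):
        for offset, word in enumerate(self.internal):
            self.external[self.base_word + offset] = word
        self.base_word += len(self.internal)
        self.internal = []

    def _fill(self, word_index):
        self._spill()                            # keep the current words safe
        self.base_word = word_index - word_index % self.capacity_words
        block = range(self.base_word, self.base_word + self.capacity_words)
        self.internal = [self.external[i] for i in block if i in self.external]

    def read_bit(self, address):
        word_index, bit_index = divmod(address, WORD_BITS)
        if not self.base_word <= word_index < self.base_word + len(self.internal):
            self._fill(word_index)               # bring the needed block back
        return (self.internal[word_index - self.base_word] >> bit_index) & 1
```

Because trace back visits stages in descending order, reads in this model mostly walk backwards through the store and a fill is only needed each time a block boundary is crossed.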


The generation of the new address from the stage and state portions is illustrated in FIG. 10, where bit FIFO 108 is addressed by the stage address 190 to access the stage survivor word and the state address 192 for the particular state decision bit within the stage.
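
In software terms, combining the stage and state portions amounts to concatenating two bit fields. The sketch below assumes the state occupies the `state_bits` lower-order bits of the address; the function names are illustrative and not from the specification.

```python
def fifo_address(stage, state, state_bits):
    """Concatenate the stage and state fields into one bit-level address."""
    if not 0 <= state < (1 << state_bits):
        raise ValueError("state does not fit in the state field")
    return (stage << state_bits) | state


def split_address(address, state_bits):
    """Recover (stage, state) from a bit-level address."""
    return address >> state_bits, address & ((1 << state_bits) - 1)
```

With three state bits, stage 2 and state 5 map to address 21, and `split_address(21, 3)` returns `(2, 5)`.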


Although specific features of the invention are shown in some drawings and not in others, this is for convenience only as each feature may be combined with any or all of the other features in accordance with the invention. The words “including”, “comprising”, “having”, and “with” as used herein are to be interpreted broadly and comprehensively and are not limited to any physical interconnection. Moreover, any embodiments disclosed in the subject application are not to be taken as the only possible embodiments.


In addition, any amendment presented during the prosecution of the patent application for this patent is not a disclaimer of any claim element presented in the application as filed: those skilled in the art cannot reasonably be expected to draft a claim that would literally encompass all possible equivalents, many equivalents will be unforeseeable at the time of the amendment and are beyond a fair interpretation of what is to be surrendered (if anything), the rationale underlying the amendment may bear no more than a tangential relation to many equivalents, and/or there are many other reasons the applicant can not be expected to describe certain insubstantial substitutes for any claim element amended.


Other embodiments will occur to those skilled in the art and are within the following claims.

Claims
  • 1. A programmable compute unit for executing a Viterbi decode, the programmable compute unit comprising: forward-path circuitry for (i) accumulating a best path to each state in each stage in a decoded window of received data in an internal register and (ii) storing in a bit FIFO, in parallel, a plurality of survivor trace-back information bits for a plurality of states; trace-back circuitry for selecting an optimal best path through a Viterbi trellis comprising the accumulated best paths by reading, one bit at a time, trace-back information bits from the bit FIFO, beginning with a survivor bit of a last-stage best path; and output circuitry for generating, in response at least in part to a current bit FIFO address, (i) a next bit FIFO address by subtracting a given Viterbi constraint length factor from higher-order bits of the current bit FIFO address and concatenating, to a result of the subtraction, lower-order bits of the current bit FIFO address and a survivor trace-back information bit and (ii) a decoded output bit for a stage previous to a current stage.
  • 2. The programmable compute unit of claim 1 in which said next bit FIFO address includes a stage field and a new state field.
  • 3. The programmable compute unit of claim 2 in which said stage field is updated by a number of states per stage to thereby point to a beginning of the stage previous to the current stage.
  • 4. The programmable compute unit of claim 2 in which the new state field comprises a new state created by shifting a current state by one and in which a shifted-out bit of the current state is a new decoded output bit.
  • 5. The programmable compute unit of claim 4 in which the new survivor trace back bit is deposited as a new bit in the current state.
  • 6. The programmable compute unit of claim 5 in which the new survivor trace back bit is deposited as a new LSB bit in the current state.
  • 7. The programmable compute unit of claim 1 in which said bit FIFO collects a trace back survivor bit from all of a plurality of compute units.
  • 8. The programmable compute unit of claim 1 in which said bit FIFO fills and spills data to and from an external memory using 32 bit words.
  • 9. The programmable compute unit of claim 8 in which said 32 bit words are memory aligned.
  • 10. The programmable compute unit of claim 1 in which said internal register is a register file in the programmable compute unit.
  • 11. The programmable compute unit of claim 1 in which said current bit FIFO address points to current stage survivor trace back bit information in said bit FIFO needed to generate a previous trellis state.
  • 12. The programmable compute unit of claim 1 further including multiple compute units comprising multiple bit FIFOs.
  • 13. The programmable compute unit of claim 12 in which said multiple bit FIFOs and said single bit FIFO collect trace-back information from all compute units.
  • 14. A system for executing a Viterbi decode on a received window of input data, the system comprising: an address register for storing a current address comprising a stage number and a state in a Viterbi trellis, the trellis corresponding to the received window of input data; a number-of-states register for storing a given Viterbi constraint length; a bit FIFO for storing a plurality of survivor decision words and for providing, given the current address in the address register, one survivor decision bit per cycle; a next-address generator for generating a next address based at least in part on the current address, the Viterbi constraint length, and the survivor decision bit by (i) subtracting a given Viterbi constraint length factor from higher-order bits of the current address and (ii) concatenating, to a result of the subtraction, lower-order bits of the current address and the survivor decision bit; and an output register for providing a decoded output stream based on bits received from the address register.
  • 15. A method for executing a Viterbi decode on a received window of input data, the method comprising: addressing a bit FIFO using a current address to thereby read out a single survivor trace-back information bit; generating a next address of a state in a Viterbi trellis, the trellis corresponding to the received window of input data, by: i. subtracting a given Viterbi constraint length factor from higher-order bits of the current address, and ii. concatenating, to a result of the subtraction, lower-order bits of the current address and the survivor decision bit; and shifting a bit of a current address into an output register, the output register comprising decoded data.
  • 16. The method of claim 15 in which the Viterbi constraint length factor comprises a result of raising two to the power of one less than the Viterbi constraint length or number of states per stage.
  • 17. The method of claim 15 in which concatenating the lower-order bits of the current address and the survivor decision bit comprises shifting the result of the subtraction and the lower-order bits of the current address left by one and depositing the new survivor decision bit as the new bit.
US Referenced Citations (79)
Number Name Date Kind
3303477 Voigt Feb 1967 A
3805037 Ellison Apr 1974 A
3959638 Blum et al. May 1976 A
4757506 Heichler Jul 1988 A
5031131 Mikos Jul 1991 A
5062057 Blacken et al. Oct 1991 A
5101338 Fujiwara et al. Mar 1992 A
5260898 Richardson Nov 1993 A
5287511 Robinson et al. Feb 1994 A
5351047 Behlen Sep 1994 A
5386523 Crook et al. Jan 1995 A
5530825 Black et al. Jun 1996 A
5537579 Hiroyuki et al. Jul 1996 A
5666116 Bakhmutsky Sep 1997 A
5675332 Limberg Oct 1997 A
5689452 Cameron Nov 1997 A
5696941 Jung et al. Dec 1997 A
5710939 Ballachino et al. Jan 1998 A
5819102 Reed et al. Oct 1998 A
5832290 Gostin et al. Nov 1998 A
5937438 Raghunath et al. Aug 1999 A
5961640 Chambers et al. Oct 1999 A
5970241 Deao et al. Oct 1999 A
5996057 Scales, III et al. Nov 1999 A
5996066 Yung Nov 1999 A
6009499 Koppala Dec 1999 A
6029242 Sidman Feb 2000 A
6061749 Webb et al. May 2000 A
6067609 Meeker et al. May 2000 A
6094726 Gonion et al. Jul 2000 A
6134676 VanHuben et al. Oct 2000 A
6138208 Dhong et al. Oct 2000 A
6151705 Santhanam Nov 2000 A
6223320 Dubey et al. Apr 2001 B1
6230179 Dworkin et al. May 2001 B1
6263420 Tan et al. Jul 2001 B1
6272452 Wu et al. Aug 2001 B1
6272661 Yamaguchi Aug 2001 B1
6285607 Sinclair Sep 2001 B1
6289487 Hessel et al. Sep 2001 B1
6292923 Genrich et al. Sep 2001 B1
6332188 Garde et al. Dec 2001 B1
6430672 Dhong et al. Aug 2002 B1
6480845 Egolf et al. Nov 2002 B1
6539477 Seawright Mar 2003 B1
6587864 Stein et al. Jul 2003 B2
6757806 Shim Jun 2004 B2
6771196 Hsiun Aug 2004 B2
6829694 Stein et al. Dec 2004 B2
7173985 Diaz-Manero et al. Feb 2007 B1
7187729 Nagano Mar 2007 B2
7243210 Pedersen et al. Jul 2007 B2
7331013 Rudosky et al. Feb 2008 B2
7424597 Lee et al. Sep 2008 B2
7673224 Chakraborty Mar 2010 B2
7728744 Stein et al. Jun 2010 B2
7861146 Watanabe et al. Dec 2010 B2
7882284 Wilson et al. Feb 2011 B2
20030085822 Scheuermann May 2003 A1
20030103626 Stein et al. Jun 2003 A1
20030133568 Stein et al. Jul 2003 A1
20030149857 Stein et al. Aug 2003 A1
20030196072 Chinnakonda et al. Oct 2003 A1
20030229769 Montemayor Dec 2003 A1
20040145942 Leijten-Nowak Jul 2004 A1
20040193850 Lee et al. Sep 2004 A1
20040210618 Stein et al. Oct 2004 A1
20050086452 Ross Apr 2005 A1
20050228966 Nakamura Oct 2005 A1
20050267996 O'Connor et al. Dec 2005 A1
20060143554 Sudhakar et al. Jun 2006 A1
20060271763 Pedersen et al. Nov 2006 A1
20070044008 Chen et al. Feb 2007 A1
20070089043 Chae et al. Apr 2007 A1
20070094474 Wilson et al. Apr 2007 A1
20070094483 Wilson et al. Apr 2007 A1
20070277021 O'Connor et al. Nov 2007 A1
20080010439 Stein et al. Jan 2008 A1
20080244237 Wilson et al. Oct 2008 A1
Foreign Referenced Citations (7)
Number Date Country
04092921 Mar 1992 JP
06110852 Apr 1994 JP
2001210357 Aug 2001 JP
02290494 Oct 2002 JP
05513541 May 2005 JP
WO-9610226 Apr 1996 WO
WO-03067364 Aug 2003 WO
Related Publications (1)
Number Date Country
20090089649 A1 Apr 2009 US