This invention relates to digital signal processors for wireless mobile and base station applications and, more particularly, to the use of digital signal processors for turbo and Viterbi channel decoding in wireless base stations.
Second and third generation wireless systems employ channel coding and decoding algorithms and spread spectrum techniques to enhance transmission reliability. In third generation wireless systems, a convolutional coding scheme is specified for voice transmission, and a parallel concatenated convolutional coding (PCCC) scheme is specified for data transmission. The convolutionally encoded data is decoded using the Viterbi decoding algorithm, and the PCCC encoded data is decoded using a turbo decoding algorithm. The turbo and Viterbi decoding schemes are trellis-based algorithms.
Viterbi and turbo decoder algorithms are extremely computationally intensive. The forward error correction, or channel decoding, block in a wireless base station can approach 80% of the symbol rate processing in the software radio. Proposed approaches to executing these algorithms within the allotted time constraints have included the use of ASICs and the use of a hardware block having the most basic components in a digital signal processor.
A digital signal processor, rather than an ASIC, is a desirable solution because of its software programmability. However, no currently available digital signal processor can handle the complete chip and symbol rate processing requirement of the software radio. System designers are therefore researching solutions which use a digital signal processor and an ASIC or an ASIC alone to handle the symbol rate processing. At a minimum, the ASIC would execute the forward error correction.
A digital signal processor having dual computation units, wide memory buses and the ability to handle multiple tasks in parallel is disclosed in U.S. Pat. No. 5,896,543 issued Apr. 20, 1999 to Garde. The disclosed digital signal processor delivers extremely high performance, but as currently configured cannot efficiently execute the forward error correction of a wireless base station within the allotted time constraints.
Accordingly, there is a need for improved implementations of the turbo and Viterbi channel decoding algorithms used in wireless systems.
According to a first aspect of the invention, a method is provided for calculating metrics of a trellis function in a digital signal processor. The metrics of the trellis function are calculated for selected trellis states in response to trellis state metrics for a time t0 and transition metrics from time t0 to time t1 specified by a trellis instruction. The calculations for each selected trellis state include adding a transition metric to a first state metric for time t0 to provide a first value, subtracting the transition metric from a second state metric for time t0 to provide a second value, comparing the corresponding first and second values, and selecting the maximum of the corresponding first and second values to provide trellis state metrics for time t1.
The method may further comprise the step of, for each selected trellis state, adding to the maximum value a correction factor that is a function of the corresponding first and second values. The step of adding a correction factor may comprise accessing a lookup table containing correction factors.
In one embodiment, the trellis instruction implements a forward trellis function for calculating α trellis state metrics. In another embodiment, the trellis instruction implements a reverse trellis function for calculating β trellis state metrics. In yet another embodiment, the trellis instruction simultaneously implements a forward trellis function for calculating α trellis state metrics and a reverse trellis function for calculating β trellis state metrics, using a single instruction, multiple data approach.
According to another aspect of the invention, a method is provided for calculating metrics of a trellis function in a digital signal processor. In response to α metrics for a time t0 and transition metrics from time t0 to time t1 specified by a trellis instruction, an α metric is calculated for selected trellis states for time t1. In response to β metrics for a time t2 and transition metrics from time t2 to t1 specified by the trellis instruction, a β metric is calculated for the selected trellis states for time t1.
The step of calculating an α metric for the selected trellis states may comprise the steps of, for each selected trellis state, adding a transition metric to a first α metric for time t0 to provide a first value and subtracting the transition metric from a second α metric for time t0 to provide a second value, for each selected trellis state, comparing the corresponding first and second values, and selecting the maximum of the corresponding first and second values for each selected trellis state to provide α metrics for time t1.
The step of calculating a β metric for the selected trellis states may comprise the steps of, for each selected trellis state, adding a transition metric to a first β metric for time t2 to provide a first value and subtracting the transition metric from a second β metric for time t2 to provide a second value, for each selected trellis state, comparing the corresponding first and second values, and selecting the maximum of the corresponding first and second values for each selected trellis state to provide β metrics for time t1.
The steps of calculating an α metric and calculating a β metric may each further comprise the step of, for each selected trellis state, adding to the maximum value a correction factor that is a function of the corresponding first and second values. The steps of calculating an α metric and calculating a β metric may be performed simultaneously.
According to a further aspect of the invention, a method is provided for calculating a log MAP function in a digital signal processor. A log MAP instruction specifies locations of first, second, third and fourth parameters. The sum or difference of the first and second parameters is calculated to provide a first value, and the sum or difference of the third and fourth parameters is calculated to provide a second value. The maximum of the first and second values is selected. Then a correction factor that is a function of the first and second values is added to the maximum value to provide a log MAP result. The step of adding a correction factor may comprise accessing a lookup table containing correction factors.
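The log MAP steps above can be sketched in software as follows. This is a minimal illustration only, not the disclosed instruction. The names `log_map`, `lookup_correction`, `CORRECTION_TABLE` and `STEP` are hypothetical, and the table contents assume the commonly used correction term log(1 + exp(-|first - second|)), which the text above does not specify.

```python
import math

# Hypothetical lookup table: the assumed correction log(1 + exp(-d)),
# sampled at gaps d = 0, 0.5, 1.0, ... (16 entries).
STEP = 0.5
CORRECTION_TABLE = [math.log1p(math.exp(-i * STEP)) for i in range(16)]


def lookup_correction(first, second):
    """Return the tabulated correction for the gap between the two values."""
    index = int(abs(first - second) / STEP)
    # Gaps beyond the end of the table are treated as needing no correction.
    return CORRECTION_TABLE[index] if index < len(CORRECTION_TABLE) else 0.0


def log_map(p1, p2, p3, p4, subtract_first=False, subtract_second=False):
    """Combine four parameters as described: sum/difference, maximum, correction."""
    first = p1 - p2 if subtract_first else p1 + p2
    second = p3 - p4 if subtract_second else p3 + p4
    return max(first, second) + lookup_correction(first, second)


print(log_map(1.0, 1.0, 0.5, 0.5))
```

Under the stated assumption, the result approximates log(exp(first) + exp(second)), with accuracy limited by the table step.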
A digital signal processor may comprise a memory for storing instructions and operands for digital signal computations, a program sequencer for generating instruction addresses for fetching selected ones of the instructions from the memory, and a computation block comprising a register file for temporary storage of operands and results, and an accelerator for performing the operations described above, either separately or in any combination. In a preferred embodiment, the digital signal processor comprises two or more computation blocks for performing multiple operations in parallel.
According to a further aspect of the invention, an accelerator is provided for use in the digital signal processor computation block. The accelerator comprises a first carry save adder for receiving inputs to the accelerator, a first full adder for combining sum and carry outputs of the first carry save adder, a lookup table for generating a correction factor in response to the output of the first full adder, a multiplexer for selecting one or more of the inputs to the accelerator in response to the sign of the output of the first full adder, a second carry save adder for adding one or more outputs of the multiplexer and the output of the lookup table, and a second full adder for combining sum and carry outputs of the second carry save adder.
The first carry save adder and the first full adder may comprise a first pipeline stage; the lookup table, the multiplexer and the second carry save adder may comprise a second pipeline stage; and the second full adder may comprise a third pipeline stage. In a preferred embodiment, the accelerator further comprises a data selector for supplying the sum and carry outputs of the second carry save adder to the inputs of the first carry save adder.
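The pairing of a carry save adder with a full adder can be illustrated with a short software model. This is a sketch under assumptions: it models a generic three-input carry save adder on a 32-bit datapath, and the function names are hypothetical. It is not a description of the accelerator's actual input arrangement.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit datapath


def carry_save_add(a, b, c):
    """Reduce three operands to a sum word and a carry word without carry propagation."""
    a, b, c = a & MASK, b & MASK, c & MASK
    partial_sum = a ^ b ^ c                               # bitwise sum, carries ignored
    carry = (((a & b) | (a & c) | (b & c)) << 1) & MASK   # majority bits, shifted left
    return partial_sum, carry


def full_add(partial_sum, carry):
    """Combine the sum and carry words with one carry-propagating addition."""
    return (partial_sum + carry) & MASK


def to_signed(word):
    """Interpret a 32-bit word as a two's complement integer."""
    return word - (1 << 32) if word & 0x80000000 else word


print(to_signed(full_add(*carry_save_add(7, -3, 10))))
```

Because the carry save stage defers carry propagation, only the full adder incurs the propagation delay, which is one reason such a structure suits a pipelined datapath.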
For a better understanding of the present invention, reference is made to the accompanying drawings, which are incorporated herein by reference and in which:
A block diagram of an example of a wireless base station signal chain is shown in
Aspects of the present invention are directed to implementations of the channel decoding block 34 in a digital signal processor. The channel coding block 22 may utilize a convolutional code for voice or low data rate transmission and a PCCC scheme for high data rate transmission. The channel decoding block 34 may utilize a Viterbi decoding algorithm for voice and a turbo decoding algorithm for data.
A simplified block diagram of an example of a turbo decoder is shown in
The turbo and Viterbi channel decoding algorithms are trellis-based algorithms performed on blocks of received data. An example of an eight state trellis typically used in wireless systems is shown in
An equation for calculating alpha metrics for each trellis state is shown in
An application of the equation of
α0′=MAX[α0+γ0, α4−γ0]+C0′ (1)
α1′=MAX[α0−γ0, α4+γ0]+C1′ (2)
where C0′ and C1′ are correction factors that depend on the values of α0 and α4 as shown in
Thus, the alpha metrics for each state are calculated by algebraically summing, for each of two previous states from which a transition to the current state is possible, the alpha metric of the previous state and the transition metric for a transition from the previous state to the current state to provide two values. Then, the maximum of the two values is selected. The correction factor is added to the selected maximum value. As described below, the correction factor may be obtained from a lookup table. The alpha metrics may be calculated for each state in the trellis in a similar manner. Likewise, the equation of
The log likelihood ratio is also calculated in connection with channel decoding. The log likelihood ratio is the log of a ratio of the probability of state 1 to the probability of state 0. An equation for calculating log likelihood ratio is shown in
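Equations (1) and (2) can be sketched as a single function. This is an illustration under assumptions: the function names are hypothetical, and the correction term is assumed to be log(1 + exp(-|x - y|)) computed from the two candidate values, since the exact form of C0′ and C1′ is not reproduced here.

```python
import math


def correction(x, y):
    # Assumed form of the correction factor; the text does not give it explicitly.
    return math.log1p(math.exp(-abs(x - y)))


def alpha_butterfly(alpha0, alpha4, gamma0, use_correction=True):
    """Return (alpha0', alpha1') following equations (1) and (2)."""
    a, b = alpha0 + gamma0, alpha4 - gamma0   # candidate values for state 0'
    c, d = alpha0 - gamma0, alpha4 + gamma0   # candidate values for state 1'
    next0 = max(a, b) + (correction(a, b) if use_correction else 0.0)
    next1 = max(c, d) + (correction(c, d) if use_correction else 0.0)
    return next0, next1


print(alpha_butterfly(1.0, 3.0, 0.5, use_correction=False))
```

With `use_correction=False` the function reduces to the plain maximum selection; with the correction enabled it applies the assumed correction term to each result.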
A block diagram of an example of a digital signal processor (DSP) 110 suitable for implementing features of the present invention is shown in
The memory 116 may include three independent, large capacity memory banks 140, 142 and 144. In a preferred embodiment, each of the memory banks 140, 142 and 144 has a capacity of 64 K words of 32 bits each. As discussed below, each of the memory banks 140, 142 and 144 preferably has a 128-bit data bus. Up to four consecutive aligned data words of 32 bits each can be transferred to or from each memory bank in a single clock cycle.
The elements of the DSP 110 are interconnected by buses for efficient, high speed operation. Each of the buses includes multiple lines for parallel transfer of binary information. A first address bus 150 (MA0) interconnects memory bank 140 (M0) and control block 124. A second address bus 152 (MA1) interconnects memory bank 142 (M1) and control block 124. A third address bus 154 (MA2) interconnects memory bank 144 (M2) and control block 124. Each of the address buses 150, 152 and 154 is preferably 16 bits wide. An external address bus 156 (MAE) interconnects external port 128 and control block 124. The external address bus 156 is interconnected through external port 128 to external address bus 158. Each of the external address buses 156 and 158 is preferably 32 bits wide. A first data bus 160 (MD0) interconnects memory bank 140, computation blocks 112 and 114, control block 124, link port buffers 126, IAB 132 and external port 128. A second data bus 162 (MD1) interconnects memory bank 142, computation blocks 112 and 114, control block 124, link port buffers 126, IAB 132 and external port 128. A third data bus 164 (MD2) interconnects memory bank 144, computation blocks 112 and 114, control block 124, link port buffers 126, IAB 132 and external port 128. The data buses 160, 162 and 164 are connected through external port 128 to external data bus 168. Each of the data buses 160, 162 and 164 is preferably 128 bits wide, and external data bus 168 is preferably 64 bits wide.
The first address bus 150 and the first data bus 160 comprise a bus for transfer of data to and from memory bank 140. The second address bus 152 and the second data bus 162 comprise a second bus for transfer of data to and from memory bank 142. The third address bus 154 and the third data bus 164 comprise a third bus for transfer of data to and from memory bank 144. Since each of the memory banks 140, 142 and 144 has a separate bus, the memory banks 140, 142 and 144 may be accessed simultaneously. As used herein, “data” refers to binary words, which may represent either instructions or operands that are associated with the operation of the DSP 110.
In a typical operating mode, program instructions are stored in one of the memory banks, and operands are stored in the other two memory banks. Thus, at least one instruction and two operands can be provided to the computation blocks 112 and 114 in a single clock cycle. Each of the memory banks 140, 142 and 144 may be configured to permit reading and writing of multiple data words in a single clock cycle. The simultaneous transfer of multiple data words from each memory bank in a single clock cycle is accomplished without requiring an instruction cache or a data cache.
As indicated above, each of the memory banks 140, 142 and 144 preferably has a capacity of 64 K words of 32 bits each. Each memory bank may be connected to a data bus that is 128 bits wide. In an alternative embodiment, each data bus may be 64 bits wide, and 64 bits are transferred on each of clock phase 1 and clock phase 2, thus providing an effective bus width of 128 bits. Multiple data words can be accessed in each memory bank in a single clock cycle. Specifically, data can be accessed as single, dual or quad words of 32 bits each.
Using quad word transfers, four instructions and eight operands, each of 32 bits, can be supplied to the computation blocks 112 and 114 in a single clock cycle. The number of data words transferred and the computation block or blocks to which the data words are transferred are selected by control bits in the instruction. The single, dual or quad data words can be transferred to computation block 112, to computation block 114, or to both. Dual and quad data word accesses improve the performance of the DSP 110 in many applications by allowing several operands to be transferred to the computation blocks 112 and 114 in a single clock cycle. The ability to access multiple instructions in each clock cycle allows multiple operations to be executed in each clock cycle, thereby improving performance.
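The transfer figures quoted above follow from the stated word size, bus width and number of memory banks. The arithmetic can be checked with a few lines; the variable names are illustrative only, and the split between instruction and operand words assumes the typical operating mode in which one bank holds program instructions.

```python
WORD_BITS = 32            # word size stated in the text
BANK_WORDS = 64 * 1024    # 64 K words per memory bank
BUS_BITS = 128            # preferred data bus width per bank
NUM_BANKS = 3             # memory banks 140, 142 and 144

bank_bytes = BANK_WORDS * WORD_BITS // 8        # capacity of one bank in bytes
words_per_bus = BUS_BITS // WORD_BITS           # words moved per bank per cycle
total_words = NUM_BANKS * words_per_bus         # words moved by all banks per cycle
instruction_words = words_per_bus               # one bank assumed to hold the program
operand_words = total_words - instruction_words

print(bank_bytes, words_per_bus, instruction_words, operand_words)
```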
A block diagram of an embodiment of each of the computation blocks 112 and 114 is shown in
The computation block shown in
Each of the computation blocks 112 and 114 in the DSP includes the accelerator 216 for enhanced performance in wireless base stations. The accelerator includes registers for temporary storage of data and control values and accelerator circuitry for executing specified instructions. The structure and operation of the accelerator 216 are described in detail below.
It will be understood that the DSP 110 is described by way of example only. Features of the present invention may be implemented in different digital signal processor architectures.
A data flow diagram of the operations performed by each accelerator in response to an ACS, or trellis, instruction is shown in
The MAX/TMAX units 270, 272, 274 and 276 each perform one of two functions that may be specified in the trellis instruction. In the MAX function, the maximum of the two inputs is selected and is stored in quad register TRsq. In the TMAX function, the maximum of the two inputs is selected and a correction value is added to the selected maximum value. The sum is stored in quad register TRsq. The correction factor is a function of the two inputs to the MAX/TMAX unit. As described below, the correction factor can be determined from a lookup table. The MAX/TMAX units 270, 272, 274 and 276 each provide an output bit to a bit selection register pair THRs. Each output bit indicates the input that was selected as the maximum value.
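The behavior of the four MAX/TMAX units can be modeled in software as a sketch. The function names, the bit convention (1 meaning the second input was selected) and the bit ordering within the packed selection word are assumptions made for illustration; the text above states only that each unit provides one bit indicating the selected input.

```python
def max_tmax(x, y, correction=None):
    """Model one unit: return (value, select_bit).

    With correction=None the unit performs the MAX function. Otherwise
    correction(x, y) is added to the maximum, modeling the TMAX function.
    """
    select_bit = 1 if y > x else 0       # assumed convention: 1 means second input won
    value = max(x, y)
    if correction is not None:
        value += correction(x, y)
    return value, select_bit


def run_units(pairs, correction=None):
    """Apply one unit per input pair; return the results and a packed selection word."""
    results, bits = [], 0
    for position, (x, y) in enumerate(pairs):
        value, bit = max_tmax(x, y, correction)
        results.append(value)
        bits |= bit << position          # assumed ordering: unit 0 in the least significant bit
    return results, bits


print(run_units([(1, 2), (5, 3), (0, 0), (-1, 4)]))
```

Selection bits of this kind are what a traceback step would consult to recover which predecessor state was chosen at each trellis stage.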
In the embodiment of
In another embodiment of the trellis instruction, shown in
An example of software code for calculating alpha metrics and beta metrics of a trellis function is shown in
In the first instruction line of
The execution of the first two instruction lines in the software code of
As shown in
Referring now to
The accelerator circuits 300 and 302 perform four butterfly calculations in a first DSP cycle as shown in
A data flow diagram that illustrates operations performed in response to a first type of log MAP instruction is shown in
A data flow diagram that illustrates operations performed in response to a second type of log MAP instruction is shown in
An example of software code for calculating the log likelihood ratio of a trellis function is shown in
In the first instruction line of
The execution of the first instruction line in the software code of
An embodiment of each accelerator circuit 300, 302 (
The accelerator circuit shown in
MAX(TRmd+Rm, TRnd−Rm)+C (3)
where C is the optional correction factor. The MAX operation is performed by subtracting the second value in parentheses from the first to obtain:
TRmd−TRnd+2Rm (4)
The circuit then determines whether the value of expression (4) is positive or negative. When the value of expression (4) is positive, the first term within parentheses in expression (3) is the maximum value, and when this value is negative, the second term within parentheses in expression (3) is the maximum value.
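The sign test can be checked against a direct maximum in a few lines. This is a sketch with a hypothetical function name and with the correction factor omitted. A difference of exactly zero is treated here as selecting the first term, which is harmless because the two terms are then equal.

```python
def acs_select(trmd, trnd, rm):
    """Select the maximum of (trmd + rm) and (trnd - rm) using only a sign test."""
    difference = trmd - trnd + 2 * rm    # expression (4)
    if difference >= 0:                  # non-negative: first term is the maximum
        return trmd + rm
    return trnd - rm                     # negative: second term is the maximum


# Exhaustive check over a small range of integer inputs.
for trmd in range(-4, 5):
    for trnd in range(-4, 5):
        for rm in range(-4, 5):
            assert acs_select(trmd, trnd, rm) == max(trmd + rm, trnd - rm)
print("sign test agrees with max()")
```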
When an ACS instruction is being executed, the data value in register TRmd is supplied to inputs OP1 and OP2, the data value in register TRnd is supplied to input OP3 and the data value 2Rm is supplied to input OP4. The output of 32-bit adder 426 represents the value of expression (4) above. This value is used to access a correction factor in lookup table 432. The sign of the output of 32-bit adder 426 is used as a control signal for multiplexer 430, thereby selecting TRmd and Rm or TRnd and Rm. The selected values and the output of lookup table 432 are supplied to inputs of carry save adder 434. The output of 32-bit adder 440 represents the selected maximum value plus the correction factor C provided by lookup table 432. In order to reduce the execution time of the ACS instruction to two pipeline cycles, the output of carry save adder 434 may be supplied to inputs OP1 and OP2 of carry save adder 424. When the result of a previous ACS instruction is being used, the carry output of adder 434 is supplied through multiplexer 442 to input OP1 and the sum output of adder 434 is supplied through multiplexer 442 to input OP2. When the input to the accelerator is provided from a register, the bypass function is not utilized and the register input is supplied through multiplexer 442 to input OP1. In cases where the correction factor is not utilized in the ACS instruction, the output of lookup table 432 is zero.
The accelerator circuit shown in
MAX(TRmd+Rm3, TRnd+Rm1)+C (5)
The MAX operation in expression (5) is performed by subtracting the second value in parentheses from the first, as follows:
TRmd+Rm3−TRnd−Rm1 (6)
The circuit then determines whether the value of expression (6) is positive or negative. When the value of expression (6) is positive, the first term within parentheses in expression (5) is the maximum, and when the value of expression (6) is negative, the second term within parentheses in expression (5) is the maximum.
Referring again to
While there have been shown and described what are at present considered the preferred embodiments of the present invention, it will be obvious to those skilled in the art that various changes and modifications may be made therein without departing from the scope of the invention as defined by the appended claims.
Number | Date | Country | |
---|---|---|---|
20030028845 A1 | Feb 2003 | US |