1. Field
The present invention relates generally to digital communications systems, and more specifically to a method and hardware architecture for parallel channel interleaving and de-interleaving.
2. Background
Digital communication systems use numerous techniques to increase the amount of information transferred while minimizing transmission errors. In these communication systems, the information is typically represented as a sequence of binary bits or blocks of bits called frames. The binary information is modulated to signal waveforms and transmitted over a communication channel. Communication channels tend to introduce noise and interference that corrupt the transmitted signal. At a receiver, the received information may be corrupted and is an estimate of the transmitted binary information. The number of bit errors or frame errors depends on the amount of noise and interference in the communication channel.
To counter the effects of transmission channel corruption, channel interleaving and error correction coding are often used in digital communication systems to protect the digital information from noise and interference and to reduce the number of bit/frame errors. Channel interleaving is employed in most modern wireless communications systems to protect against burst errors. A channel interleaver reshuffles encoded symbols in such a way that consecutive symbols are spread apart from each other as far as possible in order to break the temporal correlation between successive symbols involved in a burst of errors. A reverse de-interleaving operation is performed at the receiver side before feeding the symbols to a channel decoder. Typically, this interleaving and the subsequent de-interleaving are performed in an inefficient serial manner.
There is therefore a need in the art for a faster and more efficient parallel method of interleaving and de-interleaving having improved performance. Moreover, there is a need for the improved parallel interleaving and de-interleaving to have an easily implemented hardware architecture that can be constructed using basic logic gates with a short critical path delay.
a) is a block diagram of an exemplary convolutional encoder that can be used for parallel interleaving;
b) is a block diagram of a parallel interleaver for convolutionally encoded sub-packets;
a) is a block diagram of an exemplary turbo encoder that can be used for parallel interleaving;
b) is a block diagram of a parallel interleaver for turbo encoded sub-packets;
a) shows a block diagram of an exemplary convolutional channel deinterleaver datapath;
b) details exemplary functionality of the PBRI block illustrated in
a) further details exemplary operations of the turbo codeword splitter and the PBRI blocks; and
b) details exemplary deinterleaved/depunctured rate-⅕ and rate-⅓ turbo packet structures.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any embodiment described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments.
The word “packet” is used herein to mean a formatted unit of data. The word “subpacket” is used herein to mean a portion of a packet. The words “packet” and “subpacket” may be used interchangeably herein.
CRC bits are appended by CRC inserter 104 to the incoming information bits of a subpacket for the purpose of error detection at a receiver. The CRC bits are generated by clocking a 24-stage shift-register with a generator function. In one embodiment, the generator function is 0x1864CFB. The shift-register is initialized to logic one, and then clocked a number of times equal to the number of input bits in the subpacket. Next, the register is clocked an additional number of times equal to the number of CRC bits to be generated (≦24) with the input disconnected. These output bits generated from the shift-register constitute the CRC bits and are appended to the incoming information bits in the order they are generated. For data channel packets, the number of CRC bits is 24. For other channels some of the CRC bits may be truncated.
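The shift-register procedure above can be sketched in software as follows. This is a minimal illustration, not the hardware described: the function name `crc24_bits`, the MSB-first feedback arrangement, and the choice to truncate by keeping the first bits generated are assumptions made for the example. Only the generator value 0x1864CFB, the all-ones initialization, and the two clocking phases come from the text.

```python
CRC24_GENERATOR = 0x1864CFB  # generator from the text, including the x^24 term


def crc24_bits(info_bits, n_crc=24):
    """Return n_crc CRC bits for a list of 0/1 information bits.

    Assumed register arrangement: the input bit is XORed with the
    register's most significant stage to form the feedback bit.
    """
    reg = 0xFFFFFF  # all 24 stages initialized to logic one
    for b in info_bits:
        feedback = ((reg >> 23) & 1) ^ b
        reg = (reg << 1) & 0xFFFFFF
        if feedback:
            reg ^= CRC24_GENERATOR & 0xFFFFFF
    # Input disconnected: clock the register contents out, MSB first.
    # Truncation (n_crc < 24) keeps the first bits generated (assumption).
    return [(reg >> (23 - i)) & 1 for i in range(n_crc)]


message = [1, 0, 1, 1, 0, 0, 1, 0]
subpacket = message + crc24_bits(message)
```

Under these assumptions, running the full 24-bit-padded subpacket through the same register leaves it in the all-zero state, which is how a receiver could check the CRC.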
a) is a block diagram of an exemplary embodiment of the encoder 106 component of the parallel interleaver coding and interleaving structure 100, for performing convolutional encoding. A rate-⅓ convolutional code is used to encode CRC-padded control channel subpackets as well as CRC-padded data channel packets of length (Nv) 128 bits or less. The constraint length of the rate-⅓ code is nine for generator functions g0 106a, g1 106b, and g2 106c. In one exemplary embodiment, the generator functions are g0=557(8), g1=663(8) and g2=711(8) for memory elements D1, D2, D3, D4, D5, D6, D7 and D8. The input bits to the encoder consist of the CRC-padded subpacket appended with eight all-zero tail bits to reset the encoder state when encoding of a packet is complete. The length of an encoded packet in this exemplary embodiment is 3(Nv+8).
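As a rough software sketch of the rate-⅓ encoder just described, the following uses the three octal generators and the eight zero tail bits from the text. The function name `conv_encode` and the tap convention (the most significant generator bit multiplies the current input bit, the least significant multiplies D8) are assumptions for illustration.

```python
GENERATORS = (0o557, 0o663, 0o711)  # g0, g1, g2 from the text (octal)
CONSTRAINT_LENGTH = 9               # one input tap plus memory elements D1..D8


def conv_encode(bits):
    """Encode a list of 0/1 bits; returns 3 * (len(bits) + 8) output bits."""
    state = 0  # D1..D8, with D1 in the most significant of the 8 state bits
    out = []
    for b in list(bits) + [0] * (CONSTRAINT_LENGTH - 1):  # eight zero tail bits
        window = (b << (CONSTRAINT_LENGTH - 1)) | state
        for g in GENERATORS:
            # Output bit is the parity of the tapped positions (assumed tap order).
            out.append(bin(window & g).count("1") & 1)
        state = window >> 1  # shift the input bit into D1, drop D8
    return out
```

For a subpacket of Nv bits, `len(conv_encode(bits))` equals 3(Nv+8), matching the encoded length stated above, and the tail bits return the register to the all-zero state.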
a) is a block diagram of another embodiment of encoder 106 for a rate-⅕ turbo code encoder used to encode CRC-padded data channel subpackets of length 128 bits or more. The constituent codes of the turbo code are two (302 and 304) rate-⅓, constraint-length-4, systematic, recursive convolutional encoders with identical transfer functions of G(D)=[1 n0(D)/d(D) n1(D)/d(D)], where d(D)=1+D^2+D^3, n0(D)=1+D+D^3, and n1(D)=1+D+D^2+D^3.
The turbo encoder 106 generates 5NT+18 encoded output bits, where NT is the number of encoder input bits. The first 5NT output bits are the encoder output data bits, which are generated by clocking the constituent encoders (302 and 304) once for every input bit with the switches in the upwards position, and puncturing out the systematic output bits X′ from the second constituent encoder 304 with symbol puncturing component 306. The sequence of constituent encoder (302 and 304) output bits for each input bit is XY0Y1Y′0Y′1. The last 18 tail bits are generated after the constituent encoders (302 and 304) have been clocked for NT cycles with the switches held in the upwards position. The first nine tail bits are generated by clocking the first constituent encoder 302 three times with the switch held in the downwards position, with the sequence of output bits being XY0Y1. The last nine tail bits are generated by clocking the second constituent encoder 304 three times with the switch held in the downwards position while the first constituent encoder 302 is not clocked. The sequence of output bits from the second constituent encoder 304 in this case is X′Y′0Y′1. The 18 tail bits ensure that both constituent encoders (302 and 304) are reset to the all-zero state after encoding a subpacket.
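The behavior of a single constituent encoder, including its three tail clocks, might be sketched as below. The polynomials d(D)=1+D^2+D^3, n0(D)=1+D+D^3 and n1(D)=1+D+D^2+D^3 are the standard forms for this constraint-length-4 code; the function name `rsc_encode`, the state variable names, and the modeling of the "downwards" switch position as feeding the register's own feedback back as input are assumptions for illustration. The turbo interleaver that feeds the second encoder is not modeled.

```python
def rsc_encode(bits):
    """Recursive systematic encoder sketch: returns (data_bits, tail_bits, state)."""
    s1 = s2 = s3 = 0  # three memory elements, s1 holds the most recent value

    def step(u):
        nonlocal s1, s2, s3
        a = u ^ s2 ^ s3            # feedback, d(D) = 1 + D^2 + D^3
        y0 = a ^ s1 ^ s3           # n0(D) = 1 + D + D^3
        y1 = a ^ s1 ^ s2 ^ s3      # n1(D) = 1 + D + D^2 + D^3
        s1, s2, s3 = a, s1, s2
        return [u, y0, y1]         # X, Y0, Y1

    data = []
    for u in bits:                 # switch "up": clock once per input bit
        data.extend(step(u))
    tail = []
    for _ in range(3):             # switch "down": input taken from the feedback
        tail.extend(step(s2 ^ s3)) # forces a = 0, shifting zeros into the register
    return data, tail, (s1, s2, s3)
```

Each constituent encoder contributes 9 tail bits. With the systematic bits X′ of the second encoder punctured from the data portion, the totals are 3NT + 2NT data bits plus 9 + 9 tail bits, which gives the 5NT+18 figure stated above.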
The exemplary turbo channel interleaver 308 is based on Linear Congruential Sequences (LCS). It interleaves subpackets of length between 128 bits and 16,384 bits, but can be applied to any arbitrary length. The sequence of interleaver output addresses generated by an LCS turbo interleaver 308 is equivalent to the sequence obtained by the following process. A 2D R×C array is filled with a sequence of linear addresses row by row from top to bottom, the entries of the array are shuffled according to a procedure to be described next, and the resulting shuffled entries are read column by column from left to right. The shuffling of the array entries is based on applying an independent permutation to the column entries in every row, and then permuting the order of the rows. First, a small positive integer r is chosen depending on the memory bank architecture of the interleaver. In one embodiment, r is set to 5 so that the interleaver memory is composed of 32 banks.
Next, the smallest positive integer n such that NT≦2^(r+n) is determined. This is equivalent to finding the smallest 2^r×2^n array that can hold the NT entries. The 2^n entries of each row are interleaved independently using a linear congruential sequence recursion whose parameters are determined using a 2D look-up table (LUT) based on the row index and n. The result of this operation is a set of new interleaved column indices. Next, the 2^r rows are shuffled in bit-reversed order. The result of this operation is a set of new interleaved row indices.
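Two of the steps above, choosing n and shuffling the rows in bit-reversed order, can be illustrated with a short sketch. The helper names `smallest_n`, `bit_reverse` and `bit_reversed_row_order` are hypothetical. The per-row LCS recursion is deliberately left out because its LUT parameters are not given in the text.

```python
def smallest_n(n_t, r=5):
    """Smallest positive integer n such that n_t <= 2**(r + n)."""
    n = 1
    while (1 << (r + n)) < n_t:
        n += 1
    return n


def bit_reverse(x, width):
    """Reverse the lowest `width` bits of x."""
    y = 0
    for _ in range(width):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y


def bit_reversed_row_order(r=5):
    """New row index for each of the 2**r rows after the bit-reversed shuffle."""
    return [bit_reverse(row, r) for row in range(1 << r)]
```

With r = 5, a 128-bit subpacket gives n = 2 and a 16,384-bit subpacket gives n = 9, which matches the stated LUT range of n = 2 through 9.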
Finally, the interleaved addresses are formed by concatenating the corresponding interleaved column and row indices in opposite order with respect to their order in the linear address. The last step is equivalent to reading the interleaved array entries in the opposite order (i.e., column by column) to which it was filled in (i.e., row by row). If the resulting interleaved address is greater than or equal to NT, then it is pruned away and the same operations are repeated on the next consecutive address in linear order.
Let x be an (r+n)-bit linear address, and y=ρ_{r,n}(x) be the corresponding (r+n)-bit turbo-interleaved address. Then, ρ_{r,n}(x) is given by:
With π being indicative of an interleaver, π_r is the r-bit reversal function and the LUT is a 2D look-up table that stores the moduli of the 2^r LCS recursions for n=2, 3, . . . , 9.
Channel Interleaving
Channel interleaving is applied to encoded subpackets by channel interleaver 108 as shown in
Referring to
b) is a block diagram of a channel interleaver 108 for turbo encoded sub-packets. For turbo encoded subpackets, the encoder 106 5NT output data bits are demultiplexed by demultiplexer 302 into five sequences U, V0, V1, V′0, V′1. The first encoder 106 output bit goes to the U sequence, the second to the V0 sequence, the third to the V1 sequence, the fourth to the V′0 sequence, the fifth to the V′1 sequence, the sixth to the U sequence, etc. The last 18 tail bits, numbered from 0 to 17, are demultiplexed as follows: Tail bits 0, 3, 6, 9, 12, and 15 go to sequence U, tail bits 1, 4, and 7 go to sequence V0, tail bits 2, 5, and 8 go to sequence V1, tail bits 10, 13, and 16 go to sequence V′0, and tail bits 11, 14, and 17 go to sequence V′1. As a result, sequence U has length NT+6, while the other four sequences have length NT+3.
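The demultiplexing rule above, including the tail-bit assignment, can be written out as a short sketch. The function name `demux_turbo` and the list-of-lists return format are choices made for the example; the index patterns are taken directly from the paragraph.

```python
# Tail-bit indices (0..17) routed to each sequence, as listed in the text.
TAIL_MAP = {
    0: [0, 3, 6, 9, 12, 15],  # U
    1: [1, 4, 7],             # V0
    2: [2, 5, 8],             # V1
    3: [10, 13, 16],          # V'0
    4: [11, 14, 17],          # V'1
}


def demux_turbo(encoded, n_t):
    """Split 5*n_t + 18 encoder output bits into [U, V0, V1, V'0, V'1]."""
    assert len(encoded) == 5 * n_t + 18
    data, tail = encoded[:5 * n_t], encoded[5 * n_t:]
    # Data bits are dealt round-robin: bit 0 to U, bit 1 to V0, and so on.
    seqs = [list(data[k::5]) for k in range(5)]
    for k, indices in TAIL_MAP.items():
        seqs[k].extend(tail[i] for i in indices)
    return seqs
```

Calling `demux_turbo` on a vector of length 5NT+18 returns one sequence of length NT+6 (U) and four of length NT+3, as stated above.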
Next, the demultiplexed sequences are bit-permuted using five Pruned Bit Reversal Interleavers (PBRIs) (304a-304e) into three separate interleaved blocks, denoted as π(U), π(V0/V′0), and π(V1/V′1), as follows. The sequence U is permuted into the block π(U) using a length NT+6 PBRI. Sequences V0 and V′0, each of length NT+3, are permuted independently using a length NT+3 PBRI into sequences π(V0) and π(V′0), respectively. The two sequences π(V0) and π(V′0) are then combined into the block π(V0/V′0) by selecting bits from the two sequences in an alternating fashion, starting with π(V0). The length of the resulting π(V0/V′0) block is 2NT+6. Similarly, the block π(V1/V′1) is generated from the two sequences V1 and V′1. The three output blocks are then concatenated by Multiplexer 306 into the sequence π(U)/π(V0/V′0)/π(V1/V′1) and generated as the channel interleaver 108 output.
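A pruned bit-reversal interleaver and the alternating combine step can be sketched as follows. This assumes the usual PBRI construction: addresses are bit-reversed over the smallest power-of-two range that covers the sequence, and out-of-range results are skipped. The function names and the read-side convention (output position i takes the input symbol at the i-th surviving address) are assumptions and may differ from the hardware.

```python
def bit_reverse(x, width):
    """Reverse the lowest `width` bits of x."""
    y = 0
    for _ in range(width):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y


def pbri_addresses(length):
    """Bit-reversed addresses over 2**n >= length, with values >= length pruned."""
    n = max(1, (length - 1).bit_length())
    return [y for y in (bit_reverse(x, n) for x in range(1 << n)) if y < length]


def pbri(seq):
    """Permute seq with a PBRI of the same length (assumed read-side convention)."""
    return [seq[a] for a in pbri_addresses(len(seq))]


def combine_alternating(a, b):
    """Interleave two equal-length sequences, starting with a."""
    out = []
    for x, y in zip(a, b):
        out += [x, y]
    return out
```

For example, `pbri_addresses(5)` bit-reverses 0 through 7 over three bits and drops 6, 5 and 7. Combining two permuted parity sequences of length NT+3 yields a block of length 2NT+6, consistent with the block length given above.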
Depending on the channel type and the hybrid automatic repeat-request (HARQ) interlacing structure determined by the Medium Access Control (MAC) protocol, four code rates of ⅕, ⅓, ½ and ⅔ are supported by puncturing bits from the blocks π(V0/V′0) and π(V1/V′1) as detailed below in
The functions described above used to generate the interleaver addresses are invertible. Thus, their inverse interleaver functions can be used to generate deinterleaver addresses.
Subpackets can be dispatched in parallel to both datapaths (506 and 508). After decoding is complete, information bits of decoded subpackets from both datapaths (506 and 508) are formatted and written into an external packet memory 512 via an arbiter 510. Due to requirements of the system under consideration, the architecture of the symbol LLR memory 504, its bandwidth, and the way symbols are stored were designed to be standard-independent, with support for subpacket specifications for other standards such as LTE and WiMAX. Hence, the interface to LLR memory 504 can be made generic by employing a codeword reader from LLR memory 504 through the buffer manager 502, and by providing internal buffering of subpackets to be decoded inside the deinterleaver.
Deinterleaving for the Viterbi Convolutional Code Datapath
a) shows a more detailed block diagram of the exemplary convolutional channel deinterleaver datapath 506. The codeword reader 602 copies a packet length of LLR symbols into internal codeword buffer 604 where symbols are stored in sequential order. The subpacket length, puncturing length, and packet destination address are propagated along with the packet through each block of
The interleaving structure shown in
The PBRI block 606 must generate a valid interleaved address every clock cycle. However, due to the pruning operation, the bit-reversed address of a linear address x might not be a valid address, i.e. π(x)≧NT+8. To generate a valid interleaved address every clock cycle, the design relies on the property that two consecutive linear addresses cannot both have invalid interleaved addresses, as shown using the sequential PBRI in
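The property holds because an even linear address has a zero least significant bit, so its bit-reversed value has a zero most significant bit and falls below half the power-of-two range, which is always less than the pruned length. A software sketch of a one-address-per-step generator built on that property follows. The function names are hypothetical, and the loop models one clock cycle per iteration only loosely.

```python
def bit_reverse(x, width):
    """Reverse the lowest `width` bits of x."""
    y = 0
    for _ in range(width):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y


def pbri_one_per_cycle(length):
    """Emit exactly one valid interleaved address per loop iteration."""
    n = max(1, (length - 1).bit_length())  # 2**(n-1) < length <= 2**n for length > 1
    x = 0
    out = []
    while len(out) < length:
        y = bit_reverse(x, n)
        if y >= length:
            # x is pruned, so x is odd; x + 1 is even and guaranteed valid.
            x += 1
            y = bit_reverse(x, n)
        out.append(y)
        x += 1
    return out


def no_two_consecutive_invalid(length):
    """Check the property directly for a given pruned length."""
    n = max(1, (length - 1).bit_length())
    invalid = [bit_reverse(x, n) >= length for x in range(1 << n)]
    return not any(a and b for a, b in zip(invalid, invalid[1:]))
```

Each iteration evaluates at most two candidate addresses, which in hardware would correspond to computing π(x) and π(x+1) in parallel and selecting the valid one.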
b) details the functionality of the PBRI block illustrated in
With reference to
Deinterleaving for the Turbo Code Datapath
a) further details exemplary operations 800 of the codeword splitter 706 and the PBRI blocks 710.
The input demux 802 decomposes the incoming stream of symbols into πι(U), πι(V0/V0′), and πι(V1/V1′). A splitter 804 further splits πι(V0/V0′) into πι(V0) and πι(V0′), and πι(V1/V1′) into πι(V1) and πι(V1′). The five streams πι(U), πι(V0), πι(V0′), πι(V1), πι(V1′) are then independently permuted using a PBRI 806 to generate the streams U, V0, V0′, V1, V1′. The structure of these streams is shown at the output of the PBRI blocks 806 in the figure: U contains NT data symbols and 6 tail symbols, while the remaining 4 streams each contain NT data symbols and 3 tail symbols. A switch 808 then performs depuncturing operations on these streams to produce the outputs, sym0-sym4. For rate-⅕ depuncturing, five streams are produced (U, V0, V1, V0′, V1′ in this order), while for rate-⅓ depuncturing only 3 streams are produced (U, V0, V0′ in this order). Zeros are inserted in the tail symbols.
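The demux and splitter stages for an unpunctured rate-⅕ subpacket can be sketched as below. The function names are hypothetical. The block lengths (NT+6, 2NT+6, 2NT+6) follow from the interleaver description, and the even/odd split assumes the alternating combine started with the unprimed sequence. The PBRI and depuncturing stages are not modeled.

```python
def split_alternating(block):
    """Undo an alternating combine: even positions, then odd positions."""
    return block[0::2], block[1::2]


def demux_and_split(symbols, n_t):
    """Return the five still-interleaved streams for U, V0, V0', V1, V1'."""
    len_u = n_t + 6
    len_v = 2 * (n_t + 3)
    assert len(symbols) == len_u + 2 * len_v  # full rate-1/5 subpacket only
    u = symbols[:len_u]
    v0_block = symbols[len_u:len_u + len_v]
    v1_block = symbols[len_u + len_v:]
    v0, v0p = split_alternating(v0_block)
    v1, v1p = split_alternating(v1_block)
    return u, v0, v0p, v1, v1p
```

Each of the four parity streams returned has NT+3 symbols, so a length-(NT+3) deinterleaver can be applied to each independently.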
Depuncturing operations performed on the tail bits are further detailed below. Note that these streams must be passed to the turbo decoder 714, which expects five symbols corresponding to the five turbo encoder output bits generated for every input bit when decoding a rate-⅕ code. The first three symbols are passed to the first constituent decoder 302, while the first and last two symbols are passed to the second constituent decoder 304. Hence for the six tail bits, a total of 30 symbols are required. However, the five deinterleaved sequences generate only 6+4*3=18 tail symbols. The remaining 12 symbols are inserted by extending the four sequences V0, V′0, V1 and V′1 to NT+6 symbols and filling in zeros as shown. For compatibility with earlier EV-DO Rev-A data rates and puncturing patterns, the tail bits of V0 are moved to the third sequence, and those for V′1 are moved to the fourth sequence. The depuncturing patterns for the rate-⅓ code are also shown. The resulting deinterleaved/depunctured rate-⅕ and rate-⅓ subpacket structure of the operations are further detailed in
The deinterleaved symbols output by the PBRI block 806 are written into a deinterleaver buffer 712, which controls the operation of the turbo decoder 714. For each decoding iteration, the turbo decoder 714 reads forward and back LLR symbols in parallel from the deinterleaver buffer 712. Upon completion, it writes the decoded output decision bits into an output buffer 716 together with other decoding statistics about the decoded subpacket. In one embodiment, the output buffer 716 is a circular buffer that can hold up to eight decoded subpackets. The packet writer block 718 removes the CRC bits and writes the information bits of all decoded subpackets corresponding to the same packet into the external packet memory through an arbiter 510. Subpackets belonging to the same packet are abutted to each other in packet memory. Where any subpacket fails to decode, the turbo decoder 714 flushes the internal buffers (704, 708, 712, 716) in the turbo code datapath. If the output buffer 716 is full, then the turbo decoder is stalled until the packet writer 718 is granted access to the external bus to empty subpackets from the output buffer 716.
b) shows the deinterleaved/depunctured rate-⅕ and rate-⅓ packet structures 850 resulting from the operations detailed in
Those of skill in the art would understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.