1. Technical Field of the Invention
The invention relates generally to communication systems; and, more particularly, it relates to communication systems employing turbo coding.
2. Description of Related Art
Data communication systems have been under continual development for many years. One such type of communication system that has been of significant interest lately is a communication system that employs iterative error correction codes. Of those, one particular type of communication system that has received interest in recent years has been one which employs turbo codes (one type of iterative error correcting code). Communications systems with iterative codes are often able to achieve lower bit error rates (BER) than alternative codes for a given signal to noise ratio (SNR).
A primary and continuing goal in this area of development has been to lower the SNR required to achieve a given BER within a communication system. The ideal goal has been to try to reach Shannon's limit in a communication channel. Shannon's limit may be viewed as being the data rate to be used in a communication channel, having a particular SNR, that achieves error free transmission through the communication channel. In other words, the Shannon limit is the theoretical bound for channel capacity for a given modulation and code rate.
The use of turbo codes providing such relatively lower error rates, while operating at relatively low data throughput rates, has largely been in the context of communication systems having a large degree of noise within the communication channel and where substantially error free communication is held at the highest premium. Some of the earliest application arenas for turbo coding were space related where accurate (i.e., ideally error free) communication is often deemed an essential design criterion. The direction of development then moved towards developing terrestrial-applicable and consumer-related applications. Still, based on the heritage of space related application, the focus of effort in the turbo coding environment then continued to be achieving relatively lower error floors, and not specifically towards reaching higher throughput.
More recently, focus in the art has been towards developing turbo coding, and variants thereof, that are operable to support higher amounts of throughput while still preserving the relatively low error floors offered within the turbo code context.
In fact, as the throughput requirement in communication systems increases, parallel turbo decoding, which employs a plurality of processors and a plurality of memory banks, becomes necessary. Many current systems support a wide range of codeword sizes. Thus, efficiency and flexibility in parallel turbo decoder design are of critical importance.
Generally speaking, within the context of communication systems that employ turbo codes, there is a first communication device at one end of a communication channel with encoder capability and a second communication device at the other end of the communication channel with decoder capability. In many instances, one or both of these two communication devices include encoder and decoder capability (e.g., within a bidirectional communication system).
The present invention is directed to apparatus and methods of operation that are further described in the following Brief Description of the Several Views of the Drawings, the Detailed Description of the Invention, and the claims. Other features and advantages of the present invention will become apparent from the following detailed description of the invention made with reference to the accompanying drawings.
Many communication systems incorporate the use of a turbo code. While there are many potential applications that can employ turbo codes, means are presented herein that can be applied to the 3GPP channel code to support an arbitrary number of information bits. Some examples of the number of bits that can be supported using the various aspects of the invention presented herein are 40 to 5114 for WCDMA and HSDPA and more for LTE.
Additional information regarding the UTRA-UTRAN Long Term Evolution (LTE) and 3GPP System Architecture Evolution (SAE) can be found at the following Internet web site:
www.3gpp.org
Within the channel coding system in 3GPP LTE, there is a need to support a wide range of block sizes (i.e., turbo code block lengths). Furthermore, turbo decoding in this system generally needs to be implemented using a parallel decoding arrangement because of the very high data throughput and large block size desired. The parallel decoding requires contention-free memory accessing (i.e., any one turbo decoder (of a group of parallel arranged turbo decoders) accesses only one memory (of a group of parallel arranged memories) at any given time). Turbo coding was suggested for 3GPP LTE channel coding. For this coding system, the algebraic interleave referred to as the “almost regular permutation (ARP)” in reference [1] is considered as one of the candidates.
The goal of digital communications systems is to transmit digital data from one location, or subsystem, to another either error free or with an acceptably low error rate. As shown in
Referring to
Continuing on with the turbo decoding process and functionality, the metrics 241 that are calculated by the metric generator 204 are then provided simultaneously to a first soft-in/soft-out (SISO 0) decoder 210 and a second SISO 1 decoder 230. In the context of trellis coding (e.g., turbo trellis coded modulation (TTCM)), each of the first SISO 0 decoder 210 and the second SISO 1 decoder 230 calculates forward metrics (alphas) and backward metrics (betas), and extrinsic values according to the trellis employed.
These alphas, betas, and extrinsics are all calculated for each symbol within a frame that is to be decoded. These calculations of alphas, betas, and extrinsics are all based on the trellis.
Starting with the first SISO 0 decoder 210, after the extrinsic values 211 have been calculated, they are passed to an interleaver (π) 220, after which they are passed to the second SISO 1 decoder 230 as “a priori probability” (app) information 221. Similarly, after extrinsic values 231 have been calculated within the second SISO 1 decoder 230, they are passed to a de-interleaver (π−1) 240, after which they are passed to the first SISO 0 decoder 210 as “a priori probability” (app) information 241. It is noted that a single decoding iteration within the iterative decoding process of the turbo decoder 200 consists of performing two SISO operations; that is to say, the iterative decoding process must pass through both the first SISO 0 decoder 210 and the second SISO 1 decoder 230.
After a significant level of confidence has been achieved and a solution is being converged upon, or after a predetermined number of decoding iterations have been performed, the output from the second SISO 1 decoder 230 is passed as output to an output processor 250. The operation of the SISOs 210 and 230 may generally be referred to as calculating soft symbol decisions of the symbol contained within the received symbol. These soft symbol decisions may be performed on a true bit level in certain embodiments. The output processor 250 uses these soft symbol decisions to generate best estimates 251 (e.g., hard bit and/or symbol decisions) for the information bits that have been encoded within the original turbo coded signal (e.g., generally within a turbo encoder location at another end of a communication channel into which the signal 201 was originally launched).
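The exchange of extrinsic values described above can be sketched as a data-flow skeleton in Python. This is only an illustration under stated assumptions: the function name `turbo_decode`, the callable signature `siso(metrics, a_priori)`, and the convention that `pi[j]` is the natural-order index read at interleaved position j are all hypothetical, and the confidence-based early stop is omitted in favor of a fixed iteration count.

```python
def turbo_decode(metrics, pi, siso0, siso1, n_iter):
    """Data-flow skeleton: SISO 0 -> interleave -> SISO 1 -> de-interleave -> SISO 0.

    siso0 and siso1 are caller-supplied callables with the hypothetical
    signature siso(metrics, a_priori) -> list of extrinsic values.
    pi[j] is the natural-order index read at interleaved position j (assumed).
    """
    n = len(pi)
    inverse = [0] * n
    for j, i in enumerate(pi):
        inverse[i] = j                      # table for the de-interleaver (pi^-1)
    a_priori_0 = [0.0] * n                  # no prior knowledge before the first pass
    extrinsic_1 = [0.0] * n
    for _ in range(n_iter):                 # one iteration = two SISO operations
        extrinsic_0 = siso0(metrics, a_priori_0)
        a_priori_1 = [extrinsic_0[pi[j]] for j in range(n)]        # interleave
        extrinsic_1 = siso1(metrics, a_priori_1)
        a_priori_0 = [extrinsic_1[inverse[i]] for i in range(n)]   # de-interleave
    return extrinsic_1                      # passed on to the output processor
```

The SISO computations themselves (alphas, betas, and extrinsics over the trellis) are deliberately left to the supplied callables; only the routing through the interleaver and de-interleaver is shown.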
It is also noted that the interleaving performed within the interleaver (π) 220 can be performed using an embodiment of an ARP interleave, as shown by reference numeral 291. Also, there are embodiments in which the de-interleaving performed within the de-interleaver (π−1) 240 can be performed using an embodiment of an ARP de-interleave.
Many of the embodiments presented herein employ various embodiments of the ARP (almost regular permutation) interleaves. Even more details are provided below with respect to the means by which a structure can be employed to perform both ARP interleaving and ARP de-interleaving. Before doing so, however, a regular permutation is considered for comparative analysis for the reader.
i=π(j)=Pj mod L, 0≦i,j≦L−1
L is the frame size, and gcd(P,L)=1, which then implies that π(j)≠π(j′) if j≠j′.
The implementation of the regular permutation 300 is relatively straight-forward, but the performance is not very good.
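As a minimal sketch of the regular permutation defined above, the following Python fragment builds the table of π(j)=Pj mod L and confirms that it is one-to-one. The function name `regular_permutation` and the values L=24, P=7 are illustrative assumptions, not values taken from the text.

```python
from math import gcd

def regular_permutation(L, P):
    """Return [pi(0), ..., pi(L-1)] with pi(j) = P*j mod L."""
    if gcd(P, L) != 1:
        raise ValueError("P must be relatively prime to L")
    return [(P * j) % L for j in range(L)]

# Illustrative values only (assumed, not from the text).
pi = regular_permutation(24, 7)
# gcd(P, L) = 1 guarantees that every index 0..L-1 appears exactly once.
assert sorted(pi) == list(range(24))
```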
An ARP (almost regular permutation) of information block size L=CW (i.e., C is a divisor of L), introduced in reference [1], is defined by
i=π(j)=[jP+θ+A(j mod C)P+B(j mod C)] mod L
where P is relatively prime to L, θ is a constant, and A(χ) and B(χ) are integer functions defined on {0,1, . . . , C−1}. To ensure that the function so defined is a permutation (i.e., one-to-one and onto), in reference [1] A(χ) and B(χ) are further restricted to
A(i)P+B(i)=C[α(i)P+β(i)],i=0, . . . , C−1
where α and β are integer functions. In this document, we call C the dithering cycle of the ARP.
As can be seen, C|L (thus gcd(C,P)=1), and therefore π(j)≠π(j′), if j≠j′.
For example, if L<2000, then C=4, otherwise a larger C is necessary.
Example: C=4
A first example of an ARP interleave is provided here:
If the inputs of the following are provided to such an ARP interleave (π),
0,1,2,3, 4,5,6,7, 8,9,10,11, 12,13,14,15, 16,17,18,19, 20,21,22,23,
then the output thereof is as follows:
0,11,22,5, 4,15,2,9, 8,19,6,13, 12,23,10,17, 16,3,14,21, 20,7,18,1.
Another example of an ARP interleave is provided here:
If the inputs of the following are provided to such an ARP interleave (π),
0,1,2,3, 4,5,6,7, 8,9,10,11, 12,13,14,15, 16,17,18,19
then the output thereof is as follows:
1,6,16,15, 13,18,8,7, 5,10,0,19, 17,2,12,11, 9,14,4,3.
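The ARP definition, with the restriction A(i)P+B(i)=C[α(i)P+β(i)], can be sketched in Python as follows. The parameters of the examples are not stated in this passage; the values P=7, θ=0, α=(0,0,1,1), β=(0,1,1,1) used below are assumptions, chosen only because they reproduce the first listed output for L=24 and C=4. Other parameter choices may produce the same table, and the function name `arp_interleave` is hypothetical.

```python
def arp_interleave(L, C, P, theta, alpha, beta):
    """pi(j) = [j*P + theta + C*(alpha(j mod C)*P + beta(j mod C))] mod L."""
    if L % C != 0:
        raise ValueError("C must divide L")
    return [(j * P + theta + C * (alpha[j % C] * P + beta[j % C])) % L
            for j in range(L)]

# Assumed parameters (not given in the text); they reproduce the first example.
pi = arp_interleave(L=24, C=4, P=7, theta=0,
                    alpha=(0, 0, 1, 1), beta=(0, 1, 1, 1))
listed = [0, 11, 22, 5, 4, 15, 2, 9, 8, 19, 6, 13,
          12, 23, 10, 17, 16, 3, 14, 21, 20, 7, 18, 1]
assert pi == listed
```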
There are some special cases for ARP as well.
Case 1:
A(χ)=Cα(χ), and B(χ)=Cβ(χ)
When θ=0, equations (10), (11) and (12) in reference [1].
When θ=3, C=4, [2] France Telecom, GET, “Enhancement of Rel. 6 turbo Code,” 3GPP TSG RAN WG1#43, R1-051310, 2005
When θ=3, C=4 and 8, Table 1, [3] Motorola, “A contention-free interleaver design for LTE codes,” 3GPP TSG RAN WG1#47.
Case 2:
equations (13) in reference [1].
In addition, certain properties of ARP are also provided below:
Property 1:
χ0=χ1 mod C, which implies that π(χ0)=π(χ1) mod C.
Proof: Set χ1=χ0+kC. Then π(χ1)−π(χ0)=PkC=0 mod C.
Property 2:
Define Ψ: {0,1, . . . , C−1}→{0,1, . . . , C−1} by Ψ(μ)=π(μ) mod C.
If π is a permutation, then Ψ is a bijection.
Proof: Assume μ0, μ1∈{0,1, . . . , C−1}, μ0≠μ1, but Ψ(μ0)=Ψ(μ1). There are L/C elements in {0,1, . . . , L−1} congruent to μ0 modulo C and another L/C elements congruent to μ1 modulo C. So, by Property 1, there are 2L/C elements χ such that π(χ) has the same residue modulo C. Since only L/C elements of {0,1, . . . , L−1} have any given residue modulo C, π cannot be one-to-one, which contradicts the hypothesis that π is a permutation.
Property 3:
Define Ψ: {0,1, . . . , C−1}→{0,1, . . . , C−1} by Ψ(μ)=π(μ) mod C.
If Ψ is a bijection, then π is a permutation.
Proof: Assume χ0, χ1∈{0,1, . . . , L−1}, χ0<χ1, but π(χ0)=π(χ1). Let μ0=χ0 mod C and μ1=χ1 mod C. If μ0≠μ1, then π(χ0) mod C≠π(χ1) mod C since Ψ is a bijection, which contradicts π(χ0)=π(χ1). If μ0=μ1, then χ1=χ0+kC for some k∈{1, . . . , L/C−1}, so π(χ1)−π(χ0)=PkC mod L. Since gcd(P,L)=1 and C|L, PkC mod L=0 implies that L/C divides k, which is impossible for k in this range. Either way a contradiction results; so, π(χ0)=π(χ1) can only occur if χ0=χ1.
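Properties 1 through 3 can be checked numerically on the first ARP example (L=24, C=4). The short Python check below uses only the output table listed in that example; the variable names are illustrative.

```python
# Output of the first ARP example (L = 24, C = 4), copied from the text.
listed = [0, 11, 22, 5, 4, 15, 2, 9, 8, 19, 6, 13,
          12, 23, 10, 17, 16, 3, 14, 21, 20, 7, 18, 1]
L, C = 24, 4

# Property 1: inputs congruent mod C give outputs congruent mod C.
property_1 = all(listed[x] % C == listed[x % C] % C for x in range(L))

# Properties 2 and 3: psi(u) = pi(u) mod C is a bijection on {0, ..., C-1}
# exactly when pi is a permutation; both hold for this table.
psi = [listed[u] % C for u in range(C)]
psi_is_bijection = sorted(psi) == list(range(C))
pi_is_permutation = sorted(listed) == list(range(L))

assert property_1 and psi_is_bijection and pi_is_permutation
```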
Property 4:
If π(χ) is an ARP interleave (π) with period C, then π(χ) is also an ARP interleave (π) with period C̃=mC provided C̃|L.
Proof: Let π(χ)=[Pχ+A(χ mod C)P+B(χ mod C)+θ] mod L.
Clearly, π(χ) can also be written as follows:
π(χ)=[Pχ+Ã(χ mod C̃)P+B̃(χ mod C̃)+θ] mod L, where
Ã(χ mod C̃)=A(χ mod C) and B̃(χ mod C̃)=B(χ mod C) by definition.
So, if C̃|L, then π(χ) is an ARP interleave (π) with period C̃.
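Property 4 can be illustrated by repeating the per-residue offsets m times and confirming that the interleave is unchanged. In the Python sketch below, `dither[u]` stands for the combined term A(u)P+B(u); the function name and the values L=24, P=7, θ=0, and offsets (0, 4, 8, 8) are assumptions made for illustration.

```python
def arp_from_dither(L, P, theta, dither):
    """pi(x) = [P*x + theta + dither[x mod C]] mod L with C = len(dither).

    dither[u] stands for A(u)*P + B(u).
    """
    C = len(dither)
    if L % C != 0:
        raise ValueError("period must divide L")
    return [(P * x + theta + dither[x % C]) % L for x in range(L)]

# Assumed illustrative values (not from the text).
L, P, theta = 24, 7, 0
dither_C = [0, 4, 8, 8]        # period C = 4
m = 2
dither_mC = dither_C * m       # period mC = 8: the extended tables repeat A and B

# The same interleave results whether it is described with period 4 or period 8.
assert arp_from_dither(L, P, theta, dither_C) == arp_from_dither(L, P, theta, dither_mC)
```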
During a cycle 0 (i.e., a first decoding iteration within the iterative decoding processing of turbo decoding), a first portion of each window is processed, as shown by the corresponding shaded portions of each window. Then, during a cycle 1 (i.e., a second decoding iteration within the iterative decoding processing of turbo decoding), a second portion of each window is processed, as shown by the corresponding shaded portions of each window. This continues on until during a cycle W−1 (i.e., a final decoding iteration within the iterative decoding processing of turbo decoding), a final portion of each window is processed, as shown by the corresponding shaded portions of each window.
During each cycle, a given portion of each window is processed using one decoding processor (e.g., one turbo decoder) in a parallel implementation of a plurality of decoding processors (e.g., a plurality of turbo decoders).
In accordance with the parallel turbo decoding processing which involves employing M decoding processors, during a cycle 0 (i.e., a first decoding iteration within the iterative decoding processing of turbo decoding), the respective first portions of each window that are processed are as follows:
1. processor 0 processes portion 0 of the information block.
2. processor 1 processes portion W of the information block.
3. processor 2 processes portion 2W of the information block.
. . .
s. processor s processes portion sW of the information block (s is an integer).
. . .
M−1. processor M−1 processes portion (M−1)W of the information block.
During a cycle 1 (i.e., a second decoding iteration within the iterative decoding processing of turbo decoding), the respective second portions of each window that are processed are as follows:
1. processor 0 processes portion 1 of the information block.
2. processor 1 processes portion W+1 of the information block.
3. processor 2 processes portion 2W+1 of the information block.
. . .
s. processor s processes portion sW+1 of the information block (s is an integer).
. . .
M−1. processor M−1 processes portion (M−1)W+1 of the information block.
This process continues on until during a cycle W−1 (i.e., a final decoding iteration within the iterative decoding processing of turbo decoding), the respective final portions of each window that are processed are as follows:
1. processor 0 processes portion W−1 of the information block.
2. processor 1 processes portion W+W−1 of the information block.
3. processor 2 processes portion 2W+W−1 of the information block.
. . .
s. processor s processes portion sW+W−1 of the information block (s is an integer).
. . .
M−1. processor M−1 processes portion (M−1)W+W−1 of the information block.
The index set at the t-th decoding cycle is as follows (as also shown by reference numeral 503):
E0={0,W, . . . ,(M−1)W}, and
Et={t,W+t, . . . ,(M−1)W+t}.
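The natural-order schedule above amounts to a simple index computation: in cycle t, processor s handles index sW+t. A minimal Python sketch follows; the function name `index_set` and the values M=3, W=8 are chosen for illustration.

```python
def index_set(t, M, W):
    """Natural-order index set for cycle t; entry s is the index handled by processor s."""
    return [s * W + t for s in range(M)]

M, W = 3, 8                      # illustrative: L = M*W = 24
sets = [index_set(t, M, W) for t in range(W)]
assert sets[0] == [0, 8, 16]
assert sets[1] == [1, 9, 17]
# Over the W cycles, every index 0..L-1 is visited exactly once.
assert sorted(i for E in sets for i in E) == list(range(M * W))
```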
In accordance with the parallel turbo decoding processing which involves employing M decoding processors, during a cycle 0 (i.e., a first decoding iteration within the iterative decoding processing of turbo decoding), the respective first portions of each window that are processed are as follows (note: these are the interleaved (π) portions):
1. processor 0 processes portion π(0) of the information block.
2. processor 1 processes portion π(W) of the information block.
3. processor 2 processes portion π(2W) of the information block.
. . .
s. processor s processes portion π(sW) of the information block (s is an integer).
. . .
M−1. processor M−1 processes portion π((M−1)W) of the information block.
During a cycle 1 (i.e., a second decoding iteration within the iterative decoding processing of turbo decoding), the respective second portions of each window that are processed are as follows:
1. processor 0 processes portion π(1) of the information block.
2. processor 1 processes portion π(W+1) of the information block.
3. processor 2 processes portion π(2W+1) of the information block.
. . .
s. processor s processes portion π(sW+1) of the information block (s is an integer).
. . .
M−1. processor M−1 processes portion π((M−1)W+1) of the information block.
This process continues on until during a cycle W−1 (i.e., a final decoding iteration within the iterative decoding processing of turbo decoding), the respective final portions of each window that are processed are as follows:
1. processor 0 processes portion π(W−1) of the information block.
2. processor 1 processes portion π(W+W−1) of the information block.
3. processor 2 processes portion π(2W+W−1) of the information block.
. . .
s. processor s processes portion π(sW+W−1) of the information block (s is an integer).
. . .
M−1. processor M−1 processes portion π((M−1)W+W−1) of the information block.
The index set at the t-th decoding cycle is as follows (as also shown by reference numeral 603):
Ê0={π(0),π(W), . . . ,π((M−1)W)}, and
Êt={π(t),π(W+t), . . . ,π((M−1)W+t)}.
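For the interleaved order, the same schedule is applied through π: in cycle t, processor s handles index π(sW+t). The Python sketch below uses the output table of the first ARP example (L=24) with M=3 and W=8; the function name is hypothetical.

```python
# Output table of the first ARP example (L = 24), copied from the text.
pi = [0, 11, 22, 5, 4, 15, 2, 9, 8, 19, 6, 13,
      12, 23, 10, 17, 16, 3, 14, 21, 20, 7, 18, 1]

def interleaved_index_set(t, M, W, pi):
    """Interleaved-order index set for cycle t; entry s is pi(s*W + t)."""
    return [pi[s * W + t] for s in range(M)]

assert interleaved_index_set(0, 3, 8, pi) == [0, 8, 16]
assert interleaved_index_set(1, 3, 8, pi) == [11, 19, 3]
```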
A memory mapping ℳ (from indices to memory banks) is contention-free if the following relationships hold:
i,i′∈Et, i≠i′ ⇒ ℳ(i)≠ℳ(i′)
j,j′∈Êt, j≠j′ ⇒ ℳ(j)≠ℳ(j′)
It is noted that the elements in the index set of the t-th cycle should be mapped to different memory banks (e.g., different memories within a plurality of memories provisioned to service a plurality of parallel arranged turbo decoders).
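The contention-free condition can be expressed as a direct check: within every index set, distinct indices must land in distinct memory banks. The Python sketch below is a minimal illustration; the function name `is_contention_free` and the two bank functions used in the usage lines are hypothetical.

```python
def is_contention_free(index_sets, bank_of):
    """True if bank_of maps the indices of every set to pairwise distinct banks."""
    for E in index_sets:
        banks = [bank_of(i) for i in E]
        if len(set(banks)) != len(banks):
            return False            # two processors would hit the same bank
    return True

# Usage with illustrative index sets for M = 3 processors and window size 8.
natural_sets = [[t, 8 + t, 16 + t] for t in range(8)]
assert is_contention_free(natural_sets, lambda i: i // 8)      # one bank per window
assert not is_contention_free(natural_sets, lambda i: i % 2)   # only 2 banks: collision
```

To test a mapping against the full condition, the same function would be applied to both the natural-order and the interleaved-order index sets.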
This turbo decoder 700 includes a plurality of turbo decoders 721-727, a plurality of memories 741-747, and a processing module 730 that is operable to perform the memory mapping between the plurality of turbo decoders 721-727 and the plurality of memories 741-747. As can be seen, more than one turbo decoder tries to access the same memory at a given time. Specifically, turbo decoder 721 and turbo decoder 722 are trying to access memory 741, and turbo decoder 724 and turbo decoder 725 are trying to access memory 745. Also, turbo decoder 727 and another turbo decoder (included in the region as shown by the ellipsis . . . ) are trying to access memory 747. This creates conflicts and degrades performance.
This turbo decoder 800 includes a plurality of turbo decoders 821-827, a plurality of memories 841-847, and a processing module 830 that is operable to perform contention-free memory mapping between the plurality of turbo decoders 821-827 and the plurality of memories 841-847. As can be seen, only one turbo decoder accesses any one memory at a given time. This is a truly contention-free memory mapping between the plurality of turbo decoders 821-827 and the plurality of memories 841-847.
Referring to the communication system 900 of
The other device 990 to which the communication device 910 is coupled via the communication channel 999 can be another communication device 992, a storage media 994 (e.g., such as within the context of a hard disk drive (HDD)), or any other type of device that is capable of receiving and/or transmitting signals. In some embodiments, the communication channel 999 is a bi-directional communication channel that is operable to perform transmission of a first signal during a first time and receiving of a second signal during a second time. If desired, full duplex communication may also be employed, in which each of the communication device 910 and the device 990 can be transmitting to and/or receiving from one another simultaneously.
The decoder 921 of the communication device 910 includes a turbo decoder 920, a processing module 930, and a memory 940. The processing module 930 can be coupled to the memory 940 so that the memory is operable to store operational instructions that enable the processing module 930 to perform certain functions.
Generally speaking, the processing module 930 is operable to perform contention-free memory mapping between the turbo decoder 920 and the memory 940 during iterative decoding processing.
It is also noted that the processing module 930, as well as any other processing module described herein, can be implemented in any number of ways as described below. In one embodiment, the processing module 930 can be implemented strictly as circuitry. Alternatively, the processing module 930 can be implemented strictly in software such as can be employed within a digital signal processor (DSP) or similar type device. In even another embodiment, the processing module 930 can be implemented as a combination of hardware and software as well without departing from the scope and spirit of the invention.
In even other embodiments, the processing module 930 can be implemented using a shared processing device, individual processing devices, or a plurality of processing devices. Such a processing device may be a microprocessor, micro-controller, digital signal processor, microcomputer, central processing unit, field programmable gate array, programmable logic device, state machine, logic circuitry, analog circuitry, digital circuitry, and/or any device that manipulates signals (analog and/or digital) based on operational instructions. The processing module 930 can be coupled to the memory 940 that is operable to store operational instructions that enable the processing module 930 to perform the appropriate contention-free memory mapping between the turbo decoder 920 and the memory 940.
Such a memory 940 may be a single memory device or a plurality of memory devices. Such a memory 940 may be a read-only memory, random access memory, volatile memory, non-volatile memory, static memory, dynamic memory, flash memory, and/or any device that stores digital information. Note that when the processing module 930 implements one or more of its functions via a state machine, analog circuitry, digital circuitry, and/or logic circuitry, the memory storing the corresponding operational instructions is embedded with the circuitry comprising the state machine, analog circuitry, digital circuitry, and/or logic circuitry.
Referring to the communication system 1000 of
A communication device includes a turbo decoder that is itself composed of a plurality of turbo decoders 1121, 1122, and 1123. Such a communication device also includes a memory that is itself composed of a plurality of memories 1141, 1142, and 1143. A processing module 1130 is operable to perform contention-free memory mapping between the plurality of turbo decoders 1121, 1122, and 1123 and the plurality of memories 1141, 1142, and 1143 during iterative decoding processing of a turbo coded signal.
The processing module 1130 is operable to ensure that only one turbo decoder accesses a given memory at a given time. For example, the processing module 1130 is operable to perform a first contention-free memory mapping at a time 1, as shown by reference numeral 1101. The processing module 1130 is operable to perform a second contention-free memory mapping at a time 2, as shown by reference numeral 1102. The processing module 1130 is operable to perform a third contention-free memory mapping at a time 3, as shown by reference numeral 1103. The processing module 1130 is operable to perform a fourth contention-free memory mapping at a time 4, as shown by reference numeral 1104. As can be seen, only one turbo decoder is connected to any one memory at any given time in each of these 4 diagrams.
As can be seen, the contention-free memory mapping between the turbo decoders 1121, 1122, and 1123 and the plurality of memories 1141, 1142, and 1143 changes as a function of time during iterative decoding processing of a turbo coded signal.
There is a form of memory mapping, referred to as division mapping (i.e., DIV mapping for short), that has been defined in reference [4] cited below.
According to this DIV mapping approach,
ℳDIV: i ↦ ⌊i/W⌋, where W is the window size of the parallel decoding architecture.
The index set at the i-th decoding cycle is as follows:
Ei={i,W+i, . . . ,(M−1)W+i}, where
M is the number of processors, and C is the period of the ARP interleave (π).
Also, if M is a factor of the ratio, L/C, then the map on an ARP interleave (π) is in fact contention-free.
It is noted, however, that examples in the reference [3] and reference [4] cited below do not have this property.
Each of these embodiments 1201 and 1202 employ a plurality of memory banks 1210 that includes 3 memory banks.
During a first decoding cycle, the memory mapping as shown in embodiment 1201 is as follows for the natural order when performing turbo decoding processing.
E0={0,8,16}→{0,1,2}
Also during the first decoding cycle, the memory mapping as shown in embodiment 1201 is as follows for the interleaved (π) order when performing turbo decoding processing.
Ê0={0,8,16}→{0,1,2}
During a second decoding cycle, the memory mapping as shown in embodiment 1202 is as follows for the natural order when performing turbo decoding processing.
E1={1,9,17}→{0,1,2}
Also during the second decoding cycle, the memory mapping as shown in embodiment 1202 is as follows for the interleaved (π) order when performing turbo decoding processing.
Ê1={11,19,3}→{1,2,0}
During subsequent decoding cycles (e.g., decoding iterations), the memory mapping between processors and memories is as follows:
Third decoding cycle (natural order/interleaved order) is E2={2,10,18}→{0,1,2}/Ê2={22,6,14}→{2,0,1}.
Fourth decoding cycle (natural order/interleaved order) is E3={3,11,19}→{0,1,2}/Ê3={5,13,21}→{0,1,2}.
Fifth decoding cycle (natural order/interleaved order) is E4={4,12,20}→{0,1,2}/Ê4={4,12,20}→{0,1,2}.
Sixth decoding cycle (natural order/interleaved order) is E5={5,13,21}→{0,1,2}/Ê5={15,23,7}→{1,2,0}.
Seventh decoding cycle (natural order/interleaved order) is E6={6,14,22}→{0,1,2}/Ê6={2,10,18}→{0,1,2}.
Eighth decoding cycle (natural order/interleaved order) is E7={7,15,23}→{0,1,2}/Ê7={9,17,1}→{1,2,0}.
As can be seen, the natural order and the interleaved order are both contention-free.
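The contention-free claim for this DIV mapping example can be checked mechanically. The Python sketch below uses the output table of the first ARP example (L=24, C=4) and the bank function ⌊i/W⌋; the function name is hypothetical. The second usage line, with M=4, is an added illustration of a case in which M is not a factor of L/C=6, and it is not taken from the text.

```python
# Output table of the first ARP example (L = 24, C = 4), copied from the text.
pi = [0, 11, 22, 5, 4, 15, 2, 9, 8, 19, 6, 13,
      12, 23, 10, 17, 16, 3, 14, 21, 20, 7, 18, 1]

def div_mapping_is_contention_free(pi, M):
    """Check the bank function floor(i / W) on every natural and interleaved index set."""
    L = len(pi)
    W = L // M                                  # window size
    for t in range(W):
        natural = [s * W + t for s in range(M)]
        interleaved = [pi[i] for i in natural]
        for E in (natural, interleaved):
            if len({i // W for i in E}) != M:   # two indices share a bank
                return False
    return True

assert div_mapping_is_contention_free(pi, M=3)       # M = 3 divides L/C = 6
assert not div_mapping_is_contention_free(pi, M=4)   # M = 4 does not divide L/C = 6
```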
Another form of memory mapping, referred to as modular mapping (i.e., MOD mapping for short), has been defined in references [1] and [2] cited below.
According to this MOD mapping approach,
ℳMOD: i ↦ i mod M, where
M is the number of decoding processors,
C is the period of the ARP interleave (π), and
M is a multiple of C.
This MOD mapping approach embodiment is only contention-free if gcd(W, M)=1.
Each of these embodiments 1301 and 1302 employ a plurality of memory banks 1310 that includes 4 memory banks.
During a first decoding cycle, the memory mapping as shown in embodiment 1301 is as follows for the natural order when performing turbo decoding processing.
E0={0,5,10,15}→{0,1,2,3}
Also during the first decoding cycle, the memory mapping as shown in embodiment 1301 is as follows for the interleaved (π) order when performing turbo decoding processing.
Ê0={1,8,10,11}→{1,0,2,3}
During a second decoding cycle, the memory mapping as shown in embodiment 1302 is as follows for the natural order when performing turbo decoding processing.
E1={1,6,11,16}→{1,2,3,0}
Also during the second decoding cycle, the memory mapping as shown in embodiment 1302 is as follows for the interleaved (π) order when performing turbo decoding processing.
Ê1={16,18,19,9}→{0,2,3,1}
During subsequent decoding cycles (e.g., decoding iterations), the memory mapping between processors and memories is as follows:
Third decoding cycle (natural order/interleaved order) is E2={2,7,12,17}→{2,3,0,1}/Ê2={6,7,17,4}→{2,3,1,0}.
Fourth decoding cycle (natural order/interleaved order) is E3={3,8,13,18}→{3,0,1,2}/Ê3={15,5,12,14}→{3,1,0,2}.
Fifth decoding cycle (natural order/interleaved order) is E4={4,9,14,19}→{0,1,2,3}/Ê4={13,0,2,3}→{1,0,2,3}.
As can be seen in this embodiment, the natural order and the interleaved order are both contention-free.
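The bank assignments of this MOD mapping example can be reproduced with a few lines of Python. The index sets below are copied from the listing above (M=4, W=5); the helper names are hypothetical. The final line is an added illustration of the gcd(W, M)=1 condition, using W=6 and M=4, values that are not taken from the text.

```python
M = 4

def mod_bank(i):
    """MOD mapping: index i is stored in bank i mod M."""
    return i % M

# Index sets for the five decoding cycles, as listed in the text.
natural = [[0, 5, 10, 15], [1, 6, 11, 16], [2, 7, 12, 17],
           [3, 8, 13, 18], [4, 9, 14, 19]]
interleaved = [[1, 8, 10, 11], [16, 18, 19, 9], [6, 7, 17, 4],
               [15, 5, 12, 14], [13, 0, 2, 3]]

# In every cycle the M indices fall into M different banks.
for E in natural + interleaved:
    assert sorted(mod_bank(i) for i in E) == [0, 1, 2, 3]

def natural_order_is_contention_free(M, W):
    """Check the natural-order index sets {s*W + t} under the MOD mapping."""
    return all(len({(s * W + t) % M for s in range(M)}) == M for t in range(W))

assert natural_order_is_contention_free(4, 5)        # gcd(5, 4) = 1
assert not natural_order_is_contention_free(4, 6)    # gcd(6, 4) = 2
```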
However, in many of the previous embodiments, particularly those that employ an ARP interleave (π) within the turbo encoding and turbo decoding, there is a restriction on the number of decoding processors, M, that can be employed. For example, these previous embodiments necessitate that the number of decoding processors, M, be a factor of the length of the information block, L.
The following provides a means by which an arbitrarily selected number (M) of decoding processors can be employed for performing parallel turbo decoding processing.
In doing so, a virtual block length, L′, is judiciously chosen such that the arbitrarily selected number (M) of decoding processors can be employed in conjunction with an appropriate memory mapping that is contention-free.
It is particularly noted that there is no longer any requirement that M be a factor of L (i.e., that M divide L with no remainder), as there is with many of the embodiments described above.
During a cycle 0 (i.e., a first decoding iteration within the iterative decoding processing of turbo decoding), a first portion of each window is processed, as shown by the corresponding shaded portions of each window. Then, during a cycle 1 (i.e., a second decoding iteration within the iterative decoding processing of turbo decoding), a second portion of each window is processed, as shown by the corresponding shaded portions of each window. This continues on until during a cycle W−1 (i.e., a final decoding iteration within the iterative decoding processing of turbo decoding), a final portion of each window is processed, as shown by the corresponding shaded portions of each window.
During each cycle, a given portion of each window is processed using one decoding processor (e.g., one turbo decoder) in a parallel implementation of a plurality of decoding processors (e.g., a plurality of turbo decoders).
This embodiment differs from the previous embodiment of
In accordance with the parallel turbo decoding processing which involves employing M decoding processors, during a cycle 0 (i.e., a first decoding iteration within the iterative decoding processing of turbo decoding), the respective first portions of each window that are processed are as follows:
1. processor 0 processes portion 0 of the information block.
2. processor 1 processes portion W′ of the information block.
3. processor 2 processes portion 2W′ of the information block.
. . .
s. processor s processes portion sW′ of the information block (s is an integer).
. . .
M−1. processor M−1 processes portion (M−1)W′ of the information block.
During a cycle 1 (i.e., a second decoding iteration within the iterative decoding processing of turbo decoding), the respective second portions of each window that are processed are as follows:
1. processor 0 processes portion 1 of the information block.
2. processor 1 processes portion W′+1 of the information block.
3. processor 2 processes portion 2W′+1 of the information block.
. . .
s. processor s processes portion sW′+1 of the information block (s is an integer).
. . .
M−1. processor M−1 processes portion (M−1)W′+1 of the information block.
This process continues on until during a cycle W′−1 (i.e., a final decoding iteration within the iterative decoding processing of turbo decoding), the respective final portions of each window that are processed are as follows:
1. processor 0 processes portion W′−1 of the information block.
2. processor 1 processes portion W′+W′−1 of the information block.
3. processor 2 processes portion 2W′+W′−1 of the information block.
. . .
s. processor s processes portion sW′+W′−1 of the information block (s is an integer).
. . .
M−1. processor M−1 processes portion (M−1)W′+W′−1 of the information block.
The index set at the t-th decoding cycle is as follows (as also shown by reference numeral 1503):
E0={sW′|s∈{0,1, . . . ,M−1} and sW′<L}, and
Et={sW′+t|s∈{0,1, . . . ,M−1} and sW′+t<L}.
It is also noted that certain of the processors may perform dummy decoding cycles (i.e., be idle) as shown by reference numeral 1504. There are dummy decoding cycles for all sW′+t≧L.
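The schedule with dummy decoding cycles can be sketched as follows in Python, with `None` marking an idle processor. How the virtual block length L′ is selected is not specified in this passage; the sketch simply takes the virtual window size W′ as a given parameter. The values L=24, M=5, W′=5 (so that M·W′=25) and the function name are assumptions made for illustration.

```python
def index_set_with_idle(t, M, W_virtual, L):
    """Entry s is the index s*W' + t handled by processor s, or None for a dummy cycle."""
    return [s * W_virtual + t if s * W_virtual + t < L else None
            for s in range(M)]

# Assumed illustrative values: L = 24 information bits, M = 5 processors, W' = 5.
L, M, W_virtual = 24, 5, 5
sets = [index_set_with_idle(t, M, W_virtual, L) for t in range(W_virtual)]
assert sets[0] == [0, 5, 10, 15, 20]
assert sets[4] == [4, 9, 14, 19, None]   # 4*5 + 4 = 24 >= L, so processor 4 idles
# Every real index 0..L-1 is still visited exactly once.
assert sorted(i for E in sets for i in E if i is not None) == list(range(L))
```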
In accordance with the parallel turbo decoding processing which involves employing M decoding processors, during a cycle 0 (i.e., a first decoding iteration within the iterative decoding processing of turbo decoding), the respective first portions of each window that are processed are as follows (note: these are the interleaved (π) portions):
1. processor 0 processes portion π(0) of the information block.
2. processor 1 processes portion π(W′) of the information block.
3. processor 2 processes portion π(2W′) of the information block.
. . .
s. processor s processes portion π(sW′) of the information block (s is an integer).
. . .
M−1. processor M−1 processes portion π((M−1)W′) of the information block.
During a cycle 1 (i.e., a second decoding iteration within the iterative decoding processing of turbo decoding), the respective second portions of each window that are processed are as follows:
1. processor 0 processes portion π(1) of the information block.
2. processor 1 processes portion π(W′+1) of the information block.
3. processor 2 processes portion π(2W′+1) of the information block.
. . .
s. processor s processes portion π(sW′+1) of the information block (s is an integer).
. . .
M−1. processor M−1 processes portion π((M−1)W′+1) of the information block.
This process continues on until during a cycle W′−1 (i.e., a final decoding iteration within the iterative decoding processing of turbo decoding), the respective final portions of each window that are processed are as follows:
1. processor 0 processes portion π(W′−1) of the information block.
2. processor 1 processes portion π(W′+W′−1) of the information block.
3. processor 2 processes portion π(2W′+W′−1) of the information block.
. . .
s. processor s processes portion π(sW′+W′−1) of the information block (s is an integer).
. . .
M−1. processor M−1 processes portion π((M−1)W′+W′−1) of the information block.
The index set at the t-th decoding cycle is as follows (as also shown by reference numeral 1603):
Ê0={π(sW′)|s∈{0,1, . . . ,M−1} and sW′<L}, and
Êt={π(sW′+t)|s∈{0,1, . . . ,M−1} and sW′+t<L}.
It is also noted that certain of the processors may perform dummy decoding cycles (i.e., be idle) as shown by reference numeral 1604. There are dummy decoding cycles for all sW′+t≧L.
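The interleaved-order schedule differs from the natural-order one only in that each position is passed through the interleave (π) before it is addressed. The sketch below takes the interleave as a callable. The names `interleaved_schedule` and `toy_pi` are hypothetical, and `toy_pi` is a plain affine permutation chosen only so that the example runs; it is not the ARP interleave of this description.

```python
def interleaved_schedule(t, M, W, L, pi):
    """Bit positions handled in cycle t of the interleaved-order phase.

    Entry s is pi(s*W + t) for processor s, or None when s*W + t >= L,
    in which case the processor performs a dummy decoding cycle.
    """
    if not 0 <= t < W:
        raise ValueError("cycle index t must satisfy 0 <= t < W")
    return [pi(s * W + t) if s * W + t < L else None for s in range(M)]


def toy_pi(x, L=24):
    """Hypothetical stand-in permutation of {0, ..., L-1} (not an ARP)."""
    return (5 * x + 3) % L


if __name__ == "__main__":
    for t in range(5):
        print(t, interleaved_schedule(t, M=5, W=5, L=24, pi=toy_pi))
```

Note that the idle test is applied to the natural index sW′+t, not to the interleaved value, which matches the condition sW′+t≧L stated above.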
The following approach (Method 1) is then applied to the following example (MOD mapping).
Step 1: For a given M, choose a C̃ such that π(χ) is an ARP interleave (π) with period C̃ and M≦C̃ (e.g., see property 4 described above with respect to an ARP interleave (π)). Then, C̃ is the number of memory banks (e.g., memories) to be used, and is preferably selected as being small for better efficiency.
Step 2: Find a W′ such that W′≧L/M and gcd(W′,C̃)=1. Set L′=MW′.
W′ is the number of decoding cycles (e.g., decoding iterations) that is required in each phase (e.g., in natural order phase of decoding and interleaved phase of decoding), and is preferably selected as being small for better efficiency.
Typically,
can be selected as being relatively small.
Step 3: Use C̃ memory banks (e.g., memories) and memory mapping as follows: ℳ(χ)=ℳMOD(χ)=χ mod C̃.
Step 4: In cycle t, decoding processor s processes bit position sW′+t in the natural order phase of decoding, and π(sW′+t) in the interleaved order phase of decoding, provided sW′+t<L; otherwise, decoding processor s does nothing (i.e., remains idle).
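The parameter selection in Method 1 can be sketched as a small search. The sketch assumes that Step 2 requires the M windows to cover the whole block (W′≧⌈L/M⌉) and that the smallest admissible W′ is wanted; the function name `choose_window` is hypothetical.

```python
from math import gcd


def choose_window(L, M, C_tilde):
    """Pick the smallest W' with M*W' >= L and gcd(W', C_tilde) == 1.

    Returns (W', L') where L' = M*W' is the virtual block length.
    Assumes the number of processors does not exceed the number of
    memory banks (M <= C_tilde), as the validity argument requires.
    """
    if M < 1 or C_tilde < 1 or L < 1:
        raise ValueError("L, M and C_tilde must be positive")
    if M > C_tilde:
        raise ValueError("need M <= C_tilde")
    W = -(-L // M)  # ceiling of L / M
    while gcd(W, C_tilde) != 1:
        W += 1
    return W, M * W


if __name__ == "__main__":
    print(choose_window(L=24, M=5, C_tilde=8))
```

For L=24, M=5 and C̃=8 the search stops immediately at W′=5, giving L′=25, so only one position (index 24) falls outside the real block and leads to a dummy cycle.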
Proof of validity of Method 1:
In the natural order phase of decoding, if ℳ(s0W′+t)=ℳ(s1W′+t), then s0W′≡s1W′ mod C̃. But s0 and s1 are less than M≦C̃ and gcd(W′,C̃)=1. So, ℳ(s0W′+t)=ℳ(s1W′+t) implies s0=s1.
In the interleaved order phase of decoding, if ℳ(π(s0W′+t))=ℳ(π(s1W′+t)), then π(s0W′+t)≡π(s1W′+t) mod C̃. By property 2 of an ARP interleave (π) with period C̃ (Property 2 is described above), this implies (s0W′+t)≡(s1W′+t) mod C̃.
So, once again, the following relationship holds: s0=s1.
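The natural-order half of this argument can also be checked by brute force over small parameters. The following is a sanity check rather than a proof, and the helper names `natural_banks` and `is_contention_free` are hypothetical. It confirms that the mapping χ mod C̃ never sends two processors to the same bank when M≦C̃ and gcd(W′,C̃)=1, and shows one collision when the gcd condition is dropped.

```python
from math import gcd


def natural_banks(t, M, W, C_tilde):
    """Memory bank (position mod C_tilde) hit by each processor in cycle t."""
    return [(s * W + t) % C_tilde for s in range(M)]


def is_contention_free(banks):
    """True when no two processors address the same bank."""
    return len(set(banks)) == len(banks)


def check_small_cases(max_banks=12, max_window=30):
    """Exhaustively test all M <= C_tilde and all W coprime to C_tilde."""
    for C_tilde in range(1, max_banks + 1):
        for M in range(1, C_tilde + 1):
            for W in range(1, max_window):
                if gcd(W, C_tilde) != 1:
                    continue
                for t in range(W):
                    if not is_contention_free(natural_banks(t, M, W, C_tilde)):
                        return False
    return True


if __name__ == "__main__":
    print(check_small_cases())
    # gcd(6, 8) = 2: processors 0 and 4 both address bank 0 in cycle 0
    print(natural_banks(0, M=5, W=6, C_tilde=8))
```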
The previous approach (Method 1) is then applied to the following example. Initially, these parameters are provided to the designer: L=24, C=4, P=7.
Step 1: Select the number of decoding processors; assume that one wants to use 5 processors, e.g., M=5.
Step 2: Choose the scheduled ARP period, C̃. Choose C̃=8 for this example. It is noted that, generally, the chosen value for the scheduled ARP period, C̃, is an integer multiple of C.
Step 3: Let the virtual window size, W′, be as follows: W′=5, so that L′=MW′=25. This corresponds to a virtual block length of 25.
Step 4: Use 8 memory banks (e.g., memories), and set the memory mapping as follows: ℳ(χ)=χ mod 8.
Each of these embodiments 1701 and 1702 employs a plurality of memory banks 1710 that includes 8 memory banks.
During a first decoding cycle, the memory mapping as shown in embodiment 1701 is as follows for the natural order when performing turbo decoding processing.
E0={0,5,10,15,20}→{0,5,2,7,4}
Also during the first decoding cycle, the memory mapping as shown in embodiment 1701 is as follows for the interleaved (π) order when performing turbo decoding processing.
Ê0={0,15,6,17,20}→{0,7,6,1,4}
During a second decoding cycle, the memory mapping as shown in embodiment 1702 is as follows for the natural order when performing turbo decoding processing.
E1={1,6,11,16,21}→{1,6,3,0,5}
Also during the second decoding cycle, the memory mapping as shown in embodiment 1702 is as follows for the interleaved (π) order when performing turbo decoding processing.
Ê1={11,2,13,16,7}→{3,2,5,0,7}
During subsequent decoding cycles (e.g., decoding iterations), the memory mapping between processors and memories is as follows:
Third decoding cycle (natural order/interleaved order) is E2={2,7,12,17,22}→{2,7,4,1,6}/Ê2={22,9,12,3,18}→{6,1,4,3,2}.
Fourth decoding cycle (natural order/interleaved order) is E3={3,8,13,18,23}→{3,0,5,2,7}/Ê3={5,8,23,14,1}→{5,0,7,6,1}.
Fifth decoding cycle (natural order/interleaved order) is E4={4,9,14,19}→{4,1,6,3}/Ê4={4,19,10,21}→{4,3,2,5}.
As can be seen, the natural order and the interleaved order are both contention-free.
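The worked example can be replayed with a short script. The values L=24, C=4, P=7, M=5, C̃=8 and W′=5 come from the text. The interleave form π(x)=(P·x+offset[x mod C]) mod L and the offset values [0, 4, 8, 8] are assumptions: the offsets are inferred here from the interleaved index sets listed for the second through fifth decoding cycles and are not stated in this description.

```python
L, C, P = 24, 4, 7          # block length, ARP period, ARP multiplier (from the text)
M, C_TILDE, W = 5, 8, 5     # processors, memory banks, virtual window size
OFFSETS = [0, 4, 8, 8]      # assumed: inferred from the listed index sets


def arp(x):
    """Assumed ARP interleave consistent with the listed index sets."""
    return (P * x + OFFSETS[x % C]) % L


def cycle(t):
    """Natural and interleaved index sets for decoding cycle t (idle slots dropped)."""
    natural = [s * W + t for s in range(M) if s * W + t < L]
    interleaved = [arp(x) for x in natural]
    return natural, interleaved


def banks(positions):
    """Memory bank of each position under the MOD mapping."""
    return [x % C_TILDE for x in positions]


if __name__ == "__main__":
    for t in range(W):
        natural, interleaved = cycle(t)
        nat_banks, int_banks = banks(natural), banks(interleaved)
        # each bank is addressed by at most one processor per phase
        assert len(set(nat_banks)) == len(nat_banks)
        assert len(set(int_banks)) == len(int_banks)
        print(t, natural, nat_banks, interleaved, int_banks)
```

The assertions inside the loop fail if any two processors address the same bank, so a clean run reproduces the contention-free claim for both phases under these assumed offsets.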
In accordance with Method 1 as provided above, there are some additional properties for finding W′.
One of which is: if C̃=p^m with p prime, then for any integer W, either gcd(W,C̃)=1 or gcd(W+1,C̃)=1. This can be proven by the following: if p divides W, then p does not divide W+1.
Another is: if C̃=2^m3^n, then for any integer W, there exists an ε∈{0,1,2,3} such that gcd(W+ε,C̃)=1. This follows because gcd(W+ε,C̃)=1 whenever W+ε≡1 or 5 mod 6, and any four consecutive integers include such a value.
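Both properties bound how far one has to search above a candidate window size. A small numeric check, using a hypothetical helper `smallest_shift`, illustrates the bounds for a handful of values of C̃; it is a spot check over a finite range, not a proof.

```python
from math import gcd


def smallest_shift(W, C_tilde, max_shift):
    """Smallest eps in [0, max_shift] with gcd(W + eps, C_tilde) == 1, else None."""
    for eps in range(max_shift + 1):
        if gcd(W + eps, C_tilde) == 1:
            return eps
    return None


def bound_holds(C_values, max_shift, W_range=range(1, 500)):
    """True when every W in W_range needs a shift of at most max_shift."""
    return all(
        smallest_shift(W, C_tilde, max_shift) is not None
        for C_tilde in C_values
        for W in W_range
    )


if __name__ == "__main__":
    # prime powers: a shift of at most 1 suffices
    print(bound_holds([2, 4, 8, 16, 3, 9, 27, 5, 25, 7], max_shift=1))
    # products of powers of 2 and 3: a shift of at most 3 suffices
    print(bound_holds([6, 12, 18, 24, 36, 48, 72], max_shift=3))
    # the bound of 3 is attained, e.g. W = 2 with C_tilde = 6
    print(smallest_shift(2, 6, 3))
```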
Some advantages provided by various aspects of the invention include the ability to remove the restriction on M, which is the number of decoding processors (e.g., parallel arranged turbo decoders). Generally speaking, a designer has the opportunity to select whatever number of decoding processors is desired in a particular turbo coding application that employs an embodiment of an ARP interleaver (π).
In addition, because of this ability to perform the selection of number of decoding processors (e.g., parallel arranged turbo decoders), a designer has a great deal of flexibility in terms of optimizing various design criteria, including area, power, latency, among other design considerations.
This embodiment illustrates the typical arrangement (except that the memory mapping will change for different decoding cycles) for all of the decoding cycles except for dummy decoding cycles.
This embodiment 1801 of turbo 1800 includes a plurality of turbo decoders 1821-1827, a plurality of memories 1841-1847, and a processing module 1830 that is operable to perform contention-free memory mapping between the plurality of turbo decoders 1821-1827 and the plurality of memories 1841-1847. As can be seen, only one turbo decoder accesses any one memory at a given time. This is a truly contention-free memory mapping between the plurality of turbo decoders 1821-1827 and the plurality of memories 1841-1847.
As can be seen, each turbo decoder within the plurality of turbo decoders 1821-1827 is operable within these decoding cycles. Again, the particular memory mapping depicted in this embodiment 1801 corresponds to one particular memory mapping, and the memory mapping will change for various decoding cycles.
This embodiment illustrates the arrangement (again, except that the memory mapping will change for different decoding cycles) for all of the decoding cycles that involve performing some dummy decoding cycles for at least some of the turbo decoders within the plurality of turbo decoders 1821-1827.
This embodiment 1802 of turbo 1800 includes a plurality of turbo decoders 1821-1827, a plurality of memories 1841-1847, and a processing module 1830 that is operable to perform contention-free memory mapping between the plurality of turbo decoders 1821-1827 and the plurality of memories 1841-1847. As can be seen, only one turbo decoder accesses any one memory at a given time. This is a truly contention-free memory mapping between the plurality of turbo decoders 1821-1827 and the plurality of memories 1841-1847.
As can also be seen, a first subset of turbo decoders within the plurality of turbo decoders 1821-1827 is operable within these decoding cycles (i.e., turbo decoders 1821-1825), and a second subset of turbo decoders within the plurality of turbo decoders 1821-1827 is not operable within these decoding cycles (i.e., turbo decoders 1826-1827). These turbo decoders 1826-1827 perform dummy decoding processing, as shown by reference numeral 1804, in which the turbo decoders 1826-1827 are idle during these decoding cycles. Again, the particular memory mapping depicted in this embodiment 1802 corresponds to one particular memory mapping, and the memory mapping will change for various decoding cycles.
As shown in a block 1910, the method 1900 begins by selecting the number of decoding processors, M, to be employed for parallel implemented turbo decoding processing. The method 1900 continues by selecting a scheduled period, C̃, such that the ARP interleave (π) is an ARP interleave with period C̃, as shown in a block 1920. The method 1900 continues by selecting a window size, W′, such that W′ and C̃ are relatively prime, as shown in a block 1930. The method 1900 continues by determining the virtual block length, L′, based on the selected window size, W′, as shown in a block 1940. The method 1900 continues by performing contention-free memory mapping, as shown in a block 1950. The method 1900 continues by implementing a parallel turbo decoding processing architecture, as shown in a block 1960. If desired in some embodiments, the method 1900 can continue by turbo decoding an encoded block using parallel turbo decoding processing, thereby generating best estimates of information bits encoded therein (including performing any dummy cycles, when and if necessary), as shown in a block 1970.
The method 2000 continues by performing two operations during a second decoding cycle. During the second decoding cycle, the method 2000 turbo decodes the encoded block using a first subset of decoding processors of the plurality of decoding processors in accordance with parallel turbo decoding processing, as shown in a block 2041, and performs dummy decoding cycles using a second subset of decoding processors of the plurality of decoding processors in accordance with parallel turbo decoding processing, as shown in a block 2042. The method 2000 then continues by generating best estimates of information bits encoded within the encoded block of the turbo coded signal, as shown in a block 2050.
The present invention has also been described above with the aid of method steps illustrating the performance of specified functions and relationships thereof. The boundaries and sequence of these functional building blocks and method steps have been arbitrarily defined herein for convenience of description. Alternate boundaries and sequences can be defined so long as the specified functions and relationships are appropriately performed. Any such alternate boundaries or sequences are thus within the scope and spirit of the claimed invention.
The present invention has been described above with the aid of functional building blocks illustrating the performance of certain significant functions. The boundaries of these functional building blocks have been arbitrarily defined for convenience of description. Alternate boundaries could be defined as long as the certain significant functions are appropriately performed. Similarly, flow diagram blocks may also have been arbitrarily defined herein to illustrate certain significant functionality. To the extent used, the flow diagram block boundaries and sequence could have been defined otherwise and still perform the certain significant functionality. Such alternate definitions of both functional building blocks and flow diagram blocks and sequences are thus within the scope and spirit of the claimed invention.
One of average skill in the art will also recognize that the functional building blocks, and other illustrative blocks, modules and components herein, can be implemented as illustrated or by discrete components, application specific integrated circuits, processors executing appropriate software and the like or any combination thereof.
Moreover, although described in detail for purposes of clarity and understanding by way of the aforementioned embodiments, the present invention is not limited to such embodiments. It will be obvious to one of average skill in the art that various changes and modifications may be practiced within the spirit and scope of the invention, as limited only by the scope of the appended claims.
[1] C. Berrou, Y. Saouter, C. Douillard, S. Kerouedan, and M. Jezequel, “Designing good permutations for turbo codes: towards a single model,” 2004 IEEE International Conference on Communications (ICC), vol. 1, pp. 341-345, 20-24 Jun. 2004.
[2] France Telecom, GET, “Enhancement of Rel. 6 turbo Code,” 3GPP TSG RAN WG1 #43, R1-051310, 2005.
[3] Motorola, “A contention-free interleaver design for LTE codes,” 3GPP TSG RAN WG1 #47.
[4] A. Nimbalker, T. E. Fuja, D. J. Costello, Jr., T. K. Blankenship, and B. Classon, “Contention-Free Interleavers,” IEEE ISIT 2004, Chicago, USA, Jun. 27-Jul. 2, 2004.
The present U.S. Utility Patent Application claims priority pursuant to 35 U.S.C. §119(e) to the following U.S. Provisional Patent Applications which are hereby incorporated herein by reference in their entirety and made part of the present U.S. Utility Patent Application for all purposes:
1. U.S. Provisional Application Ser. No. 60/850,492, entitled “General and algebraic-constructed contention-free memory mapping for parallel turbo decoding with algebraic interleave ARP (almost regular permutation) of all possible sizes,” filed Oct. 10, 2006, pending.
2. U.S. Provisional Application Ser. No. 60/872,367, entitled “Turbo decoder employing ARP (almost regular permutation) interleave and inverse thereof as de-interleave,” filed Dec. 1, 2006, pending.
3. U.S. Provisional Application Ser. No. 60/872,716, entitled “Turbo decoder employing ARP (almost regular permutation) interleave and arbitrary number of decoding processors,” filed Dec. 4, 2006, pending.
4. U.S. Provisional Application Ser. No. 60/861,832, entitled “Reduced complexity ARP (almost regular permutation) interleaves providing flexible granularity and parallelism adaptable to any possible turbo code block size,” filed Nov. 29, 2006, pending.
5. U.S. Provisional Application Ser. No. 60/879,301, entitled “Address generation for contention-free memory mappings of turbo codes with ARP (almost regular permutation) interleaves,” filed Jan. 8, 2007, pending.
The following U.S. Utility Patent Applications are hereby incorporated herein by reference in their entirety and made part of the present U.S. Utility Patent Application for all purposes:
1. U.S. Utility application Ser. No. 11/704,068, entitled “General and algebraic-constructed contention-free memory mapping for parallel turbo decoding with algebraic interleave ARP (almost regular permutation) of all possible sizes,” filed Feb. 8, 2007, pending.
2. U.S. Utility application Ser. No. 11/657,819, entitled “Turbo decoder employing ARP (almost regular permutation) interleave and inverse thereof as de-interleave,” filed Jan. 25, 2007, pending.
3. U.S. Utility application Ser. No. 11/811,013, entitled “Reduced complexity ARP (almost regular permutation) interleaves providing flexible granularity and parallelism adaptable to any possible turbo code block size,” filed concurrently on Jun. 7, 2007, pending.
4. U.S. Utility application Ser. No. 11/810,989, entitled “Address generation for contention-free memory mappings of turbo codes with ARP (almost regular permutation) interleaves,” filed concurrently on Jun. 7, 2007, pending.
Number | Name | Date | Kind |
---|---|---|---|
5406570 | Berrou et al. | Apr 1995 | A |
5446747 | Berrou | Aug 1995 | A |
5563897 | Pyndiah et al. | Oct 1996 | A |
5721745 | Hladik et al. | Feb 1998 | A |
5734962 | Hladik et al. | Mar 1998 | A |
6023783 | Divsalar et al. | Feb 2000 | A |
6065147 | Pyndiah et al. | May 2000 | A |
6119264 | Berrou et al. | Sep 2000 | A |
6122763 | Pyndiah et al. | Sep 2000 | A |
6392572 | Shiu et al. | May 2002 | B1 |
6580767 | Koehler et al. | Jun 2003 | B1 |
6594792 | Hladik et al. | Jul 2003 | B1 |
6678843 | Giulietti et al. | Jan 2004 | B2 |
6715120 | Hladik et al. | Mar 2004 | B1 |
6775800 | Edmonston et al. | Aug 2004 | B2 |
6789218 | Edmonston et al. | Sep 2004 | B1 |
6983412 | Fukumasa | Jan 2006 | B2 |
7113554 | Dielissen et al. | Sep 2006 | B2 |
7180843 | Furuta et al. | Feb 2007 | B2 |
7281198 | Yagihashi | Oct 2007 | B2 |
7302621 | Edmonston et al. | Nov 2007 | B2 |
7530011 | Obuchii et al. | May 2009 | B2 |
7720017 | Khan | May 2010 | B2 |
20060101319 | Park et al. | May 2006 | A1 |
Number | Date | Country |
---|---|---|
0 735 696 | Oct 1996 | EP |
0 735 696 | Jan 1999 | EP |
91 05278 | Oct 1992 | FR |
10-2004-0034607 | Apr 2004 | KR |
02093755 | Nov 2002 | WO |
Number | Date | Country |
---|---|---|
20080104482 A1 | May 2008 | US |
Number | Date | Country |
---|---|---|
60850492 | Oct 2006 | US |
60861832 | Nov 2006 | US |
60872367 | Dec 2006 | US |
60872716 | Dec 2006 | US |
60879301 | Jan 2007 | US |