1. Technical Field of the Invention
The invention relates generally to communication systems; and, more particularly, it relates to communication systems employing turbo coding.
2. Description of Related Art
Data communication systems have been under continual development for many years. One such type of communication system that has been of significant interest lately is a communication system that employs iterative error correction codes. Of those, one particular type of communication system that has received interest in recent years has been one which employs turbo codes (one type of iterative error correcting code). Communications systems with iterative codes are often able to achieve lower bit error rates (BER) than alternative codes for a given signal to noise ratio (SNR).
A primary and ongoing goal in this area of development has been to lower the SNR required to achieve a given BER within a communication system. The ideal goal has been to try to reach Shannon's limit in a communication channel. Shannon's limit may be viewed as being the data rate to be used in a communication channel, having a particular SNR, that achieves error free transmission through the communication channel. In other words, the Shannon limit is the theoretical bound for channel capacity for a given modulation and code rate.
The use of turbo codes providing such relatively lower error rates, while operating at relatively low data throughput rates, has largely been in the context of communication systems having a large degree of noise within the communication channel and where substantially error free communication is held at the highest premium. Some of the earliest application arenas for turbo coding were space related where accurate (i.e., ideally error free) communication is often deemed an essential design criterion. The direction of development then moved towards developing terrestrial-applicable and consumer-related applications. Still, based on the heritage of space related application, the focus of effort in the turbo coding environment then continued to be achieving relatively lower error floors, and not specifically towards reaching higher throughput.
More recently, focus in the art has been towards developing turbo coding, and variants thereof, that are operable to support higher amounts of throughput while still preserving the relatively low error floors offered within the turbo code context.
In fact, as the throughput requirement in communication systems increases, parallel turbo decoding, which employs a plurality of processors and a plurality of memory banks, becomes necessary. Many of the current systems support a wide range of codeword sizes. Thus, efficiency and flexibility in parallel turbo decoder design are of critical importance.
Generally speaking, within the context of communication systems that employ turbo codes, there is a first communication device at one end of a communication channel with encoder capability and a second communication device at the other end of the communication channel with decoder capability. In many instances, one or both of these two communication devices include encoder and decoder capability (e.g., within a bi-directional communication system).
The present invention is directed to apparatus and methods of operation that are further described in the following Brief Description of the Several Views of the Drawings, the Detailed Description of the Invention, and the claims. Other features and advantages of the present invention will become apparent from the following detailed description of the invention made with reference to the accompanying drawings.
Many communication systems incorporate the use of a turbo code. While there are many potential applications that can employ turbo codes, means are presented herein that can be applied to the 3GPP channel code to support an arbitrary number of information bits. Some examples of the number of bits that can be supported using the various aspects of the invention presented herein are 40 to 5114 for WCDMA and HSDPA and more for LTE.
Additional information regarding the UTRA-UTRAN Long Term Evolution (LTE) and 3GPP System Architecture Evolution (SAE) can be found at the following Internet web site:
www.3gpp.org
Within the channel coding system in 3GPP LTE, there is a need to support a wide range of block sizes (i.e., turbo code block lengths). Furthermore, turbo decoding of this system generally needs to be implemented using a parallel decoding arrangement because of the very high data throughput and large block size desired. Parallel decoding requires contention-free memory access (i.e., any one turbo decoder (of a group of parallel arranged turbo decoders) accesses only one memory (of a group of parallel arranged memories) at any given time). Turbo coding was suggested for 3GPP LTE channel coding. For this coding system, the algebraic interleave referred to as the “almost regular permutation (ARP)” in reference [1] is considered as one of the candidates.
The goal of digital communications systems is to transmit digital data from one location, or subsystem, to another either error free or with an acceptably low error rate. As shown in
Referring to
Continuing on with the turbo decoding process and functionality, the metrics 241 that are calculated by the metric generator 204 are then provided simultaneously to a first soft-in/soft-out (SISO 0) decoder 210 and a second SISO 1 decoder 230. In the context of trellis coding (e.g., turbo trellis coded modulation (TTCM)), each of the first SISO 0 decoder 210 and the second SISO 1 decoder 230 calculates forward metrics (alphas) and backward metrics (betas), and extrinsic values according to the trellis employed.
These alphas, betas, and extrinsics are all calculated for each symbol within a frame that is to be decoded. These calculations of alphas, betas, and extrinsics are all based on the trellis.
Starting with the first SISO 0 decoder 210, after the extrinsic values 211 have been calculated, they are passed to an interleaver (π) 220, after which they are passed to the second SISO 1 decoder 230 as “a priori probability” (app) information 221. Similarly, after extrinsic values 231 have been calculated within the second SISO 1 decoder 230, they are passed to a de-interleaver (π−1) 240, after which they are passed to the first SISO 0 decoder 210 as “a priori probability” (app) information 241. It is noted that a single decoding iteration within the iterative decoding process of the turbo decoder 200 consists of performing two SISO operations; that is to say, the iterative decoding process must pass through both the first SISO 0 decoder 210 and the second SISO 1 decoder 230.
After a significant level of confidence has been achieved and a solution is being converged upon, or after a predetermined number of decoding iterations have been performed, the output from the second SISO 1 decoder 230 is passed as output to an output processor 250. The operation of the SISOs 210 and 230 may generally be referred to as calculating soft symbol decisions of the symbol contained within the received symbol. These soft symbol decisions may be performed on a true bit level in certain embodiments. The output processor 250 uses these soft symbol decisions to generate best estimates 251 (e.g., hard bit and/or symbol decisions) for the information bits that have been encoded within the original turbo coded signal (e.g., generally within a turbo encoder location at another end of a communication channel into which the signal 201 was originally launched).
It is also noted that the interleaving performed within the interleaver (π) 220 can be performed using an embodiment of an ARP interleave, as shown by reference numeral 291. Also, there are embodiments in which the de-interleaving performed within the de-interleaver (π−1) 240 can be performed using an embodiment of an ARP de-interleave.
Many of the embodiments presented herein employ various embodiments of the ARP (almost regular permutation) interleaves. Even more details are provided below with respect to the means by which a structure can be employed to perform both ARP interleaving and ARP de-interleaving. Before doing so, however, a regular permutation is considered for comparative analysis for the reader.
In one embodiment, as depicted by reference numeral 312, when performing the natural order phase decoding (e.g., SISO 0 decoding operations), the accessing of memory entries is performed when the select signal 303 indicates an even phase to the MUX 306. Also, when performing the interleaved (π) order phase decoding (e.g., SISO 1 decoding operations), the accessing of memory entries is performed when the select signal 303 indicates an odd phase to the MUX 306.
During a cycle 0 (i.e., a first decoding iteration within the iterative decoding processing of turbo decoding), a first portion of each window is processed, as shown by the corresponding shaded portions of each window. Then, during a cycle 1 (i.e., a second decoding iteration within the iterative decoding processing of turbo decoding), a second portion of each window is processed, as shown by the corresponding shaded portions of each window. This continues on until during a cycle W−1 (i.e., a final decoding iteration within the iterative decoding processing of turbo decoding), a final portion of each window is processed, as shown by the corresponding shaded portions of each window.
During each cycle, a given portion of each window is processed using one decoding processor (e.g., one turbo decoder) in a parallel implementation of a plurality of decoding processors (e.g., a plurality of turbo decoders).
In accordance with the parallel turbo decoding processing which involves employing M decoding processors, during a cycle 0 (i.e., a first decoding iteration within the iterative decoding processing of turbo decoding), the respective first portions of each window that are processed are as follows:
1. processor 0 processes portion 0 of the information block.
2. processor 1 processes portion W of the information block.
3. processor 2 processes portion 2W of the information block.
. . .
s. processor s processes portion sW of the information block (s is an integer).
. . .
M−1. processor M−1 processes portion (M−1)W of the information block.
During a cycle 1 (i.e., a second decoding iteration within the iterative decoding processing of turbo decoding), the respective second portions of each window that are processed are as follows:
1. processor 0 processes portion 1 of the information block.
2. processor 1 processes portion W+1 of the information block.
3. processor 2 processes portion 2W+1 of the information block.
. . .
s. processor s processes portion sW+1 of the information block (s is an integer).
. . .
M−1. processor M−1 processes portion (M−1)W+1 of the information block.
This process continues on until, during a cycle W−1 (i.e., a final decoding iteration within the iterative decoding processing of turbo decoding), the respective final portions of each window that are processed are as follows:
1. processor 0 processes portion W−1 of the information block.
2. processor 1 processes portion W+W−1 of the information block.
3. processor 2 processes portion 2W+W−1 of the information block.
. . .
s. processor s processes portion sW+W−1 of the information block (s is an integer).
. . .
M−1. processor M−1 processes portion (M−1)W+W−1 of the information block.
The index set at the t-th decoding cycle is as follows (as also shown by reference numeral 503):
E0={0,W, . . . , (M−1)W}, and
Et={t,W+t, . . . , (M−1)W+t}.
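The per-cycle schedule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the claimed implementation; the function name `natural_index_set` and the example values W=6 and M=4 are assumptions chosen for the sketch.

```python
def natural_index_set(t, W, M):
    """Return E_t = {t, W+t, ..., (M-1)W+t}, ordered by processor number s."""
    if not 0 <= t < W:
        raise ValueError("cycle t must satisfy 0 <= t < W")
    # Processor s handles position s*W + t of the information block in cycle t.
    return [s * W + t for s in range(M)]


if __name__ == "__main__":
    W, M = 6, 4  # assumed window size and number of decoding processors
    for t in range(W):
        print(f"cycle {t}: {natural_index_set(t, W, M)}")
```

Each list entry at position s is the block position handled by processor s in that cycle, so the M entries of one list are the positions accessed simultaneously.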
In accordance with the parallel turbo decoding processing which involves employing M decoding processors, during a cycle 0 (i.e., a first decoding iteration within the iterative decoding processing of turbo decoding), the respective first portions of each window that are processed are as follows (note: these are the interleaved (π) portions):
1. processor 0 processes portion π(0) of the information block.
2. processor 1 processes portion π(W) of the information block.
3. processor 2 processes portion π(2W) of the information block.
. . .
s. processor s processes portion π(sW) of the information block (s is an integer).
. . .
M−1. processor M−1 processes portion π((M−1)W) of the information block.
During a cycle 1 (i.e., a second decoding iteration within the iterative decoding processing of turbo decoding), the respective second portions of each window that are processed are as follows:
1. processor 0 processes portion π(1) of the information block.
2. processor 1 processes portion π(W+1) of the information block.
3. processor 2 processes portion π(2W+1) of the information block.
. . .
s. processor s processes portion π(sW+1) of the information block (s is an integer).
. . .
M−1. processor M−1 processes portion π((M−1)W+1) of the information block.
This process continues on until, during a cycle W−1 (i.e., a final decoding iteration within the iterative decoding processing of turbo decoding), the respective final portions of each window that are processed are as follows:
1. processor 0 processes portion π(W−1) of the information block.
2. processor 1 processes portion π(W+W−1) of the information block.
3. processor 2 processes portion π(2W+W−1) of the information block.
. . .
s. processor s processes portion π(sW+W−1) of the information block (s is an integer).
. . .
M−1. processor M−1 processes portion π((M−1)W+W−1) of the information block.
The index set at the t-th decoding cycle is as follows (as also shown by reference numeral 603):
Ê0={π(0),π(W), . . . , π((M−1)W)}, and
Êt={π(t),π(W+t), . . . , π((M−1)W+t)}.
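A memory mapping $\mathcal{M}$ is contention-free if the following relationships hold:
$i, i' \in E_t,\ i \neq i' \Rightarrow \mathcal{M}(i) \neq \mathcal{M}(i')$
$j, j' \in \hat{E}_t,\ j \neq j' \Rightarrow \mathcal{M}(j) \neq \mathcal{M}(j')$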
It is noted that the elements in the index set of the t-th cycle should be mapped to different memory banks (e.g., different memories within a plurality of memories provisioned to service a plurality of parallel arranged turbo decoders).
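The condition that all indices processed in the same cycle land in different memory banks can be tested mechanically. The following Python sketch is only an illustration: the helper names (`index_sets`, `is_contention_free`), the toy interleave `pi`, and the two candidate mappings are hypothetical and are not taken from the references.

```python
def index_sets(W, M, pi=None):
    """Yield the index set of every cycle t = 0..W-1.

    With pi=None the natural-order sets are produced; otherwise each
    natural-order index i is replaced by pi[i] (interleaved order).
    """
    for t in range(W):
        natural = [s * W + t for s in range(M)]
        yield natural if pi is None else [pi[i] for i in natural]


def is_contention_free(mapping, pi, W, M):
    """True if no two indices of any cycle map to the same bank, in both phases."""
    for phase in (index_sets(W, M), index_sets(W, M, pi)):
        for indices in phase:
            banks = [mapping(i) for i in indices]
            if len(set(banks)) != len(banks):
                return False
    return True


# Toy example (assumed values): block of 4 positions, W=2, M=2.
W, M = 2, 2
pi = [0, 3, 2, 1]
print(is_contention_free(lambda i: i % M, pi, W, M))   # False: 0 and 2 share bank 0
print(is_contention_free(lambda i: i // W, pi, W, M))  # True for this toy interleave
```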
In the natural-order phase, the first sub-block begins with data locations 0, 1, and continues up to W−1. The second sub-block begins with data locations W, W+1, and continues up to 2W−1. The third sub-block begins with data locations 2W, 2W+1, and continues up to 3W−1. The fourth sub-block begins with data locations 3W, 3W+1, and continues up to 4W−1.
In cycle 0, the first data of the first sub-block (i.e., the data in location 0) is stored in the first location of memory 742.
In cycle 0, the first data of the second sub-block (i.e., the data in location W) is stored in the first location of memory 743.
In cycle 0, the first data of the third sub-block (i.e., the data in location 2W) is stored in the first location of memory 741.
In cycle 0, the first data of the fourth sub-block (i.e., the data in location 3W) is stored in the first location of memory 744.
In cycle 1, the second data of the first sub-block (i.e., the data in location 1) is stored in the second location of memory 741.
In cycle 1, the second data of the second sub-block (i.e., the data in location W+1) is stored in the second location of memory 744.
In cycle 1, the second data of the third sub-block (i.e., the data in location 2W+1) is stored in the second location of memory 742.
In cycle 1, the second data of the fourth sub-block (i.e., the data in location 3W+1) is stored in the second location of memory 743.
This process continues until all data elements of the encoded block have been stored into corresponding memory locations within each of the 4 memory banks. The memory bank into which a particular data element is stored depends on the mapping employed.
After undergoing interleaving (π), the interleaved encoded block also is shown as being partitioned into a plurality of sub-blocks. This particular encoded block includes 4W data locations.
In the interleaved-order phase, the first sub-block begins with data locations π(0), π(1), and continues up to π(W−1). The second sub-block begins with data locations π(W), π(W+1), and continues up to π(2W−1). The third sub-block begins with data locations π(2W), π(2W+1), and continues up to π(3W−1). The fourth sub-block begins with data locations π(3W), π(3W+1), and continues up to π(4W−1).
In cycle 0, the first data of the first sub-block (i.e., the data in location π(0)) is stored in a location of memory 741 as governed by the particular interleave and mapping employed.
In cycle 0, the first data of the second sub-block (i.e., the data in location π(W)) is stored in a location of memory 744 as governed by the particular interleave and mapping employed.
In cycle 0, the first data of the third sub-block (i.e., the data in location π(2W)) is stored in a location of memory 743 as governed by the particular interleave and mapping employed.
In cycle 0, the first data of the fourth sub-block (i.e., the data in location π(3W)) is stored in a location of memory 742 as governed by the particular interleave and mapping employed.
In cycle 1, the second data of the first sub-block (i.e., the data in location π(1)) is stored in a location of memory 742 as governed by the particular interleave and mapping employed.
In cycle 1, the second data of the second sub-block (i.e., the data in location π(W+1)) is stored in a location of memory 741 as governed by the particular interleave and mapping employed.
In cycle 1, the second data of the third sub-block (i.e., the data in location π(2W+1)) is stored in a location of memory 742 as governed by the particular interleave and mapping employed.
In cycle 1, the second data of the fourth sub-block (i.e., the data in location π(3W+1)) is stored in a location of memory 744 as governed by the particular interleave and mapping employed.
This process continues until all data elements of the interleaved encoded block have been stored into corresponding memory locations within each of the 4 memory banks. The memory bank into which a particular data element is stored depends on the mapping employed.
Note that this mapping is not contention-free since, in cycle 1 of the interleaved-order phase, the second data of the first sub-block (i.e., the data in location π(1)) and the second data of the third sub-block (i.e., the data in location π(2W+1)) both map to the same memory 742.
This turbo decoder 800 includes a plurality of turbo decoders 821-827, a plurality of memories 841-847, and a processing module 830 that is operable to perform the memory mapping between the plurality of turbo decoders 821-827 and the plurality of memories 841-847. As can be seen, more than one turbo decoder tries to access the same memory at a given time. Specifically, turbo decoder 821 and turbo decoder 822 are trying to access memory 841, and turbo decoder 824 and turbo decoder 825 are trying to access memory 845. Also, turbo decoder 827 and another turbo decoder (included in the region as shown by the ellipsis . . . ) are trying to access memory 847. This creates conflicts and degrades performance.
This turbo decoder 900 includes a plurality of turbo decoders 921-927, a plurality of memories 941-947, and a processing module 930 that is operable to perform contention-free memory mapping between the plurality of turbo decoders 921-927 and the plurality of memories 941-947. As can be seen, only one turbo decoder accesses any one memory at a given time. This is a truly contention-free memory mapping between the plurality of turbo decoders 921-927 and the plurality of memories 941-947.
The other device 1090 to which the communication device 1010 is coupled via the communication channel 1099 can be another communication device 1092, a storage media 1094 (e.g., such as within the context of a hard disk drive (HDD)), or any other type of device that is capable of receiving and/or transmitting signals. In some embodiments, the communication channel 1099 is a bi-directional communication channel that is operable to perform transmission of a first signal during a first time and receiving of a second signal during a second time. If desired, full duplex communication may also be employed, in which each of the communication device 1010 and the device 1090 can be transmitting to and/or receiving from one another simultaneously.
The decoder 1021 of the communication device 1010 includes a turbo decoder 1020, a processing module 1030, and a memory 1040. The processing module 1030 can be coupled to the memory 1040 so that the memory is operable to store operational instructions that enable the processing module 1030 to perform certain functions.
Generally speaking, the processing module 1030 is operable to perform contention-free memory mapping between the turbo decoder 1020 and the memory 1040 during iterative decoding processing.
It is also noted that the processing module 1030, as well as any other processing module described herein, can be implemented in any number of ways as described below. In one embodiment, the processing module 1030 can be implemented strictly as circuitry. Alternatively, the processing module 1030 can be implemented strictly in software such as can be employed within a digital signal processor (DSP) or similar type device. In even another embodiment, the processing module 1030 can be implemented as a combination of hardware and software as well without departing from the scope and spirit of the invention.
In even other embodiments, the processing module 1030 can be implemented using a shared processing device, individual processing devices, or a plurality of processing devices. Such a processing device may be a microprocessor, micro-controller, digital signal processor, microcomputer, central processing unit, field programmable gate array, programmable logic device, state machine, logic circuitry, analog circuitry, digital circuitry, and/or any device that manipulates signals (analog and/or digital) based on operational instructions. The processing module 1030 can be coupled to the memory 1040 that is operable to store operational instructions that enable the processing module 1030 to perform the appropriate contention-free memory mapping between the turbo decoder 1020 and the memory 1040.
Such a memory 1040 may be a single memory device or a plurality of memory devices. Such a memory 1040 may be a read-only memory, random access memory, volatile memory, non-volatile memory, static memory, dynamic memory, flash memory, and/or any device that stores digital information. Note that when the processing module 1030 implements one or more of its functions via a state machine, analog circuitry, digital circuitry, and/or logic circuitry, the memory storing the corresponding operational instructions is embedded with the circuitry comprising the state machine, analog circuitry, digital circuitry, and/or logic circuitry.
Referring to the communication system 1100 of
A communication device includes a turbo decoder that is itself composed of a plurality of turbo decoders 1221, 1222, and 1223. Such a communication device also includes a memory that is itself composed of a plurality of memories 1241, 1242, and 1243. A processing module 1230 is operable to perform contention-free memory mapping between the plurality of turbo decoders 1221, 1222, and 1223 and the plurality of memories 1241, 1242, and 1243 during iterative decoding processing of a turbo coded signal.
The processing module 1230 is operable to ensure that only one turbo decoder accesses a given memory at any given time. For example, the processing module 1230 is operable to perform a first contention-free memory mapping at a time 1, as shown by reference numeral 1201. The processing module 1230 is operable to perform a second contention-free memory mapping at a time 2, as shown by reference numeral 1202. The processing module 1230 is operable to perform a third contention-free memory mapping at a time 3, as shown by reference numeral 1203. The processing module 1230 is operable to perform a fourth contention-free memory mapping at a time 4, as shown by reference numeral 1204. As can be seen, only one turbo decoder is connected to any one memory at any given time in each of these 4 diagrams.
As can be seen, the contention-free memory mapping between the turbo decoders 1221, 1222, and 1223 and the plurality of memories 1241, 1242, and 1243 changes as a function of time during iterative decoding processing of a turbo coded signal.
Each of these embodiments 1301 and 1302 employs a plurality of memory banks 1310 that includes 4 memory banks.
During a first decoding cycle, the memory mapping as shown in embodiment 1301 is as follows for the natural order when performing turbo decoding processing.
E0={0,6,12,18}→{0,2,1,3}
Also during the first decoding cycle, the memory mapping as shown in embodiment 1301 is as follows for the interleaved (π) order when performing turbo decoding processing.
Ê0={0,2,12,14}→{0,2,1,3}
During a second decoding cycle, the memory mapping as shown in embodiment 1302 is as follows for the natural order when performing turbo decoding processing.
E1={1,7,13,19}→{1,3,2,0}
Also during the second decoding cycle, the memory mapping as shown in embodiment 1302 is as follows for the interleaved (π) order when performing turbo decoding processing.
Ê1={11,9,23,21}→{3,1,0,2}
During subsequent decoding cycles (e.g., decoding iterations), the memory mapping between processors and memories is as follows:
Third decoding cycle (natural order/interleaved order) is E2={2,8,14,20}→{2,0,3,1}/Ê2={22,8,10,20}→{3,0,2,1}.
Fourth decoding cycle (natural order/interleaved order) is E3={3,9,15,21}→{3,1,0,2}/Ê3={5,19,17,7}→{1,0,2,3}.
Fifth decoding cycle (natural order/interleaved order) is E4={4,10,16,22}→{0,2,1,3}/Ê4={4,6,16,18}→{0,2,1,3}.
Sixth decoding cycle (natural order/interleaved order) is E5={5,11,17,23}→{1,3,2,0}/Ê5={15,13,3,1}→{0,2,3,1}.
As can be seen, the natural order and the interleaved order are both contention-free.
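The claim can be confirmed directly from the listed numbers. The Python sketch below simply transcribes the index sets and bank assignments given above and checks that no bank repeats within a cycle; the names `NATURAL`, `INTERLEAVED`, `has_contention`, and `contention_free` are assumptions of this sketch.

```python
# (index set, bank assignment) pairs transcribed from the six decoding cycles above.
NATURAL = [
    ([0, 6, 12, 18], [0, 2, 1, 3]),
    ([1, 7, 13, 19], [1, 3, 2, 0]),
    ([2, 8, 14, 20], [2, 0, 3, 1]),
    ([3, 9, 15, 21], [3, 1, 0, 2]),
    ([4, 10, 16, 22], [0, 2, 1, 3]),
    ([5, 11, 17, 23], [1, 3, 2, 0]),
]
INTERLEAVED = [
    ([0, 2, 12, 14], [0, 2, 1, 3]),
    ([11, 9, 23, 21], [3, 1, 0, 2]),
    ([22, 8, 10, 20], [3, 0, 2, 1]),
    ([5, 19, 17, 7], [1, 0, 2, 3]),
    ([4, 6, 16, 18], [0, 2, 1, 3]),
    ([15, 13, 3, 1], [0, 2, 3, 1]),
]


def has_contention(banks):
    """True if two processors would address the same bank in one cycle."""
    return len(set(banks)) != len(banks)


def contention_free(schedule):
    """True if every cycle of the schedule uses pairwise distinct banks."""
    return not any(has_contention(banks) for _, banks in schedule)


print(contention_free(NATURAL), contention_free(INTERLEAVED))
```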
i=π(j)=Pj mod L, 0≦i,j≦L−1
L is the frame size, and gcd(P,L)=1, which then implies that π(j)≠π(j′) if j≠j′.
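As a hedged illustration, the regular permutation can be generated and checked for bijectivity in Python. The function name `regular_permutation` and the example values L=24 and P=7 are assumptions of this sketch.

```python
from math import gcd


def regular_permutation(L, P):
    """Return [pi(0), ..., pi(L-1)] with pi(j) = P*j mod L."""
    if gcd(P, L) != 1:
        # Without gcd(P, L) = 1 two different j would share the same pi(j).
        raise ValueError("P must be relatively prime to L")
    return [(P * j) % L for j in range(L)]


pi = regular_permutation(24, 7)
print(pi[:6])                          # first few interleaved positions
print(sorted(pi) == list(range(24)))   # every position appears exactly once
```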
The implementation of the regular permutation 1400 is relatively straight-forward, but the performance is not very good.
An ARP (almost regular permutation) of information block size L=CW (i.e., C is a divisor of L), introduced in reference [1], is defined by
i=π(j)=[jP+θ+A(j mod C)P+B(j mod C)] mod L
where P is relatively prime to L, θ is a constant, and A(x) and B(x) are integer functions defined on {0, 1, . . . , C−1}. To ensure that the function so defined is a permutation (i.e., one-to-one and onto), in reference [1] A(x) and B(x) are further restricted to
A(i)P+B(i)=C[α(i)P+β(i)],i=0, . . . , C−1
where α and β are integer functions. In this document, we call C the dithering cycle of the ARP.
As can be seen, C|L (thus gcd(C,P)=1), and therefore π(j)≠π(j′), if j≠j′.
Example: C=4
A first example of an ARP interleave is provided here:
which indicates that
If the inputs of the following are provided to such an ARP interleave (π),
0, 1, 2, 3 | 4, 5, 6, 7 | 8, 9, 10, 11 | 12, 13, 14, 15 | 16, 17, 18, 19 | 20, 21, 22, 23,
then the output thereof is as follows:
0, 11, 22, 5 | 4, 15, 2, 9 | 8, 19, 6, 13 | 12, 23, 10, 17 | 16, 3, 14, 21 | 20, 7, 18, 1.
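The listed output can be reproduced with a short Python sketch of the general ARP form π(j)=[jP+θ+A(j mod C)P+B(j mod C)] mod L. The parameter values used here (P=7, θ=0, and combined per-residue offsets A(u)P+B(u) of 0, 4, 8, 8 taken modulo 24) are an assumption chosen because they regenerate the listed sequence; other parameter choices can give the same table, so they should not be read as the parameters of the original example.

```python
def arp_interleave(L, P, C, offsets, theta=0):
    """Return [pi(0), ..., pi(L-1)] for pi(j) = (j*P + theta + offsets[j mod C]) mod L.

    offsets[u] stands for the combined term A(u)*P + B(u) of the ARP definition.
    """
    if L % C != 0:
        raise ValueError("C must divide L")
    if len(offsets) != C:
        raise ValueError("one offset per residue class modulo C is required")
    return [(j * P + theta + offsets[j % C]) % L for j in range(L)]


# Output sequence as listed in the text for the C=4 example (L=24).
LISTED = [0, 11, 22, 5, 4, 15, 2, 9, 8, 19, 6, 13,
          12, 23, 10, 17, 16, 3, 14, 21, 20, 7, 18, 1]

# Assumed parameters (not stated in the text) that regenerate the listed output.
pi = arp_interleave(L=24, P=7, C=4, offsets=[0, 4, 8, 8])
print(pi == LISTED)
```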
Another example of an ARP interleave is provided here:
If the inputs of the following are provided to such an ARP interleave (π),
0, 1, 2, 3 | 4, 5, 6, 7 | 8, 9, 10, 11 | 12, 13, 14, 15 | 16, 17, 18, 19,
then the output thereof is as follows:
1, 6, 16, 15 | 13, 18, 8, 7 | 5, 10, 0, 19 | 17, 2, 12, 11 | 9, 14, 4, 3.
There are some special cases for ARP as well.
Case 1:
A(x)=Cα(x), and B(x)=Cβ(x)
When θ=0, this case corresponds to equations (10), (11) and (12) in reference [1].
When θ=3 and C=4, this case corresponds to reference [2]: France Telecom, GET, “Enhancement of Rel. 6 turbo Code,” 3GPP TSG RAN WG1#43, R1-051310, 2005.
When θ=3 and C=4 or C=8, this case corresponds to Table 1 of reference [3]: Motorola, “A contention-free interleaver design for LTE codes,” 3GPP TSG RAN WG1#47.
Case 2:
This case corresponds to equation (13) in reference [1].
In addition, certain properties of ARP are also provided below:
Property 1:
x0=x1 mod C, which implies that π(x0)=π(x1)mod C.
Proof: Set x1=x0+kC. Then π(x1)−π(x0)=PkC mod L, which is 0 mod C because C|L.
Property 2:
Define Ψ: {0, 1, . . . , C−1}→{0, 1, . . . , C−1} by Ψ(u)=π(u) mod C.
If π is a permutation, then Ψ is a bijection.
Proof: Assume u0,u1∈{0, 1, . . . , C−1}, u0≠u1, but Ψ(u0)=Ψ(u1). There are L/C elements in {0, 1, . . . , L−1} congruent to u0 modulo C and another L/C elements congruent to u1 modulo C. So, by Property 1, there are 2L/C elements u such that π(u) has the same residue modulo C. Since only L/C values in {0, 1, . . . , L−1} share any given residue modulo C, this contradicts the hypothesis that π is a permutation.
Property 3:
Define Ψ: {0, 1, . . . , C−1}→{0, 1, . . . , C−1} by Ψ(u)=π(u) mod C.
If Ψ is a bijection, then π is a permutation.
Proof: Assume x0,x1∈{0, 1, . . . , L−1}, x0<x1, but π(x0)=π(x1). Let u0=x0 mod C and u1=x1 mod C. If u0≠u1, then π(x0) mod C≠π(x1) mod C since Ψ is a bijection, which contradicts π(x0)=π(x1). If u0=u1, then let x1=x0+kC for a k∈{0, 1, . . . , L/C−1}. So, π(x1)−π(x0)=PkC mod L. Since gcd(P,L)=1 and C|L, PkC mod L=0 implies that L/C divides k. By the range on k, this forces k=0, which contradicts x0<x1; so, π(x0)=π(x1) can only occur if x0=x1.
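Properties 1 to 3 can be spot-checked numerically. The Python sketch below uses the 24-element ARP output listed earlier in this document with C=4; the helper names `property1_holds` and `residue_map` are hypothetical and are introduced only for this check.

```python
# ARP output listed in the text for L=24, C=4 (pi[j] is the interleaved position of j).
PI = [0, 11, 22, 5, 4, 15, 2, 9, 8, 19, 6, 13,
      12, 23, 10, 17, 16, 3, 14, 21, 20, 7, 18, 1]
C = 4


def property1_holds(pi, C):
    """Property 1: x0 = x1 mod C implies pi(x0) = pi(x1) mod C."""
    return all(pi[x] % C == pi[x % C] % C for x in range(len(pi)))


def residue_map(pi, C):
    """Return [psi(0), ..., psi(C-1)] with psi(u) = pi(u) mod C."""
    return [pi[u] % C for u in range(C)]


psi = residue_map(PI, C)
print(property1_holds(PI, C))             # Property 1 on the listed table
print(psi, sorted(psi) == list(range(C)))  # psi and whether it is a bijection
print(sorted(PI) == list(range(len(PI))))  # pi itself is a permutation
```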
Property 4:
If π(x) is an ARP interleave (π) with period C, then π(x) is also an ARP interleave (π) with period C̃=mC provided C̃|L.
Proof: Let π(x)=[Px+A(x mod C)P+B(x mod C)+θ] mod L.
Clearly, π(x) can also be written as follows:
So, if C̃|L, then π(x) is an ARP interleave (π) with period C̃.
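The rewritten form of π(x) used in this proof is not reproduced in the text. One way to write it, under the assumption that the functions A and B are extended periodically to the larger period (the names \tilde{A} and \tilde{B} are introduced here for illustration only), is sketched below.

```latex
\tilde{A}(u) = A(u \bmod C), \qquad
\tilde{B}(u) = B(u \bmod C), \qquad
u \in \{0, 1, \ldots, \tilde{C}-1\},\ \tilde{C} = mC,
\\[4pt]
\pi(x) = \bigl[\, Px + \tilde{A}(x \bmod \tilde{C})\,P + \tilde{B}(x \bmod \tilde{C}) + \theta \,\bigr] \bmod L .
```

Because C divides C̃, (x mod C̃) mod C equals x mod C, so this expression takes the same value as the original definition for every x.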
Another form of memory mapping, referred to as modular mapping (i.e., MOD mapping for short), has been defined in references [1] and [2] cited below.
According to this MOD mapping approach,
$\mathcal{M}_{\mathrm{MOD}}: i \mapsto i \bmod M$, where
M is the number of decoding processors,
C is the period of the ARP interleave (π), and
M is a multiple of C.
This MOD mapping approach is contention-free only if gcd(W,M)=1.
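The role of gcd(W,M) can be seen by brute force for the natural-order phase. The Python sketch below is a minimal check under that restriction only (the interleaved-order phase additionally depends on the particular interleave); the function names are assumptions of this sketch.

```python
from math import gcd


def mod_mapping(i, M):
    """MOD mapping: index i is assigned to memory bank i mod M."""
    return i % M


def natural_order_contention_free(W, M):
    """Check every natural-order index set {t, W+t, ..., (M-1)W+t} under MOD mapping."""
    for t in range(W):
        banks = [mod_mapping(s * W + t, M) for s in range(M)]
        if len(set(banks)) != M:
            return False
    return True


for W, M in [(5, 4), (6, 4)]:
    print(W, M, gcd(W, M), natural_order_contention_free(W, M))
```

With W=5 and M=4 the banks differ within every cycle, whereas with W=6 and M=4 the common factor 2 makes pairs of processors share a bank.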
Each of these embodiments 1501 and 1502 corresponds to the situation of MOD mapping when gcd(W,M)=1. Also, these embodiments 1501 and 1502 employ MOD mapping on index sets (W=5, C=4, M=4 and gcd(W,M)=1).
Each of these embodiments 1501 and 1502 employs a plurality of memory banks 1510 that includes 4 memory banks.
During a first decoding cycle, the memory mapping as shown in embodiment 1501 is as follows for the natural order when performing turbo decoding processing.
E0={0,5,10,15}→{0,1,2,3}
Also during the first decoding cycle, the memory mapping as shown in embodiment 1501 is as follows for the interleaved (π) order when performing turbo decoding processing.
Ê0={0,19,2,17}→{0,3,2,1}
During a second decoding cycle, the memory mapping as shown in embodiment 1502 is as follows for the natural order when performing turbo decoding processing.
E1={1,6,11,16}→{1,2,3,0}
Also during the second decoding cycle, the memory mapping as shown in embodiment 1502 is as follows for the interleaved (π) order when performing turbo decoding processing.
Ê1={11,14,9,12}→{3,2,1,0}
During subsequent decoding cycles (e.g., decoding iterations), the memory mapping between processors and memories is as follows:
Third decoding cycle (natural order/interleaved order) is E2={2,7,12,17}→{2,3,0,1}/Ê2={6,1,4,3}→{2,1,0,3}.
Fourth decoding cycle (natural order/interleaved order) is E3={3,8,13,18}→{3,0,1,2}/Ê3={13,16,15,18}→{1,0,3,2}.
Fifth decoding cycle (natural order/interleaved order) is E4={4,9,14,19}→{0,1,2,3}/Ê4={8,7,10,5}→{0,3,2,1}.
As can be seen in this embodiment, the natural order and the interleaved order are both contention-free.
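The bank numbers listed for embodiments 1501 and 1502 can be regenerated by applying i mod M to each index set. The Python sketch below transcribes the index sets from the text; the names `NATURAL`, `INTERLEAVED`, and `banks` are assumptions of this sketch.

```python
M = 4  # number of decoding processors and memory banks in this example

# Index sets transcribed from the five decoding cycles above (W=5).
NATURAL = [[0, 5, 10, 15], [1, 6, 11, 16], [2, 7, 12, 17],
           [3, 8, 13, 18], [4, 9, 14, 19]]
INTERLEAVED = [[0, 19, 2, 17], [11, 14, 9, 12], [6, 1, 4, 3],
               [13, 16, 15, 18], [8, 7, 10, 5]]


def banks(index_set, M):
    """MOD mapping applied to one index set: bank of index i is i mod M."""
    return [i % M for i in index_set]


for t, (nat, itl) in enumerate(zip(NATURAL, INTERLEAVED)):
    print(t, banks(nat, M), banks(itl, M))
```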
However, in many of the previous embodiments, particularly those that employ an ARP interleave (π) within the turbo encoding and turbo decoding, there is a restriction on the number of decoding processors, M, that can be employed. For example, these previous embodiments necessitate that the number of decoding processors, M, be a factor of the length of the information block, L.
The following provides a means by which an arbitrarily selected number (M) of decoding processors can be employed for performing parallel turbo decoding processing.
In doing so, a scheduled block length, L′, is judiciously chosen such that the arbitrarily selected number (M) of decoding processors can be employed in conjunction with an appropriate memory mapping that is contention-free.
Each of these embodiments 1601 and 1602 employs a plurality of memory banks 1610 that includes 4 memory banks.
During a first decoding cycle, the memory mapping as shown in embodiment 1601 is as follows for the natural order when performing turbo decoding processing.
E0={0,6,12,18}→{0,2,0,2}
Also during the first decoding cycle, the memory mapping as shown in embodiment 1601 is as follows for the interleaved (π) order when performing turbo decoding processing.
Ê0={0,2,12,14}→{0,2,0,2}
During a second decoding cycle, the memory mapping as shown in embodiment 1602 is as follows for the natural order when performing turbo decoding processing.
E1={1,7,13,19}→{1,3,1,3}
Also during the second decoding cycle, the memory mapping as shown in embodiment 1602 is as follows for the interleaved (π) order when performing turbo decoding processing.
Ê1={11,9,23,21}→{3,1,3,1}
During subsequent decoding cycles (e.g., decoding iterations), the memory mapping between processors and memories is as follows:
Third decoding cycle (natural order/interleaved order) is E2={2,8,14,20}→{2,0,2,0}/Ê2={22,8,10,20}→{2,0,2,0}.
Fourth decoding cycle (natural order/interleaved order) is E3={3,9,15,21}→{3,1,3,1}/Ê3={5,19,17,7}→{1,3,1,3}.
Fifth decoding cycle (natural order/interleaved order) is E4={4,10,16,22}→{0,2,0,2}/Ê4={4,6,16,18}→{0,2,0,2}.
Sixth decoding cycle (natural order/interleaved order) is E5={5,11,17,23}→{1,3,1,3}/Ê5={15,13,3,1}→{3,1,3,1}.
As can be seen in this embodiment, neither the natural order nor the interleaved order is contention-free.
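The collisions above arise because W=6 and M=4 share a common factor, so i mod M takes only two values within each index set. The sketch below lists the banks that receive more than one access in each cycle, using the index sets from the text; `colliding_banks` is a hypothetical helper introduced only for this illustration.

```python
from collections import Counter
from math import gcd

W, M = 6, 4

# Natural-order index sets E_t = {t, W+t, ..., (M-1)W+t}.
natural = [[s * W + t for s in range(M)] for t in range(W)]

# Interleaved-order index sets, copied from the text.
interleaved = [
    [0, 2, 12, 14],
    [11, 9, 23, 21],
    [22, 8, 10, 20],
    [5, 19, 17, 7],
    [4, 6, 16, 18],
    [15, 13, 3, 1],
]

def colliding_banks(index_set, num_banks=M):
    """Return the banks that receive more than one access under i mod M."""
    counts = Counter(i % num_banks for i in index_set)
    return sorted(bank for bank, n in counts.items() if n > 1)

print("gcd(W, M) =", gcd(W, M))
for t in range(W):
    print(f"cycle {t}: natural collisions {colliding_banks(natural[t])}, "
          f"interleaved collisions {colliding_banks(interleaved[t])}")
```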
There is another form of memory mapping, referred to as division mapping (i.e., DIV mapping for short) that has been defined in reference [4] cited below.
According to this DIV mapping approach,
$\mathcal{M}_{\mathrm{DIV}}: i \mapsto \lfloor i/W \rfloor$, where W is the window size of the parallel decoding architecture.
The index set at the i-th decoding cycle is as follows:
Ei={i,W+i, . . . , (M−1)W+i}, where
M is the number of processors, and C is the period of the ARP interleave (π).
Also, if M is a factor of the ratio, L/C, then the map on an ARP interleave (π) is in fact contention-free.
It is noted, however, that examples in the reference [3] and reference [4] cited below do not have this property.
Each of these embodiments 1701 and 1702 employs a plurality of memory banks 1710 that includes 3 memory banks.
During a first decoding cycle, the memory mapping as shown in embodiment 1701 is as follows for the natural order when performing turbo decoding processing.
E0={0,8,16}→{0,1,2}
Also during the first decoding cycle, the memory mapping as shown in embodiment 1701 is as follows for the interleaved (π) order when performing turbo decoding processing.
Ê0={0,8,16}→{0,1,2}
During a second decoding cycle, the memory mapping as shown in embodiment 1702 is as follows for the natural order when performing turbo decoding processing.
E1={1,9,17}→{0,1,2}
Also during the second decoding cycle, the memory mapping as shown in embodiment 1702 is as follows for the interleaved (π) order when performing turbo decoding processing.
Ê1={11,19,3}→{1,2,0}
During subsequent decoding cycles (e.g., decoding iterations), the memory mapping between processors and memories is as follows:
Third decoding cycle (natural order/interleaved order) is E2={2,10,18}→{0,1,2}/Ê2={22,6,14}→{2,0,1}.
Fourth decoding cycle (natural order/interleaved order) is E3={3,11,19}→{0,1,2}/Ê3={5,13,21}→{0,1,2}.
Fifth decoding cycle (natural order/interleaved order) is E4={4,12,20}→{0,1,2}/Ê4={4,12,20}→{0,1,2}.
Sixth decoding cycle (natural order/interleaved order) is E5={5,13,21}→{0,1,2}/Ê5={15,23,7}→{1,2,0}.
Seventh decoding cycle (natural order/interleaved order) is E6={6,14,22}→{0,1,2}/Ê6={2,10,18}→{0,1,2}.
Eighth decoding cycle (natural order/interleaved order) is E7={7,15,23}→{0,1,2}/Ê7={9,17,1}→{1,2,0}.
As can be seen, the natural order and the interleaved order are both contention-free.
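As a minimal sketch of the DIV mapping for this case (window size W=8, M=3), the following Python code computes the bank ⌊i/W⌋ for every natural-order and interleaved-order index set. The table `PI` is copied from the interleaved order listed for the L=24 toy example elsewhere in this description; the function names are assumptions made for illustration.

```python
# PI[x] is the interleaved index pi(x) of the 24-bit toy example.
PI = [0, 11, 22, 5, 4, 15, 2, 9, 8, 19, 6, 13,
      12, 23, 10, 17, 16, 3, 14, 21, 20, 7, 18, 1]
L, M = len(PI), 3
W = L // M  # window size, 8 in this case

def div_bank(i, window=W):
    """DIV mapping: bit index i is stored in bank floor(i / W)."""
    return i // window

def cycle_banks(t):
    """Banks accessed at cycle t in (natural order, interleaved order)."""
    natural = [s * W + t for s in range(M)]
    interleaved = [PI[x] for x in natural]
    return [div_bank(i) for i in natural], [div_bank(i) for i in interleaved]

for t in range(W):
    nat, itl = cycle_banks(t)
    print(f"cycle {t}: natural banks {nat}, interleaved banks {itl}")
```

In every cycle, each of the three banks appears exactly once per order.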
Each of these embodiments 1801 and 1802 corresponds to the situation when M is not a factor of L/C; in other words, M does not divide L/C. Also, these embodiments 1801 and 1802 employ DIV mapping on index sets (C=4, M=4 which is not a factor of L/C=6, and the window size, W=6).
Each of these embodiments 1801 and 1802 employs a plurality of memory banks 1810 that includes 4 memory banks.
During a first decoding cycle, the memory mapping as shown in embodiment 1801 is as follows for the natural order when performing turbo decoding processing.
E0={0,6,12,18}→{0,1,2,3}
Also during the first decoding cycle, the memory mapping as shown in embodiment 1801 is as follows for the interleaved (π) order when performing turbo decoding processing.
Ê0={0,2,12,14}→{0,0,2,2}
During a second decoding cycle, the memory mapping as shown in embodiment 1802 is as follows for the natural order when performing turbo decoding processing.
E1={1,7,13,19}→{0,1,2,3}
Also during the second decoding cycle, the memory mapping as shown in embodiment 1802 is as follows for the interleaved (π) order when performing turbo decoding processing.
Ê1={11,9,23,21}→{1,1,3,3}
During subsequent decoding cycles (e.g., decoding iterations), the memory mapping between processors and memories is as follows:
Third decoding cycle (natural order/interleaved order) is E2={2,8,14,20}→{0,1,2,3}/Ê2={22,8,10,20}→{3,1,1,3}.
Fourth decoding cycle (natural order/interleaved order) is E3={3,9,15,21}→{0,1,2,3}/Ê3={5,19,17,7}→{0,3,2,1}.
Fifth decoding cycle (natural order/interleaved order) is E4={4,10,16,22}→{0,1,2,3}/Ê4={4,6,16,18}→{0,1,2,3}.
Sixth decoding cycle (natural order/interleaved order) is E5={5,11,17,23}→{0,1,2,3}/Ê5={15,13,3,1}→{2,2,0,0}.
As can be seen in this embodiment, the natural order is contention-free, but the interleaved order is not contention-free.
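The dependence on whether M is a factor of L/C can be explored by sweeping the number of processors. The sketch below tests the DIV mapping on the L=24 toy interleave (table `PI` copied from the interleaved order listed elsewhere in this description) for every M that divides L; `div_is_contention_free` is a hypothetical helper, and the sweep only illustrates this one interleave rather than proving the general statement.

```python
# PI[x] is the interleaved index pi(x) of the 24-bit toy example.
PI = [0, 11, 22, 5, 4, 15, 2, 9, 8, 19, 6, 13,
      12, 23, 10, 17, 16, 3, 14, 21, 20, 7, 18, 1]
L, C = len(PI), 4

def div_is_contention_free(M):
    """Check the DIV mapping floor(i / W) over all cycles, in both orders."""
    W = L // M
    for t in range(W):
        indices = [s * W + t for s in range(M)]
        natural_banks = {x // W for x in indices}
        interleaved_banks = {PI[x] // W for x in indices}
        if len(natural_banks) < M or len(interleaved_banks) < M:
            return False
    return True

for M in (m for m in range(1, L + 1) if L % m == 0):
    is_factor = (L // C) % M == 0
    print(f"M={M:2d}: factor of L/C: {is_factor}, "
          f"contention-free: {div_is_contention_free(M)}")
```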
In accordance with the ARP interleave (π) described above, a novel contention-free memory mapping for the ARP interleave (π) is presented below (depicted as ADD or ADD mapping). For an ARP interleave (π) of arbitrary size, the following memory mapping is contention-free:
$\mathcal{M}_{\mathrm{ADD}}: x \mapsto \left(x + \lfloor x/(qW) \rfloor\right) \bmod M$,
where
C: period of ARP;
M: number of processors, where M is a multiple of C (i.e., M=mC, where m is an integer);
W: window size of parallel decoding; and
q: the smallest positive integer with the property qW≡0 mod(M).
It is asserted that this ADD mapping is contention-free between a plurality of processors and a plurality of memories (e.g., memory banks) for an ARP interleave (π) of arbitrary size and when employing any number M of parallel implemented decoders.
This ADD mapping maps the index sets at the t-th decoding cycle to different memories (e.g., memory banks) as follows:
Et={t,W+t, . . . , (M−1)W+t}, and
Êt={π(t),π(W+t), . . . , π((M−1)W+t)}.
Each of these embodiments 1901 and 1902 corresponds to the situation C=4, the window size W=6, q=2, and qW=12.
Each of these embodiments 1901 and 1902 employs a plurality of memory banks 1910 that includes 4 memory banks.
During a first decoding cycle, the memory mapping as shown in embodiment 1901 is as follows for the natural order when performing turbo decoding processing.
Also during the first decoding cycle, the memory mapping as shown in embodiment 1901 is as follows for the interleaved (π) order when performing turbo decoding processing.
During a second decoding cycle, the memory mapping as shown in embodiment 1902 is as follows for the natural order when performing turbo decoding processing.
Also during the second decoding cycle, the memory mapping as shown in embodiment 1902 is as follows for the interleaved (π) order when performing turbo decoding processing.
During subsequent decoding cycles (e.g., decoding iterations), the memory mapping between processors and memories is as follows:
Third decoding cycle (natural order) is:
Third decoding cycle (interleaved order) is:
Fourth decoding cycle (natural order) is:
Fourth decoding cycle (interleaved order) is:
Fifth decoding cycle (natural order) is:
Fifth decoding cycle (interleaved order) is:
Sixth decoding cycle (natural order) is:
Sixth decoding cycle (interleaved order) is:
As can be seen, the natural order and the interleaved order are both contention-free.
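The per-cycle bank assignments for this case can be computed with a short sketch. It assumes that the ADD mapping has the form x ↦ (x + ⌊x/(qW)⌋) mod M, which is consistent with the reduction to x mod(8) described for the qW>L−1 case and with the per-bank access patterns described for the address generation embodiments; this form, the table `PI` (copied from the interleaved order of the L=24 toy example), and all function names are assumptions of the illustration.

```python
# PI[x] is the interleaved index pi(x) of the 24-bit toy example.
PI = [0, 11, 22, 5, 4, 15, 2, 9, 8, 19, 6, 13,
      12, 23, 10, 17, 16, 3, 14, 21, 20, 7, 18, 1]
C, W, M = 4, 6, 4

def smallest_q(window, num_banks):
    """Smallest positive integer q with q*W congruent to 0 mod M."""
    q = 1
    while (q * window) % num_banks != 0:
        q += 1
    return q

q = smallest_q(W, M)

def add_bank(x):
    """Assumed form of the ADD mapping: (x + floor(x / (q*W))) mod M."""
    return (x + x // (q * W)) % M

def cycle_banks(t):
    """Banks accessed at cycle t in (natural order, interleaved order)."""
    natural = [s * W + t for s in range(M)]
    interleaved = [PI[x] for x in natural]
    return [add_bank(x) for x in natural], [add_bank(x) for x in interleaved]

for t in range(W):
    nat, itl = cycle_banks(t)
    print(f"cycle {t}: natural banks {nat}, interleaved banks {itl}")
```

Under this assumed form, each cycle touches all four banks exactly once in both orders.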
This embodiment 2000 corresponds to the situation C=4, m=2, the window size, W=3, q=8, and qW=24.
Since qW>L−1, we have $\lfloor x/24 \rfloor = 0$ for all x<L, which then implies:
$\mathcal{M}_{\mathrm{ADD}}: x \mapsto x \bmod 8$
This embodiment 2000 employs a plurality of memory banks 2010 that includes 8 memory banks.
During a first decoding cycle, the memory mapping as shown in embodiment 2001 is as follows for the natural order when performing turbo decoding processing.
Also during the first decoding cycle, the memory mapping as shown in embodiment 2001 is as follows for the interleaved (π) order when performing turbo decoding processing.
During subsequent decoding cycles (e.g., decoding iterations), the memory mapping between processors and memories is as follows:
Second decoding cycle (natural order) is:
Second decoding cycle (interleaved order) is:
Third decoding cycle (natural order) is:
Third decoding cycle (interleaved order) is:
As can be seen, the natural order and the interleaved order are both contention-free.
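For the eight-bank case, the following sketch confirms that the floor term vanishes for every index of the L=24 toy example, so that the mapping reduces to x mod 8, and that all three decoding cycles are contention-free. As before, the form (x + ⌊x/(qW)⌋) mod M for the ADD mapping and the helper names are assumptions; `PI` is copied from the interleaved order of the toy example.

```python
# PI[x] is the interleaved index pi(x) of the 24-bit toy example.
PI = [0, 11, 22, 5, 4, 15, 2, 9, 8, 19, 6, 13,
      12, 23, 10, 17, 16, 3, 14, 21, 20, 7, 18, 1]
L, W, M = len(PI), 3, 8

# Smallest positive q with q*W congruent to 0 mod M.
q = next(k for k in range(1, M + 1) if (k * W) % M == 0)

def add_bank(x):
    """Assumed form of the ADD mapping: (x + floor(x / (q*W))) mod M."""
    return (x + x // (q * W)) % M

def cycle_banks(t):
    """Banks accessed at cycle t in (natural order, interleaved order)."""
    natural = [s * W + t for s in range(M)]
    interleaved = [PI[x] for x in natural]
    return [add_bank(x) for x in natural], [add_bank(x) for x in interleaved]

print("q =", q, "and q*W =", q * W)
for t in range(W):
    nat, itl = cycle_banks(t)
    print(f"cycle {t}: natural banks {nat}, interleaved banks {itl}")
```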
The address mapping considered here is $\mathcal{A}: x \mapsto x \bmod W$.
This straightforward address generation 2100 can be applied to a variety of types of memory mapping, but it will be seen that the subsequent complexity in implementation is sometimes undesirable and sometimes can incur certain deleterious effects such as a reduction in decoding speed.
Referring specifically to the
In accordance with the natural order phase of turbo decoding processing (e.g., the SISO 0 decoding operations), the addresses are accessed sequentially, as indicated by reference numeral 2103. The mapping (in this case the ADD mapping) determines which data from which particular memory bank is provided to which processor. Nevertheless, the following can be seen:
1. The top data entry of each memory bank is accessed at time 0.
2. The 2nd to top data entry of each memory bank is accessed at time 1.
3. The 3rd to top data entry of each memory bank is accessed at time 2.
4. The 4th to top data entry of each memory bank is accessed at time 3.
5. The 2nd to bottom data entry of each memory bank is accessed at time 4.
6. The bottom data entry of each memory bank is accessed at time 5.
In accordance with the interleaved (π) order phase of turbo decoding processing (e.g., the SISO 1 decoding operations), the addresses are accessed in a different order (i.e., not necessarily sequentially), depending on the memory mapping and the interleave (π) employed, as indicated by reference numeral 2104. The mapping (in this case the ADD mapping) determines which data from which particular memory bank is provided to which processor. Nevertheless, the following can be seen when considering each of the memory banks.
Considering memory bank B0 2120:
1. The top data entry of memory bank B0 2120 is accessed at time 0.
2. The bottom data entry of memory bank B0 2120 is accessed at time 1.
3. The 3rd to top data entry of memory bank B0 2120 is accessed at time 2.
4. The 2nd to top data entry of memory bank B0 2120 is accessed at time 3.
5. The 2nd to bottom data entry of memory bank B0 2120 is accessed at time 4.
6. The 4th to top data entry of memory bank B0 2120 is accessed at time 5.
Considering memory bank B1 2121:
1. The top data entry of memory bank B1 2121 is accessed at time 0.
2. The 4th to top data entry of memory bank B1 2121 is accessed at time 1.
3. The 3rd to top data entry of memory bank B1 2121 is accessed at time 2.
4. The bottom data entry of memory bank B1 2121 is accessed at time 3.
5. The 2nd to bottom data entry of memory bank B1 2121 is accessed at time 4.
6. The 2nd to top data entry of memory bank B1 2121 is accessed at time 5.
Analogous observations can be made when considering memory bank B2 2122 and memory bank B3 2123.
Referring to
1. At a cycle t, processor $P_s$ computes bit position $x_s$ ($sW+t$ in the natural order phase; $\pi(sW+t)$ in the interleaved (π) order phase).
2. $P_s$ computes $\mathcal{M}(x_s)$ (e.g., which memory bank) and $\mathcal{A}(x_s)$ (e.g., which location within the memory bank) for the bank and address of the memory location for $x_s$.
3. The cross-bar switch 2310 is configured according to the set of $\mathcal{M}(x_s)$'s to connect each processor $P_s$ with the corresponding bank $\mathcal{M}(x_s)$.
4. The address $\mathcal{A}(x_s)$ from processor $P_s$ is sent to the corresponding bank $\mathcal{M}(x_s)$ so that the value at bit position $x_s$ can be accessed.
Described another way, suppose that the address mapping is as follows:
$\mathcal{A}: x \mapsto x \bmod W$.
In the natural order phase of parallel turbo decoding processing, the addresses are accessed sequentially, which is relatively quick and easy to implement.
In the interleaved (π) order phase of parallel turbo decoding processing, the addresses are calculated by the processors and then sent to the appropriate memory bank. This results in a very long cycle time, and the cross-bar switch 2310 is required both to provide and to translate the memory addresses and the data values between the memory banks and the processors.
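The processor-driven flow described above can be simulated in a few lines. In this sketch each processor computes its bit position, the bank, and the address x mod W, and the resulting address sequence seen by each bank is collected. The ADD mapping is assumed to have the form (x + ⌊x/(qW)⌋) mod M with q=2 for W=6 and M=4; that form, the table `PI` (copied from the interleaved order of the L=24 toy example), and the function names are assumptions of the illustration.

```python
# PI[x] is the interleaved index pi(x) of the 24-bit toy example.
PI = [0, 11, 22, 5, 4, 15, 2, 9, 8, 19, 6, 13,
      12, 23, 10, 17, 16, 3, 14, 21, 20, 7, 18, 1]
W, M, Q = 6, 4, 2

def bank(x):
    """Assumed ADD memory mapping: which bank holds bit position x."""
    return (x + x // (Q * W)) % M

def address(x):
    """Straightforward address mapping: location x mod W within the bank."""
    return x % W

def bank_address_sequences(phase):
    """Addresses each bank receives from the processors, one per cycle."""
    sequences = {b: [] for b in range(M)}
    for t in range(W):                 # decoding cycle
        for s in range(M):             # processor index
            x = s * W + t if phase == "natural" else PI[s * W + t]
            sequences[bank(x)].append(address(x))
    return sequences

print("natural:    ", bank_address_sequences("natural"))
print("interleaved:", bank_address_sequences("interleaved"))
```

Address 0 corresponds to the top data entry and address 5 to the bottom one, so the interleaved sequences for banks 0 and 1 agree with the access patterns listed above for memory banks B0 and B1.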
Initially, it is supposed that the address mapping is as follows:
$\mathcal{A}: x \mapsto x \bmod W$.
The question may then be posed as to whether or not it is possible for each memory bank to calculate its own address sequence instead of necessitating that it be provided from the processors via the cross-bar switch. If the above-supposed address mapping ($\mathcal{A}: x \mapsto x \bmod W$) is employed, then this can actually be performed, but it is very difficult to implement.
A novel approach is provided herein by which an index function, referred to as $\mathcal{I}$, is calculated that gives the bit index x of the data value in the memory bank b at a cycle t of the interleaved (π) order phase of parallel turbo decoding processing. For example, the following index function, $\mathcal{I}$, is calculated for this particular address mapping ($\mathcal{A}: x \mapsto x \bmod W$).
$\mathcal{I}(b,t)=x$, which implies that $\mathcal{M}(x)=b$ and $x=\pi(sW+t)$ for some s.
This index function, $\mathcal{I}$, however, is generally very hard to construct for this particular address mapping ($\mathcal{A}: x \mapsto x \bmod W$) because the equation $x=\pi(sW+t)$ may have some congruence with mod(W). One exception to this would be the DIV memory mapping (e.g., $\mathcal{M}_{\mathrm{DIV}}$), and this is because therein the address $\mathcal{A}(\pi(sW+t))$ is independent of s for that particular memory mapping.
Therefore, a novel address generation means is presented that addresses the memory mapping instead in accordance with the inverse interleave (π−1) order as follows:
$\mathcal{A}: x \mapsto \pi^{-1}(x) \bmod W$
As the interleave (π) is calculated, the inverse thereof (i.e., the inverse interleave (π−1)) can also be saved.
In addition, the inverse interleave (π−1) can also be evaluated as another ARP interleave (π) as described in the following commonly-assigned U.S. provisional patent application and U.S. utility patent application, the contents of which are hereby incorporated herein by reference in their entirety for all purposes:
1. U.S. Provisional Application Ser. No. 60/872,367, entitled “Turbo decoder employing ARP (almost regular permutation) interleave and inverse thereof as de-interleave,” filed Dec. 1, 2006, pending.
2. U.S. Utility application Ser. No. 11/657,819, entitled “Turbo decoder employing ARP (almost regular permutation) interleave and inverse thereof as de-interleave,” filed Jan. 25, 2007, pending.
An example of calculating the inverse interleave (π−1) for the toy example (several embodiments of which are provided above) is shown below:
Consider the following parameters: L=24, C=4, and P=7. Then $P^{-1}=7$, since $P \cdot P^{-1} \equiv 1 \pmod{L}$.
The natural order phase (when considering 4 processors) is as follows:
0, 1, 2, 3, 4, 5×6, 7, 8, 9, 10, 11×12, 13, 14, 15, 16, 17×18, 19, 20, 21, 22, 23.
The interleaved (π) order phase (when considering 4 processors) is as follows:
0, 11, 22, 5, 4, 15×2, 9, 8, 19, 6, 13×12, 23, 10, 17, 16, 3×14, 21, 20, 7, 18, 1.
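The two listings above can be checked with a short sketch that confirms P·P⁻¹≡1 mod(L) and builds the inverse interleave by inverting the table directly. Table inversion is used here only as a simple stand-in; it is not the ARP-based evaluation of the inverse referenced above. The names `PI` and `PI_INV` are illustrative.

```python
# Interleaved order of the toy example: PI[x] = pi(x).
PI = [0, 11, 22, 5, 4, 15, 2, 9, 8, 19, 6, 13,
      12, 23, 10, 17, 16, 3, 14, 21, 20, 7, 18, 1]
L, P = 24, 7

# Modular inverse of P (three-argument pow with exponent -1 needs Python 3.8+).
P_inv = pow(P, -1, L)
print("P^-1 =", P_inv, "and P * P^-1 mod L =", (P * P_inv) % L)

# Inverse interleave by table inversion: PI_INV[pi(x)] = x.
PI_INV = [0] * L
for x, y in enumerate(PI):
    PI_INV[y] = x

print("pi^-1 =", PI_INV)
```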
Referring back to the
The address generation 2400 using the natural order is as follows:
$\mathcal{A}: x \mapsto \pi^{-1}(x) \bmod W$.
This anticipatory address generation 2400 can be applied to a variety of types of memory mapping, and it provides for significantly improved decoding speed when compared to the straightforward address generation as described above.
Referring specifically to the
In accordance with the natural order phase of turbo decoding processing (e.g., the SISO 0 decoding operations), the addresses are accessed in a different order (i.e., not necessarily sequentially), depending on the memory mapping and the interleave (π) employed, as indicated by reference numeral 2402. The mapping (in this case the ADD mapping) determines which data from which particular memory bank is provided to which processor. Nevertheless, the following can be seen when considering each of the memory banks.
Considering memory bank B0 2420:
1. The top data entry of memory bank B0 2420 is accessed at time 0.
2. The 4th to top data entry of memory bank B0 2420 is accessed at time 1.
3. The 3rd to top data entry of memory bank B0 2420 is accessed at time 2.
4. The bottom data entry of memory bank B0 2420 is accessed at time 3.
5. The 2nd to bottom data entry of memory bank B0 2420 is accessed at time 4.
6. The 2nd to top data entry of memory bank B0 2420 is accessed at time 5.
Considering memory bank B1 2421:
1. The top data entry of memory bank B1 2421 is accessed at time 0.
2. The bottom data entry of memory bank B1 2421 is accessed at time 1.
3. The 3rd to top data entry of memory bank B1 2421 is accessed at time 2.
4. The 2nd to top data entry of memory bank B1 2421 is accessed at time 3.
5. The 2nd to bottom data entry of memory bank B1 2421 is accessed at time 4.
6. The 4th to top data entry of memory bank B1 2421 is accessed at time 5.
Analogous observations can be made when considering memory bank B2 2422 and memory bank B3 2423.
In accordance with the interleaved (π) order of turbo decoding processing (e.g., the SISO 1 decoding operations), the addresses are accessed sequentially, as indicated by reference numeral 2404. The mapping (in this case the ADD mapping) determines which data from which particular memory bank is provided to which processor. Nevertheless, the following can be seen:
1. The top data entry of each memory bank is accessed at time 0.
2. The 2nd to top data entry of each memory bank is accessed at time 1.
3. The 3rd to top data entry of each memory bank is accessed at time 2.
4. The 4th to top data entry of each memory bank is accessed at time 3.
5. The 2nd to bottom data entry of each memory bank is accessed at time 4.
6. The bottom data entry of each memory bank is accessed at time 5.
A cross-bar switch 2610 is employed to perform the appropriate providing of data values from each memory bank to the appropriate processor, but this embodiment is significantly less complex than the embodiment described above with respect to
Referring to
1. At a cycle t, processor $P_s$ computes bit position $x_s$ ($sW+t$ in the natural order phase; $\pi(sW+t)$ in the interleaved (π) order phase).
2. $P_s$ computes $\mathcal{M}(x_s)$ (e.g., which memory bank) for the bank of the memory location for $x_s$.
3. The cross-bar switch 2610 is configured according to the set of $\mathcal{M}(x_s)$'s to connect each processor $P_s$ with the corresponding bank $\mathcal{M}(x_s)$.
4. Simultaneously, the memory generator of each memory bank determines the address $\mathcal{A}(x_s)$, so that the value at bit position $x_s$ can be accessed and provided to the appropriate processor via the cross-bar switch 2610.
Described another way, suppose that the address mapping is as follows:
$\mathcal{A}: x \mapsto \pi^{-1}(x) \bmod W$.
In the natural order phase of parallel turbo decoding processing, the addresses are calculated by each of the memory generators corresponding to each memory bank. For example, memory generator 2620 employs the calculated index function, which then provides the addresses of the data entries of the sub-block that have been stored in memory bank B0. Analogously, memory generators 2621, 2622, and 2623 employ the calculated index function to provide the addresses of the data entries of the sub-blocks that have been stored in memory banks B1, B2, and B3, respectively.
In the interleaved (π) order phase of parallel turbo decoding processing, the addresses are simply accessed sequentially, which is very simple and quick. This use of the memory generators to calculate the addresses based on the index function allows for a much improved decoding speed in accordance with parallel turbo decoding processing.
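A minimal sketch of this anticipatory scheme follows. Each memory generator derives its own address sequence for the natural order phase from an index function, without any address arriving from a processor. The index function is obtained here by a brute-force search over s, purely as a hypothetical stand-in for a closed-form expression; the ADD mapping is assumed to have the form (x + ⌊x/(qW)⌋) mod M with q=2, and `PI` is copied from the interleaved order of the L=24 toy example.

```python
# PI[x] is the interleaved index pi(x) of the 24-bit toy example.
PI = [0, 11, 22, 5, 4, 15, 2, 9, 8, 19, 6, 13,
      12, 23, 10, 17, 16, 3, 14, 21, 20, 7, 18, 1]
W, M, Q = 6, 4, 2

# Inverse interleave by table inversion: PI_INV[pi(x)] = x.
PI_INV = [0] * len(PI)
for x, y in enumerate(PI):
    PI_INV[y] = x

def bank(x):
    """Assumed ADD memory mapping: which bank holds bit position x."""
    return (x + x // (Q * W)) % M

def address(x):
    """Anticipatory address mapping: pi^-1(x) mod W."""
    return PI_INV[x] % W

def index_function(b, t):
    """Bit index x = s*W + t held by bank b at natural-order cycle t (brute force)."""
    for s in range(M):
        x = s * W + t
        if bank(x) == b:
            return x
    raise ValueError("no index found; the mapping is not contention-free")

def generator_addresses(b):
    """Addresses the memory generator of bank b produces in the natural order phase."""
    return [address(index_function(b, t)) for t in range(W)]

for b in range(M):
    print(f"bank {b}: natural-order addresses {generator_addresses(b)}")
```

With address 0 as the top entry, banks 0 and 1 reproduce the access patterns listed above for memory banks B0 and B1, and in the interleaved (π) order phase the address of π(sW+t) is simply t.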
In accordance with this anticipatory address generation, it is initially supposed that the address mapping is as follows:
$\mathcal{A}: x \mapsto \pi^{-1}(x) \bmod W$.
The question may then be posed as to whether or not it is possible for each memory bank to calculate its own address sequence instead of necessitating that it be provided from the processors via the cross-bar switch. If the above-supposed address mapping ($\mathcal{A}: x \mapsto \pi^{-1}(x) \bmod W$) is employed, then this can be performed relatively easily, thereby providing a much simpler implementation.
This employs the use of an index function, $\mathcal{I}$, that gives the bit index x of the data value in the memory bank b at a cycle t of the natural order phase of parallel turbo decoding processing. For example, the following index function, $\mathcal{I}$, is calculated for this particular address mapping ($\mathcal{A}: x \mapsto \pi^{-1}(x) \bmod W$).
$\mathcal{I}(b,t)=x$, which implies that $\mathcal{M}(x)=b$ and $x=sW+t$ for some s.
This index function, $\mathcal{I}$, can be found because, for this particular address mapping ($\mathcal{A}: x \mapsto \pi^{-1}(x) \bmod W$), the index function $\mathcal{I}(b,t)$ has the specific form of sW+t.
Generally speaking, once x is known, the address mapping, $\mathcal{A}$, can be determined. Therefore, the novel address generation means is presented that addresses the memory mapping instead in accordance with the inverse interleave (π−1) order as follows:
$\mathcal{A}: x \mapsto \pi^{-1}(x) \bmod W$
It is also noted that this anticipatory address generation, employing an index function, can be applied to any particular memory mapping that is desired in a particular application.
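For one example, when considering the MOD mapping (i.e., $\mathcal{M}_{\mathrm{MOD}}$) function as follows:
$\mathcal{M}_{\mathrm{MOD}}(i) = i \bmod M$, then
the index function is as follows:
$\mathcal{I}_{\mathrm{MOD}}(b,t) = (b-t)vW + t$, where
$vW \equiv 1 \pmod{M}$, where v exists since gcd(W,M)=1.
The proof of this is provided below:
$\mathcal{M}_{\mathrm{MOD}}(\mathcal{I}_{\mathrm{MOD}}(b,t)) = ((b-t)vW + t) \bmod M$
$\mathcal{M}_{\mathrm{MOD}}(\mathcal{I}_{\mathrm{MOD}}(b,t)) = ((b-t) + t) \bmod M$
$\mathcal{M}_{\mathrm{MOD}}(\mathcal{I}_{\mathrm{MOD}}(b,t)) = b$.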
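For another example, when considering the DIV mapping (i.e., $\mathcal{M}_{\mathrm{DIV}}$) function as follows:
$\mathcal{M}_{\mathrm{DIV}}(i) = \lfloor i/W \rfloor$, then
the index function is as follows:
$\mathcal{I}_{\mathrm{DIV}}(b,t) = bW + t$.
The proof of this is provided below:
$\mathcal{M}_{\mathrm{DIV}}(\mathcal{I}_{\mathrm{DIV}}(b,t)) = \lfloor (bW+t)/W \rfloor$
$\mathcal{M}_{\mathrm{DIV}}(\mathcal{I}_{\mathrm{DIV}}(b,t)) = b$.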
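The MOD and DIV index functions above can be checked numerically. In the sketch below, the MOD case reduces (b−t)v modulo M before multiplying by W so that the result stays inside the index set {t, W+t, . . . , (M−1)W+t}; that reduction, like the function names, is an assumption of the illustration and does not change the bank, because adding a multiple of MW leaves the value unchanged modulo M.

```python
def mod_index(b, t, W, M):
    """Index function for the MOD mapping; requires gcd(W, M) == 1."""
    v = pow(W, -1, M)          # v*W is congruent to 1 mod M (Python 3.8+)
    s = ((b - t) * v) % M      # window number, reduced into 0..M-1
    return s * W + t

def div_index(b, t, W):
    """Index function for the DIV mapping: bank b holds window b."""
    return b * W + t

# MOD example from the text: W=5, M=4.
W, M = 5, 4
for t in range(W):
    row = [mod_index(b, t, W, M) for b in range(M)]
    print(f"MOD cycle {t}: bank 0..{M - 1} hold indices {row}")

# DIV example from the text: W=8, M=3.
W8, M3 = 8, 3
for t in range(W8):
    row = [div_index(b, t, W8) for b in range(M3)]
    print(f"DIV cycle {t}: bank 0..{M3 - 1} hold indices {row}")
```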
For yet another example, when considering the ADD mapping (i.e., ADD) function as follows:
then
the index function is as follows:
It is also noted that v exists because q=M/g by the following:
$q(W/g) \equiv 0 \pmod{M/g} \Rightarrow (M/g) \mid q$, and
$(M/g)W \equiv 0 \pmod{M} \Rightarrow (M/g) \geq q$.
The proof of this is provided below:
Let b−t=mg+n with 0≦n<g.
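The identity q=M/g noted above can be spot-checked numerically. The sketch assumes that g denotes gcd(W,M), which is not restated in this passage, and compares the smallest positive q with qW≡0 mod(M) against M/g over a small range of window sizes and processor counts; `smallest_q` is a hypothetical helper.

```python
from math import gcd

def smallest_q(W, M):
    """Smallest positive integer q with q*W congruent to 0 mod M."""
    q = 1
    while (q * W) % M != 0:
        q += 1
    return q

def identity_holds(limit=24):
    """Check q == M / gcd(W, M) for all 1 <= W, M <= limit."""
    return all(smallest_q(W, M) == M // gcd(W, M)
               for W in range(1, limit + 1)
               for M in range(1, limit + 1))

print("W=6, M=4 -> q =", smallest_q(6, 4))
print("W=3, M=8 -> q =", smallest_q(3, 8))
print("q == M/gcd(W,M) for all tested pairs:", identity_holds())
```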
As such, it can be seen that this novel address function ($\mathcal{A}: x \mapsto \pi^{-1}(x) \bmod W$) allows each memory bank to compute its own sequence of addresses, to perform appropriate memory bank access, and to provide the appropriate data portions therein to the appropriate processor for use in parallel turbo decoding processing.
Certain of the advantages of this novel, anticipatory address generation include a reduced cycle time, because the memory banks themselves (or memory generators coupled to each of the memory banks, or even an integrated memory generator) are operable to generate the addresses themselves instead of needing to wait until these addresses are provided from the processors and then passed to the memory banks via a cross-bar switch.
In addition, this novel address function ($\mathcal{A}: x \mapsto \pi^{-1}(x) \bmod W$) allows for a smaller area in a hardware implementation because such a cross-bar switch need only perform data steering (i.e., the addresses are generated locally by the memory generators). This can be compared to a cross-bar switch that needs to perform both the providing of and directing of both addresses and data between the memory banks and the processors. Moreover, this novel address function allows for less net congestion in the parallel turbo decoding processing, in that the addresses are generated locally by the memory generators instead of being sent from the processors via such a cross-bar switch.
Continuing on with the turbo decoding process and functionality, the metrics 2705 that are calculated by the metric generator 2704 are then provided to an anticipatory address generation module 2707. Initially, the anticipatory address generation module 2707 is operable to partition the received data block into a plurality of sub-blocks. Each of these sub-blocks includes a corresponding plurality of data. The individual data of each of the sub-blocks is then stored into a memory location within one memory bank of a plurality of memory banks 2790. The plurality of memory banks 2790 includes a number of memory banks, as shown by B0 2791, . . . , and Bn 2792. Based on the location of these data of each sub-block as they are placed into the memory banks 2790, the anticipatory address generation module 2707 is also operable to generate the appropriate index function (e.g., $\mathcal{I}$) that is employed for appropriate accessing of the addresses of each of the individual data entries when performing the natural order phase of the turbo decoding processing (e.g., the SISO 0 decoding operations).
This index function is then operable to be employed by a plurality of memory generators (e.g., as also described within other embodiments) so that the appropriate address can be generated immediately without necessitating the involvement of the multiple decoding processors as employed within parallel turbo decoding processing.
The appropriate addresses are then provided for use by a first array of soft-in/soft-out (SISO) 0 decoders 2710. This first array of SISO 0 decoders 2710 includes a number of SISO 0 decoders, as shown by SISO 0 2711, . . . , and SISO 0 2712. Each individual SISO decoder in the array of SISO 0 decoders 2710 is operable to perform SISO decoding of data stored within a particular memory location within one of the particular memory banks 2790.
The earlier calculated metrics 2705 that are calculated by the metric generator 2704 are also provided to the second array of SISO 1 decoders 2730. This array of SISO 1 decoders 2730 includes a number of SISO 1 decoders, as shown by SISO 1 2731, . . . , and SISO 1 2732. Each individual SISO decoder in the array of SISO 1 decoders 2730 is also operable to perform SISO decoding of data stored within a particular memory location within one of the particular memory banks 2790.
In the context of trellis coding (e.g., turbo trellis coded modulation (TTCM)), each of the first array of SISO 0 decoders 2710 and the second array of SISO 1 decoders 2730 calculates forward metrics (alphas), backward metrics (betas), and extrinsic values according to the trellis employed for each of the individual data entries within each of the corresponding memory locations that are being updated in that particular decoding iteration.
These alphas, betas, and extrinsics are all calculated for each symbol within a frame that is to be decoded. These calculations of alphas, betas, and extrinsics are all based on the trellis.
Starting with the first array of SISO 0 decoders 2710, after the extrinsic values 2711 have been calculated, they are passed to an interleaver (π) 2720, after which they are passed to the second array of SISO 1 decoders 2730 as “a priori probability” (app) information 2721. It is noted that the accessing of the data within the memory banks 2790 by the second array of SISO 1 decoders 2730 is performed sequentially due to the employing of the index function in accordance with the interleaved (π) order phase of the turbo decoding processing (e.g., the SISO 1 decoding operations).
Similarly, after extrinsic values 2731 have been calculated within the second array of SISO 1 decoders 2730, they are passed to a de-interleaver (π−1) 2740, after which they are passed as “a priori probability” (app) information 2741 to the anticipatory address generation module 2707 that is operable to employ the appropriate index function (e.g., $\mathcal{I}$) for appropriate accessing of the addresses of each of the individual data entries when performing the natural order phase of the turbo decoding processing (e.g., the SISO 0 decoding operations). As within other embodiments, the accessing of the addresses of each of the individual data entries when performing the natural order phase of the turbo decoding processing (e.g., the SISO 0 decoding operations) is not sequential; it is in a different order based on the particular memory mapping employed as well as the interleave (π).
It is noted that a single decoding iteration, within the iterative decoding process of the turbo decoder 2700 consists of performing two SISO operations; that is to say, the iterative decoding process must pass through both the first array of SISO 0 decoders 2710 and through the second array of SISO 1 decoders 2730.
After a significant level of confidence has been achieved and a solution is being converged upon, or after a predetermined number of decoding iterations have been performed, the output from the second array of SISO 1 decoders 2730 is passed as output to an output processor 2750. The operation of the array of SISO 0 decoders 2710 and the array of SISO 1 decoders 2730 may generally be referred to as calculating soft symbol decisions of the symbol contained within the received symbol. These soft symbol decisions may be performed on a true bit level in certain embodiments. The output processor 2750 uses these soft symbol decisions to generate best estimates 2751 (e.g., hard bit and/or symbol decisions) for the information bits that have been encoded within the original turbo coded signal (e.g., generally within a turbo encoder location at another end of a communication channel into which the signal 2701 was originally launched).
It is also noted that the interleaving performed within the interleaver (π) 2720 can be performed using an embodiment of an ARP interleave, as shown by reference numeral 2791. Also, there are embodiments in which the de-interleaving performed within the de-interleaver (π−1) 2740 can also be performed using an embodiment of an ARP de-interleave.
Continuing on with the turbo decoding process and functionality, the metrics 2805 that are calculated by the metric generator 2804 are then provided to an anticipatory address generation module 2807. Initially, the anticipatory address generation module 2807 is operable to partition the received data block into a plurality of sub-blocks. Each of these sub-blocks includes a corresponding plurality of data. The individual data of each of the sub-blocks is then stored into a memory location within one memory bank of a plurality of memory banks 2890. The plurality of memory banks 2890 includes a number of memory banks, as shown by B0 2891, . . . , and Bn 2892. Based on the location of these data of each sub-block as they are placed into the memory banks 2890, the anticipatory address generation module 2807 is also operable to generate the appropriate index function (e.g., $\mathcal{I}$) that is employed for appropriate accessing of the addresses of each of the individual data entries when performing the natural order phase of the turbo decoding processing (e.g., the SISO 0 decoding operations). As within other embodiments, the accessing of the addresses of each of the individual data entries when performing the natural order phase of the turbo decoding processing (e.g., the SISO 0 decoding operations) is not sequential; it is in a different order based on the particular memory mapping employed as well as the interleave (π).
This index function is then operable to be employed by a plurality of memory generators (e.g., as also described within other embodiments) so that the appropriate address can be generated immediately without necessitating the involvement of the multiple decoding processors as employed within parallel turbo decoding processing.
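One way to picture address generation that does not wait on the decoding processors is a per-bank address schedule computed ahead of time. The sketch below is an assumption-laden illustration and not the index function of the text: it assumes each entry x is stored in bank `bank_of(x)` at address π⁻¹(x) mod W (W being the sub-block length), so that the interleaved order phase reads address t at cycle t, and it then tabulates which address each bank must present at each cycle of the natural order phase. The function name and layout are hypothetical.

```python
def natural_phase_addresses(length, num_banks, pi, bank_of):
    """Tabulate, per bank and per cycle, the address read in the natural phase.

    Assumed layout (hypothetical): entry x lives in bank bank_of(x) at address
    inv_pi[x] % window, which makes the interleaved-phase reads sequential.
    """
    window = length // num_banks
    inv_pi = [0] * length
    for i, p in enumerate(pi):
        inv_pi[p] = i

    schedule = [[None] * window for _ in range(num_banks)]
    for t in range(window):
        for s in range(num_banks):
            x = s * window + t            # entry processor s needs at cycle t
            b = bank_of(x)
            if schedule[b][t] is not None:
                raise ValueError("memory contention in natural order phase")
            schedule[b][t] = inv_pi[x] % window
    return schedule


# Toy parameters chosen for illustration, not taken from the text.
L, M = 12, 3
pi = [(7 * i) % L for i in range(L)]
schedule = natural_phase_addresses(L, M, pi, lambda x: x % M)
```

For these toy values every bank reads addresses 0, 3, 2, 1 over the four cycles, which shows the non-sequential access pattern of the natural order phase under this layout.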
The appropriate addresses are then provided for use by an array of soft-in/soft-out (SISO) decoders 2810 that is operable to perform both the SISO 0 and the SISO 1 decoding operations. This array of SISO decoders 2810 includes a number of SISO decoders, as shown by SISO 2811, . . . , and SISO 2812. Each individual SISO decoder in the array of SISO decoders 2810 is operable to perform SISO decoding of data stored within a particular memory location within one of the particular memory banks 2890 (for both the SISO 0 and SISO 1 decoding operations).
The metrics 2805 calculated earlier by the metric generator 2804 are also provided to the array of SISO decoders 2810 for performing the initial SISO 1 decoding operations, as shown by the reference numeral 2809.
In the context of trellis coding (e.g., turbo trellis coded modulation (TTCM)), each SISO decoder of the array of SISO decoders 2810 calculates forward metrics (alphas), backward metrics (betas), and extrinsic values according to the trellis employed for each of the individual data entries within each of the corresponding memory locations that are being updated in that particular decoding iteration.
These alphas, betas, and extrinsics are all calculated for each symbol within a frame that is to be decoded. These calculations of alphas, betas, and extrinsics are all based on the trellis.
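The forward and backward recursions can be illustrated with a small max-log-MAP sketch. It is not taken from the text: the two-state trellis, the branch metric layout `gamma[k][s][u]`, the assumption that the trellis starts in state 0 and is unterminated, and the convention that a positive value favors bit 1 are all illustrative choices. The routine returns a posteriori values; true extrinsic values would additionally subtract the a priori and systematic contributions, which this sketch omits.

```python
NEG_INF = float("-inf")


def max_log_map(gamma, next_state):
    """Return one a posteriori log-likelihood ratio per trellis step.

    gamma[k][s][u] is the branch metric at step k for leaving state s with
    input bit u; next_state[s][u] is the state that branch leads to.
    """
    steps, n_states = len(gamma), len(next_state)

    # Forward metrics (alphas): trellis assumed to start in state 0.
    alpha = [[NEG_INF] * n_states for _ in range(steps + 1)]
    alpha[0][0] = 0.0
    for k in range(steps):
        for s in range(n_states):
            for u in (0, 1):
                s2 = next_state[s][u]
                alpha[k + 1][s2] = max(alpha[k + 1][s2],
                                       alpha[k][s] + gamma[k][s][u])

    # Backward metrics (betas): trellis assumed unterminated (all end states equal).
    beta = [[0.0] * n_states for _ in range(steps + 1)]
    for k in range(steps - 1, -1, -1):
        for s in range(n_states):
            beta[k][s] = max(gamma[k][s][u] + beta[k + 1][next_state[s][u]]
                             for u in (0, 1))

    # Combine alpha, gamma, and beta over all branches carrying each bit value.
    llr = []
    for k in range(steps):
        best = [NEG_INF, NEG_INF]
        for s in range(n_states):
            for u in (0, 1):
                metric = alpha[k][s] + gamma[k][s][u] + beta[k + 1][next_state[s][u]]
                best[u] = max(best[u], metric)
        llr.append(best[1] - best[0])
    return llr


# Toy two-state trellis: the next state is the XOR of the state and the input bit.
next_state = [[0, 1], [1, 0]]
bits = [1, 0, 1]
gamma = [[[1.0 if u == b else -1.0 for u in (0, 1)] for _ in range(2)] for b in bits]
llr = max_log_map(gamma, next_state)
```

In this toy run the branch metrics reward the bit pattern 1, 0, 1, and the signs of the returned values follow that pattern.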
Starting with the first decoding operation (i.e., SISO 0) as performed by the array of SISO decoders 2810, after the extrinsic values 2811 have been calculated, they are passed to an interleaver (π) 2820, after which they are passed back to the array of SISO decoders 2810 as “a priori probability” (app) information 2821. It is noted that the accessing of the data within the memory banks 2890 by the array of SISO decoders 2810, when performing the SISO 1 decoding operations, is performed sequentially due to the employing of the index function in accordance with the interleaved (π) order phase of the turbo decoding processing (e.g., the SISO 1 decoding operations).
Similarly, after extrinsic values 2831 have been calculated within the SISO decoders 2810 (i.e., during the SISO 1 decoding operations), they are passed to a de-interleaver (π−1) 2840, after which they are passed as “a priori probability” (app) information 2841 to the anticipatory address generation module 2807 that is operable to employ the appropriate index function (e.g., ) for appropriate accessing of the addresses of each of the individual data entries when performing the natural order phase of the turbo decoding processing (e.g., the SISO 0 decoding operations).
It is noted that a single decoding iteration, within the iterative decoding process of the turbo decoder 2800, consists of performing two SISO operations; that is to say, the iterative decoding process must pass through the array of SISO decoders 2810 twice.
After a significant level of confidence has been achieved and a solution is being converged upon, or after a predetermined number of decoding iterations has been performed, the output from the array of SISO decoders 2810 (after having performed the SISO 1 decoding operations) is passed as output to an output processor 2850. The operation of the array of SISO decoders 2810 may generally be referred to as calculating soft symbol decisions of the symbol contained within the received symbol. These soft symbol decisions may be performed on a true bit level in certain embodiments. The output processor 2850 uses these soft symbol decisions to generate best estimates 2851 (e.g., hard bit and/or symbol decisions) for the information bits that have been encoded within the original turbo coded signal (e.g., generally within a turbo encoder location at another end of a communication channel into which the signal 2801 was originally launched).
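As a simple illustration of turning soft decisions into hard bit decisions, the sketch below thresholds each soft value at zero. The sign convention (a positive value means bit 1) and the function name are assumptions made for the example; the text does not specify how the output processor forms its best estimates.

```python
def hard_decisions(soft_values):
    """Map each soft value to a hard bit; positive is assumed to mean bit 1."""
    return [1 if value > 0 else 0 for value in soft_values]


estimates = hard_decisions([2.5, -0.3, 0.1, -4.0])
```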
It is also noted that the interleaving performed within the interleaver (π) 2820 can be performed using an embodiment of an ARP interleave, as shown by reference numeral 2891. Also, there are embodiments in which the de-interleaving performed within the de-interleaver (π−1) 2840 can also be performed using an embodiment of an ARP de-interleave. As shown within this embodiment, a single array of SISO decoders 2810 is operable to perform both the SISO 0 and the SISO 1 decoding operations. Also, it is noted that a single module can be employed to perform both the functionality of the interleaver (π) 2820 and the de-interleaver (π−1) 2840. Particularly when the interleave (π) employed is of ARP format, then a de-interleave (π−1) can be generated from the interleave (π) that is also of ARP format. In one such embodiment, a single module, software, hardware, and/or combination thereof can be employed to perform the functionality of both the interleaving (π) and de-interleaving (π−1) operations in accordance with parallel turbo decoding processing.
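A small sketch may help make the interleave and its inverse concrete. The passage says only that the interleave may be of ARP format; the particular form used here, π(i) = (P·i + d[i mod C]) mod L with gcd(P, L) = 1, C dividing L, and every dither value d a multiple of C, is one common way of writing an ARP and is an assumption, as are the parameter values. The inverse is built by simple table inversion for checking purposes, not by the closed-form ARP de-interleave that the text refers to.

```python
from math import gcd


def arp_interleave(length, period, cycle, dither):
    """Build pi(i) = (period*i + dither[i % cycle]) % length as a list."""
    if length % cycle != 0:
        raise ValueError("cycle must divide the block length")
    if gcd(period, length) != 1:
        raise ValueError("period must be relatively prime to the block length")
    if len(dither) != cycle or any(d % cycle for d in dither):
        raise ValueError("need one dither value per residue, each a multiple of cycle")
    return [(period * i + dither[i % cycle]) % length for i in range(length)]


def invert(pi):
    """Return the inverse permutation by table inversion."""
    inverse = [0] * len(pi)
    for i, p in enumerate(pi):
        inverse[p] = i
    return inverse


# Toy parameters chosen for illustration, not taken from the text.
pi = arp_interleave(length=24, period=7, cycle=4, dither=[0, 4, 8, 12])
pi_inv = invert(pi)
```

Under the stated conditions the map is a permutation, so applying `pi` and then `pi_inv` returns every index to its original position.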
The method 2900 then continues by storing the plurality of data of the plurality of sub-blocks into a plurality of memory banks, as shown in a block 2930. The method 2900 also continues by performing anticipatory address generation (e.g., index function) for accessing of the plurality of data of the plurality of sub-blocks stored within the plurality of memory banks, as shown in a block 2950. The method 2900 also continues by turbo decoding the encoded block (i.e., the data within each of the plurality of sub-blocks) using a plurality of decoding processors in accordance with parallel turbo decoding processing, as shown in a block 2960. The method 2900 also continues by generating best estimates of information bits encoded within the turbo coded signal, as shown in a block 2970.
As shown in a block 3010, the method 3000 operates by performing anticipatory address generation (e.g., index function, ) for accessing of a plurality of data of a plurality of sub-blocks stored within a plurality of memory banks.
The method 3000 then continues by performing 1st SISO decoding (e.g., SISO 0, natural order phase) of the encoded block (i.e., the data within each of the plurality of sub-blocks) using a plurality of decoding processors in accordance with parallel turbo decoding processing thereby generating first extrinsic information, as shown in a block 3020.
The method 3000 then continues by interleaving (π) first extrinsic information thereby generating first a priori probability information, as shown in a block 3030.
The method 3000 then continues by performing 2nd SISO decoding (e.g., SISO 1, interleaved (π) order phase) of the encoded block (i.e., the data within each of the plurality of sub-blocks) using the plurality of decoding processors in accordance with parallel turbo decoding processing thereby generating second extrinsic information, as shown in a block 3040.
When performing additional decoding operations as shown by reference numeral 3051, the method 3000 continues by de-interleaving (π−1) the second extrinsic information thereby generating second a priori probability information, as shown in a block 3050. The method 3000 then continues by returning to block 3010 to perform the anticipatory address generation for subsequent decoding iterations.
However, when a final decoding iteration has been performed (e.g., all of the SISO 0 and SISO 1 decoding operations have been performed, and particularly after a final SISO 1 decoding operation has been performed) as shown by reference numeral 3041, then the method 3000 continues by generating best estimates of information bits encoded within the turbo coded signal, as shown in a block 3060.
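The flow of the method 3000 can be summarized in a short control-loop sketch. The two SISO stages are passed in as placeholder callables, the anticipatory address generation of block 3010 is left out, and the function and variable names are hypothetical. The sketch also de-interleaves after the final SISO 1 pass so that the returned values are in natural order, which is a design choice of the example and not something the text prescribes.

```python
def turbo_decode(siso0, siso1, pi, num_iterations):
    """Alternate the two SISO stages, exchanging extrinsic values between them.

    siso0 and siso1 are placeholder callables that take a priori values and
    return extrinsic values; pi is the interleave given as a list of indices.
    """
    n = len(pi)
    apriori0 = [0.0] * n                             # no prior knowledge at the start
    for _ in range(num_iterations):
        ext0 = siso0(apriori0)                       # SISO 0, natural order phase
        apriori1 = [ext0[pi[i]] for i in range(n)]   # interleave
        ext1 = siso1(apriori1)                       # SISO 1, interleaved order phase
        apriori0 = [0.0] * n
        for i in range(n):
            apriori0[pi[i]] = ext1[i]                # de-interleave
    return apriori0


# Dummy stages used only to exercise the data flow.
result = turbo_decode(
    siso0=lambda a: [value + index for index, value in enumerate(a)],
    siso1=lambda a: list(a),
    pi=[2, 0, 3, 1],
    num_iterations=2,
)
```

With the dummy stages above, each position gains its own index once per iteration, so the values come back in their original positions after the interleave and de-interleave steps.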
The present invention has also been described above with the aid of method steps illustrating the performance of specified functions and relationships thereof. The boundaries and sequence of these functional building blocks and method steps have been arbitrarily defined herein for convenience of description. Alternate boundaries and sequences can be defined so long as the specified functions and relationships are appropriately performed. Any such alternate boundaries or sequences are thus within the scope and spirit of the claimed invention.
The present invention has been described above with the aid of functional building blocks illustrating the performance of certain significant functions. The boundaries of these functional building blocks have been arbitrarily defined for convenience of description. Alternate boundaries could be defined as long as the certain significant functions are appropriately performed. Similarly, flow diagram blocks may also have been arbitrarily defined herein to illustrate certain significant functionality. To the extent used, the flow diagram block boundaries and sequence could have been defined otherwise and still perform the certain significant functionality. Such alternate definitions of both functional building blocks and flow diagram blocks and sequences are thus within the scope and spirit of the claimed invention.
One of average skill in the art will also recognize that the functional building blocks, and other illustrative blocks, modules and components herein, can be implemented as illustrated or by discrete components, application specific integrated circuits, processors executing appropriate software and the like or any combination thereof.
Moreover, although described in detail for purposes of clarity and understanding by way of the aforementioned embodiments, the present invention is not limited to such embodiments. It will be obvious to one of average skill in the art that various changes and modifications may be practiced within the spirit and scope of the invention, as limited only by the scope of the appended claims.
The present U.S. Utility patent application claims priority pursuant to 35 U.S.C. §120, as a continuation, to the following U.S. Utility patent application which is hereby incorporated herein by reference in its entirety and made part of the present U.S. Utility patent application for all purposes: 1. U.S. Utility application Ser. No. 11/810,989, entitled “Address generation for contention-free memory mappings of turbo codes with ARP (almost regular permutation) interleaves,” filed Jun. 7, 2007, and scheduled to be issued as U.S. Pat. No. 7,831,894 on Nov. 9, 2010, which claims priority pursuant to 35 U.S.C. §119(e) to the following U.S. Provisional Patent Applications which are hereby incorporated herein by reference in their entirety and made part of the present U.S. Utility patent application for all purposes: a. U.S. Provisional Application Ser. No. 60/850,492, entitled “General and algebraic-constructed contention-free memory mapping for parallel turbo decoding with algebraic interleave ARP (almost regular permutation) of all possible sizes,” filed Nov. 10, 2006, now expired. b. U.S. Provisional Application Ser. No. 60/872,367, entitled “Turbo decoder employing ARP (almost regular permutation) interleave and inverse thereof as de-interleave,” filed Dec. 1, 2006, now expired. c. U.S. Provisional Application Ser. No. 60/872,716, entitled “Turbo decoder employing ARP (almost regular permutation) interleave and arbitrary number of decoding processors,” filed Dec. 4, 2006, now expired. d. U.S. Provisional Application Ser. No. 60/861,832, entitled “Reduced complexity ARP (almost regular permutation) interleaves providing flexible granularity and parallelism adaptable to any possible turbo code block size,” filed Nov. 29, 2006, now expired. e. U.S. Provisional Application Ser. No. 60/879,301, entitled “Address generation for contention-free memory mappings of turbo codes with ARP (almost regular permutation) interleaves,” filed Jan. 8, 2007, now expired.
The following U.S. Utility patent applications/U.S. patents are hereby incorporated herein by reference in their entirety and made part of the present U.S. Utility patent application for all purposes: 1. U.S. Utility application Ser. No. 11/704,068, entitled “General and algebraic-constructed contention-free memory mapping for parallel turbo decoding with algebraic interleave ARP (almost regular permutation) of all possible sizes,” filed Feb. 8, 2007, pending. 2. U.S. Utility application Ser. No. 11/657,819, entitled “Turbo decoder employing ARP (almost regular permutation) interleave and inverse thereof as de-interleave,” filed Jan. 25, 2007, pending. 3. U.S. Utility application Ser. No. 11/811,014, entitled “Turbo decoder employing ARP (almost regular permutation) interleave and arbitrary number of decoding processors,” filed on Jun. 7, 2007, now U.S. Pat. No. 7,827,473 B2, issued on Nov. 2, 2010. 4. U.S. Utility application Ser. No. 11/811,013, entitled “Reduced complexity ARP (almost regular permutation) interleaves providing flexible granularity and parallelism adaptable to any possible turbo code block size,” filed on Jun. 7, 2007, pending.
Number | Name | Date | Kind |
---|---|---|---|
5721745 | Hladik et al. | Feb 1998 | A |
5734962 | Hladik et al. | Mar 1998 | A |
6023783 | Divsalar et al. | Feb 2000 | A |
6580767 | Koehler et al. | Jun 2003 | B1 |
6594792 | Hladik et al. | Jul 2003 | B1 |
6678843 | Giulietti et al. | Jan 2004 | B2 |
6715120 | Hladik et al. | Mar 2004 | B1 |
6775800 | Edmonston et al. | Aug 2004 | B2 |
6789218 | Edmonston et al. | Sep 2004 | B1 |
6983412 | Fukumasa | Jan 2006 | B2 |
7113554 | Dielissen et al. | Sep 2006 | B2 |
7180843 | Furuta et al. | Feb 2007 | B2 |
7281198 | Yagihashi | Oct 2007 | B2 |
7302621 | Edmonston et al. | Nov 2007 | B2 |
7530011 | Obuchii et al. | May 2009 | B2 |
7720017 | Khan | May 2010 | B2 |
20060101319 | Park et al. | May 2006 | A1 |

Number | Date | Country |
---|---|---|
20110055663 A1 | Mar 2011 | US |

Number | Date | Country |
---|---|---|
60850492 | Oct 2006 | US |
60872367 | Dec 2006 | US |
60872716 | Dec 2006 | US |
60861832 | Nov 2006 | US |
60879301 | Jan 2007 | US |

Relation | Number | Date | Country |
---|---|---|---|
Parent | 11810989 | Jun 2007 | US |
Child | 12941178 | | US |