Interleaving is used to help signal receivers overcome errors in acquired signals. For example, interleaving is applied to signals that are sent in wireless telecommunications networks that comply with the Third Generation Partnership Project (3GPP) standards.
Generally speaking, interleaving, as performed by an interleaver, is the deliberate and reversible disordering of a sequence of information symbols. If a receiver acquires an interleaved signal in which an error affects a contiguous group of symbols, the error can be dispersed by reversing the interleaving, so-called “deinterleaving.”
The ability to disperse an error is important when a transmitted information symbol sequence is prepared using forward error correction (FEC) coding that enables the resolution of an error in an acquired sequence provided that the error does not extend over too great a contiguous part of the acquired sequence. Turbo encoding is an example of FEC that is used within the 3GPP standards.
In a turbo encoder, a symbol sequence is supplied in parallel to both a first convolutional encoder and an interleaver. The interleaver produces an interleaved version of the sequence, which is then input to a second convolutional encoder. The outputs of the convolutional encoders are combined with the original sequence to provide the output of the turbo encoder. In a turbo decoder, a received turbo encoded sequence is used to prime a pair of constituent decoders. The output of a first one of the constituent decoders is interleaved and sent to the second constituent decoder for a further decoding iteration and the output of the second constituent decoder is deinterleaved and sent to the first constituent decoder for a further decoding iteration. The outputs of the first and second constituent decoders are exchanged several times prior to the emergence of a finally decoded sequence. Thus, where turbo coding is employed, considerable processing effort is devoted to interleaving, particularly in turbo decoding.
Several classes of interleaving algorithm exist. For example, a rectangular interleaver loads a symbol sequence into a memory block in a column-wise fashion and reads the symbols out in a row-wise fashion. Another type of interleaving is quadratic permutation polynomial (QPP) interleaving, which will be discussed after a brief reminder regarding some mathematical operations that feature in the remainder of this document:
Given a data sequence of length K symbols (where K has an integer value) and two parameters a and b that are dependent on K, the relationship in QPP interleaving between the position x of a symbol in the interleaved sequence and its position f(x) in the original sequence is:
f(x)=(ax+bx²)mod K Equation 1
where x=0, 1, 2, 3, 4, . . . , K−1.
It has been shown in “A Decoder Architecture for High-Speed Free-Space Laser Communications” (M. Cheng, M. Nakashima, J. Hamkins, B. Moision, and M. Barsoum, Proceedings of SPIE, vol. 5712, pp. 174-185, April 2005) that if we define:
g(x)=(a+b+2bx)mod K
then:
f(x+1)=(f(x)+g(x))mod K
and
g(x+1)=(g(x)+2b)mod K
The function g(x) is an auxiliary function that recursively defines f(x).
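The recursion above can be checked numerically. The following is a minimal Python sketch, not part of the original disclosure, that generates the QPP addresses both directly from Equation 1 and through the f/g recursion, and confirms that the two agree. The function names and the parameter values K=40, a=3, b=10 are assumptions chosen only for illustration.

```python
def qpp_direct(a, b, K):
    """Evaluate f(x) = (a*x + b*x*x) mod K for x = 0..K-1 (Equation 1)."""
    return [(a * x + b * x * x) % K for x in range(K)]


def qpp_recursive(a, b, K):
    """Generate the same sequence using only additions and mod-K reductions."""
    f = 0                    # f(0) = 0
    g = (a + b) % K          # g(0) = (a + b) mod K
    step = (2 * b) % K       # constant increment applied to g
    out = []
    for _ in range(K):
        out.append(f)
        f = (f + g) % K      # f(x+1) = (f(x) + g(x)) mod K
        g = (g + step) % K   # g(x+1) = (g(x) + 2b) mod K
    return out


# Illustrative parameters (assumed, not taken from the text).
K, a, b = 40, 3, 10
addresses = qpp_recursive(a, b, K)
assert addresses == qpp_direct(a, b, K)
assert sorted(addresses) == list(range(K))  # every position appears exactly once
```

The recursion works because f(x+1)−f(x)=a+b+2bx, which is g(x) before reduction modulo K.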
According to one aspect, an embodiment of the invention provides a method of interleaving a set of data items from an original order to an interleaved order. The method includes calculating in advance an initial value of at least one parameter for use by logic circuitry to initialize interleaving operation, storing the initial value of at least one parameter, and using the stored initial value of at least one parameter with the logic circuitry to generate interleaved order positions for the set of data items.
The at least one parameter may include a plurality of sets of parameters for a plurality of respective block sizes of the set of data items. The logic circuitry may include a single processing engine and the at least one parameter includes a first parameter having a value represented by (a+b)mod K and a second parameter having a value represented by (2b)mod K, where K represents block size in the set of data items, and a and b represent coefficients of a quadratic permutation polynomial (QPP) determining the interleaved order from the original order.
It is appreciated that the at least one parameter may include a third and a fourth parameter. Accordingly, the logic circuitry may further include registered inputs arranged to receive the third parameter value represented by c−K and the fourth parameter value represented by c−2K, where c represents (2b)mod K.
In an alternative embodiment, the logic circuitry includes a plurality of processing engines and the at least one parameter includes a first, a second, a third, and a fourth parameter. The first parameter may have a value represented by (a+b)mod M, a second parameter may have a value represented by └(a+b)/M┘, a third parameter may have a value represented by (2b)mod M, and a fourth parameter may have a value represented by └(2b)/M┘, where a and b represent coefficients of a QPP determining the interleaved order from the original order. According to one embodiment, M represents length of sections into which the set of data items is divided for processing by the plurality of engines, and └ ┘ denotes floor function.
The logic circuitry may include a plurality of processing engines whose operation is synchronized. The set of data items can be visualized as a rectangular array of data items with rows and columns, and the interleaver may include a single logic assembly arranged to calculate both column addresses and row addresses.
The logic circuitry may alternatively include a plurality of processing engines and the at least one parameter may include a first parameter having a value represented by (a+b)mod M, (c−M)mod M, (c−2M)mod M, └(a+b)/M┘, where a and b represent coefficients of a QPP determining the interleaved order from the original order. In one embodiment, M represents length of sections into which the set of data items is divided for processing by the plurality of engines. The logic circuitry may include a Single-Input and Single-Output (SISO) decoder.
According to a further aspect, an embodiment of the present invention provides an interleaver, for interleaving a set of data items from an original order to an interleaved order, the interleaver including: logic circuitry for generating interleaved order positions for the set of data items, and memory coupled to the logic circuitry for holding an initial value of at least one parameter for use by the logic circuitry to initialize interleaving operation. The at least one parameter may include a plurality of sets of parameters for a plurality of respective block sizes of the set of data items. The interleaver may include a single processing engine and the at least one parameter includes a first parameter having a value represented by (a+b)mod K and a second parameter having a value represented by (2b)mod K, where K represents block size in the set of data items, and a and b represent coefficients of a QPP determining the interleaved order from the original order.
According to one embodiment, the at least one parameter may further include a third and a fourth parameter. Accordingly, the interleaver may further include registered inputs arranged to receive the third parameter value represented by c−K and the fourth parameter value represented by c−2K, where c represents (2b)mod K. In an alternative embodiment, the interleaver includes a plurality of processing engines and the at least one parameter includes a first parameter having a value represented by (a+b)mod M, a second parameter having a value represented by └(a+b)/M┘, a third parameter having a value represented by (2b)mod M, and a fourth parameter having a value represented by └(2b)/M┘, where a and b represent coefficients of a QPP determining the interleaved order from the original order, and M represents length of sections into which the set of data items is divided for processing by the plurality of engines.
The interleaver may include a plurality of processing engines whose operation is synchronized. It is appreciated that the set of data items may be visualized as a rectangular array of data items with rows and columns, and the interleaver may include a single logic assembly arranged to calculate both column addresses and row addresses.
The interleaver may alternatively include a plurality of processing engines and the at least one parameter includes a first parameter having a value represented by (a+b)mod M, (c−M)mod M, (c−2M)mod M, └(a+b)/M┘, where a and b represent coefficients of a QPP determining the interleaved order from the original order, and M represents length of sections into which the set of data items is divided for processing by the plurality of engines. The interleaver may include a SISO decoder.
According to a yet further aspect, an embodiment of the invention provides a method of interleaving a series of K data items from an original order to an interleaved order. The method includes: calculating from which position in the original order should be provided the data item for a current position in the interleaved order; where: the series of K data items represents communication signals (where K is an integer); the position in the original order is specified by a first function, which is a function of the current position in the interleaved order; the basis of the first function is a sum of two variable values each being less than K; a first of said variable values is the value of the first function when its argument is the preceding position in the interleaved order; a second of the variable values is the value of an auxiliary function when its argument is the preceding position in the interleaved order; and the method further includes: storing an initial value of at least one parameter; using the stored initial value of at least one parameter to initialize interleaving operation; calculating a first sum, which is a sum of the first variable value and a first constant; calculating a second sum, which is a sum of the second variable value and a second constant; and using the first and second sums to calculate a value for the auxiliary function when its argument is the current position in the interleaved order; where the first and second sums are calculated in parallel by logic circuitry.
According to still a further aspect, an embodiment of the invention provides an interleaver for interleaving a series of K data items from an original order to an interleaved order, where: the interleaver is arranged to calculate from which position in the original order should be provided the data item for a current position in the interleaved order; the position in the original order is specified by a first function, which is a function of the current position in the interleaved order; the basis of the first function is a sum of two variable values each being less than K; a first of said variable values is the value of the first function when its argument is the preceding position in the interleaved order; a second of said variable values is the value of an auxiliary function when its argument is the preceding position in the interleaved order; and the interleaver includes: memory for holding an initial value of at least one parameter for use to initialize interleaving operation; first logic circuitry arranged to calculate a first sum, which is a sum of the first variable value and a first constant; second logic circuitry arranged to calculate a second sum, which is a sum of the second variable value and a second constant; and third logic circuitry arranged to use said first and second sums to calculate a value for the auxiliary function when its argument is the current position in the interleaved order; where the first and second logic circuitry are arranged to operate in parallel.
Thus, embodiments of the invention parallelize loading, thereby reducing overall system latency and improving throughput. There is no need to calculate interleaver addresses sequentially, which saves hardware resources. There is also no requirement for extra logic circuitry resources to store the sequence values in a memory. The interleaver addresses can be calculated in a SISO decoder. As a result, a single buffer is needed, hence saving memory bits. Accordingly, using embodiments of the present invention, decoders such as Long Term Evolution (LTE) Turbo decoders are made more effective in terms of hardware and performance.
The novel features of the invention are set forth in the appended claims. However, for purpose of explanation, several aspects of particular embodiments of the invention are described by reference to the following figures.
The following description is presented to enable a person skilled in the art to make and use the invention, and is provided in the context of particular applications and their requirements. Various modifications to the exemplary embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
Turbo decoder 11 may include first and second constituent decoders 12 and 13, each arranged to work on the re-ordered output of the other. The output sequence that constituent decoder 12 produces during an iteration of the turbo decoding process is stored into a memory 14. That sequence is then read from the memory 14 under the control of an interleaver 15 in order to provide the input sequence for constituent decoder 13 in the next iteration of the turbo decoding process. In a similar manner, the output sequence that constituent decoder 13 produces during an iteration of the turbo decoding process is stored into a memory 16. That sequence is then read from the memory 16 under the control of a deinterleaver 17 in order to provide the input to constituent decoder 12 for the next iteration of the turbo decoding process.
The sequence loaded into memory 14 from constituent decoder 12 arrives in memory 14 in a certain order, which shall hereinafter be referred to as the “original order.” The sequence loaded into memory 14 is then read out from memory 14 in a different order than the “original order,” which shall hereinafter be called the “interleaved order,” in accordance with a QPP interleaving algorithm (the deinterleaver 17 applies the reverse of this reordering operation). The interleaved order is obtained from the original order using the QPP interleaving algorithm mentioned earlier, in the “Background” section. That is to say, where the sequence to be interleaved is a block of K symbols, the relationship between the position x of a symbol in the interleaved order and its position f(x) in the original order is:
f(x)=(ax+bx²)mod K
where:
x=0, 1, 2, 3, 4, . . . , K−1; and
a and b are dependent on K.
According to the aforementioned paper by Cheng et al., position f(x+1) can be deduced from position f(x) by:
f(x+1)=(f(x)+g(x))mod K
where:
g(x)=(a+b+2bx)mod K;
g(x+1)=(g(x)+2b)mod K; and
the initial conditions are f(0)=0 and g(0)=(a+b)mod K.
An exemplary hardware implementation of the interleaver 15, in accordance with one embodiment of the present invention, that deduces f(x+1) from f(x) and g(x) and which also deduces g(x+1) is shown in
First, c=2b mod K may be defined in one exemplary embodiment. The following values can be calculated:
f1(x+1)=f(x)+g(x)
f2(x+1)=f(x)+g(x)−K
g1(x+1)=g(x)+c
g2(x+1)=g(x)+c−K.
Since f(x)=(ax+bx²)mod K, it follows that f(x)<K. Since g(x)=(a+b+2bx)mod K, it follows that g(x)<K. Also, it is appreciated that c<K. Therefore:
More formally:
Thus, the FPGA 10 can be configured to implement the calculation of the values f1(x+1), f2(x+1), g1(x+1) and g2(x+1) and the evaluation of Equations 2 and 3, without performing the multiplication and division operations that appear to be required by Equation 1.
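The compare-and-select idea behind the candidate values f1, f2, g1 and g2 can be sketched in Python as follows. This is an illustrative sketch only, not the disclosed circuit: the selection rule (take the candidate that lies in the range 0 to K−1) is an assumption inferred from the bounds f(x)<K, g(x)<K and c<K, and the function name and test parameters are hypothetical. The modulo operator appears only when the constants g(0) and c are set up; the per-symbol loop uses additions, subtractions and comparisons.

```python
def qpp_compare_select(a, b, K):
    """QPP address generation with no multiply or divide inside the loop."""
    f = 0
    g = (a + b) % K          # g(0), computed once before the loop
    c = (2 * b) % K          # c = 2b mod K, computed once before the loop
    out = []
    for _ in range(K):
        out.append(f)
        f1 = f + g           # candidate for f(x+1) when no wrap occurs
        f2 = f + g - K       # candidate for f(x+1) when the sum reaches K
        g1 = g + c           # candidate for g(x+1) when no wrap occurs
        g2 = g + c - K       # candidate for g(x+1) when the sum reaches K
        # Assumed selection rule: both sums are below 2K, so exactly one
        # candidate of each pair lies in 0..K-1.
        f = f1 if f1 < K else f2
        g = g1 if g1 < K else g2
    return out


# Illustrative parameters (assumed): K = 40, a = 3, b = 10.
expected = [(3 * x + 10 * x * x) % 40 for x in range(40)]
assert qpp_compare_select(3, 10, 40) == expected
```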
The hardware implementation for the calculation of f(x+1) that is shown in
The digital circuit shown in
The circuit of
f1(x+1)=f(x)+g(x)
which is the output of register 30.
f2(x+1)=f(x)+gK(x)
which is the output of register 28.
g1(x+1)=g(x)+c
which is the output of register 32.
g2(x+1)=g(x)+c−K
which is the output of register 34.
g3(x+1)=g(x)+c−2K
which is the output of register 36.
The values for f(x+1) and g(x+1) appear at the outputs of multiplexers 38 and 40, respectively.
Note that the definition of f2 has been modified and that g3 and gK have been introduced. The value gK appears at the output of the multiplexer 42 and is given by:
These changes avoid the “double addition” in f(x)+g(x)−K, which was the original formulation for f(x+1).
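The effect of this reformulation can be illustrated with the following Python sketch, in which every candidate value is formed by a single addition of a stored value and either another stored value or a precomputed constant. It is a sketch under stated assumptions rather than a description of the circuit: in particular it assumes that gK(x)=g(x)−K, and that gK(x+1) is selected from g2(x+1) and g3(x+1) by the same condition that selects g(x+1). The function and variable names are hypothetical.

```python
def qpp_single_add(a, b, K):
    """QPP addresses where each candidate needs only one addition per step."""
    c = (2 * b) % K
    c_minus_k = c - K         # constant supplied from outside the loop
    c_minus_2k = c - 2 * K    # constant supplied from outside the loop
    f = 0
    g = (a + b) % K
    g_k = g - K               # ASSUMED definition: gK(x) = g(x) - K
    out = []
    for _ in range(K):
        out.append(f)
        f1 = f + g            # f1(x+1) = f(x) + g(x)
        f2 = f + g_k          # f2(x+1) = f(x) + gK(x), a single addition
        g1 = g + c            # g1(x+1) = g(x) + c
        g2 = g + c_minus_k    # g2(x+1) = g(x) + c - K
        g3 = g + c_minus_2k   # g3(x+1) = g(x) + c - 2K
        wrap_g = g1 >= K
        f = f1 if f1 < K else f2
        g = g2 if wrap_g else g1
        g_k = g3 if wrap_g else g2   # keeps g_k equal to the new g minus K
    return out


# Illustrative parameters (assumed): K = 40, a = 3, b = 10.
expected = [(3 * x + 10 * x * x) % 40 for x in range(40)]
assert qpp_single_add(3, 10, 40) == expected
```

In hardware terms, the two-operand additions can then proceed in parallel, with the comparisons steering the selection of the next values.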
A deinterleaver adapted to support a parallel block decoding within a turbo decoder will now be described, with reference to
A sequence of symbols to be decoded by a constituent decoder (such as 12 or 13 in
The constituent decoder 12′ contains P parallel decoding engines 44-1 to 44-P (where P has an integer value), each for decoding a section of the deinterleaved version of the sequence produced by the deinterleaver 17′ reading from memory 16 the sequence produced by constituent decoder 13′ in the previous turbo decoding iteration. The constituent decoder 13′ contains P parallel decoding engines 46-1 to 46-P, each for decoding a section of the interleaved version of the sequence produced by the interleaver 15′ reading from memory 14 the sequence produced by the constituent decoder 12′ in the previous turbo decoding iteration. In reading symbols from memory 14, the interleaver 15′ implements a version of QPP interleaving that supplies decoding engines 46-1 to 46-P in parallel with the sequence sections that they are to decode (and the deinterleaver 17′ performs an analogous deinterleaving role for engines 44-1 to 44-P). As a precursor to describing the operation of the interleaver 15′, a nomenclature for describing the arrangement of the symbols that are provided to the engines 44-1 to 44-P and 46-1 to 46-P will now be developed.
Over the course of a turbo decoding iteration, each of the engines 44-1 to 44-P produces an estimate of a section of the original order. These sections are shown in
According to the QPP algorithm set out earlier, the content for position x in the interleaved sequence is obtained from position f(x) in the original sequence. Given that the original sequence is now arranged as array 48 in
f′(x)=f(x)mod M Equation 4
f″(x)=└f(x)/M┘ Equation 5.
This can be stated another way as:
f(x)=f′(x)+Mf″(x) Equation 6
We can define a similar decomposition for g(x):
g(x)=g′(x)+Mg″(x) Equation 7
where:
g′(x)=g(x)mod M Equation 8
g″(x)=└g(x)/M┘ Equation 9
where g′(x) and g″(x) are auxiliary functions too.
Given that f′(x+1)=f(x+1)mod M, we can substitute for f(x+1) to obtain:
f′(x+1)=[(f(x)+g(x))mod K]mod M.
Since M is a factor of K, this becomes:
f′(x+1)=(f(x)+g(x))mod M.
The definitions of f(x) and g(x) from Equations 6 and 7 can be substituted into the above equation to yield:
f′(x+1)=(f′(x)+Mf″(x)+g′(x)+Mg″(x))mod M
but f″(x) and g″(x) are integers, so the above result reduces to:
f′(x+1)=(f′(x)+g′(x))mod M.
Likewise,
g′(x+1)=g(x+1)mod M
g′(x+1)=[(g(x)+2b)mod K]mod M
g′(x+1)=(g(x)+2b)mod M
g′(x+1)=(g(x)mod M+2b mod M)mod M
g′(x+1)=(g′(x)+c′)mod M
where c′=(2b)mod M.
Given that f″(x+1)=└f(x+1)/M┘, we can substitute for f(x+1) to obtain:
Likewise,
where c″=└(2b)/M┘.
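For reference, one way to obtain recursions for the quotient parts that are consistent with Equations 6 to 9 is set out below. This is a reconstruction offered as an aid to the reader and is not necessarily the form used in the original derivation. It assumes that K=MP, where P is the number of sections of length M, and it uses the decomposition 2b=c′+Mc″.

```latex
\begin{aligned}
f(x)+g(x) &= \bigl(f'(x)+g'(x)\bigr) + M\bigl(f''(x)+g''(x)\bigr),\\[4pt]
f''(x+1) &= \left\lfloor \frac{\bigl(f(x)+g(x)\bigr) \bmod K}{M} \right\rfloor
          = \left( f''(x)+g''(x)+\left\lfloor \frac{f'(x)+g'(x)}{M} \right\rfloor \right) \bmod P,\\[4pt]
g''(x+1) &= \left\lfloor \frac{\bigl(g(x)+2b\bigr) \bmod K}{M} \right\rfloor
          = \left( g''(x)+c''+\left\lfloor \frac{g'(x)+c'}{M} \right\rfloor \right) \bmod P.
\end{aligned}
```

The second equality in each line uses the fact that, for K=MP, reducing a value modulo K and then dividing by M gives the same result as dividing by M and then reducing modulo P. Because f′(x), g′(x) and c′ are each smaller than M, each inner floor term is either 0 or 1 and acts as a carry from the remainder part into the quotient part.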
The quantities f″(x+1) and f′(x+1) are defined inductively. Therefore, x can be treated as a parameter that runs from 0 to M−1 along each row of array 50 as indicated by arrow 60 in
Considering the 0th row of the array 50, f(0)=0 and g(0)=(a+b)mod K and:
f′(0)=f(0)mod M=0 mod M=0
f″(0)=└f(0)/M┘=└0/M┘=0
g′(0)=[(a+b)mod K]mod M=(a+b)mod M (recall that M is a factor of K)
g″(0)=└g(0)/M┘=└[(a+b)mod K]/M┘.
The procedure for calculating f′(x+1), f″(x+1), g′(x+1) and g″(x+1) from f′(x), f″(x), g′(x) and g″(x) within a row of the array 50 will now be described.
Since f′(x) is smaller than M and since g′(x) is smaller than M, it follows that f′(x)+g′(x) is smaller than 2M. So, f′(x+1) has one of the two following values:
f1′(x+1)=f′(x)+g′(x) Equation 10
f2′(x+1)=f′(x)+g′(x)−M Equation 11.
Since f″(x) is smaller than P and since g″(x) is smaller than P, it follows that f″(x)+g″(x) is smaller than 2P. So, one of the two following values is smaller than P:
f1″(x+1)=f″(x)+g″(x) Equation 12
f2″(x+1)=f″(x)+g″(x)−P Equation 13.
Since g′(x) is smaller than M and since c′ is smaller than M, it follows that g′(x)+c′ is smaller than 2M. So, one of the two following values is smaller than M:
g1′(x+1)=g′(x)+c′ Equation 14
g2′(x+1)=g′(x)+c′−M Equation 15.
Since g″(x) is smaller than P and since c″ is smaller than P, it follows that g″(x)+c″ is smaller than 2P. So, g″(x+1) has one of the two following values:
g1″(x+1)=g″(x)+c″ Equation 16
g2″(x+1)=g″(x)+c″−P Equation 17.
Therefore,
Equations 10 to 21 can be implemented within FPGA 10 to produce an interleaver for determining the location in terms of ordinate or row f″(x) and abscissa or column f′(x) co-ordinates within array 48 whose content is placed at position x of row n of array 50 (where row n is of course specified by the values assigned for f′(0), f″(0), g′(0) and g″(0)). However, the set of equations 10 to 21 can be modified to avoid, as before, delay associated with the occurrence of multiple arithmetic operations in series in a data path, in which case Equation 11 becomes:
f2′(x+1)=f′(x)+gM′(x) Equation 22.
And the following definitions can be made:
g3′(x+1)=g′(x)+c′−2M
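The column and row recursions can be exercised together with the following Python sketch. It is an illustration under stated assumptions, not the disclosed implementation: it assumes K=MP, it propagates a carry from the column (remainder) update into the row (quotient) update, and it takes c′ and c″ from (2b)mod K so that c″ is smaller than P. For simplicity it runs x across the whole block rather than restarting for each row. All names are hypothetical.

```python
def qpp_row_col(a, b, K, M):
    """Return (column f'(x), row f''(x)) for x = 0..K-1, assuming K = M * P."""
    assert K % M == 0
    P = K // M
    c = (2 * b) % K
    c_lo, c_hi = c % M, c // M           # c' and c'' (c'' < P because c < K)
    g0 = (a + b) % K
    f_lo, f_hi = 0, 0                    # f'(0), f''(0)
    g_lo, g_hi = g0 % M, g0 // M         # g'(0), g''(0)
    out = []
    for _ in range(K):
        out.append((f_lo, f_hi))
        # Column part of f: one conditional subtraction of M.
        s = f_lo + g_lo
        carry_f = s >= M
        next_f_lo = s - M if carry_f else s
        # Row part of f: add the carry, then one conditional subtraction of P.
        t = f_hi + g_hi + (1 if carry_f else 0)
        next_f_hi = t - P if t >= P else t
        # Column and row parts of g, handled the same way with c' and c''.
        u = g_lo + c_lo
        carry_g = u >= M
        next_g_lo = u - M if carry_g else u
        v = g_hi + c_hi + (1 if carry_g else 0)
        next_g_hi = v - P if v >= P else v
        f_lo, f_hi, g_lo, g_hi = next_f_lo, next_f_hi, next_g_lo, next_g_hi
    return out


# Illustrative parameters (assumed): K = 40 split into P = 4 sections of M = 10.
pairs = qpp_row_col(3, 10, 40, 10)
assert all(col + 10 * row == (3 * x + 10 * x * x) % 40
           for x, (col, row) in enumerate(pairs))
```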
The interleaver implementation for a row of array 50 is shown in
In the case where the interleaver design shown in
f2′(0)=−1
f1′(0)=0
g1′(0)=(a+b)mod M
g2′(0)=c′−M
g3′(0)=c′−2M
f1″(0)=0
g1″(0)=└[(a+b)mod K]/M┘.
These initial values are loaded using the synchronous load inputs of the registers 62 to 74, as indicated by the inputs leading into the lower faces of the registers. If the design of
The circuit of
As shown in
Referring now to
In order to avoid these latency delays before the FPGA circuitry can produce the interleaved symbols, an arrangement such as shown in
Thus, as shown in
It will be understood that after one-time initial pre-calculation and storage of the initial values, the amount of time needed to read the values from the memory 130 and to apply them to the FPGA circuitry of
Thus, pre-calculating and storing initial values overcomes the disadvantages of the method described earlier, in which the initial values are calculated during the loading time. That method requires extra logic resources on an IC to calculate interleaver addresses sequentially and to save the intermediate register values during the calculation, and it makes parallel loading infeasible, which degrades the overall system latency and throughput. This technique therefore provides the advantages that:
It will be appreciated that, depending on the particular implementation, it may not be necessary to pre-calculate and store all of the required initial values as described above. In such cases it may be acceptable (keeping latency to an acceptably low level) to pre-calculate and store only some of the required initial values, while allowing others to be calculated at load-time as described earlier. Four different implementations of an LTE Turbo QPP interleaver are discussed below: (i) hardware implementation of an LTE Turbo QPP interleaver for a single processing engine case, (ii) hardware-optimized implementation for a single processing engine case, (iii) hardware implementation of an LTE Turbo QPP interleaver for a multiple processing engine case, and (iv) hardware-optimized implementation of an LTE Turbo QPP interleaver for a multiple processing engine case.
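As an illustration of pre-calculating initial values for a multiple-engine arrangement, the following Python sketch computes, ahead of time, the starting values of f′, f″, g′ and g″ for each engine. It is a sketch under assumptions that are not stated in this form in the text: engine n is assumed to start at interleaved position x=nM, with M=K/P, and the function name and dictionary keys are hypothetical. The multiplications and modulo operations here run once, offline, and their results would be stored for later loading.

```python
def engine_initial_values(a, b, K, P):
    """Pre-calculate starting f', f'', g', g'' for each of P engines (offline)."""
    assert K % P == 0
    M = K // P                            # section length handled by each engine
    table = []
    for n in range(P):
        x0 = n * M                        # ASSUMED start position of engine n
        f0 = (a * x0 + b * x0 * x0) % K   # f(x0) from Equation 1
        g0 = (a + b + 2 * b * x0) % K     # g(x0)
        table.append({
            "f_col": f0 % M, "f_row": f0 // M,   # f'(x0), f''(x0)
            "g_col": g0 % M, "g_row": g0 // M,   # g'(x0), g''(x0)
        })
    return table


# Illustrative parameters (assumed): K = 40, a = 3, b = 10, P = 4 engines.
init = engine_initial_values(3, 10, 40, 4)
assert init[0] == {"f_col": 0, "f_row": 0, "g_col": 3, "g_row": 1}
assert init[1] == {"f_col": 0, "f_row": 3, "g_col": 3, "g_row": 1}
```

In this example the column values are the same for every engine and only the row values differ, which follows from M being a factor of K.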
Assuming that there are P parallel engines, then for P−1 locations, the register values are unknown. These register values are dependent on the block size K. It can be proved that for each block size:
(i) As discussed above, in one embodiment of a hardware implementation of an LTE Turbo QPP interleaver for a single engine case, initialization involves calculating f(0)=0, g(0)=(a+b)mod K, c=(2*b)mod K, and f(x+1), g(x+1) as described above. Thus, for each LTE supported block size K, g(0) and c can be calculated during compile time and stored in RAM/ROM (there being no need to pre-calculate f(0) since its value is known to be zero). Therefore no hardware resource is required for the initialization.
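A compile-time table of this kind might be built as in the following Python sketch. The coefficient entries are assumptions included only to make the example runnable and should be checked against the applicable specification before any real use; the names QPP_COEFFS, build_init_table and INIT_TABLE are likewise hypothetical.

```python
# Hypothetical (K: (a, b)) coefficient entries, for illustration only.
QPP_COEFFS = {
    40: (3, 10),
    48: (7, 12),
    64: (7, 16),
}


def build_init_table(coeffs):
    """Pre-calculate g(0) = (a+b) mod K and c = (2b) mod K for each block size.

    f(0) is always zero, so it does not need to be stored.
    """
    return {
        K: {"g0": (a + b) % K, "c": (2 * b) % K}
        for K, (a, b) in coeffs.items()
    }


# Built once (for example at compile time) and then only read at run time.
INIT_TABLE = build_init_table(QPP_COEFFS)
assert INIT_TABLE[40] == {"g0": 13, "c": 20}
```

At run time the interleaver would look up the entry for the current block size and load the two values into its registers, so no arithmetic is needed during initialization.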
(ii) In one embodiment of a hardware-optimized implementation for a single engine case, one aim for hardware optimization is to increase the operating frequency of the interleaver. Operation of one embodiment of such an interleaver may be analyzed as follows:
Initialization:
f2(x+1) is modified. g3(x+1) and gK(x+1) are added.
Given that f(x) and g(x) are each smaller than K, the following steps are applied to calculate f(x+1) and g(x+1) (the resulting f(x+1), g(x+1) are smaller than K too):
In one embodiment, for each LTE supported block size K, g(0) and c are calculated during compile time and stored in ROM. Accordingly, the amount of hardware resources required for the initialization is reduced. Moreover, c−K and c−2K are calculated outside the interleaver and supplied to the interleaver as registered inputs.
This implementation reduces the length of the combinatorial path while performing more calculations in parallel, resulting in a higher operating frequency. Performance result comparisons between the standard hardware implementation and the optimized hardware implementation display the difference in resource usage.
(iii) In a hardware implementation of an LTE Turbo QPP interleaver for a multiple engine case, the interleaver operates according to the following analysis:
(iv) In one embodiment, for hardware optimized implementation, the mathematical formulation can be modified similarly to the single engine version:
It will be appreciated that, although the technique utilizing pre-calculation and storage of initial values has been described above in relation to LTE Turbo QPP interleaving, it need not be limited to such applications, and could alternatively be applied to other types of interleaving (the parameters best suited for initial value pre-calculation being a matter for design choice of the person of ordinary skill in the art depending on various design possibilities and performance considerations).
In at least one embodiment, the data interleaved by an interleaver or method of the present invention represent communication signals that are produced or received by a communication device. In one embodiment, the data represent demodulated RF communication signals that are demodulated from RF communication signals received by a base station. It is to be noted that “representing communication signals” is herein used broadly to refer to representing either the communication signals that are actually transmitted or some other signals derived therefrom (e.g., signals that are modulated or demodulated versions of the actually transmitted communication signals).
While the present invention has been particularly described with respect to the illustrated embodiments, it will be appreciated that various alterations, modifications and adaptations may be made based on the present disclosure, and are intended to be within the scope of the present invention. While the invention has been described in connection with what are presently considered to be the most practical and preferred embodiments, it is to be understood that the present invention is not limited to the disclosed embodiments but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the scope of the appended claims.
Number | Name | Date | Kind |
---|---|---|---|
5815421 | Dulong et al. | Sep 1998 | A |
6163871 | Yang | Dec 2000 | A |
6530059 | Crozier et al. | Mar 2003 | B1 |
6845482 | Yao et al. | Jan 2005 | B2 |
7085969 | Zheng et al. | Aug 2006 | B2 |
7468994 | Tsuchiya | Dec 2008 | B2 |
7667628 | Breiling | Feb 2010 | B2 |
8291291 | Pan et al. | Oct 2012 | B1 |
20030182524 | Tsuchiya | Sep 2003 | A1 |
20030221157 | Becker et al. | Nov 2003 | A1 |
20100241922 | Furuhashi et al. | Sep 2010 | A1 |
20130311850 | Shinohara et al. | Nov 2013 | A1 |
Entry |
---|
U.S. Appl. No. 12/291,695, filed Nov. 13, 2008, Pan et al. |
Cheng, M. “A Decoder Architecture for High-Speed Free-Space Laser Communications,” Proceedings of SPIE, Apr. 2005, vol. 5712, pp. 174-185. |