The present invention relates to forward error correction as used in wireless communication and other systems. In particular, it relates to a class of high performance Low-Density Parity-Check (LDPC) codes suitable for efficient practical implementations.
In a typical communication system, forward error correction (FEC) is often applied in order to improve the robustness of the system against a wide range of impairments of the communication channel. Referring first to
In many modern systems, FEC uses Low-Density Parity-Check (LDPC) codes that are applied to a block of information data of finite length. LDPC codes are specified by a parity check matrix of size M×N. This matrix defines an (N, K) LDPC block code, where K is the information block size, N is the length of the codeword, and M is the number of parity check bits, M=N−K. A general characteristic of the LDPC parity check matrix is the low density of non-zero elements, which allows the use of efficient decoding algorithms. The structure of the LDPC code parity check matrix is first outlined in the context of prior art hardware architectures that can exploit the properties of these parity check matrices.
In recent years much effort has been expended in the design of LDPC codes leading to very efficient hardware implementations that can achieve high coding gains with very small performance penalty.
In summary, the shortcomings of the prior art related to encoding variable length data packets are as follows:
(a) No general rules exist for shortening and puncturing patterns;
(b) No mechanism is provided for q>qrate preserved; and
(c) No limit on the amount of puncturing specified.
The present invention solves many of the aforementioned shortcomings, and has other advantages as will be obvious from the following description.
In a first aspect, the present invention comprises a method for low-density parity-check (LDPC) encoding of data comprising the steps of: selecting a sparse structured base parity check matrix; expanding the sparse structured base parity check matrix; and coding data in a channel coder using the resulting parity check matrix H=[Hd|Hp].
In a further aspect, the invention comprises a system for low-density parity-check encoding of data comprising: intra-layer memory (155); a set of permuters (1821)-(1829) operably coupled to the intra-layer memory (155) through a read network; a first further permuter (1851); a second further permuter (1852); a number of processing units (184) whose outputs are directed through a write network, and whose inputs are operably coupled to the set of permuters (1821)-(1829) through the read network, and through adders (1831) to the second further permuter (1852) that provides inter-layer feedback of special bits; a set of inter-layer storage elements (186) for preserving information between layers, the inputs of which are operably coupled to the special-bits outputs of the processing units (184), and the outputs of which are operably coupled to the inputs of the second further permuter (1852); and a set of inverse permuters (187) whose inputs are operably coupled to outputs of the processing units (184) through the write network, and through adders (1835) to outputs of the first further permuter (1851), and whose outputs are directed to the intra-layer memory (155), thereby efficiently processing groups of rows.
In a yet further aspect, the invention comprises a system or method for low-density parity-check (LDPC) encoding of data, using a base matrix selected from the group consisting of those listed in
In contrast to the prior art, the present invention, by combining structured base parity check matrices with expansion, allows trade-offs between throughput and complexity. As a consequence, it enables several times greater throughput than earlier approaches.
This summary of the invention does not necessarily describe all features of the invention.
Embodiments of the invention will be described with reference to the following figures:
FIGS. 26a, 26b, and 26c give matrices for use in relevant encoding methods and systems.
To set the context for the invention, we first discuss various aspects of architectures and encoding/decoding strategies.
Efficient decoder architectures are enabled by designing the parity check matrix around some structural assumptions: “structured” LDPC codes. There have been several advances in that direction. The structure of these matrices in turn defines the code.
One trend is that of composing the parity check matrix of sub-matrices of the form of binary permutation or “pseudo-permutation” matrices.
Permutation matrices are defined as square matrices with the property that each row and each column has one element equal to 1 and all other elements equal to 0. Pseudo-permutation matrices are not necessarily square, and they may have row(s) and/or column(s) consisting of all zeros. It has been shown that significant savings in wiring, memory, and power consumption are possible while still preserving the main portion of the coding gain. This approach enables various serial, parallel, and semi-parallel hardware architectures and therefore various trade-off mechanisms.
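These defining properties are easy to state in code. The following sketch (using NumPy purely for illustration; the function names are not from the original text) checks them:

```python
import numpy as np

def is_permutation(m):
    """True when every row and column of a square 0/1 matrix has exactly one 1."""
    m = np.asarray(m)
    return (m.shape[0] == m.shape[1]
            and set(np.unique(m)) <= {0, 1}
            and (m.sum(axis=0) == 1).all()
            and (m.sum(axis=1) == 1).all())

def is_pseudo_permutation(m):
    """Rows/columns may be all-zero, but none may contain more than one 1."""
    m = np.asarray(m)
    return (set(np.unique(m)) <= {0, 1}
            and (m.sum(axis=0) <= 1).all()
            and (m.sum(axis=1) <= 1).all())

P = np.roll(np.eye(4, dtype=int), 2, axis=1)   # circulant permutation matrix
assert is_permutation(P)
Q = P.copy()
Q[:, 0] = 0                                     # zero out one column
assert not is_permutation(Q) and is_pseudo_permutation(Q)
```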
This kind of structure allows the application of “layered” decoding (sometimes referred to as “layered belief propagation” decoding), which exhibits improved convergence properties compared to a conventional Sum-Product Algorithm (SPA) and its derivations. Each iteration of the layered decoding consists of a number of sub-iterations that equals the number of blocks of rows (or layers).
Another trend in LDPC parity check matrix design is the reduction in encoder complexity. Classical encoding of LDPC codes is much more complex than encoding of other advanced codes, such as turbo codes. In order to ease the problem it has become common to design systematic LDPC codes in which the part of the parity check matrix corresponding to the parity bits contains a lower triangular matrix. This allows simple recursive encoding. One simple example of a lower triangular matrix is a dual diagonal matrix as shown in
Here, the parity check matrix 30 is partitioned as H=[Hd|Hp]. Hd 31 is an M×K matrix that corresponds to the “data” bits of the codeword. The design of the Hd 31 matrix ensures high coding gain. Hp 32 is in this example an M×M “dual diagonal” matrix and corresponds to the “parity” bits of the codeword. These codes are systematic block codes. The codeword vector for these systematic (and also canonic in this case) codes has the structure:
where d=[d0 . . . dK-1]T is the block of (uncoded) data bits and p=[p0 . . . pM-1]T are the parity bits. A codeword is any binary (or non-binary, in general) N-vector c that satisfies:
Hc = Hdd + Hpp = 0
Thus, a given data block d is encoded by solving the binary equation Hdd=Hpp for the parity bits p. In principle, this involves inverting the M×M matrix Hp:
p=Hp−1Hdd
This assumes Hp is invertible. If Hp−1 is also low density then the direct encoding specified by the above formula can be done efficiently. However, with the dual diagonal structure of Hp 32, encoding can be performed as a simple recursive algorithm:
p0 = Σc h0,cdc, summed over the k0 columns c in which row 0 of Hd contains a "1";
p1 = p0 + Σc h1,cdc, summed over the k1 columns c in which row 1 contains a "1";
⋮
pM−1 = pM−2 + Σc hM−1,cdc, summed over the kM−1 columns c in which row M−1 contains a "1".
In these recursive expressions hr,c are non-zero elements (=1 in this example matrix) of the “data” part of the parity check matrix, Hd 31. The number of non-zero elements in rows 0, 1, . . . , M−1, is represented by k0, k1, . . . , kM-1, respectively.
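This recursion (p0 from row 0 of Hd, then each parity bit from the previous one) can be sketched as follows; the NumPy 0/1 representation and the 6×6 example sizes are assumptions for illustration:

```python
import numpy as np

def encode_dual_diagonal(Hd, d):
    """Recursive LDPC encoding for H = [Hd | Hp] with dual-diagonal Hp (GF(2)).

    Row m of the dual-diagonal Hp involves p[m-1] and p[m], so each parity
    bit follows from the previous one:
        p[0] = sum of the data bits selected by row 0 of Hd
        p[m] = p[m-1] + sum of the data bits selected by row m of Hd
    """
    M = Hd.shape[0]
    p = np.zeros(M, dtype=int)
    p[0] = Hd[0] @ d % 2
    for m in range(1, M):
        p[m] = (p[m - 1] + Hd[m] @ d) % 2
    return p

# Check against the parity equations H c = 0 on a random example
rng = np.random.default_rng(0)
Hd = rng.integers(0, 2, size=(6, 6))
Hp = np.eye(6, dtype=int) + np.eye(6, k=-1, dtype=int)   # dual diagonal
d = rng.integers(0, 2, size=6)
p = encode_dual_diagonal(Hd, d)
H = np.hstack([Hd, Hp])
c = np.concatenate([d, p])
assert (H @ c % 2 == 0).all()
```

Inverting Hp is never needed: the lower triangular structure lets each parity bit be read off in order.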
In order to accommodate larger block sizes without redesigning the parity check matrix, the original ("base") matrix is expanded. This can be done, for example, by replacing each non-zero element with a permutation matrix of the size of the expansion factor. The most common way of performing the expansion is as follows (see [11]):
Expansion of Hp is done by replacing each "0" element by an L×L zero matrix, 0L×L, and each "1" element by an L×L identity matrix, IL×L, where L represents the expansion factor.
Expansion of Hd is done by replacing each "0" element by an L×L zero matrix, 0L×L, and each "1" element by a circularly shifted version of an L×L identity matrix, IL×L. The shift order, s (the number of circular shifts, to the right for example), is determined for each non-zero element of the base matrix.
For hardware implementations, it is important to note that these expansions can be implemented without the need to significantly change the base hardware wiring.
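A minimal sketch of such an expansion follows. The "−1 means zero block, s ≥ 0 means identity rotated s positions" labeling is one common convention, assumed here for illustration, as is the tiny example base matrix:

```python
import numpy as np

def expand_base_matrix(base, L):
    """Expand a base matrix into an L-times-larger parity check matrix.

    Convention assumed here: entry -1 -> L x L zero block; entry s >= 0 ->
    L x L identity matrix circularly shifted s positions to the right.
    """
    Mb, Nb = base.shape
    H = np.zeros((Mb * L, Nb * L), dtype=int)
    I = np.eye(L, dtype=int)
    for r in range(Mb):
        for c in range(Nb):
            s = base[r, c]
            if s >= 0:
                H[r*L:(r+1)*L, c*L:(c+1)*L] = np.roll(I, s % L, axis=1)
    return H

base = np.array([[ 0,  5, -1],
                 [-1,  2,  0]])
H = expand_base_matrix(base, 4)
assert H.shape == (8, 12)
assert (H[0:4, 0:4] == np.eye(4, dtype=int)).all()   # shift 0 -> identity
assert H[0, 4 + (0 + 5) % 4] == 1                    # shift 5 mod 4 = 1
assert (H[0:4, 8:12] == 0).all()                     # -1 -> zero block
```

Because every block is either a zero block or a rotated identity, the wiring of a hardware implementation follows directly from the base matrix entries.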
The simple recursive algorithm described earlier can be still applied in a slightly modified form to the expanded matrix. If hi,j represent elements of the Hd portion of the expanded parity check matrix, then parity bits can be determined as follows:
p0 = h0,0d0 + h0,1d1 + h0,2d2 + … + h0,11d11
p1 = h1,0d0 + h1,1d1 + h1,2d2 + … + h1,11d11
p2 = h2,0d0 + h2,1d1 + h2,2d2 + … + h2,11d11
p3 = p0 + h3,0d0 + h3,1d1 + h3,2d2 + … + h3,11d11
p4 = p1 + h4,0d0 + h4,1d1 + h4,2d2 + … + h4,11d11
p5 = p2 + h5,0d0 + h5,1d1 + h5,2d2 + … + h5,11d11
p6 = p3 + h6,0d0 + h6,1d1 + h6,2d2 + … + h6,11d11
p7 = p4 + h7,0d0 + h7,1d1 + h7,2d2 + … + h7,11d11
p8 = p5 + h8,0d0 + h8,1d1 + h8,2d2 + … + h8,11d11
p9 = p6 + h9,0d0 + h9,1d1 + h9,2d2 + … + h9,11d11
p10 = p7 + h10,0d0 + h10,1d1 + h10,2d2 + … + h10,11d11
p11 = p8 + h11,0d0 + h11,1d1 + h11,2d2 + … + h11,11d11
However, when the expansion factor becomes large, the number of columns with only one non-zero element in Hp becomes large as well. This may have a negative effect on the performance of the code. One remedy for this situation is to use a slightly modified dual diagonal Hp matrix. This is illustrated with reference to
The parity check equations now become:
h0,0d0 + h0,1d1 + … + h0,11d11 + p0 + p3 = 0 [equ 0]
h1,0d0 + h1,1d1 + … + h1,11d11 + p1 + p4 = 0 [equ 1]
h2,0d0 + h2,1d1 + … + h2,11d11 + p2 + p5 = 0 [equ 2]
h3,0d0 + h3,1d1 + … + h3,11d11 + p0 + p3 + p6 = 0 [equ 3]
h4,0d0 + h4,1d1 + … + h4,11d11 + p1 + p4 + p7 = 0 [equ 4]
h5,0d0 + h5,1d1 + … + h5,11d11 + p2 + p5 + p8 = 0 [equ 5]
h6,0d0 + h6,1d1 + … + h6,11d11 + p6 + p9 = 0 [equ 6]
h7,0d0 + h7,1d1 + … + h7,11d11 + p7 + p10 = 0 [equ 7]
h8,0d0 + h8,1d1 + … + h8,11d11 + p8 + p11 = 0 [equ 8]
h9,0d0 + h9,1d1 + … + h9,11d11 + p0 + p9 = 0 [equ 9]
h10,0d0 + h10,1d1 + … + h10,11d11 + p1 + p10 = 0 [equ 10]
h11,0d0 + h11,1d1 + … + h11,11d11 + p2 + p11 = 0 [equ 11]
Now by summing up equations 0, 3, 6, and 9, the following expression is obtained:
(h0,0+h3,0+h6,0+h9,0)d0+(h0,1+h3,1+h6,1+h9,1)d1+ . . . +(h0,11+h3,11+h6,11+h9,11)d11+p0+p3+p0+p3+p6+p6+p9+p0+p9=0
Since only p0 appears an odd number of times in the equation above, all other parity check bits cancel except for p0, and thus:
p0 = (h0,0+h3,0+h6,0+h9,0)d0 + (h0,1+h3,1+h6,1+h9,1)d1 + … + (h0,11+h3,11+h6,11+h9,11)d11
Likewise:
p1 = (h1,0+h4,0+h7,0+h10,0)d0 + (h1,1+h4,1+h7,1+h10,1)d1 + … + (h1,11+h4,11+h7,11+h10,11)d11
p2 = (h2,0+h5,0+h8,0+h11,0)d0 + (h2,1+h5,1+h8,1+h11,1)d1 + … + (h2,11+h5,11+h8,11+h11,11)d11
After determining p0, p1, p2 the other parity check bits are obtained recursively:
p3 = h0,0d0 + h0,1d1 + … + h0,11d11 + p0
p4 = h1,0d0 + h1,1d1 + … + h1,11d11 + p1
p5 = h2,0d0 + h2,1d1 + … + h2,11d11 + p2
p6 = h3,0d0 + h3,1d1 + … + h3,11d11 + p0 + p3
p7 = h4,0d0 + h4,1d1 + … + h4,11d11 + p1 + p4
p8 = h5,0d0 + h5,1d1 + … + h5,11d11 + p2 + p5
p9 = h6,0d0 + h6,1d1 + … + h6,11d11 + p6
p10 = h7,0d0 + h7,1d1 + … + h7,11d11 + p7
p11 = h8,0d0 + h8,1d1 + … + h8,11d11 + p8
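The two-stage procedure for this 12-row modified dual-diagonal example (first p0, p1, p2 from the summed equations, then the remaining parity bits recursively) can be sketched as follows; the 12×12 Hd and the random data are assumptions for illustration:

```python
import numpy as np

def encode_modified_dual_diagonal(Hd, d):
    """Two-stage encoding for the 12-row modified dual-diagonal example (GF(2)).

    Stage 1: p0, p1, p2 from the sums of equations {0,3,6,9}, {1,4,7,10},
    and {2,5,8,11}. Stage 2: p3..p11 recursively from equations 0..8.
    """
    s = Hd @ d % 2                           # s[m] = data contribution of row m
    p = np.zeros(12, dtype=int)
    for j in range(3):                       # summed-equation stage
        p[j] = s[[j, j + 3, j + 6, j + 9]].sum() % 2
    for m in range(3, 12):                   # recursive stage
        if m < 6:                            # equations 0..2
            p[m] = (s[m - 3] + p[m - 3]) % 2
        elif m < 9:                          # equations 3..5
            p[m] = (s[m - 3] + p[m - 6] + p[m - 3]) % 2
        else:                                # equations 6..8
            p[m] = (s[m - 3] + p[m - 3]) % 2
    return p

# Hp implied by the parity bit positions of equations [equ 0]..[equ 11]
rows = [[0, 3], [1, 4], [2, 5], [0, 3, 6], [1, 4, 7], [2, 5, 8],
        [6, 9], [7, 10], [8, 11], [0, 9], [1, 10], [2, 11]]
Hp = np.zeros((12, 12), dtype=int)
for r, cols in enumerate(rows):
    Hp[r, cols] = 1

rng = np.random.default_rng(1)
Hd = rng.integers(0, 2, size=(12, 12))
d = rng.integers(0, 2, size=12)
p = encode_modified_dual_diagonal(Hd, d)
check = np.hstack([Hd, Hp]) @ np.concatenate([d, p]) % 2
assert (check == 0).all()
```

Equations 9 to 11 need not be solved explicitly: once p0, p1, p2 make each summed set of equations cancel and equations 0 to 8 hold, equations 9 to 11 are satisfied automatically.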
One desirable feature of LDPC codes is that they support various required code rates and block sizes. A common approach is to have a small "base" (or "mother") matrix defined for each required code rate and to support various block sizes by expanding the base matrix. Since a range of block sizes usually must be supported, expansion is typically defined for the largest block size, and some algorithm then specifies the expansion for the smaller block sizes. Below is an example of a base matrix specification:
In this example specification the base matrix is designed for the code rate R=½ and its dimensions are (Mb×Nb)=(6×12). Assume that the block (codeword) sizes (lengths) to be supported are in the range N=[72,144], with increments of 12, i.e. N=[72, 84, . . . , 132, 144]. In order to accommodate those block lengths the parity check matrix needs to be of the appropriate size: the number of columns must match N, the block length, and the number of rows is defined by the code rate, M=(1−R)N. To generate these matrices the base matrix is expanded appropriately. How this expansion is done is defined by the base matrix elements, which relate to the expansion supporting the maximum block size. One convention, used in this example, for interpreting the numbers in the base matrix is as follows:
The following example shows a rotated identity matrix where the integer specifying rotation equals 5:
Therefore, for the largest block (codeword) size of N=144, the base matrix needs to be expanded by a factor of 12. The final parity check matrix used for encoding and generating the codeword of size 144 is then of size (72×144). In other words, the base matrix was expanded Lmax=12 times (from 6×12 to 72×144), L being the expansion factor. For block sizes smaller than the maximum, the base matrix is expanded by a factor L<Lmax. In this case expansion is performed in a similar fashion, except that the matrices IL and 0L are used instead of I12 and 012, respectively. The integers specifying the amount of rotation of the appropriate identity matrix, IL, are derived from those corresponding to the maximum expansion by applying some algorithm. For example, such an algorithm may be a simple modulo operation:
rL = (rLmax) modulo L
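For instance, under this modulo rule a rotation designed as r = 11 for Lmax = 12 maps as follows (the specific rotation value is an illustrative assumption, not taken from the example base matrix):

```python
def derive_shift(r_max, L):
    """Rotation for expansion factor L, derived from the maximum-size design
    via the simple modulo rule r_L = r_Lmax mod L."""
    return r_max % L

assert derive_shift(11, 12) == 11   # full-size design keeps its rotation
assert derive_shift(11, 8) == 3     # rotation used when L = 8
assert derive_shift(11, 6) == 5     # rotation used when L = 6
```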
Often base parity check matrices are designed to follow some assumed degree distribution. Degree distribution is defined as the distribution of column weights of the parity check matrix. Column weight in turn equals the number of 1's in a column. It has been shown that irregular degree distributions offer the best performance on Additive White Gaussian Noise (AWGN) channels. However, one consequence is that the base matrix does not exhibit any structure in its Hd part. Only after expansion does the final matrix attain the general form of
Some prior art base matrix designs reduce the number of rows (and consequently the number of blocks of rows in the expanded matrix) by combining rows, increasing the code rate without changing the degree distribution. Although the number of rows in a high rate derived matrix thereby becomes smaller, it is still relatively large, since the original (low rate) base matrix usually has quite a large number of rows in order to allow row combining. Furthermore, decoding time then becomes a function of the code rate: the higher the code rate, the fewer layers in the layered decoding and, in general, the less time taken by the decoder. One consequence is that decoding of the original low rate code typically takes longer than normal.
A data packet of length L is required to be encoded using an LDPC block code (N,K), as previously presented, where K is the information block size, N is the length of the codeword, and M is the number of parity check bits, M=N−K. The encoded data is to be transmitted using a number of symbols, each carrying S bits.
This scenario is described with reference to
(a) Keep the performance in terms of coding gain at as high a level as possible. This objective translates into the following needs:
(b) Use as few of the modulated symbols as possible. This in turn means that it is desirable to utilize transmit power economically. This is especially important for hand-held wireless and other devices operating on batteries. Keeping the "air time" at a minimum translates into prolonged battery life.
(c) Keep the overall complexity at a reasonable level. This usually translates into a requirement to operate with a relatively small set of codewords of different size. In addition, it is desirable to have a code designed in such a way that various codeword lengths can be implemented efficiently. Finally, the actual procedure defining concatenation rules should be simple.
From (a) above it follows that in order to use a small number of codewords, an efficient shortening and puncturing operation needs to be applied. However, those operations have to be implemented in a way that neither compromises the coding gain advantage of LDPC codes, nor lowers the overall transmit efficiency unnecessarily. This is particularly important when using a special class of LDPC parity check matrices that enable a simple encoding operation. These special matrices employ either a lower triangular, a dual-diagonal, or a modified dual-diagonal structure in the portion of the matrix corresponding to the parity check bits. An example of a dual-diagonal matrix is shown in
There have been several efforts to achieve efficient puncturing (with or without shortening). Most of the work has been done around the "rate compatible" approach. One or more LDPC parity check matrices are designed for the low code rate application. By applying the appropriate puncturing, the same matrix can be used for a range of code rates higher than the original code rate. These methods predominantly target applications where adaptive coding (e.g. Hybrid Automatic Repeat Request, H-ARQ) and/or unequal bit protection is desired.
In some cases puncturing is combined with code extension in order to mitigate the problems of the "puncturing only" case. The main problem that researchers are trying to solve here is to preserve an optimum degree distribution through the process of modifying the original parity check matrix.
However, these methods do not directly address the problem described earlier: apply shortening and puncturing in such a way that the code rate is approximately the same as the original one and the coding gain is preserved.
A method which attempts to solve this particular problem has been described. This method specifies shortening and puncturing such that the code rate of the original code is preserved. The following notation is used:
Npunctured—Number of punctured bits
Nshortened—Number of shortened bits
The shortening-to-puncturing ratio, q, is defined as q=Nshortened/Npunctured. In order to preserve the same code rate, q has to satisfy the following equation:
qrate preserved = R/(1−R)
Two methods are prescribed for choosing which bits to "shorten" and which to "puncture" (the "shortening" and "puncturing" patterns). The choice of method depends on the code rate (i.e. on the particular parity check matrix design). This fact itself suggests that the method fails to prescribe general rules for performing shortening and puncturing. In addition, this particular method was intended for the IEEE 802.16e standard, available at www.ieee.org, the entirety of which is incorporated herein by reference, for which preservation of the code rate seems to be of the essence. However, this is an unnecessary restriction for the general case described in
Both methods are applied to shortening and puncturing of the expanded matrices as described in Dale Hocevar and Anuj Batra, "Shortening and Puncturing Scheme to Simplify LDPC Decoder Implementation," Jan. 11, 2005, a contribution to the informal IEEE 802.16e LDPC ad-hoc group, the entirety of which is incorporated herein by reference. These matrices are generated by periodically shortening/puncturing bits with the periods and offsets taken from a set of base matrices (one base matrix per code rate). Whereas this method preserves the column weight distribution as a positive feature, it may severely disturb the row weight distribution of the original matrix. This, in turn, causes degradation when common iterative decoding algorithms are used. This adverse effect strongly depends on the structure of the expanded matrix, which is why a second method was introduced that somewhat mitigates the problems for some of the matrices under consideration. The second method is based on the first, with some reordering of the original pattern.
In general, the amount of puncturing needs to be limited. Extensive puncturing beyond certain limits paralyzes the soft decision decoder. Prior-art methods, none of which specify a puncturing limit or alternatively offer some other way for mitigating the problem, may potentially compromise the performance significantly.
The present invention introduces a system and method that allow coding matrices to be expanded to accommodate various information packet sizes and to support various code rates; additionally, the invention defines a number of coding matrices particularly suited to the system and method of the present invention. The system and method enable high throughput implementations, achieve low latency, and offer numerous other implementation benefits. At the same time, the new parity part of the matrix still preserves the simple (recursive) encoding feature.
In accordance with one embodiment of the present invention, a general form is shown in
The “data” part (Hd) may also be placed on the right side of the “parity” (Hp) portion of the parity check matrix. In the most general case, columns from Hd and Hp may be interchanged.
Some constructions of these base matrices allow more efficient encoding.
Hp,present invention(m) = T(Hp,prior art)
where T is the transform describing the base matrix expansion process and m is the size of the permutation matrices. For m=1, Hp of the present invention reduces to the form of the prior art Hp (dual diagonal with the odd-weight column), i.e.
Hp,present invention(1) = T(Hp,prior art)
A further pair of sub-matrices 905, 906 illustrate cases where these first and last columns, respectively, have only one sub-matrix each.
The last two sub-matrices 907, 908 in
Various advantages of embodiments of the present invention will now be described with reference to the matrix 100 of
It can be seen that expanded matrix 110 has inherited structural properties of its base (mother) matrix 100 from
The solution as described so far has some restrictions on the degree distribution (distribution of column weights) of the parity check matrix. Although the solution is also applicable to irregular codes, the degree distribution limitation compromises the performance of the code to some extent.
This restriction is virtually eliminated by the following enhancements allowing the embodiments of the present invention to be used in a general case when such a division of the parity check matrix is not required. The matrix can be expanded to accommodate various information packet sizes and can be designed for various code rates. This method enables high throughput implementations, permits low latency, and offers other implementation benefits. At the same time, the new parity part of the matrix can be still designed to preserve the simple encoding feature.
These changes mean that the sub-matrices 121 in
The enhancements modify the layered belief propagation architecture described in [1], for example. Layered belief propagation decoding is next briefly described with reference to
A high level architectural block diagram is shown in
In order to support a more general approach proposed by the present invention, this basic architecture needs to be modified. One example of such a modification is shown in
By exercising careful design of the parity check matrix the additional inter-layer storage 155 in
The iterative parallel decoding process is best described as a read-modify-write operation. The read operation is performed by a set of permuters, which deliver information from memory modules to the corresponding processing units. Parity check matrices designed with the structured regularity described earlier allow efficient hardware implementations (e.g., fixed routing, use of simple barrel shifters) for both the read and write networks. Memory modules are organized so as to provide extrinsic information efficiently to the processing units.
Processing units implement block (layered) decoding (updating iterative information for a block of rows) using any known iterative algorithm (e.g., Sum-Product, Min-Sum, or Bahl-Cocke-Jelinek-Raviv (BCJR)).
Inverse permuters are part of the write network that performs the write operation back to the memory modules.
Such parallel decoding is directly applicable when the parity check matrix is constructed from permutation (or pseudo-permutation) and zero sub-matrices.
One advantage of the present invention is an addition of special sub-matrices S, as described above in relation to
Parallel decoding is also applicable with the previously described modification to the methodology; that is, when the base matrix includes sub-matrices built by concatenation of smaller permutation matrices.
It can be seen that for the decoding layer 171 a first processing unit receives information in the first row 179 from bit 1 (according to S21), bit 6 (S22), bit 9 (S23), bit 13 (S124), bit 15 (S224), bit 21 (S26), and bit 24 (S29). Other processing units are loaded in a similar way.
It is well known that for layered belief propagation type decoding algorithms, the processing unit inputs extrinsic information accumulated by all other layers, excluding the layer currently being processed. Thus, the prior art implementation described using
This is illustrated in the
For simplicity,
Examples of the new features enabled by embodiments of the invention are listed below.
The present invention's methodology for constructing the parity check matrix supports both regular and irregular types of parity check matrix. This means not only that the whole matrix may be irregular (non-constant weight of its rows and columns) but also that its constituents Hd and Hp may be irregular as well, if such a partition is desired. Matrix constructions based on the present invention provide all the other features described at no performance penalty.
This is illustrated by an example in which the goal is to enable a hardware architecture design that targets high throughput thereby minimizing the latency.
The decoding of the LDPC codes can be done in several ways. In general, iterative decoding is applied. The most common is the “classical” Sum-Product Algorithm (SPA) method. In the SPA case, each iteration comprises two steps:
a. horizontal step, during which all row variables are updated at the same time based on the column variables; and
b. vertical step, during which all column variables are updated at the same time based on row variables.
It has been shown that better performance, in terms of speed of convergence, can be achieved with layered decoding. In that case only row variables are updated for a block of rows, one block of rows at a time. The fastest approach is to process all the rows within a block of rows simultaneously.
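One sub-iteration of such layered decoding can be sketched with a min-sum check-node update (a common simplified stand-in for the full Sum-Product update; the function and variable names here are illustrative, not from the original text, and each row is assumed to contain at least two ones):

```python
import numpy as np

def layered_minsum_update(H_layer, llr, row_msgs):
    """One layered-decoding sub-iteration for a block of rows (min-sum).

    H_layer  : 0/1 matrix for this layer (each row with >= 2 ones, none zero)
    llr      : a-posteriori LLR per code bit, updated in place
    row_msgs : stored check-to-variable messages for this layer
    """
    for r, row in enumerate(H_layer):
        cols = np.flatnonzero(row)
        ext = llr[cols] - row_msgs[r, cols]      # remove this layer's old messages
        sgn = np.where(ext >= 0, 1.0, -1.0)
        mags = np.abs(ext)
        order = np.argsort(mags)
        m1, m2 = mags[order[0]], mags[order[1]]  # two smallest magnitudes
        out_mag = np.full(len(cols), m1)
        out_mag[order[0]] = m2                   # "min of the others" per edge
        new_msgs = np.prod(sgn) * sgn * out_mag  # sign of the other edges
        llr[cols] = ext + new_msgs               # write back updated LLRs
        row_msgs[r, cols] = new_msgs
    return llr, row_msgs

# One check row connecting bits 0, 1, 2 of a 4-bit code
H_layer = np.array([[1, 1, 1, 0]])
llr = np.array([2.0, -3.0, 4.0, 1.0])
row_msgs = np.zeros((1, 4))
layered_minsum_update(H_layer, llr, row_msgs)
```

Because the layer's own old messages are subtracted before the check-node update, each layer works on extrinsic information accumulated by all the other layers, which is what gives layered decoding its faster convergence.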
The following is a comparison of the achievable throughput (bit rate) of two LDPC codes: one based on the prior art expanded matrix, as described earlier in
T=(K×F)/(C×I),
where K is the number of information bits, F is the clock frequency, C is the number of cycles per iteration, and I is the number of iterations. Assuming that K, F, and I are fixed and, for example, equal K=320 bits, F=100 MHz, and I=10, the only difference between the prior art and the present invention comes from C, a factor which is basically a measure of the level of allowed parallelism. It can be seen by comparing
Cprior
Using these numbers in the formula gives:
Tmax,prior
Tmax,present
As expected, this is a 4× greater maximum throughput. In addition, all the desirable features of the code design in terms of efficient encoding are preserved. This means that the encoding algorithm as described earlier with respect to
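The throughput formula can be checked with a short calculation. The cycle counts below are hypothetical placeholders (the example's actual C values are not reproduced here), chosen only to show how a 4:1 reduction in cycles per iteration yields the 4× gain:

```python
def throughput(K, F, C, I):
    """T = (K * F) / (C * I), in information bits per second."""
    return (K * F) / (C * I)

K, F, I = 320, 100e6, 10      # values from the example above
C_prior, C_new = 24, 6        # hypothetical cycles per iteration, 4:1 ratio
T_prior = throughput(K, F, C_prior, I)
T_new = throughput(K, F, C_new, I)
assert T_new == 4 * T_prior   # fewer cycles per iteration -> higher throughput
```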
When a scalable solution is desired, the size of the expanded LDPC parity check matrix is designed to support the maximum block size. The solutions of the prior art do not scale well with respect to throughput for various block sizes. For example, in the case of the prior art's preferred layered decoding, processing of short and long blocks takes the same amount of time. The only difference is that for shorter blocks not all of the processing units are used. Therefore, for short blocks, the achieved throughput is proportionally lower. The following example is based on the same case as before (comparing matrices as described earlier with respect to
The following table compares the computed results.
It can be seen from the table that the methodology of the present invention provides constant throughput independent of the codeword size, whereas in the prior art case the throughput for smaller blocks drops considerably. Furthermore, in order to handle the maximum block size, both solutions require the same number of processing units (80 in the example above). However, while the new methodology fully utilizes all available processing resources irrespective of block size, the prior art methodologies utilize all processing units only for the largest block, and only a fraction of the total resources in other cases. The throughput therefore suffers in the prior art case for all block sizes except the maximum one. As previously mentioned, there is no performance penalty for this in the present invention.
The previous example showing throughput improvement for shorter blocks can also be used to conclude that better latency is also achieved with the new methodology. In such cases, large blocks of data are broken into smaller pieces, so that the encoded data is split among multiple codewords. If one places a shorter codeword at the end of the series of longer codewords, then the total latency depends primarily on the decoding time of the last codeword. According to the table above, short blocks require proportionally less time to be decoded (as compared to the longer codewords), thereby allowing acceptable latency to be achieved by encoding the data in suitably short blocks, an effect not achievable using prior art methodologies.
The previous examples illustrated full hardware utilization. However, hardware scaling is also enabled, so that short blocks can use proportionately less hardware resources if an application requires it.
In addition, utilization of more efficient processing units and memory blocks is enabled. Memory can be organized to process a number of variables in parallel, and can therefore be partitioned in parallel. This design is not practicable with prior art solutions.
The present invention is well suited for use with all known decoding algorithms of LDPC codes as well as many more general ones. It can be implemented in hardware and/or software and it demonstrates benefits in both implementations.
Further, the present invention enables flexible rate adjustments by the use of shortening, or puncturing, or a combination thereof. Block length flexibility is also enabled through expansion, shortening, or puncturing, or combinations thereof. Any of these operations can be applied to the base or expanded matrices.
If the base matrix is designed with some additional constraints, then base matrices for different code rates can be derived from one original base matrix in one of two ways:
a. Row combining. In this case higher code rate base matrices are derived from an original low rate base matrix by combining (adding together) rows of the base matrix using specific constraints. The first of these is that either the rows of the base matrix to be combined must not have overlapping elements, or if the rows of the base matrix have overlapping elements, then the rows of the expanded matrix must not have overlapping elements. The second constraint is that rows to be combined must belong to different blocks of rows of the original matrix. Both of the constraints must be imposed in such a way as to ensure preservation of the properties for a block of rows as in the original matrix.
b. Row splitting. In this case lower code rate base matrices are derived from an original high code rate base matrix by splitting rows using a specific constraint, namely that the number of blocks of rows in the derived base matrix is the same as in the original matrix.
Row combining or row splitting, with the specific constraints defined above, allows efficient coding of a new set of expanded derived base matrices. In these cases the number of layers may be as low as the minimum number of block rows (layers) in the original base matrix.
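The row-combining constraint on non-overlapping elements can be sketched in code. The sketch below is hypothetical: the base matrix is a toy example (not one of the patent's matrices), entries are circulant shift values, and -1 is assumed here as the marker for a null (all-zero) block.

```python
# Hypothetical sketch of row combining on a small base matrix.
# Entries are circulant shifts; -1 marks a null (all-zero) block.
def rows_overlap(r1, r2):
    """Two base-matrix rows overlap if some column is non-null in both."""
    return any(a != -1 and b != -1 for a, b in zip(r1, r2))

def combine_rows(base, i, j):
    """Derive a higher-rate base matrix by merging rows i and j.

    Allowed only when the rows have no overlapping non-null elements
    (the first constraint described above)."""
    if rows_overlap(base[i], base[j]):
        raise ValueError("rows overlap; cannot combine")
    merged = [a if a != -1 else b for a, b in zip(base[i], base[j])]
    return [row for k, row in enumerate(base) if k not in (i, j)] + [merged]

base = [
    [0,   1, -1, -1,  2, -1, -1, -1],   # 4 rows x 8 columns: rate 1/2
    [-1,  0,  5, -1, -1,  1, -1, -1],
    [-1, -1,  3,  4, -1, -1,  0, -1],
    [2,  -1, -1,  0, -1, -1, -1,  6],
]
higher = combine_rows(base, 0, 2)       # rows 0 and 2 do not overlap
```

Combining rows 1 and 3 of the result as well would leave two rows over the same eight columns, raising the code rate from 1/2 to 3/4, consistent with the row-combining rule described above.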
The present invention may be beneficially applied to both the transmitter and the receiver.
The present invention addresses other shortcomings of the prior art as follows:
(a) Specifies general rules for shortening and puncturing patterns
(b) Provides a mechanism for q>qrate
(c) Establishes a limit on the amount of puncturing
(d) Provides an algorithmic method for finding the optimal solution within the range of given system parameters.
Although developed for wireless systems, embodiments of the invention can be applied to any other communication system which involves encoding of variable size data packets by a fixed error correcting block code.
The advantage of this invention can be summarized as providing an optimal solution to the described problem over the given range of system parameters (performance, power consumption, and complexity). It comprises the following steps:
These steps are more fully shown in the flow chart
At step 213, the minimum number of modulated symbols Nsym
Both the encoder and the decoder are presented with the same input parameters in order to be able to apply the same procedure and consequently use the same codeword size, as well as other relevant derived parameters such as the amount of shortening and puncturing for each of the codewords, the number of codewords, etc. In some cases only the transmitter (encoder) has all the parameters available, and the receiver (decoder) is presented with some derived version of the encoding procedure parameters. For example, in some applications it is desirable to reduce the initial negotiation time between the transmitter and the receiver. In such cases the transmitter initially informs the receiver of the number of modulated symbols it is going to use for transmitting the encoded bits, rather than the actual data packet size. The transmitter then performs the encoding procedure differently, taking into consideration the receiver's abilities (e.g. using some form of higher layer protocol for negotiation). In this case some of the requirements are relaxed in order to counteract the deficiencies of the information at the receiver side. For example, the use of additional modulated symbols to enhance performance may always be in place, may be bypassed altogether, or may be assumed for certain ranges of payload sizes (e.g. indirectly specified by the number of modulated symbols).
One example of such an encoding procedure is the IEEE 802.11n Orthogonal Frequency Division Multiplexing (OFDM) based transceiver. In this case the reference to the number of bits per modulated symbol translates into the number of bits per OFDM symbol. Also, in this particular case, the AggregationFlag parameter is used to differentiate between the case when both the encoder and the decoder are aware of the actual data packet size (AggregationFlag=0) and the case when the packet size is indirectly specified by the number of required OFDM symbols (AggregationFlag=1).
Each of those features will now be described in more detail.
Much effort has been expended by the coding research community on the design of LDPC parity check matrices such that the derived codes provide optimum performance. Examples include T. J. Richardson et al., "Design of Capacity-Approaching Irregular Low-Density Parity-Check Codes," IEEE Transactions on Information Theory, vol. 47, February 2001, and S. Y. Chung et al., "Analysis of sum-product decoding of low-density parity-check codes using a Gaussian approximation," IEEE Transactions on Information Theory, vol. 47, February 2001, both of which are incorporated herein by reference. These investigations show that, in order to provide optimum performance, a particular variable node degree distribution should be applied. "Degree distribution" refers here to the distribution of the column weights in a parity check matrix. This distribution depends, in general, on the code rate and the size of the parity check matrix, or codeword. It is desirable that the puncturing and shortening patterns, as well as the number of punctured/shortened bits, be specified in such a way that the variable node degree distribution is preserved as much as possible. However, since shortening and puncturing are qualitatively different operations, different rules apply to them, as will now be explained.
Shortening of a code is defined as sending fewer information bits than possible with a given code, K′&lt;K. The encoding is performed by: taking K′ bits from the information source, presetting the remaining K−K′ information bit positions in the codeword to a pre-defined value (usually 0), computing the M parity bits by using the full M×N parity check matrix, and finally forming the codeword to be transmitted by concatenating the K′ information bits and the M parity bits. One way to determine which bits to "shorten" in the "data" portion of the parity check matrix, Hd (31 in
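The shortened-encoding steps just listed can be sketched with a toy systematic code. The parity part P below is an arbitrary illustration (not one of the patent's matrices), and the systematic generator form G=[I|P] is an assumption made for compactness.

```python
import numpy as np

# Toy systematic code: parity bits p = u @ P mod 2, codeword = [u | p].
# P is chosen arbitrarily for illustration only.
K, M = 4, 3                       # info bits, parity bits; N = K + M
P = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]])         # K x M parity part

def encode_shortened(info_bits):
    """Encode K' <= K information bits, presetting the rest to 0."""
    k_prime = len(info_bits)
    u = np.zeros(K, dtype=int)
    u[:k_prime] = info_bits       # remaining K - K' positions stay 0
    parity = u @ P % 2            # all M parity bits, from the full code
    # transmitted codeword: K' info bits + M parity bits (the preset
    # zeros are known to the decoder and need not be sent)
    return np.concatenate([u[:k_prime], parity])

cw = encode_shortened([1, 0, 1])  # K' = 3 < K = 4
```

The transmitted codeword has K′+M = 6 bits rather than N = 7, which is the defining property of shortening.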
3 3 3 8 3 3 3 8 3 3 3 8
When discarding columns, the aim is to ensure that the ratio of '8's to '3's remains close to optimal, say 1:3 in this case. Obviously the ratio cannot remain exactly 1:3 when one to three columns are removed. In such circumstances, the removal of two columns might, for example, result in:
3 3 8 3 3 8 3 3 3 8
giving a ratio of ~1:2.3, and the removal of a third column, one with weight '8', might result in:
3 3 3 3 8 3 3 3 8
thus preserving a ratio of 1:3.5, which is closer to 1:3 than would be the case if the third column removed had weight '3', which results in:
8 3 3 3 8 3 3 3 8
giving a ratio of 1:2.
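The worked example above can be reproduced with a simple greedy search. The sketch below is one possible heuristic, not the patent's prescribed algorithm: at each step it discards whichever column weight keeps the eights-to-threes ratio closest to the 1:3 target.

```python
from collections import Counter

def ratio_error(weights, target=3.0):
    """Distance of the threes-per-eight ratio from the target (3 '3's per '8')."""
    c = Counter(weights)
    if c[8] == 0:
        return float("inf")       # no '8's left: ratio undefined, avoid
    return abs(c[3] / c[8] - target)

def shorten_columns(weights, n_remove, target=3.0):
    """Greedily discard n_remove columns, keeping the ratio near 1:target."""
    w = list(weights)
    for _ in range(n_remove):
        # try removing one column of each distinct weight, keep the best
        candidates = []
        for wt in set(w):
            trial = list(w)
            trial.remove(wt)
            candidates.append((ratio_error(trial, target), trial))
        w = min(candidates, key=lambda t: t[0])[1]
    return w

result = shorten_columns([3, 3, 3, 8] * 3, 3)
```

Starting from the 3 3 3 8 distribution above, the greedy rule removes two weight-3 columns and then one weight-8 column, matching the sequence traced in the text.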
It is also important to preserve approximately constant row weight throughout the shortening process.
An alternative to the approach just described is to prearrange the columns of the data part of the parity check matrix, Hd, such that the shortening can be applied to consecutive columns in Hd. Although perhaps suboptimal, this method keeps the degree distribution of Hd close to the optimum. The simplicity of the shortening "pattern" (taking out consecutive columns of Hd) gives a significant advantage by reducing complexity. Furthermore, approximately constant row weight is guaranteed (assuming the original matrix satisfies this condition). An example of this concept is illustrated in
After rearranging the columns of the Hd part of the original matrix the new matrix takes on the form 221 shown in
In the case of a column regular parity check matrix (or, more generally, one that is approximately regular, or regular and approximately regular only in the data part of the matrix, Hd), the method described in the previous paragraph is still preferred compared to the random or periodic/random approach described in [8]. The method described here ensures approximately constant row weight, which is a further advantage from both the performance and the implementation complexity standpoints.
Puncturing of a code is defined as removing parity bits from the codeword. In a wider sense, though, puncturing is defined as removing some of the bits (parity bits, data bits, or both) from the codeword prior to sending the encoded bits to the modulator block and subsequently over the channel. The operation of puncturing increases the effective code rate. Puncturing is equivalent to a total erasure of the bits by the channel. The soft iterative decoder assumes a completely neutral value for those erased bits; when the soft information used by the decoder is the log-likelihood ratio, this neutral value is zero.
Puncturing of LDPC codes can be given an additional, somewhat different, interpretation. An LDPC code can be presented in the form of the “bipartite” graph
Each variable node 231 is connected 234 by lines ("edges"), for example 233, to all the check nodes 232 in which that particular bit participates. Similarly, each check node (corresponding to a parity check equation) is connected by a set of edges 237 to all variable nodes corresponding to bits participating in that particular parity check equation. If a bit is punctured (for example node 235), then all the check nodes connected to it (those connected by thicker lines 236) are negatively affected. Therefore, if a bit chosen for puncturing participates in many parity checks, the performance degradation may be very high. On the other hand, since the only way the missing information (corresponding to the punctured bits) can be recovered is from the messages coming from the check nodes in which those punctured bits participate, the more such check nodes there are, the more successful the recovery may be. Faced with these contradictory requirements, the optimum solution is to be found somewhere in the middle. Embodiments of the invention use the following general rules for puncturing:
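The degree trade-off just described can be made concrete by counting, for each bit, the parity checks it participates in (its variable-node degree). The matrix and the lowest-degree-first ordering below are an illustrative heuristic, not the patent's actual puncturing rules.

```python
# Rank candidate puncturing bits by variable-node degree: low-degree bits
# damage fewer check nodes when erased, but also receive fewer recovery
# messages, hence the trade-off described above.  H is a toy matrix.
def variable_degrees(H):
    """Column weights of a parity check matrix given as a list of rows."""
    return [sum(row[j] for row in H) for j in range(len(H[0]))]

H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
deg = variable_degrees(H)
# one simple (hypothetical) candidate ordering: lowest-degree bits first
order = sorted(range(len(deg)), key=lambda j: deg[j])
```

In the bipartite-graph picture, `deg[j]` is exactly the number of check nodes connected to variable node j, i.e. the number of check nodes "negatively affected" if bit j is punctured.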
Some of these trade-offs can be observed from
It is obvious from the
The matrix in
As has been previously mentioned, in the case where the preservation of the exact code rate is not mandatory, the shortening-to-puncturing ratio can be chosen such that it guarantees preservation of the performance level of the original code. Normalizing the shortening to puncturing ratio, q, as follows:
qnormalized=(Nshortened/Npunctured)/[R/(1−R)],
makes qnormalized independent of the code rate, R. Therefore, qnormalized=1 corresponds to the rate-preserving case of combined shortening and puncturing. However, if the goal is to preserve performance, this normalized ratio must be greater than one: qnormalized>1. It was found through much experimentation that qnormalized in the range of 1.2-1.5 complies with the performance preserving requirements.
A large percentage of punctured bits paralyzes the iterative soft decision decoder. In the case of LDPC codes this is true even if puncturing is combined with some other operation such as shortening or extending the code. One could conclude this by studying the matrix 250 of
Ppuncture=100×(Npuncture/M),
then it can be seen that the matrix 250 from
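The puncturing percentage defined above is a one-line computation; the numbers used below are illustrative only.

```python
# Puncturing percentage, as defined above; M is the number of parity bits.
def puncture_percentage(n_puncture, M):
    return 100.0 * n_puncture / M

p = puncture_percentage(80, 324)   # e.g. erasing 80 of 324 parity bits
```

For these illustrative values, roughly a quarter of the parity bits are punctured, the kind of figure against which a "too much puncturing paralyzes the decoder" limit would be checked.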
Characteristics of the embodiments of the present invention may include:
The system, apparatus, and method as described above are preferably combined with one or more matrices shown in the
The matrices in
A first group of matrices (
A further matrix (
The rate R=¾ matrices (
The rate R=⅚ matrix (
The two rate R=⅚ matrices (
s′=floor{s·(L/96)}, where s is the right circular shift corresponding to the maximum codeword size (L=Lmax=96), as specified in the matrix definitions.
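The shift-scaling rule s′=floor{s·(L/96)} can be sketched directly. The treatment of null blocks (marked -1 here) and of zero shifts is an assumed convention for this sketch; the matrix definitions themselves specify only the scaling of the positive right circular shifts.

```python
import math

L_MAX = 96                       # maximum codeword expansion factor

def scale_shift(s, L):
    """Scale a right circular shift s (defined for L = L_MAX) down to L.

    Null blocks (-1) and zero shifts are passed through unchanged,
    an assumption made here for illustration."""
    if s <= 0:
        return s
    return math.floor(s * L / L_MAX)

row = [40, -1, 95, 0, 7]                      # illustrative shift values
scaled = [scale_shift(s, 24) for s in row]    # L = 24, a quarter of L_MAX
```

Scaling preserves the relative position of each shift within its circulant as the block size shrinks, which is what lets one set of matrix definitions serve all supported codeword sizes.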
The present invention has been described with regard to one or more embodiments. However, it will be apparent to persons skilled in the art that a number of variations and modifications can be made without departing from the scope of the invention as defined in the claims.
Filing Document | Filing Date | Country | Kind | 371c Date
---|---|---|---|---
PCT/CA05/01563 | 10/12/2005 | WO | 00 | 7/29/2008
Number | Date | Country
---|---|---
60617902 | Oct 2004 | US
60627348 | Nov 2004 | US
60635525 | Dec 2004 | US
60638832 | Dec 2004 | US
60639420 | Dec 2004 | US
60647259 | Jan 2005 | US
60656587 | Feb 2005 | US
60673323 | Apr 2005 | US