The present invention relates to forward error correction as used in wireless communication and other systems. In particular, it relates to a class of high performance Low-Density Parity-Check (LDPC) codes suitable for efficient practical implementations.
In a typical communication system, forward error correction (FEC) is often applied in order to improve the robustness of the system against a wide range of impairments of the communication channel. Referring first to
In many modern systems, FEC uses Low-Density Parity-Check (LDPC) codes that are applied to a block of information data of finite length. LDPC codes are specified by a parity check matrix of size M×N. This matrix defines an (N, K) LDPC block code, where K is the information block size, N is the length of the codeword, and M is the number of parity check bits, M=N−K. A general characteristic of the LDPC parity check matrix is the low density of non-zero elements, which allows the use of efficient decoding algorithms. The structure of the LDPC code parity check matrix is first outlined in the context of prior art hardware architectures that can exploit the properties of these parity check matrices.
In recent years much effort has been expended in the design of LDPC codes leading to very efficient hardware implementations that can achieve high coding gains with very small performance penalty.
In summary, the shortcomings of the prior art related to encoding variable length data packets are as follows:
(a) No general rules exist for shortening and puncturing patterns;
(b) No mechanism is provided for q>qrate preserved; and
(c) No limit on the amount of puncturing specified.
The present invention solves many of the aforementioned shortcomings, and has other advantages as will be obvious from the following description.
In a first aspect, the present invention comprises a method for low-density parity-check (LDPC) encoding of data comprising the steps of: selecting a sparse structured base parity check matrix; expanding the sparse structured base parity check matrix; and coding data in a channel coder using the resulting parity check matrix H=[Hd|Hp].
In a further aspect, the invention comprises a system for low-density parity-check encoding of data comprising: intra-layer memory (155); a set of permuters (1821)-(1829) operably coupled to the intra-layer memory (155) through a read network; a first further permuter (1851); a second further permuter (1852); a number of processing units (184) whose outputs are directed through a write network, and whose inputs are operably coupled to the set of permuters (1821)-(1829) through the read network, and through adders (1831) to the second further permuter (1852) that provides inter-layer feedback of special bits; a set of inter-layer storage elements (186) for preserving information between layers, the inputs of which are operably coupled to the special-bits outputs of the processing units (184), and the outputs of which are operably coupled to the inputs of the second further permuter (1852); and a set of inverse permuters (187) whose inputs are operably coupled to outputs of the processing units (184) through the write network, and through adders (1835) to outputs of the first further permuter (1851), and whose outputs are directed to the intra-layer memory (155), thereby efficiently processing groups of rows.
In a yet further aspect, the invention comprises a system or method for low-density parity-check (LDPC) encoding of data, using a base matrix selected from the group consisting of those listed in
In contrast to the prior art, the present invention, by combining structured base parity check matrices with expansion, allows trade-offs between throughput and complexity. As a consequence, it enables several times greater throughput than earlier approaches.
This summary of the invention does not necessarily describe all features of the invention.
Embodiments of the invention will be described with reference to the following figures:
FIGS. 26a, 26b, and 26c give matrices for use in relevant encoding methods and systems.
To set the context for the invention, we first discuss various aspects of architectures and encoding/decoding strategies.
Efficient decoder architectures are enabled by designing the parity check matrix around some structural assumptions: “structured” LDPC codes. There have been several advances in that direction. The structure of these matrices in turn defines the code.
One trend is that of composing the parity check matrix of sub-matrices of the form of binary permutation or “pseudo-permutation” matrices.
Permutation matrices are defined as square matrices with the property that each row and each column has one element equal to 1 and all other elements equal to 0. Pseudo-permutation matrices are not necessarily square, and they may have row(s) and/or column(s) consisting of all zeros. It has been shown that significant savings in wiring, memory, and power consumption are possible while still preserving the main portion of the coding gain. This approach enables various serial, parallel, and semi-parallel hardware architectures and therefore various trade-off mechanisms.
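These defining properties are easy to state in code. The following sketch (using NumPy purely for illustration; the function names are not from the original text) checks them:

```python
import numpy as np

def is_permutation(m):
    """True when every row and column of a square 0/1 matrix has exactly one 1."""
    m = np.asarray(m)
    return (m.shape[0] == m.shape[1]
            and set(np.unique(m)) <= {0, 1}
            and (m.sum(axis=0) == 1).all()
            and (m.sum(axis=1) == 1).all())

def is_pseudo_permutation(m):
    """Rows/columns may be all-zero, but none may contain more than one 1."""
    m = np.asarray(m)
    return (set(np.unique(m)) <= {0, 1}
            and (m.sum(axis=0) <= 1).all()
            and (m.sum(axis=1) <= 1).all())

P = np.roll(np.eye(4, dtype=int), 2, axis=1)   # circulant permutation matrix
assert is_permutation(P)
Q = P.copy()
Q[:, 0] = 0                                     # zero out one column
assert not is_permutation(Q) and is_pseudo_permutation(Q)
```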
This kind of structure allows the application of “layered” decoding (sometimes referred to as “layered belief propagation” decoding), which exhibits improved convergence properties compared to a conventional Sum-Product Algorithm (SPA) and its derivations. Each iteration of the layered decoding consists of a number of sub-iterations that equals the number of blocks of rows (or layers).
Another trend in LDPC parity check matrix design is the reduction in encoder complexity. Classical encoding of LDPC codes is much more complex than encoding of other advanced codes, such as turbo codes. In order to ease the problem it has become common to design systematic LDPC codes in which the part of the parity check matrix corresponding to the parity bits contains a lower triangular matrix. This allows simple recursive encoding. One simple example of a lower triangular matrix is a dual diagonal matrix as shown in
Here, the parity check matrix 30 is partitioned as H=[Hd|Hp]. Hd 31 is an M×K matrix that corresponds to the “data” bits of the codeword. The design of the Hd 31 matrix ensures high coding gain. Hp 32 is in this example an M×M “dual diagonal” matrix and corresponds to the “parity” bits of the codeword. These codes are systematic block codes. The codeword vector for these systematic (and also canonic in this case) codes has the structure:
where d=[d0 . . . dK-1]T is the block of (uncoded) data bits and p=[p0 . . . pM-1]T are the parity bits. A codeword is any binary (or non-binary, in general) N-vector c that satisfies:
Hc = Hdd + Hpp = 0
Thus, a given data block d is encoded by solving the binary equation Hdd=Hpp for the parity bits p. In principle, this involves inverting the M×M matrix Hp:
p=Hp−1Hdd
This assumes Hp is invertible. If Hp−1 is also low density then the direct encoding specified by the above formula can be done efficiently. However, with the dual diagonal structure of Hp 32, encoding can be performed as a simple recursive algorithm:
p0 = Σc h0,cdc, summed over the k0 columns c in which row 0 of Hd contains a "1";
p1 = p0 + Σc h1,cdc, summed over the k1 columns c in which row 1 contains a "1";
⋮
pM−1 = pM−2 + Σc hM−1,cdc, summed over the kM−1 columns c in which row M−1 contains a "1".
In these recursive expressions hr,c are non-zero elements (=1 in this example matrix) of the “data” part of the parity check matrix, Hd 31. The number of non-zero elements in rows 0, 1, . . . , M−1, is represented by k0, k1, . . . , kM-1, respectively.
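This recursion (p0 from row 0 of Hd, then each parity bit from the previous one) can be sketched as follows; the NumPy 0/1 representation and the 6×6 example sizes are assumptions for illustration:

```python
import numpy as np

def encode_dual_diagonal(Hd, d):
    """Recursive LDPC encoding for H = [Hd | Hp] with dual-diagonal Hp (GF(2)).

    Row m of the dual-diagonal Hp involves p[m-1] and p[m], so each parity
    bit follows from the previous one:
        p[0] = sum of the data bits selected by row 0 of Hd
        p[m] = p[m-1] + sum of the data bits selected by row m of Hd
    """
    M = Hd.shape[0]
    p = np.zeros(M, dtype=int)
    p[0] = Hd[0] @ d % 2
    for m in range(1, M):
        p[m] = (p[m - 1] + Hd[m] @ d) % 2
    return p

# Check against the parity equations H c = 0 on a random example
rng = np.random.default_rng(0)
Hd = rng.integers(0, 2, size=(6, 6))
Hp = np.eye(6, dtype=int) + np.eye(6, k=-1, dtype=int)   # dual diagonal
d = rng.integers(0, 2, size=6)
p = encode_dual_diagonal(Hd, d)
H = np.hstack([Hd, Hp])
c = np.concatenate([d, p])
assert (H @ c % 2 == 0).all()
```

Inverting Hp is never needed: the lower triangular structure lets each parity bit be read off in order.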
In order to accommodate larger block sizes without redesigning the parity check matrix, the original ("base") matrix is expanded. This can be done, for example, by replacing each non-zero element with a permutation matrix of the size of the expansion factor. The most common way of performing the expansion is as follows (see [11]):
Expansion of Hp is done by replacing each "0" element by an L×L zero matrix, 0L×L, and each "1" element by an L×L identity matrix, IL×L, where L represents the expansion factor.
Expansion of Hd is done by replacing each "0" element by an L×L zero matrix, 0L×L, and each "1" element by a circularly shifted version of an L×L identity matrix, IL×L. The shift order, s (the number of circular shifts, to the right for example), is determined for each non-zero element of the base matrix.
For hardware implementations, it is important to note that these expansions can be implemented without the need to significantly change the base hardware wiring.
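A minimal sketch of such an expansion follows. The "−1 means zero block, s ≥ 0 means identity rotated s positions" labeling is one common convention, assumed here for illustration, as is the tiny example base matrix:

```python
import numpy as np

def expand_base_matrix(base, L):
    """Expand a base matrix into an L-times-larger parity check matrix.

    Convention assumed here: entry -1 -> L x L zero block; entry s >= 0 ->
    L x L identity matrix circularly shifted s positions to the right.
    """
    Mb, Nb = base.shape
    H = np.zeros((Mb * L, Nb * L), dtype=int)
    I = np.eye(L, dtype=int)
    for r in range(Mb):
        for c in range(Nb):
            s = base[r, c]
            if s >= 0:
                H[r*L:(r+1)*L, c*L:(c+1)*L] = np.roll(I, s % L, axis=1)
    return H

base = np.array([[ 0,  5, -1],
                 [-1,  2,  0]])
H = expand_base_matrix(base, 4)
assert H.shape == (8, 12)
assert (H[0:4, 0:4] == np.eye(4, dtype=int)).all()   # shift 0 -> identity
assert H[0, 4 + (0 + 5) % 4] == 1                    # shift 5 mod 4 = 1
assert (H[0:4, 8:12] == 0).all()                     # -1 -> zero block
```

Because every block is either a zero block or a rotated identity, the wiring of a hardware implementation follows directly from the base matrix entries.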
The simple recursive algorithm described earlier can be still applied in a slightly modified form to the expanded matrix. If hi,j represent elements of the Hd portion of the expanded parity check matrix, then parity bits can be determined as follows:
p0 = h0,0d0 + h0,1d1 + h0,2d2 + … + h0,11d11
p1 = h1,0d0 + h1,1d1 + h1,2d2 + … + h1,11d11
p2 = h2,0d0 + h2,1d1 + h2,2d2 + … + h2,11d11
p3 = p0 + h3,0d0 + h3,1d1 + h3,2d2 + … + h3,11d11
p4 = p1 + h4,0d0 + h4,1d1 + h4,2d2 + … + h4,11d11
p5 = p2 + h5,0d0 + h5,1d1 + h5,2d2 + … + h5,11d11
p6 = p3 + h6,0d0 + h6,1d1 + h6,2d2 + … + h6,11d11
p7 = p4 + h7,0d0 + h7,1d1 + h7,2d2 + … + h7,11d11
p8 = p5 + h8,0d0 + h8,1d1 + h8,2d2 + … + h8,11d11
p9 = p6 + h9,0d0 + h9,1d1 + h9,2d2 + … + h9,11d11
p10 = p7 + h10,0d0 + h10,1d1 + h10,2d2 + … + h10,11d11
p11 = p8 + h11,0d0 + h11,1d1 + h11,2d2 + … + h11,11d11
However, when the expansion factor becomes large, the number of columns with only one non-zero element in Hp becomes large as well. This may have a negative effect on the performance of the code. One remedy for this situation is to use a slightly modified dual diagonal Hp matrix. This is illustrated with reference to
The parity check equations now become:
h0,0d0 + h0,1d1 + … + h0,11d11 + p0 + p3 = 0 [equ 0]
h1,0d0 + h1,1d1 + … + h1,11d11 + p1 + p4 = 0 [equ 1]
h2,0d0 + h2,1d1 + … + h2,11d11 + p2 + p5 = 0 [equ 2]
h3,0d0 + h3,1d1 + … + h3,11d11 + p0 + p3 + p6 = 0 [equ 3]
h4,0d0 + h4,1d1 + … + h4,11d11 + p1 + p4 + p7 = 0 [equ 4]
h5,0d0 + h5,1d1 + … + h5,11d11 + p2 + p5 + p8 = 0 [equ 5]
h6,0d0 + h6,1d1 + … + h6,11d11 + p6 + p9 = 0 [equ 6]
h7,0d0 + h7,1d1 + … + h7,11d11 + p7 + p10 = 0 [equ 7]
h8,0d0 + h8,1d1 + … + h8,11d11 + p8 + p11 = 0 [equ 8]
h9,0d0 + h9,1d1 + … + h9,11d11 + p0 + p9 = 0 [equ 9]
h10,0d0 + h10,1d1 + … + h10,11d11 + p1 + p10 = 0 [equ 10]
h11,0d0 + h11,1d1 + … + h11,11d11 + p2 + p11 = 0 [equ 11]
Now by summing up equations 0, 3, 6, and 9, the following expression is obtained:
(h0,0+h3,0+h6,0+h9,0)d0+(h0,1+h3,1+h6,1+h9,1)d1+ . . . +(h0,11+h3,11+h6,11+h9,11)d11+p0+p3+p0+p3+p6+p6+p9+p0+p9=0
Since only p0 appears an odd number of times in the equation above, all other parity check bits cancel except for p0, and thus:
p0 = (h0,0+h3,0+h6,0+h9,0)d0 + (h0,1+h3,1+h6,1+h9,1)d1 + … + (h0,11+h3,11+h6,11+h9,11)d11
Likewise:
p1 = (h1,0+h4,0+h7,0+h10,0)d0 + (h1,1+h4,1+h7,1+h10,1)d1 + … + (h1,11+h4,11+h7,11+h10,11)d11
p2 = (h2,0+h5,0+h8,0+h11,0)d0 + (h2,1+h5,1+h8,1+h11,1)d1 + … + (h2,11+h5,11+h8,11+h11,11)d11
After determining p0, p1, p2 the other parity check bits are obtained recursively:
p3 = h0,0d0 + h0,1d1 + … + h0,11d11 + p0
p4 = h1,0d0 + h1,1d1 + … + h1,11d11 + p1
p5 = h2,0d0 + h2,1d1 + … + h2,11d11 + p2
p6 = h3,0d0 + h3,1d1 + … + h3,11d11 + p0 + p3
p7 = h4,0d0 + h4,1d1 + … + h4,11d11 + p1 + p4
p8 = h5,0d0 + h5,1d1 + … + h5,11d11 + p2 + p5
p9 = h6,0d0 + h6,1d1 + … + h6,11d11 + p6
p10 = h7,0d0 + h7,1d1 + … + h7,11d11 + p7
p11 = h8,0d0 + h8,1d1 + … + h8,11d11 + p8
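The two-stage procedure for this 12-row modified dual-diagonal example (first p0, p1, p2 from the summed equations, then the remaining parity bits recursively) can be sketched as follows; the 12×12 Hd and the random data are assumptions for illustration:

```python
import numpy as np

def encode_modified_dual_diagonal(Hd, d):
    """Two-stage encoding for the 12-row modified dual-diagonal example (GF(2)).

    Stage 1: p0, p1, p2 from the sums of equations {0,3,6,9}, {1,4,7,10},
    and {2,5,8,11}. Stage 2: p3..p11 recursively from equations 0..8.
    """
    s = Hd @ d % 2                           # s[m] = data contribution of row m
    p = np.zeros(12, dtype=int)
    for j in range(3):                       # summed-equation stage
        p[j] = s[[j, j + 3, j + 6, j + 9]].sum() % 2
    for m in range(3, 12):                   # recursive stage
        if m < 6:                            # equations 0..2
            p[m] = (s[m - 3] + p[m - 3]) % 2
        elif m < 9:                          # equations 3..5
            p[m] = (s[m - 3] + p[m - 6] + p[m - 3]) % 2
        else:                                # equations 6..8
            p[m] = (s[m - 3] + p[m - 3]) % 2
    return p

# Hp implied by the parity bit positions of equations [equ 0]..[equ 11]
rows = [[0, 3], [1, 4], [2, 5], [0, 3, 6], [1, 4, 7], [2, 5, 8],
        [6, 9], [7, 10], [8, 11], [0, 9], [1, 10], [2, 11]]
Hp = np.zeros((12, 12), dtype=int)
for r, cols in enumerate(rows):
    Hp[r, cols] = 1

rng = np.random.default_rng(1)
Hd = rng.integers(0, 2, size=(12, 12))
d = rng.integers(0, 2, size=12)
p = encode_modified_dual_diagonal(Hd, d)
check = np.hstack([Hd, Hp]) @ np.concatenate([d, p]) % 2
assert (check == 0).all()
```

Equations 9 to 11 need not be solved explicitly: once p0, p1, p2 make each summed set of equations cancel and equations 0 to 8 hold, equations 9 to 11 are satisfied automatically.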
One desirable feature of LDPC codes is that they support various required code rates and block sizes. A common approach is to have a small "base" (or "mother") matrix defined for each required code rate and to support various block sizes by expanding the base matrix. Since a range of block sizes usually must be supported, expansion is typically defined for the largest block size, and some algorithm then specifies the expansion for the smaller block sizes. Below is an example of a base matrix specification:
In this example specification the base matrix is designed for the code rate R=½ and its dimensions are (Mb×Nb)=(6×12). Assume that the block (codeword) sizes (lengths) to be supported are in the range N=[72,144], with increments of 12, i.e. N=[72, 84, . . . , 132, 144]. In order to accommodate those block lengths the parity check matrix needs to be of the appropriate size: the number of columns must match N, the block length, and the number of rows is defined by the code rate, M=(1−R)N. To generate these matrices the base matrix is expanded appropriately. How this expansion is done is defined by the base matrix elements, which relate to the expansion supporting the maximum block size. One convention, used in this example, for interpreting the numbers in the base matrix is as follows:
The following example shows a rotated identity matrix where the integer specifying rotation equals 5:
Therefore, for the largest block (codeword) size of N=144, the base matrix needs to be expanded by a factor of 12. The final parity check matrix used for encoding and generating the codeword of size 144 is then of size (72×144). In other words, the base matrix was expanded Lmax=12 times (from 6×12 to 72×144), L being the expansion factor. For block sizes smaller than the maximum, the base matrix is expanded by a factor L<Lmax. In this case expansion is performed in a similar fashion, except that the matrices IL and 0L are used instead of I12 and 012, respectively. The integers specifying the amount of rotation of the appropriate identity matrix, IL, are derived from those corresponding to the maximum expansion by applying some algorithm. For example, such an algorithm may be a simple modulo operation:
rL = (rLmax) modulo L
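For instance, under this modulo rule a rotation designed as r = 11 for Lmax = 12 maps as follows (the specific rotation value is an illustrative assumption, not taken from the example base matrix):

```python
def derive_shift(r_max, L):
    """Rotation for expansion factor L, derived from the maximum-size design
    via the simple modulo rule r_L = r_Lmax mod L."""
    return r_max % L

assert derive_shift(11, 12) == 11   # full-size design keeps its rotation
assert derive_shift(11, 8) == 3     # rotation used when L = 8
assert derive_shift(11, 6) == 5     # rotation used when L = 6
```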
Often base parity check matrices are designed to follow some assumed degree distribution. Degree distribution is defined as the distribution of column weights of the parity check matrix. Column weight in turn equals the number of 1's in a column. It has been shown that irregular degree distributions offer the best performance on Additive White Gaussian Noise (AWGN) channels. However, one consequence is that the base matrix does not exhibit any structure in its Hd part. Only after expansion does the final matrix attain the general form of
Some prior art base matrix designs reduce the number of rows (and consequently the number of blocks of rows in the expanded matrix) by combining rows, increasing the code rate without changing the degree distribution. Although the number of rows in a high rate derived matrix thereby becomes smaller, it is still relatively large, since the original (low rate) base matrix usually has quite a large number of rows in order to allow row combining. Furthermore, decoding time then becomes a function of the code rate: the higher the code rate, the fewer layers in the layered decoding and, in general, the less time taken by the decoder. One consequence is that decoding of the original low rate code typically takes longer than normal.
A data packet of length L is required to be encoded using an LDPC block code (N,K), as previously presented, where K is the information block size, N is the length of the codeword, and M is the number of parity check bits, M=N−K. The encoded data is to be transmitted using a number of symbols, each carrying S bits.
This scenario is described with reference to
(a) Keep the performance in terms of coding gain at as high a level as possible. This objective translates into the following needs:
(b) Use as few of the modulated symbols as possible. This in turn means that it is desirable to utilize transmit power economically. This is especially important for hand-held wireless and other devices operating on batteries. Keeping the "air time" at a minimum translates into prolonged battery life.
(c) Keep the overall complexity at a reasonable level. This usually translates into a requirement to operate with a relatively small set of codewords of different size. In addition, it is desirable to have a code designed in such a way that various codeword lengths can be implemented efficiently. Finally, the actual procedure defining concatenation rules should be simple.
From (a) above it follows that in order to use a small number of codewords, an efficient shortening and puncturing operation needs to be applied. However, those operations have to be implemented in a way that neither compromises the coding gain advantage of LDPC codes, nor lowers the overall transmit efficiency unnecessarily. This is particularly important when using a special class of LDPC parity check matrices that enable a simple encoding operation. These special matrices employ either a lower triangular, a dual-diagonal, or a modified dual-diagonal structure in the portion of the matrix corresponding to the parity check bits. An example of a dual-diagonal matrix is shown in
There have been several efforts to achieve efficient puncturing (with or without shortening). Most of the work has been done around the "rate compatible" approach. One or more LDPC parity check matrices are designed for the low code rate application. By applying the appropriate puncturing, the same matrix can be used for a range of code rates higher than the original code rate. These methods predominantly target applications where adaptive coding (e.g. Hybrid Automatic Repeat Request, H-ARQ) and/or unequal bit protection is desired.
In some cases puncturing is combined with code extension in order to mitigate the problems of the "puncturing only" case. The main problem that researchers are trying to solve here is to preserve an optimum degree distribution through the process of modifying the original parity check matrix.
However, these methods do not directly address the problem described earlier: apply shortening and puncturing in such a way that the code rate is approximately the same as the original one and the coding gain is preserved.
A method which attempts to solve this particular problem has been described. This method specifies shortening and puncturing such that the code rate of the original code is preserved. The following notation is used:
Npunctured—Number of punctured bits
Nshortened—Number of shortened bits
The shortening-to-puncturing ratio, q, is defined as q=Nshortened/Npunctured. In order to preserve the same code rate, q has to satisfy the following equation:
qrate preserved = R/(1−R)
Two methods are prescribed for choosing which bits to "shorten" and which to "puncture" (the "shortening" and "puncturing" patterns). The choice of method depends on the code rate (i.e. on the particular parity check matrix design). This fact itself suggests that the method fails to prescribe general rules for performing shortening and puncturing. In addition, this particular method was intended for the IEEE 802.16e standard, available at www.ieee.org, the entirety of which is incorporated herein by reference, for which preservation of the code rate seems to be of the essence. However, this is an unnecessary restriction for the general case described in
Both methods are applied to shortening and puncturing of the expanded matrices as described in Dale Hocevar and Anuj Batra, "Shortening and Puncturing Scheme to Simplify LDPC Decoder Implementation," Jan. 11, 2005, a contribution to the informal IEEE 802.16e LDPC ad-hoc group, the entirety of which is incorporated herein by reference. These matrices are generated by periodically shortening/puncturing bits with the periods and offsets taken from a set of base matrices (one base matrix per code rate). Whereas this method preserves the column weight distribution as a positive feature, it may severely disturb the row weight distribution of the original matrix. This, in turn, causes degradation when common iterative decoding algorithms are used. This adverse effect strongly depends on the structure of the expanded matrix, which is why a second method was introduced that somewhat mitigates the problems for some of the matrices under consideration. The second method is based on the first, with some reordering of the original pattern.
In general, the amount of puncturing needs to be limited. Extensive puncturing beyond certain limits paralyzes the soft decision decoder. Prior-art methods, none of which specify a puncturing limit or alternatively offer some other way for mitigating the problem, may potentially compromise the performance significantly.
The present invention introduces a system and method that allow coding matrices to be expanded to accommodate various information packet sizes and to support various code rates; additionally, the invention defines a number of coding matrices particularly suited to the system and method of the present invention. The system and method enable high throughput implementations, achieve low latency, and offer numerous other implementation benefits. At the same time, the new parity part of the matrix still preserves the simple (recursive) encoding feature.
In accordance with one embodiment of the present invention, a general form is shown in
The “data” part (Hd) may also be placed on the right side of the “parity” (Hp) portion of the parity check matrix. In the most general case, columns from Hd and Hp may be interchanged.
Some constructions of these base matrices allow more efficient encoding.
Hp,present invention(m) = T(Hp,prior art)
where T is the transform describing the base matrix expansion process and m is the size of the permutation matrices. For m=1, Hp of the present invention reduces to the form of the prior art Hp (dual diagonal with the odd-weight column), i.e.
Hp,present invention(1) = T(Hp,prior art)
A further pair of sub-matrices 905, 906 illustrate cases where these first and last columns, respectively, have only one sub-matrix each.
The last two sub-matrices 907, 908 in
Various advantages of embodiments of the present invention will now be described with reference to the matrix 100 of
It can be seen that expanded matrix 110 has inherited structural properties of its base (mother) matrix 100 from
The solution as described so far has some restrictions on the degree distribution (distribution of column weights) of the parity check matrix. Although the solution is also applicable to irregular codes, the degree distribution limitation compromises the performance of the code to some extent.
This restriction is virtually eliminated by the following enhancements allowing the embodiments of the present invention to be used in a general case when such a division of the parity check matrix is not required. The matrix can be expanded to accommodate various information packet sizes and can be designed for various code rates. This method enables high throughput implementations, permits low latency, and offers other implementation benefits. At the same time, the new parity part of the matrix can be still designed to preserve the simple encoding feature.
These changes mean that the sub-matrices 121 in
The enhancements modify the layered belief propagation architecture described in [1], for example. Layered belief propagation decoding is next briefly described with reference to
A high level architectural block diagram is shown in
In order to support a more general approach proposed by the present invention, this basic architecture needs to be modified. One example of such a modification is shown in
By exercising careful design of the parity check matrix the additional inter-layer storage 155 in
The iterative parallel decoding process is best described as a read-modify-write operation. The read operation is performed by a set of permuters, which deliver information from memory modules to the corresponding processing units. Parity check matrices designed with the structured regularity described earlier allow efficient hardware implementations (e.g., fixed routing, use of simple barrel shifters) for both the read and write networks. Memory modules are organized so as to provide extrinsic information efficiently to the processing units.
Processing units implement block (layered) decoding (updating iterative information for a block of rows) using any known iterative algorithm (e.g., Sum-Product, Min-Sum, or Bahl-Cocke-Jelinek-Raviv (BCJR)).
Inverse permuters are part of the write network that performs the write operation back to the memory modules.
Such parallel decoding is directly applicable when the parity check matrix is constructed from permutation (or pseudo-permutation) and zero sub-matrices.
One advantage of the present invention is an addition of special sub-matrices S, as described above in relation to
Parallel decoding is also applicable with the previously described modification to the methodology; that is, when the base matrix includes sub-matrices built by concatenation of smaller permutation matrices.
It can be seen that for the decoding layer 171 a first processing unit receives information in the first row 179 from bit 1 (according to S21), bit 6 (S22), bit 9 (S23), bit 13 (S124), bit 15 (S224), bit 21 (S26), and bit 24 (S29). Other processing units are loaded in a similar way.
It is well known that for layered belief propagation type decoding algorithms, the processing unit inputs extrinsic information accumulated by all other layers, excluding the layer currently being processed. Thus, the prior art implementation described using
This is illustrated in the
For simplicity,
Examples of the new features enabled by embodiments of the invention are listed below.
The present invention's methodology for constructing the parity check matrix supports both regular and irregular types of parity check matrix. This means not only that the whole matrix may be irregular (non-constant weight of its rows and columns) but also that its constituents Hd and Hp may be irregular as well, if such a partition is desired. Matrix constructions based on the present invention provide all the other features described at no performance penalty.
This is illustrated by an example in which the goal is to enable a hardware architecture design that targets high throughput thereby minimizing the latency.
The decoding of the LDPC codes can be done in several ways. In general, iterative decoding is applied. The most common is the “classical” Sum-Product Algorithm (SPA) method. In the SPA case, each iteration comprises two steps:
a. horizontal step, during which all row variables are updated at the same time based on the column variables; and
b. vertical step, during which all column variables are updated at the same time based on row variables.
It has been shown that better performance, in terms of speed of convergence, can be achieved with layered decoding. In that case only row variables are updated for a block of rows, one block of rows at a time. The fastest approach is to process all the rows within a block of rows simultaneously.
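One sub-iteration of such layered decoding can be sketched with a min-sum check-node update (a common simplified stand-in for the full Sum-Product update; the function and variable names here are illustrative, not from the original text, and each row is assumed to contain at least two ones):

```python
import numpy as np

def layered_minsum_update(H_layer, llr, row_msgs):
    """One layered-decoding sub-iteration for a block of rows (min-sum).

    H_layer  : 0/1 matrix for this layer (each row with >= 2 ones, none zero)
    llr      : a-posteriori LLR per code bit, updated in place
    row_msgs : stored check-to-variable messages for this layer
    """
    for r, row in enumerate(H_layer):
        cols = np.flatnonzero(row)
        ext = llr[cols] - row_msgs[r, cols]      # remove this layer's old messages
        sgn = np.where(ext >= 0, 1.0, -1.0)
        mags = np.abs(ext)
        order = np.argsort(mags)
        m1, m2 = mags[order[0]], mags[order[1]]  # two smallest magnitudes
        out_mag = np.full(len(cols), m1)
        out_mag[order[0]] = m2                   # "min of the others" per edge
        new_msgs = np.prod(sgn) * sgn * out_mag  # sign of the other edges
        llr[cols] = ext + new_msgs               # write back updated LLRs
        row_msgs[r, cols] = new_msgs
    return llr, row_msgs

# One check row connecting bits 0, 1, 2 of a 4-bit code
H_layer = np.array([[1, 1, 1, 0]])
llr = np.array([2.0, -3.0, 4.0, 1.0])
row_msgs = np.zeros((1, 4))
layered_minsum_update(H_layer, llr, row_msgs)
```

Because the layer's own old messages are subtracted before the check-node update, each layer works on extrinsic information accumulated by all the other layers, which is what gives layered decoding its faster convergence.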
The following is a comparison of the achievable throughput (bit rate) of two LDPC codes: one based on the prior art expanded matrix, as described earlier in
T=(K×F)/(C×I),
where K is the number of information bits, F is the clock frequency, C is the number of cycles per iteration, and I is the number of iterations. Assuming that K, F, and I are fixed and, for example, equal K=320 bits, F=100 MHz, and I=10, the only difference between the prior art and the present invention comes from C, a factor which is basically a measure of the level of allowed parallelism. It can be seen by comparing
Cprior
Using these numbers in the formula gives:
Tmax,prior
Tmax,present
As expected, this is a 4× greater maximum throughput. In addition, all the desirable features of the code design in terms of efficient encoding are preserved. This means that the encoding algorithm as described earlier with respect to
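The throughput formula can be checked with a short calculation. The cycle counts below are hypothetical placeholders (the example's actual C values are not reproduced here), chosen only to show how a 4:1 reduction in cycles per iteration yields the 4× gain:

```python
def throughput(K, F, C, I):
    """T = (K * F) / (C * I), in information bits per second."""
    return (K * F) / (C * I)

K, F, I = 320, 100e6, 10      # values from the example above
C_prior, C_new = 24, 6        # hypothetical cycles per iteration, 4:1 ratio
T_prior = throughput(K, F, C_prior, I)
T_new = throughput(K, F, C_new, I)
assert T_new == 4 * T_prior   # fewer cycles per iteration -> higher throughput
```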
When a scalable solution is desired, the size of the expanded LDPC parity check matrix is designed to support the maximum block size. The solutions of the prior art do not scale well with respect to throughput for various block sizes. For example, in the case of the prior art's preferred layered decoding, processing of short and long blocks takes the same amount of time. The only difference is that for shorter blocks not all of the processing units are used. Therefore, for short blocks, the achieved throughput is proportionally lower. The following example is based on the same case as before (comparing matrices as described earlier with respect to
The following table compares the computed results.
It can be seen from the table that the methodology of the present invention provides constant throughput independent of the codeword size, whereas in the prior art case the throughput for smaller blocks drops considerably. Furthermore, in order to handle the maximum block size, both solutions require the same number of processing units (80 in the example above). However, while the new methodology fully utilizes all available processing resources irrespective of block size, the prior art methodologies utilize all processing units only for the largest block, and only a fraction of the total resources in other cases. The throughput therefore suffers in the prior art case for all block sizes except the maximum one. As previously mentioned, there is no performance penalty for this in the present invention.
The previous example showing throughput improvement for shorter blocks can also be used to conclude that better latency is also achieved with the new methodology. In such cases, large blocks of data are broken into smaller pieces, so that the encoded data is split among multiple codewords. If one places a shorter codeword at the end of the series of longer codewords, then the total latency depends primarily on the decoding time of the last codeword. According to the table above, short blocks require proportionally less time to be decoded (as compared to the longer codewords), thereby allowing acceptable latency to be achieved by encoding the data in suitably short blocks, an effect not achievable using prior art methodologies.
The previous examples illustrated full hardware utilization. However, hardware scaling is also enabled, so that short blocks can use proportionately less hardware resources if an application requires it.
In addition, utilization of more efficient processing units and memory blocks is enabled. Memory can be organized to process a number of variables in parallel, and can therefore be partitioned in parallel. This design is not practicable with prior art solutions.
The present invention is well suited for use with all known decoding algorithms of LDPC codes as well as many more general ones. It can be implemented in hardware and/or software and it demonstrates benefits in both implementations.
Further, the present invention enables flexible rate adjustments by the use of shortening, or puncturing, or a combination thereof. Block length flexibility is also enabled through expansion, shortening, or puncturing, or combinations thereof. Any of these operations can be applied to the base or expanded matrices.
If the base matrix is designed with some additional constraints, then base matrices for different code rates can be derived from one original base matrix in one of two ways:
a. Row combining. In this case higher code rate base matrices are derived from an original low rate base matrix by combining (adding together) rows of the base matrix using specific constraints. The first of these is that either the rows of the base matrix to be combined must not have overlapping elements, or if the rows of the base matrix have overlapping elements, then the rows of the expanded matrix must not have overlapping elements. The second constraint is that rows to be combined must belong to different blocks of rows of the original matrix. Both of the constraints must be imposed in such a way as to ensure preservation of the properties for a block of rows as in the original matrix.
b. Row splitting. In this case lower code rate base matrices are derived from an original high code rate base matrix by splitting rows using a specific constraint, namely that the number of blocks of rows in the derived base matrix is the same as in the original matrix.
Row combining or row splitting, with the specific constraints defined above, allows efficient coding of a new set of expanded derived base matrices. In these cases the number of layers may be as low as the minimum number of block rows (layers) in the original base matrix.
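The row-combining constraint on non-overlapping elements can be sketched in code. The sketch below is hypothetical: the base matrix is a toy example (not one of the patent's matrices), entries are circulant shift values, and -1 is assumed here as the marker for a null (all-zero) block.

```python
# Hypothetical sketch of row combining on a small base matrix.
# Entries are circulant shifts; -1 marks a null (all-zero) block.
def rows_overlap(r1, r2):
    """Two base-matrix rows overlap if some column is non-null in both."""
    return any(a != -1 and b != -1 for a, b in zip(r1, r2))

def combine_rows(base, i, j):
    """Derive a higher-rate base matrix by merging rows i and j.

    Allowed only when the rows have no overlapping non-null elements
    (the first constraint described above)."""
    if rows_overlap(base[i], base[j]):
        raise ValueError("rows overlap; cannot combine")
    merged = [a if a != -1 else b for a, b in zip(base[i], base[j])]
    return [row for k, row in enumerate(base) if k not in (i, j)] + [merged]

base = [
    [0,   1, -1, -1,  2, -1, -1, -1],   # 4 rows x 8 columns: rate 1/2
    [-1,  0,  5, -1, -1,  1, -1, -1],
    [-1, -1,  3,  4, -1, -1,  0, -1],
    [2,  -1, -1,  0, -1, -1, -1,  6],
]
higher = combine_rows(base, 0, 2)       # rows 0 and 2 do not overlap
```

Combining rows 1 and 3 of the result as well would leave two rows over the same eight columns, raising the code rate from 1/2 to 3/4, consistent with the row-combining rule described above.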
The present invention may be beneficially applied to both the transmitter and the receiver.
The present invention addresses other shortcomings of the prior art as follows:
(a) Specifies general rules for shortening and puncturing patterns
(b) Provides a mechanism for q>qrate
(c) Establishes a limit on the amount of puncturing
(d) Provides an algorithmic method for finding the optimal solution within the range of given system parameters.
Although developed for wireless systems, embodiments of the invention can be applied to any other communication system which involves encoding of variable size data packets by a fixed error correcting block code.
The advantage of this invention can be summarized as providing an optimal solution to the described problem over the given range of system parameters (performance, power consumption, and complexity). It comprises the following steps:
These steps are more fully shown in the flow chart
At step 213, the minimum number of modulated symbols Nsym
Both the encoder and the decoder are presented with the same input parameters in order to be able to apply the same procedure and consequently use the same codeword size, as well as other relevant derived parameters such as the amount of shortening and puncturing for each of the codewords, the number of codewords, etc. In some cases only the transmitter (encoder) has all the parameters available, and the receiver (decoder) is presented with some derived version of the encoding procedure parameters. For example, in some applications it is desirable to reduce the initial negotiation time between the transmitter and the receiver. In such cases the transmitter initially informs the receiver of the number of modulated symbols it is going to use for transmitting the encoded bits, rather than the actual data packet size. The transmitter then performs the encoding procedure differently, taking into consideration the receiver's abilities (e.g. using some form of higher layer protocol for negotiation). In this case some of the requirements are relaxed in order to counteract the deficiencies of the information at the receiver side. For example, the use of additional modulated symbols to enhance performance may always be in place, may be bypassed altogether, or may be assumed for certain ranges of payload sizes (e.g. indirectly specified by the number of modulated symbols).
One example of such an encoding procedure is the IEEE 802.11n Orthogonal Frequency Division Multiplexing (OFDM) based transceiver. In this case the reference to the number of bits per modulated symbol translates into the number of bits per OFDM symbol. Also, in this particular case, the AggregationFlag parameter is used to differentiate between the case when both the encoder and the decoder are aware of the actual data packet size (AggregationFlag=0) and the case when the packet size is indirectly specified by the number of required OFDM symbols (AggregationFlag=1).
Each of those features will now be described in more detail.
Much effort has been expended by the coding research community on the design of LDPC parity check matrices such that the derived codes provide optimum performance. Examples include T. J. Richardson et al., "Design of Capacity-Approaching Irregular Low-Density Parity-Check Codes," IEEE Transactions on Information Theory, vol. 47, February 2001, and S. Y. Chung et al., "Analysis of sum-product decoding of low-density parity-check codes using a Gaussian approximation," IEEE Transactions on Information Theory, vol. 47, February 2001, both of which are incorporated herein by reference. These investigations show that, in order to provide optimum performance, a particular variable node degree distribution should be applied. "Degree distribution" refers here to the distribution of the column weights in a parity check matrix. This distribution depends, in general, on the code rate and the size of the parity check matrix, or codeword. It is desirable that the puncturing and shortening patterns, as well as the number of punctured/shortened bits, be specified in such a way that the variable node degree distribution is preserved as much as possible. However, since shortening and puncturing are qualitatively different operations, different rules apply to them, as will now be explained.
Shortening of a code is defined as sending fewer information bits than possible with a given code, K′&lt;K. The encoding is performed by: taking K′ bits from the information source, presetting the remaining K−K′ information bit positions in the codeword to a pre-defined value (usually 0), computing the M parity bits by using the full M×N parity check matrix, and finally forming the codeword to be transmitted by concatenating the K′ information bits and the M parity bits. One way to determine which bits to "shorten" in the "data" portion of the parity check matrix, Hd (31 in
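The shortened-encoding steps just listed can be sketched with a toy systematic code. The parity part P below is an arbitrary illustration (not one of the patent's matrices), and the systematic generator form G=[I|P] is an assumption made for compactness.

```python
import numpy as np

# Toy systematic code: parity bits p = u @ P mod 2, codeword = [u | p].
# P is chosen arbitrarily for illustration only.
K, M = 4, 3                       # info bits, parity bits; N = K + M
P = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]])         # K x M parity part

def encode_shortened(info_bits):
    """Encode K' <= K information bits, presetting the rest to 0."""
    k_prime = len(info_bits)
    u = np.zeros(K, dtype=int)
    u[:k_prime] = info_bits       # remaining K - K' positions stay 0
    parity = u @ P % 2            # all M parity bits, from the full code
    # transmitted codeword: K' info bits + M parity bits (the preset
    # zeros are known to the decoder and need not be sent)
    return np.concatenate([u[:k_prime], parity])

cw = encode_shortened([1, 0, 1])  # K' = 3 < K = 4
```

The transmitted codeword has K′+M = 6 bits rather than N = 7, which is the defining property of shortening.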
3 3 3 8 3 3 3 8 3 3 3 8
When discarding columns, the aim is to ensure that the ratio of '8's to '3's remains close to optimal, say 1:3 in this case. Obviously the ratio cannot remain exactly 1:3 when one to three columns are removed. In such circumstances, the removal of two columns might, for example, result in:
3 3 8 3 3 8 3 3 3 8
giving a ratio of ~1:2.3, and the removal of a third column, one with weight '8', might result in:
3 3 3 3 8 3 3 3 8
thus preserving a ratio of 1:3.5, which is closer to 1:3 than would be the case if the third column removed had weight '3', which results in:
8 3 3 3 8 3 3 3 8
giving a ratio of 1:2.
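The worked example above can be reproduced with a simple greedy search. The sketch below is one possible heuristic, not the patent's prescribed algorithm: at each step it discards whichever column weight keeps the eights-to-threes ratio closest to the 1:3 target.

```python
from collections import Counter

def ratio_error(weights, target=3.0):
    """Distance of the threes-per-eight ratio from the target (3 '3's per '8')."""
    c = Counter(weights)
    if c[8] == 0:
        return float("inf")       # no '8's left: ratio undefined, avoid
    return abs(c[3] / c[8] - target)

def shorten_columns(weights, n_remove, target=3.0):
    """Greedily discard n_remove columns, keeping the ratio near 1:target."""
    w = list(weights)
    for _ in range(n_remove):
        # try removing one column of each distinct weight, keep the best
        candidates = []
        for wt in set(w):
            trial = list(w)
            trial.remove(wt)
            candidates.append((ratio_error(trial, target), trial))
        w = min(candidates, key=lambda t: t[0])[1]
    return w

result = shorten_columns([3, 3, 3, 8] * 3, 3)
```

Starting from the 3 3 3 8 distribution above, the greedy rule removes two weight-3 columns and then one weight-8 column, matching the sequence traced in the text.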
It is also important to preserve approximately constant row weight throughout the shortening process.
An alternative to the approach just described is to prearrange the columns of the data part of the parity check matrix, Hd, such that the shortening can be applied to consecutive columns in Hd. Although perhaps suboptimal, this method keeps the degree distribution of Hd close to the optimum. The simplicity of the shortening "pattern" (taking out consecutive columns of Hd) gives a significant advantage by reducing complexity. Furthermore, approximately constant row weight is guaranteed (assuming the original matrix satisfies this condition). An example of this concept is illustrated in
After rearranging the columns of the Hd part of the original matrix the new matrix takes on the form 221 shown in
In the case of a column regular parity check matrix (or, more generally, one that is approximately regular, or regular and approximately regular only in the data part of the matrix, Hd), the method described in the previous paragraph is still preferred compared to the random or periodic/random approach described in [8]. The method described here ensures approximately constant row weight, which is a further advantage from both the performance and the implementation complexity standpoints.
Puncturing of a code is defined as removing parity bits from the codeword. In a wider sense, though, puncturing is defined as removing some of the bits (parity bits, data bits, or both) from the codeword prior to sending the encoded bits to the modulator block and subsequently over the channel. The operation of puncturing increases the effective code rate. Puncturing is equivalent to a total erasure of the bits by the channel. The soft iterative decoder assumes a completely neutral value for those erased bits; when the soft information used by the decoder is the log-likelihood ratio, this neutral value is zero.
Puncturing of LDPC codes can be given an additional, somewhat different, interpretation. An LDPC code can be presented in the form of the “bipartite” graph
Each variable node 231 is connected 234 by lines ("edges"), for example 233, to all the check nodes 232 in which that particular bit participates. Similarly, each check node (corresponding to a parity check equation) is connected by a set of edges 237 to all variable nodes corresponding to bits participating in that particular parity check equation. If a bit is punctured (for example node 235), then all the check nodes connected to it (those connected by thicker lines 236) are negatively affected. Therefore, if a bit chosen for puncturing participates in many parity checks, the performance degradation may be very high. On the other hand, since the only way the missing information (corresponding to the punctured bits) can be recovered is from the messages coming from the check nodes in which those punctured bits participate, the more such check nodes there are, the more successful the recovery may be. Faced with these contradictory requirements, the optimum solution is to be found somewhere in the middle. Embodiments of the invention use the following general rules for puncturing:
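The degree trade-off just described can be made concrete by counting, for each bit, the parity checks it participates in (its variable-node degree). The matrix and the lowest-degree-first ordering below are an illustrative heuristic, not the patent's actual puncturing rules.

```python
# Rank candidate puncturing bits by variable-node degree: low-degree bits
# damage fewer check nodes when erased, but also receive fewer recovery
# messages, hence the trade-off described above.  H is a toy matrix.
def variable_degrees(H):
    """Column weights of a parity check matrix given as a list of rows."""
    return [sum(row[j] for row in H) for j in range(len(H[0]))]

H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
deg = variable_degrees(H)
# one simple (hypothetical) candidate ordering: lowest-degree bits first
order = sorted(range(len(deg)), key=lambda j: deg[j])
```

In the bipartite-graph picture, `deg[j]` is exactly the number of check nodes connected to variable node j, i.e. the number of check nodes "negatively affected" if bit j is punctured.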
Some of these trade-offs can be observed from
It is obvious from the
The matrix in
As has been previously mentioned, in the case where the preservation of the exact code rate is not mandatory, the shortening-to-puncturing ratio can be chosen such that it guarantees preservation of the performance level of the original code. Normalizing the shortening to puncturing ratio, q, as follows:
qnormalized=(Nshortened/Npunctured)/[R/(1−R)],
makes qnormalized independent of the code rate, R. Therefore, qnormalized=1 corresponds to the rate-preserving case of combined shortening and puncturing. However, if the goal is to preserve performance, this normalized ratio must be greater than one: qnormalized>1. It was found through much experimentation that qnormalized in the range of 1.2-1.5 complies with the performance preserving requirements.
A large percentage of punctured bits paralyzes the iterative soft decision decoder. In the case of LDPC codes this is true even if puncturing is combined with some other operation such as shortening or extending the code. One could conclude this by studying the matrix 250 of
Ppuncture=100×(Npuncture/M),
then it can be seen that the matrix 250 from
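The puncturing percentage defined above is a one-line computation; the numbers used below are illustrative only.

```python
# Puncturing percentage, as defined above; M is the number of parity bits.
def puncture_percentage(n_puncture, M):
    return 100.0 * n_puncture / M

p = puncture_percentage(80, 324)   # e.g. erasing 80 of 324 parity bits
```

For these illustrative values, roughly a quarter of the parity bits are punctured, the kind of figure against which a "too much puncturing paralyzes the decoder" limit would be checked.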
Characteristics of the embodiments of the present invention may include:
The system, apparatus, and method as described above are preferably combined with one or more matrices shown in the
The matrices in
A first group of matrices (
A further matrix (
The rate R=¾ matrices (
The rate R=⅚ matrix (
The two rate R=⅚ matrices (
s′=floor{s·(L/96)}, where s is the right circular shift corresponding to the maximum codeword size (L=Lmax=96), as specified in the matrix definitions.
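The shift-scaling rule s′=floor{s·(L/96)} can be sketched directly. The treatment of null blocks (marked -1 here) and of zero shifts is an assumed convention for this sketch; the matrix definitions themselves specify only the scaling of the positive right circular shifts.

```python
import math

L_MAX = 96                       # maximum codeword expansion factor

def scale_shift(s, L):
    """Scale a right circular shift s (defined for L = L_MAX) down to L.

    Null blocks (-1) and zero shifts are passed through unchanged,
    an assumption made here for illustration."""
    if s <= 0:
        return s
    return math.floor(s * L / L_MAX)

row = [40, -1, 95, 0, 7]                      # illustrative shift values
scaled = [scale_shift(s, 24) for s in row]    # L = 24, a quarter of L_MAX
```

Scaling preserves the relative position of each shift within its circulant as the block size shrinks, which is what lets one set of matrix definitions serve all supported codeword sizes.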
The present invention has been described with regard to one or more embodiments. However, it will be apparent to persons skilled in the art that a number of variations and modifications can be made without departing from the scope of the invention as defined in the claims.
Filing Document | Filing Date | Country | Kind | 371c Date
---|---|---|---|---
PCT/CA05/01563 | 10/12/2005 | WO | 00 | 7/29/2008
Number | Date | Country
---|---|---
60617902 | Oct 2004 | US
60627348 | Nov 2004 | US
60635525 | Dec 2004 | US
60638832 | Dec 2004 | US
60639420 | Dec 2004 | US
60647259 | Jan 2005 | US
60656587 | Feb 2005 | US
60673323 | Apr 2005 | US