1. Field
The present disclosure relates to the decoding of low-density parity-check (LDPC) codes. More particularly, it relates to methods for improving the performance of iterative decoders for LDPC codes which may be used with modulation levels above simple binary signaling.
2. Description of Related Art
As known to the person skilled in the art and as also mentioned in U.S. Pat. No. 7,343,539 incorporated herein by reference in its entirety, a low-density parity-check (LDPC) code is a linear code determined by a sparse parity-check matrix H having a small number of 1s per column. The code's parity-check matrix H can be represented by a bipartite Tanner graph wherein each column of H is represented by a transmitted variable node, each row by a check node, and each “1” in H by a graph edge connecting the variable node and check node that correspond to the column-row location of the “1”. The code's Tanner graph may additionally have non-transmitted variable nodes. Each check or constraint node defines a parity check operation. Moreover, the fraction of a transmission that bears information is called the rate of the code. An LDPC code can be encoded by deriving an appropriate generator matrix G from its parity-check matrix H. An LDPC code can be decoded efficiently using a well-known iterative algorithm that passes messages along edges of the code's Tanner graph from variable nodes to check nodes and vice-versa until convergence is obtained, or a certain number of iterations is reached.
Forward error correction using LDPC codes is being used for deep-space and other aerospace applications as described by K. S. Andrews, D. Divsalar, S. Dolinar, J. Hamkins, C. R. Jones, and F. Pollara in “The development of turbo and LDPC codes for deep-space applications,” Proceedings of the IEEE, 95(11):2142-2156, November 2007. A set of LDPC codes has been approved as an international standard by the Consultative Committee for Space Data Systems (CCSDS) (see “TM Synchronization and Channel Coding,” CCSDS 131.1-B-2. Blue Book, Issue 2. August 2011). The standard LDPC codes include a family of nine accumulate repeat-4 jagged accumulate (AR4JA) LDPC codes, available in any combination of three code rates (½, ⅔, and ⅘) and three input block lengths (1024, 4096, and 16384).
The encoder 110 shown in
In a software implementation, it may remain most convenient and efficient to store the last n−k columns of the generator matrix in their entirety, not making use of the quasi-cyclic property, and to perform the encoding operation using standard matrix multiplication. In a high-level language such as C, individual bit operations are not as efficient as operations applied to registers that are 32 or 64 bits wide. Therefore, in C it is efficient to break each of the n−k columns into 64-bit segments, and store each segment in a 64-bit wide “long int” data structure. In this way, in one operation, 64 bits of information can be ANDed with a 64-bit portion of the generator column, and the final codebit determined from the parity of all such 64-bit operations of the column. Since the input lengths of each of the AR4JA codes are a multiple of 64, this approach makes efficient use of the 64-bit data structures.
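The word-wise encoding idea described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `pack_bits` and `parity_bit` are hypothetical, Python integers stand in for 64-bit registers, and each code bit is taken to be the GF(2) inner product of the message with one stored generator column (bitwise AND per word, folded together by XOR, then reduced to a parity).

```python
WORD_BITS = 64  # width of one stored segment, as in the text


def pack_bits(bits):
    """Pack a list of 0/1 values into 64-bit words (hypothetical helper)."""
    assert len(bits) % WORD_BITS == 0, "length must be a multiple of 64"
    words = []
    for start in range(0, len(bits), WORD_BITS):
        word = 0
        for b in bits[start:start + WORD_BITS]:
            word = (word << 1) | (b & 1)
        words.append(word)
    return words


def parity_bit(info_words, column_words):
    """One code bit: GF(2) inner product of message and generator column."""
    acc = 0
    for m, g in zip(info_words, column_words):
        acc ^= m & g  # 64 bit-products at once, folded into one word
    return bin(acc).count("1") & 1  # parity of the folded word


if __name__ == "__main__":
    info = [1, 0] * 64            # 128 message bits
    column = [1] + [0] * 127      # toy generator column
    print(parity_bit(pack_bits(info), pack_bits(column)))
```

Folding the per-word products with XOR before taking a single parity gives the same result as summing the per-word parities modulo 2, so only one population count is needed per column.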
Possible implementations of the modulator 120 shown in
BPSK is a real-valued constellation with two signal points: c(0)=A and c(1)=−A, where A is a scaling factor. This constellation is shown in
QPSK is a complex constellation with four signal points, with
for i=0, 1, 2, 3. This constellation is shown in
8-PSK has constellation points
for i=0, 1, . . . , 7. This constellation is shown in
for i=0, 1, . . . , M−1. The average symbol energy is $E_s = E[\|c(i)\|^2] = A^2$.
16-APSK is a standard of the second generation Digital Video Broadcast for Satellites. It is also referred to as 12/4 APSK or 12/4 QAM. It consists of the union of amplitude-scaled QPSK and 12-PSK signal constellations as shown in Eq. 1 below and the constellation shown in
The DVB-S2 standard defines the ratio $r_2/r_1$ = 3.15, 2.85, 2.75, 2.70, 2.60, and 2.57 for code rates ⅔, ¾, ⅘, ⅚, 8/9, and 9/10, respectively. The DVB-S2 standard does not specify use of a rate ½ code with 16-APSK; for the simulations described herein, $r_2/r_1$ = 3.15 when a rate ½ code is used. The average symbol energy is $E_s = E[\|c(i)\|^2] = (r_1^2 + 3r_2^2)/4$.
32-APSK is also a DVB-S2 standard. It is the union of three PSK constellations as shown in Eq. 2 below and the constellation shown in
The DVB-S2 standard defines the ratios $r_2/r_1$ = 2.84, 2.72, 2.64, 2.54, and 2.53, and $r_3/r_1$ = 5.27, 4.87, 4.64, 4.33, and 4.30 for code rates ¾, ⅘, ⅚, 8/9, and 9/10, respectively. The DVB-S2 standard does not specify use of rate ½ or ⅔ codes with 32-APSK; for the simulations described herein, $r_2/r_1$ = 4.0 and 3.15 and $r_3/r_1$ = 8.0 and 6.25 are used when rate ½ and ⅔ codes, respectively, are used. The average symbol energy is $E_s = E[\|c(i)\|^2] = (r_1^2 + 3r_2^2 + 4r_3^2)/8$.
Encoded bits are assigned to a sequence of corresponding complex constellation points, or modulation symbols. Each of the modulations considered in this disclosure has a number of constellation points that is a power of two, which makes such bit-to-symbol mappings straightforward.
The signal constellations described above define a natural binary ordering. For example, the 8-PSK constellation points indexed by i=0, 1, 2, 3, 4, 5, 6, and 7 correspond to the 3-bit patterns 000, 001, 010, 011, 100, 101, 110, and 111, respectively. This may be referred to as the natural bit-to-symbol mapping for the modulation. Note that the natural ordering, or any other, is dependent on the way the constellation points happen to be indexed which, in principle, is arbitrary.
Other mappings, such as Gray codes, can often give better performance. Note that a Gray code may be more properly referred to as a Gray labeling. A code's word error rate performance is not dependent on the order of indexing, whereas with a Gray labeling, the whole point is that it is defined in a particular order. There are many Gray codes with the defining property that adjacent members in the list differ in exactly one bit in their binary representation, some with slightly different performance than others. In the simulations discussed herein, the binary reflected Gray code is used, which has recently been proven to be the optimal Gray code for M-PSK modulations (see, for example, E. Agrell, J. Lassing, E. G. Strom, and T. Ottosson, “On the optimality of the binary reflected Gray code,” IEEE Trans. Inform. Theory, 50(12):3170-3182, 2004.). The binary reflected Gray code of length M is obtained from the binary reflected Gray code of length M/2 by listing its members 0, 1, . . . , M/2−1, each preceded by a zero, followed by its members M/2−1, M/2−2, . . . , 0, each preceded by a one.
The binary reflected Gray code has the prefix property, i.e., a length M′ Gray code's members are equal to the first M′ members of a Gray code of length M, M>M′. Thus, when conducting simulations of Gray codes of various lengths, only the longest Gray code need be stored.
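The reflect-and-prefix construction and the prefix property can be illustrated with a short Python sketch. The function name `reflected_gray` is hypothetical, and labels are represented as integers so that a leading zero bit leaves the value unchanged.

```python
def reflected_gray(m):
    """Binary reflected Gray code of length m (a power of two), as integers."""
    if m == 1:
        return [0]
    half = reflected_gray(m // 2)
    # First half: shorter code with a leading 0 bit (values unchanged).
    # Second half: shorter code in reverse order with a leading 1 bit.
    return half + [(m // 2) | g for g in reversed(half)]


if __name__ == "__main__":
    code8 = reflected_gray(8)
    print([format(g, "03b") for g in code8])
    # Prefix property: the length-8 code is the start of the length-16 code.
    print(reflected_gray(16)[:8] == code8)
```

Because of the prefix property, a simulation covering several modulation orders could store only the longest list and slice it.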
An anti-Gray code has the property that adjacent members in the list differ either in all their bits or in all but one of their bits. An anti-Gray code of length M can be obtained from a binary reflected Gray code of length M by removing the last M/2 entries and inserting after each of the remaining M/2 entries the ones complement of that entry. Anti-Gray codes do not have a prefix property, meaning a separate mapping should be stored for each length.
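The anti-Gray construction described above can be sketched the same way. The names are hypothetical, and the binary reflected Gray code is generated here with the well-known closed form `i ^ (i >> 1)` rather than by recursion.

```python
def reflected_gray(m):
    """Binary reflected Gray code of length m, via the closed form i ^ (i >> 1)."""
    return [i ^ (i >> 1) for i in range(m)]


def anti_gray(m):
    """Anti-Gray code of length m (a power of two), as integers."""
    mask = m - 1  # all-ones pattern used to form the ones complement
    out = []
    for g in reflected_gray(m)[: m // 2]:  # drop the last m/2 entries
        out.extend([g, g ^ mask])          # insert the complement after each
    return out


if __name__ == "__main__":
    print([format(a, "03b") for a in anti_gray(8)])
```

For length 8 this yields 000, 111, 001, 110, 011, 100, 010, 101, in which neighbors differ in either three or two bits.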
For modulations in which constellation points have more than two near neighbors, a specialized bit to symbol mapping is needed. The DVB-S2 standard specifies such a mapping to use with 16-APSK and 32-APSK.
The bit representations of the constellation points under the natural, Gray, anti-Gray, and DVB mappings are shown in
As discussed above in regard to
The passband signal is assumed to be of the form shown in Eq. 3 below:
$s(t) = a(t)\cos(2\pi f_c t + \theta(t))$ Eq. 3
where $f_c$ is the carrier frequency in Hz, and $a(t)$ and $\theta(t)$ are arbitrary modulation-dependent signals. Eq. 3 may be rewritten as shown in Eq. 4 below:
$s(t) = \mathrm{Re}\{\tilde{s}(t)\,e^{j2\pi f_c t}\}$ Eq. 4
where $\tilde{s}(t) = a(t)e^{j\theta(t)}$ is the complex baseband representation of $s(t)$. Eq. 5 below presents an alternative expression for $\tilde{s}(t)$:
$\tilde{s}(t) = \sqrt{P_c} + \tilde{m}(t)$ Eq. 5
where $\sqrt{P_c}$ is an unmodulated residual carrier signal with complex baseband power $P_c$, and $\tilde{m}(t)$ is a complex baseband modulation with complex baseband power
This can be put back in passband notation using Eq. 4, from which the residual carrier signal term $\sqrt{P_c}\cos(2\pi f_c t)$ is readily apparent. The modulations discussed herein have the form shown in Eq. 6 below:
where $m[i]$ is a member of a signal constellation, $m[i] \in C = \{c(0), c(1), \ldots, c(M-1)\}$, in the complex plane, and where $p(t)$ is a square pulse shape of symbol duration $T$ as shown in Eq. 7 below:
For the purposes of this disclosure, the residual carrier signal can be assumed to have been filtered out of the modulated received signal or, equivalently, Pc=0. Thus, the received modulated complex baseband signal is of the form shown in Eq. 8 below:
$\tilde{r}(t) = \tilde{m}(t) + \tilde{n}(t)$ Eq. 8
where $\tilde{n}(t)$ is a complex baseband Gaussian noise process with one-sided power-spectral density $N_0$ in each dimension. At the receiver, $\tilde{r}(t)$ is put through a perfect matched filter, which results in complex soft symbols as shown in Eq. 9 below:
$r[i] = m[i] + n[i]$ Eq. 9
where $n[i]$ is a complex Gaussian random variable with variance $\sigma^2$ in each of its real and imaginary components.
The performance of the AR4JA LDPC codes on a binary-input additive white Gaussian noise (AWGN) channel is well-documented (see, for example, “The development of turbo and LDPC codes for deep-space applications,” and “Low density parity check codes for use in near-Earth and deep space,” cited above). Such published performance results apply to binary phase-shift keying (BPSK) or quadrature PSK (QPSK) modulation, as is typically used in deep space missions. When bandwidth is constrained, however, system engineers may also desire to know the performance of LDPC codes when used with higher order modulations, in order to most effectively trade off power efficiency, bandwidth efficiency, and complexity. The need for bandwidth-efficient higher order modulations will become more pressing in the future as NASA and other space agencies utilize higher data rates and more simultaneous missions in the same limited spectrum. Modern variable coded modulation (VCM) or adaptive coded modulation (ACM) schemes will be able to switch between the different coded modulations as power and bandwidth resources vary.
Therefore, it is helpful to assess the performance of the standard LDPC codes when used with higher order modulations such as 8-PSK, 16-ary amplitude PSK (16-APSK), and 32-APSK. The performance of rate ⅘ AR4JA codes used with BPSK, 8-PSK, and 16-APSK has been previously reported (see M. Cheng, D. Divsalar, and S. Duy “Structured low-density parity-check codes with bandwidth efficient modulation,” In Proceedings of SPIE Conference on Defense Security and Sensing, April 2009). For other combinations of codes and modulations, performance may be estimated based on the concept of code imperfectness. First, the code imperfectness of the code when used with BPSK is determined by measuring the difference between the code's required bit signal to noise ratio Eb/N0 to attain a given codeword error rate (CWER) and the minimum possible Eb/N0 required to attain the same CWER as implied by the sphere-packing bounds for codes with the same block size k and code rate r (see S. Dolinar, D. Divsalar, and F. Pollara, “Code performance as a function of block size,” TDA Progress Report, 42(133), May 1998). This same imperfectness is then applied with respect to the capacity of the higher order modulation to arrive at an approximated performance of the code when used with the higher order modulation. The imperfectness approximation has generally been found to be fairly accurate, to within about 0.5 dB, over a wide variety of codes and modulations.
The presence of noise in the channel makes the selection and implementation of a decoder (such as the decoder 140 shown in
Described herein are embodiments that provide for digital communication coding methods, apparatus, and systems with improved performance for decoding of LDPC coded signals. The described methods, apparatus, and systems incorporate a decoder or decoding method that decodes LDPC coded messages with a bipartite graph having check nodes and variable nodes. Messages from check nodes are partially hard limited, so that every message which would otherwise have a magnitude at or above a specified level is reassigned to a maximum magnitude, while the sign of the original message is not changed.
One aspect is a method for decoding a low-density parity-check (LDPC) coded signal transmitted in a channel, where the method comprises: receiving input messages comprising the LDPC coded signal for subsequent processing on a bipartite graph, wherein the bipartite graph comprises variable nodes and check nodes representing an LDPC code; passing messages along edges of the bipartite graph, wherein passing messages comprises iteratively passing messages from the variable nodes to the check nodes and from the check nodes to the variable nodes; assigning a maximum positive value to every message from each check node greater than or equal to a selected positive limit value; assigning a minimum negative value to every message from each check node less than or equal to a selected negative limit value; and outputting a decoded message when convergence is reached or a selected number of iterations is reached. Absolute values of the maximum positive value and minimum negative value may be equal.
Another aspect is a digital communication receiving system, wherein the digital communication receiving system is configured to receive transmissions encoded with a low-density parity-check code, and the system comprises: a demodulator, wherein the demodulator receives modulated data and outputs demodulated data; and a decoder, wherein the decoder decodes demodulated data from the demodulator to output decoded data by performing several processing steps, wherein the several processing steps comprise: receiving the demodulated data as inputs to variable nodes of a bipartite graph, wherein the bipartite graph comprises variable nodes and check nodes representing the low-density parity-check code; passing messages along edges of the bipartite graph, wherein passing messages comprises iteratively passing messages from the variable nodes to the check nodes and from the check nodes to the variable nodes; assigning a maximum positive value to every message from each check node greater than or equal to a selected positive limit value; assigning a minimum negative value to every message from each check node less than or equal to a selected negative limit value; and outputting the decoded data when convergence is reached or a selected number of iterations is reached. Absolute values of the maximum positive value and minimum negative value may be equal.
Still another aspect is a method for decoding a low-density parity-check (LDPC) coded signal transmitted in a channel, where the method comprises: receiving input messages comprising the LDPC coded signal for subsequent processing on a bipartite graph, wherein the bipartite graph comprises variable nodes and check nodes representing an LDPC code; passing messages along edges of the bipartite graph, wherein passing messages comprises iteratively passing messages from the variable nodes to the check nodes and from the check nodes to the variable nodes; assigning a maximum positive value to at least one message from at least one check node greater than or equal to a selected positive limit value; assigning a minimum negative value to at least one message from at least one check node less than or equal to a selected negative limit value; and outputting a decoded message when convergence is reached or a selected number of iterations is reached. Absolute values of the maximum positive value and minimum negative value may be equal.
The details of one or more exemplary embodiments are set forth in the accompanying drawings and description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
As described below, embodiments of the present invention provide for improved decoding performance at lower signal-to-noise ratios. The improved decoding performance is provided at various modulations and with various demodulation approaches. The description below presents the performance of other known decoding methods to establish the improvement provided by embodiments of the present invention. Simulation results for combinations of the nine AR4JA LDPC codes and five modulations discussed above are presented below to provide estimates of the expected performance of these codes using known decoding approaches and embodiments according to the present invention. Simulated performance is provided below for different combinations of code rate, code length, and modulation, along with some combinations of mappings, demodulator structures, and numbers of decoder iterations.
As described above, LDPC systems may utilize various modulation types and various bit-to-modulation-symbol mappings. To aid in understanding the invention, a derivation of associated log likelihood ratios (LLRs) that apply to the various modulation types and bit mappings is presented below. One of the simple and well-performing LLR approximations can be expressed in a general equation that applies to all of the modulation types.
A demodulator (such as the demodulator 130 shown in
As discussed below, the exact LLR expression for an arbitrary constellation is derived, and a lower-complexity approximate LLR expression based on nearest neighbors to the received point and the LLR expressions specific to BPSK, QPSK, 8-PSK, 16-APSK, and 32-APSK are provided.
The LLR for the jth bit of the symbol is shown in Eq. 10 below:
where P is used to indicate a probability and p to indicate a probability density function (pdf). Also, Bayes's rule for a mixture of probabilities and pdfs was applied and, in the last step, $P(b_j=0) = P(b_j=1) = 1/2$ is assumed.
For $i \in \{0,1\}$, the pdf may be found as shown in Eqs. 11-13 below:
where Eq. 11 follows because it is a sum of disjoint events, and Eq. 13 is the pdf of a complex Gaussian random variable with variance $\sigma^2$ in each of its real and imaginary components.
Substituting into Eq. 10, Eq. 14 is obtained:
Thus, to compute the jth bit LLR from r, one may compute the squared distance to each of the constellation points, separating those constellation points that have a 0 in bit j from those that have a 1, and using Eq. 14.
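The procedure just described (squared distances to all constellation points, split by the value of bit j) can be sketched in Python. This is an illustrative sketch, not the disclosure's exact expression: it assumes the LLR is the logarithm of a ratio of sums of Gaussian terms exp(−‖r−c‖²/(2σ²)), with the bit-0 points in the numerator, which is consistent with the surrounding description. The function name, the bit-indexing convention (bit 0 is the least significant bit of the label), and the test values are assumptions. The QPSK points and natural labels are those listed later in the text.

```python
import math


def exact_llr(r, points, labels, j, sigma2):
    """LLR of bit j: ln(sum over bit-0 points / sum over bit-1 points)."""
    sums = [0.0, 0.0]
    for c, label in zip(points, labels):
        bit = (label >> j) & 1
        # Gaussian likelihood term; common scale factors cancel in the ratio.
        sums[bit] += math.exp(-abs(r - c) ** 2 / (2.0 * sigma2))
    return math.log(sums[0] / sums[1])


A = 1.0
QPSK = [A * (1 + 1j), A * (-1 + 1j), A * (1 - 1j), A * (-1 - 1j)]
NATURAL = [0, 1, 2, 3]

if __name__ == "__main__":
    r = 0.3 - 0.8j
    print(exact_llr(r, QPSK, NATURAL, 0, 0.5))  # close to 2*A*Re{r}/sigma^2
    print(exact_llr(r, QPSK, NATURAL, 1, 0.5))  # close to 2*A*Im{r}/sigma^2
```

For very small σ² the exponentials can underflow; a production implementation would typically use a log-sum-exp formulation instead.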
The relation shown below in Eq. 15 may be used in Eq. 14:
$\|r-c\|^2 = \|r\|^2 - 2\langle r,c\rangle + \|c\|^2$ Eq. 15
where the inner product is $\langle r,c\rangle \triangleq \mathrm{Re}\{r\}\,\mathrm{Re}\{c\} + \mathrm{Im}\{r\}\,\mathrm{Im}\{c\}$.
When the modulation has symbols each of the same energy, as is the case for PSK modulations, the $\|r\|^2$ and $\|c\|^2$ terms in the numerator and denominator cancel and the simpler form shown in Eq. 16 is obtained:
A common approximation to the LLR is to replace each sum in Eq. 14 by its largest term, i.e., by using only the nearest constellation point that has bj=0 in the numerator, and the nearest neighbor that has bj=1 in the denominator. If these nearest neighbor constellation points are denoted as shown in Eq. 17 below:
$c^*(j,i) \triangleq c\bigl(\arg\min_{b:\,b_j=i} \|r - c(b)\|^2\bigr)$ Eq. 17
$i \in \{0,1\}$, then Eq. 16 may be approximated as shown below:
For equal energy signal constellations, Eq. 19 may be approximated as shown in Eq. 20 below:
This requires one subtraction and two multiplications. The step of dividing by $\sigma^2$ can be eliminated if $\sigma$ remains constant over many symbols, by precomputing $c(i)/\sigma^2$ for each $i$.
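A minimal Python sketch of the nearest-neighbor approximation for equal-energy constellations follows. It assumes the approximate LLR takes the form ⟨r, c*(j,0) − c*(j,1)⟩/σ², which is what results from keeping only the largest term of each sum and cancelling the equal-energy terms, and which matches the stated cost of one subtraction and two multiplications. The function names are hypothetical, and the nearest points are found here by brute-force search rather than by angle comparisons.

```python
def nearest_with_bit(r, points, labels, j, bit):
    """Closest constellation point whose label has the given value in bit j."""
    candidates = [c for c, lab in zip(points, labels) if (lab >> j) & 1 == bit]
    return min(candidates, key=lambda c: abs(r - c))


def inner(r, c):
    """Real inner product Re{r}Re{c} + Im{r}Im{c}."""
    return r.real * c.real + r.imag * c.imag


def approx_llr_equal_energy(r, points, labels, j, sigma2):
    """Nearest-neighbor LLR approximation (equal-energy constellations only)."""
    c0 = nearest_with_bit(r, points, labels, j, 0)
    c1 = nearest_with_bit(r, points, labels, j, 1)
    return inner(r, c0 - c1) / sigma2


A = 1.0
QPSK = [A * (1 + 1j), A * (-1 + 1j), A * (1 - 1j), A * (-1 - 1j)]
NATURAL = [0, 1, 2, 3]

if __name__ == "__main__":
    print(approx_llr_equal_energy(0.3 - 0.8j, QPSK, NATURAL, 0, 0.5))
```

For the BPSK constellation and for the QPSK labeling shown above, this approximation coincides with the exact LLR, in line with the discussion that follows in the text.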
The LLR for hard decisions produced by the demodulator will differ from the LLR for soft decisions. When the demodulator produces hard decisions, the decoder does not have access to r, and therefore cannot compute $\lambda_j$ as in Eq. 14. Instead, the decoder is told only whether $b_j$ is more probably a 1 or a 0, i.e., whether $\lambda_j \le 0$ or $\lambda_j > 0$, respectively. That is, the hard decision decoder is given $\mathrm{sgn}(\lambda_j)$.
Because the decoder operates on LLRs, a hard decision LLR may be defined as shown below in Eq. 21:
where p is the probability that the hard decision is incorrect. For BPSK, $p = Q(\sqrt{2E_s/N_0})$, where:
Note that computation of $\lambda_j^{(H)}$ requires knowledge of $E_s/N_0$. The receiver typically makes an estimation of this, but if this estimate is not available, there would be an additional decoder implementation loss.
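The hard-decision LLR can be illustrated with the following Python sketch for BPSK. It assumes the usual binary-symmetric-channel form, in which the magnitude is ln((1−p)/p) and the sign follows the hard decision. Q is taken to be the standard Gaussian tail function, and the function names and the convention that a hard decision of 0 maps to a positive LLR are assumptions chosen to agree with the sign convention stated above.

```python
import math


def q_function(x):
    """Standard Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def hard_decision_llr(hard_bit, es_over_n0):
    """LLR assigned to a BPSK hard decision; es_over_n0 is linear, not dB."""
    p = q_function(math.sqrt(2.0 * es_over_n0))  # crossover probability
    magnitude = math.log((1.0 - p) / p)
    return magnitude if hard_bit == 0 else -magnitude


if __name__ == "__main__":
    for snr_db in (0.0, 3.0, 6.0):
        snr = 10.0 ** (snr_db / 10.0)
        print(snr_db, hard_decision_llr(0, snr))
```

Every hard decision receives the same LLR magnitude, which depends only on the estimated $E_s/N_0$; this is why an inaccurate estimate translates directly into a decoder loss.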
The LLR discussion above was for an arbitrary modulation constellation. For BPSK modulation, there are only two constellation points, and so the expression in Eq. 18, and hence Eq. 20, is exact. There is only one bit LLR to compute, namely, λ0, with c*(0,0)=A and c*(0,1)=−A, and the LLR is given by Eq. 22 below:
When a code is used with BPSK, the LLRs of the codebits are independent and identically distributed (i.i.d.), because each codebit gets mapped to its own modulation symbol, and each modulation symbol is corrupted by i.i.d. noise.
The LLR for QPSK modulation may also be derived in a similar manner. As can be seen from
c(0)=A(1+j)
c(1)=A(−1+j)
c(2)=A(1−j)
c(3)=A(−1−j)
and then plugging these relations into Eq. 16, which then becomes Eq. 23 below
Using the following relationships:
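$\langle r,c(0)\rangle = A(\mathrm{Re}\{r\} + \mathrm{Im}\{r\})$
$\langle r,c(1)\rangle = A(-\mathrm{Re}\{r\} + \mathrm{Im}\{r\})$
$\langle r,c(2)\rangle = A(\mathrm{Re}\{r\} - \mathrm{Im}\{r\})$
$\langle r,c(3)\rangle = A(-\mathrm{Re}\{r\} - \mathrm{Im}\{r\})$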
and plugging these into Eq. 23 and simplifying, Eq. 24 is obtained:
which is identical to Eq. 22. Following the same procedure for the most significant bit, where c(0) and c(1) are now in the numerator and c(2) and c(3) are in the denominator, the LLR is given by Eq. 25 below:
As was the case for BPSK, with coded QPSK using a Gray bit-to-symbol mapping, the LLRs of the codebits are independent and identically distributed (i.i.d.). Note that when the bit-to-symbol mapping is not a Gray code, the LLR expressions will not simplify to the expressions above, and the LLRs will not be i.i.d.
A similar approach is followed to determine the LLR for 8-PSK modulation. The three bit LLRs for each 8-PSK symbol can be computed using Eq. 16, with four terms each in the numerator and denominator. As there is no apparent simplification of this exact LLR expression, the approximate LLR computation of Eq. 20 can be used when a lower complexity computation is needed.
To identify the closest constellation point with a 0 or a 1 in the bit position of interest, one could compute the distances to all eight constellation points. This is unnecessary, however. As can be seen from
The computation in Eq. 26 requires only comparisons to constants, and no computation of distances. Similarly, another constellation point may be calculated as shown in Eq. 27 below:
Eq. 26 and Eq. 27 can be plugged into Eq. 20. The LLRs for the other two bits can be computed in a similar fashion.
Unlike BPSK and QPSK, when higher order modulations are used, the codebit LLRs are neither independent nor identically distributed. They are not independent because noise affecting reception of an 8-PSK constellation point affects the LLRs of the three associated codebits in a correlated manner. They are not identically distributed because the distance properties are not the same with respect to each bit. For example, with Gray-coded 8-PSK as shown in
The distance properties of the LSB are worse than those of the other two bits. As a result, the MSB and middle bit of Gray-coded 8-PSK are received, on average, with a higher absolute LLR than the LSB is.
Following the techniques described above, the LLR for 16-APSK modulation can be derived. The four bit LLRs for each 16-APSK symbol can be computed using Eq. 16, with eight terms each in the numerator and denominator. As there is no apparent simplification of this exact LLR expression, the approximate LLR computation of Eq. 20 can be used when a lower complexity computation is needed.
To identify the closest constellation point with a 0 or a 1 in the bit position of interest, one could compute the distances to all sixteen constellation points. As was the case for 8-PSK, this is unnecessary. Since 16-APSK is simply the union of two PSK modulations, the angle comparison approach used for 8-PSK can be used to identify the closest inner-ring constellation point with a 0 in the bit position of interest, and separately, to identify the closest outer-ring constellation point. Then $\langle r,c\rangle$ can be computed for each of the two candidate constellation points to find the closer point. This requires computation of a total of four inner products, or eight multiplications, to compute an approximate bit LLR.
A more careful approach can be even more efficient. The Voronoi regions of 16-APSK are shown in
Following the techniques described above, the LLR for 32-APSK modulation can also be derived. The five bit LLRs for each 32-APSK symbol can be computed using Eq. 16, with sixteen terms each in the numerator and denominator. As there is no apparent simplification of this exact LLR expression, the approximate LLR computation of Eq. 20 can be used when a lower complexity computation is needed. Since 32-APSK is the union of three PSK modulations, the angle comparison approach used for 8-PSK can be used to identify the closest constellation point with a 0 in the bit position of interest, on each ring. Then $\langle r,c\rangle$ can be computed for each of the three candidate constellation points to find the closest point. The same type of calculation is made for constellation points with a 1 in the bit position of interest. This requires computation of a total of six inner products, or twelve multiplications, to compute an approximate bit LLR.
The Voronoi boundaries of 32-APSK are not all horizontal, vertical, or at a 45 degree angle, so the more efficient method detailed above for 16-APSK cannot be used for 32-APSK.
After the received signal is demodulated, it is provided to a decoder, such as the decoder 140 shown in
As can be seen in
The curve labeled as JH2009 in
As described in additional detail below, embodiments of the present invention provide for optimization of decoder performance, which provides for improvements over the performance of existing decoders. After optimization, the performance can be improved to that shown in
The number of iterations may provide for some improvement in decoder performance.
Quantization levels may also provide for decoder performance improvements. In a practical decoder, LLRs are represented by digital quantities. This quantization limits both the dynamic range and the resolution of the LLRs. In early experiments, it has been determined that 8 bits of quantization for the LLRs leads to a negligible loss in performance. A quantizer of the form shown in Eq. 28 below:
is convenient, where C is a scale factor. In this way, Q(x) takes on the integer values −127, −126, . . . , 126, 127, and can be stored in an 8-bit register. This is a symmetric, uniform (equal step-size) quantizer, and for x in the granular region, Q(x)≈Cx. In the decoding algorithm, the value Q(x)/C can be used wherever x would normally be used. Note that the quantizer represents zero exactly, which is helpful to represent the LLRs of untransmitted variable nodes. It also is symmetric about zero, so that a decoder will not be biased toward either positive or negative LLRs.
Since the quantizer output has maximum magnitude 127, it represents LLRs in the dynamic range (−127/C, +127/C). Smaller values of C correspond to a larger dynamic range, which could aid the performance of a decoder. Given the fixed number (255) of quantizer levels, however, a larger dynamic range also means larger, coarser step size between quantizer levels. These two effects may be traded off to optimize performance.
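A symmetric, uniform 8-bit quantizer with the properties listed above can be sketched in Python as follows. The scale C = 8 is taken from the later statement that quantized values are about 8 times the true LLRs; the rounding rule is an assumption, since the exact form of Eq. 28 is not reproduced here, and Python's built-in `round` sends exact ties to the nearest even integer. The function names are hypothetical.

```python
def quantize(x, scale=8.0, max_level=127):
    """Map an LLR x to an integer in [-max_level, max_level]."""
    q = int(round(scale * x))                 # uniform step of 1/scale
    return max(-max_level, min(max_level, q))  # saturate outside the range


def dequantize(q, scale=8.0):
    """Value used in the decoder wherever x would normally appear."""
    return q / scale


if __name__ == "__main__":
    for llr in (-20.0, -1.3, 0.0, 0.06, 15.9):
        q = quantize(llr)
        print(llr, q, dequantize(q))
```

With C = 8 the representable range is ±127/8 = ±15.875 with a step of 0.125; halving C would double the range and the step size together, which is the trade-off described above.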
Handling of variable node processing may also provide for decoder performance improvements. A given variable node receives LLR messages u1, u2, . . . , ud from d check nodes, where d is the degree of the variable node, along with an LLR λ from the demodulator. The message the variable node sends back to the jth of the d check nodes connected to it is given by Eq. 29 below:
Given quantized inputs Q(λ) and Q(ui), which as described above are about 8 times their true LLR values and are clipped to ±127, the outgoing quantized message may be computed as shown in Eq. 30 below:
Eq. 30 may also be written as Eq. 32 below:
$Q(v_j) = \mathrm{clip}(U - Q(u_j))$ Eq. 32
where $U \triangleq Q(\lambda) + \sum_{i=1}^{d} Q(u_i)$. This form is convenient because each of the outgoing messages $v_1, \ldots, v_d$ can be computed from $U$ with a single subtraction.
In an early FPGA LDPC decoder implementation reported in the literature, U was clipped prior to the subtraction as shown by Eq. 33 below:
$Q(v_j) = \mathrm{clip}(\mathrm{clip}(U) - Q(u_j))$ Eq. 33
Intuitively, this clipping, herein referred to as “Jones clipping,” seems undesirable because, for example, if all of the incoming messages are large, including uj, then the outgoing message will be near zero. Without the clipping of U, the message Q(vj) would be large, as is intuitively desirable.
Despite the intuition about the detrimental effect of this “Jones clipping,” it turns out that the overall effect is to improve performance because such clipping apparently helps the decoder dig itself out of trapping sets in which it otherwise would get stuck. The effect may be analogous to simulated annealing, in which the algorithm occasionally moves in the opposite direction of the gradient in order to dig itself out of a local minimum. A solid theoretical understanding of this is lacking, however.
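The difference between the two variable-node update rules can be seen in a small Python sketch operating on quantized integer messages. The function names are hypothetical, and the `jones_clipping` flag selects whether U is clipped before the subtraction.

```python
MAX_LEVEL = 127  # largest quantized magnitude


def clip(x):
    """Saturate an integer to [-MAX_LEVEL, MAX_LEVEL]."""
    return max(-MAX_LEVEL, min(MAX_LEVEL, x))


def variable_node_messages(q_lambda, q_u, jones_clipping=False):
    """Outgoing quantized messages from one variable node.

    q_lambda: quantized channel LLR; q_u: quantized incoming check messages.
    """
    total = q_lambda + sum(q_u)  # U: channel LLR plus all incoming messages
    if jones_clipping:
        total = clip(total)      # clip U before the per-edge subtraction
    return [clip(total - u) for u in q_u]


if __name__ == "__main__":
    strong = [127, 127, 127]
    print(variable_node_messages(127, strong))                       # saturated
    print(variable_node_messages(127, strong, jones_clipping=True))  # near zero
```

When every input is saturated, the rule without clipping of U keeps the outgoing messages at full magnitude, whereas clipping U first drives them to zero, which is the behavior discussed above.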
The performance improvement can be seen in the curve labeled “with Jones clipping” in
When channel symbol LLRs for degree-1 variable nodes are not clipped to levels below the maximum magnitude of check node messages, an error floor results. The reason for the floor is that a strong but wrong channel symbol LLR is not able to be overcome by the single message from the check node. For the (1024,⅘) code with 128 degree-1 variable nodes, channel symbol LLRs clipped to ±15.875, and a decoder with maximum check node message 15.125, the theoretical floor, $128\,Q\bigl((4E_s/N_0 + 15.125)/\sqrt{8E_s/N_0}\bigr)$, is shown as the lower curve in
Altering the decoder to clip degree-1 variable nodes to 116/8=14.5 made little difference in the error floor, as seen in the curve labeled “degree-1 clipping” in
A given check node receives messages v1, v2, . . . , vd from d variable nodes, where d is the degree of the check node. The message the check node sends back to the jth of the d variable nodes connected to it is given by Eq. 34 below:
This can be computed by repetitively applying the function as shown below in Eqs. 35 and 36:
The second ln term of min* is smaller than the first, and can be ignored. The first ln term can be quantized using the approximation shown in Eq. 37 below:
With quantized inputs Q(x)/8 and Q(y)/8 in place of x and y, this is nonzero only when $\bigl|\,|Q(x)| - |Q(y)|\,\bigr| \le 21$, so a length 22 look-up table can implement this approximation. Thus, the entire min* approximation can be computed with a few comparisons, one subtraction, and no multiplications, logarithms, or exponentials.
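The small-table min* approximation can be sketched in Python as below. This is a sketch under stated assumptions: the retained correction term is taken to be ln(1 + e^(−||x|−|y||)), quantized with scale 8 and subtracted from the smaller magnitude. That form reproduces the 22-entry table mentioned above but is not confirmed by the equations as reproduced here. Flooring the result at zero when the correction exceeds the smaller magnitude is an additional assumption, as are the names.

```python
import math

SCALE = 8  # quantized value is about 8 times the true LLR

# Quantized correction for magnitude differences 0..21; it rounds to 0 beyond.
CORRECTION = [int(round(SCALE * math.log1p(math.exp(-d / SCALE))))
              for d in range(22)]


def min_star_quantized(qx, qy):
    """Approximate two-input min* on quantized integer messages."""
    ax, ay = abs(qx), abs(qy)
    diff = abs(ax - ay)
    mag = min(ax, ay)
    if diff < len(CORRECTION):
        mag -= CORRECTION[diff]  # table look-up replaces log and exp
    mag = max(mag, 0)            # assumption: never flip sign via correction
    sign = -1 if (qx < 0) != (qy < 0) else 1
    return sign * mag


if __name__ == "__main__":
    print(CORRECTION)
    print(min_star_quantized(40, 40), min_star_quantized(100, -40))
```

The table entries fall from 6 at a difference of 0 to 1 at a difference of 21, and the same expression rounds to 0 at a difference of 22.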
In some implementations, such as a software decoder on a standard desktop, it is efficient to replace the comparisons, small look-up table, and subtraction with a single look-up table. With the 8-bit quantized values, an unsigned min* table has 128×128=16384 1-byte entries, and a signed min* table has 256×256=65536 1-byte entries, which is within the reach of typical computing platforms.
When a full look-up table is used for min*, there is no need to use an approximation as in Eq. 36. Instead the table can simply contain the entries shown in Eq. 38 below:
which can be conveniently computed once, ahead of time. This is equivalent to Eq. 34, using quantized inputs. Note that using the approximation shown in Eq. 37 for both log terms of Eq. 36 is not equivalent to Eq. 38, because Eq. 37 quantizes each log term separately, introducing quantization noise twice, whereas Eq. 38 does not quantize until the end of the full computation.
Nevertheless, this more exact min* computation made no discernible difference in the simulated error floor.
The rate ⅘ AR4JA codes have degree-18 check nodes. To compute a min* function of 17 variables, multiple 2-input min* functions are repeatedly computed, using a tree-structure. Since each min* involves quantization noise, the total quantization noise for the min* with 17 variables could be significant. As an alternative, each reliability message vi from a variable node can be transformed to an unreliability Ψ(vi)=ln(tanh(vi)), so that the product in Eq. 34 becomes a summation as shown in Eq. 39 below:
Note that Ψ(•) is a self-inverse function. With quantized inputs and outputs, Eq. 39 becomes Eq. 40 as shown below:
In this form, the addition can be performed without introducing quantization noise beyond that present in the inputs, and the result is transformed back to a reliability and re-quantized only at the end of the computation. The overall quantization noise is lower with this method. This alteration had no discernible effect on error-floor performance, as seen in the curve marked as having “additive unreliabilities at check nodes” in
One additional decoder variation made a big difference in the error floor performance. Messages from each check node were partially hard-limited, so that every message from a check node which would otherwise have a quantized magnitude of at least 100 was re-assigned to have maximum magnitude (127) (i.e., positive messages greater than or equal to +100 were re-assigned to a value of +127 and negative messages less than or equal to −100 were re-assigned to a value of −127). This resulted in the performance shown by the curve marked as having “hard-limit check node messages” in
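The partial hard-limiter described above amounts to a simple mapping on each quantized check node message, as in the following C sketch. The threshold of 100 and the maximum of 127 are taken from the text; the macro and function names are hypothetical.

```c
#include <assert.h>

#define HL_THRESHOLD 100   /* quantized magnitude that triggers the limiter */
#define HL_MAX       127   /* maximum quantized magnitude */

/* Partially hard-limit one quantized check node message in [-127, 127]. */
int hard_limit_check_msg(int m)
{
    assert(m >= -HL_MAX && m <= HL_MAX);
    if (m >= HL_THRESHOLD)
        return HL_MAX;
    if (m <= -HL_THRESHOLD)
        return -HL_MAX;
    return m;   /* messages with magnitude below 100 pass through unchanged */
}
```

In a decoder of this kind, the mapping would be applied to every outgoing check node message in each iteration, after the min* computation.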
As noted, the check-node hard-limiter helps improve performance for the reasons discussed below. The lower floor means that the decoder is handling trapping sets better than the JH2009 decoder. Consider a trapping set V of incorrectly converged variable nodes, with a set C of neighboring check nodes, each connected to V an odd number of times (i.e., a (|V|, |C|) trapping set). The check nodes in C are unsatisfied. In general, a node of V may receive messages from nodes in C and nodes not in C. If the decoder is stuck in the trapping set, the (correct) messages from nodes in C are not powerful enough to overcome the (incorrect) messages from nodes not in C. Because of how C is connected to V, the messages from check nodes in C tend to start converging slightly faster than those not in C. By hard-limiting the messages from all check nodes above 100, the unsatisfied checks are able to more quickly correct incorrect nodes in V. The interaction of Jones clipping with the partial hard-limiter may also be important.
Various other techniques, such as damping, amplifying, optimal processing of cycles, and iterative demodulation and decoding, may also be incorporated. These may lead to additional performance improvements.
Software was written in C to implement the encoder, bit-mapper, modulator, noise generator, demodulator, LLR computation, and decoder for each combination of code, modulation, bit-mapping type, and demodulation type set forth in Table 1 below. Additional support for random message generation, noise generation, and gathering performance statistics was also included. The decoder uses LLRs quantized to eight bits.
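The eight-bit LLR quantization may be sketched in C as follows, using the step size of ⅛ and the integer range −127 to 127 (LLRs from −15.875 to +15.875) given elsewhere in this disclosure. The rounding rule (round half away from zero) and the function names are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Quantize an LLR to an integer in [-127, 127] with a step size of 1/8. */
int8_t quantize_llr(double llr)
{
    double scaled = llr * 8.0;
    if (scaled >= 127.0)
        return 127;
    if (scaled <= -127.0)
        return -127;
    /* Round half away from zero (assumed rounding rule). */
    return (int8_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Map a quantized value back to LLR units. */
double dequantize_llr(int8_t q)
{
    assert(q >= -127);   /* -128 is not used by this representation */
    return q / 8.0;
}
```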
The same encoder/decoder software was used for all nine codes. Prior to simulating the coded modulation, the software reads an initialization file that defines the protograph LDPC code's input and output length, circulant size, number of check and variable nodes in the protograph, number of edges in the protograph, a compact representation of the generator matrix, and an edgelist describing the parity check protograph and circulant offsets.
Table 2 shows the encoding and decoding speed of the C simulations, when compiled with a GNU C compiler on a typical desktop PC (a 3 GHz Intel Xeon processor running Linux). The decoder is an 8-bit message passing decoder that stops iterating when a codeword is found. Because more iterations are needed at lower signal-to-noise ratios (SNRs), the speed of such a variable iterations decoder is sensitive to the SNR. The speeds reported in the table refer to a simulation with BPSK modulation, soft decisions, and operation at the Eb/N0 shown, which in each case corresponds to operation at a codeword error rate of about 10⁻⁴ and represents a reasonable lower limit on the Eb/N0 at which the decoder would be operated in practice. The software simulation was found to spend only a small fraction of its running time computing LLRs. Most of the time is spent performing decoder iterations. This is true even with high-order modulations such as 16-APSK and 32-APSK, where exact LLR computations amounted to only about 5 percent of the overall simulation time. As a result, the numerical results reported in this disclosure used the exact LLR expression of Eq. 14, and not the lower-complexity approximate LLR expressions described above.
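The stopping rule of such a variable iterations decoder can be illustrated by a parity test over the Tanner graph edges, as in the following C sketch. The edge-list layout, the scratch buffer, and all names are assumptions made for illustration and do not describe the format used by the simulation software.

```c
#include <assert.h>
#include <stddef.h>

/* One Tanner graph edge: check node index and variable node index. */
struct edge {
    size_t check;
    size_t var;
};

/*
 * Returns 1 if the hard decisions in bits[] satisfy every parity check,
 * and 0 otherwise. parity[] is caller-provided scratch space holding
 * n_checks entries.
 */
int is_codeword(const struct edge *edges, size_t n_edges,
                const unsigned char *bits,
                unsigned char *parity, size_t n_checks)
{
    for (size_t c = 0; c < n_checks; c++)
        parity[c] = 0;
    for (size_t e = 0; e < n_edges; e++) {
        assert(edges[e].check < n_checks);
        parity[edges[e].check] ^= (unsigned char)(bits[edges[e].var] & 1u);
    }
    for (size_t c = 0; c < n_checks; c++)
        if (parity[c])
            return 0;
    return 1;
}
```

A decoder loop would call such a test after each iteration and stop as soon as it returns 1 or the iteration limit is reached.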
Table 2 below shows the encoding speeds achieved using a software encoder in C on a standard desktop. Encoding speeds ranged from 1.5 to 50 Mbps.
A separate MATLAB implementation of equivalent functionality was also developed. The MATLAB implementation was found to run about 50 times slower. Simulation results reported in this disclosure were collected with the C software.
The numerical results obtained from the simulations are presented below. These results include: the performance of AR4JA codes when used with a variety of modulations, an optimized bit-mapping, an optimum demodulator (LLR computation), and the optimized decoder algorithms described above.
This disclosure presents a set of simulation results for LDPC codes in combination with several modulations. The numerical results are consistent with previous results, except that a new partial hard-limiter for check node messages has been introduced to eliminate error floors. The simulation results provide a foundation for the design of variable coded modulation (VCM) or adaptive coded modulation (ACM) schemes.
Performance depends on optimization of bit-to-symbol mapping in the modulator, LLR computation by the demodulator, and on the decoder's quantization dynamic range and step size, variable node clipping strategy, check node partial hard-limiting, and number of iterations. With careful optimizations, error floors can be avoided down to below CWER=10⁻⁶. Error floors may be lower, as they were not reached with the simulations conducted here. Performance is not sensitive to ring ratios used in 16-APSK and 32-APSK, nearest neighbor approximations to the LLR, and maximum iterations beyond about 200. Use of an interleaver may be avoided without performance degradation. Those skilled in the art will understand that iterative demodulating and decoding, while not specifically discussed herein, may provide for additional performance improvements.
As noted above, the methods and systems described herein did not make use of an interleaver—each set of adjacent codebits was grouped and used as input to the modulator, as shown in
Not using an interleaver may make a code vulnerable to losses when used with higher order modulations, because a weakly received modulation symbol may give rise to multiple poor codebit LLRs. An interleaver helps distribute these bursts of poor LLRs across multiple codewords, instead of bunching them in a single codeword. Codebits are passed through an interleaver, π, prior to modulation, and a de-interleaver, π⁻¹, after demodulation, as shown in
In the single codeword interleaver, the bits within a codeword are re-ordered arbitrarily, as shown in
In a block interleaver, codewords are written in rows and read out in columns, as shown in
In a block interleaver with bit-reordering, codewords are written in rows and read out in columns, but the bits are reordered within each codeword, as shown in
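The row-write, column-read operation of a block interleaver, together with its inverse, may be sketched in C as follows. One array element per codebit and the function names are assumptions made for this sketch; the optional re-ordering of bits within each codeword is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Write codewords as rows, then read the array out column by column. */
void block_interleave(const unsigned char *in, unsigned char *out,
                      size_t n_codewords, size_t codeword_len)
{
    size_t k = 0;
    assert(in != out);   /* not an in-place algorithm */
    for (size_t col = 0; col < codeword_len; col++)
        for (size_t row = 0; row < n_codewords; row++)
            out[k++] = in[row * codeword_len + col];
}

/* Inverse operation: restore the original row-by-row codeword order. */
void block_deinterleave(const unsigned char *in, unsigned char *out,
                        size_t n_codewords, size_t codeword_len)
{
    size_t k = 0;
    assert(in != out);
    for (size_t col = 0; col < codeword_len; col++)
        for (size_t row = 0; row < n_codewords; row++)
            out[row * codeword_len + col] = in[k++];
}
```

With two codewords of three bits each, stored as {0,1,2} and {3,4,5}, the interleaved order is 0,3,1,4,2,5, so adjacent bits at the modulator input come from different codewords.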
Additionally,
The present disclosure has described different decoder variations. Application of these different variations in a cumulative manner, as described above, had different impacts on improving the error floor. The results of these decoder variations are summarized below.
1. Exact min*.
The decoder was altered to use an exact min* computation that incorporates the min* term and both log correction terms prior to quantization. This made no discernible difference in the error floor.
2. Jones Clipping.
Introducing Jones clipping reduces the error floor by one decade, to about CWER=10⁻⁵. This is seen in the curve labeled “with Jones clipping” in
3. Clipping Degree-1 Variable Nodes.
The description above describes a floor that occurs when channel symbol LLRs going into degree-1 variable nodes are not clipped to levels below the maximum magnitude of check node messages. The reason for the floor is that a strong but wrong channel symbol LLR cannot be overcome by the single message from the check node. For the (1024,⅘) code with 128 degree-1 variable nodes, channel symbol LLRs clipped to ±15.875, and a decoder with maximum check node message 15.125, the theoretical floor, 128·Q((4Es/N0+15.125)/√(8Es/N0)), is shown in
Altering the decoder to clip degree-1 variable nodes to 116/8=14.5 made little difference in the error floor, as seen in the red curve labeled “degree-1 clipping” in
4. Dynamic Range Adjustment.
The JH2009 decoder used integers −127 to 127 to represent LLRs ranging from −15.875 to +15.875, in uniform steps of ⅛. Using a different step size (and thus different total dynamic range) affects decoder performance, but the range (−15.875, +15.875) was found to be near-optimal, at least in the waterfall region.
5. Additive Unreliability at Check Node.
The rate ⅘ AR4JA codes have degree-18 check nodes. To compute a min* function of 17 variables, multiple 2-input min* functions are repeatedly computed, using a tree-structure. Since each min* involves quantization noise, the total quantization noise for the min* with 17 variables could be large. As an alternative, each reliability message from a variable node can be transformed to an unreliability, and these may be added at the check node. This addition can be performed exactly, and the result can be transformed back to a reliability and re-quantized only at the end of this computation. However, this alteration had no discernible effect on error-floor performance, as seen in
6. Hard-Limit Check Node Messages.
The hard-limit check node decoder variation made a big difference in the error floor performance. The decoder was altered to partially hard-limit messages from the check nodes, so that every message from a check node which would otherwise have a quantized magnitude of at least 100 was re-assigned to have magnitude 127. This resulted in the performance in
Embodiments of the present invention may utilize the decoder improvements discussed above. Systems using decoders with such improvements may also utilize interleavers as discussed above. Such embodiments may provide for improved performance in the presence of higher noise levels and/or allow for higher transmission rates and/or allow for faster decoder performance.
The foregoing Detailed Description of exemplary and preferred embodiments is presented for purposes of illustration and disclosure in accordance with the requirements of the law. It is not intended to be exhaustive nor to limit the invention to the precise form or forms described, but only to enable others skilled in the art to understand how the invention may be suited for a particular use or implementation. The possibility of modifications and variations will be apparent to practitioners skilled in the art.
No limitation is intended by the description of exemplary embodiments which may have included tolerances, feature dimensions, specific operating conditions, engineering specifications, or the like, and which may vary between implementations or with changes to the state of the art, and no limitation should be implied therefrom. In particular it is to be understood that the disclosures are not limited to particular compositions or biological systems, which can, of course, vary. This disclosure has been made with respect to the current state of the art, but also contemplates advancements and that adaptations in the future may take those advancements into consideration, namely in accordance with the then current state of the art. It is intended that the scope of the invention be defined by the Claims as written and equivalents as applicable. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. Reference to a claim element in the singular is not intended to mean “one and only one” unless explicitly so stated. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the content clearly dictates otherwise. The term “several” includes two or more referents unless the content clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the disclosure pertains.
Moreover, no element, component, nor method or process step in this disclosure is intended to be dedicated to the public regardless of whether the element, component, or step is explicitly recited in the Claims. No claim element herein is to be construed under the provisions of 35 U.S.C. Sec. 112, sixth paragraph, unless the element is expressly recited using the phrase “means for . . . ” and no method or process step herein is to be construed under those provisions unless the step, or steps, are expressly recited using the phrase “comprising step(s) for . . . ”
A number of embodiments of the disclosure have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the present disclosure. Accordingly, other embodiments are within the scope of the following claims.
The present application is related to and claims the benefit of the following copending and commonly assigned U.S. Patent Application: U.S. Patent Application No. 61/474,861, “Method of Error Floor Mitigation in Low-Density Parity-Check Codes,” filed on Apr. 13, 2011; the entire contents of which are incorporated herein by reference.
The invention described herein was made in the performance of work under a NASA contract, and is subject to the provisions of Public Law 96-517 (35 USC 202) in which the Contractor has elected to retain title.
Number | Date | Country
---|---|---
61474861 | Apr 2011 | US