1. Field of the Invention
The embodiments described herein are related to methods for Low-Density Parity-Check decoding and more particularly to methods for achieving reduced complexity Low-Density Parity-Check decoders.
2. Background of the Invention
A Low-Density Parity-Check (LDPC) code is an error correcting code that provides a method for transferring a message over a noisy transmission channel. While LDPC techniques cannot guarantee perfect transmission, the probability of lost information can be made very small. In fact, LDPC codes were the first to allow data transmission rates close to the theoretical maximum, e.g., the Shannon Limit. LDPC techniques use a sparse parity-check matrix, e.g., a matrix populated mostly with zeros, hence the term low-density. The sparse matrix is randomly generated subject to the defined sparsity constraint.
LDPC codes can be defined in both matrix and graphical form. An LDPC matrix will have a certain number of rows (M) and columns (N). The matrix can also be defined by the number of 1's in each row (wr) and the number of 1's in each column (wc). For a matrix to be considered low-density the following conditions should be met: wc<<M and wr<<N. An LDPC matrix can be regular or irregular. A regular LDPC matrix is one in which wc is constant for every column and wr=wc*(N/M) is also constant for every row. If the matrix is low-density but the number of 1's in each row or column is not constant, such codes are called irregular LDPC codes.
It will also be understood that an LDPC code can be graphically defined by its corresponding Tanner graph. Not only do such graphs provide a complete representation of the code, they also help to describe the decoding algorithm as explained in more detail below. A Tanner graph comprises nodes and edges. The nodes are separated into two distinctive sets, or types, and the edges connect the two different types of nodes. The two types of nodes in a Tanner graph are called the variable nodes (v-nodes) and check nodes (c-nodes), or parity nodes. Thus, the Tanner graph will consist of M check nodes (the number of parity bits) and N variable nodes (the number of bits in a code word). A check node will then be connected to a variable node if there is a 1 in the corresponding element of the LDPC matrix.
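By way of illustration only (this sketch is not part of the original disclosure; names are illustrative), the following Python fragment derives the Tanner graph connectivity directly from a parity check matrix H, here the 3×6 example matrix used later in this description:

    # Build Tanner graph adjacency from a parity-check matrix H:
    # check node m connects to variable node n whenever H[m][n] == 1.
    H = [
        [1, 0, 1, 0, 1, 0],
        [0, 1, 0, 1, 0, 1],
        [1, 1, 0, 0, 0, 1],
    ]

    # check_neighbors[m]: variable nodes attached to check node m.
    # var_neighbors[n]: check nodes attached to variable node n.
    check_neighbors = [[n for n, bit in enumerate(row) if bit] for row in H]
    var_neighbors = [[m for m, row in enumerate(H) if row[n]]
                     for n in range(len(H[0]))]

    print(check_neighbors)  # [[0, 2, 4], [1, 3, 5], [0, 1, 5]]
    print(var_neighbors)    # [[0, 2], [1, 2], [0], [1], [0], [1, 2]]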
The number of information bits can be represented as (K). A generator matrix G_{N×K} can then be defined according to the following:

c_{N×1} = G_{N×K} d_{K×1}, where d_{K×1} is the data word.

As can be seen, the code word c_{N×1} is generated by multiplying the data word by the generator matrix. The subscripts are matrix notation and refer to the number of rows and columns respectively. Thus, the data word and code word can be represented as single column matrices with K and N rows respectively.
The parity check matrix H_{M×N} is defined by the requirement that H_{M×N} c_{N×1} = 0 for every valid code word.

Accordingly, H_{M×N} G_{N×K} d_{K×1} = 0 for every data word, which implies H_{M×N} G_{N×K} = 0.
In receive portion 110, demodulator 112 can be configured to remove the carrier from the received signal; however, channel 108 will add channel effects and noise, such that the signal produced by demodulator 112 can have the form: r_{N×1} = (2/σ²)(1 − 2c_{N×1}) + w_{N×1}, where r is a multilevel signal and w represents the noise. As a result of the noise and channel effects, some of the data bits d will be lost in the transmission. In order to recover as much of the data as possible, decoder 114 can be configured to use the parity check matrix H_{M×N} to produce an estimate d′_{K×1} of the data that is very close to the original data d_{K×1}. It will be understood that decoder 114 can be a hard decision decoder or a soft decision decoder. Soft decision decoders are more accurate, but also typically require more resources.
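As a hedged illustration of this channel model (not part of the original disclosure; the function name and parameters are assumptions), the following Python sketch generates the multilevel signal r = (2/σ²)(1 − 2c) + w for a binary code word transmitted over an additive white Gaussian noise channel with noise standard deviation σ:

    import random

    def demodulate_llr(codeword, sigma):
        """Return the multilevel demodulator output r for a binary code word c."""
        r = []
        for c in codeword:
            # w = (2/sigma^2) * n, where n ~ N(0, sigma^2), so w has std 2/sigma.
            w = random.gauss(0.0, 2.0 / sigma)
            r.append((2.0 / sigma ** 2) * (1 - 2 * c) + w)
        return r

    print(demodulate_llr([0, 1, 0, 0, 0, 1], sigma=0.8))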
In order to illustrate the operation of LDPC codes, the following example is presented, using the parity check matrix:

    H = [ 1 0 1 0 1 0 ]
        [ 0 1 0 1 0 1 ]
        [ 1 1 0 0 0 1 ]

As can be seen, the example parity check matrix H is low density, or sparse. The first row of matrix H defines the first parity check node, or equation. As can be seen, the first parity check node will check received samples r0, r2, and r4, remembering that r is the multilevel signal produced by demodulator 112 in the receiver. The second parity check node, i.e., the second row of H, checks received samples r1, r3, and r5, and the third parity check node checks samples r0, r1, and r5. In this example, there are three parity check nodes and six samples. The first and second parity check nodes are considered orthogonal, because they involve mutually exclusive sets of samples.
If it is assumed that K=3 and M=3, then the code word can be written as c = [d0, d1, d2, p0, p1, p2]^T, where d0, d1, and d2 are the data bits and p0, p1, and p2 are the parity bits. Applying Hc = 0 produces the following equations:
d0+d2+p1=0
d1+p0+p2=0
d0+d1+p2=0
These equations reduce to:
p0=d0
p1=d0+d2
p2=d0+d1
Thus, for example, if d=[0;1;0], then p=[0;0;1] and c=[0;1;0;0;0;1].
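This example can be checked mechanically. The following Python sketch (illustrative only) computes the parity bits over GF(2), i.e., with exclusive-or arithmetic, and verifies that every row of H checks to zero:

    d0, d1, d2 = 0, 1, 0
    p0 = d0          # p0 = d0
    p1 = d0 ^ d2     # p1 = d0 + d2 (mod 2)
    p2 = d0 ^ d1     # p2 = d0 + d1 (mod 2)
    c = [d0, d1, d2, p0, p1, p2]
    print(c)         # [0, 1, 0, 0, 0, 1]

    H = [[1, 0, 1, 0, 1, 0],
         [0, 1, 0, 1, 0, 1],
         [1, 1, 0, 0, 0, 1]]
    # Every parity check H*c must be 0 modulo 2 for a valid code word.
    assert all(sum(h * x for h, x in zip(row, c)) % 2 == 0 for row in H)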
In an LDPC decoder, the operations of the parity check and variable nodes can be implemented by processors. In other words, each parity check node can be implemented by a parity check processor, and each variable node can be implemented by a variable node processor. An LDPC decoder is then an iterative decoder that implements a message passing algorithm defined by H.
Unfortunately, conventional LDPC decoding techniques result in high-complexity, fully parallel decoder implementations in which all the messages to and from all the parity node processors have to be computed at every iteration in the decoding process. This leads to high complexity, increased resource requirements, and increased cost.
Hence, many current efforts are devoted to reducing the complexity of check node message updating while keeping the performance loss as small as possible. The most common simplification is the min-sum algorithm (MSA), which greatly reduces the complexity of check node updates but incurs a 0.3 to 0.4 dB degradation in performance relative to standard sum-product algorithm (SPA) check node implementations. To combat this performance degradation, modifications of the MSA using a normalization term and an offset adjustment term have also been proposed. Such solutions reduce the performance loss relative to conventional MSA implementations, but significant loss remains. In addition, two-dimensional MSA schemes have been proposed that can further improve the performance of MSA at the cost of some additional complexity. Thus, in conventional implementations, there is a constant trade-off between complexity and performance.
Systems and methods for generating check node updates in the decoding of low-density parity-check (LDPC) codes are described below. The systems and methods described below use new approximations in order to reduce the complexity of implementing an LDPC decoder while maintaining accuracy. The new approximations approximate the standard sum-product algorithm (SPA), can reduce the approximation error of the min-sum algorithm (MSA), and have almost the same performance as the SPA under both floating-point and fixed-point operation.
In one aspect, a receiver can include a demodulator configured to receive a wireless signal, remove a carrier signal from the wireless signal and produce a received signal, and a Low Density Parity Check (LDPC) processor configured to recover an original data signal from the received signal. The LDPC processor can include a plurality of variable node processors configured to receive the received signal and generate variable messages based on the received signal, and a parity node processor configured to receive the variable messages and generate soft outputs based on the variable messages, the parity node processor configured to implement the following:

λ_i = Π_{j≠i} sgn(u_j) · ( −ln( min( A − e^{−|u_i|}, 1 ) ) ), where A = Σ_{j=1}^{n} e^{−|u_j|},

as developed in equation (16) of the detailed description.
The parity node processor can be implemented using either a serial architecture or a parallel architecture.
In another aspect, a parity node processor can include a plurality of input processing blocks configured to receive variable messages in parallel and perform an exponential operation on the variable messages, a summer coupled with the plurality of input processing blocks, the summer configured to sum the outputs from the plurality of input processing blocks, a plurality of adders coupled with the summer and the plurality of input processing blocks, the plurality of adders configured to subtract the outputs of the plurality of input processing blocks from the output of the summer, and a plurality of output processing blocks coupled with the plurality of adders, the plurality of output processing blocks configured to perform a logarithm function on the outputs of the plurality of adders.
In another aspect, a parity node processor can include an input processing block configured to serially receive variable messages and perform an exponential operation on the variable messages, an accumulator coupled with the input processing block, the accumulator configured to accumulate the output of the input processing block, a shift register coupled with the input processing block, the shift register configured to store the variable messages for one clock cycle, an adder coupled with the accumulator and the shift register, the adder configured to subtract the output of the shift register from the output of the accumulator, and an output processing block coupled with the adder, the output processing block configured to perform a logarithm function on the output of the adder.
In still another aspect, a method for processing a received wireless signal can include receiving the wireless signal, removing a carrier signal from the wireless signal to produce a received signal, generating variable messages from the received signal, performing an exponential operation on the variable messages to generate exponential data, summing the exponential data, subtracting the exponential data for each variable message from the summed exponential data to form a difference, and performing a logarithmic operation on the difference.
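As a minimal sketch of the serial architecture and method just described (not part of the original disclosure; it assumes the exponential operates on −|u_j|, consistent with equation (16) developed in the detailed description, and computes magnitudes only):

    import math

    def serial_parity_node(u):
        """Serial magnitude datapath: exponentiate, accumulate, back out, log."""
        acc = 0.0
        stored = []                   # models the shift-register contents
        for x in u:                   # one variable message per clock cycle
            e = math.exp(-abs(x))     # input processing block (exponential)
            stored.append(e)
            acc += e                  # accumulator
        # Second pass: subtract each stored term and apply the output logarithm.
        return [-math.log(min(acc - e, 1.0)) for e in stored]

    print(serial_parity_node([2.0, -3.5, 1.2, -4.1]))

The two passes over the n inputs mirror the 2n clock cycles of the serial structure discussed in the detailed description.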
These and other features, aspects, and embodiments of the invention are described below in the section entitled “Detailed Description.”
Features, aspects, and embodiments of the inventions are described in conjunction with the attached drawings, in which:
In the descriptions that follow, certain example parameters, values, etc., are used; however, it will be understood that the embodiments described herein are not necessarily limited by these examples. Accordingly, these examples should not be seen as limiting the embodiments in any way. Further, the embodiments of an LDPC decoder described herein can be applied to many different types of systems implementing a variety of protocols and communication techniques. Accordingly, the embodiments should not be seen as limited to a specific type of system, architecture, protocol, air interface, etc. unless specified.
A check node processor 302 of degree n is shown in FIG. 3.
With the standard sum-product algorithm, the outgoing message is determined as follows:

λ_i = 2 tanh⁻¹( Π_{j=1, j≠i}^{n} tanh( u_j / 2 ) ). (1)
The outgoing soft messages are then fed back to the variable node processors for use in generating outputs u_i during the next iteration; however, a soft message λ_i based on a variable node output from a particular node is not returned to that node. Hence the j≠i constraint on the product term Π_{j≠i} tanh(u_j/2) in (1).
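For reference, a direct (numerically naive) Python rendering of the SPA check node update of equation (1), with the j≠i exclusion applied explicitly, might look as follows (a sketch, not the patented implementation):

    import math

    def spa_check_node(u):
        """u: incoming variable messages u_1..u_n; returns outgoing lambda_i."""
        out = []
        for i in range(len(u)):
            prod = 1.0
            for j, x in enumerate(u):
                if j != i:                      # exclude the intrinsic input
                    prod *= math.tanh(x / 2.0)
            out.append(2.0 * math.atanh(prod))  # lambda_i = 2*atanh(prod)
        return out

    print(spa_check_node([2.0, -3.5, 1.2, -4.1]))

A production implementation would guard against tanh saturation near ±1; the sketch omits such details.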
This can also be illustrated with the aid of FIG. 2.
The messages produced by parity node processor 202 can be defined using a corresponding set of equations, denoted (2), each of the form of equation (1) applied to the messages received by node 202.
Thus parity node processor 202 can be configured to implement the above equations (2). The soft messages produced by the parity nodes, e.g., parity node 202, are then fed back to variable nodes 208, 210, 212, 214, 216, and 218, for use in the next iteration.
For example, variable node processor 208 can be configured to implement the following equation:

u_0^k = u_{ch,0} + λ^k(0→0) + λ^k(2→0), (3)

where u_{ch,0} is the channel sample for bit 0 and λ^k(m→0) denotes the soft message passed from parity node m to variable node 0 at iteration k.
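A one-line Python sketch of this update (illustrative only; equation (3) as written sums the channel value and both incoming check messages):

    def variable_node_update(u_ch, incoming):
        """u_ch: channel value for this bit; incoming: check-to-variable messages."""
        return u_ch + sum(incoming)

    # Messages from parity nodes 0 and 2 to variable node 0, per equation (3).
    print(variable_node_update(u_ch=1.3, incoming=[0.4, -0.2]))  # 1.5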
It will be understood that the decoder described above can be implemented using hardware and/or software configured appropriately, and that while separate parity check processors and variable node processors are described, these processors can be implemented by a single processor, such as a digital signal processor, or circuit, such as an Application Specific Integrated Circuit (ASIC); however, as mentioned above, implementation of an LDPC processor such as that described above can require significant resources.
As noted above, the sum-product algorithm of equation (1) can be prohibitive in terms of practical and cost effective implementation. Approximations have been proposed with the aim of reducing this complexity. For example, it can be shown that (4) is equivalent to (1):

λ_i = u_1 ⊕ u_2 ⊕ ⋯ ⊕ u_n, (4)

where the intrinsic input u_i is again excluded from the combination, and where the operator ⊕ is defined as:

x ⊕ y = ln( (1 + e^{x+y}) / (e^x + e^y) ). (5)
Using the approximation formula:

e^x + e^y ≈ max(e^x, e^y) = e^{max(x,y)}, (6)

or equivalently,

ln(e^x + e^y) ≈ max(x, y), (7)

in both the numerator and denominator of (5), the following can be obtained:

x ⊕ y ≈ sgn(x) sgn(y) min(|x|, |y|). (8)
Repeatedly substituting (8) into (4), the min-sum algorithm (MSA) can be obtained as follows:

λ_i ≈ Π_{j≠i} sgn(u_j) · min_{j≠i} |u_j|. (9)
It will be apparent that equation (9) is much simpler to implement than (1) or (4), but the cost of this simplification is a significant performance penalty, generally about 0.3 to 0.4 dB depending on the specific code structure and code rate. To reduce such performance loss, some modifications have been proposed. The performance loss of MSA comes from the approximation error of (9) relative to (1); accordingly, to reduce the performance loss, the approximation error should be reduced. It can be shown that (9) is always larger than (1) in magnitude. Thus, the normalized-MSA and offset-MSA use scaling or offsetting to force the magnitude to be smaller.
With the normalized min-sum algorithm, (9) is scaled by a factor α (0 < α < 1):

λ_i ≈ α · Π_{j≠i} sgn(u_j) · min_{j≠i} |u_j|. (10)
The offset min-sum algorithm reduces the magnitude by a positive constant β:

λ_i ≈ Π_{j≠i} sgn(u_j) · max( min_{j≠i} |u_j| − β, 0 ). (11)
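The three min-sum variants of equations (9) through (11) can be sketched in a few lines of Python (illustrative only; the α and β values below are placeholders, not values from this disclosure):

    def msa_check_node(u, alpha=1.0, beta=0.0):
        """Plain MSA (alpha=1, beta=0), normalized MSA (alpha<1), offset MSA (beta>0)."""
        out = []
        for i in range(len(u)):
            others = [x for j, x in enumerate(u) if j != i]
            sign = 1
            for x in others:
                sign *= 1 if x >= 0 else -1      # product of signs, eq. (9)
            mag = min(abs(x) for x in others)    # minimum magnitude, eq. (9)
            mag = max(alpha * mag - beta, 0.0)   # eq. (10) scaling / eq. (11) offset
            out.append(sign * mag)
        return out

    u = [2.0, -3.5, 1.2, -4.1]
    print(msa_check_node(u))              # plain min-sum
    print(msa_check_node(u, alpha=0.8))   # normalized min-sum
    print(msa_check_node(u, beta=0.15))   # offset min-sum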
But these approaches again increase the complexity. Thus, as mentioned above, there is a constant trade-off between complexity and performance.
The embodiments described below use a new approach for the check node update in the decoding of LDPC codes. The approach is based on a new approximation of the SPA that can reduce the approximation error of the MSA and has almost the same performance as the SPA under both floating-point and fixed-point operation. As a result, the new approximation can be implemented in simple structures, the complexity of which is on par with MSA implementations.
The approximation error of MSA comes from the approximation error of equation (7). Note that equation (7) is coarse when x and y are close. MSA uses equation (7) in both the numerator and denominator of equation (5). If the values of |x| and |y| are close, then either the numerator or the denominator can introduce a large approximation error. Thus, to improve the accuracy of the outgoing message, equation (7) should be applied in (5) only where it will produce a small approximation error.
For example, when both x and y have the same sign, using the approximation 1 + e^{x+y} ≈ max(e^0, e^{x+y}) in the numerator will produce better results than using e^x + e^y ≈ e^{max(x,y)} in the denominator. Similarly, when x and y have opposite signs, only approximating the denominator of (5) using e^x + e^y ≈ e^{max(x,y)} produces better results. Thus, a better approximation of (5), for x, y > 0, can be generated using the following:

x ⊕ y ≈ ln( e^{x+y} / (e^x + e^y) ) = −ln( e^{−x} + e^{−y} ). (12)
For all combinations of the signs of x and y, the following general expression can be used:
x ⊕ y ≈ −sgn(x) sgn(y) · ln( e^{−|x|} + e^{−|y|} ). (13)
Iteratively substituting (13) into (4) produces:

λ_i ≈ Π_{j≠i} sgn(u_j) · ( −ln( Σ_{j≠i} e^{−|u_j|} ) ). (14)
Note that (14) only holds when

Σ_{j≠i} e^{−|u_j|} ≤ 1.

If this condition is not satisfied, then the summation result can be limited to 1, resulting in the following:

λ_i ≈ Π_{j≠i} sgn(u_j) · ( −ln( min( Σ_{j≠i} e^{−|u_j|}, 1 ) ) ). (15)

Now, let A = Σ_{j=1}^{n} e^{−|u_j|}; then (15) can be expressed as:

λ_i ≈ Π_{j≠i} sgn(u_j) · ( −ln( min( A − e^{−|u_i|}, 1 ) ) ). (16)
The sign of (16) can be realized in the same way as in an MSA implementation, e.g., with a binary exclusive-or (XOR) logic circuit. The kernel of the approximation has the invertibility property, which allows the computation of the aggregate soft message first, followed by intrinsic back-out to produce the extrinsic updates.
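Both properties can be seen in the following Python sketch of the update of equation (16) (a sketch under the reconstruction above, not the patented circuit): the sign is the product of the other inputs' signs, and the magnitude is computed from the aggregate A with the intrinsic term backed out:

    import math

    def check_node_update(u):
        """u: incoming soft messages u_1..u_n; returns outgoing lambda_i per (16)."""
        sign_all = 1
        for x in u:
            sign_all *= 1 if x >= 0 else -1
        A = sum(math.exp(-abs(x)) for x in u)       # aggregate over all n inputs
        out = []
        for x in u:
            sign_i = sign_all * (1 if x >= 0 else -1)   # back out sgn(u_i)
            mag = -math.log(min(A - math.exp(-abs(x)), 1.0))
            out.append(sign_i * mag)
        return out

    print(check_node_update([2.0, -3.5, 1.2, -4.1]))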
The amplitude of equation (16) can be realized with a serial structure or a parallel structure, shown in FIGS. 6 and 7, respectively.
Both structures 600 and 700 have the same computation load. Serial structure 600 requires smaller hardware size, but needs 2n clock cycles to get all outgoing soft messages. Parallel structure 700 requires only 1 clock cycle, but needs larger hardware size than serial structure 600. Parallel structure 700 is attractive when the decoding speed is the primary concern. It will be understood that the exponential and logarithm operations in structures 600 and 700 can be realized, e.g., with look-up tables.
The ln(•) operation can include the min(•, 0) operation, which can be implemented by simply using the sign bit of the logarithm result to clear the output. In particular, if the logarithm is realized with a look-up table, this can be done by simply setting the content of the table to 0 for all inputs greater than 1, or by simply limiting the range of the address used to pick up the table content.
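A small Python sketch of this table trick (the table size and uniform quantization step are assumptions for illustration, not values from this disclosure):

    import math

    STEP = 1.0 / 256                      # assumed uniform quantization step
    # -ln table over inputs in [0, 2): entries for inputs greater than 1 are
    # forced to 0, which realizes the min(.,0) clamp at no extra cost; the
    # k = 0 entry saturates at -ln(STEP) instead of infinity.
    NEG_LOG_TABLE = [
        0.0 if (k * STEP) > 1.0 else -math.log(max(k, 1) * STEP)
        for k in range(512)
    ]

    def neg_log_lut(x):
        k = min(int(x / STEP), len(NEG_LOG_TABLE) - 1)
        return NEG_LOG_TABLE[k]

    print(neg_log_lut(0.5), neg_log_lut(1.5))  # ~0.693 and 0.0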
The implementations of structures 600 and 700 can be evaluated in terms of both computation load and decoding performance.
The computation complexity of the proposed implementations is similar to that of an MSA implementation. Table 1 compares the computation load of parity node processing for various decoding algorithms, where it has been assumed that the SPA, MSA, normalized-MSA, and offset-MSA are implemented in a known forward-backward manner.
In the simulations, the variable node updates are integer summations with results ranging from −128 to +128. The exponential operation, e.g., in structures 600 and 700, can be realized with a quantized look-up table.
It can be seen from the graphs of the simulation results that the proposed approximation has almost the same performance as the SPA, under both floating-point and fixed-point operation.
Moreover, although it can be challenging to meet the dynamic range requirements for the exp(•) operation, the simulation results show that the fixed-point operation has hardly any performance loss relative to the floating-point operation. Note that the number of quantization bits can be greatly reduced with non-uniform quantization, at the cost of increased complexity. With non-uniform quantization, the size of the logarithm and exponential tables can be reduced, but the quantized values should first be mapped to linearly quantized values before the summation operation in structures 600 and 700.
Accordingly, using the systems and methods described above, the resources, i.e., complexity, required to implement a parity node can be reduced, while still maintaining a high degree of precision. In certain embodiments, the complexity can be reduced even further through degree reduction techniques. In other words, the number of inputs to the parity node can be reduced, which can reduce the resources required to implement the parity node. It should also be noted that in many parity node implementations, the sign and the absolute value of the outgoing soft message are calculated separately.
The outputs of degree reduction unit (DRU) 1302, which selects m of the n incoming variable messages, can then be provided to parity node processor 1304. Parity node processor 1304 can be implemented using either the serial configuration of structure 600 or the parallel configuration of structure 700.
Similarly, depending on the embodiment, DRU 1302 can be implemented in parallel or serial structures.
An example implementation for the comparators of DRU 1302 is shown in the accompanying drawings.
Parity node processor 1304 can be configured to calculate the absolute value of the outgoing messages with equation (16), i.e., the second term of equation (16). In other words, the sign and absolute value for equation (16) can be determined separately using the following:

sgn(λ_i) = Π_{j≠i} sgn(u_j), (17)

|λ_i| = −ln( min( A − e^{−|u′_i|}, 1 ) ), where A = Σ_{j=1}^{m} e^{−|u′_j|}. (18)
Thus, parity node processor 1304 can be used to calculate the absolute value in accordance with equation (18) for a check node of degree m. Parity node processor 1304 can be implemented as a serial or parallel parity node processor as described above.
Output unit (OU) 1306 can be configured to simply connect the outputs of parity node processor 1304, i.e., {|λ′1|, |λ′2|, . . . , |λ′m|}, to the output ports {|λ1|, |λ2|, . . . , |λn|}. For example, suppose there are 8 inputs {|u1|, |u2|, . . . , |u8|} and DRU 1302 selects m=3 of them. The selection result depends on the specific data values of {|u1|, |u2|, . . . , |u8|}. Suppose that for some specific inputs, the selection result is {u′1=|u2|, u′2=|u8|, u′3=|u5|}; then OU 1306 should connect |λ′1|, |λ′2|, and |λ′3| to |λ2|, |λ8|, and |λ5|, respectively, and connect −ln A to |λ1|, |λ3|, |λ4|, |λ6|, and |λ7|.
For this to be feasible, OU 1306 should be configured to operate in coordination with DRU 1302. For example, if the k-th input of DRU 1302, i.e., |uk|, is selected by DRU 1302 as the j-th input of parity node processor 1304, i.e., u′j, then OU 1306 can be configured to correspondingly connect the j-th output of parity node processor 1304 to |λk|.
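The DRU/OU coordination can be sketched as follows (a Python illustration with hypothetical names; it assumes the DRU keeps the m inputs with the smallest magnitudes, whose terms e^{−|u_j|} dominate the aggregate sum A, and applies the min(·, 1) limit of equation (18) throughout):

    import math

    def reduced_degree_update(u_abs, m):
        """u_abs: input magnitudes |u_1|..|u_n|; m: reduced check node degree."""
        # DRU: select the m inputs with the smallest magnitudes.
        order = sorted(range(len(u_abs)), key=lambda k: u_abs[k])
        selected = order[:m]
        A = sum(math.exp(-u_abs[k]) for k in selected)   # aggregate over m terms
        # OU: unselected ports receive -ln A (limited to 1); selected ports
        # receive the backed-out magnitudes routed from the reduced processor.
        out = [-math.log(min(A, 1.0))] * len(u_abs)
        for k in selected:
            out[k] = -math.log(min(A - math.exp(-u_abs[k]), 1.0))
        return out

    print(reduced_degree_update([2.0, 0.9, 3.1, 1.4, 2.7, 0.5, 4.0, 1.1], m=3))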
It should be noted that while a parallel implementation of DRU 1302 can be paired with a parallel implementation of parity node processor 1304, and a serial implementation of DRU 1302 can be paired with a serial implementation of parity node processor 1304, such pairing is not required. In other words, a parallel implementation of DRU 1302 can be paired with a serial implementation of parity node processor 1304, and vice versa. Moreover, it may be better, depending on the requirements of a particular implementation, to forgo the inclusion of DRU 1302 and OU 1306. For example, if decoding speed is of the most concern, then a combination of a parallel DRU 1302 and a parallel parity node processor 1304 can be the best choice. On the other hand, if hardware size and resources are the most important issues, then a serial parity node processor 1304 without any DRU 1302 or OU 1306 can be preferred. If the LDPC decoder is implemented, e.g., with a Digital Signal Processor (DSP), as in Software Defined Radio (SDR) terminals, a serial DRU 1302 and a serial parity node processor can be preferred because this combination provides the least decoding delay.
Table 2 illustrates the LDPC complexity comparison with the degree reduction described above.
While certain embodiments of the inventions have been described above, it will be understood that the embodiments described are by way of example only. Accordingly, the inventions should not be limited based on the described embodiments. Rather, the scope of the inventions described herein should only be limited in light of the claims that follow when taken in conjunction with the above description and accompanying drawings.
This application claims priority under 35 U.S.C. §119(e) to U.S. Provisional Patent Application Ser. No. 60/820,729, filed Jul. 28, 2006, entitled “Reduced-Complexity Algorithm for Decoding LDPC Codes,” which is incorporated herein by reference in its entirety as if set forth in full.