Features, aspects, and embodiments of the inventions are described in conjunction with the attached drawings, in which:
In the descriptions that follow, certain example parameters, values, etc., are used; however, it will be understood that the embodiments described herein are not necessarily limited by these examples. Accordingly, these examples should not be seen as limiting the embodiments in any way. Further, the embodiments of an LDPC decoder described herein can be applied to many different types of systems implementing a variety of protocols and communication techniques. Accordingly, the embodiments should not be seen as limited to a specific type of system, architecture, protocol, air interface, etc. unless specified.
A check node processor 302 of degree n is shown in
With the standard sum-product algorithm, the outgoing message is determined as follows:
The outgoing soft messages are then fed back to the variable node processors for use in generating outputs ui during the next iteration; however, a soft message λi based on a variable node output from a particular node is not returned to that node. Thus, the j≠i constraint in the following term of (1):
This can also be illustrated with the aid of
The messages produced by parity node processor 202 can be defined using the following equations:
Thus parity node processor 202 can be configured to implement the above equations (2). The soft messages produced by the parity nodes, e.g., parity node 202, are then fed back to variable nodes 208, 210, 212, 214, 216, and 218, for use in the next iteration.
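The parity node update described above can be sketched as follows. This is only an illustration: equation (1) is not reproduced in this text, so the sketch assumes the textbook tanh form of the sum-product rule, and the function name `spa_check_node_update` is hypothetical. It shows the j≠i exclusion, i.e., the message returned to node i is built only from the other inputs.

```python
import math

def spa_check_node_update(u):
    """Return one outgoing soft message per input of a single check node.

    u: incoming soft messages from the connected variable nodes (degree >= 2).
    Assumption: the sum-product rule is taken in its textbook tanh form.
    """
    limit = 1.0 - 1e-12  # keeps atanh finite for very large inputs
    out = []
    for i in range(len(u)):
        prod = 1.0
        for j, uj in enumerate(u):
            if j != i:  # extrinsic rule: skip the message that came from node i
                prod *= math.tanh(uj / 2.0)
        prod = max(min(prod, limit), -limit)
        out.append(2.0 * math.atanh(prod))
    return out

print(spa_check_node_update([1.0, -2.0, 3.0]))
```

For a check node of degree 2, each outgoing message is simply the other input, which is a quick sanity check on the exclusion logic.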
For example,
Variable node processor 208 can be configured to implement the following equation:
$u_0^k = u_{ch,0} + \lambda^k(0 \to 0) + \lambda^k(2 \to 0)$, (3)
where $u_{ch,0}$ is the message from the channel, which does not change with each iteration.
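As a minimal sketch of equation (3), the variable node output is the channel message plus the soft messages arriving from the connected parity nodes. The function name and the numeric values below are hypothetical and chosen only for illustration.

```python
def variable_node_output(u_ch, incoming):
    """Equation (3): channel message plus all incoming parity-node messages.

    u_ch: message from the channel (fixed across iterations).
    incoming: soft messages from the connected parity nodes in iteration k.
    """
    return u_ch + sum(incoming)

# Hypothetical values: one channel message and two parity-node messages.
u0 = variable_node_output(0.5, [1.25, -0.75])
print(u0)
```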
It will be understood that the decoder described above can be implemented using appropriately configured hardware and/or software. While separate parity check processors and variable node processors are described, these processors can be implemented by a single processor, such as a digital signal processor, or by a single circuit, such as an Application Specific Integrated Circuit (ASIC). However, as mentioned above, implementation of an LDPC processor such as that described with respect to
As noted above, the sum-product algorithm of equation (1) can be prohibitive in terms of practical and cost effective implementation. Approximations have been proposed with the aim of reducing this complexity. For example, it can be shown that (4) is equivalent to (1):
$\lambda_i = u_1 \oplus u_2 \oplus \cdots \oplus u_n$, (4)
where the operator $\oplus$ is defined as:
$x \oplus y = \ln\dfrac{1 + e^{x+y}}{e^{x} + e^{y}}$. (5)
Using the approximation formula:
$e^{x} + e^{y} \approx \max(e^{x}, e^{y}) = e^{\max(x,y)}$. (6)
Or equivalently,
$\ln(e^{x} + e^{y}) \approx \max(x, y)$ (7)
in both numerator and denominator of (5), then the following can be obtained:
Repeatedly substituting (8) into (4), the min-sum algorithm (MSA) can be obtained as follows:
It will be apparent that equation (9) is much simpler to implement than (1) or (4), but the cost of this simplification is a significant performance penalty, generally about 0.3 to 0.4 dB, depending on the specific code structure and code rate. Several modifications have been proposed to reduce this loss. The performance loss of MSA comes from the approximation error of (9) relative to (1), so reducing the loss requires reducing the approximation error. It can be shown that (9) is always larger than (1) in magnitude. Thus, normalized-MSA and offset-MSA use scaling or offsetting to force the magnitude to be smaller.
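The magnitude relationship between the min-sum result and the exact result can be checked numerically with a short sketch. It assumes that the pairwise operator of (5) has the standard form ln((1 + e^(x+y)) / (e^x + e^y)) and that the min-sum counterpart is the sign product times the smaller magnitude; the function names are hypothetical, and the exact version is written for moderate input values only.

```python
import math
from functools import reduce

def boxplus(x, y):
    """Exact pairwise operator, assumed form ln((1 + e^(x+y)) / (e^x + e^y))."""
    return math.log1p(math.exp(x + y)) - math.log(math.exp(x) + math.exp(y))

def boxplus_minsum(x, y):
    """Min-sum approximation: product of signs times the smaller magnitude."""
    return math.copysign(1.0, x) * math.copysign(1.0, y) * min(abs(x), abs(y))

def check_message(values, op):
    """Combine several inputs by applying the pairwise operator repeatedly, as in (4)."""
    return reduce(op, values)

inputs = [1.0, -2.0, 3.0]
print(check_message(inputs, boxplus), check_message(inputs, boxplus_minsum))
```

In this sketch the min-sum value keeps the sign of the exact value but overestimates its magnitude, which is the error that scaling or offsetting is meant to shrink.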
With the normalized min-sum algorithm, (9) is scaled by a factor α:
The offset min-sum algorithm reduces the magnitude by a positive constant β:
But these approaches again increase the complexity. Thus, as mentioned above, there is a constant trade-off between complexity and performance.
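The two corrections can be sketched as below. The exact forms of the normalized and offset equations are not reproduced in this text, so the sketch assumes the commonly used forms: multiplying the min-sum result by α, and subtracting β from its magnitude with a floor at zero. The default values of `alpha` and `beta` are hypothetical.

```python
def minsum_extrinsic(u):
    """Plain min-sum: sign product and minimum magnitude over all j != i."""
    out = []
    for i in range(len(u)):
        others = u[:i] + u[i + 1:]
        sign = 1.0
        for v in others:
            if v < 0:
                sign = -sign
        out.append(sign * min(abs(v) for v in others))
    return out

def normalized_minsum(u, alpha=0.8):
    """Scale every min-sum message by alpha (assumed 0 < alpha < 1)."""
    return [alpha * lam for lam in minsum_extrinsic(u)]

def offset_minsum(u, beta=0.5):
    """Reduce every min-sum magnitude by beta, never going below zero."""
    out = []
    for lam in minsum_extrinsic(u):
        mag = max(abs(lam) - beta, 0.0)
        out.append(mag if lam >= 0 else -mag)
    return out

print(normalized_minsum([2.0, -1.0, 4.0]), offset_minsum([2.0, -1.0, 4.0]))
```

Both variants add one multiplication or one subtract-and-compare per outgoing message on top of the plain min-sum work.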
The embodiments described below use a new approach for the check node update in the decoding of LDPC codes. The approach is based on a new approximation of the SPA that can reduce the approximation error of the MSA and has almost the same performance as the SPA under both floating-point and fixed-point operation. In addition, the new approximation can be implemented in simple structures, the complexity of which is on par with MSA implementations.
The approximation error of MSA comes from the approximation error of equation (7). Note that equation (7) is coarse when x and y are close. MSA uses equation (7) in both the numerator and the denominator of equation (5). If the values of |x| and |y| are close, then either the numerator or the denominator can introduce a large approximation error. Thus, to improve the accuracy of the outgoing message, equation (7) can be applied only to whichever of the numerator or denominator of (5) will produce a small approximation error.
For example, when both x and y have the same sign, then using the approximation $1 + e^{x+y} \approx \max(e^{0}, e^{x+y})$ in the numerator will produce better results than using $e^{x} + e^{y} \approx e^{\max(x,y)}$ in the denominator. Similarly, when x and y have opposite signs, then only approximating the denominator of (5) using $e^{x} + e^{y} \approx e^{\max(x,y)}$ can produce better results. Thus, a better approximation of (5), for x, y>0, can be generated using the following:
For all combinations of the signs of x and y, the following general expression can be used:
$x \oplus y \approx -\mathrm{sgn}(x)\,\mathrm{sgn}(y)\,\ln\left(e^{-|x|} + e^{-|y|}\right)$ (13)
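A small numerical comparison can make the benefit of (13) concrete. The sketch below evaluates the exact operator (assumed to have the standard form ln((1 + e^(x+y)) / (e^x + e^y))), the min-sum value, and the approximation of (13) for a few hypothetical input pairs; all function names are illustrative.

```python
import math

def sgn(v):
    return -1.0 if v < 0 else 1.0

def boxplus_exact(x, y):
    # Assumed standard form of the exact operator; suitable for moderate |x|, |y|.
    return math.log1p(math.exp(x + y)) - math.log(math.exp(x) + math.exp(y))

def boxplus_minsum(x, y):
    # Min-sum approximation: sign product times the smaller magnitude.
    return sgn(x) * sgn(y) * min(abs(x), abs(y))

def boxplus_proposed(x, y):
    # Equation (13): -sgn(x) sgn(y) ln(e^-|x| + e^-|y|).
    return -sgn(x) * sgn(y) * math.log(math.exp(-abs(x)) + math.exp(-abs(y)))

for x, y in [(1.0, 1.2), (3.0, 3.5), (-2.0, 2.5)]:
    exact = boxplus_exact(x, y)
    print(x, y, abs(boxplus_minsum(x, y) - exact), abs(boxplus_proposed(x, y) - exact))
```

For these pairs the error of (13) is smaller than the min-sum error. When both magnitudes are small, the logarithm in (13) can become positive, which is the case handled by the min(·, 0) clamp described for the ln(·) operation.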
Iteratively substituting (13) into (4) produces:
Note that (14) only holds when
Now, let
then (15) can be expressed as:
The sign of (16) can be realized in the same way as in an MSA implementation, e.g., with a binary exclusive-OR (XOR) logic circuit. The kernel of the approximation is invertible, which allows the aggregate soft message to be computed first, followed by backing out each input's own (intrinsic) contribution to produce the extrinsic updates.
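The aggregate-then-back-out idea can be sketched as follows. Equations (14) to (16) are not reproduced in this text, so the sketch assumes the form that results from iterating (13): the outgoing magnitude is −ln of the sum of e^(−|u_j|) over j≠i, with the logarithm clamped at zero, and the sign is the product of the other inputs' signs. The function name and the `tiny` guard are hypothetical.

```python
import math

def check_node_update(u, tiny=1e-300):
    """Extrinsic messages via one aggregate sum and a per-output back-out.

    Assumed form: |lambda_i| = -min(ln(sum_{j != i} e^-|u_j|), 0),
    sign(lambda_i) = product of sgn(u_j) over j != i.
    """
    weights = [math.exp(-abs(v)) for v in u]
    total = sum(weights)                      # aggregate over all inputs
    negatives = sum(1 for v in u if v < 0)    # sign parity, as with XOR logic
    out = []
    for v, w in zip(u, weights):
        extrinsic = max(total - w, tiny)      # back out this input's own term
        magnitude = -min(math.log(extrinsic), 0.0)
        others_negative = negatives - (1 if v < 0 else 0)
        sign = -1.0 if others_negative % 2 else 1.0
        out.append(sign * magnitude)
    return out

print(check_node_update([1.0, -2.0, 3.0]))
```

The design point is that the sum over all inputs is formed once, so each outgoing message costs one subtraction and one logarithm instead of a fresh sum over the other inputs.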
The amplitude of equation (16) can be realized with a serial structure or a parallel structure shown in
Both structures 600 and 700 have the same computation load. Serial structure 600 requires smaller hardware size, but needs 2n clock cycles to get all outgoing soft messages. Parallel structure 700 requires only 1 clock cycle, but needs larger hardware size than serial structure 600. Parallel structure 700 is attractive when the decoding speed is the primary concern. It will be understood that the exponential and logarithm operations in
The ln(·) operation can include the min(·, 0) operation, which can be implemented by simply using the sign bit of the logarithm result to clear the output. In particular, if the logarithm is realized with a look-up table, this can be done by simply setting the content of the table to 0 for all inputs greater than 1, or by limiting the range of the address used to pick up the table content.
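One way to picture the look-up table variant is the sketch below, which uses the address-limiting option: the table only covers inputs up to 1, and larger addresses saturate to the last entry, whose content is ln(1) = 0. The quantization step (`frac_bits`), the floor value stored at address 0, and the function names are all hypothetical choices, not values taken from this description.

```python
import math

def build_clamped_log_table(frac_bits=4, floor=-8.0):
    """Table of ln(a / 2**frac_bits) for addresses a = 0 .. 2**frac_bits.

    Address 0 would be ln(0), so a hypothetical floor value is stored instead.
    """
    one = 1 << frac_bits                  # address that represents the input 1.0
    table = [floor]
    for a in range(1, one + 1):
        table.append(math.log(a / one))   # non-positive for every a <= one
    return table

def clamped_log(address, table):
    """Return min(ln(x), 0) by limiting the address range to the table size."""
    return table[min(address, len(table) - 1)]

table = build_clamped_log_table()
print(clamped_log(8, table), clamped_log(40, table))
```

Limiting the address also keeps the table small, since no entries are spent on inputs whose output would be cleared anyway.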
The implementations of
The computation complexity of the proposed implementations is similar to that of an MSA implementation. Table 1 is the comparison of the computation load for parity node processing for various decoding algorithms, where it has been assumed that SPA, MSA, normalized-MSA and offset-MSA are implemented in a known forward-backward manner.
In the simulations, the variable node updates are integer summations with results ranging from −128 to +128. The exponential operation, e.g., in
It can be seen from the graphs of
Moreover, although it can be challenging to meet the dynamic range requirements for the exp(·) operation, the simulation results show that the fixed-point operation has hardly any performance loss relative to the floating-point operation. Note that the number of quantization bits can be greatly reduced with non-uniform quantization, at the cost of increased complexity. With non-uniform quantization, the size of the logarithm and exponential tables can be reduced, but these quantized values should first be mapped to the linearly quantized values before the operation of summation in
Accordingly, using the systems and methods described above, the resources, i.e., complexity, required to implement a parity node can be reduced, while still maintaining a high degree of precision. In certain embodiments, the complexity can be reduced even further through degree reduction techniques. In other words, the number of inputs to the parity node can be reduced, which can reduce the resources required to implement the parity node. It should also be noted that in many parity node implementations, the sign and the absolute value of the outgoing soft message are calculated separately.
The outputs of DRU 1302 can then be provided to parity node processor 1304. Parity node processor 1304 can be implemented using either the serial configuration of
Similarly, depending on the embodiment, DRU 1302 can be implemented in parallel or serial structures.
An example implementation for the comparators of
In the example of
Parity node processor 1304 can be configured to calculate the absolute value of outgoing messages with equation (16), i.e., the second term of equation (16). In other words, the sign and absolute value for equation (16) can be determined separately using the following:
Thus, parity node processor 1304 can be used to calculate the absolute value in accordance with equation (18) for a check node of degree m. Parity node processor 1304 can be implemented as a serial or parallel parity node processor as described above.
Output unit (OU) 1306 can be configured to simply connect the outputs of parity node processor 1304, i.e., {|λ′1|, |λ′2|, . . . |λ′m|}, to the output ports {|λ1|, |λ2|, . . . |λn|}. For example, suppose there are 8 inputs {|u1|, |u2|, . . . |u8|} and DRU 1302 selects m=3 of them. The selection result depends on the specific data values of {|u1|, |u2|, . . . |u8|}. Suppose that for some specific inputs, the selection result is {u′1=|u2|, u′2=|u8|, u′3=|u5|}; then OU 1306 should connect |λ′1|, |λ′2| and |λ′3| to |λ2|, |λ8| and |λ5|, respectively, and connect −ln A to |λ1|, |λ3|, |λ4|, |λ6|, |λ7|.
For this to be feasible, OU 1306 should be configured to operate in coordination with DRU 1302. For example, if the k-th input of DRU 1302, i.e., |uk|, is selected by DRU 1302 as the j-th input of parity node processor 1304, i.e., u′j, then OU 1306 can be configured to correspondingly connect the j-th output of parity node processor 1304 to |λk|.
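The coordination between DRU 1302 and OU 1306 amounts to an index mapping, which the following sketch illustrates using the 8-input, m=3 example. The function name `route_outputs` and the numeric values are hypothetical; `default` stands in for the shared value (−ln A in the example) that is connected to every output whose input was not selected.

```python
def route_outputs(n, selected, processor_outputs, default):
    """Connect the j-th processor output to the output port of the selected input.

    n: number of output ports (check node degree).
    selected: 1-based input indices k chosen by the DRU, in the order u'1, u'2, ...
    processor_outputs: magnitudes produced by the parity node processor.
    default: value routed to every port whose input was not selected.
    """
    out = [default] * n
    for j, k in enumerate(selected):
        out[k - 1] = processor_outputs[j]
    return out

# Selection from the example: u'1=|u2|, u'2=|u8|, u'3=|u5|.
ports = route_outputs(8, [2, 8, 5], [0.9, 1.7, 1.1], default=0.4)
print(ports)
```

Because the mapping depends only on the selection order, the OU needs no information from the DRU other than the list of selected indices.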
It should be noted that while a parallel implementation of DRU 1302 can be paired with a parallel implementation of parity node processor 1304, and a serial implementation of DRU 1302 can be paired with a serial implementation of parity node processor 1304, such is not required. In other words, a parallel implementation of DRU 1302 can be paired with a serial implementation of parity node processor 1304 and vice versa. Moreover, it may be better, depending on the requirements of a particular implementation, to forgo the inclusion of DRU 1302 and OU 1306. For example, if decoding speed is of the most concern, then a combination of a parallel DRU 1302 and a parallel parity node processor 1304 can be the best choice. On the other hand, if hardware size and resources are the most important issue, then a serial parity node processor 1304 without any DRU 1302 or OU 1306 can be preferred. If the LDPC decoder is implemented, e.g., with a Digital Signal Processor (DSP), as in Software Defined Radio (SDR) terminals, a serial DRU 1302 and a serial parity node processor 1304 can be preferred because this combination provides the least decoding delay.
Table 2 illustrates the LDPC complexity comparison with the degree reduction of
While certain embodiments of the inventions have been described above, it will be understood that the embodiments described are by way of example only. Accordingly, the inventions should not be limited based on the described embodiments. Rather, the scope of the inventions described herein should only be limited in light of the claims that follow when taken in conjunction with the above description and accompanying drawings.
This application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application Ser. No. 60/820,729, filed Jul. 28, 2006, entitled “Reduced-Complexity Algorithm for Decoding LDPC Codes,” which is incorporated herein by reference in its entirety as if set forth in full.
Number | Date | Country | |
---|---|---|---|
60820729 | Jul 2006 | US |