The invention will now be described, by way of example only, with reference to the enclosed views, wherein:
By way of introduction to a detailed description of preferred embodiments of the arrangement described herein, some of the theoretical principles underlying such an arrangement will now be briefly discussed by way of direct comparison with the related art described in the foregoing.
As a first point, the MIN-SUM (MS) approximation will be shown to be a straightforward simplification of the check-node computation.
In fact:
The reliability of the messages coming out of a check-node update can be expected to be dominated by the least reliable incoming message. The MS outputs are, in modulus, slightly larger than those output by a non-approximated check-node processor, which results in a significant error-rate degradation.
For this reason, Chen et al. (already cited in the foregoing) have proposed to resort to Normalized-MS (N-MS) to partially compensate for these losses: N-MS typically consists of a simple multiplication of the output messages by a scaling factor. The factor can be optimized through simulations or, in a more sophisticated way, with density evolution as disclosed by Chen et al.
This approach recovers most of the performance gap caused by MS and makes MS a valid alternative to a full processing approach. An almost equivalent alternative to the N-MS is the Offset-MIN-SUM (O-MS), again disclosed by Chen et al., that performs slightly worse than N-MS.
A MS decoder does not require knowledge of the noise variance, which is of great interest when the noise variance is unknown or hard to determine. More sophisticated approximations are able to perform nearly as well as a full-precision approach, but generally require a data-dependent correction term that makes the check-node processor more complex. This specific issue has been investigated in the art (see, e.g., Zarkeshvari, F.; Banihashemi, A. H.: "On implementation of min-sum algorithm for decoding low-density parity-check (LDPC) codes", GLOBECOM '02, IEEE, Vol. 2, 17-21 November 2002, pp. 1349-1353).
Parallel or partially parallel architectures employ a multiplicity of check-node processors; for this reason, any simplification of this computation kernel is of particular interest. When MS is adopted, the same modulus is shared by all messages outgoing from a check-node update processor, its value being equal to the smallest modulus among the incoming messages. The only exception is the outgoing message corresponding to the bit whose incoming message has the smallest modulus: the modulus of that outgoing message is equal to the second smallest among the incoming messages.
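By way of illustration only (not part of the claimed embodiments), the check-node computation just described may be sketched as follows, with all function and variable names purely hypothetical:

```python
def ms_check_node_update(incoming):
    """MIN-SUM check-node update: each outgoing modulus equals the smallest
    incoming modulus, except toward the least reliable bit, which receives
    the second smallest. Signs follow the usual parity (sign-product) rule."""
    mods = [abs(q) for q in incoming]
    min1 = min(mods)
    pos = mods.index(min1)  # position of the least reliable incoming message
    min2 = min(m for i, m in enumerate(mods) if i != pos)
    sign_prod = 1
    for q in incoming:
        sign_prod *= -1 if q < 0 else 1
    out = []
    for i, q in enumerate(incoming):
        mag = min2 if i == pos else min1
        s = sign_prod * (-1 if q < 0 else 1)  # exclude the bit's own sign
        out.append(s * mag)
    return out
```

Note that the whole set of outgoing messages is fully determined by only two moduli, one position, and the signs, which is the property exploited in the following.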
Hence, the minimum check-to-bit information to be stored is much less in comparison with the approaches described so far. For that reason, a Normalized MS approximation with a memory-efficient approach is proposed here in conjunction with layered decoding (L-SPA), so that the faster convergence given by the scheduling modification compensates for the MS performance degradation. While a more detailed analysis of the storage requirements will be provided in the following, with a detailed comparison with the other cases, it will be noted that, by adopting the approach described herein, it suffices to store (i) two moduli; (ii) the signs of all the outgoing messages; and (iii) the position of the least reliable message. The new approach is capable of outperforming conventional SPA with the same number of iterations, while requiring about 70% less memory. The approach considered here (which may be designated Layered-Normalized-MIN-SUM, i.e., L-N-MS), which applies a memory-efficient normalized MIN-SUM approach to a layered decoding schedule, is schematically represented below.
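Purely by way of example, the compressed check-to-bit state and its on-demand expansion may be sketched as follows (names are hypothetical, not part of the claimed embodiments):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CompressedRow:
    """Compressed check-to-bit state for one check equation: two moduli,
    the index of the least reliable edge, and one sign per message."""
    min1: float        # smallest incoming modulus (Ri1)
    min2: float        # second smallest incoming modulus (Ri2)
    pos: int           # index of the least reliable message, M(i)
    signs: List[int]   # +1/-1 sign per outgoing message (Smi)

def expand(row: CompressedRow, j: int) -> float:
    """Reconstruct the check-to-bit message toward edge j on demand,
    avoiding storage of one full message per edge."""
    mag = row.min2 if j == row.pos else row.min1
    return row.signs[j] * mag
```

This is the source of the memory saving: a full message per edge is replaced by two moduli, one position, and the sign bits.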
where Ri1 and Ri2 are the smallest and second smallest check-to-bit message moduli, M(i) is the least reliable bit in equation i, Smi are the signs of the outgoing messages, and α is the scaling factor of N-MS.
Performance of the L-N-MS approach proposed herein can be compared with the performance achievable with: layered decoding and pure MS (i.e., without normalization factor) (L-MS); the layered decoding algorithm (L-SPA); and conventional SPA.
For instance, a meaningful comparison can be performed at 25 iterations. As a first example, a structured LDPCC code designed by the team of Prof. Wesel (University of California, Los Angeles) has been used for the comparison. The code is designed with the same graph conditioning adopted in Vila Casado, A. I.; Weng, W.; Wesel, R. D.: "Multiple Rate Low-Density Parity-Check Codes with Constant Block Length", Asilomar Conf. on Signals, Systems and Computers, Pacific Grove, Calif., 2004. The code is 1944 bits long with rate ⅔. It is designed with a combination of 8×24=192 cyclically shifted identity matrices and null matrices of size 81×81. The number of edges is equal to 7613, with maximum variable degree equal to 8 and maximum check degree equal to 13. The parity part is organized as described in
The upper right matrix D is defined (parity section only) by Eq 15 below for a rate ⅔ code structure.
The results show that L-N-MS performs slightly better than conventional SPA, while requiring much simpler check-node processing and a dramatically smaller amount of memory. The gap between L-SPA and L-MS is mostly recovered by means of the normalization factor. The normalization factor α has been optimized through simulations focusing on a Frame Error Rate (FER) equal to 10⁻², with the resulting value equal to 1.35.
As a second example, a high-rate structured LDPCC code of similar size has been selected among those proposed in Eleftheriou, E.; Ölcer, S.: "Low density parity-check codes for digital subscriber lines", in Proc. ICC'2002, New York, N.Y., pp. 1752-1757. The code has a linear encoding complexity and supports layered decoding. It is 2209 bits long and has rate 0.9149. In this case, L-N-MS performs even slightly better than L-SPA. An explanation may be found in the code structure, which may have more short cycles compared to the previous example, so that SPA becomes less efficient. The normalization factor α was equal to 1.3.
Fixed-point implementation of N-MS would require a multiplication by a factor with a high accuracy in the quantization level and a significant complexity due to the operator itself. However, it is possible to simplify the normalization procedure at the cost of negligible performance loss.
The normalization can be implemented very efficiently with the following approach:
Q/α ≅ Q−(Q>>s)  Eq 16
where the operator (x>>y) represents a y-bit right shift of message x. For both examples, s has been chosen equal to 2, which corresponds to α=1.333.
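By way of illustration, Eq 16 may be sketched as follows (the sign is handled explicitly so as to mirror a sign-magnitude hardware representation; the function name is hypothetical):

```python
def normalize_shift(q: int, s: int = 2) -> int:
    """Approximate Q/alpha as Q - (Q >> s); with s=2 this divides the
    modulus by 4/3 (alpha = 1.333), using only a shift and a subtraction."""
    sign = -1 if q < 0 else 1
    mag = abs(q)
    return sign * (mag - (mag >> s))
```

For instance, a message of modulus 16 is scaled to 12, i.e., 16/(4/3), with no multiplier in the datapath.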
One may define a uniform quantization scheme (Nb, p), where Nb is the number of bits (including sign) and p is the number of bits dedicated to the fractional part (i.e., the quantization interval is 2⁻ᵖ). The adopted quantization schemes are the best for a given number of bits Nb. For the rate ⅔ code, not even 8 bits are sufficient to perform close to floating-point precision. However, if the same quantization scheme is applied to decode a similar rate ⅔ code with size 648 bits, it turns out that L-N-MS with (8,4) performs better than floating-point SPA at 12 iterations.
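A minimal sketch of such a uniform (Nb, p) quantizer is given below, purely by way of example (the saturation policy shown, clipping at the largest representable magnitude, is an assumption, not a feature recited in the foregoing):

```python
def quantize(x: float, nb: int = 8, p: int = 4) -> float:
    """Uniform (Nb, p) quantizer: nb bits including sign, step 2**-p,
    saturating at the largest representable magnitude."""
    step = 2.0 ** -p
    max_level = 2 ** (nb - 1) - 1        # largest integer code
    level = round(x / step)
    level = max(-max_level, min(max_level, level))
    return level * step
```

With the (8,4) scheme, the quantization interval is 1/16 and the representable range is approximately ±7.94.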
This result is consistent with the results reported in Zarkeshvari et al. (already cited), where it has been noted that the MS approximation works pretty well with short codes and quantized messages. For the higher rate code even 6 bits were found to lead to negligible losses.
The N-MS approach allows a significant reduction of the memory needed to store the check-to-bit messages Rij. In fact, the amount of memory turns out to be: (i) 2*nc*(Nb−1) bits for the moduli of the two least reliable check-to-bit messages of each check (where nc is the number of checks); (ii) E bits for the signs of all check-to-bit messages (where E is the number of edges); (iii) nc*ceil(log2(dc)) bits for the position of the least reliable message in each check, where dc is the (maximum) check-node degree and ceil denotes the ceiling operator.
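The three contributions above can be sketched as follows. The worked numbers assume the first example code (1944 bits, rate ⅔, hence nc = 648 checks under a full-rank parity matrix, E = 7613 edges, dc = 13, Nb = 8); this accounting covers the R-message storage only, not the totals of Table 2:

```python
import math

def r_memory_bits(nc: int, e: int, dc: int, nb: int) -> int:
    """Check-to-bit storage for the memory-efficient MS scheme:
    two moduli per check, one sign per edge, one minimum position per check."""
    moduli = 2 * nc * (nb - 1)            # (i)  two (Nb-1)-bit moduli per check
    signs = e                             # (ii) one sign bit per edge
    positions = nc * math.ceil(math.log2(dc))  # (iii) position of the minimum
    return moduli + signs + positions
```

For the assumed parameters this yields 19277 bits, against e*nb = 60904 bits if one full Nb-bit message were stored per edge.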
Table 2 below summarizes the results of comparison of the memory requirements for the approaches presented so far. Specifically, Table 2 refers to the memory needed to store the messages Rij and Qij and reports the results of comparison between conventional check-node and memory efficient MS approximation applied to different decoding algorithms.
The results in terms of memory requirements for the simulated codes indicate that the L-N-MS approach proposed herein requires 70% and 76% less memory than the conventional implementations of the SPA algorithm for the rate ⅔ code and the rate 0.9149 code, respectively. At the cost of some minor performance losses, memory requirements can be reduced by 24%, 42% and 50% when the memory-efficient MS solution is applied to SPA, M-SPA, and L-SPA, respectively, for the rate ⅔ code considered. For the rate 0.9149 code, the reduction amounts to 24%, 51% and 61%.
A “memory efficient” MS entails some significant, potential advantages that relate to the implementation of high-speed parallel decoders.
A first advantage lies in that a check-node requires far fewer input/output bits, so that routing problems can be scaled down compared to a conventional approach. Secondly, in vectorized decoders explicitly dedicated to structured LDPCC (see Novichkov et al. and WO-A-02/103631, both already cited), memory paging is designed so that all messages belonging to the same non-null sub-block in the parity check matrix are stored in the same memory word. A switch-bar is then adopted to cyclically rotate the message after/before the R/W operation. The approach discussed herein provides for the possibility of implementing switch-bars for A only.
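The cyclic rotation performed by such a switch-bar may be sketched as follows, purely by way of example (the function name and word representation are hypothetical):

```python
def rotate(word, shift):
    """Cyclically rotate a memory word of messages, as a switch-bar would
    before a write or after a read on the message memory."""
    n = len(word)
    shift %= n  # a shift equal to the word size is the identity
    return word[shift:] + word[:shift]
```

Since only the messages of block A need to traverse such a switch-bar in the approach described herein, the compressed R-state bypasses the rotation network entirely.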
With reference to the general layout of
Referring to
The decoder 20 herein is assumed (just by way of example, with no intended limitation of the scope of the invention) to operate with “parallelism 3”, i.e., a structured LDPCC with subblock size equal to 3 is assumed. The basic layout of the arrangement implemented in the decoder of
where Ri1 and Ri2 are the smallest and second smallest check-to-bit message moduli, M(i) is the least reliable bit in equation i, Smi are the signs of the outgoing messages, and α is the scaling factor of N-MS.
The memory block designated A stores the messages Λj; each word contains the values belonging to three consecutive bit nodes.
The memory block designated S stores the signs Sij; three signs belonging to three consecutive messages [S3i,3j S3i+1,3j+1 S3i+2,3j+2] are arranged together to form a memory word.
The memory block designated R contains, for each check, three messages related to i) the value of the minimum, ii) the value of the second minimum and iii) the position of the minimum.
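Purely as a sketch (not the memory map of the embodiments), the packing of the R-state of the three checks processed in parallel into a single word may be illustrated as follows:

```python
def pack_r_word(rows):
    """Pack the compressed R-state (min, second min, min position) of the
    checks processed in parallel (here, three) into one memory word, so a
    single read serves all check-node processors simultaneously."""
    word = []
    for min1, min2, pos in rows:
        word.extend([min1, min2, pos])
    return word
```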
The messages are arranged together in such a way that all the messages related to the check equations that must be run in parallel (a super-code) can be read simultaneously; an example of memory word content is given below:
The input messages to the memory block A and the output messages therefrom are rotated back and forward according to the proper shift values.
In the embodiment shown herein, this function is performed via switch-bars 100, 102 arranged at the input and the output of the memory block A.
The messages coming out of the memory blocks A, S, and R are demultiplexed towards the proper blocks Q configured to perform the computation of the values Q̃ji. In the embodiment shown herein, the demultiplexing is performed via three demultiplexers 104, 106, and 108, each serving a respective one of three blocks Q. As illustrated, a bit-to-check module 120 comprises a plurality of bit-to-check generators Q.
The three blocks Q in turn feed a corresponding block CNP (Check Node Processor). The CNP blocks are configured to perform the following functions:
The output messages from the CNP blocks are then multiplexed via multiplexer blocks 110, 112, and 114 to be written back at the proper addresses in the memory blocks A, S, and R. As illustrated, a check node module 130 comprises a plurality of check node processors CNP.
The present invention is not limited to the embodiments described above. For instance, the foregoing detailed description has set forth various embodiments of the devices and/or processes via the use of block diagrams, schematics, and examples. Insofar as such block diagrams, schematics, and examples contain one or more functions and/or operations, it will be understood by those skilled in the art that each function and/or operation within such block diagrams, flowcharts, or examples can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or virtually any combination thereof. In one embodiment, the present subject matter may be implemented via ASICs. However, those skilled in the art will recognize that the embodiments disclosed herein, in whole or in part, can be equivalently implemented in standard integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computer systems), as one or more programs running on one or more controllers (e.g., microcontrollers), as one or more programs running on one or more processors (e.g., microprocessors), as firmware, or as virtually any combination thereof, and that designing the circuitry and/or writing the code for the software and/or firmware would be well within the skill of one of ordinary skill in the art in light of this disclosure.
All of the above U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification and/or listed in the Application Data Sheet, are incorporated herein by reference, in their entirety.
From the foregoing it will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the invention. Accordingly, the invention is not limited except as by the appended claims.
| Number | Date | Country |
|---|---|---|
| 60787063 | Mar 2006 | US |