Low-density parity-check (LDPC) codes are a type of error correction code. In some applications, LDPC codes are used to correct for errors that are introduced by a (e.g., noisy) communication channel or by (e.g., degrading) storage media. New techniques that improve the performance of LDPC systems would be desirable. For example, reducing the processing time would be desirable because the error corrected data is output sooner and/or fewer processing resources or less power is consumed.
Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
Various embodiments of a low-density parity-check (LDPC) decoding technique that checks for early convergence are described herein. As will be described in more detail below, a (e.g., layered min-sum) LDPC decoder performs an early termination check before a full iteration of decoding is performed (e.g., at some fractional, pre-calculated iteration count). When decoding data with relatively few errors, an LDPC decoder which performs the techniques described herein can output the error corrected data earlier than other types of LDPC decoders which reduces processing time and/or conserves power. The following figure describes one LDPC decoding embodiment with early convergence.
At 100, low-density parity-check (LDPC) encoded data with one or more errors is received. For example, the encoded data may have been received from a communication channel or read back from storage (e.g., after being stored for many years on degrading storage media) and errors were introduced into the LDPC encoded data.
At 102, information associated with an early convergence checkpoint is received. In some embodiments, the information received at step 102 is a signal (e.g., a collective monitored state signal based on monitored states) that indicates when the early convergence checkpoint has been reached or passed. In some embodiments, the information received at step 102 is a partial and/or fractional iteration where the fractional iteration is strictly greater than 0 and strictly less than 1 and the fractional iteration corresponds to the early convergence checkpoint. The LDPC decoder in such embodiments tracks the (fractional) iteration count and pauses decoding when the fractional count received at step 102 is reached.
At 104, the information associated with the early convergence checkpoint is used to perform (e.g., layered min-sum) LDPC decoding on the LDPC encoded data up to the early convergence checkpoint and generate a decoded codeword, wherein the early convergence checkpoint is prior to a first complete iteration of the LDPC decoding.
At 106, it is determined if the LDPC decoding is successful. For example, the decoded codeword generated at step 104 may or may not still include errors and decoding is not declared a success (in one example) unless the syndrome vector is all zeros or otherwise indicates all errors have been successfully removed from the decoded data.
If it is determined that decoding is successful at 106, the decoded codeword (e.g., generated at 104) is output at 110. For example, if there are only a few errors, then a fraction of a complete iteration may be sufficient to remove all of the errors. With other types of LDPC decoders, the decoder completes at least one complete iteration before checking if decoding is successful. With early convergence checking, the decoded data can be output sooner and power and processing resources may be conserved when possible (e.g., when there are only a few errors and a full decoding iteration is not required).
If it is determined that decoding is not successful at 106, the (e.g., layered min-sum) LDPC decoding continues at 108 (e.g., until successful or a timeout is reached). If the resumed LDPC decoding at step 108 is successful, the decoded codeword is output. In some cases, a timeout is reached and the resumed LDPC decoding is halted; an error is then declared or a different type of decoding (e.g., stronger but slower) is attempted.
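The control flow of steps 100 through 110 can be sketched in pseudocode-like Python. This is an illustrative sketch only: `decode_up_to`, `resume_decoding`, and `syndrome_is_zero` are hypothetical caller-supplied callables standing in for the decoder internals, not part of any real decoder API.

```python
def decode_with_early_convergence(noisy_data, checkpoint, decode_up_to,
                                  resume_decoding, syndrome_is_zero,
                                  max_iterations=20):
    """Run LDPC decoding, checking for convergence at a fractional
    iteration `checkpoint` before the first full iteration completes."""
    # Step 104: decode only up to the early convergence checkpoint.
    codeword = decode_up_to(noisy_data, checkpoint)
    # Step 106: success test (e.g., all-zero syndrome).
    if syndrome_is_zero(codeword):
        return codeword                      # Step 110: early output.
    # Step 108: otherwise resume normal iterative decoding.
    for iteration in range(1, max_iterations + 1):
        codeword = resume_decoding(codeword, iteration)
        if syndrome_is_zero(codeword):
            return codeword
    return None                              # Timeout: declare failure.
```

When the data has few errors, the early check at the fractional checkpoint returns the codeword before any full iteration runs; otherwise the cost of the extra check is one syndrome test.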
Before describing more detailed examples of the early decoding termination process shown in
A receiver (206) coupled to the communication channel (204) inputs received data and demodulates and/or extracts LDPC encoded data with errors from the received data. An LDPC decoder (208a) then decodes the received LDPC encoded data and outputs the error corrected data. In some embodiments, LDPC decoder (208a) performs an early decoding termination process (e.g.,
The communication channel (204) introduces noise and/or errors so that the data received by the receiver (206) contains noise and/or errors. Error correction encoding the data with an LDPC code prior to transmission may reduce the number of errors and/or indecipherable messages at the receiver.
To obtain the original data, the storage interface (220) reads back data (with errors) stored on the storage (222) and passes the LDPC encoded data (with errors) to the LDPC decoder (208b). The LDPC decoder (208b) decodes the LDPC encoded data and outputs the data (e.g., error corrected if needed). In some embodiments, LDPC decoder (208b) performs an early decoding termination process (e.g.,
Before describing more detailed examples of early decoding termination, it may be helpful to first describe LDPC decoding in more detail. First, an example of min-sum LDPC decoding is described. Then, a layered min-sum LDPC decoding example is described.
Quasi-cyclic low-density parity-check (QC-LDPC) codes are a special class of the LDPC codes with a structured parity check matrix
In this example, a message-passing (MP) LDPC decoder is shown. In general, the (layered) min-sum decoding controller (300) controls the message passing between the variable nodes (302a) and the check nodes (304a). Periodically, the controller (300) will check the syndrome vector (308); if the syndrome vector is all-zero, then all of the errors have been removed from the LDPC encoded data and the decoded codeword (310) is output. Message passing is an efficient way to achieve near-optimal decoding of LDPC codes. In message passing decoding, the variable node (VN) and check node (CN) update rules are as follows. For brevity and notational conciseness, examples described herein may simply use i (instead of vi) to denote a variable node and j (instead of cj) to denote a check node. A variable node i receives an input message Lich from the channel, typically the log-likelihood ratio (LLR) of the corresponding channel output, defined as follows:

Lich = log( P(ri | ci=0) / P(ri | ci=1) ) (1)

where ci∈{0, 1} is the code bit and ri is the corresponding received symbol.
An iterative message passing decoder alternates between two phases: a variable node to check node phase during which variable nodes send messages to check nodes along their adjacent edges and a check node to variable node phase during which check nodes send messages to their adjacent variable nodes. The message update rules (which are described in more detail below) are depicted schematically in
In the initialization step of the decoding process, variable node i forwards the same message to all of its neighboring check nodes N(i), namely the LLR Lich derived from the corresponding channel output. In the check node-to-variable node message update phase, check node j uses the incoming messages and the CN update rule to compute and forward, to variable node i∈N(j), a new check node to variable node message, Lj→i. Variable node i then processes its incoming messages according to the variable node update rule and forwards to each adjacent check node in N(i) an updated variable node to check node message, Li→j. After a pre-specified number of iterations, variable node i sums all of the incoming LLR messages to produce an estimate of the corresponding code bit. Note that all of the check node to variable node message updates can be done in parallel, as can all of the variable node to check node message updates. This enables efficient, high-speed software and hardware implementations of iterative message passing decoding.
Li→j and Lj→i are the messages sent from variable node i to check node j and from check node j to variable node i, respectively. N(i) is the set of check nodes directly connected to variable node i and N(j) is the set of variable nodes directly connected to check node j. Then, the message sent from variable node i to check node j in SPA decoding is given by:

Li→j = Lich + Σj′∈N(i)\{j} Lj′→i (2)

and the message from check node j to variable node i is computed as:

Lj→i = 2 tanh−1( Πi′∈N(j)\{i} tanh(Li′→j/2) ) (3)

Pi is the a posteriori probability (APP) message of variable node i:

Pi = Lich + Σj∈N(i) Lj→i (4)
The decoded word v is defined as the hard-decision of the APP messages
[v1, v2, v3, . . . , vn−1, vn] = hd([P1, P2, P3, . . . , Pn−1, Pn]) (5)
where hd(Pi) is 1 if Pi<0, or 0 otherwise. A decoding success is declared if the resulting syndrome vector (γ) is all zeros, such that:
γ = v·HT = 0 (6)

where HT denotes the transpose of the parity check matrix H.
During the decoding process, the decoded word (v) and the syndrome vector (γ) can be updated on the fly, such that:
γ←γ⊕(viold⊕vinew)·hi (7)
where vinew = hd(Pi) is the updated hard-decision i-th bit, and hi denotes the i-th column of the parity check matrix H (equivalently, the i-th row of HT).
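The hard-decision rule hd(·) and the on-the-fly syndrome update of Equation (7) can be illustrated with plain Python lists over GF(2). This is a sketch for clarity, not any particular hardware or decoder implementation; `update_syndrome` is a hypothetical helper name.

```python
def hd(app_message):
    """Hard decision: 1 if the APP message is negative, else 0."""
    return 1 if app_message < 0 else 0

def syndrome(v, H):
    """Full syndrome gamma = v * H^T over GF(2)."""
    return [sum(vi & hij for vi, hij in zip(v, row)) % 2 for row in H]

def update_syndrome(gamma, v, i, v_new, H):
    """Equation (7): when the hard decision of bit i changes, flip the
    syndrome bits in the i-th column of H instead of recomputing gamma."""
    if v[i] != v_new:                     # (v_old XOR v_new) == 1
        gamma = [g ^ row[i] for g, row in zip(gamma, H)]
        v = v[:i] + [v_new] + v[i + 1:]
    return gamma, v
```

The incremental update touches only one column of H per changed bit, which is why the syndrome can be maintained cheaply during decoding.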
A popular way to perform min-sum decoding is to have variable nodes take log-likelihood ratios of received information from the channel as an initial input message (i.e., Li→j=Lich) and employ the following equivalent check node update rule:
where 0<α≤1 and 0≤β<1 are the attenuation factor and the attenuation rounding parameter, respectively, which can be either pre-fixed or dynamically adjusted. These satisfy the requirement that:
α+β≥1 (9)
This ensures that a minimum check node to variable node message of 1 is not attenuated to zero.
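A tiny numeric check makes the role of Equation (9) concrete. The attenuation-and-floor operation of Equation (14) is applied to a minimum message magnitude of 1; the α and β values below are illustrative, not taken from the source.

```python
import math

def attenuate(message_magnitude, alpha, beta):
    """Attenuation-and-floor of Equation (14): floor(alpha * L + beta)."""
    return math.floor(alpha * message_magnitude + beta)
```

With α+β ≥ 1 (e.g., α=0.75, β=0.5), a magnitude of 1 survives as 1; with α+β < 1 (e.g., α=0.5, β=0.25), it would be attenuated to zero, which Equation (9) forbids.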
Channel LLR inputs may be conveniently scaled for min-sum decoding but precise information is needed for the original sum-product decoding. The following notations help to simplify the above calculation in the algorithmic procedure. Let:
si→j = hd(Li→j) (10)
be the binary sign representation, which converts to the actual sign value in terms of:
sign(s) = (−1)^s.
Let s(j) be the total sign of all variable node messages into check node j:
s(j) = ⊕i′∈N(j) si′→j. (11)
Let L1(j) and iL1(j) be the minimum (e.g., lowest) variable node message to check node j and its index, respectively:

L1(j) = mini′∈N(j) |Li′→j|, iL1(j) = argmini′∈N(j) |Li′→j| (12)
and let L2(j) be the second minimum (e.g., second lowest) variable node message to check node j:

L2(j) = mini′∈N(j)\{iL1(j)} |Li′→j| (13)
Let L̂1(j) and L̂2(j) be the attenuated minimum and second minimum variable node messages to check node j, that is:

L̂1(j) = ⌊α·L1(j)+β⌋, L̂2(j) = ⌊α·L2(j)+β⌋ (14)
and therefore, with the above notations, Equation (8) is conveniently re-expressed by:

Lj→i = (−1)^(s(j)⊕si→j) · L̂1(j) if i ≠ iL1(j), and Lj→i = (−1)^(s(j)⊕si→j) · L̂2(j) if i = iL1(j). (15)
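The two-minimum check node update built from s(j), L1(j), and L2(j) can be sketched in pure Python. The sketch below omits attenuation (α=1, β=0) for clarity and uses illustrative function and variable names; it is not a hardware description.

```python
def check_node_messages(incoming):
    """incoming: list of variable-to-check messages L_{i->j} for one check
    node j. Returns the list of check-to-variable messages L_{j->i}."""
    mags = [abs(m) for m in incoming]
    i_min = mags.index(min(mags))               # index i_L1(j)
    L1 = mags[i_min]                            # smallest magnitude
    L2 = min(mags[:i_min] + mags[i_min + 1:])   # second smallest magnitude
    signs = [1 if m < 0 else 0 for m in incoming]
    total_sign = 0
    for s in signs:
        total_sign ^= s                         # s(j): XOR of all signs
    out = []
    for i, s in enumerate(signs):
        mag = L2 if i == i_min else L1          # exclude own magnitude
        sign = (-1) ** (total_sign ^ s)         # exclude own sign
        out.append(sign * mag)
    return out
```

Storing only L1, L2, the index of L1, and the running sign is what makes min-sum check node hardware compact: no per-edge products or tanh evaluations are needed.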
A decoding success is declared if the all-zero syndrome results after the first full iteration.
When a QC-LDPC code with b×b circulants is in use, each circulant of b bits is updated independently and in parallel.
In hardware implementations of iterative message passing LDPC decoding, the decoding efficiency of min-sum decoding can be further improved using a layered approach. Layered min-sum decoding is based on a serial (e.g., sequential, ordered, etc.) update of check node messages. Instead of sending all messages from variable nodes to check nodes, and then all messages from check nodes to variable nodes, layered decoding goes over the check nodes in (some) sequential order such that, for each check node being updated, all messages are sent in and processed, and then sent out to neighboring variable nodes. Such scheduled and/or serial updates to the check nodes enable immediate propagation of the newly updated messages, unlike the flooded scheme where the updated messages can propagate only at the next iteration. To put it another way, the flooding approach is not amenable to stopping or pausing decoding between decoding iterations and outputting the decoded codeword at that time, whereas the scheduled and/or serial approach of layered decoding does permit or otherwise allow for this.
The layered min-sum decoding approach roughly doubles the convergence speed compared to the flooded min-sum decoding approach. Moreover, it provides a good trade-off between speed and memory. This is achieved by iterating over dynamic check node to variable node messages, denoted by Qi,
where the superscript (last) denotes the most recently updated piece of data. It is noted that in layered min-sum decoding, the variable node to check node messages updated at the most recent layers (all but the last of which are from the current iteration) are utilized to update the check node to variable node message Qi in the current layer. In contrast, in flooded decoding, updating a check node to variable node message Lj→i utilizes variable node to check node messages that were each generated during the previous iteration.
The APP message Pi at the layer j is calculated as:
Pi(j) = Qi(j) + Lj→inew, (17)

where Lj→inew is the newly generated check node to variable node message at layer j. The message Qi(j) is obtained by subtracting the check node to variable node message saved from the preceding iteration from the APP message:

Qi(j) = Pi(j) − Lj→iold, (18)
where Lj→iold was saved during the preceding iteration. The layered decoding can be applied to all types of iterative message passing decoding, including the sum-product algorithm (SPA) and min-sum decoding. A hardware amenable layered min-sum decoding process without early convergence is described in
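The per-layer message recycling of Equation (18) and the subsequent APP restoration can be sketched for a single variable bit. `layer_update_bit` is an illustrative helper name (not from the source), and the values passed in are assumed to come from the check node update of the current layer.

```python
def layer_update_bit(P_i, L_old, L_new):
    """Update the APP message of variable i as layer j is processed.
    Returns (Q_i, updated P_i)."""
    Q_i = P_i - L_old        # Equation (18): remove the stale message
                             # saved from the preceding iteration.
    P_i_next = Q_i + L_new   # Fold in the freshly computed message,
                             # making it visible to later layers at once.
    return Q_i, P_i_next
```

Because the updated P_i is immediately available to the next layer, new information propagates within an iteration rather than waiting for the next one, which is the source of the roughly two-fold convergence speedup described above.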
Line 9 (404) is the first opportunity for a (successfully) decoded codeword to be output. As shown in line 9, the decoding process must be beyond the first iteration. This means that the first opportunity for the decoded codeword to be output in this older and slower technique is right after the first iteration, when: the iteration counter equals 1 (see, e.g., the for loop at line 1), j=0 (see, e.g., the for loop at line 2), and i=0 (see, e.g., the for loop at line 4).
Note the exemplary decoding process shown in
To put it another way, J is a set iterating through indexes 0, 1, 2, . . . ,
such that Ja={ab, ab+1, ab+2, . . . , ab+b−1}.
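The circulant column grouping Ja = {ab, ab+1, . . . , ab+b−1} amounts to a simple index calculation, shown here with an illustrative helper name:

```python
def circulant_columns(a, b):
    """Return the column index set J_a = {a*b, a*b + 1, ..., a*b + b - 1}
    covered by circulant a when the circulant size is b."""
    return list(range(a * b, a * b + b))
```

For example, with circulant size b=4, group a=2 covers columns 8 through 11, so the b bits of one circulant can be updated independently and in parallel as described above.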
Returning to the convergence check at the end of the first iteration at line 9 (404), the rationale behind this is based on the initialized values (400). Note that the decoded word (
Although checking the syndrome at the end of the first full iteration (as shown in
In a non-QC-LDPC example, there are n variable nodes (500) that output a plurality of variable bits. n monitoring blocks (502) each input a corresponding variable bit and determine when that variable bit (variable node) has been (e.g., sufficiently) processed so that if a check of the syndromes were performed, a false positive would not result. For example, if a state signal output by a monitor (502) is a value of 1 (as an example), that means that the corresponding variable node or variable bit has had at least some LDPC decoding performed on it (e.g., so a syndrome which is generated from that variable node or variable bit can be trusted), whereas a 0 means that the corresponding variable bit has not yet been (e.g., sufficiently) processed. The monitoring blocks are illustrative and/or exemplary to convey the concept and in some embodiments are not necessary because there is already a signal within the system that can be reused or otherwise repurposed as the state signal.
The state signals generated by the monitoring blocks (502) are input to an AND block (504) to generate a collective state signal. The collective state signal is a 1 when all of the variable bits have been processed (e.g., indicated by the state signals all being 1).
The collective state signal is input by a layered min-sum decoding controller (506). When the collective state signal goes from a 0 to a 1, LDPC decoding is paused (e.g., message passing is paused) while the controller (506) checks the syndrome (508) to see if the syndrome is all zeros. If the syndrome (508) is all zeros, then the controller (506) outputs the decoded codeword (510) as the output data.
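The collective state signal logic described above can be sketched in software. The function names below are illustrative stand-ins for the hardware AND block (504) and the controller's pause decision, not a circuit description.

```python
def collective_state(states):
    """AND of all per-variable monitor states (1 = sufficiently processed)."""
    return int(all(states))

def should_pause_for_check(prev_collective, states):
    """Pause decoding for a syndrome check on the 0 -> 1 transition of
    the collective state signal."""
    now = collective_state(states)
    return prev_collective == 0 and now == 1
```

Triggering only on the 0-to-1 transition ensures the syndrome is checked once per checkpoint rather than continuously, avoiding false positives from bits that have not yet been processed.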
In some embodiments, a QC-LDPC code is used. In such embodiments, the H matrix is a quasi-cyclic
The following figures describe this more formally and/or generally in flowcharts.
At 100, LDPC encoded data with one or more errors is received.
At 102a, information associated with an early convergence checkpoint is received, including by receiving a collective state signal that is based at least in part on a plurality of states associated with a plurality of variable nodes. For example, in
At 104a, the information associated with the early convergence checkpoint is used to perform decoding on the LDPC encoded data up to the early convergence checkpoint and generate a decoded codeword, wherein the early convergence checkpoint is prior to a first complete iteration of the layered min-sum LDPC decoding; and in response to the collective state signal indicating that the early convergence checkpoint has been reached, pausing the LDPC decoding, wherein the LDPC decoding includes layered min-sum LDPC decoding. For example, the controller (506) in
The process then continues to step 106 in
At 100b, LDPC encoded data with one or more errors is received, wherein the LDPC encoded data includes QC-LDPC encoded data.
At 102b, information associated with an early convergence checkpoint is received, including by receiving a collective state signal that is based at least in part on a plurality of states associated with a plurality of circulants. As described above, with QC-LDPC codes, it is sufficient to monitor the circulants. This is more efficient (for QC-LDPC embodiments) than monitoring all of the variable nodes (as an example) since there are fewer circulants than variable nodes and so less monitoring logic and/or routing is used when circulants are monitored.
At 104b, the information associated with the early convergence checkpoint is used to perform layered min-sum LDPC decoding on the LDPC encoded data up to the early convergence checkpoint and generate a decoded codeword, including by in response to the collective state signal indicating that the early convergence checkpoint has been reached, pausing the layered min-sum LDPC decoding, wherein the early convergence checkpoint is prior to a first complete iteration of the layered min-sum LDPC decoding and the LDPC decoding includes layered min-sum LDPC decoding.
The process then continues to step 106 in
Depending upon the implementation and/or application, a different embodiment for determining when the early convergence checkpoint has occurred may be desirable. For example, suppose that due to the specific implementation of a layered min-sum decoder, the variable nodes or circulants are spread out. Monitoring (some examples of which are described in
With layered min-sum decoding, the sequence in which decoding is performed in the various layers (e.g., the ordering and/or timing with which variable bits (circulants, for QC-LDPC parallel processing) are processed within the first layer, the second layer, etc.) is known ahead of time. This known sequencing or ordering permits the exact layer number (denoted by τ), along with the associated variable index (circulant index, for QC-LDPC parallel processing) (denoted by τ′) that corresponds to zero unprocessed variable bits (circulants, for QC-LDPC parallel processing), to be identified; these in turn permit a fractional iteration count (denoted by η) to be calculated or otherwise determined as:
where m is the number of layers of the parity check matrix H.
The following figures show a visual example of this calculation using a QC-LDPC example.
During the first layer (702), shifted identity matrices I0-I6 are processed. Columns with a non-zero value (i.e., a circulant) that are processed during the first layer are indicated using a check mark above those columns.
Once all of the columns in the
Returning to Equation (19), the fractional iteration count (η) would be calculated using the QC-LDPC version of Equation (19) as
wherein ωτ denotes the Hamming weight of the τ-th layer, and τ′ denotes the threshold index of the last unprocessed circulant. Conceptually, the first term in the sum represents the number of complete or full layers to reach I20 (722), in this case the first layer (702) in
As a practical matter, it may be difficult to pause decoding in the middle of a layer. For example, in
where the equation is given for non-QC-LDPC codes. Conceptually, in the context of the
Another way to describe this is that it is not worth the effort of precisely calculating the second term in Equation (19) for those embodiments where the decoder cannot easily or feasibly stop in the middle of a layer or row, so the second term in Equation (19) is "rounded up" to the equivalent of a full layer or row.
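The "round up to a full layer" simplification can be illustrated numerically. Since Equation (20) is not reproduced here, the formula below is an illustrative reading of the description above (treat the partial layer τ as if it were complete, out of m layers per iteration), not the patent's exact equation; the function name is hypothetical.

```python
def fractional_iteration_simplified(tau, m):
    """Simplified early convergence checkpoint as a fraction of one full
    iteration: partial layer tau is rounded up to a complete layer, and
    there are m layers in total per iteration."""
    return (tau + 1) / m
```

For example, if the checkpoint falls inside the second layer (τ=1) of an m=4 layer code, the simplified checkpoint is 0.5 of an iteration, slightly later than the exact Equation (19) value but far easier to pause at.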
Returning briefly to the decoding example shown in
The following figure describes these examples more formally and/or generally in a flowchart.
At 100, LDPC encoded data with one or more errors is received. In some embodiments, the LDPC encoded data is QC-LDPC encoded data.
At 102c, information associated with an early convergence checkpoint is received, including by receiving a fractional iteration count. For example, the fractional iteration count (η) may be calculated per Equation (19) or the fractional iteration count (ηsimplified) may be calculated per Equation (20).
At 104c, the information associated with the early convergence checkpoint is used to perform layered min-sum LDPC decoding on the LDPC encoded data up to the early convergence checkpoint and generate a decoded codeword, including by receiving a current iteration of the LDPC decoding; and in response to the current iteration exceeding the fractional iteration count, pausing the LDPC decoding, wherein the early convergence checkpoint is prior to a first complete iteration of the LDPC decoding and the LDPC decoding includes layered min-sum decoding.
As described above, in some cases there may be certain points at which it is easier or more convenient to pause decoding (e.g., at the end of a layer) and in some embodiments, the decoding is paused there.
The process then continues to step 106 in
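The checkpoint comparison of step 104c, and the preference for pausing at a convenient layer boundary, can be sketched as follows. Both helper names are illustrative, and the cumulative per-layer fractions passed in are assumed to be precomputed from the known layer schedule.

```python
def reached_checkpoint(current_iteration, fractional_count):
    """True once the (fractional) current iteration meets or passes the
    early convergence checkpoint received in step 102c."""
    return current_iteration >= fractional_count

def first_pause_point(layer_fractions, checkpoint):
    """Given the cumulative fractional iteration reached at the end of
    each layer, return the index of the first layer boundary at or past
    the checkpoint (a convenient place to pause), or None if the
    checkpoint lies beyond the listed layers."""
    for idx, frac in enumerate(layer_fractions):
        if reached_checkpoint(frac, checkpoint):
            return idx
    return None
```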
Returning briefly to
With the permutation shown in
The following figure describes these examples more formally and/or generally in a flowchart.
At 1000, a first layered decoding scheduling and a second layered decoding scheduling are received. As used herein, layered decoding scheduling refers to the schedule by which (e.g., a particular implementation of) layered min-sum decoding is performed. For example, the Ĥ matrices (700a-700c) in
At 1002, a first potential fractional iteration count is determined based at least in part on the first layered decoding scheduling. At 1004, a second potential fractional iteration count is determined based at least in part on the second layered decoding scheduling. See, for example, Equations (19) and (20) and the example described in
At 1006, a minimum one of the first potential fractional iteration count or the second potential fractional iteration count is selected to be the fractional iteration count. For example, a lower fractional iteration count permits the early convergence checkpoint to be reached sooner so that more power and/or processing resources can be conserved.
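Steps 1002 through 1006 reduce to computing one candidate count per schedule and keeping the minimum. In this sketch, `count_for_schedule` is a hypothetical caller-supplied stand-in for the Equation (19) or (20) calculation applied to a given layered decoding schedule.

```python
def pick_checkpoint(schedules, count_for_schedule):
    """Return the smallest candidate fractional iteration count across the
    candidate layered decoding schedules (steps 1002-1006)."""
    return min(count_for_schedule(s) for s in schedules)
```

Choosing the minimum places the early convergence checkpoint as early as possible, so the syndrome check can run (and decoding can potentially stop) sooner.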
Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.