The present invention relates to error correction systems for computer data. More specifically, the invention relates to the use of parity check codes such as a low density parity check code (“LDPC”).
The transmission of binary computer data involves the introduction of errors, which must be detected and corrected, if possible. Although the difference between the two binary values, zero and one, seems clear, like the difference between black and white, in practice an electronic device may have difficulty distinguishing the difference. The difference between binary values may be detected as a voltage difference, but electronic noise in a circuit can interfere and render the difference less certain. This uncertainty must be dealt with. One option is to reject the data input and request retransmission. However, this is impossible with some fast flowing digital signals with substantial volume, such as digital TV, and is impractical in many situations. Accordingly, error correction systems have been developed to detect and correct errors. Communication systems often use forward error correction to correct errors induced by noise in the channel. In such systems, the error correction occurs at the receiver. One such system is parity check coding. One example of parity check coding is “low density parity check” coding (“LDPC”).
Forward error correction consists of adding redundancy to data. Block codes, such as the LDPC codes, segment the data into blocks. These blocks have additional bits added according to a specified algorithm, to create a codeword. This codeword is transmitted to the receiver over the channel. The data that is transmitted is binary in nature, meaning that it is either a logical “1” or a logical “0”. Noise is added by the channel, and the receiver detects each of the bits of the codeword and makes a best initial determination as to whether the bit is a logical 1 or 0. The receiver might also have the ability to assign a confidence in its guess. These guesses are called soft bits.
When a receiver gets a codeword, it is processed. The coding information added to original data is used to detect and correct errors in the received signal and thereby recover the original data. For received values with errors, the decoding system will attempt to recover or generate a best guess as to the original data.
As noted above, the receiver can reject data input containing errors. Retransmission may increase the reliability of the data being transmitted or stored, but such a system demands more transmission time or bandwidth or memory, and in some applications, such as digital TV signals, it may be impossible with current technology. Therefore, it is highly desirable to perfect error detection and correction of transmitted data.
LDPC systems use an iterative decoding process which is particularly suitable for long codewords. In general, LDPC codes offer greater coding gains than other currently available codes. The object is to use parallel decoding in the LDPC's iterative process to increase speed. In order to accomplish this, the inherent parallelism of an LDPC code must be found and exploited. There is also a need to reduce the number of memory accesses and the total memory required per iteration. To make the LDPC coding work as efficiently and quickly as possible, careful attention must be paid to how data is stored and how it is routed to storage during the iterations.
U.S. Pat. No. 6,633,856 to Richardson et al. (“Richardson”) discloses two LDPC decoder architectures: a fast architecture and a slower architecture. In the slow architecture, a single iteration consists of two cycles. There is an edge memory consisting of one location for each edge in the Tanner Graph or, equivalently, one location for each 1 in the H matrix. There is also an input buffer which requires a memory location for each input variable or, equivalently, a memory location for each column of the H matrix. The two memories do not require the same resolution: the edge memory is the high resolution memory, and the input buffer is the low resolution memory. In the fast architecture, a single iteration consists of a single memory cycle, and two edge memories and a single input buffer are required.
The current invention involves a parallel SISO structure that allows the decoder to process multiple parity equations at the same time. There is a new SISO decoder which allows for the updating of the log-likelihood ratios in a single operation, as opposed to the two passes traditionally associated with the Tanner Graphs. In the decoder, there is a mapping structure that correctly aligns the stored estimates to the stored differences for presentation to the SISOs. There is also the ability to deal with multiple instances of the same data being processed at the same time. This structure manages the updates and the differences in such a manner that all calculations on a single piece of data that are processed in parallel are incorporated correctly in the new updated estimates.
The LDPC architecture of the present invention makes better use of memory and processing capacity during decoding. In the present invention, a single iteration consists of a single memory cycle. Two memories are disclosed. The first is a difference array which has a memory location for each of the ones in the H matrix, and the second is a current array which has a memory location for each of the columns in the H matrix. The current array may use high resolution memory, but the difference array requires only low resolution memory.
The LDPC architecture of the present invention requires the same number of memory cycles as the fast architecture of Richardson, but only the same number of memory locations as the slow architecture. Furthermore, the Richardson architectures require the larger memory to have the higher resolution, while the present invention requires the higher resolution only in the smaller memory. The result is that, even with the same number of memory locations as the slow architecture of Richardson, the present invention requires fewer memory bits than even the slow architecture of Richardson.
Another significant difference between the present invention and the Richardson architectures is how permutations are handled. The Richardson architecture stores all the variable messages in their unpermuted form and the check messages in their permuted form. This requires a permutation block for each memory access. The architecture of the present invention represents the differences in their permuted form, and the variable nodes are stored in the same permutation as the last time they were accessed. They are permuted to the correct orientation each time they are used. The consequence is that only one permutation is required per iteration instead of the two required by the Richardson architecture. This is a significant savings, as the permuter is a fairly large function.
a is a decoder architecture with no parallelism.
b is a decoder architecture for expanded codes which allows for parallel processing of data.
a shows an expanded H-Matrix with permuted sets.
b shows the H-Matrix of
a shows a circuit that finds the minimum value in a sequential list of values, and passes all the non-minimums through. It also gives the sequence number in the list of the minimum value.
b shows the minimum function block.
a shows the sign bit path of the SISO circuit.
b shows the magnitude field path of the SISO circuit.
1. The Coding Process
Communication systems often use forward error correction to correct errors induced by noise in a transmission channel. In such forward error correction systems, the detection and correction of errors occur at the receiver. Bits received through the channel are detected at the receiver as “soft” values. A soft value represents the “best guess” that the receiver can make for the value of the bit that was sent and the confidence in that guess. In essence, data is sent as a single bit, and received as a multi-bit sample. During transmission, a single bit of data may pick up noise, so that it is necessary to use more than a single bit to identify the sampled data. For example, in a binary system, if a “1” is coded as 5 volts and a “0” as 0 volts, then each can be represented with a single bit. If a value of 4.2 volts is received, then this is close to representing a “1”, but the receiver will use multiple bits to represent how close to the 5 volts the sampled data resides.
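The 4.2 volt example can be made concrete with a short sketch. It is only an illustration: the midpoint threshold, the linear confidence scale, the 5-bit confidence width, and the function name `soft_value` are all assumptions introduced here, not details taken from the receiver described above.

```python
def soft_value(volts, v_zero=0.0, v_one=5.0, conf_bits=5):
    """Return (best_guess, confidence) for one received sample.

    Assumptions (not from the text): the decision threshold is the midpoint
    between the two nominal levels, and confidence grows linearly with the
    distance from that threshold.
    """
    midpoint = (v_zero + v_one) / 2.0
    best_guess = 1 if volts >= midpoint else 0
    max_conf = (1 << conf_bits) - 1
    # Scale so that a sample exactly at a nominal level maps to max_conf.
    scaled = abs(volts - midpoint) / (v_one - midpoint) * max_conf
    confidence = min(max_conf, int(round(scaled)))
    return best_guess, confidence


guess, conf = soft_value(4.2)
print(guess, conf)
```

Under these assumptions a 4.2 volt sample is read as a "1" with a confidence somewhat below the maximum, which is the single-bit-sent, multi-bit-received idea described above.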
A typical format for the received data is signed magnitude, where the first bit is a sign bit representing the hard decision data, and the remainder of the bits represent the confidence in the hard decision bit. A “hard decision” is a single bit. In the example set out immediately above, the receiver reads 4.2 volts, but could output a “1” as a hard decision, which would indicate 5 volts. This is shown in
A type of forward error correction is low density parity check codes (LDPC). Low Density Parity Check codes are codes that have a “sparse” H-Matrix. A sparse H-Matrix is one in which there are many more zeroes than ones in the H-Matrix. For illustration here, a representative (non-sparse) H-Matrix 201 is shown in
In practice, an H-matrix will be much larger than the exemplary matrix of
2. The SISO
As noted above, inputs are received in a signed magnitude representation. The inputs are stored in an input buffer 251 in
In its basic operation, the “Soft-In-Soft-Out” (“SISO”) function of an LDPC decoder evaluates each of the parity equations rowi 202, represented by the rows 202 of the H-Matrix 201 using the current estimates C 220, and if the parity equation is satisfied, will increase the confidence of the current estimates ck 221 for those current estimates ck 221 related to rowi 202. If the parity equation rowi 202 is not satisfied, the confidence of each current estimate ck 221 related to rowi 202 will be decreased. It is possible to decrease the confidence to the point that a current estimate's hard decision bit is actually flipped, producing a correction of erroneous data.
The parity equations that the SISO evaluates are determined by the multiplication of the H-Matrix 201 by the input vector I 210 and the multiplication of the H-Matrix 201 by the current estimate vector C 220. This multiplication yields the parity equations
i0+i1+i3+i5+i9
i1+i2+i4+i5+i6
i0+i2+i3+i6+i7
i0+i1+i4+i7+i8
i2+i3+i4+i8+i9
for the inputs and the parity equations
c0+c1+c3+c5+c9
c1+c2+c4+c5+c6
c0+c2+c3+c6+c7
c0+c1+c4+c7+c8
c2+c3+c4+c8+c9
for the current estimates.
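As a rough sketch of how these parity equations follow from the H-Matrix, the rows can be rebuilt from the column indices listed in the equations and multiplied by a hard-decision vector over GF(2). The names `ROWS`, `build_h` and `parity_checks` are illustrative only, and the test vectors are made up for the example.

```python
# Column indices copied from the five parity equations above (rows 0 to 4).
ROWS = [
    [0, 1, 3, 5, 9],
    [1, 2, 4, 5, 6],
    [0, 2, 3, 6, 7],
    [0, 1, 4, 7, 8],
    [2, 3, 4, 8, 9],
]
N = 10  # number of columns, i.e. inputs i0..i9


def build_h(rows, n):
    """Build the 0/1 matrix with a 1 wherever a column appears in a row equation."""
    return [[1 if k in row else 0 for k in range(n)] for row in rows]


def parity_checks(h, bits):
    """Multiply H by a hard-decision vector over GF(2); 0 means the equation holds."""
    return [sum(h_ik & b for h_ik, b in zip(row, bits)) % 2 for row in h]


H = build_h(ROWS, N)
all_zero = [0] * N            # the all-zero word satisfies every parity equation
one_error = [0] * N
one_error[5] = 1              # flip bit 5, which appears in rows 0 and 1

print(parity_checks(H, all_zero))
print(parity_checks(H, one_error))
```

A single flipped bit makes exactly the equations that contain it fail, which is the information the SISO uses when it raises or lowers confidence.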
For each evaluation of a parity equation, the SISO outputs a difference for each of the inputs. This value is the difference between the input to the SISO and the estimate that this particular equation provides for that data. Referring to
The SISO 258 takes as inputs all the inputs identified by a row in the H-Matrix. As an example, for row 0 of the matrix in
After one complete iteration cycle, each of the parity equations, row 0 through row 4, will have been evaluated once, and the contents of the CA will be as follows:
c0′=c0+d0,0+d0,2+d0,3
c1′=c1+d1,0+d1,1+d1,3
c2′=c2+d2,1+d2,2+d2,4
c3′=c3+d3,0+d3,2+d3,4
c4′=c4+d4,1+d4,3+d4,4
c5′=c5+d5,0+d5,1
c6′=c6+d6,1+d6,2
c7′=c7+d7,2+d7,3
c8′=c8+d8,3+d8,4
c9′=c9+d9,4+d9,0
The result ck′ is the new value for ck which is stored back in the CA 252 after the iteration. The old value of ck is overwritten by the new value.
The CA 252 will contain n signed magnitude values and the DA 257 contains as many signed magnitude values as there are 1's in the H-Matrix 201. In the above example, the DA 257 will have 25 entries, and the CA 252 will have 10.
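The memory sizes quoted for the example can be checked with a small sketch that counts the ones in each row and lists, for each column, the rows whose differences contribute to that column's estimate. The variable names are illustrative; the row contents are taken from the parity equations given earlier.

```python
# Column indices of the 1's in each row, from the parity equations in the text.
ROWS = [
    [0, 1, 3, 5, 9],
    [1, 2, 4, 5, 6],
    [0, 2, 3, 6, 7],
    [0, 1, 4, 7, 8],
    [2, 3, 4, 8, 9],
]
n = 10

da_entries = sum(len(row) for row in ROWS)  # one difference per 1 in the H-Matrix
ca_entries = n                              # one current estimate per column

# For each column k, the rows i that contain it (and so contribute a difference).
rows_for_column = {
    k: [i for i, row in enumerate(ROWS) if k in row] for k in range(n)
}

print(da_entries, ca_entries)
for k in range(n):
    print(k, rows_for_column[k])
```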
a. SISO Inputs/Outputs
The data structure for ck and di,k is shown in
Sticky adder 256 is placed ahead of the SISO 258. The sticky add function is defined as follows:
A⊕B=A+B if A+B<MaxVal
A⊕B=MaxVal if A+B≧MaxVal
MaxVal⊕B=MaxVal for all B
Where A and B are variables and MaxVal is the maximum value that can be handled. For example, if X and Y are 6 bit signed magnitude registers, then the lvl field is a 5 bit number and the hd field is a single bit. If X is a positive 20 and if Y is a positive 15, then the binary value of X is 110100 and the binary value of Y is 101111. Then, lvl(X)⊕lvl(Y)=31.
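The sticky add definition above translates almost directly into code. The sketch below works on plain Python integers standing in for the lvl field, which is a simplification; the function name `sticky_add` is introduced here for illustration.

```python
def sticky_add(a, b, max_val):
    """Saturating add following the three rules given for the sticky add.

    Plain integers stand in for the lvl field (an assumption of this sketch).
    """
    if a == max_val:
        # MaxVal (+) B = MaxVal for all B: once saturated, the value sticks.
        return max_val
    total = a + b
    return total if total < max_val else max_val


MAX_VAL = 31  # largest value of a 5-bit lvl field

print(sticky_add(20, 15, MAX_VAL))   # the worked example: 20 + 15 saturates
print(sticky_add(10, 5, MAX_VAL))    # below the limit, an ordinary sum
print(sticky_add(31, -7, MAX_VAL))   # a saturated value stays saturated
```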
There is an input ik 210 and a current estimate ck 221 associated with each column of the H-Matrix, and there is a difference associated with each non-zero entry in the H-Matrix; that is with every “1” entry. For example, when working on row 1 of the H-Matrix 201 in
tk=ck⊕(−di,k) for all k where Hi,k=1
The value tk is the output of adder 256 in
The purpose of the SISO is to generate the differences. The differences are the differences between each input and current estimate as identified by the particular row equation being worked. The differences are defined by the following sets of equations:
MinVal1=min(lvl(tk)) for all k
v=k: lvl(tk)=MinVal1
MinVal2=min(lvl(tk)) for all k≠v
hd(di,k)=hd(tk)+CORRECT where addition is over GF(2)
lvl(di,v)=MinVal2
lvl(di,k)=max(0, MinVal1−f(MinVal2−MinVal1)) for k≠v, where the function f(MinVal2−MinVal1) is defined such as:
The output of the SISO is di,k. This value of di,k replaces the value that was read from the DA. The value of ck that was read from CA is replaced with tk⊕di,k for all k.
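The magnitude part of these equations can be sketched as follows. The definition of f is not reproduced in this text, so the sketch takes f as a caller-supplied argument, and the placeholder used in the example (f always returning 0) is purely an assumption. The sign path and the CORRECT term are left out. Ties are resolved by taking the first position that holds the minimum, which is also an assumption of the sketch.

```python
def siso_magnitudes(levels, f):
    """Compute lvl(d_i,k) for every k in one row from the values lvl(t_k).

    levels: list of lvl(t_k), one per column in the row (at least two entries).
    f:      correction function applied to MinVal2 - MinVal1 (supplied by caller).
    """
    min_val1 = min(levels)
    v = levels.index(min_val1)  # position of the (first) minimum
    min_val2 = min(x for j, x in enumerate(levels) if j != v)
    # Every position other than v gets the corrected first minimum, floored at 0.
    others = max(0, min_val1 - f(min_val2 - min_val1))
    return [min_val2 if j == v else others for j in range(len(levels))]


# Placeholder f (assumption): no correction at all.
print(siso_magnitudes([14, 16, 10, 12, 20], lambda x: 0))
```

With the placeholder f, the position holding the minimum receives the second minimum and every other position receives the first minimum.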
b. The Minimum Function
FIGS. 11a and 11b are block diagrams showing the minimum function of the present invention.
The minimum function block is initialized by having the counter 416 set to zero and the Val register 413 set to the maximum possible value with a preset which initializes the Val register 413 to all ones. The numbers are input on the Data_in line 402. This value is presented to the “a” input of the comparator 411. The “b” input of the comparator 411 is the current minimum value. After initialization, this is the maximum possible number. If “a” is less than “b”, then Mux 1 403 passes the Val register value to the output Data_out 422. Mux 2 407 passes the Data_in input 402 to the input of the Val register 413, where it is saved. If “a” is not less than “b”, then Mux 1 403 passes Data_in to the output Data_out 422. Mux 2 407 passes the contents of the Val register back to the Val register 413, in effect, leaving it the same.
As noted above, the counter 416 is initially set to zero. Every time new input is brought in, the counter is incremented. If Data_in 402 is less than the value stored in the Val register 413, the value of the counter 416 is latched into the Loc register 417. This corresponds to a new minimum value being stored in the Val register 413.
Once a sequence of numbers has passed through the minimum function block, the output MinVal 414 has the minimum value and the output MinLoc 421 has the location in the sequence of the minimum value.
By way of example, if the sequence {14,16,10,10} were passed through the circuit, the following would occur. The counter 416 is initialized to zero and the Val register 413 is initialized to a maximum value. The number 14 is input. 14 is less than the maximum value, so 14 is placed in the Val register 413, the number 0 is placed in Loc 417, the maximum value is passed to the output Data_out 422, and the counter 416 is incremented to 1. Then the number 16 is input. 16 is larger than the 14 that is in the Val register 413, so the Val register 413 maintains its value of 14, the register Loc 417 maintains its value of 0, 16 is passed to the output Data_out 422, and the counter 416 is incremented to 2. Then the number 10 is input. 10 is less than the 14 that is in the Val register 413, so the Val register 413 is changed to 10, the number 2 is placed in Loc 417, 14 is passed to the output Data_out 422, and the counter 416 is incremented to 3. Then the second number 10 is input. The second 10 is not less than the first 10, so the 10 that is in the Val register 413 stays the same, the value of Loc 417 does not change, the second 10 is passed to the output Data_out 422, and the counter 416 is incremented to 4. As this is the end of the sequence, the MinVal output 414 is 10 and the MinLoc output 421 is 2.
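The behavior walked through in this example can be modeled in software as a simple loop. This is a behavioral sketch only, not a description of the circuit timing; the function name `minimum_block` and the choice of 31 as the all-ones preset (a 5-bit register) are assumptions made for the illustration.

```python
def minimum_block(sequence, max_val):
    """Model the Val/Loc registers, the counter and the Data_out stream."""
    val = max_val     # Val register preset to the maximum value
    loc = None        # Loc register; nothing latched before the first input
    counter = 0       # counter initialized to zero
    data_out = []
    for data_in in sequence:
        if data_in < val:
            data_out.append(val)      # Mux 1 forwards the old Val contents
            val = data_in             # Mux 2 loads Data_in into Val
            loc = counter             # counter value latched into Loc
        else:
            data_out.append(data_in)  # Mux 1 forwards Data_in; Val is unchanged
        counter += 1
    return val, loc, data_out


min_val, min_loc, passed = minimum_block([14, 16, 10, 10], max_val=31)
print(min_val, min_loc, passed)
```

Because the comparison is strictly "less than", the first of two equal minimums keeps its location, as in the example.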
c. Details of the SISO
The SISO is shown in
First consider the sign bit data path in
The magnitude or confidence data path is shown in
The first sum block 517 takes MinVal1 509 and MinVal2 516 as inputs, with MinVal1 509 as a negative input 519. The output of the first sum block 517 is input to the f(x) block 520. The f(x) block 520 has a function listed such as
This output is input to the second sum block 521 as a negative input 522. The other input is MinVal1 509. The output of this second sum block 521 is input to a comparator 523, as well as input to a Mux 524. The Mux 524 has a second input which is a zero value 527. The comparator 523 tests to see if the input is greater than 0. The output of the comparator 523 is the select input of the Mux 524. If the comparator 523 tests true, then the output of the second sum block 521 is passed to the output as the lvl(tk) output 528. If the comparator 523 is false, then the zero input 527 is passed to the output as the lvl(tk) output 528. Finally, MinVal2 516 is passed to the output as the MIN(lvl(tk)) output for k equal to v.
In summary, referring to
a shows a circuit that performs this function.
3. Expanded Code
The H-Matrix in
As an example, let m=3. In such a case, there are 6 possible permutations, any of which can be used. These permutations are shown in
Each of the equations, the differences, the inputs and current estimates will be grouped in sets of m. Looking at the Matrix in
Another exemplary representation for the H-matrix is shown in
Finally, a third representation is listed in
The purpose of the decoder architecture is to allow equations to be solved in parallel, and to allow a wider memory structure with reads that are more than one input wide. The decoder shown in
The DA 257 is the memory that holds all the Di,k's. They are in groups of m, and stored in the “proper” order, where the “proper” order means the permutation indicated by that permutation matrix of
An example working through two complete iterations for the code defined by the H-Matrix in
Referring to
The differences calculated by a SISO 2581-3 are stored in the DA 257 as D0,0, D0,1, D0,3, D0,5 and D0,9. These differences are also added 2601-3 to the inputs stored in the FIFO 2591-3 and stored back in the CA 252. Note that the inputs are now stored back in the original location, but in a permuted form.
This exemplary architecture allows three SISO's, 2581, 2582, and 2583, to operate in parallel. The inputs are read three at a time.
As the equations for the remaining ROWi's are evaluated, there is always a choice in taking the input from Ck or Ik. If Ik has been used, then select Ck. If Ik has not been used, then select Ik. This can be seen by examining ROW1. I1 has been used, so C1 is selected by a mux 2531-3. C1 needs to be permuted to P6. However, it is already permuted by P3. Permutation P4 accomplishes this. Therefore, C1 is permuted by P4. I2 has not been used, so it is selected by the mux 2531-3. I2 is permuted by P4. By the same token, I4 has not been used, so I4 is permuted by P3. I5 has already been used, so C5 is selected by the mux 2531-3 and permuted by P1. I6 has not been used, so I6 is selected by the mux 2531-3 and permuted by P6. Note that inputs I2, I4 and I6 were in their initial states, as they had not yet been permuted. With respect to the three SISO's, SISO0 2581 gets i1,2, c2,2, i4,1, c5,0 and i6,0; SISO1 2582 gets i1,1, c2,0, i4,0, c5,1 and i6,1; SISO2 2583 gets i1,0, c2,1, i4,2, c5,2 and i6,2.
The differences d1,1, d1,2, d1,4, d1,5 and d1,6 are all initially zero. The new differences are stored in the DA 257. The differences are also added 2601-6 into the output of a FIFO 2591-3, which are then stored in the CA 252 as C1, C2, C4, C5 and C6 respectively. This continues for ROW2, ROW3, and ROW4, at which point each of the equations has been solved once. At this point, the DA 257 is filled with non-zero values.
In general, the proper permutation to perform on any Cj can be determined by looking at the H-Matrix of
This architecture keeps the differences Di,k in the permutation that is seen in the H-Matrix of
To get the required outputs, everything needs to be permuted back to P1. At the end of the last iteration, C0 and C2 are stored in permutation P5, C1 and C3 are stored in permutation P2, C4 is stored in permutation P4 and the rest are stored in permutation P1. C0 and C2 are permuted by P4, C1 and C3 are permuted by P2, C4 is permuted by P5 and the rest are permuted by P1. This gets all the outputs into the required P1 permutation.
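The bookkeeping behind this scheme, where a set is stored in whatever permutation it was last used in and a single further permutation brings it to the next required one, can be sketched for m=3. The labels P1 to P6 are defined in a figure not reproduced here, so the sketch represents permutations as index tuples and does not assume which tuple corresponds to which label; it only assumes that the unpermuted orientation is the identity. The names `apply`, `inverse` and `delta` are illustrative.

```python
from itertools import permutations


def apply(p, x):
    """Permute list x so that output position j holds x[p[j]]."""
    return [x[j] for j in p]


def inverse(p):
    """Return the permutation that undoes p."""
    inv = [0] * len(p)
    for j, pj in enumerate(p):
        inv[pj] = j
    return tuple(inv)


def delta(stored, required):
    """Single permutation taking data held in `stored` orientation to `required`."""
    s_inv = inverse(stored)
    return tuple(s_inv[r] for r in required)


m = 3
identity = tuple(range(m))
all_perms = list(permutations(range(m)))   # the 6 permutations for m = 3
original = ["a", "b", "c"]

for s in all_perms:
    held = apply(s, original)              # data as left in memory after last use
    for r in all_perms:
        # One permutation is enough to reach any required orientation.
        assert apply(delta(s, r), held) == apply(r, original)
    # At the end of decoding, the same step restores the unpermuted order.
    assert apply(delta(s, identity), held) == original

print(len(all_perms))
```

The point of the sketch is that the required permutation depends only on the stored orientation and the target orientation, so no intermediate return to the unpermuted form is needed between uses.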
4. Multiplicity Architecture
One of the conditions that can occur in a code is when the same set of inputs is used more than once in the same set of equations. The input sets will occur with different permutations. An example would be to replace the second term in ROW0 with (3,4). The equation becomes
ROW0=(0,2)+(3,4)+(3,5)+(5,1)+(9,4)
This requires I3 to be used twice in the first iteration, followed by C3 being used twice in subsequent iterations. The terms (3,4) and (3,5) are called “multiplicities” and occur when the same set of inputs is used in the same equations more than once. When this occurs, the input set will always occur with different permutations. There is a difference stored in the DA for each of these permutations. In the above example, the first difference has permutation P4, and the second difference has permutation P5. D^4_{0,3} will represent the difference set D_{0,3} with permutation P4 and D^5_{0,3} will represent the difference set D_{0,3} with permutation P5. In general, D^y_{i,k} will represent the set associated with the equation for ROWi and the input set Ck in permutation Py. Each of these is a separate set of differences, and will be stored separately in the DA 257. However, both differences require the same input, which is not permitted. In the first iteration, the input vector I 251 will be selected by the mux 253, and in subsequent iterations the current estimates C stored in the CA 252 will be selected by a mux 2531-3. The output of the mux 253 minus the first difference will be stored in a FIFO 2591-3, as well as the output of the mux 2531-3 minus the second difference. After doing multiple operations on the same Ck in the same iteration, the decoder in
When processing ROW0 for the first time, the first input is I0 with a permutation of P2. The second input is I3 with a permutation of P4. The third input is I3, but with a permutation of P5. The fourth and fifth inputs are I5 and I9 with permutations of P1 and P4 respectively. The inputs to a FIFO 259 are also different when dealing with repetitive sets in the same equation. The first time a set element is seen, the FIFO 259 receives I3−D^4_{0,3}. Recognize that D^4_{0,3} is zero, as this is the first pass, and the differences are initialized originally to zero. The next input 251 to the FIFO 259 will be (−D^5_{0,3}). Again, D^5_{0,3} is equal to zero. Also, D^5_{0,3} is stored in a different location in the DA 257 than D^4_{0,3}, which allows for the retention of both values. When I3−D^4_{0,3} is output from the FIFO 259, the other input to adder 1 615 will be equal to zero. The output of adder 1 615 goes to adder 2 616. The new D′^4_{0,3}, which is output from the SISO 258, is added to the output of adder 1 615 using adder 2 616. This goes to the second permutation block 617, where it is permuted to P5. Referring to
C′_3 = C_3 − D^4_{0,3} + D′^4_{0,3} + (−D^5_{0,3}) + D′^5_{0,3}
This shows that both differences have been updated with the new values.
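The arithmetic of the update for C3 above can be illustrated with plain integers. The sketch ignores the permutations, the grouping into sets of m, and the saturating sticky add, all of which the actual decoder applies; the numeric values and the name `updated_estimate` are made up for the example.

```python
def updated_estimate(c, old_diffs, new_diffs):
    """Remove each old difference from the estimate and add its replacement.

    One (old, new) pair is supplied per use of the same input set in the row.
    """
    for old, new in zip(old_diffs, new_diffs):
        c = c - old + new
    return c


# First iteration: both stored differences start at zero.
c3_first = updated_estimate(7, old_diffs=[0, 0], new_diffs=[2, -1])

# A later iteration: the previous differences are removed before the new ones are added.
c3_later = updated_estimate(c3_first, old_diffs=[2, -1], new_diffs=[3, 1])

print(c3_first, c3_later)
```

In this simplified model the estimate always equals the original input plus the most recent value of each difference, which is why both multiplicities must be folded into the same stored value.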
For subsequent iterations, C3 will be stored in permutation P2. In that case, when ROW0 is evaluated, C3 will first be brought in with permutation P3, and second brought in with permutation P6 to get the required permutations of P4 and P5 respectively.
In the multiplicity architecture shown in
5. Processor Architecture
The architecture disclosed here arranges components as discrete components, but this is illustrative and not intended to be limiting.
The figures and description set forth here represent only some embodiments of the invention. After considering these, skilled persons will understand that there are many ways to make an LDPC decoder according to the principles disclosed. The inventors contemplate that the use of alternative structures or arrangements which result in an LDPC decoder according to the principles disclosed, will be within the scope of the invention.
This application claims priority under 35 U.S.C. § 119(e) from provisional patent Application No. 60/568,939, filed May 7, 2004. The 60/568,939 Application is incorporated herein by reference.
Number | Date | Country | |
---|---|---|---|
20050258984 A1 | Nov 2005 | US |
Number | Date | Country | |
---|---|---|---|
60568939 | May 2004 | US |