Methods and apparatus for low-density parity check decoding using hardware-sharing and serial sum-product architecture

Information

  • Patent Application
  • 20080052595
  • Publication Number
    20080052595
  • Date Filed
    July 31, 2006
  • Date Published
    February 28, 2008
Abstract
Methods and apparatus are provided for decoding codes that can be described using bipartite graphs having interconnected bit nodes and check nodes. A magnitude of a check-to-bit node message from check node j to bit node i is computed based on a sum of transformed magnitudes of bit-to-check node messages for a plurality of bit nodes connected to the check node j, less a transformed magnitude of the bit-to-check node message for bit node i and check node j. A sign of the check-to-bit node message from check node j to bit node i can also be computed by multiplying a product Sj of the sign of bit-to-check node messages among a plurality of bit nodes connected to the check node j by the sign of the bit-to-check node message for bit node i and check node j. A decoder architecture is also disclosed for decoding a code that can be described using a bipartite graph having interconnected bit nodes and check nodes. The disclosed decoder can be concatenated with a soft output detector.
Description

BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 illustrates the general structure of an LDPC matrix, H;



FIG. 2 is an exemplary bipartite graph representation of an LDPC code;



FIG. 3 is a block diagram of an exemplary hardware-sharing LDPC decoder architecture;



FIG. 4 is a block diagram of an LDPC decoder incorporating features of the present invention;



FIG. 5 is a block diagram of a typical communication system where an LDPC decoder is concatenated with a soft input soft output (SISO) detector; and



FIG. 6 is a block diagram of an LDPC decoder that is concatenated with a SISO detector.





DETAILED DESCRIPTION

The present invention provides methods and apparatus for LDPC decoding using a hardware-sharing and serial sum-product architecture. The disclosed LDPC decoder performs the sum-product decoding algorithm with less memory and fewer clock cycles, relative to a conventional implementation.


Low-Density Parity Check Codes

The following background discussion of LDPC codes and LDPC decoding is based on a discussion in A. J. Blanksby and C. J. Howland, “A 690-mW 1-Gb/s 1024-b, Rate-1/2 Low-Density Parity-Check Decoder,” IEEE J. Solid-State Circuits, Vol. 37, 404-412 (March 2002), incorporated by reference herein. For a more detailed discussion, the reader is referred to the full Blanksby and Howland paper.


Matrix Representation of LDPC Codes


LDPC codes are linear block codes. The set of all codewords, x ∈ Cx, spans the null space of a parity check matrix H:





$$H x^{T} = 0, \quad \forall x \in C_x. \qquad (1)$$


The parity check matrix for LDPC codes is a sparse binary matrix.


FIG. 1 illustrates the general structure 100 of an LDPC matrix, H. As shown in FIG. 1, each row of the parity check matrix, H, corresponds to a parity check and a set element hji indicates that data bit i participates in parity check j. In a block of n bits, there are m redundant parity bits. The code rate is given by:






r=(n−m)/n   (2)


The set row and column elements of the parity check matrix H are selected to satisfy a desired row and column weight profile, where the row and column weights are defined as the number of set elements in a given row and column, respectively. In a regular LDPC code, all rows are of uniform weight, as are all columns. If the rows and columns are not of uniform weight, the LDPC code is said to be irregular.
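As an illustrative sketch only, the parity check condition of equation (1), the code rate of equation (2), and the row and column weights can be checked on a small matrix. The toy matrix `H` and all function names below are assumptions introduced for illustration and do not appear in the specification.

```python
# Hypothetical toy parity-check matrix (not from the specification):
# m = 3 parity checks (rows), n = 6 bits (columns).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]


def is_codeword(H, x):
    """True if H x^T = 0 over GF(2), i.e. every parity check is satisfied."""
    return all(sum(h * b for h, b in zip(row, x)) % 2 == 0 for row in H)


def code_rate(H):
    """Code rate r = (n - m) / n, as in equation (2)."""
    m, n = len(H), len(H[0])
    return (n - m) / n


def row_weights(H):
    """Number of set elements in each row."""
    return [sum(row) for row in H]


def column_weights(H):
    """Number of set elements in each column."""
    return [sum(col) for col in zip(*H)]


def is_regular(H):
    """Regular code: all rows share one weight and all columns share one weight."""
    return len(set(row_weights(H))) == 1 and len(set(column_weights(H))) == 1


print(is_codeword(H, [1, 1, 0, 0, 1, 1]))
print(code_rate(H), row_weights(H), column_weights(H), is_regular(H))
```

In this toy matrix the rows have uniform weight but the columns do not, so the code it describes would be irregular in the sense defined above.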


Graph Representation of LDPC Codes


LDPC codes can also be represented using a bipartite graph, where one set of nodes represents the parity check constraints and the other set represents the data bits. FIG. 2 is an exemplary bipartite graph representation 200 of an LDPC code. The parity check matrix is the incidence matrix of the graph where a bit node i, corresponding to column i in H, is connected to check node j, corresponding to row j in H, if the entry hji in H is set, i.e., non-zero.


The algorithm used for decoding LDPC codes is known as the sum-product algorithm. For good decoding performance with this algorithm, it is important that the cycles in the graph representation of the LDPC code are as long as possible. In the exemplary representation of FIG. 2, an exemplary short cycle of length four has been illustrated. Short cycles, such as the length-4 cycle illustrated in FIG. 2, degrade the performance of the sum-product algorithm.
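The graph view can be sketched in a few lines: the neighbor sets Bi and Cj are read directly from the columns and rows of H, and a length-4 cycle exists whenever two bit nodes share two or more check nodes. The matrices, the 0-based indexing and the function names below are assumptions made for illustration.

```python
from itertools import combinations


def neighbor_sets(H):
    """Return (B, C): B[i] lists checks joined to bit i, C[j] lists bits joined to check j."""
    m, n = len(H), len(H[0])
    B = [[j for j in range(m) if H[j][i]] for i in range(n)]
    C = [[i for i in range(n) if H[j][i]] for j in range(m)]
    return B, C


def has_length4_cycle(H):
    """A length-4 cycle exists when two bit nodes share at least two check nodes."""
    B, _ = neighbor_sets(H)
    return any(len(set(a) & set(b)) >= 2 for a, b in combinations(B, 2))


# Hypothetical matrices for illustration only.
H_no_cycle = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
H_cycle = [
    [1, 1, 0],
    [1, 1, 1],
]

print(neighbor_sets(H_no_cycle))
print(has_length4_cycle(H_no_cycle), has_length4_cycle(H_cycle))
```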


The Sum-Product Algorithm

The sum-product algorithm is an iterative algorithm for decoding LDPC codes.


The sum-product algorithm is also known as the message passing algorithm or belief propagation. For a more detailed discussion of the sum-product algorithm, see, for example, A. J. Blanksby and C. J. Howland, “A 690-mW 1-Gb/s 1024-b, Rate-1/2 Low-Density Parity-Check Decoder,” IEEE J. Solid-State Circuits, Vol. 37, 404-412 (March 2002), and D. E. Hocevar, “LDPC Code Construction With Flexible Hardware Implementation,” IEEE Int'l Conf. on Comm. (ICC), Anchorage, Ak., 2708-2712 (May, 2003), each incorporated by reference herein.


The message from bit node i to check node j is given by:










$$Q_{i,j} = \sum_{l \in B_i,\, l \neq j} R_{l,i} + \lambda_i \qquad (3)$$







It is noted that the notations used herein are defined in a table at the end of the specification. The message from check node j to bit node i is given by:










$$R_{j,i} = s_{j,i} \cdot \phi\left( \sum_{l \in C_j,\, l \neq i} \phi\left( \left| Q_{l,j} \right| \right) \right) \qquad (4)$$







where:








$$s_{j,i} = \prod_{l \in C_j,\, l \neq i} \operatorname{sign}\left( Q_{l,j} \right);$$

and

$$\phi(x) = -\log \tanh(x/2) = \log \frac{e^{x}+1}{e^{x}-1}.$$
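A short numerical sketch of the transformation φ follows. It checks that the two closed forms agree and that φ is its own inverse for x > 0, which is why the same function can be applied before the summation and again after it in equation (4). The function names are illustrative assumptions, and the sketch uses floating point rather than the fixed-point arithmetic a hardware implementation would use.

```python
import math


def phi(x):
    """phi(x) = -log(tanh(x/2)), defined for x > 0."""
    return -math.log(math.tanh(x / 2.0))


def phi_alt(x):
    """Equivalent form log((e^x + 1) / (e^x - 1)), defined for x > 0."""
    return math.log((math.exp(x) + 1.0) / (math.exp(x) - 1.0))


x = 0.7  # arbitrary positive test value
print(phi(x), phi_alt(x))  # the two forms agree up to rounding
print(phi(phi(x)))         # applying phi twice returns x up to rounding
```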







The a-posteriori information value, which is also called a-posteriori log-likelihood ratio (LLR), for bit i, Λi, is given by:







$$\Lambda_i = \sum_{l \in B_i} R_{l,i} + \lambda_i.$$
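The conventional message updates of equations (3) and (4), together with the a-posteriori LLR, can be sketched as follows. This is a floating-point illustration under stated assumptions, not the disclosed hardware: the dictionary layout, the small guard inside `phi`, the toy graph and the example LLR values are all hypothetical.

```python
import math


def phi(x):
    """phi(x) = -log(tanh(x/2)); the lower clamp is an assumption to avoid log(0)."""
    return -math.log(math.tanh(max(x, 1e-12) / 2.0))


def check_to_bit(Q, C):
    """Equation (4): R[(j, i)] from the messages Q[(l, j)] of all other bits l != i."""
    R = {}
    for j, bits in enumerate(C):
        for i in bits:
            sign, total = 1.0, 0.0
            for l in bits:
                if l != i:
                    sign *= -1.0 if Q[(l, j)] < 0 else 1.0
                    total += phi(abs(Q[(l, j)]))
            R[(j, i)] = sign * phi(total)
    return R


def bit_to_check(R, B, lam):
    """Equation (3): Q[(i, j)] from the a-priori LLR and all other checks l != j."""
    return {(i, j): lam[i] + sum(R[(l, i)] for l in checks if l != j)
            for i, checks in enumerate(B) for j in checks}


def a_posteriori(R, B, lam):
    """A-posteriori LLR: a-priori LLR plus all incoming check-to-bit messages."""
    return [lam[i] + sum(R[(l, i)] for l in checks) for i, checks in enumerate(B)]


# Hypothetical toy graph (0-based indices) and a-priori LLRs; bit 2 disagrees.
B = [[0, 2], [0, 1], [1, 2], [0], [1], [2]]
C = [[0, 1, 3], [1, 2, 4], [0, 2, 5]]
lam = [2.0, 2.0, -0.5, 2.0, 2.0, 2.0]

Q0 = {(i, j): lam[i] for i, checks in enumerate(B) for j in checks}
R1 = check_to_bit(Q0, C)
Q1 = bit_to_check(R1, B, lam)
posterior = a_posteriori(R1, B, lam)
print(posterior)
```

In this example the single negative a-priori value is outweighed by the two check-to-bit messages that bit node receives, so every a-posteriori value is positive after one pass.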






LDPC Decoder

A significant challenge when implementing the sum-product algorithm for decoding LDPC codes is managing the passing of the messages. As the functionality of both the check and bit nodes is relatively simple, their respective realizations involve only a small number of gates. The main issue is the implementation of the bandwidth required for passing messages between the functional nodes.


Hardware-Sharing Decoder Architecture



FIG. 3 is a block diagram of an exemplary hardware-sharing LDPC decoder architecture 300. As shown in FIG. 3, the generalized LDPC decoder architecture 300 comprises a number of functional units 310, 320 implementing either the check or bit node functionality, respectively, and a memory fabric 350 to store the messages and realize the graph connectivity. Control logic 330 controls the configuration of the memory fabric 350. For a detailed discussion of an implementation of a hardware-sharing LDPC decoder architecture 300, see, for example, E. Yeo et al., “VLSI Architectures for Iterative Decoders in Magnetic Recording Channels,” IEEE Trans. On Magnetics, Vol. 37, No. 2, 748-755 (March 2001).


It has been recognized that such a hardware-sharing architecture reduces the area of the decoder.


Modified LDPC Parity Check Equations


The present invention recognizes that the components of the above LDPC parity check equations can be reorganized to yield improvements in the memory and clock cycle requirements. The check node computation shown in equation (4) can be separated into several components, as follows:





ρi,j=φ(|Qi,j|)   (5)


where ρi,j is thus a transformed magnitude of the bit-to-check node message Qi,j between bit node i and check node j and “| |” is the notation for magnitude.





σi,j=sign(Qi,j)   (6)


Thus, σi,j is the sign of the bit-to-check node message Qi,j between bit node i and check node j.









$$P_j = \sum_{l \in C_j} \rho_{l,j} \qquad (7)$$







Pj is computed for a check node j as the sum of the transformed magnitudes of the bit-to-check node messages for all the bit nodes connected to the check node j.









$$S_j = \prod_{l \in C_j} \sigma_{l,j} \qquad (8)$$







Sj for a given check node j is thus a product of the sign of the bit-to-check node messages for all the bit nodes connected to the check node j.

Then, the magnitude and sign of the message from check node j to bit node i are given by:





|Rj,i|=φ(Pj−ρi,j)   (9)





sign(Rj,i)=Sj·σi,j   (10)


Thus, while the conventional computation of the check-to-bit node messages according to equation (4) excludes the current bit node i from the computation (l ∈ Cj, l≠i), the present invention computes an intermediate value Pj as the sum of the transformed magnitudes ρl,j of the bit-to-check node messages for all the bit nodes l connected to the check node j, and then subtracts ρi,j from Pj to compute the magnitude of the message Rj,i from check node j to bit node i.
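The equivalence between the conventional exclusion of bit node i and the reorganized computation of equations (5) to (10) can be checked numerically. The sketch below is illustrative only: signs are held as ±1 values, the input list is a hypothetical set of messages arriving at one check node, and floating point stands in for the fixed-point arithmetic of the exemplary embodiment.

```python
import math


def phi(x):
    """phi(x) = -log(tanh(x/2)), valid for x > 0."""
    return -math.log(math.tanh(x / 2.0))


def check_node_direct(q):
    """Equation (4): for every i, recompute the sum and sign product over l != i."""
    out = []
    for i in range(len(q)):
        others = [v for l, v in enumerate(q) if l != i]
        sign = math.prod(-1 if v < 0 else 1 for v in others)
        out.append(sign * phi(sum(phi(abs(v)) for v in others)))
    return out


def check_node_shared(q):
    """Equations (5)-(10): compute P_j and S_j once, then remove each bit's own term."""
    rho = [phi(abs(v)) for v in q]            # equation (5)
    sigma = [-1 if v < 0 else 1 for v in q]   # equation (6)
    P = sum(rho)                              # equation (7)
    S = math.prod(sigma)                      # equation (8)
    # Equation (9) gives the magnitude; equation (10) holds because sigma * sigma = 1.
    return [S * s * phi(P - r) for r, s in zip(rho, sigma)]


q_example = [1.5, -0.7, 2.2, -3.1]  # hypothetical messages Q_{l,j} into one check node j
direct = check_node_direct(q_example)
shared = check_node_shared(q_example)
print(direct)
print(shared)
```

The direct form touches every other message for each output, whereas the shared form needs one pass to build the totals and one subtraction per output, which is the property the serial architecture relies on.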


In the exemplary embodiment, the bit node computations are performed in 2's-complement and the check node computations are performed in sign-magnitude (SM).
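A minimal sketch of the two number-format conversions is given below. The word width, the function names and in particular the saturation of the most negative 2's-complement value are assumptions for illustration; the specification does not state how that corner case is handled.

```python
def twos_to_sign_magnitude(value, bits):
    """Split an integer held in `bits`-bit 2's complement into (sign_bit, magnitude)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError("value does not fit in the given word width")
    sign_bit = 1 if value < 0 else 0
    # Assumption: -2^(bits-1) has no sign-magnitude counterpart, so it saturates.
    magnitude = min(abs(value), hi)
    return sign_bit, magnitude


def sign_magnitude_to_twos(sign_bit, magnitude):
    """Recombine a sign bit and a magnitude into a signed integer."""
    return -magnitude if sign_bit else magnitude


print(twos_to_sign_magnitude(-5, 6))
print(sign_magnitude_to_twos(1, 5))
```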


LDPC Decoder With Serial Sum-Product Architecture



FIG. 4 is a block diagram of an LDPC decoder 400 incorporating features of the present invention. The exemplary embodiment of the LDPC decoder 400 shown in FIG. 4 illustrates an example where dc equals 3, and Bi equals {j1, j2, j3}. The elements 430, 435, 440, 450, 470, 475, 478 and 480 form the check node update block, comprising parallel check node update units. At each time cycle, computations are performed pertaining to one bit node. Thus, it takes n cycles to perform complete computations for bit nodes 1 to n. Assume that bit node 1 is computed during the 1st time cycle. Then at the (n·(k−1)+i)th time cycle, the bit node update unit 410 produces the dc messages $Q_{i,j}^{k}$, $j \in B_i$, corresponding to the ith bit node at the kth iteration. These dc messages are converted from 2's-complement to sign-magnitude number representation at stage 430. The magnitude parts are sent to parallel transformation units 435-1 through 435-3 that perform the function φ, and the outputs from these units 435 are $\rho_{i,j}^{k}$, $j \in B_i$, as defined by equation (5).


These $\rho_{i,j}^{k}$, $j \in B_i$, are fed to dc transformed magnitude update units 440 that consist of adders and read/write circuitry. These units compute the sum $P_j^{k}$ over all bit nodes connected to the check node j; $P_j^{k}$ is then read out in the next iteration. The parallel transformed magnitude update units 440 can access any dc elements from m memory elements in the memory 460, where each element stores q bits, where q is the number of bits used to represent $P_j^{k}$ in the exemplary embodiment. These parallel transformed magnitude update units 440 update the relevant memory elements such that at the end of an iteration (i.e., the (n·k)th time cycle) the sums $P_j^{k}$, j=1 . . . m, as defined in equation (7), are stored in the m memory elements. In other words, these memory elements keep a running sum of the relevant ρ values.


If the intermediate value in the memory element j for the kth iteration is given by $P_j^{k}(i)$ for i=1 . . . n, then the running sum at the (n·(k−1)+i)th time instance is given by:











$$P_j^{k}(i) = \begin{cases} P_j^{k}(i-1) + \rho_{i,j}^{k} & \text{if } j \in B_i \\ P_j^{k}(i-1) & \text{else} \end{cases} \qquad (11)$$







Then, at the end of an iteration, such as at (n·k)th time instance:






$$P_j^{k} = P_j^{k}(n) \qquad (12)$$


The signs of the bit-to-check node messages $Q_{i,j}^{k}$, $j \in B_i$, which are $\sigma_{i,j}^{k}$, $j \in B_i$, as defined by equation (6), are processed as follows. Similar to the procedure discussed above, the $\sigma_{i,j}^{k}$, $j \in B_i$, are fed to another set of dc sign update units 450 that consist of XOR gates and read/write circuitry. These parallel sign update units update the relevant memory elements in the memory 460 such that at the end of an iteration, $S_j^{k}$, j=1 . . . m, as defined in equation (8), are stored in m memory elements, where each memory element stores one bit. In other words, these memory elements keep a running product of the sign-bit σ values. The product is obtained by the XOR gates. As explained in the previous paragraph, if the intermediate value of the memory element j for the kth iteration is given by $S_j^{k}(i)$ for i=1 . . . n, then the running product is given by:











$$S_j^{k}(i) = \begin{cases} S_j^{k}(i-1) \cdot \sigma_{i,j}^{k} & \text{if } j \in B_i \\ S_j^{k}(i-1) & \text{else} \end{cases} \qquad (13)$$







Then, at the end of an iteration, such as at (n·k)th time instance:






$$S_j^{k} = S_j^{k}(n) \qquad (14)$$
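The running sum of equations (11) and (12) and the running sign product of equations (13) and (14) can be sketched as one loop over the bit nodes, one bit node per time cycle. The data layout is an assumption: `rho[i][slot]` and `sigma_bits[i][slot]` hold the values for the check node `B[i][slot]`, a sign bit of 1 denotes a negative message so that XOR realizes the product of signs, and the accumulators are assumed to start from 0 at the beginning of an iteration.

```python
def accumulate_iteration(B, rho, sigma_bits, m):
    """Return (P, S) after all n bit nodes of one iteration have been processed."""
    P = [0.0] * m   # running sums, equation (11)
    S = [0] * m     # running sign products held as bits, equation (13)
    for i, checks in enumerate(B):             # one bit node per time cycle
        for slot, j in enumerate(checks):      # the dc parallel update units
            P[j] += rho[i][slot]
            S[j] ^= sigma_bits[i][slot]
    return P, S                                # equations (12) and (14)


# Hypothetical regular example: n = 3 bit nodes, m = 3 check nodes, dc = 2.
B = [[0, 1], [1, 2], [0, 2]]
rho = [[0.5, 0.25], [1.0, 2.0], [4.0, 8.0]]
sigma_bits = [[1, 0], [1, 1], [0, 1]]

P, S = accumulate_iteration(B, rho, sigma_bits, m=3)
print(P, S)
```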


The $\rho_{i,j}^{k}$, $j \in B_i$, computed at each time cycle are also sent to a first in first out (FIFO) buffer 420-1 (buffer-1 in FIG. 4). The FIFO 420-1 consists of dc·q columns (there are dc number of ρ values generated at each cycle and each ρ is represented using q bits in the exemplary embodiment) and n rows. At time cycle n·(k−1)+i, the set of $\rho_{i,j}^{k}$, $j \in B_i$, is fed to the back of the FIFO 420-1 and a set of $\rho_{i,j}^{k-1}$, $j \in B_i$, from the previous iteration is read from the front of the buffer 420-1. At the (n·k)th time cycle, all the $P_j^{k}$, j=1 . . . m, are available in the memory 460 and all the $\rho_{i,j}^{k}$, i=1 . . . n, $j \in B_i$, are available in the first FIFO buffer 420-1. Then, the topmost row of the FIFO 420-1 holds $\rho_{1,j}^{k}$, $j \in B_1$, the row after holds $\rho_{2,j}^{k}$, $j \in B_2$, and the last row of the buffer 420-1 consists of $\rho_{n,j}^{k}$, $j \in B_n$.


The $\sigma_{i,j}^{k}$, $j \in B_i$, computed at each time cycle are fed to another FIFO buffer 420-2 (buffer-2 in FIG. 4). The second FIFO buffer 420-2 consists of dc number of one bit columns (since each σi,j is represented using one bit) and n rows. At time cycle n·(k−1)+i, the set of $\sigma_{i,j}^{k}$, $j \in B_i$, is fed to the back of the FIFO 420-2 and a set of $\sigma_{i,j}^{k-1}$, $j \in B_i$, from the previous iteration is read from the front of the buffer 420-2.
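The behavior of the two FIFO buffers can be sketched with a fixed-depth queue: at every time cycle one row is written at the back and one row, written n cycles earlier, is read from the front. The class name, the zero fill used before the first iteration and the row contents are assumptions for illustration.

```python
from collections import deque


class RowFifo:
    """n-row FIFO; each cycle pushes one row at the back and pops one from the front."""

    def __init__(self, n, width, fill=0):
        self.rows = deque([fill] * width for _ in range(n))

    def cycle(self, new_row):
        old_row = self.rows.popleft()    # row written n cycles ago (iteration k-1)
        self.rows.append(list(new_row))  # row for the current iteration k
        return old_row


fifo = RowFifo(n=3, width=2)
first_pass = [fifo.cycle([i, i]) for i in (1, 2, 3)]   # iteration 1 writes rows 1..3
second_pass = [fifo.cycle([i, i]) for i in (4, 5, 6)]  # iteration 2 reads them back
print(first_pass, second_pass)
```

The row read at cycle i of one iteration is therefore the row written at cycle i of the previous iteration, which matches the pairing of the kth and (k−1)th values described above.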


Now, the procedure is explained for the computation of the magnitude of the check-to-bit node messages $R_{j,i}^{k-1}$ from the $P_j^{k-1}$, j=1 . . . m, saved in the memory 460 and the $\rho_{i,j}^{k-1}$, $j \in B_i$, saved in the first FIFO buffer 420-1 from the previous iteration k−1. The required $P_j^{k-1}$, $j \in B_i$, are read from memory 460 and the $\rho_{i,j}^{k-1}$, $j \in B_i$, are read from the first FIFO buffer 420-1. Then, the differences $(P_j^{k-1} - \rho_{i,j}^{k-1})$ are computed for each j∈Bi by parallel transformed magnitude subtraction units 470-1 through 470-3 and the corresponding results are passed to parallel transformation units 475-1 through 475-3 that perform the function φ. The outputs of these transformation units 475 are the magnitudes of the corresponding messages from the dc check nodes to the ith bit node at the (k−1)th iteration, namely, $|R_{j,i}^{k-1}|$, $j \in B_i$, according to equation (9).


The sign of the check-to-bit node messages at the (k−1)th iteration, $\operatorname{sign}(R_{j,i}^{k-1})$, is computed in a similar manner. The required $S_j^{k-1}$, $j \in B_i$, are read from memory 460 and the $\sigma_{i,j}^{k-1}$, $j \in B_i$, are read from the second FIFO buffer 420-2. Then, the products $S_j^{k-1} \cdot \sigma_{i,j}^{k-1}$ are computed using parallel sign processing units 478-1 through 478-3 (each using an XOR gate in the exemplary embodiment) for each j∈Bi. These products are the sign bits of the corresponding messages from the dc check nodes to the ith bit node at the (k−1)th iteration, namely $\operatorname{sign}(R_{j,i}^{k-1})$, $j \in B_i$, according to equation (10).


The sign and magnitude of the check-to-bit node messages are passed through dc sign-magnitude to 2's-complement conversion units 480. The results from these units 480 are $R_{j,i}^{k-1}$, $j \in B_i$, which in turn are the inputs to the bit node update unit 410. The bit node update unit 410 computes the bit-to-check node messages $Q_{i,j}^{k}$ for the kth iteration according to (see also equation (3)):










$$Q_{i,j}^{k} = \sum_{l \in B_i,\, l \neq j} R_{l,i}^{k-1} + \lambda_i \qquad (15)$$
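The data flow described above can be put together in a behavioral sketch: per time cycle, the totals of iteration k−1 and the FIFO row of iteration k−1 yield the check-to-bit messages, the bit node update produces the new bit-to-check messages, and these update the totals and the FIFO for iteration k. This is a floating-point model under stated assumptions, not the disclosed circuit: the clipping inside `phi`, the zero messages used in the first iteration, the merged ρ/σ FIFO and all names are hypothetical.

```python
import math
from collections import deque


def phi(x):
    """phi(x) = -log(tanh(x/2)); the clipping range is an assumption for numerical safety."""
    x = min(max(x, 1e-9), 30.0)
    return -math.log(math.tanh(x / 2.0))


def serial_decode(B, m, lam, iterations):
    """Return the a-posteriori values seen by the bit node update in the last iteration."""
    n = len(B)
    P_prev, S_prev = [0.0] * m, [0] * m               # totals of iteration k-1
    fifo = deque((None, None) for _ in range(n))      # rows (rho, sigma) of iteration k-1
    posterior = list(lam)
    for _ in range(iterations):
        P_cur, S_cur = [0.0] * m, [0] * m             # totals of iteration k
        for i, checks in enumerate(B):                # one bit node per time cycle
            rho_old, sig_old = fifo.popleft()
            if rho_old is None:                       # assumption: no messages yet
                R = [0.0] * len(checks)
            else:
                R = []
                for slot, j in enumerate(checks):
                    mag = phi(P_prev[j] - rho_old[slot])     # equation (9)
                    negative = S_prev[j] ^ sig_old[slot]     # equation (10) via XOR
                    R.append(-mag if negative else mag)
            total = lam[i] + sum(R)
            posterior[i] = total
            Q = [total - r for r in R]                # equation (15)
            rho = [phi(abs(q)) for q in Q]            # equation (5)
            sig = [1 if q < 0 else 0 for q in Q]      # equation (6), 1 means negative
            for slot, j in enumerate(checks):
                P_cur[j] += rho[slot]                 # equation (11)
                S_cur[j] ^= sig[slot]                 # equation (13)
            fifo.append((rho, sig))
        P_prev, S_prev = P_cur, S_cur                 # equations (12) and (14)
    return posterior


# Hypothetical toy graph (0-based indices) and a-priori LLRs.
B = [[0, 2], [0, 1], [1, 2], [0], [1], [2]]
lam = [2.0, 2.0, -0.5, 2.0, 2.0, 2.0]
after_one = serial_decode(B, 3, lam, iterations=1)
after_two = serial_decode(B, 3, lam, iterations=2)
print(after_two)
```

Keeping separate `P_prev`/`S_prev` and `P_cur`/`S_cur` arrays mirrors the need to hold totals for iterations k and k−1 at the same time.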







Memory Requirements and Throughput


The exemplary first FIFO buffer 420-1 has a size of n·dc·q bits and the exemplary second FIFO buffer 420-2 has a size of n·dc bits.


The amount of memory required for Pj, j equal to 1 . . . m is 2·q·m bits, where the multiple of two is for values pertaining to iterations k and k−1.


The amount of memory required for Sj, j equal to 1 . . . m is 2·m bits, where the multiple of two is for values pertaining to iterations k and k−1.


The total memory requirement is (2·m+n·dc)·(q+1) bits.


At each time cycle, the disclosed method computes dc check-to-bit node messages and dc bit-to-check node messages. Thus, it takes n cycles to process length n blocks of data per iteration with one bit node update unit 410.


When compared with standard LDPC decoding architectures, the disclosed architecture only requires (2·m+n·dc)·(q+1) bits of memory space and takes only n cycles per iteration. It is noted that a typical serial sum-product architecture requires 2·n·dc·(q+1) bits of memory space and m+n cycles per iteration.
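The memory figures above can be tallied as a short worked example. The parameter values n = 1024, m = 512, dc = 3 and q = 5 are hypothetical and chosen only to make the arithmetic concrete; the function names are likewise illustrative.

```python
def disclosed_memory_bits(n, m, dc, q):
    """Sum of the four storage components listed for the disclosed architecture."""
    fifo_magnitudes = n * dc * q   # first FIFO buffer 420-1
    fifo_signs = n * dc            # second FIFO buffer 420-2
    sums = 2 * q * m               # P_j for iterations k and k-1
    signs = 2 * m                  # S_j for iterations k and k-1
    return fifo_magnitudes + fifo_signs + sums + signs


def typical_serial_memory_bits(n, dc, q):
    """Memory quoted for a typical serial sum-product architecture."""
    return 2 * n * dc * (q + 1)


n, m, dc, q = 1024, 512, 3, 5      # hypothetical code parameters
disclosed = disclosed_memory_bits(n, m, dc, q)
typical = typical_serial_memory_bits(n, dc, q)
print(disclosed, typical)          # bits of memory
print(n, m + n)                    # cycles per iteration: disclosed vs. typical
```

For these assumed parameters the component sum equals (2·m+n·dc)·(q+1), and the saving grows as m becomes small relative to n·dc.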


Concatenated LDPC Decoder With Serial Sum-Product Architecture


LDPC codes can be used for channels impaired by intersymbol interference (ISI) and noise to improve the bit error rate. FIG. 5 shows a typical communication system 500 comprising an LDPC encoder 510, an ISI channel 520 (that introduces intersymbol interference and noise), a soft-input soft-output (SISO) detector 615 and an LDPC decoder 600. In FIG. 5, the LDPC decoder 600 is concatenated with the SISO detector 615, as discussed further below in conjunction with FIG. 6. The SISO detector 615 takes in the channel output and the extrinsic information values from the LDPC decoder 600, which are used as a-priori information values. The SISO detector 615 gives out extrinsic information values, which are used as a-priori information values λi by the LDPC decoder 600 for the next iteration. Extrinsic information values from the LDPC decoder are used as a-priori information values for the SISO detector and extrinsic information values from the SISO detector are used as a-priori information values for the LDPC decoder. The SISO detector considers the ISI channel to compute the extrinsic information values, for example, using the MAP algorithm or the soft output Viterbi algorithm (SOVA), as known in the art, while the LDPC decoder considers the LDPC code to compute the extrinsic information values. An iteration of the system consists of processing the data once by the SISO detector 615 and once by the LDPC decoder 600.



FIG. 6 shows the disclosed concatenated SISO detector and LDPC decoder architecture in more detail. The SISO detector 615 takes in the channel output and the extrinsic information values from the LDPC decoder 600 and gives out extrinsic information values, which are used as a-priori information values λi by the LDPC decoder 600 for the next iteration. The present invention exploits the fact that at every pass through the LDPC decoder 600, the dc messages from bit node i to its check nodes, namely, $\hat{Q}_{i,j}$, $\forall j \in B_i$, are equal to the a-priori information value or a-priori LLR λi of the bit node i, i.e.:





$$\hat{Q}_{i,j} = \lambda_i, \quad \forall j \in B_i. \qquad (16)$$


The check node computation is broken into parts in a similar manner as described above.











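$$\hat{\rho}_i = \phi\left( \left| \lambda_i \right| \right) \qquad (17)$$

$$\hat{\sigma}_i = \operatorname{sign}\left( \lambda_i \right) \qquad (18)$$

$$\hat{P}_j = \sum_{l \in C_j} \hat{\rho}_l \qquad (19)$$

$$\hat{S}_j = \prod_{l \in C_j} \hat{\sigma}_l \qquad (20)$$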







The sign and magnitude of the message from check node j to bit node i are given by:





$$\left| \hat{R}_{j,i} \right| = \phi\left( \hat{P}_j - \hat{\rho}_i \right) \qquad (21)$$

$$\operatorname{sign}\left( \hat{R}_{j,i} \right) = \hat{S}_j \cdot \hat{\sigma}_i \qquad (22)$$


Using these $\hat{R}_{j,i}$, $j \in B_i$, the extrinsic information value from the LDPC decoder, which is passed to the SISO detector 615, can be computed by the extrinsic information computation unit 610:










$$\Lambda_{ext,i} = \sum_{l \in B_i} \hat{R}_{l,i}. \qquad (23)$$







This extrinsic information value is used by the SISO detector 615 as an a-priori information value.
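Equations (17) to (23) can be sketched as a single pass that starts from the a-priori values alone. The sketch processes a whole block at once rather than pipelining it through FIFO buffers, and the function names, the guard inside `phi`, the toy graph and the LLR values are assumptions made for illustration.

```python
import math


def phi(x):
    """phi(x) = -log(tanh(x/2)); the lower clamp is an assumption to avoid log(0)."""
    return -math.log(math.tanh(max(x, 1e-12) / 2.0))


def extrinsic_from_priors(B, m, lam):
    """Return the extrinsic value of equation (23) for every bit node."""
    rho = [phi(abs(x)) for x in lam]            # equation (17)
    sig = [1 if x < 0 else 0 for x in lam]      # equation (18), 1 means negative
    P, S = [0.0] * m, [0] * m
    for i, checks in enumerate(B):
        for j in checks:
            P[j] += rho[i]                      # equation (19)
            S[j] ^= sig[i]                      # equation (20) via XOR
    extrinsic = []
    for i, checks in enumerate(B):
        total = 0.0
        for j in checks:
            mag = phi(P[j] - rho[i])            # equation (21)
            negative = S[j] ^ sig[i]            # equation (22)
            total += -mag if negative else mag
        extrinsic.append(total)                 # equation (23)
    return extrinsic


# Hypothetical toy graph (0-based indices) and a-priori LLRs from a detector.
B = [[0, 2], [0, 1], [1, 2], [0], [1], [2]]
lam = [2.0, 2.0, -0.5, 2.0, 2.0, 2.0]
ext = extrinsic_from_priors(B, 3, lam)
print(ext)
```

Only one transformed magnitude and one sign bit are kept per bit node, rather than one per edge, because every outgoing message of a bit node equals its a-priori value.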



FIG. 6 shows a block diagram for the proposed architecture 600 for the concatenated system, where dc equals 3, and Bi equals {j1,j2,j3}. Each a-priori LLR λi pertaining to bit node i, which is equal to the extrinsic information value given out from the SISO detector 615, is passed to a 2's-complement to sign-magnitude conversion unit 630. The magnitude is sent to the transformation unit 635 that performs the function φ and the output from unit 635 is $\hat{\rho}_i^{k}$, as defined by equation (17). The transformed magnitude $\hat{\rho}_i^{k}$ for the kth iteration and ith bit node is fed to dc transformed magnitude update units 640 that update $\hat{P}_j^{k}$, $\forall j \in B_i$, in the memory. Similarly, the sign $\hat{\sigma}_i^{k}$ is passed into another set of dc sign update units 650 that update $\hat{S}_j^{k}$, $\forall j \in B_i$, in the memory. Moreover, $\hat{\rho}_i^{k}$ and $\hat{\sigma}_i^{k}$ are fed into a first FIFO buffer 620-1 and a second FIFO buffer 620-2, respectively.


The check-to-bit node messages for the (k−1)th iteration, namely $\hat{R}_{j,i}^{k-1}$, $j \in B_i$, are computed in a manner similar to the procedure described above in conjunction with FIG. 4. Only one value, however, is used from the first FIFO buffer 620-1 ($\hat{\rho}_i^{k-1}$ is used to obtain the magnitude) and one value is used from the second FIFO buffer 620-2 ($\hat{\sigma}_i^{k-1}$ is used to obtain the sign). Thus, the memory requirements for the FIFO buffers 620 are smaller by a factor of dc than for the stand-alone architecture described above in conjunction with FIG. 4.


Memory Requirements and Throughput


The exemplary first FIFO buffer 620-1 has a size of n·q bits and the exemplary second FIFO buffer 620-2 has a size of n bits.


The amount of memory required for Pj, j equal to 1 . . . m is 2·q·m bits.


The amount of memory required for Sj, j equal to 1 . . . m is 2·m bits.


The total memory requirement is (2·m+n)·(q+1) bits.


At each time cycle, the disclosed method computes dc check-to-bit node messages and the extrinsic information value. Thus, it takes n cycles to process length n blocks of data per iteration.


The disclosed architecture performs the sum-product algorithm as defined by the equations (17) to (23). The disclosed method has better error rate performance than the less optimal method proposed by Wu and Burd, where only the minimum LLR values are considered in the sum-product algorithm. Moreover, the method proposed by Wu and Burd applies only to a concatenated LDPC decoder system. The present invention requires only (2·m+n)·(q+1) bits of memory space and takes only n cycles per iteration.


Notation


The following notation has been used herein:


i is the index of a bit node;


j is the index of a check node;


k is the index of the iteration;


Qi,j is the message from bit node i to check node j;


Rj,i is the message from check node j to bit node i;


λi is the a-priori information value or a-priori log-likelihood ratio (LLR) pertaining to bit i;


Λi is the a-posteriori information value or a-posteriori LLR pertaining to bit i;


Λext,i is the extrinsic information value or extrinsic LLR pertaining to bit i;


Bi is the set of check nodes connected to bit node i;


Cj is the set of bit nodes connected to check node j;


n is the number of bit nodes;


m is the number of parity check nodes;


dr is the row weight of the parity check matrix;


dc is the column weight of the parity check matrix;


bit nodes are 1 . . . n;


check nodes are 1 . . . m.


While exemplary embodiments of the present invention have been described with respect to digital logic units, as would be apparent to one skilled in the art, various functions may be implemented in the digital domain as processing steps in a software program, in hardware by circuit elements or state machines, or in combination of both software and hardware. Such software may be employed in, for example, a digital signal processor, micro-controller, or general-purpose computer. Such hardware and software may be embodied within circuits implemented within an integrated circuit.


Thus, the functions of the present invention can be embodied in the form of methods and apparatuses for practicing those methods. One or more aspects of the present invention can be embodied in the form of program code, for example, whether stored in a storage medium, loaded into and/or executed by a machine, or transmitted over some transmission medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code segments combine with the processor to provide a device that operates analogously to specific logic circuits.


As is known in the art, the methods and apparatus discussed herein may be distributed as an article of manufacture that itself comprises a computer readable medium having computer readable code means embodied thereon. The computer readable program code means is operable, in conjunction with a computer system, to carry out all or some of the steps to perform the methods or create the apparatuses discussed herein. The computer readable medium may be a recordable medium (e.g., floppy disks, hard drives, compact disks, memory cards, semiconductor devices, chips, application specific integrated circuits (ASICs)) or may be a transmission medium (e.g., a network comprising fiber-optics, the world-wide web, cables, or a wireless channel using time-division multiple access, code-division multiple access, or other radio-frequency channel). Any medium known or developed that can store information suitable for use with a computer system may be used. The computer-readable code means is any mechanism for allowing a computer to read instructions and data, such as magnetic variations on a magnetic media or height variations on the surface of a compact disk.


The memories and buffers could be distributed or local and the processors could be distributed or singular. The memories and buffers could be implemented as an electrical, magnetic or optical memory, or any combination of these or other types of storage devices. Moreover, the terms “memory,” “buffer” and “FIFO buffer” should be construed broadly enough to encompass any information able to be read from or written to a medium. With this definition, information on a network is still within a memory because the associated processor can retrieve the information from the network.


It is to be understood that the embodiments and variations shown and described herein are merely illustrative of the principles of this invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention.

Claims
  • 1. A method for decoding a code that can be described using a bipartite graph having interconnected bit nodes and check nodes, comprising: computing a magnitude of a check-to-bit node message from check node j to bit node i, wherein said magnitude is based on a sum of transformed magnitudes of bit-to-check node messages for a plurality of bit nodes connected to said check node j, less a transformed magnitude of the bit-to-check node message for bit node i and check node j.
  • 2. The method of claim 1, wherein said code comprises an LDPC code.
  • 3. The method of claim 1, wherein said bipartite graph is a graphical representation of a parity check matrix.
  • 4. The method of claim 1, further comprising the step of computing a sign of said check-to-bit node message from check node j to bit node i, by multiplying a product Sj of the sign of bit-to-check node messages among a plurality of bit nodes connected to the check node j by the sign of the bit-to-check node message for bit node i and check node j.
  • 5. A decoder for decoding a code that can be described using a bipartite graph having interconnected bit nodes and check nodes, comprising: a bit node update unit; and a plurality of parallel check node update units connected to said bit node update unit to compute a plurality of check-to-bit node messages.
  • 6. The decoder of claim 5, wherein said plurality of parallel check node update units are connected to a memory.
  • 7. The decoder of claim 5, wherein one of said check node update units computes a magnitude of said check-to-bit node message based on a sum of transformed magnitudes of a plurality of bit-to-check node messages less a single transformed magnitude of a bit-to-check node message.
  • 8. The decoder of claim 5, wherein one of said check node update units comprises an adder and read/write circuitry connected to a memory to compute a sum of transformed magnitudes of a plurality of bit-to-check node messages.
  • 9. The decoder of claim 5, wherein one of said check node update units comprises a subtractor and read circuitry connected to a memory to compute a difference between a sum of transformed magnitudes and a single transformed magnitude.
  • 10. The decoder of claim 5, wherein a sum of transformed magnitudes of a plurality of bit-to-check node messages is obtained from a memory and a single transformed magnitude of a bit-to-check node message is obtained from a buffer.
  • 11. The decoder of claim 5, wherein one of said check node update units computes a sign of a check-to-bit node message by multiplying a product Sj of the sign among a plurality of bit-to-check node messages by the sign of a single bit-to-check node message.
  • 12. The decoder of claim 5, wherein one of said check node update units comprises an XOR gate and read/write circuitry connected to a memory to compute a product Sj of the sign among a plurality of the bit-to-check node messages.
  • 13. The decoder of claim 5, wherein one of said check node update units comprises an XOR gate and read circuitry connected to a memory to multiply a product Sj of the sign among a plurality of the bit-to-check node messages by the sign of a single bit-to-check node message.
  • 14. A method for decoding a code that can be described using a bipartite graph having interconnected bit nodes and check nodes, comprising: computing a magnitude of a check-to-bit node message from check node j to bit node i wherein said magnitude is based on a sum of transformed magnitudes of a-priori information values for a plurality of bit nodes connected to said check node j, less a transformed magnitude of the a-priori information value for bit node i.
  • 15. The method of claim 14, wherein said a-priori information values are generated by a soft output detector.
  • 16. The method of claim 14, wherein said computing step is performed by an LDPC decoder that is concatenated with a soft output detector.
  • 17. The method of claim 14, wherein said code comprises an LDPC code.
  • 18. The method of claim 14, wherein said bipartite graph is a graphical representation of a parity check matrix.
  • 19. The method of claim 14, further comprising the step of computing a sign of said check-to-bit node message from check node j to bit node i, by multiplying a product Sj of the sign of a-priori information values for a plurality of bit nodes connected to the check node j by the sign of the a-priori information value for bit node i.
  • 20. The decoder of claim 5, wherein one of said check node update units computes a magnitude of said check-to-bit node message based on a sum of transformed magnitudes of a-priori information values for a plurality of bit nodes less a single transformed magnitude of an a-priori information value of a bit node.
  • 21. The decoder of claim 5, wherein one of said check node update units comprises an adder and read/write circuitry connected to a memory to compute a sum of the transformed magnitudes of a-priori information values.
  • 22. The decoder of claim 5, wherein one of said check node update units comprises a subtractor and read circuitry connected to a memory to compute a difference between a sum of transformed magnitudes of a-priori information values and a single transformed magnitude of an a-priori information value.
  • 23. The decoder of claim 5, wherein a sum of transformed magnitudes of a-priori information values is obtained from a memory and a single transformed magnitude of an a-priori information value is obtained from a buffer.
  • 24. The decoder of claim 5, wherein one of said check node update units computes a sign of a check-to-bit node message by multiplying a product Sj of the sign of a-priori information values among a plurality of bit nodes by the sign of a single a-priori information value of a bit node.
  • 25. The decoder of claim 5, wherein one of said check node update units comprises an XOR gate and read/write circuitry connected to a memory to compute a product Sj of the sign of a-priori information values among a plurality of bit nodes.
  • 26. The decoder of claim 5, wherein one of said check node update units comprises an XOR gate and read circuitry connected to a memory to multiply a product Sj of the sign of a-priori information values among a plurality of bit nodes by the sign of a single a-priori information value of a bit node.
  • 27. The decoder of claim 5, wherein said decoder is concatenated with a soft output detector.