Method and system for replica group-shuffled iterative decoding of quasi-cyclic low-density parity check codes

Abstract
A block of symbols is decoded using iterative belief propagation. A set of belief registers store beliefs that a corresponding symbol in the block has a certain value. Check processors determine output check-to-bit messages from input bit-to-check messages by message-update rules. Link processors connect the set of belief registers to the check processors. Each link processor has an associated message register. Messages and beliefs are passed between the set of belief registers and the check processors via the link processors for a predetermined number of iterations while updating the beliefs to decode the block of symbols based on the beliefs at termination.
Description

BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a block diagram of prior art channel coding;



FIG. 2 is a block diagram of a prior art belief propagation decoding;



FIG. 3 is a schematic diagram of a prior art turbo-code;



FIG. 4 is a block diagram of prior art serial and parallel turbo coding;



FIG. 5 is a flow diagram of a method for generating a combined-replica, group-shuffled, iterative decoder according to an embodiment of the invention;



FIG. 6 is a schematic diagram of replicated sub-decoders;



FIG. 7 is a diagram of a combined-replica, group-shuffled, iterative decoder according to an embodiment of the invention;



FIG. 8 is a diagram of replicated sub-decoder schedules for a combined decoder for a turbo-code;



FIG. 9 is a base matrix according to an embodiment of the invention;



FIG. 10 is a factor graph according to an embodiment of the invention;



FIG. 11 is a block diagram of a system and method for encoding and decoding data according to an embodiment of the invention;



FIG. 12 is a block diagram of a VLSI decoder according to an embodiment of the invention;



FIG. 13 is a block diagram of an architecture of the decoder of FIG. 12;



FIG. 14 is a block diagram of a belief register according to an embodiment of the invention;



FIG. 15A is a block diagram of a check processor according to an embodiment of the invention;



FIGS. 15B-15C are block diagrams of comparators used by the check processor of FIG. 15A;



FIG. 16 is a block diagram of a link processor according to an embodiment of the invention;



FIG. 17 is a block diagram of a message register according to an embodiment of the invention; and



FIGS. 18A and 18B are block diagrams comparing conventional message updates with the message update according to an embodiment of the invention.





DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT


FIG. 5 shows a method for generating 500 a combined-replica, group-shuffled, iterative decoder 700 according to our invention.


The method takes as input an error-correcting code 501 and a conventional iterative decoder 502 for the error-correcting code 501. The conventional iterative decoder 502 iteratively and in parallel updates estimates of states of symbols defining the code based on previous estimates. The symbols can be binary or taken from an arbitrary alphabet. Messages in belief propagation (BP) methods and states of bits in bit-flipping (BF) decoders are examples of what we refer to generically as “symbol estimates” or simply “estimates” for the states of symbols.


We also use the terminology of “bit estimates” because, for simplicity, the symbols are assumed to be binary unless stated otherwise. However, the approach also applies to non-binary codes. Prior-art BP decoders, BF decoders, turbo-decoders, and decoders for turbo product codes are all examples of conventional iterative decoders that can be used with our invention.


To simplify this description, we use BF and BP decoders for binary LDPC codes as our primary examples of the input conventional iterative decoder 502. It should be understood that the method can be generalized to other examples of conventional iterative decoders, not necessarily binary.


In a BF decoder for a binary LDPC code, the estimates for the values of each code-word symbol are stored and updated directly. Starting with an initial estimate based on a most likely state given the channel output, each code-word bit is estimated as either 0 or 1. At every iteration, the estimates for each symbol are updated in parallel. The updates are made by checking how many parity checks associated with each bit are violated. If the number of violated checks for a bit is greater than some pre-defined threshold, then the estimate for that bit is flipped from a 0 to a 1 or vice versa.
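One parallel BF iteration of the kind just described can be sketched as follows. This is an illustrative transcription, not the claimed hardware; the function name, the matrix representation, and the fixed threshold are assumptions for the example.

```python
import numpy as np

def bf_iteration(H, bits, threshold):
    """One parallel bit-flipping iteration.

    H: (M, N) binary parity-check matrix; bits: length-N 0/1 estimates.
    A bit is flipped when the number of violated checks it participates
    in exceeds `threshold`.
    """
    syndrome = H.dot(bits) % 2      # 1 marks a violated parity check
    violations = H.T.dot(syndrome)  # per-bit count of violated checks
    flip = violations > threshold
    return np.where(flip, 1 - bits, bits)
```

With a small 3-check cycle code, a single bit error participating in two violated checks is flipped back in one iteration, while a consistent estimate is left unchanged.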


A BP decoder for a binary LDPC code functions similarly, except that instead of updating a single estimate for the value of each symbol, a set of “messages” between the symbols and the constraints in which the messages are involved are updated. These messages are typically stored as real numbers. The real numbers correspond to a log-likelihood ratio that a bit is a 0 or 1. In the BP decoder, the messages are iteratively updated according to message-update rules. The exact form of these rules is not important. The only important point is that the iterative decoder uses some set of rules to iteratively update its messages based on previously updated messages.


Constructing Multiple Sub-Decoders


In the first stage of the transformation process according to our method, multiple replicas of the group-shuffled sub-decoders are constructed. These group-shuffled sub-decoders 511 are then combined 520 into the combined-replica group-shuffled decoder 700.


Partitioning Estimates into Groups


The multiple replica sub-decoders 511 are constructed as follows. For each group-shuffled replica sub-decoder 511, the estimates that the group-shuffled sub-decoder makes for the messages or the symbol values are partitioned into groups.


An example BF decoder for a binary LDPC code has one thousand code-word bits. We can divide the bit estimates that the group-shuffled sub-decoder makes for this code in any number of ways, e.g., into ten groups of a hundred bits, or a hundred groups of ten bits, or twenty groups of fifty bits, and so forth. For the sake of simplicity, we assume hereafter that the groups are of equal size.


If the conventional iterative decoder 502 is a BP decoder for the LDPC code, the groups of messages can be partitioned in many different ways in each group-shuffled sub-decoder. We describe two preferred techniques. In the first technique, which we refer to as a “vertical partition,” the code-word symbols are first partitioned into groups, and then all messages from the same code-word symbol to the constraints are treated as belonging to the same group. In the vertical partition, the messages from constraints to symbols are treated as dependent messages, while the messages from the symbols to the constraints are treated as independent messages. Thus, all dependent messages are automatically updated whenever a group of independent messages from symbols to constraints is updated.


In the second technique, which we will refer to as a “horizontal partition,” the constraints are first partitioned into groups, and then all messages from the same constraint to the symbols are treated as belonging to the same group. In the horizontal partition, the messages from constraints to symbols are treated as the independent messages, and the messages from the symbols to the constraints are merely dependent messages. Again, all dependent messages are updated automatically whenever a group of independent messages are updated.


Other approaches for partitioning the BP messages are possible. The essential point is that for each replica of the group-shuffled sub-decoder, we define a set of independent messages that are updated in the course of the iterative decoding method, and divide the messages into some set of groups. Other dependent messages defined in terms of the independent messages are automatically updated whenever the updating of a group of independent messages completes.
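The two preferred partitions above can be made concrete with a small sketch. Messages are identified by (check m, bit n) pairs taken from the nonzero entries of H; the helper names and group representation below are illustrative assumptions, not part of any standard API.

```python
def message_edges(H):
    # All (check m, bit n) pairs where H[m][n] is nonzero.
    return [(m, n) for m, row in enumerate(H) for n, v in enumerate(row) if v]

def vertical_partition(H, bit_groups):
    # Vertical partition: independent messages are bit-to-check,
    # grouped by the group that the source bit belongs to.
    edges = message_edges(H)
    return [[(n, m) for (m, n) in edges if n in g] for g in bit_groups]

def horizontal_partition(H, check_groups):
    # Horizontal partition: independent messages are check-to-bit,
    # grouped by the group that the source check belongs to.
    edges = message_edges(H)
    return [[(m, n) for (m, n) in edges if m in g] for g in check_groups]
```

For a 2-check, 3-bit matrix, the vertical partition groups the bit-to-check messages by bit, while the horizontal partition groups the check-to-bit messages by check.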


Assigning Update Schedules to Groups


The next step in generating a single group-shuffled sub-decoder 511 assigns an update schedule for the groups of estimates. An update schedule is an ordering of the groups, which defines the order in which the estimates are updated. For example, if we want to assign an update schedule to ten groups of a hundred bits in the BF decoder, we determine which group of bits is updated first, which group is updated second, and so on, until we reach the tenth group. We refer to the sub-step of a single iteration in which a group of bit estimates is updated together as an “iteration sub-step.”


The set of groups, along with the update schedule for the groups, defines a particular group-shuffled iterative sub-decoder. Aside from the fact that the groups of estimates are updated in sub-steps according to the specified order, the group-shuffled, iterative sub-decoder functions similarly to the original conventional iterative decoder 502. For example, if the input conventional iterative decoder 502 is the BF decoder, then the new group-shuffled sub-decoder 511 uses the same bit-flipping rules as the conventional decoder.


Differences Between Replica Sub-Decoders Used in Combined Decoders


The multiple group-shuffled sub-decoders 511 may or may not be identical in terms of the way that the sub-decoders are partitioned into groups. However, the sub-decoders are different in terms of their update schedules. In fact, it is not necessary that every bit estimate is updated in every replica sub-decoder used in the combined decoder 700. However, every bit estimate is updated in at least one of the replica sub-decoders 511. We also prefer that each replica sub-decoder 511 has the same number of iteration sub-steps, so that each iteration of the combined decoder completes synchronously.



FIG. 6 shows a simple schematic example of replicated group-shuffled sub-decoders. In this example, we use three different replica sub-decoders, each having three groups of bit estimates. In this example, the groups used in each replica sub-decoder are identical, but the updating order is different.


In the first replica sub-decoder 610, the bit estimates in group 1 are updated in the first iteration sub-step, followed by the bit estimates in group 2 in the second iteration sub-step, followed by the bit estimates in group 3 in the third iteration sub-step. In the second replica sub-decoder 620, the bit estimates in group 2 are updated first, followed by the bit estimates in group 3, followed by the bit estimates in group 1. In the third replica sub-decoder 630, the bit estimates in group 3 are updated first, followed by the bit estimates in group 1, followed by the bit estimates in group 2.
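The three schedules of FIG. 6 are rotations of one group ordering, so they can be generated mechanically. A minimal sketch, with groups labeled 0 to G−1 and the function name our own: replica r updates group (s + r) mod G at sub-step s.

```python
def replica_schedules(num_groups, num_replicas):
    """Rotated update schedules: schedule r lists, per sub-step,
    the group that replica r updates."""
    return [[(s + r) % num_groups for s in range(num_groups)]
            for r in range(num_replicas)]
```

For three groups and three replicas this reproduces the orderings of sub-decoders 610, 620, and 630 (with groups relabeled 0, 1, 2), and every group is updated by exactly one replica in each sub-step.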


The idea behind our combined-replica group-shuffled decoders is described using this example. Consider the first iteration, for which the input estimate for each bit is obtained using channel information. We expect the initial input ‘reliability’ of each bit to be equal. However, after the first sub-step of the first iteration is complete, the bits that were most recently updated should be most reliable. Thus, in our example, we expect that for the first replica sub-decoder, the bit estimates in group 1 are the most reliable at the end of the first sub-step of the first iteration, while in the second replica sub-decoder, the bit estimates in group 2 are the most reliable at the end of the first sub-step of the first iteration.


In order to speed up the rate at which reliable information is propagated, it makes sense to use the most reliable estimates at each step. The general idea behind constructing a combined decoder from multiple replica group-shuffled sub-decoders is that we trade off greater complexity, e.g., logic circuits and memory, in exchange for an improvement in processing speed. In many applications, the speed at which the decoder functions is much more important than the complexity of the decoder, so this trade-off makes sense.


Combining Multiple Replica Sub-Decoders


The decoder 700 is a combination of the different replicas of group-shuffled sub-decoders 511 obtained in the previous step 510.


Whenever a bit estimate is updated in an iterative decoder, the updating rule uses other bit estimates. In the combined decoder, which uses the multiple replica sub-decoders, the bit estimates that are used at every iteration are selected to be the most reliable estimates, i.e., the most recently updated bit estimates.


Thus, to continue our example, if we combine the three replica sub-decoders described above, then the replica decoders update their bit estimates in the first iteration as follows. In the first sub-step of the first iteration, the first replica sub-decoder updates the bit estimates in group 1, the second replica sub-decoder updates the bit estimates in group 2, and the third replica sub-decoder updates the bit estimates in group 3.


After the first sub-step is complete, the replica sub-decoders update the second group of bit estimates. Thus, the first replica sub-decoder updates the bit estimates in group 2, the second replica sub-decoder updates the bit estimates in group 3, and the third replica sub-decoder updates the bit estimates in group 1.


The important point is that whenever a bit estimate is needed to do an update, the replica sub-decoder is provided with the estimate from the currently most reliable sub-decoder for that bit. Thus, during the second sub-step, whenever a bit estimate for a bit in group 1 is needed, the estimate is provided by the first replica sub-decoder, while whenever a bit estimate for a bit in group 2 is needed, this estimate is provided by the second replica sub-decoder.


After the second sub-step of the first iteration is complete, the roles of the different replica sub-decoders change. The first replica decoder is now the source for the most reliable bit estimates for bits in group 2, the second replica sub-decoder is now the source for the most reliable bit estimates for bits in group 3, and the third replica sub-decoder is now the source for the most reliable bit estimates for bits in group 1.


The general idea behind the way the replica decoders 511 are combined in the combined decoder 700 is that at each iteration, a particular replica sub-decoder “specializes” in giving reliable estimates for some of the bits and messages, while other replica sub-decoders specialize in giving reliable estimates for other bits and messages. The “specialist” replica decoder for a particular bit estimate is always that replica decoder which most recently updated its version of that bit estimate.


System Diagram for Generic Combined Decoder



FIG. 7 shows a combined decoder 700. For simplicity, we show a combined decoder that uses three group-shuffled sub-decoders 710, 720, and 730. Each sub-decoder partitions estimates into a set of groups, and has a schedule by which it updates the estimates.


The overall control of the combined decoder is handled by a control block 750. The control block consists of two parts: a reliability assigner 751 and a termination checker 752.


Each sub-decoder receives as input the channel information 701 and the latest bit estimates 702 from the control block 750. After each iteration sub-step, each sub-decoder outputs bit estimates 703 to the control block. To determine the output, a particular sub-decoder applies the pre-assigned iterative decoder, e.g., BP or BF, using its particular schedule.


After each iteration sub-step, the control block receives as inputs the latest bit estimates 703 from each of the sub-decoders. Then, the reliability assigner 751 updates the particular bit estimates that the assigner has received to match the currently most reliable values. The assigner then transmits the most reliable bit estimates 702 to the sub-decoders.
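The bookkeeping done by the reliability assigner 751 can be sketched in a few lines. This is an illustrative model only: the function name, the flat list layout, and the explicit group-membership array are assumptions; in the decoder this selection is done by the control block's wiring.

```python
def assign_reliable(estimates, updated_groups, group_of_bit, current):
    """After one sub-step, keep the estimate from the replica that most
    recently updated each bit.

    estimates[r][n]   : bit n's estimate in replica r after this sub-step
    updated_groups[r] : the group that replica r just updated
    group_of_bit[n]   : group membership of bit n
    current[n]        : most reliable estimate so far (updated in place)
    """
    for r, g in enumerate(updated_groups):
        for n, gn in enumerate(group_of_bit):
            if gn == g:
                current[n] = estimates[r][n]
    return current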
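The bookkeeping done by the reliability assigner 751 can be sketched in a few lines. This is an illustrative model only: the function name, the flat list layout, and the explicit group-membership array are assumptions; in the hardware this selection is realized by the control block's wiring.

```python
def assign_reliable(estimates, updated_groups, group_of_bit, current):
    """After one sub-step, keep the estimate from the replica that most
    recently updated each bit.

    estimates[r][n]   : bit n's estimate in replica r after this sub-step
    updated_groups[r] : the group that replica r just updated
    group_of_bit[n]   : group membership of bit n
    current[n]        : most reliable estimate so far (updated in place)
    """
    for r, g in enumerate(updated_groups):
        for n, gn in enumerate(group_of_bit):
            if gn == g:
                current[n] = estimates[r][n]
    return current
```

With three replicas updating groups 0, 1, and 2 respectively in one sub-step, each bit's current estimate is taken from the replica that just updated its group.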


The termination checker 752 determines whether the currently most reliable bit estimates correspond to a codeword of the error-correcting code, or whether another termination condition has been reached. In the preferred embodiment, the alternative termination condition is a pre-determined number of iterations. If the termination checker determines that the decoder should terminate, then the termination checker outputs a set of bit values 705 corresponding to a code-word, if a code-word was found, or otherwise outputs a set of bit values 705 determined using the most reliable bit estimates.


The description that we have given so far of our invention is general and applies to any conventional iterative decoder, including BP and BF decoders of LDPC codes, turbo-codes, and turbo product codes. Other codes to which the invention can be applied include irregular LDPC codes, repeat-accumulate codes, LT codes, and Raptor codes. We now focus on the special cases of turbo-codes and turbo product codes and quasi-cyclic LDPC (QC-LDPC) codes, in order to further describe details for these codes. For the case of QC-LDPC codes, we also provide details of the preferred hardware embodiment of the invention.


Combined Decoder for Turbo-Codes


To describe in more detail how the combined decoder can be generated for a turbo-code, we use as an example a turbo-code that is a concatenation of two binary systematic convolutional codes. We describe in detail a preferred implementation of the combined decoder for this example.


A conventional turbo decoder has two soft-input/soft-output convolutional BCJR decoders, which exchange reliability information, for the k information symbols that are shared by the two codes.


To generate the combined decoder for turbo-codes, we consider a parallel-mode turbo-decoder to be our input “conventional iterative decoder” 502. The relevant “bit estimates” are the log-likelihood ratios that the information bits receive from each of the convolutional codes. We refer to these log-likelihood ratios as “messages” from the codes to the bits.


In the preferred embodiment, we use four replica sub-decoders to generate the combined-replica group-shuffled decoder for turbo-codes constructed from two convolutional codes.


An ordering by which the messages are updated is assigned to each replica sub-decoder. This can be done in many different ways, but it makes sense to follow the BCJR method as closely as possible. In a conventional BCJR decoding “sweep” for a single convolutional code, each message is updated twice, once in a forward sweep and once in a backward sweep. The final log-likelihood ratio output by the BCJR method for each bit is normally the message following the backward sweep. It is also possible to get equivalent results by updating the bits in a backward sweep followed by a forward sweep.


In our preferred embodiment, as shown in FIG. 8, the four replica sub-decoders are assigned the following updating schedules. In each replica sub-decoder, each single message is considered a group. The first replica sub-decoder 810 updates only the messages from the first convolutional code using the forward sweep of the schedule followed by a backward sweep of the schedule. The second replica sub-decoder 820 updates only the messages from the first convolutional code using a backward sweep followed by a forward sweep. The third replica sub-decoder 830 updates only the messages from the second convolutional code using a forward sweep followed by a backward sweep. The fourth replica sub-decoder 840 updates only the messages from the second convolutional code using a backward sweep followed by a forward sweep.


As each bit message is updated in each of the four replica sub-decoders, other messages are needed to perform the update. In the combined decoder, each such message is obtained from the replica sub-decoder that most recently updated the estimate.


Combined Decoder for Turbo Product Codes


We now describe the preferred embodiment of the invention for the case of turbo product codes (TPC). We assume that the turbo product code is constructed from a product of a horizontal code and a vertical code. Each code is decoded using an exact-symbol decoder. We assume that the exact-symbol decoders output log-likelihood ratios for each of their constituent bits.


To generate the combined decoder for turbo product codes, we consider a parallel-mode turbo product decoder to be our input “conventional iterative decoder” 502. The relevant “bit estimates” are the log-likelihood ratios output for each bit by the symbol-exact decoders for the horizontal and vertical sub-codes. We refer to these bit estimates as “messages.”


In the preferred embodiment, we use two replica sub-decoders that process successively the vertical codes and two replica sub-decoders that process successively the horizontal codes to generate the combined decoder for such a turbo product code. In the replica sub-decoders which successively process the vertical codes, the messages from those vertical codes are partitioned into groups such that messages from the bits in the same vertical code belong to the same group. In the replica sub-decoders which successively process the horizontal codes, the messages from the horizontal codes are partitioned into groups such that messages from the bits in the same horizontal code belong to the same group.


In the preferred embodiment for turbo product codes, the updating schedules for the different replica sub-decoders are as follows. In the first replica sub-decoder that processes vertical codes, the vertical codes are processed one after the other moving from left to right, while in the second replica sub-decoder that processes vertical codes, the vertical codes are processed one after the other moving from right to left. In the third replica sub-decoder that processes horizontal codes, the horizontal codes are processed one after the other moving from top to bottom. In the fourth replica sub-decoder that processes horizontal codes, the horizontal codes are processed one after the other moving from bottom to top.


At any stage, if a message is required, it is provided by the replica sub-decoder that most recently updated the message.


High-Speed Decoding of Quasi-Cyclic LDPC Codes


Quasi-cyclic low-density parity check (QC-LDPC) error-correcting codes have been accepted or proposed for a wide variety of communications standards, e.g., 802.16e, 802.11n, 3GPP, DVB-S2, and will likely be used in many future standards, because of their relatively good performance and convenient structure.


One embodiment of the invention provides a “replica-group-shuffled” decoder for QC-LDPC codes that has excellent performance versus complexity trade-offs. The decoder can be implemented using VLSI circuits. A single overall architecture enables the decoding of QC-LDPC codes with different base matrices, different code rates, and different code lengths. The VLSI circuits can also support high-speed or low-complexity (low-power) designs, depending on the decoding application.


The parity check matrix H of a quasi-cyclic LDPC code is constructed using a “base matrix,” which specifies which sub-matrices to use. For example, one QC-LDPC code has a base matrix as shown in FIG. 9. This base matrix is used in the IEEE 802.16e standard.


This base matrix has 24 columns and 8 rows. The full parity check matrix H is obtained from the base matrix by replacing each −1 with a (z×z) all-zeros matrix, and replacing each other number t with the (z×z) permutation matrix Pt.
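The expansion rule above can be sketched directly. This is an illustrative helper, not the claimed circuit; the function name is our own, and we assume the common convention that P^t is the identity matrix cyclically shifted by t columns.

```python
import numpy as np

def expand_base_matrix(B, z):
    """Expand a QC-LDPC base matrix B into the full parity-check matrix H:
    each -1 becomes a z-by-z all-zeros block, and each other entry t
    becomes the z-by-z permutation matrix P^t (identity shifted by t)."""
    I = np.eye(z, dtype=int)
    blocks = [[np.zeros((z, z), dtype=int) if t < 0 else np.roll(I, t, axis=1)
               for t in row] for row in B]
    return np.block(blocks)
```

A toy 2x2 base matrix with z = 2 shows the pattern: the −1 entry becomes a zero block, and each shift value becomes a shifted identity.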


The IEEE 802.16e standard allows for many different possible values for z, ranging from z=24 to z=96. For the purposes of one implementation, we use the code shown in FIG. 9 with z=44, which means that for our code N=24z=1056 and M=8z=352, i.e., each block has 1056 code bits and 352 parity checks.


Encoding and Decoding



FIG. 11 shows the overall structure of a system for coding a block of information symbols according to an embodiment of the invention. A source encoder encodes 1110 binary input data, which are then channel encoded 1120 and modulated 1130. The encoded and modulated data are passed through a channel 1140 with additive noise 1103 as an analog signal. At a destination 1102, a received noisy signal is demodulated 1150, channel decoded 1200, and passed to a source decoder 1160 to recover the input data.


When the analog received signals are de-modulated, they are converted into a number that expresses a ‘belief’ that each received bit is a zero or a one. This initial belief for a bit is also called the “channel information.” The belief can be considered a probability that the bit is a zero, ranging from 0 to 1.0. For example, if the value of the belief is 0.0001, the signal is probably a one, and a value of 0.9999 would tend to indicate a logical 0. A value of 0.5123 could be either a zero or a one. It should be noted that the values can be in other ranges, e.g., negative and positive. In the preferred embodiment, the probability is expressed as a log-likelihood ratio (LLR), which is stored using a small number of bits. A positive LLR indicates that the bit is probably a zero, while a negative LLR indicates that the bit is probably a one.
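The relationship between the probability form and the LLR form of the channel information can be written as a one-line sketch. The sign convention matches the text above (positive LLR means the bit is probably a zero); the function name is illustrative.

```python
import math

def llr_from_prob(p0):
    """Log-likelihood ratio for a bit, given the probability p0 that
    the bit is a zero.  Positive -> probably 0, negative -> probably 1."""
    return math.log(p0 / (1.0 - p0))
```

So a belief of 0.9999 maps to a large positive LLR (logical 0), a belief of 0.0001 to a large negative LLR (logical 1), and a belief of 0.5 to an LLR of zero (no information).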


It is the purpose of the decoder, shown in FIG. 12, to return a code-word that is highly probable given the received channel information. The beliefs are collected into groups of size z, and a group of beliefs is stored in a bank of registers 1400. The set of banks of registers are coupled to a relatively small number, e.g., 8, of “super-processors” 1202 by wires 1203. The way that the wires are connected is determined by the particular base-matrix of the QC-LDPC error correcting code that is used. Each super-processor includes a single “check processor” and a number of link processors.


Horizontal Group-Shuffled Min-Sum Decoder


As described above, in a conventional “horizontal shuffled” decoder, we cycle through the check nodes one by one, updating bit-to-check messages and beliefs automatically as we cycle through the check nodes. As also described above, in a “horizontal group-shuffled” decoder, we organize the check nodes into groups, and update the different groups serially while the checks within a group are processed in parallel. That is, all the check-to-bit messages for the check nodes within a group are determined in parallel.


The way we apply this idea to decoding quasi-cyclic LDPC codes is by forming z groups of M/z checks, where z is the size of the permutation matrices in the parity check matrix, and M/z is the number of rows in the base matrix of the code. For example, for the code from the IEEE 802.16e standard, with the base matrix shown in FIG. 9 and with z=44, we have 44 groups, each of 8 checks, for a total of 352 checks.


In our architecture, as shown in FIG. 12, we devote one super-processor 1202 to each of the checks in a group. Therefore, we use eight super-processors 1202, which work in parallel, stepping through the 44 groups.


Each super-processor includes one check processor connected to a number of link processors. For the 802.16e code, that number is ten link processors for all but one of the super-processors, and eleven link processors for the last one. Generally, the number of link processors connected to a particular check processor is the number of non “−1” entries in a row of the base matrix. There is one check processor for each row in the base matrix. The link processors are then connected to banks of belief registers 1400, such that only one link processor can update a particular belief register at a time.


Replicated Horizontal Group-Shuffled Min-Sum Decoder


We can also “replicate” the check processors 1500. As described, each check processor steps through 44 checks in order. We can replicate these processors by having, for example, one processor stepping through the 44 checks in the order 1, 2, 3, . . . , 43, 44, while a second processor steps through the checks in the order 23, 24, 25, . . . 43, 44, 1, 2, . . . , 21, 22, etc. Of course, many other possible orders exist.


The belief for each of the bits is stored in a single belief register. Therefore, we carefully select the order that each check processor uses to step through the checks, in order to avoid any conflicts caused by two check processors simultaneously accessing the same belief register as the processors update the bit beliefs.


Replicating check processors adds complexity to the decoder, but reduces the number of iterations necessary to achieve a given performance, which can be advantageous for some applications.


Decoder Architecture



FIG. 13 shows the architecture of our decoder in greater detail. Each super-processor 1202 contains a check processor 1500 and a set of (e.g., ten) link processors 1600. Each super-processor is connected to a set of (e.g., ten) banks of belief registers 1400 via the link processors. The number of link processors in each super-processor is determined by the number of non-zero sub-matrices in the row of the base matrix corresponding to the super-processor.


Each link processor 1600 has an associated message register 1700. This architecture is much simpler than the prior art architecture shown in U.S. Pat. No. 6,633,856 to Richardson, FIGS. 15-17.


During operation, the belief registers 1400 are initialized with the beliefs produced by the demodulator 1150. The decoder 1200 operates on the beliefs for a predetermined number of iterations. During each iteration, beliefs and messages are passed back and forth between the belief registers and the check processors 1500 via the link processors 1600. The messages are stored in message registers 1700.


The link processors enforce that the beliefs stay within a predetermined range of values, e.g., that the values do not underflow or overflow the register size. In a preferred embodiment, the message registers 1700 store only check-to-bit messages, which can be held in shift registers as generally described below. When the decoding terminates, the final beliefs can be read from the belief registers and thresholded to recover the input data.


It should be noted that the architecture does not include bit processors, as might be found in prior art decoders. Instead, processors are associated with the links themselves.


Belief Registers



FIG. 14 shows the structure of a bank of belief registers in greater detail. The set of belief registers are grouped to form multiple banks of belief registers. Each bank of belief registers 1400 is associated with one column in the base matrix of the code, see FIG. 9. Each bank of belief registers stores the beliefs corresponding to variable nodes (bits) in the corresponding base matrix column. Line 1402 is used to initialize the register.


Instead of storing the beliefs statically and accessing the beliefs as required, in this embodiment of the invention we store the beliefs in shift registers, and the values automatically cycle from one stage to another until the values are sent to the appropriate super-processor. This design exploits the quasi-cyclic structure of the LDPC code.


A bank of belief registers contains z stages (individual belief registers) 1410, where z is the dimension of the permutation matrices. As can be seen, the stages are shifted in a circular manner so that each stage either passes its belief to the next stage or outputs its belief to the connected link processor 1600. The input for a stage is either the belief coming from the previous stage or the updated belief from the connected link processor. The init signal 1402 forces all the stages to load the channel information from the demodulator 1150 of a new block to be decoded.


It should be noted that only selected stages are connected to the link processors. The placement of the connections to the link processors mostly depends on the base matrix used. Thus, if a certain super-processor is connected to a given bank of belief registers, and the base matrix has a permutation matrix of Pt for that connection, then normally one would connect the tth stage to the super-processor. However, there is an additional degree of freedom that can be exploited. One can choose, for a particular super-processor, to always connect to stage t+k instead of stage t. As long as one does that consistently for every connection coming out of a super-processor, the decoder will still operate correctly. This degree of freedom, which we call the “shift degree of freedom,” is exploited to ensure that two super-processors do not simultaneously access the same belief register. In hardware implementations, it is sometimes useful for detailed timing reasons to avoid having two connections to super-processors appear in adjacent stages. We can also optimize the shift degree of freedom to avoid this situation.
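A minimal software model of one bank of belief registers, including the tap placement and the shift degree of freedom, might look as follows. The class name, the deque representation, and the read interface are illustrative assumptions; the hardware uses fixed wiring, not indexed reads.

```python
from collections import deque

class BeliefBank:
    """Model of one bank of z belief-register stages cycling in a ring.

    The deque stands in for the hardware shift register.  A tap for a
    base-matrix exponent t (plus an optional common offset k, the
    "shift degree of freedom") reads stage t+k mod z.
    """
    def __init__(self, beliefs):
        # The init signal loads the channel LLRs of a new block.
        self.stages = deque(beliefs)

    def shift(self):
        # Circular shift by one stage per clock.
        self.stages.rotate(1)

    def read_tap(self, t, k=0):
        # Stage t+k (mod z) feeds the connected link processor.
        return self.stages[(t + k) % len(self.stages)]
```

Choosing a different k for each super-processor moves all of its taps consistently, which is how conflicting simultaneous accesses to the same stage can be avoided.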


Check Processor



FIG. 15A shows the check processor 1500. The check processor has ten inputs 1501 and ten outputs 1502, one for each associated link processor, see FIG. 13. Each of the inputs comes from a different link processor, and each of the outputs goes to a different link processor. The check processor receives inputs corresponding to bit-to-check messages, and it computes output messages corresponding to check-to-bit messages. Note that the check-to-bit messages are stored in message registers 1700, but the bit-to-check messages are not stored, and are instead computed as necessary.


The check processor implements a belief propagation message update rule. In the embodiment described here, the check processor updates according to the min-sum rule described above and below using XOR gates, comparator gates, and MUX blocks shown in FIGS. 15A-15C.


The min-sum message-update rules are defined as follows. Each message is given a time index, and new messages are iteratively determined from old messages using the message-update rules. The message update rules are as follows:








Initialization: Um→n(0) = 0,

Bit node update: Vn→m(t+1) := In + Σm′∈M(n)\m Um′→n(t),

Check node update: Um→n(t) := [ minn′∈N(m)\n |Vn′→m(t)| ] · Πn′∈N(m)\n sgn(Vn′→m(t)), and

Belief update: Bn(t) = In + Σm∈M(n) Um→n(t),
where Um→n is the message from check node m to bit node n, Vn→m is the message from bit node n to check node m, Bn is the belief for bit n, and In is the channel information for bit n. The superscripts indicate the time index. Note that M(n) is the set of all check nodes connected to bit node n, and vice-versa for N(m), and M(n)\m is defined as the set of all check nodes connected to bit node n except for check node m.
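A minimal, readable model of one iteration of these min-sum rules follows. It is a behavioral sketch for checking the equations, not the VLSI implementation; the dictionary-based graph representation is an assumption made for clarity.

```python
# One iteration of the min-sum update rules: bit-node update,
# check-node update, then belief update.  I[n] is the channel LLR,
# U[m][n] the stored check-to-bit message, M[n]/N[m] the neighborhoods.
import math

def min_sum_iteration(I, U, M, N):
    """Returns (new check-to-bit messages, beliefs)."""
    # Bit node update: V_{n->m} := I_n + sum of U_{m'->n} over M(n)\m
    V = {m: {n: I[n] + sum(U[mp][n] for mp in M[n] if mp != m)
             for n in N[m]}
         for m in U}
    # Check node update: magnitude = min |V_{n'->m}| over N(m)\n,
    # sign = product of sgn(V_{n'->m}) over N(m)\n
    newU = {}
    for m in U:
        newU[m] = {}
        for n in N[m]:
            others = [V[m][np] for np in N[m] if np != n]
            mag = min(abs(v) for v in others)
            sign = math.prod(1 if v >= 0 else -1 for v in others)
            newU[m][n] = sign * mag
    # Belief update: B_n = I_n + sum of U_{m->n} over M(n)
    B = {n: I[n] + sum(newU[m][n] for m in M[n]) for n in M}
    return newU, B
```

With a single check node connected to three bits, the model reproduces the expected exclusion of each bit's own message from its check-to-bit output.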


Other message-update rules, e.g., the sum-product rules or the normalized min-sum rules, differ from the min-sum rules in the details of the message-update equations. Implementing these different message-update rules entails complexity/performance trade-offs, but the trade-offs do not require large changes in the overall architecture of the system. Typically, the message-update decoding process terminates after some pre-determined number of iterations. At that point, each bit is assigned the value zero when its belief is greater than or equal to zero, and the value one when its belief is negative.


Each message has a sign and a magnitude. For the magnitude, using the min-sum message-update rule, the check processor determines a minimum message and sends that value to all link processors, except for the one from which the check processor received the minimum message; that link processor instead receives the second minimum value.


The sign of each outgoing check-to-bit message is determined by the number of incoming bit-to-check messages that “believe” that they are more likely to be one, and thus have a negative LLR. If that number is odd, then the outgoing message should have a negative LLR, while if that number is even, then the outgoing message should have a positive LLR.


Therefore, we determine 1550 first and second minimums for output messages. The magnitude of each input message is compared 1530 with the first minimum value. If it is equal to the first minimum value, the second minimum value is selected 1540, using a MUX, as the magnitude of the corresponding output message. Otherwise, the first minimum value becomes the magnitude of the corresponding output message.


For the sign, because a likely bit value of 0 corresponds to a positive LLR and a likely bit value of 1 corresponds to a negative LLR, the product of the signs corresponds to the XOR of the values. The sign of the output is the product of the signs of all the inputs excluding that of its corresponding input. We use two XOR blocks 1520 to fulfill this function as shown in FIG. 15A. Then, the magnitude of each output is combined with its corresponding sign, which generates the complete output message.
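The two-minimum MUX selection and the sign parity described above can be modeled as follows. This is a behavioral sketch of the datapath, not the comparator-tree or XOR-block hardware of FIG. 15A, and the function name is illustrative.

```python
# Behavioral model of the check-processor outputs: compute the two
# smallest input magnitudes and the overall sign parity; each output's
# magnitude is the first minimum, unless that input itself holds the
# minimum, in which case the second minimum is selected (the MUX), and
# each output's sign excludes its own input's sign (the XOR blocks).

def check_outputs(inputs):
    """inputs: list of bit-to-check LLRs.  Returns check-to-bit LLRs."""
    mags = [abs(v) for v in inputs]
    min1, min2 = sorted(mags)[:2]               # first and second minimums
    total_sign = 1                              # parity of all input signs
    for v in inputs:
        if v < 0:
            total_sign = -total_sign
    outputs = []
    for v, m in zip(inputs, mags):
        mag = min2 if m == min1 else min1       # MUX select
        sign = total_sign * (1 if v >= 0 else -1)  # remove own sign
        outputs.append(sign * mag)
    return outputs
```

For three inputs the outputs agree with the min-sum check-node update computed directly, each output excluding its corresponding input.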


As shown in FIG. 15B, the comparator 1530 is actually constructed as a cascade of comparators. Three variations 1531, 1532 and 1533 are shown.


For a 10-input comparison, the input messages are divided into three groups, with 3, 3, and 4 messages, respectively. A block comparator 1541 receives three inputs and compares each pair among them. Thus, there are three parallel comparisons, and according to the comparison results, it outputs the minimum value and the second minimum value. The shaded block comparator 1542 receives four inputs and compares each pair. So there are six parallel comparisons, and according to the comparison results, it outputs the minimum value and the second minimum value.


In the cascade 1533, we use a comparator 1543. Because we know the ordering of the outputs of comparator 1541 in the second stage, we do not need to compare these again in the third stage.


Link Processor


At any time during the message updating process, the message Um→n from a check node m to a bit node n, the message Vn→m from a bit node n to a check node m, and the belief Bn at a bit node n are connected by an equation






Vn→m = Bn − Um→n.


This equation is useful for our embodiments, because the equation means that we only need to store the beliefs and the check-to-bit messages, and determine bit-to-check messages from the stored information as needed, see FIG. 13. This property also holds for other message updating processes, such as the sum-product process, and the normalized min-sum process, because the property only depends on the bit-node update and the belief update equations, which are unchanged in other processes, in comparison with the min-sum process.



FIGS. 18A and 18B contrast the conventional message update with the update according to the embodiments of the invention. In the prior art, the check-to-bit messages 1801 are summed at a bit processor 1810 to produce the output bit-to-check messages 1802. Instead, to compute the bit-to-check messages, we subtract 1820 the check-to-bit messages from the beliefs.


Because we use this approach, we do not need to use bit-processors, and we do not need to store bit-to-check messages. Instead, we use link processors, which only need to access a single check-to-bit message and a single belief.



FIG. 16 shows the link processor 1600 for messages between the belief registers and the check processors. As shown in FIG. 16, the link processor takes inputs from a belief register 1400 and a message register 1700. In the embodiment shown, the beliefs are stored as 9-bit LLR values (one bit for the sign, the remaining 8 bits for the magnitude), while the check-to-bit messages are stored as 6-bit LLR values. After subtracting 1610 the message from the belief, and limiting the magnitude of the difference to a 5-bit value using the saturation block 1620, we send the resulting value to the corresponding check processor. To recover the beliefs from the check-to-bit messages sent by the check processor, we perform an addition operation 1630.
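The subtract-then-saturate path and the belief-recovery addition can be sketched as follows. The bit widths match the embodiment (5-bit magnitude on the wire to the check processor); the helper names are hypothetical, and two's-complement details of the hardware are glossed over.

```python
# Behavioral sketch of the link processor: the bit-to-check message is
# the belief minus the stored check-to-bit message, saturated to a
# 5-bit magnitude; the updated belief is recovered by adding the new
# check-to-bit message back to the bit-to-check message.

def saturate(value, mag_bits):
    """Clamp the magnitude of a signed LLR to mag_bits bits."""
    limit = (1 << mag_bits) - 1
    return max(-limit, min(limit, value))

def link_to_check(belief, stored_msg):
    """V_{n->m} = B_n - U_{m->n}, saturated to a 5-bit magnitude."""
    return saturate(belief - stored_msg, 5)

def link_from_check(bit_to_check, new_msg):
    """Recover the updated belief: B_n = V_{n->m} + new U_{m->n}."""
    return bit_to_check + new_msg
```

For example, a difference of 80 saturates to the 5-bit limit of 31 before being sent to the check processor, while small differences pass through unchanged.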


Message Register


As shown in FIG. 17, in one embodiment of the invention, the message registers 1700 use shift registers similar to those previously described for the belief registers. Each message register is associated with a non-zero entry in the base matrix of the code.


The message register includes z stages, where z is the dimension of the permutation matrices. Each stage either passes its message to the next stage or outputs its message to a connected link processor. The input is either the message coming from the previous stage or the updated message from the connected link processor. The init signal is a synchronous reset that forces all the stages to output zeroes at a rising clock edge when the signal is '1'. The init signal is set to '1' at the beginning of decoding each block, and set to '0' after one clock cycle, because the messages must be initialized to all zeroes.


Effect of the Invention

Simulations with the combined decoder according to the invention show that the combined decoder provides better performance, complexity, and speed trade-offs than prior-art decoders. The replica-shuffled turbo decoder of the invention outperforms conventional turbo decoders by several tenths of a dB when the same number of iterations is used, or can use far fewer iterations when the same performance at a given noise level is required.


Similar performance improvements result when using the invention with LDPC codes, or with turbo-product codes, or any iteratively decodable code.


Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications may be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.

Claims
  • 1. An apparatus for decoding a block of symbols using iterative belief propagation, comprising: a set of belief registers, each belief register configured to store a belief that a corresponding symbol in the block has a certain value;a plurality of check processors, the plurality of check processors configured to determine output check-to-bit messages from input bit-to-check messages by message-update rules;a plurality of link processors connecting the set of belief registers to the plurality of check processors; andmeans for passing the check-to-bit and bit-to-check messages and the beliefs between the set of belief registers and the plurality of check processors via the link processors for a predetermined number of iterations while updating the beliefs.
  • 2. The apparatus of claim 1, in which the link processors determine output bit-to-check messages using input beliefs and the check-to-bit messages.
  • 3. The apparatus of claim 1, in which each link processor has an associated message register, the message register storing only the check-to-bit messages.
  • 4. The apparatus of claim 1, in which the block of symbols is encoded using a quasi-cyclic low density parity code (QC-LDPC) having a base matrix of m rows and n columns, in which there is one column for every bank of belief registers, and one row for each check processor.
  • 5. The apparatus of claim 4, in which the base matrix includes z permutation sub-matrices, and each bank of belief registers includes z belief stages, each belief stage corresponding to a single belief register.
  • 6. The apparatus of claim 5, in which the values of the beliefs are circulated through the belief stages of each bank of belief registers, and an input for a particular belief stage is either the belief coming from a previous belief stage or an updated belief from a connected link processor.
  • 7. The apparatus of claim 1, in which the updating is according to a min-sum process.
  • 8. The apparatus of claim 1, in which the updating is according to a sum-product process.
  • 9. The apparatus of claim 1, in which the updating is according to a normalized min-sum process.
  • 10. The apparatus of claim 1, in which the link processor subtracts the check-to-bit message from the belief of the connected belief register to produce the bit-to-check message.
  • 11. The apparatus of claim 5, in which each message register includes z message stages.
  • 12. The apparatus of claim 11, in which the values of the message registers are circulated through the message stages of each message register during the updating.
  • 13. The apparatus of claim 1, in which the set of belief registers is partitioned into a plurality of banks of belief registers, and in which the link processors and the check processors are arranged in a set of super processors such that there is one check register and a plurality of link registers in each super processor.
  • 14. The apparatus of claim 13, in which the block of symbols is encoded using a quasi-cyclic low density parity code (QC-LDPC) having a base matrix of m rows and n columns, in which there is one super processor for each row, and in which there is one column for every bank of belief registers, and one row for each check processor, and a number of link processors in each super-processor is determined by a number of non-zero sub-matrices in the row corresponding to the super-processor.
  • 15. The apparatus of claim 14, in which the link processors are connected to the banks of belief registers, such that only one link processor updates a particular belief register at any one time.
  • 16. The apparatus of claim 15, in which a shift degree of freedom is used to avoid connecting two adjacent belief registers to the same super-processor.
  • 17. A method for decoding a block of symbols using iterative belief propagation, comprising: storing a belief that a particular symbol in the block has a certain value in an associated belief registers;determining, in associated check processors and according to message-update rules, output check-to-bit messages from input bit-to-check messages received from the belief registers; andpassing the messages and beliefs between the belief registers and the check processors via the link processors for a predetermined number of iterations while updating the beliefs.
RELATED APPLICATION

This is a Continuation-in-Part Application of United States Patent Application 20060161830, by Yedidia; Jonathan S. et al. filed Jul. 20, 2006, “Combined-replica group-shuffled iterative decoding for error-correcting codes.”