MESM: A FAST BCJR BASED DECODER IMPLEMENTATION SCHEME

Abstract
A memory-efficient, accelerated implementation architecture for BCJR based forward error correction algorithms is provided. In this architecture, a memory-efficient storage scheme is adopted for the metrics and channel information to achieve high processing speed with a low memory requirement. Thus, BCJR based algorithms can be accelerated, and the implementation complexity can be reduced. This scheme can be used in BCJR based turbo decoder and LDPC decoder implementations.
Description

This application relates to the field of telecommunications, and more specifically, to a system of forward error correction capable of reducing errors in received data. Reliable data transmission is an important issue for wireless transmission systems. Forward error correction (FEC) schemes are widely used to improve the reliability of wireless systems. In recent years, some FEC schemes that operate close to the Shannon limit, such as Turbo codes and low density parity check (LDPC) codes, have been adopted in high data rate systems. Many of these schemes include decoders that utilize the BCJR algorithm, proposed by Bahl, Cocke, Jelinek and Raviv in “Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate,” IEEE Trans. Inform. Theory, Mar. 1974, vol. IT-20, No. 2, pp. 284-287, the disclosure of which is incorporated herein by reference. The BCJR algorithm is a modified Maximum A Posteriori (MAP) algorithm. The objective of the BCJR algorithm is to obtain a good estimate of the bit probabilities. These probabilities include systematic information and extrinsic information. The performance of the decoder is close to the Shannon limit.


Although the performance of the BCJR algorithm is near optimal, integrated circuit implementations of the BCJR based algorithm face two main challenges: latency and a large memory requirement.


Upon receiving a data frame, the BCJR algorithm based decoder works in an iterative way. For each iteration, the decoder first scans the data frame and the extrinsic information generated in the previous iteration from head to tail, and then from tail to head, to collect decoding information. Based on the collected information, the decoder estimates the most likely input data. The estimate is fed back to the other decoder for the next iteration. This means that the BCJR based decoder must process the data from the beginning of the data frame to the end and then back before estimates are made during the iteration processing. After a certain number of iterations, the extrinsic information converges, and the process stops. For a data frame of n bits, the forward and backward information-gathering step takes 2n substeps, and the estimation step needs n substeps. Hence, the latency of the BCJR algorithm is high. The BCJR decoder has to keep all of the decoding information until the extrinsic information is generated, and the extrinsic information must be stored for the next iteration. For a data frame of n bits and a turbo code state space of S, 2×n×S memory units are needed to keep the information. So the memory requirement is large for the maximum a posteriori (MAP) algorithm.


In exemplary embodiments according to the present invention, a memory-efficient storage method (MESM) with module normalization and embedded storage technology is provided for the BCJR based decoder implementation to reduce memory requirements and speed up the decoder. MESM can be used to design Turbo and low-density parity check (LDPC) decoders.





The accompanying drawings, together with the specification, illustrate exemplary embodiments of the present invention, and, together with the description, serve to explain the principles of the present invention.



FIG. 1 is a block diagram illustrating an exemplary implementation scheme for MESM;



FIG. 2 is a trellis graph illustrating an exemplary decoding path;



FIG. 3 is a diagram illustrating a metric calculation;



FIG. 4 is a diagram illustrating the metrics update method of MESM; and



FIG. 5 is a diagram illustrating module normalization.





In the following detailed description, only certain exemplary embodiments of the present invention are shown and described, by way of illustration. As those skilled in the art would recognize, the invention may be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein. Like reference numerals designate like elements throughout the specification.


A memory-efficient storage method (MESM) uses a shared memory storage scheme to reduce the memory requirement for the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm and a two-way metrics update method to update metrics information and produce the extrinsic information to speed up the iterative decoding procedure.


In MESM, a shared storage scheme is used for the metrics information and system information, which is shown as the stacks 102, 104, 106, and 108 in FIG. 1. According to a metrics information sharing scheme, forward and backward metrics information updates in the BCJR algorithm can be performed in parallel, and extrinsic information for the next iteration can be calculated with the metrics update. The metrics can be updated with module normalization (discussed below) and stored with embedded storage technology. Thus, the processing of the BCJR based decoder is sped up and memory is reduced significantly. With the help of a system information sharing scheme, the computation load for the system information is half of the traditional method. Therefore, the BCJR based decoding algorithm can be simplified.


For convenience, the following table defines some of the main symbols used in this document.

Symbol    Definition
αk        The metrics of the MAP decoder in the forward direction at the kth step
βk        The metrics of the MAP decoder in the reverse direction at the kth step
LLek      The extrinsic information of the MAP decoder for the kth symbol
S         The state space of the turbo code
xk        The kth symbol on the decoding path
yk        The kth parity code word on the decoding path
ds        The kth symbol fed into the decoder
dp        The kth parity code word fed into the decoder
s+        The sub-collection of the state transitions from s′ to s (s′ → s) when xk = +1
s−        The sub-collection of the state transitions from s′ to s (s′ → s) when xk = −1
length    The length of the data frame for the decoder









The following example illustrates some of the underlying ideas of the BCJR algorithm. Assume an archaeologist finds a letter written by his ancestors hundreds of years ago. Some words in the letter have become unrecognizable due to the age of the letter, but the archaeologist can guess the unrecognizable words utilizing relationships among the words. He can guess the unrecognizable words more easily and correctly within a paragraph than he can within a single sentence. If there are enough useful words in the letter, the archaeologist can recover all of the information that his ancestors wanted to convey.


The BCJR algorithm works in a similar manner by adding redundant bits into the source data with the relationship encoding algorithm. The source data and the redundant bits are transmitted together. The receiver gets a noisy version of the sent data. The BCJR based decoder scans the received data from head to tail to recognize the relationship forward metrics, and scans the data from tail to head to recognize the relationship backward metrics. Then, the decoder utilizes the metrics to determine the optimal decoding path from all possible paths. The optimal path is the best guess for all inputted data. For example, as illustrated in FIG. 2, a data frame of length=15 is fed into the BCJR based decoder. The state space of the code, S, is 8; and the optimal decoding path 202 is shown in FIG. 2.


According to the BCJR based decoder architecture illustrated in FIG. 1, the output of each decoder is the probability distribution for each bit in the frame. A common tool to express the bit probabilities in bit decoding is based on the Log Likelihood Ratio (LLR). The LLR of the kth bit is defined by:










$$
\mathrm{LLR}(s_k) = \log\frac{P(c_k = 1 \mid X)}{P(c_k = 0 \mid X)} = \log\frac{\sum_{s^{+}} \alpha_{k-1}(s')\,\gamma_k(s', s)\,\beta_k(s)}{\sum_{s^{-}} \alpha_{k-1}(s')\,\gamma_k(s', s)\,\beta_k(s)} \tag{1}
$$







where s ∈ S, s′ ∈ S, and ck is the value of the kth bit in the transmitted codeword. X is the received vector. αk−1 is the (k−1)th forward metric. βk is the kth backward metric. γk(s′, s) is the probability that the system state transitions from state s′ at the (k−1)th step to state s at the kth step.
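As a minimal sketch of equation (1) for a single trellis step, the following Python evaluates the LLR from probability-domain metrics. The function name `llr_step`, the branch layout `(s_prev, s_next, bit)`, and the toy two-state numbers are assumptions made for illustration only; they do not come from the specification.

```python
import math

def llr_step(alpha_prev, beta_k, gamma_k, transitions):
    """Evaluate equation (1) for one step k.

    alpha_prev[s']   : forward metric alpha_{k-1}(s')
    beta_k[s]        : backward metric beta_k(s)
    gamma_k[(s', s)] : transition probability gamma_k(s', s)
    transitions      : iterable of (s', s, bit); bit 1 places the branch
                       in s+, bit 0 places it in s- (hypothetical layout)
    """
    num = 0.0  # sum over the branches in s+
    den = 0.0  # sum over the branches in s-
    for s_prev, s_next, bit in transitions:
        term = alpha_prev[s_prev] * gamma_k[(s_prev, s_next)] * beta_k[s_next]
        if bit == 1:
            num += term
        else:
            den += term
    return math.log(num / den)

# Toy two-state trellis with made-up numbers.
transitions = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
alpha_prev = [0.6, 0.4]
beta_k = [0.5, 0.5]
gamma_k = {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.2}
print(llr_step(alpha_prev, beta_k, gamma_k, transitions))
```

A positive result favors ck = 1 and a negative result favors ck = 0; in this toy case the numerator is 0.43 and the denominator is 0.07, so the result is positive.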


According to an exemplary embodiment, as illustrated in FIG. 1, the metrics update can be performed by forward metric update block 112 and backward metric update block 114 as follows. In the forward direction, αk=(α0k, α1k, . . . , αs−1k) is obtained with αk−1. In the backward direction, βk=(β0k, β1k, . . . , βs−1k) is obtained with βk+1.











$$
\alpha_k(s) = \frac{\sum_{s'} \alpha_{k-1}(s')\,\gamma_k(s', s)}{\sum_{s}\sum_{s'} \alpha_{k-1}(s')\,\gamma_k(s', s)} \tag{2}
$$








$$
\beta_{k-1}(s') = \frac{\sum_{s} \beta_k(s)\,\gamma_k(s', s)}{\sum_{s}\sum_{s'} \alpha_{k-1}(s')\,\gamma_k(s', s)} \tag{3}
$$







where s ∈ S, s′ ∈ S, α0 and βlength are known, and the probability γk(s′, s) is defined as:











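$$
\begin{aligned}
\gamma_k(s', s) &= p(s_k = s \mid s_{k-1} = s') \\
&= \exp\left[\tfrac{1}{2}\, x_k \left(LLe_k + L_c\, ds_k\right) + \tfrac{1}{2}\, L_c\, dp_k\, y_k\right] \\
&= \exp\left[\tfrac{1}{2}\, x_k \left(LLe_k + L_c\, ds_k\right)\right] \gamma_k^{e}(s', s)
\end{aligned}
$$

where $\gamma_k^{e}(s', s) = \exp\left[\tfrac{1}{2}\, L_c\, dp_k\, y_k\right]$ is the factor that depends only on the parity term.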










The extrinsic information, LLek, is computed by extrinsic calculation blocks 110a and 110b utilizing αk−1 and βk:










$$
LLe_k = \log\frac{\sum_{s^{+}} \alpha_{k-1}(s')\,\gamma_k^{e}(s', s)\,\beta_k(s)}{\sum_{s^{-}} \alpha_{k-1}(s')\,\gamma_k^{e}(s', s)\,\beta_k(s)} \tag{4}
$$







where s ∈ S and s′ ∈ S.
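The branch probability and the extrinsic output can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of blocks 110a and 110b: the function names, the branch layout `(s_prev, s_next, x_k, y_k)`, and the toy numbers are hypothetical. The parity-only factor is taken as exp[½·Lc·dp·yk], which is the part of the exponent of γk(s′, s) that does not involve xk or LLek, and the extrinsic value is taken as the logarithm of the ratio, in line with the log form of equation (1).

```python
import math

def gamma_full(x_k, y_k, lle_k, ds_k, dp_k, l_c):
    """Branch probability gamma_k(s', s) for a branch labelled (x_k, y_k)."""
    return math.exp(0.5 * x_k * (lle_k + l_c * ds_k) + 0.5 * l_c * dp_k * y_k)

def gamma_extrinsic(y_k, dp_k, l_c):
    """Parity-only factor gamma_k^e(s', s) (assumed form, see lead-in)."""
    return math.exp(0.5 * l_c * dp_k * y_k)

def extrinsic_llr(alpha_prev, beta_k, branches, dp_k, l_c):
    """Equation (4); branches are (s', s, x_k, y_k) with labels in {+1, -1}."""
    num = 0.0  # branches with x_k = +1 (the set s+)
    den = 0.0  # branches with x_k = -1 (the set s-)
    for s_prev, s_next, x_k, y_k in branches:
        term = alpha_prev[s_prev] * gamma_extrinsic(y_k, dp_k, l_c) * beta_k[s_next]
        if x_k == +1:
            num += term
        else:
            den += term
    return math.log(num / den)

# Toy two-state trellis with made-up labels and metrics.
branches = [(0, 0, -1, -1), (0, 1, +1, +1), (1, 0, +1, -1), (1, 1, -1, +1)]
lle = extrinsic_llr([0.9, 0.1], [0.5, 0.5], branches, dp_k=1.0, l_c=2.0)
print(lle)

# The full branch probability factors into a systematic part and gamma^e.
g = gamma_full(+1, -1, 0.3, 0.8, -0.5, 2.0)
systematic = math.exp(0.5 * (+1) * (0.3 + 2.0 * 0.8))
print(abs(g - systematic * gamma_extrinsic(-1, -0.5, 2.0)) < 1e-12)
```

Because the systematic and a priori terms are left out of γe, the value returned by `extrinsic_llr` carries only the information contributed by the parity observations and the neighbouring metrics.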


The procedure of the metrics update according to one exemplary embodiment is shown in FIG. 3. According to equations (1)-(4), the metrics in the forward direction, αk−1, and in the reverse direction, βk, should be kept until LLek or LLRk is calculated. Thus, the memory requirement, M, is:



M=2×length×S.
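The conventional schedule behind this memory figure can be sketched as two full passes followed by the extrinsic calculation. The Python below is a minimal illustration, assuming a dense hypothetical layout `gammas[k-1][s'][s]` and a trellis that starts and ends in state 0. The forward pass follows equation (2). For the backward pass, the numerator of equation (3) is used but each step is scaled by its own sum, a common alternative chosen here so the two passes remain independent; a per-step scale factor cancels in the ratios of equations (1) and (4).

```python
def forward_backward(gammas, num_states):
    """Conventional two-pass schedule over a terminated trellis.

    gammas[k - 1][sp][s] holds gamma_k(s', s) for k = 1..length
    (hypothetical dense layout; use 0.0 where no branch exists).
    """
    length = len(gammas)
    states = range(num_states)

    # Assumption: the trellis starts and ends in state 0.
    alpha = [[0.0] * num_states for _ in range(length + 1)]
    beta = [[0.0] * num_states for _ in range(length + 1)]
    alpha[0][0] = 1.0
    beta[length][0] = 1.0

    # Forward pass, equation (2): normalize by the sum over all states.
    for k in range(1, length + 1):
        g = gammas[k - 1]
        raw = [sum(alpha[k - 1][sp] * g[sp][s] for sp in states) for s in states]
        total = sum(raw)
        alpha[k] = [v / total for v in raw]

    # Backward pass: numerator of equation (3), scaled by its own sum
    # (assumption, see lead-in) instead of the alpha-based denominator.
    for k in range(length, 0, -1):
        g = gammas[k - 1]
        raw = [sum(beta[k][s] * g[sp][s] for s in states) for sp in states]
        total = sum(raw)
        beta[k - 1] = [v / total for v in raw]

    # alpha_0..alpha_{length-1} and beta_1..beta_length must all be held
    # until the LLe_k values are computed: 2 * length * S units.
    stored_units = 2 * length * num_states
    return alpha, beta, stored_units

gammas = [[[0.6, 0.4], [0.3, 0.7]] for _ in range(3)]
alpha, beta, stored_units = forward_backward(gammas, num_states=2)
print(stored_units)  # 2 * 3 * 2 = 12
```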


Equations (1)-(4) illustrate three facts. The first is that a metrics update can be performed independently in the forward and backward directions, since α can be computed without the information of β, and vice versa. The second is that αk−1 is used in two cases: once for updating αk and once for calculating LLek. A similar case exists for βk. From FIG. 2, αk−1 and βk are also no longer needed once LLek has been obtained. The third is that each γk(s′, s) is used twice: once for the forward update and again for the backward update. Thus, γk(s′, s) can be calculated once and stored for both uses. On this basis, a novel metrics update and storage scheme is proposed to speed up the BCJR based algorithm. This scheme is referred to as a memory-efficient storage method (MESM). Due to the calculation independence of the metric updating, the memory can be shared for the forward and backward metrics storage, and metrics updating can be performed simultaneously for the forward and backward directions in MESM. The system information, γk(s′, s), is shared by the forward and backward updates. MESM can thus reduce the computational complexity of the BCJR algorithm.


In MESM, the same memory can be shared by forward and reverse metrics, α and β, and the BCJR based decoder performs the extrinsic and metrics computations at the same time. The procedure of MESM is as follows.


1. The initial forward and reverse metric information, α0 and βlength, is stored at the head (M0) and tail (Mlength) of the same memory space. Let kf=0 and kr=length, where length is an odd number. Let

$$
h = \frac{\mathrm{length} - 1}{2}.
$$

2. In the forward direction, read (Mkf) and (Mkf+1) as αkf and βkf+1, respectively. Calculate αkf+1 with αkf, and LLekf+1 with αkf and βkf+1. In the backward direction, read (Mkr−1) and (Mkr) as αkr−1 and βkr, respectively. Calculate βkr−1 with βkr, and LLekr with αkr−1 and βkr. Then, store αkf+1 and βkr−1 in (Mkf+1) and (Mkr−1), respectively. γkf+1 and γkr−1 are pushed onto the stacks for the forward and backward directions, respectively.

3. When kf>h, γkf+1 and γkr−1 are popped from the stacks for the calculation of αkf+1, βkr−1, LLekf+1 and LLekr. LLekf+1 and LLekr are output as useful extrinsic information; otherwise, LLekf+1 and LLekr are discarded.

4. Let kf=kf+1 and kr=kr−1. Return to step 2 until kr=1.
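The procedure above can be traced with a small symbolic simulation. The Python below is a sketch under stated assumptions: it tracks only which metric each shared cell holds (as labels, not numeric values), it does not model the γ stacks, and it decides whether an LLe is usable by checking that both cells read actually hold the required αk−1 and βk, rather than by comparing kf with h. The name `mesm_schedule` is hypothetical.

```python
def mesm_schedule(length):
    """Trace the shared-memory schedule for an odd frame length.

    Returns (valid, mem): valid maps k to the step at which a usable
    LLe_k first appears; mem is the final content of cells M_0..M_length.
    """
    assert length % 2 == 1, "the procedure assumes an odd frame length"
    mem = [None] * (length + 1)
    mem[0] = ("alpha", 0)            # alpha_0 at the head
    mem[length] = ("beta", length)   # beta_length at the tail
    kf, kr = 0, length
    valid = {}
    for step in range(1, length + 1):
        # The recursion inputs must always be present in the shared memory.
        assert mem[kf] == ("alpha", kf)
        assert mem[kr] == ("beta", kr)
        # Read phase: both directions read before anything is overwritten.
        fwd_ok = mem[kf + 1] == ("beta", kf + 1)    # LLe_{kf+1} usable?
        bwd_ok = mem[kr - 1] == ("alpha", kr - 1)   # LLe_{kr} usable?
        if fwd_ok:
            valid.setdefault(kf + 1, step)
        if bwd_ok:
            valid.setdefault(kr, step)
        # Write phase: the new metrics overwrite cells that are no longer needed.
        mem[kf + 1] = ("alpha", kf + 1)
        mem[kr - 1] = ("beta", kr - 1)
        kf, kr = kf + 1, kr - 1
    return valid, mem

valid, mem = mesm_schedule(5)
print(sorted(valid))   # every k from 1 to 5 appears
print(valid[3])        # LLe_3 becomes usable at step h + 1 = 3
```

With an odd length, kf+1 and kr−1 never refer to the same cell, so the two writes in each step cannot collide; this is one way to see why the procedure asks for an odd frame length.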



FIG. 4 shows the metrics update method of MESM. At the initial time, step 0, α0 and βlength are stored in the head and tail of the memory space. The new forward metrics α1 and LLe1 are obtained at the end of step 1. This LLe1, denoted LLe1′ in FIG. 4, is useless because the content of the 2nd memory unit (M1) is meaningless at step 1. After step 1, α1 is stored in (M1) for the future calculation of LLe1, and LLe1′ is discarded. Likewise, βlength−1 is stored in (Mlength−1) for the future calculation of LLelength−1, and LLe′length−1 is discarded. After h steps, α0, α1, . . . , αh are stored in (M0), (M1), . . . , (Mh), respectively, and βh+1, βh+2, . . . , βlength are stored in (Mh+1), (Mh+2), . . . , (Mlength), respectively. At the end of step h+1, a reasonable LLeh+1 can be obtained in the forward direction, since the available βh+1 in (Mh+1) can be found at the beginning of step h+1. In the reverse direction, the available αh in (Mh) can be found at the beginning of step h+1. At this point, αh and βh+1 are no longer needed, because LLeh+1, αh+1, and βh are calculated at the end of step h+1. Thus, αh and βh+1 in the memory can be overwritten with βh and αh+1, respectively. At the end of step h+2, αh+2 and LLeh+2 will be obtained in the forward direction, and βh−1 and LLeh in the reverse direction, and αh−1 and βh+2 in the memory can be replaced with βh−1 and αh+2, respectively. In this way, all of the extrinsic information will be obtained at the end of step length. In MESM, length memory units are used for the γk(s′,s) storage. Thus the memory requirement for MESM is:






M=length×(S+1).


That is roughly half of the memory required by the traditional method. All of the extrinsic information is obtained as soon as the metrics calculation is complete. According to FIG. 3, when the metrics update is finished, all of the extrinsic information of the BCJR decoder has been calculated. Thus, length substeps are needed in MESM instead of 3×length in the traditional method, so the proposed scheme also helps to speed up each iteration.
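As a worked comparison, the two memory formulas and the two substep counts can be evaluated for the frame of FIG. 2 (length = 15, S = 8). The helper names below are hypothetical; the formulas are the ones stated in the text.

```python
def conventional_cost(length, num_states):
    """Memory units and substeps per iteration for the traditional schedule."""
    return {"memory_units": 2 * length * num_states, "substeps": 3 * length}

def mesm_cost(length, num_states):
    """Memory units and substeps per iteration for MESM."""
    return {"memory_units": length * (num_states + 1), "substeps": length}

conv = conventional_cost(15, 8)
mesm = mesm_cost(15, 8)
print(conv)  # {'memory_units': 240, 'substeps': 45}
print(mesm)  # {'memory_units': 135, 'substeps': 15}
print(mesm["memory_units"] / conv["memory_units"])  # 0.5625
```

The memory ratio is (S+1)/(2S), which approaches one half as the state space grows; for S = 8 it is 0.5625.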


In MESM, if length is odd, the metrics are updated as illustrated in FIG. 3. If the data frame length is even, an additional symbol according to the initial state of the encoder should be added at the end of the tail so that the proposed method can be applied directly. Because the turbo encoder returns to its initial state at the end of encoding for every data frame, the additional symbol will not cause any errors for the BCJR decoder.


According to the ideal BCJR algorithm, the metrics at each step are a set of numbers between 0 and negative infinity (0 to the bound of the negative numbers in practice). A metric closer to 0 means that the state is the correct state on the optimal decoding path with higher probability. If αk(s) is the maximum, αk(s)=0, s is the correct state with the highest probability at the kth step in the forward direction. If βk(s) is the maximum, βk(s)=0, s is the correct state with the highest probability at the kth step in the reverse direction.


According to an exemplary embodiment of the present invention, module normalization can be used for the metrics update. In module normalization, the metrics, for example αk, are replaced by the normalized ᾱk(s)=mod{αk(s), C}, as in FIG. 5. It should be noted that −C<ᾱk(s)<C. We can think of the quantities {ᾱk(s)} as positions on the modular circle with radius $C/\pi$, with angle $\pi\,\bar{\alpha}_k(s)/C$.




Let αk(s1) and αk(s2) be real numbers with |αk(s1)−αk(s2)|<C. Let A be the angle swept out by counter clockwise motion along the arc from αk(s2) to αk(s1), then:





αk(s1)>αk(s2), if and only if A<π.


This proposition can be interpreted in less mathematical terms when thinking of αk as runners in a race. If all the runners are running in one half the circle at all times (a very competitive race), it is very easy to determine who is the leader. Therefore, if the difference between αk is always less than half of the modular circle, the maximum of αk can be easily found regardless of the absolute position on the circle.
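The proposition can be illustrated with two's-complement integer arithmetic. The Python below is a minimal sketch, assuming an 8-bit word so that C = 2**(bits−1) = 128; the names `wrap`, `modular_greater`, and `modular_argmax` and the sample metrics are hypothetical. The comparison takes the wrapped difference of two wrapped metrics, which has the same sign as the true difference whenever the true difference is smaller than C in magnitude.

```python
def wrap(value, bits):
    """Reduce an integer to the two's-complement range [-C, C), C = 2**(bits-1)."""
    c = 1 << (bits - 1)
    return ((value + c) % (2 * c)) - c

def modular_greater(a_wrapped, b_wrapped, bits):
    """True when the unwrapped a exceeds the unwrapped b, provided |a - b| < C."""
    return wrap(a_wrapped - b_wrapped, bits) > 0

def modular_argmax(wrapped, bits):
    """Index of the largest underlying metric, using only wrapped values."""
    best = 0
    for s in range(1, len(wrapped)):
        if modular_greater(wrapped[s], wrapped[best], bits):
            best = s
    return best

BITS = 8
true_metrics = [-120, -140, -100, -130]   # spread of 40, well below C = 128
wrapped = [wrap(m, BITS) for m in true_metrics]
print(wrapped)                        # [-120, 116, -100, 126]
print(modular_argmax(wrapped, BITS))  # 2, the index of -100
```

A plain maximum over the wrapped values would pick index 3 (value 126), which is wrong; the modular comparison recovers index 2 without ever unwrapping or renormalizing the metrics.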


In module normalization, the modular arithmetic can be implemented by 2's-complement adders and subtractors. Let the 2's complement data range used in intelligent normalization be (−C, C). According to (6a) and (7a) in the log MAP algorithm, the normalization is already performed, if the following relationships are satisfied:





max Ak(s)−min Ak(s)<C, and





max Bk(s)−min Bk(s)<C.


The normalization in the BCJR can be simplified with module normalization.


If {αk−1(s′)} and {βk(s)} are obtained with module normalization, the contribution of the maximum of {αk−1(s′)} and the maximum of {βk(s)} will be maintained. The same result will be obtained with (2-3), if 2's complement calculation is used, and the following relationships are satisfied:





|max Pk+ − max Pk−| < C;





max Pk+ − min Pk+ < C; and





max Pk− − min Pk− < C,


where:





max Pk+ = max{Pk(s)} over s ∈ S+, min Pk+ = min{Pk(s)} over s ∈ S+; and





max Pk− = max{Pk(s)} over s ∈ S−, min Pk− = min{Pk(s)} over s ∈ S−.


Therefore, the procedure of finding the maximum metric is not needed. The process of BCJR at every step is simplified, and metrics update is sped up.


The penalty of module normalization is the extra bit required to perform the modular arithmetic. The bit width for module normalization can be determined from the condition |max P+ − max P−| < C in the BCJR algorithm.


According to the idea of module normalization, the metrics are moved to new places where αk(0) and βk(0) always equal 0. Thus, {α(0)}={α0(0), α1(0), . . . , αlength(0)} and {β(0)}={β0(0), β1(0), . . . , βlength(0)} need not be stored in the memory. This technique may be referred to as embedded storage. It can reduce the memory requirement.


Since αk(0) and βk(0) are always 0 during the metrics and extrinsic information calculation, the operations using them can be omitted. Thus, embedded storage will not increase the computation load.
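Embedded storage can be sketched as follows, assuming log-domain metrics held in a Python list with state 0 first. The names `embed` and `restore` are hypothetical. Shifting every metric of a step by the same amount leaves the differences between states unchanged, so the entry for state 0 can be pinned to 0 and left out of memory.

```python
def embed(metrics_log):
    """Shift the metrics so that state 0 becomes 0 and drop that entry.

    Only S - 1 values per step are returned for storage.
    """
    ref = metrics_log[0]
    return [m - ref for m in metrics_log[1:]]

def restore(stored):
    """Rebuild the S metrics of a step; the state-0 entry is implicitly 0."""
    return [0.0] + list(stored)

alpha_k = [-3.0, -1.5, 0.0, -7.25]   # made-up log-domain metrics, S = 4
stored = embed(alpha_k)
print(stored)            # [1.5, 3.0, -4.25]
print(restore(stored))   # [0.0, 1.5, 3.0, -4.25]
```

In this sketch each step stores S−1 values instead of S, and any term that involves state 0 contributes a zero that does not need to be added.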


If the sliding window method is adopted in the implementation of the BCJR algorithm, MESM can also be used. The implementation block diagram of MESM is shown in FIG. 1.


While the present invention has been described in connection with certain exemplary embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims, and equivalents thereof.

Claims
  • 1. A method of decoding a turbo code by a decoder implementing the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm, the method comprising: a) iteratively calculating forward metrics and backward metrics according to maximum a posteriori (MAP) type algorithm; b) storing each of the forward metrics as they are calculated in sequence beginning from a head position of a memory unit having a plurality of storage locations equal in number to a length of the turbo code, wherein the plurality of storage locations are initially unloaded with any of the forward metrics or the backward metrics; c) storing each of the backward metrics as they are calculated in sequence beginning from a tail position of the memory unit; d) after a number of the forward metrics and a number of the backward metrics fill up all of the initially unloaded storage locations of the memory unit, iteratively calculating a plurality of extrinsic information corresponding to the forward metrics and the backward metrics stored in the memory unit, wherein at least one of the forward metrics already used for calculating its corresponding one of the plurality of extrinsic information is overwritten in its corresponding storage location by one of the backward metrics calculated after the at least one of the forward metrics, and wherein at least one of the backward metrics already used for calculating its corresponding one of the plurality of extrinsic information is overwritten in its corresponding storage location by one of the forward metrics calculated after the at least one of the backward metrics.
  • 2. The method of claim 1, wherein an additional symbol according to an initial state of the decoder is added to the tail of the turbo code before the turbo code is decoded.
  • 3. The method of claim 1, wherein the forward metrics and backward metrics each comprise normalized metrics.
  • 4. The method of claim 1, wherein the forward metrics are calculated independently from the backward metrics.
  • 5. The method of claim 1, wherein said iteratively calculating forward metrics and backward metrics comprises calculating a plurality of system information each corresponding to a corresponding pair of one of the forward metrics and one of the backward metrics.
  • 6. The method of claim 5, wherein each of the plurality of system information is calculated once and used for the calculation of the corresponding one of the forward metrics and the corresponding one of the backward metrics.
  • 7. A method of decoding a received message comprising L symbols according to a BCJR MAP algorithm, the method comprising: calculating forward metrics αk corresponding to metrics of the MAP decoder in a forward direction at a kth step, according to an equation:
  • 8. The method of claim 7, further comprising adding an additional symbol according to an initial state of an encoder at the end of a tail of the received message when L is an even number.
  • 9. The method of claim 7, wherein: the forward metrics αk comprise normalized αk(s)=mod {αk(s), C}, wherein quantities { αk(s)} are positions on a modular circle with radius
PCT Information
Filing Document Filing Date Country Kind 371c Date
PCT/IB09/51299 3/27/2009 WO 00 9/29/2010
Provisional Applications (1)
Number Date Country
61072185 Mar 2008 US