Turbo-code decoder

Abstract
The present invention provides a turbo-code decoder that adopts a parallel and systolic array VLSI structure design. Because the output of each level is used as the input of the next level, the advantages of both parallel and pipelined calculation are obtained. The latency is only N+M+2 units of time, about ⅕ of the 5*(N+M) units of time taken by the conventional sequential calculation structure. The decoding throughput is about 5*(N+M) times that of the conventional decoder. Although the number of circuit gates is also about 5*(N+M) times that of the conventional decoder, VLSI techniques have improved steadily, so the added hardware complexity is easy to accommodate. Trading hardware cost for higher speed is expected to remain a lasting trend.
Description


BACKGROUND OF THE INVENTION

[0001] 1. Field of Invention


[0002] The present invention generally relates to a decoder, and more particularly, to a fast turbo-code decoder. The decoder is built from systolic array very-large-scale integrated (VLSI) circuits, in which the output of each level is used as the input of the next level, so that the advantages of both parallel and pipelined calculation are obtained. The decoding speed is markedly improved over that of the conventional decoder: the throughput is about 5*(N+M) times that of the conventional decoder, where N stands for the block length and M stands for the register size.


[0003] 2. Description of Related Art


[0004] Error control coding is widely used in communication systems and computer storage media. In 1993, Berrou, Glavieux and Thitimajshima first proposed the turbo-code, whose error-correcting capability approaches the Shannon limit (C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon Limited Error-correcting Coding and Decoding: Turbo-codes (1),” in Proc. ICC'93, May, 1993). Because of its excellent error-correcting capability, the turbo-code is widely applied in general communication systems such as the CDMA transmission system. However, if the transmission block length used by the conventional decoding algorithm is too small, the error-correcting capability is poor. On the other hand, if the transmission block length is too large, the decoding delay is too large to tolerate for a communication system that needs real-time processing. Therefore, solving this problem is important for meeting the requirements of current high-speed communication.



SUMMARY OF THE INVENTION

[0005] To solve the problem mentioned above, to increase the computing speed, and thus to increase the throughput, the present invention provides a structure design using a parallel and systolic array VLSI.


[0006] In the parallel and systolic array VLSI structure design mentioned above, the decoder is built from systolic array VLSI circuits. Because the output of each level is used as the input of the next level, the advantages of both parallel and pipelined calculation are obtained. The latency is only N+M+2 units of time, about ⅕ of the 5*(N+M) units of time taken by the conventional sequential calculation structure. The decoding throughput is about 5*(N+M) times that of the conventional decoder. Although the number of circuit gates is also about 5*(N+M) times that of the conventional decoder, VLSI techniques have improved steadily, so the added hardware complexity is easy to accommodate. Trading hardware cost for higher speed is expected to remain a lasting trend.


[0007] In order to achieve the objective mentioned above, the present invention uses a parallel and systolic array VLSI structure design to provide a turbo-code decoder for a communication system. The decoder comprises a serial-to-parallel output unit and a plurality of parallel decoding units. The serial-to-parallel output unit receives a serial input signal, converts it, and outputs a parallel signal. The parallel decoding units are serially connected to form a plurality of levels. The first level parallel decoding unit receives the parallel signal output from the serial-to-parallel output unit. The output of the first level parallel decoding unit is sent to the second level parallel decoding unit, and so on in sequence, so that the parallel signal passes through the parallel decoding units for the decoding process.


[0008] The turbo-code decoder mentioned above, wherein, each parallel decoding unit receives an extrinsic parameter while performing the decoding process, generates an updated extrinsic parameter as the result of its decoding process, and sends that extrinsic parameter to the next level parallel decoding unit.


[0009] The turbo-code decoder mentioned above, wherein, the extrinsic parameter is obtained from a deinterleaving operation. The extrinsic parameter of the first level parallel decoding unit is La0,k=(0, 0 . . . , 0), where k=1, 2, . . . , N, N is the block length of the turbo-code.


[0010] The turbo-code decoder mentioned above, wherein, the serial input signals are r1s,k, r1p,k, and r2p,k messages of the turbo-code, whereas k=1, 2, . . . , N, N is the block length of the turbo-code.


[0011] The turbo-code decoder mentioned above, wherein, the serial-to-parallel output unit receives the r1s,k, r1p,k, and r2p,k messages, wherein the subscript k=0, 1, . . . , N+M−1 covers the whole block and the end message, and M stands for the register size of the turbo-code decoder. The serial-to-parallel output unit converts the received r1s,k, r1p,k, and r2p,k messages and outputs the results to the first level parallel decoding unit in parallel. The first level parallel decoding unit also receives an extrinsic parameter La,k at the same time. La,k is the parameter obtained via a deinterleaving operation on the extrinsic parameter Λ(dk) of the previous level. The initial value of the extrinsic parameter of the first level decoding unit is set as La0,k=(0, 0 . . . , 0). The first level parallel decoding unit generates a first level extrinsic parameter La1,k and passes the messages r1s,k, r1p,k and r2p,k through sequentially to be the input of the next level.


[0012] The turbo-code decoder mentioned above, wherein, the parallel decoding unit comprises a first decoder, a second decoder, an interleaving unit, and a deinterleaving unit. The first decoder receives the r1s,k and r1p,k messages and the extrinsic parameter La,k. The second decoder receives the r2p,k message and the extrinsic parameter La,k. The interleaving unit is located between the first decoder and the second decoder and receives the output of the first decoder. The deinterleaving unit is connected to the second decoder and alternately outputs the outputs of the first decoder and the second decoder.


[0013] The turbo-code decoder mentioned above, wherein, the first decoder of the parallel decoding units is constructed as a systolic array VLSI circuit structure.


[0014] The turbo-code decoder mentioned above, wherein, the systolic array VLSI circuit is composed of N+M units of each of the modules C, A, B, D, and E. The module C receives La,k, r1s,k and r1p,k, and outputs γk(1)(m′,m) and γk(0)(m′,m). Module A calculates a forward recursive probability parameter αk. Module B calculates a backward recursive probability parameter βk. Module D uses (N+M) units of parallel calculation to obtain Λ(dk) after the calculation of αk, βk, and γk(i) is finished. Module E outputs the values calculated by the module D, for k=0, 1, . . . , N+M−1.


[0015] The turbo-code decoder mentioned above, wherein, the value of the Λ(dk) is calculated according to a MAP algorithm and following equation:
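$$\Lambda(d_k)=\log\frac{\sum_{m}\sum_{m'}\gamma_k^{(1)}(m',m)\cdot\alpha_{k-1}(m')\cdot\beta_k(m)}{\sum_{m}\sum_{m'}\gamma_k^{(0)}(m',m)\cdot\alpha_{k-1}(m')\cdot\beta_k(m)}$$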


[0016] Wherein, αk is the forward recursive probability parameter, βk is the backward recursive probability parameter, γk(i) is a branch probability parameter.


[0017] The turbo-code decoder mentioned above, wherein, the forward recursive probability parameter αk is obtained from the calculation of the previous parameter αk−1 and the branch probability parameter γk(i), the equation is as follows:
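$$\alpha_k(m)=\frac{\sum_{m'}\sum_{i=0}^{1}\gamma_k^{(i)}(m',m)\cdot\alpha_{k-1}(m')}{\sum_{m}\sum_{m'}\sum_{i=0}^{1}\gamma_k^{(i)}(m',m)\cdot\alpha_{k-1}(m')}$$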


[0018] The turbo-code decoder mentioned above, wherein, the backward recursive probability parameter βk is obtained from the calculation of the next parameter βk+1 and the branch probability parameter γk(i), the equation is as follows:
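$$\beta_k(m)=\frac{\sum_{m'}\sum_{i=0}^{1}\gamma_{k+1}^{(i)}(m,m')\cdot\beta_{k+1}(m')}{\sum_{m}\sum_{m'}\sum_{i=0}^{1}\gamma_{k+1}^{(i)}(m,m')\cdot\beta_{k+1}(m')}$$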


[0019] The turbo-code decoder mentioned above, wherein, the branch probability parameter γk(i) is obtained from following equation according to the MAP algorithm:


$$\gamma_k^{(i)}(m',m)=p(r_{1s,k}\mid d_k=i,s_k=m,s_{k-1}=m')\cdot p(r_{1p,k}\mid d_k=i,s_k=m,s_{k-1}=m')\cdot q(d_k=i\mid s_k=m,s_{k-1}=m')\cdot\Pr\{s_k=m\mid s_{k-1}=m'\}$$


[0020] Wherein, the probability parameter q(dk=i|sk=m,sk−1=m′) is either 0 or 1, depending on whether the input bit dk=i (0 or 1) is the one associated with the transition from the state m′ to the state m.







BRIEF DESCRIPTION OF THE DRAWINGS

[0021] The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the invention, and together with the description, serve to explain the principles of the invention. In the drawings,


[0022]
FIG. 1 schematically shows a turbo-code encoder comprising two parallel RSC encoders;


[0023]
FIG. 2 schematically shows the decoding structure of the turbo-code;


[0024]
FIG. 3 schematically shows the structure of the P levels parallel decoding unit (Level 1, Level 2, . . . , Level P);


[0025]
FIG. 4 schematically shows the structure of the first level decoding unit of the parallel decoding units in FIG. 3;


[0026]
FIG. 5 schematically shows the structure of the systolic array VLSI that makes up the first level decoding unit of the parallel decoding unit in FIG. 4;


[0027]
FIG. 6 schematically shows the structure of the simplified modules, data streams, and the latches of the parallel decoding units in FIG. 3 when N=4 and M=3;


[0028]
FIG. 7 schematically shows the calculation structure of the branch probability parameter γk(i)(m′, m);


[0029]
FIG. 8 schematically shows the structure of module A for calculating αk;


[0030]
FIG. 9 schematically shows the structure of module B for calculating βk;


[0031]
FIG. 10 schematically shows the structure of module D for calculating Λ(dk);


[0032]
FIG. 11 schematically shows the structure of the calculation submodule L (using analog circuit);


[0033]
FIG. 12 schematically shows the structure of the fast RSC encoder, wherein, Gb=1011, Gd=1110;


[0034]
FIG. 13 schematically shows the trellis diagram;


[0035]
FIG. 14 schematically shows the detail structure of module A (wherein the submodule L is designed as the digital circuit);


[0036]
FIG. 15 schematically shows the detail structure of module D;


[0037]
FIG. 16 schematically shows the latency for accomplishing a message having a block size length; and


[0038]
FIG. 17 schematically shows the comparison of the bit error rate, wherein, the iterative decoding number P=6, code ratio R=1/3, register size M=3, generator parameter Gb=1011, Gd=1110, the 256*256 random interleaving method is adopted by the first decoder and the second decoder.







DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0039] The present invention provides a structure design adopting a parallel and systolic array VLSI, in which the decoder is built from systolic array VLSI circuits. Because the output of each level is used as the input of the next level, the advantages of both parallel and pipelined calculation are obtained. The latency is only N+M+2 units of time, about ⅕ of the 5*(N+M) units of time taken by the conventional sequential calculation structure. The decoding throughput is about 5*(N+M) times that of the conventional decoder. Although the number of circuit gates is also about 5*(N+M) times that of the conventional decoder, VLSI techniques have improved steadily, so the added hardware complexity is easy to accommodate. Trading hardware cost for higher speed is expected to remain a lasting trend.


[0040] In 1993, Berrou, Glavieux and Thitimajshima first proposed the turbo-code, whose error-correcting capability approaches the Shannon limit (C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon Limited Error-correcting Coding and Decoding: Turbo-codes (1),” in Proc. ICC'93, May, 1993). The encoding structure comprises two parallel recursive systematic convolutional (hereafter abbreviated as RSC) encoders. Its important characteristics are: (1) two convolutional codes with the same structure encode in parallel, so the receiving end is able to decode the message iteratively; (2) non-uniform random interleaving is used to increase the minimum distance between the two codes (S. Benedetto and G. Montorsi: “Role of Recursive Convolutional Codes in Turbo Codes,” Electron. Lett., Vol.31, No.11, pp. 858-859, 1995); and (3) soft-in soft-out decoding.


[0041] Because of the characteristics mentioned above, the error-correcting capability is excellent. For this reason, the turbo-code is widely applied in general communication systems such as the CDMA transmission system (J. Blanz, P. Jung, and M. Naßhan, “Realistic Simulations of CDMA Mobile Radio Systems Using Joint Detection and Coherent Receiver Antenna Diversity,” IEEE Third International Symposium on Spread Spectrum Techniques and Applications, Oulu, Finland, 1994).


[0042] FIG. 1 schematically shows a turbo-code encoder comprising two parallel RSC encoders. The input bit sequence is represented as d=(d1, d2, d3, . . . , dk, . . . , dN), where dk is the input bit of the encoder at time k, k runs from 1 to N, and N is the block size. The output of the encoder at time k is represented as ck=(xk,y1k,y2k). Since the encoder is systematic, xk=dk, and the redundant (parity) outputs are y1k and y2k. The decoding structure of the turbo-code is shown in FIG. 2. The decoder 200 comprises two recursive decoding units 210 and 220, which are connected through the interleaving and deinterleaving units shown as 212, 214 and 216 in the diagram.


[0043] It is assumed that the noise in the communication channel is Gaussian. It is further assumed that the noise on each transmitted symbol is independent, with an expectation value of 0 and a variance of N0/2. With binary modulation, if the input bit dk is 0, the modulated value is −1.0; if the input bit dk is 1, the modulated value is +1.0. Therefore, the received sequence R is represented as R=(r1, r2, r3, . . . , rk, . . . , rN), and the kth symbol is represented as




$$r_k=(r_{1s,k},\,r_{1p,k},\,r_{2p,k})=(2x_k-1+n_{1s,k},\;2y_{1k}-1+n_{1p,k},\;2y_{2k}-1+n_{2p,k})$$
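The received-symbol model can be illustrated with a minimal Python sketch. The function names `modulate` and `received_symbol` and the use of Python's standard `random` module are illustrative assumptions, not part of the disclosed decoder. The noise standard deviation `sigma` is assumed to equal the square root of N0/2.

```python
import random

def modulate(bit):
    """Binary modulation from the text: bit 0 -> -1.0, bit 1 -> +1.0 (2*bit - 1)."""
    return 2.0 * bit - 1.0

def received_symbol(x, y1, y2, sigma, rng):
    """Return (r1s, r1p, r2p) for one encoder output (x, y1, y2).

    Each component gets independent zero-mean Gaussian noise with standard
    deviation sigma (assumed here to be sqrt(N0 / 2)).
    """
    return tuple(modulate(b) + rng.gauss(0.0, sigma) for b in (x, y1, y2))

if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so repeated runs give the same noise
    print(received_symbol(1, 0, 1, sigma=0.5, rng=rng))
```

With `sigma=0.0` the function returns the noiseless modulated values, which is a convenient sanity check.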



[0044] Wherein, n1s,k, n1p,k, and n2p,k are the noise on the channels r1s, r1p, and r2p at time k respectively, and they are independent of each other. The details of the Maximum A Posteriori (hereafter abbreviated as MAP) algorithm proposed by Bahl, Cocke, Jelinek, and Raviv (BCJR) (L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate,” IEEE Trans. Inform. Theory, Vol.20, pp.284-287, March 1974) are not repeated here; only the result of the MAP algorithm is described. The objective of the MAP algorithm is to calculate, for each input bit dk, the log ratio of the A Posteriori Probability (hereafter abbreviated as APP) that the bit is 1 to the APP that it is 0, wherein k=0, 1, 2, . . . , N−1. From the derivation given by Berrou, Glavieux and Thitimajshima for the turbo-code mentioned above, the following equation is obtained:
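$$\Lambda(d_k)=\log\frac{\sum_{m}\sum_{m'}\gamma_k^{(1)}(m',m)\cdot\alpha_{k-1}(m')\cdot\beta_k(m)}{\sum_{m}\sum_{m'}\gamma_k^{(0)}(m',m)\cdot\alpha_{k-1}(m')\cdot\beta_k(m)}\tag{1}$$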


[0045] Wherein, αk is the forward recursive probability parameter, βk is the backward recursive probability parameter, γk(i) is the branch probability parameter. As we can see from the name, the forward recursive probability parameter αk can be obtained from the calculation of the previous parameter αk−1 and the branch probability parameter γk(i), the equation is as follows:
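$$\alpha_k(m)=\frac{\sum_{m'}\sum_{i=0}^{1}\gamma_k^{(i)}(m',m)\cdot\alpha_{k-1}(m')}{\sum_{m}\sum_{m'}\sum_{i=0}^{1}\gamma_k^{(i)}(m',m)\cdot\alpha_{k-1}(m')}\tag{2}$$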


[0046] The backward recursive probability parameter βk can be obtained from the calculation of the next parameter βk+1 and the branch probability parameter γk+1(i), the equation is as follows:
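$$\beta_k(m)=\frac{\sum_{m'}\sum_{i=0}^{1}\gamma_{k+1}^{(i)}(m,m')\cdot\beta_{k+1}(m')}{\sum_{m}\sum_{m'}\sum_{i=0}^{1}\gamma_{k+1}^{(i)}(m,m')\cdot\beta_{k+1}(m')}\tag{3}$$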


[0047] The branch probability parameter γk(i) is obtained from following equation according to the MAP algorithm:


$$\gamma_k^{(i)}(m',m)=p(r_{1s,k}\mid d_k=i,s_k=m,s_{k-1}=m')\cdot p(r_{1p,k}\mid d_k=i,s_k=m,s_{k-1}=m')\cdot q(d_k=i\mid s_k=m,s_{k-1}=m')\cdot\Pr\{s_k=m\mid s_{k-1}=m'\}\tag{4}$$


[0048] Wherein, the probability parameter q(dk=i|sk=m,sk−1=m′) is either 0 or 1, depending on whether the input bit dk=i (0 or 1) is the one associated with the transition from the state m′ to the state m.
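The forward recursion, the backward recursion, and the log ratio Λ(dk) can be sketched sequentially in Python as follows. This is only a software illustration under stated assumptions, not the disclosed hardware: the function name `map_llr`, the nested-list layout `gamma[k][i][m_prev][m]`, and the caller-supplied boundary values `alpha0` and `beta_end` are all hypothetical. List index `k` (0-based) corresponds to time k+1 in the equations.

```python
import math

def map_llr(gamma, alpha0, beta_end):
    """Sequential sketch of the MAP recursions.

    gamma[k][i][mp][m] : branch probability for time k+1, input bit i,
                         transition from state mp to state m (0 if invalid).
    alpha0             : initial forward values alpha_0(m)       (assumption).
    beta_end           : final backward values beta_{N+M}(m)     (assumption).
    Returns the list of log ratios, one per time step.
    """
    num_steps = len(gamma)
    states = range(len(alpha0))

    # Forward recursion: alpha_k from alpha_{k-1} and gamma_k, then normalize.
    alpha = [list(alpha0)]
    for k in range(num_steps):
        g0, g1 = gamma[k]
        raw = [sum((g0[mp][m] + g1[mp][m]) * alpha[k][mp] for mp in states)
               for m in states]
        total = sum(raw)
        alpha.append([v / total for v in raw])

    # Backward recursion: beta_k from beta_{k+1} and gamma_{k+1}, then normalize.
    beta = [None] * (num_steps + 1)
    beta[num_steps] = list(beta_end)
    for k in range(num_steps - 1, -1, -1):
        g0, g1 = gamma[k]
        raw = [sum((g0[m][mp] + g1[m][mp]) * beta[k + 1][mp] for mp in states)
               for m in states]
        total = sum(raw)
        beta[k] = [v / total for v in raw]

    # Log ratio: combine gamma_k, alpha_{k-1} and beta_k for input bit 1 vs 0.
    llr = []
    for k in range(num_steps):
        g0, g1 = gamma[k]
        num = sum(g1[mp][m] * alpha[k][mp] * beta[k + 1][m]
                  for mp in states for m in states)
        den = sum(g0[mp][m] * alpha[k][mp] * beta[k + 1][m]
                  for mp in states for m in states)
        llr.append(math.log(num / den))
    return llr
```

A positive returned value corresponds to the decision dk=1 and a non-positive value to dk=0. The sketch does not guard against a zero numerator or denominator, which a practical implementation would need to handle.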


[0049] In a sequential calculation decoder, it is assumed that each Λ(dk) in equation (1) needs one unit of time, wherein k is from 0 to N+M−1, N stands for the block length of the transmission, and M stands for the register size of the decoder. It is further assumed that each of αk, βk, and γk(i) in equations (2), (3), and (4) needs one unit of time, wherein i=0 or 1. Counting Λ(dk), αk, βk, γk(0), and γk(1) for each of the N+M values of k, the first level decoder therefore needs 5*(N+M) units of time. With decoding algorithms such as the Viterbi algorithm (A. J. Viterbi, “Error Bound for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm,” IEEE Trans. Inform. Theory, vol.IT-13, pp.260-269, April 1967; A. J. Viterbi and J. K. Omura, “Principles of digital communication and coding,” New York: McGraw-Hill, 1979) or the BCJR algorithm mentioned above, if N is too small, the error-correcting capability is poor. However, if N is too large, the decoding delay is too large to tolerate for a communication system that needs real-time processing.


[0050] As mentioned in the previous paragraph, the decoding algorithm decides dk from the value of Λ(dk) in equation (1): if Λ(dk)>0, dk=1; otherwise, dk=0. To calculate each Λ(dk) in equation (1), the αk, βk, and γk(i) in equations (2), (3), and (4) must be calculated first. A sequential calculation decoder needs 5*(N+M) units of time for this (G. Masera, G. Piccinini, M. R. Roch, and M. Zamboni, “VLSI Architectures for Turbo Codes,” IEEE Trans. on VLSI Systems, vol.7, no.3, pp. 369-379, September 1999).


[0051] In order to increase the calculation speed and thus the throughput, a preferred embodiment of the present invention adopts the parallel and systolic array VLSI structure design. The whole decoder circuit is composed of P levels of parallel decoding units. The structure is shown in FIG. 3. A serial-in parallel-out unit before the first level receives the messages r1s,k, r1p,k and r2p,k, wherein the subscript k=0, 1, . . . , N+M−1 covers the whole block and the end message. Its output is sent to the first level decoding unit. The other input of the first level decoding unit is La,k, the parameter obtained by deinterleaving the extrinsic parameter Λ(dk) of the previous level; the initial value of the 0th level extrinsic parameter is set as La0,k=(0, 0 . . . , 0). The first level decoding unit generates the first level extrinsic parameter La1,k, and the messages r1s,k, r1p,k, and r2p,k are passed through sequentially to be the input of the next level.
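The data flow through the P levels can be sketched in Python as follows. This is a hedged illustration only: `run_levels` and the caller-supplied `decode_level` callable are hypothetical names, and the loop runs the levels one after another in software, whereas the hardware levels operate as a pipeline. The sketch shows only that every level sees the same received messages and that the extrinsic parameter starts at all zeros and is handed from level to level.

```python
from typing import Callable, List, Sequence, Tuple

Symbol = Tuple[float, float, float]  # (r1s, r1p, r2p) for one index k

def run_levels(
    received: Sequence[Symbol],
    num_levels: int,
    decode_level: Callable[[Sequence[Symbol], List[float]], List[float]],
) -> List[List[float]]:
    """Chain num_levels decoding levels and return each level's extrinsic output.

    decode_level is a stand-in for one parallel decoding unit (hypothetical):
    it takes the received messages and the incoming extrinsic parameters and
    returns the updated extrinsic parameters.
    """
    extrinsic = [0.0] * len(received)   # La0,k = (0, 0, ..., 0)
    history = []
    for _ in range(num_levels):
        # The received messages pass through unchanged; only the extrinsic
        # parameter is updated from one level to the next.
        extrinsic = decode_level(received, extrinsic)
        history.append(list(extrinsic))
    return history
```

Passing a real per-level decoder as `decode_level` would give the extrinsic parameters after each of the P iterations; the last entry would then drive the final decision on dk.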


[0052] Each level of the decoding unit comprises two decoders, the first decoder and the second decoder, as shown in FIG. 4; the structure of the first decoder is similar to that of the second decoder. The whole systolic array VLSI structure is shown in FIG. 5. N and M can be adjusted according to the design requirement. For ease of description, the block length N=4 and the register size M=3 are used as an example. FIG. 6 schematically shows the structure of the simplified modules, data streams, and the latches. It is apparent to those skilled in the art that this embodiment is only an example and does not limit the scope of application of the present invention.


[0053] According to the literature (I. L. Turner, “A Modified BAHL Algorithm for Recursive System Convolutional Codes on Rayleigh Fading Channels,” IEEE 49th Vehicular Technology Conference, pp.75-76 vol. 1, 1999), the a priori probability of the input bit dk calculated by the previous level decoder is represented as
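$$\Pr\{s_k=m\mid s_{k-1}=m'\}=\frac{e^{L(d_k)}}{1+e^{L(d_k)}},\quad\text{if } q(d_k=1\mid s_k=m,s_{k-1}=m')=1\tag{5}$$
$$\Pr\{s_k=m\mid s_{k-1}=m'\}=1-\frac{e^{L(d_k)}}{1+e^{L(d_k)}}=\frac{1}{1+e^{L(d_k)}},\quad\text{if } q(d_k=0\mid s_k=m,s_{k-1}=m')=1\tag{6}$$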


[0054] Wherein, L(dk) is the log likelihood ratio (LLR) extrinsic parameter calculated for the message bit dk by the previous level decoder. Assuming an AWGN channel, the conditional probabilities in equation (4) are calculated as follows:
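$$p(r_{1s,k}\mid d_k=i,s_k=m,s_{k-1}=m')=\frac{1}{\sqrt{2\pi}\,\sigma_{r_{1s}}}\exp\left[-\frac{(r_{1s,k}-\mu_{r_{1s}})^2}{2\sigma_{r_{1s}}^2}\right]\tag{7}$$
$$p(r_{1p,k}\mid d_k=i,s_k=m,s_{k-1}=m')=\frac{1}{\sqrt{2\pi}\,\sigma_{r_{1p}}}\exp\left[-\frac{(r_{1p,k}-\mu_{r_{1p}}(m',m))^2}{2\sigma_{r_{1p}}^2}\right]\tag{8}$$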


[0055] Wherein, μr1s and μr1p(m′,m) are the expectation values of r1s and r1p respectively. μr1s depends on the input bit, while μr1p(m′,m) depends on the input bit and also on the previous state and the current state. σr1s² and σr1p² are the variances of r1s and r1p respectively. It is assumed that the variances of r1s and r1p are the same, denoted σ². Therefore, the above two equations can be multiplied and consolidated as follows:
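$$p(r_{1s,k}\mid d_k=i,s_k=m,s_{k-1}=m')\cdot p(r_{1p,k}\mid d_k=i,s_k=m,s_{k-1}=m')=\frac{1}{2\pi\sigma^2}\exp\left[-\frac{1}{2}\cdot\frac{(r_{1s,k}-\mu_{r_{1s}})^2+(r_{1p,k}-\mu_{r_{1p}}(m',m))^2}{\sigma^2}\right]\tag{9}$$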


[0056] For a discrete memoryless Gaussian channel, the branch probability parameters γk(1) and γk(0), for an input bit of 1 or 0 respectively, can be calculated from equations (4), (5), (6), and (9) as follows:
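$$\gamma_k^{(1)}(m',m)=\frac{1}{2\pi\sigma^2}\exp\left[-\frac{1}{2}\cdot\frac{(r_{1s,k}-1)^2+(r_{1p,k}-\mu_{r_{1p}}(m',m))^2}{\sigma^2}\right]\cdot\frac{e^{L(d_k)}}{1+e^{L(d_k)}}\tag{10}$$
$$\gamma_k^{(0)}(m',m)=\frac{1}{2\pi\sigma^2}\exp\left[-\frac{1}{2}\cdot\frac{(r_{1s,k}+1)^2+(r_{1p,k}-\mu_{r_{1p}}(m',m))^2}{\sigma^2}\right]\cdot\frac{1}{1+e^{L(d_k)}}\tag{11}$$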


[0057] According to equations (10) and (11), the branch probability parameters γk(i)(m′,m) can be calculated in parallel. N+M units of the module C (as shown in FIG. 7) are used to calculate all γk(i)(m′, m) in parallel, so that N+M units of time are shortened to one unit of time. The input signals of the k-th module C in FIG. 7 are La,k, r1s,k and r1p,k, wherein k=1, . . . , N+M. The module C calculates γk(1)(m′,m) and γk(0)(m′,m).
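The calculation performed by one module C for a single valid transition can be sketched in Python as follows. The function name `branch_metric` and its argument names are illustrative assumptions; `mu_r1p` stands for the expected parity value μr1p(m′,m) of the transition being evaluated and `llr` for the extrinsic input L(dk). The direct use of `math.exp(llr)` mirrors equations (5) and (6) but would overflow for very large `llr`, which a practical design would have to avoid.

```python
import math

def branch_metric(i, r1s, r1p, mu_r1p, sigma, llr):
    """Branch probability for input bit i (0 or 1) on one valid transition.

    r1s, r1p : received systematic and parity values at time k.
    mu_r1p   : expected parity value (+1.0 or -1.0) for this transition.
    sigma    : common noise standard deviation of r1s and r1p.
    llr      : a priori log likelihood ratio L(d_k) from the previous level.
    """
    mu_r1s = 1.0 if i == 1 else -1.0                 # modulated systematic bit
    if i == 1:
        prior = math.exp(llr) / (1.0 + math.exp(llr))   # equation (5)
    else:
        prior = 1.0 / (1.0 + math.exp(llr))             # equation (6)
    distance = (r1s - mu_r1s) ** 2 + (r1p - mu_r1p) ** 2
    likelihood = math.exp(-0.5 * distance / sigma ** 2) / (2.0 * math.pi * sigma ** 2)
    return likelihood * prior
```

As a consistency check, for two branches with the same `mu_r1p` the log of the ratio of the bit-1 metric to the bit-0 metric reduces to `llr + 2 * r1s / sigma**2`, the a priori term plus the channel term for the systematic bit.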


[0058] In addition, the forward recursive probability parameter αk output from one level is the input of the next level, and the backward recursive probability parameter βk output from one level is the input of the preceding level, so the calculation is well suited to a systolic array VLSI design that increases the calculation speed. According to equation (2), N+M units of module A (as shown in FIG. 8) are used to calculate αk. The first level uses the inputs γ1(1)(m′,m) and γ1(0)(m′,m) and the initial value of the forward recursive probability parameter α0(m) to calculate α1(m). The second level uses the inputs γ2(1)(m′,m), γ2(0)(m′,m), and α1(m) to calculate α2(m). Thus, the levels of the systolic array are able to work simultaneously. All αk(m), wherein k=1, . . . , N+M, can be calculated after N+M units of time.


[0059] According to equation (3), N+M units of module B (as shown in FIG. 9) are used to calculate βk. The first level uses the inputs γN+M(1)(m′,m) and γN+M(0)(m′,m) and the initial value of the backward recursive probability parameter βN+M(m) to calculate βN+M−1(m). The second level uses the inputs γN+M−1(1)(m′,m), γN+M−1(0)(m′,m), and βN+M−1(m) to calculate βN+M−2(m). The advantage is that every module has the same structure and the output of each level is the input of the next level. Thus, the throughput is (N+M) times the original throughput.


[0060] When the calculation of αk, βk and γk(i) is completed, N+M units of module D (as shown in FIG. 10) are used to calculate Λ(dk) according to equation (1). By using parallel calculation, N+M units of time are shortened to one unit of time.


[0061] The submodule L, located between the module A and the module B, calculates the product-sum of two inputs. In the example shown in FIG. 11, the submodule L adopts an analog circuit provided by the conventional technique. The analog circuits proposed in the following references can also be used: H.-A. Loeliger, F. Lustenberger, F. Tarkoy, M. Helfenstein, “Decoding in Analog VLSI,” IEEE Communications Magazine, Vol.37(4), pp.99-101, April 1999; H.-A. Loeliger, F. Lustenberger, M. Helfenstein, F. Tarkoy, “Probability Propagation and Decoding in Analog VLSI,” IEEE Trans. on Information Theory, Vol.47(2), pp.837-843, February 2001; and F. Lustenberger, M. Helfenstein, H.-A. Loeliger, F. Tarkoy, G. S. Moschytz, “An Analog VLSI Decoding Technique for Digital Codes,” ISCAS '99, Proceedings of the 1999 IEEE International Symposium on Circuits and Systems, Vol. 2, pp.424-427, 1999.


[0062] To describe the detailed structure of the modules A, B, and D mentioned above, the preferred embodiment of the present invention uses the turbo-code of the third generation CDMA mobile communication standard as an example. However, this example does not limit the scope of application of the present invention. The turbo-code of the third generation CDMA mobile communication standard has a register size M=3. For the first decoder and the second decoder, the code ratio is R=1/3, and the parameters of the feedback generator and the direct-feed-forward generator are Gb=1011 and Gd=1110 respectively. The recursive systematic convolutional (RSC) encoder is shown in FIG. 12. The RSC encoder adopts the fast RSC encoder; for details of the fast RSC encoder, please refer to the “Fast Turbo-code Encoder” proposed by the same inventor of the present invention in April, 2001. The trellis diagram is shown in FIG. 13.
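For orientation, a conventional (not "fast") RSC encoder with M=3 can be sketched in Python as follows. Several points are assumptions made only for this illustration and are not stated in the text: the leftmost digit of Gb and Gd is taken as the tap on the current register input and the remaining digits as taps on the three delay elements in order; the register starts in the all-zero state; and no tail bits are appended. The function name `rsc_encode` is hypothetical, and the sketch does not reproduce the fast RSC encoder of FIG. 12.

```python
def rsc_encode(bits, gb=(1, 0, 1, 1), gd=(1, 1, 1, 0)):
    """Return a list of (systematic, parity) pairs for the input bits.

    gb : feedback generator taps   (assumed order: current, D, D^2, D^3).
    gd : feed-forward generator taps (same assumed order).
    """
    reg = [0, 0, 0]            # contents of the three delay elements, newest first
    out = []
    for d in bits:
        # Feedback: the register input is the data bit XOR the tapped state bits.
        a = d
        for tap, s in zip(gb[1:], reg):
            a ^= tap & s
        # Feed-forward: the parity bit is the XOR of the tapped register input and state.
        y = gd[0] & a
        for tap, s in zip(gd[1:], reg):
            y ^= tap & s
        out.append((d, y))     # systematic output equals the input bit
        reg = [a] + reg[:-1]   # shift the register by one position
    return out

if __name__ == "__main__":
    print(rsc_encode([1, 0, 0, 0]))
```

Under these assumptions, each (state, input bit) pair determines one next state and one parity bit, which is the information a trellis diagram such as FIG. 13 tabulates.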


[0063] FIG. 6 schematically shows the structure of the simplified modules, data streams, and the latches when the block length N=4 and the register size M=3. There are N+M=7 units of each of the modules A, B, C, and D. In the first unit of time, the parallel input signals La,k, r1s,k and r1p,k, k=1, 2, . . . , 7, are used simultaneously to calculate γ1(i), γ2(i), . . . , γ7(i). In the following 7 units of time, α1, α2, . . . , α6 and β1, β2, . . . , β6 are calculated. In one further unit of time, according to equation (1), the parallel inputs γk(1)(m′,m), γk(0)(m′,m), αk−1 and βk are used to calculate Λ(dk). Λ(dk) is used as the extrinsic parameter of the next level; when the last level is reached, dk is determined accordingly: if Λ(dk)>0, dk=1; otherwise dk=0.


[0064] Based on the trellis diagram of FIG. 13, the structure of the modules A, B, and D is easily simplified. FIG. 14 schematically shows the detailed structure of the module A based on this design. The detailed structure of the module B is similar to that of the module A. The detailed structure of the module D is shown in FIG. 15.


[0065] As shown in FIG. 16, the latency for decoding one block of a message with the parallel and systolic array VLSI structure design of the preferred embodiment according to the present invention is N+M+2 units of time. Compared with the conventional sequential calculation structure, which needs 5*(N+M) units of time, the time is shortened to about ⅕. Furthermore, the systolic array VLSI structure design is able to generate a set of dk in every unit of time after the first set of dk is generated.


[0066] The performance comparison is shown in table 1:
TABLE 1. The structure comparison of the systolic array and the sequential type

Item | Sequential Structure | Systolic Array Structure | Pro and Con
Latency | 5*(N + M) | (N + M) + 2 | The latency is about ⅕
Output Time | 5*(N + M) | 1 | The throughput is about 5*(N + M) times
Number of Hardware Gate | 1 | 5*(N + M) | The complexity of the circuit is about 5*(N + M) times
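The latency and output-time entries of Table 1 can be checked with a few lines of arithmetic. The Python sketch below uses hypothetical function names and assumes, as the description states, that the systolic structure delivers one further set of dk per unit of time after the first set, while the sequential structure needs a full 5*(N+M) units for every block.

```python
def sequential_latency(n, m):
    """Units of time for one block in the sequential structure."""
    return 5 * (n + m)

def systolic_latency(n, m):
    """Units of time for the first block in the systolic structure:
    1 unit for the branch parameters, N+M for the recursions, 1 for the log ratio."""
    return (n + m) + 2

def sequential_total(n, m, blocks):
    """Total time for several blocks, one after another."""
    return blocks * sequential_latency(n, m)

def systolic_total(n, m, blocks):
    """Total time for several blocks when one result set emerges per unit of time."""
    return systolic_latency(n, m) + (blocks - 1)

if __name__ == "__main__":
    n, m = 4, 3                                   # the example of FIG. 6
    print(sequential_latency(n, m))               # 35
    print(systolic_latency(n, m))                 # 9
    print(sequential_total(n, m, 10), systolic_total(n, m, 10))   # 350 18
```

For N=4 and M=3 the ratio is 9/35, roughly 0.26; as N grows the ratio (N+M+2)/(5*(N+M)) approaches ⅕, which matches the "about ⅕" figure in the table.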


[0067] To demonstrate the error-correcting performance of the preferred embodiment according to the present invention, the CDMA mobile communication system mentioned above is used as an example. The RSC encoder with register size M=3 is shown in FIG. 12. The trellis diagram is shown in FIG. 13. The iterative decoding number is P=6. Random interleaving is adopted between the first decoder and the second decoder. The simulation result is shown in FIG. 17, wherein the block length is N=65536, the vertical axis is the decoding performance expressed as the bit error rate (BER), and the horizontal axis is the communication environment expressed as the signal/noise ratio. At the same signal/noise ratio, the larger the iterative decoding number, the better the decoding performance. This agrees with the theory and is similar to the simulation results disclosed in the literature: C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon Limited Error-correcting Coding and Decoding: Turbo-codes (1),” in Proc. ICC'93, May, 1993, and P. Robertson, “Illuminating the Structure of Code and Decoder of Parallel Concatenated Recursive Systematic (Turbo) Codes,” in Proc. IEEE GLOBECOM Conf., San Francisco, Calif., pp. 1298-1303, December 1994.


[0068] The simulation is written in the C programming language and runs on a personal computer with a GenuineIntel Pentium® III CPU and 128 MB of RAM, under the Windows Me® operating system. For the bit error rate comparison shown in FIG. 17, the iterative decoding number is p=1, . . . , 6, the code ratio is R=1/3, the register size is M=3, the generator parameters are Gb=1011 and Gd=1110, and the 256*256 random interleaving and deinterleaving method is used.


[0069] The present invention provides a fast turbo-code decoder built from systolic array VLSI circuits. Because the output of each level is used as the input of the next level, the advantages of both parallel and pipelined calculation are obtained. The latency is only N+M+2 units of time, about ⅕ of the 5*(N+M) units of time taken by the conventional sequential calculation structure. The decoding throughput is about 5*(N+M) times that of the conventional decoder. Although the number of circuit gates is also about 5*(N+M) times that of the conventional decoder, VLSI techniques have improved steadily, so the added hardware complexity is easy to accommodate. Trading hardware cost for higher speed is expected to remain a lasting trend.


[0070] Although the invention has been described with reference to a particular embodiment thereof, it will be apparent to one of ordinary skill in the art that modifications to the described embodiment may be made without departing from the spirit of the invention. Accordingly, the scope of the invention will be defined by the attached claims and not by the above detailed description.


Claims
  • 1. A turbo-code decoder for communication system, the decoder comprising: a serial-to-parallel output unit, used to receive a serial input signal and output a parallel signal after converting the serial input signal; and a plurality of parallel decoding units, wherein the parallel decoding units are serially connected to form a plurality of levels, the first level parallel decoding unit receives the parallel signal that is output from the serial-to-parallel output unit, the output from the first level parallel decoding unit is sent to the second level parallel decoding unit, with certain sequence, the parallel signal passes through the parallel decoding units for decoding process.
  • 2. The turbo-code decoder of claim 1, wherein each of the parallel decoding units receives an extrinsic parameter when performing the decoding process, the extrinsic parameter being the signal produced by the decoding process of the parallel decoding unit, and sends the extrinsic parameter to the next level of the parallel decoding units.
  • 3. The turbo-code decoder of claim 2, wherein the extrinsic parameter is obtained from a deinterleaving operation, and the extrinsic parameter of the first level parallel decoding unit is La0,k=(0, 0, . . . , 0), where k=1, 2, . . . , N, and N is the block length of the turbo-code.
  • 4. The turbo-code decoder of claim 1, wherein the serial input signal comprises the r1s,k, r1p,k, and r2p,k messages of the turbo-code, where k=1, 2, . . . , N, and N is the block length of the turbo-code.
  • 5. The turbo-code decoder of claim 4, wherein the serial-to-parallel output unit receives the r1s,k, r1p,k, and r2p,k messages, wherein the subscript k=0, 1, . . . , N+M−1 represents the whole block and an end message, and M stands for a total number of latch units of the turbo-code decoder; the serial-to-parallel output unit converts the received r1s,k, r1p,k, and r2p,k messages and outputs the results to the first level parallel decoding unit in parallel; the first level parallel decoding unit also receives an extrinsic parameter La,k at the same time, the parameter La,k being obtained via a deinterleaving operation on the previous level extrinsic parameter Λ(dk); the initial value of the first level decoding unit extrinsic parameter is set as La0,k=(0, 0, . . . , 0); a first level extrinsic parameter La1,k is generated via the first level parallel decoding unit; and the messages r1s,k, r1p,k, and r2p,k pass through sequentially to be the input of the next level.
  • 6. The turbo-code decoder of claim 5, wherein the parallel decoding unit comprises: a first decoder, used to receive the r1s,k and r1p,k messages and the extrinsic parameter La,k; a second decoder, used to receive the r2p,k message and the extrinsic parameter La,k; an interleaving unit, located between the first decoder and the second decoder, used to receive the output of the first decoder; and a deinterleaving unit, connected to the second decoder, which alternately outputs the output of the first decoder and the output of the second decoder.
  • 7. The turbo-code decoder of claim 6, wherein the first decoder of the parallel decoding units constitutes a systolic array very large scale integrated (VLSI) circuit structure.
  • 8. The turbo-code decoder of claim 7, wherein the systolic array VLSI circuit is composed of N+M units of the modules C, A, B, D, and E, wherein the module C receives La1,k, r1s,k, and r1p,k, and outputs γk(1)(m′,m) and γk(0)(m′,m), the module A calculates a forward recursive probability parameter αk, the module B calculates a backward recursive probability parameter βk, the module D adopts (N+M) units of parallel calculation to obtain the Λ(dk) after the calculations of αk, βk, and γk(i) are finished, and the module E outputs the value calculated by the module D, where k=1, 2, . . . , N+M.
  • 9. The turbo-code decoder of claim 8, wherein the value of the Λ(dk) is calculated according to a MAP algorithm and the following equation:
  • 10. The turbo-code decoder of claim 9, wherein the forward recursive probability parameter αk is obtained from the calculation of the previous parameter
  • 11. The turbo-code decoder of claim 9, wherein the backward recursive probability parameter βk is obtained from the calculation of the next parameter βk+1 and the branch probability parameter γk+1(i), and the equation is as follows:
  • 12. The turbo-code decoder of claim 9, wherein the branch probability parameter γk(i) is obtained from the following equation according to the MAP algorithm:
  • 13. The turbo-code decoder of claim 11, wherein, assuming an AWGN channel, the probability is calculated as follows:
  • 14. The turbo-code decoder of claim 12, wherein, assuming that the variances of r1s and r1p are the same, the above two equations can be multiplied and consolidated as follows:
  • 15. The turbo-code decoder of claim 11, wherein, assuming a discrete memoryless Gaussian channel, the branch probability parameter γk(1) or γk(0) for the input bit being 1 or 0 can be calculated from the equation as follows:
  • 16. The turbo-code decoder of claim 5, wherein N=4 and the register size M=3, and the simplified modules, a data stream, and a latch structure are as shown in FIG. 6.
  • 17. The turbo-code decoder of claim 5, wherein the a priori probability of the input bit dk calculated by the previous level parallel decoding unit can be used by the next level decoder.
  • 18. The turbo-code decoder of claim 5, wherein L(dk) is the log likelihood ratio (LLR) extrinsic parameter calculated from the message bit dk by the previous level decoder.
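
The division of work among the modules A, B, and D recited in claims 8 to 12 can be illustrated with a minimal C sketch. Several points are assumptions: the recursions are written in the standard MAP (BCJR) form, since the equations referenced by the claims are not reproduced in this text; the names branch, next_state, and map_posteriors are hypothetical; the trellis is assumed to start and end in state 0; and the loops run sequentially, whereas the decoder described here evaluates the N+M steps in a systolic array. The sketch returns the a posteriori probability that dk=1, and Λ(dk) would then be the logarithm of the ratio num[1]/num[0].

```c
#include <assert.h>

#define NUM_STATES 8  /* 2^M trellis states for M = 3 */
#define NUM_STEPS  7  /* N + M trellis steps for N = 4, M = 3 */

/* branch[k][i][mp]: branch probability at step k for input bit i leaving
   state mp (the role of module C's output).
   next_state[mp][i]: hypothetical trellis table, state reached from mp on bit i.
   post1[k]: resulting a posteriori probability that the input bit at step k is 1. */
void map_posteriors(double branch[NUM_STEPS][2][NUM_STATES],
                    int next_state[NUM_STATES][2],
                    double post1[NUM_STEPS])
{
    double alpha[NUM_STEPS + 1][NUM_STATES] = {{0.0}};
    double beta[NUM_STEPS + 1][NUM_STATES] = {{0.0}};

    alpha[0][0] = 1.0;         /* assumed: trellis starts in state 0 */
    beta[NUM_STEPS][0] = 1.0;  /* assumed: tail bits return it to state 0 */

    /* Forward recursion (role of module A): alpha from the previous alpha. */
    for (int k = 0; k < NUM_STEPS; ++k)
        for (int mp = 0; mp < NUM_STATES; ++mp)
            for (int i = 0; i < 2; ++i)
                alpha[k + 1][next_state[mp][i]] += alpha[k][mp] * branch[k][i][mp];

    /* Backward recursion (role of module B): beta from the next beta. */
    for (int k = NUM_STEPS - 1; k >= 0; --k)
        for (int mp = 0; mp < NUM_STATES; ++mp)
            for (int i = 0; i < 2; ++i)
                beta[k][mp] += branch[k][i][mp] * beta[k + 1][next_state[mp][i]];

    /* Combination (role of module D): sum alpha * gamma * beta per input bit. */
    for (int k = 0; k < NUM_STEPS; ++k) {
        double num[2] = {0.0, 0.0};
        for (int mp = 0; mp < NUM_STATES; ++mp)
            for (int i = 0; i < 2; ++i)
                num[i] += alpha[k][mp] * branch[k][i][mp]
                          * beta[k + 1][next_state[mp][i]];
        assert(num[0] + num[1] > 0.0);
        post1[k] = num[1] / (num[0] + num[1]);
    }
}
```

For long blocks a practical implementation would normalize alpha and beta at each step, or work in the log domain, to avoid numerical underflow; that is omitted here to keep the sketch short.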