The invention pertains to an iterative decoder, and more particularly to an iterative decoder intended to be embodied in the form of an integrated circuit, or a plurality of integrated circuits, using very large scale integration (VLSI) technology. This decoder exhibits a novel architecture making it possible to considerably reduce the complexity (area of silicon occupied) and/or the power consumption as compared with the decoders known from the prior art and/or to increase the data bit rate for given complexity and given power consumption; for this reason it is particularly suited to space applications and to mobile telephony.
The transmission of a digital signal over a noisy channel generally comprises the use of an error correction code so as to obtain a bit error rate (BER) or block error rate (also frame error rate, FER) which is acceptable even with a small signal-to-noise ratio. As a general rule, the decoding operation is more complex than the coding operation in terms of computation time and equally in terms of memory, and hence of area of silicon occupied.
Among the various coding and decoding algorithms which have been developed, iterative codes, such as “turbo codes”, have acquired great significance in the last few years. These codes are described, for example, in the following articles:
“Turbo” codes are obtained by parallel concatenation of convolutional codes; codes obtained by series concatenation of convolutional codes, which have similar properties and also constitute a field of application of the present invention are known as “turbo-like” codes.
These codes are characterized by the fact that the decoding is an iterative procedure and the BER and the FER decrease at each iteration. Often the number Nit of iterations is fixed and it is determined by considering the case of the blocks that are most corrupted by noise. It is obvious that this entails a waste of resources, since most blocks undergo more iterations than necessary. For this reason stopping rules have been envisaged; see in this regard:
To increase the data bit rate, use is typically made of decoders consisting of several replicas of the same decoding module, each module taking charge of the processing of a block of bits. Disregarding problems related to multiplexing on input and to demultiplexing on output, the bit rate is proportional to the number M of modules used. There are essentially three architectures based on this principle: the pipeline architecture, the parallel architecture and the matrix architecture, which are illustrated by
In the pipeline architecture, M=Nit modules are connected in series as in an assembly line. A block of bits introduced at the input of the line exits it after having been processed once by each module, hence after having undergone Nit iterations.
In the parallel architecture, M modules each perform the complete decoding (Nit iterations) of a block of bits. It is easy to appreciate that if M=Nit the performance of a parallel decoder is the same as that of a pipeline decoder, both in terms of complexity and bit rate. If M>Nit, the bit rate is higher, but so is the complexity, whereas the reverse effect is obtained for M<Nit. Here and subsequently in this document, the term “complexity” is understood to mean a quantity proportional to the area occupied on an electronic chip by a circuit embodying the decoder. Complexity depends both on the architecture of the decoder and on the microelectronic technology chosen; for a given technology, the complexity makes it possible to compare various architectures.
The matrix architecture is merely a generalization of the previous two: a matrix decoder is composed of M pipeline decoders in parallel.
These architectures are essentially equivalent and the choice to use one rather than another depends on considerations specific to the application considered. A decoder based on any one of them can operate only for a fixed number of iterations, this entailing a waste of hardware resources and a higher than necessary energy consumption.
More recently, modular decoder architectures allowing the application of stopping rules have been developed.
Document DE 102 14 393, which represents the closest state of the art, discloses an iterative decoder comprising a plurality of servers, each iteratively decoding a data block, an input buffer including more memory locations than servers and a control unit for allocating data packets stored in the input buffer to the different servers.
Document WO 02/067435 A1 describes a decoder comprising a plurality of decoding modules in parallel and a device for dynamically allocating incoming data packets. Although the allocating device is equipped with a temporary memory, the decoder is designed in such a way that the probability of an incoming data packet not finding any free decoding module is small. In order for this condition to hold, it is necessary to use a large number of modules, of which at least one will not be busy almost at each instant. This therefore results in a waste of hardware resources. Moreover, this document provides no information which makes it possible to determine the number of decoding modules and of elements of the temporary memory as a function of the performance required and of the operating conditions of the decoder.
Document EP 0 973 292 A2 describes the use of a buffer memory for each decoding module (also called a “server”), so as to produce as many queues as modules, plus possibly a global buffer memory at the level of the allocating device. In this document the use of a stopping criterion is not described: on the contrary, the number of iterations is determined a priori on the basis of the ratio of the power of the carrier to that of the noise. While this ratio remains constant, the duration of the decoding is the same for all the packets: there is therefore the same problem of overdimensioning encountered in the architectures with a fixed number of iterations described above.
A subject of the present invention is an iterative decoder exhibiting a novel architecture which allows better use of hardware resources as compared with the prior art. Such an architecture is dubbed “matrix of servers operating simultaneously with a buffer memory” (otherwise known as “Buffered Array of Concurrent Servers” or BACS).
More precisely, the invention pertains to an iterative decoder, comprising:
Preferably α≦0.01.
According to various embodiments of an iterative decoder according to the invention:
The invention also pertains to a communication system using an iterative decoder as described above.
The invention also pertains to a process for manufacturing an iterative decoder as described above comprising the steps consisting in:
According to a first variant, step A.b comprises:
According to a second variant, step A.b comprises:
According to a third variant, step A.b comprises:
According to a fourth variant, step A.b comprises:
According to a particular embodiment of the invention, the cost function C(N, L) considered in the manufacturing processes described above is proportional to the physical dimensions (complexity) of the electronic circuit which constitutes the decoder.
Other characteristics, details and advantages of the invention will emerge from reading the description offered with reference to the appended drawings, given by way of example, and in which:
According to
A packet currently being decoded cannot be erased from the input buffer memory 23 so long as it has not been fully processed: for this reason this memory must comprise at least N locations. Moreover, if the iterative code used is a turbo code, each server must be furnished with at least one extrinsic memory location 25 for storing the information exchanged at each iteration between the two elementary decoders of which it consists (see the references mentioned above regarding turbo codes). The extrinsic memory may possibly be absent in the case of other coding techniques.
The control unit 21 accomplishes three tasks:
Such a control unit can be embodied according to techniques known to the person skilled in the art.
According to alternative variants of the invention, the control unit 21 can accomplish only a part of the tasks mentioned above. For example, each server can autonomously verify its own stopping condition. However, statistical multiplexing constitutes an essential characteristic of a decoder according to the invention.
In a preferred embodiment of the invention, the allocating of the data packets stored in the input buffer memory to the servers available is managed by the control unit 21 on the basis of the first-in-first-out (FIFO) principle, that is to say the packets are processed in the same order in which they are received, but other principles may be applied. For example, if the decoding of a packet turns out to be particularly lengthy, it may be interrupted so as to deal with an increase in the data bit rate at input and resumed later.
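The FIFO allocation described above can be sketched as follows. This is only a minimal illustration, assuming N servers and an input buffer of N+L locations in which a packet keeps its location until its decoding is finished; the class name `FifoDispatcher` and its methods are hypothetical and are not taken from the description.

```python
from collections import deque


class FifoDispatcher:
    """Toy model of the control unit: FIFO allocation of packets to servers."""

    def __init__(self, n_servers, n_extra):
        # The input buffer holds N + L packets (being decoded or waiting).
        self.capacity = n_servers + n_extra
        self.waiting = deque()               # packets not yet assigned, oldest first
        self.idle = deque(range(n_servers))  # identifiers of free servers
        self.busy = {}                       # server identifier -> packet being decoded

    def arrive(self, packet):
        """Store an incoming packet; return False if the buffer is full (packet lost)."""
        if len(self.waiting) + len(self.busy) >= self.capacity:
            return False
        self.waiting.append(packet)
        self._dispatch()
        return True

    def finish(self, server):
        """Called when a server meets its stopping condition; frees server and location."""
        packet = self.busy.pop(server)
        self.idle.append(server)
        self._dispatch()
        return packet

    def _dispatch(self):
        # Hand the oldest waiting packets to free servers, in arrival order.
        while self.waiting and self.idle:
            server = self.idle.popleft()
            self.busy[server] = self.waiting.popleft()


dispatcher = FifoDispatcher(n_servers=2, n_extra=1)
accepted = [dispatcher.arrive(p) for p in ("a", "b", "c", "d")]
```

In this sketch the fourth packet is refused because two packets are being decoded and one is waiting; other allocation principles, such as the interruption of a lengthy decoding, would require a different `_dispatch` policy.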
Advantageously, several servers can simultaneously access the buffer memory, which is a shared resource.
Several embodiments of iterative decoding modules are known to the person skilled in the art; for this reason their structure and their performance will not be discussed here. For more information see, for example, documents EP 1 022 860 B1, WO 01/06661 and EP 0 735 696 B1. In what follows it will be assumed that the servers are identical to one another, but the more general case of different servers comes within the scope of the present invention.
If a data packet to be decoded is received while all the N servers, and hence the first N locations of the input buffer memory, are occupied, said packet must be stored in one of the L additional memory locations until a server becomes free. The decoder therefore constitutes a queuing system and its performance can be evaluated with the aid of “queuing theory”, explained for example in the text:
The bit rate of each server Γserv, as well as the area of silicon occupied by each server (Sdec) and by each buffer memory location (SI/O), are considered to be external parameters, which depend on the technological choices made during the embodying of the decoder.
The decoder is modeled by a queuing system of the D/G/N/N+L type. In Kendall's notation, commonly used in queuing theory, this signifies that the system is characterized by:
Since the number of waiting positions is finite and the service time is not deterministic, it is possible for a packet to arrive while the queue is full: this packet is therefore lost. The probability of losing a packet, the so-called blocking probability, is indicated by PB and its value must be small enough so as not to substantially affect the benefits of the decoder. A typical precondition is that PB should be less than the FER by at least two orders of magnitude: PB≦α·FER*, with α≈0.01.
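As a small worked instance of the precondition PB≦α·FER*, the following sketch computes the target blocking probability for an allowable FER of 10⁻⁵ and α=0.01. The function name `blocking_target` is hypothetical.

```python
def blocking_target(fer_target, alpha=0.01):
    """Largest allowable blocking probability: PB* = alpha * FER*."""
    if not (0.0 < alpha <= 1.0):
        raise ValueError("alpha is expected to lie in (0, 1]")
    return alpha * fer_target


# With FER* = 1e-5 and alpha = 0.01, PB* is two orders of magnitude
# below the frame error rate, i.e. about 1e-7.
pb_star = blocking_target(1e-5)
```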
In principle, the service time Ts is a discrete random variable which can take the values Nit·TS,min, where TS,min is the time required to perform an iteration and Nit is an integer lying between 1, corresponding to the case where the stopping condition is satisfied after a single iteration, and +∞, since the decoding may not converge. In practice, the stopping rules always provide for a maximum number of allowable iterations (for example 100). Curve 32 in
Although Ts is, in principle, a time, it is advantageous to measure it as a number of iterations, so as to get away from the physical characteristics of the decoder, which depend on the microelectronic technology used.
As illustrated by the flowchart in
Having determined the FDP (the probability distribution of the service time Ts), it is possible to calculate the expected value of Ts, E{Ts}; this constitutes step 402.
In step 403, which may for example be simultaneous with step 402, we put L=Lmin, where Lmin is the lower extreme of an interval determined a priori of allowable values for the number of additional memory locations [Lmin, Lmax], in which typically Lmin=1.
In step 404, a first estimate of the number of servers, N*, is calculated by considering an infinite number of waiting positions. In this case, the mean bit rate of each server is simply 1/E{Ts}. The number of servers necessary to ensure a target bit rate Γ* is therefore N*=Γ*·E{Ts}.
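The expected service time and the first estimate of the number of servers can be sketched as below. This is an illustration only: the service time is measured in iterations, the bit rate is assumed to be normalized to packets per iteration time, the distribution `example_pmf` is invented for the example, and rounding up to the next integer is an assumption of this sketch.

```python
import math


def expected_service_time(pmf):
    """E{Ts} for a discrete distribution given as {iterations: probability}."""
    total = sum(pmf.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(iterations * p for iterations, p in pmf.items())


def first_server_estimate(gamma_star, pmf):
    """First estimate of the number of servers, assuming an infinite queue.

    Each server handles on average 1/E{Ts} packets per iteration time,
    so gamma_star * E{Ts} servers are needed; the result is rounded up
    (assumption of this sketch) to obtain an integer.
    """
    return math.ceil(gamma_star * expected_service_time(pmf))


# Hypothetical distribution: half of the packets need 2 iterations,
# a quarter need 4 and a quarter need 8.
example_pmf = {2: 0.5, 4: 0.25, 8: 0.25}
mean_iterations = expected_service_time(example_pmf)
n_estimate = first_server_estimate(2.5, example_pmf)
```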
The blocking probability PB corresponding to the pair (N, L) is calculated (step 405) and compared with the preset PB* (step 406). The calculation of PB is done by simply counting lost packets during a series of simulations based on the D/G/N/N+L model and using the FDP of Ts determined in step 401. For further particulars regarding these simulations, see the work by S. K. Bose cited above.
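A minimal sketch of such a simulation is given below. It is not the procedure of the cited work; it assumes strictly periodic arrivals, FIFO service, time measured in iterations, and a packet that occupies one of the N+L buffer locations until its decoding is complete. The function name `simulate_blocking` and its parameters are hypothetical.

```python
import heapq
import random


def simulate_blocking(n_servers, n_extra, interarrival, pmf, n_packets, seed=0):
    """Estimate PB by counting lost packets in a D/G/N/N+L queue."""
    rng = random.Random(seed)
    values = list(pmf.keys())
    weights = list(pmf.values())
    free_at = [0.0] * n_servers  # min-heap: instant at which each server becomes free
    in_system = []               # min-heap: departure instants of stored packets
    lost = 0
    for k in range(n_packets):
        t = k * interarrival     # deterministic arrival instant
        # Release the buffer locations of packets already fully decoded.
        while in_system and in_system[0] <= t:
            heapq.heappop(in_system)
        if len(in_system) >= n_servers + n_extra:
            lost += 1            # all N+L locations are occupied: packet lost
            continue
        service = rng.choices(values, weights)[0]
        # FIFO: the packet is served by the server that becomes free first.
        start = max(t, heapq.heappop(free_at))
        departure = start + service
        heapq.heappush(free_at, departure)
        heapq.heappush(in_system, departure)
    return lost / n_packets


# Sanity check with a deterministic service time of one iteration and a
# single server without additional locations: arrivals twice as fast as
# the server lose every second packet.
pb_overloaded = simulate_blocking(1, 0, 0.5, {1: 1.0}, 1000)
pb_matched = simulate_blocking(1, 0, 1.0, {1: 1.0}, 1000)
```

Estimating a blocking probability of the order of 10⁻⁷ by direct counting requires a very large number of simulated packets, which is why this step dominates the calculation time.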
The pair (N, L) is picked (step 408) if PB<PB*, otherwise N* is incremented by one unit (step 407); the loop 420 is traversed as long as the condition PB<PB* does not hold.
If L<Lmax (test step 409), the number of additional memory locations is incremented by one unit (step 410) and the cycle begins again from step 404 (loop 430).
Finally, out of all the allowable pairs (N, L) we choose that one (N*, L*) which minimizes the cost function, C(N, L), which represents for example the area of silicon occupied (step 411). If the complexity of the control unit is neglected, the cost function is for example given by C(N, L)=N·Sdec+L·SI/O.
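The search over the pairs (N, L) described in steps 403 to 411 can be sketched as follows. The callable `blocking_probability` stands for whatever estimate of PB is available (for example a queue simulation); it, the function name `dimension_decoder` and the guard `n_max` are assumptions of this sketch, and the cost function is the example C(N, L)=N·Sdec+L·SI/O.

```python
def dimension_decoder(n_start, l_min, l_max, pb_target,
                      blocking_probability, s_dec, s_io, n_max=1000):
    """Return (N, L, cost) minimizing N*s_dec + L*s_io subject to PB < pb_target."""
    best = None
    for n_extra in range(l_min, l_max + 1):      # outer loop over L
        n_servers = n_start                      # restart from the first estimate
        while blocking_probability(n_servers, n_extra) >= pb_target:
            n_servers += 1                       # inner loop: add one server
            if n_servers > n_max:
                raise RuntimeError("no allowable pair found below n_max")
        cost = n_servers * s_dec + n_extra * s_io
        if best is None or cost < best[2]:
            best = (n_servers, n_extra, cost)
    return best


def toy_pb(n_servers, n_extra):
    """Invented stand-in for PB: halves with each added server or location."""
    return 0.5 ** (n_servers + n_extra)


# When a server costs more area than a memory location, the search
# trades servers for buffer locations; with the costs swapped it does not.
cheap_memory = dimension_decoder(3, 1, 5, 0.5 ** 10, toy_pb, s_dec=4, s_io=1)
cheap_servers = dimension_decoder(3, 1, 5, 0.5 ** 10, toy_pb, s_dec=1, s_io=4)
```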
The only parameter which remains to be determined is the number N′ of locations of the output buffer memory. Its value depends on the specific application; two typical cases may for example be considered: N′=N+L and N′=0 (the output memory may be completely absent).
From the point of view of the calculation time, the most expensive step is certainly the determination of PB, which has to be repeated for each pair (N*, L*) and which requires the carrying out of a significant number of simulations of the queuing system. The other step which requires a high calculation time is the determination of the FDP(Ts), which is done with the aid of several simulations of the decoding algorithm with various input packets. The statistical distribution of the service time for these packets is determined by the value of Eb/N0 under the operating conditions for which the decoder is provided and by the allowable error rate (BER* or FER*). These simulations may be considerably simplified by replacing the stopping rule actually applied in the decoder with the so-called “genie-aided” stopping criterion: the iteration is stopped when a packet has been correctly decoded. Obviously, such a criterion can be applied only within the framework of a simulation, where the data packets are entirely known. The “genie-aided” criterion is generally supplemented with a criterion for stopping after a maximum number of iterations so as to prevent the decoder from remaining blocked in an infinite loop. The maximum number of iterations is chosen in such a way that the error rate is less than or equal to BER* or FER*. Experience shows that this simplified approach gives results that are close enough to those obtained by applying realistic stopping criteria.
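The “genie-aided” criterion can be sketched as follows. The decoder itself is abstracted as a callable `decoder_step` performing one iteration and returning the current hard decisions together with its internal state; this interface, the function names and the toy decoder used for the demonstration are all hypothetical and stand in for a real iterative decoding module.

```python
from collections import Counter


def genie_aided_service_time(decoder_step, received, true_bits, max_iterations):
    """Number of iterations until the decisions equal the known transmitted bits."""
    state = None
    for iteration in range(1, max_iterations + 1):
        decided_bits, state = decoder_step(received, state)
        if decided_bits == true_bits:   # genie: the simulation knows the packet
            return iteration
    return max_iterations               # stop after the allowed maximum


def empirical_pmf(service_times):
    """Empirical distribution of the service time, as {iterations: frequency}."""
    counts = Counter(service_times)
    total = len(service_times)
    return {k: c / total for k, c in sorted(counts.items())}


def make_toy_step(true_bits):
    """Invented stand-in for a decoder: corrects one wrong bit per iteration."""
    def step(received, state):
        bits = list(received) if state is None else list(state)
        for i, (b, t) in enumerate(zip(bits, true_bits)):
            if b != t:
                bits[i] = t
                break
        return bits, bits
    return step


sent = [1, 0, 1, 1]
iterations_needed = genie_aided_service_time(
    make_toy_step(sent), [0, 0, 0, 1], sent, max_iterations=10)
```

Collecting `genie_aided_service_time` over many simulated packets and passing the results to `empirical_pmf` gives an estimate of the distribution of Ts under the simulated operating conditions.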
An alternative process is illustrated in
In
A concrete example of a design of a decoder according to the invention is now considered. The code used is a serial turbo code, obtained by concatenating two codes: the outer code is a 16-state convolutional code with rate ½, the inner code is a 2-state differential code with rate 1/1. The sequence at the output is interleaved with a pseudo-random law (“S-random law”). For further particulars regarding this type of code, see:
The servers and the buffer memories are embodied in a conventional manner; for exemplary architectures of iterative decoders that can be used as servers in an embodiment of the present invention, see:
The microelectronics technology used is the 0.18 μm HCMOS process from “Taiwan Semiconductor Manufacturing Company” (TSMC). The transmission of the data is characterized by a spectral efficiency η=1 bps/Hz and a ratio Eb/N0=2.75 dB. Use is made of quadrature phase shift keying (QPSK) modulation with r=½ and blocks of length K=748 bits; the allowable FER is 10⁻⁵ and PB*=10⁻⁷. Under these conditions,
As a function of the constraints related to the technology for embodying a decoder according to the invention, cost functions other than complexity may be taken into account during design.
In the case Γ*=50, for example, the use of a stopping criterion, and hence of statistical multiplexing of the packets, even without additional memory locations, reduces the number of servers from 50 to 15. The addition of 5 memory locations reduces the number of servers to 11, which is still very advantageous, while even an “infinite” buffer memory does not make it possible to drop below 10 servers: there is therefore no reason to increase the value of L excessively. In fact, in the case considered, the small length of the blocks (K=748 bits) renders the complexity of the system quite insensitive to variations in L. If substantially longer blocks are considered, the complexity increases greatly if the number of memory locations exceeds its optimum value.
Hitherto, the case has always been considered of a communication system with a bit rate Γ* and a ratio Eb/N0 (and hence a signal-to-noise ratio) which are prescribed, and it is proposed that the complexity of the decoder be minimized. The decoder's performance improvement obtained by the present invention can also be utilized to decrease the signal-to-noise ratio, and hence the power transmitted, while maintaining the complexity of the system (or another cost function) within acceptable limits. Represented in
A dimensioning process for a decoder according to the invention aimed at minimizing the power transmitted, and hence the ratio Eb/N0 of the signal to be decoded, is illustrated in
One begins by determining the bit rate Γ*, the error rate (BER*/FER*) and the blocking probability (PB*) that are allowable, a cost function (C(N, L)) and an allowable maximum value (C*) of said cost function (step 450). In step 452 a first trial value of Eb/N0 is chosen; in step 454 a decoder is dimensioned, by one of the processes illustrated in
Again, it is possible to choose the pair (N*, L*) in such a way as to maximize the bit rate Γ for a given complexity (or a value of another cost function) and a given ratio Eb/N0, as illustrated by
One begins by determining the allowable error rate (BER*/FER*) and the allowable blocking probability (PB*), the ratio Eb/N0, a cost function (C(N, L)) and an allowable maximum value (C*) of said cost function (step 470). In step 472 a first trial value of the bit rate Γ is chosen; in step 474 a decoder is dimensioned, by one of the processes illustrated in
The person skilled in the art can easily refine these processes: for example, when C exceeds C* for the first time, it is possible to decrease the value of δ so as to approach closer to the allowable minimum value of Eb/N0 (
Other possible criteria for optimizing the decoder will be apparent to the person skilled in the art depending on the specific application for which a decoder according to the present invention is provided.
In processes 4A-4D it is understood that some of the steps may be performed simultaneously.
Although in the detailed description only the particular case of turbo codes has been considered, the present invention applies equally to the decoding of all the other iterative codes such as, for example, “turbo-like” codes, low-density parity check (LDPC) codes, interference cancellation (IC) codes, serial interference cancellation (SIC) codes and parallel interference cancellation (PIC) codes, this list not being exhaustive.
Number | Date | Country | Kind |
---|---|---|---|
0310261 | Aug 2003 | FR | national |