The invention is based on a priority application EP 04291360.8 which is hereby incorporated by reference.
The invention relates to a method for codec mode adaptation of adaptive multi-rate codecs regarding speech quality. In more detail, the invention relates to a method of codec mode adaptation for switching of a speech codec, in particular a GSM or UMTS multi-rate codec (AMR), in dependency of the prevailing channel condition for transmission of speech frames in a telecommunication system.
In a telecommunication system for digital voice transmission, e.g. a digital mobile radio system like GSM or UMTS, speech signals to and from a mobile station are speech encoded and channel encoded before being transmitted digitally over the disturbed mobile radio channel. Recently, speech and channel codecs in telecommunication systems have been provided with multiple modes. For example, the adaptive multi-rate narrow band (AMR-NB) codec standardized by 3GPP has 8 modes (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2 and 12.2 kbit/s) of source bit rate. All bit rates can be used e.g. in a GSM full-rate channel with 22.8 kbit/s capacity.
The difference between source and channel bit rate is filled up with bits used for channel error protection. That means that for a lower rate mode more channel error protection is used, which makes the transmission more robust in bad channel conditions. The reason for adopting such a variable rate scheme is to adapt the necessary compression (or source rate mode) to the prevailing channel condition in such a way that error-free channel decoding is already possible, but the compression is not so strong that achievable speech quality is lost.
In GSM recommendation 05.09 or 3GPP TS 45.009, respectively, a solution is disclosed on how to derive the switching or adaptation decision from carrier-to-interferer ratio (C/I) values that are estimated in the base station (BTS) for each received data burst. The C/I values describe the disturbance of each received data burst. They vary over time and are a measure of the current channel quality. After smoothing the C/I values with a linear filter, a list of one to three switching thresholds and a hysteresis is used for the switching/adaptation decision of the codec mode. The switching decision between the modes is based on the carrier-to-interference ratio C/I measured or estimated at the respective receiver. The smoothing of the C/I values means that a mean C/I value is calculated. This averaging over a large number of C/I values results in a slow reaction of the codec mode adaptation decision. Thus, if the channel falls into a bad channel condition, this mechanism can be too slow to react, and due to the “misselected” mode a long run of speech frames is muted, thereby degrading the speech quality.
Further, the estimation of a mean C/I value by smoothing the C/I values is disadvantageous. Thus, for example a channel with a mean C/I value of 5 dB and a high variation of C/I values of −5 dB to +15 dB and a channel with the same mean C/I value, but a small variation of C/I values, say from +2 dB to +8 dB, will give the same C/I-mean value or codec mode adaptation decision after averaging with a linear smoothing filter. In other words: the decision algorithm mentioned above results in the same codec mode for both channel conditions which is certainly not adequate.
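The effect can be illustrated numerically. The following minimal Python sketch uses two made-up C/I sequences (assumed values chosen to match the ranges named above, not measured data) that share a mean of 5 dB. A plain average cannot tell them apart, whereas their minimum values differ by 7 dB.

```python
from statistics import mean

# Hypothetical per-frame C/I values in dB (illustrative, not measured data).
high_variation = [-5.0, 15.0, -5.0, 15.0, 5.0, 5.0]  # swings from -5 dB to +15 dB
low_variation = [2.0, 8.0, 2.0, 8.0, 5.0, 5.0]       # swings from +2 dB to +8 dB

# A linear average sees no difference between the two channels ...
mean_high = mean(high_variation)
mean_low = mean(low_variation)

# ... while the worst-case values, which cause lost frames, differ clearly.
min_high = min(high_variation)
min_low = min(low_variation)
```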
It is therefore an object of the present invention to provide a method, a mobile terminal, and a base station for codec mode adaptation of an adaptive multi-rate codec in dependency of the prevailing channel condition for transmission of speech frames, which enables a more efficient codec mode decision regarding the speech quality of a speech channel.
This object is achieved by a method according to claim 1, a mobile terminal according to claim 12, and a base station according to claim 13.
According to the invention a codec mode adaptation is made by:
A mobile terminal and/or a base station according to the invention comprises means for:
The invention recognizes that the local minimum values of the channel quality value speech frame curve are the values that are relevant for the speech quality. In other words: all the peaks in negative direction of a channel quality value curve are relevant, since these lead to lost frames in the speech decoder. A channel quality value is for example the carrier-to-interferer ratio (C/I) in a speech transmission system like GSM. In contrast, the filtering of C/I values of the state of the art only gives an average C/I value for a plurality of consecutive speech frames, and thus high C/I values of speech frames are overvalued, since the transmission cannot become better than error free.
In other speech transmission systems, e.g. IS-95 or UMTS, variable rate speech codecs are also employed. Channel quality can also be given by various measures such as channel decoder metrics, estimated C/I, the number of channel (raw) bit errors or the receive code power, and may thus generally be denoted as channel quality indicator or channel quality value.
According to an alternative embodiment of the invention a codec mode adaptation decision is made by: determining a bit error rate (BER) from a channel quality value, e.g. a carrier-to-interferer ratio (C/I), per data burst, generating a frame bit error rate value of a speech frame from a plurality of consecutive data bursts, generating an ordered list of frame bit error rate values for a plurality of speech frames, determining a critical bit error rate level for the plurality of speech frames based on a maximum operation of the frame bit error rate values of the plurality of speech frames or a sorting and selecting operation of the frame bit error rate values of the plurality of speech frames, controlling a codec mode adaptation based on the critical bit error rate level.
The determining of a bit error rate (BER) from a channel quality value like C/I is for example made with a calibration curve generated with representative test data or a predetermined test pattern, or by an estimation. The generating of a frame BER value is preferably made by an estimation or an averaging of the BER values of consecutive data bursts. This is in particular sufficient if a codec mode switching decision is made for speech frames and not for individual data bursts, which is a practical approach.
As mentioned before, the filtering of C/I values of the state of the art only gives an average C/I value for a plurality of consecutive speech frames, and thus high C/I values of speech frames are overvalued, since the transmission cannot become better than error free.
The carrier-to-interferer ratio could be measured or estimated. The carrier-to-interferer ratio (C/I) is expressed on a logarithmic scale. According to the invention it is realised that the C/I in general is not well suited for monitoring small deviations.
Furthermore, the local maximum values of a frame averaged bit error rate correspond to local minimum values of a C/I speech frame curve and are relevant for the speech quality. In other words: all the positive peaks of the BER values are relevant, since these lead to lost frames in the speech decoder. According to the invention a critical bit error rate level is determined by a (sizewise) sorting or maximum operation of frame BER values. The solution according to the invention is oriented towards the real speech quality. It enables in particular an enhanced codec mode adaptation for both highly and slightly fluctuating channels. Instead of the linear filtering of C/I values of the state of the art, a non-linear filtering is performed such that local minimum values of speech channel quality are the basis for a codec mode adaptation decision.
A mobile terminal and a base station according to the invention comprise means for carrying out the method according to the invention. These means comprise in particular a program code for executing the method on a processor, e.g. a DSP, and the processor unit memory for storing the program code and data.
According to a preferred embodiment of the invention the controlling of the codec mode adaptation is made in dependency of the number of speech frames which can be handled by the error concealment of the speech codec. That means, that the critical bit error rate level is determined in dependency of the number of speech frames handled by the error concealment of the speech codec or that the second, third, fourth etc. lowest minimum of the speech frame channel quality value curve is used for controlling the codec mode adaptation.
For each codec mode there is a bit error rate level above which channel error correction is no longer possible and, due to residual bit errors, such a speech frame has to be marked bad, i.e. a bad frame indication (BFI) results for this frame. The critical BER level is determined such that at this level some single speech frames may be indicated as BFI and handled by an error concealment, e.g. replaced by a repetition of the previous frame, but not so many frames as would go beyond the ability of the error concealment. Thus, a codec mode is selected that is not so high as to create bad speech quality due to too many bad frames, but high enough not to unnecessarily reduce the speech quality a priori by high compression. If, for example, the number of allowed concealed speech frames is 2, then the (2+1)-th highest, i.e. the third highest, bit error rate value is the critical bit error rate level for the window.
According to a further development of the invention the determining of a critical bit error rate level for a plurality of speech frames is achieved by:
The critical bit error rate value for the window is selected from the list of the sorted bit error rate values. For BER values ordered in descending order and a number of concealed frames C, the critical BER value is the (C+1)-th ordered BER value. For a number of allowed error concealed frames of 1, for example, the critical BER value is the second highest BER value in the list of descending sorted BER values. For an error concealment number of 2, it is the third highest value, and so on. Such a method according to the invention offers a simple as well as efficient way of finding the right critical BER value, i.e. the second, third, fourth etc. highest local maximum, taking into consideration the number of allowed concealed frames (which can be considered as outliers), even without determining the local maximum BER value, i.e. the first (highest) local maximum.
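The selection rule can be sketched in a few lines of Python. This is only an illustration of the sort-and-select step. The function name `critical_ber` and the sample window values are assumptions made for the example.

```python
def critical_ber(frame_bers, concealed_frames):
    """Return the (C+1)-th highest frame BER, where C = concealed_frames."""
    if len(frame_bers) <= concealed_frames:
        raise ValueError("need more than C frame BER values")
    ordered = sorted(frame_bers, reverse=True)  # descending order
    return ordered[concealed_frames]            # index C holds the (C+1)-th value


# Hypothetical frame BER values of one window.
window = [0.02, 0.11, 0.04, 0.09, 0.01]
crit_c1 = critical_ber(window, 1)  # second highest value
crit_c2 = critical_ber(window, 2)  # third highest value
```

With C=1 the single outlier 0.11 is ignored and 0.09 becomes the critical level. With C=2 the level drops to 0.04.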
According to a further embodiment of the invention, a total critical bit error rate level is determined from a plurality of windows, i.e. a plurality of window bit error rate values. Thereby, the codec mode adaptation decision is made in view of the speech quality from a greater number of past frames, i.e. windows. However, a further way to incorporate a greater number of past speech frames is to enlarge the length of a window.
In a preferred embodiment of the invention, the critical bit error rate values/levels of the individual windows are subjected to a weighting operation. Thereby a kind of “forgetting factor” is introduced, emphasizing the importance of present speech channel quality but also including the speech channel quality in the past.
In a further preferred embodiment of the invention, the method is not only applied to completely filled windows, but also to windows which are partially filled with frame bit error rate values. Thus, the method is continuously executed for each frame bit error rate value, i.e. for each speech frame received in the receiver. In an advantageous manner, the critical bit error rate value, in particular the critical window BER value and the total critical BER value of a plurality of windows, are determined in dependency of the present speech frame quality or prevailing channel condition, respectively, without waiting time for filling a window.
In a preferred embodiment of the invention, the window BER value is set to the last window BER value, when the number of filled frame BER values of said window is less than or equal to the number of allowed concealed speech frames.
According to a further development of the invention, the method is applied with at least two sequences of windows which are partly overlapping with each other. For the first sequence of windows and the second sequence of windows a total bit error rate level is determined, and the total critical bit error rate level is determined by a maximum operation out of the at least two sequences of windows.
Other objects and advantages of the present invention may be ascertained from a reading of the specification and appended claims in conjunction with the accompanying drawings, wherein:
It is to be understood that the aforementioned features and the features explained below can be used not only in the respective combinations described but also in other combinations or alone without departing from the scope of the present invention.
Preferred embodiments of the present invention will now be described with reference to the accompanying drawings in which
a/b: shows fluctuating C/I values of a highly and a slightly fluctuating channel;
The speech and channel codecs in presently developed telecommunication systems can have multiple modes. For example, the Adaptive Multi-Rate narrow band (AMR-NB) Codec standardized by 3GPP has 8 modes (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2 and 12.2 kbit/s) of source bit rate. All bit rates can be used e.g. in the GSM full-rate channel with 22.8 kbit/s capacity. The difference between source and channel bit rate is filled up with bits used for channel error protection.
That means that for lower rate modes more channel error protection is used which makes the transmission more robust in bad channel conditions. Using a lower source rate mode on the other hand reduces the speech quality right from the beginning due to the higher compression factor of the voice compression. This is especially adverse for speech signals with much background noise.
The reason for adopting such a variable rate scheme is thus to adapt the necessary compression (or source rate mode) to the prevailing channel condition in such a way that error-free channel decoding is already possible, but the compression is not so strong that achievable speech quality is lost.
In order to measure reception quality and to give the sending station the information about the optimum codec mode, a two-way signaling scheme is used where the backward channel is used to carry the information about the desired codec mode for the sending direction. This information is denoted for the direction from a Base Station to a mobile station as codec mode command (CMC) and for the direction from mobile station to Base Station as codec mode request (CMR). For the uplink direction this algorithm is located in the Base Station whereas for the downlink direction the respective algorithm is located in the mobile station. Drawing a signal block diagram for the system will give a picture as shown in
Signaling of the mode coming out from this decision process is done in the backward channel.
The output of this smoothing filter located in the decision element is the estimated C/I-value and taken as C/I-norm(n) which is used as quality indicator. With this C/I-norm value a decision on the mode is taken every 20 ms frame in the decision element. The decision is based on a simple threshold comparison with hysteresis as shown in
The method according to 3GPP has the drawback that the averaging over C/I values results in a slow (0.5 s) reaction of the decision elements. So if the channel falls into a bad channel condition, this mechanism can be too slow to react, and due to the “misselected” mode a long run of speech frames is muted, degrading the speech quality. A further drawback is that this method concentrates on mean C/I values.
In other words: the known decision algorithms result in the same codec mode for both. Now for the first channel this could mean that the bursts with low C/I values result in muted speech frames while this does not occur for the second.
In the following, embodiments of the invention will be described with reference to FIGS. 6 to 8.
According to the invention, an adaptation algorithm is provided, based on the measured channel quality, which overcomes the above-mentioned drawbacks and is oriented towards improving the speech quality.
The invention recognizes that in a linear smoothing high C/I values will in general compensate low C/I values, whereas this compensation does not translate to the speech quality.
The invention comprises an algorithm/method that is oriented towards the local minimum values of channel quality or local maximum values of channel bit error rate (BER) and uses these values as input for decisions.
Speech is e.g. digitally transmitted in speech frames via transmission bursts carrying the transmitted bits of a speech frame. This is realized for example for a GSM FR channel such that one speech frame with 456 bits is distributed by an interleaving function over 8 adjacent data bursts. The bit error rate in these data bursts will determine the channel bit errors for the transmitted 456 channel bits. The bit error rate per burst can be estimated from the C/I per burst. By averaging over the relevant data bursts belonging to a frame, the bit error rate for each speech frame can be obtained and also depicted over the time axis.
For each codec mode there is a certain bit error rate, called BER_oper in the following, above which channel error correction is no longer possible and, due to bit errors, in particular residual bit errors in class 1a bits, the frame has to be marked bad, so that a bad frame indication (BFI) results for this frame. The lower the codec mode and the more powerful the channel codec, the higher BER_oper is. So looking at peaks in negative direction of the C/I curve hints at where the problems for speech quality are. Expressed the other way round: after obtaining the relevant bit error rate (BER) per speech frame, the positive peaks or high BER values create the problems. Depending on the error concealment abilities of the speech codec, a certain number of such peaks can be compensated by the error concealment.
Usually one peak of a BER value above BER_oper is not sufficient to create a problem, because if it results in a bad frame indication (BFI), error concealment takes place in the speech decoder, which repeats the previous set of speech parameters. If the next frame after that is also a BFI frame then, depending on the error concealment abilities, muting of the speech output signal could already occur and the user starts to hear the effects.
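A minimal Python sketch of this step is given below. It assumes a calibration curve given as a short table and uses linear interpolation between the table points. The table values, the names `CALIBRATION`, `burst_ber_from_ci` and `frame_ber`, and the choice of interpolation are illustrative assumptions, not values or methods taken from a standard. Which bursts belong to a frame is left to the caller.

```python
from bisect import bisect_right

# Hypothetical calibration points (C/I in dB, burst BER). A real curve would be
# generated from representative test data.
CALIBRATION = [(-5.0, 0.30), (0.0, 0.20), (5.0, 0.10), (10.0, 0.03), (15.0, 0.005)]


def burst_ber_from_ci(ci_db):
    """Estimate the burst BER by linear interpolation on the calibration curve."""
    xs = [x for x, _ in CALIBRATION]
    if ci_db <= xs[0]:
        return CALIBRATION[0][1]   # clamp below the first calibration point
    if ci_db >= xs[-1]:
        return CALIBRATION[-1][1]  # clamp above the last calibration point
    k = bisect_right(xs, ci_db)    # xs[k-1] <= ci_db < xs[k]
    (x0, y0), (x1, y1) = CALIBRATION[k - 1], CALIBRATION[k]
    return y0 + (y1 - y0) * (ci_db - x0) / (x1 - x0)


def frame_ber(burst_ci_db):
    """Average the estimated BER of the bursts that carry one speech frame."""
    bers = [burst_ber_from_ci(ci) for ci in burst_ci_db]
    return sum(bers) / len(bers)


# Eight bursts carrying one GSM FR speech frame (made-up C/I values in dB).
example_frame_ber = frame_ber([5.0, 5.0, 0.0, 0.0, 10.0, 10.0, 5.0, 5.0])
```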
Turning back to the BER curve of
The determined level value BER_crit is the value to be used for mode switching to select the highest mode possible still with BER_oper>BER_crit.
Thus, a mode is selected that is not so high as to create bad speech quality due to too many bad frames, but high enough not to unnecessarily reduce the speech quality a priori by high compression.
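The selection of the highest mode with BER_oper>BER_crit can be sketched as follows in Python. The BER_oper values in the table are placeholders invented for the example. Only the eight AMR-NB source rates are taken from the description. The fallback to the lowest-rate mode when no mode qualifies is likewise an assumption.

```python
# AMR-NB source rates in kbit/s mapped to a BER_oper limit.
# The limits are placeholders for illustration only, not measured values.
BER_OPER = {
    4.75: 0.13, 5.15: 0.12, 5.90: 0.11, 6.70: 0.10,
    7.40: 0.09, 7.95: 0.08, 10.2: 0.06, 12.2: 0.04,
}


def select_mode(ber_crit, ber_oper=BER_OPER):
    """Return the highest-rate mode whose BER_oper still exceeds ber_crit."""
    usable = [rate for rate, limit in ber_oper.items() if limit > ber_crit]
    # Assumption: fall back to the most robust (lowest-rate) mode if none qualifies.
    return max(usable) if usable else min(ber_oper)
```

For example, with these placeholder limits a critical level of 0.05 would select the 10.2 kbit/s mode, since 12.2 kbit/s has a limit of only 0.04.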
Following with reference to
The BER values or the time line denoted by index n, respectively, are first partitioned in intervals or windows W of Length L.
Inside the window W the relative time index is then denoted m. Thus the i-th row of BER values is then denoted BERW(i, m) with
The values in the window i before and including the index m are sorted, preferably in descending order, which gives for each time index m
with 1BERW(i,m) denoting the maximum value so far, 2BERW(i,m) the second highest value and, in general, C+1BERW(i,m) the (C+1)-th highest value. The index m must be big enough to provide all values, i.e. C≦m<L.
The critical level to allow C concealed frames is BERW_crit(i,m)=C+1BERW(i,m), since only C frames have higher BER values than this.
If one allows only C=1 frame to be concealed the critical level is given by
BERW_crit(i,m)=2BERW(i,m) if m≧1. (Eq. 1)
So the second highest value out of the window gives the critical level.
It goes without saying that a similar method is applicable if the values are sorted in ascending order.
Now for C=1, for initialization at m=0 the critical level is set to the last critical value from the preceding window:
BERW_crit(i,0)=BERW_crit(i−1, L−1).
So for each m∈{0, 1, . . . , L−1} the level BERW_crit(i,m) is defined and computed. It is the current value that is computed with each new speech frame or time instance n. The value for the whole window i is then denoted just BERW_crit(i) and given by
BERW_crit(i)=BERW_crit(i, L−1) for each i∈Z. (Eq. 2)
To incorporate also prediction from past frames i.e. windows, finally the total level is defined by a maximum operation as:
The forgetting factor α is chosen as α<1. This gives the desired total critical level for each time index n.
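A frame-by-frame version of this procedure might look like the following Python sketch. It is one possible reading, not a definitive implementation: the total level is assumed to be the maximum of the current window level BERW_crit(i,m) and the levels of earlier windows, each weighted by one further factor α per window of age. The class name `CriticalBerTracker`, the start value 0.0 and the default parameters are assumptions.

```python
class CriticalBerTracker:
    """Tracks the critical BER level frame by frame (illustrative sketch)."""

    def __init__(self, window_length, concealed_frames=1, alpha=0.9):
        if not 0 <= concealed_frames < window_length:
            raise ValueError("need 0 <= C < L")
        self.L = window_length
        self.C = concealed_frames
        self.alpha = alpha       # forgetting factor, chosen < 1
        self.values = []         # frame BER values of the current window
        self.window_crit = 0.0   # BERW_crit(i, m); assumed start value
        self.past = 0.0          # weighted level carried over from older windows

    def update(self, frame_ber):
        """Feed one frame BER value and return the total critical level."""
        self.values.append(frame_ber)
        if len(self.values) > self.C:
            # (C+1)-th highest value among the values seen so far in this window
            self.window_crit = sorted(self.values, reverse=True)[self.C]
        # Otherwise the last level of the preceding window is kept (initialization).
        total = max(self.window_crit, self.past)
        if len(self.values) == self.L:
            # Window complete: age the history and start the next window.
            self.past = self.alpha * max(self.window_crit, self.past)
            self.values = []
        return total


tracker = CriticalBerTracker(window_length=4, concealed_frames=1, alpha=0.5)
levels = [tracker.update(b) for b in [0.1, 0.3, 0.2, 0.05, 0.01, 0.02]]
```

In this example the level follows the second highest value of the first window (0.2) and, once the second window holds two values, falls back to the weighted past level rather than dropping immediately to the new window's low value.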
Using this simple solution it may occasionally happen that the maximum of one window lies at its end and the maximum of the next window at its beginning. Both frames would then be signaled as bad frames, so that more than one frame in a row is error concealed. In order to prevent this possibility, a more sophisticated solution with two time shifted sets of windows is described later as a second embodiment of the invention. There it can be shown that error concealed frames are guaranteed to lie at least L/2+1 frames apart.
Since these effects are rare, the simple solution from above is mostly considered sufficient and will give the right critical level for the decision.
A generalization of the solution from C=1 to C=2 or 3 concealed frames allowed in a row is easily done by setting C=2 or 3 in the formulas above.
In order to implement this method according to the invention, only approx. 5 to 6 additional permanent storage locations in memory are needed per channel. The method provides a BER level which is the right one to be used for mode decisions. After possibly adding a security distance (or using a security factor), the BER level can be converted back into a C/I ratio or directly used for a threshold decision similar to the smoothed C/I values of the 3GPP recommendation. So the old threshold decision mechanism with hysteresis could still be used.
Following, with reference to
In order to simplify the presentation of the principle of the two sequences of windows, a prediction from the past frames, i.e. windows, as described for the first embodiment of the invention is omitted in the following description for the second embodiment.
A finite time signal BER(n) is assumed known for all points for which the critical level BER_crit shall be determined. For this, two sequences of windows W1(i) and W2(i) of length L are used. L is assumed even and the sequences are time shifted by L/2 as depicted in
The method for determining the critical level BERW1_crit(i) or BERW2_crit(i) for each window W1(i) or W2(i) is described in equation Eq.1 and equation Eq.2 of the first embodiment of the invention. So the total level for the first sequence of windows W1(i) shall be defined by
and for the second sequence by
For determining the level value BER_crit for the mode switching decision, i.e. the total level regarding the two sequences of windows, a maximum operation over the two sequences is applied:
BER_crit_total=Max {BERW1_crit, BERW2_crit}.
It should be noted that this method could furthermore comprise a prediction as shown and described in connection with the first embodiment.
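The maximum operation over two time shifted window sequences can be sketched in Python as follows, for a finite signal BER(n) that is known in full and without the prediction from past windows. The function names, the treatment of the clipped windows at the start and end of the signal, and the value 0.0 for windows holding too few values are assumptions made for the example.

```python
def window_crit(values, concealed_frames=1):
    """(C+1)-th highest value of a window, or 0.0 if it holds too few values."""
    if len(values) <= concealed_frames:
        return 0.0
    return sorted(values, reverse=True)[concealed_frames]


def total_crit(ber, window_length, concealed_frames=1):
    """Per-frame maximum of the critical levels of the two windows covering it."""
    L = window_length
    if L % 2:
        raise ValueError("window length L must be even")
    half = L // 2
    result = []
    for n in range(len(ber)):
        start1 = (n // L) * L                  # W1 window containing n
        start2 = ((n + half) // L) * L - half  # W2 window, shifted by L/2
        crit1 = window_crit(ber[start1:start1 + L], concealed_frames)
        # Assumption: windows reaching beyond the signal are clipped to it.
        crit2 = window_crit(ber[max(start2, 0):start2 + L], concealed_frames)
        result.append(max(crit1, crit2))
    return result


# Made-up frame BER values, window length L = 4, C = 1.
ber = [0.1, 0.4, 0.2, 0.3, 0.05, 0.5, 0.02, 0.01]
levels = total_crit(ber, window_length=4)
# Frames whose BER lies above the total critical level would be concealed.
concealed = [n for n, (b, c) in enumerate(zip(ber, levels)) if b > c]
```

In this example the two frames that exceed the total level are four frames apart, which is consistent with the minimum distance of L/2+1 = 3 frames stated for the second embodiment.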
In the following it is shown that error concealed frames are guaranteed to lie at least L/2+1 frames apart if a method according to the second embodiment of the invention, i.e. with two time shifted sets of windows, is used.
With reference to
Thus BER(n1)>BER_crit_total≧BERW1_crit(i)>BERW1(i,n≠n1) and BER(n1)>BER_crit_total≧BERW2_crit(j)≧BERW2(j,n≠n1).
So the critical level is so high that inside window W1(i) and inside window W2(j) there cannot be another concealed frame. The next concealed frame can only lie outside these windows.
So n1 lies in the overlap region of W1 and W2 of a first and a second window of size L as depicted in
Thus one can easily see that the minimum distance to values outside W1∪W2 is when n1 is either at the edge of W1 or at the edge of W2 for which the distance is then roughly a half window length or exactly
This is the minimum distance to the next concealed frame.
As one can see, a generalization to three or four sequences of windows, time shifted by a third or a fourth of the window length, is easily possible. This would increase the minimum distance in bad cases even further, to almost L.
In the following, advantages and further developments of the method according to the invention are summarized.
It is now achieved that, independent of the mean C/I or the fluctuation of the channel, the second highest BER value or second lowest channel quality value determines the mode decision. These values are a more relevant measure for obtaining optimum speech quality.
Furthermore, the parameters such as the window length L, and possibly a security distance to compensate for C/I→BER inaccuracy, could be adapted, for example based on characteristic channel measurements.
During Discontinuous Transmission (DTX) there is no speech transmission but only SID-update frames every 8th frame. Such a SID frame can still be used to calculate a BER value for this frame. For C=1 this value is then repeated once in the affected window. Then the method can go on as described before.
The improvement potential of the method according to the invention is especially high for dynamic channels with quickly changing C/I levels. Due to the maximum operation used to obtain the final value per frame, the method adjusts immediately as soon as two bad channel BER values come in. Thus it provides optimum speech quality for any channel and any state. Furthermore, there is no problem with different variances of channel quality (C/I fluctuation) of different channels, e.g. at a velocity of 3 km/h or 50 km/h. These do not need different adjustments or thresholds; the same thresholds based on speech quality apply for any channel. Thus, tuning by the network provider to the cell or channel situation is no longer necessary. On the whole, the method provides a maintenance-free or low-maintenance solution.
Number | Date | Country | Kind |
---|---|---|---|
04291360.8 | May 2004 | EP | regional |