The present invention relates to the technical field of speech coding/decoding, and more particularly to a device and a method for frame lost concealment.
Voice over IP (VoIP) achieves speech communication over an IP network or the Internet through processing such as compressed speech encoding, packing and packetizing, routing and distribution, storage and switching, and depacketizing and decompression. Coding technology is a key to VoIP, and can be classified into waveform coding, parametric coding, and hybrid coding. Waveform coding occupies a large bandwidth and is inapplicable to circumstances with insufficient bandwidth.
In order to enhance the transmission efficiency of VoIP in the case of limited bandwidths, low bit rate coding/decoding methods have been proposed in the industry. The International Telecommunication Union Telecommunication Standardization Sector (ITU-T) published the Telephone Bandwidth Speech Coding Standard G.729 in March 1996, in which a conjugate-structure algebraic-code-excited linear-prediction (CS-ACELP) speech coding/decoding scheme is employed for speech signals at a code rate of 8 kb/s. Later, ITU-T published G.729 Annex A and Annex B in November 1996 to further optimize G.729.
CS-ACELP is a coding mode based on code-excited linear prediction (CELP). Every 80 sampling points constitute one speech frame. A speech signal is analyzed and various parameters are extracted, such as linear-prediction filter coefficients, codebook sequence numbers in the adaptive and fixed codebooks, an adaptive code vector gain, and a fixed code vector gain. These parameters are encoded and sent to a decoding end. At the decoding end, as shown in the accompanying drawing, the parameters are decoded, an excitation signal is reconstructed from the codebooks and gains, and the speech signal is synthesized by passing the excitation signal through a linear-prediction synthesis filter.
However, when transmitted in a network, an IP packet may inevitably be damaged during transmission, discarded due to network congestion, lost due to network failures, or even discarded merely because it arrives at the receiving end too late to be included in the replayed speech. Frame loss is the main reason for degradation in speech quality during network transmission. Lost IP frames cannot be recovered at the decoding end. When one codebook or several adjacent consecutive codebooks are lost, the CS-ACELP decoder is confronted with two problems. One is the loss of all code elements contained in a group of sequentially arranged excitation signals; at this point, alternative excitation signals that produce the smallest speech quality distortion and transit smoothly need to be obtained by calculation. The other is that, when a frame loss occurs, all original adaptive codebook parameters, short-term linear-prediction filter coefficients, and gains are lost. Since G.729 adopts a backward-adaptive coding mode, speech signals can converge only a certain period of time after a next good frame is received. Therefore, in the case of frame loss, the quality of speech of the G.729 decoder degrades rapidly.
Aiming at the frame loss phenomenon of G.729, the G.729 Standard adopts a high-performance, low-complexity frame lost concealment technique. Referring to the accompanying flow chart, the technique includes the following steps.
In Step 201, a current lost frame is detected, and a long-term prediction gain of the last 5 ms good sub-frame before the lost frame is obtained from a long-term post-filter.
In practice, good frames such as speech frames or mute frames are forwarded to a frame lost concealment processing device by an upper-layer protocol layer such as a Real-time Transport Protocol (RTP) layer. Lost frame detection is also completed by the upper-layer protocol layer. On receiving a good frame, the upper-layer protocol layer directly forwards the good frame to the frame lost concealment processing device. When detecting a lost frame, the upper-layer protocol layer sends a frame loss indication to the frame lost concealment processing device; the frame lost concealment processing device receives the frame loss indication and determines that a frame loss has currently occurred.
In Step 202, it is determined whether the long-term prediction gain of the last 5 ms good sub-frame before the lost frame is larger than 3 dB. If yes, the current lost frame is considered as a periodic frame, i.e., speech, and Step 203 is performed; otherwise, the current lost frame is considered as a non-periodic frame, i.e., non-speech, and Step 205 is performed.
In Step 203, a fundamental-tone delay of the current lost frame is calculated on the basis of a fundamental-tone delay of the last good frame before the lost frame. An adaptive codebook gain of the current lost frame is obtained by attenuating the energy of an adaptive codebook gain of the last good frame before the lost frame. Further, an adaptive codebook of the last good frame before the lost frame is taken as an adaptive codebook of the current lost frame.
In particular, the process of calculating the fundamental-tone delay of the current lost frame includes the following steps. First, the integer part T of the fundamental-tone delay of the last good frame before the lost frame is taken. If the current lost frame is the nth frame in the continual lost frames, the fundamental-tone delay of the current lost frame equals T plus n−1 sampling point durations. In order to avoid an excessive periodicity of the frame loss, the fundamental-tone delay of the lost frame is limited to a value no greater than T plus 143 sampling point durations.
In the G.729, a frame is 10 ms long and contains 80 sampling points. Thus, one sampling point lasts for 0.125 ms.
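As a minimal sketch (not the standard's reference code; the function name is illustrative), the fundamental-tone delay rule of Step 203 can be written as:

```python
def lost_frame_pitch_delay(t_last_good: int, n: int) -> int:
    """Fundamental-tone delay of the n-th frame in a run of lost frames.

    t_last_good is the integer part T of the delay of the last good frame.
    The delay grows by one sampling point duration per lost frame and, per
    the text above, is limited to no more than T + 143 sampling point
    durations to avoid excessive periodicity.
    """
    return min(t_last_good + (n - 1), t_last_good + 143)
```

For example, with T = 40 the third consecutive lost frame would use a delay of 42 sampling point durations.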
An adaptive codebook gain of the first lost frame in the continual lost frames is set to be identical with the adaptive codebook gain of the last good frame before the lost frame. Adaptive codebook gains of the second and subsequent lost frames in the continual lost frames are attenuated with an attenuation coefficient of 0.9 on the basis of the adaptive codebook gain of the former lost frame. That is, the adaptive codebook gain of the current lost frame is gp(n) = 0.9 × gp(n−1), where n represents the frame number of the current lost frame in the continual lost frames, gp(n) is the adaptive codebook gain of the current lost frame, n−1 represents the frame number of the former lost frame, gp(n−1) is the adaptive codebook gain of the former lost frame, and n > 1.
In Step 204, an excitation signal of the current lost frame is calculated on the basis of the fundamental-tone delay, the adaptive codebook gain, and the adaptive codebook. Thus, the flow is ended.
In Step 205, the fundamental-tone delay of the current lost frame is calculated on the basis of the fundamental-tone delay of the last good frame before the lost frame. A fixed codebook gain of the current lost frame is obtained by attenuating the energy of a fixed codebook gain of the last good frame before the lost frame. Further, a sequence number and a symbol of a fixed codebook of the current lost frame are obtained on the basis of a currently generated random number.
In particular, a fixed codebook gain of the first lost frame in the continual lost frames is set to be identical with the fixed codebook gain of the last good frame before the lost frame. Fixed codebook gains of the second and subsequent lost frames in the continual lost frames are attenuated with an attenuation coefficient of 0.98 on the basis of the fixed codebook gain of the former lost frame. That is, the fixed codebook gain of the current lost frame is gc(n) = 0.98 × gc(n−1), where n represents the frame number of the current lost frame in the continual lost frames, gc(n) is the fixed codebook gain of the current lost frame, n−1 represents the frame number of the former lost frame, gc(n−1) is the fixed codebook gain of the former lost frame, and n > 1.
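Both attenuation rules above share the same closed form; a sketch (function name is illustrative):

```python
def attenuated_gain(g_last_good: float, n: int, factor: float) -> float:
    """Codebook gain of the n-th frame in a run of lost frames.

    The first lost frame (n = 1) reuses the last good frame's gain; each
    later frame multiplies the previous lost frame's gain by the
    attenuation coefficient (0.9 for the adaptive codebook, 0.98 for the
    fixed codebook), giving the closed form factor**(n-1) * g_last_good.
    """
    return g_last_good * factor ** (n - 1)
```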
The process of calculating the sequence number and the symbol of the fixed codebook specifically includes the following steps: first obtaining seed(n) on the basis of seed(n) = seed(n−1) × 31821 + 13849, then adopting bits 0 to 12 (the 13 least significant bits) of seed(n) as the sequence number of the fixed codebook, and bits 0 to 3 (the 4 least significant bits) as the symbol of the fixed codebook, where seed(0) = 21845.
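A sketch of this random generator; note that the modulus is an assumption, since the text gives only the recursion, and 16-bit wraparound arithmetic is assumed here as is common in speech-codec reference code:

```python
def next_seed(seed: int) -> int:
    # seed(n) = seed(n-1) * 31821 + 13849; 16-bit wraparound is assumed.
    return (seed * 31821 + 13849) & 0xFFFF

def random_fixed_codebook(seed: int) -> tuple:
    """Sequence number from bits 0..12 and symbol from bits 0..3 of seed(n)."""
    return seed & 0x1FFF, seed & 0xF

seed = 21845              # seed(0)
seed = next_seed(seed)    # seed(1)
sequence_number, symbol = random_fixed_codebook(seed)
```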
In Step 206, the excitation signal of the current lost frame is calculated on the basis of the fundamental-tone delay, the fixed codebook gain, and the sequence number and symbol of the fixed codebook.
However, when a frame loss occurs, the method shown in the flow above still produces recovered frames whose quality of speech is unsatisfactory.
The present invention provides a device and a method for frame lost concealment, so as to improve the quality of speech of recovered frames when a frame loss on speech occurs.
The technical solutions of the present invention are implemented as follows.
A device for frame lost concealment including a lost frame detection module, a lost frame pitch period determination module, and a lost frame excitation signal determination module is provided.
The lost frame detection module forwards a frame loss indication signal sent from an upper-layer protocol layer.
The lost frame pitch period determination module receives the frame loss indication signal sent from the lost frame detection module, then determines a pitch period of a current lost frame on the basis of a pitch period of the last good frame before the lost frame stored therein, and sends the pitch period of the current lost frame.
The lost frame excitation signal determination module receives and stores an excitation signal of the good frame from the upper-layer protocol layer, and then obtains an excitation signal of the current lost frame on the basis of the pitch period of the current lost frame sent from the lost frame pitch period determination module and the good frame excitation signal stored therein.
A method for frame lost concealment is provided, in which a received good frame excitation signal is stored. The method includes the following steps.
First, a current lost frame is detected, and a pitch period of the current lost frame is obtained on the basis of a pitch period of the last good frame before the lost frame.
Next, an excitation signal of the current lost frame is recovered on the basis of the pitch period of the current lost frame and an excitation signal of the last good frame stored.
In the above device and method, a pitch period of a current lost frame is determined on the basis of a pitch period of the last good frame before the lost frame, and an excitation signal of the current lost frame is recovered on the basis of the pitch period of the current lost frame and an excitation signal of the last good frame before the lost frame. Thereby, the auditory discontinuity perceived by the receiver is reduced, and the quality of speech is improved. Further, in the present invention, the pitch period of continual lost frames is adjusted on the basis of the change trend of the pitch period of the last good frame before the lost frame, so that a buzzing effect produced by the continual lost frames is avoided and the quality of speech is further improved. In addition, by attenuating the energy of the excitation signal obtained for the continual lost frames, the device and method accord with human auditory physiology and further reduce the auditory discontinuity perceived by the receiver.
The present invention is described in detail below by embodiments with reference to the accompanying drawings.
When a frame loss occurs, as the frame loss rate rises, large deviations may occur in the effective information and the energy level of the whole speech segment during the frame loss. After linear-prediction coding (LPC) analysis is performed on a segment of continuous speech signals, it is found that the frequency spectra of the residual signals obtained after the LPC are far from white noise. Distinct sharp pulses exist in the continuous voiced sound areas, so that long-term correlations exist between the excitation signals. Meanwhile, it can be seen clearly that the correlated excitation pulses are spaced from each other by an interval of one pitch period or an integer multiple of the pitch period. Since unvoiced sounds and noise do not have periodic excitation signals, properties such as energy levels of the excitation signals of two adjacent unvoiced sounds or noises can be set identical. Therefore, the fundamental-tone delay of the last good frame before the lost frame may be taken as the pitch period of the good frame, and a pitch period of the lost frame is obtained on the basis of the good frame pitch period. After that, an excitation signal of the lost frame is recovered on the basis of the pitch period of the lost frame and an excitation signal of the last good frame before the lost frame.
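The long-term correlation described above is what makes pitch-based recovery possible. A toy illustration (pure Python, with a synthetic pulse train standing in for an LPC residual, which is an assumption for demonstration only) estimates the pitch period as the lag that maximizes the autocorrelation:

```python
def autocorr_pitch(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] maximizing the autocorrelation."""
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i - lag] for i in range(lag, len(x)))
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag

# Sharp pulses every 50 samples, mimicking the residual of a voiced sound.
residual = [1.0 if i % 50 == 0 else 0.0 for i in range(400)]
```

Running `autocorr_pitch(residual, 20, 143)` over the G.729 pitch range recovers the 50-sample period of the pulse train.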
Referring to the accompanying drawing, a device for frame lost concealment according to an embodiment of the present invention includes a lost frame detection module 31, a lost frame pitch period determination module 32, and a lost frame excitation signal determination module 33. The lost frame detection module 31 is adapted to forward a frame loss indication signal sent from an upper-layer protocol layer to the lost frame pitch period determination module 32.
The lost frame pitch period determination module 32 is adapted to receive the frame loss indication signal sent from the lost frame detection module 31, then determine a pitch period of a current lost frame on the basis of a pitch period of the last good frame before the lost frame stored therein, and send the pitch period of the current lost frame to the lost frame excitation signal determination module 33.
The lost frame excitation signal determination module 33 is adapted to receive an excitation signal of the good frame coming from the upper-layer protocol layer, store the excitation signal of the good frame in a buffer thereof, receive the pitch period of the current lost frame sent from the lost frame pitch period determination module 32, and then obtain an excitation signal of the current lost frame on the basis of the pitch period and the excitation signal of the good frame stored therein.
Further, referring to the accompanying drawing, the lost frame pitch period determination module 32 includes a good frame pitch period output module 321, a pitch period change trend determination module 322, and a lost frame pitch period output module 323.
The good frame pitch period output module 321 is adapted to store pitch periods of sub-frames of each good frame, then receive a trigger signal sent from the lost frame detection module 31, and output the stored pitch periods of the sub-frames of the last good frame to the pitch period change trend determination module 322 and the lost frame pitch period output module 323.
The pitch period change trend determination module 322 is adapted to receive the pitch periods of the sub-frames of the last good frame sent from the good frame pitch period output module 321, and determine whether the pitch period of the good frame is in a decreasing trend. If yes, a trigger signal 1 is sent to the lost frame pitch period output module 323; otherwise, a trigger signal 0 is sent to the lost frame pitch period output module 323.
The lost frame pitch period output module 323 is adapted to receive a frame number n of the current lost frame in continual lost frames sent from the lost frame detection module 31. If the trigger signal 1 is received from the pitch period change trend determination module 322, the pitch period of the last good sub-frame in the last good frame sent from the good frame pitch period output module 321, minus n−1 sampling point durations, serves as the pitch period of the current lost frame. On the contrary, if the trigger signal 0 is received from the pitch period change trend determination module 322, the pitch period of the last good sub-frame sent from the good frame pitch period output module 321, plus n−1 sampling point durations, serves as the pitch period of the current lost frame. Afterward, the lost frame pitch period output module 323 outputs the pitch period of the current lost frame to the lost frame excitation signal determination module 33.
Further, referring to the accompanying drawing, the lost frame excitation signal determination module 33 includes a good frame excitation signal output module 331 and a lost frame excitation signal output module 332.
The good frame excitation signal output module 331 is adapted to receive and store the excitation signal of the good frame coming from the upper-layer protocol layer, and to receive the pitch period of the current lost frame output by the lost frame pitch period determination module 32. Denoting by α a preset fraction of the pitch period (0 < α < 1), the module overlaps and adds the excitation signal of the last α pitch periods of the current lost frame stored therein, i.e., a segment having a length of α×Tn, with the excitation signal of the last 1 to (1+α) pitch periods of the current lost frame, and adopts the obtained excitation signal as the excitation signal of the last α pitch periods of the current lost frame. After that, the good frame excitation signal output module 331 adopts the excitation signal of the last (2−α) to 1 pitch periods of the current lost frame stored therein as the excitation signal of 0 to (1−α) pitch periods of the current lost frame, and outputs the obtained excitation signal of one pitch period of the current lost frame to the lost frame excitation signal output module 332.
The lost frame excitation signal output module 332 is adapted to sequentially and repeatedly write the excitation signal of one pitch period sent from the good frame excitation signal output module 331 into a buffer thereof for the excitation signal of the current lost frame.
Further, referring to the accompanying flow chart, a method for frame lost concealment according to an embodiment of the present invention includes the following steps.
In Step 501, whenever a good frame is received, an excitation signal of the good frame is stored in a good frame excitation signal buffer.
The length of the buffer may be set empirically.
In Step 502, a current lost frame is detected, and a pitch period of the current lost frame is determined on the basis of a pitch period of the last good frame before the lost frame.
In Step 503, an excitation signal of the current lost frame is determined on the basis of the pitch period of the current lost frame and an excitation signal of the good frame before the lost frame.
Referring to the accompanying flow chart, a method for frame lost concealment according to another embodiment includes the following steps. In Step 601, whenever a good frame is received, an excitation signal of the good frame is stored in a good frame excitation signal buffer.
The length of the buffer may be set empirically.
In Step 602, a current lost frame is detected, and pitch periods of sub-frames contained in the last good frame before the lost frame are obtained from an adaptive codebook of the last good frame before the lost frame.
In Step 603, it is determined whether the pitch period of the last good frame before the lost frame is in a decreasing trend. If yes, Step 604 is performed; otherwise, Step 605 is performed.
In G.729, each frame is 10 ms long and is divided into two 5 ms sub-frames. Whether the pitch period of the last good frame before the lost frame is in a decreasing trend can be determined by comparing the pitch periods of the two sub-frames of that frame. If the pitch periods of the two sub-frames are identical, the pitch period of the last good frame before the lost frame is also considered to be in a decreasing trend.
In Step 604, a value obtained by subtracting n−1 sampling point durations from the pitch period T0 of the last good sub-frame before the lost frame serves as a pitch period Tn of the current lost frame, and then Step 606 is performed. In this step, n is a frame number of the current lost frame in continual lost frames.
Further, an integer Td (20 ≤ Td ≤ 143) is preset, and it is determined whether n > Td. If yes, the pitch period Tn of the current lost frame equals the pitch period T0 of the last good sub-frame minus Td sampling point durations; otherwise, Tn equals T0 minus n−1 sampling point durations.
In Step 605, a value obtained by adding the pitch period T0 of the last good sub-frame before the lost frame to n−1 sampling point durations serves as the pitch period Tn of the current lost frame, and then Step 606 is performed. In this step, n is the frame number of the current lost frame in the continual lost frames.
Further, an integer Td (20 ≤ Td ≤ 143) is preset, and it is determined whether n > Td. If yes, the pitch period Tn of the current lost frame equals the pitch period T0 of the last good sub-frame plus Td sampling point durations; otherwise, Tn equals T0 plus n−1 sampling point durations.
Since the pitch period changes gently during the stable voiced sound period, the pitch period of the first lost frame may be considered identical with that of the last good sub-frame before the lost frame when n=1.
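Steps 603 through 605 can be sketched as follows (a minimal illustration; Td defaults to the lower bound 20 here, and the function names are illustrative):

```python
def is_decreasing(pitch_sub1: int, pitch_sub2: int) -> bool:
    """Step 603: decreasing when the second sub-frame's pitch period is no
    longer than the first's (identical pitch periods count as decreasing)."""
    return pitch_sub2 <= pitch_sub1

def lost_frame_pitch(t0: int, n: int, decreasing: bool, td: int = 20) -> int:
    """Steps 604/605: shift the last good sub-frame's pitch period T0 by
    n - 1 sampling point durations (bounded via the preset integer Td),
    downward for a decreasing trend and upward otherwise."""
    step = td if n > td else n - 1
    return t0 - step if decreasing else t0 + step
```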
In Step 606, denoting by α a preset fraction of the pitch period (0 < α < 1), the excitation signal of the last α pitch periods of the current lost frame, i.e., a segment having a length of α×Tn, stored in the good frame excitation signal buffer, is overlapped and added with the excitation signal of the last 1 to (1+α) pitch periods of the current lost frame, and the obtained excitation signal serves as the excitation signal of the last α pitch periods of the current lost frame. Further, the excitation signal of the last (2−α) to 1 pitch periods of the current lost frame stored in the good frame excitation signal buffer serves as the excitation signal of 0 to (1−α) pitch periods of the current lost frame.
An overlap-add window may be a triangular window or a Hanning window. In the case of the triangular window, the process of overlapping and adding includes the following steps. The excitation signal of the last α pitch periods of the current lost frame stored in the good frame excitation signal buffer is multiplied by the descending slope of the window function. Then, the excitation signal of the last 1 to (1+α) pitch periods of the current lost frame stored in the good frame excitation signal buffer is multiplied by the ascending slope of the window function. Finally, the two products are added.
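With a triangular window, the overlap-add amounts to a linear crossfade between two equal-length segments. A sketch (segment extraction and the exact overlap length, given by formulas not reproduced here, are left to the caller):

```python
def overlap_add(fading_out, fading_in):
    """Linearly crossfade two equal-length segments: the first is weighted
    by the descending slope of a triangular window, the second by the
    ascending slope, and the weighted samples are summed."""
    n = len(fading_out)
    assert n == len(fading_in) and n >= 2
    return [
        fading_out[i] * (n - 1 - i) / (n - 1) + fading_in[i] * i / (n - 1)
        for i in range(n)
    ]
```

Because the two slopes sum to one at every sample, a steady-state signal passes through the crossfade unchanged.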
Further, in order to avoid buzzing, the energy of the excitation signal of the current lost frame may be attenuated, and an energy attenuation formula is given below:
g_n = a^(n−1) × g_0
where n is the frame number of the current lost frame in the continual lost frames, g_n is the energy of the current lost frame, g_0 is the energy of the last good frame before the lost frame, and a is the energy attenuation coefficient, usually a = 0.9.
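Applied to a buffer of excitation samples, the attenuation formula above becomes a single scale factor per lost frame (a sketch; the scale is applied directly to the samples as a gain, following the formula):

```python
def attenuate_excitation(excitation, n, a=0.9):
    """Scale the n-th lost frame's excitation by a**(n-1), following
    g_n = a**(n-1) * g_0, so long runs of lost frames decay toward
    silence instead of producing a buzzing artifact."""
    scale = a ** (n - 1)
    return [sample * scale for sample in excitation]
```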
In Step 607, the excitation signal of one pitch period of the current lost frame obtained is sequentially and repeatedly written into an excitation signal buffer of the current lost frame.
Specifically, a data pointer is pointed at the start position of the excitation signal of one pitch period of the current lost frame obtained above, and that excitation signal is sequentially replicated into the excitation signal buffer of the current lost frame. If the pitch period of the current lost frame obtained in Step 604 or 605 is shorter than the length of the current lost frame (10 ms), the data pointer wraps back to the start position of the one-pitch-period excitation signal after reaching its end position.
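Step 607 can be sketched as a cyclic copy (frame length of 80 samples per the 10 ms G.729 frame; the function name is illustrative):

```python
def fill_lost_frame(pitch_excitation, frame_len=80):
    """Replicate one pitch period of excitation until the lost frame's
    buffer is full, wrapping the read pointer back to the start whenever
    the pitch period is shorter than the frame."""
    return [pitch_excitation[i % len(pitch_excitation)]
            for i in range(frame_len)]
```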
The above descriptions are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the invention. Any modifications, equivalent substitutions, and variations made within the spirit and principle of the present invention fall within the scope of the invention.
Foreign Application Priority Data:
200610087475 — Jun. 2006 — CN (national)
This application is a continuation of International Application No. PCT/CN2007/070092, filed on Jun. 7, 2007, which claims priority to Chinese Patent Application No. 200610087475.4, filed on Jun. 8, 2006, entitled “DEVICE AND METHOD FOR LOST FRAME CONCEALMENT”, both of which are incorporated herein by reference in their entireties.
References Cited — U.S. Patent Documents:
5960386 — Janiszewski et al. — Sep. 1999 — A
7587315 — Unno — Sep. 2009 — B2
Foreign Patent Documents:
WO-0063885 — Oct. 2000 — WO
WO-2005086138 — Sep. 2005 — WO
Publication: US 20090089050 A1 — Apr. 2009 — US
Related U.S. Application Data:
Parent — PCT/CN2007/070092 — Jun. 2007 — US
Child — 12330265 — US