The present invention relates to the field of signal processing, and particularly, to a method and an apparatus for obtaining an attenuation factor.
In a real-time voice communication system, for example a VoIP (Voice over IP) system, voice data must be transmitted in real time and reliably. Because the network is unreliable, data packets may be lost or fail to reach the destination in time on the way from the sending end to the receiving end. The receiving end regards both situations as network packet loss. Network packet loss is unavoidable, and it is one of the most important factors influencing voice call quality. Therefore, a robust packet loss concealment method is needed in the real-time communication system to recover lost data packets, so that good call quality is still obtained under network packet loss.
In the existing real-time voice communication technology, at the sending end an encoder divides a wideband voice signal into a high sub band and a low sub band, encodes the two sub bands respectively with ADPCM (Adaptive Differential Pulse Code Modulation), and sends them together to the receiving end via the network. At the receiving end, the two sub bands are decoded respectively by the ADPCM decoder, and the final signal is then synthesized by a QMF (Quadrature Mirror Filter) synthesis filter.
Different Packet Loss Concealment (PLC) methods are adopted for the two sub bands. For the low band signal, when no packet is lost, the reconstruction signal is not changed during cross-fading. When a packet is lost, for the first lost frame the history signal (in the present application document, the history signal is the voice signal before the lost frame) is analyzed with a short term predictor and a long term predictor, and voice classification information is extracted. The lost frame signal is then reconstructed with an LPC (linear predictive coding) method based on pitch repetition, using the predictors and the classification information. The status of the ADPCM decoder is also updated synchronously until a good frame is found. In addition, not only the signal corresponding to the lost frame but also a section of signal for cross-fading needs to be generated, so that once a good frame is received, cross-fading between the good frame signal and that section of signal can be executed. Note that this kind of cross-fading only happens when the receiving end receives the first good frame after losing a frame.
During the process of realizing the present invention, the inventor found at least the following problems in the prior art: the energy of the synthesized signal is controlled by a static self-adaptive attenuation factor. Although the defined attenuation factor changes gradually, its attenuation speed, i.e. the value of the attenuation factor, is the same for the same voice classification. However, human voices vary widely. If the attenuation factor does not match the characteristics of the human voice, uncomfortable noise appears in the reconstruction signal, particularly at the end of steady vowels. A static self-adaptive attenuation factor cannot adapt to the characteristics of various human voices.
The situation shown in
An embodiment of the present invention provides a method and an apparatus for obtaining an attenuation factor adapted to realize the smooth transition from the history data to the latest received data.
In order to realize the above object, an embodiment of the present invention provides a method for signal processing, adapted to process a synthesized signal in packet loss concealment, including:
obtaining a change trend of a pitch of a signal;
obtaining an attenuation factor according to the change trend of the pitch of the signal; and
obtaining a lost frame reconstructed after attenuating, according to the attenuation factor.
An embodiment of the present invention also provides an apparatus for signal processing, adapted to process a synthesized signal in packet loss concealment, including the following units:
a change trend obtaining unit adapted to obtain a change trend of a pitch of a signal;
an attenuation factor obtaining unit adapted to obtain an attenuation factor, according to the change trend obtained by the change trend obtaining unit; and
a lost frame reconstructing unit adapted to obtain a lost frame reconstructed after attenuating according to the attenuation factor.
An embodiment of the present invention also provides a voice decoder adapted to decode the voice signal, including a low band decoding unit, a high band decoding unit, and a quadrature mirror filtering unit.
The low band decoding unit is adapted to decode a received low band decoding signal and compensate a lost low band signal.
The high band decoding unit is adapted to decode a received high band decoding signal, and compensate a lost high band signal.
The quadrature mirror filtering unit is adapted to obtain a final output signal by synthesizing the low band decoding signal and the high band decoding signal.
The low band decoding unit includes a low band decoding subunit, a LPC based on pitch repetition subunit, and a cross-fading subunit.
The low band decoding subunit is adapted to decode a received low band stream signal.
The LPC based on pitch repetition subunit is adapted to generate a synthesized signal corresponding to the lost frame.
The cross-fading subunit is adapted to cross fade the signal processed by the low band decoding subunit and synthesized signal corresponding to the lost frame generated by the LPC based on pitch repetition subunit.
The LPC based on pitch repetition subunit includes an analyzing module and a synthesizing module, wherein the analyzing module is adapted to analyze a history signal, the synthesizing module is adapted to obtain a synthesized signal according to the analysis result of the analyzing module;
the synthesizing module comprises a first module, the apparatus for signal processing, and a second module;
wherein the first module is adapted to obtain a reconstructed lost frame signal;
the second module is adapted to control energy of the reconstructed lost frame signal by the apparatus for signal processing.
The apparatus for signal processing is adapted to obtain a change trend of a signal, obtain an attenuation factor according to the change trend, and obtain a lost frame reconstructed after attenuating according to the attenuation factor.
An embodiment of the present invention further provides a computer-accessible storage medium. The computer-accessible storage medium stores computer program codes which, when executed by a computer, enable the computer to execute the steps of any one of the methods for signal processing in packet loss concealment.
Compared with the prior art, embodiments of the present invention have the following advantages:
A self-adaptive attenuation factor is adjusted dynamically according to the change trend of the history signal, realizing a smooth transition from the history data to the latest received data, so that the attenuation speed of the compensated signal is kept as consistent as possible with that of the original signal, adapting to the characteristics of various human voices.
The present invention will be described in more detail with reference to the drawings and embodiments.
A method for obtaining an attenuation factor is provided in Embodiment 1 of the present invention, adapted to process the synthesized signal in packet loss concealment, as shown in the
In Step s101, a change trend of a signal is obtained.
Specifically, the change trend may be expressed by the following parameters: (1) the ratio of the energy of the last pitch periodic signal to the energy of the previous pitch periodic signal in the signal; and (2) the ratio of the difference between the maximum amplitude value and the minimum amplitude value of the last pitch periodic signal to the difference between the maximum amplitude value and the minimum amplitude value of the previous pitch periodic signal in the signal.
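As an illustrative sketch (not part of the claimed embodiments), the two trend parameters might be computed from a history buffer `z` (most recent sample last) and a pitch period `T0`; the function names here are hypothetical:

```python
def energy_ratio(z, T0):
    """Ratio of the energy of the last pitch period to the energy of the
    previous pitch period in the history signal z (most recent sample last)."""
    last = z[-T0:]            # last pitch period
    prev = z[-2 * T0:-T0]     # previous pitch period
    e1 = sum(s * s for s in last)   # energy E1
    e2 = sum(s * s for s in prev)   # energy E2
    return e1 / e2 if e2 else 1.0   # guard against a silent previous period

def peak_valley_ratio(z, T0):
    """Ratio of the peak-to-valley amplitude differences of the last two
    pitch periods."""
    last = z[-T0:]
    prev = z[-2 * T0:-T0]
    p1 = max(last) - min(last)      # amplitude swing of the last period
    p2 = max(prev) - min(prev)      # amplitude swing of the previous period
    return p1 / p2 if p2 else 1.0
```

For a decaying signal, both ratios fall below 1, signalling that the attenuation should be steeper.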
In Step s102, an attenuation factor is obtained according to the change trend.
The specific processing method of Embodiment 1 of the present invention will be described in connection with a specific application scenario.
A method for obtaining an attenuation factor which is adapted to process the synthesized signal in packet loss concealment is provided in Embodiment 1 of the present invention.
As shown in the
Only the low band signal is described in detail as follows.
Under the situation with no frame loss, the low band ADPCM decoder decodes the current received frame to obtain the signal xl(n), n=0, . . . , L−1, and the output corresponding to the current frame is zl(n), n=0, . . . , L−1. In this situation, the reconstruction signal is not changed during cross-fading, that is, zl(n)=xl(n), n=0, . . . , L−1, wherein L is the length of the frame.
Under the situation with frame loss, for the first lost frame the history signal zl(n), n<0 is analyzed with a short term predictor and a long term predictor, and voice classification information is extracted. Using the above predictors and the classification information, the signal yl(n) is generated by an LPC method based on pitch repetition, and the lost frame signal zl(n) is reconstructed as zl(n)=yl(n), n=0, . . . , L−1. In addition, the status of the ADPCM decoder is updated synchronously until a good frame is found. Note that not only the signal corresponding to the lost frame but also a 10 ms signal yl(n), n=L, . . . , L+M−1 for cross-fading needs to be generated, wherein M is the number of signal sampling points included when calculating the energy. In that way, once a good frame is received, cross-fading is executed between xl(n), n=L, . . . , L+M−1 and yl(n), n=L, . . . , L+M−1. Note that this kind of cross-fading only happens when the receiving end receives the first good frame after a frame loss.
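The cross-fading at the first good frame can be sketched as follows. A simple linear ramp is assumed here for illustration (the actual codec defines its own fade window), and `cross_fade` is a hypothetical name:

```python
def cross_fade(x_good, y_synth):
    """Linearly cross-fade from the synthesized signal y_synth to the newly
    received good-frame signal x_good over len(x_good) samples."""
    M = len(x_good)
    out = []
    for n in range(M):
        w = (n + 1) / M                      # weight ramps up toward 1
        out.append(w * x_good[n] + (1.0 - w) * y_synth[n])
    return out
```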
A LPC based on pitch repetition method in the
When the data frame is a good frame, zl(n) is stored into a buffer for future use.
When the first lost frame is found, the final signal yl(n) needs to be synthesized in two steps. At first, the history signal zl(n), n=−297, . . . , −1 is analyzed. Then the signal yl(n), n=0, . . . , L−1 is synthesized according to the result of the analysis, wherein L is the frame length of the data frame, i.e. the number of sampling points corresponding to one frame of signal, and Q is the length of the signal needed for analyzing the history signal.
The LPC module based on the pitch repetition specifically includes following parts.
(1) An LP (Linear Prediction) Analysis
The short-term analysis filter A(z) and the synthesis filter 1/A(z) are Linear Prediction (LP) filters of order P. The LP analysis filter is defined as:

A(z)=1+a1z^−1+a2z^−2+ . . . +aPz^−P
Through the LP analysis of the history signal zl(n), n=−Q, . . . , −1 with the filter A(z), the residual signal e(n), n=−Q, . . . , −1 corresponding to the history signal is obtained.
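This filtering step can be sketched as follows, assuming the LP coefficients a1 . . . aP are already known from the analysis (the function name is hypothetical, and samples before the start of the buffer are taken as zero):

```python
def lp_residual(z, a):
    """Filter signal z through the LP analysis filter
    A(z) = 1 + a[0]*z^-1 + ... + a[P-1]*z^-P to obtain the residual e(n)."""
    P = len(a)
    e = []
    for n in range(len(z)):
        val = z[n]
        for k in range(1, P + 1):
            if n - k >= 0:              # treat samples before the buffer as 0
                val += a[k - 1] * z[n - k]
        e.append(val)
    return e
```

With a = [−1] (a first-difference filter), the residual of a linear ramp is constant, which illustrates how LP analysis whitens the signal.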
(2) A History Signal Analysis
The lost signal is compensated by a pitch repetition method. Therefore, the pitch period T0 corresponding to the history signal zl(n), n=−Q, . . . , −1 first needs to be estimated, as follows: zl(n) is preprocessed to remove low frequency components that are needless in an LTP (long term prediction) analysis, and the pitch period T0 of zl(n) is then obtained by the LTP analysis. After the pitch period T0 is obtained, the voice classification is obtained through a signal classification module.
Voice classifications are as shown in the following Table 1:
(3) A Pitch Repetition
A pitch repetition module is adapted to estimate the LP residual signal e(n), n=0, . . . , L−1 of a lost frame. Before the pitch repetition is executed, if the voice classification is not VOICED, the following formula is adopted to limit the amplitude of a sample:
If the voice classification is VOICED, the residual e(n), n=0, . . . , L−1 corresponding to the lost signal is obtained by repeating the residual signal of the last pitch period of the most recently received good frame, that is:
e(n)=e(n−T0)
For the other voice classifications, to avoid the generated signal being too strongly periodic (for a non-voice signal, too strong a periodicity may produce uncomfortable noise such as musical noise), the residual signal e(n), n=0, . . . , L−1 corresponding to the lost signal is generated by the following formula:
e(n)=e(n−T0+(−1)^n)
Besides the residual signal corresponding to the lost frame, residual signals for N extra samples, e(n), n=L, . . . , L+N−1, are generated so as to produce a signal for cross-fading, in order to ensure a smooth splice between the lost frame and the first good frame after the lost frame.
(4) An LP Synthesis
After the residual signal e(n) corresponding to the lost frame and to the cross-fading section is generated, the reconstructed lost frame signal ylpre(n), n=0, . . . , L−1 is obtained by using the following formula:
wherein, the residual signal e(n), n=0, . . . , L−1 is the residual signal obtained from the above pitch repetition steps.
Besides, ylpre(n), n=L, . . . , L+N−1, covering the N samples for cross-fading, is generated by the same formula.
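As a sketch of this synthesis step (hypothetical helper name; the LP coefficients `a` and the filter memory taken from the tail of the history signal are assumed to be available from the analysis above):

```python
def lp_synthesis(e, a, z_hist):
    """Pass the residual e(n) through the synthesis filter 1/A(z):
    y(n) = e(n) - a[0]*y(n-1) - ... - a[P-1]*y(n-P),
    using the last P samples of the history signal z_hist as memory."""
    P = len(a)
    y = list(z_hist[-P:]) if P else []
    mem = len(y)                         # number of memory samples kept
    for n in range(len(e)):
        val = e[n]
        for k in range(1, P + 1):
            val -= a[k - 1] * y[mem + n - k]
        y.append(val)
    return y[mem:]                       # only the newly synthesized samples
```

This inverts the analysis filter: feeding the residual of a signal back through `lp_synthesis` with the same coefficients reproduces that signal.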
(5) An Adaptive Muting
For realizing a smooth energy transition, before executing the QMF synthesis with the high band signal, the low band signal also needs cross-fading; the rules are shown in the following table:
In the above table, zl(n) is the finally outputted signal corresponding to the current frame; xl(n) is the good frame signal corresponding to the current frame; yl(n) is the synthesized signal corresponding to the same time as the current frame; L is the frame length, and N is the number of samples over which cross-fading is executed.
For different voice classifications, the energy of the signal ylpre(n) is controlled before cross-fading according to a coefficient corresponding to every sample. The value of the coefficient changes according to the voice classification and the packet loss situation.
In detail, in the case that the last two pitch period signals in the received history signal are the original signal as shown in
In Step s201, the change trend of the signal is obtained.
The signal change trend may be expressed by the ratio of the energy of the last pitch periodic signal to the energy of the previous pitch periodic signal in the signal: the energies E1 and E2 of the last two pitch periods in the history signal are obtained, and the ratio of the two energies is calculated.
E1 is the energy of the last pitch period signal, E2 is the energy of the previous pitch period signal, and T0 is the pitch period corresponding to the history signal.
Optionally, the change trend of the signal may be expressed by the ratio of the peak-valley differences of the last two pitch periods in the history signal.
P1=max(xl(i))−min(xl(j)), i,j=−T0, . . . , −1
P2=max(xl(i))−min(xl(j)), i,j=−2T0, . . . , −(T0+1)
wherein, P1 is the difference between the maximum amplitude value and the minimum amplitude value of the last pitch periodic signal, P2 is the difference between the maximum amplitude value and the minimum amplitude value of the previous pitch periodic signal, and the ratio is calculated as:
In Step s202, the synthesized signal is attenuated dynamically according to the obtained change trend of the signal.
The calculation formula is shown as follows:
yl(n)=ylpre(n)*(1−C*(n+1)), n=0, . . . , N−1
wherein ylpre(n) is the reconstructed lost frame signal, N is the length of the synthesized signal, and C is the self-adaptive attenuation coefficient whose value is:
When the attenuation factor 1−C*(n+1)<0, it is set to 0, that is 1−C*(n+1)=0, so as to avoid a negative attenuation factor for any sample.
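The attenuation of Step s202 together with this clamping rule can be sketched as follows; `attenuate` is a hypothetical name, and C is assumed to have been derived from the change trend:

```python
def attenuate(y_pre, C):
    """Apply the attenuation factor 1 - C*(n+1) to the reconstructed frame
    y_pre, clamping the factor at 0 so no sample receives a negative gain."""
    out = []
    for n, s in enumerate(y_pre):
        factor = 1.0 - C * (n + 1)
        if factor < 0.0:
            factor = 0.0          # never let the attenuation factor go negative
        out.append(s * factor)
    return out
```

A larger C (a faster-decaying history signal) drives the factor to zero within fewer samples.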
In particular, to avoid overflow of the amplitude value of a sample when R>1, the dynamic attenuation formula of Step s202 in the present embodiment may take only the situation of R<1 into account.
In particular, to avoid attenuating a signal with little energy too fast, the synthesized signal is attenuated dynamically by the formula of Step s202 in the present embodiment only when E1 exceeds a certain threshold value.
In particular, to avoid attenuating the synthesized signal too fast, especially under continuous frame loss, an upper limit is set for the attenuation coefficient C. When C*(n+1) exceeds this limit, the attenuation coefficient is set to the upper limit.
In particular, under a bad network environment with continuous frame loss, certain conditions may be set to avoid too fast an attenuation speed. For example, when the number of lost frames exceeds an appointed number, for example two frames; or when the signal corresponding to the lost frames exceeds an appointed length, for example 20 ms; or when, under at least one of the above conditions, the current attenuation factor 1−C*(n+1) reaches an appointed threshold value, the attenuation coefficient C is adjusted so as to avoid an attenuation speed so fast that the output signal becomes silence.
For example, with a sampling frequency of 8 kHz and a frame length of 40 samples, the number of lost frames may be set to 4, and after the attenuation factor 1−C*(n+1) falls below 0.9, the attenuation coefficient C is adjusted to a smaller value. The rule for adjusting to the smaller value is as follows.
Suppose the current attenuation coefficient is C and the value of the attenuation factor is V; then the attenuation factor V attenuates to 0 after V/C samples. If the more desirable situation is that the attenuation factor V attenuates to 0 after M (M≠V/C) samples, the attenuation coefficient C is adjusted to:
C=V/M
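This adjustment rule can be sketched as a one-line helper (hypothetical name; M is the desired number of remaining samples from the surrounding text):

```python
def adjust_coefficient(V, M):
    """Given the current attenuation factor V and a desired fade-out length
    of M samples, return the new attenuation coefficient C = V / M, so that
    the factor reaches 0 after exactly M further samples."""
    assert M > 0, "fade-out length must be positive"
    return V / M
```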
As shown in
According to the method provided by the above embodiment, the self-adaptive attenuation factor is adjusted dynamically by using the change trend of the history signal, so that a smooth transition from the history data to the latest received data may be realized. The attenuation speed of the compensated signal is kept as consistent as possible with that of the original signal, adapting to the characteristics of various human voices.
An apparatus for obtaining an attenuation factor is provided in Embodiment 2 of the present invention, adapted to process the synthesized signal in packet loss concealment, including:
a change trend obtaining unit 10, adapted to obtain a change trend of a signal; and
an attenuation factor obtaining unit 20, adapted to obtain an attenuation factor, according to the change trend obtained by the change trend obtaining unit 10.
The attenuation factor obtaining unit 20 further includes: an attenuation coefficient obtaining subunit 21, adapted to generate the attenuation coefficient according to the change trend obtained by the change trend obtaining unit 10; and an attenuation factor obtaining subunit 22, adapted to obtain an attenuation factor according to the attenuation coefficient generated by the attenuation coefficient obtaining subunit 21. The attenuation factor obtaining unit 20 further includes an attenuation coefficient adjusting subunit 23, adapted to adjust the value of the attenuation coefficient obtained by the attenuation coefficient obtaining subunit 21 to a given value under given conditions, which include at least one of the following: whether the value of the attenuation coefficient exceeds an upper limit; whether there exists a situation of continuous frame loss; and whether the attenuation speed is too fast.
The method for obtaining an attenuation factor in the above embodiment is the same as the method for obtaining an attenuation factor in the method embodiments.
In detail, the change trend obtained by the change trend obtaining unit 10 may be expressed by the following parameters: (1) a ratio of the energy of the last pitch periodic signal to the energy of the previous pitch periodic signal in the signal; and (2) a ratio of the difference between the maximum amplitude value and the minimum amplitude value of the last pitch periodic signal to the difference between the maximum amplitude value and the minimum amplitude value of the previous pitch periodic signal in the signal.
When the change trend is expressed by the energy ratio in (1), the structure of the apparatus for obtaining an attenuation factor is as shown in
an energy obtaining subunit 11 adapted to obtain the energy of the last pitch periodic signal and the energy of the previous pitch periodic signal; and
an energy ratio obtaining subunit 12 adapted to obtain the ratio of the energy of the last pitch periodic signal to the energy of the previous pitch periodic signal obtained by the energy obtaining subunit 11 and use the ratio to show the change trend of the signal.
When the change trend is expressed by the amplitude difference ratio in (2), the structure of the apparatus for obtaining an attenuation factor is as shown in
an amplitude difference obtaining subunit 13, adapted to obtain the difference between the maximum amplitude value and the minimum amplitude value of the last pitch periodic signal, and the difference between the maximum amplitude value and the minimum amplitude value of the previous pitch periodic signal; and
an amplitude difference ratio obtaining subunit 14, adapted to obtain the ratio of the difference between the maximum amplitude value and the minimum amplitude value of the last pitch periodic signal to the difference between the maximum amplitude value and the minimum amplitude value of the previous pitch periodic signal, and use the ratio to show the change trend of the signal.
A schematic diagram illustrating the application scene of the apparatus for obtaining an attenuation factor, according to Embodiment 2 of the present invention is as shown in
By using the apparatus provided by the above embodiment, the self-adaptive attenuation factor is adjusted dynamically by using the change trend of the history signal, so that a smooth transition from the history data to the latest received data is realized. The attenuation speed of the compensated signal is kept as consistent as possible with that of the original signal, adapting to the characteristics of various human voices.
An apparatus for signal processing is provided in Embodiment 3 of the present invention, adapted to process the synthesized signal in packet loss concealment, as shown in
By using the apparatus provided by the above embodiment, the self-adaptive attenuation factor is adjusted dynamically by using the change trend of the history signal, and a lost frame reconstructed after attenuating is obtained according to the attenuation factor, so that a smooth transition from the history data to the latest received data is realized. The attenuation speed of the compensated signal is kept as consistent as possible with that of the original signal, adapting to the characteristics of various human voices.
A voice decoder is provided by Embodiment 4 of the present invention, as shown in
For the low band decoding unit 50, as shown in
The low band decoding subunit 52 decodes the received low band stream signal. The LPC based on pitch repetition subunit 51 generates the synthesized signal by executing LPC on the lost low band signal. Finally, the cross-fading subunit 53 cross-fades the signal processed by the low band decoding subunit 52 with the synthesized signal, to obtain the final decoded signal after lost frame compensation.
The LPC based on pitch repetition subunit 51, as shown in
The signal processing module 512 further includes an attenuation factor obtaining unit 5121 and a lost frame reconstructing unit 5122. The attenuation factor obtaining unit 5121 obtains a change trend of a signal, and obtains an attenuation factor, according to the change trend; the lost frame reconstructing unit 5122 attenuates the reconstructed lost frame signal according to the attenuation factor, and obtains a lost frame reconstructed after attenuating. The signal processing module 512 includes two structures, corresponding to schematic diagrams illustrating the structure of the apparatus for signal processing in
The attenuation factor obtaining unit 5121 includes two structures, corresponding to schematic diagrams illustrating the structure of the apparatus for obtaining an attenuation factor in
Through the description of the above embodiments, those skilled in the art may clearly understand that the present invention may be realized by software plus a necessary general hardware platform, and certainly may also be realized by hardware; in most situations, however, the former is the preferable embodiment. Based on such understanding, the essence of the technical scheme of the present invention, or the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, the software product including instructions for instructing a device to execute the embodiments of the present invention.
Though illustration and description of the present disclosure have been given with reference to embodiments thereof, it should be appreciated by persons of ordinary skill in the art that various changes in forms and details can be made without deviation from the scope of this disclosure.
Foreign Application Priority Data: 200710169618.0, Nov. 2007, CN (national).
This application is a continuation of U.S. patent application Ser. No. 12/264,593, filed Nov. 4, 2008, which claims priority to Chinese Patent Application No. 200710169618.0, filed Nov. 5, 2007, both of which are hereby incorporated by reference in their entirety.
Publication: US 20090316598 A1, Dec. 2009. Related U.S. Application Data: parent application 12/264,593, filed Nov. 2008 (US); child application 12/556,048 (US).