The present invention relates to the field of audio coding, more specifically to the field of synthesizing an audio signal. Embodiments relate to speech coding, particularly to the speech coding technique called code excited linear predictive coding (CELP). Embodiments provide an approach for adaptive tilt compensation in shaping the codes of an innovative or fixed codebook of a CELP coder.
The CELP coding scheme is widely used in speech communications and is an efficient way of coding speech. CELP synthesizes an audio signal by conveying to a linear predictive filter (e.g., LPC synthesis filter 1/A(z)) the sum of two excitations. One excitation is coming from the decoded past, which is called the adaptive codebook, and the other contribution is coming from a fixed or innovative codebook which is populated by fixed codes. One problem with the CELP coding scheme is that at low bit-rates the innovative codebook is not populated enough for modeling efficiently the fine structure of speech so that the perceptual quality is degraded and the synthesized output signal sounds noisy.
For mitigating coding artifacts, different solutions have already been proposed and are described in reference [1] and in reference [2]. In these references, the codes of the innovative codebook are adaptively and spectrally shaped by enhancing the spectral regions corresponding to the formants of the current frame of the audio signal. The formant positions and shapes can be deduced directly from the LPC coefficients, which are available at both the encoder and the decoder. The formant enhancement of the codes c(n) of the innovative codebook is done by a simple filtering operation:
c(n)*fe(n).
In this filtering process fe(n) is the impulse response of the filter having the following transfer function:
$$F_e(z) = \frac{A(z/w_1)}{A(z/w_2)},$$
where w1 and w2 are two weighting constants emphasizing more or less the formantic structure of the transfer function Fe(z). The resulting shaped codes of the innovative codebook inherit one characteristic of the speech signal and the synthesized signal sounds less noisy.
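The shaping operation c(n)*fe(n) can be illustrated with a minimal sketch. It assumes that the formant-enhancement filter has the form Fe(z) = A(z/w1)/A(z/w2), where A(z/w) is obtained by scaling the k-th LPC coefficient by w^k. The function names, the first-order LPC polynomial, the one-pulse code and the values w1=0.75 and w2=0.9 are hypothetical and chosen for illustration only.

```python
def bandwidth_expand(lpc, w):
    """Coefficients of A(z/w): the k-th coefficient of A(z) scaled by w**k."""
    return [a_k * (w ** k) for k, a_k in enumerate(lpc)]


def pole_zero_filter(x, num, den):
    """Filter x with H(z) = num(z) / den(z); den[0] must be 1, zero initial state."""
    y = []
    for n in range(len(x)):
        acc = sum(num[k] * x[n - k] for k in range(len(num)) if n - k >= 0)
        acc -= sum(den[k] * y[n - k] for k in range(1, len(den)) if n - k >= 0)
        y.append(acc)
    return y


def formant_enhance(code, lpc, w1, w2):
    """Shape a code c(n) by c(n) * fe(n), ASSUMING Fe(z) = A(z/w1) / A(z/w2)."""
    return pole_zero_filter(code, bandwidth_expand(lpc, w1), bandwidth_expand(lpc, w2))


# Hypothetical first-order LPC polynomial A(z) = 1 - 0.9 z^-1 and a one-pulse code.
lpc = [1.0, -0.9]
code = [1.0, 0.0, 0.0, 0.0]
shaped = formant_enhance(code, lpc, w1=0.75, w2=0.9)
```

Under this assumption, equal weights (w1=w2) cancel numerator and denominator, so the code passes through unchanged, whereas w1<w2 spreads the pulse along a decaying response that follows a smoothed version of the LPC envelope.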
In the CELP coding scheme it is also usual to add a spectral tilt to the codes of the innovative code book, which is done by filtering the codes from the innovative codebook as follows:
$$F_t(z) = 1 - \beta z^{-1}.$$
The factor β is related to the voicing of the previous audio frame, and the voicing can be estimated from the energy contribution from the adaptive codebook. For example, if the previous frame is voiced, it is expected that the current frame will also be voiced and that the codes will have more energy in the low frequencies, i.e. the spectrum has a negative tilt.
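As a minimal sketch, the first-order tilt filter 1 − βz^−1 amounts to subtracting a scaled copy of the previous code sample from the current one. The function name `apply_tilt` and the example values are hypothetical; the filter state is assumed to be zero at the start of the code.

```python
def apply_tilt(code, factor):
    """Filter code with F(z) = 1 - factor * z^-1, assuming zero initial state."""
    return [
        c - factor * (code[n - 1] if n > 0 else 0.0)
        for n, c in enumerate(code)
    ]


# Hypothetical constant (purely low-frequency) code and an illustrative factor.
beta = 0.3
tilted = apply_tilt([1.0, 1.0, 1.0, 1.0], beta)
```

For a positive factor the filter has gain 1 − β at zero frequency and 1 + β at the Nyquist frequency, so the constant input above is scaled to 1 − β after the first sample.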
According to an embodiment, an apparatus for synthesizing an audio signal may have: a processing unit configured to apply a spectral tilt to the code of a codebook used for synthesizing a current frame of the audio signal, wherein the spectral tilt is based on the spectral tilt of the current frame of the audio signal.
According to another embodiment, an audio decoder may have an inventive apparatus for synthesizing an audio signal.
Another embodiment may have an audio decoder for decoding an audio signal, wherein the audio decoder is configured to apply a spectral tilt to the code of a codebook used for synthesizing a current frame of the audio signal, wherein the spectral tilt is based on the spectral tilt of the current frame of the audio signal.
Another embodiment may have an audio encoder for encoding an audio signal, wherein the audio encoder is configured to determine from a spectral tilt of a current frame of the audio signal a spectral tilt for a code of a codebook representing a current frame of the audio signal.
According to another embodiment, a system may have: an inventive audio decoder and an inventive audio encoder.
According to another embodiment, a method for synthesizing an audio signal may have the steps of: applying a spectral tilt to the code of a codebook used for synthesizing a current frame of the audio signal, wherein the spectral tilt is determined on the basis of the spectral tilt of the current frame of the audio signal.
Another embodiment may have a non-transitory digital storage medium having a computer program stored thereon to perform the inventive method when said computer program is run by a computer.
The present invention provides an apparatus for synthesizing an audio signal which comprises a processing unit configured to apply a spectral tilt to the code of a codebook used for synthesizing a current frame of the audio signal, wherein the spectral tilt is based on the spectral tilt of the current frame of the audio signal.
The present invention provides a method for synthesizing an audio signal, the method comprising applying a spectral tilt to the code of a codebook used for synthesizing a current frame of the audio signal, wherein the spectral tilt is determined on the basis of the spectral tilt of the current frame of the audio signal.
The inventors of the present application found out that the synthesizing of an audio signal can be further improved both at low and higher bit-rates by exploiting the nature of the spectral tilt of the audio signal upon synthesizing the signal for improving the achievable coding gain. In accordance with embodiments, the present invention provides for a speech coding, for example using the CELP speech coding technique, which allows enhancing the coding gain of CELP, thereby enhancing the perceptual quality of the decoded or synthesized signal. The inventive approach is based on the inventors' finding that this improvement can be achieved by adapting the spectral tilt of the codes of a codebook, for example the codes of the CELP innovative codebook, as a function of the spectral tilt of the actual input signal currently processed. The inventive approach is advantageous as, in addition to the enhanced coding gain, at low bit-rates, where the innovative codebook is not populated enough for modeling efficiently the fine structure of the speech, it also allows for a further formant enhancement. At higher bit-rates, where the innovative codebook is sufficiently populated, applying the inventive approach will enhance the coding gain. More specifically, at higher bit-rates the formant enhancement may not be needed, as the innovative codebook is large enough for modeling properly the fine structure of the speech, and further enhancing the formant will make the synthesized signal sound too synthetic. However, the optimal codes are not spectrally flat and adding a spectral tilt will enhance the coding gain. In accordance with embodiments the optimal tilt to apply to the codes of the innovative codebook is estimated more accurately, more specifically it is correlated to the tilt of the current frame of the input signal.
In accordance with embodiments the spectral tilt of the current frame of the audio signal is determined on the basis of spectral envelope information for the current frame of the audio signal, wherein the spectral envelope information may be defined by LPC coefficients. This embodiment is advantageous as it allows determining the spectral tilt of the current frame on the basis of information readily available both at the encoder and the decoder, namely the LPC coefficients.
In accordance with further embodiments the spectral tilt of the current frame of the audio signal, on the basis of the LPC coefficients, may be determined on the basis of a truncated infinite impulse response of the LPC synthesis filter. In accordance with embodiments, the truncation may be determined by the size of the innovative codebook, i.e. by the number of codes in the innovative codebook. This approach is advantageous as it directly relates the determination of the spectral tilt to the actual size of the innovative codebook.
In accordance with further embodiments, the infinite impulse response may be of a LPC synthesis filter having a non-weighted transfer function or a weighted transfer function. Using the non-weighted transfer function allows for a simplified determination of the spectral tilt, while using the weighted transfer function is advantageous as it allows for a spectral tilt having a slope closer to the optimal tilt.
In accordance with embodiments, the determined spectral tilt is applied to the respective code by filtering the code from the codebook based on a transfer function which includes the spectral tilt. This embodiment is advantageous as by a simple filtering process the enhancement can be achieved.
In accordance with yet another embodiment the spectral tilt of the current frame may be combined with a factor related to the voicing of the previous frame of the audio signal, for example by filtering the code from the codebook based on a transfer function including the spectral tilt and the factor. This approach is advantageous as it provides for a possibility to obtain an even better estimate of the optimal tilt.
The present invention provides an audio decoder comprising the inventive apparatus for synthesizing an audio signal.
The present invention provides an audio decoder for decoding an audio signal, wherein the audio decoder is configured to apply a spectral tilt to the code of a codebook used for synthesizing a current frame of the audio signal, wherein the spectral tilt is based on the spectral tilt of the current frame of the audio signal.
The present invention provides an audio encoder for encoding an audio signal, wherein the audio encoder is configured to determine from a spectral tilt of a current frame of the audio signal a spectral tilt for a code of a codebook representing a current frame of the audio signal.
The present invention provides a system, comprising the inventive audio decoder and the inventive audio encoder.
The present invention provides a non-transitory computer medium storing instructions to carry out, when run on a computer, the inventive method for synthesizing an audio signal.
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
In the following, embodiments of the inventive approach will be described. It is noted that in the subsequent description similar elements/steps are referred to by the same reference signs.
In accordance with further embodiments, an adaptive tilt compensation for shaping codes of a CELP innovative codebook will be described.
The synthesizer 200 includes the filter 218 that is connected between the fixed codebook 202 and the first amplifier 212. The filter 218 receives from the storage 216 the LPC coefficients for the current frame. By means of the inventive structure the tilt of the audio frame that is currently processed is recovered from the already transmitted LPC coefficients that are stored in storage 216. In accordance with the embodiment of
where N is the size of the truncation of the infinite impulse response fS(n). In accordance with an embodiment, N is equal to the size of the innovative codebook, i.e. N is equal to the number of codes or codewords stored in the innovative codebook. The spectral tilt is applied, in accordance with the embodiment of
c(n)*ft1(n),
where ft1(n) is the impulse response of the following transfer function:
$$F_{t1}(z) = 1 - \gamma z^{-1}.$$
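The following sketch illustrates one way of deriving a tilt factor γ from the LPC coefficients and applying it to a code. It rests on an assumption that should be checked against the formula of the specification: γ is taken here as the lag-1 autocorrelation of the truncated impulse response fS(n), n = 0..N−1, of the synthesis filter 1/A(z), normalized by its energy, which keeps γ between −1 and 1. All function names, the first-order LPC polynomial and N = 64 are hypothetical.

```python
def lpc_impulse_response(lpc, n_samples):
    """First n_samples of the impulse response of 1/A(z), lpc = [1, a1, ..., ap]."""
    h = []
    for n in range(n_samples):
        acc = 1.0 if n == 0 else 0.0
        acc -= sum(lpc[k] * h[n - k] for k in range(1, len(lpc)) if n - k >= 0)
        h.append(acc)
    return h


def spectral_tilt(h):
    """ASSUMPTION: tilt = lag-1 autocorrelation of h divided by the energy of h."""
    energy = sum(v * v for v in h)
    if energy == 0.0:
        return 0.0
    return sum(h[n] * h[n - 1] for n in range(1, len(h))) / energy


def apply_tilt(code, factor):
    """Filter code with 1 - factor * z^-1, assuming zero initial state."""
    return [c - factor * (code[n - 1] if n > 0 else 0.0) for n, c in enumerate(code)]


# Hypothetical LPC polynomial A(z) = 1 - 0.9 z^-1 and truncation length N = 64.
lpc = [1.0, -0.9]
N = 64
gamma = spectral_tilt(lpc_impulse_response(lpc, N))
shaped = apply_tilt([1.0, 0.0, 0.0, 0.0], gamma)
```

Because only the LPC coefficients enter the computation, the same γ can be derived at the encoder and at the decoder without transmitting additional side information.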
The embodiment of
In accordance with a third embodiment, for further improving the spectral tilt to be closer to an optimal tilt, i.e. to be closer to the actual tilt of the current frame of the input signal, the LPC synthesis filter 208 has the following transfer function:
$$F_e(z) = \frac{A(z/w_1)}{A(z/w_2)},$$
with w1=0.8 and w2=0.9. In this case, the spectral tilt is defined as follows:
The weighting constants w1 and w2 are used to control the dynamic of the spectral envelope. For example, if w1=0 and w2=1, then Fe(z) follows quite closely the true signal envelope. The resulting spectral tilt γ will show a high dynamic and may fluctuate too much. This may be a solution for very low bit-rates, where the codebook definitely lacks tilt structure. However, it was found that perceptually it is better to deduce the spectral tilt γ from a smoothed version of the spectral envelope. Good smoothing was found to be achieved with the above values w1=0.8 and w2=0.9, which provide a good trade-off for a large range of bit-rates. In accordance with embodiments, w1 and w2 may be bit-rate dependent. At very high rates, if the codebook is large enough and is able to model any spectral tilt γ, one may switch off the influence of the spectral tilt γ by setting w1=w2=1.
When compared to the second embodiment, which yields a tilt having a steeper slope than the optimal tilt would have, the third embodiment using the “weighted” transfer function provides for a tilt that is closer to the actual tilt of the current frame.
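The effect of the weighting constants can be sketched numerically. The sketch assumes the weighted transfer function Fe(z) = A(z/w1)/A(z/w2) and, as a further assumption, takes γ as the energy-normalized lag-1 autocorrelation of the truncated impulse response. Function names, the first-order LPC polynomial and the truncation length are hypothetical.

```python
def bandwidth_expand(lpc, w):
    """Coefficients of A(z/w): the k-th coefficient of A(z) scaled by w**k."""
    return [a_k * (w ** k) for k, a_k in enumerate(lpc)]


def weighted_impulse_response(lpc, w1, w2, n_samples):
    """Truncated impulse response of A(z/w1) / A(z/w2) (assumed form of Fe(z))."""
    num = bandwidth_expand(lpc, w1)
    den = bandwidth_expand(lpc, w2)
    h = []
    for n in range(n_samples):
        acc = num[n] if n < len(num) else 0.0
        acc -= sum(den[k] * h[n - k] for k in range(1, len(den)) if n - k >= 0)
        h.append(acc)
    return h


def spectral_tilt(h):
    """ASSUMPTION: tilt = lag-1 autocorrelation of h divided by the energy of h."""
    energy = sum(v * v for v in h)
    if energy == 0.0:
        return 0.0
    return sum(h[n] * h[n - 1] for n in range(1, len(h))) / energy


# Hypothetical LPC polynomial A(z) = 1 - 0.9 z^-1, truncated to 64 samples.
lpc = [1.0, -0.9]
n = 64
unweighted = spectral_tilt(weighted_impulse_response(lpc, 0.0, 1.0, n))
smoothed = spectral_tilt(weighted_impulse_response(lpc, 0.8, 0.9, n))
disabled = spectral_tilt(weighted_impulse_response(lpc, 1.0, 1.0, n))
```

For this example the unweighted setting (w1=0, w2=1) gives a tilt close to 0.9, the smoothed setting (w1=0.8, w2=0.9) gives a much gentler tilt of roughly 0.11, and w1=w2=1 gives a tilt of zero, which matches the described way of switching off the influence of γ.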
$$F_{t2}(z) = 1 - (a \cdot \beta + b \cdot \gamma)\, z^{-1}$$
where a and b are constants. In an advantageous embodiment a=0.5 and b=0.25. The factor β may be deduced from the voicing of a previous frame as follows:
and the actual factor β may be determined as follows:
β=constant·(1+voicing)
The constants a and b are applied to control the mixture of the voicing tilt β and the spectral tilt γ. As mentioned above with regard to the weighting constants w1 and w2, for low and medium bit-rates, it may be relevant to shape the codebook by sharpening low frequencies or high frequencies based on the spectral tilt γ. It was also observed that the more the signal is voiced, the better it is to sharpen the high frequencies. The constants a and b may be used to normalize the tilt factors β and γ and weigh their strengths in order to combine the two effects as desired. In accordance with embodiments, the constants a and b may be found empirically by assessing the perceptual quality. With a=0.5 and b=0.25, both factors have about the same strength: γ is bounded between −1 and 1, so b·γ lies between −0.25 and 0.25, and β is bounded between 0 and 0.5, so a·β lies between 0 and 0.25. As for the weighting constants w1 and w2, the constants a and b may also be made bit-rate dependent.
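The combination of the two factors can be sketched as follows. The voicing measure is taken as an input in the range −1 to 1, and the constant in β = constant·(1 + voicing) is assumed to be 0.25 so that β stays within the stated bounds of 0 and 0.5; both points are assumptions of this sketch. The function names are hypothetical.

```python
def voicing_factor(voicing, constant=0.25):
    """beta = constant * (1 + voicing); constant = 0.25 is an ASSUMPTION that
    yields 0 <= beta <= 0.5 for a voicing measure in [-1, 1]."""
    return constant * (1.0 + voicing)


def combined_tilt_factor(beta, gamma, a=0.5, b=0.25):
    """Mix the voicing tilt beta and the spectral tilt gamma as a*beta + b*gamma."""
    return a * beta + b * gamma


def apply_tilt(code, factor):
    """Filter code with 1 - factor * z^-1, assuming zero initial state."""
    return [c - factor * (code[n - 1] if n > 0 else 0.0) for n, c in enumerate(code)]


# Extreme cases showing the range of the combined factor.
strongest = combined_tilt_factor(voicing_factor(1.0), gamma=1.0)
weakest = combined_tilt_factor(voicing_factor(-1.0), gamma=-1.0)

# Hypothetical one-pulse code shaped with a mid-range voicing and spectral tilt.
factor = combined_tilt_factor(voicing_factor(0.0), gamma=0.4)
shaped = apply_tilt([1.0, 0.0, 0.0], factor)
```

With these values the combined factor stays between −0.25 and 0.5, and each term contributes a range of the same width as the other's positive part, which is the balancing effect described for a=0.5 and b=0.25.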
In accordance with the fourth embodiment, the audio synthesis as shown in
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a non-transitory storage medium such as a digital storage medium, for example a floppy disc, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive method is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example, via the internet.
A further embodiment comprises a processing means, for example, a computer or a programmable logic device, configured to, or programmed to, perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
This application is a continuation of copending U.S. application Ser. No. 14/811,386, filed Jul. 28, 2015, which in turn is a continuation of International Application No. PCT/EP2014/051592, filed Jan. 28, 2014, which is incorporated herein by reference in its entirety, and additionally claims priority from U.S. Application No. 61/758,098, filed Jan. 29, 2013, which is also incorporated herein by reference in its entirety.
Number | Name | Date | Kind |
---|---|---|---|
5664055 | Kroon | Sep 1997 | A |
5915234 | Itoh | Jun 1999 | A |
6134518 | Cohen et al. | Oct 2000 | A |
6240386 | Thyssen et al. | May 2001 | B1 |
6385573 | Gao et al. | May 2002 | B1 |
6678651 | Gao et al. | Jan 2004 | B2 |
6678652 | Fuchigami et al. | Jan 2004 | B2 |
6996523 | Bhaskar et al. | Feb 2006 | B1 |
7092889 | Fuchigami et al. | Aug 2006 | B2 |
7881482 | Christoph | Feb 2011 | B2 |
8260611 | Vos et al. | Sep 2012 | B2 |
9373342 | Balam et al. | Jun 2016 | B2 |
9646624 | Disch | May 2017 | B2 |
9672843 | Krishnaswamy | Jun 2017 | B2 |
9706314 | Jenison | Jul 2017 | B2 |
9779747 | Wang | Oct 2017 | B2 |
9792920 | Disch | Oct 2017 | B2 |
9812143 | Liu | Nov 2017 | B2 |
10043525 | Kaniewska | Aug 2018 | B2 |
10121484 | Liu | Nov 2018 | B2 |
10269365 | Fuchs | Apr 2019 | B2 |
10304470 | Fuchs | May 2019 | B2 |
10373625 | Fuchs | Aug 2019 | B2 |
10431232 | Fuchs | Oct 2019 | B2 |
10607619 | Fuchs | Mar 2020 | B2 |
10909997 | Fuchs | Feb 2021 | B2 |
20030004710 | Gao et al. | Jan 2003 | A1 |
20050108007 | Bessette et al. | May 2005 | A1 |
20060089836 | Boillot et al. | Apr 2006 | A1 |
20060277042 | Vos et al. | Dec 2006 | A1 |
20080027716 | Rajendran et al. | Jan 2008 | A1 |
20090070106 | Gao | Mar 2009 | A1 |
20090177869 | Novichkov et al. | Jul 2009 | A1 |
20090265167 | Ehara | Oct 2009 | A1 |
20110099018 | Neuendorf et al. | Apr 2011 | A1 |
20110295598 | Yang et al. | Dec 2011 | A1 |
20140236588 | Subasingha et al. | Aug 2014 | A1 |
20150332696 | Fuchs et al. | Nov 2015 | A1 |
20150348562 | Krishnaswamy et al. | Dec 2015 | A1 |
20150371653 | Balam et al. | Dec 2015 | A1 |
20160343382 | Liu | Nov 2016 | A1 |
20170301361 | Liu | Oct 2017 | A1 |
20190333529 | Fuchs | Oct 2019 | A1 |
20210098010 | Fuchs | Apr 2021 | A1 |
Number | Date | Country |
---|---|---|
1468427 | Jan 2004 | CN |
101199004 | Jun 2008 | CN |
101836253 | Sep 2010 | CN |
2002523806 | Jul 2002 | JP |
2002528983 | Sep 2002 | JP |
2012042984 | Mar 2012 | JP |
2439721 | Jan 2012 | RU |
2469422 | Dec 2012 | RU |
200705823 | Feb 2007 | TW |
I383595 | Jan 2013 | TW |
0111655 | Feb 2001 | WO |
0191112 | Nov 2001 | WO |
03097258 | Nov 2003 | WO |
2006116025 | Nov 2006 | WO |
2011048094 | Apr 2011 | WO |
2011127569 | Oct 2011 | WO |
2011148230 | Dec 2011 | WO |
Entry |
---|
“ITU-T20080630”, Telecommunication Standardization (Category A). |
ETSI TS 126 090, “Mandatory Speech Codec Speech Processing Functions”, Universal Mobile Telecommunications System, 1999, pp. 1-62. |
ITU-T, G.718, “Frame error robust narrow-band and wideband embedded variable bit-rate coding of speech and audio from 8-32 kbit/s”, Recommendation ITU-T G.718, Jun. 2008, 257 pages. |
Valin, JM et al., “Definition of the Opus Audio Codec”, IETF, Sep. 2012, pp. 1-326. |
Number | Date | Country | |
---|---|---|---|
20190378528 A1 | Dec 2019 | US |
Number | Date | Country | |
---|---|---|---|
61758098 | Jan 2013 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 14811386 | Jul 2015 | US |
Child | 16549878 | US | |
Parent | PCT/EP2014/051592 | Jan 2014 | US |
Child | 14811386 | US |