The present disclosure relates generally to audio processing, and more particularly, to switching audio encoder modes.
The audible frequency range (the frequency of periodic vibration audible to the human ear) is from about 50 Hz to about 22 kHz, but hearing degenerates with age and most adults find it difficult to hear above about 14-15 kHz. Most of the energy of human speech signals is generally limited to the range from 250 Hz to 3.4 kHz. Thus, traditional voice transmission systems were limited to this range of frequencies, often referred to as the “narrowband.” However, to allow for better sound quality, to make it easier for listeners to recognize voices, and to enable listeners to distinguish those speech elements that require forcing air through a narrow channel, known as “fricatives” (‘s’ and ‘f’ being examples), newer systems have extended this range to about 50 Hz to 7 kHz. This larger range of frequencies is often referred to as “wideband” (WB) or sometimes HD (High Definition)-Voice.
The frequencies higher than the WB range, from about 7 kHz to about 15 kHz, are referred to herein as the Bandwidth Extension (BWE) region. The total range of sound frequencies from about 50 Hz to about 15 kHz is referred to as “superwideband” (SWB). In the BWE region, the human ear is not particularly sensitive to the phase of sound signals. It is, however, sensitive to the regularity of sound harmonics and to the presence and distribution of energy. Thus, processing BWE sound helps the speech sound more natural and also provides a sense of “presence.”
An embodiment of the invention is directed to a hybrid encoder. When audio input received by the encoder changes from music-like sounds (e.g., music) to speech-like sounds (e.g., human speech), the encoder switches from a first mode (e.g., a music mode) to a second mode (e.g., a speech mode). In an embodiment of the invention, when the encoder operates in the first mode, it employs a first coder (e.g., a frequency domain coder, such as a harmonic-based sinusoidal-type coder). When the encoder switches to the second mode, it employs a second coder (e.g., a time domain or waveform coder, such as a CELP coder). This switch from the first coder to the second coder may cause delays in the encoding process, resulting in a gap in the encoded signal. To compensate, the encoder backfills the gap with a portion of the audio signal that occurs after the gap.
In a related embodiment of the invention, the second coder includes a BWE coding portion and a core coding portion. The core coding portion may operate at different sample rates, depending on the bit rate at which the encoder operates. For example, there may be advantages to using lower sample rates (e.g., when the encoder operates at lower bit rates), and advantages to using higher sample rates (e.g., when the encoder operates at higher bit rates). The sample rate of the core portion determines the lowest frequency of the BWE coding portion. However, when the switch from the first coder to the second coder occurs, there may be uncertainty about the sample rate at which the core coding portion should operate. Until the core sample rate is known, the processing chain of the BWE coding portion may not be configurable, which delays that processing chain. As a result of this delay, a gap is created in the BWE region of the signal during processing (referred to as the “BWE target signal”). To compensate, the encoder backfills the BWE target signal gap with a portion of the audio signal that occurs after the gap.
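As a rough illustration of why the BWE processing chain cannot be configured before the core sample rate is known, the following sketch derives a BWE band from a candidate core rate. It assumes, purely for illustration, that the BWE band begins at the Nyquist frequency of the core coder (half the core sample rate) and ends at a fixed upper edge. The function name, the candidate rates, and the default upper edge are hypothetical and are not taken from the disclosure.

```python
def bwe_band_for_core_rate(core_rate_hz, upper_edge_hz=16000.0):
    """Return (low_hz, high_hz) of the BWE band for a given core sample rate.

    Assumption (not from the disclosure): the BWE band starts where the
    band representable by the core coder ends, i.e. at its Nyquist frequency.
    """
    if core_rate_hz <= 0:
        raise ValueError("core sample rate must be positive")
    low_hz = core_rate_hz / 2.0
    if low_hz >= upper_edge_hz:
        raise ValueError("core band already covers the assumed upper edge")
    return (low_hz, upper_edge_hz)


# Hypothetical candidate core rates: the BWE band edges differ for each one,
# so the band cannot be fixed until the core rate has been decided.
for rate in (12800, 16000):
    print(rate, bwe_band_for_core_rate(rate))
```

Under these assumptions, each candidate rate yields a different lower band edge, which is one way to picture the dependency described above.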
In another embodiment of the invention, an audio signal switches from a first type of signal (such as a music or music-like signal), which is coded by a first coder (such as a frequency domain coder) to a second type of signal (such as a speech or speech-like signal), which is processed by a second coder (such as a time domain or waveform coder). The switch occurs at a first time. A gap in the processed audio signal has a time span that begins at or after the first time and ends at a second time. A portion of the processed audio signal, occurring at or after the second time, is copied and inserted into the gap, possibly after functions are performed on the copied portion (such as time-reversing, sine windowing, and/or cosine windowing).
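The copy-and-insert operation described above may be sketched as follows. This is a minimal illustration and not the claimed implementation: the function name, the representation of the signal as a list of samples, and the particular sine window are assumptions. The gap spans the sample indices from `gap_start` up to, but not including, `gap_end`.

```python
import math


def backfill_gap(signal, gap_start, gap_end, time_reverse=True, sine_window=False):
    """Fill signal[gap_start:gap_end] with a copy of the samples that follow it.

    Returns a new list; the input is not modified.
    """
    gap_len = gap_end - gap_start
    if gap_len <= 0:
        return list(signal)
    if gap_end + gap_len > len(signal):
        raise ValueError("not enough samples after the gap to copy from")

    # Copy the portion that occurs at or after the end of the gap.
    copied = list(signal[gap_end:gap_end + gap_len])

    # Optional time reversal: the sample nearest the end of the gap stays
    # nearest to it, so the fill meets the real signal without a jump.
    if time_reverse:
        copied.reverse()

    # Optional sine window (an assumed shape): fades the fill in from the
    # start of the gap toward full amplitude at its end.
    if sine_window:
        copied = [
            s * math.sin(0.5 * math.pi * (n + 0.5) / gap_len)
            for n, s in enumerate(copied)
        ]

    out = list(signal)
    out[gap_start:gap_end] = copied
    return out
```

For example, with the samples `[9, 0, 0, 0, 1, 2, 3, 5]` and a gap covering indices 1 through 3, the time-reversed fill mirrors the samples `1, 2, 3` back into the gap.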
The previously-described embodiments may be performed by a communication device, in which an input interface (e.g., a microphone) receives the audio signal, a speech-music detector determines that the switch from music-like to speech-like audio has occurred, and a missing signal generator backfills the gap in the BWE target signal. The various operations may be performed by a processor (e.g., a digital signal processor or DSP) in combination with a memory (including, for example, a look-ahead buffer).
In the description that follows, it is to be noted that the components shown in the drawings, as well as labeled paths, are intended to indicate how signals generally flow and are processed in various embodiments. The line connections do not necessarily correspond to the discrete physical paths, and the blocks do not necessarily correspond to discrete physical components. The components may be implemented as hardware or as software. Furthermore, the use of the term “coupled” does not necessarily imply a physical connection between components, and may describe relationships between components in which there are intermediate components. It merely describes the ability of components to communicate with one another, either physically or via software constructs (e.g., data structures, objects, etc.).
Turning to the drawings, an example of a network in which an embodiment of the invention operates will now be described.
The communication device 104 may include a transceiver 240, which is capable of sending and receiving data over the network 102. The communication device may include a controller/processor 210 that executes stored programs, such as an encoder 222. Various embodiments of the invention are carried out by the encoder 222. The communication device may also include a memory 220, which is used by the controller/processor 210. The memory 220 stores the encoder 222 and may further include a look-ahead buffer 221, whose purpose will be described below in more detail. The communication device may include a user input/output interface 250 that may comprise elements such as a keypad, display, touch screen, microphone, earphone, and speaker. The communication device also may include a network interface 260 to which additional elements may be attached, for example, a universal serial bus (USB) interface. Finally, the communication device may include a database interface 230 that allows the communication device to access various stored data structures relating to the configuration of the communication device.
According to an embodiment of the invention, the input/output interface 250 (e.g., a microphone thereof) detects audio signals. The encoder 222 encodes the audio signals. In doing so, the encoder employs a technique known as “look-ahead” to encode speech signals. Using look-ahead, the encoder 222 examines a small amount of speech in the future of the current speech frame it is encoding in order to determine what is coming after the frame. The encoder stores a portion of the future speech signal in the look-ahead buffer 221.
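The look-ahead technique may be pictured with the following sketch, in which each frame is paired with the samples that immediately follow it. The function name, the frame length, and the look-ahead length are hypothetical, and the sketch does not represent the actual layout of the look-ahead buffer 221.

```python
def frames_with_lookahead(samples, frame_len, lookahead_len):
    """Yield (current_frame, lookahead) pairs over a list of samples.

    `lookahead` holds up to `lookahead_len` samples that follow the current
    frame; it is shorter (possibly empty) near the end of the input.
    """
    if frame_len <= 0 or lookahead_len < 0:
        raise ValueError("frame_len must be positive and lookahead_len non-negative")
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        end = start + frame_len
        yield samples[start:end], samples[end:end + lookahead_len]


# Ten samples, frames of four samples, two samples of look-ahead.
for frame, ahead in frames_with_lookahead(list(range(10)), 4, 2):
    print(frame, ahead)
```

An encoder operating this way can inspect the look-ahead samples while coding the current frame, at the cost of a delay equal to the look-ahead length.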
Referring to the block diagram of
The second coder 300b may be characterized as having a high-band portion, which outputs a BWE excitation signal (from about 7 kHz to about 16 kHz) over paths O and P, and a low-band portion, which outputs a WB excitation signal (from about 50 Hz to about 7 kHz) over path N. It is to be understood that this grouping is for convenient reference only. As will be discussed, the high-band portion and the low-band portion interact with one another.
The high-band portion includes a bandpass filter 301, a spectral flip and down mixer 307 coupled to the bandpass filter 301, a decimator 311 coupled to the spectral flip and down mixer 307, a missing signal generator 311a coupled to the decimator 311, and a Linear Predictive Coding (LPC) analyzer 314 coupled to the missing signal generator 311a. The high-band portion further includes a first quantizer 318 coupled to the LPC analyzer 314. The LPC analyzer may be, for example, a 10th order LPC analyzer.
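For readers unfamiliar with LPC analysis, the following sketch shows one common way to obtain a 10th order set of LPC coefficients: the autocorrelation method solved with the Levinson-Durbin recursion. The disclosure does not state which method the LPC analyzer 314 uses, so the choice of method, the function names, and the sign convention are assumptions made here for illustration.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations with the Levinson-Durbin recursion.

    Returns (a, err), where a = [1, a1, ..., a_order] are the coefficients of
    the prediction-error filter A(z) = 1 + a1*z^-1 + ... + a_order*z^-order
    and err is the remaining prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    if err <= 0.0:
        return a, 0.0  # silent input: nothing to predict
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this stage
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
        if err <= 0.0:
            break  # perfectly predictable input: stop early
    return a, err


def lpc(x, order=10):
    """LPC coefficients of x by the autocorrelation method."""
    return levinson_durbin(autocorrelation(x, order), order)


# A decaying exponential is the impulse response of a one-pole filter, so the
# analysis should place nearly all of the prediction in the first coefficient.
coeffs, residual_energy = lpc([0.9 ** n for n in range(200)], order=10)
print(coeffs[1], residual_energy)
```

The all-pole synthesis filter corresponding to these coefficients is 1/A(z), which is the general form of filter used later in the description to impose a spectral envelope on an excitation signal.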
Referring still to
The low-band portion includes an interpolator 304, a decimator 305, and a Code-Excited Linear Prediction (CELP) core codec 310. The interpolator 304 and the decimator 305 are both coupled to the CELP core codec 310.
The operation of the encoder 222 according to an embodiment of the invention will now be described. The speech/music detector 300 receives audio input (such as from a microphone of the input/output interface 250 of
The operation of the high-band portion of the second coder 300b will now be described with reference to
The missing signal generator 311a fills the gap in the BWE target signal that results from the encoder 222 switching between the first coder 300a and the CELP-type encoder 300b. This gap-filling process will be described in more detail with respect to
Referring still to
Referring again to
The total of the stochastic and adaptive components (path D) is also provided to the squaring circuit 306. The squaring circuit 306 generates strong harmonics of the core CELP signal to form a bandwidth-extended high-band excitation signal, which is provided to the mixer 309. The Gaussian generator 308 generates a shaped Gaussian noise signal, whose energy envelope matches that of the bandwidth-extended high-band excitation signal that was output from the squaring circuit 306. The mixer 309 receives the noise signal from the Gaussian generator 308 and the bandwidth-extended high-band excitation signal from the squaring circuit 306 and replaces a portion of the bandwidth-extended high-band excitation signal with the shaped Gaussian noise signal. The portion that is replaced is dependent upon the estimated degree of voicing, which is an output from the CELP core and is based on the measurements of the relative energies in the stochastic component and the adaptive codebook component. The mixed signal that results from the mixing function is provided to the bandpass filter 312. The bandpass filter 312 has the same characteristics as that of the bandpass filter 301, and extracts the corresponding components of the high-band excitation signal.
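The squaring, noise shaping, and mixing steps may be sketched as follows. The block length, the per-block energy matching, and the linear mixing rule controlled by a voicing value between 0 and 1 are assumptions chosen for illustration; the disclosure states only that the replaced portion depends on the estimated degree of voicing. All function names are hypothetical.

```python
import math
import random


def energy(block):
    """Sum of squared samples."""
    return sum(s * s for s in block)


def shaped_noise(reference, block_len, rng):
    """Gaussian noise whose energy in each block matches that of `reference`."""
    out = []
    for start in range(0, len(reference), block_len):
        ref_block = reference[start:start + block_len]
        noise = [rng.gauss(0.0, 1.0) for _ in ref_block]
        noise_energy = energy(noise)
        scale = math.sqrt(energy(ref_block) / noise_energy) if noise_energy > 0.0 else 0.0
        out.extend(n * scale for n in noise)
    return out


def mix_excitation(core_excitation, voicing, block_len=40, seed=0):
    """Square the core excitation, then replace part of it with shaped noise.

    `voicing` is assumed to lie in [0, 1]: 1 keeps the harmonic signal
    untouched, 0 replaces it entirely with noise.
    """
    if not 0.0 <= voicing <= 1.0:
        raise ValueError("voicing must lie in [0, 1]")
    harmonic = [s * s for s in core_excitation]  # squaring generates harmonics
    noise = shaped_noise(harmonic, block_len, random.Random(seed))
    return [voicing * h + (1.0 - voicing) * n for h, n in zip(harmonic, noise)]


excitation = [math.sin(0.3 * n) for n in range(80)]
mostly_voiced = mix_excitation(excitation, voicing=0.8)
print(len(mostly_voiced))
```

Squaring a sinusoid produces a component at twice its frequency, which is why a squaring nonlinearity can extend the excitation into a higher band.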
The bandpass-filtered high-band excitation signal, which is output by the bandpass filter 312, is provided to the spectral flip and down-mixer 313. The spectral flip and down-mixer 313 flips the bandpass-filtered high-band excitation signal and performs a spectral translation down in frequency, such that the resulting signal occupies the frequency region from 0 Hz to 8 kHz. This operation matches that of the spectral flip and down-mixer 307. The resulting signal is provided to the decimator 315, which band-limits and reduces the sample rate of the flipped and down-mixed high-band excitation signal from 32 kHz to 16 kHz. This operation matches that of the decimator 311. The resulting signal has a generally flat or white spectrum but lacks any formant information. The all-pole filter 316 receives the decimated, flipped and down-mixed signal from the decimator 315 as well as the unquantized LPC filter coefficients from the LPC analyzer 314. The all-pole filter 316 reshapes the decimated, flipped and down-mixed high-band signal such that its spectrum matches that of the BWE target signal. The reshaped signal is provided to the gain computer 317, which also receives the gap-filled BWE target signal from the missing signal generator 311a (via path L). The gain computer 317 uses the gap-filled BWE target signal to determine the ideal gains that should be applied to the spectrally-shaped, decimated, flipped and down-mixed high-band excitation signal. The spectrally-shaped, decimated, flipped and down-mixed high-band excitation signal (having the ideal gains) is provided to the second quantizer 319, which quantizes the gains for the high band. The output of the second quantizer 319 is the quantized gains. The quantized LPC parameters and the quantized gains are subjected to additional processing, transformations, etc., resulting in radio frequency signals that are transmitted, for example, to the second communication device 106 via the network 102.
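The effect of a spectral flip followed by decimation from 32 kHz to 16 kHz may be illustrated with the sketch below. It assumes one well-known way of flipping a spectrum, namely multiplying sample n by (-1)^n, which mirrors the spectrum about one quarter of the sample rate; the disclosure does not specify how the spectral flip and down-mixers 307 and 313 are realized. The decimation shown simply discards every other sample and therefore assumes the input has already been band-limited by the preceding bandpass filter. All function names are hypothetical.

```python
import cmath
import math


def spectral_flip(x):
    """Mirror the spectrum about fs/4 by multiplying sample n by (-1)**n."""
    return [s if n % 2 == 0 else -s for n, s in enumerate(x)]


def decimate_by_2(x):
    """Keep every other sample (assumes x is already band-limited to fs/4)."""
    return x[::2]


def dominant_frequency(x, fs):
    """Frequency in Hz of the largest DFT bin between 0 and fs/2."""
    n = len(x)
    mags = [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]
    return mags.index(max(mags)) * fs / n


fs = 32000
# A 12 kHz tone stands in for high-band content sampled at 32 kHz.
tone = [math.cos(2 * math.pi * 12000 * t / fs) for t in range(64)]
low = decimate_by_2(spectral_flip(tone))
print(dominant_frequency(tone, fs), dominant_frequency(low, fs // 2))
```

With this modulation, a component at frequency f in the 8 kHz to 16 kHz region moves to 16 kHz minus f, so the high band lands in the 0 Hz to 8 kHz region with its frequency axis reversed, and the result can then be represented at a 16 kHz sample rate.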
As previously noted, the missing signal generator 311a fills the gap in the signal resulting from the encoder 222 changing from a music mode to a speech mode. The operation performed by the missing signal generator 311a according to an embodiment of the invention will now be described in more detail with respect to
The encoder 222 superimposes the copied signal portion 406 onto the regenerated signal estimate 408 so that a portion of the copied signal portion 406 is inserted into the gap 416. In some embodiments, the missing signal generator 311a time-reverses the copied signal portion 406 prior to superimposing it onto the regenerated signal estimate 402, as shown in
In an embodiment, the copied portion 406 spans a greater time period than that of the gap 416. Thus, in addition to the copied portion 406 filling the gap 416, part of the copied portion is combined with the signal beyond the gap 416. In other embodiments, the copied portion spans the same period of time as the gap 416.
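The case in which the copied portion is longer than the gap may be sketched as follows. The sketch assumes that the copied portion is time-reversed and that, beyond the gap, it is cross-faded with the existing signal using squared cosine and squared sine weights, which sum to one at every sample. These weights, the function name, and the `overlap` parameter are illustrative assumptions and are not taken from the disclosure.

```python
import math


def fill_gap_with_overlap(signal, gap_start, gap_end, overlap):
    """Fill a gap with a time-reversed copy that extends `overlap` samples past it.

    Returns a new list; the input is not modified.
    """
    gap_len = gap_end - gap_start
    copy_len = gap_len + overlap
    if gap_len <= 0 or overlap < 0:
        raise ValueError("gap must be non-empty and overlap non-negative")
    if gap_end + copy_len > len(signal):
        raise ValueError("not enough samples after the gap to copy from")

    # Time-reversed copy of the samples that follow the gap.
    copied = list(signal[gap_end:gap_end + copy_len])[::-1]

    out = list(signal)
    # Inside the gap there is nothing to combine with: insert the copy as is.
    out[gap_start:gap_end] = copied[:gap_len]

    # Beyond the gap, fade the copy out (cos^2) while the real signal
    # fades in (sin^2); the two weights always sum to one.
    for k in range(overlap):
        theta = 0.5 * math.pi * (k + 0.5) / overlap
        w_signal = math.sin(theta) ** 2
        w_copy = math.cos(theta) ** 2
        out[gap_end + k] = w_copy * copied[gap_len + k] + w_signal * signal[gap_end + k]
    return out
```

With `overlap` set to zero, the copied portion spans the same period of time as the gap and the function reduces to a plain time-reversed backfill.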
While the present disclosure and the best modes thereof have been described in a manner establishing possession by the inventors and enabling those of ordinary skill to make and use the same, it will be understood that there are equivalents to the exemplary embodiments disclosed herein and that modifications and variations may be made thereto without departing from the scope and spirit of the disclosure, which are to be limited not by the exemplary embodiments but by the appended claims.
Number | Date | Country | |
---|---|---|---|
20140088973 A1 | Mar 2014 | US |