The present invention relates generally to telecommunication techniques. More particularly, the invention provides an encoding and decoding system and method that support a plurality of compression standards and share computational resources. Merely by way of example, the invention has been applied to Code Excited Linear Prediction (CELP) techniques, but it would be recognized that the invention has a much broader range of applicability. A further example of the invention is a multi-codec that combines two or more speech or audio codecs. A wide range of speech and/or audio codecs may be integrated within the multi-codec architecture.
Code Excited Linear Prediction (CELP) speech coding techniques are widely used in mobile telephony, voice trunking and routing, and Voice-over-IP (VoIP). Such coders/decoders (codecs) model voice signals using a source-filter model. The source/excitation signal is generated via adaptive and fixed codebooks, and the filter is modeled by a short-term linear predictive coder (LPC). The encoded speech is then represented by a set of parameters which specify the filter coefficients and the type of excitation.
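The source-filter structure described above can be sketched in a few lines. The following illustrative Python is not part of any standard codec implementation; the function name and inputs are hypothetical, and it simply applies an all-pole short-term synthesis filter to an excitation sequence:

```python
def lpc_synthesize(excitation, lpc, history=None):
    """All-pole (short-term) synthesis: s[n] = e[n] + sum_k a[k+1]*s[n-1-k].

    `lpc` holds the predictor coefficients a[1..p]; `history` is optional
    filter memory from the previous frame (zeros if omitted).
    """
    p = len(lpc)
    mem = list(history) if history is not None else [0.0] * p
    out = []
    for e in excitation:
        # Predict the current sample from the p previous output samples.
        y = e + sum(lpc[k] * mem[-1 - k] for k in range(p))
        mem.append(y)
        out.append(y)
    return out
```

In a real CELP decoder the excitation would itself be the gain-scaled sum of adaptive and fixed codebook vectors, and the coefficients would be interpolated per subframe.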
Industry standards codecs using CELP techniques include Global System for Mobile (GSM) Communications Enhanced Full Rate (EFR) codec, Adaptive Multi-Rate Narrowband (AMR-NB) codec, Adaptive Multi-Rate Wideband (AMR-WB), G.723.1, G.729, Enhanced Variable Rate Codec (EVRC), Selectable Mode Vocoder (SMV), QCELP, and MPEG-4. These standard codecs apply substantially the same generic algorithms in extracting CELP parameters with modifications to frame and subframe sizes, filtering procedures, interpolation resolutions, code-book structures and code-book search intervals.
For example, the GSM standards AMR-NB and AMR-WB operate with a 20 ms frame size divided into 4 subframes of 5 ms. One difference between the wideband and narrowband coders is the sampling rate, which is 8 kHz for AMR-NB and 16 kHz downsampled to 12.8 kHz for analysis in AMR-WB. The linear prediction (LP) techniques used in both AMR-NB and AMR-WB are substantially identical, but AMR-WB performs adaptive tilt filtering, linear prediction (LP) analysis to 16th order over an extended bandwidth of 6.4 kHz, conversion of LP coefficients to/from Immittance Spectral Pairs (ISP), and quantization of the ISPs using split-multi-stage vector quantization (SMSVQ). The pitch search routines and computation of the target signal are similar. Both codecs follow an ACELP fixed codebook structure using a depth-first tree search to reduce computations. The adaptive and fixed codebook gains are quantized in both codecs using joint vector quantization (VQ) with 4th order moving average (MA) prediction. AMR-WB also contains additional functions to deal with the higher frequency band up to 7 kHz.
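The shared frame geometry can be made concrete with a small sketch. The helper below is illustrative only (the function name is invented); it derives samples per frame and per subframe from the analysis sampling rate, using the 20 ms frame and 4-subframe split stated above:

```python
def frame_geometry(sample_rate_hz, frame_ms=20, subframes=4):
    """Samples per frame and per subframe for a CELP codec at a given rate."""
    frame = sample_rate_hz * frame_ms // 1000
    return frame, frame // subframes

# AMR-NB analyses at 8 kHz: 160-sample frames, 40-sample subframes.
# AMR-WB analyses at 12.8 kHz (decimated from 16 kHz): 256 and 64 samples.
```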
In another example, the Code Division Multiple Access (CDMA) standards SMV and EVRC share certain math functions at the basic operations level. At the algorithm level, the noise suppression and rate selection routines of EVRC are substantially identical to SMV modules. The LP analysis follows substantially the same algorithm in both codecs and both modify the target signal to match an interpolated delay contour. At Rate ⅛, both codecs produce a pseudo-random noise excitation to represent the signal. SMV incorporates the full range of post-processing operations including tilt compensation, formant postfilter, long term postfilter, gain normalization, and highpass filtering, whereas EVRC uses a subset of these operations.
As discussed above, a large number of industry standards codecs use CELP techniques. These codecs are usually supported by mobile and telephony handsets in order to interoperate with emerging and legacy network infrastructure. With the deployment of media rich handsets and the increasing complexity of user applications on these handsets, the large number of codecs is putting increasing pressure on handset resources in terms of program memory and DSP resources.
Hence it is desirable to improve codec techniques.
The present invention relates generally to telecommunication techniques. More particularly, the invention provides an encoding and decoding system and method that support a plurality of compression standards and share computational resources. Merely by way of example, the invention has been applied to Code Excited Linear Prediction (CELP) techniques, but it would be recognized that the invention has a much broader range of applicability.
According to an embodiment, the present invention provides a method and apparatus for encoding and decoding a speech signal using a multi-codec architecture that supports several CELP voice coding standards. The individual codecs are combined into an integrated framework to reduce the program size. This integrated framework is referred to as a thin CELP codec. The apparatus includes a CELP encoder that generates a bitstream from the input voice signal in a format specific to the desired CELP codec, and a CELP decoding module that decodes a received CELP bitstream and generates a voice signal. The CELP encoder includes one or more codec-specific CELP encoding modules, a common functions library, a common math operations library, a common tables library, and a bitstream packing module. The common libraries are shared between more than one voice coding standard. The output bitstream may be bit-exact to the standard codec implementation or produce quality equivalent to the standard codec implementation. The CELP decoder includes a bitstream unpacking module, one or more codec-specific CELP decoding modules, a common functions library, a common math operations library, and a library of common tables. The output voice signal may be bit-exact to the standard codec implementation or produce quality equivalent to the standard codec implementation.
According to another embodiment, the method for encoding a voice signal includes generating CELP parameters from the input voice signal in a format specific to the desired CELP codec and packing the codec-specific CELP parameters to the output bitstream. The method for decoding a voice signal includes unpacking the bitstream into codec-specific CELP parameters, and decoding the parameters to generate output speech.
According to yet another embodiment of the present invention, an apparatus for encoding and decoding a voice signal includes an encoder configured to generate an output bitstream signal from an input voice signal. The output bitstream signal is associated with at least a first standard of a first plurality of CELP voice compression standards. Additionally, the apparatus includes a decoder configured to generate an output voice signal from an input bitstream signal. The input bitstream signal is associated with at least a first standard of a second plurality of CELP voice compression standards. The CELP encoder includes a plurality of codec-specific encoder modules. At least one of the plurality of codec-specific encoder modules includes at least a first table, at least a first function or at least a first operation. The first table, the first function or the first operation is associated with only a second standard of the first plurality of CELP voice compression standards. Additionally, the CELP encoder includes a plurality of generic encoder modules. At least one of the plurality of generic encoder modules includes at least a second table, a second function or a second operation. The second table, the second function or the second operation is associated with at least a third standard and a fourth standard of the first plurality of CELP voice compression standards. The third standard and the fourth standard of the first plurality of CELP voice compression standards are different. The CELP decoder includes a plurality of codec-specific decoder modules. At least one of the plurality of codec-specific decoder modules includes at least a third table, at least a third function or at least a third operation. The third table, the third function or the third operation is associated with only a second standard of the second plurality of CELP voice compression standards. Additionally, the CELP decoder includes a plurality of generic decoder modules.
At least one of the plurality of generic decoder modules includes at least a fourth table, a fourth function or a fourth operation. The fourth table, the fourth function or the fourth operation is associated with at least a third standard and a fourth standard of the second plurality of CELP voice compression standards. The third standard and the fourth standard of the second plurality of CELP voice compression standards are different.
According to yet another embodiment of the present invention, a method for encoding and decoding a voice signal includes receiving an input voice signal, processing the input voice signal, and generating an output bitstream signal based on at least information associated with the input voice signal. The output bitstream signal is associated with at least a first standard of a first plurality of CELP voice compression standards. Additionally, the method includes receiving an input bitstream signal, processing the input bitstream signal, and generating an output voice signal based on at least information associated with the input bitstream signal. The output voice signal is associated with at least a first standard of a second plurality of CELP voice compression standards. The processing the input voice signal uses at least a first common functions library, at least a first common math operations library, and at least a first common tables library. The first common functions library includes a first function; the first common math operations library includes a first operation, and the first common tables library includes a first table. The first function, the first operation and the first table are associated with at least a second standard and a third standard of the first plurality of CELP voice compression standards. The second standard and the third standard of the first plurality of CELP voice compression standards are different. The generating an output bitstream signal includes generating a first plurality of codec-specific CELP parameters based on at least information associated with the input voice signal, and packing the first plurality of codec-specific CELP parameters to the output bitstream signal. The processing the input bitstream signal uses at least a second common functions library, at least a second common math operations library, and a second common tables library. 
The second common functions library includes a second function, the second common math operations library includes a second operation, and the second common tables library includes a second table. The second function, the second operation and the second table are associated with at least a second standard and a third standard of the second plurality of CELP voice compression standards. The second standard and the third standard of the second plurality of CELP voice compression standards are different. The generating an output voice signal includes unpacking the input bitstream signal and decoding a second plurality of codec-specific CELP parameters to produce an output voice signal.
An example of the invention is provided, specifically a thin CELP codec which combines the voice coding standards GSM-EFR, GSM AMR-NB and GSM AMR-WB. Another example illustrates the combination of the EVRC and SMV voice coding standards for CDMA. Many variations of voice coding standard combinations are applicable.
Numerous benefits are achieved using the present invention over conventional techniques. Certain embodiments of the present invention can be used to reduce the program size of the encoder and decoder modules to significantly less than the combined program size of the individual voice compression modules. Some embodiments of the present invention can be used to produce higher voice quality output than the standard codec implementation. Certain embodiments of the present invention can be used to achieve lower computational complexity than the standard codec implementation. Some embodiments of the present invention provide efficient embedding of a number of standard codecs and facilitate interoperability of handsets with diverse networks.
Depending upon the embodiment under consideration, one or more of these benefits may be achieved. These benefits and various additional objects, features and advantages of the present invention can be fully appreciated with reference to the detailed description and accompanying drawings that follow.
The present invention relates generally to telecommunication techniques. More particularly, the invention provides an encoding and decoding system and method that support a plurality of compression standards and share computational resources. Merely by way of example, the invention has been applied to Code Excited Linear Prediction (CELP) techniques, but it would be recognized that the invention has a much broader range of applicability.
An illustration of the encoder and decoder modules for voice coding that encode to and decode from multiple voice coding standards is shown in
An encoder 900 of a thin CELP codec includes specific modules 990 and generic modules 992. The specific modules 990 include CELP encoding modules 920 and bitstream packing modules 940. The generic modules 992 include generic tables 960, generic math operations 970, and generic subfunctions 980. Input speech samples 910 are input to the codec-specific CELP encoding modules 920 and codec-specific CELP parameters 930 are produced. These parameters are then packed to a bitstream 950 in a desired coding standard format using the codec-specific bitstream packing modules 940. The codec-specific CELP encoding modules 920 contain encoding modules for each supported voice coding standard. However, the tables 960, math operations 970 and subfunctions 980 that are common or generic to two or more of the supported encoders are factored out of the individual encoding modules by a codec algorithm factorization module, and included only once in a shared library in the thin codec 900. This sharing of common code reduces the combined program memory requirements. Algorithm factorization is performed only once during the implementation stage for each combination of codecs in the thin codec. Efficient factorizing of subfunctions may require splitting the processing modules into more than one stage. Some stages may share commonality with other codecs, while other stages may be distinct to a particular codec.
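The sharing of common code described above can be sketched as follows. This illustrative Python is not the standard reference code; the routine names and the configuration table are invented for the example. A generic subfunction (here, autocorrelation for LP analysis) lives once in a shared library, and each codec-specific encoding module calls it with its own parameters:

```python
# Common function library, factored out and shared by all codecs
# in the thin framework.
def autocorrelate(x, order):
    """Autocorrelation r[0..order], a typical shared LP-analysis subfunction."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]

# Codec-specific encoder modules reuse the shared routine with their own
# frame size and LP order instead of carrying private copies of the code.
CODEC_CONFIG = {
    "amr_nb": {"lp_order": 10, "frame": 160},
    "amr_wb": {"lp_order": 16, "frame": 256},
}

def lp_analysis(codec, frame):
    cfg = CODEC_CONFIG[codec]
    assert len(frame) == cfg["frame"], "frame size must match the codec"
    return autocorrelate(frame, cfg["lp_order"])
```

Only the thin per-codec wrappers differ; the shared routine is stored once, which is the source of the program-memory saving.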
The algorithm factorization module can operate at a number of levels depending on the codec requirements. If a bit-exact implementation of the individual standard codecs is required, only functions, tables, and math operations that maintain bit-exactness between two or more codecs are factored out into the generic modules.
If the bit-exact constraint is relaxed, then functions, tables, and math operations that produce equivalent quality or provide equivalent functionality can be factored out into the generic modules. Alternatively, new generic processing modules can be derived and called by one or more codecs. This has the benefit of providing a bit-compliant codec implementation. Using this approach, the program size can be reduced even further by having an increased number of generic modules.
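The bit-exact level of factorization can be illustrated with a small sketch. The function below is hypothetical (the names and the implementation-identifier scheme are invented); under the bit-exact constraint a routine is moved to the shared library only when every codec that uses it has the identical implementation:

```python
def factor_out(modules):
    """Partition per-codec routines into shared and codec-specific sets.

    `modules` maps codec name -> {routine name: implementation id}.
    A routine is shared only if at least two codecs define it and all of
    them use the identical implementation (the bit-exact criterion).
    Merging merely equivalent-quality variants is not modeled here.
    """
    shared, specific = {}, {codec: {} for codec in modules}
    names = {name for m in modules.values() for name in m}
    for name in names:
        impls = {m[name] for m in modules.values() if name in m}
        users = sum(name in m for m in modules.values())
        if len(impls) == 1 and users > 1:
            shared[name] = impls.pop()          # factored into generic modules
        else:
            for codec, m in modules.items():    # kept codec-specific
                if name in m:
                    specific[codec][name] = m[name]
    return shared, specific
```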
It is beneficial to maintain a modular, generalized framework so that modules for additional coders can be easily integrated. The use of generic modules may provide output voice quality higher than the standard codec implementation without an increase in program complexity, for example, by applying more advanced perceptual weighting filters. The use of generic modules may also provide lower complexity than the standard codec, for example, by applying faster searching techniques. These benefits may be combined.
The greater the similarity between voice coding standards, the greater the program size savings that can be achieved by a thin codec according to an embodiment of the present invention. As an illustrative example of a bit-compliant embodiment of a thin CELP codec, the speech codecs integrated are GSM-EFR, AMR-NB and AMR-WB, although others can be used. GSM-EFR is algorithmically the same as the highest rate of AMR-NB, thus no additional program code is required for AMR-NB to gain GSM-EFR bit-compliant functionality. The GSM standards AMR-NB, which has eight modes ranging from 4.75 kbps to 12.2 kbps, and AMR-WB, which has nine modes ranging from 6.60 kbps to 23.85 kbps, share a high degree of similarity in the encoder/decoder flow and in the general algorithms of many procedures.
According to one embodiment of the present invention, an apparatus for encoding and decoding a voice signal includes an encoder configured to generate an output bitstream signal from an input voice signal. The output bitstream signal is associated with at least a first standard of a first plurality of CELP voice compression standards. Additionally, the apparatus includes a decoder configured to generate an output voice signal from an input bitstream signal. The input bitstream signal is associated with at least a first standard of a second plurality of CELP voice compression standards. The output bitstream signal is bit exact or equivalent in quality for the first standard of the first plurality of CELP voice compression standards.
The CELP encoder includes a plurality of codec-specific encoder modules. At least one of the plurality of codec-specific encoder modules includes at least a first table, at least a first function or at least a first operation. The first table, the first function or the first operation is associated with only a second standard of the first plurality of CELP voice compression standards. Additionally, the CELP encoder includes a plurality of generic encoder modules. At least one of the plurality of generic encoder modules includes at least a second table, a second function or a second operation. The second table, the second function or the second operation is associated with at least a third standard and a fourth standard of the first plurality of CELP voice compression standards. The third standard and the fourth standard of the first plurality of CELP voice compression standards are different.
The plurality of codec-specific encoder modules includes a pre-processing module configured to process the speech for encoding, a linear prediction analysis module configured to generate linear prediction parameters, an excitation generation module configured to generate an excitation signal by filtering the input speech signal by the short-term prediction filter, and a long-term prediction module configured to generate open-loop pitch lag parameters. Additionally, the plurality of codec-specific encoder modules includes an adaptive codebook module configured to determine an adaptive codebook lag and an adaptive codebook gain, a fixed codebook module configured to determine fixed codebook vectors and a fixed codebook gain; and a bitstream packing module. The bitstream packing module includes at least one bitstream packing routine and is configured to generate the output bitstream signal based on at least codec-specific CELP parameters associated with at least the first standard of the first plurality of CELP voice compression standards.
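The encoder module order described above can be sketched as a per-frame pipeline. This is an illustrative skeleton, not any standard's reference flow: the module names are invented, and the actual processing is supplied through stub callables so only the sequencing is shown.

```python
def celp_encode_frame(codec, frame, pack, modules):
    """One-frame CELP encoder flow, following the module order in the text.

    `modules` is a bundle of codec-specific callables and `pack` is the
    codec-specific bitstream packing routine; all names are illustrative.
    """
    x = modules["pre_process"](frame)              # e.g. HP filter, decimation
    lpc = modules["lp_analysis"](x)                # short-term predictor
    residual = modules["excitation"](x, lpc)       # inverse-filter the speech
    ol_lag = modules["open_loop_pitch"](residual)  # long-term prediction
    lag, gp = modules["adaptive_cb"](residual, ol_lag)
    index, gc = modules["fixed_cb"](residual, lag, gp)
    params = {"lpc": lpc, "lag": lag, "gp": gp, "index": index, "gc": gc}
    return pack(codec, params)                     # codec-specific bitstream
```

In a thin codec, several of these callables would resolve to routines in the shared libraries while the rest remain codec-specific.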
The plurality of generic encoder modules comprises a first common functions library including at least the second function, a first common math operations library including at least the second operation, and a first common tables library including at least the second table. The first common functions library, the first common math operations library and the first common tables library are made by at least an algorithm factorization module. The algorithm factorization module is configured to remove a first plurality of generic functions, a first plurality of generic operations and a first plurality of generic tables from the plurality of codec-specific encoder modules and store the first plurality of generic functions, the first plurality of generic operations and the first plurality of generic tables in the first common functions library, the first common math operations library and the first common tables library.
The first common functions library, the first common math operations library and the first common tables library are associated with at least the third standard and the fourth standard of the first plurality of CELP voice compression standards and configured to substantially remove all duplications between a first program code associated with the third standard of the first plurality of CELP voice compression standards and a second program code associated with the fourth standard of the first plurality of CELP voice compression standards.
For example, the first common functions library, the first common math operations library and the first common tables library include only functions, math operations and tables configured to maintain bit exactness for the third standard and the fourth standard of the first plurality of CELP voice compression standards. For another example, the first common functions library, the first common math operations library and the first common tables library include only functions, math operations and tables algorithmically identical to ones of the third standard and the fourth standard of the first plurality of CELP voice compression standards, and functions, math operations and tables algorithmically similar to ones of the third standard and the fourth standard of the first plurality of CELP voice compression standards.
The CELP decoder includes a plurality of codec-specific decoder modules. At least one of the plurality of codec-specific decoder modules includes at least a third table, at least a third function or at least a third operation. The third table, the third function or the third operation is associated with only a second standard of the second plurality of CELP voice compression standards. Additionally, the CELP decoder includes a plurality of generic decoder modules. At least one of the plurality of generic decoder modules includes at least a fourth table, a fourth function or a fourth operation. The fourth table, the fourth function or the fourth operation is associated with at least a third standard and a fourth standard of the second plurality of CELP voice compression standards. The third standard and the fourth standard of the second plurality of CELP voice compression standards are different.
The plurality of codec-specific decoder modules include a bitstream unpacking module. The bitstream unpacking module includes at least one bitstream unpacking routine and is configured to decode the input bitstream signal and generate codec-specific CELP parameters. Additionally, the plurality of codec-specific decoder modules include an excitation reconstruction module configured to reconstruct an excitation signal based on at least information associated with adaptive codebook lags, adaptive codebook gains, fixed codebook indices and fixed codebook gains. Moreover, the plurality of codec-specific decoder modules include a synthesis module configured to filter the excitation signal and generate a reconstructed speech. Also, the plurality of codec-specific decoder modules include a post-processing module configured to improve a perceptual quality of the reconstructed speech.
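The excitation reconstruction step admits a compact sketch. The standard CELP relation is e[n] = gp*v[n] + gc*c[n], where v is the adaptive-codebook vector selected by the decoded lag, c is the fixed-codebook vector selected by the decoded index, and gp, gc are the decoded gains; the following illustrative helper (name invented) applies it:

```python
def reconstruct_excitation(adaptive_vec, fixed_vec, gp, gc):
    """Total excitation e[n] = gp*v[n] + gc*c[n] from the decoded
    adaptive-codebook vector v, fixed-codebook vector c, and gains."""
    return [gp * v + gc * c for v, c in zip(adaptive_vec, fixed_vec)]
```

The result would then drive the short-term synthesis filter and the post-processing stages described above.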
The generic decoder modules comprise a second common functions library including at least the fourth function, a second common math operations library including at least the fourth operation, and a second common tables library including at least the fourth table. The second common functions library, the second common math operations library and the second common tables library are made by at least an algorithm factorization module. The algorithm factorization module is configured to remove a second plurality of generic functions, a second plurality of operations and a second plurality of tables from the plurality of codec-specific decoder modules and store the second plurality of generic functions, the second plurality of operations and the second plurality of tables in the second common functions library, the second common math operations library and the second common tables library.
The second common functions library, the second common math operations library and the second common tables library are associated with at least the third standard and the fourth standard of the second plurality of CELP voice compression standards and configured to substantially remove all duplications between a third program code associated with the third standard of the second plurality of CELP voice compression standards and a fourth program code associated with the fourth standard of the second plurality of CELP voice compression standards.
For example, the second common functions library, the second common math operations library and the second common tables library include only functions, math operations and tables configured to maintain bit exactness for the third standard and the fourth standard of the second plurality of CELP voice compression standards. For another example, the second common functions library, the second common math operations library and the second common tables library include only functions, math operations and tables algorithmically identical to ones of the third standard and the fourth standard of the second plurality of CELP voice compression standards, and functions, math operations and tables algorithmically similar to ones of the third standard and the fourth standard of the second plurality of CELP voice compression standards.
As discussed above and further emphasized here, one of ordinary skill in the art would recognize many variations, alternatives, and modifications. For example, the first plurality of CELP voice compression standards may be different from or the same as the second plurality of CELP voice compression standards. The first standard of the first plurality of CELP voice compression standards may be different from or the same as the first standard of the second plurality of CELP voice compression standards. The first standard of the first plurality of CELP voice compression standards may be different from or the same as the second standard of the first plurality of CELP voice compression standards. The first standard of the first plurality of CELP voice compression standards may be different from or the same as the third standard or the fourth standard of the first plurality of CELP voice compression standards. The first standard of the second plurality of CELP voice compression standards may be different from or the same as the second standard of the second plurality of CELP voice compression standards. The first standard of the second plurality of CELP voice compression standards may be the same as the third standard or the fourth standard of the second plurality of CELP voice compression standards.
According to another embodiment of the present invention, a method for encoding and decoding a voice signal includes receiving an input voice signal, processing the input voice signal, and generating an output bitstream signal based on at least information associated with the input voice signal. The output bitstream signal is associated with at least a first standard of a first plurality of CELP voice compression standards. Additionally, the method includes receiving an input bitstream signal, processing the input bitstream signal, and generating an output voice signal based on at least information associated with the input bitstream signal. The output voice signal is associated with at least a first standard of a second plurality of CELP voice compression standards. The output bitstream signal is bit exact or equivalent in quality for the first standard of the first plurality of CELP voice compression standards. The output voice signal is bit exact or equivalent in quality for the first standard of the second plurality of CELP voice compression standards. For example, the first plurality of CELP voice compression standards include GSM-EFR, GSM-AMR Narrowband, and GSM-AMR Wideband. As another example, the first plurality of CELP voice compression standards includes EVRC and SMV.
The processing the input voice signal uses at least a first common functions library, at least a first common math operations library, and at least a first common tables library. The first common functions library includes a first function; the first common math operations library includes a first operation, and the first common tables library includes a first table. The first function, the first operation and the first table are associated with at least a second standard and a third standard of the first plurality of CELP voice compression standards. The second standard and the third standard of the first plurality of CELP voice compression standards are different. The first common functions library, the first common math operations library and the first common tables library are made by at least an algorithm factorization module. The algorithm factorization module is configured to store a first plurality of generic functions, a first plurality of operations and a first plurality of tables in the first common functions library, the first common math operations library and the first common tables library.
The generating an output bitstream signal includes generating a first plurality of codec-specific CELP parameters based on at least information associated with the input voice signal, and packing the first plurality of codec-specific CELP parameters to the output bitstream signal. The first plurality of codec-specific CELP parameters include a linear prediction parameter, an adaptive codebook lag, an adaptive codebook gain, a fixed codebook index, and a fixed codebook gain. For example, the linear prediction parameter includes a line spectral frequency. The generating a first plurality of codec-specific CELP parameters includes performing a linear prediction analysis, generating linear prediction parameters, and filtering the input speech signal by a short-term prediction filter. Additionally, the generating a first plurality of codec-specific CELP parameters includes generating an excitation signal, determining an adaptive codebook pitch lag parameter, and determining an adaptive codebook gain parameter. Moreover, the generating a first plurality of codec-specific CELP parameters includes determining an index of a fixed codebook vector associated with a fixed codebook target signal, and determining a gain of the fixed codebook vector.
The processing the input bitstream signal uses at least a second common functions library, at least a second common math operations library, and a second common tables library. The second common functions library includes a second function, the second common math operations library includes a second operation, and the second common tables library includes a second table. The second function, the second operation and the second table are associated with at least a second standard and a third standard of the second plurality of CELP voice compression standards. The second standard and the third standard of the second plurality of CELP voice compression standards are different.
The generating an output voice signal includes unpacking the input bitstream signal and decoding a second plurality of codec-specific CELP parameters to produce an output voice signal. The decoding a second plurality of codec-specific CELP parameters includes reconstructing an excitation signal, synthesizing the excitation signal, and generating an intermediate speech signal. Additionally, the decoding a second plurality of codec-specific CELP parameters includes processing the intermediate speech signal to improve a perceptual quality.
As discussed above and further emphasized here, one of ordinary skill in the art would recognize many variations, alternatives, and modifications. For example, the first plurality of CELP voice compression standards may be different from or the same as the second plurality of CELP voice compression standards. The first standard of the first plurality of CELP voice compression standards may be different from or the same as the first standard of the second plurality of CELP voice compression standards. The first standard of the first plurality of CELP voice compression standards may be different from or the same as the second standard or the third standard of the first plurality of CELP voice compression standards. The first standard of the second plurality of CELP voice compression standards may be different from or the same as the second standard or the third standard of the second plurality of CELP voice compression standards.
A comparison of certain features and processing functions of AMR-NB and AMR-WB according to an embodiment of the present invention is shown in Table 1. This table is merely an example, which should not unduly limit the scope of the present invention. One of ordinary skill in the art would recognize many variations, alternatives, and modifications.
As shown in Table 1, both AMR-NB and AMR-WB operate with a 20 ms frame size divided into 4 subframes of 5 ms. A difference between the wideband and narrowband coder is the sampling rate, which is 8 kHz for AMR-NB and 16 kHz downsampled to 12.8 kHz for analysis for AMR-WB. AMR wideband contains additional pre-processing functions for decimation and pre-emphasis. The linear prediction (LP) techniques used in both AMR-NB and AMR-WB are substantially identical, but AMR-WB performs linear prediction (LP) analysis to 16th order over an extended bandwidth of 6.4 kHz and converts the LP coefficients to/from Immittance Spectral Pairs (ISP). Quantization of the ISPs is performed using split-multi-stage vector quantization (SMSVQ), as opposed to split matrix quantization and split vector quantization for quantization of the LSFs in AMR-NB. The pitch search routines and computation of the target signal are similar, although the sample resolution for pitches differs. Both codecs follow an ACELP fixed codebook structure using a depth-first tree search to reduce computations. The adaptive and fixed codebook gains are quantized in both codecs using joint vector quantization (VQ) with 4th order moving average (MA) prediction. AMR-NB also uses scalar gain quantization for some modes. AMR-WB contains additional functions to deal with the higher frequency band up to 7 kHz. The post-processing for both coders includes high-pass filtering, with AMR-NB including specific functions for adaptive tilt-compensation and formant postfiltering, and AMR-WB including specific functions for de-emphasis and up-sampling.
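The frame and analysis parameters compared above can be summarized as a small configuration table. The dictionary layout is illustrative; the 10th-order LP analysis for AMR-NB is the standard value, which the comparison above does not restate:

```python
# Frame and analysis parameters from the AMR-NB / AMR-WB comparison.
# (AMR-NB's 10th-order LP analysis is the standard value, assumed here.)
AMR_CONFIG = {
    "AMR-NB": {"frame_ms": 20, "subframes": 4, "input_rate_hz": 8000,
               "analysis_rate_hz": 8000, "lp_order": 10, "lp_domain": "LSF"},
    "AMR-WB": {"frame_ms": 20, "subframes": 4, "input_rate_hz": 16000,
               "analysis_rate_hz": 12800, "lp_order": 16, "lp_domain": "ISP"},
}
```

A unified implementation can key its shared routines off such a table, switching window lengths, prediction order and quantization domain per codec while reusing the common code paths.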
In one preferred embodiment, the Multi-Codec architecture is applied to integrate the GSM-EFR, AMR-NB and AMR-WB speech codecs. The foundation structure of the Multi-Codec for GSM-EFR, AMR-NB and AMR-WB is the AMR-NB code.
The pre-processing block for AMR-NB comprises highpass filtering and downscaling. Additional functions to perform upsampling/downsampling and lowpass filtering are added, as well as the AMR-WB lowpass, highpass and tilt filter coefficients.
The LP analysis block comprises autocorrelation calculation, lag windowing and Levinson-Durbin recursion. The encoder function calling routine is adapted to activate the LP analysis routine twice per frame in the case of EFR and 12.2 kbps AMR and once per frame in all other cases. Differing input parameters are the analysis window length in samples, the table accessed for the window coefficients (which is added), and the order of prediction. LP parameter quantization for AMR-WB requires conversion of LP coefficients to ISP coefficients. Instead of adding this additional code, it can be shown that the first 15 ISPs are the same as the line spectral pairs (LSPs) derived from 15th order LP analysis, and the 16th ISP is the 16th linear prediction coefficient (LPC). Hence, the ISPs can be calculated using the AMR-NB LPC-to-LSP and LSP-to-LPC conversion functions with minor alterations. AMR-WB ISP quantization code and tables are added.
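The Levinson-Durbin recursion named above can be sketched as follows; this is a textbook floating-point formulation, not the fixed-point code of any of the standards:

```python
def levinson_durbin(r, order):
    """Solve the normal equations for LP coefficients a[1..order]
    from autocorrelation values r[0..order] via Levinson-Durbin recursion.

    Returns (coefficients, final prediction error).
    """
    a = [0.0] * (order + 1)  # a[0] is implicitly 1
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i]
        for j in range(1, i):
            acc -= a[j] * r[i - j]
        k = acc / err            # reflection coefficient for stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):    # update previous coefficients
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err
```

For an AR(1)-like autocorrelation such as r = [1, 0.5, 0.25], the recursion recovers a single effective coefficient of 0.5 with the second coefficient zero, which is a convenient sanity check for a shared implementation that is called with differing window lengths and prediction orders per codec.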
The open-loop pitch search block consists primarily of a maximum autocorrelation search on a given signal. The input signal is the weighted speech for AMR-NB, and a filtered, downsampled version of the weighted speech for AMR-WB. The lag weighting functions are identical in form, with slightly different constant values. Code additions for WB include weighting parameters, pitch range values, and interpolating filter coefficients to find ½ and ¼ sample resolution for the closed-loop pitch search block. The quantization of absolute and relative pitch delays used in NB can be shared by WB.
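The maximum-autocorrelation search at the heart of the open-loop pitch block can be sketched as below; the fixed analysis window and single normalization are simplifications of what the standards actually specify (which add lag weighting and sectioned searches):

```python
import math

def open_loop_pitch(s, lag_min, lag_max):
    """Return the lag in [lag_min, lag_max] maximizing the normalized
    autocorrelation of signal s over a fixed analysis window."""
    best_lag, best_score = lag_min, -math.inf
    for lag in range(lag_min, lag_max + 1):
        num = sum(s[n] * s[n - lag] for n in range(lag_max, len(s)))
        energy = sum(s[n - lag] ** 2 for n in range(lag_max, len(s)))
        score = num / math.sqrt(energy) if energy > 0 else -math.inf
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

On a clean periodic input, for instance a sinusoid with a 40-sample period, the search returns the true lag; the standards additionally weight the scores to discourage pitch doubling and halving.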
The VAD processing block for AMR-NB comprises 2 options, the first of which is the basis for the AMR-WB VAD approach. While most of the program code is identical, some minor additions need to be made to VAD option 1, such as extending the filterbank for the AMR-WB VAD to include frequencies up to 6.4 kHz.
The ACELP codebook search block comprises computing the target signal, pre-calculation of search vectors, and testing particular pulse combinations. The fixed codebook search in the AMR-WB and AMR-NB coders is one of the largest functions in terms of program size. This is due to specific fast search methods applied to reduce the number of pulse combinations tested. An exhaustive search can be compactly expressed using nested loops; however, the fast search is individual to each mode and takes up much more space to specify the order in which tracks are searched, the number of pulse positions optimized at once, and the criteria needed to enter each stage. Further, there is a different codebook structure, number of pulses allowed, and set of search combinations for almost every rate. The standard NB search is replaced with a unified ACELP search procedure that adapts to varying codebook structures, track orientations, pulse constraints and search conditions. The procedure has identical variable pre-calculations, specific outer layers which relate to the search order, and identical inner layers which relate to the actual combination testing. This can save over 50% of the program size of the reference implementation.
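The compact nested-loop form of an exhaustive search mentioned above can be sketched as follows, using the usual ACELP search criterion Q = (d·c)^2 / (c'Φc) with one signed pulse per track; function and variable names are illustrative, and real codecs replace the full product of tracks with the mode-specific fast search:

```python
from itertools import product

def exhaustive_acelp_search(d, phi, tracks):
    """Exhaustive one-pulse-per-track search.

    d: backward-filtered target signal.
    phi: correlation matrix of the filtered codebook impulse response.
    tracks: list of allowed positions for each pulse.
    Maximizes Q = (d . c)^2 / (c' Phi c) over all position combinations,
    with each pulse sign taken from the target.
    """
    best_q, best_pos = -1.0, None
    for pos in product(*tracks):                    # the compact nested loops
        signs = [1 if d[i] >= 0 else -1 for i in pos]
        corr = sum(sg * d[i] for sg, i in zip(signs, pos))
        energy = sum(si * sj * phi[i][j]
                     for si, i in zip(signs, pos)
                     for sj, j in zip(signs, pos))
        q = corr * corr / energy if energy > 0 else 0.0
        if q > best_q:
            best_q, best_pos = q, pos
    return best_pos
```

The depth-first tree search used by the standards evaluates the same criterion but prunes the combination space, which is exactly why its code is so much larger and mode-specific.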
High band gain calculation is required for the 23.85 AMR-WB mode and must be added.
The excitation reconstruction and synthesis block comprises forming the excitation signal by adding the gain-scaled adaptive and fixed codebook contributions, including anti-sparseness processing, and adaptive gain control. Additional functions for noise and pitch enhancement are added for AMR-WB.
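The gain-scaled combination of the two codebook contributions described above can be sketched in one line; anti-sparseness processing, adaptive gain control, and the AMR-WB noise and pitch enhancement are omitted:

```python
def build_excitation(adaptive_vec, fixed_vec, gain_pitch, gain_code):
    """Per-subframe excitation: exc[n] = g_p * v[n] + g_c * c[n]."""
    return [gain_pitch * v + gain_code * c
            for v, c in zip(adaptive_vec, fixed_vec)]
```

The resulting excitation both drives the synthesis filter and is fed back into the adaptive codebook for the next subframe.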
The post-processing block includes tilt compensation and formant postfiltering. For WB, function calls for highpass filtering, upsampling/downsampling, and addition of the high-band signal are required. The high-band generation block is only applicable to AMR-WB, and thus must be added in its entirety to the base codec.
As another example illustrating the bit-compliant specific embodiment, a thin CELP codec is applied to integrate the Code Division Multiple Access (CDMA) standards SMV and EVRC, although others can be used. SMV has 4 bit rates including Rate 1, Rate ½, Rate ¼ and Rate ⅛, and EVRC has 3 bit rates including Rate 1, Rate ½ and Rate ⅛.
A comparison of certain features and processing functions of SMV and EVRC according to an embodiment of the present invention is shown in Table 2. This table is merely an example, which should not unduly limit the scope of the present invention. One of ordinary skill in the art would recognize many variations, alternatives, and modifications.
As shown in Table 2, SMV and EVRC share a high degree of similarity. At the basic operations level, SMV math functions are based on EVRC libraries. At the algorithm level, both codecs have a frame size of 20 ms and determine the bit rate for each frame based on the input signal characteristics. In each case, a different coding scheme is used depending on the bit rate. SMV has an additional rate, Rate ¼, which uses NELP encoding. The noise suppression and rate selection routines of EVRC are identical to SMV modules. SMV contains additional preprocessing functions of silence enhancement and adaptive tilt filtering. The 10th order LP analysis is common to both codecs, as is the RCELP processing for the higher rates which modifies the target signal to match an interpolated delay contour. Both codecs use an ACELP fixed codebook structure and iterative depth-first tree search. SMV also uses Gaussian fixed codebooks. At Rate ⅛, both codecs produce a pseudo-random noise excitation to represent the signal. SMV incorporates the full range of post-processing operations including tilt compensation, formant postfilter, long term postfilter, gain normalization, and highpass filtering, whereas EVRC uses a subset of these operations.
In a second preferred embodiment, the Multi-Codec architecture is applied to integrate the SMV and EVRC codecs. The foundation program code for this embodiment is the SMV program code. This is due to the large comparative size of SMV, which encompasses a broad selection of processing tools. A description of how to integrate EVRC functionality is provided herein.
The pre-processing block for SMV comprises silence enhancement, highpass filtering, noise suppression (2 options) and adaptive tilt filtering. All that needs to be added for EVRC are the highpass filter coefficients and a function call to cascade three SMV 2nd-order filter sections. The EVRC noise suppression routine is identical to SMV noise suppression Option A.
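The cascade of three 2nd-order sections mentioned above can be sketched as a chain of direct-form biquads; the coefficient layout is a common filtering convention, not the SMV fixed-point code:

```python
def biquad(x, b, a):
    """One direct-form-I second-order section.

    b = [b0, b1, b2] feedforward, a = [1, a1, a2] feedback coefficients.
    """
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y

def cascade(x, sections):
    """Run the signal through each 2nd-order section in turn."""
    for b, a in sections:
        x = biquad(x, b, a)
    return x
```

Reusing one generic section routine with three sets of EVRC highpass coefficients is exactly the kind of small addition the unified architecture relies on, rather than importing the whole EVRC filter module.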
The LP analysis block comprises autocorrelation calculation, lag windowing and Levinson-Durbin recursion. LP analysis is performed three times per frame in the case of SMV and once per frame for EVRC. The algorithms are identical with the exception of different analysis window lengths, analysis window coefficients, and lag window constants. These values are added, in addition to EVRC line spectral pair (LSP) quantization code and tables and large spectral transition flag calculations.
The open-loop pitch search block comprises finding the maximum autocorrelation of a given signal. The input signal is the weighted speech for SMV, and a filtered, downsampled version of the residual for EVRC. The EVRC pitch search also follows an autocorrelation approach, but is far simpler than the SMV search; hence only small code additions are required. The closed-loop pitch search block is only applicable to SMV. The pitch lag quantization algorithm for EVRC is a subset of the quantization code already present in the SMV standard.
The rate determination block comprises functions to set the transmission rate and classify the frame type. The EVRC rate determination is identical to one of the SMV VAD options.
The RCELP signal modification block comprises forming an interpolated pitch contour and modifying the speech to match this contour. Interpolating filter coefficients and small functions to form the EVRC delay contour are added. The pulse shifting functions are shared, as SMV uses a dual warp/shift approach, part of which is the same as the pulse shifting of EVRC.
The fixed codebook search block comprises 2 main parts: ACELP codebooks and noise-excited codebooks. The fixed codebook search functions for the higher rates in the SMV and EVRC coders are the largest in terms of program size. A similar approach to that described in the first embodiment can be applied here. SMV uses a different, more efficient grouping and factorization of variables in its calculations, which leads to reduced EVRC complexity.
The EVRC fixed and adaptive gains are separately encoded. EVRC gain tables and the corresponding gain quantization code are added to the UTC.
The excitation reconstruction and synthesis block comprises warping the adaptive codebook with the decoded lag and forming the excitation signal by adding the gain-scaled adaptive and fixed codebook contributions. For the post-processing block, apart from allowing for different filter coefficients and weighting factors, no code in addition to the standard SMV code is needed.
For the high performance approach, in addition to the common modules, the functions performing RCELP pulse peak picking, delay contour selection and target signal computation are modified from the standard and a common technique is applied to both standards.
As discussed above and further emphasized here, one of ordinary skill in the art would recognize many variations, alternatives, and modifications.
Numerous benefits are achieved using the present invention over conventional techniques. Certain embodiments of the present invention can be used to reduce the program size of the encoder and decoder modules to be significantly less than the combined program size of the individual voice compression modules. Some embodiments of the present invention can be used to produce improved voice quality output compared to the standard codec implementation. Certain embodiments of the present invention can be used to achieve lower computational complexity than the standard codec implementation. Some embodiments of the present invention provide efficient embedding of a number of standard codecs and facilitate interoperability of handsets with diverse networks.
Although specific embodiments of the present invention have been described, it will be understood by those of skill in the art that there are other embodiments that are equivalent to the described embodiments. Accordingly, it is to be understood that the invention is not to be limited by the specific illustrated embodiments, but only by the scope of the appended claims.
This application is a continuation of U.S. patent application Ser. No. 10/688,857, filed on Oct. 17, 2003, which claims priority to U.S. Provisional Patent Application No. 60/419,776, filed Oct. 17, 2002 and U.S. Provisional Patent Application No. 60/439,366, filed on Jan. 9, 2003, all of which are commonly assigned, and hereby incorporated by reference for all purposes.
Number | Name | Date | Kind |
---|---|---|---|
5787390 | Quinquis et al. | Jul 1998 | A |
6115688 | Brandenburg et al. | Sep 2000 | A |
6115689 | Malvar | Sep 2000 | A |
6167373 | Morii | Dec 2000 | A |
6314393 | Zheng et al. | Nov 2001 | B1 |
6424939 | Herre et al. | Jul 2002 | B1 |
6717955 | Holler | Apr 2004 | B1 |
6799060 | Kim | Sep 2004 | B1 |
6807524 | Bessette et al. | Oct 2004 | B1 |
6912584 | Wang et al. | Jun 2005 | B2 |
7254533 | Jabri et al. | Aug 2007 | B1 |
7539612 | Thumpudi et al. | May 2009 | B2 |
20020028670 | Ohsuge | Mar 2002 | A1 |
20030103524 | Hasegawa | Jun 2003 | A1 |
Number | Date | Country | |
---|---|---|---|
60439366 | Jan 2003 | US | |
60419776 | Oct 2002 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 10688857 | Oct 2003 | US |
Child | 11890263 | US |