The present invention relates generally to the field of processing telecommunications signals. More particularly, the invention provides a method and apparatus for voice transcoding from a CELP based voice compression codec to a hybrid based voice compression codec (i.e. a codec that uses both CELP and non-CELP parameters). Merely by way of example, the invention has been applied to transcoding from the GSM-AMR codec to the internet Low Bitrate Codec (iLBC), but it would be recognized that the invention may also include other applications.
Modern communication systems rarely transmit uncompressed signals. Instead, signals are compressed to allow efficient utilization of spectrum resources. Compression of signals is generally performed by removing statistical and perceptual redundancy in the signal. In the process of compression, a block (known as a frame) of uncompressed samples is represented by a set (also known as a frame) of compression parameters. The compression parameters are subsequently quantized. The quantization indices for the compression parameters are organized into a bitstream. In the decompression process, the quantized compression parameters are extracted from the bitstream and used to construct a signal that replicates the original and may or may not be exactly the same. Typically, compression systems aim to produce perceptually similar signals to the original but in some cases exact replicas are also produced.
A number of standardized compression systems, which will from this point on be referred to as codecs, are based on the Code Excited Linear Prediction (CELP) algorithm (for example, the ITU's G.723.1 and the GSM's AMR codecs). CELP based codecs are popular for speech signal compression in mobile networks. CELP based codecs represent a speech signal by a linear prediction filter and an excitation signal. The excitation signal is vector quantized with a codebook that contains an adaptive section (referred to as the adaptive codebook, in which the code words are constructed from past quantized excitation signal samples) and a fixed or innovation section (where the code words are extracted from a static codebook).
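By way of a non-limiting illustration, the CELP signal model described above can be sketched as follows. The sketch assumes a filter convention of A(z) = 1 + sum of a_k z^-k and zero initial filter memory; the function name and argument names are hypothetical and are not taken from any particular codec.

```python
def celp_synthesize(adaptive_vec, fixed_vec, g_adaptive, g_fixed, lp_coeffs):
    """Build a CELP excitation and pass it through the LP synthesis filter.

    Assumption: lp_coeffs holds a_1..a_p of A(z) = 1 + sum_k a_k z^-k,
    and the filter starts from zero memory.
    """
    # Excitation: gain-scaled sum of the adaptive and fixed codebook vectors.
    excitation = [g_adaptive * a + g_fixed * f
                  for a, f in zip(adaptive_vec, fixed_vec)]
    # All-pole synthesis 1/A(z): s[n] = e[n] - sum_k a_k * s[n-k].
    history = [0.0] * len(lp_coeffs)  # history[0] holds s[n-1]
    speech = []
    for e in excitation:
        s = e - sum(c * h for c, h in zip(lp_coeffs, history))
        speech.append(s)
        history = [s] + history[:-1]
    return excitation, speech
```

In an actual codec the adaptive vector would be drawn from past quantized excitation samples and the filter memory would carry over between subframes; both are omitted here for brevity.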
Different networks follow different formats in compressing signals (indeed, different terminals on the same network may also use different formats). Recently, the internet Low Bit-rate Codec (iLBC) has been introduced for voice over internet protocol (VoIP) applications. The main feature that makes iLBC suitable for VoIP applications is its graceful performance degradation in the presence of packet loss, which is typical in Internet Protocol (IP) networks. Packet loss tolerance is achieved by quantizing the excitation signal of each frame independently of other frames.
In order to ensure that different terminals using different audio (of which speech is a subset) codecs can communicate, converting bitstreams of different formats is generally necessary. A straightforward way of carrying out a bitstream conversion task is by cascading a source bitstream decoder and a destination bitstream encoder in sequence. This is known as the tandem solution. Although the tandem solution is conceptually simple, actual implementation generally requires extensive computations and a tandem solution does not make effective use of the parameters used in the already encoded incoming bitstream. Thus, there is a need in the art for improved methods and systems for transcoding CELP based voice compression codec to a hybrid based voice compression codec in a more efficient manner.
According to an embodiment of the present invention, an apparatus for transcoding an audio signal between a CELP-based coder and a hybrid coder is provided. The apparatus includes a source bitstream unwrapper configured to receive a source bitstream, extract one or more CELP compression parameters from the source bitstream, and construct an audio signal vector from the source bitstream while maintaining the one or more extracted CELP compression parameters. The apparatus also includes a frame interpolator coupled to the source bitstream unwrapper. The frame interpolator is configured to interpolate the one or more extracted CELP compression parameters and the constructed audio signal vector between a source frame rate and a destination frame rate and between a source subframe rate and a destination subframe rate. The apparatus further includes a compression parameter converter coupled to the frame interpolator. The compression parameter converter is configured to calculate output compression parameters from at least one of the interpolated compression parameters or the one or more extracted CELP compression parameters. Moreover, the apparatus includes a destination bitstream wrapper coupled to the compression parameter converter. The destination bitstream wrapper is configured to construct a destination bitstream. Additionally, the apparatus includes a mapping parameter tuner coupled to the frame interpolator. The mapping parameter tuner is configured to select one or more parameters for use by the compression parameter converter.
According to another embodiment of the present invention, a method of converting a CELP based bitstream to an iLBC bitstream is provided. The method includes processing the source CELP bitstream to extract one or more CELP compression parameters from the source CELP bitstream, synthesizing audio signal vectors from the CELP compression parameters, and aligning source and destination frame timing if the CELP based bitstream and the iLBC bitstream are characterized by at least one of a different frame rate or a different subframe rate. The method also includes selecting one or more algorithmic parameters for use in a destination compression parameter calculation based on the one or more CELP compression parameters and the synthesized audio signal vectors and calculating and quantizing one or more destination compression parameters using the one or more CELP compression parameters and the synthesized audio signal vectors. The method further includes wrapping the one or more destination compression parameters to provide the iLBC bitstream.
Embodiments of the present invention provide a transcoding method between CELP-based coders and hybrid coders that use some CELP-like elements. Embodiments of the present invention provide numerous benefits. For example, an embodiment of the present invention provides a low complexity transcoder apparatus, offering reduced resource consumption. Additionally, embodiments provide a high quality transcoder with the transcoded signal being perceived as being of higher quality than a transcoded signal produced using a tandem method. Further, embodiments provide a transcoder apparatus that uses less memory than a tandem transcoder of a CELP-based decoder with a hybrid encoder. Furthermore, other embodiments provide real time, low delay transcoding. Depending upon the embodiment, one or more of these benefits, as well as other benefits, may be achieved.
The objects, features, and advantages of the present invention, which to the best of our knowledge are novel, are set forth with particularity in the appended claims. Embodiments of the present invention, both as to their organization and manner of operation, together with further objects and advantages, may best be understood by reference to the following description, taken in connection with the accompanying drawings.
As discussed previously, a tandem solution to transcoding is conceptually simple. However, the tandem solution is also computationally demanding. As analysis on the speech signal has been performed by the source bitstream encoder in the case of a CELP based codec, it is desirable to make use of the source compression parameters to assist in the computation of the destination compression parameters. By so doing, substantial computational saving can be achieved with marginal or no speech quality degradation, and in some cases the reuse of the information actually allows for an increase in quality over a tandem bitstream. In this document, this approach is referred to as the smart bitstream conversion method.
Embodiments of the present invention provide methods and systems for conversion of a CELP based bitstream to a corresponding hybrid bitstream, an example of which is an iLBC bitstream. Methods and apparatuses for smart bitstream conversion have been reported in the prior art (see, for example, U.S. Pat. No. 6,829,579 issued to Jabri, et al. and entitled “Transcoding method and system between CELP based speech codes”). Computational requirements for obtaining destination compression parameters are substantially reduced by the methods and systems provided herein by exploiting the similarity between the source compression format and the destination compression format. However, the source and destination codecs targeted in some of these methods share very similar codebook structures.
This similarity in codebook structure does not exist between a CELP based codec and a hybrid codec such as the iLBC. Unlike most CELP based coders, iLBC frames are encoded on a frame-by-frame basis with no reference to the past or future frames. Furthermore, the iLBC uses a 3-stage adaptive codebook, instead of the adaptive-fixed combination as used in CELP based codecs. Moreover, the iLBC codebook may contain decoded signal segments in the past or the future (as long as they are in the same frame of the current segment being coded), depending on the relative time location between the reference signal and the target signal. These differences between a CELP based codec, such as GSM-AMR, and a hybrid codec, such as iLBC, mean that the parameters of each codec may represent different physical quantities. In turn, these differences mean that there is a need to develop efficient, high quality transcoders that can extract one set of parameters from the other while accounting for the physically different quantities each set represents. Thus, embodiments of the present invention differ from, for example, CELP-to-CELP transcoders or speech-to-CELP codecs.
The LP parameter module takes one or more source LP parameters and converts them to one or more destination LP parameters. Methods for converting the source LP parameters to the destination LP parameters are described in additional detail throughout the present specification. With the destination LP parameters so obtained, the intermediate audio signal is calibrated by an LP difference calculation module, which takes into account the difference between the source and destination codecs' linear prediction models due to the quantization of the LP coefficients.
A Start state section, which is used in the compression of other signal segments, is then identified in the residual signal and quantized to obtain a set of Start state parameters. The set of Start state parameters includes: a Start state position indicating the first of the two consecutive subframes holding the Start state section; a Startstate_first flag indicating whether the Start state is located in the beginning section or the ending section of the consecutive subframes; a Start state scale parameter that normalizes the signal samples in the Start state for quantization; and a plurality of Start state sample values quantized using ADPCM.
The remaining sub-blocks in a residual signal frame may then be processed to generate a set of multistage codebook parameters. The destination LP parameters, the Start state parameters, and the multistage codebook parameters are finally wrapped into a destination bitstream for output. An external control signal may be used to configure the transcoder.
After the codebook indexes and codebook gains for all stages are computed for a sub-block of residual signal samples, they are used to update the codebook memory for the encoding of subsequent residual signal sub-blocks in the frame. The same operation is performed for all residual signal sub-blocks other than the Start state in a frame. Then the resulting multistage codebook indexes and gains for all sub-blocks are sent to bitstream wrapping.
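By way of a non-limiting illustration, the per-sub-block multistage search and the subsequent codebook memory update can be sketched as follows. The sketch assumes unquantized gains, an explicit list of codebook vectors, and a normalized cross-correlation selection criterion; the function names are hypothetical, and gain quantization and the construction of codebook vectors from the memory are omitted.

```python
def search_stage(target, codebook):
    """Pick the vector maximizing cross^2 / energy and its optimal gain."""
    best_idx, best_score, best_gain = -1, -1.0, 0.0
    for i, vec in enumerate(codebook):
        energy = sum(v * v for v in vec)
        if energy == 0.0:
            continue  # a zero vector cannot match anything
        cross = sum(t * v for t, v in zip(target, vec))
        score = cross * cross / energy
        if score > best_score:
            best_idx, best_score, best_gain = i, score, cross / energy
    if best_idx < 0:
        raise ValueError("codebook has no usable vector")
    return best_idx, best_gain


def multistage_search(target, codebook, stages=3):
    """Each stage encodes what the previous stages left unexplained."""
    residual = list(target)
    indexes, gains = [], []
    for _ in range(stages):
        idx, gain = search_stage(residual, codebook)
        indexes.append(idx)
        gains.append(gain)
        residual = [r - gain * c for r, c in zip(residual, codebook[idx])]
    return indexes, gains, residual


def update_memory(memory, decoded_sub_block, max_len):
    """Append the decoded sub-block and keep the newest max_len samples."""
    return (list(memory) + list(decoded_sub_block))[-max_len:]
```

For example, `multistage_search([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])` selects vector 1 in the first stage and vector 0 in the second, after which nothing remains for the third stage to encode.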
Four mapping strategies for the mapping of the LP parameters are illustrated in
In the simplest method, shown in 8a), the iLBC LSFs (Line Spectral Frequencies) are obtained by merely converting the appropriate source LP parameter set to an LSF domain.
A more sophisticated approach, shown in 8b) and 8c), obtains the iLBC LP parameter by linear interpolation between neighboring source LP parameters. Since the source LP parameters may have a representation other than the LSFs, a conversion of LP parameter representation may be necessary. Depending on the order of the LP parameter representation conversion and the linear interpolation, one may have two different implementations of the LP mapping by linear interpolation method. These two different implementations may demonstrate different properties in terms of their computational complexities and speech qualities.
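By way of a non-limiting illustration, the interpolation step itself can be sketched as follows, assuming that both neighboring source LP parameter sets have already been converted to the LSF domain. The function name and the weight argument are hypothetical; in practice the weight would follow from the relative timing of the source and destination frames.

```python
def interpolate_lsf(lsf_prev, lsf_next, weight):
    """Linear interpolation between two neighboring LSF vectors.

    weight = 0.0 returns lsf_prev, weight = 1.0 returns lsf_next.
    Assumption: both inputs are already expressed as LSFs of equal order.
    """
    if len(lsf_prev) != len(lsf_next):
        raise ValueError("LSF vectors must have the same order")
    return [(1.0 - weight) * p + weight * n
            for p, n in zip(lsf_prev, lsf_next)]
```

Interpolating in the LSF domain has the convenient property that a weighted average of two ascending LSF vectors, with a weight between 0 and 1, is itself ascending. Performing the interpolation before the representation conversion would instead apply the same formula to the source representation and convert only the interpolated result.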
A more advanced technique for obtaining the destination LP parameters, shown in 8d), is by explicit spectral distortion minimization. Different measures of spectral distortion can be used for minimization. This technique has a clear theoretical interpretation, and allows a flexible choice of mapping structure via an explicit control of the spectral distortion. Although it is possible to exchange the order of the LP parameter representation conversion and the spectral distortion minimizer, it is computationally more desirable to have the spectral minimization following the LP parameter representation conversion because every candidate destination LP parameter set has to be converted to the source LP parameter domain.
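By way of a non-limiting illustration, one possible spectral distortion measure and the corresponding selection step can be sketched as follows. The sketch assumes the root-mean-square log spectral distortion, which is only one of the measures that could be used, and assumes that all LP coefficient sets describe stable filters under the convention A(z) = 1 + sum of a_k z^-k. All function names are hypothetical.

```python
import cmath
import math


def lp_spectrum_db(lp_coeffs, n_points=64):
    """Power spectrum in dB of 1/A(z), sampled on (0, pi)."""
    spectrum = []
    for k in range(n_points):
        w = math.pi * (k + 0.5) / n_points
        a = 1.0 + sum(c * cmath.exp(-1j * w * (i + 1))
                      for i, c in enumerate(lp_coeffs))
        spectrum.append(-20.0 * math.log10(abs(a)))
    return spectrum


def spectral_distortion(lp_a, lp_b, n_points=64):
    """RMS difference, in dB, between two LP power spectra."""
    sa = lp_spectrum_db(lp_a, n_points)
    sb = lp_spectrum_db(lp_b, n_points)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sa, sb)) / n_points)


def pick_min_distortion(source_lp, candidates):
    """Index of the candidate whose spectrum is closest to the source."""
    return min(range(len(candidates)),
               key=lambda i: spectral_distortion(source_lp, candidates[i]))
```

For example, with a first-order source filter `[-0.9]` and candidates `[[-0.5], [-0.88], [0.3]]`, the second candidate would be selected because its spectrum departs least from that of the source.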
The iLBC codebook parameters are calculated in essentially two steps: firstly, a section of the frame is selected as the Start state and encoded by scalar quantization; then the remaining signal sub-blocks of the frame are encoded with a 3-stage adaptive codebook initialized with the quantized Start state samples. The source adaptive codebook index can be used to limit the search range in the iLBC first stage adaptive codebook search. Moreover, the source compression parameters may contain information that can be used in speeding up the search for the Start state. These are source codec specific and will be demonstrated by examples provided in further exemplary embodiments throughout the present specification.
As part of this invention, novel fast adaptive codebook techniques may be used to reduce the computational requirements for obtaining the second and third stage codebook parameters. This is made possible by the relatively lower importance of the second and third stage codebook contributions as compared to the first stage contribution.
One alternative method is to simply reduce the size of the second and third stage codebook through the removal of vectors that may be considered redundant using some measure, or even by randomly removing some vectors from a “well behaved” (as in close to periodic) codebook.
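By way of a non-limiting illustration, removal of redundant vectors can be sketched as follows. The sketch assumes that redundancy is measured by the normalized cross-correlation between codebook vectors and that a fixed threshold is applied; both the measure and the threshold value of 0.95 are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def normalized_correlation(x, y):
    """Cross-correlation of x and y normalized by their energies."""
    ex = sum(v * v for v in x)
    ey = sum(v * v for v in y)
    if ex == 0.0 or ey == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / math.sqrt(ex * ey)


def prune_codebook(codebook, threshold=0.95):
    """Return the indexes of the vectors kept after pruning.

    A vector is dropped when it correlates at or above the threshold
    with any vector that has already been kept.
    """
    kept = []
    for idx, vec in enumerate(codebook):
        if all(normalized_correlation(vec, codebook[k]) < threshold
               for k in kept):
            kept.append(idx)
    return kept
```

Returning indexes rather than vectors keeps the original codebook numbering available, so that a selected vector can still be reported with its index in the unpruned codebook.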
Yet another method is by reorganizing the codebook. A method to allow searching fewer codebook vectors in the second and third stages is to re-organize the codebook to be searched such that only small segments would then be searched. Re-organization in this case must be in terms of a reference signal. The logic behind this is as follows: the codebook search in iLBC is searching for signals (or vectors) that display high second order statistical similarity (that is why the normalized cross correlation is being maximized); hence, if a reference signal is used where the similarity of the reference signal to the codebook vector is determined and the similarity of the reference vector to the target vector is determined, then the level of similarity can be compared and this level can be used in the selection of the codebook vector. An embodiment of the present invention is described in the following pseudo code:
Note that this method can also be applied to general adaptive codebook search and its scope is not limited to bitstream conversion.
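By way of a non-limiting illustration, one possible reading of the reference-signal approach can be sketched as follows. This sketch is not the pseudo code of the embodiment; it assumes that similarity is measured by normalized cross-correlation and that the codebook vectors whose similarity to the reference is closest to the target's similarity to the reference form the segment to be searched. All names are hypothetical.

```python
import math


def normalized_correlation(x, y):
    """Cross-correlation of x and y normalized by their energies."""
    ex = sum(v * v for v in x)
    ey = sum(v * v for v in y)
    if ex == 0.0 or ey == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / math.sqrt(ex * ey)


def reference_guided_search(target, codebook, reference, n_candidates):
    """Search only the vectors that resemble the reference as the target does.

    Assumes a non-empty codebook and n_candidates >= 1.
    """
    target_sim = normalized_correlation(target, reference)
    # Rank codebook vectors by how closely their similarity to the
    # reference matches the target's similarity to the reference.
    ranked = sorted(
        range(len(codebook)),
        key=lambda i: abs(normalized_correlation(codebook[i], reference)
                          - target_sim))
    shortlist = ranked[:n_candidates]
    # Full correlation search restricted to the shortlisted segment.
    best = max(shortlist,
               key=lambda i: normalized_correlation(target, codebook[i]))
    return best, shortlist
```

In a practical implementation the similarities of the codebook vectors to the reference would be computed and ordered once, so that locating the segment to be searched costs far less than the search it replaces.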
It has been reported in the literature that the perceptual weighting filter in the codebook parameter conversion can be fine tuned to improve the performance of the transcoder. Moreover, when the LP parameters are converted using the linear interpolation method, the interpolation adds one more degree of freedom that can be tuned. By jointly fine tuning these two parameters, one can further improve speech quality. The optimum sets of these predefined mapping coefficients can further improve the transcoded audio quality without increased computation. Because the optimum mapping coefficients for male and female speech signals are different, a frame classification can be applied to the input signals, and the correspondingly optimized mapping coefficients can be applied to obtain a further improvement in transcoded audio quality. Based on this, a method for frame classification from input parameters and selecting the mapping parameters is set forth as shown in
where w0=w2=0.9 and w1=1 are example weights that can be used to bias the peak search toward the centre of the frame.
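By way of a non-limiting illustration, the effect of such weights can be sketched as follows. The sketch assumes a frame of four sub-blocks, assumes that the quantity being searched is the energy of each pair of consecutive sub-blocks, and applies the weights w0, w1 and w2 to the three candidate pairs; these assumptions and the function name are hypothetical and do not reproduce the expression to which the weights above belong.

```python
def find_weighted_peak_pair(residual, sub_block_len, weights):
    """Return the index of the first sub-block of the strongest pair.

    Assumption: candidates are pairs of consecutive sub-blocks, scored
    by their summed energy and scaled by one weight per candidate pair.
    """
    n_sub = len(residual) // sub_block_len
    if len(weights) != n_sub - 1:
        raise ValueError("need one weight per pair of sub-blocks")
    energies = [
        sum(x * x for x in residual[i * sub_block_len:(i + 1) * sub_block_len])
        for i in range(n_sub)
    ]
    best_pos, best_val = 0, -1.0
    for pos in range(n_sub - 1):
        val = weights[pos] * (energies[pos] + energies[pos + 1])
        if val > best_val:
            best_pos, best_val = pos, val
    return best_pos
```

With weights of 0.9, 1 and 0.9, a frame of uniform energy resolves to the central pair, whereas a frame with clearly dominant energy at one end still resolves to that end.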
Forward Predicted Sub-Blocks
For forward predicted sub-blocks, both the iLBC index for the sub-block and the AMR index for the subframe containing the sub-block point to a signal segment in the past. It is plausible that the AMR index can be used as the iLBC index after necessary conversion. The conversion is needed to account for the different organization of codebook vectors in the iLBC codebook and the AMR codebook. However, the reference signal segment for a sub-block of target signal in iLBC can be substantially shorter than that in AMR. It is therefore necessary to make sure the AMR index points to some section within the iLBC reference signal segment. Moreover, to account for possible pitch doubling and pitch halving, the double and the half of the AMR index are also checked. If they fall in the range of the iLBC codebook, they are stored as candidate indexes after conversion.
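By way of a non-limiting illustration, the collection of candidates from the AMR index can be sketched as follows. The sketch assumes an integer lag and expresses the range of the iLBC codebook as a pair of hypothetical bounds, min_lag and max_lag; the codec-specific conversion from a lag to an iLBC codebook index is not shown.

```python
def candidate_lags(amr_lag, min_lag, max_lag):
    """Collect the AMR lag, its double and its half when they are in range.

    Assumption: amr_lag is an integer lag in samples, and min_lag and
    max_lag delimit the lags reachable in the iLBC reference segment.
    """
    candidates = []
    for lag in (amr_lag, 2 * amr_lag, amr_lag // 2):
        if min_lag <= lag <= max_lag and lag not in candidates:
            candidates.append(lag)
    return candidates
```

For example, `candidate_lags(40, 20, 100)` yields the lag itself together with its double and its half, whereas `candidate_lags(60, 20, 100)` omits the double because it falls outside the assumed range.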
Backward Predicted Sub-Blocks
For backward predicted sub-blocks, each subframe in the iLBC reference signal segment (referred to as a reference subframe) is tested. For each reference subframe any one of the AMR adaptive codebook index, its double or its half is stored as a candidate iLBC index after conversion if it points to the iLBC target signal.
Although the above description has many specifics, these should not be interpreted as limiting the scope of the present invention but as merely providing an example embodiment of the invention. Thus the scope of the invention should be determined by the appended claims and their legal equivalents, rather than by the embodiments described.
While the invention has been described in connection with specific embodiments, these embodiments are not intended to limit the scope of the invention to the particular form set forth, but on the contrary, are intended to cover such alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims.
The present application claims priority to U.S. Provisional Patent Application No. 60/793,981, filed on Apr. 21, 2006, commonly owned, and hereby incorporated by reference for all purposes.
Number | Name | Date | Kind |
---|---|---|---|
6260009 | Dejaco | Jul 2001 | B1 |
6829579 | Jabri et al. | Dec 2004 | B2 |
7307981 | Choi et al. | Dec 2007 | B2 |
7315815 | Gersho et al. | Jan 2008 | B1 |
20030014249 | Ramo | Jan 2003 | A1 |
20030142699 | Suzuki et al. | Jul 2003 | A1 |
20050159943 | Zinser et al. | Jul 2005 | A1 |
20050228651 | Wang et al. | Oct 2005 | A1 |
20060074644 | Suzuki et al. | Apr 2006 | A1 |
Number | Date | Country |
---|---|---|
WO 03049081 | Jun 2003 | WO |
Number | Date | Country | |
---|---|---|---|
20070288234 A1 | Dec 2007 | US |
Number | Date | Country | |
---|---|---|---|
60793981 | Apr 2006 | US |