The present invention relates generally to processing telecommunication signals. More particularly, the invention relates to a method and apparatus for voice trans-rating from a first voice compression bitstream of one data rate encoding method to a second voice compression bitstream of a different data rate. Merely by way of example, the invention has been applied to voice trans-rating in multi-rate or multi-mode Code Excited Linear Prediction (CELP) based voice compression codecs, but it would be recognized that the invention may also include other applications.
Trans-rating is a digital signal processing technique used to bridge the gap between two terminals operating at different rates. The need typically arises when two or more terminals include a multi-rate voice codec such as a GSM-AMR codec, which can operate at 8 different rates in its active speech modes and uses SID and DTX frames for non-active speech. When a GSM-AMR terminal operating at the highest rate of 12.2 kbps tries to communicate with another GSM-AMR terminal operating at a different rate, such as 4.75 kbps, trans-rating is needed.
One conventional trans-rating approach performs rate conversion by decoding the input bitstream into speech signals and then re-encoding the speech signals according to another rate voice compression method. This decoding and re-encoding procedure involves a significant amount of calculation, which includes bit-unpacking to obtain voice compression parameters, reconstructing excitation signals, synthesizing pulse-code-modulated (PCM) voice signals, post-filtering the voice signals, analyzing the PCM speech signals again to obtain voice compression parameters, and re-encoding the voice compression parameters such as LSP, adaptive codebook parameters, adaptive codebook gain, fixed-codebook index parameters and fixed-codebook gain according to the second rate voice coding method.
The conventional trans-rating process has a further disadvantage in that delay increases by at least one additional frame algorithm delay due to look-ahead in the re-encoding process.
Smart trans-rating does not decode and re-encode in the conventional way; it operates in a completely different domain. Smart trans-rating restricts the bitstream conversion to the compression parameter domain. In many cases, a defined mathematical mapping between the rates is applied to the CELP parameter indices to convert the original bitstream to the destination bitstream. The mapping applies to the LPC parameters, adaptive codebook parameters, adaptive codebook gain, fixed-codebook index parameters and fixed-codebook gain.
What is needed is a technique that overcomes the limitations of conventional trans-rating and effectively applies smart trans-rating principles.
Accordingly, the present invention is directed to a multi-rate voice coder bitstream trans-rating apparatus and method for converting a first rate voice packet data to a second rate voice packet data, which employs an input bitstream unpacker, one or more trans-rating pairs, pass-through modules, configuration modules, and an output bitstream packer. Each trans-rating pair includes at least one voice compression parameters mapping module among modules for direct space domain mapping, analysis in excitation domain mapping, and analysis in filtered excitation domain mapping. Finally the apparatus includes modules for mixing part of the pass-through and part of the mapping. The method of trans-rating includes either bit-unpacking or unquantization on an encoded packet at the input site to obtain rate information and voice compression parameters according to the first rate voice compression method. The information on the first rate and the required output rate, namely a second rate type, in addition to external control commands, is then used to determine the converting strategy of the trans-rating pair. Next, part or all of the compression parameters of the first rate are passed through, or mapped into compression parameters of the second rate in a manner compatible with the second rate voice compression method.
The transformation approaches can be varied and further optimized based on the characteristics of the pair formed by the first rate compression method and the second rate compression method. Lastly, the second rate voice compression parameters are packed into a bitstream that is compatible with the second rate of the multi-rate voice coder standard.
An apparatus according to the invention includes for example:
The present invention has the following objectives:
According to one aspect of the present invention, the trans-rating module apparatus further includes a decision module that is adapted to select a CELP parameter mapping strategy based upon a plurality of strategies, and at least one conversion module comprising:
The mapping module selected in a specific trans-rating pair can be pre-defined or can be selected dynamically by the decision module.
In another aspect of the present invention, a method for trans-rating a first rate bitstream to a second rate bitstream of multi-rate voice coders comprises the following steps:
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.
The present invention, both as to its organization and manner of operation, together with further objects and advantages, may best be understood by reference to the following description, taken in connection with the accompanying drawings.
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The cases of multi-rate voice coder GSM-AMR different rates trans-rating are used as examples for illustration purposes. The methods described herein apply generally to trans-rating between any pair of multi-rate voice codecs. A person skilled in the relevant art will recognize that other steps, configurations and arrangements can be used without departing from the spirit and scope of the present invention.
The invention includes methods used to perform smart trans-rating between two codecs of different code rates in a multi-rate voice coder. The invention also includes a special case of trans-rating pass-through where the required output bitstream has the same rate codec as that of the input bitstream. The following sections discuss the details of the present invention.
The trans-rating control module receives the packet type and data rate of the input bitstream, and the external control commands of the output of the second codec rate, as shown in
The trans-rating control command module 24 (
The decision can change dynamically based on the available computational resources or on minimum quality requirements. The input rate codec compressed parameters can be mapped in a number of ways, giving successively better output quality at the cost of computational complexity. Even at the highest quality, the computational complexity of the transcoding algorithm is still lower than that of the brute-force tandem approach. Since the four methods trade off quality for reduced computational load, they can be used to provide graceful degradation in quality when the apparatus is overloaded by a large number of simultaneous channels. Thus the performance of the trans-rating can adapt to the available resources.
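The dynamic decision described above can be pictured with a minimal sketch. The strategy names, the relative cost figures, and the `select_strategy` function are hypothetical illustrations, not part of the described apparatus; the sketch only assumes that the strategies can be ordered from cheapest to most expensive and that a per-channel computation budget is known.

```python
from enum import IntEnum


class Strategy(IntEnum):
    # Ordered from cheapest to most expensive; names are illustrative only.
    PASS_THROUGH = 0
    DIRECT_SPACE = 1
    EXCITATION = 2
    FILTERED_EXCITATION = 3


# Hypothetical relative per-channel costs in arbitrary units (not measured values).
COST = {
    Strategy.PASS_THROUGH: 0.0,
    Strategy.DIRECT_SPACE: 1.0,
    Strategy.EXCITATION: 4.0,
    Strategy.FILTERED_EXCITATION: 8.0,
}


def select_strategy(same_rate, budget, min_strategy=Strategy.DIRECT_SPACE):
    """Pick the most expensive mapping strategy that fits the budget.

    same_rate    -- True when input and output rates are identical
    budget       -- computation available for this channel (arbitrary units)
    min_strategy -- quality floor; never go below this mapping strategy
    """
    if same_rate:
        return Strategy.PASS_THROUGH
    candidates = [
        s for s in Strategy
        if s != Strategy.PASS_THROUGH and s >= min_strategy and COST[s] <= budget
    ]
    # If nothing fits the budget, fall back to the quality floor.
    return max(candidates) if candidates else min_strategy
```

As the number of simultaneous channels grows, the per-channel budget shrinks and the selection steps down to cheaper mappings, which is one way to obtain the graceful degradation mentioned in the text.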
Referring specifically to
Besides pass-through or partial pass-through methods, direct-space-mapping is the simplest trans-rating scheme. The mapping is based on similarities of physical meaning between input rate codec and output rate codec parameters, and the trans-rating is performed directly using analytical formulae without any iteration or extensive searches. The advantage of this scheme is that it does not require a large amount of memory and consumes almost zero MIPS, yet it can still generate intelligible, albeit degraded quality, sound. This method is generic and applies to all kinds of multi-rate voice coder trans-rating in terms of different subframe sizes or different compressed parameter representations.
This method is more advanced than the direct-space-mapping method 102 in that the adaptive and fixed codebooks are searched, and the gains are estimated, in the usual way defined by the output rate codec, except that these operations are done in the excitation domain, not the speech domain. The adaptive codebook is determined first by a local search using the unquantized adaptive codebook parameters from the input codec bitstream as the initial estimate. The search is within a small interval of the initial estimate, at the accuracy (integer or fractional pitch) required by the destination codec. The adaptive codebook gain is then determined for the best codeword vector. Once found, the adaptive codeword vector contribution is subtracted from the excitation, and the fixed codebook is determined by optimal matching to the residual. The advantage over the conventional tandem approach is that the open-loop adaptive codebook estimate does not need to be calculated by the autocorrelation method used by the CELP standards; it can instead be determined from the unquantized parameters of the input bitstream. Moreover, the search is performed in the excitation domain, not the speech domain, so that impulse response filtering during the adaptive codebook and fixed-codebook searches is not required. This saves a significant amount of computation without compromising output voice quality.
Considering the difference in LSP parameters between the input rate codec and the output rate codec, the reconstructed excitation can be calibrated in order to compensate for the effect of the LSP parameters.
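The local adaptive codebook search in the excitation domain can be sketched as follows. This is a simplified illustration under stated assumptions, not the search defined by any codec standard: it uses integer lags only, an unconstrained least-squares gain with no quantization, and illustrative default lag limits. The function names `adaptive_vector` and `local_pitch_search` are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def adaptive_vector(past, lag, n):
    """Build an n-sample adaptive codeword by repeating the past excitation at `lag`."""
    buf = list(past)
    for _ in range(n):
        buf.append(buf[-lag])  # sample located `lag` positions back
    return buf[len(past):]


def local_pitch_search(target, past, lag_init, delta=2, lag_min=20, lag_max=143):
    """Search integer lags in [lag_init - delta, lag_init + delta].

    target   -- excitation of the current subframe (the search target)
    past     -- previously reconstructed excitation samples
    lag_init -- initial estimate taken from the input bitstream's pitch lag
    Returns (lag, gain, residual), where residual = target - gain * codeword.
    """
    lo = max(lag_min, lag_init - delta)
    hi = min(lag_max, lag_init + delta, len(past))
    best = None
    for lag in range(lo, hi + 1):
        v = adaptive_vector(past, lag, len(target))
        energy = dot(v, v)
        if energy == 0.0:
            continue
        corr = dot(target, v)
        score = corr * corr / energy  # maximizing this minimizes ||x - g*v||^2
        if best is None or score > best[0]:
            best = (score, lag, corr / energy, v)
    if best is None:
        raise ValueError("no usable lag in the search interval")
    _, lag, gain, v = best
    residual = [x - gain * vi for x, vi in zip(target, v)]
    return lag, gain, residual
```

The residual returned here plays the role of the signal against which the fixed codebook is subsequently matched. No open-loop autocorrelation over the full lag range is computed, because the initial estimate comes from the input bitstream.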
In some specific trans-rating pairs, the input and output codecs have the same compression algorithm and the same quantization tables in some compression parameters. The above mapping methods can be simplified to portions of pass-through and portions of mapping procedures.
It is noted that any combinations of the above methods may also be used. The best method to achieve both high quality and low complexity will depend on a balance between the input rate and output rate codecs.
The output rate bitstream packing module connects the trans-rating pair modules or pass-through modules through the configuration control command module 24 (
Examples of suitable systems according to the invention are now described. A multi-rate voice coder (adaptive multi-rate or AMR, also called GSM-AMR) is taken as an example to show the principle of the present invention. The AMR codec uses eight source codecs with bit-rates of 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbps.
The codec is based on the code-excited linear predictive (CELP) coding model. A 10th order linear prediction (LP), or short-term, synthesis filter is used. A long-term, or pitch, synthesis filter is implemented using the so-called adaptive codebook approach.
In the CELP speech synthesis model, the excitation signal at the input of the short-term Linear Prediction (LP) synthesis filter is constructed by adding two excitation vectors from adaptive and fixed (innovative) codebooks. The speech is synthesized by feeding the two properly chosen vectors from these codebooks through the short-term synthesis filter. The optimum excitation sequence in a codebook is chosen using an analysis-by-synthesis search procedure in which the error between the original speech and synthesized speech is minimized according to a perceptually weighted distortion measure. The perceptual weighting filter used in the analysis-by-synthesis search technique uses the unquantized LP parameters.
The coder operates on speech frames of 20 ms, corresponding to 160 samples at the sampling frequency of 8,000 samples per second. For every 160 speech samples, the speech signal is analyzed to extract the parameters of the CELP model (LP filter coefficients, adaptive and fixed codebooks' indices and gains). These parameters are encoded and transmitted. At the decoder, these parameters are decoded, and speech is synthesized by filtering the reconstructed excitation signal through the LP synthesis filter.
The GSM-AMR speech frame is divided into 4 subframes of 5 ms each (40 samples). The adaptive and fixed codebook parameters are transmitted every subframe. The quantized and unquantized LP parameters or their interpolated versions are used depending on the subframe. An open-loop pitch lag is estimated in every other subframe (except for the 5.15 and 4.75 kbit/s modes for which it is done once per frame) based on the perceptually weighted speech signal.
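The frame arithmetic above can be checked with a short sketch. The constants come from the text (8,000 samples per second, 20 ms frames, 4 subframes, the eight listed bit-rates); the function names are illustrative, and the bits-per-frame figure is simply bit-rate times frame duration.

```python
SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
SUBFRAMES_PER_FRAME = 4
AMR_RATES_KBPS = [12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75]


def frame_layout():
    """Return (samples per frame, samples per subframe)."""
    samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000
    samples_per_subframe = samples_per_frame // SUBFRAMES_PER_FRAME
    return samples_per_frame, samples_per_subframe


def bits_per_frame(rate_kbps):
    """Bits carried by one 20 ms frame: kbit/s multiplied by ms gives bits."""
    return round(rate_kbps * FRAME_MS)


if __name__ == "__main__":
    print(frame_layout())
    for rate in AMR_RATES_KBPS:
        print(rate, bits_per_frame(rate))
```

The gap between the per-frame bit budgets of two modes indicates how much coarser the parameter representation of the lower rate has to be, which is why some parameters can be passed through while others must be mapped.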
In trans-rating between 5.15 kbps and 4.75 kbps, three sets of parameters, namely the Linear Prediction Coefficient (LPC) parameters, the adaptive codebook parameters and the fixed-codebook parameters, can be directly mapped from the original bitstream to the destination bitstream without any computational cost.
In the case of the adaptive codebook gains and fixed-codebook gains, the compression method and tables are different, so the representations of these parameters are different between 5.15 and 4.75 kbps. As shown in
A direct space mapping method can be employed to map both adaptive codebook gains and fixed-codebook gains. The jointly coded adaptive codebook and fixed-codebook gains of the input rate are first unquantized, which yields the unquantized adaptive codebook gains and fixed-codebook gains for every subframe. These gains are then mapped to each pair of subframes separately. Finally, the adaptive codebook gains and fixed-codebook gains are requantized every two subframes in accordance with the 4.75 kbps output codec. The resulting joint gain indices of 4.75 kbps are grouped together with the pass-through results for the LSP, adaptive codebook parameters and fixed-codebook parameters to form the output 4.75 kbps bitstream.
It is also possible to select analysis in excitation space mapping or analysis in filtered excitation space mapping to search for the quantized joint gains of the adaptive codebook and fixed codebook. As both 4.75 kbps and 5.15 kbps have the same LPC index representations, it is not necessary to calibrate the reconstructed excitation vector from the input codec before using it as the target signal.
The joint gain indices of 4.75 kbps can be obtained from the unquantized adaptive codebook gains and fixed-codebook gains of 5.15 kbps through any one of the mapping methods: direct-space mapping, analysis in excitation space mapping or analysis in filtered excitation space mapping.
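The direct space mapping of gains can be illustrated with a toy sketch. It assumes that the four per-subframe gain pairs have already been unquantized, groups them into two vectors (one per pair of subframes), and picks the nearest entry of a small codebook. `TOY_CODEBOOK` is invented for illustration and is not the quantization table of the AMR standard; the actual quantizer and any gain prediction it uses are not reproduced here.

```python
# Hypothetical 4-D entries: (gp, gc) for the first subframe, (gp, gc) for the second.
TOY_CODEBOOK = [
    (0.2, 0.5, 0.2, 0.5),
    (0.6, 1.0, 0.6, 1.0),
    (0.9, 2.0, 0.9, 2.0),
    (0.9, 0.5, 0.3, 1.5),
]


def pair_gains(gp, gc):
    """Group four per-subframe gains into two vectors, one per subframe pair."""
    if len(gp) != 4 or len(gc) != 4:
        raise ValueError("expected gains for exactly 4 subframes")
    return [(gp[i], gc[i], gp[i + 1], gc[i + 1]) for i in (0, 2)]


def nearest_index(vec, codebook):
    """Index of the codebook entry with the smallest squared distance to vec."""
    def dist(entry):
        return sum((a - b) ** 2 for a, b in zip(vec, entry))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def map_gains(gp, gc, codebook=TOY_CODEBOOK):
    """Return one joint gain index per pair of subframes (two per frame)."""
    return [nearest_index(v, codebook) for v in pair_gains(gp, gc)]
```

For example, `map_gains([0.25, 0.22, 0.85, 0.95], [0.4, 0.6, 1.9, 2.1])` selects entry 0 for the first two subframes and entry 2 for the last two. No codebook search against a target signal is involved, which is what keeps this mapping inexpensive.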
It is important to note that for AMR 12.2 kbps, LP analysis is performed twice per frame and only once for the other modes down to 4.75 kbps. For the 12.2 kbps mode, the two sets of LP parameters are converted to line spectrum pairs (LSP) and jointly quantized using split matrix quantization (SMQ), 38 bits. For the other modes, the single set of LP parameters is converted to line spectrum pairs (LSP) and vector quantized using split vector quantization (SVQ), 23 bits for 4.75 kbps.
First, the indices of the LSF parameters are extracted from the incoming 12.2 kbps bitstream, and the unquantized LSP parameters are then obtained through lookup tables and the previous LSP residual vectors. The unquantized LSP parameters are interpolated and mapped to each subframe. These LSP parameters are re-quantized according to the 4.75 kbps codec as specified in the AMR standard and converted to the LSP representation of 4.75 kbps.
Second, the excitation vector of the 12.2 kbps input codec is reconstructed from the unquantized adaptive codebook parameters v[n], adaptive codebook gains ĝp, fixed-codebook parameters c[n] and fixed-codebook gains ĝc. The reconstructed excitation vector is represented as ĝp·v[n] + ĝc·c[n].
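The excitation reconstruction is a sample-by-sample weighted sum of the adaptive and fixed codeword vectors. The sketch below assumes the vectors and gains for one subframe have already been decoded; the function name and argument names are illustrative, with `gp` standing for the adaptive codebook gain and `gc` for the fixed-codebook gain.

```python
def reconstruct_excitation(v, c, gp, gc):
    """Return the subframe excitation gp*v[n] + gc*c[n].

    v  -- adaptive codeword vector for the subframe
    c  -- fixed (innovative) codeword vector for the subframe
    gp -- unquantized adaptive codebook (pitch) gain
    gc -- unquantized fixed-codebook gain
    """
    if len(v) != len(c):
        raise ValueError("codeword vectors must have the same length")
    return [gp * vn + gc * cn for vn, cn in zip(v, c)]
```

For example, `reconstruct_excitation([1.0, 2.0], [0.0, -1.0], 0.5, 2.0)` gives `[0.5, -1.0]`. No synthesis filtering is applied at this stage, so the result stays in the excitation domain.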
Before the reconstructed excitation vector becomes target signals in trans-rating process, a process of excitation vector calibration may be applied as shown in
The calibrated excitation vector is then used as the target signal for analysis in excitation space mapping for the output rate of 4.75 kbps. The unquantized adaptive codebook parameters of 12.2 kbps are used as an initial estimate in the closed-loop adaptive codebook search of 4.75 kbps. This search obtains the quantized adaptive codebook parameters and the adaptive codebook gains. As the 4.75 kbps codec uses joint gain indices to represent the adaptive codebook and fixed-codebook gains, the quantization of the adaptive codebook gain of 4.75 kbps is performed after the fixed-codebook search.
The adaptive codeword vector contribution is removed from the calibrated excitation. The result is filtered to produce the target signal for the fixed-codebook search. The fixed-codebook vector of 4.75 kbps, which consists of two pulses, is then searched by a fast technique. Thus, the fixed-codebook index of 4.75 kbps is obtained.
Unlike the 12.2 kbps codec, the 4.75 kbps codec performs a joint search for the adaptive codebook gain (ĝp) and the fixed-codebook gain (ĝc). Using the computed adaptive codeword vector v[n], along with the fixed-codebook vector c[n], a dual search on the pitch gain and the fixed-codebook gain is performed to minimize the error ∥x − gp·v − gc·c∥, where x is the target excitation. The common table index for the adaptive and fixed-codebook gains is coded in the first and third subframes of the 4.75 kbps frame.
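The joint gain search can be sketched as an exhaustive search over a gain table, choosing the entry that minimizes the squared error between the target excitation and the gain-scaled codeword vectors, summed over the two subframes that share one index. The table layout (one adaptive and one fixed gain per subframe in each entry) and the function names are assumptions made for illustration; the actual gain table and any prediction used by the standard quantizer are not reproduced.

```python
def subframe_error(x, v, c, gp, gc):
    """Squared error ||x - gp*v - gc*c||^2 for one subframe."""
    return sum((xn - gp * vn - gc * cn) ** 2 for xn, vn, cn in zip(x, v, c))


def joint_gain_search(subframes, table):
    """Find the table entry that best fits two consecutive subframes.

    subframes -- two (x, v, c) triples: target excitation, adaptive codeword,
                 fixed codeword
    table     -- hypothetical entries (gp1, gc1, gp2, gc2)
    Returns (best_index, best_error).
    """
    (x1, v1, c1), (x2, v2, c2) = subframes
    best_index, best_err = -1, float("inf")
    for i, (gp1, gc1, gp2, gc2) in enumerate(table):
        err = subframe_error(x1, v1, c1, gp1, gc1) + subframe_error(x2, v2, c2, gp2, gc2)
        if err < best_err:
            best_index, best_err = i, err
    return best_index, best_err
```

Because both gains are evaluated together, the adaptive codebook gain is only fixed after the fixed-codebook vector is known, which matches the ordering described in the text for the 4.75 kbps mode.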
As mentioned previously, the other two methods, direct space mapping and analysis in excitation space mapping, may also be applied to the trans-rating from 12.2 kbps to 4.75 kbps. Because these different methods trade off quality for reduced computational load, they can be used to provide a graceful degradation in quality when the apparatus is overloaded by a large number of simultaneous channels.
First, the indices of the LSF parameters are extracted from the incoming 4.75 kbps bitstream, and the unquantized LSP parameters are then obtained through lookup tables and the previous LSP residual vectors. The unquantized LSP parameters are interpolated and mapped to each subframe. These LSP parameters are re-quantized every two subframes according to the 12.2 kbps codec as specified in the AMR standard and converted to the LSP representation of 12.2 kbps.
Second, the excitation vector of the 4.75 kbps input codec is reconstructed from the unquantized adaptive codebook parameters v[n], adaptive codebook gains ĝp, fixed-codebook parameters c[n] and fixed-codebook gains ĝc. The reconstructed excitation vector is represented as ĝp·v[n] + ĝc·c[n].
Before the reconstructed excitation vector becomes target signals in trans-rating process, a process of excitation vector calibration may be applied as shown in
The calibrated excitation vector is then used as the target signal for analysis in excitation space mapping for the output rate of 12.2 kbps. The unquantized adaptive codebook parameters of 4.75 kbps are used as an initial estimate in the closed-loop adaptive codebook search of 12.2 kbps. The adaptive codebook is searched within a small interval of the initial estimate, at the fractional resolution of ⅙ sample required by the 12.2 kbps codec. The adaptive codebook gain is then determined for the best code-vector, and the adaptive code-vector contribution is removed from the calibrated excitation. The result is filtered to produce the target signal for the fixed-codebook search.
The fixed codebook is then searched in the filtered excitation space by a fast technique to obtain the indices that form a 10-pulse codeword vector according to the 12.2 kbps codec. The filtered excitation space is also used to compute the fixed-codebook gain of the 12.2 kbps codec.
The trans-rating from 4.75 kbps to 12.2 kbps can also employ the other noted mapping methods. This allows the trans-rating to adapt to the available computation resources in real-time applications.
Other CELP Transcoders
The invention of adaptive codebook computation described in this document is generic to all multi-rate voice coders and applies to any voice trans-rating in known multi-rate voice codecs such as G.723.1, G.728, AMR, EVRC, QCELP, MPEG-4 CELP, SMV, AMR-WB, VMR and all other future CELP-based voice codecs that make use of multi-rate coding.
The invention has been explained with reference to specific embodiments to enable any person skilled in the art to make or use the invention. Various modifications will be apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without the use of the inventive faculty. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein, as indicated by the claims.