The present invention relates to the coding/decoding of digital audio signals, using the “CELP” (Code Excited Linear Prediction) technique.
Compression-mode encoding of such signals can be required for their transmission or their storage. The signals can be speech signals or, more generally, digitized sound signals. More specifically, this invention relates to the predictive encoding technique in which:
More specifically, the invention relates to the family of CELP (Code Excited Linear Prediction) coders, which select the excitation signal from a set of candidate signals by comparing the output of the synthesis filter, excited by this signal, to the original signal, with the introduction of a perceptual weighting. Such coders have been widely employed for the coding of speech signals in 6 to 24 Kbit/s bit rates, and notably adopted in the ITU-T G.729, GSM-EFR, 3GPP/WB-AMR standards.
The invention is advantageously applicable in hierarchical coding systems described in detail hereinbelow and for which the bitstream is formed by a basic layer followed by supplementary layers for enhancing the quality.
A general diagram of a CELP coder is given in
Details regarding this type of coder/decoder are given in particular in a basic reference:
“Code-Excited Linear Production (CELP): High Quality Speech at Very Low Bit Rates”, B. S. Atal and M. R. Schroeder, ICASSP, 1985, pp. 937-940.
Referring to
The present invention is aimed more at the “fixed” codebook DICf, while what relates to the adaptive codebook DICa is preferably not dealt with below.
The modeling of the excitation signal is generally performed on sample blocks corresponding to signal sub-frames typically of the order of 5 ms. Hereinafter, the case will be considered of a signal sub-frame comprising N samples (for example N=40 samples at 8 kHz sampling frequency). In such a coder, the selection of an optimum code word in a codebook (also called “vector code” or “waveform”) is performed by minimizing the energy of the perceptually-weighted error signal, which is expressed by a relation of the type: E(z)=W(z)(S(z)−{tilde over (S)}(z)), in which the notations E(z), S(z), {tilde over (S)}(z) represent the z transforms, respectively, of the weighted error signal, of the original signal to be coded and of the reconstructed signal.
The filter W(z) is the perceptual weighting filter 11 (conventionally of the type
A(z) designating the LPC analysis filter, and the factors γ1 and γ2 adjust the degree of perceptual weighting).
The weighted error signal E(z) can be expressed by a relation of the type:
The signals excpast(n) and exccurrent(n) respectively represent the passed excitation signal (zero signal on the current block) and current excitation signal (zero-memory signal).
Thus, appropriate respective gains g=gai and g=gfi are applied to the signals c(n)=cai(n) and c(n)=cfi(n) at the output of the adaptive codebook DICa and fixed codebook DICf. Then, these signals are added together to obtain the excitation signal exc(n).
More particularly, in the example of
Also conventionally defined are the compound filter:
and the “filtered target signal” by a relation of the type:
x(z)=H(z)(res(z)−excpast(z)).
Devolving from these relations, for the weighted error signal, is an expression of the type:
E(z)=x(z)−H(z)×exccurrent(z).
The CELP minimization criterion (subsequent modules 13 and 14) is then expressed by a search in a codebook for the waveform {c(n); 0≦n≦N−1} which minimizes the quantity:
or even, which minimizes the ratio with
The elements {h(n)} represent the impulse response of the filter H (defined hereinabove by the above relation (1)).
It is generally considered that the filter H is causal, that is, the elements h(n) such that n<0 are zero. However, hereinafter, the more general case will be assumed in which all or some of the elements h(n) such that n<0 can be non-zero.
Conventionally, the so-called backward filtering technique explained in:
for k ranging from 0 to N−1.
Similarly, it is possible to calculate the self-correlation of the filter H(z) prior to the search in the codebook, and to use it to speed up the calculations of the denominator, with:
in which:
for k and k′ ranging from 0 to N−1.
The optimum gain associated with the selected vector code is quantized. A quantization index and the index associated with the selected vector code are transmitted (via a telecommunication network) or simply stored for a subsequent transmission. The decoding can then take place on the basis of these indices.
In the decoding, referring to
The choice of the excitation codebook is guided by constraints of bit rate, quality (or efficiency for a given bit rate) and complexity. For a restricted bit rate, it will be difficult to obtain a good reproduction quality for any signal to be coded. The complexity is also an important factor. For all the communication applications, the real time constraint imposes limitations on the calculation time. The first CELP codebooks proposed in the literature were formed by randomly-drawn vector codes, which impose calculating the numerator and the denominator of the criterion for each vector of the codebook. The search for the best code word was then prohibitively complex.
Structured codebooks were then proposed to speed up the search for the optimum waveform, certain search calculations being performed once for different input signals (or “common calculations”) using the relations induced between the vectors by the structure of the codebook. One of the most popular categories of structured codebooks is the family of algebraic codebooks, made up of pulses whose position is defined by an algebraic code or even according to an array of points (typically a Gosset array), regular or not. The most conventional representatives of such codebooks are known by the name ACELP (for “Algebraic CELP”). These structured codebooks make it possible to avoid the storage of the code words, a bijective relation that makes it possible to calculate the elements of the vector codes from their index.
Moreover, these codebooks have given rise to rapid searches accelerated by sub-optimal but very effective focused exploration algorithms. Thus, for a multi-pulse codebook, the expressions of the numerator and denominator defined hereinabove are simplified if it is assumed that the vectors of such a codebook are made up of K pulses, of amplitudes sk with k between 0 and K−1 (these amplitudes in practise often being reduced to a simple sign), with:
where ak and ai represent the positions at which the pulses appear.
However, these codebooks, when the bit rate constraint limits their size, present the drawback of a certain lack of richness in content. The pulses become fewer, and, because of this, very sparse. The term “Sparse Codebooks” then applies. All the non-zero samples have the same amplitude and it is difficult to correctly represent the balance in amplitude between the samples of the block with very few pulses. The degradations induced by the use of excessively poor algebraic codebooks are then very audible. They are characterized, for example, by a certain raucousness of the signal.
To overcome these drawbacks, the so-called “sparseness reduction” technique was proposed in U.S. Pat. No. 6,029,125. It proposes enriching a multi-pulse codebook comprising a small number of pulses (and therefore presenting a certain “sparseness”) either by addition with a noise signal, or by filtering using an all-pass filter, which disperses the pulses without modifying the modulus of the spectrum of the signal. Such a filtering acts mainly on the phase. These modifications of the codebook can be introduced after the decoding or can be introduced in the selection process (therefore in the coding).
However, when it is introduced into the coder, the addition of noise hampers the use of rapid algorithms for selecting the optimum waveform. Moreover, the filtering of the fixed codebook presupposes a certain continuity of the process because the filters tend to spread the support of the filtered signal, and, since it is generally not possible to correct the excitation of the preceding block, irregularities at the edge of the coded sample blocks, badly controlled by the process, can appear.
Furthermore, if there is a desire to adapt the type of modification made to the codebook according to the signal, there are no other solutions than to provide different filters that switch from filters to others, which can also generate distortions.
Moreover, as indicated already hereinabove, the technique presented in this document U.S. Pat. No. 6,029,125 seeks to remedy the lack of pulses of a codebook by applying a modification that retains the spectral appearance of the codebook. Now, it is often necessary to enrich the multi-pulse codebooks, by including vector codes that better encode certain parts of the spectrum, in particular the high frequencies, which is incompatible with the solution retained in U.S. Pat. No. 6,029,125.
Other types of codebooks have been proposed to increase performance by maintaining acceptable search complexities. Thus, the cascaded codebooks (or “multi-stage” codebooks), possibly different, give rise to multiple successive CELP searches, each search producing the index of a selected vector code with its associated gain.
The excitation vector is then expressed by:
if it is assumed that a number I of codebooks is cascaded.
The joint search for the code sub-vectors {ci(n)} in the I codebooks can be complex. In practise, a sub-optimal serial search method is used and consists in selecting the optimum waveform in the first codebook and calculating the associated gain, then in quantizing this gain and subtracting the known contribution of this first codebook, which, re-using the expressions given above, is translated by:
The “filtered target signal” is modified to x′(z)=H(z)(res(z)−exc1(z)) and the selection of the subvector of the second codebook is thus made. The process is then repeated for all the successive codebooks.
It should be noted that the use of orthogonal codebooks can also be provided for in this context.
There now follows a brief description of the hierarchical coding structures.
Such structures, also called “scalable”, supply the coding process with binary data that is divided into successive layers. A base layer is formed by the bits that are absolutely necessary to the decoding of the bitstream, and determine a minimum decoding quality. The subsequent layers make it possible to progressively enhance the quality of the decoded signal, each new layer adding new information which, used in the decoding, supplies as output a signal of increasing quality. One of the particular features of the hierarchical coders is the possibility of intervening at any level of the transmission or storage chain to delete a portion of the bitstream without having to provide any particular indication to the coder or to the decoder. The decoder uses the binary information that it receives and produces a signal of corresponding quality.
The composition of the hierarchical coding processing operations includes the concept of coding “layers”. These layers can be constructed by implementing methods deriving from different techniques. As a variant, the different coding layers can be derived from one and the same type of processing, in which it is possible to enhance the quality simply by providing supplementary data. Thus, the hierarchical CELP coders, also called “nested CELP” coders, generally use several codebooks, which can be different at each stage, or identical.
Nevertheless, the cascaded codebooks and the codebooks involved in the hierarchical coding structures still present the same problems as those described previously.
The present invention seeks to improve the situation.
In particular, it aims to remedy the lack of richness, in terms of waveforms and spectral content, of the CELP codebooks at low bit rates, while retaining the very simple decoding and the low complexity associated with these codebooks. It also offers a progressive enrichment of the codebooks, which is of particular interest in the context of the hierarchical coding structures. Another object is to propose an attractive alternative to the so-called “anti-sparseness” techniques and, more generally, to contribute to the enrichment of the sparse codebooks, with a better control of the continuity between successive blocks.
To this end, it proposes a method of constructing a codebook of CELP-type excitation vectors for coding/decoding digital audio signals, each vector of dimension N comprising pulses that can occupy N valid positions.
In the inventive method, an initial codebook (also called “base codebook”) is constructed by:
The expression “sequence of pulses” should be understood here to mean a succession of samples comprising pulses and, where appropriate, one or more zero-samples between the pulses, and/or at the start and/or at the end of the succession.
Preferably, the duly constructed codebook is a CELP excitation codebook of the so-called “fixed” type (referenced DICf for example in
Preferably, the basic pattern appearing on each occurrence in an excitation vector is multiplied by an amplitude associated with said occurrence, this amplitude being, for example, chosen from a set comprising the values +1 and −1.
Preferably again, all the vectors of the initial codebook comprise one and the same number of occurrences of the basic pattern.
Thus, an initial codebook can be defined by:
The invention thus proposes constructing codebooks of CELP excitation vectors, these codebooks being defined by the data of a basic pattern, appearing in one or more occurrences, each occurrence being multiplied by an amplitude. The patterns possibly appearing at the block edge (sample frames or sub-frames) are truncated to be inserted exactly in the block.
In more generic terms, it will be understood that the patterns appearing at the block edge of a vector are truncated and the remaining pulses of the truncated patterns occupy the start or the end of the block.
A codebook obtained by the inventive method, gathering together the vectors of dimension N, is then defined by a basic pattern, that is “shifted” in the block of length N. Each pattern appears in K occurrences that are added together, each occurrence being itself defined by:
It will be noted however that a multi-pulse codebook, well known in the state of the art, constitutes a particular case of a codebook obtained in this way, in as much as the length of a pattern in the case of a multi-pulse codebook is simply 1. This type of multi-pulse codebook will be designated hereinafter “trivial base codebook”.
Moreover, the inventive method makes it possible to construct combinations of codebooks (initial and constructed, as described hereinabove, without excluding also the use of one or more supplementary conventional multi-pulse codebooks).
Thus, a codebook obtained by the inventive method can consist of:
In more generic terms, a global codebook can be constructed by a summation of base codebooks, at least one of which is an initial codebook defined by a basic pattern. The vectors of the global codebook are formed in this case by adding together the common position pulses of the vectors of the base codebooks, preferably weighted one by one by a gain each associated with a codebook.
As a variant, a global codebook can be constructed by a union of base codebooks, at least one of which is an initial codebook defined by a basic pattern. In this case, the global codebook simply comprises all the vectors of all the base codebooks.
The construction of such codebooks already makes it possible to provide a variety of content types. Depending on the form of the basic patterns and their number of occurrences, it will be possible to obtain excitation vectors of varying appearances, possibly having a relatively high number of non-zero pulses. For example, the choice of the basic pattern can be guided by spectral-type considerations. This richness of content does not necessarily require a particularly large codebook size because, by adding together the occurrences of the patterns, it is possible to vary the forms of the excitation vectors with a moderate number of patterns and occurrences. Thus, it is possible to represent excitation vectors that have a spectral content substantially different from that of the conventional multi-pulse codebooks, for sets of equivalent indices.
In such an embodiment, it is possible to provide for the basic pattern to comprise at least one central pulse, preceded and succeeded by at least one pulse of sign opposite to the sign of the central pulse. More specifically, the pattern can comprise in total three pulses, namely:
It then proved advantageous to provide a coding/decoding device comprising a cascading of codebooks, of which at least one initial codebook is subsequent in the cascade, this initial codebook comprising such a symmetrical pattern with central pulse and preceding and subsequent pulses of amplitudes opposite to that of the central pulse. This device can advantageously comprise a high-pass filtering in a global perceptual weighting filter involved in particular in the coding in the search for an optimum excitation vector. One example of such an embodiment will be described in detail below, with reference to
It is stated simply here that this embodiment proposes a cascading of a multi-pulse codebook with a codebook defined by a pattern that is symmetrical in relation to its center, of which the occurrences of the center of the pattern describe the same set as the occurrences of the pulses of the multi-pulse codebook.
This implementation makes it possible to widen the spectral domain of the initial base codebook by the addition of one or more supplementary base codebooks, the search in these supplementary base codebooks then being focused spectrally by modifying the perceptual weighting filter involved in the search for the optimum vector, the choice of this modification and that of the pattern of these supplementary base codebooks possibly being linked.
More generally, in the case of a union or summation of several base codebooks, base codebooks are used for which the centers of the patterns and the associated amplitudes describe the same sets but for different patterns.
Thus, more generically, the positions of the patterns and/or of the pulses in the vectors of the codebooks, particularly when they are cascaded, describe sets that are preferably identical, the position of a pattern being identified substantially by the position of a central pulse in the sequence of pulses forming the pattern.
It is then possible to share the calculations and the rapid processing algorithms because the techniques for searching for a best candidate excitation vector remain rapid in the codebooks constructed in accordance with the invention, since the latter use the particular structure of the conventional multi-pulse codebooks, and make it possible to use effective processing operations put in place for the case of the multi-pulse codebooks.
It is indicated here that the position of a pattern can be identified by the position in the block of the sample of the center of the pattern, if the pattern comprises an odd number of samples. However, in a strictly equivalent manner, any pattern of even length can be complemented by a zero in order to produce an odd length. More generally, any other variant for identifying the position the of patterns can be envisaged.
The invention proposes very simple techniques for decoding the index of the vectors of such codebooks, by adding together the scaled occurrences of the pattern or patterns of which the position and the amplitude factor for each occurrence are transmitted.
In generic terms, in the coding, after determining a best candidate vector in an initial codebook, an index is formed that preferably comprises at least indications:
If a plurality of codebooks are provided, the index also comprises an indication of the codebook in which the best candidate vector has been found. Thus, if the best candidate vector has been found in an initial codebook comprising a basic pattern, the index comprises in particular an indication relating to the abovementioned initial codebook and, from this, an indication as to the basic pattern that made it possible to construct the codebook and therefore the best candidate vector.
In the case of a single base codebook, the index already reflects the amplitude and the position associated with each of its occurrences. To decode the best candidate vector, it is then sufficient to position the basic pattern at the different positions that it must occupy in each occurrence, multiply it by the associated amplitudes, and calculate the sum of the occurrences.
In the case of a union of base codebooks, the index also gives information on the selected base codebook, as indicated previously.
In the case of a summation of base codebooks, amplitudes and positions are available for the occurrences of each basic pattern, and the procedure is then equivalent to the union case, that this time summing the contributions of all the patterns.
The decoding of the indices of the vectors of a codebook according to the invention is very simple.
In the decoding, the best candidate vector is preferably reconstructed from the index:
In the case of a use of a sum of codebooks, the indices of the vectors in each of the codebooks are preferably determined and, from this, the last three steps described hereinabove are applied for each index.
It is possible to speed up the search in the codebooks according to the invention and there appeared to be a particular interest in providing the sets of the positions of the patterns with a strong structure, for example that of the ACELP codebooks, to adapt the very effective rapid search processing operations usually put in place in the ACELP codebooks.
Thus, in more generic terms, the codebook constructed according to the invention preferably comprises accepted pattern positions that describe a strongly structured set, advantageously as a set of positions of pulses of an ACELP codebook.
As indicated hereinabove, in the case of the use of a plurality of codebooks, there is particular interest in providing two or more base codebooks with identical sets of pattern positions, to be able to reuse the same processing operations in the search in the codebooks. Thus, at least one of these codebooks can advantageously be of ACELP type.
The cascading of codebooks including at least one base codebook is very advantageous. This variant is particularly suited to the case of hierarchical coding structures. Nevertheless, the various base codebooks do not serve the same purpose because, typically, the first codebook handles the coding of a minimum quality of the signals that it is desirable to reproduce. The subsequent codebooks are intended more to improve this quality, and will make it possible to consolidate the coding, reduce the sensitivity to the signal type, or address some other factor.
In more generic terms, the cascading of a plurality of codebooks amounts to constructing a single global codebook obtained by summation of the gain-weighted codebooks, as indicated hereinabove.
In this case, each excitation vector corresponds to the sum of vectors deriving from base codebooks multiplied by a gain, the base codebooks being explored one after the other, by subtracting the known contribution of the partial excitation produced by the vectors of the preceding codebooks. Thus, in this advantageous embodiment, the cascaded codebooks are explored one after the other, by subtracting, for a current codebook, a known contribution of a partial excitation produced by the vectors of at least one preceding codebook, which confers a hierarchical coding structure.
In a particularly advantageous way, the search in a codebook according to the invention for a best candidate excitation vector is performed according to an estimation of a CELP criterion, that is little changed from the prior art and then comprises the following steps:
It will be understood that, since the search can reveal basic patterns at the block edge, the estimation of the CELP criterion is slightly modified by the addition of the steps c) and e), compared to the estimation of the CELP criterion according to the prior art.
Moreover, simplifications to the optimum search algorithms of the base codebooks are also proposed when the relative energy of the parts to be truncated is low compared to those of the parts that remain in the block for the edge patterns. In this case, one of the steps c) and e), or both, can be omitted.
Other simplifications are also proposed, aiming to truncate the impulse responses of the synthesis filters multiplied by the perceptual filter, and to truncate the convoluted filter vector calculated in the step a).
The present invention targets not only the method defined hereinabove but also the codebook, itself, of CELP excitation vectors, that can be constructed by a digital audio signal coding/decoding device, by implementing the inventive method.
It also targets a computer program comprising instructions for implementing the method of constructing a codebook as defined hereinabove.
It also targets the digital audio signal coding/decoding device, comprising at least one codebook according to the invention. Typically, an advantageous embodiment consists in providing a device including means (such as a processor, a calculation memory, etc.) for generating the CELP excitation vectors of one or more codebooks, at least one of which is a codebook to be constructed by implementing the inventive method.
Advantageously, these codebooks can be constructed by executing a computer program of the above-mentioned type, then stored in a memory of such a coding/decoding device, for example by the use of an algebraic law associating the vector indices with the vector codes themselves (as, for example, in the ACELP technique).
The present invention also targets a use of such a device for coding/decoding digital audio signals (therefore typically a coding/decoding method), and the computer program intended for a digital audio signal coding/decoding device, and comprising instructions for implementing such a use.
Generally, all or some of the general and optional characteristics expressed hereinabove can be applied equally for the construction of the codebook, and for the codebook itself, or for the coding/coding device comprising at least one duly constructed codebook or for the use of such a device, or even for the computer program generating the codebook or for the computer program enabling the use of the device.
Thus, the invention proposes codebooks of CELP-type excitation vectors and their use, which offer a great potential richness of content for a moderate size. The decoding of the associated indices is not very complex, despite this variety of forms. It is also possible to put in place rapid algorithms for selecting the optimum vector, by exploiting the particular composition of these codebooks.
It will be remembered that the present invention proposes a category of CELP codebooks that allows for the encoding of a wide variety of excitation signals for relatively moderate bit rates, and also offering rapid and effective algorithms for selecting the appropriate vector.
Other characteristics and advantages of the invention will become apparent from studying the detailed description below, and the appended drawings in which, besides
a illustrates a basic pattern for the implementation of the invention,
b and 3c respectively illustrate a first A0 and a second A1 set of positions of the first and the second occurrences of a basic pattern,
d illustrates an example of vector code selected by the implementation of the invention,
a and 8b illustrate a first and a second base codebook in an exemplary embodiment of the present invention to refine a CELP coder according to the G.729 standard,
c compares the appearance of the mean spectra of the waveforms of the codebook of
Referring firstly to
The vector codes of a base codebook are obtained by defining a basic pattern y(j) (−p≦j≦p) as a sequence of samples (
As an example, the broken line box bearing the reference D2 in
Thus, in more generic terms, out of the accepted positions for the basic patterns in each block of an excitation vector, pattern positions are such that patterns overlap at least partially (case of the vector V2n). In this case, the pulses of the patterns that overlap are added together.
It will be noted that the formulation given hereinabove: y(j) (−p≦j≦p), presenting the advantage of clarifying the subsequent developments, seems to impose a priori an odd number 2p+1 of elements in the basic pattern (−p≦j≦p). In fact, as mentioned previously, this particular feature is by no means necessary for implementing the present invention. If a pattern having an even number of elements is to be used, all that is required is to add a zero element to one of the edges, and the formulation applied here can still be used.
Each vector {c(n)} of a base codebook, of dimension N, is constructed by adding K occurrence vectors yk such that:
YK={sk×yk(n)}, for n ranging from 0 to N−1 and k ranging from 0 to K−1.
These vectors are made up of a basic pattern assigned a given amplitude, truncated if necessary at the edge or edges and complemented by zeroes.
Each occurrence k is characterized:
b and 3c illustrate such a codebook for which in particular K=2. The first occurrence is characterized by the center a0 which can be located at the five positions of a set of positions
and by the amplitude s0∈S0={±1} (
and by the amplitude s1∈S1={±1} (
5 (positions A0)×4 (positions A1)×2 (polarities for A0)×2 (polarities for A1)=80 vector codes.
An example of vector code for this codebook (defined by the positions a0=a01 and a1=a13 and by the amplitude s0=+1 and s1=−1) is given in
The following therefore applies:
Which can also be expressed:
using the Kroenecker δ(.) and truncation t(n)=0 if n∉[0, N−1] functions.
Each vector {c(n)} is defined by the set of the positions of the centers of the basic patterns of each of the occurrences of which it is composed
where Π designates the Cartesian product of the sets, and by the set of the amplitudes
associated with the different occurrences.
The components c(n) (0≦n≦N−1) are obtained by summation of the (any) contributions of the K vectors yk to the sample n, according to the relation:
If the vectors {c0(n)} of dimension (N+2p) are defined such that:
The vectors {c(n)} of the base codebook are deduced from the vectors {c0(n)} by convolution with the basic pattern y and truncation at the edges of the segment [0, N−1].
It can be seen that the vectors {c0, (n)} are defined by the data of the centers
of the basic patterns and that of the amplitudes
If the centers are ordered structurally, it will be understood that it is possible to exploit this structure to define rapid algorithms in order to speed up the selection of the vector code in the codebook.
The truncation function t(n) introduces non-linearities into the expression of c(n), which can be dispensed with by extending the vector {c(n)} of dimension N to the vector {c′(n)} of dimension (N+2p):
It is therefore possible to reveal three parts in the vector {c′(n)}:
c′(n)=cc(n)+cg(n)+cd(n); −p≦n≦N−1+p
The central part
corresponds to the convolution of {c0(n)} with the basic pattern and its components in the intervals of the edges, [−p, −1] and [N, N+p−1] are non-zero a priori.
The other two terms cancel any non-zero components of the edges of cc(n) and correspond to the effects induced by the possible truncation of the pattern at the edges:
There now follows a description of the search for a vector code in a base codebook.
It will be remembered that the CELP criterion to be maximized:
involves calculating two quantities: the numerator Num and the denominator Den.
The vector {cw(n)} of dimension (N+2p) is defined by the convolution of the vector {c′(n)} given hereinabove with the impulse response of the filter H(z). However, in the selection of the optimum waveform, only the N central elements of this vector are used.
In this expression, the central factor
is calculated by introducing the vector {h′(i)}, corresponding to the convolution of the impulse response of the filter H with the basic pattern (or
The following is then obtained:
It will be remembered that the central factor is then expressed as follows:
The “left edge” factor
or
is also expressed:
by introducing the set
combining for the K sets Ak, k∈[0, K−1], the positions −2p≦ak<p.
The number of terms in the factor bg(n) depends on the definition domains Ak of the centers ak of the basic pattern in the K occurrences. However, for the patterns to overlap the current block at least partially, it is important to avoid the center being too distant from the first sample of this block, by more than p samples. This condition is expressed ak≧−p, which leads to:
By defining the function
the “left edge” factor is then expressed
It will be noted that the latter expression involves, for each occurrence k, only the values ak of the centers which are in the range [−p, p−1].
The “right edge” factor is expressed at the outset
and, to repeat the principles applied to the left edge hereinabove:
In a symmetrical manner to the preceding case, the center of the pattern is at most p samples away from the right edge, which leads to ak≦N+p−1, therefore:
By defining a function
it is also possible to express:
The number of non-zero elements h′″(n,j) thus depends on the number of non-zero elements h(n) such that n<0. If it is assumed that the filter H(z) is causal, all the elements bd(n) such that n≦N−1 are zero.
Therefore, in the case of a causal filter in which h(n)=0 if n<0, the right edge effects have no impact on this calculation.
Hereinafter, it will be assumed that a pattern cannot be truncated on both sides at once. The contrary case would mean that a pattern can be of a size greater than the length N of the block, the invention nevertheless possibly being applied also to this latter case.
There now follows a description of the application of the CELP criterion with a codebook according to the invention.
The numerator can be calculated as follows:
The “central” term
is similar to the usual expression of the numerator of the criterion for selecting the optimum waveform in a multi-pulse codebook. As in the conventional search,
is defined and this “central” term then becomes
It is possible to obtain a similar expression for all the numerator of the codebook according to the invention by posing:
which amounts to adding a correction to the elements d(ak) for the centers ak that belong to the sets Γg and Γd, that is corresponding to occurrences in which the pattern, placed at the edge, requires a truncation.
then applies, which is similar to the numerator of the search for the best waveform of a conventional multi-pulse type codebook.
The denominator is calculated as follows:
The “central” term is conventionally expressed by:
is an element of the self-correlation matrix of the vector {h′(n)}. For the search for the optimum waveform, only the elements of the self-correlation matrix involving the positions of the centers of the pattern in the different occurrences are used.
The latter expression is again similar to that of the denominator in the case of a conventional multi-pulse codebook.
On the other hand, for all the denominator estimated in the CELP criterion with a codebook according to the invention, a self-correlation function is introduced that is modified in the way presented in the table of
The modified matrix thus makes it possible to express the denominator of the search in the codebook according to the invention in the form:
which is identical to that of the denominator for the search in a conventional multi-pulse codebook.
There now follows a description of the search proper in the codebook according to the invention.
Referring to
The convolution vector of the impulse response of the filter H is calculated (step 51) with the basic pattern:
The elements
of the correlation vector between the target vector x(n) and the vector {h′(i)} (obtained in the step 51) are then calculated (step 52).
These elements (general step 53 of
The elements of the self-correlation matrix of {h′(i)} are then calculated (step 57) for the determination of the denominator:
These elements (general step 63 of
The search for the best waveform is then performed (step 61) using the conventional CELP search criterion, expressed as the maximization of a ratio in which the numerator implements the vector {d′(ak)} and the denominator the elements Φ′(ak,al), to ultimately obtain the best vector code VC (step 62).
It should be indicated here that
The search for the waveform in a base codebook according to the invention ultimately boils down to that, which is known and effective, of the search in a conventional multi-pulse codebook. In particular, if the positions of the centers ak∈Ak of the occurrences k (ranging from 0 to K−1) of the patterns describe the positions of the pulses of ACELP-type structured codebooks, it will be possible to use the effective rapid algorithms developed for such ACELP codebooks.
It has been assumed that the pattern is of a size less than that of the block. However, in the contrary case, all that is needed is to introduce a zone Γg∩Γd in which the two corrections are applied, with no loss of generality of the method.
Simplifications of the above method can also be provided. For example, when the relative energy of the elements that are supplanted in the truncation operation is low compared to the energy of the elements that remain in the block, for the occurrences at the edges, it may be possible to provide simply for disregarding the edge effects (without then conducting the tests 54 and 58). In this case, at least one (preferably the step 63) or both of the correction steps 53 and 63 can be simply eliminated.
There now follows a description of a few possible compositions of the base codebooks.
Two combination methods can be provided to offer a global codebook capable of providing varied representations of the waveforms, in particular to offer a very satisfactory spectral richness. In practise, it is possible to direct the content of each base codebook to one or several signal categories.
Union of Base Codebooks
The union of base codebooks makes it possible to provide a single codebook, each part of which corresponds to a base codebook. For a signal portion that will be better represented by one of the base codebooks, the best waveform can then be found in this base codebook to represent this signal portion.
For a given excitation vector to be coded, each of the base codebooks is preferably explored separately, the best waveforms deriving from the search in each base codebook then being compared to each other to select the most appropriate of them. The complexity of the search is in this case equivalent to the sum of the complexities of the searches in each base codebook. The rapid searches, induced by the advantageous structure of the base codebooks as seen previously, have proven very effective.
Exploration variants can also be proposed. For example, it is possible to determine firstly one (or several) base codebooks out of the codebooks that make up the global codebook, then to limit the search to the duly preselected base codebooks.
The decoding of the indices can be conducted by first identifying the base codebook that has been selected (for example by comparing the index of the selected vector code to values stored in memory corresponding to the boundaries of the base codebooks in the complete codebook). Then, the index of the vector code is decoded in the base codebook in the way indicated previously.
Sum of Base Codebooks
This implementation is advantageous. The object is to construct and use codebooks adding the vectors of the base codebooks to exploit characteristics specific to its component base codebooks, but also to exploit their combined characteristics.
Thus, in the case of a sum of codebooks, the vectors of the codebooks are simply formed by adding, one by one and sample by sample, all the vectors of the base codebooks, possibly weighted by gains as in the second embodiment described below.
In practise, two embodiments are proposed hereinbelow for taking the sum of several codebooks.
In a first embodiment, the global codebook D=D1+D2 is obtained by adding together the waveforms deriving from each base codebook.
Then, here, a single gain associated with a given sum is defined. Thus, there is still the benefit of the advantage relating to the simplicity of the decoding using codebooks, at least one of which is a base codebook. In practise, a vector code belonging to a base codebook D2 can be represented by indicating the positions at the centers of the patterns and the amplitudes of the occurrences in the different codebooks, that is, for the different patterns, and by then adding together the scaled then duly placed patterns.
The components of the vector codes of such a codebook, obtained by the summation of I base codebooks, is expressed by a relation of the type:
and the current excitation vector is expressed:
It can also be advantageous to adapt the rapid algorithms proposed in the context of a single base codebook to the sum of codebooks described hereinabove. As an illustrative example, consider the sum of two base codebooks, which is expressed:
where the indices 1 and 2 relate respectively to the vectors deriving from the first pattern γ1 and the second pattern γ2, encountered in K1 and respectively K2 occurrences.
As in the case of a single base codebook described previously, it is possible to define vectors {h1′(i)}, {h1″(i, ak); ak∈Γg1}, {h1′″(i, ak); ak∈Γd1} corresponding to the first pattern and vectors {h2′(i)}, {h2″(i, ak); ak∈Γg2}, {h2′″(i, ak); ak∈Γd2} corresponding to the second pattern. The conventional expressions of the numerators and denominators of the searches in multi-pulse codebooks again apply, provided that the expressions of the correlation vectors are adapted as follows.
For the intercorrelation with the target vector, it is possible to calculate modified vectors {d′1(ak)} and {d′2(ak)} as proposed above and the numerator is then expressed:
The case of the denominator is, however, more complicated because, in addition to the self-correlations
φ1′(ak,al); ak∈Ak1, al∈Al1 and φ2′(ak,al); ak∈Ak2, al∈Al2
defined above, the correlations between the occurrences of the first pattern and those of the second pattern have to be involved. Thus, for example, for center values ak1∈Ak1 such that ak1∉Γg1∪Γd1 and a12∈A12 such that a12∉Γg2∪Γd2, with k<1, the following must be calculated:
These expressions become fairly complicated in the general case, even though they remain within the scope of those skilled in the art.
The denominator can still be expressed according to a relation of the type:
in such a way that it is still possible to calculate the elements of a modified self-correlation matrix and, here again, the accelerated search algorithms of the multi-pulse codes can be used.
A second embodiment of a sum of base codebooks gives rise to simpler search algorithms. The principle consists of cascading the summation of the base codebooks, a different gain being associated with each subvector deriving from the base codebooks. In this case, the excitation vector is expressed by:
This variant is very advantageous in terms of complexity.
It presents even more advantages. Since each base codebook is more particularly intended to enrich the global codebook and, for example according to a particular type of excitation signals, it can be advantageous to use different perceptual filters Wi(z) (for i ranging from 0 to I−1) for the different searches in the base codebooks. For example, it is possible to use a first base codebook more suitable for representing the low frequency part of the excitation signal, and a second base codebook intended more to represent the high frequency part. It will then be particularly advantageous is such a scheme to favor the high frequency part of the spectrum in the search in the second base codebook. For example, in the second search, the conventional perceptual filter can be cascaded with a high-pass filter. Such an operation could moreover be qualified as “spectral focusing”. It will be described in detail later, with reference to
Finally, this second embodiment is advantageously suited to hierarchical CELP coding structures. In practise, in these structures, the bitstream is hierarchically organized and, in the implementation of this second embodiment, the bits corresponding to the indices and to the gains of each of the sub-vector codes of the base codebooks can form separate hierarchical layers (or “participate” in separate layers). If the decoder receives only a part of this information, it can reconstruct at least a part of the excitation by decoding the received indices and gains associated with the sub-vector codes of the base codebooks of the first layers and by adding together the partial excitations obtained in this way.
As indicated above the first base codebook then handles the minimum quality coding and the subsequent ones provide a progressive increase in quality and the better inclusion of the possible variety of the signals, for example by offering a broad spectral content.
There now follows a description of an embodiment of the invention applied to an existing coder/decoder.
The exemplary embodiment described hereinbelow is located in the context of a hierarchical CELP coder producing a bitstream comprising two layers, a first layer of which corresponds to the “core” coding of the hierarchical structure, which operates at the bit rate of 8 Kbit/s and a second layer provides a quality enhancement for four additional Kbit/s, which produces a total bit rate of 12 Kbit/s. The bitstream of the first layer is “compatible” with that of the ITU-T G.729 standardized coder so that a coder or respectively a decoder according to the invention can operate with a decoder or respectively a coder conforming to the G.729 standard and its appendices for the bit rate of 8 Kbit/s.
In the proposed exemplary embodiment, the hierarchy is provided by the use of a codebook according to the variant of cascaded summation of the base codebooks according to the invention. The block size is 5 ms, or 40 samples at 8 kHz.
The first base codebook D1 (
It is therefore a base codebook associated with a pattern restricted to the central pulse (p=0), with K=4 occurrences, the sets S0, S1, S2, S3 being given in the second column of table 2, and the sets A0, A1, A2, A3 in the last column.
The second base codebook D2 (
The number of occurrences, the amplitudes and the positions of the centers of the pattern are identical to those of the first codebook.
c shows the appearance of the mean spectra of the waveforms of the first codebook (arrow D1) and of the second codebook (arrow D2). It can be seen that the first codebook presents a spectrally flat content whereas the second codebook is richer in high frequencies.
This observation makes it possible to enhance the quality obtained by the first coding layer, which provides a good quality playback for the speech signals in the low-frequency part of the zone [300-3400 Hz], and tends to decrease in energy and in fidelity on approaching the high frequencies.
To better focus the search in the second base codebook on the high frequencies of the spectrum, when exploring this second codebook, a supplementary high-pass filter Hp(z) is applied to the filter W(z).
The search in the first base codebook D1 is known and uses, for example, one or other of the rapid and focused algorithms described in the G.729 standard and its reduced complexity appendix A (ITU-T Recommendation G.729, “Annex A: Reduced complexity 8 Kbit/s CS-ACELP speech codec”, November 1996).
The search in the second base codebook D2 also exploits this rapid algorithm, as described above.
In the interests of legibility hereinbelow, all the indices “2” relating to the second codebook will be omitted in the following (for example H2(z) becomes H(z), c2w(n) becomes cw(n), and so on).
According to a first simplification, the impulse response of the filter
is truncated to the elements h(n) such that 0≦n≦39 (recalling that the length of the blocks N=40).
The vector {cw(n)} is therefore defined for −1≦n≦40. As mentioned above, the right edge is not involved (bd(n)=0) because of the fact that h(n)=0 for n<0 (causal filter).
It will also be seen that the positions of the centers ak are all in the block [0,39].
In these conditions, the set
comprises only a single element, namely the position a0=0, in the set A0 only and corresponding to the first position of the tri-pulse pattern on the first occurrence: Γg={0}.
As mentioned previously, the convolution vector of the impulse response of the filter H with the basic pattern is calculated first (first step referenced 51 in
h′(n)=−αh(n+1)+h(n)−αh(n−1)
Since h(n) is zero for n≦0 or n≧40, h′(n) is however non-zero a priori for −1≦n≦40.
To calculate the numerator
of the CELP criterion, the intercorrelation
it is first calculated (step 52), modified (general step 53) to:
The correction to be made is therefore limited to correcting the first element:
The sets Ak cover all the positions of the block [0,39]. It is therefore necessary to calculate d′(j) for any of 0≦j≦39, with the relation:
For the denominator, the self-correlations must be calculated (step 57):
(It will be recalled that the notation k=x→y actually means: “for k ranging from x to y”).
The constraint h′(n)=0 for n<−1 leads to
for any pair of elements (i,j) with i<j, bearing in mind that Φ(i,j)=Φ(j,i).
The correction (step 60) to be made to the elements Φ′(ak,al) to take account of the left edge is as follows:
It is therefore ultimately not necessary to calculate h′(40), only the elements h′(n), with −1≦n≦39, involved in the calculation. It will be recalled that the other elements Φ(ak,ak), with ak≠0, and Φ(ak,al) with ak≠0, a1≠0, do not have to be corrected and Φ′(ak,al)=Φ(ak,al) is set in this case (step 59 of
Additional simplifications can also be provided, in particular for a small coefficient α. In practise, for the calculation of the denominator, if the elements are expressed h′(n)=−αh(n−1)+h(n)−αh(n+1), it is possible to show the self-correlation function:
i,j=0−>N−1 of the filter H(z).
A decision can then be taken to disregard all the terms involving elements of this matrix when they are multiplied by α2.
Furthermore, there is no need to take account of the edge effects in the calculation of the denominator, given that they are little involved in the sum
bearing in mind that p=1 and α is substantially less than 1.
Consequently, the edge effects can be disregarded both on the numerator and on the denominator.
Finally, it is possible to introduce to an additional simplification that makes it possible to calculate the elements of the self-correlation matrix of the second base codebook in exactly the same way as that of the first. This simplification involves truncating {h′(n)} in the range [0,39]. The error produced in this way depends on the value of α but also on the gradient of the spectrum. Typically, for a signal with a strong energy concentration in the low frequencies, the value of h(0) is of the same order as that of the adjacent elements and it will be understood that h′(−1)=−α×h(0) that has little influence on the calculation.
Of course, the present invention is not limited to the embodiment described hereinabove by way of example; it extends to other variants.
Generally, the codebooks defined by the implementation of the invention offer a wide flexibility of use. Since each block is totally independent of those that precede it or follow it, it is possible to use for one block a codebook that is totally different from that used for the adjacent blocks with no particular precautions. Any problems of continuity are thus avoided. It is then very easy to adapt the codebooks used to the signal to be coded, for example by modifying the pattern or patterns used for the base codebooks. Provision can also be made to modify the sets that define the positions of the centers of the patterns in the occurrences and/or the sets of amplitudes. These possible modifications are, for example, particularly suited to the case of source-governed variable bit rate coders.
Number | Date | Country | Kind |
---|---|---|---|
06 01563 | Feb 2006 | FR | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/FR2007/050780 | 2/13/2007 | WO | 00 | 11/21/2008 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2007/096550 | 8/30/2007 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
5327519 | Haggvist et al. | Jul 1994 | A |
6449313 | Erzin et al. | Sep 2002 | B1 |
6928406 | Ehara et al. | Aug 2005 | B1 |
20020138256 | Thyssen | Sep 2002 | A1 |
Number | Date | Country |
---|---|---|
0 967 594 | Dec 1999 | EP |
Number | Date | Country | |
---|---|---|---|
20090222273 A1 | Sep 2009 | US |