The present invention relates to coding and decoding audio signals.
A parametric coding scheme, in particular a sinusoidal coder, is described in PCT patent application No. WO 00/79519-A1 (Attorney Ref. N 017502) and European Patent Application No. 01201404.9, filed Apr. 18, 2001 (Attorney Ref. PHNL010252). In this coder, an audio segment or frame is modelled using a number of sinusoids represented by amplitude, frequency and phase parameters. Once the sinusoids for a segment are estimated, a tracking algorithm is initiated. This algorithm tries to link sinusoids with each other on a segment-to-segment basis. Sinusoidal parameters from appropriate sinusoids from consecutive segments are thus linked to obtain so-called tracks. The linking criterion is based on the frequencies of two subsequent segments, but amplitude and/or phase information can also be used. This information is combined in a cost function that determines the sinusoids to be linked. The tracking algorithm thus results in sinusoidal tracks that start at a specific time instant, evolve for a certain amount of time over a plurality of time segments and then stop.
The construction of these tracks allows for efficient coding. For example, for a sinusoidal track, only the initial phase has to be transmitted. The phases of the other sinusoids in the track are retrieved from this initial phase and the frequencies of the other sinusoids. The amplitude and frequency of a sinusoid can also be encoded differentially with respect to the previous sinusoids. Furthermore, tracks that are very short can be removed. As such, due to the tracking, the bit rate of a sinusoidal coder can be lowered considerably.
Tracking is therefore important for coding efficiency. However, it is important that correct tracks are made. If sinusoids are incorrectly linked, this can increase the bit rate unnecessarily or degrade the reconstruction quality.
It is known, however, that sinusoid frequencies within segments of lengths in the order of 10–20 ms can be non-stationary, making the sinusoidal model less adequate. Take, for example, a harmonic signal which is continually increasing in pitch. If a single sinusoid is used to estimate, say, the average of the fundamental frequency within a segment, then subtracting this sinusoid from the sampled signal leaves a residual harmonic frequency which the sinusoidal coder will attempt to fit with a high frequency harmonic. These “ghost” harmonics may then be matched in the tracking algorithm and included in the final encoded signal. When decoded, that signal will include some distortion, and it will have required a higher bit rate than necessary to encode.
In PCT Application No. WO00/74039 and R. J. Sluijter, A. J. E. Janssen, “A time warper for speech signals” IEEE Workshop on Speech Coding, Porvoo, Finland, Jun. 20–23, 1999, pp. 150–152 there is disclosed a time warper to enhance the stationarity of an audio segment.
Sluijter et al disclose a method to obtain a warp parameter a for a segment. By warping the segment with a warp function of the form:
in which T represents the duration of the segment in seconds, t represents real time and τ stands for the warped time, the time warper removes the part of the frequency variation which progresses linearly with time, without changing the time duration of that segment.
By applying the time warper proposed by Sluijter et al, the problem of non-stationarity of frequencies can be alleviated, and so a sinusoidal coder can more reliably estimate the frequencies within a warped segment. Sluijter et al also disclose the transmission of the warp factor in a bit-stream so that the warp factor may be used in synthesizing warped sinusoids within a decoder.
As an example of the improvements provided by Sluijter et al, a harmonic signal is used where the fundamental frequency is changing rapidly.
By doing the estimation on segments time-warped according to Sluijter, all frequencies are estimated correctly, as can be seen in
This is because once a group of frequencies has been estimated for one segment, the tracking algorithm attempts to link these with the group of frequencies of the next segment without taking into account the frequency variation of sinusoidal components within sequential segments. So as shown in
The present invention attempts to mitigate this problem.
According to the present invention there is provided a method of encoding an audio signal, the method comprising the steps of claim 1.
A first embodiment of the invention provides a method of using the time warper in the tracking algorithm of a sinusoidal coder. By applying a warp factor, more accurate tracks are obtained. As a result, the sinusoids can be encoded more efficiently. Furthermore, a better audio quality can be obtained by improved phase continuation.
In the first embodiment, the method disclosed in Sluijter et al for determining a warp factor is employed. Preferably, the warp factor of Equation 1 is employed in the tracking algorithm. Since the warp factor indicates the frequency variation that progresses linearly with time, it can be used to indicate the direction of the frequencies. Therefore, this factor can improve the tracking algorithm.
In a second embodiment of the invention, linking sinusoidal components is based on generating a polynomial to fit a number of the last frequency parameters of a track and extrapolating the polynomial to generate an estimate of the next value of frequency parameter of the track. A sinusoidal component of a subsequent segment in the track is linked or not according to the difference in frequencies between the estimate and the frequency parameter of the sinusoidal component.
An advantage the second polynomial fitting embodiment can have over the first warp factor based embodiment is that it does not make any assumption about the signal model, i.e. it does not presume that all tracks or at least contiguous groups of tracks are varying in the same manner. So, if an audio signal contains two main audio components, one decreasing in frequency and the other one increasing in frequency, both can be tracked successfully, whereas this would be less likely to be the case with the first embodiment.
By making more accurate tracks, coding efficiency is increased and better phase continuation is achieved.
FIGS. 9(a) to 9(c) show tracks formed according to a second embodiment of the invention.
In preferred embodiments of the present invention,
In both the earlier case and the preferred embodiments, the audio coder 1 samples an input audio signal at a certain sampling frequency resulting in a digital representation x(t) of the audio signal. The coder 1 then separates the sampled input signal into three components: transient signal components, sustained deterministic components, and sustained stochastic components. The audio coder 1 comprises a transient coder 11, a sinusoidal coder 13 and a noise coder 14. The audio coder optionally comprises a gain compression mechanism (GC) 12.
The transient coder 11 comprises a transient detector (TD) 110, a transient analyzer (TA) 111 and a transient synthesizer (TS) 112. First, the signal x(t) enters the transient detector 110. This detector 110 estimates if there is a transient signal component and its position. This information is fed to the transient analyzer 111. If the position of a transient signal component is determined, the transient analyzer 111 tries to extract (the main part of) the transient signal component. It matches a shape function to a signal segment preferably starting at an estimated start position, and determines content underneath the shape function, by employing for example a (small) number of sinusoidal components. This information is contained in the transient code CT and more detailed information on generating the transient code CT is provided in WO 01/69593-A1.
The transient code CT is furnished to the transient synthesizer 112. The synthesized transient signal component is subtracted from the input signal x(t) in subtractor 16, resulting in a signal x1. If the GC 12 is omitted, x1 = x2.
The signal x2 is furnished to the sinusoidal coder 13 where it is analyzed in a sinusoidal analyzer (SA) 130, which determines the (deterministic) sinusoidal components. It will therefore be seen that while the presence of the transient analyser is desirable, it is not necessary and the invention can be implemented without such an analyser. In any case, the end result of sinusoidal coding is a sinusoidal code CS and a more detailed example illustrating the conventional generation of an exemplary sinusoidal code CS is provided in PCT patent application No. WO 00/79519-A1 (Attorney Ref: N 017502).
In brief, however, such a sinusoidal coder encodes the input signal x2 as tracks of sinusoidal components linked from one frame segment to the next. The tracks are initially represented by a start frequency, a start amplitude and a start phase for a sinusoid beginning in a given segment—a birth. Thereafter, the track is represented in subsequent segments by frequency differences, amplitude differences and, possibly, phase differences (continuations) until the segment in which the track ends (death). In practice, it may be determined that there is little gain in coding phase differences. Thus, phase information need not be encoded for continuations at all and phase information may be regenerated using continuous phase reconstruction.
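The birth, continuation and death representation described above can be illustrated with a minimal Python sketch. The names `EncodedTrack`, `encode_track` and `decode_track` are hypothetical and are not taken from the referenced applications; quantization and entropy coding are omitted, and phase is carried only at the birth on the assumption that continuous phase reconstruction is used for continuations.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class EncodedTrack:
    """Hypothetical container: one birth followed by per-segment continuations."""
    start_segment: int
    birth: Tuple[float, float, float]  # (frequency in Hz, amplitude, phase in rad)
    # One (frequency difference, amplitude difference) pair per continuation.
    continuations: List[Tuple[float, float]] = field(default_factory=list)


def encode_track(start_segment, freqs, amps, start_phase):
    """Store absolute values for the birth and differences thereafter."""
    track = EncodedTrack(start_segment, (freqs[0], amps[0], start_phase))
    for i in range(1, len(freqs)):
        track.continuations.append((freqs[i] - freqs[i - 1],
                                    amps[i] - amps[i - 1]))
    return track


def decode_track(track):
    """Rebuild absolute frequencies and amplitudes by accumulating differences."""
    f, a, _ = track.birth
    freqs, amps = [f], [a]
    for df, da in track.continuations:
        f += df
        a += da
        freqs.append(f)
        amps.append(a)
    return freqs, amps
```

In this sketch the death of a track is implicit: it is the segment after the last stored continuation.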
In both the first and second embodiments of the invention, the extent of warping of tracks from one segment to the next is taken into account when linking sinusoids from one segment to the next. In the first embodiment of the invention, to include a time warp factor in the generation of tracks, the frequencies that are used by the tracking algorithm portion of the sinusoidal coder have to be modified. If no warping is applied, the following equation is evaluated for each frequency in frame k and frame k+1:
$D_f = |e(f_{k+1}) - e(f_k)|$, Equation 2
where $e(\cdot)$ denotes an arbitrary mapping function, e.g. $e(\cdot)$ is the frequency in ERB, and f denotes a frequency in a frame. So in the example of
In the first embodiment, the warp factor is used in the sinusoidal coder tracking algorithm as follows. The frequencies of frame k and frame k+1 are transformed to frequencies $\tilde{f}_k$ and $\tilde{f}_{k+1}$ as follows:
where $a_i$ is the warp factor of frame i, T is the segment size on which a is determined (e.g. 32.7 ms), and L is the update interval of the frequencies (e.g. 8 ms). As will be seen from the second embodiment below, the invention is not limited to the above formula or particular method for determining a warp factor as disclosed by Sluijter et al. Neither is an even division of the update interval required, so that, rather than L/2, an $L_1$ may be used to determine $\tilde{f}_{k,1}$ and an $L_2$ used to determine $\tilde{f}_{k+1,2}$, where $L_1 + L_2 = L$.
The frequencies $\tilde{f}_{k,1}$ and $\tilde{f}_{k+1,2}$ thus take into account the time warp factor. Now the tracking algorithm, when determining frequency differences from one segment to the next, uses a modified Equation 2 as follows:
$D_f = |e(\tilde{f}_{k+1,2}) - e(\tilde{f}_{k,1})|$, Equation 4
This will, for example, produce frequency differences δ3 and δ4,
By applying the tracking algorithm, that includes the time warp factor, on the examples of
In the first embodiment, the warp factor is further used to save bit rate for transmitting modified frequency differences from segment to segment. Equation 2 shows that by transmitting difference Df (and a sign bit), frequency fk+1 can be obtained from frequency fk. In the first embodiment, however, frequency differences according to equation 4 together with a warp factor and sign bits are transmitted.
By using entropy coding to encode frequency differences within this more defined frequency difference profile, the resulting signal will therefore either require fewer bits or be of higher quality. This is because, for a given quantization scheme, more of the values fall on the most frequently used, and therefore most compactly coded, symbols; alternatively, a more focused quantization scheme should give better discrimination at the same bit rate.
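The frequency-difference criterion of Equations 2 and 4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the coder's actual tracking algorithm: the ERB-rate approximation of Glasberg and Moore is used as one possible choice for the mapping e(.), the greedy one-to-one assignment is a simplification of the cost function mentioned earlier, and the function names are hypothetical. Because the warp-based frequency transformation is not reproduced in this text, it is passed in as an optional callable; with no callable the sketch evaluates Equation 2.

```python
import math


def erb_rate(f_hz):
    """ERB-rate scale (Glasberg and Moore approximation); one possible e(.)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def link_segments(freqs_k, freqs_k1, max_cost,
                  transform_k=None, transform_k1=None, e=erb_rate):
    """Greedily link frequencies of frame k to frame k+1 by smallest D_f.

    transform_k / transform_k1 are hypothetical hooks for the warp-based
    frequency modification; identity is used when they are omitted.
    Returns a sorted list of (index in frame k, index in frame k+1) pairs.
    """
    tk = transform_k or (lambda f: f)
    tk1 = transform_k1 or (lambda f: f)

    # Collect every admissible pair together with its cost D_f.
    candidates = []
    for i, fa in enumerate(freqs_k):
        for j, fb in enumerate(freqs_k1):
            cost = abs(e(tk1(fb)) - e(tk(fa)))
            if cost <= max_cost:
                candidates.append((cost, i, j))

    # Cheapest pairs first; each sinusoid is used in at most one link.
    candidates.sort()
    used_i, used_j, links = set(), set(), []
    for cost, i, j in candidates:
        if i not in used_i and j not in used_j:
            links.append((i, j))
            used_i.add(i)
            used_j.add(j)
    return sorted(links)
```

With a tight `max_cost`, a rapidly rising harmonic pair such as 200 Hz and 400 Hz followed by 260 Hz and 520 Hz produces no links unless the frequencies are first modified to account for the frequency variation, which is the situation the first embodiment addresses.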
In a second embodiment of the invention, the extent of warping of tracks from one segment to the next is taken into account on a track by track basis. Referring now to
On the other hand, the second embodiment uses the evolution, potentially extending along a number of segments, of the frequency, and preferably the amplitude and the phase of the sinusoidal components of the tracks, until and including time segment k−1, to make a prediction of the frequency, and preferably the amplitude and the phase parameters of the sinusoidal components that could exist for time segment k, if the tracks were continuing.
The prediction of the frequency, amplitude and phase of the possible continuations is obtained by fitting a polynomial, preferably of the form $a + bx + cx^2 + dx^3 + \dots$, to the set of parameters along the track until the time segment k−1. In the case of track 1, which comprises a component with frequency fk−1(1) in segment k−1, the polynomial passing through this point is referred to as P1k−1, and similarly for track 2. Corresponding polynomials (not shown) may be fitted to the amplitude and phase parameters of the components. Estimations of the frequency and, where applicable, the amplitude and the phase parameters of the possible following component are obtained by computation of the value of those polynomials at the time segment k. In the case of track 1, the frequency estimate is referred to as E1k−1 and similarly for track 2.
The formation of tracks is then based on the similarity between this set of predicted/estimated parameters and the parameters of the components really extracted at time segment k—in this case the frequency parameters are fk(1) and fk(2). If these frequency parameters fall within a tolerance T from the frequency estimates, the associated component becomes a candidate for being linked to the track for which the estimate is made.
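A minimal Python sketch of this prediction and tolerance test is given below. It assumes equally spaced segments and fits the polynomial exactly through the last few frequency values of a track (Lagrange extrapolation); a least-squares fit would be an equally valid reading of the text. The function names and the `max_order` parameter are hypothetical.

```python
def extrapolate_next(values, max_order=4):
    """Estimate the next value of a track parameter.

    Fits the polynomial of order min(len(values) - 1, max_order) through the
    most recent values, taken at positions 0..n-1, and evaluates it at n.
    """
    n = min(len(values), max_order + 1)
    points = values[-n:]
    x_new = n
    estimate = 0.0
    for i, y in enumerate(points):
        # Lagrange basis weight of sample i evaluated at x_new.
        weight = 1.0
        for j in range(n):
            if j != i:
                weight *= (x_new - j) / (i - j)
        estimate += weight * y
    return estimate


def link_candidates(track_freqs, segment_freqs, tolerance, max_order=4):
    """Return the frequencies of the new segment that lie within the
    tolerance of the track's extrapolated frequency estimate."""
    estimate = extrapolate_next(track_freqs, max_order)
    return [f for f in segment_freqs if abs(f - estimate) <= tolerance]
```

For a track with frequencies 400, 410 and 420 Hz the estimate is 430 Hz, so with a tolerance of 5 Hz a component at 431 Hz is a candidate while one at 405 Hz is not. A track with a single value is simply predicted to stay at that value.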
So in the example of
Now advancing to
In the preferred version of the second embodiment, a maximum order of 4 is used for the polynomials fitted to frequency parameters, 3 is used for the polynomials fitted to amplitude parameters, and 2 is used for the polynomials fitted to phase parameters.
Turning now to
In the second embodiment, however, different tracks may be allowed to vary freely with respect to other tracks according only to the prior history of a given track—in so far as it is available. This can be considered to lead to potential problems, where a new track may start with a frequency parameter in the vicinity of adjacent varying tracks. Thus, in the example, fk+1(new) might be linked to fk+2(1) instead of the more likely candidate fk+1(1) being linked to fk+2(1).
However, in the case of the new component fk+1(new), in the second embodiment, the tracking algorithm can also take into account amplitude and/or phase predictions. These may help to ensure that the correct links are made, because, for example, fk+2(1) might be more likely to be in-phase with fk+1(1) than fk+1(new).
It will be seen that the coding gain of transmitting only the frequency differences, such as δ4, of the first embodiment may be lost if frequency differences such as δ5 between subsequent frequency components of a track generated according to the second embodiment are encoded in the bitstream.
This has an advantage in that a decoder need then not be aware of the form of polynomial prediction employed within the encoder and as such it will be seen that the invention is not limited to any particular form of polynomial.
However, there can also be similar coding gains in the second polynomial based embodiment. Here, the encoder transmits the frequency difference, for example δ6, and preferably amplitude difference and/or phase difference that was determined between the estimate, in this case E1k+1, and the linked component parameter, in this case fk+2(1) from segment k+2. The decoder then needs to make a prediction via a polynomial fitting of the tracks already received up to a time segment, say k+1 (the same operation as in the encoder), before employing the frequency and amplitude and/or phase difference parameters for segment k+2. No extra factor such as the warp factor needs to be sent in this case; however, the decoder does need to be aware of the form of polynomial used in the encoder.
It will therefore be seen that the polynomials of the second embodiment encapsulate with a greater degree of freedom the warping of component parameters from segment to segment than using the alternative warp factor of the first embodiment.
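The encoder and decoder symmetry described here can be sketched in Python as follows. For brevity the shared predictor is a simple linear extrapolation of the last two values, standing in for whatever polynomial form the encoder and decoder have agreed on; the function names are hypothetical, and quantization is omitted, so the round trip is exact.

```python
def predict_next(history):
    """Hypothetical shared predictor: linear extrapolation of the last two
    values, or the last value alone when only one is available."""
    if len(history) < 2:
        return history[-1]
    return 2.0 * history[-1] - history[-2]


def encode_residuals(freqs):
    """Return the birth frequency and, for each later segment, the
    difference between the actual frequency and the predicted one."""
    residuals = []
    for k in range(1, len(freqs)):
        residuals.append(freqs[k] - predict_next(freqs[:k]))
    return freqs[0], residuals


def decode_residuals(birth, residuals):
    """Repeat the encoder's prediction on the values decoded so far and
    add the transmitted difference."""
    history = [birth]
    for r in residuals:
        history.append(predict_next(history) + r)
    return history
```

For a track at 400, 410, 421 and 433 Hz, plain segment-to-segment differences would be 10, 11 and 12 Hz, whereas the prediction residuals in this sketch are 10, 1 and 1 Hz, which illustrates why a narrower difference profile can be cheaper to entropy code.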
However, regardless of which embodiment is used, as in the prior art, from the sinusoidal code CS generated with the improved sinusoidal coder of the invention, the sinusoidal signal component is reconstructed by a sinusoidal synthesizer (SS) 131. This signal is subtracted in subtractor 17 from the input x2 to the sinusoidal coder 13, resulting in a remaining signal x3 devoid of (large) transient signal components and (main) deterministic sinusoidal components.
The remaining signal x3 is assumed to mainly comprise noise and the noise analyzer 14 of the preferred embodiment produces a noise code CN representative of this noise, as described in, for example, PCT patent application No. WO 01/89086-A1 (Attorney Ref: PH NL000287). Again, it will be seen that the use of such an analyser is not essential to the implementation of the present invention, but is nonetheless complementary to such use.
Finally, in a multiplexer 15, an audio stream AS is constituted which includes the codes CT, CS and CN. The audio stream AS is furnished to e.g. a data bus, an antenna system, a storage medium etc.
The sinusoidal code CS is used to generate signal yS, described as a sum of sinusoids on a given segment. Where an encoder according to the first embodiment has been employed, in order to decode the frequencies, the warping parameter for each segment has to be known at the decoder side. In the decoder, the phase of a sinusoid in a sinusoidal track is calculated from the phase of the originating sinusoid and the frequencies of the intermediate sinusoids. When no warp factor is used in the decoder, phase φk of frame k is calculated as:
where L is the update interval (in seconds) of the frequencies and fk and fk−1 are frequencies (in Hertz) of frame k and frame k−1, respectively. By including the warp factor, the phase can be computed by:
It will be seen, however, that other functions can also supply approximations for the phase and the invention is not limited to Equation 6. In any case, the use of such a function means that the continuous phase will better match the original phase by including the warp factor.
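As an illustration of phase continuation without a warp factor, the following Python sketch advances the phase by integrating the frequency over one update interval. It assumes the frequency varies linearly between frame k−1 and frame k, so that the average of the two frequencies is used; this is a common form of continuous phase reconstruction and is offered as an assumption, since the equations themselves are not reproduced in this text. The function name is hypothetical, and the warp-factor variant is not shown.

```python
import math


def continue_phase(phi_prev, f_prev, f_curr, update_interval):
    """Return the (unwrapped) phase of frame k in radians.

    phi_prev        -- phase of frame k-1 in radians
    f_prev, f_curr  -- frequencies of frame k-1 and frame k in Hz
    update_interval -- L, the update interval in seconds

    Assumption: linear frequency change over the interval, giving a phase
    advance of 2*pi*L*(f_prev + f_curr)/2.
    """
    return phi_prev + math.pi * update_interval * (f_prev + f_curr)
```

With a constant 125 Hz sinusoid and an 8 ms update interval, the phase advances by exactly one cycle per update.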
Where an encoder according to the second embodiment of the invention was employed to generate the bitstream, then if frequency differences such as δ5 are encoded in the bitstream, a prior art type decoder can be used to synthesize the signal as it need not be aware that improved linking has been used to generate the tracks of the sinusoidal codes.
If an encoder such as that disclosed by Sluijter et al has employed warping to better estimate sinusoidal parameters and included the warp factor in the bitstream, then this warp factor can be used in synthesizing the sinusoidal components of the bitstream to better replicate the original signal.
However, as mentioned previously, if the encoder according to the second embodiment includes frequency differences such as δ6 in the bitstream, then the decoder will need to generate the polynomials used in the tracking algorithm to determine the subsequent frequency and amplitude and/or phase parameters for subsequent sinusoidal components of tracks.
At the same time, the noise code CN is fed to a noise synthesizer NS 33, which is mainly a filter, having a frequency response approximating the spectrum of the noise. The NS 33 generates reconstructed noise yN by filtering a white noise signal with the noise code CN.
The total signal y(t) comprises the sum of the transient signal yT and the product of any amplitude decompression (g) and the sum of the sinusoidal signal yS and the noise signal yN. The audio player comprises two adders 36 and 37 to sum respective signals. The total signal is furnished to an output unit 35, which is e.g. a speaker.
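Written as a formula, and using only the symbols named above, the composition of the output signal described in this paragraph is:

```latex
y(t) = y_T(t) + g \cdot \bigl( y_S(t) + y_N(t) \bigr)
```

Here g stands for the amplitude decompression; taking g = 1 when no gain compression was applied in the encoder is an assumption of this note rather than a statement from the text.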
In the first embodiment, the use of only one warp factor per segment is described. However, it will be seen that several warp factors per frame may be used. For example, for every frequency or group of frequencies a separate warp factor may be determined. Then, the appropriate warp factor can be used for each frequency in the equations above.
The present invention can be used in any sinusoidal audio coder. As such, the invention is applicable anywhere such coders are employed.
The invention also applies to objects which are combinations of frequency tracks. For example, some sinusoidal coders can be arranged to identify within a set of sinusoidal components one or more fundamental frequencies, each with a set of harmonics. An encoding advantage can be gained by transmitting such components as harmonic complexes each comprising parameters relating to the fundamental frequency and, for example, the spectral shape relating to its associated harmonics. It will therefore be seen that when linking such complexes from segment to segment, either the warp factor(s) determined for each segment or polynomial fitting can be applied to the components of such complexes to determine how these should be linked in accordance with the invention.
Number | Date | Country | Kind
---|---|---|---
01204062 | Oct 2001 | EP | regional
02075316 | Jan 2002 | EP | regional

Number | Name | Date | Kind
---|---|---|---
6925434 | Oomen et al. | Aug 2005 | B1

Number | Date | Country
---|---|---
1318188 | Oct 2001 | CN

Number | Date | Country
---|---|---
20030083886 A1 | May 2003 | US