The present invention relates to coding and decoding audio signals.
Referring now to
The first stage of the coder comprises a transient coder 11 including a transient detector (TD) 110, a transient analyzer (TA) 111 and a transient synthesizer (TS) 112. The detector 110 estimates if there is a transient signal component and its position. This information is fed to the transient analyzer 111. If the position of a transient signal component is determined, the transient analyzer 111 tries to extract (the main part of) the transient signal component. It matches a shape function to a signal segment preferably starting at an estimated start position, and determines content underneath the shape function, by employing for example a (small) number of sinusoidal components. This information is contained in the transient code CT.
The transient code CT is furnished to the transient synthesizer 112. The synthesized transient signal component is subtracted from the input signal x(t) in subtractor 16, resulting in a signal x2.
The signal x2 is furnished to a sinusoidal coder 13 where it is analyzed in a sinusoidal analyzer (SA) 130, which determines the (deterministic) sinusoidal components. The end result of sinusoidal coding is a sinusoidal code CS; a more detailed example illustrating the conventional generation of an exemplary sinusoidal code CS is provided in PCT patent application No. WO00/79519A1.
From the sinusoidal code CS generated with the sinusoidal coder, the sinusoidal signal component is reconstructed by a sinusoidal synthesizer (SS) 131. This signal is subtracted in subtractor 17 from the input x2 to the sinusoidal coder 13, resulting in a remaining signal x3 devoid of (large) transient signal components and (main) deterministic sinusoidal components.
The remaining signal x3 is assumed to mainly comprise noise and a noise analyzer 14 produces the noise code CN representative of this noise, as described in, for example, PCT patent application No. WO01/89086A1.
FIGS. 2(a) and (b) show generally the form of an encoder (NE) suitable for use as the noise analyzer 14 of
In the parametric decoder (ND), a synthetic white noise sequence is generated (in WNG) resulting in a signal r3′ with a temporally and spectrally flat envelope. A temporal envelope generator (TEG) adds the temporal envelope on the basis of the received, quantised parameters Pt′ and a spectral envelope generator (SEG, a time-varying filter) adds the spectral envelope on the basis of the received, quantised parameters Ps′, resulting in a noise signal r1′ corresponding to signal yn of
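By way of illustration only, the decoder chain described above (WNG, then TEG, then SEG) may be sketched as follows. The function name `synth_noise`, the per-sample gain curve standing in for the parameters Pt′, and the fixed all-pole filter standing in for the time-varying spectral envelope filter driven by the parameters Ps′ are assumptions made for this sketch; they are not taken from the cited applications.

```python
import random

def synth_noise(n, temporal_gains, ar_coeffs, seed=0):
    """Hypothetical sketch: white noise -> temporal envelope -> spectral envelope."""
    rng = random.Random(seed)
    # WNG: temporally and spectrally flat excitation (r3')
    r3 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # TEG: impose an assumed per-sample temporal gain curve (r2')
    r2 = [g * s for g, s in zip(temporal_gains, r3)]
    # SEG: assumed fixed all-pole filter, y[k] = r2[k] - sum_i a[i] * y[k-1-i] (r1')
    r1 = []
    for k in range(n):
        acc = r2[k]
        for i, a in enumerate(ar_coeffs):
            if k - 1 - i >= 0:
                acc -= a * r1[k - 1 - i]
        r1.append(acc)
    return r1
```

In an actual decoder the filter coefficients would be updated from frame to frame; the fixed coefficients above merely keep the sketch short.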
In a multiplexer 15, an audio stream AS is constituted which includes the codes CT, CS and CN.
The sinusoidal coder 13 and noise analyzer 14 are used for all or most of the segments and amount to the largest part of the bit rate budget.
It is well known that parametric audio coders can give fair to good quality at relatively low bit rates, for example 20 kbit/s. However, at higher bit rates the gain in quality as a function of increasing bit rate is rather low. Thus, an excessive bit rate is needed to obtain excellent or transparent quality. It is therefore difficult to attain transparency using parametric coding at bit rates comparable to those of, for example, waveform coders. This means that it is difficult to construct parametric audio coders having an excellent to transparent quality without an excessive usage of bit budget.
The reason for the fundamental difficulty in parametric coding reaching transparency is in the objects that are defined. The parametric coder is very efficient in encoding tonal components (sinusoids) and noisy components (noise coder). However, in real audio, a lot of signal components fall into a grey area: they can neither be modelled accurately by noise nor can they be modelled as (a small number of) sinusoids. Therefore, the very definition of objects in a parametric audio coder, though very beneficial from a bit rate point of view for medium quality levels, is the bottleneck in reaching excellent or transparent quality levels.
At the same time, traditional audio coders (sub-band and transform) give excellent to transparent coding quality at certain bit rates, typically in the order of 80-130 kbit/s for stereo signals sampled at 44.1 kHz. Combinations of transform and parametric coders (so-called hybrid coders) have been proposed, for example, as disclosed in European patent application no. 02077032.7 filed on May 24, 2002 (Attorney Docket No. ID 609811/PHNL020478). Here, spectro-temporal intervals of an audio signal, which would otherwise be sub-band coded, are selectively coded with noise parameters in an attempt to reduce bit rate while maintaining audio quality.
Alternatively, a transform or sub-band coder might be cascaded with a parametric coder of the type shown in
Audio coders using spectral flattening and residual signal modelling using a small number of bits per sample are disclosed in A. Harma and U.K. Laine, “Warped low-delay CELP for wide-band audio coding”, Proc. AES 17th Int. Conf.: High Quality Audio Coding, pages 207-215, Florence, Italy, 2-5 Sep. 1999; S. Singhal, “High quality audio coding using multi-pulse LPC”, Proc. 1990 Int. Conf. Acoustic Speech Signal Process. (ICASSP90), pages 1101-1104, Atlanta Ga., 1990, IEEE Piscataway, N.J.; and X. Lin, “High quality audio coding using analysis-by-synthesis technique”, Proc. 1991 Int. Conf. Acoustic Speech Signal Process. (ICASSP91), pages 3617-3620, Atlanta Ga., 1991, IEEE Piscataway, N.J. In a number of studies, it has been shown that this coding strategy enables an excellent to transparent quality at bit rates corresponding to 2 bit/sample for mono signals (88.2 kbit/s for 44.1 kHz audio). In that respect, they do not exceed the performance of sub-band or transform coders.
It is an object of the present invention to provide a parametric audio coder whose bit rate is controllable across a range and which provides high quality levels at a bit rate comparable with traditional coders.
According to the present invention, there is provided a method according to claim 1.
The invention provides scalability in a parametric coder, by supplementing the noise coder with a pulse train coder. This provides a large range of bit rate operating points and merges the two strategies into one coder without introducing a large overhead in complexity.
The coding strategies within the noise coder are complementary in terms of strengths and weaknesses. The Linear Predictor in the pulse train coder, for example, is inefficient in describing a tonal audio segment, but the sinusoidal coder can do this efficiently. Thus, for tonal items like harpsichord, the pulse train coder is unable to deliver transparent quality for a coarse quantisation of the residual. For other signals, the prediction order of the pulse train coder linear prediction stage has to be very high to allow a coarse quantisation of the residual. For noise-like signals, decimation of the residual signal is a problem and leads to a loss of brightness.
In the preferred embodiment, the coding strategies are combined to form a base layer using the parametric coder and an additional (bit rate controlled) pulse train layer. The bit rate resources required for the combined techniques are less than the sum of the bit rate requirements of the individual techniques, since both methods apply spectral flattening and, consequently, the bits needed for this stage only have to be invested once. With the preferred embodiment, a bit rate range of 20-120 kbit/s (for stereo signals) can be covered with performance better than or comparable with that of state-of-the-art coders.
Embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:
FIGS. 2(a) and (b) show a conventional parametric noise encoder (NE) and corresponding noise decoder (ND) respectively.
In the preferred embodiment, a parametric audio coder of the type shown in
In the preferred embodiment, an overall bit rate budget, determined according to the quality required from the coder, is divided into a bit rate B usable by the parametric coder and an RPE coding budget which is inversely proportional to an RPE decimation factor D.
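The division of the budget may be sketched, under stated assumptions, as follows. The function name `split_budget`, the sampling rate default, and the cost model (one ternary pulse of log2(3) bits for every D-th sample of a single channel, with the gain and offset side information ignored) are hypothetical and serve only to show the inverse proportionality to D.

```python
import math

def split_budget(total_bps, decimation=None, fs=44100.0,
                 bits_per_pulse=math.log2(3)):
    """Return (B, rpe_bps): parametric bit rate B and RPE budget (assumed model)."""
    if decimation is None:
        # RPE layer switched off: the whole budget goes to the parametric coder
        return float(total_bps), 0.0
    if decimation < 1:
        raise ValueError("decimation factor D must be at least 1")
    # Assumed cost: fs / D pulses per second, each costing bits_per_pulse bits
    rpe_bps = fs * bits_per_pulse / decimation
    if rpe_bps > total_bps:
        raise ValueError("budget too small for this decimation factor")
    return total_bps - rpe_bps, rpe_bps
```

For example, `split_budget(60000, 2)` and `split_budget(60000, 8)` yield RPE budgets that differ by a factor of four, the remainder in each case being left for the parametric coder as bit rate B.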
Referring now to
A waveform is generated by block TSS (Transient and Sinusoidal Synthesiser) corresponding to blocks 112 and 131 of
From signal r1, the spectral envelope is estimated and removed in the block (SE) using a Linear Prediction or a Laguerre filter as in the prior art
Because pulse train coders employ a first spectral flattening stage, the RPE coder can be selectively applied on the spectrally flattened signal r2 produced by the block SE according to whether a bit rate budget has been allocated to the RPE coder. In an alternative embodiment, indicated by the dashed line, the RPE coder is applied to the spectrally and temporally flattened signal r3 produced by the block TE.
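The spectral flattening performed in block SE may be illustrated, for the Linear Prediction case only, by the following sketch. It uses the standard autocorrelation method with the Levinson-Durbin recursion; the function names `lpc` and `flatten` are hypothetical, and a Laguerre-based filter would require a different structure that is not shown here.

```python
def lpc(signal, order):
    """Prediction-error filter A(z) = 1 + a1 z^-1 + ... via Levinson-Durbin."""
    n = len(signal)
    # Autocorrelation lags 0..order
    r = [sum(signal[i] * signal[i + k] for i in range(n - k))
         for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0:
            break
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= (1.0 - k * k)
    return a

def flatten(signal, a):
    """Apply A(z) to obtain the spectrally flattened residual (r1 -> r2)."""
    return [sum(a[i] * signal[n - i] for i in range(len(a)) if n - i >= 0)
            for n in range(len(signal))]
```

The coefficients returned by `lpc` correspond in role to the spectral envelope parameters, and the output of `flatten` to the signal on which the RPE coder may subsequently operate.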
As is known from the documents referred to in the background, the RPE coder performs a search in an analysis-by-synthesis manner on the residual signal r2/r3. Given a decimation factor D, the RPE search procedure results in an offset (value between 0 and D-1), the amplitudes of the RPE pulses (for example, ternary pulses with values −1, 0 and 1) and a gain parameter. This information is stored in a layer L0 included in the audio stream AS for transmittal to the decoder by a multiplexer (MUX) when RPE coding is employed.
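A much simplified sketch of such a search is given below. It matches the pulse grid directly to the residual waveform rather than through a synthesis filter, so it is not the analysis-by-synthesis procedure of the cited documents; the function names, the ternary decision threshold of half the mean grid magnitude, and the least-squares gain are assumptions made for illustration. The residual is assumed to be non-empty.

```python
def rpe_decode(offset, pulses, gain, D, n):
    """Rebuild a length-n signal with gain-scaled pulses on every D-th sample."""
    out = [0.0] * n
    for k, p in enumerate(pulses):
        out[offset + k * D] = gain * p
    return out

def rpe_encode(residual, D):
    """Return (offset, ternary pulses, gain) minimising the squared error."""
    best = None
    for offset in range(D):
        grid = residual[offset::D]
        if not grid:
            continue
        # Assumed threshold: half the mean magnitude of the grid samples
        scale = sum(abs(v) for v in grid) / len(grid)
        pulses = [0 if abs(v) <= 0.5 * scale else (1 if v > 0 else -1)
                  for v in grid]
        energy = sum(p * p for p in pulses)
        # Least-squares gain for the chosen ternary pattern
        gain = sum(v * p for v, p in zip(grid, pulses)) / energy if energy else 0.0
        recon = rpe_decode(offset, pulses, gain, D, len(residual))
        err = sum((a - b) ** 2 for a, b in zip(residual, recon))
        if best is None or err < best[0]:
            best = (err, offset, pulses, gain)
    _, offset, pulses, gain = best
    return offset, pulses, gain
```

The three returned values correspond to the offset, pulse amplitudes and gain parameter mentioned above.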
Typically, the RPE coder requires a bit rate of at least 40 kbit/s or so and is therefore switched on as the quality requirement, and hence the bit budget of the encoder, is increased towards the higher end of the quality range. For the lower part of the quality range where the RPE coder is initially employed, the bit rate B is decreased to less than the maximum bit rate allowed for when the parametric coder is employed alone. This enables a monotonically increasing overall bit rate budget range to be specified for the coder with quality increasing in proportion to the budget.
Experiments showed that the RPE coder results in a loss in brightness in the reconstructed signal, especially when using high decimation factors (e.g. D=8). Adding some low-level noise to the RPE sequence mitigates this problem. In order to determine the level of the noise, a gain (g) is calculated on the basis of, for example, the energy/power difference between a signal generated from the coded RPE sequence and residual signal r2/r3. This gain is also transmitted to the decoder as part of the layer L0 information.
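One possible reading of this gain calculation is sketched below, assuming that the noise to be added has unit variance and is uncorrelated with the RPE signal, so that g is the square root of the power deficit. The function name `noise_gain` and this particular formula are assumptions; the text above only states that the gain is based on the energy/power difference.

```python
import math

def noise_gain(residual, rpe_signal):
    """Gain g for unit-variance noise that makes up the power missing after RPE."""
    n = len(residual)
    p_res = sum(v * v for v in residual) / n      # power of r2/r3
    p_rpe = sum(v * v for v in rpe_signal) / n    # power of decoded RPE signal
    # Clamp at zero: no noise is added if the RPE signal already has enough power
    return math.sqrt(max(0.0, p_res - p_rpe))
```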
Referring now to
The excitation signal r2′ is then fed to a spectral envelope generator (SEG) which according to the codes Ps produces a synthesized noise signal r1′. This signal is added to the synthesized signals produced by the conventional transient and sinusoidal synthesizers to produce the output signal x̂.
In an alternative embodiment, the signal generated by the pulse train generator PTG is used instead of the signal generated by WNG as an input to the temporal envelope generator, as indicated by the dashed line.
Referring now to
The temporal envelope coefficients (PT) are then imposed on the excitation signal r3′ by the block TEG to provide the synthesized signal r2′ which is processed as before. As mentioned above, this is advantageous because a pulse train excitation typically gives rise to some loss in brightness which, with a properly weighted additional noise sequence, can be counteracted. The weighting can comprise simple amplitude or spectral shaping each based on the gain factor g.
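The simple amplitude weighting mentioned above may be sketched as follows, assuming that the excitation r3′ is formed by adding unit-variance white noise scaled by g to the decoded pulse train, and that the temporal envelope is applied as a per-sample gain. The function name `build_excitation` and the per-sample envelope representation are hypothetical; spectral shaping of the noise is not shown.

```python
import random

def build_excitation(pulse_train, g, temporal_gains, seed=0):
    """Pulse train plus g-weighted noise (r3'), then temporal envelope (r2')."""
    rng = random.Random(seed)
    # Assumed mixing rule: simple amplitude weighting of white noise by g
    r3 = [p + g * rng.gauss(0.0, 1.0) for p in pulse_train]
    # TEG: impose the assumed per-sample temporal envelope
    return [e * s for e, s in zip(temporal_gains, r3)]
```

With g set to zero the function reduces to temporal shaping of the pulse train alone, which corresponds to the case in which no brightness compensation is applied.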
As before, the signal is filtered by, for example, a Laguerre filter in block SEG (Spectral Envelope Generator), which adds a spectral envelope to the signal. The resulting signal is then added to the synthesized sinusoidal and transient signal as before.
It will be seen that in either
It should be noted that in the embodiment of
Number | Date | Country | Kind |
---|---|---|---|
031044472.0 | Dec 2003 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB04/52539 | 11/24/2004 | WO | | 5/26/2006 |