The present invention relates to coding and decoding audio signals.
Referring now to
The first stage of the coder comprises a transient coder 11 including a transient detector (TD) 110, a transient analyzer (TA) 111 and a transient synthesizer (TS) 112. The detector 110 estimates if there is a transient signal component and its position. This information is fed to the transient analyzer 111. If the position of a transient signal component is determined, the transient analyzer 111 tries to extract (the main part of) the transient signal component. It matches a shape function to a signal segment preferably starting at an estimated start position, and determines content underneath the shape function, by employing for example a (small) number of sinusoidal components. This information is contained in the transient code CT.
The transient code CT is furnished to the transient synthesizer 112. The synthesized transient signal component is subtracted from the input signal x(t) in subtractor 16, resulting in a signal x2.
The signal x2 is furnished to a sinusoidal coder 13 where it is analyzed in a sinusoidal analyzer (SA) 130, which determines the (deterministic) sinusoidal components. The end result of sinusoidal coding is a sinusoidal code CS and a more detailed example illustrating the conventional generation of an exemplary sinusoidal code CS is provided in PCT patent application No. WO00/79519A1.
From the sinusoidal code CS generated with the sinusoidal coder, the sinusoidal signal component is reconstructed by a sinusoidal synthesizer (SS) 131. This signal is subtracted in subtractor 17 from the input x2 to the sinusoidal coder 13, resulting in a remaining signal x3 devoid of (large) transient signal components and (main) deterministic sinusoidal components.
The remaining signal x3 is assumed to mainly comprise noise and a noise analyzer 14 produces the noise code CN representative of this noise, as described in, for example, PCT patent application No. WO01/89086A1.
In a multiplexer 15, an audio stream AS is constituted which includes the codes CT, CS and CN.
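The cascade described above (transient coding, subtraction, sinusoidal coding, subtraction, noise coding, multiplexing) can be sketched as follows. This is only an illustration of the signal flow: the `analyze`/`synthesize` interface, the function name `encode_frame` and the toy `HalvingStub` are assumptions made for the example and do not appear in the specification.

```python
def encode_frame(x, transient_coder, sinusoidal_coder, noise_coder):
    """Run the three coder stages in cascade on one block of samples.

    Each *_coder is a hypothetical object exposing
    analyze(signal) -> code and synthesize(code) -> signal.
    """
    c_t = transient_coder.analyze(x)                  # transient code CT
    y_t = transient_coder.synthesize(c_t)
    x2 = [a - b for a, b in zip(x, y_t)]              # subtractor 16

    c_s = sinusoidal_coder.analyze(x2)                # sinusoidal code CS
    y_s = sinusoidal_coder.synthesize(c_s)
    x3 = [a - b for a, b in zip(x2, y_s)]             # subtractor 17

    c_n = noise_coder.analyze(x3)                     # noise code CN
    return {"CT": c_t, "CS": c_s, "CN": c_n}          # multiplexer 15


class HalvingStub:
    """Toy stand-in (not a real coder): 'codes' half of whatever it is given."""

    def analyze(self, signal):
        return [0.5 * v for v in signal]

    def synthesize(self, code):
        return list(code)
```

With the toy stub, each stage removes half of its input, so the residual shrinks from stage to stage in the same way that x2 and x3 carry progressively less of the input signal.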
In the transient coder 11, a part of the audio signal is labeled as a transient if an event occurs that is localized in time, for example, attacks of castanets or hi-hats.
In US Published Application No. 2001/0032087A1, a transient is modeled with a number of sinusoids that are windowed by a special transient window (i.e. a Meixner window). In
transient position estimation: The position of the transient in the audio signal is determined by a transient detector 110;
transient envelope estimation: In case of a Meixner transient, the Meixner window, describing the time envelope of the transient, is estimated by a transient analyzer 111;
sinusoidal content estimation: Using the estimated Meixner window, the analyzer 111 estimates a number of sinusoids to describe the transient. The sinusoids are represented by a frequency and three complex, polynomial amplitudes.
In an implementation where 7 sinusoids are used for a Meixner transient, the bit rate required by the transient module is typically between 0.5 and 2.0 kbit/s, depending on the number of transients that are detected in the audio signal.
By using the transient modeling as described above, a fair audio quality for excerpts containing transients is obtained. However, the audio quality can be improved by increasing the number of sinusoids that are used to model the transient. In this case, the attack of a transient is better defined and more “presence” of the transient is obtained. It has been found, for example, that good results are obtained by increasing the number of sinusoids from 7 to 25.
Referring to
However, using 25 sinusoids, the bit rate required by the transient module 11 increases significantly to around 6 kbit/s (from 2 kbit/s using 7 sinusoids). This increase in bit rate for the transient part has to be saved in the sinusoidal and/or noise modeling components 13, 14 of the coder, thus reducing the overall audio quality.
According to the present invention there is provided a method according to claim 1.
The invention extends the current transient model by including parameters for a noise component in the description of a transient. Thus, instead of using only sinusoids, both sinusoids and noise are used to describe the transient.
In preferred embodiments, the time interval of the transient modeled by the sinusoids and noise can differ.
The parameters for the noise component of a transient result in a small increase in bit rate. However, the perceptual quality of the transients is improved.
The invention thus reduces the bit rate otherwise required by additional sinusoids, while maintaining audio quality. This is because the additional sinusoids do not model clear peaks in the spectrum, as the initial sinusoids do; rather, they more or less fill the gaps between the initial sinusoids. In the time domain, the signal described by the additional sinusoids is noise-like and so these portions of the spectrum have been found to be more effectively modeled with noise parameters.
An embodiment of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:
According to a preferred embodiment of the present invention the additional (18) sinusoids mentioned above are instead modeled by a localized noise burst with the same energy as the additional sinusoids. The noise burst is placed at the start of the transient and a fixed time window is used to shape the noise burst. Only the energy of the noise burst has to be transmitted within the transient codes (CT) of an encoded signal (AS), and so the bit rate requirement to implement the embodiment is only increased slightly.
More specifically, in the encoder of the preferred embodiment, the transient analyzer 111 estimates the Meixner transient and models the transient using a high number of sinusoids (e.g. 25) in a conventional manner. This signal is denoted by th and has length U=720 samples (at 44.1 kHz sampling rate). The most relevant sinusoids (for example 7) are used to generate another transient signal, tl. Selection of the most relevant sinusoids can employ, for example, an energy-based cost function or any other conventional criterion. In any case, the signal tl is then subtracted from the signal th to provide a difference signal d=th-tl, which is used to generate the noise burst. The noise burst is placed at the start of the transient and has length L, preferably shorter than the transient. In the preferred embodiment, L=150 samples (at 44.1 kHz sampling rate). The difference signal is windowed according to the function:
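As a rough illustration of how th, tl and the difference signal d relate, the sketch below ranks sinusoids by energy, keeps the strongest ones and subtracts the two syntheses. It is a simplification under stated assumptions: each sinusoid is a hypothetical (amplitude, frequency in Hz, phase) triple with constant amplitude, the Meixner window and polynomial amplitudes are omitted, and squared amplitude stands in for the energy-based cost function. The function names are invented for the example.

```python
import math

FS = 44100   # sampling rate in Hz, as in the text
U = 720      # transient length in samples, as in the text


def synth(sinusoids, length=U, fs=FS):
    """Sum of constant-amplitude sinusoids (no Meixner envelope applied)."""
    return [
        sum(a * math.cos(2.0 * math.pi * f * n / fs + p) for a, f, p in sinusoids)
        for n in range(length)
    ]


def split_transient(sinusoids, keep=7):
    """Return (kept sinusoids, t_h, t_l, d) with d = t_h - t_l."""
    # Assumed energy-based criterion: rank by squared amplitude.
    ranked = sorted(sinusoids, key=lambda s: s[0] ** 2, reverse=True)
    kept = ranked[:keep]
    t_h = synth(sinusoids)   # all sinusoids (e.g. 25)
    t_l = synth(kept)        # most relevant sinusoids only (e.g. 7)
    d = [h - l for h, l in zip(t_h, t_l)]
    return kept, t_h, t_l, d
```

Because synthesis is linear, d equals the signal that the discarded sinusoids alone would produce, which is the part the embodiment replaces with a noise burst.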
dw(n)=d(n)wo(n), for n=1, . . . , L,
where wo is a window, with a fade-out slope, which is defined as:
The fade-out is the second part of a Hanning window. However, different definitions for the window are possible.
The energy of the windowed segment dw is measured as follows:
and the energy E along with the parameters for the sinusoids comprising signal tl are quantized and transmitted to the decoder as part of the transient codes CT. Thus, the information relating to the (additional) sinusoids of the difference signal d is discarded and replaced by the noise burst parameter.
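The windowing and energy measurement can be sketched as below. The window and energy formulas are not reproduced in this text, so both are assumptions here: the window is taken to be flat and then to fall along the second half of a Hanning window over a hypothetical `fade_len` samples, and the energy is taken to be the plain sum of squares of the windowed segment. Indices run from 0 rather than 1.

```python
import math


def fade_out_window(length, fade_len):
    """Flat window followed by the falling half of a Hann window (assumed shape)."""
    w = [1.0] * (length - fade_len)
    for k in range(fade_len):
        # Falls smoothly from just below 1 to just above 0.
        w.append(0.5 * (1.0 + math.cos(math.pi * (k + 1) / (fade_len + 1))))
    return w


def burst_energy(d, length=150, fade_len=50):
    """Energy of the first `length` samples of d after windowing (assumed sum of squares)."""
    w = fade_out_window(length, fade_len)
    d_w = [d[n] * w[n] for n in range(length)]   # d_w(n) = d(n) * w_o(n)
    return sum(v * v for v in d_w)
```

Only the single value returned by `burst_energy` would need to be quantized and sent alongside the parameters of the retained sinusoids, which is why the added bit rate stays small.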
The signal th is synthesized by synthesizer 112 as in the conventional encoder and is subtracted (16) from the input signal x(t) in order to create a residual signal x2 that is fed into the sinusoidal analysis module 13 as before. Alternatively, the transient codes CT could be synthesized by synthesizer 112 as in the decoder (explained below) before being subtracted from the input signal x(t) to produce residual signal x2.
In this way, the transient part can be better modeled by the sinusoidal 13 and noise 14 modules of the audio coder.
Referring now to
In the preferred embodiment of the present invention, in the transient synthesizer 31, the parameters for the signal tl comprising the initial sinusoids are used to re-construct the sinusoids in synthesizer TSS,
At the same time, the encoded energy value is reconstructed, resulting in energy Ê. A white noise generator (WNG) provides a segment of high-pass filtered noise with length L. Preferably, the high-pass filter has a cut-off frequency of 300 Hz in order to avoid the modeling of very low frequencies by noise. The filtered noise signal is windowed (WDW) using window w, which is preferably a Hanning window of length L. However, other windows are also possible (e.g. an asymmetric Hanning window).
The windowed noise signal is denoted by rw. This signal is scaled by gain gt, which is calculated according to:
The resultant generated noise burst is added to the synthesized sinusoidal components of the transient in adder 39, thus completing the synthesis of the transient signal yT, which can be treated as before when being added to the other synthesized components of the signal y(t).
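The decoder-side noise burst can be sketched as follows. Several details are assumptions made for the example and are not taken from the specification: the high-pass filter is a simple first-order section (the text only gives the 300 Hz cut-off), the gain is chosen so that the burst energy equals the decoded energy Ê (the gain formula is not reproduced in this text), and the function names and the fixed random seed are invented.

```python
import math
import random


def noise_burst(e_hat, length=150, fs=44100, cutoff=300.0, seed=0):
    """White noise -> high-pass -> Hann window -> scale to energy e_hat."""
    rng = random.Random(seed)
    white = [rng.gauss(0.0, 1.0) for _ in range(length)]

    # Assumed first-order high-pass: y[n] = alpha * (y[n-1] + x[n] - x[n-1]).
    alpha = 1.0 / (1.0 + 2.0 * math.pi * cutoff / fs)
    filtered, prev_x, prev_y = [], 0.0, 0.0
    for x in white:
        prev_y = alpha * (prev_y + x - prev_x)
        prev_x = x
        filtered.append(prev_y)

    # Symmetric Hann (Hanning) window of the same length.
    w = [0.5 * (1.0 - math.cos(2.0 * math.pi * n / (length - 1))) for n in range(length)]
    r_w = [f * wn for f, wn in zip(filtered, w)]

    # Assumed gain: match the burst energy to the decoded energy.
    energy = sum(v * v for v in r_w)
    gain = math.sqrt(e_hat / energy) if energy > 0.0 else 0.0
    return [gain * v for v in r_w]


def add_burst(y_sin, burst):
    """Add the burst at the start of the synthesized sinusoidal part (adder 39)."""
    out = list(y_sin)
    for n, v in enumerate(burst):
        out[n] += v
    return out
```

A usage example would be `y_t = add_burst(sinusoid_part, noise_burst(e_hat))`, where `sinusoid_part` is the output of the sinusoid synthesis for the transient and is at least as long as the burst.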
In
Referring back to
The total signal y(t) comprises the sum of the transient signal yT and the product of any amplitude decompression (g) and the sum of the sinusoidal signal yS and the noise signal yN. The audio player comprises two adders 36 and 37 to sum respective signals. The total signal is furnished to an output unit 35, which is e.g. a speaker.
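The summation just described amounts to y = yT + g·(yS + yN). A minimal sketch, assuming a single scalar gain g for the block and equal-length sample lists (the function name is invented for the example):

```python
def mix_output(y_t, y_s, y_n, g=1.0):
    """Combine transient, sinusoidal and noise parts: y = y_t + g * (y_s + y_n)."""
    return [t + g * (s + n) for t, s, n in zip(y_t, y_s, y_n)]
```

Note that the transient signal is added after the gain is applied, so amplitude decompression affects only the sinusoidal and noise parts.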
This invention can be used in an audio coder where transients are described by windowed sinusoids.
Number | Date | Country | Kind |
---|---|---|---|
03103325.1 | Sep 2003 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB04/51572 | 8/26/2004 | WO | | 3/2/2006 |