The invention relates to generating an output audio signal based on an input audio signal, and in particular to an apparatus for supplying an output audio signal.
Erik Schuijers, Werner Oomen, Bert den Brinker and Jeroen Breebaart, “Advances in Parametric Coding for High-Quality Audio”, Preprint 5852, 114th AES Convention, Amsterdam, The Netherlands, 22-25 Mar. 2003, disclose a parametric coding scheme using an efficient parametric representation for the stereo image. Two input signals are merged into one mono audio signal. Perceptually relevant spatial cues are explicitly modeled. The merged signal is encoded using a mono parametric encoder. The stereo parameters, namely the Interchannel Intensity Difference (IID), the Interchannel Time Difference (ITD) and the Interchannel Cross-Correlation (ICC), are quantized, encoded and multiplexed into a bitstream together with the quantized and encoded mono audio signal. At the decoder side the bitstream is de-multiplexed into an encoded mono signal and the stereo parameters. The encoded mono audio signal is decoded in order to obtain a decoded mono audio signal m′ (see
In the MPEG-4 (ISO/IEC 14496-3:2002) Proposed Draft Amendment (PDAM) 2, Section 5.4.6, such a de-correlated signal is obtained by convolving/filtering the mono signal with a pre-defined impulse response.
Non pre-published European patent application 02077863.5 (Attorney docket PHNL020639) describes the use of an all-pass filter, e.g. a comb filter, comprising a frequency dependent delay to derive such a de-correlated signal. At high frequencies, a relatively small delay is used, resulting in a coarse frequency resolution. At low frequencies, a large delay results in a dense spacing of the comb filter. The filtering may be combined with a band-limiting filter, thereby applying the de-correlation to one or more frequency bands.
An object of the invention is to advantageously generate an output audio signal on the basis of an input audio signal. To this end, the invention provides a device, a method and an apparatus as defined in the independent claims. Advantageous embodiments are defined in the dependent claims.
According to a first aspect of the invention, an output audio signal is generated based on an input audio signal, the input audio signal comprising a plurality of input subband signals, wherein at least part of the input subband signals is delayed to obtain a plurality of delayed subband signals, wherein at least one input subband signal is delayed more than a further input subband signal of higher frequency, and wherein the output audio signal is derived from a combination of the input audio signal and the plurality of delayed subband signals. By providing such a frequency dependent delay in the subband domain, parametric stereo can advantageously be implemented, especially in those audio decoders where the core decoder already includes a subband filter bank. Filter banks are commonly used in the context of audio coding, e.g. MPEG-1/2 Layer I, II and III all make use of a 32-band critically sampled subband filter bank. The plurality of delayed subband signals may be used as a subband domain equivalent of the de-correlated signal as described above. In ideal circumstances the correlation between the plurality of delayed subband signals and the input audio signal is zero. However, in practical embodiments, the correlation may be up to 40% for acceptable audio quality, up to 10% for medium to high quality audio and up to 2 or 3% for high audio quality.
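The correlation figures mentioned above can be checked numerically. The following is a minimal sketch only, assuming that correlation is measured as the magnitude of the normalized inner product over a block of complex subband samples; the function names `normalized_correlation` and `delay`, the noise test signal and the delay of 5 samples are illustrative assumptions and are not taken from the cited documents.

```python
import math
import random


def normalized_correlation(x, y):
    """Return |<x, y>| / (||x|| * ||y||) for two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    inner = sum(a * complex(b).conjugate() for a, b in zip(x, y))
    energy_x = sum(abs(a) ** 2 for a in x)
    energy_y = sum(abs(b) ** 2 for b in y)
    if energy_x == 0 or energy_y == 0:
        return 0.0
    return abs(inner) / math.sqrt(energy_x * energy_y)


def delay(signal, d):
    """Delay a sequence by d samples, padding the start with zeros."""
    d = min(d, len(signal))
    return [0] * d + list(signal[:len(signal) - d])


# Hypothetical test signal: complex white noise standing in for one subband.
rng = random.Random(0)
noise = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]

# Correlation between the subband signal and its delayed version.
corr = normalized_correlation(noise, delay(noise, 5))
```

For a noise-like signal the value of `corr` is expected to be small, whereas for a strongly tonal signal a pure delay leaves the magnitude of the correlation high, so the measure chosen here is only one of several possible conventions.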
In an embodiment of the invention the output audio signal includes a plurality of output subband signals. Combining the delayed subband signals and the input subband signals in subband domain in order to obtain the plurality of output subband signals is then relatively easy to implement. In practical embodiments, a time domain output audio signal is synthesized from the plurality of output subband signals in a synthesis subband filter bank.
In order to obtain an efficient implementation, a plurality of delay units is provided, wherein the number of delay units is smaller than the number of input subband signals, and wherein the input subband signals are subdivided into groups over the plurality of delay units.
Best audio quality is obtained in embodiments where the delays in the plurality of delay units are monotonically increasing from high frequency to low frequency.
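By way of illustration, the grouping of subbands over a smaller number of delay units may be sketched as follows. This is a hedged example and not a definitive implementation: the function names `assign_group_delays` and `delay_subbands`, the group boundaries and the delay values are assumptions chosen for the example only. Delays are expressed in subband samples and are required to be non-increasing with increasing frequency.

```python
def assign_group_delays(num_subbands, group_edges, group_delays):
    """Map each subband index to the delay of the group it belongs to.

    group_edges[i] is the first subband index of group i (ascending, starting
    at 0); group_delays[i] is the delay, in subband samples, of group i.
    """
    if len(group_edges) != len(group_delays) or not group_edges:
        raise ValueError("need one delay per group")
    if group_edges[0] != 0 or sorted(set(group_edges)) != list(group_edges):
        raise ValueError("group edges must start at 0 and strictly increase")
    if any(a < b for a, b in zip(group_delays, group_delays[1:])):
        raise ValueError("delays must not increase with frequency")
    delays = []
    group = 0
    for k in range(num_subbands):
        # Advance to the next group once its first subband index is reached.
        while group + 1 < len(group_edges) and k >= group_edges[group + 1]:
            group += 1
        delays.append(group_delays[group])
    return delays


def delay_subbands(subbands, delays):
    """Delay subbands[k] (a list of samples) by delays[k] subband samples."""
    delayed = []
    for samples, d in zip(subbands, delays):
        d = min(d, len(samples))
        delayed.append([0j] * d + list(samples[:len(samples) - d]))
    return delayed


# Hypothetical layout: 8 subbands served by only 3 delay units.
example_delays = assign_group_delays(8, [0, 2, 5], [3, 2, 1])
```

With this layout the two lowest subbands share the longest delay and the three highest subbands share the shortest one.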
In an advantageous embodiment of the invention, a complex filter bank is used, which is effectively oversampled by a factor of two because for every real input sample a complex output sample is generated which effectively consists of two values: a real one and an imaginary one. This eliminates the large aliasing components from which the MPEG-1 and MPEG-2 critically sampled filter bank suffers.
In an efficient embodiment of generating the output audio signal, a Quadrature Mirror Filter (“QMF”) bank is used. Such a filter bank is known per se from Per Ekstrand, “Bandwidth extension of audio signals by spectral band replication”, Proc. 1st IEEE Benelux Workshop on Model based Processing and Coding of Audio (MPCA-2002), pp. 53-58, Leuven, Belgium, Nov. 15, 2002.
Using a signal delayed by an integer number of subband samples as the de-correlated signal causes time-domain smearing, i.e. the signal placement in time is not preserved. This may cause artefacts around transients, i.e. in those cases where a signal strength change is above a predetermined threshold. Signal strength can be measured in amplitude, power, etc. In an advantageous embodiment of the invention, artefacts around transients are mitigated by deriving the de-correlated signal in the surroundings of a transient using fractional delays instead of integer delays. A fractional delay is a delay of less than the time between two subsequent subband samples and can easily be implemented by using a phase rotation. A transition from fractional delays to integer delays, and vice versa, may result in discontinuities in the de-correlated signal. In order to prevent such discontinuities, an advantageous embodiment of the invention provides a cross-fade to go back from using the fractionally delayed de-correlated signal to the integer delayed de-correlated signal.
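The notion of a transient as a signal strength change above a predetermined threshold may be sketched as follows. This is a minimal example under stated assumptions: signal strength is measured as power, the change is expressed as a ratio between consecutive subband samples, and the function name `detect_transients`, the default ratio of 4 and the power floor are hypothetical choices that are not prescribed by the text.

```python
def detect_transients(samples, threshold_ratio=4.0, floor=1e-12):
    """Return the indices n at which the power of samples[n] exceeds
    threshold_ratio times the power of samples[n - 1].

    The floor avoids flagging every sample that follows digital silence
    merely because the previous power is exactly zero.
    """
    hits = []
    for n in range(1, len(samples)):
        previous_power = abs(samples[n - 1]) ** 2
        power = abs(samples[n]) ** 2
        if power > threshold_ratio * max(previous_power, floor):
            hits.append(n)
    return hits


# Hypothetical subband envelope with a sudden onset at index 3.
onsets = detect_transients([0.1, 0.1, 0.1, 1.0, 0.9, 0.8])
```

A practical detector would typically smooth the power estimate over several samples; the single-sample comparison is kept here only for brevity.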
These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter.
In the drawings:
The drawings only show those elements that are necessary to understand the invention.
In the following, an advantageous embodiment of the invention is described for generating a stereo output audio signal based on a mono input audio signal by using parametric stereo. The input audio signal includes a plurality of input subband signals. The plurality of input subband signals are delayed in a plurality of delay units providing more delay for lower frequency subbands than for higher frequency subbands. The delayed subband signals serve as a subband domain version of the de-correlated signal needed in the generation of the stereo output signal.
In MPEG-4 PDAM 2, Section 5.4.6, the de-correlated signal is obtained by first calculating a phase characteristic φ, which for a sampling frequency fs of 44.1 kHz equals:
where φ0 has a value of π/2, K is equal to 256 and k=0 . . . 256. From this phase response function a filter impulse response is then calculated using the inverse FFT. It resembles a linear delay. This delay can be approximated by:
where d is the delay in samples and the frequency in radians.
Preferably, the input subband signals are obtained in a complex QMF analysis filter bank, which may be present in a remote encoder, but which may also be present in the decoder. As the outputs of a complex QMF filter bank are downsampled by a factor of N, it is not possible to exactly map a desired time domain delay to a delay within each subband. A perceptually good approximation can be obtained by using rounded versions of the delay function (2) as described above. As an example, the delay within each subband for N=64 subbands is shown in
The approach presented above is well suited for stationary signals. However, problems occur when this approach is used for non-stationary, i.e. transient-like, signals. This is illustrated in
Hence, it is proposed to use a fractionally delayed or phase rotated version of the original signal instead of the frequency-dependent integer delay, starting from the transient position. Because of the temporal post-masking properties of the human auditory system, it is not very critical how this de-correlated signal is calculated. As such, the de-correlated signal can e.g. be obtained by applying a 90-degree phase shift in each subband of the original signal.
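The rounding of a time domain delay to a whole number of subband samples may be illustrated as follows. This is a sketch under stated assumptions and does not reproduce delay function (2): the placeholder `linear_time_delay` simply decreases linearly with the subband index, and the maximum delay of 529.2 input samples (about 12 ms at 44.1 kHz) is a hypothetical value chosen for the example. Since each subband is downsampled by a factor of N, a delay of d input samples corresponds to approximately d/N subband samples.

```python
import math


def linear_time_delay(k, num_subbands, max_delay):
    """Placeholder delay (in input samples) at the centre of subband k.

    Assumption for illustration only: the delay falls linearly from
    max_delay towards zero as the subband index k increases.
    """
    return max_delay * (1.0 - (k + 0.5) / num_subbands)


def subband_delays(time_delays, downsampling_factor):
    """Round time domain delays to whole subband samples (half rounds up)."""
    return [int(math.floor(d / downsampling_factor + 0.5)) for d in time_delays]


# Hypothetical configuration: N = 64 subbands, each downsampled by 64.
N = 64
time_delays = [linear_time_delay(k, N, 529.2) for k in range(N)]
delays = subband_delays(time_delays, N)
```

Because rounding maps several neighbouring subbands to the same integer, the resulting delays naturally form groups that can share a delay unit.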
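For complex subband samples, such a phase shift amounts to a multiplication by a unit-magnitude complex constant. The following is a minimal sketch, assuming complex-valued subband samples such as those of a complex QMF bank; the function name `phase_rotate` and the choice of a positive rotation angle are assumptions, since the text does not specify the sign of the shift.

```python
import cmath
import math


def phase_rotate(samples, angle=math.pi / 2):
    """Rotate the phase of every complex subband sample by `angle` radians.

    The magnitude of each sample is preserved; only its phase changes.
    """
    rotation = cmath.exp(1j * angle)
    return [s * rotation for s in samples]


# Hypothetical subband samples and their 90-degree rotated version.
original = [1 + 0j, 0 + 1j, 0.5 - 0.5j]
rotated = phase_rotate(original)

# Real part of the inner product between original and rotated samples.
real_inner = sum((a * b.conjugate()).real for a, b in zip(original, rotated))
```

For a 90-degree rotation, `real_inner` is zero up to rounding error, which is one way of seeing why the rotated signal can serve as a de-correlated signal without shifting the signal in time.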
In order to prevent discontinuities in the de-correlated signal from the transient on, a cross-fade is preferably applied between the integer delayed and the phase rotated signal. This cross-fade can be performed as:
$$d_{\mathrm{hybrid}}[n] = m[n]\,d_{\mathrm{delay}}[n] + (1 - m[n])\,d_{\mathrm{rotation}}[n]$$
where $n$ is a (subband) sample index, $m[n]$ is a mixing or cross-fade factor, $d_{\mathrm{delay}}[n]$ is the de-correlated (subband) signal formed by the frequency-dependent integer delay, $d_{\mathrm{rotation}}[n]$ is the de-correlated subband signal formed by the fractional delay or phase rotation and $d_{\mathrm{hybrid}}[n]$ is the resulting hybrid de-correlated signal. The mixing factor $m[n]$ becomes zero at the start of the transient. It then remains zero for a period of time typically corresponding to around 20 ms (approx. 12 ms for the length of the delay and 8 ms for the length of the transient). The fade-in from zero to one typically takes around 10-20 ms. The mixing factor $m[n]$ can be, but is not restricted to being, linear or piece-wise linear. Note that this mixing factor $m[n]$ can also be frequency dependent. As the delay is typically shorter for the higher frequencies, it is perceptually preferable to have shorter cross-fades for the higher frequencies than for the lower frequencies.
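The cross-fade described above may be sketched as follows. This is a hedged example only: the function names, the linear fade-in and the subband sample rate of 44100/64 Hz (a 64-band filter bank at 44.1 kHz) are assumptions made for illustration, and the mixing factor is taken to be one before the transient.

```python
def ms_to_subband_samples(duration_ms, subband_rate_hz=44100.0 / 64):
    """Convert a duration in milliseconds to a whole number of subband samples."""
    return max(0, int(round(duration_ms * 1e-3 * subband_rate_hz)))


def mixing_factor(num_samples, transient_index, hold, fade):
    """Build m[n]: one before the transient, zero for `hold` samples from the
    transient onward, then a linear fade-in to one over `fade` samples."""
    m = [1.0] * num_samples
    for n in range(max(0, transient_index), num_samples):
        elapsed = n - transient_index
        if elapsed < hold:
            m[n] = 0.0
        elif elapsed < hold + fade:
            m[n] = (elapsed - hold) / fade
        # Otherwise m[n] stays at one: the integer delayed signal is used.
    return m


def crossfade(d_delay, d_rotation, m):
    """Return d_hybrid[n] = m[n] * d_delay[n] + (1 - m[n]) * d_rotation[n]."""
    return [mn * a + (1.0 - mn) * b for mn, a, b in zip(m, d_delay, d_rotation)]


# Durations from the text, expressed in subband samples (assumed rate).
hold_samples = ms_to_subband_samples(20.0)
fade_samples = ms_to_subband_samples(15.0)

# Small hypothetical example with a transient at sample index 2.
m_example = mixing_factor(10, 2, hold=3, fade=4)
```

A frequency dependent cross-fade could be obtained by calling `mixing_factor` with shorter `hold` and `fade` values for the higher subbands.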
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word ‘comprising’ does not exclude the presence of other elements or steps than those listed in a claim. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a device claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Number | Date | Country | Kind |
---|---|---|---|
03076134.0 | Apr 2003 | EP | regional |
03076280.1 | Apr 2003 | EP | regional |

Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB04/50432 | 4/14/2004 | WO | | 10/12/2005 |