This application claims the priority under 35 U.S.C. §119 of European patent application no. 10250574.0, filed on Mar. 25, 2010, the contents of which are incorporated by reference herein.
The invention relates to multi-channel audio signal processing, in particular to a method of processing a multi-channel audio signal and to a signal processing device.
FM radio was invented in the 1940's and extended for stereo broadcasts in the 1960's. The demodulated FM-stereo signal comprises a mono audio signal (L+R), a pilot tone of 19 kHz and a stereo difference signal (L−R) modulated on a 38 kHz sub carrier, as illustrated schematically in
In stereo broadcast FM signals, the left (L) and right (R) channels are matrixed into sum (S) and difference (D) signals, i.e. S=(L+R)/2 and D=(L−R)/2. A mono FM receiver will use just the S signal. A stereo receiver will matrix the S and D signals to recover L and R: L=S+D and R=S−D. As shown in
The final multiplex signal from the stereo generator is the sum of the baseband audio signal 101, the pilot tone 102, and the DSBSC modulated subcarrier signal 103. This multiplex, along with any other subcarriers, is modulated by the FM transmitter.
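The matrixing and multiplex composition described above can be sketched as follows. This is a minimal illustration rather than a broadcast-grade stereo generator: the function names, the sample rate and the pilot level are assumptions chosen for the example, and the 38 kHz subcarrier is simply generated at twice the pilot frequency.

```python
import numpy as np

PILOT_HZ = 19_000.0       # pilot tone frequency
SUBCARRIER_HZ = 38_000.0  # suppressed subcarrier, twice the pilot frequency


def matrix_encode(left, right):
    """Return the sum and difference signals S=(L+R)/2 and D=(L-R)/2."""
    return (left + right) / 2.0, (left - right) / 2.0


def matrix_decode(s, d):
    """Recover left and right from sum and difference: L=S+D, R=S-D."""
    return s + d, s - d


def build_multiplex(left, right, fs, pilot_level=0.1):
    """Sum of baseband audio, pilot tone and DSBSC-modulated difference.

    pilot_level is an illustrative value, not taken from the text.
    """
    s, d = matrix_encode(left, right)
    t = np.arange(len(s)) / fs
    pilot = pilot_level * np.sin(2 * np.pi * PILOT_HZ * t)
    dsbsc = d * np.sin(2 * np.pi * SUBCARRIER_HZ * t)
    return s + pilot + dsbsc
```

With identical left and right inputs the difference signal vanishes, so the multiplex reduces to the mono signal plus the pilot tone, which is what a mono receiver relies on.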
In a typical FM receiver, an input signal is first subjected to a limiter in order to eliminate any amplitude modulation (AM) noise present in the signal. The output of the limiter is a square wave with a constant amplitude. The square wave is then sent through a bandpass filter with a centre frequency equal to the carrier frequency and a bandwidth equal to the bandwidth of the FM signal. The bandpass filter filters out the square wave harmonics and returns a constant-amplitude sinusoidal signal. The constant-amplitude FM signal is then differentiated. The instantaneous frequency is converted to an AM signal modulating the FM carrier function. An envelope detector extracts the amplitude, or envelope, of the input signal of interest. In this way the multiplex signal shown in
As a consequence of differentiation, white noise present in the input signal becomes frequency-dependent noise in the output signal. The RMS noise level is linearly proportional to frequency, so the power spectral density increases quadratically with frequency. This is described in more detail in “Information Transmission, Modulation, and Noise”, by M. Schwartz, 3rd ed., chapter 5-12 (reference [9] below).
Accordingly, the difference signal 103, which is present around the suppressed carrier at 38 kHz, is significantly more affected by noise than the mono sum signal 101 in the range up to 15 kHz. Receivers therefore tend to automatically switch to mono audio reproduction if the level of noise in a stereo signal is too high, since most of this noise will derive from the difference signal 103.
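The size of this effect can be estimated with a short calculation. The sketch below assumes an idealized noise power spectral density proportional to the square of the frequency and ignores pre-emphasis, de-emphasis and any receiver filtering; the band edges are those of the sum signal (0 to 15 kHz) and of the two sidebands of the difference signal (23 to 53 kHz). The function name is hypothetical.

```python
import math


def band_noise_power(f_lo_hz, f_hi_hz):
    """Integrate a PSD proportional to f**2 over [f_lo_hz, f_hi_hz].

    The proportionality constant is dropped because only ratios are used.
    """
    return (f_hi_hz ** 3 - f_lo_hz ** 3) / 3.0


sum_band_noise = band_noise_power(0.0, 15e3)     # mono sum signal band
diff_band_noise = band_noise_power(23e3, 53e3)   # both difference sidebands
ratio_db = 10.0 * math.log10(diff_band_noise / sum_band_noise)
print(f"difference band carries about {ratio_db:.1f} dB more noise power")
```

Under these assumptions the band occupied by the difference signal carries roughly 16 dB more noise power than the band occupied by the sum signal.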
An alternative method to that of switching off the difference signal has been proposed in US 2006/0280310 (reference [4] below), in which a frequency selective stereo to mono blending is used based on the masking effect of the human auditory system.
WO 2008/087577 (reference [1] below) discloses a system that also attempts to restore a reasonable stereo image while maintaining a low noise level, in which a stereo audio coding tool derived from a technique known as “Intensity Stereo” (IS) is used (disclosed in reference [3] below). According to this technique, instead of reinstating a noisy difference signal for creating a stereo signal, an estimated difference signal is constructed. This estimated difference signal is created in the frequency domain by calculating a gain factor for each frequency band. A difference signal is then obtained by multiplying the frequency domain representation of the sum signal by the envelope of calculated gain parameters.
Although the system disclosed in WO 2008/087577 can greatly improve the overall quality compared to either the stereo signal obtained by sum/difference reconstruction or the mono fallback option, it still has a number of disadvantages. Firstly, the technique used does not fully exploit knowledge currently available in audio coding tools. Intensity Stereo is a stereo coding tool that has been largely superseded by more powerful tools such as Parametric Stereo (disclosed in reference [2] below). Secondly, the channel conditions, and therefore the noise conditions, of the sum and difference signals will tend to vary over time. This knowledge is not fully exploited in WO 2008/087577, which instead proposes heuristic measures to account for noisy channel conditions. Thirdly, the system does not describe how to behave when channel conditions are either very poor or very good.
It is an object of the invention to address one or more of the above mentioned problems.
According to a first aspect of the invention there is provided a method of processing a multi-channel audio signal, the method comprising the steps of:
The first gain is optionally a complex-valued scaling factor, and may be calculated from a ratio of a complex-valued cross correlation between the sum and difference signals and the power of the sum signal.
The second gain may be calculated as a square root of a ratio of the residual signal power and the power of the sum signal.
The first and second gains may be set to a minimum when an estimate of signal to noise in the difference signal is below a set minimum threshold value.
The first and second gains may be set to a maximum when an estimate of signal to noise in the difference signal is above a set maximum threshold value.
The first and second gains may be set to a value between a minimum value and a maximum value, depending on the value of an estimate of signal to noise in the difference signal, when that estimate is between a set minimum threshold value and a set maximum threshold value.
The estimate of signal to noise in the difference signal may be a ratio calculated from a combination of real and imaginary parts of a filtered and demodulated version of the difference signal.
The multi-channel audio signal may be a frequency modulated signal comprising a baseband sum signal and a sideband modulated difference signal.
According to a second aspect of the invention there is provided a signal processing device for processing a multi-channel audio signal comprising an input sum signal representing a sum of a first audio signal and a second audio signal and an input difference signal representing a difference between the first and second audio signals, the device comprising:
The first gain is optionally a complex-valued scaling factor, and the parameter estimation module may be configured to calculate the first gain from a ratio of a complex-valued cross correlation between the sum and difference signals and the power of the sum signal.
The parameter estimation module may be configured to calculate the second gain as a square root of a ratio of the residual signal power and the power of the sum signal.
The parameter estimation module may be configured to set the first and second gains to a minimum when an estimate of signal to noise in the difference signal is below a set minimum threshold value.
The parameter estimation module may be configured to set the first and second gains to a maximum when an estimate of signal to noise in the difference signal is above a set maximum threshold value.
The parameter estimation module may be configured to set the first and second gains to a value between a minimum value and a maximum value, depending on the value of an estimate of signal to noise in the difference signal, when that estimate is between a set minimum threshold value and a set maximum threshold value.
The signal processing device may comprise a noise estimation module configured to provide the estimate of signal to noise in the difference signal from a ratio calculated from a combination of real and imaginary parts of a filtered and demodulated version of the difference signal.
The invention may be embodied as a computer program for instructing a computer to perform the method according to the first aspect. The computer program may be stored on a computer-readable medium such as a disc or memory. The computer may be a programmable microprocessor, application specific integrated circuit or a general purpose computer such as a personal computer.
Embodiments according to the invention comprise a number of improvements that can deliver a significant reduction in noise and improvement in output sound quality, in particular with respect to the system disclosed in WO 2008/087577. These improvements include:
i) the use of decorrelation in a similar way to current parametric stereo coding methods;
ii) the use of upmixing techniques that depend on the signal (or signal plus noise) to noise ratio of the difference signal, which is preferably applied in a time and frequency variant manner to allow upmixing to be applied to each Time/Frequency (T/F) tile depending on the local SNR of the T/F tile; and
iii) the use of a hybrid scheme where, for each T/F tile, a gradual transition is made from using an original difference signal, to using an estimated difference signal, to using no difference signal (i.e. a sum signal alone).
Details of exemplary embodiments according to aspects of the invention are described below with reference to the accompanying drawings, in which:
a is a schematic representation of power spectral density of a frequency modulated multiplex signal in the frequency domain;
b is a schematic representation of power spectral density of a complex filtered version of the signal of
c is a schematic representation of power spectral density of the signal of
d is a schematic representation of power spectral density of the real part of the signal in
e is a schematic representation of power spectral density of the imaginary part of the signal in
d′=gs·s+gsd·sd
In comparison with the way the difference signal is calculated in WO 2008/087577, the above relationship includes an additional decorrelated signal component term gsd·sd.
The gains gs, gsd can be calculated as a function of the power of the sum and difference signals s, d and a non-normalized cross-correlation between the sum and difference signal, according to the following relationships:
where
represents the complex-valued inner product of the signal vectors x, y. The parameter ε is a small positive value to prevent division by zero. Effectively, the parameter gs is calculated as the ratio of the complex-valued (complex-conjugate) cross-correlation between the sum/difference signal pair and the power of the sum signal, which provides the least-squares fit of the sum signal to the difference signal. The parameter gsd is calculated as the square root of the ratio of the residual signal power and the power of the sum signal.
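One way to read this description is sketched below. It is an assumption-laden illustration, not the relationships of the specification: the conjugation convention of the inner product, the definition of the residual power as the power of the difference signal minus the power explained by the least-squares fit, and the function name `estimate_gains` are all choices made for the example.

```python
import numpy as np


def estimate_gains(s, d, eps=1e-12):
    """Estimate gs and gsd for one time/frequency tile.

    s, d : complex (or real) arrays holding the sum and difference signals.
    eps  : small positive value that prevents division by zero.
    """
    power_s = np.vdot(s, s).real   # sum of |s|^2
    power_d = np.vdot(d, d).real   # sum of |d|^2
    cross = np.vdot(s, d)          # sum of conj(s) * d (assumed convention)

    # Least-squares gain that minimizes ||d - gs * s||^2.
    g_s = cross / (power_s + eps)

    # Power of d not explained by gs * s, clipped at zero for safety.
    residual = max(power_d - abs(g_s) ** 2 * power_s, 0.0)
    g_sd = np.sqrt(residual / (power_s + eps))
    return g_s, g_sd
```

If the difference signal is a scaled copy of the sum signal, all of its power is explained by gs and gsd is close to zero; if the two signals are uncorrelated, gs is close to zero and gsd carries the level of the difference signal relative to the sum signal.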
In parallel with the parameter estimation process, the sum signal s is also input to a decorrelation module 202, in which a decorrelated sum signal sd is derived that has a correlation with the sum signal s substantially close to zero and has approximately the same temporal and spectral shape as the sum signal s. The decorrelation module 202 can be implemented, for example, by means of all-pass filters or by reverberation circuitry. An example of a synthetic reverberator is given in Jot, J. M. & Chaigne, A. (1991), Digital Delay Networks for Designing Artificial Reverberators, 90th Convention of the Audio Engineering Society (AES), Preprint No. 3030, Paris, France (reference [5] below).
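As an illustration of the all-pass option, the sketch below cascades three Schroeder all-pass sections. The structure, the delay lengths and the feedback gain are assumptions chosen for the example and are not taken from the text. An all-pass cascade leaves the magnitude spectrum unchanged while scrambling the phase; how close the correlation with the input gets to zero depends on the chosen delays and gains, and a practical decorrelator would typically need a more careful design.

```python
import numpy as np


def allpass_section(x, delay, g):
    """Schroeder all-pass: y[n] = -g*x[n] + x[n-delay] + g*y[n-delay]."""
    x = np.asarray(x)
    y = np.zeros(len(x), dtype=np.result_type(x.dtype, np.float64))
    for n in range(len(x)):
        x_delayed = x[n - delay] if n >= delay else 0.0
        y_delayed = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + x_delayed + g * y_delayed
    return y


def decorrelate(s, delays=(17, 41, 73), g=0.5):
    """Pass s through a cascade of all-pass sections (illustrative values)."""
    s_d = np.asarray(s)
    for delay in delays:
        s_d = allpass_section(s_d, delay, g)
    return s_d
```

Feeding a unit impulse through `decorrelate` gives the impulse response of the cascade, whose discrete Fourier transform has a magnitude of approximately one at every frequency.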
After decorrelation and parameter estimation, the gains gs, gsd are applied to the sum signal s and the decorrelated sum signal sd by means of first and second amplifiers 203, 204. The output signals gs·s, gsd·sd from the amplifiers 203, 204 are provided to a summing module 205 and added together, resulting in a synthetic difference signal d′. The sum signal s and the synthetic difference signal d′ are then fed through a conventional sum and difference matrix module 206, which derives left and right audio signals l′, r′ according to the following relationship:
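The signal flow through the amplifiers, the summing module and the matrix module can be sketched as follows. The matrix used here, l′ = s + d′ and r′ = s − d′, is the conventional sum/difference matrix given in the background section and is assumed to be what module 206 applies; the function name is hypothetical.

```python
import numpy as np


def synthesize_stereo(s, s_d, g_s, g_sd):
    """Build the synthetic difference signal and matrix it to left/right.

    s    : sum signal
    s_d  : decorrelated sum signal
    g_s  : gain applied to s (may be complex in a sub-band domain)
    g_sd : gain applied to s_d
    """
    d_synth = g_s * s + g_sd * s_d   # d' = gs*s + gsd*sd
    left = s + d_synth               # assumed conventional matrix
    right = s - d_synth
    return left, right
```

With both gains set to zero the two outputs are identical, which corresponds to mono reproduction.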
The left and right signals l′, r′ are output by the sum/difference matrix module 206 to a de-emphasis filter module 207, which derives an output stereo signal. The de-emphasis module 207 operates to invert a pre-emphasis that is applied during the frequency modulation process. In alternative embodiments, the de-emphasis module may be applied to the input sum and difference signals s, d instead.
The processing described above is preferably conducted in a number of frequency bands in order to provide the highest fidelity. In each case, the input multiplexed time domain signals will need to be first converted to the frequency domain, and converted back to the time domain after processing. Frequency and time domain conversions may be carried out by discrete Fourier transformation (DFT, with a fast implementation using the FFT), as for example described in Moorer, The Use of the Phase Vocoder in Computer Music Applications, Journal of the Audio Engineering Society, Volume 26, Number 1/2, January/February 1978, pp 42-45 (reference [6] below), or applied to sub-band representations, for example by using Quadrature Mirror Filter (QMF) banks, as for example described in P. Ekstrand, Bandwidth Extension of Audio Signals by Spectral Band Replication, Proc. 1st IEEE Benelux Workshop on Model based Processing and Coding of Audio (MPCA-2002), Leuven, Belgium, Nov. 15, 2002 (reference [7] below), or warped Linear Predictive (LP) structures, as for example described in A. Härmä, M. Karjalainen, L. Savioja, V. Välimäki, U. K. Laine, and J. Huopaniemi, Frequency-warped signal processing for audio applications, J. Audio Eng. Soc., 48:1011-1031, 2000 (reference [8] below).
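A DFT-based conversion of this kind can be sketched with a windowed overlap-add scheme. The frame length, the square-root Hann window and the 50% overlap are assumptions made for the example; the text does not prescribe them. The `process` argument is a hypothetical hook where per-band processing of one frame's spectrum would take place.

```python
import numpy as np


def stft_process(x, n_fft=512, process=lambda spectrum: spectrum):
    """Analyse x frame by frame, apply `process` per frame, and resynthesize.

    Uses a periodic square-root Hann window for analysis and synthesis with
    a hop of n_fft/2, so that the squared windows of overlapping frames sum
    to one and an identity `process` reconstructs the input.
    """
    hop = n_fft // 2
    n = np.arange(n_fft)
    window = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / n_fft))
    y = np.zeros(len(x))
    for start in range(0, len(x) - n_fft + 1, hop):
        frame = x[start:start + n_fft] * window
        spectrum = process(np.fft.rfft(frame))           # frequency domain
        y[start:start + n_fft] += np.fft.irfft(spectrum, n_fft) * window
    return y
```

Only samples covered by two overlapping frames are reconstructed exactly; the first and last half frames would need padding in a complete implementation.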
According to a second embodiment, the signal processing device of the first embodiment may be extended by the use of noise information that can be derived from the difference signal d. A trade-off can be made between the signal attributes corresponding to a stereo image and to noisiness of the signal, which may to some extent be separable.
a, which is a reproduction of
The difference signal 303 is effectively available twice, once in the frequency range from 23 to 38 kHz and once in the frequency range from 38 to 53 kHz. Hence, using this knowledge, both the received difference signal, which consists of d+n, i.e. the original difference signal d plus an additional noise component n, and a signal nd, which is an approximation of the noise signal n, are available. The received difference signal and nd can be obtained as illustrated in
As a consequence, a ratio of the signal plus noise to the noise (SNNR) of the difference signal can be estimated.
The power of the received difference signal consists of the power of the original difference signal plus the power of the noise estimate, under the assumption that there is zero correlation between the difference signal and the positive and negative noise components. In practice, accidental correlations may exist, leading to deviations between the actual noise level of the difference signal and the noise estimate.
From the difference signal and the difference noise estimate, the SNNR can be estimated according to the following relationship:
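A plausible form of such an estimate is the ratio of the power of the received difference signal to the power of the noise estimate, expressed in decibels. The sketch below assumes exactly that form; the function name, the argument names and the regularization constant are hypothetical.

```python
import numpy as np


def estimate_snnr_db(d_received, n_d, eps=1e-12):
    """Signal-plus-noise to noise ratio of one time/frequency tile, in dB.

    d_received : received difference signal (original difference plus noise)
    n_d        : noise estimate derived from the difference signal
    eps        : small positive value that prevents division by zero
    """
    power_total = np.vdot(d_received, d_received).real
    power_noise = np.vdot(n_d, n_d).real
    return 10.0 * np.log10((power_total + eps) / (power_noise + eps))
```

When the difference signal consists of noise only, the two powers are approximately equal and the estimate is close to 0 dB.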
The SNNR can be used as a means to control the parameter estimation.
Use of the SNNR as control information is applicable in situations where the difference signal is overwhelmed by noise, i.e. where the SNNR is approximately 0 dB. In such cases, the estimated parameters gs, gsd are not employed, since they would be based solely on the noise signal. For example, the SNNR can be used to weight the gains gs and gsd such that, for an SNNR below a certain threshold, for example below 1 dB, the gains are set to 0, thereby yielding a mono signal. Within a specified range of SNNR values, for example between 1 dB and 5 dB, the estimated gains are scaled with a weight between 0 and 1. For SNNR values above a specified threshold, for example 5 dB, the gains are left unaltered. These relationships can be expressed as follows:
gs=gs,measured·ƒ1(SNNR)
gsd=gsd,measured·ƒ2(SNNR),
where ƒ1 and ƒ2 are functions having a range of between 0 and 1.
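A simple choice for the weighting functions is a linear ramp between the two thresholds. The sketch below assumes such a ramp and uses the same function for both gains; the 1 dB and 5 dB defaults are the example thresholds mentioned above, while the linear shape and the function names are assumptions.

```python
import numpy as np


def snnr_weight(snnr_db, lo_db=1.0, hi_db=5.0):
    """Weight in [0, 1]: 0 below lo_db, 1 above hi_db, linear in between."""
    return float(np.clip((snnr_db - lo_db) / (hi_db - lo_db), 0.0, 1.0))


def weight_gains(g_s_measured, g_sd_measured, snnr_db):
    """Scale the measured gains by the SNNR-dependent weight."""
    w = snnr_weight(snnr_db)
    return g_s_measured * w, g_sd_measured * w
```

For example, with these defaults an SNNR of 3 dB gives a weight of 0.5, so both measured gains are halved.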
As with the first embodiment, the above processing is preferably conducted in a time and frequency variant manner. The noise estimates may vary substantially from the actual noise levels for very small time and frequency tiles, since the noise estimate signal nd only provides an estimate of the actual noise signal n. Furthermore, due to poor reception conditions, such as multi-path reception effects, the noise estimate signal nd may substantially deviate from the actual noise signal. Therefore, the SNNR may be further processed to remove high-frequency variations.
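One common way to remove such rapid variations is a first-order recursive (one-pole) low-pass filter applied to the sequence of SNNR values of a given frequency band. The sketch below assumes this kind of smoothing; the text does not specify the filter, and the smoothing coefficient and function name are hypothetical.

```python
def smooth_snnr(snnr_db_frames, alpha=0.9):
    """Smooth per-frame SNNR values with a one-pole low-pass filter.

    alpha close to 1 gives heavier smoothing; the first frame initializes
    the filter state.
    """
    smoothed = []
    state = None
    for value in snnr_db_frames:
        if state is None:
            state = value
        else:
            state = alpha * state + (1.0 - alpha) * value
        smoothed.append(state)
    return smoothed
```

Heavier smoothing makes the gains more stable but slower to follow genuine changes in reception conditions, so the coefficient is a trade-off.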
According to a third embodiment, the device of the second embodiment can be adapted to also allow for scaling up to transparency for low noise levels. A signal processing device 500 according to the third embodiment is illustrated in
In this embodiment, as well as in the second embodiment, the use of a metric to control the behaviour of the parameter estimation module 201 is required. This metric does not necessarily need to be an SNNR estimate as detailed above, but could be a different metric that can be used to provide an estimate of signal to noise in the difference signal. An alternative metric may, for example, be a measure of a level of the received input signal. The use of SNNR is therefore a specific embodiment of a more general control metric that represents an estimate of signal to noise in the difference signal.
The mix matrix used by the sum/difference matrix module 506 for calculating the output signals l′, r′ then becomes the following:
The effect of this is that the gain gd and the combined gains of gs and gsd will operate in a complementary fashion.
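One possible reading of this complementary behaviour is sketched below: as the SNNR rises, the gain gd applied to the received difference signal ramps up while the weight on the synthetic difference signal ramps down by the same amount. The ramp shape, the 10 dB and 20 dB thresholds for gd, and all function and parameter names are assumptions made for illustration; the 1 dB and 5 dB values are the example thresholds of the second embodiment.

```python
import numpy as np


def ramp(x, lo, hi):
    """Linear ramp from 0 at lo to 1 at hi, clipped to [0, 1]."""
    return float(np.clip((x - lo) / (hi - lo), 0.0, 1.0))


def hybrid_difference(s, s_d, d_received, g_s, g_sd, snnr_db,
                      mono_db=1.0, synth_db=5.0,
                      direct_lo_db=10.0, direct_hi_db=20.0):
    """Blend received and synthetic difference signals for one tile."""
    g_d = ramp(snnr_db, direct_lo_db, direct_hi_db)          # received signal
    w_synth = ramp(snnr_db, mono_db, synth_db) * (1.0 - g_d)  # complement
    return g_d * d_received + w_synth * (g_s * s + g_sd * s_d)
```

The returned signal would then take the place of the synthetic difference signal in the sum/difference matrix. With these assumptions, a very low SNNR yields no difference signal (mono), an intermediate SNNR yields the synthetic difference signal, and a high SNNR passes the received difference signal unchanged.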
Other embodiments are within the scope of the invention, as defined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
10250574 | Mar 2010 | EP | regional |
Number | Name | Date | Kind |
---|---|---|---|
3811011 | Hardy et al. | May 1974 | A |
6169973 | Tsutsui et al. | Jan 2001 | B1 |
6563773 | Yanagisawa et al. | May 2003 | B1 |
7085331 | Seo | Aug 2006 | B2 |
7715567 | Wildhagen | May 2010 | B2 |
20030055636 | Katuo et al. | Mar 2003 | A1 |
20050207585 | Christoph | Sep 2005 | A1 |
20050254446 | Breebaart | Nov 2005 | A1 |
20060190247 | Lindblom | Aug 2006 | A1 |
20060206316 | Sung et al. | Sep 2006 | A1 |
20060233380 | Holzer et al. | Oct 2006 | A1 |
20070194952 | Breebaart et al. | Aug 2007 | A1 |
20070236858 | Disch et al. | Oct 2007 | A1 |
20080048628 | Lee | Feb 2008 | A1 |
20100014679 | Kim et al. | Jan 2010 | A1 |
20100023335 | Szczerba et al. | Jan 2010 | A1 |
20110106543 | Jaillet et al. | May 2011 | A1 |
20130231940 | Ehara | Sep 2013 | A1 |
Number | Date | Country |
---|---|---|
1197958 | Nov 1998 | CN |
1 206 043 | May 2002 | EP |
2006148647 | Jun 2006 | JP |
03090206 | Oct 2003 | WO |
2008087577 | Jul 2008 | WO |
Entry |
---|
Breebaart, J. et al., “Parametric Coding of Stereo Audio,” J. Applied Signal Processing, vol. 9, pp. 1305-1322 (2004). |
Moorer, J. “The Use of the Phase Vocoder in Computer Music Applications,” J. Audio Engineering Society, vol. 26, No. 1/2, pp. 42-45 (Jan. 1978). |
Jot, J-M., “Digital Delay Networks for Designing Artificial Reverberators,” 90th Convention Audio Engineering Society, Preprint No. 3030 (E-2), 17 pgs. (Feb. 1991). |
Herre J. et al., “Intensity Stereo Coding,” 96th Convention Audio Engineering Society, Preprint No. 3799 (P3.9), 11 pgs. (Feb. 1994). |
Harma, A., et al. “Frequency-Warped Signal Processing for Audio Applications,” J. Audio Engineering Society, vol. 48, No. 11, pp. 1011-1031 (Nov. 2000). |
Ekstrand, P. “Bandwidth Extension of Audio Signals by Spectral Band Replication,” Proc. 1st IEEE Benelux Workshop on Model Based Processing and Coding of Audio, pp. 53-58 (Nov. 15, 2002). |
Extended European Search Report for Patent Appln. No. 10250574.0 (Jan. 18, 2011). |
Number | Date | Country | |
---|---|---|---|
20110235809 A1 | Sep 2011 | US |