The present invention relates to digital audio signal processing, and more particularly to loudspeaker and headphone virtualization and cross-talk cancellation devices and methods.
Multi-channel audio inputs designed for multiple loudspeakers can be processed to drive a single pair of loudspeakers and/or headphones to provide a perceived sound field simulating that of the multiple loudspeakers. In addition to creation of such virtual speakers for surround sound effects, signal processing can also provide changes in perceived listening room size and shape by control of effects such as reverberation.
Multi-channel audio is an important feature of DVD players and home entertainment systems. It provides a more realistic sound experience than is possible with conventional stereophonic systems by roughly approximating the speaker configuration found in movie theaters.
e shows functional blocks of an implementation for the (real plus virtual) speaker arrangement of
Note that the dependence of H1 and H2 on the angle that the speakers are offset from the facing direction of the listener has been omitted.
yields Y1=E1 and Y2=E2.
An efficient implementation of the cross-talk canceller diagonalizes the 2×2 matrix having elements H1 and H2:
where $M_0(e^{j\omega}) = H_1(e^{j\omega}) + H_2(e^{j\omega})$ and $S_0(e^{j\omega}) = H_1(e^{j\omega}) - H_2(e^{j\omega})$. Thus the inverse becomes simple to compute:
And the cross-talk cancellation is efficiently implemented as sum/difference detectors with the inverse filters $1/M_0(e^{j\omega})$ and $1/S_0(e^{j\omega})$, as shown in
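The sum/difference structure can be checked numerically at a single frequency bin. The following is a minimal sketch, not the patented implementation: the function name `shuffler_cancel` is hypothetical, the complex values are arbitrary placeholders rather than measured HRTFs, and H1 is assumed to be the diagonal (same-side) element of the 2×2 matrix with H2 off-diagonal.

```python
def shuffler_cancel(e1, e2, h1, h2):
    """Sum/difference cross-talk canceller at one frequency bin.

    e1, e2: desired ear signals; h1, h2: diagonal and off-diagonal
    elements of the symmetric 2x2 acoustic matrix (assumed layout).
    Returns the speaker signals (x1, x2).
    """
    m0 = h1 + h2          # sum-path transfer function M0
    s0 = h1 - h2          # difference-path transfer function S0
    m = (e1 + e2) / m0    # sum signal through inverse filter 1/M0
    s = (e1 - e2) / s0    # difference signal through inverse filter 1/S0
    return (m + s) / 2, (m - s) / 2


# Placeholder values for illustration only (not measured HRTFs).
h1, h2 = 1.0 + 0.2j, 0.4 - 0.3j
e1, e2 = 0.7 + 0.1j, -0.2 + 0.5j
x1, x2 = shuffler_cancel(e1, e2, h1, h2)

# Acoustic paths: each ear receives both speakers.
y1 = h1 * x1 + h2 * x2
y2 = h2 * x1 + h1 * x2
```

Under these assumptions the ear signals `y1` and `y2` reproduce `e1` and `e2` up to rounding, and only two divisions (by M0 and S0) are needed per bin.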
However, a practical problem arises in the actual implementation due to approximate nulls in the transfer functions $M_0(e^{j\omega}) = H_1(e^{j\omega}) + H_2(e^{j\omega})$ and $S_0(e^{j\omega}) = H_1(e^{j\omega}) - H_2(e^{j\omega})$. The implementation of such filters would require considerable dynamic range reduction in order to avoid saturation about frequencies with response peaks. For example, with two real speakers each 30 degrees offset as in
has the form illustrated by
Now with cross-talk cancellation, the
For example, the left surround sound virtual speaker could be at an azimuthal angle of about 225 degrees. Thus with cross-talk cancellation, the corresponding two real speaker inputs to create the virtual left surround sound speaker would be:
where H1, H2 are for the left and right real speaker angles (e.g., 30 and 330 degrees), LSS is the (short-term Fourier transform of the) left surround sound signal, and TF3Left=H1(225), TF3Right=H2(225) are the HRTFs for the left surround sound speaker angle (225 degrees).
Again,
The conventional scheme for reducing the computational cost of multi-channel audio processing minimizes the number of calculations in each FIR filtering process but does not address the significant overhead introduced by multi-channel processing. The scheme can be described as a set of S×2 filters, where S is the number of sources.
The present invention provides one or more of the following: (a) speaker virtualization with separate frequency bands virtualized at differing directions but with adjacent bands at adjacent directions; (b) combined cross-talk cancellation and virtualizer filters for headphone or speaker applications; (c) a rear surround sound virtual speaker by psychoacoustic reflection; (d) separation of FIR filters into sections corresponding to early arrivals and late reverberation, with the late reverberation section shared by all filters; and (e) a cross-talk canceling shuffler with simplified contra-lateral response.
FIGS. 1a-1g show preferred embodiment filters and method flowcharts.
FIGS. 2a-2h illustrate head-related acoustic transfer function and virtualizer geometries.
FIGS. 4a-4b show shuffler cross-talk canceller arrangements.
Preferred embodiment virtualizers and virtualization methods for multi-channel audio include filtering adapted to switching between loudspeakers and headphones, simplified reverberation by a common long-delay portion for all channels, cross-talk cancellation shuffler implementation with simplified inverse sum, Bark band based virtual locations for 2-channel input, and divided out peak frequencies for cross-talk cancellation simplification.
Preferred embodiment systems (e.g., home stereo sound systems, computer sound systems, et cetera) perform preferred embodiment methods with any of several types of hardware: digital signal processors (DSPs), general purpose programmable processors, application specific circuits, or systems on a chip (SoC) such as combinations of a DSP and a RISC processor together with various specialized programmable accelerators such as for FFTs and variable length coding (VLC). A stored program in an onboard or external flash EEPROM or FRAM could implement the signal processing.
If the two real speakers of
A block diagram is shown in
FIGS. 2e-2f show functional blocks for 5-speaker virtualizers using either a pair of real loudspeakers or a set of headphones, respectively. These are identical except for the cross-talk canceller in
Consider a single channel, say left surround: the left input to the cross-talk canceller is the left surround signal, LSS, filtered by TF3Left, and the right input is LSS filtered by TF3Right. Thus the output of the cross-talk canceller, which is input to the real speakers, is as previously noted:
Then multiply everything out to get:
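$$X_1 = \frac{H_1\,\mathrm{TF3Left} - H_2\,\mathrm{TF3Right}}{H_1^2 - H_2^2}\,\mathrm{LSS}$$
$$X_2 = \frac{H_1\,\mathrm{TF3Right} - H_2\,\mathrm{TF3Left}}{H_1^2 - H_2^2}\,\mathrm{LSS}$$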
By using these separate channel cross-talk canceling filters (SCCTC filters), cross-talk cancellation can be applied to any input using the functional blocks in
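$$\mathrm{TF3Left} \rightarrow \frac{H_1\,\mathrm{TF3Left} - H_2\,\mathrm{TF3Right}}{H_1^2 - H_2^2}$$
$$\mathrm{TF3Right} \rightarrow \frac{H_1\,\mathrm{TF3Right} - H_2\,\mathrm{TF3Left}}{H_1^2 - H_2^2}$$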
where H1, H2 relate to the location of the two real speakers.
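The substitution above can be illustrated at one frequency bin. This is a minimal sketch under stated assumptions: the helper name `scctc_filters` is hypothetical, all complex values are arbitrary placeholders rather than measured HRTFs, and the symmetric two-speaker layout of the text is assumed.

```python
def scctc_filters(tf_left, tf_right, h1, h2):
    """Fold the cross-talk canceller into one channel's virtualizer pair.

    Returns the replacement filters for TF3Left and TF3Right at one bin.
    """
    det = h1 * h1 - h2 * h2   # near-zero values here cause the dynamic-range loss
    new_left = (h1 * tf_left - h2 * tf_right) / det
    new_right = (h1 * tf_right - h2 * tf_left) / det
    return new_left, new_right


# Placeholder values for illustration only.
h1, h2 = 0.9 + 0.1j, 0.3 - 0.2j
tf3_left, tf3_right = 0.5 + 0.4j, 0.2 - 0.1j
lss = 1.0 + 0.0j

f_left, f_right = scctc_filters(tf3_left, tf3_right, h1, h2)
x1, x2 = f_left * lss, f_right * lss   # real-speaker inputs

# Signals arriving at the ears through the real acoustic paths.
ear_left = h1 * x1 + h2 * x2
ear_right = h2 * x1 + h1 * x2
```

With these inputs the ears receive TF3Left·LSS and TF3Right·LSS, which is what a physical speaker at the virtual location would deliver.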
The SCCTC filters used for other channel inputs will be analogous but using the corresponding filters in place of the TF3Left and TF3Right filters. In practice, however, applying this at every frequency results in a loss of dynamic range due to approximate nulls of $H_1^2 - H_2^2$. To cope with this problem, the preferred embodiment can be combined with the preferred embodiment as illustrated in
c illustrates a preferred embodiment virtualizer which takes two-channel (stereo) input and locates a separate virtual speaker for each Bark frequency band of the input with the virtual speakers spread over a range of angles. That is, in contrast to creation of a virtual speaker for each channel of a multi-channel audio input, a two-channel input can be spread out to give special effects somewhat akin to virtualized multi-channel input. This is a particularly effective approach to two-channel input speaker virtualization and divides the input signals into different frequency bands and places each band at its own location. To maintain continuity, adjacent bands are placed in adjacent directions, although strictly this is not required. Placing different frequency bands at such locations can be thought of as similar to a rainbow effect, since a prism also divides the frequencies of light into adjacent positions.
This “rainbow” virtualizer can be thought of as consisting of a series of low-pass, band-pass and high-pass filters with cut-off frequencies corresponding to standard Bark bands which are listed in
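$$E_{right} = \sum_{1 \le n \le 25} \left[ H_1(92.5 - 2.5n)\,BP(n)\,S_{right} + H_2(267.5 + 2.5n)\,BP(n)\,S_{left} \right]$$
$$E_{left} = \sum_{1 \le n \le 25} \left[ H_3(267.5 + 2.5n)\,BP(n)\,S_{left} + H_4(92.5 - 2.5n)\,BP(n)\,S_{right} \right]$$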
where the two input channels are Sleft and Sright and BP(n) is a bandpass filter for the nth Bark band. Of course, by symmetry H1(92.5−2.5n)=H3(267.5+2.5n) and H2(92.5−2.5n)=H4(267.5+2.5n). Further, the inputs Sleft and Sright factor out of the sums, so the filters can be combined into four artificial “rainbow” HRTFs defined as:
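$$\mathrm{TF}_{\text{left-to-right}} = \sum_{1 \le n \le 25} H_2(267.5 + 2.5n)\,BP(n)$$
$$\mathrm{TF}_{\text{right-to-right}} = \sum_{1 \le n \le 25} H_1(92.5 - 2.5n)\,BP(n)$$
$$\mathrm{TF}_{\text{left-to-left}} = \sum_{1 \le n \le 25} H_3(267.5 + 2.5n)\,BP(n)$$
$$\mathrm{TF}_{\text{right-to-left}} = \sum_{1 \le n \le 25} H_4(92.5 - 2.5n)\,BP(n)$$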
Again by symmetry TFleft-to-left=TFright-to-right, TFleft-to-right=TFright-to-left.
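The band-to-angle mapping and the summation that forms a rainbow HRTF can be sketched as follows. This is only an illustration: `band_angles`, `rainbow_tf`, `toy_hrtf` and `toy_bp` are hypothetical names, the toy HRTF is not a real head response, and the band-pass filters are idealized so that exactly one band passes at the frequency being evaluated.

```python
def band_angles(n):
    """Virtual-speaker azimuths in degrees for Bark band n, 1 <= n <= 25."""
    right = 92.5 - 2.5 * n    # 90 degrees for n=1 down to 30 degrees for n=25
    left = 267.5 + 2.5 * n    # 270 degrees for n=1 up to 330 degrees for n=25
    return right, left


def rainbow_tf(hrtf, bandpass, side):
    """Sum hrtf(angle_n) * bandpass(n) over the 25 bands at one frequency.

    hrtf(angle) and bandpass(n) are caller-supplied callables returning
    the responses at the frequency of interest.
    """
    idx = 0 if side == "right" else 1
    return sum(hrtf(band_angles(n)[idx]) * bandpass(n) for n in range(1, 26))


# Toy stand-ins (assumptions, not measured data).
def toy_hrtf(angle):
    return complex(angle / 360.0, 0.0)


ACTIVE_BAND = 7  # the single band that passes at the evaluated frequency


def toy_bp(n):
    return 1.0 if n == ACTIVE_BAND else 0.0


tf_right = rainbow_tf(toy_hrtf, toy_bp, "right")
```

For every band the two azimuths sum to 360 degrees, which is the left/right mirror symmetry the text relies on.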
HRTFs for every 5 degrees azimuth in the horizontal plane have been published as noted in the background. The remaining HRTFs can be obtained using interpolation. The lowest Bark band (0-100 Hz) is the farthest from the facing direction, and higher Bark bands become progressively more centered as shown in
Also, the rainbow HRTF pair can be combined with the cross-talk canceller to produce the four filters in
Another useful configuration is to pass high frequencies directly to the two real speakers which helps focus the effect on the mid to lower frequencies, as shown in
Although the principal advantage of this approach is to create a pleasant wider sound, the act of separating frequency bands makes it simple to equalize the sound to better match the original. The first implementation achieved a wide pleasant sound, but with noticeable timbre differences for certain brass instruments (which became more nasal) and some loss of bass. By weighting each Bark band when creating the rainbow HRTF pair, these tonal differences can be minimized through equalization while maintaining the desired effect. A different version which combined Bark bands and used fewer HRTF angles (placed every 5 degrees) also produced a good effect, but was less easy to equalize since the frequency bands were larger.
b illustrates a preferred embodiment cross-talk cancellation shuffler implementation. Non-directional and directional components of a stereophonic signal can be roughly separated through the calculation of the sum and difference signals between left and right channels. Conveniently, this process is performed at the beginning of the shuffler cross-talk cancellation scheme, as shown in
In terms of transfer function matrices, the inverse transform implemented by the preferred embodiment of
The forward transform that describes the hypothetical transformations suffered by the sound waves can be obtained by inverting the foregoing inverse, which results in:
This can be interpreted as the superposition of a constant and non-directional component k with a directional component S0=H1−H2 that produces opposite effects on the ipsi-lateral and contra-lateral paths. Note that if we replace k by M0, the original shuffler equations are recovered.
Also, if the HRTF matrix is applied to the preferred embodiment cross-talk canceller of
By defining F=(H1−H2)/k, we can rewrite this as
$$2Y_1 = F(E_1 + E_2) + E_1 - E_2$$
$$2Y_2 = F(E_1 + E_2) - E_1 + E_2$$
Note that in a situation where F=1 (i.e., the HRTFs are flat and k is adjusted accordingly), we obtain Y1=E1 and Y2=E2, characterizing an ideal cross-talk cancellation effect.
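The two relations above are easy to evaluate directly. The sketch below is illustrative only: the function name is hypothetical and the signal values are arbitrary placeholders at one frequency bin.

```python
def simplified_shuffler_output(e1, e2, f):
    """Ear signals from 2*Y1 = F(E1+E2)+E1-E2 and 2*Y2 = F(E1+E2)-E1+E2."""
    y1 = (f * (e1 + e2) + e1 - e2) / 2
    y2 = (f * (e1 + e2) - e1 + e2) / 2
    return y1, y2


e1, e2 = 0.6 + 0.2j, -0.1 + 0.4j      # placeholder ear-signal targets

# Ideal case F = 1: the targets are reproduced.
y1_ideal, y2_ideal = simplified_shuffler_output(e1, e2, 1.0)

# Non-ideal case: an arbitrary placeholder value for F.
f = 0.8 - 0.1j
y1, y2 = simplified_shuffler_output(e1, e2, f)
```

Adding and subtracting the two relations shows that Y1 − Y2 = E1 − E2 for any F, while Y1 + Y2 = F(E1 + E2). In this simplified scheme the difference component is therefore cancelled exactly and only the sum component is shaped by F.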
In preferred embodiments with multiple audio channels (for real and/or virtual speakers) each reverberation filter is subdivided into an early arrival section and a shared late reverberation section. The size of the early arrival section can be on the order of 100 coefficients and can be made even shorter by approximating it by a delay followed by a minimum-phase filter; 100 coefficients would correspond to about 2 ms at a 48 kHz sampling rate. The late reverberation section may contain around 8K coefficients in a typical room model with up to 8th-order reflections. The early arrival section is processed in a manner similar to that of
The preferred embodiment achieves significant computational savings due to the large late reverberation filter section that is executed only once per output channel. For example, consider the case of 5 input channels and a full reverberation filter containing 8K (8192) coefficients. Each one can be divided into an early arrival section containing 128 coefficients and a late reverberation section containing 8064 coefficients. Using the conventional scheme, the total number of taps would be 10×8192=81920. With the preferred embodiment scheme, the number of taps would be 10×128+8064×2=17408, which is only about 21% of the conventional scheme. Other obvious advantages relate to the amount of memory that is saved by reducing the number of filter coefficients.
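The tap counts in this example can be reproduced with a few lines of arithmetic. The helper name `tap_counts` and its parameters are hypothetical. The sketch assumes, as in the text, one filter per input channel per output channel and one shared late section per output channel.

```python
def tap_counts(channels, outputs, full_len, early_len):
    """Return (conventional, shared) total tap counts."""
    late_len = full_len - early_len
    filters = channels * outputs                  # S x 2 filters in the text
    conventional = filters * full_len             # every filter at full length
    shared = filters * early_len + outputs * late_len
    return conventional, shared


conventional, shared = tap_counts(channels=5, outputs=2,
                                  full_len=8192, early_len=128)
ratio = shared / conventional
```

Here `conventional` is 81920, `shared` is 17408, and `ratio` is 0.2125, matching the figure of about 21% quoted above.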
Implementing the preferred embodiment consists of designing the late reverberation filter that is shared by all input channels. Straightforward solutions include taking the average across late reverberation filters, selecting one of the late reverberation sections of the full reverberation filters, or choosing a subset of reflections from the original filters and combining them. In all cases, the final energy for each channel can be adjusted to have the same value as the original filter section by adjusting the parameter $k_{ci}$, where i = 0, ..., 4. Energy is defined as the square root of the mean square of the coefficients. Different delays are also introduced in each late reverberation filter section using the parameter $d_{ci}$, and they are obtained directly from the original reverberation filter. The gain and delay for each channel i are represented as $k_{ci} \times z^{-d_{ci}}$ in
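One way to realize the per-channel gain and delay is sketched below. This is a sketch under stated assumptions, not the patent's implementation: the function names are hypothetical, the coefficient lists are toy values, and the gain is taken as the ratio of the RMS energies defined in the text.

```python
import math


def rms(coeffs):
    """Energy as defined in the text: square root of the mean square."""
    return math.sqrt(sum(c * c for c in coeffs) / len(coeffs))


def channel_gain(original_late, shared_late):
    """Gain k_ci that makes the scaled shared section match the original's energy."""
    return rms(original_late) / rms(shared_late)


def apply_gain_delay(x, k, d):
    """Apply k * z^-d: delay x by d samples and scale, keeping the input length."""
    return ([0.0] * d + [k * v for v in x])[:len(x)]


# Toy coefficient sets (placeholders, not a real room response).
original_late = [0.5, -0.25, 0.125, -0.0625]
shared_late = [0.2, -0.1, 0.05, -0.025]

k = channel_gain(original_late, shared_late)
scaled = [k * c for c in shared_late]
delayed = apply_gain_delay([1.0, 2.0, 3.0], 2.0, 1)
```

Because the gain and delay are applied to each channel's signal before the shared section, the long late filter itself only needs to run once per output.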
The preferred embodiments can be modified in various ways while retaining one or more of the features of Bark band virtualization, common reverberation for multichannel audio, high frequencies divided out in cross-talk cancellation, and cross-talk cancellation filters combined with multi-channel filters.
For example, the two real loudspeakers can be asymmetrically oriented with respect to the listener which implies four distinct acoustic paths from loudspeaker to ear instead of two and thus an asymmetrical 2×2 matrix to invert for cross-talk cancellation. Similarly, three or more loudspeakers imply six or more acoustic paths and non-square matrices with matrix pseudoinverses to be used for cross-talk cancellations.
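For the asymmetric two-speaker case, the four distinct paths give a general 2×2 matrix whose inverse can still be written in closed form. The following single-bin sketch is an illustration only: the function and parameter names are hypothetical (`h_ab` denotes the path from speaker a to ear b), and the numeric values are arbitrary placeholders.

```python
def cancel_asymmetric(e_left, e_right, h_ll, h_rl, h_lr, h_rr):
    """Invert the general 2x2 acoustic matrix at one frequency bin.

    Ear model assumed here:
        y_left  = h_ll * x_left + h_rl * x_right
        y_right = h_lr * x_left + h_rr * x_right
    """
    det = h_ll * h_rr - h_rl * h_lr   # must be non-zero for the inverse to exist
    x_left = (h_rr * e_left - h_rl * e_right) / det
    x_right = (h_ll * e_right - h_lr * e_left) / det
    return x_left, x_right


# Placeholder path responses and targets.
h_ll, h_rl = 1.0 + 0.1j, 0.35 - 0.2j
h_lr, h_rr = 0.45 + 0.15j, 0.8 - 0.05j
e_left, e_right = 0.3 + 0.6j, -0.5 + 0.2j

x_left, x_right = cancel_asymmetric(e_left, e_right, h_ll, h_rl, h_lr, h_rr)
y_left = h_ll * x_left + h_rl * x_right
y_right = h_lr * x_left + h_rr * x_right
```

The sum/difference shortcut no longer applies here because the matrix is not symmetric. With three or more loudspeakers the matrix is non-square and a pseudoinverse replaces this closed form.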
Analogously, the virtual locations of Bark bands could be varied so more or fewer high frequencies could be combined, and the Bark bands could be replaced with other decompositions of the audio spectrum into three or more bands.
Similarly, the partition of filters into early and late portions could differ from the partition of the first 128 ($=2^7$) taps for the early portion and the remaining 8064 of the total 8192 ($=2^{13}$) taps for the late portion. For example, the early portion could be anywhere from the first 1% to the first 10% of the total taps.
This application claims priority from provisional patent applications Nos. 60/657,234, filed Feb. 28, 2005 and 60/756,065, filed Jan. 4, 2006. The following co-assigned copending applications disclose related subject matter: application Ser. No. 11/125,927, filed May 10, 2005.