METHOD AND APPARATUS FOR PROVIDING 3D SOUND FOR SURROUND SOUND CONFIGURATIONS

Description

BACKGROUND

Surround Sound (e.g., ATSC A/52 5.1) can only place sound at 5 places (5.1) (FIG. 1), at 7 places (7.1), or on the line between those places (panning). 3D Sound (e.g., BACCH and other cross-talk cancellers) allows Binaural Audio with 2 Loudspeakers (B2L), such that two speakers can place a sound anywhere in 3D space. Accordingly, when using 3D Sound, the other 3 speakers in 5.1, or the other 5 speakers in 7.1, are not needed to place a sound in 3D space.

However, a large number of console gamers have 5.1 surround sound setups. These gamers currently get more value out of 5 speakers than they do out of 2 speakers, and would like to continue to use all 5 speakers while keeping that added value.

Accordingly, there is a need to accomplish the 3D effects of 3D Sound while shaking more than two speakers, and preferably all 5/7 speakers. Preferably, the shaking of all 5/7 speakers is accomplished in a manner that actually makes the 5/7 speaker solution sound better.

Listening to Binaural Audio from the output signals of a cross-talk canceller fed to two (XTC) Speakers in front of the listener (FIG. 2) is desirable. With Headphones, if HRTF mismatch between listener and recording occurs, the sound collapses to inside the listener's head. With BACCH-SP, and two XTC speakers in front of the listener (FIG. 2), if HRTF mismatch between listener and recording occurs, the sound collapses to the stereo pan (speaker locations). The sound is still outside the listener's head and in front of the listener, which is the correct position for on-screen action.

Listening to binaural Audio with two XTC Speakers behind the listener (FIG. 3) is also enjoyable (e.g., in an automobile, in a video gaming chair, and/or speakers in a room). Again, with Headphones, if HRTF mismatch between listener and recording occurs, the sound collapses to inside the listener's head. With BACCH-SP, and the XTC speakers behind the listener, if HRTF mismatch between listener and recording occurs, the sound collapses to the stereo pan (speaker locations). The pair of rear speakers can be crosstalk cancelled and used to create the sound behind the listener.

One of the reasons for imperfect perception of the placement of a sound in 3D space when using a 5.1 or 7.1 set-up can be described by comparing a Typical set-up (typical) (FIG. 4) with the Coherent set-up (perfect) 5.1 (FIG. 5), as follows:

Typical 5.1 (FIG. 4): The Left and Right speakers are the only speakers with full size, full power, and full range; other channels (speakers) are special satellites and should be used for special effects; the speakers are positioned at non-uniform distances, e.g., the speakers are arranged in a square; when multiple listeners are in the space, most of the listeners are off-center; the Left-Right Speaker angle is about +/−30 degrees; the distance from listener to speakers is undefined; and/or the distance from the microphone to the source might or might not meet P&E recommendations of 6.5-7.5 feet.

Coherent 5.1 (FIG. 5): Identical Speakers are used. Identical Amplification is used. Identical Response of speakers is accomplished. Uniform distances are implemented. The speakers are arranged in a circle. There is a single listener in the center. The Left-Right speaker angle is exactly +/−22.5 or +/−30 degrees. The distance from listener to speakers meets P&E recommendations of 6.5-7.5 feet. The distance from microphone to source meets P&E recommendations of 6.5-7.5 feet.

Even though experts argue that the assumption that sound is always superimposable is not true, relying on this assumption typically works.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 shows a listener listening to surround sound with a speaker placement in accordance with ATSC A/52.

FIG. 2 shows a listener listening to binaural audio from the output signals of a cross-talk canceller fed to two (XTC) speakers in front of the listener.

FIG. 3 shows a listener listening to binaural Audio with two XTC Speakers behind the listener.

FIG. 4 shows a Typical surround sound speaker set-up (typical), where the Left and Right speakers are the only speakers with full size, full power, and full range; other channels (speakers) are special satellites and should be used for special effects and the speakers are positioned at non-uniform distances.

FIG. 5 shows a Coherent surround sound speaker set-up (perfect) 5.1, where Identical Speakers are used, Identical Amplification is used, Identical Response of speakers is accomplished, and Uniform distances are implemented.

FIG. 6 shows an embodiment, where 3D Sounds (e.g., BACCH-SP) in the front hemisphere are sent to front XTC speakers of a 5.1 speaker set-up, and 3D Sounds in the rear hemisphere are sent to the rear XTC speakers of the 5.1 speaker set-up.

FIG. 7 shows an embodiment for providing 3D Sound with a 5.1 speaker set-up, where a BACCH-SP can be used for the front speakers of the 5.1 set-up, a BACCH-SP can be used for the rear speakers of the 5.1 set-up, and a Line-Out can be used for the Center speaker of the 5.1 set-up.

FIG. 8 shows planes of elevation, each with a ring of Azimuth points, and one 90° elevation “north pole” point.

FIG. 9 shows a sphere around the listener vertically sliced, where the Mid Crossover Point (MCX), a.k.a. the 90 degree line, passes through the listener's ears, the Rear Crossover Point (RCX) is the point where the sound backward of the RCX should be panned completely to the rear speaker, and the Front Crossover Point (FCX) is the point where the sound forward of the FCX should be panned to the front speakers.

FIG. 10 shows the wide chop set.

FIG. 11 shows the wall chop set.

FIG. 12 shows the tight chop set.

FIG. 13 shows the slide chop set.

FIG. 14 shows the region of “The NoseCone.”

FIG. 15 shows “The NoseCone” set.

FIG. 16 shows a circuit for applying XTC filtering in accordance with an embodiment of the invention.

DETAILED DISCLOSURE

In an embodiment, 3D Sounds (e.g., BACCH-SP) in the front hemisphere are sent to front XTC speakers of a 5.1 speaker set-up (FIG. 6) and 3D Sounds in the rear hemisphere are sent to the rear XTC speakers of the 5.1 speaker set-up. Since with BACCH-SP, when HRTF mismatch between listener and recording occurs the sound collapses to the speaker locations, having physical rear speakers in the 5.1/7.1 speaker set-up assures that rear sounds stay in the rear, giving value to a 5.1 speaker setup in a 3D world.

Crosstalk cancelling filters (XTC) can be generated in a Normal fashion such that sounds placed anywhere are of equal spectral flatness. Alternatively, XTC filters can be generated in a Narrow fashion such that a sound that has an at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, 100%, 95-96%, 96-97%, 97-98%, 98-99%, 99-100%, 95-100%, 96-100%, 97-100%, 98-100%, and/or 99-100% correlation between Left and Right (as in a mono source placed dead center) will appear to be 3-4 dB lower than expected. This is referred to as a Narrow filter or an XTC filter with a center hole. A Narrow filter can be intentionally generated so that the traditional Center channel speaker is given a job to do in filling the center hole. Alternatively, a Normal filter can be used and the Center channel can be unused or used for an unrelated purpose, such as a traditional surround purpose or a channel dedicated to dialogue.

Using Normal Span allows the full expected volume on Front Left and Front Right to be preserved. Mixing sounds that are 100% correlated between left and right (a narrow range of sounds near center) into the physical center speaker of the 5.1 speaker set-up will fill a portion of the annular band between left and right, which can be referred to as the center hole, where the binaural signals with 100% correlation between left and right are 3 dB down. In specific embodiments, the 100% correlated criterion is treated as met when the signals are at least 97%, at least 98%, or at least 99% correlated. Mixing sounds to the center speaker also “shakes the center speaker,” fully utilizing all of the 5.1 speakers.
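The center-hole routing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names `lr_correlation` and `center_feed`, the block-wise processing, the use of a zero-lag normalized correlation, and the choice to feed the left/right average scaled by a hypothetical `fill_gain` are all assumptions made for the example. Only the correlation thresholds (97% to 100%) come from the text.

```python
import math

def lr_correlation(left, right):
    """Normalized zero-lag correlation of two equal-length blocks, in [-1, 1]."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den > 0.0 else 0.0

def center_feed(left, right, threshold=0.97, fill_gain=1.0):
    """Return the block to send to the Center line-out.

    If the binaural block is at least `threshold` correlated between left
    and right, feed the L/R average (scaled by the assumed `fill_gain`);
    otherwise feed silence so the center speaker does not disturb
    off-center images.
    """
    if lr_correlation(left, right) >= threshold:
        return [fill_gain * 0.5 * (l + r) for l, r in zip(left, right)]
    return [0.0] * len(left)
```

A dead-center mono source (identical left and right blocks) passes the threshold and is sent to the center speaker, while a block whose left and right content is unrelated produces silence on the center line-out.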

In this way, embodiments of the invention relate to a method and apparatus for providing 3D Sound with a 5.1 speaker set-up. In a specific embodiment, shown in FIG. 7, a BACCH-SP can be used for the front speakers of the 5.1 set-up, a BACCH-SP can be used for the rear speakers of the 5.1 set-up, and a Line-Out can be used for the Center speaker of the 5.1 set-up. A Line-Out can be used for the Sub speaker (subwoofer). A soundcard with 5.1 line out can be used to implement a specific embodiment.

FIG. 16 shows an embodiment of a circuit that can be used to apply a pair of XTC filters to a corresponding pair of binaural signals. The binaural signals are created by applying an HRTF filter to a mono signal, where the HRTF filter takes into account the xyz position of the sound, and inputting the binaural signal outputted from the HRTF filter to a mixer with position (xyz) based GAIN, such that the mixer outputs the pair of binaural signals inputted to the pair of XTC filters. The outputted pairs of signals from the pair of XTC filters can then be inputted to corresponding pairs of speakers. In a specific embodiment, the outputted pairs of signals from the pair of XTC filters can then be inputted to corresponding pairs of left and right speakers. In specific embodiments, (1) the GAIN F and R can be amplitude gains for front (F) pairs of speakers and rear (R) pairs of speakers, (2) the GAIN T and M (instead of F and R) can be amplitude gains with bandpass filtering applied as well, for a pair of tweeter (T) speakers and a pair of midrange (M) speakers, and (3) the mixer with position (xyz) based GAIN can output to three pairs of speakers (tweeters, midranges, and woofers), where the gains T, M, and W (instead of F and R) can have bandpass filtering applied as well.
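The signal chain of FIG. 16 (mono source, HRTF filter, position-based gain mixer, XTC filters, speaker pairs) can be sketched as below. This is an illustrative sketch only. It assumes that the HRTF is given as a pair of FIR impulse responses, that each XTC filter can be modelled as a 2x2 matrix of equal-length FIR impulse responses, and that the gains F and R are plain amplitude scalars; the names `convolve`, `apply_xtc`, and `render` are hypothetical.

```python
def convolve(signal, kernel):
    """Direct-form FIR convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def apply_xtc(bin_l, bin_r, xtc):
    """Apply an assumed 2x2 FIR matrix xtc = ((h_ll, h_lr), (h_rl, h_rr)).

    All four impulse responses are assumed to have the same length.
    """
    (h_ll, h_lr), (h_rl, h_rr) = xtc
    out_l = [a + b for a, b in zip(convolve(bin_l, h_ll), convolve(bin_r, h_lr))]
    out_r = [a + b for a, b in zip(convolve(bin_l, h_rl), convolve(bin_r, h_rr))]
    return out_l, out_r

def render(mono, hrtf_lr, gain_f, gain_r, xtc_front, xtc_rear):
    """mono -> HRTF pair -> position-based gains -> per-pair XTC -> feeds."""
    bin_l = convolve(mono, hrtf_lr[0])  # left-ear binaural signal
    bin_r = convolve(mono, hrtf_lr[1])  # right-ear binaural signal
    feeds = {}
    for name, gain, xtc in (("front", gain_f, xtc_front),
                            ("rear", gain_r, xtc_rear)):
        scaled_l = [gain * x for x in bin_l]
        scaled_r = [gain * x for x in bin_r]
        feeds[name] = apply_xtc(scaled_l, scaled_r, xtc)
    return feeds
```

With an identity XTC matrix the speaker feeds equal the gain-scaled binaural signals, which is a convenient check before substituting measured filters.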

Software can be used to send front and rear hemisphere sounds to different line outs, and mix between different line outs. Center channel and low frequency effects channel (LFE) can be mixed (the “0.1” subwoofer channel is treated as directionless and handled in the traditional method).

The subject approach to providing 3D Sound with a 5.1 set-up can be extended to any number of speakers, e.g., 7.1 set-up by partitioning the sphere into more segments, separating the segment for each pair of speakers, and using a crosstalk canceller for each speaker pair. When audio is in the area reproduced by that speaker pair, the crosstalk cancelled binaural audio is sent to that speaker pair. In a specific embodiment, the mixing of areas between speaker zones can be adjusted to maintain a smooth transition. Embodiments can be applied to overhead and off-level pairs of speakers.

In an embodiment extending the implementation to provide 3D Sound via a 7.1 set-up, Back Left and Back Right speakers can be moved to between 90 and 110 degrees back. Two new channels, Surround Back Left and Surround Back Right are added at 130 to 150 degrees back.

In a specific embodiment, a set of 3 HRTF's are created, one for the center speaker, one for the front pair of speakers, and one for the rear pair of speakers.

Referring to the MIT HRTF's (via KEMAR), elevation is a plane, where elevation 0 is the horizontal plane. Elevation −40 dips to your knees, and elevation 90 is directly overhead. All of these planes pass through the center of your ears and the center of your head.

Using Elevation zero as an example, Azimuth 0 is directly ahead, Azimuth 90 is direct right, Azimuth −90 is direct left, and Azimuth 180° (or −180°) is directly back.

The MIT space has the convenient property that azimuth does not change with elevation: if the elevation planes are all smashed together vertically like stacking cups, the azimuths do not change.

FIG. 8 (MIT Model) shows planes of elevation, each with a ring of Azimuth points, and one 90° elevation “north pole” point. By adding a distance value, all of three-dimensional space can be addressed. Other coordinate systems are commonly used to describe points in space.
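As an illustration of how azimuth, elevation, and distance address a point in space, the following sketch converts the angles described above to Cartesian coordinates. The axis naming (x to the listener's right, y straight ahead, z up) and the function name `mit_to_xyz` are assumptions made for the example; the angle conventions (azimuth 0 ahead, 90 right, elevation 90 overhead) follow the text.

```python
import math

def mit_to_xyz(azimuth_deg, elevation_deg, distance=1.0):
    """Convert (azimuth, elevation, distance) to (x, y, z).

    Assumed axes: x = listener's right, y = straight ahead, z = up.
    Azimuth 0 is ahead, +90 is right, -90 is left, +/-180 is behind.
    Elevation 0 is the horizontal plane, +90 is directly overhead.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = distance * math.cos(el) * math.sin(az)
    y = distance * math.cos(el) * math.cos(az)
    z = distance * math.sin(el)
    return x, y, z
```

For example, azimuth 90 at elevation 0 lands on the positive x axis (direct right), and elevation 90 lands on the positive z axis regardless of azimuth, matching the single "north pole" point.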

The physical speakers are at their standard locations in the elevation 0 plane (FIG. 1).

Because the speakers are in the horizontal plane, the sphere around the listener is vertically sliced as shown in FIG. 9, where the Mid Crossover Point (MCX), a.k.a. the 90 degree line, passes through the listener's ears.

The point of vertically slicing the sphere around the listener is that when or if the HRTFs of the listener and the recording mismatch, or the XTC mismatches, the sound collapses to the speakers that are either ahead of you or behind you. Ahead of you and behind you are distinct concepts, and the dividing line between them is hard left and hard right, which in this space are −90 degrees and 90 degrees.

The Rear Crossover Point (RCX) is the point where the sound backward of the RCX should be panned completely to the rear speaker. As the rear speaker is physically located at 110 degrees, RCX should not be farther back than 110 degrees.

The Front Crossover Point (FCX) is the point where the sound forward of the FCX should be panned to the front speakers.

An assumption is that symmetry is good. For the sake of symmetry, FCX should be as far forward as RCX is backward. RCX is at most 20 degrees from MCX, so FCX should be no farther forward than 70 degrees.

Table of Crossover Sets for specific embodiments

Chop Set Name | FCX | MCX | RCX | Reasoning
Wide | 70 | 90 | 110 | Widest mixing zone that can be made with initial assumptions
Tight | 80 | 90 | 100 | A tighter mixing zone
Wall | 91 | 92 | 93 | 90 is all forward and the next point (95) is all backward
Slide | 90 | 100 | 110 | 90 is all forward and we reach all back by the time we get to the rear speaker

FIG. 10 shows the wide chop set.

FIG. 11 shows the wall chop set.

FIG. 12 shows the tight chop set.

FIG. 13 shows the slide chop set.

FIG. 14 shows the region of “The NoseCone.”

FIG. 15 shows “The NoseCone” set.

Under a crater model, there is a “center hole” where binaural signals with 100% correlation between left and right are 3 dB down. This area can be thought of as a crater: it has a flat floor at the bottom of a hole 3 dB deep. The bottom edges of this flat hole are the −3 dB base. A smooth slope from the bottom to the top is desired. The top edge, where center sound stops, is the rim. The width of this crater is quite narrow.

The height of the crater is a different matter. The 100% correlation can happen anywhere in the vertical plane that slices the user in half left-to-right, and it is clearly impractical to send power to center for overhead and behind signals. The crater height is tall, but not so tall as to detract from the sensation of actual overhead signals processed with HRTFs.

Impulse Responses

Signals from in front and overhead are correctly positioned with the HRTF copies that are crosstalk cancelled and sent to front left and front right speakers. The goal is just to fill the center hole, not to detract from those positions. The front speaker cannot be crosstalk cancelled by itself; it has an unmistakable cue that matches its physical position. In an embodiment, the array can be replaced with a scalar value at its first position, which is appropriate if all the HRTF's are zero-phase filters. In another embodiment, in order to maintain the pure delay part of the HRTF's, the HRTF's are replaced with a scalar value at the location of their peak.
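The two scalar-replacement options described above can be sketched as follows. The text does not state which scalar value is used, so this example assumes the signed value of the largest-magnitude sample; that choice, and the function names `scalar_at_start` and `scalar_at_peak`, are assumptions. Both functions expect a non-empty impulse response.

```python
def peak_index(ir):
    """Index of the largest-magnitude sample in a non-empty impulse response."""
    return max(range(len(ir)), key=lambda i: abs(ir[i]))

def scalar_at_start(ir):
    """Replace the array with a single scalar at its first position.

    Discards the delay, which suits zero-phase filters.
    """
    out = [0.0] * len(ir)
    out[0] = ir[peak_index(ir)]  # assumed scalar: the peak sample value
    return out

def scalar_at_peak(ir):
    """Replace the array with a single scalar at the location of its peak.

    Keeps the pure-delay part of the original response.
    """
    idx = peak_index(ir)
    out = [0.0] * len(ir)
    out[idx] = ir[idx]
    return out
```

The first variant removes the leading delay, while the second keeps the sample offset of the original peak so that the center feed stays time-aligned with the front pair.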

Table of Nosecone Sets for specific embodiments

Chop Set Name | Az base (AB) | Az rim (AR) | El base (EB) | El rim (ER) | Reasoning
First | 5 | 15 | 20 | 40 | First, a narrow width, a modest height
Zero | 5 | 15 | 20 | 40 | Same area as First, but with scalars at the start of the array instead of at the peak location
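One way to turn the base and rim angles of a nosecone set into a center-fill weight is sketched below. The flat floor inside the base and the zero weight beyond the rim follow the crater description; the linear slope, the product of the azimuth and elevation ramps, the symmetric treatment of negative angles, and the names `ramp` and `center_fill_weight` are assumptions made for illustration.

```python
def ramp(angle_deg, base, rim):
    """1.0 inside the base, 0.0 at or beyond the rim, linear slope between."""
    a = abs(angle_deg)  # assumed symmetric about 0 degrees
    if a <= base:
        return 1.0
    if a >= rim:
        return 0.0
    return (rim - a) / (rim - base)

def center_fill_weight(azimuth_deg, elevation_deg,
                       ab=5.0, ar=15.0, eb=20.0, er=40.0):
    """Weight of the center-speaker fill for a source direction.

    Defaults are the AB, AR, EB, and ER values shared by the First and
    Zero sets.
    """
    return ramp(azimuth_deg, ab, ar) * ramp(elevation_deg, eb, er)
```

A source dead ahead gets the full fill, a source at azimuth 10 and elevation 0 gets half, and overhead or wide sources get none, so the center speaker does not pull those images toward its physical position.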

In an embodiment to implement 3D Sound with a 5.1 set-up, a BACCH-SP can be used for the front speakers, a BACCH-SP can be used for the rear speakers, a Line-Out can be used for Center, a Line-Out can be used for Sub, and a soundcard with 5.1 line out can be used.

In an embodiment, software can be used to send front and rear hemisphere to different line outs, such as front speakers and rear speakers, respectively.

In another embodiment, software can be used to send sounds forward of FCX to the front speakers, send sounds rear of RCX to the rear speakers, and mix sounds between RCX and FCX to both the front and rear speaker line outs. In this way, sounds between RCX and FCX can be implemented using the front and rear speakers, where sounds near RCX can send a larger portion to the rear speakers, sounds at MCX can send 50-50 to front and rear speakers, and sounds near FCX can send a larger portion to the front speakers. In a specific embodiment, this can be a linear transition.
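The linear transition can be sketched as a piecewise-linear gain function of azimuth. This is a sketch under stated assumptions: the gains are amplitude gains that sum to 1, left and right azimuths are treated symmetrically, and the names `CHOP_SETS` and `front_rear_gains` are hypothetical. The FCX, MCX, and RCX values are taken from the Table of Crossover Sets.

```python
# (FCX, MCX, RCX) in degrees, from the Table of Crossover Sets.
CHOP_SETS = {
    "Wide": (70.0, 90.0, 110.0),
    "Tight": (80.0, 90.0, 100.0),
    "Wall": (91.0, 92.0, 93.0),
    "Slide": (90.0, 100.0, 110.0),
}

def front_rear_gains(azimuth_deg, fcx, mcx, rcx):
    """Return (front_gain, rear_gain) for a source azimuth in degrees.

    Forward of FCX everything goes to the front pair, rearward of RCX
    everything goes to the rear pair, and at MCX the split is 50-50.
    """
    a = abs(azimuth_deg)  # assumed left/right symmetry
    if a <= fcx:
        front = 1.0
    elif a >= rcx:
        front = 0.0
    elif a <= mcx:
        front = 1.0 - 0.5 * (a - fcx) / (mcx - fcx)
    else:
        front = 0.5 - 0.5 * (a - mcx) / (rcx - mcx)
    return front, 1.0 - front
```

For example, `front_rear_gains(-80, *CHOP_SETS["Wide"])` places a source at 80 degrees left halfway between FCX and MCX, so three quarters of its amplitude goes to the front pair and one quarter to the rear pair.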

Center and LFE can be mixed (the “0.1” subwoofer channel can be treated as directionless and handled in the traditional method). A single speaker is often composed of multiple drivers, each reproducing a subset of the audible frequency band. These drivers are often combined into one speaker, even if they include a subwoofer, a midrange, and a tweeter. The “0.1” makes it clear that there can be a different number of subwoofers than there are other loudspeakers, and that the subwoofer can be located in a different location. This is also true of midranges and tweeters. There can be a different number of midranges and tweeters in different locations, and their XTC filters can be designed separately, each responsible for generating crosstalk cancelled sound in their part of the audio spectrum.

This method of providing 3D sound for Surround Configurations should not be confused with the Optimal Source Distribution (OSD) method of providing 3D Sound through loudspeakers. In the method embodied herein, the speaker positions are constrained, either by the standard configurations already in use in the surround sound industry, by user placement, or by physical placement of a designer, such as the loudspeaker position chosen for an automotive cabin, and the XTC filters are then designed such that a 3D Sound listening experience is generated for the user. In the OSD method, the location of the listener, the number of speaker pairs, and the required frequency response of each speaker pair are constrained by the desired quality of the results and the capabilities of the OSD filters, and the entire system must be constructed to meet the constraints of the OSD method.

Embodiments can be extended to any number of speakers. The sphere can be partitioned into more segments, separating the segment for each pair of speakers, e.g., to extend 5.1 to 7.1.

A crosstalk canceller can be created for each speaker pair of the 7.1 set-up.

When audio is in the area reproduced by a given speaker pair, the crosstalk cancelled binaural audio is sent to that speaker pair.

The mixing areas between speaker zones can be adjusted to maintain a smooth transition for sounds between two crossover points (CX's).

Embodiments can be applied to overhead and off-level pairs of speakers.

Extending 5.1 to 7.1, Back Left and Back Right move to between 90 and 110 degrees back, and two new channels, Surround Back Left and Surround Back Right, are added at 130 to 150 degrees back.
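The extension to more speaker pairs can be sketched by generalizing the crossover idea to a list of mixing zones, one between each pair of adjacent speaker zones. This is an illustration only: the linear crossfade, the function name `pair_gains`, and in particular the example zone boundaries are hypothetical values chosen for the example and are not taken from the disclosure. The zones are assumed to be sorted front to back and non-overlapping.

```python
def pair_gains(azimuth_deg, mixing_zones):
    """Return one amplitude gain per speaker pair, ordered front to back.

    `mixing_zones` is a list of (start_deg, end_deg) crossfade regions
    between consecutive pairs, so there are len(mixing_zones) + 1 pairs.
    """
    a = abs(azimuth_deg)  # assumed left/right symmetry
    gains = [0.0] * (len(mixing_zones) + 1)
    for i, (start, end) in enumerate(mixing_zones):
        if a <= start:
            gains[i] = 1.0  # fully inside the zone of pair i
            return gains
        if a < end:
            t = (a - start) / (end - start)  # linear crossfade position
            gains[i] = 1.0 - t
            gains[i + 1] = t
            return gains
    gains[-1] = 1.0  # rearward of the last mixing zone
    return gains

# Hypothetical 7.1 example: front pair, side (Back) pair, Surround Back pair.
ZONES_7_1 = [(70.0, 110.0), (120.0, 140.0)]
```

With these example zones, a source at azimuth 90 is split evenly between the front and side pairs, and a source at azimuth 130 is split evenly between the side and Surround Back pairs. Each returned gain would scale the binaural signal sent to that pair's crosstalk canceller.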

The use of 5/7/more speakers has been standardized in “Multichannel stereophonic sound system with and without accompanying picture,” Recommendation ITU-R BS.775-3 (08/2012), Radiocommunication Sector of the International Telecommunication Union (https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.775-3-201208-I!!PDF-E.pdf), and in “Multichannel sound technology in home and broadcasting applications,” Report ITU-R BS.2159-4 (May 2012), both of which are incorporated by reference herein in their entirety.

EMBODIMENTS
Embodiment 1

A system for listening to binaural audio through a plurality of speakers by dividing the speakers into pairs, generating a Crosstalk Cancellation Filter for each pair, and distributing the binaural audio among the speaker pairs.

Embodiment 2