Surround Sound (e.g., ATSC A/52 5.1) can only place sound at 5 places (5.1) (
However, a large number of console gamers have 5.1 surround sound setups. These gamers currently get more value out of 5 speakers than they do out of 2 speakers, and they would like to continue to use all 5 speakers.
Accordingly, there is a need to accomplish the 3D effects of 3D Sound while shaking more than two speakers, and preferably all 5/7 speakers. Preferably, the shaking of all 5/7 speakers is accomplished in a manner that actually makes the 5/7 speaker solution sound better.
Listening to Binaural Audio from signals from cross-talk canceller inputted to two (XTC) Speakers in front of the listener (
Listening to binaural Audio with two XTC Speakers behind the listener (
One of the reasons for imperfection in the perception of the placement of a sound in 3D space when using a 5.1 or 7.1 setup can be described by comparing a Typical set-up (typical) (
Typical 5.1 (
Coherent 5.1 (
Even though experts argue that the assumption that sound is always superimposable is not true, relying on this assumption typically works.
In an embodiment, 3D Sounds (e.g., BACCH-SP) in the front hemisphere are sent to front XTC speakers of a 5.1 speaker set-up (
Crosstalk cancelling filters (XTC) can be generated in a Normal fashion such that sounds placed anywhere are of equal spectral flatness. Alternatively, XTC filters can be generated in a Narrow fashion such that a sound that has an at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, 100%, 95-96%, 96-97%, 97-98%, 98-99%, 99-100%, 95-100%, 96-100%, 97-100%, 98-100%, and/or 99-100% correlation between Left and Right—as in a mono source placed dead center—will appear to be 3-4 dB lower than expected. This is referred to as Narrow filter or an XTC filter with a center hole. A Narrow filter can be intentionally generated so that the traditional Center channel speaker is given a job to do in filling the center hole. Alternatively, a Normal filter can be used and the Center channel can be unused or used for an unrelated purpose, such as traditional surround purpose or a channel dedicated to dialogue.
Using Normal Span allows the full expected volume on Front Left and Front Right to be preserved. Mixing sounds that are 100% correlated between left and right (a narrow range of sounds near center) into the physical center speaker of the 5.1 speaker set-up will fill a portion of the annular band between left and right, which can be referred to as the center hole, where the binaural signals with 100% correlation between left and right are 3 dB down. In specific embodiments, the 100% correlation criterion is met by sounds that are at least 97%, at least 98%, or at least 99% correlated. Mixing sounds to the center speaker also “shakes the center speaker,” fully utilizing all of the 5.1 speakers.
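By way of non-limiting illustration, the correlation test described above can be sketched as follows. The source does not state how correlation is measured or how much signal is sent to the center speaker, so the zero-lag normalized correlation, the function names `correlation` and `center_feed`, and the `fill_gain` value are all assumptions made for this sketch. Only the 97% threshold is taken from the text.

```python
import math

def correlation(left, right):
    """Normalized zero-lag correlation of two equal-length sample blocks (assumed measure)."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den > 0.0 else 0.0

def center_feed(left, right, threshold=0.97, fill_gain=0.5):
    """Return center-speaker samples for a block, or silence if the block is not
    correlated enough. fill_gain is a hypothetical placeholder, not a value from the text."""
    if correlation(left, right) >= threshold:
        return [fill_gain * (l + r) for l, r in zip(left, right)]
    return [0.0] * len(left)
```

A mono source placed dead center produces identical left and right blocks, so it passes the threshold and is routed to the center speaker, while uncorrelated or anti-correlated content leaves the center speaker silent.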
In this way, embodiments of the invention relate to a method and apparatus for providing 3D Sound with a 5.1 speaker set-up. In a specific embodiment, shown in
Software can be used to send front and rear hemisphere sounds to different line outs, and mix between different line outs. Center channel and low frequency effects channel (LFE) can be mixed (the “0.1” subwoofer channel is treated as directionless and handled in the traditional method).
The subject approach to providing 3D Sound with a 5.1 set-up can be extended to any number of speakers, e.g., 7.1 set-up by partitioning the sphere into more segments, separating the segment for each pair of speakers, and using a crosstalk canceller for each speaker pair. When audio is in the area reproduced by that speaker pair, the crosstalk cancelled binaural audio is sent to that speaker pair. In a specific embodiment, the mixing of areas between speaker zones can be adjusted to maintain a smooth transition. Embodiments can be applied to overhead and off-level pairs of speakers.
In an embodiment extending the implementation to provide 3D Sound via a 7.1 set-up, Back Left and Back Right speakers can be moved to between 90 and 110 degrees back. Two new channels, Surround Back Left and Surround Back Right are added at 130 to 150 degrees back.
In a specific embodiment, a set of 3 HRTF's are created, one for the center speaker, one for the front pair of speakers, and one for the rear pair of speakers.
Referring to the MIT HRTF's (via KEMAR) elevation is a plane, where elevation 0 is the horizontal plane. Elevation −40 dips to your knees, elevation 90 is directly overhead. All of these planes pass through the center of your ears and the center of your head.
Using Elevation zero as an example, Azimuth 0 is directly ahead, Azimuth 90 is direct right, Azimuth −90 is direct left, and Azimuth 180° (or −180°) is directly back.
The MIT space has the convenient property that azimuth does not change with elevation: if the elevation planes are all smashed together vertically like stacking cups, the azimuths do not change.
The physical speakers are at their standard locations in the elevation 0 plane (
Because the speakers are in the horizontal plane, the sphere around the listener is vertically sliced as shown in
The point of vertically slicing the sphere around the listener is that when the HRTF's of the listener and of the recording mismatch, or the XTC mismatches, the sound collapses to the speakers that are either ahead of you or behind you. Ahead of you and behind you are distinct concepts, and the dividing line between them is hard left and hard right, in this space 90 degrees and −90 degrees.
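As a minimal sketch of the azimuth convention and the front/rear dividing line described above, the following helpers wrap an arbitrary angle into the range used here (0 ahead, 90 right, −90 left, 180 back) and classify it by hemisphere. The function names `wrap_azimuth` and `hemisphere` are hypothetical and chosen only for illustration.

```python
def wrap_azimuth(deg):
    """Wrap any angle in degrees into the interval (-180, 180]."""
    wrapped = (deg + 180.0) % 360.0 - 180.0
    return 180.0 if wrapped == -180.0 else wrapped

def hemisphere(deg):
    """Classify an azimuth as 'front', 'rear', or on the 90-degree 'boundary'."""
    a = abs(wrap_azimuth(deg))
    if a < 90.0:
        return "front"
    if a > 90.0:
        return "rear"
    return "boundary"
```

Because azimuth does not change with elevation in this space, the same classification can be applied to a source at any elevation.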
The Rear Crossover Point (RCX) is the point where the sound backward of the RCX should be panned completely to the rear speaker. As the rear speaker is physically located at 110 degrees, RCX should not be farther back than 110 degrees.
The Front Crossover Point (FCX) is the point where the sound forward of the FCX should be panned to the front speakers.
An assumption is that symmetry is good. For the sake of symmetry, FCX should be as far forward as RCX is backward. RCX is at most 20 degrees behind MCX, the 90-degree dividing line between front and rear, so FCX should be no farther forward than 70 degrees.
Under a crater model there is a “center hole” where binaural signals with 100% correlation between left and right are 3 dB down. This area can be thought of like a crater: it has a flat floor at the bottom of a hole 3 dB deep. The bottom edges of this flat hole are the −3 dB base. A smooth slope from the bottom to the top is desired. The top edge, where center sound stops, is the rim. The width of this crater is quite narrow.
The height of the crater is a different matter. The 100% correlation can happen anywhere in the vertical plane that slices the user in half left-to-right, and it is clearly impractical to send power to center for overhead and behind signals. The crater height is tall, but not so tall as to detract from the sensation of actual overhead signals processed with HRTFs.
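One possible reading of the crater model, offered only as a sketch, is a center-fill weight that is 1 on the flat floor, falls smoothly to 0 at the rim, and is 0 everywhere outside the crater, including behind the listener. The source gives no numeric widths, so `floor_deg` and `rim_deg` are hypothetical values, the raised-cosine slope is an assumed shape, and the crater height in elevation is not modeled here.

```python
import math

def center_fill_weight(azimuth_deg, floor_deg=2.0, rim_deg=6.0):
    """Weight of the center-speaker fill versus azimuth (hypothetical widths).
    1.0 on the crater floor, raised-cosine slope up to the rim, 0.0 beyond it."""
    a = abs((azimuth_deg + 180.0) % 360.0 - 180.0)  # absolute azimuth, 0..180
    if a <= floor_deg:
        return 1.0
    if a >= rim_deg:
        return 0.0
    t = (a - floor_deg) / (rim_deg - floor_deg)
    return 0.5 * (1.0 + math.cos(math.pi * t))
```

A corresponding limit on elevation would be needed so that overhead and rear signals, which are also 100% correlated, do not send power to the center speaker.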
Impulse Responses
Signals from in front and overhead are correctly positioned with the HRTF copies that are crosstalk cancelled and sent to front left and front right speakers. The goal is just to fill the center hole, not to detract from those positions. The front speaker cannot be crosstalk cancelled by itself; it has an unmistakable cue that matches its physical position. In an embodiment, the array can be replaced with a scalar value at its first position, which is appropriate if all the HRTF's are zero-phase filters. In another embodiment, in order to maintain the pure delay part of the HRTF's, the HRTF's are replaced with a scalar value at the location of their peak.
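The two replacements described above can be sketched as follows, treating each impulse response as a plain list of samples. The source does not say which scalar value is used, so keeping the peak sample itself in `collapse_to_peak`, and passing an explicit `gain` to `collapse_to_first`, are assumptions of this sketch. Both function names are hypothetical.

```python
def collapse_to_first(impulse_response, gain):
    """Zero-phase case: a single scalar tap at the first position, zeros elsewhere."""
    collapsed = [0.0] * len(impulse_response)
    collapsed[0] = gain
    return collapsed

def collapse_to_peak(impulse_response):
    """Delay-preserving case: a single tap at the position of the largest-magnitude
    sample. The tap keeps the peak sample's own value (an assumed choice)."""
    peak_index = max(range(len(impulse_response)),
                     key=lambda i: abs(impulse_response[i]))
    collapsed = [0.0] * len(impulse_response)
    collapsed[peak_index] = impulse_response[peak_index]
    return collapsed
```

Both helpers expect a non-empty impulse response. Keeping the tap at the peak position preserves the pure delay of the original response, so the center signal stays time-aligned with the filtered signals sent to the speaker pairs.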
In an embodiment to implement 3D Sound with 5.1 set-up,
A BACCH-SP can be used for the front speakers,
A BACCH-SP can be used for the rear speakers,
A Line-Out can be used for Center,
A Line-Out can be used for Sub,
A soundcard with 5.1 line out can be used.
In an embodiment, software can be used to send front and rear hemisphere to different line outs, such as front speakers and rear speakers, respectively.
In another embodiment, software can be used to send sounds forward of FCX to front speakers, send sounds rear of RCX to rear speakers, and mix sounds between RCX and FCX to front speakers and rear speakers line outs. In this way, sounds between RCX and FCX can be implemented using the front and rear speakers where sounds near RCX can send a larger portion to the rear speakers, sounds at MCX can send 50-50 to front and rear speakers, and sounds near FCX can send a larger portion to the front speakers. In a specific embodiment, this can be a linear transition.
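The linear transition described above can be sketched as a function that returns the share of a source sent to the front and rear speaker pairs. The default crossover values of 70 and 110 degrees are the limiting values discussed for FCX and RCX, used here only as an example. The function name `front_rear_gains` is hypothetical.

```python
def front_rear_gains(azimuth_deg, fcx=70.0, rcx=110.0):
    """Return (front_share, rear_share) for a source azimuth in degrees.
    Forward of FCX: all front. Rear of RCX: all rear. In between: linear mix."""
    a = abs((azimuth_deg + 180.0) % 360.0 - 180.0)  # absolute azimuth, 0..180
    if a <= fcx:
        return (1.0, 0.0)
    if a >= rcx:
        return (0.0, 1.0)
    rear = (a - fcx) / (rcx - fcx)
    return (1.0 - rear, rear)
```

With these defaults a source at hard right (90 degrees) is split 50-50 between the front and rear pairs, and a source at 80 degrees left is sent 75% to the front pair and 25% to the rear pair.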
Center and LFE can be mixed (the “0.1” subwoofer channel can be treated as directionless and handled in the traditional method). A single speaker is often composed of multiple speakers, each reproducing a subset of the audible frequency band. These speakers are often combined into one speaker, even if they contain a subwoofer, a midrange, and a tweeter. The “0.1” makes it clear that there can be a different number of subwoofers than there are other loudspeakers, and that the subwoofer can be located in a different location. This is also true of midranges and tweeters. There can be a different number of midranges and tweeters in different locations, and their XTC filters can be designed separately, each responsible for generating crosstalk cancelled sound in their part of the audio spectrum.
This method of providing 3D sound for Surround Configurations should not be confused with the Optimal Source Distribution (OSD) method of providing 3D Sound through loudspeakers. In the method embodied herein, the speaker positions are constrained, either by the standard configurations already in use in the surround sound industry, by user placement, or by physical placement of a designer, such as the loudspeaker position chosen for an automotive cabin, and the XTC filters are then designed such that a 3D Sound listening experience is generated for the user. In the OSD method, the location of the listener, the number of speaker pairs, and the required frequency response of each speaker pair are constrained by the desired quality of the results and the capabilities of the OSD filters, and the entire system must be constructed to meet the constraints of the OSD method.
Embodiments can be extended to any number of speakers. The sphere can be partitioned into more segments, separating the segment for each pair of speakers, e.g., to extend 5.1 to 7.1.
A crosstalk canceller can be created for each speaker pair of the 7.1 set-up.
When audio is in the area reproduced by a given speaker pair, the crosstalk cancelled binaural audio is sent to that speaker pair.
The mixing areas between speaker zones can be adjusted to maintain a smooth transition for sounds between two CX's.
Embodiments can be applied to overhead and off-level pairs of speakers.
Extending 5.1 to 7.1, Back Left and Back Right move between 90 and 110 degrees back, and two new channels, Surround Back Left and Surround Back Right, are added at 130 to 150 degrees back.
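A minimal sketch of extending the zone partition to more speaker pairs follows. Each pair owns a zone of absolute azimuth, and adjacent zones are joined by a crossover band in which the signal is mixed linearly. The function name `pair_weights` and the band edges used in the usage note are hypothetical; the source specifies only the speaker angle ranges, not the crossover points for 7.1.

```python
def pair_weights(azimuth_deg, crossover_bands):
    """Return one weight per speaker pair, ordered front to back.
    crossover_bands is a sorted, non-overlapping list of (start, end) absolute
    azimuths in degrees, one band between each adjacent pair of zones."""
    a = abs((azimuth_deg + 180.0) % 360.0 - 180.0)  # absolute azimuth, 0..180
    weights = [0.0] * (len(crossover_bands) + 1)
    for i, (start, end) in enumerate(crossover_bands):
        if a <= start:           # inside zone i, before its rear crossover band
            weights[i] = 1.0
            return weights
        if a < end:              # inside the band between zone i and zone i + 1
            t = (a - start) / (end - start)
            weights[i], weights[i + 1] = 1.0 - t, t
            return weights
    weights[-1] = 1.0            # behind the last band: rearmost pair only
    return weights
```

For example, with assumed bands of (60, 80) and (110, 130) degrees for a front, side, and back pair, `pair_weights(70.0, [(60.0, 80.0), (110.0, 130.0)])` splits the signal evenly between the front and side pairs. Each weight scales the crosstalk cancelled binaural audio sent to the corresponding pair.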
The use of 5/7/more speakers has been standardized in ITU-R BS.775, https://www.itu.int/dm_spubrec/itu-r/rec/bs/R-REC-BS.775-3-2012084-I!!PDF-E.pdf, “Multichannel stereophonic sound system with and without accompanying picture,” Recommendation ITU-R BS.775-3 (August 2012) (Radiocommunication Sector of International Telecommunication Union, BS.775-3 (08/2012)), and in “Multichannel sound technology in home and broadcasting applications,” Report ITU-R BS.2159-4 (May 2012), both of which are incorporated by reference herein in their entirety.
A system for listening to binaural audio through a plurality of speakers by dividing the speakers into pairs, generating a Crosstalk Cancellation Filter for each pair, and distributing the binaural audio among the speaker pairs.
The system according to Embodiment 0,
wherein the plurality of speakers is a surround sound system.
The system according to Embodiment 0,
wherein the plurality of speakers is a 5.1 surround sound system in which the speakers are placed in approximately the ITU-R BS 775 configuration of a circle at the level of the listener, center speaker forward of the listener at zero degrees on the circle, left and right speakers at +/−30 degrees, and surround speakers at +/−110 degrees.
The system according to Embodiment 0,
wherein the plurality of speakers is a 5.1 surround sound system in which the speakers are placed in a variation of the ITU-R BS 775 configuration in which the height is ignored, the center speaker is forward of the listener at zero degrees on the circle or missing, left and right speakers are at the “music” position of +/−30 degrees or the “cinema” position of +/−22 degrees, and the surround speakers at +/−110 degrees or the popular variation of +/−135 degrees.
The system according to Embodiment 0,
wherein the plurality of speakers is a 5.1 surround sound system, 6.1 surround sound system, 7.1 surround sound system, 10.2, 22.2 or any count of surround sound loudspeakers in any configuration.
The system according to Embodiment 0,
wherein the Crosstalk Cancellation Filter uses BACCH 3D Sound technology invented at Princeton University.
The system according to Embodiment 0,
wherein there is a center speaker that is not matched as part of a pair or a plurality of speakers in the center plane equidistant to both ears of the listener, in which the unmatched speaker or speakers are unused or used for non-binaural content.
The system according to Embodiment 0,
wherein there is a center speaker that is not matched as part of a pair or a plurality of speakers in the center plane equidistant to both ears of the listener, in which the unmatched speaker or speakers are used to contribute to the effect of sound coming directly from, or from within a few degrees of, their own direction, and the energy of the 3D sound signal from the other speaker pairs is reduced commensurately in a volumetric area around the unpaired speaker or speakers.
The system according to Embodiment 0,
wherein there is a center speaker that is not matched as part of a pair or a plurality of speakers because it is a subwoofer used for low frequency effects.
A system and method for listening to binaural audio through a plurality of speakers by dividing the speakers into groups, generating a Crosstalk Cancellation Filter for each group, and distributing the binaural audio among the speaker groups.
The system according to Embodiment 0,
wherein the plurality of speakers consists of a pair of loudspeakers that excite only a portion of the audible audio frequency range.
The system according to Embodiment 0,
wherein the group of speakers consists of one set of speakers that excites all or a portion of the audible audio frequency range, and one set of speakers that excites all or an overlapping but not identical portion of the audible frequency range.
The system according to Embodiment 0,
wherein the group of speakers consists of the speakers in an automotive cabin.
A system for placing a binaural audio signal onto a plurality of pairs of crosstalk cancelled loudspeakers.
The system according to Embodiment 0,
wherein the portion of the signal directed to each loudspeaker pair is determined by the intended direction of the audio signal and the position of each loudspeaker pair.
The system according to Embodiment 0,
wherein the portion of the signal directed to each loudspeaker pair is determined by the intended direction of the audio signal and the position of each loudspeaker pair by starting with the desired perceived azimuthal angle of the audio source and comparing it to the azimuthal angles of each of the speakers.
The system according to Embodiment 0,
wherein the portion of the signal directed to each loudspeaker pair is determined by the intended direction of the audio signal and dividing a circle on the level with the listener into target groups of azimuthal angles in which each pair of speakers is intended to operate.
The system according to Embodiment 0,
wherein the portion of the signal directed to each loudspeaker pair is determined by the x,y position of the audio signal and dividing a space on the level with the listener into regions on a plane in which each pair of speakers is intended to operate.
The system according to Embodiment 0,
wherein the portion of the signal directed to each loudspeaker pair is determined by the intended direction of the audio signal and the position of each loudspeaker pair by starting with the desired perceived azimuthal and elevation angle of the audio source and comparing it to the azimuthal and elevation angle of each of the speakers.
The system according to Embodiment 0,
wherein the portion of the signal directed to each loudspeaker pair is determined by the intended direction of the audio signal and dividing a sphere around the listener into target groups of spherical sectors in which each pair of speakers is intended to operate.
The system according to Embodiment 0,
wherein the portion of the signal directed to each loudspeaker pair is determined by the x,y,z position of the audio signal or equivalent 3-space signal in any coordinate system and dividing 3-space into regions in which each pair of speakers is intended to operate.
The system according to Embodiment 0,
wherein there are crossover regions between each pair of loudspeakers in which the binaural signal is mixed proportionally into each region.
The system according to Embodiment 0,
wherein there are crossover regions between each pair of loudspeakers in which the binaural signal is mixed proportionally into each region using constant total power mixing.
The system according to Embodiment 0,
wherein the system is acting on an arbitrarily large number of source signals, each with unique position data.
The system according to Embodiment 0,
wherein the system is acting on an arbitrarily large number of source signals, each with unique position data, and the positions are changing as a function of time such that the processing from one time period and position needs to be mixed into the processing for the next time period and position in order to prevent discontinuity in the output signal.
The system according to Embodiment 0,
wherein certain regions around unmatched speakers are rendered as a combination of a crosstalk cancelled signal to a speaker pair and an unmatched signal to the unmatched speaker.
The system according to Embodiment 0,
wherein certain regions of the frequency spectrum are divided among crosstalk cancelled speaker pairs in a different manner than other regions of the frequency spectrum targeted at other speaker pairs.
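The constant total power mixing recited in the embodiments above can be illustrated with a sine/cosine crossfade, which is one common way to keep the summed power of two outputs constant across a crossover region. The source does not name a specific gain law, so this choice and the function name `constant_power_gains` are assumptions of the sketch.

```python
import math

def constant_power_gains(t):
    """Gains for two adjacent speaker pairs at position t in a crossover region,
    where t = 0 is the edge of the first zone and t = 1 is the edge of the second.
    The squared gains always sum to 1 (sine/cosine law, an assumed choice)."""
    t = min(1.0, max(0.0, t))
    angle = t * math.pi / 2.0
    return (math.cos(angle), math.sin(angle))
```

At the middle of the crossover region both gains are about 0.707, so each pair is 3 dB down rather than 6 dB down as in a linear mix. The same gains can be used to blend the processing for one time period and position into the next when source positions change over time.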
Aspects of the invention, such as implementing filters and mixing signals, may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the invention may be practiced with a variety of computer-system configurations, including multiprocessor systems, microprocessor-based or programmable-consumer electronics, minicomputers, mainframe computers, and the like. Any number of computer-systems and computer networks are acceptable for use with the present invention.
Specific hardware devices, programming languages, components, processes, protocols, and numerous details including operating environments and the like are set forth to provide a thorough understanding of the present invention. In other instances, structures, devices, and processes are shown in block-diagram form, rather than in detail, to avoid obscuring the present invention. But an ordinary-skilled artisan would understand that the present invention may be practiced without these specific details. Computer systems, servers, work stations, and other machines may be connected to one another across a communication medium including, for example, a network or networks.
As one skilled in the art will appreciate, embodiments of the present invention may be embodied as, among other things: a method, system, or computer-program product. Accordingly, the embodiments may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. In an embodiment, the present invention takes the form of a computer-program product that includes computer-useable instructions embodied on one or more computer-readable media.
Computer-readable media include both volatile and nonvolatile media, transient and non-transient media, removable and nonremovable media, and contemplate media readable by a database, a switch, and various other network devices. By way of example, and not limitation, computer-readable media comprise media implemented in any method or technology for storing information. Examples of stored information include computer-useable instructions, data structures, program modules, and other data representations. Media examples include, but are not limited to, information-delivery media, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVD), holographic media or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage, and other magnetic storage devices. These technologies can store data momentarily, temporarily, or permanently.
The invention may be practiced in distributed-computing environments where tasks are performed by remote-processing devices that are linked through a communications network. In a distributed-computing environment, program modules may be located in both local and remote computer-storage media including memory storage devices. The computer-useable instructions form an interface to allow a computer to react according to a source of input. The instructions cooperate with other code segments to initiate a variety of tasks in response to data received in conjunction with the source of the received data.
The present invention may be practiced in a network environment such as a communications network. Such networks are widely used to connect various types of network elements, such as routers, servers, gateways, and so forth. Further, the invention may be practiced in a multi-network environment having various, connected public and/or private networks.
Communication between network elements may be wireless or wireline (wired). As will be appreciated by those skilled in the art, communication networks may take several different forms and may use several different communication protocols. And the present invention is not limited by the forms and communication protocols described herein.
The examples and embodiments described herein are for illustrative purposes only and various modifications or changes in light thereof will be apparent to persons skilled in the art and are included within the spirit and purview of this application. In addition, any elements or limitations of any invention or embodiment thereof disclosed herein can be combined with any and/or all other elements or limitations (individually or in any combination) or any other invention or embodiment thereof disclosed herein, and all such combinations are contemplated with the scope of the invention without limitation thereto.
All patents, patent applications, provisional applications, and publications referred to or cited herein (including those in the “References” section) are incorporated by reference in their entirety, including all figures and tables, to the extent they are not inconsistent with the explicit teachings of this specification.
The present application claims the benefit of U.S. Provisional Application Ser. No. 62/308,661, filed Mar. 15, 2016, which is hereby incorporated by reference herein in its entirety, including any figures, tables, or drawings.
Number | Date | Country
---|---|---
62308661 | Mar 2016 | US