A typical surround sound home audio system uses multiple speakers driven with separate audio channels to create a “surround sound” listening experience. The most prevalent system currently is a 5.1 channel surround system that requires five speakers for left, center, right, surround left, and surround right channels, as well as a subwoofer for low-frequency effects (LFE). With proper placement of the speakers in front and in back of the listener (i.e., to the listener's front left, front center, front right, rear left and rear right), these systems create the sensation of being surrounded by the sound of a movie, music performance or other desired audio environment. However, the multiple speakers used by these systems make them overly complicated for most home users to set up and configure properly. In particular, it is difficult and expensive to unobtrusively position and wire speakers in front of and behind the listening position (chairs or couch) of a home theatre. These systems are further complicated by a need to conduct setup testing to adjust the speaker placement and amplifier balance to achieve the best surround sound listening experience.
Virtual surround systems use sound localization techniques to produce the sensation of a full surround sound field using a simple stereo pair of speakers. These sound localization techniques map the surround sound channels (e.g., the 5.1 surround channels) into a virtual space, creating the perception of sound sources (the missing speakers) to the sides and behind the listener without actual physical speakers positioned there. One approach to virtually localizing sound sources uses filtering with a head related transfer function (HRTF). An HRTF models the frequency response of the human head and ear as a function of the source direction. When the HRTF-based approach is used with speakers, it typically requires careful crosstalk cancellation to achieve good localization precision. Virtual surround systems therefore have used interaural path cancellation (also called interaural crosstalk cancellation) together with the HRTF processing. The interaural path cancellation attempts to isolate sounds intended for the left ear to the left speaker, and sounds intended for the right ear to the right speaker. A drawback to this HRTF-based approach with interaural path cancellation, however, is that it generally produces a very narrow “sweet spot” where the virtualization effect can properly be heard. In other words, the virtual surround sound effect can be destroyed if the listener turns his or her head, or moves slightly away from the sweet spot. The listener thus is required to sit in a very specific position in the room, and to keep his or her head facing directly toward the center of the two loudspeakers.
The following Detailed Description concerns various techniques and apparatus that provide virtual surround sound using a pair of physical loudspeakers. The techniques use a combination of head related transfer functions and shaped reverberation to provide widening and front/back auditory cues without requiring any kind of interaural path cancellation. This combination can provide a good sensation of front/back and left/right directionality, and envelopment. By eliminating the interaural path cancellation, the technique can be implemented in a simpler (lower computational power) device. With the interaural path cancellation eliminated, the listening area where the virtual surround sound effect can be perceived is much wider. Further, the effect is not dependent on head position or the direction that the listener faces.
According to a first aspect, the technique uses a combination of head related transfer functions, including a 360 degree power-response head related transfer function, to provide perceptual separation of the reverberant and direct paths.
According to a further aspect, the technique uses different, discrete reverberation for left and right rendering channels. This decorrelates the reverberation rendered to the left and right channels, which provides envelopment.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. Additional features and advantages of the invention will be made apparent from the following detailed description of embodiments that proceeds with reference to the accompanying drawings.
The following detailed description concerns various techniques and systems for speaker virtualization. The speaker virtualization techniques are illustrated in the context of their particular application to audio systems suitable for home and other like small listening areas, to provide a surround experience from as few as a pair of loudspeakers. The techniques can also be applied in other sound virtualization applications.
More particularly, the speaker virtualization systems and techniques use a combination of head related transfer functions and shaped reverberation to provide widening and front/back auditory cues without requiring interaural path cancellation. As compared to virtual surround techniques based on interaural path cancellation, the speaker virtualization systems and techniques described herein can provide a wider listening area and a surround effect that does not depend on head position or the direction that the listener is facing.
The various techniques and tools described herein may be used independently. Some of the techniques and tools may be used in combination. Various techniques are described below with reference to flowcharts of processing acts. The various processing acts shown in the flowcharts may be consolidated into fewer acts or separated into more acts. For the sake of simplicity, the relation of acts shown in a particular flowchart to acts described elsewhere is often not shown. In many cases, the acts in a flowchart can be reordered.
I. Overview
With reference to
The speaker virtualization system 100 uses a combination of head related transfer functions, including a 360 degree power-response HRTF, to provide perceptual separation between reverberant and direct paths. Further, the speaker virtualization system uses different, discrete reverberation for the two output channels, so as to decorrelate the reverberation rendered via the two output channels and create a sensation of envelopment. This provides widening and front/back auditory cues without interaural path cancellation. The speaker virtualization system 100 therefore can produce the virtual surround effect in a wider listening area, independent of the listener's head position and facing direction.
II. Detailed Explanation of Virtual Surround Processing
With reference to
The processing path 210 for the front channels includes several stages. In a first sum and difference processing stage 211, the processing path scales the left and right input channels 120, 121 by half, and produces the sum 212 and difference 213 of the scaled input channels. The front channels processing path 210 then applies a “near-front” head related transfer function (HRTF) 214 to the difference signal 213. This is followed by a second sum and difference processing stage 215, where the difference signal 213 is scaled up by a factor of 1.2 while the sum signal 212 is scaled down by a factor of 0.8. This results in left and right channel signals 216, 217. Finally, a last processing stage 218 of the front channels processing path 210 subtracts the right channel signal, delayed by a delay (D) and scaled by 0.1, from the left channel signal (scaled by 0.9), and vice-versa. In a representative implementation, this delay can be 0.1 milliseconds, which relates to an assumed arrival time difference between the listener's ears from the two front loudspeakers 140, 141. The effect of the near-front HRTF and sum and difference stages is to produce the sensation of the left and right virtual speakers from the two loudspeakers 140, 141, and to widen the listening area in which this effect can be perceived.
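The front channels processing path described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation of the system: the names `front_path`, `delay` and `near_front_hrtf` are hypothetical, the near-front HRTF is passed in as an arbitrary callable because its coefficients are not listed here, the second sum and difference stage is assumed to recombine the signals as (sum + difference) and (sum − difference), and a 48 kHz sample rate is assumed when converting the 0.1 millisecond delay to whole samples.

```python
from typing import Callable, List, Sequence, Tuple

Signal = List[float]


def delay(x: Sequence[float], n: int) -> Signal:
    """Delay x by n samples, zero-padding the start and keeping the length."""
    if n <= 0:
        return list(x)
    return ([0.0] * n + list(x))[: len(x)]


def front_path(
    left: Sequence[float],
    right: Sequence[float],
    near_front_hrtf: Callable[[Signal], Signal],
    sample_rate: float = 48000.0,  # assumed; not stated in the text
    delay_ms: float = 0.1,         # representative delay D from the text
) -> Tuple[Signal, Signal]:
    # Stage 211: scale both inputs by half, then form sum and difference.
    s = [0.5 * l + 0.5 * r for l, r in zip(left, right)]
    d = [0.5 * l - 0.5 * r for l, r in zip(left, right)]

    # Near-front HRTF 214 is applied to the difference signal only.
    d = near_front_hrtf(d)

    # Stage 215 (assumed recombination): sum scaled by 0.8, difference by 1.2.
    l2 = [0.8 * a + 1.2 * b for a, b in zip(s, d)]
    r2 = [0.8 * a - 1.2 * b for a, b in zip(s, d)]

    # Stage 218: subtract the delayed, 0.1-scaled opposite channel
    # from the 0.9-scaled channel, for both channels.
    n = round(delay_ms * 1e-3 * sample_rate)  # 4.8 samples, rounded to 5
    r_del, l_del = delay(r2, n), delay(l2, n)
    out_l = [0.9 * a - 0.1 * b for a, b in zip(l2, r_del)]
    out_r = [0.9 * a - 0.1 * b for a, b in zip(r2, l_del)]
    return out_l, out_r
```

Calling `front_path(left, right, lambda d: d)` bypasses the HRTF, which is a convenient way to check the scaling and cross-feed stages in isolation before substituting a real filter.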
A plot 300 of an exemplary function that can be used as the near front HRTF 214 in the front channels processing path 210 is shown in
With reference again to
% denominator (poles)
a_Norm_IIR = [1.0000000000000000e+000, -1.6888094727864102e+000, 1.4837366524370064e+000, ...
    -8.5601030412333767e-001, 3.1768188713232198e-001, -1.9813914299408908e-001, 9.6933754378490042e-002];
% numerator (zeros)
b_Norm_IIR = [3.6843438710213988e-001, -1.9483915898255028e-001, -1.6684962978085230e-001, ...
    7.5848874550809561e-002, 1.3679340931697379e-001, -6.8813369749838255e-003, -7.6482207859333587e-002];
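Coefficient pairs of this kind (a numerator array and a denominator array of seven terms each) conventionally describe a sixth-order IIR filter with transfer function H(z) = B(z)/A(z). The following Python sketch shows how such a filter could be applied sample by sample using the direct-form difference equation. It assumes that convention; the function name `iir_filter` is hypothetical, and the sample rate for which the coefficients were designed is not stated here.

```python
from typing import List, Sequence

# Coefficients copied from the a_Norm_IIR / b_Norm_IIR listing.
A_NORM_IIR = [
    1.0000000000000000e+00, -1.6888094727864102e+00, 1.4837366524370064e+00,
    -8.5601030412333767e-01, 3.1768188713232198e-01, -1.9813914299408908e-01,
    9.6933754378490042e-02,
]
B_NORM_IIR = [
    3.6843438710213988e-01, -1.9483915898255028e-01, -1.6684962978085230e-01,
    7.5848874550809561e-02, 1.3679340931697379e-01, -6.8813369749838255e-03,
    -7.6482207859333587e-02,
]


def iir_filter(b: Sequence[float], a: Sequence[float], x: Sequence[float]) -> List[float]:
    """Apply a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]."""
    y: List[float] = []
    for n in range(len(x)):
        acc = 0.0
        # Feed-forward part (numerator, zeros).
        for k in range(len(b)):
            if n - k >= 0:
                acc += b[k] * x[n - k]
        # Feedback part (denominator, poles), using earlier outputs.
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc / a[0])
    return y


# Impulse response of the filter over 64 samples.
impulse = [1.0] + [0.0] * 63
h = iir_filter(B_NORM_IIR, A_NORM_IIR, impulse)
```

The same function can be reused with any other numerator and denominator pair of this form, since it depends only on the two coefficient arrays.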
Between the sum and difference stages 222, 223 in the rear channels processing path 220, two head related transfer functions (HRTFX and HRTFB) are applied to the sum and difference signals 224, 225. These head related transfer functions are derived from the near back HRTF (F1) and far back HRTF (F2), which relate to the ear's response to a loudspeaker placed near and farther behind the listener. More particularly, HRTFX is equal to the relation of near back and far back HRTFs by the equation
whereas HRTFB is given by the equation
% denominator (poles)
a_HRTFB = [1.0000000000000000e+000, -1.2570479899538574e+000, 4.2424536096528470e-001, ...
    -5.6087980625149664e-002, 4.2392917282740181e-002, 3.6752820157085697e-002, -1.2973307456470098e-001];
% numerator (zeros)
b_HRTFB = [1.8804327858095968e+000, -2.9676273667211244e+000, 1.7595091989408038e+000, ...
    -8.5895832371487202e-001, 4.9389363159725336e-001, -3.2762684986932166e-003, -2.2262689556048482e-001];
% denominator (poles)
a_HRTFX = [1.0000000000000000e+000, -1.4497763400048707e+000, 7.3484019001267709e-001, ...
    -3.4482752398561028e-001, 1.9311090365472569e-001, 5.0039045207491264e-002, -1.3383200293258363e-001];
% numerator (zeros)
b_HRTFX = [5.4275222551622471e-001, -6.1273613225000345e-001, 1.4823063002225800e-001, ...
    -9.9574656128668497e-003, 7.1240749882067042e-003, 3.4183062814524288e-002, -7.1560061721450768e-002];
In the diffuse sound processing path 230, the input left channel 120, left rear channel 123 and center channel 122 (scaled by half) are combined (summed) into a left signal path 231. The input right channel 121, right rear channel 124 and center channel 122 (scaled by half) also are combined (summed) into a right signal path 232. The diffuse sound processing path 230 then includes a pair of sum and difference stages 234, 235. The first sum and difference stage 234 produces a sum and difference of the left and right signal paths 231, 232 (scaled by half). The second sum and difference stage 235 recombines the sum and difference signals produced by the first sum and difference stage 234 to reconstruct left and right signal paths. However, the sum and difference signals are scaled in this second sum and difference stage 235 according to a widening/narrowing parameter (d). More specifically, the sum signal is scaled by a factor (2−d), while the difference signal is scaled by (d) as shown in
Following the sum and difference stages 234, 235, the diffuse sound processing path 230 applies a power 360 degree HRTF 236 to each of the left and right signals. The power 360 degree HRTF 236 represents the ear's response to a diffuse sound field surrounding the listener.
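The effect of the widening/narrowing parameter (d) in the two sum and difference stages 234, 235, which precede the power 360 degree HRTF, can be illustrated with a short Python sketch. The function name `widen` is hypothetical, and the second stage is assumed to rebuild the left and right signals as (scaled sum + scaled difference) and (scaled sum − scaled difference).

```python
from typing import List, Sequence, Tuple


def widen(
    left: Sequence[float], right: Sequence[float], d: float
) -> Tuple[List[float], List[float]]:
    # First stage 234: sum and difference of the half-scaled signal paths.
    s = [0.5 * (l + r) for l, r in zip(left, right)]
    m = [0.5 * (l - r) for l, r in zip(left, right)]

    # Second stage 235 (assumed recombination): the sum is scaled by (2 - d)
    # and the difference by d before rebuilding left and right.
    out_l = [(2.0 - d) * a + d * b for a, b in zip(s, m)]
    out_r = [(2.0 - d) * a - d * b for a, b in zip(s, m)]
    return out_l, out_r
```

Under these assumptions, d = 1 reproduces the input signals unchanged, d = 0 collapses both outputs to the same mono signal (left + right), and d = 2 leaves only the difference component, with opposite polarity in the two outputs.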
The diffuse sound processing path 230 also includes separate reverberation 238, 239 applied to the left and right signals. The diffuse sound processing path 230 applies a different, discrete reverberation to each of the left and right signals, which serves to decorrelate the reverberation in these signals from each other and to provide an envelopment or diffuse sound effect. The amount of reverberation applied is based on a reverberation strength parameter (b). The reverberation path of the left and right signals is scaled by the reverberation strength parameter as shown in
The left and right signals from the front channels processing path, the rear channels processing path and the diffuse sound processing path are combined to form the left and right rendering channels 130, 131 to be output to the loudspeakers 140, 141 (
In an alternative implementation, the gains for the direct and diffuse sound (reverberated) paths can be expressed as t*(1−g) and t*g, respectively. This alternative parameterization decouples the reverberation weight parameter g from the output scale parameter t.
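The alternative gain parameterization, together with the use of a different reverberation for each channel, can be sketched as follows in Python. This is a sketch under loud assumptions: the reverberation algorithm is not specified here, so a single feedback comb filter stands in for it, and the function names, the delay lengths (1117 and 1277 samples) and the feedback value are hypothetical illustration values only.

```python
from typing import List, Sequence, Tuple


def comb_reverb(x: Sequence[float], delay_samples: int, feedback: float) -> List[float]:
    """Stand-in reverberator: y[n] = x[n] + feedback * y[n - delay_samples]."""
    y: List[float] = []
    for n, v in enumerate(x):
        fb = y[n - delay_samples] if n >= delay_samples else 0.0
        y.append(v + feedback * fb)
    return y


def mix(direct: Sequence[float], reverbed: Sequence[float], t: float, g: float) -> List[float]:
    # Direct path gain t*(1 - g), reverberated path gain t*g.
    return [t * (1.0 - g) * a + t * g * b for a, b in zip(direct, reverbed)]


def diffuse_outputs(
    left: Sequence[float],
    right: Sequence[float],
    t: float = 1.0,
    g: float = 0.3,
    delays: Tuple[int, int] = (1117, 1277),  # hypothetical, unequal on purpose
    feedback: float = 0.5,                   # hypothetical
) -> Tuple[List[float], List[float]]:
    # A different reverberation per channel decorrelates the two reverb tails.
    rev_l = comb_reverb(left, delays[0], feedback)
    rev_r = comb_reverb(right, delays[1], feedback)
    return mix(left, rev_l, t, g), mix(right, rev_r, t, g)
```

With this parameterization, changing g shifts the balance between direct and reverberated sound while t alone sets the overall output scale. Setting g = 0 yields the direct signal scaled by t, and feeding the same signal to both channels with g > 0 still produces different left and right outputs because the two reverberators differ.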
It should be recognized that there exist various numerically equivalent operations that may be used to achieve results similar to those of the above-described signal processing operations. It should be understood therefore that reference herein to these signal processing operations of the speaker virtualization system includes implementations using such numerically equivalent operations.
IV. Computing Environment
The speaker virtualization system 100 shown in
With reference to
A computing environment may have additional features. For example, the computing environment 800 includes storage 840, one or more input devices 850, one or more output devices 860, and one or more communication connections 870. An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing environment 800. Typically, operating system software (not shown) provides an operating environment for software executing in the computing environment 800 and coordinates activities of the components of the computing environment 800.
The storage 840 may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CDs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing environment 800. The storage 840 stores instructions for the software 880.
The input device(s) 850 may be a touch input device such as a keyboard, mouse, pen, touchscreen or trackball, a voice input device, a scanning device, or another device that provides input to the computing environment 800. For audio or video, the input device(s) 850 may be a microphone, sound card, video card, TV tuner card, or similar device that accepts audio or video input in analog or digital form, or a CD or DVD drive that reads audio or video samples into the computing environment. The output device(s) 860 may be a display, printer, speaker, CD/DVD-writer, network adapter, or another device that provides output from the computing environment 800.
The communication connection(s) 870 enable communication over a communication medium to one or more other computing entities. The communication medium conveys information such as computer-executable instructions, audio or video information, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.
Embodiments can be described in the general context of computer-readable media. Computer-readable media are any available media that can be accessed within a computing environment. By way of example, and not limitation, with the computing environment 800, computer-readable media include memory 820, storage 840, and combinations of any of the above.
Embodiments can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing environment on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing environment.
For the sake of presentation, the detailed description uses terms like “determine,” “receive,” and “perform” to describe computer operations in a computing environment. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on implementation.
In view of the many possible embodiments to which the principles of our invention may be applied, we claim as our invention all such embodiments as may come within the scope and spirit of the following claims and equivalents thereto.