The present invention relates generally to stereo audio reproduction and specifically to the creation of virtual speaker effects.
Stereophonic sound works on the principle that the brain processes differences between the sound heard by each ear to give the sound distance and direction. To exploit this effect, reproduction systems use recorded audio signals in left and right channels, which correspond to the sound to be heard by the left ear and the right ear, respectively. When the listener is wearing headphones, the left channel sound is directed to the listener's left ear and the right channel sound is directed to the listener's right ear. However, when sound is produced by a pair of speakers, sound from the left channel speaker can be heard by the listener's right ear and sound from the right channel speaker can be heard by the listener's left ear. When the listener moves relative to the speakers, the perceived depth of the reproduced sound changes. Stereo speaker systems typically rely on the physical separation between the left and right speakers to produce stereophonic sound, but the result is often a sound that appears to be in front of the listener. Modern sound systems include additional speakers that surround the listener so that the sound appears to originate from all around the listener.
Many aspects of the disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.
The first embodiment described herein is a system for producing phantom speaker effects. It gives the listener the illusion that speakers are farther apart than they physically are. The system takes a copy of each stereo channel and scales them by a spread value and delays them by a predetermined time interval. Optionally a digital filter can be applied to emphasize certain sound characteristics. The delay value can be fixed or adjustable. These processed copies are then subtracted from the opposite channel and added to their originating channel. For example, the processed left channel is subtracted from the right channel and added to the left channel.
The second embodiment produces an immersion effect. Each stereo channel is separated into low frequency components (bass signal) and middle to high frequency components (treble signal). The immersion effect is applied to each treble signal. The left treble signal is altered by adding a scaled version of the right treble signal, where the right treble signal is scaled by a spread value. The right treble signal is altered by adding a scaled version of the left treble signal, also scaled by the spread value. The altered left treble signal is combined with the left bass signal. The altered right treble signal is phase inverted prior to being combined with the right bass signal.
Other systems, methods, features, and advantages of the present disclosure will be or become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present disclosure, and be protected by the accompanying claims.
A detailed description of embodiments of the present invention is presented below. While the disclosure will be described in connection with these drawings, there is no intent to limit it to the embodiment or embodiments disclosed herein. On the contrary, the intent is to cover all alternatives, modifications and equivalents included within the spirit and scope of the disclosure.
In a first embodiment, speaker virtualization is employed to improve the quality of stereo reproduction by creating the illusion of either additional speakers or different speaker placement. For instance, speaker virtualization can make speakers that are physically close to each other, such as speakers on a notebook computer, produce sounds that appear to come from farther apart than the speakers actually are. This is known as “widening.” Speaker virtualization can also make sounds appear to come from virtual speakers at locations without a physical speaker, such as in a simulated surround sound system that uses stereo speakers.
Virtualization system 140 can be part of the audio driver and implemented using software or hardware. Alternatively, an application program such as a music playback application or video playback application can use virtualization system 140 to produce left and right channel audio data with a virtual effect and provide the data to the audio driver. Although virtualization system 140 is shown as implemented in the digital domain, it may also be implemented in the analog domain.
In the illustrative embodiment, virtualization system 140 receives a spread value 106 that controls the degree of the virtualization effect. For example, if virtualization system 140 has a widening effect, the spread value can control the degree to which the speakers appear to have widened. The virtualization system 140 optionally receives a delay value 108, which can be used to tune the virtualization system based on the physical configuration of the speakers.
Audio interface 202 receives audio data, which can be provided by an application such as a music or video playback application, and provides virtualized audio data to the audio driver backend. Processor 216 can include a central processing unit (CPU), an auxiliary processor associated with the audio system, a semiconductor based microprocessor (in the form of a microchip), a macroprocessor, one or more application specific integrated circuits (ASICs), digital logic gates, a digital signal processor (DSP) or other hardware for executing instructions.
Memory 220 can include any one or a combination of volatile memory elements (e.g., random-access memory (RAM) such as DRAM and SRAM) and nonvolatile memory elements (e.g., flash, read only memory (ROM), or nonvolatile RAM). Memory 220 stores one or more separate programs, each of which includes an ordered listing of executable instructions for implementing logical functions to be performed by the processor 216. The executable instructions include instructions for generating virtual audio effects and performing audio processing operations such as equalization and filtering. In alternate embodiments, the logic for performing these processes can be implemented in hardware or a combination of software and hardware.
A delayed, phase-inverted copy of the opposite signal can be added in each speaker to provide a level of cross-cancellation of the opposite signals. For example, in the left speaker, rather than transmitting l(t), the signal l(t)−r(t−Δτ) is transmitted to cancel out the right audio signal, leaving the left channel acoustic signal to be heard by left ear 306. Mathematically, the left ear hears l(t−τ)−r(t−τ−Δτ)+r(t−τ−Δτ)=l(t−τ), which is the left channel acoustic signal. However, for right ear 308 to gain the same experience, the right speaker transmits r(t)−l(t−Δτ) instead of r(t). As a result of the process of cross-cancellation, left ear 306 actually hears l(t−τ)−r(t−τ−Δτ)+(r(t−τ−Δτ)−l(t−τ−2Δτ))=l(t−τ)−l(t−τ−2Δτ) (and similarly, right ear 308 hears r(t−τ)−r(t−τ−2Δτ)). If a signal changes slowly, as the bass components of an audio signal do, then l(t−τ)≈l(t−τ−2Δτ), so cross-cancellation tends to cancel the bass components of an audio signal.
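The bass loss described above can be illustrated numerically. The following is a minimal sketch, not part of the disclosed system: it evaluates the gain that the residual l(t)−l(t−2Δτ) applies to a unit sine wave, using the standard identity |1−e^(−j2πf·2Δτ)|=2|sin(2πfΔτ)|. The function name and the delay of 5 samples at 48 kHz are assumptions chosen only for illustration.

```python
import math

def cross_cancel_gain(freq_hz, delay_s):
    """Gain applied to a unit sine at freq_hz by l(t) - l(t - 2*delay_s)."""
    # |1 - exp(-j*2*pi*f*2*delay)| simplifies to 2*|sin(2*pi*f*delay)|
    return 2.0 * abs(math.sin(2.0 * math.pi * freq_hz * delay_s))

# Hypothetical delay: 5 samples at a 48 kHz sampling rate.
delay_s = 5 / 48000

for f in (50, 100, 1000, 2400):
    print(f, "Hz ->", round(cross_cancel_gain(f, delay_s), 3))
```

Under these assumed values, the gain is close to zero at bass frequencies and rises toward 2 as the frequency approaches 1/(4Δτ), which is the comb-filter behavior behind the loss of bass.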
Mathematically, if left channel signal 102 is represented by l(t), right channel signal 104 is represented by r(t), digital filter 416 transforms l(t) into l′(t), and digital filter 418 transforms r(t) into r′(t), then the resultant left channel signal output by digital filter 416 is s·l′(t−Δτ), where s is spread value 106 and Δτ is the delay imposed by delay unit 412. Similarly, the resultant right channel signal output by digital filter 418 is s·r′(t−Δτ). Therefore, left channel output signal 110 is lout(t)=l(t)−s·r′(t−Δτ)+s·l′(t−Δτ) and right channel output signal 112 is rout(t)=r(t)−s·l′(t−Δτ)+s·r′(t−Δτ). While, for simplicity, the equations are expressed in terms of analog signals, the processing can equally be performed digitally on the discrete-time counterparts l[n] and r[n].
The spread value 106 influences the strength of the widening effect by controlling the volume of the virtual sound. If the spread value is zero, there is no virtualization, only the original sound. Generally speaking, the larger the spread value, the louder the virtual sound effect. As described in the present embodiment, the virtual sound and cross-cancellation mixed with the original audio data can be used to produce an audio output that would sound like an extra set of speakers outside of the original set of stereo speakers.
An additional feature of the embodiment described in
Using the Pythagorean theorem,
so the difference between the distances is
The desired delay can be calculated from Δd by dividing Δd by the speed of sound.
In one embodiment, the distance between human ears de is assumed to be approximately 6 inches. For notebook computers, the distance between speakers ds typically ranges from 6 to 15 inches, depending on the configuration. The distance d at which an average person sits from a notebook computer is assumed to be between 12 and 36 inches in the present embodiment. For smaller electronic devices such as a portable DVD player, both the distance between the speakers and the distance from the speakers to the user could be even smaller. Exemplary values are given by Table 1. Given the above assumptions, the delays fall in the range of 2 to 11 samples at a 48 kHz sampling rate. For higher sampling rates, such as 96 kHz and 192 kHz, the delay expressed in samples increases proportionally with the sampling rate. For example, for the last case in Table 1 at 192 kHz, the delay is scaled to 11*192/48=44 samples.
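The delay calculation can be sketched as follows. The geometry used here is an assumption, since the original equations are not reproduced in this text: the listener is taken to be centered between the speakers at perpendicular distance d, so that the near ear is offset from a speaker by (ds−de)/2 and the far ear by (ds+de)/2. The speed of sound (about 343 m/s, or roughly 13504 inches per second) and all function names are likewise assumptions.

```python
import math

# Assumed speed of sound: about 343 m/s expressed in inches per second.
SPEED_OF_SOUND_IN_PER_S = 13504.0

def delay_in_samples(d, ds, de=6.0, fs=48000):
    """Delay in samples for listener distance d, speaker spacing ds,
    and ear spacing de (all in inches), at sampling rate fs (Hz)."""
    near = math.hypot(d, (ds - de) / 2.0)  # speaker to same-side ear
    far = math.hypot(d, (ds + de) / 2.0)   # speaker to opposite-side ear
    delta_d = far - near                   # path difference in inches
    # Path difference -> time (divide by speed) -> samples (multiply by fs)
    return delta_d / SPEED_OF_SOUND_IN_PER_S * fs

# Endpoints of the ranges given in the text, at 48 kHz.
print(round(delay_in_samples(d=36, ds=6)))
print(round(delay_in_samples(d=12, ds=15)))
```

Under these assumptions, the two endpoint configurations give roughly 2 and 11 samples at 48 kHz, which is consistent with the range stated in the text, and the result scales linearly with the sampling rate.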
Delay element 412 and delay element 414 can be implemented with variable delay units allowing the system 400 to be configurable to different sound system scenarios. As a result, in some embodiments of system 400, the delay is programmable through the introduction of delay value 108 which can adjust the delay on delay elements 412 and 414.
Another feature of system 400 is the addition of the processed left channel signal back into the left channel signal and the addition of the processed right channel signal back into the right channel signal. Traditional cross cancellation suffers from loss of center sound and loss of bass. The approach of the present embodiment produces a sound without a significant loss of center sound and bass, preserving the sound quality during cross cancellation. Virtualized audio samples with and without the additions by mixers 428 and 430 were compared empirically. Superior virtualization is exhibited by the system with mixers 428 and 430.
Traditional cross-cancellation causes a loss of bass. For example, examining the left channel mathematically, if lb(t) represents the low frequency components of the left channel signal, the left ear would hear lb(t)−lb(t−2Δτ). However, because there is very little variation over time in the low frequency components, lb(t)≈lb(t−2Δτ). Thus the low frequency components of the left channel are cancelled for the left ear.
In the case of system 400, the digital filters can be used to preserve the original bass frequencies in the output signal by suppressing the bass frequencies in the delayed scaled copies. The output of the digital filters can be expressed mathematically as l′b≈r′b≈0. As a result, the low frequency components of the left output channel would be lb(t)−s·r′b(t−Δτ)+s·l′b(t−Δτ)≈lb(t).
With or without the digital filters, both bass frequencies and center sound are preserved. Mathematically, when digital filters are present, lout
The use of digital filters 416 and 418 is optional but, in addition to preserving bass frequencies, they can amplify the virtualization effect of certain frequencies. For example, it may be desirable to apply speaker virtualization to certain sounds such as speech or a movie effect and not to apply speaker virtualizations to other sounds such as background sounds. By applying filters 416 and 418, specific sounds are emphasized in the virtualization process.
The immersion effect in the present embodiment is produced when the left ear and right ear respectively perceive two signals that are 180° out of phase. Experiments show the resulting effect is a sound perceived to be near the listener's ears that appears to diffuse and “jump out” right next to the listener's ears. The use of the spread value in system 700 changes the nature of the immersion effect. For example if the spread value is set to zero, the right channel signal still has the high frequency components rt(t) phase inverted relative to the input signal which still yields the immersion effect. If the spread value is zero, lout(t)=lb(t)+lt(t)=l(t), but rout(t)=rb(t)−rt(t). If the spread value is one, lout(t)=lb(t)+lt(t)+rt(t), and rout(t)=rb(t)−rt(t)−lt(t). Except for the bass frequencies, as the spread value changes from zero to one, the output goes from stereo immersion to monaural immersion.
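The immersion equations above can be sketched in discrete time as follows. The crossover used here is purely an assumption made to keep the example self-contained: a one-pole low-pass filter supplies the bass signal and the remainder is treated as the treble signal. The disclosure does not specify this filter, and the function names and the `alpha` smoothing coefficient are hypothetical.

```python
def one_pole_lowpass(signal, alpha=0.1):
    """Assumed crossover: simple one-pole low-pass giving the bass signal."""
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out

def immersion(left, right, spread, alpha=0.1):
    """lout = lb + lt + s*rt ; rout = rb - (rt + s*lt)."""
    lb = one_pole_lowpass(left, alpha)
    rb = one_pole_lowpass(right, alpha)
    # Treble is whatever the low-pass removed, so bass + treble = input.
    lt = [x - b for x, b in zip(left, lb)]
    rt = [x - b for x, b in zip(right, rb)]
    lout = [b + t + spread * o for b, t, o in zip(lb, lt, rt)]
    # The altered right treble signal is phase inverted before mixing.
    rout = [b - (t + spread * o) for b, t, o in zip(rb, rt, lt)]
    return lout, rout

lout, rout = immersion([1.0, 0.5, -0.5, 0.0], [0.0, 1.0, 0.5, -1.0], spread=0.0)
print(lout)
print(rout)
```

With a spread of zero, this sketch returns the left input unchanged and a right output whose treble is inverted, matching lout(t)=l(t) and rout(t)=rb(t)−rt(t).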
Both the speaker virtualization and the immersion effect can be offered to the end user within the same virtualization system.
Various fader techniques can be employed within left fader 802 and right fader 804. One example of a three-way fader that can be employed is a mixer where left audio output signal 110 can be expressed as lout(t)=αl(t)+αimmlimm(t)+αvirtlvirt(t), where limm(t) is the left output audio signal of immersion effect system 700 and lvirt(t) is the left output audio signal of virtual speaker system 400 and right audio output signal 112 can be expressed as rout(t)=αr(t)+αimmrimm(t)+αvirtrvirt(t), where rimm(t) is the right output audio signal of immersion effect system 700 and rvirt(t) is the right output audio signal of virtual speaker system 400 and α, αimm, and αvirt are gain coefficients. When immersion effects are chosen through input 806, αimm is increased gradually until it reaches 1 while α and αvirt are decreased gradually until they both reach 0. When virtual speakers are chosen through input 806, αvirt is increased gradually until it reaches 1 while α and αimm are decreased gradually until they both reach 0. When all effects are turned off by selecting “no effects” through input 806, α is increased gradually until it reaches 1 while αvirt and αimm are decreased gradually until they both reach 0. The gradual increase and decrease of the three gain factors can be linear or can employ exponential decays or another monotonic function. By using a smooth fader, a user can transition into or out of an effect without audible glitches during the transition.
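The three-way fader described above can be sketched as follows, using the linear ramp option. The function names, the tuple ordering (α, αimm, αvirt), the selection labels, and the step count are assumptions made for illustration only.

```python
# Target gains (alpha, alpha_imm, alpha_virt) for each selection on input 806.
TARGETS = {
    "none": (1.0, 0.0, 0.0),
    "immersion": (0.0, 1.0, 0.0),
    "virtual": (0.0, 0.0, 1.0),
}

def ramp_gains(start, selection, steps):
    """Linear ramp from the start gains to the gains for `selection`."""
    goal = TARGETS[selection]
    return [
        tuple(s + (g - s) * k / steps for s, g in zip(start, goal))
        for k in range(1, steps + 1)
    ]

def mix(dry, imm, virt, gains):
    """One output sample: alpha*dry + alpha_imm*imm + alpha_virt*virt."""
    a, a_imm, a_virt = gains
    return a * dry + a_imm * imm + a_virt * virt

# Fade from "no effects" into the immersion effect over 4 steps.
for g in ramp_gains((1.0, 0.0, 0.0), "immersion", 4):
    print(g, mix(1.0, 2.0, 3.0, g))
```

In practice the ramp would be applied per sample or per block over a short interval; an exponential or other monotonic curve could replace the linear interpolation without changing the structure.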
The embodiments described above let the listener perceive virtual speakers as well as experience immersion. Empirical evidence has shown that these systems give a superior surround and spatial sound experience while requiring little CPU power, so they can be implemented in systems with or without a hardware DSP, including embedded systems.
It should be emphasized that the above-described embodiments are merely examples of possible implementations. Many variations and modifications may be made to the above-described embodiments without departing from the principles of the present disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.
This application claims priority under 35 U.S.C. §119 to U.S. Patent Application No. 61/186,795, filed Jun. 12, 2009, entitled “Systems and Methods for Creating Immersion Surround Sound and Virtual Speakers Effects,” which is hereby incorporated by reference.
Number | Date | Country
---|---|---
61186795 | Jun 2009 | US