This disclosure relates to a sound stage controller for a near-field speaker-based audio system.
In some automobile audio systems, processing is applied to the audio signals provided to each speaker based on the electrical and acoustic response of the total system, that is, the responses of the speakers themselves and the response of the vehicle cabin to the sounds produced by the speakers. Such a system is highly individualized to a particular automobile model and trim level, taking into account the location of each speaker and the absorptive and reflective properties of the seats, glass, and other components of the car, among other things. Such a system is generally designed as part of the product development process of the vehicle and corresponding equalization and other audio system parameters are loaded into the audio system at the time of manufacture or assembly.
Conventional automobile audio systems, with stereo speakers in front of and behind the front seat passengers, include controls generally called fade and balance. The same stereo signal is sent to both front and rear sets of speakers, and the fade control controls the relative signal level of front and rear signals, while the balance control controls the relative signal level of left and right signals. These control schemes tend to lose their relevance in a personalized sound system using near-field speakers located near the passengers' heads, rather than in fixed locations behind the passengers.
In general, in one aspect, adjusting signals in an automobile audio system having at least two near-field speakers located close to an intended position of a listener's head includes, for each of a set of designated positions other than the actual locations of the near-field speakers, determining a binaural filter that causes sound produced by each of the near-field speakers to have characteristics at the intended position of the listener's head of sound produced by a sound source located at the respective designated position. An up-mixing rule generates at least three component channel signals from an input audio signal having at least two channels. A first set of weights for applying to the component channel signals at each of the designated positions define a first sound stage. A second set of weights for applying to the component channel signals at each of the designated positions define a second sound stage. The audio system combines the first set of weights and the second set of weights to determine a combined set of weights, the relative contribution of the first set of weights and the second set of weights in the combined set of weights being determined by a variable user-input value. A mixed signal corresponds to a combination of the component channel signals according to the combined set of weights for each of the designated positions. Each mixed signal is filtered using the corresponding binaural filter to generate a set of binaural output signals which are summed and output using the near-field speakers.
Implementations may include one or more of the following, in any combination. The user input providing the user-input value may be a fader input, and the contribution of the first set of weights may be greater when the fader control is in a more forward setting and the contribution of the second set of weights may be greater when the fader control is in a more rearward setting. The audio system may include at least a first fixed speaker positioned near a left corner of the vehicle's cabin forward of the intended position of the listener's head, and a second fixed speaker positioned near a right corner of the vehicle's cabin forward of the intended position of the listener's head, with a third set of weights for applying to the component channel signals for each of the fixed speakers to define the first sound stage, and a fourth set of weights for applying to the component channel signals for each of the fixed speakers to define the second sound stage, with the audio system combining the third set of weights and the fourth set of weights to determine a second combined set of weights, the relative contribution of the third set of weights and the fourth set of weights in the second combined set of weights being determined by the variable user-input value, a mixed signal corresponding to a combination of the component channel signals according to the second combined set of weights for each of the fixed speakers, the mixed signals being output by the corresponding fixed speakers. The first and third sets of weights may cause a different set of the fixed speakers and near-field speakers to dominate spatial perception of the soundstage than the second and fourth sets, such that which set of speakers dominates spatial perception varies as the user-input value is varied.
The near-field speakers may be located in a headrest of the automobile. The near-field speakers may be coupled to a body structure of the automobile. The relative contribution of the first set of weights and the second set of weights in the combined set of weights may vary according to a predetermined curve mapping the variable user-input value to the relative contribution. The predetermined curve may be nonlinear. The relative contribution of the first set of weights and the second set of weights in the combined set of weights may be determined automatically based on a characteristic of the input audio signal.
In general, in one aspect, adjusting signals in an automobile audio system having at least two near-field speakers located close to an intended position of a listener's head includes determining a first binaural filter that causes sound produced by each of the near-field speakers to have characteristics at the intended position of the listener's head of sound produced by a sound source located at a first designated position other than the actual locations of the near-field speakers, determining a second binaural filter that causes sound produced by each of the near-field speakers to have characteristics at the intended position of the listener's head of sound produced by a sound source located at a second designated position other than the actual locations of the near-field speakers and different from the first designated position, determining an up-mixing rule to generate at least three component channel signals from an input audio signal having at least two channels, mixing a set of the component channel signals to form a first mixed signal, filtering the mixed signal with a combination of the first binaural filter and the second binaural filter to generate a binaural output signal, and outputting the binaural output signal using the near-field speakers. The relative weight of the first binaural filter and the second binaural filter in the binaural output signal is determined by a variable user-input value.
Implementations may include one or more of the following, in any combination. The audio system may include at least a first fixed speaker positioned near a left corner of the vehicle's cabin forward of the intended position of the listener's head, and a second fixed speaker positioned near a right corner of the vehicle's cabin forward of the intended position of the listener's head, with a first set of weights for applying to the component channel signals for each of the fixed speakers defining the first sound stage, and a second set of weights for applying to the component channel signals for each of the fixed speakers defining the second sound stage. The audio system combines the first set of weights and the second set of weights to determine a combined set of weights, the relative contribution of the first set of weights and the second set of weights in the combined set of weights being determined by the variable user-input value. A mixed signal corresponding to a combination of the component channel signals according to the combined set of weights for each of the fixed speakers is output using the corresponding fixed speakers. The first binaural filter and first set of weights may cause a different set of the fixed speakers and near-field speakers to dominate spatial perception of the soundstage than the second binaural filter and second set of weights, such that which set of speakers dominates spatial perception varies as the user-input value is varied.
In general, in one aspect, signals in an automobile audio system having at least two near-field speakers located close to an intended position of a listener's head are adjusted such that in a first mode, audio signals are distributed to the near-field speakers according to a first filter that causes the listener to perceive a wide soundstage, and in a second mode, the audio signals are distributed to the near-field speakers according to a second filter that causes the listener to perceive a narrow soundstage. A user input of a variable value is received and, in response, distribution of the audio signals is transitioned from the first mode to the second mode, the extent of the transition being variable based on the value of the user input.
Implementations may include one or more of the following, in any combination. Transitioning the distribution of the audio signals may include applying both the first and second filters to the audio signals in a weighted sum, the relative weights of the first and second filters being based on the value of the user input.
In general, in one aspect, an automobile audio system includes at least two near-field speakers located close to an intended position of a listener's head, a user input generating a variable value, and an audio signal processor configured to, in a first mode, distribute audio signals to the near-field speakers according to a first filter that causes the listener to perceive a wide soundstage, in a second mode, distribute the audio signals to the near-field speakers according to a second filter that causes the listener to perceive a narrow soundstage, and in response to a change in the value of the user input, transition distribution of the audio signals from the first mode to the second mode, the extent of the transition being variable based on the value of the user input.
Implementations may include one or more of the following, in any combination. The audio signal processor may include a memory storing a set of binaural filters that causes sound produced by each of the near-field speakers to have characteristics at the intended position of the listener's head of sound produced by a sound source located at each of a set of designated positions other than the actual locations of the near-field speakers, a first set of weights for applying to a set of component channel signals for each of the designated positions to define a first sound stage, and a second set of weights for applying to the set of component channel signals for each of the designated positions to define a second sound stage. The audio signal processor may transition distribution of the audio signals from the first mode to the second mode by applying an up-mixing rule to generate at least three component channel signals from an input audio signal having at least two channels, combining the first set of weights and the second set of weights to determine a combined set of weights, the relative contribution of the first set of weights and the second set of weights in the combined set of weights being determined by the value of the user input, determining a mixed signal corresponding to a combination of the component channel signals according to the combined set of weights for each of the designated positions, filtering each mixed signal using the corresponding binaural filter to generate a set of binaural output signals, summing the filtered binaural signals, and outputting the summed binaural signals to the near-field speakers. 
The audio signal processor may include a memory storing a first binaural filter that causes sound produced by each of the near-field speakers to have characteristics at the intended position of the listener's head of sound produced by a sound source located at a first designated position other than the actual locations of the near-field speakers and a second binaural filter that causes sound produced by each of the near-field speakers to have characteristics at the intended position of the listener's head of sound produced by a sound source located at a second designated position other than the actual locations of the near-field speakers and different from the first designated position. The audio signal processor may transition distribution of the audio signals from the first mode to the second mode by applying an up-mixing rule to generate at least three component channel signals from an input audio signal having at least two channels, mixing a set of the component channel signals to form a first mixed signal, filtering the mixed signal with a combination of the first binaural filter and the second binaural filter to generate a binaural output signal, and outputting the binaural output signal using the near-field speakers, the relative weight of the first binaural filter and the second binaural filter in the binaural output signal being determined by the value of the user input. Advantages include providing a user experience that responds to a variable sound stage control in a more immersive manner than a traditional fader control, and providing user control of sound stage spaciousness.
All examples and features mentioned above can be combined in any technically possible way. Other features and advantages will be apparent from the description and the claims.
U.S. patent application Ser. No. 13/888927, incorporated here by reference, describes an audio system using near-field speakers located near the heads of the passengers, and a method of configuring that audio system to control the sound stage perceived by each passenger.
Conventional car audio systems are based around a set of four or more speakers, two on the instrument panel or in the front doors and two generally located on the rear package shelf, in sedans and coupes, or in the rear doors or walls in wagons and hatchbacks. In some cars, however, as shown in
The audio system shown in
The driver's headrest 120 in
The near-field speakers can be used, with appropriate signal processing, to expand the spaciousness of the sound perceived by the listener, and more precisely control the frontal sound stage. Different effects may be desired for different components of the audio signals—center signals, for example, may be tightly focused, while surround signals may be intentionally diffuse. One way the spaciousness is controlled is by adjusting the signals sent to the near-field speakers to achieve a target binaural response at the listener's ears. As shown in
Human perception of the direction and distance of sound sources is based on a combination of arrival time differences between the ears, signal level differences between the ears, and the particular effect that the listener's anatomy has on sound waves entering the ears from different directions, all of which is also frequency-dependent. We refer to the combination of these factors at both ears, for a source at a given location, as the binaural response for that location. Binaural signal filters are used to shape sound that will be reproduced at a speaker at one location to sound like it originated at another location.
Although a system cannot be designed a priori to account for the unique anatomy of an unknown future user, other aspects of binaural response can be measured and manipulated.
The signals intended to be localized from the virtual sources are modified to attain a close approximation to the target binaural response of the virtual source with the inclusion of the response from near-field speakers to ears. Mathematically, we can call the frequency-domain binaural response to the virtual sources V(s), and the response from the real speakers, directly to the listener's ears, R(s). If a sound S(s) were played at the location of the virtual sources, the user would hear S(s)×V(s). For the same sound played at the near-field speakers, without correction, the user will hear S(s)×R(s). Ideally, by first filtering the signals with a filter having a transfer function equivalent to V(s)/R(s), the sound S(s)×V(s)/R(s) will be played back over the near-field speakers, and the user will hear S(s)×V(s)×R(s)/R(s)=S(s)×V(s). There are limits to how far this can be taken: if the virtual source locations are too far from the real near-field speaker locations, for example, it may be impossible to combine the responses in a way that produces a stable filter, or the result may be very susceptible to head movement. One limiting factor is the cross-talk cancellation filter, which prevents signals meant for one ear from reaching the other ear.
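The V(s)/R(s) correction described above can be illustrated one frequency bin at a time. The following is a minimal sketch, assuming both responses are available as lists of complex values sampled at the same frequencies. The function name correction_filter, the sample values, and the regularization constant eps are illustrative assumptions and do not come from the system described here. The regularization term is one common way to keep the gain bounded where R(s) is small, which relates to the stability limit noted above.

```python
def correction_filter(v, r, eps=1e-6):
    """Return H[k] ~= V[k] / R[k] for each frequency bin k.

    v, r: equal-length lists of complex frequency responses (hypothetical data).
    eps:  assumed regularization constant; bounds the gain where |R| is tiny.
    """
    if len(v) != len(r):
        raise ValueError("v and r must have the same number of bins")
    # V * conj(R) / (|R|^2 + eps) equals V / R when eps is 0.
    return [vk * rk.conjugate() / (abs(rk) ** 2 + eps) for vk, rk in zip(v, r)]


# Hypothetical three-bin responses, for illustration only.
V = [1 + 0j, 0.5 + 0.5j, 0.2 - 0.1j]
R = [0.8 + 0j, 0.4 - 0.2j, 0.1 + 0.3j]
H = correction_filter(V, R, eps=0.0)

# Playing S*H over the real speakers yields S*H*R, which should match S*V.
heard = [hk * rk for hk, rk in zip(H, R)]
```

With eps set to zero the product H×R reproduces V in every bin; a positive eps trades some accuracy for a bounded filter gain.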
One aspect of the audio experience that is controlled by the tuning of the car is the sound stage. “Sound stage” refers to the listener's perception of where the sound is coming from. In particular, it is generally desired that a sound stage be wide (sound comes from both sides of the listener), deep (sound comes from both near and far), and precise (the listener can identify where a particular sound appears to be coming from). In an ideal system, someone listening to recorded music can close their eyes, imagine that they are at a live performance, and point out where each musician is located. A related concept is “envelopment,” by which we refer to the perception that sound is coming from all directions, including from behind the listener, independently of whether the sound is precisely localizable. Perception of sound stage and envelopment (and sound location generally) is based on level and arrival-time (phase) differences between sounds arriving at both of a listener's ears, and sound stage can be controlled by manipulating the audio signals produced by the speakers to control these inter-aural level and time differences. As described in U.S. Pat. No. 8,325,936, incorporated here by reference, not only the near-field speakers but also the fixed speakers may be used cooperatively to control spatial perception.
If a near-field speaker-based system is used alone, the sound will be perceived as coming from behind the listener, since that is indeed where the speakers are. Binaural filtering can bring the sound somewhat forward, but it isn't sufficient to reproduce the binaural response of a sound truly coming from in front of the listener. However, when properly combined with speakers in front of the driver, such as in the traditional fixed locations on the instrument panel or in the doors, the near-field speakers can be used to improve the staging of the sound coming from the front speakers. That is, in addition to replacing the rear-seat speakers to provide “rear” sound, the near-field speakers are used to focus and control the listener's perception of the sound coming from the front of the car. This can provide a wider or deeper, and more controlled, sound stage than the front speakers alone could provide. The near-field speakers can also be used to provide different effects for different portions of the source audio. For example, the near-field speakers can be used to tighten the center image, providing a more precise center image than the fixed left and right speakers alone can provide, while at the same time providing more diffuse and enveloping surround signals than conventional rear speakers.
In some examples, the audio source provides only two channels, i.e., left and right stereo audio. Two other common options are four channels, i.e., left and right for both front and rear, and five channels for surround sound sources (usually with a sixth “point one” channel for low-frequency effects). Four channels are normally found when a standard automotive head unit is used, in which case the two front and two rear channels will usually have the same content, but may be at different levels due to “fader” settings in the head unit. To properly mix sounds for a system as described herein, the two or more channels of input audio are up-mixed into an intermediate number of components corresponding to different directions from which the sound may appear to come, and then re-mixed into output channels meant for each specific speaker in the system, as described with reference to
An advantage of the present system is that the component signals up-mixed from the source material can each be distributed to different virtual speakers for rendering by the audio system. As explained with regard to
A given up-mixed component signal may be distributed to any one or more of the virtual speakers, which not only allows repositioning of the component signal's perceived location, but also provides the ability to render a given component as either a tightly focused sound, from one of the virtual speakers, or as a diffuse sound, coming from several of the virtual speakers simultaneously. To achieve these effects, a portion of each component is mixed into each output channel (though that portion may be zero for some component-output channel combinations). For example, the audio signal for a right component will be mostly distributed to the right fixed speaker FR 106, but to position each virtual image 224-i on the right side of the headrest, such as 224-n and 224-p, portions of the right component signal are also distributed to the right near-field speaker and left near-field speaker, due to both the target binaural response of the virtual image and for cross-talk cancellation. The audio signal for the center component will be distributed to the corresponding right and left fixed speakers 104 and 106, with some portion also distributed to both the right and left near-field speakers 122 and 124, controlling the location, e.g., 224-m, from which the listener perceives the virtual center component to originate. Note that the listener won't actually perceive the center component as coming from behind if the system is tuned properly: the center component content coming from the front fixed speakers will pull the perceived location forward, and the virtual center simply helps to control how tight or diffuse, and how far forward, the center component image is perceived. The particular distribution of component content to the output channels will vary based on how many and which near-field speakers are installed.
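The distribution of a portion of each component into each output channel can be pictured as a table of mixing weights. The following is a minimal sketch, assuming signals are plain lists of samples. The function name mix_components, the channel and component labels, and every numeric portion are hypothetical values chosen for illustration, not tuning data from the described system. A missing entry stands for a portion of zero.

```python
def mix_components(components, weights):
    """Mix component signals into output channels.

    components: dict mapping component name -> list of samples.
    weights:    dict mapping output name -> {component name: portion}.
                A missing entry means a portion of zero.
    """
    length = len(next(iter(components.values())))
    outputs = {}
    for out_name, portions in weights.items():
        mixed = [0.0] * length
        for comp_name, gain in portions.items():
            signal = components[comp_name]
            for i in range(length):
                mixed[i] += gain * signal[i]
        outputs[out_name] = mixed
    return outputs


# Hypothetical two-sample component signals and illustrative portions.
components = {"left": [1.0, 0.0], "center": [0.0, 1.0], "right": [0.5, 0.5]}
weights = {
    "FL": {"left": 0.9, "center": 0.5},
    "FR": {"right": 0.9, "center": 0.5},
    "virtual_right": {"right": 0.3},
}
outputs = mix_components(components, weights)
```

In this sketch the center component reaches both fixed outputs, while the right component feeds both the right fixed output and one virtual image position, mirroring the kind of split described above.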
Mixing the component signals for the near-field speakers includes altering the signals to account for the difference between the binaural response to the components, if they were coming from real speakers, and the binaural response of the near-field speakers, as described above with reference to
We use “component” to refer to each of the intermediate directional assignments to which the original source material is up-mixed. As shown in
The relationship between component signals, generally C1 through CN, virtual image signals, V1 through VP, and output signals FL, FR, HOL, and HOR is shown in
Another particular feature that can be provided with the system described above is a replacement for the traditional “fader” control. In typical car audio systems, with a set of stereo speakers in the front and another set of stereo speakers in the rear playing a scaled version of the same signal, a fader control adjusts the balance of sound energy between the front and rear speakers. For a full front setting, only the front speakers receive a signal, and for a full rear setting, only the rear speakers receive a signal. In the system described above, this would not be desirable, assuming the headrest speakers would be substituted for the rear speakers, as the signals going to the front and to the headrest speakers do not contain the same content, and don't play sound in the same bandwidths. Instead, a new interpretation of the fader is provided, which manipulates the mixing of component content into virtual image locations and fixed speaker signals. As discussed above, a binaural filter is designed that adjusts each virtual signal to account for the difference in binaural perception between signals coming from the virtual locations and the real speaker locations. Each virtual signal receives a mix of weighted component signals, which determines the location from which the listener perceives each component signal to originate. Rather than simply shifting sound energy between front and rear, this mixing can be varied for each virtual image location to change the precision and location of each component and the amount of envelopment provided by the virtual images.
To provide a sound stage control instead of a traditional fader function, two different sets of component mixing weights are designed, based on two different sound stage presentations. In some examples, as shown in
To effect a transition between the two sound stage configurations as the user adjusts the control, both sets of weights are applied simultaneously, with the relative contribution of each set of weights set based on the position of the sound stage control, as shown in
If the sound stage control is all the way at the start position 608, the contribution of the first set of weights (curve 602) is set to one and the contribution of the second set of weights (curve 604) is zero. As the fader is moved to the middle and then all the way to the ending position 610, the contribution of the first set is decreased and the contribution of the second set is increased until, at the full end position, the first set has a contribution of zero and the second set has a contribution of one. The curves are labeled as “narrow” and “wide”, but this is just a notation for convenience, as the actual description of the effect of the weights will vary in a given application, much like the control position labels mentioned above. Thus, the user can adjust the size of the sound stage from narrow and forward to wide and enveloping, or between whatever alternative a given system offers. These settings may also be applied automatically based on the content of the source audio signal, for example, talk radio may be played using the first set of weights with a narrow, forward sound stage, while music may be played using the second set of weights with a wider, more enveloping overall sound stage. The shape of the curves shown is merely for illustration purposes—other curves, including straight lines, could be used, depending on the desires of the system designer and the capabilities of the audio system.
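The transition between the two sets of weights can be sketched as a cross-fade driven by the control position. The following is a minimal sketch, assuming the control position is normalized so that 0.0 is the start position and 1.0 is the end position, and assuming the two contributions sum to one at every position (the description only fixes their values at the two ends). The function names, the raised-cosine curve, and the weight values are hypothetical.

```python
import math


def contributions(position, curve=None):
    """Return (first, second) contributions for a control position in [0, 1].

    position 0.0 is the start position and 1.0 the end position.
    curve maps position -> second-set contribution; linear by default.
    """
    position = min(max(position, 0.0), 1.0)
    second = position if curve is None else curve(position)
    return 1.0 - second, second


def combine_weights(first_set, second_set, position, curve=None):
    """Blend two equally sized lists of mixing weights."""
    a, b = contributions(position, curve)
    return [a * w1 + b * w2 for w1, w2 in zip(first_set, second_set)]


def raised_cosine(p):
    """An assumed nonlinear curve that still runs from 0 to 1."""
    return 0.5 - 0.5 * math.cos(math.pi * p)


narrow = [1.0, 0.0, 0.2]  # hypothetical first set of weights
wide = [0.4, 0.6, 0.8]    # hypothetical second set of weights
midpoint = combine_weights(narrow, wide, 0.5, raised_cosine)
```

At the start position the combined weights equal the first set, and at the end position they equal the second set. Swapping in a different curve function changes only how quickly the sound stage moves between the two in the middle of the control's travel.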
In another embodiment, rather than or in addition to changing the mixing weights of the component signals, the binaural filters can be changed to move the virtual image locations. Two sets of binaural filters can be combined, based on a weight derived from the fader input control, such that the fader control determines which binaural filters are dominant and therefore where the virtual images are positioned. The fixed speakers may still be varied by changing the weights of the component signals mixed to form the output signals.
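One way to picture combining two binaural filters under a single control weight is a tap-by-tap blend of two finite impulse response filters. The following is a minimal sketch under that assumption. The description does not specify how the filters are combined, so the linear blend, the function names blend_filters and fir, and the tap values are all hypothetical.

```python
def blend_filters(first, second, weight):
    """Return a tap-by-tap weighted combination of two FIR filters.

    weight is the assumed contribution of the second filter, in [0, 1].
    """
    if len(first) != len(second):
        raise ValueError("filters must have the same number of taps")
    weight = min(max(weight, 0.0), 1.0)
    return [(1.0 - weight) * a + weight * b for a, b in zip(first, second)]


def fir(signal, taps):
    """Direct-form FIR filtering (full convolution) of a sample list."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(taps):
            out[i + j] += x * h
    return out


# Hypothetical three-tap filters, for illustration only.
filter_a = [1.0, 0.0, 0.0]
filter_b = [0.0, 0.5, 0.25]
blended = blend_filters(filter_a, filter_b, 0.5)
output = fir([1.0, 2.0], blended)
```

Because filtering is linear, applying the blended filter gives the same result as filtering with each filter separately and summing the two outputs with the same weights.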
Embodiments of the systems and methods described above may comprise computer components and computer-implemented steps that will be apparent to those skilled in the art. For example, it should be understood by one of skill in the art that the computer-implemented steps may be stored as computer-executable instructions on a computer-readable medium such as, for example, floppy disks, hard disks, optical disks, Flash ROMs, nonvolatile ROM, and RAM. Furthermore, it should be understood by one of skill in the art that the computer-executable instructions may be executed on a variety of processors such as, for example, microprocessors, digital signal processors, gate arrays, etc. For ease of exposition, not every step or element of the systems and methods described above is described herein as part of a computer system, but those skilled in the art will recognize that each step or element may have a corresponding computer system or software component. Such computer system and/or software components are therefore enabled by describing their corresponding steps or elements (that is, their functionality), and are within the scope of the disclosure.
A number of implementations have been described. Nevertheless, it will be understood that additional modifications may be made without departing from the scope of the inventive concepts described herein, and, accordingly, other embodiments are within the scope of the following claims.
This application is a continuation of, and claims priority to, U.S. patent application Ser. No. 13/906,997 filed May 31, 2013 (having the same title and inventors as the instant application), which is incorporated by reference in its entirety.
| | Number | Date | Country |
|---|---|---|---|
| Parent | 13906997 | May 2013 | US |
| Child | 14938478 | | US |