The present invention relates generally to acoustic modeling, and more particularly, to a system and method for rendering an acoustic environment using more than two speakers.
Positional three-dimensional audio algorithms produce the illusion of sound emanating from a source at an arbitrary point in space by calculating the acoustic waveform which would actually impinge upon a listener's eardrums from the source. Systems have been developed to simulate a virtual sound source in an arbitrary perceptual location relative to a listener. These virtual acoustic displays apply separate left ear and right ear filters to a source signal in order to mimic the acoustic effects of the human head, torso, and pinnae on source signals arriving from a particular point in space. These filters are referred to as head related transfer functions (HRTFs). HRTFs are functions of position and frequency which are different for different individuals. When a sound signal is passed through a filter which implements the HRTF for a given position, the sound appears to the listener to have originated from that position.
Many applications comprise acoustic displays utilizing one or more HRTF filters in attempting to spatialize or create a realistic three-dimensional aural impression. Acoustic displays can spatialize a sound by modeling the attenuation and delay of acoustic signals received at each ear as a function of frequency, and apparent direction relative to head orientation. U.S. patent application Ser. Nos. 5,729,612 and 5,802,180, which are incorporated herein by reference, provide examples of implementation of a virtual audio display using HRTFs.
Stereo audio streams in which the left and right channels are developed independently for the left and right ears of a listener are referred to as binaural signals. Headphones are typically used to send binaural signals directly to a listener's left and right ears. The main reason for using headphones is that the sound signal from the speaker on one side of the listener's head generally does not travel around the listener's head to reach the ear on the opposite side. Therefore, the application of the signal by one headphone speaker to one of the listener's ears does not interfere, through an external path, with the signal being applied to the listener's other ear by the other headphone speaker. Headphones are thus an effective way of transmitting a binaural signal to a listener; however, it is not always convenient to wear headphones or earphones.
Complications arise in systems which do not deliver the audio signal directly to the listener's ear. If a binaural signal is used to drive free standing speakers directly, then the listener will hear contributions from each speaker at each ear. The receipt of the signal intended for the right ear at the left ear and vice versa is referred to as “cross-talk”. It is necessary in such systems to compensate for or to cancel somehow the cross-talk so that the desired binaural signal is effectively applied to each of the listener's ears. The speaker cross-talk canceller does this by eliminating the positional cues related to speaker position and removing the interference of each speaker on the other.
A conventional implementation of a positional three-dimensional audio system includes a head-related transfer function (HRTF) processor followed by a speaker cross-talk cancellation algorithm. As previously described, the HRTF processor simulates the interaction of sound waves with the listener's head, ears, and body to reproduce the natural cues that would be heard from a real source in the same position. An impression that an acoustic signal originates from a particular relative direction can be created in a binaural display by applying an appropriate HRTF to the acoustic signal, generating one signal for presentation to the left ear and a second signal for presentation to the right ear, each signal changed in a manner which results in the perceived signal that would have been received at each ear had the signal actually originated from the desired relative direction.
An audio rendering system and method are disclosed. The audio rendering system generally comprises front and rear signal modifiers configured to receive a plurality of audio signals representing a plurality of sources of aural information and location information representing apparent location for the source of said aural information. A gain is applied to the signals representative of the location information. A front signal modifier includes a plurality of head-related transfer function filters and a rear signal modifier includes a plurality of filters configured to approximate head-related transfer function filters. The system further includes front speakers comprising a left front speaker and right front speaker configured to receive signals from the front signal modifier and generate a signal to a listener. At least one rear speaker is configured to receive signals from the rear signal modifier and generate a signal to the listener to offset frontward bias created by the front speakers. The gains applied to the signal are calculated to produce generally equal perceived energy from each of the front and rear speakers.
A method for providing a two channel signal to the ears of a listener through an audio system including a plurality of audio signals which are played through two front speakers and at least one rear speaker generally comprises receiving a plurality of audio signals representing a plurality of sound sources and applying a head-related transfer function to each signal representative of a location of each of the sound sources. A front gain is applied to the signals to create front signals and the front signals are sent to the two front speakers. A rear gain is applied to the signals to create rear signals which are sent to the rear speaker. The gains applied to the signals are calculated to produce generally equal perceived energy from each of the front and rear speakers.
The above is a brief description of some deficiencies in the prior art and advantages of the present invention. Other features, advantages, and embodiments of the invention will be apparent to those skilled in the art from the following description, drawings, and claims.
Corresponding reference characters indicate corresponding parts throughout the several views of the drawings.
Referring now to the drawings, and first to
It is to be understood that the number and arrangement of speakers may be different than shown herein without departing from the scope of the invention. For example, although a symmetric speaker system is shown, the present invention includes any arbitrary arrangement of speakers so long as the transfer functions used to position each source account for differences in speaker position relative to the listener. Referring again to
The signals travelling along the first branch 32 are input to a plurality of filters 36. In order to simplify the illustration and description of the system, only one filter 36 is shown in FIG. 1. Also, the branches 32, 34 and paths between components are shown as single lines; however, these lines may represent one signal or a plurality of signals. The filter 36 may be an HRTF filter or any other type of headphone three dimensional rendering filter, as is well known by those skilled in the art. The filter 36 preferably converts the mono signal to a stereo pair. For example, there may be sixteen filters 36 which convert sixteen mono signals to sixteen stereo pairs (thirty-two signals). The filter 36 preferably provides spectral shaping and attenuation of the sound wave to account for differences in amplitude and time of arrival of sound waves at the left and right ears. The signals are then sent from the filters 36 to a mixer/scaler 38 which sums all of the signals (e.g., thirty-two signals from the sixteen filters 36) to produce a stereo output (one front left speaker signal and one front right speaker signal). The mixer/scaler 38 adjusts a front gain of the front speakers based on the position of the sound source. The sum is a weighted sum, with each weight depending on the corresponding source position. The front and rear gains may be applied in the filter 36, mixer/scaler 38, or combined in both the filter and mixer/scaler.
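The weighted summation performed by the mixer/scaler 38 can be illustrated with a minimal sketch. The function name `mix_stereo_pairs`, the list-based sample representation, and the example weights are hypothetical and are not taken from the described system; the sketch only shows several stereo pairs being reduced to one stereo output, with one position-dependent weight per source.

```python
from typing import List, Sequence, Tuple

# One stereo pair: (left samples, right samples). Hypothetical representation.
StereoPair = Tuple[Sequence[float], Sequence[float]]


def mix_stereo_pairs(
    pairs: Sequence[StereoPair], weights: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Weighted sum of N stereo pairs into a single stereo output."""
    if len(pairs) != len(weights):
        raise ValueError("one weight per stereo pair is required")
    if not pairs:
        return [], []
    n = len(pairs[0][0])
    left = [0.0] * n
    right = [0.0] * n
    for (src_left, src_right), weight in zip(pairs, weights):
        if len(src_left) != n or len(src_right) != n:
            raise ValueError("all channels must have the same length")
        for i in range(n):
            # Each source contributes to both output channels,
            # scaled by its own (position-dependent) weight.
            left[i] += weight * src_left[i]
            right[i] += weight * src_right[i]
    return left, right


# Two sources, two samples each; the weights stand in for front gains.
front_left, front_right = mix_stereo_pairs(
    [([1.0, 2.0], [0.0, 1.0]), ([2.0, 0.0], [4.0, 4.0])],
    [0.5, 0.25],
)
```

With sixteen sources, `pairs` would hold the sixteen stereo pairs produced by the filters 36 and `weights` the sixteen front gains.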
The left and right speaker signals are preferably sent from the mixer/scaler 38 to a cross-talk canceller 40. The cross-talk canceller 40 is designed to cancel cross-talk sounds which emerge when a person hears binaural sounds over two speakers. It is designed to eliminate the cross-talk phenomenon in which the right side sound enters the left ear and the left side sound enters the right ear. The cross-talk canceller 40 may be one as described in U.S. patent application Ser. No. 09/305,789, by Gerrard et al., filed May 4, 1999, for example. Under operation of the cross-talk canceller 40, the outputs are converted into sounds which, when heard over speakers in a specified position, are roughly heard by the left ear only from the left side speaker and by the right ear only from the right side speaker. Such sound allocation roughly simulates the situation in which the listener hears the sounds by use of a headphone set.
The filter 36, mixer/scaler 38, and cross-talk canceller 40 may all be provided on a single chip as indicated by the dotted line shown in
The signals sent along path 34 are input to a plurality of filters 42 (only one shown) which add spectral coloring to the signals to smooth out the signals and approximately match the HRTF filtering. The filter 42 receives a mono input and produces a plurality of outputs equal to the number of rear speakers (e.g., two). The filters 42 are position dependent, as described above for the filters 36. The filter 42 may be the same as the HRTF filters 36 used for the front speakers or some approximation of the HRTF filters. Preferably, the filter 42 does not provide all of the processing included in the HRTF filter 36 to reduce system complexity. The filter 42 frequency characteristics are preferably designed to minimize timbral differences or mismatch between the front and rear speakers and help to provide for smooth transitions from the front speakers to the rear speakers. Since the filters 36, 42 change as the source changes position, the system is preferably designed to provide a form of smooth transitioning between the filters (e.g., tracking).
For two rear speakers, one simple approximation to HRTF filtering is panning. If an HRTF filter is not used in the rear sound processing, panning is preferably provided between the rear left speaker signal and the rear right speaker signal. The panning represents a certain source position which is located between two speakers. By varying the gain value between 0 and 1, it is possible to change the sound-image position corresponding to the sound produced responsive to the sound effect signal between two speakers. When the gain value is equal to zero, the sound signal is provided so that the sound image position is fixed at the position of one of the speakers 22c, 22d. When the gain value is at 1, the sound image position is fixed at a position directly above the speakers 22c, 22d. When the gain value is set at a point between 0 and 1, the sound image is positioned between the speakers 22c, 22d. The gain value for panning is preferably applied at the filters 42.
The signals are converted in the filters 42 from mono to two channels and sent to a mixer/scaler 44, as described above for the front speaker signals. The mixer/scaler 44 sums signals (e.g., thirty-two signals) to form a stereo pair (one signal for rear left speaker 22c and one signal for rear right speaker 22d). The sum is preferably a weighted sum, with each weight dependent on the corresponding source position. As previously described, each channel has its own gain and the mixer/scaler 44 adjusts the rear gain based on the position of the sound source. If only one rear speaker 22e is used, as shown in
It is to be understood that the configuration of components within the system and arrangement of the components may be different than those shown and described herein without departing from the scope of the invention.
In order to calculate weights for the mixer scalers 38, 44, 52, 54, location information is provided to identify the position of each sound source in a spherical coordinate system defined for the listening environment. The coordinate system of a three dimensional listening space is defined with respect to the illustration of
Front and rear gains for sources located at the ear level horizontal plane (elevation angle of 0) depend on which sector the source is located. A sector is defined as the region between two speakers relative to the listener. When the virtual source is located in the sector defined by the front two speakers (region 1b), operation is the same as with a two-speaker system. Front gain is one and rear gain is zero. When the virtual source is located between the rear two speakers (3b), the front gain is zero (or close to zero) and the rear gain is one. When the virtual source is located between one of the side speaker pairs (2b, 4b), the front gains are proportional to the fraction of the arc between the front and rear speaker spanned by the virtual source. The front gain varies from one to zero (or close to zero) as the virtual source azimuth angle φ moves from the front speakers 22a, 22b to the rear speakers 22c, 22d. Rear gains vary similarly, except that they vary from zero to one over the same range of source azimuth angles φ.
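The horizontal-plane behavior described above can be sketched as a simple crossfade. The function name `horizontal_gains`, the default speaker angles (front speakers at ±π/4, rear speakers at ±3π/4), and the strictly linear interpolation are assumptions made for illustration; the detailed example later in this description uses a cosine-based front attenuation and an energy-based rear gain instead.

```python
import math


def horizontal_gains(azimuth, front_angle=math.pi / 4, rear_angle=3 * math.pi / 4):
    """Return (front_gain, rear_gain) for a source at elevation 0.

    Assumes a left/right symmetric layout, so only the angular distance
    from straight ahead matters.
    """
    phi = azimuth % (2 * math.pi)
    folded = min(phi, 2 * math.pi - phi)  # 0 (front) .. pi (rear)
    if folded <= front_angle:
        return 1.0, 0.0  # between the front speakers: front only
    if folded >= rear_angle:
        return 0.0, 1.0  # between the rear speakers: rear only
    # Side sectors: fraction of the front-to-rear arc already traversed.
    fraction = (folded - front_angle) / (rear_angle - front_angle)
    return 1.0 - fraction, fraction


front, rear = horizontal_gains(math.pi / 2)  # source directly to one side
```

A source directly to the side lies halfway along the arc between a front and a rear speaker, so both gains come out at approximately one half under this linear assumption.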
Sources located off the horizontal plane of the ears behave similarly, but with some adjustments that aid the perception of elevation. For elevation angles of plus or minus 90 degrees (i.e., directly above or below the listener), front and rear gains are adjusted to produce equal perceived energy contributions from all four speakers. As elevation angle varies from zero degrees to plus or minus 90 degrees, the front and rear gains vary smoothly from the horizontal plane case to the plus or minus 90 degrees case, maintaining a constant perceived power level (e.g., source trajectories maintain the same distance from the listener).
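One way to picture a smooth transition that keeps the perceived power constant is to interpolate squared gains rather than the gains themselves. The sketch below is only an assumed illustration: the name `blend_with_elevation`, the sin² weighting, and the modeling of perceived power as the sum of squared gains are not taken from this description.

```python
import math


def blend_with_elevation(gain_horizontal, gain_overhead, elevation):
    """Interpolate a gain in the power domain between elevation 0 and +/- pi/2."""
    w = math.sin(elevation) ** 2  # 0 in the horizontal plane, 1 at +/- 90 degrees
    power = (1.0 - w) * gain_horizontal ** 2 + w * gain_overhead ** 2
    return math.sqrt(power)


# Source straight ahead in the horizontal plane: front only.
front_0, rear_0 = 1.0, 0.0
# Directly overhead: front and rear contribute equal energy (assumed split).
front_90 = rear_90 = math.sqrt(0.5)

for theta in (0.0, math.pi / 6, math.pi / 3, math.pi / 2):
    front = blend_with_elevation(front_0, front_90, theta)
    rear = blend_with_elevation(rear_0, rear_90, theta)
    # front**2 + rear**2 stays at 1 for every elevation in this model.
    print(round(theta, 3), round(front ** 2 + rear ** 2, 6))
```

Because both endpoint cases have unit total power, any power-domain mix of them does too, which is the property the passage above asks for.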
The following provides an example of a method for calculating front gains and rear gains based on the position of the sound source relative to the listener. In the following calculations, the front speakers 22a, 22b are located at ±π/4 and the rear speakers are positioned at ±3π/4 (FIG. 6).
When the source is located within the region defined by ±π/4 (i.e., located between the front left and right speakers), sound is generated only from the front speakers. If the sound moves rearward from these points it contributes to the rear gain. The point at which sound is first applied at the rear speakers (e.g., π/4) is called the rear pan start angle. In the following equations, the rear pan start angle is defined as π/4 and the rear speaker angle is defined as 3π/4. It is to be understood that the rear pan start angle may be different than the location of one of the front speakers.
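Given the rear pan start angle and the rear speaker angle, the four horizontal sectors 1b to 4b can be told apart from the azimuth angle alone. The following sketch assumes azimuth is measured in radians from straight ahead and increases toward the listener's right, which matches the sector inequalities used in this description; the function name `horizontal_sector` and the string labels are hypothetical.

```python
import math


def horizontal_sector(azimuth, rear_pan_start=math.pi / 4, rear_speaker=3 * math.pi / 4):
    """Classify an azimuth angle into sector '1b', '2b', '3b' or '4b'."""
    phi = azimuth % (2 * math.pi)  # normalize to [0, 2*pi)
    # Sector 1b wraps around straight ahead (phi = 0).
    if phi < rear_pan_start or phi >= 2 * math.pi - rear_pan_start:
        return "1b"  # between the front left and right speakers
    if phi < rear_speaker:
        return "2b"  # between the front right and rear right speakers
    if phi < 2 * math.pi - rear_speaker:
        return "3b"  # between the rear left and right speakers
    return "4b"      # between the front left and rear left speakers


sectors = [horizontal_sector(a) for a in (0.0, math.pi / 2, math.pi, 3 * math.pi / 2)]
```

Each lower bound is inclusive and each upper bound exclusive, so a source exactly at the rear pan start angle falls in sector 2b.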
The following provides an example of calculations for the front gain (Front Gain) and rear gain (Rear Gain) (for front to rear panning) and the left and right rear speaker gains (Left Rear Gain, Right Rear Gain) (for left to right panning). The front gain is preferably applied at the mixer/scalers 38, 52 of
In calculating the front gain for the front speakers 22a, 22b, the speakers are attenuated equally depending on the source location. At elevation (θ)=0, gain is only a function of φ. At elevation (θ)=±π/2, gain is independent of azimuth angle (φ). At elevations between 0 and ±π/2, the gain varies smoothly between the elevation=±90 gain and the elevation=0 gain for the given azimuth value. The front gain, when elevation is equal to zero, is calculated based on the azimuth angle of the virtual source. The first sector 1a is defined as a region between the front two speakers 22a, 22b (i.e., rear pan start angle > φ ≧ 2π − rear pan start angle). The front attenuation of the front speakers (Front Atten) in sector 1a is equal to one.
The second sector 2a is defined as a region between the right front speaker 22b and π (i.e., π>φ≧ rear pan start angle). For sector 2a, front attenuation is defined as max(cos 1.2 * Ω1,0) where:
The third sector 3a includes the region between the left front speaker 22a and π (i.e., 2π − rear pan start angle > φ ≧ π). The front attenuation is defined as max(cos 1.2* Ω2,0) where:
The contribution from elevation is calculated as
The rear gain is calculated to produce equal perceived energy contributions from all the speakers while maintaining the same ratio of left to right rear volume. At θ=0, gains are purely a function of azimuth angle φ. At θ=±90, gains are independent of azimuth angle φ. For elevations between these extremes, the gains vary smoothly between the elevation=±90 gain and the elevation=0 gain for the given azimuth value. For any source position, the perceived energy coming from all four speakers preferably equals the perceived energy produced by the front speakers when the front gain is equal to one. Thus, when the front gain is less than one, the rear gain is scaled such that the perceived energy remains constant. The rear gain applied by the mixer/scalers 44, 54 is thus calculated so that the perceived energy coming from all four speakers is generally constant:
The following describes calculations used to determine the left and right rear gains applied at the filters 42, 55. The listening environment shown in
If the source is between the front left and right speakers 22a, 22b in sector 1b (i.e., rear pan start angle > φ ≧ 2π − rear pan start angle) and
If the source is between the front right and rear right speakers 22b, 22d in sector 2b (i.e., rear speaker angle >φ≧ rear pan start angle):
If the source is between the rear left and right speakers 22c, 22d in sector 3b (i.e., 2π − rear speaker angle > φ ≧ rear speaker angle) then:
If the source is between front left speaker 22a and rear left speaker 22c in sector 4b (i.e., 2π − rear pan start angle > φ ≧ 2π − rear speaker angle):
The Left and Right Rear gains are then calculated to transition between elevation angles θ=0 and ±90 degrees:
The Left Rear Gain and Right Rear Gain are applied at the filters 42, 55. The rear signals are then further modified by the Rear Gain at the mixer/scalers 44, 54 to produce equal perceived energy contributions from all the speakers while maintaining the same ratio of left to right rear volume.
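The constant-energy scaling of the rear signals can be illustrated under a simple assumed model in which perceived energy is the sum of the squared gains of all speakers. This model, the function name `rear_gain`, and the resulting closed-form expression are assumptions for illustration only and are not the equations of this description. With both front speakers at gain f, and rear pan gains l and r, the total is held at the level of two front speakers at unity gain: 2f² + R²(l² + r²) = 2.

```python
import math


def rear_gain(front_gain, left_rear_gain, right_rear_gain):
    """Common rear scale factor R under the assumed sum-of-squares energy model."""
    rear_energy = left_rear_gain ** 2 + right_rear_gain ** 2
    if rear_energy == 0.0:
        return 0.0  # nothing is panned to the rear, so there is nothing to scale
    # Energy the rear speakers must supply so the total stays at 2.
    target = 2.0 * (1.0 - front_gain ** 2)
    return math.sqrt(max(target, 0.0) / rear_energy)


front, left, right = 0.6, 0.8, 0.2  # illustrative values only
scale = rear_gain(front, left, right)
total = 2 * front ** 2 + (scale * left) ** 2 + (scale * right) ** 2
```

Because the same factor multiplies both rear channels, the ratio of left to right rear volume is unchanged, and `total` stays at approximately 2 for any front gain between zero and one.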
It is to be understood that the above equations and plot shown in
In view of the above, it will be seen that the several objects of the invention are achieved and other advantageous results attained.
As various changes could be made in the above constructions and methods without departing from the scope of the invention, it is intended that all matter contained in the above description and shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.
The present application claims the benefit of U.S. Provisional Application Ser. No. 60/152,152, filed August 31, 1999.
Number | Name | Date | Kind |
---|---|---|---|
3236949 | Atal et al. | Feb 1966 | A |
4975954 | Cooper et al. | Dec 1990 | A |
5034983 | Cooper et al. | Jul 1991 | A |
5136651 | Cooper et al. | Aug 1992 | A |
6577736 | Clemow | Jun 2003 | B1 |
Number | Date | Country | |
---|---|---|---|
60152152 | Aug 1999 | US |