1. Field
The present disclosure generally relates to audio signal processing and, more particularly, to systems and methods for filtering location-critical portions of the audible frequency range to simulate three-dimensional listening effects.
2. Description of the Related Art
Sound signals can be processed to provide enhanced listening effects. For example, various processing techniques can make a sound source be perceived as positioned, or moving, relative to a listener. Such techniques allow the listener to enjoy a simulated three-dimensional listening experience even when using speakers of limited configuration and performance.
However, many sound-perception enhancement techniques are complicated and often require substantial computing power and resources. Thus, these techniques are impractical or impossible to use on many electronic devices that have limited computing power and resources. Many portable devices, such as cell phones, PDAs, MP3 players, and the like, generally fall into this category.
At least some of the foregoing problems can be addressed by various embodiments of systems and methods for audio signal processing as disclosed herein. In one embodiment, a discrete number of simple digital filters can be generated for particular portions of an audio frequency range. Studies have shown that certain frequency ranges are particularly important for the location-discriminating capability of human hearing, while other ranges are generally ignored. Head-Related Transfer Functions (HRTFs) are examples of response functions that characterize how the ears perceive sound positioned at different locations. By selecting one or more “location-critical” portions of such response functions, one can construct simple filters that simulate hearing while substantially maintaining location-discriminating capability. Because the filters can be simple, they can be implemented in devices having limited computing power and resources to provide location-discrimination responses that form the basis for many desirable audio effects.
One embodiment of the present disclosure relates to a method for processing digital audio signals. The method includes receiving one or more digital signals, each of which has information about the spatial position of a sound source relative to a listener. The method further includes selecting one or more digital filters, each formed from a particular range of a hearing response function. The method further includes applying the one or more filters to the one or more digital signals so as to yield one or more corresponding filtered signals, each having a simulated effect of the hearing response function applied to the sound source.
In one embodiment, the hearing response function includes a head-related transfer function (HRTF). In one embodiment, the particular range includes a particular range of frequency within the HRTF. In one embodiment, the particular range of frequency is substantially within or overlaps with a range of frequency in which the location-discriminating sensitivity of average human hearing is greater than its average sensitivity across the audible frequency range. In one embodiment, the particular range of frequency includes or substantially overlaps with a peak structure in the HRTF. In one embodiment, the peak structure is substantially within or overlaps with a range of frequency between about 2.5 kHz and about 7.5 kHz. In one embodiment, the peak structure is substantially within or overlaps with a range of frequency between about 8.5 kHz and about 18 kHz.
In one embodiment, the one or more digital signals include left and right digital signals to be output to left and right speakers. In one embodiment, the left and right digital signals are adjusted for interaural time difference (ITD) based on the spatial position of the sound source relative to the listener. In one embodiment, the ITD adjustment includes receiving a mono input signal having information about the spatial position of the sound source. The ITD adjustment further includes determining a time difference value based on the spatial information. The ITD adjustment further includes generating left and right signals by introducing the time difference value to the mono input signal.
In one embodiment, the time difference value includes a quantity that is proportional to the absolute value of sin θ cos φ, where θ represents an azimuthal angle of the sound source relative to the front of the listener, and φ represents an elevation angle of the sound source relative to a horizontal plane defined by the listener's ears and the front direction. In one embodiment, the quantity is expressed as |(Maximum_ITD_Samples_per_Sampling_Rate − 1) sin θ cos φ|.
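As a minimal sketch, assuming integer-sample delays and that a positive azimuth θ places the source to the listener's right, the ITD expression above and the generation of left and right signals from a mono input might look as follows. The function names `itd_samples` and `apply_itd`, and the parameter `max_itd_samples` (standing in for Maximum_ITD_Samples_per_Sampling_Rate), are hypothetical; rounding to the nearest sample is also an assumption, and fractional-delay interpolation would be an alternative.

```python
import math


def itd_samples(azimuth_deg: float, elevation_deg: float, max_itd_samples: int) -> int:
    """Delay in whole samples: |(max_itd_samples - 1) * sin(theta) * cos(phi)|.

    Rounding to the nearest sample is an assumption of this sketch.
    """
    theta = math.radians(azimuth_deg)   # azimuth from the front of the listener
    phi = math.radians(elevation_deg)   # elevation above the ear-level plane
    return round(abs((max_itd_samples - 1) * math.sin(theta) * math.cos(phi)))


def apply_itd(mono, azimuth_deg, elevation_deg, max_itd_samples):
    """Return (left, right), delaying the channel for the ear farther from the source.

    Assumes a positive azimuth means the source is on the listener's right,
    so the left channel is the delayed one in that case.
    """
    d = min(itd_samples(azimuth_deg, elevation_deg, max_itd_samples), len(mono))
    near = list(mono)                              # ear closer to the source
    far = [0.0] * d + list(mono[: len(mono) - d])  # same signal, d samples later
    if math.sin(math.radians(azimuth_deg)) >= 0:
        return far, near
    return near, far


left, right = apply_itd([1.0, 0.0, 0.0, 0.0], 90.0, 0.0, 3)
print(left, right)  # [0.0, 0.0, 1.0, 0.0] [1.0, 0.0, 0.0, 0.0]
```

Because the factor sin θ cos φ is the component of the source direction along the interaural axis, the delay is largest for a source directly to one side at ear level and vanishes for sources in the median plane.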
In one embodiment, the time difference value is determined when the spatial position of the sound source changes. In one embodiment, the method further includes performing a crossfade transition of the time difference value from the previous value to the current value. In one embodiment, the crossfade transition includes changing the time difference value used in generating the left and right signals from the previous value to the current value over a plurality of processing cycles.
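One way to spread such a change over several processing cycles is a linear ramp, sketched below. The linear shape, the function name `crossfade_values`, and the one-value-per-cycle granularity are assumptions; the text only requires that the value move from the previous value to the current value over a plurality of cycles. The same helper could equally ramp the IID compensation values.

```python
def crossfade_values(previous: float, current: float, num_cycles: int) -> list[float]:
    """Values to use on each of the next num_cycles processing cycles.

    The value moves linearly from `previous` toward `current` and reaches
    `current` on the last cycle.
    """
    if num_cycles < 1:
        raise ValueError("num_cycles must be at least 1")
    step = (current - previous) / num_cycles
    return [previous + step * (k + 1) for k in range(num_cycles)]


# Hypothetical example: the ITD changes from 10 to 18 samples over 4 cycles;
# rounding keeps the delay an integer number of samples.
print([round(v) for v in crossfade_values(10, 18, 4)])  # [12, 14, 16, 18]
```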
In one embodiment, the one or more filtered signals include left and right filtered signals to be output to left and right speakers. In one embodiment, the method further includes adjusting each of the left and right filtered signals for interaural intensity difference (IID) to account for any intensity differences that may exist and are not accounted for by the application of the one or more filters. In one embodiment, the adjustment of the left and right filtered signals for IID includes determining whether the sound source is positioned to the left or right of the listener. The adjustment further includes assigning as a weaker signal the left or right filtered signal that is on the side opposite the sound source. The adjustment further includes assigning as a stronger signal the other of the left or right filtered signals. The adjustment further includes adjusting the weaker signal by a first compensation. The adjustment further includes adjusting the stronger signal by a second compensation.
In one embodiment, the first compensation includes a compensation value that is proportional to cos θ, where θ represents an azimuthal angle of the sound source relative to the front of the listener. In one embodiment, the compensation value is normalized such that if the sound source is substantially directly in front, the compensation value is approximately the original filter level difference, and if the sound source is substantially directly on the stronger side, the compensation value is approximately 1 so that no gain adjustment is made to the weaker signal.
In one embodiment, the second compensation includes a compensation value that is proportional to sin θ, where θ represents an azimuthal angle of the sound source relative to the front of the listener. In one embodiment, the compensation value is normalized such that if the sound source is substantially directly in front, the compensation value is approximately 1 so that no gain adjustment is made to the stronger signal, and if the sound source is substantially directly on the stronger side, the compensation value is approximately 2, thereby providing an approximately 6 dB gain compensation (20 log10 2 ≈ 6.02 dB) to approximately match the overall loudness at different values of the azimuthal angle.
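The endpoint behavior of the two compensations can be illustrated with the hypothetical normalization below. It interpolates the weaker-signal gain between the filter level difference D (source in front) and 1 (source directly to the side) using |cos θ|, and the stronger-signal gain between 1 (front) and 2 (side) using |sin θ|. These affine forms, the function name `iid_gains`, and the convention that a positive azimuth is to the listener's right are assumptions chosen only to match the endpoints described above; they are not presented as the disclosed formulas.

```python
import math


def iid_gains(azimuth_deg: float, level_difference: float) -> tuple[float, float]:
    """Return (left_gain, right_gain) as linear amplitude factors.

    weaker gain:   level_difference in front, 1.0 directly to the side
    stronger gain: 1.0 in front, 2.0 (about +6 dB) directly to the side
    """
    theta = math.radians(azimuth_deg)
    weaker = 1.0 + (level_difference - 1.0) * abs(math.cos(theta))
    stronger = 1.0 + abs(math.sin(theta))
    if math.sin(theta) >= 0:   # source on the right: left is the weaker side
        return weaker, stronger
    return stronger, weaker    # source on the left: right is the weaker side


print(iid_gains(90.0, 2.0))
print(20 * math.log10(2.0))  # a gain of 2 is about 6.02 dB
```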
In one embodiment, the adjustment of the left and right filtered signals for IID is performed when one or more new digital filters are applied to the left and right filtered signals due to selected movements of the sound source. In one embodiment, the method further includes performing a crossfade transition of the first and second compensation values from the previous values to the current values. In one embodiment, the crossfade transition includes changing the first and second compensation values over a plurality of processing cycles.
In one embodiment, the one or more digital filters include a plurality of digital filters. In one embodiment, each of the one or more digital signals is split into as many signals as there are digital filters, such that the plurality of digital filters is applied in parallel to the split signals. In one embodiment, each of the one or more filtered signals is obtained by combining the split signals filtered by the plurality of digital filters. In one embodiment, the combining includes summing the filtered split signals.
In one embodiment, the plurality of digital filters include first and second digital filters. In one embodiment, each of the first and second digital filters includes a filter whose response is substantially maximally flat in a passband portion and rolls off toward substantially zero in a stopband portion of the hearing response function. In one embodiment, each of the first and second digital filters includes a Butterworth filter. In one embodiment, the passband portion of one of the first and second digital filters is defined by a frequency range between about 2.5 kHz and about 7.5 kHz. In one embodiment, the passband portion of one of the first and second digital filters is defined by a frequency range between about 8.5 kHz and about 18 kHz.
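The split-filter-sum structure can be sketched as follows, assuming each digital filter is a second-order IIR section given as coefficient tuples `(b, a)` with `a[0]` normalized to 1. The names `biquad` and `parallel_filter` and the example coefficients are hypothetical, and a direct form I structure is used only for readability.

```python
def biquad(b, a, x):
    """Filter the sequence x with one second-order IIR section (direct form I)."""
    b0, b1, b2 = b
    _, a1, a2 = a  # a[0] is assumed to be 1
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def parallel_filter(x, sections):
    """Feed a copy of x to every section and sum the filtered copies sample by sample."""
    filtered = [biquad(b, a, x) for b, a in sections]
    return [sum(samples) for samples in zip(*filtered)]


# Two hypothetical sections: a pure gain of 0.5 and a one-pole low-pass.
sections = [((0.5, 0.0, 0.0), (1.0, 0.0, 0.0)),
            ((1.0, 0.0, 0.0), (1.0, -0.5, 0.0))]
print(parallel_filter([1.0, 0.0, 0.0, 0.0], sections))  # [1.5, 0.5, 0.25, 0.125]
```

Running the sections in parallel and summing, rather than cascading them, lets each section shape its own peak region while the other sections contribute little outside their passbands.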
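As a sketch of how such a passband filter could be designed without MATLAB, the code below builds a Butterworth bandpass from a first-order analog prototype, H(s) = B s / (s² + B s + Ω0²), where B and Ω0² are the difference and product of the prewarped band edges, and maps it to the digital domain with the bilinear transform s = (1 − z⁻¹)/(1 + z⁻¹), giving one second-order section per passband. The 44.1 kHz sampling rate, the first-order prototype, and the function names are assumptions; higher prototype orders would need additional sections. MATLAB's `butter` is documented as using a bilinear transform with frequency prewarping, so for a prototype order of 1 the coefficients are expected to agree, but this sketch has not been checked against Tables 1 and 2.

```python
import cmath
import math


def butter_bandpass_order1(f_low: float, f_high: float, fs: float):
    """(b, a) for a Butterworth bandpass built from a first-order prototype.

    The band edges f_low and f_high (Hz) become the -3 dB points.
    """
    if not 0.0 < f_low < f_high < fs / 2.0:
        raise ValueError("need 0 < f_low < f_high < fs/2")
    w1 = math.tan(math.pi * f_low / fs)   # prewarped lower edge
    w2 = math.tan(math.pi * f_high / fs)  # prewarped upper edge
    bw, w0_sq = w2 - w1, w1 * w2          # analog bandwidth and squared center
    a0 = 1.0 + bw + w0_sq
    b = (bw / a0, 0.0, -bw / a0)
    a = (1.0, 2.0 * (w0_sq - 1.0) / a0, (1.0 - bw + w0_sq) / a0)
    return b, a


def magnitude(b, a, f: float, fs: float) -> float:
    """|H| of a second-order section at frequency f (Hz)."""
    zi = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
    num = b[0] + b[1] * zi + b[2] * zi * zi
    den = a[0] + a[1] * zi + a[2] * zi * zi
    return abs(num / den)


fs = 44100.0
b, a = butter_bandpass_order1(2500.0, 7500.0, fs)
for f in (0.0, 2500.0, 5000.0, 7500.0, 15000.0):
    print(f, round(magnitude(b, a, f, fs), 3))
```

The response is zero at DC and at the Nyquist frequency, about 0.707 (−3 dB) at each band edge, and exactly 1 at the warped geometric center of the band, which is the "maximally flat passband, roll-off toward zero" shape described above.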
In one embodiment, the selection of the one or more digital filters is based on a finite number of geometric positions about the listener. In one embodiment, the geometric positions include a plurality of hemi-planes, each hemi-plane defined by an edge along a direction between the ears of the listener and by an elevation angle φ relative to a horizontal plane defined by the ears and the front direction of the listener. In one embodiment, the plurality of hemi-planes are grouped into one or more front hemi-planes and one or more rear hemi-planes. In one embodiment, the front hemi-planes include hemi-planes in front of the listener at elevation angles of approximately 0 and ±45 degrees, and the rear hemi-planes include hemi-planes behind the listener at elevation angles of approximately 0 and ±45 degrees.
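A minimal sketch of choosing among the six example hemi-planes follows. It converts the azimuth θ and elevation φ of a source into a direction vector, measures the rotation angle of that direction about the interaural axis, picks the nearest hemi-plane, and reports whether the source is on the left so that right-side filters can be reused with the output channels swapped, relying on the left-right symmetry of the head. The hemi-plane labels, the nearest-angle rule, and the right-positive azimuth convention are assumptions.

```python
import math

# Hypothetical labels for the six hemi-planes, keyed to their rotation angle
# (degrees) about the interaural axis: 0 = straight ahead, 180 = straight behind.
HEMI_PLANES = {
    "front_0": 0.0, "front_+45": 45.0, "front_-45": -45.0,
    "rear_0": 180.0, "rear_+45": 135.0, "rear_-45": -135.0,
}


def select_hemi_plane(azimuth_deg: float, elevation_deg: float) -> tuple[str, bool]:
    """Return (hemi_plane_label, mirror) for a source direction.

    mirror is True for sources on the left, whose response would be taken from
    the right-side filters with the left and right outputs swapped.
    """
    theta = math.radians(azimuth_deg)
    phi = math.radians(elevation_deg)
    x = math.sin(theta) * math.cos(phi)  # toward the right ear
    y = math.cos(theta) * math.cos(phi)  # toward the front
    z = math.sin(phi)                    # upward
    # Rotation about the interaural axis; for off-center sources this is not
    # the same as the source's own elevation.
    alpha = math.degrees(math.atan2(z, y))

    def angular_distance(plane_angle: float) -> float:
        d = abs(alpha - plane_angle) % 360.0
        return min(d, 360.0 - d)

    label = min(HEMI_PLANES, key=lambda k: angular_distance(HEMI_PLANES[k]))
    return label, x < 0


print(select_hemi_plane(45.0, 0.0))     # ('front_0', False)
print(select_hemi_plane(-150.0, 30.0))  # ('rear_+45', True)
```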
In one embodiment, the method further includes performing at least one of the following processing steps either before the receiving of the one or more digital signals or after the applying of the one or more filters: sample rate conversion, Doppler adjustment for sound source velocity, distance adjustment to account for distance of the sound source to the listener, orientation adjustment to account for orientation of the listener's head relative to the sound source, or reverberation adjustment.
In one embodiment, the application of the one or more digital filters to the one or more digital signals simulates an effect of motion of the sound source about the listener.
In one embodiment, the application of the one or more digital filters to the one or more digital signals simulates an effect of placing the sound source at a selected location about the listener. In one embodiment, the method further includes simulating effects of one or more additional sound sources to simulate an effect of a plurality of sound sources at selected locations about the listener. In one embodiment, the one or more digital signals include left and right digital signals to be output to left and right speakers, and the plurality of sound sources includes more than two sound sources, such that effects of more than two sound sources are simulated with the left and right speakers. In one embodiment, the plurality of sound sources includes five sound sources arranged in a manner similar to a surround-sound arrangement, and the left and right speakers are positioned in a headphone, such that surround-sound effects are simulated by the left and right filtered signals provided to the headphone.
Another embodiment of the present disclosure relates to a positional audio engine for processing digital signals representative of sound from a sound source. The audio engine includes a filter selection component configured to select one or more digital filters based on the spatial position of the sound source relative to a listener, with each of the one or more digital filters being formed from a particular range of a hearing response function. The audio engine further includes a filter application component configured to apply the one or more digital filters to one or more digital signals so as to yield one or more corresponding filtered signals, each having a simulated effect of the hearing response function applied to the sound from the sound source.
In one embodiment, the hearing response function includes a head-related transfer function (HRTF). In one embodiment, the particular range includes a particular range of frequency within the HRTF. In one embodiment, the particular range of frequency is substantially within or overlaps with a range of frequency in which the location-discriminating sensitivity of average human hearing is greater than its average sensitivity across the audible frequency range. In one embodiment, the particular range of frequency includes or substantially overlaps with a peak structure in the HRTF. In one embodiment, the peak structure is substantially within or overlaps with a range of frequency between about 2.5 kHz and about 7.5 kHz. In one embodiment, the peak structure is substantially within or overlaps with a range of frequency between about 8.5 kHz and about 18 kHz.
In one embodiment, the one or more digital signals include left and right digital signals such that the one or more filtered signals include left and right filtered signals to be output to left and right speakers.
In one embodiment, the one or more digital filters include a plurality of digital filters. In one embodiment, each of the one or more digital signals is split into as many signals as there are digital filters, such that the plurality of digital filters is applied in parallel to the split signals. In one embodiment, each of the one or more filtered signals is obtained by combining the split signals filtered by the plurality of digital filters. In one embodiment, the combining includes summing the filtered split signals.
In one embodiment, the plurality of digital filters include first and second digital filters. In one embodiment, each of the first and second digital filters includes a filter whose response is substantially maximally flat in a passband portion and rolls off toward substantially zero in a stopband portion of the hearing response function. In one embodiment, each of the first and second digital filters includes a Butterworth filter. In one embodiment, the passband portion of one of the first and second digital filters is defined by a frequency range between about 2.5 kHz and about 7.5 kHz. In one embodiment, the passband portion of one of the first and second digital filters is defined by a frequency range between about 8.5 kHz and about 18 kHz.
In one embodiment, the selection of the one or more digital filters is based on a finite number of geometric positions about the listener. In one embodiment, the geometric positions include a plurality of hemi-planes, each hemi-plane defined by an edge along a direction between the ears of the listener and by an elevation angle φ relative to a horizontal plane defined by the ears and the front direction of the listener. In one embodiment, the plurality of hemi-planes are grouped into one or more front hemi-planes and one or more rear hemi-planes. In one embodiment, the front hemi-planes include hemi-planes in front of the listener at elevation angles of approximately 0 and ±45 degrees, and the rear hemi-planes include hemi-planes behind the listener at elevation angles of approximately 0 and ±45 degrees.
In one embodiment, the application of the one or more digital filters to the one or more digital signals simulates an effect of motion of the sound source about the listener.
In one embodiment, the application of the one or more digital filters to the one or more digital signals simulates an effect of placing the sound source at a selected location about the listener.
Yet another embodiment of the present disclosure relates to a system for processing digital audio signals. The system includes an interaural time difference (ITD) component configured to receive a mono input signal and generate left and right ITD-adjusted signals to simulate an arrival time difference of sound arriving at left and right ears of a listener from a sound source. The mono input signal includes information about the spatial position of the sound source relative to the listener. The system further includes a positional filter component configured to receive the left and right ITD-adjusted signals and apply one or more digital filters to each of the left and right ITD-adjusted signals to generate left and right filtered digital signals, with each of the one or more digital filters being based on a particular range of a hearing response function, such that the left and right filtered digital signals simulate the hearing response function. The system further includes an interaural intensity difference (IID) component configured to receive the left and right filtered digital signals and generate left and right IID-adjusted signals to simulate an intensity difference of the sound arriving at the left and right ears.
In one embodiment, the hearing response function includes a head-related transfer function (HRTF). In one embodiment, the particular range includes a particular range of frequency within the HRTF. In one embodiment, the particular range of frequency is substantially within or overlaps with a range of frequency in which the location-discriminating sensitivity of average human hearing is greater than its average sensitivity across the audible frequency range. In one embodiment, the particular range of frequency includes or substantially overlaps with a peak structure in the HRTF. In one embodiment, the peak structure is substantially within or overlaps with a range of frequency between about 2.5 kHz and about 7.5 kHz. In one embodiment, the peak structure is substantially within or overlaps with a range of frequency between about 8.5 kHz and about 18 kHz.
In one embodiment, the ITD includes a quantity that is proportional to the absolute value of sin θ cos φ, where θ represents an azimuthal angle of the sound source relative to the front of the listener, and φ represents an elevation angle of the sound source relative to a horizontal plane defined by the listener's ears and the front direction.
In one embodiment, the ITD determination is performed when the spatial position of the sound source changes. In one embodiment, the ITD component is further configured to perform a crossfade transition of the ITD between the previous value and the current value. In one embodiment, the crossfade transition includes changing the ITD from the previous value to the current value during a plurality of processing cycles.
In one embodiment, the IID component is configured to determine whether the sound source is positioned to the left or right of the listener. The IID component is further configured to assign as a weaker signal the left or right filtered signal that is on the side opposite the sound source. The IID component is further configured to assign as a stronger signal the other of the left or right filtered signals. The IID component is further configured to adjust the weaker signal by a first compensation. The IID component is further configured to adjust the stronger signal by a second compensation.
In one embodiment, the first compensation includes a compensation value that is proportional to cos θ, where θ represents an azimuthal angle of the sound source relative to the front of the listener. In one embodiment, the second compensation includes a compensation value that is proportional to sin θ, where θ represents an azimuthal angle of the sound source relative to the front of the listener.
In one embodiment, the adjustment of the left and right filtered signals for IID is performed when one or more new digital filters are applied to the left and right filtered signals due to selected movements of the sound source. In one embodiment, the IID component is further configured to perform a crossfade transition of the first and second compensation values from the previous values to the current values. In one embodiment, the crossfade transition includes changing the first and second compensation values over a plurality of processing cycles.
In one embodiment, the one or more digital filters include a plurality of digital filters. In one embodiment, each of the one or more digital signals is split into as many signals as there are digital filters, such that the plurality of digital filters is applied in parallel to the split signals. In one embodiment, each of the left and right filtered digital signals is obtained by combining the split signals filtered by the plurality of digital filters. In one embodiment, the combining includes summing the filtered split signals.
In one embodiment, the plurality of digital filters include first and second digital filters. In one embodiment, each of the first and second digital filters includes a filter whose response is substantially maximally flat in a passband portion and rolls off toward substantially zero in a stopband portion of the hearing response function. In one embodiment, each of the first and second digital filters includes a Butterworth filter. In one embodiment, the passband portion of one of the first and second digital filters is defined by a frequency range between about 2.5 kHz and about 7.5 kHz. In one embodiment, the passband portion of one of the first and second digital filters is defined by a frequency range between about 8.5 kHz and about 18 kHz.
In one embodiment, the positional filter component is further configured to select the one or more digital filters based on a finite number of geometric positions about the listener. In one embodiment, the geometric positions include a plurality of hemi-planes, each hemi-plane defined by an edge along a direction between the ears of the listener and by an elevation angle φ relative to a horizontal plane defined by the ears and the front direction of the listener. In one embodiment, the plurality of hemi-planes are grouped into one or more front hemi-planes and one or more rear hemi-planes. In one embodiment, the front hemi-planes include hemi-planes in front of the listener at elevation angles of approximately 0 and ±45 degrees, and the rear hemi-planes include hemi-planes behind the listener at elevation angles of approximately 0 and ±45 degrees.
In one embodiment, the system further includes at least one of the following: a sample rate conversion component, a Doppler adjustment component configured to simulate sound source velocity, a distance adjustment component configured to account for distance of the sound source to the listener, an orientation adjustment component configured to account for orientation of the listener's head relative to the sound source, or a reverberation adjustment component to simulate reverberation effect.
Yet another embodiment of the present disclosure relates to a system for processing digital audio signals. The system includes a plurality of signal processing chains, with each chain including an interaural time difference (ITD) component configured to receive a mono input signal and generate left and right ITD-adjusted signals to simulate an arrival time difference of sound arriving at left and right ears of a listener from a sound source. The mono input signal includes information about the spatial position of the sound source relative to the listener. Each chain further includes a positional filter component configured to receive the left and right ITD-adjusted signals and apply one or more digital filters to each of the left and right ITD-adjusted signals to generate left and right filtered digital signals, with each of the one or more digital filters being based on a particular range of a hearing response function, such that the left and right filtered digital signals simulate the hearing response function. Each chain further includes an interaural intensity difference (IID) component configured to receive the left and right filtered digital signals and generate left and right IID-adjusted signals to simulate an intensity difference of the sound arriving at the left and right ears.
Yet another embodiment of the present disclosure relates to an apparatus having a means for receiving one or more digital signals. The apparatus further includes a means for selecting one or more digital filters based on information about the spatial position of a sound source. The apparatus further includes a means for applying the one or more filters to the one or more digital signals so as to yield one or more corresponding filtered signals that simulate an effect of a hearing response function.
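The ordering of the three components within one chain, and the combination of several chains into a single left/right pair, can be sketched structurally as below. Each stage is passed in as a function so that the ITD, positional-filter, and IID implementations can be swapped; the `toy_*` stages are hypothetical placeholders that only show the data flow, and combining chains by plain sample-wise summation is an assumption (a practical mix might also scale the sum to avoid clipping).

```python
from typing import Callable, List, Tuple

Signal = List[float]
Stereo = Tuple[Signal, Signal]


def run_chain(mono: Signal,
              itd_stage: Callable[[Signal], Stereo],
              filter_stage: Callable[[Stereo], Stereo],
              iid_stage: Callable[[Stereo], Stereo]) -> Stereo:
    """One chain: ITD adjustment, then positional filters, then IID adjustment."""
    return iid_stage(filter_stage(itd_stage(mono)))


def mix_chains(outputs: List[Stereo]) -> Stereo:
    """Sum the left channels and the right channels of several chains, sample by sample."""
    left = [sum(s) for s in zip(*(out[0] for out in outputs))]
    right = [sum(s) for s in zip(*(out[1] for out in outputs))]
    return left, right


# Hypothetical stand-in stages that only illustrate the data flow.
def toy_itd(mono: Signal) -> Stereo:
    return [0.0] + mono[:-1], list(mono)            # left delayed by one sample


def toy_filter(stereo: Stereo) -> Stereo:
    return stereo                                   # placeholder positional filter


def toy_iid(stereo: Stereo) -> Stereo:
    return [0.5 * v for v in stereo[0]], stereo[1]  # attenuate the left channel


source_a = run_chain([1.0, 2.0, 3.0], toy_itd, toy_filter, toy_iid)
source_b = run_chain([0.0, 1.0, 0.0], toy_itd, toy_filter, toy_iid)
print(mix_chains([source_a, source_b]))  # ([0.0, 0.5, 1.5], [1.0, 3.0, 3.0])
```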
Yet another embodiment of the present disclosure relates to an apparatus having a means for forming one or more electronic filters, and a means for applying the one or more electronic filters to a sound signal so as to simulate a three-dimensional sound effect.
These and other aspects, advantages, and novel features of the present teachings will become apparent upon reading the following detailed description and upon reference to the accompanying drawings. In the drawings, similar elements have similar reference numerals.
The present disclosure generally relates to audio signal processing technology. In some embodiments, various features and techniques of the present disclosure can be implemented on audio or audio/visual devices. As described herein, various features of the present disclosure allow efficient processing of sound signals, so that in some applications, realistic positional sound imaging can be achieved even with limited signal processing resources. As such, in some embodiments, sound having realistic impact on the listener can be output by portable devices such as handheld devices where computing power may be limited. It will be understood that various features and concepts disclosed herein are not limited to implementations in portable devices, but can be implemented in any electronic device that processes sound signals.
In one embodiment, a positional audio engine 104 can generate and provide signal 106 to the speakers 108 to achieve such a listening effect. Various embodiments and features of the positional audio engine 104 are described below in greater detail.
In some embodiments, such audio perception combined with corresponding visual perception (from a screen, for example) can provide an effective and powerful sensory effect to the listener. Thus, for example, a surround-sound effect can be created for a listener listening to a handheld device through a headphone.
Other configurations are possible. For example, various concepts and features of the present disclosure can be implemented for processing of signals in analog systems. In such systems, analog equivalents of positional filters can be configured based on location-critical information in a manner similar to the various techniques described herein. Thus, it will be understood that various concepts and features of the present disclosure are not limited to digital systems.
For the purpose of description, “location-critical” means a portion of the human hearing response spectrum (for example, a frequency response spectrum) in which discrimination of sound source location is found to be particularly acute. An HRTF is an example of a human hearing response spectrum. Studies (for example, “A comparison of spectral correlation and local feature-matching models of pinna cue processing” by E. A. Macpherson, Journal of the Acoustical Society of America, 101, 3105, 1997) have shown that human listeners generally do not process the entire HRTF to determine where sound is coming from. Instead, they appear to focus on certain features in HRTFs. For example, local feature matches and gradient correlations at frequencies above 4 kHz appear to be particularly important for sound direction discrimination, while other portions of HRTFs are generally ignored.
Simulated filter responses 180 corresponding to the HRTFs 170 can result from the filter coefficients determined in the process block 194. As shown, peaks 186, 188, 182, and 184 (and the corresponding valleys) are replicated so as to provide location-critical responses for discriminating the location of the sound source. Other portions of the HRTFs 170 are generally ignored and are therefore represented as substantially flat responses at lower frequencies.
Because only certain portion(s) and/or structure(s) are selected (in this example, the two peaks and related valley), formation of filter responses (for example, determination of the filter coefficients that yields the example simulated responses 180) can be simplified greatly. Moreover, such filter coefficients can be stored and used subsequently in a greatly simplified manner, thereby substantially reducing the computing power required to effectuate realistic location-discriminating sound output to a listener. Specific examples of filter coefficient determination and subsequent use are described below in greater detail.
In the description herein, filter coefficient determination and subsequent use are described in the context of the example two-peak selection. It will be understood, however, that in some embodiments, other portion(s) and/or feature(s) of HRTFs can be identified and simulated. For example, if a given HRTF has three peaks that can be location-critical, those three peaks can be identified and simulated, and three filters, rather than two, can represent them.
In one embodiment, the selected features and/or ranges of the HRTFs (or other frequency response curves) can be simulated by obtaining filter coefficients that generate an approximated response of the desired features and/or ranges. Such filter coefficients can be obtained using any number of known techniques.
In one embodiment, the simplification provided by the selected features (for example, peaks) allows use of simplified filtering techniques. In one embodiment, fast and simple filtering, such as infinite impulse response (IIR) filtering, can be used to simulate the response of a limited number of selected location-critical features.
By way of example, the two example peaks (172 and 174 for the left ear, and 176 and 178 for the right ear) of the example HRTFs 170 can be simulated using a known Butterworth filtering technique. Coefficients for such filters can be obtained using any known technique, including, for example, signal processing applications such as MATLAB. Table 1 shows examples of MATLAB function calls that can return filter coefficients yielding simulated responses of the example HRTFs 170.
In one embodiment, the foregoing example IIR filter responses to the selected peaks of the example HRTFs 170 can yield the simulated responses 180. The corresponding filter coefficients can be stored for subsequent use, as indicated in the process block 196 of the process 190.
As previously stated, the example HRTFs 170 and simulated responses 180 correspond to a sound source located in front of the listener, at about 45 degrees to the right (at about ear level). Responses to other source locations can be obtained in a similar manner to provide two- or three-dimensional response coverage about the listener. Specific filtering examples for other sound source locations are described below in greater detail.
In one embodiment, as described below in greater detail, various hemi-planes can be above and/or below the horizontal to account for sound sources above and/or below the ear level. For a given hemi-plane, a response obtained for one side (e.g., right side) can be used to estimate the response at the mirror image location (about the Y-Z plane) on the other side (e.g., left side) by way of symmetry of the listener's head. In one embodiment, because such symmetry does not exist for front and rear, separate responses can be obtained for the front and rear (and thus the front and rear hemi-planes).
In one embodiment, sound sources about the listener can be approximated as being on one of the foregoing hemi-planes. Each hemi-plane can have a set of filter coefficients that simulate response of sound sources on that hemi-plane. Thus, the example simulated response described above in reference to
Note that in the example simulated response 384, a bandstop Butterworth filter can be used to obtain a desired approximation of the identified features. Thus, it should be understood that various types of filtering techniques can be used to obtain desired results, and that filters other than Butterworth filters can be used to achieve similar results. Moreover, although IIR filters are used to provide fast and simple filtering, at least some of the techniques of the present disclosure can also be implemented using other filters, such as finite impulse response (FIR) filters.
For the foregoing example hemi-plane configuration (φ=+45°, 0°, −45°), Table 2 lists filtering parameters that can be input to obtain filter coefficients for the six hemi-planes (366, 362, 370, 372, 364, and 368). For the example parameters in Table 2 (as in Table 1), the example Butterworth filter function call can be made in MATLAB as:
“butter(Order, [fLow/(SamplingRate/2), fHigh/(SamplingRate/2)], Type)”
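where Order represents the order of the Butterworth prototype (with a two-element frequency vector, as here, MATLAB returns a band-pass or band-stop filter of order 2×Order); fLow and fHigh represent the boundary values of the selected frequency range; SamplingRate represents the sampling rate; and Type represents the filter type (for example, 'bandpass' or 'stop') for each given filter. Called with two outputs, the function returns the numerator and denominator coefficient vectors of the filter. Other values and/or types for filter parameters are also possible.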
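To make the parameters of the butter call concrete, the following Python sketch builds the band-pass section that corresponds to Order=1: a first-order Butterworth low-pass prototype is mapped to a second-order band-pass and discretized with the bilinear transform, with the band edges prewarped. It is a standard-library illustration, not the tooling used in the disclosure. The function names and the 2500 Hz to 3500 Hz band are hypothetical. The construction follows the standard Butterworth design procedure, but its coefficients have not been compared against MATLAB output.

```python
import cmath
import math


def butter_bandpass_order1(f_low, f_high, sampling_rate):
    """Band-pass biquad from a first-order Butterworth low-pass prototype.

    Returns (b, a) with a[0] == 1: numerator and denominator coefficients
    in powers of z^-1.
    """
    if not 0.0 < f_low < f_high < sampling_rate / 2.0:
        raise ValueError("require 0 < f_low < f_high < sampling_rate / 2")
    # Prewarped analog band edges; the bilinear-transform constant is folded to 1.
    w1 = math.tan(math.pi * f_low / sampling_rate)
    w2 = math.tan(math.pi * f_high / sampling_rate)
    bw = w2 - w1        # analog bandwidth
    w0_sq = w1 * w2     # squared analog center frequency
    # Analog band-pass H(s) = bw*s / (s^2 + bw*s + w0^2), then s = (z - 1)/(z + 1).
    norm = 1.0 + bw + w0_sq
    b = [bw / norm, 0.0, -bw / norm]
    a = [1.0, 2.0 * (w0_sq - 1.0) / norm, (1.0 - bw + w0_sq) / norm]
    return b, a


def magnitude(b, a, freq, sampling_rate):
    """Magnitude response |H(e^{jw})| of a biquad at freq (Hz)."""
    z_inv = cmath.exp(-2j * math.pi * freq / sampling_rate)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return abs(num / den)


# Hypothetical location-critical band of 2500 Hz to 3500 Hz at 44.1 kHz.
b, a = butter_bandpass_order1(2500.0, 3500.0, 44100.0)
print(b, a)
print(magnitude(b, a, 2500.0, 44100.0))  # band edge: about 0.707 (-3 dB)
```

The response is zero at DC and at Nyquist. It is exactly 1 at the geometric mean of the prewarped edges and 1/√2 (−3 dB) at fLow and fHigh, which is the defining property of Butterworth band edges.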
In one embodiment, as seen in Table 2, each hemi-plane can have four sets of filter coefficients: two filters for the two example location-critical peaks, for each of left and right. Thus, with six hemi-planes, there can be 24 filters.
In one embodiment, the same filter coefficients can be used to simulate responses to sound from sources anywhere on a given hemi-plane. As described below in greater detail, effects due to left-right displacement, distance, and/or velocity of the source can be accounted for and adjusted. If a source moves from one hemi-plane to another, the filter coefficients can be transitioned, in a manner described below, so as to provide a smooth transition in the perceived sound.
In one embodiment, if a given sound source is located somewhere between two hemi-planes (for example, the source is at front, φ=+30°), then the source can be considered to be on the “nearest” hemi-plane (in this example, the front hemi-plane at φ=+45°). It may be desirable in certain situations to provide more or fewer hemi-planes in the space about the listener, so as to provide finer or coarser “granularity” in the distribution of hemi-planes.
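The nearest-hemi-plane rule and the filter bookkeeping above can be sketched in a few lines of Python. The function names are hypothetical. The azimuth convention (0° front, 90° right, 180° rear, 270° left) follows the ITD discussion in this section. Treating sources exactly at 90° or 270° as front, and resolving elevation ties toward the first listed hemi-plane, are assumptions the text does not state.

```python
# Elevations of the example six-hemi-plane configuration (front and rear at each).
EXAMPLE_ELEVATIONS = (45.0, 0.0, -45.0)


def nearest_hemi_plane(azimuth_deg, elevation_deg, elevations=EXAMPLE_ELEVATIONS):
    """Return ("front" | "rear", elevation) of the hemi-plane used for a source.

    Azimuth: 0 = front, 90 = right, 180 = rear, 270 = left. Sources exactly at
    90 or 270 degrees are treated as front (an assumption).
    Elevation ties resolve to the first entry in `elevations`.
    """
    az = azimuth_deg % 360.0
    side = "front" if az <= 90.0 or az >= 270.0 else "rear"
    plane = min(elevations, key=lambda e: abs(e - elevation_deg))
    return side, plane


def filter_count(num_elevations, peaks_per_ear=2):
    """Front and rear hemi-plane per elevation, left and right ear, one filter per peak."""
    return num_elevations * 2 * 2 * peaks_per_ear


print(nearest_hemi_plane(10.0, 30.0))   # ('front', 45.0)
print(filter_count(3), filter_count(5))  # 24 40
```

With the three example elevations this reproduces the 24 filters of the six-hemi-plane case. Five elevations (ten hemi-planes) give 40.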
Moreover, the three-dimensional space does not necessarily need to be divided into hemi-planes about the X-axis. The space could be divided into any one-, two-, or three-dimensional geometries relative to a listener. In one embodiment, as with the hemi-planes about the X-axis, symmetries such as that between left and right hearing can be utilized to reduce the number of sets of filter coefficients.
It will be understood that the six-hemi-plane configuration (φ=+45°, 0°, −45°) described above is an example of how selected location-critical response information can be provided for a limited number of orientations relative to a listener. By doing so, substantially realistic three-dimensional sound effects can be reproduced using relatively little computing power and/or resources. Even if the number of hemi-planes is increased for finer granularity, say to ten (front and rear at φ=+60°, +30°, 0°, −30°, −60°), the number of sets of filter coefficients can be maintained at a manageable level (with four filters per hemi-plane, 40 filters instead of 24).
In one embodiment, the ITD component 224 can output left and right signals that take into account the arrival difference, and such output signals can be provided to the positional-filters component 226. An example operation of the positional-filters component 226 is described below in greater detail.
In one embodiment, the positional-filters component 226 can output left and right signals that have been adjusted for the location-critical responses. Such output signals can be provided to a component 228 that determines an interaural intensity difference (“IID”). The IID adjustment can compensate the positional-filter outputs for the position dependence of the left and right signal intensities. An example of IID compensation is described below in greater detail. Left and right signals 230 can be output by the IID component 228 to speakers to provide the positional effect of the sound source.
The input signal 242 is shown to be provided to an ITD calculation component 244 that calculates interaural time delay needed to simulate different arrival times (if the source is located to one side) at the left and right ears. In one embodiment, the ITD can be calculated as
$$\mathrm{ITD} = \left|\left(\mathrm{Maximum\_ITD\_Samples\_per\_Sampling\_Rate} - 1\right)\sin\theta\,\cos\phi\right| \qquad (1)$$
Thus, as expected, ITD=0 when a source is either directly in front (θ=0°) or directly at the rear (θ=180°); and ITD has a maximum value (for a given value of φ) when the source is either directly to the left (θ=270°) or directly to the right (θ=90°). Similarly, ITD has a maximum value (for a given value of θ) when the source is on the horizontal plane (φ=0°), and is zero when the source is either at the top (φ=90°) or at the bottom (φ=−90°).
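Equation (1) can be checked numerically with a short Python sketch. Here `max_itd_samples` is a hypothetical stand-in for Maximum_ITD_Samples_per_Sampling_Rate, and the value 32 is chosen only for illustration.

```python
import math


def itd_samples(azimuth_deg, elevation_deg, max_itd_samples):
    """Equation (1): ITD = |(max_itd_samples - 1) * sin(theta) * cos(phi)|.

    max_itd_samples stands in for Maximum_ITD_Samples_per_Sampling_Rate,
    the largest interaural delay expressed in samples at the current rate.
    """
    theta = math.radians(azimuth_deg)   # azimuth: 0 front, 90 right
    phi = math.radians(elevation_deg)   # elevation: 0 horizontal, 90 top
    return abs((max_itd_samples - 1) * math.sin(theta) * math.cos(phi))


# Hypothetical maximum of 32 samples.
for az, el in [(0, 0), (90, 0), (270, 0), (90, 45), (90, 90)]:
    print(az, el, round(itd_samples(az, el, 32), 3))
```

The printed values show the limiting cases listed above: zero in front, the maximum to either side on the horizontal plane, and zero again directly overhead.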
The ITD determined in the foregoing manner can be introduced to the input signal 242 so as to yield left and right signals that are ITD adjusted. For example, if the source location is on the right side, the right signal can have the ITD subtracted from the timing of the sound in the input signal. Similarly, the left signal can have the ITD added to the timing of the sound in the input signal. Such timing adjustments to yield left and right signals can be achieved in a known manner, and are depicted as left and right delay lines 246a and 246b.
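One way to realize the left and right delay lines is sketched below in Python. It assumes a causal arrangement in which the near-ear channel is left undelayed and the far-ear channel is delayed by the whole ITD, rounded to an integer number of samples. This is one reading of “subtracted” and “added”; splitting the delay differently, or using fractional-delay interpolation, is equally possible. The class and function names are hypothetical.

```python
from collections import deque


class DelayLine:
    """Integer-sample delay line: output[n] = input[n - delay]."""

    def __init__(self, delay_samples):
        self._buffer = deque([0.0] * delay_samples)

    def process(self, sample):
        if not self._buffer:          # zero delay: pass straight through
            return sample
        self._buffer.append(sample)
        return self._buffer.popleft()


def make_itd_delay_lines(itd, source_on_right):
    """Return (left_line, right_line); only the far ear is delayed."""
    far = DelayLine(int(round(itd)))
    near = DelayLine(0)
    return (far, near) if source_on_right else (near, far)


left_line, right_line = make_itd_delay_lines(2.0, source_on_right=True)
mono = [1.0, 0.0, 0.0, 0.0]
left = [left_line.process(x) for x in mono]
right = [right_line.process(x) for x in mono]
print(left, right)  # [0.0, 0.0, 1.0, 0.0] [1.0, 0.0, 0.0, 0.0]
```

Because the buffers persist between calls, the same objects can process successive audio blocks while the source stays put.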
If a sound source is substantially stationary relative to the listener, the same ITD can provide the arrival-time based three-dimensional sound effect. If a sound source moves, however, the ITD may also change. If a new value of ITD is incorporated into the delay lines, there may be a sudden change from the previous ITD based delays, possibly resulting in a detectable shift in the perception of ITDs.
In one embodiment, as shown in
As shown in
As shown in
For example, suppose that a sound source is located at θ=10° and φ=+10°. In such a situation, the front horizontal hemi-plane (362 in
As shown in
As described herein, the two left filters and two right filters are in the context of the two example location-critical peaks. It will be understood that other numbers of filters are possible. For example, if there are three location-critical features and/or ranges in the frequency responses, there may be three filters for each of the left and right sides.
As shown in
In one embodiment, the example gain values listed in Table 3 can be assigned to substantially maintain a correct level difference between left and right signals at the three example elevations. Thus, these example gains can be used to provide correct levels in left and right processes, each of which, in this example, includes a 3-way summation of filter outputs (from first and second filters 266 and 268) and a scaled input (from gain component 270).
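The per-ear structure just described, a 3-way sum of two filter outputs and a scaled input, is sketched below in Python. The biquad uses the transposed direct form II. The Table 1 to 3 values are not reproduced here, so the identity filters and the input gain of 0.5 in the usage lines are placeholders. In practice, `b` and `a` would be the stored Butterworth coefficients for the current hemi-plane and `input_gain` the corresponding Table 3 value. The class names are hypothetical.

```python
class Biquad:
    """Second-order IIR section in transposed direct form II (a[0] assumed 1)."""

    def __init__(self, b, a):
        self.b0, self.b1, self.b2 = b
        _, self.a1, self.a2 = a
        self.z1 = 0.0
        self.z2 = 0.0

    def process(self, x):
        y = self.b0 * x + self.z1
        self.z1 = self.b1 * x - self.a1 * y + self.z2
        self.z2 = self.b2 * x - self.a2 * y
        return y


class PositionalFilter:
    """One ear: first filter + second filter + scaled input (3-way sum)."""

    def __init__(self, first, second, input_gain):
        self.first = first
        self.second = second
        self.input_gain = input_gain

    def process(self, x):
        return self.first.process(x) + self.second.process(x) + self.input_gain * x


identity = ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0])   # placeholder coefficients
ear = PositionalFilter(Biquad(*identity), Biquad(*identity), input_gain=0.5)
print([ear.process(x) for x in [1.0, -2.0]])    # [2.5, -5.0]
```

One `PositionalFilter` is needed for each ear, so the left and right paths together hold four biquads, matching the four filters per hemi-plane.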
In one embodiment, as shown in
In one embodiment, the IID component 280 can adjust the intensity of the weaker channel signal in a first compensation component 284, and also adjust the intensity of the stronger channel signal in a second compensation component 286. For example, suppose that a sound source is located at θ=10° (that is, to the right side by 10 degrees). In such a situation, the right channel can be considered to be the stronger channel, and the left channel the weaker channel. Thus, the first compensation 284 can be applied to the left signal, and the second compensation 286 to the right signal.
In one embodiment, the level of the weaker channel signal can be adjusted by an amount given as
$$\mathrm{Gain} = \left|\cos\theta\left(\mathrm{Fixed\_Filter\_Level\_Difference\_per\_Elevation} - 1.0\right)\right| + 1.0 \qquad (2)$$
Thus, if θ=0° (directly in front), the gain of the weaker channel is adjusted by the original filter level difference. If θ=90° (directly to the right), Gain=1, and no gain adjustment is made to the weaker channel.
In one embodiment, the level of the stronger channel signal can be adjusted by an amount given as
$$\mathrm{Gain} = \sin\theta + 1.0 \qquad (3)$$
Thus, if θ=0° (directly in front), Gain=1, and no gain adjustment is made to the stronger channel. If θ=90° (directly to the right), Gain=2, thereby providing a 6 dB gain compensation to roughly match the overall loudness at different values of θ.
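Equations (2) and (3) are evaluated below in a short Python sketch. The per-elevation level difference D=1.5 is a hypothetical linear value. Applying abs() to sin θ in Equation (3) is an added assumption, so that a left-side source (where sin θ is negative) receives the same boost as its mirror image on the right. For 0° to 180° it leaves the equation unchanged.

```python
import math


def weaker_channel_gain(azimuth_deg, fixed_filter_level_difference):
    """Equation (2): |cos(theta) * (D - 1.0)| + 1.0, D = per-elevation level difference."""
    theta = math.radians(azimuth_deg)
    return abs(math.cos(theta) * (fixed_filter_level_difference - 1.0)) + 1.0


def stronger_channel_gain(azimuth_deg):
    """Equation (3): sin(theta) + 1.0.

    abs() is an added assumption so that left-side sources (sin < 0) receive
    the same boost as their mirror image; for 0..180 degrees it changes nothing.
    """
    return abs(math.sin(math.radians(azimuth_deg))) + 1.0


D = 1.5  # hypothetical per-elevation filter level difference (linear)
for az in (0.0, 45.0, 90.0):
    w = weaker_channel_gain(az, D)
    s = stronger_channel_gain(az)
    print(az, round(w, 3), round(s, 3), round(20 * math.log10(s), 2))
```

At θ=90° the stronger-channel gain is 2, and 20·log10(2) ≈ 6.02 dB, which is the 6 dB compensation mentioned above.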
If a sound source is substantially stationary or moves substantially within a given hemi-plane, the same filters can be used to generate filter responses, and intensity compensations for the weaker and stronger hearing sides can be provided by the IID compensations described above. If a sound source moves from one hemi-plane to another, however, the filters also change. IID compensations based on the filter levels then may not produce a smooth hemi-plane transition, and the change can result in a detectable sudden shift in intensity as the sound source moves between hemi-planes.
Thus, in one embodiment as shown in
As shown in
In one embodiment, the process 300 can further include a process block where crossfading is performed on the left and right ITD adjusted signals to account for motion of the sound source.
In a decision block 314, the process 310 determines whether the sound source is at the front and to the right (“F.R.”). If the answer is “Yes,” front filters (at appropriate elevation) are applied to the left and right data in a process block 316. The filter-applied data and the gain adjusted data are summed to generate position-filters output signals. Because the source is at the right side, the right data is the stronger channel, and the left data is the weaker channel. Thus, in a process block 318, first compensation gain (Equation 2) is applied to the left data. In a process block 320, second compensation gain (Equation 3) is applied to the right data. The position filtered and gain adjusted left and right signals are output in a process block 322.
If the answer to the decision block 314 is “No,” the sound source is not at the front and to the right. Thus, the process 310 proceeds to other remaining quadrants.
In a decision block 324, the process 310 determines whether the sound source is at the rear and to the right (“R.R.”). If the answer is “Yes,” rear filters (at appropriate elevation) are applied to the left and right data in a process block 326. The filter-applied data and the gain adjusted data are summed to generate position-filters output signals. Because the source is at the right side, the right data is the stronger channel, and the left data is the weaker channel. Thus, in a process block 328, first compensation gain (Equation 2) is applied to the left data. In a process block 330, second compensation gain (Equation 3) is applied to the right data. The position filtered and gain adjusted left and right signals are output in a process block 332.
If the answer to the decision block 324 is “No,” the sound source is not at F.R. or R.R. Thus, the process 310 proceeds to other remaining quadrants.
In a decision block 334, the process 310 determines whether the sound source is at the rear and to the left (“R.L.”). If the answer is “Yes,” rear filters (at appropriate elevation) are applied to the left and right data in a process block 336. The filter-applied data and the gain adjusted data are summed to generate position-filters output signals. Because the source is at the left side, the left data is the stronger channel, and the right data is the weaker channel. Thus, in a process block 338, second compensation gain (Equation 3) is applied to the left data. In a process block 340, first compensation gain (Equation 2) is applied to the right data. The position filtered and gain adjusted left and right signals are output in a process block 342.
If the answer to the decision block 334 is “No,” the sound source is not at F.R., R.R., or R.L. Thus, the process 310 proceeds with the sound source considered as being at the front and to the left (“F.L.”).
In a process block 346, front filters (at appropriate elevation) are applied to the left and right data. The filter-applied data and the gain adjusted data are summed to generate position-filters output signals. Because the source is at the left side, the left data is the stronger channel, and the right data is the weaker channel. Thus, in a process block 348, second compensation gain (Equation 3) is applied to the left data. In a process block 350, first compensation gain (Equation 2) is applied to the right data. The position filtered and gain adjusted left and right signals are output in a process block 352.
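The quadrant decisions of the process 310 reduce to a small lookup, sketched here in Python. The function name is hypothetical, and the handling of the exact boundaries (90°, 180°, 270°) is an assumption. In the returned labels, “weaker” means the first compensation gain (Equation 2) and “stronger” the second compensation gain (Equation 3).

```python
def route_source(azimuth_deg):
    """Mirror decision blocks 314, 324, and 334 of process 310.

    Returns (quadrant, filter_set, left_compensation, right_compensation),
    where "weaker" means Equation (2) and "stronger" means Equation (3).
    Boundary handling (exactly 90, 180, 270 degrees) is an assumption.
    """
    az = azimuth_deg % 360.0
    if az <= 90.0:                      # decision block 314: front right
        return ("F.R.", "front", "weaker", "stronger")
    if az <= 180.0:                     # decision block 324: rear right
        return ("R.R.", "rear", "weaker", "stronger")
    if az < 270.0:                      # decision block 334: rear left
        return ("R.L.", "rear", "stronger", "weaker")
    return ("F.L.", "front", "stronger", "weaker")   # remaining case


for az in (10.0, 135.0, 200.0, -30.0):
    print(az, route_source(az))
```

Checking the quadrants in the same order as the flow chart means the front-left case needs no test of its own, just as the process proceeds to the process block 346 by elimination.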
In a process block 392, a mono input signal is obtained. In a process block 394, a position-based ITD is determined and applied to the input signal. In a decision block 396, the process 390 determines whether the sound source has changed position. If the answer is “No,” data can be read from the left and right delay lines, have the existing ITD delay applied (process block 398), and be written back to the delay lines. If the answer is “Yes,” the process 390 in a process block 400 determines a new ITD delay based on the new position. In a process block 402, crossfading can be performed to provide a smooth transition between the previous and new ITD delays.
In one embodiment, crossfading can be performed by reading data from the previous and current delay lines. Thus, for example, each time the process 390 is called, the θ and φ values are compared with those in the history to determine whether the source location has changed. If there is no change, a new ITD delay is not calculated, and the existing ITD delay is used (process block 398). If there is a change, a new ITD delay is calculated (process block 400), and crossfading is performed (process block 402). In one embodiment, ITD crossfading can be achieved by gradually increasing or decreasing the ITD delay value from the previous value to the new value.
In one embodiment, the crossfading of the ITD delay values can be triggered when a change in the source's position is detected, and the gradual change can occur over a plurality of processing cycles. For example, if the ITD delay has an old value ITDold and a new value ITDnew, the crossfading transition can occur over N processing cycles: ITD(0)=ITDold, ITD(1)=ITDold+ΔITD/N, . . . , ITD(N−1)=ITDold+ΔITD(N−1)/N, ITD(N)=ITDnew; where ΔITD=ITDnew−ITDold. The same expressions cover a decreasing delay (ITDnew<ITDold), in which case ΔITD is negative.
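The change detection and the N-cycle ramp described above can be sketched together in Python. `ItdSmoother` and `crossfade_steps` are hypothetical names. The `compute_itd` callable stands in for Equation (1) and is only called when (θ, φ) changes, mirroring the process blocks 398 to 402. The toy lambda in the usage lines is chosen only to make the ramp easy to read.

```python
def crossfade_steps(old, new, n_cycles):
    """ITD(k) = old + k * (new - old) / N for k = 0..N; ITD(N) is exactly new."""
    if n_cycles < 1:
        raise ValueError("n_cycles must be at least 1")
    delta = new - old
    steps = [old + delta * k / n_cycles for k in range(n_cycles + 1)]
    steps[-1] = new   # avoid floating-point drift on the final value
    return steps


class ItdSmoother:
    """Hypothetical helper: recompute the ITD only when (theta, phi) changes."""

    def __init__(self, n_cycles, compute_itd):
        self.n_cycles = n_cycles
        self.compute_itd = compute_itd   # e.g. Equation (1); supplied by the caller
        self.position = None
        self.value = None
        self._pending = []

    def step(self, position):
        if self.value is None:                   # first call: nothing to fade from
            self.position = position
            self.value = self.compute_itd(position)
        elif position != self.position:          # moved: new ITD and crossfade
            self.position = position
            target = self.compute_itd(position)
            self._pending = crossfade_steps(self.value, target, self.n_cycles)[1:]
        if self._pending:                        # advance one processing cycle
            self.value = self._pending.pop(0)
        return self.value                        # otherwise keep the existing ITD


smoother = ItdSmoother(4, compute_itd=lambda pos: pos[0] / 10.0)
path = [(100.0, 0.0)] + [(140.0, 0.0)] * 5
print([smoother.step(p) for p in path])  # [10.0, 11.0, 12.0, 13.0, 14.0, 14.0]
```

If the source moves again mid-ramp, the new ramp starts from the current, partly faded value, so the delay never jumps.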
As shown in
In a decision block 406, the process 390 determines whether there has been a change in the hemi-plane. If the answer is “No,” no crossfading of IID compensations is performed. If the answer is “Yes,” the process 390 in a process block 408 performs another positional filtering based on the previous values of θ and φ. For the purpose of description of
In one embodiment, IID crossfading can be achieved by gradually increasing or decreasing the IID compensation gain values from the previous values to the new values, and/or the filter coefficients from the previous set to the new set. In one embodiment, the crossfading of the IID gain values can be triggered when a change in hemi-plane is detected, and the gradual changes of the IID gain values can occur over a plurality of processing cycles. For example, if a given IID gain has an old value IIDold and a new value IIDnew, the crossfading transition can occur over N processing cycles: IID(0)=IIDold, IID(1)=IIDold+ΔIID/N, . . . , IID(N−1)=IIDold+ΔIID(N−1)/N, IID(N)=IIDnew; where ΔIID=IIDnew−IIDold, which is negative for a decreasing gain. Similar gradual changes can be introduced for the positional filter coefficients for crossfading positional filters.
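Crossfading positional filter coefficients in the same linear manner is sketched below in Python, together with an IID gain ramp. All coefficient and gain values are hypothetical placeholders for one filter on the old and new hemi-planes. The stability check uses the standard second-order condition |a2| < 1 and |a1| < 1 + a2. That region is a triangle, and therefore convex, so every linear blend of two stable biquad denominators is itself stable. The argument relies on second-order sections and does not automatically carry over to higher-order direct-form filters.

```python
def lerp(start, end, t):
    """Linear interpolation; t = 0 gives start, t = 1 gives end."""
    return start + (end - start) * t


def crossfade_biquad(old, new, n_cycles):
    """Yield N + 1 (b, a) coefficient sets moving linearly from old to new."""
    (b_old, a_old), (b_new, a_new) = old, new
    for k in range(n_cycles + 1):
        t = k / n_cycles
        yield ([lerp(x, y, t) for x, y in zip(b_old, b_new)],
               [lerp(x, y, t) for x, y in zip(a_old, a_new)])


def is_stable_biquad(a):
    """Stability triangle for 1 + a1 z^-1 + a2 z^-2: |a2| < 1 and |a1| < 1 + a2."""
    _, a1, a2 = a
    return abs(a2) < 1.0 and abs(a1) < 1.0 + a2


# Hypothetical coefficient sets for one filter on the old and new hemi-planes.
old = ([0.1, 0.0, -0.1], [1.0, -1.6, 0.8])
new = ([0.2, 0.0, -0.2], [1.0, 0.5, 0.3])
sets = list(crossfade_biquad(old, new, 8))
print(all(is_stable_biquad(a) for _, a in sets))  # True

# Hypothetical IID compensation gain ramp over the same 8 cycles.
iid_ramp = [lerp(1.2, 1.8, k / 8) for k in range(9)]
```

Each processing cycle would load the next coefficient set into the biquad and apply the next gain. The filter state can be kept across the swap.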
As further shown in
In some embodiments, various features of the ITD, ITD crossfading, positional filtering, IID, IID crossfading, or combinations thereof, can be combined with other sound effect enhancing features.
As further shown in
In one embodiment, functionalities of the SRC 424, Doppler 426, Distance 428, Orientation 430, and Reverberation 440 components can be based on known techniques; and thus need not be described further.
In one embodiment, functionalities of the SRC 454, Doppler 456, Distance 458, Orientation 460, Downmix (470 and 474), and Reverberation (472 and 476) components can be based on known techniques; and thus need not be described further.
As shown in
As shown in
As seen by way of examples, various configurations are possible for incorporating the features of the ITD, positional filters, and/or IID with various other sound effect enhancing techniques. Thus, it will be understood that configurations other than those shown are possible.
In one embodiment, at least some portion of the 3D sound API 520 can reside in the program memory 516 of the system 510, and be under the control of a processor 514. In one embodiment, the system 510 can also include a display component 512 that can provide visual input to the listener. Visual cues provided by the display 512 and the sound processing provided by the API 520 can enhance the audio-visual effect to the listener/viewer.
As described herein, various features of positional filtering and associated processing techniques allow realistic three-dimensional sound effects to be generated without heavy computation requirements. As such, various features of the present disclosure can be particularly useful for implementations in portable devices where computation power and resources may be limited.
For the example surround-sound configuration 560, positional-filtering can be configured to process five sound sources (for example, five processing chains in
Other implementations on portable as well as non-portable devices are possible.
In the description herein, various functionalities are described and depicted in terms of components or modules. Such depictions are for the purpose of description, and do not necessarily mean physical boundaries or packaging configurations. For example,
In general, it will be appreciated that the processors can include, by way of example, computers, program logic, or other substrate configurations representing data and instructions, which operate as described herein. In other embodiments, the processors can include controller circuitry, processor circuitry, processors, general purpose single-chip or multi-chip microprocessors, digital signal processors, embedded microprocessors, microcontrollers and the like.
Furthermore, it will be appreciated that in one embodiment, the program logic may advantageously be implemented as one or more components. The components may advantageously be configured to execute on one or more processors. The components include, but are not limited to, software or hardware components, modules such as software modules, object-oriented software components, class components and task components, processes, methods, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.
Although the above-disclosed embodiments have shown, described, and pointed out the fundamental novel features of the invention as applied to the above-disclosed embodiments, it should be understood that various omissions, substitutions, and changes in the form of the detail of the devices, systems, and/or methods shown may be made by those skilled in the art without departing from the scope of the invention. Consequently, the scope of the invention should not be limited to the foregoing description, but should be defined by the appended claims.
This application claims the benefit of priority under 35 U.S.C. §120 as a continuation of U.S. application Ser. No. 11/531,624, filed Sep. 13, 2006, now U.S. Pat. No. 8,027,477, which claims the benefit of priority under 35 U.S.C. §119(e) of U.S. Provisional Application No. 60/716,588 filed on Sep. 13, 2005 and titled SYSTEMS AND METHODS FOR AUDIO PROCESSING, the entirety of both of which is incorporated herein by reference.
Number | Date | Country |
---|---|---|
20120014528 A1 | Jan 2012 | US |
Number | Date | Country |
---|---|---|
60716588 | Sep 2005 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 11531624 | Sep 2006 | US |
Child | 13244043 | | US |