The present disclosure relates to systems for enhancing audio signals, and more particularly to systems for enhancing sound reproduction over headphones.
Advancements in the recording industry include reproducing sound from multiple-channel sound systems, such as surround sound systems. These advancements have enabled listeners to enjoy enhanced listening experiences, especially through 5.1 and 7.1 surround sound systems. Even two-channel stereo systems have provided enhanced listening experiences through the years.
Typically, surround sound or two-channel stereo recordings are recorded and then processed to be reproduced over loudspeakers, which limits the quality of such recordings when reproduced over headphones. For example, stereo recordings are usually meant to be reproduced over loudspeakers rather than played back over headphones. This results in the stereo panorama appearing on a line between the ears or inside a listener's head, which can be an unnatural and fatiguing listening experience.
To resolve the issues of reproducing sound over headphones, designers have derived stereo and surround sound enhancement systems for headphones; however, for the most part these enhancement systems have introduced unwanted artifacts such as unwanted coloration, resonance, reverberation, and/or distortion of timbre or sound source angle and/or position.
One or more embodiments of the present disclosure are directed to a method for enhancing reproduction of sound. The method may include receiving an audio input signal at a first audio signal interface and receiving an input indicative of a head rotational angle from a digital gyroscope mounted to a headphone assembly. The method may further include updating at least one binaural rendering filter in each of a pair of parametric head-related transfer function (HRTF) models based on the head rotational angle and transforming the audio input signal to an audio output signal using the at least one binaural rendering filter. The audio output signal may include a left headphone output signal and a right headphone output signal.
According to one or more embodiments, receiving input indicative of a head rotational angle may comprise receiving an angular velocity signal from the digital gyroscope mounted to the headphone assembly and calculating the head rotational angle from the angular velocity signal when the angular velocity signal exceeds a predetermined threshold or is less than the predetermined threshold for less than a predetermined sample count. Alternately, receiving input indicative of a head rotational angle may comprise receiving an angular velocity signal from the digital gyroscope mounted to the headphone assembly and calculating the head rotational angle as a fraction of a previous head rotational angle measurement when the angular velocity signal is less than a predetermined threshold for more than a predetermined sample count.
According to one or more embodiments, the audio input signal is a multi-channel audio input signal. Alternatively, the audio input signal may be a mono-channel audio input signal.
According to one or more embodiments, updating the at least one binaural rendering filter based on the head rotational angle may comprise retrieving parameters for the at least one binaural rendering filter from at least one look-up table based on the head rotational angle. Further, retrieving parameters for the at least one binaural rendering filter from the at least one look-up table based on the head rotational angle may comprise generating a left table pointer index value and a right table pointer index value based on the head rotational angle and retrieving the parameters for the at least one binaural rendering filter from the at least one look-up table based on the left table pointer index value and the right table pointer index value.
According to one or more embodiments, the at least one binaural rendering filter may comprise a shelving filter and a notch filter. Further, updating at least one binaural rendering filter based on the head rotational angle may include updating a gain parameter for each of the shelving filter and the notch filter based on the head rotational angle. The at least one binaural rendering filter may further comprise an inter-aural time delay filter. Moreover, updating at least one binaural rendering filter based on the head rotational angle may comprise updating a delay value for the inter-aural time delay filter based on the head rotational angle.
One or more additional embodiments of the present disclosure relate to a system for enhancing reproduction of sound. The system may comprise a headphone assembly including a headband, a pair of headphones, and a digital gyroscope. The system may further comprise a sound enhancement system (SES) for receiving an audio input signal from an audio source. The SES may be in communication with the digital gyroscope and the pair of headphones. The SES may include a microcontroller unit (MCU) configured to receive an angular velocity signal from the digital gyroscope and to calculate a head rotational angle from the angular velocity signal. The SES may further include a digital signal processor (DSP) in communication with the MCU. The DSP may include a pair of dynamic parametric head-related transfer function (HRTF) models configured to transform the audio input signal to an audio output signal. The pair of dynamic parametric HRTF models may have at least a cross filter, wherein at least one parameter of the cross filter is updated based on the head rotational angle.
According to one or more embodiments, the cross filter may comprise a shelving filter and a notch filter. The at least one parameter of the cross filter may include a shelving filter gain and a notch filter gain. The pair of dynamic parametric HRTF models may further include an inter-aural time delay filter having a delay parameter, wherein the delay parameter is updated based on the head rotational angle.
The MCU may also be configured to calculate a table pointer index value based on the head rotational angle. Moreover, the at least one parameter of the cross filter may be updated using a look-up table according to the table pointer index value. The MCU may be further configured to calculate the head rotational angle from the angular velocity signal when the angular velocity signal exceeds a predetermined threshold or is less than the predetermined threshold for less than a predetermined sample count. The MCU may also be further configured to gradually decrease the head rotational angle when the angular velocity signal is less than a predetermined threshold for more than a predetermined sample count.
One or more additional embodiments of the present disclosure relate to a sound enhancement system (SES) comprising a processor, a distance renderer module, a binaural rendering module, and an equalization module. The distance renderer module may be executable by the processor to receive at least a left-channel audio input signal and a right-channel audio input signal from an audio source. The distance renderer module may be further executable by the processor to generate at least a delayed image of the left-channel audio input signal and the right-channel audio input signal.
The binaural rendering module, executable by the processor, may be in communication with the distance renderer module. The binaural rendering module may include at least one pair of dynamic parametric head-related transfer function (HRTF) models configured to transform the delayed image of the left-channel audio input signal and the right-channel audio input signal to a left headphone output signal and a right headphone output signal. The pair of dynamic parametric HRTF models may have a shelving filter, a notch filter and an inter-aural time delay filter. At least one parameter from each of the shelving filter, the notch filter and the time delay filter may be updated based on a head rotational angle.
The equalization module, executable by the processor, may be in communication with the binaural rendering module. The equalization module may include a fixed pair of equalization filters configured to equalize the left headphone output signal and the right headphone output signal to provide a left equalized headphone output signal and a right equalized headphone output signal.
According to one or more embodiments, a gain parameter for each of the shelving filter and the notch filter may be updated based on the head rotational angle. Further, a delay value for the time delay filter may be updated based on the head rotational angle.
As required, detailed embodiments of the present invention are disclosed herein; however, it is to be understood that the disclosed embodiments are merely exemplary of the invention that may be embodied in various and alternative forms. The figures are not necessarily to scale; some features may be exaggerated or minimized to show details of particular components. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ the present invention.
With reference to
The SES 110 can enhance reproduction of sound emitted by the headphones 118. The SES 110 improves sound reproduction by simulating a desired sound system without including unwanted artifacts typically associated with simulations of sound systems. The SES 110 facilitates such improvements by transforming sound system outputs through a set of one or more sum and/or cross filters, where such filters have been derived from a database of known direct and indirect head-related transfer functions (HRTFs), also known as ipsilateral and contralateral HRTFs, respectively. A head-related transfer function is a response that characterizes how an ear receives a sound from a point in space. A pair of HRTFs for two ears can be used to synthesize a binaural sound that seems to come from a particular point in space. For instance, the HRTFs may be designed to render sound sources in front of a listener at ±45 degrees.
In headphone implementations, the audio output signal 115 of the SES 110 ultimately reflects the direct and indirect HRTFs, and the SES 110 can transform any mono- or multi-channel audio input signal into a two-channel signal, such as a signal for the direct and indirect HRTFs. Also, this output can maintain stereo or surround sound enhancements while limiting unwanted artifacts. For example, the SES 110 can transform an audio input signal, such as a signal for a 5.1 or 7.1 surround sound system, to a signal for headphones or another type of two-channel system. Further, the SES 110 can perform such a transformation while maintaining the enhancements of 5.1 or 7.1 surround sound and limiting unwanted artifacts.
The sound waves 124, if measured at the user 126, are representative of a respective direct HRTF and indirect HRTF produced by the SES 110. For the most part, the user 126 receives the sound waves 124 at each respective ear 122 by way of the headphones 118. The respective direct and indirect HRTFs that are produced from the SES 110 are specifically a result of one or more sum and/or cross filters of the SES 110, where the one or more sum and/or cross filters are derived from known direct and indirect HRTFs. These sum and/or cross filters, along with inter-aural delay filters, may be collectively referred to as binaural rendering filters.
The headphone assembly 112 may also include a sensor 130, such as a digital gyroscope. The sensor 130 may be mounted on top of the headband 116, as shown in
The SES 110 may include a plurality of modules. The term “module” may be defined to include a plurality of executable modules. As described herein, the modules are defined to include software, hardware or some combination of hardware and software that is executable by a processor, such as a digital signal processor (DSP). Software modules may include instructions stored in memory that are executable by the processor or another processor. Hardware modules may include various devices, components, circuits, gates, circuit boards, and the like that are executable, directed, and/or controlled for performance by the processor.
According to one or more embodiments, the pair of HRTFs 234 may also be dynamically updated in response to the head rotational angle u(i), where i is the sampled time index. In order to dynamically update the pair of HRTFs, the SES 110 may also include the sensor 130, which may be a digital gyroscope 230 as shown in
The SES 110 may further include a microcontroller unit (MCU) 236 to process the angular velocity signal v(i) from the digital gyroscope 230. The MCU 236 may contain software to post process the raw velocity data received from the digital gyroscope 230. The MCU 236 may further provide a sample of the head rotational angle u(i) at each time instant i based on the post-processed velocity data extracted from the angular velocity signal v(i).
Referring to
The binaural rendering module 300 may include a left-channel head-related filter (HRTF) 320 and a right-channel head-related filter (HRTF) 322, according to one or more embodiments. Each HRTF filter 320, 322 may include an inter-aural cross function (Hcfront) 324, 326 and an inter-aural time delay (Tfront) 328, 330, respectively, corresponding to frontal sound sources, thereby emulating a pair of loudspeakers in front of the listener (e.g., at ±30° or ±45° relative to the listener). In other embodiments, the binaural rendering module 300 also includes HRTFs that correspond to side and rear sound sources. The design of the binaural rendering module 300 is described in detail in U.S. application Ser. No. 13/419,806 to Horbach, filed Mar. 14, 2012, and published as U.S. Patent Appl. Pub. No. 2013/0243200 A1, which is incorporated by reference in its entirety herein.
The signal flow in
In the static case of fixed rendering at an angle of 45 degrees relative to the listener, the parameters as set forth in U.S. application Ser. No. 13/419,806 may be:
Shelving filter: Q=0.7, f=2500 Hz, α=−14 dB;
Notch filter: Q=1.7, f=1300 Hz, α=−10 dB; and
Delay value: 17 samples.
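As a rough illustration, the shelving and notch responses above can be realized as standard second-order (biquad) filter sections. The sketch below uses the well-known RBJ audio-EQ-cookbook formulas and an assumed 48 kHz sample rate; the actual filter topology and sample rate of the referenced application may differ.

```python
import math

def peaking_eq(fs, f0, q, gain_db):
    """RBJ-style peaking (notch-with-gain) biquad; returns normalized (b, a)."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * A, -2.0 * math.cos(w0), 1.0 - alpha * A]
    a = [1.0 + alpha / A, -2.0 * math.cos(w0), 1.0 - alpha / A]
    return [bi / a[0] for bi in b], [ai / a[0] for ai in a]

def high_shelf(fs, f0, q, gain_db):
    """RBJ-style high-shelf biquad; returns normalized (b, a)."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    c = math.cos(w0)
    sA = 2.0 * math.sqrt(A) * alpha
    b = [A * ((A + 1) + (A - 1) * c + sA),
         -2.0 * A * ((A - 1) + (A + 1) * c),
         A * ((A + 1) + (A - 1) * c - sA)]
    a = [(A + 1) - (A - 1) * c + sA,
         2.0 * ((A - 1) - (A + 1) * c),
         (A + 1) - (A - 1) * c - sA]
    return [bi / a[0] for bi in b], [ai / a[0] for ai in a]

# Static parameters from the text (48 kHz sample rate is an assumption):
shelf_b, shelf_a = high_shelf(48000, 2500.0, 0.7, -14.0)
notch_b, notch_a = peaking_eq(48000, 1300.0, 1.7, -10.0)
```

Both sections leave the DC response untouched (unity gain at 0 Hz) while cutting by the stated α in their respective bands, consistent with a coloration-shaping cross filter.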
In the dynamic case, according to one or more embodiments, the range of head movements may be limited to ±45 degrees in order to reduce complexity. For example, moving the head towards a source at 45 degrees will lower the required rendering angle from 45 degrees down to 0 degrees, while moving the head away from the source will increase the angle up to 90 degrees. Beyond these angles, the binaural rendering filters may stay at their extreme positions, either 0 degrees or 90 degrees. This limitation is acceptable because the main purpose of head tracking according to one or more embodiments of the present disclosure is to process small, spontaneous head movements, thereby providing a better out-of-head localization.
As shown in
The head rotational angle u(i), once determined, may be used to generate a left table pointer index (index_left) and a right table pointer index (index_right). The left and right table pointer index values may then be used to retrieve the shelving, notch, and delay filter parameters from the respective filter look-up tables. For a steering angle u=−44.5 . . . +45 degrees and an angular resolution of 0.5 degrees, the left and right table pointer indices are:
index_left=round[2*(u+45)] Eq. 1
index_right=181−index_left Eq. 2
Accordingly, if the head moves towards a left source, it moves away from a right source, and vice versa.
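A minimal sketch of the index computation of Eqs. 1 and 2, assuming the angle is first clamped to the tabulated −44.5 to +45 degree range:

```python
def table_indices(u):
    """Map head rotational angle u (degrees) to left/right look-up table
    indices per Eqs. 1 and 2, clamping u to the tabulated range."""
    u = max(-44.5, min(45.0, u))
    index_left = round(2 * (u + 45))      # Eq. 1: yields 1 .. 180
    index_right = 181 - index_left        # Eq. 2: mirrored, 180 .. 1
    return index_left, index_right
```

For example, a centered head (u = 0) maps to indices (90, 91), and the two indices always move in opposite directions, mirroring the head moving toward one source and away from the other.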
Similarly, the notch filter 336, 338 may be steered by its gain parameter “α” only, as shown in
The delay filter values may be steered by the variable delay table 344 between 0 and 34 samples, using a mapping as shown in
With respect to the distance and location rendering, the binaural model of the module 704 provides directional information, but sound sources may still appear very close to the head of a listener. This may especially be the case if there is not much information with respect to the location of the sound source (e.g., dry recordings are typically perceived as being very close to the head or even inside the head of a listener). The distance renderer module 702 may limit such unwanted artifacts. The distance renderer module 702 may include two delay lines, one for each of the initial left- and right-channel audio input signals Lin, Rin, respectively. In other embodiments of the SES, more or fewer than two tapped delay lines can be used. For example, six tapped delay lines may be used for a 6-channel surround signal.
By means of long, tapped delay lines, delayed images of the left- and right-channel audio input signals L, R may be generated and fed to simulated sources around the head, located at ±90 degrees (left surround, LS, and right surround, RS) and ±135 degrees (left rear surround, LRS, and right rear surround, RRS), respectively. Accordingly, the distance renderer module 702 may provide six outputs, representing the left- and right-channel input signals L, R, left and right surround signals LS, RS, and left and right rear surround signals LRS, RRS.
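The delayed-image generation above can be sketched as taps on a long delay line. The tap delays and gains below are purely illustrative assumptions (the text does not specify them); each tap feeds one simulated source around the head (LS/RS at ±90 degrees, LRS/RRS at ±135 degrees).

```python
def delayed_image(x, delay, gain):
    """Return a delayed, attenuated copy of signal x (a list of samples),
    as one tap of a long tapped delay line."""
    return [0.0] * delay + [gain * s for s in x[:len(x) - delay]]

# Hypothetical taps for the left channel (48 kHz assumed):
L_in = [1.0] + [0.0] * 999            # e.g., a unit impulse on the left channel
LS = delayed_image(L_in, 480, 0.7)    # ~10 ms tap feeding the +90-degree source
LRS = delayed_image(L_in, 960, 0.5)   # ~20 ms tap feeding the +135-degree source
```

Each delayed image has the same length as the input, so all six outputs of the distance renderer stay sample-aligned for the downstream binaural rendering.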
The binaural rendering module 704 may include a dynamic, parametric HRTF model 708 for rendering sound sources in front of a listener at ±45 degrees. Additionally, the parametric binaural rendering module 704 may include additional surround HRTFs 710, 712 for rendering the simulated sound sources at ±90 degrees and ±135 degrees. Alternatively, one or more embodiments of the SES 110 could employ other HRTFs for sources that have other source angles, such as 80 degrees and 145 degrees. These surround HRTFs 710, 712 may simulate a room environment with discrete reflections, which results in sound images perceived farther away from the head (distance rendering). The reflections, however, do not necessarily need to be steered by the head rotational angle u(i). Both options, static and dynamic, are possible, as illustrated in
Further,
After calibration, the head rotational angles u(i) may be generated in a loop by accumulating the elements of the velocity vector from the angular velocity signal v(i), according to the following equation, as shown at step 830:
u(i)=u(i−1)+v(i) Eq. 3
According to one or more embodiments, the loop may contain a threshold detector, which compares the absolute values of the angular velocity signal v(i) with a predetermined threshold, THR. Thus, at step 840, the MCU 236 may determine whether the absolute value of v(i) is greater than the threshold, THR.
If the absolute values of the angular velocity signal v(i) are below the threshold for a contiguous number of samples (e.g., a sample count exceeds a predetermined limit), then the MCU 236 may assume the sensor in the digital gyroscope 230 is not in motion. Thus, if the result of step 840 is NO, the method may proceed to step 850. At step 850, a sample counter (cnt) may be incremented by 1. At step 860, the MCU 236 may determine whether the sample counter exceeds a predetermined limit representing the contiguous number of samples. If the condition at step 860 is met, the head rotational angle u(i) may be gradually ramped down to zero at step 870 by the following equation:
u(i)=a*u(i−1), where a<1 (e.g., a=0.995) Eq. 4
This causes the SES 110 to automatically move the acoustic image back to its normal position in front of the head of the headphone user 126, thereby ignoring any remaining long-term drift of the sensor in the digital gyroscope 230. According to one or more embodiments, the hold time (defined by the limit counter) and the decay time may be on the order of a few seconds.
The head rotational angle u(i) resulting from step 870 may be output at step 880. If, on the other hand, the condition at step 860 is not met, the method may proceed directly to step 880, where the head rotational angle u(i) calculated at step 830 may be output.
Returning to step 840, if the absolute value of the angular velocity signal v(i) is above the threshold (THR), the MCU 236 may determine that the sensor in the digital gyroscope 230 is in motion. Accordingly, if the result at step 840 is YES, then the method may proceed to step 890. At step 890, the MCU 236 may reset the sample counter (cnt) to zero. The method may then proceed to step 880, where the head rotational angle u(i) calculated at step 830 may be output. Therefore, whether the headphone assembly 112 is determined to be in motion or not, the head rotational angle u(i) ultimately may be output at step 880 or otherwise used for updating the parameters of the shelving filters 332, 334, the notch filters 336, 338, and the delay filters 328, 330.
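The MCU post-processing loop of steps 830 through 890 can be sketched as follows. The threshold THR and the hold-time limit are illustrative values (only the decay factor a = 0.995 is given in the text, at Eq. 4):

```python
def track_head_angle(v_samples, thr=1.0, limit=100, a=0.995):
    """Post-process gyroscope angular velocities v(i) into head angles u(i).

    thr (degrees/sample) and limit (hold-time sample count) are illustrative
    assumptions; the decay factor a = 0.995 is the example from Eq. 4.
    """
    u = 0.0          # head rotational angle u(i)
    cnt = 0          # contiguous below-threshold sample counter
    angles = []
    for v in v_samples:
        u_prev = u
        u = u_prev + v            # Eq. 3 (step 830): accumulate velocity
        if abs(v) > thr:          # step 840: sensor is in motion
            cnt = 0               # step 890: reset the sample counter
        else:
            cnt += 1              # step 850: count below-threshold samples
            if cnt > limit:       # step 860: held still past the hold time
                u = a * u_prev    # Eq. 4 (step 870): ramp angle toward zero
        angles.append(u)          # step 880: output u(i)
    return angles
```

With these settings, a brief head turn followed by stillness yields an angle that holds for the limit count and then decays geometrically back to zero, re-centering the acoustic image while ignoring sensor drift.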
With reference now to
At step 910, the SES may receive audio input signals at the audio signal interface 231, which may be fed to the DSP 232. As explained with respect to
According to one or more embodiments, only the gain parameter “α” of the shelving and notch filters may vary with a change in the left and right table pointer index values. Further, only the number of samples taken by the delay filter may vary with a change in the left and right table pointer index values. According to one or more alternative embodiments, other filter parameters, such as the quality factor “Q” or the shelving/notch frequency “f,” may also vary with a change in the left and right table pointer index values.
Once the shelving, notch, and delay filter parameters are retrieved from their look-up tables, the DSP 232 may update the respective shelving filters 332, 334, notch filters 336, 338, and delay filters 328, 330 for the dynamic, parametric HRTFs 320, 322 of the binaural rendering module 300 at step 950. At step 960, the DSP 232 may transform the audio input signal 113 received from the audio source 114 using the updated HRTFs to an audio output signal including a left headphone output signal LH and a right headphone output signal RH. Updating these binaural rendering filters 310 in response to head rotation results in a stereo image that remains stable while turning the head. This provides an important directional cue to the brain, indicating whether the sound image is located in front of or behind the listener. As a result, so-called “front-back confusion” may be eliminated.
While exemplary embodiments are described above, it is not intended that these embodiments describe all possible forms of the invention. Rather, the words used in the specification are words of description rather than limitation, and it is understood that various changes may be made without departing from the spirit and scope of the invention. Additionally, the features of various implementing embodiments may be combined to form further embodiments of the invention.