The invention relates to a listening device, such as but not limited to headphones, and an accompanying signal processing method for use in, but not limited to, binaural 3D audio reproduction.
Conventionally, a binaural (i.e. two-eared) 3D audio reproduction system uses a pair of headphones to reproduce binaurally recorded or synthesized sound so that a listener perceives sound images coming from particular locations, such as front, rear, above, near and far, in the 3D space surrounding the listener. However, there are limitations in the conventional headphone system which prevent the listener from accurately perceiving 3D audio.
Firstly, Møller [1] reasoned that the headphone coupling characteristics were not the same as the characteristics of free-field sound sources.
Secondly, there are shape and size variations in human heads and ears—no two people have the same ear shape. Therefore, a binaural sound captured with a dummy head or synthesized using a generic set of Head-Related Transfer Functions (HRTFs), a set of sound source measurements in a 3D space surrounding the listener, will be perceived differently by different people. To overcome this issue, either individualized recording or individualized HRTFs for binaural synthesis are required, which are both tedious to perform.
Thirdly, it is well known that headphone listening causes sound to be perceived as coming from inside the head, so that near and far sounds are perceived to be the same. There is also a tendency for frontal sound cues to be perceived as coming from the rear, causing front/back confusion.
A number of improved, 3D-audio-enhanced headphones [2-6] have been designed with multiple sound emitters, including off-positioned sound emitters, within existing surround headphones. However, although such headphones have sound emitters positioned at different locations around the ear, all of the emitters direct sound in parallel directions towards the opening of the ear entrance, as illustrated in
In general terms the invention proposes that a given ear is provided with two sets of sound emitters: at least one first sound emitter which directs sound against a wall portion of the concha (the part of the concha which extends outwardly from the head), and a second sound emitter which directs sound at the pinna from a different direction.
Specifically, in an aspect of the invention, there is provided a listening device for wearing by a user, comprising:
Typically the one or more first sound emitters emit sound in a direction substantially perpendicular to the axis of the ear canal. Typically at least one first sound emitter is positioned to the anterior of the ear when worn by the user.
Advantageously, the individualized surface of the concha creates an individualized sound reflection that has been found to enhance binaural listening. This new positioning of sound emitters also results in externalization of the sound source, with a better frontal sound image.
In one embodiment at least one second sound emitter is positioned behind the pinna of said one or both ears when the listening device is worn by the user. Typically the second sound emitter(s) behind the ear are vibration exciters for generating low frequencies. Typically at least one second sound emitter is positioned to the posterior of the ear when worn by the user.
Advantageously, if the first sound emitter(s) has a reduced low-frequency transmission compared to conventional headphones, sound emitters (rear vibrating emitters) can be placed behind the pinna to create dynamic bass as well as a sense of sound proximity, thereby overcoming the deficiency. The first sound emitters may be broadband, generating frequencies up to 20 kHz, while the rear vibrating emitters (i.e. the sound emitters placed behind the pinna) generate frequencies up to about 500 Hz.
In a further embodiment at least one second sound emitter is positioned such that sound is directed towards the ear canal of the corresponding ear when the listening device is worn by the user. Typically at least one second sound emitter emits sound in a direction substantially parallel to the axis of the ear canal. Advantageously, if the first and second sound emitters are large enough to produce low frequencies, sound emitters behind the pinna are not required, resulting in a simplified design of the listening device. The bandwidth of the first and second sound emitters may again be broadband, generating frequencies up to 20 kHz.
In one embodiment the support structure includes two earcups (one for each of the user's ears), each earcup enclosing the corresponding sound emitters.
In one embodiment the listening device includes left and right sides corresponding to the user's ears, and the support structure includes an over-the-head headband or a behind-the-head loop connecting said left and right sides.
In one embodiment the support structure includes a spectacles/glasses structure in which the sound emitters are embedded.
In a further aspect of the invention, there is provided a method of processing signals for a listening device worn by a user, comprising the steps of:
The first ear speakers may actually generate sound propagating in a range of directions (i.e. spanning a range of angles), and if so, the angles of 60, 70 and 90 degrees mentioned above refer to the angle between the axis of the ear canal and the central direction in the range of directions.
In one embodiment at least some of the sound signals are delivered to a second sound emitter positioned behind the pinna of said one or both ears.
In a further embodiment at least some of the sound signals are delivered to a second sound emitter which emits sound in a direction parallel to the axis of the ear canal of the user.
In one embodiment the cues are processed via convolution with a set of head related impulse responses.
In one embodiment the cues are processed with a filterbank structure and/or adjustable gain.
In one embodiment the cues are processed to separate the frontal and side signals from the audio input, by computing the correlation and time differences between the left and right signals. Typically highly correlated signals with small time differences are delivered to the first sound emitters.
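For illustration only, the following sketch (in Python with NumPy) shows one way such a separation could be performed on a block-by-block basis. The block size, correlation threshold and maximum interaural time difference used here are assumptions chosen for the example and are not specified in the description.

```python
import numpy as np

def split_front_side(left, right, fs, block=1024, corr_thresh=0.8, max_itd_ms=0.3):
    """Route blocks whose left/right signals are highly correlated at small
    time differences to the first (frontal) emitters; route the rest to the
    second emitters. Thresholds and block size are illustrative assumptions."""
    n = min(len(left), len(right))
    front, side = np.zeros((n, 2)), np.zeros((n, 2))
    max_lag = max(1, int(max_itd_ms * 1e-3 * fs))   # lags regarded as "small"

    for start in range(0, n, block):
        l = left[start:start + block]
        r = right[start:start + block]
        # Cross-correlation restricted to plausible interaural lags.
        xcorr = np.correlate(l, r, mode="full")
        lags = np.arange(-len(r) + 1, len(l))
        small = np.abs(lags) <= max_lag
        denom = np.sqrt(np.sum(l ** 2) * np.sum(r ** 2)) + 1e-12
        rho = np.max(np.abs(xcorr[small])) / denom   # normalized correlation

        target = front if rho >= corr_thresh else side
        target[start:start + block, 0] = l
        target[start:start + block, 1] = r

    return front, side
```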
It will be convenient to further describe the present invention with respect to the accompanying drawings that illustrate possible arrangements of the invention. Other arrangements of the invention are possible, and consequently the particularity of the accompanying drawings is not to be understood as superseding the generality of the preceding description of the invention.
By making use of different reflections around the listener's concha area, 3D sound perception is enhanced. The primary headphone driver 116 (first sound emitter) is positioned near the tragus 8 and points towards the wall portion of the concha 6 (the sound signal arriving at an angle of 0°), instead of the position used in normal headphones, which directs sound perpendicular to the overall plane of the pinna 4. The sound generated by the headphone driver 116 propagates in a direction which is substantially horizontal and substantially perpendicular to the axis of the ear canal. The headphone driver 116 projects sound waves towards the wall portion of the concha 6, causing concha reflection. This approach enhances 3D sound perception through individualized cues produced by the individual's concha shape, size, and depth. Measurements and subjective listening tests show that improved sound externalization and frontal sound image can be achieved. However, the new position of the headphone driver 116 can greatly reduce the bass frequency response, and therefore vibration exciters 118 are used to compensate for the loss of low frequency.
The vibration exciters 118 (second sound emitters) are interfaced with foam or membrane 122 to transmit the vibration to the pinna (or outer ear) 4. A cable 124 is provided to transmit the signals to the sound emitters.
Advantageously, by combining concha-wall-directed exciters and vibration exciters in a single headphone unit in different configurations, more immersive and realistic 3D-audio playback can be created for use with today's 3D media applications, such as 3D TV.
In more detail, the advantages include:
1. Individualized HRTF cues are produced using the unique shape of the human ear. These individualized HRTF cues result in better accuracy in perceiving sound source location, especially for frontal sound sources in a 3D audio headphone reproduction.
2. Reduction of the rear sound source bias, or front-back sound source confusion, through the use of the concha-excited driver. This driver configuration also improves the externalization of the sound source and reduces the in-the-head experience (in which near and far sounds are perceived to be the same).
3. The vibrating exciters placed at strategic positions around the ear add a deeper, thumping bass effect and enhance low-frequency perception.
4. The vibrating exciters also add a sense of proximity (a sound source close to the ear), giving the effect of someone speaking or whispering close to the listener's ear. This feature can greatly enhance gaming effects.
With reference to
The device 220 can be worn on the head with the help of an over-the-head headband 228 or a behind-the-head loop connecting the right and left sides of the headphone, or embedded in a spectacles/glasses structure. Signals can be carried via a cable 224, or the device can be wireless. These different structures can potentially create many different types of headphone design that can be applied to gaming, 3D-TV, and other interactive media applications.
In order to achieve a good 3D sound source positional effect on the new headphone structure, proprietary audio signal processing and sound distribution algorithms are implemented, as illustrated in
The algorithm, called ambience and effect extraction based on the human ear (ACEHE), performs the required effect and ambience extraction from stereo or surround sound audio signals. The extracted effect and ambience contents are then channelled into the concha and vibration exciters in the listening device for an optimal audio experience.
The extracted ambience and effect contents are further enhanced by signal processing algorithms, such as convolution with a set of head-related impulse responses (HRIRs) to improve the 3D sound perception, and deconvolution to improve sound externalization.
Furthermore, a combined front-back biasing circuit and headphone equalizer, based on a filterbank structure and adjustable gains G1, G2, ..., GN (each gain varying from 0 to 1), are also implemented in the signal processing unit. In addition, a low-pass filter is included to produce the signal for the vibration exciter. A specially designed concha exciter driving circuit is used to drive the concha exciters of the 3D headphone.
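As a minimal sketch of the filterbank-with-gains structure (Python with SciPy), the example below splits the signal into adjacent bands, scales each band by its gain G1...GN, and sums the result. The number of bands, the band edges and the use of Butterworth prototypes are assumptions made for the example; the description only requires a filterbank with adjustable gains between 0 and 1.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def filterbank_equalizer(x, fs, gains, edges=(250, 1000, 4000)):
    """Apply an N-band equalizer: a low band, band-pass bands, and a high
    band, each scaled by a gain in [0, 1]. Band edges are illustrative only."""
    assert len(gains) == len(edges) + 1
    sections = [butter(4, edges[0], btype="lowpass", fs=fs, output="sos")]
    for lo, hi in zip(edges[:-1], edges[1:]):
        sections.append(butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos"))
    sections.append(butter(4, edges[-1], btype="highpass", fs=fs, output="sos"))
    # Weighted sum of the band outputs gives the equalized signal.
    return sum(g * sosfilt(sos, x) for g, sos in zip(gains, sections))
```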
With reference to
Thus, instead of placing several concha exciters and vibration exciters in respective front and rear sections of the earcup of a circumaural headphone, a single frontal emitter 316 can be used together with the side-firing emitter 330 found in conventional headphones. By using a sufficiently large frontal exciter to replace the smaller concha exciters, the problem of positioning the concha exciters is avoided. Moreover, a sufficiently large frontal emitter, as well as the side-firing emitter, is capable of producing low frequencies. Therefore, the vibration exciters can be omitted in this embodiment to reduce cost and power consumption. However, the vibration exciters can optionally be included to provide a proximity sensation in gaming.
The algorithm of this embodiment may be implemented in several ways. One possible, simplified approach is illustrated in
The main processing blocks of the signal processing technique are illustrated in
The processing blocks accept audio signals in different audio formats, namely binaural recordings, 2-channel stereo sound, multichannel surround (5.1 format), and the low-frequency effects (LFE) signal. This flexibility allows signals from different sources (gaming, movies, and other digital media) to be processed and distributed to the different emitters.
A two-stage approach is used. First, the multi-format signals are converted to a 2-channel format, either using binaural synthesis (with HRTFs or virtual surround) or through surround-to-stereo downmixing. Binaural recordings and the LFE signal need not go through this processing. The second stage applies special signal processing techniques before distributing the signals to the various headphone drivers and vibrating emitters.
First Stage: Conversion to 2-Channel Format
The first stage applies the necessary conversion from stereo and multichannel surround signals to a 2-channel format signal. Two possible conversion techniques are:
1. Binaural Synthesis or Virtual Surround
This conversion process applies HRTF filtering to each of the input channels, which correspond to the locations of the virtual loudspeakers, to simulate a binaural signal. It accepts stereo and 5-channel surround signals. For stereo signals, only the L and R signals are input to the processing block.
The HRIR filter coefficients are obtained from an open-source HRTF database (128 taps). The virtual positions of the loudspeakers are set at 0° for the center channel (C), ±40° for the left (L) and right (R) channels, and ±140° for the surround channels. In the ITU-R BS.775-2 standard, the recommended loudspeaker placement for a 5.1 surround setup is 0° for the center channel (C), ±30° for the left (L) and right (R) channels, and ±110° for the surround channels. In this processing, ±40° is chosen instead of ±30° to increase the perceived width of the sound stage, and ±140° is chosen instead of ±110° to improve the rear imaging. A complete diagram for creating a virtual surround is shown in
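For illustration, a minimal binaural-synthesis sketch (Python with SciPy) is given below. The 128-tap length and the azimuths of 0°, ±40° and ±140° follow the description, while the data layout of the HRIR dictionary (left-ear/right-ear impulse-response pairs indexed by azimuth) is an assumption for the example.

```python
import numpy as np
from scipy.signal import fftconvolve

# Virtual loudspeaker azimuths used in this embodiment (degrees).
SPEAKER_ANGLES = {"C": 0, "L": -40, "R": 40, "Ls": -140, "Rs": 140}

def binaural_synthesis(channels, hrirs):
    """Render the input channels to a 2-channel binaural signal.

    channels : dict mapping channel name ("C", "L", "R", "Ls", "Rs") to a
               1-D signal; a stereo input simply supplies "L" and "R" only.
    hrirs    : dict mapping azimuth (degrees) to a (2, 128) array holding the
               left-ear and right-ear impulse responses from the database.
    """
    length = max(len(x) for x in channels.values()) + 127   # 128-tap tail
    out = np.zeros((length, 2))
    for name, signal in channels.items():
        h_left, h_right = hrirs[SPEAKER_ANGLES[name]]
        left = fftconvolve(signal, h_left)     # contribution at the left ear
        right = fftconvolve(signal, h_right)   # contribution at the right ear
        out[:len(left), 0] += left
        out[:len(right), 1] += right
    return out
```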
2. Left-Only Right-Only (LO-RO) Downmixing
This conversion process is a computationally simpler alternative to the binaural synthesis shown in
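A sketch of such a downmix (Python with NumPy) is shown below. The -3 dB (0.7071) centre and surround mixing coefficients are the conventional ITU-style values and are an assumption for this example; the description does not state the gains used. The peak normalization at the end corresponds loosely to the pair of normalized signals referred to in the second stage.

```python
import numpy as np

def loro_downmix(L, R, C, Ls, Rs, center_gain=0.7071, surround_gain=0.7071):
    """Left-only/right-only (LO-RO) downmix of 5-channel surround to stereo.
    Mixing coefficients are assumed conventional -3 dB values."""
    lo = L + center_gain * C + surround_gain * Ls
    ro = R + center_gain * C + surround_gain * Rs
    peak = max(np.max(np.abs(lo)), np.max(np.abs(ro)), 1.0)  # avoid clipping
    return lo / peak, ro / peak
```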
Second Stage: Enhancement and Distribution
Next, different processing techniques are applied to the pair of normalized signals to enhance the perceived auditory image sent to the different pairs of emitters. In particular, frontal-biasing filters are applied to the 2-channel signal to enhance the frontal auditory image in the concha emitters, and rear-biasing filters are applied to the signals for the vibrating emitters to enhance the low-frequency and intimacy effects. The front and rear biasing filters enhance the perceived frontal and rear positioning of the sound image. The filters are based on Jens Blauert's subjective experiments on the directional bands that affect frontal and rear perception. One possibility is a five-band filterbank with the frequency response stated in Table 1. The filter is designed using the Filter Design and Analysis Tool (FDATool) in Matlab. A least-squares design method is chosen because of its reduced ripple in the pass band compared to the equiripple design method. The frequency responses for the frontal-biased filter (solid line) and the rear-biased filter (dashed line) are plotted in
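Because Table 1 is not reproduced here, the sketch below (Python with SciPy) substitutes illustrative band edges and gains loosely motivated by Blauert's directional bands; only the least-squares FIR design approach follows the description, with scipy.signal.firls standing in for the FDATool least-squares method. A rear-biasing filter can be obtained in the same way by swapping the boosted and attenuated bands.

```python
import numpy as np
from scipy.signal import firls

def frontal_bias_filter(fs=44100, numtaps=255):
    """Least-squares FIR approximation of a five-band frontal-biasing response.
    The band edges and gains are illustrative stand-ins for Table 1, boosting
    bands commonly associated with frontal localization."""
    bands = [0, 260, 300, 600, 700, 2500, 3000, 6000, 7000, fs / 2]  # Hz
    gains = [0.5, 0.5, 1.0, 1.0, 0.5, 0.5, 1.0, 1.0, 0.5, 0.5]       # linear
    return firls(numtaps, bands, gains, fs=fs)

def apply_bias(stereo, coeffs):
    """Filter both channels of an (N, 2) signal with the bias filter."""
    return np.stack([np.convolve(ch, coeffs, mode="same") for ch in stereo.T],
                    axis=1)
```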
The signals for the vibrating emitters can be extracted from the 2-channel signals or directly from the low-frequency effect (LFE) signal from 5.1 surround sound format. A lowpass filter based on the 2nd order Butterworth infinite impulse response (IIR) filter with a cut-off frequency at 450 Hz is used to extract low-frequency content from the source. This cut-off frequency has been found to provide a good intimate/close effect. The levels of both low pass filtered and LFE signals are controlled manually to achieve the desired effect.
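This extraction step maps directly onto a standard second-order Butterworth design; a minimal sketch (Python with SciPy) follows, in which the manual level control mentioned above is represented by a single gain parameter added for illustration.

```python
from scipy.signal import butter, lfilter

def vibration_signal(x, fs, cutoff=450.0, order=2, level=1.0):
    """Low-frequency signal for the vibrating emitters: 2nd-order Butterworth
    lowpass with a 450 Hz cut-off, scaled by a manually set level."""
    b, a = butter(order, cutoff, btype="lowpass", fs=fs)
    return level * lfilter(b, a, x)
```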
It will be appreciated by persons skilled in the art that the present invention may also include further modifications made to the device which do not affect the overall functioning of the device.
This Application claims priority to International Patent Application No. PCT/SG2012/000116, filed Apr. 2, 2012, entitled “Listening Device and Accompanying Signal Processing Method”, which claims priority to U.S. Provisional Patent Application No. 61/470,135, filed Mar. 31, 2011, which is incorporated herein by reference.
Filing Document | Filing Date | Country | Kind | 371(c) Date |
---|---|---|---|---|
PCT/SG2012/000116 | 4/2/2012 | WO | 00 | 9/30/2013 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2012/134399 | 10/4/2012 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
3796840 | Ohta | Mar 1974 | A |
3984885 | Yoshimura et al. | Oct 1976 | A |
6356644 | Pollak | Mar 2002 | B1 |
6434250 | Tsuhako | Aug 2002 | B1 |
6603863 | Nagayoshi | Aug 2003 | B1 |
7068803 | Kuhlmann et al. | Jun 2006 | B2 |
7155025 | Weffer | Dec 2006 | B1 |
8000490 | Yang | Aug 2011 | B2 |
20020122562 | Brennan | Sep 2002 | A1 |
20040196991 | Iida et al. | Oct 2004 | A1 |
20040264727 | Kim | Dec 2004 | A1 |
20080147763 | Levin | Jun 2008 | A1 |
20090093896 | Kobayashi | Apr 2009 | A1 |
20090161895 | Hruza | Jun 2009 | A1 |
20090175473 | Wong | Jul 2009 | A1 |
20090252361 | van Halteren | Oct 2009 | A1 |
20110038484 | Macours | Feb 2011 | A1 |
Entry |
---|
Konig, “A New Supra-Aural Dynamic Headphone System for In-Front Localization and Surround Reproduction of Sound”, presented at the 102nd AES Convention, Munich, Germany, Mar. 1997. |
Moller et al., “Transfer Characteristics of Headphones Measured on Human Ears,” J. Audio Eng. Soc., vol. 43, Apr. 1995, pp. 203-217. |
Tan et al., “Direct Concha Excitation for the Introduction of Individualized Hearing Cues”, J. Audio Eng. Soc., vol. 48, Jul./Aug. 2000, pp. 642-653. |
Number | Date | Country | |
---|---|---|---|
20140153765 A1 | Jun 2014 | US |
Number | Date | Country | |
---|---|---|---|
61470135 | Mar 2011 | US |