The present application relates to apparatus and methods for audio transducer implementation enhancements, and in particular, but not exclusively, to audio transducer implementation enhancements for head mounted units and headphones related to spatial aspects.
ANC (Active Noise Cancellation) and pass-through/transparency features are becoming more commonly implemented within a range of devices. For example, ANC and pass-through applications can be implemented within everyday devices such as headphones. Furthermore, ANC and pass-through implementations can be employed within a vehicle such as a car, or within apparel such as motorcycle helmets and personal protection equipment (PPE). ANC actively (using electronics, microphones and speaker elements) attenuates sounds from external sound sources for the user. A pass-through/transparency mode in turn actively plays back external sound sources to the user so that the user can hear their surroundings (for example, headphone users could hear cars around them and hear and talk to other people present in the same space). Ideally, in transparency mode the user would hear their surroundings clearly (for the headphone user, as if they were not wearing headphones).
Many devices allow a user to select how much they hear external sounds, and thus the device can gradually transition between ANC and transparency modes.
Transparency mode in headphones does not produce a pure pass-through signal when employing the audio signals from outer or external microphones, as the generated transparency signal would not be the same as the audio signals experienced by the user because the outer microphones are not located in the user's ear canal. Additionally, differences between the generated pass-through audio signals and the 'experienced audio signals' can be caused by the inner speakers or transducers not being located in the same place as the outer microphones, and furthermore by a filtering effect generated by the physical design of the device (which has a shape, volume and weight that affect the audio signals).
For example where the device is a set of headphones the outer microphones are not located in the headphone user's ear canal. Furthermore in a car example device the microphones are mounted outside the vehicle and the speakers are either in their normal positions (in the doors, dashboard etc) or the speakers are in the driver's seat.
There is provided according to a first aspect a method for generating audio signals for a device equipped with a transparency mode, the method comprising: obtaining at least two external audio signals from at least two microphones located on the device; determining at least one of sound direction or diffuseness or distance between a first microphone and a second microphone of the at least two microphones based on the at least two external audio signals; modifying at least one of the at least two external audio signals based on the determined at least one of sound direction or diffuseness or distance; and rendering the at least one modified audio signal.
The at least two microphones located on the device may be located on one side of the device.
The at least two microphones located on the device may be located on opposite sides of the device.
The method may further comprise obtaining at least one internal audio signal, and modifying at least one of the at least two external audio signals is further based on the at least one internal audio signal.
The method may further comprise modifying the at least one of the at least two external audio signals based on a frequency profile to modify the at least two external audio signals more in lower frequencies.
The method may further comprise determining a distance between at least one of the at least two microphones located on the device and an associated speaker, wherein modifying at least one of the at least two external audio signals may be further based on the distance between at least one of the at least two microphones located on the device and an associated speaker.
The device may comprise one of: a smartphone; a headphone; a vehicle equipped with the at least two microphones; a helmet equipped with the at least two microphones; and a personal protection equipment equipped with the at least two microphones.
The first microphone and the second microphone of the at least two microphones may be left and right side device microphones.
The first microphone may be a first set of microphones and a second microphone may be a second set of microphones.
Modifying at least one of the at least two external audio signals based on the determined at least one of sound direction or diffuseness or distance may be such that the modified audio signal direction or diffuseness are more correctly perceived by a user of the device.
According to a second aspect there is provided an apparatus for generating audio signals for a device equipped with a transparency mode, the apparatus comprising at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform: obtaining at least two external audio signals from at least two microphones located on the device; determining at least one of sound direction or diffuseness or distance between a first microphone and a second microphone of the at least two microphones based on the at least two external audio signals; modifying at least one of the at least two external audio signals based on the determined at least one of sound direction or diffuseness or distance; and rendering the at least one modified audio signal.
The at least two microphones located on the device may be located on one side of the device.
The at least two microphones located on the device may be located on opposite sides of the device.
The apparatus may be caused to perform obtaining at least one internal audio signal, and the apparatus caused to perform modifying at least one of the at least two external audio signals may be further caused to perform modifying the at least one of the at least two external audio signals based on the at least one internal audio signal.
The apparatus may be further caused to perform modifying the at least one of the at least two external audio signals based on a frequency profile to modify the at least two external audio signals more in lower frequencies.
The apparatus may be further caused to perform determining a distance between at least one of the at least two microphones located on the device and an associated speaker, wherein the apparatus caused to perform modifying at least one of the at least two external audio signals may be further caused to perform modifying at least one of the at least two external audio signals based on the distance between at least one of the at least two microphones located on the device and an associated speaker.
The device may comprise one of: a smartphone; a headphone; a vehicle equipped with the at least two microphones; a helmet equipped with the at least two microphones; and a personal protection equipment equipped with the at least two microphones.
The apparatus may be integral to the device.
The device may comprise the apparatus.
The first microphone and the second microphone of the at least two microphones may be left and right side device microphones.
The first microphone may be a first set of microphones and a second microphone may be a second set of microphones.
The apparatus caused to perform modifying at least one of the at least two external audio signals based on the determined at least one of sound direction or diffuseness or distance may be such that the modified audio signal direction or diffuseness are more correctly perceived by a user of the device.
According to a third aspect there is provided an apparatus for generating audio signals for a device equipped with a transparency mode, the apparatus comprising means configured to: obtain at least two external audio signals from at least two microphones located on the device; determine at least one of sound direction or diffuseness or distance between a first microphone and a second microphone of the at least two microphones based on the at least two external audio signals; modify at least one of the at least two external audio signals based on the determined at least one of sound direction or diffuseness or distance; and render the at least one modified audio signal.
The at least two microphones located on the device may be located on one side of the device.
The at least two microphones located on the device may be located on opposite sides of the device.
The means may be configured to obtain at least one internal audio signal, and the means configured to modify at least one of the at least two external audio signals may be further configured to modify the at least one of the at least two external audio signals based on the at least one internal audio signal.
The means may be further configured to modify the at least one of the at least two external audio signals based on a frequency profile to modify the at least two external audio signals more in lower frequencies.
The means may be further configured to determine a distance between at least one of the at least two microphones located on the device and an associated speaker, wherein the means configured to modify at least one of the at least two external audio signals may further be configured to modify at least one of the at least two external audio signals based on the distance between at least one of the at least two microphones located on the device and an associated speaker.
The device may comprise one of: a smartphone; a headphone; a vehicle equipped with the at least two microphones; a helmet equipped with the at least two microphones; and a personal protection equipment equipped with the at least two microphones.
The apparatus may be integral to the device.
The device may comprise the apparatus.
The first microphone and the second microphone of the at least two microphones may be left and right side device microphones.
The first microphone may be a first set of microphones and a second microphone may be a second set of microphones.
The means configured to modify at least one of the at least two external audio signals based on the determined at least one of sound direction or diffuseness or distance may be such that the modified audio signal direction or diffuseness are more correctly perceived by a user of the device.
According to a fourth aspect there is provided an apparatus for generating audio signals for a device equipped with a transparency mode, the apparatus comprising: obtaining circuitry configured to obtain at least two external audio signals from at least two microphones located on the device; determining circuitry configured to determine at least one of sound direction or diffuseness or distance between a first microphone and a second microphone of the at least two microphones based on the at least two external audio signals; modifying circuitry configured to modify at least one of the at least two external audio signals based on the determined at least one of sound direction or diffuseness or distance; and rendering circuitry configured to render the at least one modified audio signal.
According to a fifth aspect there is provided a computer program comprising instructions [or a computer readable medium comprising instructions] for causing an apparatus, for generating audio signals for a device equipped with a transparency mode, to perform at least the following: obtaining at least two external audio signals from at least two microphones located on the device; determining at least one of sound direction or diffuseness or distance between a first microphone and a second microphone of the at least two microphones based on the at least two external audio signals; modifying at least one of the at least two external audio signals based on the determined at least one of sound direction or diffuseness or distance; and rendering the at least one modified audio signal.
According to a sixth aspect there is provided a non-transitory computer readable medium comprising program instructions for causing an apparatus, for generating audio signals for a device equipped with a transparency mode, to perform at least the following: obtaining at least two external audio signals from at least two microphones located on the device; determining at least one of sound direction or diffuseness or distance between a first microphone and a second microphone of the at least two microphones based on the at least two external audio signals; modifying at least one of the at least two external audio signals based on the determined at least one of sound direction or diffuseness or distance; and rendering the at least one modified audio signal.
According to a seventh aspect there is provided an apparatus, for generating audio signals for a device equipped with a transparency mode, comprising: means for obtaining at least two external audio signals from at least two microphones located on the device; means for determining at least one of sound direction or diffuseness or distance between a first microphone and a second microphone of the at least two microphones based on the at least two external audio signals; means for modifying at least one of the at least two external audio signals based on the determined at least one of sound direction or diffuseness or distance; and means for rendering the at least one modified audio signal.
According to an eighth aspect there is provided a computer readable medium comprising instructions for causing an apparatus, for generating audio signals for a device equipped with a transparency mode, to perform at least the following: obtaining at least two external audio signals from at least two microphones located on the device; determining at least one of sound direction or diffuseness or distance between a first microphone and a second microphone of the at least two microphones based on the at least two external audio signals; modifying at least one of the at least two external audio signals based on the determined at least one of sound direction or diffuseness or distance; and rendering the at least one modified audio signal.
An apparatus comprising means for performing the actions of the method as described above.
An apparatus configured to perform the actions of the method as described above.
A computer program comprising instructions for causing a computer to perform the method as described above.
A computer program product stored on a medium may cause an apparatus to perform the method as described herein.
An electronic device may comprise apparatus as described herein.
A chipset may comprise apparatus as described herein.
Embodiments of the present application aim to address problems associated with the state of the art.
For a better understanding of the present application, reference will now be made by way of example to the accompanying drawings in which:
As discussed previously, current transparency or pass-through modes for suitable devices equipped with ANC functionality produce transparency audio signals which differ from the ideal or expected audio signals. This difference is because of the relative position differences between the external or outside microphones and the internal transducers or speakers, and further the filtering aspects of the headphones themselves. Additionally, the quality difference between ideal and generated transparency audio signals increases where the transparency method considers audio signals from only one of the two microphones.
In addition, when listening to loud music volumes, the natural representation of the surrounding audio signals in a transparency mode may not be enough to enable the user to accurately detect all of the sound sources around them.
In order to render spatial audio, there are several characteristics that need to be implemented correctly. In particular, the characteristics that should be implemented correctly to produce good quality audio signals are direction and diffuseness. Direction is important for safety reasons, to be able to hear, for example, obstacles or dangers such as the directions of cars, and diffuseness also provides the user valuable clues to sound object distances and the intelligibility of speech.
The upcoming Immersive Voice and Audio Services (IVAS) standard and immersive voice applications are configured to provide immersive audio communications. This type of communication system is more immersive, meaning that the mixing of far-end ambience sounds and the local ambience environment can result in a confusing output audio signal. Increasing the clarity of the local spatial audio environment is therefore a current research topic being investigated.
In the following examples the device shown is a headphone device; however, it will be appreciated that the same methods and apparatus for implementing embodiments can be applied to other devices.
A first example of the difference between real and ideal microphone positions and their effect with respect to transparency modes can be shown with respect to
A further example of the difference between real and ideal microphone positions and their effect with respect to transparency modes can be shown with respect to
Furthermore
This effect is further demonstrated by the graphs in
In other words, since the microphones are not located where the ear canal would be, the head has little effect on the microphone signals and the signal as shown in the graph in
As indicated above, a similar issue would occur for other devices. For example, in a vehicle situation the microphones can be mounted outside the vehicle and the speakers located in conventional positions (in doors, dashboard etc.), or in a soundbar configuration, or located within a seat of the user (such as the driver). Similarly, the motorcycle helmet or head mounted PPE implementation can be one in which the microphones are mounted on the exterior but the speakers are mounted inside the helmet/PPE.
In some embodiments there is provided a headphone (or suitable apparatus and methods) that has a transparency mode where microphone signals are compared to estimate sound direction and/or diffuseness and/or distance between left and right earcups/earspeakers (or more generally microphone and speaker positions), and a transparency signal is modified so that the direction and diffuseness are more correctly perceived by the user. In some embodiments a (headphone) transparency signal is modified more in the low frequencies.
In some embodiments, there is also provided suitable apparatus (for example headphones, vehicles or apparel) and methods that have a transparency mode where microphone signals from both microphone positions (for example the earcups in headphones) are compared to estimate sound direction and/or diffuseness, and the transparency signal is modified so that the direction and diffuseness are more correctly perceived by the user. In some embodiments a (headphone) transparency signal is modified more in the low frequencies.
In some embodiments, there is also provided suitable apparatus and methods that have a transparency mode where microphone signals are compared to estimate sound direction and/or diffuseness, and the transparency signal is modified so that the direction of sound sources is unnaturally clear, in particular when the internal signal (music) from a connected device is loud. In some embodiments a transparency signal is modified more in the low frequencies.
Furthermore, in some embodiments, there is also provided suitable apparatus and methods that have a transparency mode where microphone signals from both microphone positions (such as earcups on a headphone) are compared to estimate sound direction and/or diffuseness, and the transparency signal is modified so that the direction of sound sources is unnaturally clear, in particular when the internal signal (music) from a connected device is loud. In some embodiments both (earcup) microphones are compared to achieve a perceptually better estimate. In some embodiments a (headphone) transparency signal is modified more in the low frequencies.
Additionally, in some embodiments there is also provided a suitable apparatus and methods (such as headphones) that are configured to reduce the phase difference and diffuseness of at least two microphone signals and create a transparency signal from the modified microphone signals using information about the distance(s) from the outer microphones to the inner speakers (for example the headphone thickness). The amount by which the spatial parameters are modified (made smaller) is based on the distance(s).
In the following examples the concept as discussed herein can be implemented in any suitable headphones or headset. In some embodiments the headphones may be in-ear or over-the-ear type. The headphones may have a head band or not (for example may be earbuds or earphones which at least partially are located within or against or adjacent the ear canal). In embodiments where both cup microphones are used to create a transparency signal, the microphone signals are transmitted to both earcups. In headphones with a head band this can be implemented using cables, in headphones without a headband then a suitable wireless transmission is employed, such as Bluetooth.
With respect to
Similarly, the right earcup 633 in this example comprises at least one right outer (external) microphone 605 that records sounds from outside the headphones and at least one right speaker 607 that is configured to play or output the transparency signal and sounds from the suitably connected device. The right earcup 633 in some embodiments can further comprise at least one inner microphone 609 that is configured to record sounds from inside the headphones, between the right speaker 607 and the user's eardrum.
In some embodiments there is provided a device (for example headphone) that has a transparency mode where microphone signals are compared to estimate sound direction and/or diffuseness and the generated transparency signal is modified so that the direction and diffuseness are more correctly perceived by the user.
In some embodiments the audio signals from the microphones in the left ear cup (or more generally ‘left’ side microphone(s)) can be used to create a left transparency signal and the microphones in the right ear cup (or more generally ‘right’ side microphone(s)) can be used to create a right transparency signal. In this way there does not need to be any signal transmission between the two ‘sides’ of microphones or earcups. In these embodiments the total minimum number of microphones is four, two on each side (or in each earcup).
In such embodiments the device (headphones) uses at least two microphone signals to analyse sound directions using, e.g., methods in U.S. Pat. No. 9,456,289 and/or diffuseness as in GB1619573.7. Diffuseness can be estimated as D/A (Direct-to-Ambient) ratios. These parameters can in some embodiments be analysed in frequency bands. In some embodiments there can be 20-50 frequency bands, but the embodiments as discussed herein can be applied to implementations with one or more frequency bands. In some embodiments a smaller number of frequency bands can help reduce the time it takes to analyse the parameters.
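A band-wise diffuseness estimate along these lines can be sketched as follows. This is an illustrative stand-in, not the method of the cited patents: the FFT parameters, the band grouping, and the use of magnitude-squared coherence between the two microphone signals as a proxy for the D/A ratio are all assumptions.

```python
import numpy as np

def band_da_ratios(left, right, n_fft=1024, n_bands=24):
    """Return a rough D/A-ratio-like value in [0, 1] per frequency band.

    High inter-microphone coherence in a band is taken as 'direct'
    sound, low coherence as 'ambient' (diffuse) sound.
    """
    hop = n_fft // 2
    win = np.hanning(n_fft)
    frames = range(0, len(left) - n_fft + 1, hop)
    # Short-time spectra of both microphone signals
    L = np.array([np.fft.rfft(win * left[i:i + n_fft]) for i in frames])
    R = np.array([np.fft.rfft(win * right[i:i + n_fft]) for i in frames])

    # Magnitude-squared coherence per FFT bin, averaged over time frames
    cross = np.abs(np.mean(L * np.conj(R), axis=0)) ** 2
    auto = np.mean(np.abs(L) ** 2, axis=0) * np.mean(np.abs(R) ** 2, axis=0)
    coherence = cross / np.maximum(auto, 1e-12)

    # Group FFT bins into n_bands bands and average within each band
    edges = np.linspace(0, L.shape[1], n_bands + 1, dtype=int)
    return np.array([coherence[a:b].mean() for a, b in zip(edges[:-1], edges[1:])])
```

With a fully correlated pair (the same signal at both microphones) the estimate approaches one in every band, while two independent noise signals yield values near zero, matching the D/A-ratio convention described above.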
In some embodiments the left/right direction is estimated and the front/back ambiguity is left unsolved. This allows correcting the level and/or phase of the pass-through signal for left/right separation while ignoring the front/back separation. The front/back direction separation can be used to include the effect of shadowing (for example shadowing by the earlobes, which can be the largest single contributor with respect to the head, although the effect is not as strong as the shadowing of the human head in the case of left/right separation). Additionally, solving only the left/right direction can be implemented with fewer microphones. For example, a minimum of two microphones can be used to determine the left/right separation.
A right outer microphone audio signal can be equalized in frequency bands and fed to the same (earcup) loudspeaker to create the transparency signal. A similar approach can be applied to the left where the left outer microphone audio signal can be equalized in frequency bands and fed to the same left (earcup) loudspeaker or ‘left’ virtual loudspeaker in a soundbar to create the transparency signal for the left channel.
The equalization processing for the outer microphone audio signals is implemented because different frequencies leak differently acoustically through the headphones, vehicle, helmet or head mounted device to the user's ear. With equalization the transparency signal compensates for the parts of the audio signal that the device passively attenuates from the leaked signal. The passive attenuation for headphones is higher at higher frequencies and thus the transparency signal level is higher at higher frequencies. For the same reason, the modifications implemented in some embodiments are applied more (and at higher levels of modification) within the lower frequencies, where the leaked sound forms a large part of the signal heard by the user.
In some embodiments the equalization is controlled based on the detected directions. The equalization is applied such that the difference in level in frequency bands between the left and right earcup transparency signal corresponds to a binaural signal from the detected direction. The level differences for the binaural signals can be obtained or determined from a stored database or similarly stored form. The database may be a general one or personalized for the current user.
In a more complex embodiment the diffuseness of the audio signal is taken into account. The level difference between the left and right microphone audio signals (or in headphones, the earcups) is modified to be the product of the D/A ratio and the level difference for a dry sound coming from the detected direction. The dry sound level difference is known as the ILD (Inter-aural Level Difference) and values for different directions can be found from known databases. Alternatively, in some embodiments a user's own measured or estimated level difference can be used. The determination of a user's own measured or estimated level difference can be implemented according to any known method.
In some embodiments the device (for example headphones or headset/earbuds, vehicle etc.) based audio signals can additionally be modified with device specific values. If the sound environment is estimated to be fully diffuse (typically the D/A ratio is zero or close to it), then the product, i.e. the final level difference, should be zero, and if the sound environment is fully directional (D/A ratio is one or close to it), then the product, in other words the final level difference, should be the same as the dry level difference.
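The product described above can be sketched as follows. The ILD table values are hypothetical placeholders; a real implementation would use a measured HRTF database or per-user measurements, as the text notes.

```python
import numpy as np

# Hypothetical ILD values (dB) for a few azimuths (stand-in for an
# HRTF database or per-user measurements).
ILD_TABLE_DB = {-90: -12.0, -45: -7.0, 0: 0.0, 45: 7.0, 90: 12.0}

def final_level_difference_db(azimuth_deg, da_ratio):
    """Dry ILD for the detected direction, scaled by the D/A ratio.

    A fully diffuse field (da_ratio = 0) gives zero level difference;
    a fully directional field (da_ratio = 1) gives the full dry ILD.
    """
    azimuths = sorted(ILD_TABLE_DB)
    ilds = [ILD_TABLE_DB[a] for a in azimuths]
    dry_ild = np.interp(azimuth_deg, azimuths, ilds)  # interpolate direction
    return float(dry_ild * da_ratio)                  # product with D/A ratio
```

The two endpoint behaviours in the paragraph above fall out directly: a D/A ratio of zero always returns a zero level difference, and a D/A ratio of one returns the dry ILD unchanged.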
The method is further shown in the operations shown with respect to
With respect to step 701 there is shown the operation of receiving microphone signals from one side (L or R), which in headphones can be one ear cup.
Then with respect to step 703 there is shown the operation of dividing signals into time-frequency tiles.
As shown in step 705 is the operation of estimating sound direction in at least one tile.
Additionally, shown in step 707 is the operation of estimating a D/A ratio in at least one tile. This estimation operation is an optional step.
Furthermore, shown in step 709 is the operation of searching/calculating (or otherwise obtaining or determining) a level and/or phase difference for at least one tile direction from a database (or suitable storage means).
Then as shown in step 711 is the operation of modifying the transparency signal based on the obtained or found level (and phase) difference, and optionally using the D/A ratio.
Step 713 furthermore shows converting the modified signal back to a time domain representation.
Finally step 715 shows the operation of using a modified time domain signal as the transparency signal for the same side (which in headphones can be the same side ear cup) after additional known modifications such as equalization are implemented.
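Steps 701 to 715 can be sketched for one side as follows. This is a minimal illustration using an STFT for the time-frequency tiles and a placeholder mapping from the inter-microphone level cue to a correction gain; the actual direction estimation, database lookup, and equalization of step 715 are not reproduced.

```python
import numpy as np

def one_side_transparency(mic_a, mic_b, n_fft=1024):
    """Sketch of steps 701-715 for one side with two microphones."""
    hop = n_fft // 2
    win = np.hanning(n_fft)
    out = np.zeros(len(mic_a))
    norm = np.zeros(len(mic_a))
    for i in range(0, len(mic_a) - n_fft + 1, hop):      # 701: mic signals
        A = np.fft.rfft(win * mic_a[i:i + n_fft])        # 703: t-f tiles
        B = np.fft.rfft(win * mic_b[i:i + n_fft])
        # 705: crude per-tile direction cue from the mic level difference (dB)
        cue_db = 10 * np.log10(
            (np.abs(A) ** 2).mean() / ((np.abs(B) ** 2).mean() + 1e-12) + 1e-12)
        # 709/711: placeholder mapping from the cue to a correction gain
        gain = 10 ** (np.clip(cue_db, -12.0, 12.0) / 40.0)
        tile = A * gain
        # 713: back to the time domain by weighted overlap-add
        out[i:i + n_fft] += win * np.fft.irfft(tile, n_fft)
        norm[i:i + n_fft] += win ** 2
    # 715: the result is used as the same-side transparency signal
    return out / np.maximum(norm, 1e-6)
```

When the two microphone signals are identical the cue is zero, the gain is one, and the pipeline reconstructs the input (away from the un-normalized edges), which is a useful sanity check for the analysis/synthesis chain.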
In some embodiments the diffuseness of the transparency signal can also be modified based on measured diffuseness, by decorrelating or correlating the transparency signal so that its diffuseness matches the measured diffuseness from the outer microphones. Decorrelating a signal can be implemented using known decorrelators, and correlating a signal can be implemented, for example, by mixing the stereo transparency signal with its mono downmix.
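Correlating a stereo transparency signal by mixing it with its mono downmix, as mentioned above, can be sketched as below; the linear mixing law and the 0-to-1 control are illustrative choices.

```python
import numpy as np

def correlate_towards_mono(left, right, amount):
    """Mix a stereo pair toward its mono downmix; amount in [0, 1].

    amount = 0 leaves the pair unchanged; amount = 1 collapses both
    channels to the mono downmix, i.e. maximum correlation (and hence
    minimum diffuseness of the pair).
    """
    mono = 0.5 * (left + right)
    return ((1.0 - amount) * left + amount * mono,
            (1.0 - amount) * right + amount * mono)
```

Intermediate values of the control raise the inter-channel correlation smoothly, which is the direction of adjustment needed when the measured diffuseness from the outer microphones is lower than that of the transparency signal.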
Additionally, in some embodiments the transparency (i.e. pass-through) signal should be presented to the user with very little delay. Direction and D/A ratio analysis may take some time (e.g. 20 ms or more) depending on the method employed. Typically, important directions and D/A ratios, such as those of emergency vehicle sounds, do not change very quickly, whereas less important directions such as ambient noise directions can vary rapidly. Therefore, the system may use directions and D/A ratios from earlier audio samples to adjust current samples in the transparency signal without causing significant problems. Some direction detection methods are very fast, though. For example, the level difference of the microphone signals can be directly mapped to a desired level difference in the transparency signal using measured data from the headphones, where the measured data is stored in a table. Additionally or alternatively, higher sampling rates such as 192 kHz can be used for analysis to reduce the delay.
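The table-based direct mapping mentioned above can be sketched as follows; the table values are hypothetical placeholders standing in for data measured from the headphones.

```python
import numpy as np

# Hypothetical measured data: outer-microphone level difference (dB)
# versus the desired level difference in the transparency signal (dB).
MIC_DIFF_DB     = [-20.0, -10.0, 0.0, 10.0, 20.0]
DESIRED_DIFF_DB = [-14.0,  -8.0, 0.0,  8.0, 14.0]

def desired_level_difference_db(mic_diff_db):
    """Low-latency lookup: linear interpolation in the measured table."""
    return float(np.interp(mic_diff_db, MIC_DIFF_DB, DESIRED_DIFF_DB))
```

Because this is a direct table lookup rather than a full direction analysis, it adds essentially no algorithmic delay, which is the point the paragraph above makes.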
In addition or alternatively to level changes, the phase of the transparency signal can also be changed to match values in a database. Phase changes can be implemented using known methods. Typically, phase changes are more important at lower frequencies (<1.5 kHz) and level change is more important at higher frequencies (>1.5 kHz).
Furthermore, in some embodiments the changes are implemented to the transparency signal so that the changes minimally modify the original transparency signal. In these embodiments there is a focus on the difference of the level and phase of the left and right transparency signals. Therefore, both signals are modified minimally so that the difference is the same as in the database.
A more detailed example implementation is presented hereafter.
In this example two microphones are located on or close to a left/right axis on the device. This example can therefore solve the problem as shown in
The time-domain microphone signals are denoted x_m(k), m = 1, 2.
Microphone 1 in this example is used for creating left ear transparency audio signals and microphone 2 for right ear transparency audio signals.
In some embodiments the first operation is to filter the microphone audio signals with a filterbank to enable processing in frequency bands, for example using:
Other filterbanks or time-to-frequency domain transforms may be employed (though as discussed a very low delay method such as the one discussed above is preferred). A low delay approach is employed because a headphone (or more generally the device) user can typically hear the sound sources through the speakers or headphones as well as the transparency signal and any significant delay between the two can create a situation where the user hears the sound sources twice.
The filtered signal with B bands is x_m^b(k), b = 1, ..., B.
In low frequency bands the direction is analysed from the time difference between the signals. The time difference is converted into an effective distance, and the effective distance can be converted into a direction in each low frequency band.
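The conversion from time difference through effective distance to direction can be sketched as below, assuming a far-field source, free-field propagation and a speed of sound of 343 m/s (the geometry and constants are illustrative assumptions).

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def time_difference_to_direction(tau_s, mic_distance_m):
    """Convert an inter-microphone delay to an azimuth in degrees.

    Positive tau means the sound reaches microphone 1 first. The result
    spans -90 to +90 degrees and is ambiguous between front and back,
    as only two microphones are used.
    """
    effective_distance = SPEED_OF_SOUND * tau_s         # effective distance
    ratio = np.clip(effective_distance / mic_distance_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))          # direction
```

A zero delay maps to a source straight ahead (or behind), and a delay equal to the inter-microphone travel time maps to a source fully to one side, consistent with the front/back ambiguity discussed next.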
The direction is, however, ambiguous because there are only two microphones. In some embodiments where there are three or more microphones, the ambiguity can be solved as explained in U.S. Pat. No. 9,456,289.
At higher frequencies the time difference of sound reaching the two microphones may be large compared to the wavelength of sound, that it is better to use level based direction analysis. Energy of microphone signals in frequency bands can be calculated as:
The level difference between the microphones in band b is then E_1^b − E_2^b.
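A minimal sketch of this level-based analysis, assuming band-limited signals from a filterbank; expressing the per-band difference in dB is an illustrative choice:

```python
import numpy as np

def band_level_difference_db(bands1, bands2, eps=1e-12):
    """Per-band level difference E_1^b - E_2^b (here expressed in dB).

    bands1/bands2 are lists of band-limited signals from the filterbank.
    A positive value in band b means the sound is louder at microphone 1,
    i.e. the source is more towards that microphone's side.
    """
    diffs = []
    for b1, b2 in zip(bands1, bands2):
        e1 = np.sum(np.square(b1)) + eps   # band energy, mic 1
        e2 = np.sum(np.square(b2)) + eps   # band energy, mic 2
        diffs.append(10.0 * np.log10(e1 / e2))
    return diffs

# Twice the amplitude means four times the energy, i.e. about +6 dB.
level_diff = band_level_difference_db([2.0 * np.ones(100)], [np.ones(100)])[0]
```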
Once the direction for each band is determined, the audio signals in the bands are modified to better reflect human hearing without the headphones. The direction is used to search an HRTF (Head Related Transfer Function) database for the left and right ear level and phase differences that correspond to that direction in each frequency band.
HRTFs are typically provided as frequency response bins at frequency indexes for both sides (left and right) for different directions. The HRTF average energy in band b (the same bands are used here as in the filterbank previously) for the left ear l and direction α is:
The level difference between the ears for direction α in band b is then E_{l,α}^b − E_{r,α}^b.
In these embodiments the pass-through signal is configured to have the same level difference in each band. For example, if the direction in a band was estimated to be α, then the following applies:
X_m^b
For example, if microphone 1 is used for creating a pass-through signal for the left side or channel (or ear) and microphone 2 for the right side or channel (or ear), then the level difference of the left and right sides is as previously discussed and the phase difference is that found in the HRTF database:
In some embodiments, diffuseness is taken into account as well. Diffuseness can be measured using known methods, typically the correlation between the microphone signals, and is often expressed as a D/A (direct-to-ambient) ratio. If the audio scene around the user is very diffuse, then it can be difficult to hear any clear audio directions and the modifications described above can be reduced or ignored altogether. For example, the diffuseness may be estimated using known methods and be available as a D/A ratio for each band, such that a D/A ratio of zero means the audio is very diffuse, a D/A ratio of 1 means the audio is very direct (the opposite of diffuse), and the D/A ratio may take any value between 0 and 1 corresponding to any level of diffuseness.
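One known estimator of this kind can be sketched as below, under the assumption that the magnitude of the normalised zero-lag cross-correlation of the two microphone signals serves as the D/A proxy (the text does not mandate a specific estimator):

```python
import numpy as np

def direct_to_ambient_ratio(x1, x2):
    """D/A ratio proxy from the correlation of two microphone signals.

    Uses the magnitude of the normalised zero-lag cross-correlation:
    close to 1 for a single coherent (direct) source, close to 0 for a
    fully diffuse (incoherent) field.
    """
    num = abs(float(np.dot(x1, x2)))
    den = float(np.sqrt(np.dot(x1, x1) * np.dot(x2, x2))) + 1e-12
    return float(np.clip(num / den, 0.0, 1.0))

rng = np.random.default_rng(0)
source = rng.standard_normal(48000)
da_direct = direct_to_ambient_ratio(source, source)      # coherent: near 1
da_diffuse = direct_to_ambient_ratio(rng.standard_normal(48000),
                                     rng.standard_normal(48000))  # near 0
```

In a per-band implementation the estimator would be applied to each band-limited signal pair, yielding one D/A value per band.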
In some embodiments the earlier equation can be modified to the following:
In some embodiments where the microphones are very close to each other, diffuseness estimation methods may underestimate the amount of diffuseness that user would hear without headphones. Diffuseness may then be added to the signals by decorrelating the signals using known methods.
In some embodiments, at lower frequencies (<1.5 kHz), and increasingly so towards still lower frequencies, a larger modification is applied, because at low frequencies the transparency signal forms only a small part of the audio the user hears: at low frequencies environmental sounds leak through the headphones. The leakage depends on the headphones; typically over-the-ear headphones have less leakage than in-ear headphones, but this also depends on how tightly the headphones are fitted. Typically, the modification factor could be 1.5-2.0 for both the gain and phase at low frequencies.
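A sketch of such a frequency-dependent modification factor; the linear ramp and the default factor of 1.75 (within the 1.5-2.0 range mentioned above) are illustrative assumptions:

```python
def modification_factor(band_centre_hz, low_factor=1.75, cutoff_hz=1500.0):
    """Frequency-dependent weight for the level/phase modification.

    Above the cutoff the factor is 1 (no extra emphasis); below it the
    factor ramps linearly up towards `low_factor` at 0 Hz, compensating
    for environmental sound leaking acoustically through the headphones.
    """
    if band_centre_hz >= cutoff_hz:
        return 1.0
    return low_factor - (low_factor - 1.0) * band_centre_hz / cutoff_hz
```

In practice the ramp shape and the low-frequency factor would be tuned per headphone model, since the acoustic leakage differs between devices and fits.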
In some embodiments the modified signals can then be input to an inverse filterbank, which can be as simple as summing all the signals or more complex when time-frequency domain transforms are applied. The result is used for creating a pass-through signal.
In some embodiments further processing can be applied before or after these operations, such as analogue-to-digital (A/D) conversion, digital-to-analogue (D/A) conversion, compression, equalization (EQ), etc.
In some situations with many sampling rates (<48 kHz in particular) using full sample delay for phase may not be accurate enough. Therefore, a fractional delay may be applied.
Although no smoothing operations are described in the above examples, for simplicity and clarity, a typical implementation can implement smoothing.
In some embodiments there can be a headphone that has a transparency mode where microphone signals from both earcups are compared to estimate sound direction and/or diffuseness, and the transparency signal is modified so that the direction and diffuseness are more correctly perceived by the user.
Implementation-wise this is similar to the embodiment implementations described above. The difference is that when microphones from both earcups are used, there are microphones whose pairwise distance is more similar to the distance between the user's ears, and therefore the estimates of the direction and/or diffuseness are more similar to the directions and/or diffuseness that a human (not wearing headphones) would perceive. For both left and right ear transparency audio signals, microphones from both the left and right earcups are used. In these embodiments the minimum total number of microphones is two, one in each earcup.
In these embodiments the best performance is experienced when both earcups of the headphone are connected with a wired connection so that there is insignificant delay when processing microphone signals from both earcups.
The method is further shown in the operations shown with respect to
With respect to step 801 there is shown the operation of receiving microphone signals from both sides (L and R), which for headphones can be both ear cups (L and R).
Then with respect to step 803 there is shown the operation of dividing signals into time-frequency tiles.
As shown in step 805 is the operation of estimating sound direction in at least one tile.
Additionally is shown in step 807 the operation of estimating D/A ratio in at least one tile. This estimation operation is an optional step.
Furthermore is shown in step 809 the operation of searching/calculating (or otherwise obtaining or determining) a level and/or phase difference from a database for at least one tile direction (or suitable storage means).
Then as shown in step 811 is the operation of modifying the transparency signal (L and R) based on the obtained or found level (and phase) difference, and optionally using the D/A ratio.
Step 813 furthermore shows converting the modified signal back to a time domain representation.
Finally step 815 shows the operation of using a modified time domain signal as the transparency signal for both sides or ear cups after additional known modifications such as equalization are implemented.
The processing equations are the same as discussed above but the microphone locations are different.
In some embodiments the device (headphone) is configured with a transparency mode where microphone signals are compared to estimate sound direction and/or diffuseness, and the transparency audio signals are modified so that the direction of sound sources is unnaturally clear, in particular when the internal signal (music) from the connected device is loud.
Hearing surrounding environmental sounds can be vital, for example in traffic. The ability to determine or hear audio signals from the correct directions (for example in big cities with lots of echoes from buildings) can be difficult even without headphones, so any degradation in this ability because of the headphones can be problematic. Users can be distracted by, for example, loud music played from (within) the headphones or vehicle or helmet: the louder the music, the more difficult it is to hear dangerous sound sources in the real world around the user.
Although many ways have been proposed to give the user 'super' hearing for sound directions, few are suitable for use with transparency mode because of the ultra-low latency requirements. A low-latency approach proposed in some embodiments is presented below:
In some embodiments a low-latency filterbank is used to divide the microphone signals, typically one from each side or earcup, into frequency bands. The louder of the microphones is chosen for each band. The louder signal is used as the transparency signal (with the other known modifications employed) in both ears, but a level difference is introduced to the left and right ear transparency signals in each band. The level difference is the same as in the original microphone signals, or the level difference is based on a detected direction similarly to the earlier embodiments described above. This modification can in some situations cause artefacts in the transparency audio signal and therefore the modified signal is typically mixed with a normal or conventional (non-modified) transparency signal. The mixing depends on the loudness of the music from the user device: the louder the music, the more the modified signal is used in the mixture. The modification achieves a reduction in diffuseness and in this way the directional hearing of the user is improved.
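The per-band louder-microphone selection and music-level-dependent mixing described above can be sketched as follows (assuming band-limited NumPy signals; the helper name and the energy-based rescaling are illustrative):

```python
import numpy as np

def sharpen_direction(left_bands, right_bands, music_level):
    """Per band: route the louder microphone signal to both ears, rescale
    so each ear keeps its original band level, then mix with the
    unmodified signal according to the music level (0 = quiet, 1 = loud).
    """
    out_l, out_r = [], []
    for bl, br in zip(left_bands, right_bands):
        el = float(np.sum(np.square(bl))) + 1e-12
        er = float(np.sum(np.square(br))) + 1e-12
        louder, e_max = (bl, el) if el >= er else (br, er)
        mod_l = louder * np.sqrt(el / e_max)  # level difference re-imposed
        mod_r = louder * np.sqrt(er / e_max)
        # Louder music -> more of the modified (sharper) signal is used.
        out_l.append(music_level * mod_l + (1.0 - music_level) * bl)
        out_r.append(music_level * mod_r + (1.0 - music_level) * br)
    return out_l, out_r

left = [2.0 * np.ones(10)]    # left mic band, louder
right = [np.ones(10)]         # right mic band
mod_left, mod_right = sharpen_direction(left, right, music_level=1.0)
```

With music_level = 0 the signals pass through unchanged; with music_level = 1 both ears receive the louder microphone's band signal, with the original per-ear levels re-imposed.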
In some embodiments this can be implemented based on the above equation.
In these embodiments the equation replaces the D/A ratio with a ML value, where ML stands for Music Level and can be set as a value of 1 for high music level and a value of zero for low music level.
In some embodiments the modifications described herein may need to be employed more strongly at the lower frequencies, where the leaked sound forms a large part of the audio the user hears.
In very diffuse conditions sounds may appear to be coming from a wrong direction because of a strong reflection. In such circumstances the application of the modification can produce a poorer performance. The modification can thus in some embodiments be limited so that in diffuse situations (diffuseness is measured with D/A ratio), the amount of the modified signal is limited in the mixture.
In some embodiments the device or headphone has a transparency mode where microphone signals from both sides or earcups are compared to estimate sound direction and/or diffuseness, and the transparency signal is modified so that the direction of sound sources is unnaturally clear, in particular when the internal signal (music) from the device connected to the device or headphones is loud.
In such a manner the implementation can be similar to the 'both' sides or earcup modification discussed above, but when microphones from both sides or earcups are used, there are microphones whose pairwise distance is more similar to the distance between the user's ears and therefore the estimation of the direction and/or diffuseness is more similar to the perception of the user without headphones.
In a similar manner this embodiment is better when both sides (or earcups of the headphone) are connected with a wired connection so that there is insignificant delay for using microphone signals from both sides or earcups. In some embodiments a wireless connection between the device sides, for example the in-ear headphones, can be implemented, but the performance due to the additional delay can be poorer.
As described above, a ‘thickness’ refers to the distance of the microphones from the speaker elements on a line parallel to the axis defined by the user's ears. The ‘thickness’ thus in the current application does not refer to the thickness of the headphones or, more generally, the device as a whole.
In the following embodiments there are benefits even when the microphones are not perfectly on a line that is parallel to the axis defined by the ears of the user, but the result of the application in these situations is less optimal.
Thus, for example thickness is shown with respect to
Additionally, is shown in
With respect to
In some embodiments the left and right thickness values can differ as shown with respect to
As shown in
In its simplest form these further embodiments additionally control the equalization based on detected directions. The equalization is implemented such that the difference in level in frequency bands between the left and right earcup transparency signals corresponds to a binaural signal from the detected direction. The level differences for the binaural signals are found from a stored database or suitable storage. The database may be a general one or personalized for the current user.
The method is further shown in the operations shown with respect to
With respect to step 1401 there is shown the operation of receiving microphone signals from both ear cups (L and R).
Then with respect to step 1403 there is shown the operation of dividing signals into time-frequency tiles.
As shown in step 1405 is the operation of estimating sound direction in at least one tile.
Additionally is shown in step 1407 the operation of estimating D/A ratio in at least one tile. This estimation operation is an optional step.
Furthermore is shown in step 1409 the operation of searching/calculating (or otherwise obtaining or determining) a level and/or phase difference from a database for at least one tile direction (or suitable storage means).
Then as shown in step 1411 is the operation of modifying the transparency signal in one ear such that the level difference in at least one tile becomes the same as in the database.
In some embodiments the diffuseness of the transparency signal can also be modified based on the measured diffuseness, by decorrelating or correlating the transparency signal so that its diffuseness matches the diffuseness measured from the outer microphones. Decorrelating a signal can be implemented in any suitable manner, for example employing decorrelators, and correlating a signal can be done, for example, by mixing the stereo transparency signal with its mono downmix.
The level difference between the left and right ears is larger the closer to the ear canal it is measured. Conversely, the phase difference is larger the further from the ear canal it is measured. Thus, the modification of the transparency signal in some embodiments increases the level difference for thicker headphones and decreases the phase difference for thicker headphones. The amount of increase and decrease may be frequency dependent and found from a database that has been measured (or modelled) and stored in the headphone memory.
One possible implementation uses Mid/Side coding. For example, when the microphones and loudspeakers in the headphones are as in
Mid/side-representation is converted back into left loudspeaker 907 audio signal (L) and right loudspeaker 917 audio signal (R).
The distance between the centre line 1501 and the left or right microphone 905 or 915 is the distance a 1503, and the distance between the centre line 1501 and the left or right speaker 907 or 917 is the distance b 1505.
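A possible Mid/Side rendering along these lines is sketched below, where the side component is scaled by b/a so that signals captured at microphone offset a are rendered at speaker offset b; this particular scaling rule is an illustrative assumption consistent with the geometry, not a formula quoted from the text:

```python
import numpy as np

def midside_render(mic_l, mic_r, a, b):
    """Render outer-microphone signals to the loudspeakers via Mid/Side.

    The side component is scaled by b / a (speaker offset over microphone
    offset from the centre line) to narrow the stereo image, since the
    microphones sit further from the centre line than the speakers.
    """
    mid = 0.5 * (mic_l + mic_r)
    side = 0.5 * (mic_l - mic_r)
    side = side * (b / a)              # narrow the image towards centre
    return mid + side, mid - side      # left, right loudspeaker signals

mic_l = np.array([1.0, 0.0])
mic_r = np.array([0.0, 1.0])
spk_l, spk_r = midside_render(mic_l, mic_r, a=0.02, b=0.01)
```

When b equals a the transform is the identity; as b/a shrinks, the output signals become more coherent, consistent with the coherence adjustment discussed below.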
Additionally, some equalisation for the loudspeaker signals is needed because headphones acoustically let sound pass through them in differing amounts at different frequencies.
The use of the Mid/Side representation is effective in the sense that it both improves the perceived directions of sounds to be closer to reality and modifies the coherence of the sound to correspond better to what the user would hear if the microphones were not so far away from the speakers (the coherence needs to be increased more the farther the microphones are from the user's ears). The Mid/Side representation is also very simple to compute and thus does not consume too much processing power or battery of the headphones. In some embodiments other implementations are possible but would be more processor intensive.
In some implementations the microphones may be asymmetrically placed as in
In such cases different computation is needed according to the following formulas:
Thus where the microphones are outside the vehicle and the speakers are either in their normal positions (in doors, dashboard etc) or the speakers are in the driver's seat, the rendering of the microphone signals can differ. With the conventional position speaker configuration the microphone signals could be rendered to the speakers almost “as is” but for the speakers in the seat a stereo image should be narrowed significantly. The Mid Side examples discussed above can be applied with suitable microphone selection. For the normal speaker placement, the microphones that are widely dispersed on the vehicle outside surface can be selected but for the seat speakers example the selection can be for microphones that are closer to each other.
With respect to
The pass-through signal ideally is such that when it acoustically combines with the leaked sound (leaked sound has the properties of the lines with x), the combination has the properties of the lines with *.
With respect to
In some embodiments the device 2000 comprises at least one processor or central processing unit 2007. The processor 2007 can be configured to execute various program codes such as the methods such as described herein.
In some embodiments the device 2000 comprises a memory 2011. In some embodiments the at least one processor 2007 is coupled to the memory 2011. The memory 2011 can be any suitable storage means. In some embodiments the memory 2011 comprises a program code section for storing program codes implementable upon the processor 2007. Furthermore in some embodiments the memory 2011 can further comprise a stored data section for storing data, for example data that has been processed or to be processed in accordance with the embodiments as described herein. The implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 2007 whenever needed via the memory-processor coupling.
In some embodiments the device 2000 comprises a user interface 2005. The user interface 2005 can be coupled in some embodiments to the processor 2007. In some embodiments the processor 2007 can control the operation of the user interface 2005 and receive inputs from the user interface 2005. In some embodiments the user interface 2005 can enable a user to input commands to the device 2000, for example via a keypad. In some embodiments the user interface 2005 can enable the user to obtain information from the device 2000. For example the user interface 2005 may comprise a display configured to display information from the device 2000 to the user. The user interface 2005 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 2000 and further displaying information to the user of the device 2000. In some embodiments the user interface 2005 may be the user interface for communicating.
In some embodiments the device 2000 comprises an input/output port 2009. The input/output port 2009 in some embodiments comprises a transceiver. The transceiver in such embodiments can be coupled to the processor 2007 and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network. The transceiver or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
The transceiver can communicate with further apparatus by any suitable known communications protocol. For example in some embodiments the transceiver can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or an infrared data communication pathway (IrDA).
The input/output port 2009 may be configured to receive the signals.
In some embodiments the device 2000 may be employed as at least part of the renderer. The input/output port 2009 may be coupled to headphones (which may be a headtracked or a non-tracked headphones) or similar.
In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.
The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general-purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or “fab” for fabrication.
As used in this application, the term “circuitry” may refer to one or more or all of the following:
This definition of circuitry applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or portion of a hardware circuit or processor and its (or their) accompanying software and/or firmware. The term circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device or a similar integrated circuit in a server, a cellular network device, or other computing or network device.
The term “non-transitory,” as used herein, is a limitation of the medium itself (i.e., tangible, not a signal) as opposed to a limitation on data storage persistency (e.g., RAM vs. ROM).
As used herein, “at least one of the following: <a list of two or more elements>” and “at least one of <a list of two or more elements>” and similar wording, where the list of two or more elements are joined by “and” or “or”, mean at least any one of the elements, or at least any two or more of the elements, or at least all the elements.
The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
2310048.0 | Jun 2023 | GB | national |