The present invention relates generally to audio devices and in particular to wireless headsets with multiple microphones and vibration detectors for voice quality enhancement in windy conditions.
The use of headsets wirelessly connected to host devices like smartphones, laptops, and tablets is becoming increasingly popular. Whereas consumers used to be tethered to their electronic device with wired headsets, wireless headsets are gaining more traction due to the enhanced user experience, providing the user more freedom of movement and comfort of use. Further momentum for wireless headsets has been gained by certain smartphone manufacturers abandoning the implementation of the 3.5 mm audio jack in the smartphone, and promoting voice communications and music listening wirelessly, for example by using Bluetooth® technology.
Wireless headsets typically have one or more microphones to pick up the voice of the user. This allows the user to make hands-free phone calls. The use of two or more microphones allows the application of beamforming, thus enhancing the voice pickup and providing the possibility of noise reduction.
Wind noise has long hindered the use of wireless headsets, not only in windy weather but also with the airflow created by cycling or other sports activities. Wind incident on the microphone membrane causes undesired noise. Furthermore, turbulence caused by wind flowing around the edges of the acoustic canals that lead to the microphones contributes greatly to the wind noise. One method to counteract wind noise has been to pick up the voice with vibration sensors instead. These sensors pick up the vibrations in the human body caused by voice excitation. Vibration can be picked up at the skin (skin surface microphones), from bones (bone conduction microphones), or from other tissues in the user's head. The vibration sensor can, for example, be implemented by an accelerometer, which may use MEMS technology. Since the vibration sensor is not excited by displacement of air, it is insensitive to wind noise. However, vibration sensors are hampered by their low-pass filtering characteristics: high frequencies are damped in the tissues and are not picked up by the vibration sensors. This makes the voice sound unnatural. Wireless headsets with improved microphone performance in windy conditions are therefore desirable.
The Background section of this document is provided to place embodiments of the present invention in technological and operational context to assist those of skill in the art in understanding their scope and utility. Unless explicitly identified as such, no statement herein is admitted to be prior art merely by its inclusion in the Background section.
The following presents a simplified summary of the disclosure in order to provide a basic understanding to those of skill in the art. This summary is not an extensive overview of the disclosure and is not intended to identify key/critical elements of embodiments of the invention or to delineate the scope of the invention. The sole purpose of this summary is to present some concepts disclosed herein in a simplified form as a prelude to the more detailed description that is presented later.
According to one or more embodiments described and claimed herein, novel and nonobvious aspects of multiple microphones combined with an equalizer and a vibration sensor provide improved voice performance in a wireless stereo headset. By exploiting beam-forming using a dual-microphone arrangement with equalization, gain in voice pickup is achieved while keeping a natural sound in low-wind conditions. When wind is detected, the system gradually switches over to a voice pickup by a vibration sensor which is insensitive to wind.
Hereinafter, embodiments of the disclosure will be described in further detail. It should be appreciated, however, that these embodiments should not be construed as limiting the scope of protection for the present disclosure.
Embodiments will now be described, by way of example only, with reference to the accompanying schematic drawings in which corresponding reference symbols indicate corresponding parts, and in which:
The figures are meant for illustrative purposes only and do not restrict the scope of protection as laid down by the claims.
For simplicity and illustrative purposes, the present invention is described by referring mainly to exemplary embodiments thereof. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be readily apparent to one of ordinary skill in the art that the present invention may be practiced without limitation to these specific details. In this description, well known methods and structures have not been described in detail so as not to unnecessarily obscure the present invention.
Electronic devices, such as mobile phones and smartphones, are in widespread use throughout the world. Although the mobile phone was initially developed for providing wireless voice communications, its capabilities have been increased tremendously. Modern mobile phones can access the worldwide web, store a large amount of video and music content, include numerous applications (“apps”) that enhance the phone's capabilities (often taking advantage of additional electronics, such as still and video cameras, satellite positioning receivers, inertial sensors, and the like), and provide an interface for social networking. Many smartphones feature a large screen with touch capabilities for easy user interaction. In interacting with modern smartphones, wearable headsets are often preferred for enjoying private audio, for example voice communications, music listening, or watching video, thus not interfering with or disturbing other people sharing the same area. Because it represents such a major use case, embodiments of the present invention are described herein with reference to a smartphone, or simply “phone”, as the host device. However, those of skill in the art will readily recognize that embodiments described herein are not limited to mobile phones, but in general apply to any electronic device capable of providing audio content.
A microprocessor 270 may control the radio signals, apply audio processing (for example voice processing such as echo cancellation or noise suppression) to the signals exchanged with radio transceiver 250, or control other devices and/or signal paths within the headset 12. Microprocessor 270 may be a separate circuit, or may be integrated into another component present in the headset, for example radio transceiver 250.
Audio codec 260 may include a Digital-to-Analog (D/A) converter, the output of which may connect to a speaker 210. To obtain beamforming for enhanced voice pickup, more than one microphone 220a, 220b may be embedded in headset 12. Audio codec 260 may include Analog-to-Digital (A/D) converters that receive input signals from microphones 220a and 220b. Alternatively, digital microphones may be used, which do not require A/D conversion and may provide digital audio signals directly to the audio codec 260 or the microprocessor 270.
Power Management Unit (PMU) 240 may supply a stable voltage and current to all electronic circuitry. The headset 12 may be powered by a battery 230, which typically provides a voltage of 3.7 V and may be of the coin cell type. The battery 230 can be a primary battery but is preferably a rechargeable battery. Recharging circuitry may be included in the PMU 240.
$$h(t) = \delta(t) - \delta(t - \tau_p - \tau_d)$$

with a frequency response:

$$H_{\mathrm{DMIC}}(f) = 1 - e^{-j2\pi f(\tau_p + \tau_d)}$$
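As a quick numerical check of this response, the sketch below evaluates |H_DMIC(f)| in Python. It is a minimal illustration, not headset firmware; the function names `dmic_response` and `dmic_gain_db` are hypothetical, and the total delay τp + τd is passed in as a single assumed value.

```python
import cmath
import math


def dmic_response(f_hz: float, tau_total_s: float) -> complex:
    """H_DMIC(f) = 1 - exp(-j*2*pi*f*tau_total), where tau_total = tau_p + tau_d."""
    return 1 - cmath.exp(-2j * math.pi * f_hz * tau_total_s)


def dmic_gain_db(f_hz: float, tau_total_s: float) -> float:
    """|H_DMIC(f)| in dB; equivalent to 20*log10(2*|sin(pi*f*tau_total)|).

    Only valid where the response is non-zero (f > 0 and f*tau_total not an integer).
    """
    return 20 * math.log10(abs(dmic_response(f_hz, tau_total_s)))


if __name__ == "__main__":
    tau_total = 63.32e-6           # assumed tau_p + tau_d in seconds
    f_peak = 1 / (2 * tau_total)   # frequency where 2*pi*f*tau_total = pi
    for f in (100.0, 1_000.0, f_peak):
        print(f"{f:8.1f} Hz: {dmic_gain_db(f, tau_total):6.2f} dB")
```

At low frequencies |H_DMIC(f)| is roughly proportional to f, i.e. it rises by about 6 dB per octave, which is the high-pass behavior of the dual-MIC arrangement referred to in the text; at the peak the magnitude is 2, or about 6 dB.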
The maximum gain of 6 dB is realized if 2πf(τp+τd)=π. As an example, assuming a velocity of sound νs of 343 m/s and a distance L between the MICs of 11 mm, the propagation delay amounts to τp=L/νs≈32.07 μs. Since one sample period at a sampling frequency of 32 kHz is 31.25 μs, which is close to τp, the delay in unit 340 can simply be realized by delaying the sampled digital audio signal by one sample. The maximum gain for a sound source at the left is then realized at frequency:

$$f = \frac{1}{2(\tau_p + \tau_d)} \approx \frac{1}{2\,(32.07\ \mu\mathrm{s} + 31.25\ \mu\mathrm{s})} \approx 7.9\ \mathrm{kHz}$$
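The arithmetic of this example can be reproduced with a few lines of Python, using only the values stated in the text (νs = 343 m/s, L = 11 mm, a 32 kHz sampling rate and a one-sample delay). The variable names are illustrative.

```python
SPEED_OF_SOUND = 343.0   # m/s, value assumed in the text
MIC_SPACING = 0.011      # m, distance L between the MICs (11 mm)
SAMPLE_RATE = 32_000.0   # Hz, sampling frequency assumed in the text

tau_p = MIC_SPACING / SPEED_OF_SOUND      # acoustic propagation delay between the MICs
tau_d = 1.0 / SAMPLE_RATE                 # one-sample digital delay (unit 340)
f_peak = 1.0 / (2.0 * (tau_p + tau_d))    # solves 2*pi*f*(tau_p + tau_d) = pi

print(f"tau_p  = {tau_p * 1e6:.2f} us")
print(f"tau_d  = {tau_d * 1e6:.2f} us")
print(f"f_peak = {f_peak / 1e3:.2f} kHz")
```

With these inputs, τp comes out at about 32.07 μs and the peak-gain frequency at roughly 7.9 kHz.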
Integer sample delays are easy to implement, but non-integer sample delays can also be implemented digitally. For example, with a two-tap filter having an inter-tap delay of one sample and filter coefficients a1 and a2, any delay between 0 and one sample can be approximated by a proper selection of the a1 and a2 coefficients.
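One common way to choose a1 and a2, assumed here rather than taken from the text, is linear interpolation: a1 = 1 − d and a2 = d for a desired delay of d samples. The Python sketch below (function names hypothetical) applies such a two-tap filter. Linear interpolation only approximates an ideal fractional delay and is most accurate at frequencies well below the Nyquist frequency.

```python
def two_tap_coefficients(delay_samples: float) -> tuple[float, float]:
    """Linear-interpolation coefficients (a1, a2) for a fractional delay d in [0, 1]."""
    if not 0.0 <= delay_samples <= 1.0:
        raise ValueError("delay must lie between 0 and 1 sample")
    return 1.0 - delay_samples, delay_samples


def fractional_delay(x: list[float], delay_samples: float) -> list[float]:
    """Two-tap filter y[n] = a1*x[n] + a2*x[n-1], with x[-1] taken as 0."""
    a1, a2 = two_tap_coefficients(delay_samples)
    y: list[float] = []
    prev = 0.0  # holds x[n-1], the second tap one sample back
    for sample in x:
        y.append(a1 * sample + a2 * prev)
        prev = sample
    return y
```

For example, `fractional_delay([0.0, 1.0, 2.0, 3.0], 0.25)` returns `[0.0, 0.75, 1.75, 2.75]`: after the first output, the ramp is shifted by a quarter sample.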
So far, we have only considered sound from the right and from the left. If the sound source is at another angle, the propagation delay will be dependent on this angle. The gain as a function of the angle for the dual-MIC arrangement 300 is visualized in
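The text does not give the angle dependence as a formula. Under a far-field (plane-wave) assumption, the inter-MIC delay for a source at angle θ from the MIC axis is commonly modeled as τp·cos θ. The sketch below uses that assumption with the example values L = 11 mm, νs = 343 m/s and a one-sample delay at 32 kHz, taking θ = 0° as the direction for which the total delay is τp + τd (the left-hand source in the example). The function name is hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0   # m/s, value used in the example
MIC_SPACING = 0.011      # m, distance L between the MICs
TAU_D = 1.0 / 32_000.0   # s, one-sample delay at 32 kHz


def dmic_gain_vs_angle(f_hz: float, theta_deg: float) -> float:
    """Linear |H| of the dual-MIC arrangement for a far-field source at angle theta.

    Assumption: the acoustic delay between the MICs is tau_p * cos(theta), with
    theta = 0 deg giving the total delay tau_p + tau_d (maximum-gain side).
    """
    tau_p = MIC_SPACING / SPEED_OF_SOUND * math.cos(math.radians(theta_deg))
    return abs(1 - cmath.exp(-2j * math.pi * f_hz * (tau_p + TAU_D)))


if __name__ == "__main__":
    for theta in range(0, 181, 30):
        gain = dmic_gain_vs_angle(1_000.0, theta)
        print(f"{theta:3d} deg: {20 * math.log10(gain):7.2f} dB")
```

At 1 kHz the gain falls steadily from θ = 0° to θ = 180°, where τp·cos θ almost cancels τd and the response approaches a null.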
In
The lower frequency part can be restored by applying an equalizer filter.
Herein, the lower cut-off frequency fmin is arbitrarily chosen at 50 Hz. It should be low enough not to be noticeable by the listener and high enough to prevent Heq(f) from reaching excessively high amplitudes. The higher cut-off frequency fmax is arbitrarily chosen at 7.8 kHz, approximately the frequency where the dual-MIC gain was maximal (see
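The equalizer formula itself is not reproduced here, so the sketch below assumes, purely for illustration, that the equalizer magnitude is the inverse of |H_DMIC(f)| between fmin and fmax and is held constant outside that band. It uses τp + τd = 11 mm / 343 m/s + 1/(32 kHz) from the example above and only computes target magnitudes; designing an actual filter 810 with that response is a separate step. The function names are hypothetical.

```python
import math

TAU_TOTAL = 0.011 / 343.0 + 1.0 / 32_000.0   # s, assumed tau_p + tau_d
F_MIN = 50.0                                 # Hz, lower cut-off from the text
F_MAX = 7.8e3                                # Hz, upper cut-off from the text


def dmic_magnitude(f_hz: float) -> float:
    """|H_DMIC(f)| = 2*|sin(pi*f*tau_total)|."""
    return 2.0 * abs(math.sin(math.pi * f_hz * TAU_TOTAL))


def equalizer_magnitude(f_hz: float) -> float:
    """Assumed |Heq(f)|: inverse of |H_DMIC| with f clamped to [F_MIN, F_MAX]."""
    f_clamped = min(max(f_hz, F_MIN), F_MAX)
    return 1.0 / dmic_magnitude(f_clamped)


if __name__ == "__main__":
    for f in (20.0, 50.0, 200.0, 1_000.0, 7_800.0):
        print(f"{f:7.0f} Hz: {20 * math.log10(equalizer_magnitude(f)):6.1f} dB")
```

Under this assumption the equalizer already needs roughly +34 dB of gain at 50 Hz, which illustrates why fmin cannot be chosen much lower.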
Cascading the dual-MIC arrangement 300 with an equalizer filter 810 as shown in the cascaded configuration 800 of
The power spectrum of the same female voice recorded for 30 s at the output of the equalizer filter 810 is depicted in
However, when there is an airflow along the headset, for example due to windy weather or because the user is moving, such as when cycling or running, wind noise may have a large impact on the cascaded configuration 800. There is very little correlation between the wind noise signals detected by the individual microphones. In fact, in the subtractor 360, the wind noise powers from the MIC signals may add up. More importantly, the low-frequency components of the uncorrelated wind noise signals are typically not suppressed by the high-pass behavior of the dual-microphone arrangement 300, as they would be for correlated signals such as voice. The equalizer filter 810, which emphasizes the lower frequency components, may now be disastrous: the low-frequency wind signals are strongly amplified, causing a very poor signal-to-noise ratio (SNR) at the equalizer output. When the digital word size is insufficient, the signal may clip. Due to the low SNR and/or clipping, the sound may be heavily distorted, and the audio signal path may saturate completely. The cascaded configuration 800 may thus not perform well in a windy environment.
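The contrast between correlated voice and uncorrelated wind can be illustrated with a small simulation. The sketch assumes a 32 kHz sampling rate, a propagation delay of about one sample plus the one-sample delay of unit 340, a 100 Hz tone as a stand-in for low-frequency voice, and independent Gaussian noise as a stand-in for wind; none of these signals are measured data, and the function names are hypothetical.

```python
import math
import random

FS = 32_000   # Hz, assumed sampling rate
N = FS        # one second of samples


def power(x: list[float]) -> float:
    """Mean-square power of a signal."""
    return sum(v * v for v in x) / len(x)


def dual_mic_output(mic_a: list[float], mic_b: list[float], delay: int = 1) -> list[float]:
    """Subtractor output y[n] = mic_a[n] - mic_b[n - delay] (mic_b taken as 0 before n = 0)."""
    return [mic_a[n] - (mic_b[n - delay] if n >= delay else 0.0) for n in range(len(mic_a))]


# Correlated "voice" component: a 100 Hz tone reaching MIC b about one sample later (tau_p).
f_tone = 100.0
tone_a = [math.sin(2 * math.pi * f_tone * n / FS) for n in range(N)]
tone_b = [math.sin(2 * math.pi * f_tone * (n - 1) / FS) for n in range(N)]
tone_ratio = power(dual_mic_output(tone_a, tone_b)) / power(tone_a)

# Uncorrelated "wind" component: independent unit-variance noise at each MIC.
rng = random.Random(0)
wind_a = [rng.gauss(0.0, 1.0) for _ in range(N)]
wind_b = [rng.gauss(0.0, 1.0) for _ in range(N)]
wind_ratio = power(dual_mic_output(wind_a, wind_b)) / power(wind_a)

print(f"tone: {10 * math.log10(tone_ratio):6.1f} dB, wind: {10 * math.log10(wind_ratio):5.1f} dB")
```

In this setup the correlated tone is attenuated by roughly 28 dB, while the uncorrelated noise power roughly doubles (about +3 dB). An equalizer that restores the tone's low-frequency level would therefore boost the noise by a similar amount.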
In a windy environment, instead of a microphone, another detector may be used that is not sensitive to air pressure variations but sensitive to vibrations of the human body caused by the utterance of speech. The vocal cords create vibrations that propagate through the body, causing vibrations in the bones and the skin. A vibration sensor in contact with the human body may pick up these signals. Yet, high frequency components are strongly attenuated by the propagation through the human tissue, and typically only low frequency components arrive at the vibration sensor.
A power spectrum of the same female voice picked up by a vibration sensor is depicted in
Combining the dual-MIC arrangement 300 having the high-pass characteristics with a vibration sensor having the low-pass characteristics is the first step towards improving the acoustic performance of the dual-MIC arrangement. This is shown in the block diagram of
In the arrangement 1100, the voice typically sounds slightly distorted, since the vibration sensor does not perfectly replicate the low-frequency content of the original voice signal. Even when no wind is present and the vibration sensor is not needed, the voice signal may sound distorted. The equalizer filter 810, as discussed above, does a better job of recreating the low-frequency voice content, but it is very sensitive to wind noise in the dual-MIC arrangement.
In the embodiment 1200 of
The weighting values WA and WB may depend on the measured wind power. An example of the variation in the weights as the wind power varies is shown in
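One simple way to derive the weights from a measured wind power is a linear crossfade between two thresholds, combined with per-frame smoothing so that the switch-over is gradual. The sketch below assumes this scheme, the constraint WA + WB = 1, and hypothetical threshold values in dB; the text itself only states that the weights depend on the measured wind power.

```python
def crossfade_weights(wind_db: float, low_db: float = -50.0,
                      high_db: float = -30.0) -> tuple[float, float]:
    """Return (WA, WB) for a measured wind power in dB.

    Below low_db only the equalized dual-MIC path is used (WA = 1); above high_db only
    the vibration-sensor path is used (WB = 1); in between the weights change linearly.
    The thresholds are hypothetical example values.
    """
    t = (wind_db - low_db) / (high_db - low_db)
    t = min(max(t, 0.0), 1.0)  # clamp the crossfade position to [0, 1]
    return 1.0 - t, t


def smooth(previous: float, target: float, alpha: float = 0.05) -> float:
    """One-pole smoothing: move a weight a fraction alpha toward its new target per frame."""
    return previous + alpha * (target - previous)
```

For example, a wind power halfway between the thresholds, `crossfade_weights(-40.0)`, gives equal weights `(0.5, 0.5)`.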
An alternative circuit diagram to the dual-MIC arrangement with vibration sensor to provide robustness in noisy wind conditions is shown in
Further improvements to measure the wind noise power are shown in
In certain environments, the wind noise may be so strong that the SNR, even at the higher frequencies, is too low for acceptable voice quality. In those cases, the high-frequency components picked up by the MICs 220a, 220b are preferably not combined with the signal 1866 from the vibration sensor 1130 as was done previously. Instead, only the signal from the vibration sensor 1130 may be used. We can thus distinguish between three wind regimes: 1) low to no wind, 2) moderate wind, and 3) strong wind. In regime 1, the equalizer may be used to compensate for the dual-MIC high-pass filtering; in regime 2, the equalizer is not used, but the vibration sensor 1130 may be used together with the high-frequency dual-MIC signals; finally, in regime 3, only the signal from the vibration sensor 1130 may be used. An exemplary schematic 1700 for adaptively controlling between these three regimes is shown in
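The three regimes can be summarized as a small decision function. The sketch below is illustrative only: the regime names, threshold values and function names are hypothetical, and it switches hard between regimes for clarity, whereas a practical implementation would crossfade between them as described elsewhere in the text.

```python
from enum import Enum


class WindRegime(Enum):
    LOW = 1       # regime 1: equalized dual-MIC signal
    MODERATE = 2  # regime 2: vibration sensor + high-pass filtered dual-MIC signal
    STRONG = 3    # regime 3: vibration sensor only


def classify_wind(wind_db: float, moderate_db: float = -40.0,
                  strong_db: float = -20.0) -> WindRegime:
    """Map a measured wind power (dB) to a regime; thresholds are hypothetical."""
    if wind_db < moderate_db:
        return WindRegime.LOW
    if wind_db < strong_db:
        return WindRegime.MODERATE
    return WindRegime.STRONG


def select_sample(regime: WindRegime, equalized: float, vib: float,
                  mic_highpass: float) -> float:
    """Pick the output sample for one regime (hard switching, for clarity)."""
    if regime is WindRegime.LOW:
        return equalized
    if regime is WindRegime.MODERATE:
        return vib + mic_highpass
    return vib
```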
In
In an alternative embodiment (not shown), adder 1150 in the schematics 1800 in
Various operations in the digital domain have been described, such as adders, subtractors, high- and low-pass filters, equalizing filters, and delays. Several other audio operations may be added to the dual-MIC arrangement with equalizer described herein in order to improve the voice pick-up function, for example noise suppression, echo cancellation, active noise cancellation, and other audio enhancement functions. All these operations can be carried out in different places in the wireless headset configuration. For example, some (or all) may be carried out in the audio codec 260. Others (or all) may be carried out in the microprocessor 270 or in an additional digital signal processor (DSP) (not shown).
To measure the power of the wind, in step 1720 the output of the subtractor may be low-pass filtered, e.g. with a low-pass cut-off frequency of 200 Hz, and then the power in the filtered signal may be determined. The output of the vibration sensor may be low-pass filtered, e.g. with a low-pass cut-off frequency of 200 Hz, and then the power in the filtered signal may be determined in step 1722. From the wind power and possibly the vibration power, the weight factors WA and WB may be derived in step 1724.
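Steps 1720 and 1722 both reduce to "low-pass filter, then measure power". The sketch below assumes a first-order (one-pole) IIR low-pass and a 32 kHz sampling rate, since the text specifies neither the filter order nor the rate. The function names are hypothetical, and the small floor only avoids taking the logarithm of zero for silent input.

```python
import math


def one_pole_lowpass(x: list[float], cutoff_hz: float, fs_hz: float) -> list[float]:
    """First-order IIR low-pass: y[n] = y[n-1] + alpha*(x[n] - y[n-1])."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs_hz)
    y: list[float] = []
    state = 0.0
    for sample in x:
        state += alpha * (sample - state)
        y.append(state)
    return y


def lowband_power_db(x: list[float], cutoff_hz: float = 200.0,
                     fs_hz: float = 32_000.0) -> float:
    """Mean-square power (dB) of x after low-pass filtering, as in steps 1720 and 1722."""
    filtered = one_pole_lowpass(x, cutoff_hz, fs_hz)
    p = sum(v * v for v in filtered) / len(filtered)
    return 10.0 * math.log10(p + 1e-12)  # floor avoids log10(0) for silent input
```

Applied to the subtractor output, this yields the wind power estimate; applied to the vibration sensor output, it yields the vibration power from which, together with the wind power, WA and WB may be derived in step 1724.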
In step 1730, the subtractor output determined in step 1708 may be high pass filtered to reduce any possible wind noise power. The cut-off frequency for the high-pass filter is for example 4 kHz. In step 1732, the output of the high-pass filter may be added to the output of a vibration sensor that has picked up the voice.
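Steps 1730 and 1732 can be sketched as a high-pass filter followed by a sample-wise addition. The sketch below assumes a first-order high-pass, built as the input minus a one-pole low-pass, with the 4 kHz cut-off from the text and a 32 kHz sampling rate; the function names are hypothetical. A steeper filter would reject more low-frequency wind noise.

```python
import math


def one_pole_highpass(x: list[float], cutoff_hz: float, fs_hz: float) -> list[float]:
    """First-order high-pass built as x[n] minus a one-pole low-pass of x."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs_hz)
    out: list[float] = []
    lp = 0.0
    for sample in x:
        lp += alpha * (sample - lp)  # one-pole low-pass state
        out.append(sample - lp)      # complement gives a zero at DC
    return out


def combine_with_vibration(subtractor_out: list[float], vib: list[float],
                           cutoff_hz: float = 4_000.0,
                           fs_hz: float = 32_000.0) -> list[float]:
    """Steps 1730-1732: high-pass the subtractor output, then add the vibration signal."""
    if len(subtractor_out) != len(vib):
        raise ValueError("signals must have the same length")
    hp = one_pole_highpass(subtractor_out, cutoff_hz, fs_hz)
    return [h + v for h, v in zip(hp, vib)]
```

A constant (DC) component in the subtractor output decays to zero at the output, while content near the top of the band passes with moderate attenuation.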
In step 1740, the subtractor output determined in step 1708 may be equalized to enhance the low-frequency content of the signal.
Finally, in step 1760, the output of the equalizer may be multiplied by WA, and the output of the adder combining the vibration sensor signal with the high-pass filtered subtractor output may be multiplied by WB. Both multiplier outputs may then be added together to obtain the output signal to be audibly presented, for example via speaker 210.
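Step 1760 is a per-sample weighted sum of the two paths. The sketch below (function name hypothetical) assumes both paths are time-aligned sample sequences of equal length and that WA and WB are held constant over the block being processed.

```python
def mix_paths(equalized: list[float], vib_plus_hp: list[float],
              w_a: float, w_b: float) -> list[float]:
    """Step 1760: output[n] = WA * equalized[n] + WB * (vibration + high-passed MIC)[n]."""
    if len(equalized) != len(vib_plus_hp):
        raise ValueError("signals must have the same length")
    return [w_a * e + w_b * v for e, v in zip(equalized, vib_plus_hp)]
```

In practice WA and WB would be updated from the wind-power measurement of step 1724, preferably gradually, to avoid audible switching artifacts.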
Embodiments of the present invention present numerous advantages over the prior art. By exploiting beam-forming using a dual-microphone arrangement with equalization, gain in voice pickup may be achieved while keeping a natural sound in low-wind conditions. When wind is detected, the system may gradually switch over to a voice pickup by a vibration sensor which is insensitive to wind.