Embodiments of the invention relate generally to a system and method of wind and noise reduction for a headphone. Specifically, embodiments of the invention performs spectral mixing of signals from a microphone located inside the earcup (or ear bud, or phone) that is directed towards the ear canal (e.g., error microphone) with the signals from at least one microphone located on the outside of the earcup's housing to generate a mixed signal. In some embodiments, the signals from the internal microphone is also subject to a version of an adaptive noise cancelling technique to further enhance the internal microphone signal before the spectral mixing.
Currently, a number of consumer electronic devices are adapted to receive speech via microphone ports or headsets. While the typical example is a portable telecommunications device (mobile telephone), with the advent of Voice over IP (VoIP), desktop computers, laptop computers and tablet computers may also be used to perform voice communications.
When using these electronic devices, the user also has the option of using headphones, earbuds, or headset to receive his speech. However, a common complaint with these hands-free modes of operation is that the speech captured by the microphone port or the headset includes environmental noise such as wind noise, secondary speakers in the background or other background noises. This environmental noise often renders the user's speech unintelligible and thus, degrades the quality of the voice communication.
Generally, embodiments of the invention relate to a system and method of wind and noise reduction for a headphone. Embodiments of the invention apply to wireless or wired headphones, headsets, phones, and other communication devices that users can wear on or hold at their head or ears. By reducing the wind and noise in the signals captured by the microphones, the speech quality and intelligibility of the uplink signal is enhanced. Specifically, embodiments of the invention spectrally mix signals from a microphone located inside the earcup (or ear bud, or phone) that is directed towards the ear canal (e.g., error microphone) with the signals from at least one microphone located on the outside of the earcup's housing to generate a mixed signal. In some embodiments, the signals from the internal microphone is also subject to a version of an adaptive noise cancelling technique to further enhance the internal microphone signal before the spectral mixing.
In one embodiment, a method of wind and noise reduction for a headphone starts by receiving acoustic signals from a first external microphone included on an outside of a housing of an earcup of the headphone. Acoustic signals are also received from an internal microphone which is included inside the housing of the first earcup. A downlink signal is then processed to generate an echo estimate of a speaker signal to be output by a speaker. The echo estimate of the speaker signal is removed from the acoustic signals from the internal microphone to generate a corrected internal microphone signal. The corrected internal microphone signal is spectrally mixed with the acoustic signals from the first external microphone to generate a mixed signal. The lower frequency portion of the mixed signal includes a corresponding lower frequency portion of the corrected internal microphone signal. The higher frequency portion of the mixed signal includes a corresponding higher frequency portion of the acoustic signals from the first external microphone.
In another embodiment, the method receives acoustic signals from a first external microphone and a second external microphone. The first and second external microphones are included on an outside of a housing of an earcup of the headphone. A beamformer generates a voicebeam signal based on the first external microphone signal and the second external microphone signal. Acoustic signals are received from an internal microphone included inside the housing of the earcup. A downlink signal is processed to generate an echo estimate of a speaker signal to be output by a speaker. The echo estimate of the speaker signal is removed from the acoustic signals from the internal microphone to generate a corrected internal microphone signal. In this embodiment, the corrected internal microphone signal is spectrally mixed with the voicebeam signal to generate a mixed signal. The lower frequency portion of the mixed signal includes a corresponding lower frequency portion of the corrected internal microphone signal, and the higher frequency portion of the mixed signal includes a corresponding higher frequency portion of the voicebeam signal.
In another embodiment, a system of noise reduction for a headphone comprises: a speaker to output a speaker signal based on a downlink signal, an earcup of the headphone, an active-noise cancellation (ANC) downlink corrector, a first summator, a first and second acoustic echo canceller, an equalizer and a spectral combiner. The earcup includes a first external microphone included on an outside of a housing of the first earcup, and an internal microphone included inside the housing of the first earcup. The ANC downlink corrector processes the downlink signal to generate an echo estimate of the speaker signal. The first summator removes the echo estimate of the speaker signal from acoustic signals from the internal microphone to generate a corrected internal microphone signal. The first acoustic echo canceller removes a linear acoustic echo from acoustic signals from the first external microphone based on a downlink signal to generate an enhanced first external microphone signal and the second acoustic echo canceller removes a linear acoustic echo from the corrected internal microphone signal based on the downlink signal to generate an enhanced internal microphone signal. The equalizer scales the enhanced internal microphone signal to match a level of the enhanced first external microphone signal. The spectral combiner spectrally mixes the enhanced internal microphone signal with the enhanced first external microphone signal to generate a mixed signal. The lower frequency portion of the mixed signal includes a corresponding lower frequency portion of the enhanced internal microphone signal, and the higher frequency portion of the mixed signal includes a corresponding higher frequency portion of the enhanced first external microphone signal.
The above summary does not include an exhaustive list of all aspects of the present invention. It is contemplated that the invention includes all systems, apparatuses and methods that can be practiced from all suitable combinations of the various aspects summarized above, as well as those disclosed in the Detailed Description below and particularly pointed out in the claims filed with the application. Such combinations may have particular advantages not specifically recited in the above summary.
The embodiments of the invention are illustrated by way of example and not by way of limitation in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that references to “an” or “one” embodiment of the invention in this disclosure are not necessarily to the same embodiment, and they mean at least one. In the drawings:
In the following description, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures, and techniques have not been shown to avoid obscuring the understanding of this description.
Moreover, the following embodiments of the invention may be described as a process, which is usually depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed. A process may correspond to a method, a procedure, etc.
The headphone on
As shown in
The third microphone 113 is located inside each earcup facing the user's ear cavity (e.g., inside microphone, internal microphone, or error microphone). Since the third microphone 113 is located against or in the ear and the third microphone 113 is placed inside the earcup 101, the third microphone 113 is protected from external noises such as ambient noise, environmental noise, and wind noise. In some embodiments, the location of the third microphone 113 captures acoustic signals having ambient noises attenuated by 10 db-20 db and wind noises attenuated by 15-20 db. In one embodiment, the earcup is an earbud such that the third microphone 113 is located on the portion of the earbud (e.g., tube) that is placed in the user's ear such that the third microphone 113 is as close as possible to the user's eardrum. In some embodiments, at least one of the external microphones 111, 112, and the internal microphone 113 can be used to perform active noise cancellation (ANC).
While
While not shown in the
In another embodiment, the earcups 101, 102 are wireless and may also include a battery device, a processor, and a communication interface (not shown). In this embodiment, the processor may be a digital signal processing chip that processes the acoustic signal from the microphones 111-113. In one embodiment, the processor may control or include at least one of the elements illustrated in the system 3 in
The communication interface may include a Bluetooth™ receiver and transmitter which may communicate speaker audio signals or microphone signals from the microphones 111-113 wirelessly in both directions (uplink and downlink) with the electronic device. In some embodiments, the communication interface communicates encoded signal from a speech codec (not shown) to the electronic device.
As shown in
In the embodiment in
Embodiments of the invention may be applied in time domain or in frequency domain. In one embodiment, the sample rate converters (SRC) 3011-3013 in
The time-frequency transformers (FBa) 3021-3023 transform the acoustic signals from the first microphone 111, the acoustic signals from the second microphone 112, and the acoustic signal from the third microphone 113, from a time domain to a frequency domain. Similarly, the time-frequency transformer (FBa) 3024 transforms the downlink signal from a time domain to a frequency domain.
An active-noise cancellation (ANC) downlink corrector 318 processes the downlink signal from the downlink DSP chain 37 to generate an echo estimate of the speaker signal. A first summator 3041 receives the acoustic signals from the third microphone (e.g., internal microphone) 113 and the echo estimate of the speaker signal from the ANC downlink corrector 318. The first summator 3041 removes the echo estimate of the speaker signal from acoustic signals from the internal microphone to generate a corrected internal microphone signal. Accordingly, the first summator 3041 extracts from the internal microphone signal the echo generated by the downlink signal that is produced by the speaker 316 which may be included in the earcup 101 or the electronic device. This extraction further preserves the level of the speaker signal being played by the speaker 316.
Given the earcup 101's occlusion on the user's ear, the user's speech that is captured by the third microphone 113 is amplified at low frequencies comparing with the external microphones 111 and 112. To reduce this amplification to a level close to what the user would hear normally without the earcup occlusion, a feedback ANC corrector 313 processes the corrected internal microphone signal from the first combiner 3041 and generates an anti-noise signal. A second summator 3042 receives the anti-noise signal and the downlink signal from the downlink DSP chain 317. The second summator 3042 adds the anti-noise signal to the downlink signal to generate the speaker signal. The speaker signal is then played or output by the loudspeaker 316.
As further shown in
Referring back to the uplink signal processing, in
The time-frequency transformers (FBa) 3021-3024 may transform the signals from a time domain to a frequency domain by filter bank analysis. In one embodiment, the time-frequency transformers (FBa) 3021-3024 may transform the signals from a time domain to a frequency domain using the Fast Fourier Transforms (FFT).
Acoustic echo cancellers (AEC) 3031-3033 provide additional echo suppression. For example, the first AEC 3031 removes a linear acoustic echo from acoustic signals from the first external microphone 111 in the frequency domain based on a downlink signal in the frequency domain to generate an enhanced first external microphone signal in the frequency domain. The second AEC 3032 removes a linear acoustic echo from acoustic signals from the second external microphone 112 in the frequency domain based on a downlink signal in the frequency domain to generate an enhanced second external microphone signal in the frequency domain. The third AEC 3033 removes a linear acoustic echo from the corrected internal microphone signal in the frequency domain based on the downlink signal in the frequency domain to generate an enhanced internal microphone signal in the frequency domain.
A beamformer 306 is generating a voicebeam signal based on the enhanced first external microphone signal in the frequency domain and the enhanced second external microphone signal in the frequency domain.
In one embodiment, when only one external microphone is included in the system 3 (e.g., first microphone 110, instead of a beamformer 306, the system 3 includes an amplifier 306 that is a single-microphone amplifier to amplify the enhanced first external microphone signal to generate an amplified enhanced first external microphone signal which is transmitted to the spectral combiner 307 in lieu of the voicebeam signal.
While the beamformer 306 is able to help capture the sounds from the user's mouth and attenuate some of the environmental noise, when the power of the environmental noise (or ambient noise) is above a given threshold or when wind noise is detected in at least two microphones, the acoustic signals captured by the beamformer 306 may not be adequate. Accordingly, in one embodiment of the invention, rather than only using the acoustic signals captured by the beamformer 306, the system 3 performs spectral mixing of the acoustic signals from the internal microphone 113 and the voicebeam signal from the beamformer 306 to generate a mixed signal. In another embodiment, the system 3 performs spectral mixing of the acoustic signals from internal microphone 113 with the acoustic signals captured by at least one of the external microphones 111, 112 or a combination of them to generate a mixed signal.
As shown in
In one embodiment, when only one external microphone is included in the system 3 (e.g., first microphone 111), the wind and noise detector 305 only receives the enhanced first external microphone signal in the frequency domain from the first AEC 3031 and determines whether noise such as ambient noise and wind noise is detected in the enhanced first external microphone signal. In this embodiment, the noise detector detects ambient and wind noise when the acoustic noise power signal is greater than the pre-determined threshold. The wind and noise detector 305 generates the detector output to indicate whether the ambient or wind noise is detected in the enhanced first external microphone signal.
In one embodiment, an equalizer 310 scales the enhanced internal microphone signal in the frequency domain to match a level of the enhanced first external microphone signal. The equalizer 310 corrects the frequency response of the third microphone 113 (e.g., internal microphone) to match the frequency response of the first or second external microphones 111, 112. In one embodiment, the equalizer 310 may scale the enhanced internal microphone signal by a fixed scaling quantity. In another embodiment, the equalizer 310 may adaptively scale the enhanced internal microphone signal based on a comparison of the magnitudes of the signals from the first AEC 3031 and the third AEC 3033 at run time.
In
As shown in
Since acoustic signals from the internal microphone 113 are more robust to the wind and ambient noise than the external microphones 111, 112 (or voicebeam signal from the beamformer 306), a lower frequency portion of the mixed signal generated by the spectral combiner 307 includes a corresponding lower frequency portion of the corrected internal microphone signal and a higher frequency portion of the mixed signal includes a corresponding higher frequency portion of the voicebeam signal. The mixed signal generated by the spectral combiner 307 includes the lower frequency portion and the higher frequency portion.
In the embodiment where only one external microphone is included in the system 3 (e.g., first microphone 111), the spectral combiner spectrally mixes the enhanced internal microphone signal with the enhanced first external microphone signal to generate a mixed signal. In one embodiment, prior to the spectral mixing, a single microphone amplifier may amplify the enhanced first external microphone signal as discussed above. In this embodiment, a lower frequency portion of the mixed signal includes a corresponding lower frequency portion of the enhanced internal microphone signal, and a higher frequency portion of the mixed signal includes a corresponding higher frequency portion of the enhanced first external microphone signal.
As shown in
In one embodiment, the wind and noise detector 305 may generate a detector output that indicates that noise is detected and further indicates the type of noise that is detected. For example, the detector output may indicate that the type of noise detected is either ambient noise or wind noise. As shown in
In one embodiment, the spectral combiner 307 may include a low-pass filter and a high-pass filter. The low-pass filter applies the cutoff frequency (e.g., F1 or F2) to the acoustic signals from the internal microphone 113 (or scaled enhanced internal microphone signal) and the high-pass filter applies the cutoff frequency (e.g., F1 or F2) to the acoustic signals from the first external microphone 111 or to the voicebeam signal from the beamformer 306 to generate the mixed signal.
Referring to
In one embodiment, the enhanced mixed signal may be in the frequency domain. In
The following embodiments of the invention may be described as a process, which is usually depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed. A process may correspond to a method, a procedure, etc.
At Block 503, the ANC downlink corrector 318 processes a downlink signal to generate an echo estimate of a speaker signal to be output by a speaker 316. At Block 504, a first summator 3041 removes the echo estimate of the speaker signal from the acoustic signals from the internal microphone 113 to generate a corrected internal microphone signal.
At Block 505, a first AEC 3031 removes a linear acoustic echo from the acoustic signals from the first external microphone 113 based on the downlink signal to generate an enhanced first external microphone signal. At Block 506, a second AEC (e.g., AEC 3033) removes a linear acoustic echo from the corrected internal microphone signal based on the downlink signal to generate an enhanced internal microphone signal.
At Block 507, an equalizer 310 scales the enhanced internal microphone signal to match a level of the enhanced first external microphone signal. At Block 508, the spectral combiner 307 spectrally mixes of the output of the equalizer (e.g., equalized corrected internal microphone signal) with the enhanced first external microphone signal to generate a mixed signal. In one embodiment, a lower frequency portion of the mixed signal includes a corresponding lower frequency portion of the output of the equalizer and a higher frequency portion of the mixed signal includes a corresponding higher frequency portion of the enhanced first external microphone signal. At Block 509, a feedback ANC corrector 313 processes the corrected internal microphone signal to reduce amplification of the user's speech signal by the internal microphone and to generate an anti-noise signal. At Block 510, a second summator 3042 adds the anti-noise signal to the downlink signal to generate the speaker signal to be output by the speaker.
Keeping the above points in mind,
An embodiment of the invention may be a machine-readable medium having stored thereon instructions which program a processor to perform some or all of the operations described above. A machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), such as Compact Disc Read-Only Memory (CD-ROMs), Read-Only Memory (ROMs), Random Access Memory (RAM), and Erasable Programmable Read-Only Memory (EPROM). In other embodiments, some of these operations might be performed by specific hardware components that contain hardwired logic. Those operations might alternatively be performed by any combination of programmable computer components and fixed hardware circuit components.
While the invention has been described in terms of several embodiments, those of ordinary skill in the art will recognize that the invention is not limited to the embodiments described, but can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting. There are numerous other variations to different aspects of the invention described above, which in the interest of conciseness have not been provided in detail. Accordingly, other embodiments are within the scope of the claims.