The invention relates generally to microphone arrays and, more particularly, to suppressing noise in microphone arrays.
Microphones are acoustic-to-electric transducers, i.e., devices that convert sound into an electric signal. A microphone's directionality, or polar pattern, indicates how sensitive the microphone is to sounds incident at different angles to a central axis of the microphone. Noise suppression may be applied to microphones to reduce the effect of noise on sound detected from a particular direction and/or in a particular frequency range.
In one implementation, a computer-implemented method in a microphone array, the microphone array including a left microphone and a right microphone, may include receiving a right microphone signal from the right microphone, receiving a left microphone signal from the left microphone, determining a timing difference between the left microphone signal and the right microphone signal, determining whether the timing difference is within a time threshold, time shifting one of the left microphone signal and the right microphone signal based on the timing difference when the timing difference is within the time threshold, and summing the shifted microphone signal and the other microphone signal to form an output signal.
In addition, the computer-implemented method may include identifying an average sound pressure level for a predetermined time slot for each of the left microphone signal and the right microphone signal, and selecting one of the left microphone signal and the right microphone signal that has a lowest average sound pressure level as the output signal for the predetermined time slot.
In addition, the computer-implemented method may include determining whether an output signal for a preceding time slot is from a same microphone signal as the output signal for the predetermined time slot, identifying a zero crossing point near a border of the preceding time slot and the predetermined time slot when the output signal for the preceding time slot is not from the same microphone signal as the output signal for the predetermined time slot, and transitioning from the output signal for the preceding time slot to the output signal for the predetermined time slot based on the zero crossing point.
In addition, the computer-implemented method may include smoothing the transition to the one of the left microphone signal and the right microphone signal that has the lowest relative sound pressure level.
In addition, the computer-implemented method may include identifying whether the left microphone signal and the right microphone signal are consistent with a target sound type based on at least one of an amplitude response, a frequency response, and a timing for each of the left microphone signal and the right microphone signal.
In addition, the computer-implemented method may include identifying a sound pressure level associated with each of the left microphone and the right microphone, determining a correlation between the timing difference and the sound pressure level associated with each of the left microphone and the right microphone, and determining whether the correlation indicates that the left microphone signal and the right microphone signal are based on speech from a target source.
In addition, the computer-implemented method may include dividing the left microphone signal and the right microphone signal into a plurality of frequency bands, identifying noise in at least one of the plurality of frequency bands, and filtering the noise in the at least one of the plurality of frequency bands.
In addition, filtering the noise in the at least one of the plurality of frequency bands may include selecting a polar pattern for filtering the noise in the at least one of the plurality of frequency bands based on a signal to noise ratio in each of the at least one of the plurality of frequency bands.
In addition, the computer-implemented method may include determining whether noise is present in the left microphone signal and the right microphone signal based on a comparison between an omnidirectional polar pattern and a highly directional polar pattern associated with the dual microphone array.
In addition, the computer-implemented method may include selecting a transition angle for passing sound in the dual microphone array, and determining a value for the time threshold based on the selected transition angle.
In another implementation, a dual microphone array device may include a left microphone, a right microphone, a memory to store a plurality of instructions, and a processor configured to execute instructions in the memory to receive a right microphone signal from the right microphone, receive a left microphone signal from the left microphone, determine a timing difference between the left microphone signal and the right microphone signal, determine whether the timing difference is within a time threshold, time shift at least one of the left microphone signal and the right microphone signal based on the timing difference when the timing difference is within the time threshold, and sum the shifted microphone signal and the other microphone signal to form an output signal.
In addition, the processor is further to identify an average sound pressure level for a predetermined time slot for each of the left microphone signal and the right microphone signal, and select one of the left microphone signal and the right microphone signal that has a lowest average sound pressure level as the output signal for the predetermined time slot.
In addition, the processor is further to divide the left microphone signal and the right microphone signal into a plurality of frequency bands, identify noise in at least one of the plurality of frequency bands, and filter the noise in the at least one of the plurality of frequency bands.
In addition, the processor is further to determine whether an output signal for a preceding time slot is from a same microphone signal as the output signal for the predetermined time slot, identify a zero crossing point near a border of the preceding time slot and the predetermined time slot when the output signal for the preceding time slot is not from the same microphone signal as the output signal for the predetermined time slot, and transition from the output signal for the preceding time slot to the output signal for the predetermined time slot based on the zero crossing point.
In addition, the dual microphone array device may further include a vibrational sensor, and the processor is further to identify user speech based on an input provided by the vibrational sensor, and select a polar pattern based on a current occurrence of user speech.
In addition, the dual microphone array device may further include a positioning element to hold each of the left microphone and the right microphone on the torso of a user at approximately equal distances from a mouth of the user in a forward facing position.
In addition, the processor is further to identify whether the left microphone signal and the right microphone signal are consistent with speech from the target source based on at least one of an amplitude response, a frequency response, and a timing for each of the left microphone signal and the right microphone signal.
In addition, the processor is further to identify a sound pressure level associated with each of the left microphone and the right microphone, determine a correlation between the timing difference and the sound pressure level associated with each of the left microphone and the right microphone, and determine whether the correlation indicates that the left microphone signal and the right microphone signal are based on speech from a target source.
In addition, when filtering the noise in the at least one of the plurality of frequency bands, the processor is further to select a polar pattern for filtering the noise in the at least one of the plurality of frequency bands based on a signal to noise ratio in each of the at least one of the plurality of frequency bands, and to select the polar pattern from a group including an omnidirectional polar pattern, a figure eight polar pattern, and a frequency independent polar pattern.
In yet another implementation, a computer-readable medium includes instructions to be executed by a processor associated with a microphone array, the microphone array including a left microphone and a right microphone, the instructions including one or more instructions, when executed by the processor, for causing the processor to receive a right microphone signal from the right microphone, receive a left microphone signal from the left microphone, determine a timing difference between the left microphone signal and the right microphone signal, determine whether the timing difference is within a time threshold, time shift one of the left microphone signal and the right microphone signal to a time of the other of the left microphone signal and the right microphone signal based on the timing difference, and sum the shifted microphone signal and the other microphone signal to form an output signal.
The accompanying drawings, which are incorporated in and constitute part of this specification, illustrate one or more embodiments described herein and, together with the description, explain the embodiments. In the drawings:
The following detailed description refers to the accompanying drawings. The same reference numbers in different drawings may identify the same or similar elements. Also, the following detailed description is exemplary and explanatory only and is not restrictive of the invention, as claimed.
Embodiments described herein relate to devices, methods, and systems for suppressing noise in a dual microphone array. Methods included herein may utilize correlation between two neck-mounted microphones for suppression of noise, such as scratch noise, wind noise, and surrounding audio noise, in a voice-based microphone application.
Consistent with embodiments described herein, noise suppression in a dual microphone array may be implemented based on correlation between the microphones. Alternatively, consistent with embodiments described herein, noise suppression in the dual microphone array may be achieved using filtering of the frequency bands.
In implementations described herein, dual microphone array 100 may utilize correlation between sound detected at left microphone 100-L and right microphone 100-R to implement suppression of noise, such as scratch noise, wind noise, and surrounding audio noise, in sounds received by dual microphone array 100.
Processor 202 may include a processor, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), and/or other processing logic (e.g., audio/video processor) capable of processing information and/or controlling device 200.
Memory 204 may include static memory, such as read only memory (ROM), and/or dynamic memory, such as random access memory (RAM), or onboard cache, for storing data and machine-readable instructions. Storage unit 206 may include a magnetic and/or optical storage/recording medium. In some implementations, storage unit 206 may be mounted under a directory tree or mapped to a drive.
Input component 208 and output component 210 may include a display screen, a keyboard, a mouse, a speaker, a microphone, a Digital Video Disk (DVD) writer, a DVD reader, Universal Serial Bus (USB) port, and/or other types of components for converting physical events or phenomena to and/or from digital signals that pertain to device 200. Communication path 214 may provide an interface through which components of device 200 can communicate with one another.
In different implementations, device 200 may include additional, fewer, or different components than the ones illustrated in
As shown in
As shown in
As shown in
For sounds coming from the side of user 110, left microphone 100-L and right microphone 100-R may have different timing (i.e., timing difference detected at respective microphones 100-L-R), and, for many sounds, also different amplitude and frequency responses. Scratch noise and wind noise are by nature uncorrelated in the respective microphones 100-L-R. These differences may be used to suppress sounds coming from the side compared to sounds coming from mouth 112. The spoken voice (from mouth 112) may be identified based on sounds arriving within a window of time at respective microphones 100-L-R and a corresponding correlation between SPL detected at the respective microphones 100-L-R.
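As a rough illustration only (the cross-correlation approach, the 44 kHz sample rate, and the function name are assumptions for this sketch, not necessarily the disclosed implementation), the arrival-time difference between the two microphone signals could be estimated as follows:

```python
import numpy as np

def timing_difference(left, right, sample_rate=44_000):
    """Estimate the arrival-time difference between two microphone
    signals via cross-correlation. A positive lag means the left
    signal lags the right one (the sound reached the right
    microphone first)."""
    corr = np.correlate(left, right, mode="full")
    # In "full" mode, index (len(right) - 1) corresponds to zero lag.
    lag = int(np.argmax(corr)) - (len(right) - 1)
    return lag, lag / sample_rate
```

A lag within the time threshold would then indicate sound from mouth 112; a larger lag indicates an off-axis source.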
As shown in
With respect to
Time adjusted dipole polar pattern 452 may be a spatial pattern of sensitivity to sound that is directed towards mouth 112 of user 110. Sound which originates from sources other than mouth 112, such as sources outside of time adjusted dipole polar pattern 452, may be considered noise and (because the noise falls outside of time adjusted dipole polar pattern 452) may be suppressed. Time adjusted dipole polar pattern 452 may be continuously updated based on a current timing difference. For example, time adjusted dipole polar pattern 452 may be adjusted based on the timing difference in instances in which user 110 positions one of microphones 100-L-R close to mouth 112 and maintains the other microphone at a position further away from mouth 112.
According to an embodiment, time adjusted dipole polar pattern 452 may also be adjusted based on input received from a vibrational sensor (not shown) associated with dual microphone array 100 (i.e., a sensor that detects vibrations generated by bone conducted speech). Dual microphone array 100 may use the detected vibration as an input to identify instances in which user 110 is speaking. Time adjusted dipole polar pattern 452 may be activated (i.e., sound may be passed/allowed) based on whether user 110 has been identified as currently speaking. If the user is not speaking, sound may be suppressed/blocked.
A timing difference between sound received at left microphone 100-L and right microphone 100-R is independent of phase of the sound (i.e., sound from mouth 112 travels at a constant velocity regardless of phase). Accordingly, by adjusting the timing difference between output signals from left microphone 100-L and right microphone 100-R, dipole polar pattern 500 may be determined independent of frequency. In contrast to frequency dependent polar patterns (not shown), in which a full signal may be detected for in-phase sounds and a lower signal for out-of-phase sounds, dipole polar pattern 500 detects sounds, regardless of phase, in a particular direction. Dipole polar pattern 500 may provide improved directivity when compared to other dipole polar patterns.
According to one embodiment, dipole polar pattern 500 may be determined based on a predetermined threshold for timing correlation. The units for the predetermined threshold are time, on the scale of hundreds of microseconds for an implementation such as shown in
The size of the predetermined threshold determines an opening angle 502 (shown as 43.1 degrees) in dipole polar pattern 500. A large predetermined threshold (i.e., a large timing difference) gives a large opening angle 502 and a small threshold gives a small opening angle 502 in dipole polar pattern 500. For example, a sound may be a limited sequence of samples (e.g., 220 consecutive samples at a sample frequency of 44 kHz correspond to a sound with a duration of 5 milliseconds) from both left microphone 100-L and right microphone 100-R. Left microphone 100-L and right microphone 100-R may be 78 mm apart. At 44 kHz sampling rate, each sample is about 7.8 mm long. A threshold timing window of +/−5 samples (equal to +/−0.1 milliseconds), may correspond to an opening angle 502 of +/−30 degrees (i.e. 60 degrees total) in dipole polar pattern 500.
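The relationship between the sample threshold and opening angle 502 in this example can be checked numerically (the speed of sound, microphone spacing, and sample rate are taken from the example above; the function itself is illustrative only, not part of the disclosure):

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, in air at roughly room temperature
SAMPLE_RATE = 44_000    # Hz, as in the example above
MIC_SPACING = 0.078     # m (78 mm between the microphones)

def opening_angle(threshold_samples):
    """Half opening angle (degrees) of the time-adjusted dipole
    pattern for a timing threshold expressed in samples."""
    # Path-length difference tolerated by the threshold; one sample
    # corresponds to about 7.8 mm of travel at 44 kHz.
    path_diff = threshold_samples * SPEED_OF_SOUND / SAMPLE_RATE
    # sin(theta) = path-length difference / microphone spacing
    return math.degrees(math.asin(min(1.0, path_diff / MIC_SPACING)))
```

A threshold of +/−5 samples yields a half angle of about 30 degrees, matching the +/−30 degree opening angle stated above.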
According to another embodiment, a scale factor may be set between timing and suppression of sounds. This scale factor may be selected to provide a selectable transition angle between suppression and passing of sound based on particular requirements. Further filtering may be applied to improve the performance compared to the summed output of left microphone 100-L and right microphone 100-R, for instance as described with respect to FIGS. 6 and 7A-7D.
As shown in
According to one implementation, a figure eight polar pattern 612 (e.g., half a wavelength between the microphones) may be selected for particular frequencies to form a beam that allows sound to be included in a microphone signal. Figure eight polar pattern 612 has a directivity index of 2 in the plane and of 4 in space. In other words, of surrounding noise coming from all directions, only noise that originates from a particular 25% of the directions may be detected/received (i.e., noise may only pass the dipole figure of eight from 25% of possible directions), while sounds from mouth 112 may be unaffected because these are within figure eight polar pattern 612.
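Purely as an illustrative sketch (the FFT-based band split, the band count, the noise-power estimate, and the function name are all assumptions, not the disclosed embodiment), the per-band noise filtering recited in the summary could look like this: each frequency band's power is compared against a noise estimate, and bands falling below a signal-to-noise floor are attenuated.

```python
import numpy as np

def suppress_noisy_bands(signal, n_bands=8, snr_floor=2.0,
                         noise_power=None):
    """Split one frame into frequency bands via an FFT, estimate
    per-band power, and zero out bands whose power falls below
    `snr_floor` times the supplied per-band noise-power estimate."""
    spectrum = np.fft.rfft(signal)
    edges = np.linspace(0, len(spectrum), n_bands + 1, dtype=int)
    if noise_power is None:
        # Placeholder noise estimate; a real system would track this.
        noise_power = np.full(n_bands, 1e-6)
    for b in range(n_bands):
        lo, hi = edges[b], edges[b + 1]
        band_power = np.mean(np.abs(spectrum[lo:hi]) ** 2)
        if band_power / noise_power[b] < snr_floor:
            spectrum[lo:hi] = 0.0  # suppress the noisy band
    return np.fft.irfft(spectrum, n=len(signal))
```

In the described implementation, a polar pattern (rather than a simple zeroing gate) would be chosen per band based on its signal-to-noise ratio; the gate above only illustrates the band-wise structure.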
When user 110 is speaking, the voice signal is present in both microphones 100-L-R simultaneously.
The levels of the signals from the two microphones may be integrated over a selected time slot. As shown in
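A minimal sketch of this time-slot selection follows (the RMS level as a stand-in for SPL, the slot length, and the function name are assumptions for illustration):

```python
import numpy as np

def select_quieter_channel(left, right, slot_len=2048):
    """For each time slot, output the channel with the lower average
    level (RMS used here as a stand-in for integrated SPL)."""
    out = np.empty_like(left)
    for start in range(0, len(left), slot_len):
        l = left[start:start + slot_len]
        r = right[start:start + slot_len]
        # Keep the slot from whichever microphone is quieter,
        # i.e., the one with less relative noise.
        if np.sqrt(np.mean(l ** 2)) <= np.sqrt(np.mean(r ** 2)):
            out[start:start + len(l)] = l
        else:
            out[start:start + len(r)] = r
    return out
```

Since the voice is present at similar level in both channels, picking the quieter channel per slot tends to discard the one currently carrying more wind or scratch noise.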
According to an implementation, the transition between microphone signals (i.e., from one microphone signal to the other microphone signal when the relative noise switches) may be performed at “zero crossing”, i.e. when the levels are low. If there is a difference between the signals in the transition from one microphone to the other, smoothing may also be applied.
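The switch point near the slot border could be located, for instance, with a hypothetical helper like the following (not the disclosed implementation; the sign-change search is one simple way to find a zero crossing):

```python
import numpy as np

def nearest_zero_crossing(x, border):
    """Return the sample index nearest `border` where the signal
    changes sign, so a channel switch can occur where the level is
    low. Falls back to `border` if no crossing exists."""
    sign_change = np.where(np.diff(np.signbit(x)))[0] + 1
    if len(sign_change) == 0:
        return int(border)
    return int(sign_change[np.argmin(np.abs(sign_change - border))])
```

Switching channels at such a point keeps the discontinuity between the two signals small; any remaining difference can then be smoothed as described.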
MCU 104 may receive a right microphone signal from right microphone 100-R (block 802). For example, right microphone 100-R may receive sound from one or both of mouth 112 or extraneous noise, such as wind noise or scratch noise. MCU 104 may store the right microphone signal in a right microphone buffer (not shown).
MCU 104 may receive a left microphone signal from left microphone 100-L (block 804). MCU 104 may store the left microphone signal in a left microphone buffer (not shown).
MCU 104 may determine a timing difference between the left microphone signal and the right microphone signal (block 806). For example, MCU 104 may determine whether the left microphone signal is received within a particular number of sound samples (and accordingly within a particular time) after the right microphone signal (i.e., the sound arrives at each of right microphone 100-R and left microphone 100-L at approximately the same time). MCU 104 may subtract the time that the left microphone signal is received from the time that the corresponding right microphone signal is received.
MCU 104 may determine whether the timing difference is within a time threshold (block 808), such as described above with respect to
At block 810, MCU 104 may time shift one of left microphone signal and right microphone signal based on the timing difference when the timing difference is within the time threshold (block 808=yes). MCU 104 may sum the shifted microphone signal and the other microphone signal to form an output signal (block 812).
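Blocks 808 through 812 might be sketched as follows (the lag convention matches the cross-correlation sketch earlier, the threshold default of five samples follows the +/−5 sample example given above, and the function itself is illustrative):

```python
import numpy as np

def shift_and_sum(left, right, lag, threshold=5):
    """Align and sum the two channels when the timing difference is
    within the threshold (block 808 = yes); otherwise return None so
    the caller can treat the sound as noise (block 808 = no). A
    positive `lag` means the left signal lags the right one."""
    if abs(lag) > threshold:
        return None
    if lag >= 0:
        # Advance the left channel by `lag` samples, zero-padding.
        aligned = np.concatenate([left[lag:], np.zeros(lag)])
        return aligned + right
    # Otherwise advance the right channel.
    aligned = np.concatenate([right[-lag:], np.zeros(-lag)])
    return left + aligned
```

Summing the aligned channels reinforces the correlated voice signal while uncorrelated noise adds incoherently.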
MCU 104 may also filter the signals, for instance as described with respect to
According to another implementation, the microphone signals may be filtered using frequency and/or amplitude correlation to sort out and suppress noise sources. MCU 104 may pass (i.e., allow) sounds with high correlation in amplitude and/or frequency (i.e., MCU 104 may attribute sounds that fulfill these criteria to mouth 112). MCU 104 may suppress (or discard) sounds that do not fulfill the required criteria, such as sounds with different amplitudes (e.g., sounds that may come from a person speaking nearby). The intensity of a voice from someone nearby (e.g., someone speaking over user 110's shoulder) will decrease with distance, and may give different amplitudes in the two microphones.
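Purely as a sketch of the amplitude criterion above (the 3 dB tolerance and the helper name are assumptions), the average levels of the two channels could be compared like this:

```python
import numpy as np

def amplitude_correlated(left, right, tol_db=3.0):
    """Check whether the two channels have similar average amplitude.
    Sound from the mouth arrives at roughly equal level in both
    microphones, while an off-axis talker is louder in the nearer
    microphone."""
    rms_l = np.sqrt(np.mean(np.square(left)))
    rms_r = np.sqrt(np.mean(np.square(right)))
    diff_db = 20.0 * np.log10(
        max(rms_l, rms_r) / max(min(rms_l, rms_r), 1e-12))
    return diff_db <= tol_db
```

Sounds failing this check would be candidates for suppression as off-axis noise, subject to the other timing and frequency criteria described.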
At block 816, MCU 104 may suppress noise in dual microphone array 100 when the timing difference is not within the time threshold (block 808=no). For example, MCU 104 may discard uncorrelated sounds arriving at one microphone (e.g., left microphone 100-L) and at a time greater than the time threshold.
As described above, process 800 may occur continuously as sound is detected by right microphone 100-R and left microphone 100-L.
The foregoing description of implementations provides illustration, but is not intended to be exhaustive or to limit the implementations to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practice of the teachings. For example, the techniques described above may well be combined with known noise suppressing techniques used on a single microphone. Additionally, although examples are described with respect to a dual microphone array, principles disclosed may be extended to a microphone array including more than two microphones.
In the above, while series of blocks have been described with regard to the exemplary processes, the order of the blocks may be modified in other implementations. In addition, non-dependent blocks may represent acts that can be performed in parallel to other blocks. Further, depending on the implementation of functional components, some of the blocks may be omitted from one or more processes.
It will be apparent that aspects described herein may be implemented in many different forms of software, firmware, and hardware in the implementations illustrated in the figures. The actual software code or specialized control hardware used to implement aspects does not limit the invention. Thus, the operation and behavior of the aspects were described without reference to the specific software code—it being understood that software and control hardware can be designed to implement the aspects based on the description herein.
It should be emphasized that the term “comprises/comprising” when used in this specification is taken to specify the presence of stated features, integers, steps or components but does not preclude the presence or addition of one or more other features, integers, steps, components, or groups thereof.
Further, certain portions of the implementations have been described as “logic” that performs one or more functions. This logic may include hardware, such as a processor, a microprocessor, an application specific integrated circuit, or a field programmable gate array, software, or a combination of hardware and software.
No element, act, or instruction used in the present application should be construed as critical or essential to the implementations described herein unless explicitly described as such. Also, as used herein, the article “a” is intended to include one or more items. Further, the phrase “based on” is intended to mean “based, at least in part, on” unless explicitly stated otherwise.
Filing Document | Filing Date | Country | Kind | 371(c) Date
---|---|---|---|---
PCT/IB12/52141 | 4/27/2012 | WO | 00 | 3/15/2013