This disclosure relates to a dual-use bilateral microphone array, and to controlling wind noise in such an array.
Hearing aids often include two microphones, which are used to form a two-microphone beam-forming array that potentially optimizes the detection of sound in a particular direction, typically the direction the user is looking. Each hearing aid (i.e., one for each ear) has such an array, operating independently of the other. Earpieces meant for communications, such as Bluetooth® headphones, also often include two-microphone arrays, aimed not at the far-field, but at the user's own mouth, to detect the user's voice for transmission to a far-end conversation partner. Such arrays are typically provided only on a single earpiece, even in devices having two earpieces.
The use of four microphones total, two in each ear, is described in U.S. Patent application publication 2015/0230026, incorporated here by reference. That disclosure provides improved performance over using a separate pair of microphones for each ear, in the context of detecting the voice of another person, for assisting the user in hearing and conversing with the other person in a noisy environment.
In general, in one aspect, a first earphone has a first microphone array including a first front microphone, providing a first front microphone signal, and a first rear microphone, providing a first rear microphone signal, and a first speaker. A second earphone has a second microphone array, including a second front microphone, providing a second front microphone signal, and a second rear microphone, providing a second rear microphone signal, and a second speaker. A processor receives the first front microphone signal, first rear microphone signal, second front microphone signal, and second rear microphone signal, uses a first set of filters to combine the four microphone signals to generate a far-field signal that is more sensitive to sounds originating a short distance away from the apparatus than to sounds close to the apparatus, and provides the far-field signal to the speakers for output. The processor also uses a second set of filters to combine the four microphone signals to generate a near-field signal that is more sensitive to voice signals from a person wearing the earphones than to sounds originating away from the apparatus, and provides the near-field signal to a communication system.
Implementations may include one or more of the following, in any combination. The first microphone array and second microphone array may be physically arranged to optimize detection of sounds a short distance away from the apparatus. The two front microphones may face forward when the earphones are worn, the two rear microphones may face rearward when the earphones are worn, and a line through the microphones of the first array may intersect a line through the microphones of the second array at a position about two meters ahead of the earphones when worn by a typical adult human. The processor may use a third set of filters, different from the second set of filters, to combine the four microphone signals to generate a second near-field signal that is more sensitive to voice signals from the person wearing the earphones than to sounds originating away from the apparatus, and provide the second near-field signal to the speakers for output. Providing the far-field signal to the speakers may include filtering the far-field signal according to a set of user preferences associated with an individual user. The processor may be made up of several sub-processors, and the filtering of the far-field signal according to the set of user preferences may be performed by a separate sub-processor from the sub-processor which applies the first set of filters to combine the four microphone signals to generate the far-field signal.
The processor may generate the far-field signal and provide the far-field signal to the speakers by using a third set of filters, different from the first set of filters, to combine the four microphone signals to generate a second far-field signal that is more sensitive to sounds a short distance away from the apparatus than to sounds close to the apparatus, providing the first far-field signal to the first speaker, and providing the second far-field signal to the second speaker. Providing the first far-field signal and the second far-field signal to the respective first and second speakers may include filtering the first far-field signal according to a set of user preferences associated with a first ear of an individual user, and filtering the second far-field signal according to a set of user preferences associated with a second ear of an individual user. The processor may generate the near-field signal by summing the signals corresponding to the first front microphone and the second front microphone to form a combined front microphone signal, summing the signals corresponding to the first rear microphone and the second rear microphone to form a combined rear microphone signal, filtering the combined front microphone signal to form a filtered combined front microphone signal, filtering the combined rear microphone signal to form a filtered combined rear microphone signal, and combining the filtered combined front microphone signal and the filtered combined rear microphone signal to form a directional microphone signal, the near-field signal including the directional microphone signal. The processor may operate the first and second sets of filters simultaneously.
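The near-field signal path described above (sum the two front microphones, sum the two rear microphones, filter each sum, and combine) can be sketched as follows. This is only an illustrative sketch, not the disclosed implementation: the function names `fir` and `near_field_signal`, the use of FIR filters, and the example tap values are assumptions, and the combining step is assumed to be a sample-wise sum with any subtraction carried by the sign of the filter taps.

```python
def fir(signal, taps):
    """Apply an FIR filter by direct convolution (output length = input length)."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * signal[n - k]
        out.append(acc)
    return out


def near_field_signal(left_front, right_front, left_rear, right_rear,
                      front_taps, rear_taps):
    """Sum front and rear microphones across ears, filter each sum, combine.

    front_taps and rear_taps are hypothetical placeholders for the filters
    that would steer the combined array toward the wearer's mouth.
    """
    combined_front = [a + b for a, b in zip(left_front, right_front)]
    combined_rear = [a + b for a, b in zip(left_rear, right_rear)]
    filtered_front = fir(combined_front, front_taps)
    filtered_rear = fir(combined_rear, rear_taps)
    # Assumed combining step: sample-wise sum of the two filtered signals.
    return [f + r for f, r in zip(filtered_front, filtered_rear)]


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0]
    # Illustrative delay-and-subtract taps: front passes, rear is delayed and negated.
    print(near_field_signal(impulse, impulse, impulse, impulse,
                            front_taps=[1.0], rear_taps=[0.0, -1.0]))
```

Summing the left and right signals before filtering means only two filters are run instead of four, at the cost of treating the two ears identically.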
In general, in one aspect, a first earphone has a first microphone array including a first front microphone, providing a first front microphone signal, and a first rear microphone, providing a first rear microphone signal, and a first speaker. A second earphone has a second microphone array, including a second front microphone, providing a second front microphone signal, and a second rear microphone, providing a second rear microphone signal, and a second speaker. A processor receives the first front microphone signal, first rear microphone signal, second front microphone signal, and second rear microphone signal. The first microphone array and the second microphone array are physically arranged to have greater sensitivity to sounds a short distance away from the apparatus than to sounds close to the apparatus. The processor uses a first set of filters to combine the four microphone signals to generate a near-field signal that is more sensitive to voice signals from a person wearing the earphones than to sounds originating away from the apparatus, and provides the near-field signal to a communication system for output.
In general, in one aspect, a first earphone has a first microphone array providing a first plurality of microphone signals, and a first speaker. A second earphone has a second microphone array providing a second plurality of microphone signals, and a second speaker. A processor receives the first plurality of microphone signals and second plurality of microphone signals, and applies a first set of filters to a subset of the plurality of microphone signals from each of the first microphone array and the second microphone array, the first set of filters inverting the signals below a cutoff frequency, and provides the first-filtered signals and the remainder of the microphone signals from each of the first microphone array and the second microphone array to a second set of filters. The processor also uses the second set of filters to combine the microphone signals to generate a far-field signal that is more sensitive to sounds originating a short distance away from the apparatus than to sounds close to the apparatus above the cutoff frequency, and omnidirectional below the cutoff frequency, determines a level of wind noise present in the microphone signals, adjusts the cutoff frequency as a function of the determined level of wind noise, and provides the far-field signal to the speakers for output.
Implementations may include one or more of the following, in any combination. The processor may, after generating the far-field signal in the second set of filters, apply gain to the output of the filters below a second cutoff frequency which is a function of the first cutoff frequency. The processor may, after generating the far-field signal in the first set of filters, apply a high-pass filter to the output of the filters. The processor may determine a total low-frequency energy present in the microphone signals, and upon determining that the total sound level is below a first threshold, and the level of wind noise is below a second threshold, increase the cutoff frequency of the first set of filters. Generating the far-field signal may include determining a total low-frequency energy present in the microphone signals, computing a sum of the microphone signals, computing a difference of the microphone signals, comparing the sum of the microphone signals to the difference of the microphone signals, and determining the cutoff frequency based on the results of the comparison. Computing the difference of the microphone signals may include computing a first difference of microphone signals in the first plurality of microphone signals, computing a second difference of microphone signals in the second plurality of microphone signals, and computing a difference of the first difference and the second difference as the difference of the microphone signals.
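The sum-versus-difference comparison described above can be illustrated with a short sketch. It rests on assumptions that are not stated in this disclosure: the inputs are taken to be already low-pass filtered, the comparison is taken to be an energy ratio, and the mapping from ratio to cutoff frequency (including `f_min`, `f_max`, `ratio_lo`, and `ratio_hi`) is a hypothetical linear interpolation. The underlying intuition is that low-frequency acoustic sound tends to be nearly identical at closely spaced microphones, so the difference is small relative to the sum, whereas wind noise tends to be poorly correlated between microphones.

```python
def mean_square(x):
    """Average power of a block of samples."""
    return sum(v * v for v in x) / len(x)


def wind_ratio(left_front, left_rear, right_front, right_rear):
    """Ratio of difference energy to sum energy for four microphone blocks."""
    total = [a + b + c + d for a, b, c, d in
             zip(left_front, left_rear, right_front, right_rear)]
    # Difference of the per-ear differences, as described in the text.
    diff = [(a - b) - (c - d) for a, b, c, d in
            zip(left_front, left_rear, right_front, right_rear)]
    eps = 1e-12  # guards against division by zero on silent input
    return mean_square(diff) / (mean_square(total) + eps)


def cutoff_from_wind(ratio, f_min=100.0, f_max=1000.0,
                     ratio_lo=0.1, ratio_hi=1.0):
    """Map a wind ratio to a cutoff frequency in Hz (hypothetical mapping)."""
    frac = (ratio - ratio_lo) / (ratio_hi - ratio_lo)
    frac = min(1.0, max(0.0, frac))
    return f_min + frac * (f_max - f_min)


if __name__ == "__main__":
    tone = [0.5, 0.4, 0.1, -0.3]
    print(cutoff_from_wind(wind_ratio(tone, tone, tone, tone)))
```

A higher ratio raises the cutoff, which matches the behavior described later in the text, where the inversion frequency rises with the wind level.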
In general, in one aspect, a first earphone has a first microphone array providing a first plurality of microphone signals, and a first speaker. A second earphone has a second microphone array providing a second plurality of microphone signals, and a second speaker. A processor receives the first plurality of microphone signals and second plurality of microphone signals, and uses a first set of filters to combine the microphone signals to generate a far-field signal that is more sensitive to sounds originating a short distance away from the apparatus than to sounds close to the apparatus above a cutoff frequency, and omnidirectional below the cutoff frequency, determines a level of wind noise present in the microphone signals, adjusts the cutoff frequency as a function of the determined level of wind noise, and provides the far-field signal to the speakers for output. The processor also uses a second set of filters to combine the microphone signals to generate a near-field signal that is more sensitive to voice signals from a person wearing the earphones than to sounds originating away from the apparatus, combines the microphone signals to generate an omnidirectional signal, combines the near-field signal and the omnidirectional signal using a weighted sum, the weight being a function of the determined level of wind noise to generate a communication signal, and provides the communication signal to a communication system.
Implementations may include one or more of the following, in any combination. The processor may determine the level of wind noise for adjusting the cutoff frequency based on a comparison of a sum of the microphone signals to a difference of the microphone signals, and determine the level of wind noise for adjusting the weight applied to the near-field signal in the communication signal based on a comparison of the near-field signal to the omnidirectional signal. Generating the far-field signal may include applying an all-pass filter to a subset of the plurality of microphone signals from each of the first microphone array and the second microphone array, the all-pass filter inverting the signals below the cutoff frequency, and providing the all-pass-filtered signals and the remainder of the microphone signals from each of the first microphone array and the second microphone array to the first set of filters. Generating the near-field signal and omnidirectional signal may include applying a third set of filters to a first subset of the plurality of microphone signals from each of the first microphone array and the second microphone array, applying a fourth set of filters to a second subset of the plurality of microphone signals from each of the first microphone array and the second microphone array, combining the filtered first subset with the filtered second subset to generate the near-field signal, and summing the first subset and the second subset to generate the omnidirectional signal. Generating the near-field signal and omnidirectional signal may also include summing the first subset and providing the summed first subset to the third set of filters, summing the second subset and providing the summed second subset to the fourth set of filters, and summing the summed first subset and the summed second subset to generate the omnidirectional signal.
The processor may be made up of several sub-processors, and the summing of the first and second subsets may be performed by a separate sub-processor from the applying of the third and fourth filters and combining of the filtered subsets.
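The weighted combination of the near-field and omnidirectional signals can be sketched as below. The sketch is hedged on several assumptions: the comparison of the two signals is taken to be an energy ratio, a high ratio is taken to indicate wind (a directional array tends to amplify uncorrelated noise relative to an omnidirectional sum), and the thresholds `lo` and `hi` and the linear cross-fade are hypothetical choices made only for illustration.

```python
def energy(x):
    """Total energy of a block of samples."""
    return sum(v * v for v in x)


def communication_signal(near_field, omni, lo=1.0, hi=4.0):
    """Cross-fade from the near-field signal toward the omnidirectional signal.

    Returns (mixed_block, weight), where weight is the share given to the
    near-field signal. lo and hi are hypothetical energy-ratio thresholds.
    """
    eps = 1e-12  # guards against division by zero on silent input
    ratio = energy(near_field) / (energy(omni) + eps)
    wind = min(1.0, max(0.0, (ratio - lo) / (hi - lo)))
    weight = 1.0 - wind
    mixed = [weight * n + (1.0 - weight) * o
             for n, o in zip(near_field, omni)]
    return mixed, weight


if __name__ == "__main__":
    calm, w_calm = communication_signal([1.0, 1.0], [1.0, 1.0])
    windy, w_windy = communication_signal([4.0, -4.0], [1.0, 1.0])
    print(w_calm, w_windy)
```

In a running system the weight would presumably be smoothed over time so that the mix does not change abruptly between blocks.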
In general, in one aspect, a first earphone has a first microphone, providing a first microphone signal, and a first speaker. A second earphone has a second microphone, providing a second microphone signal, and a second speaker. A processor receives the first microphone signal and second microphone signal, and uses a first set of filters to combine the microphone signals to generate an output signal. The processor generates the output signal by applying a low-pass filter to each of the first microphone signal and the second microphone signal, comparing the low-pass-filtered first microphone signal to the low-pass-filtered second microphone signal and determining whether one may have a greater noise content than the other, and upon determining that the first microphone signal has greater noise content than the second microphone signal, decreasing an amount of gain applied to the first microphone signal below a cutoff frequency in the first set of filters. Upon subsequently determining that the first microphone signal no longer has greater noise content than the second microphone signal, the processor restores the amount of gain applied to the first microphone signal in the first set of filters.
Implementations may include one or more of the following, in any combination. The processor may, upon determining that the first microphone signal has greater noise content than the second microphone signal, decrease an amount of gain applied to the first microphone signal below the cutoff frequency in a second set of filters, and upon subsequently determining that the first omnidirectional signal no longer has greater noise content than the second omnidirectional signal, restore the amount of gain applied to the first microphone signal in the second set of filters, and use the second set of filters to combine the microphone signals to generate a second output signal, where the first output signal is provided to the speakers and the second output signal is provided to a communication system. The first set of filters may produce a far-field array signal, and the second set of filters may produce a near-field array signal. The first earphone may include a third microphone, providing a third microphone signal, the second earphone may include a fourth microphone, providing a fourth microphone signal, and the processor may compare the first microphone signal to the second microphone signal by subtracting the signals corresponding to the third microphone from the first microphone to form a first difference signal, subtracting the signals corresponding to the fourth microphone from the second microphone to form a second difference signal, and comparing the first difference signal to the second difference signal and determining whether one may have a greater noise content than the other.
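The left/right comparison described in this aspect can be sketched as follows. Everything specific in the sketch is an assumption made for illustration: the low-pass filter is a one-pole smoother with coefficient `alpha`, "greater noise content" is approximated by greater low-band energy, and `margin` and `reduced` are hypothetical tuning values. Because the gains are recomputed for each block, the gain returns to 1.0 once the imbalance disappears, which corresponds to the restoring step.

```python
def one_pole_lowpass(x, alpha=0.1):
    """Simple one-pole low-pass filter (exponential smoothing)."""
    out, state = [], 0.0
    for v in x:
        state += alpha * (v - state)
        out.append(state)
    return out


def energy(x):
    """Total energy of a block of samples."""
    return sum(v * v for v in x)


def low_band_gains(left, right, margin=2.0, reduced=0.25):
    """Return (left_gain, right_gain) for the band below the cutoff."""
    e_left = energy(one_pole_lowpass(left))
    e_right = energy(one_pole_lowpass(right))
    if e_left > margin * e_right:
        return reduced, 1.0   # left side appears noisier
    if e_right > margin * e_left:
        return 1.0, reduced   # right side appears noisier
    return 1.0, 1.0           # no clear imbalance: gains restored


def apply_low_band_gain(x, gain, alpha=0.1):
    """Scale only the low band of x, leaving the remainder untouched."""
    low = one_pole_lowpass(x, alpha)
    return [gain * l + (v - l) for v, l in zip(x, low)]


if __name__ == "__main__":
    rumble = [1.0] * 50            # strong low-frequency content
    quiet = [0.1, -0.1] * 25       # mostly high-frequency content
    print(low_band_gains(rumble, quiet))
```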
Advantages include improving both far-field sound detection for conversation assistance and near-field sound detection for remote communication, in a single device. Rejection of wind noise is also improved.
All examples and features mentioned above can be combined in any technically possible way. Other features and advantages will be apparent from the description and the claims.
In a new headphone architecture shown in
The processor 112 applies a number of configurable filters to the signals from the various microphones. The provision of a high-bandwidth communication channel from all four microphones 126, 128, 130, 132, two located at each ear, to a shared processing system provides new opportunities in both local conversation assistance and communication with a remote person or system. Specifically, as shown in
A third set of filters 206 is used to combine the four microphone signals to form a near-field array optimized for detecting the user's own voice. When we say the array is optimized for detecting the user's own voice, we mean that the sensitivity of the array to signals originating from the user's mouth is greater than the sensitivity to sounds originating farther from the headphones. Even with the microphones 126, 128, 130, 132 physically arranged to optimize far-field pickup in front of the user, the combination of all four microphones has been found to provide near-field voice performance at least as good as, and in some cases better than, a two-microphone array in the same earbud location but physically aimed at the user's mouth.
In some examples, yet another set of filters 208 is used for providing the user's voice back to the user, commonly called side-tone. The side-tone voice signal may be filtered differently from the outbound voice signal to account for the effect of the earphone's acoustics on the user's perception of his or her own voice. Finally, active noise reduction (ANR) filters 210, 212 for each ear use at least one of the local microphones to produce noise-cancelling signals. The ANR filters may use one or both external microphones and the feedback microphone for each ear to cancel ambient noise. In some examples, the external microphones from the opposite ear may also be used for ANR in each ear.
The ANR signals, far-field array signals, side-tone signals, and any incoming communication or entertainment signals (not shown) are summed for each ear. As shown in
In some examples, as shown in
Far-Field Filtering
An example topology for far-field microphone processing is shown in
The particular filters and related signal processing for generating the far-field signals for output to the left and right ear are described in application U.S. 2015/0230026, incorporated by reference above. All of the filtering, summing, equalizing, and processing shown in
Near-Field Communication Filters
As noted above, even with the four microphones physically arranged to optimize far-field voice pickup, when all four are combined, they also produce good near-field voice signals for communication purposes. Previous communication headsets have combined two microphones to improve detection of the user's voice, for example, in a beam-forming array aimed at the user's mouth. At a high level, the same type of processing shown in
Side-Tone Filters
In headsets that block the user's ear, hearing their own voice played back can help the user control the level at which they speak, and feel more comfortable talking into the headset. As anyone who has listened to a recording of themselves can relate, however, simply providing the outbound communication signal to the user's ear may not sound natural. This is even more pronounced due to the way the earphones 102, 104 change how the user perceives their own voice. U.S. Pat. No. 9,020,160, incorporated here by reference, discusses ways of filtering feedback and feed-forward microphone signals to produce a self-voice signal that sounds more natural. These techniques can be used in the present architecture either using all four microphones, as shown by filter 208 in
In a simplified example, such as in the example of
Wind-Noise Mitigation
As noted above, two microphones have previously been used as beam-forming arrays to detect the user's voice. In other examples, as described in U.S. Pat. No. 8,620,650, incorporated here by reference, two microphone signals can be combined to optimize rejection of ambient and wind noise. This can be adapted to the example of
The far-field array signal is also susceptible to wind noise, but different processing is used to manage it. In some examples, as shown in
A second set of wind filters 624 is applied after the far-field array processing 204. This second set of wind filters does two things: it decreases low-frequency gain, and it applies a high-pass filter. In the normal far-field array processing, high gain is applied at lower frequencies to account for the loss of energy due to the directionality of the array. As the sensitivity at lower frequencies is shifted to being omnidirectional, this energy is restored and the gain can be reduced. The cutoff frequency of this low-frequency gain is based on the cutoff frequency of the all-pass filters 622, but may not be exactly the same frequency. At the same time, the high-pass filter removes whatever residual wind noise is still picked up; at particularly high wind levels, this may be more effective than the other techniques. As the wind level increases, both the low-frequency gain cutoff frequency and the high-pass filter cutoff frequency are raised, following the rising inversion frequency of the wind pre-filters.
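The all-pass filters 622 referred to above invert the signal below an adjustable frequency. The disclosure does not say how such a filter is realized; one conventional possibility, shown here purely as an assumed sketch, is a negated first-order all-pass section. It has unity gain at all frequencies, a phase of 180 degrees at DC, 90 degrees at the nominal cutoff, and approaches 0 degrees at the Nyquist frequency, so the inversion is gradual rather than abrupt. The function names and the coefficient formula (the standard first-order all-pass design) are not taken from this document.

```python
import math


def allpass_coefficient(cutoff_hz, sample_rate_hz):
    """Coefficient of a first-order all-pass with 90 degrees of shift at cutoff."""
    t = math.tan(math.pi * cutoff_hz / sample_rate_hz)
    return (t - 1.0) / (t + 1.0)


def inverting_allpass(x, cutoff_hz, sample_rate_hz):
    """Negated first-order all-pass: inverts low frequencies, passes high ones."""
    c = allpass_coefficient(cutoff_hz, sample_rate_hz)
    out = []
    x_prev = 0.0
    y_prev = 0.0
    for v in x:
        # All-pass difference equation: y[n] = c*x[n] + x[n-1] - c*y[n-1]
        y = c * v + x_prev - c * y_prev
        out.append(-y)  # negation moves the inversion from high to low frequencies
        x_prev, y_prev = v, y
    return out


if __name__ == "__main__":
    dc = [1.0] * 2000
    print(inverting_allpass(dc, 500.0, 16000.0)[-1])
```

Raising `cutoff_hz` as the estimated wind level increases would widen the band over which the filtered microphone signals are inverted, in line with the rising inversion frequency described above.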
Mitigation of White Noise Gain at Low Frequencies
In some examples, also shown in
Bilateral Wind Mitigation
Rather than combining the left and right microphone signals, as mentioned above in the discussion of near-field voice pickup, the wind-vs-ambient noise mixing algorithm used for the near-field signal can also be adapted to use separate left and right microphone signals to optimize rejection of noise that is asymmetric in the far-field microphone signal, e.g., if wind is striking the user from one side more than the other. In this example, as shown in
The summing and comparison can be done in each of the array processors (assuming there are two, as in some of the examples), or done in one of them and a control signal provided to the other. If the communication processor were provided with all four microphone signals, rather than with the pre-summed front and rear signal pairs, then a similar left/right wind noise control could be applied to the near-end voice signal in combination with the omnidirectional/directional wind noise control shown in
Simultaneous Operation
With sufficient processing power, the different sets of filters can be used in parallel to simultaneously produce the near-field and far-field signals. This allows the user to hear their own voice and a conversation partner's voice simultaneously (i.e., if they are talking over each other), or to talk on the wireless connection at the same time as listening to another person. Aside from simply multitasking, the latter can be useful if more than one person in a conversation is using a device such as the one described herein. See, for example, U.S. Pat. No. 9,190,043, the entire contents of which are incorporated here by reference. Each of the multiple headsets can transmit its user's locally-detected voice, from the near-field filters, to the other headsets, where it can be combined with the results of that headset's far-field filters to provide the user with a complete set of their conversation partners' voices.
The simultaneous detection of near-field and far-field voice can also be useful where the near-field is not being used for conversation. For example, if the headset implements or is connected to a voice personal assistant (VPA), the near-field signal can be directed to that system, or to a wake-up word detection process. The near-field signal should provide a higher signal-to-noise ratio for this than simply using ambient microphones.
The near-field and far-field signals can also be compared to each other. One result of this comparison could be to estimate the proximity of the dominant signal—if the correlation of the two is high, it is the user speaking. This can be used for a voice activity detector, or to change other noise reduction algorithms, to name two examples.
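A comparison of this kind could, for example, be a normalized correlation between a block of the near-field signal and the corresponding block of the far-field signal. The sketch below is only one assumed way of doing so; the function names and the `threshold` value are hypothetical, and it follows the statement above that a high correlation indicates the user is speaking.

```python
import math


def normalized_correlation(a, b):
    """Zero-lag normalized correlation of two equal-length blocks, in [-1, 1]."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0


def user_is_speaking(near_field, far_field, threshold=0.8):
    """Hypothetical voice-activity decision based on the correlation."""
    return normalized_correlation(near_field, far_field) >= threshold


if __name__ == "__main__":
    block = [1.0, 2.0, 3.0]
    print(user_is_speaking(block, block))
```

A practical detector would likely also search over a small range of time lags and smooth the decision across blocks.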
In the particular example of
Embodiments of the systems and methods described above comprise computer components and computer-implemented steps that will be apparent to those skilled in the art. For example, it should be understood by one of skill in the art that the computer-implemented steps may be stored as computer-executable instructions on a computer-readable medium such as, for example, Flash ROMS, nonvolatile ROM, and RAM. Furthermore, it should be understood by one of skill in the art that the computer-executable instructions may be executed on a variety of processors such as, for example, microprocessors, digital signal processors, gate arrays, etc. For ease of exposition, not every step or element of the systems and methods described above is described herein as part of a computer system, but those skilled in the art will recognize that each step or element may have a corresponding computer system or software component. Such computer system and/or software components are therefore enabled by describing their corresponding steps or elements (that is, their functionality), and are within the scope of the disclosure.
A number of implementations have been described. Nevertheless, it will be understood that additional modifications may be made without departing from the scope of the inventive concepts described herein, and, accordingly, other embodiments are within the scope of the following claims.
Number | Name | Date | Kind |
---|---|---|---|
8194881 | Haulick et al. | Jun 2012 | B2 |
8488829 | Ring | Jul 2013 | B2 |
8620650 | Walters et al. | Dec 2013 | B2 |
20050281415 | Lambert et al. | Dec 2005 | A1 |
20060262946 | Yang | Nov 2006 | A1 |
20090125303 | Tachibana | May 2009 | A1 |
20090147966 | McIntosh et al. | Jun 2009 | A1 |
20090209290 | Chen et al. | Aug 2009 | A1 |
20100046776 | Fischer et al. | Feb 2010 | A1 |
20100128901 | Herman | May 2010 | A1 |
20100142716 | Lee et al. | Jun 2010 | A1 |
20100280824 | Petit et al. | Nov 2010 | A1 |
20120008807 | Gran | Jan 2012 | A1 |
20120020485 | Visser | Jan 2012 | A1 |
20140093093 | Dusan | Apr 2014 | A1 |
20150117659 | Kirsch | Apr 2015 | A1 |
20150170632 | Olsson | Jun 2015 | A1 |
20160267899 | Gauger, Jr. | Sep 2016 | A1 |
20160381453 | Ushakov | Dec 2016 | A1 |
Number | Date | Country |
---|---|---|
1085378 | Apr 1994 | CN |
101658048 | Feb 2010 | CN |
H02-119900 | May 1990 | JP |
3106299 | May 1991 | JP |
H04058699 | Feb 1992 | JP |
2001-008282 | Jan 2001 | JP |
2008099200 | Aug 2008 | WO |
Entry |
---|
International Search Report and Written Opinion dated Jun. 6, 2012 for PCT/US2012/030686. |
International Search Report and the Written Opinion of the International Searching Authority dated Jul. 3, 2012 for PCT/US2012/030685. |
Wikipedia Microphone article, retrieved from the Internet Archive, entry dated Dec. 24, 2010. |
Translation of Japanese Patent No. 3106299. |