Some electronic devices, such as some wearable electronic devices, may detect a background noise level. For example, the detected background noise level may be used for noise cancellation, to adjust a playback volume for audio output, etc. To determine the background noise level, the electronic devices may monitor the audio received by a microphone. However, the microphone may pick up all noise, including noise that is not background noise. For example, the microphone may falsely detect a user's speech input as noise, and thus may provide for false estimations when calculating the background noise level.
One aspect of the disclosure provides a method for determining a background noise level. The method includes receiving, by one or more processors, audio from a first microphone and a second microphone. The method includes comparing, by the one or more processors, a first time the audio is received at the first microphone and a second time the audio is received at the second microphone, determining, based on the comparison, a source of the audio, such as whether the received audio is a user's voice or background noise, and suppressing, using the one or more processors and based on the source of the audio, audio received from a first source. The method may further include determining, using one or more sensors, whether a user is talking.
When the user is talking, the method includes suppressing, using the one or more processors, the audio from the first microphone so as to create an effect of receiving the audio through a microphone that is beamformed in a direction away from the user's mouth. The method may further include detecting, by the one or more processors, the user's speech in the suppressed audio and nullifying, using the one or more processors, the user's speech from the suppressed audio.
Another aspect of the disclosure provides for a device having two or more microphones and one or more processors in communication with the two or more microphones. The one or more processors may be configured to receive audio from a first microphone and a second microphone. The one or more processors may be further configured to compare a first time the audio is received at the first microphone and a second time the audio is received at the second microphone, determine a source of the audio, and suppress audio received from a first source.
In some instances, the one or more processors in communication with the two or more microphones are further configured to determine whether a user is talking. When the user is talking, the one or more processors may be configured to suppress the audio so as to create an effect of receiving the audio through a microphone beamformed in a direction away from the user's mouth. The one or more processors may be further configured to determine the user's speech, detect the user's speech in the suppressed signal, and nullify the user's speech from the suppressed signal.
Yet another aspect of the disclosure provides for a non-transitory computer-readable medium storing instructions, which when executed by one or more processors, cause the one or more processors to receive audio from a first microphone and a second microphone, compare a first time the audio is received at the first microphone and a second time the audio is received at the second microphone, determine a source of the audio, and suppress audio received from a first source based on the location of the source of the audio.
The systems and methods described herein relate to a device configured to determine a background noise level after removing speech interference. The device may include two omnidirectional microphones that receive audio. The device may compare the audio received through the first microphone with the audio received through the second microphone. For example, the device may compare a time at which the audio was received at the first microphone with a time at which it was received at the second microphone, a volume of the audio received through the first and second microphones, frequencies received through the first and second microphones, etc. Based on such comparison, the device may determine a source of the audio. For example, the device may determine whether the audio is a user talking or if the audio is background noise. According to some examples, the source of the audio may further be determined based on sensor information, such as from an accelerometer that detects when the user's mouth is moving in a way consistent with the user talking. Based on the source of the audio, and whether the user is talking, the audio may be processed using Digital Signal Processing (“DSP”).
For example, if the user is talking, the audio may be processed to suppress the user's speech from the audio. In this regard, the processed audio has the effect of being received through a beamformed microphone, aimed in a direction of the background noise instead of a direction of the user's speech, though the audio was actually received through two or more omnidirectional microphones. In some examples, the user's speech may be canceled completely from the received audio, such as by using a cancellation technique. Accordingly, the user's speech does not contribute to a false estimation of the background noise level.
The input may include an audio detection input, such as a first microphone 102 and a second microphone 104, for receiving audio input signals. As shown in
According to some examples, the input may further include a separate user input, such as a touch-sensitive housing, dial, button, or other control for receiving a manual command.
Other types of user input, such as motion sensors or other types of sensors, may be adapted to receive gesture input or the like.
As shown in
While only two microphones are shown in
The sensors may determine whether a user is talking. For example, the sensors may include an accelerometer that detects movement consistent with the user talking. The movement may include movement of the mouth or jaw of the user. According to other examples, the sensors may determine whether the device is being worn by a user. For example, the sensors may include touch sensors, heat sensors, motion sensors, or the like that detect conditions consistent with the device being inserted into a user's ear, worn on the head, or otherwise worn depending on a type of the device.
The internal electronics may include, for example, one or more processors or other components adapted to process the audio received through the two or more microphones 102, 104. Such processing may result in audio signals having an effect as if they were received through a beamformed microphone. For example, the internal electronics may determine a source of a particular audio signal, and process the received audio to reduce or remove audio signals from the determined source. For example, the internal electronics may determine whether the received audio is a user talking or if the audio is background noise. The internal electronics may determine the source by, for example, comparing audio received through the first and second microphones 102, 104. Such comparisons may relate to the audio's loudness in decibels (“dB”), the time at which the audio was received at each microphone, the frequency, etc.
By comparing the loudness of the signals received, the device may determine whether the location of the source of the audio is near the front of the user or behind the user. For example, if the first signal received by the first microphone is louder than the second signal received by the second microphone, the location of the source of the sound may be closer to the first microphone. Additionally or alternatively, sound may be coming from towards a front of the user such that the first microphone receives a louder signal than the second microphone. Therefore, the first microphone receiving a louder signal than the second microphone may indicate that the audio is the user's speech. In some examples, if the second signal received by the second microphone is louder than the first signal received by the first microphone, the location of the source of the sound may be closer to the second microphone. The sound may be coming from behind the user and, therefore, may be background noise. The examples provided herein assume the first microphone is closest to the user's mouth and are not meant to be limiting with respect to the placement of the microphones, the determination regarding location, or the type of audio received by the microphones.
By comparing the times at which the signals were received by each microphone, the device may determine whether the location of the source of the audio is near the front of the user or behind the user. In some examples, if the first microphone receives the first signal before the second microphone receives the second signal, the location of the source of the sound may be closer to the first microphone. Additionally or alternatively, sound may be coming from the front of the user such that the first microphone receives the first signal before the second microphone receives the second signal. The first microphone receiving the first signal before the second microphone receives the second signal may indicate that the audio is the user's speech. When the second microphone receives the second signal before the first microphone receives the first signal, the location of the source of the audio may be closer to the second microphone or behind the user. Thus, the source of the audio may be background noise.
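The timing and loudness comparisons described above can be sketched in code. The following is a minimal illustration and not taken from the disclosure: `classify_source` is a hypothetical helper that cross-correlates the two microphone signals to estimate which microphone received the sound first, compares RMS loudness, and assumes the first microphone is the one nearest the user's mouth.

```python
import numpy as np

def classify_source(sig_front, sig_rear):
    """Classify audio as user speech or background noise by comparing
    equal-length sample buffers from two omnidirectional microphones.

    sig_front: samples from the microphone nearest the user's mouth.
    sig_rear:  samples from the microphone farther from the mouth.
    """
    # Time comparison: cross-correlate to estimate the inter-microphone
    # delay. With this argument order, a non-negative lag means the
    # front microphone received the sound first.
    corr = np.correlate(sig_rear, sig_front, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_front) - 1)

    # Loudness comparison: RMS level of each signal.
    rms_front = np.sqrt(np.mean(np.square(sig_front)))
    rms_rear = np.sqrt(np.mean(np.square(sig_rear)))

    # Sound that arrives first and louder at the front microphone is
    # consistent with the user's speech; otherwise treat it as background.
    if lag >= 0 and rms_front > rms_rear:
        return "user_speech"
    return "background_noise"
```

A burst that reaches the front microphone earlier and at a higher level classifies as speech; swapping the inputs makes the same burst classify as background noise.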
The internal electronics may suppress the signal received from at least one of the microphones 102, 104 in order to calculate the background noise level. For example, if the user is talking, the internal electronics may suppress the signal from the first microphone to remove the user's speech from the background noise calculation.
According to some examples, the internal electronics may additionally perform other types of signal processing simultaneously with suppressing the user's speech for background noise estimation. For example, the internal electronics may suppress the signal received from at least one of the microphones 102, 104 for noise cancellation purposes. In such an example, the internal electronics may suppress the background noise in order to amplify the user's speech for transmission purposes.
The output 136 may include one or more speakers for outputting audio, such as playback of music, speech, or other audio content. The output 136 may be located on a portion 138 of the device 100 that is inserted into the ear, such as the ear insert of the ear bud.
While the description and examples herein refer to the device 100 as an earbud, it should be understood that in other examples the device may be an augmented reality and/or virtual reality headset, a Bluetooth-enabled headset, smart glasses, a head-mountable display, a smart watch, a mobile phone and/or smart phone, a tablet, a music player, etc.
The one or more processors 202 may be any conventional processors, such as commercially available microprocessors. Alternatively, the one or more processors may be a dedicated device such as an application specific integrated circuit (ASIC) or other hardware-based processor. Although
Memory 204 may store information that is accessible by the processors 202, including instructions 206 that may be executed by the processors 202, and data 208. The memory 204 may be of a type of memory operative to store information accessible by the processors 202, including a non-transitory computer-readable medium, or other medium that stores data that may be read with the aid of an electronic device, such as a hard-drive, memory card, read-only memory (“ROM”), random access memory (“RAM”), optical disks, as well as other write-capable and read-only memories. The subject matter disclosed herein may include different combinations of the foregoing, whereby different portions of the instructions 206 and data 208 are stored on different types of media.
Data 208 may be retrieved, stored or modified by processors 202 in accordance with the instructions 206. For instance, although the present disclosure is not limited by a particular data structure, the data 208 may be stored in computer registers, in a relational database as a table having a plurality of different fields and records, in XML documents, or in flat files. The data 208 may also be formatted in a computer-readable format such as, but not limited to, binary values, ASCII or Unicode. By further way of example only, the data 208 may be stored as bitmaps comprising pixels stored in compressed or uncompressed form, in various image formats (e.g., JPEG), in vector-based formats (e.g., SVG), or as computer instructions for drawing graphics. Moreover, the data 208 may comprise information sufficient to identify the relevant information, such as numbers, descriptive text, proprietary codes, pointers, references to data stored in other memories (including other network locations) or information that is used by a function to calculate the relevant data.
The instructions 206 can be any set of instructions to be executed directly, such as machine code, or indirectly, such as scripts, by the processor 202. In that regard, the terms “instructions,” “application,” “steps,” and “programs” can be used interchangeably herein. The instructions can be stored in object code format for direct processing by the processor, or in any other computing device language including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance. Functions, methods and routines of the instructions are explained in more detail below.
The device 200 may further include an input 210 for receiving volume adjustment commands. The input 210 may be, for example, a touch sensor, dial, button, or other control for receiving a manual command. The device 200 may also include an output 218. The output 218 may be, for example, a speaker.
Device 200 may have at least two microphones 212 located in a variety of locations. A first microphone 214 may be located at a first location adjacent to a first edge of the device. The first microphone 214 may receive user audio, such as the user's speech, and background noise. A second microphone 216 may be located at a second location adjacent to a second edge of the device. The second microphone 216 may receive user audio and background noise. The first microphone 214 and second microphone 216 may be located opposite each other.
Device 200 may include sensors 220 for determining whether a user is talking. The sensors 220 may include one or more of the at least two microphones 212. As described herein, the two or more microphones 212 may determine whether the audio signals received are the user's speech or background noise based on when the signals are received, the loudness at which the signals are received, the frequency at which the signals are received, etc. Additionally or alternatively, the sensors 220 may include an accelerometer 222. The accelerometer 222 may detect movement consistent with a user talking, such as movement of the user's mouth, jaw, and other parts of their body. The accelerometer 222 may also detect other types of movements that may be distinguished from the user talking. For example, while the accelerometer 222 may detect movements consistent with the user walking, typing, driving, etc., such movements can be distinguished from the talking movements and may be ignored. An accelerometer signal may be received by the device from the accelerometer 222. The received accelerometer signal is compared to a threshold, wherein the threshold indicates user activity consistent with talking. For example, motion may have a slower frequency response as compared to talking. While a person running may produce motion at approximately 3 Hz, a person talking may produce vibrations at approximately 100 Hz or more. Accordingly, a low-pass filter may be placed at, for example, the sub-tens of Hz or lower. The device determines whether the received accelerometer signal meets the threshold. If not, the device may continue to monitor accelerometer signals to determine whether the user is talking.
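The frequency-based thresholding described above can be illustrated with a short sketch. This is a rough, hypothetical implementation, assuming a digitized accelerometer signal; the sample rate, 10 Hz cutoff, and energy-ratio threshold are illustrative assumptions rather than values from the disclosure.

```python
import numpy as np

def is_talking(accel, sample_rate=1000, cutoff_hz=10.0, ratio_threshold=0.5):
    """Heuristic talking detector for an accelerometer signal.

    Whole-body motion (walking, typing, driving) concentrates energy
    below a few Hz, while jaw/mouth vibration from speech appears at
    roughly 100 Hz and above. The fraction of spectral energy above the
    cutoff is therefore used as the talking indicator.
    """
    spectrum = np.abs(np.fft.rfft(accel)) ** 2
    freqs = np.fft.rfftfreq(len(accel), d=1.0 / sample_rate)
    total = spectrum.sum()
    if total == 0:
        return False
    high_energy = spectrum[freqs >= cutoff_hz].sum()
    return (high_energy / total) >= ratio_threshold
```

A 3 Hz sinusoid (running-like motion) falls below the threshold, while a 100 Hz sinusoid (speech-like vibration) exceeds it.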
The accelerometer signal may in some examples include multiple signals from a plurality of accelerometers 222 inside one device or inside different coupled devices. For example, each of the plurality of accelerometers 222 may have a different sensitivity, or may be adapted to detect different types of user activity. Further, each of the plurality of accelerometers 222 may be positioned in a different way to optimize detection of the different types of user activity.
It should be understood that the device 200 may include other components which are not shown, such as a battery, charging input for the battery, signal processing components, etc. Such components may also be utilized in execution of the instructions 206.
The device 300, using the sensors (not shown), may determine the source of the sound. For example, the sensors, including the first and second microphones and at least one accelerometer, may determine that the user 306 is talking 308. The first microphone may receive 312 the user's speech 308 before the second microphone receives 314 the user's speech 308. Additionally or alternatively, the first microphone may receive 312 the user's speech 308 louder than the second microphone receives 314 the user's speech 308. The device 300 may also determine that the user 306 is talking 308 when the accelerometer detects movement consistent with the user 306 talking. Thus, the source of the sound may be the user 306. Additionally or alternatively, the first and second microphones may also receive 322, 324 background noise 318 such that the background noise is the source of the sound.
After the device determines that the user is talking, the device may determine which source of audio to suppress. The device 300 may suppress the user's speech 308 or the background noise 318. The device 300 may suppress the user's speech 308 prior to calculating the background noise level to prevent false estimations. The device 300 may suppress the background noise 318 and, therefore, focus on the user's speech 308 such that the device may transmit a clearer audio signal to the person at the receiving end of the conversation. Additionally or alternatively, the device may suppress the background noise to provide clearer audio output to the user. The same signals may be processed using DSP multiple times simultaneously such that the device may process the signals to both focus on the user's talking for a first application and to suppress the user's talking for a second application. For example, the same signal may be processed simultaneously to allow the device to clearly transmit the user's speech 308 and to remove the user's speech 308 for purposes of calculating the background noise.
As shown in
Device 300 may calculate the background noise level once the user's speech 308 is suppressed such that the user's speech 308 does not provide a false estimation of the background noise level. The device may adjust the playback volume of the device based on the calculated background noise level. If the user's speech were included in calculating the background noise level, the calculated background noise level may be higher than it should be. As such, the playback volume may adjust higher than what is needed in that instance. The suppressed signals 330 may also be used as a reference for performing echo cancellation, noise cancellation, etc.
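The noise-dependent volume adjustment described above might be implemented as a simple mapping from the estimated noise level to a playback gain. The sketch below is illustrative only; the calibration points (30 dB to 80 dB mapped linearly onto a 0.2 to 1.0 gain) are assumptions, not values from the disclosure.

```python
def playback_volume(noise_db, min_db=30.0, max_db=80.0,
                    min_vol=0.2, max_vol=1.0):
    """Map an estimated background noise level (in dB) to a playback
    volume on a 0..1 scale, interpolating linearly between two
    calibration points and clamping outside them."""
    if noise_db <= min_db:
        return min_vol
    if noise_db >= max_db:
        return max_vol
    frac = (noise_db - min_db) / (max_db - min_db)
    return min_vol + frac * (max_vol - min_vol)
```

If the user's speech inflated the noise estimate by even a few dB, the interpolation shows the playback volume would rise correspondingly, which is exactly the false adjustment the suppression step avoids.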
To suppress the background noise 318, the device 300 may determine the source of the audio is the background noise 318. The device may determine that the audio received is background noise based on a comparison of when the signals were received by the first and second microphone, the loudness of the signals received, etc. For example, background noise 318 may be received 324 by the second microphone before the background noise 318 is received 322 by the first microphone. Additionally or alternatively, the background noise 318 may be received 324 by the second microphone louder than the background noise 318 is received 322 by the first microphone. In these examples, the device 300 may determine that the source of the audio is behind the user 306 and, therefore, is background noise 318. The device 300 may suppress the background noise 318 by processing the signals received 312, 314, 322, 324 by the first and second microphones. The signals received 312, 314, 322, 324 by the first and second microphones may be processed to result in a beam pattern as if the first and second microphones were beamformed when receiving 312, 314, 322, 324 the signals, shown by beam pattern 332.
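A common way to obtain a cardioid-like beam pattern such as 332 from two omnidirectional microphones is a first-order differential (delay-and-subtract) arrangement. The sketch below is one possible illustration, assuming equal-length, sample-aligned buffers and an integer inter-microphone delay; it is not the disclosure's specific processing.

```python
import numpy as np

def delay_and_subtract(primary, secondary, delay_samples):
    """First-order differential beamformer from two omnidirectional mics.

    Subtracting a delayed copy of the secondary microphone signal from
    the primary places a null in the direction of the secondary
    microphone, yielding a cardioid-like pickup pattern purely through
    signal processing, with no physical directional microphone.
    """
    delayed = np.concatenate(
        [np.zeros(delay_samples), secondary])[: len(primary)]
    return primary - delayed
```

Sound arriving from the secondary-microphone side reaches the secondary first and the primary `delay_samples` later, so the delayed copy aligns with the primary signal and cancels; sound from the opposite direction does not align and passes through.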
As shown in
While the above examples include suppressing the user's speech by processing the signal into cardioid beam patterns and hypercardioid beam patterns, the signals may be processed into a variety of other beam patterns and, therefore, the examples above are not meant to be limiting.
In block 420, the time that the audio signal is received by the first microphone is compared with the time the audio signal is received by the second microphone. Additionally or alternatively, the loudness of the audio signal received by the first microphone is compared to the loudness of the audio signal received by the second microphone. The frequency of the audio signal received by the first microphone may be compared to the frequency of the audio signal received by the second microphone.
In block 430, the device may determine whether the user is talking. The device may use sensors to determine whether the user is talking. The sensors may include the two or more microphones. Additionally or alternatively, the sensors may include at least one accelerometer that can detect movement consistent with the user talking.
In block 440, the source of the audio is determined. The source may be determined based on a comparison of time, loudness, frequency, etc. of the audio received by the first microphone to the audio received by the second microphone. The source of the audio may be the user, such as when the user is talking. The source of the audio may be background noise.
In block 450, the audio received from a first source is suppressed. The source may be, according to some examples, the user or background noise. For example, if the user is talking, the device may suppress the user's speech such that the user's speech does not contribute to a false estimation of the background noise level. Additionally or alternatively, if the user is talking, the device may suppress the background noise when the user is talking such that the user's speech is the only audio transmitted.
In block 552, the device may perform DSP on the received audio to suppress audio signals from a particular source. In this regard, the processed audio has an effect of being received through a beamformed microphone, though it was received through two omnidirectional microphones. Moreover, because the effect was attained through DSP, various different types of beamformed effects may be created at a same or different times using the same two omnidirectional microphones. For example, to calculate the background noise, speech input from the user may be suppressed by processing the received audio to attain the effect of a beamformed microphone in a direction away from the user's mouth. At a same time or a different time, received audio may be processed for transmission over a network, and as such background noise may be suppressed such that the user's speech can be clearly transmitted. In this regard, the device may suppress background noise signals, giving the effect of having received the audio through a microphone beamformed in a direction towards the user's mouth, such as a cardioid or hypercardioid beam pattern.
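The point that different beamformed effects can be created at the same time from the same two omnidirectional signals can be sketched as two complementary differential beams. This is a hypothetical illustration under the same assumptions as before (aligned buffers, integer delay), not the disclosure's implementation.

```python
import numpy as np

def _delay(x, n):
    """Delay a signal by n samples, zero-padding at the start."""
    return np.concatenate([np.zeros(n), x])[: len(x)]

def dual_beams(front, rear, delay_samples):
    """Process one stereo capture two ways simultaneously:

    speech_beam nulls sound arriving from behind (clean speech for
    transmission); noise_beam nulls sound from the mouth direction
    (speech-free input for background-noise estimation).
    """
    speech_beam = front - _delay(rear, delay_samples)
    noise_beam = rear - _delay(front, delay_samples)
    return speech_beam, noise_beam
```

For speech arriving from the front, the noise beam cancels it almost entirely while the speech beam retains it, so the same capture serves both the transmission path and the noise-level calculation.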
In block 554, the device detects whether the user's speech remains in the processed signal. For example, while the user's voice as received through the first microphone closer to the user's mouth may have been suppressed, the user's voice may still have been more faintly picked up by the second microphone further from the user. If the device does not detect the user's speech in the processed signal, the process returns to block 552.
If the device still detects the user's speech in the processed signal, the process continues to block 556 where the device cancels the user's speech. For example, the user's speech may be used as a reference signal for cancellation. Accordingly, received audio having characteristics matching the reference signal may be removed using digital signal processing.
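Reference-based cancellation of this kind is often carried out with an adaptive filter. The least-mean-squares (LMS) sketch below is an illustrative stand-in for the cancellation step, not the disclosure's specific method; it assumes the reference signal is a clean-enough copy of the residual speech.

```python
import numpy as np

def lms_cancel(mixture, reference, taps=16, mu=0.01):
    """Remove the reference component from `mixture` with an LMS
    adaptive filter.

    `reference` is the user's speech picked up elsewhere (e.g., by the
    mouth-facing microphone); the returned error signal is the mixture
    with the matching component progressively cancelled.
    """
    w = np.zeros(taps)
    out = np.zeros(len(mixture))
    for n in range(len(mixture)):
        # Most recent `taps` reference samples, newest first.
        x = reference[max(0, n - taps + 1): n + 1][::-1]
        x = np.pad(x, (0, taps - len(x)))
        y = w @ x                  # filter estimate of the speech residue
        e = mixture[n] - y         # error = mixture minus the estimate
        w += mu * e * x            # LMS weight update
        out[n] = e
    return out
```

Once the filter converges, audio correlated with the reference is removed, so only uncorrelated background noise remains for the noise-level calculation.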
In block 558, a background noise level may be calculated using the processed audio. Because the user's speech has been removed, the calculated noise level will not be artificially increased as a result of the user talking while audio for the calculation was being received.
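The noise-level calculation itself can be as simple as an RMS level expressed in decibels. The following minimal sketch assumes normalized samples, with the reference level (full scale by default) as an illustrative choice.

```python
import numpy as np

def noise_level_db(samples, ref=1.0):
    """Estimate a background noise level as the RMS of the processed
    (speech-free) samples, in decibels relative to `ref`."""
    rms = np.sqrt(np.mean(np.square(samples)))
    if rms == 0:
        return -np.inf
    return 20.0 * np.log10(rms / ref)
```

A full-scale sine tone measures about -3 dB; attenuating it by a factor of 10 lowers the reading by 20 dB, as expected for an RMS-based level.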
Determining the background noise level by suppressing audio from at least one source provides an improved user experience. As the user is talking, the device may suppress the user's speech in order to determine the background noise level without including the user's speech. Suppressing the user's speech may remove false estimations from the background noise level calculation. A more accurate background noise calculation may provide for better volume adjustments. Further, by suppressing audio from at least one source, the device may be able to provide more reliable echo cancellation and noise cancellation, such as when the background noise is suppressed.
Unless otherwise stated, the foregoing alternative examples are not mutually exclusive, but may be implemented in various combinations to achieve unique advantages. As these and other variations and combinations of the features discussed above can be utilized without departing from the subject matter defined by the claims, the foregoing description of the embodiments should be taken by way of illustration rather than by way of limitation of the subject matter defined by the claims. In addition, the provision of the examples described herein, as well as clauses phrased as “such as,” “including” and the like, should not be interpreted as limiting the subject matter of the claims to the specific examples; rather, the examples are intended to illustrate only one of many possible embodiments. Further, the same reference numbers in different drawings can identify the same or similar elements.
This application claims the benefit of the filing date of U.S. Provisional Patent Application No. 62/908,829, filed Oct. 1, 2019, the disclosure of which is hereby incorporated herein by reference.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2020/050869 | 9/15/2020 | WO |

Number | Date | Country
---|---|---
62908829 | Oct. 2019 | US