The present disclosure is generally related to a wearable device.
Advances in technology have resulted in smaller and more powerful computing devices. For example, a variety of portable personal computing devices, including wireless telephones such as mobile and smart phones, tablets and laptop computers are small, lightweight, and easily carried by users. These devices can communicate voice and data packets over wireless networks. Further, many such devices incorporate additional functionality such as a digital still camera, a digital video camera, a digital recorder, and an audio file player. Also, such devices can process executable instructions, including software applications, such as a web browser application, that can be used to access the Internet. As such, these devices can include significant computing capabilities.
A wearable device can wirelessly communicate with a mobile device, such as a mobile phone. A user can speak through the wearable device to communicate during a voice call, to communicate with an application on the mobile device (e.g., a voice assistant application), etc. However, if the user is in a noisy environment or in an environment where there are harsh environmental conditions (e.g., windy conditions), a microphone at the wearable device may not be able to clearly capture what is spoken by the user. As a result, other participants on the voice call may not be able to comprehend what the user is saying, the voice assistant application may not be able to determine what the user is saying, etc.
According to one implementation of the techniques disclosed herein, a device includes a memory and one or more processors coupled to the memory. The one or more processors are configured to perform an active noise cancellation (ANC) operation on noisy input speech as captured by a first microphone, the noisy input speech as captured by a second microphone, or both, to suppress a noise level associated with the noisy input speech as captured by the second microphone. The one or more processors are configured to match a second frequency spectrum of a second signal with a first frequency spectrum of a first signal. The first signal is representative of the noisy input speech as captured by the first microphone, and the second signal is representative of the noisy input speech as captured by the second microphone. The one or more processors are also configured to generate an output speech signal that is representative of input speech based on the second signal having the second frequency spectrum that matches the first frequency spectrum.
According to another implementation of the techniques disclosed herein, a method for suppressing noise associated with speech includes performing an active noise cancellation (ANC) operation on noisy input speech as captured by a first microphone of a wearable device, the noisy input speech as captured by a second microphone of the wearable device, or both, to suppress a noise level associated with the noisy input speech as captured by the second microphone. The second microphone is positioned within a threshold distance of an ear canal of a user. The method also includes performing an equalization operation to match a second frequency spectrum of a second signal with a first frequency spectrum of a first signal. The first signal is representative of the noisy input speech as captured by the first microphone, and the second signal is representative of the noisy input speech as captured by the second microphone. The method further includes generating an output speech signal that is representative of input speech based on the second signal having the second frequency spectrum that matches the first frequency spectrum. The method also includes transmitting a time-domain version of the output speech signal to a mobile device.
According to another implementation of the techniques disclosed herein, a non-transitory computer-readable medium includes instructions for suppressing noise associated with speech. The instructions, when executed by one or more processors within a wearable device, cause the one or more processors to perform an active noise cancellation (ANC) operation on noisy input speech as captured by a first microphone of a wearable device, the noisy input speech as captured by a second microphone of the wearable device, or both, to suppress a noise level associated with the noisy input speech as captured by the second microphone. The second microphone is positioned within a threshold distance of an ear canal of a user. The instructions also cause the processor to perform an equalization operation to match a second frequency spectrum of a second signal with a first frequency spectrum of a first signal. The first signal is representative of the noisy input speech as captured by the first microphone, and the second signal is representative of the noisy input speech as captured by the second microphone. The instructions also cause the processor to generate an output speech signal that is representative of input speech based on the second signal having the second frequency spectrum that matches the first frequency spectrum.
According to another implementation of the techniques disclosed herein, a wearable device includes first means for capturing noisy input speech and second means for capturing the noisy input speech. The second means for capturing is configured to be positioned within a threshold distance of an ear canal of a user. The wearable device also includes means for performing an active noise cancellation (ANC) operation on the noisy input speech as captured by the first means for capturing, the noisy input speech as captured by the second means for capturing, or both, to suppress a noise level associated with the noisy input speech as captured by the second means for capturing. The wearable device further includes means for matching a second frequency spectrum of a second signal with a first frequency spectrum of a first signal. The first signal is representative of the noisy input speech as captured by the first means for capturing, and the second signal is representative of the noisy input speech as captured by the second means for capturing. The wearable device also includes means for generating an output speech signal that is representative of input speech based on the second signal having the second frequency spectrum that matches the first frequency spectrum. The wearable device further includes means for transmitting a time-domain version of the output speech signal to a mobile device.
Other implementations, advantages, and features of the present disclosure will become apparent after review of the entire application, including the following sections: Brief Description of the Drawings, Detailed Description, and the Claims.
Techniques described herein enable a wearable device to suppress noise captured in conjunction with input speech. For example, the wearable device includes at least one external microphone and one internal microphone (e.g., a microphone that is proximate to an ear of a user of the wearable device). As used herein, a microphone is “proximate” to an ear of a user if the microphone is within a threshold distance of the ear. As a non-limiting example, if the microphone is within five inches of the ear, the microphone is proximate to ear. To illustrate, the internal microphone can be positioned within a threshold distance of the ear such that the internal microphone captures the input speech of the user as heard through sound waves travelling from the user's ear canal. Active noise cancellation (ANC) can be performed proximate to the internal microphone to suppress the amount of noise captured by the internal microphone. For example, a feedforward ANC circuit can perform a feedforward ANC operation on the input speech as captured by the external microphone to suppress noise captured by the internal microphone. Alternatively, or in addition, a feedback ANC circuit can perform a feedback ANC operation on the input speech as captured by the internal microphone to suppress noise captured by the internal microphone. As a result, the internal microphone can capture the input speech (as heard through sound waves travelling from the user's ear canal) with relatively little noise (e.g., suppressed noise due to the ANC operations). The external microphone can also capture the input speech and any surrounding noise.
An equalizer integrated into the wearable device can match a frequency spectrum of a second audio signal associated with the input speech captured by the internal microphone with a frequency spectrum of a first audio signal associated with the input speech captured by the external microphone. As a result, audio properties of the second audio signal can be improved to offset bandwidth limitations that may otherwise be present due to capturing the corresponding input speech from sound waves propagating from the user's ear canal. The wearable device can use the second audio signal to generate an output speech signal that is representative of the user speech.
Based on the above-described noise suppression techniques, the speech quality of the user of the wearable device can be improved during a phone call or while giving a command to a voice assistant application. For example, the ANC operations can suppress the external noise leaked into an ear chamber proximate to the internal microphone. As a result, a signal-to-noise ratio of the input speech captured by internal microphone is improved.
Particular aspects of the present disclosure are described below with reference to the drawings. In the description, common features are designated by common reference numbers. As used herein, various terminology is used for the purpose of describing particular implementations only and is not intended to be limiting of implementations. For example, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It may be further understood that the terms “comprise,” “comprises,” and “comprising” may be used interchangeably with “include,” “includes,” or “including.” Additionally, it will be understood that the term “wherein” may be used interchangeably with “where.” As used herein, “exemplary” may indicate an example, an implementation, and/or an aspect, and should not be construed as limiting or as indicating a preference or a preferred implementation. As used herein, an ordinal term (e.g., “first,” “second,” “third,” etc.) used to modify an element, such as a structure, a component, an operation, etc., does not by itself indicate any priority or order of the element with respect to another element, but rather merely distinguishes the element from another element having a same name (but for use of the ordinal term). As used herein, the term “set” refers to one or more of a particular element, and the term “plurality” refers to multiple (e.g., two or more) of a particular element.
In the present disclosure, terms such as “determining,” “calculating,” “estimating,” “shifting,” “adjusting,” etc. may be used to describe how one or more operations are performed. It should be noted that such terms are not to be construed as limiting and other techniques may be utilized to perform similar operations. Additionally, as referred to herein, “generating,” “calculating,” “estimating,” “using,” “selecting,” “accessing,” and “determining” may be used interchangeably. For example, “generating,” “calculating,” “estimating,” or “determining” content (or a signal) may refer to actively generating, estimating, calculating, or determining the content (or the signal) or may refer to using, selecting, or accessing the content (or signal) that is already generated, such as by another component or device.
As used herein, “coupled” may include “communicatively coupled,” “electrically coupled,” or “physically coupled,” and combinations thereof. Two devices (or components) may be coupled (e.g., communicatively coupled, electrically coupled, or physically coupled) directly or indirectly via one or more other devices, components, wires, buses, networks (e.g., a wired network, a wireless network, or a combination thereof), etc. Two devices (or components) that are electrically coupled may be included in the same device or in different devices and may be connected via electronics, one or more connectors, or inductive coupling, as illustrative, non-limiting examples. In some implementations, two devices (or components) that are communicatively coupled, such as in electrical communication, may send and receive electrical signals (digital signals or analog signal) directly or indirectly, such as via one or more wires, buses, networks, etc.
Referring to
In the scene 100, the user 102 talks (e.g., provides input speech 120) into the wearable device 104 to communicate with the mobile device 106. For example, the user 102 says the phrase “Play my favorite song.” The wearable device 104 includes an external microphone 204 and an internal microphone 206 (e.g., an “in-ear” microphone). The external microphone 204 captures the input speech 120 via sound waves originating at the user's mouth and travelling through the air to the external microphone 204.
The internal microphone 206 captures the input speech 120 via sound waves originating at the user's vocal cords and travelling within the user's body through an ear canal 192 to the internal microphone 206. For example, the internal microphone 206 is configured to be positioned within a threshold distance of the ear canal 192 of the user 102. The threshold distance can vary based on audio parameters associated with the wearable device 104. As a non-limiting example, the threshold distance can be three centimeters. As another non-limiting example, the threshold distance can be two inches. According to one implementation, the internal microphone 206 is positioned at least partially inside the ear of the user 102, as illustrated in
The input speech 120, as captured by the external microphone 204, is subject to surrounding noise 130, 132. For example, as illustrated in
The input speech 120, as captured by the internal microphone 206, is substantially isolated from the noise 130, 132. However, because the input speech 120, as captured by the internal microphone 206, is based on sound waves that travel throughout the user's body, the input speech 120 is band limited between 0 Hertz (Hz) and approximately 2 kilohertz (kHz). As a result, the input speech 120, as captured by the internal microphone 206, may undergo an equalization process to adjust a balance between frequency components of the input speech 120 as captured by the internal microphone 206 and frequency components of the input speech 120 as captured by the external microphone 204.
The wearable device 104 includes circuitry, as illustrated in
Referring to
The ANC circuit 154 is configured to perform an ANC operation on noisy input speech 120A and noisy input speech 120B. The noisy input speech 120A corresponds to the input speech 120 as captured by the external microphone 204, and the noisy input speech 120B corresponds to the input speech 120 as captured by the internal microphone 206. The ANC circuit 154 can suppress a noise level associated with the noisy input speech 120B as captured by the internal microphone 206. The ANC circuit 154 generates a first signal 160 that is representative of the noisy input speech 120A as captured by the external microphone 204 and a second signal 162 that is representative of the noisy input speech 120B as captured by the internal microphone 206. The second signal 162 has a better signal-to-noise ratio than the noisy input speech 120B due to the ANC circuit 154, and the first signal 160 is not affected by the ANC circuit 154.
The spectrum matching circuit 156 is configured to match a second frequency spectrum of the second signal 162 with a first frequency spectrum of the first signal 160. For example, the spectrum matching circuit 156 can adjust (e.g., widen) the second frequency spectrum of the second signal 162 to generate a spectrally-matched second signal 164. The output signal generator 158 generates an output speech signal 166 that is representative of the input speech 120. For example, the output signal generator 158 can generate the output speech signal 166 based on the spectrally-matched second signal 164.
Thus, the system 200A of
Referring to
The external microphone 204 is configured to capture the input speech 120 and the noise 130, 132 (e.g., noisy input speech 120C). The sound captured by the external microphone 204 is provided as an audio signal to the feedforward ANC circuit 304. The feedforward ANC circuit 304 is configured to perform a feedforward ANC operation on the sound captured by the external microphone 204. To illustrate, the feedforward ANC circuit 304 can separate (e.g., filter out) the noise 130, 132 from the sound captured by the external microphone 204 to generate a noise signal 330 representative of the noise 130, 132 and to generate an input audio signal 250 representative of the noisy input speech 120C.
In the scenario where the ANC circuit 302 does not include the feedback ANC circuit 306, the feedforward ANC circuit 304 is configured to apply a phase compensation filter to the noise signal 330 to adjust a phase of the noise signal 330 by approximately one-hundred eighty (180) degrees and combine the phase-adjusted version of the noise signal 330 with the sound captured by the internal microphone 206. As a result, the noise 130, 132 captured by the internal microphone 206 is substantially canceled out (e.g., suppressed) when combined with the phase-adjusted version of the noise signal 330 to generate an input audio signal 253. However, in the illustration of
The internal microphone 206 is configured to capture the input speech 120 and the noise 130, 132 (e.g., noisy input speech 120D). The sound captured by the internal microphone 206 is provided as an audio signal to the feedback ANC circuit 306. The feedback ANC circuit 306 is configured to perform a feedback ANC operation on the sound captured by the internal microphone 206. To illustrate, the feedback ANC circuit 306 can separate (e.g., filter out) the noise 130, 132 from the sound captured by the internal microphone 206 to generate a noise signal 332 representative of the noise 130, 132. The noise signal 332 is “fed back” into the feedback ANC circuit 306. The feedback ANC circuit 306 is configured to apply a phase compensation filter to the noise signal 332 to adjust a phase of the noise signal 332 by approximately one-hundred eighty (180) degrees and combine the phase-adjusted version of the noise signal 332 with the sound captured by the internal microphone 206. As a result, the noise 130, 132 captured by the internal microphone 206 is substantially canceled out (e.g., suppressed) when combined with the phase-adjusted version of the noise signal 332 to generate the input audio signal 253.
In the implementation of
The input audio signals 250, 253 can undergo audio processing, as described with respect to
The TIR controller 228 can differentiate a target (e.g., the input speech 120) and any interference (e.g., any noise or other signals). As described in greater detail with respect to
The equalizer 230 is configured to generate a signal 276 that enables the control unit 234 to match a second frequency spectrum of the second signal 268 with a first frequency spectrum of the first signal 272. For example, the equalizer 230 can perform an equalizing operation on the first signal 272 and the second signal 268 to generate the signal 276 (e.g., a spectrum matching control signal). The equalizer 230 can reduce non-stationary noise if the target speech and non-stationary interferences are uncorrelated. The signal 276 is provided to the control unit 234. The control unit 234 is configured to adjust the spectrum and the gain of at least one of the signals 272, 277 such that the signals 272, 277 matching gains. As used herein, “matching” elements are elements that are equal or approximately equal to each other, such as within five percent of each other. In a particular implementation, the equalizer 230 and the control unit 234 use a frequency-domain adaptive filter to map a speech spectrum of the internal microphone 206 to a speech spectrum of the external microphone 204. Thus, the TIR controller 228, the equalizer 230, and the control unit 234 can interoperate to perform the functionality of the spectrum matching circuit 156 of
Referring to
In
Referring to
The system 200D includes a speaker 202, the external microphone 204, and the internal microphone 206. The speaker 202 is configured to playout an output audio signal 258 that is received from a communication transceiver 238, such as a BLUETOOTH® transceiver or an Institute of Electronics and Electrical Engineers (IEEE) 802.11 transceiver. BLUETOOTH® is a registered trademark assigned to BLUETOOTH SIG, INC., a Delaware corporation. The output audio signal 258 is a time-domain signal that is representative of audio received from a voice call, audio received from an interactive assistant application, or both. The speaker 202 is configured to playout the output audio signal 258 such that the user 102 of the wearable device 104 can listen to the representative audio via the speaker 202. According to one implementation, the speaker 202 is also used to playout anti-noise (generated by the ANC circuit 154) in the ear canal 192 of the user 102.
An analysis filter bank 208 is configured to perform a transform operation on the input audio signal 250 to generate a frequency-domain input audio signal 252. For example, the analysis filter bank 208 is configured to convert the input audio signal 250 from a time-domain signal to a frequency-domain signal. The transform operation can include a Discrete Cosine Transform (DCT) operation, a Fast Fourier Transform (FFT) operation, etc. The frequency-domain input audio signal 252 is provided to a frequency-domain echo cancellation circuit 210.
A full-band echo cancellation circuit 212 is configured to perform acoustic echo cancellation on the input audio signal 253 to generate an input audio signal 254. The input audio signal 254 is provided to an analysis filter bank 214. The analysis filter bank 214 is configured to perform a transform operation on the input audio signal 254 to generate a frequency-domain input audio signal 256. For example, the analysis filter bank 214 is configured to convert the input audio signal 254 from a time-domain signal to a frequency-domain signal. The transform operation can include a DCT operation, a FFT operation, etc. The frequency-domain input audio signal 256 is provided to a frequency-domain echo cancellation circuit 216.
An analysis filter bank 218 is configured to perform a transform operation on the output audio signal 258 to generate a frequency-domain output audio signal 260. For example, the analysis filter bank 218 is configured to convert the output audio signal 258 from a time-domain signal to a frequency-domain signal. The transform operation can include a DCT operation, a FFT operation, etc. The frequency-domain output audio signal 260 is provided to the frequency-domain echo cancellation circuit 210 and to the frequency-domain echo cancellation circuit 216.
The frequency-domain echo cancellation circuit 210 is configured to perform frequency-domain echo cancellation on the frequency-domain input audio signal 252 to generate a frequency-domain input audio signal 262. For example, the frequency-domain echo cancellation circuit 210 can substantially reduce the amount of echo present in the frequency-domain input audio signal 252. According to one implementation, the frequency-domain echo cancellation circuit 210 uses reverberation characteristics of the frequency-domain output audio signal 260 to reduce (e.g., cancel) the echo in the frequency-domain input audio signal 252. The frequency-domain input audio signal 262 is provided to a single microphone noise reduction unit 220. The frequency-domain echo cancellation circuit 216 is configured to perform frequency-domain echo cancellation on the frequency-domain input audio signal 256 to generate a frequency-domain input audio signal 264. For example, the frequency-domain echo cancellation circuit 216 can substantially reduce the amount of echo present in the frequency-domain input audio signal 256. According to one implementation, the frequency-domain echo cancellation circuit 216 uses reverberation characteristics of the frequency-domain output audio signal 260 to reduce (e.g., cancel) the echo in the frequency-domain input audio signal 256. The frequency-domain input audio signal 264 is provided to a single microphone noise reduction unit 224.
The single microphone noise reduction unit 220 is configured to perform noise reduction on the frequency-domain input audio signal 262 to generate a frequency-domain signal 270. For example, the single microphone noise reduction unit 220 is configured to remove stationary noise from the frequency-domain input audio signal 262. The frequency-domain signal 270 is provided to a post-processing circuit 222. The single microphone noise reduction unit 224 is configured to perform noise reduction on the frequency-domain input audio signal 264 to generate a frequency-domain signal 266. For example, the single microphone noise reduction unit 224 is configured to remove stationary noise from the frequency-domain input audio signal 264. The frequency-domain signal 266 is provided to a post-processing circuit 226.
The post-processing circuit 222 is configured to perform post-processing operations on the frequency-domain signal 270 to generate the first signal 272, and the post-processing circuit 226 is configured to perform post-processing operations on the frequency-domain signal 266 to generate the second signal 268. The post-processing operations can include additional echo cancellation processing, noise reduction processing, etc. The first signal 272 is representative of the noisy input speech 120C as captured by the microphone 204, and the second signal 268 is representative of the noisy input speech 120D as captured by the microphone 206. The first signal 272 is provided to the TIR controller 228, the equalizer 230, and the control unit 234. The second signal 268 is provided to the TIR controller 228, the equalizer 230, and the frequency extension unit 232.
The TIR controller 228 is configured to receive the first signal 272 and the second signal 268. The TIR controller 228 can differentiate a target (e.g., the input speech 120) and any interference (e.g., any noise or other signals). For example, the TIR controller 228 is configured to determine a noise characteristic 290 associated with the first signal 272. For example, the noise characteristic 290 can include a signal-to-noise ratio associated with the first signal 272, a speech intelligibility level associated with the first signal 272, a noise level of the surrounding noise 130, 132, etc. The speech intelligibility level corresponds to a percentage of intelligible words in speech associated with the first signal 272. Based on the noise characteristic 290, the TIR controller 228 is configured to generate the control signal 274 that indicates how to use the first signal 272 and the second signal 268 in generation of the output speech signal 278 that is representative of the input speech 120 captured by the microphones 204, 206. The control signal 274 is provided to the equalizer 230, the frequency extension unit 232, and the control unit 234.
The control signal 274 indicates how to adjust a frequency range of the second signal 268. For example, the TIR controller 228 is configured to determine the frequency range of the second signal 268. Because the second signal 268 is generated based on the noisy input speech 120D captured through the user's ear canal 192, the second signal 268 has a relatively low frequency range. As a non-limiting example, the frequency range of the second signal 268 is between 0 Hz and 2.5 kHz. As a result, the TIR controller 228 is configured to generate the control signal 274 such that the control signal 274 indicates how to extend (e.g., widen) the frequency range of the second signal 268 such that the second signal 268 covers a wider frequency range, such as 0 Hz to 20 kHz. To illustrate, the TIR controller 228 provides the control signal 274 to the frequency extension unit 232, and the frequency extension unit 232 is configured to perform frequency extension on the second signal 268 to generate the frequency-extended second signal 277, such that the frequency-extended second signal 277 has a wider frequency range than the second signal 268. The frequency-extended second signal 277 is provided to the control unit 234.
The TIR controller 228 is configured to compare the noise characteristic 290 to one or more noise thresholds. For example, the TIR controller 228 is configured to compare the noise characteristic 290 to a first noise threshold (e.g., a lower noise threshold), a second noise threshold (e.g., a higher noise threshold), or both. If the TIR controller 228 determines that the noise characteristic 290 fails to satisfy (e.g., is lower than) the first noise threshold, the control signal 274 indicates to generate the output speech signal 278 based on the first signal 272. For example, in scenarios where the input speech 120 captured by the microphone 204 is relatively noise-free input speech, the output speech signal 278 matches the first signal 272.
If the TIR controller 228 determines that the noise characteristic 290 satisfies (e.g., is higher than) the second noise threshold, the control signal 274 indicates to generate the output speech signal 278 based on the frequency-extended second signal 277. For example, in scenarios where the input speech 120 captured by the microphone 204 has a high degree of noise, the output speech signal 278 is generated based on the input speech 120 as detected by the microphone 206 (e.g., the internal microphone that captures the input speech 120 through the user's ear canal 192).
If the TIR controller 228 determines that the noise characteristic 290 satisfies the first noise threshold and fails to satisfy the second noise threshold, the control signal 274 indicates to generate the output speech signal 278 based on the first signal 272 and the frequency-extended second signal 277. According to one implementation, the signals 272, 277 are equalized, scaled, and combined to generate the output speech signal 278. For example, the equalizer 230 is configured to perform an equalizing operation on the first signal 272 and the second signal 268 to generate the signal 276 (e.g., a spectrum matching control signal). The equalizer 230 can reduce non-stationary noise if the target speech and non-stationary interferences are uncorrelated. The signal 276 is provided to the control unit 234. The control unit 234 is configured to adjust the spectrum and the gain of at least one of the signals 272, 277 such that the signals 272, 277 have approximately equal (e.g., matching) gains. For example, the control unit 234 can adjust the spectrum and the gain of one or more of the signals 272, 277 such that the gains of the signals 272, 272 are within five percent of each other. The equalizer 230 and the control unit 234 use a frequency-domain adaptive filter to map a noise spectrum of the internal microphone 206 to a noise spectrum of the external microphone 204. To illustrate, based on the signal 276, the control unit 234 is configured to match the second frequency spectrum of the second signal 268 (or the frequency-extended second signal 277) with the first frequency spectrum of the first signal 272.
As described above, the control signal 274 is generated based on comparing the noise characteristic 290 to one or more thresholds. However, in other implementations, the control signal 274 can be generated based on the noise characteristic 290 and neural network data. For example, the TIR controller 228 can apply the noise characteristic 290 to a neural network generated by a machine learning algorithm to generate the control signal 274.
Additionally, the control signal 274 indicates how to scale the first signal 272 and the frequency-extended second signal 277. For example, based on the noise characteristic 290, the control signal 274 indicates a first scaling factor for the first signal 272 and a second scaling factor for the frequency-extended second signal 277. To illustrate, if the noise characteristic 290 indicates the first signal 272 has a relatively high degree of noise, the second scaling factor is larger than the first scaling factor. If the noise characteristic 290 indicates the first signal 272 has a relatively low degree of noise, the first scaling factor is larger than the second scaling factor. The control unit 234 is configured to scale the first signal 272 by the first scaling factor to generate a first portion of the output speech signal 278, scale the frequency-extended second signal 277 by the second scaling factor to generate a second portion of the output speech signal 278, and combine the first portion of the output speech signal 278 and the second portion of the output speech signal 278 to generate the output speech signal 278.
The output speech signal 278 is provided to an inverse transform unit 236. The inverse transform unit 236 is configured to perform an inverse transform operation on the output speech signal 278 to generate a time-domain output speech signal 280. The inverse transform operation can include an Inverse Discrete Cosine Transform (IDCT) operation, an Inverse Fast Fourier Transform (IFFT) operation, etc. The time-domain output speech signal 280 is provided to the communication transceiver 238. The communication transceiver 238 can send the time-domain output speech signal 280 to the interactive assistant application, to a mobile phone transceiver for voice call communication, etc.
The system 200A-200D of
The systems 200A-200D also suppress noise at the internal microphone 206 by utilizing hybrid ANC technology. Although hybrid ANC technology is illustrated in
Referring to
The system 300 operates in a substantially similar manner as the systems 200A-200D of
Referring to
For example, the system 400 includes a second external microphone 404 that is configured to capture the input speech 120 and the noise 130, 132. The microphone 404 is located proximate to a mouth of the user 102 and is configured to capture the input speech 120 spoken by the user 102 and the surrounding noise 130, 132. The input speech 120 spoken by the user 102 and the surrounding noise 130, 132 captured by the microphone 404 are provided to an ANC circuit 402. The ANC circuit 402 operates in a substantially similar manner as the ANC circuit 302; however, the ANC circuit 402 includes a second feedforward ANC circuit (not shown) coupled to the second external microphone 404. The ANC circuit 402 generates an input audio signal 450. The input audio signal 450 is a time-domain signal that is representative of sounds captured (e.g., detected) by the microphone 404. The input audio signal 450 is provided to an analysis filter bank 408.
The analysis filter bank 408 is configured to perform a transform operation on the input audio signal 450 to generate a frequency-domain input audio signal 452. For example, the analysis filter bank 408 is configured to convert the input audio signal 450 from a time-domain signal to a frequency-domain signal. The transform operation can include a DCT operation, an FFT operation, etc. The frequency-domain input audio signal 452 is provided to a frequency-domain echo cancellation circuit 410.
The frequency-domain echo cancellation circuit 410 is configured to perform frequency-domain echo cancellation on the frequency-domain input audio signal 452 to generate a frequency-domain input audio signal 462. For example, the frequency-domain echo cancellation circuit 410 is configured to substantially reduce the amount of echo present in the frequency-domain input audio signal 452. According to one implementation, the frequency-domain echo cancellation circuit 410 uses reverberation characteristics of the frequency-domain output audio signal 260 to reduce (e.g., cancel) the echo in the frequency-domain input audio signal 452. The frequency-domain input audio signal 462 is provided to a single microphone noise reduction unit 420.
The single microphone noise reduction unit 420 is configured to perform noise reduction on the frequency-domain input audio signal 462 to generate a frequency-domain signal 470. For example, the single microphone noise reduction unit 420 is configured to remove stationary noise from the frequency-domain input audio signal 462. The frequency-domain signal 470 is provided to the post-processing circuit 222. In
Thus, the system 400 of
Referring to
The method 500 includes performing an ANC operation on noisy input speech as captured by a first microphone of a wearable device, the noisy input speech as captured by a second microphone of the wearable device, or both, to suppress a noise level associated with the noisy input speech as captured by the second microphone, at 502. The second microphone is positioned within a threshold distance of an ear canal of a user. For example, referring to
The method 500 also includes performing an equalization operation to match a second frequency spectrum of a second signal with a first frequency spectrum of a first signal, at 504. The first signal is representative of the noisy input speech as captured by the first microphone, and the second signal is representative of the noisy input speech as captured by the second microphone. For example, referring to
The method 500 also includes generating an output speech signal that is representative of input speech based on the second signal having the second frequency spectrum that matches the first frequency spectrum, at 506. For example, referring to
The method 500 also includes transmitting a time-domain version of the output speech signal to a mobile device, at 508. For example, referring to
According to one implementation, the method 500 also includes determining a noise characteristic associated with the input speech as captured by the first microphone. For example, referring to
According to one implementation, the method 500 also includes generating a control signal based on the noise characteristic. The control signal indicates how to use the first signal and the second signal in generation of the output speech signal. For example, referring to
According to another implementation, the method 500 can include determining that the noise characteristic 290 satisfies the lower noise threshold and fails to satisfy the upper noise threshold. The control signal 274 indicates, to the control unit 234, to generate the output speech signal 278 based on the first signal 272 and the second signal 268 (or the frequency-extended second signal 277) in response to determining that the noise characteristic 290 satisfies the lower noise threshold and fails to satisfy the upper noise threshold. In this scenario, the method 500 can include scaling the first signal 272 by the first scaling factor to generate the first portion of the output speech signal 278 and scaling the frequency-extended second signal 277 by the second scaling factor to generate the second portion of the output speech signal 278. The first scaling factor and the second scaling factor are based on the noise characteristic 290. The method 500 can also include combining the first portion of the output speech signal 278 and the second portion of the output speech signal 278 to generate the output speech signal 278.
According to one implementation, the method 500 includes determining a frequency range of the second signal 268 and performing, based on the frequency range, frequency extension on the second signal 268 to generate the frequency-extended second signal 277. According to one implementation, the method 500 includes performing the equalizing operation on the first signal 272 and the second signal 268. According to one implementation, the method 500 includes performing the inverse transform operation on the output speech signal 278 to generate the time-domain output speech signal 280 that is provided to the communication transceiver 238.
According to one implementation, the method 500 includes performing the feedforward ANC operation on the input speech 120 as captured by the microphone 204. The method 500 can also include performing the feedback ANC operation on the input speech 120 as captured by the microphone 206. The second signal 268 can be based on the feedforward ANC operation and the feedback ANC operation.
The method 500 of
The graphical user interface 600A includes a noise suppression option 602 that is visible to the user 102. The user 102 can use his or her finger to control a pointer 604 of the graphical user interface 600A. The pointer 604 is used to select one or more noise suppression options 602. For example, the user 102 can use his or her finger to enable active noise cancellation, to enable target-to-interference control, or to enable both. To illustrate, if the user 102 guides the pointer 604 to enable active noise cancellation, the ANC circuit 302 can perform the ANC operations to suppress noise at the internal microphone 206. If the user 102 guides the pointer 604 to enable target-to-interference control, the TIR controller 228 can determine the noise characteristic 290. Based on the noise characteristic, the TIR controller 228 can generate the control signal 274 that indicates how to use the first signal 272 and the second signal 268 in generation of the output speech signal 280, as described with respect to
Thus, the graphical user interface 600A enables the user 102 to selectively enable different noise suppression options 602 associated with the wearable device 104A. Although ANC operations and TIR operations are shown in
The graphical user interface 600B displays a quality indicator 604 that is visible to the user 102. The quality indicator 604 indicates a speech quality of the noisy input speech 120C captured by the external microphone 204. As illustrated in
As described above, the TIR controller 228 can determine the noise characteristic 290 associated with the noisy input speech 120C. Based on the noise characteristic 290, the TIR controller 228 can indicate whether the speech quality of the noisy input speech 120C is high, moderate, or low. For example, if the noise characteristic 290 is below a lower noise threshold, the TIR controller 228 can determine that the speech quality of the noisy input speech 120C is high. If the noise characteristic 290 is above an upper noise threshold, the TIR controller 228 can determine that the speech quality of the noisy input speech 120C is low. If the noise characteristic 290 is between the lower noise threshold and the upper noise threshold, the TIR controller 228 can determine that the speech quality of the noisy input speech 120C is moderate. Thus, based on the noise characteristic 290 determined by the TIR controller 228, the quality indicator 604 displayed at the graphical user interface 600B can indicate the speech quality of the noisy input speech 120C.
Although different colors are used to indicate the speech quality of the noisy input speech 120C, in other implementations, different visual indicators (e.g., numerical values, signal bars, etc.) can be used to indicate the speech quality. As a non-limiting example, if the quality indicator 604 displays a numerical value of one (1), the user 102 can determine that the speech quality of the noisy input speech 120C is low. However, if the quality indicator 604 displays a numerical value of ten (10), the user 102 can determine that the speech quality of the noisy input speech 120C is high.
The graphical user interface 600C displays a first stage quality indicator 650, a second stage quality indicator 652, and a third stage quality indicator 654. Each quality indicator 650, 652, 654 can correspond to the speech quality of the output speech signal 278 based on different microphone configurations. For example, the first stage quality indicator 650 can indicate a speech quality of the output speech signal 278 if the external microphone 204 is activated and the internal microphone 206 is deactivated, the second stage quality indicator 652 can indicate a speech quality of the output speech signal 278 if the external microphone 204 is deactivated and the internal microphone is activated, and the third stage quality indicator 654 can indicate a speech quality of the output speech signal 278 if the external microphone 204 and the internal microphone 206 are activated. In a similar manner as described with respect to
The TIR controller 228 is configured to determine the noise characteristic 290 associated with the noisy input speech 120C. The noise characteristic 290 is provided to the quality predictor 608. Based on the noise characteristic 290, the quality predictor 608 can generate a speech quality indicator 610 that indicates whether the speech quality of the noisy input speech 120C is high, moderate, or low. For example, if the noise characteristic 290 is below a lower noise threshold, the quality predictor 608 can determine that the speech quality of the noisy input speech 120C. If the noise characteristic 290 is above an upper noise threshold, the quality predictor 608 can determine that the speech quality of the noisy input speech 120C is low. If the noise characteristic 290 is between the lower noise threshold and the upper noise threshold, the quality predictor 608 can determine that the speech quality of the noisy input speech 120C is moderate. The speech quality indicator 610 is provided to the graphical user interface 600D. Based on the speech quality indicator 610, the graphical user interface 600D can display a visual representation of the speech quality of the noisy input speech 120C, as depicted in
Referring to
In a particular implementation, the mobile device 106 includes a processor 702, such as a central processing unit (CPU) or a digital signal processor (DSP), coupled to the memory 704. The memory 704 includes instructions 772 (e.g., executable instructions) such as computer-readable instructions or processor-readable instructions. The instructions 772 include one or more instructions that are executable by a computer, such as the processor 702.
The mobile device 106 also includes a display controller 726 that is coupled to the processor 702 and to a display device 780. According to one implementation, the display device 780 can display a graphical user interface 600, such as the graphical user interface 600A of
In some implementations, the processor 702, the display controller 726, the memory 704, the CODEC 734, the wireless interface 740, and the transceiver 746 are included in a system-in-package or system-on-chip device 722. In some implementations, a power supply 744 and an input device 730 are coupled to the system-on-chip device 722. Moreover, in a particular implementation, as illustrated in
The wearable device 104 is in communication with the mobile device 106 via the communication transceiver 238. For example, the communication transceiver 238 is configured to send the time-domain output speech signal 280 to the mobile device 106, and the communication transceiver 238 is configured to receive the output audio signal 258 from the mobile device 106. The wearable device 104 can include one or more components of the systems 200A-400 of
In a particular implementation, one or more components of the systems and devices disclosed herein is integrated into a decoding system or apparatus (e.g., an electronic device, a CODEC, or a processor therein), into an encoding system or apparatus, or both. In other implementations, one or more components of the systems and devices disclosed herein may be integrated into a wireless telephone, a tablet computer, a desktop computer, a laptop computer, a set top box, a music player, a video player, an entertainment unit, a television, a game console, a navigation device, a communication device, a personal digital assistant (PDA), a fixed location data unit, a personal media player, or another type of device.
In conjunction with the described techniques, a wearable device includes first means for capturing noisy input speech. For example, the first means for capturing may include the microphone 204, one or more other devices, circuits, modules, or any combination thereof.
The wearable device also includes second means for capturing the noisy input speech. The second means for capturing is configured to be positioned within a threshold distance of an ear canal of a user. For example, the second means for capturing may include the microphone 206, one or more other devices, circuits, modules, or any combination thereof.
The wearable device also includes means for performing an ANC operation on the noisy input speech as captured by the first means for capturing, the noisy input speech as captured by the second means for capturing, or both, to suppress a noise level associated with the noisy input speech as captured by the second means for capturing. For example, the means for performing the ANC operation may include the ANC circuit 302, the feedback ANC circuit 306, the feedforward ANC circuit 304, the instructions 799 executable by the processor 201, the ANC circuit 154, the processor 152, one or more other devices, circuits, modules, or any combination thereof.
The wearable device also includes means for matching a second frequency spectrum of a second signal with a first frequency spectrum of a first signal. The first signal is representative of the noisy input speech as captured by the first means for capturing, and the second signal is representative of the noisy input speech as captured by the second means for capturing. For example, the means for matching may include the equalizer 230, the control unit 234, the instructions 799 executable by the processor 201, the spectrum matching circuit 156, the processor 152, one or more other devices, circuits, modules, or any combination thereof.
The wearable device also includes means for generating an output speech signal that is representative of input speech based on the second signal having the second frequency spectrum that matches the first frequency spectrum. For example, the means for generating may include the control unit 234, the instructions 799 executable by the processor 201, the output signal generator 158, the processor 152, one or more other devices, circuits, modules, or any combination thereof.
The wearable device also includes means for transmitting a time-domain version of the output speech signal to a mobile device. For example, the means for transmitting may include the communication transceiver 238, one or more other devices, circuits, modules, or any combination thereof.
In accordance with one or more techniques of this disclosure, the mobile device may be used to acquire a sound field. For instance, the mobile device may acquire a sound field via the wired and/or wireless acquisition devices and/or the on-device surround sound capture (e.g., a plurality of microphones integrated into the mobile device). The mobile device may then code the acquired sound field into the Higher Order Ambisonic (HOA) coefficients for playback by one or more of the playback elements. For instance, a user of the mobile device may record (acquire a sound field of) a live event (e.g., a meeting, a conference, a play, a concert, etc.), and code the recording into HOA coefficients.
The mobile device may also utilize one or more of the playback elements to playback the HOA coded sound field. For instance, the mobile device may decode the HOA coded sound field and output a signal to one or more of the playback elements that causes the one or more of the playback elements to recreate the sound field. As one example, the mobile device may utilize the wireless and/or wireless communication channels to output the signal to one or more speakers (e.g., speaker arrays, sound bars, etc.). As another example, the mobile device may utilize docking solutions to output the signal to one or more docking stations and/or one or more docked speakers (e.g., sound systems in smart cars and/or homes). As another example, the mobile device may utilize headphone rendering to output the signal to a set of headphones, e.g., to create realistic binaural sound.
In some examples, a particular mobile device may both acquire a 3D sound field and playback the same 3D sound field at a later time. In some examples, the mobile device may acquire a 3D sound field, encode the 3D sound field into HOA, and transmit the encoded 3D sound field to one or more other devices (e.g., other mobile devices and/or other non-mobile devices) for playback.
Yet another context in which the techniques may be performed includes an audio ecosystem that may include audio content, game studios, coded audio content, rendering engines, and delivery systems. In some examples, the game studios may include one or more DAWs which may support editing of HOA signals. For instance, the one or more DAWs may include HOA plugins and/or tools which may be configured to operate with (e.g., work with) one or more game audio systems. In some examples, the game studios may output new stem formats that support HOA. In any case, the game studios may output coded audio content to the rendering engines which may render a sound field for playback by the delivery systems.
The mobile device may also, in some instances, include a plurality of microphones that are collectively configured to record a 3D sound field. In other words, the plurality of microphone may have X, Y, Z diversity. In some examples, the mobile device may include a microphone which may be rotated to provide X, Y, Z diversity with respect to one or more other microphones of the mobile device.
Example audio playback devices that may perform various aspects of the techniques described in this disclosure are further discussed below. In accordance with one or more techniques of this disclosure, speakers and/or sound bars may be arranged in any arbitrary configuration while still playing back a 3D sound field. In accordance with one or more techniques of this disclosure, a single generic representation of a sound field may be utilized to render the sound field on any combination of the speakers, the sound bars, and the headphone playback devices.
A number of different example audio playback environments may also be suitable for performing various aspects of the techniques described in this disclosure. For instance, a 5.1 speaker playback environment, a 2.0 (e.g., stereo) speaker playback environment, a 9.1 speaker playback environment with full height front loudspeakers, a 22.2 speaker playback environment, a 16.0 speaker playback environment, an automotive speaker playback environment, and a mobile device with ear bud playback environment may be suitable environments for performing various aspects of the techniques described in this disclosure.
In accordance with one or more techniques of this disclosure, a single generic representation of a sound field may be utilized to render the sound field on any of the foregoing playback environments. Additionally, the techniques of this disclosure enable a rendered to render a sound field from a generic representation for playback on the playback environments other than that described above. For instance, if design considerations prohibit proper placement of speakers according to a 7.1 speaker playback environment (e.g., if it is not possible to place a right surround speaker), the techniques of this disclosure enable a render to compensate with the other speakers such that playback may be achieved on a 6.1 speaker playback environment.
Moreover, a user may watch a sports game while wearing headphones. In accordance with one or more techniques of this disclosure, the 3D sound field of the sports game may be acquired (e.g., one or more Eigen microphones may be placed in and/or around the baseball stadium), HOA coefficients corresponding to the 3D sound field may be obtained and transmitted to a decoder, the decoder may reconstruct the 3D sound field based on the HOA coefficients and output the reconstructed 3D sound field to a renderer, the renderer may obtain an indication as to the type of playback environment (e.g., headphones), and render the reconstructed 3D sound field into signals that cause the headphones to output a representation of the 3D sound field of the sports game.
It should be noted that various functions performed by the one or more components of the systems and devices disclosed herein are described as being performed by certain components or modules. This division of components and modules is for illustration only. In an alternate implementation, a function performed by a particular component or module may be divided amongst multiple components or modules. Moreover, in an alternate implementation, two or more components or modules may be integrated into a single component or module. Each component or module may be implemented using hardware (e.g., a field-programmable gate array (FPGA) device, an application-specific integrated circuit (ASIC), a DSP, a controller, etc.), software (e.g., instructions executable by a processor), or any combination thereof.
Those of skill would further appreciate that the various illustrative logical blocks, configurations, modules, circuits, and algorithm steps described in connection with the implementations disclosed herein may be implemented as electronic hardware, computer software executed by a processing device such as a hardware processor, or combinations of both. Various illustrative components, blocks, configurations, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or executable software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method or algorithm described in connection with the implementations disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in a memory device, such as random access memory (RAM), magnetoresistive random access memory (MRAM), spin-torque transfer MRAM (STT-MRAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disk, a removable disk, or a compact disc read-only memory (CD-ROM). An exemplary memory device is coupled to the processor such that the processor can read information from, and write information to, the memory device. In the alternative, the memory device may be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). The ASIC may reside in a computing device or a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a computing device or a user terminal.
The previous description of the disclosed implementations is provided to enable a person skilled in the art to make or use the disclosed implementations. Various modifications to these implementations will be readily apparent to those skilled in the art, and the principles defined herein may be applied to other implementations without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the implementations shown herein but is to be accorded the widest scope possible consistent with the principles and novel features as defined by the following claims.