The present disclosure relates to audio signal control.
Local participants in conferencing sessions (e.g., online or web-based meetings) often use headsets with an integrated speaker and/or microphone to communicate with remote meeting participants. The microphone detects speech from the local participant for transmission to the remote meeting participants, but frequently picks up undesired anisotropic background audio signals (e.g., background talkers) along with the speech. When transmitted with the speech, the undesired anisotropic background audio signals can prevent the remote meeting participants from understanding the speech. This can be a hindrance to all meeting participants and reduce the effectiveness of the conferencing session.
In one example embodiment, a headset obtains, from a first microphone on the headset, a first audio signal including a user audio signal and an anisotropic background audio signal. The headset obtains, from a second microphone on the headset, a second audio signal including the user audio signal and the anisotropic background audio signal. The headset extracts, from the first audio signal and the second audio signal, using a first adaptive filter, a reference audio signal including the anisotropic background audio signal. Based on the reference signal, the headset cancels, using a second adaptive filter, the anisotropic background audio signal from a third audio signal derived from the first and second audio signals to produce an output audio signal. The headset provides the output audio signal to a receiver device.
With reference made to
Wireless communications interface 135 may be configured to operate in accordance with the Bluetooth® short-range wireless communication technology or any other suitable technology now known or hereinafter developed. Wireless communications interface 135 may enable communication with telephony device 120(1). Although wireless communications interface 135 is shown in
Headset 115(1) also includes microphones 140(1) and 140(2), audio processor 145, and speaker 150. Audio processor 145 may include one or more integrated circuits that convert audio detected by microphones 140(1) and 140(2) to digital signals that are supplied (e.g., as receive signals) to the processor 130 for wireless transmission via wireless communications interface 135 (e.g., when meeting attendee 105(1) speaks). Thus, processor 130 is coupled to receive signals derived from outputs of microphones 140(1) and 140(2) via audio processor 145. Audio processor 145 may also convert received audio (via wireless communication interface 135) to analog signals to drive speaker 150 (e.g., when meeting attendee 105(2) speaks). Headset 115(2) may include similar functional components as those shown at 120 with reference to headset 115(1).
Anisotropic background audio signal 155 is present in the local environment of headset 115(1). In this example, anisotropic background audio signal 155 originates from person who is loudly speaking near meeting attendee 105(1), although it will be appreciated that anisotropic background audio signal 155 may be any noise that reaches microphones 140(1) and 140(2) at different levels of magnitude. Here, because the person is standing to one side of meeting attendee 105(1), the noise from the person reaches microphone 140(1) at a different (e.g., lower) level of magnitude than at microphone 140(2).
Conventionally, anisotropic background audio signal 155 would heavily interfere with the online meeting between meeting attendees 105(1) and 105(2). For example, in some conventional headsets, the anisotropic background audio signal 155 would drown out any speech from meeting attendee 105(1). Other conventional headsets might be configured for traditional noise reduction or suppression, although these are too limited to adequately deal with anisotropic background audio signal 155. Traditional noise reduction algorithms might not suppress anisotropic background audio signal 155 because anisotropic background audio signal 155 is a speech signal. Moreover, traditional noise suppression algorithms can attempt to suppress the anisotropic background audio signal 155 at some frequency and time, but this often distorts the speech from meeting attendee 105(1) because that speech and the anisotropic background audio signal 155 generally have some overlap in time and frequency. Thus, traditional methods often fail because the anisotropic background audio signal 155 and the speech from meeting attendee 105(1) can have similar energy signals.
Accordingly, in order to alleviate noise interference due to anisotropic background audio signal 155, anisotropic background audio signal control logic 160 is provided in memory 125. Briefly, anisotropic background audio signal control logic 160 causes processor 130 to perform operations to cancel (rather than merely reduce or suppress by conventional means) anisotropic background audio signal 155. Anisotropic background audio signal control logic 160 enables headset 115(1) to cancel anisotropic background audio signal 155 without distorting speech from meeting attendee 105(1). Headset 115(1) may remove anisotropic background audio signal 155 before providing an output audio signal to headset 115(2). It will be appreciated that at least a portion of anisotropic background audio signal control logic 160 may be included in devices other than headset 115(1), such as at communications server 110.
Headset 115(1) may have a boom design or a boomless design. In a boom design, headset 115(1) includes a boom that houses microphones 140(1) and 140(2).
In a boomless design, headset 115(1) includes a first earpiece that houses microphone 140(1) and a second earpiece that houses microphone 140(1). One of the first and second earpieces may be configured for the left ear of meeting attendee 105(1), and the other of the first and second earpieces may be configured for the right ear of meeting attendee 105(1). Microphones 140(1) and 140(2) may both be oriented toward the source of the user audio signal, and may be unidirectional or omnidirectional. It will be appreciated that microphones 140(1) and 140(2) may be physical microphones or virtual microphones comprising an array of physical microphones. In either design, the relative position between microphones 140(1) and 140(2) and the mouth of meeting attendee 105(1) does not change. Moreover the distances between the mouth and microphones 140(1) and 140(2) are relatively short, and therefore audio signals from the direct acoustic path tend to dominate.
Headset 115(1) extracts, from first audio signal 310 and second audio signal 315, reference audio signal 305. Reference signal 305 may include anisotropic background audio signal 155 and any (isotropic) background noise, but may exclude most or all of the user audio signal. Headset 115(1) uses adaptive filter 320 (e.g., time domain element filter) to extract the reference audio signal 305. In this example, first audio signal 310 is the primary input for adaptive filter 320, second audio signal 315 is the reference input for adaptive filter 320, and reference signal 305 is the error output of adaptive filter 320. Adder 322 generates reference signal 305 based on an output signal 325 of adaptive filter 320 and first audio signal 310 (e.g., by subtracting output signal 325 from first audio signal 310).
As shown in
In one example, delay node 350 may delay the first audio signal 310 by a length of time equal to a difference between a time at which the user audio signal reaches microphone 140(1) and a time at which the user audio signal reaches microphone 140(2). Delaying the first audio signal 310 may ensure that adaptive filter 320 converges when the user audio signal is present. The length of time may correspond to distance D (
Thus, first audio signal 310, second audio signal 315, and third audio signal 345 are candidate audio signals. Based on earpiece position estimation function 410, candidate signal selection function 450 selects one of the candidate audio signals (here, third audio signal 345). Candidate signal selection function 450 may make the selection based on SNRs 430, 440, and/or 445 (e.g., by selecting the highest SNR), and/or based on envelop 420. For example, in a boomless design, when meeting attendee 105(1) has not placed the earpieces at the optimal positions, the signal from one of microphones 140(1) and 140(2) may have a significantly lower level of the user audio signal than the other of microphones 140(1) and 140(2). Accordingly, in certain situations it may be preferable to intelligently select a signal with the highest SNR instead of, for example, the third audio signal 345.
Because adaptive filter 320 (
Suppression function 620 may determine the estimated signal strength of the user audio signal by comparing the signal strengths between reference signal 305 and the third audio signal 345. In particular, the third audio signal 345 includes the user audio signal, anisotropic background audio signal 155, and any (isotropic) background/environmental noise, while reference signal 305 includes anisotropic background audio signal 155 and the (isotropic) background/environmental noise, with the user audio signal removed. Moreover, suppression function 620 may use the SNR of reference signal 305 as the estimated signal strength of anisotropic background audio signal 155.
Performance estimation function 630 may provide a performance estimation of adaptive filter 510, and performance estimation function 640 may provide a performance estimation of adaptive filter 320. If there is strong performance from adaptive filter 320 (as indicated by performance estimation node 640), a user audio signal may be present, and therefore suppression may be limited (or nonexistent) so as to avoid distorting the user audio signal. For example, if there is a strong user audio signal, the first audio signal 310 and the third audio signal 345 would be relatively high, and reference signal 305 would be relatively low. Meanwhile, a strong performance from adaptive filter 510 (as indicated by performance estimation function 630) indicates that adaptive filter 510 is cancelling a large quantity of anisotropic background audio signal 155, and therefore suppression may be warranted. For example, when the estimated signal strength of the user audio signal is low, performance estimation function 630 may determine the cancellation performance of anisotropic background audio signal 155 by comparing the respective signal strengths of the third audio signal 345 and the fourth audio signal 520. With anisotropic background audio signal 155 removed from the third audio signal 345, the fourth audio signal 520 has the user audio signal and environmental noise. When meeting attendee 105(1) is not talking (i.e., the estimated signal strength of the user audio signal is low), the fourth audio signal 520 is mainly environment noise.
When the estimated user audio signal strength is relatively low, the suppression gain should be low if the estimated signal strength of anisotropic background audio signal 155 is relatively high and there is strong cancellation performance of anisotropic background audio signal 155. Low suppression gain attenuates anisotropic background audio signal 155 residue in the fourth audio signal 520. When the estimated signal strength of the user audio signal is relatively high, the suppression gain should be calculated based on the mask effect of the user audio signal and anisotropic background audio signal 155. When the estimated signal strength of the user audio signal is much higher than that of anisotropic background audio signal 155, anisotropic background audio signal 155 is masked by the user audio signal, and as such the suppression gain may be relatively high. When the estimated signal strength of anisotropic background audio signal 155 is high relative to the estimated signal strength of the user audio signal, more attenuation is necessary, and therefore the suppression gain should be relatively low.
The suppression gain calculation may consider both global spectrum (for all frequencies) and local spectrum (for specific frequency bins) of the user audio signal and the anisotropic background audio signal 155 signal strength. When global anisotropic background audio signal 155 signal strength is high, even if anisotropic background audio signal 155 signal strength is low for a specific frequency, gain for that frequency may be lower than it would otherwise be when the global anisotropic background audio signal 155 signal strength is low.
Techniques are presented to remove an anisotropic background audio signal from a microphone audio signal before sending an output audio signal to remote side in a conference call. A method that combines anisotropic background audio signal cancellation and suppression may optimize the audio experience for headsets. Multiple microphones may be used in these methods. Two adaptive filters may be used: one for reference signal extraction, and the other for anisotropic background audio signal cancellation. Techniques described herein may apply in boom or boomless headsets.
In one form, an apparatus is provided. The apparatus comprises: a first microphone; a second microphone; and a processor coupled to receive signals derived from outputs of the first microphone and the second microphone, wherein the processor is configured to: obtain, from the first microphone, a first audio signal including a user audio signal and an anisotropic background audio signal; obtain, from the second microphone, a second audio signal including the user audio signal and the anisotropic background audio signal; extract, from the first audio signal and the second audio signal, using a first adaptive filter, a reference audio signal including the anisotropic background audio signal; based on the reference signal, cancel, using a second adaptive filter, the anisotropic background audio signal from a third audio signal derived from the first and/or second audio signals to produce an output audio signal; and provide the output audio signal to a receiver device.
In one example, the apparatus further comprises a first earpiece that houses the first microphone and a second earpiece that houses the second microphone. In a further example, the processor is further configured to: select the third audio signal from a plurality of candidate audio signals, wherein the plurality of candidate audio signals includes the first audio signal, the second audio signal, and the third audio signal. In a still further example, the processor is configured to select the third audio signal based on a signal-to-noise ratio of the first audio signal, a signal-to-noise ratio the second audio signal, and/or a signal-to-noise ratio of the combined signal. In another still further example, the processor is configured to select the third audio signal based on an envelope of the output of the first adaptive filter.
In one example, the apparatus further comprises: a boom that houses the first microphone and the second microphone, wherein the first microphone is a directional microphone oriented toward a source of the user audio signal. In a further example, the third audio signal is the first audio signal. In another further example, the second microphone is a directional microphone oriented away from the source of the user audio signal. In yet another further example, the second microphone is an omnidirectional microphone.
In one example, the processor is configured to cancel the anisotropic background audio signal to produce a fourth audio signal, and the processor is further configured to: calculate a suppression gain based on the user audio signal and the anisotropic background audio signal; and remove a remaining anisotropic background audio signal from the fourth audio signal by applying the suppression gain to the fourth audio signal to produce the output audio signal.
In one example, the processor is further configured to: update coefficients of the first adaptive filter when a signal-to-noise ratio of the first audio signal is greater than a first predefined threshold, and when a signal-to-noise ratio of the second audio signal is greater than a second predefined threshold.
In one example, the processor is further configured to: update coefficients of the second adaptive filter when a signal-to-noise ratio of the reference signal is greater than a first predefined threshold, and when a signal-to-noise ratio of the third audio signal is between a second predefined threshold and a third predefined threshold.
In one example, the processor is further configured to: delay the first audio signal by a length of time substantially equal to a difference between a time at which the user audio signal reaches one of the first microphone and the second microphone and a time at which the user audio signal reaches the other of the first microphone and the second microphone.
In another form, a method is provided. The method comprises: obtaining, from a first microphone on a headset, a first audio signal including a user audio signal and an anisotropic background audio signal; obtaining, from a second microphone on the headset, a second audio signal including the user audio signal and the anisotropic background audio signal; extracting, from the first audio signal and the second audio signal, using a first adaptive filter, a reference audio signal including the anisotropic background audio signal; based on the reference signal, cancelling, using a second adaptive filter, the anisotropic background audio signal from a third audio signal derived from the first and second audio signals to produce an output audio signal; and providing the output audio signal to a receiver device.
In another form, one or more non-transitory computer readable storage media are provided. The non-transitory computer readable storage media are encoded with instructions that, when executed by a processor, cause the processor to: obtain, from a first microphone on a headset, a first audio signal including a user audio signal and an anisotropic background audio signal; obtain, from a second microphone on the headset, a second audio signal including the user audio signal and the anisotropic background audio signal; extract, from the first audio signal and the second audio signal, using a first adaptive filter, a reference audio signal including the anisotropic background audio signal; based on the reference signal, cancel, using a second adaptive filter, the anisotropic background audio signal from a third audio signal derived from the first and second audio signals to produce an output audio signal; and provide the output audio signal to a receiver device.
The above description is intended by way of example only. Although the techniques are illustrated and described herein as embodied in one or more specific examples, it is nevertheless not intended to be limited to the details shown, since various modifications and structural changes may be made within the scope and range of equivalents of the claims.