The present disclosure relates in general to circuits for audio devices, including without limitation multi-microphone speakerphones, teleconference phones, or other audio input devices, and more specifically, to systems and methods for suppressing audio noise incident upon microphones in such audio input devices.
This invention relates to audio signal processing and, in particular, to a circuit that estimates direction of arrival using plural microphones.
As used herein, “telephone” is a generic term for a communication device that utilizes, directly or indirectly, a dial tone from a licensed service provider. For the sake of simplicity, the invention is described in the context of a telephone but has broader utility; e.g., communication devices that do not utilize a dial tone, such as radio frequency transceivers or intercoms.
The present disclosure finds use in many applications where the internal electronics are essentially the same but the external appearance of the device is different.
Communication using telephones, hands-free devices, and other communication systems is often attempted in noisy acoustic environments. For example, communication with a multi-microphone telephone (e.g., telephone 10) may take place in a conference room or office with poor acoustics and significant background noise. Hands-free kits (e.g., hands-free kit 20) are often used in even harsher acoustic environments, such as automobiles, airports, and restaurants.
As used herein, “noise” refers to any unwanted sound, whether the unwanted sound is periodic, purely random, or somewhere in between. As such, noise includes background music, voices (herein referred to as “babble”) of people other than the desired speaker, tire noise, wind noise, etc.
Many digital signal processing techniques have been proposed for reducing noise. In products with a single microphone, reducing noise is quite difficult when the desired speech and the noise share the same frequency spectrum. It is difficult for these techniques to remove noise without damaging the desired speech. However, if the origin of the noise and the origin of the desired speech are spatially separated, then one can theoretically extract a clean speech signal from a noisy speech signal. One approach to spatially separating the origin of noise and the origin of desired speech is known as beam forming. Beam forming may be employed in a communication device having two or more microphones. With beam forming, one or more beams may be formed by a processing device (e.g., microprocessor, digital signal processor, etc.) of the communication device, wherein each beam acts as a spatial filter that passes acoustic energy from some spatial directions while filtering out acoustic energy from other directions. By forming a beam that points at or near a desired source of acoustic energy (e.g., a person who is speaking), the desired acoustic energy of the speaker may be passed by the spatial filter implemented by a beam while acoustic energy from noise sources or reflections of the desired source may be rejected or attenuated. In this manner, audio quality of the communication device may be improved.
Such improvement in audio quality may only be realized if the beam is pointed at or near the desired source (or, alternatively, if the null of the beam is pointed at or near a noise source). However, this presents challenges in a speakerphone or video conference environment, as the location of a desired source (e.g., a person talking) may not be known ahead of time, and some method of desired source localization may be needed. This localization often takes the form of a “direction of arrival” estimation, wherein the angle of arrival of the desired acoustic energy (or the undesired noise) is estimated. In a speakerphone or videoconference environment, a desired source may move, or another desired source in a different spatial location may also exist (e.g., a second person begins speaking in another part of a room). Accordingly, desired sources must be tracked to keep the beam pointed in the correct direction.
An existing approach to desired source localization is cross-correlation. In cross-correlation, the delay between receipt of sounds at various microphones is calculated, and, because microphone geometry is typically known in advance, the direction of arrival may be determined based on such delay. However, cross-correlation may have many deficiencies. First, cross-correlation may be expensive, especially if there are more than two microphones, because a cross-correlation must be performed between each microphone and every other microphone, requiring significant processing resources. In addition, cross-correlation typically has significant latency, which limits the rate at which a desired source can be tracked or the beam switched to another desired source. Furthermore, cross-correlation suffers from variation in spatial resolution: it resolves desired source location well when the desired source is about the same distance from each microphone, but such resolution diminishes as the desired source moves closer to one microphone.
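The following sketch shows the cross-correlation approach for a single microphone pair. It is a minimal, brute-force illustration under assumed conditions, none of which come from the disclosure:

- a far-field source;
- a speed of sound of 343 m/s;
- integer-sample delays;
- an angle measured from broadside (perpendicular to the microphone axis).

The function names are hypothetical. In a multi-microphone device, the lag search would be repeated for every pair of microphones.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def cross_correlation_lag(x1, x2, max_lag):
    """Return the lag (in samples) maximizing sum_n x1[n] * x2[n - lag].

    A positive lag means x1 is a delayed copy of x2, i.e. the sound
    reached microphone 2 first.
    """
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for n, sample in enumerate(x1):
            j = n - lag
            if 0 <= j < len(x2):
                acc += sample * x2[j]
        if acc > best_val:
            best_lag, best_val = lag, acc
    return best_lag


def direction_of_arrival(lag, fs, spacing):
    """Far-field angle (degrees from broadside) implied by an integer-sample lag."""
    s = SPEED_OF_SOUND * (lag / fs) / spacing
    s = max(-1.0, min(1.0, s))  # clamp: lags beyond the physical maximum map to endfire
    return math.degrees(math.asin(s))


def correlation_pairs(num_mics):
    """Number of microphone pairs that must each be cross-correlated."""
    return num_mics * (num_mics - 1) // 2


# Example: an impulse reaches microphone 2 three samples before microphone 1.
x2 = [0.0] * 64
x2[10] = 1.0
x1 = [0.0] * 64
x1[13] = 1.0
lag = cross_correlation_lag(x1, x2, max_lag=8)
print(lag, round(direction_of_arrival(lag, fs=16000, spacing=0.1), 1))  # prints: 3 40.0
```

The angle is obtained through an arcsine. A one-sample change in lag therefore moves the estimate only slightly near broadside, but by much more near endfire. This is one way to see the non-uniform spatial resolution described above.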
In accordance with the teachings of the present disclosure, one or more disadvantages and/or problems associated with existing approaches to suppressing audio noise in a communication system may be reduced or eliminated.
In accordance with embodiments of the present disclosure, a noise suppression system may include a first microphone input, a second microphone input, a plurality of beam formers, and a beam selector. The first microphone input may be configured to receive a first microphone signal indicative of sounds incident upon a first microphone. The second microphone input may be configured to receive a second microphone signal indicative of sounds incident upon a second microphone. The plurality of beam formers may each be configured to form a respective one of a plurality of beams to receive and spatially filter audible sounds from the first microphone and the second microphone. The plurality of beam formers may include a first beam former, a second beam former, and a third beam former. The first beam former may be configured to form a first unidirectional beam to receive and spatially filter audible sounds from the first microphone and the second microphone. The second beam former may be configured to form a second unidirectional beam to receive and spatially filter audible sounds from the first microphone and the second microphone, the second unidirectional beam having a spatial null in a direction different from that of the first unidirectional beam. The third beam former may be configured to form an omnidirectional beam to receive and spatially filter audible sounds from the first microphone and the second microphone. The beam selector may be configured to determine a first energy of audible sounds filtered by the first unidirectional beam and a second energy of audible sounds filtered by the second unidirectional beam, and, based on at least the first energy and the second energy, select one of the plurality of beams as a selected beam.
In accordance with these and other embodiments of the present disclosure, a method may include forming a plurality of beams to receive and spatially filter audible sounds from a first microphone and a second microphone. The plurality of beams may include a first unidirectional beam to receive and spatially filter audible sounds from the first microphone and the second microphone, a second unidirectional beam to receive and spatially filter audible sounds from the first microphone and the second microphone, the second unidirectional beam having a spatial null in a direction different from that of the first unidirectional beam, and an omnidirectional beam to receive and spatially filter audible sounds from the first microphone and the second microphone. The method may also include determining a first energy of audible sounds filtered by the first unidirectional beam and a second energy of audible sounds filtered by the second unidirectional beam and, based on at least the first energy and the second energy, selecting one of the plurality of beams as a selected beam.
In accordance with embodiments of the present disclosure, a noise suppression system may include a first microphone input, a second microphone input, a plurality of beam formers, a beam selector, and a beam mixer subsystem. The first microphone input may be configured to receive a first microphone signal indicative of sounds incident upon a first microphone. The second microphone input may be configured to receive a second microphone signal indicative of sounds incident upon a second microphone. The plurality of beam formers may each be configured to form a respective one of a plurality of beams to receive and spatially filter audible sounds from the first microphone and the second microphone. The plurality of beam formers may include a first beam former and a second beam former. The first beam former may be configured to form a first unidirectional beam to receive and spatially filter audible sounds from the first microphone and the second microphone. The second beam former may be configured to form a second unidirectional beam to receive and spatially filter audible sounds from the first microphone and the second microphone, the second unidirectional beam having a spatial null in a direction different from that of the first unidirectional beam. The beam selector may be configured to determine a first energy of audible sounds filtered by the first unidirectional beam and a second energy of audible sounds filtered by the second unidirectional beam, and, based on at least the first energy and the second energy, select at least one of the plurality of beams as a selected beam. The beam mixer subsystem may be configured to mix selected beams to create an audio output signal.
In accordance with these and other embodiments of the present disclosure, a method may include forming a plurality of beams to receive and spatially filter audible sounds from a first microphone and a second microphone. The plurality of beams may include a first unidirectional beam to receive and spatially filter audible sounds from the first microphone and the second microphone and a second unidirectional beam to receive and spatially filter audible sounds from the first microphone and the second microphone, the second unidirectional beam having a spatial null in a direction different from that of the first unidirectional beam. The method may also include determining a first energy of audible sounds filtered by the first unidirectional beam and a second energy of audible sounds filtered by the second unidirectional beam and, based on at least the first energy and the second energy, selecting at least one of the plurality of beams as a selected beam. The method may further include, when selection is switched from a first selected beam to a second selected beam, mixing selected beams to create an audio output signal.
Technical advantages of the present disclosure may be readily apparent to one skilled in the art from the figures, description and claims included herein. The objects and advantages of the embodiments will be realized and achieved at least by the elements, features, and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the claims set forth in this disclosure.
A more complete understanding of the present embodiments and advantages thereof may be acquired by referring to the following description taken in conjunction with the accompanying drawings, in which like reference numbers indicate like features, and wherein:
As shown in
Beam formers 200 may comprise microphone inputs for receiving microphone signals generated by each of the plurality of microphones 100 and may generate a plurality of beams based on such microphone signals. Each of the plurality of beam formers 200 of a noise suppression system 101 may be configured to form a respective one of a plurality of beams to spatially filter audible sounds from the plurality of microphones 100. The plurality of beam formers 200 may include at least two unidirectional beam formers 200a. Each unidirectional beam former 200a may be configured to form a respective unidirectional beam to receive and spatially filter audible sounds from microphones 100, wherein such respective unidirectional beam has a spatial null in a direction different from that of all other unidirectional beams formed by other unidirectional beam formers 200a, such that each beam formed by unidirectional beam formers 200a points in a different direction. In addition, in some embodiments, the plurality of beam formers 200 may include an omnidirectional beam former 200b configured to form an omnidirectional beam to receive and spatially filter audible sounds from microphones 100.
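The sketch below shows one time-domain way two omnidirectional microphones could yield two unidirectional beams with nulls in opposite directions, plus an omnidirectional beam. The disclosure does not specify a beam-forming method. This sketch assumes a first-order delay-and-subtract (differential) design and a two-microphone endfire geometry. It also assumes an integer-sample delay d, roughly the microphone spacing divided by the speed of sound, multiplied by the sample rate. All names are hypothetical.

```python
def delay(x, d):
    """Delay list x by d >= 0 whole samples, zero-filling the start."""
    return ([0.0] * d + list(x))[:len(x)]


def form_beams(mic1, mic2, d):
    """Form two opposite-facing differential beams and an omni beam.

    mic1, mic2: equal-length sample lists from two omnidirectional microphones
    d: inter-microphone travel time in whole samples (assumed integer)
    """
    mic1_d = delay(mic1, d)
    mic2_d = delay(mic2, d)
    # mic1[n] - mic2[n - d]: cancels sound that reaches mic 2 first (null on mic-2 side)
    beam_a = [a - b for a, b in zip(mic1, mic2_d)]
    # mic2[n] - mic1[n - d]: cancels sound that reaches mic 1 first (null on mic-1 side)
    beam_b = [b - a for a, b in zip(mic1_d, mic2)]
    # Average of the two microphones: no spatial null
    omni = [0.5 * (a + b) for a, b in zip(mic1, mic2)]
    return beam_a, beam_b, omni


# Impulse arriving along the axis from the microphone-2 side (2 samples earlier at mic 2).
s = [0.0] * 16
s[4] = 1.0
beam_a, beam_b, omni = form_beams(delay(s, 2), s, 2)
print(max(abs(v) for v in beam_a))  # 0.0: this source lies in beam_a's null
```

Differential beams of this kind also attenuate low frequencies, so a practical design would usually add an equalization filter. That step is omitted here.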
In some embodiments, beam formers 200 may be implemented as time-domain beam formers. The various beams formed by beam formers 200 may be formed at all times during operation. While
Beam selector 300 may include any suitable system, device, or apparatus configured to receive the simultaneously formed plurality of beams (or, at least the plurality of unidirectional beams) from beam formers 200, and, based on an analysis thereof, select which of the simultaneously-formed beams will be output as an audio output signal from a noise suppression system 101. As shown in
Parameter estimation block 301 may comprise any system, device, or apparatus configured to estimate the acoustic energy present in each unidirectional beam formed by beam formers 200a. In some embodiments, parameter estimation block 301 may estimate the acoustic energy in each unidirectional beam on a frame-by-frame basis, wherein each frame is a collection of samples (e.g., 32 or 64 samples) of an input signal. Frame processing may allow samples to be processed more efficiently, because they are processed in batches. In other embodiments, parameter estimation block 301 may estimate the acoustic energy in each unidirectional beam on a sample-by-sample basis. In frame-based embodiments, the digital samples of each frame of the output of each unidirectional beam may be squared and summed. Thus, a frame energy E[m] for the mth frame of a unidirectional beam may be given as:
$$E[m] = \sum_{n=1}^{N} x[n]^2$$
where x[n] is the nth sample of a frame and N is the number of samples in the frame. In some embodiments, the frame energy E[m] may be smoothed by an exponential averaging filter to generate a smoothed energy estimate Es[m] for the frame given by the equation:
$$E_s[m] = r\,E_s[m-1] + (1-r)\,E[m]$$
where r is a constant weighting factor between 0 and 1.
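The sketch below computes the frame energy and its exponentially smoothed version. It is illustrative only, and several details are assumptions rather than part of the disclosure:

- the smoothed estimate starts at zero;
- a trailing partial frame is dropped;
- the function names are hypothetical.

```python
def split_frames(x, frame_len):
    """Split a sample list into consecutive full frames (a trailing partial frame is dropped)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, frame_len)]


def frame_energy(frame):
    """E[m]: sum of squared samples in one frame."""
    return sum(sample * sample for sample in frame)


def smoothed_energies(frames, r, es_init=0.0):
    """Es[m] = r * Es[m-1] + (1 - r) * E[m] for each frame, with Es[-1] = es_init."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie between 0 and 1")
    es = es_init
    out = []
    for frame in frames:
        es = r * es + (1.0 - r) * frame_energy(frame)
        out.append(es)
    return out
```

A value of r close to 1 makes the estimate steadier but slower to react to a new talker. A smaller r tracks changes faster at the cost of more fluctuation.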
Parameter estimation block 301 may output estimated acoustic energies for each unidirectional beam (e.g., frame energies, or smoothed frame energies) to beam selection block 302. Based on these estimated acoustic energies, beam selection block 302 may select one or more beams to be output as an audio output signal of the noise suppression system 101, and output a signal SELECTED BEAM indicating the one or more beams selected. In some embodiments, beam selection block 302 may select a unidirectional beam if a scaled energy associated with such unidirectional beam is greater than each of the estimated energies of the other unidirectional beams. A scaled energy for a unidirectional beam may be calculated by multiplying a tolerance factor by the estimated energy (e.g., frame energy, smoothed frame energy) for such unidirectional beam, wherein such tolerance factor is a constant between 0 and 1. In some embodiments, the tolerance factor may be the same for each unidirectional beam, while in other embodiments, the tolerance factors for each unidirectional beam may be different. In these and other embodiments, the tolerance factor for each unidirectional beam may be adjustable by a user of a noise suppression system 101. Thus, beam selection block 302 may apply an algorithm in which a unidirectional beam is selected for the audio output signal if its scaled energy is greater than the estimated energy of every other unidirectional beam, and the omnidirectional beam is selected if no unidirectional beam satisfies this condition.
Using such an algorithm, or a similar algorithm, beam selection block 302 may attempt to identify a dominant unidirectional beam with an energy much greater than those of the other unidirectional beams, and if such a dominant beam is identified, the dominant beam may be selected. Otherwise, if no dominant beam is found, the omnidirectional beam may be selected.
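The dominant-beam rule can be sketched as follows. The sketch assumes the estimated energies of the unidirectional beams are already available (for example, from the smoothing step) and that each beam has a tolerance factor between 0 and 1. The `OMNI` marker and the function name are hypothetical.

```python
OMNI = "omni"  # hypothetical marker meaning "select the omnidirectional beam"


def select_beam(energies, tolerances):
    """Return the index of a dominant unidirectional beam, or OMNI if none dominates.

    energies[i]: estimated (e.g., smoothed) energy of unidirectional beam i
    tolerances[i]: tolerance factor for beam i, assumed to lie between 0 and 1
    """
    for i, (energy_i, tol_i) in enumerate(zip(energies, tolerances)):
        scaled = tol_i * energy_i
        # Beam i dominates if its scaled energy exceeds every other beam's energy.
        if all(scaled > energy_j for j, energy_j in enumerate(energies) if j != i):
            return i
    return OMNI


print(select_beam([10.0, 2.0], [0.5, 0.5]))  # 0, since 5.0 > 2.0
print(select_beam([3.0, 2.0], [0.5, 0.5]))   # omni, since neither beam dominates
```

Each tolerance factor is at most 1 and energies are nonnegative, so at most one beam can satisfy the condition at a time. The loop order therefore does not affect the result. A smaller tolerance factor demands a larger energy margin before a unidirectional beam is chosen.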
In the algorithm for beam selection described above, an omnidirectional beam is used as a default beam in the event no dominant unidirectional beam is identified. However, in other embodiments, instead of having a default beam, each beam may be mixed in some proportion to create an audio output signal. In such embodiments, voice activity would be detected on each beam, and if voice activity is detected on a unidirectional beam, then that unidirectional beam may be mixed into the output signal. The proportion of such mixing (e.g., the gain applied to mix each unidirectional beam) might be varied depending on the detected volume of speech or some other criterion. For example, in some embodiments, the energy levels of all audio sources may be normalized such that they have the same volume at a communication device to which the captured sound is transmitted.
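The following is a hypothetical sketch of the voice-activity-based mixing described above, not the disclosed method. Its assumptions are:

- a simple energy-threshold test stands in for a real voice activity detector;
- each active beam is scaled so its frame energy matches an assumed target energy, which is one reading of the normalization example;
- a frame with no active beam produces silence.

```python
import math


def mix_active_beams(beam_frames, vad_threshold, target_energy):
    """Mix one frame from each unidirectional beam, keeping only 'active' beams.

    beam_frames: list of equal-length frames, one per unidirectional beam
    vad_threshold: energy above which a beam is treated as containing speech
                   (a crude stand-in for a real voice activity detector; assumed >= 0)
    target_energy: frame energy each active beam is scaled to (normalization)
    """
    out = [0.0] * len(beam_frames[0])
    for frame in beam_frames:
        energy = sum(x * x for x in frame)
        if energy <= vad_threshold:
            continue  # no voice activity detected: beam is left out of the mix
        gain = math.sqrt(target_energy / energy)  # scales frame energy to target_energy
        for i, x in enumerate(frame):
            out[i] += gain * x
    return out
```

Because every active beam is brought to the same energy before summing, a quiet talker far from the device and a loud talker near it would contribute at similar levels.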
Immediately switching an audio output of a noise suppression system 101 from one beam to another may lead to audio artifacts (e.g., “pops” and “clicks” that may be noticeable to a listener receiving the audio output of the noise suppression system 101). Accordingly, measures to reduce such artifacts may be desirable.
As shown in
$$\text{StepSize} = \frac{1}{F_s \times \text{RampTime}}$$
where Fs is the sampling frequency of the noise suppression system 101 and RampTime is the time, in seconds, that it takes to ramp a signal gain from 0 to 1.
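As a worked example, assume a sampling frequency of 16 kHz and a ramp time of 10 ms. These values are for illustration only; the disclosure does not give them. The step size works out as:

$$\text{StepSize} = \frac{1}{16\,000\ \text{Hz} \times 0.01\ \text{s}} = \frac{1}{160} = 0.00625$$

so the gain of a newly selected beam would take 160 samples to ramp from 0 to 1.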
At each sample, the gain of the selected beam may be increased by StepSize, to a maximum of 1, while the gain of each of the other beams may be decreased by StepSize, to a minimum of 0. At each sample, after the gains for each beam have been updated, they may be applied to their respective beams, and the beams summed to generate the audio output signal AUDIO OUT.
In alternative embodiments of beam mixer 400, each beam may have a different maximum gain depending on its beam type. For example, a beam lacking a null (e.g., an omnidirectional beam) may have a lower maximum gain than a beam with a null (e.g., a unidirectional beam), as the beam without a null receives more energy.
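The ramped switching described above, including the optional per-beam maximum gains, can be sketched as a small stateful mixer that processes one sample from each beam at a time. The sketch makes several assumptions:

- the class and parameter names are hypothetical;
- the initially selected beam starts at its maximum gain;
- the selected-beam index comes from a selection rule such as the one described earlier.

```python
class BeamMixer:
    """Ramped beam mixer: the selected beam fades in while the others fade out."""

    def __init__(self, num_beams, fs, ramp_time, max_gains=None, initial_beam=0):
        self.step = 1.0 / (fs * ramp_time)  # StepSize = 1 / (Fs * RampTime)
        self.max_gains = list(max_gains) if max_gains is not None else [1.0] * num_beams
        self.gains = [0.0] * num_beams
        self.gains[initial_beam] = self.max_gains[initial_beam]  # assumed start-up state

    def process(self, beam_samples, selected):
        """Update gains for one sample period, then return the weighted sum of the beams."""
        for k in range(len(self.gains)):
            if k == selected:
                self.gains[k] = min(self.gains[k] + self.step, self.max_gains[k])
            else:
                self.gains[k] = max(self.gains[k] - self.step, 0.0)
        return sum(g * x for g, x in zip(self.gains, beam_samples))
```

For example, `BeamMixer(num_beams=3, fs=16000, ramp_time=0.01, max_gains=[1.0, 1.0, 0.7])` would cap a third (omnidirectional) beam below the two unidirectional beams. The value 0.7 is arbitrary. With equal maximum gains, the gains during a crossfade always sum to 1, so a signal common to both beams passes through at constant level.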
The methods and systems disclosed herein may be an improvement over known approaches for noise suppression, as the methods and systems may provide for noise suppression in a communication device using a small number of microphones (e.g., 2 or 3), and such microphones can be implemented using omnidirectional microphones, which may be more cost-effective than directional microphones. The methods and systems disclosed herein may also provide for noise suppression in an efficient manner requiring relatively low processing and memory resources. The approaches disclosed herein may have low latency, may be able to switch between desired audio sources in a short amount of time, and may have the same spatial resolution at all angles.
This disclosure encompasses all changes, substitutions, variations, alterations, and modifications to the exemplary embodiments herein that a person having ordinary skill in the art would comprehend. Similarly, where appropriate, the appended claims encompass all changes, substitutions, variations, alterations, and modifications to the exemplary embodiments herein that a person having ordinary skill in the art would comprehend. Moreover, reference in the appended claims to an apparatus or system or a component of an apparatus or system being adapted to, arranged to, capable of, configured to, enabled to, operable to, or operative to perform a particular function encompasses that apparatus, system, or component, whether or not it or that particular function is activated, turned on, or unlocked, as long as that apparatus, system, or component is so adapted, arranged, capable, configured, enabled, operable, or operative.
All examples and conditional language recited herein are intended for pedagogical objects to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are construed as being without limitation to such specifically recited examples and conditions. Although embodiments of the present inventions have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the disclosure.
The present disclosure claims priority to U.S. Provisional Patent Application Ser. No. 61/900,041, filed Nov. 5, 2013, which is incorporated by reference herein in its entirety.
Number | Date | Country |
---|---|---|
61900041 | Nov 2013 | US |