The present disclosure relates to audio processing and, more particularly, to methods and apparatuses for reducing multiple sources of side interference using adaptive microphone arrays.
A microphone array is a group of two or more microphones arranged in a specific geometric configuration and used to gather and process acoustic signals. One advantage of using a microphone array over a single microphone lies in the fact that the array adds dimensional information to the signal acquisition process. Accordingly, beam forming techniques may be used to provide a main lobe for receiving signals of interest that arrive from one or more desired directions. Beam forming increases the gain of the microphone array in one or more desired directions, while decreasing the gain in other directions to thereby improve the signal-to-noise ratio of a desired signal.
Adaptive microphone arrays may be configured to reduce side interference from sources of acoustic energy that are not situated in the main lobe of the array. For example, a source of undesired noise may be situated outside of the main lobe of the array at a left side of the array, or at a right side of the array. Temporal and spatial information included in the signals collected by the microphone array is analyzed using array signal processing and adaptation procedures to formulate a filter transfer function for the array. The filter transfer function provides the microphone array with a fixed directional pattern that reduces the response of the array to side interference arriving from the left or the right of the array. A null, or direction of minimum response, is provided in a particular direction along a specific fixed bearing.
Side interference may be reduced by utilizing a first circuit that orients a null towards a desired signal source, as well as a second circuit that provides maximum sensitivity towards the main lobe including the desired signal source. The first circuit provides the null by generating a difference signal between a first pair of microphones in the array. This difference signal primarily includes signals gathered from the left side of the array in a first side lobe, and signals gathered from the right side of the array in a second side lobe, but little or no signals gathered from the main lobe of the array. The second circuit provides maximum sensitivity towards the main lobe by generating a summed signal or a differential signal from a second pair of microphones. For example, the summed signal primarily includes signals gathered from the main lobe of the array, along with some signals from the left and right sides of the array.
An adaptive filtering mechanism may be employed to perform an interference cancellation procedure. The first signal produced by the first circuit and containing little or no signal from the source is filtered and subtracted out of a second signal produced by the second circuit and including the source. A source located off-axis, such that it is located in the first or second side lobe, is removed from the sum signal by means of the adaptive filter. Thus, the adaptive filtering mechanism effectively provides the microphone array with a flexible directional pattern that reduces the response of the array to side interference. The adaptive array provides a flexible spatial response pattern so as to reduce sensitivity of the array to signal sources outside of the main lobe by tracking undesired interfering sources that are situated to the side of the array.
The foregoing approach may provide acceptable results where a single source of interference is to the right or to the left of the array, or where multiple sources of interference are all situated in the same direction (left or right) with respect to the array. However, this approach is less effective for cancelling side interference in situations where multiple sources of interference are located on both the right side and the left side of the array. Because signals in the first side lobe of the array (to the left of the array) are 180 degrees out of phase with respect to signals in the second side lobe of the array (to the right of the array) when a conventional difference circuit is employed, it is mathematically impossible for the adaptive filter to find a solution that will provide simultaneous cancellation of interference from both the right side and the left side of the array.
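The sign reversal described above can be illustrated with a minimal narrowband sketch in Python. It assumes a two-microphone broadside pair, far-field plane waves, a single frequency, and a speed of sound of 343 m/s. The function names, the 2 cm spacing, and the 1 kHz frequency are hypothetical choices made only for illustration and do not come from the disclosure.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def branch_responses(theta_deg, freq_hz=1000.0, spacing_m=0.02):
    """Return (sum, difference) responses of a two-microphone broadside pair.

    theta_deg is the arrival angle of a far-field plane wave:
    0 = look direction, positive = right of the array, negative = left.
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    phase = 0.5 * k * spacing_m * math.sin(math.radians(theta_deg))
    left_mic = cmath.exp(-1j * phase)
    right_mic = cmath.exp(1j * phase)
    return left_mic + right_mic, right_mic - left_mic


def cancelling_weight(theta_deg):
    """Single-frequency weight w such that sum - w * difference == 0."""
    sum_resp, diff_resp = branch_responses(theta_deg)
    return sum_resp / diff_resp


# The weight that cancels a source on the right is the negative of the
# weight that cancels a mirror-image source on the left.
w_right = cancelling_weight(60.0)
w_left = cancelling_weight(-60.0)
```

Under these assumptions the sum-branch response is identical for sources at +60 and -60 degrees while the difference-branch response changes sign, so the one weight that cancels the right-hand source doubles the contribution of the left-hand source instead of removing it.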
In at least some embodiments, the present invention relates to a method that includes configuring a plurality of microphones in an array to gather signals from a desired source of sound in a main lobe of the array, and configuring the plurality of microphones to gather side interference by controlling at least one of a relative phase or a relative magnitude of signals gathered by each of the plurality of microphones to provide a respective plurality of differential microphone arrays each having a corresponding directional pattern that orients a null towards the desired source of sound and that provides one or more side lobes. The gathered side interference from the plurality of differential microphone arrays is processed to generate a reference signal that is subtracted from the signals that are gathered from the main lobe of the array to provide an output signal wherein the side interference is reduced or cancelled.
According to a set of further embodiments, the reducing or cancelling of the side interference is provided by processing the reference signal through an adaptive filter which tracks one or more undesired signal sources. The reference signal is processed by means of adjusting and adapting a set of weights for the adaptive filter such that energy minimization in the final output is achieved. For illustrative purposes, the adaptive filter may be a finite impulse response (FIR) filter, and the adaptive filter may utilize an adaptive mechanism such as least mean squares (LMS).
In at least some embodiments, the present invention relates to an apparatus that includes a plurality of microphones arranged in an array for gathering signals from a desired source of sound in a main lobe of the array. A phasing or filtering mechanism, operatively coupled to the plurality of microphones, configures the array for gathering side interference by controlling at least one of a relative phase or a relative magnitude of signals gathered by each of the plurality of microphones to provide a respective plurality of differential microphone arrays each having a corresponding directional pattern that orients a null towards the desired source of sound and that provides one or more side lobes. A processing mechanism, operatively coupled to the phasing or filtering mechanism and the array, processes the gathered side interference from the plurality of differential microphone arrays to generate a reference signal that is subtracted from the signals that are gathered from the main lobe of the array to provide an output signal wherein the side interference is reduced or cancelled.
Moreover, in at least some embodiments, the present invention relates to a non-transitory computer-readable memory encoded with a computer program comprising computer readable instructions recorded thereon for execution of a method that includes configuring a plurality of microphones in an array to gather signals from a desired source of sound in a main lobe of the array, and configuring the plurality of microphones to gather side interference by controlling at least one of a relative phase or a relative magnitude of signals gathered by each of the plurality of microphones to provide a respective plurality of differential microphone arrays each having a corresponding directional pattern that orients a null towards the desired source of sound and that provides one or more side lobes. The gathered side interference from the plurality of differential microphone arrays is processed to generate a reference signal that is subtracted from the signals that are gathered from the main lobe of the array to provide an output signal wherein the side interference is reduced or cancelled.
A second summer 154 is used to invert either the first output signal from the first microphone 151 or the second output signal from the second microphone 152 prior to summing the first and second output signals, to generate a difference signal between the first microphone 151 and the second microphone 152. The difference signal provides a minimum sensitivity along the look-up direction of the array, and provides a maximum sensitivity in a direction defined by the line connecting the two microphones. Thus the maximum sensitivity direction of the summation branch at the output of the first summer 153 coincides with the minimum sensitivity of the difference branch at the output of the second summer 154 of the array 150. The output of the difference branch at the output of the second summer 154 is then filtered by a finite impulse response (FIR) filter 156, which is trained using an adaptive algorithm such as normalized least mean squares (nLMS) 159, for example, to remove a maximum common signal that is present in both the sum and difference branches by means of a third summer 158. This maximum common signal is located to the sides of the look-up direction of the array 150. The nLMS algorithm represents a class of adaptive filter used to mimic a desired filter by determining a set of filter coefficients that produce a least mean square of an error signal. The error signal represents the difference between a desired signal and an actual signal. The nLMS algorithm constitutes a stochastic gradient descent method in that the filter is adapted based on the error at the current moment in time.
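The sum branch, difference branch, and nLMS-trained FIR filter described above can be sketched as follows. This is a toy simulation under stated assumptions, not the disclosed implementation: the talker is exactly broadside, the interferer is a single tone that reaches the second microphone one sample late, and the names `nlms_cancel` and `simulate`, the tap count, and the step size are hypothetical.

```python
import math


def nlms_cancel(primary, reference, num_taps=8, step=0.05, eps=1e-8):
    """Subtract an nLMS-trained FIR estimate of `reference` from `primary`.

    Returns the error signal (the canceller output) and the final weights.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # reference samples, newest first
    output = []
    for p, r in zip(primary, reference):
        history = [r] + history[:-1]
        estimate = sum(w * x for w, x in zip(weights, history))
        error = p - estimate  # sum branch minus filtered difference branch
        power = eps + sum(x * x for x in history)
        # Normalized LMS: the step is scaled by the reference power.
        weights = [w + step * error * x / power for w, x in zip(weights, history)]
        output.append(error)
    return output, weights


def simulate(num_samples=4000):
    """Toy scene: broadside talker plus a tonal interferer from the side."""
    desired = [math.sin(0.1 * math.pi * n) for n in range(num_samples)]
    noise = [math.sin(0.3 * math.pi * n + 0.7) for n in range(num_samples + 1)]
    # The talker reaches both microphones simultaneously; the interferer
    # reaches microphone 2 one sample after microphone 1.
    mic1 = [desired[n] + noise[n + 1] for n in range(num_samples)]
    mic2 = [desired[n] + noise[n] for n in range(num_samples)]
    sum_branch = [a + b for a, b in zip(mic1, mic2)]
    diff_branch = [a - b for a, b in zip(mic1, mic2)]  # talker cancels here
    cleaned, _ = nlms_cancel(sum_branch, diff_branch)
    return desired, sum_branch, cleaned
```

Because the talker is absent from the difference branch in this sketch, the filter can only remove the component that both branches share, which is the side interferer. A tonal interferer is used because a short FIR filter cannot match a broadband interferer with this particular one-sample geometry.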
The arrays described in conjunction with
The use of a DSB in the array 150 of
Another approach is to use a differential microphone array (DMA) that forms a signal oriented along an intended look-up direction comprising a main lobe of the array. In a DMA, a first signal from a first microphone is delayed with respect to a second signal from a second microphone, and then the first signal is subtracted from the second signal. The amount of delay introduced into the first signal and the geometry of the array determine the direction of the null(s) formed by the array. DMAs are of relatively constant directivity, at least up to a specific frequency and dependent on the geometry of the array. DMAs allow minimum sensitivity (a null) to be directed at two arbitrary angles, axially symmetric to the line connecting the two or more microphones. The differential array is an end-fire array, and the maximum sensitivity of the array is along the line connecting the two or more microphones.
The first signal from the first microphone 301 is delayed with respect to the second signal from the second microphone 302 by means of the first delay line 303, and the delayed first signal is subtracted from the second signal using a first summer 305. Likewise, the second signal from the second microphone 302 is delayed with respect to the first signal from the first microphone 301 by means of the second delay line 304, and the delayed second signal is subtracted from the first signal using a second summer 306.
The amount of delay introduced by the first and second delay lines 303, 304 and the geometry of the array 300 determine the direction of the null(s) formed by the array 300. For example, the first summer 305 may provide an output signal that illustratively implements a cardioid-shaped spatial response pattern, with the null of the cardioid directed to the left of the array 300, and with the main lobe of the cardioid directed to the right of the array 300. The second summer 306 may provide an output signal that illustratively implements a cardioid-shaped spatial response pattern, with the null of the cardioid directed to the right of the array 300, and with the main lobe of the cardioid directed to the left of the array 300. The first delay line 303, the second delay line 304, the first summer 305, and the second summer 306 together comprise a differential microphone array (DMA) element 308.
The delay introduced by the first and second delay lines 303, 304 may, but need not, be a function of frequency. It should be appreciated that any one or more of the delay lines 303, 304 (or otherwise) can take any of a variety of forms depending upon the embodiment or circumstance and, for example, can be an integer sample delay, a fractional sample delay, a frequency dependent delay, or a delay filter. Also, implementations may be provided in the time domain or the frequency domain. The array 300 conceptually illustrates how a set of processing elements (delay lines 303, 304 and summers 305, 306) may be used to produce signals with opposing spatial directivities, such as the right-oriented cardioid and left-oriented cardioid discussed previously.
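The delay-and-subtract arrangement that yields two opposing cardioids can be sketched at a single frequency as shown below. The sketch assumes far-field plane waves, a delay equal to the acoustic travel time between the two microphones, and an angle measured from the line joining the microphones (0 degrees toward the right, 180 degrees toward the left). The function name, the 1 cm spacing, and the 1 kHz frequency are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def cardioid_pair(axis_angle_deg, freq_hz=1000.0, spacing_m=0.01):
    """Return (right_facing, left_facing) magnitude responses.

    axis_angle_deg is measured from the array axis: 0 = source on the
    right along the axis, 180 = source on the left along the axis.
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    half = 0.5 * k * spacing_m * math.cos(math.radians(axis_angle_deg))
    mic1 = cmath.exp(-1j * half)  # left microphone
    mic2 = cmath.exp(1j * half)   # right microphone
    delay = cmath.exp(-1j * k * spacing_m)  # delay of spacing / c seconds
    # Delayed first signal subtracted from the second: null to the left.
    right_facing = mic2 - delay * mic1
    # Delayed second signal subtracted from the first: null to the right.
    left_facing = mic1 - delay * mic2
    return abs(right_facing), abs(left_facing)
```

With these assumptions the first output has its null at 180 degrees and the second at 0 degrees, and the two patterns are mirror images of each other about the broadside direction.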
The array 400 of
An output of the first DMA element 404 is fed to an input of a first amplifier 407, an output of the second DMA element 405 is fed to an input of a second amplifier 408, and an output of the Nth DMA element 406 is fed to an input of an Nth amplifier 409. Each amplifier may provide frequency dependent amplification. An output of the first amplifier 407 is fed to a first non-inverting input of a summer 410. An output of the second amplifier 408 is fed to a second non-inverting input of the summer 410. An output of the Nth amplifier 409 is fed to an Nth non-inverting input of the summer 410. An output from the summer 410 is regarded as the output of the array 400. The characteristics of the first, second, and Nth DMA elements 404, 405, and 406 may each be adjusted, along with the gains provided by each of the first, second, and Nth amplifiers 407, 408, and 409, to provide any of a plurality of spatial directivity patterns. Provided that certain conditions are met, including an adequate spatial distribution of physical microphones, the outputs of any number of microphones 401, 402, 403, DMA elements 404, 405, 406, and amplifiers 407, 408, 409 can be linearly combined by the summer 410 to form spatial directivity patterns of arbitrary shapes and orientations.
An output of the first amplifier 507 is fed to a first non-inverting input of a summer 510. An output of the second amplifier 508 is fed to a second non-inverting input of the summer 510. An output of the Nth amplifier 509 is fed to an Nth non-inverting input of the summer 510. An output from the summer 510 is regarded as the output of the array 500. The delays provided by the first, second, and Nth delay lines 504, 505, and 506 may each be adjusted, along with the gains provided by each of the first, second, and Nth amplifiers 507, 508, and 509, to provide any of a plurality of spatial directivity patterns. Provided that certain conditions are met, including an at least minimal spatial distribution of physical microphones, the outputs of any number of microphones 501, 502, 503, delay lines 504, 505, 506, and amplifiers 507, 508, 509 can be linearly combined by the summer 510 to form spatial directivity patterns of arbitrary shapes and orientations.
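The way per-microphone delays and gains shape the pattern at the summer output can be sketched with a single-frequency response calculation. The sketch assumes a straight-line array, far-field plane waves, and an angle measured from the array axis. The helper names, the four-microphone geometry, the uniform gains, and the 2 kHz frequency are hypothetical and serve only as an example.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value
POSITIONS = [0.0, 0.04, 0.08, 0.12]  # microphone positions in meters (assumed)
GAINS = [0.25, 0.25, 0.25, 0.25]     # amplifier gains (assumed)


def steering_delays(positions_m, steer_deg):
    """Delays that time-align a plane wave arriving from steer_deg."""
    raw = [x * math.cos(math.radians(steer_deg)) / SPEED_OF_SOUND
           for x in positions_m]
    earliest = min(raw)
    return [tau - earliest for tau in raw]  # shifted so no delay is negative


def array_response(positions_m, delays_s, gains, angle_deg, freq_hz):
    """Complex response of the delayed, weighted, and summed array."""
    omega = 2.0 * math.pi * freq_hz
    direction = math.cos(math.radians(angle_deg))
    total = 0j
    for x, tau, g in zip(positions_m, delays_s, gains):
        arrival = cmath.exp(1j * omega * x * direction / SPEED_OF_SOUND)
        total += g * cmath.exp(-1j * omega * tau) * arrival
    return total
```

For example, `array_response(POSITIONS, steering_delays(POSITIONS, 60.0), GAINS, 60.0, 2000.0)` has unit magnitude because all four delayed signals add in phase, whereas the same delays evaluated at 120 degrees give a much smaller magnitude.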
According to a set of illustrative embodiments disclosed herein, methods are provided for reducing multiple sources of side interference using adaptive microphone arrays. For illustrative purposes, the adaptive microphone array may, but need not, be implemented using any of the systems described previously in connection with
Refer now to
The first, second, and Nth microphones 101, 102, and 103 gather signals from a desired source of sound in a main lobe of the array 100. The main lobe of the array 100 may be defined as the look-up direction of the array 100. In the example of
The first, second, and Nth microphones 101, 102, and 103 are also used to gather side interference from one or more sources of undesired noise that are situated outside of the main lobe of the array. A relative phasing or filtering for each of the first, second, and Nth microphones 101, 102, and 103 is controlled to provide a respective plurality of differential microphone arrays each having a corresponding directional pattern that orients a null towards the desired source of sound and that provides one or more side lobes. For example, a second microphone array phasing or filtering mechanism 105 controls a relative phase and/or magnitude of the first, second, and Nth microphones 101, 102, and 103 to provide a first directional pattern 115 (
A third microphone array phasing or filtering mechanism 106 (
In
The first, second, third, and fourth directional patterns 115, 116, 117, and 118 all have nulls that are oriented at 0 degrees. Note that the location of this null can be “fixed” by design. The null can be oriented in a direction where it is known a priori that the desired talker will be located (for example, on the display side of a mobile device, or in front of a TV set or appliance). Alternatively, the direction in which the desired talker is located can be tracked by either external means (for example, using camera(s) or other sensors on the device to detect when a person moves out of the “null” region), or by other methods such as tracking the voice of a known talker and the direction of arrival or location of that talker with respect to the microphone array 100 (
The array 100 of
Any number of different directional patterns may be provided by means of microphone array phasing or filtering mechanisms including the first, second, and third microphone array phasing or filtering mechanisms 104, 105, and 106 and optional additional microphone array phasing or filtering mechanisms.
The gathered side interference from the second and third microphone array phasing or filtering mechanisms 105 and 106 is processed to generate a reference signal that is subtracted from the signals that are gathered from the main lobe of the array by the first microphone array phasing or filtering mechanism 104 to provide an output signal wherein the side interference is reduced or cancelled 109. In the example of
The first FIR filter 107 generates a first reference signal, which contains the undesired interference source that is subtracted from the output of the first microphone array phasing or filtering mechanism using an inverting input of a first summer 110, and the second FIR filter 108 generates a second reference signal that is subtracted from the output of the first summer using an inverting input of a second summer 111. The output of the second summer 111 represents an output signal with reduced side noise 109. By way of example, the first and second reference signals generated by the first and second FIR filters 107 and 108 may illustratively be used as inputs for a noise estimation procedure to be performed by a noise suppressor.
The first FIR filter 107 and the second FIR filter 108 may each include a delay line that is implemented using a set of memory elements. The first and second FIR filters 107 and 108 are shown for illustrative purposes, as more than two FIR filters may be provided. The first and second FIR filters 107 and 108 each exhibit an “impulse response” in the form of a set of FIR coefficients or weights. For example, if an impulse, such as a single “1” sample followed by many “0” samples, is fed to the input of the first FIR filter 107, the output of the filter will be a set of coefficients or weights where the “1” sample sequentially moves past each coefficient in turn to form the output of the first FIR filter 107. In the example of
Illustratively, the first FIR filter 107 and second FIR filter 108 could, but need not, be implemented using a digital signal processor (DSP) microprocessor that is configured for executing one or more looped instructions. The first and second FIR filters 107 and 108 may be configured for performing multi-rate applications such as decimation (reducing the sampling rate), interpolation (increasing the sampling rate), or both. One or more taps may be provided by the first and second FIR filters 107 and 108, where each tap represents a coefficient/delay pair. The number of FIR taps may be selected in accordance with a desired amount of filtering to be performed by the first FIR filter 107 or the second FIR filter 108. Typically, the first and second FIR filters 107 and 108 do not provide a clearly defined stop-band/pass-band. Rather, the weights of the filter and the resulting filter shape approximate a transfer function describing a spatio-temporal relationship between an interference source and a set of microphones forming the two signals (desired and reference). Thus, increasing the number of taps of the first and/or second FIR filters 107 and 108 generally leads to better tracking of, and consequently better removal of, the unwanted source.
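The relationship between taps, the delay line, and the impulse response can be seen in a minimal direct-form FIR sketch. The function name and the three example coefficients are hypothetical, and the loop is written for clarity rather than as a DSP-optimized implementation.

```python
def fir_filter(coefficients, samples):
    """Direct-form FIR filter: one coefficient/delay pair per tap."""
    delay_line = [0.0] * len(coefficients)  # newest sample first
    output = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]  # shift the new sample in
        output.append(sum(c * d for c, d in zip(coefficients, delay_line)))
    return output


taps = [0.5, -0.25, 0.125]
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
response = fir_filter(taps, impulse)  # [0.5, -0.25, 0.125, 0.0, 0.0]
```

Feeding a single "1" sample followed by zeros reproduces the coefficients in order, which is why the coefficient set is referred to as the impulse response of the filter.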
The aforementioned operational sequence of
As shown in
In the present embodiment of
The WLAN transceiver 205 may, but need not, be configured to conduct Wi-Fi communications in accordance with the IEEE 802.11 (a, b, g, or n) standard with access points. In other embodiments, the WLAN transceiver 205 can instead (or in addition) conduct other types of communications commonly understood as being encompassed within Wi-Fi communications such as some types of peer-to-peer (e.g., Wi-Fi Peer-to-Peer) communications. Further, in other embodiments, the WLAN transceiver 205 can be replaced or supplemented with one or more other wireless transceivers configured for non-cellular wireless communications including, for example, wireless transceivers employing ad hoc communication technologies such as HomeRF (radio frequency), Home Node B (3G femtocell), Bluetooth and/or other wireless communication technologies such as infrared technology. Thus, although in the present embodiment the mobile device 200 has two of the wireless transceivers 203 and 205, the present disclosure is intended to encompass numerous embodiments in which any arbitrary number of (e.g., more than two) wireless transceivers employing any arbitrary number of (e.g., two or more) communication technologies are present.
Exemplary operation of the wireless transceivers 202 in conjunction with others of the internal components of the mobile device 200 can take a variety of forms and can include, for example, operation in which, upon reception of wireless signals, the internal components detect communication signals and one or more of the wireless transceivers 202 demodulate the communication signals to recover incoming information, such as voice and/or data, transmitted by the wireless signals. After receiving the incoming information from one or more of the wireless transceivers 202, the processor 204 formats the incoming information for the one or more output devices 208. Likewise, for transmission of wireless signals, the processor 204 formats outgoing information, which may or may not be activated by the input devices 210, and conveys the outgoing information to one or more of the wireless transceivers 202 for modulation to communication signals. The wireless transceivers 202 convey the modulated signals by way of wireless (and possibly wired as well) communication links to other devices such as a server and possibly one or more content provider websites (as well as possibly to other devices such as a cell tower, access point, or another server or any of a variety of remote devices).
Depending upon the embodiment, the mobile device 200 may be equipped with one or more input devices 210, or one or more output devices 208, or any of various combinations of input devices 210 and output devices 208. The input and output devices 210, 208 can include a variety of visual, audio and/or mechanical inputs and outputs. For example, the output device(s) 208 can include one or more visual output devices 216 such as a liquid crystal display and light emitting diode indicator, one or more audio output devices 218 such as a speaker, alarm and/or buzzer, and/or one or more mechanical output devices 220 such as a vibrating mechanism. The visual output devices 216 can include, among other things, a video screen.
The input devices 210 (
The mobile device 200 may also include one or more of various types of sensors 228. The sensors 228 can include, for example, proximity sensors (a light detecting sensor, an ultrasound transceiver or an infrared transceiver), touch sensors, altitude sensors, a location circuit that can include, for example, a Global Positioning System (GPS) receiver, a triangulation receiver, an accelerometer, a tilt sensor, a gyroscope, or any other information collecting device that can identify a current location or user-device interface (carry mode) of the mobile device 200. Although the sensors 228 are for the purposes of
The memory 206 of the mobile device 200 can encompass one or more memory devices of any of a variety of forms (e.g., read-only memory, random access memory, static random access memory, dynamic random access memory, etc.), and can be used by the processor 204 to store and retrieve data. The memory 206 may comprise a computer-readable memory. In some embodiments, the memory 206 can be integrated with the processor 204 in a single device (e.g., a processing device including memory or processor-in-memory (PIM)), albeit such a single device will still typically have distinct portions/sections that perform the different processing and memory functions and that can be considered separate devices.
The data that is stored by the memory 206 can include, but need not be limited to, operating systems, applications, and informational data, such as a database. Each operating system includes executable code that controls basic functions of the communication device, such as interaction among the various components included among the mobile device 200, communication with external devices via the wireless transceivers 202 and/or the component interface 212, and storage and retrieval of applications and data, to and from the memory 206. In addition, the memory 206 can include one or more applications for execution by the processor 204. Each application can include executable code that utilizes an operating system to provide more specific functionality for the communication devices, such as file system service and the handling of protected and unprotected data stored in the memory 206.
Informational data is non-executable code or information that can be referenced and/or manipulated by an operating system or application for performing functions of the communication device. One such application is a client application which is stored in the memory 206 and configured for performing the methods described herein. The client application is intended to be representative of any of a variety of client applications that can perform the same or similar functions on any of various types of mobile devices, such as mobile phones, tablets, laptops, etc. The client application is a software-based application that operates on the processor 204 and is configured to provide an interface between one or more input devices 210, or one or more output devices 208, or any of various combinations thereof. In addition, the client application governs operation of one or more of the input and output devices 210, 208. Further, the client application may be configured to work in conjunction with a visual interface, such as a display screen, that allows a user of the mobile device 200 to initiate various actions. The client application can take any of numerous forms and, depending on the embodiment, be configured to operate on, and communicate with, various operating systems and devices. It is to be understood that various processes described herein as performed by the mobile device 200 can be performed in accordance with operation of the client application in particular, and/or other application(s), depending on the embodiment.
Further as shown, signals gathered by each of the first microphone 101, the second microphone 102, and the one or more additional microphones represented by the Nth microphone 103 are applied to a differential microphone array (DMA) processor 901. The DMA processor 901 uses differential processing to generate a plurality of differential output signals from the first microphone 101, second microphone 102, and one or more additional microphones represented by the Nth microphone 103. In turn, a first differential output signal is applied by the DMA processor 901 to a first microphone array subsystem 910, a second differential output signal is applied by the DMA processor to a second microphone array subsystem 912, and one or more additional differential output signals are respectively applied to one or more additional microphone array subsystems that are represented by an Mth microphone array subsystem 914. Again, as represented by dots provided between the second microphone array subsystem 912 and the Mth microphone array subsystem 914, the diagram is intended to indicate that any number M of three or more microphone array subsystems can be present. In the example of
Each of the microphone array subsystems 910, 912, and 914 produces a directional output signal that has a maximized amplitude response in a particular direction or set of directions, so as to provide spatial directivity. In the present example, the directivity of the first microphone array subsystem 910 is arranged such that it is most sensitive in a direction corresponding to that of a desired talker or acoustic source (corresponding to 0 degrees on
Illustratively, one or more of the microphone array subsystems additionally are operatively coupled to one or more corresponding FIR filters that are equipped to perform nLMS procedures and to generate an output signal 940. More particularly, in the present embodiment, the second microphone array subsystem 912 is coupled to a first FIR filter 920, which receives both an output signal from the second microphone array subsystem 912 and an nLMS signal from a first nLMS procedure block 930. An output signal from the first FIR filter 920 in turn is subtracted from an output signal provided from the first microphone array subsystem 910, at a first summing junction 942, to output a first difference signal 943.
Additionally it should be appreciated from
For example, in the event that the Mth microphone array subsystem 914 is a fourth microphone array subsystem (that is, in the case where M=4), the FIR filter 921 will be the third FIR filter, the nLMS procedure block 944 will be the third nLMS procedure block, and the summing junction 946 will be the third summing junction. In such case, the third summing junction would operate to subtract the output signal from the third FIR filter from the second difference signal output by a second summing junction (which is represented by, but not expressly shown in,
Additionally in such an example embodiment, an (M−1)th microphone array subsystem would also be present and would be a third microphone array subsystem. That third microphone array subsystem would operate in conjunction with an (M−2)th FIR filter, an (M−2)th nLMS procedure block, and an (M−2)th summing junction that would be a second FIR filter, a second nLMS procedure block, and a second summing junction, respectively. Further with respect to such an example embodiment, the second summing junction would operate to subtract the output signal from the second FIR filter from the first difference signal output by the first summing junction 942 so as to generate a second difference signal that would both be provided to the second nLMS procedure block and also be provided to the third summing junction constituting the (M−1)th summing junction.
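The chain of summing junctions, in which each difference signal both trains its own nLMS block and feeds the next junction, can be sketched as a cascade of adaptive stages. This is an illustrative sketch only: the class and function names are hypothetical, and the demonstration replaces the outputs of the microphone array subsystems with synthetic signals (two independent noise sequences as references and a tone as the talker), which is a simplifying assumption.

```python
import math
import random


class NlmsStage:
    """One FIR filter, nLMS block, and summing junction."""

    def __init__(self, num_taps=8, step=0.05, eps=1e-8):
        self.weights = [0.0] * num_taps
        self.history = [0.0] * num_taps  # reference samples, newest first
        self.step = step
        self.eps = eps

    def process(self, incoming, reference):
        """Subtract the filtered reference from `incoming`, then adapt."""
        self.history = [reference] + self.history[:-1]
        estimate = sum(w * x for w, x in zip(self.weights, self.history))
        difference = incoming - estimate
        power = self.eps + sum(x * x for x in self.history)
        self.weights = [w + self.step * difference * x / power
                        for w, x in zip(self.weights, self.history)]
        return difference


def cascade_cancel(main, references):
    """Pass `main` through one adaptive stage per reference signal."""
    stages = [NlmsStage() for _ in references]
    output = []
    for n, sample in enumerate(main):
        for stage, ref in zip(stages, references):
            sample = stage.process(sample, ref[n])  # feeds the next junction
        output.append(sample)
    return output


def demo(num_samples=4000, seed=1):
    """Synthetic check with interference arriving from two sides."""
    rng = random.Random(seed)
    left = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    right = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    desired = [0.5 * math.sin(0.1 * math.pi * n) for n in range(num_samples)]
    left_path = [0.0] + [0.8 * v for v in left[:-1]]          # 1-sample delay
    right_path = [0.0, 0.0] + [0.6 * v for v in right[:-2]]   # 2-sample delay
    main = [d + a + b for d, a, b in zip(desired, left_path, right_path)]
    return desired, main, cascade_cancel(main, [left, right])
```

In this sketch the output of the last stage plays the role of the final output signal, and adding a further reference simply adds one more stage to the list.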
Multiple alternative embodiments of
Additionally, notwithstanding any of the above discussion concerning
It is specifically intended that the present disclosure not be limited to the particular embodiments and illustrations contained herein. For example, even though
Many adaptive array systems tend to rely on additional supporting mechanisms which may be essential or desirable for purposes of providing optimal operation of the array, but not all of these supporting mechanisms are explicitly illustrated in the drawings. One such mechanism is voice activity detection (VAD). VAD is often useful in the context of adaptive arrays in order to detect a presence of desired talker activity and, in response thereto, to slow down the adaptation process, or even freeze adaptation. This slowing down is performed in order to minimize audible artifacts introduced in the desired talker's speech by the FIR filter(s) continuously changing weights.
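One simple way to slow or freeze adaptation when the desired talker is active is to scale the nLMS step size by a factor controlled by the VAD decision. The sketch below assumes that a boolean VAD flag is already available from some detector that is not shown. The function names, the base step size, and the `slowdown` parameter are hypothetical.

```python
def gated_step(talker_active, base_step=0.1, slowdown=0.0):
    """Return the nLMS step size for the current sample.

    While the talker is active the step is multiplied by `slowdown`:
    0.0 freezes adaptation, values between 0 and 1 slow it down.
    """
    return base_step * slowdown if talker_active else base_step


def nlms_update(weights, history, error, step, eps=1e-8):
    """One normalized LMS weight update for an FIR filter."""
    power = eps + sum(x * x for x in history)
    return [w + step * error * x / power for w, x in zip(weights, history)]
```

With `slowdown=0.0` the weights are returned unchanged whenever the flag is set, so the filter continues to subtract its last interference estimate without altering the talker's speech through changing weights.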
Another supporting mechanism is control of the adaptation rate for the multiple nLMS loops present in these embodiments. The adaptation times for each of the successive loops are subject to optimization for specific geometrical and system requirements. Other details include the size of the FIR filters if an FIR mechanism is used. By way of illustration, an adaptive system may be set up such that a fast convergence rate is achieved in the loop using a dipole reference, with slower convergence rates being provided in successive loops. The successive loops may require longer FIR filters in order to perform better tracking and extraction of the remaining interference signals. These and other constraints are not essential for the idea presented.
Although some of the embodiments disclosed herein describe use of FIR filters in an adaptive system, this is for illustrative purposes only. Adaptation using recursive filters, as well as adaptation in the frequency domain, can also be used. Forming one or more constituent elements for microphone arrays can also be performed using various combinations of time delay (including fractional), time domain FIR or infinite impulse response (IIR) filtering, as well as frequency domain filtering and processing. The choice of one approach versus the other is based on factors such as: available code base, other processing already done in the system, computational costs, current drain costs, development costs and time schedules.
It should be appreciated that one or more embodiments encompassed by the present disclosure are advantageous in one or more respects. Thus, it is specifically intended that the present disclosure not be limited to the embodiments and illustrations contained herein, but include modified forms of those embodiments including portions of the embodiments and combinations of elements of different embodiments as come within the scope of the following claims.