Aspects of the present disclosure relate to detecting and reducing noise that is produced in response to non-acoustic stimuli in an audio signal generated using a microphone array (e.g., an array of microphones spaced apart along a linear axis). Non-acoustic stimuli can include wind striking the microphones in the microphone array from various angles and at various speeds. Another example of a non-acoustic stimulus is someone touching, or otherwise coming into contact with, one or more of the microphones in the microphone array. It is usually desirable for a microphone array to be insensitive to non-acoustic stimuli. In contrast, sensitivity to some but not all acoustic stimuli is generally desirable. For example, speech from a talker is usually a desirable acoustic stimulus, whereas speech from a competing talker is usually not. For an array whose objective is to capture speech from a talker, examples of undesirable acoustic stimuli include, but are not limited to, road and tire noise, fan noise, honking horns, keys jingling, television sounds in the background, and music from a radio.
In a microphone array, signals produced by two or more microphones can be combined to form an output audio signal. For instance, the output audio signal can be generated through beamforming, which may involve introducing a time delay to one or more microphone signals so as to take advantage of the spatial relationship between the microphone capsules. Beamforming can be used, for example, to programmatically design a directional pickup response by exploiting the unique phase information captured by omnidirectional microphones. Beamforming enables the polar pattern of the microphone array's overall response to be shaped in many different ways, including cardioid, hyper-cardioid, figure-8, etc.
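The delay-and-combine principle described above can be illustrated with a short sketch of the analytic far-field response of a two-element array. All values here (10 mm spacing, 343 m/s speed of sound, the internal delay equal to the inter-capsule travel time) are illustrative assumptions, not parameters taken from the disclosure; the internal delay chosen yields a cardioid-like pattern with a null at the rear:

```python
import math

def diff_array_response(theta_deg, freq_hz, spacing_m=0.01, c=343.0):
    """Magnitude response of a two-element delay-and-subtract endfire array.

    The internal electrical delay is set equal to the acoustic travel time
    between the capsules, which places a null at theta = 180 degrees
    (a cardioid-like pattern). Illustrative sketch, not the disclosed design.
    """
    tau = spacing_m / c                           # internal electrical delay
    theta = math.radians(theta_deg)
    t_ext = spacing_m * math.cos(theta) / c       # external propagation delay
    x = 2.0 * math.pi * freq_hz * (tau + t_ext)   # total phase difference
    # |1 - e^{-jx}| for the subtracted pair
    return math.hypot(1.0 - math.cos(x), math.sin(x))

front = diff_array_response(0.0, 1000.0)    # maximum response on-axis
back = diff_array_response(180.0, 1000.0)   # null at the rear
```

Sweeping `theta_deg` from 0 to 360 traces out the polar pattern; changing the internal delay relative to the spacing reshapes the pattern (e.g., toward hyper-cardioid or figure-8).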
Aspects of the present disclosure also relate to calibrating a system with a microphone array in order to compensate for mismatched microphones. In a microphone array, the responses of the individual microphones should ideally be the same in order to permit accurate beamforming. Mismatches due to variations in microphone components, such as the transducers that convert acoustic energy into electrical signals, are typically handled through gain calibration at the time of manufacture. Transducer assemblies are usually referred to as microphone capsules. Capsules generally include a diaphragm that vibrates in response to sound and electrical components that convert the vibration of the diaphragm into an electrical signal. In the present disclosure, the terms “capsule” and “microphone” are sometimes used interchangeably since the behavior of a microphone is dictated by its capsule. Once a capsule has been fully enclosed (e.g., placed into a housing with a grille and a foam windscreen), the response of the capsule includes the acoustic path through the enclosure (e.g., the housing), which is desirable to capture in a measurement. However, at this stage in the production process, conventional gain calibration becomes prohibitive because electrical components (e.g., gain-trimming resistors) cannot be added or removed. Alternatively, for microphone assemblies that include onboard memory and signal processing, it is possible to store the results of measurements in memory so that calibration can be applied digitally. However, this does not address the fact that microphone sensitivities can change over time, and at different rates for different frequencies.
Methods, apparatuses, systems, and computer-readable media are disclosed for improved detection and reduction of noise in microphone signals generated using a microphone array. In particular, techniques are described for determining whether signals from the microphones in the array are due to non-acoustic stimuli (e.g., wind), and for removing or at least substantially reducing the portion of the output of the array that belongs to such non-acoustic stimuli without significantly affecting signals which are correlated. A primary use case for the techniques described herein is the detection and reduction of noise caused by wind buffeting. However, the techniques can be applied to detect and cancel other non-acoustic stimuli.
Various aspects of the present disclosure relate to ways to detect the presence of a non-acoustic stimulus. In some embodiments, the presence of a non-acoustic stimulus is detected by determining a difference between a beamformed signal generated by a beamformer and a reference signal (e.g., an average of signals from two or more microphones). If the comparison indicates that the beamformed signal is significantly larger in magnitude than the reference signal, then it may be concluded that a non-acoustic stimulus is present and, therefore, that the microphone signals are uncorrelated. In some embodiments, the difference between the beamformed signal and the reference signal is compared to a threshold value that, if exceeded by the difference, indicates the presence of a non-acoustic stimulus. Another approach is to directly calculate a matrix of correlation coefficients over a collection of samples from each of the plurality of microphones in the array and compare elements of this matrix to a threshold, below which the presence of non-acoustic stimuli is indicated.
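The first detection scheme above can be sketched in a few lines. This is a minimal illustration, assuming a two-microphone array in which the beamformer is stood in for by a simple sample-wise difference and the reference signal is the sample-wise average; the threshold value of 1.5 is an illustrative choice, not one specified in the disclosure:

```python
import math
import random

def rms(xs):
    """Root mean square of a block of samples."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))

def non_acoustic_detected(mic1, mic2, threshold=1.5):
    """Flag non-acoustic stimuli when the differential (beamformed) signal
    is much larger than the reference (average) signal.

    For correlated inputs the difference is near zero; for uncorrelated
    inputs (e.g., wind buffeting) it is comparable to or larger than the
    average, so the ratio test fires.
    """
    diff = [a - b for a, b in zip(mic1, mic2)]         # stand-in beamformer
    ref = [(a + b) / 2.0 for a, b in zip(mic1, mic2)]  # reference signal
    return rms(diff) > threshold * rms(ref)

random.seed(0)
speech = [math.sin(0.1 * n) for n in range(1000)]      # identical on both mics
correlated = non_acoustic_detected(speech, speech)     # acoustic: not flagged
wind1 = [random.gauss(0.0, 1.0) for _ in range(1000)]  # independent per mic
wind2 = [random.gauss(0.0, 1.0) for _ in range(1000)]
uncorrelated = non_acoustic_detected(wind1, wind2)     # non-acoustic: flagged
```

For equal-power independent noise the difference has roughly twice the RMS of the average, so any threshold comfortably between 1 and 2 separates the two cases in this toy setting.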
Various aspects of the present disclosure relate to reducing sensitivity to non-acoustic stimuli by adjusting the manner in which signals generated by two or more microphones are combined to produce an output audio signal. For instance, the contributions of signals from individual microphones to the output audio signal can be varied depending on whether or not a non-acoustic stimulus is present. In some embodiments, a microphone array is crossfaded from a first mode of operation to a second mode of operation in response to detecting a non-acoustic stimulus. The second mode can, for instance, be inherently less sensitive to non-acoustic stimuli (e.g., a single omnidirectional microphone), or can combine multiple microphone signals from the array in a manner that actively minimizes the magnitude of the response to non-acoustic stimuli. In some embodiments, the output audio signal in the second mode is generated as a sum of a first audio signal and a second audio signal, where the first audio signal corresponds, mainly or entirely, to low frequency components from a microphone signal associated with the least sensitivity to non-acoustic stimuli, and the second audio signal corresponds to high frequency components associated with signals from multiple microphones.
Various aspects of the present disclosure relate to detecting, while a microphone array is in use, a mismatch between the sensitivities of different microphones, and then adjusting the gain of the microphones to correct for the mismatch. The detection and correction of the mismatch can be performed at various points over the lifetime of the microphone array. This permits correction of mismatches that are not present when the microphone array is initially assembled, for example, mismatches due to subsequent aging of microphone components or physical blockage of sound hole inlets. Correction of sensitivity mismatches can improve beamforming by maintaining the directivity of the microphone array substantially constant throughout the lifetime of the microphone array. Correction of sensitivity mismatches can also improve the accuracy of the detection of noise corresponding to non-acoustic stimuli by ensuring that all microphones in the array have substantially the same level of sensitivity.
In certain embodiments, techniques for measuring the degree of mismatch between two or more microphones are applied to determine, based on the degree of mismatch, the extent to which the gain for a particular microphone should be adjusted, e.g., by increasing or decreasing the amount of amplification applied to a signal from the particular microphone. In one embodiment, sensitivity matching is performed by comparing an individual microphone capsule's magnitude response, from a long term exposure to a sound field, to the magnitude response from the long term exposure to the sound field for the average, e.g., of all microphone signals in the microphone array. In some embodiments, correction is performed for specific frequencies or frequency bands.
In certain embodiments, a method involves generating a first amplified microphone signal by amplifying a first microphone signal. The first microphone signal represents a response of a first microphone in a microphone array to a sound field. The sound field is produced by an acoustic stimulus and a non-acoustic stimulus. The method further involves generating a second amplified microphone signal by amplifying a second microphone signal, the second microphone signal representing a response of a second microphone in the microphone array to the sound field. The method further involves: calculating a first magnitude representing a running average of acoustic energy that the sound field exposes the first microphone to, calculating a second magnitude representing a running average of acoustic energy that the sound field exposes the second microphone to, and determining that the first microphone and the second microphone have mismatched sensitivities based on a difference between the first magnitude and the second magnitude. The method further involves adjusting, in response to the determining that the first microphone and the second microphone have mismatched sensitivities, an amount of amplification used to generate the first amplified microphone signal such that a difference between a sensitivity of the first microphone and a sensitivity of the second microphone is reduced. A rate at which the amount of amplification used to generate the first amplified microphone signal can change is limited.
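The method above can be sketched as a small block-processing loop. This is a simplified illustration under stated assumptions: the running average of acoustic energy is approximated by an exponential moving average of the rectified signal, and the rate limit is a per-block clamp on the gain step; the coefficient and step values are illustrative, not from the disclosure:

```python
import math

class GainMatcher:
    """Sketch of slow sensitivity matching between two microphones.

    Tracks a long-term running average of each channel's magnitude and
    nudges the first channel's gain toward matching the second, with each
    update clamped to max_step so the gain cannot change faster than the
    rate limit described in the method.
    """

    def __init__(self, avg_coeff=0.01, max_step=0.01):
        self.avg_coeff = avg_coeff   # smoothing for the running averages
        self.max_step = max_step     # rate limit on gain changes per block
        self.mag1 = 0.0              # running average, amplified mic 1
        self.mag2 = 0.0              # running average, mic 2
        self.gain1 = 1.0             # amplification applied to mic 1

    def process_block(self, block1, block2):
        for s1, s2 in zip(block1, block2):
            a1 = abs(self.gain1 * s1)            # amplified mic-1 magnitude
            a2 = abs(s2)
            self.mag1 += self.avg_coeff * (a1 - self.mag1)
            self.mag2 += self.avg_coeff * (a2 - self.mag2)
        if self.mag1 > 1e-12:
            # gain that would equalize the two running averages
            desired = self.gain1 * self.mag2 / self.mag1
            # clamp the change: mismatch is corrected slowly, never abruptly
            step = max(-self.max_step, min(self.max_step, desired - self.gain1))
            self.gain1 += step
        return self.gain1

# usage sketch: mic 1 is 6 dB (2x) less sensitive than mic 2
matcher = GainMatcher()
block1 = [0.5 * math.sin(0.3 * n) for n in range(64)]
block2 = [math.sin(0.3 * n) for n in range(64)]
for _ in range(500):
    gain = matcher.process_block(block1, block2)
```

After many blocks the rate-limited gain converges toward 2.0, compensating the 6 dB sensitivity deficit without any abrupt jump in level.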
In certain embodiments, a system includes a microphone array, a first amplifier, a second amplifier, and a mismatch detection subsystem. The microphone array includes a first microphone and a second microphone. The first amplifier is configured to generate a first amplified microphone signal by amplifying a first microphone signal representing a response of the first microphone to a sound field, the sound field being produced by an acoustic stimulus and a non-acoustic stimulus. The second amplifier is configured to generate a second amplified microphone signal by amplifying a second microphone signal representing a response of the second microphone to the sound field. The mismatch detection subsystem is configured to: calculate a first magnitude representing a running average of acoustic energy that the sound field exposes the first microphone to; calculate a second magnitude representing a running average of acoustic energy that the sound field exposes the second microphone to; and determine that the first microphone and the second microphone have mismatched sensitivities based on a difference between the first magnitude and the second magnitude. The mismatch detection subsystem is further configured to adjust, in response to determining that the first microphone and the second microphone have mismatched sensitivities, an amount of amplification used by the first amplifier to generate the first amplified microphone signal such that a difference between a sensitivity of the first microphone and a sensitivity of the second microphone is reduced. Additionally, the mismatch detection subsystem is configured to limit a rate at which the amount of amplification used to generate the first amplified microphone signal can change.
In certain embodiments, a computer-readable storage medium contains instructions that, when executed by one or more processors of a computer, cause the one or more processors to generate a first amplified microphone signal by amplifying a first microphone signal. The first microphone signal represents a response of a first microphone in a microphone array to a sound field. The sound field is produced by an acoustic stimulus and a non-acoustic stimulus. The instructions further cause the one or more processors to generate a second amplified microphone signal by amplifying a second microphone signal, the second microphone signal representing a response of a second microphone in the microphone array to the sound field. The instructions further cause the one or more processors to: calculate a first magnitude representing a running average of acoustic energy that the sound field exposes the first microphone to; calculate a second magnitude representing a running average of acoustic energy that the sound field exposes the second microphone to; and determine that the first microphone and the second microphone have mismatched sensitivities based on a difference between the first magnitude and the second magnitude. The instructions further cause the one or more processors to adjust, in response to determining that the first microphone and the second microphone have mismatched sensitivities, an amount of amplification used to generate the first amplified microphone signal such that a difference between a sensitivity of the first microphone and a sensitivity of the second microphone is reduced. A rate at which the amount of amplification used to generate the first amplified microphone signal can change is limited.
Several illustrative embodiments will now be described with respect to the accompanying drawings, which form a part hereof. While particular embodiments, in which one or more aspects of the disclosure may be implemented, are described below, other embodiments may be used and various modifications may be made without departing from the scope of the disclosure or the spirit of the appended claims.
Embodiments are described with respect to omnidirectional microphones, but are equally applicable to directional microphones. Further, the embodiments are not limited to any particular type of microphone. For instance, the embodiments can be applied to MEMS (Micro-Electro-Mechanical Systems) based microphones, capacitor/condenser microphones, piezoelectric microphones, and ribbon microphones.
In a microphone array, sound which is to be captured (e.g., a user's voice) causes the microphones to produce signals that are correlated with each other since each microphone captures the same sound and responds to the sound in substantially the same manner. This assumes that the microphones are matched, e.g., they have the same frequency response and sensitivity. This also assumes the microphones are spaced close to each other. A large spacing between microphones in the array reduces the similarity of what they experience at frequencies whose wavelength is comparable to or shorter than the spacing. If the microphones are matched, then signals produced by the microphones in response to a sound source that is equidistant from and facing the same direction toward each of the microphones will be substantially identical in the time domain.
Non-acoustic stimuli can introduce noise into the output of a beamformer. A major source of such noise is wind buffeting, which almost invariably presents itself at different microphones in different ways. Wind impinging on a microphone in a microphone array will almost never impinge upon another microphone in the same array with the same intensity at the same time. This reduces the degree of correlation between signals produced by the microphones in response to such non-acoustic stimuli. The output audio signal produced by combining the microphone signals will therefore include a mixture of components corresponding to non-acoustic stimuli (e.g., wind gusts) and acoustic stimuli (e.g., speech, ambient acoustic noise). The effects of such noise are exacerbated by the fact that some beamformer topologies include a post-filter or stage that amplifies uncorrelated signals. There are other non-acoustic stimuli which can cause uncorrelated signals and which are often encountered during use of a microphone array. For instance, noise may be introduced as a result of a user scratching on a microphone cover or handling the assembly in which the microphone array is housed.
Microphone array 110 comprises a plurality of microphones arranged in a specific physical configuration. For instance, the microphone array 110 may include two or more omnidirectional microphones arranged sequentially along a linear axis, with a certain distance between each pair of adjacent microphones, in what is known as an endfire configuration. In an endfire configuration, if a sound source is closer to one end of the microphone array, sound from the source will be captured by each microphone at different times, with the microphone that is closest to the source being the first microphone to capture the sound. However, if the source is equidistant from the microphones (e.g., facing broadside), then the sound from the source will be captured simultaneously by each microphone in the array.
Output signal generator 120 is configured to generate an output audio signal by combining signals from two or more microphones in the microphone array 110. The output audio signal generated by the output signal generator 120 can be output over a loudspeaker (e.g., over an in-vehicle speaker), stored for subsequent use (e.g., as an audio recording for later playback) or subjected to downstream processing.
In certain embodiments, the output signal generator 120 includes a beamformer configured to control the response of the microphone array 110 through beamforming. For instance, the beamformer may introduce a time delay into one or more microphone signals so that the microphone signals have a certain phase relationship when the microphone signals are combined (e.g., summed together or subtracted from each other) to form the output audio signal. The beamforming creates nulls in certain directions, resulting in a desired polar pattern for the microphone array 110. In some embodiments, the beamformer is a differential beamformer that generates the output audio signal based on a difference between two or more microphone signals.
As indicated above, a post-filter in a beamformer can amplify signals that are produced in response to non-acoustic stimuli. For instance, post-filters for differential beamformers may apply gain that increases in inverse proportion to frequency. Such amplification is performed to compensate for the fact that the raw differential output diminishes toward lower frequencies, where signals from closely-spaced microphones are nearly identical. In general, for a microphone array using differential beamforming, the beamforming post-filter adds a significant boost at low frequencies due to the expectation that acoustic signals are highly correlated between two closely-spaced microphones. Since the difference between two closely-spaced microphones' signals is very close to zero at the lowest frequencies, it makes sense to use this boost to restore the on-axis response to acoustic stimuli. However, non-acoustic stimuli (e.g., wind, physical handling) produce signals in these closely-spaced microphones whose difference is considerably greater than zero. Further, since differential beamforming operates on the gradient between microphone signals, microphone signals that are uncorrelated with each other yield a large gradient value when the difference between them is calculated. For instance, during a wind event, the magnitude of a beamformed signal output by a differential beamformer can be greater than ten times that of any individual microphone signal used to generate the beamformed signal.
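The 1/f character of the post-filter boost can be checked with a one-line calculation. This sketch assumes an on-axis source and a 10 mm spacing (illustrative values): the raw difference magnitude is |1 − e^{−jωT}| = 2·sin(ωT/2) ≈ ωT for small ωT, so it falls by roughly a factor of ten per decade toward low frequencies, and a post-filter gain proportional to 1/f restores a flat response:

```python
import math

def differential_magnitude(freq_hz, spacing_m=0.01, c=343.0):
    """On-axis magnitude of the raw (unequalized) difference between two
    closely spaced omnis: 2*sin(omega*T/2), where T is the inter-capsule
    travel time. Nearly proportional to frequency when omega*T << 1."""
    T = spacing_m / c
    omega = 2.0 * math.pi * freq_hz
    return 2.0 * math.sin(omega * T / 2.0)

lo = differential_magnitude(100.0)    # very small: needs ~x10 more boost
hi = differential_magnitude(1000.0)   # ten times larger at ten times the frequency
```

This also makes concrete why wind is so damaging: the boost designed for a near-zero acoustic difference is applied to a wind-induced difference that is not small at all.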
The output signal generator 120 may further be configured to adjust the output audio signal in response to the noise detection subsystem 130 detecting wind or other non-acoustic stimuli. For instance, as discussed below, in certain embodiments, the output audio signal is generated by switching between the output of a beamformer when a non-acoustic stimulus has not been detected and the output of a noise reduction circuit when a non-acoustic stimulus has been detected.
Noise detection subsystem 130 is configured to detect the presence of non-acoustic stimuli which, as discussed above, result in uncorrelated signals produced by the microphone array 110. In particular, the noise detection subsystem 130 may be configured to determine whether the signal from a particular microphone is sufficiently uncorrelated with the signal from another microphone in the microphone array 110. The noise detection subsystem 130 is further configured to control the output signal generator 120 such that the amount of noise in the output audio signal due to non-acoustic stimuli is reduced. For instance, the noise detection subsystem 130 may generate a control signal that causes the output signal generator 120 to perform the above-mentioned switching between the output of the beamformer and the output of the noise reduction circuit.
The noise detection subsystem 130 may, in response to detecting wind or other non-acoustic stimuli, change the contributions of the one or more microphone signals to the output audio signal produced by the output signal generator 120. In some embodiments, the noise detection subsystem 130 switches the output signal generator 120 from a first operating mode to a second operating mode. For instance, the noise detection subsystem 130 may configure the output signal generator 120 to operate in a directional mode when a non-acoustic stimulus is not detected, and then switch the output signal generator 120 to a second mode, which is less directional but significantly less sensitive to non-acoustic stimuli, when the noise detection subsystem 130 positively detects such stimuli. The directional mode can be a mode in which microphone signals from multiple microphones in the microphone array 110 are used to form the output audio signal in accordance with a directional response. The second mode can be a mode in which the output audio signal corresponds to a response of a single omnidirectional microphone. The second mode can alternatively be a mode in which microphone signals from multiple microphones in the microphone array 110 are used to form an output signal which is significantly less sensitive to non-acoustic stimuli and, while less directional than the directional mode, still has some directional characteristics.
Reconfiguration of the output signal generator 120 in response to detection of non-acoustic stimuli does not necessarily involve switching between discrete operating modes. For instance, as explained below in connection with the embodiment of
Mismatch detection subsystem 140 is configured to detect mismatches between the sensitivities of microphones in the microphone array 110 and to adjust the amount of gain applied to one or more microphones so that the sensitivities of all microphones in the microphone array 110 are approximately the same. As described below, in certain embodiments, mismatch detection is implemented by generating an RMS (root mean square) signal for one or more microphones and then comparing each RMS signal to a reference RMS signal from a reference microphone in order to adjust the gain of the one or more microphones based on a result of the comparison. Alternatively, in some embodiments, the reference RMS signal corresponds to the RMS of an average of the signals of all the microphones in the array. Using the average has certain benefits over using a single reference, including better matching performance if there is a problem with the reference microphone (e.g., the reference microphone is plugged, broken, or compromised due to aging). Additionally, since the sensitivity of all microphone capsules is usually specified with a tolerance (e.g., 300 mV/Pa +/− 3 dB), and this tolerance follows a normal (Gaussian) distribution, using the average of multiple capsules as the reference signal serves to decrease the overall sensitivity tolerance.
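The tolerance-narrowing benefit of averaging can be demonstrated with a short Monte-Carlo sketch. This is illustrative only: it treats the per-capsule sensitivity deviation in dB as Gaussian (a simplification of averaging the underlying linear signals) and uses a hypothetical 3 dB tolerance; averaging n capsules shrinks the spread of the reference by roughly a factor of sqrt(n):

```python
import random
import statistics

def reference_spread(n_mics, n_trials=20000, sigma_db=3.0):
    """Standard deviation (in dB) of a reference formed by averaging the
    sensitivity deviations of n_mics capsules, each drawn from a Gaussian
    tolerance of sigma_db. Illustrative model, not from the disclosure."""
    random.seed(42)
    samples = []
    for _ in range(n_trials):
        sens = [random.gauss(0.0, sigma_db) for _ in range(n_mics)]
        samples.append(sum(sens) / n_mics)   # averaged reference deviation
    return statistics.stdev(samples)

single = reference_spread(1)   # spread of a single reference capsule: ~3 dB
four = reference_spread(4)     # spread of a four-capsule average: ~1.5 dB
```

So a four-microphone average halves the effective tolerance of the reference, which is why a compromised single reference capsule is far more damaging than one compromised capsule within an averaged reference.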
The mismatch detection subsystem 140 can perform a gain adjustment by, for example, varying an input to an amplifier (e.g., an operational amplifier (op-amp)) that amplifies a particular microphone signal. The amplified microphone signal can be used in place of the original microphone signal during mismatch detection. For instance, the amplified microphone signal can be used for generating one of the RMS signals described above and for input to the beamformer of the output signal generator 120. In some embodiments, the gain is adjusted in proportion to the difference between the inputs of a comparator that compares two RMS signals to each other, e.g., an RMS signal from the microphone to be adjusted and a reference RMS signal. In such embodiments, the output of the comparator may form a control signal for triggering the gain adjustment.
Microphones 210A, 210B can be, but are not necessarily, omnidirectional. Each of the microphones 210A, 210B comprises a capsule configured to produce a corresponding microphone signal in response to sound impinging on the capsule. The microphones 210A, 210B can be placed within a shared housing, e.g., inside the body of a smart speaker or other portable electronic device. Alternatively, each of the microphones 210A, 210B can be placed in a separate housing. In some embodiments, the microphones 210A, 210B are external microphones that can be repositioned to a desired location such as around a table in a conference room. The microphones 210A, 210B can also be permanently installed in an operating environment, e.g., mounted on a panel in a vehicle cabin. In another example, if external microphones are positioned in a conference room, adaptive signal processing may be used to estimate an arrival location for each talker and preserve signals from “directions of interest” corresponding to the estimated arrival locations.
Differential beamformer 220 is configured to output a beamformed signal to the RMS unit 232. The beamformed signal is generated based on a combination of the microphone signal produced by microphone 210A and the microphone signal produced by microphone 210B. The beamformer 220 is differential in that the output of the beamformer 220 is based on a difference between the signals of the microphones 210A and 210B. Although
As indicated above, a beamformer can combine microphone signals to produce an overall response for a microphone array according to a desired polar pattern. Thus, the beamformer 220 may perform null steering by, for example, delaying the microphone signal received from microphone 210A relative to the microphone signal received from microphone 210B. For instance, beamformer 220 may include a delay stage that delays the signal from microphone 210A, followed by a summing stage that sums the delayed signal with the signal from microphone 210B. The delay stage may cause the signal from the microphone 210A to be out of phase with the signal from the microphone 210B such that summing these signals is equivalent to a subtraction operation. The summing stage may also perform mathematical integration. For instance, the delayed signal from microphone 210A and the signal from microphone 210B can be provided as inputs to an op-amp configured as a summing integrator, thus also performing the function of a post-filter.
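The delay-and-subtract structure described above can be sketched directly in the time domain. This is a minimal illustration assuming an integer-sample delay and omitting the integration/post-filter stage; the one-sample delay is a hypothetical value standing in for the delay stage of the beamformer 220:

```python
import math

def differential_beamform(mic_a, mic_b, delay_samples=1):
    """Time-domain sketch of the delay-and-subtract structure: mic_a is
    delayed by an integer number of samples and subtracted from mic_b
    (equivalent to summation with inverted phase). Integration /
    post-filtering is omitted for clarity."""
    out = []
    for n in range(len(mic_b)):
        delayed_a = mic_a[n - delay_samples] if n >= delay_samples else 0.0
        out.append(mic_b[n] - delayed_a)
    return out

# usage sketch: for a rear-axis source, the external one-sample propagation
# delay matches the internal delay, so the output is nulled
sig = [math.sin(0.2 * n) for n in range(50)]
rear_at_b = [0.0] + sig[:-1]           # same sound, one sample later at mic_b
nulled = differential_beamform(sig, rear_at_b, delay_samples=1)
```

When the internal delay matches the acoustic travel time for a given direction, arrivals from that direction cancel, which is the null-steering behavior described above.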
RMS unit 230 is configured to generate an RMS value based on the signals from the microphones 210A and 210B. In particular, the RMS unit 230 can calculate the RMS value for the average of signals of the microphones 210A and 210B (and any additional microphones in the microphone array) to generate an RMS of the average signal.
RMS unit 232 is, similar to the RMS unit 230, configured to generate an RMS signal. Unlike the RMS unit 230, the RMS unit 232 operates on a single input, which is the output of the beamformer 220. Therefore, the RMS signal generated by the RMS unit 232 represents the RMS of the beamformed signal.
Comparator 240 is configured to compare the RMS signal generated by the RMS unit 230 to the RMS signal generated by the RMS unit 232 to generate, based on a result of the comparison, a detection signal 242. The detection signal 242 indicates whether wind or other non-acoustic stimulus is present. If the magnitude of the detection signal 242 exceeds a certain threshold, then this would indicate that there is a significant difference between the RMS value for the average of each microphone in the entire microphone array and the RMS value of the beamformed signal. In particular, in the presence of non-acoustic stimuli, it can be expected that the output of the beamformer 220 will be significantly greater than the average microphone signal or, alternatively, significantly greater than the output of an individual omnidirectional microphone.
In an alternative embodiment, one of the microphones 210A and 210B may be designated as a reference microphone and the RMS unit 230 generates its RMS signal using only the signal from the reference microphone instead of the average of all the microphones in the array. Which of the microphones in the array is used as the reference microphone can be fixed.
The use of the RMS units 230 and 232 to generate the inputs of the comparator 240 is advantageous because the RMS is insensitive to phase mismatch between microphones (e.g., due to differences in time of arrival). This insensitivity can be ensured by designing the RMS units 230 and 232 as magnitude detectors with appropriate time constants governing the rise and fall times of their output signals. The RMS units 230 and 232 therefore operate to smooth the average level of their respective inputs. Computing RMS produces a non-zero time-weighted average. In contrast, the time-weighted average of a low-pass filtered audio waveform is approximately zero, since the waveform is equally likely to be positive or negative at any instant. Therefore, using the RMS units 230 and 232 improves detection accuracy relative to an alternative detection method in which a low-pass filter is applied to the beamformed signal and the output of the low-pass filter is compared to a threshold.
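A magnitude detector with separate rise and fall time constants can be sketched as follows. The attack/release coefficients are illustrative assumptions; the comparison against a plain one-pole low-pass of the raw waveform shows why the rectified, smoothed level is the more useful detection input:

```python
import math

class MagnitudeDetector:
    """Sketch of an RMS-style magnitude detector: the input is rectified
    (discarding phase) and smoothed with a fast rise (attack) and slow
    fall (release) time constant, producing a non-zero level signal.
    Coefficient values are illustrative."""

    def __init__(self, attack=0.2, release=0.01):
        self.attack = attack
        self.release = release
        self.level = 0.0

    def process(self, sample):
        mag = abs(sample)   # rectify: phase information is discarded
        coeff = self.attack if mag > self.level else self.release
        self.level += coeff * (mag - self.level)
        return self.level

# usage sketch: a steady sine produces a steady non-zero level, whereas a
# plain low-pass of the raw waveform averages toward zero
det = MagnitudeDetector()
lowpass = 0.0
for n in range(2000):
    x = math.sin(0.3 * n)
    level = det.process(x)            # settles near the signal's magnitude
    lowpass += 0.01 * (x - lowpass)   # raw low-pass: hovers near zero
```

The detector output tracks the envelope of its input regardless of phase, which is what makes the comparison between the two RMS units robust to inter-microphone time-of-arrival differences.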
Averaging unit 310 is configured to generate an average signal corresponding to the average of the microphone signals from all the microphones in the microphone array.
Rectifier 330 operates on the microphone signal from the microphone 210A. Rectifier 332 operates on the microphone signal from the microphone 210B. A separate rectifier can be provided for each microphone in the microphone array. The rectifiers 330, 332 are configured to convert their respective microphone signals into signals having a single polarity (e.g., by inverting negative signal values), representing the instantaneous magnitude of their respective microphone signals. The rectified microphone signals are input to the comparator 340.
Comparator 340 is configured to compare the rectified microphone signals to generate a control signal, provided as an input to the cross fader/switch 350, indicating which of the rectified microphone signals has the lower instantaneous magnitude. In implementations featuring three or more microphones, the comparator 340 can provide for comparison of rectified signals from such additional microphones, so that the output of the comparator 340 indicates which microphone among the three or more microphones has the lowest instantaneous magnitude. Comparator 340 can therefore include multiple comparison stages, e.g., a first stage comparing signals from a first pair of microphones, a second stage comparing signals from a second pair of microphones, and a third stage comparing the result of the first stage to the result of the second stage. Alternatively, other embodiments can utilize a sorting algorithm inside the comparator to identify the minimum instantaneous magnitude and provide an index associating the minimum with the corresponding microphone signal.
Cross fader/switch 350 is configured to generate, using the microphone signals produced by the microphones 210A and 210B (and any additional microphones in the microphone array), a signal for input to the LPF 362. The output of the cross fader/switch 350 can be a signal corresponding to one of the microphone signals, e.g., switching entirely to the signal from microphone 210B when the output of the comparator 340 indicates that the signal from microphone 210B has the lowest instantaneous magnitude.
If implemented as a cross fader, the output of the cross fader/switch 350 corresponds to a blend of signals from different microphones. The degree to which an individual microphone signal contributes to the output of the cross fader can be controlled based on the output of the comparator 340. For instance, when the output of the comparator 340 indicates that the signal from microphone 210B has the lowest instantaneous magnitude, the signal from microphone 210B can be faded in to its maximum allowable level (e.g., gain of one), while simultaneously the signal from microphone 210A is faded out to its minimum allowable level (e.g., gain of zero). The fade-in and fade-out apply gain with the same rate of change. If the rate of change of gain is too slow, the response to the non-acoustic stimuli will not be effectively reduced. However, the time rate of change of the gain should not be so fast that it distorts the response to the acoustic stimuli of interest.
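The matched fade-in/fade-out behavior can be sketched as follows (an illustrative Python sketch; the function name and the linear-ramp model are assumptions, not the disclosed implementation):

```python
def crossfade_step(gain_in, gain_out, rate, dt):
    """One update of a linear crossfade: the favored signal fades in and
    the other fades out at the same rate (gain units per second), with
    both gains clamped to [0, 1]. The rate is a tunable constant: too
    slow and non-acoustic bursts leak through; too fast and desirable
    acoustic content is distorted."""
    gain_in = min(1.0, gain_in + rate * dt)
    gain_out = max(0.0, gain_out - rate * dt)
    return gain_in, gain_out
```

Repeatedly calling this per audio block moves the pair of gains toward (1, 0) while keeping the two ramps symmetric, so the combined level stays steady for acoustic stimuli that both microphones capture similarly.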
LPF 362 is configured to filter out high frequency components of the signal generated by the cross fader/switch 350. The output of the LPF 362 therefore corresponds to the low frequency components of a signal that is less sensitive to non-acoustic stimuli. As discussed above, highly directional beamformers may increase the sensitivity to non-acoustic stimuli, especially at low frequencies. It is therefore desirable for the low frequency portion of an audio output signal to be generated from microphone signals which are processed to be less sensitive to non-acoustic stimuli, but equally sensitive to acoustic stimuli from a direction of interest. The combination of cross fader/switch 350 and LPF 362 enables such a low frequency portion to be generated.
HPF 360 is configured to filter out low frequency components of the average signal generated by the averaging unit 310. The output of the HPF 360 is provided, together with the output of the LPF 362, to the summation unit 370. Since it is unlikely that wind, or other non-acoustic stimuli, will create equal disturbances on all microphones in the array 110 at the same time, the averaging performed by the averaging unit 310 will generate an output signal that is less sensitive to non-acoustic stimuli than any of the microphone signals on its own. Averaging is not as effective at lowering this sensitivity as the crossfader operation is; however, the crossfader operation adds noise and distortion in the higher frequencies. Therefore, in some embodiments, the lower frequencies from the cross fader/switch 350 are kept, by using LPF 362, and the higher frequencies of the averaging unit 310 output are kept, by using HPF 360.
Summation unit 370 is configured to generate a noise-reduced signal 372 by adding together the outputs of the HPF 360 and the LPF 362. The noise-reduced signal 372 therefore corresponds to a signal whose low frequency components are derived from one or more microphone signals that are maximally reduced in sensitivity to non-acoustic stimuli while remaining undistorted for acoustic stimuli. The high frequency components, which are derived from the average of all the microphone signals, are also reduced in sensitivity to non-acoustic stimuli, remain undistorted for acoustic stimuli from a direction of interest, and incur no additional noise or distortion in achieving the lower sensitivity. Averaging N microphones reduces the sensitivity to non-acoustic stimuli by 10*log10(N) decibels. Accordingly, the output from averaging two microphones during a wind buffeting event will typically be 3 dB lower than either single microphone's output (for a long term exposure).
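The 10*log10(N) figure can be illustrated with a short simulation (a hedged sketch: wind buffeting is modeled here as independent Gaussian noise per microphone, which is a simplifying assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
n_mics, n_samples = 2, 200_000
# Assumption: wind buffeting responses are uncorrelated across mics.
wind = rng.standard_normal((n_mics, n_samples))

def rms(x):
    return np.sqrt(np.mean(x ** 2))

avg = wind.mean(axis=0)
# Long-term reduction should approach 10*log10(N) = ~3.01 dB for N = 2.
reduction_db = 20 * np.log10(rms(wind[0]) / rms(avg))
```

Running this yields a reduction close to 3 dB for two microphones, consistent with the long-term exposure figure stated above.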
The noise-reduced signal 372 can be used as an output audio signal in place of the output of a beamformer (e.g., instead of the output of the beamformer 220 in
The system 300 operates to generate the noise-reduced signal with the lowest sensitivity to non-acoustic stimuli while preserving the sensitivity to acoustic stimuli from a direction of interest, when wind buffeting or another non-acoustic stimulus is present on one or more microphones. Since the microphones are spatially diverse and are nearly guaranteed to respond dissimilarly to a non-acoustic stimulus at any particular moment in time, one of the microphone signals, in the presence of wind, will nearly always have a lower instantaneous magnitude than the other microphone signal(s). In contrast, all the microphones are expected to respond quite similarly to acoustic stimuli. By comparing rectified microphone signals, the system 300 can identify which signal has the lowest instantaneous magnitude. The system 300 switches or cross fades between the microphone signals to favor the microphone signal with the lowest instantaneous magnitude (e.g., at any particular time interval). The microphone signals corresponding to the response to acoustic stimuli such as voice are retained in the output of the HPF 360, without processing artifacts such as noise and distortion, and will therefore pass through unaffected by the switching or cross fading. The microphone signals corresponding to the response to acoustic stimuli are also retained in the output of the LPF 362; however, there may be noise artifacts generated from the crossfading/switching operation which, to some degree, pass through the LPF 362. Thus, a tradeoff for maximally reducing sensitivity to non-acoustic stimuli is a noise artifact generated in the crossfader/switch operation. In some embodiments, the corner frequencies of the LPF 362 and HPF 360 are chosen to balance this tradeoff.
In the embodiment of
Switching or cross fading quickly between two signals (e.g., average or single microphone) that contribute to an output audio signal will generate two forms of higher frequency information (new noise). First, the switching or cross fading may sometimes result in a steep change in voltage over a small change in time (large dV/dt), generating noise with a wide bandwidth. Second, the switching mechanism itself (if implemented in analog circuitry) can potentially generate sharp transients from the transfer of stored energy on either side of the switch mechanism. These transients can be filtered out in a number of different ways. For instance, in some embodiments, switching noise introduced into the output audio signal 550 as a result of switching performed by the cross fader/switch 540 is reduced by low-pass filtering the output audio signal 550 through one or more low-pass filter stages (not depicted). Alternatively, switching noise can be reduced by configuring the cross fader/switch 540 with a limit on its maximum slew rate, and/or a time constant for the crossfade function governing the fade-in and simultaneous fade-out times.
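A slew-rate limit of the kind mentioned above can be sketched as follows (an illustrative Python sketch; the function name and per-sample step model are assumptions):

```python
def slew_limited(target, current, max_step):
    """Limit the per-sample change (dV/dt) of a switched signal so the
    transition itself does not inject wide-bandwidth noise. max_step is
    the largest allowed change per sample; the output moves toward
    target by at most that amount each call."""
    delta = target - current
    if delta > max_step:
        delta = max_step
    elif delta < -max_step:
        delta = -max_step
    return current + delta
```

Calling this once per sample turns an abrupt step between two signal levels into a bounded ramp, which caps dV/dt and therefore the bandwidth of the switching transient.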
Delay stage 630 includes an op-amp 632 configured to apply a time delay and phase inversion to the amplified microphone signal 612A. Summation and post-filter stage 640 is configured to sum the output of the delay stage 630 with the amplified microphone signal 612B via a common node. The summed result is then filtered and amplified by an op-amp 642 to produce a differential beamformer output signal 650. Signal 650 is now at the proper magnitude level to drive downstream connected equipment, such as telecommunication terminals, and/or voice recognition systems.
Comparator 720 is an op-amp based circuit analogous to the comparator 340. The comparator 720 compares the rectified signal 712A to the rectified signal 712B to control a bipolar junction transistor 722 based on the voltage difference between the rectified signals 712A, 712B. The emitter of the bipolar junction transistor 722 forms a control signal for controlling the operation of a cross fader 730.
Cross fader 730 is an op-amp based circuit analogous to the cross fader/switch 350. The cross fader 730 adjusts the contributions of the amplified microphone signals 612A and 612B based on the control signal produced at the bipolar junction transistor 722. The control signal influences the composition of the mixture of 612A and 612B which is mixed by op-amp 734. The op-amp 734 generates the output of the cross fader 730. The output of op-amp 734 is equal to the inverse polarity of signal 612A plus the inverse of the output of op-amp 732, which is signal 612B minus 612A. When the control signal from the transistor 722 is fully on, the output of op-amp 732 is pulled to ground. Therefore, when 712B is greater than 712A, the output of op-amp 734 is equal to the inverse polarity (negative) 612A. When 712B is less than 712A, the output of op-amp 734 is equal to the sum of negative 612A plus positive 612A plus negative 612B, which is equal to negative 612B.
HPF 820 is analogous to the HPF 360 and includes one or more high-pass filtering stages. In the embodiment depicted in
LPF 830 is analogous to the LPF 362 and includes one or more low-pass filtering stages configured according to a topology that is a counterpart to the topology of the HPF 820. In the embodiment depicted in
Summation unit 840 is an op-amp based circuit analogous to the summation unit 370. The summation unit 840 is configured to sum the outputs of the HPF 820 and the LPF 830 to generate a noise-reduced signal 842 (OMNI-OUT) that corresponds to the noise-reduced signal 372.
RMS unit 910 is an op-amp based circuit analogous to the RMS unit 232 in
Comparator 920 is an op-amp based circuit analogous to the comparator 240. The comparator 920 is configured to compare the RMS signal 912 to an RMS signal 922 to generate a control signal for the cross fader 930. The RMS signal 922 is an average RMS of all microphone signals and can be generated using the circuit depicted in
Cross fader 930 is an op-amp based circuit analogous to the cross fader/switch 540 in
At 1102, sound is captured using a microphone array. The microphone array includes at least a first microphone and a second microphone, and each of the microphones in the array produces a respective microphone signal in response to acoustic and non-acoustic stimuli in a physical environment. As explained earlier, sound from a particular acoustic stimulus in an environment may arrive at different times at different microphones depending on how the microphones are positioned relative to the stimulus. Therefore, a plurality of microphone signals may be generated by the microphone array over a period of time. The microphone signals may be received by a noise detection subsystem and include a first microphone signal generated based on a response of the first microphone and a second microphone signal generated based on a response of the second microphone to the same acoustic stimulus.
At 1104, the microphone signals are optionally conditioned for further processing. Such conditioning can include amplification, rectification, time of arrival synchronization, delay, filtering and/or other types of signal processing.
At 1106, a beamformed signal is generated by combining the first microphone signal and the second microphone signal using differential beamforming. The beamformed signal may be generated, for example, by a differential beamformer.
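The delay-and-subtract operation of a differential beamformer can be sketched as follows (an illustrative Python sketch assuming an integer sample delay; the function name and delay model are assumptions):

```python
import numpy as np

def differential_beamform(mic_a, mic_b, delay_samples):
    """Delay one microphone signal and subtract it from the other.
    Sound arriving from the rear, whose acoustic delay between capsules
    matches delay_samples, cancels, producing a directional pickup."""
    delayed = np.zeros_like(mic_b)
    if delay_samples > 0:
        delayed[delay_samples:] = mic_b[:-delay_samples]
    else:
        delayed[:] = mic_b
    return mic_a - delayed
```

If a rear-arriving wave reaches mic B first and mic A one sample later, mic A's signal equals a one-sample-delayed copy of mic B's, and the beamformer output is zero for that direction.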
At 1108, an average signal is generated. The average signal corresponds to an average of the first microphone signal and the second microphone signal and can be generated by an averaging unit (e.g., averaging unit 310). Alternatively, as discussed above, microphone signals can be time-aligned so as to be in phase with respect to an acoustic stimulus. Thus, in some embodiments, the average signal in 1108 is generated as an average of two or more compensated signals, (e.g., the signals 1712A and 1712B shown in
At 1110, as part of detecting non-acoustic stimuli, a first signal is compared to a second signal. The first signal can be the beamformed signal or a signal derived from the beamformed signal (e.g., the RMS of the beamformed signal). The second signal can be the average signal or a signal derived from the average signal (e.g., the RMS of the average signal). The comparison in 1110 can be performed using a comparator such as the comparator 240.
At 1112, a determination is made, based on a result of the comparison in 1110, that an instantaneous magnitude of the first signal is greater than that of the second signal. If the comparison in 1110 is made using a comparator, the determination in 1112 can be made implicitly, as part of performing the comparison, and will be reflected in the output of the comparator. The determination in 1112 confirms the presence of non-acoustic stimuli (i.e., that there is at least one non-acoustic source present). In some embodiments, the determination in 1112 may include determining that the magnitude of the response to non-acoustic stimuli exceeds a threshold, for example, when the magnitude of the first signal exceeds the magnitude of the second signal by a certain amount.
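The comparison and threshold test in 1110 and 1112 can be sketched as follows (an illustrative Python sketch; the 3 dB threshold is an assumed example value, not taken from the disclosure):

```python
import math

def non_acoustic_detected(beamformed_rms, average_rms, threshold_db=3.0):
    """Flag a non-acoustic disturbance when the beamformed signal's RMS
    exceeds the array-average RMS by more than threshold_db. Mirrors the
    comparison of first and second signals in blocks 1110-1112."""
    margin_db = 20 * math.log10(beamformed_rms / average_rms)
    return margin_db > threshold_db
```

With equal RMS values the function returns False (no disturbance inferred); a beamformed RMS well above the average RMS returns True, confirming the presence of a non-acoustic stimulus.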
At 1114, an output audio signal is generated by, in response to the determination in 1112, switching or cross fading (e.g., using the cross fader/switch 540) between the beamformed signal and a noise-reduced signal such that a contribution of the noise-reduced signal to the output audio signal is increased (to a maximum gain value of one) and a contribution of the beamformed signal to the output audio signal is decreased (to a minimum gain value of zero). The time rate of change of gain for all signals in the crossfader operation can be controlled so that the resultant output signal is free from volume fluctuations. In certain embodiments, the generating of the noise-reduced signal can be performed according to the processing depicted in
The switching or cross fading in block 1114 may involve switching from an overall response (e.g., an output signal generated based on a beamformer output) that is substantially directional to an overall response that is substantially omnidirectional, at least for certain frequencies. For example, the switch can be from a first overall response that is more directional (e.g., highly directional) at lower frequencies and less directional at higher frequencies, to a second overall response that is omnidirectional at the same lower frequencies and less directional (e.g., moderately directional) at the same higher frequencies.
At 1118, a determination is made, based on the comparison in 1116, that a lower magnitude response to non-acoustic stimuli is present in the first microphone signal than in the second microphone signal. If more than two microphone signals were generated in 1102, the determination in 1118 may involve determining moment by moment that the first microphone signal has the lowest magnitude response to non-acoustic stimuli among all the microphone signals, e.g., because the first microphone signal or the rectified version of the first microphone signal has the lowest instantaneous magnitude, and the determination in 1112 has confirmed the presence of a non-acoustic stimulus.
At 1120, as part of generating a noise-reduced signal and in response to the moment by moment determination in 1118, a contribution of the first microphone signal (or whichever microphone signal was determined in 1118 to have the lowest magnitude response) to the input of a low-pass filter (e.g., the LPF 362) is increased by cross fading or switching between the microphone signals moment by moment. In some embodiments, the contribution of the first microphone signal is increased relative to contributions of other microphone signals, but without completely eliminating the contributions of the other microphone signals. Alternatively, a switch to using only the first microphone signal (e.g., so that the second microphone signal does not contribute in any way to the noise-reduced signal) is also possible.
At 1122, an average signal is generated as an input to a high-pass filter (e.g., the HPF 360). The average signal corresponds to an average of all the microphone signals (e.g., the first microphone signal and the second microphone signal).
At 1124, the outputs of the low-pass filter and the high-pass filter are summed together (e.g., by the summation unit 370) to generate the noise-reduced signal. The use of a high-pass filter in combination with a low-pass filter to generate the noise-reduced signal is optional. In some embodiments, the noise-reduced signal is simply the microphone signal that has the lowest instantaneous magnitude. Thus, the noise-reduced signal can be generated using at least the first microphone signal, possibly only the first microphone signal. The noise-reduced signal generated in 1124 is then provided as an input for the processing in 1114 of
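The low-pass/high-pass split and recombination of blocks 1120 through 1124 can be sketched with complementary one-pole filters (an illustrative Python sketch; the filter topology and the parameter `alpha`, which sets the shared corner frequency, are assumptions):

```python
import numpy as np

def split_and_recombine(low_source, high_source, alpha):
    """Take low frequencies from the crossfaded (wind-robust) signal and
    high frequencies from the array average, using complementary
    one-pole filters with the same corner, then sum the two paths."""
    lp = np.zeros_like(low_source)
    state = 0.0
    for i, x in enumerate(low_source):
        state += alpha * (x - state)      # one-pole low-pass
        lp[i] = state
    hp = np.zeros_like(high_source)
    state = 0.0
    for i, x in enumerate(high_source):
        state += alpha * (x - state)
        hp[i] = x - state                 # complementary high-pass
    return lp + hp
```

Because the two filters are exactly complementary, feeding the same signal to both paths reconstructs that signal unchanged; the paths only diverge where the crossfaded signal differs from the average.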
At 1202, frequency components of a plurality of microphone signals generated using a microphone array are extracted. The extracting of the frequency components may involve, for example, applying a Discrete Fourier Transform (DFT) to digital versions of analog microphone signals from at least a first microphone and a second microphone in the microphone array. The output of the DFT may include, for each microphone signal, a spectral distribution across a range of frequencies. The frequencies may be divided into frequency bins, with a value assigned to each bin, where the value assigned to a bin indicates the amount of energy in a particular microphone signal at the frequency or range of frequencies to which the bin corresponds.
At 1204, the magnitudes in each of the many frequency bins extracted in 1202 are averaged over a period of time. An appropriate averaging of the frequency components produces, for each microphone signal, a set of average frequency components. The averaging of the frequency components reduces the number of outlier frequency components (e.g., false spikes in the frequency spectrum) and produces a spectral representation of each microphone signal that reflects the frequency behavior of the microphone signal over the period of time.
At 1206, spectral smoothing is performed, in the frequency domain, on the averaged frequency components. The spectral smoothing further reduces the number of outlier frequency components, thereby producing a more accurate spectral representation of each microphone signal.
At 1208, a subset of smoothed and averaged frequency components are identified as having the least amount of energy. The subset can be identified, for example, by eliminating any frequency components whose values exceed a certain threshold. Values that exceed the threshold are usually values associated with non-acoustic stimuli, whereas values below the threshold tend to be associated with acoustic sources that should be captured (e.g., a person's voice).
At 1210, a noise-reduced signal is generated by applying a filter. The filter is generated based on the subset of frequency components identified in 1208 and operates to filter out frequency components not included in the identified subset. This produces a composite signal that can include contributions from all the microphone signals, but excludes portions of the microphone signals that are associated with non-acoustic stimuli.
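The pipeline of blocks 1202 through 1210 can be sketched end to end as follows (an illustrative Python sketch; the framing, the 3-point smoothing kernel, the thresholding rule, and the use of the first microphone's spectrum for reconstruction are all assumptions made for illustration):

```python
import numpy as np

def spectral_noise_filter(mic_signals, frame, threshold):
    """Sketch of blocks 1202-1210: DFT each microphone signal in frames,
    average bin magnitudes over time, smooth across bins, identify the
    low-energy bins, and rebuild a composite signal from only those bins."""
    spectra = [np.fft.rfft(np.reshape(s, (-1, frame)), axis=1)
               for s in mic_signals]
    # Block 1204: average magnitude per frequency bin over time frames.
    avg_mag = [np.abs(sp).mean(axis=0) for sp in spectra]
    # Block 1206: simple 3-point smoothing across neighboring bins.
    kernel = np.array([0.25, 0.5, 0.25])
    smooth = [np.convolve(m, kernel, mode="same") for m in avg_mag]
    # Block 1208: keep only bins whose smoothed energy is below threshold.
    mask = smooth[0] < threshold
    # Block 1210: zero out the high-energy bins and invert the DFT.
    return np.fft.irfft(spectra[0] * mask, n=frame, axis=1).reshape(-1)
```

With a threshold high enough that every bin survives, the function reconstructs its input; lowering the threshold removes the bins that carry the most energy, which in wind conditions are typically dominated by the non-acoustic disturbance.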
The embodiments described above provide for reduced sensitivity to non-acoustic stimuli, and include various circuit implementations operable to detect and reduce the response to non-acoustic stimuli in a microphone array. Described below are embodiments directed to sensitivity matching between microphones in a microphone array. Sensitivity matching is useful in itself because the accuracy with which polar patterns are achieved through beamforming depends upon sensitivity-matched microphones. Using signals from mismatched microphones for beamforming can result in polar patterns that deviate significantly from a desired polar pattern. The deviation is especially noticeable at lower frequencies. For example, a 1 decibel mismatch between a pair of microphones spaced 15.6 millimeters apart and whose desired response is a cardioid pattern may not produce much deviation from the desired cardioid pattern at frequencies ranging from approximately 3 kilohertz (kHz) down to about 800 Hz, but the polar pattern may become increasingly less like a cardioid below 800 Hz. At around 300 Hz and below, the resulting pattern would look completely circular, or omnidirectional.
In the absence of sensitivity matching, if the sensitivity mismatch between microphones is substantial, one solution would be to simply select the microphone with the lower sensitivity. However, selecting the microphone with the lower sensitivity is sub-optimal, whereas sensitivity matching enables an output audio signal to be generated with the best possible instantaneous signal-to-noise ratio relative to acoustic and non-acoustic stimuli.
Sensitivity matching can also be used to improve the performance of noise detection and noise reduction. In this sense, noise refers to any response to a non-acoustic stimulus. The example embodiments described above for detecting and reducing such responses include embodiments in which comparators are used to compare signals derived from microphone responses (e.g., amplified and rectified microphone signals, beamformed signals, and RMS signals). If the sensitivity of a microphone deviates significantly from the sensitivities of other microphones in a microphone array, this will reduce the accuracy of the inputs to the comparators, and will therefore have an adverse effect on the results of the comparisons. For instance, mismatches could result in false positives, false negatives, or incorrect amounts of cross fading.
Additionally, noise detection can be beneficial for sensitivity matching. For instance, in some embodiments, a sensitivity matching system (e.g., the system depicted in
Gain stage 1310A is configured to generate an amplified microphone signal 1312A. Gain stage 1310B is configured to generate an amplified microphone signal 1312B. The gain stages 1310A, 1310B can be integrated into or shared with the earlier described noise detection and reduction systems. For instance, the gain stages 1310A, 1310B may correspond to the gain stage 620 in
As shown in
RMS units 1320A, 1320B supply RMS signals as inputs to the comparator 1330. The RMS unit 1320A generates an RMS signal corresponding to the RMS of the amplified microphone signal 1312A. Similarly, the RMS unit 1320B generates an RMS signal corresponding to the RMS of the amplified microphone signal 1312B. The RMS units 1320A, 1320B can be implemented in a similar manner to the RMS units described earlier, e.g., using a combination of rectification and low-pass filter units. The RMS signals generated by the RMS units 1320A, 1320B are generated over a relatively long time constant (e.g., a time window of 0.5 seconds or more). Using a long time constant ensures that sensitivity matching is robust even in the presence of directional acoustic stimuli whose sound arrives at different times for different positions along the microphone array. It is also important to impose a limit on the time rate of change of the gain applied by the gain stage 1310A, to ensure stability and mismatch estimation accuracy. Using a relatively long time constant, or integrating the amplified signals' magnitudes over a relatively long period of time, measures the true exposure of each microphone to the sound field. Even if the microphones are spaced further apart than the wavelengths included in the measurement, all microphones which are designed and placed to capture the sound from a talker will experience the same long term acoustic exposure. Therefore, as a consequence of using a relatively long time constant, the long-term RMS value of the amplified microphone signal 1312A will match that of the amplified microphone signal 1312B, which effectively makes the sensitivities of the microphones 210A, 210B identical or within a certain narrow range of each other. In practice, a settled mismatch of less than 0.005 dB is achievable.
The control signal 1316 indicates whether the RMS signal from the RMS unit 1320A is larger than the RMS signal from the RMS unit 1320B. If so, the value of the control signal 1316 will instruct the gain stage 1310A to decrease the amount of amplification applied to the signal from the microphone 210A. To ensure stability, the gain stage 1310A may only be allowed to respond by a preset limit of gain per second (e.g., 0.2 dB per second), or by a preset fraction of the measured mismatch per second (e.g., 5% of the mismatch per second). Similarly, if the control signal 1316 indicates that the RMS signal from the RMS unit 1320A is smaller than the RMS signal from the RMS unit 1320B, the control signal 1316 will instruct the gain stage 1310A to increase the amount of amplification applied to the signal from the microphone 210A.
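The comparator-driven, rate-limited gain correction can be sketched as follows (an illustrative Python sketch; the function name is hypothetical, and the 0.2 dB-per-update limit follows the example figure given above):

```python
import math

def update_gain_db(gain_db, rms_a, rms_b, max_step_db=0.2):
    """One update of the matching loop: measure the mismatch between the
    long-term RMS of microphone A's signal and microphone B's signal,
    then move A's gain toward matching, clamped to at most max_step_db
    per update (e.g., 0.2 dB per second) for loop stability."""
    mismatch_db = 20 * math.log10(rms_a / rms_b)
    step = max(-max_step_db, min(max_step_db, mismatch_db))
    return gain_db - step
```

A large measured mismatch therefore produces only a small, bounded correction per update, while a perfectly matched pair leaves the gain unchanged.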
The system 1300 can be operated over time (e.g., continuously or periodically activated) to ensure that the sensitivity of microphone 210A remains within a certain range of the sensitivity of the microphone 210B. The system 1300 is merely an example of a system for sensitivity matching. Variations of the system 1300 are possible. For example, in some embodiments, microphones 210A, 210B are adjusted in tandem based on the control signal 1316 (e.g., increasing the amplification of gain stage 1310A while decreasing the amplification of gain stage 1310B). In microphone arrays featuring three or more microphones, the gains can be adjusted in groups. For example, adjustment can be performed in a pairwise manner by comparing an RMS signal from a first microphone to an RMS signal from a second microphone to adjust the gain for the first microphone, and then comparing the RMS signal from the first microphone (updated after the gain for the first microphone has been adjusted) to an RMS signal from a third microphone to adjust the gain for the third microphone.
In some embodiments, the input to an RMS unit is filtered using a band-pass filter and/or low-pass filter in order to restrict the input to a low frequency range. Since sensitivity mismatch is usually not constant over frequency, and since low frequencies tend to require more precise sensitivity matching than higher frequencies (e.g., for good low frequency differential beamforming performance), restricting the RMS input to the low frequency range helps ensure that any gain adjustments are performed using signals in the frequency range that needs the most correction.
As shown in
The circuit 1402 further includes a low-pass filter stage 1470 and an op-amp 1480. The low-pass filter stage 1470 is configured to low-pass filter the outputs of the rectifiers 1460A, 1460B to generate a pair of inputs to the op-amp 1480. The op-amp 1480 serves as an integrating comparator and is configured to generate the control signal 1434 based on the integral of the difference between the low-pass filtered outputs of the rectifiers 1460A and 1460B.
RMS unit 1502 is configured to generate an RMS signal 1512 corresponding to the RMS of the average of the signals from the microphones 210A and 210B.
Gain stages 1510A and 1510B are analogous to the gain stage 1310A in
RMS units 1520A and 1520B are analogous to the RMS units 1320A and 1320B in
Comparator 1530 is configured to compare the RMS signal generated by the RMS unit 1502 to the RMS signal generated by the RMS unit 1520A to output a control signal 1532 based on the difference between these RMS signals. Similarly, the comparator 1540 is configured to compare the RMS signal generated by the RMS unit 1502 to the RMS signal generated by the RMS unit 1520B to output a control signal 1542. Thus, each of the comparators 1530, 1540 operates to compare the same average RMS signal against an RMS signal derived from the signal of a respective microphone.
As shown in
As shown in
The system 1600 further includes a comparator 1620, a cross fader/switch 1630, and a differential beamformer 1640. The comparator 1620 is configured to compare the outputs of the rectifiers 1602 and 1604, and is therefore analogous to the comparator 340 in
In
Time-aligning microphone signals so that they are in phase with each other for sound from an acoustic source of interest is advantageous because it permits cross fading/switching (e.g., by the cross fader 1630) to be performed with less audible distortion being produced for the sound from the acoustic source of interest, i.e., the signal of interest. If the microphone signals are perfectly aligned and in phase, there should theoretically be zero distortion to the signal of interest. However, it should be noted that a certain amount of error in time alignment is generally acceptable. As a result, time-alignment does not need to be perfect, and a fixed delay can be used in conjunction with the embodiment shown in
After being output from the time-of-arrival alignment unit 1710, the time-aligned signals 1712A and 1712B are sent into the rectifiers 1602 and 1604, respectively, and are subsequently subjected to the above-described processing for reduction of non-acoustic stimuli. As shown in
At 1802, a first amplified microphone signal and a second amplified microphone signal are generated based on a first microphone signal and a second microphone signal, respectively. The first amplified microphone signal can be generated by inputting the first microphone signal into a first amplifier (e.g., the gain stage 1310A in
At 1804, a first RMS signal is generated. The first RMS signal corresponds to an RMS of the first amplified microphone signal. For example, the first RMS signal can be the output of the RMS unit 1320A in
At 1806, a second RMS signal is generated. The second RMS signal corresponds to either an RMS of the second amplified microphone signal (e.g., the output of the RMS unit 1320B) or an RMS of an average of the first amplified microphone signal and the second amplified microphone signal (e.g., the output of the RMS unit 1502). The time interval over which the first RMS signal and the second RMS signal are calculated can be selected to be sufficiently long that the RMS signals are indicative of the degree of exposure to acoustic energy across the microphones (e.g., across all microphones in the microphone array).
Blocks 1804 and 1806 can be generalized to involve steps of calculating a first magnitude (e.g., a value of the first RMS signal) representing a running average of acoustic energy that the sound field exposes the first microphone to; and calculating a second magnitude (e.g., a value of the second RMS signal) representing a running average of acoustic energy that the sound field exposes the second microphone to.
At 1808, the first RMS signal is compared to the second RMS signal. The comparison in 1808 can be performed, for example, using the comparator 1330, the comparator 1530, or the comparator 1540. More generally, block 1808 may involve determining that the first microphone and the second microphone have mismatched sensitivities based on a difference between the first magnitude and the second magnitude discussed above. For example, the mismatch can be determined based on the ratio between a value of the first RMS signal and a value of the second RMS signal.
At 1810, a determination is made, based on a result of the comparison in 1808, that the first microphone and the second microphone have mismatched sensitivities. For instance, the microphones may be deemed to be mismatched if there is any difference between the first RMS signal and the second RMS signal, since the RMS in this case is a measurement of the long term exposure to the acoustic sound field, and the microphones are positioned close together in an array. Alternatively, the difference may be required to exceed a certain threshold before the microphones are deemed to be mismatched. If the comparison in 1808 is performed using a comparator, the determination can be reflected in the output of the comparator.
At 1812, an amount of amplification used by at least one amplifier (e.g., the amplifier that generates the first amplified microphone signal) is adjusted, in response to the determination in 1810, and such that a difference between a sensitivity of the first microphone and a sensitivity of the second microphone is reduced. The adjustment can, for example, be performed using the output of a comparator that performed the comparison in 1808 as a control signal. The control signal may be proportional to the difference between the first RMS signal and the second RMS signal, and may therefore indicate an extent to which the amount of amplification applied should be adjusted.
In some embodiments, a comparison is performed for each microphone in the microphone array. For example, in accordance with the embodiment of
In some embodiments, the adjusting of the amount of amplification applied by an amplifier is conditioned upon there being less than a threshold amount of noise present due to non-acoustic stimuli (e.g., as indicated by the responses of individual microphones in the microphone array to a sound field). Thus, the process 1800 may include an additional step of determining (e.g., using an implementation of the noise detection subsystem 130 in
Additionally, in certain embodiments, the rate at which the amount of amplification used to generate an amplified microphone signal can change is limited. Thus, the adjustment in 1812 may be subject to a time-rate-of-change limit to restrict the speed at which a change in gain is allowed to be carried out. For example, if the comparison in 1808 indicates that there is a mismatch ratio of ten (e.g., an RMS or other magnitude derived from the first microphone signal is ten times the RMS or other magnitude derived from the second microphone signal), then a control signal may be generated to instruct an amplifier to reduce the gain for the first microphone signal by a factor of ten. However, with a limit in place, the amplifier may be configured to permit a maximum change in gain of 0.2 dB per second, for example. The limit can be fixed or it may depend on the degree of mismatch. For example, the amplifier may be configured to permit a greater amount of amplification adjustment when the mismatch is higher than when the mismatch is lower. The processing in blocks 1802 to 1812 can be repeated to incrementally adjust the amount of amplification until the sensitivities of the first microphone and the second microphone are matched (e.g., when the RMS values of the microphones have converged to the same or approximately the same value).
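The time-rate-of-change limit can be sketched as a slew limiter applied to the gain control signal. In the sketch below, only the 0.2 dB-per-second figure comes from the example above; the function name and signature are illustrative assumptions.

```python
import math

def slew_limited_gain(current_gain, target_gain, dt, max_db_per_sec=0.2):
    # Move the amplifier gain toward target_gain, but by no more than
    # max_db_per_sec decibels per second of elapsed time dt (the 0.2 dB/s
    # figure is the example limit from the text).
    step_db = 20.0 * math.log10(target_gain / current_gain)
    limit_db = max_db_per_sec * dt
    step_db = max(-limit_db, min(limit_db, step_db))  # clamp the change
    return current_gain * 10.0 ** (step_db / 20.0)
```

With this limit in place, correcting the tenfold (20 dB) mismatch in the example above would take 100 seconds of repeated one-second updates, consistent with repeating blocks 1802 to 1812 until the sensitivities converge.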
The embodiments described above include various analog circuit implementations. It will be understood that sensitivity matching, noise detection, and noise reduction can also be performed using digital circuitry or a combination of analog and digital circuitry. For example, in some embodiments, mismatches between microphones are detected using a digital circuit that performs frequency domain analysis on microphone signals. As an alternative to comparing time-varying signals to determine differences in instantaneous signal magnitude, a frequency domain approach to sensitivity matching may involve extracting frequency components of microphone signals or signals derived therefrom, similar to the extraction described in connection with
At 1902, frequency components are extracted from a first amplified microphone signal and a second amplified microphone signal. The first amplified microphone signal is a result of amplifying a signal from a first microphone and is therefore associated with the first microphone. The second amplified microphone signal is a result of amplifying a signal from a second microphone and is therefore associated with the second microphone. The extraction in 1902 can be performed in a similar manner to the extraction in 1202 of
At 1904, the frequency components of the first amplified microphone signal are compared to the frequency components of the second amplified microphone signal at corresponding frequencies. For example, frequency components associated with the same frequency bin may be compared to determine how the first microphone signal and the second microphone signal respond at a given frequency.
At 1906, frequencies at which the sensitivities of the first microphone and the second microphone are mismatched are identified, based on a result of the comparison in 1904. For example, it may be determined that the first microphone and the second microphone are mismatched at a particular frequency or at multiple frequencies across the entire frequency range of the spectral representations. A mismatch can be identified when the spectral representations have different energy levels at the same frequency, e.g., different values, or values that differ by more than a threshold, at the same frequency bin.
At 1908, for each identified frequency, the amount of gain applied by a gain stage, or the amount of amplification applied by at least one amplifier, at the identified frequency is adjusted. The adjustment can be performed, for example, by generating a separate control signal for each identified frequency. Similar to the limit discussed above in connection with
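The per-bin comparison of blocks 1902 through 1906 and the per-bin gains of block 1908 can be sketched as follows. A naive DFT is used here for clarity (an FFT would typically be used in practice), and the 1 dB threshold, the magnitude floor, and the function names are illustrative assumptions.

```python
import cmath
import math

def magnitude_spectrum(frame):
    # Naive DFT magnitude per frequency bin (an FFT would be used in practice).
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) / n
            for k in range(n // 2 + 1)]

def mismatched_bins(frame1, frame2, threshold_db=1.0, floor=1e-9):
    # Blocks 1902-1906: extract frequency components of the two amplified
    # microphone signals and compare them at corresponding bins. Returns,
    # for each mismatched bin, a corrective gain for the first channel
    # (block 1908). The threshold and floor are illustrative.
    spec1 = magnitude_spectrum(frame1)
    spec2 = magnitude_spectrum(frame2)
    gains = {}
    for k, (m1, m2) in enumerate(zip(spec1, spec2)):
        if m1 < floor or m2 < floor:
            continue                       # skip (near-)empty bins
        if abs(20.0 * math.log10(m1 / m2)) > threshold_db:
            gains[k] = m2 / m1
    return gains
```

In this sketch, a bin where the two spectra carry equal energy produces no entry, while a bin where the first channel is 6 dB hot produces a gain of 0.5 for that frequency.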
The sensitivity matching techniques described above can be combined with techniques for detection of, and reduction of sensitivity to, non-acoustic stimuli. As mentioned above, an adjustment to the amount of amplification applied by an amplifier can be conditioned upon determining that the response to non-acoustic stimuli is less than a threshold amount. As another example, in some embodiments, after the amount of amplification applied by an amplifier is adjusted in response to detection of a sensitivity mismatch (e.g., based on the processing depicted in
Additionally, the sensitivity matching techniques described above can be extended to any size microphone array. For example, if the microphone array has eight microphones, the microphones could be matched all together or in groups, e.g., a first group consisting of the first three microphones (consecutively spaced apart at one end of the array), a second group consisting of the next three microphones, and a third group consisting of the last two microphones. When matching the sensitivities of three or more microphones, the amount of amplification for any particular microphone may be adjusted based on an average signal level, e.g., by comparing an amplified microphone signal from an individual microphone to an average of the amplified microphone signals of the entire array. Further, if matching is done in groups, beamforming may involve generating a separate beamformed signal for each group after matching is completed for all groups, then combining the beamformed signals (e.g., through summation) to produce an output audio signal. In some embodiments, crossover filtering is applied to divide each beamformed signal into multiple signals across different frequency ranges (e.g., a high frequency range and a low frequency range) before combining the divided beamformed signals.
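The average-based matching described above, for arrays of three or more microphones, can be sketched as follows; the function name is an illustrative assumption.

```python
def match_to_array_average(rms_values):
    # Compare each microphone's long-term RMS to the mean RMS of the whole
    # array (or of a group) and derive a corrective gain per channel;
    # applying the gains brings every channel to the array average.
    avg = sum(rms_values) / len(rms_values)
    return [avg / r for r in rms_values]
```

The same function could be applied per group when matching is done in groups, with each group's beamformed signal generated after its gains settle.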
The computer system 2000 is shown comprising hardware elements that can be electrically coupled via a bus 2005. However, the hardware elements can be communicatively coupled in other ways. In some embodiments, the computer system 2000 is located on a motor vehicle and the bus 2005 is a Controller Area Network (CAN) bus. The hardware elements may include processing unit(s) 2010, which can include, without limitation, one or more general-purpose processors, one or more special-purpose processors (such as a digital signal processor (DSP), graphics acceleration processors, application specific integrated circuits (ASICs), and/or the like), and/or other processing structure or means. Some embodiments may have a separate DSP 2020, depending on desired functionality. The computer system 2000 also can include one or more input device controllers 2070, which can control without limitation an in-vehicle touch screen, a touch pad, microphone (e.g., individual microphones in a microphone array), button(s), dial(s), switch(es), and/or the like; and one or more output device controllers 2015, which can control without limitation a display, light emitting diode (LED), loudspeakers, and/or the like. Output device controllers 2015 may, in some embodiments, include controllers that individually control various sound contributing devices in the vehicle.
In certain embodiments, the computer system 2000 implements at least some of the sensitivity matching, noise detection, or noise reduction functionality described above. For example, detection of mismatched microphones or detection of non-acoustic stimuli can be performed by executing instructions on one or more processing units 2010 and/or the DSP 2020.
The computer system 2000 may also include a wireless communication interface 2030, which can include without limitation a modem, a network card, an infrared communication device, a wireless communication device, and/or a chipset (such as a Bluetooth device, an IEEE 802.11 device, an IEEE 802.15.4 device, a WiFi device, a WiMax device, cellular communication facilities including 4G, 5G, etc.), and/or the like. The wireless communication interface 2030 may permit data to be exchanged with a network, wireless access points, other computer systems, and/or any other electronic devices described herein. The communication can be carried out via one or more wireless communication antenna(s) 2032 that send and/or receive wireless signals 2034.
In certain embodiments, the wireless communication interface 2030 may transmit information for remote processing of microphone signals and/or receive information used for local processing of microphone signals. Sensitivity matching, noise detection, and noise reduction can be performed, at least in part, by a remote computer system. For instance, in some embodiments, the computer system 2000 may receive, from a remote computer system, historical information regarding the sensitivity of a microphone in a microphone array. The historical information can be based on measurements taken at the time that the microphone array is fully assembled, or at any time thereafter, for example, periodic measurements taken in the absence of non-acoustic stimuli and over the lifetime of the microphone array. The computer system 2000 may use the historical information to identify deviations in the sensitivity of the microphone from past sensitivity and to determine an appropriate action to take, including determining when to adjust the gain for the microphone.
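The historical comparison described above can be sketched as a simple drift check against a stored reference; the function name and the 1 dB tolerance are illustrative assumptions, not values from the disclosure.

```python
import math

def needs_recalibration(current_rms, historical_rms, tolerance_db=1.0):
    # Compare a microphone's current long-term RMS against a stored
    # historical reference (e.g., measured when the array was fully
    # assembled) and flag the microphone when the drift exceeds an
    # (illustrative) tolerance. Returns the flag and the drift in dB.
    drift_db = 20.0 * math.log10(current_rms / historical_rms)
    return abs(drift_db) > tolerance_db, drift_db
```

A caller might schedule such a check only in the absence of non-acoustic stimuli, consistent with the periodic measurements described above.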
The computer system 2000 can further include sensor controller(s) 2040. Such controllers can control, without limitation, one or more microphone(s), accelerometer(s), gyroscope(s), camera(s), RADAR sensor(s), LIDAR sensor(s), ultrasonic sensor(s), magnetometer(s), altimeter(s), proximity sensor(s), light sensor(s), and the like. With respect to a microphone array, the sensor controller(s) 2040 may include, for example, one or more controllers configured to selectively activate microphones in the array, e.g., by switching a power supply to a particular microphone on or off.
The computer system 2000 may further include and/or be in communication with a memory 2060. The memory 2060 can include, without limitation, local and/or network accessible storage, a disk drive, a drive array, an optical storage device, a solid-state storage device, such as a random access memory (RAM), and/or a read-only memory (ROM), which can be programmable, flash-updateable, and/or the like. Such storage devices may be configured to implement any appropriate data stores, including without limitation, various file systems, database structures, and/or the like.
The memory 2060 can also comprise software elements (not shown), including an operating system, device drivers, executable libraries, and/or other code embedded in a computer-readable medium, such as one or more application programs, which may comprise computer programs provided by various embodiments, and/or may be designed to implement methods, and/or configure systems, provided by other embodiments, as described herein. In an aspect, then, such code and/or instructions can be used to configure and/or adapt a general purpose computer (or other device) to perform one or more operations in accordance with the described methods. The memory 2060 may further comprise storage for data used by the software elements. For instance, memory 2060 may store configuration information (e.g., gain offset values) indicating, for each microphone in a microphone array, how much to adjust an amplifier coupled to the microphone.
It will be apparent to those skilled in the art that substantial variations may be made in accordance with specific requirements. For example, customized hardware might also be used, and/or particular elements might be implemented in hardware, software (including portable software, such as applets, etc.), or both. Further, connection to other computing devices such as network input/output devices may be employed.
With reference to the appended figures, components that can include memory can include non-transitory machine-readable media. The terms “machine-readable medium” and “computer-readable medium” as used herein, refer to any storage medium that participates in providing data that causes a machine to operate in a specific fashion. In embodiments provided hereinabove, various machine-readable media might be involved in providing instructions/code to processing units and/or other device(s) for execution. Additionally or alternatively, the machine-readable media might be used to store and/or carry such instructions/code. In many implementations, a computer-readable medium is a physical and/or tangible storage medium. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Common forms of computer-readable media include, for example, magnetic and/or optical media, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave, or any other medium from which a computer can read instructions and/or code.
The methods and systems presented in the current disclosure can be used in many different applications, such as in vehicles, in various types of headsets and/or head-worn apparatuses, hearing aids, and/or any mobile or handheld devices without departing from the teachings of the present disclosure.
The methods, systems, and devices discussed herein are examples. Various embodiments may omit, substitute, or add various procedures or components as appropriate. For instance, features described with respect to certain embodiments may be combined in various other embodiments. Different aspects and elements of the embodiments may be combined in a similar manner. The various components of the figures provided herein can be embodied in hardware and/or software. Also, technology evolves and, thus, many of the elements are examples that do not limit the scope of the disclosure to those specific examples.
Having described several embodiments, various modifications, alternative constructions, and equivalents may be used without departing from the spirit of the disclosure. For example, the above elements may merely be a component of a larger system, wherein other rules may take precedence over or otherwise modify the application of the embodiments. Also, a number of steps may be undertaken before, during, or after the above elements are considered. Accordingly, the above description does not limit the scope of the disclosure to the exact embodiments described.
The present application claims the benefit of and priority to U.S. Provisional Application No. 62/873,962 filed Jul. 14, 2019, entitled “Capsule Matching and Anti-Wind Buffeting System.” The contents of U.S. Provisional Application No. 62/873,962 are incorporated herein by reference in their entirety for all purposes.
Number | Date | Country
---|---|---
62873962 | Jul 2019 | US