The present disclosure is generally related to automated gain matching for multiple microphones.
Advances in technology have resulted in smaller and more powerful computing devices. For example, there currently exist a variety of portable personal computing devices, including wireless computing devices, such as portable wireless telephones, personal digital assistants (PDAs), and paging devices that are small, lightweight, and easily carried by users. More specifically, portable wireless telephones, such as cellular telephones and Internet protocol (IP) telephones, can communicate voice and data packets over wireless networks. Further, many such wireless telephones include other types of devices that are incorporated therein. For example, a wireless telephone can also include a digital still camera, a digital video camera, a digital recorder, and an audio file player. Also, such wireless telephones can process executable instructions, including software applications, such as a web browser application, that can be used to access the Internet. As such, these wireless telephones can include significant computing capabilities.
Audio processing systems in wireless telephones may use multiple-microphone systems that increase audio quality based on multi-channel digital processing algorithms. For example, in comparison to single-microphone systems, multiple-microphone systems may provide enhanced noise suppression (e.g., stationary noise suppression and non-stationary noise suppression) and may permit the audio processing systems to enable spatial-related audio features, such as position-dependent noises.
However, performance of the audio processing system may be degraded when there is a gain (e.g., sensitivity) mismatch between the microphones of the multiple-microphone system. Gain calibration calculation to correct such gain mismatches can be inaccurate and may be a significant burden on processing resources.
A method and an apparatus are disclosed for automated gain matching with respect to multiple microphones. Audio signals from multiple microphones may be digitally sampled at particular time instances to create digital data frames. For example, an audio signal from a reference microphone may be digitally sampled at a first time to generate a reference data frame, and an audio signal from a target microphone may also be digitally sampled at the first time to generate a target data frame. A single-source identifier (SSI) may determine that one source is present in the reference data frame and may determine that one source is present in the target data frame. A single channel signal detector (SC-SD) may determine whether the one source corresponds to speech or to background noise for both data frames. If the one source corresponds to background noise for both data frames, a power ratio associated with the power of the reference data frame and the power of the target data frame may be determined. The power ratio may be added to a histogram of power ratios to determine a gain calibration value for adjusting the gain of the target microphone. For example, the gain calibration value may be based on a particular power ratio in the histogram that has the highest count.
In a particular embodiment, a method includes receiving, at a processor, a first data frame at a first time from a first microphone. The method also includes receiving a second data frame at the first time from a second microphone. The method further includes calculating a power ratio of the first microphone and the second microphone based on the first data frame and the second data frame in response to determining that the first data frame and the second data frame are noise data frames.
In another particular embodiment, an apparatus includes a processor and a memory accessible to the processor. The memory stores instructions that are executable by the processor to cause the processor to receive a first data frame at a first time from a first microphone. The instructions also cause the processor to receive a second data frame at the first time from a second microphone. The instructions also cause the processor to calculate a power ratio of the first microphone and the second microphone based on the first data frame and the second data frame in response to determining that the first data frame and the second data frame are noise data frames.
In another particular embodiment, an apparatus includes means for receiving a first data frame at a first time from a first microphone. The apparatus also includes means for receiving a second data frame at the first time from a second microphone. The apparatus further includes means for calculating a power ratio of the first microphone and the second microphone based on the first data frame and the second data frame in response to determining that the first data frame and the second data frame are noise data frames.
In another particular embodiment, a computer-readable storage medium includes instructions that, when executed by a processor, cause the processor to receive a first data frame at a first time from a first microphone. The instructions may also cause the processor to receive a second data frame at the first time from a second microphone. The instructions may also cause the processor to calculate a power ratio of the first microphone and the second microphone based on the first data frame and the second data frame in response to determining that the first data frame and the second data frame are noise data frames.
One particular advantage provided by at least one of the disclosed embodiments is an ability to generate fast and accurate estimates of microphone gain mismatches. Another particular advantage provided by at least one of the disclosed embodiments is an increased stability of microphone gain mismatch calculations, when compared to the minimum statistics algorithm, and an ability to adapt estimates of microphone gain mismatches to different types of background noise or noise spectra shapes. Other aspects, advantages, and features of the present disclosure will become apparent after review of the entire application, including the following sections: Brief Description of the Drawings, Detailed Description, and the Claims.
Referring to
The noise detector 102 and the power ratio calculator 104 are configured to receive and process multiple data frames. For example, a first data frame 112, a second data frame 114, and an Nth data frame 116 may be provided to the noise detector 102 and to the power ratio calculator 104, where N is any integer greater than one. For example, if N is equal to 4, then four data frames are provided to the noise detector 102 and to the power ratio calculator 104. Each data frame 112-116 may correspond to digitized audio samples that are generated from analog audio from corresponding microphones. The analog audio from the corresponding microphones may be sampled at the same time (e.g., a first time) to generate the data frames 112-116. For example, the first data frame 112 may correspond to a first digitized audio sample of first analog audio from a first microphone (not shown), the second data frame 114 may correspond to a second digitized audio sample of second analog audio from a second microphone (not shown), and the Nth data frame 116 may correspond to an Nth digitized audio sample of Nth analog audio from an Nth microphone (not shown). The first analog audio, the second analog audio, and the Nth analog audio may be sampled at the first time to generate the first data frame 112, the second data frame 114, and the Nth data frame 116, respectively. The first time may correspond to a particular time period. For example, in a particular embodiment, the first time may correspond to a particular clock cycle. In a particular embodiment, the first microphone may be a reference microphone and each additional microphone may be a target microphone.
Each data frame 112-116 may be a speech data frame, a noise data frame, or a multiple source data frame (e.g., a data frame that includes a substantial amount of speech and a substantial amount of noise). In a particular embodiment, a speech data frame may include a substantial amount of data that corresponds to speech and minimal (or zero) data that corresponds to background noise. A noise data frame may include a substantial amount of data that corresponds to background noise and minimal (or zero) data that corresponds to speech. In response to receiving the data frames 112-116, the noise detector 102 may be configured to determine whether each data frame 112-116 is a noise data frame. For example, the noise detector 102 may determine whether each data frame 112-116 is a single source data frame (e.g., corresponds to a single type of audio data) or a multiple source data frame. To illustrate, a single source data frame may be a speech data frame or a noise data frame. A multiple source data frame may be a data frame that includes a substantial amount of noise and speech. Such data frames include data that corresponds to two types of audio data (e.g., the noise type and the speech type). As an illustrative example, the noise detector 102 may determine whether the first data frame 112 is a speech data frame, a noise data frame, or a multiple source data frame. Likewise, the noise detector 102 may determine whether each of the second data frame 114 and the Nth data frame 116 is a speech data frame, a noise data frame, or a multiple source data frame. The noise detector 102 is configured to delete (or cease processing for purposes of gain matching) each data frame 112-116 associated with a particular sampling time (or time index) in response to a determination that any one data frame 112-116 associated with the particular sampling time (or time index) is a multiple source data frame. 
To illustrate, if the first data frame 112 is determined to include data that corresponds to noise and speech, the first data frame 112, the second data frame 114, and the Nth data frame 116 may all be dropped (e.g., processing of each of the data frames 112-116 may cease for purposes of gain matching).
When each data frame 112-116 is a single source data frame (e.g., corresponds to a single type of audio data), the noise detector 102 may identify whether each data frame 112-116 is a noise data frame or a speech data frame. To illustrate, the noise detector 102 may determine whether the first data frame 112 is a speech data frame, the noise detector 102 may determine whether the second data frame 114 is a speech data frame, etc. In response to a determination that each data frame 112-116 is not a speech data frame, the noise detector 102 may generate an activation signal 122 to enable (e.g., activate) the power ratio calculator 104. For example, a determination that each data frame 112-116 is not a speech data frame may indicate that each data frame 112-116 is a noise data frame.
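As an illustrative, non-limiting sketch, the gating behavior described above may be expressed in Python as follows. The names `FrameType` and `activation_signal` are hypothetical and do not appear in the disclosure, and the per-frame classification itself is assumed to have been performed elsewhere.

```python
from enum import Enum


class FrameType(Enum):
    """Hypothetical labels for the three frame classes described above."""
    SPEECH = "speech"
    NOISE = "noise"
    MULTI_SOURCE = "multi_source"


def activation_signal(frame_types):
    """Return True only when every simultaneous frame is a noise frame."""
    if not frame_types:
        return False
    # One multiple-source frame drops the whole set for this time index.
    if any(ft is FrameType.MULTI_SOURCE for ft in frame_types):
        return False
    # All frames are single-source; enable only if none of them is speech.
    return all(ft is not FrameType.SPEECH for ft in frame_types)
```

In this sketch, a return value of True plays the role of the activation signal 122 that enables the power ratio calculator 104.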
The power ratio calculator 104 is configured to receive each of the data frames 112-116 and to calculate a power ratio of the first microphone (e.g., the reference microphone) and each target microphone in response to receiving the activation signal 122 from the noise detector 102. For example, the power ratio calculator 104 may calculate a first power ratio of the first microphone and the second microphone based on the first data frame 112 and the second data frame 114. Additionally, the power ratio calculator 104 may calculate an (N−1)th power ratio of the first microphone and the Nth microphone based on the first data frame 112 and the Nth data frame 116. In a particular embodiment, the power ratio calculator 104 may utilize time domain averaging (e.g., smoothing) when determining the power ratios. The power ratio calculator 104 may generate a strength signal 132 indicating the first power ratio and the (N−1)th power ratio. The strength signal 132 may be provided to the histogram based estimator 106. In a particular embodiment, the first power ratio may correspond to a gain calibration value for a particular microphone. For example, the first power ratio (corresponding to the power ratio between the first microphone and the second microphone) may correspond to a gain calibration value 142 for the second microphone.
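As a minimal sketch of one way a frame power ratio might be computed, the following Python example takes the mean squared amplitude of each frame and expresses the ratio in decibels. The function names, the target-over-reference direction of the ratio, and the `floor` guard against a zero power are assumptions made for illustration; the disclosure does not fix these details.

```python
import math


def frame_power(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def power_ratio_db(reference_frame, target_frame, floor=1e-12):
    """Target-to-reference power ratio in dB (direction is an assumption).

    `floor` is a hypothetical guard that avoids log10(0) on silent frames.
    """
    p_ref = max(frame_power(reference_frame), floor)
    p_tgt = max(frame_power(target_frame), floor)
    return 10.0 * math.log10(p_tgt / p_ref)
```

For example, a target frame whose samples are half the amplitude of the reference frame has one quarter of the power, which this sketch reports as approximately −6.02 dB.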
The histogram based estimator 106 is configured to receive the strength signal 132 from the power ratio calculator 104 and to maintain histograms for each power ratio. In a particular embodiment, the histograms are used to determine the gain calibration value 142 for each target microphone. For example, the estimated gain calibration values 142 for each target microphone may be generated by finding peaks in corresponding histograms. The peak may correspond to a power ratio in the histogram that appears most frequently. For example, the first power ratio (corresponding to the power ratio between the first microphone and the second microphone) may correspond to −1 decibel (dB). The first power ratio may be provided to the histogram based estimator 106 via the strength signal 132. The histogram based estimator 106 may add the first power ratio to a histogram associated with other power ratios between the first microphone and the second microphone and determine which power ratio occurs most frequently in the histogram. The power ratio that occurs most frequently (e.g., the particular power ratio with the highest count) may correspond to the gain calibration value 142 for the second microphone.
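The peak-picking step described above may be sketched as follows. This is an illustrative example only: the class name `HistogramEstimator` and the 0.5 dB bin width are hypothetical choices, since the disclosure does not specify how power ratios are quantized into histogram bins.

```python
from collections import Counter


class HistogramEstimator:
    """Keeps a histogram of power ratios (dB) and reports the most frequent bin."""

    def __init__(self, bin_width_db=0.5):
        self.bin_width_db = bin_width_db  # assumed bin resolution
        self.counts = Counter()

    def add(self, ratio_db):
        # Quantize the ratio to the nearest bin and increment its count.
        self.counts[round(ratio_db / self.bin_width_db)] += 1

    def calibration_db(self):
        # The bin with the highest count serves as the calibration estimate.
        if not self.counts:
            return None
        bin_index, _count = self.counts.most_common(1)[0]
        return bin_index * self.bin_width_db
```

With ratios of −1.1, −0.9, −1.0, 2.0, and −3.0 dB added in turn, three of the five values fall in the bin centered at −1.0 dB, so the sketch returns −1.0 dB, consistent with the −1 dB example above.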
Determining calibration values based on data frames 112-116 when the data frames are noise data frames may permit the system 100 to converge quickly and accurately in real-time audio applications. For example, the system 100 may generate fast and accurate estimates of microphone gain mismatches. Using histograms of power ratios may provide increased stability of microphone gain mismatch calculations when compared to the minimum statistics algorithm, and an ability to adapt estimates of microphone gain mismatches to different types of background noise or noise spectra shapes.
Referring to
The first data frame 112 corresponding to the first microphone (e.g., the reference microphone) may be represented as x1(t)=s(t)+n(t), where s(t) corresponds to a directional source signal and where n(t) is a distributed background noise. In a particular embodiment, s(t) may correspond to speech. The second data frame 114 corresponding to the second microphone (e.g., the target microphone) may be represented as x2(t)=γ*s(t)+β*n(t), where (γ) corresponds to a difference in strength between the directional source of the first data frame 112 and the second data frame 114, and where (β) characterizes the gain mismatch between the first microphone and the second microphone. In real time applications, the directional source s(t), the background noise n(t), the difference in strength (γ), and the gain mismatch (β) may be unknown when the first data frame 112 and the second data frame 114 are received by the noise detector 102. In a particular embodiment, the Nth data frame 116 may be represented as xN(t)=γN*s(t)+βN*n(t), where (γN) corresponds to a difference in strength between the directional source of the first data frame 112 and the Nth data frame 116, and where (βN) characterizes the gain mismatch between the first microphone and the Nth microphone.
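As a brief worked step under the signal model above, consider a noise data frame, for which the directional source is assumed to be absent (s(t)=0) over the frame. The ratio of the frame energies then depends only on the gain mismatch (β), which is why power ratios of noise data frames may be used for gain matching:

```latex
x_1(t) = n(t), \qquad x_2(t) = \beta\, n(t)
\;\;\Longrightarrow\;\;
\frac{\sum_{t} x_2^2(t)}{\sum_{t} x_1^2(t)} = \beta^2,
\qquad
10 \log_{10} \beta^2 = 20 \log_{10} |\beta| \ \text{dB}.
```

When speech is present, the factor (γ) also enters the ratio, so such frames would not isolate the gain mismatch.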
The SSI module 202 may be configured to determine whether each data frame 112-116 is a single source data frame or a multiple source data frame. For example, each data frame 112-116 may be provided to the SSI module 202. The SSI module 202 may detect the noise data frames and the speech data frames (e.g., the single source data frames). For example, a single source data frame may include noise n(t) or a signal s(t) (e.g., speech). In a particular embodiment, the SSI module 202 may determine whether each data frame 112-116 is a single source data frame based on a direction of sound components associated with the data frames 112-116. For example, a single source data frame may correspond to a data frame having sound components that come from a single direction (e.g., unidirectional sound components).
In another particular embodiment, the SSI module 202 may determine whether each data frame 112-116 is a multiple source data frame. In response to a determination that a particular data frame 112-116 is not a multiple source data frame, the SSI module 202 may determine that the particular data frame 112-116 is a single source data frame. A multiple source data frame may correspond to a data frame having sound components that come from multiple directions. Alternatively, or in addition, a multiple source data frame may correspond to a data frame where two or more sound components are detected as having an amplitude (e.g., based on a measured decibel level) that exceeds a particular threshold and that are detected as coming from different source directions.
In another particular embodiment, a matrix (e.g., a covariance matrix as described below) may be used to determine whether each data frame 112-116 is a single source data frame. For ease of illustration, the following description corresponds to determining whether the first and second data frames 112, 114 are single source data frames. However, the techniques used herein may be extended to determine whether other data frames (e.g., the Nth data frame 116) are single source data frames. Also, for ease of description, the signal s(t) is described herein as speech; however, in other embodiments, other signal types may be present.
Using the first data frame 112 (e.g., x1(t)=s(t)+n(t)) and the second data frame 114 (e.g., x2(t)=γ*s(t)+β*n(t)), data from a first time (e.g., t=k+1) to a Tth time (e.g., t=k+T) may be used to obtain
P1(k) may correspond to a power level of a channel corresponding to the first microphone, Px(k) may correspond to a correlation between the first microphone and the second microphone, and P2(k) may correspond to a power level of a channel corresponding to the second microphone. Ps(k) may correspond to a power level of the speech s(t) at the kth frame, and Pn(k) may correspond to the power level of the noise n(t) at the kth frame. In a particular embodiment, s(t) and n(t) are not correlated. The vector notation of the three equations may be expressed as
Thus, vectors corresponding to successive time indices from a first time to an Lth time may be represented as a matrix (H), where
When a data frame is a single source data frame (e.g., a speech data frame or a noise data frame), the rank of the matrix (H) may be equal to one. However, if the data frame is a multiple source data frame (e.g., a substantial amount of speech s(t) and noise n(t) are present), the rank of the matrix (H) may be equal to two. Thus, the SSI module 202 may detect the frames where one source (e.g., one type of audio data) is present by detecting the rank of the matrix (H). However, when one source is present (i.e., when the matrix (H) has a rank of one), the analysis of the matrix (H) does not indicate which type of audio data is present.
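The rank property described above may be illustrated with the following Python sketch. It assumes, based on the signal model and on s(t) and n(t) being uncorrelated, that the power quantities take the form P1=Ps+Pn, Px=γ*Ps+β*Pn, and P2=γ²*Ps+β²*Pn; this form is a reconstruction for illustration rather than a statement of the exact equations of the disclosure. The helper names and the tolerance `tol` are hypothetical.

```python
import math


def power_vector(ps, pn, gamma, beta):
    """Assumed [P1, Px, P2] for x1 = s + n and x2 = gamma*s + beta*n."""
    return (ps + pn, gamma * ps + beta * pn, gamma**2 * ps + beta**2 * pn)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(sum(c * c for c in a))


def is_rank_one(vectors, tol=1e-6):
    """True when all 3-element column vectors are numerically collinear."""
    ref = max(vectors, key=_norm)
    ref_norm = _norm(ref)
    if ref_norm == 0.0:
        return False  # an all-zero matrix has rank zero, not one
    for v in vectors:
        v_norm = _norm(v)
        if v_norm == 0.0:
            continue  # a zero column does not raise the rank
        # Collinear vectors have a (near) zero cross product.
        if _norm(_cross(ref, v)) > tol * ref_norm * v_norm:
            return False
    return True
```

In this sketch, columns built from noise alone are all proportional to (1, β, β²) and are reported as rank one, whereas columns that mix speech and noise in varying proportions (with γ different from β) are not.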
In a particular embodiment, calculations by the SSI module 202 may be simplified by utilizing eigenvalue decomposition of a covariance matrix (R) to determine whether each data frame 112-116 corresponds to a single type of audio data. The covariance matrix may be expressed as
where V is the eigenvector matrix of the covariance matrix (R), and λi are the corresponding eigenvalues with λ1>λ2>λ3>0. Determining whether each data frame 112-116 corresponds to a single type of audio data may then be accomplished by the following comparison
If the comparison is true (e.g., if the left-hand side of the above equation is greater than or equal to the threshold tλ), then each of the compared data frames (i.e., the first data frame 112 and the second data frame 114, in the above example) is a single source data frame. For example, if the comparison is true, then each of the compared data frames corresponds to noise n(t) or corresponds to speech s(t) (e.g., corresponds to a single type of audio data). The SSI module 202 may generate a signal 212 indicating whether each of the compared data frames is a single source data frame. For example, when each of the compared data frames is a single source data frame, the SSI module 202 may generate a logical high voltage signal (e.g., a logical “1” value) and provide the logical high voltage signal to the first input of the logical AND gate 206. Conversely, when one or more of the compared data frames corresponds to multiple types of audio data (e.g., noise and speech), the SSI module 202 may generate a logical low voltage signal (e.g., a logical “0” value) and provide the logical low voltage signal to the first input of the logical AND gate 206.
The SC-SD module 204 may be configured to detect whether each data frame 112-116 is a speech data frame. For example, for the first data frame 112 (e.g., x1(t)=s(t)+n(t)), the SC-SD module 204 may determine whether audio data corresponding to speech s(t) is present or whether audio data corresponding to speech s(t) is absent. The SC-SD module 204 may make similar determinations for the other data frames 114, 116. In a particular embodiment, the SC-SD module 204 is a single channel voice activity detector (SC-VAD). For example, the SC-SD module 204 may be configured to detect frames having a strong speech s(t) component. In a particular embodiment, the SC-SD module 204 uses a speech detection process that is based on a harmonic structure in human speech, which is usually low-frequency concentrated. Referring to
The speech detection process used by the SC-SD module 204 may be based on a single frame so that no error propagates from frame to frame during evaluation. Additionally, the speech detection process may be memory efficient and easily tunable. Further, the speech detection process is independent of input level.
For a particular data frame 112-116, the SC-SD module 204 may determine a magnitude of the Fourier coefficients of the particular data frame, Sf(k), where k (e.g., 1, . . . , Nf) is a frequency index, and Nf is a number of frequency bins. The speech detection process may also determine a cyclically shifted version of the Fourier coefficients (Sf(k)), which may be represented as Cf(k,τ), where τ is the amount of the shift. For example, the shifted version of the Fourier coefficients may be expressed as Cf(k,τ)=Sf((k+τ) % Nf), where % represents a modulo operation. Referring to
Referring to
Referring back to
The logical AND gate 206 is configured to receive the signal 212 from the SSI module 202 at the first input and to receive the signal 214 from the SC-SD module 204 at the second input. The logical AND gate 206 is configured to output the activation signal 122 based on the signals 212, 214 received from the SSI module 202 and the SC-SD module 204, respectively. For example, in response to the SSI module 202 generating a logical high voltage signal and the SC-SD module 204 generating a logical high voltage signal, the logical AND gate 206 may generate a logical high voltage activation signal (e.g., enabling the power ratio calculator 104 of
Referring to
The SSI module 402 may correspond to the SSI module 202 of
The SC-SD module 404 may correspond to the SC-SD module 204 of
Referring to
The first microphone 502 may generate a first analog audio signal and provide the first analog audio signal to the CODEC 508. The CODEC 508 may digitally sample the first analog audio signal at a first time to generate the first data frame 112. The second microphone 504 may generate a second analog audio signal and provide the second analog audio signal to the CODEC 508. The CODEC 508 may digitally sample the second analog audio signal at the first time to generate the second data frame 114. The Nth microphone 506 may generate an Nth analog audio signal and provide the Nth analog audio signal to the CODEC 508. The CODEC 508 may digitally sample the Nth analog audio signal at the first time to generate the Nth data frame 116.
The data frames 112-116 are provided to another particular illustrative embodiment of the noise detector 102. For example, the noise detector 102 includes a first two microphone SSI module 520 and an (N−1)th two microphone SSI module 522. Each two microphone SSI module 520, 522 may correspond to the SSI module 202 of
The noise detector 102 may also include a combinational circuit 530. In a particular embodiment, the combinational circuit 530 may be a logic gate or a series of logic gates configured to receive input signals from each two microphone SSI module 520, 522 and from each SC-SD module 524-528. In response to the input signals, the combinational circuit 530 may generate an activation signal 122. For example, when the input signals indicate that each of the data frames 112-116 is a single source data frame and that each of the data frames is classified as a noise data frame, the combinational circuit 530 may generate a logical high value (e.g., enabling the power ratio calculator 104 of
While several embodiments of the noise detector 102 have been illustrated, other embodiments are possible. For example, in another particular embodiment, the noise detector 102 may include a three microphone SSI module configured to receive three data frames generated from analog audio from three microphones. In another particular embodiment, a combinational circuit may selectively activate each SC-SD module 524-528 based on an output of each two microphone SSI module 520, 522. For example, in response to a determination by the first two microphone SSI module 520 that the first and the second data frames 112, 114 are single source data frames, the combinational circuit may activate the first and second SC-SD modules 524, 526. Additionally, in response to a determination by the (N−1)th two microphone SSI module 522 that the Nth data frame 116 is a multiple source data frame, the combinational circuit may deactivate the Nth SC-SD module 528. Thus, the Nth data frame 116 may be omitted from subsequent gain matching calculations while gain matching calculations with respect to the first and second data frames 112, 114 proceed.
Referring to
The first frame power calculator module 602 is configured to receive the first data frame 112 and to calculate a first frame power of the first data frame 112. A first power signal representative of the first frame power is provided to the first ratio calculator module 612 and to the (N−1)th ratio calculator module 614. The second frame power calculator module 604 is configured to receive the second data frame 114 and to calculate a second frame power of the second data frame 114. A second power signal representative of the second frame power is provided to the first ratio calculator module 612. The Nth frame power calculator module 606 is configured to receive the Nth data frame 116 and to calculate an Nth frame power of the Nth data frame 116. An Nth power signal representative of the Nth frame power is provided to the (N−1)th ratio calculator module 614. In a particular embodiment, the ratio calculator modules 612, 614 may be selectively activated in response to a first activation signal and a second activation signal.
The first ratio calculator module 612 may calculate a first ratio 632 of the first frame power and the second frame power (e.g., calculate a power ratio for the second microphone 504 based on the first microphone 502 (e.g., the reference microphone)). The first ratio 632 may be provided to the histogram based estimator 106 as described with respect to
Referring to
The first histogram maintenance module 702 is configured to receive the first ratio 632 (or the first modified ratio 632′). The first histogram maintenance module 702 is configured to maintain a histogram of power ratios associated with other data frames received from the first microphone 502 and the second microphone 504 at other particular times. In response to receiving the first ratio 632, the first histogram maintenance module 702 adds the first ratio to the power ratios in the maintained histogram.
For example, referring to
Referring back to
The (N−1)th histogram maintenance module 704 is configured to receive the (N−1)th ratio 634 (or the (N−1)th modified ratio 634′). The (N−1)th histogram maintenance module 704 is configured to maintain a histogram of power ratios associated with other data frames received from the first microphone 502 and the Nth microphone 506 at other particular times. In response to receiving the (N−1)th ratio 634, the (N−1)th histogram maintenance module 704 adds the (N−1)th ratio to the power ratios in the maintained histogram. The (N−1)th histogram maintenance module 704 is configured to determine an (N−1)th gain calibration value 744 based on a power ratio that appears most frequently in the histogram corresponding to the (N−1)th ratio 634. The (N−1)th gain calibration value 744 may correspond to the gain calibration value 142 of
Each histogram maintenance module 702, 704 may be a short-term histogram maintenance module or a long-term histogram maintenance module. Long-term histogram maintenance modules may store power ratios over a first particular time period, and short-term histogram modules may store power ratios over a second particular time period. In a particular embodiment, the second particular time period is included in the first particular time period; however, the second particular time period is shorter than the first particular time period.
For example, long-term histogram maintenance modules may store each power ratio calculated by a corresponding ratio calculator module, and short-term histogram maintenance modules may store only power ratios calculated within a recent time period (e.g., store power ratios calculated within the last three seconds). In a particular embodiment, long-term histogram maintenance modules may store every power ratio calculated by a processor. With reference to
In a particular embodiment, the first gain calibration value 742 and the (N−1)th gain calibration value 744 may be provided to the first time-domain smoothing module 712 and the (N−1)th time-domain smoothing module 714, respectively. The time-domain smoothing modules 712, 714 may smooth the gain calibration values 742, 744 to generate modified calibration values 742′, 744′. The modified calibration values 742′, 744′ may be provided to gain adjustment circuits associated with the second and Nth microphones 504, 506, respectively.
Referring to
The histogram maintenance modules 802-808 may operate in substantially similar manner as the histogram maintenance modules 702, 704 of
For example, the short-term histogram maintenance modules 804, 808 may be responsive to the timer 810 in such a manner as to maintain power ratio histograms only for a particular time period. For example, the timer 810 may generate a timing signal 812 indicating a relatively short time period (e.g., three seconds). The short-term histogram maintenance modules 804, 808 may maintain power ratio information in the corresponding short-term histograms for the relatively short time (e.g., for up to three seconds prior to the present time). The short-term histogram maintenance modules 804, 808 may generate gain calibration values 842, 844, respectively, based on a power ratio that appears most frequently within the corresponding short-term histograms.
The long-term histogram maintenance modules 802, 806 may maintain the corresponding long-term histograms for a longer period of time. For example, the long-term histograms may be maintained perpetually or from startup to shutdown of a device for which gain matching is being performed.
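One possible way to realize a short-term histogram is a sliding time window, sketched below in Python. The class name `ShortTermHistogram`, the use of explicit timestamps in place of the timing signal 812, and the 0.5 dB bin width are hypothetical choices made for illustration. A long-term histogram may be obtained from the same sketch by never expiring entries.

```python
from collections import Counter, deque


class ShortTermHistogram:
    """Histogram of power ratios (dB) limited to a recent time window."""

    def __init__(self, window_s=3.0, bin_width_db=0.5):
        self.window_s = window_s          # e.g., the last three seconds
        self.bin_width_db = bin_width_db  # assumed bin resolution
        self.entries = deque()            # (timestamp, bin index), oldest first
        self.counts = Counter()

    def add(self, timestamp_s, ratio_db):
        bin_index = round(ratio_db / self.bin_width_db)
        self.entries.append((timestamp_s, bin_index))
        self.counts[bin_index] += 1
        self._expire(timestamp_s)

    def _expire(self, now_s):
        # Remove entries that have aged out of the window.
        while self.entries and now_s - self.entries[0][0] > self.window_s:
            _, old_bin = self.entries.popleft()
            self.counts[old_bin] -= 1
            if self.counts[old_bin] == 0:
                del self.counts[old_bin]

    def calibration_db(self):
        # Most frequent bin within the window, or None if the window is empty.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0] * self.bin_width_db
```

In this sketch, ratios older than the window no longer influence the peak, so the short-term estimate may track recent conditions while a long-term histogram retains the full history.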
The gain calibration values 841, 843 (e.g., calibration estimates) associated with the long-term histogram maintenance modules 802, 806 may be expressed as gL. The gain calibration values 842, 844 (e.g., calibration estimates) associated with the short-term histogram maintenance modules 804, 808 may be expressed as gS. The first combinational circuit 852 may determine whether to use a first short-term calibration estimate gS of the first short-term histogram maintenance module 804 or a first long-term calibration estimate gL for gain matching. In a particular embodiment, the first short-term calibration estimate gS may be used if it is considered to be reliable. For example, the first combinational circuit 852 may compare an absolute value of a difference between the first short-term calibration estimate gS and the first long-term calibration estimate gL (e.g., |gL−gS|) to a threshold β. If the absolute value is less than the threshold β, the first short-term calibration estimate gS may be considered to be reliable, and the first combinational circuit 852 may provide the first short-term calibration estimate 842 (gS) to a gain calibration circuit associated with the second microphone 504. Otherwise, the first combinational circuit 852 may provide the first long-term calibration estimate 841 (gL) to the gain calibration circuit associated with the second microphone 504. The pseudo code for the first combinational circuit 852 may be represented as:
if (|gL−gS|<β)
where α is a smoothing parameter less than one, c_t is the output calibration for the second microphone 504 (e.g., the target microphone) at a present time (t), and c_{t−1} is the output calibration for the second microphone 504 at a previous time instant (t−1).
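As an illustration of the selection performed by the first combinational circuit 852, the following Python sketch chooses between the two estimates using the threshold β and then smooths the output with α. The selection rule follows the description above. The smoothing step is an assumption: the text names α and the present and previous output calibrations, and the sketch assumes a first-order recursive update that blends the selected estimate with the previous output. The function names are hypothetical.

```python
def select_estimate(g_long, g_short, beta):
    """Return the short-term estimate when it lies within beta of the long-term one."""
    if abs(g_long - g_short) < beta:
        return g_short  # short-term estimate considered reliable
    return g_long       # otherwise fall back to the long-term estimate


def smooth_calibration(c_prev, g_selected, alpha):
    """Assumed update form: c_t = alpha * c_{t-1} + (1 - alpha) * g_selected."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    return alpha * c_prev + (1.0 - alpha) * g_selected
```

Under these assumptions, a call such as `smooth_calibration(c_prev, select_estimate(g_long, g_short, beta), alpha)` would produce the output calibration for the present time; a value of α closer to one makes the output change more slowly.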
The second combinational circuit 854 may operate in a manner substantially similar to the first combinational circuit 852 with respect to signals received from the Nth long-term histogram maintenance module 806 and the Nth short-term histogram maintenance module 808. For example, the second combinational circuit 854 may compare an absolute value of a difference between a second short-term calibration estimate gS from the Nth short-term histogram maintenance module 808 and a second long-term calibration estimate gL from the Nth long-term histogram maintenance module 806 (e.g., |gL−gS|) to the threshold β. If the absolute value is less than the threshold β, the second combinational circuit 854 may provide the second short-term calibration estimate 844 (gS) to a gain calibration circuit associated with the Nth microphone 504. Otherwise, the second combinational circuit 854 may provide the second long-term calibration estimate 843 (gL) to the gain calibration circuit associated with the Nth microphone 502.
Referring to
The method 1000 includes receiving a first data frame at a first time from a first microphone, at 1002. For example, in
The method 1000 may also include determining whether the first data frame and the second data frame are single source data frames, at 1006. For example, in
The method 1000 may also include determining whether the first data frame and the second data frame are speech data frames, at 1008. For example, in
A power ratio of the first microphone and the second microphone may be calculated based on the first data frame and the second data frame in response to determining that the first data frame and the second data frame are noise data frames, at 1010. For example, in
In a particular embodiment, the method 1000 may include determining a gain calibration value based on the power ratio. For example, the first ratio 832 generated by the first ratio calculator module 812 may be provided to a gain calibration circuit associated with the second microphone (e.g., the second microphone 504 of
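The flow of the method 1000 (check for a single source, check for speech, then calculate a power ratio only for noise data frames) can be sketched in Python as follows. This is a sketch under stated assumptions rather than the disclosed implementation: the detectors are passed in as caller-supplied predicates because their internals are not specified here, frame power is taken to be the mean squared sample value, the ratio is oriented as first (reference) microphone over second (target) microphone, and frames are assumed to be non-empty. All function and parameter names are hypothetical.

```python
def frame_power(frame):
    """Mean squared amplitude of one non-empty frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def noise_power_ratio(ref_frame, tgt_frame, is_single_source, is_speech):
    """Return the reference/target power ratio, or None if the pair is skipped.

    is_single_source and is_speech are hypothetical stand-ins for the
    detectors; each takes one frame and returns a bool.
    """
    frames = (ref_frame, tgt_frame)
    # Skip frame pairs that are not single source data frames.
    if not all(is_single_source(f) for f in frames):
        return None
    # Skip frame pairs in which either frame is a speech data frame.
    if any(is_speech(f) for f in frames):
        return None
    tgt_power = frame_power(tgt_frame)
    if tgt_power == 0.0:
        return None  # avoid dividing by zero on a silent target frame
    return frame_power(ref_frame) / tgt_power
```

A ratio returned by this function would then be added to a histogram of power ratios, and the most frequent ratio would serve as the basis for the gain calibration value.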
Referring to
The memory 1132 may include histogram data 1154 and gain matching data 1152. In a particular embodiment, the histogram data 1154 may correspond to the histogram of power ratios illustrated in
The memory 1132 may be a tangible non-transitory processor-readable storage medium that includes the instructions 1158. The instructions 1156 may be executed by a processor, such as the processor 1110 or the components thereof, to perform the method 1000 of
In conjunction with the described embodiments, an apparatus is disclosed that includes means for receiving a first data frame at a first time from a first microphone. For example, the means for receiving the first data frame may include the noise detector 102 of
The apparatus may also include means for receiving a second data frame at the first time from a second microphone. For example, the means for receiving the second data frame may include the noise detector 102 of
The apparatus may also include means for calculating a power ratio of the first microphone and the second microphone based on the first data frame and the second data frame. For example, the means for calculating the power ratio may include the system 100 of
Those of skill in the art would further appreciate that the various illustrative logical blocks, configurations, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software executed by a processor, or combinations of both. Various illustrative components, blocks, configurations, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or processor executable instructions depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disk, a removable disk, a compact disc read-only memory (CD-ROM), or any other form of non-transient storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). The ASIC may reside in a computing device or a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a computing device or user terminal.
The previous description of the disclosed embodiments is provided to enable a person skilled in the art to make or use the disclosed embodiments. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the principles defined herein may be applied to other embodiments without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope possible consistent with the principles and novel features as defined by the following claims.
The present application claims priority from U.S. Provisional Patent Application No. 61/824,222, filed May 16, 2013, entitled “AUTOMATED GAIN MATCHING FOR MULTIPLE MICROPHONES,” the contents of which are incorporated by reference in their entirety.
Number | Date | Country |
---|---|---|
20140341380 A1 | Nov 2014 | US |

Number | Date | Country |
---|---|---|
61824222 | May 2013 | US |