The present application is related to U.S. patent application Ser. No. 12/004,899 filed Dec. 21, 2007 and entitled “System and Method for 2-Channel and 3-Channel Acoustic Echo Cancellation,” and U.S. patent application Ser. No. 12/004,896 filed Dec. 21, 2007 and entitled “System and Method for Blind Subband Acoustic Echo Cancellation Postfiltering,” both of which are herein incorporated by reference.
The present application is also related to U.S. patent application Ser. No. 11/825,563 filed Jul. 6, 2007 and entitled “System and Method for Adaptive Intelligent Noise Suppression,” U.S. patent application Ser. No. 11/343,524, filed Jan. 30, 2006 and entitled “System and Method for Utilizing Inter-Microphone Level Differences for Speech Enhancement,” and U.S. patent application Ser. No. 11/699,732 filed Jan. 29, 2007 and entitled “System And Method For Utilizing Omni-Directional Microphones For Speech Enhancement,” all of which are also herein incorporated by reference.
1. Field of Invention
The present invention relates generally to audio processing and, more particularly, to envelope-based acoustic echo cancellation in an audio system.
2. Description of Related Art
Conventionally, when audio from a far-end environment is presented through a speaker of a near-end communication device, sounds from a far-end audio signal may be picked up by microphones or other audio sensors of the near-end communication device. As such, the sounds from the far-end audio signal may be sent back to the far-end environment resulting in an echo to a far-end listener.
Conventionally, acoustic echo cancellation (AEC) systems may take the far-end audio signal and use it to predict the echo of the far-end audio signal (after being played through the speaker and picked up by the microphone). Typically, a transfer function that describes a path from the far-end audio signal, through the speaker, through an acoustic environment, and back to the microphone is linearly modeled to predict the echo. These AEC systems operate in the waveform domain, whereby the echo is predicted, inverted, delayed, and subtracted from the near-end audio signal.
Disadvantageously, there are many problems with these conventional AEC systems. First, the transfer function (i.e., relationship between the far-end audio signal and the echo) is typically constantly changing, since the acoustic environment is rarely fixed. In the case of a handheld communication device (e.g., a cellular phone), there may also be reflections off a face of a user. The prior art AEC systems are adaptive and continually update the transfer function. However, errors usually occur in the echo prediction due to the changing environment. If the echo prediction is even slightly incorrect, or an applied delay is incorrect, residual echo will remain.
A second disadvantage is that these prior art AEC systems typically use a linear model (i.e., linear filter) to predict the echo. However, the transfer function is often not linear (e.g., if there are non-linearities in the speaker which may cause distortion). As a result, poor echo prediction may occur.
Other AEC systems may attempt to overcome this disadvantage by introducing non-linearity to the echo prediction model. However, this results in more complexity. For example, non-linearity may become problematic in cellular phones or speakerphones with cheap components. The prediction may be difficult to obtain unless an exact model of the speaker (e.g., of the cellular phone or speakerphone) is known.
Embodiments of the present invention overcome or substantially alleviate prior problems associated with acoustic echo cancellation processing. Exemplary embodiments of the present invention utilize magnitude envelopes of acoustic signals to determine an echo envelope. An echo gain mask may then be generated based on the echo envelope.
In exemplary embodiments, a primary acoustic signal is received via a microphone of the communication device, and a far-end signal is received via a receiver. Because a speaker may provide audio (the far-end signal) that may be picked up by the microphone, the acoustic signal received by the microphone may include speaker leakage. As such, acoustic echo cancellation (AEC) is applied to the acoustic signal to obtain an AEC masked signal.
Frequency analysis is performed on the primary acoustic signal and the far-end acoustic signal to obtain frequency sub-bands. These frequency sub-bands may then be used to generate corresponding estimated energy spectra. The energy spectra are then used to determine echo-dominated and echo-free noise estimates.
An echo gain mask based on magnitude envelopes of the energy spectra of the primary and far-end acoustic signals for each frequency sub-band is generated. A noise gain mask based on at least the primary acoustic signal for each frequency sub-band may also be generated. The echo gain mask and the noise gain mask may then be combined. In one embodiment, the combination comprises selecting a minimum between the two gain masks. The combination of the echo gain mask and noise gain mask may then be applied to the primary acoustic signal to generate a masked signal, which may be output to a far-end environment.
The present invention provides exemplary systems and methods for providing envelope-based acoustic echo cancellation (EnvAEC). Exemplary embodiments perform the EnvAEC based on frequency sub-bands and prediction envelopes of echo waveforms, as opposed to details of actual echo waveforms. The envelopes, thus, show how energy in the waveforms may change over time.
Exemplary embodiments are configured to reduce and/or minimize effects of speaker signal leakage to one or more microphones in such a way that the far-end environment does not perceive an echo. While the following description will be discussed using a two microphone system, it should be noted that embodiments of the present invention may be applied to a single microphone envelope-based acoustic echo cancellation system.
Embodiments of the present invention may be practiced on any device that is configured to receive audio such as, but not limited to, cellular phones, phone handsets, headsets, and conferencing systems. While embodiments of the present invention will be described in reference to operation on a speakerphone, the present invention may be practiced on any audio device.
Referring to
The exemplary communication device 104 comprises a microphone 106 (i.e., primary microphone), speaker 108, and an audio processing system 110 including an acoustic echo cancellation mechanism. The microphone 106 is configured to pick up audio from the acoustic source 102, but may also pick up noise from the near-end environment 100. The audio received from the acoustic source 102 will comprise a near-end microphone signal y(t), which will be sent back to a far-end environment 112.
In some embodiments, one or more additional microphones (not shown) may be present in the communication device 104. The one or more additional microphones may be located a distance away from the primary microphone 106. In some embodiments, the microphone(s) may comprise omni-directional microphones.
An acoustic signal x(t) comprising speech from the far-end environment 112 may be received via a communication network 114 by the communication device 104. The received acoustic signal x(t) may then be provided to the near-end environment 100 via the speaker 108. The audio output from the speaker 108 may leak back into (i.e., be picked up by) the microphone 106. This leakage may result in an echo perceived at the far-end environment 112.
The exemplary audio processing system 110 is configured to remove u(t) (i.e., echoes of x(t)) from y(t), while preserving a near-end voice signal v(t). In exemplary embodiments, the removal of u(t) is performed without introducing distortion to a far-end listener. This may be achieved by calculating and applying time and frequency varying multiplicative gains or masks that render the acoustic echo inaudible. Ideally, the gains are less than 1 (i.e., less than 0 dB) to result in signal attenuation. In various embodiments, the attenuation is strong when echo dominates over other components of the signal.
Referring now to
The exemplary receiver 200 is an acoustic sensor configured to receive the far-end signal x(t) from the network 114. In some embodiments, the receiver 200 may comprise an antenna device. The received far-end signal x(t) may then be forwarded to the audio processing system 110 and the output device 206.
The audio processing system 110 is also configured to receive the acoustic signals from the acoustic source 102 via the primary and optional secondary microphones 106 and 204 (e.g., primary and secondary acoustic sensors) and process the acoustic signals. The primary and secondary microphones 106 and 204 may be spaced a distance apart in order to allow for an energy level difference between them. After reception by the microphones 106 and 204, the acoustic signals may be converted into electric signals (i.e., a primary electric signal and a secondary electric signal). The electric signals may themselves be converted by an analog-to-digital converter (not shown) into digital signals for processing in accordance with some embodiments. In order to differentiate the acoustic signals, the acoustic signal received by the primary microphone 106 is herein referred to as the primary acoustic signal, while the acoustic signal received by the secondary microphone 204 is herein referred to as the secondary acoustic signal. It should be noted that embodiments of the present invention may be practiced utilizing any number of microphones. In exemplary embodiments, the acoustic signals from the secondary microphone 204 are used for total noise estimation as will be discussed further below.
The output device 206 is any device which provides an audio output to a listener (e.g., the acoustic source 102). For example, the output device 206 may comprise the speaker 108, an earpiece of a headset, or handset on the communication device 104.
In various embodiments, where the primary and secondary microphones are omni-directional microphones that are closely-spaced (e.g., 1-2 cm apart), a beamforming technique may be used to simulate a forwards-facing and a backwards-facing directional microphone response. A level difference may be obtained using the simulated forwards-facing and the backwards-facing directional microphone. The level difference may be used to discriminate speech and noise in the time-frequency domain which can be used in noise estimation.
In operation, the acoustic signals received from the primary and secondary microphones 106 and 204 and the far-end acoustic signal x(t) are converted to electric signals and processed through the frequency analysis module 302. The frequency analysis module 302 takes the acoustic signals and mimics the frequency analysis of the cochlea (i.e., cochlear domain) simulated by a filter bank. In one embodiment, the frequency analysis module 302 separates the acoustic signals into frequency bands or sub-bands. Alternatively, other filters such as short-time Fourier transform (STFT), Fast Fourier Transform, Fast Cochlea transform, sub-band filter banks, modulated complex lapped transforms, cochlear models, a gamma-tone filter bank, wavelets, or any generalized spectral analysis filter/method, can be used for the frequency analysis and synthesis.
Because most sounds (e.g., acoustic signals) are complex and comprise more than one frequency, a sub-band analysis on the acoustic signal may be performed to determine what individual frequencies are present in the acoustic signal during a frame (e.g., a predetermined period of time). According to one embodiment, the frame is 5-20 ms long (e.g., 40 to 160 samples for a system audio sampling rate of 8000 Hz). Alternative embodiments may utilize other frame lengths. Data may be pushed through the audio processing system 110 in these frames (i.e., blocks of buffered samples).
The output of the frequency analysis module 302 comprises a plurality of waveforms, one per frequency sub-band. Thus, if the acoustic signal contains energy in particular frequency bands, the waveforms for those bands are more energetic. As will be discussed further below, the envelopes of these waveforms in these frequency bands are analyzed for echo suppression. Specifically, the envelopes of the far-end acoustic signal are used to predict the echo envelopes that will be present in the near-end acoustic signal (e.g., primary acoustic signal).
Once the frequencies are determined, the signals are forwarded to the energy module 304, which computes energy/power estimates for the primary, secondary, and far-end acoustic signals during an interval of time for each frequency sub-band (i.e., power estimates). As such, an average power output for each frequency sub-band (i.e., power spectrum) may be calculated for each of the acoustic signals in frames. In exemplary embodiments, the frames comprise 5 ms time periods. Thus, buffers are filled up with 5 ms of frequency analysis module 302 output. An average power per frame may then be determined.
The exemplary energy module 304 is a component which, in some embodiments, can be represented mathematically by the following equation:
E(t,ω) = λE·|X(t,ω)|^2 + (1−λE)·E(t−1,ω)
where λE is a number between zero and one that determines an averaging time constant, X(t,ω) is the acoustic signal being processed (e.g., the primary, secondary, or far-end acoustic signal) in the cochlea domain, ω represents the frequency, and t represents time. Given a desired time constant T (e.g., 4 ms) and sampling frequency fs (e.g., 16 kHz), the value of λE can be approximated as

λE ≈ 1 − e^(−1/(T·fs)).

As provided, the energy level for the acoustic signal, E(t,ω), is dependent upon a previous energy level of the acoustic signal, E(t−1,ω).
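For illustration, the per-band energy update and the time-constant approximation above can be sketched in Python (a minimal sketch; the function names and the exponential form of the λE approximation are assumptions, not part of the specification):

```python
import math

def lambda_e(T, fs):
    # Assumed leaky-integrator coefficient for a desired time constant
    # T (seconds) at sampling rate fs (Hz), e.g., T = 4 ms, fs = 16 kHz.
    return 1.0 - math.exp(-1.0 / (T * fs))

def update_energy(prev_energy, X, lam):
    # E(t, w) = lam * |X(t, w)|^2 + (1 - lam) * E(t - 1, w)
    return lam * abs(X) ** 2 + (1.0 - lam) * prev_energy
```

With a constant input, the estimate converges to the input power; larger λE values track changes faster but average over fewer frames.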
The exemplary noise estimate module 306 is configured to determine a noise estimate based on the primary acoustic signal. In some embodiments, this noise estimate module 306 may produce a stationary noise estimate based on constant ambient noise in the near-end environment 100. This stationary noise estimate may be later augmented by non-stationary noise components as a function of both the primary microphone 106 and the optional secondary microphone 204.
In one embodiment, the noise estimate module 306 comprises a minimum statistics tracker (MST) which receives the energy of the primary acoustic signal from the signal path for processing. The determination of the noise estimate, according to one embodiment, is discussed in more detail in connection with U.S. patent application Ser. No. 12/072,931 entitled “System and Method for Providing Single Microphone Noise Suppression Fallback,” which is incorporated by reference. The noise estimate is then provided to the echo mask generator 308 and the total noise estimate module 310.
The exemplary echo mask generator 308 is configured to generate an echo gain mask that will render echo inaudible. The echo gain mask is generated based on predicted envelopes (of the echo waveforms) based on the far-end acoustic signal. By analyzing the near-end signal (e.g., primary acoustic signal) and the predicted envelopes, a determination may be made as to where and when echo may be audible in frequency sub-bands. Echo suppression may then be applied to these frequency sub-bands. The output of the echo mask generator 308 comprises a gain value per frequency sub-band and frame. The echo mask generator 308 will be discussed in more detail in connection with
The result of the echo mask generator 308 along with the noise estimate and the optional secondary acoustic signal energy are forwarded to the total noise estimate module 310. In some embodiments, the total noise estimate module 310 may comprise an echo-free noise estimate module. The total noise estimate module 310 is configured to compute an estimate of the near-end noise power spectra (e.g., time and frequency dependent portion of the acoustic signal that is not from the acoustic source 102). In exemplary embodiments, the total noise estimate module 310 refines the noise estimate received from the noise estimate module 306, which may be corrupted by echo power.
The results from the total noise estimate module 310 may then be used by the noise mask generator 312 to determine a noise suppression gain mask. Various embodiments of the exemplary total noise estimate module 310 and the exemplary noise mask generator 312 are further discussed in U.S. patent application Ser. No. 12/004,899 filed Dec. 21, 2007 and entitled “System and Method for 2-Channel and 3-Channel Acoustic Echo Cancellation,” and U.S. patent application Ser. No. 12/004,896 filed Dec. 21, 2007 and entitled “System and Method for Blind Subband Acoustic Echo Cancellation Postfiltering” which are both hereby incorporated by reference.
In some embodiments, the echo gain mask may be refined in each sub-band to reduce near-end speech distortion. In exemplary embodiments, the echo and noise mask integration module 314 takes into account the near-end noise level (from the echo-free noise estimate determined by the total noise estimate module 310) and the noise suppression gain mask (from the noise mask generator 312). In various embodiments, the echo gain mask may be limited such that a total output power is not more than a certain amount (e.g., 6 dB) below the output noise power that will be produced by applying the noise suppression gain mask to the near-end noise. This process may reduce the perception of output noise modulation correlated with the echo, while still ensuring the echo remains inaudible.
The echo gain mask is combined with the noise gain mask by the echo and noise mask integration module 314. In one embodiment, the echo and noise mask integration module 314 may select a minimum between the two gain masks (i.e., noise gain mask and echo gain mask) for each sub-band, with a lower limit on how far below the noise gain mask the combined mask can be. A possible mathematical description of this combination operation is:
gtot = max( min(gN, gE), γ·gN·√(Pn/Py) ),

where gtot is a final gain mask, min(x,y) is a minimum of x and y, max(x,y) is a maximum of x and y, gN is the noise gain mask, gE is the echo gain mask, Py is the total power in the frame, Pn is the estimated noise power in the frame, and γ is the maximum tolerated modulation on the output noise. For example, if the amount of tolerated modulation is −6 dB (i.e., 6 dB down), γ will equal 10^(−6/20), or around 0.5.
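A sketch of this combination follows, assuming the floor term keeps the masked output within γ of the noise-masked noise level (the function name and the exact floor expression are reconstructions consistent with the variables defined above, not verbatim from the specification):

```python
import math

def combine_masks(g_n, g_e, p_y, p_n, gamma=10 ** (-6 / 20)):
    # Take the lower of the noise and echo gain masks, but do not let the
    # combined gain drop the output more than gamma below the level produced
    # by applying the noise mask to the noise alone (assumed floor form).
    floor = gamma * g_n * math.sqrt(p_n / p_y)
    return max(min(g_n, g_e), floor)
```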
The noise gain mask is produced from the total noise estimate and may be defined such that the noise power is reduced at the system output but not rendered completely inaudible. As such, the echo and noise mask integration module 314 may be used to negotiate the different objectives of the noise suppression and echo cancellation masks. Furthermore, it should be noted that the final gain mask may be defined such that no noise reduction, or at least no multiplicative noise masking, is performed in the system. This is equivalent to taking the noise mask generator 312 out of the audio processing system 110. The result is a final gain mask that may be used for modification of the primary acoustic signal from the primary microphone 106. Accordingly, in exemplary embodiments, gain masks may be applied to an associated frequency band of the primary acoustic signal in the modifier/reconstructor module 316.
Next, the post-AEC and noise suppressed frequency bands are converted back into time domain from the cochlea domain. The conversion may comprise taking the post-AEC and noise suppressed frequency bands and adding together phase shifted signals of the cochlea channels in the modifier/reconstructor module 316. In one embodiment, the reconstruction portion of the modifier/reconstructor module 316 comprises a frequency synthesis module. Once conversion is completed, the synthesized, masked acoustic signal may be output (e.g., forwarded to the communication network 114 and sent to the far-end environment 112).
In the embodiment of
It should be noted that the system architecture of the audio processing system 110 of
Referring now to
The inputs are then subjected to a square-root (sqrt) operation in the sqrt module 402. The sqrt operation transforms the power spectrum into magnitudes. For a given frequency band, a sequence of such magnitudes over time comprises an envelope, or one amplitude per frame.
In exemplary embodiments, a prediction of an echo in a near-end envelope is used to determine whether the near-end envelope is dominated by echo to a point that the echo is audible. The prediction may be based on the far-end signal (and thus the far-end envelope). For example, if there is no energy in the far-end envelope, there is likely no echo and thus no echo suppression. However, if there is energy both in the far-end and the near-end envelopes, an echo within the near-end envelopes is more likely. In these examples, echo suppression will be applied. The echo mask generator 308 comprises logic which analyzes the prediction versus an observation in order to determine where and when to apply echo suppression.
In some embodiments, the amount of echo suppression may be limited by perceptual effects. For example, if a loud sound is followed by a soft sound in quick succession, forward masking may occur whereby the soft sound is not really perceived. In this situation, suppression may not be necessary.
The determination of whether the near-end envelope is dominated by echo may occur in a mask generator 404. If the echo is audible, the mask generator 404 will have a low gain value (e.g., 0). Otherwise, the mask generator 404 may have an “all pass” gain value (e.g., 1). A mathematical representation of this determination is
m = H[(y^2 − û^2) > τ1·û^2], (1)

where m is the mask generated by the mask generator 404, y is the near-end envelope, û is the echo prediction, τ1 is a dominance threshold for the ratio of non-echo power to echo power, and H is a Heaviside step function (which is 1 if the bracketed condition is true and 0 otherwise). A reasonable value for τ1, for example, is 1 (0 dB). The lower τ1 is, the higher the echo power has to be relative to the total power before it is attenuated.
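The dominance decision can be sketched as follows (illustrative; the non-echo power estimate y^2 − û^2 and the function name are assumptions consistent with Eq. (1) above):

```python
def echo_dominance_mask(y, u_hat, tau1=1.0):
    # Return 1.0 (all-pass) when estimated non-echo power exceeds tau1 times
    # the predicted echo power; return 0.0 (suppress) when echo dominates.
    non_echo_power = max(y ** 2 - u_hat ** 2, 0.0)
    echo_power = u_hat ** 2
    if echo_power == 0.0:
        return 1.0  # no predicted echo, so nothing to suppress
    return 1.0 if non_echo_power > tau1 * echo_power else 0.0
```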
An echo predictor 406 is configured to use the far-end envelope (per frequency band) to predict its echo in the near-end envelope. In one embodiment, the echo predictor 406 may be simplified by assuming that the echo envelope is just a delayed and scaled version of the far-end envelope. Mathematically, this assumption may be represented by
û(n)=g·x(n−d), (2)
where g is a scale factor, n is a frame index, and d is a number of frames delayed. In some embodiments, this representation may provide an accurate estimate due to low temporal resolution of the envelope (which is bolstered by smoothing). This allows the prediction to be robust to changes in the delay and echo tail length. Often, the number of frames delayed d can be measured and stored as a priori information. In other embodiments, d may be estimated during operation (e.g., by finding a peak of a cross-correlation between the near-end and far-end signals).
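Eq. (2) and the cross-correlation delay estimate mentioned above can be sketched as follows (hypothetical helper names; envelopes are plain Python lists of per-frame magnitudes):

```python
def predict_echo(far_env, g, d, n):
    # u_hat(n) = g * x(n - d), Eq. (2): the echo envelope is modeled as a
    # delayed and scaled copy of the far-end envelope in this sub-band.
    if n - d < 0:
        return 0.0
    return g * far_env[n - d]

def estimate_delay(near_env, far_env, max_lag):
    # Assumed helper: choose the lag that maximizes the cross-correlation
    # between the near-end and far-end envelopes.
    best_lag, best_corr = 0, float("-inf")
    for lag in range(max_lag + 1):
        corr = sum(near_env[i] * far_env[i - lag]
                   for i in range(lag, len(near_env)))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag
```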
It may also be possible to extend the prediction to a multiple reflection model, such as
û(n) = Σi gi·x(n−di), (3)

where the echo is predicted as a summation of delayed and scaled copies of the far-end envelope, with delays di and scale factors gi. However, in most embodiments, such an elaborate model may not be necessary.
Smoothers 408 and 410 may smooth the magnitudes, and thus the envelopes, of the far-end and near-end signals. The temporal smoothing removes “wiggles” in the envelope that may not be important for echo suppression. These “wiggles”, if not removed, may result in glitches in the echo prediction. As a result, low temporal resolution can be further enforced by applying the temporal smoothing to the envelopes. An example of a simple form of smoothing is a leaky integrator, mathematically represented by,
x(n) = α·x(n−1) + (1−α)·X(n), (4)

where α is a smoothing factor and X is the unsmoothed envelope. In this embodiment, more smoothing is performed at higher α. As a result, the system may be less susceptible to delay and echo tail changes.
Smoothing may also be beneficial for filtering out hard-to-predict non-linear interactions (e.g., “cross-terms”) between the echo and other near-end signal components that are inconsequential to determining echo dominance. However, if too much smoothing is performed, the system may not react quickly enough to echo dynamics. An example of a reasonable value for α is 0.85 for a frame rate of 200 Hz, which corresponds to a time constant of around −5 ms/ln(0.85) ≈ 31 ms. For this value, the system is able to accurately predict the echo envelope, even if the echo tail is up to 600 ms long.
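The leaky integrator of Eq. (4) and the quoted time constant can be sketched as follows (illustrative names; the time-constant formula assumes one update per frame):

```python
import math

def smooth(prev, current, alpha=0.85):
    # Leaky integrator, Eq. (4): higher alpha yields heavier smoothing.
    return alpha * prev + (1.0 - alpha) * current

def time_constant_ms(alpha, frame_rate_hz):
    # Effective time constant of the integrator in milliseconds;
    # about 31 ms for alpha = 0.85 at a 200 Hz frame rate.
    return -1000.0 / (frame_rate_hz * math.log(alpha))
```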
Exemplary embodiments are robust to changes in echo delay and tail length. As such, the accuracy of the echo prediction will be largely determined by the accuracy of the gain g. In most embodiments, this gain g may not be known a priori with enough precision, and so it must be estimated during operation. Further complicating matters, g can significantly change at any time due to echo path or speaker volume changes.
Thus, embodiments of the present invention adaptively estimate g using a gain updater 412. In some embodiments, g is updated quickly when echo is dominating the near-end signal, and updated slowly or not at all otherwise. In exemplary embodiments, an adaptation control module 414 provides the information necessary to determine how quickly to update g. In some embodiments, the information may come in a form of a frame- and frequency-varying parameter μ. The larger μ is, the faster g is updated.
Given μ, an efficient way of updating g is given by a normalized least-mean-square (NLMS) algorithm, such as,
g(n+1) = g(n) + μ·xd·(y − g(n)·xd)/(xd^2 + Δ), (5)

where xd = x(n−d); the subscript d denotes that the expression is valid for an a priori known delay d.
If a vector of delays is being considered in conjunction with the prediction model (3), there may be well known and simple extensions to Eq. (5) that apply. Additionally, Δ is a regularization parameter that slows the update of g when the far-end signal is small. Eq. (5) may be re-written, using (2), as
g(n+1) = (1 − μ̃)·g(n) + μ̃·(y/xd), (6)

where μ̃ = μ·xd^2/(xd^2 + Δ) is between 0 and 1. Eq. (6) gives an intuitive explanation that the gain estimate is an adaptively smoothed version of a ratio of near-end to far-end envelopes. The output of the gain updater 412 may be fed back to the echo predictor 406 in the next frame. It should be noted that if any upper or lower limits on g are known a priori, these limits may be applied before passing the result to the echo predictor 406.
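The NLMS update of Eq. (5), with the optional a priori limits on g mentioned above, might be sketched as follows (parameter defaults and the clipping interface are assumptions):

```python
def update_gain(g, y, x_d, mu, delta=1e-6, g_min=None, g_max=None):
    # NLMS-style update, Eq. (5): nudge g toward y / x_d at a rate set by mu,
    # with delta regularizing the step when the far-end envelope is small.
    g_new = g + mu * x_d * (y - g * x_d) / (x_d ** 2 + delta)
    if g_min is not None:
        g_new = max(g_new, g_min)  # apply a priori lower limit, if known
    if g_max is not None:
        g_new = min(g_new, g_max)  # apply a priori upper limit, if known
    return g_new
```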
The exemplary adaptation control module 414 is configured to notify the gain updater 412 when and how fast to safely adapt the gain g via the parameter μ (which is between 0 and 1 with 1 being the fastest). In exemplary embodiments, a determination of when the echo sufficiently dominates the near-end signal (e.g., when the ratio y/x in Eq. (6) is a meaningful observation of g) is performed. The output of the adaptation control module 414 encapsulates an adaptation speed.
In accordance with some embodiments, multiple cues may be utilized to help strike a balance between updating too quickly (e.g., the system reacts erroneously to near-end sources) and updating too slowly (e.g., the system doesn't react quickly enough to echo path changes). The cues may comprise, for example, adaptive far-end noise floor, near-end noise floor, and maximum gain. While these examples, as will be discussed below, provide outputs of either 0 or 1, alternative cues can produce values between 0 and 1.
The adaptive far-end floor cue is an adaptive version of the Δ in Eq. (5) (i.e., an adaptive threshold on the far-end energy). As a result, only the most powerful times and frequencies of x enable fast adaptation, since the echo is most likely to dominate where x is at its most powerful. One method of ensuring this result is performing a long-term average/smoothing of past values of x, and then determining whether or not the current value of x is far enough above the long-term average xLT.
Besides smoothing (e.g., Eq. (4)), an alternative method to produce xLT is to convert x to decibel units and apply a linear slew rate. This may be mathematically represented by,
xLT(n) = xLT(n−1) + γup, if dB[x(n)] > xLT(n−1); otherwise xLT(n) = xLT(n−1) − γdown, (7)

where dB[x] denotes a conversion of x to dB units. This method may result in smoother tracking behavior. Furthermore, independent control of the upward and downward slew rates via γup and γdown allows a user to control at what percentile in a range of x the long-term average settles (e.g., it is higher for higher ratios of γup to γdown).
The output of this adaptive far-end floor cue can be represented as
Q1=H[dB[x]>xLT+τ2], (8)
where higher values of τ2 are more selective to higher values of dB[x].
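The slew-rate tracker of Eq. (7) and the resulting cue of Eq. (8) can be sketched as follows (illustrative; the dB helper, step sizes, and the τ2 default are assumptions):

```python
import math

def db(x, eps=1e-12):
    # Convert a magnitude to decibels, flooring tiny values for safety.
    return 20.0 * math.log10(max(x, eps))

def update_long_term(x_lt, x, gamma_up=0.5, gamma_down=0.1):
    # Linear slew-rate tracking of the far-end level in dB, per Eq. (7).
    if db(x) > x_lt:
        return x_lt + gamma_up
    return x_lt - gamma_down

def far_end_floor_cue(x, x_lt, tau2=10.0):
    # Q1 = H[dB(x) > x_LT + tau2], Eq. (8).
    return 1.0 if db(x) > x_lt + tau2 else 0.0
```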
The near-end noise floor cue utilizes a noise power spectrum input, N, such that adaptation is prevented if the near-end envelope is not high enough above the noise envelope. In these embodiments, echo cannot dominate the signal if the near-end envelope is not high enough above the noise envelope. Therefore, mathematically,
Q2=H[y>N·τ3], (9)
where higher values of τ3 require higher minimum dominance over the ambient noise.
In exemplary embodiments, the maximum gain cue may be useful when an upper limit gmax is expected on the ratio y/x (the gain observation) while y is dominated by echo; that maximum is often exceeded when y is dominated by near-end sources. In such an embodiment, a condition may be mathematically set such that
Q3=H[y<gmax·x]. (10)
Alternative cues may be devised to increase robustness of the system. The final output of the adaptation control module 414 may be obtained by combining all of the cues and applying an absolute maximum on the update speed, according to exemplary embodiments. This may be mathematically represented by,
μ=Q1·Q2·Q3·μmax, (11)
In exemplary embodiments, the combination of all the cues being true (e.g., value=1) will result in adaptation. In contrast, if one or more of the cues is false (e.g., value=0), adaptation will not occur.
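Combining the three cues per Eq. (11) might look like the following sketch (all threshold defaults are purely illustrative assumptions):

```python
import math

def db(x, eps=1e-12):
    # Magnitude to decibels.
    return 20.0 * math.log10(max(x, eps))

def adaptation_speed(y, x, noise, x_lt,
                     tau2=10.0, tau3=2.0, g_max=4.0, mu_max=0.5):
    # mu = Q1 * Q2 * Q3 * mu_max, Eq. (11): adapt only when all cues agree.
    q1 = 1.0 if db(x) > x_lt + tau2 else 0.0  # adaptive far-end floor, Eq. (8)
    q2 = 1.0 if y > noise * tau3 else 0.0     # near-end noise floor, Eq. (9)
    q3 = 1.0 if y < g_max * x else 0.0        # maximum gain, Eq. (10)
    return q1 * q2 * q3 * mu_max
```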
An alternative embodiment of the audio processing system is shown in
Referring now to
The acoustic signals are then converted to electric signals and processed through the frequency analysis module 302 to obtain frequency sub-bands in step 604. In one embodiment, the frequency analysis module 302 takes the acoustic signals and mimics the frequency analysis of a cochlea (i.e., cochlear domain) simulated by a filter bank. The result comprises frequency sub-bands.
In step 606, energy estimates for the acoustic signals are computed. In one embodiment, the energy estimates are determined by the energy module 304. The exemplary energy module 304 utilizes a present acoustic signal and a previously calculated energy estimate to determine the present energy estimate for each acoustic signal at each frequency sub-band.
Subsequently, the noise estimate is determined in step 608. According to embodiments of the present invention, the noise estimate for each frequency sub-band is based on the acoustic signal received at the primary microphone 106. The noise estimate may then be provided to the echo mask generator 308 and the total noise estimate module 310.
The echo mask is then generated in step 610. Step 610 will be discussed in more detail in connection with
A noise suppression gain mask is generated in step 612. In exemplary embodiments a total noise estimate that may be echo free is determined by the total noise estimate module 310. In exemplary embodiments, the total noise estimate is determined for each frequency sub-band. In exemplary embodiments, the total noise estimate module 310 is configured to compute an estimate of the near-end noise power spectrum (e.g., time and frequency dependent portion of the acoustic signal that is not from the acoustic source 102). In some embodiments, the total noise estimate module 310 may refine the noise estimate received from the noise estimate module 306, which may be corrupted by echo power. The noise suppression gain mask may then be generated using the total noise estimate by the noise mask generator 312.
In step 614, a combined echo/noise suppression mask is generated. In exemplary embodiments, the combined echo and noise mask integration module 314 generates the combined echo/noise suppression mask. In one embodiment, the echo and noise mask integration module 314 may select a minimum between the two gain masks (i.e., noise gain mask and echo gain mask) for each sub-band, with a lower limit on how far below the noise gain mask the combined mask can be.
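The mask integration of step 614 (per-sub-band minimum of the two gain masks, with a lower limit relative to the noise gain mask) can be sketched as follows; the floor of -12 dB is an illustrative assumption, not a value from this disclosure.

```python
import numpy as np

def combine_masks(noise_gain, echo_gain, floor_db=-12.0):
    """Combined echo/noise suppression mask, per sub-band: take the
    minimum of the noise and echo gain masks, but do not let the
    result fall more than floor_db below the noise gain mask.
    floor_db is an illustrative lower limit."""
    floor = noise_gain * 10.0 ** (floor_db / 20.0)
    return np.maximum(np.minimum(noise_gain, echo_gain), floor)
```

The floor prevents the echo mask from driving any sub-band far below what the noise suppressor alone would apply, which limits audible artifacts when the echo estimate is overly aggressive.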
The result of the echo and noise mask integration module 314 is a final gain mask that may be used for modification of the primary acoustic signal from the primary microphone 106 in step 616. The modified signal may then be reconstructed and output.
Referring now to
The far-end and near-end envelopes are then smoothed in step 704. The temporal smoothing removes “wiggles” in the envelope that may not be important for echo suppression. In one embodiment, the smoothing may be performed using a leaky integrator.
In step 708, the echo envelope is predicted. The echo predictor 406 is configured to use the far-end envelope (per frequency band) to predict its echo in the near-end envelope. In one embodiment, the echo predictor 406 may be simplified by assuming that the echo envelope is just a delayed and scaled version of the far-end envelope. In some embodiments, this representation may provide an accurate estimate due to low temporal resolution of the envelope (which is bolstered by smoothing).
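The simplified echo predictor of step 708 (echo envelope as a delayed, scaled copy of the far-end envelope) can be sketched as follows; the particular delay and gain values are illustrative assumptions, not values from this disclosure.

```python
import numpy as np

def predict_echo_envelope(far_envelope, delay, gain):
    """Predict the echo envelope in one sub-band, per echo predictor
    406, as a delayed and scaled version of the far-end envelope.
    delay (in frames) and gain are illustrative parameters."""
    predicted = np.zeros_like(far_envelope)
    if delay < len(far_envelope):
        predicted[delay:] = gain * far_envelope[:len(far_envelope) - delay]
    return predicted
```

Because the smoothed envelopes have low temporal resolution, this single delay-and-scale model can approximate the full echo path response within each sub-band.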
The gain mask may be generated or updated in step 710. The gain mask may be generated by comparing the echo prediction power to the total near-end power, for example, as a ratio of the two.
Embodiments of the present invention adaptively estimate gain using the gain updater 412. In some embodiments, the gain is updated quickly when echo dominates the near-end signal, and updated slowly or not at all otherwise. In exemplary embodiments, the adaptation control module 414 provides the information necessary to determine how quickly to update the gain. In some embodiments, this information may come in the form of a frame- and frequency-varying parameter μ. The larger μ is, the faster g is updated.
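One plausible realization of the μ-controlled update for the per-sub-band echo-path gain g is a normalized-LMS-style rule on the envelopes, sketched below; this is an assumed form for illustration, not the exact update rule of gain updater 412.

```python
def update_echo_gain(g, far_envelope, near_envelope, mu):
    """One adaptation step for the per-sub-band echo-path gain g:
    nudge g so that g * far_envelope tracks the near-end envelope.
    mu is the frame- and frequency-varying step size from the
    adaptation control; larger mu adapts faster. Normalized-LMS-style
    rule, assumed for illustration."""
    error = near_envelope - g * far_envelope
    # Normalize by the far-end envelope power (with a small
    # regularizer) so the step size is scale-invariant.
    return g + mu * error * far_envelope / (far_envelope ** 2 + 1e-12)
```

With μ near zero (e.g., during near-end speech), g is effectively frozen; with μ near one during echo-dominated frames, g converges toward the true envelope gain in a few steps.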
The above-described modules may comprise instructions that are stored on storage media. The instructions can be retrieved and executed by the processor 202. Some examples of instructions include software, program code, and firmware. Some examples of storage media comprise memory devices and integrated circuits. The instructions are operational when executed by the processor 202 to direct the processor 202 to operate in accordance with embodiments of the present invention. Those skilled in the art are familiar with instructions, processor(s), and storage media.
The present invention is described above with reference to exemplary embodiments. It will be apparent to those skilled in the art that various modifications may be made and other embodiments may be used without departing from the broader scope of the present invention. For example, embodiments of the present invention may be applied to any system utilizing AEC (e.g., a system that does not perform speech enhancement). Therefore, these and other variations upon the exemplary embodiments are intended to be covered by the present invention.
Number | Date | Country
---|---|---
20090238373 A1 | Sep 2009 | US