Claims
- 1. A method for filtering noise from an audio signal, comprising the steps of:
obtaining a multi-channel recording of an audio signal; determining a psychoacoustic masking threshold for the audio signal; determining a filter for filtering noise from the audio signal using the multi-channel recording, wherein the filter is determined using the psychoacoustic masking threshold; and filtering the multi-channel recording using the filter to generate an enhanced audio signal.
- 2. The method of claim 1, further comprising the steps of determining a calibration parameter for the input channels, wherein the calibration parameter comprises a ratio of the impulse response of different channels, and wherein the calibration parameter is used to determine the filter.
- 3. The method of claim 2, wherein the calibration parameter is determined by processing a speech signal recorded in the different channels under quiet conditions.
- 4. The method of claim 2, wherein the step of determining the calibration parameter comprises processing channel noise recorded in the different channels to determine a long-term spectral covariance matrix, and determining an eigenvector of the long-term spectral covariance matrix corresponding to a desired eigenvalue.
- 5. The method of claim 2, wherein the step of determining the calibration parameter is performed using an adaptive process.
- 6. The method of claim 5, wherein the adaptive process comprises a blind adaptive process.
- 7. The method of claim 5, wherein the adaptive process comprises a non-parametric estimation process using a gradient algorithm.
- 8. The method of claim 5, wherein the adaptive process comprises a model-based estimation process using a gradient algorithm.
- 9. The method of claim 2, wherein the step of determining the calibration parameter comprises setting a default calibration parameter.
- 10. The method of claim 1, further comprising the steps of:
determining a noise spectral power matrix using the multi-channel recording; and determining the signal spectral power using the noise spectral power matrix, wherein the signal spectral power is used to determine the masking threshold, and wherein the noise spectral power matrix is used to determine the filter.
- 11. The method of claim 10, further comprising the steps of:
detecting speech activity in the audio signal; and updating the noise spectral power matrix at times when speech activity is not detected in the audio signal.
- 12. The method of claim 1, wherein the filter comprises a linear filter.
- 13. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for filtering noise from an audio signal, the method steps comprising:
obtaining a multi-channel recording of an audio signal; determining a psychoacoustic masking threshold for the audio signal; determining a filter for filtering noise from the audio signal using the multi-channel recording, wherein the filter is determined using the psychoacoustic masking threshold; and filtering the multi-channel recording using the filter to generate an enhanced audio signal.
- 14. The program storage device of claim 13, further comprising instructions for performing the steps of determining a calibration parameter for the input channels, wherein the calibration parameter comprises a ratio of the impulse response of different channels, and wherein the calibration parameter is used to determine the filter.
- 15. The program storage device of claim 14, wherein the calibration parameter is determined by processing a speech signal recorded in the different channels under quiet conditions.
- 16. The program storage device of claim 14, wherein the instructions for determining the calibration parameter comprise instructions for performing the steps of processing channel noise recorded in the different channels to determine a long-term spectral covariance matrix, and determining an eigenvector of the long-term spectral covariance matrix corresponding to a desired eigenvalue.
- 17. The program storage device of claim 14, wherein the instructions for determining the calibration parameter comprise instructions for determining the calibration parameter using an adaptive process.
- 18. The program storage device of claim 17, wherein the adaptive process comprises a blind adaptive process.
- 19. The program storage device of claim 17, wherein the adaptive process comprises a non-parametric estimation process using a gradient algorithm.
- 20. The program storage device of claim 17, wherein the adaptive process comprises a model-based estimation process using a gradient algorithm.
- 21. The program storage device of claim 14, wherein the instructions for determining the calibration parameter comprise instructions for setting a default calibration parameter.
- 22. The program storage device of claim 13, further comprising instructions for performing the steps of:
determining a noise spectral power matrix using the multi-channel recording; and determining the signal spectral power using the noise spectral power matrix, wherein the signal spectral power is used to determine the masking threshold, and wherein the noise spectral power matrix is used to determine the filter.
- 23. The program storage device of claim 22, further comprising instructions for performing the steps of:
detecting speech activity in the audio signal; and updating the noise spectral power matrix at times when speech activity is not detected in the audio signal.
- 24. The program storage device of claim 13, wherein the filter comprises a linear filter.
- 25. A system for reducing noise of an audio signal, comprising:
an audio capture system comprising a microphone array, for capturing and recording an audio signal in each input channel of the microphone array; and a front-end speech processor that determines a psychoacoustic masking threshold of the audio signal and generates an enhanced speech signal of the audio signal by filtering noise from the speech signal using the psychoacoustic masking threshold.
- 26. The system of claim 25, wherein the front-end speech processor comprises:
a sampling module for generating a time-frequency representation of an audio signal in each channel; a calibration module for determining a calibration parameter, the calibration parameter comprising a ratio of the transfer functions between different channels; a voice activity detection module for detecting a speech signal in the input audio signal; a filter module for determining filter parameters using the psychoacoustic masking threshold and the calibration parameter; a filter for filtering the multi-channel recording using the filter parameters to generate an enhanced signal; and a conversion module for converting the enhanced signal into a time domain representation.
- 27. The system of claim 26, further comprising:
a noise spectral power module for determining a noise spectral power matrix using the multi-channel recording; and a signal spectral power module for determining the signal spectral power using the noise spectral power matrix, wherein the signal spectral power is used to determine the masking threshold, and wherein the noise spectral power matrix is used to determine the filter parameters.
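By way of illustration only, the noise spectral power matrix update recited in claims 10 and 11 (and in the corresponding claims 22, 23, and 27) may be sketched as follows. The claims do not specify an update rule; the exponential smoothing used here, the smoothing factor `beta`, and the function names `outer_conj` and `update_noise_matrix` are assumptions made for this sketch, not features taken from the claims. The sketch operates on a single frequency bin and leaves the matrix unchanged whenever speech activity is detected.

```python
def outer_conj(x):
    """Return the D x D matrix x x^H for a list of complex channel values."""
    return [[xi * xj.conjugate() for xj in x] for xi in x]


def update_noise_matrix(noise_matrix, frame, speech_active, beta=0.9):
    """Update a noise spectral power matrix estimate for one frequency bin.

    noise_matrix:  D x D nested list of complex numbers (current estimate)
    frame:         length-D list of complex spectral coefficients, one per channel
    speech_active: True if a voice activity detector flagged this frame
    beta:          smoothing factor in [0, 1) (assumed value, not from the claims)
    """
    # The estimate is only updated when no speech activity is detected.
    if speech_active:
        return noise_matrix

    inst = outer_conj(frame)  # instantaneous spectral power matrix of the frame
    d = len(frame)
    # Exponential smoothing (an assumed update rule).
    return [
        [beta * noise_matrix[i][j] + (1 - beta) * inst[i][j] for j in range(d)]
        for i in range(d)
    ]


# Hypothetical two-channel example for one frequency bin.
zeros = [[0j, 0j], [0j, 0j]]
frame = [1 + 0j, 0 + 1j]
updated = update_noise_matrix(zeros, frame, speech_active=False, beta=0.5)
held = update_noise_matrix(updated, [5 + 0j, 5 + 0j], speech_active=True, beta=0.5)
```

In this example `updated` is a Hermitian matrix with equal power on the diagonal, and `held` is identical to `updated` because the second frame is flagged as containing speech.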
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims priority to U.S. Provisional Patent Application Serial No. 60/290,289, filed on May 11, 2001.
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60290289 | May 2001 | US |