Claims
- 1. A method for processing two or more input audio signals, comprising the steps of:
(a) converting M input audio signals from a time domain into a frequency domain, where M>1; (b) generating a set of one or more auditory scene parameters for each of one or more different frequency bands in the M converted input audio signals, where each set of one or more auditory scene parameters comprises an estimate of coherence between the M input audio signals; and (c) combining the M input audio signals to generate N combined audio signals, where M>N.
- 2. The invention of claim 1, wherein:
step (a) comprises the step of applying a discrete Fourier transform (DFT) to convert left and right audio signals of an input audio signal from the time domain into a plurality of sub-bands in the frequency domain; step (b) comprises the steps of:
(1) generating an estimated coherence between the left and right audio signals for each sub-band; and (2) generating an average estimated coherence for one or more critical bands, wherein each critical band comprises a plurality of sub-bands; and step (c) comprises the steps of:
(1) combining the left and right audio signals into a single mono signal; and (2) encoding the single mono signal to generate an encoded mono signal bitstream.
- 3. The invention of claim 2, wherein the average estimated coherence for each critical band is encoded into the encoded mono signal bitstream.
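The analysis side described in claims 2-3 can be sketched as follows. This is a hedged illustration only: the frame length, the critical-band grouping, and the particular coherence estimator (cross- and auto-spectra averaged over frames) are assumptions for the sketch, not details recited in the claims.

```python
import numpy as np

def estimate_coherence(left, right, frame_len=256):
    """Step (b)(1): per-sub-band coherence estimate between left and right.

    Uses cross/auto spectra averaged over DFT frames (an illustrative
    estimator; the claims do not specify one).
    """
    n_frames = min(len(left), len(right)) // frame_len
    n_bins = frame_len // 2 + 1
    Pll = np.zeros(n_bins)
    Prr = np.zeros(n_bins)
    Plr = np.zeros(n_bins, dtype=complex)
    for i in range(n_frames):
        # Step (a): a DFT converts each frame into frequency sub-bands.
        L = np.fft.rfft(left[i * frame_len:(i + 1) * frame_len])
        R = np.fft.rfft(right[i * frame_len:(i + 1) * frame_len])
        Pll += np.abs(L) ** 2
        Prr += np.abs(R) ** 2
        Plr += L * np.conj(R)
    return np.abs(Plr) / np.sqrt(Pll * Prr + 1e-12)

def band_average(coh, critical_bands):
    """Step (b)(2): average the sub-band estimate over each critical band."""
    return [float(np.mean(coh[lo:hi])) for lo, hi in critical_bands]

def downmix(left, right):
    """Step (c)(1): combine left and right into a single mono signal."""
    return 0.5 * (left + right)
```

For identical channels the estimate approaches 1; for statistically independent channels it falls toward 0 as more frames are averaged.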
- 4. The invention of claim 1, wherein the auditory scene parameters further comprise one or more of an inter-aural level difference (ILD), an inter-aural time difference (ITD), and a head-related transfer function (HRTF).
- 5. An apparatus for processing two or more input audio signals, comprising:
(a) an audio analyzer comprising:
(1) one or more time-frequency transformers configured to convert M input audio signals from a time domain into a frequency domain, where M>1; and (2) a coherence estimator configured to generate a set of one or more auditory scene parameters for each of one or more different frequency bands in the M converted input audio signals, where each set of one or more auditory scene parameters comprises an estimate of coherence between the M input audio signals; and (b) a combiner configured to combine the M input audio signals to generate N combined audio signals, where M>N.
- 6. An encoded audio bitstream generated by:
(a) converting M input audio signals from a time domain into a frequency domain, where M>1; (b) generating a set of one or more auditory scene parameters for each of one or more different frequency bands in the M converted input audio signals, where each set of one or more auditory scene parameters comprises an estimate of coherence between the M input audio signals; and (c) combining the M input audio signals to generate N combined audio signals of the encoded audio bitstream, where M>N.
- 7. A method for synthesizing an auditory scene, comprising the steps of:
(a) dividing an input audio signal into one or more frequency bands, wherein each band comprises a plurality of sub-bands; and (b) applying an auditory scene parameter to each band to generate two or more output audio signals, wherein the auditory scene parameter is modified for each different sub-band in the band based on a coherence value.
- 8. The invention of claim 7, wherein the auditory scene parameter is a level difference.
- 9. The invention of claim 8, wherein, for each sub-band in each band, the level difference corresponds to left and right weighting factors wL and wR that are modified by factors nL and nR, respectively, to generate left and right modified weighting factors wL′ and wR′ that are used to generate left and right audio signals of an output audio signal, wherein:
- 10. The invention of claim 9, wherein, for each band:
the modification function is a zero-mean random sequence within the band; the coherence value is an average estimated coherence for the band; and the gain g is a function of the average estimated coherence.
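The equation referenced in claim 9 is not reproduced in this text. As a hedged sketch consistent with claim 10 (a zero-mean random modification sequence within the band, with a gain g that is a function of the band's average estimated coherence), the weight modification might look like the following; the specific mapping g = 1 - coherence and the multiplicative form n = 1 + g·r are illustrative assumptions, not the claimed equation.

```python
import numpy as np

def modify_level_weights(wL, wR, band_coherence, rng=None):
    """Perturb per-sub-band left/right weighting factors (claims 9-10 sketch).

    r is a zero-mean random sequence within the band; g is an assumed
    function of the band's average estimated coherence (here, 1 - coherence,
    so less coherent bands get stronger per-sub-band variation).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    r = rng.uniform(-1.0, 1.0, len(wL))
    r -= r.mean()                      # zero-mean random sequence in the band
    g = 1.0 - band_coherence           # assumed coherence-to-gain mapping
    nL = 1.0 + g * r
    nR = 1.0 - g * r                   # opposite perturbation on the right
    return nL * np.asarray(wL), nR * np.asarray(wR)   # wL', wR'
```

With full coherence (g = 0) the weights pass through unmodified, matching the intent that fully coherent bands need no decorrelating modification.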
- 11. The invention of claim 7, wherein the auditory scene parameter is a time difference.
- 12. The invention of claim 11, wherein, for each sub-band s in each band c, a time difference τs is modified based on a delay offset ds and a gain factor gc to generate a modified time difference τs′ that is applied to generate left and right audio signals of an output audio signal, wherein:
- 13. The invention of claim 12, wherein, for each band:
the delay offset ds is based on a zero-mean random sequence within the band; the coherence value is an average estimated coherence for the band; and the gain gc is a function of the average estimated coherence.
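The equation referenced in claim 12 is likewise not reproduced here. A minimal sketch consistent with claims 12-13 — per-sub-band time differences offset by delay values d_s drawn from a zero-mean random sequence within the band, scaled by a gain g_c derived from the band's average estimated coherence — is below; the additive form τ' = τ + g·d and the mapping g = 1 - coherence are illustrative assumptions.

```python
import numpy as np

def modify_time_differences(tau, band_coherence, rng=None):
    """Claims 12-13 sketch: modify per-sub-band time differences tau_s.

    d_s comes from a zero-mean random sequence within the band; g_c is an
    assumed function (1 - coherence) of the band's average coherence.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    d = rng.uniform(-1.0, 1.0, len(tau))
    d -= d.mean()                 # zero-mean delay offsets within the band
    g = 1.0 - band_coherence      # assumed coherence-to-gain mapping
    return np.asarray(tau) + g * d   # tau_s' (sketch of the claimed form)
```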
- 14. The invention of claim 7, wherein the coherence value is estimated from left and right audio signals of an audio signal used to generate the input audio signal.
- 15. The invention of claim 7, wherein, within each band, the auditory scene parameter is modified based on a random sequence.
- 16. The invention of claim 7, wherein, within each band, the auditory scene parameter is modified based on a sinusoidal function.
- 17. The invention of claim 7, wherein, within each band, the auditory scene parameter is modified based on a triangular function.
- 18. The invention of claim 7, wherein:
step (a) comprises the steps of:
(1) decoding an encoded audio bitstream to recover a mono audio signal; and (2) applying a time-frequency transform to convert the mono audio signal from a time domain into the plurality of sub-bands in a frequency domain; step (b) comprises the steps of:
(1) applying the auditory scene parameter to each band to generate left and right audio signals of an output audio signal in the frequency domain; and (2) applying an inverse time-frequency transform to convert the left and right audio signals from the frequency domain into the time domain.
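The synthesis steps of claim 18 (after the mono signal has been decoded) can be sketched for a single frame as follows. The dB-style split of a per-band level difference into left and right weights is an illustrative assumption; the claims only require applying an auditory scene parameter per band and inverse-transforming the result.

```python
import numpy as np

def synthesize_frame(mono_frame, band_edges, band_level_diffs):
    """Claim 18 sketch, steps (a)(2)-(b)(2), for one decoded mono frame.

    Transform to sub-bands, apply a per-band level difference to form left
    and right spectra, then inverse-transform both back to the time domain.
    """
    X = np.fft.rfft(mono_frame)                 # time domain -> sub-bands
    L = np.zeros_like(X)
    R = np.zeros_like(X)
    for (lo, hi), ld in zip(band_edges, band_level_diffs):
        wL = 10.0 ** (ld / 40.0)                # assumed level-difference split
        wR = 10.0 ** (-ld / 40.0)
        L[lo:hi] = wL * X[lo:hi]                # left spectrum for this band
        R[lo:hi] = wR * X[lo:hi]                # right spectrum for this band
    n = len(mono_frame)
    return np.fft.irfft(L, n), np.fft.irfft(R, n)   # back to time domain
```

A zero level difference in every band reproduces the mono frame identically on both output channels.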
- 19. An apparatus for synthesizing an auditory scene, comprising:
(1) a time-frequency transformer configured to convert an input audio signal from a time domain into one or more frequency bands in a frequency domain, wherein each band comprises a plurality of sub-bands; (2) an auditory scene synthesizer configured to apply an auditory scene parameter to each band to generate two or more output audio signals, wherein the auditory scene parameter is modified for each different sub-band in the band based on a coherence value; and (3) one or more inverse time-frequency transformers configured to convert the two or more output audio signals from the frequency domain into the time domain.
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The subject matter of this application is related to the subject matter of U.S. patent application Ser. No. 09/848,877, filed on May 4, 2001 as attorney docket no. Faller 5 (“the '877 application”), and U.S. patent application Ser. No. 10/045,458, filed on Nov. 7, 2001 as attorney docket no. Baumgarte 1-6-8 (“the '458 application”), the teachings of both of which are incorporated herein by reference.