Claims
- 1. A method comprising the steps of:
(a) converting a plurality of input audio signals into a combined audio signal and a plurality of auditory scene parameters; and (b) embedding the auditory scene parameters into the combined audio signal to generate an embedded audio signal, such that: a first receiver that is aware of the existence of the embedded auditory scene parameters can extract the auditory scene parameters from the embedded audio signal and apply the extracted auditory scene parameters to synthesize an auditory scene; and a second receiver that is unaware of the existence of the embedded auditory scene parameters can process the embedded audio signal to generate an output audio signal, where the embedded auditory scene parameters are transparent to the second receiver.
- 2. The invention of claim 1, wherein the plurality of auditory scene parameters comprise two or more different sets of one or more auditory scene parameters, wherein each set of auditory scene parameters corresponds to a different frequency band in the combined audio signal such that the first receiver synthesizes the auditory scene by (a) dividing an input audio signal into a plurality of different frequency bands; and (b) applying the two or more different sets of one or more auditory scene parameters to two or more of the different frequency bands in the input audio signal to generate two or more synthesized audio signals of the auditory scene, wherein for each of the two or more different frequency bands, the corresponding set of one or more auditory scene parameters is applied to the input audio signal as if the input audio signal corresponded to a single audio source in the auditory scene.
- 3. The invention of claim 2, wherein each set of one or more auditory scene parameters corresponds to a different audio source in the auditory scene.
- 4. The invention of claim 2, wherein, for at least one of the sets of one or more auditory scene parameters, at least one of the auditory scene parameters corresponds to a combination of two or more different audio sources in the auditory scene that takes into account relative dominance of the two or more different audio sources in the auditory scene.
- 5. The invention of claim 2, wherein the two or more synthesized audio signals comprise left and right audio signals of a binaural signal corresponding to the auditory scene.
- 6. The invention of claim 2, wherein the two or more synthesized audio signals comprise three or more signals of a multi-channel audio signal corresponding to the auditory scene.
- 7. The invention of claim 1, wherein the combined audio signal corresponds to a combination of two or more different mono source signals, wherein the two or more different frequency bands are selected by comparing magnitudes of the two or more different mono source signals, wherein, for each of the two or more different frequency bands, one of the mono source signals dominates the one or more other mono source signals.
- 8. The invention of claim 1, wherein the combined audio signal corresponds to a combination of left and right audio signals of a binaural signal, wherein each different set of one or more auditory scene parameters is generated by comparing the left and right audio signals in a corresponding frequency band.
- 9. The invention of claim 1, wherein the auditory scene parameters comprise one or more of an interaural level difference, an interaural time delay, and a head-related transfer function.
- 10. The invention of claim 1, wherein step (b) comprises the step of applying a layered coding technique in which stronger error protection is provided to the combined audio signal than to the auditory scene parameters when generating the embedded audio signal, such that errors due to transmission over a lossy channel will tend to affect the auditory scene parameters before affecting the combined audio signal, improving the probability that the first receiver will be able to process at least the combined audio signal.
- 11. The invention of claim 1, wherein step (b) comprises the step of applying a multi-descriptive coding technique in which the auditory scene parameters and the combined audio signal are both divided into two or more streams, wherein each stream divided from the auditory scene parameters is embedded into a corresponding stream divided from the combined audio signal to form a stream of the embedded audio signal, such that the two or more streams of the embedded audio signal may be transmitted over two or more different channels to the first receiver, such that the first receiver is able to synthesize the auditory scene using extracted auditory scene parameters having relatively coarse resolution when errors result from transmission of one or more of the streams of the embedded audio signal over one or more lossy channels.
- 12. A machine-readable medium, having encoded thereon program code, wherein, when the program code is executed by a machine, the machine implements a method, comprising the steps of:
(a) converting a plurality of input audio signals into a combined audio signal and a plurality of auditory scene parameters; and (b) embedding the auditory scene parameters into the combined audio signal to generate an embedded audio signal, such that: a first receiver that is aware of the existence of the embedded auditory scene parameters can extract the auditory scene parameters from the embedded audio signal and apply the extracted auditory scene parameters to synthesize an auditory scene; and a second receiver that is unaware of the existence of the embedded auditory scene parameters can process the embedded audio signal to generate an output audio signal, where the embedded auditory scene parameters are transparent to the second receiver.
- 13. An apparatus comprising:
(a) an encoder configured to convert a plurality of input audio signals into a combined audio signal and a plurality of auditory scene parameters; and (b) a merging module configured to embed the auditory scene parameters into the combined audio signal to generate an embedded audio signal, such that: a first receiver that is aware of the existence of the embedded auditory scene parameters can extract the auditory scene parameters from the embedded audio signal and apply the extracted auditory scene parameters to synthesize an auditory scene; and a second receiver that is unaware of the existence of the embedded auditory scene parameters can process the embedded audio signal to generate an output audio signal, where the embedded auditory scene parameters are transparent to the second receiver.
- 14. A method for synthesizing an auditory scene, comprising the steps of:
(a) receiving an embedded audio signal comprising a combined audio signal embedded with a plurality of auditory scene parameters, wherein a receiver that is unaware of the existence of the embedded auditory scene parameters can process the embedded audio signal to generate an output audio signal, where the embedded auditory scene parameters are transparent to the receiver; (b) extracting the auditory scene parameters from the embedded audio signal; and (c) applying the extracted auditory scene parameters to the combined audio signal to synthesize an auditory scene.
- 15. The invention of claim 14, wherein the plurality of auditory scene parameters comprise two or more different sets of one or more auditory scene parameters, wherein each set of auditory scene parameters corresponds to a different frequency band in the combined audio signal such that the auditory scene is synthesized by (1) dividing the combined audio signal into a plurality of different frequency bands; and (2) applying the two or more different sets of one or more auditory scene parameters to two or more of the different frequency bands in the combined audio signal to generate two or more synthesized audio signals of the auditory scene, wherein for each of the two or more different frequency bands, the corresponding set of one or more auditory scene parameters is applied to the combined audio signal as if the combined audio signal corresponded to a single audio source in the auditory scene.
- 16. The invention of claim 15, wherein each set of one or more auditory scene parameters corresponds to a different audio source in the auditory scene.
- 17. The invention of claim 15, wherein, for at least one of the sets of one or more auditory scene parameters, at least one of the auditory scene parameters corresponds to a combination of two or more different audio sources in the auditory scene that takes into account relative dominance of the two or more different audio sources in the auditory scene.
- 18. The invention of claim 15, wherein the two or more synthesized audio signals comprise left and right audio signals of a binaural signal corresponding to the auditory scene.
- 19. The invention of claim 15, wherein the two or more synthesized audio signals comprise three or more signals of a multi-channel audio signal corresponding to the auditory scene.
- 20. The invention of claim 14, wherein the combined audio signal corresponds to a combination of two or more different mono source signals, wherein the two or more different frequency bands are selected by comparing magnitudes of the two or more different mono source signals, wherein, for each of the two or more different frequency bands, one of the mono source signals dominates the one or more other mono source signals.
- 21. The invention of claim 14, wherein the combined audio signal corresponds to a combination of left and right audio signals of a binaural signal, wherein each different set of one or more auditory scene parameters is generated by comparing the left and right audio signals in a corresponding frequency band.
- 22. The invention of claim 14, wherein the auditory scene parameters comprise one or more of an interaural level difference, an interaural time delay, and a head-related transfer function.
- 23. The invention of claim 14, wherein the embedded audio signal was generated by applying a layered coding technique in which stronger error protection was provided to the combined audio signal than to the auditory scene parameters, such that errors due to transmission over a lossy channel will tend to affect the auditory scene parameters before affecting the combined audio signal, improving the probability that a receiver will be able to process at least the combined audio signal.
- 24. The invention of claim 14, wherein the embedded audio signal was generated by applying a multi-descriptive coding technique in which the auditory scene parameters and the combined audio signal were both divided into two or more streams, wherein each stream divided from the auditory scene parameters was embedded into a corresponding stream divided from the combined audio signal to form a stream of the embedded audio signal, such that the two or more streams of the embedded audio signal may be transmitted over two or more different channels to a receiver, such that the receiver is able to synthesize the auditory scene using extracted auditory scene parameters having relatively coarse resolution when errors result from transmission of one or more of the streams of the embedded audio signal over one or more lossy channels.
- 25. A machine-readable medium, having encoded thereon program code, wherein, when the program code is executed by a machine, the machine implements a method for synthesizing an auditory scene, comprising the steps of:
(a) receiving an embedded audio signal comprising a combined audio signal embedded with a plurality of auditory scene parameters, wherein a receiver that is unaware of the existence of the embedded auditory scene parameters can process the embedded audio signal to generate an output audio signal, where the embedded auditory scene parameters are transparent to the receiver; (b) extracting the auditory scene parameters from the embedded audio signal; and (c) applying the extracted auditory scene parameters to the combined audio signal to synthesize an auditory scene.
- 26. An apparatus for synthesizing an auditory scene, comprising:
(a) a dividing module configured to (1) receive an embedded audio signal comprising a combined audio signal embedded with a plurality of auditory scene parameters, wherein a receiver that is unaware of the existence of the embedded auditory scene parameters can process the embedded audio signal to generate an output audio signal, where the embedded auditory scene parameters are transparent to the receiver, and (2) extract the auditory scene parameters from the embedded audio signal; and (b) a decoder configured to apply the extracted auditory scene parameters to the combined audio signal to synthesize an auditory scene.
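Claims 1, 12, and 13 recite the same two-step scheme in method, medium, and apparatus form. The sketch below is one possible reading: it downmixes the inputs into a combined signal, derives per-band level differences as a simple kind of auditory scene parameter, and hides the parameter bytes in the least significant bits of 16-bit PCM so a legacy receiver plays ordinary audio. The framing, the LSB embedding, and all function names are illustrative assumptions, not the implementation the claims mandate.

```python
import numpy as np

def encode(inputs, frame=512):
    """Step (a): downmix several input channels into one combined signal and
    compute per-frame, per-bin level differences between the first two
    channels (one simple kind of auditory scene parameter)."""
    inputs = np.asarray(inputs, dtype=np.float64)
    combined = inputs.mean(axis=0)
    params = []
    for start in range(0, combined.size - frame + 1, frame):
        spectra = np.fft.rfft(inputs[:, start:start + frame], axis=1)
        power = np.abs(spectra) ** 2
        # level difference (dB) of channel 0 relative to channel 1, per bin
        params.append(10 * np.log10((power[0] + 1e-12) / (power[1] + 1e-12)))
    return combined, np.asarray(params, dtype=np.float32)

def embed(combined, params):
    """Step (b): hide the parameter bytes in the LSBs of 16-bit PCM; a
    legacy (second) receiver just plays slightly dithered audio, so the
    parameters stay transparent to it."""
    pcm = np.clip(np.round(combined * 32767), -32768, 32767).astype(np.int16)
    bits = np.unpackbits(np.frombuffer(params.tobytes(), dtype=np.uint8))
    n = min(bits.size, pcm.size)
    pcm[:n] = (pcm[:n] & ~1) | bits[:n]
    return pcm
```

An aware (first) receiver would read the same LSBs back out before synthesis; an unaware receiver simply treats the PCM as audio, which is the backward compatibility the claim requires.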
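Claims 8 and 21 generate each band's parameter set by comparing the left and right signals of a binaural input, and claims 9 and 22 name the interaural level difference and interaural time delay among the parameters. A plausible estimator is sketched below; the band edges and the FFT-based cross-correlation are assumptions for illustration.

```python
import numpy as np

def band_parameters(left, right, fs, bands=((0, 500), (500, 2000), (2000, 8000))):
    """Compare left and right in each frequency band; return per-band
    (interaural level difference in dB, interaural time delay in seconds)."""
    spec_l, spec_r = np.fft.rfft(left), np.fft.rfft(right)
    freqs = np.fft.rfftfreq(left.size, 1.0 / fs)
    params = []
    for lo, hi in bands:
        sel = (freqs >= lo) & (freqs < hi)
        p_l = np.sum(np.abs(spec_l[sel]) ** 2)
        p_r = np.sum(np.abs(spec_r[sel]) ** 2)
        ild_db = 10 * np.log10((p_l + 1e-12) / (p_r + 1e-12))
        # band-limited cross-correlation; its peak lag is the time delay
        cross = np.zeros_like(spec_l)
        cross[sel] = spec_l[sel] * np.conj(spec_r[sel])
        corr = np.fft.fftshift(np.fft.irfft(cross, left.size))
        itd = (np.argmax(corr) - left.size // 2) / fs
        params.append((ild_db, itd))
    return params
```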
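Claims 2 and 15 (and the binaural case of claims 5 and 18) describe synthesis as dividing the signal into frequency bands and applying each band's parameter set as if the whole band came from a single source. Under the same assumed band layout as above, one way that could look:

```python
import numpy as np

def synthesize_binaural(combined, band_params, fs,
                        bands=((0, 500), (500, 2000), (2000, 8000))):
    """Divide the combined signal into frequency bands and, per band, treat
    it as a single source: split its level by the band's ILD and offset
    left against right by the band's ITD."""
    spec = np.fft.rfft(combined)
    freqs = np.fft.rfftfreq(combined.size, 1.0 / fs)
    spec_l = np.zeros_like(spec)
    spec_r = np.zeros_like(spec)
    for (lo, hi), (ild_db, itd) in zip(bands, band_params):
        sel = (freqs >= lo) & (freqs < hi)
        g = 10.0 ** (ild_db / 20.0)                  # linear L/R amplitude ratio
        g_l, g_r = g / np.sqrt(1 + g * g), 1 / np.sqrt(1 + g * g)
        shift = np.exp(-1j * np.pi * freqs[sel] * itd)  # half the delay per ear
        spec_l[sel] = spec[sel] * g_l * shift
        spec_r[sel] = spec[sel] * g_r * np.conj(shift)
    return np.fft.irfft(spec_l, combined.size), np.fft.irfft(spec_r, combined.size)
```

The power-preserving gain split and the symmetric half-delay per ear are design choices of this sketch, not requirements of the claims.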
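Claims 10 and 23 call for layered coding with stronger error protection on the combined audio signal than on the auditory scene parameters, so channel errors corrupt the parameters first. The repetition code below is a deliberately crude stand-in for a real FEC scheme, used only to show the asymmetry:

```python
def protect(audio_bytes: bytes, param_bytes: bytes) -> bytes:
    """Give the combined-audio bytes a rate-1/3 repetition code while the
    parameter bytes travel unprotected."""
    strong = bytes(b for b in audio_bytes for _ in range(3))
    return len(strong).to_bytes(4, "big") + strong + param_bytes

def recover_audio(packet: bytes) -> bytes:
    """Majority-vote each repeated triple; one corrupted copy per triple
    still decodes, which is the 'stronger protection' of the claim."""
    n = int.from_bytes(packet[:4], "big")
    strong = packet[4:4 + n]
    return bytes(max(set(t), key=t.count)
                 for t in (strong[i:i + 3] for i in range(0, n, 3)))
```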
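Claims 11 and 24 divide both the parameters and the combined signal into two or more streams so that losing a channel only coarsens the recovered parameter track rather than destroying it. A two-description sketch follows; the frame size and the dictionary layout are assumptions of mine, purely illustrative.

```python
import numpy as np

def make_descriptions(combined, params, frame=512):
    """Split audio frames and the matching parameter frames into two
    interleaved streams; each stream would then be embedded and sent over
    its own channel."""
    frames = combined[: combined.size // frame * frame].reshape(-1, frame)
    k = min(frames.shape[0], len(params))
    return [{"audio": frames[p:k:2], "params": np.asarray(params[p:k:2])}
            for p in (0, 1)]

def merge_params(d0, d1):
    """With both descriptions, interleave back to full resolution; with one
    lost, hold each surviving value for two slots, i.e. a coarser track."""
    if d0 is not None and d1 is not None:
        full = np.empty((d0["params"].shape[0] + d1["params"].shape[0],)
                        + d0["params"].shape[1:])
        full[0::2], full[1::2] = d0["params"], d1["params"]
        return full
    survivor = d0 if d0 is not None else d1
    return np.repeat(survivor["params"], 2, axis=0)
```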
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of the filing date of U.S. provisional application No. 60/311,565, filed on Aug. 10, 2001 as attorney docket no. Baumgarte 1-6-8, the teachings of which are incorporated herein by reference. The subject matter of this application is related to the subject matter of application Ser. No. 09/848,877, filed on May 4, 2001 as attorney docket no. Faller 5 (“the '877 application”), the teachings of which are incorporated herein by reference.
Provisional Applications (1)

| Number | Date | Country |
| --- | --- | --- |
| 60311565 | Aug 2001 | US |