This application includes references to matter disclosed in U.S. Ser. No. 12/246,491, filed on 6 Oct. 2008.
The present invention relates to audio signal processing. Specifically, the present invention relates to a method for processing audio signals.
Stereo signals may be decoded into multi-channel audio to provide a user with a sense of immersion and realism when experiencing the multi-channel audio through a plurality of speakers. The decoding of signals into multi-channel audio may be carried out using techniques disclosed in U.S. Ser. No. 12/246,491, which is another patent application filed by Creative Technology Ltd.
It should be noted that a cinema hall typically includes a plurality of speakers distributed in a widely spread loudspeaker layout throughout the cinema hall, with the plurality of speakers being directed at cinema goers seated in the cinema hall such that a spatial sound effect is experienced by the cinema goers.
Unfortunately, arranging a plurality of speakers in a widely spread loudspeaker layout in an enclosed area that is small compared to the cinema hall, such as a room in a home, is not convenient, both because of constraints in the size of the enclosed area and because the presence of the plurality of speakers would appear odd. However, it would be highly desirable if spatial sound effects could be reproduced in the home. Furthermore, given the prevalence of compact speaker-array units in homes, it would be desirable if spatial sound effects could be reproduced in homes using compact speaker-array units.
In addition, it would also be desirable if the compact speaker-array units could reproduce spatial sound effects over an enlarged location, as persons in a home, unlike movie-goers in a cinema hall, are unlikely to remain seated at a single location.
The present invention aims to address the aforementioned situations.
There is provided a method for enlarging a location with optimal three-dimensional audio perception. Optimal three-dimensional audio perception may relate to a fully spatial sound effect.
The method includes deriving three-dimensional encoded localization cues from an audio input signal having a first channel signal and a second channel signal; decoding the first channel signal and the second channel signal into a plurality of decoded channel signals, the plurality of decoded channel signals being equal to a number of speaker units; performing crosstalk cancellation on the plurality of decoded channel signals to eliminate crosstalk between the plurality of decoded channel signals; and outputting the plurality of decoded channel signals which have been subjected to crosstalk cancellation to each of the number of speaker units. It is advantageous that the crosstalk cancellation includes further processing to generate a smoothed frequency envelope.
The smoothed frequency envelope may be reconstructed from truncated cepstral coefficients derived from converting each of the plurality of decoded channel signals into the cepstrum spectrum. The smoothed frequency envelope also minimizes timbre artifacts, the timbre artifacts being high peaks and low valleys in the cepstrum spectrum of each of the plurality of decoded channel signals.
The localization cues may include, for example, an up-down dimension, a left-right dimension, a front-back dimension, an azimuth angle, an elevation angle and so forth. The derivation of the three-dimensional encoded localization cues may be based on providing a listener with a fully spatial sound effect.
The enlarged location with optimal three-dimensional audio perception advantageously allows a listener to move about as the enlarged location relates to a boundary which encompasses a plurality of positions with optimal three-dimensional audio perception.
The method may preferably further include summing the plurality of decoded channel signals which have been subjected to crosstalk cancellation before output to each of the number of speaker units. Each speaker unit may include at least one speaker driver. Preferably, the crosstalk cancellation may be performed to cause a listener to perceive audio as emanating from virtual speakers.
In order that the present invention may be fully understood and readily put into practical effect, there shall now be described by way of non-limitative example only preferred embodiments of the present invention, the description being with reference to the accompanying illustrative drawings.
Referring to
The method 20 for enlarging a location with optimal three-dimensional audio perception includes deriving three-dimensional encoded localization cues from an audio input signal having a first channel signal and a second channel signal (22). The audio input signal with the first channel signal and the second channel signal may be known as a stereo signal. The techniques for deriving the three-dimensional encoded localization cues may relate to audio signal processing techniques described in U.S. Ser. No. 12/246,491 or any other known audio signal processing technique. The derivation of the three-dimensional encoded localization cues is an essential step in reproducing a fully spatial sound effect. The localization cues include, for example, an up-down dimension, a left-right dimension, a front-back dimension, an azimuth angle, an elevation angle and so forth.
The method 20 also includes decoding the first channel signal and the second channel signal into a plurality of decoded channel signals (24), the plurality of decoded channel signals being equal to a number of speaker units. Each speaker unit may include at least one speaker driver. Subsequently, crosstalk cancellation may be performed on the plurality of decoded channel signals (26) to eliminate crosstalk between the plurality of decoded channel signals. Crosstalk cancellation is performed to cause the listener to perceive audio as emanating from virtual speakers. Crosstalk cancellation also includes further processing to generate a smoothed frequency envelope 100 as shown in
Consequently, the method 20 further includes summing the plurality of decoded channel signals (30) which have been subjected to crosstalk cancellation before output to each of the number of speaker units. Finally, the method 20 includes outputting each of the summed decoded channel signals (32) which have been subjected to crosstalk cancellation to each of the number of speaker units such that the listener is able to enjoy the fully spatial sound effect with an enlarged location with optimal three-dimensional audio perception. The concept of the enlarged location will be described in further detail in the subsequent paragraphs.
Referring to
Mathematical representations will now be provided to illustrate the concept of the enlarged location with optimal three-dimensional audio perception:
X is the multichannel audio produced by deriving three-dimensional encoded localization cues from an audio input signal (22 in method 20).
Y is the transaural audio perceived by the listener.
Hc is a head-related transfer function (HRTF) matrix from the real audio sources to the listener.
Hv is an HRTF matrix from the virtual audio sources to the listener.
X̂ is the virtualization output sent to the real audio sources.
ifft relates to “inverse fast Fourier transform”.
fft relates to “fast Fourier transform”.
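The equations relating these symbols are not reproduced in this text. A conventional transaural formulation that is consistent with the definitions above is sketched here; it is an assumption, not necessarily the exact set of equations of this disclosure. It supposes that the listener should perceive the same signal Y from the real audio sources as would have arrived from the virtual audio sources, and that Hc is square and invertible at each frequency (otherwise a pseudo-inverse would be needed):

```latex
\begin{aligned}
Y &= H_v X            && \text{(target: audio arriving from the virtual sources)} \\
Y &= H_c \hat{X}      && \text{(actual: audio arriving from the real sources)} \\
\hat{X} &= H_c^{-1} H_v X && \text{(virtualization output, equating the two)}
\end{aligned}
```

Under this assumed reading, the quantity H processed in the steps that follow would correspond to the frequency response of the crosstalk cancellation filter, for example an element of the product of the inverse of Hc with Hv.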
H is converted into the cepstrum spectrum:
ceps=ifft(log(abs(H)))
Subsequently, smoothed spectral envelopes are reconstructed from the truncated cepstral coefficients:
Hsmooth=exp(fft(window(ceps)))
The smoothed spectral envelopes 100 may be seen in
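The two expressions above may be illustrated with a minimal Python sketch. Several details are assumptions made only for illustration and are not stated in this description: the window is taken to be a rectangular truncation that keeps the lowest cepstral coefficients and their mirror images, the parameter name `keep` is hypothetical, and a naive transform is written out so that the sketch needs only the standard library (in practice a library FFT would be used).

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, for illustration only."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def cepstral_smooth(H, keep):
    """Return a smoothed magnitude envelope of the frequency response H.

    H    : full-length (two-sided) complex frequency response
    keep : hypothetical parameter, number of low cepstral coefficients kept
    """
    n = len(H)
    # ceps = ifft(log(abs(H))); a small floor avoids log(0)
    log_mag = [math.log(max(abs(h), 1e-12)) for h in H]
    ceps = [c.real for c in dft(log_mag, inverse=True)]
    # window(ceps): assumed rectangular truncation, kept symmetric so that
    # the reconstructed log-magnitude stays real
    windowed = [c if (i < keep or i > n - keep) else 0.0
                for i, c in enumerate(ceps)]
    # Hsmooth = exp(fft(window(ceps)))
    return [math.exp(v.real) for v in dft(windowed)]
```

A smaller `keep` discards more of the rapidly varying detail of the log-magnitude response and so gives a smoother envelope, while setting `keep` to the full length returns the original magnitude unchanged.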
Referring to
Referring to
The system 40 includes a plurality of audio filters 44 for performing crosstalk cancellation on the plurality of decoded channel signals (x1, x2, . . . , xN).
Crosstalk cancellation is performed to cause the listener to perceive audio as emanating from virtual speakers, and eliminates the crosstalk between channels. Crosstalk cancellation also includes further processing to generate a smoothed frequency envelope 100 as shown in
The system 40 includes a plurality of signal summing circuits 46 for summing the plurality of crosstalk cancelled signals. Finally, the plurality of crosstalk cancelled signals which have been summed are output to a plurality of speaker units (S1, S2, . . . , SN) such that the listener is able to enjoy the fully spatial sound effect with an enlarged location with optimal three-dimensional audio perception.
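The filter-and-sum arrangement of the audio filters 44 and the signal summing circuits 46 may be sketched as follows. This is a minimal illustration under stated assumptions rather than the actual implementation of the system 40: the filters are assumed to be FIR filters, every decoded channel is assumed to contribute to every speaker feed, and the names `fir_filter`, `filter_and_sum` and `filters[j][i]` are hypothetical.

```python
def fir_filter(signal, taps):
    """Convolve a signal with FIR taps, truncated to the input length."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        for k, h in enumerate(taps):
            if n - k >= 0:
                out[n] += h * signal[n - k]
    return out


def filter_and_sum(channels, filters):
    """Filter each decoded channel toward every speaker, then sum per speaker.

    channels : list of N decoded channel signals (x1 .. xN)
    filters  : filters[j][i] holds assumed FIR taps applied to channel i
               on its way to speaker j
    Returns one summed feed per speaker (S1 .. SN).
    """
    length = len(channels[0])
    feeds = []
    for row in filters:
        feed = [0.0] * length
        for x, taps in zip(channels, row):
            filtered = fir_filter(x, taps)
            # Summing stage: accumulate the filtered contribution
            feed = [a + b for a, b in zip(feed, filtered)]
        feeds.append(feed)
    return feeds
```

For example, with two channels, a direct path of `[1.0]` and a cross path of `[0.0, -0.5]`, each speaker feed is its own channel plus a delayed, inverted and attenuated copy of the other channel, which is the general shape of a cancellation term.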
Whilst there has been described in the foregoing description preferred embodiments of the present invention, it will be understood by those skilled in the technology concerned that many variations or modifications in details of design or construction may be made without departing from the present invention.
Number | Date | Country |
---|---|---|
20110188660 A1 | Aug 2011 | US |