These and/or other aspects and advantages of the invention will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. Embodiments are described below to explain the present invention by referring to the figures.
The system may include a binaural decoder 120 including a decoding unit 130 and a 2-channel-synthesis unit 140, for example.
First, a plurality of channel signals may be input to the encoding unit 110, as the multi-channel signals. Referring to
Thus, the encoding unit 110 may generate spatial cues indicating frequency-independent direction information of a virtual sound source generated by at least two channel sound sources among the sound sources of the plurality of channels, during the down-mixing of the plurality of channel signals to eventually generate the resultant down-mixed mono signal.
Below, for convenience of explanation, such spatial cues will also be referred to as channel directivity differences (CDDs), noting that alternative spatial cues with direction information may be available.
Thus, according to an embodiment of the present invention, the binaural decoder 120 may receive an input of such CDD spatial cues and the down-mixed mono signal, and by using the CDD spatial cues, up-mix the down-mixed mono signal to the multi-channel signals, and then further up-mix each multi-channel signal to synthesize a 2-channel signal.
Thus, here, the decoding unit 130 may receive the CDD spatial cues and the down-mixed mono signal, and by using the CDD spatial cues, restore a plurality of channel signals as the up-mixed multi-channel signals.
In an embodiment, and as noted above, in addition to the up-mixing of the multi-channel signals, the 2-channel-synthesis unit 140 may localize the up-mixed multi-channel signals, according to the positions of the respective channels, by using the CDD spatial cues and corresponding head related transfer functions (HRTFs), and thus, generate the 2-channel signal.
According to only an example,
Referring to
As illustrated, when a multi-channel audio signal is encoded, different magnitudes of energy of the respective channels (channel i 11, channel j 12, and other channels) are distributed at a given point in time. In this case, assuming that channels other than channels i 11 and j 12 are not considered, and that a virtual sound source x 14 is generated only by the sound source of channel i 11 and the sound source of channel j 12, the energy of the virtual sound source x 14 can be considered to be the sum of the energy of channel i 11 and the energy of channel j 12, as in the below Equation 1.
Wi2 + Wj2 = Wx2        Equation 1
Here, Wi2 is the energy of channel i, Wj2 is the energy of channel j, and Wx2 is the energy of the virtual sound source x.
If both sides of Equation 1 are divided by Wx2, the result is the below Equation 2.
CDDxi2 + CDDxj2 = 1        Equation 2
Here, CDDxi = Wi/Wx and CDDxj = Wj/Wx, so that CDDxi2 = Wi2/Wx2 and CDDxj2 = Wj2/Wx2.
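The relationship of Equations 1 and 2 can be sketched as follows. This is a minimal illustration only, assuming the CDDs are amplitude ratios normalized so that Equation 2 holds; the helper name `compute_cdds` and the sample values are hypothetical, not from the original.

```python
import math

def compute_cdds(w_i, w_j):
    """Compute the virtual-source energy and the two CDD spatial cues
    from two channel amplitudes (hypothetical helper, per Equations 1-2)."""
    w_x_sq = w_i**2 + w_j**2          # Equation 1: Wi^2 + Wj^2 = Wx^2
    w_x = math.sqrt(w_x_sq)
    cdd_xi = w_i / w_x                # chosen so CDDxi^2 + CDDxj^2 = 1
    cdd_xj = w_j / w_x
    return w_x_sq, cdd_xi, cdd_xj

w_x_sq, cdd_xi, cdd_xj = compute_cdds(3.0, 4.0)
# Equation 2 holds by construction: CDDxi^2 + CDDxj^2 == 1
assert abs(cdd_xi**2 + cdd_xj**2 - 1.0) < 1e-9
```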
Meanwhile, relationships of CDDxi, CDDxj, and directivity information of channel i 21, channel j 22, and virtual sound source x 24 may be represented by the below Equation 3.
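The exact form of Equation 3 does not survive in this text. As a hedged illustration only, and not necessarily the form used in the original, relations of this type between the CDD cues and the angles θ and φ are commonly written via the stereophonic tangent panning law:

```latex
\frac{\tan\varphi}{\tan\theta} \;=\; \frac{CDD_{xi} - CDD_{xj}}{CDD_{xi} + CDD_{xj}}
```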
Here, θ represents the directivity information of a channel, i.e., the angle between the channel and a plane bisecting the channel and a neighboring channel. Since the channel layout may have already been determined when a multi-channel audio signal is encoded, the directivity information of a channel may also be a predetermined value. Further, φ represents the directivity information of a virtual sound source, e.g., the angle between the virtual sound source x 14 and the bisecting plane. As can be observed from Equation 3, CDDxi and CDDxj indicate the directivity information of the virtual sound source x 14 formed by the two channels i 11 and j 12.
Thus, in a process of generating a CDD, according to an embodiment of the present invention, the energy Wx2 of the virtual sound source x 14, CDDxi, and CDDxj may be obtained through Equations 1 and 2, and the directivity information of the virtual sound source x 14 may be obtained through Equation 3.
Here, based on the illustrated technique shown in
Here, referring to
In operation 310, a first OTT encoder 250 may receive inputs of the Lf channel and the Ls channel, e.g., corresponding to a plurality of available channel signals with determined direction information, generate CDD1Lf and CDD1Ls, and calculate the energy and directivity information of a first virtual sound source 210, as shown in
In operation 320, a second OTT encoder 255 may receive inputs of the Rf channel and the Rs channel, generate CDD2Rf and CDD2Rs, and calculate the energy and directivity information of a second virtual sound source 220.
In operation 330, a third OTT encoder 260 may receive inputs of the C channel and the LFE channel, generate CDD3C and CDD3LFE, and calculate the energy and directivity information of a third virtual sound source 230.
Further, in operation 340, a fourth OTT encoder 265 may receive inputs of the first virtual sound source 210 and the second virtual sound source 220, for example. Here, referring back to
In operation 350, a fifth OTT encoder 270 may receive inputs of the third virtual sound source 230 and the fourth virtual sound source 240, generate CDDm4 and CDDm3, and output a corresponding down-mixed mono signal, i.e., down-mixed from 5.1-channel signals. In such a method of encoding 5.1 channels, according to this embodiment of the present invention illustrated in
In operation 360, a multiplexing unit (not shown) generates and outputs a bitstream, including CDDs and the down-mixed mono signal.
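The five-OTT down-mix tree of operations 310 through 350 can be sketched as below. This is a simplified per-value illustration under stated assumptions: each OTT stage is modeled as an energy-preserving down-mix with amplitude-ratio CDDs, real encoders operate on frames of samples, and all variable names and signal values are made up for illustration.

```python
import math

def ott_encode(a, b):
    """One-To-Two (OTT) encoder sketch: down-mix two inputs and return the
    down-mix plus the two CDD cues (amplitude-ratio assumption)."""
    w = math.sqrt(a**2 + b**2)   # Equation 1 applied to the pair
    return w, a / w, b / w       # down-mix, CDD for input a, CDD for input b

# Per-frame channel magnitudes for Lf, Ls, Rf, Rs, C, LFE (made-up values)
Lf, Ls, Rf, Rs, C, LFE = 0.5, 0.3, 0.4, 0.2, 0.6, 0.1

v1, cdd1_lf, cdd1_ls = ott_encode(Lf, Ls)    # operation 310
v2, cdd2_rf, cdd2_rs = ott_encode(Rf, Rs)    # operation 320
v3, cdd3_c, cdd3_lfe = ott_encode(C, LFE)    # operation 330
v4, cdd41, cdd42     = ott_encode(v1, v2)    # operation 340
m,  cdd_m4, cdd_m3   = ott_encode(v4, v3)    # operation 350

# The chained cues recover each channel from the mono down-mix (cf. Equation 4)
assert abs(cdd_m4 * cdd41 * cdd1_lf * m - Lf) < 1e-9
```

The assertion makes the point of the tree explicit: the cues of each stage multiply out along the path from the mono down-mix back to any original channel.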
In operation 505, a demultiplexing unit (not shown) may receive an input of an audio bitstream, including a down-mixed mono signal for multi-channel signals and CDDs, and may proceed to separate/parse the bitstream for the down-mixed mono signal and the CDDs.
In operation 510, a fifth OTT decoder 410 may restore the down-mixed mono signal to a down-mixed third virtual sound source and a down-mixed fourth virtual sound source, by using CDDm4 and CDDm3, for example.
In operation 520, a fourth OTT decoder 420 may further restore the down-mixed fourth virtual sound source to a down-mixed first virtual sound source and a down-mixed second virtual sound source, by using CDD41 and CDD42, for example.
In operation 530, a first OTT decoder 430 may restore the down-mixed first virtual sound source to an Lf channel and an Ls channel, by using CDD1Lf and CDD1Ls, for example.
In operation 540, a second OTT decoder 440 may restore the down-mixed second virtual sound source to an Rf channel and an Rs channel, by using CDD2Rf and CDD2Rs, for example.
In operation 550, a third OTT decoder 450 may restore the down-mixed third virtual sound source to a C channel and an LFE channel, by using CDD3C and CDD3LFE, again as examples.
Here, the Lf, Ls, Rf, Rs, C, and LFE channel signals, output by such a system for decoding a multi-channel audio signal illustrated in
Lf = CDDm4 CDD41 CDD1Lf m        Equation 4
Ls = CDDm4 CDD41 CDD1Ls m        Equation 5
Rf = CDDm4 CDD42 CDD2Rf m        Equation 6
Rs = CDDm4 CDD42 CDD2Rs m        Equation 7
C = CDDm3 CDD3C m        Equation 8
LFE = CDDm3 CDD3LFE m        Equation 9
Referring to
Referring to
Here, the 2-channel-synthesis unit 730 may further include sound localization units 731 through 740, a right channel mixing unit 742, and a left channel mixing unit 743, for example.
The time/frequency transform unit 710 may receive an input of the down-mixed mono signal for multi-channel signals, transform the mono signal into the frequency domain, and output the same as a respective frequency domain signal.
The decoding unit 720 may receive respective CDD spatial cues indicating directivity information of the respective virtual sound sources, e.g., generated by at least two channel sound sources among the sound sources of the multi-channels, and the frequency domain down-mixed mono signal, and restore the frequency domain down-mixed mono signal to Lf, Ls, Rf, Rs, C and LFE channel signals, by using the CDD spatial cues.
In
The HRTF generation unit 750 may further receive the CDD spatial cues and HRTFs stored in the HRTF DB 760, and by using the CDD spatial cues and the HRTFs, generate HRTFs corresponding to other channels, i.e., Ls, Rf, Rs, and C channels, for example.
The HRTF generation unit 750 will now be explained in greater detail with reference to the aforementioned Equations 4 through 9. As can be observed from Equations 4 through 9, each channel signal output from the decoding unit 720 may be in a form in which the down-mixed mono signal m is multiplied by respective CDD spatial cues.
In an embodiment, the HRTF generation unit 750 may assign a weighting to a reference HRTF, with the weighting being the ratio between the product of the CDD spatial cues corresponding to the channel of the reference HRTF and the product of the CDD spatial cues corresponding to the channel of the HRTF desired to be generated, among the products multiplied with the down-mixed mono signal in Equations 4 through 9. Thus, the HRTF generation unit 750 may generate an HRTF corresponding to a channel other than that of the reference HRTF. That is, by convolving the ratio of the products of the CDD spatial cues with the reference HRTF, an HRTF corresponding to such another channel may be generated.
For example, in Equation 4, the Lf channel signal, corresponding to the reference HRTF, may be in a form in which the down-mixed mono signal m is multiplied by CDDm4CDD41CDD1Lf. Meanwhile, in Equation 7, the Rs channel signal may be in a form in which the down-mixed mono signal m is multiplied by CDDm4CDD42CDD2Rs. In this case, the HRTF corresponding to the Rs channel may thus be generated by assigning, to the HRTF of the Lf channel, which is the reference HRTF, a weighting equal to the ratio between these two products of CDD spatial cues.
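The CDD-weighted HRTF generation described above can be sketched as below. This is a hedged illustration under stated assumptions: the weight is taken as the ratio of the CDD product of the target (Rs) channel to that of the reference (Lf) channel, the HRTF is modeled as a short FIR impulse response scaled by a scalar weight rather than a full convolution, and all cue values are made up.

```python
# Factors multiplying m for the Lf channel (Equation 4) and Rs channel
# (Equation 7); the numeric values here are purely illustrative.
cdd_m4, cdd41, cdd1_lf = 0.8, 0.7, 0.9
cdd42, cdd2_rs = 0.5, 0.8

hrtf_lf = [1.0, 0.5, 0.25]   # reference HRTF impulse response (Lf channel)

# Ratio of the Rs CDD product to the Lf CDD product; CDDm4 cancels out
weight = (cdd_m4 * cdd42 * cdd2_rs) / (cdd_m4 * cdd41 * cdd1_lf)

# Generated HRTF for the Rs channel: the reference HRTF scaled by the ratio
hrtf_rs = [weight * h for h in hrtf_lf]
```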
The 2-channel-synthesis unit 730 may, thus, receive an input of an HRTF corresponding to each channel from the reference HRTF DB 760 and the HRTF generation unit 750, for example.
In an embodiment, the sound localization units 731 through 740, included in the 2-channel-synthesis unit 730, may further localize channel signals to the positions of the respective channels, by using a respective HRTF, and generate the localized channel signals. Since the reference HRTF is that of the Lf channel in
As illustrated, the right channel mixing unit 742 may then mix signals output from the right channel sound localization units 731, 733, 735, 737, and 739, and the left channel mixing unit 743 may mix signals output from the left channel sound localization units 732, 734, 736, 738, and 740.
The first frequency/time transform unit 770 may further receive an input of the signal mixed in the right channel mixing unit 742, transform the signal to a time domain signal, and output the right channel signal, thereby achieving a synthesizing of the right channel signal.
Similarly, the second frequency/time transform unit 780 may receive an input of the signal mixed in the left channel mixing unit 743, transform the signal to a time domain signal, and output the left channel signal, again thereby achieving a synthesizing of the left channel signal.
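The localization and mixing performed by the 2-channel-synthesis unit can be sketched as follows. This is a minimal time-domain illustration (the unit described above operates in the frequency domain): each channel is convolved with a left-ear and a right-ear HRTF and the results are summed per ear. The two-channel layout, HRTF taps, and signal values are all made up for illustration.

```python
def convolve(signal, hrtf):
    """Naive FIR convolution standing in for HRTF-based sound localization."""
    out = [0.0] * (len(signal) + len(hrtf) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(hrtf):
            out[n + k] += s * h
    return out

def mix(signals):
    """Sample-wise sum, as in the right/left channel mixing units."""
    return [sum(samples) for samples in zip(*signals)]

# Two decoded channel signals and per-channel left/right HRTFs (made up)
channels = {"Lf": [1.0, 0.0], "Rf": [0.0, 1.0]}
hrtf_l = {"Lf": [0.9, 0.1], "Rf": [0.3, 0.2]}
hrtf_r = {"Lf": [0.3, 0.2], "Rf": [0.9, 0.1]}

left  = mix([convolve(sig, hrtf_l[ch]) for ch, sig in channels.items()])
right = mix([convolve(sig, hrtf_r[ch]) for ch, sig in channels.items()])
```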
In operation 810, as an example, the time/frequency transform unit 710 may receive a down-mixed mono signal for multi-channels, and transform the down-mixed mono signal to a respective frequency domain signal.
In operation 820, the decoding unit 720 and the HRTF generation unit 750, for example, may receive CDD spatial cues indicating directivity information of a virtual sound source generated by at least two channel sound sources, among sound sources for the multi-channels.
In operation 830, the decoding unit 720, for example, may restore the frequency domain down-mixed mono signal to respective multi-channel signals, by using the CDD spatial cues.
In operation 840, the HRTF generation unit 750 may receive an HRTF corresponding to a predetermined channel, among the multi-channels, e.g., from the reference HRTF DB 760, and by using the input HRTF and the CDD spatial cues, the HRTF generation unit 750 may generate an HRTF corresponding to a channel other than the predetermined channel.
In operation 850, the 2-channel-synthesis unit 730 may then localize the decoded multi-channel signals to respective positions, by using the HRTF corresponding to the predetermined channel and the generated HRTFs, thereby generating a 2-channel signal.
In operation 860, the first frequency/time transform unit 770 and the second frequency/time transform unit 780 may transform the 2-channel signal to time domain signals.
Thus, according to an embodiment of the present invention, spatial cues indicating the directivity information of virtual sound sources may be generated for multi-channels, and a corresponding down-mixed mono multi-channel audio signal may be encoded and/or decoded.
Since such directivity information of virtual sound sources is determined according to information of channel layouts, and is not dependent on the frequencies of the channel signals, a multi-channel audio signal can be accurately encoded and/or decoded irrespective of frequency region.
In addition to the above described embodiments, embodiments of the present invention can also be implemented through computer readable code/instructions in/on a medium, e.g., a computer readable medium, to control at least one processing element to implement any above described embodiment. The medium can correspond to any medium/media permitting the storing and/or transmission of the computer readable code.
The computer readable code can be recorded/transferred on a medium in a variety of ways, with examples of the medium including magnetic storage media (e.g., ROM, floppy disks, hard disks, etc.), optical recording media (e.g., CD-ROMs, or DVDs), and storage/transmission media such as carrier waves, as well as through the Internet, for example. Here, the medium may further be a signal, such as a resultant signal or bitstream, according to embodiments of the present invention. The media may also be a distributed network, so that the computer readable code is stored/transferred and executed in a distributed fashion. Still further, as only an example, the processing element could include a processor or a computer processor, and processing elements may be distributed and/or included in a single device.
Although a few embodiments of the present invention have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents.
| Number | Date | Country | Kind |
|---|---|---|---|
| 10-2006-0075390 | Aug 2006 | KR | national |