The present invention relates to a method and apparatus for conversion of a multi-channel audio signal into a two-channel audio signal.
Techniques for conversion of multi-channel audio signals into two-channel signals are known, and normally referred to as down-mixing techniques.
With down-mixing, an original multi-channel audio signal can be reproduced by ordinary stereo equipment with two channels and two loudspeaker cabinets. However, the known down-mixing techniques do not allow the listener to recognize the physical origin of the sound, which is normally achieved by reproducing the original multi-channel signal with a multi-channel reproduction system.
An example of a well-known multi-channel audio signal is the so-called surround sound system. Channel surround representation includes, in addition to the two front stereo channels L and R, an additional front center channel C and two surround rear channels Ls, Rs. In the recording phase a physical disposition of microphones is for example as shown in
As known, the down-mixing of the original surround signals (L, R, C, Ls, Rs) into a stereo signal (L′,R′) is made by performing a linear combination of the original signals as for example given by the following formulae:
L′=L+α·C+β·Ls
R′=R+α·C+β·Rs
where α and β are constants, e.g. both equal to 0.5. Each of the two stereo signals L′, R′ is given by a linear combination of the front and rear signals of the same side, and of the center channel C.
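The known down-mix above can be sketched in a few lines of Python. This is only an illustration: the function name `conventional_downmix` and the sample values are assumptions, while α=β=0.5 is taken from the example values just given.

```python
def conventional_downmix(L, R, C, Ls, Rs, alpha=0.5, beta=0.5):
    """Known down-mix: L' = L + alpha*C + beta*Ls, R' = R + alpha*C + beta*Rs."""
    L_out = L + alpha * C + beta * Ls
    R_out = R + alpha * C + beta * Rs
    return L_out, R_out

# One sample per channel; only the center and left-surround channels carry signal.
print(conventional_downmix(L=0.0, R=0.0, C=1.0, Ls=1.0, Rs=0.0))  # prints (1.0, 0.5)
```

Each output depends only on the front and rear channels of its own side plus the center channel, which is the property the following paragraphs identify as the source of the localization problem.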
The L′ and R′ signals are supplied to the left and right loudspeaker of a stereo loudspeaker arrangement for reproduction to a listener, see
Let us now assume a situation in which for example a five-channel recording is made of a sound originated from two speaking persons, the one (S1) standing at a location close to the mLs microphone, and the other (S2) standing at a location close to the mL microphone, as shown in
Upon reproduction of this recording via a stereo loudspeaker arrangement and after down-mixing according to the known technique described above, all the audio signals from the mLs and mL microphones are reproduced by the left loudspeaker L and no correct (separate) localization of the two speaking persons is possible. Namely, the sound signals produced by both speaking persons located at the mLs microphone and the mL microphone are now reproduced by the left loudspeaker L and the listener perceives both persons as being located at the location of the left loudspeaker.
This specific example shows that there are a number of situations in which the down-mixed audio signal does not allow a listener to differentiate between the positions of speaking persons and therefore does not preserve the relative virtual positions of the sound sources with respect to their original positions. This applies more specifically to situations in which, in the generation/recording phase, the sound sources are located close to the front and rear pick-up means of one side only. Another problematic situation may occur when a speaking person walks from one microphone position to another: the movement cannot be perceived in known down-mixing systems.
Therefore it is the main object of the present invention to provide a method and apparatus for conversion of a multi-channel audio signal into a two-channel audio signal which overcomes the above problems.
An object of the present invention is, according to claim 1, a method for conversion of an n-channel audio signal (L, R, Ls, Rs) into a two-channel audio signal (Ro, Lo), where n is an integer and n≧4, comprising the step of generating either one of the two-channel audio signals, right (Ro) or left (Lo), by a combination of:
Preferably in the method, said front (L, R) signal component of the n-channel audio signal of the other side is multiplied in the combination by a factor δ<1, preferably in the range [0, 0.5], more preferably δ=0.25.
Preferably in the method, the other one of the two-channel audio signals, right (Ro) or left (Lo), is generated by a combination of:
A further object of the present invention is an apparatus configured so as to implement the above method.
These and further objects are achieved by means of an apparatus and method for conversion of a multi-channel audio signal into a two-channel audio signal, as described in the attached claims, which form an integral part of the present description.
The invention will become fully clear from the following detailed description, given by way of a mere exemplifying and non limiting example, to be read with reference to the attached drawing figures, wherein:
The same reference numerals and letters in the figures designate the same or functionally equivalent parts.
In the following some specific non limiting examples of embodiment of the method of the present invention will be described.
A first embodiment of the invention applies primarily in a situation like the one described above, with reference to
It is worth noticing that generally the input signals don't necessarily need to be microphone signals. They could be provided by any device capable of generating multichannel (surround) signals, e.g. mixing consoles, computer/artificially generated content (room simulation tools etc.), generic playback devices and so on.
According to the invention, the following formulae for the down-mixing process apply, in which one of the two stereo signals, for example Ro, is modified:
Lo=L+α·C+β·Ls
Ro=R+α·C+β·Rs+δ·L
where Lo and Ro are the left and right components of the down-mixed audio signal; α and β are constants like those described above, and δ is a constant, preferably substantially smaller than 0.5.
A possible range for α and β would be [0, 1], while −3 dB=0.707945 . . . is preferred.
A possible range for δ would be [0, 0.5], while 0.25 is preferred.
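The preferred value quoted for α and β corresponds to −3 dB under the amplitude convention gain = 10^(dB/20). The short Python sketch below checks this; the helper name `db_to_gain` is an assumption introduced here for illustration.

```python
def db_to_gain(db):
    """Convert a level in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

alpha = beta = db_to_gain(-3.0)  # approximately 0.7079, the preferred value for alpha and beta
delta = 0.25                     # preferred value for delta, inside the range [0, 0.5]
print(round(alpha, 4), delta)    # prints 0.7079 0.25
```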
Preferably, the Lo signal is also modified in the following way:
Lo=η·L+α·C+β·Ls
where preferably η≦1, more preferably η=(1−δ).
η is introduced here to approximate the global level of the sound generated by the down-mix signals to the global level of the multi-channel surround signal.
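As a minimal sketch of the modified down-mix for left-side sources, the following Python function applies the formulae for Lo (with η) and Ro (with δ·L) to one sample per channel. The function name `downmix_left_sources` is hypothetical; the default parameter values are the preferred values stated above, and η defaults to (1−δ).

```python
def downmix_left_sources(L, R, C, Ls, Rs,
                         alpha=0.707945, beta=0.707945, delta=0.25, eta=None):
    """Lo = eta*L + alpha*C + beta*Ls ; Ro = R + alpha*C + beta*Rs + delta*L."""
    if eta is None:
        eta = 1.0 - delta  # preferred choice, keeps the overall level of L roughly constant
    Lo = eta * L + alpha * C + beta * Ls
    Ro = R + alpha * C + beta * Rs + delta * L
    return Lo, Ro

# A signal present only in the front-left channel L now reaches both outputs.
print(downmix_left_sources(L=1.0, R=0.0, C=0.0, Ls=0.0, Rs=0.0))  # prints (0.75, 0.25)
```

With η=(1−δ), the two gains applied to L sum to 1, which is one simple way to read the level-matching purpose of η described above.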
This way the sound signal generated by the speaking person located at the mLs microphone (hereafter defined as the first speaking person S1) is reproduced by the left loudspeaker (only). The listener thus perceives the first speaking person as being located at the position of the left loudspeaker L, as for example depicted in
The sound signal generated by the speaking person located at the mL microphone (hereafter defined as the second speaking person S2), however, is reproduced by both the left loudspeaker and the right loudspeaker. As a result, the listener perceives the second speaking person S2 as a so-called phantom source at a position between the left and right loudspeaker. If δ is substantially smaller than 0.5, the location will be at the left of the center line cl, viewed from the listener, as if the sound from speaking person S2 came from a virtual loudspeaker VL, as shown in
So, by feeding the right loudspeaker with a portion of the L signal, it is possible to distinguish the two speaking persons located at the mLs and mL microphone, as they are now perceived by the listener at the position of the left loudspeaker and at the right side of the left loudspeaker, respectively.
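The separation of the two sources can be made concrete by looking at the gain each source receives in the two loudspeaker feeds. The Python sketch below assumes, for illustration only, that S1 appears solely in the Ls channel and S2 solely in the L channel; the helper names `stereo_gains` and `level_difference_db` are hypothetical.

```python
import math

def stereo_gains(source_channel, beta=0.707945, delta=0.25):
    """Return (gain in Lo, gain in Ro) for a source present in one input channel only."""
    eta = 1.0 - delta
    gains = {
        "Ls": (beta, 0.0),  # S1: left loudspeaker only
        "L": (eta, delta),  # S2: both loudspeakers, stronger on the left
    }
    return gains[source_channel]

def level_difference_db(g_left, g_right):
    """Left-minus-right level difference in dB; infinite if the right feed is silent."""
    if g_right == 0.0:
        return math.inf
    return 20.0 * math.log10(g_left / g_right)

print(stereo_gains("Ls"))                                     # prints (0.707945, 0.0)
print(round(level_difference_db(*stereo_gains("L")), 2))      # prints 9.54
```

S1 has no right-channel component at all, while S2 is reproduced about 9.5 dB louder on the left than on the right with the preferred δ=0.25, which is consistent with a phantom image between the left loudspeaker and the center line.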
Likewise in case a recording is made of two speaking persons, the one being positioned close to the mRs microphone and the other positioned close to the mR microphone, a correction is needed to enable a differentiated localization of the two speaking persons during normal stereo reproduction and after down-mixing.
The following formulae for the down-mixing process apply, in which the stereo signal Lo is modified:
Lo=L+α·C+β·Ls+δ·R
Ro=R+α·C+β·Rs
where α, β and δ are constants, like the case above. Also in this case preferably δ is substantially smaller than 0.5.
Preferably, the Ro signal is also modified in the following way:
Ro=η·R+α·C+β·Rs
where preferably η≦1, more preferably η=(1−δ).
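The mirrored case for right-side sources can be sketched in the same way. The function name `downmix_right_sources` is hypothetical, and the defaults are again the preferred values given earlier, with η=(1−δ).

```python
def downmix_right_sources(L, R, C, Ls, Rs,
                          alpha=0.707945, beta=0.707945, delta=0.25, eta=None):
    """Lo = L + alpha*C + beta*Ls + delta*R ; Ro = eta*R + alpha*C + beta*Rs."""
    if eta is None:
        eta = 1.0 - delta  # preferred choice
    Lo = L + alpha * C + beta * Ls + delta * R
    Ro = eta * R + alpha * C + beta * Rs
    return Lo, Ro

# A signal present only in the front-right channel R is split between both outputs.
print(downmix_right_sources(L=0.0, R=1.0, C=0.0, Ls=0.0, Rs=0.0))  # prints (0.25, 0.75)
```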
This way the sound signal generated by the speaker located at the mRs microphone (hereafter defined as the first speaker S1) is reproduced by the right loudspeaker (only). The listener thus perceives the first speaker as being located at the position of the right loudspeaker R.
The sound signal generated by the speaker located at the mR microphone (hereafter defined as the second speaker S2), however, is reproduced by both the left loudspeaker and the right loudspeaker. As a result of this, the listener perceives the second speaker S2 to be located at a position between the left and right loudspeaker.
If δ is substantially smaller than 0.5, the location will be to the right of the center line cl, viewed from the listener, as if the sound from speaker S2 came from a virtual loudspeaker VR (not shown in
So, by feeding the left loudspeaker with a portion of the R signal, it is possible to distinguish the two speaking persons located at the mRs and mR microphone, as they are now perceived by the listener at the position of the right loudspeaker and at the left side of the right loudspeaker, respectively.
From both situations described above, it can be seen that what is maintained is the relative virtual position between the two signal sources, with respect to the original relative position.
Generally we can say that either one of the two-channel audio signals, right Ro or left Lo, is given by a combination of:
Preferably the other one of the two-channel audio signals, right Ro or left Lo, is generated by a combination of:
For n=5, we have A(n)=B(n)=(α·C), i.e. a contribution given by the center channel C, and preferably η=(1−δ).
A second embodiment of the method of the invention applies in a situation with an input multi-channel audio signal with n=4 input channels, where the center channel C is lacking, and we have channels L, R, Ls and Rs as defined above.
In this case the above equations (for the case of n=5) still apply for Ro, Lo, without the term (α·C), therefore A(n)=B(n)=0, and preferably η=(1−δ).
A third embodiment of the method of the invention applies in a situation with an input multi-channel audio signal with n=7 input channels.
With reference to
Like in the previous cases, we have a sound source S1 located at microphone mLs and another sound source S2 located at microphone mL. Now a third sound source (for example a speaker) S3 is located at the left side microphone mLss channel (like in
Also in this case of n=7, the above equations (for the case of n=5) still apply for Ro, Lo. What changes is the value of A(n) and B(n), in which additional contributions come from the left side Lss or the right side Rss channels.
In fact now we have A(n)=α·C+γ·Rss+ε·Lss and B(n)=α·C+γ·Lss+ε·Rss. The additional multiplication factors γ and ε are preferably smaller than 1. Further, preferably η=(1−δ−ε). More preferably δ>ε/γ.
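A possible reading of the n=7 case is sketched below in Python. It rests on assumptions that the text above does not state explicitly: that A(n) replaces the α·C term in Ro and B(n) replaces it in Lo, with the left-side variant of the n=5 equations (Ro receiving δ·L). The function name `downmix_7ch` and the example values γ=0.5 and ε=0.1 are likewise assumptions, chosen only so that γ and ε are smaller than 1 and δ>ε/γ.

```python
def downmix_7ch(L, R, C, Ls, Rs, Lss, Rss,
                alpha=0.707945, beta=0.707945,
                gamma=0.5, delta=0.25, epsilon=0.1):
    """Assumed form: Lo = eta*L + B(n) + beta*Ls ; Ro = R + A(n) + beta*Rs + delta*L."""
    eta = 1.0 - delta - epsilon                   # preferred value of eta for n=7
    A = alpha * C + gamma * Rss + epsilon * Lss   # A(n), assumed to enter Ro
    B = alpha * C + gamma * Lss + epsilon * Rss   # B(n), assumed to enter Lo
    Lo = eta * L + B + beta * Ls
    Ro = R + A + beta * Rs + delta * L
    return Lo, Ro

# A source only in the left side channel Lss reaches both outputs, with a balance
# (gamma : epsilon) different from the balance (eta : delta) applied to L.
print(downmix_7ch(0.0, 0.0, 0.0, 0.0, 0.0, Lss=1.0, Rss=0.0))  # prints (0.5, 0.1)
```

Setting γ=ε=0 and omitting the side channels reduces this sketch to the n=5 equations, and additionally setting α=0 gives the n=4 case described above.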
With reference to
The sound signal generated by the speaker S2 located at the mR or mL microphone is reproduced by both the left loudspeaker and the right loudspeaker. As a result of this, the listener perceives the second speaker S2 to be located at a position between the left L and right R loudspeaker, as from a virtual loudspeaker VL2. The sound signal generated by the speaker S3 located at the mRss or mLss microphone is also reproduced by both the left loudspeaker and the right loudspeaker, with a different balance between the input signals. The listener perceives the third speaker S3 to be located at a position between the left L and right R loudspeaker, as from a virtual loudspeaker VL3, different from that of S2. Also in this case, the relative virtual position between the three signal sources is maintained with respect to the original relative position.
Generally, the presence of the multiplying factors (α, β, δ, η, γ, ε) in the various formulae takes into account the need to control the global level of sound generated by the down-mixed signal, by proportionally reducing the contributions of the original sound components.
As regards examples of apparatus for implementing the method for conversion of a multi-channel audio signal into a two-channel audio signal of the present invention, the following applies.
By applying the method of the invention to the signals in the recording and production phase of a multi-channel (surround) recording, no modification is needed to the installed base of consumer stereo equipment, with a stereo amplifier and stereo loudspeaker arrangement. As long as it receives the modified down-mixed stereo signal, a separate localization of sound sources is possible.
In the case of transmission of an original multi-channel (surround) signal, the method of the invention can be implemented in consumer audio equipment, suitably modified to include means for the implementation of the method.
Preferably additional control signals may be included, during production of the surround signals, to allow the stereo equipment to select which formula to apply and when.
These additional control signals may be included in the metadata that is transmitted together with the multi-channel (surround) signal. For example they can be embedded in one or more of the audio channels, under the masking level of the audio signal, or they can be inserted in an additional channel.
Therefore the down-mixing unit of the consumer audio equipment is adapted to generate the left (Lo) and right (Ro) hand signal components of the stereo audio signal during time intervals defined by occurrences of the additional control signals.
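How a down-mixing unit might switch formulae over time can be sketched as follows. Everything about the control data here is an assumption made for illustration: the text does not specify a format, so the sketch uses a hypothetical list of (start, end, mode) intervals in seconds, with hypothetical mode names "left", "right" and "none".

```python
def select_formula(t, control_intervals, default="none"):
    """Return the mode active at time t, or the default outside every interval."""
    for start, end, mode in control_intervals:
        if start <= t < end:
            return mode
    return default

def downmix_sample(ch, mode, alpha=0.707945, beta=0.707945, delta=0.25):
    """ch is a dict with keys L, R, C, Ls, Rs; mode is 'none', 'left' or 'right'."""
    eta = 1.0 - delta
    if mode == "left":     # sources on the left side: feed part of L to Ro
        Lo = eta * ch["L"] + alpha * ch["C"] + beta * ch["Ls"]
        Ro = ch["R"] + alpha * ch["C"] + beta * ch["Rs"] + delta * ch["L"]
    elif mode == "right":  # sources on the right side: feed part of R to Lo
        Lo = ch["L"] + alpha * ch["C"] + beta * ch["Ls"] + delta * ch["R"]
        Ro = eta * ch["R"] + alpha * ch["C"] + beta * ch["Rs"]
    else:                  # known down-mix without cross-feed
        Lo = ch["L"] + alpha * ch["C"] + beta * ch["Ls"]
        Ro = ch["R"] + alpha * ch["C"] + beta * ch["Rs"]
    return Lo, Ro

# Hypothetical control data: left-side correction from 0 s to 2 s, right-side from 5 s to 6 s.
intervals = [(0.0, 2.0, "left"), (5.0, 6.0, "right")]
sample = {"L": 1.0, "R": 0.0, "C": 0.0, "Ls": 0.0, "Rs": 0.0}
print(downmix_sample(sample, select_formula(1.0, intervals)))  # prints (0.75, 0.25)
print(downmix_sample(sample, select_formula(3.0, intervals)))  # prints (1.0, 0.0)
```

Outside the signalled intervals the sketch falls back to the known down-mix, which is one plausible default; the text leaves the behaviour between control-signal occurrences open.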
With reference to
In
A control circuit CNT1 supplies control signals to enable each of the multiplying factors according to the selection of the specific formula effectively applied, namely depending on the position and/or motion of the sound sources in an audio scene. The control circuit CNT1 receives input signals IN1 for controlling the selection to be applied.
If the conversion from multichannel to two channel is made at the recording and production facility, the control signals can be generated for example by suitably controlling a recording console, according to known criteria.
If the conversion from multichannel to two channel is made at the receiver, the control signals may be generated in the receiver, and the control circuit CNT1 for example suitably demultiplexes or demodulates the additional control signals generated at the recording facility and sent by one of the techniques described above.
In
The control is made by a control circuit CNT2 in an equivalent way as that described with reference to
In
The control is made by a control circuit CNT3 in an equivalent way as that described with reference to
The method of the present invention can be advantageously implemented through a program for computer comprising program coding means for the implementation of one or more steps of the method, when this program is running on a computer. Therefore, it is understood that the scope of protection is extended to such a program for computer and in addition to a computer readable means having a recorded message therein, said computer readable means comprising program coding means for the implementation of one or more steps of the method, when this program is run on a computer.
Hereafter follows as a further explanation a value-table disclosing value ranges for the various multiplying parameters described above.
It is however stressed that signal components need not necessarily be combined in a linear way. Also non-linear combinations of the signal components are possible, such as described in WO2011/057922A1, which discloses a combination to obtain a power corrected summation of two signal components.
Many changes, modifications, variations and other uses and applications of the subject invention will become apparent to those skilled in the art after considering the specification and the accompanying drawings which disclose preferred embodiments thereof. All such changes, modifications, variations and other uses and applications which do not depart from the spirit and scope of the invention are deemed to be covered by this invention.
Further implementation details will not be described, as the man skilled in the art is able to carry out the invention starting from the teaching of the above description.
Number | Date | Country | Kind |
---|---|---|---|
TO2012A0067 | Jan 2012 | IT | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2013/051104 | 1/22/2013 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2013/110589 | 8/1/2013 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
5857026 | Scheiber | Jan 1999 | A |
6052470 | Mouri | Apr 2000 | A |
6493674 | Takamizawa | Dec 2002 | B1 |
7680290 | Ko | Mar 2010 | B2 |
8442237 | Kim | May 2013 | B2 |
20050053249 | Wu et al. | Mar 2005 | A1 |
20060009225 | Herre et al. | Jan 2006 | A1 |
20060013419 | Ko | Jan 2006 | A1 |
20060115100 | Faller | Jun 2006 | A1 |
20070133831 | Kim | Jun 2007 | A1 |
20070258607 | Purnhagen et al. | Nov 2007 | A1 |
20090060204 | Reams | Mar 2009 | A1 |
20090080666 | Uhle et al. | Mar 2009 | A1 |
20110255714 | Neusinger | Oct 2011 | A1 |
Number | Date | Country |
---|---|---|
313857 | Aug 2009 | TW |
0241668 | May 2002 | WO |
2005101371 | Oct 2005 | WO |
2006048227 | May 2006 | WO |
2006054270 | May 2006 | WO |
2010125104 | Nov 2010 | WO |
2011057922 | May 2011 | WO |
2011072729 | Jun 2011 | WO |
Entry |
---|
International Search Report dated Feb. 22, 2013, issued in PCT Application No. PCT/EP2013/051104, filed Jan. 22, 2013. |
Written Opinion dated Feb. 22, 2013, issued in PCT Application No. PCT/EP2013/051104, filed Jan. 22, 2013. |
Number | Date | Country | |
---|---|---|---|
20150036829 A1 | Feb 2015 | US |