The present invention relates to a method and apparatus for down-mixing of a multi-channel audio signal.
Techniques for conversion of multi-channel audio signals into two-channel signals are known, and normally referred to as down-mixing techniques.
Down-mixing makes it possible to reproduce an original multi-channel audio signal on ordinary stereo equipment with two channels and two loudspeaker cabinets.
A well-known example of a multi-channel audio signal is the so-called surround sound signal. A five-channel surround representation includes, in addition to the two front stereo channels L and R, a front center channel C and two rear surround channels Ls and Rs.
Those surround signals are supplied during reproduction to corresponding loudspeakers located in a listening room, for example as shown in
As is known, the down-mixing of the original surround signals (L, R, C, Ls, Rs) into a stereo signal (Lo, Ro) is performed by forming a linear combination of the original signals, for example according to the following formulae:
Lo=L+α·C+β·Ls
Ro=R+α·C+β·Rs
where α and β are constants, smaller than 1, preferably both equal to 0.7.
Each of the two stereo signals Lo, Ro is given by a linear combination of the front and rear signals of the same side, and of the center channel C.
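By way of illustration only, the conventional down-mix given by the two formulae above can be sketched in Python as follows. The function name `conventional_downmix` and the representation of each channel as an equally long list of samples are assumptions made for this sketch; α and β default to the preferred value 0.7.

```python
def conventional_downmix(L, R, C, Ls, Rs, alpha=0.7, beta=0.7):
    """Conventional 5-to-2 down-mix.

    Lo = L + alpha*C + beta*Ls
    Ro = R + alpha*C + beta*Rs
    Each argument is an equally long sequence of samples (assumption of this sketch).
    """
    Lo = [l + alpha * c + beta * ls for l, c, ls in zip(L, C, Ls)]
    Ro = [r + alpha * c + beta * rs for r, c, rs in zip(R, C, Rs)]
    return Lo, Ro


# Hypothetical single-sample input, chosen only to make the arithmetic visible.
Lo, Ro = conventional_downmix(L=[1.0], R=[0.5], C=[1.0], Ls=[0.0], Rs=[1.0])
```

In this sketch no filtering is applied to C, Ls or Rs, so the rear and center components are simply scaled and added to the front channel of the same side.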
The Lo and Ro signals are supplied to the left and right loudspeaker of a stereo loudspeaker arrangement for reproduction to a listener, see
It should be noted that the publication in the Proceedings of the AES, vol. 121, January 2006, titled ‘Binaural simulation of complex acoustic scenes for interactive audio’ by Jean-Marc Jot et al. discloses a complicated signal processing system for binaural simulation of acoustic scenes, that is, a system in which the sound can be made to come from specifically chosen directions, so that a listener who hears the sound via headphones obtains a ‘correct’ spatial sensation. Also a presentation via (two, see
It should further be noted that EP-A177790 discloses a car audio reproduction system for creating a virtual centre sound source by means of a left and a right side loudspeaker. Again, the system makes use of transfer functions between a position on the left side and the right ear of the listener, and of transfer functions between a position on the right side and the left ear of the listener. This is again contrary to the present application. Again, the present application offers the same advantages over the known circuit described in EP-A1777902.
Therefore it is the main object of the present invention to provide a downmixing method and apparatus which at least partially avoids such distortions.
An object of the present invention is, according to claim 1, a method for down-mixing of an m-channel audio signal (L, R, C, Ls, Rs, Rss, Lss) into an n-channel audio signal (Ro, Lo, Rso, Lso), where m is an integer with m>n and n is an integer with n≥2, comprising the step of generating one of the n-channel audio signals of one side (right or left) of a listener (Ro, Lo, Rso, Lso), by a combination of:
A further object of the present invention is an apparatus, according to claim 2, for down-mixing an m-channel audio signal (L, R, C, Ls, Rs, Rss, Lss) into an n-channel audio signal (Ro, Lo), where m is an integer with m>n and n is an integer with n≥2, comprising
Further objects are apparatuses where m=3, or m=4, or m=5, or m=6, or m=7, and n=2, or n=4, complying with the characteristics of the above defined apparatus.
These and further objects are achieved by means of an apparatus and method for down-mixing of a multi-channel audio signal into a two-channel audio signal, as described in the attached claims, which form an integral part of the present description.
The invention is based on the recognition that when, for example, the Ls and Rs signal components are combined with the left-front and right-front signals, respectively, in the downmixing process, those Ls and Rs signals are perceived from the “left-front” and “right-front” directions, respectively, whereas they are normally (in the five-channel reproduction situation) perceived from the “back-left” and “back-right” directions, respectively.
This results in distortions in the perceived downmixed signals, which prevent the listener from recognizing the real physical origin of the sound, something that is normally achieved by reproducing the original multi-channel signal with a multi-channel reproduction system. By pre-filtering, as claimed, the signals from those positions that are ‘lost’ in the downmixing process, a relocation can be obtained which improves the listener's perception, so that these signal components can at least substantially be perceived from their original positions.
The invention will become fully clear from the following detailed description, given by way of a mere exemplifying and non-limiting example, to be read with reference to the attached drawing figures, wherein:
The same reference numerals and letters in the figures designate the same or functionally equivalent parts.
The method of the present invention aims to correct for the above described distortions, by preprocessing the m-channel signal components before they are combined into the Lo and Ro signals, respectively.
A typical configuration provides for a situation like the one described above, with reference to
The input multi-channel audio signal may contain different numbers of channels, namely m=3, with the R, L and C signal components; m=4, with R, L, Rs and Ls; m=5, with all of the L, R, C, Ls and Rs signal components; and so on for higher values of m.
In the following, some specific non-limiting example embodiments of the method of the present invention will be described.
A first embodiment of the invention, where m=3 (L, R, C) and n=2 (Lo, Ro), shown in
H(c−re)=H1*H(fr−re),
and
H(c−le)=H2*H(fl−le)
where H(c−re) and H(c−le) are the frequency characteristics of the transmission paths between the position of the front-center loudspeaker and the positions of the right ear and left ear, respectively, of the listener, in an m-channel surround reproduction situation, and
H(fr−re) is the frequency characteristic of the transmission path between the position of the “front-right” loudspeaker and the position of the right ear of the listener, in an n-channel stereo reproduction situation, and
H(fl−le) is the frequency characteristic of the transmission path between the position of the “front-left” loudspeaker and the position of the left ear of the listener, in an n-channel stereo reproduction situation.
Another embodiment of the invention where m=4 (L, Ls, R, Rs) and n=2 (Lo, Ro) is shown in
More precisely, the signal Rs is preprocessed by pre-filtering Rs by a third filtering function H3, which third filter satisfies the following formula:
H(br−re)=H3*H(fr−re)
and Ls is preprocessed by prefiltering Ls by a fourth filter H4, which fourth filter satisfies the following formula:
H(bl−le)=H4*H(fl−le),
where
H(bl−le) is the frequency characteristic of the transmission path between the position of the “back-left” loudspeaker and the position of the left ear of the listener, in the m-channel surround reproduction situation,
H(br−re) is the frequency characteristic of the transmission path between the position of the “back-right” loudspeaker and the position of the right ear of the listener, in the m-channel surround reproduction situation,
H(fl−le) and H(fr−re) are defined above.
By doing so, the listener receives the following Rs signal component at the right ear in the stereo reproduction situation (n=2):
Rs·H3·β·H(fr−re)=Rs·H(br−re)/H(fr−re)·β·H(fr−re)=β·Rs·H(br−re),
which is what the listener's right ear would have perceived in the m-channel surround reproduction situation (m=5).
Since an exact solution for H3 in general is not feasible or does not exist, an approximation H3′ is to be used, where
H3′·H(fr−re)≈H(br−re).
An equivalent calculation is of course valid for the perception of the Ls signal component by the listener's left ear:
Ls·H4·β·H(fl−le)=Ls·H(bl−le)/H(fl−le)·β·H(fl−le)=β·Ls·H(bl−le),
with an equivalent approximation H4′, where
H4′·H(fl−le)≈H(bl−le).
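The cancellation used in the above calculation can be checked numerically at a single frequency. The following is a minimal sketch, assuming that the product in the formulae is read as a per-frequency multiplication of complex transfer-function values; all numerical values and variable names are hypothetical and serve only as an illustration.

```python
# Hypothetical complex transfer-function values at one frequency bin.
H_fr_re = 0.8 + 0.2j   # front-right loudspeaker -> right ear (stereo set-up)
H_br_re = 0.3 - 0.4j   # back-right loudspeaker -> right ear (surround set-up)
beta = 0.7             # multiplication factor for the rear component
Rs = 1.0 + 0.0j        # spectrum value of the Rs component at this bin

# Ideal pre-filter, obtained from H(br-re) = H3 * H(fr-re).
H3 = H_br_re / H_fr_re

# What reaches the right ear in stereo reproduction: Rs * H3 * beta * H(fr-re).
stereo_at_ear = Rs * H3 * beta * H_fr_re

# What would reach the right ear in surround reproduction: beta * Rs * H(br-re).
surround_at_ear = beta * Rs * H_br_re

# The two values agree up to floating-point rounding.
difference = abs(stereo_at_ear - surround_at_ear)
```

The division is only defined where H(fr−re) is non-zero, which is one way of seeing why an approximation such as H3′ or H4′ may be needed in practice.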
Generally, the down-mixing method generates a right hand channel component (Ro) of the n-channel audio signal in the following way:
Ro=δ·R+β·H3·Rs+A(m)
where R is the front right signal component of the m-channel audio signal, δ and β are multiplication factors preferably ≤1, and A(m) is a term dependent on m.
In a similar way the down-mixing unit generates the left hand channel component (Lo) of the n-channel audio signal in the following way:
Lo=δ·L+β·H4·Ls+B(m)
where L is the front left signal component of the m-channel audio signal, δ and β are multiplication factors preferably ≤1, and B(m) is a term dependent on m.
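The general form of one output channel can be sketched in the time domain as follows, assuming that each filtering function is realized as a short FIR filter and that all channels are equally long lists of samples. The function names `fir` and `downmix_side`, the `extra_terms` parameter standing in for A(m) or B(m), and all coefficient values are hypothetical choices made for this sketch.

```python
def fir(coeffs, signal):
    """Direct-form FIR filtering: y[n] = sum_k coeffs[k] * signal[n - k].

    The output is truncated to the length of the input signal.
    """
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * signal[n - k]
        out.append(acc)
    return out


def downmix_side(front, surround, h_surround, delta, beta, extra_terms=()):
    """One side of the down-mix: delta*front + beta*h_surround(surround) + A(m).

    extra_terms is a sequence of (gain, fir_coeffs, signal) triples that
    stands in for the m-dependent term A(m) or B(m).
    """
    filtered = fir(h_surround, surround)
    out = [delta * f + beta * s for f, s in zip(front, filtered)]
    for gain, coeffs, sig in extra_terms:
        for n, v in enumerate(fir(coeffs, sig)):
            out[n] += gain * v
    return out


# Hypothetical 3-sample signals and 2-tap filters, for illustration only.
R = [1.0, 0.0, 0.0]
Rs = [0.0, 1.0, 0.0]
C = [1.0, 1.0, 1.0]
h3 = [0.5, 0.25]   # placeholder for an approximation of H3
h1 = [1.0, 0.0]    # placeholder for an approximation of H1
Ro = downmix_side(R, Rs, h3, delta=1.0, beta=0.5, extra_terms=[(0.5, h1, C)])
```

Passing an empty `extra_terms` corresponds to a case in which A(m) is absent; the left-hand channel would be obtained by the same call with L, Ls and the left-side filters.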
For m=3 (the embodiment of
Ro=δ·R+α·H1·C
Lo=δ·L+α·H2·C
where A(m)=α·H1·C and B(m)=α·H2·C, and the contributions relating to Rs and Ls are not present.
For m=4 (the embodiment of
For m=5 (the embodiment of
A further embodiment of the method of the invention (see
With reference to
In this case of m=7, the method of the invention provides for a fifth signal pre-processing with a filtering function (H5) for pre-processing the side right signal component of the m-channel audio signal (Rss) prior to down-mixing the m-channel audio signal into the n-channel stereo audio signal, the pre-processing step on the side right signal component being equivalent to a pre-filtering step; the filtering function H5 at least substantially satisfies the following formula:
H(sr−re)=H5*H(fr−re),
where H(sr−re) is the frequency characteristic of the transmission path between the position of the “side-right” loudspeaker Rss and the position of the right ear of the listener, in the seven channel surround reproduction situation, and
H(fr−re) is the above-defined frequency characteristic of the transmission path between the position of the “front-right” loudspeaker and the position of the right ear of the listener, in an n-channel stereo reproduction situation.
In addition the method of the invention provides for a sixth signal pre-processing with a filtering function (H6) for pre-processing the side left signal component of the m-channel audio signal (Lss) prior to down-mixing the m-channel audio signal into the n-channel stereo audio signal, the pre-processing step on the side left signal component being equivalent to a pre-filtering step; the filtering function H6, at least substantially satisfies the following formula:
H(sl−le)=H6*H(fl−le),
where H(sl−le) is the frequency characteristic of the transmission path between the position of the “side-left” loudspeaker Lss and the position of the left ear of the listener, in the situation of m=7, and
H(fl−le) is the above-defined frequency characteristic of the transmission path between the position of the “front-left” loudspeaker and the position of the left ear of the listener, in an n-channel stereo reproduction situation.
In the case of m=7, A(m)=α·H1·C+γ·H5·Rss and B(m)=α·H2·C+γ·H6·Lss.
Further embodiments of the method of the invention apply in a situation where the signals of the “side right” signal component and the “side left” signal components of the m-channel audio signal are pre-processed and subsequently combined with the “back right” signal component and the “back left” signal component and fed to the right and left surround loudspeakers of an n-channel audio reproduction arrangement. This is shown in the embodiment of
H(sr−re)=H7*H(br−re),
where H(sr−re) is the frequency characteristic of the transmission path between the position of the “side-right” loudspeaker and the position of the right ear of the listener, in an m-channel surround reproduction situation, and
H(br−re) is the frequency characteristic of the transmission path between the position of the “back-right” loudspeaker Rso and the position of the right ear of the listener, in an n-channel reproduction situation.
In these cases, the method of the invention provides further for an eighth signal pre-processing with a filtering function (H8) for pre-processing a side left signal component of the m-channel audio signal (Lss) prior to down-mixing the m-channel audio signal into the n-channel audio signal, the pre-processing step on the side left signal component being equivalent to a pre-filtering step; the filtering function H8 at least substantially satisfies the following formula:
H(sl−le)=H8*H(bl−le),
where H(sl−le) is the frequency characteristic of the transmission path between the position of the “side-left” loudspeaker and the position of the left ear of the listener, in an m-channel surround reproduction situation, and
H(bl−le) is the frequency characteristic of the transmission path between the position of the “back-left” loudspeaker Lso and the position of the left ear of the listener, in an n-channel reproduction situation.
In the above cases further components of the n-channel signal are generated, namely:
Rso=ε·Rs+ζ·H7·Rss
and
Lso=ε·Ls+ζ·H8·Lss,
where
Rso is the composite signal applied to the back right loudspeaker, Lso is the composite signal applied to the back left loudspeaker, and ε and ζ are multiplication factors, preferably ≤1.
In this case preferably:
Ro=δ·R
Lo=δ·L
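As a minimal sketch, assuming time-domain signals of equal length and FIR realizations of H7 and H8, the four output signals of this embodiment could be computed as follows. The function names, the filter coefficients and the sample values are hypothetical.

```python
def fir(coeffs, signal):
    """Direct-form FIR filtering, truncated to the length of the input."""
    return [sum(c * signal[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
            for n in range(len(signal))]


def downmix_6_to_4(R, Rs, Rss, L, Ls, Lss, h7, h8, delta, eps, zeta):
    """Side channels are pre-filtered and added to the back channels.

    Ro  = delta * R                  Lo  = delta * L
    Rso = eps * Rs + zeta * H7(Rss)  Lso = eps * Ls + zeta * H8(Lss)
    """
    Ro = [delta * r for r in R]
    Lo = [delta * l for l in L]
    Rso = [eps * a + zeta * b for a, b in zip(Rs, fir(h7, Rss))]
    Lso = [eps * a + zeta * b for a, b in zip(Ls, fir(h8, Lss))]
    return Ro, Rso, Lo, Lso


# Hypothetical two-sample signals and filters, for illustration only.
Ro, Rso, Lo, Lso = downmix_6_to_4(
    R=[1.0, 0.0], Rs=[0.0, 1.0], Rss=[1.0, 0.0],
    L=[0.0, 1.0], Ls=[1.0, 0.0], Lss=[0.0, 1.0],
    h7=[0.5, 0.5], h8=[1.0],
    delta=0.5, eps=0.5, zeta=0.5,
)
```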
In this embodiment, the side-left and side-right loudspeaker signals are added to the back-left and back-right loudspeaker signals, respectively. So, supposing m=6 (R, Rs, Rss, L, Ls, Lss), the downmix results in n=4 (R, Rso, L, Lso), as shown in
In a still further embodiment, starting from the previous embodiment, a further center component C is present in the m-channel signal, which is added to the Ro and Lo components of the n-channel signal after filtering by the above-mentioned filtering functions H1 and H2, respectively, obtaining:
Ro=δ·R+H1·C;
Lo=δ·L+H2·C
Generally, the presence of the multiplying factors (α, β, δ, η, γ, ε, ζ) in the various formulae takes into account the need to control the global level of sound generated by the down-mixed signal, by proportionally reducing the contributions of the original sound components. Therefore each one of them is set to a value lower than 1.
A preferred way to realize the filter functionality of the filtering functions H1, H2, H3, H4, H5, H6 is by implementing a discrete-time finite-impulse-response (FIR) filter whose filter coefficients are fixed and have been calculated in advance.
The filter coefficients can be derived from the filters' desired impulse responses K1, K2, K3, K4, K5, K6 respectively.
For example, for a non-recursive direct-form filter, the coefficients vector is identical to the impulse response function. K1 and K2 are calculated as described later.
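The identity between coefficient vector and impulse response can be illustrated with a small sketch: feeding a unit impulse into a non-recursive direct-form filter returns the coefficients themselves. The function name `fir_direct_form` and the coefficient values are hypothetical.

```python
def fir_direct_form(coeffs, signal):
    """Non-recursive direct-form FIR filter: y[n] = sum_k coeffs[k] * x[n - k]."""
    return [sum(c * signal[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
            for n in range(len(signal))]


coeffs = [0.5, -0.25, 0.125]                   # hypothetical filter coefficients
impulse = [1.0] + [0.0] * (len(coeffs) - 1)    # unit impulse, as long as the filter
response = fir_direct_form(coeffs, impulse)    # impulse response of the filter
```

Here `response` equals `coeffs`, so a desired impulse response can be used directly as the coefficient vector of such a filter.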
The calculation of K1 is based on transmission path impulse responses K(fr−re) and K(br−re), which are the time-domain counterparts of the corresponding transmission path frequency characteristics H(fr−re), H(br−re).
The same applies to the calculation of K2 based on K(fl−le) and K(bl−le), corresponding to H(fl−le) and H(bl−le), respectively.
The calculation results K1 and K2 are the time-domain counterparts of the filtering functions H1 and H2, respectively.
A common method to determine said transmission path impulse responses is by directly recording them in a measuring setup with a loudspeaker and a microphone, positioned appropriately in a room, preferably an anechoic chamber.
The use of a dummy-head microphone is the common, and in this case preferred, way to obtain head-related impulse responses (HRIR), which are the time-domain counterparts of head-related transfer functions (HRTF).
A preferred method to calculate K1 uses the known concept of least-squares approximation of the linear equation system that expresses the convolution of a filter with an input signal, identified with an output signal.
This method belongs to the concepts also known as inverse filtering or deconvolution and is described in short as follows.
The following applies:
K(fr−re)(*)K1=K(br−re),
where (*) is the convolution operator (denoting discrete convolution).
When expanded to an equation system in matrix form, the left-hand side of the equation becomes a Toeplitz matrix formed from K(fr−re), multiplied by a vector equivalent to K1, and the right-hand side is a vector equivalent to K(br−re).
For this linear equation system, one of the known least-squares approximate solution methods is then applied, for example one based on a singular value decomposition (SVD). This results in a suitable solution for K1.
The same calculation is performed for K2, with:
K(fl−le)(*)K2=K(bl−le).
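The least-squares calculation described above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not a definitive implementation: the function names `convolution_matrix` and `least_squares_filter`, the parameter `n_taps` (the chosen filter length) and the sample impulse responses are hypothetical, and `numpy.linalg.lstsq` is used here as one available SVD-based least-squares solver.

```python
import numpy as np


def convolution_matrix(k, n_taps):
    """Toeplitz matrix T such that T @ x is the full discrete convolution of k with x."""
    k = np.asarray(k, dtype=float)
    T = np.zeros((len(k) + n_taps - 1, n_taps))
    for col in range(n_taps):
        # Each column holds a copy of k shifted down by one more sample.
        T[col:col + len(k), col] = k
    return T


def least_squares_filter(k_source, k_target, n_taps):
    """Least-squares solution x of k_source (*) x ~= k_target with n_taps coefficients."""
    T = convolution_matrix(k_source, n_taps)
    # Zero-pad (or truncate) the target to the length of the full convolution.
    b = np.zeros(T.shape[0])
    m = min(len(k_target), T.shape[0])
    b[:m] = np.asarray(k_target, dtype=float)[:m]
    x, *_ = np.linalg.lstsq(T, b, rcond=None)
    return x


# Hypothetical impulse responses: the target is built from a known filter
# so that the recovered coefficients can be compared with it.
k_fr_re = np.array([1.0, 0.5, 0.25])
true_filter = np.array([0.6, -0.3])
k_br_re = np.convolve(k_fr_re, true_filter)
k1 = least_squares_filter(k_fr_re, k_br_re, n_taps=2)
```

In this constructed example an exact solution exists, so `k1` reproduces `true_filter` up to rounding; with measured impulse responses the result would in general only be an approximation, and the choice of `n_taps` is a design decision.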
As far as examples of apparatus for implementing the method of the present invention for conversion of an m-channel audio signal into an n-channel audio signal are concerned, the following can apply.
In the case of transmission of an original m-channel signal, the method of the invention can be implemented in consumer audio equipment, suitably modified to include means for the implementation of the method.
With reference to
The method of the present invention can be advantageously implemented through a computer program comprising program coding means for the implementation of one or more steps of the method when the program is run on a computer. Therefore, it is understood that the scope of protection extends to such a computer program and, in addition, to a computer readable means having a recorded message therein, said computer readable means comprising program coding means for the implementation of one or more steps of the method when the program is run on a computer.
Many changes, modifications, variations and other uses and applications of the subject invention will become apparent to those skilled in the art after considering the specification and the accompanying drawings which disclose preferred embodiments thereof.
Further implementation details will not be described, as the person skilled in the art is able to carry out the invention starting from the teaching of the above description.
Number | Date | Country | Kind |
---|---|---|---|
TO2012A000193 | Mar 2012 | IT | national |
TO2012A000886 | Oct 2012 | IT | national |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2013/054336 | 3/5/2013 | WO | 00 |