This application is based on PCT/EP2018/052160 filed Jan. 29, 2018, claiming priority based on European Patent Application No. 17153650.1 filed Jan. 27, 2017, the entire disclosures of which are incorporated by reference herein.
The present invention relates to a sound processing method and system for panning audio objects on multichannel speaker setups.
Sound panning systems are typical components of the audio production and reproduction chains. They have been commonly found in cinema mixing stages for decades and, more recently, in movie theaters and home theaters, and allow spatializing audio content using a number of loudspeakers.
Modern systems typically take one or more audio input streams comprising audio data and time-dependent positional metadata, and dynamically distribute said audio streams to a number of loudspeakers whose spatial arrangement is arbitrary.
The time-dependent positional metadata typically comprises three dimensional (3D) coordinates, such as Cartesian or spherical coordinates. The loudspeaker spatial arrangement is typically described using similar 3D coordinates.
Ideally, said panning systems account for the spatial location of the loudspeakers and the spatial location of the audio program, and dynamically adapt the output loudspeaker gains, so that the perceived location of the panned streams is that of the input metadata.
Typical panning systems compute a set of N loudspeaker gains given the positional metadata, and apply said N gains to the input audio stream.
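As an illustrative sketch of this gain-application step (the function name and structure are ours, not the patent's):

```python
from typing import List

def apply_gains(samples: List[float], gains: List[float]) -> List[List[float]]:
    """Distribute a mono sample block to N loudspeaker channels.

    Each output channel i is the input block scaled by gains[i]; a real
    renderer would also interpolate gains over time to avoid clicks when
    the positional metadata changes.
    """
    return [[g * s for s in samples] for g in gains]

# Example: a mono block panned to 3 speakers with gains 0.5 / 0.5 / 0.0
channels = apply_gains([1.0, -1.0], [0.5, 0.5, 0.0])
```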
Numerous panning systems technologies have been developed for use in research or theatrical facilities.
Stereophonic systems have been known since Blumlein's work, especially in GB 394325, followed by the system used for the Fantasia movie as described in U.S. Pat. No. 2,298,618, along with other movie-related systems such as WarnerPhonic. The standardization of stereophonic vinyl discs allowed a widespread adoption of stereophonic audio systems.
An adaptation of content-creation systems, especially mixing desks, was then mandatory, as they were only capable of monophonic sound mixing. Switches were added to consoles to direct sounds to one channel, or to the two simultaneously. Such a discrete panning system was widely used until the mid-1960s, when double-potentiometer systems were introduced in order to allow a continuous variation of the stereophonic panning without degrading the original signal.
Based on the same repartition principle, the so-called surround panning systems were thereafter introduced to allow the distribution of a monophonic signal on more than two channels, for instance in the context of movie soundtracks where the use of three to seven channels is common. The most frequently encountered implementation, commonly called “pair-wise panning”, consists of a double stereophonic panning system, one being used for left-right distribution, and the other for front-back distribution. Extending such a system to three dimensions, by adding a third panning system to manage up-down sound repartition between horizontal layers of transducers, is then trivial.
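The pair-wise scheme described above can be sketched as two chained constant-power pan laws (a common textbook formulation; the exact law used by any given console may differ):

```python
import math

def stereo_pan(x: float) -> tuple:
    """Constant-power stereo pan law: x in [0, 1], 0 = fully left, 1 = fully right."""
    theta = x * math.pi / 2.0
    return (math.cos(theta), math.sin(theta))

def pairwise_pan(x: float, y: float) -> list:
    """Pair-wise (quad) panning: one stereo pan distributes left/right (x),
    a second distributes front/back (y). Gains ordered [FL, FR, BL, BR].
    A third pan law on a z coordinate would extend this to 3D layers."""
    left, right = stereo_pan(x)
    front, back = stereo_pan(y)
    return [left * front, right * front, left * back, right * back]

gains = pairwise_pan(0.5, 0.0)  # centered left/right, fully front
```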
However, in some cases, one has to position a transducer between left-right or front-back positions, for example a center channel placed in the middle of the left and right channels and used for dialogue in movie soundtracks. This mandates substantial modifications of the stereophonic panning system. Indeed, for aesthetic or technical reasons, it can be desirable to play back a centered signal either via the left and right channels, via the center channel alone, or even via all three channels at the same time.
The emergence of object-based audio formats such as Dolby Atmos or Auro-Max has recently required additional transducers to be added in intermediate positions, for instance along the walls of a movie theatre, in order to ensure a good localization precision of said audio objects. Such systems are commonly managed by the so-called pair-wise panning systems mentioned above, in which transducers are used in pairs. The use of such pair-wise panning systems can be justified, among other reasons, by the symmetry of the transducer set in the room. Coordinates used in such systems are typically Cartesian, and assume that transducers are positioned along the faces of a room surrounding the audience.
Other approaches were disclosed, such as Vector-Based Amplitude Panning (VBAP), an algorithm that allows computing gains for transducers positioned on the vertices of a triangular 3D mesh. Further developments allow VBAP to be used on arrangements that comprise quadrangular faces (WO2013181272A2), or arbitrary n-gons (WO2014160576).
VBAP was originally developed to produce point-source panning on arbitrary arrangements. In “Uniform spreading of amplitude panned virtual sources” (Proc. 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, N.Y., Oct. 17-20, 1999), Pulkki presented an addition to VBAP, multiple-direction amplitude panning (MDAP), to allow for uniform spread of sources. The method essentially places additional sources around the original source position, which are then panned using VBAP and superimposed on the original panning gains. If non-uniform spreading is needed, or more generally on dense speaker arrangements in the three-dimensional panning case, the number of additional sources can be very high and the computational overhead substantial. MDAP is the method used in the MPEG-H VBAP renderer.
Similarly, in the context of three-dimensional panning methods, WO2014159272 (Rendering of audio objects with apparent size to arbitrary loudspeaker layouts) introduces a source width technique based on the creation of multiple virtual sources around the initial source, the contributions of which are ultimately summed to form transducer gains.
In “An optimization approach to control sound source spread with multichannel amplitude panning” (Proc. CSV24, London, 23-27 Jul. 2017), Franck et al. proposed another method for source width control, based on a convex optimization technique; this method reduces to VBAP in the absence of source width. Some virtual-source methods also involve a decorrelation step, such as WO2015017235.
Ambisonics, which are based on a spherical harmonics representation of a soundfield, have also been extensively used for audio panning (a recent example being given in WO2014001478).
The most important drawback of original Ambisonics panning techniques is that the loudspeaker arrangement must be as regular as possible in 3D space, mandating the use of regular layouts such as loudspeakers positioned at the vertices of platonic solids, or other maximally regular tessellations of the 3D sphere. Such constraints often limit the use of Ambisonic panning to special cases. To overcome these limitations, mixed approaches using, for example, both VBAP and Ambisonics have been disclosed in WO2011117399 and further refined in WO2013143934.
Another issue with Ambisonics is that point-sources are almost never played back by one or two speakers only: because the technology is based on a reconstruction of the soundfield at a given position or in a given space, for a single point-source a large number of speakers will emit signals, possibly phase shifted. While it theoretically allows for a perfect reconstruction of the soundfield in a specific location, this behaviour also means that off-centred listening positions will be somewhat suboptimal in this regard: the precedence effect will, in some conditions, make point-sources be perceived as coming from unexpected positions in space.
Other approaches have also been presented that are able to use totally arbitrary spatial layouts, for example Distance-Based Audio Panning (DBAP) (“Distance-based Amplitude Panning”, Lossius et al., ICMC 2009). In “Evaluation of distance based amplitude panning for spatial audio”, DBAP was shown to yield satisfactory results compared to third-order Ambisonics, especially when the listener is off-centred in regard to the speaker arrangement, and was also shown to perform very similarly to VBAP in most configurations.
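A minimal sketch of the DBAP gain computation, following the formulation in Lossius et al. (the parameter names and the spatial-blur regularisation are ours):

```python
import math

def dbap_gains(source, speakers, rolloff_db=6.0, blur=0.2):
    """Distance-Based Amplitude Panning (after Lossius et al., ICMC 2009).

    Gains fall off with distance as 1/d^(a/2), where the exponent a is set
    by the rolloff in dB per doubling of distance; `blur` regularises the
    distance so a source sitting exactly on a speaker stays finite.
    Gains are normalised to constant total power.
    """
    a = rolloff_db / (20.0 * math.log10(2.0))
    dists = [math.sqrt(sum((s - p) ** 2 for s, p in zip(source, spk)) + blur ** 2)
             for spk in speakers]
    raw = [1.0 / d ** (a / 2.0) for d in dists]
    k = 1.0 / math.sqrt(sum(g * g for g in raw))
    return [k * g for g in raw]

# Square 2D layout; a centred source gets equal gains on all four speakers.
gains = dbap_gains((0.0, 0.0), [(-1, -1), (1, -1), (-1, 1), (1, 1)])
```

Note that the gains depend only on distances, which is exactly why, as discussed below, DBAP cannot account for the local speaker density.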
The most prominent issue with DBAP is the choice of the distance-based attenuation law, which is central to the algorithm. As shown in US20160212559, a constant law can only handle regular arrangements, and DBAP struggles with irregular spatial speaker arrangements because the algorithm does not take the spatial speaker density into account.
Also presented was Speaker Placement Correction Amplitude Panning (SPCAP) (“A novel multichannel panning method for standard and arbitrary loudspeaker configurations”, Kyriakakis et al., AES 2004). Both the DBAP and SPCAP methods only account for the metric between the intended position of the input source and the positions of the loudspeakers, for instance the Euclidean distance in the DBAP case, or the angle between the source and the speakers in the SPCAP case.
One of SPCAP's advantages over the above discrete panning schemes is that it was originally developed to provide a framework for producing wide (non-point-source) sounds.
To this effect, a virtual three-dimensional cardioid, whose principal axis is the direction of the panned sound, is projected onto the spatial loudspeaker arrangement, the value of the cardioid function indirectly yielding the final loudspeaker gains. The tightness of said cardioid function can be controlled by raising the whole function to a given power greater than or equal to 0, so that sounds with user-settable width can be produced.
The cardioid law proposed in Kyriakakis et al., AES 2004, is a power-raised law:

$$r(\theta) = \left(\frac{1+\cos\theta}{2}\right)^{u}, \qquad u = \frac{d}{1-d},$$
where d denotes the spread-related width, which is indicative of the spatial extent of the source with respect to the position of the source, and ranges from 0 to 1.
One key observation with prior art methods such as SPCAP is that the cardioid law as proposed in Kyriakakis et al., AES 2004 is not adequate to produce point-sources: one cannot simulate such focused sources without running into speaker attraction issues.
Another issue with the proposed power-raised law in the original SPCAP algorithm is the discontinuity of said cardioid function at an angle of π: for u≠0, r(π)=0, but for u=0, r(π)=1. This means that a speaker positioned at the exact opposite of the panned source would never produce any sound for values of u close but not equal to 0, but would abruptly produce sound for u=0.
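The discontinuity can be demonstrated numerically, taking the power-raised form ((1 + cos θ)/2)^u as our reading of the law (function and variable names are illustrative):

```python
import math

def cardioid(theta: float, u: float) -> float:
    """Power-raised cardioid, our reading of the SPCAP law:
    r(theta) = ((1 + cos(theta)) / 2) ** u. Note that 0 ** 0 == 1 in
    Python, which reproduces exactly the discontinuity discussed above."""
    return ((1.0 + math.cos(theta)) / 2.0) ** u

# At theta = pi the base is 0: any u > 0 yields 0, but u = 0 jumps to 1.
r_small_u = cardioid(math.pi, 0.01)
r_zero_u = cardioid(math.pi, 0.0)
```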
To illustrate the inadequacy of the cardioid law, the acoustic velocity and energy vectors can be examined. The velocity vector, computed as

$$\vec{V} = \frac{\sum_i g_i \vec{l}_i}{\sum_i g_i},$$

is considered to be a good indicator of how sound localization is perceived under 700 to 1000 Hz, whereas the energy vector, computed as

$$\vec{E} = \frac{\sum_i g_i^2 \vec{l}_i}{\sum_i g_i^2},$$

gives sound localization above 700 to 1000 Hz. In the above, $\vec{l}_i$ is the unitary vector pointed towards the i-th transducer, and $g_i$ is the gain of the i-th transducer.
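The velocity and energy vectors referred to above are standard Gerzon-style metrics and can be computed directly from a gain vector (a sketch, not the patent's code):

```python
import math

def velocity_vector(gains, directions):
    """Gerzon velocity vector: sum(g_i * l_i) / sum(g_i), where l_i is the
    unit vector pointing towards transducer i."""
    norm = sum(gains)
    return tuple(sum(g * d[k] for g, d in zip(gains, directions)) / norm
                 for k in range(len(directions[0])))

def energy_vector(gains, directions):
    """Gerzon energy vector: sum(g_i**2 * l_i) / sum(g_i**2)."""
    norm = sum(g * g for g in gains)
    return tuple(sum(g * g * d[k] for g, d in zip(gains, directions)) / norm
                 for k in range(len(directions[0])))

# Equal gains on two speakers at +/-45 degrees: both vectors point forward,
# with magnitude below 1, indicating a spread (non-point) image.
dirs = [(math.cos(math.pi / 4), math.sin(math.pi / 4)),
        (math.cos(-math.pi / 4), math.sin(-math.pi / 4))]
rV = velocity_vector([0.5, 0.5], dirs)
rE = energy_vector([0.5, 0.5], dirs)
```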
It is an object of the present invention to provide solutions to the issues of all aforementioned standard algorithms, namely:
In a first aspect, the invention provides a method of processing an audio object along an audio axis according to claim 1.
The disclosed invention builds upon a substantially modified version of the original SPCAP and solves the issues mentioned above, while keeping the advantages of the algorithm.
In the disclosed invention, the cardioid law is modified so that it bears no spatial discontinuity when the spread changes, and the spread is no longer constrained to the 0 . . . 1 interval.
In one embodiment, the cardioid law is modified to a pseudo-cardioid law,
where u denotes the spread according to the present invention, which ranges from 0 to infinity. Any other law having the same spatial continuity with variable values of spread can be used instead. An example according to the present invention is presented in
To solve the moving point-source issues presented in
This novel algorithm solves the abovementioned issues:
This algorithm also ensures that even for high values of spread, the acoustic energy and velocity vectors of the panned source are still closely aligned to the intended source position.
As such, novel technical aspects of the invention when compared to the original SPCAP algorithm may relate to the following
In a second aspect, the present invention provides a method of processing an audio object with respect to an inner surface of a parallelepipedic room, according to claim 3.
In a third aspect, the present invention provides a method for processing an audio object with respect to an inner surface of a sphere according to claim 4.
According to further aspects, the present invention provides a system for processing an audio object along an axis according to claims 4-5, a system for processing an audio object with respect to an inner surface of a parallelepipedic room, according to claim 6, and a system for processing an audio object with respect to an inner surface of a sphere according to claim 7.
According to further aspects, the invention offers a use of the method according to claims 1-2 in the system according to claims 5-6, a use of the method according to claim 3 in the system according to claim 7, and a use of the method according to claim 4 in the system according to claim 8.
Preferred embodiments and their advantages are provided in the detailed description and the dependent claims.
The invention relates to a processing method and system for panning audio objects.
In this document, the terms “loudspeaker” and “transducer” are used interchangeably. Furthermore, the terms “spread”, “directivity” and “tightness” may be used interchangeably in some instances but not necessarily in all instances, and all relate to the spatial extent of the audio object with respect to the position of the audio object, and range from 0 to 1.
In this document, the term “source” refers to an audio object taking the role of source.
In a preferred embodiment, for notational convenience, the spread-related width d is replaced by the spread u according to the present invention, which is indicative of the spatial extent of the source with respect to the position of the source and ranges from 0 to infinity, and may relate to the spread-related width d according to following formulas: u=d/(1−d); and, conversely, d=u/(1+u). The spread u is e.g. used throughout the claims. In other embodiments, the present invention is illustrated by using the equivalent spread-related width d, as for instance in the case of
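The conversion between the spread-related width d and the spread u can be written directly from the formulas above (function names are ours):

```python
def width_to_spread(d: float) -> float:
    """Map spread-related width d in [0, 1) to spread u in [0, inf):
    u = d / (1 - d)."""
    return d / (1.0 - d)

def spread_to_width(u: float) -> float:
    """Inverse map: d = u / (1 + u)."""
    return u / (1.0 + u)

u = width_to_spread(0.5)  # d = 0.5 maps to u = 1.0
d = spread_to_width(u)    # round-trips back to 0.5
```

Note that d = 1 has no finite u counterpart, which is consistent with u ranging from 0 to infinity.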
The invention offers a plurality of related embodiments, and may be categorized in three groups of embodiments:
In a first aspect, the invention provides a method of processing an audio object along an audio axis according to claim 1. This relates to a usage for panning on speakers positioned on a single wall, along an axis. In a preferred embodiment, this relates to the following algorithm:
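As a heavily hedged sketch of such axis panning (the patent's actual pseudo-cardioid law is not reproduced here; a generic power-raised cardioid kernel stands in for it, and all names are illustrative):

```python
import math

def axis_pan(x, speakers, u):
    """Sketch of panning along an axis with a cardioid-like kernel.

    x, speakers: positions in [-1, 1]; u: spread exponent (0 = widest).
    Positions are mapped to angles so the kernel
    ((1 + cos(dtheta)) / 2) ** u can be applied, then the gains are
    power-normalised."""
    angles = [math.acos(max(-1.0, min(1.0, s))) for s in speakers]
    a0 = math.acos(max(-1.0, min(1.0, x)))
    w = [((1.0 + math.cos(a - a0)) / 2.0) ** u for a in angles]
    norm = math.sqrt(sum(g * g for g in w))
    return [g / norm for g in w]

# Source placed exactly on the leftmost of three speakers, fairly tight.
gains = axis_pan(-1.0, [-1.0, 0.0, 1.0], u=4.0)
```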
In a second aspect, the present invention provides a method of processing an audio object with respect to an inner surface of a parallelepipedic room, according to claim 3. This relates to a “triple 1D processing”, and relates to a usage with panning on speakers positioned on the room's walls (front, back, left, right and top walls), where independent three-axis spread values are needed.
Preferred inputs are:
In a preferred embodiment, the algorithm relates to the following:
Global algorithm:
In a third aspect, the present invention provides a method for processing an audio object with respect to an inner surface of a sphere according to claim 4. This relates to a usage for panning on speakers positioned on a sphere.
Preferred inputs are:
In a preferred embodiment, the algorithm relates to the following:
Offline part:
Real-time part, for given object coordinates:
and by dividing the initial gains to yield the corrected gains for each speaker:
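A sketch of that offline/real-time split and of the division step, following our reading of the SPCAP-style speaker-density correction (the kernel is a generic stand-in, not the patent's pseudo-cardioid law):

```python
import math

def kernel(cos_angle, u=1.0):
    """Stand-in cardioid-like kernel; the patent's actual law differs."""
    return ((1.0 + cos_angle) / 2.0) ** u

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def correction_factors(directions, u=1.0):
    """Offline part: for each speaker, sum the kernel over all speakers as
    if the source sat on that speaker. Densely packed regions get larger
    factors, which later attenuates their speakers."""
    return [sum(kernel(dot(di, dj), u) for dj in directions)
            for di in directions]

def corrected_gains(source_dir, directions, factors, u=1.0):
    """Real-time part: kernel weights divided by the offline correction
    factors, then power-normalised."""
    w = [kernel(dot(source_dir, d), u) / f
         for d, f in zip(directions, factors)]
    norm = math.sqrt(sum(g * g for g in w))
    return [g / norm for g in w]

# Three speakers at 0, 90 and 180 degrees on the horizontal plane.
dirs = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
factors = correction_factors(dirs)
gains = corrected_gains((1.0, 0.0), dirs, factors)
```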
In a further aspect, the present invention relates to the following considerations.
Typical panning systems compute a set of N loudspeaker gains given the positional metadata, and apply said N gains to the input audio stream.
For instance, Vector-Based Amplitude Panning allows computing said gains for loudspeakers positioned on the vertices of a triangular 3D mesh. Further developments allow VBAP to be used on arrangements that comprise quadrangular faces (WO2013181272A2), or arbitrary n-gons (WO2014160576).
Ambisonics have also been extensively used for audio panning (WO2014001478). The most important drawback in Ambisonics panning is that the loudspeaker arrangement must be as regular as possible in the 3D space, mandating the use of regular layouts such as loudspeakers positioned at the vertices of platonic solids, or other maximally regular tessellations of the 3D sphere. Said constraints limit the use of Ambisonic panning to special cases.
To overcome these problems, mixed approaches using both VBAP and Ambisonics have been disclosed in WO2011117399A1 and further refined in WO2013143934.
Other approaches have also been presented that are able to use totally arbitrary spatial layouts, for example Distance-Based Audio Panning (DBAP) (“Distance-based Amplitude Panning”, Lossius et al., ICMC 2009) or Speaker Placement Correction Amplitude Panning (SPCAP) (“A novel multichannel panning method for standard and arbitrary loudspeaker configurations”, Kyriakakis et al., AES 2004). Those methods only account for the distance between the intended position of the input source and the positions of the loudspeakers, for instance the Euclidean distance in the DBAP case, or the angle between the source and the speakers in the SPCAP case.
In “Evaluation of distance based amplitude panning for spatial audio”, DBAP was shown to yield satisfactory results compared to third-order Ambisonics, especially when the listener is off-centred in regard to the speaker arrangement, and was also shown to perform very similarly to VBAP in most configurations.
Hereby, an important drawback with these distance-based methods is the lack of control over the spatial spread of the input source.
The invention is further described by the following non-limiting examples which further illustrate the invention, and are not intended to, nor should they be interpreted to, limit the scope of the invention.
Particularly,
Particularly,
Particularly,
This example provides an embodiment of the present invention related to rendering of object-based audio. Rendering of object-based audio, and other features such as head tracking for binaural audio, requires the use of a high-quality panning/rendering algorithm.
In this example, LSPCAP is used to perform these tasks.
High-Level Features
LSPCAP is a lightweight, scalable panning algorithm, available in two versions that target any 2D/3D speaker arrangement:
LSPCAP also allows for separate horizontal/vertical control over audio object focus/spread. LSPCAP ensures better directional precision (energy and amplitude vectors) than pair-wise, VBAP or HOA panning, even for wide (spread) audio objects.
Underlying Technologies
LSPCAP works by coupling a modified Speaker Placement Correction Amplitude Panning (SPCAP) algorithm with a generalized Vector-Based Amplitude Panning (VBAP) along with specific energy vector maximization.
Usages of the Enhanced LSPCAP Algorithm
Two modes of the algorithm were developed: a full-3D listener-centric and a layered 3D room-centric mode.
Listener-Centric Mode
This version accepts spherical or polar coordinates for objects, and uses a spherical speaker arrangement, which advantageously should be as regular as possible. The following arrangements are implemented:
For each arrangement, the achievable HOA order, should an HOA renderer be used with this arrangement, is shown. Next to it, the equivalent HOA order achieved by LSPCAP is shown, which merges the following metrics over the whole sphere and frequency range: ITD precision, ILD precision.
The precision of the directional rendering rises with the number of speakers; of course, the computational complexity rises as well, and this is especially important when using LSPCAP for binaural rendering.
This version will mostly be used as an intermediate rendering between panning of objects and binaural rendering (e.g. Auro-Headphones), as spherical, regular speaker layouts are impractical in most real-world situations. Its precision is better, ITD- and ILD-wise, than that of the achievable HOA rendering for a given layout.
Room-Centric Mode
The room-centric mode accepts Cartesian coordinates, and is especially targeted for panning of objects to real speaker setups in a room.
Internally, it is built from a number of layers of a planar (2D) version of SPCAP.
Each layer accepts only an azimuth angle for the objects, and describes the speakers with their azimuth angles as well. These azimuth angles are derived from the X-Y coordinates of the objects and speakers.
The Z coordinates are used to pan between successive layers. The Top layer has a special behavior: a dual SPCAP-2D algorithm is run on the X-Z and Y-Z planes (the top layer speakers are then projected on those two planes), and the results are merged to form the top layer gains.
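The derivation of per-layer azimuths and the Z-based crossfade between successive layers can be sketched as follows (the linear crossfade and the azimuth convention are assumptions, not the patent's exact implementation):

```python
import math

def object_azimuth(x: float, y: float) -> float:
    """Azimuth (radians) derived from the object's X-Y coordinates,
    measured here from the +Y (front) axis; the convention is an
    assumption."""
    return math.atan2(x, y)

def layer_weights(z: float, layer_heights):
    """Linear crossfade between the two layers enclosing height z; a
    sketch of 'the Z coordinate pans between successive layers'.
    Below the bottom layer or above the top layer, the nearest layer
    takes the full weight."""
    weights = [0.0] * len(layer_heights)
    if z <= layer_heights[0]:
        weights[0] = 1.0
        return weights
    for i in range(len(layer_heights) - 1):
        lo, hi = layer_heights[i], layer_heights[i + 1]
        if lo <= z <= hi:
            t = (z - lo) / (hi - lo)
            weights[i] = 1.0 - t
            weights[i + 1] = t
            return weights
    weights[-1] = 1.0
    return weights

# Object halfway between the first two of three layers.
w = layer_weights(0.5, [0.0, 1.0, 2.0])
```

Each layer's 2D SPCAP gains would then be scaled by its layer weight before summation.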
Parameters
Listener-Centric Version
Speaker Layout Setup
The listener-centric loudspeaker setup can be defined by means of a discrete speaker density parameter, ranging from 1 to 8, which controls the regular spherical arrangement as well as the amount of speakers in the layout (see also elsewhere in this document).
Source Parameters
Room-Centric Mode
Speaker Layout Setup
The room-centric LSPCAP algorithm only supports speakers positioned on walls of a virtual room. Therefore, for each speaker, at least one of the X, Y, Z parameters must have an absolute value of 1.0.
Source Parameters
The Zone Control parameter allows controlling which speakers (or speaker zones) will be used by the panned source. The exact meaning of the parameter depends on the actual speaker layout. In the following table, the active speakers are given for a 7.1 planar layout; the same principle applies to other layouts, including Auro-3D layouts. New zones can be implemented as needed in the SDK. This may relate to the TpFL/TpFR being at azimuth angles of +45°/−45°.
2D Version Algorithm
Usage:
Further aspects and potential extensions relate to zone control and speaker groups definition.
3D Version
Usage:
Number | Date | Country | Kind |
---|---|---|---|
17153650 | Jan 2017 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2018/052160 | 1/29/2018 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2018/138353 | 8/2/2018 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
2298618 | Garity et al. | Oct 1942 | A |
20160212559 | Mateos Sole et al. | Jul 2016 | A1 |
Number | Date | Country |
---|---|---|
394325 | Jun 1933 | GB |
2011117399 | Sep 2011 | WO |
2013143934 | Oct 2013 | WO |
2014001478 | Jan 2014 | WO |
2014159272 | Oct 2014 | WO |
2014160576 | Oct 2014 | WO |
2015017235 | Feb 2015 | WO |
Entry |
---|
Ramy Sadek et al., “A Novel Multichannel Panning Method for Standard and Arbitrary Loudspeaker Configurations”, Convention Paper 6263, AES 117th Convention, Oct. 1, 2004, pp. 1-5. |
Ville Pulkki, “Uniform spreading of amplitude panned virtual sources”, Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Oct. 1, 1999, pp. 187-190. |
Written Opinion for PCT/EP2018/052160, dated Mar. 27, 2018. |
International Search Report for PCT/EP2018/052160, dated Mar. 27, 2018. |
Number | Date | Country | Kind
---|---|---|---
20190373394 | Dec 2019 | US | A1