Aspects in the disclosure here relate generally to a system and method for performing panning for an arbitrary loudspeaker setup.
Ambisonics is a surround sound technique based on spherical Fourier expansion of the sound field. Ambisonics is used to represent a 3D sound field for scene-based audio. This representation can be performed using first order Ambisonics (FOA) or higher order Ambisonics (HOA). Within the context of this disclosure, the term Ambisonics or ambisonic content refers to any order of Ambisonics or ambisonic content. A sound source can either be encoded in an ambisonic format, or it may be recorded via a special microphone. Such a representation of the sound field may then be transmitted to an end user machine where it is decoded for playback. Conventional ambisonic decoders require an optimally placed, fixed loudspeaker setup, which means that the decoders cannot perform well with arbitrary loudspeaker setups.
Panning is the distribution of a sound signal into a new stereo or multi-channel sound field, as determined by a pan control setting that may for example be in the range from a hard left position to a hard right position. Existing panning techniques have some limitations. For example, the existing panning techniques do not perform well when loudspeakers are not distributed in a way that fully encompasses the listening position (e.g., horizontal loudspeaker setups, frontal only setups, etc.). Existing panning techniques only perform well when the panning trajectory is within the span of the loudspeakers. Some loudspeaker setups impose limitations on more complex panning trajectories, such as when trying to pan a sound source in 3-dimensional space while the loudspeaker setup spans only a 2-dimensional space.
Generally, aspects of the disclosure here relate to a system and method for performing panning for an arbitrary loudspeaker setup. One aspect is to modify the vector base amplitude panning (VBAP) technique to improve the panning behavior of sound sources in arbitrary loudspeaker setups by optimizing the placement of virtual loudspeakers in the loudspeaker setup. Further, in some aspects, contrary to existing techniques, the energy of the intended sound field is preserved.
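For orientation, the conventional two-dimensional VBAP gain calculation for a single loudspeaker pair may be sketched as follows. This is a minimal illustration of the standard pairwise technique, not the modified technique of this disclosure. The function names `unit_vector` and `vbap_pair_gains`, and the use of azimuth angles in degrees, are assumptions made only for this example.

```python
import math

def unit_vector(azimuth_deg):
    """Unit direction vector in the horizontal plane for an azimuth in degrees."""
    a = math.radians(azimuth_deg)
    return (math.cos(a), math.sin(a))

def vbap_pair_gains(source_deg, spk1_deg, spk2_deg):
    """Solve p = g1*l1 + g2*l2 for (g1, g2), then scale so g1^2 + g2^2 = 1."""
    p = unit_vector(source_deg)
    l1 = unit_vector(spk1_deg)
    l2 = unit_vector(spk2_deg)
    # Determinant of the 2x2 loudspeaker base matrix [l1 l2].
    det = l1[0] * l2[1] - l1[1] * l2[0]
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions are collinear")
    # Cramer's rule for the two unnormalized gains.
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (l1[0] * p[1] - l1[1] * p[0]) / det
    norm = math.hypot(g1, g2)
    return (g1 / norm, g2 / norm)
```

In this sketch, a source direction outside the span of the pair yields a negative gain for one loudspeaker, which is one way to see why conventional panning degrades when the panning trajectory leaves the span of the real loudspeakers.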
In one aspect, the method of performing panning for an arbitrary loudspeaker setup starts by determining a placement of one or two virtual loudspeakers within the loudspeaker setup which includes a plurality of real loudspeakers. When the loudspeaker setup is a 2-channel setup, locations of the two virtual loudspeakers are based on a center of a line formed by locations of two real loudspeakers included in the loudspeaker setup and a listening position. When the loudspeaker setup is a 2-dimensional (2D) setup including more than two real loudspeakers, the locations of the two virtual loudspeakers are based on a centroid of a polygon formed by locations of real loudspeakers and the listening position. When the loudspeaker setup is a 3-dimensional (3D) setup, the location of the one virtual loudspeaker is based on a center of gravity of a polyhedron formed by the positions of the real loudspeakers. The VBAP gains are then determined. The VBAP gains may include gains of the real loudspeakers and the one or two virtual loudspeakers. Loudspeaker outputs (signals that drive loudspeakers) are then generated and transmitted to the real loudspeakers in the loudspeaker setup to be played back.
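The geometric reference points named above (the center of the line between two real loudspeakers, the centroid of a polygon, and a center point of a polyhedron) may be computed as in the following sketch. The helper names are hypothetical. The polyhedron case uses the mean of the vertices as a simple stand-in for the center of gravity, which is an assumption of this example. How the virtual loudspeaker locations are then derived from these points and the listening position is not shown here.

```python
def midpoint(a, b):
    """Center of the line between two loudspeaker positions."""
    return tuple((x + y) / 2.0 for x, y in zip(a, b))

def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as ordered (x, y) vertices."""
    area2 = 0.0  # twice the signed area
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if abs(area2) < 1e-12:
        raise ValueError("degenerate polygon")
    return (cx / (3.0 * area2), cy / (3.0 * area2))

def vertex_mean(points):
    """Mean of the vertex positions (assumed stand-in for a center of gravity)."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[k] for p in points) / n for k in range(dims))
```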
In another aspect, a method of performing panning for an arbitrary loudspeaker setup starts by determining a placement of one or two virtual loudspeakers within the loudspeaker setup which includes a number of real loudspeakers. VBAP gains are then determined. The VBAP gains may include gains of the real loudspeakers and the one or two virtual loudspeakers. The gains of the one or two virtual loudspeakers to the real loudspeakers are then redistributed to ensure preservation of total energy. In one aspect, redistributing gains includes determining a location of a panned sound source, determining a quadratic formula based on the location of the panned sound source, and solving the quadratic formula to obtain a redistribution of gains needed to ensure preservation of total energy. The loudspeaker outputs are then generated and transmitted to the real loudspeakers in the loudspeaker setup to be played back.
In one aspect, the placement of one or two placed virtual loudspeakers within the arbitrary loudspeaker setup (that also includes a number of real loudspeakers) is determined so as to produce a 3D shape or convex hull. This allows the method to then calculate the VBAP gains, which include gains of the real loudspeakers and of the placed one or two virtual loudspeakers. Gains of the one or two placed virtual loudspeakers are redistributed to the real loudspeakers in a way that ensures preservation of total energy of a sound field (e.g., an intended sound field, or the recorded sound field of the audio content that is to be output through the loudspeakers, which may be ambisonic content). Loudspeaker outputs (loudspeaker driver signals) are generated based on that redistribution, and transmitted to the real loudspeakers in the loudspeaker setup to be played back. In other words, the gains assigned to the real loudspeakers in the loudspeaker setup now include the redistributed gains of the one or two placed virtual loudspeakers.
In yet another aspect, a system for performing panning for an arbitrary loudspeaker setup comprises a storage storing instructions; and a processor coupled to the storage. When the processor executes the instructions, the processor receives audio content, for playback via a number of real loudspeakers in the loudspeaker setup, determines a placement of one or two virtual loudspeakers within the loudspeaker setup, determines VBAP gains, redistributes gains of the one or two virtual loudspeakers to the real loudspeakers to ensure preservation of total energy, and generates and transmits the loudspeaker outputs (based on the redistributed gains) to be played back by the real loudspeakers.
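The final step, generating loudspeaker outputs from a set of per-loudspeaker gains, may be illustrated by the following sketch. It assumes a single mono source signal and one scalar gain per real loudspeaker. The function name `render_outputs` is hypothetical.

```python
def render_outputs(samples, real_gains):
    """Return one output signal per real loudspeaker: the source scaled by that gain."""
    return [[g * s for s in samples] for g in real_gains]
```

For several simultaneous sources, the per-source outputs computed this way would be summed per loudspeaker.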
In one particular aspect, a method of performing panning for an arbitrary loudspeaker setup starts by receiving audio content for playback through a number of real loudspeakers in a given loudspeaker setup, and determining whether the audio content is ambisonic content (e.g., Higher Order Ambisonics, HOA). When the audio content is ambisonic content, a predetermined placement of points on a sphere or on a grid is generated, wherein the placement of points may depend on the order of the ambisonic content or its resolution. This may also be referred to as a projection grid. The ambisonic content may then be projected onto this grid, thereby producing a separate audio signal for each point. This result is then passed to the
The above summary does not include an exhaustive list of all aspects of the present disclosure. It is contemplated that the invention includes all systems, apparatuses and methods that can be practiced from all suitable combinations of the various aspects summarized above, as well as those disclosed in the Detailed Description below and particularly pointed out in the claims filed with the application. Such combinations may have particular advantages not specifically recited in the above summary.
The aspects of the invention are illustrated by way of example and not by way of limitation in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that references to “an” or “one” aspect in this disclosure are not necessarily to the same aspect, and they mean at least one. Also, in the interest of conciseness and reducing the total number of figures, a given figure may be used to illustrate the features of more than one aspect of the disclosure, and not all elements in the figure may be required for a given aspect. In the drawings:
In the following description, numerous specific details are set forth. However, it is understood that aspects of the disclosure may be practiced without these specific details. In other instances, well-known circuits, structures, and techniques have not been shown to avoid obscuring the understanding of this description.
In the description, certain terminology is used to describe the various aspects of the disclosure here. For example, in certain situations, the terms “component,” “unit,” “module,” and “logic” are representative of hardware and/or software configured to perform one or more functions. For instance, examples of “hardware” include, but are not limited or restricted to an integrated circuit such as a processor (e.g., a digital signal processor, microprocessor, application specific integrated circuit, a micro-controller, etc.). Of course, the hardware may be alternatively implemented as a finite state machine or even combinatorial logic. An example of “software” includes executable code in the form of an application, an applet, a routine or even a series of instructions. The software may be stored in any type of machine-readable medium.
While not shown, each of the real loudspeakers 3_1-3_n may be integrated in a separate loudspeaker cabinet (also referred to as an enclosure) that includes a loudspeaker driver, a power audio amplifier, and a digital-to-analog converter (DAC). The loudspeaker driver may be an electrodynamic driver. The power audio amplifier may have an output coupled to the drive signal input of the loudspeaker driver and may receive an analog input from the DAC. The DAC and the amplifier may be separate blocks or may have electronic circuit components that are combined. The DAC may receive its input digital audio signal (also referred to here as a loudspeaker output or a loudspeaker driver signal) through an audio signal communication link (wired or wireless) to the central control unit 2. Sound content (or audio content), in the form of digital signals for example, is received and processed by the central control unit 2 to produce or generate loudspeaker output signals which are transmitted to the loudspeakers 3_1-3_n to be played back (converted into sound).
The content processor 20 receives the audio content for playback by the loudspeakers 3_1 to 3_n and determines whether the audio content is ambisonic content. When the audio content is ambisonic content, the content processor 20 generates a predetermined placement of points or a grid and projects the ambisonic content to the grid to generate projected ambisonic content. In one aspect, the content processor 20 may include an ambisonic decoding matrix to decode the ambisonic content.
In one aspect, the ambisonic content processor 20 generates the projection grid on a surface of a sphere. The points may be uniformly distributed on the surface of the sphere, e.g., in accordance with a spherical t-design. Thus, the ambisonic content processor 20 generates the projection grid using a uniform or almost uniform arrangement of points on the surface of a sphere. More generally, the ambisonic content processor 20 generates an array in which the positions are defined as points on the surface of a sphere that are distributed in a uniform or almost uniform manner. Such arrangements can be based on, but are not limited to, spherical t-designs. Alternative arrangements can be based on solutions to sphere packing and sphere covering problems, on the vertices of a regular polyhedron, on minimum energy criteria, or on geodesic spheres.
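As one concrete instance of an arrangement based on the vertices of a regular polyhedron, the following sketch generates the twelve vertices of a regular icosahedron, scaled to the unit sphere. The function name `icosahedron_grid` is hypothetical, and the choice of the icosahedron is an assumption of this example rather than a grid prescribed by the disclosure.

```python
import math

def icosahedron_grid():
    """Return the 12 vertices of a regular icosahedron as unit vectors (x, y, z)."""
    phi = (1.0 + math.sqrt(5.0)) / 2.0  # golden ratio
    raw = []
    # The vertices are the cyclic permutations of (0, +/-1, +/-phi).
    for a in (-1.0, 1.0):
        for b in (-phi, phi):
            raw.append((0.0, a, b))
            raw.append((a, b, 0.0))
            raw.append((b, 0.0, a))
    scale = math.sqrt(1.0 + phi * phi)  # common length of every raw vertex
    return [(x / scale, y / scale, z / scale) for x, y, z in raw]
```

A denser grid, as might suit higher ambisonic orders, would need a different construction, such as a geodesic subdivision of this shape.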
The ambisonic content processor 20 generates the projected ambisonic content and sends the projected ambisonic content to the panning processor 21. In one aspect, when the content processor 20 determines that the audio content is not ambisonic content, the content processor 20 performs no processing on the audio content and transmits the audio content “as is” to the panning processor 21.
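The projection of ambisonic content onto a grid may be illustrated for the first-order case as follows. This is a basic sampling-style projection written under stated assumptions: four channels ordered (W, X, Y, Z), an omnidirectional W channel with unit gain, and no per-point weighting. It is not necessarily the ambisonic decoding matrix used by the content processor 20, and the function names are hypothetical.

```python
def encode_foa(direction, sample=1.0):
    """Encode one sample from a unit direction (x, y, z) into (W, X, Y, Z).

    Assumed convention: W carries the sample with unit gain.
    """
    x, y, z = direction
    return (sample, sample * x, sample * y, sample * z)

def project_foa(foa, grid):
    """Evaluate the first-order sound field at each grid point (one value per point)."""
    w, x, y, z = foa
    return [w + x * gx + y * gy + z * gz for gx, gy, gz in grid]
```

With a six-point grid on the coordinate axes and a source encoded from the +x direction, the projected signal is largest at the +x grid point and zero at the -x grid point. Each projected signal can then be treated as a sound source located at its grid point and handed to the panning stage.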
By combining the ambisonic content processor 20 and the panning processor 21, when ambisonic content is received, ambisonic decoding may be automated for arbitrary loudspeaker setups, while preserving the total energy of the original ambisonic content or that of the intended sound scene as described below in detail.
As shown in
Referring to
In
In
In
In some aspects, using the methods described with reference to
Once the one or two placed virtual loudspeakers are added to the loudspeaker setup, panning gains are calculated for sources in any direction, utilizing both the real and the virtual loudspeakers in the loudspeaker setup. Referring back to
The gain redistributor 23 then redistributes the gains of the one or two virtual loudspeakers to the gains of the real loudspeakers in a way that ensures preservation of total energy. By reassigning the gains that are assigned to the placed one or two virtual loudspeakers to the real loudspeakers, the gain redistributor 23 may also reduce panning artifacts. Depending on the location of the panned sound source, an appropriate quadratic equation is determined (e.g., as given using the table below) and solved in order to ensure preservation of energy and to further reduce panning error. To reassign the gains of the one or two virtual loudspeakers, the gain redistributor 23 determines a location of a panned sound source, determines a quadratic formula based on the location of the panned sound source, and solves the quadratic formula to obtain a redistribution of gains needed to ensure preservation of total energy.
In one aspect, the gain redistributor 23 uses the quadratic formulas based on the location of the panned source in the following Table. In the quadratic formulas, N is the total number of loudspeakers in the loudspeaker system, which includes the real loudspeakers and the virtual loudspeakers; g1, g2 are real loudspeaker gains and gi, gj are virtual loudspeaker gains that were calculated by the gain redistributor 23 when the gain redistributor determined the VBAP gains of the one or two placed virtual loudspeakers (before redistribution of the gains). Solving the quadratic formula for the scalar x yields the redistribution of gains needed to ensure preservation of total energy.
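To illustrate how solving a quadratic for a scalar x can preserve total energy, the following sketch works through one simplified, hypothetical case. It does not reproduce the formulas of the Table. It assumes a single virtual loudspeaker with a non-negative gain `virtual_gain` whose contribution is added equally, scaled by x, to each of the real loudspeaker gains. Requiring the sum of squared real gains after redistribution to equal the sum of squared gains before redistribution (real and virtual) gives n*gv*x^2 + 2*S*x - gv = 0, where n is the number of real loudspeakers in this example and S is the sum of their gains.

```python
import math

def redistribute_virtual_gain(real_gains, virtual_gain):
    """Fold one virtual loudspeaker gain into the real gains, preserving total energy.

    Hypothetical scheme: each real gain g becomes g + x * virtual_gain, with x the
    root of n*gv*x^2 + 2*S*x - gv = 0 that makes the added share non-negative.
    """
    if virtual_gain == 0.0:
        return list(real_gains)
    n = len(real_gains)
    s = sum(real_gains)
    a = n * virtual_gain
    b = 2.0 * s
    c = -virtual_gain
    disc = b * b - 4.0 * a * c  # equals 4*S^2 + 4*n*gv^2, always positive here
    x = (-b + math.sqrt(disc)) / (2.0 * a)
    return [g + x * virtual_gain for g in real_gains]
```

For example, with real gains (0.6, 0.0) and a virtual gain of 0.8, the total energy before redistribution is 1, and the two redistributed real gains again have squared values summing to 1.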
The panning processor 21 then generates and transmits loudspeaker outputs (loudspeaker driver signals) to the real loudspeakers 3_1-3_n in
The following aspects may be described as a process, which may be depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed. A process may correspond to a method, a procedure, etc.
The method 400 starts with the central control unit 2 receiving audio content for playback via a plurality of real loudspeakers 3_1-3_n in the loudspeaker setup (Block 401). At Block 402, the content processor 20 included in the central control unit 2 determines whether the audio content is ambisonic content. If the audio content is ambisonic content, at Block 403, the content processor 20 generates a grid or array, and projects the ambisonic content to the grid. The projected ambisonic content is transmitted from the content processor 20 to the panning processor 21 for further processing at Block 407. If at Block 402, the audio content is determined not to be ambisonic content, then the content processor 20 sends the audio content directly to the panning processor 21 for further processing at Block 407.
The method 400 also has a parallel path that is performed in Blocks 404-406, which results in the gains that are assigned to the real loudspeakers (and that will be applied to the audio content being panned). At Block 404, the virtual loudspeaker placer 22 included in the panning processor 21 determines the placement of one or two placed virtual loudspeakers within the loudspeaker setup based on the positions of the real loudspeakers in the loudspeaker setup. The loudspeaker setup may include a number of real loudspeakers 3_1-3_n in an arbitrary loudspeaker setup. For example, the loudspeaker setup may be a 2-channel setup (e.g., a stereo pair) as shown in
An aspect of the disclosure is a machine-readable medium having stored thereon instructions which program a processor to perform some or all of the operations described above. A machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), such as Compact Disc Read-Only Memory (CD-ROMs), Read-Only Memory (ROMs), Random Access Memory (RAM), and Erasable Programmable Read-Only Memory (EPROM). In other aspects, some of these operations might be performed by specific hardware components that contain hardwired logic. Those operations might alternatively be performed by any combination of programmable computer components and fixed hardware circuit components.
While the disclosure here has been described in terms of several aspects, those of ordinary skill in the art will recognize that the disclosure is not limited to the aspects described, but can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting. There are numerous other variations to different aspects described above, which in the interest of conciseness have not been provided in detail. Accordingly, other aspects are within the scope of the claims.
This application claims the benefit pursuant to 35 U.S.C. 119(e) of U.S. Provisional Application No. 62/566,245, filed Sep. 29, 2017, which application is specifically incorporated herein, in its entirety, by reference.
Number | Date | Country | |
---|---|---|---|
20190104364 A1 | Apr 2019 | US |
Number | Date | Country | |
---|---|---|---|
62566245 | Sep 2017 | US |