The present invention provides for systems and methods for the input of an audio soundfield signal and the creation of a reverberant acoustic equivalent soundfield signal.
Any discussion of the background art throughout the specification should in no way be considered as an admission that such art is widely known or forms part of common general knowledge in the field.
Multi-channel audio signals are used to store or transport a listening experience, for an end listener, that may include the impression of a very complex acoustic scene. The multi-channel signals may carry the information that describes the acoustic scene using a number of common conventions including, but not limited to, the following:
It is an object of the invention, in its preferred form, to provide for the modification of multi-channel audio signals that adhere to various Soundfield formats, for the creation of reverberant soundfield signals.
In accordance with a first aspect of the present invention, there is provided a method for creating an output soundfield signal from an input soundfield signal, the method including the steps of: (a) forming at least one delayed signal from the input soundfield signal, (b) for each delayed signal, creating an acoustically transformed delayed signal by an acoustic transformation process, and (c) combining together the acoustically transformed delayed signals and the input soundfield signal to produce the output soundfield signal.
Preferably, the acoustic transformation process utilises a multi-channel matrix mixer. The multi-channel matrix mixer can be formed by combining one or more spatial operations, including a spatial rotation operation. The multi-channel matrix mixer can be formed by combining one or more spatial operations, including a spatial mirror operation. The multi-channel matrix mixer can be formed by combining one or more spatial operations, including a directional gain operation. In some embodiments, the multi-channel matrix mixer can be formed by combining one or more spatial operations, including a directional permutation operation. The acoustic transformation process can preferably include frequency-dependent filtering.
In accordance with a further aspect of the present invention, there is provided a method for adding simulated reverberance to an input sound field signal, the method including the steps of: (a) receiving an input soundfield signal including at least one audio component encoded with a first direction of arrival; (b) determining a further soundfield signal including at least one simulated echo of the original audio components having alternative directions of arrival; (c) combining the input soundfield signal and the further soundfield signal to produce an output sound field signal.
In some embodiments, each simulated echo can comprise a delayed and rotated copy of the input sound field signal. In some embodiments, each simulated echo preferably can include substantially the same delay. In some embodiments, the alternative direction of arrival can comprise a geometric transformation of the first direction of arrival.
In accordance with a further aspect of the present invention, there is provided a system for processing of soundfield signals to simulate the presence of reverberance, the system including: an input unit for the input of a soundfield encoded signal; a tapped delay line interconnected to the input unit and providing a series of tapped delays of the soundfield encoded signal; a series of acoustic transformation units interconnected to the output taps of the tapped delay line, for applying an acoustic transformation to the output taps to produce transformed delayed outputs; and a combining unit for combining the transformed delayed outputs into an output soundfield signal.
In some embodiments, the acoustic transformation units can include: a multi-channel matrix multiplier for applying a geometric transformation to an output tap to produce a geometric transformed output; and a series of linear audio filters applied to each channel of the geometric transformed output.
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:
The preferred embodiments provide for a system and method which, given that an input soundfield signal contains audio components that are encoded with different directions of arrival, produces an output soundfield signal that will contain simulated echoes, such that each simulated echo will have a direction of arrival that is a function of the direction of arrival of the original audio component as it appeared in the input signal. The output soundfield signal thereby provides for reverberance and other simulated audio effects.
Soundfield Formats
An N-channel Soundfield Format is often defined by its panning function, PN(ϕ). Specifically, G=PN(ϕ), where G is an [N×1] column vector of gain values, and ϕ defines the spatial location of the object, i.e.:
Hence, a set of M objects (represented by the M audio signals o1(t), o2(t), . . . , oM(t)) can be encoded into a N-channel Spatial Format signal XN(t) as per Equation 2 below (where object m is “located” at the position defined by ϕm):
The signal XN(t) can be referred to as an Anechoic Mixture of the audio objects. The symbol ϕm is used to denote the abstract concept of “the location of object m”. In some cases, this symbol may be used to indicate the 3-vector: ϕm=(xm, ym, zm), indicating that the object is located at a specific point in 3D space. In other cases, a restriction can be added that ϕm corresponds to a unit-vector, so that xm²+ym²+zm²=1.
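The encoding of objects into an Anechoic Mixture can be illustrated with a minimal sketch. It assumes that Equation 2 takes the form XN(t)=Σm PN(ϕm)×om(t), and it uses a hypothetical 4-channel panning function, pan_first_order, that returns the gains [1, x, y, z] for a unit vector (an assumed normalisation; the panning function of the specification may scale the first channel differently). The names encode_objects and pan_first_order are illustrative only.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def pan_first_order(phi: Sequence[float]) -> Vector:
    """Hypothetical 4-channel panning function: gains [1, x, y, z] (assumed normalisation)."""
    x, y, z = phi
    return [1.0, x, y, z]


def encode_objects(
    signals: Sequence[Sequence[float]],
    locations: Sequence[Sequence[float]],
    pan: Callable[[Sequence[float]], Vector] = pan_first_order,
) -> List[Vector]:
    """Mix M object signals into N channels: X[n][t] = sum over m of G_m[n] * o_m[t]."""
    gains = [pan(phi) for phi in locations]  # one N x 1 gain vector per object
    n_channels = len(gains[0])
    n_samples = len(signals[0])
    mix = [[0.0] * n_samples for _ in range(n_channels)]
    for g, o in zip(gains, signals):
        for n in range(n_channels):
            for t in range(n_samples):
                mix[n][t] += g[n] * o[t]
    return mix


# Two objects: one located along +x, one located along +y.
objects = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
positions = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
anechoic = encode_objects(objects, positions)
```

In this sketch the first channel carries the sum of both objects, while the second and third channels each carry only the object whose location has a non-zero x or y coordinate respectively.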
Acoustic Modelling with Soundfield Signals
When an audio object and a listener are both located within the boundaries of an acoustic space (defined by a set of acoustically reflective surfaces), any sound emitted by the audio object will reach the listener via multiple paths. This phenomenon is well known in the art, and the resulting sound, received at the listening position, is said to be reverberant. The number of acoustic paths, formed by the propagation of sound from the object and reflected off acoustic surfaces to reach the listener, may be infinite, but a reasonably close estimate of the sound received at the listening position may be formed by considering a finite number (E) of echoes.
In order to express this mathematically, the following variables can be defined:
e: the echo number, 1≤e≤E (4)
ϕm: the direction of arrival of sound from object m (5)
ϕ′m,e: the direction of arrival of echo e from the object m (6)
dm,e: the delay (in samples) of echo e from object m (7)
hm,e(t): the impulse response of echo e from object m (8)
Equation 2 shows how an N-channel soundfield signal, XN(t), may be created by combining M audio objects, based on the assumption that each audio object has a location (ϕm) and an audio signal (om(t)).
It is possible to devise a more complex acoustic soundfield signal, RN(t)=XN(t)+YN(t), intended to contain all of the M audio objects, combined together in a way that includes a simulation of an acoustic space (by including E echoes for each object). This is shown in Equation 10 below:
and hence:
The signal YN(t) can be referred to as the Reverberant Mixture of the audio objects. The complete acoustic-simulation is created by summing together the Anechoic Mixture, XN(t), and the Reverberant Mixture, YN(t).
In Equation 10, the notation [om⊕hm,e](t) is used to indicate the convolution of the object audio signal om(t) with the impulse response hm,e(t), and hence
[om⊕hm,e](t−dm,e/Fs)
indicates the convolved signal with an additional delay of dm,e samples (where Fs is the sample frequency).
Those familiar with the art will also recognise that Equation 11 may be written in terms of the frequency domain equation in Equation 12 below:
where ŶN(z), ôm(z) and Hm,e(z) are the z-domain equivalents of YN(t), om(t) and hm,e(t) respectively.
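The per-object, per-echo structure of the Reverberant Mixture can be sketched in discrete time as follows. The sketch assumes that the mixture is the double sum, over objects m and echoes e, of the panning gains for ϕ′m,e applied to the convolved object signal delayed by dm,e samples. The Echo dataclass, the pan function (returning [1, x, y, z], an assumed normalisation) and the function names are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Echo:
    direction: Sequence[float]  # phi'_{m,e}: direction of arrival (unit vector)
    delay: int                  # d_{m,e}: delay in samples
    impulse: Sequence[float]    # h_{m,e}: impulse response of the echo


def pan(phi: Sequence[float]) -> List[float]:
    """Hypothetical 4-channel panning gains [1, x, y, z] (assumed normalisation)."""
    x, y, z = phi
    return [1.0, x, y, z]


def convolve(signal: Sequence[float], impulse: Sequence[float]) -> List[float]:
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse):
            out[i + j] += s * h
    return out


def reverberant_mixture(
    signals: Sequence[Sequence[float]],
    echoes: Sequence[Sequence[Echo]],
    length: int,
) -> List[List[float]]:
    """Y[n][t] = sum_m sum_e pan(phi'_{m,e})[n] * (o_m conv h_{m,e})[t - d_{m,e}]."""
    mix = [[0.0] * length for _ in range(4)]
    for o, object_echoes in zip(signals, echoes):
        for echo in object_echoes:
            wet = convolve(o, echo.impulse)
            gains = pan(echo.direction)
            for t, value in enumerate(wet):
                target = t + echo.delay  # apply the echo delay in samples
                if target < length:
                    for n in range(4):
                        mix[n][target] += gains[n] * value
    return mix


# One object (a unit impulse) with a single echo arriving from -x,
# three samples late and at half amplitude.
y = reverberant_mixture([[1.0, 0.0]], [[Echo((-1.0, 0.0, 0.0), 3, [0.5])]], length=6)
```

Note that every object carries its own list of echoes here, which is the per-object cost that the Shared Echo Model described later is intended to avoid.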
Geometric Transformations of Soundfield Signals
The N-channel soundfield signal format is defined by the panning function, P(ϕ). One popular choice for this panning function is the 4-channel (N=4) Ambisonic panning function (assuming ϕ is expressed in the form of a 3×1 unit-vector: ϕ=[x y z]T):
Now, given a 3×3 matrix, A, from examination of Equation 13, it can be seen that:
Equation 14 tells us that, if we wish to apply a 3×3 matrix transformation, A, to the (x, y, z) coordinates of an object location, prior to the computation of the panning function, we can instead achieve this transformation as a 4×4 matrix operation, applied to the panning-gain vector, after the computation of the panning function.
The result shown in Equation 14 can be applied to Equation 2, in order to manipulate the location of all objects in the audio scene, as per Equation 17 below. In this case, a transformed soundfield signal, X′N(t), is created from XN(t), achieving the same result that would have occurred if all of the objects had their (x, y, z) locations modified by the 3×3 matrix A.
It is known in the art that certain manipulations of the objects within an N-channel soundfield can be achieved by applying an N×N matrix to the N channels of the soundfield signal. In the example given here, whereby the soundfield panning-function is the known Ambisonic panning function, the available manipulations of the soundfield include:
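The equivalence between transforming object coordinates and transforming the panned signal can be checked numerically. The sketch below assumes a panning vector of the form [1, x, y, z] (an assumed normalisation) and assumes that the 4×4 matrix is the block-diagonal embedding of A, with a 1 in the top-left corner. The helper names lift, pan, matvec and transform_soundfield are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def pan(phi: Sequence[float]) -> List[float]:
    """Hypothetical 4-channel panning gains [1, x, y, z] (assumed normalisation)."""
    return [1.0, *phi]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Multiply a matrix by a column vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lift(a: Matrix) -> Matrix:
    """Embed a 3x3 coordinate transform A into a 4x4 soundfield transform."""
    return [[1.0, 0.0, 0.0, 0.0]] + [[0.0, *row] for row in a]


def transform_soundfield(x: List[List[float]], a: Matrix) -> List[List[float]]:
    """Apply the lifted matrix to every sample frame of a 4-channel signal."""
    r = lift(a)
    frames = [matvec(r, frame) for frame in zip(*x)]  # one 4-vector per sample
    return [list(channel) for channel in zip(*frames)]


theta = math.radians(90.0)
rot_z = [
    [math.cos(theta), -math.sin(theta), 0.0],
    [math.sin(theta), math.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
]
phi = (1.0, 0.0, 0.0)

pan_after_move = pan(matvec(rot_z, phi))        # move the object, then pan
move_after_pan = matvec(lift(rot_z), pan(phi))  # pan, then apply the 4x4 matrix
```

Both orders of operation give the same gain vector (up to rounding), which is why the transformation can be applied to an already-mixed soundfield signal without access to the individual objects.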
Rotation: The locations of all objects within a soundfield can be rotated around the listening position. The manipulation of the (x, y, z) coordinates of each object may be defined in terms of a 3×3 matrix, A, and the manipulation of the 4-channel soundfield signal may be carried out according to Equation 17.
Mirroring: The locations of all objects within a soundfield may be mirrored about a plane that passes through the listening position. The manipulation of the (x, y, z) coordinates of each object may be defined in terms of a 3×3 matrix, A, and the manipulation of the 4-channel soundfield signal may be carried out according to Equation 17.
Dominance: A transformation of the 4-channel soundfield signal (known as the Lorentz transformation) may be applied by multiplying the 4 channels of the signal by the following 4×4 matrix:
The result of this transformation is to boost the gain of the audio objects located at ϕ=(1,0,0) by λ. Audio objects located at ϕ=(−1,0,0) will be attenuated by λ⁻¹.
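The boost and attenuation behaviour of a dominance operation can be illustrated with a small numerical sketch. The matrix used below is an assumption made for illustration: it is a Lorentz-boost-style matrix acting on the first two channels, written for an assumed panning vector of the form [1, x, y, z], and it is not necessarily the matrix given in the specification. The function names are hypothetical.

```python
from typing import List, Sequence


def dominance_matrix(lam: float) -> List[List[float]]:
    """Assumed dominance matrix for panning gains [1, x, y, z] (illustrative only)."""
    c = 0.5 * (lam + 1.0 / lam)
    s = 0.5 * (lam - 1.0 / lam)
    return [
        [c, s, 0.0, 0.0],
        [s, c, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def apply(m: List[List[float]], v: Sequence[float]) -> List[float]:
    """Multiply a matrix by a column vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


lam = 2.0
front = apply(dominance_matrix(lam), [1.0, 1.0, 0.0, 0.0])   # object at (1, 0, 0)
back = apply(dominance_matrix(lam), [1.0, -1.0, 0.0, 0.0])   # object at (-1, 0, 0)
```

With λ=2, the gain vector of the object at (1,0,0) is scaled by 2 and that of the object at (−1,0,0) is scaled by 0.5, while the direction encoded by each vector is unchanged.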
All Rotation and Mirroring operations are defined in terms of 3×3 unitary matrices (so that A×Aᵀ=I3×3). If det(A)=1, the matrix A corresponds to a rotation in 3D space, and if det(A)=−1, the matrix A corresponds to a mirroring operation in 3D space. In many of the embodiments described below, it will be convenient to assume that A is unitary.
The above manipulations of Ambisonic soundfield signals are known in the art.
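The distinction between rotation and mirroring matrices can be checked with a short sketch that tests the condition A×Aᵀ=I and inspects the sign of the determinant. The helper names (is_orthogonal, det3, classify) and the two example matrices are hypothetical choices made for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def det3(a: Matrix) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def is_orthogonal(a: Matrix, tol: float = 1e-12) -> bool:
    """Check that A x A^T equals the 3x3 identity within a tolerance."""
    product = matmul(a, transpose(a))
    return all(abs(product[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(3) for j in range(3))


def classify(a: Matrix) -> str:
    """Return 'rotation' for det(A) = +1 and 'mirroring' for det(A) = -1."""
    if not is_orthogonal(a):
        raise ValueError("matrix does not satisfy A x A^T = I")
    return "rotation" if det3(a) > 0 else "mirroring"


angle = math.radians(30.0)
rotate_about_z = [
    [math.cos(angle), -math.sin(angle), 0.0],
    [math.sin(angle), math.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
]
mirror_y = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]  # flips the y coordinate

kinds = (classify(rotate_about_z), classify(mirror_y))
```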
Creation of a Reverberant Mixture
It is one intention of the preferred embodiments to create a Reverberant Mixture, YN(t), of the audio objects from the Anechoic Mixture, XN(t). In the preferred embodiments, a unique Shared Echo Model is utilised, whereby all objects share the same time-delay pattern of echoes.
In order to use the Anechoic Mixture, XN(t), as the starting point for creating the Reverberant Mixture, YN(t), it is desirable to apply some modified rules for the behaviour of the reverberation function as shown in Equation 10. In one embodiment of the invention, the following simplifications may be made:
Echo Time Simplification: It will be recalled that the original reverberation calculation (as per Equation 10) treats the reverberation for each object as a series of echoes, wherein for object m, echo e has a time delay (relative to the direct path) equal to dm,e (so the echo times are different for each object). For the new Shared Echo Model, a delay d′k is defined to be the arrival time (relative to the direct sound) of echo k, and this delay is the same for every object (and hence the echo delay, d′k, is no longer dependent on the object identifier, m).
Echo Direction Simplification: The original reverberation calculation (as per Equation 10) treats the reverberation for each object as a series of echoes, wherein for object m, echo e has a direction of arrival, ϕ′m,e (so the echo arrival directions are different for each object). For the new, simplified method, the direction of arrival of echo k is defined as ϕ′m,k=Ak×ϕm, so that this direction is now formed by a simple geometric transformation of the object's location, ϕm.
The two simplifications provide for a simplified processing chain.
The time delay, from the input soundfield signal, 2, to the delayed signal, for tap k, will be defined to be d′k sample periods. So, for example, in
The intention of the acoustic transformation process, in one embodiment, is to create a simulation of the kth acoustic echo according to the following operating principles:
Echo Delay: The time delay of echo k is defined by use of the Delay Line so that input to the Delay Line 2 (of
Echo Direction: The direction of arrival of echo k, for object m, is determined by applying a matrix, Ak, to the direction unit-vector of the object, ϕm=[xm ym zm]T, resulting in:
and we therefore create the echo signal, with the corresponding direction of arrival, according to Equation 17 (substituting Ak in place of A in Equation 17). This means that, in the case where our soundfield is represented in the Ambisonic format, the following matrix, Rk, is computed according to:
Echo Amplitude and Frequency Response: The amplitude and frequency response of echo k are provided by the filter, Hk(z) e.g. 12, applied to each of the N channels as per
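The three operating principles above (echo delay, echo direction, and echo amplitude and frequency response) can be combined into a minimal sketch of the Shared Echo Model. The sketch assumes a 4-channel soundfield with panning gains of the form [1, x, y, z], assumes that Rk is the block-diagonal embedding of Ak, and models each filter Hk(z) as a short FIR filter applied identically to every channel. All function names and the tuple layout (d′k, Ak, hk) are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Soundfield = List[List[float]]  # indexed as [channel][sample]


def lift(a: Matrix) -> Matrix:
    """Embed a 3x3 transform A_k into a 4x4 matrix R_k (assumed block-diagonal form)."""
    return [[1.0, 0.0, 0.0, 0.0]] + [[0.0, *row] for row in a]


def delay(x: Soundfield, d: int) -> Soundfield:
    """Read tap k of the delay line: every channel delayed by d samples, same length."""
    return [([0.0] * d + list(ch))[: len(ch)] for ch in x]


def mix(x: Soundfield, r: Matrix) -> Soundfield:
    """Apply an N x N matrix to every sample frame of the soundfield."""
    n_samples = len(x[0])
    return [[sum(r[i][j] * x[j][t] for j in range(len(x))) for t in range(n_samples)]
            for i in range(len(r))]


def fir(ch: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filter, output truncated to the input length."""
    out = [0.0] * len(ch)
    for t in range(len(ch)):
        for j, h in enumerate(taps):
            if t - j >= 0:
                out[t] += h * ch[t - j]
    return out


def shared_echo_reverb(
    x: Soundfield, echoes: Sequence[Tuple[int, Matrix, Sequence[float]]]
) -> Soundfield:
    """Output = input + sum over k of H_k applied to R_k applied to the delayed input."""
    out = [list(ch) for ch in x]
    for d, a, h in echoes:
        echo = mix(delay(x, d), lift(a))
        for n, ch in enumerate(echo):
            for t, v in enumerate(fir(ch, h)):
                out[n][t] += v
    return out


# A unit impulse panned to (1, 0, 0), with one echo: 2 samples late,
# mirrored in x, at half amplitude.
dry = [[1.0, 0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0], [0.0] * 5, [0.0] * 5]
mirror_x = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
wet = shared_echo_reverb(dry, [(2, mirror_x, [0.5])])
```

Because the delay, matrix and filter of each tap act on the mixed soundfield signal rather than on individual objects, the cost of this sketch depends on the number of taps and not on the number of objects in the scene.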
Further Generalisations and Alternative Embodiments:
In the case where the soundfield is defined in terms of an Ambisonic panning function (as per Equation 13), a more general version of the acoustic transformation process may be built by converting the Ambisonic signals from B-Format to A-Format. This transformation is known in the art.
The following conversion matrices can be defined:
Equation 19 defines a 4×4 matrix, AtoB, that maps an A-format signal, represented by a 4×1 column vector, to a B-format signal, also represented by a 4×1 column vector: BF=AtoB×AF. Likewise, Equation 20 defines the 4×4 matrix, BtoA, that is the inverse of AtoB.
Using these transformation matrices, an acoustic transformation process can be implemented by:
EchoProcessk=Rot″k×AtoB×H′k×BtoA×Rot′k (21)
where:
where R′ and R″ are arbitrary 3×3 rotation matrices.
Two new intermediate matrices can be defined: Bk=BtoA×Rot′k, and Ck=Rot″k×AtoB, and this allows us to simplify Equation 21 to get Equation 25:
EchoProcessk=Ck×H′k×Bk (25)
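The factorisation into an input matrix, a per-channel stage and an output matrix can be sketched as follows. Several assumptions are made to keep the sketch runnable: the AtoB matrix is an assumed tetrahedral-style conversion (chosen here because it is its own inverse) rather than the matrix of Equation 19, the rotation matrices are embedded block-diagonally into 4×4 form, and the per-channel filters are reduced to frequency-independent gains on the four A-format channels. All names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def diag(values: Sequence[float]) -> Matrix:
    """Diagonal matrix holding one gain per A-format channel."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def lift(r: Matrix) -> Matrix:
    """Embed a 3x3 rotation into a 4x4 matrix (assumed block-diagonal form)."""
    return [[1.0, 0.0, 0.0, 0.0]] + [[0.0, *row] for row in r]


# Assumed conversion matrix (illustrative only): 0.5 times a symmetric
# 4x4 sign pattern, which makes it equal to its own inverse.
A_TO_B = [[0.5 * v for v in row] for row in (
    (1, 1, 1, 1),
    (1, 1, -1, -1),
    (1, -1, 1, -1),
    (1, -1, -1, 1),
)]
B_TO_A = A_TO_B

IDENTITY3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def echo_process(rot_in: Matrix, rot_out: Matrix, gains: Sequence[float]) -> Matrix:
    """Compose C_k x H'_k x B_k, with H'_k reduced to per-channel gains."""
    b_k = matmul(B_TO_A, lift(rot_in))    # B_k = BtoA x Rot'_k
    c_k = matmul(lift(rot_out), A_TO_B)   # C_k = Rot''_k x AtoB
    return matmul(matmul(c_k, diag(gains)), b_k)


uniform = echo_process(IDENTITY3, IDENTITY3, [0.5, 0.5, 0.5, 0.5])
directional = echo_process(IDENTITY3, IDENTITY3, [1.0, 0.0, 0.0, 0.0])
```

With equal gains on all four A-format channels the process reduces to a plain scaling of the B-format signal, whereas unequal gains couple the B-format channels together, giving a direction-dependent gain that a single shared filter cannot produce.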
A processing train for implementing the method of Equation 25 is also shown in
Methods for Creation of More Complex Room Impulse Responses
The methods described above may also be combined with alternative reverberation processes, which may be known in the art, to produce a reverberant mixture that contains some echoes generated according to the above described methods, along with additional echoes and reverberation that are generated by the alternative methods.
Interpretation
Reference throughout this specification to “one embodiment”, “some embodiments” or “an embodiment” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment”, “in some embodiments” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment, but may. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to one of ordinary skill in the art from this disclosure, in one or more embodiments.
As used herein, unless otherwise specified the use of the ordinal adjectives “first”, “second”, “third”, etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
In the claims below and the description herein, any one of the terms comprising, comprised of or which comprises is an open term that means including at least the elements/features that follow, but not excluding others. Thus, the term comprising, when used in the claims, should not be interpreted as being limitative to the means or elements or steps listed thereafter. For example, the scope of the expression a device comprising A and B should not be limited to devices consisting only of elements A and B. Any one of the terms including or which includes or that includes as used herein is also an open term that also means including at least the elements/features that follow the term, but not excluding others. Thus, including is synonymous with and means comprising.
As used herein, the term “exemplary” is used in the sense of providing examples, as opposed to indicating quality. That is, an “exemplary embodiment” is an embodiment provided as an example, as opposed to necessarily being an embodiment of exemplary quality.
It should be appreciated that in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, FIG., or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of this invention.
Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention, and form different embodiments, as would be understood by those skilled in the art. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Furthermore, some of the embodiments are described herein as a method or combination of elements of a method that can be implemented by a processor of a computer system or by other means of carrying out the function. Thus, a processor with the necessary instructions for carrying out such a method or element of a method forms a means for carrying out the method or element of a method. Furthermore, an element described herein of an apparatus embodiment is an example of a means for carrying out the function performed by the element for the purpose of carrying out the invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it is to be noticed that the term coupled, when used in the claims, should not be interpreted as being limited to direct connections only. The terms “coupled” and “connected,” along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. Thus, the scope of the expression a device A coupled to a device B should not be limited to devices or systems wherein an output of device A is directly connected to an input of device B. It means that there exists a path between an output of A and an input of B which may be a path including other devices or means. “Coupled” may mean that two or more elements are either in direct physical or electrical contact, or that two or more elements are not in direct contact with each other but yet still co-operate or interact with each other.
Thus, while there has been described what are believed to be the preferred embodiments of the invention, those skilled in the art will recognize that other and further modifications may be made thereto without departing from the spirit of the invention, and it is intended to claim all such changes and modifications as falling within the scope of the invention. For example, any formulas given above are merely representative of procedures that may be used. Functionality may be added or deleted from the block diagrams and operations may be interchanged among functional blocks. Steps may be added or deleted to methods described within the scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
15185913 | Sep 2015 | EP | regional |
This application is a divisional application of U.S. application Ser. No. 15/746,787 filed Jan. 22, 2018, which is a 371 of International Application No. PCT/US2016/044286 filed Jul. 27, 2016, which claims priority to U.S. Provisional Patent Application No. 62/198,440, filed Jul. 29, 2015 and European Patent Application No. 15185913.9, filed Sep. 18, 2015, each of which is hereby incorporated by reference in its entirety.
Number | Date | Country |
---|---|---|
20210160640 A1 | May 2021 | US |
Number | Date | Country |
---|---|---|
62198440 | Jul 2015 | US |
Relation | Number | Country |
---|---|---|
Parent | 15746787 | US |
Child | 17166162 | US |