The subject matter described herein relates to simulating sound propagation in a scene. More particularly, the subject matter described herein includes methods, systems, and computer readable media for simulating sound propagation in large scenes using equivalent sources.
Interactive sound propagation has emerged as a powerful tool in computer graphics to enhance the realism of virtual worlds by predicting the behavior of sound as it interacts with the environment. In order to generate realistic acoustic effects, including interference, diffraction, scattering, and higher-order effects, it is important to develop techniques that can directly solve the acoustic wave equation. There is extensive work on numerical methods in scientific computing and acoustics to solve the wave equation.
Furthermore, there has been considerable interest in developing interactive wave techniques such as modeling free-space sound radiation [James et al. 2006], first-order scattering from surfaces [Tsingos et al. 2007], and performing sound propagation for indoor scenes [Savioja 2010; Raghuvanshi et al. 2010].
Large open scenes, which arise in many applications ranging from games to training/simulation systems, present a significant challenge for interactive wave-based sound propagation techniques. State-of-the-art wave simulation methods can take hours of computation and gigabytes of memory for performing sound propagation in indoor scenes such as concert halls. For large, open scenes spanning hundreds of meters, it is challenging to run these techniques in real-time. On the other hand, geometric (ray-based) acoustic techniques can provide real-time performance for such environments. However, geometric techniques are better suited for higher frequencies of the audible range since accurately modeling diffraction and higher-order wave effects remains a significant challenge, especially at low frequencies.
The subject matter described herein includes methods, systems, and computer readable media for simulating sound propagation in large scenes using equivalent sources. One method for simulating sound propagation in a scene includes decomposing a scene into disjoint objects. Per object transfer functions are generated for each of the disjoint objects. Each per object transfer function maps an incoming sound field reaching an object to an outgoing sound field emanating from the object. A plurality of inter-object transfer functions corresponding to pairs of the disjoint objects is generated. Each inter-object transfer function maps the outgoing sound field emanating from one of the disjoint objects to an incoming sound field of another of the disjoint objects. The per-object transfer functions and the inter-object transfer functions are used to perform simulation of sound propagation within the scene and thereby to determine a sound field for the scene.
The subject matter described herein may be implemented in hardware, software, firmware, or any combination thereof. As such, the terms “function” “node” or “module” as used herein refer to hardware, which may also include software and/or firmware components, for implementing the feature being described. In one exemplary implementation, the subject matter described herein may be implemented using a computer readable medium having stored thereon computer executable instructions that when executed by the processor of a computer control the computer to perform steps. Exemplary computer readable media suitable for implementing the subject matter described herein include non-transitory computer-readable media, such as disk memory devices, chip memory devices, programmable logic devices, and application specific integrated circuits. In addition, a computer readable medium that implements the subject matter described herein may be located on a single device or computing platform or may be distributed across multiple devices or computing platforms.
The subject matter described herein will now be explained with reference to the accompanying drawings of which:
The subject matter described herein includes an approach for precomputed wave-based sound propagation that is applicable to large, open scenes. It is based on the equivalent source method, which has been widely studied for radiation and scattering problems in acoustics and electromagnetics [Doicu et al. 2000]. Our approach consists of two main stages—preprocessing and runtime. During preprocessing, we decompose the scene into disjoint, well-separated rigid objects. The acoustic behavior of each object taken independently is characterized by its per-object transfer function, which maps an arbitrary incident field on the object to the resulting scattered field. We propose an equivalent source formulation to express this transfer function as a compact scattering matrix. Pairwise acoustic coupling between objects is then modeled by computing inter-object transfer functions between all pairs of objects, which maps outgoing scattered field from one object to incoming field on another object. These transfer functions are represented compactly by using the same equivalent source framework, yielding interaction matrices. Acoustic transfer between multiple objects can thus be represented using chained multiplication of their scattering and interaction matrices. Finally, the acoustic response of the scene to a static source distribution is computed by solving a global linear system that accounts for all orders of inter-object wave propagation.
At run-time, fast summation over all outgoing equivalent sources for all objects is performed at the listener location(s). The computed response is used for real-time binaural sound rendering. Multiple moving sources with a static listener can be handled by exploiting acoustic reciprocity. The runtime memory and computational requirements are proportional to the number of objects and their outgoing field complexity (usually a few thousand equivalent sources for a few percent error), instead of the volume or surface area of the scene. Exemplary contributions of our work include:
Our approach is well-suited for quick iterations while authoring scenes—per-object transfer functions, which take a significant portion of our precomputation time, are independent of the scene and can thus be stored in a lookup table. Thus, adding, deleting or moving a few objects in an existing scene has low precomputation overhead, linear in the number of objects.
We have tested our technique on a variety of benchmarks (
Our technique has close theoretical parallels with prior numerical wave solvers. We first explore these connections, followed by a comparison with related prior work in interactive geometric and wave-based techniques.
Research in wave-based acoustic simulation techniques spans a broad range of areas such as noise control, automotive design, urban architectural planning and concert hall design. Wave solvers can be classified into volumetric and surface-based approaches. The most common among volumetric techniques are the Finite Element Method (FEM) [Zienldewicz et al. 2006; Thompson 2006] and Finite Difference Time Domain (FDTD) method [Yee 1966; Taflove and Hagness 2005; Sakamoto et al. 2006], which require a discretization of the entire volume of the scene. The compute and memory usage of these methods thus increases linearly with the volume of the scene. Faster methods like Pseudospectral Time Domain (PSTD) [Liu 1997] and Adaptive Rectangular Decomposition (ARD) [Raghuvanshi et al. 2009] have been proposed and achieve good accuracy with a much coarser spatial discretization. Volumetric techniques are well-suited for scenes with high surface area and low air volume, which makes them highly applicable to indoor spaces.
Surface-based techniques are better suited for open scenes, for which scattering geometry is sparse with large regions of air with uniform wave-propagation speed. The most common approach here is the Boundary Element Method (BEM) [Cheng and Cheng 2005], which express the global acoustic field as the sum of the elementary radiating fields from monopole and dipole sources placed on a uniform, sub-wavelength sampling of the scene's surface. Traditional BEM scales as the square of the surface area but recent research on the Fast Multipole Method for BEM (FMM-BEM) [Liu et al. 2009; Gumerov and Duraiswami 2009] has improved the complexity to linear in surface area by creating a hierarchical clustering of BEM monopoles and dipoles (using an Octree) and approximating their interactions compactly using high-order multipole Green's functions.
For acoustic radiation and scattering problems, an efficient and powerful surface-based technique is the Equivalent Source Method (ESM) [Fairweather 2003; Kropp and Svensson 1995; Ochmann 1999; Pavic 2006], which forms the basis of our formulation. Instead of relying on a boundary-integral formulation as BEM does, ESM exploits the uniqueness of solutions to the acoustic boundary value problem—equivalent multipole sources (Green's functions) are placed at variable locations in space with the intent of making the total generated field match boundary conditions on the object's surface, since uniqueness guarantees correctness of the solution. The flexibility of location results in fewer multipole sources. The ESM can result in large gains in performance and memory-efficiency for scattering and radiation problems in large spaces, and has hence been used widely in both acoustic and electromagnetic applications [Doicu et al. 2000]. ESM is an attractive starting point for precomputed approaches such as [James et al. 2006] and our method, because it allows very flexible performance-accuracy tradeoffs. More importantly, the compactness of the solutions reduces runtime memory and compute requirements by a large factor making them amenable for real-time evaluation. Offline FMM-BEM solutions are infeasible for interactive applications because of the large, dense number of monopole and dipole sources in the final solution that needs to be stored and summed on the fly.
A related technique called the Transition-matrix method has been used extensively for electromagnetic scattering, and also developed for acoustics [Waterman 2009]. The method relates the incoming and outgoing fields in a scattering process in terms of the coefficients of a complete system of the vector basis functions, that are not necessarily Green's functions, unlike BEM or ESM.
Most current interactive sound propagation systems are based on geometric acoustics, which applies the high-frequency Eikonal (ray) approximation to sound propagation. The image source method [Allen and Berkley 1979] is the most commonly used geometric technique, and there has been much research on improving its performance [Funkhouser et al. 1998]. However, the image source method can only model purely specular reflections. Other techniques based on ray tracing [Vorlander 1989; Lentz et al. 2007] or radiosity [Tsingos and Gascuel 1997] have been developed for modeling diffuse reflections, but these energy-based formulations may not model phase accurately. Techniques based on acoustic radiance transfer [Siltanen et al. 2007] can model arbitrary surface interactions for wide-band signals, but cannot accurately model wave phenomena such as diffraction.
The two main approaches for modeling diffraction in a geometric acoustics framework are the Uniform Theory of Diffraction (UTD) [Tsingos et al. 2001] and the Biot-Tolstoy-Medwin (BTM) formulation [Svensson et al. 1999]. UTD is an approximate formulation while the BTM yields accurate results at a significant performance cost. Methods based on image source gradients [Tsingos 2009] and acoustic radiance transfer operators [Antani et al. 2012] have been developed to interactively model higher-order propagation effects. Recent developments in fast ray tracing have enabled interactive geometric propagation in dynamic scenes, but these methods only model first order edge diffraction based on UTD [Taylor et al. 2009].
Sound propagation from a single radiating object in free space can be efficiently modeled using Precomputed Acoustic Transfer (PAT) [James et al. 2006], based on equivalent sources. [Tsingos et al. 2007] solves the boundary integral formulation of the Helmholtz equation subject to the Kirchhoff approximation in real-time. [Raghuvanshi et al. 2010] relies on a volumetric sampling of acoustic responses on a spatial grid and perceptual encoding based on acoustic properties of indoor spaces. Recent work has shown that FDTD simulations can be run in real-time on the GPU [Savioja 2010], but only for very small spaces that span a few meters across. We compare our method in more detail with other interactive wave-simulation techniques in Section 6.3.
In this section, we give a brief review of the Equivalent Source Method, starting with our notation in Table 1. Consider the exterior scattering problem [Thompson and Pinsky 2004]—a solid three-dimensional object A immersed in an unbounded air volume (
where p is the (complex-valued) pressure field, ∂A+ is the domain exterior to the object and ∇2 is the Laplacian operator. At the boundary of the domain, ∂A the pressure is specified using a Dirichlet boundary condition:
p=f(x) on ∂A, (2)
To complete the problem specification, the behavior of p at infinity must be specified, usually by the Somtnetfeld radiation condition [Pierce 1989]. The Equivalent Source Method [Ochmann 1995; Ochmann 1999; Pavic 2006] relies on the existence of fundamental solutions (also called Green's functions or equivalent sources), q(x, y), of the Helmholtz equation (1) subject to the Sommerfeld radiation condition for all x≠y. An equivalent source, q(x, yi), is the solution field induced at any point x due to a point source located at yi, and can be expressed as the sum:
where k is a generalized index for (l, m). The fundamental solution ®ik (x) is a multipole source located at yi, dik, is its strength and L is the order of the multipole (L=1 is just a monopole, L=2 includes dipole terms as well, and so on). (Refer to the Appendix A.1 for the full expression of φilm (x).)
The fundamental solutions φik(x) are used to solve the Helmholtz equation. Consider the outgoing scattered field due to an object, and the associated Dirichlet boundary value problem on a smooth offset boundary ∂A. Consider a discrete set of R source locations {yi} i=1′R all contained in the interior region ∂A−. The total field due to these sources at any x ∈∂A+ is:
where cik=cidcik are the corresponding strengths of the equivalent sources. The main idea of the ESM is that if the equivalent source strengths cik and positions yi are chosen to match the Dirichlet boundary condition on ∂A,
then p(x) is the correct solution over all ∂A+. This process can also be used to represent the field incident on an object, the only difference being that the equivalent sources are now placed in the exterior region ∂A+. Again, by matching the boundary condition (5), we get the correct solution p(x) for all x in the interior region ∂A−.
In practice, the boundary conditions can only be satisfied approximately with a finite number of sources (R), and the degree of approximation can be controlled by changing R. Since each source's multipole strengths must be stored and its contribution evaluated at run-time, R is the main parameter for trading accuracy for run time performance and memory requirements. This flexibility makes the ESM highly suitable for interactive applications.
4 Sound Propagation using ESM
We now give a brief overview of the pre-computation and runtime stages of our technique (
Offset surface calculation: In the pre-processing stage, we first classify objects in the scene and calculate the offset surface for each object. Per-object transfer function: For each object, we compute a per-object transfer function, which maps the incoming field reaching the object to the outgoing scattered field.
Inter-object transfer function: For each object pair, we pre-compute an inter-object transfer function, which encodes how the outgoing scattered field of one object becomes the incoming field for another object.
Global solve: Based on per-object and inter-object transfer functions, and a sound source configuration, we model acoustic interactions between the objects in the scene, thereby computing the strengths of all the outgoing field equivalent sources of all objects.
Run-time pressure evaluation: At run-time, we add the pressure produced by all the outgoing field equivalent sources at the listener position for each frequency. This is an extremely fast computation, and can be performed for moving sources or a moving listener in real-time.
The first step is to decompose the input scene into well-separated objects. To decide if two objects are well-separated, we use the notion of an offset surface. The offset surface is defined by taking the constant offset along the normal direction at each point on the boundary of the object. Two objects are considered disjoint if and only if their offset surfaces do not intersect. Otherwise, we combine them and treat them as a single object (see
In order to capture an object's scattering behavior, we define a per-object transfer function f, a function which maps an arbitrary incoming field reaching the object to the corresponding outgoing scattered field after reflection, scattering and diffraction due to the object itself. This function is linear owing to the linearity of the wave equation and depends only on the shape and material properties of the object.
The incoming and outgoing fields for an object A are both expressed using equivalent sources. The outgoing field is represented by placing equivalent sources {q1out, q2out, q3out, . . . } in the interior region ∂A− of the object. Similarly, the incoming field is represented by placing equivalent sources {q1in, q2in, q3in, . . . } in the exterior region ∂A+. The transfer function f maps the basis of the incoming field (multipoles φijin) to the corresponding outgoing field expressed as a linear combination of its basis functions (multipoles φfhout):
where αjhik≡TA (ik, jh) is the (complex) amplitude for the outgoing multipole φfhout induced by a unit amplitude incoming multipole φikin. The per-object sound transfer function for object A is encoded in the coefficient matrix TA, which we call the scattering matrix.
For each incoming field multipole in turn, we place a unit-amplitude sound source φik and use a numerical wave solver to compute the total pressure field at n uniformly sampled locations {x1, x2, . . . , xn} on ∂A. We subtract the incident field from the total pressure field to compute the outgoing scattered field (see
We fit the outgoing field multipole expansion to the sampled scattered field, in a least-squares sense, by solving an over-determined linear system (n>PN2) subject to a pre-specified error threshold a for all incoming field multipoles:
The least-squares solution yields the coefficients αik corresponding to the ikth row of the scattering matrix T. This process is repeated for all incoming source multipoles to compute the scattering matrix. The solution can be computed efficiently using a single combined linear system:
VT
A
tr
=[
11
. . .
QM
] (9)
where TAtr is the transpose of TA. The per-object transfer function is computed for all objects at sampled frequencies. Details on choosing the number and positions of both incoming and outgoing equivalent sources are given in Section 4.5.
Scenes with multiple objects exhibit object-to-object interactions, where the outgoing field from one object enters as the incoming field in other objects. For example, with two objects A and B, source s and listener l, the possible interactions that can occur from s to l are: direct sound (0th order) s→l, 1st order s→A→l; S→B→l, 2nd order s→A→B→l; S→B→A→l, and so on. We model these interactions by formulating an inter-object transfer function. For two objects A and B, the inter-object transfer function gAB expresses the outgoing field of A in terms of the basis of incoming field of B. Like the per-object transfer function, the inter-object transfer function is also a linear function. The inter-object transfer function gAB maps each basis function of the outgoing field of A (multipoles φjhout) to the corresponding incoming field of B expressed as a linear combination of its basis functions (multipoles φikin):
where βikjh=GAB (jh, jk) is the (complex) amplitude of the incoming multipole φikin of B induced by a unit-amplitude outgoing multipole φfhout of A. The inter-object transfer function from A to B is thus encoded as GAB, which we call the interaction matrix. The interaction matrix is not symmetric in general, i.e. GAB≠GBA. Since the outgoing field of an object is not defined in its interior region, GAA and GBB are zero matrices.
The interaction matrix GAB can be computed using a least-squares formulation similar to the one used for computing scattering matrices. However, the pressure values at the offset surface samples for B,
Again, this process is repeated for each outgoing multipole of B, and solved efficiently as a single combined linear system:
UG
A
B
=[
11
. . .
PN
] (12)
The inter-object transfer functions are computed for all object pairs, independently for each frequency.
Choosing offset surface samples Solving equations (9) and (12) at frequency f involves computing the pressure at sampled locations {x1, x2, . . . , xn} on the offset surface of each object. The number of sampled locations n depends on the spatial variation of the pressure field, which in turn, depends on its frequency f. As per Nyquist theorem, representing a signal at frequency f with a finite number of samples requires a sampling rate of 2f. The spatially-varying pressure field defined on the 2D offset surface S must be sampled at a rate of 2f in both dimensions. Therefore, we place n∝(2f)2x area (S) samples uniformly on the offset surface.
Choosing incoming equivalent sources Since the nature of the incoming field is not known a priori, it is hard to optimize the number and position of incoming equivalent sources. We resolve this problem by generating another offset surface at distance Δ>δ from the object (where δ is the original offset surface's distance) and placing incoming equivalent sources on this new surface. The number of incoming equivalent sources Q depends on the spatial variation of the incoming pressure field. As before, Q∝(2f)2x area (S) equivalent sources are placed uniformly on the larger offset surface. This allows us to represent the incoming field on the inner offset surface to good accuracy.
Choosing outgoing equivalent sources The number of outgoing equivalent sources P and their positions are decided based on a multi-level source placement algorithm similar to [James et al. 2006]. The previous algorithm was designed to satisfy a single radiating field of an object at each frequency. Our algorithm is designed to satisfy multiple outgoing radiating fields at each frequency simultaneously. In our case, for each object at each frequency, we have as many outgoing radiating fields as the number of incoming multipoles QM2. This gives us a vector of pressure residuals r=[r1r2 . . . rQM
Once the scattering and interaction matrices are computed, and the sound source position has been decided, we compute the outgoing equivalent source strengths of all the objects in the scene. We initially describe our algorithm for a simple two-object scene, and it naturally generalizes to any number of objects. Consider a sound source s and two objects A and B. Let the incoming field multipoles at A and B be φAin and φBin, respectively. Similarly, let the multipole expansion for the outgoing field for A and B be φZout and φBout respectively. Let the scattering matrices for A and B are TA and TB, respectively. Let the interaction matrices for the objects be GAB and GBA respectively. The sound source pressure field s(x) expressed in terms of incoming field multipoles is denoted SA and SB, respectively. Based on these transfer functions, one can compute the steady state field after all orders of interaction. The steady state outgoing fields of A and B (denoted by PAout and PBout respectively) are:
In order to compute the source strengths, CA and CB, we solve a linear system. Assume the outgoing field in the scene is {umlaut over (C)}. This field, in addition to the source field {umlaut over (S)}, when propagated through the scene (transferred via all possible objects), generates an incoming field on the object ({umlaut over (C)}{umlaut over (C)}+{umlaut over (S)}). This incoming field is then scattered by the object to produce an outgoing field T({umlaut over (G)}{umlaut over (C)}+{umlaut over (S)}). Under steady state, this outgoing field must equal {umlaut over (C)}. Mathematically, this can be written as
{umlaut over (C)}={umlaut over (T)}(({umlaut over (G)}{umlaut over (C)}+{umlaut over (S)})
(see Appendix A.2 for the symbols and details). This yields a linear system for the outgoing source strengths for all objects:
(I−{umlaut over (T)}{umlaut over (G)}){umlaut over (C)}={umlaut over (T)}{umlaut over (S)} (15)
This linear system is solved for a regularly-sampled set of frequencies. For a scene composed of multiple objects, we derive the same equation as above with the symbols having analogous meanings, as described in detail in Appendix A.3.
At the end of the preprocessing stage, we get the outgoing equivalent source strengths for all objects at regularly sampled set of frequencies corresponding to each sound source. During run-time, we use these strengths to compute the pressure field at any listener position x:
where m is the number of objects in the scene, s(x) is the field generated by the sound source, and CA
In this section, we describe the implementation details of our technique. Typical parameter values used in our experiments are specified in Table 3.
Implementation Details The offset surface generation code is written in C++. When computing per-object transfer functions, outgoing scattered fields are computed on the offset surface (Section 4.3) using an efficient GPU-based implementation of the ARD wave solver [Raghuvanshi et al. 2009]. The remaining parts of the pre-processing stage (solving the linear system for per-object transfer functions, inter-object transfer functions and equivalent source strengths) are implemented in MATLAB. Precomputation for each frequency is performed in parallel over the nodes of a CPU cluster. The run-time code is written in C++, and has also been integrated with Valve's Source engine, as demonstrated in the supplementary video.
The timing results for offset surface generation, the ARD solver, and run-time code are measured on a 4-core 2.80 GHz Xeon X5560 desktop with 4 GB of RAM and NVIDIA GeForce GTX 480 GPU 429 with 1.5 GB memory. The timing results for the MATLAB-based precomputation are measured on a 64-node CPU cluster (8-core, 2.80 GHz, 48 GB, Xeon X5560 processor nodes). Detailed statistics are provided in Table 2.
Abbreviations used in Table 2 are as follows : “#objs.” denotes the number of objects in the scene, “#freq.” is the number of frequency sampled in the range [0-νmax], “#srcs” is the number of sound sources, “wave sim.” is the total simulation time of the numerical wave solver for all objects for all frequencies, “per-object” and “inter-object” denote the total compute time for the per-object transfer function for all objects and inter-object transfer functions for all object pairs respectively, “source-field” is time to express each sound source in terms of incoming multipoles and “global-solve” is time to compute equivalent source strengths. At runtime, the total number of equivalent sources “# eq. srcs” (in million M), performance “eval.” and storage requirement “storage” are also specified. For column, “#objs.”, notation a*+b* denotes that first object has been instanced a times and second object instanced b times, but their per-object transfer function is computed only once. Offset surface generation code takes<1 sec for each object. All timings are for a single 2.80 GHz Xeon X5560 processor core.
Due to the computational overhead of the pre-computation stage, we treat band-limited sources that emit sound whose frequency range is bounded by maximum frequency νmax, for the purpose of wave simulations. The pressure is computed at regularly sampled set of frequencies in the range [0, σmax] with a step size of Δν. The value of parameter Δν is 1.9 Hz for concave, wall, rock and parallel walls benchmarks and 3.8 Hz for desert, reservoir and Christmas scene.
Handling ground reflections To handle ground reflections, we assume the ground to be an infinite plane. Similar to the image source method [Allen and Berkley 1979], we reflect our equivalent sources about the ground plane and multiply their source strengths by the reflection coefficient of the ground. Since sound waves traveling in air maintain their phase upon reflection from a hard surface, we do not need to invert the strengths of the equivalent sources.
Spectral Extrapolation The band-limited nature of the frequency responses of our technique necessitates a plausible extrapolation to higher frequencies at runtime. Simply replicating the spectrum maintains temporal peak structure, but results in audible comb filter artifacts. Instead, we observe that typical edge diffraction spectra are roughly linear on a logarithmic scale. Hence, we first estimate a trend-line by a least-squares fit to the maxima of the log magnitude spectrum, and adjust for the trend to create a flat response by multiplying with the inverse of the trend on a linear scale. This adjusted response is replicated to higher frequencies and then multiplied by the trend (for all frequencies), yielding the final wide-band spectrum. If the trend-line has positive slope, indicating a high-pass response, we flatten the trend-line for frequencies beyond νmax. This extrapolation step exactly preserves the spectrum up to νmax.
Real-Time Auralization We have integrated our technique with Valve's Source engine. Audio is rendered using FMOD, and is processed in frames of 1024 audio samples, at a sampling rate of 44.1 kHz. In-game (“dry”) audio clips are pre-processed by computing a windowed short-time Fourier transform (STFT) on each frame (Blackman window). The STFTs are computed on audio frames after zero-padding by the maximum IR length to prevent aliasing artifacts. Real-time auralization is performed using overlap-add STFT convolutions. In each rendered frame, the dry audio frame for each source is convolved (multiplied in the frequency-domain) with the corresponding IR. The results are then mixed, and an inverse FFT performed on the mixed audio. Finally, overlap from previous frames is added in, and overlap from the current frame is cached in a ring buffer. Binaural sound rendering is achieved by using two virtual listeners, one for each ear.
In this section, we present the results of our technique on different benchmarks, provide detailed error analysis and compare it with prior work.
See Table 4 and Appendix A.4 for the computational complexity of our approach.
We have considered a variety of scenes for testing our technique. For auralizations corresponding to all the benchmarks discussed below, kindly refer to the supplementary video.
Single Object We considered various objects which have different scattering characteristics to test our technique—rocks, a wall, and a concave reflector. The rocks scatter sound in all directions (
Parallel Buildings This scene consists of two buildings situated parallel to one another. We show two walkthroughs of this scene, with a flying helicopter, and a person speaking, respectively. As the helicopter moves behind a building, diffraction leads to a distinct low-passed occlusion effect. The two walls trap sound between them as high order reflections (see
Desert This is a large scene with three different sound sources spread throughout the scene—a jeep, a bird, and a radio. As the listener walks through the scene, the received sound from the various sources changes significantly depending on whether or not the listener is in line-of-sight of the source(s). We also specifically demonstrate the effect of second-order diffracted occlusion of the jeep sound around two buildings.
Christmas Town This scene demonstrates sound propagation in a village with many houses, a church, a bell tower and large buildings. This benchmark shows diffraction around buildings, sound propagation over large distances from the bell tower, reflections between two parallel buildings, for multiple sources.
Reservoir We show that our technique can be integrated with an existing game (Half-Life 2) to generate realistic wave acoustic effects in a large outdoor game map. Ours is the first wave-based sound propagation technique which can accurately model wave phenomena such as diffraction behind the rocks and scattering around buildings over large distances on such a scene in real-time.
6.3 Comparison with Prior Interactive Techniques
Our usage of equivalent sources for sound propagation is in a similar vein to prior work [James et al. 2006], where the authors represent arbitrary outgoing radiation fields from a single geometrically complex object. Our work differs primarily in three regards: First, we model mutual interactions between objects in arbitrary scenes using inter-object transfer functions, accounting for high-order interactions such as echoes and multiple diffraction. Secondly, we model acoustic scattering from objects (as opposed to radiation), which requires an approximation of both the incoming and outgoing pressure fields for an object. Finally, our outgoing equivalent sources are chosen to satisfy multiple outgoing scattered fields as opposed to a single radiation field.
The problem of real-time acoustic scattering has been previously addressed using GPUs [Tsingos et al. 2007]. First-order scattering effects are modeled but acoustic interactions between objects cannot be modeled. In contrast, our work can handle all orders of interactions between the objects using inter-object transfer functions.
A recent technique for interactive acoustics based on wave simulation was proposed in [Raghuvanshi et al. 2010], which relies on sampling the volume of the scene and uses perceptual compression specific to indoor scenes. The runtime memory requirement of their technique on our benchmarks (assuming a spatial sampling of 1 m) is 187 MB for parallel walls and 1.8 GB for the reservoir. This technique is complementary to our approach—it works best in indoor spaces with a lot of geometric clutter but limited volume, while our technique is better suited to outdoor spaces with well-separated objects and is insensitive to air volume. In fact, it would be quite natural to integrate this method with ours, with the indoor and outdoor propagation models coupled through transport operators defined on doors and windows.
We have presented a novel wave-based sound propagation algorithm that captures acoustic effects such as high-order diffraction and scattering using an equivalent source formulation. As a result, our technique can perform accurate sound propagation on large, open scenes in real-time, has a small memory footprint, and allows flexible efficiency-accuracy tradeoffs. Compared to directly storing and convolving wave-solver solutions for auralization, we reduce the memory usage more than 100 times.
Our approach is currently most suitable for static scenes, due to the computational cost of recomputing inter-object transfers as objects move. Our approach can be combined with Fast Multipole Method (FMM) to accelerate inter-object transfer evaluations using progressive far-field approximations. Moreover, real-time performance could be achieved by further using GPU-based dense linear solvers. The incoming field strength computation for a moving source is similar to inter-object transfer. Thus, the combination of FMM and GPU-based computations could enable dynamic sources along with a moving listener.
The complexity of our method increases linearly as a function of the frequency, and our approach is practical up to roughly 1-2 kHz depending on the scene. Geometric approximations become quite accurate for outdoor scenes at these frequencies because buildings and terrain have much larger dimensions than the wavelength of 17 cm at 2 kHz. Thus, hybridization of our technique with geometric methods could lead to accurate wide-band propagation techniques for open scenes.
The field due to a multipole located at point yi is mathematically defined as follows
φilm(x)=Γtmhl(2)(wric)Ψtm(θi, φi) (1)
Where w is the angular frequency, c is the speed of sound, (ri, θi, φi) is the vector (x−yi) expressed in spherical coordinates, hl(2) (wri/c) are the spherical Hankel functions of the second kind [Abramowitz and Stegun 1964], Ψlm (θ, φ) are the spherical harmonic functions [Hobson 1955] and Γimis the normalizing factor that makes spherical harmonics orthonormal.
We describe detail the way we computer the equivalent source strengths for a scene composed of two objects. Consider a scene with objects A and B and a sound source s. Let the incoming field multipoles at A and B be ΦAin and ΦBin, respectively. Similarly, let the multipoles for the outgoing field for A and B be ΦAout and ΦBout, respectively. The scattering matrices for A and B are TA and TB, respectively. Let the interaction matrices for the objects be GAB and GBA respectively. First of all, we express the incoming field produced by sound source s on objects A and B in terms of their incoming field multipoles
Now assume that the steady state outgoing field of object A and B is PAout and PBout respectively.
The outgoing field of one object becomes incoming field for the other object. Using linearity of the inter-object transfer function and equation (10) in Section 4.4, we find the incoming field produced on B due to outgoing field of A
Similarly, we find the incoming field produced on A due to the outgoing field of B as
The total incoming field on object A is given by,
Similarly, for object B,
Applying linearity of per-object transfer function f and using equation (6) in section 4.3, we get outgoing pressure Pout A and PoutB due to the scattering of incoming fields by the objects as—
In steady state, this outgoing pressure should match the outgoing pressure we started with. Equating 8 with 2, we get—
C
A
trΦAout=SAtrTA+CBtrGBATA)ΦAout CAtr=SAtrTA+CBtrGBATA)
Similarly, equating 9 with 3, we get
C
B
tr
=S
B
tr
T
B
+C
A
tr
G
A
B
T
B
Combining above two equations into a single equation and rearranging, we get
In other words,
{umlaut over (C)}={umlaut over (T)}({umlaut over (G)}{umlaut over (C)}+{umlaut over (S)}) (10)
Rearranging,
(I−{umlaut over (T)}{umlaut over (G)}){umlaut over (C)}={umlaut over (T)}{umlaut over (S)} (10)
which is linear system Ax=b. At run-time, the outgoing scattered field at any listener position x is given by—
p(x)=CAtrΦAjout(x)+CBtrΦAjout(x) (12)
The complete pressure field becomes
p(x)=CAtrΦAjout(x)+CBtrΦAjout(x) (13)
For a scene with m objects, A1, A2, . . . , Am, equation (11) remains the same except the vectors and matrices are generalized for m objects,
The outgoing field at point x is
The complete pressure field becomes
Consider a second with m objects. We have to computer the pressure at the listener position at regularly sampled set of frequencies in the range [vmin(usually 0), vmax] with a step of Δv. The total number of frequency samples become vmax/Δv. To simplify the analysis, we assume that all objects have the same number of offset surface samples n, incoming equivalent sources Q and outgoing equivalent sources P, at all frequencies. We also assume that all the objects have equal volume u.
Scattering matrix For each of the QM2 incoming multipoles of an object, wave simulations are performed and a dense least-squares system of size PN2×n is solved to find the object's scattering matrix. The cost for each simulation is u log u, and the cost of solving the least-squares system1 is nP2N4. Hence the total cost is O (mQM2 (nP2N4+u log u)).
Interaction matrix For every pair of objects, PN2 linear systems of size QM2×n need to be solved for finding the interaction matrix. In total, we have m2 object pairs. The cost of evaluating analytical expressions for multipole pressure is O(1) each, and is dominated by the cost of solving the linear systems. Hence the total cost is O (m2PN2nQ2M4).
Computing strengths. The size of the final linear system for finding outgoing equivalent source strengths for all objects in response to a sound source is mPN2×mPN2; solving it takes O (m3P3N6) time.
The above steps are performed for all the frequency samples. The total pre-processing cost thus scales as:
O ((mQM2(nP2N4+μ log μ+mPN2nQM2)+m3P3N6)vmax/Δv)
Run-time At run-time, we evaluate equation (16) for all the frequencies, which takes O (mPN2vmax/Δv) time. The runtime memory requirement consists of positions (3 floats) and (complex-valued) strengths (2 floats) of equivalent source for all frequencies, which comes out to be m(3P+2 PN2)vmax/Δv.
Since the computational cost and runtime memory requirement scales adversely with the multipole order, we limit equivalent sources to monopoles and dipoles, N=2, M=2.
BEM The storage requirements of BEM depends on the total surface area of the objects in the scene S and the number of frequency samples vmax/Δv. Assuming BEM places s samples per wavelength (usually s=10), the number of monopoles/dipoles placed by BEM on the object's surface at frequency sample vi=Ss2vi2/c2. The total number of monopoles/dipoles for all the frequency samples is equal to mSs2vmax3/(c2Δv) where each multipole is specified by its position (3 floats) and complex amplitude (2 floats). The total memory requirement of storing the simulation results becomes
5Ss2vmax3/(c2 Δv)
ART and FDTD The runtime memory requirements of ARD and FDTD is equal to the number of grid cells in the spatial discretization of the entire volume of the scene and the number of timestops in the simulation. Assuming volume of the scene to be V, grid size h, maximum frequency vmax, speed of sound c, and number of samples per wavelength s (equal to 3 for ARD and 10 for FDTD), we can compute the number of grid cells equal (svmax/c)3V. The total number of time-samples to store is at least twice the number of samples in the frequency domain. The total memory requirement of storing the simulation results for these techniques is thus
2s2vmax4V/(c3 Δv).
Memory 104 may also include per-object transfer function generation module 108. In accordance with embodiments of the subject matter described herein, per-object transfer function generation module 108 may be configured to cause processor(s) 102 to generate per-object transfer functions for each of the disjoint objects, wherein each per-object transfer function maps an incoming sound field reaching an object to an outgoing sound field emanating from the object. In some embodiments, each of the per-object transfer functions maps an incoming sound field reaching its corresponding disjoint object to an outgoing sound field after accounting for sound propagation effects such as reflection, scattering, or diffraction due to the corresponding disjoint object. In some embodiments, the incoming and the outgoing sound field are expressed using equivalent sources. For example, the incoming sound field may be represented by placing the equivalent sources in an exterior region of the corresponding disjoint object and the outgoing sound field may be represented by placing the equivalent sources in an interior region of the corresponding disjoint object. In some embodiments, per-object transfer function generation module 108 may be configured to store at least one of the per-object transfer functions corresponding to one of the disjoint objects within the scene for future use in the simulation of a second scene that includes the disjoint object. Memory 104 may further include inter-object transfer function generation module 110. In accordance with embodiments of the subject matter described herein, inter-object transfer function generation module 110 may be configured to cause processor(s) 102 to generate a plurality of inter-object transfer functions corresponding to pairs of the disjoint objects, wherein each inter-object transfer function maps an outgoing sound field emanating from one of the disjoint objects to an incoming sound filed of another of the disjoint objects. In some embodiments, each of the inter-object transfer functions maps an outgoing sound field of one of the disjoint objects to an incoming sound field of another of the disjoint objects. In some embodiments, both the incoming and the outgoing sound field are expressed in terms of equivalent sources. In some embodiments, inter-object transfer function generation module 110 may be configured to generate the plurality of inter-object transfer functions corresponding to pairs of the disjoint objects by generating an inter-object transfer function for each possible combination of the disjoint objects.
Memory 104 may further include simulation module 112. In accordance with embodiments of the subject matter described herein, simulation module 112 may be configured to cause processor(s) 102 to utilize the per-object transfer functions and the inter-object transfer functions to perform simulation of sound propagation within the scene and thereby determine a sound field for the scene. In some embodiments, simulation module 112 may be configured to utilize the per-object transfer functions and the inter-object transfer functions to perform simulation of sound propagation within the scene by computing an outgoing equivalent source strength for each of the disjoint objects within the scene. For example, simulation module 112 may be configured to utilize the outgoing equivalent source strengths for the disjoint objects to compute a pressure field corresponding to a listener position within the scene. In some embodiments, the simulation of sound propagation within the scene may be wave-based.
The subject matter described herein may be utilized for performing sound rendering or auditory displays which may augment graphical rendering and provide a user with an enhanced spatial sense of presence. For example, some of the driving applications of sound rendering include acoustic design of architectural models or outdoor scenes, walkthroughs of large computer aided design (CAD) models with sounds of machine parts or moving people, urban scenes with traffic, training systems, computer games, etc.
It will be understood that various details of the presently disclosed subject matter may be changed without departing from the scope of the presently disclosed subject matter. Furthermore, the foregoing description is for the purpose of illustration only, and not for the purpose of limitation.
The disclosures of all of the references listed herein are hereby incorporated herein by reference in their entireties.
ASSENMACHER, I. 2007. Virtual reality system with integrated sound field simulation and reproduction. EURASIP J. Appl. Signal Process. 2007, 1, 187.
DELLEPIANE, M. 2007. Instant Sound Scattering. In Rendering Techniques (Proceedings of the Eurographics Symposium on Rendering).
The finite element method for fluid dynamics, 6 ed. Butterworth-Heinemann, Jan.
The presently disclosed subject matter claims the benefit of U.S. Provisional Patent Application Ser. No. 61/614,468, filed Mar. 22, 2012; the disclosure of which is incorporated herein by reference in its entirety.
This invention was made with government support under Grant No. W911NF-10-1-0506 awarded by the Army Research Office and Grant Nos. CMMI-1000579, IIS-0917040, and 0904990 awarded by the National Science Foundation. The government has certain rights in this invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2013/031619 | 3/14/2013 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
61614468 | Mar 2012 | US |