This application is directed to a system and method for virtual 2D or 3D navigation of a recorded, synthetic, or live sound field through interpolation of the signals from an array of two or more microphone systems (each comprising an assembly of multiple microphone capsules) to estimate the sound field at an intermediate position.
Sound field recordings are commonly made using spherical or tetrahedral assemblies of microphones, which capture spherical harmonic coefficients (SHCs) of the sound field, thereby providing a mathematical representation of the sound field. The SHCs, also called higher-order Ambisonics (HOA) signals, can then be rendered for playback over headphones (or earphones), two-channel stereo loudspeakers, or one of many other multi-channel loudspeaker configurations. Ideally, playback results in a perceptually realistic reproduction of the 3D sound field from the vantage point of the microphone assembly.
From a single microphone assembly, the SHCs accurately describe the recorded sound field only in a finite region around the location of the assembly, where the size of said region increases with the number of SHCs but decreases with increasing frequency. Furthermore, the SHCs are only a valid description of the sound field in the free field, i.e., in a spherical region around the microphone assembly that extends up to the nearest source or obstacle. A review of this theory is given by M. A. Poletti in the article “Three-Dimensional Surround Sound Systems Based on Spherical Harmonics,” published November, 2005, in volume 53, issue 11 of the Journal of the Audio Engineering Society.
An existing category of sound field navigation techniques entails identifying, locating, and isolating discrete sound sources, which may then be artificially moved relative to the listener to simulate navigation. The details of this method are given by Xiguang Zheng in the thesis “Soundfield navigation: Separation, compression and transmission,” published in 2013 by the University of Wollongong. This type of technique is only applicable to sound fields consisting of a finite number of discrete sources that can be easily separated (i.e., sources that are far enough apart or not emitting sound simultaneously). Furthermore, even in ideal situations, the source separation technique employed in the time-frequency domain (i.e., short-time Fourier transform domain) often results in a degradation of sound quality.
An alternative technique, described by Alex Southern, Jeremy Wells, and Damian Murphy in the article “Rendering walk-through auralisations using wave-based acoustical models,” presented at the 17th European Signal Processing Conference (EUSIPCO), 2009, is to average the SHCs directly. However, if a sound source is nearer to one microphone assembly than to another, this technique will necessarily produce two copies of the source's signal, separated by a finite time delay, yielding a comb-filtering-like effect.
It is therefore an objective of the present invention to provide a system and method for generating virtual navigable sound fields in 2D or 3D without introducing spectral coloration or degrading sound quality.
The system and method for virtual navigation of a sound field through interpolation of the signals from an array of microphone assemblies of the present invention utilizes an array of two or more higher-order Ambisonics (HOA) microphone assemblies, which measure spherical harmonic coefficients (SHCs) of the sound field from spatially-distinct vantage points, to estimate the SHCs at an intermediate listening position. First, sound sources near to the microphone assemblies are detected and located either acoustically using the measured SHCs or by simple distance measurements. Simultaneously, the desired listening position is received via an input device (e.g., a keyboard, mouse, joystick, or a real-time head/body tracking system). Only the microphone assemblies that are nearer to said desired listening position than to any near sources are considered valid for interpolation. The SHCs from these valid microphone assemblies are then interpolated using a combination of weighted averaging and linear translation filters. The result is an estimate of the SHCs that would have been captured by a HOA microphone assembly placed in the original sound field at the desired listening position.
In general, the system and method for virtual navigation of a sound field through interpolation of the signals from an array of microphone assemblies of the present invention involves an array of two or more compact microphone assemblies that are used to capture spherical harmonic coefficients (SHCs) of the sound field from spatially distinct vantage points. Said compact microphone assembly may be the tetrahedral SoundField DSF-1 microphone by TSL Products, the spherical Eigenmike by mh Acoustics, or any other microphone assembly consisting of at least four (4) microphone capsules arranged in a 3D configuration (such as a sphere). First, the microphone assemblies are arranged in the sound field at specified positions (or, alternatively, the positions of the microphone assemblies are determined by simple distance measurements), and any sound sources near to the microphone assemblies (i.e., near-field sources) are detected and located either by simple distance measurements, through triangulation using the signals from the microphone assemblies, or with any other existing source localization techniques found in the literature. Simultaneously, the desired listening position is either specified manually with an input device (such as a keyboard, mouse, or joystick) or measured by a real-time head/body tracking system. Next, the desired position of the listener, the locations of the microphone assemblies, and the previously determined locations of any near-field sources are used to determine the set of microphone assemblies for which the listening position is valid. Based on the positions of each of the valid microphone assemblies and the listening position, a set of interpolation weights is computed. Ultimately, the SHCs from the valid assemblies are interpolated using a combination of weighted averaging and linear translation filters. Such linear translation filters are described by Joseph G. Tylka and Edgar Y. Choueiri in the article “Comparison of Techniques for Binaural Navigation of Higher-Order Ambisonic Soundfields,” presented at the 139th Convention of the Audio Engineering Society, 2015.
The general method for virtual navigation of a sound field through interpolation of the signals from an array of microphone assemblies of the present invention is depicted in
In step 12, the desired position of the listener, the locations of the microphone assemblies, and the previously determined locations of any near-field sources are used to determine the set of microphone assemblies for which the listening position is valid. The spherical harmonic expansion describing the sound field from each microphone assembly is a valid description of said sound field only in a spherical region around the microphone assembly that extends up to the nearest source or obstacle. Consequently, if a microphone assembly is nearer to a near-field sound source than said microphone assembly is to the listening position, then the SHCs captured by that microphone assembly are not suitable for describing the sound field at the listening position. By comparing the distances from each microphone assembly to its nearest source and the distance of that microphone assembly to the listening position, a list of the valid microphone assemblies is compiled. As an example, the geometry of a typical situation is depicted in
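For illustration purposes only, the validity test of step 12 reduces to a simple distance comparison, as in the following minimal Python sketch (all function and variable names are illustrative assumptions, not features of the invention):

```python
import numpy as np

def valid_assemblies(mic_positions, source_positions, listener_position):
    """Return indices of microphone assemblies whose SHCs are valid at the
    listening position, i.e., assemblies that are nearer to the listening
    position than to any near-field source."""
    mics = np.atleast_2d(mic_positions)          # shape (M, 3)
    listener = np.asarray(listener_position)     # shape (3,)
    valid = []
    for i, mic in enumerate(mics):
        d_listener = np.linalg.norm(listener - mic)
        if len(source_positions) == 0:
            valid.append(i)                      # free field: always valid
            continue
        d_nearest_source = min(np.linalg.norm(np.asarray(s) - mic)
                               for s in source_positions)
        # The free-field expansion holds only up to the nearest source,
        # so the listening position must lie inside that region.
        if d_listener < d_nearest_source:
            valid.append(i)
    return valid
```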
In step 14, the positions of the valid microphone assemblies are used in conjunction with the desired listening position to compute a set of interpolation weights. Depending on the geometry of the valid microphone assemblies and the listening position, the weights may be calculated using standard interpolation methods, such as linear or bilinear interpolation weights. A simple implementation for an arbitrary geometry is to compute each weight based on the reciprocal of the respective microphone assembly's distance from the listening position. Generally, the interpolation weights should be normalized such that either the sum of the weights or the sum of the squared weights is equal to 1.
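For illustration, the reciprocal-distance weighting of step 14, with either normalization convention, might be implemented as in the following minimal sketch (the names and the guard against a zero distance are illustrative assumptions):

```python
import numpy as np

def interpolation_weights(mic_positions, listener_position, squared_norm=False):
    """Reciprocal-distance interpolation weights for the valid assemblies,
    normalized so that either the weights or the squared weights sum to 1."""
    mics = np.atleast_2d(mic_positions)
    d = np.linalg.norm(mics - np.asarray(listener_position), axis=1)
    w = 1.0 / np.maximum(d, 1e-9)       # avoid division by zero at a mic
    if squared_norm:
        return w / np.linalg.norm(w)    # sum of squared weights equals 1
    return w / np.sum(w)                # sum of weights equals 1
```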
In step 16, the list of valid microphone assemblies is used to isolate (i.e., pick out) only the SHCs from said valid microphone assemblies. These SHCs from said valid microphone assemblies, as well as the previously computed interpolation weights, are then passed to the interpolation block for step 18. In general, the interpolation step 18 involves a combination of weighted averaging and linear translation filters applied to the valid SHCs. In the following discussion, three potential implementations are described.
One potential implementation of the interpolation step 18 is depicted in
In step 22, the square root of each interpolation weight is computed. Then, in step 24, each individual sub-matrix in the combined translation matrix is multiplied by the square root of the interpolation weight for the respective microphone assembly. In parallel, in step 26, the set of SHCs from each of the valid microphone assemblies is also multiplied by the square root of the interpolation weight for the respective microphone assembly. The weighted SHCs are then arranged into a combined column-vector, with each microphone assembly's respective SHCs first arranged as a column-vector, and then arranged vertically by microphone assembly in the combined column-vector.
In step 28, singular value decomposition (SVD) is performed on the weighted combined translation matrix, from which a regularization parameter is computed in step 30. The computed regularization parameter may be frequency-dependent so as to mitigate spectral coloration. One method for computing such a frequency-dependent regularization parameter is described by Joseph G. Tylka and Edgar Y. Choueiri in the article “Soundfield Navigation using an Array of Higher-Order Ambisonics Microphones,” presented at the Audio Engineering Society's International Conference on Audio for Virtual and Augmented Reality, 2016. Using the regularization parameter and the SVD matrices, a regularized pseudoinverse matrix is computed in step 32.
Finally, in step 34, the combined column-vector of weighted SHCs is multiplied by the previously computed regularized pseudoinverse matrix. The result is an estimate of the SHCs of the sound field at the listening position.
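For illustration purposes only, steps 22 through 34 may be sketched in Python as follows, assuming that each translation sub-matrix maps the (unknown) SHCs at the listening position to those at the respective valid assembly for a single frequency bin, and taking a Tikhonov-style use of the regularization parameter as one plausible choice (all names are illustrative assumptions):

```python
import numpy as np

def interpolate_shcs_pseudoinverse(translation_submatrices, shc_sets, weights,
                                   reg_param):
    """Weighted, SVD-regularized least-squares estimate of the SHCs at the
    listening position.  translation_submatrices[i] maps the SHCs at the
    listening position to those at valid assembly i (one frequency bin);
    shc_sets[i] are that assembly's measured SHCs at the same bin."""
    sqrt_w = np.sqrt(np.asarray(weights, dtype=float))
    # Steps 22-24: weight each sub-matrix by the square root of its
    # interpolation weight and stack the sub-matrices vertically.
    A = np.vstack([sw * T for sw, T in zip(sqrt_w, translation_submatrices)])
    # Step 26: weight each SHC set and arrange into a combined column-vector.
    b = np.concatenate([sw * np.asarray(a) for sw, a in zip(sqrt_w, shc_sets)])
    # Step 28: singular value decomposition of the weighted combined matrix.
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    # Steps 30-32: regularized pseudoinverse (reg_param may be
    # frequency-dependent to mitigate spectral coloration, per the text).
    s_inv = s / (s**2 + reg_param**2)
    # Step 34: apply the regularized pseudoinverse to the weighted vector.
    return Vh.conj().T @ (s_inv * (U.conj().T @ b))
```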
An alternate implementation of the interpolation step 18 is depicted in
In step 38, the sets of weighted SHCs are summed term-by-term across different microphone assemblies. That is, the nth term of the interpolated SHCs is calculated by summing together the nth term from each set of weighted SHCs. For this implementation in particular, it is important that the interpolation weights be normalized (for example, such that the sum of the weights is equal to 1). The result is an estimate of the SHCs of the sound field at the listening position.
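For illustration, the weighting and term-by-term summation may be combined into the following minimal sketch (the shc_sets are assumed to have already passed through any per-assembly translation filtering of the preceding steps; names are illustrative):

```python
import numpy as np

def interpolate_shcs_weighted_sum(shc_sets, weights):
    """Term-by-term weighted sum (step 38).  shc_sets is an (M, N) array of
    N SHC terms from each of M valid assemblies; the weights are normalized
    here so that they sum to 1, as the text requires."""
    w = np.asarray(weights, dtype=float)
    w = w / np.sum(w)                   # enforce sum-to-one normalization
    return w @ np.asarray(shc_sets)     # nth output term = sum_i w_i * a_i[n]
```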
Another alternate implementation of the interpolation step 18 is depicted in
In step 42, each individual sub-matrix in the combined matrix is multiplied by the interpolation weight for the respective microphone assembly. In parallel, in step 44, the sets of SHCs from the valid microphone assemblies are converted to plane-wave coefficients (PWCs). The relationship between SHCs and PWCs is obtained from the Gegenbauer expansion, and is given by Dmitry N. Zotkin, Ramani Duraiswami, and Nail A. Gumerov in the article “Plane-Wave Decomposition of Acoustical Scenes Via Spherical and Cylindrical Microphone Arrays,” published January, 2010, in volume 18, issue 1 of the IEEE Transactions on Audio, Speech, and Language Processing. These PWCs are then arranged into a combined column-vector, with each microphone assembly's respective PWCs first arranged as a column-vector, and then arranged vertically by microphone assembly in the combined column-vector.
In step 46, the combined column-vector of PWCs is multiplied by the previously computed weighted combined translation matrix. The result is an estimate of the PWCs of the sound field at the listening position. Finally, in step 48, the estimated PWCs are converted to SHCs, again using the relationship obtained from the Gegenbauer expansion mentioned previously.
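For illustration purposes only, steps 42 through 48 may be sketched as follows, assuming the (convention-dependent) SHC-to-PWC and PWC-to-SHC conversion matrices obtained from the Gegenbauer expansion are given; the weighted sub-matrices are arranged side by side so that the combined matrix acts on the vertically stacked PWC column-vector (all names and matrix conventions here are assumptions, not prescriptions of the invention):

```python
import numpy as np

def interpolate_via_plane_waves(pw_translation_submatrices, shc_sets, weights,
                                shc_to_pwc, pwc_to_shc):
    """Plane-wave-domain interpolation.  pw_translation_submatrices[i]
    translates PWCs from valid assembly i to the listening position;
    shc_to_pwc and pwc_to_shc are the Gegenbauer-derived conversions."""
    w = np.asarray(weights, dtype=float)
    # Step 42: weight each translation sub-matrix and arrange horizontally.
    T = np.hstack([wi * Ti for wi, Ti in zip(w, pw_translation_submatrices)])
    # Step 44: convert each assembly's SHCs to PWCs and stack vertically.
    g = np.concatenate([shc_to_pwc @ np.asarray(a) for a in shc_sets])
    # Step 46: translate and combine in the plane-wave domain.
    g_est = T @ g
    # Step 48: convert the estimated PWCs back to SHCs.
    return pwc_to_shc @ g_est
```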
The method of the present invention can be embodied in a system, such as that shown in
Prior to performing the method of the present invention, the processor 52 first computes the spherical harmonic coefficients (SHCs) of the sound field using the raw capsule signals from the microphone assemblies 50. Procedures for obtaining SHCs from said capsule signals are well established in the prior art; for example, the procedure for obtaining SHCs from a closed rigid spherical microphone assembly is described by Jens Meyer and Gary Elko in the article “A highly scalable spherical microphone array based on an orthonormal decomposition of the soundfield,” presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2002. A more general procedure for obtaining SHCs from any compact microphone assembly is described by Angelo Farina, Simone Campanini, Lorenzo Chiesi, Alberto Amendola, and Lorenzo Ebri in the article “Spatial Sound Recording with Dense Microphone Arrays,” presented at the 55th International Conference of the Audio Engineering Society, August, 2014.
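For illustration, such an encoding step may be sketched as a least-squares spatial decomposition of the capsule signals followed by per-order radial equalization; this is a simplified single-frequency-bin sketch under assumed conventions, not the cited authors' exact procedures, and all names are illustrative:

```python
import numpy as np

def encode_shcs(capsule_spectra, sh_matrix, radial_eq, orders):
    """Estimate SHCs at one frequency bin from raw capsule spectra.
    capsule_spectra: (Q,) complex capsule signals at this bin;
    sh_matrix: (Q, N) spherical harmonics sampled at the capsule directions;
    radial_eq: (max_order+1,) per-order radial equalization (e.g., a
    regularized inverse of the rigid-sphere mode strength);
    orders: (N,) integer spherical-harmonic order n of each SHC term."""
    # Least-squares spatial decomposition of the capsule signals.
    raw = np.linalg.pinv(sh_matrix) @ capsule_spectra
    # Per-order radial equalization yields the SHCs.
    return raw * radial_eq[np.asarray(orders)]
```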
Once the measured SHCs are obtained, the processor 52 determines which of the measured SHCs are valid for use at a desired listening position based on near-field source location and positions of the microphone assemblies 50, computes a set of interpolation weights based on positions of said microphone assemblies 50 and said listening position, and interpolates said valid measured SHCs to obtain a set of SHCs for a desired intermediate listening position. During processing, the processor 52 also receives the desired listening position via an input device 56, e.g., a keyboard, mouse, joystick, or a real-time head/body tracking system. Subsequently, the processor 52 renders the interpolated SHCs for playback over the desired sound playback equipment 54.
The sound playback equipment 54 may comprise one of the following: a multi-channel array of loudspeakers 58, a pair of headphones or earphones 60, or a stereo pair of loudspeakers 62. For playback over a multi-channel array of loudspeakers, an ambisonic decoder (such as those described by Aaron J. Heller, Eric M. Benjamin, and Richard Lee in the article “A Toolkit for the Design of Ambisonic Decoders,” presented at the Linux Audio Conference, 2012, and freely available as a MATLAB toolbox) or any other multi-channel renderer is required. For playback over headphones/earphones or stereo loudspeakers, an ambisonics-to-binaural renderer is required, such as that described by Svein Berge and Natasha Barrett in the article “A New Method for B-Format to Binaural Transcoding,” presented at the 40th International Conference of the Audio Engineering Society, 2010, and widely available as an audio plugin. Additionally, for playback of the binaural rendering over two loudspeakers, a crosstalk canceller is required, such as that described by Bosun Xie in chapter 9 of the textbook “Head-Related Transfer Function and Virtual Auditory Display,” published by J. Ross Publishing, 2013.
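For illustration purposes only, a basic mode-matching loudspeaker decoder (a simplified sketch, not the cited toolkit's algorithm) solves a least-squares system relating the loudspeaker signals to the interpolated SHCs:

```python
import numpy as np

def mode_matching_decoder(sh_at_speakers, shcs):
    """Minimal mode-matching ambisonic decoding sketch.
    sh_at_speakers: (N, L) matrix whose columns are the N spherical
    harmonics evaluated at each of L loudspeaker directions;
    shcs: (N,) interpolated SHCs.  Solves Y s = a for the loudspeaker
    signals s in the least-squares sense."""
    return np.linalg.pinv(sh_at_speakers) @ shcs
```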
While the foregoing invention has been described with reference to its preferred embodiments, various alterations and modifications will occur to those skilled in the art. All such variations and modifications are intended to fall within the scope of the appended claims. For example, the above description refers exclusively to recorded sound fields, but the system and method of the present invention may be applied to synthetic sound fields in the same manner to interpolate between discrete positions at which SHCs have been computed numerically.
This application is a national stage application of prior International Application No. PCT/US2017/54404, entitled “System and Method for Virtual Navigation of Sound Fields through Interpolation of Signals from an Array of Microphone Assemblies,” filed Sep. 29, 2017, which relates and claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application No. 62/401,463, titled “System and Method for Virtual Navigation of Sound Fields through Interpolation of Signals from an Array of Microphone Assemblies,” which was filed on Sep. 29, 2016, each of which is hereby incorporated by reference herein in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2017/054404 | 9/29/2017 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2018/064528 | 4/5/2018 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
20060045275 | Daniel | Mar 2006 | A1 |
20130216070 | Keiler et al. | Aug 2013 | A1 |
20140355766 | Morrell et al. | Dec 2014 | A1 |
20140355771 | Peters et al. | Dec 2014 | A1 |
20140358565 | Peters et al. | Dec 2014 | A1 |
Berge et al., “A New Method for B-Format to Binaural Transcoding,” presented at the 40th International Conference of the Audio Engineering Society, Tokyo, Japan, Oct. 8-10, 2010, 10 pages.
Farina et al., “Spatial Sound Recording with Dense Microphone Arrays,” presented at the 55th AES International Conference, Helsinki, Finland, Aug. 27-29, 2014, 8 pages.
Gumerov et al., “Chapter 3. Translations and Rotations of Elementary Solutions,” Fast Multipole Methods for the Helmholtz Equation in Three Dimensions, published by Elsevier Science, Jan. 27, 2005, pp. 89-137.
Heller et al., “A Toolkit for the Design of Ambisonic Decoders,” presented at the Linux Audio Conference, Stanford University, California, Apr. 12-15, 2012, 12 pages.
International Search Report and Written Opinion dated Dec. 5, 2017, in International Application No. PCT/US17/54404, 23 pages.
Meyer et al., “A highly scalable spherical microphone array based on an orthonormal decomposition of the soundfield,” 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, May 13-17, 2002, pp. 1781-1784.
Poletti, “Three-Dimensional Surround Sound Systems Based on Spherical Harmonics,” Journal of the Audio Engineering Society, Nov. 2005, vol. 53, No. 11, pp. 1004-1025.
Schultz et al., “Data-based Binaural Synthesis Including Rotational and Translatory Head-Movements,” presented at the 52nd International Conference of the Audio Engineering Society, Guildford, UK, Sep. 2-4, 2013, 11 pages.
Southern et al., “Rendering walk-through auralisations using wave-based acoustical models,” presented at the 17th European Signal Processing Conference (EUSIPCO 2009), Glasgow, UK, Aug. 24-28, 2009, pp. 715-719.
Tylka et al., “Comparison of Techniques for Binaural Navigation of Higher-Order Ambisonic Soundfields,” Audio Engineering Society Convention Paper 9421, presented at the 139th Convention, Oct. 29 to Nov. 1, 2015, 13 pages.
Tylka et al., “Soundfield Navigation using an Array of Higher-Order Ambisonics Microphones,” presented at the Conference on Audio for Virtual and Augmented Reality, Los Angeles, CA, Sep. 30 to Oct. 1, 2016, 10 pages.
Xie, “Chapter 9. Binaural Reproduction through Loudspeakers,” Head-Related Transfer Function and Virtual Auditory Display, published by J. Ross Publishing, Jun. 2013, pp. 283-326.
Zheng, “Soundfield navigation: Separation, compression and transmission,” Doctor of Philosophy Thesis, School of Electrical, Computer, and Telecommunications Engineering, University of Wollongong, 254 pages (2013).
Zotkin et al., “Plane-Wave Decomposition of Acoustical Scenes Via Spherical and Cylindrical Microphone Arrays,” IEEE Transactions on Audio, Speech, and Language Processing, Jan. 2010, vol. 18, Issue 1, 29 pages.
Number | Date | Country
---|---|---
20200021940 A1 | Jan 2020 | US
Number | Date | Country
---|---|---
62401463 | Sep 2016 | US