The present application claims priority to European Patent Application 18164551.6 filed with the European Patent Office on Mar. 28, 2018, the entire contents of which are incorporated herein by reference.
The present disclosure relates to the field of acoustics, in particular to a device for acoustic room analysis, a corresponding method for acoustic room analysis, and a computer program for acoustic room analysis.
Many algorithms in acoustics, such as beamforming and dereverberation, require an analysis of the room, e.g. an estimation of the geometry of the room. However, asking a user to enter the dimensions of the room is impractical. Acoustic measurements for determining room geometries, such as impulse response measurements, are difficult and typically require that the sound emitter and/or the measurement microphone be physically moved or turned to point in different directions, which necessitates the use of a gimbal or a turntable.
The present disclosure enhances acoustic room analysis and room geometry estimation.
Embodiments are explained by way of example with respect to the accompanying drawings, in which:
Before a detailed description of the embodiments under reference of
The embodiments disclose a device for room geometry analysis comprising: a plurality of segments built of acoustic metamaterials, each segment acting as a waveguide with a unique transfer function; and a processor configured to calculate delays and respective directions of mirror sound sources by decomposing a sound signal obtained from a microphone based on the transfer functions of the segments and based on a calibration signal emitted by a speaker.
The sound signal captured by the microphone is the sum of all incoming sound passing the segments and therefore it is the sum of all reflections of the calibration signal, as the calibration signal travels from the speaker to the microphone along many paths: directly, by first order reflection and by higher order reflections. Each reflected signal passes a different segment due to the room geometry and the reflection geometry resulting from the room geometry.
A mirror sound source captured by the microphone is for example generated when the calibration signal is reflected at a wall. For example, the calibration signal may be reflected by walls of a room and travel from the reflecting walls directly or with additional reflections to the metamaterial device. Reaching the metamaterial device, the reflected calibration signal passes at least one segment of the metamaterial device and reaches the microphone.
A segment may be made of any metamaterial with a known transfer function and may have any shape and structure. A segment may for example be structured to provide a specific and unique transfer function. A metamaterial can for example be produced with acrylonitrile butadiene styrene plastics using fused filament fabrication 3D printing technology. There are many possibilities for metamaterial composition and structure known to the skilled person.
At least one of the segments (301-304) may be a Helmholtz resonator.
A transfer function is a mathematical function giving the corresponding output value for each possible value of the input to the object to which the transfer function relates. For example, a transfer function may define a frequency-dependence of the transmitted amplitude of a sound signal passing the segment built of acoustic metamaterials.
The microphone that captures the sound signal may for example be placed so that it is surrounded by the segments, e.g. in the center of the metamaterial device. The segments may for example be arranged around the microphone in a cylindrical, spherical, circular, cubic, cuboid or planar shape. The microphone may for example be a single microphone, e.g. an omnidirectional microphone, or a microphone array.
The speaker may be any type of omnidirectional speaker known to the skilled person, for example an array of dynamic or electrostatic loudspeakers, a single dynamic loudspeaker or a piezo loudspeaker.
The processor may be a CPU, a microcomputer, a computer, or any other circuit configured to perform calculation operations.
The processor may be configured to determine, for each delay and the respective direction, the location of a mirror sound source based on the delay and the respective direction.
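As an illustration of how a delay and a direction can be turned into a location, the following minimal sketch places the mirror sound source at the distance travelled by sound during the delay, along the direction of arrival. It assumes that the delay is the absolute propagation time to the microphone, that φ is the azimuth and θ the elevation measured from the horizontal plane, and a speed of sound of 343 m/s; the function name `mirror_source_position` and these conventions are assumptions for illustration, not taken from the disclosure.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def mirror_source_position(mic_pos, azimuth, elevation, delay, c=SPEED_OF_SOUND):
    """Locate a mirror sound source from its direction of arrival and delay.

    Assumed convention: azimuth in the x-y plane, elevation measured from
    that plane, both in radians; delay in seconds.
    """
    direction = np.array([
        np.cos(elevation) * np.cos(azimuth),
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation),
    ])
    # The mirror source lies at distance c * delay from the microphone.
    return np.asarray(mic_pos, dtype=float) + c * delay * direction


# Hypothetical example: a reflection arriving 10 ms after emission along +y.
position = mirror_source_position([0.0, 0.0, 0.0], np.pi / 2, 0.0, 0.010)
```

If only delays relative to the direct sound are available, the absolute travel time would first have to be recovered from the known speaker-to-microphone distance.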
The processor may be configured to determine the directions of mirror sound sources from angle-dependent amplitude coefficients of the sound signal captured by the microphone. For example, each angle-dependent amplitude coefficient may be identified as a signal related to a mirror sound source.
The processor may be configured to determine a normal vector of a wall and a wall point based on the location of the mirror sound sources and the location of the loudspeaker. The normal vector and one point on a plane uniquely define a plane in a 3-dimensional space. The normal vector and the wall point thus can define a wall of a room that is to be acoustically analyzed.
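A minimal sketch of this step, assuming a first order reflection: the mirror sound source is then the reflection of the loudspeaker across the wall, so the wall normal points from the loudspeaker to the mirror source and the midpoint between the two lies on the wall. The function name `wall_from_mirror_source` is hypothetical.

```python
import numpy as np


def wall_from_mirror_source(speaker_pos, mirror_pos):
    """Return (unit normal n, wall point Q) for a first order mirror source."""
    p = np.asarray(speaker_pos, dtype=float)
    m = np.asarray(mirror_pos, dtype=float)
    diff = m - p
    normal = diff / np.linalg.norm(diff)  # wall is perpendicular to this line
    wall_point = 0.5 * (p + m)            # midpoint lies on the wall
    return normal, wall_point


# Hypothetical example: loudspeaker at (1, 2, 1), mirror source at (7, 2, 1).
n, q = wall_from_mirror_source([1.0, 2.0, 1.0], [7.0, 2.0, 1.0])
```

In this example the wall is the plane x = 4, which passes through `q` with normal `n`.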
The calibration signal may for example have a flat spectral density or the calibration signal may be a time-stretch pulse signal. Time-stretch pulse signals are known to the skilled person. The skilled person may use any preferred technique to generate a time-stretch pulse signal.
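For illustration only, one commonly used way to generate a time-stretch pulse is to define a unit-magnitude spectrum with a quadratic phase and transform it to the time domain. The sketch below follows that formulation; the function name `time_stretch_pulse` and the parameters `n` (length in samples, even) and `m` (integer stretch parameter) are assumptions and are not specified in the disclosure.

```python
import numpy as np


def time_stretch_pulse(n, m):
    """Time-stretch pulse of even length n with integer stretch parameter m.

    The half spectrum has unit magnitude and a quadratic phase, so the
    resulting real signal has a flat magnitude spectrum.
    """
    k = np.arange(n // 2 + 1)
    half_spectrum = np.exp(1j * 4.0 * m * np.pi * k**2 / n**2)
    # irfft builds the conjugate-symmetric spectrum and returns a real signal.
    return np.fft.irfft(half_spectrum, n=n)


tsp = time_stretch_pulse(n=1024, m=256)
```

Because every frequency bin has magnitude one, such a signal also satisfies the flat spectral density option mentioned above.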
The processor may be configured to decompose the sound signal based on compressive sensing (CS) techniques. Any compressive sensing technique, such as L1-norm minimization, edge-preserving total variation, an iterative model using a directional orientation field and directional total variation, or the like, may be used for decomposing the sound signal.
The processor may be configured to decompose the sound signal based on the minimization of the L1 norm of the sound signal, e.g. of coefficients of a decomposition of the sound signal.
The processor may be configured to optimize an error function.
The processor may be configured to decompose the sound signal based on additional constraints. Additional constraints may be the suppression of spurious reflections from furniture etc., and they are added to the calculation to improve the robustness of the approach.
The embodiments also disclose a system comprising a device as described above and in the embodiments below, and a speaker, the speaker being configured to emit a calibration signal. The speaker may for example be placed in a room some distance away from the device comprising the segments and the microphone.
The embodiments also disclose a method comprising: emitting a calibration signal; obtaining a sound signal; and calculating delays and respective directions of mirror sound sources by decomposing the sound signal based on the transfer functions of a plurality of segments and based on the calibration signal, the segments built of acoustic metamaterials, each segment acting as a waveguide with a unique transfer function.
The embodiments also disclose a program comprising instructions, which when executed on a processor: emit a calibration signal; obtain a sound signal; and calculate delays and respective directions of mirror sound sources by decomposing the sound signal based on the transfer functions of a plurality of segments and based on the calibration signal, the segments built of acoustic metamaterials, each segment acting as a waveguide with a unique transfer function.
The metamaterial device 100 comprises acoustic metamaterials. The acoustic metamaterials are arranged in a special structure with desired acoustic properties. The metamaterial is arranged around a microphone 110 in segments (here “slices”) 101, 102, 103, 104, 105, 106. Each segment is a fanlike region of different metamaterial structure functioning as a waveguide for sound with a unique transfer function that depends on the solid angle (azimuth and elevation) of the slice. Each segment (waveguide) possesses a unique and highly frequency-dependent response. The transfer function of each segment is known and all transfer functions are substantially different from each other. The different transfer functions result in a “coloring” of sounds received by the microphone 110 depending on the segment by which the sound is received. This “coloring” allows identifying the segment, and thus the direction, from which a sound has been received.
The spatial resolution of the metamaterial device 100 depends on the total number of segments.
The metamaterial device 100 can for example be produced with acrylonitrile butadiene styrene plastics using fused filament fabrication 3D printing technology. It should be mentioned that there are many possibilities for metamaterial fabrication. Any kind of material with a known transfer function, or any material forming a Helmholtz resonator, can be used for the metamaterial device 100.
In the embodiment of
Estimating a Room Geometry with a Metamaterial Device
The air in the neck opening 502 forms an inertial mass because of its own inertia. Combined with the elasticity of the air volume V0 enclosed in the rigid container, the whole resonator forms a mass-spring system and hence a harmonic oscillator. For a spherical volume V0, and approximately for a cubic volume V0, the mass-spring system has exactly one resonance frequency that can be calculated as
With the speed of sound cs in the gas filling the rigid container (mostly air), and the so called equivalent length Leq of the neck with end correction, which can be calculated as Leq=L+0.3 D, where L is the actual length of the neck and D is the hydraulic diameter of the neck.
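The resonance-frequency equation itself is not reproduced in this text. As a hedged illustration, the sketch below uses the standard textbook relation for a Helmholtz resonator, fH = cs/(2π)·sqrt(A/(V0·Leq)), together with the end correction Leq = L + 0.3 D given above. The neck cross-sectional area A is an assumed symbol, and a circular neck is assumed so that the hydraulic diameter equals the geometric diameter; the function name `helmholtz_frequency` is hypothetical.

```python
import math


def helmholtz_frequency(volume, neck_length, neck_diameter, speed_of_sound=343.0):
    """Resonance frequency in Hz of a Helmholtz resonator with a circular neck.

    volume in m^3, neck_length and neck_diameter in m, speed_of_sound in m/s.
    """
    l_eq = neck_length + 0.3 * neck_diameter        # equivalent neck length
    area = math.pi * (neck_diameter / 2.0) ** 2     # assumed circular cross section
    return speed_of_sound / (2.0 * math.pi) * math.sqrt(area / (volume * l_eq))


# Hypothetical dimensions: 1 litre cavity, 5 cm long neck, 2 cm neck diameter.
f_h = helmholtz_frequency(volume=1e-3, neck_length=0.05, neck_diameter=0.02)
```

Varying the cavity volume or the neck dimensions from segment to segment is one way to give each segment its own resonance frequency.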
B(φ,θ,ω)=B̃(φ,θ)·δ(ω−ωH)
With B̃(φ, θ) the angular part of B(φ, θ, ω) and ωH=2π·ƒH.
This makes the Helmholtz resonator a good candidate for the segments of the metamaterial device as here the transfer functions of the different segments can be chosen with different resonance frequencies making the basis functions orthogonal to each other (separated in frequency) and therefore the decomposition becomes trivial. The resulting measured transfer function will then be composed of a series of tones at the resonator frequencies and delays, where the maximum energy is coming from the corresponding direction.
By successively applying a series of band-pass filters, each of which preserves only the corresponding resonator frequency, the delay and direction can be directly estimated from each maximum.
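The band-pass approach can be sketched as follows, under simplifying assumptions: each segment contributes a tone burst at its own resonance frequency, an ideal FFT-domain band-pass isolates that frequency, and the delay is read off the maximum of the envelope. The direction follows from which segment resonates at the selected frequency. The function name `estimate_delays`, the Gaussian burst model and all numeric values are hypothetical.

```python
import numpy as np


def estimate_delays(signal, fs, resonator_freqs, half_bandwidth):
    """Return {resonator frequency: delay in seconds of the envelope maximum}."""
    n = len(signal)
    spectrum = np.fft.fft(signal)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    delays = {}
    for f_h in resonator_freqs:
        # Ideal band-pass around +f_h only; dropping the negative frequencies
        # gives a complex signal whose magnitude is the envelope of the band.
        mask = np.abs(freqs - f_h) <= half_bandwidth
        envelope = np.abs(np.fft.ifft(spectrum * mask))
        delays[f_h] = int(np.argmax(envelope)) / fs
    return delays


# Synthetic microphone signal: one Gaussian tone burst per segment.
fs = 16000
t = np.arange(4096) / fs
true_delays = {1000.0: 0.010, 2000.0: 0.025, 3000.0: 0.040}
mic = sum(np.exp(-0.5 * ((t - d) / 0.002) ** 2) * np.cos(2 * np.pi * f * (t - d))
          for f, d in true_delays.items())
estimated = estimate_delays(mic, fs, list(true_delays), half_bandwidth=200.0)
```

A real measurement would contain several maxima per band (direct sound and reflections), so a peak-picking step over the envelope would replace the single `argmax` used here.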
As shown in
The decomposition of the sum signal captured by the microphone can be done analogously to the decomposition in the case of compressive sensing (CS), i.e. the L1 norm of the decomposition is minimized. In contrast to the case of CS, the system of basis functions is essentially known, since it contains the transfer functions for all azimuth/elevation solid angles, or equivalently, the transfer functions of each of the segments of the acoustic metamaterial. This knowledge of the basis functions simplifies the decomposition process significantly. The process is simplified even further by the fact that it is possible to control the basis functions. Below, a choice for the basis functions is given that makes the decomposition very easy. An example of a procedure to decompose the microphone signal is as follows:
Suppose the transfer function of the segment of the acoustic metamaterial in the frequency domain, corresponding to the azimuth and elevation directions (φ,θ), is denoted B(φ,θ,ω) and its time-domain counterpart (inverse Fourier transform) is denoted b(φ,θ,t).
The signal s(φ,θ,t) arriving from direction (φ,θ) defined by the respective segment can be written as a delayed and attenuated version of the time-stretch pulse signal tsp(t) emitted by the speaker: s(φ,θ,t)=tsp(t−τ) c(φ,θ), where τ is the delay (caused by the time the sound wave takes to travel from the speaker to the microphone) and c(φ,θ) are angle-dependent amplitude coefficients describing the attenuation, which depends on the room wall reflections, and hence on the angles (φ,θ), and also on the distance traveled by the sound wave (hence, both factors depend, except for the direction of the direct path between speaker and microphone, on the geometry of the room). Note that in this example, only first order reflections are considered. Second order (multiple) reflections coming from the same direction could however be modeled as an additional term, with a different τ and a different c(φ,θ).
The arriving sound wave from direction (φ,θ) is spectrally modified by the metamaterial with its transfer function b(φ,θ,t), which can be expressed as a convolution product s(φ,θ,t)*b(φ,θ,t), or equivalently as (tsp(t−τ) c(φ,θ))*b(φ,θ,t) which is the same as
c(φ,θ)(tsp(t−τ)*b(φ,θ,t))
The total sound signal Ŝ arriving at the microphone (S2 of
In this equation, the angle-dependent amplitude coefficients c(φ, θ) and the delays τ(φ, θ) are the desired parameters that allow the shape of the room to be estimated, and Ŝ is an estimate of the measured microphone signal. The b(φ, θ, t) are the known parameters of the metamaterial for the respective angle. Note that τ(φ, θ) is expressed as a function of the angle. The summation over the room angle (φ, θ) is done over as many discrete angles as there are discrete segments in the acoustic metamaterial device, e.g. 36 values of φ and 18 values of θ if the angular resolution of the metamaterial is 10 degrees. In such a layout, the sound signal Ŝ captured by the microphone is seen as the sum of 36*18 or 648 constituents.
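The equation referred to as [1] is not reproduced in this text. Based on the per-direction term c(φ,θ)(tsp(t−τ)*b(φ,θ,t)) derived above and on the summation over all segments described here, it can plausibly be reconstructed as follows; this is a reconstruction under those assumptions, not a quotation:

```latex
\hat{S}(t) \;=\; \sum_{\varphi,\theta} c(\varphi,\theta)\,
  \Bigl( tsp\bigl(t-\tau(\varphi,\theta)\bigr) * b(\varphi,\theta,t) \Bigr)
  \qquad [1]
```

Here * denotes convolution in time, and the sum runs over all discrete segment directions (φ, θ).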
Equation [1] is solved for the angle-dependent amplitude coefficients c(φ,θ) and the respective delays τ(φ, θ) (S3 in
F=(S−Ŝ)²
the following error function is optimized
The second term is the L1 norm of the c coefficients. By proper choice of β, the order of the decomposition can be chosen. A value of 0.0 means a full order decomposition (648 constituents in the example above), and a large value results in only few coefficients that are nonzero.
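The error function referred to as [2] is likewise not reproduced in this text. From the description of its two terms (the squared error F between the measured signal S and the estimate Ŝ, and the L1 norm of the c coefficients weighted by β), a plausible reconstruction is given below; the symbol E is an assumption introduced for illustration:

```latex
E \;=\; \bigl(S-\hat{S}\bigr)^{2} \;+\; \beta \sum_{\varphi,\theta} \bigl|\,c(\varphi,\theta)\,\bigr|
  \qquad [2]
```

With β = 0 only the squared error is minimized, which corresponds to the full order decomposition; increasing β drives more coefficients to exactly zero.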
To solve the optimization problem [2], many methods known to the skilled person can be used, for example First Fit, Best Fit, or Hill-climbing. It is advantageous to do the computation in the frequency domain to avoid the convolution in [1]. The estimation of τ can be assisted by cross-correlating the measured signal with the direct sound colored by the respective b(φ,θ,t), or by explicitly sampling many possible values of τ in [1] (which is equivalent to increasing the number of coefficients that need to be estimated). Further, it is advantageous to analyze the low frequency, mid frequency and high frequency part of the tsp signal separately in order to gain robustness and obtain faster analysis speed by using the higher spatial aliasing frequency at lower frequencies.
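As a hedged illustration of such an optimization, the sketch below samples candidate delays explicitly, builds one basis vector per (segment, delay) pair, and minimizes a squared error plus β times the L1 norm of the coefficients. It uses the iterative shrinkage-thresholding algorithm (ISTA), a standard proximal-gradient method swapped in here for illustration; it is not one of the methods named above. The names `build_dictionary`, `decompose_l1` and `objective`, the random transfer functions and all numeric values are hypothetical, and the computation is done in the time domain for brevity.

```python
import numpy as np


def build_dictionary(tsp, transfer_functions, delays, n_samples):
    """Columns are tsp(t - tau) * b_k(t) for each segment k and delay tau (samples)."""
    atoms, index = [], []
    for k, b in enumerate(transfer_functions):
        colored = np.convolve(tsp, b)  # calibration signal colored by segment k
        for d in delays:
            atom = np.zeros(n_samples)
            length = min(len(colored), n_samples - d)
            atom[d:d + length] = colored[:length]
            atoms.append(atom)
            index.append((k, d))       # remember which column is which
    return np.column_stack(atoms), index


def decompose_l1(A, s, beta, n_iter=500):
    """Minimize ||s - A c||^2 + beta * ||c||_1 with ISTA."""
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = c - step * 2.0 * A.T @ (A @ c - s)      # gradient step on squared error
        c = np.sign(z) * np.maximum(np.abs(z) - step * beta, 0.0)  # soft threshold
    return c


def objective(A, s, c, beta):
    return np.sum((s - A @ c) ** 2) + beta * np.sum(np.abs(c))


# Synthetic example with 4 segments and 16 candidate delays (all values made up).
rng = np.random.default_rng(0)
tsp = rng.standard_normal(64)
transfer_functions = [rng.standard_normal(32) for _ in range(4)]
A, index = build_dictionary(tsp, transfer_functions, range(0, 128, 8), n_samples=256)
c_true = np.zeros(A.shape[1])
c_true[index.index((0, 16))] = 1.0   # reflection via segment 0, delay 16 samples
c_true[index.index((2, 48))] = 0.6   # reflection via segment 2, delay 48 samples
s = A @ c_true
c_est = decompose_l1(A, s, beta=0.01)
```

The nonzero entries of `c_est`, looked up through `index`, give the segment (and thus the direction) and the delay of each detected reflection. A larger `beta` yields fewer nonzero coefficients, at the cost of a larger residual.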
As a result of the decomposition, the angle-dependent amplitude coefficients c(φ, θ) and the delays τ(φ, θ) are obtained.
Based on the thus obtained angle-dependent amplitude coefficients c(φ, θ) and the corresponding delays τ(φ, θ), a room geometry analysis can be performed. The angle-dependent amplitude coefficients c(φ, θ) can be interpreted as the attenuation function of the room wall reflections. That is, directions and delays of mirror sound sources can be obtained from the angle-dependent amplitude coefficients c(φ, θ) and the corresponding delays τ(φ, θ). Each amplitude coefficient c(φ, θ) in the decomposition may be interpreted as corresponding to a mirror sound source in the direction (φ, θ) with a specific amplitude. Each angle-dependent amplitude coefficient provides a direction (φ, θ) of a mirror sound source, and the corresponding τ(φ, θ) provides a respective delay attributed to the mirror sound source. In this way, a direction and a delay τ are obtained for each of the reflections from the room walls. The order of the decomposition (i.e. into how many basis functions the acoustic microphone signal is decomposed) can be chosen a priori to a reasonable number (e.g. up to two reflections for each of the six wall directions), or it can be dynamically found (e.g. by requesting a given amount of residual after the decomposition, or by specifying a fixed value of β in the optimization of [2]).
From the directions and from the respective delays of the mirror sound sources, the locations of the mirror sound sources can be estimated (S4 in
Additional constraints, such as the suppression of spurious reflections from furniture, may be added to improve the robustness of the approach.
The transfer functions b(φ, θ, t) of the metamaterial segments can be measured in advance, for example in an anechoic chamber, for different impinging directions.
In
In
It should be mentioned that in the case of more than one wall there are reflections of higher order visible to the metamaterial device 410. As mentioned earlier, the algorithm weights the first order reflections most, making the approximation of having only first order reflections.
The same principle can also be used for more than four walls, for example six walls (including floor and ceiling) in three dimensions, or more in the case of a room geometry more complex than a cuboid. In general, each wall produces one first order reflection; therefore, the number of first order reflections can be used to count the number of walls in the room.
Note that the present technology can also be configured as follows:
[1] A device for room geometry analysis comprising:
[2] The device of [1], wherein the processor is configured to determine, for each delay (τ(φ, θ)) and the respective direction (φ, θ), the location of a mirror sound source (722-725) based on the delay (τ(φ, θ)) and the respective direction (φ, θ).
[3] The device of [1] or [2], wherein the processor is configured to determine a normal vector (n) of a wall and a wall point (Q) based on the location of the mirror sound sources (722-725) and the location of the loudspeaker (701).
[4] The device of any one of [1] to [3], wherein the processor is configured to determine the directions (φ, θ) from angle-dependent amplitude coefficients (c(φ, θ)) of the sound signal (Ŝ).
[5] The device of any one of [1] to [4], in which the calibration signal has a flat spectral density.
[6] The device of any one of [1] to [5], in which the calibration signal is a time-stretch pulse signal (tsp).
[7] The device of any one of [1] to [6], wherein the processor is configured to decompose the sound signal (Ŝ) based on compressive sensing (CS) techniques.
[8] The device of any one of [1] to [7], wherein the processor is configured to decompose the sound signal (Ŝ) based on the minimization of the L1 norm of the sound signal (Ŝ).
[9] The device of any one of [1] to [8], wherein the processor is configured to decompose the sound signal (Ŝ) based on the formula
wherein c(φ, θ) are angle-dependent amplitude coefficients that represent the attenuation function of the room wall reflections, tsp(t) is the calibration signal, τ(φ, θ) is the delay, b(φ, θ, t) is the transfer function in time-domain, * is a convolution operation and (φ, θ) are polar and azimuthal coordinates referring to the segments. The sum Σφ,θ is performed over all segments with coordinates (φ, θ).
[10] The device of [9], wherein the processor is configured to optimize the following error function:
[11] The device of any one of [1] to [10], wherein the processor is configured to decompose the sound signal (Ŝ) based on additional constraints.
[12] The device of any one of [1] to [11], in which at least one of the segments (301-304) is a Helmholtz resonator.
[13] The device of any one of [1] to [12], in which the segments (101-106) are arranged around the microphone (110) in a cylindrical, spherical, circular, cubic, cuboid or planar shape.
[14] A system comprising the device of [1] and a speaker (420), the speaker being configured to emit a calibration signal (tsp(t)).
[15] A method comprising:
[16] A program comprising instructions, which when executed on a processor:
[17] A tangible computer-readable medium storing instructions, which when executed on a processor:
Number | Date | Country | Kind |
---|---|---|---|
18164551.6 | Mar 2018 | EP | regional |