The present invention relates to audio processing and, particularly, to audio signal processing for rendering sound scenes comprising reflections modeled by image sources in the field of Geometrical Acoustics.
Geometrical Acoustics is applied in auralization, i.e., real-time and offline audio rendering of auditory scenes and environments [1, 2]. This includes Virtual Reality (VR) and Augmented Reality (AR) systems like the MPEG-I 6-DoF audio renderer. For rendering complex audio scenes with six degrees of freedom (DoF), the propagation of sound is modeled with methods known from optics, such as ray-tracing. Particularly, reflections at walls are modeled as in optics: a ray that is reflected at a wall leaves the wall at a reflection angle equal to its angle of incidence.
Real-time auralization systems, like the audio renderer in a Virtual Reality (VR) or Augmented Reality (AR) system, usually render early specular reflections based on geometry data of the reflective environment [1, 2]. A Geometrical Acoustics method like ray-tracing [3] or the image source method [4] is then used to find valid propagation paths of the reflected sound. These methods are valid if the reflecting planar surfaces are large compared to the wavelength of the incident sound [1]. Furthermore, the distance of the reflection point on the surface to the boundaries of the reflecting surface also has to be large compared to the wavelength of the incident sound.
If the geometry data approximates curved surfaces by triangles or rectangles, the classic Geometrical Acoustics methods are no longer valid and artifacts become audible. The resulting “disco ball effect” is illustrated in
If a classic image source model is used, there is usually no mitigation technique applied for the given problem [5]. If diffuse reflections are modeled in addition to specular reflections, this will further reduce the effect, but cannot solve it. Summarizing, no solution for this problem is described in the state-of-the-art.
It is an object of the present invention to provide a concept for mitigating the disco ball effect in Geometrical Acoustics or to provide a concept of rendering a sound scene that provides an improved audio quality.
According to an embodiment, an apparatus for rendering a sound scene having reflection objects and a sound source at a sound source position may have: a geometry data provider for providing an analysis of the reflection objects of the sound scene to determine a reflection object represented by a first polygon and a second adjacent polygon having associated a first image source position for the first polygon and a second image source position for the second polygon, wherein the first and second image source positions result in a sequence having a first visible zone related to the first image source position, an invisible zone and a second visible zone related to the second image source position; an image source position generator for generating an additional image source position such that the additional image source position is placed between the first image source position and the second image source position; and a sound renderer for rendering the sound source at the sound source position and, additionally for rendering the sound source at the first image source position, when a listener position is located within the first visible zone, for rendering the sound source at the additional image source position, when the listener position is located within the invisible zone, or for rendering the sound source at the second image source position, when the listener position is located within the second visible zone.
According to an embodiment, a method of rendering a sound scene having reflection objects and a sound source at a sound source position may have the steps of: providing an analysis of the reflection objects of the sound scene to determine a reflection object represented by a first polygon and a second adjacent polygon having associated a first image source position for the first polygon and a second image source position for the second polygon, wherein the first and second image source positions result in a sequence having a first visible zone related to the first image source position, an invisible zone and a second visible zone related to the second image source position; generating an additional image source position such that the additional image source position is placed between the first image source position and the second image source position; and rendering the sound source at the sound source position and, additionally rendering the sound source at the first image source position, when a listener position is located within the first visible zone, rendering the sound source at the additional image source position, when the listener position is located within the invisible zone, or rendering the sound source at the second image source position, when the listener position is located within the second visible zone.
Another embodiment may have a non-transitory digital storage medium having a computer program stored thereon to perform the method of rendering a sound scene having reflection objects and a sound source at a sound source position, the method having the steps of: providing an analysis of the reflection objects of the sound scene to determine a reflection object represented by a first polygon and a second adjacent polygon having associated a first image source position for the first polygon and a second image source position for the second polygon, wherein the first and second image source positions result in a sequence having a first visible zone related to the first image source position, an invisible zone and a second visible zone related to the second image source position; generating an additional image source position such that the additional image source position is placed between the first image source position and the second image source position; and rendering the sound source at the sound source position and, additionally rendering the sound source at the first image source position, when a listener position is located within the first visible zone, rendering the sound source at the additional image source position, when the listener position is located within the invisible zone, or rendering the sound source at the second image source position, when the listener position is located within the second visible zone, when said computer program is run by a computer.
The present invention is based on the finding that the problems associated with the so-called disco ball effect in Geometrical Acoustics can be addressed by performing an analysis of reflecting geometric objects in a sound scene in order to determine whether a reflecting geometric object results in visible zones and invisible zones. For an invisible zone, an image source position generator generates an additional image source position so that the additional image source position is placed between the two image source positions associated with the neighboring visible zones. Furthermore, a sound renderer is configured to render the sound source at the sound source position in order to obtain an audio impression of the direct path, and to additionally render the sound source at an image source position or at an additional image source position, depending on whether the listener position is located within a visible zone or an invisible zone. By this procedure, the disco ball effect in Geometrical Acoustics is mitigated. This procedure can be applied in auralization such as real-time and offline audio rendering of auditory scenes and environments.
In embodiments, the present invention provides several components, where one component comprises a geometry data provider or a geometry pre-processor which detects curved surfaces such as “round edges” or “round corners”. Furthermore, the embodiments refer to the image source position generator that applies an extended image source model for the identified curved surfaces, i.e., the “round edges” or “round corners”.
Particularly, an edge is a boundary line of a surface, and a corner is the point where two or more converging lines meet. A round edge is a boundary line between two flat surfaces that approximate a rounded continuous surface by means of triangles or polygons. A round corner or rounded corner is a point that is a common vertex of several flat surfaces that approximate a rounded continuous surface by means of triangles or polygons. Particularly, when a Virtual Reality scene, for example, comprises an advertising pillar or advertising column, this advertising pillar or advertising column can be approximated by polygon-shaped planes such as triangles or other polygon-shaped planes, and due to the fact that the polygon planes are not infinitesimally small, invisible zones between visible zones can occur.
Typically, there will exist intentional edges or corners, i.e., objects in the audio scene that are to be acoustically represented as they are, and any effects that occur due to the acoustical processing are intended. However, rounded or round corners or edges are geometric objects in the audio scene that result in the disco ball artefact or, stated in other words, that result in invisible zones. These invisible zones degrade the audio quality when a listener moves with respect to a fixed source from a visible zone into an invisible zone, or when a fixed listener listens to a moving source whose movement brings the listener into an invisible zone, then a visible zone and then an invisible zone. Alternatively, when both the listener and the source move, the listener can be within a visible zone at one point in time and within an invisible zone at another point in time. Such an invisible zone is only due to the applied Geometrical Acoustics model and has nothing to do with the real-world acoustical scene that is to be approximated as far as possible by the apparatus for rendering the sound scene or the corresponding method.
The present invention is advantageous since it generates high quality audio reflections on spheres and cylinders or other curved surfaces. The extended image source model is particularly useful for primitives such as polygons approximating cylinders, spheres or other curved surfaces. Above all, the present invention results in a quickly converging iterative algorithm for computing first order reflections, relying in particular on the image source tools for modeling reflections. Advantageously, a particular frequency-selective equalizer is applied in addition to a material equalizer that accounts for the frequency-selective reflection characteristic, which typically is a high-pass filter that depends, for example, on a reflector diameter. Furthermore, the distance attenuation, the propagation time and the frequency-selective wall absorption or wall reflection are taken into account in embodiments. Advantageously, the inventive generation of an additional image source position “enlightens” the dark or invisible zones. An additional reflection model for rounded edges and corners relies on this generation of additional image sources in addition to the classical image sources associated with the polygonal planes. Advantageously, a continuous extrapolation of image sources into the “dark” or invisible zones is performed using the technology of frustum tracing for the purpose of calculating first order reflections. In other embodiments, the technology can also be extended to second or higher order reflection processing. However, applying the present invention to the calculation of first order reflections already results in high audio quality, and it has been found that higher order reflection calculation, although possible, will not always justify the additional processing requirements in view of the additionally gained audio quality.
The present invention provides a robust, relatively easy to implement but nevertheless powerful tool for modeling reflections in complex sound scenes having problematic or specific reflection objects that would suffer from invisible zones without the application of the present invention.
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
The image source position generator relies on the source position and the listener position and, particularly due to the fact that the listener position will change in runtime, the image source position generator will operate in runtime. The same is true for the sound renderer 30 that additionally operates in runtime using the sound source data, the listener position and additionally using the image source positions and the additional image source positions if required, i.e., if the user is placed in an invisible zone that has to be “enlightened” by an additional image source determined by the image source position generator in accordance with the present invention.
Advantageously, the geometry data provider 10 is configured for providing an analysis of the reflection object of the sound scene to determine a specific reflection object that is represented by a first polygon and a second adjacent polygon. The first polygon has associated a first image source position and the second polygon has associated a second image source position, where these image source positions are constructed, for example, as illustrated in
The sound renderer 30 is configured for rendering the sound source at the sound source position in order to obtain the direct sound at the listener position. Additionally, in order to also render a reflection, the sound source is rendered at the first image source position, when the listener position is located within the first visible zone. In this situation, the image source position generator does not need to generate an additional image source position, since the listener position is such that any artefacts due to the disco ball effect do not occur at all. The same is true when the listener position is located within the second visible zone associated with the second image source. However, when the listener is located within the invisible zone, then the sound renderer uses the additional image source position and does not use the first image source position and the second image source position. Instead of the “classical” image sources modeling the reflections at the first and the second adjacent polygons, the sound renderer only renders, for the purpose of reflection rendering, the additional image source position generated in accordance with the present invention in order to fill up or enlighten the invisible zone with sound. Any artefacts that would otherwise result in a permanently switching localization, timbre and loudness are avoided by means of the inventive processing using the image source position generator generating the additional image source between the first and the second image source position.
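The zone-dependent choice of the reflection source described above can be sketched as follows. This is only an illustration under stated assumptions: the `Zone` enum, the function name and the `make_additional` callback are hypothetical names that do not appear in the description, and the zone classification itself is assumed to be done elsewhere. The callback reflects the statement that the additional image source position need not be generated while the listener is in a visible zone.

```python
from enum import Enum


class Zone(Enum):
    """Hypothetical labels for the three zones of one round edge."""
    FIRST_VISIBLE = 1
    INVISIBLE = 2
    SECOND_VISIBLE = 3


def select_reflection_source(zone, first_image, second_image, make_additional):
    """Return the single image source position used for the reflection.

    make_additional is a zero-argument callable that computes the additional
    image source position; it is only called in the invisible zone.
    """
    if zone is Zone.FIRST_VISIBLE:
        return first_image
    if zone is Zone.SECOND_VISIBLE:
        return second_image
    # Invisible zone: neither classical image source is rendered.
    return make_additional()
```

The direct sound is rendered in all three cases; only the reflection source changes with the zone.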
Furthermore,
In
Furthermore,
Furthermore, a wall absorption/reflection behavior is modeled by means of the wall absorption or reflection coefficient α. Advantageously, the coefficient α is dependent on the frequency, i.e., represents a frequency-selective absorption or reflection curve Hw(k), and typically has a high-pass characteristic, i.e., high frequencies are better reflected than low frequencies. This behavior is accounted for in embodiments. The strength of the image source approach is that, once the image source has been constructed and described with respect to the propagation time, the distance attenuation and the wall absorption, the wall 140 is completely removed from the sound scene and is only modeled by the image source 120.
For the additional image source position 90, the same path length, propagation time, distance attenuation and wall absorption are used for the purpose of rendering the first order reflection in the invisible zone 80. In an embodiment, a reflection point 92 is determined. The reflection point 92 is at the junction between the first polygon and the second polygon when viewed from above and, for example in the case of the advertising pillar, is typically at a vertical position that is determined by the height of the listener 130 and the height of the source 100. Advantageously, the additional image source position 90 is placed on a line connecting the listener 130 and the reflection point 92, where this line is indicated at 93. Furthermore, the exact position of the additional sound source 90 in the embodiment is at the intersection point of the line 93 and the connecting line 91, which connects the image source positions 62 and 63 that have visible zones adjacent to the invisible zone 80.
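As an illustration of the classical image source construction referred to above, the following sketch mirrors a source at a wall plane and derives the path length, propagation time and distance attenuation for a listener. The function names, the speed of sound of 343 m/s and the 1/r attenuation law are assumptions made for the example; the frequency-selective wall filter is deliberately left out.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def mirror_point(point, plane_normal, plane_offset):
    """Mirror a point at the plane n . x - d = 0 (n need not be unit length)."""
    norm_sq = sum(c * c for c in plane_normal)
    dist = (sum(n * p for n, p in zip(plane_normal, point)) - plane_offset) / norm_sq
    return tuple(p - 2.0 * dist * n for p, n in zip(point, plane_normal))


def reflection_parameters(source, listener, plane_normal, plane_offset):
    """Return image source position, path length, delay and gain."""
    image = mirror_point(source, plane_normal, plane_offset)
    path_length = math.dist(image, listener)   # source -> wall -> listener
    delay = path_length / SPEED_OF_SOUND       # propagation time in seconds
    gain = 1.0 / max(path_length, 1e-9)        # assumed 1/r distance law
    return image, path_length, delay, gain
```

Once these values are known, the wall itself is no longer needed for rendering this reflection; only the image source and its parameters are used.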
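The placement of the additional image source at the intersection of line 93 and connecting line 91 can be sketched in the top view (2D) as a plain line-line intersection. The function names and the parallel-line tolerance are assumptions made for this example and are not taken from the description.

```python
def line_intersection_2d(p1, p2, q1, q2, eps=1e-12):
    """Intersect the infinite lines p1-p2 and q1-q2; return None if parallel."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    sx, sy = q2[0] - q1[0], q2[1] - q1[1]
    denom = rx * sy - ry * sx          # 2D cross product of the directions
    if abs(denom) < eps:
        return None
    # Solve p1 + t * r = q1 + u * s for t.
    t = ((q1[0] - p1[0]) * sy - (q1[1] - p1[1]) * sx) / denom
    return (p1[0] + t * rx, p1[1] + t * ry)


def additional_image_source(listener, reflection_point, image_a, image_b):
    """Intersection of the listener-reflection-point line (93) with the line
    connecting the two neighboring image sources (91), in top view."""
    return line_intersection_2d(listener, reflection_point, image_a, image_b)
```

For example, with a listener at (0, 4), a reflection point at (0, 1) and neighboring image sources at (-1, 0) and (1, 0), the additional image source lies at the origin, i.e., between the two classical image sources.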
However, the
Furthermore, although it is advantageous to exactly calculate the propagation time depending on the exact path length, other embodiments rely on an estimation of the path length based on a modified path length of image source position 63, or a modified path length of the other adjacent image source position 62. Furthermore, with respect to the wall absorption or wall reflection modeling, for the purpose of rendering the additional sound source position 90, either the wall absorption of one of the adjacent polygons can be used, or, if the absorption coefficients differ from each other, an average value of both can be used. Even a weighted average can be applied depending on which visible zone the listener is closer to, so that the absorption/reflection data of the wall whose visible zone is closer to the listener receives a higher weighting value in a weighted addition than the absorption/reflection data of the other adjacent wall, whose visible zone is further away from the listener position.
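One possible form of the weighted average mentioned above is sketched below. The description only requires that the wall whose visible zone is closer to the listener receives the higher weight; the linear inverse-distance weighting, the function name and the scalar (single-band) coefficients are assumptions made for this example.

```python
def blended_reflection_coefficient(coeff_a, coeff_b, dist_to_zone_a, dist_to_zone_b):
    """Blend two wall coefficients; the nearer visible zone gets more weight.

    dist_to_zone_a / dist_to_zone_b are non-negative distances from the
    listener to the visible zones of wall A and wall B.
    """
    total = dist_to_zone_a + dist_to_zone_b
    if total <= 0.0:
        # Degenerate case: fall back to the plain average.
        return 0.5 * (coeff_a + coeff_b)
    weight_a = dist_to_zone_b / total  # small distance to A -> large weight for A
    return weight_a * coeff_a + (1.0 - weight_a) * coeff_b
```

For frequency-selective data, the same blending could be applied per frequency band.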
Alternatively, when step 21 determines that the user is placed within the invisible zone 80, the additional image source position 90 of
Subsequently, further procedures are given in order to illustrate a further procedure of calculating the additional image source position. The extended image source model needs to extrapolate the image source position in the “dark zone” of the reflectors, i.e. the areas between the “bright zones” in which the image source is visible (see
$$\vec{N}_k \cdot \vec{X} - d_k = 0. \qquad (1)$$

If the distance

$$l_k = \vec{N}_k \cdot \vec{L} - d_k \qquad (2)$$

is greater than or equal to zero for all 4 planes, then the listener is located within the frustum that defines the coverage area of the model for the given round edge. The invisible zone frustum is illustrated in
In this case, one can determine the reflection point on the round edge as follows:
Let $\vec{P}_S$ be the orthogonal projection of the source position $\vec{S}$ onto the edge and $\vec{P}_L$ be the orthogonal projection of the listener position $\vec{L}$ onto the edge. This yields the reflection point $\vec{R}$ as follows:

$$d_S = |\vec{P}_S - \vec{S}| \qquad (3)$$

$$d_L = |\vec{P}_L - \vec{L}| \qquad (4)$$
The construction of the reflection point is illustrated in
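The text above defines the projections and the distances of equations (3) and (4), but the closing expression for the reflection point is not reproduced. A minimal sketch is given below under the assumption that the reflection point divides the segment between the two projections in the ratio of the two distances, i.e., $\vec{R} = \vec{P}_S + \frac{d_S}{d_S + d_L}(\vec{P}_L - \vec{P}_S)$, which is the shortest-path (equal angle) point on the edge. This expression, the function names and the representation of the edge by a point and a direction are assumptions made for the example.

```python
import math


def project_onto_line(point, line_point, line_dir):
    """Orthogonal projection of a point onto the line through line_point
    with direction line_dir (line_dir need not be unit length)."""
    dir_sq = sum(c * c for c in line_dir)
    t = sum((p - a) * c for p, a, c in zip(point, line_point, line_dir)) / dir_sq
    return tuple(a + t * c for a, c in zip(line_point, line_dir))


def round_edge_reflection_point(source, listener, edge_point, edge_dir):
    """Reflection point on a round edge (assumed ratio construction)."""
    p_s = project_onto_line(source, edge_point, edge_dir)
    p_l = project_onto_line(listener, edge_point, edge_dir)
    d_s = math.dist(p_s, source)     # equation (3)
    d_l = math.dist(p_l, listener)   # equation (4)
    if d_s + d_l == 0.0:
        return p_s                   # source and listener both lie on the edge
    w = d_s / (d_s + d_l)
    return tuple(a + w * (b - a) for a, b in zip(p_s, p_l))
```

With an edge along the z-axis, a source at (1, 0, 0) and a listener at (0, 3, 4), the distances are 1 and 3, so the sketch places the reflection point one quarter of the way from the source projection to the listener projection, at (0, 0, 1).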
The computation of the coverage area of the round corners is very similar. Here, the k adjacent planes yield k image sources which, together with the corner position, result in a frustum that is bounded by k planes. Again, if the distances of the listener to these planes are all greater than or equal to zero, the listener is located within the coverage area of the round corner. The reflection point $\vec{R}$ is given by the corner point itself.
This situation, i.e., the invisible zone frustum of a round corner, is illustrated in
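The coverage test is the same for round edges (4 planes) and round corners (k planes): the listener is inside the frustum if the signed distance of equation (2) is non-negative for every bounding plane. A minimal sketch follows, assuming each plane is given as a pair of normal and offset with the normal pointing into the frustum; the function names are hypothetical.

```python
def signed_distance(normal, offset, point):
    """Signed distance l = N . L - d of a point to the plane N . X - d = 0
    (a true distance only if N has unit length)."""
    return sum(n * p for n, p in zip(normal, point)) - offset


def in_coverage_frustum(planes, listener):
    """True if the listener is on the inner side of all bounding planes.

    planes is an iterable of (normal, offset) pairs; normals are assumed
    to point into the frustum.
    """
    return all(signed_distance(n, d, listener) >= 0.0 for n, d in planes)
```

Because only the sign of each distance matters, the normals do not have to be normalized for this test.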
For higher-order reflections, one can extend this method according to the frustum-tracing method where one splits up each frustum into sub-frustums whenever one hits a surface, round edge, or round corner.
The geometry data provider may apply a curved surface detection. The geometry data provider, also termed the geometry pre-processor, performs the specific reflection object determination in advance, in an initialization procedure, or at runtime. If, for example, a CAD software is used to export the geometry data, as much information about curvatures as possible is advantageously used by the geometry data provider. For example, if surfaces are constructed from round geometry primitives like spheres or cylinders or from spline interpolations, the geometry pre-processor/geometry data provider is advantageously implemented within the export routine of the CAD software and detects and uses the information from the CAD software.
If no a priori knowledge about the surface curvature is available, the geometry preprocessor or data provider needs to implement a round edge and round corner detector by using only the triangle or polygon mesh. For example, this can be done by computing the angle Φ between two adjacent triangles 1, 2 or 1a, 2a as illustrated in
Furthermore, depending on the output format required by the sound renderer 30, i.e., depending on whether the sound renderer outputs via headphones, via loudspeakers or just for storage or transmission in a certain format, a certain number of output adders such as a left adder 34, a right adder 35 and a center adder 36 and probably other adders for left surround output channels, or for right surround output channels, etc. are provided. While the left and the right adders 34 and 35 are advantageously used for the purpose of headphone reproduction for virtual reality applications, for example, any other adders for the purpose of loudspeaker output in a certain output format can also be used. When, for example, an output via headphones is required, then the direct sound filter stage 31 applies head related transfer functions depending on the sound source position 100 and the listener position 130. For the purpose of the first order reflection filter stage, corresponding head related transfer functions are applied, but now for the listener position 130 on the one hand and the additional sound source position 90 on the other hand. Furthermore, any specific propagation delays, path attenuations or reflection effects are also included within the head related transfer functions in the first order reflection filter stage 32. For the purpose of higher order reflection filter stages, other additional sound sources are applied as well.
If the output is intended for a loudspeaker setup, then the direct sound filter stage will apply filters other than head related transfer functions, such as filters that perform vector based amplitude panning, for example. In any case, each of the direct sound filter stage 31, the first order reflection filter stage 32 and the second order reflection filter stage 33 calculates a component for each of the adder stages 34, 35, 36 as illustrated, and the left adder 34 then calculates the output signal for the left headphone speaker and the right adder 35 calculates the headphone signal for the right headphone speaker, and so on. In case of an output format that is different from a headphone, the left adder 34 may deliver the output signal for the left speaker and the right adder 35 may deliver the output for the right speaker. If only two speakers are present in a two-speaker environment, then the center adder 36 is not required.
The inventive method avoids the disco-ball effect that occurs when a curved surface, approximated by a discrete triangle mesh, is auralized using the classical image sound source technique [3, 4]. The novel technique avoids invisible zones, making the reflection audible. For this procedure, approximations of curved surfaces have to be identified by a threshold on the face angle. The novel technique is an extension of the original model, with special treatment of faces identified as representing a curvature.
Classical image sound source techniques [3, 4] do not consider that the given geometry can (partially) approximate a curved surface. This causes dark zones (silence) to be cast away from edge points of adjacent faces (see
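The identification by a face angle threshold can be sketched for a pair of adjacent triangles as follows. This is an assumption-laden illustration: the angle is measured between the face normals, the triangles are assumed to have consistent winding and non-zero area, and the default threshold of 30 degrees is a hypothetical value that is not taken from the description.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def unit_normal(triangle):
    """Unit normal of a triangle given as three 3D vertices."""
    a, b, c = triangle
    n = _cross(_sub(b, a), _sub(c, a))
    length = math.sqrt(sum(x * x for x in n))
    return tuple(x / length for x in n)


def face_angle_deg(tri1, tri2):
    """Angle in degrees between the normals of two triangles."""
    n1, n2 = unit_normal(tri1), unit_normal(tri2)
    cos_phi = max(-1.0, min(1.0, sum(x * y for x, y in zip(n1, n2))))
    return math.degrees(math.acos(cos_phi))


def is_round_edge(tri1, tri2, max_angle_deg=30.0):
    """Treat a shared edge as round if the faces are slightly bent.

    Coplanar faces (angle of about 0) are not an edge at all; faces bent
    by more than the threshold are treated as an intentional edge.
    """
    angle = face_angle_deg(tri1, tri2)
    return 1e-6 < angle <= max_angle_deg
```

A round corner detector could apply the same test to all pairs of faces that share the candidate vertex.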
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
20163151.2 | Mar 2020 | EP | regional |
This application is a continuation of copending International Application No. PCT/EP2021/056362, filed Mar. 12, 2021, which is incorporated herein by reference in its entirety, and additionally claims priority from European Application No. EP 20 163 151.2, filed Mar. 13, 2020, which is incorporated herein by reference in its entirety.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/EP2021/056362 | Mar 2021 | US |
Child | 17940876 | | US |