The present application is concerned with early reflection processing concepts for auralization.
A room impulse response (RIR) describes the relationship between a sound source in an acoustic environment (a room) and the receiver (i.e. the listener). It specifies the room's response to a unit impulse in time domain and corresponds to the room transfer function in frequency domain. It consists of the direct sound path, the early reflections (ERs) and the diffuse late reverberation.
In binaural (or loudspeaker) rendering for virtual and augmented reality (VR/AR) applications, the room impulse response from a particular source and listener location may change considerably. In 6-Degrees-of-Freedom (6DOF) VR/AR applications, the listener can usually move freely within the entire scene, resulting in a permanently changing room impulse response. Consequently, a tremendous amount of computation has to be spent to determine each reflection from the source to the listener, taking into consideration the geometry of walls, occluding objects and other effects to compute a physically accurate reflection pattern.
It is the observation of this invention that the exact acoustic reproduction of the early reflection (ER) pattern in a room is not required to make a perceptually convincing rendering and that this can be done in a way that largely abstracts from the exact geometric details of the room. In this way, a lot of computation can be saved. In case the reflection pattern has to be transmitted from an encoder to a renderer, a considerable part of the side information associated with efficiently computing reflections depending on the listener position can be saved as compared to the state of the art in regular geometry-based rendering.
The document [1] concerns the replacement of exactly calculated “real” ERs by a more general Simple ER pattern. The idea was to find, describe and simulate the perceptually orthogonal parameters describing small or large sound sources (e.g. an orchestra) on a stage of a large room (e.g. a concert hall), [2, 3], and to play them back over a loudspeaker setup (e.g. stereo) or binaurally over headphones. A composer or sound engineer was able to use these parameters (like source presence, source warmth, source brilliance, room presence, running reverberation, envelopment and reverberance) to set up a scene. The SPAT software has been used for a long time for this kind of production, [4]. The approach was also adopted in the ISO MPEG-4 standardization [5].
In a dynamic 6DOF environment, the acoustic description of rooms (dimensions, RT60, . . . ) can vary considerably. The source and receiver positions are fully free and are calculated in real time for auralization. Perceptual parameters, which depend strongly on these changing physical setups, cannot be defined as constants and are therefore not appropriate for this task.
The invention takes the new approach of using just a few basic physical parameters of the environment to select and adjust a simple basic ER pattern. This has the following advantages: No specific sound engineering background is needed to define the parameters; they come directly from the physical model. The Simple ER pattern used adapts to different room sizes and different RT60 values. Simple ER patterns are defined even for outdoor environments, which was not the case in SPAT. The perceptual degradation of this approach relative to a full physically correct simulation is limited because the human auditory system is not able to analyze the fine structure of the early reflections, e.g. [6].
In the following newly invented Simple ER patterns, room acoustic parameters are used, such as RT60, predelay time, room volume or room dimensions, and the frequency dependency of RT60. The ER pattern is specifically defined to produce a smooth transition between the direct sound and the late reverb. It should be frequency neutral and independent of the proximity of the source and receiver to walls and openings.
The idea is to produce a plausible and convincing perception for the listener that fits the overall room acoustical parameters. This is enough in most cases, because the listener has no possibility of a direct comparison with the “real” physically exact ER.
The computationally expensive exact geometrical calculation of ERs, especially with visibility checks, can be avoided, particularly in applications like real-time auditory virtual environments and augmented reality. The exact calculation of “real” ERs is also sometimes difficult and prone to producing artifacts through appearing and disappearing ERs, depending on the exact (and time-varying) location of the source and the listener. This can be avoided by using a constant ER pattern, which is computed once when entering the scene or when moving from one acoustic environment to another environment defined by different acoustic parameters.
The invention takes advantage of an encoder-bitstream-renderer scenario. In one case (a), a default Simple ER pattern can be calculated with the room acoustical parameters available in the renderer alone. These parameters are adjusted in real-time by the source-listener distance and the azimuth angle between them. In case (b), the geometry of the scene is pre-analyzed in a more advanced way in the encoder. Then the Simple ER pattern of few ERs is pre-calculated in the encoder and transmitted to the renderer in a bitstream. There it is adjusted in the same way as in case (a) by the listener distance and angle (or other information that is available at the time of rendering). These two cases give the full flexibility for an open future-proof approach, in which further analysis knowledge can be incorporated later into the encoder.
A room impulse response (RIR) describes the relationship between a sound source in an acoustic environment (a room) and the receiver (the listener) and specifies the room's response to a unit impulse, see e.g.
Especially in complex physical environments/rooms, defined by many surfaces, the calculation of the geometrically correct ERs with the needed visibility checks (“is this source in direct line-of-sight to the listener?”) is very time consuming. On the other hand, it is known that human auditory perception suppresses a lot of details about the ERs with regard to the direct sound (law of the first wave front, precedence effect, scene analysis, [8, 9]) and that therefore a precise modeling of the ER part of the impulse response is in many cases not necessary to achieve a convincing rendering quality, e.g. [6]. The auditory system uses the ERs to determine or refine several perceptual attributes. Among them are:
There are several approaches known to simplify ER calculation. The first one is just to avoid the calculation of the ER completely, i.e. render sound without simulated ER, i.e. render only direct sound and late reverb, see
The next possibility is to calculate only geometrically exact 1st order reflections, see
The next possibility is to use just two ERs side by side with the direct sound, see
In the next pattern the two side reflections are replaced by 4 reflections to each side of the direct sound and four fixed source position independent reflection sequences at [±45° and ±135°], each consisting of 4 reflections, see
The previously described approach is designed such that the input parameters, which define the ER pattern, are perceptual parameters. They should describe the listener's perception caused by the ERs. The shortcoming is that it adapts only vaguely to room related parameters. Sound engineering knowledge and experience are needed to set the perceptually defined parameters, like source presence, source warmth, source brilliance, room presence, running reverberation, envelopment and reverberance. This is a clear disadvantage for designers who define the physical properties of a real-time VR/AR system and have no perceptual sound engineering experience. Especially for VR applications, the geometry of the virtual physical space is often known quite well as a by-product of the visualization process. Also, no ER pattern for outdoor environments is known for the SPAT algorithm.
The object of the invention is to avoid the shortcomings of the state of the art by explicitly using room acoustical and physical parameters to define the ER pattern. Furthermore, different patterns are defined depending on the room properties, and are even suitable for outdoor environments (where a precise description of the geometry is difficult). The patterns have different numbers of ERs dependent on room size or other physical parameters.
The new ER patterns feature
This is achieved by using parameterizable but fixed spatial ER patterns that do not depend on the exact geometry of the room. In an embodiment of the invention, the pattern also does not depend on the listener position in the room. Instead, only one (or a few) global characteristic parameters are used to configure the ER pattern. In this way, the pattern can be rendered extremely efficiently.
In the following newly invented ER patterns, room acoustic parameters such as RT60, predelay time, room dimensions or room volume, and the frequency dependency of RT60 are used for pattern configuration. The ER pattern is defined so as to produce a (temporally) smooth transition between the direct sound and the late reverb. It should be of neutral timbre. It depends on room volume and surface. It does not depend on the position of the source and receiver in the room.
It is the objective of the invention to produce a plausible and convincing perception by the listener, fitting to the overall room acoustical parameters. This is sufficient for most use cases, especially since the listener has no possibility for a direct comparison with a rendering of the “real” physically correct ER.
One embodiment relates to an apparatus for sound rendering, configured to receive information on a listener position and a sound source position; render an audio signal of the sound source using a room impulse response whose early reflection portion is exclusively determined by an early reflection pattern, which is indicative of a constellation of early reflection positions, and which is positioned at the listener position in a manner so that the early reflection positions are located around the listener position and at angular directions from the listener position which are invariant with respect to changes in a listener head orientation.
Another embodiment relates to a bitstream for being subject to inventive sound rendition.
Another embodiment relates to a digital storage medium storing an inventive bitstream for being subject to sound rendition.
According to another embodiment, a method for sound rendering may have the steps of: receiving information on a listener position and a sound source position; rendering an audio signal of the sound source using a room impulse response whose early reflection portion is exclusively determined by an early reflection pattern, which is indicative of a constellation of early reflection positions, and which is positioned at the listener position in a manner so that the early reflection positions are located around the listener position and at angular directions from the listener position which are invariant with respect to changes in a listener head orientation.
Another embodiment relates to a non-transitory digital storage medium having a computer program stored thereon to perform the inventive method when said computer program is run by a computer.
In accordance with a first aspect of the present invention, the inventors of the present application realized that one problem encountered when trying to use early reflection (ER) rendering of audio signals stems from the fact that the early reflections depend on a relationship between a source position and a listener position. The inventors found that it is possible to consider a source position independent ER pattern without, e.g., a floor reflection, so that ER rendering gets easier while the rendering result is still good. The early reflection portion of the room impulse response used for the rendering is exclusively determined by an early reflection pattern. A spatial relationship between a sound source and the listener is not considered for the early reflection portion of the room impulse response. Further, the early reflection positions in the early reflection pattern are invariant with respect to changes in a listener head orientation. This is based on the finding that the same ER pattern can be used for determining the early reflection portion of the room impulse response independently of whether the listener looks toward the sound source or in any other direction.
Accordingly, in accordance with a first aspect of the present application, an apparatus for sound rendering is configured to receive information on a listener position and a sound source position. The apparatus is configured to render an audio signal of the sound source using a room impulse response whose early reflection portion is exclusively determined by an early reflection pattern. The early reflection pattern is indicative of a constellation of early reflection positions, where “constellation” denotes, e.g., a set of positions together with their mutual placement in terms of the angles between the lines connecting the positions; a synonymous term is “pattern”. The early reflection pattern is positioned at the listener position in a manner so that the early reflection positions are located around the listener position and at angular directions from the listener position which are invariant with respect to changes in a listener head orientation, i.e. the constellation is translatorily placed at the listener position.
In accordance with a second aspect of the present invention, the inventors of the present application realized that one problem encountered when trying to use early reflection (ER) rendering of audio signals stems from the fact that the early reflection patterns for outdoor environments are highly individual and dependent on the physical setup of the scene. The inventors found that an ER pattern generated using a moderate analysis of an environment can result in an acoustically convincing, but computationally moderate ER rendering result.
Accordingly, in accordance with a second aspect of the present application, an apparatus for determining an early reflection pattern for sound rendition is configured to perform a geometric analysis of an acoustic environment by, at each of one or more analysis positions, determining a function indicative, for each of different distances from the respective analysis position, of a value representative of an early reflection contribution; and by inspecting the function or a further function derived therefrom with respect to one or more maxima to derive one or more control parameters. Additionally, the apparatus is configured to determine an early reflection pattern, which is indicative of a constellation of early reflection positions, by placing the early reflection positions using the one or more control parameters.
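The analysis of the second aspect can be illustrated by the following minimal sketch. It is not the inventors' implementation: the helper names, the use of sampled surface points, the histogram binning and the inverse-distance weighting are all assumptions made for illustration. The sketch builds a distance-dependent function at one analysis position and reads control parameters (here: distances) off its local maxima.

```python
import math

def reflection_distance_function(analysis_pos, surface_points,
                                 bin_width=1.0, max_dist=50.0):
    """Histogram over distance from the analysis position (hypothetical helper).

    Each bin accumulates a value standing in for the early reflection
    contribution of the surface points found at that distance.
    """
    n_bins = int(max_dist / bin_width)
    hist = [0.0] * n_bins
    for point in surface_points:
        d = math.dist(analysis_pos, point)
        idx = int(d / bin_width)
        if idx < n_bins:
            # Assumption: nearer surfaces contribute more (1/d weighting).
            hist[idx] += 1.0 / max(d, bin_width)
    return hist

def local_maxima(values):
    """Indices of non-zero interior bins that rise above the previous bin
    and are not exceeded by the next one."""
    return [i for i in range(1, len(values) - 1)
            if values[i] > 0.0
            and values[i] > values[i - 1]
            and values[i] >= values[i + 1]]

def control_distances(hist, bin_width=1.0):
    """Control parameters: the bin-center distance of every maximum."""
    return [(i + 0.5) * bin_width for i in local_maxima(hist)]
```

The distances returned by `control_distances` could then serve as radii at which early reflection positions are placed; how the control parameters are actually used is left open here.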
In accordance with a third aspect of the present invention, the inventors of the present application realized that one problem encountered when trying to use early reflection (ER) rendering of audio signals stems from the fact that a transmission of early reflection patterns of the audio scenes for the rendering may result in high signaling costs. The inventors found that an ER pattern can be generated by use of bitstream hints, resulting in an acoustically convincing, but computationally moderate ER rendering result. By using only hints in the bitstream, the signaling costs can be reduced, since it is not necessary to transmit the complete ER pattern.
Accordingly, in accordance with a third aspect of the present application, an apparatus for sound rendering is configured to receive first information on a listener position and a sound source position. The apparatus is configured to receive a bitstream comprising (and, e.g., to read therefrom) a representation of an audio signal of a sound source positioned at the sound source position and one or more early reflection pattern parameters. For example, the bitstream is an audio bitstream with the early reflection parameter inside a header or metadata field of the bitstream, or a file format stream with the early reflection parameter inside a packet of the file format stream and a track of the file format stream comprising an audio bitstream representing the audio signal. Additionally, the apparatus is configured to determine an early reflection pattern, which is indicative of a constellation of early reflection positions, depending on the one or more early reflection pattern parameters. Further, the apparatus is configured to render the audio signal of the sound source using a room impulse response whose early reflection portion is determined by the early reflection pattern. Here, “constellation” denotes, e.g., a set of positions together with their mutual placement in terms of the angles between the lines connecting the positions; a synonymous term is “pattern”. The early reflection pattern is positioned at the listener position in a manner so that the early reflection positions are located around the listener position and at angular directions from the listener position which are invariant with respect to changes in listener head orientation, i.e. the constellation is translatorily placed at the listener position.
In accordance with a fourth aspect of the present invention, the inventors of the present application realized that one problem encountered when trying to use early reflection (ER) rendering of audio signals stems from the fact that a tremendous amount of computation has to be spent to determine each reflection from the source to the listener, taking into consideration the geometry of walls, occluding objects and other effects to compute a physically accurate reflection pattern. The inventors found that simple room acoustical parameters, like room dimension, room volume or predelay, can be used to determine the number of early reflection positions within an early reflection pattern. It is not necessary to analyze the real early reflections of the scene, since the early reflections can be approximated dependent on a room acoustical parameter. The inventors found that generating an ER pattern whose number of ERs depends on a room acoustical parameter results in an acoustically convincing, but computationally moderate ER rendering result.
Accordingly, in accordance with a fourth aspect of the present application, an apparatus for determining an early reflection pattern for sound rendition is configured to receive at least one room acoustical parameter which is representative of an acoustical characteristic of an acoustic environment. The apparatus is configured to determine an early reflection pattern, which is indicative of a constellation of early reflection positions, in a manner so that a number of the early reflection positions depends on the at least one room acoustical parameter.
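As a hedged illustration of the fourth aspect, the following sketch derives the number of early reflection positions from the room volume. The text only states that the number depends on a room acoustical parameter; the logarithmic mapping, the function name `er_count_from_volume` and the bounds `min_count` and `max_count` are assumptions chosen for the example.

```python
import math

def er_count_from_volume(volume_m3, min_count=2, max_count=10):
    """Hypothetical mapping from room volume (m^3) to a number of ER positions.

    Assumption: the count grows slowly (logarithmically) with the volume
    and is clamped to the range [min_count, max_count].
    """
    if volume_m3 <= 0.0:
        raise ValueError("room volume must be positive")
    # One additional position for every factor of four in volume.
    extra = int(math.log2(max(volume_m3, 1.0)) / 2.0)
    return max(min_count, min(max_count, min_count + extra))
```

Under these assumptions a small room yields a sparse pattern and a large hall a denser one, without any geometric analysis of the scene.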
In accordance with a fifth aspect of the present invention, the inventors of the present application realized that one problem encountered when trying to use early reflection (ER) rendering of audio signals stems from the fact that each source is associated with a different early reflection pattern. The inventors found that it is not necessary to use different ER patterns for signals of different sources. This is based on the idea that the signals can be weighted and summed dependent on a source-listener relationship, so that only the weighted sum of the audio signals is rendered based on the ER pattern. The inventors found that ER rendition by use of one ER pattern for more than one sound source results in an acoustically convincing, but computationally moderate ER rendering result.
Accordingly, in accordance with a fifth aspect of the present application, an apparatus for sound rendering is configured to receive information on a listener position, a first sound source position and a second sound source position. The apparatus is configured to render audio signals of the two sound sources using a room impulse response whose early reflection portion is determined by an early reflection pattern. The early reflection pattern is indicative of a constellation of early reflection positions, where “constellation” denotes, e.g., a set of positions together with their mutual placement in terms of the angles between the lines connecting the positions; a synonymous term is “pattern”. The early reflection pattern is positioned at the listener position in a manner so that the early reflection positions are located around the listener position and at angular directions from the listener position which are invariant with respect to changes in listener head orientation, i.e. the constellation is translatorily placed at the listener position. The apparatus is configured to render the audio signals of the two sound sources by forming a weighted sum of a first audio signal of a first sound source positioned at the first sound source position and a second audio signal of a second sound source positioned at the second sound source position. The weighted sum weights the first audio signal more than the second audio signal, if a first distance between the first sound source position and the listener position is smaller than a second distance between the second sound source position and the listener position, and weights the second audio signal more than the first audio signal, if the first distance is larger than the second distance.
Additionally, the apparatus is configured to render the audio signals of the two sound sources by generating early reflection contribution loudspeaker signals relating to the early reflection portion of the room impulse response by rendering the weighted sum from the early reflection positions.
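The weighting of the fifth aspect can be sketched as follows. The text only requires that the nearer source receives the larger weight; the normalized inverse-distance law, the minimum distance `min_dist` and the function names are assumptions for illustration.

```python
import math

def distance_weights(listener_pos, source_positions, min_dist=0.1):
    """Weights that decrease with source-listener distance.

    Assumption: inverse-distance weighting, normalized to sum to 1.
    """
    inverse = [1.0 / max(math.dist(listener_pos, s), min_dist)
               for s in source_positions]
    total = sum(inverse)
    return [w / total for w in inverse]

def er_input_mix(listener_pos, source_positions, signals):
    """Weighted sum of the source signals, sample by sample.

    The result is the single signal that is rendered from every early
    reflection position of the shared pattern.
    """
    weights = distance_weights(listener_pos, source_positions)
    n = min(len(sig) for sig in signals)
    return [sum(w * sig[i] for w, sig in zip(weights, signals))
            for i in range(n)]
```

With a listener at the origin and sources at distances of 1 m and 3 m, this weighting assigns 0.75 to the nearer and 0.25 to the farther source, so only one mixed signal has to pass through the ER stage regardless of the number of sources.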
In accordance with a sixth aspect of the present invention, the inventors of the present application realized that one problem encountered when trying to use early reflection (ER) rendering of audio signals stems from the fact that a tremendous amount of computation has to be spent to determine each reflection from the source to the listener, taking into consideration the geometry of walls, occluding objects and other effects to compute a physically accurate reflection pattern. The inventors found that simple room acoustical parameters, like room dimension, room volume or predelay, can be used to parametrize a function defining the positions of the early reflections. It is not necessary to analyze the real early reflections of the scene, since the early reflections can be approximated dependent on the room acoustical parameter. Further, it was found that spiral functions provide a good distribution of the early reflection positions. The inventors found that ER pattern generation using one or more spiral functions results in a perceptually convincing, but computationally moderate ER rendering result.
Accordingly, in accordance with a sixth aspect of the present application, an apparatus for determining an early reflection pattern for sound rendition is configured to receive at least one room acoustical parameter which is representative of an acoustical characteristic of an acoustic environment and determine an early reflection pattern, which is indicative of a constellation of early reflection positions, by parameterizing one or more spiral functions centered at the listener position, and place the early reflection positions using the one or more spiral functions.
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
Equal or equivalent elements or elements with equal or equivalent functionality are denoted in the following description by equal or equivalent reference numerals even if occurring in different figures.
In the following description, a plurality of details is set forth to provide a more thorough explanation of embodiments of the present invention. However, it will be apparent to those skilled in the art that embodiments of the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form rather than in detail in order to avoid obscuring embodiments of the present invention. In addition, features of the different embodiments described hereinafter may be combined with each other, unless specifically noted otherwise.
In the following, various examples are described which may assist in achieving a reduced audio rendering complexity when using early reflection processing concepts. The herein discussed simplified early reflection processing concepts may be added to other early reflection processing concepts heuristically designed, for instance, or may be provided exclusively.
In order to ease the understanding of the following embodiments of the present application, the description starts with a general presentation of an early reflection pattern 1, according to an embodiment of the invention. The features described with regard to the early reflection pattern 1 in
An early reflection pattern 1 is indicative of a constellation of early reflection positions ERP, see ERP1 and ERP2. For example, the constellation shall denote a set of positions ERP along with defining their mutual placement, e.g., in terms of the angles a between the lines connecting the positions with the center 2 of the pattern 1. A synonymous term for constellation shall be “pattern”.
The early reflection positions ERP, i.e. positions of early reflections, may indicate or identify positions in an environment 5, e.g., an indoor room or an outdoor area, at which early reflections of an audio signal may occur. For example, a listener positioned at the center 2 of the early reflection pattern 1 may perceive early reflections coming from the early reflection positions ERP. In other words, the early reflection positions ERP may indicate positions from which a listener positioned at the center of the early reflection pattern 1 receives early reflections.
The early reflection pattern 1, for example, is positioned at a listener position 10 in a manner so that the early reflection positions ERP are located around the listener position 10 and at angular directions from the listener position 10 which are invariant with respect to changes in a listener head orientation, i.e. the constellation is translatorily placed at the listener position 10. For example, the early reflection positions ERP may be determined so that they are angularly distributed around the listener position 10 in a substantially uniform manner.
According to an embodiment, the early reflection pattern 1, i.e. the early reflection positions ERP, may be determined, so that connection lines, see 7 and 8 in
As shown in
According to an embodiment, the early reflection positions ERP lie in a horizontal plane along with the listener position 10.
According to an embodiment, an apparatus for audio rendering or for generating an early reflection pattern 1 may be configured to determine the early reflection positions ERP by adjusting an azimuthal rotation of the constellation according to a pattern azimuth parameter in a bitstream comprising a representation of an audio signal to be rendered. In other words, the complete early reflection pattern 1 may be rotated to better approximate real early reflections, e.g. in a certain environment 5. This azimuthal rotation is not performed in reaction to movements, e.g., a rotational movement of the listener. This adjustment of the azimuthal rotation of the constellation may be performed at an initial determination of the early reflection pattern 1. Once the early reflection pattern 1 is determined, all early reflection positions ERP can solely undergo an identical translational movement in reaction to a translational movement of the listener position 10. The arrangement of the early reflection positions ERP relative to the center 2 of the pattern 1 may be determined using the adjustment of the azimuthal rotation of the constellation. Once the pattern 1 is determined, it may not be adjusted anymore, i.e. a movement of a listener position does not change the relative arrangement between the early reflection positions ERP and the center 2 of the pattern 1.
According to an embodiment, at least one room acoustical parameter which is representative of an acoustical characteristic of an acoustic environment may be considered in determining the early reflection pattern. The at least one room acoustical parameter comprises one or more of room dimensions, room volume, and predelay time to the late reverberation. Advantageously, the at least one room acoustical parameter comprises only one of these acoustical characteristics of the acoustic environment. The at least one room acoustical parameter can be received or read from a bitstream, e.g., from the bitstream comprising a representation of an audio signal to be rendered using the early reflection pattern 1.
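The one-time azimuthal rotation followed by purely translational placement can be sketched as below. The function names and the parameter `pattern_azimuth_deg` (standing in for the pattern azimuth parameter read from the bitstream) are hypothetical, and positions are assumed to lie in the horizontal plane.

```python
import math

def build_pattern(polar_positions, pattern_azimuth_deg=0.0):
    """Compute offsets relative to the pattern center, rotated once.

    polar_positions: list of (radius_m, angle_deg) pairs.
    pattern_azimuth_deg: hypothetical stand-in for the bitstream parameter.
    """
    offsets = []
    for radius, angle_deg in polar_positions:
        angle = math.radians(angle_deg + pattern_azimuth_deg)
        offsets.append((radius * math.cos(angle), radius * math.sin(angle)))
    return offsets

def place_pattern(offsets, listener_pos, head_yaw_deg=0.0):
    """Translate the fixed pattern to the listener position.

    head_yaw_deg is accepted but deliberately ignored: the angular
    directions of the positions do not follow the head orientation.
    """
    lx, ly = listener_pos
    return [(lx + dx, ly + dy) for dx, dy in offsets]
```

`build_pattern` would be called once, e.g. when an acoustic environment is entered, while `place_pattern` is called whenever the listener moves; the offsets themselves are never recomputed.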
According to an embodiment, the early reflection pattern 1 can be determined in a manner so that a number of the early reflection positions depends on the at least one room acoustical parameter and/or so that a mutual spacing of the early reflection positions is varied/adapted dependent on the at least one room acoustical parameter. For example, the mutual spacing of the early reflection positions is varied by central expansion centered at the listener position.
According to an embodiment, the number of early reflection positions ERP of the pattern 1 can be determined so that
The term “a farthest early reflection position from the listener position” is understood to mean a “distance of a maximally distanced position among the early reflection positions to the listener position”. According to an embodiment, early reflection positions ERP are placed near the center 2 of the pattern 1, and the more early reflection positions ERP the pattern 1 comprises, the farther away the farthest early reflection position is from the center 2.
According to an embodiment, the mutual spacing of the early reflection positions ERP can be varied/adapted dependent on the at least one room acoustical parameter by uniformly increasing a distance of each early reflection position ERP to the center 2 with increasing room dimensions, room volume, or predelay time to the late reverberation. Optionally, the mutual spacing of the early reflection positions ERP can be varied/adapted dependent on the at least one room acoustical parameter, so that a distance of a maximally distanced position among the early reflection positions ERP to the listener position 10 is larger the larger the room dimensions are, or the larger the room volume is, or the larger the predelay time to the late reverberation is, with the distance being smaller than the predelay time. This allows an even distribution of the early reflection positions ERP and thus an acoustically convincing ER rendering result. It may be advantageous if the distance of the maximally distanced position among the early reflection positions ERP to the listener position 10 is increased more than a distance of the nearest position among the early reflection positions ERP to the listener position 10 with increasing room dimensions, room volume, or predelay time to the late reverberation.
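The central expansion described above can be sketched as a uniform scaling of the offsets about the listener position. The sketch makes two assumptions that are not stated in the text: a distance is compared with the predelay time by converting it to a propagation time with a speed of sound of 343 m/s, and the farthest position is placed at a fixed fraction `fill` of the predelay. All names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def max_delay(offsets):
    """Propagation time (s) over the distance of the farthest position."""
    return max(math.hypot(x, y) for x, y in offsets) / SPEED_OF_SOUND

def expand_pattern(offsets, predelay_s, fill=0.9):
    """Uniformly scale the offsets about the center (central expansion).

    The farthest position ends up at fill * predelay_s in propagation
    time, so it stays below the predelay as long as fill < 1.
    """
    max_radius = max(math.hypot(x, y) for x, y in offsets)
    if max_radius == 0.0:
        raise ValueError("pattern needs at least one off-center position")
    scale = fill * predelay_s * SPEED_OF_SOUND / max_radius
    return [(x * scale, y * scale) for x, y in offsets]
```

Because every offset is multiplied by the same factor, the angular directions of the positions are unchanged, and the farthest position moves outward by a larger absolute amount than the nearest one as the predelay grows.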
As shown in
Each of the first set of early reflection positions ERP1 is associated with a corresponding early reflection position of the second set of early reflection positions ERP2. For example, the early reflection position ERP11 may be associated with the corresponding early reflection position ERP21, the early reflection position ERP12 may be associated with the corresponding early reflection position ERP22, the early reflection position ERP13 may be associated with the corresponding early reflection position ERP23, the early reflection position ERP14 may be associated with the corresponding early reflection position ERP24 and the early reflection position ERP15 may be associated with the corresponding early reflection position ERP25. For each of the first set of early reflection positions ERP1, the respective early reflection position ERP1 is positioned on an opposite side of a line perpendicularly crossing a connecting line between the respective early reflection position ERP1 and the corresponding early reflection position ERP2 of the second set of early reflection positions ERP2. This ensures that the listener receives early reflections from different directions and prevents an accumulation of early reflection positions in one area. This positioning using the spiral functions enables a uniform distribution of early reflection positions in the environment 5, resulting in an acoustically convincing, but computationally moderate early reflection rendering result of an audio signal.
According to an embodiment, the apparatus for audio rendering or for generating an early reflection pattern 1 may be configured to place the early reflection positions ERP1 and ERP2 using the two spiral functions 3 and 4,
The one or more spiral functions 3, 4 may define the early reflection positions ERP in polar coordinates (r, β), see (r11 to 5, β11 to 5) for defining the early reflection position ERP1 of the first set of early reflection positions ERP1 and (r21 to 5, β21 to 5) for defining the early reflection position ERP2 of the second set of early reflection positions ERP2.
As will be described in the following in more detail, see especially section 1 “Indoor ER Parameter Calculation”, the one or more spiral functions 3, 4 can be parameterized depending on at least one room acoustical parameter, i.e. the respective spiral function 3, 4 defines the respective early reflection positions ERP dependent on the at least one room acoustical parameter. The at least one room acoustical parameter comprises one or more of room dimensions, room volume and predelay time to late reverberation. The at least one room acoustical parameter may be representative of an acoustical characteristic of an acoustic environment 5.
For example, the one or more spiral functions 3, 4 can be parameterized depending on the at least one room acoustical parameter,
According to an embodiment, the apparatus for audio rendering or for generating an early reflection pattern 1 may be configured to parametrize the one or more spiral functions and determine a number of early reflection positions ERP so that a distance of a maximally distanced position among the early reflection positions to the listener position is larger the larger the room dimensions are, or the larger the room volume is, or the larger the predelay time to the late reverberation is with the distance being smaller than the predelay time.
According to an embodiment, the apparatus for audio rendering or for generating an early reflection pattern 1 may be configured to support different determinations of the early reflection pattern. The apparatus for audio rendering or for generating an early reflection pattern 1 may be configured to choose the type of determination dependent on the environment 5. For example, the determination, e.g., a first determination, of the early reflection pattern 1 using one or more spiral functions 3, 4 and/or the determination, e.g., a first determination, of the early reflection pattern 1 in a manner so that the number of the early reflection positions depends on the at least one room acoustical parameter may be associated with an indoor environment, like a room, see especially section 1 “Indoor ER Parameter Calculation”. Such a determination, e.g., a first determination, may be selected in case of the acoustic environment 5 being an indoor environment or in case of a pattern type index in a bitstream comprising a representation of an audio signal to be rendered assuming a predetermined state. An alternative determination, e.g., a second determination, is described in more detail in section 3 “Outdoor ER Pattern”.
As already described above, one of the newly invented ER patterns 1 for indoor consists of two spirals, see
The following description of the indoor ER parameter calculation refers to
The variable parameters for the spiral pattern, i.e. for the first spiral function 3 and for the second spiral function 4, are mainly set by the predelay time. For example, used is the predelay time to the late reverb, e.g.
The parameters are set dependent on the predelay of the room, which defines the start of the late reverb and calculated with Eq. 1.
The first spiral function 3 and the second spiral function 4 can be used so that the first set of early reflection positions ERP1 is determined in polar coordinates as (r1; β1) and the second set of early reflection positions ERP2 is determined in polar coordinates as (r2; β2). Azimuth and radius calculation of ER positions with the two spiral pattern:
The constant distfactor may correspond to the above-mentioned constant distFac. According to an embodiment, the distfactor can be determined based on the at least one room acoustical parameter, e.g., the distfactor can be determined such that it is the larger the larger the predelay time to the late reverb is.
As can be seen in
An apparatus for sound rendering can be configured to generate early reflection contribution loudspeaker signals relating to an early reflection portion of a room impulse response by performing a rendition of an audio signal of one or more sound sources from the early reflection positions ERP, e.g., in a manner level adjusted according to a distance of the respective early reflection position to the listener position, e.g., see the determination of amp1 and amp2 above. For example, for each of the first set of early reflection positions ERP1, the audio signal of the sound source is rendered from the respective early reflection position ERP1 at the level amp1 and, for each of the second set of early reflection positions ERP2, the audio signal of the sound source is rendered from the respective early reflection position ERP2 at the level amp2.
The amplitude of the reflections is dependent on several influencing parameters:
As seen in
The rendering of the audio signal of the sound source from each early reflection position in a manner level adjusted according to a distance of the respective early reflection position to the listener position, may be performed by
For example, for each of the first set of early reflection positions ERP1, the level amp1 at which the audio signal of the sound source is rendered from the respective early reflection position ERP1 is offset by ampCorrection (see Eq. 6) and, for each of the second set of early reflection positions ERP2, the level amp2 at which the audio signal of the sound source is rendered from the respective early reflection position ERP2 is offset by ampCorrection (see Eq. 6). The amplitude correction factor, i.e. ampCorrection of Eq. 6, may be contained in a bitstream comprising a representation of the audio signal. According to an embodiment, the amplitude correction factor is contained in one or more early reflection pattern parameters.
According to an embodiment, the rendering of the audio signal of the sound source from each early reflection position in a manner level adjusted according to a distance of the respective early reflection position to the listener position may be performed by modifying the level adjustment according to the distance of the respective early reflection position to the listener position relative to a level adjustment used by the apparatus for rendering of the audio signal from the sound source position according to a distance attenuation (amp1 and amp2). The distance attenuation may be contained in a bitstream comprising a representation of the audio signal. According to an embodiment, the attenuation is contained in one or more early reflection pattern parameters.
As can be seen in
As described above for an audio signal of a single sound source, it is also possible to apply this rendering technique to two or more audio signals of two or more sound sources, wherein the special rendering is applied to a weighted sum of the two or more audio signals. The calculation of the weighted sum is described in more detail in section 5.
An embodiment shown in
Specifically for outdoor scenes, but not limited thereto, a new pattern 1 with four roughly cross-positioned ERs is designed, see
The known usage of ER patterns for outdoor environments is highly individual and depends on the physical setup of the scene. The geometrical analysis 110 described hereafter captures perceptually important characteristics of the outdoor scene, i.e. the environment 5, which are relevant to the perception of ERs:
In other words, the acoustic environment 5 is radially sampled with respect to a nearest reflective surface distance to obtain a radial sampling result. Additionally, a radial integration over the radial sampling result and a weighting of the radial sampling result may be performed so as to obtain the function 112. The weighting may be performed according to radial distance so as to decrease the early reflection contribution with increasing distance.
As an alternative to analyzing the respective function 112 for each analysis point, it is advantageous in terms of efficiency to subject the function 112 determined at the one or more analysis positions to a summation, e.g. averaging, to yield the further function 112′ shown in
As can be seen in
The amplitudes a1 and a2—together with their distances p1 and p2—are, for example, the input values to calculate the outdoor ER pattern 1. The outdoor ER pattern 1 comprises four ERs, see
According to an embodiment shown in
The four early reflection positions ERPi can be placed so that same are positioned at polar coordinates (r(i); β(i)) with i=1 . . . 4.
The angle coordinates may be β(1)≈5°-15°, β(2)≈90°-110°, β(3)≈180°-200°, β(4)≈270°-290°. According to an embodiment, β≈[10°, 100°, 190°,280°].
The radius coordinates may be determined according to equations 7 and 8, wherein a deviation of up to 40% from the calculated radius value may be allowable:
As can be seen, the radius coordinate of the early reflection positions ERP1 and ERP3 is determined with equation 7 and for early reflection positions ERP2 and ERP4 equation 7 is modified to become equation 8.
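The placement of the four outdoor early reflection positions at polar coordinates (r(i); β(i)) can be sketched as follows. This is only an illustration: the radii are passed in as plain inputs rather than computed from equations 7 and 8, the function name `place_outdoor_er` is hypothetical, and the axis convention (0° along the positive x axis, angles counter-clockwise, listener at the origin) is an assumption.

```python
import math

# Nominal azimuths named in the text, in degrees.
NOMINAL_AZIMUTHS_DEG = (10.0, 100.0, 190.0, 280.0)


def place_outdoor_er(radii_m, azimuths_deg=NOMINAL_AZIMUTHS_DEG):
    """Convert (r(i), beta(i)) pairs to listener-centred (x, y) positions.

    radii_m      : one radius per early reflection (hypothetical input)
    azimuths_deg : one azimuth per early reflection, in degrees
    """
    if len(radii_m) != len(azimuths_deg):
        raise ValueError("need one radius per azimuth")
    positions = []
    for r, beta_deg in zip(radii_m, azimuths_deg):
        beta = math.radians(beta_deg)
        # Assumed convention: x = r*cos(beta), y = r*sin(beta).
        positions.append((r * math.cos(beta), r * math.sin(beta)))
    return positions
```

Because ERP1 and ERP3 share one radius rule and ERP2 and ERP4 another, a call such as `place_outdoor_er([10.0, 20.0, 10.0, 20.0])` yields a roughly cross-shaped constellation around the listener.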
According to the embodiment shown in
The level reduction of an acoustical point source in free-field conditions follows a 1/r law,
corresponding to an amplitude reduction by a factor of 2 for every doubling of distance, [13]. When the influence of different reflective areas is summarized in a few ERs, this reduction over distance should be reduced by an exponential factor.
The distAlpha values [0.5 . . . 1] can be estimated from the area distribution by e.g.
A deviation of about 20% from the calculated distAlpha values may be allowable.
According to an embodiment, distAlpha can be set according to:
if distAlpha<0.5; distAlpha=0.5;
if distAlpha>1.0; distAlpha=1.0.
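The clamping above and its effect on the distance law can be sketched as follows. The gain formula (1/r) raised to the power distAlpha is one plausible reading of "reduced by an exponential factor" and is an assumption, as are the function names, the 1 m reference distance, and the choice to hold the gain at 1 inside the reference distance.

```python
def clamp_dist_alpha(dist_alpha):
    """Limit distAlpha to the range [0.5, 1.0] as stated in the text."""
    return min(1.0, max(0.5, dist_alpha))


def distance_gain(r_m, dist_alpha, ref_m=1.0):
    """Assumed attenuation law: gain = (ref / r) ** distAlpha.

    distAlpha = 1.0 gives the free-field 1/r law (factor 2 per doubling);
    smaller values flatten the decay over distance.
    """
    alpha = clamp_dist_alpha(dist_alpha)
    r = max(r_m, ref_m)  # assumption: no amplification inside ref_m
    return (ref_m / r) ** alpha
```

With distAlpha = 1.0 the gain halves for every doubling of distance; with distAlpha = 0.5 it halves only for every quadrupling.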
When the geometrical analysis is carried out in the encoder, only the algorithmic parameters predelay, compFactor and distAlpha have to be transferred to the renderer.
In the case that a more detailed geometrical analysis results in an ER pattern, which cannot be derived by the above defined equations, all single reflection positions and relative amplitudes can be transmitted independently to represent the desired pattern.
Example values from the geometrical analysis for different outdoor scenarios to calculate the ER pattern:
As already described above with regard to
A portal describes the border between one acoustic environment to the next, from one room to the next or from a room to a free-field environment. To make the transition through such portals smooth, a cross-fade processing between the associated simple ER patterns is beneficial. Within a region of e.g. d=5 m, the level of the contribution from one acoustic environment is faded out.
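A minimal sketch of such a cross-fade is given below. The text does not specify the fade law or how the fade region is positioned relative to the portal, so the linear law, the function name `portal_crossfade_gains`, and the input `distance_into_fade_m` (how far the listener has moved into the fade region) are assumptions for illustration only.

```python
def portal_crossfade_gains(distance_into_fade_m, fade_length_m=5.0):
    """Return (gain_old, gain_new) for the two simple ER patterns.

    distance_into_fade_m : listener progress into the fade region (assumed)
    fade_length_m        : length d of the fade region, e.g. 5 m
    """
    if fade_length_m <= 0.0:
        raise ValueError("fade_length_m must be positive")
    # Normalised progress, clipped to [0, 1].
    t = min(1.0, max(0.0, distance_into_fade_m / fade_length_m))
    # Assumed linear law: the old environment fades out, the new one fades in.
    return 1.0 - t, t
```

An equal-power law could be substituted for the linear one without changing the interface.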
According to an embodiment, an apparatus for rendering may be configured to support a first manner of determination of the early reflection pattern 1 and a second manner of determination of the early reflection pattern 1, wherein the first manner of determination is different from the second manner of determination, e.g., see section 1 and the description of
In a real environment, every audio source has its individual ER pattern, which is dependent on the source and receiver position. In the simplified simulation, every audio source in one environment has the same ER pattern, which is positioned around the listener. When source or listener moves, the source-listener distance changes and therefore the important level relation to the direct sound changes. This level relation has to be preserved.
In an embodiment of the invention this can be accommodated in a computationally efficient way as described in
According to an embodiment, an apparatus for audio rendering or for generating an early reflection pattern 1 may be configured to render an audio signal of two or more sound sources using a room impulse response whose early reflection portion is determined by an early reflection pattern by forming a weighted sum of a first audio signal of a first sound source positioned at the first sound source position and a second audio signal of a second sound source positioned at the second sound source position and by generating early reflection contribution loudspeaker signals relating to the early reflection portion of the room impulse response by rendering the weighted sum from the early reflection positions. The weighted sum, for example, weights the first audio signal more than the second audio signal if a first distance between the first sound source position and the listener position is smaller than a second distance between the second sound source position and the listener position, and weights the second audio signal more than the first audio signal if the first distance is larger than the second distance.
According to an embodiment, the early reflection contribution loudspeaker signals relating to the early reflection portion of the room impulse response may be generated by rendering the weighted sum from each early reflection position in a manner level adjusted according to a distance of the respective early reflection position to the listener position.
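The weighted sum that feeds the common ER pattern can be sketched as follows. The only property taken from the text is that a nearer source is weighted more than a farther one; the specific weight (1/d raised to an exponent), the minimum distance, and the function name `er_input_mix` are assumptions.

```python
def er_input_mix(signals, distances_m, dist_alpha=1.0, min_distance_m=1.0):
    """Form one ER input signal from several source signals.

    signals     : list of equally long sample lists, one per source
    distances_m : source-to-listener distance per source
    Assumed weight per source: (1 / max(d, min_distance_m)) ** dist_alpha,
    so a nearer source contributes more than a farther one.
    """
    if not signals or len(signals) != len(distances_m):
        raise ValueError("need one distance per signal")
    n = len(signals[0])
    mixed = [0.0] * n
    for sig, d in zip(signals, distances_m):
        if len(sig) != n:
            raise ValueError("all signals must have the same length")
        weight = (1.0 / max(d, min_distance_m)) ** dist_alpha
        for i, sample in enumerate(sig):
            mixed[i] += weight * sample
    return mixed
```

The mixed signal is then rendered once from every early reflection position, so the cost of the ER path does not grow with the number of sources.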
In
The reduction caused by the source listener distance is individual per source. There is an additional ampCorrection for the complete ER pattern
A renderer that is equipped to render early reflection patterns in a virtual auditory environment which.
At the direct path 2201/2202 the one or more audio signals 2121/2122 may be rendered to obtain for each of the one or more audio signals 2121/2122 a direct sound contribution loudspeaker signal 2221/2222. For example, for each of the audio signals 2121 and 2122 to be rendered a distance d1/d2 between the respective associated sound source 2101/2102 and a listener position 10 as well as an angle α1/α2 between the respective sound source 2101/2102 and an orientation of the listener may be considered to determine the respective direct sound contribution loudspeaker signal 2221/2222. The direct sound contribution loudspeaker signals 2221/2222 relate to a direct sound source portion of a room impulse response.
According to an embodiment, the apparatus 200 may be configured to mix 260 the one or more audio signals 2121/2122 of the one or more sound sources 2101/2102 to obtain a mixed audio signal 262. At the mixing 260, the signals 2121/2122 may be panned dependent on the position of the respective associated sound source 2101/2102. For example, for each of the audio signals 2121/2122, a distance d1/d2 between the respective associated sound source 2101/2102 and the listener position 10 is considered at the panning/mixing 260. Alternatively, or additionally, the mixing may be performed as described in section 5.
The apparatus 200 is configured to render an audio signal, e.g., the mixed audio signal 262, e.g., a weighted sum of the audio signals 2121 and 2122, of the one or more sound sources 2101/2102 using the room impulse response whose early reflection portion is determined by an early reflection pattern 1, e.g., at the ER paths 230, e.g., to obtain early reflection contribution loudspeaker signals 232 relating to the early reflection portion of the room impulse response. The early reflection contribution loudspeaker signals 232 may be generated by performing a rendition of the audio signal from the early reflection positions ERP, see ERP1 to ERP6.
Optionally, the apparatus 200 may comprise an ER pattern determiner 270, e.g., an apparatus for generating an early reflection pattern 1. The determination of the early reflection pattern 1 may be performed as described in one of the above mentioned embodiments, e.g., see
The bitstream 300 may comprise a representation 2141 of the audio signal 2121 associated with the first sound source 2101 and a representation 2142 of the audio signal 2122 associated with the second sound source 2102.
According to an embodiment, the bitstream 300 may contain/comprise one or more of the herein mentioned parameters. The bitstream 300 may comprise a representation of an audio signal 2141/2142 of a sound source 2101/2102 positioned at a sound source position and one or more early reflection pattern parameters. For example, the bitstream 300 is an audio bitstream with the early reflection parameter inside a header or metadata field of the bitstream, or a file format stream with the early reflection parameter inside a packet of the file format stream and a track of the file format stream comprising an audio bitstream representing the audio signal. The one or more early reflection pattern parameters comprise one or more of a pattern type index, a predelay time to late reverberation, a compression factor, an amplitude correction factor, a distance attenuation exponent, a pattern azimuth parameter, and one or more frequency response parameters.
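The listed early reflection pattern parameters can be pictured as a simple container on the renderer side. The sketch below is not a bitstream syntax: the class name, field names, types, units and the choice to make most fields optional are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ErPatternParameters:
    """Hypothetical container for the parameters named in the text."""
    pattern_type_index: int                       # selects the determination
    predelay_s: Optional[float] = None            # predelay to late reverb
    compression_factor: Optional[float] = None    # compFactor
    amplitude_correction: Optional[float] = None  # ampCorrection
    distance_attenuation_exponent: Optional[float] = None  # distAlpha
    pattern_azimuth_deg: Optional[float] = None   # assumed unit: degrees
    # Representation of the frequency response is an assumption.
    frequency_response: List[float] = field(default_factory=list)
```

A renderer could then branch on `pattern_type_index` and fall back to defaults for any field that was not transmitted.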
At the ER path 230, i.e. at the generation of the early reflection contribution loudspeaker signals 232, the apparatus 200 is optionally configured to render the audio signal of the one or more sound sources 2101/2102 from each early reflection position ERP in a manner spectrally shaped according to one or more frequency response parameters (see
The apparatus 200 may be configured to, in performing the rendition of the audio signal of the one or more sound sources 2101/2102 from the early reflection positions ERP, use HRTFs specific for a listener head orientation. HRTF stands for head-related transfer function.
At the optional diffuse path 240 the one or more audio signals 2121/2122 may be rendered to obtain diffuse late reverberation loudspeaker signals 242. The apparatus 200 may be configured to generate a diffuse late reverberation portion of the room impulse response and, for example, use this room impulse response to render the one or more audio signals 2121/2122 in the diffuse path 240. The diffuse late reverberation loudspeaker signals 242 relate to the diffuse late reverberation portion of the room impulse response.
The apparatus 200 may be configured to, in rendering the one or more audio signals 2121/2122, generate a set of loudspeaker signals 252 by forming a summation 250 over direct sound contribution loudspeaker signals 2221/2222 relating to a direct sound source portion of the room impulse response and early reflection contribution loudspeaker signals 232 relating to the early reflection portion of the room impulse response and, optionally, diffuse late reverberation loudspeaker signals 242 relating to the diffuse late reverberation portion of the room impulse response.
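The summation 250 can be sketched as a sample-wise sum over the contributions of the individual paths. The data layout (each contribution as a list of channels, each channel a list of samples) and the function name `sum_contributions` are assumptions; the optional diffuse contribution is modelled by allowing `None`.

```python
def sum_contributions(*contributions):
    """Sample-wise sum of [channel][sample] signal sets.

    Each argument is one contribution (direct sound per source, early
    reflections, optionally diffuse late reverberation); None is skipped.
    """
    present = [c for c in contributions if c is not None]
    if not present:
        raise ValueError("at least one contribution is required")
    n_ch = len(present[0])
    n_smp = len(present[0][0])
    out = [[0.0] * n_smp for _ in range(n_ch)]
    for contrib in present:
        if len(contrib) != n_ch or any(len(ch) != n_smp for ch in contrib):
            raise ValueError("all contributions must share the same shape")
        for c, channel in enumerate(contrib):
            for i, sample in enumerate(channel):
                out[c][i] += sample
    return out
```

For example, `sum_contributions(direct_1, direct_2, early, None)` would yield the loudspeaker signals when no diffuse path is present.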
The time consuming exact geometrical calculation of ER can especially be avoided in applications like
The apparatus 200 can comprise any of the features described above. For example, the apparatus 200 can comprise the apparatus 100 of
The one or more early reflection pattern parameters 310 may comprise one or more of a pattern type index, a predelay time to late reverberation, a compression factor, an amplitude correction factor, a distance attenuation exponent, a pattern azimuth parameter, and one or more frequency response parameters.
Additionally, the apparatus 200 is configured to determine 270 an early reflection pattern 1 depending on the one or more early reflection pattern parameters 310, e.g., as described with regard to
Further, the apparatus 200 is configured to render 202 the audio signal of the sound source using a room impulse response 400 whose early reflection portion 410 is determined by an early reflection pattern 1. The early reflection pattern 1 is indicative of a constellation of early reflection positions ERP, see ERP1 to ERP4, and is positioned at the listener position 10 in a manner so that the early reflection positions ERP are located around the listener position 10 and at angular directions from the listener position 10 which are invariant with respect to changes in listener head orientation.
According to an embodiment, the apparatus 200 is configured to, if a pattern type index indicates an encoder-parametrized manner of determination, e.g., as described in section 1, read from the bitstream 300 as part of the one or more early reflection pattern parameters 310 one or more of a number of the early reflections of the early reflection pattern, for each early reflection, an azimuth, an elevation, a radius, e.g., distance to listener position, for each early reflection, an amplitude correction factor, for each early reflection, a distance attenuation exponent and for each early reflection, a frequency response description.
The apparatus 200 can comprise any of the features described above.
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
The inventive rendered audio signal or the invented early reflection pattern information can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are performed by any hardware apparatus.
While this invention has been described in terms of several advantageous embodiments, there are alterations, permutations, and equivalents, which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
21207272.2 | Nov 2021 | EP | regional |
This application is a continuation of copending International Application No. PCT/EP2022/081089, filed Nov. 8, 2022, which is incorporated herein by reference in its entirety, and additionally claims priority from European Application No EP 21207272.2, filed Nov. 9, 2021, which is also incorporated herein by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
Parent | PCT/EP2022/081089 | Nov 2022 | WO |
Child | 18655897 | US |