The present application relates to apparatus and methods for rendering reverberation for external sources, but not exclusively for rendering reverberation for external sources in augmented reality and/or virtual reality apparatus.
Reverberation refers to the persistence of sound in a space after the actual sound source has stopped. Different spaces are characterized by different reverberation characteristics. For conveying a spatial impression of an environment, reproducing reverberation perceptually accurately is important. Room acoustics are often modelled with an individually synthesized early reflection portion and a statistical model for the diffuse late reverberation.
In other words, after the direct sound, the listener would hear directional early reflections. After some point, individual reflections can no longer be perceived but the listener hears diffuse, late reverberation. The starting time of the diffuse late reverberation can be referred to as the predelay.
The reverberation can be rendered using, e.g., a Feedback-Delay-Network (FDN) reverberator with a suitable tuning of delay line lengths. An FDN enables the reverberation times (RT60) and the energies of different frequency bands to be controlled individually. Thus, it can be used to render the reverberation based on the characteristics of the room. The reverberation times and the energies of the different frequencies are affected by the frequency-dependent absorption characteristics of the room.
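By way of illustration only, the following Python sketch shows how delay line lengths and a target RT60 can interact in a small FDN. The function names, the four delay lengths used in the example, the normalized Hadamard feedback matrix and the single broadband RT60 are all assumptions made for this sketch; a frequency-dependent design would replace each scalar gain with an attenuation filter per delay line.

```python
import numpy as np

def delay_line_gains(delays_samples, rt60_s, fs):
    """Per-delay-line gain giving a 60 dB decay after rt60_s seconds."""
    delays = np.asarray(delays_samples, dtype=float)
    # One pass through a delay of d samples must attenuate the signal by
    # 60 * d / (rt60 * fs) dB, i.e. by a factor 10 ** (-3 * d / (rt60 * fs)).
    return 10.0 ** (-3.0 * delays / (rt60_s * fs))

def fdn_impulse_response(delays_samples, rt60_s, fs, n_samples):
    """Impulse response of a 4-line FDN (illustrative, broadband only)."""
    delays = [int(d) for d in delays_samples]
    if len(delays) != 4:
        raise ValueError("this sketch assumes exactly four delay lines")
    gains = delay_line_gains(delays, rt60_s, fs)
    # Orthogonal (energy-preserving) feedback matrix: normalized 4x4 Hadamard.
    h2 = np.array([[1.0, 1.0], [1.0, -1.0]])
    feedback = np.kron(h2, h2) / 2.0
    buffers = [np.zeros(d) for d in delays]   # circular delay buffers
    idx = [0] * 4                             # read/write positions
    out = np.zeros(n_samples)
    for t in range(n_samples):
        x = 1.0 if t == 0 else 0.0            # unit impulse input
        taps = np.array([buffers[i][idx[i]] for i in range(4)])
        out[t] = taps.sum()
        fb = feedback @ (gains * taps)        # attenuate, then mix
        for i in range(4):
            buffers[i][idx[i]] = x + fb[i]
            idx[i] = (idx[i] + 1) % delays[i]
    return out
```

Mutually prime delay lengths (for example 149, 211, 263 and 293 samples) are a common choice because they avoid coinciding echoes, although the source does not prescribe any particular tuning.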
Reverberation spectrum or level can be controlled using the diffuse-to-direct ratio, which describes the ratio of the energy (or level) of reverberant sound to the direct sound energy (or the total emitted energy of a sound source). It has been defined, for example within N0182 MPEG-I Immersive Audio Encoder Input Format, that an input to an encoder is provided as a diffuse-to-source energy ratio (DSR) value which indicates the ratio of the diffuse (reverberant) sound energy to the total emitted energy of a sound source. Another well-known measure is the RDR, which refers to the reverberant-to-direct ratio and which can be measured from an impulse response. The relation between the RDR and DSR values is described in N0083 MPEG-I Immersive Audio CfP Supplemental Information, Recommendations and Clarifications, Version 1, and can be represented as:
Referring to
The logarithmic RDR can be obtained as $10 \log_{10}(\mathrm{RDR})$. The reverberation ratio can refer to the RDR or DSR or other suitable ratio between direct and diffuse/reverberant energy or signal level.
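As a small worked example, the conversion between a linear energy ratio and its logarithmic form can be sketched as follows. The helper names are hypothetical and the same conversion applies to any energy ratio, not only the RDR.

```python
import math

def ratio_to_db(energy_ratio):
    """Convert a linear energy ratio (e.g. an RDR) to decibels."""
    if energy_ratio <= 0.0:
        raise ValueError("an energy ratio must be positive")
    return 10.0 * math.log10(energy_ratio)

def db_to_ratio(level_db):
    """Inverse conversion from decibels to a linear energy ratio."""
    return 10.0 ** (level_db / 10.0)
```

For instance, a reverberant energy one tenth of the direct energy corresponds to a logarithmic ratio of -10 dB.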
In a virtual environment for virtual reality (VR) or a real physical environment for augmented reality (AR) there can be several acoustic environments, each with its own reverberation parameters, which can differ between acoustic environments. This kind of environment can be rendered with multiple reverberators running in parallel, so that a reverberator instance is running in each acoustic environment. When the listener is moving in the environment, the current environment reverberation is rendered as an enveloping spatial sound surrounding the user, and the reverberation from nearby acoustic spaces is rendered via so-called acoustic portals. The acoustic portal or window is a connection between two spaces.
An acoustic portal reproduces the reverberation from the nearby acoustic environment as a spatially extended sound source. In other words the acoustic portal can be seen as acting within an acoustic environment as a sound source with spread, and a reverberation from a nearby room is rendered through the portal. An example of which can be shown from
There is provided according to a first aspect a method for generating reverberant audio signals, the method comprising: obtaining at least one reverberation parameter associated with a first acoustic environment; obtaining at least one audio source located at at least one position outside of the first acoustic environment, the at least one audio source having an associated audio signal; generating at least one parameter for the at least one position of the at least one audio source, related to energy propagation for the at least one audio source; and generating a reverberated audio signal associated with the at least one audio source based on the at least one parameter to adjust the level of the associated audio signal.
The first acoustic environment may comprise at least one finite defined dimension range and at least one acoustic portal associated with the at least one finite defined dimension range.
Generating at least one parameter for the at least one position of the at least one audio source may comprise: obtaining at least one model parameter associated with the at least one position of the at least one audio source; and generating the at least one parameter based on the at least one model parameter, the at least one parameter related to energy propagation for the at least one audio source from the at least one position to the first acoustic environment.
The at least one parameter may be related to energy propagation for the at least one audio source from the at least one position to the first acoustic environment through the at least one acoustic portal.
The method may further comprise generating at least one further parameter related to a propagation delay for the at least one audio source from the at least one position to the first acoustic environment, wherein generating the reverberated audio signal associated with the at least one audio source may be further based on the further parameter applied to delay the associated audio signal.
Obtaining at least one model parameter may comprise obtaining a polynomial in at least two dimensions, and generating at least one parameter based on the at least one model parameter may comprise generating a direct propagation value representing transmission of energy from the at least one audio source through the at least one acoustic portal.
Generating a direct propagation value representing transmission of energy from the at least one audio source through the at least one acoustic portal may comprise evaluating the polynomial in at least two dimensions at a position for the at least one audio source to be rendered.
The method may further comprise obtaining a flag or indicator configured to identify whether the at least one audio source is a static or dynamic audio source, wherein generating at least one parameter may comprise recalculating the generation of the at least one parameter at determined update times for an identified dynamic audio source.
Generating the reverberated audio signal associated with the at least one audio source based on the at least one parameter related to energy propagation applied to the associated audio signal to adjust the level of the associated audio signal may further comprise applying a directivity filter based on an orientation of the audio source.
The at least one position outside of the first acoustic environment may be a center of a spatial extent of the at least one audio source.
The at least one position outside of the first acoustic environment may be at least two positions within a spatial extent of the at least one audio source, wherein generating the at least one parameter may comprise generating a weighted average of parameters associated with the at least two positions of the at least one audio source.
According to a second aspect there is provided an apparatus for assisting generating reverberant audio signals, the apparatus comprising at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform: obtaining at least one reverberation parameter associated with a first acoustic environment; obtaining at least one audio source located at at least one position outside of the first acoustic environment, the at least one audio source having an associated audio signal; generating at least one parameter for the at least one position of the at least one audio source, related to energy propagation for the at least one audio source; and generating a reverberated audio signal associated with the at least one audio source based on the at least one parameter to adjust the level of the associated audio signal.
The first acoustic environment may comprise at least one finite defined dimension range and at least one acoustic portal associated with the at least one finite defined dimension range.
The apparatus caused to perform generating at least one parameter for the at least one position of the at least one audio source may be caused to perform: obtaining at least one model parameter associated with the at least one position of the at least one audio source; and generating the at least one parameter based on the at least one model parameter, the at least one parameter related to energy propagation for the at least one audio source from the at least one position to the first acoustic environment.
The at least one parameter may be related to energy propagation for the at least one audio source from the at least one position to the first acoustic environment through the at least one acoustic portal.
The apparatus may be further caused to perform generating at least one further parameter related to a propagation delay for the at least one audio source from the at least one position to the first acoustic environment, wherein the apparatus caused to perform generating the reverberated audio signal associated with the at least one audio source may be further caused to perform generating the reverberated audio signal based on the further parameter applied to delay the associated audio signal.
The apparatus caused to perform obtaining at least one model parameter may be further caused to perform obtaining a polynomial in at least two dimensions, and the apparatus caused to perform generating at least one parameter based on the at least one model parameter may be further caused to perform generating a direct propagation value representing transmission of energy from the at least one audio source through the at least one acoustic portal.
The apparatus caused to perform generating a direct propagation value representing transmission of energy from the at least one audio source through the at least one acoustic portal may be caused to perform evaluating the polynomial in at least two dimensions at a position for the at least one audio source to be rendered.
The apparatus may be further caused to obtain a flag or indicator configured to identify whether the at least one audio source is a static or dynamic audio source, wherein the apparatus caused to generate at least one parameter may be caused to perform recalculating the generation of the at least one parameter at determined update times for an identified dynamic audio source.
The apparatus caused to perform generating the reverberated audio signal associated with the at least one audio source based on the at least one parameter related to energy propagation applied to the associated audio signal to adjust the level of the associated audio signal may be further caused to perform applying a directivity filter based on an orientation of the audio source.
The at least one position outside of the first acoustic environment may be a center of a spatial extent of the at least one audio source.
The at least one position outside of the first acoustic environment may be at least two positions within a spatial extent of the at least one audio source, wherein the apparatus caused to perform generating the at least one parameter may be caused to perform generating a weighted average of parameters associated with the at least two positions of the at least one audio source.
According to a third aspect there is provided an apparatus for generating reverberant audio signals, the apparatus comprising means configured to: obtain at least one reverberation parameter associated with a first acoustic environment; obtain at least one audio source located at at least one position outside of the first acoustic environment, the at least one audio source having an associated audio signal; generate at least one parameter for the at least one position of the at least one audio source, related to energy propagation for the at least one audio source; and generate a reverberated audio signal associated with the at least one audio source based on the at least one parameter to adjust the level of the associated audio signal.
The first acoustic environment may comprise at least one finite defined dimension range and at least one acoustic portal associated with the at least one finite defined dimension range.
The means configured to generate at least one parameter for the at least one position of the at least one audio source may be configured to: obtain at least one model parameter associated with the at least one position of the at least one audio source; and generate the at least one parameter based on the at least one model parameter, the at least one parameter related to energy propagation for the at least one audio source from the at least one position to the first acoustic environment.
The at least one parameter may be related to energy propagation for the at least one audio source from the at least one position to the first acoustic environment through the at least one acoustic portal.
The means may be further configured to generate at least one further parameter related to a propagation delay for the at least one audio source from the at least one position to the first acoustic environment, wherein the means configured to generate the reverberated audio signal associated with the at least one audio source is further configured to generate the reverberated audio signal based on the further parameter applied to delay the associated audio signal.
The means configured to obtain at least one model parameter may be configured to obtain a polynomial in at least two dimensions, and the means configured to generate at least one parameter based on the at least one model parameter may be configured to generate a direct propagation value representing transmission of energy from the at least one audio source through the at least one acoustic portal.
The means configured to generate a direct propagation value representing transmission of energy from the at least one audio source through the at least one acoustic portal may be configured to evaluate the polynomial in at least two dimensions at a position for the at least one audio source to be rendered.
The means may be further configured to obtain a flag or indicator configured to identify whether the at least one audio source is a static or dynamic audio source, wherein the means configured to generate at least one parameter is configured to recalculate the generation of the at least one parameter at determined update times for an identified dynamic audio source.
The means configured to generate the reverberated audio signal associated with the at least one audio source based on the at least one parameter related to energy propagation applied to the associated audio signal to adjust the level of the associated audio signal is further configured to apply a directivity filter based on an orientation of the audio source.
The at least one position outside of the first acoustic environment may be a center of a spatial extent of the at least one audio source.
The at least one position outside of the first acoustic environment may be at least two positions within a spatial extent of the at least one audio source, wherein the means configured to generate the at least one parameter may be configured to generate a weighted average of parameters associated with the at least two positions of the at least one audio source.
According to a fourth aspect there is provided an apparatus for generating reverberant audio signals, the apparatus comprising: obtaining circuitry configured to obtain at least one reverberation parameter associated with a first acoustic environment; obtaining circuitry configured to obtain at least one audio source located at at least one position outside of the first acoustic environment, the at least one audio source having an associated audio signal; generating circuitry configured to generate at least one parameter for the at least one position of the at least one audio source, related to energy propagation for the at least one audio source; and generating circuitry configured to generate a reverberated audio signal associated with the at least one audio source based on the at least one parameter to adjust the level of the associated audio signal.
According to a fifth aspect there is provided a computer program comprising instructions [or a computer readable medium comprising instructions] for causing an apparatus, for generating reverberant audio signals, the apparatus caused to perform at least the following: obtaining at least one reverberation parameter associated with a first acoustic environment; obtaining at least one audio source located at at least one position outside of the first acoustic environment, the at least one audio source having an associated audio signal; generating at least one parameter for the at least one position of the at least one audio source, related to energy propagation for the at least one audio source; and generating a reverberated audio signal associated with the at least one audio source based on the at least one parameter to adjust the level of the associated audio signal.
According to a sixth aspect there is provided a non-transitory computer readable medium comprising program instructions for causing an apparatus, for generating reverberant audio signals, to perform at least the following: obtaining at least one reverberation parameter associated with a first acoustic environment; obtaining at least one audio source located at at least one position outside of the first acoustic environment, the at least one audio source having an associated audio signal; generating at least one parameter for the at least one position of the at least one audio source, related to energy propagation for the at least one audio source; and generating a reverberated audio signal associated with the at least one audio source based on the at least one parameter to adjust the level of the associated audio signal.
According to a seventh aspect there is provided an apparatus, for generating reverberant audio signals, comprising: means for obtaining at least one reverberation parameter associated with a first acoustic environment; means for obtaining at least one audio source located at at least one position outside of the first acoustic environment, the at least one audio source having an associated audio signal; means for generating at least one parameter for the at least one position of the at least one audio source, related to energy propagation for the at least one audio source; and means for generating a reverberated audio signal associated with the at least one audio source based on the at least one parameter to adjust the level of the associated audio signal.
According to an eighth aspect there is provided a computer readable medium comprising instructions for causing an apparatus, for generating reverberant audio signals, to perform at least the following: obtaining at least one reverberation parameter associated with a first acoustic environment; obtaining at least one audio source located at at least one position outside of the first acoustic environment, the at least one audio source having an associated audio signal; generating at least one parameter for the at least one position of the at least one audio source, related to energy propagation for the at least one audio source; and generating a reverberated audio signal associated with the at least one audio source based on the at least one parameter to adjust the level of the associated audio signal.
An apparatus comprising means for performing the actions of the method as described above.
An apparatus configured to perform the actions of the method as described above.
A computer program comprising instructions for causing a computer to perform the method as described above.
A computer program product stored on a medium may cause an apparatus to perform the method as described herein.
An electronic device may comprise apparatus as described herein.
A chipset may comprise apparatus as described herein.
Embodiments of the present application aim to address problems associated with the state of the art.
For a better understanding of the present application, reference will now be made by way of example to the accompanying drawings in which:
The following describes in further detail suitable apparatus and possible mechanisms for implementing reverberation in audio scenes with multiple acoustic environments and where two or more acoustic environments are acoustically coupled.
As discussed above several virtual (for VR) or physical (for AR) acoustic environments can be rendered with several digital reverberators running in parallel, each reproducing reverberation according to the characteristics of an acoustic environment.
The environments can furthermore provide inputs to each other via so called portals. For example, as shown with respect to the example environment shown in
The rendering of the audio is such that the listener when at P1 experiences reverberation based on AE2 205 but when passing through the acoustic opening or portal into another acoustic environment AE1 203 then the audio sources S1 $210_1$ and S2 $210_2$ should also be reverberated by the reverberator associated with the AE1 203.
If the audio sources from the neighboring environment AE1 are not reverberated in AE2 then the reverberated sound of AE2 may sound unrealistic. Consider, for example, a gunshot being fired in a relatively dry room (AE1) connected to a highly reverberant corridor or room (AE2). If the reverberation is implemented as indicated by current reference models, then the gunshot sound is not reverberated in the highly reverberant corridor even though from the physical perspective this would be clearly expected by the listener.
There exist some solutions for reverberating sources in connected acoustic environments in a listener acoustic environment, which generally require geometric calculations during rendering to determine contributions of sound source energy into a reverberator through a portal opening. These can be computationally heavy, especially if such calculations need to be repeated for several (even hundreds or thousands of) sound sources. This can be shown in
An alternative to run-time calculations is determination or calculation of the necessary gain coefficients (or direct propagation values, DPV) on the encoder side. This has the benefit that computational complexity regarding geometric calculation and checks for line of sight can be offloaded to the encoder. However, encoder-side processing has the limitation of generating a large bitstream if the calculation is performed for all possible sound source positions and a DPV has to be written into the bitstream for each of those positions.
Furthermore, these known solutions lack the possibility of adjusting the delay of arrival for sound sources from neighboring environments. If such adjustments are not implemented, then any reverberation created for a sound source in a neighboring environment can be presented too early compared to the propagated direct sound, or reverberation created for a sound source within the current environment. This can lead to reduced plausibility or realism of the VR or AR audio experience.
The concept which is expressed in the embodiments as described in further detail herein is one which relates to reproduction of (late) reverberation, where apparatus and methods are configured to enable rendering of reverberation for sound sources external to an acoustic environment with low computational complexity and bitstream size. In other words, the aim is to offload determinations and calculations to the encoder in order to reduce computational complexity on the renderer, and to use compact model parameters for the gain calculation in order to maintain a compact bitstream size.
In some embodiments this can be achieved by:
In some embodiments, the model parameters are the coefficients of a polynomial in two dimensions which enable the calculation of a direct propagation value representing the passage of sound energy through an acoustic portal.
In some other embodiments, the model parameters relate to a three-dimensional region within an audio scene.
For example in some embodiments the polynomial is of the form
In some embodiments, there is a flag indicating static sound sources for which the model evaluation does not need to be repeated but can be implemented only once at their position.
In some embodiments, there is a flag for dynamic objects indicating such sound sources which need to be recalculated at every update cycle.
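The effect of such static and dynamic flags on the renderer workload can be sketched as follows. This is a minimal illustration only: the `ExternalSource` type, the `is_dynamic` field and the `update_dpv` helper are hypothetical names, and `model` stands for any callable that maps a source position to a direct propagation value.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Position = Tuple[float, float]

@dataclass
class ExternalSource:
    position: Position
    is_dynamic: bool                    # hypothetical flag from the bitstream
    cached_dpv: Optional[float] = None  # last evaluated propagation value

def update_dpv(source: ExternalSource,
               model: Callable[[Position], float]) -> float:
    """Re-evaluate the model only when the source may have moved."""
    if source.is_dynamic or source.cached_dpv is None:
        source.cached_dpv = model(source.position)
    return source.cached_dpv
```

Calling `update_dpv` once per update cycle then evaluates the model every cycle for a dynamic source, but only on the first cycle for a static one.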
In some embodiments, the polynomial coefficients are associated with regions in the audio scene where the value of the gain coefficient modelled with the polynomial has a unimodal distribution suitable for modelling with a polynomial.
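Evaluating a two-dimensional polynomial at a source position can be sketched as below. The coefficient layout is an assumption made for this example (entry `coeffs[i][j]` multiplies `x**i * y**j`), as is the clamping of negative results to zero; the source does not specify either.

```python
import numpy as np

def evaluate_dpv(coeffs, x, y):
    """Evaluate sum_ij coeffs[i][j] * x**i * y**j at position (x, y)."""
    c = np.asarray(coeffs, dtype=float)
    x_pows = float(x) ** np.arange(c.shape[0])   # 1, x, x^2, ...
    y_pows = float(y) ** np.arange(c.shape[1])   # 1, y, y^2, ...
    value = x_pows @ c @ y_pows
    # Assumption: a propagation gain cannot be negative, so clamp at zero.
    return max(0.0, float(value))
```

With this layout a region needs only `(n + 1) * (m + 1)` coefficients for a polynomial of degree `n` in `x` and `m` in `y`, regardless of how many source positions fall within the region.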
In some other embodiments the parameters are the weights $\pi_k$, means $\mu_k$, and (co)variances $\Sigma_k$ of a Gaussian mixture model (GMM). Such a model can be defined as $p(x) = \sum_{k=1}^{K} \pi_k \, N(x \mid \mu_k, \Sigma_k)$, where $N(x \mid \mu_k, \Sigma_k)$ evaluates a multivariate normal density with parameters $\mu_k$ and $\Sigma_k$ for an input vector $x$.
In some other embodiments different regions of a (multimodal) surface of gain coefficients (DPV values) are modelled with a Gaussian mixture model and the means of the mixture densities model the peaks in the surface.
In some other embodiments the number of Gaussians in the mixture K is set to be equal to the number of peaks in the surface of the DPV data.
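A Gaussian mixture of this kind can be evaluated at a source position as sketched below. The function names are hypothetical, and the sketch returns the raw mixture value; any scaling from that value to a gain is not specified in the source and is therefore left out.

```python
import numpy as np

def gaussian_density(x, mean, cov):
    """Multivariate normal density N(x | mean, cov)."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    diff = x - mean
    dim = x.shape[0]
    norm = np.sqrt((2.0 * np.pi) ** dim * np.linalg.det(cov))
    # Solve cov @ z = diff rather than inverting cov explicitly.
    exponent = -0.5 * diff @ np.linalg.solve(cov, diff)
    return float(np.exp(exponent) / norm)

def gmm_value(x, weights, means, covs):
    """Weighted sum of K Gaussian densities evaluated at x."""
    return sum(w * gaussian_density(x, m, c)
               for w, m, c in zip(weights, means, covs))
```

With one component per peak, the mixture value is largest near the component means and falls off between them, which is the behaviour relied upon when the means model the peaks of the DPV surface.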
In some embodiments, any other suitable approach is used to determine a model which determines the DPV based on the audio source position with acceptable accuracy while being represented by a compact set of parameters. For example, the derivation of DPV can be performed by a suitably trained neural network, where the neural network can be represented by a compact set of parameters.
In some further embodiments, the signal of an external sound source is fed into a predelay line having a length proportional to the distance of the sound source from the audio environment whose reverberation is rendered.
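A predelay of this kind can be sketched as follows. The proportionality constant is assumed here to be the inverse of the speed of sound (343 m/s), so that the delay matches the acoustic travel time; the function names and the optional gain argument are likewise assumptions for illustration.

```python
import numpy as np

SPEED_OF_SOUND_M_S = 343.0  # assumed proportionality constant

def predelay_samples(distance_m, fs):
    """Delay, in samples, for sound travelling distance_m metres."""
    return int(round(distance_m / SPEED_OF_SOUND_M_S * fs))

def apply_predelay(signal, distance_m, fs, gain=1.0):
    """Delay (and optionally scale) a signal before it enters a reverberator."""
    signal = np.asarray(signal, dtype=float)
    delay = predelay_samples(distance_m, fs)
    out = np.zeros(len(signal) + delay)
    out[delay:] = gain * signal
    return out
```

A source 3.43 m from the acoustic environment would thus be delayed by 10 ms, or 480 samples at a 48 kHz sampling rate, before contributing to the reverberation.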
Furthermore in some embodiments, the orientation of the sound source is taken into account when applying a directivity filter to the samples in the predelay line.
In some embodiments, in case of a sound source with spatial extent (or size), the center of the spatial extent is defined as the sound source position. In another embodiment, in case of a sound source with spatial extent, the evaluation with the model is performed with two or more representative point sources with weights associated with each of the representative point sources.
MPEG-I Audio Phase 2 will normatively standardize the bitstream and the renderer processing. There will also be an encoder reference implementation, but it can be modified later as long as the output bitstream follows the normative spec. This allows improving the codec quality also after the standard has been finalized with novel encoder implementations.
The concept as discussed in the following embodiments can be assigned to different parts of the MPEG-I standard as follows:
The normative bitstream shall contain the model parameter values corresponding to different portals and different regions of the audio space in which a sound source can be located and from which its sound can propagate to the portal. The bitstream shall also contain the necessary scene and acoustic (reverberation) parameters.
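The weighted evaluation over representative point sources can be sketched as below. The helper name `extent_dpv` is hypothetical, `model` stands for any callable mapping a position to a direct propagation value, and normalizing by the sum of the weights is an assumption of this sketch.

```python
def extent_dpv(points, weights, model):
    """Weighted average of model values over representative points."""
    if len(points) != len(weights):
        raise ValueError("one weight is needed per representative point")
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * model(p) for p, w in zip(points, weights)) / total
```

Using only the center of the spatial extent corresponds to the special case of a single point with weight one.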
The normative renderer shall: decode the bitstream to obtain the scene and reverberation parameters and the model parameters; initialize reverberators for rendering using the reverberator parameters; determine portal connection information between acoustic environments; determine the model parameters associated with a portal and a position outside an acoustic environment; evaluate, using the model parameters, a gain value to be applied to a sound source external to an acoustic environment; and render the reverberated signal using the reverberator while applying the gain value to the audio signal of the sound source when it is input to the reverberator.
With respect to
In some embodiments the input to the system of apparatus comprises scene and reverberation parameters 300. The scene and reverberation parameters 300 in some embodiments can be obtained from a retrieved 6DoF rendering bitstream such as provided by a suitable bitstream. The scene and reverberation parameters 300 in some embodiments are in the form of enclosing room geometry and acoustic parameters (for example reverberation time RT60, reverberation ratio as DSR or RDR). The scene and reverberation parameters 300 in some embodiments can also comprise: the positions of audio elements (sound sources) in the environment; the positions of the enclosing room geometries (or Acoustic Environments) so that the method can determine in which acoustic environment the listener currently is based on the listener pose parameters 302; the positions and geometries of the portals (i.e. the acoustic couplings or openings in scene geometry) such that sound can pass between acoustic environments; and polynomial coefficients (or more generally model parameters) for calculating gain values for sources in connected acoustic environments (or elsewhere in the audio scene).
Additionally the input to the apparatus comprises an audio signal 306 which can be obtained from the retrieved audio data and which in some embodiments is provided by the suitable obtained bitstream.
The system furthermore is configured to obtain listener pose information 302. The listener pose information is based on the orientation and/or position of the listener or user of the playback apparatus.
As an output, the apparatus provides a reverberated audio signal 314 (e.g. binauralized with head-related-transfer-function (HRTF) filtering for reproduction to headphones, or panned with Vector-Base Amplitude Panning (VBAP) for reproduction to loudspeakers).
In some embodiments the apparatus comprises a reverberator configurator 303. The reverberator configurator 303 in some embodiments is configured to convert the reverberation parameters into reverberator parameters 304 which are parameters for the digital feedback delay network (FDN) reverberator (or more generally the reverberators 305).
The apparatus in some embodiments comprises a reverberator controller 301, which is configured to receive the scene and reverberation parameters 300 and produce direct propagation values and delays 324 for sound sources which are outside acoustic environments but feed their energy to the acoustic environments via portals. The direct propagation values and delays 324 can change over time as portals open or close or sound sources move. In order to produce the direct propagation values and delays 324, the reverberator controller 301 is configured to employ the positions and geometries of portals, positions of sound sources, and polynomial coefficients obtained from the scene and reverberation parameters 300.
In some embodiments the apparatus comprises reverberators 305. The reverberators 305 are configured to receive the direct propagation values and delays 324, audio signal 306 $s_{in}(t)$ (where $t$ is time) and reverberator parameters 304. The reverberators 305 in some embodiments are initialized and employed to reproduce reverberation according to the reverberator parameters 304. In some embodiments each of the reverberators 305 is configured to reproduce the reverberation according to the characteristics (reverberation time and level) of the acoustic environment from which the corresponding reverberator parameters are derived. In some embodiments, the reverberator parameters 304 are produced by an optimization or configuration routine on the reverberator controller 301 based on acoustic environment (reverberation) parameters.
In these embodiments the reverberators 305 are configured to reverberate the audio signal 306 based on the reverberator parameters 304 and direct propagation values and delays 324. The details of the reverberation processing are discussed in further detail below.
The reverberator output audio signals srev,r(j, t) 310 (where j is the output audio channel index and r the reverberator index) are output from the reverberators 305.
In some embodiments there are several reverberators, each of which produces several output audio signals.
In some embodiments the apparatus comprises a reverberator output signals spatializer 307 which is configured to receive the reverberator output audio signals 310 and produce a reverberated audio signal 314 suitable for reproduction via headphones or via loudspeakers. The reverberator output signals spatializer 307 is also configured to receive reverberator output channel positions 312 from a reverberator output signals spatialization controller 309. The reverberator output channel positions 312 in some embodiments are configured to indicate the Cartesian coordinates which are to be used when rendering each of the signals in srev,r(j, t). In alternative embodiments other representations such as polar coordinates can be used.
The reverberator output signals spatializer 307 can be configured to render each reverberator into a desired output format such as binaural and then sum the signals to produce the output reverberated audio signal 314. For binaural reproduction the reverberator output signals spatializer 307 can be configured to use HRTF filtering to render the reverberator output audio signals 310 in their desired positions indicated by reverberator output channel positions 312.
In such a manner this reverberation in the reverberated audio signals 314 is based on the scene and reverberation parameters 300 as was desired and considers listener pose parameters 302.
Thus for example the scene and reverberator parameters are obtained as shown in
Additionally then the acoustic environment information or parameters are obtained as shown in
Furthermore there is obtained a portal connected to the acoustic environment (as indicated by the acoustic environment information or parameters) as shown in
Then there is obtained an audio source position outside this acoustic environment as shown in
Based on these previous operations there is then determined or obtained model parameters for example a set of polynomial coefficients associated with this audio source position as shown by
Then, based on the determined or obtained model parameters, the DPV value for this sound source position and portal is determined or obtained as shown by
In some embodiments there are region of validity data associated with polynomial coefficients. The region of validity data can describe, for example, the corner coordinates of a rectangular region defining a validity region on the x, y plane for polynomial coefficients. There can be several such validity regions if there are several polynomials. If there are no polynomial coefficients for this sound source position (i.e. no validity region covers the current sound source position) then it means that from this position sound does not propagate via the portal. Alternatively, or in addition to, if the polynomial evaluates to zero then it can be determined that sound does not propagate from this position. If there are no validity regions then the polynomial coefficients can be considered to cover the entire scene.
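The validity region logic described above can be sketched as follows. This is a minimal illustration rather than the normative renderer behavior: the names `ValidityRegion` and `lookup_dpv` are hypothetical, the model is passed in as a plain callable standing in for the polynomial, and clamping negative model outputs to zero is an added assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class ValidityRegion:
    """Axis-aligned rectangle on the x, y plane with its own DPV model."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    # Hypothetical stand-in for the polynomial evaluated at (x, y).
    model: Callable[[float, float], float]

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def lookup_dpv(
    regions: Sequence[ValidityRegion],
    x: float,
    y: float,
    default_model: Optional[Callable[[float, float], float]] = None,
) -> float:
    """Return the DPV for a source at (x, y), or 0.0 if sound does not propagate."""
    if not regions:
        # No validity regions: the coefficients cover the entire scene.
        if default_model is None:
            return 0.0
        return max(0.0, default_model(x, y))  # clamping is an assumption
    for region in regions:
        if region.contains(x, y):
            return max(0.0, region.model(x, y))  # clamping is an assumption
    # No region covers this position: no propagation via the portal.
    return 0.0
```

A result of zero covers both cases named in the text: no validity region contains the source position, or the model itself evaluates to zero there.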
In some embodiments, as discussed above the polynomial is in the form
As shown in
The direct propagation values and delays can then be output as shown in
In some embodiments, there can be an additional determination of whether a portal connection is active.
An active portal connection can be determined as a connection where the portal is open; that is, there is no blocking acoustic element such as a door in the portal. The exact method for determining which portal connections are active is not the focus of this disclosure. It can be determined using any suitable approach (e.g., explicit scene information about the state of the portal connection, or by shooting rays to detect occlusion). For nonactive portal connections the DPV value can be set to zero.
The obtaining or determining of an audio source associated with this reverberator but outside this acoustic environment is shown in
Furthermore is shown the obtaining of the audio signals as shown in FIG. by 503.
After the parameters for the FDN have been provided and the input audio signal obtained, the audio signal can be input to the pre-delay bus corresponding to the delay and the direct propagation value can be applied as shown
Following this delay is shown a processing of the input bus and the reverberator as shown in
Depending on determined direct propagation values and delays, the audio signal sin(t) of an audio source at a position x, y is taken as input to the reverberators. If the direct propagation value DPV(p, r, x, y) corresponding to portal p of reverberator r is nonzero, then sin(t) is provided as an input signal to reverberator r. When inputting sin(t) into the reverberator r, sin(t) is multiplied with the obtained gain sqrt(DPV(p, r, x, y)). Providing the sin(t) as an input to a reverberator which has a portal opening and a non-zero direct propagation value has the desired effect that sin(t) gets reverberated by the reverberator r even if the sound source was not located in the corresponding acoustic environment. Moreover, the gain of the source in the reverberator is scaled by the DPV, which depends on the path from the source to the portal opening.
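The gating and scaling of the input signal can be illustrated with a short sketch. The function name `feed_reverberator` and the use of a plain list of samples are assumptions made for illustration; only the sqrt(DPV) gain and the nonzero test come from the description above.

```python
import math
from typing import List, Optional


def feed_reverberator(s_in: List[float], dpv: float) -> Optional[List[float]]:
    """Scale s_in by sqrt(DPV) for input to reverberator r.

    Returns None when the DPV is zero, meaning the source is not
    provided as an input to this reverberator at all.
    """
    if dpv <= 0.0:
        return None
    gain = math.sqrt(dpv)
    return [gain * sample for sample in s_in]
```

For example, a DPV of 0.25 gives an amplitude gain of 0.5, so a source outside the acoustic environment is still reverberated but at a reduced level.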
For example, consider a virtual scene comprising a main hall (having a reverberator r) and an entrance room (having a reverberator k). In this case it is desired that the sound sources of the entrance room also get reverberated in the main hall and vice versa.
Having an extra predelay for external sound sources approximately models the additional time of flight that the sound needs before it arrives from the connected AE at the current AE reverberator. In some embodiments, the largest dimension is used to determine the predelay for audio sources in neighboring acoustic environments which contribute to the current acoustic environment.
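As a rough illustration of the time-of-flight idea, the extra predelay can be derived from a distance and the speed of sound. The mapping below is an assumption: the description states only that the largest dimension is used, so the function name, the 343 m/s speed of sound and the conversion to samples are illustrative choices.

```python
from typing import Sequence

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def predelay_samples(
    dimensions_m: Sequence[float],
    sample_rate_hz: float,
    speed_of_sound: float = SPEED_OF_SOUND_M_S,
) -> int:
    """Approximate the extra predelay, in samples, from the largest dimension."""
    largest = max(dimensions_m)
    time_of_flight_s = largest / speed_of_sound
    return round(time_of_flight_s * sample_rate_hz)
```

With dimensions of 3.43 m, 2.0 m and 6.86 m at 48 kHz, the largest dimension corresponds to about 20 ms of extra predelay.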
In
Thus for example as shown in
There is also shown in
A third input bus path (p3), for further sources outside of the environment with directivity pattern dir4 and predelay p3 and with directivity pattern dir5 and predelay p3, comprises a pair of DPV filters 603 sqrt(DPV(x2, y2)) and 605 sqrt(DPV(x3, y3)), a pair of filters GEQdir4,p3 617 and GEQdir5,p3 619 which receive the outputs of the respective DPV filters, and a combiner 625 which combines the filter outputs before a third delay is applied by delay 635 (with delay z^-p3).
Each of the paths can then be combined by combiner 641 and a ratio filter applied GEQratio 651 before the output is passed to the FDN reverberator 661. In other words the outputs from each path are ratio filtered with the GEQratio filter 651. The FDN reverberator 661 processing is applied to the filtered and summed input signal. The resulting reverberator output signals srev,r(j, t) (where j is the output audio channel index and r the reverberator index) are the output of the reverberators.
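The structure of the input bus paths can be sketched in simplified form. In this sketch the directivity filters (GEQdir) and the ratio filter (GEQratio) are omitted, so it shows only the sqrt(DPV) gains, the per-path predelay and the two summation stages; the function names and the data layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Source = Tuple[Sequence[float], float]   # (signal samples, DPV)
Path = Tuple[int, Sequence[Source]]      # (predelay in samples, sources)


def delay(signal: Sequence[float], n_samples: int, out_len: int) -> List[float]:
    """Shift a signal right by n_samples, truncating to out_len."""
    out = [0.0] * out_len
    for i, value in enumerate(signal):
        j = i + n_samples
        if j < out_len:
            out[j] = value
    return out


def mix_input_buses(paths: Sequence[Path], out_len: int) -> List[float]:
    """Sum all bus paths into the single signal fed to the FDN reverberator."""
    bus = [0.0] * out_len
    for predelay, sources in paths:
        # Per-path combiner: sum the DPV-scaled sources sharing this predelay.
        path_sum = [0.0] * out_len
        for signal, dpv in sources:
            gain = math.sqrt(dpv)
            for i, value in enumerate(signal[:out_len]):
                path_sum[i] += gain * value
        # Apply the path predelay, then add to the overall combiner.
        for i, value in enumerate(delay(path_sum, predelay, out_len)):
            bus[i] += value
    return bus
```

Grouping sources by predelay means each delay is applied once per path rather than once per source, which mirrors the combiner-then-delay ordering in the description.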
Directivity filtering can in some embodiments dynamically take into account changing sound source orientation during rendering. The directivity filtering can take into account the changes caused by integrating over the sector area determining the DPV such as shown in
With respect to
Thus is shown the operations of obtaining scene and reverberator parameters as shown by
Following this is the determination of a listener reverberator corresponding to listener acoustic environment as shown in
Then is the provision of head tracked output positions for the listener reverberator 709.
The determination of portals directly connected to the listener acoustic environment is shown in
For each portal found, obtain its geometry and provide output channel positions for the connected acoustic environment reverberator on the geometry shown in
Then output the determined reverberator output channel positions as shown in
A neighbor acoustic environment can be audible in the current environment via the directional portal output. The reverberator output signals spatialization controller is thus configured to employ the portal position information carried in the scene parameters to provide in the reverberator output channel positions suitable positions for the reverberator outputs which correspond to portals. To obtain a spatially extended perception of the portal sound, the output channels corresponding to reverberators which are to be rendered at a portal are provided positions along portal geometry which divides two acoustic spaces, such as AC1 207 depicted in
Then the reverberator output signals spatializer 307 comprises an output channel combiner 803 which combines the channels and generates the reverberated audio signal 314.
In some embodiments the FDN reverberator 305 comprises an energy ratio control filter GEQratio 953 which is configured to receive the input.
The example FDN reverberator 305 is configured such that the reverberation parameters are processed to generate coefficients GEQd (GEQ1, GEQ2, . . . GEQD) of the attenuation filters 961, feedback matrix 957 coefficients A, lengths md (m1, m2, . . . mD) for D delay lines 959 and energy ratio control filter 953 coefficients GEQratio. The energy ratio control filter 953 can also be referred to as an RDR energy ratio control filter, a reverberation ratio control filter, or a reverberation equalization or coloration filter. The purpose of such a filter is to adjust the level and spectrum according to the RDR or DSR or other reverberation ratio data.
In some embodiments the attenuation filter GEQd 961 is implemented as a graphic EQ filter using M biquad IIR band filters. With octave bands, M=10; the parameters of the graphic EQ thus comprise the feedforward and feedback coefficients for the biquad IIR filters, the gains for the biquad band filters, and the overall gain.
The reverberator uses a network of delays 959, feedback elements (shown as attenuation filters 961, feedback matrix 957 and combiners 955 and output gain 963) to generate a very dense impulse response for the late part. Input samples are input to the reverberator to produce the reverberation audio signal component which can then be output.
The FDN reverberator comprises multiple recirculating delay lines. The unitary matrix A 957 is used to control the recirculation in the network. The attenuation filters 961, which may be implemented in some embodiments as graphic EQ filters built as cascades of second-order-section IIR filters, facilitate controlling the energy decay rate at different frequencies. The filters 961 are designed such that they attenuate the signal by the desired amount in decibels at each pass of the pulse through the delay line, such that the desired RT60 time is obtained.
As noted above, with octave bands (M=10) the parameters of the graphic EQ comprise the feedforward coefficients b and feedback coefficients a for the 10 biquad IIR filters, the gains for the biquad band filters, and the overall gain.
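The per-pass attenuation target follows from the definition of RT60: a 60 dB decay over RT60 seconds, applied in proportion to the time a pulse spends in one delay line. The sketch below computes these target gains per frequency band; the function names are hypothetical, and the design of the graphic EQ that realizes the targets is not shown.

```python
from typing import List, Sequence


def attenuation_db(delay_samples: int, rt60_s: float, sample_rate_hz: float) -> float:
    """Attenuation in dB for one pass through a delay line of the given length."""
    return -60.0 * delay_samples / (sample_rate_hz * rt60_s)


def band_attenuations_db(
    delay_samples: int, rt60_per_band_s: Sequence[float], sample_rate_hz: float
) -> List[float]:
    """Per-band target gains for the attenuation filter of one delay line."""
    return [attenuation_db(delay_samples, rt60, sample_rate_hz) for rt60 in rt60_per_band_s]


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)
```

A delay line of 4800 samples at 48 kHz holds 0.1 s of signal, so for an RT60 of 1 s each pass should attenuate by 6 dB, and by 12 dB in a band whose RT60 is 0.5 s.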
The number of delay lines D can be adjusted depending on quality requirements and the desired tradeoff between reverberation quality and computational complexity. In an embodiment, an efficient implementation with D=15 delay lines is used. This makes it possible to define the feedback matrix coefficients A as proposed by Rocchesso in Maximally Diffusive Yet Efficient Feedback Delay Networks for Artificial Reverberation, IEEE Signal Processing Letters, Vol. 4. No. 9, September 1997, in terms of a Galois sequence facilitating efficient implementation.
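The recirculating structure can be illustrated with a deliberately simplified FDN. This sketch is not the embodiment described above: it uses a Householder feedback matrix instead of the Galois-sequence-based matrix of Rocchesso, a scalar gain per delay line instead of graphic EQ attenuation filters, and a single summed output. The class name `SimpleFDN` is hypothetical.

```python
from typing import List, Sequence


class SimpleFDN:
    """Minimal feedback delay network with D delay lines (illustrative only)."""

    def __init__(self, delays: Sequence[int], gains: Sequence[float]) -> None:
        self.delays = list(delays)
        self.gains = list(gains)  # scalar stand-ins for attenuation filters
        self.buffers: List[List[float]] = [[0.0] * m for m in self.delays]
        self.index = [0] * len(self.delays)

    def process(self, x: float) -> float:
        """Process one input sample and return one output sample."""
        d_count = len(self.delays)
        # Read the oldest sample from each circular delay line.
        outs = [self.buffers[d][self.index[d]] for d in range(d_count)]
        # Attenuate, then mix with a Householder matrix A = I - (2/D) * ones.
        att = [g * o for g, o in zip(self.gains, outs)]
        common = 2.0 * sum(att) / d_count
        feedback = [a - common for a in att]
        # Write input plus feedback back into each line and advance.
        for d in range(d_count):
            self.buffers[d][self.index[d]] = x + feedback[d]
            self.index[d] = (self.index[d] + 1) % self.delays[d]
        return sum(outs)
```

The Householder matrix is orthogonal, so with all gains below one the stored energy shrinks on every pass and the response decays; it was chosen here only because it needs no matrix storage.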
With respect to
The first operation is one of obtaining scene and reverberator parameters as shown in
Then is shown determining delay line lengths based on room dimensions as shown in
Following this is determining delay line attenuation filter parameters based on delay line lengths and RT60 as shown in
This can be followed by determining reverberation ratio filter parameters based on RDR or DSR parameters as shown in
Then is the output of the reverberator parameters as shown in
With respect to
The encoder or server 1901 in some embodiments can be implemented on content creator computers and/or network server computers. The encoder 1901 can generate the bitstream 1921 which is made available for downloading or streaming (or storing). The decoder/renderer 1941 may be implemented as a playback device, which can be a mobile device, personal computer, sound bar, tablet computer, car media system, home HiFi or theatre system, head mounted display for AR or VR, smart watch, or any suitable system for audio consumption.
The encoder 1901 is configured to receive the virtual scene description 1900 and the audio signals 1904. The virtual scene description 1900 can be provided in the MPEG-I Encoder Input Format (EIF) or in another suitable format. Generally, the virtual scene description contains an acoustically relevant description of the contents of the virtual scene, and contains, for example, the scene geometry as a mesh or voxel, acoustic materials, acoustic environments with reverberation parameters, positions of sound sources, and other audio element related parameters such as whether reverberation is to be rendered for an audio element or not.
In some embodiments the encoder 1901 comprises a scene and portal connection parameter obtainer 1915 configured to obtain the virtual scene description and portal parameters.
The encoder 1901 further comprises a DPV value and polynomial coefficient obtainer 1916. The obtainer 1916 can be configured to derive the direct propagation value (DPV) for each AE and each portal opening. For the derivation the encoder uses the obtained portal geometry from portal geometry processing, or from input from the content creator. Portal geometry contains the mesh or other geometric representation describing the portal opening geometry.
The processing is as follows:
For each AE 1201 and for each portal 1203 within the AE 1201
Obtain the portal opening face 1205 having the same orientation as the wall of the AE where the portal is 1207 and which is closest to the center of the AE;
For each possible sound source position 1209:
It is noted that the area ratio is an approximation since the formed face 1235 is rectangular and does not take the spherical shape into account. In some embodiments, the modelling approximation error of calculating the rectangular surface area while neglecting the curvature of the surface is compensated by applying a suitable multiplier. Such a multiplier can be a constant that is applied to the calculated area of the face 1235, which will increase the area as if it was curved rather than rectangular and flat.
The DPV can depend on the position of the source within the AE and with respect to the opening. Polynomial modelling can be used to model a smoothly varying DPV value within a range of x, y positions in the space. Thus, the polynomial is used to model the place-dependent value of the DPV within the AE. Note that equivalently the coordinates can be x, z if the OpenGL coordinate system is used, in which x and z define the horizontal plane and y is the vertical axis.
For example, a second or third order polynomial in two dimensions having the form $f(x,y)=a_0+a_1x+a_2x^2+a_3x^3+b_0+b_1y+b_2y^2+b_3y^3$ can be used. The fit of the polynomial to the calculated DPV data at positions x and y can be implemented, for example, using a least squares fit. The fit can be done for a second order polynomial and a third order polynomial and the one giving a better fit to the data can be selected. In other embodiments higher order polynomials can be used.
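A least squares fit of such a polynomial can be sketched without external libraries by solving the normal equations. Two simplifications are assumptions of this sketch: the constants a0 and b0 are merged into a single constant term (they cannot be distinguished by a fit), and the system is solved with plain Gaussian elimination, which is adequate only for small, well-conditioned problems. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def basis(x: float, y: float, order: int) -> List[float]:
    """Terms [1, x, ..., x^order, y, ..., y^order]; a0 and b0 share the 1."""
    return ([1.0] + [x ** k for k in range(1, order + 1)]
            + [y ** k for k in range(1, order + 1)])


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_dpv_polynomial(samples: Sequence[Tuple[float, float, float]], order: int) -> List[float]:
    """Fit coefficients to (x, y, dpv) samples via the normal equations."""
    rows = [basis(x, y, order) for x, y, _ in samples]
    targets = [dpv for _, _, dpv in samples]
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    return solve(ata, atb)


def evaluate(coeffs: Sequence[float], x: float, y: float, order: int) -> float:
    """Evaluate the fitted polynomial at (x, y)."""
    return sum(c * b for c, b in zip(coeffs, basis(x, y, order)))


def squared_error(coeffs, samples, order: int) -> float:
    """Sum of squared residuals, usable for comparing candidate fits."""
    return sum((evaluate(coeffs, x, y, order) - d) ** 2 for x, y, d in samples)
```

The residual returned by `squared_error` is one possible measure for comparing a second order and a third order fit on the same DPV data.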
Polynomial coefficients are carried in the bitstream. The polynomial coefficients are associated with a region of validity.
In some embodiments the selection of polynomial modeling regions is done by analyzing the error of the polynomial modeling. That is, the error of DPV values in positions DPV(x, y) calculated using the method of
The bitstream syntax and semantics that can be used to transmit information from an encoder device on the example embodiment where polynomial coefficients are used to represent the DPV data are presented as follows:
The revNumUniquePortals indicates the number of portals in the audio scene. Each unique portal typically has two acoustic environments associated with the portal opening. Depending on the audio source (object, channel, HOA signal type) position, the correct unique portal is selected. Subsequently, the polynomial corresponding to the audio source position is selected to evaluate the contribution of the audio source to the diffuse late reverberation rendering in the acoustic environment for which its contribution is calculated.
In some embodiments there can be a number of elevation levels defined for each of the polynomials. In this case the bitstream syntax above will have a variable referred to as revNumAreaElevations which will indicate the number of elevation levels used. Each elevation level will have its own polynomial coefficients, and the renderer will then select the coefficients having the elevation level closest to the current sound source elevation. Each elevation level can have an explicit height specified or, in other cases, the levels divide the height of the audio scene into a number of equal parts.
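The renderer-side selection of an elevation level can be sketched as a nearest-neighbour lookup. The data layout, a list of (height, coefficients) pairs, and the function name are assumptions for illustration and do not reflect the actual bitstream structures.

```python
from typing import List, Sequence, Tuple

ElevationLevel = Tuple[float, Sequence[float]]  # (height, polynomial coefficients)


def select_elevation_level(
    levels: List[ElevationLevel], source_elevation: float
) -> ElevationLevel:
    """Return the level whose height is closest to the source elevation."""
    return min(levels, key=lambda level: abs(level[0] - source_elevation))
```

When two levels are equally close, `min` returns the first one in the list, so the ordering of levels acts as the tie-break in this sketch.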
In some embodiments the polynomial order (e.g., whether it is a second or third order polynomial) can be explicitly carried in the bitstream, e.g., as a variable polynomialAreaEquationOrder.
It is noted that if the model has a different form than a polynomial then the parameters will also be different. The model could be an alternative way of creating or modelling a surface which represents the DPV data over a certain region. Examples include the weights, means, and covariances of a Gaussian mixture model or the weights of a neural network. In some embodiments the model can be a simple linear model in one or more dimensions. Such a simple linear model in one dimension can have just one parameter.
The following mnemonics are defined to describe the different data types used in the coded bitstream payload.
In some embodiments a complementary or alternative syntax can be used to carry the explicit DPV values for sound source positions. In the below syntax, there are revNumObjectSources object sources, each having a bitstream identifier objSrcBsId, which get a DPV value represented as directPropagationValue with respect to portal openings identified with portalIdx.
In some embodiments the above syntax can be used for a subset of the most important sound sources of the scene. Such important sound sources can be e.g. the static sound sources in the scene (i.e., sources which do not move) or sources which are otherwise determined or marked to be important. In some embodiments explicit DPV value data can be carried for important regions of the scene or regions in the scene where the modelled values do not result in accurate enough modelling of the calculated DPV data.
Furthermore the encoder 1901 can comprise a scene and portal connection payload encoder 1917 which is configured to encode the scene and portal connection payload and the DPV values and/or polynomial coefficients.
Furthermore the encoder 1901 can comprise in some embodiments a reverberation parameter obtainer 1911 which is configured to obtain the virtual scene description 1900 and generate or obtain suitable reverberation parameters.
Furthermore, in some embodiments, the encoder 1901 comprises a reverberation payload encoder 1913 configured to obtain the determined or obtained reverberation parameters and generate a suitable encoded payload.
The encoder 1901 further comprises an MPEG-H 3D audio encoder 1914 configured to obtain the audio signals 1904 and MPEG-H encode them and pass them to a bitstream encoder 1915.
The encoder 1901 furthermore in some embodiments comprises a bitstream encoder 1921 which is configured to receive the output of the reverberation payload encoder 1913 and the encoded audio signals from the MPEG-H encoder 1914 and the scene and portal connection payload encoder 1917 and generate the bitstream 1921 which can be passed to the bitstream decoder 1951. The bitstream 1921 in some embodiments can be streamed to end-user devices or made available for download or stored.
The decoder/renderer 1941 in some embodiments is configured to receive or otherwise obtain the bitstream 1921, and furthermore can be configured to receive or otherwise obtain the listening space description from a listening space description generator 1971 (which can in some embodiments be in a listening space description format, LSDF), which defines the acoustic properties of the listening space within which the user or listener is operating. Additionally in some embodiments the playback device is configured to obtain, for example from the head mounted device (HMD), listener orientation or position information. These can for example be generated by sensors within the HMD or from sensors in the environment sensing the orientation or position of the listener.
In some embodiments the decoder/renderer 1941 comprises a bitstream decoder 1951 which is configured to regenerate the scene, portal and reverberation information and pass it to a scene, portal and reverberation payload decoder 1953, and obtain MPEG-H 3D audio packets which are passed to the MPEG-H 3D audio decoder 1954, and audio element parameters such as sound sources positions for direct sound processing.
The decoder/renderer 1941 further can comprise a scene, portal and reverberation payload decoder 1953 configured to obtain the encoded scene, portal and reverberation parameters and decode these in an opposite or inverse operation to the reverberation payload encoder 1913 and scene and portal connection payload encoder 1917.
In some embodiments the decoder/renderer 1941 comprises a head pose generator 1957 which is configured to receive information from a head mounted device or similar and generate head pose information or parameters which can be passed to the reverberator output signals spatializer 1962 and the HRTF processor 1963.
The decoder/renderer 1941, in some embodiments, comprises a reverberator controller 1955 and configurator 1956 which are configured to obtain the determined scene, portal and reverberation parameters and generate the parameters which can be passed to the (FDN) reverberators 1961 in a manner as described earlier.
The decoder/renderer 1941 in some embodiments comprises an MPEG-H 3D audio decoder 1954 which is configured to decode the audio signals and pass them to the (FDN) reverberator 1961 and direct sound processor 1965.
The decoder/renderer 1941 furthermore comprises the (FDN) reverberator 1961 initialized by the reverberator controller 1955 and reverberator configurator 1956 and configured to implement a suitable reverberation of the audio signals.
The (FDN) reverberator 1961 is configured to output to a reverberator output signals spatializer 1962.
Additionally the decoder/renderer 1941 comprises a direct sound processor 1965 which is configured to receive the decoded audio signals and configured to implement any direct sound processing such as air absorption and distance-gain attenuation and which can be passed to a HRTF processor 1963.
The HRTF processor 1963 can be configured to receive the output of the direct sound processor 1965 and generate processed audio signals associated with the processed direct audio components to the binaural signal combiner 1967.
The binaural signal combiner 1967 is configured to combine the direct and reverberant parts to generate a suitable output (for example for headphone reproduction).
The output can be passed to the head mounted device.
The playback device can be implemented in different form factors depending on the application. In some embodiments the playback device is equipped with its own listener position tracking apparatus or receives the listener position information from an external apparatus. The playback device can in some embodiments be also equipped with headphone connector to deliver output of the rendered binaural audio to the headphones.
With respect to
In some embodiments the device 2000 comprises at least one processor or central processing unit 2007. The processor 2007 can be configured to execute various program codes such as those implementing the methods described herein.
In some embodiments the device 2000 comprises a memory 2011. In some embodiments the at least one processor 2007 is coupled to the memory 2011. The memory 2011 can be any suitable storage means. In some embodiments the memory 2011 comprises a program code section for storing program codes implementable upon the processor 2007. Furthermore in some embodiments the memory 2011 can further comprise a stored data section for storing data, for example data that has been processed or to be processed in accordance with the embodiments as described herein. The implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 2007 whenever needed via the memory-processor coupling.
In some embodiments the device 2000 comprises a user interface 2005. The user interface 2005 can be coupled in some embodiments to the processor 2007. In some embodiments the processor 2007 can control the operation of the user interface 2005 and receive inputs from the user interface 2005. In some embodiments the user interface 2005 can enable a user to input commands to the device 2000, for example via a keypad. In some embodiments the user interface 2005 can enable the user to obtain information from the device 2000. For example the user interface 2005 may comprise a display configured to display information from the device 2000 to the user. The user interface 2005 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 2000 and further displaying information to the user of the device 2000. In some embodiments the user interface 2005 may be the user interface for communicating.
In some embodiments the device 2000 comprises an input/output port 2009. The input/output port 2009 in some embodiments comprises a transceiver. The transceiver in such embodiments can be coupled to the processor 2007 and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network. The transceiver or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
The transceiver can communicate with further apparatus by any suitable known communications protocol. For example in some embodiments the transceiver can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or infrared data communication pathway (IRDA).
The input/output port 2009 may be configured to receive the signals.
In some embodiments the device 2000 may be employed as at least part of the renderer. The input/output port 2009 may be coupled to headphones (which may be a headtracked or a non-tracked headphones) or similar.
Thus in summary the embodiments as described above show:
A normative bitstream comprising:
Additionally in some embodiments the normative bitstream comprises trigger and predelay modification parameters described using the syntax described herein. The bitstream in some embodiments is streamed to end-user devices or made available for download or stored.
In some embodiments the normative renderer is configured to decode the bitstream to obtain the scene, reverberation parameters and dynamic reverb adjustment parameters and perform the modification to reverberator parameters as described herein. Moreover in some embodiments the renderer is configured to implement reverberation and early reflections rendering.
In some embodiments the complete normative renderer can also obtain other parameters from the bitstream related to room acoustics and sound source properties, and use them to render the direct sound, diffraction, sound source spatial extent or width, and other acoustic effects in addition to diffuse late reverberation and early reflections.
Thus in summary the concept is one in which there is the capacity for dynamic modification of the rendering of reverberation based on the various triggers specified in the bitstream, to enable bitrate and computational scalability based on suboptimal early reflections or other missing acoustic effects.
In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.
The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general-purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or “fab” for fabrication.
As used in this application, the term “circuitry” may refer to one or more or all of the following:
This definition of circuitry applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or portion of a hardware circuit or processor and its (or their) accompanying software and/or firmware. The term circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device or a similar integrated circuit in server, a cellular network device, or other computing or network device.
The term “non-transitory,” as used herein, is a limitation of the medium itself (i.e., tangible, not a signal) as opposed to a limitation on data storage persistency (e.g., RAM vs. ROM).
As used herein, “at least one of the following: <a list of two or more elements>” and “at least one of <a list of two or more elements>” and similar wording, where the list of two or more elements are joined by “and” or “or”, mean at least any one of the elements, or at least any two or more of the elements, or at least all the elements.
The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.
Number | Date | Country
---|---|---
63496441 | Apr 2023 | US