The present document relates to an efficient and consistent handling of the directivity of audio sources in a virtual reality (VR) rendering environment.
Virtual reality (VR), augmented reality (AR) and/or mixed reality (MR) applications are rapidly evolving to include increasingly refined acoustical models of sound sources and scenes that can be enjoyed from different viewpoints and/or perspectives or listening positions. Two different classes of flexible audio representations may e.g. be employed for VR applications: sound-field representations and object-based representations. Sound-field representations are physically-based approaches that encode the incident wavefront at the listening position. For example, approaches such as B-format or Higher-Order Ambisonics (HOA) represent the spatial wavefront using a spherical harmonics decomposition. Object-based approaches represent a complex auditory scene as a collection of singular elements comprising an audio waveform or audio signal and associated parameters or metadata, possibly time-varying.
Enjoying the VR, AR and/or MR applications may include experiencing different auditory viewpoints or perspectives by the user. For example, room-based virtual reality may be provided based on a mechanism using 6 degrees of freedom (DoF). A 6 DoF interaction may comprise translational movement (forward/back, up/down and left/right) and rotational movement (pitch, yaw and roll). Unlike a 3 DoF spherical video experience that is limited to head rotations, content created for 6 DoF interaction also allows for navigation within a virtual environment (e.g., physically walking inside a room), in addition to the head rotations. This can be accomplished based on positional trackers (e.g., camera based) and orientational trackers (e.g. gyroscopes and/or accelerometers). 6 DoF tracking technology may be available on desktop VR systems (e.g., PlayStation®VR, Oculus Rift, HTC Vive) as well as on mobile VR platforms (e.g., Google Tango). A user's experience of the directionality and spatial extent of sound or audio sources is critical to the realism of 6 DoF experiences, particularly to the experience of navigating through a scene and around virtual audio sources.
Available audio rendering systems (such as the MPEG-H 3D audio renderer) are typically limited to the rendering of 3 DoFs (i.e. rotational movement of an audio scene caused by a head movement of a listener) or 3 DoF+, which also adds small translational changes of the listening position of a listener, but without taking effects such as directivity or occlusion into consideration. Larger translational changes of the listening position of a listener and the associated DoFs typically cannot be handled by such renderers.
The present document is directed at the technical problem of providing resource efficient methods and systems for handling translational movement in the context of audio rendering. In particular, the present document addresses the technical problem of handling the directivity of audio sources within 6DoF audio rendering in a resource efficient and consistent manner.
According to an aspect, a method for rendering an audio signal of an audio source in a virtual reality rendering environment is described. The method comprises determining whether or not the directivity pattern of the audio source is to be taken into account for the (current) listening situation of a listener within the virtual reality rendering environment. Furthermore, the method comprises rendering an audio signal of the audio source without taking into account the directivity pattern of the audio source, if it is determined that the directivity pattern of the audio source is not to be taken into account for the listening situation of the listener. In addition, the method comprises rendering the audio signal of the audio source in dependence of the directivity pattern of the audio source, if it is determined that the directivity pattern is to be taken into account for the listening situation of the listener.
According to a further aspect, a method for rendering an audio signal of a first audio source to a listener within a virtual reality rendering environment is described. It should be noted that the term virtual reality rendering environment should also include augmented and/or mixed reality rendering environments. The method comprises determining a control value for the listening situation of the listener within the virtual reality rendering environment based on a directivity control function. Furthermore, the method comprises adjusting the directivity pattern, notably a directivity gain of the directivity pattern, of the first audio source in dependence of the control value. In addition, the method comprises rendering the audio signal of the first audio source in dependence of the adjusted directivity pattern, notably in dependence of the adjusted directivity gain, of the first audio source to the listener within the virtual reality rendering environment.
According to a further aspect, a virtual reality audio renderer for rendering an audio signal of an audio source in a virtual reality rendering environment is described. The audio renderer is configured to determine whether or not the directivity pattern of the audio source is to be taken into account for the listening situation of a listener within the virtual reality rendering environment. In addition, the audio renderer is configured to render an audio signal of the audio source without taking into account the directivity pattern of the audio source, if it is determined that the directivity pattern of the audio source is not to be taken into account for the listening situation of the listener. The audio renderer is further configured to render the audio signal of the audio source in dependence of the directivity pattern of the audio source, if it is determined that the directivity pattern is to be taken into account for the listening situation of the listener.
According to another aspect, a virtual reality audio renderer for rendering an audio signal of a first audio source to a listener within a virtual reality rendering environment is described. The audio renderer is configured to determine a control value for a listening situation of the listener within the virtual reality rendering environment based on a directivity control function (provided e.g., within a bitstream). Furthermore, the audio renderer is configured to adjust the directivity pattern (provided e.g., within the bitstream) of the first audio source in dependence of the control value. The audio renderer is further configured to render the audio signal of the first audio source in dependence of the adjusted directivity pattern of the first audio source to the listener within the virtual reality rendering environment.
According to a further aspect, a method for generating a bitstream is described. The method comprises determining an audio signal of at least one audio source, and determining a source position of the at least one audio source within a virtual reality rendering environment. In addition, the method comprises determining a (non-uniform) directivity pattern of the at least one audio source, and determining a directivity control function for controlling use of the directivity pattern for rendering the audio signal of the at least one audio source in dependence of the listening situation of a listener within the virtual reality rendering environment. The method further comprises inserting data regarding the audio signal, the source position, the directivity pattern and the directivity control function into the bitstream.
According to a further aspect, an audio encoder configured to generate a bitstream is described. The bitstream may be indicative of an audio signal of at least one audio source and/or of a source position of the at least one audio source within a virtual reality rendering environment. Furthermore, the bitstream may be indicative of a directivity pattern of the at least one audio source, and/or of a directivity control function for controlling use of the directivity pattern for rendering the audio signal of the at least one audio source in dependence of a listening situation of a listener within the virtual reality rendering environment.
According to another aspect, a bitstream and/or a syntax for a bitstream is described. The bitstream may be indicative of an audio signal of at least one audio source and/or of a source position of the at least one audio source within a virtual reality rendering environment. Furthermore, the bitstream may be indicative of a directivity pattern of the at least one audio source, and/or of a directivity control function for controlling use of the directivity pattern for rendering the audio signal of the at least one audio source in dependence of a listening situation of a listener within the virtual reality rendering environment. The bitstream may comprise one or more data elements which comprise data regarding the above mentioned information.
According to a further aspect, a software program is described. The software program may be adapted for execution on a processor and for performing the method steps outlined in the present document when carried out on the processor.
According to another aspect, a computer-readable storage medium is described. The computer-readable storage medium may comprise (instructions of) a software program adapted for execution on a processor (or a computer) and for performing the method steps outlined in the present document when carried out on the processor.
According to a further aspect, a computer program product is described. The computer program product may comprise executable instructions for performing the method steps outlined in the present document when executed on a computer.
It should be noted that the methods and systems including their preferred embodiments as outlined in the present patent application may be used stand-alone or in combination with the other methods and systems disclosed in this document. Furthermore, all aspects of the methods and systems outlined in the present patent application may be arbitrarily combined. In particular, the features of the claims may be combined with one another in an arbitrary manner.
The invention is explained below in an exemplary manner with reference to the accompanying drawings.
As outlined above, the present document relates to the efficient and consistent provision of 6DoF in a 3D (three dimensional) audio environment.
The different audio sources 113 of an audio environment 110 may be captured using audio sensors 120, notably using microphone arrays. The one or more audio scenes 111, 112 of an audio environment 110 may be described using multi-channel audio signals, one or more audio objects and/or higher order ambisonic (HOA) and/or first order ambisonic (FOA) signals. In the following, it is assumed that an audio source 113 is associated with audio data that is captured by one or more audio sensors 120, wherein the audio data indicates an audio signal (which is emitted by the audio source 113) and the position of the audio source 113, as a function of time (e.g. at an update interval of 20 ms).
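As an illustration only, such per-source audio data could be represented by a structure along the following lines (a minimal Python sketch; the type and field names are hypothetical and not part of any standard or bitstream syntax):

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class AudioSourceData:
        """Audio data associated with one audio source 113 (illustrative)."""
        signal: np.ndarray       # audio signal emitted by the source (samples)
        positions: np.ndarray    # source position (x, y, z) per metadata frame
        update_interval_ms: float = 20.0  # update interval of the position metadata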
A 3D audio renderer, such as the MPEG-H 3D audio renderer, typically assumes that a listener 181 is positioned at a particular (fixed) listening position 182 within an audio scene 111, 112. The audio data for the different audio sources 113 of an audio scene 111, 112 is typically provided under the assumption that the listener 181 is positioned at this particular listening position 182. An audio encoder 130 may comprise a 3D audio encoder 131 which is configured to encode the audio data of the one or more audio sources 113 of the one or more audio scenes 111, 112 of an audio environment 110.
Furthermore, VR (virtual reality) metadata may be provided, which enables a listener 181 to change the listening position 182 within an audio scene 111, 112 and/or to move between different audio scenes 111, 112. The encoder 130 may comprise a metadata encoder 132 which is configured to encode the VR metadata. The encoded VR metadata and the encoded audio data of the audio sources 113 may be combined in combination unit 133 to provide a bitstream 140 which is indicative of the audio data and the VR metadata. The VR metadata may e.g. comprise environmental data describing the acoustic properties of an audio environment 110.
The bitstream 140 may be decoded using a decoder 150 to provide the (decoded) audio data and the (decoded) VR metadata. An audio renderer 160 for rendering audio within a rendering environment 180 which allows 6DoFs may comprise a pre-processing unit 161 and a (conventional) 3D audio renderer 162 (such as a MPEG-H 3D audio renderer). The pre-processing unit 161 may be configured to determine the listening position 182 of a listener 181 within the listening environment 180. The listening position 182 may indicate the audio scene 111 within which the listener 181 is positioned. Furthermore, the listening position 182 may indicate the exact position within an audio scene 111. The pre-processing unit 161 may further be configured to determine a 3D audio signal for the current listening position 182 based on the (decoded) audio data and possibly based on the (decoded) VR metadata. The 3D audio signal may then be rendered using the 3D audio renderer 162. The 3D audio signal may comprise the audio signals of one or more audio sources 113 of the audio scene 111.
It should be noted that the concepts and schemes, which are described in the present document, may be specified in a frequency-variant manner, may be defined either globally or in an object/media-dependent manner, may be applied directly in a spectral or a time domain and/or may be hardcoded into the VR renderer 160 or may be specified via a corresponding input interface.
The intensity F of an audio source 211, 212, 213 on the destination sphere 114 typically differs from the intensity on the origin sphere 114. The intensity F may be modified using an intensity gain function or distance function 315 (also referred to herein as an attenuation function), which provides a distance gain 310 (also referred to herein as an attenuation gain) as a function of the distance 320 of an audio source 211, 212, 213 from the listening position 182, 201, 202. The distance function 315 typically exhibits a cut-off distance 321 above which a distance gain 310 of zero is applied. The origin distance 221 of an audio source 211 to the origin listening position 201 provides an origin gain 311. Furthermore, the destination distance 222 of the audio source 211 to the destination listening position 202 provides a destination gain 312. The intensity F of the audio source 211 may be rescaled using the origin gain 311 and the destination gain 312, thereby providing the intensity F of the audio source 211 on the destination sphere 114. In particular, the intensity F of the origin audio signal of the audio source 211 on the origin sphere 114 may be divided by the origin gain 311 and multiplied by the destination gain 312 to provide the intensity F of the destination audio signal of the audio source 211 on the destination sphere 114.
Hence, the position of an audio source 211 subsequent to a local transition 192 may be determined as: Ci = source_remap_function(Bi, C) (e.g. using a geometric transformation). Furthermore, the intensity of an audio source 211 subsequent to a local transition 192 may be determined as: F(Ci) = F(Bi) * distance_function(Bi, Ci, C). The distance attenuation may therefore be modelled by the corresponding distance gains 310 provided by the distance function 315.
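A minimal Python sketch of this rescaling, assuming an illustrative 1/distance shape and cut-off value for the distance function 315 (the actual function may differ and may be signalled to the renderer):

    CUTOFF_DISTANCE = 10.0  # cut-off distance 321 (illustrative value)

    def distance_gain(distance: float) -> float:
        """Distance function 315: illustrative 1/distance attenuation with a
        cut-off distance above which a distance gain of zero is applied."""
        if distance >= CUTOFF_DISTANCE:
            return 0.0
        return 1.0 / max(distance, 1e-6)

    def rescale_intensity(f_origin: float, origin_distance: float,
                          destination_distance: float) -> float:
        """Divide the intensity F on the origin sphere by the origin gain 311 and
        multiply by the destination gain 312 (both distances are assumed to lie
        below the cut-off distance, so that the origin gain is non-zero)."""
        return f_origin / distance_gain(origin_distance) * distance_gain(destination_distance)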
The directivity profile 232 of an audio source 212 may be taken into account in the context of a local transition 192 by determining the origin directivity angle 421 of the origin ray between the audio source 212 and the origin listening position 201 (with the audio source 212 being placed on the origin sphere 114 around the origin listening position 201) and the destination directivity angle 422 of the destination ray between the audio source 212 and the destination listening position 202 (with the audio source 212 being placed on the destination sphere 114 around the destination listening position 202). Using the directivity gain function 415 of the audio source 212, the origin directivity gain 411 and the destination directivity gain 412 may be determined as the function values of the directivity gain function 415 for the origin directivity angle 421 and the destination directivity angle 422, respectively.
Hence, sound source directivity may be parametrized by a directivity factor or gain 410 indicated by a directivity gain function 415. The directivity gain function 415 may indicate the intensity of the audio source 212 at a defined distance as a function of the angle 420 relative to the listening position 182, 201, 202. The directivity gains 410 may be defined as ratios with respect to the gains of an audio source 212 at the same distance that radiates the same total power uniformly in all directions. The directivity profile 232 may be parametrized by a set of gains 410 that correspond to vectors which originate at the center of the audio source 212 and which end at points distributed on a unit sphere around the center of the audio source 212.
The resulting audio intensity of an audio source 212 at a destination listening position 202 may be estimated as: F(Ci) = F(Bi) * Distance_function() * Directivity_gain_function(Ci, C, Directivity_parametrization), wherein the Directivity_gain_function is dependent on the directivity profile 232 of the audio source 212. The Distance_function() takes into account the modified intensity caused by the change in the distances 221, 222 of the audio source 212 due to the transition of the listening position 201, 202.
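A hedged Python sketch of this estimation, assuming the directivity parametrization is available as sampled (angle, gain) pairs and the distance gain has been computed separately (function and parameter names are illustrative):

    import numpy as np

    def directivity_gain_function(pattern_angles, pattern_gains, angle_deg):
        """Interpolate the sampled directivity parametrization at the given
        directivity angle (degrees); pattern_angles must be sorted ascending."""
        return float(np.interp(angle_deg % 360.0, pattern_angles, pattern_gains))

    def destination_intensity(f_origin, dist_gain, pattern_angles, pattern_gains, angle_deg):
        """F(Ci) = F(Bi) * Distance_function() * Directivity_gain_function(...)."""
        return f_origin * dist_gain * directivity_gain_function(
            pattern_angles, pattern_gains, angle_deg)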
The directivity pattern data may represent:
The directivity pattern data is typically measured (and only valid) for a specific distance range from the audio source 211.
If directivity gains from a directivity pattern 232 are directly applied to the rendered audio signal, one or more issues may occur. The consideration of directivity may often not be perceptually relevant for a listening position 182, 201, 202 which is relatively far away from the audio object 211. Reasons for this may be:
A further issue may be that the application of directivity results in a sound intensity discontinuity at the origin 500 (i.e. at the source position) of the audio source 211.
For a given audio source 211 a set of N acoustic source directivity patterns Pi = P(Di), with i ∈ {1, ..., N}, may be available for the different distances Di from the center 500 of the sound emitting object 211 (with N ≥ 1, e.g. N ≥ 2 or N ≥ 3). For all distances D in between these directivity patterns, i.e. min(Di) ≤ D ≤ max(Di), a spatial interpolation scheme may be applied to determine the directivity pattern for the particular distance D. By way of example, linear interpolation may be used.
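A minimal sketch of such a linear spatial interpolation, assuming each pattern is stored as an array of K sampled directivity gains over a common set of directions (the storage layout is an assumption):

    import numpy as np

    def interpolate_pattern(distances, patterns, d):
        """Linearly interpolate between the directivity patterns P(Di) for a
        distance D with min(Di) <= D <= max(Di).
        distances: sorted array of the N reference distances Di;
        patterns: array of shape (N, K) with K sampled gains per pattern."""
        distances = np.asarray(distances, dtype=float)
        patterns = np.asarray(patterns, dtype=float)
        i = np.searchsorted(distances, d, side="right") - 1
        i = int(np.clip(i, 0, len(distances) - 2))
        w = (d - distances[i]) / (distances[i + 1] - distances[i])
        return (1.0 - w) * patterns[i] + w * patterns[i + 1]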
In the present document a scheme is described for determining extrapolated (and possibly optimized) directivity gain values P=P(D) for relatively small distances (D<min(Di)) from the origin 500 of the audio source 211 (i.e. for distances 320 between the audio source center 500 and the smallest distance for which a directivity pattern is available).
Furthermore, a scheme is described for determining extrapolated (and possibly optimized) directivity gain values P=P(D) for relatively large distances (D>max(Di)) (i.e. beyond the largest distance for which a directivity pattern exists).
The scheme described herein is configured to prevent a sound level discontinuity at the origin 500, i.e. for D = 0, and/or to avoid directivity gain calculations resulting in perceptually irrelevant changes of a sound field for relatively large distances, i.e. D → ∞. It may be assumed that the effect of applying directivity is negligible for D → 0 and/or for D > D*, where D* is a defined distance threshold.
A directivity control value (directivity_control_gain) may be calculated for a particular listening situation. The directivity control value may be indicative of the relevance of the corresponding directivity data for the particular listening situation of the listener 181. The directivity control value may be determined based on the value of the user-to-object distance D (distance) 320 and based on a given reference distance Di (reference_distance) for the directivity (which may e.g. be min(Di) or max(Di)). The reference distance Di may be a distance Di for which a directivity pattern 232 is available. The directivity control value (also referred to herein as the directivity_control_gain) may be determined using a directivity control function.
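By way of illustration, one possible shape for such a directivity control function is sketched below (the shape is an assumption; the actual function may be derived from measurements or signalled within the bitstream): the control value is 1 at the reference distance and decays towards 0 as the user-to-object distance deviates from it.

    def directivity_control_gain(distance: float, reference_distance: float) -> float:
        """Illustrative directivity control function 600: returns 1.0 at the
        reference distance Di and decays towards 0.0 for much smaller or much
        larger user-to-object distances 320."""
        if distance <= 0.0 or reference_distance <= 0.0:
            return 0.0
        ratio = distance / reference_distance
        return min(ratio, 1.0 / ratio)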
The value of the directivity control value (directivity_control_gain) may be compared to the predefined directivity control value threshold D* (directivity_control_threshold). If the directivity control value is greater than the threshold D*, then a (distance independent) directivity gain value (directivity_gain_tmp) may be determined based on the directivity pattern 232, and the directivity gain value may be modified according to the directivity control value (directivity_control_gain). The modification of the directivity gain value may be done as indicated by the following pseudocode:
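(A Python-style sketch of this logic; the exact pseudocode may differ. The blending rule shown here matches the weighted-sum adjustment towards the uniform gain described further below.)

    if directivity_control_gain > directivity_control_threshold:
        # distance-independent gain derived from the directivity pattern 232
        directivity_gain_tmp = directivity_gain_function(directivity_angle)
        # modify according to the control value: blend towards the uniform gain 1.0
        directivity_gain = 1.0 + directivity_control_gain * (directivity_gain_tmp - 1.0)
    else:
        # directivity data application is omitted for this listening situation
        directivity_gain = 1.0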
As indicated in the above-mentioned pseudocode, the directivity data application may be omitted if the directivity control value is smaller than the threshold D*.
The resulting (distance dependent) directivity gain (directivity_gain) may be applied to the corresponding distance attenuation gain as follows:
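For instance, as a product of linear gains (a hedged sketch continuing the pseudocode above; the actual combination rule may differ):

    # total rendering gain: distance attenuation gain combined with the
    # distance-dependent directivity gain (both as linear factors)
    total_gain = distance_attenuation_gain * directivity_gain
    rendered_audio = audio_signal * total_gain  # element-wise scaling of the samples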
Different types of directivity control functions 600 and/or distance attenuation functions 650 may be considered, defined and applied for the directivity application control. Their shapes and values may be derived from physical considerations, measurement data and/or content creator's intent. The directivity control function 600 may be dependent on the specific listening situation for the listener 181. The listening situation may be described by one or more of the following parameters, i.e. the directivity control function 600 may be dependent on one or more of the following parameters:
The schemes which are described in the present document allow the quality of 3D audio rendering to be improved, notably by avoiding a discontinuity in audio volume close to the origin 500 of a sound source 211. Furthermore, the complexity of directivity application may be reduced, notably by avoiding the application of object directivity where it is perceptually irrelevant. In addition, the possibility for controlling the directivity application via a configuration of the encoder 130 is provided.
In the present document, a generic approach for increasing 6DoF audio rendering quality, saving computational complexity and establishing directivity application control via a bitstream 140 (without modifying the directivity data itself) is described. Furthermore, a decoder interface is described, for enabling the decoder 150, 160 to perform directivity related processing as outlined in the present document. Furthermore, a bitstream syntax is described, for enabling the bitstream 140 to transport directivity control data. The directivity control data, notably the directivity control function 600, may be provided in a parametrized and sampled way and/or as a pre-defined function.
The method 700 may comprise determining 701 whether or not a directivity pattern 232 of the audio source 211, 212, 213 is to be taken into account for a listening situation of a listener 181 within the virtual reality rendering environment 180. The listening situation may describe the context in which the listener 181 perceives the audio signal of the audio source 211, 212, 213. The context may depend on the distance between the audio source 211, 212, 213 and the listener 181. Alternatively, or in addition, the context may depend on whether the listener 181 faces the audio source 211, 212, 213 or whether the listener 181 turns his back on the audio source 211, 212, 213. Alternatively, or in addition, the context may depend on the listening position 182, 201, 202 of the listener 181 within the virtual reality rendering environment 180, notably within the audio scene 111 that is to be rendered.
In particular, the listening situation may be described by one or more parameters, wherein different listening situations may differ in at least one of the one or more parameters. Example parameters are:
The method 700 may comprise determining 701 whether or not the directivity pattern 232 of the audio source 211, 212, 213 is to be taken into account based on the one or more parameters describing the listening situation. For this purpose, a (pre-determined) directivity control function 600 may be used, wherein the directivity control function 600 may be configured to indicate for different listening situations (notably for different combinations of the one or more parameters) whether or not the directivity pattern 232 of the audio source 211, 212, 213 is to be taken into account. In particular, the directivity control function 600 may be configured to identify listening situations, for which the directivity of the audio source 211, 212, 213 is not perceptually relevant and/or for which the directivity of the audio source 211, 212, 213 would lead to a perceptual artifact.
Furthermore, the method 700 comprises rendering 702 an audio signal of the audio source 211, 212, 213 without taking into account the directivity pattern 232 of the audio source 211, 212, 213, if it is determined that the directivity pattern 232 of the audio source 211, 212, 213 is not to be taken into account for the listening situation of the listener 181. As a result of this, the directivity pattern 232 may be ignored by the renderer 160. In particular, the renderer 160 may omit calculating the directivity gain 410 based on the directivity pattern 232. Furthermore, the renderer 160 may omit applying the directivity gain 410 to the audio signal for rendering the audio signal. As a result of this, a resource efficient rendering of the audio signal may be achieved, without impacting the perceptual quality.
On the other hand, the method 700 comprises rendering 703 the audio signal of the audio source 211, 212, 213 in dependence of the directivity pattern 232 of the audio source 211, 212, 213, if it is determined that the directivity pattern 232 is to be taken into account for the listening situation of the listener 181. In this case, the renderer 160 may determine the directivity gain 410 which is to be applied to the audio signal (based on the directivity angle 420 between the source position 500 of the audio source 211, 212, 213 and the listening position 182, 201, 202 of the listener 181). The directivity gain 410 may be applied to the audio signal prior to rendering the audio signal. As a result of this, the audio signal may be rendered at high perceptual quality (in a listening situation, for which directivity is relevant).
Hence, a method 700 is described, which verifies upfront, prior to processing an audio signal for rendering (e.g. using a directivity control function 600), whether or not the use of directivity is relevant and/or perceptually advantageous in the current listening situation of the listener 181. The directivity is only calculated and applied, if it is determined that the use of directivity is relevant and/or perceptually advantageous. As a result of this, a resource efficient rendering of an audio signal at high perceptual quality is achieved.
The directivity pattern 232 of the audio source 211, 212, 213 may be indicative of the intensity of the audio signal in different directions. Alternatively, or in addition, the directivity pattern 232 may be indicative of a direction-dependent directivity gain 410 to be applied to the audio signal for rendering the audio signal.
In particular, the directivity pattern 232 may be indicative of a directivity gain function 415. The directivity gain function 415 may indicate the directivity gain 410 as a function of the directivity angle 420 between the source position 500 of the audio source 211, 212, 213 and the listening position 182, 201, 202 of the listener 181. The directivity angle 420 may vary between 0° and 360° as the listening position 182, 201, 202 moves (on a circle) around the source position 500. In case of a non-uniform directivity pattern 232, the directivity gains 410 vary as a function of the directivity angle 420.
Rendering 703 the audio signal of the audio source 211, 212, 213 in dependence of the directivity pattern 232 of the audio source 211, 212, 213 may comprise determining the directivity gain 410 (for rendering the audio signal in the particular listening situation) based on the directivity pattern 232 and based on the directivity angle 420 between the source position 500 of the audio source 211, 212, 213 and the listening position 182, 201, 202 of the listener 181.
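A minimal, self-contained sketch of these two steps (the source orientation vector and the sampled pattern representation are assumptions for the purpose of illustration):

    import numpy as np

    def directivity_angle(source_pos, source_front, listener_pos):
        """Directivity angle 420 (degrees) between the source's front direction
        and the ray from the source position 500 to the listening position."""
        ray = np.asarray(listener_pos, dtype=float) - np.asarray(source_pos, dtype=float)
        ray /= np.linalg.norm(ray)
        front = np.asarray(source_front, dtype=float)
        front /= np.linalg.norm(front)
        return float(np.degrees(np.arccos(np.clip(np.dot(front, ray), -1.0, 1.0))))

    def apply_directivity(audio, pattern_angles, pattern_gains, angle_deg):
        """Interpolate the directivity gain 410 from the sampled directivity gain
        function 415 and apply it to the audio signal prior to rendering."""
        gain = np.interp(angle_deg, pattern_angles, pattern_gains)
        return audio * gain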
It should be noted that the method 700 described herein is typically repeated at a sequence of time instants (e.g. periodically with a certain repetition rate such as every 20 ms). At each time instant, the currently valid listening situation is determined (e.g. by determining current values for the one or more parameters for describing the listening situation). Furthermore, at each time instant, it is determined whether or not the directivity pattern 232 of the audio source 211, 212, 213 is to be taken into account, and rendering of the audio signal is performed in dependence of the decision. As a result of this, a continuous rendering of audio signals within a virtual reality rendering environment 180 may be achieved.
Furthermore, it should be noted that typically multiple audio signals from multiple different audio sources 211, 212, 213 are rendered simultaneously within the virtual reality rendering environment 180.
The method 700 may comprise determining an attenuation or distance gain 310, 651 in dependence of the distance 320 between the source position 500 of the audio source 211, 212, 213 and the listening position 182, 201, 202 of the listener 181. The attenuation or distance gain 310, 651 may be determined using an attenuation or distance function 315, 650 which indicates the attenuation or distance gain 310, 651 as a function of the distance 320. The audio signal may be rendered in dependence of the attenuation or distance gain 310, 651.
As indicated above, the distance 320 of the source position 500 of the audio source 211, 212, 213 from the listening position 182, 201, 202 of the listener 181 within the virtual reality rendering environment 180 may be determined (when determining the listening situation of the listener 181).
The method 700 may comprise determining 701 based on the determined distance 320 whether or not the directivity pattern 232 of the audio source 211, 212, 213 is to be taken into account. For this purpose, a pre-determined directivity control function 600 may be used, which is configured to indicate the relevance and/or the appropriateness of using directivity of the audio source 211, 212, 213 within the current listening situation, notably for the current distance 320 between the source position 500 and the listening position 182, 201, 202. The distance 320 between the source position 500 and the listening position 182, 201, 202 is a particularly important parameter of the listening situation, and consequently it has a particularly high impact on the resource efficiency and/or the perceptual quality when rendering the audio signal of the audio source 211, 212, 213.
It may be determined that the distance 320 of the source position 500 of the audio source 211, 212, 213 from the listening position 182, 201, 202 is smaller than a near field distance threshold. Based on this, it may be determined that the directivity pattern 232 of the audio source 211, 212, 213 is not to be taken into account. On the other hand, it may be determined that the distance 320 of the source position 500 of the audio source 211, 212, 213 from the listening position 182, 201, 202 is greater than the near field distance threshold. Based on this, it may be determined that the directivity pattern 232 of the audio source 211, 212, 213 is to be taken into account. The near field distance threshold may e.g. be 0.5 m or less. By suppressing the use of directivity at relatively small distances, perceptual artifacts may be prevented in cases where the listener 181 traverses the (virtual) audio source 211, 212, 213 within the virtual reality rendering environment 180.
Furthermore, it may be determined that the distance 320 of the source position 500 of the audio source 211, 212, 213 from the listening position 182, 201, 202 is greater than a far field distance threshold (which is larger than the near field distance threshold). Based on this, it may be determined that the directivity pattern 232 of the audio source 211, 212, 213 is not to be taken into account. On the other hand, it may be determined that the distance 320 of the source position 500 of the audio source 211, 212, 213 from the listening position 182, 201, 202 is smaller than the far field distance threshold. Based on this, it may be determined that the directivity pattern 232 of the audio source 211, 212, 213 is to be taken into account. The far field distance threshold may be 5 m or more. By suppressing the use of directivity at relatively large distances, the resource efficiency of the renderer 160 may be improved without impacting the perceptual quality.
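A minimal sketch of the resulting twofold test, using the example threshold values from the text (in practice both thresholds may depend on the directivity control function 600, as explained next):

    def directivity_to_be_taken_into_account(distance: float,
                                             near_threshold: float = 0.5,
                                             far_threshold: float = 5.0) -> bool:
        """The directivity pattern 232 is taken into account only if the distance
        320 lies between the near field and the far field distance thresholds."""
        return near_threshold < distance < far_threshold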
The near field threshold and/or the far field threshold may depend on the directivity control function 600. The directivity control function 600 may be configured to provide a control value as a function of the distance 320 between the source position 500 and the listening position 182, 201, 202. The control value may be indicative of the extent to which the directivity pattern 232 is to be taken into account. In particular, the control value may indicate whether or not the directivity pattern 232 is to be taken into account (e.g. depending on whether the control value is greater or smaller than a control threshold D*). By making use of the directivity control function 600, the application of the directivity pattern 232 may be controlled in an efficient and reliable manner.
The method 700 may comprise determining a control value for the listening situation based on the directivity control function 600. The directivity control function 600 may be configured to provide different control values for different listening situations (notably for different distances 320 between the source position 500 and the listening position 182, 201, 202). It may then be determined in a reliable manner based on the control value whether or not the directivity pattern 232 of the audio source 211, 212, 213 is to be taken into account.
In particular, the method 700 may comprise comparing the control value with the control threshold D*. The directivity control function 600 may be configured to provide control values between a minimum value (e.g. 0) and a maximum value (e.g. 1). The control threshold may lie between the minimum value and the maximum value (e.g. at 0.5). It may then be determined in a reliable manner based on the comparison, in particular depending on whether the control value is greater or smaller than the control threshold, whether or not the directivity pattern 232 of the audio source 211, 212, 213 is to be taken into account. In particular, the method 700 may comprise determining that the directivity pattern 232 of the audio source 211, 212, 213 is not to be taken into account, if the control value for the listening situation is smaller than the control threshold. Alternatively, or in addition, the method 700 may comprise determining that the directivity pattern 232 of the audio source 211, 212, 213 is to be taken into account, if the control value for the listening situation is greater than the control threshold.
The directivity control function 600 may be configured to provide control values which are below the control threshold in a listening situation, for which the distance 320 of the source position 500 of the audio source 211, 212, 213 from the listening position 182, 201, 202 of the listener 181 is smaller than the near field threshold (thereby preventing perceptual artifacts when the user traverses the virtual audio source 211, 212, 213 within the virtual reality rendering environment 180). Alternatively, or in addition, the directivity control function 600 may be configured to provide control values which are below the control threshold in a listening situation, for which the distance 320 of the source position 500 of the audio source 211, 212, 213 from the listening position 182, 201, 202 of the listener 181 is greater than the far field threshold (thereby increasing resource efficiency of the renderer 160 without impacting the perceptual quality).
Hence, the method 700 may comprise determining a control value for the listening situation based on the directivity control function 600, wherein the directivity control function 600 may provide different control values for different listening situations of the listener 181 within the virtual reality rendering environment 180. As indicated above, the control value may be indicative of the extent to which the directivity pattern 232 is to be taken into account.
Furthermore, the method 700 may comprise adjusting the directivity pattern 232, notably a directivity gain 410 of the directivity pattern 232, of the audio source 211, 212, 213 in dependence of the control value (notably, if it is determined that the directivity pattern 232 is to be taken into account). The audio signal of the audio source 211, 212, 213 may then be rendered in dependence of the adjusted directivity pattern 232, notably in dependence of the adjusted directivity gain 410, of the audio source 211, 212, 213. Hence (in addition to deciding on whether or not to make use of the directivity pattern 232), the directivity control function 600 may be used to control the extent to which directivity is taken into account when rendering the audio signal. The extent may vary (in a continuous manner) in dependence of the listening situation of the listener 181 (notably in dependence of the distance 320 between the source position 500 and the listening position 182, 201, 202). By doing this, the perceptual quality of audio rendering within a virtual reality rendering environment 180 may be further improved.
Adjusting the directivity pattern 232 of the audio source 211, 212, 213 (in dependence of the control value) may comprise determining a weighted sum of the (non-uniform) directivity pattern 232 of the audio source 211, 212, 213 with a uniform directivity pattern, wherein a weight for determining the weighted sum may depend on the control value. The adjusted directivity pattern may be the weighted sum. In particular, the adjusted directivity gain may be determined as the weighted sum of the original directivity gain 410 and a uniform gain (typically 1 or 0 dB). By adjusting the directivity pattern 232 (smoothly) for distances 320 approaching the near field threshold or the far field threshold, smooth transitions may be achieved between the application and the suppression of the directivity pattern 232, thereby further increasing the perceptual quality.
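A minimal sketch of this weighted-sum adjustment (algebraically the same blending as in the pseudocode sketch above, with the control value acting as the weight):

    def adjust_directivity_gain(directivity_gain: float, control_value: float) -> float:
        """Weighted sum of the non-uniform directivity gain 410 and the uniform
        gain 1.0 (i.e. 0 dB); control_value is assumed to lie in [0, 1]."""
        return control_value * directivity_gain + (1.0 - control_value) * 1.0

For example, a control value of 1 leaves the directivity gain unchanged, while a control value of 0 replaces it by the uniform gain.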
The directivity pattern 232 of the audio source 211, 212, 213 may be applicable to a reference listening situation, notably to a reference distance 610 between the source position 500 of the audio source 211, 212, 213 and the listening position of the listener 181. In particular, the directivity pattern 232 may have been measured and/or designed for the reference listening situation, notably for the reference distance 610.
The directivity control function 600 may be such that the directivity pattern 232, notably the directivity gain 410, is not adjusted if the listening situation corresponds to the reference listening situation (notably if the distance 320 corresponds to the reference distance 610). By way of example, the directivity control function 600 may provide the maximum value for the control value (e.g. 1) if the listening situation corresponds to the reference listening situation.
Furthermore, the directivity control function 600 may be such that the extent of adjustment of the directivity pattern 232 increases with increasing deviation of the listening situation from the reference listening situation (notably with increasing deviation of the distance 320 from the reference distance 610). In particular, the directivity control function 600 may be such that the directivity pattern 232 progressively tends towards the uniform directivity pattern (i.e. the directivity gain 410 progressively tends towards 1 or 0 dB) with increasing deviation of the listening situation from the reference listening situation (notably with increasing deviation of the distance 320 from the reference distance 610). As a result of this, the perceptual quality may be increased further.
The method 710 comprises determining 711 a control value for the listening situation of the listener 181 within the virtual reality rendering environment 180 based on the directivity control function 600. As indicated above, the directivity control function 600 may provide different control values for different listening situations, wherein the control value may be indicative of the extent to which the directivity of an audio source 211, 212, 213 is to be taken into account.
The method 710 further comprises adjusting 712 the directivity pattern 232, notably the directivity gain 410, of the first audio source 211, 212, 213 in dependence of the control value. In addition, the method 710 comprises rendering 713 the audio signal of the first audio source 211, 212, 213 in dependence of the adjusted directivity pattern 232, notably in dependence of the adjusted directivity gain 410, of the first audio source 211, 212, 213 to the listener 181 within the virtual reality rendering environment 180. By adjusting the extent of the application of directivity in dependence of the current listening situation, the perceptual quality of audio rendering within a virtual reality rendering environment 180 may be increased.
The method 720 comprises determining 721 an audio signal of at least one audio source 211, 212, 213, determining 722 a source position 500 of the at least one audio source 211, 212, 213 within a virtual reality rendering environment 180, and/or determining 723 a directivity pattern 232 of the at least one audio source 211, 212, 213. Furthermore, the method 720 comprises determining 724 a directivity control function 600 for controlling the use of the directivity pattern 232 for rendering the audio signal of the at least one audio source 211, 212, 213 in dependence of the listening situation of a listener 181 within the virtual reality rendering environment 180. In addition, the method 720 comprises inserting 725 data regarding the audio signal, the source position 500, the directivity pattern 232 and/or the directivity control function 600 into the bitstream 140.
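A hedged sketch of such a bitstream writer, purely for illustration (a real encoder 130 would use a binary bitstream syntax; the JSON container and field names here are hypothetical):

    import json

    def build_bitstream(audio_payload: bytes, source_position,
                        directivity_pattern, directivity_control_function) -> bytes:
        """Insert data regarding the audio signal, the source position 500, the
        directivity pattern 232 and the directivity control function 600 into a
        bitstream 140 (illustrative length-prefixed layout)."""
        metadata = json.dumps({
            "source_position": source_position,          # e.g. (x, y, z)
            "directivity_pattern": directivity_pattern,  # sampled directivity gains
            "directivity_control_function": directivity_control_function,  # sampled or parametrized
        }).encode("utf-8")
        return len(metadata).to_bytes(4, "big") + metadata + audio_payload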
Hence, a creator of a virtual reality environment is provided with means for controlling the directivity of one or more audio sources 211, 212, 213 in a flexible and precise manner.
Furthermore, a virtual reality audio renderer 160 for rendering an audio signal of an audio source 211, 212, 213 in a virtual reality rendering environment 180 is described. The audio renderer 160 may be configured to execute the method steps of method 700 and/or method 710.
In addition, an audio encoder 130 configured to generate a bitstream 140 is described. The audio encoder 130 may be configured to execute the method steps of method 720.
Furthermore, a bitstream 140 is described. The bitstream 140 may be indicative of the audio signal of at least one audio source 211, 212, 213, and/or of the source position 500 of the at least one audio source 211, 212, 213 within a virtual reality rendering environment 180 (i.e. within an audio scene 111). Furthermore, the bitstream 140 may be indicative of the directivity pattern 232 of the at least one audio source 211, 212, 213, and/or of the directivity control function 600 for controlling use of the directivity pattern 232 for rendering the audio signal of the at least one audio source 211, 212, 213 in dependence of a listening situation of a listener 181 within the virtual reality rendering environment 180.
The directivity control function 600 may be indicated in a parametrized and/or in a sampled manner. The directivity pattern 232 and/or the directivity control function 600 may be provided as VR metadata within the bitstream 140.
The methods and systems described in the present document may be implemented as software, firmware and/or hardware. Certain components may e.g. be implemented as software running on a digital signal processor or microprocessor. Other components may e.g. be implemented as hardware and/or as application-specific integrated circuits. The signals encountered in the described methods and systems may be stored on media such as random access memory or optical storage media. They may be transferred via networks, such as radio networks, satellite networks, wireless networks or wireline networks, e.g. the Internet. Typical devices making use of the methods and systems described in the present document are portable electronic devices or other consumer equipment which are used to store and/or render audio signals.
This application claims priority of the following priority applications: U.S. provisional application 63/189,269 (reference: D21027USP1), filed 17 May 2021 and EP application 21174024.6 (reference: D21027EP), filed 17 May 2021, which are hereby incorporated by reference.