The present disclosure is directed to the general area of Doppler effect modelling, and more particularly, to methods and apparatuses for controlling the Doppler effect modelling, for example for use in virtual reality or augmented reality environments.
Generally speaking, the term Doppler effect (or Doppler shift) is typically used to refer to an audio effect that is experienced when there is a change in frequency of a wave (e.g., an audio wave) in relation to an observer (e.g., a listener) who is moving relative to the wave source (e.g., an audio source). More specifically, the Doppler effect is generally perceived as follows: when the wave source (e.g., a siren of an emergency vehicle) approaches the observer, the pitch (which is a commonly used measure indicative of the frequency or perceived frequency) goes higher; whilst when the wave source passes by and moves farther away, the pitch goes lower.
Nowadays, the Doppler effect has begun to be considered as an important aspect of audio rendering of dynamic scenes in a 6 degrees of freedom (6DoF) environment, which is widely employed for example in virtual reality (VR) and/or augmented reality (AR) scenarios (e.g., gaming). Broadly speaking, within the context of audio processing (e.g., rendering), the Doppler effect may generally be modelled using audio pitch factor modification values. In some conventional implementations, it is generally proposed to perform the Doppler effect modelling based on the physical description or approximation of the Doppler effect. However, such an approach generally has no means or ability to account for capabilities of the underlying signal processing unit for pitch factor modification (e.g., a high magnitude of pitch factor modification values for high relative velocities, singularity point, etc.), nor to control pitch factor modification values (i.e., representing the strength of the Doppler effect) according to content creator intent (or in other words, subjective listening experience).
There is a need for techniques of performing (controlling) the Doppler effect modelling, more particularly when rendering audio content for a 6DoF environment, such as in a virtual reality and/or augmented reality environment.
In view of the above, the present disclosure generally provides a method of modelling a Doppler effect when rendering audio content for a 6 degrees of freedom (6DoF) environment, a method of encoding parameters for use in modelling a Doppler effect when rendering audio content for a 6DoF environment, as well as a corresponding audio renderer, an encoder, a program, and a computer-readable storage medium, having the features of the respective independent claims.
According to a first aspect of the present invention, a method of modelling a Doppler effect when rendering audio content for a 6DoF environment is provided. The method may be performed on a user side, or in other words, in a user (decoding) side environment. In particular, the method may comprise obtaining first parameter values of one or more first parameters indicative of an allowable range of pitch factor modification values. The allowable range of the pitch factor modification values may be indicated by using an upper limit (e.g., boundary) and/or a lower limit (e.g., boundary), for example. The method may further comprise obtaining a second parameter value of a second parameter indicative of a desired strength (or in some cases also referred to as “aggressiveness”) of the to-be-modelled Doppler effect. The method may yet further comprise determining a pitch factor modification value based on a relative velocity between a listener and an audio source in the audio content, and the first and second parameter values, using a predefined pitch factor modification function. Particularly, the predefined pitch factor modification function may have the first and second parameters (or in other words, may take, among others, the first and second parameters as input) and may be a function for mapping relative velocities to pitch factor modification values. Notably, as can be understood and appreciated by the skilled person, the pitch factor modification value may be seen as a value (possibly being represented in any suitable form) generally used for suitably modifying (e.g., shifting) the pitch, thereby enabling a suitable and proper modelling of the Doppler effect and rendering of the audio content in the 6DoF environment. Finally, the method may comprise rendering the audio source based on the (determined) pitch factor modification value.
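Purely for illustration, the user-side method steps described above may be sketched in Python as follows, with the predefined pitch factor modification function passed in as an opaque callable; all function and parameter names below are illustrative assumptions and not part of any normative interface:

```python
from typing import Callable, Tuple

def model_doppler(relative_velocity: float,
                  allowed_range: Tuple[float, float],  # first parameters: (lower, upper) limit
                  strength: float,                     # second parameter: desired aggressiveness
                  pitch_mod_fn: Callable[[float, Tuple[float, float], float], float]) -> float:
    """Determine the pitch factor modification value for one audio source
    by applying a predefined pitch factor modification function."""
    # The first and second parameter values are assumed to have been obtained
    # already (e.g., derived from a received bitstream, or read from a file or LUT).
    return pitch_mod_fn(relative_velocity, allowed_range, strength)
```

The returned pitch factor modification value would then be handed to the renderer's pitch-shifting stage for rendering the audio source.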
Put differently, in a broad sense, the present disclosure generally proposes a method that makes use of a predefined (or predetermined/pre-implemented) pitch factor modification function to map relative velocities into corresponding pitch factor modification values, for modelling the Doppler effect (e.g., when rendering the audio content in the 6DoF environment). As will be described in more detail below, such predefined pitch factor modification function may be implemented by any suitable means, provided that certain requirements (or properties) are generally fulfilled. Specifically, the predefined pitch factor modification function may have a plurality of parameters (or in other words, take a plurality of parameters as input), among which are (at least) the first parameter(s) indicative of the allowable range of the pitch factor modification values and the second parameter indicative of the desired strength (aggressiveness) of the to-be-modelled Doppler effect. Then, in practice, when the proposed method is being performed, e.g., by an audio rendering device (or simply referred to as an audio renderer) where the predefined pitch factor modification function is being deployed, the audio rendering device may be configured to obtain the first and second parameter values corresponding to the first and second parameters, respectively. The obtaining of the first and second parameter values may be performed by any suitable means, depending on various requirements and/or implementations. For instance, in some possible cases, the first and second parameter values may be derived (or simply extracted) from a bitstream received from an encoding device; or in some other possible cases, obtained (or simply read out) from a file or a lookup table (LUT).
As such, particularly by applying the pitch factor modification function, the pitch factor modification values for modelling the Doppler effect may be determined based on the relative velocities between the listener and the audio source, and also based on the first and second parameter values.
Configured as described above, the proposed method can provide an efficient and flexible mechanism for performing (e.g., controlling) the Doppler effect modelling when rendering the audio content for the 6DoF environment, while at the same time taking into account both the (allowable or acceptable) capabilities of the underlying signal processing unit (e.g., of the audio renderer) for the pitch factor modification (e.g., a high magnitude of pitch factor modification values for high relative velocities, singularity point, etc.) and also the possibility to control the (desired) pitch factor modification values (e.g., representing the desired strength/aggressiveness of the to-be-modelled Doppler effect) according to the intent of the content creator (in other words, subjective listening experience), thereby improving the perceived listening experience (at the listener side, e.g., a user playing a game in a VR environment).
Moreover, since the pitch factor modification function is already pre-implemented as a predefined function that takes, among others, values of both the first and second parameters as input, there is generally no need to redesign (or re-implement) a new pitch factor modification function every time the rendering condition changes (e.g., a different renderer with different processing capabilities being deployed, a different audio content created by a different author and/or for a different scene, etc.). Rather, it is generally only necessary to communicate (e.g., using a bitstream coded at an encoding side) different first and second parameter values representing the corresponding allowable ranges (e.g., limits) of pitch factor modification values and the corresponding desired strength of the to-be-modelled Doppler effect, respectively. As such, in some possible scenarios, the predefined pitch factor modification function may be implemented as something as simple as a plugin or library (taking the first and second parameters as input for modelling the desired Doppler effect) that can be deployed in various software and/or platforms, and that can be further customized if needed, depending on various requirements and/or implementations. Thereby, unnecessary redesign/re-implementation of the pitch factor modification functions would be avoided, further improving the efficiency in the overall audio rendering process.
In some example implementations, the relative velocity may be calculated based on positions (e.g., relative positions) of the listener and the audio source. For instance, in some possible cases, the relative velocity may be determined from a rate of change (e.g., by taking the first-order derivative) of the relative distances between the audio source and the listener based on their respective positions. Of course, any other suitable means may be adopted for obtaining or calculating the relative velocities, as will be understood and appreciated by the skilled person.
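Purely as an illustrative sketch of the foregoing, the relative velocity may be approximated by a finite-difference (first-order) estimate of the rate of change of the source-listener distance; the helper names and the three-dimensional position tuples below are assumptions for illustration:

```python
import math

def estimate_relative_velocity(src_prev, src_curr, lst_prev, lst_curr, dt):
    """Finite-difference (first-order) estimate of the rate of change of the
    source-listener distance: positive values mean the source and listener
    are moving apart, negative values mean they are approaching."""
    def distance(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return (distance(src_curr, lst_curr) - distance(src_prev, lst_prev)) / dt
```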
In some example implementations, the one or more first parameters may comprise parameters that are indicative of upper and/or lower limits of the allowable range of pitch factor modification values.
In some example implementations, the allowable range of pitch factor modification values may reflect a processing capability of an audio renderer rendering the audio content. That is to say, broadly speaking, the first parameter(s), i.e., that are indicative of the allowable range (e.g., upper and/or lower limits) of the pitch factor modification values, may be seen as representing a range of processing capabilities that are supported by the rendering device (e.g., an audio renderer), or more precisely, the underlying processing unit of that rendering device for modelling the Doppler effect.
In some example implementations, if the allowable range of pitch factor modification values so obtained could not be supported by (the underlying processing unit of) the audio renderer, a default range of pitch factor modification values may be used by that audio renderer. An illustrative example of such scenario may be that of a mobile device (e.g., a mobile phone) with (relatively) limited processing (rendering) capability that obtains (e.g., receives) a range of pitch factor modification values that have been set (e.g., by an encoding device) originally to target for a (relatively) more powerful rendering device (e.g., a gaming console or a professional work station). In such a scenario, it may be considered more practical for the mobile device to, instead of using the obtained unsupported parameter value(s), apply a default range parameter setting (e.g., falling into the originally obtained wider range) that may be, for example, set by the manufacturer (of the mobile device) to more correctly reflect the actual processing (rendering) capability of that mobile device, in order to avoid unexpectedly or adversely affecting the rendering process.
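The fallback behaviour described above may be sketched as follows; the helper name and the representation of ranges as (lower, upper) tuples are illustrative assumptions:

```python
def select_pitch_mod_range(obtained, supported, default):
    """Use the obtained pitch factor modification range if it lies within the
    renderer's supported capability; otherwise fall back to a default range
    (e.g., set by the device manufacturer)."""
    (obt_lo, obt_hi), (sup_lo, sup_hi) = obtained, supported
    if sup_lo <= obt_lo and obt_hi <= sup_hi:
        return obtained   # obtained range fully supported: honour it
    return default        # unsupported: apply the default range setting
```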
In some example implementations, the second parameter may control a slope (or in some possible cases, also referred to as “strength”) of the pitch factor modification function that may be seen as reflecting the aggressiveness of the to-be-modelled Doppler effect.
In some example implementations, the audio content may be extracted from a received bitstream. The bitstream may have been encoded by an encoding device for example, by using any suitable means and in any suitable format. Accordingly, depending on various implementations, the first and second parameter values may be derived (e.g., extracted, decoded, etc.) from indications included in the bitstream. In some possible cases, the indications of the first and second parameter values may be encoded as labels (or fields) in the bitstream, as will be understood and appreciated by the skilled person.
Of course, it is also possible that in some other possible implementations, the audio content, and the first and second parameters may be obtained separately (e.g., from two separate bitstreams).
In some example implementations, the second parameter value may be set by a content creator of the audio content. Particularly, the second parameter value may be set by the content creator of the audio content according to the intent of that content creator. Thus, in a broad sense, the second parameter value may also be seen to reflect the subjective listening experience that is aimed for by the content creator (and that is controlled by the content creator).
In some example implementations, the second parameter value may be set by modelling a real-world reference and/or artistic expectations for the desired Doppler effect strength. Of course, any other suitable implementation for determining and setting the second parameter value may be possible as well, as will be understood and appreciated by the skilled person.

In some example implementations, rendering the audio content based on the pitch factor modification value may comprise adjusting a pitch of the audio source in the audio content based on the pitch factor modification value.
In some example implementations, a positive pitch factor modification value may generally indicate increasing the pitch of the audio source. Similarly, a negative pitch factor modification value may generally indicate decreasing the pitch of the audio source.
In some example implementations, the pitch adjustment of the audio source may be performed in units of semitones. For instance, a pitch factor modification value of 2 may simply mean to increase the pitch of the audio source by 2 semitones; and correspondingly, a pitch factor modification value of −2 may simply mean to decrease the pitch of the audio source by 2 semitones.
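Since a semitone corresponds to a frequency ratio of the twelfth root of two, a pitch factor modification value p expressed in semitones may be translated into a frequency scaling ratio of 2^(p/12), as sketched below (the function name is illustrative only):

```python
def semitones_to_frequency_ratio(p: float) -> float:
    """Convert a pitch factor modification value expressed in semitones into
    the corresponding frequency scaling ratio (a semitone is a ratio of the
    twelfth root of two, so p semitones scale the frequency by 2 ** (p / 12))."""
    return 2.0 ** (p / 12.0)
```

For instance, p = 12 yields a ratio of exactly 2 (one octave up), while p = 0 leaves the frequency unchanged.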
In some example implementations, the pitch factor modification function may be implemented based on a generalized logistic function. That is to say, in some possible cases, implementing the pitch factor modification function may involve for example modifying, as appropriate, a logistic function, or specifically, a generalized logistic function. However, it may be worthwhile to note that any other suitable means (e.g., formula or equation) may be used to implement such pitch factor modification function, provided that the so implemented pitch factor modification function fulfills certain properties, as will become more apparent in view of the description below.
In some example implementations, the pitch factor modification function may have one or more properties of: being continuous and monotonic with respect to relative velocities, having asymptotical limits controlled by the one or more first parameters, yielding zero pitch factor modification value at zero relative velocity, and/or having a slope in the vicinity of zero velocity that is controlled by the second parameter. Any other suitable properties may also be necessary in some possible implementations, as will be understood and appreciated by the skilled person.
In some example implementations, the pitch factor modification function F may be implemented as

F(v, l, s) = ll + (lh − ll) / (1 + (lh / (−ll)) · e^(v·s))
where v represents the relative velocity, l={ll, lh} represents the first parameters with ll denoting the lower limit of the range and lh denoting the upper limit of the range, and s represents the second parameter. However, such a function is merely provided as an example and not as a limitation of any kind. As indicated above already, the skilled person would understand and appreciate that the pitch factor modification function F may be implemented in any other suitable manner as well.
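For illustration only, a generalized-logistic-style candidate with the above behaviour may be sketched as follows; the specific formula used here is an assumption constructed so as to satisfy the stated properties (zero modification at zero velocity, asymptotic limits ll and lh, slope governed by s), and it further assumes ll < 0 < lh:

```python
import math

def pitch_factor_modification(v: float, l_low: float, l_high: float, s: float) -> float:
    """Map a relative velocity v to a pitch factor modification value.
    Assumes l_low < 0 < l_high and s > 0; negative v (source and listener
    approaching) raises the pitch towards l_high, positive v lowers it
    towards l_low, and v = 0 yields exactly 0."""
    return l_low + (l_high - l_low) / (1.0 + (l_high / -l_low) * math.exp(v * s))
```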
In some example implementations, the method may further comprise outputting the rendered audio source (e.g., as part of the audio content) to a speaker or a headphone (or any other suitable playback device) for playback to the user, depending on various implementations or the user side environment (e.g., a computer, a gaming console, a mobile device, etc.).
According to a second aspect of the present invention, a method of encoding parameters for use in modelling a Doppler effect when rendering audio content for a 6 degrees of freedom (6DoF) environment is provided. The parameters so encoded (e.g., at an encoder side or in an encoding side environment) by this method may be used by any of the methods described in the preceding first aspect and the example implementations thereof to model the Doppler effect when rendering the audio content for the 6DoF environment (e.g., at a user side or in a user/decoding side environment).
In particular, the method may comprise determining (e.g., calculating, setting, etc.) first parameter values of one or more first parameters indicative of an allowable range of pitch factor modification values. The method may further comprise determining (e.g., calculating, setting, etc.) a second parameter value of a second parameter indicative of a desired strength (or in some cases, also referred to as “aggressiveness”) of the to-be-modelled Doppler effect. Finally, the method may yet further comprise encoding indications of the first and second parameter values. Specifically, as illustrated above, the first and second parameter values may be used for mapping a relative velocity between a listener and an audio source of the audio content to a pitch factor modification value based on a predefined pitch factor modification function, wherein the pitch factor modification value may be used for rendering the audio source, and the predefined pitch factor modification function may have the first and second parameters and may be a function for mapping relative velocities to pitch factor modification values.
Configured as described above, the proposed method can provide an efficient and flexible mechanism for encoding the parameters to be used for the Doppler effect modelling when rendering the audio content for the 6DoF environment, while at the same time taking into account both the (allowable or acceptable) capabilities of the underlying signal processing unit (at the audio renderer side) for the pitch factor modification (e.g., a high magnitude of pitch factor modification values for high relative velocities, singularity point, etc.) and also the possibility to control the (desired) pitch factor modification values (i.e., representing the desired strength of the Doppler effect) according to the intent of the content creator (in other words, subjective listening experience aimed for), thereby improving the perceived listening experience (at the listener side).
Moreover, as noted above, since the pitch factor modification function is already implemented and deployed as a predefined function (at the renderer side) taking the first and second parameters as input, there is generally no need to redesign (or re-implement) a new pitch factor modification function every time the rendering condition changes (e.g., a different renderer with different processing capabilities being deployed, a different audio content created by a different person and/or for a different scene, etc.). Instead, the encoding side may just communicate different first and second parameter values (e.g., encoded in a bitstream) representing the corresponding allowable ranges (limits) of pitch factor modification values and the corresponding desired strengths of the to-be-modelled Doppler effect, respectively. As such, in some possible scenarios, the predefined pitch factor modification function may be implemented as something as simple as a plugin (at the renderer side) that can be deployed in various software and/or platforms, and that can be further customized if needed, depending on various requirements and/or implementations. Thereby, unnecessary redesign/re-implementation of the pitch factor modification functions would be avoided, further improving the efficiency in the overall audio rendering process.
In some example implementations, the indications of the first and second parameter values may be encoded as labels (or fields) in a bitstream. As will be understood and appreciated by the skilled person, such indications may also be implemented by any other suitable means, as long as the corresponding rendering side device (where the predefined pitch factor modification function is being deployed) may be enabled to derive the first and second parameter values as necessary. As an example but not as a limitation of any kind, in some possible cases where the encoding method may be performed by a (game/control) engine (or sometimes also referred to as a game control logic engine) e.g. in an AR/VR gaming environment, it is understandable that the first and second parameter values (or the respective indications thereof) do not necessarily have to be always encoded into a bitstream (e.g., possibly due to the reason that the game engine may typically be located in the same environment, e.g., in the form of a PC, as the rendering and/or listening component), but may be encoded (or encapsulated) in any other suitable format (or even as plain or clear parameter values in some possible cases), together with or separate from the audio content.
In some example implementations, the indications of the first and second parameter values may be encoded together with the audio content in a single bitstream or as separate bitstreams.
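As a purely hypothetical sketch of such encoding, the indications may be packed as fixed-layout fields; the field widths, byte order, and layout below are illustrative assumptions and not a normative bitstream syntax:

```python
import struct

def encode_doppler_indications(l_low: float, l_high: float, s: float) -> bytes:
    """Pack the first parameter values (range limits) and the second parameter
    value (strength) as three little-endian 32-bit float fields."""
    return struct.pack("<fff", l_low, l_high, s)

def decode_doppler_indications(payload: bytes):
    """Recover (l_low, l_high, s) at the rendering side."""
    return struct.unpack("<fff", payload)
```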
In some example implementations, the first and second parameter values may be determined by a content creator or a game engine, as indicated above.
According to a third aspect of the present invention, an audio renderer (rendering apparatus) including a processor and a memory coupled to the processor is provided. The processor may be adapted to cause the audio renderer to carry out all steps according to any of the example methods described in the first aspect.
According to a fourth aspect of the present invention, an encoder (encoder apparatus) including a processor and a memory coupled to the processor is provided. The processor may be adapted to cause the encoder to carry out all steps according to any of the example methods described in the second aspect.
According to a fifth aspect of the present invention, a computer program is provided. The computer program may include instructions that, when executed by a processor, cause the processor to carry out all steps of the methods described throughout the present disclosure.
According to a sixth aspect of the present invention, a computer-readable storage medium is provided. The computer-readable storage medium may store the aforementioned computer program.
It will be appreciated that apparatus features and method steps may be interchanged in many ways. In particular, the details of the disclosed method(s) can be realized by the corresponding apparatus (or system), and vice versa, as the skilled person will appreciate. Moreover, any of the above statements made with respect to the method(s) are understood to likewise apply to the corresponding apparatus (or system), and vice versa.
Example embodiments of the present disclosure are explained below with reference to the accompanying drawings, wherein
The Figures (Figs.) and the following description relate to preferred embodiments by way of illustration only. It should be noted that from the following discussion, alternative embodiments of the structures and methods disclosed herein will be readily recognized as viable alternatives that may be employed without departing from the principles of what is claimed.
Reference will now be made in detail to several embodiments, examples of which are illustrated in the accompanying figures. It is noted that wherever practicable similar or like reference numbers may be used in the figures and may indicate similar or like functionality. The figures depict embodiments of the disclosed system (or method) for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.
Furthermore, in the figures, where connecting elements, such as solid or dashed lines or arrows, are used to illustrate a connection, relationship, or association between or among two or more other schematic elements, the absence of any such connecting elements is not meant to imply that no connection, relationship, or association can exist. In other words, some connections, relationships, or associations between elements are not shown in the drawings so as not to obscure the present invention. In addition, for ease of illustration, a single connecting element is used to represent multiple connections, relationships or associations between elements. For example, where a connecting element represents a communication of signals, data, or instructions, it should be understood by those skilled in the art that such element represents one or multiple signal paths, as may be needed, to effect the communication.
As indicated above, the term “Doppler effect” or “Doppler shift” is generally used to refer to an audio effect that is experienced when there is a change in frequency of a wave (e.g., an audio wave) in relation to an observer (e.g., a listener) who is moving relative to the wave source (e.g., an audio source). Broadly speaking, the Doppler effect can be observed whenever the source of waves is moving with respect to the observer. The Doppler effect may be described as the effect produced by a moving source of waves in which there is an apparent upward shift in frequency for observers towards whom the source is approaching and an apparent downward shift in frequency for observers from whom the source is receding. It is nevertheless important to note that the effect does not result from an actual change in the frequency of the source.
The Doppler effect may be observed for any type of wave (water wave, sound wave, light wave, etc.), as can be understood and appreciated by the skilled person. An exemplary scenario where the Doppler effect may be commonly perceived could be an instance in which a police car or emergency vehicle is traveling towards a listener on the highway. As the car approaches with its siren, the pitch (a generally used measure for indicating the frequency) of the siren sound goes higher; and then after the car passes by and travels farther away, the pitch of the siren sound goes lower.
As has also been noted above, the Doppler effect has recently started to be considered as an important aspect in the audio rendering of dynamic scenes in a 6 degrees of freedom (6DoF) environment, which is widely employed for example in virtual reality (VR) and/or augmented reality (AR) scenarios (e.g., gaming, immersive content, etc.).
Broadly speaking, a semitone, also called a half step or a half tone, is generally the smallest musical interval commonly used in most tonal music and is considered the most dissonant when sounded harmonically. In general terms, the semitone is defined as the interval between two adjacent notes on a 12-tone scale. In other words, in most musical instruments that are designed on the basis of the tempered scale, in which the octave is divided into twelve equal parts, the frequency ratio between two notes a semitone (half step) apart is the twelfth root of two.
Within the context of audio processing (e.g., audio rendering), the Doppler effect may generally be modelled using audio pitch factor (shift) modification values (sometimes also simply denoted as p throughout the present disclosure). Thus, following the general concept of semitones, in some possible implementations of the present invention, as will be described in more detail below, a positive pitch factor (shift) modification value may generally mean an increase of the pitch of the audio source, particularly in units of semitones. For instance, a pitch factor modification value of 2 may simply translate into an increase of the pitch of the audio source by 2 semitones. Correspondingly, a negative pitch factor (shift) modification value may generally mean a decrease in the pitch (in units of semitones) of the audio source. However, alternative frameworks and units for pitch factor modification values may also be feasible in the context of the present invention, as the skilled person will appreciate.
Some approaches for modelling the Doppler effect may involve modelling based on the physical description and/or approximation of the Doppler effect. Thus, those approaches generally do not have means to account for capabilities of the underlying signal processing unit for pitch factor modification (e.g., a high magnitude of pitch factor modification values for high relative velocities, singularity point), nor to control pitch factor modification values (i.e., representing the strength of the Doppler effect) according to content creator intent (in other words, subjective listening experience).
In view thereof, the present invention may be generally seeking to address the problem of, given: 1) relative velocities (calculated based on listener and audio source positions, also denoted as v throughout the present disclosure); 2) a range of the pitch factor modification values supported by the signal processing unit (also denoted as l throughout the present disclosure); and 3) content creator setting (e.g., based on the (conventional) modelling equation, real-world reference, artistic expectations for Doppler effect strength, etc., also denoted as s throughout the present disclosure), finding suitable pitch factor modification values p that may perceptually correspond to the input data of 1)-3). Notably, in some possible implementations, the range of the pitch factor modification values l may itself comprise a lower limit ll and an upper (higher) limit lh, such that the range l may also be denoted as l={ll, lh}.
To address the above-identified problem, in a broad sense, the present invention generally proposes to consider implementing a pitch factor modification function F to map the relative velocities v to the pitch factor modification values p, accounting for the signal processing unit limitations l and the user-adjustable settings s. In some possible implementations, the pitch factor modification function F may be implemented as a modified generalized logistic function.
A possible example (not intended as a limitation of any kind) for implementing such pitch factor modification function F may be as follows:

F(v, l, s) = ll + (lh − ll) / (1 + (lh / (−ll)) · e^(v·s))     (1)
where, as illustrated above, v denotes relative velocities between the listener and the audio source, l={ll, lh} denotes the range of the pitch factor modification values supported by the underlying signal processing unit for performing the modelling (e.g., located in an audio renderer) with ll representing the lower limit and lh representing the upper limit respectively, s denotes the content creator setting (e.g., based on the (conventional) modelling equation, real-world reference, artistic expectations for Doppler effect strength, etc.), and e is a mathematical constant (Euler's number).
Any equivalent representations of equation (1) shall be encompassed by the present invention. For example, as will be appreciated by the skilled person, the supported range of the pitch factor modification values may be expressed by any suitable combination of two parameters, which are considered to be encompassed by the present invention. For example, instead of using lower and upper limits ll and lh, also an indication of either limit together with an indication Δl=lh−ll, or the like, of the size of the allowable range may be used. In this case, lh=Δl+ll or ll=lh−Δl would hold, for example, and could be used for modifying equation (1) above.
Further, Euler's number in equation (1) could be replaced by any alternative constant larger than 1 or, upon swapping the sign of the exponent, by any alternative constant smaller than 1.
In addition, as will be understood and appreciated by the skilled person, the pitch factor modification function F may also be defined in any other suitable forms/formulas, depending on various requirements and/or implementations, provided that the above-identified parameters (i.e., the range/limitation parameter l and the user/content creator setting s) are accounted for. The properties of the pitch factor modification function F may become more apparent in view of the below description accompanying the drawings.
Notably, in a broad sense, by applying the approaches as proposed in the present invention, values for parameters of the signal processing unit limitations l and the user setting(s) s may be enabled to be adjusted at the encoder side (and possibly to be put, e.g., encapsulated, into a bitstream). As such, a content creator may be enabled to establish control for the Doppler effect modelling to fit it to the capabilities of the audio renderer and at the same time also adjust it according to their own preferences.
Now regarding the drawings,
In particular, the x-axis schematically shows the (input) relative velocities between an audio source and an observer (e.g., a listener in the 6DoF environment). The relative velocities between the audio source and the observer/listener may be determined by any suitable means, e.g., based on positions of the listener and the audio source. For instance, in some possible implementations, the relative velocities may be determined based on the rate (e.g., the 1st order derivative) of change of the positions of the listener and the audio source (e.g., in terms of the distance between the audio source and the observer/listener). Therein, negative values of the relative velocities may generally mean that the audio source and the observer/listener are approaching each other (getting closer to each other); whilst positive values of the relative velocities may generally mean that the audio source and the observer/listener are moving (farther) away from each other, as will be understood and appreciated by the skilled person.
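The first-order-derivative estimation described above can be sketched as a finite difference of the source-listener distance over one frame; the function and parameter names below are hypothetical and chosen for illustration only.

```python
import math

def relative_velocity(src_prev, src_curr, lst_prev, lst_curr, dt):
    """Estimate the relative (radial) velocity in m/s as the
    first-order rate of change of the source-listener distance
    over a frame of dt seconds.

    Negative result: source and listener approaching each other;
    positive result: moving (farther) away from each other.
    """
    def dist(a, b):
        # Euclidean distance between two 3D position tuples.
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    return (dist(src_curr, lst_curr) - dist(src_prev, lst_prev)) / dt
```

For example, a source moving from 10 m to 9 m away from a stationary listener over one second yields a relative velocity of −1 m/s, consistent with the sign convention above.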
On the other hand, the y-axis schematically shows the (output) pitch factor modification values (e.g., in units of semitones). As illustrated above, in some possible implementations, positive pitch factor modification values may generally mean an increase of the pitch of the audio source; whilst negative values may generally mean a decrease in the pitch of the audio source, e.g., both in units of semitones.
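Where the pitch factor modification value is expressed in semitones, it maps to a frequency scaling factor via the standard equal-temperament relationship (12 semitones corresponding to one octave, i.e., a doubling of frequency):

```python
def semitones_to_ratio(p):
    """Convert a pitch shift p in semitones to a frequency ratio.

    +12 semitones doubles the frequency (one octave up),
    -12 semitones halves it, and 0 leaves it unchanged.
    """
    return 2.0 ** (p / 12.0)
```

A pitch processing unit (e.g., a resampler) would typically consume such a ratio rather than the raw semitone value.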
More particularly, diagram 101 in
On the other hand, diagram 102 generally represents an example of a modelling of the Doppler effect according to a possible approach. Such modelling may be performed based on an estimation (approximation) of the (theoretical) mathematical formula, for example.
Moreover, diagram 103 (solid line) generally shows an example of a possible modelling of the Doppler effect according to an embodiment of the present invention. As illustrated above, this modelling of the Doppler effect—achieved through the (predetermined) pitch factor modification function F—comprises a plurality of parameters (or in other words, takes a plurality of parameters as input), among which are the range of the pitch factor modification values supported by the signal processing unit (i.e., l) and the content creator setting (i.e., s), based on, e.g., the (conventional) modelling equation, real-world reference, artistic expectations for Doppler effect strength, etc. More particularly, the parameter l is generally responsible for controlling the (upper and/or lower) limit of the Doppler effect model, while the parameter s is generally responsible for controlling the slope (or in some cases, also referred to as “strength” or “aggressiveness”) of the Doppler effect model.
It is to be noted that in the specific example as shown in
As is clearly shown in the example of
As indicated above, any suitable form/formula (other than the one exemplified in equation (1)) may be used to implement the pitch factor modification function F, provided that at least the range/limitation parameter l and the user/content creator setting s are accounted for.
Nevertheless, it may be worthwhile to note a few properties that the pitch factor modification function F may need to fulfill in order to achieve performance (e.g., in terms of perceived audio quality) comparable to that of the above exemplified equation (1).
To be more specific, according to embodiments of the invention the pitch factor modification function F may have one or more properties of:
In some implementations, the pitch factor modification function F may have all of the above properties.
The above properties are also clearly reflected in diagram 103 which implements the pitch factor modification function F according to an embodiment of the present invention. Reference is now made to
In particular,
Specifically, in the examples as shown in
As mentioned above, the range/limit parameter l is generally set to be indicative of the range of the pitch factor modification values that are supported by the (underlying) signal processing unit (e.g., at the renderer side) for performing the Doppler modelling. Put differently, such parameter(s) l may also be seen as generally representing the (processing) capability (e.g., in terms of hardware and/or software capability) of the signal processing unit (or broadly speaking, of the renderer). Moreover, as has been noted above, the pitch factor modification function F may be typically implemented as a plugin (or encapsulated as some sort of library) that merely receives (e.g., from the encoding side) the range parameter(s) l along with the other inputs (e.g., the relative velocities, the content creator setting s, etc.). Thus, in some implementations, it may be possible that the so-received range parameter l unfortunately may not be supported (either completely or partially) by (the processing unit of) the renderer. An example of such a scenario may be that a mobile device (e.g., a mobile phone) with (relatively) limited processing (rendering) capability receives, along with the audio content for rendering, a range parameter l (e.g., {−8,8}) that has been set (e.g., by an encoder) originally to target for a (relatively) more powerful rendering device (e.g., a gaming console or a professional workstation). In such cases, it may be more practical for the mobile device to apply a default range parameter setting (e.g., {−4,8} or {−8,4}), instead of using the received unsupported parameter l (e.g., {−8,8}). Here, the default range parameter setting may for example be set by the manufacturer (of the mobile device) to more correctly reflect the actual processing (rendering) capability of that mobile device, in order to avoid unexpectedly and adversely affecting the rendering process.
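The fallback behavior described above, where a renderer substitutes a default range for a received but unsupported range parameter, can be sketched as follows; the per-limit clamping policy shown here is one plausible assumption, not a mandated behavior.

```python
def effective_range(received, supported_default):
    """Determine the range l = (ll, lh) to actually apply.

    If the received range exceeds the device's capability (as
    reflected by a manufacturer-set default), each limit is
    clamped to the supported default; otherwise the received
    range is used unchanged. Hypothetical policy for illustration.
    """
    ll, lh = received
    d_ll, d_lh = supported_default
    return (max(ll, d_ll), min(lh, d_lh))
```

For instance, a mobile device with default range {−4, 8} receiving l = {−8, 8} would apply {−4, 8}, while a received range already within capability passes through unchanged.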
On the other hand,
In the examples as shown in
In summary, by appropriately setting the slope/aggressiveness parameter s (possibly followed by encoding/encapsulating the parameter value, e.g., into a bitstream or other suitable format, and transmitting or communicating it to the user/decoder side device, or the renderer in general), the content creator (or a "game engine" in some possible implementations) generally has the freedom to control the modelling behavior of the Doppler effect as desired, e.g., ranging from no Doppler effect modelling at all to (near-) "real" (theoretical) Doppler effect modelling, or even over-emphasized Doppler effect modelling.
The present invention thus can be said to provide content creators with an additional degree of freedom relating to the modelling of the Doppler effect at the user side. As such, the present invention enables the content creator to selectively control or override Doppler effect modelling by the decoder/renderer in an object-specific manner. This is achieved by providing a set of parameter values to the user/decoder side device (including, eventually the actual renderer) in a suitable form. These parameter values may be encoded in a bitstream or may be provided to the renderer in any form suitable for or compatible with the renderer's data interfaces.
As an illustrative, non-limiting example, a use case of a VR scene having a flying jet with supersonic speed may be considered. If the actual laws of physics for Doppler effect modelling (e.g., corresponding to diagram 301 in
As another illustrative, non-limiting example, weaker Doppler effect modelling or possibly even no Doppler effect modelling at all may be applied for objects with relatively low speed or objects having speech (e.g., characters in a cartoon movie), while moderate or physically accurate Doppler effect modelling may be considered for others. For instance, considering a scene having a very fast audio object (such as a flying superhero, for example) that has speech attached thereto, it may be desirable to apply less or no Doppler effect modelling at all to the speech, but to apply at least some degree of Doppler effect modelling to the remaining sounds.
In particular, the method 400 may start at step S401 by obtaining (e.g., receiving) first parameter values of one or more first parameters indicative of an allowable range of pitch factor modification values. Subsequently, in step S402 the method 400 may comprise obtaining a second parameter value of a second parameter indicative of a desired strength of the to-be-modelled Doppler effect. The method 400 may then continue with step S403 by determining a pitch factor modification value based on a relative velocity between a listener and an audio source in the audio content, and the first and second parameter values, using a predefined pitch factor modification function. The pitch factor modification function may be predefined, e.g., pre-implemented as a plugin or library, in any suitable form in accordance with the above illustration with respect to
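Steps S401 through S403 of method 400 can be sketched as the following per-frame routine; the dictionary keys and the shape of the predefined function F are hypothetical stand-ins for whatever interface a concrete renderer exposes.

```python
def determine_pitch_modification(params, v, F):
    """Sketch of method 400, steps S401-S403.

    S401: obtain the first parameter values (allowable range l).
    S402: obtain the second parameter value (desired strength s).
    S403: map the relative velocity v to a pitch factor
          modification value using the predefined function F.
    """
    ll, lh = params["l"]   # S401: allowable range of modification values
    s = params["s"]        # S402: desired strength of the Doppler effect
    return F(v, ll, lh, s) # S403: resulting pitch factor modification value
```

The returned value would then be handed to the rendering stage (step S404) together with the audio signal of the source.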
As mentioned above, the proposed method 400 generally may make use of a predefined (or predetermined/pre-implemented) pitch factor modification function to map relative velocities into corresponding pitch factor modification values, for modelling the Doppler effect (e.g., when rendering the audio content in the 6DoF environment). During operation, in practice, when the proposed method 400 is performed, for example by an audio rendering device (e.g., an audio rendering device of an AR/VR device in the user side environment) where the predefined pitch factor modification function is being deployed, the audio rendering device may be configured to obtain the first and second parameter values corresponding to the first and second parameters, respectively. For example, the audio rendering device may obtain the first and second parameter values for the audio source on a frame basis, e.g., for each frame or for each key frame. Accordingly, the present disclosure provides for time-dependent and/or object-specific control of Doppler effect modelling.
The actual obtaining of the first and second parameter values may be performed in any suitable manner, depending on requirements and/or implementations. For instance, in some possible implementations, the first and second parameter values may be derived (e.g., decoded or extracted) from a bitstream that has been encoded and sent by an encoding device (for example as described in method 500 below in connection with
It is also to be noted that depending on how the actual audio content itself is encoded and/or transmitted, the decoding of the audio content (e.g., audio signal) at the user side may be performed in any suitable manner and at any appropriate timing before the final rendering (step S404) takes place, as will be understood and appreciated by the skilled person. As such, the decoding of the actual audio content (e.g., audio signal) is independent from the determination of the pitch factor modification value.
Configured as described above, the proposed method can provide an efficient and flexible mechanism for performing (e.g., controlling) the Doppler effect modelling when rendering the audio content for the 6DoF environment. At the same time, it takes into account the (allowable or acceptable) capabilities of the underlying signal processing unit (of the audio renderer) for the pitch factor modification (e.g., a high magnitude of pitch factor modification values for high relative velocities, singularity point, etc.), and also gives the possibility to control the (desired) pitch factor modification values (i.e., representing the desired strength/aggressiveness of the to-be-modelled Doppler effect) according to the intent of the content creator (in other words, the subjective listening experience), thereby improving the perceived listening experience (at the listener side, e.g., for a gamer in a VR environment).
Moreover, since the pitch factor modification function is already pre-implemented as a predefined function taking, among others, values of both the first and second parameters as input, there is generally no need to redesign (or re-implement) a new pitch factor modification function every time the rendering condition changes (e.g., a different renderer with different processing capabilities being deployed, a different audio content having been created by a different author and/or for a different scene, etc.). Rather, it is generally only necessary to communicate (e.g., using a bitstream coded at an encoding side) different first and second parameter values representing the corresponding allowable ranges (e.g., limits) of pitch factor modification values and the corresponding desired strength of the to-be-modelled Doppler effect, respectively. As such, in some possible scenarios, the predefined pitch factor modification function may be implemented as something as simple as a plugin or library (taking the first and second parameters as input for modelling the desired Doppler effect) that can be deployed in various software and/or platforms, or that can even be further customized if necessary, depending on various requirements and/or implementations. Thereby, unnecessary redesign/re-implementation of the pitch factor modification functions is avoided, further improving efficiency in the overall audio rendering process.
In particular, method 500 may start with step S501 by determining first parameter values of one or more first parameters indicative of an allowable range of pitch factor modification values. Subsequently, in step S502 the method 500 may comprise determining a second parameter value of a second parameter indicative of a desired strength of the to-be-modelled Doppler effect. Finally, the method 500 may in step S503 comprise encoding indications of the first and second parameter values. More particularly, the first and second parameter values can be used for mapping a relative velocity between a listener and an audio source of the audio content to a pitch factor modification value based on a predefined pitch factor modification function. As illustrated above, the pitch factor modification value may be used for rendering the audio source, and the predefined pitch factor modification function may have the first and second parameters and may be a function for mapping relative velocities to pitch factor modification values.
As will be understood and appreciated by the skilled person, the first and second parameter values may be encoded in any suitable manner. For instance, in some implementations, the first and second parameters may be encoded together with the audio content (e.g., audio signal) into a single bitstream, or into separate bitstreams. Also, depending on implementations and/or requirements, the first and second parameter values (possibly also with the audio content/signal) may be encoded into any suitable format, e.g., bitstream or data formats compatible with audio standards such as MPEG audio standards (e.g., the upcoming MPEG-I audio standard, etc.), with or without compression. In that case, the first and second parameter values may be encoded accordingly, e.g., as part of a header field, metadata, etc., as will be understood and appreciated by the skilled person. In some other possible cases, the first and second parameter values may be inserted (encapsulated) as plain variables (e.g., floating point numbers) into any suitable data format. The encoded bitstream(s) may also be transmitted or communicated to the user environment (e.g., comprising the decoding or rendering device) by using any suitable means, for example in wired or wireless manner.
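As a minimal, non-authoritative illustration of the "plain variables" case mentioned above, the three parameter values could be serialized as raw 32-bit floats; the format below is an assumption for illustration and is not a representation of any actual MPEG bitstream syntax.

```python
import struct

def encode_params(ll, lh, s):
    """Pack the first parameter values (range limits ll, lh) and the
    second parameter value (strength s) as three little-endian
    32-bit floats. A hypothetical, minimal stand-in for a metadata
    or header field of a real bitstream format."""
    return struct.pack("<3f", ll, lh, s)

def decode_params(payload):
    """Inverse of encode_params: recover (ll, lh, s)."""
    return struct.unpack("<3f", payload)
```

A real codec would of course embed these values within its own metadata framework (e.g., per audio object and per frame), possibly with quantization and entropy coding applied.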
Further, as mentioned above, the encoding method 500 as proposed in the present disclosure may be performed based on user input, for example by a content creator. In that case, the determination of the first parameter values at step S501 and/or the determination of the second parameter value at step S502 may be based on user input. Likewise, encoding method 500 may be performed by a (software-based) game engine (game control logic engine), depending on scenarios and/or implementations. In this case, the determinations at S501 and/or S502 would be performed in accordance with decision-routines of the game engine, for example based on a type and/or speed of the audio source.
To be more specific, in the case of user input from a content creator, in some possible implementations the content creator may obtain, by any suitable means, information indicative of the processing capability or profile of the target decoding/rendering device, in order to appropriately determine and set the values for the first parameters indicative of an allowable range of pitch factor modification values. In some possible implementations, it is also possible that the content creator may input a plurality of parameter sets for encoding (i.e., with respective first and second parameter values) for a respective plurality of target devices, with each one of the parameter sets comprising first and second parameter values targeted for a respective decoding/rendering device. In that case, the decoding/rendering device may simply pick or choose from the received parameter sets the respective first and second parameter values that best fit the decoding/rendering device (e.g., that best fit the profile or capability of the decoding/rendering device). In addition, the content creator may also need to, depending on various scenarios and/or implementations, determine whether to apply Doppler effect modelling at all, and if so, to what extent (e.g., as illustrated above with respect to
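The device-side selection among a plurality of received parameter sets could be sketched as follows; the selection policy (prefer the widest range that fits the device's capability, otherwise fall back to the narrowest available set) is a hypothetical choice for illustration.

```python
def pick_parameter_set(param_sets, device_range):
    """Choose, from a list of parameter sets (each a dict with a
    range "l" = (ll, lh) and a strength "s"), the set best fitting
    the device's supported range. Hypothetical policy:
    prefer the widest set that fits entirely within the device
    range; if none fits, fall back to the narrowest set."""
    d_ll, d_lh = device_range
    fitting = [p for p in param_sets
               if p["l"][0] >= d_ll and p["l"][1] <= d_lh]
    if fitting:
        # Widest fitting range gives the most expressive Doppler effect.
        return max(fitting, key=lambda p: p["l"][1] - p["l"][0])
    return min(param_sets, key=lambda p: p["l"][1] - p["l"][0])
```

With sets targeting {−8, 8} and {−4, 4}, a capable workstation would pick the former while a constrained mobile device supporting only {−4, 4} would pick the latter.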
On the other hand, in the case of a (software-based) game engine (e.g., for a VR/AR environment) performing the encoding task, the process is more or less the same as illustrated above with respect to user input from a (human) content creator, except that the role of the content creator is now taken over by the game engine. More specifically, it is now the game engine (or the developer(s) thereof) that may need to gain knowledge of the corresponding capability/profile of the rendering/decoding platform, and additionally to determine and control (e.g., by using any suitable logic/algorithm, machine learning, hard-coded rules, etc.) the slope/aggressiveness of the modelling of the Doppler effect as appropriate, depending on implementations and/or requirements. In some possible cases, mainly because in (real-time) rendering in a VR/AR environment the game engine (performing the encoding task) would typically be located together (e.g., in the same computer device or gaming console) with the decoding/rendering device/component, the values of the first and second parameters may not even have to be encoded into a bitstream, but may be communicated/transmitted to the decoding/rendering device/component in another suitable format (e.g., as plain variables, etc.). Also, depending on scenarios and/or implementations, the parameter values may be communicated periodically (e.g., on a frame basis) or on demand, or in any other suitable form.
Configured as described above, the proposed method can provide an efficient and flexible mechanism for encoding the parameters to be used for the Doppler effect modelling when rendering the audio content for the 6DoF environment. At the same time, it takes into account the (allowable or acceptable) capabilities of the underlying signal processing unit (at the audio renderer side) for the pitch factor modification (e.g., a high magnitude of pitch factor modification values for high relative velocities, singularity point, etc.), and also gives the possibility to control the (desired) pitch factor modification values (i.e., representing the desired strength of the Doppler effect) according to the intent of the content creator (in other words, the subjective listening experience), thereby improving the perceived listening experience (at the listener/user side).
Moreover, as noted above, since the pitch factor modification function is already implemented and deployed as a predefined function (at the renderer side) taking the first and second parameters as input, there is generally no need to redesign (or re-implement) a new pitch factor modification function every time the rendering condition changes (e.g., a different renderer with different processing capabilities being deployed, a different audio content having been created by a different person and/or for a different scene, etc.). Instead, the encoding side may simply communicate different first and second parameter values (e.g., encoded in a bitstream) representing the corresponding allowable ranges (limits) of pitch factor modification values and the corresponding desired strengths of the to-be-modelled Doppler effect, respectively. As such, in some possible scenarios, the predefined pitch factor modification function may be implemented as something as simple as a plugin (at the renderer side) that can be deployed in various software and/or platforms, or can even be further customized if needed, depending on various requirements and/or implementations. Thereby, unnecessary redesign/re-implementation of the pitch factor modification functions is avoided, further improving the efficiency in the overall audio rendering process.
It is finally noted that the minimum requirement(s) for the pitch factor modification function F recited above allow for a computationally simple implementation of this function, while still achieving realistic modelling of the Doppler effect. This may be particularly true for the explicit example for the pitch factor modification function F as given in equation (1) and equivalents thereof.
Now in
More particularly, as is reflected in
The present invention likewise relates to apparatuses for performing methods and techniques described throughout the present invention.
A computing device implementing the techniques described above can have the following example architecture. Other architectures are possible, including architectures with more or fewer components. In some implementations, the example architecture includes one or more processors (e.g., dual-core Intel® Xeon® Processors), one or more output devices (e.g., LCD), one or more network interfaces, one or more input devices (e.g., mouse, keyboard, touch-sensitive display) and one or more computer-readable mediums (e.g., RAM, ROM, SDRAM, hard disk, optical disk, flash memory, etc.). These components can exchange communications and data over one or more communication channels (e.g., buses), which can utilize various hardware and software for facilitating the transfer of data and control signals between components.
The term “computer-readable medium” refers to a medium that participates in providing instructions to processor for execution, including without limitation, non-volatile media (e.g., optical or magnetic disks), volatile media (e.g., memory) and transmission media.
Transmission media includes, without limitation, coaxial cables, copper wire and fiber optics. Computer-readable medium can further include operating system (e.g., a Linux operating system), network communication module, audio interface manager, audio processing manager and live content distributor. Operating system can be multi-user, multiprocessing, multitasking, multithreading, real time, etc. Operating system performs basic tasks, including but not limited to: recognizing input from and providing output to network interfaces and/or devices; keeping track and managing files and directories on computer-readable mediums (e.g., memory or a storage device); controlling peripheral devices; and managing traffic on the one or more communication channels. Network communications module includes various components for establishing and maintaining network connections (e.g., software for implementing communication protocols, such as TCP/IP, HTTP, etc.).
Architecture can be implemented in a parallel processing or peer-to-peer infrastructure or on a single device with one or more processors. Software can include multiple software components or can be a single body of code.
The described features can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. A computer program is a set of instructions that can be used, directly or indirectly, in a computer to perform a certain activity or bring about a certain result. A computer program can be written in any form of programming language (e.g., Objective-C, Java), including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, a browser-based web application, or other unit suitable for use in a computing environment.
Suitable processors for the execution of a program of instructions include, by way of example, both general and special purpose microprocessors, and the sole processor or one of multiple processors or cores, of any kind of computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memories for storing instructions and data. Generally, a computer will also include, or be operatively coupled to communicate with, one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).
To provide for interaction with a user, the features can be implemented on a computer having a display device such as a CRT (cathode ray tube) or LCD (liquid crystal display) monitor or a retina display device for displaying information to the user. The computer can have a touch surface input device (e.g., a touch screen) or a keyboard and a pointing device such as a mouse or a trackball by which the user can provide input to the computer. The computer can have a voice input device for receiving voice commands from the user.
The features can be implemented in a computer system that includes a back-end component, such as a data server, or that includes a middleware component, such as an application server or an Internet server, or that includes a front-end component, such as a client computer having a graphical user interface or an Internet browser, or any combination of them. The components of the system can be connected by any form or medium of digital data communication such as a communication network. Examples of communication networks include, e.g., a LAN, a WAN, and the computers and networks forming the Internet.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data (e.g., an HTML page) to a client device (e.g., for purposes of displaying data to and receiving user input from a user interacting with the client device). Data generated at the client device (e.g., a result of the user interaction) can be received from the client device at the server.
A system of one or more computers can be configured to perform particular actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes or cause the system to perform the actions. One or more computer programs can be configured to perform particular actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Unless specifically stated otherwise, as apparent from the following discussions, it is appreciated that throughout the present invention discussions utilizing terms such as “processing”, “computing”, “calculating”, “determining”, “analyzing” or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing devices, that manipulate and/or transform data represented as physical, such as electronic, quantities into other data similarly represented as physical quantities.
Reference throughout this invention to “one example embodiment”, “some example embodiments” or “an example embodiment” means that a particular feature, structure or characteristic described in connection with the example embodiment is included in at least one example embodiment of the present invention. Thus, appearances of the phrases “in one example embodiment”, “in some example embodiments” or “in an example embodiment” in various places throughout this invention are not necessarily all referring to the same example embodiment. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to one of ordinary skill in the art from this invention, in one or more example embodiments.
As used herein, unless otherwise specified, the use of the ordinal adjectives “first”, “second”, “third”, etc., to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner. Also, it is to be understood that the phraseology and terminology used herein are for the purpose of description and should not be regarded as limiting. The use of “including”, “comprising”, or “having” and variations thereof is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. Unless specified or limited otherwise, the terms “mounted”, “connected”, “supported”, and “coupled” and variations thereof are used broadly and encompass both direct and indirect mountings, connections, supports, and couplings.
In the claims below and the description herein, any one of the terms “comprising”, “comprised of” or “which comprises” is an open term that means including at least the elements/features that follow, but not excluding others. Thus, the term “comprising”, when used in the claims, should not be interpreted as being limitative to the means or elements or steps listed thereafter. For example, the scope of the expression “a device comprising A and B” should not be limited to devices consisting only of elements A and B. Any one of the terms “including” or “which includes” or “that includes” as used herein is also an open term that likewise means including at least the elements/features that follow the term, but not excluding others. Thus, “including” is synonymous with and means “comprising”.
It should be appreciated that in the above description of example embodiments of the present invention, various features of the present invention are sometimes grouped together in a single example embodiment, figure, or description thereof for the purpose of streamlining the present disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claims require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed example embodiment. Thus, the claims following the Description are hereby expressly incorporated into this Description, with each claim standing on its own as a separate example embodiment of this invention.
Furthermore, while some example embodiments described herein include some but not other features included in other example embodiments, combinations of features of different example embodiments are meant to be within the scope of the present invention, and form different example embodiments, as would be understood by those skilled in the art. For example, in the following claims, any of the claimed example embodiments can be used in any combination.
In the description provided herein, numerous specific details are set forth. However, it is understood that example embodiments of the present invention may be practiced without these specific details. In other instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description. Thus, while there have been described what are believed to be the best modes of the present invention, those skilled in the art will recognize that other and further modifications may be made thereto without departing from the spirit of the present invention, and it is intended to claim all such changes and modifications as fall within the scope of the present invention. For example, any formulas given above are merely representative of procedures that may be used. Functionality may be added to or deleted from the block diagrams, and operations may be interchanged among functional blocks. Steps may be added to or deleted from the methods described, within the scope of the present disclosure.
Various aspects of the present invention may be appreciated from the following enumerated example embodiments (EEEs):
Number | Date | Country | Kind
---|---|---|---
21205769.9 | Nov. 2021 | EP | regional

This application claims priority of the following priority application: U.S. provisional application 63/273,185 (reference: D21092USP1), filed 29 Oct. 2021.

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/EP2022/080117 | 10/27/2022 | WO |

Number | Date | Country
---|---|---
63273185 | Oct. 2021 | US