The present disclosure is directed to a method for acoustic scene playback and an apparatus for acoustic scene playback.
In classical recording technologies, a surround image of spatial audio scenes, also called acoustic scenes or sound scenes, is captured and reproduced at a single listener's perspective in an original sound scene. Single-perspective recordings are typically achieved by stereophonic (channel-based) recording and reproduction technologies or Ambisonic (scene-based) recording and reproduction technologies. The emerging possibilities of interactive audio displays and the generalization of audio transmission media away from cassettes or CDs to more flexible media allow for a more dynamic usage of audio, e.g. interactive client-side audio rendering of multi-channel data, or server-side rendering and transmission of individually pre-rendered audio streams for clients. While already common in gaming, the aforementioned technologies are seldom used for the reproduction of recorded audio scenes.
So far, traversing a sound scene in reproduction has been implemented only by audio rendering based on individually isolated recordings of the involved sounds and additional recordings or rendering of reverberation (object-based). By changing the arrangement of the recorded sources, the playback perspective at the reproduction side could be adapted.
Furthermore, another possibility is to extrapolate a parallax adjustment to create an impression of perspective change from a single-perspective recording by re-mapping a directional audio coding. This is done by assuming that source positions are obtained by projecting their directions onto a convex hull. This arrangement relies on time-variant signal filtering using the spectral disjointness assumption for direct/early sounds. However, this can cause signal degradation. Furthermore, the assumption that sources are positioned on a convex hull only works for small position changes.
Therefore, the prior art suffers from the limitation that when object-based audio rendering is used to render a walkthrough, explicit knowledge of the room properties, the source locations and the properties of the sources themselves is required. Furthermore, obtaining an object-based representation from a real scene is a difficult task and requires either many microphones close to all desired sources, or source separation techniques to extract the individual sources from a mix. As a result, object-based solutions are only practical for synthetic scenes and cannot be used for achieving a high-quality walkthrough in real acoustic scenes.
The present disclosure overcomes these deficiencies of the prior art by allowing a virtual listening position for audio playback to be varied continuously within a real, recorded acoustic scene during playback of the sound of the acoustic scene at that virtual listening position. The present disclosure thereby solves the problem of providing an improved method and apparatus for acoustic scene playback. Advantageous implementation forms of the present disclosure are provided in the respective dependent claims.
In a first aspect, a method for acoustic scene playback is provided, wherein the method comprises:
providing recording data comprising microphone signals of one or more microphone setups positioned within an acoustic scene and microphone metadata of the one or more microphone setups, wherein each of the one or more microphone setups comprises one or more microphones and has a recording spot which is a center position of the respective microphone setup;
specifying a virtual listening position, wherein the virtual listening position is a position within the acoustic scene;
assigning each microphone setup of the one or more microphone setups one or more Virtual Loudspeaker Objects, VLOs, wherein each VLO is an abstract sound output object within a virtual free field;
generating an encoded data stream based on the recording data, the virtual listening position and VLO parameters of the VLOs assigned to the one or more microphone setups;
decoding the encoded data stream based on a playback setup, thereby generating a decoded data stream; and
feeding the decoded data stream to a rendering device, thereby driving the rendering device to reproduce sound of the acoustic scene at the virtual listening position.
The virtual free field is an abstract (i.e. virtual) sound field that consists of direct sound without reverberant sound. Virtual means modelled or represented on a machine, e.g., on a computer, or on a system of interacting computers. The acoustic scene is a spatial region together with the sound in that spatial region and may alternatively be referred to as a sound field or spatial audio scene. Further, the rendering device can be one or more loudspeakers and/or one or more headphones. Therefore, a listener listening to the reproduced sound of the acoustic scene at the virtual listening position is enabled to change the desired virtual listening position and virtually traverse the acoustic scene. In this way, the listener is enabled to newly experience or re-experience an entire acoustic venue, for example, a concert. The user can walk through the entire acoustic scene and listen from any point in the scene. The user can thus explore the entire acoustic scene in an interactive manner by determining and inputting a desired position within the acoustic scene and can then listen to the sound of the acoustic scene at the selected position. For example, in a concert, the user can choose to listen from the back, within the crowd, right in front of the stage or even on the stage surrounded by the musicians. Furthermore, applications in virtual reality (VR) that extend from rotation only to also enabling translation are conceivable. In embodiments of the present disclosure, only the recording positions and the virtual listening positions have to be known. Therefore, in the present disclosure no information concerning the acoustic sources (for example the musicians), such as their number, positions or orientations, is required. In particular, due to the usage of the virtual loudspeaker objects, VLOs, the spatial distribution of sound sources is inherently encoded without the need to estimate the actual positions. Further, the room properties, such as reverberation, are also inherently encoded, and driving signals for driving the VLOs are used that do not correspond to source signals, thus eliminating the need to record or estimate the actual source signals. The driving signals are derived from the microphone signals by data-independent linear processing. Further, embodiments of the present disclosure are computationally efficient and allow for both real-time encoding and rendering. Hence, the listener is enabled to interactively change the desired virtual listening position and virtually traverse the (recorded) acoustic scene (e.g. a concert). Due to the computational efficiency of the disclosure, the acoustic scene can be streamed to a far-end, for example, the playback apparatus, in real time. The present disclosure does not rely on prior information about the number or position of sound sources. Similar to classical single-perspective stereophonic or surround recording techniques, all source parameters can be inherently encoded and need not be estimated. Contrary to object-based audio approaches, source signals need not be isolated, thus avoiding the need for close microphones and audible artefacts due to source signal separation.
Virtual Loudspeaker Objects (VLOs) can be implemented on a computer; for example, as objects in an object-based spatial audio layer. Each VLO can represent a mixture of sources, early reflections, and diffuse sound. In this context, a source is a localized acoustic source such as an individual person speaking or singing, or a musical instrument, or a physical loudspeaker. Generally, a union of several (i.e. two or more) VLOs will be required to reproduce an acoustic scene.
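For illustration only, the following is a minimal Python sketch of how a VLO might be represented in code. The class name, fields, and default values are assumptions for this sketch and are not prescribed by the disclosure.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class VirtualLoudspeakerObject:
    """One abstract sound output object placed in the virtual free field."""
    position: Tuple[float, float]        # (x, y) position in the virtual free field [m]
    recording_spot: Tuple[float, float]  # center of the microphone setup it belongs to [m]
    gain: float = 1.0                    # dynamic gain g_ij (depends on the listening position)
    delay_s: float = 0.0                 # dynamic propagation delay tau_ij [s]
    alpha: float = 1.0                   # weight of the omni-directional radiation part
    beta: float = 1.0                    # controls the figure-of-eight part / selectivity
```

The static parameters (position, arrangement) would be derived once from the microphone metadata, while gain, delay, and the directivity controls are updated whenever the virtual listening position changes.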
In a first implementation form of the method according to the first aspect, the method further comprises, after the assigning of one or more VLOs to each microphone setup, positioning, for each microphone setup, the one or more VLOs within the virtual free field at a position corresponding to the recording spot of the respective microphone setup within the acoustic scene.
This contributes to virtually setting up a virtual reproduction system consisting of the VLOs for each recording spot in one common virtual free field. Therefore, these features of the first implementation form contribute to arriving at an arrangement in which a user can vary the virtual listening position for audio playback within a real, recorded acoustic scene during playback of the signal corresponding to the chosen virtual listening position.
In a second implementation form of the method according to the first aspect, the VLO parameters comprise one or more static VLO parameters which are independent of the virtual listening position and describe properties, which are fixed for the acoustic scene playback, of the one or more VLOs.
Therefore, the VLO parameters of the VLOs within the virtual free field describe properties of the VLOs which are fixed for a specific playback setup arrangement, which contributes to adequately setting up a reproduction system in the virtual free field and describing the properties of the VLOs within it. The playback setup arrangement refers, for example, to the properties of the playback apparatus itself, such as whether playback is done using loudspeakers provided within a room or using headphones.
In a third implementation form of the method according to the first aspect, the method further comprises, before generating the encoded data stream, computing the one or more static VLO parameters based on the microphone metadata and/or a critical distance, wherein the critical distance is a distance at which a sound pressure level of the direct sound and a sound pressure level of the reverberant sound are equal for a directional source, or, before generating the encoded data stream, receiving the one or more static VLO parameters from a transmission apparatus.
The static VLO parameters can thus be calculated within the playback apparatus or can be received from elsewhere, e.g., from a transmission apparatus. Furthermore, since the static VLO parameters take into account the microphone metadata and/or the critical distance, they take into account the conditions at the time when the acoustic scene was recorded, so that the sound corresponding to a given virtual listening position can be played back by the playback apparatus as realistically as possible.
In a fourth implementation form of the method according to the first aspect, the one or more static VLO parameters include for each of the one or more microphone setups: a number of VLOs, and/or a distance of each VLO to the recording spot of the respective microphone setup, and/or an angular layout of the one or more VLOs that have been assigned to the respective microphone setup (e.g., with respect to an orientation of the one or more microphones of the respective microphone setup), and/or a mixing matrix Bi which defines a mixing of the microphone signals of the respective microphone setup.
Accordingly, these static VLO parameters are parameters which are fixed for a certain acoustic scene playback and do not change during playback of the acoustic scene and which do not depend on the chosen virtual listening position.
In a fifth implementation form of the method according to the first aspect, the VLO parameters comprise one or more dynamic VLO parameters which depend on the virtual listening position and the method comprises, before generating the encoded stream, computing the one or more dynamic VLO parameters based on the virtual listening position, or receiving the one or more dynamic VLO parameters from a transmission apparatus.
Thus not only the static VLO parameters, but also the dynamic VLO parameters can be easily generated within the playback apparatus or can be received from a separate (e.g., distant) transmission apparatus. Furthermore, the dynamic VLO parameters depend on the chosen virtual listening position, so that the sound played back will depend on the chosen virtual listening position via the dynamic VLO parameters.
In a sixth implementation form of the method according to the first aspect the one or more dynamic VLO parameters include for each of the one or more microphone setups: one or more VLO gains, wherein each VLO gain is a gain of a control signal of a corresponding VLO, and/or one or more VLO delays, wherein each VLO delay is a time delay of an acoustic wave propagating from the corresponding VLO to the virtual listening position, and/or one or more VLO incident angles, wherein each VLO incident angle is an angle between a line connecting the recording spot and the corresponding VLO and a line connecting the corresponding VLO and the virtual listening position, and/or one or more parameters indicating a radiation directivity of the corresponding VLO.
By the provision of the VLO gains, a proximity regularization can be performed by regulating the gain dependent on the distance between the VLO corresponding to the VLO gain and the virtual listening position. Further, a direction dependency can be ensured, since the VLO gain can depend on the virtual listening position relative to the position of the VLO within the virtual free field. Therefore, a much more realistic sound impression can be delivered to the listener. Further, the VLO delays, VLO incident angles and parameters indicating the radiation directivity also contribute to arriving at a realistic sound impression.
In a seventh implementation form of the method according to the first aspect, the method further comprises, before generating the encoded data stream, computing an interactive VLO format comprising, for each recording spot and for each VLO assigned to the recording spot, a resulting signal $\tilde{x}_{ij}(t)$ and an incident angle $\varphi_{ij}$, with $\tilde{x}_{ij}(t) = g_{ij}\, x_{ij}(t - \tau_{ij})$, wherein $g_{ij}$ is a gain factor of a control signal $x_{ij}$ of the j-th VLO of the i-th recording spot, $\tau_{ij}$ is a time delay of an acoustic wave propagating from the j-th VLO of the i-th recording spot to the virtual listening position, and t indicates time, wherein the incident angle $\varphi_{ij}$ is an angle between a line connecting the i-th recording spot and the j-th VLO of the i-th recording spot and a line connecting the j-th VLO of the i-th recording spot and the virtual listening position. Therefore, a certain interactive VLO format can effectively be used as input for the encoding, so that this interactive VLO format helps to perform the encoding effectively.
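As an illustration only, a minimal Python sketch of the resulting-signal computation for sampled signals follows. The function name, the sampling-rate parameter, and the rounding of the delay to whole samples are assumptions of this sketch; the disclosure does not prescribe a discrete-time implementation.

```python
import numpy as np

def resulting_signal(x: np.ndarray, gain: float, delay_s: float, fs: int) -> np.ndarray:
    """Compute x~_ij(t) = g_ij * x_ij(t - tau_ij) for a sampled control signal:
    shift x by the delay (rounded to whole samples here; a fractional-delay
    filter could be used instead) and scale by the gain."""
    n = int(round(delay_s * fs))                   # delay in samples
    delayed = np.concatenate([np.zeros(n), x])[:len(x)]
    return gain * delayed
```

For example, `resulting_signal(x, gain=0.7, delay_s=0.02, fs=48000)` would delay a control signal by 20 ms and attenuate it, as would occur for a VLO several meters from the virtual listening position.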
In an eighth implementation form of the method according to the first aspect the gain factor gij depends on the incident angle φij and a distance dij between the j-th VLO of the i-th recording spot and the virtual listening position.
Therefore, proximity regularization is possible in case the virtual listening position is close to a corresponding VLO; furthermore, the direction dependency can be ensured, so that the gain factor accounts for both the proximity regularization and the direction dependency.
In a ninth implementation form of the method according to the first aspect, for generating the encoded data stream, each resulting signal $\tilde{x}_{ij}(t)$ and incident angle $\varphi_{ij}$ is input to an encoder, in particular an Ambisonic encoder.
Therefore, a prior-art Ambisonic encoder can be used, wherein specific signals are fed into the Ambisonic encoder for encoding, namely each resulting signal $\tilde{x}_{ij}(t)$ and incident angle $\varphi_{ij}$, for arriving at the above-mentioned effects with respect to the first aspect. Therefore, the present disclosure according to the first aspect or any implementation form also provides a very simple and inexpensive arrangement in which prior-art Ambisonic encoders can be used for realizing the present disclosure.
In a tenth implementation form of the method according to the first aspect, for each of the one or more microphone setups, the one or more VLOs assigned to the respective microphone setup are provided on a circular line having the recording spot of the respective microphone setup as a center of the circular line within the virtual free field, and a radius Ri of the circular line depends on a directivity order of the microphone setup, a reverberation of the acoustic scene and an average distance di between the recording spot of the respective microphone setup and recording spots of neighboring microphone setups.
The VLOs can thus be effectively arranged within the virtual free field, which provides a very simple arrangement for obtaining the effects of the present disclosure.
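A minimal Python sketch of such a circular arrangement follows, purely for illustration. The uniform angular spacing and the `start_angle` parameter are assumptions of this sketch; the disclosure only requires that the VLOs lie on a circle of radius Ri around the recording spot.

```python
import numpy as np

def place_vlos_on_circle(spot_xy, radius, num_vlos, start_angle=0.0):
    """Place num_vlos VLOs uniformly on a circle of the given radius around
    the recording spot; start_angle rotates the whole arrangement (e.g. to
    avoid overlap with neighboring arrangements)."""
    angles = start_angle + 2.0 * np.pi * np.arange(num_vlos) / num_vlos
    return [(spot_xy[0] + radius * np.cos(a),
             spot_xy[1] + radius * np.sin(a)) for a in angles]
```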
In an eleventh implementation form of the method according to the first aspect, a number of VLOs on the circular line, and/or an angular location of each VLO on the circular line, and/or a directivity of the acoustic radiation of each VLO on the circular line depends on a microphone directivity order of the respective microphone setup and/or on a recording concept of the respective microphone setup and/or on the radius Ri of the circular line around the recording spot of the i-th microphone setup and/or on a distance dij between a j-th VLO of the i-th microphone setup and the virtual listening position.
These features contribute to generating a realistic sound impression for the listener and contribute to all advantages already mentioned above with respect to the first aspect.
In a twelfth implementation form of the method according to the first aspect, for providing the recording data, the recording data are received from outside (i.e. from outside the apparatus in which the VLOs are implemented), in particular by applying streaming.
This enables the recording data to be received rather than generated within the playback apparatus, for example from a corresponding transmission apparatus which records a certain acoustic scene, for example a concert, and supplies the recorded data to the playback apparatus in a live stream. Subsequently, the playback apparatus can then perform the herewith provided method for acoustic scene playback. Therefore, in the present disclosure a live stream of the acoustic scene, for example a concert, can be enabled. The VLO parameters in the present disclosure can be adjusted in real time dependent on the chosen virtual listening position. Therefore, the present disclosure is computationally efficient and allows for both real-time encoding and rendering. Hence, the listener is enabled to interactively change the desired virtual listening position and virtually traverse the recorded acoustic scene. Due to the computational efficiency of the present disclosure, an acoustic scene can be streamed to the playback apparatus in real time.
In a thirteenth implementation form of the method according to the first aspect, for providing the recording data, the recording data are fetched from a recording medium, in particular from a CD-ROM.
This is a further possibility for providing the recording data to the playback apparatus, namely by inserting a CD-ROM into the playback apparatus, wherein the recording data are fetched from this CD-ROM and therefore provided for the acoustic scene playback.
According to a second aspect a playback apparatus or a computer program or both are provided. The playback apparatus is configured to perform a method according to the first aspect (in particular, according to any of its implementation forms). The computer program may be provided on a data carrier and can instruct the playback apparatus to perform a method according to the first aspect (in particular, according to any of its implementation forms) when the computer program is run on a computer.
Generally, it has to be noted that all arrangements, devices, elements, units and means and so forth, described in the present application, could be implemented by software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application as well as the functionality described to be performed by the various entities are intended to mean that the respective entity is adapted or configured to perform the respective steps and functionalities. Even if in the following description of specific embodiments a specific functionality or step to be performed by a general entity is not reflected in the description of a specific detailed element of that entity, which performs that specific step or functionality, it should be clear for a skilled person that these elements can be implemented in respective hardware or software elements or any kind of combination thereof. Further, the method of the present disclosure and its various steps are embodied in the functionalities of the various described apparatus elements.
Subsequently, in step 210, the virtual listening position can be specified. The virtual listening position is a position within the acoustic scene. The specifying of the virtual listening position can, for example, be done by a user using the playback apparatus. For example, the user may be enabled to specify the virtual listening position by typing a specific virtual listening position into the playback apparatus. However, specifying the virtual listening position is not restricted to this example and could also be done in an automated manner without manual input from the listener. For example, it is conceivable that the virtual listening positions are read from a CD-ROM or fetched from a storage unit and are therefore not manually determined by any listener.
Furthermore, in a subsequent step 220, each microphone setup of the one or more microphone setups can be assigned one or more virtual loudspeaker objects, VLOs. Each microphone setup comprises (or defines) a recording spot which is a center position of the microphone setup. Each VLO is an abstract sound output object within a virtual free field. The virtual free field is an abstract sound field consisting of direct sound without reverberant sound. This method step 220 contributes to the advantages of the embodiments of the present disclosure by virtually setting up a reproduction system comprising the VLOs for each recording spot in the virtual free field. In the embodiments of the present disclosure, the desired effect, i.e. reproducing sound of the acoustic scene at the desired virtual listening position, is obtained using virtual loudspeaker objects, VLOs. These VLOs are abstract sound objects that are placed in the virtual free field.
In a step 230, an encoded data stream is generated (e.g., in a playback phase after a recording phase) based on the recording data, the virtual listening position and VLO parameters of the VLOs assigned to the one or more microphone setups. The encoded data stream may be generated by virtually driving, for each of the one or more microphone setups, the one or more VLOs assigned to the respective microphone setup so that these one or more VLOs virtually reproduce the sound that was recorded by the respective microphone setup. The virtual sound at the virtual listening position may then be obtained by superposing (i.e. by forming a linear combination of) the virtual sound from all the VLOs of the method (i.e. from the VLOs of all the microphone setups) at the virtual listening position.
In step 240, the encoded data stream is decoded based on a playback setup, thereby generating a decoded data stream. In this context, the playback setup can be a setup corresponding to a loudspeaker array arranged, for example, in a certain room in a home where the listener wants to listen to sound corresponding to the virtual listening position, or headphones, which the listener wears when listening to the sound of the acoustic scene at the virtual listening position.
Furthermore, this decoded data stream can then, in a step 250, be fed to a rendering device, thereby driving the rendering device to reproduce sound of the acoustic scene at the virtual listening position. The rendering device can be one or more loudspeakers and/or headphones.
Therefore, it is possible to allow a user of a certain playback apparatus to vary a desired virtual listening position for (3D) audio playback within a real, recorded acoustic scene. For example, a user is thus enabled to walk through the entire acoustic scene and listen from any point in the scene. Accordingly, the user can explore the entire acoustic scene in an interactive manner by inputting the desired virtual listening position in a playback apparatus. In the embodiment of the present disclosure described here, the radius of the circular VLO arrangement around the i-th recording spot is given by

$$R_i = c_0 \,\max(d_i, 3\,\mathrm{m})$$
Here, c0 is a design parameter that depends on the directivity order of the microphone and on the reverberation of the recording room (in particular on the critical distance rH, the distance at which the sound pressure levels of the direct sound and the reverberant sound are equal for a directional source). For a microphone directivity order N=0, c0 is 0; for a microphone directivity order N≥1, c0 is 0.4 for a reverberant room (low rH≤1 m), 0.5 for an "average" room (rH≈2 m), and 0.6 for a dry room (rH≥3 m). The number Li of virtual loudspeakers for the signals of the microphone array at the i-th recording spot, the angular location of the individual virtual loudspeaker objects, as well as the virtual loudspeaker directivity control depend on the microphone directivity order Ni, on the channel- or scene-based recording concept of the microphone array, on the radius Ri of the arrangement of the virtual loudspeakers around the end point of the vector ri, and furthermore on the distance dij between the j-th VLO of the i-th recording spot and the virtual listening position.
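A short Python sketch of this radius computation follows, for illustration. The text only names three discrete room cases, so the thresholding between them used here is an assumption, as are the function names.

```python
def design_parameter_c0(directivity_order: int, r_h: float) -> float:
    """c0 per the cases above: 0 for directivity order 0; for order >= 1 it is
    0.4 for a reverberant room (r_H <= 1 m), 0.5 for an average room
    (r_H around 2 m) and 0.6 for a dry room (r_H >= 3 m)."""
    if directivity_order == 0:
        return 0.0
    if r_h <= 1.0:
        return 0.4
    if r_h >= 3.0:
        return 0.6
    return 0.5  # "average room"

def vlo_circle_radius(c0: float, d_i: float) -> float:
    """R_i = c0 * max(d_i, 3 m), with d_i the average distance to the
    recording spots of neighboring microphone setups (in meters)."""
    return c0 * max(d_i, 3.0)
```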
Further, for a directivity order Ni=0 and a single microphone, Li=1 for the i-th recording spot, and no directivity control of the virtual acoustic wave is provided (omni-directional pattern). In this case, the virtual loudspeaker object is provided at the recording position of the single microphone.
Furthermore, for the case of having Ni≥1 one has to decide between two cases, namely a channel-based microphone array and a scene-based microphone array:
Otherwise, instead of the default adjustment, whenever there is a standard loudspeaker layout for a channel-based microphone array setup, this layout is used for positioning the VLOs on the circle of radius Ri for the i-th recording spot. This can be the case for ORTF with a playback loudspeaker pair dedicated to the two-channel stereo directions of ±110°.
Further, for scene-based microphone arrays (Ambisonic microphone arrays), the arrangements of the VLOs might potentially overlap in the virtual free field. To avoid this, each arrangement of VLOs assigned to a corresponding recording spot is rotated with respect to the other arrangements of VLOs in the virtual free field, so that the minimal distance between neighboring arrangements of VLOs becomes maximal.
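One way to realize this rotation is a simple grid search, sketched below under stated assumptions: it reuses the hypothetical `place_vlos_on_circle` helper sketched earlier, and the grid-search strategy itself is an illustrative choice, not prescribed by the disclosure.

```python
import numpy as np

def best_rotation(fixed_vlos, spot_xy, radius, num_vlos, steps=180):
    """Grid search over start angles of one circular VLO arrangement so that
    its minimal distance to the already placed VLOs becomes maximal; by
    symmetry only one angular period (2*pi / num_vlos) must be searched."""
    best_angle, best_min_dist = 0.0, -np.inf
    for a in np.linspace(0.0, 2.0 * np.pi / num_vlos, steps, endpoint=False):
        candidate = place_vlos_on_circle(spot_xy, radius, num_vlos, start_angle=a)
        d = min(np.hypot(cx - fx, cy - fy)
                for (cx, cy) in candidate for (fx, fy) in fixed_vlos)
        if d > best_min_dist:
            best_min_dist, best_angle = d, a
    return best_angle
```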
In this way, the positions of the VLOs corresponding to the respective recording spots can be determined within the virtual free field.
In this context, the function f(φij, dij) is shown exemplarily in the corresponding figure, wherein dmin indicates the start of the linear interpolation towards dij=0 for φij=0°, and dmin2 indicates a limit of the linear interpolation, which is provided in the interval from dmin2 to dmin for φij=180°. The first term of f indicates the distance regularization, and the second term, $\alpha + (1-\alpha)\cos\varphi_{ij}$, indicates the direction dependency of the virtual acoustic waves radiated by the corresponding VLO.
The radiation characteristics of each VLO can be adjusted so that the interactive directivity (depending on the virtual listening position) distinguishes between "inside" and "outside" within an arrangement of VLOs corresponding to a given microphone setup, in such a way that the signal amplitude towards the dominant "outside" is reduced in order to avoid dislocation at the diffuse far field. Furthermore, the directivity is formulated as a mix of omni-directional and figure-of-eight directivity patterns with controllable order, e.g. of the form

$$\left(\alpha + (1-\alpha)\cos\varphi_{ij}\right)^{\beta},$$

wherein α and β indicate parameters with which the direction dependency of a virtual acoustic wave radiated by the corresponding VLO is calculated. Here, α determines the weight of the omni-directional radiation and β determines the weight of the figure-of-eight directivity pattern of the above-mentioned expression. Directivity patterns in the shape of hemispheric Slepian functions are also conceivable. In particular, for a large distance dij between the virtual loudspeaker object and the virtual listening position, the backwards amplitude of each VLO can be lowered by controlling α. An implementation example would be that the backwards amplitude for the corresponding VLO is α=1 for dij≤1 m and α=0 for dij≥3 m, with linear interpolation in between. Furthermore, the exponent β controls the selectivity between inside and outside at great distances dij between the virtual listening position and the j-th VLO of the i-th recording spot, such that localization mismatch or an unnecessarily diffuse appearance of distant acoustic sources is minimized. An implementation example would be β=1 for dij≤3 m and β=2 for dij≥6 m, with linear interpolation in between. In this way, recording positions that, due to their orientation, cannot be part of a common acoustic convex hull of a distant or diffuse audio scene are suppressed.
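For illustration, a minimal Python sketch of this distance-controlled direction dependency follows. It uses the implementation-example interpolation ranges quoted above; the closed form of the pattern is the reconstruction given above (an assumption based on the surrounding description), and the clipping of the pattern to non-negative values is an additional assumption to keep the non-integer exponent well-defined.

```python
import numpy as np

def interp_clamp(d: float, d0: float, v0: float, d1: float, v1: float) -> float:
    """Linearly interpolate a parameter between value v0 at distance d0 and
    value v1 at distance d1, clamping to the end values outside [d0, d1]."""
    if d <= d0:
        return v0
    if d >= d1:
        return v1
    return v0 + (v1 - v0) * (d - d0) / (d1 - d0)

def vlo_direction_gain(phi_ij: float, d_ij: float) -> float:
    """Direction-dependent gain (alpha + (1 - alpha)*cos(phi))**beta, with
    alpha and beta controlled by the VLO-to-listener distance as in the
    implementation examples above."""
    alpha = interp_clamp(d_ij, 1.0, 1.0, 3.0, 0.0)   # omni weight: 1 near, 0 far
    beta = interp_clamp(d_ij, 3.0, 1.0, 6.0, 2.0)    # selectivity exponent
    pattern = alpha + (1.0 - alpha) * np.cos(phi_ij)
    return max(pattern, 0.0) ** beta                 # clip to avoid a negative base
```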
An example of performing method step 229, i.e. generating the interactive VLO format, proceeds as follows. For each recording spot, the VLO control signals are obtained from the microphone signals by a data-independent linear mixing:
$$x_i(t) = B_i\, s_i(t),$$

where $x_i(t) = [x_{i1}(t), x_{i2}(t), \ldots, x_{iL_i}(t)]^{\top}$ is the vector of the control signals of the $L_i$ VLOs of the i-th recording spot, $s_i(t)$ is the vector of the microphone signals of the i-th microphone setup, and $B_i$ is the mixing matrix of the i-th microphone setup which defines the mixing of these microphone signals.
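This mixing is a single matrix multiplication per recording spot, as the following minimal sketch shows; the function name and array shapes are assumptions of the sketch.

```python
import numpy as np

def vlo_control_signals(B_i: np.ndarray, s_i: np.ndarray) -> np.ndarray:
    """x_i(t) = B_i s_i(t): map the microphone signals of the i-th setup
    (s_i, shape (M_i, T) for M_i microphones and T samples) to the L_i VLO
    control signals (shape (L_i, T)) via the static mixing matrix B_i of
    shape (L_i, M_i). The processing is linear and data-independent."""
    return B_i @ s_i
```

For a single omni microphone driving a single VLO, B_i would simply be the 1x1 identity.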
In the case that the virtual listening position is at the center position of a recording spot (recording position), the signals of the virtual loudspeaker objects join free of disturbing interference: typical acoustic delays are between 10 and 50 ms. Together with distance-related attenuation, a mix of signals that are thereby uncorrelated in the audio-technical sense will not yield any disturbing timbral interference. Furthermore, the precedence effect supports proper localization at all recording positions. Furthermore, in the case of a few virtual loudspeaker objects per playback spot in the virtual free field, the multitude of other playback spots supports localization and room impression.
However, for the case that the virtual listening position is off the center position of any recording spot, potential localization confusion can be avoided by adjusting the position, gain and delay of the corresponding virtual loudspeaker objects depending on the virtual listening position. Furthermore, interferences are reduced by choosing suitable distances between the virtual loudspeakers, which controls phase and delay properties to ensure high sound quality. The arrangement, and therefore the positions, of the VLOs assigned to a corresponding recording spot can be automatically generated from the metadata of the microphone setups. This yields an arrangement of VLOs whose superimposed playback is controllable so as to achieve the following properties for arbitrary virtual listening positions: Perceived interference (phase) is minimized by optimally considering the phenomena of the auditory precedence effect. In particular, the localization dominance can be exploited by selecting suitable distances between the virtual loudspeaker objects with respect to each other. In doing so, the acoustic propagation delays are adjusted so as to reach excellent sound quality. Furthermore, the angular distance of the virtual loudspeaker objects with respect to each other is chosen so as to yield the largest achievable stability of the phantom sources, which will then depend on the order of the gradient microphone directivities associated with the virtual loudspeaker objects, the critical distance of the room reverberation, and the degree of coverage of the recorded acoustic scene by the microphones.
The interactive VLO format is then encoded into an Ambisonic representation by superposing all resulting VLO signals, each weighted with the harmonics evaluated at its incident angle:

$$\chi_N(t) = \sum_{i=1}^{P} \sum_{j=1}^{L_i} y_N(\varphi_{ij})\, \tilde{x}_{ij}(t),$$

where $y_N$ are circular or spherical harmonics evaluated at the VLO incident angles $\varphi_{ij}$ corresponding to the current virtual listening position. Further, $L_i$ refers to the number of VLOs for the i-th microphone recording spot, and P indicates the total number of microphone setups within the acoustic scene. The recommended order of encoding is larger than 3; typically, order 5 gives stable results.
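A minimal Python sketch of this encoding step follows, restricted to the 2D (circular-harmonic) case for brevity; the channel ordering, function names, and the 2D simplification are assumptions of the sketch.

```python
import numpy as np

def circular_harmonics(order: int, phi: float) -> np.ndarray:
    """2D (circular) harmonics [1, cos(phi), sin(phi), ..., sin(order*phi)]."""
    y = [1.0]
    for n in range(1, order + 1):
        y += [np.cos(n * phi), np.sin(n * phi)]
    return np.asarray(y)

def encode_scene(resulting_signals, incident_angles, order: int = 5) -> np.ndarray:
    """chi_N(t) = sum over all VLOs of y_N(phi_ij) * x~_ij(t): superpose the
    resulting signals of all VLOs of all recording spots, each weighted with
    the harmonics at its incident angle. Returns an Ambisonic stream of
    shape (2*order + 1, T)."""
    T = len(resulting_signals[0])
    chi = np.zeros((2 * order + 1, T))
    for x_tilde, phi in zip(resulting_signals, incident_angles):
        chi += np.outer(circular_harmonics(order, phi), x_tilde)
    return chi
```

Order 5 is used as the default here because the text above recommends an order larger than 3 and names order 5 as typically stable.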
Furthermore, with respect to the decoding, the decoding of scene-based material uses headphone- or loudspeaker-based HOA decoding methods. In general, the most flexible and therefore most favored method for decoding to loudspeakers, or, in the case of headphone playback, to a set of head-related impulse responses (HRIRs), is called AllRAD. Other methods can be used, such as decoding by sampling, energy preservation, or regularized mode matching. All these methods yield similar performance on directionally well-distributed loudspeaker or HRIR layouts. Decoders typically use a frequency-independent matrix to obtain the signals for the loudspeakers of known setup directions or for convolution with a given set of HRIRs:
$$y(t) = D\,\chi_N(t)$$
For headphone-based playback, the directional signals y(t) are convolved with the left and right HRIRs of the corresponding directions and then summed up per ear:

$$y_{\mathrm{L/R}}(t) = \sum_{k} h_{\mathrm{L/R},k}(t) * y_k(t),$$

where $h_{\mathrm{L/R},k}$ denotes the left/right HRIR for the direction of the k-th decoded signal.
To achieve the representation of a static virtual audio scene, a head rotation β measured by head tracking has to be compensated for in headphone-based playback. In order to keep the set of HRIRs static, this is preferably done by modifying the Ambisonic signal with a rotation matrix before decoding to the HRIR set:
$$\chi'_N(t) = R(-\beta)\,\chi_N(t)$$
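For illustration, a minimal Python sketch of this rotation-then-decode chain for the 2D case follows. The block-diagonal rotation structure, channel ordering, and sign conventions are assumptions (they depend on the chosen coordinate system and harmonic ordering), as are the function names.

```python
import numpy as np

def rotation_matrix_2d(order: int, beta: float) -> np.ndarray:
    """Block-diagonal matrix rotating a 2D circular-harmonic signal so that a
    head rotation beta is compensated; assumes the channel ordering
    [1, cos(phi), sin(phi), cos(2*phi), sin(2*phi), ...]."""
    R = np.eye(2 * order + 1)
    for n in range(1, order + 1):
        c, s = np.cos(n * beta), np.sin(n * beta)
        i = 2 * n - 1
        R[i:i + 2, i:i + 2] = [[c, s], [-s, c]]  # rotates the (cos, sin) pair by n*beta
    return R

def binaural_render(chi, D, hrir_left, hrir_right, head_beta, order):
    """chi' = R(-beta) chi, y = D chi', then convolve each directional signal
    with its left/right HRIR and sum per ear."""
    y = D @ (rotation_matrix_2d(order, head_beta) @ chi)   # (K, T) directional signals
    left = sum(np.convolve(y[k], hrir_left[k]) for k in range(y.shape[0]))
    right = sum(np.convolve(y[k], hrir_right[k]) for k in range(y.shape[0]))
    return left, right
```

Because D is frequency-independent and the rotation is a small matrix product, the head-tracking compensation can run per audio block in real time, consistent with the real-time rendering claim above.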
The playback apparatus, which is configured to perform the methods for acoustic scene playback, can comprise a processor and a storage medium, wherein the processor is configured to perform any of the method steps and the storage medium is configured to store microphone signals and/or metadata of one or more microphone setups, the static and/or dynamic VLO parameters and/or any information necessary for performing the methods of the embodiments of the present disclosure. The storage medium can also store a computer program containing program code for performing the methods of the embodiments and the processor is configured to read the program code and perform the method steps of the embodiments of the present disclosure according to the program code. In a further embodiment, the playback apparatus can also comprise units, which are configured to perform the method steps of the disclosed embodiments, wherein for each method step a corresponding unit can be provided dedicated to perform the assigned method steps. Alternatively, a certain unit within the playback apparatus can be configured to perform more than one method step disclosed in the embodiments of the present disclosure.
The disclosure has been described in conjunction with various embodiments herein. However, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed disclosure, from a study of the drawings, the disclosure and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single processor or another unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these features cannot be used to advantage. A computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the internet or other wired or wireless telecommunication systems.
This application is a continuation of International Application No. PCT/EP2016/075595, filed on Oct. 25, 2016, the disclosure of which is hereby incorporated by reference in its entirety.