This application claims the benefit of the foreign priority of German Patent Application No. 10 2019 135 690.3, filed on Dec. 23, 2019, the entirety of which is incorporated herein by reference.
The invention relates to audio signal processing for binaural virtualization.
Various solutions are known for audio signals and their spatial reproduction, which differ from each other fundamentally. Two important principles are object-based audio, where the positions of the audio sources are given, and channel-based audio, where the positions of the loudspeakers or reproduction transducers respectively are given. E.g. the well-known stereo and 5.1 surround formats are channel-based. Here, a modification of the spatial perception is commonly achieved by the so-called panning, whereby the amplification or amplitude respectively of each reproduction channel can be controlled. This method is therefore known as amplitude panning. However, a considerably stronger spatial effect can be achieved by binaural audio signal processing, generating separate signals for the left and right ear. It uses head-related transfer functions (HRTFs), which are also known as anatomical transfer functions (ATFs).
When using headphones, the respective signals of the channels can be processed with a corresponding HRTF each for the left ear and right ear in order to achieve the same hearing impression as with a stereo playback via loudspeakers.
A particularly simple alternative for a spatial virtualization in order to give the listener an impression of direction is panning. With panning, the signals are not processed by HRTFs, but the directional effect is only simulated by a sound level difference or volume difference between the left ear and the right ear. Although the spatial impression is less pronounced here, panning has the advantage that each single sound source is perceived clearer. This increases speech intelligibility, for example.
EP2258120 B1 shows the parallel use of equalization and binaural filtering of surround audio signals for correcting the timbre. A channel of a surround audio signal is, on the one hand, filtered by a binaural filter for each side (left/right), and on the other hand delayed and equalized by an equalizer for each side. The two signals belonging to the same side are weighted and mixed, wherein for one side an additional delay of the equalized signal is inserted in order to generate interaural time differences (ITD). Further, head-related transfer functions (HRTFs) may be modified in order to compensate for timbral colorations. The head-related transfer functions for the left and right sides are aligned with each other such that the timbral coloration is reduced, which, however, also reduces the spatial effect.
Binaurally reproduced signals are often perceived as unnatural or unpleasant. Speech is sometimes difficult to understand and music sounds strange and therefore uncomfortable, for example since certain emphases intended by the musician are lost.
A further improvement of the spatial reproduction of audio signals would be desirable.
At least this problem is solved by the present invention. Claim 1 discloses a method for processing an audio signal for binaural virtualization, and in particular for partial binaural virtualization, according to an embodiment of the invention. Claim 14 discloses a corresponding device, according to another embodiment of the invention.
According to the invention, an improvement of the spatial reproduction of audio signals may be achieved by filtering an audio signal such that it is only partially binaurally virtualized. A degree of binaural virtualization can be freely chosen for the audio signal. In one embodiment, a control method is provided that enables a smooth transition between a complete binaural virtualization and a non-binaural virtualization that corresponds to panning. This may be done during mixing, i.e. during the authoring process, or later during post-processing or during playback. Partially, the binaural virtualization may also be effected by the temporal behavior of the filters for both sides, i.e. their phase responses.
According to the invention, the signal processing includes modifying the amplitude responses, corresponding to filtering curves, and/or the phase responses of the HRTFs which correspond to delays of the filters. The amplitude responses and phase responses can in principle be modified independently from each other. Both approaches can be used separately or together.
In particular, the signal processing for a transition from a binaural to a non-binaural virtualization that is perceived as smooth has at least two sections, in one embodiment. In a first section beginning with a complete binaural virtualization and the HRTFs that are usually used for that purpose, these HRTFs are modified with a decreasing binaural virtualization, without modifying their phase behavior or phase responses. In particular, the “dynamic range” of each HRTF is successively reduced until it is zero, i.e. until the HRTF value is frequency independent. This frequency independent value is the gain factor that corresponds to a stereo panning. The “dynamic range” of an HRTF is understood herein as the difference between the highest and the lowest value of the HRTF within a frequency range. In a second section, which in one embodiment is adjacent to the first section, the phase behavior of the HRTF, or the delay respectively, is modified. The delay may be reduced, starting from a value that results from the “dynamic reduced” HRTFs, down to zero (or another constant value that is equal on both sides, left and right). At this point, the signal processing corresponds to the known stereo panning.
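By way of a non-limiting illustration, the "dynamic range" defined above and its successive reduction may be sketched as follows. The sketch assumes that the amplitude response is available as a list of dB values per frequency bin; the function names, the example values and the parameter `amount` are hypothetical and are not taken from the description.

```python
def dynamic_range(mag_db, lo_bin, hi_bin):
    """Difference between the highest and lowest value within bins [lo_bin, hi_bin)."""
    band = mag_db[lo_bin:hi_bin]
    return max(band) - min(band)


def compress_towards(mag_db, target_db, amount):
    """Move every bin towards target_db.

    amount = 0 keeps the curve unchanged, amount = 1 makes it frequency independent.
    (Hypothetical control variable, used only for this illustration.)
    """
    return [(1.0 - amount) * m + amount * target_db for m in mag_db]


# Hypothetical amplitude response in dB over six frequency bins.
hrtf_db = [-3.0, -1.0, 2.0, 6.0, -8.0, -12.0]
pan_gain_db = -6.0  # assumed panning gain for this side

ranges = [
    dynamic_range(compress_towards(hrtf_db, pan_gain_db, a), 0, len(hrtf_db))
    for a in (0.0, 0.5, 1.0)
]
```

In this sketch the dynamic range shrinks in proportion to `1 - amount` and reaches zero when every bin equals the assumed panning gain, which corresponds to the end of the first section.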
Further advantageous embodiments are disclosed in the following description and in the dependent claims.
An advantage of the invention is that audio objects or audio channels can be virtualized to a greater or lesser extent, due to a more binaural or more panning-like rendering or processing. In other words, a degree of binaural processing of an audio object may be freely chosen within a continuous range where the extremes are e.g. a complete binaural processing and a classical amplitude panning. This may be done by using e.g. a control device. A further advantage is that different audio objects or audio channels may be virtualized individually to different degrees and may then be superimposed on each other.
Further details and advantageous embodiments are shown in the drawings.
As described above, a second, substantially simpler way of processing is amplitude panning. A conventional amplitude panning for the given target direction DIR is modelled 406, which includes applying a first gain factor Gain_L for a left channel and a second gain factor Gain_R for a right channel to the single channel input audio signal. For example, for a certain given target direction DIR the first gain factor Gain_L may be −10 dB and the second gain factor Gain_R may be −6 dB, leading to a simple spatial virtualization of the audio object at a position rather to the right. For a target direction DIR that is just in front of the listener or behind the listener, both gain factors are usually essentially equal.
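The example values above may be illustrated by the following minimal sketch, which assumes that the gain factors are given in dB and are converted to linear amplitude factors before being applied to the samples. The function names and the sample values are hypothetical.

```python
def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def amplitude_pan(mono, gain_l_db, gain_r_db):
    """Apply one frequency-independent gain per side to a single channel signal."""
    g_l = db_to_linear(gain_l_db)
    g_r = db_to_linear(gain_r_db)
    left = [g_l * x for x in mono]
    right = [g_r * x for x in mono]
    return left, right


mono = [1.0, -0.5, 0.25]  # hypothetical input samples
left, right = amplitude_pan(mono, -10.0, -6.0)
```

With Gain_L = −10 dB and Gain_R = −6 dB, the right channel is louder than the left channel, so that the audio object is perceived rather to the right.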
In the next step, the amplitude responses of the transformed head-related transfer functions are adjusted 403, 408 to the respective gain factors according to the processing parameter PFC for a degree of binaural virtualization. That is, the amplitude response of the first head-related transfer function HRTFL is brought closer to the first gain factor Gain_L to an extent depending on the processing parameter PFC, and the amplitude response of the second head-related transfer function HRTFR is brought closer to the second gain factor Gain_R to an extent depending on the processing parameter PFC. As explained below in more detail, this can be understood as scaling or compressing the amplitude responses of the HRTFs, moving them towards respective frequency independent target values and resulting in a first modified head-related transfer function HRTFL,mod1 and a second modified head-related transfer function HRTFR,mod1. This adjustment or approaching 403, 408 is stronger if the intended degree of binaural virtualization is lower, and vice versa. In an embodiment, the modified head-related transfer functions for a minimum degree of binaural virtualization are identical with the gain factors Gain_L, Gain_R, while for a maximum degree of binaural virtualization they are identical with the original head-related transfer functions. In an embodiment, the amplitude responses of the original head-related transfer functions are first, in a step 403, scaled or reduced according to the processing parameter PFC and then, in a further step 408, the scaled or reduced head-related transfer functions are adjusted or approached to the gain factors Gain_L, Gain_R by shifting (i.e., by amplifying or attenuating the signals). In other embodiments, these steps 403, 408 may be swapped or may be executed simultaneously, or otherwise embedded in any processing.
Finally, filtering functions for the first and second modified head-related transfer functions HRTFL,mod1, HRTFR,mod1 are calculated 411 and transformed back to the complex spectrum. Then, the filtering coefficients for implementing the filters are calculated 413. A first filter is implemented according to the first modified head-related transfer function HRTFL,mod1 and a second filter is implemented according to the second modified head-related transfer function HRTFR,mod1. Optionally, the modified head-related transfer functions HRTFL,mod1, HRTFR,mod1 may be transformed 412 into the time domain by an inverse Fourier transform beforehand.
In an embodiment, the phase response of the first or second filter, respectively, results directly from the respective first or second modified head-related transfer function HRTFL,mod1, HRTFR,mod1. In another embodiment, however, the phase response of the first or second filter, respectively, may be modified. This modification may be based on the above-mentioned processing parameter PFC but it may also be based on a different second processing parameter PTC. Further details are explained below.
This relationship is depicted in
As mentioned above, the processing parameter PC for a degree of binaural virtualization in this example is composed of two separate sections B1,B2, which may be expressed by two separate processing parameters PFC, PTC. This embodiment is particularly advantageous since it results in a change of the spatial effect that is perceived as even. Alternatively, also other variants are possible, e.g. the following for Thr2<Thr1:
Here, the sections of the first processing parameter PFC and second processing parameter PTC overlap and there is a middle range between Thr2 and Thr1 in which both parameters are modified. In some cases, e.g. based upon individual preference, this variant may also be perceived as advantageous. In any case, the respective processing parameter PC, PTC, PFC may in principle be adjusted continuously from 0% to 100%.
Further, the device 600 comprises at least one gain factor determining module 606L,606R for determining a first gain factor Gain_L for the left side and a second gain factor Gain_R for the right side, which gain factors correspond to an amplitude panning for the direction DIR that is associated to the input audio signal 11. A rule or an algorithm for the amplitude panning may be predefined or selectable, such as e.g. Gain_L=0.5*(1+sin(φazimuth,L)) and Gain_R=0.5*(1−sin(φazimuth,R)), wherein φazimuth ∈ [−180°, ..., 180°] is the respective angle to the front direction. In other embodiments, other audio virtualization rules and in particular other panning rules may be used, which may be based for example on A-B miking (time-of-arrival stereophony) with a given distance between the microphones (base distance). For a pure amplitude panning, the gains are to be set to Gain_L=Gain_R=0.
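The exemplary panning rule above may be sketched as follows. As a simplifying assumption, a single azimuth angle in degrees is used for both sides; the function name `pan_gains` is hypothetical.

```python
import math


def pan_gains(azimuth_deg):
    """Exemplary rule: Gain_L = 0.5*(1 + sin(az)), Gain_R = 0.5*(1 - sin(az)).

    Assumption for this sketch: the same azimuth angle, in degrees relative
    to the front direction, is used for the left and the right side.
    """
    s = math.sin(math.radians(azimuth_deg))
    return 0.5 * (1.0 + s), 0.5 * (1.0 - s)


front = pan_gains(0.0)    # both gains equal
side = pan_gains(90.0)    # the whole signal on one side
back = pan_gains(180.0)   # both gains essentially equal again
```

With this rule the two gain factors always sum to one, and they are essentially equal for directions directly in front of or behind the listener.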
Further, the device 600 comprises a transformation module 603L,603R each for Fourier transforming 730 the first and second head-related transfer functions HRTFL,ori, HRTFR,ori into the frequency range, resulting in respective transformed transfer functions HRTF′L,ori, HRTF′R,ori. Then the amplitude responses and the phase responses of the transformed transfer functions HRTF′L,ori, HRTF′R,ori may be processed in principle independent from each other.
In an embodiment, the device 600 comprises two scaling and shifting modules 604L, 604R, 608L, 608R, one for each side, left and right. A first scaling and shifting module 604L, 608L for the left-hand side adjusts the amplitude response of the first head-related transfer function HRTF′L,ori to be closer to the first gain factor Gain_L according to a processing parameter PFC by scaling and shifting, for instance according to Mag_out_L=(1−PFC)*mag4L+PFC*Gain_L. This results in an amplitude response Mag_out_L of a first modified head-related transfer function HRTFL,mod1. Likewise, a second scaling and shifting module 604R, 608R for the right-hand side adjusts the amplitude response of the second head-related transfer function HRTF′R,ori to be closer to the second gain factor Gain_R according to the processing parameter PFC by scaling and shifting, for instance according to Mag_out_R=(1−PFC)*mag4R+PFC*Gain_R. This results in an amplitude response Mag_out_R of a second modified head-related transfer function HRTFR,mod1. As described above, the closer the amplitude responses Mag_out_L, Mag_out_R of the modified head-related transfer functions HRTFL,mod1, HRTFR,mod1 are to the original head-related transfer functions HRTFL,ori, HRTFR,ori, the stronger the binaural virtualization effect. In other words, the approaching of the amplitude responses to the gain factors Gain_L, Gain_R is more pronounced for a lower degree of binaural virtualization than for a higher degree of binaural virtualization. This applies at least in a limited frequency range, e.g. below a certain maximum frequency (Nyquist frequency); it need not necessarily be valid over the full frequency range. Therefore it may be sufficient to apply the processing in the limited frequency range.
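The exemplary rule Mag_out=(1−PFC)*mag4+PFC*Gain may be sketched as follows for one side. The sketch assumes that the amplitude response is a list of values per frequency bin and that PFC is given as a value between 0 and 1; the optional argument `max_bin`, which restricts the processing to a limited frequency range, is a hypothetical addition for illustration.

```python
def adjust_magnitude(mag4, gain, pfc, max_bin=None):
    """Scale and shift an amplitude response towards a panning gain.

    Implements Mag_out = (1 - PFC) * mag4 + PFC * Gain per frequency bin.
    Bins at or above max_bin (hypothetical parameter) are left unchanged.
    """
    if max_bin is None:
        max_bin = len(mag4)
    out = list(mag4)
    for k in range(min(max_bin, len(mag4))):
        out[k] = (1.0 - pfc) * mag4[k] + pfc * gain
    return out


mag4_l = [0.9, 1.4, 0.6, 0.2]   # hypothetical amplitude response, left side
gain_l = 0.5                    # hypothetical panning gain, left side

unchanged = adjust_magnitude(mag4_l, gain_l, 0.0)           # original response
flattened = adjust_magnitude(mag4_l, gain_l, 1.0)           # pure panning gain
band_only = adjust_magnitude(mag4_l, gain_l, 1.0, max_bin=2)
```

In the printed rule, PFC=0 leaves the original amplitude response untouched and PFC=1 yields the frequency independent gain factor, so that in this formula a larger PFC corresponds to a stronger approach to the panning gain.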
The device further comprises for each side a configurable filter 613L, 613R for filtering the input audio signal 11 to obtain the left output signal and right output signal, and a filter configuration module 611L, 611R for each of the configurable filters. The first filter configuration module 611L calculates first filter coefficients from the amplitude response Mag_out_L of the first modified head-related transfer function HRTFL,mod1, and the first configurable filter 613L is configured with the first filter coefficients. The second filter configuration module 611R calculates second filter coefficients from the amplitude response Mag_out_R of the second modified head-related transfer function HRTFR,mod1, and the second configurable filter 613R is configured with the second filter coefficients. By filtering the input audio signal 11 with the first and the second configured filters 613L, 613R, audio signals 11out,L,11out,R are created that are partially binaurally virtualized to a certain degree, according to the associated parameter. They may be reproduced, e.g. via headphones. Each of the above-mentioned modules and filters individually or together may be implemented e.g. by one or more software-configurable processors or computers.
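Filtering the single channel input signal with one configured filter per side may be sketched as follows. The description does not prescribe a filter structure; the sketch assumes, purely for illustration, FIR filters applied by direct convolution, and the coefficient values are hypothetical.

```python
def fir_filter(signal, taps):
    """Direct-form convolution of a signal with FIR filter coefficients."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(taps):
            out[n + k] += x * h
    return out


mono = [1.0, 0.5]        # hypothetical single channel input samples
taps_l = [0.0, 0.8]      # hypothetical left filter: gain 0.8, one sample delay
taps_r = [0.4]           # hypothetical right filter: gain 0.4, no delay

out_l = fir_filter(mono, taps_l)
out_r = fir_filter(mono, taps_r)
```

The two output lists correspond to the left and right output signals that may be reproduced, e.g. via headphones.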
In the embodiment as described above, mainly the amplitude responses of the head-related transfer functions may be modified. In another embodiment, the phase responses or delays respectively of the head-related transfer functions may be modified. Both embodiments are independent from each other and may be combined. Therefore both are shown together in
For modifying the phase responses or delays respectively of the head-related transfer functions HRTFL,ori, HRTFR,ori, the device 600 may optionally comprise a delay determining module 602L, 602R each for calculating 720 the respective linear delay or group delay LPD2L, LPD2R of the head-related transfer functions HRTFL,ori, HRTFR,ori for the left and right sides as received from the database. Alternatively, these values may also be received from the database, so that they need not be recalculated with each call. The Fourier transformation 730 may be performed before or after or concurrently with the step 720 of determining the linear delays. The device 600 further comprises an MLV calculation module 609 for calculating a mean or average linear delay MLV from the linear delays LPD2L, LPD2R of the two sides, for example according to MLV=0.5*(LPD2L+LPD2R).
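The calculation of the mean linear delay may be sketched as follows. The description does not specify how the linear delays LPD2L, LPD2R are determined; as an assumption for this sketch only, each delay is estimated as the position of the largest absolute sample of a hypothetical impulse response, expressed in samples.

```python
def onset_delay(impulse_response):
    """Crude linear-delay estimate: index of the largest absolute sample.

    This estimator is an assumption made for illustration only.
    """
    peak = max(range(len(impulse_response)),
               key=lambda n: abs(impulse_response[n]))
    return float(peak)


def mean_linear_delay(lpd2_l, lpd2_r):
    """MLV = 0.5 * (LPD2L + LPD2R)."""
    return 0.5 * (lpd2_l + lpd2_r)


# Hypothetical impulse responses for the left and right side.
hrir_l = [0.0, 0.0, 0.9, 0.3, -0.1, 0.0]
hrir_r = [0.0, 0.0, 0.0, 0.0, 0.6, 0.2]

lpd2_l = onset_delay(hrir_l)
lpd2_r = onset_delay(hrir_r)
mlv = mean_linear_delay(lpd2_l, lpd2_r)
```

In this example the left and right delays are 2 and 4 samples, so that the mean linear delay lies midway between them at 3 samples.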
Further, the device 600 comprises a subtraction module 605L, 605R each for subtracting 740 the respective group delay LPD2L, LPD2R from the phase response of the transformed head-related transfer function HRTF′L,ori, HRTF′R,ori, whereby a normalized first phase response and a normalized second phase response are generated. Since these normalized phase responses may contain phase jumps of 360°, they are unwrapped 750. That is, such phase jumps are eliminated from the phase responses by adding or subtracting 360° or multiples thereof. Unwrapping may also include changing absolute jumps greater than 180° to their 360° complement. The resulting so-called unwrapped phase responses Ang_L, Ang_R are free from phase jumps. The unwrapped phase responses Ang_L, Ang_R are then scaled 760 by interpolation through phase interpolation modules 610L, 610R. The interpolation may be a linear interpolation between the respective unwrapped phase response Ang_L, Ang_R and the average linear delay MLV according to the processing parameter PC, PTC for a certain degree of binaural virtualization, e.g. for the left-hand side according to
LinearDelayL=(1−PTC)*LPD2L+PTC*MLV
Ang_out_L=(1−PTC)*Unwrap(ang5L−LPD2L)+PTC*(LPL+LinearDelayL)
where ang5L is the phase response of the head-related transfer function HRTF′L,ori after Fourier transformation and before unwrapping, and LPL is an optional additional delay. This results in the modified phase responses Ang_out_L, Ang_out_R that are then fed to the filters 613L, 613R. The phase responses may optionally be modified by adding 770 a (possibly constant) delay LPL, LPR, which may be received from a panning module 607L, 607R that models a runtime panning. The respective additional delay for the left and right side may depend on the direction DIR.
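One possible reading of the phase processing for one side may be sketched as follows. The sketch makes several assumptions that are not taken from the description: phases are handled in radians rather than degrees, delays are given in samples and are converted into linear phase terms per frequency bin, and the processing parameter is a value between 0 and 1. All function names are hypothetical.

```python
import math


def unwrap(phases):
    """Remove jumps of 2*pi: each step between neighbours is brought into [-pi, pi)."""
    out = [phases[0]]
    for p in phases[1:]:
        step = p - out[-1]
        step -= 2.0 * math.pi * math.floor((step + math.pi) / (2.0 * math.pi))
        out.append(out[-1] + step)
    return out


def linear_phase(delay, n_bins, n_fft):
    """Phase in radians of a pure delay of `delay` samples at bins 0..n_bins-1."""
    return [-2.0 * math.pi * k * delay / n_fft for k in range(n_bins)]


def interpolate_phase(ang5, lpd2, mlv, ptc, n_fft, lp=0.0):
    """Term-by-term sketch of the two rules printed above, for one side."""
    linear_delay = (1.0 - ptc) * lpd2 + ptc * mlv
    own = linear_phase(lpd2, len(ang5), n_fft)
    normalized = unwrap([a - o for a, o in zip(ang5, own)])   # Unwrap(ang5 - LPD2)
    target = linear_phase(lp + linear_delay, len(ang5), n_fft)
    return [(1.0 - ptc) * r + ptc * t for r, t in zip(normalized, target)]


# Hypothetical input: a pure delay of 2 samples with wrapped phase values.
n_fft, n_bins = 16, 9
wrapped = [math.atan2(math.sin(p), math.cos(p))
           for p in linear_phase(2.0, n_bins, n_fft)]

ang_residual = interpolate_phase(wrapped, lpd2=2.0, mlv=3.0, ptc=0.0, n_fft=n_fft)
ang_pan = interpolate_phase(wrapped, lpd2=2.0, mlv=3.0, ptc=1.0, n_fft=n_fft)
```

For the pure delay used here, the normalized and unwrapped phase response is zero at every bin, and at the other end of the range the result is the linear phase of the mean linear delay, which is the same for both sides.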
From the modified phase responses Ang_out_L, Ang_out_R and/or the interpolated amplitude responses Mag_out_L, Mag_out_R, the modified head-related transfer functions HRTFL,mod1, HRTFR,mod1 or their coefficients respectively for configuring the filters 613L, 613R may be generated in the filter configuration modules 611L, 611R. Before configuring the filters, the modified filtering functions including the modified phase responses Ang_out_L, Ang_out_R may optionally be re-transformed 780 into the time domain by inverse Fourier transformation 612L, 612R if required.
The interpolation yields the desired phase response Ang_out_L, Ang_out_R, which is combined with the desired amplitude response Mag_out_L, Mag_out_R so as to obtain the target head-related transfer functions HRTFL,mod1, HRTFR,mod1. Thus, the filtering function is formed or determined respectively 411, from which then the filtering coefficients are determined 413 directly or after an optional inverse Fourier transformation 412, 612.
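Combining an amplitude response and a phase response into a complex spectrum and determining filter coefficients by inverse Fourier transformation may be sketched as follows. The sketch assumes a real-valued FIR filter, a one-sided spectrum with bins 0 to N/2 of an even length N, linear amplitude values and phases in radians; the function name and the example values are hypothetical.

```python
import cmath
import math


def fir_from_response(mag, ang):
    """Build N real filter coefficients from bins 0..N/2 of amplitude and phase."""
    half = len(mag) - 1
    n_fft = 2 * half
    spec = [m * cmath.exp(1j * a) for m, a in zip(mag, ang)]
    # Negative-frequency bins are the complex conjugates (real-valued filter).
    full = spec + [spec[n_fft - k].conjugate() for k in range(half + 1, n_fft)]
    taps = []
    for n in range(n_fft):
        acc = sum(full[k] * cmath.exp(2j * math.pi * k * n / n_fft)
                  for k in range(n_fft))
        taps.append(acc.real / n_fft)   # any residual imaginary part is dropped
    return taps


# Hypothetical panning-like case: flat amplitude 0.5, pure delay of 3 samples.
n_fft = 8
mag_out = [0.5] * (n_fft // 2 + 1)
ang_out = [-2.0 * math.pi * k * 3 / n_fft for k in range(n_fft // 2 + 1)]
taps = fir_from_response(mag_out, ang_out)
```

For a frequency independent amplitude response and a linear phase response, the resulting coefficients reduce to a single value of 0.5 at index 3, i.e. a gain factor combined with a delay.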
The device for superimposing multiple audio sources may comprise a plurality of separate devices 600 for processing single channel input audio signals each, as described above. The devices may also be integrated into a single device, however, which may lead to synergy effects (e.g. a shared database). Further, there may be cases where it is useful to perform the above-described processing for only one of the sides, left or right, while the audio signal for the other side may be processed differently.
It should be noted that the invention is not only applicable for gradual binaural virtualization, but also for gradual transaural virtualization. A device 600 for binaural virtualization differs from a device for transaural virtualization mainly in the type of transfer functions that are provided by the database.
The processing parameters PC, PTC, PFC or classification parameters PTyp respectively may be stored as metadata for later use in the input audio signals, e.g. for real-time rendering in a playback device during reproduction. Thus, for example, a system may be realized in which a head tracker provides additional information about the position and orientation of the listener. Apart from the real-time processing, the used parameters may also be defined and stored in advance, e.g. by a sound engineer. Thus, the invention may provide sound engineers with new tools for continuously controlling a gradual degree of tonal changes with respect to spectrum and/or phase. Moreover, the parameter values and their changes over time may be stored. Instead of assigning only a single value to the whole audio signal, the signal may be subdivided into blocks (e.g. of 1 ms length or for the length of a scene) and individual parameter values may be assigned to each of these blocks. Audible artifacts may be minimized by suitable windowing and cross-fading.
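Block-wise parameter values with cross-fading at block boundaries may be sketched as follows. The sketch assumes a memoryless, hypothetical `render` function that processes a block of samples with one parameter value; a filter with memory would additionally require its state to be carried across blocks. The linear fade and all names are assumptions for illustration.

```python
def crossfade(old, new):
    """Linear cross-fade from `old` to `new` (lists of equal length)."""
    n = len(old)
    out = []
    for i in range(n):
        w = (i + 1) / (n + 1)   # fade weight strictly between 0 and 1
        out.append((1.0 - w) * old[i] + w * new[i])
    return out


def render_blocks(signal, block_len, params, render, fade_len):
    """Render each block with its own parameter value.

    Where the value changes, the first fade_len samples of the new block
    are cross-faded from the previous rendering to the current one.
    """
    out = []
    prev = None
    for b, p in enumerate(params):
        block = signal[b * block_len:(b + 1) * block_len]
        cur = render(block, p)
        if prev is not None and prev != p:
            n = min(fade_len, len(block))
            cur[:n] = crossfade(render(block[:n], prev), cur[:n])
        out.extend(cur)
        prev = p
    return out


def render(block, p):
    """Hypothetical stand-in for the processing: the parameter acts as a gain."""
    return [p * x for x in block]


signal = [1.0] * 8
result = render_blocks(signal, block_len=4, params=[0.0, 1.0],
                       render=render, fade_len=3)
```

In this example the output rises in steps of 0.25 over the first three samples of the second block instead of jumping directly from 0 to 1.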
The invention is particularly advantageous for audio processing devices, for example. It may be implemented based on a configurable computer or processor, in an exemplary embodiment. The configuration may be achieved by a computer-readable storage medium having stored thereon instructions that when executed on a computer cause the computer to perform a method as described above.
Various combinations of the above-described features with each other or with further features are considered to be within the scope of the invention, even if such combination is not expressly mentioned herein.
Number | Date | Country | Kind |
---|---|---|---|
102019135690.3 | Dec 2019 | DE | national |
Number | Date | Country |
---|---|---|
20210195361 A1 | Jun 2021 | US |