Generally, the disclosure relates to the field of audio signal processing, and in particular, to an audio signal processing apparatus and method allowing for generating a binaural audio signal from a virtual target position.
Human listeners can localize sounds in three dimensions: in range (distance), above and below (elevation), and in front, behind or to either side (azimuth). The properties of sound received by an ear from a given point in space can be characterized by head-related transfer functions (HRTFs). Therefore, a pair of HRTFs for the two ears can be used to synthesize a binaural sound that appears to come from a target position, i.e. a virtual target position.
Many applications of three-dimensional (3D) audio over headphones, such as virtual reality, spatial teleconferencing and virtual surround, require high-quality HRTF datasets that contain transfer functions for all necessary directions. Some forms of HRTF processing have also been included in computer software to simulate surround-sound playback from loudspeakers. However, measuring HRTFs for all azimuth angles is a tedious task that requires dedicated measurement hardware and materials. Moreover, the memory required to store a database of measured HRTFs can be very large. Additionally, using personalized HRTFs can further improve the listening experience, but acquiring them complicates the synthesis of 3D sound.
The idea of a fully parametric model for deriving HRTFs to synthesize binaural sound has been proposed in R. O. Duda, “Modeling head related transfer functions”, 27th Asilomar Conference on Signals, Systems and Computers, 1993, and in V. R. Algazi et al., “The use of head-and-torso models for improved spatial sound synthesis”, Audio Engineering Society (AES) 113th Convention, October 2002. However, the HRTFs obtained with these models are not accurate enough for realistic binaural sound rendering, since they deviate strongly from personalized HRTFs.
Much research has been devoted to methods for obtaining HRTFs that do not deviate strongly from personalized (user-specific) HRTFs. 3D HRTF interpolation can be used to estimate HRTFs at the desired source position from measured HRTFs, as demonstrated in H. Gamper, “Head-related transfer function interpolation in azimuth, elevation and distance”, Journal of the Acoustical Society of America (JASA) Express Letters, 2013. This technique requires HRTFs measured at nearby positions, e.g. four measurements forming a tetrahedron that encloses the desired position. Additionally, it is hard to achieve correct elevation perception with this technique.
Thus, there is a need for an improved audio signal processing apparatus and method allowing for generating a binaural audio signal from a virtual target position.
It is an object of the disclosure to provide an improved audio signal processing apparatus and method allowing for generating a binaural audio signal from a virtual target position.
This object is achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.
According to a first aspect, the disclosure relates to an audio signal processing apparatus for processing an input audio signal to be transmitted to a listener in such a way that the listener perceives the input audio signal to come from a virtual target position defined by an azimuth angle and an elevation angle relative to the listener, the audio signal processing apparatus comprising a memory configured to store a set of pairs of predefined left ear and right ear transfer functions, which are predefined for a plurality of reference positions relative to the listener, wherein the plurality of reference positions lie in a two-dimensional plane, a determiner configured to determine a pair of left ear and right ear transfer functions on the basis of the set of pairs of predefined left ear and right ear transfer functions for the azimuth angle and the elevation angle of the virtual target position and an adjustment filter configured to filter the input audio signal on the basis of the determined pair of left ear and right ear transfer functions and an adjustment function configured to adjust a delay between the left ear transfer function and the right ear transfer function of the determined pair of left ear and right ear transfer functions and a frequency dependence of the left ear transfer function and the right ear transfer function of the determined pair of left ear and right ear transfer functions as a function of the azimuth angle and/or the elevation angle of the virtual target position in order to obtain a left ear output audio signal and a right ear output audio signal.
Thus, an improved audio signal processing apparatus allowing for generating a binaural audio signal from a virtual target position is provided. In particular, the audio signal processing apparatus according to the first aspect allows a set of predefined transfer functions defined for virtual target positions in a two-dimensional plane relative to the listener, for instance the horizontal plane (for which such transfer functions are very often already available), to be extended in a computationally efficient manner to the third dimension, i.e. to virtual target positions above or below this plane. One beneficial effect is that the memory required for storing the predefined transfer functions is significantly reduced.
The set of pairs of predefined left ear and right ear transfer functions can comprise pairs of predefined left ear and right ear head related transfer functions.
The set of pairs of predefined left ear and right ear transfer functions can comprise measured left ear and right ear transfer functions and/or modelled left ear and right ear transfer functions. Thus, the audio signal processing apparatus according to the first aspect can use a database of user-specific measured transfer functions for more realistic sound perception, or modelled transfer functions if user-specific measured transfer functions are not available.
In a first possible implementation form of the audio signal processing apparatus according to the first aspect as such, the adjustment filter is configured to adjust the delay between the left ear transfer function and the right ear transfer function of the determined pair of left ear and right ear transfer functions as a function of the azimuth angle and/or the elevation angle of the virtual target position by compensating for sound travel time differences associated with the distance between the virtual target position and a left ear of the listener and the distance between the virtual target position and a right ear of the listener.
By introducing a delay as a function of the azimuth angle and/or the elevation angle of the virtual target position, sound travel time differences can be compensated resulting in a more realistic sound perception by the listener.
In a second possible implementation form of the audio signal processing apparatus according to the first aspect as such or the first implementation form thereof, the adjustment filter is configured to adjust the delay between the left ear transfer function and the right ear transfer function of the determined pair of left ear and right ear transfer functions as a function of the azimuth angle and/or the elevation angle of the virtual target position on the basis of the following equations:
wherein τL denotes a delay applied to the left ear transfer function, wherein τR denotes a delay applied to the right ear transfer function and wherein τ and Θ are defined on the basis of the following equations:
wherein τ denotes a delay in seconds, c denotes the velocity of sound, a denotes a parameter associated with the head of a listener, θ denotes the azimuth angle of the virtual target position and ϕ denotes the elevation angle of the virtual target position.
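As a hypothetical illustration only (the specific equations for τL, τR, τ and Θ are not given here), the sketch below assumes the classic Woodworth spherical-head approximation: a is taken to be the head radius, Θ is taken to be the lateral angle Θ = arcsin(sin θ · cos ϕ), τ = (a/c)(Θ + sin Θ), and the whole delay is applied to the ear farther from the source. The numerical values of a and c, the sign convention for azimuth and the function name are assumptions, not the equations of the disclosure.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for c
HEAD_RADIUS = 0.0875    # m, assumed value for the head parameter a

def woodworth_delays(azimuth_deg, elevation_deg, a=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Return hypothetical (tau_left, tau_right) delays in seconds.

    Assumes the Woodworth spherical-head model with the lateral angle
    Theta = asin(sin(theta) * cos(phi)). Positive azimuth is assumed to point
    to the listener's right, so the left (far) ear receives the extra delay.
    """
    theta = math.radians(azimuth_deg)
    phi = math.radians(elevation_deg)
    s = max(-1.0, min(1.0, math.sin(theta) * math.cos(phi)))  # guard against rounding
    lateral = math.asin(s)                                      # Theta in [-pi/2, pi/2]
    tau = (a / c) * (abs(lateral) + math.sin(abs(lateral)))     # interaural delay
    if lateral >= 0:
        return tau, 0.0   # source on the right: delay the left ear
    return 0.0, tau       # source on the left: delay the right ear
```

With these assumed values, a source at θ = 90°, ϕ = 0° gives τ = (a/c)(π/2 + 1), roughly 0.66 ms, while raising the same source towards ϕ = 90° drives τ towards zero, which reflects the shrinking interaural path difference above the head.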
Thus, a delay for compensating sound travel time differences as a function of the azimuth angle and/or the elevation angle of the virtual target position can be determined in a computationally efficient way.
In a third possible implementation form of the audio signal processing apparatus according to the first aspect as such or the first or second implementation form thereof, the adjustment filter is configured to adjust the frequency dependence of the left ear transfer function and the right ear transfer function of the determined pair of left ear and right ear transfer functions as a function of the azimuth angle and/or the elevation angle of the virtual target position on the basis of a plurality of infinite impulse response filters, wherein the plurality of infinite impulse response filters are configured to approximate at least a portion of the frequency dependence of a left ear transfer function and a right ear transfer function of a plurality of pairs of measured left ear and right ear transfer functions as a function of the azimuth angle and/or the elevation angle of the virtual target position.
By approximating measured transfer functions by infinite impulse response filters and considering only the main spectral features thereof, in particular those which are relevant for the perception of azimuth and/or elevation, the computational complexity can be reduced.
In a fourth possible implementation form of the audio signal processing apparatus according to the third implementation form of the first aspect, the frequency dependence of each infinite impulse response filter is defined by a plurality of predefined filter parameters, wherein the plurality of predefined filter parameters are selected such that the frequency dependence of each infinite impulse response filter approximates at least a portion, in particular prominent spectral features, such as a spectral maximum or a spectral minimum, of the frequency dependence of a left ear transfer function or a right ear transfer function of the plurality of pairs of measured left ear and right ear transfer functions as a function of the azimuth angle and/or the elevation angle of the virtual target position.
Defining each infinite impulse response filter by a finite set of filter parameters allows saving memory space, as only the filter parameters have to be saved in order to reconstruct the main spectral features of the measured transfer functions.
In a fifth possible implementation form of the audio signal processing apparatus according to the fourth implementation form of the first aspect, the plurality of infinite impulse response filters comprises a plurality of biquad, i.e. biquadratic, filters. The biquad filters can be implemented as parallel or cascaded filters. Cascaded filters are preferred because they approximate the spectral features of the transfer functions better. The order of the biquad filters can vary.
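To illustrate the cascaded structure, the sketch below runs a signal through a chain of second-order sections in direct form I. The coefficient layout (b0, b1, b2, a0, a1, a2) and the helper names are assumptions for illustration. Because each section is linear and time-invariant, reordering the cascade leaves the output unchanged up to rounding.

```python
import numpy as np

def biquad(x, coeffs):
    """Filter x with one second-order section in direct form I.

    coeffs = (b0, b1, b2, a0, a1, a2); all coefficients are normalized by a0.
    """
    b0, b1, b2, _, a1, a2 = (v / coeffs[3] for v in coeffs)
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0  # filter state: two previous inputs and outputs
    for n, xn in enumerate(x):
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y

def cascade(x, sections):
    """Apply the biquad sections one after another (series connection)."""
    y = np.asarray(x, dtype=float)
    for coeffs in sections:
        y = biquad(y, coeffs)
    return y
```

A parallel structure would instead sum the outputs of the sections; the cascade multiplies their transfer functions, so a notch in one section and a shelf in another combine directly in the overall response.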
In a sixth possible implementation form of the audio signal processing apparatus according to the fifth implementation form of the first aspect, the plurality of biquad filters comprises at least one shelving filter, wherein the at least one shelving filter is defined by a cut-off frequency parameter f0 and a gain parameter g0, and/or at least one peaking filter, wherein the at least one peaking filter is defined by a cut-off frequency parameter f0, a gain parameter g0 and a bandwidth parameter Δ0.
The frequency dependence of shelving and/or peaking filters provides good approximations to the frequency dependence of the measured transfer functions on the basis of 2 or 3 filter parameters.
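The text does not specify how the biquad coefficients are derived from f0, g0 and Δ0. One common choice, assumed here, is the set of formulas from R. Bristow-Johnson's "Audio EQ Cookbook", with g0 in dB, f0 in Hz and Δ0 read as a bandwidth in octaves. The function names and the choice of a high-shelf (rather than a low-shelf) filter are illustrative.

```python
import math
import numpy as np

def peaking(f0, g0_db, bw_oct, fs):
    """Peaking EQ biquad (Audio EQ Cookbook); returns (b0, b1, b2, a0, a1, a2)."""
    A = 10 ** (g0_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) * math.sinh(math.log(2) / 2 * bw_oct * w0 / math.sin(w0))
    cw = math.cos(w0)
    return (1 + alpha * A, -2 * cw, 1 - alpha * A,
            1 + alpha / A, -2 * cw, 1 - alpha / A)

def high_shelf(f0, g0_db, fs):
    """High-shelf biquad (Audio EQ Cookbook) with shelf slope S = 1."""
    A = 10 ** (g0_db / 40)
    w0 = 2 * math.pi * f0 / fs
    cw, sa = math.cos(w0), math.sqrt(A)
    alpha = math.sin(w0) / 2 * math.sqrt(2)  # S = 1
    return (A * ((A + 1) + (A - 1) * cw + 2 * sa * alpha),
            -2 * A * ((A - 1) + (A + 1) * cw),
            A * ((A + 1) + (A - 1) * cw - 2 * sa * alpha),
            (A + 1) - (A - 1) * cw + 2 * sa * alpha,
            2 * ((A - 1) - (A + 1) * cw),
            (A + 1) - (A - 1) * cw - 2 * sa * alpha)

def gain_db(coeffs, f, fs):
    """Magnitude response of one biquad at frequency f (Hz), in dB."""
    b0, b1, b2, a0, a1, a2 = coeffs
    z1 = np.exp(-2j * np.pi * f / fs)  # z^-1 evaluated on the unit circle
    h = (b0 + b1 * z1 + b2 * z1**2) / (a0 + a1 * z1 + a2 * z1**2)
    return 20 * np.log10(np.abs(h))
```

With these formulas, a peaking filter reaches exactly g0 dB at f0 and 0 dB at DC, and the high shelf moves from 0 dB at DC to g0 dB at the Nyquist frequency. These anchor points make it easy to place a notch or a boost at a measured spectral feature.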
In a seventh possible implementation form of the audio signal processing apparatus according to the sixth implementation form of the first aspect, for at least one infinite impulse response filter of the plurality of infinite impulse response filters, the plurality of predefined filter parameters are selected by determining a frequency and an azimuth angle and/or an elevation angle, at which a left ear transfer function or a right ear transfer function of the plurality of pairs of measured left ear and right ear transfer functions has a minimal or maximal magnitude, and by approximating the frequency dependence of the left ear transfer function or the right ear transfer function of the plurality of pairs of measured left ear and right ear transfer functions by the frequency dependence of the at least one infinite impulse response filter.
Thus, the predefined filter parameters can be determined in a computationally efficient way.
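The search described in the seventh implementation form amounts to locating the extremum of a magnitude grid indexed by angle and frequency. The following minimal sketch assumes that the measured magnitudes have already been arranged as a 2-D array in dB, with one row per elevation angle and one column per frequency. The grid and the synthetic notch are hypothetical.

```python
import numpy as np

def find_extremum(mag_db, elevations, freqs, kind="min"):
    """Return (elevation, frequency, magnitude) of the global min or max of mag_db.

    mag_db has shape (len(elevations), len(freqs)).
    """
    flat = np.argmin(mag_db) if kind == "min" else np.argmax(mag_db)
    i, j = np.unravel_index(flat, mag_db.shape)
    return elevations[i], freqs[j], mag_db[i, j]

# Synthetic example: a spectral notch that is deepest at 20 degrees and 9 kHz
elevations = np.arange(-40, 91, 10)
freqs = np.linspace(1000, 16000, 61)
E, F = np.meshgrid(elevations, freqs, indexing="ij")
mag_db = -20 * np.exp(-((F - 9000) / 1500) ** 2 - ((E - 20) / 30) ** 2)
```

As an assumption, the returned elevation, frequency and magnitude could then serve as starting values for ϕp, fp and gp in the eighth implementation form, before the remaining parameters are fitted.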
In an eighth possible implementation form of the audio signal processing apparatus according to the sixth or seventh implementation form of the first aspect, the filter parameters, namely the cut-off frequency parameter f0, the gain parameter g0 and the bandwidth parameter Δ0 are determined on the basis of the following equations:
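$$f_0 = \max\left(m_f,\ \min\left(M_f,\ a_f(\phi - \phi_p)^2 + f_p\right)\right),$$
$$g_0 = \max\left(m_g,\ \min\left(M_g,\ a_g(\phi - \phi_p)^2 + g_p\right)\right),$$
$$\Delta_0 = \max\left(m_\Delta,\ \min\left(M_\Delta,\ a_\Delta(\phi - \phi_p)^2 + \Delta_p\right)\right),$$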
wherein $M_{f,g,\Delta}$ and $m_{f,g,\Delta}$ denote the maximal and minimal values of $f_0$, $g_0$ and $\Delta_0$, respectively, and wherein $a_{f,g,\Delta}$ denote coefficients controlling how quickly the corresponding filter design parameters change with the elevation angle $\phi$.
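The clamped quadratic law above can be evaluated directly for each parameter. The sketch below assumes that ϕp and the values fp, gp and Δp denote the elevation and parameter values at a reference point obtained from the measured data, since these symbols are not defined explicitly. All numbers and class names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ParameterLaw:
    """Clamped quadratic law for one filter parameter (f0, g0 or Delta0)."""
    peak: float       # value at phi = phi_p, i.e. f_p, g_p or Delta_p
    curvature: float  # a_f, a_g or a_Delta: how fast the value changes with phi
    lower: float      # m_f, m_g or m_Delta
    upper: float      # M_f, M_g or M_Delta

    def __call__(self, phi, phi_p):
        value = self.curvature * (phi - phi_p) ** 2 + self.peak
        return max(self.lower, min(self.upper, value))  # clamp to [m, M]

# Hypothetical laws for illustration only (elevation angles in degrees)
f_law = ParameterLaw(peak=8000.0, curvature=0.5, lower=5000.0, upper=12000.0)
g_law = ParameterLaw(peak=-10.0, curvature=0.002, lower=-15.0, upper=0.0)
```

Only four numbers per parameter need to be stored, and the clamp keeps the filter design within a valid range even for elevations far from ϕp.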
In a ninth possible implementation form of the audio signal processing apparatus according to the first aspect as such or any one of the first to eighth implementation form thereof, the adjustment filter is configured to filter the input audio signal on the basis of the determined pair of left ear and right ear transfer functions and the adjustment function by convolving the adjustment function with the left ear transfer function and by convolving the result with the input audio signal in order to obtain the left ear output audio signal and/or by convolving the adjustment function with the right ear transfer function and by convolving the result with the input audio signal in order to obtain the right ear output audio signal.
In a tenth possible implementation form of the audio signal processing apparatus according to the first aspect as such or any one of the first to eighth implementation form thereof, the adjustment filter is configured to filter the input audio signal on the basis of the determined pair of left ear and right ear transfer functions and the adjustment function by convolving the left ear transfer function with the input audio signal and by convolving the result with the adjustment function in order to obtain the left ear output audio signal and/or by convolving the right ear transfer function with the input audio signal and by convolving the result with the adjustment function in order to obtain the right ear output audio signal.
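Because convolution is associative and commutative, the ninth and tenth implementation forms yield the same output and differ only in which convolution is computed first. The sketch below checks this numerically with random placeholder impulse responses. The array names are hypothetical, and the adjustment function is represented here by a finite impulse response, which for an infinite impulse response adjustment would be a truncation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)      # input audio signal (placeholder)
h_left = rng.standard_normal(128)  # left ear impulse response (placeholder)
m_left = rng.standard_normal(32)   # adjustment impulse response (placeholder)

# Ninth form: convolve the adjustment with the transfer function, then with the input.
y9 = np.convolve(np.convolve(m_left, h_left), x)
# Tenth form: convolve the transfer function with the input, then with the adjustment.
y10 = np.convolve(np.convolve(h_left, x), m_left)
```

In the ninth form, the combined filter can be computed once per target position and reused for the whole input signal.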
In an eleventh possible implementation form of the audio signal processing apparatus according to the first aspect as such or any one of the first to tenth implementation form thereof, the audio signal processing apparatus further comprises a pair of transducers, in particular headphones or loudspeakers using crosstalk cancellation, configured to output the left ear output audio signal and the right ear output audio signal.
In a twelfth possible implementation form of the audio signal processing apparatus according to the first aspect as such or any one of the first to eleventh implementation form thereof, the pairs of predefined left ear and right ear transfer functions are predefined for a plurality of reference positions relative to the listener, which lie in the horizontal plane relative to the listener. That is, the set of pairs of predefined left ear and right ear transfer functions can consist of pairs of predefined left ear and right ear transfer functions for a plurality of different azimuth angles and a fixed zero elevation angle.
In a thirteenth possible implementation form of the audio signal processing apparatus according to the first aspect as such or any one of the first to twelfth implementation form thereof, the determiner is configured to determine the pair of left ear and right ear transfer functions on the basis of the set of pairs of predefined left ear and right ear transfer functions for the azimuth angle and the elevation angle of the virtual target position by selecting a pair of left ear and right ear transfer functions from the set of pairs of predefined left ear and right ear transfer functions for the azimuth angle and the elevation angle of the virtual target position and/or by interpolating a pair of left ear and right ear transfer functions on the basis of the set of pairs of predefined left ear and right ear transfer functions for the azimuth angle and the elevation angle of the virtual target position.
According to a second aspect, the disclosure relates to an audio signal processing method for processing an input audio signal to be transmitted to a listener in such a way that the listener perceives the input audio signal to come from a virtual target position defined by an azimuth angle and an elevation angle relative to the listener, the audio signal processing method comprising determining a pair of left ear and right ear transfer functions on the basis of a set of pairs of predefined left ear and right ear transfer functions for the azimuth angle and the elevation angle of the virtual target position, wherein the pairs of predefined left ear and right ear transfer functions are predefined for a plurality of reference positions relative to the listener, wherein the plurality of reference positions lie in a two-dimensional plane, and filtering the input audio signal, e.g. by an adjustment filter, on the basis of the determined pair of left ear and right ear transfer functions and an adjustment function configured to adjust a delay between the left ear transfer function and the right ear transfer function of the determined pair of left ear and right ear transfer functions and a frequency dependence of the left ear transfer function and the right ear transfer function of the determined pair of left ear and right ear transfer functions as a function of the azimuth angle and/or the elevation angle of the virtual target position in order to obtain a left ear output audio signal and a right ear output audio signal.
In a first possible implementation form of the audio signal processing method according to the second aspect as such, the adjustment function is configured to adjust the delay between the left ear transfer function and the right ear transfer function of the determined pair of left ear and right ear transfer functions as a function of the azimuth angle and/or the elevation angle of the virtual target position by compensating for sound travel time differences associated with the distances between the virtual target position and a left ear of the listener and between the virtual target position and a right ear of the listener.
In a second possible implementation form of the audio signal processing method according to the second aspect as such or the first implementation form thereof, the adjustment function is configured to adjust the delay between the left ear transfer function and the right ear transfer function of the determined pair of left ear and right ear transfer functions as a function of the azimuth angle and/or the elevation angle of the virtual target position on the basis of the following equations:
wherein τL denotes a delay applied to the left ear transfer function, wherein τR denotes a delay applied to the right ear transfer function and wherein τ and Θ are defined on the basis of the following equations:
wherein τ denotes a delay in seconds, c denotes the velocity of sound, a denotes a parameter associated with the head of a listener, θ denotes the azimuth angle of the virtual target position and ϕ denotes the elevation angle of the virtual target position.
In a third possible implementation form of the audio signal processing method according to the second aspect as such or the first or second implementation form thereof, the adjustment function is configured to adjust the frequency dependence of the left ear transfer function and the right ear transfer function of the determined pair of left ear and right ear transfer functions as a function of the azimuth angle and/or the elevation angle of the virtual target position on the basis of a plurality of infinite impulse response filters, wherein the plurality of infinite impulse response filters are configured to approximate at least a portion of the frequency dependence of a left ear transfer function and a right ear transfer function of a plurality of pairs of measured left ear and right ear transfer functions as a function of the azimuth angle and/or the elevation angle of the virtual target position.
In a fourth possible implementation form of the audio signal processing method according to the third implementation form of the second aspect, the frequency dependence of each infinite impulse response filter is defined by a plurality of predefined filter parameters, wherein the plurality of predefined filter parameters are selected such that the frequency dependence of each infinite impulse response filter approximates at least a portion, in particular prominent spectral features, such as a spectral maximum or a spectral minimum, of the frequency dependence of a left ear transfer function or a right ear transfer function of the plurality of pairs of measured left ear and right ear transfer functions as a function of the azimuth angle and/or the elevation angle of the virtual target position.
In a fifth possible implementation form of the audio signal processing method according to the fourth implementation form of the second aspect, the plurality of infinite impulse response filters comprises a plurality of biquad, i.e. biquadratic, filters. The biquad filters can be implemented as parallel or cascaded filters. Cascaded filters are preferred because they approximate the spectral features of the transfer functions better. The order of the biquad filters can vary.
In a sixth possible implementation form of the audio signal processing method according to the fifth implementation form of the second aspect, the plurality of biquad filters comprises at least one shelving filter, wherein the at least one shelving filter is defined by a cut-off frequency parameter f0 and a gain parameter g0, and/or at least one peaking filter, wherein the at least one peaking filter is defined by a cut-off frequency parameter f0, a gain parameter g0 and a bandwidth parameter Δ0.
In a seventh possible implementation form of the audio signal processing method according to the sixth implementation form of the second aspect, for at least one infinite impulse response filter of the plurality of infinite impulse response filters, the plurality of predefined filter parameters are selected by determining a frequency and an azimuth angle and/or an elevation angle, at which a left ear transfer function or a right ear transfer function of the plurality of pairs of measured left ear and right ear transfer functions has a minimal or maximal magnitude, and by approximating the frequency dependence of the left ear transfer function or the right ear transfer function of the plurality of pairs of measured left ear and right ear transfer functions by the frequency dependence of the at least one infinite impulse response filter.
In an eighth possible implementation form of the audio signal processing method according to the sixth or seventh implementation form of the second aspect, the filter parameters, namely the cut-off frequency parameter f0, the gain parameter g0 and the bandwidth parameter Δ0 are determined on the basis of the following equations:
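$$f_0 = \max\left(m_f,\ \min\left(M_f,\ a_f(\phi - \phi_p)^2 + f_p\right)\right),$$
$$g_0 = \max\left(m_g,\ \min\left(M_g,\ a_g(\phi - \phi_p)^2 + g_p\right)\right),$$
$$\Delta_0 = \max\left(m_\Delta,\ \min\left(M_\Delta,\ a_\Delta(\phi - \phi_p)^2 + \Delta_p\right)\right),$$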
wherein $M_{f,g,\Delta}$ and $m_{f,g,\Delta}$ denote the maximal and minimal values of $f_0$, $g_0$ and $\Delta_0$, respectively, and wherein $a_{f,g,\Delta}$ denote coefficients controlling how quickly the corresponding filter design parameters change with the elevation angle $\phi$.
In a ninth possible implementation form of the audio signal processing method according to the second aspect as such or any one of the first to eighth implementation form thereof, the step of filtering the input audio signal on the basis of the determined pair of left ear and right ear transfer functions and the adjustment function comprises the steps of convolving the adjustment function with the left ear transfer function and convolving the result with the input audio signal in order to obtain the left ear output audio signal and/or the steps of convolving the adjustment function with the right ear transfer function and convolving the result with the input audio signal in order to obtain the right ear output audio signal.
In a tenth possible implementation form of the audio signal processing method according to the second aspect as such or any one of the first to eighth implementation form thereof, the step of filtering the input audio signal on the basis of the determined pair of left ear and right ear transfer functions and the adjustment function comprises the steps of convolving the left ear transfer function with the input audio signal and convolving the result with the adjustment function in order to obtain the left ear output audio signal and/or the steps of convolving the right ear transfer function with the input audio signal and convolving the result with the adjustment function in order to obtain the right ear output audio signal.
In an eleventh possible implementation form of the audio signal processing method according to the second aspect as such or any one of the first to tenth implementation form thereof, the audio signal processing method further comprises the step of outputting the left ear output audio signal and the right ear output audio signal by means of a pair of transducers, in particular headphones or loudspeakers using crosstalk cancellation.
In a twelfth possible implementation form of the audio signal processing method according to the second aspect as such or any one of the first to eleventh implementation form thereof, the pairs of predefined left ear and right ear transfer functions are predefined for a plurality of reference positions relative to the listener, which lie in the horizontal plane relative to the listener.
In a thirteenth possible implementation form of the audio signal processing method according to the second aspect as such or any one of the first to twelfth implementation form thereof, the step of determining the pair of left ear and right ear transfer functions on the basis of the set of pairs of predefined left ear and right ear transfer functions for the azimuth angle and the elevation angle of the virtual target position comprises the step of selecting a pair of left ear and right ear transfer functions from the set of pairs of predefined left ear and right ear transfer functions for the azimuth angle and the elevation angle of the virtual target position or the step of interpolating a pair of left ear and right ear transfer functions on the basis of the set of pairs of predefined left ear and right ear transfer functions for the azimuth angle and the elevation angle of the virtual target position.
The audio signal processing method according to the second aspect of the disclosure can be performed by the audio signal processing apparatus according to the first aspect of the disclosure.
According to a third aspect, the disclosure relates to a computer program comprising program code for performing the audio signal processing method according to the second aspect of the disclosure or any of its implementation forms when executed on a computer.
According to a fourth aspect, the disclosure relates to an audio signal processing apparatus for processing an input audio signal, comprising a memory configured to store a set of pairs of predefined left ear and right ear transfer functions, wherein each pair of the set of pairs of the predefined left ear and right ear transfer functions is predefined for each reference position of a plurality of reference positions relative to a listener, wherein each of the reference positions lies in a two-dimensional plane; a processor coupled to the memory and configured to determine a pair of left ear and right ear transfer functions of the set of pairs of the predefined left ear and right ear transfer functions according to an azimuth angle and an elevation angle of a virtual target position relative to the listener; and an adjustment filter coupled to the memory and the processor and configured to filter the input audio signal on a basis of the determined pair of the left ear and right ear transfer functions and an adjustment function, wherein the adjustment function is configured to adjust a delay between a determined left ear transfer function and a determined right ear transfer function of the determined pair of the left ear and right ear transfer functions; and adjust a frequency dependence of the determined left ear transfer function and the determined right ear transfer function as a function of the azimuth angle or the elevation angle on the basis of a plurality of infinite impulse response filters in order to obtain a left ear output audio signal and a right ear output audio signal, wherein a frequency dependence of each infinite impulse response filter of the plurality of infinite impulse response filters is defined by a plurality of predefined filter parameters, wherein for an infinite impulse response filter, the predefined filter parameters are selected by determining a frequency and the azimuth angle or the elevation angle at which a measured left ear transfer function or a measured right ear transfer function of pairs of measured left ear and right ear transfer functions has a minimal or a maximal magnitude; and a transmitter coupled to the memory and the processor and configured to transmit the left ear output audio signal and the right ear output audio signal to the listener to enable the listener to perceive the input audio signal as arriving from the virtual target position.
The disclosure can be implemented in hardware and/or software.
Further embodiments of the disclosure will be described with respect to the following figures.
In the various figures, identical reference signs will be used for identical or at least functionally equivalent features.
In the following description, reference is made to the accompanying drawings, which form part of the disclosure, and in which are shown, by way of illustration, specific aspects in which the present disclosure may be practiced. It is understood that other aspects may be utilized and structural or logical changes may be made without departing from the scope of the present disclosure. The following detailed description, therefore, is not to be taken in a limiting sense, as the scope of the present disclosure is defined by the appended claims.
For instance, it is understood that a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if a specific method step is described, a corresponding device may include a unit to perform the described method step, even if such unit is not explicitly described or illustrated in the figures. Further, it is understood that the features of the various exemplary aspects described herein may be combined with each other, unless noted otherwise.
The audio signal processing apparatus 100 comprises a memory 103 configured to store a set of pairs of predefined left ear and right ear transfer functions, which are predefined for a plurality of reference positions/directions, wherein the plurality of reference positions define a two-dimensional plane.
Moreover, the audio signal processing apparatus 100 comprises a determiner 105 configured to determine a pair of left ear and right ear transfer functions on the basis of the set of pairs of predefined left ear and right ear transfer functions for the azimuth angle and the elevation angle of the virtual target position. The determiner 105 is configured to determine the pair of left ear and right ear transfer functions for a position/direction associated with the virtual target position which lies in the two-dimensional plane defined by the plurality of reference positions. To this end, the determiner 105 determines the pair on the basis of the set of pairs of predefined left ear and right ear transfer functions for the projection of the virtual target position/direction onto the two-dimensional plane defined by the plurality of reference positions.
In an embodiment, the determiner 105 can be configured to determine the pair of left ear and right ear transfer functions on the basis of the set of pairs of predefined left ear and right ear transfer functions for the azimuth angle and the elevation angle of the virtual target position by selecting a pair of left ear and right ear transfer functions from the set of pairs of predefined left ear and right ear transfer functions for the azimuth angle and the elevation angle of the virtual target position.
In an embodiment, the determiner 105 can be configured to determine the pair of left ear and right ear transfer functions for the azimuth angle and the elevation angle of the virtual target position by interpolating it from the set of pairs of predefined left ear and right ear transfer functions, using, for instance, a linear interpolation scheme, a nearest neighbor interpolation scheme or a similar interpolation scheme.
Moreover, the audio signal processing apparatus 100 comprises an adjustment filter 107 for extending the pair of left ear and right ear transfer functions, which has been determined by the determiner 105 for the projection of the virtual target position/direction onto the two-dimensional plane defined by the plurality of reference positions, to the “third dimension”, i.e. to positions/directions above or below the two-dimensional plane defined by the plurality of reference positions. To this end, the adjustment filter 107 is configured to filter the input audio signal 101 on the basis of the determined pair of left ear and right ear transfer functions and a predefined adjustment function M(r, θ, ϕ) 109 in order to obtain a left ear output audio signal 111a and a right ear output audio signal 111b. The adjustment function M(r, θ, ϕ) 109 is configured to adjust, as a function of the azimuth angle and/or the elevation angle of the virtual target position, both the delay between the left ear transfer function and the right ear transfer function of the determined pair and the frequency dependence of the left ear transfer function and the right ear transfer function of the determined pair.
In an exemplary embodiment, the set of pairs of predefined left ear and right ear transfer functions comprises four pairs of predefined left ear and right ear transfer functions in the horizontal plane, i.e. for an elevation angle ϕ=0°. The four pairs of predefined left ear and right ear transfer functions can be defined for the azimuth angles θ=0°, 90°, 180° and 270°, respectively. In case an exemplary virtual target position is associated with an azimuth angle θ=20° and an elevation angle ϕ=20°, the determiner 105 can determine the pair of left ear and right ear transfer functions for the azimuth angle θ=20° and the elevation angle ϕ=0° by means of a linear interpolation using the pairs of predefined left ear and right ear transfer functions at θ=0° and θ=90°. In an alternative embodiment, the determiner 105 can determine the pair of left ear and right ear transfer functions for the azimuth angle θ=20° and the elevation angle ϕ=0° by selecting the pair of predefined left ear and right ear transfer functions at θ=0° (which corresponds to a nearest neighbor interpolation, since θ=0° is the reference azimuth closest to θ=20°). The adjustment filter 107 then extends the determined pair of left ear and right ear transfer functions at the azimuth angle θ=20° and the elevation angle ϕ=0° to the elevation angle ϕ=20°.
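The two options for the determiner 105 in this example can be sketched as follows. This is a minimal Python illustration, not the apparatus itself: it assumes that each transfer function is stored as a short impulse response (a list of samples) and that reference azimuths are given in degrees within [0, 360); the toy data set `toy_set` and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

# A pair of left ear and right ear impulse responses (hypothetical toy format).
HrirPair = Tuple[List[float], List[float]]


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuth angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_neighbor(theta: float, hrtf_set: Dict[float, HrirPair]) -> HrirPair:
    """Select the predefined pair whose reference azimuth is closest to theta."""
    best = min(hrtf_set, key=lambda ref: angular_distance(theta, ref))
    return hrtf_set[best]


def linear_interpolation(theta: float, hrtf_set: Dict[float, HrirPair]) -> HrirPair:
    """Blend the two predefined pairs whose reference azimuths enclose theta."""
    refs = sorted(hrtf_set)  # assumes keys already lie in [0, 360)
    t = theta % 360.0
    if t < refs[0]:
        t += 360.0  # place t inside the interval that wraps past 360 degrees
    uppers = refs[1:] + [refs[0] + 360.0]
    for lo, hi in zip(refs, uppers):
        if lo <= t <= hi:
            w = (t - lo) / (hi - lo)  # weight of the upper reference azimuth
            left_lo, right_lo = hrtf_set[lo]
            left_hi, right_hi = hrtf_set[hi % 360.0]
            left = [(1.0 - w) * p + w * q for p, q in zip(left_lo, left_hi)]
            right = [(1.0 - w) * p + w * q for p, q in zip(right_lo, right_hi)]
            return left, right
    raise ValueError("theta could not be bracketed by the reference azimuths")


# Hypothetical two-tap responses at theta = 0, 90, 180, 270 degrees (phi = 0).
toy_set = {
    0: ([1.0, 0.0], [1.0, 0.0]),
    90: ([0.0, 1.0], [0.5, 0.5]),
    180: ([0.5, 0.0], [0.5, 0.0]),
    270: ([0.0, 0.5], [0.0, 1.0]),
}
left_20, right_20 = linear_interpolation(20.0, toy_set)
```

With the toy set, θ=20° lies between the references at 0° and 90°, so linear interpolation weights them by 7/9 and 2/9, while nearest neighbor selection returns the 0° pair. Blending raw impulse responses sample by sample ignores any time offset between neighboring responses, so this should be read as a placeholder for whatever interpolation scheme an implementation actually uses.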
The set of predefined left ear and right ear transfer functions can be, for example, a limited set of HRTFs. The set of pairs of predefined left ear and right ear transfer functions can be either personalized (measured for a specific user) or obtained from a generalized database (modelled).
As already mentioned above, in an embodiment, the set of pairs of predefined left ear and right ear head related transfer functions can be defined for a plurality of azimuth angles and a fixed elevation angle. For instance, for a fixed elevation angle ϕ=0°, the set of pairs of predefined left ear and right ear head related transfer functions can be defined as left ear HRTFs hL(r, θ, 0) and right ear HRTFs hR(r, θ, 0) parameterized by the azimuth angle θ.
As already mentioned above, in an embodiment, the set of pairs of predefined left ear and right ear head related transfer functions can be defined for a fixed azimuth angle and a plurality of elevation angles. For instance, for a fixed azimuth angle θ=0°, the set of pairs of predefined left ear and right ear head related transfer functions can be defined as left ear HRTFs hL(r, 0, ϕ) and right ear HRTFs hR(r, 0, ϕ) parameterized by the elevation angle ϕ.
The adjustment function M(r, θ, ϕ) 109 shown in
In an embodiment, the adjustment filter 107 is configured to adjust the delay 109a between the left ear transfer function and the right ear transfer function of the determined pair of left ear and right ear transfer functions as a function of the azimuth angle and/or the elevation angle of the virtual target position on the basis of the adjustment function M(r, θ, ϕ) 109 by compensating for sound travel time differences associated with the distances between the virtual target position and a left ear of the listener and between the virtual target position and a right ear of the listener.
In an embodiment, the adjustment function 109 is configured to determine an additional time delay due to the elevation angle ϕ for the set of predefined transfer functions hL(r, θ, 0) and hR(r, θ, 0) on the basis of a new angle of incidence Θ derived in the constant elevation plane.
In an embodiment, the adjustment filter 107 is configured to adjust by means of the adjustment function 109 the delay 109a between the left ear transfer function and the right ear transfer function of the determined pair of left ear and right ear transfer functions as a function of the azimuth angle and/or the elevation angle of the virtual target position on the basis of the following equations:
wherein τL denotes a delay applied to the left ear transfer function, wherein τR denotes a delay applied to the right ear transfer function and wherein τ and Θ are defined on the basis of the following equations:
wherein τ denotes a delay in seconds, c denotes the velocity of sound (e.g. c=340 meters per second (m/sec)), a denotes a parameter associated with the head of a listener, such as the head radius (e.g. a=0.087 meters (m)), θ denotes the azimuth angle of the virtual target position and ϕ denotes the elevation angle of the virtual target position. The above equations for determining the new angle of incidence Θ are based on a projection of the azimuth angle θ of the virtual target position in the horizontal plane into the constant elevation plane.
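The equations referred to above are not reproduced in this text. As a hedged illustration only, the sketch below assumes the widely used spherical-head (Woodworth) approximation τ = (a/c)(Θ + sin Θ), together with the projection sin Θ = sin θ · cos ϕ for the angle of incidence in the constant elevation plane. Both formulas are assumptions consistent with the symbols defined above, not a transcription of the equations of this disclosure; the split of τ into τL and τR is not modelled, and all function names are hypothetical.

```python
import math

C_SOUND = 340.0  # c, velocity of sound in m/s (value given in the text)
A_HEAD = 0.087   # a, head parameter in m (value given in the text)


def incidence_angle(theta_deg: float, phi_deg: float) -> float:
    """Assumed projected angle of incidence Theta in radians: sin(Theta) = sin(theta) * cos(phi)."""
    s = math.sin(math.radians(theta_deg)) * math.cos(math.radians(phi_deg))
    return math.asin(max(-1.0, min(1.0, s)))  # clamp guards against rounding


def interaural_delay(theta_deg: float, phi_deg: float,
                     a: float = A_HEAD, c: float = C_SOUND) -> float:
    """Assumed Woodworth-style delay tau = (a / c) * (Theta + sin(Theta)), in seconds."""
    big_theta = incidence_angle(theta_deg, phi_deg)
    return (a / c) * (big_theta + math.sin(big_theta))


def elevation_delay_correction(theta_deg: float, phi_deg: float) -> float:
    """Change in delay (seconds) when moving from (theta, 0) to (theta, phi)."""
    return interaural_delay(theta_deg, phi_deg) - interaural_delay(theta_deg, 0.0)


def seconds_to_samples(tau: float, fs: float = 48000.0) -> int:
    """Round a delay to whole samples at an assumed sampling rate fs."""
    return round(tau * fs)


if __name__ == "__main__":
    print(interaural_delay(90.0, 0.0))             # about 0.66 ms for a source at the side
    print(elevation_delay_correction(20.0, 20.0))  # negative here: elevation shrinks the delay
```

Under these assumptions, raising the source out of the horizontal plane reduces |sin Θ|, so the delay taken from the horizontal-plane pair hL(r, θ, 0), hR(r, θ, 0) has to be reduced accordingly; at θ=90° and ϕ=0° the sketch gives roughly 0.66 ms, or about 32 samples at 48 kHz.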
The frequency adjustment block 109b of the adjustment function M(r, θ, ϕ) 109 shown in
In an embodiment, the frequency adjustment block 109b of the adjustment function M(r, θ, ϕ) 109 shown in
In an embodiment, instead of deriving the transfer functions in the manner described above, the frequency dependence of a set of predefined left ear and right ear transfer functions is equalized, i.e. adjusted, preferably taking into account only the main spectral features relevant to the perception of elevation or azimuth angles. This significantly reduces the data required to generate elevated transfer functions. The elevation or azimuth angles can then be rendered as a spectral effect, i.e. by applying an equalization or adjustment function, which can be used on any transfer functions.
In an embodiment, the adjustment filter 107 of the audio signal processing apparatus 100 is configured to adjust the frequency dependence of the left ear transfer function and the right ear transfer function of the determined pair of left ear and right ear transfer functions as a function of the azimuth angle θ and/or the elevation angle ϕ of the virtual target position on the basis of a plurality of infinite impulse response filters, wherein the plurality of infinite impulse response filters are configured to approximate spectrally prominent features, such as a maximum or a minimum, of the frequency dependence of a left ear transfer function and a right ear transfer function of a plurality of pairs of measured left ear and right ear transfer functions as a function of the azimuth angle and/or the elevation angle of the virtual target position.
In an embodiment, the frequency dependence of each infinite impulse response filter is defined by a plurality of predefined filter parameters, wherein the plurality of predefined filter parameters are selected such that the frequency dependence of each infinite impulse response filter approximates at least a portion of the frequency dependence of a left ear transfer function or a right ear transfer function of the plurality of pairs of measured left ear and right ear transfer functions as a function of the azimuth angle and/or the elevation angle of the virtual target position.
In an embodiment, the plurality of infinite impulse response filters comprises a plurality of biquad filters. The plurality of biquad filters can be implemented as parallel filters or cascaded filters. Cascaded filters are preferred because they approximate the spectral features of the transfer functions better.
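A cascade of biquad sections can be illustrated as follows. The sketch assumes peaking sections designed with the well-known formulas of the RBJ Audio EQ Cookbook, a bandwidth expressed in octaves and a 48 kHz sampling rate; these are assumptions, since the filter design equations are not specified here, and the feature frequencies, gains and bandwidths in the example are hypothetical rather than measured values.

```python
import cmath
import math
from typing import List, Tuple

# (b, a) coefficients of one second-order section, normalized so that a[0] == 1.
Biquad = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def peaking_biquad(f0: float, gain_db: float, bw_oct: float, fs: float = 48000.0) -> Biquad:
    """Peaking EQ section (RBJ Audio EQ Cookbook, bandwidth given in octaves)."""
    big_a = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) * math.sinh(math.log(2.0) / 2.0 * bw_oct * w0 / math.sin(w0))
    cw = math.cos(w0)
    b = (1.0 + alpha * big_a, -2.0 * cw, 1.0 - alpha * big_a)
    a = (1.0 + alpha / big_a, -2.0 * cw, 1.0 - alpha / big_a)
    return tuple(x / a[0] for x in b), tuple(x / a[0] for x in a)


def cascade_response_db(sections: List[Biquad], f: float, fs: float = 48000.0) -> float:
    """Magnitude response in dB of the cascaded sections at frequency f."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * f / fs)  # z^-1 evaluated on the unit circle
    h = 1.0 + 0.0j
    for b, a in sections:  # cascading multiplies the individual responses
        h *= (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20.0 * math.log10(abs(h))


def filter_cascade(x: List[float], sections: List[Biquad]) -> List[float]:
    """Run a signal through each section in turn (direct form I difference equation)."""
    y = list(x)
    for b, a in sections:
        x1 = x2 = y1 = y2 = 0.0
        out = []
        for xn in y:
            yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
            x2, x1 = x1, xn
            y2, y1 = y1, yn
            out.append(yn)
        y = out
    return y


# Hypothetical spectral features: a notch near 8 kHz and a peak near 12 kHz.
features = [peaking_biquad(8000.0, -10.0, 1.0), peaking_biquad(12000.0, 6.0, 0.5)]
```

Each peaking section reaches exactly its nominal gain at its center frequency and has unity gain at DC, so placing sections in cascade lets each one shape a single prominent peak or notch while leaving distant frequencies largely untouched.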
In an embodiment, the filter parameters can be obtained using numerical optimization methods.
However, in an embodiment, which is more memory efficient, an ad-hoc method can be used to derive the filter parameters on the basis of the spectral information provided, for instance, in
In an embodiment, the filter parameters, namely the cut-off frequency parameter f0, the gain parameter g0 and the bandwidth parameter Δ0 (defined for the peaking filters 403a-c) are determined on the basis of the following equations:
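$$f_0 = \max\bigl(m_f, \min\bigl(M_f, a_f(\phi - \phi_p)^2 + f_p\bigr)\bigr),$$
$$g_0 = \max\bigl(m_g, \min\bigl(M_g, a_g(\phi - \phi_p)^2 + g_p\bigr)\bigr),$$
$$\Delta_0 = \max\bigl(m_\Delta, \min\bigl(M_\Delta, a_\Delta(\phi - \phi_p)^2 + \Delta_p\bigr)\bigr),$$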
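wherein Mf,g,Δ and mf,g,Δ denote the maximal and minimal values of f0, g0 and Δ0, respectively, and wherein af,g,Δ denote coefficients controlling how fast the corresponding filter design parameters change with the elevation angle ϕ. Because each quadratic term vanishes at ϕ=ϕp, the filter design parameters take the values fp, gp and Δp at the elevation angle ϕp, provided these values lie within the respective limits.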
In an embodiment, the parameters Mf,g,Δ, mf,g,Δ and af,g,Δ are set manually for the three filter design parameters f0, g0 and Δ0 to model the selected spectral feature as closely as possible.
Subsequently, the parameters M, m and a can be refined for all spectral features in such a way that the magnitude responses of the infinite impulse response filters match the transfer functions obtained by the spectral analysis.
In the embodiment described above for determining the filter parameters, only thirteen parameters (ϕp, fp, gp, Δp, Mf,g,Δ, mf,g,Δ, af,g,Δ) have to be stored for each infinite impulse response filter, wherein the first four parameters (ϕp, fp, gp, Δp) can be taken directly from the spectral analysis and the other parameters can be set manually.
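The clamped quadratic law for f0, g0 and Δ0 can be sketched as follows, with the thirteen stored values grouped per filter. The numerical values in the example are hypothetical placeholders chosen only to show the clamping behaviour; they are not taken from any spectral analysis, and the units (degrees, Hz, dB, octaves) are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FilterLaw:
    """The thirteen values stored per IIR filter: phi_p, f_p, g_p, d_p and M, m, a for f, g, Delta."""
    phi_p: float  # elevation angle of the spectral feature (degrees)
    f_p: float    # feature frequency at phi_p (Hz)
    g_p: float    # feature gain at phi_p (dB)
    d_p: float    # feature bandwidth at phi_p (octaves, assumed)
    M_f: float
    M_g: float
    M_d: float
    m_f: float
    m_g: float
    m_d: float
    a_f: float
    a_g: float
    a_d: float


def clamped_quadratic(phi: float, phi_p: float, a: float, p: float,
                      lo: float, hi: float) -> float:
    """max(lo, min(hi, a * (phi - phi_p)**2 + p)), as in the equations for f0, g0 and Delta0."""
    return max(lo, min(hi, a * (phi - phi_p) ** 2 + p))


def design_parameters(phi: float, law: FilterLaw) -> Tuple[float, float, float]:
    """Return (f0, g0, Delta0) for the desired elevation angle phi."""
    f0 = clamped_quadratic(phi, law.phi_p, law.a_f, law.f_p, law.m_f, law.M_f)
    g0 = clamped_quadratic(phi, law.phi_p, law.a_g, law.g_p, law.m_g, law.M_g)
    d0 = clamped_quadratic(phi, law.phi_p, law.a_d, law.d_p, law.m_d, law.M_d)
    return f0, g0, d0


# Hypothetical law for one peaking filter.
example_law = FilterLaw(phi_p=0.0, f_p=8000.0, g_p=-10.0, d_p=1.0,
                        M_f=12000.0, M_g=0.0, M_d=2.0,
                        m_f=6000.0, m_g=-15.0, m_d=0.5,
                        a_f=0.5, a_g=0.005, a_d=0.0002)
```

At ϕ=ϕp the sketch returns fp, gp and Δp; further away from ϕp each parameter moves quadratically until it reaches the stored limit. A negative coefficient a would make the corresponding parameter decrease away from ϕp instead.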
Thus, given the equations described above, the parameters of the filters 401a,b and 403a-c can be derived directly as a function of the desired elevation angle ϕ. Given a predefined set of transfer functions measured only in the median plane, i.e. containing information only for certain radial distances r and certain elevation angles ϕ, namely hL(r, 0, ϕ) and hR(r, 0, ϕ), these transfer functions can be extended to any desired azimuth angle θ, i.e. to the third dimension, in a similar way as described above.
In the example shown in
The audio signal processing method 1000 comprises the following steps 1001 and 1003. Step 1001 includes determining a pair of left ear and right ear transfer functions on the basis of a set of pairs of predefined left ear and right ear transfer functions for the azimuth angle and the elevation angle of the virtual target position, wherein the pairs of predefined left ear and right ear transfer functions are predefined for a plurality of reference positions relative to the listener and the plurality of reference positions lie in a two-dimensional plane. Step 1003 includes filtering the input audio signal on the basis of the determined pair of left ear and right ear transfer functions and an adjustment function in order to obtain a left ear output audio signal and a right ear output audio signal, wherein the adjustment function is configured to adjust, as a function of the azimuth angle and/or the elevation angle of the virtual target position, the delay between the left ear transfer function and the right ear transfer function of the determined pair and the frequency dependence of the left ear transfer function and the right ear transfer function of the determined pair.
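The two method steps can be strung together as in the following minimal sketch. It is illustrative only: step 1001 uses nearest neighbor selection for brevity, and the adjustment of step 1003 is reduced to hypothetical integer sample delays per ear plus optional per-ear equalization callbacks standing in for the frequency adjustment. All names and toy data are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Signal = List[float]
HrirPair = Tuple[Signal, Signal]  # (left ear, right ear) impulse responses


def convolve(x: Signal, h: Signal) -> Signal:
    """Direct linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def delay(x: Signal, n: int) -> Signal:
    """Delay a signal by n >= 0 whole samples by prepending zeros."""
    return [0.0] * n + list(x)


def step_1001(theta: float, ref_set: Dict[float, HrirPair]) -> HrirPair:
    """Pick the pair for the in-plane projection (nearest reference azimuth)."""
    def dist(ref: float) -> float:
        d = abs(theta - ref) % 360.0
        return min(d, 360.0 - d)
    return ref_set[min(ref_set, key=dist)]


def identity(s: Signal) -> Signal:
    """Default 'equalizer' that leaves the signal unchanged."""
    return s


def step_1003(x: Signal, pair: HrirPair, delay_l: int, delay_r: int,
              eq_l: Callable[[Signal], Signal] = identity,
              eq_r: Callable[[Signal], Signal] = identity) -> Tuple[Signal, Signal]:
    """Filter with the determined pair, then apply the placeholder adjustment."""
    left = eq_l(delay(convolve(x, pair[0]), delay_l))
    right = eq_r(delay(convolve(x, pair[1]), delay_r))
    return left, right


# Hypothetical reference set in the horizontal plane and a unit impulse as input.
toy_set = {0: ([1.0, 0.5], [0.8, 0.0]), 90: ([0.2, 0.1], [1.0, 0.4])}
left_out, right_out = step_1003([1.0, 0.0], step_1001(20.0, toy_set), 0, 2)
```

In a fuller implementation the per-ear delays would follow from the adjustment function and the equalization callbacks would apply filters such as the infinite impulse response filters described above.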
Embodiments of the disclosure realize different advantages. The audio signal processing apparatus 100 and the audio signal processing method 1000 provide means to synthesize binaural sound, i.e. audio signals perceived by a listener as coming from a virtual target position. The audio signal processing apparatus 100 operates on a “two-dimensional” predefined set of transfer functions, which can be either obtained from a generalized database or measured for a specific user. The audio signal processing apparatus 100 can also provide means for reinforcing the front-back or elevation effect in synthesized sound. Embodiments of the disclosure can be applied in different scenarios, for example in media playback, i.e. virtual surround rendering of more than 5.1 channels (e.g. 10.2 or even 22.2), by storing only the 5.1 transfer functions and the parameters needed to obtain all three-dimensional azimuth and elevation angles from the basic two-dimensional set. Embodiments of the disclosure can also be applied in virtual reality in order to obtain full-sphere transfer functions with high resolution from transfer functions with low resolution. Embodiments of the disclosure provide an efficient realization of binaural sound synthesis with regard to the required memory and the complexity of the signal processing algorithms.
While a particular feature or aspect of the disclosure may have been disclosed with respect to only one of several implementations or embodiments, such feature or aspect may be combined with one or more other features or aspects of the other implementations or embodiments as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms “include”, “have”, “with”, or other variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprise”. Also, the terms “exemplary”, “for example” and “e.g.” are merely meant as an example, rather than the best or optimal. The terms “coupled” and “connected”, along with derivatives, may have been used. It should be understood that these terms may have been used to indicate that two elements cooperate or interact with each other regardless of whether they are in direct physical or electrical contact, or they are not in direct contact with each other.
Although specific aspects have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the specific aspects shown and described without departing from the scope of the present disclosure. This application is intended to cover any adaptations or variations of the specific aspects discussed herein.
Although the elements in the following claims are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those elements, those elements are not necessarily intended to be limited to being implemented in that particular sequence.
Many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the above teachings. Of course, those skilled in the art readily recognize that there are numerous applications of the disclosure beyond those described herein. While the present disclosure has been described with reference to one or more particular embodiments, those skilled in the art recognize that many changes may be made thereto without departing from the scope of the present disclosure. It is therefore to be understood that within the scope of the appended claims and their equivalents, the disclosure may be practiced otherwise than as described herein.
This application is a continuation of International Patent Application No. PCT/EP2015/078805 filed on Dec. 7, 2015, which is hereby incorporated by reference in its entirety.
Number | Name | Date | Kind |
---|---|---|---|
5440639 | Suzuki et al. | Aug 1995 | A |
7167567 | Sibbald | Jan 2007 | B1 |
20010040968 | Mukojima | Nov 2001 | A1 |
20050117762 | Sakurai | Jun 2005 | A1 |
20090116657 | Edwards | May 2009 | A1 |
20120213375 | Mahabub | Aug 2012 | A1 |
20130336494 | Bathgate et al. | Dec 2013 | A1 |
20140198918 | Li | Jul 2014 | A1 |
20150124975 | Pontoppidan | May 2015 | A1 |
20160323678 | Pontoppidan | Nov 2016 | A1 |
20170126194 | Jot | May 2017 | A1 |
Number | Date | Country |
---|---|---|
104618843 | May 2015 | CN |
2014506416 | Mar 2014 | JP |
9931938 | Jun 1999 | WO |
Entry |
---|
Algazi, V., et al., “The Use of Head-and-Torso Models for Improved Spatial Sound Synthesis,” Audio Engineering Society Convention Paper, Oct. 5-8, 2002, 18 pages. |
Duda, R., et al., “Modeling Head Related Transfer Functions,” 1993 Conference Record of the Twenty-Seventh Asilomar Conference on Signals, Systems and Computers, Oct. 31-Nov. 3, 1993, 5 pages. |
Duda, R., et al., “Range dependence of the response of a spherical head model,” J. Acoust. Soc. Am., vol. 104, No. 5, Nov. 1998, pp. 3048-3058. |
Gamper, H., “Head-related transfer function interpolation in azimuth, elevation, and distance,” JASA Express Letters, Nov. 12, 2013, J. Acoust. Soc. Am., vol. 134, No. 6, Dec. 2013, pp. 547-553. |
Gardner, B., et al., “HRTF Measurements of a KEMAR Dummy-head Microphone,” MIT media lab perceptual computing-technical report, May 1994, 7 pages. |
Carlile, S., “Virtual Auditory Space: Generation and Applications,” Neuroscience Intelligence Unit, 1996, 13 pages. |
Foreign Communication From a Counterpart Application, PCT Application No. PCT/EP2015/078805, International Search Report dated Jul. 8, 2016, 5 pages. |
Foreign Communication From a Counterpart Application, PCT Application No. PCT/EP2015/078805, Written Opinion dated Jul. 8, 2016, 7 pages. |
Foreign Communication From a Counterpart Application, Chinese Application No. 201580084740.0, Chinese Search Report dated Apr. 25, 2019, 2 pages. |
Foreign Communication From a Counterpart Application, Chinese Application No. 201580084740.0, Chinese Office Action dated May 8, 2019, 6 pages. |
Foreign Communication From a Counterpart Application, Korean Application No. 10-2018-7018740, Korean Office Action dated Jun. 20, 2019, 6 pages. |
Foreign Communication From a Counterpart Application, Korean Application No. 10-2018-7018740, English Translation of Korean Office Action dated Jun. 20, 2019, 5 pages. |
Number | Date | Country | |
---|---|---|---|
20180324541 A1 | Nov 2018 | US |
Number | Date | Country | |
---|---|---|---|
Parent | PCT/EP2015/078805 | Dec 2015 | US |
Child | 16001411 | US |