This application claims priority to French Patent Application FR 1760647, filed Nov. 13, 2017, the disclosure of which is incorporated herein by reference in its entirety.
Embodiments described herein relate to the modelling of individual acoustic transfer functions, such as acoustic transfer functions that are relative to the audition of an individual in three-dimensional space.
Embodiments described herein are relevant in the context of services, in particular services enabling navigation by spatialized sound, telecommunication services delivering spatialized sound (for example a conference call between a number of individuals, playback of a video such as a cinema trailer, a game, etc.), etc. In telecommunication terminals, in particular mobile terminals, a recreation of sound with stereophonic headphones is envisaged.
Among the audio-spatialization or 3-D sound technologies that process the audio signal, in particular to simulate psycho-acoustic and acoustic effects, some aim to generate signals to be played over loudspeakers, in particular over loudspeakers that are distant from the listener, or over earpieces, in order to give the listener the auditory illusion of sound sources placed at particular respective positions around him. This is referred to as the creation of virtual sound images and sources. Various techniques are applied to the processing of a 3-D sound intended to be played over headphones comprising two earpieces, such as left and right earphones. These techniques aim to reconstruct, in the ears of a listener, the sound field such that his eardrums perceive a sound field that is practically identical to the field that real sources in 3-D space would have induced. These spatialized sound signals may be obtained in two ways:
Binaural synthesis is an effective technique for positioning sound sources in space.
Binaural synthesis is based on the use of what are called “binaural” filters, which reproduce the functions of acoustic transfer between the sound source and the ears of the listener. These filters serve to simulate auditory localization cues, which cues allow a listener to localize sound sources in a real-life listening situation. These filters take into account all the acoustic effects (in particular diffraction by the head and reflections from the outer ear and the top of the torso) that modify the acoustic wave on its path between the source and the ears of the listener. These effects vary greatly with the position of the sound source (mainly with its direction) and these variations allow the listener to localize the source in space. Specifically, these variations define a sort of acoustic code that gives the position of the source. The auditory system of an individual learns to interpret this code in order to localize sound sources. Binaural filters that optimally reproduce the acoustic code that the body of the listener naturally produces, by taking into account the individual particularities of his morphology, are therefore required to achieve high-quality binaural synthesis. This personalization is required to provide a satisfactory and convincing sound quality (quality of the spatialization and of the sound immersion in particular). When these conditions are not met, a decrease in the performance of the binaural rendering is observed: this decrease in performance in particular results in intracranial perception of sources and in front/behind confusions (sources located in front are perceived to be behind and vice versa).
These binaural filters represent acoustic transfer functions, also called HRTFs (acronym of head-related transfer functions), that model the transformations, caused by the torso, the head and the outer ear of the listener, in the signal originating from a sound source. With each sound-source position is associated a pair of individual acoustic transfer functions (an individual acoustic transfer function for the right ear and an individual acoustic transfer function for the left ear). In addition, the individual acoustic transfer functions bear the acoustic imprint of the morphology of the individual on whom they were measured. The individual acoustic transfer functions therefore not only depend on the direction of the sound, but also on the individual. They are thus dependent on the frequency f, on the position (θ, φ) of the sound source (where the angle θ represents the azimuth and the angle φ the elevation), on the (left or right) ear and on the individual.
Conventionally, individual acoustic transfer functions are obtained by measurement. Initially, a selection of directions, covering more or less finely the whole space surrounding the listener, is decided upon. For each direction, the left and right individual acoustic transfer functions are measured by means of microphones inserted into the entrance of the ear canal of a subject. The measurement must be carried out in an anechoic chamber. In the end, if measurements are taken for M directions, for a given subject, a database of 2M acoustic transfer functions representing each position in space for each ear is obtained. The experimental measurement of individual acoustic transfer functions directly on an individual is, at the present time, the most reliable way of obtaining high-quality binaural filters that are actually personalized (take into account individual particularities and the morphology of the individual).
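By way of illustration only, the size of such a measured database can be sketched as follows. The measurement grid and the function name `build_measurement_index` are hypothetical and are not taken from the description; the sketch merely shows that M directions and two ears give 2M entries.

```python
from itertools import product

def build_measurement_index(directions, ears=("left", "right")):
    """Return one database key per (azimuth, elevation, ear) combination."""
    return [(azimuth, elevation, ear)
            for (azimuth, elevation), ear in product(directions, ears)]

# Hypothetical grid: 4 azimuths x 2 elevations, i.e. M = 8 directions.
directions = [(az, el) for az in (0, 90, 180, 270) for el in (0, 30)]
keys = build_measurement_index(directions)
print(len(directions), "directions ->", len(keys), "transfer functions")
```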
However, the measurement of these individual acoustic transfer functions presents a few difficulties. It requires specific and expensive equipment (typically an anechoic chamber, a microphone, and a mechanical device for positioning sources). This operation is time-consuming because it is in particular necessary to measure transfer functions for many directions in order to uniformly cover the whole of a 3-D sphere surrounding the listener. Therefore, the measurement procedure is hard work for the subject, in particular because of the constraints imposed on the subject by the measuring system and the duration of the test. This measurement of individual acoustic transfer functions becomes very difficult, or even impossible, in the context of applications of binaural synthesis intended for the general public.
Solutions requiring a minimum of measurements of individual acoustic transfer functions and making greater use of modelling techniques have thus been researched. In particular, mathematical models of individual acoustic transfer functions consisting of a function F allowing an individual acoustic transfer function (Y) to be expressed on the basis of a set of given a priori parameters (X), such that Y=F(X), have been researched. Often, there are two essential elements at play: the development of the mathematical model (function F), and the specification of the set of parameters to be applied as input of the model. The set of parameters consists, for example, in a 3-D mesh of the individual morphology, in particular of the outer ears. The acquisition of a precise mesh remains, at the present time, a critical point.
More simply, databases of acoustic transfer functions have been constructed. These functions are measured on a sample group of individuals and allow a pair of binaural filters to be selected from the database using various techniques, such as a comparison between the morphology of the listener and the morphologies of the sample group of individuals that served to generate the database, or testing of various pairs of binaural filters of the database by the listener. The method of selection of a pair of binaural filters from a database lacks reliability and robustness and may prove to be quite tedious for the user to use.
Embodiments described herein propose an alternative solution that provides improvements with respect to techniques such as those described above.
In one aspect, a method is provided for modelling sets of acoustic transfer functions specific to an individual according to a multiplicity of directions in space, wherein a set of acoustic transfer functions that are specific to an individual in a given direction of the multiplicity of directions is determined depending on the result of a statistical analysis of a plurality of distinct stimuli emitted in the direction of the individual, a stimulus being dependent on at least one set of predetermined acoustic transfer functions that are associated with the given direction, and on responses received from the individual to each emitted stimulus.
Thus, such embodiments are more reliable and more robust than a simple selection of a set of acoustic transfer functions from a database and mitigate the drawback of the critical acquisition of the 3-D mesh of the individual morphology used by conventional numerical modelling.
In some embodiments, the modelling method includes a statistical analysis by direction in space of the emitted stimuli and of the received responses for the given direction of the multiplicity of directions in space.
Thus, the statistical analysis being implemented by the modelling method, the modelling is more rapid and therefore less tedious for the individual.
In some embodiments, the modelling method includes steps that are carried out for the given direction of the multiplicity of directions in space, in which steps:
Thus, the emission of the stimuli and the reception of the responses thereto being implemented by the modelling method, the time lags between the generation of the stimuli and their emission, and the reception of responses and the statistical analysis, respectively, are decreased.
In some embodiments, for the given direction, a plurality of stimuli are generated depending on at least one set of predetermined acoustic transfer functions that are associated with the given direction.
Thus, the generation of the stimuli being implemented by the modelling method, the time lag between the generation of the stimuli and their emission is decreased.
In some embodiments, a stimulus results from the addition of noise to a set of average acoustic transfer functions that are associated with the given direction, said average acoustic transfer functions being calculated depending on sets of acoustic transfer functions, which acoustic transfer functions are recorded in a database of acoustic transfer functions and associated with the given direction.
Thus, the generation of stimuli being based on a set of acoustic transfer functions, it allows the modelling of the acoustic transfer function specific to an individual to be simplified by basing it on the same acoustic transfer function used to generate the stimuli.
In some embodiments, the modelling method includes steps in which:
Thus, the divergence between the set of acoustic transfer functions serving for the modelling and the set of acoustic transfer functions that is specific to the individual is smaller, because average acoustic transfer functions are used rather than an arbitrarily selected acoustic transfer function, which decreases modelling errors. Therefore, the modelling is less complex and takes less time because it compensates for a smaller divergence.
In some embodiments, the statistical analysis uses the psychophysical technique of reverse correlation.
Thus, the modelling of the set of acoustic transfer functions that is specific to the individual is based on perception, decreasing the risk of intracranial perception and directional confusions.
In some embodiments, the various steps of the method are implemented by a software package or computer program, this software package comprising software instructions intended to be executed by a data processor of a device forming part of a terminal, such as a communication terminal, and being designed to command the execution of the various steps of this method.
In another aspect, a program is provided, the program comprising program-code instructions for executing the steps of the modelling method according to any one of the preceding claims when said program is executed by a processor.
This program may use any programming language and take the form of source code, object code or code intermediate between source code and object code such as code in a partially compiled form or in any other desirable form.
In another aspect, a modeller is provided of sets of acoustic transfer functions specific to an individual according to a multiplicity of directions in space, including a generator of sets of acoustic transfer functions specific to an individual in a given direction of the multiplicity of directions on the basis of the result of a statistical analysis of a plurality of distinct stimuli emitted in the direction of the individual, a stimulus being dependent on at least one set of predetermined acoustic transfer functions that are associated with the given direction, and of responses received from the individual to each emitted stimulus.
In some embodiments, the modeller includes a statistical analyser of the emitted stimuli and of the received responses by given direction of the multiplicity of directions.
In some embodiments, the modeller includes:
In another aspect, a three-dimensional sound card is provided, including:
In another aspect, a system for reproducing three-dimensional sound is provided, including:
In some embodiments, the system includes headphones in which the two loudspeakers of the set of loudspeakers are placed such that each of the two loudspeakers is placed on one of the two ears of the individual when the headphones are placed on his head, the set of acoustic transfer functions being a corresponding pair of transfer functions.
The features and advantages of the embodiments described herein will become more clearly apparent on reading the description, which is given by way of example, and the figures referred to thereby, which show:
By direction in space associated with an acoustic transfer function what is in particular meant is a direction, relative to the user, in which a virtual source is created by means of the modelling.
In particular, the modelling method TFI_MD includes a statistical analysis ST_NLZ by direction di in space of the emitted stimuli (s1,di . . . sN,di) and of the received responses adiU.
In particular, the modelling method TFI_MD includes the following steps, which are carried out for the given direction di of the multiplicity of directions in space:
In particular, for the given direction di, a plurality of stimuli are generated S_GN depending on at least one set of predetermined acoustic transfer functions that are associated with the given direction.
In particular, a stimulus s1,dij . . . sN,dij results from the addition of noise nj to a set of average acoustic transfer functions avg{tf1,dik}k . . . avg{tfN,dik}k that are associated with the given direction di and that are calculated depending on sets of acoustic transfer functions that are recorded in a database tf_bdd of acoustic transfer functions and that are associated with the given direction.
The addition of noise to generate the stimuli allows the variation space to be explored without a priori hypotheses as to the properties of the spectral profile (of the set of individual acoustic transfer functions) that are responsible for the localization in a given direction (for example the frontal direction).
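The generation of stimuli by addition of noise may, purely as an illustrative sketch, look as follows. Several points are assumptions that the description does not specify: each transfer function is represented as a list of magnitudes in dB per frequency band, the noise is Gaussian, and the same noise profile nj is added to every channel of the set. The name `make_stimuli` and its parameters are hypothetical.

```python
import random

def make_stimuli(neutral_pair, num_stimuli, noise_std_db=3.0, seed=0):
    """Return (stimuli, noises): stimulus j is the neutral pair plus noise n_j."""
    rng = random.Random(seed)
    num_bands = len(next(iter(neutral_pair.values())))
    stimuli, noises = [], []
    for _ in range(num_stimuli):
        # Draw one noise profile n_j, distinct for each stimulus j.
        noise = [rng.gauss(0.0, noise_std_db) for _ in range(num_bands)]
        # Assumption: the same n_j is added to every channel of the set.
        stimulus = {ear: [value + n for value, n in zip(profile, noise)]
                    for ear, profile in neutral_pair.items()}
        noises.append(noise)
        stimuli.append(stimulus)
    return stimuli, noises

# Hypothetical neutral pair: magnitudes in dB for four frequency bands.
neutral_pair = {"left": [0.0, 2.0, -4.0, 1.0], "right": [0.0, 1.5, -3.5, 0.5]}
stimuli, noises = make_stimuli(neutral_pair, num_stimuli=5)
print(len(stimuli), "stimuli generated")
```

Keeping the noise profiles alongside the stimuli is a deliberate choice in this sketch, since a subsequent statistical analysis needs to relate each response to the noise that produced the corresponding stimulus.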
In particular, the modelling method TFI_MD includes the following steps in which
By set of average acoustic transfer functions what is meant is one average acoustic transfer function per reproduction channel, in particular in the case of binaural synthesis: an average acoustic transfer function for the right ear and an average acoustic transfer function for the left ear of the user U.
In particular, the statistical analysis ST_NLZ uses the psychophysical technique of reverse correlation. It is based on the high-level observation of perceptive processes and employs a testing phase during which the modelling method TFI_MD subjects the individual to a set of stimuli that are obtained by adding noise to a neutral stimulus (for example an average of acoustic transfer functions) and observes the responses of the individual U to these various stimuli. By analysing the statistical relationships between the stimuli and the responses, the modelling method TFI_MD identifies TFI_DT the perceptive filters, in the present case the individual acoustic transfer functions, associated with the studied perceptive process, i.e. the properties of the stimuli that define a given perceptive response.
Thus, the modelling method is based on perception to identify the acoustic transfer functions specific to an individual.
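A minimal sketch of one common form of reverse correlation is given below, on the assumption that each response is binary (stimulus perceived as correctly spatialized or not). The kernel is estimated as the mean noise of the accepted stimuli minus the mean noise of the rejected stimuli. This particular estimator and the name `classification_kernel` are illustrative assumptions; the description does not prescribe them.

```python
def classification_kernel(noises, responses):
    """Mean noise of accepted stimuli minus mean noise of rejected stimuli."""
    accepted = [n for n, r in zip(noises, responses) if r]
    rejected = [n for n, r in zip(noises, responses) if not r]
    if not accepted or not rejected:
        raise ValueError("need at least one accepted and one rejected stimulus")

    def mean(vectors):
        # Average band by band across the given noise profiles.
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    return [a - b for a, b in zip(mean(accepted), mean(rejected))]

# Hypothetical noise profiles (two frequency bands) and listener responses.
noises = [[1.0, -1.0], [3.0, 1.0], [-2.0, 0.0], [-4.0, 2.0]]
responses = [True, True, False, False]
kernel = classification_kernel(noises, responses)
print(kernel)  # [5.0, -1.0]
```

In this toy example the listener tends to accept stimuli with a boost in the first band, so the kernel is strongly positive there.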
The modelling of frontal sound sources (direction of 0° azimuth and 0° elevation) is particularly critical. The use of generic binaural filters in such modelling engenders a spatialization of sound sources that is often disappointing: the listener tends to locate the source above, or even inside his head.
Using the modelling method TFI_MD, a pair of neutral binaural filters (i.e. a set of what are called neutral acoustic transfer functions) is calculated by averaging AVG a plurality of sets of acoustic transfer functions HRTF, which functions are measured in the frontal direction for a large selection of individuals forming a sample group (said functions optionally being pre-recorded in a database of sets of acoustic transfer functions tf_bdd).
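The averaging AVG may be sketched as follows, assuming, for illustration only, that each measured function is stored as a list of magnitudes in dB per frequency band and that the average is taken band by band for each ear. The averaging domain is not specified in the description, and the name `neutral_pair_from` is hypothetical.

```python
def neutral_pair_from(measured_pairs):
    """Average, ear by ear and band by band, a list of measured pairs."""
    ears = measured_pairs[0].keys()
    count = len(measured_pairs)
    return {ear: [sum(values) / count
                  for values in zip(*(pair[ear] for pair in measured_pairs))]
            for ear in ears}

# Hypothetical frontal-direction measurements for two individuals (dB, two bands).
measured = [
    {"left": [0.0, 4.0], "right": [2.0, 2.0]},
    {"left": [2.0, 0.0], "right": [4.0, -2.0]},
]
neutral = neutral_pair_from(measured)
print(neutral)  # {'left': [1.0, 2.0], 'right': [3.0, 0.0]}
```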
A set of spatialized stimuli synthesized S_GN with binaural filters obtained by adding noise nj to the pair of neutral filters is played S_TR to the listener, i.e. to the individual U for whom the modelling method TFI_MD determines the set of personalized acoustic transfer functions. The addition of noise affects the spectral profile.
For each emitted stimulus, the listener U indicates whether he perceives it to be correctly spatialized (i.e. in the direction di that the modelling TFI_MD is attempting to reproduce, in the present case the frontal direction and outside his head) or not. This indication of the listener U forms the response a received A_REC during the modelling TFI_MD.
The analysis ST_NLZ of the statistical relationships between the stimuli and the responses of the listener makes it possible to determine TFI_DT the spectral profile suited to the listener U and guaranteeing the correct reproduction of sounds in the modelled direction di, in the present case the frontal direction.
This modelling method TFI_MD may be applied to any other direction.
One particular embodiment of the modelling method is a program comprising program-code instructions for executing the steps of the modelling method when said program is executed by a processor.
The modeller 100 of sets of acoustic transfer functions specific to an individual according to a multiplicity of directions in space includes a generator 1004 of sets of acoustic transfer functions specific to an individual in a given direction of the multiplicity of directions on the basis of the result of a statistical analysis of a plurality of distinct stimuli emitted in the direction of the individual, a stimulus being dependent on at least one set of predetermined acoustic transfer functions that are associated with the given direction, and of responses received from the individual to each emitted stimulus.
In particular, the modeller 100 includes a statistical analyser 1003 of the emitted stimuli and of the received responses by given direction of the multiplicity of directions.
In particular, the modeller 100 includes:
In one particular embodiment, a three-dimensional sound card 10 includes:
In particular, the modeller 100 includes a stimulus generator 1000 that delivers, for a given direction di, a plurality (j) of sets of stimuli (s1,dij . . . sN,dij). The generator 1000 in particular adds, for each set of stimuli (s1,dij . . . sN,dij), noise nj to a given set of predetermined acoustic transfer functions (tf1,dik′ . . . tfN,dik′). The noise nj applied to the set of predetermined acoustic transfer functions (tf1,dik′ . . . tfN,dik′) to obtain the set of stimuli (s1,dij . . . sN,dij) is distinct from the noise nj′ applied to the same set of predetermined acoustic transfer functions (tf1,dik′ . . . tfN,dik′) to obtain the set of stimuli (s1,dij′ . . . sN,dij′).
The predetermined set of acoustic transfer functions that is used to generate the stimuli is in particular a set of what are called neutral acoustic transfer functions, namely it does not reflect a specific morphology. Thus, the statistical analysis is not biased by a particular morphological model and the determination of the individual acoustic transfer functions allows a better approximation of the actual acoustic transfer functions of the individual.
In particular, such a so-called neutral set of acoustic transfer functions is obtained by averaging a plurality of sets of acoustic transfer functions, which functions are recorded in a database of acoustic transfer functions. For example, the sets of acoustic transfer functions that are used to calculate this so-called neutral set of acoustic transfer functions are selected randomly from the database of acoustic transfer functions, or are selected depending on one or more morphological parameters neighbouring those of the individual, or consist of all the sets of acoustic transfer functions that are recorded in the database of acoustic transfer functions.
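The three selection options mentioned above (random, morphological neighbours, or the whole database) may be sketched as follows. The record layout, the morphological parameter `head_width_cm`, the squared-distance criterion and the name `select_sets` are all hypothetical and serve only to illustrate the choice.

```python
import random

def select_sets(database, strategy="all", count=None, listener=None, seed=0):
    """Choose which database entries are averaged into the neutral set."""
    if strategy == "all":
        return list(database)
    if strategy == "random":
        return random.Random(seed).sample(database, count)
    if strategy == "nearest":
        def distance(entry):
            # Squared distance over the listener's morphological parameters.
            return sum((entry["morphology"][key] - value) ** 2
                       for key, value in listener.items())
        return sorted(database, key=distance)[:count]
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical database entries with a single morphological parameter.
database = [
    {"subject": "s1", "morphology": {"head_width_cm": 14.0}},
    {"subject": "s2", "morphology": {"head_width_cm": 15.0}},
    {"subject": "s3", "morphology": {"head_width_cm": 16.5}},
]
nearest = select_sets(database, "nearest", count=2, listener={"head_width_cm": 15.2})
print([entry["subject"] for entry in nearest])  # ['s2', 's1']
```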
Most often a set of acoustic transfer functions is a pair of acoustic transfer functions (for example in the particular case of binaural stimulation) that is composed of the acoustic transfer function corresponding to the right ear and of the acoustic transfer function corresponding to the left ear of an individual.
The emitter 1001 emits, for at least one given direction di, a plurality of sets of stimuli (s1,dij . . . sN,dij) in the direction of the individual U for whom the modeller 100 determines a set of acoustic transfer functions in a given direction di. In particular, the emitter 1001 transmits these sets of stimuli, for example via an output assembly 102 of a 3-D sound card 10 and/or of a terminal 1 including the modeller 100, to a set of loudspeakers (21 . . . 2N) that play the stimuli to the individual U. Each stimulus sn,dij of a set of stimuli is intended for a specific loudspeaker 2n of the set of loudspeakers (21 . . . 2N).
To each set of stimuli (s1,dij . . . sN,dij), the individual U reacts by transmitting a response a in particular by means of an interface 12 of the terminal 1 (by input, by voice command, etc.). The receiver 1002 receives the response ajU to the set j of stimuli of the individual U.
For a given direction di, the analyser 1003 carries out a statistical analysis on the sets of emitted stimuli (s1,dij . . . sN,dij) and the corresponding responses ajU. The generator 1004 then determines the set (tf1,diU . . . tfN,diU) of acoustic transfer functions that is specific to this individual U for the given direction di depending on the result rdiU delivered by the analyser 1003.
The operation is optionally repeated for one or more other distinct directions di′.
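How the generator 1004 derives the individual set from the analysis result is not detailed in the description. One possible sketch, given strictly as an assumption, is to add the analysis result (here a per-band kernel in dB) to the neutral set, scaled by a gain. The name `personalize` and the `gain` parameter are hypothetical.

```python
def personalize(neutral_pair, kernel, gain=1.0):
    """Shift each channel of the neutral pair by the scaled analysis kernel."""
    return {ear: [value + gain * k for value, k in zip(profile, kernel)]
            for ear, profile in neutral_pair.items()}

# Hypothetical neutral pair and analysis result for one direction (dB, two bands).
neutral_pair = {"left": [1.0, 2.0], "right": [3.0, 0.0]}
kernel = [5.0, -1.0]
individual_pair = personalize(neutral_pair, kernel, gain=0.5)
print(individual_pair)  # {'left': [3.5, 1.5], 'right': [5.5, -0.5]}
```

Repeating the operation for another direction would, in this sketch, simply mean calling `personalize` again with the neutral pair and the kernel obtained for that direction.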
Thus, the terminal 1, which includes a reader 11 of a sound signal as, may play a 3-D sound signal in the direction of the individual U. Specifically, the terminal 1 includes a filter 101 the filtering parameters of which are formed, for at least one direction di, by the transfer-function set delivered by the modeller 100. The filter 101 then converts the monophonic sound signal as into a set of sound signals that are played to the individual U by means of the set of loudspeakers.
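The conversion carried out by the filter 101 may be illustrated by the following sketch, which assumes that the transfer-function set is available in the time domain as one impulse response per loudspeaker and that the filtering is a direct convolution. Both points are assumptions; the names `convolve` and `spatialize` and the numerical values are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form discrete convolution of two finite sequences."""
    output = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(impulse_response):
            output[i + j] += sample * tap
    return output

def spatialize(mono_signal, impulse_responses):
    """Filter one monophonic signal into one output signal per loudspeaker."""
    return {channel: convolve(mono_signal, response)
            for channel, response in impulse_responses.items()}

# Hypothetical monophonic signal and very short impulse responses.
mono_signal = [1.0, 0.0, -1.0]
impulse_responses = {"left": [1.0, 0.5], "right": [0.0, 1.0]}
outputs = spatialize(mono_signal, impulse_responses)
print(outputs["left"])   # [1.0, 0.5, -1.0, -0.5]
print(outputs["right"])  # [0.0, 1.0, 0.0, -1.0]
```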
The system for reproducing three-dimensional sound includes:
In particular, the reproducing system includes headphones 20 in which the two loudspeakers 21 and 22 of the set of loudspeakers are placed such that each of the two loudspeakers is placed on one of the two ears of the individual U when the headphones 20 are placed on his head, the set of acoustic transfer functions being a corresponding pair of transfer functions.
Thus, the modelling does not require specific equipment. It may be implemented with a simple set of headphones.
The embodiments described herein also relate to a data medium. The data medium may be any entity or device capable of storing the program. For example, the medium may include a storing means, such as a ROM, for example a CD-ROM or a microelectronic circuit ROM or even a magnetic recording means, for example a floppy disk or a hard disk.
Furthermore, the data medium may be a transmissible medium such as an optical or electrical signal that may be transmitted via an optical or electrical cable, by radio or by other means. The program may in particular be downloaded from a network, the Internet in particular.
Alternatively, the data medium may be an integrated circuit in which the program is incorporated, the circuit being suitable for executing or for being used in the execution of the method in question.
In another implementation, the embodiments described herein are implemented by means of software and/or hardware components. In this light, the term module may correspond either to a software component or to a hardware component. A software component corresponds to one or more computer programs, one or more sub-programs of a program, or more generally to any element of a program or of a software package able to implement a function or a set of functions according to the above description. A hardware component corresponds to any element of a hardware assembly able to implement a function or a set of functions.
In the foregoing description, specific details are given to provide a thorough understanding of the examples. However, it will be understood by one of ordinary skill in the art that the examples may be practiced without these specific details. Certain features that are described separately herein can be combined in a single embodiment, and the features described with reference to a given embodiment also can be implemented in multiple embodiments separately or in any suitable subcombination.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Number | Date | Country | Kind |
---|---|---|---|
1760647 | Nov 2017 | FR | national |
Number | Name | Date | Kind |
---|---|---|---|
5742689 | Tucker | Apr 1998 | A |
6181800 | Lambrecht | Jan 2001 | B1 |
20060045294 | Smyth | Mar 2006 | A1 |
20080130906 | Goldstein et al. | Jun 2008 | A1 |
20170295445 | Christoph | Oct 2017 | A1 |
20180206058 | Murata | Jul 2018 | A1 |
20180310115 | Romigh | Oct 2018 | A1 |
Number | Date | Country |
---|---|---|
103237287 | Mar 2015 | CN |
Entry |
---|
French Search Report dated Sep. 25, 2018 for French Application No. 1760647. |
Hofman, et al., “Bayesian reconstruction of sound localization cues from responses to random spectra”, Biological Cybernetics, 2002, vol. 86, No. 4, pp. 305-316. |
Nicol, et al., “How to make immersive audio available for mass-market listening”, EBU Technical Review, 2016, pp. 1-18. |
Number | Date | Country | |
---|---|---|---|
20190149939 A1 | May 2019 | US |