The present invention relates to the modeling of individual transfer functions called HRTFs (Head-Related Transfer Functions), relating to the hearing of an individual in three-dimensional space.
The invention is particularly applicable to telecommunication services that broadcast spatialized sound (for example, an audioconference between several listeners, or the broadcast of a cinema trailer). On telecommunication terminals, mobile terminals in particular, sound is expected to be rendered over a stereophonic headset. The most effective technique for positioning sound sources in space is then binaural synthesis.
Binaural synthesis relies on the use of filters, called “binaural filters”, which reproduce the acoustic transfer functions between the sound source and the ears of the listener. These filters simulate the auditory localization cues, which enable a listener to locate sound sources in a real-life listening situation. These filters take into account all the acoustic phenomena (notably diffraction by the head and reflections on the auricle and the upper torso) that modify the acoustic wave on its path from the source to the ears of the listener. These phenomena vary strongly with the position of the sound source (mainly with its direction), and these variations enable the listener to locate the source in space. In practice, these variations amount to a kind of acoustic encoding of the position of the source. Through learning, the auditory system of an individual knows how to interpret this encoding to locate sound sources. However, the acoustic diffraction and reflection phenomena depend just as strongly on the morphology of the individual. A quality binaural synthesis therefore relies on binaural filters which best reproduce the acoustic encoding that the body of the listener naturally produces, taking into account the individual specifics of his morphology. When these conditions are not satisfied, the binaural rendering degrades, notably through an intracranial perception of the sources and front/rear confusions: sources located in front are perceived as being behind, and vice versa.
Among 3D sound or sound spatialization technologies, which process the audio signal notably to simulate acoustic and psychoacoustic phenomena, some aim to generate signals to be played over loudspeakers or headphones in order to give the listener the auditory illusion of sound sources placed at particular positions around him. This introduces the concept of creating virtual sound sources and images.
The binaural techniques described hereinabove are applied to the processing of 3D sound intended for playback on a headset with two (left and right) earpieces. These techniques aim to reconstruct the sound field at the ears of a listener, such that his eardrums perceive a sound field practically identical to the one that the actual sources in 3D space would have induced. The binaural techniques are therefore based on a pair of binaural signals which respectively feed the two earpieces of the headset. These binaural signals can be obtained in two ways:
The binaural techniques that use binaural filters define the field of binaural synthesis, which is an advantageous context of the present invention. Binaural synthesis relies on binaural filters which model the propagation of the acoustic wave between the source and the two ears of the listener. These filters represent acoustic transfer functions called HRTFs, which model the transformations that the torso, the head and the auricle of the listener apply to the signal originating from a sound source. Each sound source position has an associated pair of HRTFs (one HRTF for the right ear, one HRTF for the left ear). In addition, the HRTFs carry the acoustic imprint of the morphology of the individual on whom they have been measured.
The HRTFs therefore depend not only on the direction of the sound, but also on the individual. They are thus a function of the frequency f, of the position (θ,φ) of the sound source (where the angle θ represents the azimuth and the angle φ represents the elevation), of the ear (left or right) and of the individual.
Conventionally, the HRTFs are obtained by measurement. A set of directions covering, more or less finely, the whole space surrounding the listener is first fixed. For each direction, the left and right HRTFs are measured using microphones inserted at the entrance of the ear canal of a subject. The measurement must be performed in an anechoic room (or “dead room”). Ultimately, if M directions are measured, a database of 2M acoustic transfer functions, covering each measured position of space for each ear, is obtained for a given subject.
In the advantageous context of binaural synthesis, the spatialization effect relies on the use of HRTFs which, for optimum performance, must take into account not only the acoustic propagation phenomena between the source and the ears, but also the individual specifics of the morphology of the listener. Experimental measurement of the HRTFs directly on an individual is, at present, the most reliable way to obtain quality, truly individualized binaural filters (that is, filters taking into account the individual specifics of the morphology of the individual). It will be recalled that the aim is to measure the transfer function between a source located at a given position (θ1, φ1) and the two ears of the subject by means of microphones placed at the entrance of the ear canals of this person.
However, measuring these HRTFs presents some difficulties. It requires specific and costly equipment (typically an anechoic room, microphones and a mechanical device for positioning the sources). The operation is also lengthy, notably because the transfer functions must be measured for a large number of directions in order to cover uniformly the whole 3D sphere surrounding the listener.
Measuring the HRTFs becomes very difficult, if not impossible, in binaural synthesis applications intended for the consumer market. HRTF measurement in fact poses at least three main problems:
Solutions that require a minimum of HRTF measurements and rely more on modeling techniques have therefore been sought. In particular, mathematical models of HRTFs have been studied that consist of a function F enabling an HRTF (Y) to be expressed from a set of parameters (X) given a priori, such that Y=F(X). Often, two key elements are involved:
There now follows a description of the prior art known to the inventors concerning the HRTF models implemented to date, focusing on the choice of the model input parameters.
The document US-2003/138107 describes a statistical model of HRTFs based on morphological data. This approach applies a statistical analysis to a database including HRTFs and morphological data. Principal component analysis is first applied separately to the HRTFs and to the morphological data, which makes it possible to describe all of the data with a restricted number of components. Then, a linear regression is performed between the principal components of the HRTFs and those of the morphological data. A statistical model linking the morphological data to the HRTFs is thus established. It is then sufficient to measure the morphological parameters of any individual to predict his HRTFs from the statistical model obtained.
However, this document also provides for enriching the morphological data of an individual, at the model input, with a few HRTFs measured on this individual in specific directions.
Thus, even if the number of measurements is limited in this document, the HRTF measurement protocol must still be followed: in particular, an anechoic room must be provided for the measurements, and the sources must be positioned strictly at very precise distances from the microphones attached to the ears of the individual.
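The two-stage approach attributed to US-2003/138107 (principal component analysis of each data set, then a linear regression between the component scores) can be sketched as follows. This is a minimal sketch under stated assumptions, not that document's exact procedure: PCA is computed by singular value decomposition, the regression is ordinary least squares with an intercept, and the array shapes (one row per subject) and the numbers of retained components are hypothetical choices.

```python
import numpy as np

def pca(data, k):
    """Return (mean, components) keeping the k leading principal components.

    data: array of shape (n_subjects, n_features).
    """
    mean = data.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:k]

def fit_statistical_model(morph, hrtf, k_morph, k_hrtf):
    """Fit a linear map from morphological PCA scores to HRTF PCA scores."""
    m_mean, m_comp = pca(morph, k_morph)
    h_mean, h_comp = pca(hrtf, k_hrtf)
    m_scores = (morph - m_mean) @ m_comp.T          # (n_subjects, k_morph)
    h_scores = (hrtf - h_mean) @ h_comp.T           # (n_subjects, k_hrtf)
    # Ordinary least squares with an extra column of ones for the intercept.
    design = np.hstack([m_scores, np.ones((len(m_scores), 1))])
    coef, *_ = np.linalg.lstsq(design, h_scores, rcond=None)
    return m_mean, m_comp, h_mean, h_comp, coef

def predict_hrtf(model, morph_new):
    """Predict HRTF coefficients for one or more new morphology vectors."""
    m_mean, m_comp, h_mean, h_comp, coef = model
    scores = (np.atleast_2d(morph_new) - m_mean) @ m_comp.T
    design = np.hstack([scores, np.ones((len(scores), 1))])
    # Map back from HRTF component scores to HRTF coefficients.
    return (design @ coef) @ h_comp + h_mean
```

Reducing both sides to a few components before the regression keeps the number of regression coefficients small, which matters when only a few dozen subjects are available.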
The implementation of the present invention overcomes such constraints.
To this end, the present invention provides a method of modeling HRTFs specific to an individual, in which there are provided:
Thus, the present invention intends to exploit the advantages of the technique described in the document FR-2 851 877, with which it is possible to model, at least roughly, the HRTFs of an individual for whom an appropriate set of morphological parameters has been measured. This typically involves a finite element modeling, which amounts to estimating, according to their direction of origin, the disturbances that acoustic waves undergo when they encounter an obstacle formed by the bust of the individual. In particular, FR-2 851 877 proposes measuring general dimensions of the head and torso of an individual, and modeling at least the head and torso of the individual by simple geometrical shapes (for example, ellipsoids for the head and the torso and a cylinder for the neck), the dimensions of these simple shapes corresponding to the dimensions measured on the individual. The finite element modeling is then applied to these simple shapes. The modeled HRTFs obtained are satisfactory in the sense that they can at least be differentiated from one individual to another, in particular at low and medium acoustic frequencies. For higher frequencies, FR-2 851 877 also proposes identifying at least the position of an ear on the head of the individual, and preferably the shape of its auricle as well. However, the quality of the HRTFs modeled in this way still needed to be improved, and to this end the present invention proposes applying a corrective model, advantageously implementing an artificial neural network, in particular in the model construction step d) of the above method.
When a comparison and learning phase is implemented to construct the model, particularly if an artificial neural network is used, it is preferable for the morphological parameter measurement conditions to be roughly reproducible at least between the model construction step and the current step conducted on any individual. It is also preferable for the simplified geometrical model, and the finite element computation model, to be reproducible.
To this end, the procedure for measuring morphological parameters described in FR-2 851 877 can be reused here. Typically, an installation can be provided for estimating HRTFs specific to an individual, comprising:
The present invention also covers such an installation.
Advantageously, the installation can be equipped with means for photographing at least the bust of an individual from at least two different angles (for example, front and profile), in order to deduce the general dimensions of his head, torso or other body parts. To this end, in a preferred embodiment, the booth can include a measurement standard, so that the photographs show the measurement standard together with the bust of the individual. Shape recognition means, for example, can then be used to measure the morphological parameters involved in the modeling.
Thus, this installation makes it possible to implement at least the current step of the method according to the invention.
It is then sufficient, in the current step, to supply:
In a general embodiment, the HRTFs can be modeled by finite elements in all of the multiplicity of directions of space, and the model can then be refined by comparison/learning between all these modeled HRTFs and all the measured HRTFs of the first database.
As a variant, it is possible to proceed as follows.
It has been shown that finite element modeling of the HRTFs is more effective in certain particular directions, in that, for these directions, the HRTFs modeled by finite elements are closer to the measured HRTFs than for the other directions, regardless of the individual. Thus, on completion of the finite element modeling, it is possible to retain only the best modeled HRTFs, corresponding to preferred directions, and to carry out the comparison only in these preferred directions.
On the other hand, the learning will be conducted over all the multiplicity of directions of the space.
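The selection of preferred directions described above (directions where finite element HRTFs are closest to measured HRTFs, whatever the individual) can be sketched as a ranking of directions by their error averaged over individuals. This is a minimal sketch assuming that both data sets are nested lists indexed [individual][direction][frequency coefficient] and that the closeness criterion is a mean squared error; the text does not specify the criterion, and the function name is hypothetical.

```python
def preferred_directions(modeled, measured, k):
    """Return the indices of the k directions where modeled HRTFs best match measured ones.

    modeled, measured: nested lists indexed [individual][direction][coefficient].
    The error criterion (mean squared error averaged over individuals) is an assumption.
    """
    n_ind = len(modeled)
    n_dir = len(modeled[0])
    errors = []
    for d in range(n_dir):
        total = 0.0
        for j in range(n_ind):
            a, b = modeled[j][d], measured[j][d]
            total += sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
        errors.append(total / n_ind)    # mean error of direction d over all individuals
    # Sort directions by increasing mean error and keep the k best.
    return sorted(range(n_dir), key=lambda d: errors[d])[:k]
```

Averaging over individuals reflects the observation that these directions are well modeled "regardless of the individual"; the learning itself would still use all directions.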
Thus, in more general terms, to apply the model construction step, preferred directions of space are selected, based on said morphological parameters of the second database and by comparison with the measured HRTFs of the first database, such that in these preferred directions the finite element modeling supplies modeled HRTFs close to the measured HRTFs, and
As a complement or a variant, it can be assumed that not all directions are equivalent in terms of individualization, and that some preferred directions are more “individual” than others, in that the HRTFs in these directions carry more individual information than the others. For example, directions in which the contribution of the auricle is more marked, or even predominant, are potentially strongly individual directions. It then seems relevant to focus the finite element modeling on these directions, which provides another criterion, for example a complementary one, for selecting the preferred directions of the modeled HRTFs.
The present invention also covers a computer program product, designed to be stored in a memory of a processing unit or on a removable medium designed to cooperate with a drive of said processing unit, or intended to be transmitted from a server to said processing unit. The program comprises instructions in computer code form for constructing a model based on learning, advantageously implementing an artificial neural network, capable of giving the HRTFs of an individual for a multiplicity of directions from a set of measurements of morphological parameters performed on this individual. From a first database including a plurality of HRTFs in a multiplicity of directions of space for a plurality of individuals, and a second database including morphological parameters of these individuals, the program then carries out at least one finite element modeling, followed by a comparison/learning phase.
The present invention also covers a second computer program product, designed to be stored in a memory of a processing unit or on a removable medium designed to cooperate with a drive of said processing unit, or intended to be transmitted from a server to said processing unit. The program comprises instructions in computer code form to create a model based on learning, advantageously implementing an artificial neural network, this model being capable of giving the HRTFs of any individual for a multiplicity of directions from a set of measurements of morphological parameters performed on this individual.
Thus, the first program described hereinabove can be used to construct the model, whereas the second program consists of computer instructions representing the model itself.
Other characteristics and advantages of the invention will become apparent from studying the detailed description hereinbelow, and the appended drawings in which:
a diagrammatically illustrates the first model construction step in a method according to the invention,
b diagrammatically illustrates the current step using the model constructed in a method according to the invention,
c diagrammatically illustrates an advantageous embodiment for the construction of the abovementioned model, and
There now follows, first of all, a review of the principle of the construction of a model using a comparison/learning phase.
It involves in particular calculating the HRTFs by means of a mathematical model based on a function F which expresses a transfer function in terms of a number of input parameters. More specifically, if the desired transfer function is represented as a vector Y ($Y \in \mathbb{R}^n$, $n \in \mathbb{N}$) and the input parameters are described as a vector X ($X \in \mathbb{R}^m$, $m \in \mathbb{N}$), the function F defines the relation Y=F(X). In other words, the function F can be used to deduce a transfer function from a given set of parameters known a priori. The benefit of the mathematical model lies in using input parameters which can easily be acquired for any individual, while keeping in mind that their relation to the transfer function is not necessarily direct or obvious. In particular, the mathematical model must be capable of extracting the information that is more or less hidden in the input parameters in order to deduce the desired transfer function from it. The inventive method relies mainly on two points:
The mathematical model of the HRTFs relies on a function F that expresses an HRTF in terms of a given number of input parameters. The input parameters are grouped together in a vector X ($X \in \mathbb{R}^m$, $m \in \mathbb{N}$), which therefore constitutes the input vector of the function F. The output vector of the function is an HRTF, represented by a vector Y ($Y \in \mathbb{R}^n$, $n \in \mathbb{N}$). For example, this vector Y can comprise frequency coefficients describing the modulus of the spectrum of the transfer function defined by the HRTF. Equivalently, Y can comprise:
The function F is therefore a function from $\mathbb{R}^m$ to $\mathbb{R}^n$.
The modeling problem involves determining the function F, in association with a relevant set of parameters (X), such that any HRTF (Y) is the solution of: Y=F(X).
Specifically to estimate the HRTFs of an individual, the input vector X of the model mainly contains information relating to:
The output vector Y of the model consists of coefficients associated with a given representation of an HRTF. As indicated hereinabove, the vector Y can correspond to the frequency coefficients describing the modulus of the spectrum of an HRTF, but other representations can be considered (principal component analysis, IIR filter, or other).
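As an illustration of the first representation mentioned above, the output vector Y can be formed from the modulus of the spectrum of an HRTF. This minimal sketch assumes the HRTF is available as a sampled head-related impulse response (HRIR) of N points; the function name is hypothetical, and a direct DFT is used for readability instead of an FFT.

```python
import cmath
import math

def magnitude_spectrum(hrir):
    """Return the modulus of the DFT of an impulse response for bins 0..N/2.

    A direct O(N^2) DFT is used for clarity; in practice an FFT would be used.
    Only bins 0..N/2 are kept because the spectrum of a real signal is
    conjugate-symmetric, so the remaining bins carry no extra information.
    """
    n = len(hrir)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(hrir)))
        for k in range(n // 2 + 1)
    ]
```

For example, a unit impulse gives a flat modulus equal to 1 in every bin, which is the expected spectrum of a filter that leaves the signal unchanged.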
As represented in
Generally, modeling based on an artificial neural network consists mainly in:
The determination of F and of the vector X are quite obviously not independent.
There is a wide variety of mathematical methods for determining these two entities F and X. The inventive method is preferably based on statistical learning algorithms and, in a preferred embodiment, on algorithms of the artificial neural network type. These algorithms are briefly described hereinafter.
Statistical learning algorithms are tools for predicting statistical processes. They have been used successfully to predict processes for which a number of explanatory variables can be identified. Artificial neural networks form a particular category of these algorithms. The benefit of neural networks lies in their ability to capture higher-order dependencies, that is, dependencies involving several variables at once. Process prediction relies on knowing and exploiting such higher-order dependencies. Neural networks are applied in a wide variety of domains, notably in finance to predict market fluctuations, in pharmaceuticals, in banking to detect credit card fraud, in marketing to forecast consumer behavior, and in other sectors. Neural networks are often regarded as universal approximators, in the sense that they can predict any data from any explanatory variables, provided that there are enough hidden units. In other words, they can model any continuous mathematical function from $\mathbb{R}^m$ to $\mathbb{R}^n$ (on a bounded domain), provided that the number of hidden units is sufficient.
Referring to
In the hidden layer, a first step 111 consists in calculating linear combinations of the explanatory variables so as to combine information potentially originating from several variables. A second step 112 can consist in applying a nonlinear transformation (for example, a hyperbolic tangent function) to each of the linear combinations in order to obtain the values of the hidden units, or neurons, that form the hidden layer. This nonlinear transformation defines the activation function of the neurons. Finally, in step 113, the hidden units are linearly recombined to calculate the value predicted by the neural network.
Developing a neural network initially involves three operations:
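Steps 111 to 113 correspond to the forward pass of a one-hidden-layer perceptron. The sketch below is a minimal illustration in plain Python; the function name, the argument layout (one weight row per hidden unit or output) and the use of bias terms are assumptions made for the example, and the weights would normally come from the learning phase.

```python
import math

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer perceptron following steps 111-113.

    x: input vector (length m); w_hidden: h weight rows of length m;
    b_hidden: h biases; w_out: n weight rows of length h; b_out: n biases.
    """
    # Step 111: linear combinations of the explanatory variables.
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(w_hidden, b_hidden)]
    # Step 112: nonlinear activation (hyperbolic tangent) gives the hidden units.
    hidden = [math.tanh(a) for a in pre]
    # Step 113: linear recombination of the hidden units gives the predicted outputs.
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w_out, b_out)]
```

With n output rows, the same function returns a whole vector Y at once, which matches the case where the network predicts all coefficients of an HRTF.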
There are various categories of neural network that are distinguished by their architecture (type of interconnection between neurons, choice of activation functions, or other factors) and the learning mode used.
Neural networks are not used only for prediction. They are also used to classify and/or cluster data with a view to reducing the information. In practice, a neural network can identify common characteristics between the elements of a data set and then group these elements according to their resemblance. Each cluster constructed in this way then has an associated element, called the “representative”, which represents the information contained in the cluster. This representative can then replace the whole cluster. The data set can thus be described by a small number of elements, which amounts to a data reduction. Kohonen maps, or self-organizing maps (SOM), are neural networks dedicated to this clustering task.
A question arises as to whether all the HRTFs roughly estimated by the finite element modeling should be supplied as input to the artificial neural network model 11, or whether only a few HRTFs estimated in preferred directions could be used, as indicated hereinabove.
It will also be recalled that the roughly estimated HRTFs can be determined by finite element modeling, using for example simple geometrical shapes for the head, the torso, the neck or other parts of an individual, as described in document FR-2 851 877, which is not detailed further here.
The most immediate approach seemed to be a uniform selection, in which a subset of directions of roughly estimated HRTFs was chosen so as to cover the whole 3D sphere as uniformly and evenly as possible. This approach relied on a regular sampling of the 3D sphere. However, it turned out that the HRTFs do not vary uniformly with direction. From this point of view, a uniform selection of the HRTFs was not really optimal.
A more promising approach involved applying the abovementioned clustering technique in order to identify the directions of the most “relevant” HRTFs, that is, those most representative of the characteristics of the HRTFs observed over the whole 3D sphere. When applied to determining the HRTFs of an individual, this clustering technique can involve:
This “representative” HRTF is one of the HRTFs of the cluster: it is the HRTF that minimizes a distance criterion with respect to all the other HRTFs of the cluster. The representative HRTF contains most of the information of the HRTFs of the cluster. Ultimately, the set of representative HRTFs obtained in this way constitutes a compact description of the properties of the HRTFs over the whole 3D sphere.
This technique gave good results with regard to the model. The first result is a data reduction. The clustering procedure also provides additional information, namely the directions associated with the representative HRTFs, which makes it possible to define a selection of HRTFs intended to feed the input of the HRTF calculation model. This selection is a priori non-uniform, but more effective, and guarantees a better “representativeness” of the whole 3D sphere.
Nevertheless, the inventors found that the clustering was most selective between distinct morphotypes of individuals, rather than between distinct HRTF directions. The inventors therefore favored the exhaustiveness of the morphological parameter database, in particular by choosing a wide variety of morphotypes. It was then preferred to derive from this database a new database containing the HRTFs modeled by finite elements for all these individuals and in all directions of space. These HRTFs are then supplied as input to the corrective model illustrated by step 11 of
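The uniform selection discussed above requires a near-regular sampling of the 3D sphere. The text does not say which sampling scheme was used; the sketch below swaps in a golden-angle (Fibonacci) lattice, a common way of spreading n directions almost uniformly over a sphere. The function name and the (azimuth, elevation) output in degrees are assumptions.

```python
import math

def fibonacci_directions(n):
    """Return n (azimuth, elevation) pairs in degrees spread near-uniformly over the sphere.

    Golden-angle (Fibonacci) lattice: equal-area bands in z = sin(elevation),
    with successive points rotated in azimuth by the golden angle.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    dirs = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n            # z in (-1, 1), equal-area spacing
        elevation = math.degrees(math.asin(z))
        azimuth = math.degrees((i * golden_angle) % (2.0 * math.pi))
        dirs.append((azimuth, elevation))
    return dirs
```

Such a sampling is uniform in space but, as noted above, not in HRTF variation, which is why a uniform selection was judged suboptimal.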
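Selecting the representative of a cluster as the member that minimizes a distance criterion with respect to all the others amounts to taking the cluster's medoid. This minimal sketch assumes the clustering itself (for example by a Kohonen map) has already been done, that each member is a (direction, HRTF coefficients) pair, and that the criterion is the summed Euclidean distance; these choices and the function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two HRTF coefficient vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def representative(cluster):
    """Return the (direction, hrtf) member whose HRTF minimizes the summed
    distance to all the other HRTFs of the cluster (its medoid)."""
    return min(cluster, key=lambda m: sum(euclidean(m[1], o[1]) for o in cluster))

def representative_directions(clusters):
    """Map each cluster label to the direction of its representative HRTF."""
    return {label: representative(members)[0] for label, members in clusters.items()}
```

Returning the direction of each representative gives exactly the non-uniform selection of input directions described in the next paragraph.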
Preferably, the invention uses statistical learning algorithms of the “artificial neural network” type as the modeling tool for the corrective calculation of the HRTFs (for example, a neural network of the “Multi-Layer Perceptron” (MLP) type). The input parameters of the neural network are at least the azimuth angle (θ1) and the elevation angle (φ1) specifying the direction of the HRTF to be calculated, and the HRTFs roughly estimated by means of the finite element model.
The output parameters of the model are then the coefficients of the vector describing the HRTF for the direction (θ1, φ1) and for the individual whose HRTFs were estimated by the finite element modeling.
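The input vector X of the corrective network combines the requested direction with the roughly estimated HRTFs. This minimal sketch assumes the rough HRTFs are given as a list of coefficient vectors in a fixed order of directions; encoding each angle as a cosine/sine pair (so that 0° and 360° give the same input) is an assumption, not something the text specifies, and the function name is hypothetical.

```python
import math

def build_input_vector(azimuth_deg, elevation_deg, rough_hrtfs):
    """Concatenate the target direction with the roughly estimated (finite element) HRTFs.

    Angles are encoded as cosine/sine pairs (an assumed encoding), followed by
    the coefficients of each rough HRTF in a fixed, agreed order of directions.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = [math.cos(az), math.sin(az), math.cos(el), math.sin(el)]
    for hrtf in rough_hrtfs:      # one coefficient vector per direction
        x.extend(hrtf)
    return x
```

The same rough HRTFs are reused for every requested direction of a given individual; only the first four entries change when querying a new direction (θ1, φ1).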
Referring again to
To refer now to
To complete these three phases successfully, a database 20 of HRTFs roughly estimated on one or more individuals is first needed. It will thus be understood that a preliminary step is applied, in which morphological parameter measurements are collected for a number of individuals and, from these, their roughly estimated HRTFs in all directions of space are derived. This is how the database 20 is constructed.
This database 20 is subdivided into three distinct sets:
For the learning phase 21, there are pairs available which combine:
The learning involves, for each duly formed pair obtained from the learning set:
One risk of the learning phase is overlearning (overfitting), which manifests itself as follows: the neural network learns the learning set “by heart” and seeks to reproduce variations specific to the learning set that do not exist at the global level. To avoid overlearning, the validation phase 22 is conducted together with the learning phase 21. It consists in evaluating the prediction error of the neural network on a validation set (distinct from the learning set), which defines the validation error. During the learning process, the validation error first decreases, then starts to increase again when overlearning sets in. The minimum of the validation error therefore determines the end of learning.
In practice, this observation directly affects the number of estimated HRTFs to be supplied as input to the model after the learning phase. An advantageous optional characteristic therefore provides for determining an optimum number of roughly estimated HRTFs to be supplied as input to the model.
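Stopping the learning at the minimum of the validation error is commonly called early stopping. The sketch below assumes the validation error has been recorded after each learning epoch; the `patience` parameter (how many non-improving epochs to tolerate before concluding that the minimum has been passed) and the function name are assumptions not found in the text.

```python
def early_stopping_epoch(validation_errors, patience=3):
    """Return the epoch whose weights should be kept.

    Learning continues while the validation error keeps improving; once it has
    not improved for `patience` consecutive epochs, learning stops and the
    epoch with the minimum validation error seen so far is returned.
    """
    best_epoch, best_err = 0, float("inf")
    for epoch, err in enumerate(validation_errors):
        if err < best_err:
            best_epoch, best_err = epoch, err
        elif epoch - best_epoch >= patience:
            break                     # validation error keeps rising: overlearning
    return best_epoch
```

A small patience reacts quickly to overlearning but may stop on a temporary fluctuation; a larger one tolerates noise in the validation error at the cost of extra epochs.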
The test phase is conducted once the learning phase is finished and consists in evaluating the prediction error on the test set. This so-called “test error” ultimately describes the final performance of the neural network.
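The three phases rely on three disjoint subsets of the database 20 and on a prediction error. This minimal sketch assumes that the items being split are individuals (so that no person appears in more than one set), a 60/20/20 split and a mean squared error; none of these choices is specified in the text, and the function names are hypothetical.

```python
import random

def split_database(items, train=0.6, valid=0.2, seed=0):
    """Shuffle and split items into disjoint learning, validation and test sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_valid = int(len(items) * valid)
    return (items[:n_train],
            items[n_train:n_train + n_valid],
            items[n_train + n_valid:])

def mean_squared_error(predict, pairs):
    """Average squared error of `predict` over (input, target vector) pairs."""
    total, count = 0.0, 0
    for x, y in pairs:
        y_hat = predict(x)
        total += sum((a - b) ** 2 for a, b in zip(y_hat, y))
        count += len(y)
    return total / count
```

The test error is computed only once, after learning has stopped, so that it remains an unbiased estimate of the final performance.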
On completion of these three phases, an operational neural network is available, to which it is enough to submit input parameters to obtain the HRTFs of any individual in any direction.
Thus, with reference to
The next step b) consists in training the model using this database 20 and another database 41 containing HRTFs roughly estimated by a finite element modeling 49 (or “BEM”) applied to the morphological parameters 48 of the same individuals. A small number n (with n<N) of directions i representative of the HRTFs are chosen arbitrarily in step 41. This step 41 will be described in detail later, with reference to
Referring to
On the other hand, a second type of measurement 48 is carried out on the same individuals as those on whom the measurements constituting the database 20 of measured HRTFs were conducted; it consists in recording the morphological parameters of these M individuals (dimensions of the head, torso and neck, position and shape of the ears, etc.). To each set of morphological parameters $\mathrm{morph}_j$ of an individual j, a finite element modeling 49 is applied to obtain estimated HRTFs in at least some of the directions of space.
Moreover, during a step 50, the directions $(\varphi_{jcal}, \theta_{jcal})$ in which the HRTFs must be calculated are specified as input for the model. Preferably, these cover as many directions of 3D space as possible. A version 44b of the model, in the learning state, calculates the corrected HRTFs in these directions from the roughly estimated HRTFs, in a following step 46b. The model compares these calculated, corrected HRTFs with the HRTFs of the database 20 in the same directions $(\varphi_{jcal}, \theta_{jcal})$. If the difference is deemed too great (N arrow), the model in the learning state 44b is refined until this difference is reduced to an acceptable error (◯ arrow): the model then becomes definitive (end step 44).
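The compare-and-refine loop of steps 44b and 46b can be illustrated with a deliberately simple model. In this sketch, a single gain/offset correction fitted by gradient descent stands in for the neural network (an assumption made only to keep the example short), and the tolerance, learning rate and function name are hypothetical.

```python
def fit_correction(rough, measured, tolerance=1e-4, lr=0.1, max_iter=10000):
    """Refine a gain/offset correction y = a*x + b of rough HRTF coefficients.

    Mirrors the compare-and-refine loop: compute corrected values, compare them
    with the measured ones, and keep refining while the error is too large.
    Returns (a, b, mse, iterations).
    """
    a, b = 1.0, 0.0          # start from the uncorrected rough estimate
    n = len(rough)
    mse = float("inf")
    for iteration in range(max_iter):
        residuals = [a * x + b - y for x, y in zip(rough, measured)]
        mse = sum(r * r for r in residuals) / n
        if mse <= tolerance:   # acceptable error ("◯" branch): model is definitive
            return a, b, mse, iteration
        # Error too large ("N" branch): one gradient-descent step on the MSE.
        grad_a = 2.0 * sum(r * x for r, x in zip(residuals, rough)) / n
        grad_b = 2.0 * sum(residuals) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b, mse, max_iter
```

A neural network replaces the two parameters a and b by many weights, but the stopping logic (refine while the difference to the measured HRTFs exceeds an acceptable error) is the same.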
Referring to
Advantageously, the booth includes a measurement standard ETA which serves as a scale for measuring these dimensions. In particular, the photographing means S1 and S2 include in their field of view the measurement standard ETA together with the bust of the individual IND.
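Using the measurement standard ETA as a scale amounts to converting pixel lengths into physical lengths with a ratio computed in each photograph. This minimal sketch assumes the standard lies roughly in the same plane as the measured body part and that lengths have already been extracted in pixels (for example by shape recognition); the dictionary keys and function names are hypothetical.

```python
def pixels_to_mm(length_px, standard_px, standard_mm):
    """Convert a length measured in a photograph to millimetres using the
    measurement standard visible in the same photograph."""
    if standard_px <= 0:
        raise ValueError("standard must have a positive length in pixels")
    return length_px * standard_mm / standard_px

def head_dimensions(front, profile, standard_front_px, standard_profile_px, standard_mm):
    """front/profile: hypothetical dicts of pixel lengths, e.g. {'head_width': 310}.

    Each view uses its own scale, since the standard may appear at a different
    size in the front (S1) and profile (S2) photographs.
    """
    dims = {k: pixels_to_mm(v, standard_front_px, standard_mm) for k, v in front.items()}
    dims.update({k: pixels_to_mm(v, standard_profile_px, standard_mm) for k, v in profile.items()})
    return dims
```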
To refer again to
It should be noted, however, that the protocols for measuring the morphological parameters on the one hand, and the HRTFs of the database 20 on the other hand, should preferably be defined beforehand and followed in roughly the same way for all the individuals. The neural network obtained in this way is capable of calculating the HRTFs of any individual, in any direction, provided that measurements of his morphological parameters are available.
Of course, the present invention is not limited to the embodiment described hereinabove by way of example; it extends to other variants.
For example, instead of taking two photographs to measure the morphological parameters, a 3D laser scan of the bust of the individual could be used.
Number | Date | Country | Kind |
---|---|---|---|
0510995 | Oct 2005 | FR | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/FR2006/002345 | 10/18/2006 | WO | 00 | 7/18/2008 |