The present invention relates to audio signal encoding devices, intended in particular for applications involving the storage or transmission of digitized and compressed audio signals.
The invention relates more precisely to audio hierarchical encoding systems, having the capacity to provide varied rates, by distributing the information relating to an audio signal to be encoded in hierarchically-arranged subsets, such that this information can be used in order of importance with respect to the audio quality. The criterion taken into account for determining the order is a criterion of optimization (or rather of least degradation) of the quality of the encoded audio signal. Hierarchical encoding is particularly suited to transmission over heterogeneous networks or those having available rates varying over time, or also transmission to terminals having different or variable characteristics.
The invention relates more particularly to the hierarchical encoding of 3D sound scenes. A 3D sound scene comprises a plurality of audio channels corresponding to monophonic audio signals and is also known as spatialized sound.
An encoded sound scene is intended to be reproduced on a sound rendering system, which can comprise a simple headset, the two speakers of a computer, or a Home Cinema 5.1 type system with five speakers (one speaker at the level of the screen; in front of the theoretical listener, one speaker to the left and one speaker to the right; behind the theoretical listener, one speaker to the left and one speaker to the right), etc.
For example, consider an original sound scene comprising three distinct sound sources, located at different locations in space. The signals describing this sound scene are encoded. The data resulting from this encoding are transmitted to the decoder, and are then decoded. The decoded data are utilized in order to generate five signals intended for the five speakers of the sound rendering system. Each of the five speakers broadcasts one of the signals, the set of signals broadcast by the speakers synthesizing the 3D sound scene and therefore locating three virtual sound sources in space.
Different techniques exist for encoding sound scenes.
For example, one technique used comprises the determination of elements of description of the sound scene, then operations of compression of each of the monophonic signals. The data resulting from these compressions and the elements of description are then supplied to the decoder.
The rate adaptability (also called scalability) according to this first technique can therefore be achieved by adapting the rate during the compression operations, but it is achieved according to criteria of optimization of the quality of each signal considered individually.
Another encoding technique, which is used in the “MPEG Audio Surround” encoder (cf. “Text of ISO/IEC FDIS 23003-1, MPEG Surround”, ISO/IEC JTC1/SC29/WG11 N8324, July 2006, Klagenfurt, Austria), comprises the extraction and the encoding of spatial parameters from all of the monophonic audio signals on the different channels. These signals are then mixed in order to obtain a monophonic or stereophonic signal which is then compressed by a standard mono or stereo encoder (for example of MPEG-4 AAC, HE-AAC, etc. type). At the level of the decoder, the synthesis of the 3D sound scene is carried out based on the spatial parameters and the decoded mono or stereo signal.
The rate adaptability with this other technique can thus be achieved using a hierarchical mono or stereo encoder, but it is achieved according to a criterion of optimization of the quality of the monophonic or stereophonic signal.
Moreover, the PSMAC (Progressive Syntax-rich Multichannel Audio Codec) method makes it possible to encode the signals of different channels by using the KLT (Karhunen Loeve Transform), which is useful mainly for the decorrelation of the signals and which corresponds to a principal components decomposition in a space representing the statistics of the signals. It makes it possible to distinguish the highest-energy components from the lowest-energy components.
The rate adaptability is based on a cancellation of the lowest-energy components. However, these components can sometimes have great significance with regard to overall audio quality.
Thus, although the known techniques produce good results with respect to rate adaptability, none proposes a completely satisfactory rate adaptability method based on a criterion of optimization of the overall audio quality, aimed at defining compressed data optimizing the perceived overall audio quality, during the restitution of the decoded 3D sound scene.
Moreover, none of the known 3D sound scene encoding techniques allows rate adaptability based on a criterion of optimization of the spatial resolution, during the restitution of the 3D sound scene. This adaptability makes it possible to guarantee that each rate reduction will degrade as little as possible the precision of the locating of the sound sources in space, as well as the dimension of the restitution zone, which must be as wide as possible around the listener's head.
Moreover, none of the known 3D sound scene encoding techniques allows rate adaptability which would make it possible to directly guarantee optimum quality whatever the sound rendering system used for the restitution of the 3D sound scene. The current encoding algorithms are defined in order to optimize the quality in relation to a particular configuration of the sound rendering system. In fact, for example in the case of the “MPEG Audio Surround” encoder described above utilized with hierarchical encoding, direct listening with a headset or two speakers, or also monophonic listening is possible. If it is desired to utilize the compressed bitstream with a sound rendering system of type 5.1 or 7.1, additional processing is required at the level of the decoder, for example using OTT (“One-To-Two”) boxes for generating the five signals from the two decoded signals. These boxes make it possible to obtain the desired number of signals in the case of a sound rendering system of type 5.1 or 7.1, but do not make it possible to reproduce the real spatial aspect. Moreover, these boxes do not guarantee the adaptability to sound rendering systems other than those of types 5.1 and 7.1.
The purpose of the present invention is to improve the situation.
To this end the present invention aims to propose, according to a first aspect, a method for sequencing spectral components of elements to be encoded originating from a sound scene comprising N signals with N>1, one element to be encoded comprising spectral components associated with respective spectral bands.
The method comprises the following steps:
A method according to the invention thus allows the components of the elements to be encoded to be arranged in order of importance with respect to the overall audio quality.
A binary sequence is constituted after the different spectral components of the different elements to be encoded of the overall scene have been compared with each other with regard to their contribution to the perceived overall audio quality. The interaction between signals is thus taken into account in order to compress them jointly.
The bitstream can thus be sequenced such that each rate reduction degrades the perceived overall audio quality of the 3D sound scene as little as possible, since the elements contributing least to the overall audio quality are detected, so that they can either be left out (when the rate allocated for the transmission is insufficient to transmit all the components of the elements to be encoded) or be placed at the end of the binary sequence (making it possible to minimize the defects generated by a subsequent truncation).
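By way of illustration only, the following minimal sketch shows why placing the least important components at the end of the sequence limits the damage of a truncation. The component labels, priority indices, bit costs and the helper names `build_sequence` and `truncate` are hypothetical and are not taken from the invention.

```python
# Hypothetical components already ranked by a sequencing method.
# Each entry is (label, priority, bits); priority 1 is the most important.
components = [
    ("A(1,0)", 1, 12),
    ("A(2,0)", 3, 8),
    ("A(1,1)", 2, 10),
    ("A(3,1)", 4, 6),
]

def build_sequence(items):
    """Order components from most important to least important."""
    return sorted(items, key=lambda c: c[1])

def truncate(sequence, budget_bits):
    """Keep the longest prefix of the sequence that fits in the bit budget."""
    kept, used = [], 0
    for label, _priority, bits in sequence:
        if used + bits > budget_bits:
            break
        kept.append(label)
        used += bits
    return kept

seq = build_sequence(components)
kept_labels = truncate(seq, 30)
print(kept_labels)
```

With a budget of 30 bits, only the component with the lowest priority is dropped, which is the behaviour the sequencing is meant to secure.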
In an embodiment, the calculation of the influence of a spectral component is carried out in the steps:
a—encoding of a first set of spectral components of elements to be encoded according to a first rate;
b—determination of a first mask-to-noise ratio per spectral band;
c—determination of a second rate lower than said first one;
d—deletion of the spectral component under consideration from the elements to be encoded and encoding of the remaining spectral components of the elements to be encoded according to the second rate;
e—determination of a second mask-to-noise ratio per spectral band;
f—calculation of a variation in mask-to-noise ratio as a function of the differences determined between the first and second mask-to-noise ratios for the first and the second rate per spectral band;
g—iteration of steps d to f for each of the spectral components of the set of spectral components of elements to be encoded for sequencing, and determination of a minimum variation in mask-to-noise ratio; the order of priority allocated to the spectral component corresponding to the minimum variation being a minimum order of priority.
Such a process thus makes it possible to determine at least one component of an element to be encoded which is the least important with respect to the contribution to the overall audio quality, compared to the set of the other components of elements to be encoded for sequencing.
In an embodiment, steps a to g are reiterated with a set of spectral components of elements to be encoded for sequencing restricted by deletion of the spectral components for which an order of priority has been allocated.
In another embodiment, steps a to g are reiterated with a set of spectral components of elements to be encoded for sequencing in which the spectral components for which an order of priority has been allocated are assigned a more reduced quantification rate during the use of an imbricated quantifier.
In an embodiment, the elements to be encoded comprise the spectral parameters calculated for the N channels. These are then, for example, the spectral components of the signals which are encoded directly.
In another embodiment, the elements to be encoded comprise elements obtained by spatial transformation, for example of ambisonic type, of the spectral parameters calculated for the N signals. This arrangement makes it possible on the one hand to reduce the number of data to be transmitted since, in general, the N signals can be described very satisfactorily by a reduced number of ambisonic components (for example, a number equal to 3 or 5), less than N. On the other hand, this arrangement allows adaptability to any type of sound rendering system, since it is sufficient, at the level of the decoder, to apply an inverse ambisonic transform of size Q′×(2p′+1) (where Q′ is equal to the number of speakers of the sound rendering system used at the decoder output and 2p′+1 is equal to the number of ambisonic components received) for determining the signals to be supplied to the sound rendering system, while preserving the overall audio quality.
In an embodiment, instead of the spatial transform, other linear transforms such as KLT etc. are used.
In an embodiment, the mask-to-noise ratios are determined as a function of the errors due to the encoding and relative to elements to be encoded and also as a function of a spatial transformation matrix and of a matrix determined as a function of the transpose of said spatial transformation matrix.
In an embodiment, elements to be encoded are ambisonic components, some of the spectral components then being spectral parameters of ambisonic components. The method comprises the following steps:
A method according to the invention thus makes it possible to sequence at least some of the spectral parameters of ambisonic components of the set to be sequenced, as a function of their relative importance with respect to contribution to spatial precision.
The spatial resolution or spatial precision measures the fineness of the locating of the sound sources in space. An increased spatial resolution allows a finer locating of the sound objects in the room and makes it possible to have a wider restitution zone around the listener's head.
The interactions between signals and their consequence with respect to spatial precision are taken into account to compress them in a joint way.
The bitstream can thus be sequenced such that each rate reduction degrades the perceived spatial precision of the 3D sound scene as little as possible, since the least important elements with respect to their contribution are detected, in order to be placed at the end of the binary sequence (making it possible to minimize the defects generated by a subsequent truncation).
In an embodiment of such a method, the angles ξV and ξE associated with the velocity and energy vectors of the Gerzon criteria are utilized, as indicated below, in order to identify elements to be encoded which are least relevant as regards contribution, with respect to spatial precision, to the 3D sound scene. Thus contrary to customary practice, the velocity and energy vectors are not used to optimize a considered sound rendering system.
In an embodiment, the calculation of the influence of a spectral parameter is carried out in the following steps:
This arrangement makes it possible, in a limited number of calculations, to determine the spectral parameter of the component to be determined, the contribution of which to the spatial precision is minimum.
In an embodiment, steps a to g are reiterated with a set of spectral parameters of components to be encoded for sequencing which is restricted by deletion of the spectral parameters for which an order of priority has been allocated.
In another embodiment, steps a to g are reiterated with a set of spectral parameters of components to be encoded for sequencing in which the spectral parameters for which an order of priority has been allocated are assigned a more reduced quantification rate during the use of an imbricated quantifier.
Such iterative methods make it possible to successively identify, among the spectral parameters of the ambisonic components to which orders of priority have not yet been assigned, those which contribute least with respect to spatial precision.
In an embodiment, a first coordinate of the energy vector is a function of the formula
a second coordinate of the energy vector is a function of the formula
a first coordinate of the velocity vector is a function of the formula
and a second coordinate of the velocity vector is a function of the formula
in which the Ti, i=1 to Q, represent the signals determined as a function of the inverse ambisonic transformation on said quantified spectral parameters according to the rate considered and the ξi, i=1 to Q, are determined angles.
In an embodiment, a first coordinate of an angle vector indicates an angle which is a function of the sign of the second coordinate of the velocity vector and of the arc-cosine of the first coordinate of the velocity vector, and a second coordinate of the angle vector indicates an angle which is a function of the sign of the second coordinate of the energy vector and of the arc-cosine of the first coordinate of the energy vector.
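The following minimal sketch illustrates how velocity and energy vectors and their angles could be computed. It assumes the usual Gerzon definitions (gain-weighted and squared-gain-weighted averages of the unit vectors at the angles ξi), which may differ from the formulas of the embodiment. The normalization of each vector before taking the arc-cosine, the zero-sum guard and the function names are likewise assumptions made for this example.

```python
import math

def gerzon_vectors(signals, angles):
    """Return (velocity, energy) vectors as (x, y) tuples.

    Assumed definitions: velocity weights each unit vector by T_i,
    energy weights it by T_i squared.
    """
    sum_t = sum(signals)
    sum_t2 = sum(t * t for t in signals)
    if sum_t == 0.0 or sum_t2 == 0.0:
        raise ValueError("signals must not sum to zero")
    vx = sum(t * math.cos(a) for t, a in zip(signals, angles)) / sum_t
    vy = sum(t * math.sin(a) for t, a in zip(signals, angles)) / sum_t
    ex = sum(t * t * math.cos(a) for t, a in zip(signals, angles)) / sum_t2
    ey = sum(t * t * math.sin(a) for t, a in zip(signals, angles)) / sum_t2
    return (vx, vy), (ex, ey)

def vector_angle(x, y):
    """Angle from the sign of the second coordinate and the arc-cosine of the
    first coordinate (normalized here to unit length, an assumption)."""
    norm = math.hypot(x, y)
    if norm == 0.0:
        return 0.0
    cosine = max(-1.0, min(1.0, x / norm))
    sign = -1.0 if y < 0 else 1.0
    return sign * math.acos(cosine)

# Two speakers at +45 and -45 degrees fed with equal signals.
velocity, energy = gerzon_vectors([1.0, 1.0], [math.pi / 4, -math.pi / 4])
xi_v = vector_angle(*velocity)
xi_e = vector_angle(*energy)
print(xi_v, xi_e)
```

In this symmetric case both angles point straight ahead, towards the direction midway between the two speakers.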
According to a second aspect, the invention proposes a sequencing module comprising means for implementing a method according to the first aspect of the invention.
According to a third aspect, the invention proposes an audio encoder suited to encoding a 3D audio scene comprising N respective signals in an output bitstream, with N>1, comprising:
According to a fourth aspect, the invention proposes a computer program for installation in a sequencing module, said program comprising instructions for implementing the steps of a method according to the first aspect of the invention during an execution of the program by processing means of said module.
According to a fifth aspect, the invention proposes a method for decoding a bitstream, encoded according to a method according to the first aspect of the invention, with a view to determining a number Q′ of audio signals for the restitution of a 3D audio scene using Q′ speakers, according to which:
According to a sixth aspect, the invention proposes an audio decoder suited to decoding a bitstream encoded according to a method according to the first aspect of the invention, with a view to determining a number Q′ of audio signals for the restitution of a 3D audio scene using Q′ speakers, comprising means for implementing the steps of a method according to the fifth aspect of the invention.
According to a seventh aspect, the invention proposes a computer program for installation in a decoder suited to decoding a bitstream encoded according to a method according to the first aspect of the invention, with a view to determining a number Q′ of audio signals for the restitution of a 3D audio scene using Q′ speakers, said program comprising instructions for implementing the steps of a method according to the fifth aspect of the invention during an execution of the program by processing means of said decoder.
According to an eighth aspect, the invention proposes a binary sequence comprising spectral components associated with respective spectral bands of elements to be encoded originating from an audio scene comprising N signals with N>1, characterized in that at least some of the spectral components are sequenced according to a sequencing method according to the first aspect of the invention.
Other characteristics and advantages of the invention will become apparent on reading the following description. This is purely illustrative and must be read in relation to the attached drawings, in which:
a represents a binary sequence constructed in an embodiment of the invention;
b represents a binary sequence Seq constructed in another embodiment of the invention;
The encoder 1 comprises a time/frequency transformation module 3, a masking curve calculation module 7, a spatial transformation module 4, a module 5 for definition of the least relevant elements to be encoded combined with a quantification module 10, a module 6 for sequencing the elements, a module 8 for constitution of a binary sequence, with a view to the transmission of a bitstream φ.
A 3D sound scene comprises N channels, over each of which a respective signal S1, . . . , SN is delivered.
The decoder 100 comprises a binary sequence reading module 104, an inverse quantification module 105, an inverse ambisonic transformation module 101, and a frequency/time transformation module 102.
The decoder 100 is suited to receiving at the input the bitstream φ transmitted by the encoder 1 and for delivering at the output Q′ signals S′1, S′2, . . . , S′Q′ intended to feed the respective Q′ speakers H1, H2 . . . , HQ′ of a sound rendering system 103.
Each speaker Hi, i=1 to Q′, is associated with an angle βi indicating the angle of acoustic propagation from the speaker.
Operations Carried Out at the Level of the Encoder:
The time/frequency transformation module 3 of the encoder 1 receives at its input the N signals S1 . . . , SN of the 3D sound scene to be encoded.
Each signal Si, i=1 to N, is represented by the variation in its acoustic omnidirectional pressure Pi and the angle θi of propagation of the acoustic wave in the space of the 3D scene.
Over each time frame of each of these signals indicating the different values taken over time by the acoustic pressure Pi, the time/frequency transformation module 3 carries out a time/frequency transformation, in the present case, a modified discrete cosine transform (MDCT).
Thus it determines, for each of the signals Si, i=1 to N, its spectral representation Xi, characterized by M MDCT coefficients X(i, j), with j=0 to M−1. An MDCT coefficient X(i,j) thus represents the spectrum of the signal Si for the frequency band Fj.
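As an illustration of the time/frequency transformation, the sketch below computes the MDCT of one frame directly from its definition. The frame length of 2M samples, the absence of any analysis window and the function name `mdct` are assumptions made for this example, since the framing used by module 3 is not detailed here.

```python
import math

def mdct(frame):
    """Direct, unwindowed MDCT of a frame of 2*M samples, giving M coefficients.

    X[k] = sum_n x[n] * cos(pi/M * (n + 1/2 + M/2) * (k + 1/2))
    """
    if len(frame) % 2 != 0:
        raise ValueError("frame length must be even (2*M samples)")
    m = len(frame) // 2
    n0 = 0.5 + m / 2.0
    return [
        sum(x * math.cos(math.pi / m * (n + n0) * (k + 0.5))
            for n, x in enumerate(frame))
        for k in range(m)
    ]

# One frame of 8 samples yields M = 4 coefficients X(i, 0) .. X(i, 3).
coefficients = mdct([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0])
print(len(coefficients))
```

This direct form costs on the order of M² operations per frame; a practical encoder would normally use a fast algorithm and an analysis window.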
The spectral representations Xi of the signals Si, i=1 to N, are supplied at the input of the spatial transformation module 4, which also receives at its input the acoustic propagation angles θi characterizing the input signals Si.
The spectral representations Xi of the signals Si, i=1 to N, are also supplied at the input of the masking curve calculation module 7.
The masking curve calculation module 7 is suited to determining the spectral masking curve of each signal Si considered individually, using its spectral representation Xi and a psychoacoustic model, which provides a masking level for each frequency band Fj, j=0 to M−1 of each spectral representation Xi. The definition elements of these masking curves are delivered to the module 5 for definition of the least relevant elements to be encoded.
The spatial transformation module 4 is suited to carrying out a spatial transformation of the input signals supplied, i.e. determining the spatial components of these signals resulting from the projection on a spatial reference system dependent on the order of the transformation. The order of a spatial transformation is associated with the angular frequency at which it “scans” the sound field.
In an embodiment, the spatial transformation module 4 carries out an ambisonic transformation, which gives a compact spatial representation of a 3D sound scene, by producing projections of the sound field on the associated spherical or cylindrical harmonic functions.
For more information on ambisonic transformations, reference can be made to the following documents: “Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia” [“Representation of acoustic fields, application to the transmission and reproduction of complex sound scenes in a multimedia context”], Doctoral Thesis of the University of Paris 6, Jérôme DANIEL, 31 Jul. 2001; and “A highly scalable spherical microphone array based on an orthonormal decomposition of the sound field”, Jens Meyer and Gary Elko, Vol. II, pp. 1781-1784, in Proc. ICASSP 2002.
With reference to
where (Jm) represent the Bessel functions, r the distance between the centre of the frame and the position of a listener placed at a point M, Pi the acoustic pressure of the signal Si, θi the propagation angle of the acoustic wave corresponding to the signal Si and φ the angle between the position of the listener and the axis of the frame.
If the ambisonic transformation is of order p (p being any positive integer), for a 2D ambisonic transformation (in the horizontal plane), the ambisonic transform of a signal Si expressed in the time domain then comprises the following 2p+1 components:
(Pi, Pi·cos θi, Pi·sin θi, Pi·cos 2θi, Pi·sin 2θi, Pi·cos 3θi, Pi·sin 3θi, . . . , Pi·cos pθi, Pi·sin pθi).
In the following, a 2D ambisonic transformation has been considered. Nevertheless the invention can be implemented with a 3D ambisonic transformation (in such a case, it is considered that the speakers are arranged on a sphere).
The ambisonic components Ak, k=1 to Q=2p+1, considered in the frequency domain, each comprise M spectral parameters A(k,j), j=0 to M−1, associated respectively with the bands Fj, such that:
if A is the matrix comprising the components Ak, k=1 to Q, resulting from the ambisonic transformation of order p of the signals Si, i=1 to N, Amb(p) is the ambisonic transformation matrix of order p for the spatial sound scene, and X is the matrix of the frequency components of the signals Si, i=1 to N, then:
A = Amb(p) × X (1)
Amb(p)=[Amb(p)(i, j)], with i=1 to Q and j=1 to N, with: Amb(p)(1, j)=1,
if i is even and
if i is odd, i.e.
The spatial transformation module 4 is suited to determining the matrix A, using equation (1) as a function of the data X(i, j) and θi (i=1 to N, j=0 to M−1) which are supplied to it at the input.
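The following minimal sketch shows one way the matrix A could be obtained from the spectral representations and the propagation angles. It assumes that equation (1) is the matrix product of Amb(p) and X, and that the rows of Amb(p) follow the unnormalized 2D components listed above (1, cos θ, sin θ, cos 2θ, sin 2θ, ...). Any normalization factors used in the actual matrix are not reproduced, and the function names are hypothetical.

```python
import math

def amb_matrix(p, thetas):
    """Q x N matrix with Q = 2p + 1 rows (assumed, unnormalized form)."""
    rows = []
    for i in range(1, 2 * p + 2):
        if i == 1:
            rows.append([1.0 for _ in thetas])
        elif i % 2 == 0:
            rows.append([math.cos((i // 2) * t) for t in thetas])
        else:
            rows.append([math.sin(((i - 1) // 2) * t) for t in thetas])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
        for r in range(len(a))
    ]

# N = 2 signals at angles 0 and pi/2, M = 2 bands, order p = 1 (Q = 3).
X = [[1.0, 2.0],
     [3.0, 4.0]]
A = matmul(amb_matrix(1, [0.0, math.pi / 2]), X)
print(A)
```

Each row of A then holds the M spectral parameters A(k, j) of one ambisonic component.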
In the particular case considered, the ambisonic components Ak, k=1 to Q, i.e. the parameters A(k, j), k=1 to Q and j=0 to M−1, of this matrix A, are the elements to be encoded by the encoder 1 in a binary sequence.
The ambisonic components Ak, k=1 to Q, are delivered to the module 5 for definition of the least relevant elements for quantification and determination of a sequencing of the ambisonic components.
This module 5 for definition of the least relevant elements is suited to implementation of the operations, following the execution on processing means of the module 5, of a first algorithm and/or a second algorithm, with a view to defining the least relevant elements to be encoded and sequencing the elements to be encoded with each other.
This sequencing of the elements to be encoded is used subsequently during the constitution of a binary sequence to be transmitted.
The first algorithm comprises instructions suitable for implementation, when they are executed on the processing means of the module 5, of the steps of the process Proc1 described below with reference to
Process Proc1
The principle of the process Proc1 is as follows: a calculation is made of the respective influence of at least some spectral components which can be calculated as a function of spectral parameters originating from at least some of the N signals, on mask-to-noise ratios determined over the spectral bands as a function of an encoding of said spectral components. Then an order of priority is allocated to at least one spectral component as a function of the influence calculated for said spectral component compared to the other calculated influences.
In an embodiment, the detailed process Proc1 is as follows:
Initialization
Step 1a:
In this step, a first rate D0=Dmax is defined, together with an allocation of parts of this rate D0 between the elements to be encoded A(k, j), (k, j) ∈ E0 = {(k, j) such that k=1 to Q and j=0 to M−1}. The rate allocated to the element to be encoded A(k, j), (k, j) ∈ E0, during this allocation is named dk,j; the sum of the rates dk,j over k=1 to Q and j=0 to M−1 is equal to D0, and δ0 = min dk,j for (k, j) ∈ E0.
Then the elements to be encoded A(k, j), (k, j) ∈ E0, are quantified by the quantification module 10 as a function of the allocation defined for the rate D0.
Step 1b:
Then, the ratio of the mask to the quantification error (or noise) (“Mask to noise Ratio” or MNR) is calculated for each signal Si and for each sub-band Fj, with i=1 to N and j=0 to M−1, which is equal to the power of the mask of the signal Si in the band Fj divided by the power of the quantification noise (E(i,j)) relating to the signal Si in this band Fj.
In order to do this, the quantification error b(k,j) in each band Fj of the elements to be encoded A(k,j), (k, j) ∈ E0, is first determined as follows:
b(k, j)=A(k, j)−Ā(k, j), with Ā(k, j) being the result of the quantification, then inverse quantification of the element A(k,j) (in general the quantification provides a quantification index indicating the value of the element quantified in a dictionary, the inverse quantifier provides the value of the element quantified as a function of the index).
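As a simple illustration of the error b(k, j), the sketch below uses a uniform scalar quantifier. The quantifier actually used by module 10 is not specified here, so the step size and the function names are purely hypothetical.

```python
def quantize(value, step):
    """Return the quantification index of a value (uniform quantifier, assumed)."""
    return round(value / step)

def dequantize(index, step):
    """Inverse quantification: value associated with an index."""
    return index * step

def quantification_error(a, step):
    """b = A - A_bar, where A_bar is A after quantification then inverse quantification."""
    a_bar = dequantize(quantize(a, step), step)
    return a - a_bar

b = quantification_error(0.7, 0.5)
print(b)
```

For such a uniform quantifier the magnitude of the error never exceeds half the step size, so a larger rate (smaller step) gives a smaller error.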
Then the quantification error E(i, j) in each band Fj for each signal Si with i=1 to N and j=0 to M−1 is determined, due to the quantification of the elements to be encoded according to the rate D0, by calculating the matrix E comprising the elements E (i, j):
where Q=2p+1, Amb(p) is the ambisonic transformation matrix of order p and
Then, the ratio of the mask to the quantification error for each signal Si and for each band Fj, with i=1 to N and j=0 to M−1 is determined as a function of the quantification noise E(i, j) thus calculated relative to the signal Si in this band Fj and of the mask of the signal Si in the band Fj provided by the mask calculation module 7.
MNR(0, D0) refers to the matrix such that the element (i, j) of the matrix MNR (0,D0), i=1 to N and j=0 to M−1, indicates the ratio of the mask to the quantification error for the signal Si and for the band Fj for the quantification previously carried out.
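A minimal sketch of this ratio is given below, assuming the mask is supplied as a power per signal and per band and that the noise power is the square of the error E(i, j). The small floor that avoids division by zero and the function name are assumptions added for the example.

```python
def mnr_matrix(mask_power, noise):
    """Element (i, j) is the mask power divided by the noise power E(i, j)**2."""
    floor = 1e-12  # hypothetical guard against a zero error
    return [
        [m / max(e * e, floor) for m, e in zip(mask_row, noise_row)]
        for mask_row, noise_row in zip(mask_power, noise)
    ]

# One signal (N = 1), two bands (M = 2).
mnr = mnr_matrix([[4.0, 1.0]], [[1.0, 0.5]])
print(mnr)
```

A larger value means the quantification noise lies further below the masking level in that band.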
Before describing iteration No. 1 of the process Proc1, an indication is given below of how equation (2) was determined.
We then have
where Amb(p) is the ambisonic transformation matrix of order p and AmbInv(p) is the inverse ambisonic transformation matrix of order p (also called the ambisonic decoding matrix).
The processing chain 210 of
We can then write:
Therefore we deduce from this: E = (AmbInv(p) × Amb(p))⁻¹ × AmbInv(p) × B. In the case where the ambisonic decoding matrix corresponds to a system with regular speakers, we have
(in fact, the N quantification errors E or B depend only on the encoding carried out and not on the decoding. What will change at the level of the decoding, as a function of the decoding matrix used, corresponding to the system of speakers used, is the way in which the error is distributed between the speakers. This is due to the fact that the psychoacoustics used do not take into account the interactions between the signals. Therefore if the calculation is carried out for a well-defined decoding matrix and the quantification module optimizes the error for this matrix, then for the other decoding matrices the error is sub-optimum).
Equation (2) is therefore deduced from it.
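The algebra behind the expression for E given above can be retraced as follows. This is a sketch which assumes that the error B on the ambisonic components is the ambisonic transform of the per-signal error E (the intermediate relations are not reproduced in this text) and that the N×N matrix AmbInv(p) × Amb(p) is invertible:

```latex
\begin{aligned}
B &= \mathrm{Amb}(p)\,E \\
\mathrm{AmbInv}(p)\,B &= \mathrm{AmbInv}(p)\,\mathrm{Amb}(p)\,E \\
E &= \bigl(\mathrm{AmbInv}(p)\,\mathrm{Amb}(p)\bigr)^{-1}\,\mathrm{AmbInv}(p)\,B
\end{aligned}
```

Here B is the Q×M matrix of the errors b(k, j) and E is the N×M matrix of the errors E(i, j).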
To return to the description of
Iteration No. 1:
Step 1c:
A second encoding rate D1 is now defined, with D1=D0−δ0, together with a distribution of this encoding rate D1 between the elements to be encoded A(k, j), k=1 to Q and j=0 to M−1.
Step 1d:
Then, for each pair (k, j) ∈ E0, considered successively from the pair (1,0) up to the pair (Q,M−1) according to the order of lexicographical reading of the pairs of E0, the following operations a1 to a7 are reiterated:
a1—it is considered that the sub-band (k, j) is deleted for operations a2 to a5;
a2—the elements to be encoded A(i,n), with (i,n) ∈ E0\{(k, j)} (i.e. (i,n) equal to each of the pairs of E0 with the exception of the pair (k, j)), are quantified by the quantification module 10 as a function of a defined distribution of the rate D1 between said elements to be encoded A(i,n), with (i,n) ∈ E0\{(k, j)};
a3—in the same way as that indicated in step 1b, based on the elements Ā(i,n), (i,n) ∈ E0\{(k, j)}, resulting from the quantification operations carried out in step a2, the matrix MNRk,j(1,D1)=[MNRk,j(1,D1)(i, t)], i=1 to N and t=0 to M−1, is calculated such that each element MNRk,j(1,D1)(i, t) of the matrix indicates the ratio of the mask to the quantification error (or noise) for each signal Si and for each sub-band Ft, with i=1 to N and t=0 to M−1, following the quantification carried out in step a2 (the sub-band (k, j) being considered as deleted, the quantification noise b(k, j) has been considered as zero in the calculations). The values taken by the elements of this matrix MNRk,j(1,D1) are stored;
a4—then, the matrix ΔMNRk,j(1) of variation in the ratio of the mask to the quantification error, ΔMNRk,j(1) = |MNRk,j(1,D1) − MNR(0,D0)|, is calculated;
a5—a norm ∥ΔMNRk,j(1)∥ of this matrix ΔMNRk,j(1) is calculated. The value of this norm evaluates the impact on the set of signal to noise ratios of the signals Si, of the deletion of the component A(k, j) among the elements to be encoded A(i,n), with (i,n) ∈ E0.
The norm calculated makes it possible to measure the difference between MNRk,j(1,D1) and MNR(0,D0);
a6—it is considered that the sub-band (k, j) is no longer deleted;
a7—if (k, j)≠max E0=(Q,M−1), the pair (k, j) is incremented in E0 and steps a1 to a7 are reiterated until max E0 is reached.
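Operations a4 and a5 can be sketched as follows. The sketch assumes that the variation is taken element by element against the matrix MNR(0, D0) of step 1b and that the norm is the Frobenius norm. The norm is not named in this description, so that choice and the function names are assumptions.

```python
import math

def delta_mnr(mnr_candidate, mnr_reference):
    """Element-wise absolute difference between two MNR matrices."""
    return [
        [abs(c - r) for c, r in zip(cand_row, ref_row)]
        for cand_row, ref_row in zip(mnr_candidate, mnr_reference)
    ]

def frobenius_norm(matrix):
    """Square root of the sum of the squared elements (assumed norm)."""
    return math.sqrt(sum(v * v for row in matrix for v in row))

mnr_reference = [[4.0, 4.0], [2.0, 2.0]]   # stands for MNR(0, D0)
mnr_candidate = [[1.0, 4.0], [2.0, 6.0]]   # stands for MNRk,j(1, D1)
impact = frobenius_norm(delta_mnr(mnr_candidate, mnr_reference))
print(impact)
```

The pair (k, j) giving the smallest such value is the one whose deletion disturbs the mask-to-noise ratios the least.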
Step 1e:
(i1, j1) is determined, corresponding to the smallest value among the values ∥ΔMNRk,j(1)∥ obtained for (k, j) ∈ E0, i.e.:
(i1, j1) = arg min ∥ΔMNRk,j(1)∥, the minimum being taken over (k, j) ∈ E0.
The element to be encoded A(i1, j1) is thus identified as the least relevant element as regards the overall audio quality among the set of elements to be encoded A(i, j) with (i, j) ∈ E0.
Step 1f:
The identifier of the pair (i1, j1) is delivered to the sequencing module 6 as result of the first iteration of the process Proc1.
Step 1g:
The band (i1, j1) is then deleted from the set of elements to be encoded in the remainder of the process Proc1. The set E1=E0\{(i1, j1)} is defined.
Iteration 2 and Following:
Steps similar to steps 1c to 1g are carried out for each iteration n, n ≥ 2, as described hereafter.
Step 1c: an (n+1)th encoding rate Dn is now defined, with $D_n = D_{n-1} - \delta_{n-1}$, where $\delta_{n-1} = \min(d_{i,j})$ for (i, j) ∈ En−1.
Step 1d: then, for each pair (k, j) ∈ En−1, considered successively in lexicographical order, the following operations a1 to a6 are reiterated:
a1—it is considered that the sub-band (k, j) is deleted in operations a2 to a4;
a2—the elements to be encoded A(i,n), with (i,n) ∈ En−1\{(k, j)}, are quantified by the quantification module 10 as a function of a distribution of the rate Dn between the elements to be encoded A(i,n), with (i,n) ∈ En−1\{(k, j)};
a3—based on the elements Ā(i,n), (i,n) ∈ En−1\{(k, j)}, determined as a function of the quantification in step a2, the matrix MNRk,j(n, Dn) is calculated, indicating the ratio of the mask to the quantification error (or noise) for each signal Si and for each sub-band Ft, with i=1 to N and t=0 to M−1, following the quantification carried out in step a2;
a4—then the matrix of variation in the ratio of the mask to the quantification error ΔMNRk,j(n)=|MNRk,j(n,Dn)−
a5—it is considered that the sub-band (k, j) is no longer deleted;
a6—if (k, j)≠max En−1, the pair (k, j) is incremented in En−1 and steps a1 to a6 are reiterated until max En−1 is reached.
Step 1e: (in, jn) is determined, corresponding to the smallest value among the values ∥ΔMNRk,j(n)∥ obtained for (k,j)εEn−1, i.e. (in, jn)=arg min ∥ΔMNRk,j(n)∥, the minimum being taken over (k,j)εEn−1.
The matrix
The element to be encoded A(in, jn) is thus identified as the least relevant element as regards the overall audio quality among the set of elements to be encoded A(i, j), such that (i, j)εEn−1.
Step 1f: the identifier of the pair (in, jn) is delivered to the sequencing module 6 as a result of the nth iteration of the process Proc1.
Step 1g: then the band (in, jn), is deleted from the set of elements to be encoded in the remainder of the process Proc1. The set En=En−1\{(in, jn)} is defined.
The process Proc1 is reiterated r times, r being at most equal to Q*M−1.
Priority indices are then allocated by the sequencing module 6 to the different frequency bands, with a view to the insertion of the encoding data into a binary sequence.
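The iterative structure of the process Proc1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the module 5: `impact` is a hypothetical callback standing in for operations a1 to a5 (quantification with the candidate element deleted, then the norm of the variation of the mask-to-noise ratio), and the rate decrement uses the initial allocation throughout, whereas the process described above redistributes the rate at each iteration.

```python
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[int, int]  # (component index, sub-band index)


def proc1_order(
    rates: Dict[Pair, float],
    impact: Callable[[Pair, Set[Pair], float], float],
    r: int,
) -> List[Pair]:
    """Return the pairs selected by the first r iterations, least relevant first.

    rates  -- initial allocation d(i,j); its sum plays the role of D0
    impact -- hypothetical stand-in for operations a1 to a5: norm of the
              MNR variation when `pair` is deleted from `remaining` at `rate`
    """
    remaining = set(rates)                    # E0
    rate = sum(rates.values())                # D0
    order: List[Pair] = []
    for _ in range(min(r, len(rates) - 1)):   # at most Q*M-1 iterations
        # Step 1c: reduce the rate by the smallest remaining allocation.
        rate -= min(rates[p] for p in remaining)
        # Steps 1d and 1e: try each deletion in lexicographical order and
        # keep the one with the smallest impact (ties go to the first pair).
        least = min(sorted(remaining), key=lambda p: impact(p, remaining, rate))
        order.append(least)                   # step 1f
        remaining.discard(least)              # step 1g
    return order
```

With a toy impact function equal to the allocated rate, the element with the smallest rate is selected first; any real use would replace that callback with the quantification and MNR calculation described in steps a1 to a5.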
Sequencing of the elements to be encoded and constitution of a binary sequence based on the results successively provided by the successive iterations of the process Proc1:
In an embodiment where the sequencing of the elements to be encoded is carried out by the sequencing module 6 solely based on the results successively provided by the successive iterations of the process Proc1 implemented by the module 5 for definition of the least relevant elements to be encoded (excluding the results provided by the process Proc2), the sequencing module 6 defines an order of said elements to be encoded, reflecting the importance of the elements to be encoded with respect to the overall audio quality.
With reference to
The element to be encoded A(i2, j2), corresponding to the pair (i2, j2) determined during the second iteration of Proc1, is considered as the least relevant element to be encoded with respect to the overall audio quality, after that assigned the priority Prio1. It is therefore assigned a minimum priority index Prio2, with Prio2>Prio1. When the number of iterations r of the process is strictly less than Q*M−1, the sequencing module 6 thus successively schedules r elements to be encoded, each assigned increasing priority indices Prio1, Prio2 to Prio r. The elements to be encoded which have not been assigned an order of priority during an iteration of the process Proc1 are more important with respect to the overall audio quality than the elements to be encoded to which orders of priority have been assigned.
When r is equal to Q*M−1, all the elements to be encoded are sequenced one by one.
In the following, it is considered that the number of iterations r of the process Proc1 carried out is equal to Q*M−1.
The order of priority assigned to an element to be encoded A(k, j) is also assigned to the encoded element Ā(k, j) resulting from a quantification of this element to be encoded.
The module 8 for constitution of the binary sequence constitutes a binary sequence corresponding to a frame of each of the signals Si, i=1 to N, by successively integrating encoded elements Ā(k, j) into it in decreasing order of assigned priority indices, the binary sequence being intended for transmission in the bitstream φ.
Thus the binary sequence constituted is sequenced according to the sequencing carried out by the module 6.
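The ordering rule can be sketched as follows. This is a minimal illustration, assuming that the result of Proc1 is available as a list of pairs from least to most relevant; the function names, the `sizes` table and the `budget` parameter are hypothetical and serve only to show why such an ordering lends itself to rate adaptation by truncation.

```python
def build_sequence(order, all_pairs):
    """Order the elements for insertion into the binary sequence.

    order     -- pairs delivered by Proc1, least relevant first (Prio1, Prio2, ...)
    all_pairs -- every pair (k, j) of the frame
    Elements never assigned a priority come first (they are the most
    important), followed by the others in decreasing priority index.
    """
    prio = {pair: n for n, pair in enumerate(order, start=1)}
    unranked = [p for p in sorted(all_pairs) if p not in prio]
    ranked = sorted(prio, key=prio.get, reverse=True)
    return unranked + ranked


def truncate(sequence, sizes, budget):
    """Keep the longest prefix of the sequence that fits in `budget` bits."""
    kept, used = [], 0
    for pair in sequence:
        if used + sizes[pair] > budget:
            break
        kept.append(pair)
        used += sizes[pair]
    return kept
```

Because the least relevant elements sit at the end of the sequence, cutting the tail removes them first.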
The binary sequence is thus constituted by spectral components associated with respective spectral bands, of elements to be encoded originating from an audio scene comprising N signals with N>1, and which are sequenced as a function of their influence on mask-to-noise ratios determined on the spectral bands.
The spectral components of the binary sequence are for example sequenced according to the method of the invention.
In an embodiment, only some of the spectral components comprised within the binary sequence constituted are sequenced using a method according to the invention.
In the embodiment considered above, a deletion of a spectral component from an element to be encoded A(i, j) takes place upon each iteration of the algorithm Proc1.
In another embodiment, an imbricated quantifier is used for the quantification operations. In such a case, the spectral component of an identified element to be encoded A(i0, j0) is not deleted, but a reduced rate is assigned to the encoding of this component with respect to the encoding of the other spectral components of elements to be encoded remaining to be sequenced.
The encoder 1 is thus an encoder allowing a rate adaptability taking into account the interactions between the different monophonic signals. It allows definition of compressed data optimizing the perceived overall audio quality.
The operations of sequencing the elements of the binary sequence and constitution of the binary sequence using the process Proc1 have been described above for an embodiment of the invention in which the elements to be encoded comprise the ambisonic components of the signals.
In another embodiment, an encoder according to the invention does not encode these ambisonic components, but the spectral coefficients X(i,j), j=0 to M, of the signals Si.
In such a case, at the first iteration of the process Proc1, for example, a minimum priority index (minimum among the elements remaining to be sequenced) is assigned to the element to be encoded X(i1, j1) such that the deletion of the spectral component X(i1, j1) gives rise to a minimum variation in the mask-to-noise ratio. Then the process Proc1 is reiterated.
Process Proc2
The Gerzon criteria are generally used to characterize the locating of the virtual sound sources synthesized by the restitution of signals from the speakers of a given sound rendering system.
These criteria are based on the study of the velocity and energy vectors of the acoustic pressures generated by a sound rendering system used.
When a sound rendering system comprises L speakers, the signals, i=1 to L, generated by these speakers, are defined by an acoustic pressure Ti and an acoustic propagation angle ξi.
The velocity vector is then defined thus:
A pair of polar coordinates exists (rV, ξV) such that:
The energy vector is defined thus:
A pair of polar coordinates exists (rE, ξE) such that:
The conditions necessary for the locating of the virtual sound sources to be optimum are defined by seeking the angles ξi, characterizing the position of the speakers of the sound rendering system considered, verifying the criteria below, said Gerzon criteria, which are:
The operations described below in an embodiment of the invention use the Gerzon vectors in an application other than that which involves seeking the best angles ξi, characterizing the position of the speakers of the sound rendering system considered.
The Gerzon criteria are based on the study of the velocity and energy vectors of the acoustic pressures generated by a sound rendering system used.
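By way of illustration, the velocity and energy vectors can be computed as in the sketch below. It assumes the form commonly given in the literature for Gerzon's vectors (the mean of the speaker direction unit vectors weighted by the pressures Ti for the velocity vector, and by the squared pressures for the energy vector); the normalization may differ in detail from the equations of the present description, and the function name is hypothetical.

```python
import math


def gerzon_vectors(pressures, angles):
    """Return ((r_V, xi_V), (r_E, xi_E)) in polar form.

    pressures -- acoustic pressures T_i of the L speakers (not all zero)
    angles    -- acoustic propagation angles xi_i of the speakers, in radians
    """
    def polar(weights):
        total = sum(weights)  # assumed non-zero in this sketch
        x = sum(w * math.cos(a) for w, a in zip(weights, angles)) / total
        y = sum(w * math.sin(a) for w, a in zip(weights, angles)) / total
        return math.hypot(x, y), math.atan2(y, x)

    velocity = polar(list(pressures))             # weights T_i
    energy = polar([t * t for t in pressures])    # weights T_i squared
    return velocity, energy
```

With a single active speaker, both vectors have unit length and point at that speaker; with two equal speakers placed symmetrically, the angle is that of the bisector and the length is less than 1.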
Each of the coordinates xV, yV, xE, yE indicated in equations 3 and 4 relating to the energy and velocity vectors associated with the Gerzon criteria is an element of [−1,1]. Therefore a single pair (ξV, ξE) exists verifying the following equations, corresponding to the perfect case (rV, rE)=(1,1):
The angles ξV and ξE of this single pair are therefore defined by the following equations (equations (5)):
Hereafter the term generalized Gerzon angle vector will generally be used to refer to the vector such that
The second algorithm comprises instructions suited to implementing, when they are executed on processing means of the module 5, the steps of the process Proc2 described below with reference to
The principle of the process Proc2 is as follows: a calculation is made of the influence of each spectral parameter, among a set of spectral parameters to be sequenced, on an angle vector defined as a function of energy and velocity vectors associated with Gerzon criteria and calculated as a function of an inverse ambisonic transformation on said quantified ambisonic components. Furthermore, an order of priority is allocated to at least one spectral parameter as a function of the influence calculated for said spectral parameter compared to the other influences calculated.
In an embodiment, the detailed process Proc2 is as follows:
Initialization (n=0)
Step 2a:
A rate D0=Dmax and an allocation of this rate between the elements to be encoded A(k, j), for (k,j)εE0={(k, j) such that k=1 to Q and j=0 to M−1} are defined.
The rate allocated to the element to be encoded A(k, j), (k, j)εE0, during this initial allocation is referred to as dk,j (the sum of these rates dk,j, k=1 to Q, j=0 to M−1, is equal to D0) and δ0=min dk,j, for (k, j)εE0.
Step 2b:
Then each element to be encoded A(k, j), (k, j)εE0 is quantified by the quantification module 10 as a function of the rate dk, j which has been allocated to it in step 2a.
Ā is the matrix of the elements Ā(k,j), k=1 to Q and j=0 to M−1. Each element Ā(k,j) is the result of the quantification, with the rate dk,j, of the parameter A(k, j), relative to the spectral band Fj, of the ambisonic component A(k). The element Ā(k,j) therefore defines the quantified value of the spectral representation for the frequency band Fj, of the ambisonic component Ak considered.
Step 2c:
Then, these quantified ambisonic components Ā(k, j), k=1 to Q and j=0 to M−1, are subjected to ambisonic decoding of order p such that 2p+1=Q and which corresponds to a regular system of N speakers, in order to determine the acoustic pressures T1i, i=1 to N, of the N sound signals obtained as a result of this ambisonic decoding.
In the case considered, AmbInv(p) is the inverse ambisonic transformation matrix of order p (or ambisonic decoding of order p) delivering N signals T11, . . . , T1N corresponding to N respective speakers H′1, . . . , H′N, arranged regularly around a point. As a result, the matrix AmbInv(p) is deduced from the transposition of the matrix Amb(p,N) which is the ambisonic encoding matrix resulting from the encoding of the sound scene defined by the N sources corresponding to the N speakers H′1, . . . , H′N and arranged respectively in the positions ξ1, . . . , ξN. Thus we can write that:
T1 is the matrix of the spectral components T1(i, j) of the signals T1i, i=1 to N associated with the frequency bands Fj, j=0 to M−1. These spectral components come from the inverse ambisonic transformation of order p applied to the quantified ambisonic components Ā(k, j), k=1 to Q and j=0 to M−1.
and we have
Thus the components T1(i, j), i=1 to N, depend on the quantification error associated with the considered quantification of the ambisonic components A(k, j), k=1 to Q and j=0 to M−1 (in fact, each quantified element Ā(k, j) is the sum of the spectral parameter A(k, j) of the ambisonic component to be quantified and of the quantification noise associated with said parameter).
For each frequency band Fj, j=0 to M−1, using the equations (5), the generalized Gerzon angle vector (0) is then calculated at the initialization of the process Proc2 (n=0), as a function of the spectral components T1(i, j), i=1 to N and j=0 to M−1 determined following the ambisonic decoding:
with
i=1 to N:
And ξ̃j(0)=(0) is defined.
It will be noted that here an ambisonic decoding matrix has been considered for a regular sound rendering device which comprises a number of speakers equal to the number of the input signals, which simplifies the calculation of the ambisonic decoding matrix. Nevertheless, this step can be implemented by considering an ambisonic decoding matrix corresponding to non-regular sound rendering devices and also for a number of speakers different from the number of the input signals.
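The relationship between the encoding matrix and the decoding matrix for a regular arrangement can be sketched as follows. This is an illustration under assumptions that are not taken from the present description: a two-dimensional ambisonic encoding with components ordered as 1, cos ξ, sin ξ, cos 2ξ, sin 2ξ, and so on, with a √2 weighting of the directional components, and a scaling of the transposed matrix by 1/N. Other conventions exist, and the function names are hypothetical.

```python
import math


def amb_matrix(p, angles):
    """Q x N encoding matrix (Q = 2p+1), one column per source angle."""
    rows = [[1.0] * len(angles)]
    for m in range(1, p + 1):
        rows.append([math.sqrt(2) * math.cos(m * a) for a in angles])
        rows.append([math.sqrt(2) * math.sin(m * a) for a in angles])
    return rows


def amb_inv_regular(p, n):
    """N x Q decoding matrix for n regularly spaced speakers (n > 2p).

    Obtained by transposing the encoding matrix of the speaker positions
    and dividing by n (assumed normalization).
    """
    angles = [2 * math.pi * i / n for i in range(n)]
    amb = amb_matrix(p, angles)
    return [[amb[k][i] / n for k in range(2 * p + 1)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]
```

Under these assumptions, re-encoding the decoded speaker signals returns the original ambisonic components, which is what makes the transposed matrix usable as a decoder for a regular arrangement.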
Iteration No. 1 (n=1)
Step 2d
A rate D1=D0−δ0 and an allocation of this rate D1 between the elements to be encoded A(k, j), for (k, j)εE0 are defined.
Step 2e:
Then each element to be encoded A(k, j), (k, j)εE0 is quantified by the quantification module 10 as a function of the rate which has been allocated to it in step 2d.
Ā is now the updated matrix of the quantified elements Ā(k,j), (k, j)ε E0 each resulting from this last quantification according to the overall rate D1, of the parameters A(k, j).
Step 2f:
In a manner similar to that described previously in step 2c, after calculation of a new ambisonic decoding of order p carried out as a function of the elements quantified with the overall rate D1, a calculation is made, for the iteration No. 1 of the process Proc2, of a first generalized Gerzon angle vector (1) in each frequency band Fj, as a function of the spectral components T1(i, j), i=1 to N, j=0 to M−1 determined following the new ambisonic decoding, using equation (6).
Then a calculation is made of the vector Δ(1) equal to the difference between the Gerzon angle vector ξ̃j(0) calculated in step 2c of the initialization and the generalized Gerzon angle vector (1) calculated in step 2f of iteration No. 1: Δ(1)=(1)−ξ̃j(0), j=0 to M−1.
Step 2g:
The norm ∥Δ(1)∥ of the variation Δ(1), j=0 to M−1 is calculated in each frequency band Fj.
This norm represents the variation in the generalized Gerzon angle vector following the reduction of the rate from D0 to D1 in each frequency band Fj.
ji, the index of the frequency band Fj
Step 2h:
The spectral parameters of the ambisonic components relative to the spectral band Fj
And the following steps 2h1 to 2h5 are reiterated for any iε F0 considered in turn from 1 to Q:
2h1—it is considered that the sub-band (i, j1) is deleted for the operations 2h2 to 2h4: it is therefore considered that A(i, j1) is zero and that the corresponding quantified element Ā(i, j1) is also zero;
2h2—In a manner similar to that described previously in step 2c, after calculation of an ambisonic decoding of order p carried out as a function of the quantified elements with the overall rate D1 (Ā(i, j1) being zero), the generalized Gerzon angle vector (A(i, j1)=0, 1) is determined in the frequency band Fj
2h3—A calculation is then made of the vector Δ(1) representing the difference in the frequency band Fj
This norm represents the variation in the generalized Gerzon angle vector in the frequency band Fj
2h4—If i≠max F0, it is considered that the sub-band (i, j1) is no longer deleted and we pass to step 2h5. If i=max F0, it is considered that the sub-band (i, j1) is no longer deleted and we pass to step 2i.
2h5—i in the set F0 is incremented and steps 2h1 to 2h4 are reiterated for the value of i thus updated until i=max F0.
Thus, Q generalized Gerzon angle variation values ∥Δ(1)∥, for each i ε F0=[1,Q] are obtained.
Step 2i:
The values ∥Δ(1)∥, for each iε F0=[1,Q], are compared with each other, the minimum value among these values is identified and the index i1ε F0 corresponding to the minimum value is determined, i.e.
The component A(i1, j1) is thus identified as the least important element to be encoded with respect to spatial precision, compared to the other elements to be encoded A(k, j), (k, j)εE0.
Step 2j:
For each spectral band Fj, the generalized Gerzon angle vector ξ̃j(1) resulting from iteration 1 is redefined, calculated for a rate D1:
ξ̃j(1)=(1) if jε[0,M−1]\{j1};
ξ̃j
This redefined generalized Gerzon angle vector, established for a quantification rate equal to D1, takes into account the deletion of the element to be encoded A(i1, j1) and will be used for the following iteration of the process Proc2.
Step 2k:
The identifier of the pair (i1, j1) is delivered to the sequencing module 6 as result of the 1st iteration of the process Proc2.
Step 2m:
The element to be encoded A(i1, j1) is then deleted from the set of elements to be encoded in the remainder of the process Proc2.
The set E1=E0\{(i1, j1)} is defined.
δ1=min dk,j, for (k, j)εE1 is defined.
In an iteration No. 2 of the process Proc2, steps similar to steps 2d to 2n indicated above are reiterated.
The process Proc2 is reiterated as many times as desired to sequence some or all of the elements to be encoded A(k, j), (k, j)εE1 remaining to be sequenced.
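One iteration of the selection performed by Proc2 can be sketched as follows. This is a simplified illustration: `angle_vector` is a hypothetical callback standing in for the ambisonic decoding followed by the calculation of the generalized Gerzon angle vector in one frequency band, the choice of the band showing the smallest variation is an assumption of the sketch, and the wrap-around of angles is ignored.

```python
import math


def proc2_select(quantified, reference, angle_vector):
    """Return the pair (i_n, j_n) selected by one iteration.

    quantified   -- dict: band j -> dict: component i -> quantified value,
                    for the elements still to be sequenced
    reference    -- dict: band j -> angle pair retained at the previous iteration
    angle_vector -- hypothetical callback: components of one band -> (xi_V, xi_E)
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    # Steps 2f and 2g: variation of the angle vector in each band after
    # the rate reduction; the band with the smallest variation is kept
    # (assumption of this sketch).
    current = {j: angle_vector(comps) for j, comps in quantified.items()}
    bands = [j for j in sorted(quantified) if quantified[j]]
    j_n = min(bands, key=lambda j: dist(current[j], reference[j]))

    # Step 2h: each component of that band is set to zero in turn and the
    # resulting variation of the angle vector is measured.
    def variation_without(i):
        trial = dict(quantified[j_n])
        trial[i] = 0.0
        return dist(angle_vector(trial), current[j_n])

    # Step 2i: the component whose deletion changes the angles least.
    i_n = min(sorted(quantified[j_n]), key=variation_without)
    return i_n, j_n
```

The returned pair corresponds to the identifier delivered to the sequencing module 6 in step 2k; the caller is expected to remove it from `quantified` before the next iteration.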
Thus steps 2d to 2n described above are reiterated for an nth iteration:
Iteration n (n>1):
En−1=E0\{(i1, j1), . . . , (in−1, jn−1)}.
The elements to be encoded A(k, j), for (k, j)εE0\En−1 have been deleted during steps 2m of the previous iterations.
Step 2d:
A rate Dn=Dn−1−δn−1 and an allocation of this rate Dn between the elements to be encoded A(k, j), for (k, j)εEn−1 are defined.
During the calculation of the ambisonic decodings carried out hereafter, it is therefore considered that the quantified elements Ā(k, j), for (k, j)εE0\En−1 are zero.
Step 2e:
Then each element to be encoded A(k, j), (k, j)εEn−1, is quantified by the quantification module 10 as a function of the rate allocated in step 2d above.
The result of this quantification of the element to be encoded A(k, j) is Ā(k,j), (k, j)εEn−1.
Step 2f:
In a manner similar to that described previously for iteration 1, after calculation of an ambisonic decoding of order p carried out as a function of the elements quantified with the overall rate Dn (it was therefore considered during this ambisonic decoding that the components Ā(i1, j1), . . . , Ā(in−1, jn−1) were zero), for iteration n of the process Proc2, a first generalized Gerzon angle vector (n) in each frequency band Fj is calculated as a function of the spectral components T1i, i=1 to N determined following said ambisonic decoding, using equation (6).
A calculation is then made of the vector Δ(n) equal to the difference between the Gerzon angle vector ξ̃j(n−1) calculated in step 2j of iteration n−1 and the generalized Gerzon angle vector (n) calculated in the present step: Δ(n)=(n)−ξ̃j(n−1), j=0 to M−1.
Step 2g:
The norm ∥Δ(n)∥ of the variation Δ(n), j=0 to M−1, is calculated in each frequency band Fj.
This norm represents the variation in the generalized Gerzon angle vector in each frequency band Fj, following the rate reduction from Dn−1 to Dn (the parameters A(i1, j1), . . . , A(in−1, jn−1) and Ā(i1, j1), . . . , Ā(in−1, jn−1) being deleted).
jn the index of the frequency band Fj
Step 2h:
The spectral parameters of the ambisonic components relative to the spectral band Fj
And the following steps 2h1 to 2h5 are reiterated for any iεFn−1 considered in turn from the smallest element in the set Fn−1 (min Fn−1) to the largest element in the set Fn−1 (max Fn−1):
2h1—it is considered that the sub-band (i, jn) is deleted for operations 2h2 to 2h4: it is therefore considered that A(i, jn) is zero and that the corresponding quantified element Ā(i, jn) is also zero;
2h2—In a manner similar to that described previously in step 2c, after calculation of an ambisonic decoding of order p carried out as a function of the elements quantified with the overall rate Dn (Ā(i, jn) being zero), the generalized Gerzon angle vector named (A(i, jn)=0,n) in the frequency band Fj
2h3—A calculation is then made of the vector Δ(n) equal to the difference, in the frequency band Fj
Then the norm ∥Δ(n)∥ of the vector Δ(n): ∥Δ(n)∥=∥(A(i, jn)=0, n)−(n)∥ is calculated.
This norm represents the variation, in the frequency band Fj
2h4—If i≠max Fn−1, it is considered that the sub-band (i, jn) is no longer deleted and we go to step 2h5. If i=max Fn−1, it is considered that the sub-band (i, jn) is no longer deleted and we go to step 2i.
2h5—i is incremented in the set Fn−1 and steps 2h1 to 2h4 are reiterated for the value of i thus updated until i=max Fn−1.
Thus, for each iεFn−1, a value ∥Δ(n)∥ is obtained representing the variation in the generalized Gerzon angle vector in the frequency band Fj
Step 2i:
A comparison is made between the values ∥Δ(n)∥, for each iεFn−1, the minimum value among these values is identified and the index inεFn−1 corresponding to the minimum value is determined, i.e.
The component A(in, jn) is thus identified as the element to be encoded of least importance with respect to spatial precision, compared to the other elements to be encoded A(k, j), (k, j)εEn−1.
Step 2j:
For each spectral band Fj, a generalized Gerzon angle vector ξ̃j(n) is redefined resulting from iteration n:
ξ̃j(n)=(n) if jε[0,M−1]\{jn};
ξ̃j
This redefined generalized Gerzon angle, established for a quantification rate equal to Dn, takes into account the deletion of the element to be encoded A(in, jn) and will be used for the following iteration.
Step 2k:
The identifier of the pair (in, jn) is delivered to the sequencing module 6 as result of the nth iteration of the process Proc2.
Step 2m:
Then the band (in, jn) is deleted from the set of elements to be encoded in the remainder of the process Proc2, i.e. the element to be encoded A(in, jn) is deleted.
The set En=En−1\{(in, jn)} is defined. The elements to be encoded A(i, j), with (i, j)εEn, remain to be sequenced. The elements to be encoded A(i, j), with (i, j)ε{(i1, j1), . . . , (in, jn)} have already been sequenced during the iterations 1 to n.
The process Proc2 is reiterated r times, r being at most equal to Q*M−1.
Priority indices are then allocated by the sequencing module 6 to the different elements to be encoded, with a view to the insertion of the encoding data into a binary sequence.
Sequencing of the elements to be encoded and constitution of a binary sequence, based on the results successively provided by the successive iterations of the process Proc2:
In an embodiment where the sequencing of the elements to be encoded is carried out by the sequencing module 6 based on the results successively provided by the successive iterations of the process Proc2 implemented by the module 5 for definition of the least relevant elements to be encoded (excluding the results provided by the process Proc1), the sequencing module 6 defines an order of said elements to be encoded, reflecting the importance of the elements to be encoded with respect to spatial precision.
With reference to
The element to be encoded A(i2, j2) corresponding to the pair (i2, j2) determined during the second iteration of the process Proc2, is considered as the least relevant element to be encoded with respect to spatial precision, after that assigned the priority Prio1. It is therefore assigned a minimum priority index Prio2, with Prio2>Prio1. The sequencing module 6 thus successively schedules r elements to be encoded each assigned increasing priority indices Prio1, Prio2 to Prio r.
The elements to be encoded which have not been assigned an order of priority during an iteration of the process Proc2 are more important with respect to spatial precision than the elements to be encoded to which an order of priority has been assigned.
When r is equal to Q*M−1, all the elements to be encoded are sequenced one by one.
In the following, it is considered that the number of iterations r of the process Proc2 carried out is equal to Q*M−1.
The order of priority assigned to an element to be encoded A(k, j) is also assigned to the element encoded as a function of the result Ā(k, j) of the quantification of this element to be encoded. The encoded element corresponding to the element to be encoded A(k, j) is also denoted Ā(k, j).
The module 8 for constitution of the binary sequence constitutes a binary sequence Seq corresponding to a frame of each of the signals Si, i=1 to N, by successively integrating into it encoded elements Ā(k, j) in decreasing order of assigned priority indices, the binary sequence Seq being intended for transmission in the bitstream φ.
Thus the binary sequence constituted Seq is sequenced according to the sequencing carried out by the module 6.
In the embodiment considered above, a deletion of a spectral component from an element to be encoded A(i, j) takes place at each iteration of the process Proc2.
In another embodiment, an imbricated quantifier is used for the quantification operations. In such a case, the spectral component of an element to be encoded A(i, j) identified as the least important with respect to spatial precision during an iteration of the process Proc2 is not deleted, but a reduced rate is assigned to the encoding of this component with respect to the encoding of the other spectral components of elements to be encoded remaining to be sequenced.
The encoder 1 is thus an encoder allowing a rate adaptability taking into account the interactions between the different monophonic signals. It makes it possible to define compressed data optimizing the perceived spatial precision.
Combination of the Methods Proc1 and Proc2
In an embodiment, the least important elements to be encoded are defined using a method Proc combining the methods Proc1 and Proc2 described above, as a function of criteria taking into account the overall audio quality and spatial relevance.
The initialization of the method Proc comprises the initializations of the methods Proc1 and Proc2 as described above.
An iteration n (n>1) of such a method Proc will now be described with reference to
This rate and this set of elements to be encoded are determined during previous iterations of the method Proc based on previous iterations of the method Proc using the methods Proc1 and Proc2. The previous iterations have allowed the determination of elements to be encoded determined as the least important as a function of defined criteria.
These defined criteria have been established as a function of the desired overall audio quality and spatial precision.
An iteration of steps 1d and 1e of the process Proc1 is implemented on this set of elements to be sequenced in parallel, identifying the least relevant element to be encoded A(in1, jn1) with respect to the overall audio quality and an iteration of the steps 2e to 2i of the process Proc2, identifying the least relevant element to be encoded A(in2, jn2) with respect to spatial precision.
As a function of the defined criteria, in step 300, a single one of the two identified elements to be encoded or also both identified elements to be encoded are selected. This or each selected element to be encoded is denoted A(in, jn).
Then, on the one hand, the identifier or identifiers of the pair (in, jn) is/are supplied to the sequencing module 6 as a result of the nth iteration of the method Proc, and the sequencing module 6 assigns to it a priority Prio n in view of the criteria defined. The assigned priority Prio n is greater than the priority of the elements to be encoded selected during the previous iterations of the method Proc as a function of the criteria defined. This step replaces steps 1f of the process Proc1 and 2k of the process Proc2 as described previously.
The selected element or elements to be encoded A(in, jn) are then inserted into the binary sequence to be transmitted before the elements to be encoded selected during the previous iterations of the method Proc (as the element to be encoded A(in, jn) is more important with respect to the defined criteria than the elements to be encoded previously selected by the method Proc). The selected element or elements to be encoded A(in, jn) are inserted into the binary sequence to be transmitted after the other elements to be encoded of the set En−1 (as the element to be encoded A(in, jn) is less important with respect to the criteria defined than these other elements to be encoded).
On the other hand, in a step 301, the selected element or elements to be encoded A(in, jn) is/are deleted for the following iteration (iteration n+1) of the method Proc (comprising an iteration n+1 of the processes Proc1 and Proc2), which will then be applied to the set of elements to be encoded En=En−1\{A(in, jn)}, based on a reduced rate as defined in step 1c of the process Proc1 and step 2n of the process Proc2.
This step 301 replaces steps 1g of the process Proc1 and 2m of the process Proc2 as described previously.
The criteria defined make it possible to select that or those of the least relevant elements identified respectively during step 300 of the method Proc.
For example, in an embodiment, the element identified by the process Proc1 is deleted at each iteration n with n even, and the element identified by the process Proc2 is deleted at each iteration n with n odd, which makes it possible to best preserve both the overall audio quality and the spatial precision.
Other criteria can be used. An encoding implementing such a method Proc thus makes it possible to obtain a bitstream which is adaptable in rate with respect to the audio quality and with respect to spatial precision.
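The alternating criterion given as an example can be sketched as follows. This is a minimal illustration: `pick_quality` and `pick_spatial` are hypothetical callbacks standing in for one iteration of Proc1 (steps 1d and 1e) and of Proc2 (steps 2e to 2i) respectively, each returning the least relevant element of the set passed to it.

```python
def proc_order(elements, pick_quality, pick_spatial, r):
    """Return the elements selected by the combined method, least relevant first.

    Even iterations keep the element identified by Proc1, odd iterations
    the element identified by Proc2 (the example criterion of step 300).
    """
    remaining = set(elements)
    order = []
    for n in range(1, r + 1):
        if len(remaining) <= 1:      # nothing left to rank against
            break
        pick = pick_quality if n % 2 == 0 else pick_spatial
        chosen = pick(remaining)     # step 300
        order.append(chosen)         # delivered to the sequencing module 6
        remaining.discard(chosen)    # step 301
    return order
```

Any other criterion can be substituted by changing the line that chooses between the two callbacks, for instance by selecting both elements at the same iteration.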
Operations Carried Out at the Level of the Decoder
The decoder 100 comprises a binary sequence reading module 104, an inverse quantification module 105, an inverse ambisonic transformation module 101 and a frequency/time transformation module 102.
The decoder 100 is suited to receiving at the input the bitstream φ transmitted by the encoder 1 and delivering at the output Q′ signals S′1, S′2, . . . , S′Q′ intended to supply the Q′ respective speakers H1, . . . , HQ′ of a sound rendering system 103. The number of speakers Q′ can in an embodiment be different from the number Q of ambisonic components transmitted.
By way of example, the configuration of a sound rendering system comprising 8 speakers h1, h2 . . . , h8 is shown in
The binary sequence reading module 104 extracts from the received binary sequence φ data indicating the quantification indices determined for elements Ā(k, j), k=1 to Q and j=0 to M−1 and supplies them to the input of the inverse quantification module 105.
The inverse quantification module 105 carries out an inverse quantification operation.
The elements Ā′(k, j), k=1 to Q and j=0 to M−1, of the matrix Ā′ are determined such that Ā′(k, j)=Ā(k, j) when the received sequence comprises data indicating the quantification index of the element Ā(k, j) resulting from the encoding of the parameters A(k, j) of the ambisonic components by the encoder 1, and Ā′(k, j)=0 when the received sequence does not comprise data indicating the quantification index of the element Ā(k, j) (for example, these data have been cut out during the transmission of the sequence at the level of a streaming server in order to adapt to the available rate in the network and/or to the characteristics of the terminal).
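The zero-filling rule applied by the inverse quantification module can be sketched as follows. This is a minimal illustration: the representation of the received data as a dictionary and the `dequantize` callback are hypothetical, the actual inverse quantifier not being specified here.

```python
def inverse_quantify(received, q, m, dequantize):
    """Rebuild the Q x M matrix of decoded ambisonic parameters.

    received   -- dict: (k, j) -> quantification index, for the elements
                  actually present in the received sequence
    dequantize -- hypothetical callback: quantification index -> value
    Elements absent from the received sequence are set to zero.
    """
    return [
        [dequantize(received[(k, j)]) if (k, j) in received else 0.0
         for j in range(m)]
        for k in range(1, q + 1)
    ]
```

A sequence truncated by a streaming server therefore still decodes; the missing elements, which are the least relevant ones, simply contribute nothing.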
The inverse spatial transformation module 101 is suited to determining the elements X′(i, j), i=1 to Q′, j=0 to M−1, of the matrix X′ defining the M spectral coefficients X′(i, j), i=1 to Q′, j=0 to M−1, of each of the Q′ signals S′i, based on the ambisonic components Ā′(k, j), k=1 to Q and j=0 to M−1, determined by the inverse quantification module 105.
AmbInv(p′,Q′) is the inverse ambisonic transformation matrix of order p′ for the 3D scene suited to determining the Q′ signals S′i, i=1 to Q′, intended for the Q′ speakers of the sound rendering system associated with the decoder 100, based on the Q ambisonic components received. The angles βi, for i=1 to Q′, indicate the angle of acoustic propagation from the speaker Hi. In the example represented in
X′ is the matrix of the spectral components X′(i, j) of the signals Si′, i=1 to Q′ relative to the frequency bands Fj, j=0 to M−1. Thus:
The inverse spatial transformation module 101 is suited to determining the spectral coefficients X′(i, j), i=1 to Q′, j=0 to M−1, elements of the matrix X′, using equation (7).
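Given the dimensions stated above (Q components in, Q′ signals out), the operation amounts to multiplying the matrix of decoded ambisonic components by a Q′×Q decoding matrix. The sketch below illustrates this for a regular arrangement of speakers, under assumptions that are not taken from the present description: two-dimensional components ordered as 1, cos β, sin β, and so on, a √2 weighting of the directional components, and a 1/Q′ scaling. The function name is hypothetical.

```python
import math


def decode_frame(components, speaker_angles, p):
    """Return the Q' x M spectral coefficients of the speaker signals.

    components     -- Q x M matrix of decoded ambisonic components (Q = 2p+1)
    speaker_angles -- angles beta_i of the Q' speakers, in radians
                      (assumed regularly spaced in this sketch)
    """
    q_prime = len(speaker_angles)
    out = []
    for beta in speaker_angles:
        # Row of the decoding matrix for this speaker (assumed convention).
        gains = [1.0]
        for m in range(1, p + 1):
            gains += [math.sqrt(2) * math.cos(m * beta),
                      math.sqrt(2) * math.sin(m * beta)]
        out.append([
            sum(g * row[j] for g, row in zip(gains, components)) / q_prime
            for j in range(len(components[0]))
        ])
    return out
```

The number of speakers Q′ only affects the number of rows produced, which is why the same received components can feed rendering systems of different sizes.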
These elements X′(i, j), i=1 to Q′, j=0 to M−1, once determined, are delivered to the input of the frequency/time transformation module 102.
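The operation of module 101 can be sketched as a plain matrix product, assuming equation (7) multiplies the Q′×Q inverse ambisonic transformation matrix by the Q×M matrix Ā′. The function name and the numerical values of `amb_inv` below are placeholders chosen for the example; they are not the coefficients of AmbInv(p′,Q′).

```python
def inverse_spatial_transform(amb_inv, a_prime):
    """Return X' (Q' x M) as the product amb_inv (Q' x Q) by a_prime (Q x M)."""
    q_out = len(amb_inv)   # Q', number of speakers
    q = len(a_prime)       # Q, number of ambisonic components
    m = len(a_prime[0])    # M, number of spectral coefficients
    return [
        [sum(amb_inv[i][k] * a_prime[k][j] for k in range(q)) for j in range(m)]
        for i in range(q_out)
    ]


# Placeholder decoding matrix for Q' = 2 speakers and Q = 3 components.
amb_inv = [[1.0, 0.0, 1.0],
           [1.0, 0.0, -1.0]]
# A' with its second component set to 0, as if it had been truncated.
a_prime = [[1.0, 2.0],
           [0.0, 0.0],
           [0.5, -1.0]]
x_prime = inverse_spatial_transform(amb_inv, a_prime)
```

A component set to 0 by the inverse quantification simply contributes nothing to the speaker signals, which is why truncated sequences remain decodable.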
The frequency/time transformation module 102 of the decoder 100 transforms the frequency representation into a time representation based on the received spectral coefficients X′(i, j), i=1 to Q′, j=0 to M−1 (this transformation is, in the present case, an inverse MDCT), and it thus determines a time frame of each of the Q′ signals S′1, . . . , S′Q′.
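A minimal sketch of an inverse MDCT for one signal S′i is given below. It assumes the common definition in which M coefficients yield 2M time samples, a 1/M normalization, no analysis or synthesis window, and reconstruction by overlap-add of consecutive blocks; none of these choices is stated in the present description, and the function names are introduced for the example.

```python
import math


def imdct(coeffs):
    """Inverse MDCT: M spectral coefficients -> 2M time-aliased samples."""
    M = len(coeffs)
    return [
        sum(
            coeffs[k] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
            for k in range(M)
        ) / M
        for n in range(2 * M)
    ]


def overlap_add(prev_tail, block):
    """Add the second half of the previous block to the first half of the
    current one; return the M output samples and the new tail to keep."""
    M = len(block) // 2
    frame = [prev_tail[n] + block[n] for n in range(M)]
    return frame, block[M:]
```

In use, `imdct` would be called once per signal and per block on the row X′(i, ·), and `overlap_add` would cancel the time-domain aliasing between consecutive blocks. A practical codec would normally add a window satisfying the Princen-Bradley condition, with the normalization adjusted accordingly.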
Each signal S′i, i=1 to Q′, is intended for the speaker Hi of the sound rendering system 103.
At least some of the operations carried out by the decoder are, in an embodiment, implemented following the execution of computer program instructions on processing means of the decoder.
An advantage of the encoding of the components resulting from the ambisonic transformation of the signals S1, . . . , SN as described is that, when the number N of signals of the sound scene is large, these signals can be represented by a number Q of ambisonic components much smaller than N, with very little degradation of the spatial quality of the signals. The volume of data to be transmitted is therefore reduced without significant degradation of the audio quality of the sound scene.
Another advantage of an encoding according to the invention is that such encoding allows adaptability to the different types of sound rendering systems, whatever the number, arrangement and type of speakers with which the sound rendering system is provided.
In fact, a decoder receiving a binary sequence comprising Q ambisonic components applies to these components an inverse ambisonic transformation of any order p′, corresponding to the number Q′ of speakers of the sound rendering system for which the decoded signals are intended.
An encoding as carried out by the encoder 1 makes it possible to sequence the elements to be encoded as a function of their respective contribution to the audio quality using the first process Proc1 and/or as a function of their respective contribution to the spatial precision and the accurate reproduction of the directions contained in the sound scene, using the second process Proc2.
In order to adapt to the imposed rate constraints, it is sufficient to truncate the sequence, removing the elements of lower priority placed at its end. It is then guaranteed that the best overall audio quality (when the process Proc1 is implemented) and/or the best spatial precision (when the process Proc2 is implemented) is provided for the rate retained. In fact, the sequencing of the elements has been carried out in such a way that the elements which contribute least to the overall audio quality and/or spatial precision are placed at the end of the sequence.
The processes Proc1 and Proc2 can be implemented, according to the embodiments, in combination or alone, independently of one another, in order to define a binary sequence.
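The truncation described above can be sketched as keeping the longest prefix of the ordered sequence that fits within a bit budget. The list layout (element identifier paired with its size in bits), the function name and the budget value are assumptions made for the example.

```python
def truncate_to_budget(ordered_elements, bit_budget):
    """Keep the longest prefix of the priority-ordered sequence that fits.

    ordered_elements -- hypothetical list of ((k, j), n_bits) pairs, already
                        ordered from highest to lowest priority
    bit_budget       -- hypothetical number of bits allowed by the rate
    """
    kept = []
    used = 0
    for element_id, n_bits in ordered_elements:
        if used + n_bits > bit_budget:
            break  # truncation: everything from this point on is dropped
        kept.append(element_id)
        used += n_bits
    return kept


sequence = [((1, 0), 8), ((2, 0), 8), ((1, 1), 6), ((3, 0), 4)]
kept = truncate_to_budget(sequence, bit_budget=20)
```

The loop stops at the first element that does not fit rather than searching for later, smaller elements, since the operation described is a cut of the tail of the sequence and not a reselection of its elements.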
Number | Date | Country | Kind |
---|---|---|---|
0703349 | May 2007 | FR | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/FR08/50671 | 4/16/2008 | WO | 00 | 4/19/2010 |