This application is a Section 371 National Stage Application of International Application No. PCT/EP2020/053264, filed Feb. 10, 2020, the content of which is incorporated herein by reference in its entirety, and published as WO 2020/177981 on Sep. 10, 2020, not in English.
This invention relates to the encoding/decoding of spatialized audio data, particularly in an ambiophonic context (hereinafter also referred to as “ambisonic”).
The encoders/decoders (hereinafter called “codecs”) currently used in mobile telephony are mono (a single signal channel for reproduction on a single loudspeaker). The 3GPP EVS codec (for “Enhanced Voice Services”) makes it possible to offer “Super-HD” quality (also called “High Definition+” voice or HD+) with a super-wideband (SWB) audio band for signals sampled at 32 or 48 kHz or full-band (FB) for signals sampled at 48 kHz; the audio bandwidth is from 14.4 to 16 kHz in SWB mode (9.6 to 128 kbps) and 20 kHz in FB mode (16.4 to 128 kbps).
The next evolution in quality in conversational services offered by operators should consist of immersive services, using terminals such as smartphones for example equipped with several microphones or devices for spatialized audio conferencing or telepresence type videoconferencing, or even tools for sharing “live” content, with spatialized 3D audio rendering, much more immersive than a simple 2D stereo reproduction. With the increasingly widespread practice of listening to content on mobile phones with an audio headset and the appearance of advanced audio equipment (accessories such as a 3D microphone, voice assistants with acoustic antennas, virtual reality headsets, etc.) and specific tools (for example for the production of 360° video content), the capturing and rendering of spatialized sound scenes are now common enough to offer an immersive communication experience.
To this end, the future 3GPP standard “IVAS” (for “Immersive Voice And Audio Services”) proposes extending the EVS codec to include immersion, by accepting, as input formats to the codec, at least the spatialized audio formats listed below (and their combinations):
Hereinafter, we are typically interested in the coding of a sound in ambisonic format, as an exemplary embodiment (at least some aspects presented in connection with the invention below can also be applied to formats other than ambisonic).
Ambisonics is a method of recording (“encoding” in the acoustic sense) spatialized sound, and a reproduction system (“decoding” in the acoustic sense). An ambisonic microphone (first-order) comprises at least four capsules (typically of the cardioid or sub-cardioid type) arranged on a spherical grid, for example the vertices of a regular tetrahedron. The audio channels associated with these capsules are called “A-format”. This format is converted into a “B-format”, in which the sound field is divided into four components (spherical harmonics) denoted W, X, Y, Z, which correspond to four coincident virtual microphones. The W component corresponds to an omnidirectional capture of the sound field, while the X, Y, and Z components, more directional, are comparable to pressure gradients oriented in the three spatial dimensions. An ambisonic system is a flexible system in the sense that the recording and reproduction are separate and decoupled. It allows decoding (in the acoustic sense) in any speaker configuration (for example binaural, type 5.1 surround-sound, or type 7.1.4 periphonic with height). Of course, the ambisonic approach can be generalized to more than four channels in B-format and this generalized representation is called “HOA” (for “Higher-Order Ambisonics”). The fact that the sound is broken down into more spherical harmonics improves the spatial accuracy of the reproduction when rendering on loudspeakers.
An N-order ambisonic signal comprises (N+1)² components, and at first order (N=1), we find the four components of the original ambisonics, commonly called FOA (for First-Order Ambisonics). There is also what is called a “planar” variant of ambisonics, which decomposes the sound defined in a plane, generally the horizontal plane. In this case, the number of channels is 2N+1. First-order ambisonics (4 channels: W, X, Y, Z) and first-order planar ambisonics (3 channels: W, X, Y) are hereinafter indiscriminately referred to as “ambisonics” to facilitate reading, the processing presented being applicable independently of whether or not the type is planar. However, where a distinction is necessary, the terms “first-order ambisonics” and “first-order planar ambisonics” are used. Note that it is possible to derive from the first-order B-format a stereo signal (2 channels) corresponding to coincident stereo captures of the types Blumlein Crossed Pair (X+Y and X−Y) or Mid-Side (combining W and X for the Mid and taking Y as the Side).
Hereinafter, a signal in B-format of predetermined order is called “ambisonic sound”. In some variants, the ambisonic sound can be defined in another format such as A-format or channels pre-combined by fixed matrixing (keeping the number of channels or reducing it to a case of 3 or 2 channels), as will be seen below.
The signals to be processed by the encoder/decoder are presented as successions of blocks of sound samples called “frames” or “subframes” below.
In addition, hereinafter, the mathematical notations follow this convention:
The simplest approach to encoding a stereo or ambisonic signal is to use a mono encoder and apply it in parallel to all the channels, possibly with a different bit allocation depending on the channels. This approach is here called “multi-mono” (although in practice the approach can be generalized to multi-stereo or to the use of several parallel instances of the same core codec).
Such an embodiment is shown in
The associated quality varies according to the mono coding used, and it is generally satisfactory only at very high bitrate, for example with a bitrate of at least 48 kbps per mono channel for EVS coding. Thus for first-order we obtain a minimum bitrate of 4×48=192 kbps.
The solutions currently proposed for more sophisticated codecs, for ambisonic spatialization in particular, are unsatisfactory, particularly in terms of complexity, delay, and efficient use of the bitrate, to ensure effective decorrelation between ambisonic channels.
For example, the MPEG-H codec for ambisonic sounds uses an overlap-add operation which adds delay and complexity, as well as linear interpolation on direction vectors which is suboptimal and introduces defects. A basic problem with this codec is that it implements a decomposition into predominant components and ambience (the predominant components being meant to be perceptually distinct from the ambience), but this decomposition is not fully defined. The MPEG-H encoder suffers from the problem of non-correspondence between the directions of the main components from one frame to another: the order of the components (signals) can be swapped, as can the associated directions. This is why the MPEG-H codec uses a technique of matching and overlap-add to solve this problem.
Furthermore, it would be possible to use frequency coding approaches (in the FFT or MDCT domain) rather than temporal coding as in the MPEG-H codec, but signal processing in the frequency domain (sub-bands) requires transmitting data to a decoder by sub-band, thus increasing the bitrate necessary for this transmission.
The invention improves this situation.
To this end, it proposes a method of encoding for the compression of audio signals forming, over time, a succession of sample frames, in each of N channels in an ambisonic representation of order higher than 0, the method comprising:
The invention thus makes it possible to improve a decorrelation between the N channels that are subsequently to be encoded separately. This separate encoding is also referred to hereinafter as “multi-mono encoding”.
In one embodiment, the method may further comprise:
These parameters can typically be quaternion and/or rotation angle and/or Euler angle values as will be seen below, or else simply elements of this matrix for example.
In one embodiment, the method may further comprise:
Such an embodiment makes it possible to maintain overall homogeneity and in particular to avoid audible clicks from one frame to another, during audio reproduction.
However, certain transformations implemented to obtain the eigenvectors from the covariance matrix (such as “PCA/KLT” seen below) are likely to reverse the direction of certain eigenvectors, and it is then advisable at the same time to verify axis consistency, then directional consistency on this axis, for each eigenvector of the matrix of the current frame. To this end, in one embodiment, as the aforementioned permutation of columns already makes it possible to ensure consistency of the axes of the vectors, the method further comprises:
Typically, since a permutation between two columns of the matrix of eigenvectors inverts the sign of its determinant, and since the determinant of a rotation matrix is equal to 1,
we can estimate the determinant of the matrix of eigenvectors and, if it is equal to −1, invert the signs of the elements of a chosen column of the matrix of eigenvectors so that the determinant becomes equal to 1, thus forming a rotation matrix.
In one embodiment, the method may further comprise:
Such an interpolation then makes it possible to smooth (“progressively average”) the rotation matrices respectively applied to the previous frame and current frame and thus attenuate an audible click effect from one frame to another during playback.
In such an implementation:
In one embodiment, the ambisonic representation is first-order and the number N of channels is four, and the rotation matrix of the current frame is represented by two quaternions.
In this embodiment and in the case of an interpolation, each interpolation for a current subframe is a spherical linear interpolation (or “SLERP”), conducted as a function of the interpolation of the subframe preceding the current subframe and based on the quaternions of the preceding subframe.
For example, the spherical linear interpolation of the current subframe can be carried out to obtain the quaternions of the current subframe, as follows:
where:
QL,t−1 is one of the quaternions of the previous subframe t−1,
QR,t−1 is the other quaternion of the previous subframe t−1,
{circumflex over (Q)}L,t is one of the quaternions of the current subframe t,
{circumflex over (Q)}R,t is the other quaternion of the current subframe t,
ΩL=arccos(QL,t−1·QL,t); ΩR=arccos(QR,t−1·QR,t),
and α corresponds to an interpolation factor.
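As a sketch (an assumption on the exact expression, consistent with the general SLERP formula given later in the description), the interpolation may take the standard form:
{circumflex over (Q)}L,t=[sin((1−α)ΩL)·QL,t−1+sin(αΩL)·QL,t]/sin(ΩL),
{circumflex over (Q)}R,t=[sin((1−α)ΩR)·QR,t−1+sin(αΩR)·QR,t]/sin(ΩR).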
In one embodiment, the search for eigenvectors is carried out by principal component analysis (or “PCA”) or by Karhunen-Loève transform (or “KLT”), in the time domain.
Of course, other embodiments can be considered (singular value decomposition or others).
In one embodiment, the method comprises a prior step of predicting the bit allocation budget per ambisonic channel, comprising:
This embodiment then makes it possible to manage an optimal allocation of bits to be assigned for each channel to be coded. It is advantageous in and of itself and could possibly be the object of separate protection.
The invention also relates to a method for decoding audio signals forming, over time, a succession of sample frames, in each of N channels in an ambisonic representation of order higher than 0, the method comprising:
Such an embodiment also makes it possible to improve, in decoding, a decorrelation between the N channels.
The invention also relates to an encoding device comprising a processing circuit for implementing the encoding method presented above.
It also relates to a decoding device comprising a processing circuit for implementing the above decoding method.
It also relates to a computer program comprising instructions for implementing the above method, when these instructions are executed by a processor of a processing circuit.
It also relates to a non-transitory memory medium storing the instructions of such a computer program.
Other features and advantages of the invention will be apparent from reading the exemplary embodiments presented in the detailed description below, and from examining the accompanying drawings in which:
The invention aims to enable optimized encoding by:
Adaptive matrixing allows more efficient decomposition into channels than fixed matrixing. The matrixing according to the invention advantageously makes it possible to decorrelate the channels before multi-mono encoding, so that the coding noise introduced by encoding each of the channels distorts the spatial image as little as possible overall when the channels are recombined in order to reconstruct an ambisonic signal in decoding.
In addition, the invention makes it possible to ensure a gentle adaptation of the matrixing parameters in order to avoid “click” type artifacts at the edge of the frame or too rapid fluctuations in the spatial image, or even coding artifacts due to overly-strong variations (for example linked to untimely permutation of audio sources between channels) in the various individual channels resulting from the matrixing which are then encoded by different instances of a mono codec. A multi-mono encoding is presented below preferably with variable bit allocation between channels (after adaptive matrixing), but in some variants multiple instances of a stereo core codec or other can be used.
In order to facilitate understanding of the invention, certain explanatory concepts concerning n-dimensional rotations and PCA/KLT or SVD type decompositions (“SVD” denoting a singular value decomposition) are recalled below.
Rotations and “Quaternions”
The signals are represented by successive blocks of audio samples, these blocks being called “subframes” below.
The invention uses a representation of n-dimensional rotations with parameters suitable for quantization per frame and especially an efficient interpolation by subframe. The representations of rotations used in 2, 3, and 4 dimensions are defined below.
A rotation (around the origin) is a transformation of n-dimensional space that changes one vector to another vector, such that:
A matrix M of size n×n is a rotation matrix if and only if MT.M=In where In designates the identity matrix of size n×n (i.e. M is a unitary matrix, MT designating the transpose of M) and its determinant is +1.
Several representations are used in the invention which are equivalent to the representation by rotation matrix:
In two dimensions (in a 2D plane) (n=2): We use the angle of rotation as the representation, as follows.
Given the angle of rotation θ we deduce the rotation matrix:
Given a rotation matrix, we can calculate the angle θ by observing that the trace of the matrix is 2 cos θ. Note that it is also possible to estimate θ directly from a covariance matrix before applying a principal component analysis (PCA) and eigenvalue decomposition (EVD), which are presented below.
The interpolation between two rotations of respective angles θ1 and θ2 can be done by linear interpolation between θ1 and θ2, taking into account the shortest-path constraint on the unit circle between these two angles.
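As an illustration, a minimal numpy sketch of this 2D case (the 2×2 rotation matrix introduced above, recovery of the angle from the matrix, and shortest-path interpolation of two angles); this is illustrative code, not the patent's implementation:

import numpy as np

def rot2d(theta):
    # 2x2 rotation matrix for an angle theta
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def angle_from_rot2d(M):
    # trace(M) = 2*cos(theta); the sign of sin(theta) is read from M[1, 0]
    return np.arctan2(M[1, 0], M[0, 0])

def interp_angle(theta1, theta2, alpha):
    # linear interpolation along the shortest path on the unit circle
    d = np.arctan2(np.sin(theta2 - theta1), np.cos(theta2 - theta1))
    return theta1 + alpha * d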
In three-dimensional (3D) space (n=3): Euler angles and quaternions are used as the representation. In some variants, an axis-angle representation can also be used, which is not mentioned here.
A rotation matrix of size 3×3 can be broken down into a product of 3 elementary rotations of angle θ along the x, y, or z axes.
Depending on the axis combinations, the angles are said to be Euler or Cardan angles.
However, another representation of 3D rotations is given by quaternions. Quaternions are a generalization of representations by complex numbers, with four components in the form of a number q=a+bi+cj+dk where i²=j²=k²=ijk=−1.
The real part a is called a scalar and the three imaginary parts (b, c, d) form a 3D vector. The norm of a quaternion is |q|=√(a²+b²+c²+d²). Unit quaternions (of norm 1) represent rotations; however, this representation is not unique: if q represents a rotation, −q represents the same rotation.
Given a unit quaternion q=a+bi+cj+dk (with a²+b²+c²+d²=1), the associated rotation matrix is:
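As a sketch, in a standard convention (consistent with the 3D pseudo-code given further below in the description, with w=a, x=b, y=c, z=d), this matrix can be written as:
R(q)=
[ 1−2(c²+d²)   2(bc−ad)     2(bd+ac)   ]
[ 2(bc+ad)     1−2(b²+d²)   2(cd−ab)   ]
[ 2(bd−ac)     2(cd+ab)     1−2(b²+c²) ]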
Euler angles do not allow correctly interpolating 3D rotations; to do so, we instead use quaternions or the axis-angle representation. The SLERP (“spherical linear interpolation”) interpolation method consists of interpolating according to the formula:
where 0≤α≤1 is the interpolation factor for going from q1 to q2 and Ω is the angle between the two quaternions:
Ω=arccos(q1.q2)
where q1.q2 denotes the dot product between two quaternions (identical to the dot product between two 4-dimensional vectors).
This amounts to interpolating by following a great circle on a 4D sphere with a constant angular speed as a function of α. One must ensure that the shortest path is used for interpolating, by changing the sign of one of the quaternions when q1.q2<0. Note that other methods for quaternion interpolation can be used (normalized linear interpolation or nlerp, splines, etc.).
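As an illustration, a minimal numpy sketch of this SLERP with the shortest-path handling (illustrative only, not the codec's reference implementation):

import numpy as np

def slerp(q1, q2, alpha):
    # spherical linear interpolation from q1 (alpha=0) to q2 (alpha=1), unit quaternions as 4-vectors
    q1 = q1 / np.linalg.norm(q1)
    q2 = q2 / np.linalg.norm(q2)
    dot = np.dot(q1, q2)
    if dot < 0.0:                      # shortest-path constraint: flip one quaternion
        q2, dot = -q2, -dot
    omega = np.arccos(min(dot, 1.0))   # angle between the two quaternions
    if omega < 1e-6:                   # nearly identical quaternions: linear interpolation
        q = (1.0 - alpha) * q1 + alpha * q2
    else:
        q = (np.sin((1.0 - alpha) * omega) * q1 + np.sin(alpha * omega) * q2) / np.sin(omega)
    return q / np.linalg.norm(q)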
Note that it is also possible to interpolate 3D rotations by means of the axis-angle representation; in this case, the angle is interpolated as in the 2D case, and the axis can be interpolated for example by the SLERP method (in 3D) while ensuring that the shortest path is taken on a 3D unit sphere and taking into account the fact that the representation given by the axis r and the angle θ is equivalent to that given by the axis of opposite direction −r and the angle 2π−θ.
In the 4th dimension (n=4), a rotation can be parameterized by 6 angles (n(n−1)/2) and we show that the multiplication of two matrices of size 4×4 called quaternion (Q1) and antiquaternion (Q*2) associated with quaternions q1=a+bi+cj+dk and q2=w+xi+yj+zk gives a rotation matrix of size 4×4.
It is possible to find the associated quaternion pair (q1, q2) and associated quaternion and antiquaternion matrices such that:
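As a sketch, these matrices can be taken as the left-multiplication and right-multiplication matrices of the two quaternions (the exact sign/ordering convention of the antiquaternion matrix used in the patent may differ):
Q1=
[ a  −b  −c  −d ]
[ b   a  −d   c ]
[ c   d   a  −b ]
[ d  −c   b   a ]
Q*2=
[ w  −x  −y  −z ]
[ x   w   z  −y ]
[ y  −z   w   x ]
[ z   y  −x   w ]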
Their product gives a 4×4 size matrix:
M4,quat(q1,q2)=Q1Q*2
and it is possible to verify that this matrix satisfies the properties of a rotation matrix (unitary matrix and determinant equal to 1).
Conversely, given a 4×4 rotation matrix, this matrix can be factored into a product of matrices in the form Q1Q*2, for example with the method known as “Cayley's factorization”. This involves calculating an intermediate matrix called a “tetragonal transform” (or associated matrix) and deducing the quaternions from this with some indeterminacy on the sign of the two quaternions (which can be removed by an additional “shortest path” constraint mentioned further below).
Singular Value Decomposition (or “SVD”)
Singular value decomposition (SVD) consists of factoring a real matrix A of size m×n in the form:
A=UΣVT
where U is a unitary matrix (UTU=Im) of size m×m, Σ is a rectangular diagonal matrix of size m×n with real and positive coefficients σi≥0 (i=1 . . . p where p=min (m, n)), V is a unitary matrix (VTV=In) of size n×n, and VT is the transpose of V. The σi coefficients in the diagonal of Σ are the singular values of matrix A. By convention, they are generally listed in decreasing order, and in this case the diagonal matrix Σ associated with A is unique.
The rank r of A is given by the number of non-zero coefficients σi. We can therefore rewrite the singular value decomposition as:
where Ur=[u1, u2, . . . , ur] are the singular vectors on the left (or output vectors) of A, Σr=diag(σ1, . . . , σr), and Vr=[v1, v2, . . . , vr] are the singular vectors on the right (or input vectors) of A. This matrix formulation can also be rewritten as:
If the sum is limited to an index i<r we obtain a “filtered” matrix which represents only the “predominant” information.
We can also write:
Avi=σiui
which shows that matrix A transforms vi into σi ui.
The SVD of A has a relation with the eigenvalue decomposition of AT A and A AT because:
ATA=V(ΣTΣ)VT
AAT=U(ΣΣT)UT
The eigenvalues of ΣTΣ and ΣΣT are σ1², . . . , σr². The columns of U are the eigenvectors of A AT, while the columns of V are the eigenvectors of AT A.
The SVD can be interpreted geometrically: the image of a sphere in dimension n by matrix A is, in dimension m, a hyper-ellipse having main axes in directions u1, u2, . . . , um and of length σ1, . . . , σm.
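As an illustration, a short numpy check of these relations (purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)      # rows of Vt are the right singular vectors
w, _ = np.linalg.eigh(A @ A.T)                        # eigenvalues of A·A^T (ascending order)
print(np.allclose(np.sort(s**2), w))                  # True: the eigenvalues of A·A^T are the σi²
print(np.allclose(A @ Vt[0], s[0] * U[:, 0]))         # True: A·v1 = σ1·u1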
Karhunen-Loève Transform (or “KLT”)
The Karhunen-Loève transform (KLT) of a random vector x centered at 0 and of covariance matrix Rxx=E[xxT] is defined by:
y=VTx
where V is the matrix of eigenvectors (with the convention that the eigenvectors are column vectors) obtained by decomposition of Rxx into eigenvalues
Rxx=VΛVT
where Λ=diag(λ1, . . . , λn) is a diagonal matrix whose coefficients are the eigenvalues. The matrix V=[v1, v2, . . . , vn] contains the eigenvectors (columns) of Rxx, such that
Rxxvi=λivi
We can see the KLT as a change of basis, because the product VT x expresses the vector x in the basis given by the eigenvectors.
The reverse transformation is given by:
x=Vy
KLT makes it possible to decorrelate the components of x; the variances of the transformed vector y are the eigenvalues of Rxx.
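As an illustration, a minimal numpy sketch of the KLT (illustrative only, not the codec itself): the covariance matrix is estimated, diagonalized, and the transformed components are verified to be decorrelated.

import numpy as np

rng = np.random.default_rng(1)
n, N = 4, 10000
x = rng.standard_normal((n, n)) @ rng.standard_normal((n, N))    # correlated, zero-mean observations
Rxx = (x @ x.T) / (N - 1)                  # estimated covariance matrix
lam, V = np.linalg.eigh(Rxx)               # Rxx = V diag(lam) V^T, columns of V are the eigenvectors
y = V.T @ x                                # KLT: expression of x in the eigenvector basis
Ryy = (y @ y.T) / (N - 1)
print(np.allclose(Ryy - np.diag(np.diag(Ryy)), 0.0, atol=1e-8))  # off-diagonal terms are ~0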
Principal Component Analysis (or “PCA”)
Principal Component Analysis (PCA) is a dimensionality-reduction technique that produces orthogonal variables and maximizes the variance of the variables after projection (or equivalently minimizes the reconstruction error).
The PCA presented below, although also based on a decomposition into eigenvalues such as KLT, is such that the estimated covariance matrix {circumflex over (R)}xx is calculated from N observed vectors xi, i=1 . . . N of dimension n:
assuming that these vectors are centered:
The decomposition into eigenvalues of {circumflex over (R)}xx in the form {circumflex over (R)}xx=VΛVT allows calculating the principal components: yn=VTxn.
PCA is a transformation by the matrix VT which projects the data into a new basis in order to maximize the variance of the variables after projection.
Note that the PCA can also be obtained from an SVD of the signal xi put in the form of a matrix X of size n×N. In this case, we can write:
X=UDVT
We verify that XXT=UDDTUT, which corresponds to a diagonalization of XXT. Thus the projection vectors of the PCA correspond to the column vectors of U and the projection gives UTX=DVT as the result.
One will also note that PCA is viewed in general as a dimensionality reduction technique, for “compressing” a set of data of high dimensionality into a set comprising few principal components.
In the invention, PCA advantageously makes it possible to decorrelate the multidimensional input signal, but the elimination of channels (thus reducing the number of channels) is avoided in order to avoid introducing artifacts. This forces a minimum encoding bitrate, to avoid “truncating” the spatial image, except in specific variants where eigenvalues are so low that a zero rate can be allowed (for example to better encode ambisonic sounds created artificially with a single source spatialized synthetically).
We now refer to
Step S1 consists of obtaining the respective signals of the ambisonic channels (here four channels W, Y, Z, X in the example described, using the ACN (Ambisonics Channel Number) channel ordering convention) for each frame t. These signals can be put in the form of an n×L matrix (for n ambisonic channels (here 4) and L samples per frame).
In the next step S2, the signals of these channels can optionally be pre-processed, for example by a high-pass filter as described below with reference to
In the next step S3, a principal component analysis PCA or in an equivalent manner a Karhunen-Loève transform KLT is applied to these signals, to obtain eigenvalues and a matrix of eigenvectors from a covariance matrix of the n channels. In variants of the invention, an SVD could be used.
In step S4, this matrix of eigenvectors, obtained for the current frame t, undergoes signed permutations so that it is as aligned as possible with the matrix of the same nature of the previous frame t−1. In principle, we ensure that the axis of the column vectors in the matrix of eigenvectors corresponds as much as possible to the axis of the column vectors at the same place in the matrix of the previous frame, and if not, the positions of the eigenvectors of the matrix of the current frame t which do not correspond are permuted. Then, we also ensure that the directions of the eigenvectors from one matrix to another are also coincident. In other words, initially we are only interested in the straight lines which bear the eigenvectors (just the orientation, without the direction) and for each line we seek the closest line in the matrix of the previous frame t−1. To do this, vectors are permuted in the matrix of the current frame. Then, in a second step, we try to match the orientation of the vectors (directional). To do this, we reverse the sign of the eigenvectors which would not have the right orientation.
Such an embodiment makes it possible to ensure maximum consistency between the two matrices and thus avoid audible clicks between two frames during sound playback.
In step S5, we also ensure that the matrix of eigenvectors of the current frame t, thus corrected by signed permutations, indeed represents the application of a rotation (of an angle for n=2 channels, of three Euler angles, of an axis and an angle, or of a quaternion for n=3 corresponding to the first-order planar ambisonic representation W, Y, Z, and of two quaternions for n=4 in first-order ambisonic representation of type W, Y, Z, X).
To ensure that it is indeed a rotation, the determinant of the matrix of eigenvectors of the current frame t, corrected by permutations, must be positive and equal to (or, in practice, close to) +1 in step S6. If it is equal to (or close to) −1, then one should:
We then obtain a matrix of eigenvectors for the current frame t effectively corresponding to a rotation in step S7.
Parameters of this matrix (such as the angle value, the value of an axis and of an angle, or the quaternion(s) of this matrix) can then be encoded in a number of bits allocated for this purpose in step S8. In another optional but advantageous embodiment, in the case where a significant difference is observed in step S9 (greater than a threshold for example) between the rotation matrix estimated for the current frame t and the rotation matrix of the previous frame t−1, a variable number of interpolation subframes can be determined; otherwise this number of subframes is fixed at a predetermined value. Step S10 consists of:
In step S11, the interpolated rotation matrices are applied to a matrix of size n×(L/K) representing each of the K subframes of the signals of the ambisonic channels of step S1 (or optionally S2), in order to decorrelate these signals as much as possible before the multi-mono encoding of step S14. Recall that the general approach is indeed to decorrelate these signals as much as possible before this multi-mono encoding. A bit allocation to the separate channels is done in step S12 and encoded in step S13.
In step S14, before carrying out the multiplexing of step S15 and thus ending the method for compression encoding, it is possible to decide on a number of bits to be allocated per channel as a function of the representativeness of this channel and of the available bitrate on the network RES (
Illustrated in
The encoding device DCOD comprises a processing circuit typically including:
The decoding device DDEC comprises its own processing circuit, typically including:
Of course, this
Reference is now made to
The strategy of the encoder is to decorrelate the channels of the ambisonic signal as much as possible and to encode them with a core codec. This strategy makes it possible to limit artifacts in the decoded ambisonic signal. More particularly, here we seek to apply an optimized decorrelation of the input channels before multi-mono encoding. In addition, an interpolation with limited computation cost for the encoder and decoder, because it is carried out in a specific domain (angle in 2D, quaternion in 3D, quaternion pair in 4D), makes it possible to avoid interpolating the covariance matrices calculated for the PCA/KLT analysis and repeating a decomposition into eigenvalues and eigenvectors several times per frame.
However, before discussing the core encoding performed within the meaning of the invention, some advantageous features of the encoder are presented here, in particular the optimization of the bit budget allocated for encoding as a function of perceptual criteria, seen below.
In the embodiment of the encoder described here, the latter can typically be an extension of the standardized 3GPP EVS (for “Enhanced Voice Services”) encoder. Advantageously, the EVS encoding bitrates can be used without then modifying the structure of the EVS bit stream. Thus, the multi-mono encoding (block 340 of
Of course, it is possible to add additional bitrates (to have a more detailed granularity in the allocation) by modifying the EVS codec. It is also possible to use a codec other than EVS, for example the OPUS® codec.
In general, keep in mind that the finer the granularity of the encoding, the more bits must be reserved to represent the possible combinations of bitrates. A compromise must therefore be made between fineness of allocation and the additional information describing the bit allocation. This allocation is optimized here by block 320 of
Referring to
It is assumed that the signal (in each channel) is sampled at 48 kHz, without loss of generality. The frame length is fixed at 20 ms, i.e. L=960 successive samples, without loss of generality.
Alternatively, it is possible for example to use a frame length of L=640 samples for sampling at 32 kHz.
The PCA/KLT analysis and the PCA/KLT transformation which are described below are performed in the time domain. It is thus understood that we remain here in the time domain without necessarily having to perform a sub-band transform or more generally a frequency transform.
At each frame, block 300 of the encoder applies an (optional) preprocessing to the input signal denoted Y. This may be a high-pass filtering (with a cutoff frequency typically at 20 Hz) of each new 20 ms frame of the input signal channels. This operation removes the DC component likely to bias the estimate of the covariance matrix, so that the signal output from block 300 can be considered to have a zero mean. The transfer function is denoted Hpre(z), so we have for each channel: Xi(z)=Hpre(z)Yi(z). If block 300 is not applied, we have X=Y. A high-pass pre-filtering may also be applied in block 340 when performing the multi-mono encoding; however, when block 300 is applied, this high-pass filtering in the preprocessing of the mono encoding used in block 340 is preferably disabled, to avoid repeating the same preprocessing and thus reduce the overall complexity.
The transfer function denoted Hpre(z) above can be of the type:
by applying this filter to each of the n channels of the input signal, for which the coefficients may be as shown in the table below:
Alternatively, another type of filter can be used, for example a sixth-order Butterworth filter with a frequency of 50 Hz.
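As an illustration, a possible sketch of such a preprocessing (the filter here is a generic Butterworth high-pass built with scipy, used as a stand-in, not the specific Hpre(z) coefficients referred to above):

import numpy as np
from scipy.signal import butter, sosfilt

def highpass_preprocess(Y, fs=48000, fc=20.0, order=2):
    # Y: array of shape (n_channels, L) for one frame; returns the filtered frame X
    sos = butter(order, fc, btype="highpass", fs=fs, output="sos")
    # note: in a streaming codec the filter state would be carried over between successive frames
    return np.vstack([sosfilt(sos, Y[i]) for i in range(Y.shape[0])])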
In some variants, the preprocessing could include a fixed matrixing step which could maintain the same number of channels or reduce the number of channels. An example of matrixing applied to the four channels of an ambisonic signal in B-format is given below:
Note that in this case this preprocessing will have to be reversed at decoding by applying a matrixing of the decoded signal via MA→B=MB→A−1, to find the channels in the original format.
The next block 310 estimates, at each frame t, a transformation matrix obtained by determining the eigenvectors by PCA/KLT and verifying that the transformation matrix formed by these eigenvectors indeed characterizes a rotation. Details of the operation of block 310 are given further below with reference to
Block 320 determines the optimal bitrate allocation for each channel (after PCA/KLT transformation) based on a given budget of B bits. This block looks for a distribution of the bitrate between channels by calculating a score for each possible combination of bitrates; the optimal allocation is found by looking for the combination that maximizes this score.
Several criteria can be used to define a score for each combination.
For example, the number of possible bitrates for the mono encoding of a channel can be limited to the nine discrete bitrates of the EVS codec having a super-wide audio band: 9.6; 13.2; 16.4; 24.4; 32; 48; 64; 96 and 128 kbps. However, if the codec according to the invention operates at a given bitrate associated with a budget of B bits in the current frame of index t, in general only a subset of these listed bitrates can be used. For example, if the codec bitrate is fixed at 4×13.2=52.8 kbps to represent four channels and if each channel receives a minimum budget of 9.6 kbps to guarantee a super-wide band for each of the channels, the possible combinations of bitrates for encoding separate channels must respect the constraint that the bitrate used remains lower than the available bitrate which corresponds to:
Bmultimono=B−Boverhead,
where Boverhead is the bit budget for the additional information encoded per frame (bit allocation+rotation data) as described below. For example, Boverhead can be on the order of Boverhead=55 bits per 20 ms frame (i.e. 2.75 kbps) for the case of four-channel ambisonic encoding; this includes 51 bits for encoding the rotation matrix and 4 bits (as described below) for encoding the bit allocation for the encoding of separate channels. For an overall bitrate of 4×13.2=52.8 kbps, this therefore leaves a budget of Bmultimono=50.05 kbps.
In terms of bitrates per channel, this gives the following permutations of bitrates per channel:
One can see that some combinations respecting the maximum budget limit have a much lower bitrate than others, and finally only two relevant combinations can be retained:
This makes it possible to illustrate that sixteen combinations are of particular interest and can be encoded in 4 bits (16 values). In addition, a certain number of bits remain potentially unused depending on the allocation chosen.
One can see that the encoding of the adaptive matrixing based on PCA/KLT processing and allowing flexible bit allocation can result in unused bits and, for some channels, a lower bitrate (for example 9.6 kbps) than the bitrate equally distributed among each of the channels (for example 13.2 kbps per channel).
To improve this situation, block 320 can then evaluate all possible (relevant) combinations of bitrates for the 4 channels resulting from the PCA/KLT transformation (output from block 310) and assign a score to them. This score is calculated based on:
This score can then be defined by the equation
where Ei is the energy in the current frame (of index t) of signal s(l), l=0 . . . L−1, on channel i, with:
The optimal allocation can be such that:
Alternatively, the factor Ei can be fixed at the value taken by the eigenvalue associated with the channel i resulting from decomposition into eigenvalues of the signal that is input to block 310 and after a possible signed permutation.
The MOS score Q(bi) is preferably the subjective quality score of the codec used for the multi-mono encoding in block 340 for a budget bi (in numbers of bits) per 20 ms frame corresponding to a bitrate Ri=50 bi (in bits/sec). To start with, we can use the (average) subjective MOS scores of an EVS standardized encoder given by:
Alternatively, other MOS score values for each of the listed bitrates can be derived from other tests (subjective or objective) predicting the quality of the codec. It is also possible to adapt the MOS scores used in the current frame, according to a classification of the type of signal (for example a speech signal without background noise, or speech with ambient noise, or music or mixed content), by reusing classification methods implemented by the EVS codec and by applying them to the W channel of the ambisonic input signal before performing the bit allocation. The MOS score can also correspond to a mean score resulting from different types of methodologies and rating scales: MOS (absolute) from 1 to 5, DMOS (from 1 to 5), MUSHRA (from 0 to 100).
In a variant where the EVS encoder is replaced by another codec, the list of bitrates bi and the scores Q(bi) can be replaced on the basis of this other codec. It is also possible to add additional encoding bitrates to the EVS encoder and therefore supplement the list of bitrates and MOS scores, or even to modify the EVS encoder and potentially the associated MOS scores.
In another alternative, the allocation between channels is refined by weighting the energy by a power α, where α takes a value between 0 and 1. By varying the value of α, we can thus control the influence of the energy in the allocation: the closer α is to 1, the more significant the energy is in the score, and therefore the more unequal the allocation between channels. Conversely, the closer α is to 0, the less significant the energy is and the more evenly distributed the allocation between channels. The score is therefore expressed in the form:
In another alternative, to make the allocation more stable, a second weighting can be added to the score function to penalize inter-frame bitrate changes. A penalty is added to the score if the bitrate combination is not the same in frame t as in frame t−1. The score is then expressed in the form:
where βi has a predetermined constant as its value (for example 0.1) when bt,i=bt-1,i, and βi=0 when bt,i≠bt-1,i.
This additional weighting makes it possible to limit overly-frequent fluctuations in the bitrate between channels. With this weighting, only significant changes in energy result in a change in bitrate. In addition, the value of the constant can be varied to adjust the stability of the allocation.
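As an illustration of this allocation search, the sketch below enumerates the candidate bitrate combinations and keeps the one maximizing the score; the exact score form (energy raised to the power α multiplied by a quality score, with a stability bonus), the Q() values and the candidate bitrates are assumptions made for the example, not the reference values of the codec.

from itertools import product

RATES = [9.6, 13.2, 16.4, 24.4]                        # candidate bitrates per channel (kbps)
Q = {9.6: 3.6, 13.2: 3.9, 16.4: 4.1, 24.4: 4.3}        # assumed quality scores (not the EVS MOS table)

def allocate(energies, budget_kbps, alpha=0.5, prev=None, beta=0.1):
    best, best_score = None, -1.0
    for combo in product(RATES, repeat=len(energies)):
        if sum(combo) > budget_kbps:                   # respect the available multi-mono budget
            continue
        score = 0.0
        for i, (e, r) in enumerate(zip(energies, combo)):
            bonus = beta if prev is not None and prev[i] == r else 0.0
            score += (e ** alpha) * Q[r] * (1.0 + bonus)
        if score > best_score:
            best, best_score = combo, score
    return best

# example: four channels with unequal energies and a 50.05 kbps multi-mono budget
print(allocate([10.0, 4.0, 1.0, 0.5], 50.05))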
Again with reference to
Referring again to
In frames where a part of the overall budget is not fully used, the multiplexer (block 350) can apply zero-bit stuffing to reach the bit budget allocated to the current frame, i.e. B−Σi=1nbt,iopt bits.
Alternatively, the remaining bit budget can be redistributed for encoding the transformed channels in order to use the entire available budget and if the multi-mono encoding is based on an EVS type technology, then the specified 3GPP EVS encoding algorithm can be modified to introduce additional bitrates. In this case, it is also possible to integrate these additional bitrates in the table defining the correspondence between bi and Q(bi).
A bit can also be reserved in order to be able to switch between two modes of encoding:
The choice between these two modes implies using a bit in the stream to indicate whether the current frame uses a rotation matrix restricted to the identity matrix without transmission of rotation parameters (bit=0) or if a rotation matrix is encoded (bit=1). When bit=0, it is possible in some variants to use an allocation of fixed bits to the separate channels and not transmit a bit allocation.
Reference is now made to
Alternatively, this matrix can be replaced by the correlation matrix, where the channels are pre-normalized by their respective standard deviation, or in general weights reflecting a relative importance can be applied to each of the channels; moreover, the normalization term 1/(L−1) can be omitted or replaced by another value (for example 1/L). The values Cij correspond to the covariance between xi and xj.
The encoder then performs, in block 410, a decomposition into eigenvalues (EVD for “Eigenvalue Decomposition”), by calculating the eigenvalues and the eigenvectors of the matrix C. The eigenvectors are denoted Vt here to indicate the index of frame t because the eigenvectors Vt-1 obtained in the previous frame of index t−1 are preferably stored and subsequently used. The eigenvalues are denoted λ1, λ2, . . . , λn.
Alternatively, a singular value decomposition (SVD) of the preprocessed channels X can be used. We thus obtain the singular vectors (U on the left and V on the right) and the singular values σi. In this case we can consider that the eigenvalues λi are λi=σi² and that the eigenvectors Vt are given by the n left singular vectors (the columns of U).
The encoder then applies, in block 420, a first signed permutation of the columns of the transformation matrix for frame t (in which the columns are the eigenvectors) in order to avoid too much disparity with the transformation matrix of the previous frame t−1, which would cause problems with clicks at the border with the previous frame.
Thus, once a rough draft of the transformation matrix is obtained for frame t, block 420 takes the n estimated eigenvectors Vt=[vt,1, . . . , vt,n] from the current frame of index t and the n eigenvectors Vt-1 stored from the previous frame of index t−1, and applies a signed permutation on the estimated vectors Vt so that they are as close as possible to Vt-1. Thus the eigenvectors of frame t are permuted so that the associated basis is as close as possible to the basis of frame t−1. This has the effect of improving the continuity of the frames of transformed signals (after the transformation matrix is applied to the channels).
Another constraint is that the transformation matrix must correspond to a rotation. This constraint ensures that the encoder can convert the transformation matrix into generalized Euler angles (block 430) in order to quantize them (block 440) with a predetermined bit budget as seen above. For this purpose, the determinant of this matrix must be positive (typically equal to +1).
Preferably, the optimal signed permutation is obtained in two steps:
In one embodiment, the “Hungarian” method (or “Hungarian algorithm”) is used to determine the optimal assignment which gives a permutation of the eigenvectors of frame t;
If a value on the diagonal of the inter-correlation matrix Γt is negative, this denotes a change in sign between the directions of eigenvectors. A sign inversion is then performed on the corresponding eigenvector in {tilde over (V)}t.
At the end of the two steps, the transformation matrix at frame t is designated by Vt such that at the next frame the stored matrix becomes Vt-1.
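As an illustration, a possible sketch of this two-step signed permutation, using scipy's linear_sum_assignment as the Hungarian solver (the exact correlation measure used in the codec may differ):

import numpy as np
from scipy.optimize import linear_sum_assignment

def signed_permutation(V_t, V_prev):
    # permute and sign-flip the columns (eigenvectors) of V_t to best match those of V_prev
    corr = V_prev.T @ V_t                            # inter-correlation between eigenvectors
    _, col = linear_sum_assignment(-np.abs(corr))    # Hungarian method, maximizing |corr|
    V_aligned = V_t[:, col]                          # step 1: axes (lines) matched
    diag = np.sum(V_prev * V_aligned, axis=0)        # diagonal of the new inter-correlation matrix
    signs = np.where(diag < 0.0, -1.0, 1.0)          # step 2: directions matched by sign inversion
    return V_aligned * signs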
Alternatively, the search for the optimal signed permutation can be done by calculating the change of basis matrix Vt-1−1Vt or VtVt-1−1 (in 3D or 4D) and by converting this change of basis matrix into a unit quaternion or two unit quaternions, respectively. The search then becomes a nearest neighbor search with a dictionary representing the set of possible signed permutations. For example, in the 4D case the twelve possible even permutations (out of 24 total permutations) of 4 values are associated with the following pairs of unit quaternions written as 4D vectors:
The search for the (even) optimal permutation can be done by using the above list as a dictionary of predefined quaternion pairs and by performing a nearest neighbor search against the quaternion pair associated with the change of basis matrix. An advantage of this method is the reusing of rotation parameters of the quaternion and quaternion-pair type.
The operation which is implemented in the next block 460 assumes that the transformation matrix after signed permutation is indeed a rotation matrix; the transformation matrix is necessarily unitary, but its determinant must also be equal to 1
det(Vt)=1
However, the transformation matrix resulting from blocks 410 and 420 (after EVD and signed permutations) is an orthogonal (unitary) matrix which can have a determinant of −1 or 1, meaning a reflection or rotation matrix.
If the transformation matrix is a reflection matrix (if its determinant is equal to −1), it can be modified into a rotation matrix by inverting the sign of one eigenvector (for example the eigenvector associated with the lowest eigenvalue) or by swapping two columns (eigenvectors).
Certain methods of eigenvector decomposition (for example by Givens rotation) or of singular value decomposition can lead to transformation matrices which are intrinsically rotation matrices (with a determinant of +1); in this case, the step of verifying that the determinant is +1 will be optional.
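A minimal sketch of this check and correction (choosing here, among the options mentioned above, to invert the eigenvector associated with the lowest eigenvalue):

import numpy as np

def force_rotation(V, eigenvalues):
    # turn a reflection matrix (det = -1) into a rotation matrix (det = +1)
    if np.linalg.det(V) < 0.0:
        k = int(np.argmin(eigenvalues))   # column associated with the lowest eigenvalue
        V = V.copy()
        V[:, k] = -V[:, k]                # inverting one eigenvector flips the sign of the determinant
    return V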
Block 430 converts the rotation matrix into parameters. In the preferred embodiment, an angular representation is used for the quantization (6 generalized Euler angles for the 4D case, 3 Euler angles for the 3D case, and one angle in 2D). For the ambisonic case (four channels) we obtain six generalized Euler angles according to the method described in the article “Generalization of Euler Angles to N-Dimensional Orthogonal Matrices” by David K. Hoffman, Richard C. Raffenetti, and Klaus Ruedenberg, published in the Journal of Mathematical Physics 13, 528 (1972); for the case of planar ambisonics (three channels) we obtain three Euler angles, and for the stereo case we obtain a rotation angle according to methods well known in the state of the art. The values of the angles are quantized in block 440 with a predetermined bit budget. In the preferred embodiment, a scalar quantization is used and the quantization step size is for example identical for each angle. For example, in the case of 4 channels we encode 6 generalized Euler angles with 3×(8+9)=51 bits (3 angles defined in an interval of [−π/2, π/2] encoded in 8 bits with a step size of π/256 and the 3 other angles defined in an interval of [−π, π] encoded in 9 bits with a step size of π/256). The quantization indices of the transformation matrix are sent to the multiplexer (block 350). In addition, block 440 may convert the quantized parameters into a quantized rotation matrix {circumflex over (V)}t, if the parameters used for quantization do not match the parameters used for interpolation.
Alternatively, blocks 430 and 440 can be replaced as follows:
This conversion into a pair of quaternions for the 4D case can be carried out for a rotation matrix whose coefficients are denoted R[i,j], i,j=0 . . . 3, by the following pseudo-code: Calculation of the associated matrix A[i, j] with:
A[0,0]=R[0,0]+R[1,1]+R[2,2]+R[3,3]
A[1,0]=R[1,0]−R[0,1]+R[3,2]−R[2,3]
A[2,0]=R[2,0]−R[3,1]−R[0,2]+R[1,3]
A[3,0]=R[3,0]+R[2,1]−R[1,2]−R[0,3]
A[0,1]=R[1,0]−R[0,1]−R[3,2]+R[2,3]
A[1,1]=−R[0,0]−R[1,1]+R[2,2]+R[3,3]
A[2,1]=−R[3,0]−R[2,1]−R[1,2]−R[0,3]
A[3,1]=R[2,0]−R[3,1]+R[0,2]−R[1,3]
A[0,2]=R[2,0]+R[3,1]−R[0,2]−R[1,3]
A[1,2]=R[3,0]−R[2,1]−R[1,2]+R[0,3]
A[2,2]=−R[0,0]+R[1,1]−R[2,2]+R[3,3]
A[3,2]=−R[1,0]−R[0,1]−R[3,2]−R[2,3]
A[0,3]=R[3,0]−R[2,1]+R[1,2]−R[0,3]
A[1,3]=−R[2,0]−R[3,1]−R[0,2]−R[1,3]
A[2,3]=R[1,0]+R[0,1]−R[3,2]−R[2,3]
A[3,3]=−R[0,0]+R[1,1]+R[2,2]−R[3,3]
A=A/4
Calculation of the 2 quaternions from the associated matrix
A2=square (A) # square of coefficients
q1=sqrt (A2.sum (axis=1)) # sum the rows
q2=sqrt (A2.sum (axis=0)) # sum the columns
Determination of Signs
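One possible way to complete this sign determination, assuming (as the row and column sums of squares above suggest) that the associated matrix A is, up to a global sign, the outer product of the two quaternions (A ≈ q1·q2^T), is sketched below; this is an illustration, not necessarily the exact procedure of the codec:

import numpy as np

def determine_signs(A, q1_mag, q2_mag):
    # assign signs to the magnitudes q1_mag, q2_mag using the structure A ~ q1 q2^T
    i, j = np.unravel_index(np.argmax(np.abs(A)), A.shape)   # most reliable entry of A
    # fix the global sign ambiguity by taking q1[i] > 0, then read the other signs from A
    q2 = np.sign(A[i, :]) * q2_mag                           # sign(q2[l]) = sign(A[i, l]) since q1[i] > 0
    q1 = np.sign(A[:, j]) * np.sign(A[i, j]) * q1_mag        # sign(q1[k]) = sign(A[k, j]) / sign(q2[j])
    return q1, q2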
The conversion to quaternion for the 3D case can be carried out as follows for a matrix R[i,j] i,j=0 . . . 2 of size 3×3:
Calculation of the simplified associated matrix:
q[0]=(R[0,0]+R[1,1]+R[2,2]+1)**2+(R[2,1]−R[1,2])**2+(R[0,2]−R[2,0])**2+(R[1,0]−R[0,1])**2
q[1]=(R[2,1]−R[1,2])**2+(R[0,0]−R[1,1]−R[2,2]+1)**2+(R[1,0]+R[0,1])**2+(R[2,0]+R[0,2])**2
q[2]=(R[0,2]−R[2,0])**2+(R[1,0]+R[0,1])**2+(R[1,1]−R[0,0]−R[2,2]+1)**2+(R[2,1]+R[1,2])**2
q[3]=(R[1,0]−R[0,1])**2+(R[2,0]+R[0,2])**2+(R[2,1]+R[1,2])**2+(R[2,2]−R[0,0]−R[1,1]+1)**2
For i=0 . . . 3: q[i]=sqrt(q[i])/4
Calculation of quaternion q
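The signs themselves can then be determined, for example, with the standard relations below, taking the largest-magnitude component as the (positive) reference; this is a sketch, not necessarily the exact procedure of the codec:

import numpy as np

def quaternion_from_rotation(R, q_mag):
    # q_mag = [|w|, |x|, |y|, |z|] as computed above; returns a signed unit quaternion (w, x, y, z)
    w, x, y, z = q_mag
    k = int(np.argmax(q_mag))                 # largest component, kept positive
    if k == 0:    # w largest
        x = np.copysign(x, R[2, 1] - R[1, 2])
        y = np.copysign(y, R[0, 2] - R[2, 0])
        z = np.copysign(z, R[1, 0] - R[0, 1])
    elif k == 1:  # x largest
        w = np.copysign(w, R[2, 1] - R[1, 2])
        y = np.copysign(y, R[1, 0] + R[0, 1])
        z = np.copysign(z, R[0, 2] + R[2, 0])
    elif k == 2:  # y largest
        w = np.copysign(w, R[0, 2] - R[2, 0])
        x = np.copysign(x, R[1, 0] + R[0, 1])
        z = np.copysign(z, R[2, 1] + R[1, 2])
    else:         # z largest
        w = np.copysign(w, R[1, 0] - R[0, 1])
        x = np.copysign(x, R[0, 2] + R[2, 0])
        y = np.copysign(y, R[2, 1] + R[1, 2])
    return np.array([w, x, y, z])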
For the case of a 2×2 matrix the angle is calculated according to already known methods of the state of the art.
In some variants, the unit quaternions q1, q2 (4D case) and q (3D case) can be converted into axis-angle representations known in the state of the art.
We now describe block 460 for interpolation of the rotation matrices between two successive frames. It smooths out discontinuities in the channels after these matrices are applied. Typically, if two sets of angles or quaternions differ too much from a previous frame t−1 to the next frame t, audible clicks are a concern unless a smoothed transition is applied, in subframes, between these two frames. A transitional interpolation is therefore carried out between the rotation matrix calculated for frame t−1 and the rotation matrix calculated for frame t. The encoder interpolates, in block 460, the (quantized) representation of the rotation between the current frame and the previous frame in order to avoid excessively rapid fluctuations of the various channels after transformation. The number of interpolations can be fixed (equal to a predetermined value) or adaptive. Each frame is then divided into subframes as a function of the number of interpolations determined in block 450. If an adaptive interpolation is used, block 450 encodes, in a chosen number of bits, the number of interpolations to be performed and therefore the number of subframes to be provided; in the case of a fixed interpolation, no information has to be encoded.
Next, block 460 converts the rotation matrices to a specific domain representing a rotation matrix. The frame is divided into subframes, and in the chosen domain the interpolation is carried out for each subframe.
For a first-order ambisonic input signal (with 4 channels W, X, Y, Z), in block 460, the encoder reconstructs a quantized 4D rotation matrix from the 6 quantized Euler angles and this is then converted to two unit quaternions for interpolation purposes. In a variant where the input to the encoder is a planar ambisonic signal (3 channels W, X, Y), in block 460 the encoder reconstructs a quantized 3D rotation matrix from the 3 quantized Euler angles and this is then converted to a unit quaternion for interpolation purposes. In a variant where the encoder input is a stereo signal, the encoder uses, in block 460, the representation of the 2D rotation quantized with a rotation angle.
In the embodiment with 4 channels, for interpolation of the rotation matrix between frame t and frame t−1, the rotation matrix calculated for frame t is factored into two quaternions (a quaternion pair) by means of Cayley's factorization and we use the quaternion pair stored for the previous frame t−1 and denoted (QL,t−1, QR,t−1).
The quaternions are interpolated two by two (as a pair) in each subframe.
For the left quaternion (QL,t), the block determines the shortest path between the two possible candidates (QL,t or −QL,t). Depending on the case, the sign of the quaternion of the current frame is inverted. Then the interpolation is calculated for the left quaternion using spherical linear interpolation (SLERP):
where α corresponds to the interpolation factor (α=1/K, 2/K, . . . 1), and ΩL=arccos(QL,t−1·QL,t). For the right quaternion (QR,t), if there was an inversion for the left quaternion then we must maintain parity and force the sign of the right quaternion. This sign constraint is hereinafter referred to as the “joint shortest-path constraint”. Then the interpolation is calculated similarly to the left quaternion:
where α corresponds to the interpolation factor (α=1/K, 2/K, . . . 1) and ΩR=arccos(QR,t−1·QR,t)
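As an illustration, a sketch of this per-subframe interpolation with the joint shortest-path constraint (illustrative only; the plain SLERP helper deliberately performs no sign handling of its own, since the sign is decided jointly on the pair):

import numpy as np

def _slerp(q1, q2, alpha):
    # plain SLERP without sign handling
    dot = np.clip(np.dot(q1, q2), -1.0, 1.0)
    omega = np.arccos(dot)
    if omega < 1e-6:
        q = (1.0 - alpha) * q1 + alpha * q2
    else:
        q = (np.sin((1.0 - alpha) * omega) * q1 + np.sin(alpha * omega) * q2) / np.sin(omega)
    return q / np.linalg.norm(q)

def interpolate_pair(QL_prev, QR_prev, QL, QR, K):
    # K interpolated quaternion pairs, one per subframe, alpha = 1/K, 2/K, ..., 1
    if np.dot(QL_prev, QL) < 0.0:      # shortest path decided on the left quaternion...
        QL, QR = -QL, -QR              # ...and the same sign flip is forced on the right one (parity)
    return [(_slerp(QL_prev, QL, k / K), _slerp(QR_prev, QR, k / K)) for k in range(1, K + 1)]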
Once the interpolation has been calculated for the two quaternions, the rotation matrix of dimension 4×4 is calculated (respectively 3×3 for planar ambisonics or 2×2 for the stereo case). This conversion into a rotation matrix can be carried out according to the following pseudo-code: 4D case: for a quaternion pair
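For the 4D case, a possible sketch consistent with the quaternion/antiquaternion product described earlier is to multiply the left-multiplication matrix of the left quaternion by the right-multiplication matrix of the right quaternion (the exact sign/conjugation convention of the codec may differ):

import numpy as np

def rotation4d_from_pair(qL, qR):
    # 4x4 rotation matrix representing p -> qL * p * qR for unit quaternions (a, b, c, d) and (w, x, y, z)
    a, b, c, d = qL
    L = np.array([[a, -b, -c, -d],
                  [b,  a, -d,  c],
                  [c,  d,  a, -b],
                  [d, -c,  b,  a]])    # left multiplication by qL (quaternion matrix)
    w, x, y, z = qR
    R = np.array([[w, -x, -y, -z],
                  [x,  w,  z, -y],
                  [y, -z,  w,  x],
                  [z,  y, -x,  w]])    # right multiplication by qR (antiquaternion matrix)
    return L @ R                       # unitary matrix with determinant +1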
3D case: for quaternion q=(w, x, y, z) we obtain the matrix M[i,j], i,j=0 . . . 2, of size 3×3
xy=2*x*y
xz=2*x*z
yz=2*y*z
wx=2*w*x
wy=2*w*y
wz=2*w*z
xx=2*x*x
yy=2*y*y
zz=2*z*z
M[0][0]=1−(yy+zz)
M[0][1]=(xy−wz)
M[0][2]=(xz+wy)
M[1][0]=(xy+wz)
M[1][1]=1−(xx+zz)
M[1][2]=(yz−wx)
M[2][0]=(xz−wy)
M[2][1]=(yz+wx)
M[2][2]=1−(xx+yy);
Finally, the matrices Vtinterp(α) (or their transposes) computed per subframe in the interpolation block 460 are then used in the transformation block 470 which produces n channels transformed by applying the rotation matrices thus found to the ambisonic channels that have been preprocessed by block 300.
Below, we return to the number K of subframes to be determined in block 450 for the case where this number is adaptive. The final difference between the current frame and the previous frame is measured, or determined directly from the angular difference of the parameters describing the rotation matrix. In the latter case, we want to ensure that the angular variation between successive subframes is not perceptible. The implementation of an adaptive number of subframes is especially advantageous for reducing the average complexity of the codec, but if reducing the complexity is chosen, it may be preferable to use an interpolation with a fixed number of subframes.
The final difference between the corrected rotation matrix of frame t and the rotation matrix of frame t−1 gives a measure of the magnitude of the difference in channel matrixing between the two frames. The larger this difference, the greater the number of subframes for the interpolation done in block 460. To measure this difference, we use the sum of the absolute value of the inter-correlation matrix between the transformation matrix of the current frame and the previous frame, as follows:
δt=∥In−corr(Vt,Vt−1)∥
where In is the identity matrix, Vt the eigenvectors of the frame of index t, and ∥M∥ is a norm of matrix M which corresponds here to the sum of the absolute values of all the coefficients. Other matrix norms can be used (for example the Frobenius norm).
If the two matrices are identical then this difference is equal to 0. The more the matrices are dissimilar, the greater the value of the difference δt. Predetermined thresholds can be applied to δt, each threshold being associated with a predefined number of interpolations, for example according to the following decision logic:
Thresholds: {4.0, 5.0, 6.0, 7.0}
Number K of subframes for interpolation: {10, 48, 96, 192}
Thus, two bits are sufficient to encode the four possible values giving the number of subdivisions (subframes).
The number K of interpolations determined by block 450 is then sent to the interpolation module 460, and in the adaptive case the number of subframes is encoded in the form of a binary index which is sent to the multiplexer (block 350).
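As an illustration, a sketch of this decision logic (the reading of corr(·,·) as Vt−1^T·Vt and the mapping from thresholds to subframe counts are assumptions made for the example):

import numpy as np

def frame_difference(V_t, V_prev):
    # delta_t = || I_n - corr(V_t, V_{t-1}) ||, sum of the absolute values of the coefficients
    n = V_t.shape[0]
    return np.abs(np.eye(n) - V_prev.T @ V_t).sum()

def num_subframes(delta, thresholds=(4.0, 5.0, 6.0, 7.0), counts=(10, 48, 96, 192)):
    # assumed mapping: the i-th count is used once delta reaches the i-th threshold,
    # the smallest count otherwise (four possible values, hence 2 bits)
    K = counts[0]
    for th, k in zip(thresholds, counts):
        if delta >= th:
            K = k
    return K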
The implementation of interpolation enables ultimately applying an optimization of the decorrelation of the input channels before multi-mono encoding. Indeed, the rotation matrices respectively calculated for a previous frame t−1 and a current frame t can be very different due to this search for decorrelation, but even so, interpolation makes it possible to smooth this difference. The interpolation used only requires a limited computing cost for the encoder and decoder since it is performed in a specific domain (angle in 2D, quaternion in 3D, quaternion pair in 4D). This approach is more advantageous than interpolating covariance matrices calculated for the PCA/KLT analysis and repeating an EVD type of eigenvalue decomposition several times per frame.
Block 470 then performs matrixing of the ambisonic channels per subframe, using the transformation matrices calculated in block 460. This matrixing amounts to calculating Vtinterp(α)TX(α) per subframe, where X(α) corresponds to sub-blocks of size n×(L/K) for α=1/K, 2/K, . . . 1. The signal contained in these channels is then sent to block 340 for multi-mono encoding.
Reference is now made to
After the demultiplexing of the bit stream for the current frame t by block 500, the allocation information is decoded (block 510) which makes it possible to demultiplex and decode (block 520) the bit stream(s) received for each of the n transformed channels.
Block 520 calls multiple instances of the core decoding, executed separately. The core decoding can be of the EVS type, optionally modified to improve its performance. Using a multi-mono approach, each channel is decoded separately. If the encoding previously used is stereo or multichannel encoding, the multi-mono approach can be replaced with multi-stereo or multi-channel for decoding. The channels thus decoded are sent to block 530 which decodes the rotation matrix for the current frame and optionally the number K of subframes to be used for interpolation (if the interpolation is adaptive). For each matrix, the interpolation block 460 divides the frame into subframes, for which the number K can be read in the stream encoded by block 610 (
Block 530 performs the matrixing to reverse that of block 470 in order to reconstruct a decoded signal, as detailed below with reference to
Block 530 generally performs the decoding and the PCA/KLT synthesis that is the inverse of the analysis performed by block 310 of
Block 620 performs the inverse matrixing of the ambisonic channels per subframe, using the inverses (in practice the transposes) of the transformation matrices calculated in block 460.
Thus, the invention uses an approach entirely different from that of the MPEG-H codec with overlap-add: it is based on a specific representation of transformation matrices which are restricted to rotation matrices from one frame to another, in the time domain, enabling in particular an interpolation of the transformation matrices, with a mapping which ensures directional consistency (including taking the direction into account through the sign).
The general approach of the invention is an encoding of ambisonic sounds in the time domain by PCA, in particular with PCA transformation matrices forced to be rotation matrices and interpolated by subframes in an optimized manner (in particular in the domain of quaternions/pairs of quaternions) in order to improve quality. The interpolation step size is either fixed or adaptive depending on a criterion of the difference between an inter-correlation matrix and a reference matrix (identity) or between matrices to be interpolated. The quantization of rotation matrices can be implemented in the domain of generalized Euler angles. However, preferably it may be chosen to quantize matrices of dimension 3 and 4 in the domain of quaternions and quaternion pairs (respectively), which makes it possible to remain in the same domain for quantization and interpolation.
In addition, an alignment of eigenvectors is used to avoid the problems of clicks and channel inversion from one frame to another.
Of course, the invention is not limited to the embodiments described above as examples, and extends to other variants.
The above description thus discussed cases of four channels.
However, in some variants, it is also possible to encode a number of channels greater than four.
The implementation remains identical (in terms of functional blocks) to the case of n=4, but the interpolation by quaternion pair is replaced by the general method below.
The transformation matrices at frames t−1 and t are denoted Vt−1 and Vt. The interpolation can be performed with a factor α between Vt−1 and Vt such that:
Vtinterp(α)=Vt−1(Vt−1TVt)α
The term (Vt−1TVt)α can be calculated directly by eigenvalue decomposition of Vt−1TVt. Indeed, if Vt−1TVt=QLQT, we have: (Vt−1TVt)α=QLαQT.
Note that this variant could also replace the interpolation by pair of unit quaternions (4D case), unit quaternion (3D case), or angle (2D case); however, this would be less advantageous because it would require an additional diagonalization step and power calculations, whereas the embodiment described above is more efficient for these cases of 2, 3, or 4 channels.
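As an illustration, a sketch of this general interpolation using a matrix logarithm/exponential, which is equivalent to raising Vt−1^T·Vt to the power α (illustrative only):

import numpy as np
from scipy.linalg import expm, logm

def interp_rotation(V_prev, V_t, alpha):
    # interpolate between two n x n rotation matrices: V_prev (alpha=0) to V_t (alpha=1)
    D = V_prev.T @ V_t                                 # relative rotation between the two frames
    return V_prev @ np.real(expm(alpha * logm(D)))     # (V_prev^T V_t)^alpha via log/exp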
Although the present disclosure has been described with reference to one or more examples, workers skilled in the art will recognize that changes may be made in form and detail without departing from the scope of the disclosure and/or the appended claims.
Foreign Application Priority Data
Number | Date | Country | Kind
19305254 | Mar 2019 | EP | regional

PCT Information
Filing Document | Filing Date | Country | Kind
PCT/EP2020/053264 | 2/10/2020 | WO |

Publishing Document | Publishing Date | Country | Kind
WO2020/177981 | 9/10/2020 | WO | A

U.S. Patent Application Publications Cited
Number | Name | Date | Kind
20140358565 | Peters | Dec 2014 | A1
20160155448 | Purnhagen | Jun 2016 | A1

Other References
English translation of the Written Opinion of the International Searching Authority dated Apr. 17, 2020 for corresponding International Application No. PCT/EP2020/053264, filed Feb. 10, 2020.
International Search Report dated Apr. 7, 2020 for corresponding International Application No. PCT/EP2020/053264, filed Feb. 10, 2020.
Written Opinion of the International Searching Authority dated Apr. 7, 2020 for corresponding International Application No. PCT/EP2020/053264, filed Feb. 10, 2020.
Roumen Kountchev et al., “New method for adaptive Karhunen-Loeve color transform”, Telecommunications in Modern Satellite, Cable, and Broadcasting Services (TELSIKS '09), 9th International Conference on, IEEE, Piscataway, NJ, USA, Oct. 7, 2009, pp. 209-216, XP031573422.

Prior Publication Data
Number | Date | Country
20220148607 A1 | May 2022 | US