The present disclosure relates to the encoding/decoding of spatial sound data, in particular within the ambiophonics context (hereafter also referred to as “ambisonics”).
The encoders/decoders (hereafter called “codecs”) that are currently used in mobile telephony are mono (a single signal channel for rendering on a single loudspeaker). The 3GPP EVS (“Enhanced Voice Services”) codec allows “Super-HD” quality (also called “High Definition Plus” or HD+ voice) to be provided with a super-wideband (SWB) audio band for signals sampled at 32 or 48 kHz or with a full band (FB) for signals sampled at 48 kHz; the audio bandwidth ranges from 14.4 to 16 kHz in SWB mode (from 9.6 to 128 Kbit/s) and is 20 kHz in FB mode (from 16.4 to 128 Kbit/s).
The next evolution of quality in the conversational services offered by operators is expected to consist of immersive services, using terminals such as smartphones equipped with several microphones, spatial audio conference or video conference equipment of the telepresence or 360° video type, or even equipment for sharing “live” audio content, with 3D spatial sound rendering that is even more immersive than simple 2D stereo rendering. With the increasingly widespread use of headphone listening on mobile phones and the emergence of advanced audio equipment (accessories such as a 3D microphone, voice assistants with acoustic antennas, virtual reality headsets, etc.), the capture and rendering of spatial sound scenes are now widespread enough to provide an immersive communication experience.
In this respect, the future 3GPP “IVAS” (Immersive Voice and Audio Services) standard proposes extending the EVS codec to Immersive audio by accepting, as the input format of the codec, at least the spatial sound formats listed below (and the combinations thereof):
The following description focuses, by way of an embodiment, on encoding sound in the ambisonic format (at least some aspects presented hereafter with respect to the invention can also be applied to formats other than the ambisonic format).
Ambisonics is a method for recording (“encoding” in the acoustic sense) spatial sound and a reproduction system (“decoding” in the acoustic sense). An ambisonic microphone (first order) comprises at least four capsules (typically of the cardioid or sub-cardioid type) arranged on a spherical grid, for example, the vertices of a regular tetrahedron. The audio channels associated with these capsules are referred to as “A-format” channels. This format is converted into a “B-format”, in which the sound field is broken down into four components (spherical harmonics) denoted W, X, Y, Z, which correspond to four coincident virtual microphones. The component W corresponds to an omnidirectional pickup of the sound field, while the more directional components X, Y and Z are similar to pressure-gradient microphones oriented along the three orthogonal axes of the space. An ambisonic system is a flexible system in the sense that recording and rendering are separated and decoupled. It allows decoding (in the acoustic sense) on any configuration of loudspeakers (for example, binaural, 5.1 type “surround” sound or 7.1.4 type periphonic sound (with elevation)). The ambisonic approach can be generalized to more than four B-format channels and this generalized representation is commonly referred to as “HOA” (Higher-Order Ambisonics). Breaking down the sound over more spherical harmonics improves the spatial rendering accuracy when rendering on loudspeakers.
An M-order ambisonic signal includes $K=(M+1)^2$ components and, for the first order ($M=1$), the four components W, X, Y, and Z are found, commonly called FOA (First-Order Ambisonics). There is also a variant of ambisonics (W, X, Y), called “planar” ambisonics, that breaks down the sound defined in a plane, which is generally the horizontal plane (where Z=0). In this case, the number of components is $K=2M+1$ channels. First-order ambisonics (4 channels: W, X, Y, Z), first-order planar ambisonics (3 channels: W, X, Y), as well as higher-order ambisonics are all equally referred to hereafter as “ambisonics” to facilitate reading, with the described processes being applicable independently of the planar or non-planar type and of the number of ambisonic components. If, however, in some passages a distinction needs to be made, the terms “first-order ambisonics” and “first-order planar ambisonics” are used.
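As a quick illustration of the two channel-count formulas above, the following minimal sketch computes K for the full and planar cases. The function name `ambisonic_channel_count` is hypothetical and only serves this example.

```python
def ambisonic_channel_count(order: int, planar: bool = False) -> int:
    """Return the number of ambisonic components K for a given order M.

    Full (3D) ambisonics: K = (M + 1)^2.
    Planar ambisonics:    K = 2M + 1.
    """
    if order < 0:
        raise ValueError("the ambisonic order must be non-negative")
    return 2 * order + 1 if planar else (order + 1) ** 2


# First order: 4 channels (W, X, Y, Z) or 3 channels in the planar case (W, X, Y).
print(ambisonic_channel_count(1), ambisonic_channel_count(1, planar=True))
```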
Hereafter, a B-format signal will be called “ambisonic signal” with a predetermined order with a certain number of ambisonic components. In variants, the ambisonic signal can be defined in another format, such as the A-format or channels pre-combined by fixed matrixing.
The signals to be processed by the encoder/decoder are in the form of series of blocks of sound samples, called “frames” or “sub-frames” hereafter. Furthermore, hereafter, the mathematical notations are in accordance with the following convention:
The simplest approach for encoding an ambisonic signal involves using a mono encoder (for example, EVS) and simultaneously applying this mono encoder to all the channels, optionally with a different allocation of the bits as a function of each input channel. This approach is called “multi-mono” approach herein. The multi-mono approach can be extended to multi-stereo encoding (where pairs of channels are encoded separately by a stereo codec) or, more generally, to the use of several parallel instances of the same core codec.
In multi-mono encoding, the input signal is divided into (mono) channels that are encoded individually. After decoding, the channels are recombined. The associated quality varies according to the mono encoding that is used, and it is generally only satisfactory at a very high rate, for example, with a rate of at least 48 Kbit/s per mono channel for EVS encoding. Thus, for the first order, a minimum rate of 4×48=192 Kbit/s is required.
Since the multi-mono encoding approach does not take into account the correlation between channels, at a low rate it produces spatial deformations with the addition of various artefacts such as the appearance of phantom sound sources, diffuse noises or movements of the trajectories of sound sources. Thus, encoding an ambisonic signal according to this approach leads to degradations of the spatialization.
Various more advanced solutions have been proposed for encoding ambisonic signals. A particular approach to ambisonic encoding is of interest in the invention, using the quantization and interpolation of rotation matrices, as described, for example, in patent application WO 2020/177981.
In this approach, 4×4 rotation matrices (derived from a PCA/KLT analysis as described, for example, in the aforementioned patent application) are converted, for example, into 6 generalized Euler angles, which are encoded by uniform scalar quantization; an inverse conversion is then applied in order to obtain decoded rotation matrices, and an interpolation is applied per sub-frame in the quaternion domain. By way of a reminder, a method for converting a rotation matrix into generalized Euler angles is provided in the article entitled “Generalization of Euler angles to N-Dimensional Orthogonal Matrices” by David K. Hoffman, Richard C. Raffenetti, and Klaus Ruedenberg, published in the Journal of Mathematical Physics 13, 528 (1972).
The strategy of this type of ambisonic encoding is to de-correlate the channels of the ambisonic signal as much as possible and to then encode them separately with a core codec (for example, multi-mono). This strategy allows the artefacts in the decoded ambisonic signal to be limited.
More specifically, an optimized decorrelation of the input signals is applied before encoding (for example, multi-mono). Moreover, the quaternion domain allows the transformation matrices computed for the PCA/KLT analysis to be interpolated rather than repeating a decomposition into eigenvalues and eigenvectors several times per frame; since the transformation matrices are rotation matrices, the inverse matrixing operation at the decoder is carried out simply by transposing the matrix applied at the encoder.
The quantization indices of the quantization parameters of the rotation matrix in the current frame are decoded in the block 200. The conversion and interpolation steps (blocks 242, 243, 260, 262) of the decoder are identical to those carried out on the encoder (blocks 142, 143, 160 and 162). If the number of interpolation sub-frames is adaptive, this is decoded (block 210), otherwise, this number of interpolation sub-frames is set to a predetermined value.
The block 220 applies, per sub-frame, the inverse matrixing originating from the block 262 to the decoded signals of the ambisonic channels; by way of a reminder, the inverse of a rotation matrix is its transpose.
In the aforementioned patent application, the quantization of the 3×3 or 4×4 rotation matrices is preferably carried out in the domain of Euler angles (3×3 case) or generalized Euler angles (4×4 case) and the interpolation is carried out in the domain of quaternions. This involves multiple conversions between the matrix and various parameters, and therefore increased complexity, since two different types of parameters are used for the quantization and the interpolation. Moreover, the conversion to Euler angles, in particular to generalized Euler angles according to the method described in the article by Hoffman et al., can raise certain issues in practice. It can be numerically “unstable”, in the sense that the combination of the direct and inverse conversions (from the matrix to Euler angles and back) may not exactly restore the original matrix, even in the absence of quantization of the angles. In addition, the quantization can induce issues such as “gimbal lock”, i.e., the loss of a degree of freedom that occurs when the axes of two of the three gimbals required for applying or compensating the rotations in three-dimensional space are aligned along the same direction. In such cases, PCA/KLT decorrelation is no longer optimal.
It would be more advantageous to use only one type of parameter both for encoding the rotation matrices and for interpolating them. To this end, the rotation matrix can be converted into the quaternion domain, and the parameters resulting from this conversion can be quantized in place of parameters such as the Euler angles (which may or may not be generalized). However, no effective method is found in the literature for encoding a quaternion or a dual quaternion with the constraint of representing a rotation matrix with a precision similar to that of the quantization of Euler angles (which may or may not be generalized) and with a given bit budget.
Therefore, a requirement exists for optimizing this quantization of parameters in terms of rate and/or complexity and/or storage of information.
An exemplary aspect of the present disclosure relates to a method for encoding a multichannel audio signal, comprising forming a transformation matrix in the form of a rotation matrix to be applied to the input signals, quantizing the rotation matrix and encoding the transformed signals after applying the rotation matrix, wherein quantizing the rotation matrix comprises the following operations:
The quantization of the quaternions for encoding the rotation matrix allows multiple conversions to be avoided since the quaternion domain is also used to interpolate the rotation matrix before applying this matrix to the multichannel signal.
This quantization is further optimized to restrict the rate to be used by forcing one of the parameters of a quaternion to be positive and thus to encode only the relevant positive quaternion, with the negative quaternion corresponding to the same rotation. The conversion into spherical coordinates and the quantization of these spherical coordinates allows a quantization method to be used that does not require the use of onerous dictionaries both in terms of memory space and of processing capacity. Quantization over half an interval also allows a saving to be provided in terms of the rate.
In a particular embodiment, the positive component of said first quaternion is its real component.
In a simple manner, the real component (a1) of the first quaternion is selected by convention.
In one embodiment, the rotation matrix is converted into a dual quaternion, a first quaternion for which a component is forced to be positive and a second quaternion.
According to one embodiment, the quantization of the first quaternion uses one bit less than the quantization of the second quaternion. The rate is thus optimized.
In one embodiment, converting each of the two quaternions of the dual quaternion into spherical coordinates yields three angles, and the quantization of the angle associated with the positive component of the first quaternion is carried out at a half-length interval relative to the interval used to quantize the same component in the second quaternion.
Obtaining these angles for each quaternion yields parameters that are less complex to quantize. Indeed, quantizing the angles thus acquired rather than quantizing the quaternion in question is less complex since it does not necessarily require the use of onerous dictionaries both in terms of memory space and of processing capacity.
Taking into account the positive component of the first quaternion allows a quantization to be carried out over a restricted interval on this quaternion, which minimizes the rate to be allocated for the quantization of this quaternion.
In a particular embodiment of this embodiment, the quantization of the six acquired angles is carried out by uniform scalar quantization.
This quantization method is simple and not very complex.
In another particular embodiment, the quantization of the six acquired angles is carried out by vector quantization with a hyper-rectangular support.
This quantization method is another simple and not very complex alternative.
In a particular embodiment, a binary indication is also encoded to indicate whether the at least one first quaternion assumes default values.
The default values of the quaternions typically can be such that q=(1, 0, 0, 0), which indicates that the transformation matrix is an identity matrix. In this case, if the binary indication indicates that the quaternions assume these default values, this indicates that the transformation is deactivated for the current frame.
An aspect also relates to a method for decoding a multichannel audio signal, comprising receiving encoded signals originating from a multichannel signal and further comprising the following operations:
Thus, the decoder can receive and decode a set of quaternions that allows a rotation matrix to be constructed that is useful for decoding the multichannel signal.
Acquiring a positivity index of a component of at least one quaternion allows suitable decoding to be applied, by decoding only the positive quaternion in order to deduce the negative quaternion therefrom.
This set of quaternions can also be used to interpolate the acquired rotation matrix, without having to carry out other conversions of this matrix, in order to obtain an interpolated matrix applicable to the signals of the multichannel signal.
This set of quaternions can be decoded with less complexity, in particular when the encoded parameters are angles derived from a dual quaternion.
An inverse scalar quantization method can be implemented in this case, for example.
An aspect also relates to an encoding device comprising a processing circuit for implementing the encoding method as described above. An aspect also relates to a decoding device comprising a processing circuit for implementing the decoding method as described above. An aspect relates to a computer program comprising instructions for implementing the encoding or decoding methods as described above, when they are executed by a processor.
Finally, an aspect relates to a processor-readable storage medium storing a computer program comprising instructions for executing the encoding or decoding methods described above.
Further features and advantages of aspects of the disclosure will become more clearly apparent upon reading the following description of particular embodiments, which are provided by way of simple illustrative and non-limiting examples, and from the accompanying drawings, in which:
A more detailed description of these blocks can be found in the aforementioned patent application. The block for encoding the transformed signals (block 380), which can be multi-mono encoding or any other type of multichannel coding, has been explicitly added herein, as has the multiplexing block (block 390), which forms the bit stream or the payload of an encoded data packet. The difference from
By way of a reminder, some explanatory concepts relating to the rotations in dimension n and their conversion into quaternions in the 3×3 case and dual quaternions in the 4×4 case are provided in this case. As described in the aforementioned patent application, the encoding method uses a representation of the rotations in dimension n=3 or 4, with parameters suitable for a quantization per frame and an efficient interpolation per sub-frame. Some representations of rotations are defined hereafter, which can be used in 3 and 4 dimensions, and the focus is on the 4 dimensions in the preferred embodiment.
A rotation (around the origin) is a transformation of the space in dimension n that changes one vector into another vector, such that:
A matrix $M$ of size $n \times n$ is a rotation matrix if and only if $M^T M = I_n$, where $I_n$ denotes the identity matrix of size $n \times n$ and $M^T$ designates the transpose of $M$ (i.e., $M$ is an orthogonal matrix), and its determinant is equal to +1. Incidentally, it should be noted that the inverse of $M$ is its transpose.
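The two conditions above (orthogonality and a determinant equal to +1) can be checked numerically. The following is a minimal sketch in pure Python; the helper names and the tolerance `tol` are assumptions made for this example, and the cofactor-expansion determinant is only meant for the small sizes (n=3 or 4) considered here.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def determinant(m):
    """Determinant by cofactor expansion along the first row (fine for n <= 4)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [list(row[:j]) + list(row[j + 1:]) for row in m[1:]]
        total += (-1) ** j * m[0][j] * determinant(minor)
    return total


def is_rotation_matrix(m, tol=1e-9):
    """True if M^T M is the identity and det(M) is +1, within a tolerance."""
    n = len(m)
    product = matmul(transpose(m), m)
    orthogonal = all(abs(product[i][j] - (1.0 if i == j else 0.0)) <= tol
                     for i in range(n) for j in range(n))
    return orthogonal and abs(determinant(m) - 1.0) <= tol
```

A reflection such as diag(1, 1, −1) is orthogonal but has a determinant of −1, so it is rejected by this check.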
There are several representations equivalent to the representation per rotation matrix.
In the three-dimensional (3D) space (n=3): the Euler angles, the (unit) quaternions, or even an axis-angle representation are often used to represent a 3D rotation, with the axis-angle representation not being described herein. The representation by 3 Euler angles is derived from the fact that a 3×3 rotation matrix can be broken down into a product of 3 elementary rotation matrices; the elementary rotation matrices with the angle θ about the axes x, y, or z are provided below:
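In one common textbook convention (right-handed axes and counter-clockwise positive angles, assumed here for illustration; the sign convention of the original formulas may differ), these elementary matrices read:

```latex
R_x(\theta) =
\begin{pmatrix}
1 & 0 & 0 \\
0 & \cos\theta & -\sin\theta \\
0 & \sin\theta & \cos\theta
\end{pmatrix},
\quad
R_y(\theta) =
\begin{pmatrix}
\cos\theta & 0 & \sin\theta \\
0 & 1 & 0 \\
-\sin\theta & 0 & \cos\theta
\end{pmatrix},
\quad
R_z(\theta) =
\begin{pmatrix}
\cos\theta & -\sin\theta & 0 \\
\sin\theta & \cos\theta & 0 \\
0 & 0 & 1
\end{pmatrix}
```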
Depending on the combinations of axes and depending on whether the axes are defined as absolute or relative, the angles are said to be Euler or Cardan angles.
A 3D rotation can also be represented by a quaternion. Quaternions are a generalization of the complex numbers with four components, in the form of a number $q=a+bi+cj+dk$ where $i^2=j^2=k^2=ijk=-1$.
The real part $a$ is called the scalar part and the three imaginary parts $(b, c, d)$ form a 3D vector. The norm of a quaternion is $|q|=\sqrt{a^2+b^2+c^2+d^2}$. The unit quaternions (norm 1) represent the rotations; however, this representation is not unique; thus, if $q$ represents a rotation, $-q$ represents the same rotation.
Hereafter, the term quaternion is to be understood in the sense of a unit quaternion, and the qualifier “unit” is not systematically used, except by way of timely reminders, for the sake of conciseness.
It should be noted that, in the literature, the quaternion q=a+bi+cj+dk is generally considered to be a 4-dimensional vector (a, b, c, d); sometimes, the real part is permutated as (b, c, d, a); subsequently, with no loss of generality, the convention is assumed on the order of the elements in the form of (a, b, c, d). Hereafter, each of the elements a, b, c, d will be referred to as a component of a quaternion. Thus, a is also referred to hereafter as the real component of q.
Considering a unit quaternion $q=a+bi+cj+dk$ (with $a^2+b^2+c^2+d^2=1$), the associated 3D rotation matrix is:
Conversely, for a 3×3 rotation matrix, an associated quaternion (to the nearest sign) can be determined; indeed, q and −q represent the same matrix M3,quat(q)=M3,quat(−q). A method for converting M3,quat(q) to ±q is described below with respect to block 330.
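The sign ambiguity can be illustrated with the widely used formula for the rotation matrix of a unit quaternion. This is a minimal sketch: the function name `quaternion_to_matrix3` is hypothetical, and the formula is the standard one for the rotation v → q v q⁻¹ with the (a, b, c, d) ordering, which may differ from the convention of the original formula by a transposition.

```python
def quaternion_to_matrix3(q):
    """3x3 rotation matrix of a unit quaternion q = (a, b, c, d).

    Every entry is quadratic in the components of q, so q and -q
    produce exactly the same matrix.
    """
    a, b, c, d = q
    return [
        [1 - 2 * (c * c + d * d), 2 * (b * c - a * d), 2 * (b * d + a * c)],
        [2 * (b * c + a * d), 1 - 2 * (b * b + d * d), 2 * (c * d - a * b)],
        [2 * (b * d - a * c), 2 * (c * d + a * b), 1 - 2 * (b * b + c * c)],
    ]


# Rotation by 90 degrees about the z axis, and the opposite quaternion.
h = 0.5 ** 0.5
q = (h, 0.0, 0.0, h)
minus_q = tuple(-x for x in q)
same = quaternion_to_matrix3(q) == quaternion_to_matrix3(minus_q)
```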
The Euler angles do not allow 3D rotations to be correctly interpolated; to this end, the quaternions are used instead. The SLERP (Spherical Linear Interpolation) interpolation method involves interpolating according to the following formula:
where $0 \le \alpha \le 1$ is the interpolation factor for proceeding from $q_1$ to $q_2$ and $\Omega$ is the angle between the two quaternions:
$\Omega=\arccos(q_1 \cdot q_2)$
where $q_1 \cdot q_2$ designates the scalar product between two quaternions (identical to the scalar product between two 4-dimensional vectors). This amounts to interpolating along a great circle on a 4D sphere with a constant angular speed as a function of $\alpha$. It is important to ensure that the shortest path is used for interpolating, by changing the sign of one of the quaternions when $q_1 \cdot q_2<0$. It should be noted that other quaternion interpolation methods can be used (NLERP, “Normalized Linear Interpolation”, which amounts to interpolating on a chord and renormalizing the result, splines, etc.).
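The interpolation described above can be sketched as follows, assuming the usual SLERP weights sin((1−α)Ω)/sin Ω and sin(αΩ)/sin Ω. The function name `slerp` and the small-angle threshold are choices made for this example only.

```python
import math


def slerp(q1, q2, alpha):
    """Spherical linear interpolation between two unit quaternions (4-tuples)."""
    dot = sum(x * y for x, y in zip(q1, q2))
    # Take the shortest path: q2 and -q2 represent the same rotation.
    if dot < 0.0:
        q2 = tuple(-x for x in q2)
        dot = -dot
    omega = math.acos(min(1.0, dot))  # angle between the two quaternions
    if omega < 1e-8:
        # Nearly identical quaternions: the weights become ill-conditioned.
        return tuple(q1)
    s = math.sin(omega)
    w1 = math.sin((1.0 - alpha) * omega) / s
    w2 = math.sin(alpha * omega) / s
    return tuple(w1 * x + w2 * y for x, y in zip(q1, q2))


# Halfway between the identity and a 90-degree rotation about z:
# the result is the 45-degree rotation about z.
q_start = (1.0, 0.0, 0.0, 0.0)
q_end = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
q_mid = slerp(q_start, q_end, 0.5)
```

Interpolating one quaternion per sub-frame with increasing α gives a rotation that evolves at a constant angular speed across the frame.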
In 4 dimensions (n=4), a rotation can be parameterized by 6 generalized Euler angles, as indicated in the aforementioned patent application.
In this case, the dual quaternion representation is of interest. This representation requires recourse to the matrix form of a quaternion.
The 4×4 matrices, herein called quaternion (Q) and antiquaternion (Q*), associated with a quaternion (unit) q=a+bi+cj+dk are defined by:
which corresponds to the “column convention”, since a quaternion is then represented as a 4D column vector. The matrices Q and Q* respectively correspond to a left multiplication by q and a right multiplication by q.
It is possible to check that the product of the two quaternions q1q2 is acquired in the same way in the matrix form:
where, for the quaternions q1=a1+b1i+c1j+d1k and q2=a2+b2i+c2j+d2k, with:
Considering two unit quaternions $q_1$ and $q_2$, the product of $Q_1$ and $Q_2^*$ yields a 4×4 matrix that satisfies the properties of a rotation matrix (orthogonal matrix with a determinant equal to 1):
$M_{4,quat}(q_1,q_2)=Q_1 Q_2^*$
It should be noted that M4,quat(q1,q2)=M4,quat(q2,q1)
Conversely, considering a 4×4 rotation matrix, it is possible to find an associated dual quaternion $(q_1, q_2)$ and the corresponding quaternion and anti-quaternion matrices. In other words, this matrix can be factorized into a product of matrices in the form $Q_1 Q_2^*$, for example, with the method referred to as the “Cayley factorization” method. This generally involves computing an intermediate matrix, called the “associated matrix” (or “tetragonal transform”), and deducing therefrom the two quaternions up to a sign indeterminacy. It should be noted that $M_{4,quat}(q_1,q_2)=M_{4,quat}(-q_1,-q_2)$; this property is used within the scope of the disclosure for more efficiently encoding the quaternions.
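The construction of a 4×4 rotation matrix from a dual quaternion can be sketched numerically. The two matrices below are the standard left-multiplication and right-multiplication matrices of a quaternion acting on a 4D column vector; they are assumed here to correspond to Q and Q* in the “column convention”, and all function names are hypothetical.

```python
def left_matrix(q):
    """Matrix of x -> q * x (assumed to play the role of the quaternion matrix Q)."""
    a, b, c, d = q
    return [[a, -b, -c, -d], [b, a, -d, c], [c, d, a, -b], [d, -c, b, a]]


def right_matrix(q):
    """Matrix of x -> x * q (assumed to play the role of the anti-quaternion matrix Q*)."""
    a, b, c, d = q
    return [[a, -b, -c, -d], [b, a, d, -c], [c, -d, a, b], [d, c, -b, a]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rotation_from_dual_quaternion(q1, q2):
    """4x4 matrix Q1 Q2* built from two unit quaternions."""
    return matmul(left_matrix(q1), right_matrix(q2))


def is_orthogonal(m, tol=1e-12):
    mt = [list(row) for row in zip(*m)]
    p = matmul(mt, m)
    return all(abs(p[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(4) for j in range(4))


def normalize(q):
    n = sum(x * x for x in q) ** 0.5
    return tuple(x / n for x in q)


q1 = normalize((1.0, 2.0, 3.0, 4.0))
q2 = normalize((-2.0, 1.0, 0.5, 3.0))
m = rotation_from_dual_quaternion(q1, q2)
m_neg = rotation_from_dual_quaternion(tuple(-x for x in q1), tuple(-x for x in q2))
```

Here `m` and `m_neg` coincide, which is the sign invariance that the encoding exploits by transmitting only one of the two equivalent pairs.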
In variants, the definition of the quaternion and anti-quaternion matrices can assume various conventions. For example:
In general, it is possible to insert a signed permutation matrix into the Cayley factorization and the present disclosure is applicable in all these cases. For the sake of simplification, and with no loss of generality, only one example of a convention (the “column convention” described above) is described hereafter at the output of the block 330 that allows a dual quaternion (q1, q2) to be acquired in the form of two 4D vectors to be encoded.
These alternative conventions affect the block 330 for converting a matrix into a dual quaternion (in particular the computation of the “associated matrix”) and affect the inverse blocks 362 and 462, which compute (explicitly or in an optimized manner) the product of matrices in the form of $Q_1 Q_2^*$. A person skilled in the art will know how to adapt this computation as a function of the adopted convention.
The 3×3 and 4×4 cases are linked by the fact that a 3D rotation can be seen as a 4×4 rotation with the constraint: q1=q and q2=−q. In this case,
As for the 4×4 case, for the sake of simplification and with no loss of generality, only one example of a convention (the “column convention” as described above) is described hereafter.
As described with reference to
An aspect of the disclosure avoids having to use, for the quantization, a representation different from that used in the block 360 for the interpolation.
According to an aspect of the disclosure, the 3- and 4-dimensional matrices are preferably quantized in the domain of quaternions and dual quaternions (respectively), which means it is possible to remain in the same domain for the quantization and the interpolation.
More specifically, the case of a 4-dimensional matrix in the embodiment that will now be described is of interest.
The PCA/KLT analysis and the PCA/KLT transformation as described in patent application WO 2020/177981 are carried out in the time domain. However, an aspect of the disclosure also applies to the case whereby a PCA/KLT analysis is carried out, for example, in a frequency domain with an estimate of a (real) covariance matrix by sub-bands.
The encoding method according to an aspect of the disclosure implements the steps described with reference to
This conversion can be carried out as follows for a rotation matrix $M_{4,quat}(q_1,q_2)$, in this case denoted $M=(m_{ij})_{i,j=0,\dots,3}$ in order to simplify the developments:
The intention is to factorize M in the form M=Q1Q2*:
It can be seen that this factorization is equivalent to solving:
where U is the “associated matrix” of M acquired from the coefficients mij of M as follows:
Considering the coefficients $U=(u_{ij})_{i,j=0,\dots,3}$, the following are found row-by-row, after squaring and summing all the coefficients per row:
Which yields, since the quaternion $q_2$ is a unit quaternion (i.e., $a_2^2+b_2^2+c_2^2+d_2^2=1$):
By convention, the signs of the components $a_1, b_1, c_1, d_1$ are respectively given by the signs of $u_{0k}, u_{1k}, u_{2k}, u_{3k}$, where $k$ is selected in the range $0, \dots, 3$. It should be noted that the factorization has two possible solutions, since the opposite convention would also be a solution. By selecting the component with the maximum absolute value (guaranteed to be non-zero since the quaternion $q_1$ is a unit quaternion) from among $a_1, b_1, c_1, d_1$, for example, and with no loss of generality, $a_1 \neq 0$, the following is deduced:
(if a component other than a1 is selected, for example b1, then
).
Therefore, two 4D vectors (a1,b1,c1,d1) and (a2,b2,c2,d2) representing the dual quaternion (q1, q2) are acquired at the output of the block 330.
The quaternion conversion for the 3D case can be carried out as follows for a matrix $M_{3,quat}(q)$ denoted $M=(a_{ij})_{i,j=0,\dots,2}$. The 4×4 case is reused starting from the extended rotation matrix:
The associated matrix is simplified by:
The remainder of the factorization method remains identical, except that only a single quaternion, for example, q1, has to be determined with the other one (q2) being opposite; therefore, a1,b1,c1,d1 are determined as square roots of partial sums of squared terms of U, with a sign being determined according to the designated convention.
Therefore, a 4D vector (a1,b1,c1,d1) representing the quaternion q1=q is acquired at the output of the block 330.
According to an aspect of the disclosure, the block 340 encodes the acquired quaternions, in the embodiment described herein, a dual quaternion for the 4×4 case.
Reference will now be made to
With reference to
In a particular embodiment, it is the real component (a1) of the first quaternion that is forced to be positive, by convention.
To this end, a check is carried out in step E310 to determine whether the real component a1 is negative. If so, the two quaternions q1 and q2 are replaced in step E320 by their opposites −q1 and −q2.
By way of a reminder, this operation does not change the 4D rotation matrix associated with the dual quaternion.
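Steps E310 and E320 can be sketched in a few lines. This is a minimal illustration under the (a, b, c, d) ordering used above; the function name `force_positive_real` is hypothetical.

```python
def force_positive_real(q1, q2):
    """Return an equivalent dual quaternion whose first real component is >= 0.

    If the real component a1 of q1 is negative (check of step E310), both
    quaternions are replaced by their opposites (step E320), which leaves
    the associated 4D rotation matrix unchanged.
    """
    if q1[0] < 0:
        return tuple(-x for x in q1), tuple(-x for x in q2)
    return tuple(q1), tuple(q2)


p1, p2 = force_positive_real((-0.5, 0.5, 0.5, 0.5), (0.0, 1.0, 0.0, 0.0))
```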
These two quaternions are then encoded by quantization in step E330.
In a first embodiment, in order to minimize the complexity, q1 and q2 are quantized in step E330 by the quantization device represented by block 340 in
In step E380, two unit quaternions $q_i=(a_i,b_i,c_i,d_i)$, $i=1, 2$, are considered, with a first quaternion $q_1$ for which a component has been forced positive. For this quaternion, an indication of the absence of a positive component (denoted $F_1$) is set to the value 0 (since a positive component clearly exists). This indication is set to the value 1 ($F_2=1$) for the second quaternion $q_2$. The parameter F (“Full range”) allows the intervals $[0, \pi/2]$ (F=0) and $[0, \pi]$ (F=1) to be distinguished.
A step of encoding the first quaternion q1 is carried out in step E381 (Cod. q1) taking into account the indication of the absence of a positive component F1=0 provided for q1 and a step of encoding the second quaternion q2 is carried out in step E382 (Cod. q2) taking into account the indication of the absence of a positive component F2=1 provided for q2.
These two steps will now be described with reference to
Starting, in step E510, from a unit quaternion $q_i=(a_i,b_i,c_i,d_i)$, this quaternion is converted in step E520 into three angles $(\omega_i, \theta_i, \varphi_i)$, which are similar to spherical coordinates.
These three angles are defined as follows for a quaternion qi:
where arccos is the arc cosine function (with a value in $[0, \pi]$) and arctan2 is the four-quadrant arc tangent function, which yields an angle in $[-\pi, \pi]$. In variants, other definitions of the angles $\omega_i$, $\theta_i$ or $\varphi_i$ can be adopted (for example, from an arc sine function denoted arcsin); in this case, the quantization must take into account a different interval (for example: $[-\pi/2, \pi/2]$ instead of $[0, \pi]$) and the formulae provided above for determining the 3 angles must also be adapted accordingly (as must the inverse conversion).
The three angles ωi, θi and φi as defined above have values on the interval [0, π], [−π, π] and [0, π], respectively.
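A conversion of this kind can be sketched with hyperspherical coordinates. The exact angle definitions used below are an assumption chosen to match the stated intervals (ω from an arccos of the real component, θ from arctan2, φ from an arccos); they are not necessarily those of the formulas referred to above, and the function names are hypothetical.

```python
import math


def quaternion_to_angles(q):
    """Convert a unit quaternion (a, b, c, d) to (omega, theta, phi).

    Assumed convention: a = cos(omega), b = sin(omega) cos(phi),
    c = sin(omega) sin(phi) cos(theta), d = sin(omega) sin(phi) sin(theta),
    with omega, phi in [0, pi] and theta in [-pi, pi].
    """
    a, b, c, d = q
    omega = math.acos(max(-1.0, min(1.0, a)))
    vec_norm = math.sqrt(b * b + c * c + d * d)
    phi = math.acos(max(-1.0, min(1.0, b / vec_norm))) if vec_norm > 0.0 else 0.0
    theta = math.atan2(d, c)
    return omega, theta, phi


def angles_to_quaternion(omega, theta, phi):
    """Inverse conversion under the same assumed convention."""
    so = math.sin(omega)
    sp = math.sin(phi)
    return (math.cos(omega), so * math.cos(phi),
            so * sp * math.cos(theta), so * sp * math.sin(theta))


q = (0.1, -0.7, 0.5, -0.5)  # unit norm, non-negative real component
omega, theta, phi = quaternion_to_angles(q)
q_back = angles_to_quaternion(omega, theta, phi)
```

With this convention, a non-negative real component gives ω in [0, π/2], which is the half-length interval exploited for the first quaternion.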
In order to reduce the complexity as much as possible, the three angles are quantized in step E530, for example, by uniform scalar quantization, taking into account the index Fi as defined above, provided in step E531, for a quaternion qi.
Considering, in step E540, the three angles ωi, θi and φi, and in step E531 the indication Fi, these angles are respectively quantized in steps E541, E542 and E543 in order to acquire three quantization indices (idxi1, idxi2, idxi3) in steps E551, E552 and E553, for a quaternion qi.
To this end, a parameter F (“Full range”) defining the width of the quantization interval and T (“Two sided”) defining the existence of positive and negative values in the interval are defined for each of the angles.
For the case of the angle ωi, the value of F corresponds to the value of Fi, i.e., the indication of the absence of a positive component. In the case of a first quaternion for which the value of a component has been forced positive, this value will be 0, i.e., the width of the quantization interval will be reduced to a half-interval.
Conversely, for a second quaternion, for which the value of Fi is 1, the entire quantization interval is used (F=1).
For the case of the angle ωi, the value of T is set to 0, i.e., the quantization interval does not include negative values. Indeed, the quantization interval is [0, π/2] or [0, π] for this angle.
For the case of the angle θi, the quantization interval is [−π, π]. The value of F is defined as 1 and that of T is defined as 1.
For the case of the angle φi, the quantization interval is [0, π]. The value of F is defined as 1 and that of T is defined as 0.
The step of quantizing an angle, generically denoted α, such as step E541, E542 or E543, will now be described in the flowchart of
Starting, in step E560, with the respective values of α (value of the ωi, θi and φi), T and F as defined above, the parameters N and N′ are defined in step E561.
N=2^R and N′=N/2, where R is the number of bits, set, for example, to 8.
In step E562, the value of F is checked. If F=0, the width of the quantization interval is reduced by half and therefore has the maximum value (maxval) of π/2 in step E563. Otherwise, if F=1, the quantization interval is not reduced and its maximum value (maxval) is π.
In step E565, the value of the quantization index idx is set to the value 0.
In step E566, the value of T is checked. If T is equal to 1, a check is carried out in step E567 to determine whether the value of α is negative. If so, the absolute value of the value of the angle α is taken in step E568 and an offset (N′) is added to the quantization index.
The value of N is updated in step E569 (N=N′) in order to indicate that half the quantization indices are reserved for the positive values.
In the case whereby α is positive in step E567, the method proceeds directly to step E569.
In step E570, a quantization pitch d is defined as being the maximum value (maxval) of the interval divided by N−1, i.e., d=maxval/(N−1).
In the case whereby T is equal to 0 in step E566, the method proceeds directly to step E570.
In step E571, the value of α is checked by comparing it with the maximum value of the interval.
In the case whereby α is greater than this maximum value (maxval), step E572 defines as a quantized value m=N−1, otherwise, this quantized value is defined as
In step E574, the value of the quantization index is updated with this quantized value m.
Thus, the quantization value idx of the angle encoded in step E575, which can be transmitted to a decoder, is acquired. In variants, other scalar quantization forms can be implemented (for example: other quantization steps, decision thresholds or reconstruction levels) and the binary allocation R can be different from 8 bits for each of the angles ωi, θi and φi in order to have a specific bit budget for each of the angles.
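By way of illustration only, the flowchart steps E560 to E575 can be sketched as follows. The function name `quantize_angle` and its argument names are hypothetical. The rounding rule applied when α does not exceed maxval is an assumption (nearest reconstruction level), since only the uniform pitch d is stated in the description above.

```python
import math

def quantize_angle(alpha, full_range, two_sided, bits=8):
    """Sketch of the uniform scalar quantization of one angle.

    alpha      -- angle in radians
    full_range -- parameter F (1: interval up to pi, 0: up to pi/2)
    two_sided  -- parameter T (1: negative values exist, 0: they do not)
    bits       -- number of bits R
    """
    n = 1 << bits                  # N = 2^R (step E561)
    n_half = n // 2                # N' = N / 2
    maxval = math.pi if full_range else math.pi / 2   # step E562
    idx = 0                        # step E565
    if two_sided:                  # step E566
        if alpha < 0:              # step E567
            alpha = -alpha         # step E568: absolute value
            idx += n_half          # offset N' for the negative values
        n = n_half                 # step E569: half the indices per sign
    d = maxval / (n - 1)           # step E570: quantization pitch
    if alpha > maxval:             # step E571
        m = n - 1                  # step E572
    else:
        # ASSUMPTION: round to the nearest reconstruction level.
        m = int(math.floor(alpha / d + 0.5))
    return idx + m                 # step E574
```

With the parameters given above, ωi would be quantized with `quantize_angle(omega, Fi, 0)`, θi with `quantize_angle(theta, 1, 1)` and φi with `quantize_angle(phi, 1, 0)`.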
It should be noted that, according to an aspect of the disclosure, the encoding of q1 requires one bit less than the encoding of q2, since the constraint a1≥0 is exploited by defining ω1 on [0, π/2] instead of [0, π]. Indeed, for the case of a uniform scalar quantization on R bits (for example, R=8 bits), with N=2^R quantization values (or reconstruction levels), if the quantization pitch d is set to d=(π/2)/(N−1) to cover the half-length interval ranging from 0 to π/2 (inclusive) with N−1 sub-intervals, there will need to be N−1 additional sub-intervals of the same width d in order to cover the complete interval ranging from 0 to π, which requires 1 additional bit. In order to avoid leaving an unused value, it is also possible to slightly adapt the pitch to d′=π/(N−1), where N=2^(R+1). For R=8 bits, the difference d−d′ is negligible: (π/2)/255−π/511 is of the order of 0.0007 degrees.
Thus, for encoding q1, 8 bits are used for ω1, 9 bits for θ1 and 8 bits for φ1, that is 25 bits. For encoding q2, 9 bits are used for ω2, 9 bits for θ2, 8 bits for φ2, that is 26 bits. In total, the budget used is therefore 51 bits.
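The arithmetic of the two preceding paragraphs can be checked with the short sketch below. The variable names are illustrative only; the per-angle bit allocations are those stated above.

```python
import math

R = 8
# Pitch covering [0, pi/2] with 2^R levels.
d = (math.pi / 2) / (2 ** R - 1)
# Adapted pitch covering [0, pi] with 2^(R+1) levels.
d_prime = math.pi / (2 ** (R + 1) - 1)
# Difference between the two pitches, expressed in degrees.
diff_deg = math.degrees(d - d_prime)

bits_q1 = 8 + 9 + 8   # omega_1, theta_1, phi_1
bits_q2 = 9 + 9 + 8   # omega_2, theta_2, phi_2
total_bits = bits_q1 + bits_q2

print(round(diff_deg, 5), bits_q1, bits_q2, total_bits)
```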
The roles of q1 and q2 obviously can be interchanged in order to force a component to be positive and for the quantization.
The encoding method as described for
In the embodiment described herein, the quaternions used in the interpolation step of the block 360 of
However, in a variant, the quaternions used for the interpolation step of the block 360 can originate directly from the conversion step of the block 330 without passing through the steps of quantization and of inverse quantization.
This interpolation can be carried out as described in the aforementioned patent application. Other embodiments of the interpolation are possible.
Once the interpolation is computed separately for the two quaternions q1 and q2, a 4×4 dimension rotation matrix is computed, for example, by explicitly computing the matrix product M=Q1Q2*=Q2Q1* after having formed the matrices Q1 and Q2* (or Q2 and Q1*) as previously defined.
In a variant, the 6 acquired angles, ω1, θ1, φ1, ω2, θ2, φ2 can be quantized by a vector quantization method with a “hyper-rectangular” support ([0,π/2]×[−π,π]×[0,π]×[0,π]×[−π,π]×[0,π]), taking into account the respective intervals as defined above, for example, according to the TCQ method described in the article by J. P. Adoul entitled, “Lattice and Trellis Coded Quantizations for efficient Coding of Speech”, In: Ayuso, Soler (eds), Speech Recognition and Coding, NATO ASI Series, 1995.
In another embodiment, a scalar quantization can be implemented for the two angles ω1 and ω2 (corresponding to the real part of the quaternions) and a separate 3-dimensional spherical quantization can be carried out for the other angles, θ1, φ1 and θ2, φ2, corresponding to the imaginary parts of the quaternions. The quantization dictionary generally can be any discretization of the sphere by a finite number of points, but it is advantageous to use a quasi-uniform discretization (of the Lebedev, t-design type, etc.) for better performance, and, in variants, a discretization of the lat-long type (latitude-longitude) also can be used.
A lat-long approach is described below:
The components (bi,ci,di) are seen as a 3D Cartesian vector, which can be expressed in spherical coordinates (ri, θi, φi). The radius ri is given by ri=sin(ωi). The angles θi and φi are determined as described in the present disclosure; they correspond to the longitude and the latitude, respectively. These angles can be quantized according to the dictionary (with angles in degrees) described in section 3.2 of the article by Perotin et al. entitled, “CRNN-based multiple DoA estimation using acoustic intensity features for Ambisonics recordings”, IEEE Journal of Selected Topics in Signal Processing, 2019:
However, it should be noted that the latitude is defined in the article by Perotin et al. with an arcsine function; it is therefore worthwhile applying the conversion 90−φ̂n in order to encode the angle φi defined herein, with no loss of generality, with an arccos function.
This involves a conditional quantization of 2 angles where the latitude φi is encoded first by finding the nearest neighbor from among 90−φ̂n, n=0, . . . , I(α) and then the longitude θi is encoded by finding the nearest neighbor from among θ̂mn as a function of the selected index n.
The number of necessary bits is ⌈log2 N(α)⌉, where
The size of the dictionary is, for example, N(α)=429, 1687 or 4645 for α=10, 5 or 3 degrees (respectively), that is 9, 11, or 13 bits (rounding up to the nearest whole number).
Other reconstruction levels 90−φ̂n and/or θ̂m (and other decision thresholds) can be used for a dictionary with a given size N(α).
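A conditional lat-long quantization of this kind can be sketched as follows. The grid construction is an assumption made for illustration and is not taken from the cited article: latitudes are spaced by α degrees, and the number of longitudes at each latitude is taken as the rounded value of (360/α)·cos(latitude) plus one. With these assumed choices, the dictionary has 429 entries for α=10 degrees. All function names are hypothetical.

```python
import math

def latlong_dictionary(alpha_deg):
    """Return a list of (colatitude_deg, [longitude_deg, ...]) rows."""
    n_lat = int(180 // alpha_deg)              # assumed number of latitude steps
    grid = []
    for n in range(n_lat + 1):
        lat = -90.0 + n * 180.0 / n_lat        # latitude level (arcsine convention)
        # ASSUMPTION: longitude count shrinks with cos(latitude), rounded.
        n_lon = int(round(360.0 / alpha_deg * math.cos(math.radians(lat)))) + 1
        lons = [-180.0 + m * 360.0 / n_lon for m in range(n_lon)]
        grid.append((90.0 - lat, lons))        # conversion 90 - latitude (arccos convention)
    return grid

def dictionary_size(grid):
    return sum(len(lons) for _, lons in grid)

def quantize_direction(theta_deg, phi_deg, grid):
    """Encode phi first, then theta conditionally on the selected row n."""
    n = min(range(len(grid)), key=lambda k: abs(grid[k][0] - phi_deg))
    lons = grid[n][1]

    def circular_distance(a, b):
        diff = abs(a - b) % 360.0
        return min(diff, 360.0 - diff)

    m = min(range(len(lons)), key=lambda k: circular_distance(lons[k], theta_deg))
    return n, m

grid = latlong_dictionary(10)
size = dictionary_size(grid)
bits = math.ceil(math.log2(size))
```

The pair (n, m) would still have to be mapped to a single index on `bits` bits, for example by adding m to the cumulative number of entries in the rows preceding n.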
In variants of the disclosure, a unit quaternion qi=(ai,bi,ci,di) is encoded in step E330 using the 4-dimensional spherical coordinates (while noting in this case that the radius is set to 1 because the quaternion qi is a unit quaternion):
ai=cos(ϕi0)
bi=sin(ϕi0)cos(ϕi1)
ci=sin(ϕi0)sin(ϕi1)cos(ϕi2)
di=sin(ϕi0)sin(ϕi1)sin(ϕi2)
where ϕi0 is on [0, π] or [0, π/2] according to an aspect of the disclosure, ϕi1 on [0, π] and ϕi2 on [0, 2π].
The spherical coordinates can be determined by:
The special cases where some components are zero (for example: ci=0, di=0) are not processed in this case, but by convention the angle corresponding to zero values can be set to 0.
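One possible way of computing these 4-dimensional spherical coordinates, and of returning to the quaternion components, is sketched below. The inverse formulas (arccos and atan2) are an assumption derived from the four component equations above; the function names are hypothetical. Angles corresponding to zero components are set to 0, following the convention just mentioned.

```python
import math

def _clamp(x):
    # Guard against rounding slightly outside [-1, 1] before acos.
    return max(-1.0, min(1.0, x))

def to_spherical4(q):
    """Unit quaternion (a, b, c, d) -> (phi0, phi1, phi2)."""
    a, b, c, d = q
    phi0 = math.acos(_clamp(a))                       # in [0, pi]
    r1 = math.sqrt(b * b + c * c + d * d)             # equals sin(phi0)
    phi1 = math.acos(_clamp(b / r1)) if r1 > 0.0 else 0.0   # in [0, pi]
    if c == 0.0 and d == 0.0:
        phi2 = 0.0                                    # convention for zero values
    else:
        phi2 = math.atan2(d, c) % (2.0 * math.pi)     # in [0, 2*pi)
    return phi0, phi1, phi2

def from_spherical4(phi0, phi1, phi2):
    """(phi0, phi1, phi2) -> unit quaternion (a, b, c, d)."""
    a = math.cos(phi0)
    b = math.sin(phi0) * math.cos(phi1)
    c = math.sin(phi0) * math.sin(phi1) * math.cos(phi2)
    d = math.sin(phi0) * math.sin(phi1) * math.sin(phi2)
    return a, b, c, d
```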
For this variant, all the embodiments described above for encoding 6 angles (separate or joint) apply.
In variants, other definitions of spherical coordinates can be used.
In a second embodiment, a 4-dimensional spherical vector quantization is implemented.
The fact that a component of the first quaternion has been forced to be positive allows the sign of said quaternion to be known and only the relevant positive quaternion to be encoded, with the negative quaternion corresponding to the same rotation. Thus, for the first quaternion, a hemispherical vector quantization is sufficient and allows the rate to be reduced compared to a spherical quantization.
Thus, q1 can be quantized with a hemispherical dictionary (in which the first component of each code word is positive) and q2 can be quantized with a spherical dictionary. The roles of q1 and q2 obviously can be interchanged in order to force a component to be positive and for the quantization.
Examples of dictionaries can be provided by predefined points in 4-dimensional regular or irregular polyhedrons. A simple example of a 4-dimensional spherical dictionary on 7 bits is provided by the 120 vertices of a “600-cell” that correspond to the combination of the following points:
The hemispherical version of such a dictionary comprises the following 60 points (on 6 bits):
with the first positive component.
Considering a spherical or hemispherical dictionary, the quantization firstly involves finding the nearest neighbor of the quaternion to be encoded; in general, this operation can be optimized by exploiting the underlying algebraic structure and by comparing only “absolute vectors” by scalar product. The quantization also includes an explicit step of indexing (computation of the quantization index identifying the nearest code word), and in general the index is computed by a permutation index (signed) and an offset depending on the absolute vector representing the nearest neighbor. These concepts are considered to be already known to a person skilled in the art.
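As an illustration, the 120 vertices of a unit-radius 600-cell can be generated from the standard coordinate description of that polytope (8 points of type (±1,0,0,0), 16 points of type (±1/2,±1/2,±1/2,±1/2) and 96 even permutations of (±φ/2,±1/2,±1/(2φ),0), φ being the golden ratio). The sketch below also extracts a 60-point hemispherical dictionary and performs an exhaustive nearest-neighbor search by scalar product. The rule used to select the hemisphere (first non-zero component positive) is an assumption, as are the function names; the optimized search and indexing mentioned above are not shown.

```python
import itertools
import math

def _is_even(perm):
    # A permutation is even when its number of inversions is even.
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return inv % 2 == 0

def cell600_vertices():
    golden = (1.0 + math.sqrt(5.0)) / 2.0
    points = []
    for k in range(4):                                   # 8 points
        for s in (1.0, -1.0):
            p = [0.0] * 4
            p[k] = s
            points.append(tuple(p))
    points.extend(itertools.product((0.5, -0.5), repeat=4))   # 16 points
    base = (golden / 2.0, 0.5, 0.5 / golden)
    for signs in itertools.product((1.0, -1.0), repeat=3):    # 96 points
        vals = [s * v for s, v in zip(signs, base)] + [0.0]
        for perm in itertools.permutations(range(4)):
            if _is_even(perm):
                points.append(tuple(vals[i] for i in perm))
    return points

def hemisphere(points):
    # ASSUMPTION: keep the code words whose first non-zero component is positive.
    def keep(p):
        for x in p:
            if x != 0.0:
                return x > 0.0
        return False
    return [p for p in points if keep(p)]

def nearest(q, dictionary):
    # Exhaustive search: largest scalar product = nearest unit code word.
    return max(range(len(dictionary)),
               key=lambda k: sum(x * y for x, y in zip(q, dictionary[k])))

verts = cell600_vertices()
hemi = hemisphere(verts)
```

The spherical dictionary then needs ⌈log2 120⌉ = 7 bits per index and the hemispherical one ⌈log2 60⌉ = 6 bits, in line with the figures given above.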
This embodiment by vector quantization has the disadvantage of having to explicitly determine the quantization index and also of having to store certain elements describing the quantization dictionary. In order to be able to encode a quaternion with a budget of approximately 25 bits, the 4-dimensional dictionary must combine a very large number of combinations of representative points (leaders).
Also, in other variants, a conditional scalar quantization can be applied.
The following definition of the 4-dimensional spherical coordinates is used in this case:
ai=cos(ϕi0)
bi=sin(ϕi0)cos(ϕi1)
ci=sin(ϕi0)sin(ϕi1)cos(ϕi2)
di=sin(ϕi0)sin(ϕi1)sin(ϕi2)
where ϕi0 is on [0, π] or [0, π/2] according to an aspect of the disclosure, ϕi1 on [0, π] and ϕi2 on [0, 2π].
The principle of the 3-dimensional spherical quantization is generalized in 4-dimensions by a dictionary of the lat-long type. To this end, the angle ϕi0 is converted into degrees and quantized with a scalar dictionary with a uniform pitch that depends on the interval:
The parameter α indicates the angular resolution (for example, α=5 degrees).
Then, the angle ϕi1 is converted into degrees and quantized with a scalar dictionary on the interval [0, π] for which the number of sub-intervals (and of reconstruction levels ϕ̂i1) depends on the value of ϕ̂i0. Finally, the angle ϕi2 is converted into degrees and quantized with a scalar dictionary on the interval [0, 2π] for which the number of sub-intervals (and of reconstruction levels ϕ̂i2) depends on the values of ϕ̂i0 and ϕ̂i1. Separate quantization dictionaries defining ϕ̂i1 and ϕ̂i2 can be defined, in one example, from a 3-dimensional spherical quantization dictionary of the lat-long type so that the size N(α′) thereof is adapted as a function of the value of ϕ̂i0. The particular benefit of this variant is to have an element of the vector quantization dictionary that corresponds to q1=q2=(1,0,0,0) and also to be able to more evenly distribute the elements of the 4-dimensional quantization dictionary compared to an independent scalar quantization for each of the angles ϕi0, ϕi1, ϕi2.
In another variant of variable rate encoding, a step of selecting the type of encoding and of deciding to multiplex the indices is added. A bit is defined that can indicate that the dual quaternion is encoded by q1=q2=(1,0,0,0) (defined as default values), which at the same time indicates that the transformation matrix is an identity matrix. In this case, the bit budget for encoding a dual quaternion according to the main embodiment varies between two possible values: 1 bit if the dual quaternion is encoded by the values q1=q2=(1,0,0,0) and 52 bits (1+51) otherwise. In the latter case, an aspect of the disclosure described previously and in variants is applied, and the quantization dictionary that is used need not contain the case q1=q2=(1,0,0,0), since this case is already covered when a single bit is encoded.
In addition, the multiplexing block (390) adds an additional bit in order to indicate the encoding mode that is used.
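The variable-rate multiplexing can be sketched as below. The function name, the ordering of the six indices and the polarity of the mode bit ("0" for the default values q1=q2=(1,0,0,0), "1" otherwise) are assumptions made for illustration; the description above only states that one additional bit indicates the encoding mode.

```python
def pack_dual_quaternion(indices=None, widths=(8, 9, 8, 9, 9, 8), identity=False):
    """Return the bit string for one dual quaternion.

    indices  -- six quantization indices (omega1, theta1, phi1, omega2, theta2, phi2)
    widths   -- bits per index; the default matches the 25 + 26 bit budget
    identity -- True when q1 = q2 = (1, 0, 0, 0)
    """
    if identity:
        return "0"                       # ASSUMPTION: mode bit 0 = default values
    if indices is None or len(indices) != len(widths):
        raise ValueError("one index per angle is required")
    payload = ""
    for idx, width in zip(indices, widths):
        if not 0 <= idx < (1 << width):
            raise ValueError("index does not fit in its bit width")
        payload += format(idx, "0{}b".format(width))
    return "1" + payload                 # mode bit followed by 51 bits
```

For example, `len(pack_dual_quaternion(identity=True))` is 1, and any six valid indices yield a string of 52 bits.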
An embodiment will now be described that is specific to the case of a 3×3 rotation matrix. The disclosure described above also applies with, in this case, q1=q and q2=−q. Therefore, only one quaternion (for example, q1) is encoded and advantageously the indication of the absence of a positive component F1=0 is used in order to save one bit. With the budget described in the embodiment, encoding on 25 bits is therefore obtained. All the possible variants of encoding are available in the 3×3 case (spherical vector quantization with a hemispherical dictionary, conversion into spherical coordinates and quantization of the angles, etc.).
In a variant of this embodiment, a step of selecting the type of encoding and of deciding to multiplex the indices also can be added. An overall bit is defined that can indicate that the quaternion is encoded by q=(1,0,0,0). In this case, the bit budget for encoding a quaternion according to this variant varies between two possible values: 1 bit if the quaternion is encoded by q=(1,0,0,0) and 26 bits (1+25) otherwise.
In all the variants described above for the encoding, entropic encoding (for example, of the Huffman or arithmetic type) also can be optionally used after the quantization, which in general can reduce the average rate at the expense of a variable rate.
The decoding method will now be described, and in particular the inverse quantization step carried out by the block 400 of
The quantization indices idx1, idx2 and idx3 for each of the quaternions i are received by the decoder in steps E601, E602 and E603 and are decoded in steps E611, E612, E613, respectively.
The quantization interval width parameters F and the existence of positive and negative values of the interval T are also acquired in steps E611, E612 and E613. The indication of the absence of a positive component F1 is also acquired in step E611.
Thus, in step E611, Fi is acquired as a value of F and 0 as a value of T.
In step E612, 1 is acquired as a value of F and 1 is acquired as a value of T and, in step E613, 1 is acquired as a value of F and 0 is acquired as a value of T.
The inverse quantization carried out in steps E611, E612 and E613 allows the angle ωi to be decoded in step E621, θi in step E622 and φi in step E623.
The components of the respective quaternions ai,bi,ci,di are then decoded in step E630 from the angles thus decoded, according to the following formulae:
ai=cos(ωi)
bi=cos(θi)sin(φi)sin(ωi)
ci=sin(θi)sin(φi)sin(ωi)
di=cos(φi)sin(ωi)
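The reconstruction of step E630 can be sketched as follows, together with one possible computation of the three angles from a unit quaternion. The function names are hypothetical, and the angle extraction (arccos and atan2) is an assumption chosen to be consistent with the four formulae above and with the intervals [0, π], [−π, π] and [0, π] of ωi, θi and φi.

```python
import math

def angles_to_quaternion(omega, theta, phi):
    """Step E630: decoded angles -> components (a, b, c, d)."""
    a = math.cos(omega)
    b = math.cos(theta) * math.sin(phi) * math.sin(omega)
    c = math.sin(theta) * math.sin(phi) * math.sin(omega)
    d = math.cos(phi) * math.sin(omega)
    return a, b, c, d

def quaternion_to_angles(q):
    """Assumed inverse mapping: unit quaternion -> (omega, theta, phi)."""
    a, b, c, d = q
    omega = math.acos(max(-1.0, min(1.0, a)))          # in [0, pi]
    r = math.sqrt(b * b + c * c + d * d)               # equals sin(omega) >= 0
    phi = math.acos(max(-1.0, min(1.0, d / r))) if r > 0.0 else 0.0   # in [0, pi]
    theta = math.atan2(c, b)                           # in [-pi, pi], 0 if b = c = 0
    return omega, theta, phi
```

A round trip through `quaternion_to_angles` and `angles_to_quaternion` returns the original unit quaternion up to floating-point error when no quantization is applied in between.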
The step of inverse quantization of an angle, such as step E611, E612 or E613 will now be described in the flowchart of
In step E640, starting with the respective values of idx (quantization index of an angle), T and F as defined above, the parameters N and N′ are defined in step E641. N=2^R and N′=N/2, with R being the number of bits, set to 8 in this embodiment. In step E642, the value of F is checked. If F=0, the width of the quantization interval is reduced by half and therefore has π/2 as the maximum value in step E643. Otherwise, if F=1, the quantization interval is not reduced and its maximum value in step E644 is π.
In step E645, the value of a sign parameter s is set to the value 0.
In step E646, the value of T is checked. If T is equal to 1, the value of N is updated in step E648 (N=N′).
In step E649, the value of idx is checked. If said value is greater than or equal to N′, then, in step E670, the value of s is set to 1 and that of idx is updated by subtracting N′ from it. Indeed, the indices from 0 to N′−1 correspond to the positive values and the indices from N′ to N−1 correspond to the negative values.
In step E671, a quantization pitch d is defined as being the maximum value of the interval divided by N−1.
In the case where T is equal to 0 in step E646, the method proceeds directly to step E671.
In the case where idx is less than N′ in step E649, the method proceeds directly to step E671.
In step E672, the value of α is computed as being the following value: (−1)^s·idx·d.
The value of α, i.e., the value of the angle in step E673, is thus decoded.
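The inverse quantization of one angle can be sketched as follows, using the index layout stated above (indices 0 to N′−1 for the positive values, indices N′ to N−1 for the negative values). The function and argument names are hypothetical.

```python
import math

def dequantize_angle(idx, full_range, two_sided, bits=8):
    """Sketch of steps E640 to E673 for one angle.

    idx        -- received quantization index
    full_range -- parameter F (1: interval up to pi, 0: up to pi/2)
    two_sided  -- parameter T (1: negative values exist, 0: they do not)
    bits       -- number of bits R
    """
    n = 1 << bits                  # N = 2^R (step E641)
    n_half = n // 2                # N' = N / 2
    maxval = math.pi if full_range else math.pi / 2   # steps E642 to E644
    s = 0                          # step E645: sign parameter
    if two_sided:                  # step E646
        n = n_half                 # step E648
        if idx >= n_half:          # step E649: index in the negative half
            s = 1                  # step E670
            idx -= n_half
    d = maxval / (n - 1)           # step E671: quantization pitch
    return (-1) ** s * idx * d     # step E672
```

For R=8, the index 255 decodes to π when F=1 and T=0, to π/2 when F=0 and T=0, and to −π when F=1 and T=1.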
In other embodiments, the step of inverse quantization of the decoding method, carried out by the block 400 of
Thus, inverse spherical vector quantization can be carried out. An indication of the existence of a positive component received for the decoding makes it possible to know whether a hemispherical quantization has been carried out for the encoding in order to encode a first quaternion. In this case, the inverse quantization will be an inverse hemispherical vector quantization for decoding this first quaternion.
Similarly, in the case whereby a conditional scalar quantization using spherical coordinates is used on the encoder, the decoder will carry out an inverse quantization in order to retrieve these spherical coordinates and decode the corresponding quaternions.
The encoding device DCOD comprises a processing circuit typically including:
The decoding device DDEC comprises its own processing circuit, typically including:
Of course, this
Although the present disclosure has been described with reference to one or more examples, workers skilled in the art will recognize that changes may be made in form and detail without departing from the scope of the disclosure and/or the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
2013954 | Dec 2020 | FR | national |
This Application is a Section 371 National Stage Application of International Application No. PCT/FR2021/052257, filed Dec. 9, 2021, which is incorporated by reference in its entirety and published as WO 2022/136760 A1 on Jun. 30, 2022, not in English.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/FR2021/052257 | 12/9/2021 | WO |