The present invention relates to spherical vector quantization applied to the coding/decoding of sound data, in order to code source directions of arrival (abbreviated to “DoA”) that are generally represented by spherical coordinates (for example azimuth and elevation, at a predetermined distance).
Encoders/decoders (hereinafter called “codecs”) that are currently used in mobile telephony are mono (a single signal channel to be rendered on a single loudspeaker). The 3GPP EVS (for “Enhanced Voice Services”) codec makes it possible to offer “Super-HD” quality (also called “High Definition Plus” or HD+ voice) with a super-wideband (SWB) audio band for signals sampled at 32 or 48 kHz or full band (FB) audio band for signals sampled at 48 kHz; the audio bandwidth is 14.4 to 16 kHz in SWB mode (9.6 to 128 kbit/s) and 20 kHz in FB mode (16.4 to 128 kbit/s).
The next quality evolution in conversational services offered by operators should consist of immersive services, using terminals such as smartphones equipped with multiple microphones or remote presence or 360° video spatialized audio-conferencing or video-conferencing equipment, or even “live” audio content sharing equipment, with spatialized 3D sound rendering that is much more immersive than simple 2D stereo rendering. With the increasingly widespread use of listening on a mobile telephone with an audio headset and the onset of advanced audio equipment (accessories such as a 3D microphone, voice assistants with acoustic antennas, virtual reality or augmented reality headsets, etc.), capturing and rendering spatialized sound scenes is now widespread enough to offer an immersive communication experience.
To this end, the future 3GPP standard “IVAS” (for “Immersive Voice And Audio Services”) is proposing to extend the EVS codec to immersive audio by accepting, as codec input format, at least the spatialized sound formats listed below (and their combinations):
There is also the issue of potentially considering other input formats such as the format called MASA (Metadata assisted Spatial Audio), which corresponds to a parametric representation of a sound pick-up on a mobile telephone equipped with multiple microphones. This format is studied in more detail below.
The signals to be processed by the encoder/decoder take the form of successions of blocks of sound samples called “frames” or “subframes” below.
Furthermore, below, mathematical notations follow the following convention:
Hereinafter, the sphere of radius r in dimension n+1 is denoted $\mathbb{S}^n$ and defined as
$$\mathbb{S}^n = \{\, x \in \mathbb{R}^{n+1} : \|x\| = r \,\}$$
where ∥⋅∥ denotes the Euclidean norm. When the radius r is not specified, it will be assumed that r=1 (unit sphere). The focus here is on the case of dimension 3, where n=2. A reminder will be given here of the definition of the spherical coordinates in dimension 3. For a point (x, y, z) in dimension 3, there are generally at least two classical conventions of spherical coordinates denoted (r, ϕ, θ):
The angles ϕ, θ are defined here in radians, without loss of generality.
The radius r and the azimuth (or longitude) θ are the same in these two definitions, but the angle ϕ differs depending on whether it is defined with respect to the horizontal plane 0xy (elevation or latitude over the interval [−π/2, π/2]) or based on the axis 0z (colatitude or polar angle over the interval [0, π]). The azimuth θ may be defined over an interval [−π,π], and, in equivalent fashion, it may be defined over [0,2π] by a simple operation of modulo 2π. Hereinafter, the same angular coordinates will preferably be represented in degrees, but other units may be used. It should be noted that the symbols may be different in the literature (for example φ instead of ϕ) and/or swapped (for example θ for colatitude and φ for longitude). Hereinafter, the convention that is adopted will preferably be that of using the elevation and azimuth pair, but the invention is applicable to all variant definitions of spherical coordinates.
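The relationship between the two angle conventions and the Cartesian coordinates can be sketched as follows. This is a minimal illustration, assuming the usual geographical orientation (x toward azimuth 0 in the horizontal plane, z toward the North pole); the function names are hypothetical and are not taken from any cited document.

```python
import math

def elevation_to_colatitude(elev_rad):
    """Elevation in [-pi/2, pi/2] -> colatitude (polar angle) in [0, pi]."""
    return math.pi / 2 - elev_rad

def spherical_to_cartesian(r, elev_rad, azim_rad):
    """(r, elevation, azimuth) -> (x, y, z), assumed geographical convention."""
    x = r * math.cos(elev_rad) * math.cos(azim_rad)
    y = r * math.cos(elev_rad) * math.sin(azim_rad)
    z = r * math.sin(elev_rad)
    return x, y, z

def cartesian_to_spherical(x, y, z):
    """(x, y, z) -> (r, elevation, azimuth) with azimuth in [-pi, pi]."""
    r = math.sqrt(x * x + y * y + z * z)
    elev = math.asin(z / r) if r > 0 else 0.0
    azim = math.atan2(y, x)
    return r, elev, azim

def wrap_azimuth_0_2pi(azim_rad):
    """Map an azimuth from [-pi, pi] to [0, 2*pi) by a modulo operation."""
    return azim_rad % (2 * math.pi)
```

Only the angle ϕ changes between the two conventions, so a codec built on elevation can be adapted to colatitude by applying the first conversion before and after quantization.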
Of interest in the invention are exemplary embodiments of spherical vector quantization applied to the coding of 3D directions of audio sources. The invention may also be applied to other audio formats and to other signals (for example images or 360° video) in which spherical data in dimension 3 are to be coded.
The principles of DiRAC (Directional Audio Coding) will be recalled below. In some variants, it is possible to apply the invention to other coding schemes, in particular for transform-based audio coding.
DiRAC coding is described for example in the article V. Pulkki, Spatial sound reproduction with directional audio coding, Journal of the Audio Engineering Society, vol. 55, no. 6, pp. 503-516, 2007. In that document, mapping is carried out through directional analysis in order to find a direction (DoA) for each sub-band. This DoA is supplemented by a “diffuseness” parameter, thereby giving a parametric description of the sound scene. The multi-channel input signal is coded in the form of transport channels (typically a mono or stereo signal obtained by reducing multiple picked-up channels) and spatial metadata (DoA and “diffuseness” for each sub-band).
The DoA is coded (block 140) on a predetermined number of bits (for example 7 bits) per pair (ϕ, θ) in each frame and each sub-band. The “diffuseness” ψ is a parameter between 0 and 1, and is coded here (block 150) by scalar quantization (for example on 6 bits). In the example given, the spatial metadata coding budget is therefore 24×(7+6)=312 bits per frame, that is to say 15.6 kbit/s, for a global budget of 24.4+15.6=40 kbit/s. The “downmix” signal coding bitstream and the coded spatial parameters are multiplexed (block 160) so as to form the bitstream of each frame.
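The budget arithmetic above can be checked with a few lines. The 20 ms frame length is an assumption: it is the value implied by 312 bits per frame corresponding to 15.6 kbit/s. The variable names are illustrative only.

```python
# Bit-budget check for the example figures quoted in the text.
PARAM_SETS_PER_FRAME = 24   # the factor 24 in the text
DOA_BITS = 7                # bits per (elevation, azimuth) pair
DIFFUSENESS_BITS = 6        # bits per diffuseness value
FRAME_MS = 20               # assumption: implied by 312 bits <-> 15.6 kbit/s
DOWNMIX_KBPS = 24.4         # rate of the coded "downmix" signal

bits_per_frame = PARAM_SETS_PER_FRAME * (DOA_BITS + DIFFUSENESS_BITS)
metadata_kbps = bits_per_frame / FRAME_MS   # bits per millisecond equals kbit/s
total_kbps = DOWNMIX_KBPS + metadata_kbps

print(bits_per_frame, round(metadata_kbps, 1), round(total_kbps, 1))
```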
Based on the decoded signal, a decorrelation is carried out (block 230) so as to have a “diffuse” version (corresponding to a maximum source width); this decorrelation also achieves an increase in the number of channels so as, at the output of block 230, to obtain a 1st-order ambisonic signal with 4 channels (W, Y, Z, X). The decorrelated signal is decomposed into times/frequencies (block 240). The signals resulting from blocks 240 and 260 are combined (block 275) by sub-band, after applying a scaling factor (blocks 273 and 274) obtained from the decoded “diffuseness” (blocks 271 and 272); this adaptive mixing makes it possible to “dose” the source width and the diffuse character of the sound field in each sub-band. The mixed signal is converted into the time domain (block 280) by a filter bank or an inverse short-time transform.
The directions of sources in the DiRAC format are therefore represented in the form of 3D spherical data, typically in the form of spherical coordinates (azimuth, elevation) according to the geographical convention. In this context, there is a need to represent this DoA information efficiently, which may be formulated as a vector quantization problem on the sphere in dimension 3 (n=2).
Another example of a parametric format for immersive audio is the MASA format described in the contribution “3GPP Tdoc S4-180087: On IVAS audio formats for mobile capture devices. Source: Nokia Corporation”. The principle is summarized in
Block 310 carries out parametric analysis of the signals coming from block 300, using an approach similar to DiRAC, which provides transport channels and metadata. This MASA analysis is generally proprietary and selected by the manufacturer of the telephone. The number of transport channels is typically limited to 1 (mono) or 2 (stereo), and may be defined simply by selecting the primary microphone in the mono case or two opposite microphones (for example one at the bottom and another at the top of the telephone) in the stereo case. One example of a MASA metadata format is described in the contribution “3GPP Tdoc S4-191167 (October 2019), Description of the IVAS MASA C Reference Software, Source: Nokia Corporation”. What is of particular interest here is the parameter called “Direction index”, which is coded on 16 bits and described as follows in that document: “Direction of arrival of sound in a time-frequency interval; spherical representation with an accuracy of around 1 degree; interval of values: covers all directions with an accuracy of around 1°”.
This therefore involves a source direction (DoA) according to a 3D spherical grid whose (angular) resolution is close to 1 degree. This DoA information is provided for each frame and frequency sub-band by a DoA estimate (block 311). Block 312 (inside block 310) therefore codes this DoA information on 16 bits per DoA.
Block 320 represents the IVAS codec, which is not yet available as a 3GPP standard and is still under development. However, it has been proposed in the 3GPP for the MASA parametric format defining transport channels and metadata (including DoA per frame and sub-band) to be an input format of the IVAS codec. The (future) IVAS encoder should then implement a step of decoding the DoA information (block 321) in order to be able to fully exploit this DoA information and compress it at a lower rate. The implementation details regarding the compression of an input MASA format into an IVAS bitstream at a given rate and the associated decoding are beyond the scope of this invention, but it may for example be noted that the MASA format is based on an extended principle of DiRAC coding, the transport channels may be coded separately (by a mono core codec) or together (by a stereo core codec), and the metadata may be coded at a rate lower than in the MASA input format.
In general, any discretization of the sphere in dimension 3 may be used as a spherical vector quantization dictionary. However, without any particular structure, searching for the nearest neighbor and indexing in this dictionary may prove costly to implement, above all when the coding rate of the DoA information is high (for example 16 bits per 3D vector indicating a DoA).
One example of a 3D spherical grid is given in the Appendix and in the source code attached to the contribution “3GPP Tdoc S4-191167 (October 2019), Description of the IVAS MASA C Reference Software, Source: Nokia Corporation”.
The spatial direction of an audio source in a given frame and a given sub-band of a MASA format proposal is represented by two angles: azimuth and elevation. The notations used hereinafter are ϕ for elevation and θ for azimuth, while the opposite convention is used in the document 3GPP Tdoc S4-191167.
That document gives a definition of a spherical grid as follows:
The grid consists of Ntot=2^16−208=65328 points discretizing the surface of a 3D sphere of radius 1; each point is represented by a single index on 16 bits. This grid is defined by three stored elements:
R(1)≈0.999916868023083
It is possible to verify that the total number of points in the grid is:
The document cited above gives one method for coding a given point (ϕ, θ).
Given a point (ϕ, θ) to be coded, the quantization (search for the nearest neighbor) on the grid is carried out according to the following steps:
The quantization index (on 16 bits), denoted index here, of the selected point (sgnϕ·ϕ̂(idϕ), θ̂(idϕ, idθ)) is obtained by enumerating the points on the grid starting from the equator (all points of elevation ϕ̂(0)=0), then considering the first layer above the equator (all points of elevation +ϕ̂(1)=δϕ), then the first layer below the equator (all points of elevation −ϕ̂(1)=−δϕ), etc.
This gives an index in the form index within the interval 0, . . . , Ntot−1 where:
The cumulative cardinality values cumN are computed on the fly each time the index index is determined:
cumN(0)=Nθ(0)
The decoding method in the document cited above is explained in the flowchart in
The principle of the decoding is that of successively comparing the value index with the successive cumulative cardinality values cumN (or cardinality sums), which are computed recursively on the fly for i=0, ..., Nϕ−1, taking into account the fact that the cardinalities Nθ(i) are identical for elevations of the same absolute value (in the Northern and Southern hemispheres). The sign of the elevation sgnϕ is decoded by exploiting the predefined order in which the spherical layers are written: equator, first layer with a positive elevation (+), first layer with a negative elevation (−), ..., up to the North pole (+) and South pole (−). The values of idϕ, sgnϕ, cumN(0) are initialized (block 401).
If index≥cumN(0) (block 402), the decoding of the information is carried out for the “elevation layers” of index i>0, outside the equator. The search for the “elevation layer” is carried out in a loop, starting from i=1 up to i=Nϕ−1 (blocks 403, 404, 411). In iteration i, the cumulative cardinality is computed recursively (blocks 405, 408) and compared with the index (blocks 406, 409) in order to decode the indices (blocks 407, 410).
If index<cumN(0) (block 402), the decoding of the indices of the information is carried out for the layer corresponding to the equator (block 412).
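The enumeration order and the iterative decoding principle described above can be sketched as follows. This is an illustrative reading of the text, not the reference C code of the cited contribution; the function names and the toy table `TOY_N_THETA` are hypothetical, and the poles receive no special handling.

```python
def encode_index(id_phi, sgn, id_theta, n_theta):
    """Global index for the order: equator, +1, -1, +2, -2, ..., poles."""
    if id_phi == 0:
        return id_theta
    offset = n_theta[0]                 # all points on the equator
    for i in range(1, id_phi):          # complete layers below |id_phi|
        offset += 2 * n_theta[i]        # same cardinality North and South
    if sgn < 0:
        offset += n_theta[id_phi]       # skip the positive layer first
    return offset + id_theta

def decode_index(index, n_theta):
    """Compare index with cumulative cardinalities computed on the fly."""
    cum = n_theta[0]
    if index < cum:                     # equator layer
        return 0, 1, index
    for i in range(1, len(n_theta)):
        if index < cum + n_theta[i]:    # positive-elevation layer i
            return i, 1, index - cum
        cum += n_theta[i]
        if index < cum + n_theta[i]:    # negative-elevation layer i
            return i, -1, index - cum
        cum += n_theta[i]
    raise ValueError("index outside the grid")

# Toy grid: 8 points on the equator, then 6, 4 and 1 (pole) per hemisphere.
TOY_N_THETA = [8, 6, 4, 1]
print(decode_index(encode_index(2, -1, 3, TOY_N_THETA), TOY_N_THETA))
```

The loop in `decode_index` makes the cost visible: the number of additions and comparisons grows with the elevation layer, which is the on-the-fly accumulation that the text identifies as expensive.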
It should be noted that, in the implementation in the source code attached to the contribution 3GPP Tdoc S4-191167, a test for verifying whether i=Nϕ−1 is implemented in order to explicitly decode idϕ=Nϕ−1, sgnϕ=−1, idθ=0. This part is not adopted here because the sign sgnϕ=1 should also be possible in a grid containing the North and South poles, and it is normally unnecessary because the definition of cumN(Nϕ−1) should allow the points associated with the poles to be decoded. The specific management of the poles may be neglected; the important thing is the principle of carrying out iterative decoding by comparing the index with a cumulative cardinality (or sum of cardinalities) computed on the fly. Once the indices idϕ, sgnϕ and idθ have been decoded, the reconstruction of the spherical coordinates (sgnϕ·ϕ̂(idϕ), θ̂(idϕ, idθ)), in block 413, adopts the definition of the grid defined above with:
This method as implemented in the contribution 3GPP Tdoc S4-191167 cited above requires preliminary storage of Nϕ=122 floating-point values (ϕ̂(i)) for scalar quantization of the (positive) elevation, Nϕ integer values giving Nθ(i) values for each (positive) elevation layer, and an integer value giving Nϕ. The grid does not use all possible values of indices on 16 bits, since 208 indices (from 65328 to 65535) are unused.
The main drawback of this method is that its complexity is very high, of the order of 123 WMOPS (weighted millions of operations per second) for coding and 12 WMOPS for decoding, assuming 24 sub-bands (therefore 24 DoA per frame) and a temporal resolution of 5 ms (therefore one frame every 5 ms). This cost is high in particular because the scalar quantization of the elevation is implemented by searching in a stored dictionary and, above all, because the cumulative cardinalities cumN(i) are computed on the fly.
There is therefore a need to improve the methods from the prior art for quantizing spherical data in dimension 3, in particular in order to code DoA data efficiently, with the lowest possible complexity and without leaving indices unused for a given total number of points (or, equivalently, a given bit budget).
The invention aims to improve the prior art.
To this end, the invention targets a method for coding a spatial direction of a sound source, this direction being defined by spherical coordinates comprising an elevation coordinate and an azimuth coordinate, wherein a spherical quantization dictionary is defined on a 3D sphere by an elevation coding and an azimuth coding, and wherein:
The cumulative cardinality values used to define the spherical quantization dictionary, in particular to determine the number of quantization levels for the azimuth coordinate, are thus based on a direct estimate of the area of spherical zones, thus avoiding on-the-fly and recursive computing of the sum of cardinalities used in the method proposed in the prior art, which is highly resource-intensive.
The method proposed here is significantly less resource-intensive, and is for example of the order of 2 WMOPS for coding and 1 WMOPS for decoding.
Defining such a quantization dictionary also makes it possible to exploit all possible points (or codewords) of the dictionary so as to make the quantization more efficient and avoid having unused indices (or codewords) in the grid. The invention is applied in particular in order to implement a more efficient method for coding and decoding DoA information on 16 bits to define the MASA format at input of an IVAS coding.
In one embodiment, the elevation coding includes levels corresponding to the equator and to the poles of the 3D sphere, thereby making it possible to include all particular points (equator and poles) of the sphere in the quantization dictionary.
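In one embodiment, a number of points for the azimuth coding is predetermined for the elevation level corresponding to the equator, and the total number of points is obtained by subtracting, from a target number of points, the predetermined number of points corresponding to the equator and each of the North and South poles of the sphere, according to the following expression: Ntot′=Ntot−Nθ(0)−2Nθ(Nϕ−1), Ntot being the target number of points of the sphere for a given bit budget, Nθ(0) the predetermined number of points for the elevation level corresponding to the equator, and 2Nθ(Nϕ−1) the predetermined number of points for the North and South poles of the sphere.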
The method is thus adapted to the knowledge of the number of points for certain particular spherical layers, such as the one corresponding to the equator and those corresponding to the poles, which may be defined at a fixed value.
In one particular embodiment, the cumulative cardinality value for a coded elevation index is representative of a number of points proportional to the total number of points according to the area (Ai) of a spherical zone delimited by the upper horizontal plane of the positive elevation level of the coded elevation index and this same plane of the sphere symmetrical with respect to the equator minus the area (A0) corresponding to the elevation level of the equator, according to the following ratio:
$$\frac{A_i - A_0}{A_{N_\phi-2} - A_0}$$
Nϕ−2 being the number of elevation quantization levels without the equator and the North and South poles of the sphere and $A_{N_\phi-2}$ the area of this same spherical zone for the elevation index Nϕ−2.
In one variant embodiment, the cumulative cardinality value for a coded elevation index is representative of a number of points proportional to the total number of points according to the area (A′i) of a spherical zone delimited by the upper horizontal plane of the positive elevation level of the coded elevation index and that of the equator minus half the area corresponding to the elevation level of the equator, according to the following ratio:
Nϕ−2 being the number of elevation quantization levels without the equator and the North and South poles of the sphere and A′N
These ratios of areas of spherical zones make it possible to estimate, easily and directly through a simple rule of three, the number of points in the corresponding spherical zones that are subsets of the complete surface of the 3D sphere.
These ratios make it possible to express cumulative cardinality values as follows:
In one embodiment, the elevation coding gives a coded elevation index (i) on a number of elevation levels (Nϕ) and sign information.
Thus, only one hemisphere is considered to define the quantization dictionary, the number of elevation levels and the number of points per level being symmetrical about the equator.
In one embodiment, a global quantization index to be transmitted (index) is determined based on an azimuth index coded by scalar quantization on the determined number of points per level (Nθ(i)) and a cumulative cardinality value obtained based on at least the coded elevation index.
The cardinality values thus defined may be estimated directly (analytically) in order to define the global index to be transmitted, thereby making it possible to reduce the maximum computational complexity.
The invention also relates to a method for decoding a spatial direction of a sound source, this direction being defined by spherical coordinates comprising an elevation coordinate and an azimuth coordinate, wherein a spherical quantization dictionary is defined on a 3D sphere by an elevation coding and an azimuth coding, and wherein:
The decoding method has the same advantages as the coding method, and makes it possible to optimize computing resources by using an optimized spherical quantization dictionary.
In the same way as for the coding and according to the same advantages, in one embodiment, the elevation decoding includes levels corresponding to the equator (0°) and to the poles (+/−90°) of the 3D sphere.
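According to one particular embodiment, a number of points (Nθ(0)) for the azimuth decoding is predetermined for the elevation level corresponding to the equator, and the total number of points (Ntot′) is obtained by subtracting, from a target number of points (Ntot=2^16), the predetermined number of points corresponding to the equator and each of the North and South poles of the sphere, according to the following expression: Ntot′=Ntot−Nθ(0)−2Nθ(Nϕ−1), Ntot being the target number of points of the sphere for a given bit budget, Nθ(0) the predetermined number of points for the elevation level corresponding to the equator, and 2Nθ(Nϕ−1) the predetermined number of points for the North and South poles of the sphere.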
In one embodiment, the cumulative cardinality value (cumN(i)) for a decoded elevation index (i) is representative of a number of points proportional to the total number of points according to the area (Ai) of a spherical zone delimited by the upper horizontal plane of the positive elevation level of the decoded elevation index (i) and this same plane of the sphere symmetrical with respect to the equator, minus the area (A0) corresponding to the elevation level of the equator, according to the following ratio:
$$\frac{A_i - A_0}{A_{N_\phi-2} - A_0}$$
Nϕ−2 being the number of elevation quantization levels without the equator and the North and South poles of the sphere and $A_{N_\phi-2}$ the area of this same spherical zone for the elevation index Nϕ−2.
In one possible example, the expression for the cumulative cardinality value is as follows:
with
According to one embodiment, the elevation decoding gives a decoded elevation index (i) on a number of elevation levels (Nϕ) and sign information.
In one embodiment, the decoding comprises receiving a global quantization index (index) and determining, based on this index, a cumulative cardinality value obtained on the basis of at least the decoded elevation index and a decoded azimuth index on a determined number of points per level (Nθ(i)).
The invention targets a coding device comprising a processing circuit for implementing the steps of the coding method as described above.
The invention also targets a decoding device comprising a processing circuit for implementing the steps of the decoding method as described above.
The invention relates to a computer program comprising instructions for implementing the coding or decoding methods as described above when they are executed by a processor. Finally, the invention relates to a storage medium able to be read by a processor and storing a computer program comprising instructions for executing the coding method or the decoding method described above.
Other features and advantages of the invention will become more clearly apparent on reading the following description of particular embodiments, which are given by way of mere illustrative and non-limiting examples, and the appended drawings, in which:
The invention described below relates to the quantization of spherical data in dimension 3. This is applicable, by way of example, to the coding and decoding of spatial directions of sound sources (DoA), for coding and decoding for example of MASA data as described later with reference to
Without loss of generality, the definition of 3D spherical coordinates in degrees in line with the geographical convention that is used in the description of the MASA format reference proposal will be adopted here.
The radius, which is set to 1 here, will be omitted, keeping only the azimuth and the elevation in the case of coding a direction of sources (or DoA), as in a DiRAC or MASA scheme. In some variants and for certain applications (for example quantization of a sub-band in transform-based coding), it will be possible to code a radius separately (corresponding to a mean amplitude level per sub-band for example).
In some variants, units other than degrees (for example radians) will be used, and conventions other than the geographical convention will be used; for example elevation may be replaced with colatitude. Thus, other equivalent spherical coordinate systems (obtained for example by permuting or inverting Cartesian coordinates) may be used according to the invention; it will be sufficient to apply the necessary conversions in the definition of the scalar quantization dictionaries, the reconstruction, etc. The coding and the decoding according to the invention are applicable to all definitions of spherical coordinates, and it is thus possible to replace ϕ, θ with other spherical coordinates by adapting the conversion between Cartesian coordinates and spherical coordinates.
This discretization uses a number of positive levels (Nϕ) for the Northern hemisphere and an elevation sign indication (indicating the Northern or Southern hemisphere), which is tantamount to 2Nϕ−1 levels for coding elevation in both the Northern and Southern hemispheres of the 3D sphere.
As illustrated in
It should be noted that
The azimuth coding uses a scalar quantization, according to the number of azimuth levels (Nθ(i)) (also called number of points per level) depending on the coded (positive or absolute) elevation index (i=0, . . . , Nϕ−1), this number of levels Nθ(i) being symmetrical about the equator for the Northern and Southern hemispheres.
The determination of this number of azimuth levels is described below.
In order not to overload this figure, the subdivision of these horizontal layers into equally distributed “regions” according to the discretization of the azimuth with a number Nθ(i) of azimuth levels depending on the elevation level is not shown.
The spherical grid according to the invention discretizes the elevation and the azimuth separately by scalar quantization, with a uniform discretization of the azimuth according to a number of levels Nθ(i) depending on the (positive or absolute) coded elevation value i. However, the optimum search for the nearest neighbor in the grid involves selecting two elevation candidates, and therefore also two associated azimuth candidates in order to select the best candidate; this is therefore tantamount to joint coding even though it is separate in practice, and the actual decision regions of the grid (in terms of Voronoi regions on the surface of the sphere) are therefore not spherical rectangles. For indexing purposes (coding of the global index and decoding), the discretization of the surface of the 3D sphere may nevertheless be seen as a separate elevation and azimuth division to obtain spherical rectangles (excluding caps at the poles).
The coordinates ϕ and θ are coded separately, with a scalar quantization dictionary {s·ϕ̂(idϕ), idϕ=0, ..., Nϕ−1, s=+1 or −1} with Nϕ levels for |ϕ| and with sign information s (indicating the Northern hemisphere for s=+1 or Southern hemisphere for s=−1) and a set of uniform scalar quantization dictionaries {θ̂(i, j), j=0, ..., Nθ(i)−1} with Nθ(i) levels for θ according to the coded (positive or absolute) elevation index i.
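The two-candidate search mentioned above can be sketched on a toy grid. This is a hedged illustration only: the elevation levels, the numbers of azimuth levels, the absence of azimuth offsets and the function names are all assumptions made for the example, and the best candidate is taken as the one with the largest dot product with the input direction.

```python
import math

def unit_vector(elev_deg, azim_deg):
    """Unit vector for a direction given in degrees."""
    e, a = math.radians(elev_deg), math.radians(azim_deg)
    return (math.cos(e) * math.cos(a), math.cos(e) * math.sin(a), math.sin(e))

def quantize_azimuth(azim_deg, n):
    """Uniform cyclic quantization on n levels j*360/n (no offset here)."""
    step = 360.0 / n
    j = round(azim_deg / step) % n
    return j, j * step

def nearest_on_grid(elev_deg, azim_deg, levels, n_theta):
    """Try the two elevation levels bracketing |elevation|, keep the closest."""
    sgn = 1 if elev_deg >= 0 else -1
    a = abs(elev_deg)
    hi = next((i for i, lv in enumerate(levels) if lv >= a), len(levels) - 1)
    lo = max(hi - 1, 0)
    target = unit_vector(elev_deg, azim_deg)
    best = None
    for i in (lo, hi):
        j, az = quantize_azimuth(azim_deg, n_theta[i])
        cand = unit_vector(sgn * levels[i], az)
        dot = sum(t * c for t, c in zip(target, cand))  # cosine of the angle
        if best is None or dot > best[0]:
            best = (dot, i, sgn, j)
    return best[1:]

TOY_LEVELS = [0.0, 30.0, 60.0, 90.0]   # hypothetical positive elevations
TOY_N_THETA = [12, 4, 4, 1]            # hypothetical azimuth level counts
print(nearest_on_grid(16.0, 40.0, TOY_LEVELS, TOY_N_THETA))
```

In this toy case the elevation 16° is closer to the level 30° than to 0°, yet the equator candidate wins because its azimuth resolution is finer; this is why the decision regions are not spherical rectangles.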
The total number of points of the sphere discretized according to the various determined numbers of levels, also called the total number of points in the 3D grid, is given, in one particular embodiment, by:
$$N_{tot} = N_\theta(0) + 2\sum_{i=1}^{N_\phi-1} N_\theta(i)$$
This number includes the points Nθ(0) on the elevation level (the spherical layer) corresponding to the equator (C0, idϕ=0) and the points Nθ(i) of each positive elevation level of index i=1, . . . , Nϕ−1, also called elevation layer Ci, these being symmetrical between the Northern and Southern hemisphere and therefore counted in duplicate.
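The count described above (equator once, every other layer twice) can be written in one line. The table used here is a hypothetical toy example, not the grid of the invention.

```python
def total_points(n_theta):
    """Equator counted once, each positive layer counted for both hemispheres."""
    return n_theta[0] + 2 * sum(n_theta[1:])

def total_elevation_levels(n_phi):
    """Signed elevation levels: 2*N_phi - 1, the equator being shared."""
    return 2 * n_phi - 1

# Hypothetical toy table: equator, two intermediate layers, pole.
print(total_points([8, 6, 4, 1]), total_elevation_levels(4))
```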
The spherical grid is therefore defined as the following spherical vector quantization dictionary (with a radius assumed to be equal to 1 by convention):
It should be noted that the value of s for i=0 is arbitrary, since ϕ̂(i=0)=0.
In the preferred embodiment, a 3D grid is defined for a given bit budget, for example 16 bits, thus giving a total number of points of the sphere, that is to say Ntot=2^16. In some variants, other values of Ntot (and therefore bit budget values) will be possible.
The elevation is coded by scalar quantization on Nϕ reconstruction levels. In the preferred embodiment, Nϕ=122 is set as the number of positive levels, as in the grid in the MASA format described above. This makes it possible in particular to have an even number of levels in the Northern hemisphere (Nϕ including the North pole and the equator). If also taking into account the Southern hemisphere, the elevation is therefore coded on 2Nϕ−1 levels (counting the equator only once). The inclusion of the poles allows a complete representation of the sphere, and the impact is minimal since only 2 points of the grid are associated with the poles (when the sign is applied).
For the elevation coding by scalar quantization, a uniform quantization step δϕ (outside poles) is defined and the following is adopted:
$$\hat{\phi}(i) = \begin{cases} i\,\delta_\phi, & i = 0, \ldots, N_\phi-2 \\ 90, & i = N_\phi-1 \end{cases}$$
with for example δϕ=0.7388 degrees, as in the grid in the MASA format described above. The quantization step is uniform over the interval [−ϕ̂(Nϕ−2), ϕ̂(Nϕ−2)] or [−(Nϕ−2)δϕ, (Nϕ−2)δϕ], if the sign is taken into account.
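The elevation dictionary can be sketched as follows, assuming levels ϕ̂(i)=i·δϕ below the pole (which is what the uniform interval stated above implies) and a last level at 90°. The exhaustive nearest-level search is chosen for clarity only and the function names are hypothetical.

```python
N_PHI = 122          # number of positive levels, from the text
DELTA_PHI = 0.7388   # quantization step in degrees, from the text

def elevation_levels(n_phi=N_PHI, step=DELTA_PHI):
    """Positive elevation levels: uniform below the pole, then 90 degrees."""
    return [i * step for i in range(n_phi - 1)] + [90.0]

def quantize_elevation(elev_deg, levels):
    """Return (index of the nearest positive level, sign)."""
    sgn = 1 if elev_deg >= 0 else -1
    a = abs(elev_deg)
    idx = min(range(len(levels)), key=lambda i: abs(levels[i] - a))
    return idx, sgn

levels = elevation_levels()
print(len(levels), round(levels[-2], 4), levels[-1])
print(quantize_elevation(-1.0, levels), quantize_elevation(89.5, levels))
```

Because the step is uniform below the pole, the search could be replaced with a division and a rounding followed by a comparison with the pole; the list-based version is kept here so that other dictionaries can be substituted.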
The azimuth θ is coded by scalar quantization on Nθ(i) levels. Use is preferably made of a uniform scalar quantization with a uniform scalar quantization dictionary, taking into account the cyclic nature of the interval [−180,180] degrees:
The azimuth dictionaries have an offset, as it is known, set to 0 for even values of i and
for odd values of i, in order to “shift” the “horizontal slice” (spherical layer) of the sphere (delimited by the elevation decision thresholds) associated with each elevation of index i such that the coded azimuths are aligned as little as possible from one successive layer to another.
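A uniform cyclic azimuth quantizer with a layer-dependent offset can be sketched as follows. Two points are assumptions made for illustration and are not confirmed by the text: the levels are taken as −180 + j·360/Nθ(i) plus the offset, and the offset for odd i is taken as half a quantization step. The function names are hypothetical.

```python
def azimuth_offset(i, n):
    """Assumed offset: 0 for even layers, half a step for odd layers."""
    return 0.0 if i % 2 == 0 else 180.0 / n

def azimuth_level(i, j, n):
    """Reconstruction level j of layer i with n levels, in degrees."""
    return -180.0 + j * 360.0 / n + azimuth_offset(i, n)

def quantize_azimuth(azim_deg, i, n):
    """Nearest level index; the modulo handles the wrap-around at +/-180."""
    step = 360.0 / n
    return round((azim_deg + 180.0 - azimuth_offset(i, n)) / step) % n

# 179 degrees wraps to level 0 (-180 degrees) on an even layer with 4 levels.
print(quantize_azimuth(179.0, 0, 4), azimuth_level(0, 0, 4))
# On an odd layer the levels are shifted by half a step (45 degrees here).
print(quantize_azimuth(-130.0, 1, 4), azimuth_level(1, 0, 4))
```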
In some variants, a uniform scalar quantization over the interval [0,90] degrees (including the values 0 and 90 as reconstruction levels) may be used for the elevation coding:
$$\hat{\phi}(i) = \frac{90\, i}{N_\phi-1}, \quad i = 0, \ldots, N_\phi-1$$
This is tantamount to changing the quantization step δϕ in order to have δϕ=90/(Nϕ−1), that is to say δϕ≈0.7438 when Nϕ=122; in this case, the poles are naturally included as codewords. The quantization step is uniform over the interval [−ϕ̂(Nϕ−1), ϕ̂(Nϕ−1)], that is to say [−90,90] degrees, if the sign is taken into account.
In other variants, it is possible to change the number of levels Nϕ or to take other definitions from the scalar quantization dictionary {ϕ̂(i), i=0, ..., Nϕ−1} for the (positive or absolute) elevation. It will however be assumed that ϕ̂(i=0)=0° and ϕ̂(i=Nϕ−1)=90°.
In other variants, the offset applied to the azimuth depending on the elevation layer may be different, the important aspect being that the number of azimuth levels is defined according to the invention.
According to the invention, the number of points per level (Nθ(i)) is determined on the basis of two successive cumulative cardinality values (cumN(i), cumN(i−1)), the cumulative cardinality value (cumN(i)) for a coded elevation index (i) being representative of a number of points proportional to a total number of points, and according to the area of a spherical zone comprising at least one zone delimited by the upper horizontal plane of the given positive elevation level (i) and a lower horizontal plane (for example ϕ=δϕ/2). In the preferred embodiment, this spherical zone also comprises the symmetrical part in the Southern hemisphere, comprising a zone delimited by the upper horizontal plane of the given elevation level (i) and a lower horizontal plane (ϕ=−δϕ/2). The notation cumN(i) is adopted here, but it should not be confused with that used previously in the description of the prior art.
In one particular embodiment, the number of azimuth levels Nθ(i) is determined by predefined values for Nθ(0) and Nθ(Nϕ−1), which correspond to the equator and to one of the poles, respectively.
These predetermined numbers of points (Nθ(0) and Nθ(Nϕ−1)) are taken into account when determining the cumulative cardinality values and to define the total number Ntot′ of points used for this determination.
In this embodiment, the total number of points (Ntot′) is obtained by subtracting, from a target number of points (Ntot=2¹⁶, that is to say 65536), the predetermined numbers of points corresponding to the equator and to each of the North and South poles of the sphere, according to the following expression: Ntot′=Ntot−Nθ(0)−2Nθ(Nϕ−1), Ntot being the target number of points of the sphere for a given bit budget, Nθ(0) the predetermined number of points for the elevation level corresponding to the equator, and 2Nθ(Nϕ−1) the predetermined number of points for the North and South poles of the sphere.
In the main embodiment, Nθ(0) is an even value, Nθ(Nϕ−1)=1 and therefore when Ntot is even, Ntot′ is also even.
In one example illustrated in
of the given positive elevation level (i) and this same plane of the sphere symmetrical with respect to the equator
is illustrated by the hatched zone Ai. The area corresponding to the elevation level of the equator, shown at A0, is subtracted from this area in order to determine the cumulative cardinality value (cumN(i)) of the given positive elevation level (i).
For this purpose, a number of points is estimated based on the ratio (Ai−A0)/(AN
By design, this ratio gives exactly Ntot′ when i=Nϕ−2, thereby guaranteeing that the total number of points is used in full.
Since this ratio is generally a fractional number, it will have to be rounded to obtain the cumulative cardinality, and since the number of points is determined in the main embodiment for both the Northern and Southern hemispheres (thus in duplicate), the rounding will in this case be carried out to an even integer (the nearest lower or higher one).
It will be recalled that the surface area of an element around the point (θ, ϕ) on the sphere S² is given by dA=r² cos ϕ dθ dϕ, where ϕ here is the elevation (if the colatitude were used, this would give a term in sin ϕ). The partial surface area of a spherical zone delimited by two horizontal planes defined by an elevation interval [ϕmin, ϕmax], where −90°≤ϕmin<ϕmax≤90°, the azimuth being over [−180°, 180°], is given by: A(ϕmin, ϕmax)=2πr²(sin ϕmax−sin ϕmin).
In particular, this gives the known result that the surface area of the sphere S² of radius r is Atot=A(−90°, 90°)=4πr² (for ϕmin=−90° and ϕmax=90°).
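The area of a spherical zone follows from integrating the surface element r² cos ϕ dθ dϕ over the full azimuth range, which gives 2πr²(sin ϕmax−sin ϕmin). The short Python sketch below evaluates this closed form and checks the full-sphere case numerically; the function name zone_area is hypothetical.

```python
import math


def zone_area(phi_min_deg, phi_max_deg, r=1.0):
    """Area of the spherical zone between two elevations (degrees), full azimuth range."""
    s_max = math.sin(math.radians(phi_max_deg))
    s_min = math.sin(math.radians(phi_min_deg))
    return 2.0 * math.pi * r * r * (s_max - s_min)


full_sphere = zone_area(-90.0, 90.0)      # equals 4*pi*r^2 for r = 1
north_cap = zone_area(60.0, 90.0)         # cap above 60 degrees of elevation
symmetric_band = zone_area(-30.0, 30.0)   # band symmetrical about the equator
```

A band that is symmetrical about the equator and bounded by ±ϕ has area 4πr² sin ϕ, which is why the cumulative cardinality values discussed here depend on the sine of the elevation decision thresholds.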
For a remaining number of points Ntot′ in the spherical grid (or spherical vector quantization dictionary) to be distributed in a spherical zone (subset of the surface of the 3D sphere) delimited by the horizontal planes
outside the central zone corresponding to the equator
each decision region associated with a point of the grid is approximated here by a “spherical rectangle” for indexing purposes (this corresponding to a separate coding decision in relation to the spherical coordinates). Each of these regions should ideally have a surface area of 4πr²/Ntot′ if the grid is uniform.
For a uniform discretization of the elevation over the interval [−(Nϕ−2)δϕ, (Nϕ−2)δϕ], as in the main embodiment, it is therefore possible to estimate the number of points on the grid contained within a spherical zone (or “spherical slice”) delimited by two horizontal planes associated with the decision thresholds
of the positive part (Northern hemisphere) of the sphere.
According to the ratio expressed above, expressing a simple rule of three
it is possible to express
and with
In one exemplary embodiment, the following is set:
Nθ(0)=430
In some variants, the value of Nθ(0) may be different but even.
Moreover, by convention Nθ(Nϕ−1)=1 is set, since a single point is sufficient to represent a pole.
The number of points per elevation level i is expressed by:
where
cumN(0)=0
and Arri( ) is a rounding to the (nearest lower or higher) integer depending on i. In the preferred embodiment, Arr1( ) is taken as the rounding to the upper integer, and Arri( ) is taken as the rounding to the closest integer for i=2, . . . , Nϕ−2.
It should be noted that the function 2Arri(x/2) corresponds in fact to a rounding to a (nearest lower or higher) even integer, thereby making it possible to divide the result by two in order to assign the integer half to each of the hemispheres.
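The rounding 2Arri(x/2) to an even integer may be illustrated by the following Python sketch. The helper names round_nearest and to_even are hypothetical, and rounding halves upward is an assumption of this example (Python's built-in round() rounds halves to the nearest even integer, so it is avoided here).

```python
import math


def round_nearest(x):
    """Round to the closest integer, halves rounded up (assumption of this sketch)."""
    return math.floor(x + 0.5)


def to_even(x, arr=round_nearest):
    """Compute 2*Arr(x/2): an even integer close to x, so that half of it is an integer."""
    return 2 * arr(x / 2.0)


examples = [to_even(7.3), to_even(6.9), to_even(6.1, arr=math.ceil)]
```

Each result can be divided by two without remainder, so the same integer number of points can be assigned to the Northern and Southern hemispheres.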
It should be noted that, by definition,
Moreover, it should be noted that the notation cumN(i) is adopted here even though it is different from the one used previously in the description of a MASA format proposed in the prior art. Indeed, here cumN(i) corresponds to the cumulative cardinality of the spherical grid up to and including elevation layer i (with the layers in the Northern and Southern hemispheres), but not counting the equator. The definition of cumN(i) for i=1, . . . , Nϕ−2 in the description of the invention therefore corresponds to the equivalent of cumN(2i)−Nθ(0) in the definition from the prior art.
In variants where other quantization levels are defined for the elevation, the definition of cumN(i) according to sin
i=0, . . . , Nϕ−2, will be adapted by replacing
with other corresponding decision thresholds in the form
Thus, more generally, it will be possible to write:
One example of values obtained for the preferred embodiment is given below:
It may easily be verified that:
Nθ(0)+2·Σ Nθ(i)=Ntot, the sum being taken over i=1, . . . , Nϕ−1.
The cardinality Nθ(i) according to the invention thus makes it possible to guarantee that there is no unused index for a given total number Ntot. This property stems from the fact that the cumulative cardinality cumN(i) is defined such that cumN(Nϕ−2)=Ntot′.
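The construction of the cumulative cardinality values and of Nθ(i) may be sketched as follows. Since the exact expressions are not reproduced in this text, the sketch relies on assumptions that are labeled as such: the decision thresholds are taken as (i+1/2)δϕ, the step δϕ=90/121 is borrowed from the uniform variant, and cumN(i) is taken as 2Arri(x/2) with x=Ntot′·(Ai−A0)/(A(Nϕ−2)−A0), the areas being proportional to the sine of the thresholds. The function name build_grid is hypothetical, and the resulting values are not claimed to match those of the preferred embodiment.

```python
import math


def build_grid(n_tot=2**16, n_phi=122, n_eq=430, n_pole=1, step_deg=90.0 / 121):
    """Sketch of cumN(i) and N_theta(i); thresholds (i + 1/2)*step_deg are assumed."""
    n_rem = n_tot - n_eq - 2 * n_pole            # Ntot' = Ntot - Ntheta(0) - 2*Ntheta(Nphi-1)

    def sin_threshold(i):
        # Sine of the assumed upper decision threshold of elevation level i
        return math.sin(math.radians((i + 0.5) * step_deg))

    s0, s_last = sin_threshold(0), sin_threshold(n_phi - 2)
    cum_n = [0]                                  # cumN(0) = 0
    for i in range(1, n_phi - 1):
        half = 0.5 * n_rem * (sin_threshold(i) - s0) / (s_last - s0)
        # Arr_1 rounds up, Arr_i rounds to the closest integer for i >= 2
        arr = math.ceil(half) if i == 1 else math.floor(half + 0.5)
        cum_n.append(2 * arr)                    # even value shared by both hemispheres
    # Points per level: equator, then half of each cumulative increment, then the pole
    n_theta = [n_eq]
    n_theta += [(cum_n[i] - cum_n[i - 1]) // 2 for i in range(1, n_phi - 1)]
    n_theta.append(n_pole)
    return cum_n, n_theta


cum_n, n_theta = build_grid()
total = n_theta[0] + 2 * sum(n_theta[1:])
```

Under these assumptions, the last cumulative value equals Ntot′ and the total number of points equals Ntot, so that no index is left unused.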
In another exemplary embodiment, predetermined numbers are not set for the elevation levels corresponding to the equator.
In this case, the cumulative cardinality value is a rounded value of the following ratio:
with for example Ntot=2¹⁶.
This then gives (outside the poles)
One example is given below of values obtained for this variant definition of cumN(i) in the case where Nϕ=122 and δϕ is defined according to the preferred embodiment:
In this case too, it may easily be verified that:
In variants where other quantization levels are defined for the elevation, the definition of cumN(i) according to sin
i=0, . . . , Nϕ−2, will be adapted by replacing
with other corresponding decision thresholds in the form
In other variants, predetermined numbers are not set for the elevation levels corresponding to the equator (Nθ(0)) or to the North and South poles (Nθ(Nϕ−1)). This variant applies in particular to the case where the scalar quantization of ϕ is uniform over the interval [0,90] degrees (including the values 0 and 90 as reconstruction levels) with:
In this case, it is possible to define:
this gives:
Nθ(0)=cumN(0)
and
One example is given below of values obtained for this variant definition of cumN(i) in the case where Nϕ=122 and δϕ is defined according to the preferred embodiment:
In this case too, it may easily be verified that:
In a second example illustrated in
of the given positive elevation level (i) and that of the equator is illustrated by the zone denoted A′i. Half of the area corresponding to the elevation level of the equator, shown at A0, is subtracted from this area in order to determine the cumulative cardinality value (cumN(i)) of the given positive elevation level (i).
For this purpose, a number of points is estimated based on the ratio (A′i−A0/2)/(A′N
with A′N
of the positive elevation level Nϕ−2 and that of the equator.
The result is equivalent to what was described above, since
In one variant embodiment, the cumulative cardinality value may be expressed taking into account only the number of points of the positive part of the sphere.
In this scenario, a number of points is estimated based on the ratio (A′i−A0/2)/(A′N
The expression of the cumulative cardinality value is given for example by:
where Arri( ) is a rounding to the (nearest lower or higher) integer depending on i. In the preferred embodiment, Arr1( ) is taken as the rounding to the upper integer, and Arri( ) is taken as the rounding to the closest integer for i=2, . . . , Nϕ−2.
The number of points per elevation level i is then expressed by:
The quantization of the spherical coordinates and the search are carried out as follows:
The quantization indices selected in E206 correspond to the selected point: (sgnϕ·ϕ̂(idϕ), θ̂(idϕ, idθ)).
The indexing step E207 consists, based on the information sgnϕ, idϕ and idθ, in determining a unique index 0≤index<Ntot to be transmitted.
In this step, a global quantization index is determined on the basis of the separate indices resulting from the separate quantization of the spherical coordinates for the selected closest point.
This step is now described with reference to
In other cases,
It will be recalled here that the term
corresponds to the cumulative cardinality of the spherical grid up to and including elevation layer i (with the layers in the Northern and Southern hemispheres, and with the equator).
The index of a point (codeword) in elevation layer i being of the form:
the value of offset must therefore correspond to the cumulative cardinality up to the first point (codeword)—exclusive—of elevation layer i. In addition, the positive elevation layer i (Northern hemisphere) comes, by convention, before the negative elevation layer i, but these two layers have the same number of points Nθ(i).
The value offset is thus given by the cumulative cardinality including these positive and negative layers of index i, but subtracting either 2Nθ(i) when it is the positive layer or Nθ(i) when it is the negative layer.
For the coding of a point (or codeword) in this same layer, it is possible, as an equivalent, to define:
Here, the term
gives the cumulative cardinality up to the elevation layer of index i−1. This value corresponds directly to the value offset for the positive elevation layer of index i, and must be corrected by Nθ(idϕ) for the negative layer of index i.
According to the invention, this analytical method for determining the value offset gives the same result as the following sum, but with reduced complexity because the determination is more direct when Nϕ is high (for example Nϕ=122):
The global index index is thus obtained in E606, through separate coding of the separate quantization indices sgnϕ, idϕ and idθ of the best candidate and through the use of the corresponding cumulative cardinality values.
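The indexing described above may be sketched as follows, with the offset obtained by direct summing and, equivalently, from the cumulative cardinality including the layers of index i. This is a minimal sketch: the function names and the small table TOY_N_THETA are hypothetical, and placing the North pole before the South pole is an assumption consistent with the convention that the positive layer comes first.

```python
def layer_offset(n_theta, id_phi, sgn_phi):
    """Index of the first point of layer (id_phi, sgn_phi), by direct summing."""
    if id_phi == 0:
        return 0                                  # the equator comes first
    # Equator, then the positive and negative layers 1 .. id_phi-1
    offset = n_theta[0] + 2 * sum(n_theta[1:id_phi])
    if sgn_phi < 0:
        offset += n_theta[id_phi]                 # negative layer follows the positive one
    return offset


def layer_offset_from_cumulative(n_theta, id_phi, sgn_phi):
    """Same value, from the cumulative cardinality including both layers of index id_phi."""
    if id_phi == 0:
        return 0
    cum_incl = n_theta[0] + 2 * sum(n_theta[1:id_phi + 1])
    return cum_incl - (2 if sgn_phi > 0 else 1) * n_theta[id_phi]


def global_index(n_theta, sgn_phi, id_phi, id_theta):
    """index = offset + id_theta"""
    return layer_offset(n_theta, id_phi, sgn_phi) + id_theta


# Hypothetical toy grid: 8 points on the equator, layers of 6 and 4 points, 1 point per pole
TOY_N_THETA = [8, 6, 4, 1]
toy_total = TOY_N_THETA[0] + 2 * sum(TOY_N_THETA[1:])
```

With this toy table, the 30 points receive the indices 0 to 29 without gaps: the equator occupies 0 to 7, the layers of index 1 occupy 8 to 19, those of index 2 occupy 20 to 27, and the poles take the last two indices.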
It should be noted that the determination of the value offset is described here for the interval Nθ(0)≤index<Ntot−2. In some variants, this interval may be divided into sub-intervals of indices, and the value of offset may be determined either analytically or by direct summing, with Nθ(i) defined according to the invention, depending on the sub-interval under consideration.
In one embodiment, it will be possible to use pre-storage (tabulation) of the cumulative cardinality values offset according to idϕ and sgnϕ, which gives (analytically or by direct summing) the result of the cumulative sum of cardinalities of successive spherical layers (or “sets of horizontal slices”). This sum may be interpreted as the cardinality of a spherical zone (the number of points of the partial grid ranging from the elevation of index 0 to the elevation of index i, alternating between Northern and Southern hemisphere).
In some variants, it will be possible not to store the values offset according to idϕ and sgnϕ, but to compute them “online” (on the fly) based on the definition of offset as the cumulative sum of Nθ(i) with the correction on the basis of idϕ and sgnϕ.
However, this adds computational complexity that may be non-negligible if the grid contains a large number of elevation levels (Nϕ high).
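The trade-off between tabulation and on-the-fly computation may be illustrated by the following sketch, in which the offsets are computed once for all layers and then simply looked up. The function name tabulate_offsets, the dictionary layout and the toy table are hypothetical choices for this example.

```python
from itertools import accumulate


def tabulate_offsets(n_theta):
    """Pre-store offset for every (id_phi, sgn_phi), computed once from N_theta(i)."""
    # cum_incl[i]: points up to and including layers +i and -i (equator included)
    sizes = [n_theta[0]] + [2 * n for n in n_theta[1:]]
    cum_incl = list(accumulate(sizes))
    table = {(0, 1): 0}
    for i in range(1, len(n_theta)):
        table[(i, 1)] = cum_incl[i] - 2 * n_theta[i]   # positive layer
        table[(i, -1)] = cum_incl[i] - n_theta[i]      # negative layer
    return table


# Hypothetical toy grid, as an illustration only
offsets = tabulate_offsets([8, 6, 4, 1])
```

The table holds one entry for the equator and two entries per other elevation level, that is 2Nϕ−1 values, in exchange for avoiding a cumulative sum at each coding or decoding operation.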
In some variants, it will be possible to replace offset using the definition of cumN′ and taking into account the fact that the cardinality in this case corresponds to a hemisphere.
The corresponding decoding method is now described with reference to
As for the coding, the spherical quantization dictionary is defined on a 3D sphere by an elevation decoding and an azimuth decoding. This spherical quantization dictionary is illustrated and described with reference to
In the same way as for the coding, the elevation decoding uses a scalar quantization, giving at least one decoded elevation index (i) on a number of elevation levels (Nϕ), and the azimuth decoding uses a scalar quantization according to a number of points per level (Nθ(i)) depending on the decoded elevation index (i). The number of points per level (Nθ(i)) is determined on the basis of two successive cumulative cardinality values (cumN(i), cumN(i−1)), the cumulative cardinality value (cumN(i)) for a decoded elevation index (i) being representative of a number of points proportional to a total number of points and according to the area of a spherical zone comprising at least one zone delimited by the upper horizontal plane
of the positive elevation level of the decoded elevation index (i) and a lower horizontal plane of the sphere.
First of all, indexing described with reference to
Given the global index index in step E210 of
In step E212-1, as in coding step E203-1, a number of scalar quantization levels Nϕ is determined. In the main embodiment, this step is tantamount to simply setting Nϕ=122.
The elevation information sgnϕ, idϕ is decoded in E213-1. This step is detailed later with reference to
The decoded elevation is reconstructed in E214-1 as sgnϕ·ϕ̂(idϕ) where
In some variants, other uniform or non-uniform quantization dictionaries {ϕ̂(i)} will be possible, in a manner identical to the coding.
The azimuth index idθ is decoded in E213-2. This step is detailed later with reference to
The value of the offset is determined as defined in the coding, and the azimuth θ̂(idϕ, idθ) is reconstructed in E214-2 as:
This gives in particular θ̂(idϕ=Nϕ−1, idθ=0)=−180 with Nθ(idϕ=Nϕ−1)=1. This thus gives the spherical coordinates (ϕ̂(i), θ̂(i, j)) of the decoded point in E215.
Steps E213-1 and E213-2 are detailed together in
Based on the global index 0≤index<Ntot to be decoded (E700), the sign information sgnϕ=1 is set by default (E701). If the index satisfies index<Nθ(0), this indicating that it is a point on the equator (E702), direct decoding is carried out:
Otherwise, if the index satisfies index≥Ntot−2, this indicating that it is a point on the North or South pole (E704), direct decoding is carried out:
Otherwise, in the other cases (Nθ(0)≤index<Ntot−2), in one preferred embodiment, the index idϕ is for example estimated by inverting the analytical computation carried out in step E605 of
It is possible to estimate idϕ as:
In some variants, an approximation of the arcsine function is used.
The following is adopted (in E708):
In some variants, other approximations of the arcsine function may be used, in particular other polynomials P(x) of a different degree may be used.
It should be noted that the estimate of idϕ used above has the noteworthy property of being accurate to within one unit: in general, it gives the correct value of idϕ, and if not, idϕ is underestimated by one unit.
In some variants, other estimates of idϕ (exact or to within a value) may be used.
In the preferred embodiment, the decoding (E709) is then carried out as follows based on the estimate of idϕ by inverting the arcsine function or based on an approximation. An initial value is determined for:
In some variants, it will be possible to compute, in an equivalent manner (with the same result):
Based on this initial value offsetinit, the values of idϕ, sgnϕ and offset may then be determined as follows:
Where a←b indicates that the existing value of a is replaced by the result of the expression b.
The step of correcting idϕ←idϕ+1 when index≥offsetinit is specific to the exemplary embodiment of estimating idϕ by inverting the arcsine function. If this step is carried out, the value of offsetinit corresponds to the cumulative cardinality of the grid up to the lower layer idϕ−1.
Otherwise, if this step of correcting idϕ is not carried out, the value of offsetinit corresponds to the cumulative cardinality of the grid up to the elevation layer idϕ. The justification of the steps of correcting the value offsetinit in the form offset−Nθ(idϕ) is detailed with reference to
In some variants, it will be possible to define, as initial value offsetinit, the cumulative cardinality for the lower elevation layer of index i−1 and correct the value of offsetinit equivalently. The initial value is given by:
Based on this initial value offsetinit, the values of idϕ, sgnϕ and offset may then be determined as follows:
Where a←b indicates that the existing value of a is replaced by the result of the expression b. In this case, the steps of correcting the value offsetinit are in the form offsetinit+Nθ(idϕ).
In some variants, it will be possible to adapt the principle of correcting the value of idϕ and of offsetinit according to the method for estimating idϕ, in order to determine the values of idϕ, sgnϕ and offset.
In some variants where the cumulative cardinality is defined differently, with corresponding values Nθ(i), the analytical definition of the initial estimate of offset will be adapted.
In some variants, other integer and exact estimates (giving the same results) of idϕ and other direct or indirect methods for determining idϕ, sgnϕ and offset may be used, as long as they do not change the decoding result. Indeed, since the values of idϕ and offset are integers, the sign sgnϕ being able to be seen as a signed integer, alternative methods may be implemented, as long as they give identical values for idϕ, sgnϕ and offset.
One example of a decoding variant for idϕ, sgnϕ and offset, which is of little practical interest but has the merit of illustrating an alternative method, would consist in simply running exhaustively through all possible values of idϕ, sgnϕ and idθ, computing the corresponding index as at the encoder, and selecting the combination that leads exactly to index=offset+idθ.
The decoding of the index is deterministic in the sense that, for a given value index, the values idϕ, sgnϕ and idθ are unique and are integers.
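The exhaustive variant mentioned above may be sketched as follows. It is given only to illustrate that the decoding is unique; the function names and the toy table are hypothetical, and the ordering of the layers (equator, then positive and negative layers of increasing index, then the poles) is the assumed convention of this sketch.

```python
def encode(n_theta, sgn_phi, id_phi, id_theta):
    """Encoder-side indexing: index = offset + id_theta, with offset by direct summing."""
    offset = 0
    if id_phi > 0:
        offset = n_theta[0] + 2 * sum(n_theta[1:id_phi])
        if sgn_phi < 0:
            offset += n_theta[id_phi]
    return offset + id_theta


def decode_exhaustive(n_theta, index):
    """Try every (sgn_phi, id_phi, id_theta) and keep the one that reproduces index."""
    for id_phi in range(len(n_theta)):
        signs = (1,) if id_phi == 0 else (1, -1)   # the equator has a single layer
        for sgn_phi in signs:
            for id_theta in range(n_theta[id_phi]):
                if encode(n_theta, sgn_phi, id_phi, id_theta) == index:
                    return sgn_phi, id_phi, id_theta
    raise ValueError("index outside the grid")


toy = [8, 6, 4, 1]                                 # hypothetical toy grid of 30 points
decoded = [decode_exhaustive(toy, k) for k in range(30)]
```

Every index of the toy grid maps to a distinct triplet, at a cost proportional to the size of the grid for each decoded index, which is why this variant is of little practical interest.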
In all cases, the decoding of idϕ, sgnϕ and offset relies on the values of Nθ(i) according to the invention with the possibility of analytically determining a cumulative cardinality.
Finally, decoding idθ (E710) is tantamount simply to subtracting the decoded cumulative cardinality value (offset) from the received global index (index): idθ=index−offset.
In some variants, it will be possible to replace offset using the definition of cumN′ and taking into account the fact that the cardinality in this case corresponds to a hemisphere.
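The complete decoding of idϕ, sgnϕ, offset and idθ may be sketched as follows. It should be stressed that this sketch replaces the estimate by inversion of the arcsine function with a binary search over tabulated cumulative cardinality values, which is a different technique chosen here because it yields the same integers without requiring the analytical expression. The function name decode_index and the toy table are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def decode_index(n_theta, index):
    """Return (sgn_phi, id_phi, id_theta) for a global index (binary-search variant)."""
    # cum_incl[i]: points up to and including layers +i and -i (equator included)
    sizes = [n_theta[0]] + [2 * n for n in n_theta[1:]]
    cum_incl = list(accumulate(sizes))
    if not 0 <= index < cum_incl[-1]:
        raise ValueError("index outside the grid")
    # First elevation level whose inclusive cumulative cardinality exceeds index
    id_phi = bisect_right(cum_incl, index)
    if id_phi == 0:
        return 1, 0, index                         # point on the equator
    sgn_phi = 1
    offset = cum_incl[id_phi] - 2 * n_theta[id_phi]   # start of the positive layer
    if index >= offset + n_theta[id_phi]:
        sgn_phi = -1
        offset += n_theta[id_phi]                  # start of the negative layer
    return sgn_phi, id_phi, index - offset         # id_theta = index - offset


toy = [8, 6, 4, 1]                                 # hypothetical toy grid of 30 points
results = [decode_index(toy, k) for k in (7, 14, 23, 29)]
```

The two branches correspond to the corrections by 2Nθ(idϕ) for the positive layer and by Nθ(idϕ) for the negative layer, applied to the cumulative cardinality that includes both layers of index idϕ.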
In
Thereby resulting in the value of offset being corrected by offset←offsetinit−2Nθ(idϕ) and giving offset=65514.
In
Thereby resulting in the value of offset being corrected by offset←offsetinit−Nθ(idϕ) and giving offset=65524.
In some variants, the value of offsetinit may correspond to the cumulative cardinality up to the lower elevation layer, or it may be obtained by a direct sum based on the values Nθ(i) according to the invention.
In some variants where the cumulative cardinality is defined differently, with corresponding values Nθ(i), the analytical definition of the initial estimate of offsetinit will be adapted.
The coding device DCOD comprises a processing circuit typically including:
The decoding device DDEC comprises its own processing circuit, typically including:
Of course, this
Although the present disclosure has been described with reference to one or more examples, workers skilled in the art will recognize that changes may be made in form and detail without departing from the scope of the disclosure and/or the appended claims.
| Number | Date | Country | Kind |
|---|---|---|---|
| FR2201286 | Feb 2022 | FR | national |
This application is a Section 371 National Stage Application of International Application No. PCT/EP2023/053413, filed Feb. 13, 2023, and published as WO 2023/152348 A1 on Aug. 17, 2023, not in English, which claims priority to French Patent Application No. 2201286, filed Feb. 14, 2022, the contents of which are incorporated herein by reference in their entireties.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/EP2023/053413 | 2/13/2023 | WO | |