Coding and Decoding of Spherical Coordinates Using an Optimized Spherical Quantization Dictionary

FIELD OF THE DISCLOSURE

The present invention relates to spherical vector quantization applied to the coding/decoding of sound data, in order to code source directions of arrival (abbreviated to “DoA”) that are generally represented by spherical coordinates (for example azimuth and elevation, at a predetermined distance).

BACKGROUND OF THE DISCLOSURE

Encoders/decoders (hereinafter called “codecs”) that are currently used in mobile telephony are mono (a single signal channel to be rendered on a single loudspeaker). The 3GPP EVS (for “Enhanced Voice Services”) codec makes it possible to offer “Super-HD” quality (also called “High Definition Plus” or HD+ voice) with a super-wideband (SWB) audio band for signals sampled at 32 or 48 kHz or full band (FB) audio band for signals sampled at 48 kHz; the audio bandwidth is 14.4 to 16 kHz in SWB mode (9.6 to 128 kbit/s) and 20 kHz in FB mode (16.4 to 128 kbit/s).

The next quality evolution in conversational services offered by operators should consist of immersive services, using terminals such as smartphones equipped with multiple microphones or remote presence or 360° video spatialized audio-conferencing or video-conferencing equipment, or even “live” audio content sharing equipment, with spatialized 3D sound rendering that is much more immersive than simple 2D stereo rendering. With the increasingly widespread use of listening on a mobile telephone with an audio headset and the onset of advanced audio equipment (accessories such as a 3D microphone, voice assistants with acoustic antennas, virtual reality or augmented reality headsets, etc.), capturing and rendering spatialized sound scenes is now widespread enough to offer an immersive communication experience.

To this end, the future 3GPP standard “IVAS” (for “Immersive Voice And Audio Services”) is proposing to extend the EVS codec to immersive audio by accepting, as codec input format, at least the spatialized sound formats listed below (and their combinations):

- stereo or 5.1 multichannel format (channel-based), in which each channel feeds a loudspeaker (for example L and R in stereo or L, R, Ls, Rs and C in 5.1);
- object format (object-based), in which sound objects are described as an audio signal (generally mono) associated with metadata describing the attributes of this object (position in space, spatial width of the source, etc.);
- ambisonic format (scene-based), which describes the sound field at a given point, generally captured by a spherical microphone or synthesized in the domain of spherical harmonics.

There is also the issue of potentially considering other input formats such as the format called MASA (Metadata assisted Spatial Audio), which corresponds to a parametric representation of a sound pick-up on a mobile telephone equipped with multiple microphones. This format is studied in more detail below.

The signals to be processed by the encoder/decoder take the form of successions of blocks of sound samples called “frames” or “subframes” below.

Furthermore, below, mathematical notations follow the following convention:

- Scalar: s or N (lower-case for variables or upper-case for constants)
- Vector: q (lower-case, bold and italicized)
- Matrix: M (upper-case, bold and italicized)

Hereinafter, 𝕊_n denotes the sphere of radius r in dimension n+1, defined as

$𝕊_{n} = \{ x = (x_{1}, \dots, x_{n + 1}) \in ℝ^{n + 1} \mid \| x \| = \sqrt{x_{1}^{2} + \dots + x_{n + 1}^{2}} = r \}$

where ∥⋅∥ denotes the Euclidean norm. When the radius r is not specified, it will be assumed that r=1 (unit sphere). The focus here is on the case of dimension 3, where n=2. A reminder will be given here of the definition of the spherical coordinates in dimension 3. For a point (x, y, z) in dimension 3, there are generally at least two classical conventions of spherical coordinates denoted (r, ϕ, θ):

- the geographical convention: x=r cos ϕ cos θ, y=r cos ϕ sin θ, z=r sin ϕ with r≥0, −π/2≤ϕ≤π/2 and −π≤θ≤π
- the physical convention: x=r sin ϕ cos θ, y=r sin ϕ sin θ, z=r cos ϕ with r≥0, 0≤ϕ≤π and −π≤θ≤π

The angles ϕ, θ are defined here in radians, without loss of generality.

The radius r and the azimuth (or longitude) θ are the same in these two definitions, but the angle ϕ differs depending on whether it is defined with respect to the horizontal plane 0xy (elevation or latitude over the interval [−π/2, π/2]) or based on the axis 0z (colatitude or polar angle over the interval [0, π]). The azimuth θ may be defined over an interval [−π,π], and, in equivalent fashion, it may be defined over [0,2π] by a simple operation of modulo 2π. Hereinafter, the same angular coordinates will preferably be represented in degrees, but other units may be used. It should be noted that the symbols may be different in the literature (for example φ instead of ϕ) and/or swapped (for example θ for colatitude and φ for longitude). Hereinafter, the convention that is adopted will preferably be that of using the elevation and azimuth pair, but the invention is applicable to all variant definitions of spherical coordinates.
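The two conventions can be compared with a short sketch. The function names below are illustrative and not taken from any cited document; angles are in radians, as in the definitions above.

```python
import math

def geographic_to_cartesian(r, elevation, azimuth):
    """Geographical convention: elevation in [-pi/2, pi/2], azimuth in [-pi, pi]."""
    x = r * math.cos(elevation) * math.cos(azimuth)
    y = r * math.cos(elevation) * math.sin(azimuth)
    z = r * math.sin(elevation)
    return x, y, z

def physical_to_cartesian(r, colatitude, azimuth):
    """Physical convention: colatitude in [0, pi], azimuth in [-pi, pi]."""
    x = r * math.sin(colatitude) * math.cos(azimuth)
    y = r * math.sin(colatitude) * math.sin(azimuth)
    z = r * math.cos(colatitude)
    return x, y, z

def elevation_to_colatitude(elevation):
    """The two definitions of the angle phi differ by a shift of pi/2."""
    return math.pi / 2 - elevation

def wrap_azimuth_0_2pi(azimuth):
    """Map an azimuth from [-pi, pi] to [0, 2*pi) by a modulo operation."""
    return azimuth % (2 * math.pi)

# The same point expressed in both conventions gives the same (x, y, z).
p_geo = geographic_to_cartesian(1.0, 0.3, -1.2)
p_phy = physical_to_cartesian(1.0, elevation_to_colatitude(0.3), -1.2)
print(p_geo, p_phy)
```

Both calls describe the same point, which illustrates that only the definition of ϕ changes between the conventions.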

The invention is concerned with exemplary embodiments of spherical vector quantization applied to the coding of 3D directions of audio sources. The invention may also be applied to other audio formats and to other signals (for example images or 360° video) in which spherical data in dimension 3 are to be coded.

The principles of DiRAC (Directional Audio Coding) will be recalled below. In some variants, it is possible to apply the invention to other coding schemes, in particular for transform-based audio coding.

DiRAC coding is described for example in the article V. Pulkki, Spatial sound reproduction with directional audio coding, Journal of the Audio Engineering Society, vol. 55, no. 6, pp. 503-516, 2007. In that document, mapping is carried out through directional analysis in order to find a direction (DoA) for each sub-band. This DoA is supplemented by a “diffuseness” parameter, thereby giving a parametric description of the sound scene. The multi-channel input signal is coded in the form of transport channels (typically a mono or stereo signal obtained by reducing multiple picked-up channels) and spatial metadata (DoA and “diffuseness” for each sub-band).

FIG. 1 describes one exemplary implementation of DiRAC coding. In this example, the coding uses a reduction in the number of channels (downmixing-block 100), where coding (block 110) is carried out for example on only one channel with a mono codec—for example 3GPP EVS at a given bit rate (24.4 kbit/s). The input signal is also decomposed (block 120) into frequency sub-bands, for example by a filter bank or by a short-time Fourier transform. A division into Bark bands, for example 24 sub-bands that are distributed into frequencies on the Bark scale known from the prior art, will be assumed here. In each frame and each sub-band, the DiRAC coding typically estimates two parameters (block 130)—to lighten the notations, no frame index or sub-bands are used for the various parameters: the direction of the dominant source (DoA) in terms of elevation (ϕ) and azimuth (θ), and “diffuseness” ψ as described in the abovementioned article by Pulkki. The DoA is generally estimated by way of an active intensity vector with a temporal mean; in some variants, it will be possible to implement other methods for estimating ϕ, θ, ψ.

The DoA is coded (block 140) on a predetermined number of bits (for example 7 bits) per pair (ϕ, θ) in each frame and each sub-band. The “diffuseness” ψ is a parameter between 0 and 1, and is coded here (block 150) by scalar quantization (for example on 6 bits). In the example given, the spatial metadata coding budget is therefore 24×(7+6)=312 bits per frame, that is to say 15.6 kbit/s, for a global budget of 24.4+15.6=40 kbit/s. The “downmix” signal coding bitstream and the coded spatial parameters are multiplexed (block 160) so as to form the bitstream of each frame.
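The budget arithmetic of this example can be checked with a few lines. The 20 ms frame duration is an assumption here: it is not stated in the text, but it is the value implied by 312 bits per frame corresponding to 15.6 kbit/s.

```python
SUB_BANDS = 24
DOA_BITS = 7            # bits per (elevation, azimuth) pair
DIFFUSENESS_BITS = 6    # bits per diffuseness value
CORE_KBPS = 24.4        # mono downmix bit rate of the example
FRAME_SECONDS = 0.020   # assumption: 20 ms frames (inferred, not stated)

metadata_bits_per_frame = SUB_BANDS * (DOA_BITS + DIFFUSENESS_BITS)
metadata_kbps = metadata_bits_per_frame / FRAME_SECONDS / 1000.0
total_kbps = CORE_KBPS + metadata_kbps

print(metadata_bits_per_frame, metadata_kbps, total_kbps)
```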

FIG. 2 illustrates one exemplary embodiment of a DiRAC decoder. After demultiplexing the bitstream (block 200), the “downmix” signal is decoded (block 210). The spatial parameters are decoded (block 250 and block 270). The decoded signal ŝ is then decomposed into times/frequencies (block 220 identical to block 120) so as to spatialize it as a point source (plane wave) in the block (block 260) that generates a spatialized 1st-order ambisonic signal as follows:

$X (\hat{ϕ}, \hat{θ}) = \begin{bmatrix} 1 \\ \cos \hat{ϕ} \cos \hat{θ} \\ \cos \hat{ϕ} \sin \hat{θ} \\ \sin \hat{ϕ} \end{bmatrix} \cdot \hat{s}$
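As an illustration, the plane-wave spatialization can be sketched as follows. The function name is hypothetical, and the component order (omnidirectional term first, then the x, y and z direction cosines) is a reading of the expression above rather than a statement about any particular channel-ordering standard.

```python
import math

def encode_plane_wave_foa(sample, elevation_deg, azimuth_deg):
    """Scale one decoded sample by the four 1st-order gains of a point source."""
    phi = math.radians(elevation_deg)
    theta = math.radians(azimuth_deg)
    gains = (
        1.0,                               # omnidirectional component
        math.cos(phi) * math.cos(theta),   # x direction cosine
        math.cos(phi) * math.sin(theta),   # y direction cosine
        math.sin(phi),                     # z direction cosine
    )
    return tuple(g * sample for g in gains)

# A source straight ahead on the horizontal plane only excites the first two components.
print(encode_plane_wave_foa(2.0, 0.0, 0.0))
```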

Based on the decoded signal, a decorrelation is carried out (block 230) so as to have a “diffuse” version (corresponding to a maximum source width); this decorrelation also achieves an increase in the number of channels so as, at the output of block 230, to obtain a 1st-order ambisonic signal with 4 channels (W, Y, Z, X). The decorrelated signal is decomposed into times/frequencies (block 240). The signals resulting from blocks 240 and 260 are combined (block 275) by sub-band, after applying a scaling factor (blocks 273 and 274) obtained from the decoded “diffuseness” (blocks 271 and 272); this adaptive mixing makes it possible to “dose” the source width and the diffuse character of the sound field in each sub-band. The mixed signal is converted into the time domain (block 280) by a filter bank or an inverse short-time transform.

The directions of sources in the DiRAC format are therefore represented in the form of 3D spherical data, typically in the form of spherical coordinates (azimuth, elevation) according to the geographical convention. In this context, there is a need to represent this DoA information effectively, this being able to be formulated as a vector quantization problem on the sphere 𝕊₂ in dimension 3.

Another example of a parametric format for immersive audio is the MASA format described in the contribution “3GPP Tdoc S4-180087: On IVAS audio formats for mobile capture devices. Source: Nokia Corporation”. The principle is summarized in FIG. 3. It will be assumed that a mobile telephone is equipped with multiple microphones (for example 4 microphones) placed at predetermined locations (for example two at the bottom of the telephone, one at the top of the telephone and a last one on the back shell of the telephone). These microphones are seen as grouped together in block 300, which provides as many signals (channels) as there are microphones, possibly with additional information such as the placement or the characteristics of the microphones.

Block 310 carries out parametric analysis of the signals coming from block 300, using an approach similar to DiRAC, which provides transport channels and metadata. This MASA analysis is generally proprietary and selected by the manufacturer of the telephone. The number of transport channels is typically limited to 1 (mono) or 2 (stereo), and may be defined simply by selecting the primary microphone in the mono case or two opposite microphones (for example one at the bottom and another at the top of the telephone) in the stereo case. One example of a MASA metadata format is described in the contribution “3GPP Tdoc S4-191167 (October 2019), Description of the IVAS MASA C Reference Software, Source: Nokia Corporation”. What is of particular interest here is the parameter called “Direction index”, which is coded on 16 bits and described as follows in that document: “Direction of arrival of sound in a time-frequency interval; spherical representation with an accuracy of around 1 degree; interval of values: covers all directions with an accuracy of around 1°”.

This therefore involves a source direction (DoA) according to a 3D spherical grid whose (angular) resolution is close to 1 degree. This DoA information is provided for each frame and frequency sub-band by a DoA estimate (block 311). Block 312 (inside block 310) then codes this DoA information on 16 bits per DoA.

Block 320 represents the IVAS codec, which is not yet available as a 3GPP standard and is still under development. However, it has been proposed in the 3GPP for the MASA parametric format defining transport channels and metadata (including DoA per frame and sub-band) to be an input format of the IVAS codec. The (future) IVAS encoder should then implement a step of decoding the DoA information (block 321) in order to be able to fully exploit this DoA information and compress it at a lower rate. The implementation details regarding the compression of an input MASA format into an IVAS bitstream at a given rate and the associated decoding are beyond the scope of this invention, but it may for example be noted that the MASA format is based on an extended principle of DiRAC coding, the transport channels may be coded separately (by a mono core codec) or together (by a stereo core codec), and the metadata may be coded at a rate lower than in the MASA input format.

In general, any discretization of the sphere 𝕊₂ may be used as a spherical vector quantization dictionary. However, without any particular structure, searching for the nearest neighbor and indexing in this dictionary may prove costly to implement, above all when the coding rate of the DoA information is high (for example 16 bits per 3D vector indicating a DoA).

One example of a 3D spherical grid is given in the Appendix and in the source code attached to the contribution “3GPP Tdoc S4-191167 (October 2019), Description of the IVAS MASA C Reference Software, Source: Nokia Corporation”.

The spatial direction of an audio source in a given frame and a given sub-band of a MASA format proposal is represented by two angles: azimuth and elevation. The notations used hereinafter are ϕ for elevation and θ for azimuth, while the opposite convention is used in the document 3GPP Tdoc S4-191167.

That document gives a definition of a spherical grid as follows:

The grid consists of N_tot=2¹⁶−208=65328 points discretizing the surface of a 3D sphere of radius 1; each point is represented by a single index on 16 bits. This grid is defined by three stored elements:

- a number N_ϕ=122 of discrete values to code the positive elevation (that is to say |ϕ|)
- a scalar quantization dictionary for elevation (for the Northern hemisphere corresponding to |ϕ|): {ϕ̂(i), i=0, . . . , N_ϕ−1}
- a number of points (size of the dictionary) N_θ(i), i=0, . . . , 121, to code the azimuth, at a given discrete elevation of index i

The precise definition of the grid is detailed below:

- Each point on the 3D grid is given by a coded elevation value, decomposed into a coded absolute value ϕ̂(i) where i=0, . . . , N_ϕ−1 and a sign (+1 or −1), and a coded azimuth value θ̂(i,j), j=0, . . . , N_θ(i)−1, which depends on the elevation index i. The coded elevation value is ϕ̂(0)=0 for i=0 and ±ϕ̂(i) for i=1, . . . , N_ϕ−1.
- The number N_ϕ=122 thus corresponds to the number of (coded) elevations with a positive value (including the value zero); the elevation scalar quantization dictionary therefore comprises 2N_ϕ−1=243 coded values taking into account the sign, and these values may be ordered from the North pole to the South pole as:
  - +ϕ̂(N_ϕ−1) corresponding to the North pole
  - +ϕ̂(N_ϕ−2)
  - . . .
  - +ϕ̂(1) corresponding to the first layer above the equator
  - ϕ̂(0)=0 corresponding to the equator
  - −ϕ̂(1) corresponding to the first layer below the equator
  - . . .
  - −ϕ̂(N_ϕ−2)
  - −ϕ̂(N_ϕ−1) corresponding to the South pole.
- The elevation ϕ is coded by uniform scalar quantization over the interval [−88.65, 88.65] degrees with, in addition, two codewords for the poles (±90 degrees). The value 0 degrees (corresponding to the equator) is contained in the dictionary. The quantization step is set to

$δ_{ϕ} = \sin^{-1} \left( \frac{2 \sqrt{3} \sin \frac{π}{n(1)}}{6 \sqrt{1 - \left( \sin \frac{π}{n(1)} \right)^{2}}} \right) + \sin^{-1} \left( \frac{2 \sqrt{3}}{3} \sin \frac{π}{n(1)} \right)$

- thereby giving δ_ϕ≈0.7388 degrees. This therefore gives ϕ̂(i)=iδ_ϕ for i=0, . . . , N_ϕ−2 and ϕ̂(i)=90 for i=N_ϕ−1.
- The size N_θ(i) of the uniform scalar quantization dictionary for the azimuth θ depends on the coded elevation i; the azimuth step is set such that the distance between successive codewords is identical. The size of the azimuth dictionaries is symmetrical with respect to the equator (layers with negative elevations have the same number of points as positive ones).
- The number N_θ(i) of coded azimuth values is given by:

$N_{θ} (0) = 422$

$N_{θ} (i) = \frac{π}{\sin^{-1} \left( \frac{r}{2 R (i)} \right)}, \quad i = 1, \dots, N_{ϕ} - 2$

$N_{θ} (N_{ϕ} - 1) = 1$

- where

$r = 2 \sin \frac{π}{N_{θ} (0)} \approx 0.014888927181374$

- with

R(1)≈0.999916868023083

$R (i) = \cos (i δ_{ϕ}), \quad i = 2, \dots, N_{ϕ} - 2$

- In practice, this gives:
  - N_θ(i=0, . . . , 121)=[422 421 421 421 421 421 420 420 419 419 418 417 416 416 415 414 413 411 410 409 408 406 405 403 401 400 398 396 394 392 390 388 386 384 382 379 377 374 372 369 367 364 361 358 355 352 349 346 343 340 337 333 330 327 323 320 316 313 309 305 301 298 294 290 286 282 278 274 269 265 261 257 252 248 244 239 235 230 225 221 216 211 207 202 197 192 188 183 178 173 168 163 158 153 148 143 137 132 127 122 117 111 106 101 96 90 85 80 74 69 64 58 53 47 42 37 31 26 20 15 9 1]

It is possible to verify that the total number of points in the grid is:

$N_{tot} = N_{θ} (0) + 2 \sum_{i = 1}^{N_{ϕ} - 1} N_{θ} (i) = 65328$
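This verification, together with the numerical values of δ_ϕ and r, can be reproduced with the short sketch below. The table is copied from the list of N_θ(i) values above; taking n(1)=422 in the step formula is an assumption, chosen because it reproduces the stated value of about 0.7388 degrees.

```python
import math

# N_theta(i) for i = 0..121, as listed above.
N_THETA = [
    422, 421, 421, 421, 421, 421, 420, 420, 419, 419, 418, 417, 416, 416,
    415, 414, 413, 411, 410, 409, 408, 406, 405, 403, 401, 400, 398, 396,
    394, 392, 390, 388, 386, 384, 382, 379, 377, 374, 372, 369, 367, 364,
    361, 358, 355, 352, 349, 346, 343, 340, 337, 333, 330, 327, 323, 320,
    316, 313, 309, 305, 301, 298, 294, 290, 286, 282, 278, 274, 269, 265,
    261, 257, 252, 248, 244, 239, 235, 230, 225, 221, 216, 211, 207, 202,
    197, 192, 188, 183, 178, 173, 168, 163, 158, 153, 148, 143, 137, 132,
    127, 122, 117, 111, 106, 101, 96, 90, 85, 80, 74, 69, 64, 58, 53, 47,
    42, 37, 31, 26, 20, 15, 9, 1,
]

N_PHI = len(N_THETA)

# Equator counted once, every other layer counted in both hemispheres.
n_tot = N_THETA[0] + 2 * sum(N_THETA[1:])
unused_indices = 2 ** 16 - n_tot

# Elevation step, assuming n(1) = 422 (assumption, see lead-in).
n1 = 422
s = math.sin(math.pi / n1)
delta_phi = (math.asin(2 * math.sqrt(3) * s / (6 * math.sqrt(1 - s * s)))
             + math.asin(2 * math.sqrt(3) / 3 * s))
delta_phi_deg = math.degrees(delta_phi)

# Chord length r between neighbouring azimuth codewords on the equator.
r = 2 * math.sin(math.pi / N_THETA[0])

print(N_PHI, n_tot, unused_indices, delta_phi_deg, r)
```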

- Each coded elevation ϕ̂(i) defines a spherical zone (delimited by the elevation values ϕ̂(i)±δ_ϕ) in which an azimuth dictionary is used. The azimuth dictionaries have an offset set to 0 for even values of i and to $\frac{π}{N_{θ} (i)}$ for odd values of i.
- In other words, the coded azimuth value (in degrees) is, for j=0, . . . , N_θ(i)−1:

$\hat{θ} (i, j) = \begin{cases} 360 j / N_{θ} (i) - 180 & i \text{ even} \\ 360 (j + 0.5) / N_{θ} (i) - 180 & i \text{ odd} \end{cases}$

The document cited above gives one method for coding a given point (ϕ, θ).

Given a point (ϕ, θ) to be coded, the quantization (search for the nearest neighbor) on the grid is carried out according to the following steps:

- The sign sgn_ϕ and the absolute value |ϕ| of the elevation ϕ are determined; in particular sgn_ϕ=1 if ϕ≥0, −1 otherwise. The absolute value |ϕ| is coded by uniform scalar quantization by selecting the two nearest neighbors. This coding with “2 survivors” may for example be carried out by a preliminary search for the nearest neighbor in the (positive) elevation dictionary, through an exhaustive search.

$i_{1} = \arg \min_{i = 0, \dots, N_{ϕ} - 1} \left| |ϕ| - \hat{ϕ} (i) \right| = \arg \min_{i = 0, \dots, N_{ϕ} - 1} \left( |ϕ| - \hat{ϕ} (i) \right)^{2}$

- i₁ denotes the index of the nearest neighbor. The index i₂ of the second nearest value is then determined according to the value of i₁:

$i_{2} = \begin{cases} 1 & \text{if } i_{1} = 0 \\ N_{ϕ} - 2 & \text{if } i_{1} = N_{ϕ} - 1 \\ \arg \min_{i \in \{ i_{1} - 1, i_{1} + 1 \}} \left| |ϕ| - \hat{ϕ} (i) \right| & \text{if } 0 < i_{1} < N_{ϕ} - 1 \end{cases}$

- This thus gives two candidates sgn_ϕ·ϕ̂(i₁), sgn_ϕ·ϕ̂(i₂), where ϕ̂(i_k) is the coded absolute elevation, k=1 or 2, to represent the elevation ϕ. In terms of absolute value, these two candidates are simply ϕ̂(i₁) and ϕ̂(i₂).
- The azimuth θ is coded by uniform scalar quantization (with an elevation-dependent offset) according to the dictionary {θ̂(i_k,j), j=0, . . . , N_θ(i_k)−1} corresponding respectively to k=1 or 2. The index j_k is obtained as follows:

$j_{k} = \mathrm{mod}_{N_{θ} (i_{k})} \left( \left\lfloor \frac{θ - Δ + 180 / N_{θ} (i_{k})}{360 / N_{θ} (i_{k})} \right\rfloor \right)$

- where ⌊⋅⌋ is the rounding to the lower integer, Δ=0 if i_k is even and Δ=180/N_θ(i_k) if i_k is odd, and $\mathrm{mod}_{N_{θ} (i_{k})}$ is the modulo operation such that $\mathrm{mod}_{N_{θ} (i_{k})} (i) = i$ if i=0, . . . , N_θ(i_k)−1 and $\mathrm{mod}_{N_{θ} (i_{k})} (N_{θ} (i_{k})) = 0$. The index j_k therefore satisfies: 0≤j_k≤N_θ(i_k)−1.
- The best candidate is selected by minimizing, over k=1 or 2, the spherical distance between (ϕ, θ) and (sgn_ϕ·ϕ̂(i_k), θ̂(i_k,j_k)). This distance may be written independently of the sign sgn_ϕ (since the sign of sgn_ϕ·ϕ̂(i_k) is identical to that of ϕ), with |ϕ| in place of ϕ, as:

$d (i_{k}, j_{k}) = - \left( \sin |ϕ| \sin \hat{ϕ} (i_{k}) + \cos |ϕ| \cos \hat{ϕ} (i_{k}) \cos (θ - \hat{θ} (i_{k}, j_{k})) \right)$

- The closest pair (sgn_ϕ·ϕ̂(i_k), θ̂(i_k, j_k)) in the sense of this distance is selected as the quantized value to be indexed. Since d is the negative of the cosine of the angle between the two directions, the closest pair is the one with the smallest d. This selected point is denoted (sgn_ϕ·ϕ̂(id_ϕ), θ̂(id_ϕ, id_θ)), where:

$k^{*} = \arg \min_{k = 1, 2} d (i_{k}, j_{k})$

- and id_ϕ=i_k* and id_θ=j_k*
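The search with two survivors can be sketched on a small grid. The grid below (5 elevation levels with a step of 22.5 degrees) is a hypothetical toy example, not the 16-bit grid of the cited document. The azimuth index includes a +180 degree shift, an assumption made here so that it matches the codewords θ̂(i, j) defined over [−180, 180). All function names are illustrative.

```python
import math

# Hypothetical toy grid: N_theta per layer, last layer is the pole.
DELTA_PHI = 22.5
N_THETA = [8, 7, 6, 3, 1]
N_PHI = len(N_THETA)
ELEVATIONS = [i * DELTA_PHI for i in range(N_PHI - 1)] + [90.0]

def azimuth_codeword(i, j):
    """Coded azimuth in degrees, with a half-step offset on odd layers."""
    offset = 0.5 if i % 2 else 0.0
    return 360.0 * (j + offset) / N_THETA[i] - 180.0

def quantize_azimuth(theta, i):
    """Nearest azimuth index on layer i, wrapping around at +/-180 degrees."""
    n = N_THETA[i]
    offset = 0.5 if i % 2 else 0.0
    # Assumption: +180 shift so that index 0 corresponds to -180 degrees.
    return math.floor((theta + 180.0) * n / 360.0 - offset + 0.5) % n

def neg_cos_distance(phi, theta, phi_hat, theta_hat):
    """Negative cosine of the angle between two directions (smaller is closer)."""
    p, t, ph, th = (math.radians(v) for v in (phi, theta, phi_hat, theta_hat))
    return -(math.sin(p) * math.sin(ph)
             + math.cos(p) * math.cos(ph) * math.cos(t - th))

def quantize_direction(phi, theta):
    """Return (sgn_phi, id_phi, id_theta) using two elevation survivors."""
    sgn = 1 if phi >= 0 else -1
    a = abs(phi)
    # First survivor: exhaustive nearest-neighbour search on |phi|.
    i1 = min(range(N_PHI), key=lambda i: abs(a - ELEVATIONS[i]))
    # Second survivor: the neighbouring level closest to |phi|.
    if i1 == 0:
        i2 = 1
    elif i1 == N_PHI - 1:
        i2 = N_PHI - 2
    else:
        i2 = min((i1 - 1, i1 + 1), key=lambda i: abs(a - ELEVATIONS[i]))
    best = None
    for i in (i1, i2):
        j = quantize_azimuth(theta, i)
        d = neg_cos_distance(a, theta, ELEVATIONS[i], azimuth_codeword(i, j))
        if best is None or d < best[0]:
            best = (d, i, j)
    return sgn, best[1], best[2]

print(quantize_direction(10.0, 170.0))
```

Keeping two elevation candidates matters near layer boundaries, where the layer with the nearest elevation does not always contain the nearest point once the azimuth has been quantized.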

The quantization index (on 16 bits), denoted index here, of the selected point (sgn_ϕ·ϕ̂(id_ϕ), θ̂(id_ϕ, id_θ)) is obtained by enumerating the points on the grid starting from the equator (all points of elevation ϕ̂(0)=0), then considering the first layer above the equator (all points of elevation +ϕ̂(1)=δ_ϕ), then the first layer below the equator (all points of elevation −ϕ̂(1)=−δ_ϕ), etc.

This gives an index, denoted index, within the interval 0, . . . , N_tot−1, where:

$index = \begin{cases} {id}_{θ} & \text{if } {id}_{ϕ} = 0 \\ cumN (2 {id}_{ϕ} - 2) + {id}_{θ} & \text{if } {id}_{ϕ} > 0 \text{ and } {sgn}_{ϕ} > 0 \\ cumN (2 {id}_{ϕ} - 1) + {id}_{θ} & \text{if } {id}_{ϕ} > 0 \text{ and } {sgn}_{ϕ} < 0 \end{cases}$

The cumulative cardinality values cumN are computed on the fly each time the index index is determined:

$\begin{aligned} cumN (0) &= N_{θ} (0) \\ cumN (1) &= cumN (0) + N_{θ} (1) = N_{θ} (0) + N_{θ} (1) \\ cumN (2) &= cumN (1) + N_{θ} (1) = N_{θ} (0) + 2 N_{θ} (1) \\ cumN (3) &= cumN (2) + N_{θ} (2) = N_{θ} (0) + 2 N_{θ} (1) + N_{θ} (2) \\ cumN (4) &= cumN (3) + N_{θ} (2) = N_{θ} (0) + 2 N_{θ} (1) + 2 N_{θ} (2) \\ &\dots \\ cumN (2 i - 1) &= cumN (2 i - 2) + N_{θ} (i) \\ cumN (2 i) &= cumN (2 i - 1) + N_{θ} (i) \end{aligned}$
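The indexing rule and the on-the-fly recursion can be sketched as follows, on a hypothetical toy table of N_θ(i) values (not the table of the cited document). The function names are illustrative.

```python
# Hypothetical toy grid: N_theta per layer, last layer is the pole.
N_THETA = [8, 7, 6, 3, 1]

def cum_n(m):
    """cumN(m), recomputed from scratch on every call (as in the prior art)."""
    total = N_THETA[0]                      # cumN(0)
    for step in range(1, m + 1):
        # Steps 2i-1 and 2i both add N_theta(i): one per hemisphere.
        total += N_THETA[(step + 1) // 2]
    return total

def direction_index(id_phi, sgn_phi, id_theta):
    """Global index: equator first, then layers alternating +/- elevation."""
    if id_phi == 0:
        return id_theta
    if sgn_phi > 0:
        return cum_n(2 * id_phi - 2) + id_theta
    return cum_n(2 * id_phi - 1) + id_theta

n_tot = N_THETA[0] + 2 * sum(N_THETA[1:])
print(n_tot, direction_index(1, 1, 0), direction_index(1, -1, 0))
```

The loop inside `cum_n` is the part that grows with the elevation index, which gives an idea of why recomputing the cumulative cardinalities for every coded direction is costly.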

The decoding method in the document cited above is explained in the flowchart in FIG. 4. The decoding consists, starting from the index index (block 400), in retrieving the elevation information id_ϕ, sgn_ϕ and azimuth information id_θ, thereby then making it possible to reconstruct the point (sgn_ϕ·ϕ̂(id_ϕ), θ̂(id_ϕ, id_θ)) (block 413).

The principle of the decoding is that of successively comparing the value index with the successive cumulative cardinality values cumN (or cardinality sums), which are computed recursively on the fly for i=0, . . . , N_ϕ−1, taking into account the fact that the cardinalities N_θ(i) are identical for elevations of the same absolute value (in the Northern and Southern hemispheres). The sign of the elevation sgn_ϕ is decoded by exploiting the predefined order in which the spherical layers are written: equator, first layer with a positive elevation (+), first layer with a negative elevation (−), . . . , up to the North pole (+) and South pole (−) . . . . The values of id_ϕ, sgn_ϕ, cumN(0) are initialized (block 401).

If index≥cumN(0) (block 402), the decoding of the information is carried out for the “elevation layers” outside the equator of index i>0. The search for the “elevation layer” is carried out in a loop, starting from i=1 up to i=N_ϕ−1 (blocks 403, 404, 411). In iteration i, the cumulative cardinality is computed recursively (blocks 405, 408) and compared with the index (block 406, 409) in order to decode the indices (blocks 407, 410).

If index<cumN(0) (block 402), the decoding of the indices of the information is carried out for the layer corresponding to the equator (block 412).

It should be noted that, in the implementation in the source code attached to the contribution 3GPP Tdoc S4-191167, a test for verifying whether i=N_ϕ−1 is implemented in order to explicitly decode id_ϕ=N_ϕ−1, sgn_ϕ=−1, id_θ=0. This part is not adopted because the sign sgn_ϕ=1 should also be possible in a grid containing the North and South pole, and it is normally needless because the definition cumN(N_ϕ−1) should allow the points associated with the poles to be decoded. The specific management of the poles may be neglected; the important thing is the principle of carrying out iterative decoding by comparing the index with a cumulative cardinality (or sum of cardinalities) computed over time. Once the indices id_ϕ, sgn_ϕ and id_θ have been decoded, the reconstruction of the spherical coordinates (sgn_ϕ·ϕ̂(id_ϕ), θ̂(id_ϕ, id_θ)), in 413, adopts the definition of the grid defined above with:

$\hat{ϕ} (i) = \begin{cases} i δ_{ϕ} & \text{for } i = 0, \dots, N_{ϕ} - 2 \\ 90 & \text{for } i = N_{ϕ} - 1 \end{cases}$

$\hat{θ} (i, j) = \begin{cases} 360 j / N_{θ} (i) - 180 & i \text{ even} \\ 360 (j + 0.5) / N_{θ} (i) - 180 & i \text{ odd} \end{cases}$
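The principle of this iterative decoding can be sketched as follows. This is an illustration of the comparison with a running cumulative cardinality, not a block-by-block transcription of FIG. 4; the toy table of N_θ(i) values and the step of 22.5 degrees are hypothetical.

```python
# Hypothetical toy grid: N_theta per layer, last layer is the pole.
N_THETA = [8, 7, 6, 3, 1]
DELTA_PHI = 22.5

def decode_index(index):
    """Return (id_phi, sgn_phi, id_theta) by comparing index with running cumN."""
    cum = N_THETA[0]                     # cumN(0)
    if index < cum:
        return 0, 1, index               # equator layer
    for i in range(1, len(N_THETA)):
        start = cum
        cum += N_THETA[i]                # cumN(2i - 1): layer +i
        if index < cum:
            return i, 1, index - start
        start = cum
        cum += N_THETA[i]                # cumN(2i): layer -i
        if index < cum:
            return i, -1, index - start
    raise ValueError("index outside the grid")

def reconstruct(id_phi, sgn_phi, id_theta):
    """Rebuild (elevation, azimuth) in degrees from the decoded indices."""
    if id_phi == len(N_THETA) - 1:
        elevation = 90.0                 # pole codeword
    else:
        elevation = id_phi * DELTA_PHI
    offset = 0.5 if id_phi % 2 else 0.0
    azimuth = 360.0 * (id_theta + offset) / N_THETA[id_phi] - 180.0
    return sgn_phi * elevation, azimuth

print(decode_index(15), reconstruct(*decode_index(15)))
```

In this sketch both poles are reached through the generic loop, without a dedicated test for i=N_ϕ−1.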

This method as implemented in the contribution 3GPP Tdoc S4-191167 cited above requires preliminary storage of N_ϕ=122 floating-point values (ϕ̂(i)) for scalar quantization of the (positive) elevation, N_ϕ integer values giving N_θ(i) for each (positive) elevation layer, and an integer value giving N_ϕ. The grid does not use all possible values of indices on 16 bits, since 208 indices (from 65328 to 65535) are unused.

The main drawback of this method is that its complexity is very high, of the order of 123 WMOPS (weighted millions of operations per second) for coding and 12 WMOPS for decoding, assuming 24 sub-bands (therefore 24 DoA per frame) and a temporal resolution of 5 ms (therefore one frame every 5 ms). This cost is high in particular due to the scalar quantization of the elevation being implemented by searching in a stored dictionary and above all due to the cumulative cardinalities cumN(i) being computed on the fly.

There is therefore a need to improve the methods from the prior art for quantizing spherical data in dimension 3, in particular in order to code DoA data efficiently, with the lowest possible complexity and while avoiding unused indices for a given total number of points (or equivalently a given bit budget).

SUMMARY

The invention aims to improve the prior art.

To this end, the invention targets a method for coding a spatial direction of a sound source, this direction being defined by spherical coordinates comprising an elevation coordinate and an azimuth coordinate, wherein a spherical quantization dictionary is defined on a 3D sphere by an elevation coding and an azimuth coding, and wherein:

- the elevation coding uses a scalar quantization, giving at least one coded elevation index on a number of elevation levels,
- the azimuth coding uses a scalar quantization, according to a number of points per level depending on the index of the coded elevation,
- the number of points per level is determined on the basis of two successive cumulative cardinality values,
- the cumulative cardinality value for a coded elevation index being representative of a number of points proportional to a total number of points and according to the area of a spherical zone comprising at least one zone delimited by the upper horizontal plane of the positive elevation level of the coded elevation index and a lower horizontal plane of the sphere.

The cumulative cardinality values used to define the spherical quantization dictionary, in particular to determine the number of quantization levels for the azimuth coordinate, are thus based on a direct estimate of the area of spherical zones, thus avoiding on-the-fly and recursive computing of the sum of cardinalities used in the method proposed in the prior art, which is highly resource-intensive.

The method proposed here is significantly less resource-intensive, and is for example of the order of 2 WMOPS for coding and 1 WMOPS for decoding.

Defining such a quantization dictionary also makes it possible to exploit all possible points (or codewords) of the dictionary so as to make the quantization more efficient and avoid having unused indices (or codewords) in the grid. The invention is applied in particular in order to implement a more efficient method for coding and decoding DoA information on 16 bits to define the MASA format at input of an IVAS coding.

In one embodiment, the elevation coding includes levels corresponding to the equator and to the poles of the 3D sphere, thereby making it possible to include all particular points (equator and poles) of the sphere in the quantization dictionary.

In one embodiment, a number of points for the azimuth coding is predetermined for the elevation level corresponding to the equator, and the total number of points is obtained by subtracting, from a target number of points, the predetermined number of points corresponding to the equator and each of the North and South poles of the sphere, according to the following expression: N_tot′=N_tot−N_θ(0)−2N_θ(N_ϕ−1),

- N_tot being the target number of points of the sphere for a given bit budget,
- N_θ(0), the predetermined number of points for the elevation level corresponding to the equator; and
- 2N_θ(N_ϕ−1) the predetermined number of points for the North and South poles of the sphere.

The method is thus adapted to the knowledge of the number of points for certain particular spherical layers, such as the one corresponding to the equator and those corresponding to the poles, which may be defined at a fixed value.

In one particular embodiment, the cumulative cardinality value for a coded elevation index is representative of a number of points proportional to the total number of points according to the area (A_i) of a spherical zone delimited by the upper horizontal plane of the positive elevation level of the coded elevation index and this same plane of the sphere symmetrical with respect to the equator minus the area (A₀) corresponding to the elevation level of the equator, according to the following ratio:

$\frac{(A_{i} - A_{0})}{(A_{N_{ϕ} - 2} - A_{0})} N_{tot}^{'},$

N_ϕ−2 being the number of elevation quantization levels without the equator and the North and South poles of the sphere and $A_{N_{ϕ} - 2}$, the area of the spherical zone corresponding to an elevation index N_ϕ−2.

In one variant embodiment, the cumulative cardinality value for a coded elevation index is representative of a number of points proportional to the total number of points according to the area (A′_i) of a spherical zone delimited by the upper horizontal plane of the positive elevation level of the coded elevation index and that of the equator minus half the area corresponding to the elevation level of the equator, according to the following ratio:

$\frac{(A_{i}^{'} - A_{0} / 2)}{(A_{N_{ϕ} - 2}^{'} - A_{0} / 2)} N_{tot}^{'},$

N_ϕ−2 being the number of elevation quantization levels without the equator and the North and South poles of the sphere and $A_{N_{ϕ} - 2}^{'}$, the area of the spherical zone corresponding to an elevation index N_ϕ−2.

These ratios of areas of spherical zones make it possible to estimate, easily and directly through a simple rule of three, the number of points in the corresponding spherical zones that are subsets of the complete surface of the 3D sphere.
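As a reminder of why a ratio of areas turns into a ratio of sines, consider the unit sphere and write $ϕ_{i}^{+} = (i + \frac{1}{2}) δ_{ϕ}$ for the elevation of the upper horizontal plane of level i (this notation is introduced here for illustration only). The zone lying between the planes at elevations $-ϕ_{i}^{+}$ and $+ϕ_{i}^{+}$ has an area obtained by integrating the circumference of each parallel, so that the first ratio above depends only on the sines of the bounding elevations:

$A_{i} = \int_{- ϕ_{i}^{+}}^{ϕ_{i}^{+}} 2 π \cos ϕ \, dϕ = 4 π \sin ϕ_{i}^{+}, \qquad \frac{A_{i} - A_{0}}{A_{N_{ϕ} - 2} - A_{0}} = \frac{\sin ϕ_{i}^{+} - \sin ϕ_{0}^{+}}{\sin ϕ_{N_{ϕ} - 2}^{+} - \sin ϕ_{0}^{+}}$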

These ratios make it possible to express cumulative cardinality values as follows:

$cumN (i) = 2 \, \mathrm{Arr}_{i} \left( \frac{N_{tot}^{'}}{2} \cdot \frac{\sin ((i + \frac{1}{2}) δ_{ϕ}) - \sin (\frac{δ_{ϕ}}{2})}{\sin ((N_{ϕ} - \frac{1}{2}) δ_{ϕ}) - \sin (\frac{δ_{ϕ}}{2})} \right)$

- with
- i=1, . . . , N_ϕ−2, N_ϕ−2 being the number of elevation quantization levels without the equator and the North and South poles of the sphere,
- Arr_i( ) being a rounding to the nearest integer depending on i, 2Arr_i(x/2) corresponding to a rounding to an even integer and δ_ϕ being a given quantization step of the elevation.
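A minimal sketch of the area-ratio rule is given below, under several assumptions that are not taken from the text: the numerical values of N_ϕ, δ_ϕ and N_θ(0) are illustrative only, the rounding Arr_i is taken as round-half-up for every i, the upper plane of the last non-polar level is placed at (N_ϕ−2+1/2)δ_ϕ as in the first area ratio stated above, and the number of points per level in one hemisphere is taken as half the difference between two successive cumulative values.

```python
import math

N_TOT = 2 ** 16                      # target number of points (16-bit budget)
N_PHI = 122                          # assumption: illustrative number of levels
DELTA_PHI = math.radians(0.7388)     # assumption: illustrative elevation step
N_THETA_EQUATOR = 430                # hypothetical predetermined equator size
N_THETA_POLE = 1                     # one point per pole

# Points left for the layers strictly between the equator and the poles.
n_tot_prime = N_TOT - N_THETA_EQUATOR - 2 * N_THETA_POLE

def round_half_up(x):
    """Assumed rounding rule (the text lets the rounding depend on i)."""
    return math.floor(x + 0.5)

def cum_n(i):
    """Closed-form cumulative cardinality of layers 1..i, both hemispheres."""
    bottom = math.sin(DELTA_PHI / 2)                   # edge of the equator zone
    top = math.sin((N_PHI - 2 + 0.5) * DELTA_PHI)      # edge of the last layer
    ratio = (math.sin((i + 0.5) * DELTA_PHI) - bottom) / (top - bottom)
    return 2 * round_half_up(n_tot_prime / 2 * ratio)  # always an even integer

cum = [cum_n(i) for i in range(N_PHI - 1)]             # i = 0 .. N_PHI - 2
# Assumption: each hemisphere receives half of the points added by level i.
points_per_level = [(cum[i] - cum[i - 1]) // 2 for i in range(1, N_PHI - 1)]

total = N_THETA_EQUATOR + 2 * sum(points_per_level) + 2 * N_THETA_POLE
print(n_tot_prime, cum[1], cum[-1], total)
```

Each value of `cum_n` is evaluated directly from i, with no loop over the preceding layers, and the total number of points matches the 16-bit budget exactly in this configuration.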

In one embodiment, the elevation coding gives a coded elevation index (i) on a number of elevation levels (N_ϕ) and sign information.

Thus, only one hemisphere is considered to define the quantization dictionary, the number of elevation levels and the number of points per level being symmetrical about the equator.

In one embodiment, a global quantization index to be transmitted (index) is determined based on an azimuth index coded by scalar quantization on the determined number of points per level (N_θ(i)) and a cumulative cardinality value obtained based on at least the coded elevation index.

The cardinality values thus defined may be estimated directly (analytically) in order to define the global index to be transmitted, thereby making it possible to reduce the maximum computational complexity.

The invention also relates to a method for decoding a spatial direction of a sound source, this direction being defined by spherical coordinates comprising an elevation coordinate and an azimuth coordinate, wherein a spherical quantization dictionary is defined on a 3D sphere by an elevation coding and an azimuth coding, and wherein:

- the elevation decoding uses a scalar quantization, giving at least one decoded elevation index (i) on a number of elevation levels (N_ϕ),
- the azimuth decoding uses a scalar quantization, according to a number of points per level (N_θ(i)) depending on the decoded elevation index (i),
- the number of points per level (N_θ(i)) is determined on the basis of two successive cumulative cardinality values (cumN(i), cumN(i−1)),
- the cumulative cardinality value (cumN(i)) for a decoded elevation index (i) being representative of a number of points proportional to a total number of points and according to the area of a spherical zone comprising at least one zone delimited by the upper horizontal plane (ϕ=(i+1/2)δ_ϕ) of the positive elevation level of the decoded elevation index (i) and a lower horizontal plane of the sphere.

The decoding method has the same advantages as the coding method, and makes it possible to optimize computing resources by using an optimized spherical quantization dictionary.

In the same way as for the coding and according to the same advantages, in one embodiment, the elevation decoding includes levels corresponding to the equator (0°) and to the poles (+/−90°) of the 3D sphere.

According to one particular embodiment, a number of points (N_θ(0)) for the azimuth decoding is predetermined for the elevation level corresponding to the equator, and the total number of points (N_tot′) is obtained by subtracting, from a target number of points (N_tot=2¹⁶), the predetermined number of points corresponding to the equator and each of the North and South poles of the sphere, according to the following expression: N_tot′=N_tot−N_θ(0)−2N_θ(N_ϕ−1),

- N_tot being the target number of points of the sphere for a given bit budget,
- N_θ(0), the predetermined number of points for the elevation level corresponding to the equator; and
- 2N_θ(N_ϕ−1) the predetermined number of points for the North and South poles of the sphere.

In one embodiment, the cumulative cardinality value (cumN(i)) for a decoded elevation index (i) is representative of a number of points proportional to the total number of points according to the area (A_i) of a spherical zone delimited by the upper horizontal plane

$(ϕ = (i + \frac{1}{2}) δ_{ϕ})$

of the positive elevation level of the decoded elevation index (i) and this same plane of the sphere symmetrical with respect to the equator

$(ϕ = - (i + \frac{1}{2}) δ_{ϕ})$

minus the area (A₀) corresponding to the elevation level of the equator, according to the following ratio:

$\frac{(A_{i} - A_{0})}{(A_{N_{ϕ} - 2} - A_{0})} N_{tot}^{'}$

In one possible example, the expression for the cumulative cardinality value is as follows:

$cumN (i) = 2 {Arr}_{i} (\frac{N_{tot}^{'}}{2} \frac{\sin ((i + \frac{1}{2}) δ_{ϕ}) - \sin (\frac{δ_{ϕ}}{2})}{\sin ((N_{ϕ} - \frac{3}{2}) δ_{ϕ}) - \sin (\frac{δ_{ϕ}}{2})})$

with

- i=1, . . . , N_ϕ−2, N_ϕ−2 being the number of elevation quantization levels without the equator and the North and South poles of the sphere,
- Arr_i( ) being a rounding to the nearest integer depending on i, 2Arr_i(x/2) corresponding to a rounding to an even integer and δ_ϕ being a given quantization step of the elevation.

According to one embodiment, the elevation decoding gives a decoded elevation index (i) on a number of elevation levels (N_ϕ) and sign information.

In one embodiment, the decoding comprises receiving a global quantization index (index) and determining, based on this index, a cumulative cardinality value obtained on the basis of at least the decoded elevation index and a decoded azimuth index on a determined number of points per level (N_θ(i)).

The invention targets a coding device comprising a processing circuit for implementing the steps of the coding method as described above.

The invention also targets a decoding device comprising a processing circuit for implementing the steps of the decoding method as described above.

The invention relates to a computer program comprising instructions for implementing the coding or decoding methods as described above when they are executed by a processor. Finally, the invention relates to a storage medium able to be read by a processor and storing a computer program comprising instructions for executing the coding method or the decoding method described above.

BRIEF DESCRIPTION OF THE DRAWINGS

Other features and advantages of the invention will become more clearly apparent on reading the following description of particular embodiments, which are given by way of mere illustrative and non-limiting examples, and the appended drawings, in which:

FIG. 1, described above, illustrates a DiRAC coding device with coding of spherical data in dimension 3;

FIG. 2, described above, illustrates a DiRAC decoding device with decoding of spherical data in dimension 3;

FIG. 3, described above, illustrates a smartphone device equipped with multiple microphones with a MASA spatial pre-analysis in order to provide a MASA format as input for a coding;

FIG. 4, described above, illustrates the steps implemented in the decoding of spherical data in dimension 3, in the form of a flowchart;

FIG. 5a illustrates spherical zones brought about by the discretization of the elevation to define the spherical quantization dictionary, according to one embodiment of the invention;

FIG. 5b illustrates the areas used for determining the number of points in azimuth to define the spherical quantization dictionary, according to one embodiment of the invention;

FIG. 6a illustrates the steps implemented in a coding method according to one embodiment of the invention, in the form of flowcharts;

FIG. 6b illustrates the indexing step implemented in the coding method according to one embodiment of the invention, in the form of a flowchart;

FIG. 7a illustrates the steps implemented in a decoding method according to one embodiment of the invention, in the form of flowcharts;

FIG. 7b illustrates the steps of decoding the quantization indices implemented in the decoding method according to one embodiment of the invention, in the form of a flowchart;

FIG. 8a illustrates the indexing (in the Northern hemisphere of the 3D sphere) implemented in a decoding method according to one embodiment of the invention, in the form of a diagram;

FIG. 8b illustrates the indexing (in the Southern hemisphere of the 3D sphere) implemented in a decoding method according to one embodiment of the invention, in the form of a diagram; and

FIG. 9 illustrates examples of structural embodiments of a coding device and of a decoding device according to one embodiment of the invention.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

The invention described below relates to the quantization of spherical data in dimension 3. This is applicable, by way of example, to the coding and decoding of spatial directions of sound sources (DoA), for coding and decoding for example of MASA data as described later with reference to FIGS. 5 to 7. In some variants, the invention may be applied to DiRAC coding or any type of coding/decoding of audio data or coding of any other type of data in which 3D direction information is coded.

Without loss of generality, the definition of 3D spherical coordinates in degrees in line with the geographical convention that is used in the description of the MASA format reference proposal will be adopted here.

The radius, which is set to 1 here, will be omitted, keeping only the azimuth and the elevation in the case of coding a direction of sources (or DoA), as in a DiRAC or MASA scheme. In some variants and for certain applications (for example quantization of a sub-band in transform-based coding), it will be possible to code a radius separately (corresponding to a mean amplitude level per sub-band for example).

In some variants, units other than degrees (for example radians) will be used, and conventions other than the geographical convention will be used; for example elevation may be replaced with colatitude. Thus, other equivalent spherical coordinate systems (obtained for example by permuting or inverting Cartesian coordinates) may be used according to the invention—it will be sufficient to apply the necessary conversions in the definition of the scalar quantization dictionaries, the reconstruction, etc. The coding and the decoding according to the invention is applicable to all definitions of spherical coordinates, and it is thus possible to replace ϕ, θ with other spherical coordinates by adapting the conversion between Cartesian coordinates and spherical coordinates.

FIG. 5a illustrates a 2D partial representation of a spherical quantization dictionary (grid) according to one embodiment of the invention. According to the invention, the spherical quantization dictionary is defined by an elevation coding and an azimuth coding. The elevation coding uses a scalar quantization, with a discretization of the elevation that, according to one embodiment, includes levels corresponding to the equator (zero elevation) and to the poles (elevation +/−90°) of the 3D sphere. The elevation coding gives at least one coded elevation index (i) on a number of elevation levels.

This discretization uses a number of positive levels (N_ϕ) for the Northern hemisphere and an elevation sign indication (indicating the Northern or Southern hemisphere), which is tantamount to 2N_ϕ−1 levels for coding elevation in both the Northern and Southern hemispheres of the 3D sphere.

As illustrated in FIG. 5a, it is therefore possible to see the spherical quantization dictionary, also hereinafter called a grid according to the invention, as a set of 2N_ϕ−1 “spherical layers” or “horizontal slices” shown as C₀, C₁, C₋₁, . . . , C_i, . . . , C_(N_ϕ−1), C_−(N_ϕ−1), brought about by the elevation quantization (the limits of each dotted slice are given by the elevation quantization decision thresholds, outside the poles). The surface of the Northern hemisphere is divided into layers C₀ (only the upper half of C₀, above the equator, is in the Northern hemisphere), C₁, . . . , C_i, . . . , C_(N_ϕ−1), whereas the surface of the Southern hemisphere is divided into layers C₀ (only the lower half of C₀, below the equator, is in the Southern hemisphere), C₋₁, . . . , C_−i, . . . , C_−(N_ϕ−1).

It should be noted that FIG. 5a shows a 2D projection of the 3D sphere, taking an arbitrary vertical cutting plane. The axis 0z in Cartesian coordinates is indicated, with a graduation according to the sine of the elevation, since the z coordinate corresponds to z=sin ϕ in the chosen convention.

The azimuth coding uses a scalar quantization, according to the number of azimuth levels (N_θ(i)) (also called number of points per level) depending on the coded (positive or absolute) elevation index (i=0, . . . , N_ϕ−1), this number of levels N_θ(i) being symmetrical about the equator for the Northern and Southern hemispheres.

The determination of this number of azimuth levels is described below.

In order not to overload this figure, the subdivision of these horizontal layers into equally distributed “regions” according to the discretization of the azimuth with a number N_θ(i) of azimuth levels depending on the elevation level is not shown.

The spherical grid according to the invention discretizes the elevation and the azimuth separately by scalar quantization, with a uniform discretization of the azimuth according to a number of levels N_θ(i) depending on the (positive or absolute) coded elevation index i. However, the optimum search for the nearest neighbor in the grid involves selecting two elevation candidates, and therefore also two associated azimuth candidates, in order to select the best candidate; this is therefore tantamount to joint coding even though it is separate in practice, and the actual decision regions of the grid (in terms of Voronoi regions on the surface of the sphere) are therefore not spherical rectangles. For indexing purposes (coding of the global index and decoding), the discretization of the surface of the 3D sphere may nevertheless be seen as a separate elevation and azimuth division to obtain spherical rectangles (excluding caps at the poles).

The coordinates ϕ and θ are coded separately, with a scalar quantization dictionary {s·ϕ̂(id_ϕ), id_ϕ=0, . . . , N_ϕ−1, s=+1 or −1} with N_ϕ levels for |ϕ| and with sign information s (indicating the Northern hemisphere for s=+1 or Southern hemisphere for s=−1) and a set of uniform scalar quantization dictionaries {θ̂(i, j), j=0, . . . , N_θ(i)−1} with N_θ(i) levels for θ according to the coded (positive or absolute) elevation index i.

The total number of points of the sphere discretized according to the various determined numbers of levels, also called the total number of points in the 3D grid, is given, in one particular embodiment, by:

$N_{tot} = N_{θ} (0) + 2 \sum_{i = 1}^{N_{ϕ} - 1} N_{θ} (i)$

This number includes the N_θ(0) points on the elevation level (the spherical layer) corresponding to the equator (C₀, id_ϕ=0) and the N_θ(i) points of each positive elevation level of index i=1, . . . , N_ϕ−1, also called elevation layer C_i, these being symmetrical between the Northern and Southern hemispheres and therefore counted twice.

The spherical grid is therefore defined as the following spherical vector quantization dictionary (with a radius assumed to be equal to 1 by convention):

$\{ (s \cdot \hat{ϕ} (i), \hat{θ} (i, j)) \mid s \in \{- 1, 1\} \text{ if } i \geq 1, \ s = 1 \text{ if } i = 0; \ i = 0, \dots, N_{ϕ} - 1; \ j = 0, \dots, N_{θ} (i) - 1 \}$

It should be noted that the value of s for i=0 is arbitrary, since ϕ̂(i=0)=0.

In the preferred embodiment, a 3D grid is defined for a given bit budget, for example 16 bits, thus giving a total number of points of the sphere of N_tot=2¹⁶. In some variants, other values of N_tot (and therefore other bit budget values) will be possible.

The elevation is coded by scalar quantization on N_ϕ reconstruction levels. In the preferred embodiment, N_ϕ=122 is set as the number of positive levels, as in the grid in the MASA format described above. This makes it possible in particular to have an even number of levels in the Northern hemisphere (N_ϕ including the North pole and the equator). If also taking into account the Southern hemisphere, the elevation is therefore coded on 2N_ϕ−1 levels (counting the equator only once). The inclusion of the poles allows a complete representation of the sphere, and the impact is minimal since only 2 points of the grid are associated with the poles (when the sign is applied).

For the elevation coding by scalar quantization, a uniform quantization step δ_ϕ (outside poles) is defined and the following is adopted:

$\hat{ϕ} (i) = i δ_{ϕ} \text{ for } i = 0, \dots, N_{ϕ} - 2 \text{ and } \hat{ϕ} (i) = 90 \text{ for } i = N_{ϕ} - 1$

with for example δ_ϕ=0.7388 degrees, as in the grid in the MASA format described above. The quantization step is uniform over the interval [−ϕ̂(N_ϕ−2), ϕ̂(N_ϕ−2)], that is to say [−(N_ϕ−2)δ_ϕ, (N_ϕ−2)δ_ϕ], if the sign is taken into account.
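By way of illustration only, the elevation dictionary defined above may be sketched as follows. This is a minimal sketch assuming the values of the preferred embodiment (N_ϕ=122, δ_ϕ=0.7388 degrees); the names `phi_hat` and `signed_levels` are hypothetical and do not come from the description.

```python
N_PHI = 122          # number of non-negative elevation levels (preferred embodiment)
DELTA_PHI = 0.7388   # uniform elevation step in degrees, outside the poles


def phi_hat(i: int, n_phi: int = N_PHI, delta_phi: float = DELTA_PHI) -> float:
    """Reconstructed absolute elevation in degrees for index i = 0..n_phi-1."""
    if not 0 <= i <= n_phi - 1:
        raise ValueError("elevation index out of range")
    # The last index is reserved for the pole; the others are uniformly spaced.
    return 90.0 if i == n_phi - 1 else i * delta_phi


def signed_levels(n_phi: int = N_PHI, delta_phi: float = DELTA_PHI) -> list:
    """All 2*n_phi - 1 signed levels, from the South pole to the North pole."""
    north = [phi_hat(i, n_phi, delta_phi) for i in range(n_phi)]
    # The equator (index 0) is counted only once.
    return [-p for p in reversed(north[1:])] + north
```

With these values, `signed_levels()` contains 2N_ϕ−1=243 levels, the equator being counted once.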

The azimuth θ is coded by scalar quantization on N_θ(i) levels. Use is preferably made of a uniform scalar quantization with a uniform scalar quantization dictionary, taking into account the cyclic nature of the interval [−180,180] degrees:

$\hat{θ} (i, j) = \begin{cases} 360 j / N_{θ} (i) - 180 & i \text{ even} \\ 360 (j + 0.5) / N_{θ} (i) - 180 & i \text{ odd} \end{cases}$

As is known, the azimuth dictionaries have an offset, set to 0 for even values of i and to half a quantization step, namely

$\frac{π}{N_{θ} (i)}$

radians (that is to say 180/N_θ(i) degrees), for odd values of i, in order to “shift” the “horizontal slice” (spherical layer) of the sphere (delimited by the elevation decision thresholds) associated with each elevation of index i such that the coded azimuths are aligned as little as possible from one successive layer to another.
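The uniform azimuth dictionary with its alternating offset may be illustrated by the following minimal sketch, which assumes that the offset for odd layers is half an azimuth step, as stated above. The function name `theta_hat` is hypothetical, and the number of points `n_theta` is passed in rather than derived.

```python
def theta_hat(i: int, j: int, n_theta: int) -> float:
    """Reconstructed azimuth in degrees for elevation index i and azimuth index j.

    n_theta is the number of azimuth levels N_theta(i) of the layer.
    """
    if not 0 <= j < n_theta:
        raise ValueError("azimuth index out of range")
    # Even layers start at -180 degrees; odd layers are shifted by half a step.
    offset = 0.0 if i % 2 == 0 else 0.5
    return 360.0 * (j + offset) / n_theta - 180.0
```

For a layer with 4 points, for example, an even layer gives −180, −90, 0 and 90 degrees, whereas an odd layer gives −135, −45, 45 and 135 degrees.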

In some variants, a uniform scalar quantization over the interval [0,90] degrees (including the values 0 and 90 as reconstruction levels) may be used for the elevation coding:

$\hat{ϕ} (i) = \frac{i}{N_{ϕ} - 1} 90, i = 0, \dots, N_{ϕ} - 1$

This is tantamount to changing the quantization step δ_ϕ in order to have δ_ϕ=90/(N_ϕ−1), that is to say δ_ϕ≈0.7438 when N_ϕ=122; in this case, the poles are naturally included as codewords. The quantization step is uniform over the interval [−ϕ̂(N_ϕ−1), ϕ̂(N_ϕ−1)], that is to say [−90,90] degrees, if the sign is taken into account.

In other variants, it is possible to change the number of levels N_ϕ or to adopt other definitions of the scalar quantization dictionary {ϕ̂(i), i=0, . . . , N_ϕ−1} for the (positive or absolute) elevation. It will however be assumed that ϕ̂(i=0)=0° and ϕ̂(i=N_ϕ−1)=90°.

In other variants, the offset applied to the azimuth depending on the elevation layer may be different, the important aspect being that the number of azimuth levels is defined according to the invention.

According to the invention, the number of points per level (N_θ(i)) is determined on the basis of two successive cumulative cardinality values (cumN(i), cumN(i−1)), the cumulative cardinality value (cumN(i)) for a coded elevation index (i) being representative of a number of points proportional to a total number of points, and according to the area of a spherical zone comprising at least one zone delimited by the upper horizontal plane

$(ϕ = (i + \frac{1}{2}) δ_{ϕ})$

of the given positive elevation level (i) and a lower horizontal plane (for example ϕ=δ_ϕ/2). In the preferred embodiment, this spherical zone also comprises the symmetrical part in the Southern hemisphere comprising a zone delimited by the upper horizontal plane

$(ϕ = - (i + \frac{1}{2}) δ_{ϕ})$

of the given elevation level (i) and a lower horizontal plane (ϕ=−δ_ϕ/2). The notation cumN(i) is adopted here, but it should not be confused with that used previously in the description of the prior art.

FIG. 5b illustrates these spherical zones for which the surface area of the 3D sphere is taken into account in a first and a second embodiment.

In one particular embodiment, the number of azimuth levels N_θ(i) is determined by predefined values for N_θ(0) and N_θ(N_ϕ−1), which correspond to the equator and to one of the poles, respectively.

These predetermined numbers of points (N_θ(0) and N_θ(N_ϕ−1)) are taken into account when determining the cumulative cardinality values and to define the total number N_tot′ of points used for this determination.

In this embodiment, the total number of points (N_tot′) is obtained by subtracting, from a target number of points (N_tot=2¹⁶), the predetermined number of points corresponding to the equator and each of the North and South poles of the sphere according to the following expression: N_tot′=N_tot−N_θ(0)−2N_θ(N_ϕ−1), N_tot being the target number of points of the sphere for a given bit budget, N_θ(0) the predetermined number of points for the elevation level corresponding to the equator, and 2N_θ(N_ϕ−1) the predetermined number of points for the North and South poles of the sphere.

In the main embodiment, N_θ(0) is an even value and N_θ(N_ϕ−1)=1; therefore, when N_tot is even, N_tot′ is also even.

In one example illustrated in FIG. 5b, the area of a spherical zone delimited by the upper horizontal plane

$(ϕ = (i + \frac{1}{2}) δ_{ϕ})$

of the given positive elevation level (i) and this same plane of the sphere symmetrical with respect to the equator

$(ϕ = - (i + \frac{1}{2}) δ_{ϕ})$

is illustrated by the hatched zone A_i. The area corresponding to the elevation level of the equator, shown at A₀, is subtracted from this area in order to determine the cumulative cardinality value (cumN(i)) of the given positive elevation level (i).

For this purpose, a number of points is estimated based on the ratio (A_i−A₀)/(A_{N_ϕ−2}−A₀). This number of points is proportional to the total number of points N_tot′ expressed above, according to the following ratio:

$\frac{(A_{i} - A_{0})}{(A_{N_{ϕ} - 2} - A_{0})} N_{tot}^{'}$

By design, this ratio gives exactly N_tot′ when i=N_ϕ−2, thereby guaranteeing that the total number of points is used in full.

Since this ratio is generally a fractional number, it will have to be rounded to obtain the cumulative cardinality, and since the number of points is determined in the main embodiment for both the Northern and Southern hemispheres (thus in duplicate), the rounding will in this case be carried out to an even integer (the nearest lower or higher one).

It will be recalled that the surface area of an element around the point (θ, ϕ) on the sphere 𝕊₂ is given by dA=r²cos ϕ dθ dϕ, where ϕ here is the elevation (if colatitude were to be used, this would give a term in sin ϕ). The partial surface area defined by a spherical zone delimited by two horizontal planes brought about by an elevation interval [ϕ_min, ϕ_max], where −90°≤ϕ_min<ϕ_max≤90°, the azimuth being over [−180°, 180°], is given by:

$A (ϕ_{min}, ϕ_{max}) = r^{2} \int_{- 180^{\circ}}^{180^{\circ}} d θ \int_{ϕ_{min}}^{ϕ_{max}} \cos ϕ \, d ϕ = 2 π r^{2} (\sin ϕ_{max} - \sin ϕ_{min})$

In particular, this gives the known result that the surface area of the sphere 𝕊₂ of radius r is A_tot=A(−90°, 90°)=4πr² (for ϕ_min=−90° and ϕ_max=90°).
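The closed-form area of a spherical zone may be checked numerically with the short sketch below. The function name `zone_area` is hypothetical; the elevations are given in degrees, in line with the convention adopted in this description.

```python
import math


def zone_area(phi_min_deg: float, phi_max_deg: float, r: float = 1.0) -> float:
    """Area of the spherical zone between two horizontal planes.

    Implements A = 2*pi*r^2 * (sin(phi_max) - sin(phi_min)), with elevations
    in degrees and the azimuth covering the full circle.
    """
    if not -90.0 <= phi_min_deg < phi_max_deg <= 90.0:
        raise ValueError("expected -90 <= phi_min < phi_max <= 90")
    s_max = math.sin(math.radians(phi_max_deg))
    s_min = math.sin(math.radians(phi_min_deg))
    return 2.0 * math.pi * r * r * (s_max - s_min)
```

Calling `zone_area(-90.0, 90.0)` returns the full surface area 4πr² for r=1, and `zone_area(0.0, 90.0)` returns that of one hemisphere.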

For a remaining number of points N_tot′ in the spherical grid (or spherical vector quantization dictionary) to be distributed in a spherical zone (subset of the surface of the 3D sphere) delimited by the horizontal planes

$ϕ = \pm (N_{ϕ} - 2 + \frac{1}{2}) δ_{ϕ}$

outside the central zone corresponding to the equator

$(ϕ = \pm \frac{δ_{ϕ}}{2}),$

each decision region associated with a point of the grid is approximated here by a “spherical rectangle” for indexing purposes (this corresponding to a separate coding decision in relation to the spherical coordinates). Each of these regions should ideally have a surface area of 4πr²/N_tot′ if the grid is uniform.

For a uniform discretization of the elevation over the interval [−(N_ϕ−2)δ_ϕ, (N_ϕ−2)δ_ϕ], as in the main embodiment, it is therefore possible to estimate the number of points on the grid contained within a spherical zone (or “spherical slice”) delimited by two horizontal planes associated with the decision thresholds

$ϕ_{min} = \frac{δ_{ϕ}}{2} \text{ and } ϕ_{max} = (i + \frac{1}{2}) δ_{ϕ}$

of the positive part (Northern hemisphere) of the sphere.

According to the ratio expressed above, expressing a simple rule of three

$\frac{(A_{i} - A_{0})}{(A_{N_{ϕ} - 2} - A_{0})} N_{tot}^{'},$

it is possible to express

$\frac{(A_{i} - A_{0})}{2} = A (ϕ_{min}, ϕ_{max}) \text{ with } ϕ_{min} = \frac{δ_{ϕ}}{2} \text{ and } ϕ_{max} = (i + \frac{1}{2}) δ_{ϕ},$

and with

$A (\frac{δ_{ϕ}}{2}, (i + \frac{1}{2}) δ_{ϕ}) = 2 π r^{2} (\sin ((i + \frac{1}{2}) δ_{ϕ}) - \sin (\frac{δ_{ϕ}}{2}))$

In one exemplary embodiment, the following is set:

N_θ(0)=430

In some variants, the value of N_θ(0) may be different but even.

Moreover, by convention N_θ(N_ϕ−1)=1 is set, since a single point is sufficient to represent a pole.
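As a quick check with the values of this exemplary embodiment (N_tot=2¹⁶, N_θ(0)=430 and N_θ(N_ϕ−1)=1), the number of points remaining to be distributed outside the equator and the poles works out as:

$N_{tot}^{'} = 65536 - 430 - 2 \times 1 = 65104$

This value is even, so that half of it, 32552 points, is available for each hemisphere.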

The number of points per elevation level i is expressed by:

$N_{θ} (i) = \frac{(cumN (i) - cumN (i - 1))}{2}$

where

cumN(0)=0

$cumN (i) = 2 {Arr}_{i} (\frac{N_{tot}^{'}}{2} \frac{\sin ((i + \frac{1}{2}) δ_{ϕ}) - \sin (\frac{δ_{ϕ}}{2})}{\sin ((N_{ϕ} - \frac{3}{2}) δ_{ϕ}) - \sin (\frac{δ_{ϕ}}{2})}) \text{ for } i = 1, \dots, N_{ϕ} - 2 \text{ with } N_{tot}^{'} = N_{tot} - N_{θ} (0) - 2 N_{θ} (N_{ϕ} - 1)$

and Arr_i( ) is a rounding to the (nearest lower or higher) integer depending on i. In the preferred embodiment, Arr₁( ) is taken as the rounding to the upper integer, and Arr_i( ) is taken as the rounding to the closest integer for i=2, . . . , N_ϕ−2.

It should be noted that the function 2Arr_i(x/2) corresponds in fact to a rounding to a (nearest lower or higher) even integer, thereby making it possible to divide the result by two in order to assign the integer half to each of the hemispheres.
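The effect of the function 2Arr_i(x/2) may be illustrated by the minimal sketch below, which assumes that Arr_i( ) is a rounding to the nearest integer with ties rounded upward; the tie rule and the name `round_to_even_integer` are assumptions made for the illustration.

```python
import math


def round_to_even_integer(x: float) -> int:
    """Return 2*Arr(x/2), Arr being a rounding to the nearest integer.

    The result is always even, so that half of it can be assigned to each
    hemisphere. Ties are rounded upward here (an assumption).
    """
    return 2 * math.floor(x / 2.0 + 0.5)
```

For example, 845.2 is mapped to 846 and 844.98 is mapped to 844, each result being divisible by two.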

It should be noted that, by definition,

$cumN (N_{ϕ} - 2) = 2 {Arr}_{i} (\frac{N_{tot}^{'}}{2}) = N_{tot}^{'} .$

Moreover, it should be noted that the notation cumN(i) is adopted here even though it is different from the one used previously in the description of a MASA format proposed in the prior art. Indeed, here cumN(i) corresponds to the cumulative cardinality of the spherical grid up to and including elevation layer i (with the layers in the Northern and Southern hemispheres), but not counting the equator. The definition of cumN(i) for i=1, . . . , N_ϕ−2 in the description of the invention therefore corresponds to the equivalent of cumN(2i)−N_θ(0) in the definition from the prior art.

In variants where other quantization levels are defined for the elevation, the definition of cumN(i) according to sin

$((i + \frac{1}{2}) δ_{ϕ}),$

i=0, . . . , N_ϕ−2, will be adapted by replacing

$(i + \frac{1}{2}) δ_{ϕ}$

with other corresponding decision thresholds in the form

$\frac{\hat{ϕ} (i) + \hat{ϕ} (i + 1)}{2} .$

Thus, more generally, it will be possible to write:

$cumN (i) = 2 {Arr}_{i} (\frac{N_{tot}^{'}}{2} \frac{\sin (\frac{\hat{ϕ} (i) + \hat{ϕ} (i + 1)}{2}) - \sin (\frac{\hat{ϕ} (0) + \hat{ϕ} (1)}{2})}{\sin (\frac{\hat{ϕ} (N_{ϕ} - 2) + \hat{ϕ} (N_{ϕ} - 1)}{2}) - \sin (\frac{\hat{ϕ} (0) + \hat{ϕ} (1)}{2})}) for i = 1, \dots, N_{ϕ} - 2.$

One example of values obtained for the preferred embodiment is given below:

- N_θ(i=0, . . . , 121)=[430 423 422 422 422 422 421 421 420 420 419 418 417 417 416 414 414 412 412 409 409 407 406 404 402 401 399 397 395 394 391 389 387 385 383 380 378 375 373 370 368 365 362 359 356 354 350 347 344 341 338 334 331 328 324 321 317 313 310 306 302 299 294 291 287 282 279 274 270 266 262 258 253 249 244 240 235 231 226 222 217 212 208 202 198 194 188 183 179 173 169 163 159 153 148 144 138 133 127 123 117 112 107 102 96 91 85 81 75 69 64 59 53 48 43 37 32 26 21 15 10 1]

It may easily be verified that:

N_θ(0)+2Σ_{i=1}^{N_ϕ−1} N_θ(i)=65536, this indeed corresponding to N_tot.

The cardinality N_θ(i) according to the invention thus makes it possible to guarantee that there is no unused index for a given total number N_tot. This property stems from the fact that the cumulative cardinality cumN(i) is defined such that cumN(N_ϕ−2)=N_tot′.
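The determination of the number of points per level may be sketched as follows for the preferred embodiment. This is an illustrative sketch only, under stated assumptions: the normalizing threshold is that of level N_ϕ−2, namely (N_ϕ−3/2)δ_ϕ, so that cumN(N_ϕ−2)=N_tot′; Arr₁( ) is a rounding to the upper integer; and Arr_i( ) for i≥2 is a rounding to the nearest integer with ties rounded upward. The function name `points_per_level` and its parameter names are hypothetical.

```python
import math


def points_per_level(n_tot=2**16, n_phi=122, delta_phi=0.7388,
                     n_equator=430, n_pole=1):
    """Return (cum, n_theta) for elevation indices 0..n_phi-1.

    cum[i] is cumN(i) for i = 0..n_phi-2 (both hemispheres, equator excluded);
    n_theta[i] is the number of azimuth points of elevation level i.
    """
    n_rest = n_tot - n_equator - 2 * n_pole            # N_tot'
    s0 = math.sin(math.radians(0.5 * delta_phi))       # equator threshold
    s_top = math.sin(math.radians((n_phi - 2 + 0.5) * delta_phi))
    cum = [0]                                          # cumN(0) = 0
    for i in range(1, n_phi - 1):
        s_i = math.sin(math.radians((i + 0.5) * delta_phi))
        x = 0.5 * n_rest * (s_i - s0) / (s_top - s0)
        # Arr_1 rounds up; Arr_i rounds to nearest for i >= 2 (assumed tie rule).
        arr = math.ceil(x) if i == 1 else math.floor(x + 0.5)
        cum.append(2 * arr)                            # always an even value
    n_theta = [n_equator]
    n_theta += [(cum[i] - cum[i - 1]) // 2 for i in range(1, n_phi - 1)]
    n_theta.append(n_pole)
    return cum, n_theta


cum, n_theta = points_per_level()
total = n_theta[0] + 2 * sum(n_theta[1:])              # equator counted once
```

Because the sum of the differences of successive cumulative values telescopes to cumN(N_ϕ−2)=N_tot′, the variable `total` recovers N_tot without any unused index, whatever the intermediate roundings.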

In another exemplary embodiment, predetermined numbers are not set for the elevation levels corresponding to the equator.

In this case, the cumulative cardinality value is a rounded value of the following ratio:

$\frac{A_{i}}{A_{N_{ϕ} - 2}} (N_{tot} - 2)$

with for example N_tot=2¹⁶.

This then gives (outside the poles)

$cumN (i) = 2 {Arr}_{i} (\frac{N_{tot} - 2}{2} \frac{\sin ((i + \frac{1}{2}) δ_{ϕ})}{\sin ((N_{ϕ} - \frac{3}{2}) δ_{ϕ})})$

for i=0, . . . , N_ϕ−2. In this case, N_θ(0)=cumN(0) and

$N_{θ} (i) = \frac{(cumN (i) - cumN (i - 1))}{2}$

for i=1, . . . , N_ϕ−2. N_θ(N_ϕ−1)=1 is also set.

One example is given below of values obtained for this variant definition of cumN(i) in the case where N_ϕ=122 and δ_ϕ is defined according to the preferred embodiment:

- N_θ(i=0, . . . , 121)=[422 423 422 423 422 421 422 420 421 419 419 419 417 417 416 414 414 412 412 410 408 407 406 404 403 400 400 397 395 393 392 389 387 385 383 380 378 375 373 370 368 365 362 359 356 354 350 348 344 341 337 335 331 328 324 321 317 313 310 306 302 299 294 291 287 282 279 274 271 266 261 258 253 249 244 240 236 230 227 221 217 213 207 203 198 193 188 184 178 174 168 164 158 154 148 144 138 133 127 123 117 112 107 102 96 91 85 81 75 69 64 59 53 48 43 37 32 26 21 15 10 1]

It may in this case too easily be verified that:

- N_θ(0)+2Σ_{i=1}^{N_ϕ−1} N_θ(i)=65536, this indeed corresponding to N_tot.

In variants where other quantization levels are defined for the elevation, the definition of cumN(i) according to sin

$((i + \frac{1}{2}) δ_{ϕ}),$

i=0, . . . , N_ϕ−2, will be adapted by replacing

$(i + \frac{1}{2}) δ_{ϕ}$

with other corresponding decision thresholds in the form

$\frac{\hat{ϕ} (i) + \hat{ϕ} (i + 1)}{2} .$

In other variants, predetermined numbers are not set for the elevation levels corresponding to the equator (N_θ(0)) or to the North and South poles (N_θ(N_ϕ−1)). This variant applies in particular to the case where the scalar quantization of ϕ is uniform over the interval [0,90] degrees (including the values 0 and 90 as reconstruction levels) with:

$\hat{ϕ} (i) = \frac{i}{N_{ϕ} - 1} 90, i = 0, \dots, N_{ϕ} - 1 \text{ with } δ_{ϕ} = 90 / (N_{ϕ} - 1).$

In this case, it is possible to define:

$cumN (i) = 2 {Arr}_{i} (\frac{N_{tot}}{2} \sin ((i + \frac{1}{2}) δ_{ϕ})) for i = 0, \dots, N_{ϕ} - 2, and cumN (N_{ϕ} - 1) = N_{tot}$

this gives:

N_θ(0)=cumN(0)

and

$N_{θ} (i) = \frac{(cumN (i) - cumN (i - 1))}{2} for i = 1, \dots, N_{ϕ} - 1.$

One example is given below of values obtained for this variant definition of cumN(i) in the case where N_ϕ=122 and δ_ϕ is defined according to the preferred embodiment:

- N_θ(i=0, . . . , 121)=[426 425 425 425 425 425 424 423 423 423 422 421 420 419 419 417 416 415 414 413 411 410 408 406 405 403 402 399 398 395 394 391 390 387 384 382 380 377 375 372 369 367 364 360 358 355 352 349 345 342 339 336 332 328 325 322 318 314 310 307 302 299 295 291 287 283 278 275 270 266 261 257 253 248 244 239 235 230 225 221 215 212 206 201 197 191 187 182 177 171 167 161 157 151 146 141 136 130 125 120 114 110 104 98 93 88 82 77 72 66 60 55 50 44 38 34 27 22 17 11 5 1]

It may in this case too easily be verified that:

- N_θ(0)+2Σ_{i=1}^{N_ϕ−1} N_θ(i)=65536, this indeed corresponding to N_tot.
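The variant without predetermined numbers for the equator or the poles may be sketched as follows. The sketch assumes the uniform step δ_ϕ=90/(N_ϕ−1) of this variant and a rounding Arr_i( ) to the nearest integer with ties rounded upward for every i; the function name `points_per_level_uniform` is hypothetical.

```python
import math


def points_per_level_uniform(n_tot=2**16, n_phi=122):
    """Number of azimuth points per level for the uniform [0, 90] variant.

    cum[i] = 2*Arr(n_tot/2 * sin((i + 1/2)*delta_phi)) for i = 0..n_phi-2,
    and cum[n_phi-1] = n_tot, so that the poles are handled like any level.
    """
    delta_phi = 90.0 / (n_phi - 1)                     # step in degrees
    cum = []
    for i in range(n_phi - 1):
        x = 0.5 * n_tot * math.sin(math.radians((i + 0.5) * delta_phi))
        cum.append(2 * math.floor(x + 0.5))            # nearest, then doubled
    cum.append(n_tot)                                  # cumN(N_phi - 1)
    n_theta = [cum[0]]                                 # N_theta(0) = cumN(0)
    n_theta += [(cum[i] - cum[i - 1]) // 2 for i in range(1, n_phi)]
    return cum, n_theta


cum_u, n_theta_u = points_per_level_uniform()
total_u = n_theta_u[0] + 2 * sum(n_theta_u[1:])
```

In this sketch the equator is not symmetrized separately: cumN(0) already covers the whole equatorial layer, and the last difference gives the single point assigned to each pole.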

In a second example illustrated in FIG. 5b, the area of a spherical zone delimited by the upper horizontal plane

$(ϕ = (i + \frac{1}{2}) δ_{ϕ})$

of the given positive elevation level (i) and that of the equator is illustrated by the zone denoted A′_i. Half of the area corresponding to the elevation level of the equator, shown at A₀, is subtracted from this area in order to determine the cumulative cardinality value (cumN(i)) of the given positive elevation level (i).

For this purpose, a number of points is estimated based on the ratio (A′_i−A₀/2)/(A′_{N_ϕ−2}−A₀/2). This number of points is proportional to the total number of points N_tot′ expressed above, according to the following ratio:

$\frac{A_{i}^{'} - A_{0} / 2}{A_{N_{ϕ} - 2}^{'} - A_{0} / 2} N_{tot}^{'},$

with A′_{N_ϕ−2} being the area of the spherical zone delimited by the upper horizontal plane

$(ϕ = (N_{ϕ} - \frac{3}{2}) δ_{ϕ})$

of the positive elevation level N_ϕ−2 and that of the equator.

The result is equivalent to what was described above, since

$\frac{A_{i}^{'} - A_{0} / 2}{A_{N_{ϕ} - 2}^{'} - A_{0} / 2} = \frac{2 A_{i}^{'} - A_{0}}{2 A_{N_{ϕ} - 2}^{'} - A_{0}} = \frac{A_{i} - A_{0}}{A_{N_{ϕ} - 2} - A_{0}}$

In one variant embodiment, the cumulative cardinality value may be expressed taking into account only the number of points of the positive part of the sphere.

In this scenario, a number of points is estimated based on the ratio $(A_{i}^{'} - A_{0}/2) / (A_{N_{ϕ}-2}^{'} - A_{0}/2)$. This number of points is proportional to the total number of points N_tot′ expressed above, according to the following ratio:

$\frac{(A_{i}^{'} - A_{0} / 2)}{(A_{N_{ϕ} - 2}^{'} - A_{0} / 2)} \frac{N_{tot}^{'}}{2}$

The expression of the cumulative cardinality value is given for example by:

${cumN}^{'}(i) = {Arr}_{i} \left( \frac{N_{tot}^{'}}{2} \, \frac{\sin((i + \frac{1}{2}) δ_{ϕ}) - \sin(\frac{δ_{ϕ}}{2})}{\sin((N_{ϕ} - \frac{1}{2}) δ_{ϕ}) - \sin(\frac{δ_{ϕ}}{2})} \right)$

$\text{for } i = 1, \dots, N_{ϕ} - 2$

$\text{with } N_{tot}^{'} = N_{tot} - N_{θ}(0) - 2 N_{θ}(N_{ϕ} - 1)$

where Arr_i( ) is a rounding to an integer, the rounding rule depending on i. In the preferred embodiment, Arr₁( ) is taken as the rounding to the upper integer, and Arr_i( ) is taken as the rounding to the closest integer for i=2, . . . , N_ϕ−2.

And the number of points per elevation level i is expressed by:

$N_{θ} (i) = ({cumN}^{'} (i) - {cumN}^{'} (i - 1))$
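By way of non-limiting illustration, the analytical definition of cumN′(i) and of N_θ(i) given above may be sketched as follows in Python. The function names are hypothetical, the numerical values of N_tot′, N_ϕ and δ_ϕ are left to the caller (the value of δ_ϕ of the preferred embodiment is not restated here), and cumN′(0) is assumed to be 0, which is the value that the formula yields for i=0.

```python
import math

def cum_n_prime(i, n_tot_prime, n_phi, delta_phi_deg):
    """Cumulative cardinality cumN'(i) of one hemisphere, for i = 1..n_phi-2."""
    if i == 0:
        return 0  # assumption: no point below the first positive level
    d = math.radians(delta_phi_deg)
    num = math.sin((i + 0.5) * d) - math.sin(0.5 * d)
    den = math.sin((n_phi - 0.5) * d) - math.sin(0.5 * d)
    value = 0.5 * n_tot_prime * num / den
    # Arr_1 rounds to the upper integer, Arr_i to the closest integer for i >= 2
    return math.ceil(value) if i == 1 else math.floor(value + 0.5)

def points_per_level(n_tot_prime, n_phi, delta_phi_deg):
    """N_theta(i) = cumN'(i) - cumN'(i-1) for i = 1..n_phi-2."""
    cum = [cum_n_prime(i, n_tot_prime, n_phi, delta_phi_deg)
           for i in range(n_phi - 1)]
    return [cum[i] - cum[i - 1] for i in range(1, n_phi - 1)]
```

Since N_θ(i) is a difference of successive cumulative values, the sum of the returned list equals cumN′(N_ϕ−2) by construction, whatever the placeholder values chosen.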

FIG. 6a describes a method for coding spherical coordinates (ϕ, θ) of an input point (E201) on a 3D sphere. This coding method may be implemented, in one embodiment, by block 312 from FIG. 3 for the MASA data format or in block 140 from FIG. 1 for a DiRAC coder.

The quantization of the spherical coordinates and the search are carried out as follows:

- The elevation ϕ is first of all coded in E202-1. For the search to be optimum, it is necessary to select the two values ${sgn}_{ϕ} \cdot \hat{ϕ}(i_{1})$ and ${sgn}_{ϕ} \cdot \hat{ϕ}(i_{2})$, where sgn_ϕ is the sign of ϕ and $\hat{ϕ}(i_{k})$ is the coded absolute elevation, with k=1 or 2. This coding may be carried out by searching for the two nearest neighbors in the determined elevation dictionary of N_ϕ levels (E203-1). Preferably, the exhaustive search in the elevation scalar dictionary is replaced by direct determination of the elevation index i₁ by a rounding:

$i_{1} = \min \left( \left[ \frac{|ϕ|}{δ_{ϕ}} \right], N_{ϕ} - 1 \right)$

$i_{1} \leftarrow \arg \min_{i = N_{ϕ} - 1, N_{ϕ} - 2} \left| |ϕ| - \hat{ϕ}(i) \right|, \text{ if } i_{1} \geq N_{ϕ} - 1$

- The last step is necessary here because, in the preferred embodiment, $\hat{ϕ}(i) = i δ_{ϕ}$ is adopted for i=0, . . . , N_ϕ−2 and $\hat{ϕ}(i) = 90$ is adopted for i=N_ϕ−1.
- The quantization step is thus uniform only over the interval [−(N_ϕ−2)δ_ϕ, (N_ϕ−2)δ_ϕ]. For the codewords of index i=N_ϕ−1, N_ϕ−2, an explicit nearest neighbor search is required.
- In some variants where the step is uniform over [−90, 90] degrees, it is possible to take:

$i_{1} = \left[ \frac{|ϕ|}{δ_{ϕ}} \right]$

- With δ_ϕ=90/(N_ϕ−1), knowing that:

$\hat{ϕ}(i) = \frac{i}{N_{ϕ} - 1} \, 90, \quad i = 0, \dots, N_{ϕ} - 1$

- The index i₂ may be determined as described above in the MASA method from the prior art, namely:

$i_{2} = \begin{cases} 1 & \text{if } i_{1} = 0 \\ N_{ϕ} - 2 & \text{if } i_{1} = N_{ϕ} - 1 \\ \arg \min_{i = i_{1} - 1, i_{1} + 1} \left| |ϕ| - \hat{ϕ}(i) \right| & \text{if } 0 < i_{1} < N_{ϕ} - 1 \end{cases}$

- It should be noted in all cases that the values i₁ and i₂ may be swapped without this changing the result of the coding.
- The azimuth θ is coded by way of uniform scalar quantization in E204-2 with an adaptive number of levels N_θ(i), where i=i₁ or i₂, determined according to the exemplary embodiments described in the MASA method from the prior art (E203-2), in order to obtain the two values $\hat{θ}(i_{1}, j_{1})$ and $\hat{θ}(i_{2}, j_{2})$, respectively. More specifically, it is possible to take:

$j_{k} = \left\lfloor \frac{θ - Δ + 180 / N_{θ}(i_{k})}{360 / N_{θ}(i_{k})} \right\rfloor \bmod N_{θ}(i_{k})$

- This therefore gives two candidates $({sgn}_{ϕ} \cdot \hat{ϕ}(i_{k}), \hat{θ}(i_{k}, j_{k}))$ in step E205.
- In step E206, the candidate closest to (ϕ, θ) is selected over k=1, 2, for example as in the MASA method from the prior art, by minimizing the following criterion:

$d(i_{k}, s_{k}, j_{k}) = -(\sin ϕ \sin \hat{ϕ}(i_{k}) + \cos ϕ \cos \hat{ϕ}(i_{k}) \cos(θ - \hat{θ}(i_{k}, j_{k})))$

- The closest pair $({sgn}_{ϕ} \cdot \hat{ϕ}(i_{k}), \hat{θ}(i_{k}, j_{k}))$ is selected as the quantized value to be indexed. This selected point is denoted $({sgn}_{ϕ} \cdot \hat{ϕ}({id}_{ϕ}), \hat{θ}({id}_{ϕ}, {id}_{θ}))$.
- In some variants, the distance criterion may be evaluated by converting the points into Cartesian coordinates in order to evaluate the Euclidean distance that is to be minimized or the scalar product that is to be maximized.
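By way of illustration only, the determination of the two elevation candidates and the evaluation of the distance criterion may be sketched as follows in Python. The function names are hypothetical, angles are expressed in degrees, the elevation passed to the criterion is taken as already signed (an assumption of this sketch), and the small dictionary used in the checks (4 levels, δ_ϕ=25) is an arbitrary example rather than the dictionary of the preferred embodiment. The azimuth quantization is left out because the shift Δ is not restated here.

```python
import math

def phi_hat(i, n_phi, delta_phi):
    """Elevation codeword in degrees: i*delta_phi, except 90 for the last level."""
    return 90.0 if i == n_phi - 1 else i * delta_phi

def elevation_candidates(phi, n_phi, delta_phi):
    """Return the two candidate elevation indices (i1, i2) for phi in degrees."""
    a = abs(phi)
    i1 = min(math.floor(a / delta_phi + 0.5), n_phi - 1)  # rounding to nearest
    if i1 >= n_phi - 1:
        # explicit nearest-neighbor search between the two highest levels
        i1 = min((n_phi - 1, n_phi - 2),
                 key=lambda i: abs(a - phi_hat(i, n_phi, delta_phi)))
    if i1 == 0:
        i2 = 1
    elif i1 == n_phi - 1:
        i2 = n_phi - 2
    else:
        i2 = min((i1 - 1, i1 + 1),
                 key=lambda i: abs(a - phi_hat(i, n_phi, delta_phi)))
    return i1, i2

def distance(phi, theta, phi_q, theta_q):
    """Criterion d: negative cosine of the angle between the two directions."""
    p, t, pq, tq = map(math.radians, (phi, theta, phi_q, theta_q))
    return -(math.sin(p) * math.sin(pq)
             + math.cos(p) * math.cos(pq) * math.cos(t - tq))

def to_cartesian(phi, theta):
    """Unit vector for an elevation phi and an azimuth theta, in degrees."""
    p, t = math.radians(phi), math.radians(theta)
    return (math.cos(p) * math.cos(t), math.cos(p) * math.sin(t), math.sin(p))
```

With this example dictionary, an elevation of 65 degrees is first rounded to the last index and then brought back to the preceding level by the explicit search, which illustrates why that step is needed when the step is not uniform up to 90 degrees. The criterion d equals the negative scalar product of the two unit vectors returned by `to_cartesian`, which is the Cartesian variant mentioned above.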

The quantization indices selected in E206 correspond to the selected point: $({sgn}_{ϕ} \cdot \hat{ϕ}({id}_{ϕ}), \hat{θ}({id}_{ϕ}, {id}_{θ}))$.

The indexing step E207 consists, based on the information sgn_ϕ, id_ϕ and id_θ, in determining a unique index 0≤index<N_tot to be transmitted.

In this step, a global quantization index is determined on the basis of the separate indices resulting from the separate quantization of the spherical coordinates for the selected closest point.

This step is now described with reference to FIG. 6b, showing one exemplary embodiment. Based on the information sgn_ϕ, id_ϕ and id_θ, in E600, the value of id_ϕ is tested in E601 and E603:

- if id_ϕ=0 in E601, then index=id_θ (E602);
- or, if id_ϕ=N_ϕ−1 in E603, then index=N_tot−2+(sgn_ϕ<0) (E604), where (sgn_ϕ<0) is equal to 1 if sgn_ϕ is negative and to 0 otherwise.

In other cases,

$index = offset + {id}_{θ} \quad (E605)$

with

$offset = N_{θ}(0) + 2 \, {Arr}_{{id}_{ϕ}} \left( \frac{N_{tot}^{'}}{2} \, \frac{\sin(({id}_{ϕ} + \frac{1}{2}) δ_{ϕ}) - \sin(\frac{δ_{ϕ}}{2})}{\sin((N_{ϕ} - \frac{1}{2}) δ_{ϕ}) - \sin(\frac{δ_{ϕ}}{2})} \right) - \begin{cases} 2 N_{θ}({id}_{ϕ}) & \text{if } {sgn}_{ϕ} > 0 \\ N_{θ}({id}_{ϕ}) & \text{if } {sgn}_{ϕ} < 0 \end{cases}$

It will be recalled here that the term

$N_{θ}(0) + 2 \, {Arr}_{{id}_{ϕ}} \left( \frac{N_{tot}^{'}}{2} \, \frac{\sin(({id}_{ϕ} + \frac{1}{2}) δ_{ϕ}) - \sin(\frac{δ_{ϕ}}{2})}{\sin((N_{ϕ} - \frac{1}{2}) δ_{ϕ}) - \sin(\frac{δ_{ϕ}}{2})} \right)$

corresponds to the cumulative cardinality of the spherical grid up to and including elevation layer i (with the layers in the Northern and Southern hemispheres, and with the equator).

The index of a point (codeword) in elevation layer i being of the form:

$index = offset + {id}_{θ}$

the value of offset must therefore correspond to the cumulative cardinality up to the first point (codeword)—exclusive—of elevation layer i. In addition, the positive elevation layer i (Northern hemisphere) comes, by convention, before the negative elevation layer i, but these two layers have the same number of points N_θ(i).

The value offset is thus given by the cumulative cardinality including these positive and negative layers of index i, but subtracting either 2N_θ(i) when it is the positive layer or N_θ(i) when it is the negative layer.

For the coding of a point (or codeword) in this same layer, it is possible, as an equivalent, to define:

$offset = N_{θ}(0) + 2 \, {Arr}_{{id}_{ϕ}} \left( \frac{N_{tot}^{'}}{2} \, \frac{\sin(({id}_{ϕ} - \frac{1}{2}) δ_{ϕ}) - \sin(\frac{δ_{ϕ}}{2})}{\sin((N_{ϕ} - \frac{1}{2}) δ_{ϕ}) - \sin(\frac{δ_{ϕ}}{2})} \right) + \begin{cases} 0 & \text{if } {sgn}_{ϕ} > 0 \\ N_{θ}({id}_{ϕ}) & \text{if } {sgn}_{ϕ} < 0 \end{cases}$

Here, the term

$N_{θ}(0) + 2 \, {Arr}_{{id}_{ϕ}} \left( \frac{N_{tot}^{'}}{2} \, \frac{\sin(({id}_{ϕ} - \frac{1}{2}) δ_{ϕ}) - \sin(\frac{δ_{ϕ}}{2})}{\sin((N_{ϕ} - \frac{1}{2}) δ_{ϕ}) - \sin(\frac{δ_{ϕ}}{2})} \right)$

gives the cumulative cardinality up to the elevation layer of index i−1. This value corresponds directly to the value offset for the positive elevation layer of index i, and must be corrected by N_θ(id_ϕ) for the negative layer of index i.

According to the invention, this analytical method for determining the value offset gives the same result as the following sums, but with reduced complexity because the determination is more direct when N_ϕ is high (for example N_ϕ=122):

$offset = N_{θ}(0) + 2 \sum_{i = 1}^{{id}_{ϕ}} N_{θ}(i) - \begin{cases} 2 N_{θ}({id}_{ϕ}) & \text{if } {sgn}_{ϕ} > 0 \\ N_{θ}({id}_{ϕ}) & \text{if } {sgn}_{ϕ} < 0 \end{cases}$

and

$offset = N_{θ}(0) + 2 \sum_{i = 1}^{{id}_{ϕ} - 1} N_{θ}(i) + \begin{cases} 0 & \text{if } {sgn}_{ϕ} > 0 \\ N_{θ}({id}_{ϕ}) & \text{if } {sgn}_{ϕ} < 0 \end{cases}$

The global index index is thus obtained in E606, through separate coding of the separate quantization indices sgn_ϕ, id_ϕ and id_θ of the best candidate and through the use of the corresponding cumulative cardinality values.
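By way of illustration only, the indexing of FIG. 6b may be sketched as follows in Python, using the variant in which offset is obtained by direct summing of the values N_θ(i) rather than analytically. The function name and the list `n_theta` holding N_θ(i) are hypothetical, and the small table used in the usage note is an arbitrary example, not the table of the preferred embodiment.

```python
def encode_index(sgn, id_phi, id_theta, n_theta):
    """Global index from (sgn, id_phi, id_theta); n_theta[i] holds N_theta(i)."""
    n_phi = len(n_theta)
    n_tot = n_theta[0] + 2 * sum(n_theta[1:])
    if id_phi == 0:                      # equator (E602)
        return id_theta
    if id_phi == n_phi - 1:              # poles (E604): North, then South
        return n_tot - 2 + (1 if sgn < 0 else 0)
    # cumulative cardinality up to and including layer id_phi,
    # counting both hemispheres and the equator
    cum = n_theta[0] + 2 * sum(n_theta[1:id_phi + 1])
    # the positive layer comes before the negative layer of the same index
    offset = cum - (2 if sgn > 0 else 1) * n_theta[id_phi]
    return offset + id_theta             # E605
```

For example, with `n_theta = [8, 6, 4, 1]` (N_tot=30), the positive layer of index 1 occupies indices 8 to 13, the negative layer of index 1 occupies indices 14 to 19, and the two poles receive indices 28 and 29.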

It should be noted that the determination of the value offset is described here for the interval N_θ(0)≤index<N_tot−2. In some variants, the interval in question will be able to be divided into sub-intervals of indices, and the value of offset will be able to be determined either analytically or by direct summing, with N_θ(i) defined according to the invention, according to the sub-interval under consideration.

In one embodiment, it will be possible to use pre-storage (tabulation) of the cumulative cardinality values offset according to id_ϕ and sgn_ϕ, which gives (analytically or by direct summing) the result of the cumulative sum of cardinalities of successive spherical layers (or “sets of horizontal slices”). This sum may be interpreted as the cardinality of a spherical zone (the number of points of the partial grid ranging from the elevation of index 0 to the elevation of index i, alternating between Northern and Southern hemisphere).

In some variants, it will be possible not to store the values offset according to id_ϕ and sgn_ϕ, but to compute them “online” (on the fly) based on the definition of offset as the cumulative sum of N_θ(i) with the correction on the basis of id_ϕ and sgn_ϕ.

However, this adds computational complexity that may be non-negligible if the grid contains a large number of elevation levels (N_ϕ high).

In some variants, it will be possible to replace offset using the definition of cumN′ and taking into account the fact that the cardinality in this case corresponds to a hemisphere.

The corresponding decoding method is now described with reference to FIG. 7a. This decoding method may be implemented, in one embodiment, by block 321 from FIG. 3 for the MASA data format or in block 250 from FIG. 2 for a DiRAC decoder.

As for the coding, the spherical quantization dictionary is defined on a 3D sphere by an elevation decoding and an azimuth decoding. This spherical quantization dictionary is illustrated and described with reference to FIGS. 5a and 5b above.

In the same way as for the coding, the elevation decoding uses a scalar quantization, giving at least one decoded elevation index (i) on a number of elevation levels (N_ϕ); the azimuth decoding uses a scalar quantization, according to a number of points per level (N_θ(i)) depending on the decoded elevation index (i); and the number of points per level (N_θ(i)) is determined on the basis of two successive cumulative cardinality values (cumN(i), cumN(i−1)), the cumulative cardinality value (cumN(i)) for a decoded elevation index (i) being representative of a number of points proportional to a total number of points and according to the area of a spherical zone comprising at least one zone delimited by the upper horizontal plane

$(ϕ = (i + \frac{1}{2}) δ_{ϕ})$

of the positive elevation level of the decoded elevation index (i) and a lower horizontal plane of the sphere.

First of all, indexing described with reference to FIG. 6b is assumed.

Given the global index index in step E210 of FIG. 7a, separate decoding of the two spherical coordinates is carried out in steps E211-1 and E211-2.

In step E212-1, as in coding step E203-1, a number of scalar quantization levels N_ϕ is determined. In the main embodiment, this step is tantamount to simply setting N_ϕ=122.

The decoding of the elevation information sgn_ϕ, id_ϕ is carried out in E213-1. This step is detailed later with reference to FIG. 7b. Preferably, this decoding uses an analytical estimate of the index id_ϕ. In some variants, sgn_ϕ and id_ϕ may be decoded by searching the cardinality table, computed on the fly or stored, or by using other methods that give an identical result.

The decoded elevation is reconstructed in E214-1 as ${sgn}_{ϕ} \cdot \hat{ϕ}({id}_{ϕ})$, where

${sgn}_{ϕ} \cdot \hat{ϕ}(i) = {sgn}_{ϕ} \cdot i δ_{ϕ} \text{ for } i = 0, \dots, N_{ϕ} - 2 \text{ and}$

${sgn}_{ϕ} \cdot \hat{ϕ}(i) = {sgn}_{ϕ} \cdot 90 \text{ for } i = N_{ϕ} - 1$

In some variants, other uniform or non-uniform quantization dictionaries $\{\hat{ϕ}(i)\}$ will be possible, in a manner identical to the coding.

The decoding of the azimuth index id_θ is carried out in E213-2. This step is detailed later with reference to FIG. 7b. The index id_θ of the azimuth is obtained in the general case by subtraction, according to the formula id_θ=index−offset, based on the global index index and the decoded elevation information sgn_ϕ and id_ϕ; some special cases (id_ϕ=0 and id_ϕ=N_ϕ−1) are defined in FIG. 7b.

The value of the offset is determined as defined in the coding, and the azimuth $\hat{θ}({id}_{ϕ}, {id}_{θ})$ is reconstructed in E214-2 as:

$\hat{θ}(i, j) = \begin{cases} 360 j / N_{θ}(i) - 180 & i \text{ even} \\ 360 (j + 0.5) / N_{θ}(i) - 180 & i \text{ odd} \end{cases}$

This gives in particular $\hat{θ}({id}_{ϕ} = N_{ϕ} - 1, {id}_{θ} = 0) = -180$ with N_θ(id_ϕ=N_ϕ−1)=1. This thus gives the spherical coordinates $(\hat{ϕ}(i), \hat{θ}(i, j))$ of the decoded point in E215.

Steps E213-1 and E213-2 are detailed together in FIG. 7b.

Based on the global index 0≤index<N_tot to be decoded (E700), the sign information sgn_ϕ=1 is set by default (E701). If the index satisfies index<N_θ(0), this indicating that it is a point on the equator (E702), direct decoding is carried out:

- id_θ=index and id_ϕ=0 is set (E703).

Otherwise, if the index satisfies index≥N_tot−2, this indicating that it is a point on the North or South pole (E704), direct decoding is carried out:

- id_ϕ=N_ϕ−1, id_θ=0 (E705). The sign sgn_ϕ is corrected from its default value to −1 (E707) if index=N_tot−1 in E706, because the indices are ordered by elevation layers alternating between Northern hemisphere and Southern hemisphere, so index=N_tot−2 corresponds to the North pole (sgn_ϕ=1) and index=N_tot−1 corresponds to the South pole (sgn_ϕ=−1).

Otherwise, in the other cases (N_θ(0)≤index<N_tot−2), in one preferred embodiment, the index id_ϕ is for example estimated by inverting the analytical computation carried out in step E605 of FIG. 6b.

It is possible to estimate id_ϕ as:

${id}_{ϕ} = \left[ \frac{1}{δ_{ϕ}} \arcsin(x) \right]$

$\text{where } x = \left( index - N_{θ}(0) - 2 - \frac{1}{2} \right) \frac{2 \left( \sin((N_{ϕ} - \frac{1}{2}) δ_{ϕ}) - \sin(\frac{δ_{ϕ}}{2}) \right)}{N_{tot}^{'}} + \sin \left( \frac{δ_{ϕ}}{2} \right)$

- and [⋅] is the rounding to the nearest integer.

In some variants, an approximation of the arcsine function is used.

The following is adopted (in E708):

${id}_{ϕ} = \left[ \frac{1}{δ_{ϕ}} \left( \frac{π}{2} - \sqrt{1 - x} \cdot P(x) \right) \right]$

where

$P(x) = 1.57078786 + x (-0.21412453 + x (0.08466649 + x (-0.03575663 + 0.008648884 \cdot x)))$

- is a polynomial of degree 4.

In some variants, other approximations of the arcsine function may be used, in particular other polynomials P(x) of a different degree may be used.
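By way of illustration only, the approximation of the arcsine function by π/2 − √(1−x)·P(x) may be checked numerically as follows in Python. The function name is hypothetical, and the sketch only covers 0≤x≤1, which is assumed here to be the range of interest for the estimate of id_ϕ; the result is in radians.

```python
import math

def asin_poly(x):
    """Approximate arcsin(x), in radians, for 0 <= x <= 1."""
    # degree-4 polynomial P(x), evaluated with Horner's scheme
    p = 1.57078786 + x * (-0.21412453 + x * (0.08466649
        + x * (-0.03575663 + 0.008648884 * x)))
    return math.pi / 2 - math.sqrt(1.0 - x) * p

# largest deviation from math.asin on a regular grid of [0, 1]
max_err = max(abs(asin_poly(k / 1000) - math.asin(k / 1000))
              for k in range(1001))
```

The deviation stays small compared with one elevation step, which is consistent with the estimate of id_ϕ being off by at most one unit before correction.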

It should be noted that the estimate of id_ϕ used above has the noteworthy property of being accurate to within an underestimate of id_ϕ by one unit: in general, it gives the correct value of id_ϕ, and if not, id_ϕ is underestimated by one unit.

In some variants, other estimates of id_ϕ (exact or to within a value) may be used.

In the preferred embodiment, the decoding (E709) is then carried out as follows, based on the estimate of id_ϕ by inverting the arcsine function or based on an approximation. An initial value is determined for offset:

${offset}_{init} = N_{θ}(0) + 2 \, {Arr}_{{id}_{ϕ}} \left( \frac{N_{tot}^{'}}{2} \, \frac{\sin(({id}_{ϕ} + \frac{1}{2}) δ_{ϕ}) - \sin(\frac{δ_{ϕ}}{2})}{\sin((N_{ϕ} - \frac{1}{2}) δ_{ϕ}) - \sin(\frac{δ_{ϕ}}{2})} \right)$

In some variants, it will be possible to compute, in an equivalent manner (with the same result):

${offset}_{init} = N_{θ} (0) + 2 \sum_{i = 1}^{i d_{ϕ}} N_{θ} (i)$

Based on this initial value offset_init, the values of id_ϕ, sgn_ϕ and offset may then be determined as follows:

- If index≥offset_init, id_ϕ←id_ϕ+1, offset←offset_init, otherwise:
  - offset←offset_init−N_θ(id_ϕ)
  - If index≥offset: sgn_ϕ←−1
  - Otherwise: offset←offset−N_θ(id_ϕ)

Where a←b indicates that the existing value of a is replaced by the result of the expression b.

The step of correcting id_ϕ←id_ϕ+1 when index≥offset_init is specific to the exemplary embodiment of estimating id_ϕ by inverting the arcsine function. If this step is carried out, the value of offset_init corresponds to the cumulative cardinality of the grid up to the lower layer id_ϕ−1.

Otherwise, if this step of correcting id_ϕ is not carried out, the value of offset_init corresponds to the cumulative cardinality of the grid up to the elevation layer id_ϕ. The justification of the steps of correcting the value offset_init in the form offset−N_θ(id_ϕ) is detailed with reference to FIGS. 8a and 8b described below.

In some variants, it will be possible to define, as initial value offset_init, the cumulative cardinality for the lower elevation layer of index i−1 and correct the value of offset_init equivalently. The initial value is given by:

${offset}_{init} = N_{θ}(0) + 2 \, {Arr}_{{id}_{ϕ}} \left( \frac{N_{tot}^{'}}{2} \, \frac{\sin(({id}_{ϕ} - \frac{1}{2}) δ_{ϕ}) - \sin(\frac{δ_{ϕ}}{2})}{\sin((N_{ϕ} - \frac{1}{2}) δ_{ϕ}) - \sin(\frac{δ_{ϕ}}{2})} \right)$

Based on this initial value offset_init, the values of id_ϕ, sgn_ϕ and offset may then be determined as follows:

- If index≥offset_init+2N_θ(id_ϕ), id_ϕ←id_ϕ+1, offset←offset_init, otherwise:
  - If index≥offset_init+N_θ(id_ϕ): sgn_ϕ←−1 and offset←offset_init+N_θ(id_ϕ)
  - Otherwise: offset←offset_init,

Where a←b indicates that the existing value of a is replaced by the result of the expression b. In this case, the steps of correcting the value offset_init are in the form offset_init+N_θ(id_ϕ).

In some variants, it will be possible to adapt the principle of correcting the value of id_ϕ and of offset_init according to the method for estimating id_ϕ, in order to determine the values of id_ϕ, sgn_ϕ and offset.

In some variants where the cumulative cardinality is defined differently, with corresponding values N_θ(i), the analytical definition of the initial estimate of offset will be adapted.

In some variants, other integer and exact estimates (giving the same results) of id_ϕ and other direct or indirect methods for determining id_ϕ, sgn_ϕ and offset will be able to be used, as long as they do not change the decoding result. Indeed, since the values of id_ϕ and offset are integers, the sign sgn_ϕ being able to be seen as a signed integer, alternative methods may be implemented, as long as they give identical values for id_ϕ, sgn_ϕ and offset.

One example of a decoding variant for id_ϕ, sgn_ϕ and offset, which is of little practical interest but has the merit of illustrating an alternative method, would consist in simply running exhaustively through all possible values of id_ϕ, sgn_ϕ and id_θ, computing the corresponding index as at the encoder, and selecting the combination that leads exactly to index=offset+id_θ.

The decoding of the index is deterministic in the sense that, for a given value index, the values id_ϕ, sgn_ϕ and id_θ are unique and are integers.

In all cases, the decoding of id_ϕ, sgn_ϕ and offset relies on the values of N_θ(i) according to the invention with the possibility of analytically determining a cumulative cardinality.

Finally, decoding id_θ (E710) is tantamount simply to subtracting the decoded cumulative cardinality value (offset) from the received global index (index):

${id}_{θ} = index - offset$
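By way of illustration only, the decoding of FIG. 7b may be sketched as follows in Python. This sketch uses the variant in which id_ϕ is found by running through the cumulative cardinalities computed on the fly, and not the analytical estimate of the preferred embodiment; offset_init is then the cumulative cardinality up to and including layer id_ϕ, and the correction by N_θ(id_ϕ) described above is applied. The function name and the list `n_theta` holding N_θ(i) are hypothetical, and the table used in the usage note is an arbitrary example.

```python
def decode_index(index, n_theta):
    """Return (sgn, id_phi, id_theta) from a global index; n_theta[i] holds N_theta(i)."""
    n_phi = len(n_theta)
    n_tot = n_theta[0] + 2 * sum(n_theta[1:])
    if index < n_theta[0]:                      # equator (E702, E703)
        return 1, 0, index
    if index >= n_tot - 2:                      # poles (E704 to E707)
        return (1 if index == n_tot - 2 else -1), n_phi - 1, 0
    # search of id_phi: offset_init is the cumulative cardinality
    # up to and including layer id_phi (both hemispheres and equator)
    offset_init = n_theta[0]
    for id_phi in range(1, n_phi - 1):
        offset_init += 2 * n_theta[id_phi]
        if index < offset_init:
            break
    # correction of offset_init and decoding of the sign
    sgn = 1
    offset = offset_init - n_theta[id_phi]
    if index >= offset:
        sgn = -1                                # negative layer comes second
    else:
        offset -= n_theta[id_phi]
    return sgn, id_phi, index - offset          # id_theta = index - offset
```

For example, with `n_theta = [8, 6, 4, 1]` (N_tot=30), index 17 is decoded as sgn_ϕ=−1, id_ϕ=1, id_θ=3, and index 21 as sgn_ϕ=1, id_ϕ=2, id_θ=1.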

In some variants, it will be possible to replace offset using the definition of cumN′ and taking into account the fact that the cardinality in this case corresponds to a hemisphere.

FIGS. 8a and 8b illustrate the coding and decoding of the index according to the embodiment of the invention.

FIG. 8a corresponds to the case of coding or decoding of a point in the Northern hemisphere (excluding equator and pole) of the grid according to the invention, while FIG. 8b corresponds to the case of coding or decoding of a point in the Southern hemisphere (excluding equator and pole). In this example, N_ϕ=122 and id_ϕ=120, with sgn_ϕ=1, id_θ=3, index=65517 in FIG. 8a and sgn_ϕ=−1, id_θ=5, index=65529 in FIG. 8b. In this example, N_θ(id_ϕ)=10 and the initial value of offset_init is:

${offset}_{init} = N_{θ}(0) + 2 \, {Arr}_{{id}_{ϕ}} \left( \frac{N_{tot}^{'}}{2} \, \frac{\sin(({id}_{ϕ} + \frac{1}{2}) δ_{ϕ}) - \sin(\frac{δ_{ϕ}}{2})}{\sin((N_{ϕ} - \frac{1}{2}) δ_{ϕ}) - \sin(\frac{δ_{ϕ}}{2})} \right)$

- thereby giving, in this example: offset_init=65534.

In FIG. 8a, the decoding of index=65517 gives rise to the following correction:

- If index≥offset_init, id_ϕ←id_ϕ+1, offset←offset_init, otherwise:
  - offset←offset_init−N_θ(id_ϕ)
  - If index≥offset: sgn_ϕ←−1
  - Otherwise: offset←offset−N_θ(id_ϕ)

Thereby resulting in the value of offset being corrected by offset←offset_init−2N_θ(id_ϕ) and giving offset=65514.

In FIG. 8b, the decoding of index=65529 gives rise to the following correction:

- If index≥offset_init, id_ϕ←id_ϕ+1, offset←offset_init, otherwise:
  - offset←offset_init−N_θ(id_ϕ)
  - If index≥offset: sgn_ϕ←−1
  - Otherwise: offset←offset−N_θ(id_ϕ)

Thereby resulting in the value of offset being corrected by offset←offset_init−N_θ(id_ϕ) and giving offset=65524.
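By way of illustration only, the correction steps applied in FIGS. 8a and 8b may be reproduced numerically as follows in Python, with the values given above (offset_init=65534, N_θ(id_ϕ)=10, id_ϕ=120). The function name and its argument names are hypothetical; in the first branch the sign is simply left at its default value, since the steps listed above do not modify it.

```python
def correct_offset(index, offset_init, n_theta_id, id_phi):
    """Apply the listed correction steps; returns (id_phi, sgn, offset)."""
    if index >= offset_init:
        # estimate of id_phi too low by one unit
        return id_phi + 1, 1, offset_init
    sgn = 1
    offset = offset_init - n_theta_id      # start of the negative layer
    if index >= offset:
        sgn = -1
    else:
        offset -= n_theta_id               # start of the positive layer
    return id_phi, sgn, offset

# FIG. 8a: point in the Northern hemisphere
id_phi_a, sgn_a, offset_a = correct_offset(65517, 65534, 10, 120)
# FIG. 8b: point in the Southern hemisphere
id_phi_b, sgn_b, offset_b = correct_offset(65529, 65534, 10, 120)
```

The azimuth indices then follow by subtraction: 65517−65514=3 in FIG. 8a and 65529−65524=5 in FIG. 8b.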

In some variants, the value of offset_init may correspond to the cumulative cardinality up to the lower elevation layer, or it may be obtained by a direct sum based on the values N_θ(i) according to the invention.

In some variants where the cumulative cardinality is defined differently, with corresponding values N_θ(i), the analytical definition of the initial estimate of offset_init will be adapted.

FIG. 9 illustrates a coding device DCOD and a decoding device DDEC, within the sense of the invention, these devices being dual to one another (in the sense of “reversible”) and connected to one another by a communication network RES or an internal bus BUS in a terminal (for communication between a MASA analysis module and an IVAS codec or another processing operation).

The coding device DCOD comprises a processing circuit typically including:

- a memory MEM1 for storing instruction data of a computer program within the sense of the invention (these instructions possibly being distributed between the encoder DCOD and the decoder DDEC);
- an interface INT1 for receiving an original multichannel signal B, for example a signal distributed over various channels or a parametric version in compression with source direction parameters within the sense of the invention;
- a processor PROC1 for receiving this signal and processing it by executing the computer program instructions stored in the memory MEM1, with a view to coding it; and
- a communication interface COM1 for transmitting the coded signals via the network or an internal bus of a terminal.

The decoding device DDEC comprises its own processing circuit, typically including:

- a memory MEM2 for storing instruction data of a computer program within the sense of the invention (these instructions possibly being distributed between the encoder DCOD and the decoder DDEC as indicated above);
- an interface COM2 for receiving the coded signals from the network RES or from an internal bus BUS with a view to compression-decoding them within the sense of the invention;
- a processor PROC2 for processing these signals by executing the computer program instructions stored in the memory MEM2, with a view to decoding them; and
- an output interface INT2 for delivering source direction parameters.

Of course, FIG. 9 illustrates one example of a structural embodiment of a codec (encoder or decoder) within the sense of the invention. FIGS. 5 to 8, commented on above, describe more functional embodiments of these codecs in detail.

Although the present disclosure has been described with reference to one or more examples, workers skilled in the art will recognize that changes may be made in form and detail without departing from the scope of the disclosure and/or the appended claims.
