Triangle mesh data is an important component of a virtual acoustic environment. The mesh is composed of a list of vertexes and a list of triangle faces. Each vertex is a point in 3D space, localized by its X, Y, and Z coordinates, and has an associated index in the vertex list. Each triangle identifies a simple surface, and contains three vertex indexes, and an associated acoustic material. The vertex indexes for a triangle are listed in a particular order, which defines the outside pointing normal of the simple surface.
There are many interchange and compression formats for generic triangle mesh data. However, they are usually intended for coding visual triangle mesh data, typically of objects and environments. In contrast, triangle mesh data for virtual acoustic environments and objects have several particular properties. For example, the mesh data usually contain only the acoustically relevant surfaces of sufficient size. A significant number of object surfaces are located on a small number of planes, or have a layered structure. Surfaces that do not carry an acoustic material are invisible for acoustic purposes and can be discarded. There may also be coordinate symmetries, generated by the fact that objects with regular shapes use a relative coordinate system centered at their apparent center of gravity. All these additional properties may be exploited by a custom coding scheme that is more efficient and, at the same time, of low complexity.
According to an embodiment, an apparatus for decoding an acoustic environment, the acoustic environment including at least one audio source and at least one audio object, the at least one audio object being represented by a structural-acoustic data which links positional data of polygons with acoustic properties of acoustic materials, wherein the positional data includes, for each polygon, the position of the vertexes, may have: a bitstream reader for reading, from the bitstream, an encoded version of structural-acoustic data and at least one audio stream to be rendered as generated by the at least one audio source in the acoustic environment; an audio source decoding block to decode the at least one audio stream representing the at least one audio source; a structural-acoustic data decoding block to decode the structural-acoustic data, wherein the structural-acoustic data decoding block uses, for at least one dimension, an ordered shortlist, in which coordinate values of previously decoded vertexes are stored according to an order, wherein the structural-acoustic data decoding block is configured, in case the bitstream has encoded therein an ordinal value of the ordered shortlist, to reconstruct the coordinate value as the value stored in the ordered shortlist associated with the ordinal value.
According to another embodiment, a method for decoding an acoustic environment, the acoustic environment including at least one audio source and at least one audio object, the at least one audio object being represented by a structural-acoustic data list which links positional data of polygons onto structural-acoustic properties of materials, wherein the positional data includes, for each polygon, the position of one primary structural-acoustic vertex and the position of the remaining structural-acoustic vertexes, may have the steps of: reading, from the bitstream, an encoded version of structural-acoustic data and at least one audio stream to be rendered as generated by the at least one audio source in the acoustic environment; decoding the at least one audio stream; and decoding the structural-acoustic data, the method using, for at least one dimension, an ordered shortlist, in which coordinate values of previously decoded vertexes are stored according to an order, wherein, in case the bitstream has encoded therein an ordinal value of the ordered shortlist, the coordinate value is reconstructed as the value stored in the ordered shortlist associated with the ordinal value.
In accordance with an example, there is provided an apparatus for decoding an acoustic environment, the acoustic environment including at least one audio source and at least one audio object, the at least one audio object being represented by a structural-acoustic data which links positional data of polygons with acoustic properties of acoustic materials, wherein the positional data includes, for each polygon, the position of the vertexes, the apparatus comprising:
There is also provided an apparatus for encoding an acoustic environment, the acoustic environment including at least one audio source and at least one audio object, the at least one audio object being represented by at least one structural-acoustic data which links positional data of polygons with acoustic properties of acoustic materials, wherein the structural-acoustic data include, for each polygon, the position of the vertexes, the apparatus comprising:
There is also provided a method for encoding an acoustic environment, the acoustic environment including at least one audio source and at least one audio object, the at least one audio object being represented by at least one structural-acoustic data which links positional data of polygons onto structural-acoustic properties of materials, wherein the positional data include, for each polygon, the position of one primary polygonal vertex and the position of the remaining polygonal vertexes, the method comprising:
There is also provided a bitstream encoding audio information in which an acoustic environment is encoded, the acoustic environment including at least one audio source and at least one audio object, the at least one audio object being represented by at least one structural-acoustic data list which maps positional data of polygons onto acoustic materials, wherein the positional data include, for each polygon, the position of one vertex, the bitstream comprising:
There is also provided a non-transitory storage unit storing instructions which, when executed by a processor, cause the processor to
There is also provided a non-transitory storage unit storing instructions which, when executed by a processor, cause the processor to
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
Encoder
In general terms, the encoder 200 may be seen as an apparatus for encoding an acoustic environment, the acoustic environment including at least one audio source and at least one audio object, the at least one audio object being represented by at least one structural-acoustic data list which links positional data of polygons with acoustic properties of acoustic materials. The positional data may include, for each polygon, the position of one primary polygonal vertex (110ax, 110ay, 110az) and the position of the remaining polygonal vertexes (110b, 110c, 120b). The apparatus may comprise:
The audio stream 212 is also, in general, associated with audio source positional data, so that the audio source 211 represented by the audio stream 212 can be placed at the determined position in the acoustic environment in which it is virtually generated. In general terms, the at least one audio source, which is encoded in the bitstream 204 in association with the position in which it is virtually generated in the acoustic environment, is also encoded with side information providing its virtual position in the acoustic environment. Therefore, spatial data may also be encoded, as side information of the at least one audio stream 212, indicating positional relationships between the at least one audio source and the acoustic environment. Once decoded, the audio source will be rendered by taking into account the spatial relationships between the audio source and the at least one audio object.
Decoder
The renderer 350 will receive the decoded environment 302 (including its components 311 and 321) to render the audio signal 301 as closely as possible to the original audio signal 202. In particular, the renderer 350 may represent the at least one audio source by taking into account its position (e.g. virtual position) in the acoustic environment and the conditioning to which the sound is (virtually or in reality) subjected by virtue of the presence of the at least one audio object.
In general terms, the audio source, which is encoded in the bitstream 204 in association with the position in which it is virtually generated in the acoustic environment, is also encoded with side information providing its virtual position in the acoustic environment. Therefore the renderer 350 may represent the sound as being virtually generated in a particular location (e.g. indicated by the positional data of the audio source), under the effect of the presence (e.g. virtual presence) of the at least one audio object.
The decoder 300 may be an apparatus for decoding the acoustic environment 302, the acoustic environment 302 including at least one audio source and at least one audio object, the at least one audio object being represented by a structural-acoustic data list which links positional data of polygons with acoustic properties of acoustic materials. The positional data may include, for each polygon, the position of one primary structural-acoustic vertex and the position of the remaining structural-acoustic vertexes. The apparatus may comprise at least one of:
Structural-Acoustic Data
As shown in
Basically,
an x coordinate 110ax of the primary vertex 110a; and
the x coordinates 110bx and 110cx of the remaining vertexes 110b and 110c, respectively.
Analogously, in a corresponding record of the y coordinates of the structural-acoustic data list 400 (not shown in
In the second record (second horizontal row from above) of the structural-acoustic data list 400, the coordinates of the second triangle 120 are stored. It is possible to see that the coordinates 110ax (but also 110ay, 110az) of the primary vertex 110a are repeated: for example, the x coordinate 110ax of the primary vertex 110a of the second triangle 120 repeats the same value already stored for representing the primary vertex of the first triangle, even though these values are identical. The same applies to the vertex 110c, whose coordinates 110cx, 110cy, 110cz are the same for the first triangle 110 and the second triangle 120.
Audio Source Encoding/Decoding
The audio source encoding block 210 and the audio source decoding block 310 are important elements of the encoder 200 and the decoder 300, respectively. The sound source to be encoded and decoded may be represented by the at least one audio stream 212, 312. Notwithstanding, the audio stream is not the only information associated with the sound source. The at least one sound source may be associated with positional data (e.g. metadata) which locate the position (e.g. virtual position) of the at least one sound source in the acoustic environment. Accordingly, the sound (audio signal) 301 may be rendered (e.g. by the renderer 350) based on the structural-acoustic relationships between the positional data of the at least one audio object, the acoustic properties of the materials (imagined as being the materials of the object), and the positional data of the at least one audio source. This operation may be performed by the renderer 350 at the decoder (which may be an external device).
For example, the at least one audio source may have positional data which include coordinates that make it possible to localize the at least one audio source in the acoustic environment, taking into account the positional data of the at least one object (and in particular the vertexes and the triangles) and the properties of the materials. The at least one audio source will therefore be localized at a particular position in the acoustic environment, and the listener will experience the sound as coming from that position and under the effect of the properties of the materials.
When reference is made to the acoustic environment, therefore, what is meant is not only a spatial environment, but a complete audio scene which is to be encoded/decoded before being rendered. The acoustic environment has its own spatial characteristics (e.g., positional data, such as the vertex list and the triangle list, either compressed or non-compressed), but also the properties of the materials which constitute the objects in the environment, and also the sound which may be virtually generated at an audio source localized at a particular position in the spatial environment, and which is virtually conditioned by the structural-acoustic data (positional data and properties of the acoustic materials) which are encountered in the spatial environment.
Structural-Acoustic Data Encoding Block
The structural-acoustic data encoding block 220 may comprise a vertex list encoder 800, which may encode the vertex list 804 to obtain an encoded vertex list 808. It will be explained later how the encoded vertex list 808 may be generated.
The structural-acoustic data encoding block 220 may include a triangle list encoder 850. The triangle list encoder may receive as inputs the triangle list 802, including the acoustic features 806, and the encoded vertex list 808 (in the cases in which the vertex list is provided in an encoded version) or, as an alternative, the vertex list 804 in a non-encoded version. Therefore, in some cases, it is not necessary that both the inputs 804 and 808 are provided to the triangle list encoder 850. The triangle list encoder 850 may provide an encoded triangle list 852 in which the triangle list 802 is compressed. Even though
Vertex Index Encoding and Decoding
It would theoretically be possible to simply encode all the coordinates of each vertex in the bitstream 204. For example, it would be possible to encode, for the primary vertex, all its x, y, z coordinates (110ax, 110ay, 110az); the same for the remaining vertexes 110b and 110c of the first triangle 110; and to repeat all the fields also for the second triangle 120 (i.e., to represent all the x, y, z coordinates for the primary vertex 110a and for the remaining vertexes 120b and 110c). However, it has been understood that this would cause a repetition of data fields. The fact, for example, that the coordinates of the primary vertex 110a (which is common to both the triangles 110 and 120) are repeated increases the length of the bitstream 204 and reduces efficiency.
Hence, it has been advantageous to adopt a technique according to which, for at least one dimension (x, y, z) (and in some examples for each dimension of the acoustic environment), it is possible to write the coordinate only once for a first triangle (e.g., 110), and to refer to at least one previously encoded coordinate when encoding at least one coordinate of a subsequent triangle (e.g., 120). In the example of
The example above may also apply to single coordinates of each vertex. For example, if a group of vertexes has the same x coordinate, or z coordinate, or y coordinate, they can be encoded by referring to the previous one (notably, the encoder may decide to encode them in close succession, so that the stored coordinates are maintained in the shortlist before it is updated). For example, the coordinate (whether x, y, or z) may actually be written in the bitstream 204 only for the first vertex that is encoded, while the subsequent vertexes may be encoded by simply referring to the previously encoded coordinate. For example, the y coordinates 110ay and 130ay of the vertex 110a (in the triangles 110 and 120) and of the vertex 130a, respectively, are the same (see
When the primary vertex 110a of the first triangle 110 is encoded, no other vertex has actually been previously stored in the shortlist 450: this means that the shortlist 450 is empty, and it is therefore not possible to refer to a position 455 of any previously encoded coordinate. Hence, all the binary values 160x, 160y, 160z of the mask 160 are 0 (it is here imagined that 0 means that the coordinates are to be encoded in the encoded version 222 of the structural-acoustic data 221, while the binary value 1 means that only the ordinal value of the ordered shortlist 450 is encoded; the binary values could have the opposite meaning in different examples). Subsequently, the vertex index 403 (or another identifier of the vertex) is encoded and, in coordinate value data fields 170c, also the coordinate values 110ax, 110ay, 110az are encoded. The same is repeated for encoding the remaining vertexes 110b, 110c.
In examples, the vertex 110a is not encoded twice in the encoded version 222 of the structural-acoustic data 221 (or more in particular in the encoded version 808, 3808 of the encoded vertex list). Simply, the triangle list will refer to the same vertex 110a for both the triangles 110 and 120.
In general terms, for each coordinate of each vertex (primary vertex or remaining vertex), the structural-acoustic data encoding block 220 (and in particular the vertex list encoder 800) encodes: a value selected between
The choice between encoding the value coordinate and the ordinal value 455 (455x, 455y, 455z) can be made based on whether the previously encoded coordinate is in the shortlist 450.
Of course, if the second vertex of a triangle shares one or two coordinate values with a vertex of another triangle (but is not coincident with it), then the binary values in the fields 160x, 160y, 160z of the mask 160 may differ (because one or two coordinates are to be actually encoded in the encoded version 222 of the structural-acoustic data 221), while at least one binary field shall be 1 (and it shall be indicated which ordinal value of the shortlist applies in the encoded version 222 of the polygon data 221). This is the case of vertex 130a, which shares the same y coordinate with vertex 110a. For this reason, for each vertex, some coordinates may be written as values in the encoded version 222 of the polygon data 221, while those that have already been written previously (and stored in the ordered shortlist 450) can simply be referred to by their ordinal values.
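A purely illustrative, non-limiting Python sketch of this per-coordinate mechanism is given below; the names (encode_vertex, shortlists, the symbolic output list) are assumptions made only for illustration and are not part of the bitstream syntax, and the real encoder would write the mask bits, ordinal values and coordinate values with an entropy coder.

    def encode_vertex(vertex, shortlists, out):
        # vertex: (x, y, z) coordinates; shortlists: one ordered shortlist per dimension
        # (conceptually 450x, 450y, 450z); out: symbolic list standing in for the bitstream
        for dim, value in enumerate(vertex):
            shortlist = shortlists[dim]
            if value in shortlist:
                out.append(('mask', 1))                        # 1: only the ordinal value is written
                out.append(('ordinal', shortlist.index(value)))
            else:
                out.append(('mask', 0))                        # 0: the coordinate value itself is written
                out.append(('value', value))
                shortlist.append(value)                        # the shortlist is updated

    # usage: four vertexes of two triangles sharing several coordinate values
    shortlists = [[], [], []]
    symbols = []
    for v in [(0, 0, 0), (4, 0, 0), (4, 3, 0), (0, 3, 0)]:
        encode_vertex(v, shortlists, symbols)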
The shortlist 450 may be updated, e.g. in such a way that less frequent coordinate values are expelled from the shortlist 450 (e.g. by virtue of more frequent coordinate values being encoded). In addition or as an alternative, the shortlist 450 may be updated in such a way that the coordinate values most recently encoded in the bitstream 204 take the place of previously encoded coordinate values. These techniques may be combined with each other: for example, a ranking may be established among the already encoded coordinates, the ranking being based on a score assigned to each already encoded coordinate according to a mixed criterion which encompasses both the frequency of the encoding of a coordinate (by increasing the score for the most frequent coordinates) and the freshness of the coordinate (by increasing the score for the most recently encoded coordinates), so as to award the first positions in the shortlist 450 (associated with smaller bitlengths) to those already encoded coordinates having a higher score, and to exclude from the shortlist 450 those already encoded coordinates having a lower score, starting from those having the minimal score.
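One possible realization of such a combined frequency/recency ranking is sketched below; the weights, the list size and the particular scoring function are arbitrary assumptions made only for illustration.

    def update_ranked_shortlist(history, value, step, max_size=8, w_freq=1.0, w_recency=2.0):
        # history: coordinate value -> [occurrence count, step of last occurrence]
        count, _ = history.get(value, [0, step])
        history[value] = [count + 1, step]

        def score(v):
            c, last = history[v]
            # higher score for frequent values and for recently encoded values
            return w_freq * c + w_recency / (1 + step - last)

        # award the first (cheapest) positions to the highest-scoring values and
        # expel the lowest-scoring values from the shortlist
        return sorted(history, key=score, reverse=True)[:max_size]

    history, shortlist = {}, []
    for step, value in enumerate([120, 120, 45, 120, 45, 300, 120]):
        shortlist = update_ranked_shortlist(history, value, step)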
It is also possible to encode different polygons (e.g. triangles) having vertexes which share the same coordinates in short succession from each other, so as to increase the probability that a coordinate is already present in the shortlist 450. More in general, it is possible to order the encoding of the structural-acoustic data 221 in such a way that polygons (e.g. triangles) having vertexes which share the same coordinates are encoded at closer steps than polygons (e.g. triangles) having vertexes which do not share the same coordinates. In examples, the ordering of the encoding may be chosen in such a way that the more coordinates the vertexes have in common, the closer together they are encoded.
By virtue of the techniques above, the indexes (ordinal values) of the shortlist 450 which are used most often are represented with extremely short codes, implying that also the encoded version 222 of the structural-acoustic data 221 is compressed and the bitstream 204 is reduced in length.
It is to be noted that, here above, it has been imagined that the values in the ordered shortlist 450 are encoded on the fly, by subsequently updating the ordered shortlist. This may occur, for example, in case of streaming.
Therefore, the structural-acoustic data encoding block 220 may use, for at least one dimension (x, y, z) of the acoustic environment (or of the bounding box, see below), the ordered shortlist 450, in which coordinate values of previously encoded polygonal vertexes are stored according to an order (index, ordinal value 450). The structural-acoustic data encoding block 220 may, in case a coordinate value of one current main polygonal vertex or remaining polygonal vertex is the same as a coordinate value of one previously encoded main polygonal vertex or remaining polygonal vertex stored in the shortlist at a determined ordinal value, encode the ordinal value of the shortlist 450. In case the coordinate value of one current polygonal vertex is different from any coordinate value of the previously encoded main polygonal vertexes or remaining vertexes stored in the shortlist 450, then the coordinate value is encoded in the bitstream 204. In turn, the structural-acoustic data decoding block 320 of the decoder 300 may also use, for at least the same dimension, an ordered shortlist, in which coordinate values of previously decoded main polygonal vertexes or remaining polygonal vertexes are stored according to an order. The structural-acoustic data decoding block 320 may, in case the bitstream 204 has encoded therein a particular ordinal value of the shortlist, reconstruct the coordinate value as the value stored in the shortlist 450 associated with the ordinal value.
Basically, the shortlist 450 at the decoder 300 may be understood as a replica of the shortlist 450 at the encoder 200.
The structural-acoustic data encoding block 220 of the encoder 200 may encode, for at least one dimension (but advantageously for each of the three dimensions), the binary mask value (160x, 160y, 160z), indicating whether the coordinate value or the ordinal value in the shortlist is encoded (in the field 170c or 170v). In turn, the structural-acoustic data decoding block 320 of the decoder 300 may evaluate, for each vertex, the binary mask value 160 (160x, 160y, 160z) indicating whether the coordinate value or the ordinal value in the shortlist is encoded in the bitstream (204). Accordingly, the structural-acoustic data decoding block 320 may determine whether each coordinate is encoded as coordinate value or as index (ordinal value).
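A matching decoder-side sketch (again with purely illustrative names and a symbolic input stream) evaluates the mask bit for each coordinate and either takes the value from its replica of the shortlist or reads the explicit value and appends it, so that the replica stays synchronized with the encoder:

    def decode_vertex(symbols, shortlists):
        # symbols: iterator over (tag, payload) pairs, as produced by the encoder sketch above
        vertex = []
        for dim in range(3):
            _, mask_bit = next(symbols)
            if mask_bit == 1:
                _, ordinal = next(symbols)
                value = shortlists[dim][ordinal]     # reconstruct from the replica shortlist
            else:
                _, value = next(symbols)
                shortlists[dim].append(value)        # keep the replica synchronized
            vertex.append(value)
        return tuple(vertex)

    # usage: decode one vertex (4, 3, 0), assuming the replica shortlists already hold 3 and 0
    replica = [[0], [3], [0]]
    symbols = iter([('mask', 0), ('value', 4),
                    ('mask', 1), ('ordinal', 0),
                    ('mask', 1), ('ordinal', 0)])
    print(decode_vertex(symbols, replica))           # -> (4, 3, 0)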
As explained above, if two vertexes have only one or two coordinates in common (like the vertexes 130a and 110a), the encoding/decoding of the ordinal value (index) instead of the coordinate value will occur only for the common coordinates, while for the remaining one or two differing coordinates the coordinate values will be independently encoded/decoded.
As explained above, the shortlist 450 may be divided into instantiations 450x, 450y, 450z, which can be treated independently. In examples, the structural-acoustic data encoding block 220 is configured so that:
Analogously, at the decoder 300:
Triangle List Encoding/Decoding
Basically, both the encoder and the decoder comprise a second shortlist (MTF list), and the second shortlist of the decoder is understood to be a replica of the second shortlist of the encoder. The operations are basically the same, apart from the fact that in one case the triangle list 852 is encoded and in the other case it is decoded.
In any case, arithmetic coding may be used, in which shorter codes are assigned to more recurring index values to be encoded.
Bounding Box
It is possible to reduce the data length to be encoded in the encoded version 222 and/or in the bitstream 204 by intelligently changing the spatial coordinate system from the original spatial coordinate system to a coordinate system having advantageous properties.
For example, it is possible to change the coordinate system so as to have the origin closer to the polygonal vertexes, therefore reducing the distances from the origin (and the length of the coordinates as well). For example, the volume to be encoded may be reduced to a bounding box, in particular chosen in such a way that no vertex is excluded. In particular, a bounding box may be contained in the acoustic environment, and the structural-acoustic data to be encoded in the bitstream 204 and/or in the version 222 are encoded with reference to a spatial coordinate system defined by one determined vertex of the bounding box.
An example of bounding box 500 is provided by
In addition or as an alternative, other kinds of optimizations may be performed, associated with the definition of the bounding box. For example, it is possible to evaluate possible recurring patterns. Where the acoustic environment presents at least one recurring pattern, it is possible to limit the bounding box to the at least one recurring pattern. The recurring pattern may be, for example, a symmetric pattern. For example, the symmetry could be a radial symmetry or a planar symmetry (other symmetries are possible). In case of symmetry (or, more in general, of a recurring pattern) it is not necessary to encode all the vertexes of all the polygons, since it is possible to encode only the coordinates of the recurring pattern (e.g., in the case of planar symmetry, of half of the symmetrical volume), so as to reduce the amount of vertexes to be encoded in the bitstream 204. Basically, the bounding box 500 may be defined so as to contain the recurring pattern once, without re-encoding the other occurrences of the recurring pattern (e.g., in the case of planar symmetry, it is only necessary that the bounding box 500 contains the half of the symmetrical volume extending from the symmetry plane towards one of the two directions). Recurring pattern data are signaled in the bitstream 204 (e.g., in the case of planar symmetry, symmetry data are to be encoded so that the decoder 300 may reconstruct the shape of the represented acoustic object; e.g., in the case of a planar symmetry it is simply possible to provide information defining the symmetry plane, so that the decoder can reconstruct the final shape by reinserting the non-encoded half of the symmetric shape). The same can be performed in the case where the recurring patterns are periodic shapes: the bounding box may be limited to the shape which is periodically repeated, while the recurring pattern data may include the information which permits the decoder to reconstruct the final shape of the acoustic object (e.g., including the spatial period, e.g., in the three dimensions, and so on). The same can also apply in the case of radial symmetry, according to which only an angular portion of the shape is defined, and recurring pattern data are signaled regarding the symmetry point and/or the symmetry radius, so that the decoder 300 can reconstruct the final radially symmetrical shape based on the symmetry data.
As will be apparent in the following passages, it is also possible to define a bounding box such that, when the spatial coordinate system is changed to the spatial coordinate system defined by the bounding box, the number of coordinates whose values have a greatest common divisor greater than 1 is maximized.
In examples, the audio source encoding block 210 of the encoder 200 may therefore define a bounding box contained in the acoustic environment and encode the structural-acoustic data within the bounding box, thereby refraining from writing structural-acoustic data for the regions outside the bounding box. The bounding box may exclude portions of the acoustic environment which do not contain any primary vertex or any remaining polygonal vertex. Information on the bounding box, including positional data of the bounding box, may be signalled in the bitstream 204 so as to permit the localization of the bounding box in the acoustic environment. The structural-acoustic data may therefore be subjected to a change of coordinate system onto a new coordinate system defined by the bounding box, and the coordinates of the vertexes of the polygons may therefore be encoded with reference to the new coordinate system defined by the bounding box. In turn, the audio source decoding block 310 of the decoder 300 may read, in the side information of the bitstream 204, the information on the bounding box, and in particular its positional data. Hence, the audio source decoding block 310 may localize the bounding box within the environment. Moreover, the audio source decoding block 310 may decode the structural-acoustic data within the bounding box and, based on the localization performed through the positional data of the bounding box, reconstruct the positional data of the bounding box in the environment. The audio source decoding block 310 may perform a change of coordinates from the coordinate system defined by the bounding box onto the original coordinate system of the environment, e.g. by performing the change of coordinates that is the inverse of the one carried out at the encoder 200.
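A short, purely illustrative sketch of the coordinate change to and from a bounding-box-relative system is given below (the function names are assumptions; the actual bitstream would carry the bounding box position as side information):

    def to_bounding_box_coords(vertices):
        # compute the axis-aligned bounding box of all vertexes and express every
        # coordinate relative to its minimum corner (one determined vertex of the box)
        mins = [min(v[d] for v in vertices) for d in range(3)]
        maxs = [max(v[d] for v in vertices) for d in range(3)]
        relative = [tuple(v[d] - mins[d] for d in range(3)) for v in vertices]
        return mins, maxs, relative

    def from_bounding_box_coords(mins, relative):
        # inverse change of coordinates performed at the decoder
        return [tuple(r[d] + mins[d] for d in range(3)) for r in relative]

    verts = [(10, 20, 5), (14, 20, 5), (14, 23, 5)]
    mins, maxs, rel = to_bounding_box_coords(verts)
    assert from_bounding_box_coords(mins, rel) == verts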
As explained above, the audio source encoding block 210 of the encoder 200 may also evaluate whether the acoustic environment presents at least one recurring pattern, and limit the bounding box to the at least one recurring pattern. Recurring pattern data may therefore be signaled in the bitstream 204. In this case, the audio source decoding block 310 of the decoder 300 may reconstruct the at least one acoustic object by applying a recurrence (e.g., by prolonging by symmetry, by periodicity, etc.) to the recurring pattern within the bounding box.
For example, the at least one recurring pattern may be a symmetric pattern (e.g. a planarly symmetric pattern), and the recurring pattern data may therefore be symmetry data (e.g., positional data indicating the position and/or the orientation of the symmetry plane), which may be signalled in the bitstream 204. In turn, the audio source decoding block 310 of the decoder 300 may reconstruct the at least one object by symmetrically generating structural-acoustic data at positions symmetrical to the positions of the primary vertexes and the remaining polygonal vertexes in the bounding box (e.g. with respect to the symmetry plane).
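As an illustration of the decoder-side reconstruction for a planarly symmetric pattern, the following sketch mirrors the encoded half of the vertexes across a symmetry plane orthogonal to one coordinate axis; it is assumed, for illustration only, that the signalled symmetry data consist of the axis and the plane position, and vertexes lying exactly on the plane would additionally need to be deduplicated.

    def mirror_half_mesh(vertices, axis, plane_coord):
        # decoder-side reconstruction of a planarly symmetric object: the encoded half is
        # kept, and a mirrored copy is generated with respect to the plane
        # {coordinate[axis] == plane_coord} (axis and plane_coord are the signalled symmetry data)
        mirrored = []
        for v in vertices:
            m = list(v)
            m[axis] = 2 * plane_coord - v[axis]
            mirrored.append(tuple(m))
        return vertices + mirrored

    half = [(1, 0, 0), (2, 1, 0)]
    full = mirror_half_mesh(half, axis=0, plane_coord=0)   # symmetry plane x == 0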
Greatest Common Divisor
With or without the presence of the bounding box, in examples, the encoder may search for vertexes which have, in at least one coordinate, a common divisor greater than 1. Hence, the encoder 200 is further configured to search, for a particular dimension of the acoustic environment to be encoded, and for a multiplicity of primary polygonal vertexes or remaining polygonal vertexes, at least one common divisor dividing the coordinates of the primary polygonal vertexes or remaining polygonal vertexes, to thereby encode a divided version of the coordinate values.
Let us consider, in
In examples, the audio source encoding block 210 of the encoder 200 may therefore search, for at least one dimension of the environment or of the bounding box, a common divisor, different from (greater than) 1, among the coordinates of a plurality of primary polygonal vertexes or remaining polygonal vertexes, to thereby encode in the bitstream 204 the common divisor and the results of the divisions of the coordinates by the common divisor. Hence, the bitstream 204 has encoded therein at least two coordinate values of at least two different vertexes in a factorized form according to a common divisor. This is signalled in the bitstream 204 (the common divisor is also encoded). In turn, the audio source decoding block 310 of the decoder 300 may reconstruct the at least two coordinate values encoded in factorized form by multiplying each of the at least two coordinate values by the common divisor, so as to reconstruct the at least two coordinate values.
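The following sketch illustrates, under purely illustrative assumptions about how the values are represented, the factorization of the coordinates of one dimension by a common divisor at the encoder and the corresponding multiplication at the decoder:

    from math import gcd
    from functools import reduce

    def factorize_coords(coords):
        # find the greatest common divisor of all coordinate values of one dimension;
        # if it is greater than 1, signal it and write the divided values instead
        divisor = reduce(gcd, coords)
        if divisor > 1:
            return divisor, [c // divisor for c in coords]
        return 1, list(coords)

    def defactorize_coords(divisor, coded):
        # decoder side: multiply the coded values back by the signalled divisor
        return [c * divisor for c in coded]

    xs = [0, 120, 240, 360]
    divisor, coded = factorize_coords(xs)        # divisor == 120, coded == [0, 1, 2, 3]
    assert defactorize_coords(divisor, coded) == xs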
In
Quantization
It is to be noted that the structural-acoustic data encoding block may preliminarily perform a quantization on the structural-acoustic data 221, eliminating duplicate vertexes and degenerate polygons. When this applies, the example discussed above (with reference to
Discussion
The new triangle mesh coding approach is composed of several stages, each contributing to improved efficiency. It is not strictly necessary to carry out all the stages together.
The first stage uniformly quantizes the vertex coordinates, using an encoder-selectable quantization step, and eliminates all duplicate vertexes and all duplicate and degenerate triangles. The second stage computes the bounding box for the entire list of vertexes. The third stage acts as a preprocessor, detecting implicitly reduced precision of coordinates, meaning all vertex coordinates are multiples of some integer number, separately on each dimension. The fourth stage takes advantage of geometries where a significant number of vertexes are located on common planes that are parallel with the coordinate axes. The fifth stage refines the fourth stage, by taking into account and creating a statistical model of the recency information of repeated coordinates, separately on each dimension. Each of these stages computes several model parameters for the best representation found by the encoder, which are coded very efficiently as side information together with the data itself using a range coder.
The first stage uniformly quantizes the vertex coordinates, using an encoder-selectable quantization step. The quantization step is usually chosen to be small enough so that the quantization process does not introduce any acoustic artifacts, typically in the range from 1 mm to 2 cm. After quantization, all duplicate vertexes and all duplicate and degenerate triangles are eliminated. Depending on the generation algorithm for the triangle mesh, the exact same duplicate vertexes can potentially appear many times.
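A simplified sketch of this first stage is given below; the quantization step, the material labels and the rule used to detect duplicate triangles (same vertex set and material) are illustrative simplifications and not the exact rules of the scheme.

    def quantize_mesh(vertices, triangles, step=0.001):
        # uniform quantization with an encoder-selectable step (here 1 mm), followed by
        # elimination of duplicate vertexes and of duplicate or degenerate triangles;
        # triangle entries are (i0, i1, i2, material)
        quantized = [tuple(round(c / step) for c in v) for v in vertices]
        remap, unique = {}, []
        for q in quantized:
            if q not in remap:
                remap[q] = len(unique)
                unique.append(q)
        # map every old vertex index onto the index of its unique representative
        index_map = [remap[q] for q in quantized]
        seen, kept = set(), []
        for i0, i1, i2, material in triangles:
            a, b, c = index_map[i0], index_map[i1], index_map[i2]
            if len({a, b, c}) < 3:          # degenerate: two or more identical vertexes
                continue
            key = (frozenset((a, b, c)), material)
            if key in seen:                 # duplicate triangle (simplified test)
                continue
            seen.add(key)
            kept.append((a, b, c, material))
        return unique, kept

    verts = [(0.0004, 0.0, 0.0), (0.0002, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    tris = [(0, 2, 3, 'concrete'), (1, 2, 3, 'concrete')]
    print(quantize_mesh(verts, tris, step=0.001))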
The second stage computes the exact bounding box for the entire list of vertexes, to exclude from coding any ranges that are not actually used. The bounding box is coded very efficiently, optimizing for some frequently encountered patterns. One frequent pattern is when a bounding box coordinate range is symmetric around zero, e.g., −150 and 150, where only the absolute value is coded once. This applies when the acoustic environment or object is symmetric with respect to that coordinate. Another pattern is when a bounding box coordinate range has width zero, e.g., 150 and 150, where only one value is coded once. This applies when the acoustic object is completely flat.
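The two frequently encountered patterns can be illustrated with the following sketch; the returned tuples merely symbolize what would be coded, whereas the actual coding uses a range coder.

    def code_bounding_box_range(lo, hi):
        # compact description of one coordinate range of the bounding box,
        # optimizing the two frequent patterns described above (illustrative format)
        if lo == -hi:
            return ('symmetric', hi)        # e.g. (-150, 150): only the absolute value is coded
        if lo == hi:
            return ('flat', lo)             # e.g. (150, 150): the object is completely flat
        return ('general', lo, hi)          # otherwise both bounds are coded

    def decode_bounding_box_range(code):
        kind = code[0]
        if kind == 'symmetric':
            return -code[1], code[1]
        if kind == 'flat':
            return code[1], code[1]
        return code[1], code[2]

    for rng in [(-150, 150), (150, 150), (3, 42)]:
        assert decode_bounding_box_range(code_bounding_box_range(*rng)) == rng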
The third stage acts as a preprocessor, detecting implicitly reduced precision of coordinates, meaning all vertex coordinates are multiples of some integer number, separately on each dimension. For example, if we assume that the quantization precision is set to 1 mm, and that all the X coordinates are actually expressed in multiples of 10 mm, then all the quantized X coordinates, and therefore all the quantized sizes on the X coordinate, are multiples of 10. Moreover, if the coordinates are made relative to the bounding box, complete translation invariance is achieved (e.g., all the sizes may be multiples of 10, but the coordinates may all be shifted by 1). These common multiples, when different from 1, can then be removed from the values to reduce the data range. The common multiples for each coordinate are coded very efficiently, optimizing for some frequently encountered patterns, like a value of 1 (meaning no common multiple was found) and a value exactly equal to the corresponding width of the bounding box (meaning there are exactly two different values present for that coordinate). Thus, a cube aligned to the coordinate axes will have, on each coordinate, as common multiple exactly the width of the bounding box for that coordinate. Therefore, after preprocessing, only the values 0 and 1 will remain for each coordinate.
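A sketch of this detection, using bounding-box-relative coordinates so that translation invariance is achieved, could look as follows (illustrative only; the coding of the detected common multiple is not shown):

    from math import gcd
    from functools import reduce

    def detect_common_multiple(coords):
        # make the coordinates of one dimension relative to the bounding box minimum
        # (translation invariance), then find their common multiple
        origin = min(coords)
        relative = [c - origin for c in coords]
        multiple = reduce(gcd, relative)
        if multiple <= 1:
            return 1, relative                       # no common multiple found
        return multiple, [r // multiple for r in relative]

    # quantized X coordinates on a 1 mm grid, all shifted by 1 and spaced by 10 mm
    xs = [1, 11, 21, 51]
    multiple, reduced = detect_common_multiple(xs)   # multiple == 10, reduced == [0, 1, 2, 5]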
The fourth stage takes advantage of geometries where a significant number of vertexes are located on common planes that are parallel with the coordinate axes. For example, if a number of vertexes are located on a plane parallel with the X and Y coordinate axes, this means that all the Z coordinate values of those vertexes are identical. The way to take advantage of repeating coordinate values, separately on each axis, is to code each coordinate value either as an index into a list of previously coded unique values, or as a new value coded explicitly, which will then be added to the list of previously coded unique values. Indicating which of the two ways a value is actually coded requires a “mask” bit, separately for each coordinate. These 3 “mask” bits per vertex are optimally coded using an adaptive binary probability estimator. If the number of distinct values for each coordinate is significantly smaller than the number of vertexes, then even with the overhead of the “mask” bits, the coded size is significantly smaller. The encoder can optimally decide, based only on the number of unique values for a coordinate and the number of vertexes, whether this representation is more efficient than the direct one.
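The encoder decision can be illustrated with a rough bit-cost comparison such as the following sketch; the cost model is a deliberate simplification (it ignores the adaptive probability estimator and the range coder), and the numbers are only an example.

    from math import ceil, log2

    def index_representation_is_better(num_vertices, num_unique, value_bits):
        # direct coding writes value_bits per vertex; the index representation writes one
        # "mask" bit per vertex, value_bits for each new unique value, and an index of
        # ceil(log2(num_unique)) bits for each repeated value
        direct_cost = num_vertices * value_bits
        index_bits = ceil(log2(num_unique)) if num_unique > 1 else 0
        indexed_cost = (num_vertices                      # one mask bit per coordinate value
                        + num_unique * value_bits         # explicitly coded unique values
                        + (num_vertices - num_unique) * index_bits)
        return indexed_cost < direct_cost

    # e.g. 10 000 vertexes whose Z coordinates take only 12 distinct 16-bit values
    print(index_representation_is_better(10_000, 12, 16))   # True under this model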
The fifth stage refines the fourth stage, by taking into account and creating a statistical model of the recency information of repeated coordinates, separately on each dimension. The fourth stage would code for a repeating value its index into the list of previously coded unique values, using a uniform distribution. However, a significant proportion of repeating values map to indexes that were used very recently. Introducing a parameter representing the maximum number nr of recent index values to be remembered, a statistical model is created to code the last nr unique index values used more efficiently than all the others. A Move To Front (MTF) list of nr+1 entries is used to keep track of the values of the most recent nr index values, while the last entry represents all the other indexes. If an index value is found in the MTF list, its position in the list is coded, and that index value is moved to the beginning of the MTF list. Otherwise, the position nr in the list is coded, indicating that the index was not used recently, followed by the uniform coding of the index value itself. The positions in the MTF list are coded using an adaptive probability estimator, to match optimally the relative recency distribution. Increasing nr improves coding efficiency, however a small value of 8 for nr already achieves close to optimal results, allowing for a low complexity implementation.
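A minimal sketch of such an MTF list with nr = 8 is shown below; the symbolic ('recent', position) / ('escape', nr, index) output stands in for the adaptive-probability coding of the positions and the uniform coding of the escaped index value, and keeping nr explicit values plus an escape symbol is an equivalent, illustrative formulation of the nr+1-entry list described above.

    def mtf_code_index(mtf, index, nr=8):
        # returns the symbol to be coded for one index value and updates the MTF list;
        # position nr signals "not recently used", followed by the index value itself
        if index in mtf:
            pos = mtf.index(index)
            mtf.remove(index)
            mtf.insert(0, index)             # move the reused index to the front
            return ('recent', pos)
        mtf.insert(0, index)
        if len(mtf) > nr:
            mtf.pop()                        # keep only the nr most recent index values
        return ('escape', nr, index)

    def mtf_decode_symbol(mtf, symbol, nr=8):
        if symbol[0] == 'recent':
            index = mtf[symbol[1]]
            mtf.remove(index)
            mtf.insert(0, index)
            return index
        index = symbol[2]
        mtf.insert(0, index)
        if len(mtf) > nr:
            mtf.pop()
        return index

    enc_mtf, dec_mtf = [], []
    stream = [mtf_code_index(enc_mtf, i) for i in [7, 3, 7, 7, 3, 42]]
    assert [mtf_decode_symbol(dec_mtf, s) for s in stream] == [7, 3, 7, 7, 3, 42]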
It is to be mentioned here that all alternatives or aspects as discussed before and all aspects as defined by independent claims in the following claims can be used individually, i.e., without any other alternative or object than the contemplated alternative, object or independent claim. However, in other embodiments, two or more of the alternatives or the aspects or the independent claims can be combined with each other and, in other embodiments, all aspects, or alternatives and all independent claims can be combined to each other. An inventively encoded signal can be stored on a digital storage medium or a non-transitory storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier or a non-transitory storage medium.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are performed by any hardware apparatus.
While this invention has been described in terms of several advantageous embodiments, there are alterations, permutations, and equivalents, which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.
This application is a continuation of copending International Application No. PCT/EP2022/064327, filed May 25, 2022, which is incorporated herein by reference in its entirety, and additionally claims priority from European Application No. EP 21176345.3, filed May 27, 2021, which is also incorporated herein by reference in its entirety. There are disclosed apparatus and methods for encoding and decoding of an acoustic environment.