The present application is concerned with transform block coding as applicable, for instance, in image and video coding.
Entropy coding is a crucial step in every compression algorithm. By using variable length code words, it eliminates redundancy contained in a sequence of symbols, such that the data can be represented with less bits per symbol compared to a fixed length code word representation without introducing any losses.
Once the probability distribution of a symbol source is known, generation of a nearly optimal code word representation is a solved problem. However, such implementations come along with a complexity that is too large for applications like mezzanine compression in video over IP applications.
In the following, let us consider a compression of two-dimensional images that have been processed with a frequency transform as illustrated in
The coefficients resulting from the frequency transform are arranged in blocks of a given width and height and those blocks of coefficients are entropy encoded. Each block contains coefficients belonging to different frequencies. In case of a 2D-DCT (discrete cosine transform), block building is trivial, since the output of the 2D-DCT is already a block of coefficients belonging to different horizontal and vertical frequencies. In case of a wavelet transform, a block groups a different number of coefficients from different subbands as illustrated in
Such a block is then entropy coded for a more efficient representation without information theoretical redundancy. Since the result needs to obey rate and latency constraints, a rate control allocates an admissible bit budget to each block. The entropy coder then tries to include as many bit planes as possible for each block, given the rate constraint, while starting with the most significant bit plane (MSB).
Entropy coding is a very well-studied field of research having brought up many different solutions, such as Huffman coding [9], Arithmetic coding or binary arithmetic coding [10]. All of them have in common to rely on constant or even dynamic image statistics and are hence very complex to implement in hardware like FPGAs.
Other algorithms like Golomb [11], Elias-Delta-Coding [12] etc. operate on single coefficients and assume certain statistics like geometric distribution. Unfortunately, in reality such statistics are not exactly met, which causes a possibly tremendous loss in coding efficiency.
Interestingly, these negative impacts can be diminished by not only considering one coefficient at a time, but several of them [5][6][7][8].
Reference [7], for instance groups four coefficients belonging to the same frequency, and splits them into two parts:
The leading zero bit planes can be efficiently represented by counting their number and by performing entropy encoding of this value. A prediction of the leading zero bit plane count results in a roughly geometric distribution that can be easily encoded using a unary code. Given that this unary code is only generated for every 4th coefficient, a mismatch in the statistics is not as critical as if we encode individual coefficients. Moreover, given that the possible value range for the group is limited, and given that neighboring coefficients are expected to have similar values, a uniform distribution of the remainder is a good approximation, at least when the number of non-zero bit planes is small.
However, in case the coefficients get larger, the approximation works less well. Moreover, redundancy between different frequencies is not taken into account.
Summarizing the above, there are many picture and video codecs differentiating, for instance, in a different waking up between computational complexity for an encoder and decoder on the one hand and compression rate on the other hand. It would be favorable to have a picture and/or video codec at hand which allows for low complexity coding/decoding at a nevertheless acceptable compression ratio.
Thus, the object of the present invention is to provide a concept for coding a transform block which allows for lower computational or implementation complexity at a certain compression ratio or alternatively, higher compression ratio at a certain computational complexity.
An embodiment may have an encoder for encoding a transform block into a data stream, configured to code first magnitude bits of spectral coefficients of the transform block into the data stream, the first magnitude bits forming a profile in a matrix in which the magnitude bits of the spectral coefficients are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, the profile enveloping non-zero magnitude bits of the spectral coefficients in the matrix; and code second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the encoder is configured to code the first magnitude bits into the data stream prior to, and in a non-interleaved manner relative to, the second magnitude bits.
Another embodiment may have an encoder for encoding a transform block into a data stream, configured to code first magnitude bits of spectral coefficients of the transform block into the data stream, the first magnitude bits forming a profile in a matrix in which the magnitude bits of the spectral coefficients are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, the profile enveloping non-zero magnitude bits of the spectral coefficients in the matrix; and code second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the encoder is further configured to determine a first predetermined bit plane among bit planes of the spectral coefficients with signaling information revealing the first predetermined bit plane in the data stream, or performing the determination uniquely depending on an amount of data consumed in the data stream by a previously coded portion of the data stream; and identify the second magnitude bits out of the magnitude bits of the spectral coefficients using the first predetermined bit plane.
Another embodiment may have an encoder for encoding a transform block into a data stream, configured to code first magnitude bits of spectral coefficients of the transform block into the data stream, the first magnitude bits forming a profile in a matrix in which the magnitude bits of the spectral coefficients are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, the profile enveloping non-zero magnitude bits of the spectral coefficients in the matrix; and code second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the encoder is further configured to signal in the data stream information on a second predetermined bit plane for a transform block group to which the transform block belongs, and which represents spectral decompositions of a group of different portions of a two-dimensional image, and restrict the coding of the first magnitude bits and the coding of the second magnitude bits to bit planes which are not more significant than the second predetermined bit plane and to spectral coefficients of a first subset of spectral coefficients, and further code a second subset of spectral coefficients, at least having a DC coefficient, into the data stream in a manner taking into account magnitude bits of the second subset of spectral coefficients lying in bit planes more significant than the second predetermined bit plane.
Still another embodiment may have an encoder for encoding a transform block into a data stream, configured to code first magnitude bits of spectral coefficients of the transform block into the data stream, the first magnitude bits forming a profile in a matrix in which the magnitude bits of the spectral coefficients are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, the profile enveloping non-zero magnitude bits of the spectral coefficients in the matrix; and code second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the encoder is further configured to select a scan order ordering the spectral coefficients out of a plurality of scan orders, use the selected scan order so that the magnitude bits of the spectral coefficients are arranged in the matrix with the spectral coefficients of the transform block ordered along the row direction of the matrix in accordance with the selected scan order, and code a signalization into the data stream which identifies the selected scan order.
Another embodiment may have an image encoder configured to encode a two-dimensional image having a spectral decomposer configured to spectrally decompose portions into which the two-dimensional image is subdivided, into a plurality of transform blocks; and an inventive encoder for coding a transform block of the plurality of transform blocks into a data stream as mentioned above.
Another embodiment may have a decoder for decoding a transform block from a data stream, configured to decode first magnitude bits of spectral coefficients of the transform block into the data stream, the first magnitude bits forming a profile in a matrix in which the magnitude bits of the spectral coefficients are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, the profile enveloping non-zero magnitude bits of the spectral coefficients in the matrix; and decode second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the decoder is configured to decode the first magnitude bits from the data stream prior to, and in a non-interleaved manner relative to, the second magnitude bits.
Another embodiment may have a decoder for decoding a transform block from a data stream, configured to decode first magnitude bits of spectral coefficients of the transform block into the data stream, the first magnitude bits forming a profile in a matrix in which the magnitude bits of the spectral coefficients are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, the profile enveloping non-zero magnitude bits of the spectral coefficients in the matrix; and decode second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the decoder is further configured to determine a first predetermined bit plane among bit planes of the spectral coefficients with deriving the first predetermined bit plane from information signaled in the data stream, or performing the determination uniquely depending on an amount of data consumed in the data stream by a previously decoded portion of the data stream; and identify the second magnitude bits out of the magnitude bits of the spectral coefficients using the first predetermined bit plane.
Another embodiment may have a decoder for decoding a transform block from a data stream, configured to decode first magnitude bits of spectral coefficients of the transform block into the data stream, the first magnitude bits forming a profile in a matrix in which the magnitude bits of the spectral coefficients are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, the profile enveloping non-zero magnitude bits of the spectral coefficients in the matrix; and decode second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the decoder is further configured to derive from a signalization in the data stream a second predetermined bit plane for a transform block group to which the transform block belongs, and which represents spectral decompositions of a group of different portions of a two-dimensional image, and restrict the decoding of the first magnitude bits and the decoding of the second magnitude bits to bit planes which are not more significant than the second predetermined bit plane and to spectral coefficients of a first subset of spectral coefficients, and further decode a second subset of spectral coefficients, at least having a DC coefficient, from the data stream, the further decoding revealing magnitude bits of the second subset of spectral coefficients lying in bit planes more significant than the second predetermined bit plane.
Still another embodiment may have a decoder for decoding a transform block from a data stream, configured to decode first magnitude bits of spectral coefficients of the transform block into the data stream, the first magnitude bits forming a profile in a matrix in which the magnitude bits of the spectral coefficients are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, the profile enveloping non-zero magnitude bits of the spectral coefficients in the matrix; and decode second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the decoder is further configured to decode a signalization from the data stream which identifies a selected scan order out of a plurality of scan orders, and use the selected scan order so that the magnitude bits of the spectral coefficients are entered into the matrix with the spectral coefficients of the transform block ordered along the row direction of the matrix in accordance with the selected scan order.
Another embodiment may have an image decoder configured to decode a two-dimensional image, having an inventive decoder for decoding a transform block from a data stream as mentioned above; and a spectral decomposition inverter configured to spectrally compose portions into which the two-dimensional image is subdivided, from a plurality of transform blocks to which the transform block belongs.
According to another embodiment, a method for encoding a transform block into a data stream may have the steps of: coding first magnitude bits of spectral coefficients of the transform block into the data stream, the first magnitude bits forming a profile in a matrix in which the magnitude bits of the spectral coefficients are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, the profile enveloping non-zero magnitude bits of the spectral coefficients in the matrix; and coding second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the first magnitude bits are coded into the data stream prior to, and in a non-interleaved manner relative to, the second magnitude bits.
According to another embodiment, a method for encoding a transform block into a data stream may have the steps of: coding first magnitude bits of spectral coefficients of the transform block into the data stream, the first magnitude bits forming a profile in a matrix in which the magnitude bits of the spectral coefficients are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, the profile enveloping non-zero magnitude bits of the spectral coefficients in the matrix; and coding second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the method further has determining a first predetermined bit plane among bit planes of the spectral coefficients with signaling information revealing the first predetermined bit plane in the data stream, or performing the determination uniquely depending on an amount of data consumed in the data stream by a previously coded portion of the data stream; and identifying the second magnitude bits out of the magnitude bits of the spectral coefficients using the first predetermined bit plane.
According to another embodiment, a method for encoding a transform block into a data stream may have the steps of: coding first magnitude bits of spectral coefficients of the transform block into the data stream, the first magnitude bits forming a profile in a matrix in which the magnitude bits of the spectral coefficients are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, the profile enveloping non-zero magnitude bits of the spectral coefficients in the matrix; and coding second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the method further has signaling in the data stream information on a second predetermined bit plane for a transform block group to which the transform block belongs, and which represents spectral decompositions of a group of different portions of a two-dimensional image, and wherein the coding of the first magnitude bits and the coding of the second magnitude bits is restricted to bit planes which are not more significant than the second predetermined bit plane and to spectral coefficients of a first subset of spectral coefficients, and the method further has further coding a second subset of spectral coefficients, at least having a DC coefficient, into the data stream in a manner taking into account magnitude bits of the second subset of spectral coefficients lying in bit planes more significant than the second predetermined bit plane.
According to still another embodiment, a method for encoding a transform block into a data stream may have the steps of: coding first magnitude bits of spectral coefficients of the transform block into the data stream, the first magnitude bits forming a profile in a matrix in which the magnitude bits of the spectral coefficients are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, the profile enveloping non-zero magnitude bits of the spectral coefficients in the matrix; and coding second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the method further has selecting a scan order ordering the spectral coefficients out of a plurality of scan orders, using the selected scan order so that the magnitude bits of the spectral coefficients are arranged in the matrix with the spectral coefficients of the transform block ordered along the row direction of the matrix in accordance with the selected scan order, and coding a signalization into the data stream which identifies the selected scan order.
According to another embodiment, a method for decoding a transform block from a data stream may have the steps of: decoding first magnitude bits of spectral coefficients of the transform block into the data stream, the first magnitude bits forming a profile in a matrix in which the magnitude bits of the spectral coefficients are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, the profile enveloping non-zero magnitude bits of the spectral coefficients in the matrix; and decoding second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the first magnitude bits are decoded from the data stream prior to, and in a non-interleaved manner relative to, the second magnitude bits.
According to another embodiment, a method for decoding a transform block from a data stream may have the steps of: decoding first magnitude bits of spectral coefficients of the transform block into the data stream, the first magnitude bits forming a profile in a matrix in which the magnitude bits of the spectral coefficients are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, the profile enveloping non-zero magnitude bits of the spectral coefficients in the matrix; and decoding second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the method further has determining a first predetermined bit plane among bit planes of the spectral coefficients with deriving the first predetermined bit plane from information signaled in the data stream, or performing the determination uniquely depending on an amount of data consumed in the data stream by a previously decoded portion of the data stream; and identifying the second magnitude bits out of the magnitude bits of the spectral coefficients using the first predetermined bit plane.
According to another embodiment, a method for decoding a transform block from a data stream may have the steps of: decoding first magnitude bits of spectral coefficients of the transform block into the data stream, the first magnitude bits forming a profile in a matrix in which the magnitude bits of the spectral coefficients are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, the profile enveloping non-zero magnitude bits of the spectral coefficients in the matrix; and decoding second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the method further has deriving from a signalization in the data stream a second predetermined bit plane for a transform block group to which the transform block belongs, and which represents spectral decompositions of a group of different portions of a two-dimensional image, and the decoding of the first magnitude bits and the decoding of the second magnitude bits to bit planes which are not more significant than the second predetermined bit plane and to spectral coefficients of a first subset of spectral coefficients, and the method further has decoding a second subset of spectral coefficients, at least having a DC coefficient, from the data stream, the further decoding revealing magnitude bits of the second subset of spectral coefficients lying in bit planes more significant than the second predetermined bit plane.
According to still another embodiment, a method for decoding a transform block from a data stream may have the steps of: decoding first magnitude bits of spectral coefficients of the transform block into the data stream, the first magnitude bits forming a profile in a matrix in which the magnitude bits of the spectral coefficients are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, the profile enveloping non-zero magnitude bits of the spectral coefficients in the matrix; and decoding second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the method further has decoding a signalization from the data stream which identifies a selected scan order out of a plurality of scan orders, and using the selected scan order so that the magnitude bits of the spectral coefficients are entered into the matrix with the spectral coefficients of the transform block ordered along the row direction of the matrix in accordance with the selected scan order.
Another embodiment may have a non-transitory digital storage medium having stored thereon a computer program for performing a method for encoding a transform block into a data stream, having coding first magnitude bits of spectral coefficients of the transform block into the data stream, the first magnitude bits forming a profile in a matrix in which the magnitude bits of the spectral coefficients are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, the profile enveloping non-zero magnitude bits of the spectral coefficients in the matrix; and coding second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the first magnitude bits are coded into the data stream prior to, and in a non-interleaved manner relative to, the second magnitude bits, when said computer program is run by a computer.
Another embodiment may have a non-transitory digital storage medium having stored thereon a computer program for performing a method for decoding a transform block from a data stream, having decoding first magnitude bits of spectral coefficients of the transform block into the data stream, the first magnitude bits forming a profile in a matrix in which the magnitude bits of the spectral coefficients are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, the profile enveloping non-zero magnitude bits of the spectral coefficients in the matrix; and decoding second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the first magnitude bits are decoded from the data stream prior to, and in a non-interleaved manner relative to, the second magnitude bits, when said computer program is run by a computer.
Another embodiment may have a data stream formed by the inventive method for encoding as mentioned above.
An idea underlying the present invention is the fact that transform block coding may be achieved very efficiently in terms of computational complexity and compression ratio, by coding the magnitude bits of the transform coefficients distributed in a matrix, in which the magnitude bits of the spectral coefficients are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix. That is, magnitude bits within a certain column of the matrix belong to a certain spectral coefficient, while magnitude bits within a certain row of the matrix belong to a certain bit plane. In this configuration, the distribution of non-zero magnitude bits may be condensed towards one corner of the matrix, corresponding to, for instance, the least significant bit plane and corresponding to, by using a scan order among the transform coefficients which sorts the transform coefficients generally in a manner from lowest to highest frequency, the lowest frequency. In particular, the idea of the present invention is to code the transform coefficients within such a matrix with separately treating first magnitude bits and second magnitude bits of the spectral coefficients: the first magnitude bits form a profile in the aforementioned matrix which, in turn, forms an envelope of non-zero magnitude bits of the spectral coefficients in the matrix. The second magnitude bits of the spectral coefficients reside in the matrix at a lower significance side of the profile. They refine the profile bits. On the one hand, the separation mirrors the fact that the first magnitude bits which form the profile, are more important or significant with respect to information content, and, on the other hand, the separation concurrently results in an effective way of restricting the further coding, namely the coding of the refining second magnitude bits, to a small fraction of the matrix, i.e. the area at the lower significance side of the profile. That is, the second magnitude bits to be coded merely reside in this area. In accordance with an embodiment, the coding of the first magnitude bits, which form the profile, are coded in a manner traversing the matrix from higher to lower significance bit planes, such as from top to bottom of the matrix, and traversing magnitude bits of a row of the matrix, which corresponds to a currently traversed bit plane, along the row direction, such as from low to high frequencies so that the profile forms a convex hull in the matrix which envelops, at a more significance side, the non-zero magnitude bits of the spectral coefficients. The profile may, for instance, already comprise some of the non-zero magnitude bits of the spectral coefficients, or alternatively speaking the most significant non-zero magnitude bits of at least some spectral coefficients. The profile may be coded, for instance, by the use of certain symbols which may be code words of a variable length code.
In accordance with a first aspect of the present application, the just-outlined concept of separating the coding of the magnitude bits of the spectral coefficients into a coding of the profile on the one hand and a coding of the lower significance magnitude bits on the other hand, is exploited so as to render the coding and decoding low in computational complexity by coding the first magnitude bits, which form the profile, into the data stream prior to, and in a non-interleaved manner relative to, the second magnitude bits. By this measure, the data stream reveals information on a data amount available in the data stream for coding the second magnitude bits or, the other way around, the data amount having already been allocated to, and consumed by, the coding of the first magnitude bits, which form the profile, at the earliest opportunity possible, i.e. prior to the coding of the second magnitude bits. By this measure, the encoder and decoder may both determine which of the magnitude bits residing in the matrix at a lower significance side of the profile are to be coded as the second magnitude bits of the spectral coefficients after the first magnitude bits, i.e. to identify the second magnitude bits, and in which order. In turn, this the enables encoder and decoder to perform the coding of the second magnitude bits into a data stream in a manner such that the second magnitude bits belonging to different spectral coefficients are not interleaved. That is, the second magnitude bits may be coded into the data stream spectral-coefficient-wise, thereby reducing implementation complexity by avoiding revisiting the coefficients too often. Even any interleaving between second magnitude bits of spectral coefficients of different transform blocks of a transform block group to which the transform block belongs and for which a certain amount of coding rate has been reserved commonly, may be avoided. In turn, this alleviates the implementation complexity, or differently speaking computational complexity, on the encoder and decoder sides.
In accordance with a second aspect of the present application, the above-outlined concept of separating the coding of the magnitude bits of the spectral coefficients into profile on the one hand and lower significance magnitude bits beneath the profile on the other hand, is exploited and made less complex in terms of computational complexity by designing the encoder and decoder in a manner such that both determine a predetermined bit plane using which the second magnitude bits may be identified out of the magnitude bits of the spectral coefficients. Together, the predetermined bit plane and the profile result in an exact identification of a portion of the matrix within which the second magnitude bits reside. That is, the second magnitude bits lying within this fraction of the matrix may be conveyed within the data stream in an advantageous manner such as, for instance, spectral-coefficient-wise without interleaving of second magnitude bits belonging to different spectral coefficients. In other words, the predetermined bit plane may determine a least significant completely coded bit plane so that the second magnitude bits are identified to cover at least all those magnitude bits lying at the lower significance side of the profile and being above, or being above or within, the predetermined bit plane. A next lower significant bit plane may be partially populated with second magnitude bits in order to consume a remaining data amount in the data stream reserved for, or allocated to, the transform block. The encoder and decoder may identify second magnitude bits within this fractional bit plane in a manner so that the same second magnitude bits within this fractional bit plane are identified. However, as this process pertains to merely one bit plane, the computational complexity associated with this identification may be kept low. In other words, the predetermined bit plane determination may be coupled to a rate control and this rate control may involve a signalization of respective information revealing the first predetermined bit plane within the data stream in the form of, for example, an explicitly signaled quantization step size, or the rate control may be performed on encoder and decoder sides depending on a data amount consumed by of a previous decoded portion of the data stream such as, for instance, by surveying a buffer fullness of a virtual or real buffer at encoder and decoder sides.
In accordance with a third aspect of the present application, the above-outlined concept of separating the coding of the magnitude bits of the spectral coefficients into coding of the profile on the one hand and coding of the enveloped magnitude bits on the other hand is made more efficient in terms of coding efficiency by signaling in the data stream from encoder to decoder a second predetermined bit plane for a transform block group to which the transform block belongs, and which represents spectral decompositions of a group of different portions of a two-dimensional image. The second predetermined bit plane is used to restrict the coding of the profile, i.e. the coding of the first magnitude bits which form the profile, and the coding of the second magnitude bits. In particular, the coding of the first and second magnitude bits is restricted to bit planes which are not more significant than the second predetermined bit plane, and to a first subset of spectral coefficients such as all AC spectral coefficients. The second predetermined bit plane signaled in the data stream is set to be high enough, for instance, to cover all non-zero magnitude bits of the subset of spectral coefficients such as all non-DC or AC spectral coefficients, i.e. so that the profile envelops at least the latter non-zero spectral coefficients' magnitude bits. With respect to another, disjoint subset of spectral coefficients which may, for example, merely comprise the DC coefficient, the coding of the coefficient value may be performed separately in a manner considering also magnitude bits thereof lying in bit planes more significant than the second predetermined bit plane. By this measure, the third aspect of the present application takes into account that the statistical spread of the values of DC or lower frequency spectral coefficients may be considerably larger than the spread of the remaining, possibly higher frequency, spectral coefficients in terms of the most significant bit plane populated by any of the non-zero magnitude bits of the respective spectral coefficients. Accordingly, treating transform blocks of the transform block group together in order to allow for a mutual exchange of available code rate among the transform coefficients, may be made more efficient by treating the DC spectral coefficient separately. In order to increase coding efficiency, spatial prediction and/or variable length coding may be used in order to increase the coding efficiency with respect to the coding of the values of the separately coded subset of spectral coefficients.
In accordance with a fourth aspect of the present application, the concept outlined above of separately coding profile magnitude bits and magnitude bits enveloped by the profile is made more efficient in terms of coding efficiency by selecting a scan order ordering the spectral coefficients out of a plurality of scan orders at the encoder side, signaling the selection to the decoder, and using at the encoder and decoder the selected scan order so that the magnitude bits of the spectral coefficients are arranged in the matrix with the spectral coefficients of the transform block ordered along the row direction of the matrix in accordance with the selected scan order. The encoder may perform the selection in a brute force manner, i.e. by testing out all possible scan orders of the plurality of scan orders and using the one leading to the best compression in terms of, for instance, rate/distortion ratio, or may select the scan order heuristically, such as by computing a measure for the transform block's distribution of the spectral coefficients' energy towards low frequencies with respect a first spatial direction, or low frequencies with respect to a second spatial direction such as x- and y-direction. Although the signalization overhead seems to decrease the coding efficiency, the opposite is the case: by providing the encoder with the opportunity to select the most efficient scan order, the aforementioned condensing of the distribution of non-zero magnitude bits may be improved, thereby also increasing the coding efficiency of coding the magnitude bits using profile and magnitude bits enveloped by the profile.
The aspects outlined above may be combined individually or all together.
Embodiments of the present application are described below with respect to the figures, among which:
the shading of the samples may correspond, for instance, to a one pixel value, whereas unshaded ones may correspond to a zero pixel value;
The following description of embodiments of the present application starts with a motivation of the underlying separated coding of profile magnitude bits and magnitude bits enveloped by the profile. Subsequently, the description involved with this motivation section is extended in three different ways with many variants also being presented. The four different aspects already outlined above are used in respective ones of the presented modified concepts. Finally, embodiments for the encoder and decoder are described, which use the separated coding of profile magnitude bits and magnitude bits enveloped by the profile and for each of the four aspects presented above, another implementation of the encoder and decoder embodiments is described with trying to reference sections of the description of the three modified concepts.
1. Coding Strategy
Frequency transforms used for compression show an energy compaction property. This means that the relevant information is concentrated in a few frequency coefficients, while the other frequency coefficients have only low amplitudes.
This property is exploited by the coding scheme embodiments outlined below. To this end, the frequency coefficients of a transform block as described with respect to Figs. A and B are ordered in such a way that in mean they have decaying amplitude. Since the ordering needs to be known by the decoder for proper decoding, the ordering scheme is fixed, or a limited number of orderings may be chosen as explained later on in Section 1.10. Please note that while for a single block frequency coefficients might not be decaying, this property holds when considering the entirety of blocks occurring in typical images.
1.1 Block Profile
(∀a≤k∧∀j<i:faf=0)∧(∃a≤k:fai=1).
In other words, the profile forms the convex hull of the most significant ‘1’-bits. All bits that are situated in a bit plane below the profile bits are called refinement bits, since they gradually improve the precision of the coefficient value.
With this definition, a block can be entropy encoded by efficiently describing the profile, and copying the refinement bits to the output bit stream without any modifications.
1.2 Definitions
Let fki be the i-th bit of frequency coefficient fk, fk0 being the LSB. Then the following definitions hold:
1.3 Profile Encoding
In order to describe the form of the profile, the following symbols are used when encoding a bit fki being part of the profile, where ZR is a so called zero run state:
As illustrated in
Encoding starts at the most significant bit plane with zero run disabled. In case no significant bit will appear any more in the current bit plane, and EOP symbol is sent, and encoding continues with the next lower bit plane of the same frequency coefficient. Otherwise, the profile bit is encoded in different ways depending the zero run mode is enabled or not. If zero run-mode is enabled, simply the bit value is output. Otherwise, two bits are used to encode the bit value.
1.4 Profile decoding
In order to be able to decode the profile, the decoder also needs to track a zero run state. In case ZR=1, the decoder simply needs to read one bit in order to determine the value of the next. Otherwise (ZR=0), a two bit symbol needs to be read, possibly prepended by a string of zero-bits with variable length.
1.5 Advantages of the concept described so far are that all (or nearly all) dependencies are within a block, thereby increasing the possible parallelism. No grouping of coefficients is used, thereby reducing possible waste of bits. No negative impact results in case neighboring blocks have very different statistics. However, care should be taken to avoid waste of bits by a hull of a block which is not decaying. Coefficients should be brought into a favorable order.
1.6 Strategies for Encoding the Sign Bits
Besides the profile and the data bits, also the sign bits need to be signaled to the decoder. For this end, two different strategies are possible:
1.7 Controlling the Granularity of Parallelism
1.7.1 General Concept
Given that entropy coding exploits redundancy within a block, but not between blocks (except for possibly the DC coefficient when doing DC prediction), parallel encoding is rather straight forward. Parallel decoding, on the other hand is much more difficult, since the decoder needs to know where a block starts in the bit stream. Two strategies are possible to make this possible:
Both strategies decrease the coding efficiency:
Consequently, it is important to control the granularity of parallelism. This is done by the concept of a workgroup. A workgroup is a set of blocks that can only be decoded sequentially. The start of a workgroup, however, can be determined by the decoder without decoding previous workgroups in order to allow sufficient decoder parallelism.
1.7.2 Handling of Color Components
For color images, a pixel consists of three (or even more) samples. These samples are typically processed in parallel by the frequency transform, such that in principle it is also possible to perform entropy encoding and decoding in parallel. Then, however, the entropy decoder needs to be capable to locate the individual color components in the codestream. This means that they need to be placed in different working groups.
Alternatively, blocks belonging to the same spatial region, but different color components can be placed into the same working group. While this simplifies rate control in some cases (see Section 2.3), it increases the memory needed, since then more spatial regions need to be processed in parallel.
Color subsampling is supported by allowing only an even number of blocks for the luminance channel within a workgroup. For each of the chroma channels, only half the number of blocks makes part of the working group.
1.8 Encoding the Profile of the DC Coefficients 1.8.1 Problem Formulation
Frequency transforms, being it a 2D-DCT, or a wavelet transform, have the property to condense the signal energy into the low pass coefficients. Consequently, the DC coefficients have typically much larger values than the other frequency coefficients. This is illustrated in
In order to allow for proper interpretation, the decoder first of all needs to know the number of zero bit planes. Given that the number of zero bit planes is expected to be similar for neighboring blocks, rate can be saved by signaling this quantity on a workgroup level instead of a block level. This means that the maximum DC coefficient determines the number of bits to transmit for all other blocks (of the same color component) as illustrated in
The next step consists in signaling the profile of the blocks. However, usage of the principles from Section 1.3 would be inefficient, since for every block there is a significant drop in the profile between the DC coefficients and the first AC coefficient. Given that one output bit is used to signal a drop of the profile by one bit plane, this is again not very efficient. Instead, a more efficient representation within the workgroup may be used as outlined in the following.
1.8.2 Proposed Coding Mode
Given that the statistics for the different color channels can be quite different, redundancy in signaling the profile can only be reasonably exploited within a color channel. In case a color channel in a workgroup only contains one block, there is no redundancy that can be exploited. Consequently, normal profile coding as described in section 1.3 can be applied.
This means that in the following we can assume that the workgroup contains at least two blocks for the considered color component. Moreover, our strategy is to only output one variable length code per coefficient, since generation and decoding of VLC codes is relatively expensive in hardware due to the barrel shifters needed (see also Section 4.7).
Let fai(b, c) be the i-th bit of frequency coefficient fa in block b and color component c of the current workgroup. fai=0(b, c) shall be the least significant bit, and f0(b, c) the DC coefficient. Then the coding algorithm is as follows
1.8.3 Alternative Coding Mode
An alternative way for encoding the profile of the DC coefficients is described in the following. It differs from Section 1.8.3 basically in the order by which different parts are signaled to the decoder:
The drawback of this approach is that the decoder needs to decode two variable length codes before it can decode the first DC coefficient. This, however, might have negative consequences on hardware implementations (see Section 3.8.1).
1.9 Scan Pattern Optimization
1.9.1 Problem Formulation
As obvious from Section 1.1, the faster the profile is decaying, the more efficiently a code block can be represented. The form of the profile depends among others on the order by which the frequencies are visited during entropy coding. Thus, the scanning order also determines how fast a code block is decaying.
To this end, consider two extreme cases illustrated in
The first block consists of a vertical edge. Consequently, a vertical frequency transform can ideally compact the energy into the lowpass coefficients. A subsequent horizontal transform will then show many high frequency coefficients for the vertical low pass coefficients, while the other ones are zero. Similarly, for a horizontal edge, the horizontal transform can condense the signal into the low pass coefficients, such that a subsequent vertical transform will only show large frequencies for the horizontal low pass coefficients.
Consequently, coding efficiency can be significantly improved by defining a set of possible scanning orders, dynamically select them on the encoder side and inform the decoder about this decision. Then both encoder and decoder can use the best possible scanning order in order to reduce the number of bits used to describe a transform block.
1.9.2 Brute Force Selection of the Scanning Order
Let's suppose that we have defined a set of possible scanning orders as motivated in Section 1.9. Then the encoder can obviously perform entropy encoding with each possible scanning order and select that one with the smallest number of bits needed, or the smallest distortion.
The first criterion is only applicable if the entropy coder performs lossless encoding of possibly pre-quantized frequency coefficients. For the more typical scenario of a rate constraint compression as illustrated in
This can be approximated by selecting the scanning order that includes the largest number of bit planes from a code block for a given bit budget. In case the number of completely included bit planes is the same, the decision needs to be performed based on the included fractional bit planes. A fractional bit plane is a bit plane of a transform block that is only included partially in the codestream.
Given that the typical zig-zag order weights the frequencies accordingly to their (visual) importance, one strategy could consist in selecting that scanning order where the number of contiguously encoded coefficients in the zig-zag scanning order is the largest one. Please note that this does not necessarily select the zig-zag scanning order itself, since it might abort encoding much earlier than a more optimized scanning order.
1.9.3 Heuristic for Selection of the Scanning Order
While brute force selection of the scanning order delivers good image quality, it is typically too complex to be applied in hardware implementations with small footprint. To solve this problem, a measure can be used by the encoder that can select between, e.g. the zig-zag, the predominantly horizontal and the predominantly vertical scanning mode illustrated in
If, for instance, the measure is greater than a first certain amount, then the predominantly vertical scanning order shown in
If the measure is, for instance, smaller than a second certain amount (smaller than the first), then the predominantly horizontal scanning order shown in
Otherwise, the traditional zig-zag scanning order may be applied.
1.10 Summary
Section 1 has explained some concepts for subsequently exemplified coding schemes. In a nutshell, it exploits that frequency transform coefficients typically show decaying amplitudes. To this end, the profile, being the convex hull of the frequency coefficients is encoded, such that bits lying “outside” of the convex hull do not need to be signaled to the decoder.
Sections 2-4 will now describe three variants of this entropy coding scheme.
2. Bit Plane Based Embodiment
Profile-based entropy coding as explained in Section 1 is most easily applied when encoding the frequency coefficients bit plane by bit plane as done, for instance, in [2]. The following coding scheme renders plane-by-plane based magnitude bit coding more efficient and/or less complex to implement by paying special attention to:
In more detail, encoding is performed as follows:
2.1 Bit Plane Coding for a Single Block
In order to explain the bit plane based mode, let's consider the example block illustrated in
Zero bit planes (marked with an orange shading), and DC overhang bits (marked with a gray shading) are handled in a special fashion as described in Section 1.8. Consequently, they shall not be further considered here. Hence, the first bit to encode is bit 8 of frequency f0. Since f0 is already significant in bit plane 8, f08 can be simply written into the codestream. Given that all other coefficients fk are zero in bit plane 8, only an EOP-symbol will be sent before moving to the next bit plane.
In the next bit plane 7, coefficient f0 is refined again by outputting f07 into the codestream. Next, profile encoding is performed for coefficients f1-f4 as explained in Section 1.3. Since coefficient f4 gets significant, also its sign bit needs to be output. Finally the bit plane is terminated with an EOP symbol.
In bit plane 6, first coefficients f0-f4 are refined by putting bits f06-f46 into the codestream. Then profile encoding is continued as explained in Section 1.3. For coefficients f6 and f9, a sign bit needs to be added as well. After profile encoding, the bit plane is terminated with an EOP symbol.
For bit plane 5, bits f05-f95 are simply output to the codestream, followed by an EOP symbol. Bit plane 6 is handled in a similar way.
When refining bit plane 3 by putting the bits f03-f93 into the codestream, the sign bits for coefficients f5 and f8 need to be added as well, since both coefficients are getting significant.
2.2 Termination of Encoding and Decoding
Depending on the rate control strategy, different possibilities exist to decide when to stop encoding and decoding.
2.2.1 Bit Plane Based Coding Termination
For the bit plane based coding termination, both encoder and decoder agree on a number of bit planes that should not be transmitted. The number of dropped bit planes can also be expressed by a quantization factor. It can be computed by the encoder and signaled to the decoder on different levels of granularity, such as blocks, workgroups, lines, slices, or even images. Alternatively, the quantization factor can be computed on both the encoder and the decoder side.
In case the codestream layout imposes that the bit stream for a block, workgroup or slice terminates on a byte or word boundary, and the bits generated to include the desired bit plane would leave some spare bits, they can be simply used to refine the next bit plane for a small number of coefficients.
The decoder can detect this situation and is free to decode them or simply skip them for reasons of complexity.
2.2.2 Bit Budget based Coding Termination
In this case, coding termination is not controlled by a target bit plane, but by a target bit rate. This target bit rate can be a constant for the image, or it can be signaled by the encoder to the decoder on different levels of granularity, such as blocks, workgroups, lines or slices. Alternatively, both the encoder and the decoder can compute them in the same manner.
As a result, both the encoder and the decoder know how many bits are available to encode a certain block (or workgroup, see Section 2.3). Consequently, both of them simply stop when this target bit rate is met, which is very simply due to the bit plane based entropy encoding.
2.3 Bit Plane Coding for a Workgroup
Section 2.1 assumed that only a single transform block needs to be encoded or decoded. However, when combined with the bit budget based coding termination, this causes that the target rate needs to be met on a granularity of a block. Given, however, that some blocks might be more difficult to encode than other ones, this would result in low PSNR values, since the difficult block cannot be represented with enough precision.
Consequently, Section 2.3.1 explains how this challenge can be addressed. Then Section 2.3.2 discusses the situation for a bit plane based coding termination.
2.3.1 Bit Budget Based Coding Termination
As explained in Section 2.2.2, encoding and decoding shall be stopped when reaching a certain target bit budget. In the following, we assume that this budget is not given for a single block, but for a complete workgroup. By these means, difficult blocks in the workgroup can consume more bits than simple blocks, leading to an increased overall PSNR value.
Then, however, the question arises, how to distribute the bit budget over the different blocks. This question can be solved by interleaving encoding and decoding of the different blocks. In more detail, bit plane i is encoded in all blocks before moving to bit plane i−1 in any of the blocks. Within a bit plane i, first frequency coefficient fki is encoded for all blocks, before moving to frequency fa′ia>k in any block.
By these means, the bits are implicitly prioritized and the encoder continues coding the workgroup until the bit budget is exceeded.
7.3.2 Bit Plane based Coding Termination
For the bit plane based coding termination, the distribution of the rate between the different blocks within a workgroup is implicitly defined. Consequently, no interleaving of the different blocks is necessary. The only exception is the filling to the next byte or word boundary.
Here, ideally an interleaving of the frequency coefficients from the different blocks should happen. However, for complexity reasons, it is also possible to simply refine the blocks in sequential order.
2.4 Pseudo Code
The pseudo code of
2.4.1 Definition of Functions
bitplane(b,coeff): Returns bitplane b of coeff (in sign-magnitude representation) sign(coeff): Returns sign-bit of coeff
2.4.2 Encoding
The encoding is illustrated in the pseudo code of
3. Coefficient Based Embodiment with Two Passes
3.1 Problem Formulation
Low complexity compression as addressed by this invention typically targets compression ratios between 1:2 and 1:6.
This results in varying target compression rates in bits-per-component as shown by Table 2, ranging from 6 bpc (bits per component) up to 1.33 bpc. A worst cases analysis assuming that an input bit can be represented by a single output bit (like in refinement coding) then leads to the conclusion that a sample needs to be revisited in mean between 6 and 1.33 times when using the bit plane based embodiment of Section 2.
In order to evaluate the impact of this fact, we need to consider in addition the target resolution of 4 k60 fps. This results in a pixel frequency of 4096·2160·60≈531 MHz. Assuming a processing clock of 150-200 Mhz for the entropy coder, this means processing 3-4 pixels or 9-12 samples in parallel. Given that a sample needs to be revisited up to 6 times, this ends up with up to 54-72 instances for the entropy coder. Given that workgroups can only be processed sequentially, 54-72 workgroups need to be buffered.
Table 3 illustrates the resulting memory requirements, expressed in component rows. For instance, a value of 1.5 means that the buffer needs to be sufficiently large to hold 1.5 image rows with 3 components each. Please note that this memory has to be provided in addition to the memory used to perform the frequency transform.
Table 4 illustrates the number of parallel accesses to this memory. They need to be compared to the number of RAM modules used without any parallel access. Assuming a memory block size of approximately 16 kbits [3] as available in modern FPGAs, and 14 bits per DCT coefficient for 8 bit input images (including the gain of the color and the frequency transform), a single component row uses approximately 4 memory blocks. For all components, 12 memory blocks are thus sufficient, which is slightly less than the parallelism needed.
8 bits
10 bits
12 bits
Consequently, for better hardware efficiency the throughput per entropy encoder needs be improved, which is subject to the next subsections.
3.2 General Strategy
From Table 2 it gets obvious that the bit plane based approach is particularly problematic for low compression ratios, since a coefficient needs to be revisited multiple times. As however a 1:2 compression leads to particularly good image quality, which allows much more subsequent editing operations than a 1:4 or even 1:6 compression, having a codec that also supports these operation points is very important.
This can be achieved by switching from the bit plane to a coefficient based approach. The goal is to ideally consider each coefficient only one time. Unfortunately, this is very difficult to achieve. However, at least the number of times a coefficient needs to be revisited can be bounded and made independent of the compression ratio.
The following subsections will describe a corresponding approach that can be used for bit budget based rate control as explained in Section 2.3.1. Section 4 then details an approach that is better suited for quantization based rate control as explained in Section 2.3.2.
3.3 Solution Concept for a Single Block
In the proposed solution, entropy coding of a block is performed in two passes. In the first pass, only the profile of a block is encoded without emitting any refinement bits (bits without shading in
For a more detailed explanation, let's reconsider the block shown in
Next, profile coding continues in bit plane 7 by considering frequency coefficients f1-f4 in the same way as explained in Section 1.3. Then, bit plane 7 is terminated by means of an EOP symbol.
The encoder continues this operation until the target bit budget which it is aware of is exceeded. To this end, it is important to notice that the encoder can easily determine the number of refinement bits used when terminating a bit plane. For instance, when terminating bit plane number 8, the encoder knows that in bit plane 7, only frequency coefficient f0 needs to be refined. Hence, before starting to encode the profile in bit plane 7, the encoder needs to allocate one bit for f07, although the latter is not directly output to the codestream.
Similarly, terminating bit plane 7, the encoder needs to allocate refinement bits for coefficients f0-f6 before starting to encode the profile of bit plane 6.
By these means, while encoding the profile, the encoder can track how many bits will be used including the refinement bits needed. Consequently, it can simply stop profile encoding, when the target bit budget is exceeded. Next, it simply outputs for every coefficient the number of refinement bits needed.
Overall, every coefficient needs to be revisited twice at most when assuming, that for every bit plane the maximum contributing frequency is computed beforehand as illustrated in
Since each time a coefficient is visited at least one bit is output, for lower compression ratios, not all coefficients need to be revisited twice. For instance, when having an 8-bit input image and a compression ratio of 1:6, in the worst case each sample needs to be visited 1.33 times in mean. Consequently, encoding will be done much faster than for the bit plane based approach, because when refining a coefficient, much more bits can be output.
In order to interpret the codestream, the decoder decodes the profile and tracks the number of refinement bits needed in the same way than the encoder did. Moreover, it is also aware of the available bit budget for the current block. Consequently, the decoder knows when to stop profile decoding and interprets the remaining bits as refinement bits.
3.4 Sign Bit
The attentive reader might have observed that Section 3.3 did not discuss the sign bits. Ideally, a sign bit is only transmitted when the quantized coefficient is unequal zero. While for the encoder it is easily possible to determine exactly, when a sign bit needs to be emitted, the decoder cannot derive this information when decoding the profile, since the refinement bits will follow much later. This, however, is problematic, since during profile coding and decoding, also decision about coding abortion is performed, and this decision needs to be identical on encoder and decoder side.
In order to solve this problem, there exist two different strategies:
1. Whenever a coefficient is included in the profile, also allocate a sign bit. By these means, a corresponding worst case analysis is performed, since some of the coefficients will never get significant, and hence theoretically do not need any sign bit.
2. When performing profile coding, only allocate a sign bit for the coefficients whose profile bit is one. Allocate a sign bit for the other coefficients when they are first refined, independent what the refinement value is. Please note that this option is much more difficult to implement in hardware.
When doing the actual refinement coding, this sign bit might then indeed be emitted for every coefficient. Alternatively, the sign bit can be omitted when the quantized coefficient is zero. Then the block will not completely consume the available bit budget. Depending on the rate control, this bit budget can be allocated to the next block, or it can be used for refining some more coefficients. However, such an approach will significantly increase complexity.
Please note that the sign bit can be easily omitted for coefficients where already from profile coding it is obvious that the value will be zero (coefficients f13-f15 in
3.5 Fractional Bit Planes
Assuming that every coefficient will have an associated sign bit (except when profile coding already signals them as zero), both encoder and decoder can predict exactly how many bits will be needed for both profile and refinement coding. This allows generating so called fractional bit planes.
Let's consider again
3.6 Encoding a Workgroup
The concept described so far assumed encoding of a single block. Consequently, the rate constraint needs to hold for a single block. As already discussed in Section 2.3, this might lead to reduced image quality in case not all blocks are of the same complexity.
Instead, the rate is better distributed within a workgroup. This can be achieved as described in the following.
3.6.1 Interleaving of Frequency Coefficients
In accordance with Section 2.3.1, the bit budget can be distributed between blocks by interleaving the frequencies of different blocks. To this end, the profile is encoded bit plane by bit plane. Moreover, within a bit plane, the profile is encoded frequency by frequency instead of block:
Only when all blocks of a workgroup have terminated a bit plane bp, the corresponding refinement bits are allocated. In case the bit budget is not sufficient to refine all blocks completely, the available bit budget needs to be distributed among the blocks. In this case, the rate control ideally solves the following problem: Given a workgroup consisting of n blocks with m frequencies, and the profile has been encoded till bit plane b for all blocks. Then find two numbers n1 ϵ [1, n] and 0≤k<m, such that for n1 blocks all frequencies f1≤fk having been included in the coded profile are refined until bit plane b−1 and for n−n1 blocks, all frequencies f1≤fk−1 having been included in the coded profile are refined only till bit plane b.
Since this causes a certain complexity for hardware implementation, an alternative solution consists in attributing the refinement bits in a first-come-first-served basis. Start with the first block and allocate the refinement bits needed. Then continue with the next block until the bit budget is exceeded.
3.6.2 Encoding without Frequency Interleaving
Interleaving the frequencies as explained in Section 3.6.1 is rather complex to implement in hardware. The reason is that the hardware implementation needs to process one sample per clock. However, finding the block and frequency to process next is not obvious.
In order to solve this challenge, it is important to notice that the unique reason for interleaving the frequencies of the different blocks are a precise rate control. In fact, all blocks shall a priori be refined until the same bit plane, and the remaining bit budget should then be attributed to the “important” coefficients. In this context, the lower is the frequency, the more important is a coefficient. Moreover, coefficients that are signaled as zero by the profile coding should obviously be excluded from the refinement.
However, if this constraint is relaxed, it is also possible to process all frequencies of a block in a given bit plane before switching to the next bit plane:
Such an organization has the huge benefit of allowing high performance implementations as explained in Section 4.7.
3.7 Pseudo Code for the Encoder
The concepts described in the previous sections are clarified in the following by corresponding pseudo code assuming frequency interleaving. Moreover, it uses a the most simple sign bit allocation in that every coefficient being included into the profile coding will contain a sign bit (see also Section 3.4). In addition, refinements bits are allocated in a first-come-first-served method as described in Section 3.6.1. For reasons of simplicity, the special handling of DC coefficients is excluded from the pseudo code.
3.7.1 Input
3.7.2 Variables
3.7.3 Encode Working Group
The pseudo code is depicted in
3.8 Pseudo Code for the Decoder
3.8.1 Input
3.8.2 Definitions
3.8.3 Variables
3.8.4 Decode Working Group
The pseudo code for the decoder is depicted in
4. Coefficient Based Embodiment with one Pass
4.1 Problem Formulation
While the bit plane based embodiment of Section 2 delivers good coding performance, it suffers from limited performance, in particular for small compression ratios like 1:2. This is solved by the coefficient based approach from Section 3, in that the maximum number of times a coefficient needs to be revisited is bounded to two.
Unfortunately, the control flow is still rather complex due to frequency interleaving as discussed in Section 3.6.1. But even when avoiding frequency interleaving as described in Section 3.6.2, profile coding needs to interleave between blocks, since profile encoding needs to be in descending bit plane order. Finally, rate control is combined with profile coding, which complicates flexible rate control schemes that distribute rate between different workgroups. In addition, control flow within the entropy coder is getting more complex.
In the following an alternative approach is presented that clearly distinguishes between rate control and entropy coding. The actual entropy coding is performed in one single pass. The rate control beforehand, on the other hand can easily be run in parallel, simplifying thus the hardware implementation.
3.2 Solution Concept
Instead of a bit budget driven termination of the coding as discussed in Section 3, a bit plane based coding termination is applied. In other words, the decoder is informed about the performed quantization in terms of number of dropped bit planes. Signaling of the quantization factor can be achieved in the same manner than described in [4].
The number of dropped bit planes needs to be computed by the rate control. This rate control can work on a block or workgroup level. Alternatively, it can even consider multiple working groups, such as a complete line of working groups. In the following, we use the term rate control group to define the set of coefficients on which the rate control is operating. Given that rate control is quite simple (see Section 4.3) it can be run in parallel for all blocks within a rate control group. Hence, a single rate control block can maintain a high throughput. Moreover, given that the result of the rate control is signaled to the decoder, sign bits can be handled in a precise manner in that sign bits are only emitted when the coefficient value is not zero.
Once the entropy coder knows the desired quantization step size, entropy coding can happen on a coefficient base. First the entropy coder encodes the profile information for the current coefficient as explained in Section 1.3. Then it can immediately output the refinement bits, since the target quantization is already known before moving to the next coefficient.
Please note that the refinement bits can be interleaved with the profile bits, or they can be put into separate FIFOs and then output sequentially. The latter option simplifies execution on GPUs due to better parallelism. Moreover, it allows unequal error protection of the codestream, in that the profile bits are better protected than the data bits, since a bit error in the data bits only have a local impact, while a bit error in the profile bits causes decoding-errors in large pixel areas. On the other hand, slightly larger output buffers are used. In fact, without separating profile and refinement bits, a single entropy coder instance would not need any output buffer. On the other hand, as soon as multiple entropy coder instances are used, an output buffer is needed in any case. The only difference is that for separate output of profile and refinement bits, two instead of one buffer per entropy coder instance are needed. This, however, is not a big problem. For instance, in case throughput considerations allow for, even the same memory block can be used, when profile bits are stored with increasing addresses, starting at memory address zero, and refinement bits are stored with decreasing addresses, starting at the largest possible memory address. Moreover, the number of profile bits is bounded, since only one or two profile bits can be output per coefficient plus the EOP symbols used, which are bounded by the bit depth of the coefficients.
4.3 Fractional Bit Planes
4.3.1 Problem Formulation
With the insights of Section 1.5, the rate control of
Consequently, for all blocks the bits bounded by a bold line are included. Moreover, the quantization factor for both blocks is typically set equal.
By these means, only a very limited set of quantization points need to be checked, making it easy to implement it in hardware and software. However, on the other hand, quantization is rather coarse. Consequently, typically the rate associated to a workgroup cannot be met exactly, limiting the PSNR coding performance.
The following subsections discuss a couple of strategies how to solve this issue.
4.3.2 Propagation of the Rate to the Next Rate Control Unit
Bits that are not used in one rate control group will be passed to the next rate control group.
Advantageously, this is simple to implement.
Disadvantageously, at the end of the image, there might be some bits left that cannot be used, and this propagation concept may not be possible when every block should have a strictly fixed size. Further, this may increase the end-to-end latency of the system, since an additional smoothing buffer is used, and it may limit parallelism on encoder side.
4.3.3 Refinement Based on Budget Signaling or Tracking
The rate control can compute the number of bits that will not be used for a rate control group.
Advantageously, this also works when every block should have a strictly fixed size.
Disadvantageously, an ideal solution would use frequency interleaving between blocks. In case the decoder cannot derive the bit budget available for refinement bits, they need to be explicitly signaled, thereby causing a corresponding overhead: In the worst case, one data bit needs to be encoded with 3 bits. Having for instance a working group with three color components and four 4×4 DCT blocks, the maximum number of bits that are not allocated by the rate control when only considering complete bit planes equals 3·4·4·4·3−1=575 bits. Encoding such a number uses 10 bits. Assuming a target compression of 4 bits per pixel, a workgroup consists of 4·4·4·4=256 bits, such that the overhead is 3.9%. In case a workgroup consists of 16 DCT blocks, the overhead is only 1.2%
4.3.4 Use of Separate Output Buffer
In this case, the refinement bits are placed in a separate buffer and append later on to the bit stream. Then the decoder can decode them in a second phase. Since the decoder does not need any rate control phase, the number of phases in encoder and decoder is identical.
Advantageously, no signaling overhead is needed.
Disadvantageously, this complicates the decoder design in terms of, for example, hardware resources and memory. An additional buffer may be used at the encoder side.
4.4 Pseudo Code for Encoder
In order to clarify the descriptions of the previous sections, they are detailed in the following subsections by means of pseudo code. For the creation of fractional bit planes, the concepts of Section 4.3.3 are used.
In order to keep the code as simple as possible, the special handling of the DC coefficients as explained in Section 1.8 is not included, but it could be.
4.4.1 Definitions
4.4.2 Input to the Function
4.4.3 Variables
4.4.4 Encoding a Workgroup with Precise Fractional Bit Planes
The pseudo code is depicted in
4.5 Pseudo Code for Decoder
4.5.1 Definitions
4.5.2 Input to the Function
4.5.3 Decoding a Workgroup with Precise Fractional Bit Planes
The pseudo code is depicted in
4.6 Simplification of Fractional Bit Plane Encoding
Pseudo code marked at 10 in
This can be prevented by using the bit budget for fractional coding only for refinement bits, but not for profile bits. In other words, fractional bit plane coding stops when reaching the hatched shaded bits in
4.6.1 Encoding a Workgroup with Limited Fractional Bit Planes
The pseudo code is depicted in
4.6.2 Decoding a Workgroup with Limited Fractional Bit Planes
The pseudo code is depicted in
4.7 Throughput Considerations and Codebook Reinterpretation
For high-speed hardware implementations, it is important that during both profile and refinement encoding, one coefficient can be processed per clock cycle. Thanks to the fact, that the codec operates coefficient by coefficient, this is indeed possible as explained in the following.
For refinement coding only the correct number of raw bits need to be output. Consequently, the throughput constraint is rather easily fulfilled for refinement coding.
For profile coding, coding of AC and DC coefficients need to be distinguished.
4.7.1 AC Coefficient Coding
For high performance hardware implementations, it is important to process only one variable length code word per clock cycle. Fortunately, this is easily possible by reinterpretation of the coding alphabet shown in Table 1.
When encoding the profile of coefficient fa, and zero run state is active, a fixed code word of one bit needs to be emitted or read per coefficient. This is obviously trivial.
When encoding the profile of a coefficient fa, and zero run state is not active, then the codeword consists of a variable number of zero bits, followed by a separating one-bit, followed by the bit value as depicted in
The leading zeros correspond to EOP symbols. Their number hence determines how many bit planes need to be terminated and varies between zero and nmax, where nmax is the maximum number of bitplanes a coefficient can have. These leading zero bits are separated by a one bit, before the value of the profile bit follows.
Thanks to this simple codeword structure, a high performance implementation is easily possible.
4.7.2 DC Coefficient Coding
Coding of the profiles is also easy. In case a workgroup consists only of one block per color component, the profile for a DC coefficient is simply a variable length code for the number of zero bit planes. In case the workgroup consists for more than one block per color, the number of zero bit planes is only signaled once for all DC coefficients of a component. In addition, the number of maximum AC bit planes need to be signaled. However, all together, less than one variable length code word per DC coefficient is used, enabling a high throughput implementation as well.
4.8 Limitation of Code Word Length
The maximum size of the variable codeword described in Section 3.8.1 equals nmax+2 bits. Since longer codewords use larger barrel shifters in hardware, the maximum codeword size can be limited by introduction of a clipping threshold. Let x be the number of EOP symbols to encode. Then instead of outputting x zero bits, followed by a one, the following variable length code can be applied:
5. The above sections presented a motivation of the usage of the concept of coding the magnitude bits of spectral coefficients of a transform block in a manner distinguishing between profile on the one hand and magnitude bits enveloped by the profile on the other hand, and provided, in Sections 2 to 4, examples of how to exploit the concept of the separate coding of profile on the one hand and magnitude bits enveloped by the profile on the other hand, in a manner allowing for a less complex implementation of an encoder and decoder and/or an increase in coding efficiency. In the following, certain aspects applied in Sections 2 to 4 are specifically made the subject of embodiments described further below. Insofar, the embodiments for the encoder and decoder described below represent abstractions of the specific details set out in Sections 2 to 4. In order to alleviate associating the subsequently explained embodiments with the embodiments set out in Sections 2 to 4, the description outlined below contains references to Sections 2 to 4.
For the sake of completeness, it is noted that in case if the transform block 14 representing the result of a subband spectral decomposition 22 such as a wavelet transform, block 14 is, for instance, of size n×n with n=2N. A set of 2(N−l)·2(N−l), registered to one corner of block 14, would represent, for example, the lower frequency portion of subband l in x and y. This set or subarray would represent a quadrant of a subarray of subband l which comprises three further subarrays of size 2(N−1)·2(N−l) of spectral coefficients of subband l concerning a higher frequency portion of subband l in the horizontal direction and/or vertical direction. The 2×2 array of these subband l subarrays would then, in turn, represent a quadrant of an even larger subarray of block 14, the other three quadrants of which would represent subarrays of size 2(n−l+1)·2(N−1+1) and relate to subband l−1 and so forth.
As already denoted above, the encoder 12 is for encoding transform block 14 into a data stream. The data stream is illustrated in
Encoder 12 is configured to code magnitude bits of spectral coefficients of transform block 14. That is, encoder 12 receives, or converts inbound spectral coefficients 16 into, a sign and magnitude representation. The sign is optional is, accordingly, not further taken into account in the following description of the encoder as well as with respect to the decoder described hereinafter. In order to efficiently code the magnitude bits, encoder 12 codes the magnitude bits by describing their distribution of zeros and ones in a matrix 32 in which the magnitude bits 34 of the spectral coefficients 16 are arranged, or into which the magnitude bits 34 are entered, column-wise with the spectral coefficients 16 of the transform block 14 ordered along a row direction 36 of the matrix 32. By dashed lines 38,
It should be noted here that encoder 12 may or may not be provided with, or have access to, block 14 in a manner where the transform coefficients' magnitude bits 34 are already arranged in a manner so as to correspond to matrix 32. If not, encoder 12 takes the order of the block's 14 spectral coefficients 16 in accordance with the scan order 40 into account when coding the magnitude bits of the spectral coefficients in the manner outlined further below.
When implementing encoder 12 in a manner complying with a first aspect of the present application, the encoder 12, in encoding the magnitude bits 34 of the spectral coefficients 16 into data stream 30, performs the coding of first magnitude bits of the spectral coefficients 16 into the data stream 30 first before coding second magnitude bits of the spectral coefficients 16 into data stream 30. The first magnitude bits were illustrated using hatching in
Besides coding the first magnitude bits 46 which form profile 44, encoder 12 codes second magnitude bits of the spectral coefficient 16, with these second magnitude bits being ones residing in the matrix 32 beneath, i.e. at a lower significance side of, profile 44. In
Before describing an embodiment for a decoder fitting to the encoder of
As already noted above, encoder 12 may code profile 44, and the magnitude bits 46 forming the same, using symbols which might be code words of a variable length code. An example of such code words has been set out above with respect to Table 1 and the usage and mode of operation in order to code profile 44 using these symbols has been described with respect to
While profile 44 may be possibly encoded in this manner with using, for instance, a variable length code in order to code the symbols output thereby, encoder 12 may alternatively use another approach. For instance, the aforementioned symbols of Table 1 could be coded using arithmetic coding. In
As to the second magnitude bits 48, it has already been taught above that the magnitude bits 48 may be coded into data portion 78 of data stream 30 using a compression ratio of one, i.e.
for each magnitude bit 48 one bit is written into data stream 30. For instance, the second magnitude bits 48 may be written into the data stream as they are, i.e. in a plane manner. Moreover, the second magnitude bits 48 may be coded into data portion 78 in a non-interleaved manner with respect to their membership to spectral coefficients 16. For instance, magnitude bits 48 of transform block 14 belonging to one coefficient 16 may be coded into data portion 78 immediately consecutively. Second magnitude bits 78 belonging to different coefficients 16 would then not be interleaved within data stream 30. An interleaving could also be left off between coefficients of different transform blocks 14.
The description of the encoder of
The decoder 80 is configured to decode the first magnitude bits 46 forming profile 44 and to decode the second magnitude bits 48 residing in matrix 32 at a lower significance side of profile 44, i.e. “behind” profile 44 when seen along the significance direction 42, or ones hit by the profile when the latter is projected along matrix column direction 42.
The details set out above with respect to the encoder of
However, as already indicated above, a different sort of coding/decoding of the profile magnitude bits 46 may be used. As far as the second magnitude bits 48 are concerned, the same is true. That is, decoder 80 may, in a plane manner, read the second magnitude bits 48 from data stream 30 and may, in this regard, read the magnitude bits 48 from the data stream in an non-interleaved manner with respect to the membership to transform coefficients 16, but alternatives would be available as well.
After having described the encoder of
In accordance with a first aspect of the present application, encoder 12 codes the first magnitude bits 46 into data stream 30 prior to, and in a non-interleaved manner to, the second magnitude bits 48, and likewise decoder 80 decodes the first magnitude bits 46 from the data stream 30 prior to, and in a non-interleaved manner relative to, the second magnitude bits 48 from data stream 30. As a result, the corresponding data sections or data portions 46 and 78 of data stream 30 do not interleave and are accordingly illustrated in a corresponding manner in
An example of a mode of operation of encoder 12 and decoder 80 when corresponding to the first aspect of the present application is explained below with respects to
Both tasks, i.e. the coding task of encoder 12 and the decoding task of decoder 80, start with obtaining a data rate target at step 110 and 112, respectively, or in alternative terms, a data amount, data consumption or bit budget reserved in the data stream 30 for the transform block or a transform block group to which the transform block belongs. Step 110 at the encoder side may involve encoder 12 receiving a data amount target from, for example, a rate control such as the rate control depicted in
This knowledge may be used or may not be used in the subsequent step of coding the first magnitude bits 114 at the encoder side and decoding the first magnitude bits 116 at the decoder side. For example, the codec may be designed such that the data amount target suffices inevitably to completely code profile 44 down to the least significance bit plane, i.e. bit plane zero in
Alternatively, the data amount target is taken into account when performing coding/decoding 114/116 of the first magnitude bits in the sense that the data rate target controls when to stop coding the profile 44. In other words, the data rate target may also be used in steps 114/116 to determine or identify the first magnitude bits until which same are coded into, or decoded from, the data stream such as, for instance, along the direction generally pointing along directions 36 and 42, respectively. As described above, the encoder could determine for each bit plane how much data would have to spent for coding the second magnitude bits completely within the respective bit plane in addition to coding the profile completely with respect to this bit plane, and accordingly encoder 12 is able to choose the least significant non-fractional bit plane till which it is feasible to code both the first magnitude bits and the second magnitude bits into the data stream 30 while complying with the data rate target. The decoder may determine at each transition from one bit plane to the next in decoding the profile bits, as to when the lest significant non-fractional bit plane has been reached. Again, potential remaining data amount still complying with the data rate target may distributed onto additional second magnitude bits below the least significant non-fractional bit plane. Further, the second magnitude bits thus identified at the encoder and decoder may be coded/decoded column-wise, i.e. without interleaving magnitude bits of different spectral coefficients.
The concept presented in section 2 represents an example for the process described with respect to
Before proceeding with describing implementation of encoder 12 of
In accordance with the second aspect of the present application, encoder 12 is configured to determine, in the manner described above with respect to the first steps in
As it turns out from the description brought forward above with respect to an implementation of encoder and decoder
As shown in
Again, it is clear that
The statement performed with respect to the first aspect, according to which the coding of the first and second magnitude bits may be restricted to a certain subset of the coefficients of block 14, such as all but the DC coefficient and/or all but a subset of lowest frequency coefficients in scan order, is applicable to implementations of encoder and decoder corresponding to the second aspect of the present application as well.
Similar statements performed above with respect to the second aspect of the present application or, to be more precise, when starting the description of implementations of the encoder and decoder of
In order to examples of implementations of the encoder and decoder of
This means that all the above described embodiments may be used to code the first and second magnitude bits, i.e. the profile magnitude bits 46 and enveloped magnitude bits 48, with merely restricting the coding to the sub array of matrix 32 relating to bit planes other than bit planes 150 and to coefficients of set 158. The first and second magnitude bits 46 and 48 merely lie within this sub array. The coding of the value of any coefficients within set 152 may take place before in coding order, the same applying to decoding.
Taking into accounts bits 154 in coding the value of a coefficient in set 154 mans the following. For example, on a per coefficient basis, VLC coding may be used for coding the value and decoding the value. A predetermined spectral coefficient of the set 152 is, for example, coded by the encoder by mapping a value of the predetermined spectral coefficient, as represented by at least a subset of the magnitude bits of this predetermined spectral coefficient, which includes at least one magnitudes bit lying in bit planes 150, onto a variable length code 160 and writing the variable length code 160 into the data stream 30. For example, the value of this coefficient in set 152, is determined by the sequence 162 of magnitude bits from some most significant bit plane 164 among coefficients within set 152 within a certain set of transform blocks, to a least significant bit plane 166. In other words, the just-mentioned most significant bit plane 164 of bit sequence 162 may be signaled in the data stream 30 for a set of transform blocks which, for instance, covers all transform blocks of an image, is a the transform block group relating to the signalization of the least significant non-fractional bit plane, or is the transform block group, for which plane 156 is signaled in the data stream 30, i.e. the group of transform blocks relating to group 28. The latter groups may all be the same. Alternatively, the most significant bit plane 164 may be the most significant bit plane of all bit planes irrespective of the values of coefficients, i.e. bit plane n−1 in nomenclature of
That is, in accordance with the third aspect of the present application, the encoder signals in the data stream a second predetermined bit claim, such as the most significant non “beyond profile” bit plane indicated 156 in
An alternative is illustrated in
Instead of coding the VLC code directly, prediction, such as spatial prediction from a transform block corresponding to a neighboring sample block, and/or temporal prediction from a transform block corresponding to a spatially collocated sample block of a previously coded image of a video to which both images belong, may be used with restricting VLC coding to the prediction residual, i.e. the prediction of the count 167 with treating the plane written magnitude bits as they are (
A possible implementation involving the aspect of possibly treating with more than one color component, is described above in section 1.8.2.
Finally, the fourth aspect of the present application relates to an implementation of the encoder and decoder of
With respect to the third and fourth aspects, it is noted that it may very well be that the first and second magnitude bits may be transmitted within a data stream in a bit plane manner rather than in a coefficient by coefficient manner. A similar statement would also be true for first and second aspects, respectively.
Finally,
Even though the above described embodiments comprise varying specific features, the main components of the encoder and decoder of the embodiments may be mutually applicable to within all embodiments.
It is to be understood that in this specification, the signals on lines are sometimes named by the reference numerals for the lines or are sometimes indicated by the reference numerals themselves, which have been attributed to the lines. Therefore, the notation is such that a line having a certain signal is indicating the signal itself. A line can be a physical line in a hardwired implementation. In a computerized implementation, however, a physical line does not exist, but the signal represented by the line is transmitted from one calculation module to the other calculation module.
Although the present invention has been described in the context of block diagrams where the blocks represent actual or logical hardware components, the present invention can also be implemented by a computer-implemented method. In the latter case, the blocks represent corresponding method steps where these steps stand for the functionalities performed by corresponding logical or physical hardware blocks.
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, some one or more of the most important method steps may be executed by such an apparatus.
The inventive transmitted or encoded signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disc, a DVD, a Blu-Ray, a CD, a ROM, a PROM, and EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive method is, therefore, a data carrier (or a non-transitory storage medium such as a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
A further embodiment of the invention method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example, via the internet.
A further embodiment comprises a processing means, for example, a computer or a programmable logic device, configured to, or adapted to, perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods may be performed by any hardware apparatus.
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which will be apparent to others skilled in the art and which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
16190491.7 | Sep 2016 | EP | regional |
This application is a continuation of copending International Application No. PCT/EP2016/074730, filed Oct. 14, 2016, which is incorporated herein by reference in its entirety, and additionally claims priority from European Application No. 16190491.7, filed Sep. 23, 2016, which is also incorporated herein by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
Parent | PCT/EP2016/074730 | Oct 2016 | US |
Child | 16360634 | US |