The invention relates to a method and to an apparatus for encoding for video encoding, and for decoding for video decoding, the distribution of significant coefficients in a block of coefficients, wherein any non-zero amplitude coefficient is denoted a significant coefficient.
In known image compression processing, e.g. MPEG2 and MPEG4 AVC, following quantisation, a very sparse distribution of significant (i.e. non-zero) amplitude coefficients of the (e.g. DCT-) transformed image signal may be obtained while most quantised coefficients are zeros. Although run-length coding for zeros can be used, the most costly task for a transform-based image compression in terms of resulting overall data rate is to record the locations of such significant coefficients within the coding blocks or macroblocks. Encoding the location of a significant coefficient within a block is more expensive than encoding its magnitude and sign, because of the sparse distribution of significant coefficients.
In existing codecs, for example JPEG2000, as described in D. Taubman, “High Performance Scalable Image Compression with EBCOT”, IEEE Transactions on Image Processing, Vol. 9, No. 7, July 2000, pp. 1158-1170, in the bit plane encoding process, coefficients are repeatedly scanned and encoded in a one-dimensional sample-by-sample pattern. Therefore a large number of zeros are to be encoded in order to record the locations of significant coefficients. Although run-length coding for zeros may be employed in the clean-up pass under some conditions, the chance for reducing redundant coding data information is relatively small.
Addressing this issue, in US2007/0071331A1 a quaternary reaching method is proposed which is essentially a quartation to reach the significant coefficients efficiently. In that quartation, a ‘significant square’ (i.e. containing at least one non-zero amplitude coefficient) with pixel size 2^N*2^N is recursively divided into four smaller squares by evenly dividing the height and width, until single significant coefficients are reached. Then, the significance statuses of all generated squares are encoded. The resulting total quantity of information needs to be recorded and encoded but the number of coding operations for reaching significant coefficients is reduced.
However, the significance distribution of the sparse image signal is multifarious. Although the quartation processing is a good choice for recording the locations of significant coefficients in a sparse matrix, it is not optimum in all cases.
For example, for a 16*16 square as shown in
Encoding the number and the coordinates (40 bits or less): Encode the integer number of significant coefficients by fixed-length coding as a binary number 00000100. Fewer bits are required if Exp-Golomb Code is used. For example, when zero-order Exp-Golomb code is used the integer ‘4’ is encoded as 00101 and only 5 instead of 8 bits are required. Encode the x-y-coordinates as binary numbers:
A problem to be solved by the invention is to provide an improved way of recording or encoding various coefficient significance distributions. This problem is solved by the methods disclosed in claims 1 and 3. Apparatuses that utilise these methods are disclosed in claims 2 and 4, respectively.
The invention is related to entropy encoding/decoding of a group of data of image/video signals. Because a single mode of pattern information encoding cannot be optimum for the different significance distributions, the invention uses several pattern determination or encoding modes for encoding a square in the above sense, and the encoding side selects one of these modes and transfers the corresponding mode information to the decoding side for accurate decoding. Advantageously, the cost of that side information is negligible if the square size is big enough.
In principle, the inventive encoding method is suited for encoding for video encoding the distribution of significant coefficients in a block of coefficients, wherein a non-zero amplitude coefficient is denoted a significant coefficient, said method including the steps:
In principle the inventive encoding apparatus is suited for encoding for video encoding the distribution of significant coefficients in a block of coefficients, wherein a non-zero amplitude coefficient is denoted a significant coefficient, said apparatus including:
In principle, the inventive decoding method is suited for decoding for video decoding the distribution of significant coefficients in a block of coefficients, wherein a non-zero amplitude coefficient is denoted a significant coefficient, said method including the steps:
In principle the inventive decoding apparatus is suited for decoding for video decoding the distribution of significant coefficients in a block of coefficients, wherein a non-zero amplitude coefficient is denoted a significant coefficient, said apparatus including:
Advantageous additional embodiments of the invention are disclosed in the respective dependent claims.
Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:
According to the invention, multiple modes for determining/recording/encoding the locations of significant coefficients are used in order to adapt to various significant-coefficient distribution patterns. At least the following four modes are checked or analysed. It is assumed that the block size is 2^N*2^N, N=4, and that there are m (m>0) significant coefficients in a block.
Mode 1 (sample-by-sample): This is the least sophisticated mode. The significance statuses of the coefficients are scanned following a fixed 1-dimensional sequential order through the lines (or columns, this choice is known to the decoder) of a block. The number of bits to be encoded (denoted NoB) for the block is 2^(2N), i.e. 256 for a 16*16 block.
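The sample-by-sample scan can be illustrated with a minimal sketch. The function name `sample_by_sample_bits` and the small 4*4 test block are assumptions made for illustration only; the sketch simply emits one significance bit per coefficient in line order or in column order.

```python
def sample_by_sample_bits(block, by_rows=True):
    """Return the significance map of a square block as a bit string.

    One bit is emitted per coefficient: '1' for a non-zero amplitude,
    '0' otherwise. The scan order (lines or columns) is assumed to be
    known to the decoder.
    """
    n = len(block)
    if by_rows:
        scan = (block[r][c] for r in range(n) for c in range(n))
    else:
        scan = (block[r][c] for c in range(n) for r in range(n))
    return "".join("1" if value != 0 else "0" for value in scan)


# Hypothetical 4*4 block (N=2) with two significant coefficients.
block = [
    [0, 0, 0, 0],
    [0, 7, 0, 0],
    [0, 0, 0, -2],
    [0, 0, 0, 0],
]
bits = sample_by_sample_bits(block)
print(bits, len(bits))  # the length is always 2^(2N), here 16
```

The cost is independent of the number of significant coefficients, which is why this mode only pays off for dense blocks.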
Mode 2 (point coordinate): The number of the significant coefficients and their coordinates or locations within the block are encoded as fixed-length binary numbers, cf. the 40-bits example given above. The NoB is a linear function of m, i.e. NoB=m*2N+2N, where each coefficient requires N bits for each of its two coordinates. The second term 2N of the sum is the maximum number of bits required for representing the number of significant coefficients in the block. Selecting this mode is optimal when the number of significant coefficients in the block is small.
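The fixed-length point-coordinate coding and the zero-order Exp-Golomb alternative mentioned in the 40-bits example can be sketched as follows. The function names, the 0-based coordinate convention and the four sample coordinates are assumptions for illustration; they are not taken from the figures.

```python
def exp_golomb0(value):
    """Zero-order Exp-Golomb code word for a non-negative integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    binary = bin(value + 1)[2:]          # binary representation of value+1
    return "0" * (len(binary) - 1) + binary  # prefix of len-1 zeros


def point_coordinate_cost(m, n):
    """NoB = m*2N + 2N for m significant coefficients in a 2^N*2^N block."""
    return m * 2 * n + 2 * n


def point_coordinate_bits(coords, n):
    """Fixed-length coding: a 2N-bit count, then N-bit x and N-bit y each.

    Assumes 0-based coordinates in the range 0 .. 2^N - 1.
    """
    count = format(len(coords), "0{}b".format(2 * n))
    body = "".join(
        format(x, "0{}b".format(n)) + format(y, "0{}b".format(n))
        for x, y in coords
    )
    return count + body


# Hypothetical positions of four significant coefficients in a 16*16 block.
coords = [(1, 2), (3, 4), (5, 6), (15, 0)]
bits = point_coordinate_bits(coords, 4)
print(bits[:8], len(bits))   # count field and total length
print(exp_golomb0(4))        # shorter code word for the count
```

With four coefficients the fixed-length form needs 40 bits, and replacing the 8-bit count by its Exp-Golomb code word saves 3 bits, matching the figures quoted in the text.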
Mode 3 (quartation): This mode corresponds to the above-described quartation processing. A significant square (i.e. a square wherein not all coefficients have amplitude zero) is recursively divided into four equal-size squares by evenly dividing its height and its width, until single significant coefficients are reached. Thereafter the resulting significance statuses of all generated squares are encoded. Selecting this mode is optimal when the distribution of the significant coefficients in the block is relatively concentrated.
Mode 4 (sixteen partition): A significant square is recursively divided into sixteen equal-size squares by evenly dividing its height and its width, until single significant coefficients are reached. Thereafter the resulting significance statuses of all generated squares are encoded according to the above-described quartation processing encoding principle. This mode is similar to mode 3 and is selected when the distribution of the significant coefficients is relatively dispersed.
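The recursive partition modes can be compared with a short sketch that counts the significance flags emitted. It assumes one flag per generated sub-square and no flag for the top-level block (which is taken to be significant since m>0); the exact bitstream layout of the invention may differ. The function name `partition_bits` and its `split` parameter (2 for quartation, 4 for sixteen partition) are hypothetical.

```python
def partition_bits(block, split):
    """Count significance flags for recursive split*split partitioning.

    Assumption: each generated sub-square costs one flag, and only
    significant sub-squares are divided further, down to single
    coefficients. The block side must be a power of `split`.
    """
    def significant(r0, c0, size):
        return any(
            block[r][c] != 0
            for r in range(r0, r0 + size)
            for c in range(c0, c0 + size)
        )

    def visit(r0, c0, size):
        if size == 1:
            return 0  # a single coefficient is not divided further
        step = size // split
        bits = 0
        for i in range(split):
            for j in range(split):
                r, c = r0 + i * step, c0 + j * step
                bits += 1  # flag for this sub-square
                if significant(r, c, step):
                    bits += visit(r, c, step)
        return bits

    size = len(block)
    if not significant(0, 0, size):
        return 0  # the text assumes m > 0, so this is only a guard
    return visit(0, 0, size)


# Two hypothetical 16*16 blocks: one isolated coefficient, and all non-zero.
single = [[0] * 16 for _ in range(16)]
single[5][9] = 3
full = [[1] * 16 for _ in range(16)]

print(partition_bits(single, 2), partition_bits(single, 4))
print(partition_bits(full, 2), partition_bits(full, 4))
```

Under these assumptions the isolated coefficient is cheaper with quartation, while the fully significant block is cheaper with sixteen partition, because fewer intermediate levels have to be signalled.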
It is an encoder-side decision which mode is to be used for a coefficient block. In order to achieve the highest compression efficiency, the best one of these modes can be selected by determining at encoding side, for each coefficient block of the video signal to be encoded, the corresponding encoding cost (i.e. the resulting bit rate) per candidate mode, and selecting for the current block the candidate mode that produces the minimum cost, i.e. by carrying out a 2-pass encoding.
Alternatively, for 1-pass encoding, the mode can be selected according to the characteristics of the coefficient significance distribution of the current block, e.g. by comparing the number of significant coefficients with a first threshold value, and by checking whether there are squares within the block that have no significant coefficients or a number of coefficients smaller than a second threshold value. For accurate decoding, two bits (in case of no more than four candidate modes) are sent in the bit stream to indicate the mode to be used for the current block, whereby the cost of that side information is negligible if the block size is big enough.
As a further example,
256 (mode 1, sample-by-sample),
424 (mode 2, point coordinate),
228 (mode 3, quartation),
208 (mode 4, sixteen partition).
Therefore, the encoder should select the sixteen partition mode 4 as the optimum mode for encoding this block. As mentioned above, two more bits are required for signalling the mode to the decoding side.
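The minimum-cost selection for this example can be sketched as below. The per-mode costs are the bit counts listed above; the function name `select_mode` and the mapping of mode numbers 1 to 4 onto the 2-bit flags 00 to 11 are assumptions, since the text only states that two bits are sent.

```python
def select_mode(costs):
    """Pick the candidate mode with the minimum bit cost.

    Returns the mode number, an assumed 2-bit mode flag, and the total
    cost including the two signalling bits.
    """
    mode = min(costs, key=costs.get)
    flag = format(mode - 1, "02b")  # hypothetical flag assignment
    return mode, flag, costs[mode] + len(flag)


# Bit counts per mode for the example block described in the text.
costs = {1: 256, 2: 424, 3: 228, 4: 208}
mode, flag, total = select_mode(costs)
print(mode, flag, total)
```

For this block the sketch selects mode 4 and the location information costs 210 bits including the mode flag.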
The video data input signal IE of the encoder in
In the case of non-intra video data, predicted block or macroblock data PMD are subtracted from the input signal IE in the subtractor SUB and the difference data RES are fed to the entropy encoder ECOD via the transform means/stage/step T and the quantising means/stage/step Q. The output signal of Q is also processed in inverse quantising means/stage/step QE−1, the output signal of which is fed via inverse transform means/stage/step TE−1 to the combiner step/stage ADDE in the form of reconstructed block or macroblock difference data RMDD. The output signal of ADDE is buffer-stored in a frame store in motion estimation and compensation means/stage/step FS_MC_E, which carry out motion compensation for reconstructed block or macroblock data and output predicted block or macroblock data PMD to the subtracting input of SUB and to the other input of the combiner ADDE.
In
For the current block the positions of the significant coefficients are established according to the selected mode:
For the video decoding, the current block is filled with zero amplitude coefficients at the locations where no significant coefficient is present. Such filling is carried out in MEV, EDEC, QD−1 or TD−1. The corresponding coefficient block as output from the quantiser step/stage Q at encoding side is reconstructed before entering the inverse transform means/stage/step TD−1.
In case of non-intra block or macroblock data, the output signal of ADDD is buffer-stored in a frame store in motion compensation means/stage/step FS_MC_D, which effect a motion compensation for reconstructed block or macroblock data. The block or macroblock data PMD predicted in FS_MC_D are passed to the second input of the combiner ADDD.
In
In practice, the inventive coding processing for locations of significant signals can be applied to block sizes that are much bigger than 16*16. For example, in wavelet based coding/decoding, the size of the first level of LH, HL, and HH subbands of a 512*512 image is as big as 256*256. The overhead for mode indication is only 2 bits, which is quite negligible when compared with the total amount of entropy-coded bits of such subband.
Another example of using this invention in DCT based coding/decoding is to treat all the quantised transformed coefficients in one frame (or in a part of a frame, e.g. a slice or a 64*64 block) as a whole, and to encode the locations of significant coefficients in the whole map by the above-described processing. This processing can be used not only for a still image, but also for I frames, P frames and B frames of a video sequence. The corresponding side information is only two or only a few bits (if more modes are included) for one frame.
The inventive processing is not limited to square blocks.
Any signal format (1-dimensional, 2-dimensional, or multi-dimensional signals) can be shaped into the proper format and is thereafter adaptively encoded using the inventive processing.
For non-square blocks, or where the width W or height H is not a power of ‘2’, there are several ways to use the invention. For example, for 720*576 pixels active picture size, when the quartation method is used, the following calculation can be carried out during the division:
W′=floor(W/2), and H′=floor(H/2)
This means that the block is divided into [1, . . . , W′] [W′+1, . . . , W] in its width, and is divided into [1, . . . , H′] [H′+1, . . . , H] in its height. Accordingly, the quartation processing can be continued recursively, wherein the recursion is stopped when W==2 or H==2.
This means that the smallest unit is 2*N, or N*2. For the smallest unit, a sample-by-sample processing is used.
A similar procedure can be applied in the sixteen-partition mode.
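The floor-based quartation for arbitrary block sizes can be sketched as follows. The sketch only shows the geometry of the division and ignores significance; it uses 0-based coordinates instead of the 1-based ranges in the text, and it stops when a dimension is 2 or smaller (rather than exactly 2) so that odd sizes such as 3 also terminate. The names `split_rect` and `leaf_units` are hypothetical.

```python
def split_rect(x, y, w, h):
    """Divide a w*h rectangle at W'=floor(W/2) and H'=floor(H/2)."""
    w1, h1 = w // 2, h // 2
    return [
        (x, y, w1, h1),
        (x + w1, y, w - w1, h1),
        (x, y + h1, w1, h - h1),
        (x + w1, y + h1, w - w1, h - h1),
    ]


def leaf_units(x, y, w, h):
    """Recursively divide until the width or height is at most 2.

    The returned smallest units would be coded sample-by-sample.
    """
    if w <= 2 or h <= 2:
        return [(x, y, w, h)]
    leaves = []
    for rect in split_rect(x, y, w, h):
        leaves.extend(leaf_units(*rect))
    return leaves


print(split_rect(0, 0, 720, 576))
leaves = leaf_units(0, 0, 45, 36)
print(len(leaves), sum(w * h for _, _, w, h in leaves))
```

The smallest units tile the original rectangle without gaps or overlap, so every coefficient position is covered exactly once.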
In another embodiment for 720*576 pixels picture size, regular quartation or sixteen-partition processing is used for part of the image, with a basic unit of for instance 16*16 or 256*256, and the whole image is separated into several such basic units. For the remaining part of the image, i.e. the pixels or macroblocks located at the edge of the image, the sample-by-sample mode can be used.
Furthermore, a combination of the quartation method and the sixteen-partition method can also be used. For example, the first several levels of recursion use the quartation method, while the final level uses the sixteen-partition method.
The invention is not limited to the four modes described above. Nine-partition or N^2-partition (N being an integer greater than ‘4’) according to the principle of the above-described 16-partition, as well as zigzag scan with run-length coding, can be used as optional modes.
The described adaptive (entropy) encoding of locations of significant values of sparse signals can also be used for audio coding or mesh data coding, and applied to signals following prediction, DCT/wavelet transform, and/or quantisation.
Number | Date | Country | Kind |
---|---|---|---|
08305483.3 | Aug 2008 | EP | regional |

Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2009/060284 | 8/7/2009 | WO | 00 | 2/10/2011 |