The invention refers to a method for coding digital image data as well as to a corresponding decoding method. Furthermore, the invention refers to an apparatus for coding digital image data and an apparatus for decoding digital image data.
Increasing resolution and quality requirements for visual content, such as images, videos, or multi-dimensional medical data, raise the demand for highly efficient coding methods. In predictive coding techniques, the pixel values of the pixels in the image data are predicted. The difference between the predicted pixel values and the original pixel values (i.e. the prediction error) is compressed and forms part of the coded image data.
In documents [1] to [3], different variants of so-called piecewise autoregressive pixel-prediction methods are described. In those methods, a pixel value of a current pixel to be predicted is calculated based on a weighted sum of reconstructed, previously processed pixels in a neighborhood region adjacent to the current pixel. In order to determine the weights, a system of linear equations based on the weighted sums for known pixel values in a training region adjacent to the current pixel is solved.
For a precise prediction, piecewise autoregressive pixel-prediction methods require a large causal neighborhood region of known reconstructed pixels around the current pixel. Such a large neighborhood region is usually not available at all image positions, e.g. at image borders. The problem becomes worse if image regions must be coded independently of each other, as is the case for block-wise processing in parallel coding implementations.
The border problem described above for piecewise autoregressive pixel-prediction methods is often not addressed in prior art publications, or it is suggested to skip border regions when using autoregressive pixel-prediction methods. A direct way to address this problem without an algorithmic change is image padding at border regions, e.g. with the values of already transmitted border pixels (constant border extension). In document [4], a reduction of the training region size at border positions is suggested. However, this leads to over-fitting and often causes badly conditioned systems of linear equations. Another option for handling border regions is a special border pixel treatment using different prediction schemes with relaxed context requirements, such as median prediction (see document [5]). Such special treatment requires additional implementation effort, leads to inhomogeneous predictions, and can often considerably degrade prediction accuracy.
It is an object of the invention to provide a method for coding digital image data including a piecewise autoregressive pixel-prediction method overcoming the above disadvantages and enabling an efficient compression with low complexity. Furthermore, it is an object of the invention to provide a corresponding decoding method as well as an apparatus for coding and an apparatus for decoding.
This object is solved by the independent patent claims. Preferred embodiments of the invention are defined in the dependent claims.
According to the method of the invention, digital image data comprising one or more arrays of pixels (i.e. 2-D, 3-D, or even N-dimensional data) with corresponding pixel values is coded, where the pixel values of the pixels to be coded in each array are predicted by a prediction in which the predicted value of a current pixel is determined based on a weighted sum of reconstructed pixel values of reconstructed, previously processed pixels in a specific neighborhood region adjacent to the current pixel. The reconstructed pixel values are pixel values that have previously been compressed and decompressed in the method, or even the original pixel values in case a lossless coding method is used. The weights of the weighted sum of reconstructed pixel values are determined based on linear and/or non-linear equations for reconstructed pixels in a specific training region adjacent to the current pixel, where the training region has at least the size of the neighborhood region and preferably (but not necessarily) includes the pixels of the neighborhood region. The method described so far is a piecewise autoregressive prediction method. The prediction error between the predicted pixel values and the original pixel values is processed in the coding method for generating the coded image data, as is known from the prior art. The term “linear and/or non-linear equations” refers to equations which are linear and/or non-linear with respect to the weights as variables.
The method of the invention is based on a new technique for determining the specific neighborhood region and the specific training region used in the prediction method. Those regions are determined as described in the following.
In a step a), those pixels in a preset neighborhood region adjacent to the current pixel are determined for which reconstructed pixel values in the array exist, resulting in a modified neighborhood region defined by the determined pixels. Furthermore, in a step b), those pixels in a preset training region adjacent to the current pixel are determined for which reconstructed pixel values in the array exist and for which the modified neighborhood region adjacent to the respective pixel exclusively includes pixels for which reconstructed pixel values in the array exist, resulting in a modified training region defined by the determined pixels.
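Steps a) and b) can be sketched as follows. This is a minimal illustration under several assumptions that are not part of the description: 2-D arrays coded in line-scan order, regions represented as lists of (dy, dx) offsets relative to the current pixel, and hypothetical rectangular causal templates (`causal_window`) as preset regions. All function names are illustrative only.

```python
def causal_window(radius):
    """Hypothetical preset template: all offsets (dy, dx) within `radius` that
    precede the current pixel in line-scan order."""
    return [(dy, dx)
            for dy in range(-radius, 1)
            for dx in range(-radius, radius + 1)
            if dy < 0 or dx < 0]


def is_reconstructed(y, x, cur, shape):
    """True if (y, x) lies inside the array and precedes the current pixel `cur`
    in line-scan order, i.e. a reconstructed value exists for it."""
    h, w = shape
    return 0 <= y < h and 0 <= x < w and (y, x) < cur


def modified_regions(cur, shape, preset_nb, preset_tr):
    """Return (MN, MT): the modified neighborhood and training regions."""
    cy, cx = cur
    # Step a): keep neighborhood offsets that point at reconstructed pixels.
    mn = [(dy, dx) for dy, dx in preset_nb
          if is_reconstructed(cy + dy, cx + dx, cur, shape)]
    # Step b): keep training pixels that are reconstructed themselves and whose
    # own MN-shaped neighborhood consists only of reconstructed pixels.
    mt = [(ty, tx) for ty, tx in preset_tr
          if is_reconstructed(cy + ty, cx + tx, cur, shape)
          and all(is_reconstructed(cy + ty + dy, cx + tx + dx, cur, shape)
                  for dy, dx in mn)]
    return mn, mt
```

With a 12-pixel neighborhood template (radius 2) and an 84-pixel training template (radius 6), an interior pixel of a 64×64 block keeps both regions complete, whereas the pixel at (0, 5) in the first row keeps only its two left neighbors and three training pixels.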
In a step c), a validation value is determined which depends on (e.g. equals) a parameter that increases with an increasing number of pixels in the modified training region (for a fixed number of pixels in the modified neighborhood region) and increases with a decreasing number of pixels in the modified neighborhood region (for a fixed number of pixels in the modified training region). Preferably, the validation value is a monotonically increasing or monotonically decreasing function of the parameter. In a particularly preferred embodiment, the parameter, and preferably also the validation value, is the ratio between the number of pixels in the modified training region and the number of pixels in the modified neighborhood region.
In a step d) of the method according to the invention, an iteration is performed if the validation value corresponds to a parameter which is less than (or, in a variant, less than or equal to) a predetermined threshold value. If this condition is not fulfilled, the specific neighborhood region is the modified neighborhood region and the specific training region is the modified training region.
If the iteration is performed, the method of the invention proceeds with a step e) in which at least one additional pixel is removed from the modified neighborhood region in each iteration step, resulting in an updated modified neighborhood region in each iteration step. The iteration terminates when the validation value determined in the corresponding iteration step based on the number of pixels in the updated modified neighborhood region (e.g. based on the ratio of the number of pixels in the (updated) modified training region to the number of pixels in the updated modified neighborhood region) corresponds to a parameter which reaches or exceeds the predetermined threshold value. On termination of the iteration, the specific neighborhood region is the updated modified neighborhood region, and the specific training region is a region exclusively comprising pixels of the preset training region for which reconstructed pixel values exist and for which the updated modified neighborhood region at the termination of the iteration exclusively includes pixels for which reconstructed pixel values exist.
The method of the invention is based on the finding that a low ratio between the number of pixels in the training region and the number of pixels in the neighborhood region leads to an inaccurate prediction, because the weights determined from the system of linear and/or non-linear equations are then inaccurate. Hence, in such a case, the ratio is increased by pruning pixels from the neighborhood region until a predetermined threshold value is reached. In a preferred embodiment, the predetermined threshold value is at least 1.5. Preferably, the threshold value is about 5, which leads to very good predictions.
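Steps c) and d) reduce to a ratio test. The sketch below assumes the preferred ratio m/n (training pixels per neighborhood pixel, i.e. equations per unknown weight) as validation value and the strict "less than" variant of step d); the function names are hypothetical.

```python
import math


def validation_value(m, n):
    """Step c): ratio of training pixels (equations) to neighborhood pixels (weights)."""
    if n <= 0:
        raise ValueError("the neighborhood region must contain at least one pixel")
    return m / n


def needs_pruning(m, n, threshold=5.0):
    """Step d): keep iterating while the ratio stays below the threshold."""
    return validation_value(m, n) < threshold


def min_training_size(n, threshold=5.0):
    """Smallest number of training pixels that avoids pruning for n weights."""
    return math.ceil(threshold * n)
```

With the preferred threshold of about 5, a full 12-weight neighborhood needs at least 60 training pixels: 40 training pixels (ratio of about 3.3) trigger pruning, whereas 84 training pixels (ratio 7) do not.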
In one variant of the invention, the above described specific training region determined in step e) is the modified training region. However, this region may also be an updated modified training region exclusively comprising all pixels of the preset training region for which reconstructed pixel values exist and for which the updated modified neighborhood region at the termination of the iteration exclusively includes pixels for which reconstructed pixel values exist.
In the latter case, a more precise determination of the weights is achieved because the updated modified training region may comprise more pixels than the modified training region due to the reduced number of pixels in the neighborhood region.
In a preferred embodiment of the invention, the validation value in step e) is determined based on both the number of pixels in the updated modified neighborhood region and the number of pixels in the above defined updated modified training region. E.g., the validation value is determined based on the ratio between the number of pixels in the updated modified training region and the number of pixels in the updated modified neighborhood region. This results in a very good measure for the accuracy of the prediction.
In another embodiment of the invention, the at least one pixel being removed in each iteration step is the pixel which results in an updated modified neighborhood region leading to an updated modified training region with the most pixels. This embodiment provides a good adaptation of the training region leading to a large validation value and, thus, a good accuracy of the prediction.
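The removal criterion of this embodiment can be sketched as a greedy search over the remaining neighborhood pixels. The sketch assumes line-scan order and (dy, dx) offset lists as in the sketches for steps a) and b); the helper names are hypothetical, and ties are resolved here simply by taking the first candidate rather than by the centroid rule described below.

```python
def causal_window(radius):
    """Hypothetical preset template: causal offsets within `radius` (line scan)."""
    return [(dy, dx)
            for dy in range(-radius, 1)
            for dx in range(-radius, radius + 1)
            if dy < 0 or dx < 0]


def is_reconstructed(y, x, cur, shape):
    h, w = shape
    return 0 <= y < h and 0 <= x < w and (y, x) < cur


def training_region(cur, shape, nb, preset_tr):
    """Training pixels that are reconstructed and have a fully reconstructed
    neighborhood of shape `nb`."""
    cy, cx = cur
    return [(ty, tx) for ty, tx in preset_tr
            if is_reconstructed(cy + ty, cx + tx, cur, shape)
            and all(is_reconstructed(cy + ty + dy, cx + tx + dx, cur, shape)
                    for dy, dx in nb)]


def remove_pixel_max_training(cur, shape, nb, preset_tr):
    """Remove the neighborhood pixel whose removal yields the largest training region."""
    def training_size_without(p):
        return len(training_region(cur, shape, [q for q in nb if q != p], preset_tr))

    victim = max(nb, key=training_size_without)  # first candidate wins on ties
    return [q for q in nb if q != victim]
```

For the first-row pixel at (0, 5) with neighborhood [(0, -2), (0, -1)], dropping (0, -2) leaves four usable training pixels, while dropping (0, -1) leaves only three, so (0, -2) is removed.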
In another variant of the invention, a Euclidean distance is defined in each array of pixels, and the at least one pixel removed in each iteration step is the pixel with the largest Euclidean distance to the current pixel. Analogously to the embodiment described before, this leads to a good prediction accuracy.
If several pixels lead to an updated modified training region with the most pixels and/or if several pixels have the largest Euclidean distance, the pixel among them is removed whose removal results in an updated modified neighborhood region having a centroid with the smallest distance to the current pixel. As a result, pixels are removed from those parts of the neighborhood region where many other pixels are present, leading to good prediction results.
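The distance criterion together with the centroid tie-break can be sketched as below, assuming offsets (dy, dx) relative to the current pixel at the origin; the function name is hypothetical. Integer squared distances are compared so that ties are detected exactly.

```python
import math


def remove_farthest(nb):
    """Drop the neighborhood offset farthest from the current pixel at (0, 0); among
    equally distant offsets, drop the one whose removal leaves the remaining region's
    centroid closest to the current pixel."""
    def dist2(p):
        return p[0] ** 2 + p[1] ** 2  # integer squared distance, so ties are exact

    dmax = max(dist2(p) for p in nb)
    candidates = [p for p in nb if dist2(p) == dmax]

    def centroid_distance_without(p):
        rest = [q for q in nb if q != p]
        if not rest:
            return 0.0
        cy = sum(q[0] for q in rest) / len(rest)
        cx = sum(q[1] for q in rest) / len(rest)
        return math.hypot(cy, cx)

    victim = min(candidates, key=centroid_distance_without)
    return [q for q in nb if q != victim]
```

For a full 12-pixel causal template of radius 2, the corners (-2, -2) and (-2, 2) tie at the largest distance. Removing (-2, -2) keeps the centroid closer to the current pixel, because the template already contains two extra pixels to the left in the current row.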
In another variant of the invention, another type of prediction than the above-described piecewise autoregressive pixel-prediction is used for specific current pixels based on one or more criteria, particularly in case the iteration cannot find an updated modified neighborhood region leading to a validation value which corresponds to a parameter which exceeds or reaches the predetermined threshold value. Examples of another type of prediction are a prediction based on the mean of the available pixels in the neighborhood region or a direct copy of the pixel nearest to the current pixel.
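The two fallback predictors named above can be sketched as follows. The function name, the dictionary representation of the available neighbors and the default value for a pixel without any reconstructed neighbor are assumptions for illustration.

```python
def fallback_predict(available, mode="mean", default=128):
    """Simple prediction when no valid autoregressive neighborhood exists.

    available: dict mapping neighbor offsets (dy, dx) to reconstructed values.
    mode: "mean" (rounded mean of the available neighbors) or "nearest"
    (copy of the neighbor closest to the current pixel)."""
    if not available:
        return default  # hypothetical: mid-grey for 8-bit data, e.g. the very first pixel
    if mode == "mean":
        return round(sum(available.values()) / len(available))
    if mode == "nearest":
        nearest = min(available, key=lambda o: o[0] ** 2 + o[1] ** 2)
        return available[nearest]
    raise ValueError(f"unknown mode: {mode}")
```

Encoder and decoder must select the same fallback deterministically, so the mode would be fixed or signaled rather than chosen from data the decoder does not have.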
In another variant of the invention, the pixels of the digital image data are coded in a coding order based on a line-scan and/or a Z-order scan and/or a Hilbert scan.
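A Z-order (Morton) coding order can be generated by interleaving the bits of the row and column indices. The sketch below assumes non-negative integer coordinates and uses hypothetical function names. It also illustrates that with a scan order other than the line scan, the test for "reconstructed" has to compare positions in the coding order (here via a `scan_rank` table) instead of comparing rows and columns directly. A Hilbert scan would only change the sort key.

```python
def morton_code(y, x, bits=16):
    """Interleave the bits of x (even bit positions) and y (odd bit positions)."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (2 * b)
        code |= ((y >> b) & 1) << (2 * b + 1)
    return code


def z_order_scan(height, width):
    """Coding order of all pixel positions of a height x width array in Z-order."""
    positions = [(y, x) for y in range(height) for x in range(width)]
    return sorted(positions, key=lambda p: morton_code(*p))


def scan_rank(order):
    """Map each position to its index in the coding order; a pixel is
    reconstructed for the current pixel iff its rank is smaller."""
    return {p: i for i, p in enumerate(order)}
```

In a 4×4 array, Z-order visits (1, 0) before (0, 2), so unlike in a line scan the pixel below-left is already available when (0, 2) is predicted.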
Preferably, the prediction error determined in the coding method is subjected to an entropy coding. Optionally, the prediction error may be subjected to a lossy compression method before applying entropy coding. If the prediction error is only entropy encoded, this leads to a lossless coding scheme. Preferably, the entropy coding for each array of pixels is an adaptive arithmetic coding or an adaptive variable length coding which preferably starts with an initial probability distribution having one or more distribution parameters and preferably a variance. E.g., the probability distribution is a Laplacian or Gaussian probability distribution. The distribution parameters are included as a side information in the coded image data and, thus, enable the proper decoding of the coded image data.
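Initializing the probability model from a transmitted variance can be sketched as a discretized Laplacian over the integer prediction errors. The function names, the error range and the choice of the Laplacian (rather than the Gaussian) are assumptions for illustration; for small variances the discretized distribution only approximately has the given variance.

```python
import math


def laplacian_pmf(variance, max_abs_error):
    """Discretized Laplacian over integer prediction errors -max_abs_error..max_abs_error,
    parameterized by the variance estimate sent as side information."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    b = math.sqrt(variance / 2.0)  # Laplace scale parameter: variance = 2 * b**2
    support = range(-max_abs_error, max_abs_error + 1)
    weights = [math.exp(-abs(e) / b) for e in support]
    total = sum(weights)
    return {e: w / total for e, w in zip(support, weights)}


def ideal_bits(pmf, e):
    """Code length in bits that an arithmetic coder approaches for error e."""
    return -math.log2(pmf[e])
```

Because encoder and decoder compute the same table from the same transmitted variance, both start the adaptive coder from an identical state.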
In another preferred embodiment of the invention, several and preferably all arrays of pixels are coded simultaneously enabling a parallel processing of several arrays, resulting in a fast coding of the image data.
In a preferred variant of the invention, the above described steps a) to e) are performed before performing the prediction of pixel values, where the specific neighborhood regions and specific training regions for the pixels are prestored in a storage which is accessed during prediction of the pixel values. Hence, the determination of the neighborhood and training regions may be performed before the actual coding of the image data leading to a fast prediction and, thus, a fast coding of the image data.
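Precomputing the regions can be sketched as a lookup table keyed by the position inside a block. This assumes that all blocks have the same shape and are coded in line-scan order, so that the regions depend only on the block-relative position. For brevity, the hypothetical `available_neighbors` stands in for the full steps a) to e); any function with the same signature could be passed as `region_fn`.

```python
def causal_window(radius):
    """Hypothetical preset template: causal offsets within `radius` (line scan)."""
    return [(dy, dx)
            for dy in range(-radius, 1)
            for dx in range(-radius, radius + 1)
            if dy < 0 or dx < 0]


def available_neighbors(pos, block_shape, preset_nb):
    """Stand-in region function (step a) only): template offsets that fall on
    already reconstructed pixels of the same block."""
    (y, x), (h, w) = pos, block_shape
    return [(dy, dx) for dy, dx in preset_nb
            if 0 <= y + dy < h and 0 <= x + dx < w and (y + dy, x + dx) < (y, x)]


def build_region_lut(block_shape, preset_nb, region_fn=available_neighbors):
    """Evaluate the region function once for every position inside a block."""
    h, w = block_shape
    return {(y, x): region_fn((y, x), block_shape, preset_nb)
            for y in range(h) for x in range(w)}


def regions_at(lut, y, x, block_shape):
    """Look up the regions for image position (y, x); all blocks share one table
    (assumes the image size is a multiple of the block size)."""
    h, w = block_shape
    return lut[(y % h, x % w)]
```

During coding, the prediction then only performs a table lookup per pixel instead of repeating the region determination.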
In a preferred variant of the invention, the coding method is a video coding method which codes a sequence of digital images, where each array of pixels refers to a block in the sequence. The term block is to be interpreted broadly and may also comprise the whole image. Particularly, the video coding may be based on the standard H.264/AVC or the (draft) standard HEVC (HEVC=High Efficiency Video Coding).
The method of the invention may also be used for coding image data comprising one or more images having three or more dimensions, particularly medical volume images, e.g. determined by a computed tomography system. In such a case, the above mentioned blocks refer to N-dimensional cuboids which are cubes in case of three dimensions.
Besides the above coding method, the invention also refers to a method for decoding digital image data which is coded by the above described coding method. In such a decoding method, the prediction errors are reconstructed from the coded image data, the image data comprising one or more arrays of pixels with corresponding pixel values, where the pixel values of the pixels to be decoded in each array are predicted by a prediction in which the predicted value of a current pixel is determined based on a weighted sum of reconstructed pixel values of reconstructed, previously decoded pixels in a specific neighborhood region adjacent to the current pixel, where the weights of the weighted sum are determined based on linear and/or non-linear equations for reconstructed pixels in a specific training region adjacent to the current pixel, and where the predicted pixel value is corrected by the reconstructed prediction error, resulting in a decoded pixel value for the current pixel. The decoding method is characterized in that the specific neighborhood region and the specific training region for the current pixel are determined in the same way as during coding, i.e. based on the above described steps a) to e).
In a preferred embodiment of the decoding method, several arrays of pixels which have been coded simultaneously are decoded. In such a case, the above defined one or more distribution parameters of the probability distribution may be used for predicting the start position of a corresponding array of pixels within the coded image data.
The invention also refers to a method for coding and decoding digital image data, where the digital image data is coded by the above described coding method and the coded digital image data is decoded by the above described decoding method.
The invention also refers to a coding apparatus for digital image data comprising one or more arrays of pixels with corresponding pixel values, where the apparatus includes a prediction means for predicting the pixel values of the pixels to be coded in each array by a prediction in which the predicted value of a current pixel is determined based on a weighted sum of reconstructed pixel values of reconstructed, previously processed pixels in a specific neighborhood region adjacent to the current pixel, where the weights of the weighted sum are determined based on linear and/or non-linear equations for reconstructed pixels in a specific training region adjacent to the current pixel, where the apparatus includes a processing means in which a prediction error between predicted pixel values and the original pixel values is processed for generating the coded image data.
The apparatus of the invention is characterized by a means for determining the specific neighborhood region and the specific training region for the current pixel, said means including for each of the above described steps a) to e) a means to perform the corresponding one of steps a) to e).
The coding apparatus of the invention preferably includes one or more additional means for performing one or more preferred embodiments of the coding method according to the invention.
The invention also refers to an apparatus for decoding digital image data which is coded by the above described coding method. The apparatus includes a reconstruction means for reconstructing the prediction errors from the coded image data, the image data comprising one or more arrays of pixels with corresponding pixel values, where the apparatus comprises a prediction means for predicting the pixel values of the pixels to be decoded in each array by a prediction in which the predicted value of a current pixel is determined based on a weighted sum of reconstructed pixel values of reconstructed, previously decoded pixels in a specific neighborhood region adjacent to the current pixel, where the weights of the weighted sum are determined based on linear and/or non-linear equations for reconstructed pixels in a specific training region adjacent to the current pixel. The apparatus comprises a correction means for correcting the predicted pixel value by the reconstructed prediction error, resulting in a decoded pixel value for the current pixel.
The decoding apparatus further includes a means for determining the specific neighborhood region and the specific training region of the current pixel, said means including for each of the above described steps a) to e) a means to perform the corresponding one of steps a) to e).
The invention also refers to a codec for coding and decoding digital image data, comprising a coding apparatus according to the invention and a decoding apparatus according to the invention.
In the following, embodiments of the invention will be described with respect to the accompanying drawings wherein:
In the following, an embodiment of the method according to the invention will be described with respect to the coding of images within a video comprising a time sequence of images.
The method is based on intra coding where the values of pixels are predicted by pixels in the same image. The method uses a piecewise autoregressive pixel-prediction known from the prior art. However, the determination of corresponding neighborhood regions and training regions as described later on is not known from the prior art.
Based on
$$i_0 = w_1 i_1 + w_2 i_2 + w_3 i_3 + w_4 i_4 + w_5 i_5 + w_6 i_6 + w_7 i_7 + w_8 i_8 + w_9 i_9 + w_{10} i_{10} + w_{11} i_{11} + w_{12} i_{12} = \sum_{k=1}^{12} w_k\, i_k$$
The weights w1, w2, ..., w12 are adaptive, i.e. they are adapted for each pixel to be predicted depending on a larger neighborhood of pixel values, which is called the training region or training window. Such a specific training region ST is shown in
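The weight determination can be illustrated with a least-squares fit. Least squares is a common way to solve the overdetermined system, but it is an assumption here, since the description only speaks of solving linear equations. Each training pixel contributes one equation (its known value is approximated by the weighted sum of its own neighbors at the same offsets), and the normal equations are solved by plain Gaussian elimination to keep the sketch dependency-free; a production coder would more likely use a QR or Cholesky solver, possibly with regularization. The helper names are hypothetical.

```python
def solve_linear(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular or ill-conditioned system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ar_predict(img, cur, nb, tr):
    """Piecewise autoregressive prediction of img[cur] with least-squares weights.

    nb: neighborhood offsets (dy, dx); tr: training offsets (ty, tx), both relative
    to cur. Returns (prediction, weights)."""
    cy, cx = cur
    # One equation per training pixel: its value ~ weighted sum of its own neighbors.
    rows = [[img[cy + ty + dy][cx + tx + dx] for dy, dx in nb] for ty, tx in tr]
    targets = [img[cy + ty][cx + tx] for ty, tx in tr]
    n = len(nb)
    # Normal equations (A^T A) w = A^T t of the overdetermined system A w = t.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    w = solve_linear(ata, atb)
    pred = sum(wk * img[cy + dy][cx + dx] for wk, (dy, dx) in zip(w, nb))
    return pred, w
```

On a synthetic 8×8 image generated exactly by the recursion i(y, x) = 0.5 i(y, x-1) + 0.3 i(y-1, x) + 0.2 i(y-1, x-1) from irregular first-row and first-column values, a three-pixel neighborhood with 48 training pixels recovers these weights and predicts the last pixel exactly up to floating-point rounding.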
The prediction method described above works well for pixels which are not at the border of the corresponding image or image block. For pixels at borders, the neighborhood region and the training region may extend beyond the border and, thus, may not have a reconstructed pixel value for each pixel in the corresponding region. In the prior art, such cases are handled by a special treatment based on border extension or on a reduction of the training region size, which requires additional implementation effort or leads to inhomogeneous predictions. In contrast, the invention described in the following provides a simple and straightforward method for defining neighborhood regions and training regions that leads to a good prediction of the corresponding pixels. The method for defining the neighborhood regions and training regions is explained in the following with respect to the flowchart of
The method starts with a preset neighborhood region PN and a preset training region PT based on corresponding templates (starting point S1). Those regions may be the same as the neighborhood region and training region shown in
As explained above, step S2 results in a modified neighborhood region MN and a modified training region MT. Hereinafter, the number of pixels or positions in the modified neighborhood region is designated as n and the number of pixels or positions in the modified training region is designated as m. It may happen that the number of training pixels, and thus the number m of linear equations for a fixed number n of unknown weights, becomes too small, which increases the probability of ill-conditioned systems of equations. This often leads to a set of inaccurate weights causing imprecise predictions, i.e. high prediction errors, for example due to image noise. In order to overcome these disadvantages, further positions are pruned from the modified neighborhood region MN and the modified training region MT pixel by pixel and in order of decreasing Euclidean distance, as explained in the following.
In step S3 of the method of
In the pruning process, an iteration IT takes place in step S6. In each iteration step of this iteration, the pixel in the modified neighborhood region MN having the largest Euclidean distance to the current pixel p0 is discarded. This results in an updated modified neighborhood region UMN in each iteration step. For such an updated modified neighborhood region, an updated modified training region UMT is calculated. This updated modified training region exclusively comprises all pixels of the preset training region PT for which reconstructed pixel values exist and for which the updated modified neighborhood region adjacent to the corresponding pixel exclusively includes pixels for which reconstructed pixel values exist.
After each iteration step, the validation value VA described above is calculated based on the number of pixels of the updated modified neighborhood region and of the updated modified training region, i.e. the ratio between the number of pixels of the training region UMT and the number of pixels of the neighborhood region UMN is determined. The iteration is continued if the value VA is smaller than the threshold value TV, i.e. in the next iteration step, another pixel of the new updated modified neighborhood region having the largest Euclidean distance to the pixel to be predicted is discarded.
In case that the validation value VA reaches or exceeds the threshold value TV (step S7), the iteration terminates because it can be assumed that a precise prediction can be achieved in such a case. For reasonably stable predictions, the threshold value TV shall be chosen to be at least 1.5. According to experiments of the inventors, a threshold value of about 5 is a good choice for a precise prediction. After the termination of the iteration IT, the specific neighborhood region SN and the specific training region ST which are used in the autoregressive prediction are set to the updated modified neighborhood region UMN and the updated modified training region UMT, respectively (step S8).
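The loop of steps S2 to S8 can be sketched as follows, assuming line-scan order, (dy, dx) offset lists, hypothetical rectangular templates and the threshold TV = 5. As a simplification, equally distant pixels are removed in reverse template order (the stable sort keeps template order and `pop()` takes the last entry) instead of by the centroid rule described below. When every neighbor has been removed, the sketch returns `None`, leaving the choice of another prediction type to the caller.

```python
def causal_window(radius):
    """Hypothetical preset template: causal offsets within `radius` (line scan)."""
    return [(dy, dx)
            for dy in range(-radius, 1)
            for dx in range(-radius, radius + 1)
            if dy < 0 or dx < 0]


def is_reconstructed(y, x, cur, shape):
    h, w = shape
    return 0 <= y < h and 0 <= x < w and (y, x) < cur


def training_region(cur, shape, nb, preset_tr):
    """(Updated) modified training region for neighborhood shape `nb`."""
    cy, cx = cur
    return [(ty, tx) for ty, tx in preset_tr
            if is_reconstructed(cy + ty, cx + tx, cur, shape)
            and all(is_reconstructed(cy + ty + dy, cx + tx + dx, cur, shape)
                    for dy, dx in nb)]


def prune_regions(cur, shape, preset_nb, preset_tr, threshold=5.0):
    """Return (SN, ST) for the current pixel, or None if all neighbors were removed."""
    cy, cx = cur
    # Step S2 / step a): modified neighborhood region MN.
    nb = [o for o in preset_nb if is_reconstructed(cy + o[0], cx + o[1], cur, shape)]
    # Sort once by squared distance; nb.pop() then always removes a farthest pixel.
    nb.sort(key=lambda o: o[0] ** 2 + o[1] ** 2)
    while nb:
        tr = training_region(cur, shape, nb, preset_tr)  # MT, later UMT
        if len(tr) / len(nb) >= threshold:                # validation VA >= TV (S7)
            return nb, tr                                 # SN and ST (S8)
        nb.pop()                                          # iteration IT (S6)
    return None
```

An interior pixel keeps all 12 neighbors and 84 training pixels. The first-row pixel at (0, 10) starts with two neighbors and six training pixels (ratio 3), loses (0, -2), and terminates with one neighbor and six training pixels (ratio 6).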
In cases where the last neighborhood pixel has been removed in the iteration IT of
It may further happen that, in an iteration step of the iteration IT, more than one pixel with the largest Euclidean distance is found. In the embodiment described herein, the pixel to be removed is selected among the pixels with the same largest Euclidean distance as follows: the pixel is removed whose removal leads to a pruned neighborhood region having a centroid with the smallest distance to the current pixel to be predicted.
In the foregoing, the method of the invention has been described based on a line-scan coding order. However, the invention may also be applied to a coding method with another scan order. Examples of alternative scan orders are shown in
As described above, an appropriate pruned neighborhood region and pruned training region is determined for each pixel to be predicted. In a preferred embodiment, the process of determining those pruned regions is not performed during prediction but beforehand. The corresponding regions are in this case stored in a storage, which is accessed during coding. Hence, the regions only have to be determined once and may then be used during the whole prediction process of the coding method.
The prediction described above results in predicted pixel values. For these values, a prediction error is determined based on the difference between the predicted and the original pixel value. This prediction error is optionally subjected to further lossy compression. Thereafter, the prediction error is entropy coded. In a particularly preferred embodiment, an efficient arithmetic coder is employed for entropy coding, where the previously coded prediction errors are used to adapt a probability distribution processed in the arithmetic coder. Arithmetic coding per se is known from the prior art and uses estimated probabilities for the symbols of a symbol alphabet, where each symbol has a probability given by a probability distribution whose probabilities sum to 1. In arithmetic coding, symbols with higher probabilities need fewer bits in the code stream than symbols with lower probabilities. In the arithmetic coding described herein, the symbol alphabet consists of all possible values of prediction errors.
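The adaptation of the probability distribution can be illustrated with a frequency-count model that reports the ideal code length -log2(p) an arithmetic coder approaches. This is a hypothetical sketch of the probability modeling only, not an arithmetic coder. It starts from a uniform distribution; initializing the counts from a transmitted distribution parameter would be a drop-in change.

```python
import math


class AdaptiveModel:
    """Adaptive frequency model over prediction errors in [-max_abs, max_abs]."""

    def __init__(self, max_abs):
        self.offset = max_abs
        self.counts = [1] * (2 * max_abs + 1)  # uniform start, no zero probabilities
        self.total = len(self.counts)

    def prob(self, e):
        return self.counts[e + self.offset] / self.total

    def code_length(self, e):
        """Ideal code length in bits for symbol e under the current model."""
        return -math.log2(self.prob(e))

    def update(self, e):
        """Adapt to a coded prediction error (encoder and decoder do the same)."""
        self.counts[e + self.offset] += 1
        self.total += 1
```

After the error 0 has been coded five times in a five-symbol alphabet, its probability rises from 0.2 to 0.6, so further zeros cost well under one bit each while rare errors become more expensive.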
For the first predicted pixel value within an image or image block, little information on the probability distribution is available from previously coded prediction errors. Therefore, the distribution is initialized with a suitable initial distribution, such as a Laplacian or Gaussian distribution, where a few distribution parameters, such as a variance estimate, are computed in the encoder and transmitted explicitly to the decoder as side information.
The invention as described above may be used in a parallel scheme where several blocks are coded in parallel independently from each other resulting in independent bit-streams for each block. When these blocks are not decoded sequentially, each decoding thread for a block needs to know the position within the finally composed stream where the decoding of the corresponding block should start. However, the decoder can make a prediction for this position using the above described transmitted variances of each block.
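The description states only that the decoder can predict block start positions from the transmitted variances; it does not specify how. One plausible sketch, entirely an assumption: estimate each block's size from the entropy of a Laplacian with the transmitted variance (about log2(2*e*b) bits per pixel for scale b = sqrt(variance / 2), an approximation valid for larger scales), then accumulate the estimates. The resulting offsets are only initial guesses; how the exact start position is then located is not covered here, and all names are hypothetical.

```python
import math


def estimated_block_bits(variance, num_pixels, min_bits_per_pixel=0.05):
    """Rough block size estimate from the Laplacian entropy log2(2 * e * b)."""
    b = math.sqrt(variance / 2.0) if variance > 0 else 0.0
    per_pixel = math.log2(2 * math.e * b) if b > 0 else min_bits_per_pixel
    return max(per_pixel, min_bits_per_pixel) * num_pixels


def predicted_start_offsets(variances, num_pixels):
    """Predicted bit offset of each block, assuming blocks are concatenated in order."""
    offsets, pos = [], 0.0
    for v in variances:
        offsets.append(int(pos))
        pos += estimated_block_bits(v, num_pixels)
    return offsets
```

Blocks with a larger transmitted variance are expected to need more bits, so the offsets of subsequent blocks shift accordingly.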
The invention as described in the foregoing may be used as a special prediction mode for each block (e.g. a macroblock in the standard H.264/AVC or a prediction unit in the draft standard HEVC). This special prediction mode indicates the usage of the intra prediction scheme for each block based on the pruned neighborhood and training regions as described above.
For the parts of the image coded by the prediction according to the invention, the intra prediction IP is based on a coding step in which context reduction and piecewise autoregressive intra prediction based on pruned neighborhood and training regions are used as described above. Since the method operates pixel-wise, the transform T is usually bypassed for these blocks. For lossless coding, the quantization Q is bypassed as well. In the case of intra prediction, the diagram of
As a result of the coding of
The autoregressive pixel-prediction as described above may be used independently for image arrays in the form of separate image blocks. However, the method is not restricted to block-based applications. The separately processed arrays of pixels can have arbitrary shapes, which may also be adapted to the image structures. In general, the method of the invention may be applied to any type of visual data like videos, medical volumes or dynamic volumes, multi-view data, hyperspectral data and beyond. For such applications, the neighborhood region and training region which are pruned for the prediction method may also have a multi-dimensional shape.
Apart from lossless coding, the method of the invention may also be used for lossy pixel-predictive compression or even for sparse image restoration, denoising, and error concealment where the main goal is also an estimation of unknown values from known neighborhood pixels.
The method as described in the foregoing has several advantages. Particularly, the method describes an adapted reduction method of neighborhood regions and training regions for a pixel-wise prediction in order to sustain a well-conditioned system of linear equations for calculating predicted pixel values. As a consequence of this method, better predictions and therefore smaller prediction errors can be achieved in regions like borders where the context information is restricted. The inventive method particularly has advantages in block-based coding for parallel processing where a lot of border regions occur. Furthermore, in a preferred embodiment, additional rate can be saved by the transmission of variance information for the initialization of probability distributions used for arithmetic coding as well as by its usage for predicting the code stream size.
In a particularly preferred embodiment, the determination of pruned neighborhood and training regions is done offline, i.e. before the prediction is performed. In this case, the pruned neighborhood and training regions for each pixel to be predicted may be stored in look-up tables or may be hardcoded in the coding software in order to prevent added computational complexity by performing the pruning method during prediction.
By using the method of the invention, a parallel processing of several independent image arrays can be performed on both the encoder and decoder side. There is no communication or synchronization required between separate coding and decoding threads. Furthermore, the amount of parallelization can be scaled arbitrarily with block size and number of images to be coded. Even thousands of threads are possible. As the pixel positions within simultaneously coded blocks have the same neighborhood and training regions, this scheme is also suitable for a SIMD architecture (SIMD=single instruction multiple data).
The inventors have performed experiments in which the prediction errors obtained by a known prediction method using image padding were compared to the prediction errors obtained by the method according to the invention. The experiments showed that unstable predictions in certain image areas could be improved, i.e. the prediction error in these areas could be reduced by the method of the invention.
Furthermore, the encoder comprises a processing means M3 where a prediction error between predicted pixel values and the original pixel values is processed for generating the coded image data CI. This processing usually comprises an entropy coding and optionally a transform and a quantization.
The coded image data CI obtained by the encoder EN are transmitted to a decoder DEC which comprises a reconstruction means M4 for reconstructing the prediction errors from the coded image data. This reconstruction means usually comprises an entropy decoder and optionally an inverse transform element and an inverse quantization element. Furthermore, the decoder DEC comprises a prediction means M5 which works analogously to the prediction means M1 in the encoder EN, i.e. the prediction means predicts the pixel values of the pixels to be decoded in each of a plurality of arrays of pixels by an autoregressive pixel-wise prediction. Moreover, the decoding apparatus DEC comprises a means M6 for determining the specific neighborhood region and the specific training region used in the prediction means M5. The means M6 includes, for each of the above described steps a) to e), a means Ma′, Mb′, Mc′, Md′ and Me′ to perform the corresponding one of the steps a) to e). Furthermore, the decoder DEC includes a correction means M7 for correcting the predicted pixel values obtained by the prediction means M5 by the reconstructed prediction errors obtained by the reconstruction means M4. As a result, a decoded sequence DI of images is obtained which corresponds to the original sequence of images I in case of lossless coding and decoding.
Number | Date | Country | Kind |
---|---|---|---|
13152388.8 | Jan 2013 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2013/077277 | 12/19/2013 | WO | 00 |