The present invention relates to an image encoding technique and an image decoding technique.
As an encoding method for compression recording of a moving image, a VVC (Versatile Video Coding) encoding method (to be referred to as VVC hereinafter) is known. In the VVC, to improve the encoding efficiency, a basic block including 128×128 pixels at maximum is divided into subblocks having not only the conventional square shape but also rectangular shapes.
Also, in the VVC, a matrix called a quantization matrix and configured to weight coefficients (to be referred to as orthogonal transform coefficients hereinafter) after orthogonal transformation in accordance with a frequency component is used. Data of a high frequency component whose degradation is unnoticeable to human vision is reduced, thereby increasing the compression efficiency while maintaining image quality. PTL 1 discloses a technique of encoding such a quantization matrix.
In recent years, in the JVET (Joint Video Experts Team), which standardized the VVC, a technique for implementing a compression efficiency higher than that of the VVC has been examined. To improve the encoding efficiency, in addition to conventional intra-prediction and inter-prediction, a new prediction method (to be referred to as mixed intra-inter prediction hereinafter) in which intra-prediction pixels and inter-prediction pixels are mixed in the same subblock has been examined.
The quantization matrix in the VVC assumes a prediction method such as conventional intra-prediction or inter-prediction, and cannot support mixed intra-inter prediction, which is the new prediction method. For this reason, quantization control according to a frequency component cannot be performed for a prediction error in mixed intra-inter prediction, and image quality cannot be improved. The present invention provides a technique for performing quantization using an appropriate quantization matrix for mixed intra-inter prediction.
According to the first aspect of the present invention, there is provided an image encoding apparatus comprising:
According to the second aspect of the present invention, there is provided an image decoding apparatus comprising:
According to the third aspect of the present invention, there is provided an image encoding method comprising:
According to the fourth aspect of the present invention, there is provided an image decoding method comprising:
According to the fifth aspect of the present invention, there is provided a non-transitory computer-readable storage medium storing a computer program configured to cause a computer to function as:
According to the sixth aspect of the present invention, there is provided a non-transitory computer-readable storage medium storing a computer program configured to cause a computer to function as:
Further features of the present invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
Hereinafter, embodiments will be described in detail with reference to the attached drawings. Note, the following embodiments are not intended to limit the scope of the claimed invention. Multiple features are described in the embodiments, but limitation is not made to an invention that requires all such features, and multiple such features may be combined as appropriate. Furthermore, in the attached drawings, the same reference numerals are given to the same or similar configurations, and redundant description thereof is omitted.
An image encoding apparatus according to this embodiment acquires a prediction image by applying an intra-prediction image obtained by intra-prediction to a partial region of an encoding target block included in an image and applying an inter-prediction image obtained by inter-prediction to another region different from the partial region of the block. The image encoding apparatus encodes quantized coefficients obtained by quantizing orthogonal transform coefficients of the difference between the block and the prediction image using a quantization matrix (first encoding).
An example of the functional configuration of the image encoding apparatus according to this embodiment will be described first with reference to the block diagram of
A holding unit 103 holds a quantization matrix corresponding to each of a plurality of prediction processes. In this embodiment, the holding unit 103 holds a quantization matrix corresponding to intra-prediction that is intra-frame prediction, a quantization matrix corresponding to inter-prediction that is inter-frame prediction, and a quantization matrix corresponding to the above-described mixed intra-inter prediction. Note that each quantization matrix held by the holding unit 103 may be a quantization matrix having default element values or may be a quantization matrix generated by the control unit 150 in accordance with a user operation. Alternatively, each quantization matrix held by the holding unit 103 may be a quantization matrix generated by the control unit 150 in accordance with the characteristic (such as an edge amount or frequency included in the input image) of the input image.
A prediction unit 104 divides each basic block into a plurality of subblocks. The prediction unit 104 acquires, for each subblock, a prediction image by one of intra-prediction, inter-prediction, and mixed intra-inter prediction, and obtains the difference between the subblock and the prediction image as prediction errors. Also, the prediction unit 104 generates, as prediction information, information necessary for prediction, such as information representing the basic block division method, a prediction mode indicating the prediction used to obtain the prediction image of each subblock, and a motion vector.
A transformation/quantization unit 105 generates the transform coefficients of each subblock by performing orthogonal transformation (frequency transformation) for the prediction errors of each subblock obtained by the prediction unit 104, acquires, from the holding unit 103, a quantization matrix corresponding to the prediction (intra-prediction, inter-prediction, or mixed intra-inter prediction) performed by the prediction unit 104 to obtain the prediction image of the subblock, and quantizes the transform coefficients using the acquired quantization matrix, thereby generating the quantized coefficients (the quantization result of the transform coefficients) of the subblock.
An inverse quantization/inverse transformation unit 106 performs, using the quantization matrix used by the transformation/quantization unit 105 to generate the quantized coefficients, inverse quantization of the quantized coefficients for the quantized coefficients of each subblock generated by the transformation/quantization unit 105, thereby generating the transform coefficients, and performs inverse orthogonal transformation of the transform coefficients, thereby generating (reproducing) the prediction errors.
An image reproduction unit 107 generates, based on the prediction information generated by the prediction unit 104, a prediction image from the image stored in a frame memory 108, and reproduces the image from the prediction image and the prediction errors generated by the inverse quantization/inverse transformation unit 106. The image reproduction unit 107 then stores the reproduced image in the frame memory 108. The image stored in the frame memory 108 is the image referred to when the prediction unit 104 performs prediction for the image of the current frame or the next frame.
An in-loop filter unit 109 performs in-loop filter processing such as deblocking filter or sample adaptive offset for the image stored in the frame memory 108.
An encoding unit 110 encodes the quantized coefficients generated by the transformation/quantization unit 105 and the prediction information generated by the prediction unit 104, thereby generating encoded data (code data).
An encoding unit 113 encodes the quantization matrix (including at least the quantization matrix used by the transformation/quantization unit 105 for quantization) held by the holding unit 103, thereby generating encoded data (code data).
An integrated encoding unit 111 generates header code data using the encoded data generated by the encoding unit 113, generates a bitstream including the encoded data generated by the encoding unit 110 and the header code data, and outputs the bitstream.
Note that the output destination of the bitstream is not limited to a specific output destination. For example, the bitstream may be output to a memory provided in the image encoding apparatus, may be output to an external apparatus via a network to which the image encoding apparatus is connected, or may be transmitted to the outside for broadcast.
Next, the operation of the image encoding apparatus according to this embodiment will be described. First, encoding of an input image will be described. The division unit 102 divides an input image into a plurality of basic blocks, and outputs each divided basic block.
The prediction unit 104 divides the basic block into a plurality of subblocks on a basic block basis.
In
In
As described above, in this embodiment, encoding processing is performed using not only square subblocks but also rectangular subblocks. In this embodiment, prediction information including information representing the basic block division method is generated. Note that the division methods shown in
The prediction unit 104 decides prediction (prediction mode) to be performed for each subblock. For each subblock, the prediction unit 104 generates a prediction image based on the prediction mode decided for the subblock and encoded pixels, and obtains the difference between the subblock and the prediction image as prediction errors. In addition, the prediction unit 104 generates, as prediction information, “information necessary for prediction” such as information representing the basic block division method, the prediction mode of each subblock, and a motion vector.
Here, prediction used in this embodiment will be described anew. In this embodiment, three types of predictions (prediction modes) including intra-prediction, inter-prediction, and mixed intra-inter prediction are used.
In intra-prediction, the prediction pixels of the encoding target block are generated using encoded pixels that are spatially located around the encoding target block (a subblock in this embodiment). In other words, in intra-prediction, the prediction pixels of the encoding target block are generated using encoded pixels in a frame including the encoding target block. For the subblock that has undergone the intra-prediction, information indicating an intra-prediction method such as horizontal prediction, vertical prediction, or DC prediction is generated as “information necessary for prediction”.
In inter-prediction, the prediction pixels of the encoding target block are generated using encoded pixels in a frame (temporally) different from the frame to which the encoding target block (a subblock in this embodiment) belongs. For the subblock that has undergone the inter-prediction, motion information such as a frame to be referred to and a motion vector is generated as “information necessary for prediction”.
In mixed intra-inter prediction, first, the encoding target block (a subblock in this embodiment) is divided by a line segment in an oblique direction, thereby dividing the encoding target block into two divided regions. As the prediction pixels of one divided region, “prediction pixels obtained for the one divided region by intra-prediction for the encoding target block” are acquired. Also, as the prediction pixels of the other divided region, “prediction pixels obtained for the other divided region by inter-prediction for the encoding target block” are acquired. That is, the prediction pixels of one divided region of the prediction image obtained by mixed intra-inter prediction for the encoding target block are “prediction pixels obtained for the one divided region by intra-prediction for the encoding target block”. In addition, the prediction pixels of the other divided region of the prediction image obtained by mixed intra-inter prediction for the encoding target block are “prediction pixels obtained for the other divided region by inter-prediction for the encoding target block”.
Assume that an encoding target block 1200 is divided by a line segment passing through the vertex at the upper left corner and the vertex at the lower right corner of the encoding target block 1200 to divide the encoding target block 1200 into a divided region 1200a and a divided region 1200b, as shown in
In the above-described way, the prediction unit 104 generates the intra-prediction image 1201 (
In addition, processing of mixed intra-inter prediction by the prediction unit 104 for the encoding target block 1200 will further be described here with reference to
In the above-described way, the prediction unit 104 generates the intra-prediction image 1201 (
For the subblock that has undergone the mixed intra-inter prediction, information indicating the intra-prediction method, motion information such as a frame to be referred to and a motion vector, information defining a divided region (for example, information defining the above-described line segment), and the like are generated as “information necessary for prediction”.
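The region-wise combination described above can be sketched as follows. The diagonal through the upper-left and lower-right corners, the assignment of intra-prediction pixels to the upper-right region, and the use of NumPy arrays for the prediction images are all illustrative assumptions, not the claimed implementation.

```python
import numpy as np

def mixed_intra_inter_prediction(intra_pred: np.ndarray,
                                 inter_pred: np.ndarray) -> np.ndarray:
    """Combine an intra-prediction image and an inter-prediction image of
    the same subblock along the line segment through the upper-left and
    lower-right corners (assumed region assignment: intra above the line,
    inter below it)."""
    h, w = intra_pred.shape
    pred = np.empty_like(intra_pred)
    for y in range(h):
        for x in range(w):
            # Points on or above the line through (0, 0) and (w-1, h-1)
            # satisfy y*(w-1) <= x*(h-1).
            if y * (w - 1) <= x * (h - 1):
                pred[y, x] = intra_pred[y, x]
            else:
                pred[y, x] = inter_pred[y, x]
    return pred
```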
The prediction unit 104 decides the prediction mode of a subblock of interest by the following processing. The prediction unit 104 generates a difference image between the subblock of interest and a prediction image generated by intra-prediction for the subblock of interest. Also, the prediction unit 104 generates a difference image between the subblock of interest and a prediction image generated by inter-prediction for the subblock of interest. In addition, the prediction unit 104 generates a difference image between the subblock of interest and a prediction image generated by mixed intra-inter prediction for the subblock of interest. Note that a pixel value at a pixel position (x, y) in a difference image C between an image A and an image B is the difference between a pixel value AA at the pixel position (x, y) in the image A and a pixel value BB at the pixel position (x, y) in the image B (such as the absolute value of the difference between AA and BB or the square value of the difference between AA and BB). The prediction unit 104 specifies the prediction image for which the sum of the pixel values of all pixels in the difference image is smallest, and decides, as “the prediction mode of the subblock of interest”, the prediction performed for the subblock of interest to obtain that prediction image. Note that the method of deciding the prediction mode of the subblock of interest is not limited to the above-described method.
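The mode decision described above can be sketched as follows. The mode names and the use of absolute differences as the cost (the text equally permits squared differences) are illustrative assumptions.

```python
import numpy as np

def decide_prediction_mode(block: np.ndarray,
                           candidates: dict) -> str:
    """Pick the prediction mode whose prediction image minimizes the sum
    of per-pixel absolute differences from the target subblock.

    `candidates` maps a mode name (e.g. "intra", "inter", "mixed") to the
    prediction image generated by that mode for the subblock."""
    costs = {mode: int(np.abs(block.astype(int) - pred.astype(int)).sum())
             for mode, pred in candidates.items()}
    # The mode with the smallest summed difference wins.
    return min(costs, key=costs.get)
```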
Then, the prediction unit 104 obtains, for each subblock, the prediction image generated by the prediction mode decided for the subblock as “the prediction image of the subblock”, and generates prediction errors from the subblock and the prediction image. Also, the prediction unit 104 generates, for each subblock, prediction information including the prediction mode decided for the subblock and “information necessary for prediction” generated for the subblock.
The transformation/quantization unit 105 performs, for each subblock, orthogonal transformation processing corresponding to the size of the prediction errors for the prediction errors of the subblock, thereby generating orthogonal transform coefficients. The transformation/quantization unit 105 then acquires, for each subblock, a quantization matrix corresponding to the prediction mode of the subblock among the quantization matrices held by the holding unit 103, and quantizes the orthogonal transform coefficients of the subblock using the acquired quantization matrix, thereby generating quantized coefficients.
For example, assume that the holding unit 103 holds a quantization matrix having 8 elements×8 elements (the values of all the 64 elements are quantization step values) exemplified in
In this case, the transformation/quantization unit 105 quantizes the orthogonal transform coefficients of “the prediction errors acquired by intra-prediction for the subblock of 8 pixels×8 pixels” using the quantization matrix for intra-prediction shown in
Also, the transformation/quantization unit 105 quantizes the orthogonal transform coefficients of “the prediction errors acquired by inter-prediction for the subblock of 8 pixels×8 pixels” using the quantization matrix for inter-prediction shown in
In addition, the transformation/quantization unit 105 quantizes the orthogonal transform coefficients of “the prediction errors acquired by mixed intra-inter prediction for the subblock of 8 pixels×8 pixels” using the quantization matrix for mixed intra-inter prediction shown in
The inverse quantization/inverse transformation unit 106 performs inverse quantization for the quantized coefficients of each subblock generated by the transformation/quantization unit 105 using the quantization matrix used by the transformation/quantization unit 105 to quantize the subblock, thereby generating transform coefficients, and performs inverse orthogonal transformation of the transform coefficients, thereby generating (reproducing) the prediction errors.
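The per-mode quantization and the matching inverse quantization above can be sketched as follows. The flat 8×8 matrices and their element values are illustrative assumptions; the embodiment's matrices (shown in the figures) weight each frequency position individually.

```python
import numpy as np

# Illustrative per-mode 8x8 quantization matrices (assumed values; every
# element is a quantization step value, as in the embodiment).
QUANT_MATRICES = {
    "intra": np.full((8, 8), 16),
    "inter": np.full((8, 8), 12),
    "mixed": np.full((8, 8), 14),
}

def quantize(coeffs: np.ndarray, mode: str) -> np.ndarray:
    """Quantize orthogonal transform coefficients element by element with
    the quantization matrix selected by the subblock's prediction mode."""
    qm = QUANT_MATRICES[mode]
    return np.round(coeffs / qm).astype(int)

def dequantize(qcoeffs: np.ndarray, mode: str) -> np.ndarray:
    """Inverse quantization with the same matrix, as performed by the
    inverse quantization/inverse transformation unit 106."""
    return qcoeffs * QUANT_MATRICES[mode]
```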
The image reproduction unit 107 generates the prediction image from the image stored in the frame memory 108 based on the prediction information generated by the prediction unit 104, and reproduces the image of the subblock by adding the prediction image and the prediction errors generated (reproduced) by the inverse quantization/inverse transformation unit 106. The image reproduction unit 107 then stores the reproduced image in the frame memory 108.
The in-loop filter unit 109 performs in-loop filter processing such as deblocking filter or sample adaptive offset for the image stored in the frame memory 108, and stores the image that has undergone the in-loop filter processing in the frame memory 108.
The encoding unit 110 performs, for each subblock, entropy-encoding of the quantized coefficients of the subblock generated by the transformation/quantization unit 105 and the prediction information of the subblock generated by the prediction unit 104, thereby generating encoded data. Note that the method of entropy encoding is not particularly designated, and Golomb coding, arithmetic encoding, Huffman coding, or the like can be used.
Encoding of the quantization matrix will be described next. The quantization matrix held by the holding unit 103 is generated in accordance with the size or prediction mode of the subblock to be encoded. For example, as shown in
The method of generating the quantization matrix according to the size or prediction mode of the subblock is not limited to a specific generation method as described above, and the method of managing the quantization matrix in the holding unit 103 is not limited to a specific management method.
In this embodiment, the quantization matrix held by the holding unit 103 is held in a two-dimensional shape, as shown in
The encoding unit 113 reads out the quantization matrix (including at least the quantization matrix used by the transformation/quantization unit 105 for quantization) held by the holding unit 103, and encodes the readout quantization matrix. For example, the encoding unit 113 encodes a quantization matrix of interest by the following processing.
The encoding unit 113 refers to, in a predetermined order, the values of elements in the quantization matrix of interest that is a two-dimensional array, and generates a one-dimensional array in which difference values between the values of currently referred elements and the values of immediately precedingly referred elements are arranged. For example, if the quantization matrix shown in
In this case, since the value of the element referred to first is “8”, and the value of an immediately precedingly referred element does not exist, the encoding unit 113 outputs, as an output value, a predetermined value or a value obtained by a certain method. For example, the encoding unit 113 may output the value “8” of the currently referred element as the output value, or may output a value obtained by subtracting a predetermined value from the value “8” of the element as the output value; the output value is not limited to a value decided by a specific method.
Since the value of the element referred to next is “11”, and the value of the immediately precedingly referred element is “8”, the encoding unit 113 outputs, as the output value, a difference value “+3” obtained by subtracting the value “8” of the immediately precedingly referred element from the value “11” of the currently referred element. In this way, the encoding unit 113 refers to the values of the elements in the quantization matrix in a predetermined order, obtains and outputs output values, and generates a one-dimensional array in which the output values are arranged in the output order.
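The scan-and-difference processing can be sketched as follows. A raster scan order, the NumPy representation, and emitting the first element as-is (one of the options described above) are illustrative assumptions; the embodiment only requires that some predetermined order be used.

```python
import numpy as np

def scan_and_differentiate(qm: np.ndarray) -> list:
    """Scan a 2-D quantization matrix in a fixed order (raster order here)
    and emit, for each element, the difference from the immediately
    precedingly referred element; the first element, which has no
    predecessor, is emitted unchanged."""
    flat = qm.flatten()          # predetermined scan order (assumption)
    out = [int(flat[0])]         # first element: output as-is
    for prev, cur in zip(flat, flat[1:]):
        out.append(int(cur) - int(prev))
    return out
```

For a matrix whose first two scanned elements are 8 and 11, the output begins 8, +3, matching the worked example in the text.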
The encoding unit 113 then encodes the one-dimensional array generated for the quantization matrix of interest. For example, the encoding unit 113 refers to an encoding table exemplified in
Referring back to
Encoding processing by the above-described image encoding apparatus will be described with reference to the flowchart of
Before the start of the processing according to the flowchart of
In step S302, the encoding unit 113 reads out the quantization matrix (including at least the quantization matrix used by the transformation/quantization unit 105 for quantization) held by the holding unit 103, and encodes the readout quantization matrix, thereby generating encoded data.
In step S303, the integrated encoding unit 111 generates “header information necessary for image encoding”. The integrated encoding unit 111 then integrates the “header information necessary for image encoding” with the encoded data generated by the encoding unit 113 in step S302, and generates header code data using the encoded data integrated with the header information.
In step S304, the division unit 102 divides an input image into a plurality of basic blocks, and outputs each divided basic block. The prediction unit 104 divides the basic block into a plurality of subblocks on a basic block basis.
In step S305, the prediction unit 104 selects one of unselected subblocks among subblocks of the input image as a selected subblock, and decides the prediction mode of the selected subblock. The prediction unit 104 performs prediction according to the decided prediction mode for the selected subblock, and acquires the prediction image, the prediction errors, and the prediction information of the selected subblock.
In step S306, the transformation/quantization unit 105 performs, for the prediction errors of the selected subblock acquired in step S305, orthogonal transformation processing corresponding to the size of the prediction errors, thereby generating orthogonal transform coefficients. The transformation/quantization unit 105 then acquires a quantization matrix corresponding to the prediction mode of the selected subblock among the quantization matrices held by the holding unit 103, and quantizes the orthogonal transform coefficients of the subblock using the acquired quantization matrix, thereby acquiring quantized coefficients.
In step S307, the inverse quantization/inverse transformation unit 106 performs inverse quantization for the quantized coefficients of the selected subblock acquired in step S306 using the quantization matrix used by the transformation/quantization unit 105 to quantize the selected subblock, thereby generating transform coefficients. The inverse quantization/inverse transformation unit 106 then performs inverse orthogonal transformation of the generated transform coefficients, thereby generating (reproducing) the prediction errors.
In step S308, the image reproduction unit 107 generates, based on the prediction information acquired in step S305, a prediction image from the image stored in the frame memory 108, and reproduces the image of the subblock by adding the prediction image and the prediction errors generated in step S307. The image reproduction unit 107 then stores the reproduced image in the frame memory 108.
In step S309, the encoding unit 110 performs entropy-encoding of the quantized coefficients acquired in step S306 and the prediction information acquired in step S305, thereby generating encoded data.
The integrated encoding unit 111 generates a bitstream by multiplexing the header code data generated in step S303 and the encoded data generated by the encoding unit 110 in step S309, and outputs the bitstream.
In step S310, the control unit 150 determines whether all the subblocks of the input image are selected as selected subblocks. As the result of the determination, if all the subblocks of the input image are selected as selected subblocks, the process advances to step S311. On the other hand, if at least one subblock that is not selected yet as a selected subblock remains among the subblocks of the input image, the process returns to step S305.
In step S311, the in-loop filter unit 109 performs in-loop filter processing for the image (the image of the selected subblock reproduced in step S308) stored in the frame memory 108. The in-loop filter unit 109 then stores the image that has undergone the in-loop filter processing in the frame memory 108.
With this processing, since the transform coefficients of the subblock that has undergone mixed intra-inter prediction can be quantized using the quantization matrix corresponding to the mixed intra-inter prediction, it is possible to control quantization for each frequency component and improve image quality.
In the first embodiment, quantization matrices are individually prepared for intra-prediction, inter-prediction, and mixed intra-inter prediction, and the quantization matrix corresponding to each prediction is encoded. However, some of these may be shared.
For example, to quantize the orthogonal transform coefficients of the prediction errors obtained based on mixed intra-inter prediction, not the quantization matrix corresponding to the mixed intra-inter prediction but the quantization matrix corresponding to intra-prediction may be used. That is, for example, to quantize the orthogonal transform coefficients of the prediction errors obtained based on mixed intra-inter prediction, not the quantization matrix for mixed intra-inter prediction shown in
In addition, to quantize the orthogonal transform coefficients of the prediction errors obtained based on mixed intra-inter prediction, not the quantization matrix corresponding to the mixed intra-inter prediction but the quantization matrix corresponding to inter-prediction may be used. That is, for example, to quantize the orthogonal transform coefficients of the prediction errors obtained based on mixed intra-inter prediction, not the quantization matrix for mixed intra-inter prediction shown in
Also, according to the sizes of the region of “prediction pixels obtained by intra-prediction” and the region of “prediction pixels obtained by inter-prediction” in the prediction image of the subblock for which mixed intra-inter prediction is executed, the quantization matrix to be used for the subblock may be decided.
For example, assume that the subblock 1200 is divided into the divided region 1200c and the divided region 1200d, as shown in
In this case, the size of the divided region 1200d to which inter-prediction is applied is larger than the size of the divided region 1200c to which intra-prediction is applied in the subblock 1200. Hence, the transformation/quantization unit 105 applies the quantization matrix corresponding to inter-prediction (for example, the quantization matrix shown in
Note that if the size of the divided region 1200d to which inter-prediction is applied is smaller than the size of the divided region 1200c to which intra-prediction is applied in the subblock 1200, the transformation/quantization unit 105 applies the quantization matrix corresponding to intra-prediction (for example, the quantization matrix shown in
This makes it possible to omit encoding of the quantization matrix corresponding to mixed intra-inter prediction while reducing image quality degradation of the divided region with a larger size. Hence, the amount of the encoded data of the quantization matrix included in the bitstream can be decreased.
Also, a quantization matrix obtained by combining “the quantization matrix corresponding to intra-prediction” and “the quantization matrix corresponding to inter-prediction” in accordance with the ratio of the size S1 of the divided region to which intra-prediction is applied and the size S2 of the divided region to which inter-prediction is applied may be generated as the quantization matrix corresponding to mixed intra-inter prediction. For example, the transformation/quantization unit 105 may generate the quantization matrix corresponding to mixed intra-inter prediction using equation (1).
Here, QM[x][y] indicates the value (quantization step value) of the element at the coordinates (x, y) in the quantization matrix corresponding to mixed intra-inter prediction. QMinter[x][y] indicates the value (quantization step value) of the element at the coordinates (x, y) in the quantization matrix corresponding to inter-prediction. QMintra[x][y] indicates the value (quantization step value) of the element at the coordinates (x, y) in the quantization matrix corresponding to intra-prediction. Also, w has a value of not less than 0 and not more than 1, which indicates the ratio of the region where inter-prediction is used in the subblock, and w=S2/(S1+S2). Since the quantization matrix corresponding to mixed intra-inter prediction can be generated as needed and need not be created in advance, encoding of the quantization matrix can be omitted. Hence, the amount of the encoded data of the quantization matrix included in the bitstream can be decreased. It is also possible to perform appropriate quantization control according to the ratio of the sizes of the regions in which intra-prediction and inter-prediction are used and improve image quality.
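From the definitions above, equation (1) is presumably of the form QM[x][y] = w × QMinter[x][y] + (1 − w) × QMintra[x][y] with w = S2/(S1 + S2); a minimal sketch under that assumption (rounding the blended result to integer quantization step values is a further assumption):

```python
import numpy as np

def blend_quantization_matrices(qm_intra: np.ndarray,
                                qm_inter: np.ndarray,
                                s1: int, s2: int) -> np.ndarray:
    """Blend the intra and inter quantization matrices in proportion to
    the region sizes: w = S2 / (S1 + S2) is the fraction of the subblock
    predicted by inter-prediction."""
    w = s2 / (s1 + s2)
    # Element-wise weighted average of the two matrices.
    return np.round(w * qm_inter + (1.0 - w) * qm_intra).astype(int)
```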
Also, in the first embodiment, the quantization matrix to be applied to the subblock to which mixed intra-inter prediction is applied is uniquely decided. However, the quantization matrix may be selected by introducing an identifier.
Various methods are used to select the quantization matrix to be applied to the subblock to which mixed intra-inter prediction is applied from the quantization matrix corresponding to intra-prediction, the quantization matrix corresponding to inter-prediction, and the quantization matrix corresponding to mixed intra-inter prediction. For example, the control unit 150 may select the quantization matrix in accordance with a user operation.
An identifier for specifying the quantization matrix selected as the quantization matrix to be applied to the subblock to which mixed intra-inter prediction is applied is stored in the bitstream.
For example, in
This makes it possible to selectively implement a decrease of the amount of the encoded data of the quantization matrix included in the bitstream and unique quantization control for the subblock to which mixed intra-inter prediction is applied.
Also, in the first embodiment, a prediction image including prediction pixels (first prediction pixels) for one divided region obtained by dividing a subblock and prediction pixels (second prediction pixels) for the other divided region is generated. However, the prediction image generation method is not limited to this generation method. For example, to improve the image quality of a region (boundary region) near the boundary between one divided region and the other divided region, third prediction pixels calculated by weighted-averaging the first prediction pixels and the second prediction pixels included in the boundary region may be used as the prediction pixels of the boundary region. In this case, the prediction pixel values in a corresponding region corresponding to the one divided region in the prediction image are the first prediction pixels, and the prediction pixel values in a corresponding region corresponding to the other divided region in the prediction image are the second prediction pixels. The prediction pixel values in a corresponding region corresponding to the above-described boundary region in the prediction image are the third prediction pixels. This can suppress degradation of image quality in the boundary region between the divided regions in which different predictions are used, and improve image quality.
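The boundary-region blending described above can be sketched as follows. The equal 0.5 weight and the externally supplied boundary mask are illustrative assumptions; an implementation might instead taper the weight with distance from the dividing line segment.

```python
import numpy as np

def apply_boundary_smoothing(pred: np.ndarray,
                             intra_pred: np.ndarray,
                             inter_pred: np.ndarray,
                             boundary_mask: np.ndarray,
                             weight: float = 0.5) -> np.ndarray:
    """Replace the prediction pixels in the boundary region (where
    boundary_mask is True) with a weighted average of the first
    (intra) and second (inter) prediction pixels; pixels outside the
    boundary region are left unchanged."""
    out = pred.copy().astype(float)
    blended = weight * intra_pred + (1.0 - weight) * inter_pred
    out[boundary_mask] = blended[boundary_mask]
    return out
```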
Also, in the first embodiment, three types of predictions including intra-prediction, inter-prediction, and mixed intra-inter prediction have been described as an example, but the types and number of predictions are not limited to this example. For example, combined inter-intra prediction (CIIP) employed in the VVC may be used. Combined inter-intra prediction calculates the pixels of an entire encoding target block by weighted-averaging prediction pixels obtained by intra-prediction and prediction pixels obtained by inter-prediction. In this case, the quantization matrix used for a subblock using mixed intra-inter prediction can be shared as the quantization matrix used for a subblock using combined inter-intra prediction. This makes it possible to apply quantization using a quantization matrix having the same quantization control characteristic to any subblock that uses a prediction method sharing the common feature that both prediction pixels obtained by intra-prediction and prediction pixels obtained by inter-prediction are used in the same subblock. Furthermore, the code amount of the quantization matrix corresponding to the new prediction method can also be decreased.
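Unlike the partition-based blending above, CIIP averages the two predictions over the entire block. The following sketch uses fixed-point integer weights as video codecs do; the weight values here are placeholders, since the actual VVC weights depend on the coding modes of neighboring blocks and are not reproduced in this document.

```python
import numpy as np

def ciip_prediction(intra_pred, inter_pred, w_intra=2, w_inter=2, shift=2):
    """Whole-block weighted average of intra- and inter-prediction pixels.

    Integer weights satisfying w_intra + w_inter == 1 << shift mimic the
    fixed-point averaging used by video codecs (assumed weights, not the
    normative VVC derivation).
    """
    assert w_intra + w_inter == (1 << shift)
    offset = 1 << (shift - 1)  # rounding offset before the right shift
    return (w_intra * intra_pred.astype(np.int64)
            + w_inter * inter_pred.astype(np.int64) + offset) >> shift
```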
Also, in the first embodiment, the encoding target is an input image. However, the encoding target is not limited to an image. For example, a two-dimensional data array that is feature amount data used in machine learning such as object recognition may be encoded, like an input image, and a bitstream may thus be generated and output. This can efficiently encode the feature amount data used in machine learning.
An image decoding apparatus according to this embodiment decodes quantized coefficients for a decoding target block from a bitstream, derives transform coefficients from the quantized coefficients using a quantization matrix, and performs inverse frequency transformation of the transform coefficients, thereby deriving prediction errors for the decoding target block. The image decoding apparatus then generates a prediction image by applying an intra-prediction image obtained by intra-prediction for a partial region in the decoding target block and applying an inter-prediction image obtained by inter-prediction for another region different from the partial region in the decoding target block, and decodes the decoding target block using the generated prediction image and the prediction errors.
In this embodiment, an image decoding apparatus that decodes a bitstream encoded by the image encoding apparatus according to the first embodiment will be described. An example of the functional configuration of the image decoding apparatus according to this embodiment will be described first with reference to the block diagram of
A control unit 250 controls the operation of the entire image decoding apparatus. A separation decoding unit 202 acquires a bitstream encoded by the image encoding apparatus according to the first embodiment. The bitstream acquisition form is not limited to a specific acquisition form. For example, the bitstream output from the image encoding apparatus according to the first embodiment may be acquired via a network, or may be acquired from a memory that temporarily stores the bitstream. The separation decoding unit 202 then separates information about decoding processing or encoded data concerning a coefficient from the acquired bitstream and decodes encoded data existing in the header portion of the bitstream. In this embodiment, the separation decoding unit 202 separates the encoded data of a quantization matrix from the bitstream and supplies the encoded data to a decoding unit 209. Also, the separation decoding unit 202 separates the encoded data of an input image from the bitstream and supplies the encoded data to a decoding unit 203. That is, the separation decoding unit 202 performs an operation reverse to that of the integrated encoding unit 111 shown in
The decoding unit 209 decodes the encoded data supplied from the separation decoding unit 202, thereby reproducing a quantization matrix. The decoding unit 203 decodes the encoded data supplied from the separation decoding unit 202, thereby reproducing quantized coefficients and prediction information.
An inverse quantization/inverse transformation unit 204 performs the same operation as the inverse quantization/inverse transformation unit 106 provided in the image encoding apparatus according to the first embodiment. The inverse quantization/inverse transformation unit 204 selects one of the quantization matrices decoded by the decoding unit 209, and inversely quantizes the quantized coefficients using the selected quantization matrix, thereby reproducing transform coefficients. The inverse quantization/inverse transformation unit 204 performs inverse orthogonal transformation for the reproduced transform coefficients, thereby reproducing prediction errors.
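The two operations performed by the inverse quantization/inverse transformation unit 204 can be modeled as below. This is a simplified sketch: real codecs include rounding offsets, bit-depth shifts, and a fixed integer transform, none of which are specified in this excerpt.

```python
import numpy as np

def inverse_quantize(qcoeff, qmatrix, qscale=1):
    """Reproduce transform coefficients by scaling each quantized coefficient
    with the co-located quantization-matrix element (simplified model)."""
    return qcoeff * qmatrix * qscale

def inverse_orthogonal_transform(coeff, basis):
    """Inverse 2-D separable orthogonal transform: basis is an orthonormal
    matrix (e.g. a DCT basis), so its transpose is its inverse."""
    return basis.T @ coeff @ basis
```

A larger quantization-matrix element at a given frequency position means coarser quantization there, which is how per-frequency quantization control is realized.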
An image reproduction unit 205 refers to an image stored in a frame memory 206 based on the prediction information decoded by the decoding unit 203, thereby generating a prediction image. The image reproduction unit 205 then generates a reproduced image by adding the prediction errors obtained by the inverse quantization/inverse transformation unit 204 to the generated prediction image, and stores the generated reproduced image in the frame memory 206.
An in-loop filter unit 207 performs in-loop filter processing such as a deblocking filter or sample adaptive offset for the reproduced image stored in the frame memory 206. The reproduced image stored in the frame memory 206 is appropriately output by the control unit 250. The output destination of the reproduced image is not limited to a specific output destination. For example, the reproduced image may be displayed on a display screen of a display device such as a display, or the reproduced image may be output to a projection apparatus such as a projector.
The operation (bitstream decoding processing) of the image decoding apparatus having the above-described configuration will be described next. The separation decoding unit 202 acquires a bitstream generated by the image encoding apparatus, separates information about decoding processing or encoded data concerning a coefficient from the bitstream, and decodes encoded data existing in the header of the bitstream. The separation decoding unit 202 extracts the encoded data of a quantization matrix from the sequence header of the bitstream shown in
The decoding unit 209 decodes the encoded data of the quantization matrix supplied from the separation decoding unit 202, thereby reproducing a one-dimensional array. More specifically, the decoding unit 209 refers to an encoding table exemplified in
Furthermore, the decoding unit 209 reproduces each element value of the quantization matrix from each difference value of the reproduced one-dimensional array. That is, processing reverse to the processing performed by the encoding unit 113 to generate a one-dimensional array from a quantization matrix is performed. The value of the element at the start of the one-dimensional array is the element value at the upper left corner of the quantization matrix. A value obtained by adding the value of the element at the start of the one-dimensional array to the value of the second element from the start of the one-dimensional array is the second element value in the above-described “predetermined order”. A value obtained by adding the (n−1)th reproduced element value in the above-described “predetermined order” to the value of the nth element from the start of the one-dimensional array (2<n≤N, where N is the number of elements of the one-dimensional array) is the nth element value in the above-described “predetermined order”. For example, the decoding unit 209 reproduces the quantization matrices shown in
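The element reconstruction described above is a running sum over the difference values (DPCM decoding). A minimal sketch, with a hypothetical function name:

```python
def decode_qmatrix_elements(one_d_array):
    """Reproduce quantization-matrix element values in the predetermined scan
    order: the first entry is the upper-left element of the matrix, and each
    subsequent entry is the difference from the previously reproduced value."""
    values = [one_d_array[0]]
    for diff in one_d_array[1:]:
        values.append(values[-1] + diff)  # running (cumulative) sum
    return values
```

The reproduced values would then be placed back into the two-dimensional matrix by scanning it in the same predetermined order used at the encoder.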
The decoding unit 203 decodes the encoded data of the input image supplied from the separation decoding unit 202, thereby decoding quantized coefficients and prediction information.
The inverse quantization/inverse transformation unit 204 specifies “the prediction mode corresponding to the quantized coefficients to be decoded” included in the prediction information decoded by the decoding unit 203, and selects the quantization matrix corresponding to the specified prediction mode among the quantization matrices reproduced by the decoding unit 209. The inverse quantization/inverse transformation unit 204 then inversely quantizes the quantized coefficients using the selected quantization matrix, thereby reproducing transform coefficients. The inverse quantization/inverse transformation unit 204 reproduces the prediction errors by performing inverse orthogonal transformation for the reproduced transform coefficients, and supplies the reproduced prediction errors to the image reproduction unit 205.
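The matrix selection by prediction mode amounts to a lookup keyed on the mode signaled in the decoded prediction information. The mode identifiers and function name below are assumptions made for illustration; the actual bitstream syntax is not reproduced here.

```python
# Hypothetical prediction-mode identifiers (not the actual syntax values).
INTRA, INTER, MIXED = 0, 1, 2

def select_quantization_matrix(pred_mode, qmatrices):
    """Pick, from the reproduced quantization matrices, the one corresponding
    to the prediction mode specified in the decoded prediction information."""
    try:
        return qmatrices[pred_mode]
    except KeyError:
        raise ValueError(f"no quantization matrix decoded for mode {pred_mode}")
```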
The image reproduction unit 205 refers to an image stored in the frame memory 206 based on the prediction information decoded by the decoding unit 203, thereby generating a prediction image. In this embodiment, three types of predictions including intra-prediction, inter-prediction, and mixed intra-inter prediction are used, like the prediction unit 104 according to the first embodiment. Detailed prediction processing is the same as that of the prediction unit 104 described in the first embodiment, and a description thereof will be omitted. The image reproduction unit 205 then generates a reproduced image by adding the prediction errors obtained by the inverse quantization/inverse transformation unit 204 to the generated prediction image, and stores the generated reproduced image in the frame memory 206. The reproduced image stored in the frame memory 206 is a prediction reference candidate to be referred to when decoding another subblock.
The in-loop filter unit 207 operates like the above-described in-loop filter unit 109 and performs in-loop filter processing such as a deblocking filter or sample adaptive offset for the reproduced image stored in the frame memory 206. The reproduced image stored in the frame memory 206 is appropriately output by the control unit 250.
Decoding processing of the image decoding apparatus according to this embodiment will be described with reference to the flowchart of
In step S402, the decoding unit 209 decodes the encoded data supplied from the separation decoding unit 202, thereby reproducing the quantization matrix. In step S403, the decoding unit 203 decodes the encoded data supplied from the separation decoding unit 202, thereby reproducing the quantized coefficients of a decoding target subblock and prediction information.
In step S404, the inverse quantization/inverse transformation unit 204 specifies “the prediction mode corresponding to the quantized coefficients of the decoding target subblock” included in the prediction information decoded by the decoding unit 203. The inverse quantization/inverse transformation unit 204 selects the quantization matrix corresponding to the specified prediction mode among the quantization matrices reproduced by the decoding unit 209. For example, if the prediction mode specified for the decoding target subblock is intra-prediction, among the quantization matrices shown in
In step S405, the image reproduction unit 205 refers to an image stored in the frame memory 206 based on the prediction information decoded by the decoding unit 203, thereby generating the prediction image of the decoding target subblock. The image reproduction unit 205 then generates the reproduced image of the decoding target subblock by adding the prediction errors of the decoding target subblock obtained by the inverse quantization/inverse transformation unit 204 to the generated prediction image, and stores the generated reproduced image in the frame memory 206.
In step S406, the control unit 250 determines whether the processes of steps S403 to S405 are performed for all subblocks. As the result of the determination, if the processes of steps S403 to S405 are performed for all subblocks, the process advances to step S407. On the other hand, if a subblock for which the processes of steps S403 to S405 are not performed still remains, the process returns to step S403 to perform the processes of steps S403 to S405 for the subblock.
In step S407, the in-loop filter unit 207 performs in-loop filter processing such as a deblocking filter or sample adaptive offset for the reproduced image generated and stored in the frame memory 206 in step S405.
With this processing, even for a subblock using mixed intra-inter prediction, which is generated in the first embodiment, it is possible to control quantization for each frequency component and decode a bitstream with improved image quality.
In the second embodiment, quantization matrices are individually prepared for intra-prediction, inter-prediction, and mixed intra-inter prediction, and the quantization matrix corresponding to each prediction is decoded. However, some of these may be shared.
For example, to inversely quantize the quantized coefficients of the orthogonal transform coefficients of the prediction errors obtained based on mixed intra-inter prediction, not the quantization matrix corresponding to the mixed intra-inter prediction but the quantization matrix corresponding to intra-prediction may be decoded and used. That is, for example, to inversely quantize the quantized coefficients of the orthogonal transform coefficients of the prediction errors obtained based on mixed intra-inter prediction, the quantization matrix for intra-prediction shown in
In addition, to inversely quantize the quantized coefficients of the orthogonal transform coefficients of the prediction errors obtained based on mixed intra-inter prediction, not the quantization matrix corresponding to the mixed intra-inter prediction but the quantization matrix corresponding to inter-prediction may be decoded and used. That is, for example, to inversely quantize the quantized coefficients of the orthogonal transform coefficients of the prediction errors obtained based on mixed intra-inter prediction, the quantization matrix for inter-prediction shown in
Also, according to the sizes of the region of “prediction pixels obtained by intra-prediction” and the region of “prediction pixels obtained by inter-prediction” in the prediction image of the subblock for which mixed intra-inter prediction is executed, the quantization matrix to be used for inverse quantization of the subblock may be decided.
For example, assume that a subblock 1200 is divided into a divided region 1200c and a divided region 1200d, as shown in
In this case, the size of the divided region 1200d to which inter-prediction is applied is larger than the size of the divided region 1200c to which intra-prediction is applied in the subblock 1200. Hence, the inverse quantization/inverse transformation unit 204 applies the quantization matrix corresponding to inter-prediction to inversely quantize the quantized coefficients of the subblock 1200.
Note that if the size of the divided region 1200d to which inter-prediction is applied is smaller than the size of the divided region 1200c to which intra-prediction is applied in the subblock 1200, the inverse quantization/inverse transformation unit 204 applies the quantization matrix corresponding to intra-prediction to inversely quantize the quantized coefficients of the subblock 1200.
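The size-based rule in the two paragraphs above reduces to comparing the areas of the two divided regions. A sketch, with assumed names (the tie-breaking toward intra-prediction is also an assumption, as the text does not specify the equal-size case):

```python
def matrix_for_mixed_prediction(s_intra, s_inter, q_intra, q_inter):
    """Choose the quantization matrix for a mixed intra-inter subblock from
    the sizes of its divided regions: the region covering more pixels
    dominates the prediction error, so its matrix is reused and no dedicated
    mixed-prediction matrix needs to be decoded."""
    return q_inter if s_inter > s_intra else q_intra
```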
This makes it possible to omit decoding of the quantization matrix corresponding to mixed intra-inter prediction while reducing image quality degradation of the divided region with the larger size. As a result, it is possible to decode a bitstream in which the amount of the encoded data of the quantization matrix included in the bitstream is decreased.
Also, a quantization matrix obtained by combining “the quantization matrix corresponding to intra-prediction” and “the quantization matrix corresponding to inter-prediction” in accordance with the ratio of the sizes S1 and S2 may be generated as the quantization matrix corresponding to mixed intra-inter prediction. For example, the inverse quantization/inverse transformation unit 204 may generate the quantization matrix corresponding to mixed intra-inter prediction using equation (1) described above.
Since the quantization matrix corresponding to mixed intra-inter prediction can be generated as needed, encoding of the quantization matrix can be omitted. This makes it possible to decode a bitstream in which the amount of the encoded data of the quantization matrix included in the bitstream is decreased. It is also possible to perform appropriate quantization control according to the ratio of the sizes of the regions in which intra-prediction and inter-prediction are used and decode a bitstream with improved image quality.
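Equation (1) itself is not reproduced in this excerpt. A plausible form, weighting each matrix element by the region sizes S1 and S2 with integer rounding, can be sketched as follows; treat the exact formula as an assumption.

```python
import numpy as np

def combine_qmatrices(q_intra, q_inter, s1, s2):
    """Element-wise blend of the intra- and inter-prediction quantization
    matrices in proportion to the divided-region sizes S1 and S2 (a guessed
    form of equation (1), which is defined elsewhere in the specification)."""
    total = s1 + s2
    # Integer rounding keeps the result a valid integer quantization matrix.
    return (s1 * q_intra + s2 * q_inter + total // 2) // total
```

Since both the encoder and the decoder can derive S1 and S2 from the partition information, the combined matrix never needs to be transmitted.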
Also, in the second embodiment, the quantization matrix to be applied to the subblock to which mixed intra-inter prediction is applied is uniquely decided. However, the quantization matrix may be selected by introducing an identifier, as in the first embodiment. It is therefore possible to decode a bitstream in which a decrease of the amount of the encoded data of the quantization matrix included in the bitstream and unique quantization control for the subblock to which mixed intra-inter prediction is applied are selectively implemented.
Also, in the second embodiment, a prediction image including prediction pixels (first prediction pixels) for one divided region obtained by dividing a subblock and prediction pixels (second prediction pixels) for the other divided region is decoded. However, the prediction image to be decoded is not limited to the prediction image. For example, as in the modification of the first embodiment, a prediction image in which third prediction pixels calculated by weighted-averaging the first prediction pixels and the second prediction pixels included in a region (boundary region) near the boundary between the one divided region and the other divided region are used as the prediction pixels of the boundary region may be generated. In this case, in the prediction image to be decoded, as in the first embodiment, the prediction pixel values in a corresponding region corresponding to the one divided region in the prediction image are the first prediction pixels, and the prediction pixel values in a corresponding region corresponding to the other divided region in the prediction image are the second prediction pixels. In the prediction image, the prediction pixel values in a corresponding region corresponding to the above-described boundary region in the prediction image are the third prediction pixels. This can suppress degradation of image quality in the boundary region between the divided regions in which different predictions are used, and decode a bitstream with improved image quality.
Also, in the second embodiment, three types of predictions including intra-prediction, inter-prediction, and mixed intra-inter prediction have been described as an example, but the types and number of predictions are not limited to this example. For example, combined inter-intra prediction (CIIP) employed in the VVC may be used. In this case, the quantization matrix used for a subblock using mixed intra-inter prediction can be shared as the quantization matrix used for a subblock using combined inter-intra prediction. This makes it possible to decode a bitstream in which quantization using a quantization matrix having the same quantization control characteristic is applied to any subblock that uses a prediction method sharing the common feature that both prediction pixels obtained by intra-prediction and prediction pixels obtained by inter-prediction are used in the same subblock. Furthermore, it is also possible to decode a bitstream in which the code amount of the quantization matrix corresponding to the new prediction method is decreased.
Also, in the second embodiment, an input image that is the encoding target is decoded from a bitstream. However, the decoding target is not limited to an image. For example, a two-dimensional data array may be decoded from a bitstream including encoded data obtained by encoding, like the input image, the two-dimensional array that is feature amount data used in machine learning such as object recognition. This makes it possible to decode a bitstream in which the feature amount data used in machine learning is efficiently encoded.
The function units shown in
In the former case, the hardware may be a circuit incorporated in an apparatus that performs encoding or decoding of an image, such as an image capturing apparatus, or may be a circuit incorporated in an apparatus that performs encoding or decoding of an image supplied from an external apparatus such as an image capturing apparatus or a server apparatus.
In the latter case, the computer program may be stored in the memory of an apparatus that performs encoding or decoding of an image, such as an image capturing apparatus, a memory accessible from an apparatus that performs encoding or decoding of an image supplied from an external apparatus such as an image capturing apparatus or a server apparatus, or the like. An apparatus (computer apparatus) capable of reading out the computer program from the memory and executing it can be applied to the above-described image encoding apparatus or the above-described image decoding apparatus. An example of the hardware configuration of the computer apparatus will be described with reference to the block diagram of
A CPU 501 executes various kinds of processing using computer programs and data stored in a RAM 502 or a ROM 503. Thus, the CPU 501 controls the operation of the entire computer apparatus, and executes or controls various kinds of processing described as processing executed by the image encoding apparatus or the image decoding apparatus in the above-described embodiments and modifications.
The RAM 502 has an area configured to store computer programs and data loaded from an external storage device 506, and an area configured to store data acquired from the outside via an I/F (interface) 507. The RAM 502 further has a work area (a frame memory or the like) used by the CPU 501 when executing various kinds of processing. The RAM 502 can thus appropriately provide various kinds of areas.
The ROM 503 stores setting data of the computer apparatus, computer programs and data associated with activation of the computer apparatus, computer programs and data associated with the basic operation of the computer apparatus, and the like.
An operation unit 504 is a user interface such as a keyboard, a mouse, or a touch panel, and a user can input various kinds of instructions to the CPU 501 by operating the operation unit 504.
A display unit 505 includes a liquid crystal screen or a touch panel screen, and displays a processing result by the CPU 501 as an image, characters, or the like. Note that the display unit 505 may be a projection device such as a projector that projects an image or characters.
The external storage device 506 is a mass information storage device such as a hard disk drive device. In the external storage device 506, an OS (Operating System), computer programs and data used to cause the CPU 501 to execute the above-described various kinds of processing described as processing performed by the image encoding apparatus or the image decoding apparatus, and the like are stored. Information (the encoding table and the like) handled as known information in the above description is also stored in the external storage device 506. Encoding target data (such as an input image or a two-dimensional data array) may be stored in the external storage device 506.
The computer programs and data stored in the external storage device 506 are appropriately loaded into the RAM 502 in accordance with the control of the CPU 501 and processed by the CPU 501. Note that the above-described holding unit 103 and the frame memories 108 and 206 can be implemented using the RAM 502, the ROM 503, the external storage device 506, or the like.
A network such as a LAN or the Internet, or another device such as a projection device or a display device can be connected to an I/F 507, and the computer apparatus can acquire or send various kinds of information via the I/F 507.
All the CPU 501, the RAM 502, the ROM 503, the operation unit 504, the display unit 505, the external storage device 506, and the I/F 507 are connected to a system bus 508.
In the above-described configuration, when the computer apparatus is powered on, the CPU 501 executes a boot program stored in the ROM 503, loads the OS stored in the external storage device 506 into the RAM 502, and activates the OS. As a result, the computer apparatus can perform communication via the I/F 507. Under the control of the OS, the CPU 501 loads an application associated with encoding from the external storage device 506 into the RAM 502 and executes it, thereby functioning as the function units (except the holding unit 103 and the frame memory 108) shown in
Note that in this embodiment, description in which the computer apparatus having the configuration shown in
The numerical values, processing timings, processing orders, the main constituent of processing, the transmission destinations/transmission sources/storage locations of data (information), and the like used in the above-described embodiments and modifications are merely examples used to make a detailed description, and it is not intended to limit to these examples.
Some or all of the above-described embodiments or modifications may appropriately be combined and used. Also, some or all of the above-described embodiments or modifications may selectively be used.
Embodiment(s) of the present invention can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
According to the present invention, it is possible to provide a technique for performing quantization using an appropriate quantization matrix for mixed intra-inter prediction.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
Number | Date | Country | Kind
--- | --- | --- | ---
2022-046034 | Mar 2022 | JP | national
This application is a Continuation of International Patent Application No. PCT/JP2022/047079, filed Dec. 21, 2022, which claims the benefit of Japanese Patent Application No. 2022-046034, filed Mar. 22, 2022, both of which are hereby incorporated by reference herein in their entirety.
Relation | Number | Date | Country
--- | --- | --- | ---
Parent | PCT/JP2022/047079 | Dec 2022 | WO
Child | 18810814 | | US