The present invention relates to a video encoding device for and a video encoding method of compression-encoding and transmitting an image, and a video decoding device for and a video decoding method of decoding encoded data transmitted thereto from a video encoding device into an image.
Conventionally, according to international standard video encoding methods, such as MPEG (Moving Picture Experts Group) and “ITU-T H.26x,” an inputted video frame is partitioned into square blocks which are called macroblocks, and an intra-frame prediction, an inter-frame prediction, an orthogonal transformation of a prediction error signal, quantization, an entropy encoding process, and so on are carried out on each of the macroblocks. Further, after the processes on all the macroblocks are completed and one screenful of local decoded image is generated, a process of deriving loop filter parameters, an entropy encoding process, and a process of filtering the local decoded image based on the derived parameters are carried out.
The encoding process of encoding each macroblock is based on the premise that macroblocks are processed in a raster scan order, and the encoding process on a certain macroblock needs the encoded result of the macroblock previously processed in the raster scan order. Concretely, when carrying out an intra-frame prediction, a reference to a pixel from a local decoded image of an adjacent macroblock is made. Further, in the entropy encoding process, a probability switching model used for the estimation of the occurrence probability of a symbol is shared with the previously-processed macroblock in the raster scan order, and it is necessary to refer to the mode information of an adjacent macroblock for switching between probability models.
Therefore, in order to advance the encoding process on a certain macroblock, a part or all of the process on the macroblock previously processed in the raster scan order has to be completed. This interdependence between macroblocks is an obstacle to the parallelization of the encoding process and a decoding process. In order to solve the above-mentioned problem, nonpatent reference 1 discloses a technique of partitioning an inputted image (picture) into a plurality of rectangular regions (tiles), processing each macroblock within each tile in a raster scan order, and making it possible to carry out an encoding process or a decoding process in parallel on a per tile basis by eliminating the dependence between macroblocks respectively belonging to different tiles. Each tile consists of a plurality of macroblocks, and the size of each tile can be defined by only an integral multiple of a macroblock size.
Because the conventional video encoding device is constructed as above, the size of each tile at the time of partitioning a picture into a plurality of tiles (rectangular regions) is limited to an integral multiple of a macroblock size. A problem is therefore that when the size of a picture is not a preset integral multiple of a macroblock size, the picture cannot be partitioned into equal tiles, and the load of the encoding process on each tile differs depending upon the size of the tile and therefore the efficiency of parallelization drops. A further problem is that when an image specified by an integral multiple of a pixel number (1920 pixels×1080 pixels) defined for HDTV (High Definition Television), e.g., 3840 pixels×2160 pixels or 7680 pixels×4320 pixels, is encoded, the encoding cannot be implemented while the image is partitioned into tiles each having the HDTV size, depending upon the preset macroblock size, and therefore an input interface and equipment for use in HDTV in this device cannot be utilized.
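The HDTV constraint described above can be checked with simple arithmetic. The following is a minimal sketch, not part of the disclosed device: the helper name `can_tile` and the macroblock sizes 16 and 8 are illustrative assumptions chosen only to show when a picture can be partitioned into equal tiles whose dimensions are integral multiples of a macroblock size.

```python
# Hypothetical helper (not from the specification): checks whether a picture
# can be split into equal tiles when every tile dimension must be an
# integral multiple of the macroblock size.
def can_tile(picture_w, picture_h, tile_w, tile_h, mb_size):
    """Return True if tiles of tile_w x tile_h cover the picture exactly
    and each tile dimension is an integral multiple of mb_size."""
    covers = picture_w % tile_w == 0 and picture_h % tile_h == 0
    aligned = tile_w % mb_size == 0 and tile_h % mb_size == 0
    return covers and aligned


# A 3840x2160 picture split into four HDTV-sized (1920x1080) tiles.
# 1080 is not a multiple of 16, so an assumed 16-pixel macroblock rules this out.
print(can_tile(3840, 2160, 1920, 1080, mb_size=16))  # False
print(can_tile(3840, 2160, 1920, 1080, mb_size=8))   # True
```

With the assumed 16-pixel macroblock, 1080 / 16 = 67.5, so an HDTV-sized tile cannot be expressed as an integral number of macroblocks.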
The present invention is made in order to solve the above-mentioned problems, and it is therefore an object of the present invention to provide a video encoding device and a video encoding method capable of utilizing an input interface, equipment, etc. for use in HDTV in the above-mentioned device when the size of an inputted image is an integral multiple of the pixel number defined for HDTV. It is another object of the present invention to provide a video encoding device and a video encoding method capable of implementing a parallel encoding process without dropping the efficiency of parallelization even when the size of an inputted image is not an integral multiple of a macroblock size. It is a further object of the present invention to provide a video decoding device and a video decoding method that can be applied to the above-mentioned video encoding device and the above-mentioned video encoding method respectively.
In accordance with the present invention, there is provided a video encoding device including: a tile partitioner partitioning an inputted image into tiles each of which is a rectangular region having a specified size and outputting the tiles; an encoding controller determining an upper limit on a number of hierarchical layers when a coding block which is a unit to be processed at a time when a prediction process is carried out is hierarchically partitioned, and also determining a coding mode for determining an encoding method for each coding block; a block partitioner partitioning a tile outputted from the tile partitioner into coding blocks each having a predetermined size and also partitioning each of the coding blocks hierarchically until the number of hierarchical layers reaches the upper limit on the number of hierarchical layers which is determined by the encoding controller; a prediction image generator carrying out a prediction process on a coding block obtained through the partitioning by the block partitioner to generate a prediction image in the coding mode determined by the encoding controller; and an image compressor compressing a difference image between the coding block obtained through the partitioning by the block partitioner and the prediction image generated by the prediction image generator, and outputting compressed data about the difference image, in which a variable length encoder variable-length-encodes the compressed data, which are outputted from the image compressor, and the coding mode determined by the encoding controller and also variable-length-encodes tile information showing a size of each of the tiles outputted from the tile partitioner and a position of each of the tiles in the inputted image to generate a bitstream into which encoded data about the compressed data, encoded data about the coding mode, and encoded data about the tile information are multiplexed.
According to the present invention, because the video encoding device includes: the tile partitioner partitioning an inputted image into tiles each of which is a rectangular region having a specified size and outputting the tiles; the encoding controller determining an upper limit on a number of hierarchical layers when a coding block which is a unit to be processed at a time when a prediction process is carried out is hierarchically partitioned, and also determining a coding mode for determining an encoding method for each coding block; the block partitioner partitioning a tile outputted from the tile partitioner into coding blocks each having a predetermined size and also partitioning each of the coding blocks hierarchically until the number of hierarchical layers reaches the upper limit on the number of hierarchical layers which is determined by the encoding controller; the prediction image generator carrying out a prediction process on a coding block obtained through the partitioning by the block partitioner to generate a prediction image in the coding mode determined by the encoding controller; and the image compressor compressing a difference image between the coding block obtained through the partitioning by the block partitioner and the prediction image generated by the prediction image generator, and outputting compressed data about the difference image, and the variable length encoder is constructed in such a way as to variable-length-encode the compressed data, which are outputted from the image compressor, and the coding mode determined by the encoding controller and also variable-length-encode the tile information showing the size of each of the tiles outputted from the tile partitioner and the position of each of the tiles in the inputted image to generate a bitstream into which encoded data about the compressed data, encoded data about the coding mode, and encoded data about the tile information are multiplexed, there is provided an advantage of being able to utilize an input interface, equipment, etc. for use in HDTV in the above-mentioned device when the size of the inputted image is an integral multiple of a pixel number defined for HDTV.
a) shows a distribution of coding target blocks and prediction blocks obtained through partitioning, and
b) is an explanatory drawing showing a situation in which a coding mode m(Bn) is assigned through hierarchical partitioning;
Hereafter, the preferred embodiments of the present invention will be explained in detail with reference to the drawings.
The encoding controlling unit 2 has a function of accepting a setting of the tile size, and carries out a process of calculating the position of each tile in the inputted image on the basis of the size of the tile for which the encoding controller accepts a setting. The encoding controlling unit 2 further carries out a process of determining both the size of each coding target block (coding block) which is a unit to be processed at a time when a prediction process is carried out, and an upper limit on the number of hierarchical layers at a time when each coding target block is partitioned hierarchically, and also determining a coding mode having the highest coding efficiency for a coding target block outputted from a block partitioning unit 10 of the partition video encoding unit 3 from among one or more selectable intra coding modes and one or more selectable inter coding modes. The encoding controlling unit 2 also carries out a process of, when the coding mode with the highest coding efficiency is an intra coding mode, determining an intra prediction parameter which the video encoding device uses when carrying out an intra prediction process on the coding target block in the intra coding mode, and, when the coding mode with the highest coding efficiency is an inter coding mode, determining an inter prediction parameter which the video encoding device uses when carrying out an inter prediction process on the coding target block in the inter coding mode. The encoding controlling unit 2 further carries out a process of determining a prediction difference coding parameter to be provided for a transformation/quantization unit 15 and an inverse quantization/inverse transformation unit 16 of the partition video encoding unit 3. The encoding controlling unit 2 constructs an encoding controller.
The partition video encoding unit 3 carries out a process of, every time when receiving a tile from the tile partitioning unit 1, partitioning this tile into blocks (coding target blocks) each having the size determined by the encoding controlling unit 2, and performing a prediction process on each of the coding target blocks to generate a prediction image in the coding mode determined by the encoding controlling unit 2. The partition video encoding unit 3 also carries out a process of performing an orthogonal transformation process and a quantization process on a difference image between each of the coding target blocks and the prediction image to generate compressed data and outputting the compressed data to a variable length encoding unit 7, and also performing an inverse quantization process and an inverse orthogonal transformation process on the compressed data to generate a local decoded image and storing the local decoded image in an image memory 4. When storing the local decoded image in the image memory 4, the partition video encoding unit stores the local decoded image at an address, in the image memory 4, corresponding to the position of the tile calculated by the encoding controlling unit 2.
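The storing of each tile's local decoded image at an address corresponding to the tile position can be sketched as follows. This is a minimal illustration under stated assumptions: the picture is modelled as a list of pixel rows, tiles form a uniform grid clipped at the picture edge, and the names `tile_positions` and `store_tile` are hypothetical.

```python
def tile_positions(picture_w, picture_h, tile_w, tile_h):
    """Top-left (x, y) and clipped (w, h) of each tile, in raster scan order.
    A uniform tile grid is an assumption made for this sketch."""
    tiles = []
    for y in range(0, picture_h, tile_h):
        for x in range(0, picture_w, tile_w):
            w = min(tile_w, picture_w - x)
            h = min(tile_h, picture_h - y)
            tiles.append((x, y, w, h))
    return tiles


def store_tile(picture, tile_pixels, x0, y0):
    """Copy a tile's local decoded pixels into the picture buffer at (x0, y0)."""
    for dy, row in enumerate(tile_pixels):
        picture[y0 + dy][x0:x0 + len(row)] = row


# An 8x4 picture buffer standing in for the image memory, split into 4x2 tiles.
picture = [[0] * 8 for _ in range(4)]
for index, (x, y, w, h) in enumerate(tile_positions(8, 4, 4, 2)):
    # Stand-in for a real local decoded tile: every pixel holds the tile number.
    decoded = [[index + 1] * w for _ in range(h)]
    store_tile(picture, decoded, x, y)

for row in picture:
    print(row)
```

Because each tile writes only to its own region of the buffer, tiles can be stored in any order, which is what allows the per-tile processes to run independently.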
The image memory 4 is a recording medium for storing the local decoded image generated by the partition video encoding unit 3. When the encoding on all the tiles in the picture is completed and the one picture of local decoded image is written in the image memory 4, a loop filter unit 5 carries out a process of performing a predetermined filtering process on the one picture of local decoded image, and outputting the local decoded image on which the loop filter unit performs the filtering process. A motion-compensated prediction frame memory 6 is a recording medium for storing the local decoded image on which the loop filter unit 5 performs the filtering process.
The variable length encoding unit 7 carries out a process of variable-length-encoding tile information outputted from the encoding controlling unit 2 and showing the rectangular region size of each tile and the position of each tile in the picture, coding parameters of each coding target block outputted from the encoding controlling unit 2 (a coding mode, an intra prediction parameter or an inter prediction parameter, and a prediction difference coding parameter), and encoded data about each coding target block outputted from the partition video encoding unit 3 (compressed data and motion information (when the coding mode is an inter coding mode)) to generate a bitstream into which the results of encoding those data are multiplexed. The variable length encoding unit 7 also carries out a process of variable-length-encoding a confirmation flag for partitioning showing whether the tile partitioning unit 1 partitions the picture into tiles to generate a bitstream into which the result of encoding the confirmation flag for partitioning is multiplexed. However, because it is not necessary to transmit the confirmation flag for partitioning to a video decoding device when the tile partitioning unit 1 partitions each picture into tiles at all times, the variable length encoding unit does not variable-length-encode the confirmation flag for partitioning. The variable length encoding unit 7 includes a motion vector variable length encoding unit 7a that variable-length-encodes a motion vector outputted from a motion-compensated prediction unit 13 of the partition video encoding unit 3 therein. The variable length encoding unit 7 constructs a variable length encoder.
The block partitioning unit 10 carries out a process of, every time when receiving a tile from the tile partitioning unit 1, partitioning this tile into coding target blocks each having the size determined by the encoding controlling unit 2, and outputting each of the coding target blocks. More specifically, the block partitioning unit 10 carries out a process of partitioning a tile outputted from the tile partitioning unit 1 into largest coding blocks each of which is a coding target block having the largest size determined by the encoding controlling unit 2, and also partitioning each of the largest coding blocks into blocks hierarchically until the number of hierarchical layers reaches the upper limit on the number of hierarchical layers which is determined by the encoding controlling unit 2. The block partitioning unit 10 constructs a block partitioner.
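The hierarchical partitioning of a largest coding block can be pictured as a quadtree recursion bounded by the upper limit on the number of hierarchical layers. The sketch below is only an illustration: the function name `quadtree_split`, the `should_split` callback (standing in for whatever decision the encoding controlling unit makes), and counting layers from 0 are assumptions.

```python
def quadtree_split(x, y, size, depth, max_depth, should_split):
    """Split a square block into four quadrants recursively.

    Recursion stops when depth reaches max_depth (the upper limit on the
    number of hierarchical layers) or when should_split, a hypothetical
    decision callback, declines. Returns leaf blocks as (x, y, size, depth).
    """
    if depth >= max_depth or not should_split(x, y, size, depth):
        return [(x, y, size, depth)]
    half = size // 2
    leaves = []
    for oy in (0, half):          # top row of quadrants first (raster order)
        for ox in (0, half):
            leaves += quadtree_split(x + ox, y + oy, half,
                                     depth + 1, max_depth, should_split)
    return leaves


# Example: a 64x64 largest coding block, at most 2 layers of splitting,
# where only the block at the top-left corner is split further.
leaves = quadtree_split(0, 0, 64, 0, 2, lambda x, y, size, depth: x == 0 and y == 0)
for leaf in leaves:
    print(leaf)
```

In this example the leaves are four 16x16 blocks in the top-left quadrant plus three 32x32 blocks, and together they cover the 64x64 block exactly.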
A select switch 11 carries out a process of, when the coding mode determined by the encoding controlling unit 2 is an intra coding mode, outputting the coding target block outputted from the block partitioning unit 10 to an intra prediction unit 12, and, when the coding mode determined by the encoding controlling unit 2 is an inter coding mode, outputting the coding target block outputted from the block partitioning unit 10 to a motion-compensated prediction unit 13.
The intra prediction unit 12 carries out a process of performing an intra prediction process on the coding target block outputted from the select switch 11 by using the intra prediction parameter determined by the encoding controlling unit 2 while referring to a local decoded image stored in a memory 18 for intra prediction to generate an intra prediction image (prediction image). The motion-compensated prediction unit 13 carries out a process of comparing the coding target block outputted from the select switch 11 with the local decoded image which is stored in the motion-compensated prediction frame memory 6 and on which a filtering process is carried out to search for a motion vector, and performing an inter prediction process (motion-compensated prediction process) on the coding target block by using both the motion vector and the inter prediction parameter determined by the encoding controlling unit 2 to generate an inter prediction image (prediction image). A prediction image generator is comprised of the intra prediction unit 12 and the motion-compensated prediction unit 13.
A subtracting unit 14 carries out a process of subtracting the intra prediction image generated by the intra prediction unit 12 or the inter prediction image generated by the motion-compensated prediction unit 13 from the coding target block outputted from the block partitioning unit 10, and outputting a prediction difference signal showing a difference image which is the result of the subtraction to the transformation/quantization unit 15. The transformation/quantization unit 15 carries out a process of performing an orthogonal transformation process (e.g., a DCT (discrete cosine transform) or an orthogonal transformation process, such as a KL transform, in which bases are designed for a specific learning sequence in advance) on the prediction difference signal outputted from the subtracting unit 14 by referring to the prediction difference coding parameter determined by the encoding controlling unit 2 to calculate transform coefficients, and also quantizing the transform coefficients by referring to the prediction difference coding parameter and then outputting compressed data which are the transform coefficients quantized thereby (quantization coefficients of the difference image) to the inverse quantization/inverse transformation unit 16 and the variable length encoding unit 7. An image compressor is comprised of the subtracting unit 14 and the transformation/quantization unit 15.
The inverse quantization/inverse transformation unit 16 carries out a process of inverse-quantizing the compressed data outputted from the transformation/quantization unit 15 by referring to the prediction difference coding parameter determined by the encoding controlling unit 2, and also performing an inverse orthogonal transformation process on the transform coefficients which are the compressed data inverse-quantized thereby by referring to the prediction difference coding parameter to calculate a local decoded prediction difference signal corresponding to the prediction difference signal outputted from the subtracting unit 14. An adding unit 17 carries out a process of adding the image shown by the local decoded prediction difference signal calculated by the inverse quantization/inverse transformation unit 16 and the intra prediction image generated by the intra prediction unit 12 or the inter prediction image generated by the motion-compensated prediction unit 13 to calculate a local decoded image corresponding to the coding target block outputted from the block partitioning unit 10. The memory 18 for intra prediction is a recording medium for storing the local decoded image calculated by the adding unit 17.
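The path from the subtracting unit through quantization and back to the adding unit can be sketched as a round trip. This is a simplified illustration, not the disclosed process: the orthogonal transformation is deliberately omitted, a uniform scalar quantizer with a single `step` stands in for the prediction difference coding parameter, and blocks are flat lists of samples. All function names are hypothetical.

```python
def encode_block(block, prediction, step):
    """Subtracting unit followed by a uniform quantizer.
    The orthogonal transform is omitted in this sketch."""
    residual = [b - p for b, p in zip(block, prediction)]
    return [round(r / step) for r in residual]


def reconstruct_block(levels, prediction, step):
    """Inverse quantizer followed by the adding unit: yields the
    local decoded samples for the block."""
    return [p + q * step for p, q in zip(prediction, levels)]


block = [104, 111, 97, 89]          # samples of the coding target block
prediction = [100, 100, 100, 100]   # prediction image samples
levels = encode_block(block, prediction, step=4)
local_decoded = reconstruct_block(levels, prediction, step=4)

print(levels)         # [1, 3, -1, -3]
print(local_decoded)  # [104, 112, 96, 88]
```

The local decoded samples differ from the input by at most half the quantization step, and the same reconstruction can be repeated from the quantized levels and the prediction alone, which is why the encoder keeps a local decoded image rather than the original for later predictions.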
A motion vector predicted vector determining unit 22 carries out a process of determining a predicted vector candidate which is the nearest to the motion vector of the coding target block as a predicted vector from among the one or more predicted vector candidates calculated by the motion vector predicted vector candidate calculating unit 21, and outputting the predicted vector to a motion vector difference calculating unit 23, and also outputting an index (predicted vector index) showing the predicted vector to an entropy encoding unit 24.
The motion vector difference calculating unit 23 carries out a process of calculating a difference vector between the predicted vector outputted from the motion vector predicted vector determining unit 22 and the motion vector of the coding target block. The entropy encoding unit 24 carries out a process of performing variable length encoding, such as arithmetic coding, on the difference vector calculated by the motion vector difference calculating unit 23 and the predicted vector index outputted from the motion vector predicted vector determining unit 22 to generate a motion vector information code word, and outputting the motion vector information code word.
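The selection of a predicted vector and the calculation of the difference vector can be sketched as below. The specification only says the candidate "nearest to" the motion vector is chosen; the sum of absolute component differences used here as the distance, and the name `choose_predictor`, are assumptions made for illustration.

```python
def choose_predictor(motion_vector, candidates):
    """Pick the predicted vector candidate nearest to motion_vector.

    Returns (predicted_vector_index, predicted_vector, difference_vector).
    Distance is the sum of absolute component differences (an assumption).
    """
    def distance(candidate):
        return (abs(motion_vector[0] - candidate[0])
                + abs(motion_vector[1] - candidate[1]))

    index = min(range(len(candidates)), key=lambda i: distance(candidates[i]))
    predicted = candidates[index]
    difference = (motion_vector[0] - predicted[0],
                  motion_vector[1] - predicted[1])
    return index, predicted, difference


index, predicted, difference = choose_predictor((5, -3), [(0, 0), (4, -2), (8, 1)])
print(index, predicted, difference)  # 1 (4, -2) (1, -1)
```

Only the index and the difference vector would then be passed on for entropy encoding; a nearer candidate gives a smaller difference vector and hence fewer bits to encode.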
In the example shown in
A partition video decoding unit 31 carries out a process of performing a decoding process on a per tile basis to generate a decoded image on the basis of the compressed data, the coding mode, the intra prediction parameter or the inter prediction parameter and the motion vector, and the prediction difference coding parameter, which are variable-length-decoded on a per tile basis by the variable length decoding unit 30, and storing the decoded image in an image memory 32. When storing the decoded image in the image memory 32, the partition video decoding unit stores the decoded image at an address, in the image memory 32, corresponding to the position of the tile currently being processed, the position being indicated by the tile information. The image memory 32 is a recording medium for storing the decoded image generated by the partition video decoding unit 31. The image memory 32 constructs a decoded image storage.
A loop filter unit 33 carries out a process of, when the decoding on all the tiles in the picture is completed and the one picture of decoded image is written in the image memory 32, performing a predetermined filtering process on the one picture of decoded image, and outputting the decoded image on which the loop filter unit performs the filtering process. A motion-compensated prediction frame memory 34 is a recording medium for storing the decoded image on which the loop filter unit 33 performs the filtering process.
The intra prediction unit 42 carries out a process of performing an intra prediction process on a decoding target block (block corresponding to a “coding target block” in the video encoding device shown in
An inverse quantization/inverse transformation unit 44 carries out a process of inverse-quantizing the compressed data variable-length-decoded by the variable length decoding unit 30 by referring to the prediction difference coding parameter variable-length-decoded by the variable length decoding unit 30, and also performing an inverse orthogonal transformation process on transform coefficients which are the compressed data inverse-quantized thereby by referring to the prediction difference coding parameter to calculate a decoded prediction difference signal. An adding unit 45 carries out a process of adding an image shown by the decoded prediction difference signal calculated by the inverse quantization/inverse transformation unit 44 and the intra prediction image generated by the intra prediction unit 42 or the inter prediction image generated by the motion compensation unit 43 to calculate a decoded image of the decoding target block. A decoded image generator is comprised of the inverse quantization/inverse transformation unit 44 and the adding unit 45. The memory 46 for intra prediction is a recording medium for storing the decoded image calculated by the adding unit 45.
A motion vector predicted vector determining unit 53 carries out a process of selecting the predicted vector candidate shown by the predicted vector index variable-length-decoded by the entropy decoding unit 51 from the one or more predicted vector candidates calculated by the motion vector predicted vector candidate calculating unit 52, and outputting the predicted vector candidate as a predicted vector. A motion vector calculating unit 54 carries out a process of adding the predicted vector outputted from the motion vector predicted vector determining unit 53 and the difference vector variable-length-decoded by the entropy decoding unit 51 to calculate a motion vector of the decoding target block.
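On the decoding side, the motion vector is recovered by selecting the candidate named by the predicted vector index and adding the difference vector to it. A minimal sketch, assuming the decoder has derived the same candidate list as the encoder; the function name `reconstruct_motion_vector` is hypothetical.

```python
def reconstruct_motion_vector(predicted_vector_index, difference_vector, candidates):
    """Select the candidate indicated by the decoded index and add the
    decoded difference vector to obtain the motion vector."""
    predicted = candidates[predicted_vector_index]
    return (predicted[0] + difference_vector[0],
            predicted[1] + difference_vector[1])


# Candidates assumed identical to those computed on the encoding side.
candidates = [(0, 0), (4, -2), (8, 1)]
motion_vector = reconstruct_motion_vector(1, (1, -1), candidates)
print(motion_vector)  # (5, -3)
```

The reconstruction is exact only if both sides compute identical candidate lists, which is why the candidates are derived from already decoded information rather than transmitted.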
In the example shown in
Next, operations will be explained. In this Embodiment 1, an example in which the video encoding device receives each frame image (picture) of a video as an inputted image, partitions the picture into one or more tiles each of which is a rectangular region, carries out a motion-compensated prediction and so on between adjacent frames on a per tile basis, and performs a compression process with an orthogonal transformation and quantization on an acquired prediction difference signal, and, after that, carries out variable length encoding to generate a bitstream, and the video decoding device decodes the bitstream outputted from the video encoding device will be explained.
The video encoding device shown in
The partition video encoding unit 3 of the video encoding device shown in
In the encoding process, a prediction difference signal having small signal power and small entropy is generated by using a temporal and spatial prediction, thereby reducing the whole code amount, and the code amount of a parameter used for the prediction can be reduced as long as the parameter can be applied uniformly to as large an image signal region as possible. On the other hand, because the amount of errors occurring in the prediction increases when the same prediction parameter is applied to a large image area in an image signal pattern having a large change in time and space, the code amount of the prediction difference signal increases. Therefore, in an image area having a large change in time and space, it is desirable to reduce the block size of a block which is subjected to the prediction process, thereby increasing the data volume of the parameter which is used for the prediction and reducing the power and entropy of the prediction difference signal.
The video encoding device in accordance with this Embodiment 1 is constructed in such a way as to, in order to carry out encoding adapted for these typical characteristics of a video signal, hierarchically partition each tile which is an image obtained through the partitioning, and adapt a prediction process and an encoding process on a prediction difference for each region obtained through the partitioning. The video encoding device is further constructed in such a way as to, in consideration of the continuity within the picture of each region obtained through the partitioning, be able to refer to information to be referred to in a temporal direction (e.g., a motion vector) over a boundary between regions obtained through the partitioning and throughout the whole of a reference picture.
A video signal having a format which is to be processed by the video encoding device shown in
In the following explanation, for convenience' sake, the video signal of the inputted image is a YUV signal unless otherwise specified. Further, a case in which signals having a 4:2:0 format which are subsampled are handled as the two color difference components U and V with respect to the luminance component Y will be described. Further, a data unit to be processed which corresponds to each frame of the video signal is referred to as a “picture.” In this Embodiment 1, although an explanation will be made in which a “picture” is a video frame signal on which progressive scanning is carried out, a “picture” can be alternatively a field image signal which is a unit which constructs a video frame when the video signal is an interlaced signal.
First, the processing carried out by the video encoding device shown in
When receiving the video signal showing a picture, the tile partitioning unit 1 partitions the picture into tiles each of which has the size determined by the encoding controlling unit 2, and outputs each of the tiles to the partition video encoding unit 3 in order (step ST3). The encoding controlling unit 2 can set the size of each tile at the time of partitioning the picture into one or more tiles in steps of a pixel. The encoding controlling unit can alternatively set the size of each tile in steps of a minimum coding block size which is determined on the basis of the upper limit on the number of hierarchical layers with which to hierarchically partition each largest coding block, which will be mentioned below, into blocks. As an alternative, the encoding controlling unit can arbitrarily set the tile step size to a power of 2. For example, in the case of 2 to the 0th power, the encoding controlling unit can set the size of each tile in steps of one pixel, and, in the case of 2 to the 2nd power, the encoding controlling unit can set the size of each tile in steps of four pixels. In this case, the video encoding device can encode the exponent (i.e., the logarithm of the tile step size) as a parameter showing the tile step size, and encode the size of each tile on the basis of the tile step size. For example, in a case in which the tile step size is 8 pixels, the size of each tile can be set to an integral multiple of the tile step size, i.e., an integral multiple of 8, and values obtained by dividing the height and width of each tile by 8 are encoded as tile size information. As an alternative, the tile partitioning unit can partition the picture into small blocks each having the tile step size, and then partition the picture into tiles at the position of one of the small blocks which are numbered one by one in a raster scan order (
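The power-of-2 tile step size scheme described above can be sketched as follows: the exponent is signalled once, and each tile dimension is signalled as a count of steps. The function names and the tuple layout are assumptions for illustration and do not represent the actual bitstream syntax.

```python
def encode_tile_size(tile_w, tile_h, step):
    """Return (log2_step, width_in_steps, height_in_steps).

    step must be a power of 2 and both tile dimensions must be
    integral multiples of it.
    """
    if step <= 0 or step & (step - 1):
        raise ValueError("tile step size must be a power of 2")
    if tile_w % step or tile_h % step:
        raise ValueError("tile size must be an integral multiple of the step size")
    log2_step = step.bit_length() - 1   # exponent, e.g. 3 for a step of 8
    return log2_step, tile_w // step, tile_h // step


def decode_tile_size(log2_step, width_in_steps, height_in_steps):
    """Recover the tile size in pixels from the signalled values."""
    step = 1 << log2_step
    return width_in_steps * step, height_in_steps * step


# An HDTV-sized tile with a tile step size of 8 pixels.
info = encode_tile_size(1920, 1080, 8)
print(info)                     # (3, 240, 135)
print(decode_tile_size(*info))  # (1920, 1080)
```

A larger step size makes the signalled width and height values smaller, at the cost of coarser control over the tile size.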
The encoding controlling unit 2 further determines the size of a largest coding block which is used for encoding of a tile which is the target to be encoded, and the upper limit on the number of hierarchical layers with which each largest coding block is hierarchically partitioned into blocks (step ST4). As a method of determining the size of a largest coding block, for example, there can be a method of determining an identical size for all the tiles in the picture, and a method of quantifying a difference in the complexity of a local movement in a tile of the video signal as a parameter, and determining a small size for a tile having a vigorous motion while determining a large size for a tile having few motions. As a method of determining the upper limit on the number of hierarchical layers for partitioning, there can be a method of adaptively determining the upper limit for each tile by, for example, increasing the number of hierarchical layers so that a finer motion can be detected when the video signal in the tile has a vigorous motion, and reducing the number of hierarchical layers when the video signal in the tile has few motions.
Every time when receiving a tile from the tile partitioning unit 1, the block partitioning unit 10 of the partition video encoding unit 3 partitions the tile into image regions each having the largest coding block size determined by the encoding controlling unit 2. After the block partitioning unit 10 partitions the tile into image regions each having the largest coding block size, for each of the image regions having the largest coding block size, the encoding controlling unit 2 determines a coding mode for each of coding target blocks, each having a coding block size, into which the above-mentioned image region is partitioned hierarchically until the number of hierarchical layers reaches the upper limit on the number of hierarchical layers for partitioning determined previously (step ST5).
Hereafter, the coding block size determined by the encoding controlling unit 2 is defined as the size (L^n, M^n) in the luminance component of each coding target block in the nth hierarchical layer. Because quadtree partitioning is carried out, (L^{n+1}, M^{n+1}) = (L^n/2, M^n/2) is always established. In the case of a color video image signal (4:4:4 format), such as an RGB signal, in which all the color components have the same sample number, all the color components have a size of (L^n, M^n), while in the case of handling a 4:2:0 format, a corresponding color difference component has a coding block size of (L^n/2, M^n/2).
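The halving of the coding block size at each hierarchical layer, and the corresponding color difference block size, can be tabulated with a short sketch. The function name `block_sizes` and the example values (a 64x64 largest coding block and three layers of partitioning) are assumptions for illustration.

```python
def block_sizes(l0, m0, max_layers, chroma_format="4:2:0"):
    """List (layer, luma_size, chroma_size) for layers 0..max_layers.

    Each quadtree layer halves both luminance dimensions. For 4:2:0 the
    color difference block is half the luminance block in each dimension;
    for 4:4:4 all components share the luminance size.
    """
    if chroma_format not in ("4:2:0", "4:4:4"):
        raise ValueError("only 4:2:0 and 4:4:4 are handled in this sketch")
    sizes = []
    for n in range(max_layers + 1):
        luma = (l0 >> n, m0 >> n)
        if chroma_format == "4:4:4":
            chroma = luma
        else:
            chroma = (luma[0] // 2, luma[1] // 2)
        sizes.append((n, luma, chroma))
    return sizes


for layer, luma, chroma in block_sizes(64, 64, 3):
    print(layer, luma, chroma)
# 0 (64, 64) (32, 32)
# 1 (32, 32) (16, 16)
# 2 (16, 16) (8, 8)
# 3 (8, 8) (4, 4)
```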
Hereafter, each coding target block in the nth hierarchical layer is expressed as Bn, and a coding mode selectable for each coding target block Bn is expressed as m(Bn). In the case of a color video signal which consists of a plurality of color components, the coding mode m(Bn) can be formed in such a way that an individual mode is used for each color component, or can be formed in such a way that a common mode is used for all the color components. Hereafter, an explanation will be made by assuming that the coding mode indicates the one for the luminance component of a coding block having a 4:2:0 format in a YUV signal unless otherwise specified.
The coding mode m(Bn) can be one of one or more intra coding modes (generically referred to as “INTRA”) or one or more inter coding modes (generically referred to as “INTER”), and the encoding controlling unit 2 selects, as the coding mode m(Bn), a coding mode with the highest coding efficiency for each coding target block Bn from among all the coding modes available in the picture currently being processed or a subset of these coding modes.
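The description does not state how coding efficiency is measured. One common choice, assumed here purely for illustration, is a Lagrangian cost J = D + λR combining distortion D and rate R; the mode names, the cost inputs, and the function name below are hypothetical.

```python
def select_coding_mode(candidates, lam):
    """Pick the mode with the lowest assumed cost J = D + lam * R.

    candidates: dict mapping a mode name to (distortion, rate_in_bits).
    lam: Lagrange multiplier trading rate against distortion (assumed).
    """
    return min(candidates,
               key=lambda mode: candidates[mode][0] + lam * candidates[mode][1])
```

With `{"INTRA_0": (100.0, 40), "INTER_0": (60.0, 70)}`, a multiplier of 1.0 favors the inter mode (cost 130 against 140), while a multiplier of 2.0 favors the intra mode (cost 180 against 200).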
Each coding target block Bn is further partitioned into one or more units for prediction process (partitions) by the block partitioning unit 10, as shown in
The encoding controlling unit 2 generates such a block partitioning state as shown in, for example,
When the coding mode m(Bn) determined by the encoding controlling unit 2 is an intra coding mode (in the case of m(Bn)∈INTRA), the select switch 11 outputs the coding target block Bn outputted from the block partitioning unit 10 to the intra prediction unit 12. In contrast, when the coding mode m(Bn) determined by the encoding controlling unit 2 is an inter coding mode (in the case of m(Bn)∈INTER), the select switch 11 outputs the coding target block Bn outputted from the block partitioning unit 10 to the motion-compensated prediction unit 13.
When the coding mode m(Bn) determined by the encoding controlling unit 2 is an intra coding mode (in the case of m(Bn)∈INTRA), and the intra prediction unit 12 receives the coding target block Bn from the select switch 11 (step ST6), the intra prediction unit 12 carries out an intra prediction process on each partition Pin in the coding target block Bn by using the intra prediction parameter determined by the encoding controlling unit 2 while referring to the local decoded image stored in the memory 18 for intra prediction to generate an intra prediction image PINTRAin (step ST7). Because the video decoding device needs to generate an intra prediction image which is completely the same as the intra prediction image PINTRAin, the intra prediction parameter used for the generation of the intra prediction image PINTRAin is outputted from the encoding controlling unit 2 to the variable length encoding unit 7 and is multiplexed into the bitstream.
When the coding mode m(Bn) determined by the encoding controlling unit 2 is an inter coding mode (in the case of m(Bn)∈INTER), and the motion-compensated prediction unit 13 receives the coding target block Bn from the select switch 11 (step ST6), the motion-compensated prediction unit 13 compares each partition Pin in the coding target block Bn with the local decoded image which is stored in the motion-compensated prediction frame memory 6 and on which a filtering process is carried out to search for a motion vector, and carries out an inter prediction process on each partition Pin in the coding target block Bn by using both the motion vector and the inter prediction parameter determined by the encoding controlling unit 2 to generate an inter prediction image PINTERin (step ST8). The local decoded image stored in the motion-compensated prediction frame memory 6 is one picture of local decoded image, and the motion-compensated prediction unit 13 can generate an inter prediction image PINTERin in such a way that the inter prediction image extends over a tile boundary.
Further, because the video decoding device needs to generate an inter prediction image which is completely the same as the inter prediction image PINTERin, the inter prediction parameter used for the generation of the inter prediction image PINTERin is outputted from the encoding controlling unit 2 to the variable length encoding unit 7 and is multiplexed into the bitstream. The motion vector which is searched for by the motion-compensated prediction unit 13 is also outputted to the variable length encoding unit 7 and is multiplexed into the bitstream.
When receiving the coding target block Bn from the block partitioning unit 10, the subtracting unit 14 subtracts the intra prediction image PINTRAin generated by the intra prediction unit 12 or the inter prediction image PINTERin generated by the motion-compensated prediction unit 13 from each partition Pin in the coding target block Bn, and outputs a prediction difference signal ein showing a difference image which is the result of the subtraction to the transformation/quantization unit 15 (step ST9).
When receiving the prediction difference signal ein from the subtracting unit 14, the transformation/quantization unit 15 carries out an orthogonal transformation process (e.g., a DCT (discrete cosine transform) or an orthogonal transformation process, such as a KL transform, in which bases are designed for a specific learning sequence in advance) on the prediction difference signal ein by referring to the prediction difference coding parameter determined by the encoding controlling unit 2 to calculate transform coefficients (step ST10). The transformation/quantization unit 15 also quantizes the transform coefficients by referring to the prediction difference coding parameter and then outputs compressed data which are the transform coefficients quantized thereby to the inverse quantization/inverse transformation unit 16 and the variable length encoding unit 7 (step ST10).
When receiving the compressed data from the transformation/quantization unit 15, the inverse quantization/inverse transformation unit 16 inverse-quantizes the compressed data by referring to the prediction difference coding parameter determined by the encoding controlling unit 2 (step ST11). The inverse quantization/inverse transformation unit 16 also carries out an inverse orthogonal transformation process (e.g., an inverse DCT or an inverse KL transform) on the transform coefficients which are the compressed data inverse-quantized thereby by referring to the prediction difference coding parameter to calculate a local decoded prediction difference signal corresponding to the prediction difference signal ein outputted from the subtracting unit 14 (step ST11).
When receiving the local decoded prediction difference signal from the inverse quantization/inverse transformation unit 16, the adding unit 17 adds an image shown by the local decoded prediction difference signal and the intra prediction image PINTRAin generated by the intra prediction unit 12 or the inter prediction image PINTERin generated by the motion-compensated prediction unit 13 to calculate a local decoded image corresponding to the coding target block Bn outputted from the block partitioning unit 10 as a local decoded partition image or a group of local decoded partition images (step ST12). The adding unit 17 stores the local decoded image in the image memory 4, and also stores the local decoded image in the memory 18 for intra prediction. This local decoded image is an image signal for subsequent intra prediction.
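Steps ST9 to ST12 can be sketched end to end for one square block as follows. This is an illustration under stated assumptions: an orthonormal DCT-II is used as the orthogonal transformation, quantization is a uniform division by a single step `qstep` with rounding, and all function names are hypothetical. The description itself leaves the transform and the quantization controlled by the prediction difference coding parameter.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def encode_and_reconstruct(block, prediction, qstep):
    """Return (quantized levels, local decoded block) for a square block."""
    n = len(block)
    c = dct_matrix(n)
    ct = transpose(c)
    # ST9: prediction difference signal
    residual = [[block[y][x] - prediction[y][x] for x in range(n)]
                for y in range(n)]
    # ST10: orthogonal transformation, then quantization (compressed data)
    coeffs = matmul(matmul(c, residual), ct)
    levels = [[round(v / qstep) for v in row] for row in coeffs]
    # ST11: inverse quantization, then inverse orthogonal transformation
    dequant = [[lv * qstep for lv in row] for row in levels]
    rec_residual = matmul(matmul(ct, dequant), c)
    # ST12: add the prediction image to obtain the local decoded image
    recon = [[prediction[y][x] + rec_residual[y][x] for x in range(n)]
             for y in range(n)]
    return levels, recon
```

Because the encoder reconstructs from the quantized levels rather than from the original residual, the local decoded image matches what a decoder can produce from the same compressed data.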
The loop filter unit 5 carries out a predetermined filtering process on the local decoded image stored in the image memory 4, and stores the local decoded image on which the loop filter unit carries out the filtering process in the motion-compensated prediction frame memory 6 (step ST16). The filtering process by the loop filter unit 5 can be carried out on each largest coding block of the local decoded image inputted thereto or each coding target block of the local decoded image inputted thereto. As an alternative, after one picture of local decoded image is inputted, the loop filter unit can carry out the filtering process on the one picture of local decoded image at a time. Further, as an example of the predetermined filtering process, there can be provided a process of filtering a block boundary in such a way as to make discontinuity (block noise) at the block boundary unobtrusive, and a filtering process of compensating for a distortion occurring in the local decoded image in such a way that an error between the picture shown by the video signal inputted and the local decoded image is minimized. However, because the loop filter unit 5 needs to refer to the video signal showing the picture when carrying out the filtering process of compensating for a distortion occurring in the local decoded image in such a way that an error between the picture and the local decoded image is minimized, there is a necessity to modify the video encoding device shown in
The video encoding device repeatedly carries out the processes of steps ST6 to ST12 until the video encoding device completes the processing on all the coding blocks Bn into which the inputted image is partitioned hierarchically, and, when completing the processing on all the coding blocks Bn, shifts to a process of step ST15 (steps ST13 and ST14).
The variable length encoding unit 7 carries out a process of variable-length-encoding the tile information outputted from the encoding controlling unit 2 and showing the rectangular region size of each tile and the position of each tile in the picture (the tile information includes an initialization instruction flag for arithmetic coding process, and a flag showing whether or not to allow a reference to a decoded pixel over a tile boundary and a reference to various coding parameters over a tile boundary, in addition to the information showing the size and the position of each tile), the coding parameters of each coding target block outputted from the encoding controlling unit 2 (the coding mode, the intra prediction parameter or the inter prediction parameter, and the prediction difference coding parameter), and the encoded data about each coding target block outputted from the partition video encoding unit 3 (the compressed data and the motion information (when the coding mode is an inter coding mode)) to generate a bitstream into which the results of the encoding are multiplexed. The variable length encoding unit 7 also variable-length-encodes the confirmation flag for partitioning showing whether the tile partitioning unit 1 partitions the picture into tiles to generate a bitstream into which the result of encoding the confirmation flag for partitioning is multiplexed. However, when the tile partitioning unit 1 does not partition a picture into tiles at all times, the video encoding device does not carry out variable length encoding on the confirmation flag for partitioning because the video encoding device does not need to transmit the confirmation flag for partitioning to the video decoding device.
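The tile-related header items listed above can be sketched as a small data structure and a writer. Everything concrete here is an assumption for illustration: the field names, the per-tile placement of the two flags, the field order, and the use of an unsigned Exp-Golomb code as the variable length code. The `send_confirmation_flag` parameter models the case in which the confirmation flag for partitioning need not be transmitted.

```python
from dataclasses import dataclass

@dataclass
class TileInfo:
    x: int                             # position of the tile in the picture
    y: int
    width: int                         # rectangular region size of the tile
    height: int
    init_arithmetic_coding: bool       # initialization instruction flag
    allow_cross_tile_reference: bool   # reference over a tile boundary allowed

def ue(value):
    """Unsigned Exp-Golomb code of a non-negative integer, as a bit string."""
    code = bin(value + 1)[2:]
    return "0" * (len(code) - 1) + code

def write_tile_header(tiles, send_confirmation_flag=True):
    """Return the header bits for a picture (hypothetical layout)."""
    bits = ""
    if send_confirmation_flag:
        bits += "1" if tiles else "0"   # confirmation flag for partitioning
    if tiles:
        bits += ue(len(tiles) - 1)
        for t in tiles:
            bits += ue(t.x) + ue(t.y) + ue(t.width) + ue(t.height)
            bits += "1" if t.init_arithmetic_coding else "0"
            bits += "1" if t.allow_cross_tile_reference else "0"
    return bits
```

When the flag is not sent, the header is exactly one bit shorter and the decoder must know the partitioning decision by other means.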
Next, the process carried out by the intra prediction unit 12 will be explained in detail.
The intra prediction unit 12 carries out an intra prediction process on a partition Pin by referring to the intra prediction parameter of the partition Pin to generate an intra prediction image PINTRAin. Hereafter, an intra prediction process of generating an intra prediction signal of the luminance signal on the basis of the intra prediction parameter (intra prediction mode) for the luminance signal of the partition Pin will be explained.
Hereafter, the partition Pin is assumed to have a size of lin×min pixels.
When an index value indicating the intra prediction mode for the partition Pin is 2 (average prediction), the intra prediction unit generates a prediction image by using the average of the adjacent pixels in the upper partition and the adjacent pixels in the left partition as the predicted value of each pixel in the partition Pin. When the index value indicating the intra prediction mode is other than 2 (average prediction), the intra prediction unit generates the predicted value of each pixel in the partition Pin on the basis of a prediction direction vector υp=(dx, dy) shown by the index value. When relative coordinates in the partition Pin (the upper left pixel of the partition is defined as the point of origin) of each pixel (prediction target pixel) for which the predicted value is generated are expressed as (x, y), the position of a reference pixel which is used for prediction is the point of intersection where the following line L and a line of adjacent pixels intersect each other.

$$L = \begin{pmatrix} x \\ y \end{pmatrix} + k\,\upsilon_p$$

where k is a scalar value.
When the reference pixel is at an integer pixel position, the value of the corresponding integer pixel is determined as the predicted value of the prediction target pixel, whereas when the reference pixel is not at an integer pixel position, the value of an interpolation pixel generated from the integer pixels which are adjacent to the reference pixel is determined as the predicted value of the prediction target pixel. In the example of
The intra prediction unit also carries out an intra prediction process based on the intra prediction parameter (intra prediction mode) on each of the color difference signals of the partition Pin according to the same procedure as that according to which the intra prediction unit carries out the intra prediction process on the luminance signal, and outputs the intra prediction parameter used for the generation of the intra prediction image to the variable length encoding unit 7.
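The two kinds of intra prediction described above can be sketched as follows. Several points are assumptions for this example: only the row of adjacent pixels above the partition (y = -1) is used as the line of adjacent pixels, `top[i]` is taken to hold the pixel at (i, -1), and a non-integer reference position is interpolated linearly from its two integer neighbors. The function names are hypothetical.

```python
import math

def average_prediction(top, left, width, height):
    """Average prediction: every pixel gets the mean of the adjacent pixels
    in the upper partition and in the left partition."""
    neighbours = list(top[:width]) + list(left[:height])
    value = sum(neighbours) / len(neighbours)
    return [[value] * width for _ in range(height)]

def directional_prediction(top, width, height, dx, dy):
    """Directional prediction from the upper row of adjacent pixels.

    For each target pixel (x, y), the line L through (x, y) along (dx, dy)
    is followed until it meets y = -1; the pixel there is the reference.
    """
    if dy == 0:
        raise ValueError("line never reaches the upper row in this sketch")
    pred = []
    for y in range(height):
        row = []
        for x in range(width):
            k = (-1 - y) / dy            # scalar k at which L reaches y = -1
            rx = x + k * dx              # horizontal reference position
            i0 = math.floor(rx)
            frac = rx - i0
            last = i0 + (1 if frac else 0)
            if i0 < 0 or last >= len(top):
                raise ValueError("reference pixel outside the supplied row")
            if frac == 0:
                row.append(top[i0])      # integer pixel position
            else:                        # assumed linear interpolation
                row.append((1 - frac) * top[i0] + frac * top[i0 + 1])
        pred.append(row)
    return pred
```

A vertical direction (dx = 0) simply copies the upper row downward, while a diagonal direction shifts the copied row by one pixel per line.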
Next, the process carried out by the variable length encoding unit 7 will be explained in detail. When variable-length-encoding the motion vector, the variable length encoding unit 7 calculates a predicted vector for the motion vector of the partition Pin which is the target to be encoded on the basis of the motion vector of an already-encoded neighboring partition or the motion vector of a reference frame, and carries out predictive coding by using the predicted vector. More specifically, the motion vector predicted vector candidate calculating unit 21 of the motion vector variable length encoding unit 7a which constructs a part of the variable length encoding unit 7 calculates predicted vector candidates for the partition Pin which is the target to be encoded from the motion vector of an already-encoded partition adjacent to the partition Pin which is the target to be encoded, and the motion vector of a reference frame stored in the motion-compensated prediction frame memory 6.
Further, the motion vector of an already-encoded upper right partition (B0) located opposite to the upper right corner of the partition Pin is determined as a predicted vector candidate B. However, when the motion vector of the upper right partition (B0) cannot be used, such as when the upper right partition (B0) is not included in the target tile to be encoded or when the upper right partition is a partition already encoded in an intra coding mode, the motion vector of an already-encoded partition B1 adjacent to the upper right partition (B0) or the motion vector of an already-encoded upper left partition (B2) located opposite to the upper left corner of the partition Pin is determined as the predicted vector candidate B.
Next, a method of calculating predicted vector candidates C from the motion vector of a reference frame will be explained. The reference frame used for calculating predicted vector candidates is determined from among the reference frames stored in the motion-compensated prediction frame memory 6. In the method of determining the reference frame, for example, the frame which is the nearest to the frame including the target tile to be encoded in the order of displaying frames is selected. Next, a partition which is used for calculating predicted vector candidates in the reference frame is determined.
A motion vector candidate C in a temporal direction can be referred to over a tile boundary in the reference frame. As an alternative, any reference to a motion vector candidate C in a temporal direction over a tile boundary in the reference frame can be prohibited. As an alternative, whether to enable or disable a reference over a tile boundary in the reference frame can be changed according to a flag on a per sequence, frame, or tile basis, and the flag can be multiplexed into the bitstream as a parameter per sequence, frame, or tile.
After calculating one or more predicted vector candidates, the motion vector predicted vector candidate calculating unit 21 outputs the one or more predicted vector candidates to the motion vector predicted vector determining unit 22. When no predicted vector candidates A, no predicted vector candidates B, and no predicted vector candidates C exist, i.e., when no motion vector can be used, such as when any partition which is the target for which predicted vector candidates are calculated is already encoded in an intra coding mode, a fixed vector (e.g., a zero vector (a vector that refers to a position just behind)) is outputted as a predicted vector candidate.
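The availability rules and the fixed-vector fallback can be sketched as below. The sketch is simplified: a single `allow_cross_tile` switch is applied to every candidate, whereas the description distinguishes spatial candidates from the flag-controlled temporal candidate C. The dictionary keys and function names are hypothetical.

```python
ZERO_VECTOR = (0, 0)   # fixed vector used when no candidate exists

def inside_tile(pos, tile):
    """tile is (x0, y0, width, height); pos is (x, y) in picture coordinates."""
    x0, y0, w, h = tile
    return x0 <= pos[0] < x0 + w and y0 <= pos[1] < y0 + h

def collect_candidates(neighbours, tile, allow_cross_tile=False):
    """Return the usable predicted vector candidates.

    neighbours: list of dicts with keys 'pos', 'mode' ('INTRA' or 'INTER')
    and 'mv' (a motion vector tuple, or None for intra partitions).
    """
    candidates = []
    for nb in neighbours:
        if nb["mode"] != "INTER":
            continue                      # intra partitions carry no motion vector
        if not allow_cross_tile and not inside_tile(nb["pos"], tile):
            continue                      # reference over a tile boundary prohibited
        candidates.append(nb["mv"])
    return candidates or [ZERO_VECTOR]
```

If every neighboring partition is intra coded or lies outside the tile, the function returns only the zero vector.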
When receiving the one or more predicted vector candidates from the motion vector predicted vector candidate calculating unit 21, the motion vector predicted vector determining unit 22 selects, as a predicted vector, a predicted vector candidate which minimizes the magnitude or the code amount of a difference vector between the predicted vector candidate and the motion vector of the partition Pin which is the target to be encoded from the one or more predicted vector candidates. The motion vector predicted vector determining unit 22 outputs the predicted vector selected thereby to the motion vector difference calculating unit 23, and outputs an index (predicted vector index) showing the predicted vector to the entropy encoding unit 24.
When receiving the predicted vector from the motion vector predicted vector determining unit 22, the motion vector difference calculating unit 23 calculates the difference vector between the predicted vector and the motion vector of the partition Pin, and outputs the difference vector to the entropy encoding unit 24. When receiving the difference vector from the motion vector difference calculating unit 23, the entropy encoding unit 24 carries out variable length encoding, such as arithmetic coding, on the difference vector and the predicted vector index outputted from the motion vector predicted vector determining unit 22 to generate a motion vector information code word, and outputs the motion vector information code word.
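The selection of the predicted vector and the calculation of the difference vector can be sketched as follows. The magnitude of the difference vector is measured here as the sum of absolute components, which is an assumption; the description also allows the code amount to be minimized instead. The function name is hypothetical, and entropy encoding of the result is not shown.

```python
def encode_motion_vector(mv, candidates):
    """Return (predicted vector index, difference vector) for one partition.

    mv: motion vector of the partition to be encoded, as (x, y).
    candidates: non-empty list of predicted vector candidates.
    """
    def magnitude(c):
        # Assumed measure: sum of absolute differences of the components.
        return abs(mv[0] - c[0]) + abs(mv[1] - c[1])

    index = min(range(len(candidates)), key=lambda i: magnitude(candidates[i]))
    pred = candidates[index]
    diff = (mv[0] - pred[0], mv[1] - pred[1])
    return index, diff
```

For a motion vector (5, -3) and candidates (0, 0), (4, -2), (6, -3), the third candidate is selected and the difference vector is (-1, 0).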
Next, processing carried out by the video decoding device shown in
When the confirmation flag for partitioning shows that a picture is partitioned into one or more tiles, the variable length decoding unit 30 variable-length-decodes the tile information from the bitstream. The tile information includes the initialization instruction flag for arithmetic coding process, and the flag showing whether or not to allow a reference to a decoded pixel over a tile boundary and a reference to various coding parameters over a tile boundary, in addition to the information showing the size and the position of each tile.
After variable-length-decoding the tile information from the bitstream, the variable length decoding unit 30 variable-length-decodes the coding parameters of each of coding target blocks into which each tile having the size shown by the tile information is hierarchically partitioned (the coding mode, the intra prediction parameter or the inter prediction parameter, and the prediction difference coding parameter), and the encoded data (the compressed data and the motion information (when the coding mode is an inter coding mode)) (step ST21 of
After decoding the partitioning state of each largest coding block, the variable length decoding unit 30 specifies the decoding target blocks into which the largest coding block is partitioned hierarchically (blocks respectively corresponding to “coding target blocks” in the video encoding device shown in
After specifying the decoding target blocks (coding target blocks) into which the largest coding block is partitioned hierarchically, the variable length decoding unit 30 decodes the coding mode assigned to each of the decoding target blocks, partitions the decoding target block into one or more units for prediction process on the basis of the information included in the coding mode, and decodes the prediction parameter assigned to each of the one or more units for prediction process (step ST24). When the coding mode assigned to a decoding target block is an intra coding mode, the variable length decoding unit 30 decodes the intra prediction parameter for each of one or more partitions included in the decoding target block.
When the coding mode assigned to the decoding target block is an inter coding mode, the variable length decoding unit 30 decodes the motion vector and the inter prediction parameter for each of the one or more partitions included in the decoding target block. The decoding of the motion vector is carried out by calculating a predicted vector for the motion vector of the target partition to be decoded Pin on the basis of the motion vector of an already-decoded neighboring partition or the motion vector of a reference frame and by using the predicted vector according to the same procedure as that according to which the video encoding device shown in
The motion vector predicted vector determining unit 53 selects, as a predicted vector, a predicted vector candidate shown by the predicted vector index variable-length-decoded by the entropy decoding unit 51 from the one or more predicted vector candidates calculated by the motion vector predicted vector candidate calculating unit 52, and outputs the predicted vector to the motion vector calculating unit 54. When receiving the predicted vector from the motion vector predicted vector determining unit 53, the motion vector calculating unit 54 decodes the motion vector (predicted vector+difference vector) by adding the predicted vector and the difference vector variable-length-decoded by the entropy decoding unit 51.
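The decoding side of the motion vector reconstruction can be sketched as a single addition. The function name is hypothetical, and the candidate list is assumed to have been calculated in the same order as on the encoding side so that the decoded index points at the same vector.

```python
def decode_motion_vector(index, diff, candidates):
    """Return the motion vector as predicted vector + difference vector.

    index: decoded predicted vector index.
    diff: decoded difference vector, as (x, y).
    candidates: predicted vector candidates calculated by the decoder.
    """
    pred = candidates[index]
    return (pred[0] + diff[0], pred[1] + diff[1])
```

With index 2, difference vector (-1, 0), and candidates (0, 0), (4, -2), (6, -3), the decoded motion vector is (5, -3).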
The variable length decoding unit 30 further divides each of the one or more partitions which is a unit for prediction process into one or more partitions each of which is a unit for transformation process on the basis of transform block size information included in the prediction difference coding parameter, and decodes the compressed data (the transform coefficients transformed and quantized) for each partition which is a unit for transformation process.
When the confirmation flag for partitioning shows that the picture is not partitioned into one or more tiles, the variable length decoding unit 30 variable-length-decodes the coding parameters of each of coding target blocks into which the picture which is the inputted image inputted to the video encoding device shown in
When the coding mode m(Bn) variable-length-decoded by the variable length decoding unit 30 is an intra coding mode (in the case of m(Bn)∈INTRA), the select switch 41 of the partition video decoding unit 31 outputs the intra prediction parameter variable-length-decoded by the variable length decoding unit 30 to the intra prediction unit 42. In contrast, when the coding mode m(Bn) variable-length-decoded by the variable length decoding unit 30 is an inter coding mode (in the case of m(Bn)∈INTER), the select switch 41 outputs the inter prediction parameter and the motion vector which are variable-length-decoded by the variable length decoding unit 30 to the motion compensation unit 43.
When the coding mode m(Bn) variable-length-decoded by the variable length decoding unit 30 is an intra coding mode (in the case of m(Bn)∈INTRA) and the intra prediction unit 42 receives the intra prediction parameter from the select switch 41 (step ST25), the intra prediction unit 42 carries out an intra prediction process on each partition Pin in the decoding target block Bn by using the intra prediction parameter while referring to the decoded image stored in the memory 46 for intra prediction to generate an intra prediction image PINTRAin according to the same procedure as that according to which the intra prediction unit 12 shown in
When the coding mode m(Bn) variable-length-decoded by the variable length decoding unit 30 is an inter coding mode (in the case of m(Bn)∈INTER) and the motion compensation unit 43 receives the inter prediction parameter and the motion vector from the select switch 41 (step ST25), the motion compensation unit 43 carries out an inter prediction process on the decoding target block by using the motion vector and the inter prediction parameter while referring to the decoded image which is stored in the motion-compensated prediction frame memory 34 and on which a filtering process is carried out to generate an inter prediction image PINTERin (step ST27).
When receiving the compressed data and the prediction difference coding parameter from the variable length decoding unit 30 (step ST25), the inverse quantization/inverse transformation unit 44 inverse-quantizes the compressed data by referring to the prediction difference coding parameter and also carries out an inverse orthogonal transformation process on transform coefficients which are the compressed data inverse-quantized thereby by referring to the prediction difference coding parameter to calculate a decoded prediction difference signal according to the same procedure as that according to which the inverse quantization/inverse transformation unit 16 shown in
The adding unit 45 adds an image shown by the decoded prediction difference signal calculated by the inverse quantization/inverse transformation unit 44 and the intra prediction image PINTRAin generated by the intra prediction unit 42 or the inter prediction image PINTERin generated by the motion compensation unit 43, stores a decoded image in the image memory 32 as a group of one or more decoded partition images included in the decoding target block, and also stores the decoded image in the memory 46 for intra prediction (step ST29). This decoded image is an image signal for subsequent intra prediction. When storing the decoded image in the image memory 32, the adding unit 45 stores the decoded image at an address in the image memory 32, the address corresponding to the position of the tile currently being processed, the position being indicated by the tile information variable-length-decoded by the variable length decoding unit 30.
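The position-dependent storing of a decoded tile can be sketched as below. The image memory is modeled as a list of pixel rows for one picture, which is an assumption for this example, and the function name is hypothetical.

```python
def store_decoded_tile(image_memory, tile_pixels, tile_x, tile_y):
    """Write a decoded tile into the picture buffer at its tile position.

    image_memory: list of rows (lists) holding one picture.
    tile_pixels: list of rows holding the decoded tile.
    tile_x, tile_y: position of the tile given by the tile information.
    """
    for row, line in enumerate(tile_pixels):
        # Overwrite the slice of the picture row covered by this tile row.
        image_memory[tile_y + row][tile_x:tile_x + len(line)] = line
```

Because each tile is written only to its own region, tiles can be stored in any order and still assemble into one picture of decoded image.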
After the decoding of all the tiles in the picture is completed, and one picture of decoded image is written in the image memory 32 (step ST30), the loop filter unit 33 carries out a predetermined filtering process on the one picture of decoded image, and stores the decoded image on which the loop filter unit carries out the filtering process in the motion-compensated prediction frame memory 34 (step ST31). This decoded image is a reference image for motion-compensated prediction, and is also a reproduced image.
As can be seen from the above description, the video encoding device in accordance with this Embodiment 1 includes: the tile partitioning unit 1 that partitions an inputted image into tiles each having a specified size and outputs the tiles; the encoding controlling unit 2 that determines an upper limit on the number of hierarchical layers when a coding block, which is a unit to be processed at a time when a prediction process is carried out, is hierarchically partitioned, and also determines a coding mode for determining an encoding method for each coding block; the block partitioning unit 10 that partitions a tile outputted from the tile partitioning unit 1 into coding blocks each having a predetermined size and also partitions each of the coding blocks hierarchically until the number of hierarchical layers reaches the upper limit on the number of hierarchical layers which is determined by the encoding controlling unit 2; the prediction image generator (the intra prediction unit 12 and the motion-compensated prediction unit 13) that carries out a prediction process on a coding block obtained through the partitioning by the block partitioning unit 10 to generate a prediction image in the coding mode determined by the encoding controlling unit 2; the subtracting unit 14 that generates a difference image between the coding block obtained through the partitioning by the block partitioning unit 10 and the prediction image generated by the prediction image generator; and the transformation/quantization unit 15 that compresses the difference image generated by the subtracting unit 14 and outputs compressed data about the difference image. The variable length encoding unit 7 is constructed in such a way as to variable-length-encode the compressed data outputted from the transformation/quantization unit 15 and the coding mode determined by the encoding controlling unit 2, and also variable-length-encode the tile information showing the size and the position in the inputted image of each of the tiles outputted from the tile partitioning unit 1 to generate a bitstream into which encoded data about the compressed data, encoded data about the coding mode, and encoded data about the tile information are multiplexed. Therefore, there is provided an advantage of, even when the size of an inputted image is an integral multiple of a pixel number defined for HDTV, being able to utilize an input interface, equipment, etc. for use in HDTV in the above-mentioned device.
More specifically, according to this Embodiment 1, even when the size of a picture which is an inputted image is an integral multiple of the pixel number defined for HDTV, the tile partitioning unit 1 of the video encoding device can partition the picture into tiles each having an arbitrary number of pixels. Therefore, there is provided an advantage of being able to utilize an input interface, equipment, etc. for use in HDTV in the above-mentioned device regardless of the preset size of a macroblock. Further, by partitioning a picture which is an inputted image into a plurality of tiles and adaptively determining an upper limit on the number of hierarchical layers for partitioning for each of the tiles according to the characteristics of a local motion in the tile, or the like, encoding can be carried out with an improved degree of coding efficiency.
Because the variable length decoding unit 30 of the video decoding device according to this Embodiment 1 decodes the size and the position information in the picture of each tile from the bitstream which is generated by partitioning the picture into a plurality of tiles and carrying out encoding, the variable length decoding unit can decode the above-mentioned bitstream correctly. Further, because the variable length decoding unit 30 decodes the upper limit on the number of hierarchical layers for partitioning or the like, which is a parameter associated with a tile, from the above-mentioned bitstream on a per tile basis, the variable length decoding unit can correctly decode the bitstream which is encoded with a degree of coding efficiency which is improved by adaptively determining the upper limit on the number of hierarchical layers for partitioning for each of the tiles.
Although the video encoding device in which the single partition video encoding unit 3 is mounted and sequentially processes each tile outputted from the tile partitioning unit 1 in turn is shown in above-mentioned Embodiment 1, the video encoding device can alternatively include a plurality of partition video encoding units 3 (tile encoding devices), as shown in
Although the video decoding device in which the single partition video decoding unit 31 is mounted and sequentially processes each tile is shown in above-mentioned Embodiment 1, the video decoding device can alternatively include a plurality of partition video decoding units 31 (tile decoding devices), as shown in
While the invention has been described in its preferred embodiments, it is to be understood that an arbitrary combination of two or more of the above-mentioned embodiments can be made, various changes can be made in an arbitrary component according to any one of the above-mentioned embodiments, and an arbitrary component according to any one of the above-mentioned embodiments can be omitted within the scope of the invention.
As mentioned above, because the video encoding device, the video decoding device, the video encoding method, and the video decoding method in accordance with the present invention make it possible to utilize an input interface, equipment, etc. for use in HDTV in the above-mentioned device when the size of an inputted image is an integral multiple of the pixel number defined for HDTV, the video encoding device and the video encoding method are suitable for use as a video encoding device for and a video encoding method of compression-encoding and transmitting an image, and the video decoding device and the video decoding method are suitable for use as a video decoding device for and a video decoding method of decoding encoded data transmitted by a video encoding device into an image.
1 tile partitioning unit (tile partitioner), 2 encoding controlling unit (encoding controller), 3 partition video encoding unit (tile encoding device), 4 image memory, 5 loop filter unit, 6 motion-compensated prediction frame memory, 7 variable length encoding unit (variable length encoder), 7a motion vector variable length encoding unit, 10 block partitioning unit (block partitioner), 11 select switch, 12 intra prediction unit (prediction image generator), 13 motion-compensated prediction unit (prediction image generator), 14 subtracting unit (image compressor), 15 transformation/quantization unit (image compressor), 16 inverse quantization/inverse transformation unit, 17 adding unit, 18 memory for intra prediction, 21 motion vector predicted vector candidate calculating unit, 22 motion vector predicted vector determining unit, 23 motion vector difference calculating unit, 24 entropy encoding unit, 30 variable length decoding unit (variable length decoder), 30a motion vector variable length decoding unit, 31 partition video decoding unit (tile decoding device), 32 image memory (decoded image storage), 33 loop filter unit, 34 motion-compensated prediction frame memory, 41 select switch, 42 intra prediction unit (prediction image generator), 43 motion compensation unit (prediction image generator), 44 inverse quantization/inverse transformation unit (decoded image generator), 45 adding unit (decoded image generator), 46 memory for intra prediction, 51 entropy decoding unit, 52 motion vector predicted vector candidate calculating unit, 53 motion vector predicted vector determining unit, 54 motion vector calculating unit.
Number | Date | Country | Kind |
---|---|---|---|
2011-239009 | Oct 2011 | JP | national |

Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2012/073067 | 9/10/2012 | WO | 00 | 4/16/2014 |