The present disclosure relates to a video signal processing method and device and, more specifically, to a video signal processing method and device by which a video signal is encoded or decoded.
Compression coding refers to a series of signal processing techniques for transmitting digitized information through a communication line or storing information in a form suitable for a storage medium. Targets of compression encoding include voice, video, and text; in particular, a technique for performing compression encoding on an image is referred to as video compression. Compression coding for a video signal is performed by removing redundant information in consideration of spatial correlation, temporal correlation, and stochastic correlation. However, with the recent development of various media and data transmission media, a more efficient video signal processing method and apparatus are required.
An aspect of the present specification is to provide a video signal processing method and a device therefor to increase the coding efficiency of a video signal.
The disclosure provides a video signal processing method and an apparatus therefor.
In the specification, a video signal decoding apparatus may include a processor, and the processor may parse a first syntax element that is a general constraint information (GCI) syntax element from a bitstream, may parse, based on a result of parsing the first syntax element, a second syntax element indicating whether an LIC mode is available for a current sequence, may parse, based on a result of parsing the second syntax element, a third syntax element indicating whether the LIC mode is used in a current block, and, when the third syntax element indicates that the LIC mode is used in the current block, may predict the current block based on the LIC mode. The first syntax element may be included in at least one of a sequence parameter set (SPS) RBSP syntax and a video parameter set (VPS) RBSP syntax. The second syntax element may be included in the SPS RBSP syntax. When the value of the first syntax element is 1, the value of the second syntax element may be set to 0, which is a value indicating that the LIC mode is not used, irrespective of the result of parsing the second syntax element. When the value of the first syntax element is 0, the value of the second syntax element may not be constrained.
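The constraint hierarchy described above (GCI flag over SPS flag over block-level flag) can be summarized in a short sketch. The following Python fragment is illustrative only; the syntax element names (gci_no_lic_constraint_flag, sps_lic_enabled_flag) and the read_bit interface are placeholder assumptions, not normative names.

    def parse_lic_controls(bs):
        # First syntax element: GCI-level constraint, carried in the SPS or
        # VPS RBSP (placeholder name).
        gci_no_lic_constraint_flag = bs.read_bit()
        # Second syntax element: sequence-level availability, carried in the
        # SPS RBSP (placeholder name).
        sps_lic_enabled_flag = bs.read_bit()
        if gci_no_lic_constraint_flag == 1:
            # The GCI constraint overrides whatever was parsed: LIC is off.
            sps_lic_enabled_flag = 0
        # When the GCI flag is 0, sps_lic_enabled_flag is unconstrained.
        return gci_no_lic_constraint_flag, sps_lic_enabled_flag

    def parse_block_lic_flag(bs, sps_lic_enabled_flag):
        # Third syntax element: block-level LIC usage, parsed only when the
        # sequence-level flag makes the LIC mode available.
        if sps_lic_enabled_flag == 1:
            return bs.read_bit()
        return 0  # not present in the bitstream: inferred as "LIC not used"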
In the specification, the third syntax element may indicate whether the LIC mode is used in the current block. In this instance, the processor may configure a first template including neighboring blocks of the current block, may configure a second template including neighboring blocks of a reference block of the current block, may obtain an LIC linear model based on the first template and the second template, and may predict the current block based on the LIC linear model, and the location and the size of the first template may correspond to the location and the size of the second template.
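A minimal sketch of the template-based model derivation follows. The specification does not fix the fitting method; a least-squares fit of a scale and an offset is assumed here for illustration, and the function names are hypothetical.

    import numpy as np

    def derive_lic_model(cur_template, ref_template):
        # cur_template: neighboring samples of the current block;
        # ref_template: neighboring samples of the reference block at the
        # corresponding location and of the corresponding size.
        x = np.asarray(ref_template, dtype=np.float64).ravel()
        y = np.asarray(cur_template, dtype=np.float64).ravel()
        n = x.size
        denom = n * np.dot(x, x) - x.sum() ** 2
        if denom == 0:
            return 1.0, float(y.mean() - x.mean())  # degenerate: offset only
        alpha = (n * np.dot(x, y) - x.sum() * y.sum()) / denom  # scale
        beta = (y.sum() - alpha * x.sum()) / n                  # offset
        return alpha, beta

    def predict_with_lic(ref_block, alpha, beta):
        # Apply the linear model to the motion-compensated reference block.
        return alpha * np.asarray(ref_block, dtype=np.float64) + beta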
In the specification, when an encoding mode of the current block is a GPM mode, and the third syntax element indicates that the LIC mode is used in the current block, the current block may be divided into a first area and a second area. In this instance, the processor may obtain a first LIC linear model for the first area, may obtain, based on the first LIC linear model, a first prediction block for the first area, may obtain a second LIC linear model for the second area, may obtain, based on the second LIC linear model, a second prediction block for the second area, and based on the first prediction block and the second prediction block, may predict the current block.
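The two-model GPM prediction above can be pictured as follows. This sketch assumes a precomputed binary area mask and combines the two LIC-compensated predictions with a hard partition; practical GPM designs blend the two predictions with graded weights near the partition boundary.

    import numpy as np

    def gpm_lic_predict(ref0, ref1, model0, model1, area_mask):
        a0, b0 = model0  # first LIC linear model, derived for the first area
        a1, b1 = model1  # second LIC linear model, derived for the second area
        p0 = a0 * np.asarray(ref0, dtype=np.float64) + b0  # first prediction
        p1 = a1 * np.asarray(ref1, dtype=np.float64) + b1  # second prediction
        # area_mask == 1 marks samples of the first area, 0 the second area.
        return np.where(np.asarray(area_mask) == 1, p0, p1)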
In the specification, the third syntax element may indicate whether the LIC mode is used in the current block. In this instance, the processor may configure a template including neighboring blocks located in a predetermined range from the current block, may obtain a convolutional model based on the template, and may predict the current block based on the convolutional model.
In the specification, a video signal encoding apparatus may obtain a bitstream that is to be decoded by a decoding method.
In the specification, in a computer-readable non-transitory storage medium that stores a bitstream, the bitstream is decoded by a decoding method.
In the specification, the decoding method may include an operation of parsing a first syntax element that is a general constraint information (GCI) syntax element from a bitstream, an operation of parsing a second syntax element indicating whether an LIC mode is available for a current sequence, an operation of parsing, based on a result of parsing the second syntax element, a third syntax element indicating whether the LIC mode is used in a current block, and an operation of predicting the current block based on the LIC mode when the third syntax element indicates that the LIC mode is used in the current block. The first syntax element may be included in at least one of a sequence parameter set (SPS) RBSP syntax and a video parameter set (VPS) RBSP syntax. The second syntax element may be included in the SPS RBSP syntax. When a value of the first syntax element is 1, a value of the second syntax element may be set to 0, which is a value indicating that the LIC mode is not used, irrespective of a result of parsing the second syntax element. When the value of the first syntax element is 0, the value of the second syntax element may not be constrained.
In the specification, the third syntax element may indicate whether the LIC mode is used in the current block. In this instance, the decoding method may include an operation of configuring a first template including neighboring blocks of the current block, an operation of configuring a second template including neighboring blocks of a reference block of the current block, an operation of obtaining an LIC linear model based on the first template and the second template, and an operation of predicting the current block based on the LIC linear model, and a location and a size of the first template may correspond to a location and a size of the second template.
In the specification, when the encoding mode of the current block is a GPM mode, and the third syntax element indicates that the LIC mode is used in the current block, the current block may be divided into a first area and a second area. In this instance, the decoding method may include an operation of obtaining a first LIC linear model for the first area, an operation of obtaining a first prediction block for the first area based on the first LIC linear model, an operation of obtaining a second LIC linear model for the second area, an operation of obtaining a second prediction block for the second area based on the second LIC linear model, and an operation of predicting the current block based on the first prediction block and the second prediction block.
In the specification, the third syntax element may indicate whether the LIC mode is used in the current block. In this instance, the decoding method may include an operation of configuring a template including neighboring blocks located in a predetermined range from the current block, an operation of obtaining a convolutional model based on the template, and an operation of predicting the current block based on the convolutional model.
In the specification, the third syntax element may be parsed when the second syntax element indicates that the LIC mode is available for the current sequence.
In the specification, the third syntax element may be parsed by additionally taking into consideration at least one of the number of samples of the current block, an encoding mode of the current block, and a prediction direction associated with the current block.
In the specification, the third syntax element may be parsed when the number of samples of the current block is 32 or more.
In the specification, the third syntax element may be parsed when the encoding mode of the current block is none of a merge mode, an IBC mode, and a CIIP mode.
In the specification, the third syntax element may be parsed when the prediction direction associated with the current block is not bi-prediction.
In the specification, the first template may include upper side neighboring blocks of the current block, and the second template may include upper side neighboring blocks of the reference block.
In the specification, the first template may include left side neighboring blocks of the current block, and the second template may include left side neighboring blocks of the reference block.
In the specification, the first template may include upper side neighboring blocks of the current block and left side neighboring blocks of the current block, and the second template may include upper side neighboring blocks of the reference block and left side neighboring blocks of the reference block.
In the specification, the current block may be one sample. A filter coefficient of the convolutional model may be a coefficient of at least one sample among an upper side sample, a lower side sample, a left side sample, and a right side sample of the one sample.
In the specification, when one or more samples among the upper side sample, the lower side sample, the left side sample, and the right side sample of the one sample are not included in the template, the value of the sample that is not included in the template may be the mean value of the samples remaining after excluding the sample that is not included in the template.
In the specification, when one or more samples among the upper side sample, the lower side sample, the left side sample, and the right side sample of the one sample are not included in the template, the value of the sample that is not included in the template may be identical to a value of a sample closest to the sample that is not included in the template, among the samples included in the template.
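The plus-shaped tap gathering and the two fallback rules above can be sketched as follows; the function names and the template layout are illustrative assumptions.

    import numpy as np

    def plus_shaped_taps(template, y, x, fallback="nearest"):
        # Gather center, above, below, left, and right taps at (y, x).
        h, w = template.shape
        offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
        taps = []
        for dy, dx in offsets:
            ny, nx = y + dy, x + dx
            taps.append(float(template[ny, nx])
                        if 0 <= ny < h and 0 <= nx < w else None)
        available = [t for t in taps if t is not None]
        for i, t in enumerate(taps):
            if t is None:
                if fallback == "mean":
                    # Rule 1: mean value of the remaining, available samples.
                    taps[i] = sum(available) / len(available)
                else:
                    # Rule 2: value of the closest sample included in the
                    # template; for a missing direct neighbor this is the
                    # center sample itself.
                    taps[i] = float(template[y, x])
        return taps

    def conv_predict(coeffs, taps):
        # The convolutional model: a weighted sum over the gathered taps.
        return sum(c * t for c, t in zip(coeffs, taps))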
The present disclosure provides a method for efficiently processing a video signal.
The effects obtainable from the present specification are not limited to the effects mentioned above, and other effects not mentioned may be clearly understood by those skilled in the art to which the present disclosure belongs, from the description below.
Terms used in this specification may be general terms currently in wide use, selected in consideration of their functions in the present invention, but they may vary according to the intentions of those skilled in the art, customs, or the advent of new technology. Additionally, in certain cases there are terms that the applicant has selected arbitrarily, and in such cases their meanings are described in the corresponding parts of the description of the present invention. Accordingly, the terms used in this specification should be interpreted based on their substantial meanings and the contents of the entire specification.
In this specification, ‘A and/or B’ may be interpreted as meaning ‘including at least one of A or B.’
In this specification, some terms may be interpreted as follows. Coding may be interpreted as encoding or decoding in some cases. In the present specification, an apparatus that generates a video signal bitstream by performing encoding (coding) of a video signal is referred to as an encoding apparatus or an encoder, and an apparatus that performs decoding of a video signal bitstream to reconstruct a video signal is referred to as a decoding apparatus or a decoder. In addition, in this specification, the term video signal processing apparatus covers both an encoder and a decoder. “Information” is a term encompassing values, parameters, coefficients, elements, and the like; since the meaning may be interpreted differently in some cases, the present invention is not limited thereto. “Unit” refers to a basic unit of image processing or a specific position in a picture, and refers to an image region including both a luma component and a chroma component. Furthermore, a “block” refers to an image region including a particular component among the luma component and the chroma components (i.e., Cb and Cr). However, depending on the embodiment, the terms “unit”, “block”, “partition”, “signal”, and “region” may be used interchangeably. Also, in the present specification, the term “current block” refers to a block currently scheduled to be encoded, and the term “reference block” refers to a block that has already been encoded or decoded and is used as a reference for the current block. In addition, the terms “luma”, “luminance”, “Y”, and the like may be used interchangeably in this specification. Additionally, in the present specification, the terms “chroma”, “chrominance”, “Cb or Cr”, and the like may be used interchangeably; since chroma components are classified into two components, Cb and Cr, each chroma component may be distinguished and used. Additionally, in the present specification, the term “unit” may be used as a concept that includes a coding unit, a prediction unit, and a transform unit. A “picture” refers to a field or a frame, and depending on embodiments, the terms may be used interchangeably. Specifically, when a captured video is an interlaced video, a single frame may be separated into an odd (or odd-numbered or top) field and an even (or even-numbered or bottom) field, and each field may be configured as one picture unit and encoded or decoded. If the captured video is a progressive video, a single frame may be configured as a picture and encoded or decoded. In addition, in the present specification, the terms “error signal”, “residual signal”, “residue signal”, “remaining signal”, and “difference signal” may be used interchangeably. Also, in the present specification, the terms “intra-prediction mode”, “intra-prediction directional mode”, “intra-picture prediction mode”, and “intra-picture prediction directional mode” may be used interchangeably. In addition, in the present specification, the terms “motion” and “movement” may be used interchangeably. Also, in the present specification, the terms “left”, “left above”, “above”, “right above”, “right”, “right below”, “below”, and “left below” may be used interchangeably with “leftmost”, “top left”, “top”, “top right”, “right”, “bottom right”, “bottom”, and “bottom left”. Also, the terms “element” and “member” may be used interchangeably.
Picture order count (POC) represents temporal position information of pictures (or frames); it may correspond to the playback order in which pictures are displayed on a screen, and each picture may have a unique POC.
The transformation unit 110 obtains a value of a transform coefficient by transforming a residual signal, which is a difference between the inputted video signal and the predicted signal generated by the prediction unit 150. For example, a Discrete Cosine Transform (DCT), a Discrete Sine Transform (DST), or a Wavelet Transform can be used. The DCT and DST perform transformation by splitting the input picture signal into blocks. In the transformation, coding efficiency may vary according to the distribution and characteristics of values in the transformation region. A transform kernel used for the transform of a residual block may have characteristics that allow a vertical transform and a horizontal transform to be separable. In this case, the transform of the residual block may be performed separately as a vertical transform and a horizontal transform. For example, an encoder may perform a vertical transform by applying a transform kernel in the vertical direction of a residual block. In addition, the encoder may perform a horizontal transform by applying the transform kernel in the horizontal direction of the residual block. In the present disclosure, the term transform kernel may be used to refer to a set of parameters used for the transform of a residual signal, such as a transform matrix, a transform array, a transform function, or a transform. For example, a transform kernel may be any one of multiple available kernels. Also, transform kernels based on different transform types may be used for the vertical transform and the horizontal transform, respectively.
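As an illustration of the separable transform described above, the sketch below applies a vertical kernel along the columns and a horizontal kernel along the rows; the orthonormal DCT-II generator is one common choice of kernel and is included only to make the example self-contained.

    import numpy as np

    def dct2_kernel(n):
        # Orthonormal DCT-II basis matrix (one possible transform kernel).
        k = np.array([[np.cos(np.pi * (2 * j + 1) * i / (2 * n))
                       for j in range(n)] for i in range(n)])
        k[0] *= 1.0 / np.sqrt(2.0)
        return k * np.sqrt(2.0 / n)

    def separable_transform(residual, v_kernel, h_kernel):
        # Vertical transform along the columns, then horizontal transform
        # along the rows; the two kernels may be based on different types.
        return v_kernel @ residual @ h_kernel.T

    # e.g., coeffs = separable_transform(block, dct2_kernel(8), dct2_kernel(8))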
The transform coefficients are distributed with larger coefficients toward the top left of a block and coefficients closer to “0” toward the bottom right of the block. As the size of the current block increases, there are likely to be many coefficients of “0” in the bottom-right region of the block. To reduce the transform complexity of a large-sized block, only an arbitrary top-left region may be kept and the remaining region may be reset to “0”.
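The zero-out behavior described above amounts to the following operation; the kept-region size is a codec-dependent parameter and the values used are placeholders.

    import numpy as np

    def zero_out_high_freq(coeffs, keep_h, keep_w):
        # Keep only the top-left keep_h x keep_w coefficients of a large
        # transform block and reset the rest to 0.
        out = np.zeros_like(coeffs)
        out[:keep_h, :keep_w] = coeffs[:keep_h, :keep_w]
        return out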
In addition, error signals may be present in only some regions of a coding block. In this case, the transform process may be performed on only some arbitrary regions. In an embodiment, in a block having a size of 2N×2N, an error signal may be present only in the first 2N×N block, and the transform process may be performed on only the first 2N×N block, while the second 2N×N block is not transformed and may not be encoded or decoded. Here, N may be any positive integer.
The encoder may perform an additional transform before transform coefficients are quantized. The above-described transform method may be referred to as a primary transform, and the additional transform may be referred to as a secondary transform. The secondary transform may be applied selectively for each residual block. According to an embodiment, the encoder may improve coding efficiency by performing a secondary transform on regions where it is difficult to concentrate energy in a low-frequency region by using a primary transform alone. For example, a secondary transform may be additionally performed on blocks where large residual values appear in directions other than the horizontal or vertical direction of the residual block. Unlike a primary transform, a secondary transform may not be performed separately as a vertical transform and a horizontal transform. Such a secondary transform may be referred to as a low frequency non-separable transform (LFNST).
The quantization unit 115 quantizes the transform coefficient value outputted from the transformation unit 110.
In order to improve coding efficiency, instead of coding the picture signal as it is, a method is used in which a picture is predicted using a region already coded through the prediction unit 150, and a reconstructed picture is obtained by adding a residual value between the original picture and the predicted picture to the predicted picture. In order to prevent mismatches between the encoder and the decoder, information that can be used in the decoder should be used when performing prediction in the encoder. For this, the encoder performs a process of reconstructing the encoded current block again. The inverse quantization unit 120 inverse-quantizes the transform coefficient value, and the inverse transformation unit 125 reconstructs the residual value using the inverse-quantized transform coefficient value. Meanwhile, the filtering unit 130 performs filtering operations to improve the quality of the reconstructed picture and the coding efficiency. For example, a deblocking filter, a sample adaptive offset (SAO), and an adaptive loop filter may be included. The filtered picture is outputted or stored in a decoded picture buffer (DPB) 156 for use as a reference picture.
The deblocking filter is a filter for removing distortion generated at the boundaries between blocks in a reconstructed picture. Through the distribution of pixels included in several columns or rows based on arbitrary edges in a block, the encoder may determine whether to apply a deblocking filter to the edges. When applying a deblocking filter to the block, the encoder may apply a long filter, a strong filter, or a weak filter depending on the strength of deblocking filtering.
Additionally, horizontal filtering and vertical filtering may be processed in parallel. The sample adaptive offset (SAO) may be used to correct offsets from the original video on a pixel-by-pixel basis with respect to a reconstructed block to which a deblocking filter has been applied. To correct the offset for a particular picture, the encoder may use a technique that divides the pixels included in the picture into a predetermined number of regions, determines a region in which offset correction is to be performed, and applies the offset to that region (band offset). Alternatively, the encoder may use a method of applying an offset in consideration of the edge information of each pixel (edge offset). The adaptive loop filter (ALF) is a technique of dividing pixels included in a video into predetermined groups and then determining one filter to be applied to each group, thereby performing filtering differently for each group. Information about whether to apply the ALF may be signaled on a per-coding-unit basis, and the shape and filter coefficients of the ALF to be applied may vary for each block. Alternatively, an ALF filter having the same shape (a fixed shape) may be applied regardless of the characteristics of the target block to which the ALF filter is to be applied.
The prediction unit 150 includes an intra-prediction unit 152 and an inter-prediction unit 154. The intra-prediction unit 152 performs intra prediction within a current picture, and the inter-prediction unit 154 performs inter prediction to predict the current picture by using a reference picture stored in the decoded picture buffer 156. The intra-prediction unit 152 performs intra prediction from reconstructed regions in the current picture and transmits intra encoding information to the entropy coding unit 160. The intra encoding information may include at least one of an intra-prediction mode, a most probable mode (MPM) flag, an MPM index, and information regarding a reference sample. The inter-prediction unit 154 may in turn include a motion estimation unit 154a and a motion compensation unit 154b. The motion estimation unit 154a finds a part most similar to a current region with reference to a specific region of a reconstructed reference picture, and obtains a motion vector value which is the distance between the regions. Reference region-related motion information (reference direction indication information (L0 prediction, L1 prediction, or bidirectional prediction), a reference picture index, motion vector information, etc.) and the like, obtained by the motion estimation unit 154a, are transmitted to the entropy coding unit 160 so as to be included in a bitstream. The motion compensation unit 154b performs inter-motion compensation by using the motion information transmitted by the motion estimation unit 154a, to generate a prediction block for the current block. The inter-prediction unit 154 transmits the inter encoding information, which includes motion information related to the reference region, to the entropy coding unit 160.
According to an additional embodiment, the prediction unit 150 may include an intra block copy (IBC) prediction unit (not shown). The IBC prediction unit performs IBC prediction from reconstructed samples in a current picture and transmits IBC encoding information to the entropy coding unit 160. The IBC prediction unit references a specific region within a current picture to obtain a block vector value that indicates a reference region used to predict a current region. The IBC prediction unit may perform IBC prediction by using the obtained block vector value. The IBC prediction unit transmits the IBC encoding information to the entropy coding unit 160. The IBC encoding information may include at least one of reference region size information and block vector information (index information for predicting the block vector of a current block in a motion candidate list, and block vector difference information).
When the picture prediction described above is performed, the transformation unit 110 transforms a residual value between the original picture and the predictive picture to obtain a transform coefficient value. At this time, the transform may be performed on a specific block basis in the picture, and the size of the specific block may vary within a predetermined range. The quantization unit 115 quantizes the transform coefficient value generated by the transformation unit 110 and transmits the quantized transform coefficient to the entropy coding unit 160.
The quantized transform coefficients in the form of a two-dimensional array may be rearranged into a one-dimensional array for entropy coding. The scanning method used for a quantized transform coefficient may be determined by the size of the transform block and the intra-picture prediction mode. In an embodiment, diagonal, vertical, and horizontal scans may be applied. This scan information may be signaled on a block-by-block basis, or may be derived based on predetermined rules.
The entropy coding unit 160 generates a video signal bitstream by entropy-coding information indicating quantized transform coefficients, intra encoding information, and inter encoding information. The entropy coding unit 160 may use variable length coding (VLC) and arithmetic coding. Variable length coding (VLC) is a technique of transforming input symbols into consecutive codewords, wherein the length of the codewords is variable: frequently occurring symbols are represented by shorter codewords, while less frequently occurring symbols are represented by longer codewords. As the variable length coding, context-based adaptive variable length coding (CAVLC) may be used. Arithmetic coding uses the probability distribution of each data symbol to transform consecutive data symbols into a single fractional number, and allows acquisition of the optimal number of fractional bits needed to represent each symbol. As the arithmetic coding, context-based adaptive binary arithmetic coding (CABAC) may be used.
CABAC is a binary arithmetic coding technique using multiple context models generated based on probabilities obtained from experiments. First, when symbols are not in binary form, the encoder binarizes each symbol by using exp-Golomb coding or the like. A binarized value, 0 or 1, may be described as a bin. The CABAC initialization process is divided into context initialization and arithmetic coding initialization. Context initialization is the process of initializing the probability of occurrence of each symbol, and is determined by the type of symbol, a quantization parameter (QP), and the slice type (I, P, or B). A context model having the initialization information may use a probability-based value obtained through an experiment. The context model provides information about the probability of occurrence of the least probable symbol (LPS) or most probable symbol (MPS) for the symbol to be currently coded, and about which of the bin values 0 and 1 corresponds to the MPS (valMPS). One of the multiple context models is selected via a context index (ctxIdx), and the context index may be derived from information in the current block to be encoded or from information about neighboring blocks. Initialization for binary arithmetic coding is performed based on the probability model selected from the context models. In binary arithmetic coding, encoding proceeds by dividing the interval into probability sub-intervals according to the probabilities of occurrence of 0 and 1, after which the probability sub-interval corresponding to the bin being processed becomes the entire probability interval for the next bin to be processed. Information about a position within the probability interval at which the last bin was processed is output. However, the probability interval cannot be divided indefinitely; when the probability interval is reduced to a certain size, a renormalization process is performed to widen the probability interval, and the corresponding position information is output. In addition, after each bin is processed, a probability update process may be performed, wherein information about the processed bin is used to set a new probability for the next bin to be processed.
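The interval subdivision, renormalization, and probability update described above can be demonstrated with a simplified floating-point encoder. This is an educational sketch only: the normative CABAC engine operates on integer ranges with table-driven state transitions, and the straddle (underflow) case and the termination process are treated only crudely here.

    class TinyBinaryArithmeticEncoder:
        def __init__(self, p1=0.5):
            self.low, self.high = 0.0, 1.0
            self.p1 = p1      # current estimated probability that a bin is 1
            self.bits = []

        def encode(self, bin_val):
            # Subdivide the current interval by the probabilities of 0 and 1.
            split = self.low + (self.high - self.low) * (1.0 - self.p1)
            if bin_val == 0:
                self.high = split
            else:
                self.low = split
            # Renormalization: emit a bit and widen the interval while it
            # lies entirely in one half (intervals straddling 0.5 are left
            # to shrink; real CABAC also handles that underflow case).
            while self.high <= 0.5 or self.low >= 0.5:
                if self.high <= 0.5:
                    self.bits.append(0)
                    self.low, self.high = 2 * self.low, 2 * self.high
                else:
                    self.bits.append(1)
                    self.low, self.high = 2 * self.low - 1, 2 * self.high - 1
            # Probability update using the bin just processed.
            self.p1 = 0.95 * self.p1 + (0.05 if bin_val == 1 else 0.0)

        def finish(self):
            # Emit bits pinning a position inside the final interval.
            mid = (self.low + self.high) / 2
            for _ in range(16):
                bit = 1 if mid >= 0.5 else 0
                self.bits.append(bit)
                mid = 2 * mid - bit
            return self.bits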
The generated bitstream is encapsulated in network abstraction layer (NAL) units as basic units. The NAL units are classified into a video coding layer (VCL) NAL unit, which includes video data, and a non-VCL NAL unit, which includes parameter information for decoding the video data. There are various types of VCL and non-VCL NAL units. A NAL unit includes NAL header information and a raw byte sequence payload (RBSP), which is data. The NAL header information includes summary information about the RBSP. The RBSP of a VCL NAL unit includes an integer number of encoded coding tree units. In order to decode a bitstream in a video decoder, it is necessary to separate the bitstream into NAL units and then decode each of the separated NAL units. Information required for decoding a video signal bitstream may be included in a picture parameter set (PPS), a sequence parameter set (SPS), a video parameter set (VPS), etc., and transmitted.
The block diagram of
The entropy decoding unit 210 entropy-decodes a video signal bitstream to extract transform coefficient information, intra encoding information, inter encoding information, and the like for each region. For example, the entropy decoding unit 210 may obtain a binarized code for transform coefficient information of a specific region from the video signal bitstream. The entropy decoding unit 210 obtains a quantized transform coefficient by inverse-binarizing the binarized code. The inverse quantization unit 220 inverse-quantizes the quantized transform coefficient, and the inverse transformation unit 225 restores a residual value by using the inverse-quantized transform coefficient. The video signal decoding apparatus 200 restores an original pixel value by summing the residual value obtained by the inverse transformation unit 225 with a prediction value obtained by the prediction unit 250.
Meanwhile, the filtering unit 230 performs filtering on a picture to improve image quality. This may include a deblocking filter for reducing block distortion and/or an adaptive loop filter for removing distortion of the entire picture. The filtered picture is outputted or stored in the DPB 256 for use as a reference picture for the next picture.
The prediction unit 250 includes an intra prediction unit 252 and an inter prediction unit 254. The prediction unit 250 generates a prediction picture by using the encoding type decoded through the entropy decoding unit 210 described above, transform coefficients for each region, and intra/inter encoding information. In order to reconstruct a current block on which decoding is performed, the decoded region of the current picture including the current block, or of other pictures, may be used. A picture (or tile/slice) that uses only the current picture for reconstruction, that is, performs only intra prediction or intra BC prediction, is called an intra picture or an I picture (or tile/slice), and a picture (or tile/slice) that can perform all of intra prediction, inter prediction, and intra BC prediction is called an inter picture (or tile/slice). Among inter pictures (or tiles/slices), a picture (or tile/slice) using at most one motion vector and one reference picture index in order to predict the sample values of each block is called a predictive picture or P picture (or tile/slice), and a picture (or tile/slice) using at most two motion vectors and two reference picture indexes is called a bi-predictive picture or a B picture (or tile/slice). In other words, the P picture (or tile/slice) uses at most one motion information set to predict each block, and the B picture (or tile/slice) uses at most two motion information sets to predict each block. Here, the motion information set includes one or more motion vectors and one reference picture index.
The intra prediction unit 252 generates a prediction block by using the intra encoding information and reconstructed samples in the current picture. As described above, the intra encoding information may include at least one of an intra prediction mode, a Most Probable Mode (MPM) flag, and an MPM index. The intra prediction unit 252 predicts the sample values of the current block by using the reconstructed samples located on the left and/or upper side of the current block as reference samples. In this disclosure, reconstructed samples, reference samples, and samples of the current block may represent pixels. Also, sample values may represent pixel values.
According to an embodiment, the reference samples may be samples included in a neighboring block of the current block. For example, the reference samples may be samples adjacent to a left boundary of the current block and/or samples adjacent to an upper boundary of the current block. Also, the reference samples may be samples located on a line within a predetermined distance from the left boundary of the current block and/or samples located on a line within a predetermined distance from the upper boundary of the current block, among the samples of neighboring blocks of the current block. In this case, the neighboring blocks of the current block may include the left (L) block, the upper (A) block, the below left (BL) block, the above right (AR) block, or the above left (AL) block.
The inter prediction unit 254 generates a prediction block by using reference pictures stored in the DPB 256 and inter encoding information. The inter encoding information may include a motion information set (reference picture index, motion vector information, etc.) of the current block for a reference block. Inter prediction may include L0 prediction, L1 prediction, and bi-prediction. L0 prediction means prediction using one reference picture included in the L0 picture list, and L1 prediction means prediction using one reference picture included in the L1 picture list. For this, one set of motion information (e.g., a motion vector and a reference picture index) may be required. In the bi-prediction method, up to two reference regions may be used, and the two reference regions may exist in the same reference picture or may exist in different pictures. That is, in the bi-prediction method, up to two sets of motion information (e.g., a motion vector and a reference picture index) may be used, and the two motion vectors may correspond to the same reference picture index or to different reference picture indexes. In this case, the reference pictures are pictures located temporally before or after the current picture, and may be pictures for which reconstruction has already been completed. According to an embodiment, the two reference regions used in the bi-prediction scheme may be regions selected from picture list L0 and picture list L1, respectively.
The inter prediction unit 254 may obtain a reference block of the current block using a motion vector and a reference picture index. The reference block is located in the reference picture corresponding to the reference picture index. Also, a sample value of a block specified by a motion vector, or an interpolated value thereof, can be used as a predictor of the current block. For motion prediction with sub-pel unit pixel accuracy, for example, an 8-tap interpolation filter for a luma signal and a 4-tap interpolation filter for a chroma signal can be used. However, the interpolation filter for motion prediction in sub-pel units is not limited thereto. In this way, the inter prediction unit 254 performs motion compensation to predict the texture of the current unit from previously reconstructed pictures. In this case, the inter prediction unit may use a motion information set.
According to an additional embodiment, the prediction unit 250 may include an IBC prediction unit (not shown). The IBC prediction unit may reconstruct the current region by referring to a specific region including reconstructed samples in the current picture. The IBC prediction unit obtains IBC encoding information for the current region from the entropy decoding unit 210. The IBC prediction unit obtains a block vector value of the current region indicating the specific region in the current picture. The IBC prediction unit may perform IBC prediction by using the obtained block vector value. The IBC encoding information may include block vector information.
The reconstructed video picture is generated by adding the prediction value outputted from the intra prediction unit 252 or the inter prediction unit 254 and the residual value outputted from the inverse transformation unit 225. That is, the video signal decoding apparatus 200 reconstructs the current block by using the prediction block generated by the prediction unit 250 and the residual obtained from the inverse transformation unit 225.
Meanwhile, the block diagram of
The technology proposed in the present specification may be applied to methods and devices for both an encoder and a decoder, and the terms signaling and parsing are used for convenience of description. In general, signaling may be described as encoding each type of syntax from the perspective of the encoder, and parsing may be described as interpreting each type of syntax from the perspective of the decoder. In other words, each type of syntax may be included in a bitstream and signaled by the encoder, and the decoder may parse the syntax and use it in a reconstruction process. In this case, the sequence of bits for each type of syntax, arranged according to a prescribed hierarchical configuration, may be called a bitstream.
One picture may be partitioned into sub-pictures, slices, tiles, etc., and encoded. A sub-picture may include one or more slices or tiles. When one picture is partitioned into multiple slices or tiles and encoded, all the slices or tiles within the picture must be decoded before the picture can be output on a screen. On the other hand, when one picture is encoded into multiple subpictures, only an arbitrary subpicture may be decoded and output on the screen. A slice may include multiple tiles or subpictures. Alternatively, a tile may include multiple subpictures or slices. Subpictures, slices, and tiles may be encoded or decoded independently of each other, and thus are advantageous for parallel processing and processing speed improvement. However, there is a disadvantage in that the bit rate increases because encoded information of other adjacent subpictures, slices, and tiles is not available. A subpicture, a slice, and a tile may be partitioned into multiple coding tree units (CTUs) and encoded.
The coding unit refers to a basic unit for processing a picture in the course of the video signal processing described above, that is, intra/inter prediction, transformation, quantization, and/or entropy coding. The size and shape of the coding unit in one picture may not be constant. The coding unit may have a square or rectangular shape. The rectangular coding unit (or rectangular block) includes a vertical coding unit (or vertical block) and a horizontal coding unit (or horizontal block). In the present specification, the vertical block is a block whose height is greater than its width, and the horizontal block is a block whose width is greater than its height. Further, in this specification, a non-square block may refer to a rectangular block, but the present invention is not limited thereto.
Referring to
Meanwhile, the leaf node of the above-described quad tree may be further split into a multi-type tree (MTT) structure. According to an embodiment of the present invention, in a multi-type tree structure, one node may be split into a binary or ternary tree structure of horizontal or vertical division. That is, in the multi-type tree structure, there are four split structures: vertical binary split, horizontal binary split, vertical ternary split, and horizontal ternary split. According to an embodiment of the present invention, in each of the tree structures, the width and height of the nodes may all be powers of 2. For example, in a binary tree (BT) structure, a node of a 2N×2N size may be split into two N×2N nodes by vertical binary split, and into two 2N×N nodes by horizontal binary split. In addition, in a ternary tree (TT) structure, a node of a 2N×2N size is split into (N/2)×2N, N×2N, and (N/2)×2N nodes by vertical ternary split, and into 2N×(N/2), 2N×N, and 2N×(N/2) nodes by horizontal ternary split. This multi-type tree splitting can be performed recursively.
A leaf node of the multi-type tree can be a coding unit. When the coding unit is not greater than the maximum transform length, the coding unit can be used as a unit of prediction and/or transform without further splitting. As an embodiment, when the width or height of the current coding unit is greater than the maximum transform length, the current coding unit can be split into a plurality of transform units without explicit signaling regarding splitting. On the other hand, at least one of the following parameters in the above-described quad tree and multi-type tree may be predefined or transmitted through a higher level set of RBSPs such as PPS, SPS, VPS, and the like. 1) CTU size: root node size of quad tree, 2) minimum QT size MinQtSize: minimum allowed QT leaf node size, 3) maximum BT size MaxBtSize: maximum allowed BT root node size, 4) Maximum TT size MaxTtSize: maximum allowed TT root node size, 5) Maximum MTT depth MaxMttDepth: maximum allowed depth of MTT split from QT's leaf node, 6) Minimum BT size MinBtSize: minimum allowed BT leaf node size, 7) Minimum TT size MinTtSize: minimum allowed TT leaf node size.
According to an embodiment of the present invention, ‘split_cu_flag’, which is a flag indicating whether or not to split the current node, can be signaled first. When the value of ‘split_cu_flag’ is 0, it indicates that the current node is not split, and the current node becomes a coding unit. When the current node is the coding tree unit, the coding tree unit includes one unsplit coding unit. When the current node is a quad tree node ‘QT node’, the current node is a leaf node ‘QT leaf node’ of the quad tree and becomes the coding unit. When the current node is a multi-type tree node ‘MTT node’, the current node is a leaf node ‘MTT leaf node’ of the multi-type tree and becomes the coding unit.
When the value of ‘split_cu_flag’ is 1, the current node can be split into nodes of the quad tree or multi-type tree according to the value of ‘split_qt_flag’. A coding tree unit is a root node of the quad tree, and can be split into a quad tree structure first. In the quad tree structure, ‘split_qt_flag’ is signaled for each node ‘QT node’. When the value of ‘split_qt_flag’ is 1, the corresponding node is split into four square nodes, and when the value of ‘split_qt_flag’ is 0, the corresponding node becomes the ‘QT leaf node’ of the quad tree, and the corresponding node can be split into multi-type tree nodes. According to an embodiment of the present invention, quad tree splitting can be limited according to the type of the current node. Quad tree splitting can be allowed when the current node is the coding tree unit (root node of the quad tree) or the quad tree node, and quad tree splitting may not be allowed when the current node is the multi-type tree node. Each quad tree leaf node ‘QT leaf node’ can be further split into a multi-type tree structure. As described above, when ‘split_qt_flag’ is 0, the current node can be split into multi-type nodes. In order to indicate the splitting direction and the splitting shape, ‘mtt_split_cu_vertical_flag’ and ‘mtt_split_cu_binary_flag’ can be signaled. When the value of ‘mtt_split_cu_vertical_flag’ is 1, vertical splitting of the node ‘MTT node’ is indicated, and when the value of ‘mtt_split_cu_vertical_flag’ is 0, horizontal splitting of the node ‘MTT node’ is indicated. In addition, when the value of ‘mtt_split_cu_binary_flag’ is 1, the node ‘MTT node’ is split into two rectangular nodes, and when the value of ‘mtt_split_cu_binary_flag’ is 0, the node ‘MTT node’ is split into three rectangular nodes.
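The flag sequence described above maps onto the following decision sketch; read_flag stands in for reading the named flag from the bitstream and is a placeholder interface.

    def parse_split_mode(read_flag, qt_split_allowed):
        # qt_split_allowed: True for the CTU (quad tree root) and quad tree
        # nodes, False for multi-type tree nodes.
        if read_flag('split_cu_flag') == 0:
            return 'NO_SPLIT'            # current node becomes a coding unit
        if qt_split_allowed and read_flag('split_qt_flag') == 1:
            return 'QT_SPLIT'            # four square child nodes
        vertical = read_flag('mtt_split_cu_vertical_flag') == 1
        binary = read_flag('mtt_split_cu_binary_flag') == 1
        if binary:
            return 'BT_VER' if vertical else 'BT_HOR'  # two rectangular nodes
        return 'TT_VER' if vertical else 'TT_HOR'      # three rectangular nodes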
In the tree partitioning structure, a luma block and a chroma block may be partitioned in the same form. That is, a chroma block may be partitioned by referring to the partitioning form of a luma block. When a current chroma block is smaller than a predetermined size, the chroma block may not be partitioned even if the corresponding luma block is partitioned.
In the tree partitioning structure, a luma block and a chroma block may have different forms. In this case, luma block partitioning information and chroma block partitioning information may be signaled separately. Furthermore, in addition to the partitioning information, luma block encoding information and chroma block encoding information may also be different from each other. In one example, the luma block and the chroma block may differ in at least one of the intra encoding mode, the encoding information for motion information, and the like.
A node that is no longer split into smaller units may be treated as one coding block. When a current block is a coding block, the coding block may be partitioned into several sub-blocks (sub-coding blocks), and the sub-blocks may have the same prediction information or different pieces of prediction information. In one example, when a coding unit is in an intra mode, the intra-prediction modes of the sub-blocks may be the same as or different from each other. Also, when the coding unit is in an inter mode, the sub-blocks may have the same motion information or different pieces of motion information. Furthermore, the sub-blocks may be encoded or decoded independently of each other. Each sub-block may be distinguished by a sub-block index (sbIdx). Also, when a coding unit is partitioned into sub-blocks, the coding unit may be partitioned horizontally, vertically, or diagonally. In an intra mode, a mode in which a current coding unit is partitioned into two or four sub-blocks horizontally or vertically is called intra sub-partitions (ISP). In an inter mode, a mode in which a current coding block is partitioned diagonally is called a geometric partitioning mode (GPM). In the GPM mode, the position and direction of the diagonal line are derived using a predetermined angle table, and index information of the angle table is signaled.
Picture prediction (motion compensation) for coding is performed on a coding unit that is no longer divided (i.e., a leaf node of a coding unit tree). Hereinafter, the basic unit for performing the prediction will be referred to as a “prediction unit” or a “prediction block”.
Hereinafter, the term “unit” used herein may replace the prediction unit, which is a basic unit for performing prediction. However, the present disclosure is not limited thereto, and “unit” may be understood as a concept broadly encompassing the coding unit.
First,
Pixels from multiple reference lines may be used for intra prediction of the current block. The multiple reference lines may include n lines located within a predetermined range from the current block. According to an embodiment, when pixels from multiple reference lines are used for intra prediction, separate index information indicating the line whose pixels are to be set as reference pixels may be signaled; this may be named a reference line index.
When at least some samples to be used as reference samples have not yet been restored, the intra prediction unit may obtain reference samples by performing a reference sample padding procedure. The intra prediction unit may perform a reference sample filtering procedure to reduce an error in intra prediction. That is, filtering may be performed on neighboring samples and/or reference samples obtained by the reference sample padding procedure, so as to obtain the filtered reference samples. The intra prediction unit predicts samples of the current block by using the reference samples obtained as in the above. The intra prediction unit predicts samples of the current block by using unfiltered reference samples or filtered reference samples. In the present disclosure, neighboring samples may include samples on at least one reference line. For example, the neighboring samples may include adjacent samples on a line adjacent to the boundary of the current block.
Next,
According to an embodiment of the present invention, the intra prediction mode set may include all intra prediction modes used in intra prediction (e.g., a total of 67 intra prediction modes). More specifically, the intra prediction mode set may include a planar mode, a DC mode, and a plurality (e.g., 65) of angle modes (i.e., directional modes). Each intra prediction mode may be indicated through a preset index (i.e., intra prediction mode index). For example, as shown in
Meanwhile, the preset angle range can be set differently depending on the shape of the current block. For example, if the current block is a rectangular block, a wide angle mode indicating an angle exceeding 45 degrees or less than −135 degrees in a clockwise direction can be additionally used. When the current block is a horizontal block, an angle mode can indicate an angle within an angle range (i.e., a second angle range) between (45+offset1) degrees and (−135+offset1) degrees in a clockwise direction. In this case, angle modes 67 to 76 outside the first angle range can be additionally used. In addition, if the current block is a vertical block, the angle mode can indicate an angle within an angle range (i.e., a third angle range) between (45−offset2) degrees and (−135−offset2) degrees in a clockwise direction. In this case, angle modes −10 to −1 outside the first angle range can be additionally used. According to an embodiment of the present disclosure, the values of offset1 and offset2 can be determined differently depending on the ratio between the width and height of the rectangular block. In addition, offset1 and offset2 can be positive numbers.
According to a further embodiment of the present invention, a plurality of angle modes configuring the intra prediction mode set can include a basic angle mode and an extended angle mode. In this case, the extended angle mode can be determined based on the basic angle mode.
According to an embodiment, the basic angle mode is a mode corresponding to an angle used in intra prediction of the existing high efficiency video coding (HEVC) standard, and the extended angle mode can be a mode corresponding to an angle newly added in intra prediction of the next generation video codec standard. More specifically, the basic angle mode can be an angle mode corresponding to any one of the intra prediction modes {2, 4, 6, . . . , 66}, and the extended angle mode can be an angle mode corresponding to any one of the intra prediction modes {3, 5, 7, . . . , 65}. That is, the extended angle mode can be an angle mode between basic angle modes within the first angle range. Accordingly, the angle indicated by the extended angle mode can be determined on the basis of the angle indicated by the basic angle mode.
According to another embodiment, the basic angle mode can be a mode corresponding to an angle within a preset first angle range, and the extended angle mode can be a wide angle mode outside the first angle range. That is, the basic angle mode can be an angle mode corresponding to any one of the intra prediction modes {2, 3, 4, . . . , 66}, and the extended angle mode can be an angle mode corresponding to any one of the intra prediction modes {−14, −13, −12, . . . , −1} and {67, 68, . . . , 80}. The angle indicated by the extended angle mode can be determined as an angle on a side opposite to the angle indicated by the corresponding basic angle mode. Accordingly, the angle indicated by the extended angle mode can be determined on the basis of the angle indicated by the basic angle mode. Meanwhile, the number of extended angle modes is not limited thereto, and additional extended angles can be defined according to the size and/or shape of the current block. Meanwhile, the total number of intra prediction modes included in the intra prediction mode set can vary depending on the configuration of the basic angle mode and extended angle mode described above.
In the embodiments described above, the spacing between the extended angle modes can be set on the basis of the spacing between the corresponding basic angle modes. For example, the spacing between the extended angle modes {3, 5, 7, . . . , 65} can be determined on the basis of the spacing between the corresponding basic angle modes {2, 4, 6, . . . , 66}. In addition, the spacing between the extended angle modes {−14, −13, . . . , −1} can be determined on the basis of the spacing between corresponding basic angle modes {53, 54, . . . , 66} on the opposite side, and the spacing between the extended angle modes {67, 68, . . . , 80} can be determined on the basis of the spacing between the corresponding basic angle modes {2, 3, 4, . . . , 15} on the opposite side. The angular spacing between the extended angle modes can be set to be the same as the angular spacing between the corresponding basic angle modes. In addition, the number of extended angle modes in the intra prediction mode set can be set to be less than or equal to the number of basic angle modes.
According to an embodiment of the present invention, the extended angle mode can be signaled based on the basic angle mode. For example, the wide angle mode (i.e., the extended angle mode) can replace at least one angle mode (i.e., the basic angle mode) within the first angle range. The basic angle mode to be replaced can be a corresponding angle mode on a side opposite to the wide angle mode. That is, the basic angle mode to be replaced is an angle mode that corresponds to an angle in an opposite direction to the angle indicated by the wide angle mode or that corresponds to an angle that differs by a preset offset index from the angle in the opposite direction. According to an embodiment of the present invention, the preset offset index is 1. The intra prediction mode index corresponding to the basic angle mode to be replaced can be remapped to the wide angle mode to signal the corresponding wide angle mode. For example, the wide angle modes {−14, −13, . . . , −1} can be signaled by the intra prediction mode indices {52, 53, . . . , 66}, respectively, and the wide angle modes {67, 68, . . . , 80} can be signaled by the intra prediction mode indices {2, 3, . . . , 15}, respectively. In this way, the intra prediction mode index for the basic angle mode signals the extended angle mode, and thus the same set of intra prediction mode indices can be used for signaling the intra prediction mode even if the configuration of the angle modes used for intra prediction of each block are different from each other. Accordingly, signaling overhead due to a change in the intra prediction mode configuration can be minimized.
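The remapping can be sketched as below. The thresholds follow the VVC wide-angle mapping and are shown for illustration; the embodiment above may use different ranges.

    from math import log2

    def map_wide_angle_mode(mode, w, h):
        # Remap a signaled basic angle mode index to a wide angle mode for
        # non-square blocks; planar (0) and DC (1) are never remapped.
        if w == h or mode in (0, 1):
            return mode
        wh_ratio = abs(log2(w / h))
        if w > h and 2 <= mode < (8 + 2 * wh_ratio if wh_ratio > 1 else 8):
            return mode + 65      # remapped to wide angle modes 67..80
        if h > w and (60 - 2 * wh_ratio if wh_ratio > 1 else 60) < mode <= 66:
            return mode - 67      # remapped to wide angle modes -14..-1
        return mode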
Meanwhile, whether or not to use the extended angle mode can be determined on the basis of at least one of the shape and size of the current block. According to an embodiment, when the size of the current block is greater than a preset size, the extended angle mode can be used for intra prediction of the current block, otherwise, only the basic angle mode can be used for intra prediction of the current block. According to another embodiment, when the current block is a block other than a square, the extended angle mode can be used for intra prediction of the current block, and when the current block is a square block, only the basic angle mode can be used for intra prediction of the current block.
The intra-prediction unit determines the reference samples and/or interpolated reference samples to be used for intra prediction of the current block, based on the intra-prediction mode information of the current block. When the intra-prediction mode index indicates a specific angular mode, a reference sample located at the specific angle from the current sample in the current block, or an interpolated value of such reference samples, is used for prediction of the current pixel. Thus, different sets of reference samples and/or interpolated reference samples may be used for intra prediction depending on the intra-prediction mode. After the intra prediction of the current block is performed using the reference samples and the intra-prediction mode information, the decoder reconstructs the sample values of the current block by adding the residual signal of the current block, which has been obtained from the inverse transformation unit, to the intra-prediction value of the current block.
Motion information used for inter prediction may include reference direction indication information (inter_pred_idc), reference picture indexes (ref_idx_l0, ref_idx_l1), and motion vectors (mvL0, mvL1). Reference picture list utilization information (predFlagL0, predFlagL1) may be set based on the reference direction indication information. In one example, for a unidirectional prediction using an L0 reference picture, predFlagL0=1 and predFlagL1=0 may be set. For a unidirectional prediction using an L1 reference picture, predFlagL0=0 and predFlagL1=1 may be set. For bidirectional prediction using both the L0 and L1 reference pictures, predFlagL0=1 and predFlagL1=1 may be set.
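The flag derivation above is direct; the sketch below mirrors it, with the string constants standing in for the decoded value of inter_pred_idc.

    def set_pred_flags(inter_pred_idc):
        if inter_pred_idc == 'PRED_L0':   # unidirectional, L0 reference picture
            return {'predFlagL0': 1, 'predFlagL1': 0}
        if inter_pred_idc == 'PRED_L1':   # unidirectional, L1 reference picture
            return {'predFlagL0': 0, 'predFlagL1': 1}
        return {'predFlagL0': 1, 'predFlagL1': 1}   # PRED_BI: bidirectional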
When the current block is a coding unit, the coding unit may be partitioned into multiple sub-blocks, and the sub-blocks may have the same prediction information or different pieces of prediction information. In one example, when the coding unit is in an intra mode, the intra-prediction modes of the sub-blocks may be the same as or different from each other. Also, when the coding unit is in an inter mode, the sub-blocks may have the same motion information or different pieces of motion information. Furthermore, the sub-blocks may be encoded or decoded independently of each other. Each sub-block may be distinguished by a sub-block index (sbIdx).
The motion vector of the current block is likely to be similar to the motion vector of a neighboring block. Therefore, the motion vector of the neighboring block may be used as a motion vector predictor (MVP), and the motion vector of the current block may be derived using the motion vector of the neighboring block. Furthermore, to improve the accuracy of the motion vector, the motion vector difference (MVD) between the optimal motion vector of the current block, which the encoder finds from the original video, and the motion vector predictor may be signaled.
The motion vector may have various resolutions, and the resolution of the motion vector may vary on a block-by-block basis. The motion vector resolution may be expressed in integer units, half-pixel units, ¼-pixel units, 1/16-pixel units, 4-integer-pixel units, etc. Video such as screen content has simple graphical forms such as text and does not require an interpolation filter to be applied; thus, integer units and 4-integer-pixel units may be selectively applied on a block-by-block basis. A block encoded using an affine mode, which represents rotation and scaling, exhibits significant changes in form, so integer units, ¼-pixel units, and 1/16-pixel units may be applied selectively on a block-by-block basis. Information about whether to selectively apply the motion vector resolution on a block-by-block basis is signaled by amvr_flag. If it is applied, information about the motion vector resolution to be applied to the current block is signaled by amvr_precision_idx.
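The following sketch illustrates the block-level AMVR selection described above. The index-to-resolution tables are assumptions chosen to match the units listed in this paragraph, not a normative mapping.

```python
# Illustrative sketch of adaptive motion vector resolution (AMVR) selection.

def mv_resolution(amvr_flag, amvr_precision_idx, affine=False, screen_content=False):
    """Return the assumed MV resolution in luma samples for the current block."""
    if amvr_flag == 0:
        return 0.25                               # default quarter-pixel units
    if affine:                                    # affine blocks: finer choices
        return [0.0625, 1.0][amvr_precision_idx]  # 1/16-pixel or integer units
    if screen_content:                            # screen content: coarse choices
        return [1.0, 4.0][amvr_precision_idx]     # integer or 4-integer units
    return [0.5, 1.0, 4.0][amvr_precision_idx]    # half, integer, 4-integer units
```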
In the case of blocks to which bidirectional prediction is applied, weights applied between two prediction blocks may be equal or different when applying the weighted average, and information about the weights is signaled via BCW_IDX.
In order to improve the accuracy of the motion vector predictor, a merge or AMVP (advanced motion vector prediction) method may be selectively used on a block-by-block basis. The merge method configures the motion information of a current block to be the same as the motion information of a neighboring block adjacent to the current block, and is advantageous in that the motion information is spatially propagated without change in a homogeneous motion region, thereby increasing the encoding efficiency of the motion information. On the other hand, the AMVP method predicts motion information in the L0 and L1 prediction directions respectively and signals the optimal motion information in order to represent accurate motion information. The decoder derives the motion information for the current block by using the AMVP or merge method, and then uses the reference block, located at the position indicated by the motion information in a reference picture, as a prediction block for the current block.
Deriving motion information in merge or AMVP involves constructing a motion candidate list using motion vector predictors derived from neighboring blocks of the current block, and then signaling index information for the optimal motion candidate. In the case of AMVP, motion candidate lists are derived for L0 and L1, respectively, so the optimal motion candidate indexes (mvp_l0_flag, mvp_l1_flag) for L0 and L1 are signaled, respectively. In the case of merge, a single motion candidate list is derived, so a single merge index (merge_idx) is signaled. There may be various motion candidate lists derived from a single coding unit, and a motion candidate index or a merge index may be signaled for each motion candidate list. A mode in which there is no information about residual blocks in blocks encoded using the merge mode may be called a MergeSkip mode.
Symmetric MVD (SMVD) is a method which makes motion vector difference (MVD) values in the L0 and L1 directions symmetrical in the case of bi-directional prediction, thereby reducing the bit rate of motion information transmitted. The MVD information in the L1 direction that is symmetrical to the L0 direction is not transmitted, and reference picture information in the L0 and L1 directions is also not transmitted, but is derived during decoding.
Overlapped block motion compensation (OBMC) is a method in which, when blocks have different pieces of motion information, prediction blocks for a current block are generated by using motion information of neighboring blocks, and the prediction blocks are then weighted averaged to generate a final prediction block for the current block. This has the effect of reducing the blocking phenomenon that occurs at the block edges in a motion-compensated video.
Generally, a merged motion candidate has low motion accuracy. To improve the accuracy of the merge motion candidate, a merge mode with MVD (MMVD) method may be used. The MMVD method is a method for correcting motion information by using one candidate selected from several motion difference value candidates. Information about a correction value of the motion information obtained by the MMVD method (e.g., an index indicating one candidate selected from among the motion difference value candidates, etc.) may be included in a bitstream and transmitted to the decoder. By including the information about the correction value of the motion information in the bitstream, a bit rate may be saved compared to including an existing motion information difference value in a bitstream.
A template matching (TM) method is a method of configuring a template from the neighboring pixels of a current block, searching for the matching area most similar to the template, and correcting the motion information. Template matching (TM) performs motion prediction in the decoder without including motion information in a bitstream, so as to reduce the size of the encoded bitstream. The decoder does not have the original image, and thus may approximately derive the motion information of the current block by using pre-reconstructed neighboring blocks.
A decoder-side motion vector refinement (DMVR) method is a method for correcting motion information through the correlation of already reconstructed reference pictures in order to find more accurate motion information. The DMVR method uses the bidirectional motion information of a current block to take, within predetermined regions of two reference pictures, the point with the best matching between the reference blocks in the reference pictures as new bidirectional motion information. When the DMVR method is performed, the encoder may perform DMVR on one block to correct the motion information, then partition the block into sub-blocks and perform DMVR on each sub-block to correct the motion information of the sub-block again; this may be referred to as multi-pass DMVR (MP-DMVR).
A local illumination compensation (LIC) method compensates for changes in luma between blocks: it derives a linear model by using the neighboring pixels adjacent to a current block, and then compensates for the luma information of the current block by using the linear model.
Existing video encoding methods perform motion compensation by considering only parallel movements in upward, downward, leftward, and rightward directions, thus reducing the encoding efficiency when encoding videos that include movements such as zooming, scaling, and rotation that are commonly encountered in real life. To express the movements such as zooming, scaling, and rotation, affine model-based motion prediction techniques using four (rotation) or six (zooming, scaling, rotation) parameter models may be applied.
Bi-directional optical flow (BDOF) is used to correct a prediction block by estimating, on an optical-flow basis, the amount of change in pixels from the reference blocks of a block with bi-directional motion. The motion information derived by the BDOF of VVC may be used to correct the motion of the current block.
Prediction refinement with optical flow (PROF) is a technique for improving the accuracy of affine motion prediction for each sub-block so as to be similar to the accuracy of motion prediction for each pixel. Similar to BDOF, PROF is a technique that obtains a final prediction signal by calculating a correction value for each pixel with respect to pixel values in which affine motion is compensated for each sub-block based on optical-flow.
The combined inter-/intra-picture prediction (CIIP) method is a method for generating a final prediction block by performing weighted averaging of a prediction block generated by an intra-picture prediction method and a prediction block generated by an inter-picture prediction method when generating a prediction block for the current block.
The intra block copy (IBC) method finds a reference block most similar to the current block in an already reconstructed region within the current picture and uses that reference block as a prediction block for the current block. In this case, information related to a block vector, which is the displacement between the current block and the reference block, may be included in a bitstream. The decoder can parse the information related to the block vector contained in the bitstream to calculate or set the block vector for the current block.
The bi-prediction with CU-level weights (BCW) method is a method in which, for two motion-compensated prediction blocks from different reference pictures, a weighted average of the two prediction blocks is formed by adaptively applying weights on a block-by-block basis, instead of generating the prediction block as a simple average.
The multi-hypothesis prediction (MHP) method is a method for performing weighted prediction through various prediction signals by transmitting additional motion information in addition to unidirectional and bidirectional motion information during inter-picture prediction.
The cross-component linear model (CCLM) is a method that constructs a linear model by using the high correlation between a luma signal and the chroma signal at the same position as the luma signal, and then predicts the chroma signal by using the linear model. A template is constructed using blocks, which have been completely reconstructed, among the neighboring blocks adjacent to a current block, and the parameters for the linear model are derived through the template. Next, the current luma block is downsampled, selectively reconstructed based on the video format, so as to fit the size of the chroma block. Finally, the downsampled luma block and the corresponding linear model are used to predict the chroma block of the current block. A method using two or more linear models is referred to as multi-model linear model (MMLM).
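A simplified sketch of the CCLM idea described above follows. It assumes the linear model chroma = a·luma + b with (a, b) fitted to template samples by least squares in floating point; actual CCLM derivations use integer or min/max approximations, so this is only illustrative.

```python
# Simplified CCLM-style parameter derivation and chroma prediction.
import numpy as np

def cclm_parameters(template_luma, template_chroma):
    x = np.asarray(template_luma, dtype=np.float64)    # downsampled luma template
    y = np.asarray(template_chroma, dtype=np.float64)  # co-located chroma template
    n = x.size
    denom = n * np.sum(x * x) - np.sum(x) ** 2
    a = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / denom if denom else 0.0
    b = (np.sum(y) - a * np.sum(x)) / n
    return a, b

def predict_chroma(downsampled_luma_block, a, b):
    return a * np.asarray(downsampled_luma_block, dtype=np.float64) + b
```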
In independent scalar quantization, the reconstructed coefficient t′k for an input coefficient tk depends only on the related quantization index qk. That is, the quantization index for an arbitrary reconstructed coefficient is independent of the quantization indexes for the other reconstructed coefficients. Here, t′k may be a value that includes the quantization error in tk, and may be different or the same depending on the quantization parameters. t′k may be called a reconstructed transform coefficient or a dequantized transform coefficient, and the quantization index may be called a quantized transform coefficient.
In uniform reconstruction quantization (URQ), the reconstructed coefficients have the characteristic of being arranged at equal intervals. The distance between two adjacent reconstructed values may be called the quantization step size. The reconstructed values may include 0, and the entire set of available reconstructed values may be uniquely defined based on the quantization step size. The quantization step size may vary depending on the quantization parameters.
In the existing methods, quantization reduces the set of admissible reconstructed transform coefficients, and the elements of the set may be finite. Thus, there is a limitation in minimizing the average error between the original video and the reconstructed video. Vector quantization may be used as a method for minimizing this average error.
A simple form of vector quantization used in video encoding is sign data hiding. This is a method in which the encoder does not encode a sign for one non-zero coefficient and the decoder determines the sign for the coefficient based on whether the sum of absolute values of all the coefficients is even or odd. To this end, in the encoder, at least one coefficient may be incremented or decremented by “1”, and the at least one coefficient may be selected and have a value adjusted so as to be optimal from the perspective of rate-distortion cost. In one example, a coefficient with a value close to the boundary between the quantization intervals may be selected.
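The decoder-side inference can be sketched as follows; the parity convention (an even sum means a positive hidden sign) is an assumption for illustration.

```python
# Illustrative decoder-side sign data hiding: the sign of one coefficient in a
# group is not coded and is inferred from the parity of the sum of absolute
# coefficient levels.

def infer_hidden_sign(abs_levels):
    """Return +1 if the hidden sign is inferred positive, -1 if negative."""
    total = sum(abs_levels)
    return 1 if total % 2 == 0 else -1

# Example: levels (3, 1, 2) sum to 6 (even) -> hidden sign inferred as +.
assert infer_hidden_sign([3, 1, 2]) == 1
```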
Another vector quantization method is trellis-coded quantization, and, in video encoding, is used as an optimal path-searching technique to obtain optimized quantization values in dependent quantization. On a block-by-block basis, quantization candidates for all coefficients in a block are placed in a trellis graph, and the optimal trellis path between optimized quantization candidates is found by considering rate-distortion cost. Specifically, the dependent quantization applied to video encoding may be designed such that a set of acceptable reconstructed transform coefficients with respect to transform coefficients depends on the value of a transform coefficient that precedes a current transform coefficient in the reconstruction order. At this time, by selectively using multiple quantizers according to the transform coefficients, the average error between the original video and the reconstructed video is minimized, thereby increasing the encoding efficiency.
Among intra prediction encoding techniques, the matrix intra prediction (MIP) method is a matrix-based intra prediction method, and obtains a prediction signal by applying a predefined matrix and offset values to the pixels on the left of and above the current block, unlike prediction methods having directionality from the pixels of neighboring blocks adjacent to the current block.
To derive an intra-prediction mode for a current block, on the basis of a template which is a random reconstructed region adjacent to the current block, an intra-prediction mode for a template derived through neighboring pixels of the template may be used to reconstruct the current block. First, the decoder may generate a prediction template for the template by using neighboring pixels (references) adjacent to the template, and may use an intra-prediction mode, which has generated the most similar prediction template to an already reconstructed template, to reconstruct the current block. This method may be referred to as template intra mode derivation (TIMD).
In general, the encoder may determine a prediction mode for generating a prediction block and generate a bitstream including information about the determined prediction mode. The decoder may parse a received bitstream to set an intra-prediction mode. In this case, the bit rate of information about the prediction mode may be approximately 10% of the total bitstream size. To reduce the bit rate of information about the prediction mode, the encoder may not include information about an intra-prediction mode in the bitstream. Accordingly, the decoder may use the characteristics of neighboring blocks to derive (determine) an intra-prediction mode for reconstruction of a current block, and may use the derived intra-prediction mode to reconstruct the current block. In this case, to derive the intra-prediction mode, the decoder may apply a Sobel filter horizontally and vertically to each neighboring pixel adjacent to the current block to infer directional information, and then map the directional information to the intra-prediction mode. The method by which the decoder derives the intra-prediction mode using neighboring blocks may be described as decoder side intra mode derivation (DIMD).
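The following rough sketch illustrates the DIMD idea described above: Sobel gradients on reconstructed neighboring pixels are turned into an edge orientation, which is then mapped onto an angular intra mode. The 65-angle mapping and the fallback to Planar are simplifications for illustration, not the normative derivation.

```python
# Rough DIMD-style sketch: Sobel gradients -> edge orientation -> angular mode.
import math
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])  # horizontal gradient
SOBEL_Y = SOBEL_X.T                                        # vertical gradient

def dimd_mode(window):
    """window: 3x3 patch of reconstructed pixels around a template sample."""
    gx = float(np.sum(SOBEL_X * window))
    gy = float(np.sum(SOBEL_Y * window))
    if gx == 0 and gy == 0:
        return 0                           # no gradient: fall back to Planar
    theta = math.atan2(gy, gx)             # gradient direction
    # The edge runs perpendicular to the gradient; fold into [0, pi).
    edge = (theta + math.pi / 2.0) % math.pi
    return 2 + int(round(edge / math.pi * 64)) % 65   # map onto modes 2..66
```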
The neighboring blocks may be spatially located blocks or temporally located blocks. A neighboring block that is spatially adjacent to the current block may be at least one among a left (A1) block, a left below (A0) block, an above (B1) block, an above right (B0) block, or an above left (B2) block. A neighboring block temporally adjacent to the current block may be a block in a collocated picture that includes the position of the top-left pixel of the bottom-right (BR) block of the current block. When the neighboring block temporally adjacent to the current block is encoded in an intra mode, or when it is positioned such that it cannot be used, a block in the collocated picture corresponding to the current picture, which includes the horizontal and vertical center (Ctr) pixel position of the current block, may be used as the temporal neighboring block. Motion candidate information derived from the collocated picture may be referred to as a temporal motion vector predictor (TMVP). Only one TMVP may be derived from one block. One block may be partitioned into multiple sub-blocks, and a TMVP candidate may be derived for each sub-block. A method for deriving TMVPs on a sub-block basis may be referred to as sub-block temporal motion vector predictor (sbTMVP).
Whether the methods described in the present specification are to be applied may be determined on the basis of at least one of the following: slice type information (e.g., whether a slice is an I slice, a P slice, or a B slice), whether the current block is a tile, whether the current block is a subpicture, the size of the current block, the depth of the coding unit, whether the current block is a luma block or a chroma block, whether a frame is a reference frame or a non-reference frame, and the temporal layer corresponding to a reference sequence and a layer. The pieces of information used to determine whether the methods described in the present specification are to be applied may be agreed in advance between the decoder and the encoder. In addition, such pieces of information may be determined according to a profile and a level. Such pieces of information may be expressed by a variable value, and a bitstream may include information on the variable value. That is, the decoder may parse the information on the variable value included in the bitstream to determine whether the above methods are applied. For example, whether the above methods are to be applied may be determined on the basis of the width or the height of a coding unit. In one embodiment, the above methods may be applied if the width or the height is equal to or greater than 32 (e.g., 32, 64, or 128); in another embodiment, they may be applied if the width or the height is smaller than 32 (e.g., 2, 4, 8, or 16); in yet another embodiment, they may be applied if the width or the height is equal to 4 or 8.
A coding unit described generally in the specification may have the same meaning as a coding block. In addition, prediction of a coding unit (block) described generally in the specification may have the same meaning as reconstruction of a coding unit (block).
The LIC is a method of predicting a current coding block by using a linear model associated with a change in illumination (brightness) between the current coding block and a reference block. The LIC may be adaptively applied to a coding block on which inter prediction is performed. The LIC may be applied to each of the luma component, the Cb component, and the Cr component of the current coding block. Equation 1 expresses the linear model used for LIC.

P′(x) = a × P(x) + b   (Equation 1)
In Equation 1, each parameter may be as follows: P′(x) denotes a sample value of the reference block after the LIC method is applied, P(x) denotes a sample value of the reference block, a denotes a scale coefficient, and b denotes an offset value.
In Equation 1, a and b may be obtained by using a least-squares error method, i.e., by minimizing the least-squares error between the neighboring samples of the current coding block and the neighboring samples of the reference block. a and b may be obtained via Equation 2, the standard least-squares solution.

a = (N × Σ(x·y) − Σx × Σy) / (N × Σ(x·x) − Σx × Σx), b = (Σy − a × Σx) / N   (Equation 2)
In Equation 2, x denotes the value of a neighboring sample of the reference block, y denotes the value of a neighboring sample of the current coding block, and N denotes the number of neighboring samples. In this instance, the number of neighboring samples of the reference block and the number of neighboring samples of the current coding block are the same, and the relative locations of corresponding samples may also be the same. The number of samples and their locations may be variously defined. In the specification, the number of samples and their locations are described via an LIC template.
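The least-squares derivation of Equation 2 and the application of Equation 1 can be sketched as follows (floating point for clarity; a real codec would use an integer approximation):

```python
# Minimal sketch of deriving the LIC parameters a and b (Equation 2) from
# template samples and applying them per Equation 1.

def lic_parameters(ref_template, cur_template):
    assert len(ref_template) == len(cur_template)
    n = len(ref_template)
    sx = sum(ref_template)                       # sum of x (reference samples)
    sy = sum(cur_template)                       # sum of y (current samples)
    sxx = sum(x * x for x in ref_template)
    sxy = sum(x * y for x, y in zip(ref_template, cur_template))
    denom = n * sxx - sx * sx
    a = (n * sxy - sx * sy) / denom if denom else 1.0  # fallback: identity scale
    b = (sy - a * sx) / n
    return a, b

def apply_lic(ref_samples, a, b):
    return [a * p + b for p in ref_samples]      # P'(x) = a * P(x) + b
```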
In the case in which the current coding block is encoded according to a merge mode, the LIC method may be applied in order to predict the current coding block. A predetermined flag (syntax element) indicating whether the LIC method is applied may be signaled in units of coding blocks. In addition, a flag (syntax element) indicating whether to enable the LIC method at the SPS level or the slice level may be signaled. An encoder may produce a bitstream including the predetermined flag (syntax element) indicating whether the LIC method is applied in units of coding blocks, and the flag (syntax element) indicating whether to enable the LIC method at the SPS level or the slice level.
The cases in which the LIC method is not applied to a coding block may be as follows: i) the case in which the total number of samples of the coding block is less than 32, ii) the case in which the encoding mode of the coding block is a geometric partitioning merge (GPM) mode, iii) the case in which the encoding mode of the coding block is an intra block copy (IBC) mode, iv) the case in which the encoding mode of the coding block is a combined intra and inter prediction (CIIP) mode, and v) the case in which inter bi-prediction is applied to the coding block. When at least one of these cases is satisfied, the current coding block may not be predicted via the LIC method.
A current coding block may be divided into a plurality of sub-blocks. A decoder may perform inter prediction in units of sub-blocks, and may reconstruct the current coding block. In the case of inter prediction, the neighboring blocks for the LIC method may be neighboring blocks based on the motion information of each sub-block. For the top-left sub-block, the neighboring blocks may be the block on the left of the top-left sub-block and the block above the top-left sub-block; for each of the remaining sub-blocks, the neighboring blocks may be the block on the left of that sub-block and the block above that sub-block.
The current coding block may be configured with blocks of a luma component and chroma components. For example, the current coding block may be configured with one combination among Y, Cb, and Cr component blocks; R, G, and B component blocks; and Y, Cg, and Co component blocks. In this instance, an LIC method based on a linear model may be applied to each of the constituent components. The linear-model-based LIC method may be applied to all of the constituent components, or to only some of them. A decoder may receive, for each coding block, a syntax element (flag) indicating the component block to which the LIC method is applied.
A bitstream may be encapsulated by using a network abstraction layer (NAL) unit as a basic unit. That is, a bitstream may be configured with one or more network abstraction layer (NAL) units.
Information related to LIC may be included in an SPS RBSP, a PPS RBSP, and GCI.
A syntax element (lic_flag) indicating whether LIC is applied to a current coding unit (block) may be signaled based on a coding unit. lic_flag may be signaled based on the value of pps_lic_enabled_flag. For example, in the case in which the value of pps_lic_enabled_flag is 1 (i.e., true), lic_flag may be signaled. In the case in which the value of lic_flag is 1, this indicates that the LIC method is applied to the current coding unit. In the case in which the value of lic_flag is 0, this indicates that the LIC method is not applied to the current coding unit. In the case in which the encoding mode of the current coding unit is a merge mode, a decoder may obtain the information associated with whether the LIC method is applied via a reference block that is adjacent to the current coding unit. Accordingly, in the case in which the encoding mode of the current coding unit is a merge mode, lic_flag may not be signaled.
A GPM mode is a mode in which a current coding unit is divided into two areas based on a single straight boundary line, and inter prediction is performed with respect to each of the two divided areas, so that a prediction signal of the current coding unit is obtained. That is, a decoder may perform inter prediction by using different pieces of motion information for the two divided areas, respectively, so as to produce a prediction signal (P0, P1) for each of the two divided areas. The decoder may blend P0 and P1 so as to obtain the prediction signal of the current coding unit. Specifically, P0 and P1 may be blended by using a weight matrix (w0, w1). In this instance, the weight matrix may have values in the range of 0 to 8.
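The blending described above can be sketched as follows; the per-sample weight matrix w0 is assumed to be given (deriving it from the boundary-line parameters is omitted), with w1 = 8 − w0 assumed as its complement.

```python
# Illustrative GPM blending of two per-area prediction signals with weights 0..8.
import numpy as np

def gpm_blend(p0, p1, w0):
    """p0, p1: prediction blocks; w0: per-sample weights for P0, values 0..8."""
    p0 = np.asarray(p0, dtype=np.int64)
    p1 = np.asarray(p1, dtype=np.int64)
    w0 = np.asarray(w0, dtype=np.int64)
    w1 = 8 - w0                               # complementary weight for P1
    return (w0 * p0 + w1 * p1 + 4) >> 3       # rounded weighted average
```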
A condition for signaling lic_flag in the case in which a GPM mode is applied to a current coding unit will now be described. In the case in which the GPM mode is applied to the current coding unit, the decoder may parse lic_flag based on the conditions listed in Table 1.
In the case in which inter-prediction is applied in order to predict the current coding unit, a condition for using the GPM mode is as shown in Table 2.
With reference to Table 2, the GPM mode may be used when the encoding mode of the current coding unit is a merge mode (general_merge_flag[x0][y0]==1), the merge mode is not a merge mode in units of sub-blocks (merge_subblock_flag[x0][y0]==0) but a merge mode in units of coding blocks, the merge mode is neither a regular merge mode nor an MMVD merge mode (regular_merge_flag[x0][y0]==0), and a CIIP mode is not used for predicting the current coding unit (ciip_flag[x0][y0]==0). That is, when the conditions of Table 2 are satisfied, the current coding unit may be predicted according to the GPM mode.
Hereinafter, embodiments in which lic_flag is signaled when the GPM mode is used will be described.
According to the GPM mode, the current coding unit may be divided into two areas (partitions). In this instance, lic_flag may be signaled for each area. lic_flag may be separated into gpm0_lic_flag and gpm1_lic_flag, signaled for the corresponding areas, respectively. That is, the decoder may parse gpm0_lic_flag indicating whether the LIC mode is applied to a first area and parse gpm1_lic_flag indicating whether the LIC mode is applied to a second area, and may determine whether LIC is applied to each area.
In addition, based on whether LIC is applied to a neighboring block of the current coding unit, whether LIC is applied to each of one or more of the two areas may be determined.
In addition, based on whether LIC is applied to a neighboring block, the decoder may determine whether LIC is applied to the first area. The decoder may determine that LIC is not applied to the second area, irrespective of whether LIC is applied to a neighboring block. The first area may be highly associated with an adjacent neighboring block. However, in the case of the second area, it is difficult to configure a template adjacent to the second area. This may be the case in which the current coding unit is divided into a first area, which is adjacent to an upper side neighboring block and a left side neighboring block of the current coding unit, and a second area, which is not adjacent to an upper side neighboring block and a left side neighboring block of the current coding unit.
When performing regular bi-directional prediction, a decoder may perform bi-prediction by using a reference block of a first picture that corresponds to a first direction and precedes the current picture in time, and a reference block of a second picture that corresponds to a second direction and follows it in time. The decoder may derive the parameters for an LIC linear model by using the neighboring samples of the current block, the neighboring samples of the reference block of the first picture, and the neighboring samples of the reference block of the second picture. In this instance, based on whether LIC is used for the reference block of the first picture and the reference block of the second picture, whether LIC is used for predicting the current block may be determined. Whether LIC is used for the reference block of the first picture and the reference block of the second picture may be determined respectively, and separate signaling indicating whether LIC is used for each reference block may be present. The separate signaling (lic_flag[x0][y0]) may be included in the coding unit syntax structure (coding_unit( ){ }). In the case in which the encoding mode of the current block is an AMVP mode and bi-prediction is applied, a condition for parsing the separate signaling (lic_flag[x0][y0]) is as shown in Table 3. When the encoding mode of the current block is a merge mode and bi-prediction is applied, a condition for parsing the separate signaling (lic_flag[x0][y0]) is as shown in Table 4.
The decoder may derive LIC linear parameters for each direction (the first direction and the second direction), may configure a prediction block for each direction, and may obtain a weighted mean thereof, thereby producing a final prediction block. The decoder may produce a first prediction block by multiplying a first weight with the reference block corresponding to the first direction, and may produce a second prediction block by multiplying a second weight with the reference block corresponding to the second direction. The decoder may produce the final prediction block (a weighted-mean prediction block) by calculating the weighted mean of the first prediction block and the second prediction block. In this instance, the first weight and the second weight may be different from each other. The decoder may derive the parameters of an LIC linear model between a template configured with neighboring blocks of the weighted-mean prediction block and a template configured with neighboring blocks of the current block. The decoder may predict the current block by applying the derived parameters of the LIC linear model to the final prediction block. In this instance, to configure the template of the weighted-mean prediction block, the prediction block for each direction may be produced as a block extended by the size of the template. Alternatively, the decoder may configure the template for the prediction block by using the pixels in the one line along the top of the weighted-mean prediction block and the pixels in the one line along the leftmost side of the weighted-mean prediction block.
In the case in which a neighboring block located above a current block is a first template and a neighboring block on the left of the current block is a second template, the template of a current coding unit for an LIC linear model may be configured in three types of methods.
An encoder may indicate, based on a cost value, which of the three methods described above is to be used for configuring a template and obtaining an LIC linear model. Specifically, the encoder may produce a bitstream including a syntax element (lic_mode_idx) indicating the template configuration to be used. The decoder may parse lic_mode_idx and may identify the template to be used for obtaining the LIC linear model. In this instance, lic_mode_idx may be parsed only when the value of lic_flag[x0][y0] is 1 (i.e., true). In the case in which both the first template and the second template are used, the value of lic_mode_idx may be 2. In the case in which only the first template is used, the value of lic_mode_idx may be 1. In the case in which only the second template is used, the value of lic_mode_idx may be 0. In addition, lic_mode_idx may indicate the template configuration via a 2-bit value. For example, i) when the value of lic_mode_idx is 00, both the first template and the second template may be used, ii) when the value of lic_mode_idx is 10, only the first template may be used, and iii) when lic_mode_idx is 11, only the second template may be used. As another index mapping method, i) when the value of lic_mode_idx is 00, only the second template may be used, ii) when the value of lic_mode_idx is 10, only the first template may be used, and iii) when lic_mode_idx is 11, both the first and the second template may be used.
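Under the first index mapping above (0: second/left template only, 1: first/above template only, 2: both), the template selection can be sketched as follows; the sample-gathering accessors are hypothetical.

```python
# Illustrative LIC template selection driven by lic_mode_idx.

def lic_template_samples(block, lic_mode_idx):
    """block: object with hypothetical neighbor-sample accessors."""
    use_above = lic_mode_idx in (1, 2)   # first template (above the block)
    use_left = lic_mode_idx in (0, 2)    # second template (left of the block)
    samples = []
    if use_above:
        samples += block.above_neighbor_samples()  # hypothetical accessor
    if use_left:
        samples += block.left_neighbor_samples()   # hypothetical accessor
    return samples
```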
An encoder may apply a context model to the first bin and may perform entropy coding using context-adaptive binary arithmetic coding (CABAC). The context model for lic_mode_idx may be defined to be a value obtained via experiments.
In addition, the use of initValue depending on a slice type may be selectively applied for each slice. According to an embodiment, the order of use of the value of initValue may be changed depending on the value of lic_mode_idx defined in a slice header. In the case in which the value of lic_mode_idx is 1, and the current slice type is slice P, initValue may be 6. In the case in which the value of lic_mode_idx is 1, and the current slice type is slice B, initValue may be 3. In the case in which the value of lic_mode_idx is 0, and the current slice type is slice P, initValue may be 3. In the case in which the value of lic_mode_idx is 0, and the current slice type is slice B, initValue may be 6.
The location of a luma component block located in the top-left of the current coding unit may be expressed as (x0, y0) in the form of coordinates. A sample location (xNbL, yNbL) of a left side neighboring block of the current coding unit may be (x0-1, y0), and a sample location (xNbA, yNbA) of an upper side neighboring block of the current coding unit may be (x0, y0-1). In the case in which a sample of the upper side neighboring block is available, it may be expressed as availableA. In the case in which a sample of the left side neighboring block is available, it may be expressed as availableL. In the case in which a sample is not available, it may be expressed as FALSE.
Hereinafter, a context model associated with a symbol of lic_mode_idx that is an embodiment of the disclosure will be described.
The value of the context index (ctxInc) may be determined to be a value between 0 and 2 when LIC is applied to both the left side neighboring block and the upper side neighboring block among the neighboring blocks of the current block. The value of ctxInc may be determined to be a value between 0 and 1 when LIC is applied to only one of the left side neighboring block and the upper side neighboring block. condL may indicate whether the LIC mode is applied to the left side neighboring block among the neighboring blocks of the current block; that is, based on the value of lic_mode_idx, condL may indicate whether the LIC mode is applied to the left side neighboring block. condA may indicate whether the LIC mode is applied to the upper side neighboring block among the neighboring blocks of the current block; that is, based on the value of lic_mode_idx, condA may indicate whether the LIC mode is applied to the upper side neighboring block. ctxSetIdx is a value determined based on the current slice type, and may have a value ranging from 0 to 2. Table 5 is an example of determining a context index according to an embodiment of the disclosure.
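In the spirit of Table 5, a context-index derivation consistent with the description above could count how many of the two neighbors use the LIC mode; the exact formula below is an assumption in line with common CABAC context designs, not the normative derivation.

```python
# Illustrative context-index increment for lic_mode_idx.

def ctx_inc_for_lic_mode_idx(cond_l, cond_a):
    """cond_l / cond_a: whether the LIC mode is applied to the left / above
    neighboring block (False if the neighbor is unavailable)."""
    return int(bool(cond_l)) + int(bool(cond_a))   # yields 0, 1, or 2
```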
In the case in which LIC is applied to both the left side neighboring block and the upper side neighboring block of the current block, and all neighboring blocks use a template (first template) configured with blocks located in the upper side of the neighboring blocks and a template (second template) configured with blocks on the left of the neighboring blocks, the value of lic_mode_idx may be determined to be 2. In the case in which LIC is applied to both the left side neighboring block and the upper side neighboring block of the current block, and only the first template or only the second template is used for one or more among the neighboring blocks, the value of lic_mode_idx may be determined to be 1. condL may indicate whether an LIC mode is applied to the left side neighboring block among the neighboring blocks of the current block. That is, based on the value of lic_mode_idx, condL may indicate whether the LIC mode is applied to the left side neighboring block, and may be set to a value indicating a template configuration (mode). condA may indicate whether an LIC mode is applied to the upper side neighboring block among the neighboring blocks of the current block. That is, based on the value of lic_mode_idx, condA may indicate whether the LIC mode is applied to the upper side neighboring block, and may be set to a value indicating a template configuration (mode). ctxSetIdx is a value determined based on a current slice type, and may have a value ranging from 0 to 2. As described above, the template configuration (mode) may be i) the case in which both the first template and the second template are used, ii) the case in which only the first template is used, and iii) the case in which only the second template is used. In this instance, each of the template configurations of i) to iii) may be mapped to one of a first mode, a second mode, and a third mode. In addition, each mode may be indicated by a value of 0, 1, or 2 or may be indicated by a 2-bit value. Table 6 is an example of determining a context index according to an embodiment of the disclosure.
Similar to an LIC linear model, a convolutional model may derive a linear relation between a template of a current coding/prediction block and a template of a reference block, and may predict the current coding/prediction block by applying the derived linear relation to the samples of the reference block. The convolutional model may be provided in a form that includes a plurality of convolutional filter coefficients, unlike the form of Equation 1. In this instance, the number of the convolutional filter coefficients may be a predetermined number, or may be variable. A convolutional filter coefficient may be a value that minimizes the mean square error (MSE) between the template samples of the current coding/prediction block and the template samples of the reference block determined based on the motion information. The convolutional filter coefficients may be obtained by using Cholesky decomposition or LDL decomposition. For example, for a matrix equation Ax = B, the solution may be computed in the form x = A⁻¹B. In this instance, a method of decomposing the matrix A so that A⁻¹ can be computed easily may be Cholesky decomposition or LDL decomposition. Cholesky decomposition factorizes a matrix into the product of a lower triangular matrix (or an upper triangular matrix) and its transpose, and LDL decomposition factorizes a matrix into the product of a lower triangular matrix (or an upper triangular matrix), a diagonal matrix, and the transpose of the lower triangular matrix. A lower triangular matrix is a matrix whose components above the diagonal are 0; an upper triangular matrix, on the contrary, is a matrix whose components below the diagonal are 0. In the matrix equation Ax = B, A may be the template values of the luma component (or Cb component or Cr component) block of the reference block, and B may be the template values of the luma component (or Cb component or Cr component) block of the current coding/prediction block. Alternatively, A may be the template values of the luma component (or Cb component or Cr component) block of the current coding/prediction block, and B may be the template values of the luma component (or Cb component or Cr component) block of the reference block. A method of obtaining the filter coefficients may be as follows. An autocorrelation matrix may be obtained for the A values, and a cross-correlation vector between the A values and the B values may be calculated. The autocorrelation matrix may be decomposed using LDL decomposition, and the result of the decomposition may be expressed as U′*D*U*x = B, where U denotes an upper triangular matrix, D denotes a diagonal matrix, and U′ denotes the transpose of U. The convolutional filter coefficients may be obtained by applying back substitution, as in Gauss-Jordan elimination, to U′*D*U*x = B.
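The derivation above can be sketched numerically as follows; numpy's Cholesky factorization stands in for the fixed-point LDL arithmetic a codec would use, and the small ridge term is an illustrative safeguard against a singular autocorrelation matrix.

```python
# Sketch: solve the normal equations for the convolutional filter coefficients.
import numpy as np

def convolutional_filter_coeffs(ref_rows, cur_values, ridge=1e-6):
    """ref_rows: one row of regressor samples (e.g., C, N, S, E, W, P, B) per
    template position of the reference block; cur_values: the co-located
    template samples of the current coding/prediction block."""
    A = np.asarray(ref_rows, dtype=np.float64)
    b = np.asarray(cur_values, dtype=np.float64)
    ata = A.T @ A + ridge * np.eye(A.shape[1])   # autocorrelation matrix
    atb = A.T @ b                                # cross-correlation vector
    L = np.linalg.cholesky(ata)                  # ata = L @ L.T
    y = np.linalg.solve(L, atb)                  # forward substitution
    return np.linalg.solve(L.T, y)               # back substitution
```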
Hereinafter, a method of calculating the non-linear element (P) will be described.
The non-linear element (P) may be determined based on the value of the sample (C) located in the center of the current block. CLuma denotes the value of C when the current block is a luma component block. CCb denotes the value of C when the current block is a Cb component block. CCr denotes the value of C when the current block is a Cr component block.
The non-linear element (P) may be determined based on the mean value (meanSamples) of all sample values of a reference block and/or all sample values of the current block. P may be calculated for each color component of the reference block and/or the current block. meanSamplesLuma denotes the mean value of all sample values when the reference block and/or the current block is a luma component block. meanSamplesCb denotes the mean value of all sample values when the reference block and/or the current block is a Cb component block. meanSamplesCr denotes the mean value of all sample values when the reference block and/or the current block is a Cr component block.
The non-linear element (P) may be determined based on a mean value for each color component of a template. meanY denotes the mean value of a template of a luma component block. meanCb denotes the mean value of a template of a Cb component block. meanCr denotes the mean value of a template of a Cr component block. In this instance, the template may be the template of the reference block and/or the template of the current block.
The bit operators << and >> in the equation are the left and right shift operators, and may correspond to multiplication and division by powers of two.
The bias element (B) is an integer value and may have the middle value of the bit depth range, i.e., 1 << (bitDepth − 1). For example, in the case in which bitDepth is 10 bits, B may be 512.
A neighboring sample in a predetermined location may be a sample (i.e., a side sample) that is not included in a template. In this instance, the side sample may have a predetermined value. The side sample may have the same value as one of the neighboring samples included in the template. That is, the side sample may be obtained by padding from one of the neighboring samples included in the template. The side sample may be obtained by padding from the sample closest to the side sample among the neighboring samples included in the template.
Alternatively, the side sample may have the mean value of a plurality of neighboring samples included in the template. Depending on the location of the side sample, the plurality of neighboring samples used for calculating the mean value may be determined. There may be a single set 2301, 2302, 2303, or 2304 including a calculated side sample and neighboring samples. In this instance, the mean value of the neighboring samples included in the line closest to the side sample may be the value of the side sample. For example, referring to diagram 2301, the side sample may be S, and S may have the mean value of W, C, and E. Referring to diagram 2302, the side sample may be W, and W may have the mean value of N, C, and S. Referring to diagram 2303, the side samples may be W and N, and W and N may have the mean value of C, E, and S. Referring to diagram 2304, the side sample may be E, and E may have the mean value of N, C, and S. Alternatively, the mean value of the samples remaining after excluding the side sample in a set may be the value of the side sample.
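The averaging fallback described above (a missing side sample takes the mean of the remaining available support samples) can be sketched as follows for a plus-shaped support {C, N, S, E, W}:

```python
# Sketch of filling missing side samples of a plus-shaped filter support by
# averaging the remaining available samples, per the variant described above.

def fill_side_samples(available):
    """available: dict of the support samples that lie inside the template,
    e.g. {"C": 100, "N": 98, "W": 101, "E": 99} with "S" missing."""
    missing = {"C", "N", "S", "E", "W"} - set(available)
    filled = dict(available)
    for name in missing:
        # Mean of the remaining available samples.
        filled[name] = sum(available.values()) / len(available)
    return filled
```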
A convolutional model relation may be changed based on the number of configured samples. Depending on whether each of C, N, S, E, and W is present among the neighboring samples, the number of convolutional filter coefficients may be changed. Whether a side sample for calculating a convolutional model filter coefficient is needed may be determined based on the locations and the number of the neighboring samples. For example, in the case in which only the sample in location C is used, each of the side samples W, N, E, and S may be padded with the value of the sample in location C.
An optimal filter form may be a predetermined form. In this instance, an encoder may produce information associated with the filter form by including it in a bitstream. The information associated with the filter form may be included in the header information of at least one of an SPS, a PPS, a PH, a slice/tile, and a coding unit. The decoder may parse the information associated with the filter form so as to determine the filter form for predicting the current block. In the case in which the information associated with the filter form is not included in the bitstream, a predetermined filter form may be used.
An LIC linear model that has been described with reference to Equation 1 may be updated. In Equation 1, a, which is the slope, and b, which is the y-intercept, may be updated as shown in Equation 3. That is, a in Equation 1 may be replaced with a′ calculated via Equation 3, and b in Equation 1 may be replaced with b′.
In Equation 3, u may be a value signaled for each coding unit or each prediction unit. In this instance, u may have an integer value ranging from −4 to 4. Yr in Equation 3 may be the mean value of the template of the reference block. In this instance, the update may be performed for each color component. Therefore, in the case in which the template of the reference block corresponds to a luma component block, Yr may be the mean value of the luma component template. In the case in which the template of the reference block corresponds to a Cb component block, Yr may be the mean value of the Cb component template. In the case in which the template of the reference block corresponds to a Cr component block, Yr may be the mean value of the Cr component template. In addition, Yr may be the mean value for each color component of the template of the current coding/prediction block, as opposed to the mean value for each color component of the template of the reference block. Yr may also be the mean value of any one color component; that is, the mean value of one component may be used for the remaining color components.
The relation between the horizontal axis (x-axis) and the vertical axis (y-axis) may also be applied in reverse.
A decoder may parse a first syntax element that is a general constraint information (GCI) syntax element in operation S2610. The decoder may parse a second syntax element indicating whether an LIC mode is available for a current sequence in operation S2620. Based on a result of parsing the second syntax element, the decoder may parse a third syntax element indicating whether the LIC mode is used in the current block in operation S2630. When the third syntax element indicates that the LIC mode is used in the current block, the decoder may predict the current block based on the LIC mode in operation S2640.
The first syntax element may be included in at least one of a sequence parameter set (SPS) RBSP syntax and a video parameter set (VPS) RBSP syntax, and the second syntax element may be included in the SPS RBSP syntax.
When the value of the first syntax element is 1, the value of the second syntax element is set to 0, which is a value indicating that the LIC mode is not used, irrespective of the result of parsing the second syntax element. When the value of the first syntax element is 0, the value of the second syntax element is not constrained.
The third syntax element may be parsed when the second syntax element indicates that the LIC mode is available for the current sequence.
The third syntax element may be parsed by additionally taking into consideration at least one of the number of samples of the current block, the encoding mode of the current block, and the prediction direction associated with the current block. Specifically, the third syntax element may be parsed when the number of samples of the current block is 32 or more. The third syntax element may be parsed when the encoding mode of the current block is not a merge mode, an IBC mode, or a CIIP mode. The third syntax element may be parsed when the prediction direction associated with the current block is not bi-prediction.
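The parsing conditions summarized above can be collected into one hedged sketch; the argument names mirror the syntax elements discussed in this description rather than normative variables.

```python
# Illustrative gate for parsing the third syntax element (lic_flag).

def can_parse_lic_flag(sps_lic_enabled, num_samples, mode, is_bi_prediction):
    if not sps_lic_enabled:               # second syntax element gates parsing
        return False
    if num_samples < 32:                  # fewer than 32 samples in the block
        return False
    if mode in ("MERGE", "IBC", "CIIP"):  # excluded encoding modes
        return False
    if is_bi_prediction:                  # bi-prediction is excluded here
        return False
    return True
```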
The third syntax element may indicate whether the LIC mode is used in the current block. In this instance, the decoder may configure a first template including neighboring blocks of the current block. The decoder may configure a second template including neighboring blocks of a reference block of the current block. The decoder may obtain an LIC linear model based on the first template and the second template. The decoder may predict the current block based on the LIC linear model. A location and a size of the first template correspond to a location and a size of the second template.
The first template may include upper side neighboring blocks of the current block, and the second template may include upper side neighboring blocks of the reference block.
The first template may include left side neighboring blocks of the current block, and the second template may include left side neighboring blocks of the reference block.
The first template may include upper side neighboring blocks of the current block and left side neighboring blocks of the current block, and the second template may include upper side neighboring blocks of the reference block and left side neighboring blocks of the reference block.
When the encoding mode of the current block is a GPM mode, and the third syntax element indicates that the LIC mode is used in the current block, the current block may be divided into a first area and a second area. In this instance, the decoder may obtain a first LIC linear model for the first area. The decoder may obtain, based on the first LIC linear model, a first prediction block for the first area. The decoder may obtain a second LIC linear model for the second area. The decoder may obtain, based on the second LIC linear model, a second prediction block for the second area. Based on the first prediction block and the second prediction block, the decoder may predict the current block.
The third syntax element may indicate whether the LIC mode is used in the current block. In this instance, the decoder may configure a template including neighboring blocks located within a predetermined range from the current block. Based on the template, the decoder may obtain a convolutional model. Based on the convolutional model, the decoder may predict the current block. The current block may be one sample. A filter coefficient of the convolutional model may be a coefficient of at least one sample among an upper side sample, a lower side sample, a left side sample, and a right side sample of the one sample. When one or more samples among the upper side sample, the lower side sample, the left side sample, and the right side sample of the one sample are not included in the template, the value of a sample that is not included in the template may be the mean value of the samples remaining after excluding the samples that are not included in the template. Alternatively, when one or more samples among the upper side sample, the lower side sample, the left side sample, and the right side sample of the one sample are not included in the template, the value of the sample that is not included in the template may be identical to the value of the sample closest to it among the samples included in the template.
The above methods (video signal processing methods) described in the present specification may be performed by a processor in a decoder or an encoder. Furthermore, the encoder may generate a bitstream that is decoded by a video signal processing method. Furthermore, the bitstream generated by the encoder may be stored in a computer-readable non-transitory storage medium (recording medium).
The present specification has been described primarily from the perspective of a decoder, but may function equally in an encoder. The term “parsing” in the present specification has been described in terms of the process of obtaining information from a bitstream, but in terms of the encoder, may be interpreted as configuring the information in a bitstream. Thus, the term “parsing” is not limited to operations of the decoder, but may also be interpreted as the act of configuring a bitstream in the encoder. Furthermore, the bitstream may be configured to be stored in a computer-readable recording medium.
The above-described embodiments of the present invention may be implemented through various means. For example, embodiments of the present invention may be implemented by hardware, firmware, software, or a combination thereof.
For implementation by hardware, the method according to embodiments of the present invention may be implemented by one or more of Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and the like.
In the case of implementation by firmware or software, the method according to embodiments of the present invention may be implemented in the form of a module, procedure, or function that performs the functions or operations described above. The software code may be stored in memory and executed by a processor. The memory may be located inside or outside the processor, and may exchange data with the processor by various means already known.
Some embodiments may also be implemented in the form of a recording medium including computer-executable instructions such as a program module that is executed by a computer. Computer-readable media may be any available media that may be accessed by a computer, and may include all volatile, nonvolatile, removable, and non-removable media. In addition, the computer-readable media may include both computer storage media and communication media. The computer storage media include all volatile, nonvolatile, removable, and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Typically, the communication media include computer-readable instructions, data structures, program modules, or other data in a modulated data signal, or other transmission mechanisms, and include any information transfer media.
The above description of the present invention is for illustrative purposes only, and it will be understood that those of ordinary skill in the art to which the present invention belongs may make changes without altering the technical ideas or essential characteristics of the present invention, and that the invention may be easily modified into other specific forms. Therefore, the embodiments described above are illustrative and not restrictive in all aspects. For example, each component described as a single entity may be implemented in a distributed manner, and likewise, components described as being distributed may be implemented in a combined form.
The scope of the present invention is defined by the appended claims rather than the above detailed description, and all changes or modifications derived from the meaning and range of the appended claims and equivalents thereof are to be interpreted as being included within the scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
10-2021-0117969 | Sep 2021 | KR | national |
10-2022-0104334 | Aug 2022 | KR | national |
This application is a continuation of pending PCT International Application No. PCT/KR2022/013284, which was filed on Sep. 5, 2022, and which claims priority under 35 U.S.C 119 (a) to Korean Patent Application No. 10-2021-0117969 filed with the Korean Intellectual Property Office on Sep. 3, 2021, and Korean Patent Application No. 10-2022-0104334 filed with the Korean Intellectual Property Office on Aug. 19, 2022. The disclosures of the above patent applications are incorporated herein by reference in their entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/KR2022/013284 | 9/5/2022 | WO |