The present disclosure relates to a video signal processing method and apparatus.
Recently, demand for high-resolution, high-quality images such as high definition (HD) images and ultra-high definition (UHD) images has been increasing in various application fields. As the resolution and quality of video data increase, the amount of data increases relative to existing video data, and thus when video data is transmitted using media such as conventional wired or wireless broadband lines or stored using conventional storage media, transmission costs and storage costs increase. High-efficiency video compression techniques can be used to solve such problems occurring as the resolution and quality of video data increase.
There are various video compression techniques such as an inter-prediction technique for predicting pixel values included in the current picture from pictures before or after the current picture, an intra-prediction technique for predicting pixel values included in the current picture using pixel information in the current picture, and an entropy coding technique for allocating short codes to values with a high frequency of appearance and allocating long codes to values with a low frequency of appearance. By using these video compression techniques, video data can be effectively compressed and transmitted or stored.
Meanwhile, as the demand for high-resolution video increases, the demand for three-dimensional video content as a new video service is also increasing. Discussions are underway regarding video compression technology for effectively providing high-resolution and ultra-high-resolution three-dimensional video content.
Therefore, the present disclosure has been made in view of the above problems, and it is an object of the present disclosure to provide a method of performing motion estimation on the decoder side based on a previously reconstructed picture, and an apparatus for performing the same.
It is another object of the present disclosure to provide a method of improving prediction accuracy by combining a plurality of inter-prediction modes and an apparatus for performing the same.
It is a further object of the present disclosure to provide a method of adaptively determining indices assigned to motion information candidates included in a motion information list and an apparatus for performing the same.
It is a further object of the present disclosure to provide a method of rearranging motion information candidates included in a motion information list and an apparatus for performing the same.
It is a further object of the present disclosure to provide a method of generating a new motion information candidate by combining two motion information candidates included in a motion information list and an apparatus for performing the same.
The objects to be achieved by the present disclosure are not limited to the objects mentioned above, and other objects which are not mentioned can be clearly understood by those skilled in the art from the description below.
In accordance with an aspect of the present disclosure, the above and other objects can be accomplished by the provision of a video decoding method including generating a motion information merge list for a current block, selecting one of a plurality of motion information merge candidates included in the motion information merge list, and obtaining a prediction block for the current block on the basis of motion information derived from the selected motion information merge candidate. Here, an index assigned to each of the plurality of motion information merge candidates is determined on the basis of a ranking of the motion information merge candidates that is determined by a decoder.
In accordance with another aspect of the present disclosure, there is provided a video encoding method including performing motion estimation on a current block, generating a motion information merge list for the current block, and selecting, from among a plurality of motion information merge candidates included in the motion information merge list, one motion information merge candidate having the same motion information as the current block. Here, an index assigned to each of the plurality of motion information merge candidates is determined on the basis of an adaptively determined ranking of the motion information merge candidates.
In the video decoding/encoding method according to the present disclosure, an insertion order of the motion information merge candidates may be determined according to the ranking, and the plurality of motion information merge candidates may be sequentially inserted into the motion information merge list in the insertion order.
In the video decoding/encoding method according to the present disclosure, an arrangement order of the motion information merge candidates may be determined according to the ranking, and the plurality of motion information merge candidates may be rearranged in the arrangement order.
In the video decoding/encoding method according to the present disclosure, among the plurality of motion information merge candidates, an arrangement order of motion information merge candidates having unidirectional motion information may be the same as an insertion order, and an arrangement order of motion information merge candidates having bidirectional motion information may be determined on the basis of bidirectional matching.
In the video decoding/encoding method according to the present disclosure, the arrangement order of the motion information merge candidates having the bidirectional motion information may be determined on the basis of a cost between an L0 reference block and an L1 reference block determined on the basis of the bidirectional motion information of each of the motion information merge candidates having the bidirectional motion information.
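As a rough illustration of the arrangement rule above, the following Python sketch keeps unidirectional candidates at their insertion positions and re-sorts only the bidirectional candidates, placing them back into the slots that bidirectional candidates occupied. It assumes the bidirectional matching cost is the SAD between the L0 and L1 reference blocks; the names `MergeCandidate`, `fetch_block`, and `reorder_by_bilateral_cost` are hypothetical, and the slot-reuse policy is one possible interpretation, not the definitive method.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

MV = Tuple[int, int]
Block = List[List[int]]


@dataclass
class MergeCandidate:
    name: str
    mv_l0: Optional[MV]  # None when the candidate has no L0 motion
    mv_l1: Optional[MV]  # None when the candidate has no L1 motion

    @property
    def is_bidirectional(self) -> bool:
        return self.mv_l0 is not None and self.mv_l1 is not None


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def reorder_by_bilateral_cost(
    cands: List[MergeCandidate],
    fetch_block: Callable[[int, MV], Block],  # hypothetical: (list index, mv) -> reference block
) -> List[MergeCandidate]:
    # Unidirectional candidates keep the position they were inserted at.
    bi_slots = [i for i, c in enumerate(cands) if c.is_bidirectional]
    # Bidirectional candidates are sorted by L0/L1 reference-block mismatch.
    # sorted() is stable, so equal costs keep their insertion order.
    bi_sorted = sorted(
        (cands[i] for i in bi_slots),
        key=lambda c: sad(fetch_block(0, c.mv_l0), fetch_block(1, c.mv_l1)),
    )
    out = list(cands)
    for slot, cand in zip(bi_slots, bi_sorted):
        out[slot] = cand
    return out
```

Because the cost depends only on reference pictures that the decoder already holds, the decoder can reproduce the same order without any extra signaling.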
In the video decoding/encoding method according to the present disclosure, the ranking may be determined on the basis of index information indicating one of a plurality of ranking candidates, and the index information may be explicitly signaled through a bitstream.
In the video decoding/encoding method according to the present disclosure, the index information may be signaled in units of regions where parallel processing is performed, and a region where the parallel processing is performed may be a sub-picture, a slice, a tile, a coding tree unit, or a merge estimation region.
In the video decoding/encoding method according to the present disclosure, the ranking may be determined on the basis of a cost between a reference template determined on the basis of motion information of each of the motion information merge candidates and a current template composed of previously reconstructed samples around the current block.
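A minimal Python sketch of template-based ranking, assuming the templates have already been fetched as flat lists of samples and that the cost is the sum of absolute differences (the text above only says "cost"). The candidate with the lowest cost receives index 0. The helper names `template_cost`, `assign_merge_indices`, and `fetch_ref_tpl` are hypothetical.

```python
from typing import Callable, Dict, Sequence


def template_cost(cur_tpl: Sequence[int], ref_tpl: Sequence[int]) -> int:
    """SAD between the current template and one reference template."""
    return sum(abs(a - b) for a, b in zip(cur_tpl, ref_tpl))


def assign_merge_indices(
    candidates: Sequence[str],
    cur_tpl: Sequence[int],
    fetch_ref_tpl: Callable[[str], Sequence[int]],  # hypothetical template fetcher
) -> Dict[str, int]:
    # The current template consists of previously reconstructed samples, so the
    # encoder and the decoder can both compute the same ranking without signaling it.
    costs = {c: template_cost(cur_tpl, fetch_ref_tpl(c)) for c in candidates}
    ranked = sorted(candidates, key=lambda c: costs[c])  # stable: ties keep insertion order
    return {c: idx for idx, c in enumerate(ranked)}
```

For example, with a current template `[10, 10, 10, 10]`, a candidate whose reference template is identical would be assigned index 0, so the most likely candidate is coded with the shortest index.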
In the video decoding/encoding method according to the present disclosure, the ranking may be determined on the basis of a shape of the current block.
In the video decoding/encoding method according to the present disclosure, if a width of the current block is greater than a height, an index assigned to a motion information merge candidate derived from a left neighboring block may have a smaller value than an index assigned to a motion information merge candidate derived from a top neighboring block.
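The shape-dependent rule above can be sketched in Python as a swap of the left-derived and top-derived candidates when they are in the "wrong" relative order. The labels `"left"` and `"top"` are hypothetical tags for the neighbor each candidate came from, and the mirrored rule for tall blocks (height greater than width) is an assumption; the text above only states the wide-block case.

```python
from typing import List, Sequence


def order_by_block_shape(candidates: Sequence[str], width: int, height: int) -> List[str]:
    out = list(candidates)
    if "left" in out and "top" in out:
        i, j = out.index("left"), out.index("top")
        # Stated rule: for a wide block, the left candidate gets the smaller index.
        wide_wants_left_first = width > height and i > j
        # Assumption: the mirrored rule applies to tall blocks.
        tall_wants_top_first = height > width and j > i
        if wide_wants_left_first or tall_wants_top_first:
            out[i], out[j] = out[j], out[i]
    return out
```

Square blocks, and lists lacking either neighbor, are returned unchanged in this sketch.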
In the video decoding/encoding method according to the present disclosure, at least one of the plurality of motion information merge candidates may be generated by combining motion information merge candidates previously included in the motion information merge list, and the combination may have a lowest cost among combinations of L0 motion information and L1 motion information of the previously included motion information merge candidates.
In the video decoding/encoding method according to the present disclosure, the cost may be derived on the basis of a difference between an L0 reference block determined by the L0 motion information and an L1 reference block determined by the L1 motion information.
In the video decoding/encoding method according to the present disclosure, the cost may be derived on the basis of a difference between the current template around the current block and the result of a weighted sum operation performed on an L0 reference template determined by the L0 motion information and an L1 reference template determined by the L1 motion information.
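A Python sketch of building a combined candidate: every L0 motion in the list is paired with every L1 motion, and the pairing with the lowest cost is returned. It uses the L0/L1 reference-block difference (SAD) as the cost; the template-based weighted-sum cost could be substituted in the marked line. Skipping pairings that already exist as candidates is an assumption made so that the result is a new candidate, and `fetch_block` and `best_combined_candidate` are hypothetical names.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

MV = Tuple[int, int]
Block = List[List[int]]
Candidate = Tuple[Optional[MV], Optional[MV]]  # (L0 motion, L1 motion); None if unused


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_combined_candidate(
    cands: Sequence[Candidate],
    fetch_block: Callable[[int, MV], Block],  # hypothetical: (list index, mv) -> reference block
) -> Optional[Candidate]:
    l0_motions = [c[0] for c in cands if c[0] is not None]
    l1_motions = [c[1] for c in cands if c[1] is not None]
    existing = set(cands)  # assumption: do not regenerate a candidate already in the list
    best, best_cost = None, None
    for mv0, mv1 in product(l0_motions, l1_motions):
        if (mv0, mv1) in existing:
            continue
        # Cost = difference between the L0 and L1 reference blocks.
        cost = sad(fetch_block(0, mv0), fetch_block(1, mv1))
        if best_cost is None or cost < best_cost:
            best, best_cost = (mv0, mv1), cost
    return best
```

If the list contains no L0 motion or no L1 motion, no combination exists and the function returns `None`.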
The features briefly summarized above with respect to the present disclosure are merely exemplary aspects of the detailed description of the present disclosure described below and do not limit the scope of the present disclosure.
According to the present disclosure, signaling overhead can be reduced by performing motion estimation on the decoder side on the basis of a previously reconstructed picture.
According to the present disclosure, prediction accuracy can be improved by combining a plurality of inter-prediction modes.
According to the present disclosure, the number of bits required to encode indices can be reduced by adaptively determining indices assigned to motion information candidates included in a motion information list.
According to the present disclosure, the number of bits required to encode indices can be reduced by rearranging motion information candidates included in the motion information list.
According to the present disclosure, prediction accuracy can be improved by combining two motion information candidates included in the motion information list to generate a new motion information candidate.
The effects that can be obtained from the present disclosure are not limited to the effects mentioned above, and other effects that are not mentioned can be clearly understood by those skilled in the art from the description below.
As the present disclosure may be variously modified and may have several embodiments, specific embodiments will be illustrated in the drawings and described in detail. However, this is not intended to limit the present disclosure to a specific embodiment, and it should be understood that the present disclosure includes all changes, equivalents, and substitutes included in the spirit and technical scope of the present disclosure. Similar reference numerals are used for similar components throughout the description of the drawings.
Terms such as first, second, etc. may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of rights of the present disclosure, a first component may be referred to as a second component, and similarly, a second component may also be referred to as a first component. The term “and/or” includes a combination of a plurality of related listed items or any one of the plurality of related listed items.
When a component is referred to as being “linked” or “connected” to another component, it should be understood that it may be directly linked or connected to the other component, but that another component may exist between them. On the other hand, when a component is referred to as being “directly linked” or “directly connected” to another component, it should be understood that no other component exists between them.
The terms used in this application are used only to describe specific embodiments and are not intended to limit the present disclosure. A singular expression includes a plural expression unless the context clearly indicates otherwise. In this application, it should be understood that terms such as “include” or “have” are intended to designate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and do not preclude in advance the possibility of the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.
Hereinafter, preferred embodiments of the present disclosure will be described in more detail with reference to the attached drawings. The same reference numerals are used for the same components in the drawings, and redundant descriptions of the same components are omitted.
Referring to
As each construction unit shown in
Further, some components may be optional components used merely to improve performance, rather than essential components that perform essential functions in the present disclosure. The present disclosure may be implemented by including only the construction units necessary to implement the essence of the present disclosure, excluding components used merely to improve performance, and a structure including only the essential components, excluding the optional components used merely to improve performance, is also included in the scope of rights of the present disclosure.
A picture partitioning unit 110 may partition an input picture into at least one processing unit. In this case, a processing unit may be a prediction unit (PU), a transform unit (TU) or a coding unit (CU). In a picture partitioning unit 110, one picture may be partitioned into a combination of a plurality of coding units, prediction units and transform units and a picture may be encoded by selecting a combination of one coding unit, prediction unit and transform unit according to a predetermined standard (e.g., a cost function).
For example, one picture may be partitioned into a plurality of coding units. To partition a picture into coding units, a recursive tree structure such as a quad tree, a ternary tree, or a binary tree may be used. A coding unit that is partitioned into other coding units, with one image or the largest coding unit as the root, may have as many child nodes as the number of partitioned coding units. A coding unit that is no longer partitioned according to a certain restriction becomes a leaf node. For example, when quad-tree partitioning is applied to one coding unit, the coding unit may be partitioned into up to four other coding units.
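A minimal Python sketch of recursive quad-tree partitioning, covering only the quad-tree case (binary and ternary splits are omitted). The split decision is passed in as a callback standing in for the encoder's cost-based choice or the decoder's parsed split flags; `quadtree_partition`, `should_split`, and the minimum size of 8 are hypothetical.

```python
from typing import Callable, List, Tuple

Leaf = Tuple[int, int, int]  # (x, y, size) of a square coding unit


def quadtree_partition(
    x: int,
    y: int,
    size: int,
    should_split: Callable[[int, int, int], bool],  # hypothetical split decision
    min_size: int = 8,
) -> List[Leaf]:
    """Return the leaf coding units of a recursive quad-tree split."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: List[Leaf] = []
        # Child nodes are visited in raster order: top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves += quadtree_partition(x + dx, y + dy, half, should_split, min_size)
        return leaves
    # A node that is not split further is a leaf node.
    return [(x, y, size)]
```

For instance, splitting a 64×64 root once and then only its top-left 32×32 child yields seven leaf coding units: four of size 16 and three of size 32.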
Hereinafter, in an embodiment of the present disclosure, a coding unit may be used as a unit for encoding or may be used as a unit for decoding.
A coding unit may be partitioned into at least one prediction unit having a square or rectangular shape of the same size, or may be partitioned such that one of the prediction units partitioned in one coding unit has a shape and/or size different from another prediction unit.
In intra prediction, a transform unit may be configured to be the same as a prediction unit. In this case, after a coding unit is partitioned into a plurality of transform units, intra prediction may be performed for each transform unit. A coding unit may be partitioned in the horizontal direction or in the vertical direction. The number of transform units generated by partitioning a coding unit may be 2 or 4 depending on the size of the coding unit.
Prediction units 120 and 125 may include an inter prediction unit 120 performing inter prediction and an intra prediction unit 125 performing intra prediction. Whether to perform inter prediction or intra prediction for a coding unit may be determined, and detailed information according to each prediction method (e.g., an intra prediction mode, a motion vector, a reference picture, etc.) may be determined. In this case, the processing unit on which prediction is performed may differ from the processing unit for which the prediction method and its details are determined. For example, a prediction method, a prediction mode, etc. may be determined per coding unit, and prediction may be performed per prediction unit or transform unit. A residual value (a residual block) between a generated prediction block and an original block may be input to a transform unit 130. In addition, prediction mode information, motion vector information, etc. used for prediction may be encoded together with the residual value in an entropy encoding unit 165 and transmitted to a decoding device. When a specific encoding mode is used, an original block may be encoded as it is and transmitted to a decoding device without generating a prediction block through prediction units 120 or 125.
An inter prediction unit 120 may predict a prediction unit based on information on at least one of a previous picture or a subsequent picture of a current picture or, in some cases, based on information on some encoded regions in the current picture. The inter prediction unit 120 may include a reference picture interpolation unit, a motion prediction unit, and a motion compensation unit.
A reference picture interpolation unit may receive reference picture information from a memory 155 and generate pixel information at sub-integer positions (an integer pixel or less) in a reference picture. For luma pixels, an 8-tap DCT-based interpolation filter with different filter coefficients may be used to generate sub-integer pixel information in units of ¼ pixel. For chroma signals, a 4-tap DCT-based interpolation filter with different filter coefficients may be used to generate sub-integer pixel information in units of ⅛ pixel.
A motion prediction unit may perform motion prediction based on the reference picture interpolated by the reference picture interpolation unit. As a method for calculating a motion vector, various methods such as FBMA (Full search-based Block Matching Algorithm), TSS (Three Step Search), and NTS (New Three-Step Search Algorithm) may be used. A motion vector may have a value in units of ½ or ¼ pixel based on the interpolated pixels. The motion prediction unit may predict a current prediction unit by using different motion prediction methods. As a motion prediction method, various methods such as a skip method, a merge method, an advanced motion vector prediction (AMVP) method, and an intra block copy method may be used.
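To make the 8-tap luma interpolation concrete, the Python sketch below uses the HEVC luma interpolation filter coefficients as one example of an 8-tap DCT-based filter; the text above does not specify coefficients. It filters a single row horizontally with one rounding and clipping stage, whereas real codecs keep higher intermediate precision for two-dimensional interpolation, and the edge padding by sample repetition is an assumption. `interp_luma_row` is a hypothetical name.

```python
from typing import Sequence

# HEVC luma filter taps for the 1/4, 2/4 and 3/4 sample positions (each sums to 64).
LUMA_FILTERS = {
    1: (-1, 4, -10, 58, 17, -5, 1, 0),
    2: (-1, 4, -11, 40, 40, -11, 4, -1),
    3: (0, 1, -5, 17, 58, -10, 4, -1),
}


def interp_luma_row(row: Sequence[int], x: int, frac: int, bit_depth: int = 8) -> int:
    """Sample at position x + frac/4 of a row; frac is 0..3 in quarter-sample units."""
    if frac == 0:
        return row[x]  # integer position: no filtering needed
    acc = 0
    for k, coeff in enumerate(LUMA_FILTERS[frac]):
        # Taps cover samples x-3 .. x+4; out-of-range indices repeat the edge sample.
        idx = min(max(x - 3 + k, 0), len(row) - 1)
        acc += coeff * row[idx]
    value = (acc + 32) >> 6  # divide by 64 with rounding
    return min(max(value, 0), (1 << bit_depth) - 1)
```

On a linear ramp `[0, 10, 20, ..., 90]`, the half-sample value between 40 and 50 comes out as 45, as expected for a smooth signal.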
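A minimal Python sketch of a full search block matching algorithm (FBMA) at integer-pixel precision, using SAD as the matching criterion (an assumption; other criteria are possible). Candidate positions falling outside the reference picture are skipped rather than padded. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Block = List[List[int]]


def block_sad(cur: Block, ref: Block, rx: int, ry: int) -> int:
    """SAD between the current block and the reference block whose top-left is (rx, ry)."""
    return sum(
        abs(cur[j][i] - ref[ry + j][rx + i])
        for j in range(len(cur))
        for i in range(len(cur[0]))
    )


def full_search(
    cur: Block, ref: Block, x0: int, y0: int, search_range: int
) -> Tuple[Tuple[int, int], Optional[int]]:
    """Exhaustively test every displacement in [-range, +range]^2; return (mv, cost)."""
    h, w = len(cur), len(cur[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x0 + dx, y0 + dy
            if rx < 0 or ry < 0 or ry + h > len(ref) or rx + w > len(ref[0]):
                continue  # skip positions outside the reference picture
            cost = block_sad(cur, ref, rx, ry)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

Fast methods such as TSS or NTS evaluate only a subset of these positions, trading some accuracy for far fewer SAD computations.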
An intra prediction unit 125 may generate a prediction unit based on reference pixel information, which is pixel information in a current picture. The reference pixel information may be derived from a selected one of a plurality of reference pixel lines. An N-th reference pixel line among the plurality of reference pixel lines may include left pixels whose x-axis difference from the top-left pixel of the current block is N and top pixels whose y-axis difference from the top-left pixel is N. The number of reference pixel lines that may be selected for a current block may be 1, 2, 3, or 4.
When a neighboring block of a current prediction unit is a block on which inter prediction was performed, so that a reference pixel is a pixel obtained through inter prediction, the reference pixel included in the inter-predicted block may be replaced with reference pixel information of a neighboring block on which intra prediction was performed. In other words, when a reference pixel is unavailable, the unavailable reference pixel information may be replaced with at least one of the available reference pixels.
Prediction modes in intra prediction may include directional prediction modes, which use reference pixel information according to a prediction direction, and non-directional modes, which do not use directional information when performing prediction. The mode for predicting luma information may differ from the mode for predicting chroma information, and the intra prediction mode information used to predict the luma information, or the predicted luma signal information, may be utilized to predict the chroma information.
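The geometry of the N-th reference pixel line can be illustrated with the Python sketch below, which lists the sample coordinates of the line for a block whose top-left pixel is at (x0, y0). The extent of the line (up to 2W samples to the right and 2H samples downward, plus one corner sample) is an assumption borrowed from common practice; the text above only fixes the x-axis and y-axis offset N. `reference_line` is a hypothetical name.

```python
from typing import List, Tuple


def reference_line(x0: int, y0: int, width: int, height: int, n: int) -> List[Tuple[int, int]]:
    """Coordinates (x, y) of the n-th reference pixel line (n = 1 is the adjacent line)."""
    corner = [(x0 - n, y0 - n)]
    # Top pixels: y-axis difference from the top-left pixel is n.
    top = [(x, y0 - n) for x in range(x0 - n + 1, x0 + 2 * width)]
    # Left pixels: x-axis difference from the top-left pixel is n.
    left = [(x0 - n, y) for y in range(y0 - n + 1, y0 + 2 * height)]
    return corner + top + left
```

Lines farther from the block contain slightly more samples, because they also cover the gap between the corner and the adjacent line.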
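A Python sketch of reference sample substitution, modeled on the HEVC-style procedure: samples are scanned in order, a leading unavailable sample takes the first available value, and every later unavailable sample copies its predecessor. Samples from inter-predicted neighbors are assumed to have already been marked as unavailable (`None`); the mid-level fallback when nothing is available is also borrowed from HEVC. `substitute_reference` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def substitute_reference(samples: Sequence[Optional[int]], bit_depth: int = 8) -> List[int]:
    """Replace unavailable (None) reference samples with nearby available ones."""
    if all(s is None for s in samples):
        # Nothing is available: fill with the mid-level value, e.g. 128 for 8-bit video.
        return [1 << (bit_depth - 1)] * len(samples)
    out = list(samples)
    if out[0] is None:
        # The first sample takes the first available value found in scan order.
        out[0] = next(s for s in out if s is not None)
    for i in range(1, len(out)):
        if out[i] is None:
            out[i] = out[i - 1]  # copy the preceding (already valid) sample
    return out
```

For example, `[None, None, 50, None, 70]` becomes `[50, 50, 50, 50, 70]`.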
When a size of a prediction unit is the same as that of a transform unit in performing intra prediction, intra prediction for a prediction unit may be performed based on a pixel at a left position of a prediction unit, a pixel at a top-left position and a pixel at a top position.
An intra prediction method may generate a prediction block after applying a smoothing filter to a reference pixel according to a prediction mode. According to a selected reference pixel line, whether a smoothing filter is applied may be determined.
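As one concrete example of a smoothing filter on reference pixels, the Python sketch below applies the 3-tap [1, 2, 1]/4 filter used in HEVC, leaving the two end samples unfiltered. The decision of whether to filter (based on the prediction mode and the selected reference pixel line) is left to the caller through a hypothetical `enabled` flag, since the text above does not fix the rule.

```python
from typing import List, Sequence


def smooth_reference(ref: Sequence[int], enabled: bool = True) -> List[int]:
    """Apply a [1, 2, 1] / 4 smoothing filter to a line of reference samples."""
    if not enabled or len(ref) < 3:
        return list(ref)
    out = [ref[0]]  # end samples are kept as they are
    for i in range(1, len(ref) - 1):
        out.append((ref[i - 1] + 2 * ref[i] + ref[i + 1] + 2) >> 2)
    out.append(ref[-1])
    return out
```

Applied to `[10, 20, 30, 100]`, the filter softens the jump to 100 and returns `[10, 20, 45, 100]`.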
To perform intra prediction, the intra prediction mode of a current prediction unit may be predicted from the intra prediction modes of prediction units around the current prediction unit. When the prediction mode of the current prediction unit is predicted using mode information predicted from a neighboring prediction unit, and the intra prediction mode of the current prediction unit is the same as that of the neighboring prediction unit, information indicating that the two prediction modes are the same may be transmitted using predetermined flag information. If the prediction mode of the current prediction unit differs from that of the neighboring prediction unit, the prediction mode information of the current block may be encoded by entropy encoding.
In addition, a residual block including residual value information, which is the difference between the prediction unit generated by prediction units 120 and 125 and the original block of that prediction unit, may be generated. The generated residual block may be input to a transform unit 130.
A transform unit 130 may transform the residual block, which includes residual value information between the original block and the prediction unit generated through prediction units 120 and 125, by using a transform method such as DCT (Discrete Cosine Transform), DST (Discrete Sine Transform), or KLT (Karhunen-Loève Transform). Whether to apply DCT, DST, or KLT to transform the residual block may be determined based on at least one of the size of the transform unit, the shape of the transform unit, the prediction mode of the prediction unit, or the intra prediction mode information of the prediction unit.
A quantization unit 135 may quantize the values transformed into the frequency domain by the transform unit 130. The quantization coefficient may vary depending on the block or the importance of the image. The values calculated by the quantization unit 135 may be provided to a dequantization unit 140 and a rearrangement unit 160.
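A simplified Python sketch of this signaling with a single neighboring mode as the predictor. When the modes match, only the flag is sent; otherwise the remaining mode is remapped to skip the predicted value, a common trick that saves one code value. The syntax-element names `same_mode_flag` and `rem_mode` and the remapping are assumptions for illustration, not syntax defined by the disclosure.

```python
from typing import Dict


def code_intra_mode(cur_mode: int, neighbour_mode: int) -> Dict[str, int]:
    """Return hypothetical syntax elements for the current intra prediction mode."""
    if cur_mode == neighbour_mode:
        return {"same_mode_flag": 1}
    # The predicted mode can never be coded here, so modes above it shift down by one.
    rem = cur_mode if cur_mode < neighbour_mode else cur_mode - 1
    return {"same_mode_flag": 0, "rem_mode": rem}


def decode_intra_mode(syntax: Dict[str, int], neighbour_mode: int) -> int:
    """Invert code_intra_mode given the same neighbouring mode."""
    if syntax["same_mode_flag"]:
        return neighbour_mode
    rem = syntax["rem_mode"]
    return rem if rem < neighbour_mode else rem + 1
```

Because the decoder derives the same neighboring mode from already-decoded blocks, it can invert the mapping exactly.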
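For illustration, the Python sketch below applies a floating-point orthonormal 2-D DCT-II to a square residual block (Y = C X Cᵀ) and inverts it (X = Cᵀ Y C). Practical codecs use scaled integer approximations of these basis functions, and DST or KLT would simply use a different basis matrix; the function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def dct_matrix(n: int) -> Matrix:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis function."""
    return [
        [
            (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        ]
        for k in range(n)
    ]


def matmul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Sequence[Sequence[float]]) -> Matrix:
    return [list(row) for row in zip(*a)]


def forward_dct2d(block: Sequence[Sequence[float]]) -> Matrix:
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))  # transform columns, then rows


def inverse_dct2d(coeffs: Sequence[Sequence[float]]) -> Matrix:
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)  # C is orthonormal, so its inverse is Cᵀ
```

A flat 4×4 block of value 10 transforms to a single DC coefficient of 40 with all other coefficients (numerically) zero, which is why smooth residuals compress well.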
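A minimal scalar quantization sketch in Python, assuming an HEVC-like relation in which the step size doubles every 6 quantization parameter (QP) steps, Qstep = 2^((QP − 4) / 6). A per-block QP models a quantization strength that varies by block; the rounding offset of 0.5 is an assumption (encoders often use smaller offsets). The matching `dequantize` shows the inverse applied on the reconstruction path. All names are hypothetical.

```python
def q_step(qp: int) -> float:
    """Quantization step size; doubles every 6 QP steps, equals 1.0 at QP 4."""
    return 2 ** ((qp - 4) / 6)


def quantize(coeff: float, qp: int, rounding: float = 0.5) -> int:
    """Map a transform coefficient to an integer level (sign-magnitude)."""
    level = int(abs(coeff) / q_step(qp) + rounding)
    return level if coeff >= 0 else -level


def dequantize(level: int, qp: int) -> float:
    """Reconstruct an approximate coefficient from its level."""
    return level * q_step(qp)
```

For example, at QP 10 the step size is 2.0, so a coefficient of 9.0 is quantized to level 5 and reconstructed as 10.0; the difference of 1.0 is the quantization error.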
A rearrangement unit 160 may perform rearrangement of a coefficient value for a quantized residual value.
A rearrangement unit 160 may change coefficients in the form of a two-dimensional block into the form of a one-dimensional vector through a coefficient scan method. For example, the rearrangement unit 160 may scan from the DC coefficient to coefficients in the high-frequency domain using a zig-zag scan method and change them into a one-dimensional vector. Depending on the size of the transform unit and the intra prediction mode, a vertical scan, in which the two-dimensional block coefficients are scanned in the column direction, a horizontal scan, in which they are scanned in the row direction, or a diagonal scan, in which they are scanned in the diagonal direction, may be used instead of the zig-zag scan. In other words, which of the zig-zag scan, vertical scan, horizontal scan, or diagonal scan is used may be determined according to the size of the transform unit and the intra prediction mode.
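The four scan orders can be sketched in Python as lists of (row, column) positions for an n×n block. The zig-zag order follows the familiar JPEG/H.264 pattern, and the diagonal scan is taken to be an up-right diagonal scan (bottom-left to top-right along each anti-diagonal), which is an assumption about the direction. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pos = Tuple[int, int]  # (row, column)


def zigzag_order(n: int) -> List[Pos]:
    order: List[Pos] = []
    for s in range(2 * n - 1):  # s = row + column indexes the anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals run bottom-left to top-right, odd ones the other way.
        order.extend((r, s - r) for r in (reversed(rows) if s % 2 == 0 else rows))
    return order


def scan_order(n: int, method: str) -> List[Pos]:
    if method == "zigzag":
        return zigzag_order(n)
    if method == "horizontal":  # row by row
        return [(r, c) for r in range(n) for c in range(n)]
    if method == "vertical":  # column by column
        return [(r, c) for c in range(n) for r in range(n)]
    if method == "diagonal":  # up-right diagonal on every anti-diagonal
        return [(r, s - r) for s in range(2 * n - 1)
                for r in reversed(range(max(0, s - n + 1), min(s, n - 1) + 1))]
    raise ValueError(f"unknown scan method: {method}")


def scan(block: Sequence[Sequence[int]], method: str) -> List[int]:
    """Flatten a square coefficient block into a 1-D vector, starting from DC."""
    return [block[r][c] for r, c in scan_order(len(block), method)]
```

With a 3×3 block holding 0..8 in raster order, the zig-zag scan yields `[0, 1, 3, 6, 4, 2, 5, 7, 8]`.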
An entropy encoding unit 165 may perform entropy encoding based on values calculated by a rearrangement unit 160. Entropy encoding, for example, may use various encoding methods such as exponential Golomb, CAVLC (Context-Adaptive Variable Length Coding), CABAC (Context-Adaptive Binary Arithmetic Coding).
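Of the entropy coding methods listed above, zero-order exponential Golomb coding is simple enough to show in full. In this Python sketch (function names hypothetical), a non-negative value v is written as the binary form of v + 1 preceded by one fewer zeros than its length, so small values get short codes; bit strings are used instead of a real bit writer for readability.

```python
from typing import Tuple


def exp_golomb_encode(v: int) -> str:
    """Order-0 Exp-Golomb code of a non-negative integer, as a '0'/'1' string."""
    if v < 0:
        raise ValueError("order-0 Exp-Golomb codes non-negative integers only")
    bits = bin(v + 1)[2:]
    return "0" * (len(bits) - 1) + bits  # prefix zeros signal the suffix length


def exp_golomb_decode(bitstring: str, pos: int = 0) -> Tuple[int, int]:
    """Decode one code starting at pos; return (value, position after the code)."""
    zeros = 0
    while bitstring[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bitstring[pos + zeros:end], 2) - 1, end
```

The values 0, 1, 2, 3 and 4 map to `1`, `010`, `011`, `00100` and `00101`.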
An entropy encoding unit 165 may encode a variety of information such as residual value coefficient information and block type information in a coding unit, prediction mode information, partitioning unit information, prediction unit information and transmission unit information, motion vector information, reference frame information, block interpolation information, filtering information, etc. from a rearrangement unit 160 and prediction units 120 and 125.
An entropy encoding unit 165 may perform entropy encoding for a coefficient value in a coding unit which is input from a rearrangement unit 160.
A dequantization unit 140 and an inverse transform unit 145 dequantize values quantized in a quantization unit 135 and inversely transform values transformed in a transform unit 130. A residual value generated by a dequantization unit 140 and an inverse transform unit 145 may be combined with a prediction unit predicted by a motion prediction unit, a motion compensation unit and an intra prediction unit included in prediction units 120 and 125 to generate a reconstructed block.
A filter unit 150 may include at least one of a deblocking filter, an offset correction unit and an adaptive loop filter (ALF).
A deblocking filter may remove block distortion generated at boundaries between blocks in a reconstructed picture. To determine whether to perform deblocking, whether to apply a deblocking filter to the current block may be determined based on the pixels included in several rows or columns of the block. When a deblocking filter is applied to a block, a strong filter or a weak filter may be applied according to the required deblocking filtering strength. In addition, in applying a deblocking filter, horizontal filtering and vertical filtering may be set to be processed in parallel.
An offset correction unit may correct, in units of pixels, the offset of a deblocked image with respect to the original image. To perform offset correction for a specific picture, a method of dividing the pixels included in the image into a certain number of regions, determining a region to which an offset is to be applied, and applying the offset to that region may be used, or a method of applying an offset in consideration of the edge information of each pixel may be used.
Adaptive loop filtering (ALF) may be performed based on a value obtained by comparing the filtered reconstructed image with the original image. After the pixels included in an image are divided into predetermined groups, one filter to be applied to each group may be determined, and filtering may be performed differently for each group. Information on whether to apply ALF may be transmitted per coding unit (CU) for a luma signal, and the shape and filter coefficients of the ALF filter to be applied may vary from block to block. Alternatively, an ALF filter of the same shape (fixed shape) may be applied regardless of the characteristics of the target block.
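The on/off decision from a few rows can be sketched in Python following the HEVC-style rule: for a 4-line edge segment, second differences are measured in the first and last lines on both sides, and filtering is enabled only when their sum is below a threshold β. A small value means both sides are smooth, so a visible step at the edge is likely a blocking artifact rather than real texture. Here β is passed in directly (HEVC derives it from the QP), and the strong/weak filter choice is omitted; the names are hypothetical.

```python
from typing import Sequence


def second_diff(a: int, b: int, c: int) -> int:
    return abs(a - 2 * b + c)


def edge_needs_filtering(
    p: Sequence[Sequence[int]],  # p[line] = [p0, p1, p2], p0 nearest the edge
    q: Sequence[Sequence[int]],  # q[line] = [q0, q1, q2], q0 nearest the edge
    beta: int,
) -> bool:
    """Decide whether a 4-line edge segment should be deblocked."""
    d = (second_diff(p[0][2], p[0][1], p[0][0]) + second_diff(q[0][2], q[0][1], q[0][0])
         + second_diff(p[3][2], p[3][1], p[3][0]) + second_diff(q[3][2], q[3][1], q[3][0]))
    return d < beta
```

Two flat sides with a step between them (for example 100 and 120) give d = 0, so the edge is filtered, while a textured side produces a large d and filtering is skipped to preserve detail.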
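The region-based variant can be illustrated with a band offset modeled on HEVC sample adaptive offset: the sample range is split into 32 equal bands, and four consecutive bands starting at a signaled band position receive their own offsets. The edge-information variant is not shown. The parameter names `band_position` and `offsets` are hypothetical stand-ins for signaled values.

```python
from typing import List, Sequence


def sao_band_offset(
    samples: Sequence[int],
    band_position: int,
    offsets: Sequence[int],  # offsets for 4 consecutive bands
    bit_depth: int = 8,
) -> List[int]:
    shift = bit_depth - 5  # 32 bands, e.g. 8 sample values per band for 8-bit video
    max_val = (1 << bit_depth) - 1
    out = []
    for s in samples:
        k = ((s >> shift) - band_position) % 32  # band index relative to the start band (wraps)
        if k < len(offsets):
            s = min(max(s + offsets[k], 0), max_val)  # apply the offset and clip
        out.append(s)
    return out
```

With 8-bit samples and `band_position = 12`, only values 96 to 127 are adjusted: `[90, 96, 104, 127, 128]` with offsets `[1, 2, 3, 4]` becomes `[90, 97, 106, 131, 128]`.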
A memory 155 may store a reconstructed block or picture calculated through a filter unit 150 and a stored reconstructed block or picture may be provided to prediction units 120 and 125 when performing inter prediction.
Referring to
When an image bitstream is input from an image encoding device, an input bitstream may be decoded according to a procedure opposite to that of an image encoding device.
An entropy decoding unit 210 may perform entropy decoding according to a procedure opposite to a procedure in which entropy encoding is performed in an entropy encoding unit of an image encoding device. For example, in response to a method performed in an image encoding device, various methods such as Exponential Golomb, CAVLC (Context-Adaptive Variable Length Coding), CABAC (Context-Adaptive Binary Arithmetic Coding) may be applied.
An entropy decoding unit 210 may decode information related to intra prediction and inter prediction performed in an encoding device.
A rearrangement unit 215 may rearrange the bitstream entropy-decoded by the entropy decoding unit 210 based on the rearrangement method used in the encoding device. Coefficients expressed in the form of a one-dimensional vector may be reconstructed and rearranged into coefficients in the form of a two-dimensional block. The rearrangement unit 215 may receive information related to the coefficient scanning performed in the encoding device and perform rearrangement by scanning inversely based on the scanning order used in the encoding device.
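A Python sketch of the inverse scan on the decoder side, assuming the encoder used a zig-zag scan: the same scan order is regenerated and each received coefficient is written back to its two-dimensional position. Other scan orders would be inverted the same way by swapping the order generator. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def zigzag_order(n: int) -> List[Tuple[int, int]]:
    """(row, column) positions of an n x n zig-zag scan, starting at the DC position."""
    order = []
    for s in range(2 * n - 1):
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        order.extend((r, s - r) for r in (reversed(rows) if s % 2 == 0 else rows))
    return order


def inverse_scan(coeffs: Sequence[int], n: int) -> List[List[int]]:
    """Rebuild an n x n block from a 1-D coefficient vector in zig-zag order."""
    if len(coeffs) != n * n:
        raise ValueError("coefficient count does not match the block size")
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(coeffs, zigzag_order(n)):
        block[r][c] = value
    return block
```

Feeding back the zig-zag vector `[0, 1, 3, 6, 4, 2, 5, 7, 8]` restores the 3×3 block `[[0, 1, 2], [3, 4, 5], [6, 7, 8]]`.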
A dequantization unit 220 may perform dequantization based on a quantization parameter provided from an encoding device and a coefficient value of a rearranged block.
An inverse transform unit 225 may apply, to the dequantized result of the quantization performed by the image encoding device, the inverse of the transform (DCT, DST, or KLT) performed by the transform unit, i.e., inverse DCT, inverse DST, or inverse KLT. The inverse transform may be performed based on a transmission unit determined by the image encoding device. In the inverse transform unit 225 of the image decoding device, a transform technique (for example, DCT, DST, or KLT) may be selectively performed according to multiple pieces of information such as the prediction method, the size or shape of the current block, the prediction mode, and the intra prediction direction.
Prediction units 230 and 235 may generate a prediction block based on information related to generation of a prediction block provided from an entropy decoding unit 210 and pre-decoded block or picture information provided from a memory 245.
As described above, when a size of a prediction unit is the same as a size of a transform unit in performing intra prediction in the same manner as an operation in an image encoding device, intra prediction for a prediction unit may be performed based on a pixel at a left position of a prediction unit, a pixel at a top-left position and a pixel at a top position, but when a size of a prediction unit is different from a size of a transform unit in performing intra prediction, intra prediction may be performed by using a reference pixel based on a transform unit. In addition, intra prediction using N×N partitioning may be used only for the smallest coding unit.
Prediction units 230 and 235 may include a prediction unit determination unit, an inter prediction unit and an intra prediction unit. A prediction unit determination unit may receive a variety of information such as prediction unit information, prediction mode information of an intra prediction method, motion prediction-related information of an inter prediction method, etc. which are input from an entropy decoding unit 210, divide a prediction unit in a current coding unit and determine whether a prediction unit performs inter prediction or intra prediction. An inter prediction unit 230 may perform inter prediction for a current prediction unit based on information included in at least one picture of a previous picture or a subsequent picture of a current picture including a current prediction unit by using information necessary for inter prediction in a current prediction unit provided from an image encoding device. Alternatively, inter prediction may be performed based on information on some regions which are pre-reconstructed in a current picture including a current prediction unit.
In order to perform inter prediction, whether a motion prediction method in a prediction unit included in a corresponding coding unit is a skip mode, a merge mode, an AMVP mode, or an intra block copy mode may be determined based on a coding unit.
An intra prediction unit 235 may generate a prediction block based on pixel information in a current picture. When a prediction unit is a prediction unit which performed intra prediction, intra prediction may be performed based on intra prediction mode information in a prediction unit provided from an image encoding device. An intra prediction unit 235 may include an adaptive intra smoothing (AIS) filter, a reference pixel interpolation unit and a DC filter. As a part performing filtering on a reference pixel of a current block, an AIS filter may be applied by determining whether a filter is applied according to a prediction mode in a current prediction unit. AIS filtering may be performed for a reference pixel of a current block by using AIS filter information and a prediction mode in a prediction unit provided from an image encoding device. When a prediction mode of a current block is a mode which does not perform AIS filtering, an AIS filter may not be applied
When the prediction mode of a prediction unit is one in which intra prediction is performed based on pixel values obtained by interpolating reference pixels, a reference pixel interpolation unit may interpolate the reference pixels to generate reference pixels in units of an integer pixel or less (i.e., at fractional positions). When the prediction mode of a current prediction unit generates a prediction block without interpolating reference pixels, the reference pixels may not be interpolated. A DC filter may generate a prediction block through filtering when the prediction mode of a current block is a DC mode.
A reconstructed block or picture may be provided to a filter unit 240. A filter unit 240 may include a deblocking filter, an offset correction unit and an ALF.
Information on whether a deblocking filter was applied to a corresponding block or picture and, when a deblocking filter was applied, information on whether a strong filter or a weak filter was applied may be provided from an image encoding device. Information related to the deblocking filter provided from the image encoding device may be provided to a deblocking filter of an image decoding device, and deblocking filtering for the corresponding block may be performed in the image decoding device.
An offset correction unit may perform offset correction on a reconstructed image based on offset value information, a type of offset correction, etc. applied to an image when performing encoding.
ALF may be applied to a coding unit based on information on whether ALF is applied, ALF coefficient information, etc. provided from an encoding device. Such ALF information may be provided by being included in a specific parameter set.
A memory 245 may store a reconstructed picture or block for use as a reference picture or a reference block and provide a reconstructed picture to an output unit.
As described above, hereinafter, the term “coding unit” is used to refer to a unit of encoding for convenience of description in embodiments of the present disclosure, but it may also refer to a unit in which decoding is performed.
In addition, the current block represents an encoding/decoding target block, and depending on the encoding/decoding stage, the current block may represent a coding tree block (or a coding tree unit), a coding block (or a coding unit), a transform block (or a transform unit), a prediction block (or a prediction unit), or a block to which an in-loop filter is applied. In this specification, “unit” may represent a basic unit for performing a specific encoding/decoding process, and “block” may represent a pixel array of a predetermined size. Unless otherwise specified, “block” and “unit” can be used with the same meaning. For example, in embodiments which will be described later, a coding block and a coding unit may be understood to have equivalent meanings.
Furthermore, a picture including the current block will be referred to as a current picture.
At the time of encoding the current picture, redundant data between pictures can be removed through inter-prediction. Inter-prediction can be performed on a block basis. Specifically, a prediction block for the current block can be generated from a reference picture using motion information of the current block. Here, the motion information may include at least one of a motion vector, a reference picture index, or a prediction direction.
The motion information of the current block can be generated through motion estimation.
In
A search range for motion estimation may be set from the same position as a reference point of the current block in the reference picture. Here, the reference point may be the position of the upper left sample of the current block.
As an example,
After setting reference blocks having the same size as the current block within the search range, a cost for each reference block with respect to the current block may be measured. The cost may be calculated using a similarity between two blocks.
As an example, the cost may be calculated on the basis of the sum of absolute differences (SAD) between original samples in the current block and original samples (or reconstructed samples) in each reference block. The smaller the sum, the lower the cost.
After comparing the costs of the reference blocks, a reference block with the optimal cost can be set as prediction block for the current block.
Additionally, the distance between the current block and the reference block can be set as a motion vector. Specifically, the x-coordinate difference and y-coordinate difference between the current block and the reference block may be set as a motion vector.
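The block-matching procedure above can be sketched as an exhaustive (full) search. This is a minimal illustration, not the disclosed implementation: it assumes integer-sample positions, a square search range of ±`rng` samples around the reference point, and SAD as the cost. The function names `sad`, `get_block` and `full_search` are hypothetical.

```python
from typing import List, Optional, Tuple

Block = List[List[int]]


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def get_block(pic: Block, x: int, y: int, w: int, h: int) -> Block:
    """Return the w x h block whose upper-left sample is at (x, y)."""
    return [row[x:x + w] for row in pic[y:y + h]]


def full_search(cur: Block, ref: Block, x0: int, y0: int,
                rng: int) -> Tuple[Tuple[int, int], Optional[int]]:
    """Search +/-rng samples around the reference point (x0, y0).

    Returns the motion vector (x- and y-coordinate differences) and the SAD
    cost of the reference block with the lowest cost.
    """
    h, w = len(cur), len(cur[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            x, y = x0 + dx, y0 + dy
            # Skip reference blocks that would extend outside the reference picture.
            if x < 0 or y < 0 or x + w > len(ref[0]) or y + h > len(ref):
                continue
            cost = sad(cur, get_block(ref, x, y, w, h))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

For example, if the current block at (2, 2) contains exactly the samples found at (3, 2) in the reference picture, the search returns the motion vector (1, 0) with a cost of 0.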
Furthermore, the index of a picture containing the reference block identified through motion estimation is set as a reference picture index.
Additionally, a prediction direction can be set on the basis of whether the reference picture belongs to an L0 reference picture list or an L1 reference picture list.
Further, motion estimation may be performed for each of an L0 direction and an L1 direction. If prediction is performed in both the L0 direction and the L1 direction, motion information in the L0 direction and motion information in the L1 direction can be generated.
In the case of unidirectional prediction, a prediction block for the current block is generated using one piece of motion information. As an example, the motion information may include an L0 motion vector, an L0 reference picture index, and prediction direction information indicating the L0 direction.
In the case of bidirectional prediction, a prediction block is created using two pieces of motion information. As an example, a reference block in the L0 direction identified based on motion information on the L0 direction (L0 motion information) may be set as an L0 prediction block, and a reference block in the L1 direction identified based on motion information on the L1 direction (L1 motion information) may be set as an L1 prediction block. Thereafter, the L0 prediction block and the L1 prediction block can be subjected to weighted summation to generate a prediction block for the current block.
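The weighted summation of the L0 and L1 prediction blocks can be illustrated with integer arithmetic. This sketch does not reproduce Equation 2 of the disclosure; the weights `w0`, `w1` and the right shift are illustrative assumptions, chosen so that w0 + w1 equals 1 << shift and the default reduces to a rounded average.

```python
from typing import List

Block = List[List[int]]


def weighted_bi_pred(p0: Block, p1: Block, w0: int = 1, w1: int = 1,
                     shift: int = 1) -> Block:
    """Integer weighted sum of an L0 and an L1 prediction block.

    Assumes w0 + w1 == 1 << shift so the result stays in the sample range.
    """
    offset = 1 << (shift - 1) if shift > 0 else 0  # rounding offset
    return [[(w0 * a + w1 * b + offset) >> shift for a, b in zip(r0, r1)]
            for r0, r1 in zip(p0, p1)]
```

With the defaults, samples 10 and 20 combine to (10 + 20 + 1) >> 1 = 15.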
In the examples shown in
However, unlike the examples, the L0 reference picture may be present after the current picture, or the L1 reference picture may be present before the current picture. For example, both the L0 reference picture and the L1 reference picture may be present before the current picture, or both may be present after the current picture. Alternatively, bidirectional prediction may be performed using the L0 reference picture present after the current picture and the L1 reference picture present before the current picture.
Motion information of a block on which inter-prediction has been performed may be stored in a memory. In this case, the motion information may be stored in sample units. Specifically, motion information of a block to which a specific sample belongs may be stored as motion information of the specific sample. The stored motion information can be used to derive motion information of a neighboring block to be encoded/decoded later.
An encoder may encode a residual sample, corresponding to the difference between a sample of the current block (i.e., an original sample) and the prediction sample, and signal it to a decoder together with the motion information necessary to generate the prediction block. The decoder may decode the signaled difference information to derive a difference sample, and add the difference sample to the corresponding prediction sample within the prediction block generated using the motion information to generate a reconstructed sample.
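The residual and reconstruction relationship can be sketched as follows. Transform, quantization and entropy coding of the residual are deliberately omitted, so the round trip here is lossless; clipping the reconstructed sample to the range allowed by an assumed `bit_depth` is also an assumption added for illustration.

```python
from typing import List

Block = List[List[int]]


def residual(orig: Block, pred: Block) -> Block:
    """Encoder side: residual sample = original sample - prediction sample."""
    return [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]


def reconstruct(pred: Block, diff: Block, bit_depth: int = 8) -> Block:
    """Decoder side: prediction sample + difference sample, clipped to the sample range."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + d, 0), max_val) for p, d in zip(rp, rd)]
            for rp, rd in zip(pred, diff)]
```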
Here, in order to effectively compress the motion information signaled to the decoder, one of a plurality of inter-prediction modes may be selected. Here, the plurality of inter-prediction modes may include a motion information merge mode and a motion vector prediction mode.
The motion vector prediction mode is a mode in which the difference value between a motion vector and a motion vector prediction value is encoded and signaled. Here, the motion vector prediction value may be derived based on motion information of neighboring blocks or neighboring samples adjacent to the current block.
For convenience of description, it is assumed that the current block has a size of 4×4.
In the illustrated example, “LB” indicates a sample included in the leftmost column and bottommost row in the current block. “RT” indicates a sample included in the rightmost column and topmost row in the current block. A0 to A4 indicate samples neighboring the left of the current block, and B0 to B5 indicate samples neighboring the top of the current block. As an example, A1 indicates a sample neighboring the left of LB, and B1 indicates a sample neighboring the top of RT.
Col indicates the position of a sample neighboring the bottom right of the current block in a co-located picture. The co-located picture is a different picture from the current picture, and information for identifying the co-located picture can be explicitly encoded and signaled in a bitstream. Alternatively, a reference picture with a predefined reference picture index may be set as a co-located picture.
The motion vector prediction value of the current block may be derived from at least one motion vector prediction candidate included in a motion vector prediction list.
The number of motion vector prediction candidates that can be included in the motion vector prediction list (i.e., the size of the list) may be predefined in the encoder and decoder. As an example, the maximum number of motion vector prediction candidates may be two.
A motion vector stored at the position of a neighboring sample adjacent to the current block or a scaled motion vector derived by scaling the motion vector may be inserted into the motion vector prediction list as a motion vector prediction candidate. At this time, motion vector prediction candidates may be derived by scanning neighboring samples adjacent to the current block in a predefined order.
As an example, it can be checked whether a motion vector is stored at each position in the order from A0 to A4. According to this scan order, the first discovered available motion vector can be inserted into the motion vector prediction list as a motion vector prediction candidate.
As another example, it may be checked whether a motion vector is stored at each position in the order from A0 to A4, and the first discovered motion vector at a position having the same reference picture as the current block may be inserted into the motion vector prediction list as a motion vector prediction candidate. If there is no neighboring sample having the same reference picture as the current block, a motion vector prediction candidate can be derived based on the first discovered available motion vector. Specifically, after scaling the first discovered available motion vector, the scaled motion vector can be inserted into the motion vector prediction list as a motion vector prediction candidate. Here, scaling may be performed on the basis of the output order difference (i.e., POC difference) between the current picture and the reference picture of the current block and the output order difference (i.e., POC difference) between the current picture and the reference picture of the neighboring sample.
Furthermore, it is possible to check whether a motion vector is stored at each position in the order from B0 to B5. According to this scan order, the first discovered available motion vector can be inserted into the motion vector prediction list as a motion vector prediction candidate.
As another example, it is possible to check whether a motion vector is stored at each position in the order from B0 to B5, and the first discovered motion vector at a position having the same reference picture as the current block may be inserted into the motion vector prediction list as a motion vector prediction candidate. If there is no neighboring sample having the same reference picture as the current block, a motion vector prediction candidate can be derived based on the first discovered available motion vector. Specifically, after scaling the first discovered available motion vector, the scaled motion vector can be inserted into the motion vector prediction list as a motion vector prediction candidate. Here, scaling may be performed on the basis of the output order difference (i.e., POC difference) between the current picture and the reference picture of the current block and the output order difference (i.e., POC difference) between the current picture and the reference picture of the neighboring sample.
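The scan-with-fallback derivation described for the A0 to A4 and B0 to B5 positions can be sketched as one function applied to an ordered list of neighboring positions. This is a simplified sketch under stated assumptions: reference pictures are identified by their POC, and scaling uses a floating-point ratio of POC differences rather than any fixed-point rule a real codec might specify. `scale_mv` and `scan_candidate` are hypothetical names.

```python
from typing import Optional, Sequence, Tuple

MV = Tuple[int, int]
# A neighboring position holds (motion vector, POC of its reference picture),
# or None when no motion vector is stored there.
Neighbor = Optional[Tuple[MV, int]]


def scale_mv(mv: MV, cur_poc: int, cur_ref_poc: int, nb_ref_poc: int) -> MV:
    """Scale mv by the ratio of POC differences (simplified floating-point form)."""
    tb = cur_poc - cur_ref_poc  # current picture -> reference picture of the current block
    td = cur_poc - nb_ref_poc   # current picture -> reference picture of the neighbor
    if td == 0:
        return mv
    # Python's round() rounds halfway cases to the nearest even integer.
    return (round(mv[0] * tb / td), round(mv[1] * tb / td))


def scan_candidate(neighbors: Sequence[Neighbor], cur_poc: int,
                   cur_ref_poc: int) -> Optional[MV]:
    """Derive one candidate by scanning neighbors in order (e.g. A0..A4 or B0..B5)."""
    # First pass: the first stored motion vector that uses the same reference picture.
    for nb in neighbors:
        if nb is not None and nb[1] == cur_ref_poc:
            return nb[0]
    # Fallback: scale the first available motion vector.
    for nb in neighbors:
        if nb is not None:
            return scale_mv(nb[0], cur_poc, cur_ref_poc, nb[1])
    return None
```

For instance, a neighbor vector (4, −2) pointing to POC 6, reused by a block in POC 8 whose reference is POC 4, is scaled by (8 − 4) / (8 − 6) = 2 to give (8, −4).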
As in the above-described example, a motion vector prediction candidate can be derived from a sample adjacent to the left of the current block, and a motion vector prediction candidate can be derived from a sample adjacent to the top of the current block.
Here, the motion vector prediction candidate derived from the left sample may be inserted into the motion vector prediction list prior to the motion vector prediction candidate derived from the top sample. In this case, the index assigned to the motion vector prediction candidate derived from the left sample may have a smaller value than the index assigned to the motion vector prediction candidate derived from the top sample.
Alternatively, the motion vector prediction candidate derived from the top sample may be inserted into the motion vector prediction list prior to the motion vector prediction candidate derived from the left sample.
Among motion vector prediction candidates included in the motion vector prediction list, a motion vector prediction candidate with the highest coding efficiency may be set as a motion vector predictor (MVP) of the current block. Additionally, index information indicating the motion vector prediction candidate that is set as the motion vector predictor of the current block among the plurality of motion vector prediction candidates may be encoded and signaled to the decoder. When the number of motion vector prediction candidates is two, the index information may be a 1-bit flag (e.g., MVP flag). Additionally, a motion vector difference (MVD), which is the difference between the motion vector of the current block and the motion vector predictor, can be encoded and signaled to the decoder.
The decoder can construct a motion vector prediction list in the same way as the encoder. Additionally, the decoder may decode index information from a bitstream and select one of a plurality of motion vector prediction candidates on the basis of the decoded index information. The selected motion vector prediction candidate can be set as the motion vector predictor of the current block.
Additionally, the decoder may decode a motion vector difference from the bitstream. Thereafter, the decoder may derive the motion vector of the current block by summing the motion vector predictor and the motion vector difference.
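The encoder and decoder sides of the motion vector prediction mode can be sketched together. In this illustration the left candidate is inserted before the top candidate, the list holds at most two entries, and the encoder picks the candidate with the smallest absolute MVD as a simple stand-in for "highest coding efficiency"; that selection rule, the function names, and the assumption that the decoded index is valid are hypothetical.

```python
from typing import List, Optional, Tuple

MV = Tuple[int, int]


def build_mvp_list(left: Optional[MV], top: Optional[MV], max_size: int = 2) -> List[MV]:
    """Left candidate first (smaller index), then top; unavailable candidates are skipped."""
    return [c for c in (left, top) if c is not None][:max_size]


def choose_mvp(mvp_list: List[MV], mv: MV) -> Tuple[int, MV]:
    """Encoder side: return (MVP index, MVD) for the candidate with the smallest |MVD|."""
    costs = [abs(mv[0] - p[0]) + abs(mv[1] - p[1]) for p in mvp_list]
    idx = costs.index(min(costs))
    mvp = mvp_list[idx]
    return idx, (mv[0] - mvp[0], mv[1] - mvp[1])


def derive_mv(mvp_list: List[MV], mvp_idx: int, mvd: MV) -> MV:
    """Decoder side: motion vector = selected predictor + decoded difference."""
    mvp = mvp_list[mvp_idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```

Because the decoder rebuilds the same list, the signaled index and MVD are enough to recover the encoder's motion vector.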
In a case where bidirectional prediction is applied to the current block, a motion vector prediction list can be generated for each of the L0 direction and L1 direction. That is, a motion vector prediction list may be composed of motion vectors in the same direction. Accordingly, the motion vector of the current block and the motion vector prediction candidates included in the motion vector prediction list have the same direction.
In a case where the motion vector prediction mode is selected, a reference picture index and prediction direction information may be explicitly encoded and signaled to the decoder. As an example, in a case where a plurality of reference pictures is present in a reference picture list and motion estimation is performed on each of the plurality of reference pictures, a reference picture index for identifying a reference picture from which the motion information of the current block is derived among the plurality of reference pictures can be explicitly encoded and signaled to the decoder.
At this time, if the reference picture list includes only one reference picture, encoding/decoding of the reference picture index may be omitted.
Prediction direction information may be an index indicating one of L0 unidirectional prediction, L1 unidirectional prediction, or bidirectional prediction. Alternatively, an L0 flag indicating whether prediction in the L0 direction is performed and an L1 flag indicating whether prediction in the L1 direction is performed may be encoded and signaled.
The motion information merge mode is a mode in which the motion information of the current block is set to be the same as motion information of a neighboring block. In the motion information merge mode, motion information can be encoded/decoded using a motion information merge list.
A motion information merge candidate may be derived based on motion information of a neighboring block or neighboring sample adjacent to the current block. For example, a reference position around the current block may be predefined, and then whether motion information is present at the predefined reference position may be checked. If motion information is present at the predefined reference position, the motion information at the position can be inserted into the motion information merge list as a motion information merge candidate.
In the example of
Among motion information merge candidates included in the motion information merge list, motion information of a motion information merge candidate with the optimal cost can be set as motion information of the current block. Furthermore, index information (e.g., merge index) indicating a motion information merge candidate selected from among the plurality of motion information merge candidates may be encoded and transmitted to the decoder.
In the decoder, a motion information merge list can be constructed in the same way as in the encoder. Then, a motion information merge candidate can be selected on the basis of a merge index decoded from a bitstream. Motion information of the selected motion information merge candidate may be set as motion information of the current block.
Unlike the motion vector prediction list, the motion information merge list is configured as a single list regardless of the prediction direction. That is, motion information merge candidates included in the motion information merge list may have only L0 motion information or L1 motion information, or may have bidirectional motion information (i.e., L0 motion information and L1 motion information).
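A single, direction-independent merge list can be sketched as below. The scan order of reference positions is hypothetical (the figure defining it is not reproduced here), and motion information is modeled as a mapping from direction to (motion vector, reference picture index), so a candidate may carry L0 only, L1 only, or both.

```python
from typing import Dict, List, Sequence, Tuple

MV = Tuple[int, int]
# Motion information: direction ("L0"/"L1") -> (motion vector, reference picture index).
MotionInfo = Dict[str, Tuple[MV, int]]


def build_merge_list(stored: Dict[str, MotionInfo], positions: Sequence[str],
                     max_cands: int) -> List[MotionInfo]:
    """Insert motion information found at predefined reference positions, in order."""
    merge_list: List[MotionInfo] = []
    for pos in positions:
        info = stored.get(pos)
        if info is not None:
            merge_list.append(info)
            if len(merge_list) == max_cands:
                break
    return merge_list


def merge_motion(merge_list: List[MotionInfo], merge_idx: int) -> MotionInfo:
    """Decoder side: the current block copies the selected candidate's motion information."""
    return dict(merge_list[merge_idx])
```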
The motion information of the current block may also be derived using a reconstructed sample area around the current block. Here, the reconstructed sample area used to derive the motion information of the current block may be called a template.
In
As an example, the cost may be calculated on the basis of the sum of absolute differences between reconstructed samples in the current template and reconstructed samples in a reference template. The smaller the sum, the lower the cost.
Once the reference template having the optimal cost with respect to the current template is determined within a search range, the reference block neighboring that reference template can be set as a prediction block for the current block.
Additionally, motion information of the current block can be set on the basis of the distance between the current block and the reference block, the index of the picture to which the reference block belongs, and whether the reference picture is included in the L0 or L1 reference picture list. Since a previously reconstructed area around the current block is defined as a template, the decoder can perform motion estimation in the same manner as the encoder. Accordingly, in a case where motion information is derived using a template, it is not necessary to encode and signal motion information other than information indicating whether the template is used.
The current template may include at least one of an area adjacent to the top of the current block or an area adjacent to the left side of the current block. Here, the area adjacent to the top may include at least one row, and the area adjacent to the left side may include at least one column.
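Template-based motion estimation can be sketched with a template made of `rows` rows above and `cols` columns to the left of the block. This is a minimal sketch, assuming integer-sample search, SAD as the cost, and a current template that lies fully inside the current picture; the function names are hypothetical. Only previously reconstructed samples are compared, which is why a decoder can repeat the same search.

```python
from typing import List, Optional, Tuple

Picture = List[List[int]]


def template_samples(pic: Picture, x: int, y: int, w: int, h: int,
                     rows: int = 1, cols: int = 1) -> List[int]:
    """Samples in `rows` rows above and `cols` columns left of the w x h block at (x, y)."""
    samples: List[int] = []
    for r in range(y - rows, y):        # area adjacent to the top
        samples.extend(pic[r][x:x + w])
    for r in range(y, y + h):           # area adjacent to the left side
        samples.extend(pic[r][x - cols:x])
    return samples


def template_match(cur_pic: Picture, ref_pic: Picture, x0: int, y0: int, w: int, h: int,
                   rng: int, rows: int = 1,
                   cols: int = 1) -> Tuple[Tuple[int, int], Optional[int]]:
    """Return the displacement whose reference template best matches the current template."""
    cur_t = template_samples(cur_pic, x0, y0, w, h, rows, cols)
    best_mv, best_cost = (0, 0), None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            x, y = x0 + dx, y0 + dy
            # Both the reference template and the reference block must lie inside the picture.
            if x - cols < 0 or y - rows < 0 or x + w > len(ref_pic[0]) or y + h > len(ref_pic):
                continue
            ref_t = template_samples(ref_pic, x, y, w, h, rows, cols)
            cost = sum(abs(a - b) for a, b in zip(cur_t, ref_t))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

The reference block at (x0 + dx, y0 + dy), adjacent to the best reference template, would then serve as the prediction block.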
A current template may be constructed according to one of the examples shown in
Alternatively, unlike the examples shown in
The size and/or shape of the current template may be predefined in the encoder and the decoder.
Alternatively, a plurality of template candidates having different sizes and/or shapes may be predefined, and then index information for identifying one of the plurality of template candidates may be encoded and signaled to the decoder.
Alternatively, one of a plurality of template candidates may be adaptively selected on the basis of at least one of the size, shape, or position of the current block. For example, if the current block comes into contact with the upper boundary of a CTU, the current template can be constructed using only the area adjacent to the left side of the current block.
Template-based motion estimation can be performed on each reference picture stored in the reference picture list. Alternatively, motion estimation may be performed on only some reference pictures. As an example, motion estimation may be performed only on a reference picture with a reference picture index of 0, or motion estimation may be performed only on reference pictures having reference picture indices less than a threshold value or reference pictures whose POC differences from the current picture are less than a threshold value.
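Restricting the search to some reference pictures can be expressed as a filter over the reference picture list. In this sketch each threshold is optional: passing only `max_idx` keeps pictures with index below the threshold, and passing only `max_poc_diff` keeps pictures whose POC distance from the current picture is below it. Applying both at once, as the code allows, is an assumption rather than something the text specifies.

```python
from typing import List, Optional


def select_ref_pics(ref_pocs: List[int], cur_poc: int, max_idx: Optional[int] = None,
                    max_poc_diff: Optional[int] = None) -> List[int]:
    """Return indices of the reference pictures on which template-based ME is run."""
    selected = []
    for idx, poc in enumerate(ref_pocs):
        if max_idx is not None and idx >= max_idx:
            continue  # reference picture index not below the threshold
        if max_poc_diff is not None and abs(cur_poc - poc) >= max_poc_diff:
            continue  # POC difference not below the threshold
        selected.append(idx)
    return selected
```

Setting `max_idx=1` reproduces the "reference picture index 0 only" case.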
Alternatively, a reference picture index may be explicitly encoded and signaled, and then motion estimation may be performed only on the reference picture indicated by the reference picture index.
Alternatively, motion estimation may be performed on a reference picture of a neighboring block corresponding to the current template. For example, if the template is composed of a left neighboring area and a top neighboring area, at least one reference picture can be selected using at least one of the reference picture index of the left neighboring block or the reference picture index of the top neighboring block. Thereafter, motion estimation can be performed on at least one selected reference picture.
Information indicating whether template-based motion estimation has been applied may be encoded and signaled to the decoder. The information may be a 1-bit flag. For example, if the flag is true (1), it indicates that template-based motion estimation has been applied to the L0 direction and L1 direction of the current block. On the other hand, if the flag is false (0), it indicates that template-based motion estimation has not been applied. In this case, motion information of the current block may be derived on the basis of the motion information merge mode or the motion vector prediction mode.
On the other hand, template-based motion estimation can be applied only in a case where it is determined that the motion information merge mode and motion vector prediction mode have not been applied to the current block. For example, when a first flag indicating whether the motion information merge mode has been applied and a second flag indicating whether the motion vector prediction mode has been applied are both 0, template-based motion estimation can be performed.
For each of the L0 direction and the L1 direction, information indicating whether template-based motion estimation has been applied may be signaled. That is, whether template-based motion estimation is applied to the L0 direction and whether template-based motion estimation is applied to the L1 direction may be determined independently of each other. Accordingly, template-based motion estimation can be applied to one of the L0 direction and the L1 direction, whereas another mode (e.g., the motion information merge mode or the motion vector prediction mode) can be applied to the other direction.
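Two of the signaling variants above can be sketched as parsing logic. `read_flag` stands in for entropy decoding of one 1-bit flag, and all flag names are hypothetical. The first function follows the variant in which template-based motion estimation is used only when the merge flag and the motion vector prediction flag are both 0; the second follows the per-direction variant.

```python
from typing import Callable, Dict

# read_flag(name) stands in for decoding one 1-bit flag from the bitstream (hypothetical).
ReadFlag = Callable[[str], int]


def motion_tool(read_flag: ReadFlag) -> str:
    """Template-based ME is selected only when both mode flags are 0."""
    if read_flag("merge_flag"):
        return "merge"
    if read_flag("mvp_flag"):
        return "mvp"
    return "template"


def motion_tool_per_direction(read_flag: ReadFlag) -> Dict[str, str]:
    """An independent template flag for each of the L0 and L1 directions."""
    return {d: ("template" if read_flag("tm_flag_" + d) else "merge_or_mvp")
            for d in ("L0", "L1")}
```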
When template-based motion estimation is applied to both the L0 direction and the L1 direction, a prediction block for the current block may be generated on the basis of a weighted sum operation of an L0 prediction block and an L1 prediction block. Alternatively, even when template-based motion estimation is applied to one of the L0 direction and the L1 direction but another mode is applied to the other, a prediction block for the current block may be generated on the basis of a weighted sum operation of the L0 prediction block and the L1 prediction block. This will be described later using Equation 2.
Alternatively, a template-based motion estimation method may be inserted as a motion information merge candidate in the motion information merge mode or a motion vector prediction candidate in the motion vector prediction mode. In this case, whether to apply the template-based motion estimation method may be determined on the basis of whether a selected motion information merge candidate or a selected motion vector prediction candidate indicates the template-based motion estimation method.
Motion information of the current block may also be generated on the basis of a bidirectional matching method.
The bidirectional matching method can be performed only when the temporal order (i.e., POC) of the current picture is present between the temporal order of an L0 reference picture and the temporal order of an L1 reference picture.
When the bidirectional matching method is applied, a search range can be set for each of the L0 reference picture and the L1 reference picture. At this time, the L0 reference picture index for identifying the L0 reference picture and the L1 reference picture index for identifying the L1 reference picture may be encoded and signaled.
As another example, only the L0 reference picture index may be encoded and signaled, and an L1 reference picture may be selected on the basis of the distance between the current picture and the L0 reference picture (hereinafter referred to as an L0 POC difference). As an example, among L1 reference pictures included in the L1 reference picture list, an L1 reference picture for which the absolute value of the distance from the current picture (hereinafter referred to as an L1 POC difference) is the same as the absolute value of the distance between the current picture and the L0 reference picture may be selected. If there is no L1 reference picture having the same L1 POC difference as the L0 POC difference, an L1 reference picture having an L1 POC difference most similar to the L0 POC difference may be selected from among the L1 reference pictures.
Here, among the L1 reference pictures, only an L1 reference picture having a different temporal direction from the L0 reference picture can be used for bidirectional matching. For example, if the POC of the L0 reference picture is smaller than that of the current picture, one of L1 reference pictures having POCs greater than that of the current picture can be selected.
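The L1 reference picture selection just described can be sketched as follows, with pictures identified by POC. Only L1 pictures on the opposite temporal side of the current picture are considered, and the one whose POC difference is closest to the L0 POC difference is chosen. Breaking ties in favor of the smaller index is an assumption.

```python
from typing import List, Optional


def select_l1_ref(l1_pocs: List[int], cur_poc: int, l0_poc: int) -> Optional[int]:
    """Return the L1 reference picture index chosen for bidirectional matching, or None."""
    l0_diff = abs(cur_poc - l0_poc)
    # Only L1 pictures on the opposite temporal side of the current picture qualify.
    cands = [(i, poc) for i, poc in enumerate(l1_pocs)
             if (poc - cur_poc) * (l0_poc - cur_poc) < 0]
    if not cands:
        return None
    # An exact match gives a difference of 0, so min() covers both the "same POC
    # difference" rule and the "most similar" fallback; ties keep the smaller index.
    return min(cands, key=lambda c: abs(abs(cur_poc - c[1]) - l0_diff))[0]
```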
On the other hand, only the L1 reference picture index may be encoded and signaled, and an L0 reference picture may be selected on the basis of the distance between the current picture and the L1 reference picture.
Alternatively, the bidirectional matching method may be performed using an L0 reference picture closest in distance to the current picture among L0 reference pictures and an L1 reference picture closest in distance to the current picture among L1 reference pictures.
Alternatively, the bidirectional matching method may also be performed using an L0 reference picture to which a predefined index (e.g., index 0) is assigned in the L0 reference picture list and an L1 reference picture to which a predefined index (e.g., index 0) is assigned in the L1 reference picture list.
Alternatively, an LX reference picture (X being 0 or 1) may be selected based on an explicitly signaled reference picture index, and either the reference picture closest to the current picture among the L|X−1| reference pictures or the reference picture having a predefined index in the L|X−1| reference picture list may be selected as the L|X−1| reference picture.
As another example, an L0 reference picture and/or an L1 reference picture may be selected on the basis of motion information of a neighboring block of the current block. As an example, an L0 reference picture and/or an L1 reference picture to be used for bidirectional matching may be selected using the reference picture index of the left or top neighboring block of the current block.
A search range may be set within a predetermined range from a co-located block in a reference picture.
As another example, the search range may be set on the basis of initial motion information. The initial motion information may be derived from a neighboring block of the current block. For example, motion information of the left neighboring block or the top neighboring block of the current block may be set as the initial motion information of the current block.
In a case where the bidirectional matching method is applied, an L0 motion vector and an L1 motion vector are set in opposite directions; that is, the L0 motion vector and the L1 motion vector have opposite signs. In addition, the magnitude of an LX motion vector may be proportional to the distance (i.e., POC difference) between the current picture and the LX reference picture.
Thereafter, motion estimation can be performed using a cost between a reference block within the search range of L0 reference pictures (hereinafter referred to as an L0 reference block) and a reference block within the search range of L1 reference pictures (hereinafter referred to as an L1 reference block).
If an L0 reference block whose displacement from the current block is (x, y) is selected, an L1 reference block located at a displacement of (−Dx, −Dy) from the current block can be selected. Here, D can be determined as the ratio of the distance between the L1 reference picture and the current picture to the distance between the current picture and the L0 reference picture.
For example, in the example shown in
Upon selection of an L0 reference block and an L1 reference block with an optimal cost, the L0 reference block and the L1 reference block can be set as an L0 prediction block and an L1 prediction block for the current block. Thereafter, the final prediction block for the current block can be generated through a weighted sum operation of the L0 reference block and the L1 reference block. As an example, a prediction block for the current block may be generated according to Equation 2 which will be described later.
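The bidirectional matching search can be sketched as follows. For each candidate L0 displacement, the L1 displacement is mirrored and scaled by D = d1 / d0, where d0 and d1 are the POC distances from the current picture to the L0 and L1 reference pictures, following the proportionality between motion vector magnitude and POC distance. The cost compares the two reference blocks with each other, not with the current block, so no current-block samples are needed. Rounding scaled vectors to integers and averaging the two blocks with equal weights are simplifying assumptions (the disclosure refers to Equation 2 for the weighted sum); all names are hypothetical.

```python
from typing import List, Optional, Tuple

Block = List[List[int]]
MV = Tuple[int, int]


def get_block(pic: Block, x: int, y: int, w: int, h: int) -> Block:
    return [row[x:x + w] for row in pic[y:y + h]]


def inside(pic: Block, x: int, y: int, w: int, h: int) -> bool:
    return x >= 0 and y >= 0 and x + w <= len(pic[0]) and y + h <= len(pic)


def mirrored_mv(mv0: MV, d0: int, d1: int) -> MV:
    """L1 MV = -D * L0 MV with D = d1 / d0, rounded to integer samples."""
    return (round(-mv0[0] * d1 / d0), round(-mv0[1] * d1 / d0))


def bilateral_search(ref0: Block, ref1: Block, x0: int, y0: int, w: int, h: int,
                     rng: int, d0: int, d1: int) -> Optional[Tuple[MV, MV, Block]]:
    best = None  # (cost, mv0, mv1, block0, block1)
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            mv1 = mirrored_mv((dx, dy), d0, d1)
            xa, ya = x0 + dx, y0 + dy
            xb, yb = x0 + mv1[0], y0 + mv1[1]
            if not (inside(ref0, xa, ya, w, h) and inside(ref1, xb, yb, w, h)):
                continue
            b0 = get_block(ref0, xa, ya, w, h)
            b1 = get_block(ref1, xb, yb, w, h)
            # SAD between the L0 reference block and the L1 reference block.
            cost = sum(abs(p - q) for r0, r1 in zip(b0, b1) for p, q in zip(r0, r1))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy), mv1, b0, b1)
    if best is None:
        return None
    _, mv0, mv1, b0, b1 = best
    # Final prediction: rounded average of the two reference blocks (equal weights assumed).
    pred = [[(p + q + 1) >> 1 for p, q in zip(r0, r1)] for r0, r1 in zip(b0, b1)]
    return mv0, mv1, pred
```

With d0 = 4 and d1 = 8, an L0 vector of (2, −1) is mirrored to the L1 vector (−4, 2).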
When the bidirectional matching method is applied, the decoder can perform motion estimation in the same way as the encoder. Accordingly, information indicating whether the bidirectional matching method is applied is explicitly encoded/decoded, while encoding/decoding of motion information such as motion vectors can be omitted. As described above, at least one of the L0 reference picture index or the L1 reference picture index may be explicitly encoded/decoded.
As another example, information indicating whether the bidirectional matching method has been applied may be explicitly encoded/decoded, and when the bidirectional matching method has been applied, the L0 motion vector or the L1 motion vector may be explicitly encoded and signaled. In a case where the L0 motion vector has been signaled, the L1 motion vector can be derived on the basis of the POC difference between the current picture and the L0 reference picture and the POC difference between the current picture and the L1 reference picture. In a case where the L1 motion vector has been signaled, the L0 motion vector can be derived on the basis of the POC difference between the current picture and the L0 reference picture and the POC difference between the current picture and the L1 reference picture. At this time, the encoder can explicitly encode whichever of the L0 motion vector and the L1 motion vector has the smaller magnitude.
The information indicating whether the bidirectional matching method has been applied may be a 1-bit flag. As an example, if the flag is true (e.g., 1), it can indicate that the bidirectional matching method has been applied to the current block. If the flag is false (e.g., 0), it can indicate that the bidirectional matching method has not been applied to the current block. In this case, the motion information merge mode or the motion vector prediction mode may be applied to the current block.
On the other hand, the bidirectional matching method can be applied only in a case where it is determined that the motion information merge mode and the motion vector prediction mode are not applied to the current block. For example, when the first flag indicating whether the motion information merge mode is applied and the second flag indicating whether the motion vector prediction mode is applied are both 0, the bidirectional matching method can be applied.
Alternatively, the bidirectional matching method may be inserted as a motion information merge candidate in the motion information merge mode or a motion vector prediction candidate in the motion vector prediction mode. In this case, whether to apply the bidirectional matching method may be determined on the basis of whether a selected motion information merge candidate or a selected motion vector prediction candidate indicates the bidirectional matching method.
The bidirectional matching method described above requires the temporal order of the current picture to lie between the temporal order of the L0 reference picture and the temporal order of the L1 reference picture. A prediction block for the current block can also be generated using a unidirectional matching method, to which this constraint does not apply. Specifically, the unidirectional matching method can use two reference pictures whose temporal order (i.e., POC) is smaller than that of the current picture, or two reference pictures whose temporal order is greater than that of the current picture. Here, both reference pictures may be derived from the L0 reference picture list, or both may be derived from the L1 reference picture list. Alternatively, one of the two reference pictures may be derived from the L0 reference picture list, and the other may be derived from the L1 reference picture list.
The unidirectional matching method can be performed based on two reference pictures (i.e., forward reference pictures) having a POC smaller than that of the current picture or two reference pictures (i.e., backward reference pictures) having a POC larger than that of the current picture.
Here, the first reference picture index for identifying the first reference picture and the second reference picture index for identifying the second reference picture may be encoded and signaled. Among the two reference pictures used for the unidirectional matching method, a reference picture having a smaller POC difference from the current picture can be set as the first reference picture. Accordingly, when the first reference picture is selected, only a reference picture having a larger POC difference from the current picture than the first reference picture can be set as the second reference picture. The second reference picture index can be set such that it indicates one of rearranged reference pictures that have the same temporal direction as the first reference picture and have larger POC differences from the current picture than the first reference picture.
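The constraint on the second reference picture index can be illustrated with a small helper. It is a sketch under stated assumptions: the function name is hypothetical, reference pictures are identified by their POCs, and the rearranged order is assumed to be increasing POC distance from the current picture.

```python
def second_ref_candidates(ref_pocs, poc_cur, poc_first):
    """List the POCs that the second reference picture index may point to.

    Hypothetical helper: keeps pictures in the same temporal direction as the
    first reference picture but farther from the current picture, ordered by
    increasing POC distance (the rearrangement order is an assumption).
    """
    d_first = poc_first - poc_cur
    candidates = [
        poc for poc in ref_pocs
        if (poc - poc_cur) * d_first > 0 and abs(poc - poc_cur) > abs(d_first)
    ]
    return sorted(candidates, key=lambda poc: abs(poc - poc_cur))


# Current POC 8; the first reference picture (POC 6) is the closer forward picture.
cands = second_ref_candidates([6, 4, 2, 10, 12], poc_cur=8, poc_first=6)
second_ref_idx = 1  # hypothetical parsed index value
print(cands[second_ref_idx])
```

Because the first reference picture is the closer one, the second index only needs to distinguish among the farther pictures in the same direction, which can shorten its codeword.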
On the other hand, a reference picture having a larger POC difference from the current picture among the two reference pictures may be set as the first reference picture. In this case, the second reference picture index can be set such that it indicates one of rearranged reference pictures that have the same temporal direction as the first reference picture and have smaller POC differences from the current picture than the first reference picture.
Alternatively, the unidirectional matching method may be performed using a reference picture to which a predefined index in the reference picture list is assigned and a reference picture having the same temporal direction as this reference picture. As an example, a reference picture with an index of 0 in the reference picture list may be set as the first reference picture, and a reference picture with the smallest index among reference pictures having the same temporal direction as the first reference picture in the reference picture list may be selected as the second reference picture.
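A sketch of this default pair selection follows, with reference pictures represented by their POCs in list order. This representation is an illustrative simplification, and `default_unidirectional_pair` is a hypothetical name.

```python
def default_unidirectional_pair(ref_pocs, poc_cur):
    """Pick the default reference pair for unidirectional matching.

    Index 0 gives the first reference picture; the lowest-index picture in the
    same temporal direction gives the second. Returns None for the second
    picture if no such picture exists.
    """
    poc_first = ref_pocs[0]
    for poc in ref_pocs[1:]:
        if (poc - poc_cur) * (poc_first - poc_cur) > 0:
            return poc_first, poc
    return poc_first, None


print(default_unidirectional_pair([6, 10, 4, 2], poc_cur=8))
```

In this list, POC 10 is skipped because it lies in the opposite temporal direction, so POC 4 is chosen as the second reference picture.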
Both the first reference picture and the second reference picture can be selected from the L0 reference picture list or the L1 reference picture list.
Information indicating whether the first reference picture and/or the second reference picture belong to the L0 reference picture list or the L1 reference picture list may be additionally encoded/decoded.
Alternatively, unidirectional matching may be performed using whichever of the L0 reference picture list and the L1 reference picture list is set as the default. Alternatively, the two reference pictures may be selected from whichever of the L0 and L1 reference picture lists contains more reference pictures.
Thereafter, search ranges within the first reference picture and the second reference picture can be set.
The search ranges can be set within a predetermined range from the co-located block in the reference pictures.
As another example, the search ranges can be set on the basis of initial motion information. The initial motion information may be derived from a neighboring block of the current block. For example, motion information of the left neighboring block or the top neighboring block of the current block may be set as the initial motion information of the current block.
Thereafter, motion estimation can be performed using the cost between the first reference block within the search range of the first reference picture and the second reference block within the search range of the second reference picture.
In the unidirectional matching method, the magnitude of a motion vector needs to increase in proportion to the distance between the current picture and the reference picture. Specifically, when a first reference block displaced by the vector (x, y) from the current block is selected, the second reference block needs to be displaced from the current block by (Dx, Dy). Here, D can be determined as the ratio of the distance between the current picture and the second reference picture to the distance between the current picture and the first reference picture.
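The proportional-scaling constraint can be shown with a small integer full search. This is a simplified sketch, not the disclosed search: it assumes SAD as the cost, integer motion vectors only (candidates whose scaled vector would be fractional are skipped), pictures stored as lists of rows, and hypothetical helper names.

```python
def block_at(pic, x0, y0, w, h):
    """Return the w x h block whose top-left sample is (x0, y0)."""
    return [row[x0:x0 + w] for row in pic[y0:y0 + h]]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def unidirectional_match(ref1, ref2, x0, y0, w, h, d1, d2, search):
    """Search integer MVs in ref1; the ref2 MV is the ref1 MV scaled by D = d2 / d1.

    d1 and d2 are signed POC distances (reference POC minus current POC) of the
    first and second reference pictures; they share a sign in unidirectional matching.
    """
    height, width = len(ref1), len(ref1[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if (dx * d2) % d1 or (dy * d2) % d1:
                continue  # scaled vector would point to a fractional position
            sx, sy = dx * d2 // d1, dy * d2 // d1
            positions = [(x0 + dx, y0 + dy), (x0 + sx, y0 + sy)]
            if any(x < 0 or y < 0 or x + w > width or y + h > height for x, y in positions):
                continue  # keep both blocks inside the pictures
            cost = sad(block_at(ref1, x0 + dx, y0 + dy, w, h),
                       block_at(ref2, x0 + sx, y0 + sy, w, h))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy), (sx, sy))
    return best


def g(x, y):
    """Synthetic sample pattern used only for this example."""
    return x * x + 7 * y


ref1 = [[g(x - 1, y) for x in range(12)] for y in range(12)]
ref2 = [[g(x - 3, y) for x in range(12)] for y in range(12)]
print(unidirectional_match(ref1, ref2, 4, 4, 2, 2, d1=-2, d2=-6, search=1))
```

Here the second reference picture is three times as far from the current picture as the first (d1 = -2, d2 = -6), so the search returns the zero-cost pair with vector (1, 0) in the first picture and (3, 0) in the second.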
For example, in the example of
When the first reference block and the second reference block with the optimal cost are selected, the first reference block and the second reference block can be set as first and second prediction blocks for the current block. Thereafter, the final prediction block for the current block can be generated through a weighted sum operation of the first prediction block and the second prediction block. As an example, a prediction block for the current block can be generated according to Equation 2 which will be described later.
In a case where the unidirectional matching method is applied, the decoder can perform motion estimation in the same way as the encoder. Accordingly, information indicating whether the unidirectional matching method is applied is explicitly encoded/decoded, whereas the encoding/decoding of motion information such as motion vectors can be omitted. As described above, at least one of the first reference picture index or the second reference picture index may be explicitly encoded/decoded.
As another example, information indicating whether the unidirectional matching method has been applied may be explicitly encoded/decoded, and in a case where the unidirectional matching method has been applied, the first motion vector or the second motion vector may be explicitly encoded and signaled. In a case where the first motion vector has been signaled, the second motion vector may be derived on the basis of the POC difference between the current picture and the first reference picture and the POC difference between the current picture and the second reference picture. In a case where the second motion vector has been signaled, the first motion vector may be derived on the basis of the POC difference between the current picture and the first reference picture and the POC difference between the current picture and the second reference picture. At this time, the encoder can explicitly encode the smaller one of the first motion vector and the second motion vector.
The information indicating whether the unidirectional matching method has been applied may be a 1-bit flag. As an example, if the flag is true (e.g., 1), it can indicate that the unidirectional matching method is applied to the current block. If the flag is false (e.g., 0), it can indicate that the unidirectional matching method is not applied to the current block. In this case, the motion information merge mode or the motion vector prediction mode can be applied to the current block.
On the other hand, the unidirectional matching method may be applied only in a case where it is determined that the motion information merge mode and the motion vector prediction mode are not applied to the current block. For example, when the first flag indicating whether the motion information merge mode is applied and the second flag indicating whether the motion vector prediction mode is applied are both 0, the unidirectional matching method can be applied.
Alternatively, the unidirectional matching method may be inserted as a motion information merge candidate in motion information merge mode or a motion vector prediction candidate in motion vector prediction mode. In this case, whether to apply the unidirectional matching method may be determined based on whether the selected motion information merge candidate or the selected motion vector prediction candidate indicates the unidirectional matching method.
Intra-prediction is a method of obtaining a prediction block for the current block using reference samples having spatial similarity to the current block. Reference samples used for intra-prediction may be reconstructed samples. As an example, a previously reconstructed sample around the current block may be set as a reference sample. Alternatively, in a case where it is determined that a reconstructed sample at a specific position is unavailable, an adjacent reconstructed sample may be set as the reference sample at that position.
Unlike what has been described, an original sample may also be set as a reference sample.
As in the above-mentioned examples, a method in which the decoder performs motion estimation in the same manner as the encoder, that is, at least one of the template-based motion estimation method, the bidirectional matching method, or the unidirectional matching method, may be defined as an inter-prediction mode. Here, in a case where a plurality of decoder-side motion estimation methods are defined as inter-prediction modes, an index indicating one of the plurality of decoder-side motion estimation methods may be encoded and signaled along with a flag indicating whether a decoder-side motion estimation method is applied. As an example, an index indicating one of the template-based motion estimation method, the bidirectional matching method, or the unidirectional matching method may be encoded and signaled.
Intra-prediction may be performed based on at least one of a plurality of intra-prediction modes predefined in the encoder and decoder.
The intra-prediction modes predefined in the encoder and decoder may include non-directional intra-prediction modes and directional intra-prediction modes. For example, in the example shown in
More or fewer intra-prediction modes than illustrated may be predefined in the encoder and decoder.
One of the predefined intra-prediction modes can be selected, and a prediction block for the current block can be obtained based on the selected intra-prediction mode. At this time, the number and positions of reference samples used to generate prediction samples within the prediction block may be adaptively determined according to the selected intra-prediction mode.
In the example shown in
P1 represents a prediction sample in the horizontal direction, and P2 represents a prediction sample in the vertical direction. P1 can be generated by linearly interpolating a reference sample having the same y coordinate as P1 (i.e., a reference sample located in the horizontal direction of P1) and the reference sample T. P2 can be generated by linearly interpolating the reference sample L and a reference sample having the same x coordinate as P2 (i.e., a reference sample located in the vertical direction of P2).
Thereafter, the final prediction sample can be obtained through a weighted sum operation of the horizontal prediction sample P1 and the vertical prediction sample P2. Equation 1 represents an example of generating the final prediction sample.
In equation 1, α indicates a weight assigned to the horizontal prediction sample P1 and β indicates a weight assigned to the vertical prediction sample P2. The weights α and β can be determined based on the width and height of the current block. Depending on the width and height of the current block, the weights α and β may have the same value or different values. For example, if one side of the block is longer than the other side, the weight assigned to the prediction sample in the direction parallel to the long side can be set to a larger value. Alternatively, the weight assigned to the prediction sample in the direction parallel to the long side may be set to a smaller value.
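The following sketch shows one way the two interpolated samples and the shape-dependent weights could be combined. Equation 1 itself is not reproduced here; the interpolation formulas, the positions of T (assumed top-right) and L (assumed bottom-left), and the choice α = W, β = H (normalized by their sum) are assumptions made only to illustrate giving the larger weight to the prediction parallel to the longer side.

```python
from fractions import Fraction


def planar_like_prediction(top, left, T, L, W, H):
    """Blend a horizontal interpolation P1 and a vertical interpolation P2 per sample.

    Assumptions for illustration: P1 interpolates left[y] and T, P2 interpolates
    top[x] and L with distance-based weights, and alpha = W, beta = H so that the
    prediction parallel to the longer side of the block receives more weight.
    """
    alpha, beta = W, H
    pred = []
    for y in range(H):
        row = []
        for x in range(W):
            p1 = Fraction((W - 1 - x) * left[y] + (x + 1) * T, W)  # horizontal sample
            p2 = Fraction((H - 1 - y) * top[x] + (y + 1) * L, H)   # vertical sample
            row.append((alpha * p1 + beta * p2) / (alpha + beta))
        pred.append(row)
    return pred


print(planar_like_prediction(top=[0, 0], left=[0, 0], T=8, L=8, W=2, H=2))
```

Exact fractions are used so that the blend is easy to check by hand; an integer implementation would instead use shifts and rounding offsets.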
In the DC mode, the average value of reference samples surrounding the current block can be calculated.
Depending on the shape of the current block, the average value may be calculated using only the upper reference samples or only the left reference samples. For example, if the width of the current block is greater than the height, or if the ratio between the width and height of the current block is equal to or greater than (or less than) a predefined value, the average value can be calculated using only the upper reference samples.
On the other hand, if the width of the current block is smaller than the height, or if the ratio between the width and the height of the current block is less than (or greater than) a predefined value, the average value can be calculated using only the left reference samples.
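A minimal sketch of the shape-dependent DC average follows. For simplicity it compares the width and height directly rather than using a predefined ratio threshold, and the function name is hypothetical.

```python
def dc_value(top, left, W, H):
    """DC value using only the reference samples along the longer side.

    Simplification: compares W and H directly instead of a predefined
    width/height ratio threshold; square blocks use both sides.
    """
    if W > H:
        samples = top[:W]
    elif W < H:
        samples = left[:H]
    else:
        samples = top[:W] + left[:H]
    n = len(samples)
    return (sum(samples) + n // 2) // n  # rounded integer average


print(dc_value([10, 10, 10, 10], [0, 0], W=4, H=2))
```

For the wide 4x2 block above, only the upper samples contribute, so the left samples (0, 0) do not pull the average down.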
In a case where the directional intra-prediction mode is applied to the current block, each sample position in the current block can be projected onto the reference samples along the angle of the directional intra-prediction mode.
If a reference sample is present at the projected position (that is, if the projected position is an integer position of a reference sample), the reference sample at the corresponding position can be set as a prediction sample.
On the other hand, if there is no reference sample at the projected position (i.e., if the projected position is a fractional position of a reference sample), reference samples around the projected position can be interpolated and the interpolated value can be set as a prediction sample.
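The integer and fractional cases can be sketched for a vertical-ish mode that uses only the top reference row. The conventions here (angles in 1/32-sample units, a two-tap linear filter, ref[0] as the top-left corner sample) are assumptions in the style of existing codecs, not taken from this disclosure; practical codecs typically use longer interpolation filters.

```python
def angular_prediction(ref, W, H, angle):
    """Predict a W x H block from the top reference row along a directional mode.

    Assumed conventions: ref[0] is the top-left corner sample and ref[1 + x] lies
    above column x; `angle` is the horizontal displacement per row in 1/32-sample
    units (0 = purely vertical, angle >= 0 only).
    """
    pred = []
    for y in range(H):
        pos = (y + 1) * angle           # projected displacement in 1/32 samples
        idx, frac = pos >> 5, pos & 31  # integer part and fractional part
        row = []
        for x in range(W):
            a = ref[x + idx + 1]
            if frac == 0:
                row.append(a)           # projection hits an integer position
            else:
                b = ref[x + idx + 2]    # two-tap linear interpolation
                row.append(((32 - frac) * a + frac * b + 16) >> 5)
        pred.append(row)
    return pred


ref = [0, 10, 20, 30, 40, 50, 60, 70, 80]
print(angular_prediction(ref, W=2, H=2, angle=16))
```

With angle 16, the first row projects halfway between reference samples and is interpolated, while the second row lands on integer positions and copies the reference samples directly.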
For example, in the example shown in
On the other hand, when projection based on the angle of the directional intra-prediction mode is performed at the position of a sample A in the current block, there is no reference sample at the projected position. In this case, integer position reference samples present around the projected position can be interpolated, and the interpolated value can be set as a prediction sample for the position of the sample A. Here, the value generated by interpolating integer position reference samples may be called a fractional position reference sample (r in
As described above, a prediction block for the current block may be generated through inter-prediction or intra-prediction. Here, inter-prediction may be performed based on at least one of a plurality of inter-prediction modes, and the plurality of inter-prediction modes include at least one of the motion information merge mode, the motion vector prediction mode, the template-based motion estimation method, the bidirectional matching method, or the unidirectional matching method.
In the following embodiments, among the inter-prediction modes, an inter-prediction mode (i.e., the template-based motion estimation method, bidirectional matching method and/or unidirectional matching method) in which the decoder performs motion estimation in the same manner as the encoder to generate a prediction block will be referred to as a decoder-side motion estimation mode for convenience of description. In addition, among the inter-prediction modes, an inter-prediction mode (i.e., the motion information merge mode and/or motion vector prediction mode) in which information generated by motion estimation in the encoder is explicitly encoded and signaled will be referred to as a motion information signaling mode.
According to an embodiment of the present disclosure, a prediction block for the current block can be generated by combining two or more inter-prediction methods. As an example, a plurality of prediction blocks may be generated on the basis of the plurality of inter-prediction methods, and then a final prediction block for the current block may be generated on the basis of the plurality of prediction blocks.
The above inter-prediction method may be called a combined prediction method.
For convenience of description, it is assumed in the examples described below that the number of prediction blocks used to generate the final prediction block is two.
Information indicating whether a combined prediction method is applied may be explicitly encoded and signaled. The information may be a 1-bit flag.
At the time of combining two different inter-prediction modes, information for identifying each of the two inter-prediction modes may be additionally encoded/decoded. As an example, two or more of a flag indicating whether template-based motion estimation is applied, a flag indicating whether the bidirectional matching method is applied, a flag indicating whether the unidirectional matching method is applied, and a flag indicating whether a motion information merge mode is applied may be encoded and signaled.
Alternatively, a plurality of combination candidates formed by combining two of the inter-prediction modes may be predefined, and an index for identifying one of the plurality of combination candidates may be encoded and signaled.
Hereinafter, a combined prediction method will be described in detail.
First, a first inter-prediction mode can be applied to the current block to generate a first prediction block (S1510). Here, the first inter-prediction mode may be any one of the motion information merge mode, the motion vector prediction mode, the template-based motion estimation method, the bidirectional matching method, or the unidirectional matching method.
A second inter-prediction mode can be applied to the current block to generate a second prediction block (S1520). Here, the second inter-prediction mode may be any one of the motion information merge mode, the motion vector prediction mode, the template-based motion estimation method, the bidirectional matching method, or the unidirectional matching method, provided that it differs from the first inter-prediction mode.
Alternatively, one of the first inter-prediction mode and the second inter-prediction mode may be forced to be set as the decoder-side motion estimation mode, and the other may be forced to be set as the motion information signaling mode.
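Combining the combination-candidate index described earlier with the constraint that one mode is a decoder-side motion estimation mode and the other is a motion information signaling mode, a candidate list could be built as below. The mode names, the list order, and the helper are hypothetical.

```python
from itertools import product

# Hypothetical mode names, grouped as in the description.
DECODER_SIDE_MODES = ("template_matching", "bidirectional_matching", "unidirectional_matching")
SIGNALING_MODES = ("merge", "mvp")

# Each candidate pairs one decoder-side mode with one signaling mode;
# the ordering produced by product() is an assumption.
COMBINATION_CANDIDATES = list(product(DECODER_SIDE_MODES, SIGNALING_MODES))


def modes_from_index(combination_idx):
    """Map a parsed combination index to (first mode, second mode)."""
    return COMBINATION_CANDIDATES[combination_idx]


print(len(COMBINATION_CANDIDATES), modes_from_index(0))
```

Restricting the pairs this way leaves six candidates instead of the ten unordered pairs of five modes, so the signaled index needs fewer values.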
The example shown in
Specifically, the template-based motion estimation method can be applied in the L0 direction to generate the first prediction block (i.e., L0 prediction block) for the current block. That is, the reference template having the lowest cost with respect to the current template can be searched for within the search range of the L0 reference picture. Once the reference template with the lowest cost is determined, the displacement between the current template and the reference template can be set as the motion vector for the L0 direction.
Additionally, a reference block neighboring the reference template can be set as the first prediction block (i.e., L0 prediction block) for the current block.
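The L0 template search can be sketched as follows, assuming an L-shaped template of one row above and one column to the left of the block, SAD as the cost, and integer displacements only. All names are hypothetical. Because only reconstructed template samples are compared, the decoder can repeat the same search.

```python
def template(pic, x0, y0, w, h):
    """L-shaped template: the row above and the column left of the w x h block at (x0, y0)."""
    top = pic[y0 - 1][x0:x0 + w]
    left = [pic[y][x0 - 1] for y in range(y0, y0 + h)]
    return top + left


def template_matching(cur_pic, ref_pic, x0, y0, w, h, search):
    """Return (motion vector, prediction block) minimizing the template SAD."""
    cur_t = template(cur_pic, x0, y0, w, h)
    height, width = len(ref_pic), len(ref_pic[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            xr, yr = x0 + dx, y0 + dy
            if xr < 1 or yr < 1 or xr + w > width or yr + h > height:
                continue  # template and block must lie inside the reference picture
            cost = sum(abs(a - b) for a, b in zip(cur_t, template(ref_pic, xr, yr, w, h)))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    _, (mvx, mvy) = best
    # The block adjacent to the best reference template becomes the prediction block.
    pred = [row[x0 + mvx:x0 + mvx + w] for row in ref_pic[y0 + mvy:y0 + mvy + h]]
    return (mvx, mvy), pred


def g(x, y):
    """Synthetic sample pattern used only for this example."""
    return x * x + 7 * y


ref_pic = [[g(x, y) for x in range(10)] for y in range(10)]
cur_pic = [[g(x + 1, y) for x in range(10)] for y in range(10)]  # content shifted by one sample
print(template_matching(cur_pic, ref_pic, 4, 4, 2, 2, search=2))
```

In this test picture, the current content is the reference content shifted by one sample, so the search returns the motion vector (1, 0) together with the block adjacent to the matching reference template.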
A general motion estimation method can be applied in the L1 direction to generate a second prediction block (i.e., L1 prediction block) for the current block. Specifically, the reference block having the lowest cost with respect to the current block can be searched for within the search range of the L1 reference picture. Once the reference block with the lowest cost is determined, the displacement between the current block and the reference block can be set as the motion vector for the L1 direction.
Additionally, the reference block can be set as the second prediction block (i.e., L1 prediction block) for the current block.
In a case where the template-based motion estimation method is applied, information indicating that the template-based motion estimation method has been applied is signaled, and motion information (e.g., a motion vector) need not be additionally signaled. That is, for the L0 direction, it is sufficient to encode and signal the information indicating that the template-based motion estimation method has been applied. The decoder can determine whether the template-based motion estimation method has been applied on the basis of the information. In a case where it is determined that the template-based motion estimation method has been applied, the decoder can generate the motion vector and/or the first prediction block through the template-based motion estimation method in the same manner as the encoder.
That is, the decoder can set the template-based motion estimation method as the first inter-prediction mode, and perform template-based motion estimation to generate the first prediction block.
On the other hand, in a case where a general motion estimation method is applied, information generated based on the motion information merge mode or the motion vector prediction mode needs to be explicitly encoded and signaled.
Accordingly, the decoder can set the motion information merge mode or the motion vector prediction mode as the second inter-prediction mode and generate the second prediction block on the basis of motion information derived from information parsed from a bitstream. For example, if the second inter-prediction mode is the motion information merge mode, motion information can be derived based on a motion information merge index parsed from the bitstream. Alternatively, if the second inter-prediction mode is the motion vector prediction mode, motion information can be derived based on a motion vector prediction flag and a motion vector difference parsed from the bitstream.
As in the example shown in
As another example, bidirectional prediction (i.e., L0 and L1 prediction) may be performed in the first inter-prediction mode, and unidirectional prediction (i.e., L0 or L1 prediction) may be performed in the second inter-prediction mode. Conversely, unidirectional prediction may be performed in the first inter-prediction mode, and bidirectional prediction may be performed in the second inter-prediction mode.
Alternatively, bidirectional prediction may be performed in both the first inter-prediction mode and the second inter-prediction mode.
Alternatively, the prediction direction of the second inter-prediction mode may be set depending on the prediction direction of the first inter-prediction mode. For example, when the first inter-prediction mode is applied to L0 direction prediction, the second inter-prediction mode may be set to be applied to L1 direction prediction or bidirectional prediction.
Alternatively, the first inter-prediction mode and the second inter-prediction mode may be selected independently of the prediction direction.
In the example shown in
As another example, unlike the illustrated example, forward reference pictures may be used in both the L0 direction and L1 direction, or backward reference pictures may be used in both the L0 and L1 directions.
When the first prediction block and the second prediction block are generated, the final prediction block for the current block can be generated through a weighted sum operation of the first prediction block and the second prediction block (S1530).
The following equation 2 represents an example of generating the final prediction block for the current block through a weighted sum operation.
In equation 2, P indicates the final prediction block for the current block, and P0 and P1 indicate the first prediction block and the second prediction block, respectively. In addition, [x, y] represents the coordinates of a sample in the current block. Additionally, W represents the width of the current block, and H represents the height of the current block.
In the above example, the weight applied to the first prediction block P0 is w0, and the weight applied to the second prediction block P1 is 1-w0. Conversely, the weight 1-w0 may be applied to the first prediction block P0, and the weight w0 may be applied to the second prediction block P1.
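Using only the symbols defined above (w0 applied to P0 and 1 − w0 applied to P1), the weighted sum can be written in the following form, offered as a reading consistent with those definitions:

$$
P[x, y] = w_0 \cdot P_0[x, y] + (1 - w_0) \cdot P_1[x, y], \qquad 0 \le x < W,\; 0 \le y < H
$$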
In the weighted sum operation, the same weight may be assigned to P0 and P1. That is, by setting w0 to ½, the average value of P0 and P1 can be set as the final prediction block.
As another example, the weight assigned to each prediction block may be adaptively set depending on the inter-prediction mode used to generate each prediction block. As an example, the weight assigned to a prediction block generated in the decoder-side motion estimation mode may be set to a larger value than the weight assigned to a prediction block generated in the motion information signaling mode. In
As another example, the weight may be determined based on the POC of the first reference picture used to generate the first prediction block and the POC of the second reference picture used to generate the second prediction block. As an example, the weight w0 may be determined based on the ratio between the absolute value of the POC difference between the first reference picture and the current picture and the absolute value of the POC difference between the second reference picture and the current picture. A higher weight may be assigned to a prediction block derived from a reference picture having a smaller absolute value of the POC difference from the current picture among the first and second reference pictures.
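One possible rule consistent with this description is sketched below. The specific formula w0 = |d1| / (|d0| + |d1|), where d0 and d1 are the POC distances of the first and second reference pictures from the current picture, is an assumption, as is the helper name.

```python
from fractions import Fraction


def poc_based_w0(poc_cur, poc_ref0, poc_ref1):
    """Weight w0 for P0 derived from absolute POC distances (one possible rule).

    Assumed rule: w0 = |d1| / (|d0| + |d1|), so the prediction block from the
    temporally closer reference picture receives the larger weight.
    """
    d0 = abs(poc_ref0 - poc_cur)
    d1 = abs(poc_ref1 - poc_cur)
    return Fraction(d1, d0 + d1)


print(poc_based_w0(poc_cur=8, poc_ref0=7, poc_ref1=11))
```

Here the first reference picture is one picture away and the second is three pictures away, so P0 receives three quarters of the weight.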
As another example, the weight applied to each prediction block can be determined based on a predefined weight table. Here, the weight table may be a lookup table in which different indices are assigned to weight candidates that can be set as the weight w0.
As an example, a lookup table in which different indices are assigned to five weight candidates may be stored in advance in the encoder and decoder. For example, a weight table [4/8, 5/8, 3/8, 10/8, −2/8] containing five weight candidates, to which indices 0 to 4 are assigned in the listed order, may be predefined. In this case, the index of the candidate whose value equals w0 may be explicitly encoded and signaled.
In order to perform integer operation-based processing, it is also possible to scale up the candidate values in the weight table by N times and then perform a weighted sum operation on the first prediction block P0 and the second prediction block P1. In this case, normalization needs to be performed by scaling down the weighted sum result based on N.
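An integer version of the weighted sum with the five-candidate table above, scaled by N = 8, might look like the following. The shift-based normalization with a rounding offset and the omission of clipping to the sample bit depth are implementation assumptions; the table values are those of the example.

```python
LOG2_N = 3  # N = 8
# Weight table of the example, scaled by N: [4/8, 5/8, 3/8, 10/8, -2/8].
WEIGHT_TABLE_X8 = (4, 5, 3, 10, -2)


def combine_integer(p0, p1, weight_idx):
    """Integer form of P = w0*P0 + (1 - w0)*P1 using a signaled weight index.

    Both weights are scaled by N, so the sum is normalized with a rounding
    offset and a right shift by LOG2_N. Clipping is omitted for brevity.
    """
    w0 = WEIGHT_TABLE_X8[weight_idx]
    w1 = (1 << LOG2_N) - w0          # scaled version of 1 - w0
    offset = 1 << (LOG2_N - 1)
    return [[(w0 * a + w1 * b + offset) >> LOG2_N for a, b in zip(r0, r1)]
            for r0, r1 in zip(p0, p1)]


print(combine_integer([[100, 200]], [[60, 40]], weight_idx=0))
```

Index 0 (w0 = 4/8) gives a plain average, while indices 3 and 4 apply a negative weight to one of the blocks, which extrapolates rather than interpolates between the two predictions.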
The number of weight candidates and/or the values of the weight candidates included in the weight table may be adaptively determined on the basis of at least one of the inter-prediction mode, the size of the current block, the shape of the current block, the temporal direction of the reference picture(s), or the temporal order of the reference picture(s).
For example, in a case where the first prediction block is generated based on the decoder-side motion estimation mode, the weight w0 can be selected from N weight candidates. On the other hand, in a case where the first prediction block is generated based on the motion information signaling mode, the weight w0 can be selected from M weight candidates. Here, M may be a natural number different from N.
In the above example, a combined prediction method is applied by combining the first inter-prediction mode and the second inter-prediction mode different from the first inter-prediction mode. Unlike the example, the final prediction block for the current block may be generated by generating two prediction blocks on the basis of one inter-prediction mode.
For example, in the example shown in
Alternatively, two prediction blocks may be generated by applying the template-based motion estimation method to each of two forward reference pictures or two backward reference pictures.
Alternatively, two prediction blocks may be generated by selecting a plurality of motion information merge candidates from a motion information merge list. As an example, a first prediction block may be derived based on first motion information derived from a first motion information merge candidate, and a second prediction block may be derived based on second motion information derived from a second motion information merge candidate.
As described above, in a case where the decoder-side motion estimation mode is included in the motion information merge list as one of motion information merge candidates, at least one of the first motion information merge candidate or the second motion information merge candidate may be forced to indicate a motion information merge candidate corresponding to the decoder-side motion estimation mode.
In a case where bidirectional prediction is performed through the first inter-prediction mode and unidirectional prediction is performed through the second inter-prediction mode, a total of three prediction blocks can be generated for the current block. For example, in a case where the first inter-prediction mode is the motion information merge mode and motion information derived based on the motion information merge list has bidirectional motion information, an L0 prediction block and an L1 prediction block can be generated using the motion information. In a case where the second inter-prediction mode is the template-based motion estimation method, and the template-based motion estimation method is applied only to the L0 direction, one L0 prediction block can be generated. In this case, for the current block, three prediction blocks (i.e., two L0 prediction blocks and one L1 prediction block) can be generated.
In this case, the prediction block generated by performing a weighted sum operation (or averaging operation) on an L0 prediction block and an L1 prediction block generated based on the first inter-prediction mode may be set as a first prediction block, an L0 prediction block generated based on the second inter-prediction mode may be set as a second prediction block, and then the final prediction block for the current block may be generated through equation 2. Alternatively, the prediction block generated by performing a weighted sum operation (or averaging operation) on two L0 prediction blocks may be set as the first prediction block, one L1 prediction block may be set as the second prediction block, and then the final prediction block for the current block may be generated through equation 2.
Even when each of the first inter-prediction mode and the second inter-prediction mode is applied to bidirectional prediction, the final prediction block for the current block can be generated in the same manner as above. For example, if the first inter-prediction mode is the motion information merge mode and motion information derived based on the motion information merge list has bidirectional motion information, an L0 prediction block and an L1 prediction block can be generated using the motion information. If the second inter-prediction mode is the template-based motion estimation method, and the template-based motion estimation method is applied to each of the L0 direction and the L1 direction, an L0 prediction block and an L1 prediction block can be generated. In this case, for the current block, four prediction blocks (i.e., two L0 prediction blocks and two L1 prediction blocks) may be generated.
In this case, the prediction block generated by performing a weighted sum operation (or averaging operation) on an L0 prediction block and an L1 prediction block generated based on the first inter-prediction mode may be set as a first prediction block, and the prediction block generated by performing a weighted sum operation (or averaging operation) on an L0 prediction block and an L1 prediction block generated based on the second inter-prediction mode may be set as a second prediction block. Alternatively, the prediction block generated by performing a weighted sum operation (or averaging operation) on an L0 prediction block generated based on the first inter-prediction mode and an L0 prediction block generated based on the second inter-prediction mode may be set as the first prediction block, and the prediction block generated by performing a weighted sum operation (or averaging operation) on an L1 prediction block generated based on the first inter-prediction mode and an L1 prediction block generated based on the second inter-prediction mode may be set as the second prediction block.
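The two groupings described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes plain rounded averaging as the combining operation (a weighted sum with other weights is equally possible), represents each prediction block as a flat list of integer samples, and uses hypothetical function names. The final combination through equation 2 is not reproduced here.

```python
def average(p, q):
    """Sample-wise rounded average: (p + q + 1) >> 1."""
    return [(x + y + 1) >> 1 for x, y in zip(p, q)]


def group_by_mode(l0_mode1, l1_mode1, l0_mode2, l1_mode2):
    """First option: average the L0/L1 pair produced by each inter-prediction mode."""
    first = average(l0_mode1, l1_mode1)
    second = average(l0_mode2, l1_mode2)
    return first, second


def group_by_direction(l0_mode1, l1_mode1, l0_mode2, l1_mode2):
    """Second option: average the two L0 blocks and, separately, the two L1 blocks."""
    first = average(l0_mode1, l0_mode2)
    second = average(l1_mode1, l1_mode2)
    return first, second


# Four hypothetical two-sample prediction blocks.
blocks = ([100, 104], [110, 106], [90, 100], [120, 98])
print(group_by_mode(*blocks))       # ([105, 105], [105, 99])
print(group_by_direction(*blocks))  # ([95, 102], [115, 102])
```

Both functions return a (first, second) pair, so either grouping can feed the same final combination step.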
Alternatively, a combined prediction method may be performed using three or more inter-prediction modes. In this case, three or more prediction blocks can be generated for the current block, and the final prediction block for the current block can be generated by performing a weighted sum operation on these prediction blocks. Each of the prediction blocks may be generated based on either the decoder-side motion estimation mode or the motion information signaling mode.
In this case, the result of a weighted sum (or averaging) operation performed on L0 prediction blocks can be set as the first prediction block, and the result of a weighted sum (or averaging) operation performed on L1 prediction blocks can be set as the second prediction block. Alternatively, the result of a weighted sum (or averaging) operation performed on prediction blocks derived based on the decoder-side motion estimation mode among a plurality of prediction blocks may be set as the first prediction block, and the result of a weighted sum (or averaging) operation performed on prediction blocks derived based on the motion information signaling mode may be set as the second prediction block.
As another example, in a case where three or more prediction blocks are generated for the current block, the final prediction block for the current block may be derived based on a weighted sum operation of the three or more prediction blocks. At this time, the weight applied to each prediction block may be determined based on a predefined weight table. To this end, information for identifying one of weight candidates included in the weight table may be explicitly encoded and signaled.
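A minimal sketch of a table-driven weighted sum over three prediction blocks follows. The weight table contents, the choice of integer weights summing to 8 (so that normalization is a right shift by 3), and the function name are all hypothetical; only the idea of selecting a weight candidate by a signaled index comes from the description above.

```python
# Hypothetical weight table: each candidate holds one weight per prediction block.
# Every candidate sums to 8, so normalization is a right shift by SHIFT = 3.
WEIGHT_TABLE = [
    (3, 3, 2),
    (4, 2, 2),
    (2, 4, 2),
    (2, 2, 4),
]
SHIFT = 3


def weighted_sum(pred_blocks, weight_idx):
    """Combine prediction blocks (flat sample lists) using the signaled weight index."""
    weights = WEIGHT_TABLE[weight_idx]
    if len(weights) != len(pred_blocks):
        raise ValueError("one weight per prediction block is required")
    offset = 1 << (SHIFT - 1)  # rounding offset
    result = []
    for samples in zip(*pred_blocks):
        acc = sum(w * s for w, s in zip(weights, samples))
        result.append((acc + offset) >> SHIFT)
    return result


print(weighted_sum([[80], [100], [120]], 1))  # [95]
```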
The template-based motion estimation method may be applied a plurality of times to generate a plurality of prediction blocks for the current block. Specifically, the template-based motion estimation method may be applied to each of reference pictures that can be used by the current block, and based on this, up to N reference blocks may be selected from each reference picture. Here, N is 1 or an integer greater than 1. For example, if N is 1, as many reference blocks as the number of reference pictures can be generated.
Alternatively, by applying the template-based motion estimation method to each of the reference pictures, M reference blocks may be derived based on their costs with respect to the current template. As an example, assume that template-based motion estimation is applied to each of L reference pictures and that, as a result, the reference block with the optimal cost in each reference picture is selected. Among the L reference blocks selected through this process, M reference blocks can be selected in ascending order of cost with respect to the current template (i.e., starting from the lowest cost). Here, M is an integer of 1 or more.
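The two-stage selection (best block per reference picture, then the M best overall) can be sketched as follows, assuming that a lower template cost means a better match. The template costs are taken as precomputed inputs, and the data layout (one dictionary per reference picture mapping a candidate motion vector to its cost) and function names are hypothetical.

```python
def best_per_picture(costs_per_picture):
    """For each reference picture, keep the candidate with the lowest template cost.

    costs_per_picture: list with one dict per reference picture, mapping a
    candidate motion vector (dx, dy) to its cost against the current template.
    Returns a list of (cost, ref_idx, mv) tuples.
    """
    best = []
    for ref_idx, costs in enumerate(costs_per_picture):
        mv, cost = min(costs.items(), key=lambda item: item[1])
        best.append((cost, ref_idx, mv))
    return best


def select_m_blocks(costs_per_picture, m):
    """Keep the m lowest-cost blocks among the per-picture winners."""
    best = best_per_picture(costs_per_picture)
    best.sort(key=lambda entry: entry[0])  # lowest cost first
    return best[:m]


costs = [{(0, 0): 50, (1, 0): 40}, {(0, 1): 30}, {(2, 2): 70, (-1, 0): 60}]
print(select_m_blocks(costs, 2))  # [(30, 1, (0, 1)), (40, 0, (1, 0))]
```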
In the above example, N and/or M may have predefined values in the encoder and the decoder. Alternatively, N and/or M may be determined based on at least one of the size of the current block, the shape of the current block, or the number of samples in the current block.
By performing a weighted sum operation on a plurality of reference blocks derived through the above process, the final prediction block for the current block can be obtained. At this time, the weight assigned to each reference block may be determined based on the cost ratio between reference templates. As an example, if the final prediction block for the current block is obtained by performing a weighted sum operation on two reference blocks, the weight assigned to each of the first reference block and the second reference block may be determined by a ratio between a cost a for a first reference template around the first reference block and a cost b for a second reference template around the second reference block. For example, a weight of b/(a+b) may be applied to the first reference block, and a weight of a/(a+b) may be applied to the second reference block.
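The cost-ratio weighting can be sketched as below: the block whose reference template has the lower cost receives the larger weight. Floating-point weights are used for readability; a practical codec would likely use integer weights and shifts. The equal-weight fallback when both costs are zero and the function names are assumptions not stated in the description.

```python
def cost_ratio_weights(cost_a, cost_b):
    """Weights b/(a+b) and a/(a+b) for the first and second reference blocks."""
    total = cost_a + cost_b
    if total == 0:
        return 0.5, 0.5  # assumption: both templates match perfectly, use equal weights
    return cost_b / total, cost_a / total


def blend(block1, block2, cost_a, cost_b):
    """Weighted sum of two reference blocks (flat sample lists)."""
    w1, w2 = cost_ratio_weights(cost_a, cost_b)
    return [round(w1 * s1 + w2 * s2) for s1, s2 in zip(block1, block2)]


# Template costs a = 10 and b = 30 give weights 0.75 and 0.25.
print(blend([100, 200], [140, 120], 10, 30))  # [110, 180]
```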
As another example, the ratio between the cost a for the first reference template and the cost b for the second reference template or the difference between the cost a and the cost b may be compared with a threshold value, and then the weights to be applied to the first reference block and the second reference block may be determined. For example, if the ratio or the difference does not exceed the threshold value, the same weight may be applied to the first reference block and the second reference block. Otherwise, different weights may be determined for the first reference block and the second reference block.
If the ratio or the difference exceeds the threshold value, a reference block with the optimal cost among the plurality of reference blocks may be selected as the prediction block for the current block.
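A sketch of the threshold rule follows, using the cost ratio (the description also allows the cost difference). When the ratio does not exceed the threshold, equal weights are used. Otherwise the sketch either applies cost-ratio weights or, when the hypothetical `select_best_on_exceed` flag is set, selects only the lower-cost block. The threshold value, the flag, and the function name are illustrative assumptions.

```python
def threshold_weights(cost_a, cost_b, threshold, select_best_on_exceed=False):
    """Return (w1, w2) for the first and second reference blocks."""
    lo, hi = sorted((cost_a, cost_b))
    if hi == 0:
        ratio = 1.0  # both templates match perfectly: treat as equal costs
    elif lo == 0:
        ratio = float("inf")
    else:
        ratio = hi / lo
    if ratio <= threshold:
        return 0.5, 0.5  # costs are close: same weight for both blocks
    if select_best_on_exceed:
        # Use only the reference block with the better (lower) cost.
        return (1.0, 0.0) if cost_a <= cost_b else (0.0, 1.0)
    total = cost_a + cost_b
    return cost_b / total, cost_a / total  # different, cost-ratio weights


print(threshold_weights(20, 25, 1.5))        # (0.5, 0.5)
print(threshold_weights(10, 30, 1.5))        # (0.75, 0.25)
print(threshold_weights(10, 30, 1.5, True))  # (1.0, 0.0)
```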
In a case where the final prediction block is generated based on a plurality of prediction blocks, it is possible to perform a weighted sum operation only on some areas of the current block instead of the entire area of the current block.
As an example, a final prediction sample included in a first area in the current block may be obtained through a weighted sum operation of a prediction sample included in the first prediction block and a prediction sample included in the second prediction block. On the other hand, a final prediction sample included in a second area in the current block may be set to a prediction sample included in the first prediction block or may be set to a prediction sample included in the second prediction block. That is, in areas where a weighted sum operation is not performed, a value copied from a reference sample in the first reference picture or the second reference picture may be set as the final prediction sample.
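Restricting the weighted sum to part of the block can be sketched with a boolean mask. Inside the weighted sum area, the two prediction samples are averaged (equal weights are assumed here). Outside it, the sample is copied from one prediction block. The right-half mask is only a hypothetical example of a partition, and all function names are illustrative.

```python
def masked_prediction(pred1, pred2, weighted_mask, use_first_outside=True):
    """pred1, pred2: 2-D lists of equal size; weighted_mask: True inside the weighted sum area."""
    out = []
    for row1, row2, mrow in zip(pred1, pred2, weighted_mask):
        out_row = []
        for s1, s2, inside in zip(row1, row2, mrow):
            if inside:
                out_row.append((s1 + s2 + 1) >> 1)  # weighted sum (equal weights assumed)
            else:
                out_row.append(s1 if use_first_outside else s2)  # copied sample
        out.append(out_row)
    return out


def right_half_mask(width, height):
    """Hypothetical partition: the right half of the block is the weighted sum area."""
    return [[x >= width // 2 for x in range(width)] for _ in range(height)]


pred1 = [[10, 20, 30, 40], [50, 60, 70, 80]]
pred2 = [[30, 40, 50, 60], [70, 80, 90, 100]]
print(masked_prediction(pred1, pred2, right_half_mask(4, 2)))
# [[10, 20, 40, 50], [50, 60, 80, 90]]
```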
An area where the weighted sum operation is applied may be determined on the basis of at least one of the distance from a specific boundary of the current block, the distance from a reconstructed pixel around the current block, the size of the current block, the shape of the current block, the number of samples in the current block, the inter-prediction mode used to obtain the motion vector of the current block, or whether bidirectional prediction is applied to the current block.
As another example, information for identifying an area where the weighted sum operation is applied may be explicitly encoded and signaled through a bitstream. As an example, the information may identify at least one of the position or angle of a partitioning line that separates an area where the weighted sum operation is applied (hereinafter referred to as a weighted sum area) and the other area (hereinafter referred to as an unweighted sum area).
According to the examples shown in
Each of the above four factors may be encoded and signaled as a 1-bit flag. That is, a partition type of the current block can be determined by a plurality of flags.
For example, referring to the example in
Additionally, it is illustrated that the second (i.e., right) partition of the two partitions is designated as a weighted sum area. Accordingly, a flag indicating the ratio of the weighted sum area to the unweighted sum area is set to indicate 3:1, and a flag indicating the position of the weighted sum area is set to indicate the second partition.
Referring to the example shown in
Since the second (i.e., right) partition of the two partitions is designated as a weighted sum area, the flag indicating the position of the weighted sum area is set to indicate the second partition.
As another example, an index indicating one of a plurality of partition type candidates may be encoded and signaled. The plurality of partition type candidates may be configured as shown in
Encoding/decoding of information indicating the ratio between the weighted sum area and the unweighted sum area may be omitted, and the larger or smaller partition of the two partitions may be set as the weighted sum area. For example, in the above-mentioned example, if the current block is partitioned at a ratio of 1:3 or 3:1, encoding/decoding of the flag indicating the position of the weighted sum area may be omitted, and the larger or smaller of the two partitions may be set as the weighted sum area.
In the above-described example, the ratio between the weighted sum area and the unweighted sum area is 1:1, 1:3, or 3:1. Unlike this example, the current block may be partitioned such that the ratio between the weighted sum area and the unweighted sum area is 1:15, 15:1, 1:31, or 31:1.
Instead of the flag indicating whether the weighted sum area and the unweighted sum area within the current block are partitioned symmetrically, information indicating the proportion of the current block occupied by the weighted sum area or the unweighted sum area may be encoded and signaled, or an index identifying that ratio among a plurality of ratio values may be encoded and signaled.
Unlike the examples shown in
In
The current block may be partitioned by a partitioning line perpendicular to a partitioning direction shown in
For example, as in the example shown in
As in the examples shown in
When mode #5 in
Additionally, information indicating a weighted sum area among the two areas may be additionally encoded and signaled. Alternatively, the larger or smaller of the two areas may be designated as a default weighted sum area.
In the above example, a prediction sample is obtained by a weighted sum operation of the first prediction block and the second prediction block in the weighted sum area, and a prediction sample is obtained using only the first prediction block or the second prediction block in the unweighted sum area.
At this time, information indicating whether the first prediction block or the second prediction block is used to derive the prediction sample of the unweighted sum area may be explicitly encoded and signaled.
Alternatively, the prediction block used to derive the prediction sample of the unweighted sum area may be determined according to priorities of the first prediction block and the second prediction block. Here, the priorities of the first prediction block and the second prediction block may be determined on the basis of at least one of the temporal distance between the current picture and a reference picture, the temporal direction of the reference picture, the inter-prediction mode, the position of the unweighted sum area, or the cost with respect to the current template.
For example, a prediction block derived from a reference picture having a shorter distance from the current picture (i.e., smaller absolute value of the POC difference) between first and second reference pictures may have a higher priority. That is, if the temporal distance between the first reference picture and the current picture is less than the temporal distance between the current picture and the second reference picture, a prediction sample of the unweighted sum area can be derived using the first prediction block.
Alternatively, one of the first prediction block and the second prediction block may be selected in consideration of the cost between the current template and a reference template. As an example, the cost between the current template and a first reference template composed of reconstructed areas around the first prediction block (i.e., the first reference block) is compared with the cost between the current template and a second reference template composed of reconstructed areas around the second prediction block (i.e., the second reference block), and then a prediction block adjacent to the reference template with a lower cost can be used to derive a prediction sample of the unweighted sum area.
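The two selection rules for the unweighted sum area can be sketched as follows. The first rule prefers the prediction block whose reference picture is temporally closer (smaller absolute POC difference). The second rule prefers the block whose reference template has the lower SAD against the current template. Using SAD as the cost, breaking ties in favor of the first block, and the function names are assumptions for illustration.

```python
def pick_by_poc_distance(poc_cur, poc_ref1, poc_ref2):
    """Return 1 or 2: the prediction block whose reference picture is closer in time."""
    d1 = abs(poc_cur - poc_ref1)
    d2 = abs(poc_cur - poc_ref2)
    return 1 if d1 <= d2 else 2  # tie goes to the first block (assumption)


def sad(a, b):
    """Sum of absolute differences between two flat sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pick_by_template_cost(cur_template, ref_template1, ref_template2):
    """Return 1 or 2: the prediction block whose reference template matches better."""
    c1 = sad(cur_template, ref_template1)
    c2 = sad(cur_template, ref_template2)
    return 1 if c1 <= c2 else 2  # tie goes to the first block (assumption)


print(pick_by_poc_distance(8, 4, 10))                                 # 2
print(pick_by_template_cost([10, 10, 10], [12, 9, 10], [20, 0, 10]))  # 1
```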
A prediction sample included in the weighted sum area may be derived by performing a weighted sum operation on a prediction sample included in the first prediction block and a prediction sample included in the second prediction block. For example, as in the example shown in
A prediction sample of the unweighted sum area may be generated by copying the prediction sample included in the first prediction block or the second prediction block. In the example shown in
Filtering may also be performed at the boundary of the weighted sum area and the unweighted sum area. Filtering may be performed for smoothing between prediction samples included in the weighted sum area and prediction samples included in the unweighted sum area.
Filtering may be performed by assigning a first weight to a first prediction sample included in the weighted sum area and assigning a second weight to a second prediction sample included in the unweighted sum area. Here, the first prediction sample may be generated by performing a weighted sum operation on the prediction samples included in the first prediction block and the second prediction block, and the second prediction sample may be generated by copying the prediction sample included in the first prediction block or the second prediction block.
At this time, the weights assigned to the first prediction sample and the second prediction sample may be adaptively determined depending on location.
In
As in the example shown in
Upon determination of a weight for each position, a final prediction sample can be obtained through a weighted sum operation of the first prediction sample and the second prediction sample. The following equations 3 to 5 represent an example of obtaining a filtered prediction sample.
In equation 3, P0 indicates the first prediction sample and P1 indicates the second prediction sample. W indicates a weight matrix, and W_Max indicates the sum of the maximum and minimum values in the weight matrix. For example, referring to the example of
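Equations 3 to 5 are not reproduced in this section. The sketch below therefore assumes a common linear blend that is consistent with the definitions above: the filtered sample is (W·P0 + (W_Max − W)·P1 + W_Max/2) / W_Max, with W_Max computed as the sum of the maximum and minimum entries of W. This is an assumption, not the disclosed equation. The example weight matrix and the function name are also hypothetical.

```python
def filter_boundary(p0, p1, w):
    """Position-dependent blend of P0 (weighted sum area) and P1 (unweighted sum area).

    Assumed form (not the disclosed equation 3):
        P = (W * P0 + (W_Max - W) * P1 + W_Max // 2) // W_Max
    where W_Max = min(W) + max(W).
    """
    flat = [v for row in w for v in row]
    w_max = min(flat) + max(flat)
    out = []
    for r0, r1, rw in zip(p0, p1, w):
        out.append([(wt * a + (w_max - wt) * b + w_max // 2) // w_max
                    for a, b, wt in zip(r0, r1, rw)])
    return out


# Hypothetical one-row weight ramp across the boundary: W_Max = 1 + 7 = 8.
print(filter_boundary([[100] * 4], [[20] * 4], [[1, 3, 5, 7]]))  # [[30, 50, 70, 90]]
```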
In the above example, the current block is partitioned into a weighted sum area and an unweighted sum area, and different prediction sample derivation methods are then set for the respective areas. Alternatively, a prediction sample may be derived by a weighted sum operation of the first prediction block and the second prediction block over the entire area of the current block, with the weights applied to the first prediction block and the second prediction block set differently for each sample position, as shown in
The encoder determines motion information on the current block through motion estimation (S2310 and S2410).
When the motion information merge mode is applied, the encoder generates a motion information merge list (S2320) and selects a motion information merge candidate that has motion information identical or similar to the motion information on the current block (S2330). Then, an index indicating the selected motion information merge candidate can be encoded (S2340).
The decoder generates a motion information merge list in the same manner as the encoder (S2350). Then, the decoder selects a motion information merge candidate in the motion information merge list on the basis of an index decoded from a bitstream (S2360). The decoder can obtain a prediction block for the current block by setting motion information on the selected motion information merge candidate as the motion information on the current block (S2370).
When the motion vector prediction mode is applied, the encoder generates a motion vector prediction list (S2420) and selects a motion vector prediction candidate similar to the motion vector of the current block (S2430). Then, the encoder derives a motion vector difference value by subtracting the motion vector prediction value from the motion vector (S2440). Thereafter, the encoder can encode an index indicating the selected motion vector prediction candidate, together with the motion vector difference value (S2450).
The decoder generates a motion vector prediction list in the same manner as the encoder (S2460). Then, the decoder selects a motion vector prediction candidate in the motion vector prediction list on the basis of an index decoded from a bitstream (S2470). The decoder adds a motion vector difference value decoded from the bitstream to the selected motion vector prediction candidate to derive the motion vector of the current block (S2480). Thereafter, the decoder can obtain a prediction block for the current block on the basis of the derived motion vector (S2490).
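The index and difference-value signaling of the two motion information signaling modes can be sketched as follows. Candidate list construction is taken as given. The encoder criterion used here, which picks the predictor with the smallest sum of absolute MVD components, is a simple stand-in for "similar" (a real encoder would typically use a rate-distortion cost). All function names are hypothetical.

```python
from typing import List, Tuple

MV = Tuple[int, int]


def encode_amvp(mv: MV, mvp_list: List[MV]) -> Tuple[int, MV]:
    """Encoder: choose a predictor and return (index, motion vector difference)."""
    def mvd_cost(p: MV) -> int:
        return abs(mv[0] - p[0]) + abs(mv[1] - p[1])

    idx = min(range(len(mvp_list)), key=lambda i: mvd_cost(mvp_list[i]))
    mvp = mvp_list[idx]
    return idx, (mv[0] - mvp[0], mv[1] - mvp[1])  # MVD = MV - MVP


def decode_amvp(idx: int, mvd: MV, mvp_list: List[MV]) -> MV:
    """Decoder: MV = selected predictor + decoded MVD."""
    mvp = mvp_list[idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


def decode_merge(idx: int, merge_list: List[MV]) -> MV:
    """Decoder, merge mode: reuse the motion of the candidate indicated by the index."""
    return merge_list[idx]


mvp_list = [(4, -2), (10, 3)]
idx, mvd = encode_amvp((9, 1), mvp_list)
print(idx, mvd, decode_amvp(idx, mvd, mvp_list))  # 1 (-1, -2) (9, 1)
```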
As described above, in the motion information signaling mode, a motion information merge list or a motion vector prediction list is used. At this time, motion information merge candidates or motion vector prediction candidates derived on the basis of neighboring blocks adjacent to the current block are inserted into the motion information merge list or the motion vector prediction list in a predefined order.
As an example, when the motion information merge mode is applied, motion information merge candidates can be inserted into the motion information merge list in the order of A1, B1, B0, A0, B5, and Col in
Additionally, indices in ascending order from 0 may be assigned according to the order of insertion into the motion information merge list. For example, if six motion information merge candidates are inserted into the motion information merge list, indices of 0 to 5 can be assigned to the six motion information merge candidates.
The encoder can explicitly encode and signal index information indicating one of the motion information candidates in the motion information merge list. In this case, the explicitly encoded index information may indicate one of 0 to 5.
However, since the binarized indices have different bin lengths (a smaller index is represented with fewer bins), it is desirable for a motion information merge candidate with a small index to be selected with higher probability than a motion information merge candidate with a large index.
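To see why small indices are cheaper, consider a truncated unary binarization, a typical choice for merge indices. Whether this exact scheme is used here is an assumption. With six candidates, the maximum value is 5, and index 0 costs one bin while indices 4 and 5 cost five bins each.

```python
def truncated_unary(index, c_max):
    """Truncated unary bin string: `index` ones followed by a terminating zero,
    where the zero is omitted when index equals c_max."""
    bins = "1" * index
    if index < c_max:
        bins += "0"
    return bins


# Six merge candidates -> indices 0..5, c_max = 5.
for i in range(6):
    print(i, truncated_unary(i, 5))
# 0 0
# 1 10
# 2 110
# 3 1110
# 4 11110
# 5 11111
```

Placing the most likely candidates at the front of the list therefore reduces the average number of bins spent on the index.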
To this end, the decoder needs to adaptively determine the ranking among motion information merge candidates or motion vector prediction candidates. Here, the ranking may indicate the order of insertion into the motion information merge list or the motion vector prediction list. Alternatively, the ranking may indicate an arrangement order for rearranging, according to a predefined order, the motion information merge candidates or motion vector prediction candidates already inserted into the list.
A method of determining the insertion order and/or arrangement order which will be described later may be applied to each of the motion information merge list and the motion vector prediction list. In consideration of this, a motion information merge candidate and/or a motion vector prediction candidate are referred to as motion information candidates, and the motion information merge list and/or motion vector prediction list are referred to as motion information lists. That is, when the motion information merge mode is applied, the motion information candidate and the motion information list may mean a motion information merge candidate and a motion information merge list. On the other hand, when the motion vector prediction mode is applied, the motion information candidate and the motion information list may mean a motion vector prediction candidate and a motion vector prediction list.
Hereinafter, a method of determining the insertion order or the arrangement order will be described in detail.
The insertion order or arrangement order of motion information candidates may be adaptively determined depending on the size of the motion information list. Here, the size of the motion information list represents the maximum number of motion information candidates that can be included in the motion information list.
The maximum number of motion information merge candidates that can be included in the motion information merge list and the maximum number of motion vector prediction candidates that can be included in the motion vector prediction list may be different. For example, the maximum number of candidates that can be included in the motion information merge list may be 5, 6, or 7, whereas the maximum number of candidates that can be included in the motion vector prediction list may be 2 or 3.
Alternatively, information for determining the insertion order or arrangement order of motion information candidates may be explicitly encoded and signaled. For example, different indices may be assigned to a plurality of insertion order candidates or a plurality of arrangement order candidates. Additionally, index information indicating one of the plurality of insertion order candidates or the plurality of arrangement order candidates may be explicitly encoded and signaled.
Table 1 shows insertion order candidates or arrangement order candidates for a motion vector prediction list of size 3.
A candidate for B position represents a motion vector prediction candidate derived from a sample neighboring the top of the current block. Deriving N candidates for B position means selecting N available motion vector prediction candidates discovered by scanning top neighboring samples in a predefined order. For example, in the example of
Alternatively, only some of the top neighboring samples may be set as search targets. For example, in the example of
A candidate for A position represents a motion vector prediction candidate derived from a sample neighboring the left side of the current block. Deriving N candidates for A position means selecting N available motion vector prediction candidates discovered by scanning left neighboring samples in a predefined order. For example, in the example of
Alternatively, only some of the left neighboring samples may be set as search targets. For example, in the example of
Table 2 shows insertion order or arrangement order candidates for a motion information merge list of size 5.
Index information indicating one of a plurality of insertion order candidates or a plurality of arrangement order candidates may be encoded and signaled in units of blocks.
Alternatively, index information may be signaled through an upper header. Here, the upper header may indicate a slice header, a picture header, a picture parameter set, or a sequence parameter set.
Alternatively, index information may be encoded and signaled in units of regions where parallel processing is performed. A region where parallel processing is performed may represent a sub-picture, a slice, a tile, a CTU, or a coding unit having a predefined size (e.g., Merge Estimation Region (MER)).
Upon selection of one of the plurality of insertion order candidates, motion information candidates can be derived according to the selected insertion order, and the motion information candidates can be inserted into the motion information list in the derived order. In this case, indices assigned to the motion information candidates may be determined in ascending order of insertion into the motion information list.
Alternatively, upon selection of one of the plurality of arrangement order candidates, motion information candidates previously inserted into the motion information list can be rearranged according to the arrangement order. For example, if the arrangement order indicated by index 1 in Table 1 is selected in a state in which the motion vector prediction list has been composed of candidates in the order of B0, A0, and Col, the motion vector prediction list can be rearranged in the order of A0, B0, and Col.
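Rearranging an existing list according to a selected arrangement order can be sketched as a stable sort by position category. The contents of Table 1 are not available here, so this sketch expresses an arrangement order as a hypothetical category order such as ["A", "B", "Col"]. That representation and the function names are assumptions. The sketch reproduces the B0, A0, Col to A0, B0, Col example above.

```python
def category(candidate):
    """Map a candidate label such as 'A0', 'B1' or 'Col' to its position category."""
    return "Col" if candidate == "Col" else candidate[0]


def rearrange(mv_list, category_order):
    """Stable reorder: categories listed earlier come first; within a category
    the original list order is kept."""
    rank = {cat: i for i, cat in enumerate(category_order)}
    return sorted(mv_list, key=lambda c: rank[category(c)])


print(rearrange(["B0", "A0", "Col"], ["A", "B", "Col"]))  # ['A0', 'B0', 'Col']
```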
The insertion order or arrangement order may be determined on the basis of at least one of the size of the current block, the shape of the current block, or the ratio between the width and height of the current block. For example, if the height of the current block is greater than the width, a motion information candidate derived from top neighboring samples (i.e., a candidate for B position) may be inserted into the motion information list prior to a motion information candidate derived from left neighboring samples (i.e., a candidate for A position). On the other hand, if the width of the current block is greater than the height, a motion information candidate derived from the left neighboring samples (i.e., a candidate for A position) may be inserted into the motion information list prior to a motion information candidate derived from the top neighboring samples (i.e., a candidate for B position).
Further, the maximum number of motion information candidates that can be derived from top neighboring samples (hereinafter referred to as top motion information candidates) and/or the maximum number of motion information candidates that can be derived from left neighboring samples (hereinafter referred to as left motion information candidates) may be adjusted on the basis of at least one of the size of the current block, the shape of the current block, or the ratio between the width and height of the current block.
For example, if the height of the current block is greater than the width, the maximum number of top motion information candidates can be set to a value greater than the maximum number of left motion information candidates. For example, if the height of the current block is greater than the width, two top motion information candidates can be derived and only one left motion information candidate can be derived. Alternatively, if the height of the current block is greater than the width, only the top motion information candidates and a temporal motion information candidate (i.e., Col) may be inserted into the motion information list.
On the other hand, if the width of the current block is greater than the height, the maximum number of left motion information candidates can be set to a value greater than the maximum number of top motion information candidates. For example, if the width of the current block is greater than the height, two left motion information candidates can be derived and only one top motion information candidate can be derived. Alternatively, if the width of the current block is greater than the height, only the left motion information candidates and the temporal motion information candidate (i.e., Col) may be inserted into the motion information list.
An insertion order or arrangement order of motion information candidates may be determined depending on the number of available motion information candidates. Specifically, the insertion order or arrangement order may be determined by comparing the number of available top motion information candidates with the number of available left motion information candidates. For example, if the number of available top motion information candidates is greater than the number of available left motion information candidates, at least one top motion information candidate can be inserted into the motion information list prior to the left motion information candidates. On the other hand, if the number of available left motion information candidates is greater than the number of available top motion information candidates, at least one left motion information candidate can be inserted into the motion information list prior to the top motion information candidates.
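The shape-dependent and availability-dependent rules above can be sketched as follows. The per-category limits of two and one follow the examples in the text. The square-block default, the left-first tie rule for equal availability counts, and all function names are hypothetical choices.

```python
def category(candidate):
    """'A0' -> 'A', 'B1' -> 'B', 'Col' -> 'Col'."""
    return "Col" if candidate == "Col" else candidate[0]


def order_by_shape(width, height):
    """Category order and maximum counts for top (B) and left (A) candidates."""
    if height > width:
        return ["B", "A", "Col"], {"B": 2, "A": 1}
    if width > height:
        return ["A", "B", "Col"], {"A": 2, "B": 1}
    return ["A", "B", "Col"], {"A": 1, "B": 1}  # square block: hypothetical default


def order_by_availability(num_top, num_left):
    """Category order from the counts of available top and left candidates."""
    if num_top > num_left:
        return ["B", "A", "Col"]
    return ["A", "B", "Col"]  # equal counts: left first (assumption)


def build_list(available, width, height, list_size):
    """Fill the list category by category, respecting the per-category maxima."""
    order, max_count = order_by_shape(width, height)
    result = []
    for cat in order:
        cands = [c for c in available if category(c) == cat]
        if cat in max_count:
            cands = cands[:max_count[cat]]
        result.extend(cands)
    return result[:list_size]


available = ["A1", "A0", "B1", "B0", "B5", "Col"]
print(build_list(available, 8, 16, 5))   # tall block: ['B1', 'B0', 'A1', 'Col']
print(build_list(available, 16, 8, 5))   # wide block: ['A1', 'A0', 'B1', 'Col']
```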
Instead of explicitly encoding and signaling index information indicating the insertion order and/or the arrangement order, the insertion order and/or the arrangement order may be adaptively determined on the basis of a parameter derived in the same manner in the encoder and the decoder. Here, the parameter may be derived on the basis of at least one of the size of the current block, the shape of the current block, the ratio between the width and height of the current block, or the number of available motion information candidates.
As another example, the insertion order and/or arrangement order of motion information candidates may be determined using a template. Specifically, the insertion order and/or arrangement order may be determined on the basis of the cost between the current template and a reference template derived using the motion information indicated by each motion information candidate.
For convenience of description, it is assumed that motion information merge candidates included in the motion information merge list are rearranged using a template.
A reference template can be designated on the basis of the motion information of each motion information merge candidate included in the motion information merge list of the current block. For example, if the motion information merge list includes five motion information merge candidates A0, A1, B0, B1, and B5, five reference templates can be derived on the basis of the motion information of each motion information merge candidate.
For example, as in the example shown in
Although it is assumed that all five reference templates are present in the same reference picture in
Thereafter, the cost with respect to the current template can be calculated for each of the reference templates. Once the costs are calculated, the motion information merge candidates can be rearranged in ascending order of cost. That is, a motion information merge candidate used to derive a reference template with a lower cost is assigned a smaller index than a motion information merge candidate used to derive a reference template with a higher cost.
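Template-based rearrangement reduces to sorting candidates by their template cost. In this sketch, the reference templates are assumed to have already been fetched using each candidate's motion information. That fetch step is abstracted away, SAD is used as the cost, and the function names are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two flat sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def reorder_by_template_cost(candidates, cur_template, ref_templates):
    """candidates: list of candidate labels in their current order.
    ref_templates: dict label -> reference template fetched with that candidate's motion.
    Returns the candidates in ascending order of cost (smallest index = lowest cost)."""
    costs = {c: sad(cur_template, ref_templates[c]) for c in candidates}
    return sorted(candidates, key=lambda c: costs[c])


cur = [50, 52, 54]
refs = {"A0": [60, 60, 60], "A1": [50, 53, 54], "B0": [48, 52, 57]}
print(reorder_by_template_cost(["A0", "A1", "B0"], cur, refs))  # ['A1', 'B0', 'A0']
```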
The above method of determining an insertion order and/or arrangement order based on a template can also be applied to the motion vector prediction list. In this case, a reference template for each motion vector prediction candidate may be derived on the basis of a vector derived by adding a motion vector difference value to each motion vector prediction candidate. At this time, information on the motion vector difference value may be explicitly signaled through a bitstream.
As an example, it is assumed that the motion vector prediction list includes one top motion vector prediction candidate and one left motion vector prediction candidate. In this case, a reference template for the top motion vector prediction candidate can be set on the basis of a vector derived by adding a motion vector difference value to the top motion vector prediction candidate. Additionally, a reference template for the left motion vector prediction candidate can be set on the basis of a vector derived by adding the motion vector difference value to the left motion vector prediction candidate.
As another example, a reference template may be derived on the basis of a vector indicated by a motion vector prediction candidate.
Thereafter, for each of the reference templates, the cost with respect to the current template can be calculated and the motion vector prediction candidates in the motion vector prediction list can be rearranged.
N candidates for the B positions or N candidates for the A positions may be derived in consideration of their costs with respect to the current template. For example, if one candidate is derived for the B positions, a reference template can be derived on the basis of the motion information of each of B0 to B5. Then, a motion information candidate can be derived on the basis of the motion information used to derive the reference template with the lowest cost with respect to the current template.
If one candidate is derived for the A positions, a reference template can be derived on the basis of the motion information of each of A0 to A4. Thereafter, a motion information candidate can be derived on the basis of the motion information used to derive the reference template with the lowest cost with respect to the current template.
Alternatively, after M candidates are derived for the B positions and N candidates for the A positions, L motion information candidates may be selected from among the (M+N) candidates. That is, the L candidates with the lowest costs among the (M+N) candidates can be inserted into the motion information list as motion information candidates.
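The selection of candidates by position group reduces to sorting once a template cost is available for each position. The sketch below assumes that these costs have already been computed. The cost dictionary in the example is made up, and `lowest_cost` and `select_from_b_and_a` are hypothetical names.

```python
from typing import Callable, Hashable, List, Sequence


def lowest_cost(group: Sequence[Hashable], cost_of: Callable[[Hashable], int],
                count: int) -> List[Hashable]:
    """Return the `count` lowest-cost entries of `group` (stable for ties)."""
    return sorted(group, key=cost_of)[:count]


def select_from_b_and_a(b_group, a_group, cost_of, m: int, n: int, l: int):
    """Keep M entries from the B positions and N from the A positions,
    then keep the L cheapest of those (M + N) entries."""
    b_best = lowest_cost(b_group, cost_of, m)
    a_best = lowest_cost(a_group, cost_of, n)
    return lowest_cost(b_best + a_best, cost_of, l)


if __name__ == "__main__":
    costs = {"B0": 5, "B1": 3, "B2": 9, "B3": 1, "B4": 7, "B5": 2,
             "A0": 4, "A1": 6, "A2": 0, "A3": 8, "A4": 10}
    b = ["B0", "B1", "B2", "B3", "B4", "B5"]
    a = ["A0", "A1", "A2", "A3", "A4"]
    print(select_from_b_and_a(b, a, costs.__getitem__, m=2, n=1, l=2))  # ['A2', 'B3']
```

With `m=1` or `n=1` and `l` equal to the group size, the same function yields the single lowest-cost candidate per position group.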
An insertion order and/or arrangement order of motion information candidates may be determined on the basis of the bidirectional matching method.
A motion vector prediction list for the current block can be generated for each of the L0 direction and the L1 direction.
Here, an L0 reference block can be derived on the basis of an L0 motion vector of each L0 motion vector prediction candidate included in the L0 motion vector prediction list. Here, an L0 motion vector may mean the value indicated by an L0 motion vector prediction candidate. Alternatively, the L0 motion vector may be derived by adding an L0 motion vector difference value to the L0 motion vector prediction candidate.
Thereafter, an L1 motion vector having the same magnitude as and opposite sign to the L0 motion vector of each L0 motion vector prediction candidate is derived. Then, for each of the L0 motion vector prediction candidates, an L1 reference block can be derived using the L1 motion vector.
For example, as in the example shown in
Similarly, an L0 reference block for A1 may be derived on the basis of the L0 motion vector of the motion vector prediction candidate A1, and an L1 reference block for A1 may be derived on the basis of the L1 motion vector having the same magnitude as and opposite sign to the L0 motion vector.
Then, by comparing the cost between the L0 reference block and the L1 reference block for B0 and the cost between the L0 reference block and the L1 reference block for A1, the insertion order and/or arrangement order of the motion vector prediction candidate B0 and the motion vector prediction candidate A1 can be determined. Specifically, the insertion order and/or arrangement order may be determined such that a smaller index is assigned to a motion vector prediction candidate with a lower cost.
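The bidirectional matching comparison for motion vector prediction candidates can be sketched as follows. The sketch makes three assumptions:

- vectors are integer-sample;
- the L0 and L1 reference pictures lie at equal POC distances on opposite sides of the current picture, so the mirrored vector needs no scaling;
- SAD is the cost.

`block_at` and `reorder_mvp_bilateral` are hypothetical names.

```python
from typing import List, Tuple

MV = Tuple[int, int]       # integer-sample motion vector (dx, dy)
Picture = List[List[int]]  # picture[row][col]


def block_at(picture: Picture, x: int, y: int, w: int, h: int) -> List[int]:
    """Samples of the w x h block whose top-left corner is (x, y)."""
    return [picture[y + j][x + i] for j in range(h) for i in range(w)]


def reorder_mvp_bilateral(
    mvp_list: List[MV], ref0: Picture, ref1: Picture,
    x: int, y: int, w: int, h: int, mvd: MV = (0, 0),
) -> List[MV]:
    """Reorder L0 MVP candidates by the SAD between mirrored L0/L1 reference blocks."""
    scored = []
    for idx, (px, py) in enumerate(mvp_list):
        mv0 = (px + mvd[0], py + mvd[1])  # L0 vector (MVD optional, zero by default)
        mv1 = (-mv0[0], -mv0[1])          # same magnitude, opposite sign
        blk0 = block_at(ref0, x + mv0[0], y + mv0[1], w, h)
        blk1 = block_at(ref1, x + mv1[0], y + mv1[1], w, h)
        cost = sum(abs(a - b) for a, b in zip(blk0, blk1))
        scored.append((cost, idx))
    scored.sort()  # lower cost -> smaller index; ties keep the original order
    return [mvp_list[idx] for _, idx in scored]
```

Swapping `ref0` and `ref1` and passing the L1 candidates reuses the same function for the L1 motion vector prediction list.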
For example, once the arrangement order of the L0 motion vector prediction candidates has been determined by comparing costs, the L0 motion vector prediction candidates included in the L0 motion vector prediction list can be rearranged according to that order.
The insertion order and/or arrangement order of motion vector prediction candidates can be determined in the same manner for the L1 motion vector prediction list. That is, an L1 reference block can be derived on the basis of an L1 motion vector of an L1 motion vector prediction candidate, and an L0 reference block can be derived on the basis of an L0 motion vector having the same magnitude as and opposite sign to the L1 motion vector. Thereafter, costs for L1 motion vector prediction candidates can be compared to determine the insertion order and/or arrangement order.
As in the example shown in
Unlike the example shown in
For example, in a case where the insertion order and/or arrangement order for the L0 motion vector prediction list are determined, an L0 reference picture may be determined on the basis of an L0 reference picture index explicitly signaled through a bitstream. On the other hand, an L1 reference picture may be set as a reference picture closest to the current picture in the L1 reference picture list or a reference picture with a predefined index.
If the distance (i.e., the absolute value of the POC difference) between the current picture and the L0 reference picture is different from the distance between the current picture and the L1 reference picture, the L0 motion vector may be scaled to derive the L1 motion vector. In this case, scaling may be performed on the basis of parameters derived based on the distance between the current picture and the L0 reference picture and the distance between the current picture and the L1 reference picture.
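The text states only that the scaling parameters are derived from the two picture distances. The sketch below assumes the simplest such rule: the L0 vector is multiplied by the ratio of the signed POC differences and rounded half away from zero. Deployed codecs such as HEVC use a fixed-point approximation instead, so this is illustrative only, and `derive_l1_mv` is a hypothetical name. When the two reference pictures are equally distant on opposite sides of the current picture, the ratio is -1 and the result reduces to the mirrored vector described earlier.

```python
import math
from fractions import Fraction
from typing import Tuple

MV = Tuple[int, int]


def _round_half_away(q: Fraction) -> int:
    """Round a rational value to the nearest integer, halves away from zero."""
    n = math.floor(abs(q) + Fraction(1, 2))
    return n if q >= 0 else -n


def derive_l1_mv(mv0: MV, poc_cur: int, poc_ref0: int, poc_ref1: int) -> MV:
    """Scale the L0 vector by the ratio of signed POC distances to obtain the L1 vector."""
    td0 = poc_cur - poc_ref0  # signed distance to the L0 reference picture
    td1 = poc_cur - poc_ref1  # signed distance to the L1 reference picture
    if td0 == 0:
        raise ValueError("the L0 reference picture must differ from the current picture")
    scale = Fraction(td1, td0)  # exact ratio; negative when the references straddle the current picture
    return (_round_half_away(mv0[0] * scale), _round_half_away(mv0[1] * scale))


if __name__ == "__main__":
    print(derive_l1_mv((4, -6), poc_cur=8, poc_ref0=4, poc_ref1=12))  # (-4, 6)
    print(derive_l1_mv((4, -6), poc_cur=8, poc_ref0=4, poc_ref1=16))  # (-8, 12)
```

Using `Fraction` keeps the ratio exact until the final rounding step, which makes the rounding rule explicit.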
For motion information merge candidates, an insertion order and/or arrangement order based on bidirectional matching may be determined.
At this time, bidirectional matching may be applied only to motion information merge candidates having bidirectional motion information among the motion information merge candidates. In this case, the L0 reference block and the L1 reference block for a motion information merge candidate may be derived based on the L0 motion information and the L1 motion information of that motion information merge candidate.
Alternatively, bidirectional matching may be applied not only to motion information merge candidates having bidirectional motion information among motion information merge candidates but also to motion information merge candidates having unidirectional motion information. As an example, if a motion information merge candidate has only L0 motion information, as in the example shown in
In
In the example shown in
In this case, only motion information merge candidates having bidirectional motion information can be rearranged. Specifically, the cost between an L0 reference block and an L1 reference block can be calculated for each of the motion information merge candidates with index 0, index 2, and index 4 having bidirectional motion information. Here, the L0 reference block is derived based on the L0 motion information of the motion information merge candidate, and the L1 reference block is derived based on the L1 motion information of the motion information merge candidate.
Thereafter, an arrangement order for the motion information merge candidates is determined by comparing costs. Then, the indices for the motion information merge candidates can be reassigned with reference to the determined arrangement order.
At this time, motion information merge candidates to which the bidirectional matching method is not applied, that is, the motion information merge candidates with index 1, index 3, and index 5, may be excluded from the rearrangement targets. That is, the indices assigned to motion information merge candidates having unidirectional motion information may remain unchanged by the rearrangement.
As an example, if the ascending order of the costs derived based on the bidirectional matching method is B0 (index 2), B5 (index 4), and A1 (index 0), the indices assigned to the motion information merge candidates having bidirectional motion information may be changed as in the example shown in
This means that the arrangement order of motion information merge candidates having unidirectional motion information is the same as the insertion order, whereas the arrangement order of motion information merge candidates having bidirectional motion information can be changed to be different from the insertion order.
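The partial rearrangement can be sketched as follows. In it, only candidates having bidirectional motion information exchange slots, and unidirectional candidates keep their indices. The list layout is A1, B1, B0, A0, B5, Col, with the bidirectional candidates at indices 0, 2, and 4. This layout is inferred from the indices and orders quoted in this section, and the costs are made up. `rearrange_bi_only` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[str, bool]  # (name, has bidirectional motion information)


def rearrange_bi_only(merge_list: List[Candidate], cost: Dict[str, int]) -> List[Candidate]:
    """Re-sort bidirectional candidates by cost inside their own slots;
    unidirectional candidates stay at their original indices."""
    bi_slots = [i for i, (_, is_bi) in enumerate(merge_list) if is_bi]
    bi_sorted = sorted((merge_list[i] for i in bi_slots), key=lambda c: cost[c[0]])
    result = list(merge_list)
    for slot, cand in zip(bi_slots, bi_sorted):
        result[slot] = cand
    return result


if __name__ == "__main__":
    merge = [("A1", True), ("B1", False), ("B0", True),
             ("A0", False), ("B5", True), ("Col", False)]
    cost = {"B0": 10, "B5": 20, "A1": 30}  # ascending order B0, B5, A1
    print([name for name, _ in rearrange_bi_only(merge, cost)])
    # ['B0', 'B1', 'B5', 'A0', 'A1', 'Col']
```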
As another example, the arrangement order of motion information merge candidates may be set such that motion information merge candidates having bidirectional motion information always precede motion information merge candidates having unidirectional motion information. In this case, the motion information merge list will be rearranged in the order of B0, B5, A1, B1, A0, and Col. That is, an index assigned to a motion information merge candidate having unidirectional motion information may have a larger value than an index assigned to a motion information merge candidate having bidirectional motion information.
On the other hand, the arrangement order of motion information merge candidates may be set such that motion information merge candidates having unidirectional motion information always precede motion information merge candidates having bidirectional motion information. In this case, the motion information merge list will be rearranged in the order of B1, A0, Col, B0, B5, and A1. That is, an index assigned to a motion information merge candidate having unidirectional motion information may have a smaller value than an index assigned to a motion information merge candidate having bidirectional motion information.
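Both group-first orderings are a cost sort of the bidirectional group combined with a stable partition. The sketch below assumes the same inferred list layout and made-up costs as the text's example. In the sketch, unidirectional candidates keep their insertion order, and `partition_by_direction` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[str, bool]  # (name, has bidirectional motion information)


def partition_by_direction(merge_list: List[Candidate], cost: Dict[str, int],
                           bi_first: bool = True) -> List[Candidate]:
    """Place all bidirectional candidates (sorted by cost) before or after
    all unidirectional candidates (kept in insertion order)."""
    bi = sorted((c for c in merge_list if c[1]), key=lambda c: cost[c[0]])
    uni = [c for c in merge_list if not c[1]]
    return bi + uni if bi_first else uni + bi


if __name__ == "__main__":
    merge = [("A1", True), ("B1", False), ("B0", True),
             ("A0", False), ("B5", True), ("Col", False)]
    cost = {"B0": 10, "B5": 20, "A1": 30}
    print([n for n, _ in partition_by_direction(merge, cost)])
    # ['B0', 'B5', 'A1', 'B1', 'A0', 'Col']
    print([n for n, _ in partition_by_direction(merge, cost, bi_first=False)])
    # ['B1', 'A0', 'Col', 'B0', 'B5', 'A1']
```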
As another example, the bidirectional matching method can be applied not only to motion information merge candidates having bidirectional motion information but also to motion information merge candidates having unidirectional motion information. In this case, the arrangement order for all motion information merge candidates can be determined by comparing the costs of the motion information merge candidates.
As another example, the costs for motion information merge candidates having bidirectional motion information can be measured using the bidirectional matching method, and the costs for motion information merge candidates having unidirectional motion information can be measured using the template-based method. Thereafter, the arrangement order for the motion information merge candidates can be determined by comparing the costs for the motion information merge candidates.
Rearrangement of motion information merge candidates included in the motion information merge list may be performed only in a case where predefined conditions are satisfied. Here, the predefined conditions may include at least one of whether there is a motion information merge candidate having bidirectional motion information, whether the number of motion information merge candidates having bidirectional motion information is greater than a threshold value, whether the output order of the current picture falls between output orders of two reference pictures determined by bidirectional motion information, whether the distances between the current picture and the two reference pictures are the same, or whether weights for bidirectional prediction are the same.
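The text lists the conditions but leaves open which of them are required and how they combine. The sketch below assumes one possible policy, under which rearrangement is allowed when both of the following hold:

- more than `threshold` bidirectional candidates exist;
- at least one of those candidates satisfies every per-candidate condition named in `required`.

`BiMotion`, `candidate_conditions`, and `should_rearrange` are hypothetical names, and the policy itself is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence


@dataclass
class BiMotion:
    """Hypothetical summary of a merge candidate with bidirectional motion information."""
    poc_ref0: int  # output order (POC) of the L0 reference picture
    poc_ref1: int  # output order (POC) of the L1 reference picture
    weight0: int   # bi-prediction weight applied to the L0 prediction
    weight1: int   # bi-prediction weight applied to the L1 prediction


def candidate_conditions(poc_cur: int, c: BiMotion) -> Dict[str, bool]:
    """Evaluate the per-candidate conditions named in the text."""
    d0 = poc_cur - c.poc_ref0
    d1 = poc_cur - c.poc_ref1
    return {
        "current_between_refs": d0 * d1 < 0,  # references on opposite sides
        "equal_distances": abs(d0) == abs(d1),
        "equal_weights": c.weight0 == c.weight1,
    }


def should_rearrange(
    poc_cur: int,
    bi_candidates: Sequence[BiMotion],
    threshold: int = 0,
    required: Iterable[str] = ("current_between_refs",),
) -> bool:
    """threshold=0 reduces the count check to 'a bidirectional candidate exists'."""
    if len(bi_candidates) <= threshold:
        return False
    required = tuple(required)
    return any(all(candidate_conditions(poc_cur, c)[k] for k in required)
               for c in bi_candidates)
```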
The motion information merge list may include motion information merge candidates (i.e., spatial motion information merge candidates) derived from spatial neighboring blocks of the current block and motion information merge candidates (i.e., temporal motion information merge candidates) derived from temporal neighboring blocks.
If the number of motion information merge candidates included in the motion information merge list is less than a threshold value, additional motion information merge candidates may be created and inserted into the motion information merge list. Here, the threshold value may represent the maximum number of motion information merge candidates that can be included in the motion information merge list or a value obtained by subtracting an offset from the maximum number.
If the number of available spatial neighboring blocks and/or available temporal neighboring blocks is less than the threshold value, the motion information merge list does not include a sufficient number of motion information merge candidates.
In this case, a new motion information merge candidate may be generated on the basis of the motion information merge candidate(s) already included in the motion information merge list and added to the motion information merge list.
Specifically, a new motion information merge candidate may be generated by performing bidirectional matching or template-based motion estimation on the basis of the motion information merge candidate(s) included in the motion information merge list.
Bidirectional matching may be performed on the basis of the L0 motion information and L1 motion information of each motion information merge candidate included in the motion information merge list. As an example, the cost between an L0 reference block indicated by the L0 motion information of a first motion information merge candidate and an L1 reference block indicated by the L1 motion information of a second motion information merge candidate is calculated. This cost calculation is repeated for each combination of L0 motion information and L1 motion information taken from two motion information merge candidates, and the combination with the lowest cost is then inserted into the motion information merge list as a new motion information merge candidate.
As an example, in
For example, in the example shown in
For example, if the cost between the L0 reference block and the L1 reference block generated based on the L0 direction motion information with index 1 and the L1 direction motion information with index 2 is the smallest, this combination can be allocated to index 3 as a new motion information merge candidate.
However, if a combination to be newly added is the same as one of the motion information merge candidates already included in the motion information merge list (e.g., motion information merge candidates with indices 0 to 2), the combination is not added to the motion information merge list as a motion information merge candidate.
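Generating a new candidate by cross-combining L0 and L1 motion information can be sketched as follows. The sketch makes these assumptions:

- every L0 vector points into a single L0 reference picture, and every L1 vector into a single L1 reference picture;
- vectors are integer-sample;
- SAD is the cost;
- combinations are formed only from two distinct candidates.

The text says only that a combination duplicating an existing candidate is not added. That the next-cheapest combination is tried in its place is an additional assumption. `bilateral_new_candidates` and `block_at` are hypothetical names.

```python
from itertools import permutations
from typing import List, Optional, Tuple

MV = Tuple[int, int]
Cand = Tuple[Optional[MV], Optional[MV]]  # (L0 motion, L1 motion); None if absent
Picture = List[List[int]]


def block_at(picture: Picture, x: int, y: int, w: int, h: int) -> List[int]:
    """Samples of the w x h block whose top-left corner is (x, y)."""
    return [picture[y + j][x + i] for j in range(h) for i in range(w)]


def bilateral_new_candidates(merge_list: List[Cand], ref0: Picture, ref1: Picture,
                             x: int, y: int, w: int, h: int, n: int = 1) -> List[Cand]:
    """Pair L0 motion of candidate i with L1 motion of candidate j (i != j) and
    return up to n lowest-cost pairs that are not already in the list."""
    scored = []
    for i, j in permutations(range(len(merge_list)), 2):
        mv0, mv1 = merge_list[i][0], merge_list[j][1]
        if mv0 is None or mv1 is None:
            continue  # candidate i lacks L0 motion or candidate j lacks L1 motion
        b0 = block_at(ref0, x + mv0[0], y + mv0[1], w, h)
        b1 = block_at(ref1, x + mv1[0], y + mv1[1], w, h)
        cost = sum(abs(a - b) for a, b in zip(b0, b1))
        scored.append((cost, i, j, (mv0, mv1)))
    scored.sort()  # lowest cost first; ties broken by candidate indices
    added: List[Cand] = []
    for _, _, _, combo in scored:
        if combo in merge_list or combo in added:
            continue  # identical to an existing candidate: not added
        added.append(combo)
        if len(added) == n:
            break
    return added


if __name__ == "__main__":
    ref0 = [[r * 8 + c for c in range(8)] for r in range(8)]
    ref1 = [[r * 8 + c - 2 for c in range(8)] for r in range(8)]
    merge = [((0, 0), (0, 0)), ((-1, 0), None), (None, (1, 0))]
    print(bilateral_new_candidates(merge, ref0, ref1, 3, 3, 2, 2))  # [((-1, 0), (1, 0))]
```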
A new motion information merge candidate may also be generated using a template. The method of using a template differs from the aforementioned bidirectional matching method in that a motion information merge candidate is generated on the basis of costs between an L0 reference template, an L1 reference template, and the current template, rather than the cost between an L0 reference block and an L1 reference block.
An L0 reference template may be determined based on the L0 motion information of the first motion information merge candidate included in the motion information merge list, and an L1 reference template may be determined based on the L1 motion information of the second motion information merge candidate.
As an example, in
Thereafter, the cost can be calculated by performing an averaging or weighted sum operation based on the L0 reference template and the L1 reference template, and then subtracting the average or weighted sum result from the current template.
At this time, whether to apply the averaging operation or the weighted sum operation to the L0 reference template and the L1 reference template may be determined based on whether predefined conditions are satisfied. The predefined conditions may be determined based on at least one of whether at least one piece of combined motion information has been derived from a motion information merge candidate having bidirectional motion information, whether the output order of the current picture falls between the output order of the L0 reference picture and the output order of the L1 reference picture, whether the distance between the current picture and the L0 reference picture is the same as the distance between the current picture and the L1 reference picture, or whether weights applied to bidirectional prediction of motion information merge candidates are the same.
As described above, the cost for each of the L0 motion information and L1 motion information combinations is calculated, and then N combinations with lower costs among the combinations are added to the motion information merge list. Here, N represents the number of motion information merge candidates to be added to the motion information merge list.
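The template-based variant can be sketched as follows. The sketch makes three assumptions:

- SAD is the cost;
- the averaging operation is the rounded integer average `(t0 + t1 + 1) >> 1`;
- the weighted sum uses integer weights that add up to `1 << shift`.

`template_cost` and `select_by_template` are hypothetical names, and each combination is assumed to carry its already-fetched L0 and L1 reference templates.

```python
from typing import List, Sequence, Tuple

# (label, L0 reference template, L1 reference template)
Combo = Tuple[str, Sequence[int], Sequence[int]]


def template_cost(cur_tpl: Sequence[int], tpl0: Sequence[int], tpl1: Sequence[int],
                  w0: int = 1, w1: int = 1, shift: int = 1) -> int:
    """SAD between the current template and the combined L0/L1 reference template.

    w0 = w1 = 1 with shift = 1 gives the rounded average; other integer weights
    with w0 + w1 == 1 << shift give a weighted sum. Requires shift >= 1.
    """
    offset = 1 << (shift - 1)
    pred = [(w0 * a + w1 * b + offset) >> shift for a, b in zip(tpl0, tpl1)]
    return sum(abs(c - p) for c, p in zip(cur_tpl, pred))


def select_by_template(combos: List[Combo], cur_tpl: Sequence[int], n: int,
                       w0: int = 1, w1: int = 1, shift: int = 1) -> List[str]:
    """Labels of the n combinations with the lowest template cost."""
    ranked = sorted(combos, key=lambda c: template_cost(cur_tpl, c[1], c[2], w0, w1, shift))
    return [label for label, _, _ in ranked[:n]]


if __name__ == "__main__":
    combos = [("a", [8, 8], [12, 12]), ("b", [10, 10], [20, 20]), ("c", [9, 9], [9, 9])]
    print(select_by_template(combos, [10, 10], n=2))  # ['a', 'c']
```

Whether the averaging or the weighted-sum path is taken would be decided by the predefined conditions described above. In this sketch, that decision amounts to choosing `w0`, `w1`, and `shift`.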
The motion information merge candidates generated using the above-described bidirectional matching and/or template can be added to the motion information merge list after spatial and/or temporal motion information merge candidates are inserted into the motion information merge list, after a motion information merge candidate derived from a historical motion vector predictor (HMVP) table is inserted into the motion information merge list, or after pairwise or combined motion information merge candidates are inserted into the motion information merge list.
Applying embodiments described with a focus on the decoding process to the encoding process, or embodiments described with a focus on the encoding process to the decoding process, is included in the scope of the present disclosure. Modifying embodiments described in a predetermined order so that they are performed in a different order is also included in the scope of the present disclosure.
Although the above disclosure has been described based on a series of steps or a flowchart, this does not limit the chronological order of the disclosure, and the series of steps may be performed simultaneously or in a different order as needed. In addition, each of the components (e.g., units, modules, etc.) constituting a block diagram in the above-described disclosure may be implemented as a hardware device or software, and a plurality of components may be combined to form a single hardware device or software. The above-described disclosure may be implemented in the form of program instructions that can be executed through various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. Examples of computer-readable recording media include magnetic media such as a hard disk, a floppy disk, and a magnetic tape, optical recording media such as a CD-ROM and a DVD, magneto-optical media such as floptical disks, and hardware devices specifically configured to store and execute program instructions, such as a ROM, a RAM, and a flash memory. The hardware devices may be configured to operate as one or more software modules to perform processing according to the present disclosure, and vice versa.
The present disclosure may be applied to a computing or electronic device which may encode/decode a video signal.
Number | Date | Country | Kind |
---|---|---|---|
10-2021-0123306 | Sep 2021 | KR | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/KR2022/013792 | 9/15/2022 | WO | |