The present invention relates to image and video coding with affine motion compensation. In particular, the present invention relates to techniques to improve the coding efficiency or reduce the complexity of a video coding system implementing various coding modes including an affine mode.
In most coding standards, adaptive coding and Inter/Intra prediction are applied on a block basis. For example, the basic block unit for video coding in the High Efficiency Video Coding (HEVC) system is called a coding unit (CU). Partitioning begins with a largest CU (LCU), which is also referred to as a coding tree unit (CTU). Once each LCU is recursively partitioned into leaf CUs, each leaf CU is further split into one or more prediction units (PUs) according to a prediction type and a PU partition mode. Pixels in a PU share the same prediction parameters.
For a current block processed in Inter prediction mode, block matching may be used to locate a reference block in a reference picture. The displacement between the locations of the two blocks is determined as a motion vector (MV) for the current block. HEVC supports two types of Inter prediction mode: advanced motion vector prediction (AMVP) mode and Merge mode. The MV of the current block is predicted by a motion vector predictor (MVP) corresponding to a motion vector associated with spatial and temporal neighbors of the current block. A motion vector difference (MVD) between the MV and the MVP, as well as an index of the MVP, are coded and transmitted for a current block coded in AMVP mode. In a B slice, a syntax element inter_pred_idc is used to indicate the inter prediction direction. One MV is used to locate a predictor for the current block if the current block is coded in uni-directional prediction, while two MVs are used to locate predictors if the current block is coded in bi-directional prediction, so two MVDs and two MVP indices are signaled for blocks coded in bi-directional prediction. In the case of multiple reference pictures, a syntax element ref_idx_l0 is signaled to indicate which reference picture in list 0 is used, and a syntax element ref_idx_l1 is signaled to indicate which reference picture in list 1 is used. In Merge mode, motion information of a current block, including the MV, reference picture index, and inter prediction direction, is inherited from the motion information of a final Merge candidate selected from a Merge candidate list. The Merge candidate list is constructed from motion information of spatially and temporally neighboring blocks of the current block, and a Merge index is signaled to indicate the final Merge candidate.
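The AMVP reconstruction described above can be sketched in a few lines of Python; the function and variable names below are illustrative assumptions, not identifiers from the HEVC specification or reference software.

```python
# Illustrative sketch of AMVP-style MV reconstruction (hypothetical
# names, not HEVC reference-software identifiers).

def reconstruct_mv(mvp, mvd):
    """Recover the motion vector as MVP + MVD, per component."""
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])

# A bi-directionally predicted block signals one MVD (and one MVP
# index) per reference list, so two MVs are reconstructed.
mv_l0 = reconstruct_mv((4, -2), (1, 3))    # list-0 MV -> (5, 1)
mv_l1 = reconstruct_mv((-6, 0), (2, -1))   # list-1 MV -> (-4, -1)
```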
The block based motion compensation in HEVC assumes all pixels within a PU follow the same translational motion model by sharing the same motion vector; however, the translational motion model cannot capture complex motion such as rotation, zooming, and the deformation of moving objects. An affine transform model introduced in the literature provides more accurate motion-compensated prediction as the affine transform model is capable of describing two-dimensional block rotations as well as two-dimensional deformations to transform a rectangle into a parallelogram. This model can be described as follows:
x′=a*x+b*y+e, and
y′=c*x+d*y+f. (1)
where A(x,y) is an original pixel at location (x,y) under consideration, and A′(x′,y′) is the corresponding pixel at location (x′,y′) in a reference picture for the original pixel A(x,y). A total of six parameters a, b, c, d, e, and f are used in this affine transform model, which describes the mapping between original locations and reference locations in six-parameter affine prediction. For each original pixel A(x,y), the motion vector (vx, vy) between this original pixel A(x,y) and its corresponding reference pixel A′(x′,y′) is derived as:
vx=(1−a)*x−b*y−e, and
vy=−c*x+(1−d)*y−f. (2)
The motion vector (vx, vy) of each pixel in the block is location dependent and can be derived by the affine motion model presented in Equation (2) according to its location (x,y).
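The per-pixel derivation can be sketched as follows. This is a minimal illustration of Equations (1) and (2), taking the motion vector as the displacement from the original location to the mapped location; the function name is an assumption for illustration.

```python
def affine_mv(a, b, c, d, e, f, x, y):
    """Motion vector (vx, vy) of the six-parameter affine model for a
    pixel at (x, y), i.e. the displacement to the mapped location
    (x', y') = (a*x + b*y + e, c*x + d*y + f) in the reference picture."""
    vx = (1 - a) * x - b * y - e
    vy = -c * x + (1 - d) * y - f
    return (vx, vy)

# Identity parameters give zero motion; a pure translation (e, f)
# yields the constant motion vector (-e, -f) for every pixel.
```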
Various implementations of affine motion compensation have been disclosed in the literature. For example, a sub-block based affine motion model is applied to derive an MV for each sub-block instead of each pixel to reduce the complexity of affine motion compensation. In a technical paper by Li et al. (“An Affine Motion Compensation Framework for High Efficiency Video Coding”, 2015 IEEE International Symposium on Circuits and Systems (ISCAS), May 2015, pp. 525-528), an affine flag is signaled for each 2N×2N block partition to indicate the use of affine motion compensation when the current block is coded in either Merge mode or AMVP mode. If this flag is true, the derivation of motion vectors for the current block follows the affine motion model; otherwise, the derivation of a motion vector for the current block follows the traditional translational motion model. Three MVs of three corner pixels are signaled when affine Inter mode (also known as affine AMVP mode or AMVP affine mode) is used. At each control point location, the MV is predictively coded by signaling an MVD of the control point. In another exemplary implementation, an affine flag is conditionally signaled depending on Merge candidates when the current block is coded in Merge mode. The affine flag indicates whether the current block is coded in affine Merge mode. The affine flag is only signaled when there is at least one affine-coded Merge candidate, and the first available affine-coded Merge candidate is selected if the affine flag is true.
A four-parameter affine prediction is an alternative to the six-parameter affine prediction, which has two control points instead of three control points. An example of the four-parameter affine prediction is shown in
A method and apparatus for video encoding and decoding with affine motion compensation in a video coding system are disclosed. Embodiments of a video encoder or decoder according to the present invention receive input data associated with a current block in a current picture, and derive a first affine candidate for the current block if the current block is coded or to be coded in affine Merge mode. The input data associated with the current block includes a set of pixels at the video encoder side, or the input data associated with the current block is a video bitstream corresponding to compressed data including the current block at the video decoder side. The first affine candidate includes three affine motion vectors Mv0, Mv1, and Mv2 for predicting motion vectors at control points of the current block. Mv0 is derived from a motion vector of a first neighboring coded block of the current block, Mv1 is derived from a motion vector of a second neighboring coded block of the current block, and Mv2 is derived from a motion vector of a third neighboring coded block of the current block. An affine motion model is then derived according to the affine motion vectors Mv0, Mv1, and Mv2 of the first affine candidate if the first affine candidate is selected to encode or decode the current block. The current block is encoded or decoded by locating a reference block in a reference picture for the current block according to the affine motion model.
In an embodiment, the three neighboring coded blocks are an upper-left corner sub-block adjacent to the current block, a top-right sub-block above the current block, and a left-bottom sub-block beside the current block. In another embodiment, each of the affine motion vectors Mv0, Mv1, and Mv2 is a first available motion vector selected from a predetermined group of motion vectors of neighboring coded blocks. For example, Mv0 is a first available motion vector of motion vectors at an upper-left corner sub-block adjacent to the current block, a top-left sub-block above the current block, and a left-top sub-block beside the current block. Mv1 is a first available motion vector of motion vectors at a top-right sub-block above the current block and an upper-right corner sub-block adjacent to the current block. Mv2 is a first available motion vector of motion vectors at a left-bottom sub-block beside the current block and a lower-left corner sub-block adjacent to the current block.
In some embodiments, multiple affine candidates are used in affine Merge mode. For example, a second affine candidate including three affine motion vectors is also derived and inserted into the Merge candidate list, and if the second affine candidate is selected to encode or decode the current block, the affine motion model is derived according to the affine motion vectors of the second affine candidate. At least one affine motion vector in the second affine candidate is different from the corresponding affine motion vector in the first affine candidate.
An embodiment of the video encoder or decoder denotes the first affine candidate as non-existent or unavailable if the inter prediction directions or reference pictures of the three affine motion vectors Mv0, Mv1, and Mv2 are not all the same. The video encoder or decoder may derive a new affine candidate to replace the first affine candidate. If all three affine motion vectors Mv0, Mv1, and Mv2 are available only in a first reference list, an inter prediction direction for the current block is set to uni-directional prediction using only the first reference list. The first reference list is selected from list 0 and list 1. If the reference pictures of the three affine motion vectors are not all the same, an embodiment scales the affine motion vectors Mv0, Mv1, and Mv2 in the first affine candidate to a designated reference picture; or, if two affine motion vectors correspond to a same reference picture, the method scales the remaining affine motion vector in the first affine candidate so that all reference pictures of the three affine motion vectors are the same.
Aspects of the disclosure further provide a video encoder or decoder that receives input data associated with a current block in a current picture, and derives an affine candidate for the current block if the current block is coded or to be coded in affine Inter mode. The affine candidate includes multiple affine motion vectors for predicting motion vectors at control points of the current block, and the affine motion vectors are derived from one or more neighboring coded blocks. If the affine candidate is selected to encode or decode the current block, the encoder or decoder derives an affine motion model according to the affine motion vectors of the affine candidate, and encodes or decodes the current block by locating a reference block in a current reference picture according to the affine motion model. The current reference picture is pointed to by a reference picture index, and the current block is restricted to be coded in uni-directional prediction by disabling bi-directional prediction if the current block is coded or to be coded in affine Inter mode. The affine motion model may compute motion based on three control points, or a simplified affine motion model that computes motion based on only two control points may be used. In an embodiment, there is only one affine candidate in the candidate list, so the affine candidate is selected without signaling a motion vector predictor (MVP) index.
In some embodiments, one or more of the affine motion vectors in the affine candidate are scaled to the current reference picture pointed by the reference picture index if reference pictures of said one or more affine motion vectors are not the same as the current reference picture. An inter prediction direction flag is signaled to indicate a selected reference list if reference list 0 and reference list 1 of the current block are not the same, and the inter prediction direction flag is not signaled if reference list 0 and reference list 1 of the current block are the same.
Aspects of the disclosure further provide a non-transitory computer readable medium storing program instructions for causing a processing circuit of an apparatus to perform a video coding method with affine motion compensation. The video coding method includes encoding or decoding a current block according to an affine candidate including affine motion vectors derived from multiple neighboring coded blocks of the current block. The video coding method includes disabling bi-directional prediction for blocks coded or to be coded in affine Inter mode. Other aspects and features of the invention will become apparent to those with ordinary skill in the art upon review of the following descriptions of specific embodiments.
Various embodiments of this disclosure that are proposed as examples will be described in detail with reference to the following figures, wherein like numerals reference like elements, and wherein:
It will be readily understood that the components of the present invention, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of the embodiments of the systems and methods of the present invention, as represented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention.
Reference throughout this specification to “an embodiment”, “some embodiments”, or similar language means that a particular feature, structure, or characteristic described in connection with the embodiments may be included in at least one embodiment of the present invention. Thus, appearances of the phrases “in an embodiment” or “in some embodiments” in various places throughout this specification are not necessarily all referring to the same embodiment; these embodiments can be implemented individually or in conjunction with one or more other embodiments.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, etc. In other instances, well-known structures or operations are not shown or described in detail to avoid obscuring aspects of the invention. In the following discussion and in the claims, the terms “including” and “comprising” are used in an open-ended fashion, and thus should be interpreted to mean “including, but not limited to . . . ”.
In order to improve the coding efficiency or reduce the system complexity associated with a video coding system with affine motion prediction, various methods and improvements of utilizing affine motion compensation in affine Merge mode or affine Inter mode are disclosed.
Affine Motion Derivation
An embodiment of the present invention demonstrates an improved affine motion derivation for sub-block based or pixel-based affine motion compensation. A first exemplary affine motion derivation method is for sub-block based six-parameter affine motion prediction with three control points: one at the upper-left corner, one at the upper-right corner, and one at the lower-left corner. Three affine motion vectors Mv0, Mv1, and Mv2 represent the motion vectors at the three control points of a current block, denoted as Mv0=(Mvx0, Mvy0), Mv1=(Mvx1, Mvy1), and Mv2=(Mvx2, Mvy2).
The current block has a width equal to BlkWidth and a height equal to BlkHeight, and is partitioned into sub-blocks, where each sub-block has a width equal to SubWidth and a height equal to SubHeight. The number of sub-blocks M in one row of the current block is M=BlkWidth/SubWidth, and the number of sub-blocks N in one column of the current block is N=BlkHeight/SubHeight. The MV of a sub-block Mv(i,j) in the ith sub-block row and jth sub-block column is (Mvx(i,j), Mvy(i,j)), where i=0, . . . , N−1, and j=0, . . . , M−1, and is derived as:
Mvx(i,j)=Mvx0+(i+1)*deltaMvxVer+(j+1)*deltaMvxHor, and
Mvy(i,j)=Mvy0+(i+1)*deltaMvyVer+(j+1)*deltaMvyHor. (3)
where deltaMvxHor, deltaMvyHor, deltaMvxVer, deltaMvyVer are calculated as:
deltaMvxHor=(Mvx1−Mvx0)/M,
deltaMvyHor=(Mvy1−Mvy0)/M,
deltaMvxVer=(Mvx2−Mvx0)/N, and
deltaMvyVer=(Mvy2−Mvy0)/N. (4)
In another embodiment, the MV of a sub-block Mv(i,j) in the ith sub-block row and jth sub-block column is (Mvx(i,j), Mvy(i,j)), where i=0, . . . , N−1, and j=0, . . . , M−1, and is derived as:
Mvx(i,j)=Mvx0+i*deltaMvxVer+j*deltaMvxHor, and
Mvy(i,j)=Mvy0+i*deltaMvyVer+j*deltaMvyHor. (5)
To apply the first exemplary affine motion derivation method to pixel-based affine prediction, the definitions for M and N in Equation (4) can be modified to represent the number of pixels in one row of the current block and the number of pixels in one column of the current block; in this case, M=BlkWidth and N=BlkHeight. The motion vector Mv(i,j) of each pixel at location (i,j) is (Mvx(i,j), Mvy(i,j)), and the motion vector at each pixel can also be derived by Equation (3) or Equation (5).
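The sub-block MV derivation of Equations (4) and (5) can be sketched as follows. The function and variable names are hypothetical, and the sketch assumes the block dimensions are evenly divisible by the sub-block dimensions.

```python
def subblock_mvs(mv0, mv1, mv2, blk_w, blk_h, sub_w, sub_h):
    """Derive one MV per sub-block from the three control-point MVs,
    per Equations (4) and (5): mv0 at the upper-left, mv1 at the
    upper-right, and mv2 at the lower-left control point.
    Returns an N x M grid (row i, column j) of (mvx, mvy) tuples."""
    m = blk_w // sub_w   # sub-blocks per row, Equation (4)
    n = blk_h // sub_h   # sub-blocks per column
    d_hor = ((mv1[0] - mv0[0]) / m, (mv1[1] - mv0[1]) / m)
    d_ver = ((mv2[0] - mv0[0]) / n, (mv2[1] - mv0[1]) / n)
    # Equation (5): Mv(i,j) = Mv0 + i*deltaMvVer + j*deltaMvHor
    return [[(mv0[0] + i * d_ver[0] + j * d_hor[0],
              mv0[1] + i * d_ver[1] + j * d_hor[1])
             for j in range(m)] for i in range(n)]
```

When the three control-point MVs are identical, the model degenerates to translation and every sub-block receives the same MV.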
For a current block coded or to be coded in affine Inter mode or affine Merge mode, a final candidate is selected for predicting motions of the current block. The final candidate includes three affine motion vectors Mv0, Mv1 and Mv2 for predicting motions of the three control points of the current block. A motion vector at each pixel or each sub-block of the current block is calculated using the affine motion derivation method described in the embodiments of the present invention. A reference block in a reference picture is located according to the motion vectors of the current block and the reference block is used to encode or decode the current block.
Affine Merge Candidate Derivation
In some embodiments of the affine Merge candidate derivation method according to the present invention, the affine motion vectors Mv0, Mv1, and Mv2 of a single affine Merge candidate are derived from multiple neighboring coded blocks of a current block 20, for example, Mv0 is derived from a top-left neighboring sub-block (sub-block a0, a1, or a2 in
Another embodiment of the affine Merge candidate derivation method constructs multiple affine Merge candidates and inserts the affine Merge candidates into the Merge candidate list. For example, a first affine Merge candidate includes affine motion vectors Mv0, Mv1, and Mv2, where Mv0 is a motion vector at sub-block a0 in
There are various modifications to improve the affine Merge candidate derivation method. One modification is to check whether the inter prediction directions of the three affine motion vectors in the affine Merge candidate are the same; if the inter prediction directions are not all the same, this affine Merge candidate is denoted as non-existent or unavailable. In one embodiment, a new affine Merge candidate is derived to replace this affine Merge candidate. Another modification is to check the availability of reference list 0 and reference list 1, and set the inter prediction direction of the current block accordingly. For example, if all three affine motion vectors Mv0, Mv1, and Mv2 are only available in reference list 0, then the current block is coded or to be coded in uni-directional prediction using only reference list 0. If all three affine motion vectors Mv0, Mv1, and Mv2 are only available in reference list 1, then the current block is coded or to be coded in uni-directional prediction using only reference list 1. A third modification is to check whether the reference pictures of the affine motion vectors Mv0, Mv1, and Mv2 are different. If the reference pictures are not all the same, one embodiment denotes the affine Merge candidate as non-existent or unavailable; another embodiment scales all the affine motion vectors to a designated reference picture, such as the reference picture with reference index 0. If two of the three reference pictures of the affine motion vectors are the same, the affine motion vector with the different reference picture can be scaled to that same reference picture.
The prediction residual signal is further processed by Transformation (T) 418 followed by Quantization (Q) 420. The transformed and quantized residual signal is then coded by Entropy Encoder 434 to form the encoded video bitstream. The encoded video bitstream is then packed with side information such as the motion information. The data associated with the side information are also provided to Entropy Encoder 434. When motion compensation prediction mode is used, a reference picture or pictures have to be reconstructed at the encoder end as well. The transformed and quantized residual signal is processed by Inverse Quantization (IQ) 422 and Inverse Transformation (IT) 424 to recover the prediction residual signal of the reference picture or pictures. As shown in
A corresponding Video Decoder 500 for the Video Encoder 400 of
Various components of the Video Encoder 400 and the Video Decoder 500 in
Affine Inter Prediction
If a current block is coded in affine Inter mode, a candidate list is constructed using neighboring valid coded blocks. As shown in
In some embodiments of affine Inter prediction, there is only one candidate in the candidate list, so the affine candidate is always selected without signaling a motion vector predictor (MVP) index when affine Inter mode is selected to encode or decode the current block. Motions in the current block are derived by an affine motion model according to the affine motion vectors in the affine candidate, and a reference block is located by the motion vectors of the current block. If a reference picture of a neighboring coded block used to derive the affine motion vector is not the same as the current reference picture of the current block, the affine motion vector is derived by scaling the corresponding motion vector of the neighboring coded block.
According to some embodiments of the affine Inter prediction, only uni-directional prediction is allowed for blocks coded in affine Inter mode to reduce the system complexity. In other words, bi-directional prediction is disabled when a current block is coded or to be coded in affine Inter mode. Bi-directional prediction may be enabled when the current block is coded in affine Merge mode, Merge mode, AMVP mode or any combination thereof. In an embodiment, when reference list 0 and reference list 1 for the current block are the same, reference list 0 is used without signaling an inter prediction index inter_pred_idc; when reference list 0 and reference list 1 for the current block are different, the inter prediction index inter_pred_idc is signaled to indicate which list is used for the current block.
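The signaling rule above can be sketched as a small predicate; the function name is an illustrative assumption, not bitstream syntax.

```python
def needs_inter_pred_idc(ref_list0, ref_list1):
    """For a block coded in affine Inter mode (uni-directional
    prediction only): return True when inter_pred_idc must be signaled
    to indicate the selected reference list, i.e. when the two lists
    differ; when they are identical, list 0 is used and the syntax
    element is omitted."""
    return ref_list0 != ref_list1
```

For example, an encoder would consult this predicate before writing the block's inter prediction direction to the bitstream.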
Various embodiments of the affine Inter prediction methods may be implemented in the Video Encoder 400 in
The above described affine motion derivation method, affine Merge prediction method, or affine Inter prediction method can be implemented using a simplified affine motion model, for example, two control points are used instead of three control points. An exemplary simplified affine motion model still uses the same mathematical equations for affine motion model but derives the affine motion vector Mv2 for a lower-left control point by the affine motion vectors Mv0 and Mv1. Alternatively, the affine motion vector Mv1 for an upper-right control point may be derived by the affine motion vectors Mv0 and Mv2, or the affine motion vector Mv0 for an upper left control point may be derived by the affine motion vectors Mv1 and Mv2.
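For the two-control-point simplification, the text does not spell out how Mv2 is derived from Mv0 and Mv1; the sketch below uses the standard four-parameter (zoom plus rotation plus translation) relation as an assumption, with hypothetical names.

```python
def derive_mv2(mv0, mv1, blk_w, blk_h):
    """Derive the lower-left control-point MV from the upper-left (mv0)
    and upper-right (mv1) control-point MVs under a four-parameter
    affine model. This is the conventional relation for four-parameter
    affine prediction, stated as an assumption since the text above
    does not give the formula."""
    mvx = mv0[0] - (mv1[1] - mv0[1]) * blk_h / blk_w
    mvy = mv0[1] + (mv1[0] - mv0[0]) * blk_h / blk_w
    return (mvx, mvy)

# Equal mv0 and mv1 (pure translation) give mv2 == mv0, as expected.
```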
Embodiments of the affine motion derivation method, affine Merge prediction method, or affine Inter prediction method may be implemented in a circuit integrated into a video compression chip or program code integrated into video compression software to perform the processing described above. For example, the affine motion derivation method, affine Merge prediction method, or affine Inter prediction method may be realized in program code to be executed on a computer processor, a Digital Signal Processor (DSP), a microprocessor, or a field programmable gate array (FPGA). These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention.
The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.
Number | Date | Country | Kind |
---|---|---|---|
PCT/CN2016/075024 | Mar 2016 | CN | national |
The present invention claims priority to PCT Patent Application Serial No. PCT/CN2016/075024, filed on Mar. 1, 2016, entitled “Methods for Affine Motion Compensation”. The PCT Patent Application is hereby incorporated by reference in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2017/074965 | 2/27/2017 | WO | 00 |