In conventional video coding formats, such as the H.264/AVC (Advanced Video Coding) and H.265/HEVC (High Efficiency Video Coding) standards, video frames in a sequence have their size and resolution recorded at the sequence level in a header. Thus, in order to change frame resolution, a new video sequence must be generated, starting with an intra-coded frame, which carries significantly larger bandwidth costs to transmit than an inter-coded frame. Consequently, although it is desirable to adaptively transmit a down-sampled, low-resolution video over a network when network bandwidth becomes low, reduced, or throttled, it is difficult to realize bandwidth savings while using conventional video coding formats, because the bandwidth costs of adaptively down-sampling offset the bandwidth gains.
Research has been conducted into supporting resolution changing while transmitting inter-coded frames. In the implementation of the AV1 codec, developed by the Alliance for Open Media (AOM), a new frame type called a switch frame is provided, which may be transmitted at a resolution different from that of previous frames. However, a switch frame is restricted in its usage, as motion vector coding of a switch frame cannot reference motion vectors of previous frames. Such references conventionally provide another way to reduce bandwidth costs, so the use of switch frames still incurs greater bandwidth consumption, which offsets bandwidth gains.
Furthermore, existing motion coding tools perform motion compensation prediction (MCP) based only on translational motion models.
In the development of the next-generation video codec specification, VVC/H.266, several new motion prediction coding tools are provided to further support motion vector coding which references previous frames, as well as MCP based on irregular types of motion other than translational motion. New techniques are required in order to implement resolution change in a bitstream with regard to these new coding tools.
The detailed description is set forth with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items or features.
Systems and methods discussed herein are directed to enabling adaptive resolutions in video encoding, and more specifically to determining a ratio of a resolution of a current frame to resolutions of one or more reference pictures to enable inter-frame adaptive resolution changes based on motion prediction coding tools provided for by the VVC/H.266 standard.
According to example embodiments of the present disclosure implemented to be compatible with AVC, HEVC, VVC, and similar video coding standards implementing motion prediction, a frame may be subdivided into macroblocks (MBs) each having dimensions of 16×16 pixels, which may be further subdivided into partitions. According to example embodiments of the present disclosure implemented to be compatible with the HEVC standard, a frame may be subdivided into coding tree units (CTUs), the luma and chroma components of which may be further subdivided into coding tree blocks (CTBs) which are further subdivided into coding units (CUs). According to example embodiments of the present disclosure implemented according to other standards, a frame may be subdivided into units of N×N pixels, which may then be further subdivided into subunits. Each of these largest subdivided units of a frame may generally be referred to as a “block” for the purpose of this disclosure.
According to example embodiments of the present disclosure, a block may be subdivided into partitions having dimensions in multiples of 4×4 pixels. For example, a partition of a block may have dimensions of 8×4 pixels, 4×8 pixels, 8×8 pixels, 16×8 pixels, or 8×16 pixels.
According to example embodiments of the present disclosure, a motion prediction coding format may refer to a data format encoding motion information and prediction units (PUs) of a frame by the inclusion of one or more references to motion information and PUs of one or more other frames. Motion information may refer to data describing motion of a block structure of a frame or a unit or subunit thereof, such as motion vectors and references to blocks of a current frame or of another frame. PUs may refer to a unit or multiple subunits corresponding to a block structure among multiple block structures of a frame, such as an MB or a CTU, wherein blocks are partitioned based on the frame data and are coded according to established video codecs. Motion information corresponding to a prediction unit may describe motion prediction as encoded by any motion vector coding tool, including, but not limited to, those described herein.
Likewise, frames may be encoded with transform information by the inclusion of one or more transformation units (TUs). Transform information may refer to coefficients representing one of several spatial transformations, such as a diagonal flip, a vertical flip, or a rotation, which may be applied to a sub-block.
Sub-blocks of CUs such as PUs and TUs may be arranged in any combination of sub-block dimensions as described above. A CU may be subdivided into a residual quadtree (RQT), a hierarchical structure of TUs. The RQT provides an order for motion prediction and residual coding over sub-blocks of each level and recursively down each level of the RQT.
According to example embodiments of the present disclosure, motion prediction coding formats may include affine motion prediction coding and decoder-side motion vector refinement (DMVR). Features of these motion prediction coding formats relating to example embodiments of the present disclosure shall be described herein.
A video encoder according to motion prediction coding may obtain a picture from a video source and code the picture to obtain a reconstructed frame that may be output for transmission. Blocks of a reconstructed frame may be intra-coded or inter-coded.
Motion information of a CU of an affine motion prediction coding reconstructed frame may be predicted by affine motion compensated prediction. The motion information may include a plurality of motion vectors, including a plurality of control point motion vectors (CPMVs) and a derived motion vector.
A motion vector at sample location (x, y) may be derived from two control points by the operation:
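For reference, the widely published 4-parameter affine model used by VVC, which the elided operation presumably matches, derives the motion vector $(mv_x, mv_y)$ at sample location $(x, y)$ from a top-left control point motion vector $(mv_{0x}, mv_{0y})$ and a top-right control point motion vector $(mv_{1x}, mv_{1y})$ of a block of width $W$:

$$
\begin{aligned}
mv_x &= \frac{mv_{1x} - mv_{0x}}{W} x - \frac{mv_{1y} - mv_{0y}}{W} y + mv_{0x} \\
mv_y &= \frac{mv_{1y} - mv_{0y}}{W} x + \frac{mv_{1x} - mv_{0x}}{W} y + mv_{0y}
\end{aligned}
$$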
A motion vector at sample location (x, y) may be derived from three control points by the operation:
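The corresponding widely published 6-parameter model, which the elided operation presumably matches, adds a bottom-left control point motion vector $(mv_{2x}, mv_{2y})$ and the block height $H$:

$$
\begin{aligned}
mv_x &= \frac{mv_{1x} - mv_{0x}}{W} x + \frac{mv_{2x} - mv_{0x}}{H} y + mv_{0x} \\
mv_y &= \frac{mv_{1y} - mv_{0y}}{W} x + \frac{mv_{2y} - mv_{0y}}{H} y + mv_{0y}
\end{aligned}
$$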
The motion information may further be predicted by deriving motion information of a luma component of the block, and deriving motion information of a chroma component of the block by applying a block-based affine transform upon motion information of the block.
As illustrated by the accompanying figures, a luma component of the block may be divided into luma sub-blocks of 4×4 pixels, wherein a motion vector of each luma sub-block may be derived from the CPMVs of the block.
A chroma component of the block may be divided into chroma sub-blocks of 4×4 pixels, wherein each chroma sub-block may have four neighboring luma sub-blocks. For example, a neighboring luma sub-block may be a luma sub-block below, left of, right of, or above the chroma sub-block. For each chroma sub-block, a motion vector may be derived from an average of luma motion vectors of the neighboring luma sub-blocks.
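As a minimal sketch of this averaging (the function, types, and fixed-point handling are illustrative assumptions, not drawn from the disclosure):

```python
from typing import List, Tuple

MotionVector = Tuple[int, int]  # (horizontal, vertical) displacement in sub-pel units

def derive_chroma_sub_block_mv(neighboring_luma_mvs: List[MotionVector]) -> MotionVector:
    """Derive a chroma sub-block motion vector as the average of the motion
    vectors of its neighboring luma sub-blocks (typically four)."""
    n = len(neighboring_luma_mvs)
    sum_x = sum(mv[0] for mv in neighboring_luma_mvs)
    sum_y = sum(mv[1] for mv in neighboring_luma_mvs)
    # Plain integer division stands in for the rounding and clipping a real
    # codec would apply in fixed-point arithmetic.
    return (sum_x // n, sum_y // n)

# Four neighboring luma sub-block motion vectors -> one chroma motion vector.
print(derive_chroma_sub_block_mv([(4, 2), (4, 2), (8, 2), (8, 6)]))  # (6, 3)
```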
A motion compensation interpolation filter may be applied to a derived motion vector of each sub-block to generate a motion prediction of each sub-block.
Motion information of a CU of an affine motion prediction coding reconstructed frame may include a motion candidate list. A motion candidate list may be a data structure containing references to multiple motion candidates. A motion candidate may be a block structure or a subunit thereof, such as a pixel or any other suitable subdivision of a block structure of a current frame, or may be a reference to a motion candidate of another frame. A motion candidate may be a spatial motion candidate or a temporal motion candidate. By applying motion vector compensation (MVC), a decoder may select a motion candidate from the motion candidate list and derive a motion vector of the motion candidate as a motion vector of the CU of the reconstructed frame.
According to example embodiments of the present disclosure wherein the affine motion prediction mode of an affine motion prediction coding reconstructed frame is an affine merge mode, CUs of the frame have both width and height greater than or equal to 8 pixels. The motion candidate list may be an affine merge candidate list and may include up to five CPMVP candidates. The coding of the CU may include a merge index. A merge index may refer to a CPMVP candidate of an affine merge candidate list.
CPMVs of the current CU may be generated based on control point motion vector predictor (CPMVP) candidates derived from motion information of spatially neighboring blocks or temporally neighboring blocks to the current CU.
As shown by the accompanying figures, a current CU 302 may have a plurality of spatially neighboring blocks.
Of the spatially neighboring blocks shown herein, block A0 may be a block left of the current CU 302; block A1 may be a block left of the current CU 302; block B0 may be a block above the current CU 302; block B1 may be a block above the current CU 302; and block B2 may be a block above the current CU 302. The relative positioning of each spatially neighboring block to the current CU 302, or relative to each other, shall not be further limited. There shall be no limitation as to relative sizes of each spatially neighboring block to the current CU 302 or to each other.
An affine merge candidate list for a CU of a frame coded according to an affine motion prediction mode which is an affine merge mode may include the following CPMVP candidates:
An inherited affine merge candidate may be derived from a spatially neighboring block having affine motion information, that is, a spatially neighboring block belonging to a CU having CPMVs.
A constructed affine merge candidate may be derived from spatially neighboring blocks and temporally neighboring blocks not having affine motion information, that is, CPMVs may be derived from spatially neighboring blocks and temporally neighboring blocks belonging to CUs having only translational motion information.
A zero motion vector may have a motion shift of (0, 0).
At most one inherited affine merge candidate may be derived from searching spatially neighboring blocks left of the current CU, and at most one inherited affine merge candidate may be derived from searching spatially neighboring blocks above the current CU. The left spatially neighboring blocks may be searched in the order of A0 and A1, and the above spatially neighboring blocks may be searched in the order of B0, B1, and B2, in each case for a first spatially neighboring block having affine motion information. In the case that such a first spatially neighboring block is found among the left spatially neighboring blocks, a CPMVP candidate is derived from the CPMVs of the first spatially neighboring block and added to the affine merge candidate list. In the case that such a first spatially neighboring block is found among the above spatially neighboring blocks, a CPMVP candidate is derived from the CPMVs of the first spatially neighboring block and added to the affine merge candidate list. In the case that two CPMVP candidates are derived in this manner, no pruning check among the derived CPMVP candidates is performed, that is, the two derived CPMVP candidates are not checked as to whether they are the same CPMVP candidate.
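The two-sided search may be sketched as follows, with illustrative block records and labels standing in for actual decoder state:

```python
from typing import Dict, List

def derive_inherited_candidates(neighbors: Dict[str, dict]) -> List[dict]:
    """Search spatially neighboring blocks for inherited affine merge
    candidates: at most one from the left (A0, then A1) and at most one
    from above (B0, then B1, then B2). A block record containing a
    'cpmvs' entry is taken to have affine motion information."""
    candidates = []
    for search_order in (("A0", "A1"), ("B0", "B1", "B2")):
        for label in search_order:
            block = neighbors.get(label)
            if block is not None and "cpmvs" in block:
                # Derive a CPMVP candidate from the first affine neighbor;
                # per the text, no pruning check between the two results.
                candidates.append({"cpmvs": block["cpmvs"], "source": label})
                break  # at most one candidate per side
    return candidates

neighbors = {"A0": {}, "A1": {"cpmvs": [(1, 0), (2, 0)]}, "B1": {"cpmvs": [(0, 1), (0, 2)]}}
print(derive_inherited_candidates(neighbors))  # one left candidate, one above candidate
```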
When block A is coded according to a 6-parameter affine model, CU 404 may, additionally, have the following affine motion information: a third CPMV, in addition to the two CPMVs available under a 4-parameter affine model.
The following blocks may be referenced in deriving CPMVs:
CPMV1 may be derived by searching the spatially neighboring blocks B2, B3, and A2 in this order and selecting the first available spatially neighboring block in accordance with criteria found in relevant technology, details of which shall not be elaborated herein.
CPMV2 may be derived by searching the spatially neighboring blocks B1 and B0 in this order and likewise selecting the first available spatially neighboring block.
CPMV3 may be derived by searching the spatially neighboring blocks A1 and A0 in this order and likewise selecting the first available spatially neighboring block.
CPMV4 may be derived from the temporally neighboring block T if it is available.
A constructed affine merge candidate may be constructed using the first available combination, in the order given, of CPMVs of the current CU 502 among the following combinations: {CPMV1, CPMV2, CPMV3}, {CPMV1, CPMV2, CPMV4}, {CPMV1, CPMV3, CPMV4}, {CPMV2, CPMV3, CPMV4}, {CPMV1, CPMV2}, and {CPMV1, CPMV3}.
In the cases that a combination of three CPMVs is used, a 6-parameter affine merge candidate is generated. In the cases that a combination of two CPMVs is used, a 4-parameter affine merge candidate is generated. The constructed affine merge candidate is then added to the affine merge candidate list.
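A minimal sketch of the first-available-combination rule, assuming the combination order given above (names are illustrative):

```python
from typing import Dict, Optional, Tuple

# Combination order assumed above: three-CPMV combinations, then two-CPMV.
COMBINATIONS = [
    ("CPMV1", "CPMV2", "CPMV3"),
    ("CPMV1", "CPMV2", "CPMV4"),
    ("CPMV1", "CPMV3", "CPMV4"),
    ("CPMV2", "CPMV3", "CPMV4"),
    ("CPMV1", "CPMV2"),
    ("CPMV1", "CPMV3"),
]

def construct_affine_merge_candidate(cpmvs: Dict[str, Tuple[int, int]]) -> Optional[dict]:
    """Return the first combination whose control point motion vectors are
    all available: a 6-parameter candidate for three CPMVs, or a
    4-parameter candidate for two."""
    for combo in COMBINATIONS:
        if all(name in cpmvs for name in combo):
            return {"model": f"{2 * len(combo)}-parameter",
                    "cpmvs": [cpmvs[name] for name in combo]}
    return None  # no constructed candidate available

# Only CPMV1 and CPMV3 available: falls through to the {CPMV1, CPMV3} pair.
print(construct_affine_merge_candidate({"CPMV1": (1, 1), "CPMV3": (2, 0)}))
```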
For a block not having affine motion information, such as a block belonging to a CU coded according to a Temporal Motion Vector Predictor (TMVP) coding format, the coding of the CU may include an inter prediction indicator. An inter prediction indicator may indicate list 0 prediction in reference to a first reference picture list referred to as list 0, list 1 prediction in reference to a second reference picture list referred to as list 1, or bi-prediction in reference to two reference picture lists referred to as, respectively, list 0 and list 1. In the cases of the inter prediction indicator indicating list 0 prediction or list 1 prediction, the coding of the CU may include a reference index referring to a reference picture of the reference frame buffer referenced by list 0 or by list 1, respectively. In the case of the inter prediction indicator indicating bi-prediction, the coding of the CU may include a first reference index referring to a first reference picture of the reference frame buffer referenced by list 0, and a second reference index referring to a second reference picture of the reference frame buffer referenced by list 1.
The inter prediction indicator may be coded as a flag in a slice header of an inter-coded frame. The reference index or indices may be coded in a slice header of an inter-coded frame. One or two motion vector differences (MVDs) respectively corresponding to the reference index or indices may further be coded.
In the case that, in a particular combination of CPMVs as described above, reference indices of CPMVs are different, that is, CPMVs may be derived from CUs referencing different reference pictures which may have different resolutions, the particular combination of CPMVs may be discarded and not used.
After adding any derived inherited affine merge candidates and any constructed affine merge candidates to the affine merge candidate list for the CU, zero motion vectors, that is, motion vectors indicating a motion shift of (0, 0), are added to any remaining empty positions of the affine merge candidate list.
According to example embodiments of the present disclosure wherein the affine motion prediction mode of an affine motion prediction coding reconstructed frame is an affine adaptive motion vector prediction (AMVP) mode, CUs of the frame have both width and height greater than or equal to 16 pixels. The applicability of AMVP mode, and whether a 4-parameter affine motion model or a 6-parameter affine motion model is used, may be signaled by bit-level flags carried in a video bitstream carrying the coded frame data. The motion candidate list may be an AMVP candidate list and may include up to two AMVP candidates.
CPMVs of the current CU may be generated based on AMVP candidates derived from motion information of spatially neighboring blocks to the current CU.
An AMVP candidate list for a CU of a frame coded according to an affine motion prediction mode which is an AMVP mode may include the following CPMVP candidates:
An inherited AMVP candidate may be derived in the same fashion as that for deriving an inherited affine merge candidate, except that each spatially neighboring block searched for deriving the inherited AMVP candidate belongs to a CU referencing a same reference picture as the current CU. No pruning check is performed between an inherited AMVP candidate and the AMVP candidate list while adding the inherited AMVP candidate to the AMVP candidate list.
A constructed AMVP candidate may be derived in the same fashion as that for deriving a constructed affine merge candidate, except that selecting the first available spatially neighboring block is further performed in accordance with the criterion that the first spatially neighboring block which is inter-coded and has a reference index referencing a same reference picture as the current CU is selected. Moreover, in accordance with implementations of AMVP wherein temporal control points are not supported, temporally neighboring blocks may not be searched.
In the case that the current CU is coded by a 4-parameter affine motion model, and CPMV1 and CPMV2 of the current CU are available, CPMV1 and CPMV2 are added to the AMVP candidate list as one candidate. In the case that the current CU is coded by a 6-parameter affine motion model, and CPMV1, CPMV2, and CPMV3 of the current CU are available, CPMV1, CPMV2, and CPMV3 are added to the AMVP candidate list as one candidate. Otherwise, a constructed AMVP candidate is not available to be added to the AMVP candidate list.
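This availability rule reduces to a small check; a sketch with illustrative names:

```python
from typing import Dict, List, Optional, Tuple

def constructed_amvp_candidate(affine_params: int,
                               cpmvs: Dict[str, Tuple[int, int]]) -> Optional[List[Tuple[int, int]]]:
    """Add all required CPMVs as one AMVP candidate when available:
    CPMV1-CPMV2 for a 4-parameter model, CPMV1-CPMV3 for a 6-parameter model."""
    needed = ("CPMV1", "CPMV2") if affine_params == 4 else ("CPMV1", "CPMV2", "CPMV3")
    if all(name in cpmvs for name in needed):
        return [cpmvs[name] for name in needed]
    return None  # constructed AMVP candidate not available

print(constructed_amvp_candidate(4, {"CPMV1": (1, 0), "CPMV2": (2, 0)}))  # one candidate
print(constructed_amvp_candidate(6, {"CPMV1": (1, 0), "CPMV2": (2, 0)}))  # None
```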
A translational motion vector may be a motion vector from a spatially neighboring block belonging to a CU having only translational motion information.
A zero motion vector may have a motion shift of (0, 0).
After adding any derived inherited AMVP candidates and any constructed AMVP candidates to the AMVP candidate list for the CU, CPMV1, CPMV2, and CPMV3, in accordance with respective availability, are added to the AMVP candidate list in the given order as translational motion vectors to predict all CPMVs of the current CU. Then, zero motion vectors, that is, motion vectors indicating a motion shift of (0, 0), are added to any remaining empty positions of the AMVP candidate list.
Motion information predicted in accordance with DMVR may be predicted by bi-prediction. Bi-prediction may be performed upon a current frame such that motion information of a block of a reconstructed frame may include a reference to a first motion vector of a first reference block and a second motion vector of a second reference block, the first reference block having a first temporal distance from the current block and the second reference block having a second temporal distance from the current block. The first temporal distance and the second temporal distance may be in different temporal directions from the current block.
The first motion vector may be a motion vector of a block of a first reference picture of a first reference picture list referred to as list 0, and the second motion vector may be a motion vector of a block of a second reference picture of a second reference picture list referred to as list 1. The coding of the CU to which the current block belongs may include a first reference index referring to a first reference picture of the reference frame buffer referenced by list 0, and a second reference index referring to a second reference picture of the reference frame buffer referenced by list 1.
In a first step of the DMVR bi-prediction process, a template 610 may be generated by combining, such as by averaging, an initial first block 602 of a first reference picture 604 referenced by an initial first motion vector mv0 and an initial second block 606 of a second reference picture 608 referenced by an initial second motion vector mv1. In a second step of the DMVR bi-prediction process, the template 610 is compared to a first sample region of the first reference picture 604 proximate to the initial first block 602 and a second sample region of the second reference picture 608 proximate to the initial second block 606 by a cost measurement. The cost measurement may utilize suitable measures of image similarity such as a sum of absolute differences or a mean removed sum of absolute differences. Within the first sample region, if a subsequent first block 614 has a minimum cost measured against the template, a subsequent first motion vector mv0′ referencing the subsequent first block 614 may replace the initial first motion vector mv0. Within the second sample region, if a subsequent second block 616 has a minimum cost measured against the template, a subsequent second motion vector mv1′ referencing the subsequent second block 616 may replace the initial second motion vector mv1. Bi-prediction may then be performed for the current block 612 using mv0′ and mv1′.
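A minimal sketch of the cost-driven refinement, applied per reference picture and restricted to integer-pixel search positions for brevity (an actual DMVR search also covers sub-pixel positions; all names and the search range here are illustrative):

```python
import numpy as np

def sad(a: np.ndarray, b: np.ndarray) -> int:
    """Sum of absolute differences: one suitable cost measurement."""
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def refine_mv(reference: np.ndarray, template: np.ndarray,
              block_xy: tuple, mv: tuple, search_range: int = 2) -> tuple:
    """Return the motion vector within +/- search_range integer pixels of the
    initial motion vector whose referenced block minimizes cost vs. the template."""
    bx, by = block_xy
    h, w = template.shape
    best_mv, best_cost = mv, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + mv[0] + dx, by + mv[1] + dy
            if x < 0 or y < 0 or y + h > reference.shape[0] or x + w > reference.shape[1]:
                continue  # candidate block falls outside the reference picture
            cost = sad(reference[y:y + h, x:x + w], template)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mv[0] + dx, mv[1] + dy), cost
    return best_mv

# The template exactly matches the reference region at (3, 3); for a block at
# (2, 2) with initial motion vector (0, 0), refinement lands on mv' = (1, 1).
ref = np.arange(64, dtype=np.uint8).reshape(8, 8)
template = ref[3:5, 3:5]
print(refine_mv(ref, template, block_xy=(2, 2), mv=(0, 0)))  # (1, 1)
```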
In a video coding process 700, a picture from a video source 702 may be encoded to generate a reconstructed frame, and the reconstructed frame may be output at a destination such as a reference frame buffer 704 or a transmission buffer. The picture may be input into a coding loop, which may include the steps of: inputting the picture into a first in-loop up-sampler or down-sampler 706, generating an up-sampled or down-sampled picture; inputting the up-sampled or down-sampled picture into a video encoder 708, generating a reconstructed frame based on a previous reconstructed frame of the reference frame buffer 704; inputting the reconstructed frame into one or more in-loop filters 710; and outputting the reconstructed frame from the loop, which may include inputting the reconstructed frame into a second up-sampler or down-sampler 714, generating an up-sampled or down-sampled reconstructed frame, and outputting the up-sampled or down-sampled reconstructed frame into the reference frame buffer 704 or into a transmission buffer to be transmitted in a bitstream.
In a video decoding process 720, a coded frame is obtained from a source such as a bitstream 721. According to example embodiments of the present disclosure, given a current frame having position N in the bitstream 721, a previous frame having position N-1 in the bitstream 721 may have a resolution larger than or smaller than a resolution of the current frame, and a next frame having position N+1 in the bitstream 721 may have a resolution larger than or smaller than the resolution of the current frame. The current frame may be input into a coding loop, which may include the steps of inputting the current frame into a video decoder 722, inputting the current frame into one or more in-loop filters 724, inputting the current frame into a third in-loop up-sampler or down-sampler 728, generating an up-sampled or down-sampled reconstructed frame, and outputting the up-sampled or down-sampled reconstructed frame into the reference frame buffer 704. Alternatively, the current frame may be output from the loop, which may include outputting the up-sampled or down-sampled reconstructed frame into a display buffer.
According to example embodiments of the present disclosure, the video encoder 708 and the video decoder 722 may each implement a motion prediction coding format, including, but not limited to, those coding formats described herein. Generating a reconstructed frame based on a previous reconstructed frame of the reference frame buffer 704 may include inter-coded motion prediction as described herein, wherein the previous reconstructed frame may be an up-sampled or down-sampled reconstructed frame output by the in-loop up-sampler or down-sampler 714/728 and serves as a reference picture in the inter-coded motion prediction.
According to example embodiments of the present disclosure, motion prediction information may include a motion vector identifying a predictor block. A motion vector may be a displacement vector representing a displacement between a current block and a predictor block that is referenced for coding of the current block. Displacement may be measured in pixels in a horizontal direction and a vertical direction over a current frame. The displacement vector may represent a displacement between a pixel of the current block and a corresponding pixel of the predictor block at the same positions within the respective blocks. For example, the displacement vector may represent a displacement from a pixel at an upper-left corner of the current block to a pixel at an upper-left corner of the predictor block.
Inter-coded motion prediction may add a motion vector of the current frame to a coordinate of a block of the current frame to locate a predictor block. For example, while decoding a block of the current frame, given a coordinate of a pixel at an upper-left corner of the block of the current frame, a motion vector may indicate a coordinate of a predictor block of a reference frame from which motion information should be derived for the block of the current frame. The coordinate of the predictor block of the reference frame may be located by adding the motion vector to the coordinate of the block of the current frame, assuming that the current frame and the reference frame have the same resolution such that pixels correspond one-to-one between the current frame and the reference frame.
Moreover, motion prediction may support accuracy to an integer pixel scale or to a sub-pixel scale. For example, according to example embodiments of the present disclosure implemented according to the HEVC standard, motion prediction may be accurate to a quarter-pixel scale, such that an interpolation filter is applied to a frame to interpolate the frame by a factor of 4. That is, between every two pixels of the frame, three pixels are generated as sub-pixel picture information. An interpolation filter by a factor of 4 may be implemented as, for example, a 7-tap or an 8-tap Discrete Cosine Transform (DCT)-based finite impulse response (FIR) filter. Interpolation may occur in a first stage wherein interpolation is performed to half-pixel accuracy, such that a first interpolation filter is applied to the frame to interpolate the frame by a factor of 2, and then in a second stage wherein interpolation is performed to quarter-pixel accuracy. Motion prediction accuracy to a sub-pixel scale may increase quality of compressed frames over motion prediction accuracy to an integer pixel scale, but at the cost of increased computation and computing time for each pixel interpolated.
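A one-dimensional sketch of the two-stage interpolation just described, with a 2-tap bilinear average standing in for the longer DCT-based FIR filters a real codec would use:

```python
import numpy as np

def interpolate_by_2(samples: np.ndarray) -> np.ndarray:
    """Insert one interpolated sample between every pair of input samples;
    the bilinear average here is an illustrative stand-in for a real
    interpolation filter such as an 8-tap DCT-based FIR filter."""
    midpoints = (samples[:-1] + samples[1:]) / 2.0
    out = np.empty(samples.size + midpoints.size)
    out[0::2] = samples    # original integer-pixel samples
    out[1::2] = midpoints  # newly interpolated samples
    return out

row = np.array([10.0, 20.0, 40.0])
half_pel = interpolate_by_2(row)          # first stage: half-pixel accuracy
quarter_pel = interpolate_by_2(half_pel)  # second stage: quarter-pixel accuracy
print(half_pel)     # [10. 15. 20. 30. 40.]
print(quarter_pel)  # [10.  12.5 15.  17.5 20.  25.  30.  35.  40. ]
```

After both stages, three samples appear between every two original pixels, for an overall interpolation factor of 4, matching the quarter-pixel accuracy described above.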
According to example embodiments of the present disclosure, upon a video decoder determining that a resolution of a reference frame is different from a resolution of a current frame, the video decoder may translate motion information of the current frame, including motion vectors. To translate motion information of the current frame, the video decoder may determine a ratio of a resolution of the current frame to a resolution of the reference frame.
At step 802, a video decoder may obtain a current frame of a bitstream encoded by affine motion prediction coding, wherein an affine merge mode or AMVP mode may further be enabled according to bitstream signals. The current frame may have a position N. A previous frame having position N-1 in the bitstream may have a resolution larger than or smaller than a resolution of the current frame, and a next frame having position N+1 in the bitstream may have a resolution larger than or smaller than the resolution of the current frame.
At step 804, the video decoder may obtain one or more reference pictures from a reference frame buffer and compare resolutions of the one or more reference pictures to a resolution of the current frame.
At step 806, upon the video decoder determining that one or more resolutions of the one or more reference pictures are different from the resolution of the current frame, the video decoder may select a frame from the reference frame buffer having a same resolution as the resolution of the current frame, if available.
According to example embodiments of the present disclosure, the selected frame may be the most recent frame of the reference frame buffer having a same resolution as the resolution of the current frame, which may not be the most recent frame of the reference frame buffer overall.
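A sketch of this selection, with illustrative frame records standing in for actual buffer entries:

```python
from typing import List, Optional

def select_same_resolution_reference(reference_frame_buffer: List[dict],
                                     current_resolution: tuple) -> Optional[dict]:
    """Scan the reference frame buffer from most recent to oldest and return
    the first frame whose resolution matches the current frame, if any."""
    for frame in reversed(reference_frame_buffer):
        if frame["resolution"] == current_resolution:
            return frame
    return None  # no same-resolution reference available

buffer = [{"id": 7, "resolution": (1920, 1080)}, {"id": 8, "resolution": (960, 540)}]
print(select_same_resolution_reference(buffer, (1920, 1080)))  # frame 7, not the newest
```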
At step 808, the video decoder may determine a ratio of the resolution of the current frame to the resolutions of the one or more reference pictures.
At step 810, the video decoder determines a motion vector of the block of the current frame, and calculates a pixel coordinate indicated by the motion vector of the block of the current frame.
The motion vector may be determined in accordance with steps of motion prediction. Steps of performing motion prediction determining motion vectors shall not be described in detail herein, but may include, for example, deriving a motion candidate list for the block of the current frame; selecting a motion candidate from the derived motion candidate list or merging candidate list; and deriving a motion vector of the motion candidate as a motion vector of the block of the current frame.
A decoder may decode a frame on a per-block basis in a coding order among blocks of the frame, such as a raster scan order wherein a first-decoded block is an uppermost and leftmost block of the frame, according to video encoding standards.
At step 812, the video decoder translates motion information of the block of the current frame to a resolution of the reference frame in accordance with the ratio.
According to example embodiments of the present disclosure, translating motion information may include, after adding the motion vector to the block coordinate, multiplying the resulting coordinate by a ratio factor to derive a translated coordinate indicated by the motion vector.
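A minimal sketch of the translation; exact rational arithmetic keeps sub-pixel positions representable, and taking the ratio factor as mapping current-frame coordinates onto the reference frame's sampling grid is an assumption of this illustration:

```python
from fractions import Fraction

def translate_coordinate(block_xy: tuple, mv: tuple, ratio_factor: Fraction) -> tuple:
    """Add the motion vector to the block coordinate of the current frame,
    then multiply by the ratio factor to obtain the translated coordinate
    indicated by the motion vector in reference-frame terms."""
    return ((block_xy[0] + mv[0]) * ratio_factor,
            (block_xy[1] + mv[1]) * ratio_factor)

# Reference frame at half the current frame's resolution.
ratio_factor = Fraction(960, 1920)
print(translate_coordinate((64, 32), (5, -3), ratio_factor))  # (69/2, 29/2)
```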
At step 814, the video decoder locates a predictor block of the reference frame in accordance with the translated motion information.
Translated motion information, by itself, may be insufficient for locating a predictor block of the reference frame. Particularly, translated coordinates indicated by the motion vector may be in proportion to a resolution of the reference frame, but may not correspond to an integer pixel coordinate of the reference frame; may, additionally, not correspond to a half-pixel coordinate of the reference frame in the case that the video decoder implements half-pixel motion prediction; and may, additionally, not correspond to a quarter-pixel coordinate of the reference frame in the case that the video decoder implements quarter-pixel motion prediction. Thus, the video decoder may further round the translated coordinate of the block to a nearest pixel scale or sub-pixel scale supported by the video decoder.
In the case that the translated coordinates are already at sub-pixel accuracy, rounding may be unnecessary, and the video decoder may locate the predictor block directly at the translated coordinates at the reference frame.
According to other example embodiments of the present disclosure, in the case that the translated coordinates do not correspond to any level of accuracy supported by the video decoder, the video decoder may nevertheless not round the translated coordinates to the highest-granularity level of accuracy supported by the video decoder. Instead, the video decoder may round the translated coordinates to a lower-granularity level of accuracy than the highest level supported.
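A sketch of rounding to a supported grid, where expressing the granularity as grid steps per pixel is an illustrative encoding:

```python
from fractions import Fraction

def round_to_grid(coordinate: Fraction, steps_per_pixel: int) -> Fraction:
    """Round a translated coordinate to the nearest supported position:
    steps_per_pixel = 1 for integer-pixel, 2 for half-pixel, 4 for quarter-pixel."""
    return Fraction(round(coordinate * steps_per_pixel), steps_per_pixel)

coordinate = Fraction(10, 3)         # ~3.333, on no supported grid
print(round_to_grid(coordinate, 4))  # 13/4: nearest quarter-pixel position
print(round_to_grid(coordinate, 1))  # 3: coarser rounding that avoids interpolation
```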
At step 816, in the cases that the translated coordinates are at sub-pixel accuracy or are rounded to sub-pixel accuracy, the video decoder applies an interpolation filter to a block at the translated coordinates at the reference frame to generate sub-pixel values of the predictor block. The interpolation filter may be applied as described above, and, furthermore, in the cases that the translated coordinates are at half-pixel accuracy or are rounded to half-pixel accuracy, only the first stage of interpolation as described above may be performed, skipping the second stage, therefore reducing computational costs and computing time of decoding.
In the cases that the translated coordinates are at integer pixel accuracy or are rounded to integer pixel accuracy, the video decoder does not need to apply an interpolation filter to pixels of the reference block, and step 816 may be skipped with pixels at a block at the translated coordinates at the reference frame being used directly in motion prediction. Avoidance of application of the interpolation filter may greatly reduce computational costs and computing time of decoding.
Similarly, in the case that the video decoder rounds the translated coordinates to a lower granularity level of accuracy than the highest level supported as described with regard to step 814, computational costs and computing time may be likewise reduced.
Subsequently, whether step 816 is performed or skipped, the video decoder may decode the current block by reference to the reference frame and the located predictor block therein. The decoded frames may be input into at least one of a reference frame buffer and a display buffer.
At step 818, the video decoder may further translate inter predictors of the one or more reference pictures in accordance with the ratio.
According to example embodiments of the present disclosure, inter predictors may be, for example, motion information for motion prediction referencing other reference pictures which may have different resolutions.
At step 820, the video decoder may derive an affine merge candidate list or an AMVP candidate list for a block of the current frame. The derivation of an affine merge candidate list or an AMVP candidate list may be performed in accordance with aforementioned steps described herein. The derivation of CPMVP candidates or AMVP candidates in the derivation of an affine merge candidate list or an AMVP candidate list, respectively, may further be performed in accordance with aforementioned steps described herein.
At step 822, the video decoder may select a CPMVP candidate or AMVP candidate from the affine merge candidate list or the AMVP candidate list and derive a motion vector of the CPMVP candidate or AMVP candidate as a motion vector of the block of the reconstructed frame, in accordance with aforementioned steps described herein.
At step 824, the video decoder may generate a reconstructed frame from the current frame based on the one or more reference pictures and the selected CPMVP or AMVP candidate.
The reconstructed frame may be predicted by reference to a selected reference picture having the same resolution as the current frame, or by motion vectors or inter predictors of other frames of the reference frame buffer.
At step 826, the reconstructed frame may be input into at least one of the reference frame buffer and a display buffer.
In the case where the reconstructed frame is input into the reference frame buffer, the reconstructed frame may be obtained as a reference picture and subsequently utilized as described with regard to step 806 above in a subsequent iteration of a coding loop.
At step 1002, a video decoder may obtain a current frame of a bitstream. The current frame may have a position N. A previous frame having position N-1 in the bitstream may have a resolution larger than or smaller than a resolution of the current frame, and a next frame having position N+1 in the bitstream may have a resolution larger than or smaller than the resolution of the current frame.
At step 1004, the video decoder may obtain one or more reference pictures from a reference frame buffer and compare resolutions of the one or more reference pictures to a resolution of the current frame.
At step 1006, upon the video decoder determining that one or more resolutions of the one or more reference pictures are different from the resolution of the current frame, the video decoder may select a frame from the reference frame buffer having a same resolution as the resolution of the current frame, if available.
According to example embodiments of the present disclosure, the video decoder may select a frame from the reference frame buffer having a same resolution as the resolution of the current frame. The selected frame may be the most recent frame of the reference frame buffer having a same resolution as the resolution of the current frame, which may not be the most recent frame of the reference frame buffer overall.
At step 1008, the video decoder may determine a ratio of the resolution of the current frame to the resolutions of the one or more reference pictures; and translate pixel patterns of the one or more reference pictures in accordance with the ratio.
According to example embodiments of the present disclosure, translating pixel patterns of the one or more reference pictures may facilitate vector refinement processes at different resolutions according to DMVR, such as, for example, the above-mentioned step of comparing a template to a first sample region of a first reference picture proximate to an initial first block and a second sample region of a second reference picture proximate to an initial second block by a cost measurement.
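A nearest-neighbor sketch of translating a pixel pattern onto the current frame's sampling grid; a production codec would use a proper resampling filter, and the ratio convention here (current-to-reference) is an assumption of this illustration:

```python
import numpy as np

def translate_pixel_pattern(reference: np.ndarray, ratio: float) -> np.ndarray:
    """Resample a reference picture by the given ratio (current-to-reference
    resolution) so cost measurements compare regions on a common grid."""
    h, w = reference.shape
    new_h, new_w = int(round(h * ratio)), int(round(w * ratio))
    # Map each output coordinate back to its nearest source coordinate.
    ys = np.minimum((np.arange(new_h) / ratio).astype(int), h - 1)
    xs = np.minimum((np.arange(new_w) / ratio).astype(int), w - 1)
    return reference[np.ix_(ys, xs)]

reference = np.arange(16, dtype=np.uint8).reshape(4, 4)
print(translate_pixel_pattern(reference, 2.0).shape)  # (8, 8): up-sampled to match
```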
At step 1010, the video decoder may perform bi-prediction and vector refinement upon the current frame based on a first reference frame and a second reference frame of the reference frame buffer, in accordance with aforementioned steps described herein.
At step 1012, the video decoder may generate a reconstructed frame from the current frame based on the first reference frame and the second reference frame.
The reconstructed frame may be predicted by reference to a selected reference picture having the same resolution as the current frame or by pixel patterns of other frames of the reference frame buffer being translated in accordance with a same resolution as the current frame.
At step 1014, the reconstructed frame may be input into at least one of the reference frame buffer and a display buffer.
In the case where the reconstructed frame is input into the reference frame buffer, the reconstructed frame may be obtained as a reference picture and subsequently utilized as described with regard to step 1006 above in a subsequent iteration of a coding loop.
The techniques and mechanisms described herein may be implemented by multiple instances of the system 1100, as well as by any other computing device, system, and/or environment.
The system 1100 may include one or more processors 1102 and system memory 1104 communicatively coupled to the processor(s) 1102. The processor(s) 1102 may execute one or more modules and/or processes to cause the processor(s) 1102 to perform a variety of functions. In some embodiments, the processor(s) 1102 may include a central processing unit (CPU), a graphics processing unit (GPU), both CPU and GPU, or other processing units or components known in the art. Additionally, each of the processor(s) 1102 may possess its own local memory, which also may store program modules, program data, and/or one or more operating systems.
Depending on the exact configuration and type of the system 1100, the system memory 1104 may be volatile, such as RAM, non-volatile, such as ROM, flash memory, miniature hard drive, memory card, and the like, or some combination thereof. The system memory 1104 may include one or more computer-executable modules 1106 that are executable by the processor(s) 1102.
The modules 1106 may include, but are not limited to, a decoder module 1108. The decoder module 1108 may include a frame obtaining module 1110, a reference picture obtaining module 1112, a frame selecting module 1114, a candidate list deriving module 1116, a motion predicting module 1118, a reconstructed frame generating module 1120, a ratio determining module 1122, a translating module 1124, an inter predictor translating module 1126, and a buffer inputting module 1128.
The frame obtaining module 1110 may be configured to obtain a current frame of a bitstream encoded in an affine motion prediction coding format as described above.
The reference picture obtaining module 1112 may be configured to obtain one or more reference pictures from a reference frame buffer and compare resolutions of the one or more reference pictures to a resolution of a current frame as described above.
The frame selecting module 1114 may be configured to select a frame from the reference frame buffer having a same resolution as the resolution of the current frame, upon the reference picture obtaining module 1112 determining that one or more resolutions of the one or more reference pictures are different from the resolution of the current frame, as described above.
The candidate list deriving module 1116 may be configured to derive an affine merge candidate list or an AMVP candidate list for a block of the current frame, as described above.
The motion predicting module 1118 may be configured to select a CPMVP or AMVP candidate from the derived affine merge candidate list or AMVP candidate list and derive a motion vector of the CPMVP or AMVP candidate as a motion vector of the block of the reconstructed frame, as described above.
The reconstructed frame generating module 1120 may be configured to generate a reconstructed frame from the current frame based on the one or more reference pictures and the selected motion candidate.
The ratio determining module 1122 may be configured to determine a ratio of the resolution of the current frame to the resolutions of the one or more reference pictures.
The translating module 1124 may be configured to translate motion vectors and inter predictors of the one or more reference pictures in accordance with the ratio.
The inter predictor translating module 1126 may be configured to translate inter predictors of the one or more reference pictures in accordance with the ratio.
The buffer inputting module 1128 may be configured to input the reconstructed frame into at least one of the reference frame buffer and a display buffer, as described above.
The system 1100 may additionally include an input/output (I/O) interface 1140 for receiving bitstream data to be processed, and for outputting reconstructed frames into a reference frame buffer and/or a display buffer. The system 1100 may also include a communication module 1150 allowing the system 1100 to communicate with other devices (not shown) over a network (not shown). The network may include the Internet, wired media such as a wired network or direct-wired connections, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media.
The techniques and mechanisms described herein may be implemented by multiple instances of the system 1200, as well as by any other computing device, system, and/or environment.
The system 1200 may include one or more processors 1202 and system memory 1204 communicatively coupled to the processor(s) 1202. The processor(s) 1202 may execute one or more modules and/or processes to cause the processor(s) 1202 to perform a variety of functions. In some embodiments, the processor(s) 1202 may include a central processing unit (CPU), a graphics processing unit (GPU), both CPU and GPU, or other processing units or components known in the art. Additionally, each of the processor(s) 1202 may possess its own local memory, which also may store program modules, program data, and/or one or more operating systems.
Depending on the exact configuration and type of the system 1200, the system memory 1204 may be volatile, such as RAM, non-volatile, such as ROM, flash memory, miniature hard drive, memory card, and the like, or some combination thereof. The system memory 1204 may include one or more computer-executable modules 1206 that are executable by the processor(s) 1202.
The modules 1206 may include, but are not limited to, a decoder module 1208. The decoder module 1208 may include a frame obtaining module 1210, a reference picture obtaining module 1212, a bi-predicting module 1214, a vector refining module 1216, a reconstructed frame generating module 1218, a ratio determining module 1220, a pixel pattern translating module 1222, and a buffer inputting module 1224.
The frame obtaining module 1210 may be configured to obtain a current frame of a bitstream as described above.
The reference picture obtaining module 1212 may be configured to obtain one or more reference pictures from a reference frame buffer and compare resolutions of the one or more reference pictures to a resolution of a current frame as described above.
The bi-predicting module 1214 may be configured to perform bi-prediction upon the current frame based on a first reference frame and a second reference frame of the reference frame buffer, as described above.
The vector refining module 1216 may be configured to perform vector refinement during the bi-prediction process based on the first reference frame and the second reference frame of the reference frame buffer, as described above.
The reconstructed frame generating module 1218 may be configured to generate a reconstructed frame from the current frame based on the first reference frame and the second reference frame.
The ratio determining module 1220 may be configured to determine a ratio of the resolution of the current frame to the resolutions of the one or more reference pictures.
The pixel pattern translating module 1222 may be configured to translate pixel patterns of the one or more reference pictures in accordance with the ratio.
The buffer inputting module 1224 may be configured to input the reconstructed frame into at least one of the reference frame buffer and a display buffer, as described above.
The system 1200 may additionally include an input/output (I/O) interface 1240 for receiving bitstream data to be processed, and for outputting reconstructed frames into a reference frame buffer and/or a display buffer. The system 1200 may also include a communication module 1250 allowing the system 1200 to communicate with other devices (not shown) over a network (not shown). The network may include the Internet, wired media such as a wired network or direct-wired connections, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media.
Some or all operations of the methods described above can be performed by execution of computer-readable instructions stored on a computer-readable storage medium, as defined below. The term “computer-readable instructions” as used in the description and claims includes routines, applications, application modules, program modules, programs, components, data structures, algorithms, and the like. Computer-readable instructions can be implemented on various system configurations, including single-processor or multiprocessor systems, minicomputers, mainframe computers, personal computers, hand-held computing devices, microprocessor-based or programmable consumer electronics, combinations thereof, and the like.
The computer-readable storage media may include volatile memory (such as random-access memory (RAM)) and/or non-volatile memory (such as read-only memory (ROM), flash memory, etc.). The computer-readable storage media may also include additional removable storage and/or non-removable storage including, but not limited to, flash memory, magnetic storage, optical storage, and/or tape storage that may provide non-volatile storage of computer-readable instructions, data structures, program modules, and the like.
A non-transitory computer-readable storage medium is an example of computer-readable media. Computer-readable media includes at least two types of computer-readable media, namely computer-readable storage media and communications media. Computer-readable storage media includes volatile and non-volatile, removable and non-removable media implemented in any process or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media includes, but is not limited to, phase change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device. In contrast, communication media may embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transmission mechanism. As defined herein, computer-readable storage media do not include communication media.
The computer-readable instructions stored on one or more non-transitory computer-readable storage media, when executed by one or more processors, may perform operations described above.
By the abovementioned technical solutions, the present disclosure provides inter-coded resolution-adaptive video coding supported by motion prediction coding formats, improving the video coding process under multiple motion prediction coding formats by enabling resolution changes between coded frames while allowing motion vectors to reference previous frames. Thus, the bandwidth savings of inter-coding are maintained, and the bandwidth savings of motion prediction coding are realized by allowing reference frames to predict motion vectors of subsequent frames, achieving substantial reduction of network costs during video coding and content delivery while avoiding the transport of additional data that would offset or compromise these savings.
A. A method comprising: obtaining a current frame of a bitstream; obtaining one or more reference pictures from a reference frame buffer having resolutions different from a resolution of the current frame; translating an inter predictor of the one or more reference pictures; and generating a reconstructed frame from the current frame based on the one or more reference pictures and motion information of one or more blocks of the current frame, the motion information including at least one inter predictor.
B. The method as paragraph A recites, further comprising: comparing resolutions of the one or more reference pictures to a resolution of the current frame; upon determining that one or more resolutions of the one or more reference pictures are different from the resolution of the current frame, selecting a frame from the reference frame buffer having a same resolution as the resolution of the current frame; determining a ratio of the resolution of the current frame to the resolutions of the one or more reference pictures; translating the inter predictor of the one or more reference pictures in accordance with the ratio; and translating motion vectors of the one or more reference pictures in accordance with the ratio.
C. The method as paragraph A recites, further comprising: deriving an affine merge candidate list or an AMVP candidate list for the current frame; selecting a CPMVP candidate or an AMVP candidate from the affine merge candidate list or the AMVP candidate list, respectively; and deriving a motion vector of the selected candidate as a motion vector of a block of the reconstructed frame.
D. The method as paragraph C recites, further comprising: deriving at least one of an inherited affine merge candidate and a constructed affine merge candidate, and adding the at least one of an inherited affine merge candidate and a constructed affine merge candidate to the affine merge candidate list or the AMVP candidate list.
E. The method as paragraph A recites, further comprising: generating a reconstructed frame from the current frame based on the one or more reference pictures and at least one inter predictor.
F. A method comprising: obtaining a current frame of a bitstream; obtaining one or more reference pictures from a reference frame buffer and comparing resolutions of the one or more reference pictures to a resolution of the current frame; and upon determining that one or more resolutions of the one or more reference pictures are different from the resolution of the current frame, translating a pixel pattern of the one or more reference pictures in accordance with the resolution of the current frame.
G. The method as paragraph F recites, further comprising performing bi-prediction upon the current frame based on a first reference frame and a second reference frame of the reference frame buffer.
H. The method as paragraph G recites, wherein performing bi-prediction upon the current frame further comprises performing vector refinement upon the current frame based on the first reference frame and the second reference frame of the reference frame buffer.
I. The method as paragraph H recites, further comprising generating a reconstructed frame from the current frame based on the first reference frame and the second reference frame.
J. A system comprising: one or more processors and memory communicatively coupled to the one or more processors, the memory storing computer-executable modules executable by the one or more processors that, when executed by the one or more processors, perform associated operations, the computer-executable modules including: a frame obtaining module configured to obtain a current frame of a bitstream; and a reference picture obtaining module configured to obtain one or more reference pictures from a reference frame buffer and compare resolutions of the one or more reference pictures to a resolution of a current frame.
K. The system as paragraph J recites, further comprising: a frame selecting module configured to select a frame from the reference frame buffer having a same resolution as the resolution of the current frame, upon the reference picture obtaining module determining that one or more resolutions of the one or more reference pictures are different from the resolution of the current frame.
L. The system as paragraph K recites, further comprising: a candidate list deriving module configured to derive an affine merge candidate list or an AMVP candidate list for a block of the current frame.
M. The system as paragraph L recites, further comprising a motion predicting module configured to select a CPMVP or AMVP candidate from the derived affine merge candidate list or AMVP candidate list, respectively.
N. The system as paragraph M recites, wherein the motion predicting module is further configured to derive a motion vector of the CPMVP or AMVP candidate as a motion vector of the block of the reconstructed frame.
O. The system as paragraph J recites, further comprising: a reconstructed frame generating module configured to generate a reconstructed frame from the current frame based on the one or more reference pictures and the selected motion candidate; a ratio determining module configured to determine a ratio of the resolution of the current frame to the resolutions of the one or more reference pictures; a translating module configured to translate inter predictors of the one or more reference pictures in accordance with the ratio; and a buffer inputting module configured to input the reconstructed frame into at least one of the reference frame buffer and a display buffer.
P. A system comprising: one or more processors and memory communicatively coupled to the one or more processors, the memory storing computer-executable modules executable by the one or more processors that, when executed by the one or more processors, perform associated operations, the computer-executable modules including: a frame obtaining module configured to obtain a current frame of a bitstream; and a reference picture obtaining module configured to obtain one or more reference pictures from a reference frame buffer and compare resolutions of the one or more reference pictures to a resolution of a current frame.
Q. The system as paragraph P recites, further comprising: a bi-predicting module configured to perform bi-prediction upon the current frame based on a first reference frame and a second reference frame of the reference frame buffer.
R. The system as paragraph Q recites, further comprising: a vector refining module configured to perform vector refinement during the bi-prediction process based on the first reference frame and the second reference frame of the reference frame buffer.
S. The system as paragraph R recites, further comprising: a reconstructed frame generating module configured to generate a reconstructed frame from the current frame based on the first reference frame and the second reference frame; and a buffer inputting module configured to input the reconstructed frame into at least one of the reference frame buffer and a display buffer.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the claims.
This application is a continuation of U.S. Patent Application No. 17/048,446, filed Oct. 16, 2020, entitled “INTER CODING FOR ADAPTIVE RESOLUTION VIDEO CODING,” which is a national stage submission under 35 U.S.C. § 371 of PCT Patent Application No. PCT/CN2019/077665, filed Mar. 11, 2019, and is a continuation of U.S. Patent Application No. 17/607,348, filed Oct. 28, 2021, entitled “RESOLUTION-ADAPTIVE VIDEO CODING,” which is a national stage submission under 35 U.S.C. § 371 of PCT Patent Application No. PCT/CN2019/095293, filed Jul. 9, 2019, each of which is hereby incorporated by reference in its entirety.
| Relation | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | 17607348 | Oct. 2021 | US |
| Child | 18122692 | | US |
| Parent | 17048446 | Oct. 2020 | US |
| Child | 18122692 | | US |