This disclosure relates to, but is not limited to, video technologies, and in particular to a method for video encoding, a method for video decoding, and an apparatus for video decoding.
At present, a block-based hybrid coding framework is adopted by all common video coding standards. Each picture, sub-picture, or frame in a video is divided into square largest coding units (LCUs) or coding tree units (CTUs) of a same size (such as 128×128, 64×64, etc.). Each LCU or CTU can be divided into rectangular coding units (CUs) according to rules. The CU may further be divided into prediction units (PUs), transform units (TUs), etc. The hybrid coding framework includes modules such as prediction, transform, quantization, entropy coding, in-loop filter, etc. The prediction module includes intra prediction and inter prediction. The inter prediction includes motion estimation and motion compensation. Since there is a strong correlation between adjacent samples in a video picture, intra prediction is used in the video coding technology to eliminate spatial redundancy between adjacent samples. Since there is a strong similarity between adjacent pictures in the video, inter prediction is used in the video coding technology to eliminate temporal redundancy between adjacent pictures, thereby improving coding efficiency.
However, the coding efficiency of existing inter prediction methods still needs to be improved.
In a first aspect, a method for video encoding is provided. The method includes the following. Spatial motion information and temporal motion information of a current block are determined by performing spatial motion information prediction and temporal motion information prediction on the current block. The spatial motion information and the temporal motion information are added to a candidate motion information list of the current block in a set order. Whether first temporal motion information is available is determined in the temporal motion information prediction by: determining a first position in a current picture, wherein the first position is used for the temporal motion information prediction for the current block; determining, according to the first position, a collocated block in a collocated picture, and obtaining motion information of the collocated block; and determining, according to the motion information of the collocated block, whether the first temporal motion information of the current block obtained based on the first position is available. The first temporal motion information is not added to the candidate motion information list when the first temporal motion information is unavailable. One or more candidate motion information is selected from the candidate motion information list, and an index of the selected candidate motion information is recorded. A prediction block for the current block is determined according to the selected candidate motion information, the current block is encoded based on the prediction block, and the index of the candidate motion information is encoded.
In a second aspect, a method for video decoding is provided. The method includes the following. Spatial motion information and temporal motion information of a current block are determined by performing spatial motion information prediction and temporal motion information prediction on the current block. The spatial motion information and the temporal motion information are added to a candidate motion information list of the current block in a set order. Whether first temporal motion information is available is determined in the temporal motion information prediction by: determining a first position in a current picture, wherein the first position is used for the temporal motion information prediction for the current block; determining, according to the first position, a collocated block in a collocated picture, and obtaining motion information of the collocated block; and determining, according to the motion information of the collocated block, whether the first temporal motion information of the current block obtained based on the first position is available. The first temporal motion information is not added to the candidate motion information list when the first temporal motion information is unavailable. One or more candidate motion information is selected from the candidate motion information list according to an index of the candidate motion information of the current block obtained by decoding. A prediction block for the current block is determined according to the selected candidate motion information, and the current block is reconstructed according to the prediction block.
In a third aspect, an apparatus for video decoding is provided. The apparatus for video decoding includes a memory configured to store a computer program and a processor configured to execute the computer program to: determine spatial motion information and temporal motion information of a current block by performing spatial motion information prediction and temporal motion information prediction on the current block, add the spatial motion information and the temporal motion information to a candidate motion information list of the current block in a set order, where whether first temporal motion information is available is determined in the temporal motion information prediction by: determining a first position in a current picture, wherein the first position is used for the temporal motion information prediction for the current block, determining, according to the first position, a collocated block in a collocated picture, and obtaining motion information of the collocated block, and determining, according to the motion information of the collocated block, whether the first temporal motion information of the current block obtained based on the first position is available, and the first temporal motion information is not added to the candidate motion information list when the first temporal motion information is unavailable, select one or more candidate motion information from the candidate motion information list according to an index of the candidate motion information of the current block obtained by decoding, determine a prediction block for the current block according to the selected candidate motion information, and reconstruct the current block according to the prediction block.
The accompanying drawings are used for facilitating the understanding of embodiments of the present disclosure, constitute a part of the specification, and are used for explaining technical solutions of the present disclosure together with embodiments of the present disclosure, but do not constitute a limitation on the technical solutions of the present disclosure.
Multiple embodiments are described in this disclosure, but such elaboration is intended for illustration rather than limitation. It will be apparent to those of ordinary skill in the art that there can be other embodiments and implementations within the protection scope of the embodiments described in the disclosure.
In the elaborations of the disclosure, terms such as “exemplary” or “for example” are used to indicate an example, an illustration, or an explanation. Any embodiment described as “exemplary” or “for example” in the disclosure shall not be construed as preferable or advantageous over other embodiments. The term “and/or” in the disclosure describes an association between associated objects and indicates three possible relationships. For example, A and/or B may mean: A alone; both A and B; or B alone. “A plurality of” or “multiple” means two or more than two. In addition, in order to clearly describe the technical solutions of the embodiments of the disclosure, terms such as “first” and “second” are used to distinguish between identical or similar items having substantially the same functions and effects. Those skilled in the art can understand that terms such as “first” and “second” do not limit the quantity or the execution order, and do not mean that there must be a difference between the items.
When describing exemplary embodiments, the methods and/or procedures may be presented as a particular sequence of steps in the specification. However, if the method or procedure does not depend on the particular order of steps described herein, the method or procedure shall not be limited to the particular order of steps described. It can be understood by those of ordinary skill in the art that other sequences of steps are also possible. Thus, the particular order of steps described in the specification shall not be construed as a limitation on the claims. In addition, the claims regarding the methods and/or procedures shall not be limited to implementing the steps in the described order. It can be understood by those of ordinary skill in the art that these orders can be changed without departing from the spirit and scope of embodiments of the disclosure.
In the present disclosure, a video picture is abbreviated as a picture, and the term “picture” covers both a video picture and a partial region in a video picture. The partial region in the video picture may be, for example, a sub-picture, a slice, a slice segment, etc., divided from the video picture.
In the present disclosure, motion information prediction indicates a prediction operation, and motion information derived through the motion information prediction includes reference picture information and motion vector (MV) information. In related video standards, “motion vector predictor” indicates the same prediction operation, so the “motion vector predictor” may also be understood as the motion information prediction. That is, motion information prediction in the disclosure can also be referred to as “motion vector predictor” or “motion information predictor”.
In the present disclosure, “motion information” is used to refer to the result obtained through motion information prediction (motion vector predictor, or motion information predictor). For example, temporal motion information is obtained through temporal motion information prediction, and spatial motion information is obtained through spatial motion information prediction.
“Motion information” in the present disclosure is referred to as “motion vector prediction” in some standards. “Temporal motion information” in the present disclosure is sometimes referred to as “temporal motion vector prediction” in some standards. “Spatial motion information” in the present disclosure is sometimes referred to as “spatial motion vector prediction” in some standards.
In the present disclosure, a non-adjacent position of a current block refers to a position whose coordinates are not adjacent to any samples in the current block. An adjacent position of the current block refers to a position whose coordinates are adjacent to at least one sample in the current block.
In the present disclosure, the current block may be a current coding unit (CU), or a current prediction unit (PU), etc. A current picture refers to a picture where the current block is located, and a current picture sequence refers to a picture sequence where the current block is located.
The decoding end apparatus 2 can receive the bitstream from the encoding end apparatus 1 via a link 3. The link 3 includes one or more media or apparatuses capable of transmitting the bitstream from the encoding end apparatus 1 to the decoding end apparatus 2. In an example, the link 3 includes one or more communication media that enable the encoding end apparatus 1 to transmit the bitstream directly to the decoding end apparatus 2. The encoding end apparatus 1 may modulate the bitstream according to communication standards (such as a wireless communication protocol) and may transmit the modulated bitstream to the decoding end apparatus 2. The one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form a part of a packet-based network, such as a local area network (LAN), a wide area network (WAN), or a global network (such as the Internet). The one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the encoding end apparatus 1 to the decoding end apparatus 2. In another example, the bitstream may also be output from an output interface 15 to a storage apparatus, and the decoding end apparatus 2 may read stored data from the storage apparatus via streaming or downloading.
In the example illustrated in
In the example illustrated in
Based on the system for video coding as illustrated in
As illustrated in
The partitioning unit 1101 is configured to partition, in cooperation with the prediction processing unit 1100, received video data into slices, CTUs, or other larger units. The video data received by the partitioning unit 1101 may be a video sequence of video frames such as an I-frame, a P-frame, or a B-frame.
The prediction processing unit 1100 may divide a CTU into CUs, and perform intra prediction coding or inter prediction coding on the CU. When intra prediction or inter prediction is performed on the CU, the CU may be partitioned into one or more PUs.
The inter prediction processing unit 1121 may perform inter prediction on a PU to generate prediction data for the PU, where the prediction data includes a prediction block for the PU, motion information of the PU, and various syntax elements.
The intra-prediction processing unit 1126 may perform intra prediction on a PU to generate prediction data for the PU, where the prediction data for the PU may include a prediction block for the PU and various syntax elements.
The residual generation unit 1102 may generate a residual block for a CU by subtracting a prediction block for a PU obtained by partitioning the CU from an original block for the CU.
The transform processing unit 1104 may partition a CU into one or more transform units (TUs). A TU-associated residual block is a sub-block obtained by partitioning a residual block for a CU. A TU-associated coefficient block is generated by performing one or more transforms on the TU-associated residual block.
The quantization unit 1106 may quantize coefficients in a coefficient block based on a selected quantization parameter (QP), and may adjust a degree of quantization for the coefficient block by adjusting a QP value.
The inverse quantization unit 1108 may perform inverse quantization on the coefficient block, and the inverse transform unit 1110 may perform inverse transform on the coefficient block, so as to obtain a TU-associated reconstructed residual block.
The reconstruction unit 1112 may add the reconstructed residual block to the prediction block generated by the prediction processing unit 1100, so as to generate a reconstructed block for a CU.
The filter unit 1113 performs in-loop filtering on the reconstructed block, and buffers the reconstructed block subject to in-loop filtering in the decoded picture buffer 1114 as a reference picture. The intra-prediction processing unit 1126 may extract a reference picture for blocks adjacent to a PU from the decoded picture buffer 1114 to perform intra prediction. The inter prediction processing unit 1121 may perform inter prediction on a PU in a current picture by using a reference picture for a previous picture buffered in the decoded picture buffer 1114.
The picture resolution adjustment unit 1115 resamples reference pictures buffered in the decoded picture buffer 1114, which may include upsampling and/or downsampling, so as to obtain reference pictures with various resolutions and buffer the reference pictures in the decoded picture buffer 1114.
The entropy encoding unit 1116 may perform entropy coding on received data, such as a syntax element, a quantized coefficient block, motion information, and the like.
As illustrated in
The entropy decoding unit 150 may perform entropy decoding on a received bitstream, and extract information such as a syntax element, a quantized coefficient block, and motion information of a PU. The prediction processing unit 152, the inverse quantization unit 154, the inverse transform processing unit 155, the reconstruction unit 158, and the filter unit 159 may perform their respective operations based on the syntax element extracted from the bitstream.
As a functional component for performing reconstruction, the inverse quantization unit 154 may perform inverse quantization on a quantized TU-associated coefficient block, and the inverse transform processing unit 155 may perform one or more inverse transforms on the inverse quantized coefficient block so as to generate a reconstructed residual block for a TU.
The prediction processing unit 152 includes an inter prediction processing unit 162 and an intra prediction processing unit 164. If a PU is encoded by intra coding, the intra prediction processing unit 164 may determine an intra prediction mode for the PU based on the syntax element parsed from the bitstream, perform intra prediction according to the determined intra prediction mode and reconstructed reference information adjacent to the PU obtained from the picture buffer 160, and generate a prediction block for the PU. If the PU is encoded by inter coding, the inter prediction processing unit 162 may determine one or more reference blocks for the PU according to motion information of the PU and a corresponding syntax element, and generate a prediction block for the PU based on the reference blocks.
The reconstruction unit 158 may obtain a reconstructed block for a CU based on a TU-associated reconstructed residual block and the prediction block (namely, intra prediction data or inter prediction data) for the PU generated by the prediction processing unit 152.
The filter unit 159 may perform in-loop filtering on the reconstructed block for the CU, so as to obtain a reconstructed picture, where the reconstructed picture is stored in the picture buffer 160. The picture buffer 160 may provide reference pictures for subsequent motion compensation, intra prediction, inter prediction, etc., or may output reconstructed video data as decoded video data for presentation on a display apparatus.
The display apparatus 25 described above may be, for example, an LCD, a plasma display, an OLED display, or other types of display apparatuses. In other examples, the decoding end may not include the display apparatus 25, but include other apparatuses to which the decoded data is applicable.
A basic process of video coding is as follows. At an encoding end, a picture (or frame) is partitioned into blocks. A prediction block for a current block is generated by performing intra prediction or inter prediction on the current block. A residual block is obtained by subtracting the prediction block from an original block of the current block. A quantization coefficient matrix is obtained by performing transform and quantization on the residual block. The quantization coefficient matrix is encoded into a bitstream by entropy encoding. At a decoding end, the prediction block for the current block is generated by performing intra prediction or inter prediction on the current block. On the other hand, the quantization coefficient matrix is obtained by parsing the bitstream, the residual block is obtained by performing inverse quantization and inverse transform on the quantization coefficient matrix, and the reconstructed block is obtained by adding the prediction block and the residual block. The reconstructed blocks form a reconstructed picture. A decoded picture is obtained by performing loop filtering on the reconstructed picture on a picture basis or on a block basis. Similar operations for obtaining the decoded picture are also required at the encoding end. The decoded picture may be a reference picture of a subsequent picture for inter prediction. Block partition information determined at the encoding end, as well as mode information or parameter information for prediction, transform, quantization, entropy coding, loop filtering, etc., is carried in the bitstream when necessary. By parsing the bitstream and analyzing the existing information, the decoding end determines block partition information, as well as mode information or parameter information for prediction, transform, quantization, entropy coding, loop filtering, etc., that is the same as such information at the encoding end, so as to ensure that the decoded picture obtained by the encoding end is the same as the decoded picture obtained by the decoding end. The decoded picture obtained by the encoding end is also referred to as a reconstructed picture. The current block can be partitioned into PUs during prediction. The above is the basic process of the video codec under the block-based hybrid coding framework. With the development of technology, some modules or steps of the framework or process may be optimized.
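The encode/decode symmetry described above can be summarized in a minimal sketch (an illustration only, assuming an identity transform and a single scalar quantization step; real codecs use DCT-like integer transforms and normative quantizers):

```python
import numpy as np

def encode_block(orig_block: np.ndarray, pred_block: np.ndarray, qstep: int) -> np.ndarray:
    """Residual -> (toy) transform -> quantization; the coefficients
    would then be entropy-encoded into the bitstream."""
    residual = orig_block.astype(np.int32) - pred_block.astype(np.int32)
    return np.round(residual / qstep).astype(np.int32)  # quantized coefficients

def reconstruct_block(coeffs: np.ndarray, pred_block: np.ndarray, qstep: int) -> np.ndarray:
    """Identical reconstruction runs at both ends, so the encoder and
    the decoder hold the same reference pictures."""
    residual = coeffs * qstep                       # inverse quantization
    recon = pred_block.astype(np.int32) + residual  # prediction + residual
    return np.clip(recon, 0, 255).astype(np.uint8)
```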
A video consists of pictures. In order to make a video smooth, each second of the video includes dozens or even hundreds of pictures, such as 24 pictures per second, 30 pictures per second, 50 pictures per second, 60 pictures per second, 120 pictures per second, etc. Therefore, there is very obvious temporal redundancy in the video. In other words, there are a lot of temporal correlations. “Motion” is often used in inter prediction so as to utilize the temporal correlations. A very simple “motion” model is as follows. An object is at a position on a picture corresponding to a moment, and after a certain period of time, the object translates to another position on the picture corresponding to a current moment; this is the most basic and most commonly used translation motion in video coding. The “motion” is indicated by motion information in inter prediction. Basic motion information includes reference picture (such as reference frame) information and MV information. The codec determines a reference picture according to the reference picture information and determines coordinates of a reference block according to the MV information and coordinates of a current block. The reference block is determined according to the coordinates of the reference block in the reference picture. Motion in a video is not always simple. Even translation motion will have slight changes over time, such as slight deformation, brightness changes, noise changes, etc. Prediction may be performed on the current block by using more than one reference block, so as to achieve better prediction results. For example, in commonly used bidirectional prediction, prediction may be performed on the current block by using two reference blocks. The two reference blocks may be a forward reference block and a backward reference block, or may be two forward reference blocks or two backward reference blocks. The “forward” indicates that a time corresponding to the reference picture is before a current picture, and the “backward” indicates that the time corresponding to the reference picture is after the current picture. Alternatively, the “forward” indicates that a position of the reference picture in a video is before the current picture, and the “backward” indicates that the position of the reference picture in the video is after the current picture. Alternatively, the “forward” indicates that a picture order count (POC) of the reference picture is less than a POC of the current picture, and the “backward” indicates that the POC of the reference picture is greater than the POC of the current picture. Prediction with more reference blocks may be supported in future video coding standards. A simple way to generate a prediction block by using two reference blocks is to average sample values of corresponding positions of the two reference blocks to obtain the prediction block. In order to achieve a better prediction effect, weighted averaging may also be used, such as the bi-prediction with CU-level weight (BCW) currently used in VVC. A geometric partitioning mode (GPM) in VVC may also be understood as a special bidirectional prediction. In order to use bidirectional prediction, two reference blocks are needed, and thus two sets of reference picture information and MV information are needed. Each set of reference picture information and MV information may be understood as one unidirectional motion information.
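As a small illustration of the POC-based direction test and the simple averaging described above (a sketch with illustrative names, not code from any standard):

```python
def is_forward(ref_poc: int, cur_poc: int) -> bool:
    """Per the POC-based definition above: a reference picture is
    'forward' when its POC is less than the current picture's POC."""
    return ref_poc < cur_poc

def bi_predict(ref_block0: list, ref_block1: list) -> list:
    """Simplest bi-prediction: per-sample average of the two reference
    blocks, with rounding. BCW-style weighting would replace the fixed
    1/2 weights with signaled weights."""
    return [(a + b + 1) >> 1 for a, b in zip(ref_block0, ref_block1)]
```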
In addition to basic reference picture information and motion vector information, the motion information may also include some additional information, for example, whether bi-prediction with CU-level weight (BCW) is used, an index of BCW, or the like.
The video is processed picture-by-picture without considering parallel processing. A coded picture can be stored in a buffer as a reference picture for a picture to be subsequently coded. Current coding standards have a set of reference picture management methods to manage reference pictures. The reference picture management method manages which pictures can be used as reference pictures for a current picture, along with indexes of these reference pictures, which pictures need to be stored in a buffer, and which pictures can be removed from the buffer since they are no longer used as reference pictures, etc.
According to different orders of picture encoding/decoding, current commonly used scenarios can be classified into two categories: random access (RA) and low delay (LD). In an LD scenario, the display order for pictures is the same as the encoding/decoding order for pictures, while in an RA scenario, the display order for pictures and the encoding/decoding order for pictures can be different. Generally speaking, in the LD scenario, encoding/decoding is performed picture-by-picture according to an original display order of a video. However, in an RA scenario, the display order of a video can be shuffled for picture encoding/decoding. To be specific, some pictures can be skipped from coding initially, while subsequent pictures are coded first, and then the skipped pictures are coded. One advantage of RA is that some pictures can refer to reference pictures before and after them, enabling better utilization of “motion” for improved compression efficiency.
The structure of a classic group of pictures (GOP) for RA is illustrated in
The arrows in
The codec uses a reference picture list to manage reference pictures. VVC supports two reference picture lists, denoted as RPL0 and RPL1, where RPL is short for reference picture list. In VVC, a P slice can only use RPL0, while a B slice can use both RPL0 and RPL1. For one slice, each reference picture list involves several reference pictures, and the codec finds a specific reference picture through a reference picture index. VVC uses a reference picture index and a motion vector to represent motion information. For example, VVC uses a reference picture index refIdxL0 corresponding to reference picture list 0 and a motion vector mvL0 corresponding to reference picture list 0, as well as a reference picture index refIdxL1 corresponding to reference picture list 1 and a motion vector mvL1 corresponding to reference picture list 1, to represent the bidirectional motion information described above. The reference picture index corresponding to reference picture list 0 and the reference picture index corresponding to reference picture list 1 can be understood as the reference picture information described above. VVC uses two flags, predFlagL0 and predFlagL1, to indicate whether the motion information corresponding to reference picture list 0 is used and whether the motion information corresponding to reference picture list 1 is used, respectively. It can also be understood that predFlagL0 and predFlagL1 indicate whether the unidirectional motion information described above is valid.
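This VVC-style representation maps naturally onto a small data structure (a sketch; the field names mirror the text rather than any actual codec API):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class MotionInfo:
    """Bidirectional motion information in the VVC-style form above."""
    pred_flag_l0: bool = False       # predFlagL0: is the L0 part valid?
    ref_idx_l0: int = -1             # refIdxL0: index into RPL0
    mv_l0: Tuple[int, int] = (0, 0)  # mvL0: (horizontal, vertical)
    pred_flag_l1: bool = False       # predFlagL1: is the L1 part valid?
    ref_idx_l1: int = -1             # refIdxL1: index into RPL1
    mv_l1: Tuple[int, int] = (0, 0)  # mvL1

    def is_bidirectional(self) -> bool:
        return self.pred_flag_l0 and self.pred_flag_l1
```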
The precision of motion vectors is not limited to integer pixels. VVC supports prediction with precisions of 1/2 pixel, 1/4 pixel, 1/8 pixel, and 1/16 pixel. Fractional-pixel prediction requires interpolation from integer-pixel samples. This allows for finer motion vectors to improve prediction quality.
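For illustration, a half-pel sample can be produced by simple bilinear interpolation as below; this is only a sketch, since real codecs such as VVC use longer normative filters (e.g., 8-tap) rather than 2-tap averaging:

```python
def half_pel_horizontal(row: list, x: int) -> int:
    """Horizontal half-pel sample between integer positions x and x+1,
    by 2-tap bilinear averaging with rounding."""
    return (row[x] + row[x + 1] + 1) >> 1
```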
A reference block for a current block can be found from a reference picture by using motion information, and a prediction block for the current block is determined according to the reference block.
Motion information used for a current block can usually be predicted by using some related information, which can be referred to as motion information prediction or motion vector prediction. For example, motion information used for coded blocks surrounding and adjacent to a current block in a current picture can be used due to the strong correlation between adjacent blocks. Also, motion information used for coded blocks surrounding but not adjacent to the current block in the current picture can be used, because there is still some degree of correlation between blocks in a surrounding region and the current block, even if these blocks are not adjacent to the current block. This method for motion information prediction by using motion information used for coded blocks surrounding a current block is generally called spatial motion information prediction. In addition to coded blocks surrounding a current block, motion information of blocks related to a position of the current block in a coded picture can also be used for motion information prediction for the current block, which is generally called temporal motion information prediction. Simply put, spatial motion information is derived using motion information from the same picture as the current block, while temporal motion information is derived using motion information in a different picture from the current block.
To utilize spatial motion information prediction, motion information of coded blocks in the current picture (or slice) needs to be saved. Generally, a minimum storage unit, for example, a 4×4 minimum storage unit, is set. However, the minimum storage unit can also be 8×8 or of other sizes. Each time the codec codes a block, the codec stores motion information for all minimum storage units corresponding to the block. To find motion information of a block surrounding the current block, a minimum storage unit can be found according to the coordinates, and the motion information can be obtained. Similarly, for temporal motion information prediction, the motion information of a coded picture (or slice) needs to be saved. A minimum storage unit is also generally set, which can be of a size the same as or different from the size of the storage unit for spatial motion information, depending on the relevant standard rules. To find motion information of a block in the picture (or slice), a minimum storage unit can be found according to the coordinates, and the motion information can be obtained. It should be noted that due to the limitation in storage space or implementation complexity, only temporal motion information or spatial motion information in a certain coordinate range may be obtained for the current block.
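The coordinate-to-storage-unit lookup described above amounts to an index shift (a sketch assuming a 4×4 grid; the grid size is whatever the relevant standard specifies):

```python
def fetch_motion_info(motion_grid, x: int, y: int, unit_shift: int = 2):
    """Return the stored motion information covering sample (x, y).
    unit_shift=2 models a 4x4 minimum storage unit (use 3 for 8x8);
    motion_grid[row][col] holds one motion information entry per unit."""
    return motion_grid[y >> unit_shift][x >> unit_shift]
```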
One or more motion information can be obtained through motion information prediction. If more than one motion information is obtained, one or more motion information should be selected according to some established rules. For example, GPM in VVC uses two motion information, while sub-block based TMVP (SbTMVP) uses one motion information for each sub-block, where TMVP refers to temporal motion vector predictor.
To utilize predicted motion information, the predicted motion information can be directly adopted as motion information of the current block, as exemplified by the merge mode in HEVC. Optionally, a motion vector difference (MVD) can be combined with the predicted motion information to obtain new motion information. From the perspective of design, it is desirable for predicted motion information to closely approximate the actual motion information, though motion information prediction cannot always be guaranteed to be accurate. Therefore, an MVD can be used to obtain more accurate motion information. A new representation method of MVD is introduced in VVC. In the merge mode, this new MVD representation method can be combined with motion vector prediction, referred to as merge with MVD (MMVD). In short, motion information prediction can be used directly or in conjunction with other methods.
A merge motion information candidate list is denoted as mergeCandList. When constructing mergeCandList, spatial motion information prediction based on positions 1 to 5 in
As illustrated in
A specific example of temporal motion information derivation is illustrated as follows. Taking temporal motion information derivation based on position 6 in
The coordinates (xColBr, yColBr) of position 6 are calculated as (xCb+cbWidth, yCb+cbHeight). If the coordinates (xColBr, yColBr) meet the requirements, such as not exceeding the range of the current picture or sub-picture or not exceeding the range of the CTU row where the current block is located, the coordinates (xColCb, yColCb) of the collocated block are calculated as ((xColBr>>3)<<3, (yColBr>>3)<<3). The current block is denoted as currCb, the collocated block in the collocated picture ColPic is denoted as colCb, and colCb is a block covering (xColCb, yColCb). currPic represents the current picture. The reason for right-shifting (>>) by 3 bits and then left-shifting (<<) by 3 bits in calculating the coordinates is that the motion information in the collocated picture in this example is stored based on an 8×8 minimum storage unit (to save cache space, the granularity of caching reference picture motion information can be coarser). Right-shifting by 3 bits and then left-shifting by 3 bits changes the last 3 bits of the coordinates to 0. For example, 1010 is the binary representation of 10. After right-shifting by 3 bits and then left-shifting by 3 bits, it becomes 1000, which is the binary representation of 8.
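The shift-based alignment can be verified in a couple of lines (a sketch; the shift of 3 corresponds to the 8×8 storage granularity in this example):

```python
def align_to_storage_grid(coord: int, shift: int = 3) -> int:
    """(coord >> shift) << shift zeroes the low bits, snapping the
    coordinate down to the nearest multiple of 2**shift (8 here)."""
    return (coord >> shift) << shift

assert align_to_storage_grid(10) == 8   # 0b1010 -> 0b1000, as in the text
assert align_to_storage_grid(7) == 0
assert align_to_storage_grid(16) == 16
```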
For the convenience of description, some simplified assumptions are made here. In this example, it is assumed that motion information of a collocated block and temporal motion information to be derived only use one reference picture list L0. In this example, X that appears below is always set to 0. However, in some scenarios, the motion information of the collocated block and the temporal motion information to be derived are allowed to use two reference picture lists. Moreover, forward or backward motion information of temporal motion information derived according to forward or backward motion information of the collocated block can be combined in several manners. Here, only the simplest example is used for illustration. For simplicity, refIdxLX is also set to 0, although refIdxLX can have multiple possible values, which is not limited in the disclosure.
Note that clip3 is a clipping function, and the values such as -131072, 131071, -4096, and 4095 in clip3 are related to data precision. These values can be different under different precision specifications. The above operations can be referred to in relevant standards.
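clip3 itself is the standard three-operand clamp; a direct transcription follows (the specific bounds, as noted, depend on the data precision in force):

```python
def clip3(lo: int, hi: int, v: int) -> int:
    """Clamp v into the inclusive range [lo, hi]."""
    return lo if v < lo else hi if v > hi else v

# Example with the motion vector bounds mentioned in the text:
mv_component = clip3(-131072, 131071, 150000)  # -> 131071
```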
Motion information of surrounding blocks of a current block can be utilized to predict motion information of the current block due to the correlation between the current block and the surrounding blocks. Temporal motion information prediction involves finding one block, i.e., a collocated block, in a collocated picture according to one position, and deriving motion information for a current block by using motion information of the collocated block. However, in an implementation process, it is not necessary to find a collocated block first to determine temporal motion information. Statistically, there is also a correlation between a current block and a collocated block: even though the two blocks belong to different pictures, the time interval between the two pictures is not too long. Therefore, the current block and the collocated block may still belong to a same object if motion is not too intense. However, in some cases, the current block and the collocated block may not belong to the same object, or may belong to different moving parts of the same object, if, for example, the motion is relatively intense, the time interval between two pictures is relatively long, a difference between POCs of two pictures is relatively large, or the video has a low frame rate. As a result, temporal motion information determined according to one position is not applicable to the current block in some cases.
When using temporal motion information prediction in a list, such as mergeCandList, it is generally believed that a motion information candidate with a smaller index in the list has a higher probability of being selected, while a motion information candidate with a larger index has a lower probability of being selected. Therefore, variable-length coding is used for the list, with shorter codewords for smaller indexes and longer codewords for larger indexes. As a result, if an inapplicable motion information candidate is added to the list, not only is the inapplicable motion information candidate itself wasted, but the codeword lengths of the candidates following it are also affected. The codeword length affects the cost of indicating that index in the bitstream. If it can be determined in advance whether a temporal motion information candidate is applicable to the current block, compression efficiency can be improved.
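The cost argument can be made concrete with a truncated-unary binarization, taken here only as a representative variable-length scheme (actual standards may context-code or bypass these bins):

```python
def tu_codeword_len(index: int, list_size: int) -> int:
    """Truncated-unary length: index i costs i+1 bins, except the last
    index, which omits the terminating bin."""
    return index + 1 if index < list_size - 1 else list_size - 1

# With a 6-entry list, codeword lengths are [1, 2, 3, 4, 5, 5] bins.
# An inapplicable candidate parked at index 1 pushes every following
# candidate to a codeword one bin longer.
print([tu_codeword_len(i, 6) for i in range(6)])
```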
Temporal motion information can be checked according to a motion trajectory. That is, assuming that motion represented by motion information used for a collocated block does not change significantly (the assumption is reasonable due to inertia and the small time delay between the current picture and the reference picture), whether an object on the collocated block can move to a position in the current block or a certain range around the current block can accurately reflect whether the temporal motion information is available.
The following is explained with examples in conjunction with accompanying drawings.
In
In
In
The example illustrated in
A method for temporal motion information prediction is provided in an embodiment of the present disclosure. As illustrated in
At S110, a first position in a current picture is determined, where the first position is used for temporal motion information prediction for a current block.
At S120, a collocated block in a collocated picture is determined according to the first position, and motion information of the collocated block is obtained.
At S130, whether first temporal motion information of the current block obtained based on the first position is available is determined according to the motion information of the collocated block.
Bidirectional temporal motion information prediction can be performed based on the first position, and two temporal motion information can be obtained. The reference pictures of the two temporal motion information can be different, and the final results of availability determination can also be different. The availability of the two temporal motion information can be determined using the above process. Moreover, at S130, it is not necessary to actually calculate the first temporal motion information when determining whether the first temporal motion information of the current block obtained based on the first position is available.
In an exemplary embodiment of the disclosure, the first position is an adjacent position of the current block, or the first position is a non-adjacent position of the current block.
In an exemplary embodiment of the disclosure, the first position is at the bottom right of the current block, or at the right of the current block, or at the bottom of the current block.
As illustrated in
In an exemplary embodiment of the disclosure, the collocated picture is a reference picture of the first temporal motion information of the current block, and the collocated block is a minimum storage unit or coding unit in the collocated picture.
In an exemplary embodiment of the disclosure, in a picture sequence, a reference picture of the motion information of the collocated block is before the collocated picture, and the collocated picture is before the current picture. This order is consistent with positions of the respective pictures in
In an exemplary embodiment of the disclosure, whether the first temporal motion information of the current block obtained based on the first position is available being determined according to the motion information of the collocated block includes the following. A second motion vector is obtained by transforming a first motion vector in the motion information. A first region where the collocated block is located after being moved according to the second motion vector is determined. A second region is determined according to the current block, and the first temporal motion information being unavailable is determined when the second region does not intersect with the first region or a sub-region of the first region.
The sub-region here, for example, may be a partial region in the first region, such as a region in which one or more pixels are located.
In the embodiment, obtaining a second motion vector by transforming a first motion vector in the motion information includes the following. The second motion vector is obtained by scaling and reversing the first motion vector in the motion information according to a difference between a POC of the collocated picture and a POC of a reference picture of the motion information and a difference between the POC of the collocated picture and a POC of the current picture. Here, “reversing” a vector refers to changing the direction of the vector to the opposite of its original direction. In a two-dimensional plane, it is equivalent to a 180-degree rotation.
In the embodiment, the method for scaling the first motion vector in the motion information is the same as the method for scaling the first motion vector to obtain the motion vector in the first temporal motion information, except that a reversing operation needs to be performed here. The second motion vector obtained by transforming has the same magnitude as the motion vector in the first temporal motion information, but in the opposite direction. The second motion vector and the motion vector in the first temporal motion information can refer to
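A minimal sketch of this scale-and-reverse transform, assuming plain linear scaling by the POC distances named above and the picture ordering of the random-access example (collocated reference before collocated picture before current picture); a real codec would use fixed-point scaling with clip3 bounds instead of floating point:

```python
from typing import Tuple

def derive_second_mv(first_mv: Tuple[int, int],
                     col_poc: int, col_ref_poc: int,
                     cur_poc: int) -> Tuple[float, float]:
    """Scale the collocated block's MV (collocated picture -> its
    reference picture) to the POC distance toward the current picture,
    then reverse it, as described in the text."""
    td = col_poc - col_ref_poc   # POC distance: collocated -> its reference
    tb = cur_poc - col_poc       # POC distance: collocated -> current
    scale = tb / td              # scaling step
    return (-first_mv[0] * scale, -first_mv[1] * scale)  # reversing step
```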
Referring to
In an exemplary embodiment of the disclosure, the second region includes a region where the current block is located. For example, the second region is a region where the current block is located, or the second region is a region covering the current block and the area of the second region is larger than the area of the current block. Extending the region of the current block can make it possible to determine that the first temporal motion information is available in the case of
In an exemplary embodiment of the disclosure, the first position is at the bottom right of the current block, or at the right of the current block, or at the bottom of the current block, and the second region includes at least a top right region of a picture defined by a line of a right boundary of the current block and a line of a bottom boundary of the current block.
In an example of the disclosure, the horizontal coordinate of the second region is less than or equal to the horizontal coordinate of the bottom right corner of the current block plus m, and the vertical coordinate of the second region is less than or equal to the vertical coordinate of the bottom right corner of the current block plus n, where m and n are set values, m≥0, n≥0. Here, m and n, as set values, can be predefined fixed values, values selected according to parameters, or values calculated according to relevant parameters. The specific values can be determined according to the implementation.
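These coordinate conditions translate directly into a membership test (a sketch; m and n are the set values described above):

```python
def in_second_region(x: int, y: int, x_br: int, y_br: int,
                     m: int = 0, n: int = 0) -> bool:
    """True when (x, y) lies at most m to the right of, and at most n
    below, the bottom-right corner (x_br, y_br) of the current block."""
    return x <= x_br + m and y <= y_br + n
```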
In the embodiment, the range of the second region is determined according to the directional orientation of the first position. Therefore, availability can be determined more accurately for objects moving towards the current block from the right, the bottom right, or the bottom, avoiding some larger objects being determined as unavailable.
In an exemplary embodiment of the disclosure, the first temporal motion information being available is determined when the second region intersects with the first region or the sub-region of the first region.
In embodiments of the disclosure described above, availability determination for temporal motion information is performed, which can more accurately determine the correlation between the reference block determined according to the first temporal motion information and the current block, so as to reasonably choose whether to use the first temporal motion information.
An embodiment of the present disclosure provides a method for temporal motion information prediction. In this embodiment, a collocated block is used for prediction, and motion information is stored in blocks which can be understood as the minimum storage unit of motion information, such as blocks of 4×4, 8×8, and 16×16. All points in the minimum storage unit share the same motion information. However, the present disclosure is not limited to such a size of the collocated block. A certain point can also be used to determine the motion information, in which case the collocated block can be considered as a collocated point. This may be determined according to the application environment of the solution.
A process of temporal motion information prediction includes the following. A position for temporal motion information prediction is determined according to a current block. A collocated block in a collocated picture is determined according to the position. Motion information of the collocated block is determined according to the collocated block. First temporal motion information is determined according to the motion information of the collocated block. However, the process in embodiments of the disclosure includes the following. A position for temporal motion information prediction is determined according to a current block. A collocated block in a collocated picture is determined according to the position. Motion information of the collocated block (i.e., the second motion vector in embodiments described above) is determined according to the collocated block. Whether the first temporal motion information is applicable to the current block is checked according to the motion information of the collocated block. If the first temporal motion information is applicable to the current block, the first temporal motion information is determined according to the motion information of the collocated block; for example, the motion vector in the first temporal motion information is determined to be a vector in the opposite direction to the second motion vector. Otherwise, other processes are performed; for example, the first temporal motion information is set as invalid, the motion information part corresponding to the reference picture list used by the first temporal motion information is set as invalid, or predFlag corresponding to the first temporal motion information is set to 0.
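Putting this modified flow together as a sketch (reusing derive_second_mv from earlier and the point-based test sketched after a later paragraph; the col_mi fields mv and ref_poc are illustrative names, not from the disclosure or any standard):

```python
def predict_temporal_motion(col_mi, col_poc: int, cur_poc: int,
                            x_temp: int, y_temp: int,
                            x_cb: int, y_cb: int,
                            cb_width: int, cb_height: int, m: int = 0):
    """Temporal motion information prediction with the availability
    check inserted before the usual derivation."""
    if col_mi is None:
        return None                              # no usable collocated motion
    second_mv = derive_second_mv(col_mi.mv, col_poc, col_mi.ref_poc, cur_poc)
    if not mapped_point_in_extended_block(x_temp, y_temp, second_mv,
                                          x_cb, y_cb, cb_width, cb_height, m):
        return None   # unavailable: e.g., set the corresponding predFlag to 0
    # Usual derivation: the temporal MV is opposite in direction to second_mv.
    return (-second_mv[0], -second_mv[1])
```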
Temporal motion information can also be bidirectional motion information, and the embodiment may be applied to derive one or two unidirectional motion information of the bidirectional motion information, to obtain one or two first temporal motion information.
In the embodiment, a method for checking whether temporal motion information is applicable to a current block according to motion information of a collocated block includes the following. A motion vector (or motion information) from the collocated picture to the current picture is determined according to the motion information of the collocated block. A region where the collocated block is located in the current picture after being moved according to the determined motion information is called a mapped block. If the mapped block intersects with the current block or an extended range of the current block, the first temporal motion information is applicable to the current block. Reference can be made to
A method for determining a motion vector from a collocated picture to a current picture (i.e., second motion vector) according to motion information of a collocated block includes the following. The motion information of the collocated block can be understood as a motion vector from the collocated picture to a collocated reference picture (i.e., first motion vector). The motion vector from the collocated picture to the current picture can be derived using a method similar to the method for temporal motion information derivation, because the process of temporal motion information derivation can also be understood as deriving a motion vector from the current picture to the reference picture of temporal motion information of the current picture, i.e., currRef, according to the motion vector from the collocated picture to the collocated reference picture.
An example of a data structure suitable for VVC is provided. The motion vector from the collocated picture to the current picture is denoted as mvLXExtend, where X may be 0 or 1. Bidirectional motion information is allowed to be used, as VVC allows the use of two reference picture lists. The example here derives unidirectional motion information.
mvLXExtend, predFlagLXExtend are derived as follows:
Here, clip3 is a clipping function, and the values such as -131072, 131071, -4096, and 4095 in clip3 are related to data precision. These values can be different under different precision specifications.
With mvLXExtend derived, a position of a mapped block can be determined according to a position of a collocated block. Coordinates of the top left corner of the collocated block are denoted as (xColCb, yColCb), and coordinates of the top left corner of the mapped block are denoted as (xMapCb, yMapCb). Then, xMapCb=xColCb+mvLXExtend[0], and yMapCb=yColCb+mvLXExtend[1]. With the position of the mapped block, whether the mapped block intersects with the current block or an extended range of the current block can be determined. mvLXExtend[0] and mvLXExtend[1] represent the horizontal and vertical components of mvLXExtend, respectively.
A range of the current block is a rectangular region with horizontal coordinate from xCb to xCb+cbWidth-1 and vertical coordinate from yCb to yCb+cbHeight-1. A method for determining whether a mapped block intersects with a current block is to determine if at least one of the four vertices of the mapped block is in the range of the current block.
Since motion may change at different times, such as change in speed or direction, etc., an extended range of the current block can be used for determination. This allows for a certain degree of variation in the temporal motion information. For example, by extending the current block up, down, left, and right by m pixels respectively, the extended range of the current block would have horizontal coordinates from xCb-m to xCb+cbWidth+m and vertical coordinates from yCb-m to yCb+cbHeight+m. Of course, all ranges above can be subject to a limiting condition that they do not exceed the range of the current picture.
For simplicity, instead of directly determining whether the mapped block intersects with the current block or an extended range of the current block, whether a point in the mapped block is in the current block or an extended range of the current block is determined. For example, whether a point at the top left of the mapped block or a point at the center of the mapped block is in the current block or an extended range of the current block is determined. Optionally, if the coordinates used to derive the temporal motion information are (xTemp, yTemp), whether (xTemp+mvLXExtend[0], yTemp+mvLXExtend[1]) is within the current block or an extended range of the current block can be determined. In a possible scenario, (xTemp, yTemp) can be (xColBr, yColBr).
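A sketch of this simplified point-based check, with the extended range from the earlier paragraph (coordinate names follow the text; the bounds are those stated there):

```python
from typing import Tuple

def mapped_point_in_extended_block(x_temp: int, y_temp: int,
                                   mv_extend: Tuple[int, int],
                                   x_cb: int, y_cb: int,
                                   cb_width: int, cb_height: int,
                                   m: int = 0) -> bool:
    """Move the derivation point (xTemp, yTemp) by mvLXExtend and test
    whether it lands inside the current block extended by m pixels on
    each side (xCb-m .. xCb+cbWidth+m, yCb-m .. yCb+cbHeight+m)."""
    x = x_temp + mv_extend[0]
    y = y_temp + mv_extend[1]
    return (x_cb - m <= x <= x_cb + cb_width + m and
            y_cb - m <= y <= y_cb + cb_height + m)
```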
A method for candidate motion information list constructing is further provided in an embodiment of the present disclosure. As illustrated in
At S210, spatial motion information and temporal motion information of a current block are determined by performing spatial motion information prediction and temporal motion information prediction on the current block.
At S220, the spatial motion information and the temporal motion information are added to a candidate motion information list of the current block in a set order.
Whether the first temporal motion information is available is determined in the temporal motion information prediction according to the method of any embodiment of the present disclosure, and the first temporal motion information is not added to the candidate motion information list when the first temporal motion information is unavailable.
In embodiments of the disclosure described above, availability determination for temporal motion information is performed. The first temporal motion information may not be added to the candidate motion information list when the first temporal motion information is unavailable. Therefore, more effective motion information can be added to the candidate motion information list, thereby improving the effect of motion information prediction. The index of the motion information selected from the candidate motion information list can thus be smaller, so as to save resources for index encoding.
A method for video encoding is further provided in an embodiment of the present disclosure. As illustrated in
At S310, a candidate motion information list of a current block is constructed according to the method for candidate motion information list constructing in any embodiment of the present disclosure.
At S320, one or more candidate motion information is selected from the candidate motion information list, and an index of the selected candidate motion information is recorded.
At S330, a prediction block for the current block is determined according to the selected candidate motion information, the current block is encoded based on the prediction block, and the index of the candidate motion information is encoded.
A position of a reference block for the current block in a reference picture can be determined according to the selected candidate motion information. Alternatively, the reference block can be obtained by adding an MVD to the selected candidate motion information. The prediction block can be obtained based on one or more reference blocks (such as one or two). Subsequently, a residual between the prediction block and the current block can be calculated and encoded.
In embodiments of the present disclosure, temporal motion information that is obviously not applicable to the current block can be filtered out by adding a determination on whether the temporal motion information is applicable to the current block, thereby improving compression efficiency. Specifically, by determining the availability of the temporal motion information, the first temporal motion information is not added to the candidate motion information list when the first temporal motion information is unavailable. Therefore, more effective motion information can be added to the candidate motion information list, thereby improving the effect of motion information prediction. The index of the motion information selected from the candidate motion information list can thus be smaller, so as to improve compression efficiency.
A method for video decoding is further provided in an embodiment of the present disclosure. As illustrated in
At S410, a candidate motion information list of a current block is constructed according to the method for candidate motion information list constructing in any embodiment of the present disclosure.
At S420, one or more candidate motion information is selected from the candidate motion information list according to an index of candidate motion information of the current block obtained through decoding.
At S430, a prediction block for the current block is determined according to the selected candidate motion information, and the current block is reconstructed according to the prediction block.
An apparatus for temporal motion information prediction is further provided in an embodiment of the present disclosure. As illustrated in
An apparatus for candidate motion information list constructing is further provided in an embodiment of the present disclosure. As illustrated in
An apparatus for video encoding is further provided in an embodiment of the present disclosure. As illustrated in
An apparatus for video decoding is further provided in an embodiment of the present disclosure. As illustrated in
A system for video coding is further provided in an embodiment of the present disclosure. The system for video coding includes the apparatus for video encoding in any embodiment of the present disclosure and the apparatus for video decoding in any embodiment of the present disclosure.
A non-transitory computer-readable storage medium is further provided in an embodiment of the present disclosure. The non-transitory computer-readable storage medium is configured to store a computer program which, when executed by a processor, causes the processor to perform the method in any embodiment of the present disclosure.
A bitstream is further provided in an embodiment of the present disclosure. The bitstream is generated according to the method for video encoding in any embodiment of the present disclosure.
The apparatus for video encoding and/or the apparatus for video decoding in the foregoing embodiments of the disclosure may be implemented by any one of the following circuits or any combination of the following circuits: one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, and hardware. If the disclosure is implemented partially by software, instructions for the software may be stored in a suitable non-transitory computer-readable storage medium, and may be executed by hardware of one or more processors to implement the methods of embodiments of the disclosure.
In one or more exemplary embodiments, the functions described may be implemented by hardware, software, firmware, or any combination thereof. If implemented by software, the functions may be stored in or transmitted over a computer-readable medium as one or more instructions or codes and executed by a hardware-based processing unit. The computer-readable medium may include a computer-readable storage medium corresponding to a tangible medium such as a data storage medium, or a communication medium including any medium for transmitting computer programs from one place to another, for example, based on a communication protocol. In this manner, the computer-readable medium may usually correspond to a non-transitory tangible computer-readable storage medium or a communication medium such as a signal or a carrier. The data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, codes, and/or data structures for implementation of the technology described in the disclosure. A computer program product may include a computer-readable medium.
By way of example rather than limitation, such computer-readable storage medium may include a random access memory (RAM), a read-only memory (ROM), an electrically erasable programmable ROM (EEPROM), a compact disc ROM (CD-ROM) or other optical disc storage devices, magnetic disc storage devices or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program codes in the form of instructions or data structures and can be accessed by a computer. In addition, any connection may be termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote sources by using a coaxial cable, an optical fiber cable, a twisted pair, a digital subscriber line (DSL), or a wireless technology such as infrared, radio, and microwave, the coaxial cable, the optical fiber cable, the twisted pair, the DSL, or the wireless technology such as infrared, radio, and microwave are included in the definition of a medium. However, it may be appreciated that the computer-readable storage medium and the data storage medium do not include a connection, a carrier, a signal, or other transitory media, but are intended for a non-transitory tangible storage medium. The magnetic disc and the optical disc used herein include a CD, a laser disc, an optical disc, a digital versatile disc (DVD), a floppy disc, a Blu-ray disc, etc., where the magnetic disc usually reproduces data magnetically, while the optical disc reproduces data optically with lasers. Combinations of the above shall also be included within the scope of the computer-readable medium.
Instructions may be executed by one or more processors, such as one or more DSPs, general-purpose microprocessors, ASICs, FPGAs, or other equivalent integrated or discrete logic circuits. Thus, the term “processor” used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the technology described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for coding, or incorporated in an integrated encoder-decoder. In addition, the technology may be embodied completely in one or more circuits or logic elements.
The technical solutions of embodiments of the disclosure can be implemented in a wide variety of apparatuses or devices, including a wireless mobile phone, an integrated circuit (IC), or a group of ICs (e.g., a chipset). Various components, modules, or units described in embodiments of the disclosure are configured to implement functions of devices based on the described technology, but do not necessarily mean that these functions shall be implemented by different hardware units respectively. Instead, as described above, various units may be combined in a hardware unit of an encoder-decoder or implemented by a collection of interoperative hardware units (including one or more processors described above) in conjunction with suitable software and/or firmware.
This application is a continuation of International Application No. PCT/CN2022/070403, filed Jan. 5, 2022, the entire disclosure of which is incorporated herein by reference.
Parent: PCT/CN2022/070403, Jan 2022, WO
Child: 18762611, US