The present invention relates to video coding. In particular, the present invention relates to a video coding system utilizing GPM (Geometric Partitioning Mode).
Versatile video coding (VVC) is the latest international video coding standard developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The standard has been published as an ISO standard: ISO/IEC 23090-3:2021, Information technology—Coded representation of immersive media—Part 3: Versatile video coding, published February 2021. VVC is developed based on its predecessor HEVC (High Efficiency Video Coding) by adding more coding tools to improve coding efficiency and also to handle various types of video sources including 3-dimensional (3D) video signals.
As shown in
The decoder, as shown in
According to VVC, an input picture is partitioned into non-overlapped square block regions referred to as CTUs (Coding Tree Units), similar to HEVC. Each CTU can be partitioned into one or multiple smaller size coding units (CUs). The resulting CU partitions can be in square or rectangular shapes. Also, VVC divides a CTU into prediction units (PUs) as units for applying prediction processes, such as Inter prediction, Intra prediction, etc.
The VVC standard incorporates various new coding tools to further improve the coding efficiency over the HEVC standard. Among various new coding tools, some coding tools relevant to the present invention are reviewed as follows.
Merge Mode with MVD (MMVD)
In addition to the merge mode, where the implicitly derived motion information is directly used for prediction sample generation of the current CU, the merge mode with motion vector differences (MMVD) is introduced in VVC. An MMVD flag is signalled right after sending a regular merge flag to specify whether the MMVD mode is used for a CU.
In MMVD, after a merge candidate is selected, it is further refined by the signalled MVD information. The further information includes a merge candidate flag, an index to specify the motion magnitude, and an index to indicate the motion direction. In MMVD mode, one of the first two candidates in the merge list is selected to be used as the MV basis. The MMVD candidate flag is signalled to specify which of the first and second merge candidates is used.
The distance index specifies the motion magnitude information and indicates the pre-defined offset from the starting points for the L0 reference block and the L1 reference block. An offset is added to either the horizontal or the vertical component of the starting MV, where small circles in different styles correspond to different offsets from the centre. The relation between the distance index and the pre-defined offset is specified in Table 1.
The direction index represents the direction of the MVD relative to the starting point. The direction index can represent the four directions as shown in Table 2. It is noted that the meaning of the MVD sign can vary according to the information of the starting MVs. When the starting MVs are a uni-prediction MV or bi-prediction MVs with both lists pointing to the same side of the current picture (i.e. the POCs of the two references are both larger than the POC of the current picture, or both smaller than the POC of the current picture), the sign in Table 2 specifies the sign of the MV offset added to the starting MV. When the starting MVs are bi-prediction MVs with the two MVs pointing to different sides of the current picture (i.e. the POC of one reference is larger than the POC of the current picture, and the POC of the other reference is smaller than the POC of the current picture), and the POC difference in list 0 is greater than the one in list 1, the sign in Table 2 specifies the sign of the MV offset added to the list0 MV component of the starting MV and the sign for the list1 MV has the opposite value. Otherwise, if the POC difference in list 1 is greater than that in list 0, the sign in Table 2 specifies the sign of the MV offset added to the list1 MV component of the starting MV and the sign for the list0 MV has the opposite value.
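For illustration, the sign derivation described above can be sketched as follows; the function name, the argument convention and the tie-breaking rule when the two POC distances are equal are assumptions of this sketch, not normative VVC text:

```python
def mmvd_offset_signs(poc_cur, poc_l0, poc_l1, table_sign):
    """Return (sign_l0, sign_l1) applied to the Table 2 offset sign.

    poc_cur, poc_l0, poc_l1 are picture order counts of the current
    picture and the two reference pictures (illustrative convention).
    """
    d0 = poc_l0 - poc_cur
    d1 = poc_l1 - poc_cur
    if (d0 > 0) == (d1 > 0):
        # Both references on the same side of the current picture:
        # the Table 2 sign applies directly to both lists.
        return table_sign, table_sign
    # References on different sides: the list with the larger POC
    # distance keeps the Table 2 sign, the other list takes the
    # opposite sign (equal distances break toward list 0 here).
    if abs(d0) >= abs(d1):
        return table_sign, -table_sign
    return -table_sign, table_sign
```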
The MVD is scaled according to the difference of POCs in each direction. If the POC differences in both lists are the same, no scaling is needed. Otherwise, if the POC difference in list 0 is larger than the one in list 1, the MVD for list 1 is scaled, by defining the POC difference of L0 as td and the POC difference of L1 as tb, described in
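The list-1 scaling can be sketched as below; the simple integer ratio is an assumption standing in for the fixed-point scaling defined by the referenced equations, and the function name is illustrative:

```python
def scale_mmvd(mvd, td, tb):
    """Scale an MVD component for the list with the smaller POC
    distance, with td = POC difference of L0 and tb = POC difference
    of L1 (simplified ratio scaling; the actual derivation uses a
    fixed-point division)."""
    if td == tb:
        # Equal POC distances in both lists: no scaling is needed.
        return mvd
    return (mvd * tb) // td
```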
In VVC, when a CU is coded in merge mode, if the CU contains at least 64 luma samples (that is, CU width times CU height is equal to or larger than 64), and if both CU width and CU height are less than 128 luma samples, an additional flag is signalled to indicate if the combined inter/intra prediction (CIIP) mode is applied to the current CU. As its name indicates, the CIIP prediction combines an inter prediction signal with an intra prediction signal. The inter prediction signal in the CIIP mode Pinter is derived using the same inter prediction process applied to the regular merge mode, and the intra prediction signal Pintra is derived following the regular intra prediction process with the planar mode. Then, the intra and inter prediction signals are combined using weighted averaging, where the weight value wt is calculated depending on the coding modes of the top and left neighbouring blocks of the current CU as follows:
The CIIP prediction is formed as follows:
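The weighting and combination steps above can be illustrated with the following sketch, which follows the commonly documented VVC CIIP weighting (wt of 3, 2 or 1 depending on how many of the two neighbours are intra coded, and a rounded right shift by 2); the function names are illustrative:

```python
def ciip_weight(top_is_intra, left_is_intra):
    """Derive the intra weight wt from the coding modes of the top and
    left neighbouring blocks: more intra neighbours -> larger wt."""
    if top_is_intra and left_is_intra:
        return 3
    if top_is_intra or left_is_intra:
        return 2
    return 1

def ciip_predict(p_inter, p_intra, wt):
    """Combine per-sample predictions:
    P_CIIP = ((4 - wt) * P_inter + wt * P_intra + 2) >> 2"""
    return ((4 - wt) * p_inter + wt * p_intra + 2) >> 2
```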
In VVC, a Geometric Partitioning Mode (GPM) is supported for inter prediction as described in JVET-W2002 (Adrian Browne, et al., Algorithm description for Versatile Video Coding and Test Model 14 (VTM 14), ITU-T/ISO/IEC Joint Video Exploration Team (JVET), 23rd Meeting, by teleconference, 7-16 Jul. 2021, document JVET-W2002). The geometric partitioning mode is signalled using a CU-level flag as one kind of merge mode, with other merge modes including the regular merge mode, the MMVD mode, the CIIP mode and the subblock merge mode. A total of 64 partitions are supported by the geometric partitioning mode for each possible CU size w×h = 2^m × 2^n with m, n ∈ {3, …, 6}, excluding 8×64 and 64×8. The GPM mode can be applied to skip or merge CUs having a size within the above limit and having at least two regular merge candidates.
When this mode is used, a CU is split into two parts by a geometrically located straight line at certain angles. In VVC, a total of 20 angles and 4 offset distances are used for GPM, which has been reduced from 24 angles in an earlier draft. The 20 angles used for partitioning are shown in
If the geometric partitioning mode is used for the current CU, then a geometric partition index indicating the selected partition mode of the geometric partition (angle and offset), and two merge indices (one for each partition) are further signalled. The maximum GPM candidate list size is signalled explicitly in the SPS (Sequence Parameter Set) as shown in Table 3 and specifies the syntax binarization for the GPM merge indices. The mapping among the GPM partition index, the angle index and the distance index is shown in Table 4. After predicting each part of the geometric partition, the sample values along the geometric partition edge are adjusted using a blending process with adaptive weights using the process described later. This is the prediction signal for the whole CU, and the transform and quantization processes are applied to the whole CU as in other prediction modes. Finally, the motion field of a CU predicted using the geometric partition mode is stored using the process described later.
The uni-prediction candidate list is derived directly from the merge candidate list constructed according to the extended merge prediction process. Denote n as the index of the uni-prediction motion in the geometric uni-prediction candidate list. The LX motion vector of the n-th extended merge candidate (X=0 or 1, i.e., LX=L0 or L1), with X equal to the parity of n, is used as the n-th uni-prediction motion vector for geometric partitioning mode. These motion vectors are marked with “x” in
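The parity-based selection described above can be sketched as follows; the candidate data structure (a dict with optional 'L0'/'L1' motions) and the fallback to the other list when the LX motion is absent are assumptions of this sketch:

```python
def gpm_uni_candidate(merge_list, n):
    """Pick the n-th uni-prediction MV for GPM from an extended merge
    list: X equals the parity of n, and the LX motion of the n-th
    candidate is used; fall back to the other list if LX is absent
    (illustrative structure, not normative text)."""
    cand = merge_list[n]
    x = n & 1
    first, second = ('L1', 'L0') if x else ('L0', 'L1')
    return cand.get(first) or cand.get(second)
```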
After predicting each part of a geometric partition using its own motion, blending is applied to the two prediction signals to derive the samples around the geometric partition edge. The blending weight for each position of the CU is derived based on the distance between the individual position and the partition edge.
The distance from a position (x, y) to the partition edge is derived as:
The weights for each part of a geometric partition are derived as follows:
The partIdx depends on the angle index i. One example of the weight w0 is illustrated in
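Since the exact distance and weight equations are given by the referenced figures, the following sketch only illustrates the general idea of a clipped ramp across the partition edge; the ramp width, normalization and function name are assumptions, not the lookup-table derivation used by VVC:

```python
def blend_weight(d, part_idx, ramp=8):
    """Map a signed distance d from the partition edge to a blending
    weight in [0, 1] for the first part: positions deep inside the
    first part get 1.0, positions deep inside the second part get 0.0,
    and positions near the edge are blended linearly."""
    if part_idx == 1:
        d = -d  # the roles of the two parts swap with partIdx
    w = (d + ramp // 2) / ramp
    return max(0.0, min(1.0, w))
```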
Mv1 from the first part of the geometric partition, Mv2 from the second part of the geometric partition, and a combined MV of Mv1 and Mv2 are stored in the motion field of a geometric partitioning mode coded CU.
The stored motion vector type for each individual position in the motion field is determined as:
If sType is equal to 0, Mv1 is stored in the corresponding motion field; if sType is equal to 1, Mv2 is stored; otherwise, if sType is equal to 2, a combined MV from Mv1 and Mv2 is stored. The combined MV is generated using the following process:
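The sType-based storage can be sketched as follows; the motion representation (a dict keyed by reference list) and the simplified handling when both parts use the same reference list are assumptions of this sketch:

```python
def stored_motion(s_type, mv1, mv2):
    """Select the motion stored for a subblock of a GPM-coded CU.
    mv1/mv2 are the uni-prediction motions of the two parts, each a
    dict like {'list': 'L0', 'mv': (dx, dy)} (illustrative layout)."""
    if s_type == 0:
        return {mv1['list']: mv1['mv']}
    if s_type == 1:
        return {mv2['list']: mv2['mv']}
    # s_type == 2: store a combined bi-prediction motion when the two
    # parts use different reference lists; otherwise only one motion
    # can be kept (simplified here; the actual rule is more detailed).
    if mv1['list'] != mv2['list']:
        return {mv1['list']: mv1['mv'], mv2['list']: mv2['mv']}
    return {mv2['list']: mv2['mv']}
```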
Recently, a template matching based reordering for GPM split modes is disclosed in JVET-Y0135 (Chun-Chi Chen, et al., Non-EE2: Template matching based reordering for GPM split modes, ITU-T/ISO/IEC Joint Video Exploration Team (JVET), 25th Meeting, by teleconference, 12-21 Jan. 2022, document JVET-Y0135) for consideration in the emerging new coding standard. The template matching method matches the neighboring template around the current block with the reference template around a reference block(s) in a reference picture(s). The neighboring template usually comprises a top template corresponding to neighboring pixels above the top edge of the current block and a left template corresponding to neighboring pixels adjacent to the left edge of the current block. The reference template comprises a respective top template and left template of the reference block(s). Since the reference template and the neighboring template are available at both the encoder side and the decoder side during the coding/decoding process for the current block, the matching costs (i.e., a measure of similarity or dis-similarity between the neighboring template and the reference template) can be evaluated at both the encoder side and the decoder side. Therefore, the matching cost evaluation is considered as decoder-derived information. The reordering method for GPM split modes according to JVET-Y0135 is a two-step process after the respective reference templates of the two GPM partitions in a coding unit are generated, as follows:
The edge on the template is extended from that of the current CU, as shown in
After reordering in ascending order of TM cost, the best N GPM split modes are assigned to their respective indices according to their TM cost, from small to large. A Golomb-Rice code is used to signal this index as shown in Table 5.
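The reordering step above can be sketched as follows; the cost container and function name are illustrative, and the Golomb-Rice binarization of the resulting index is omitted:

```python
def reorder_gpm_modes(tm_costs, n_best):
    """Reorder GPM split modes by ascending TM cost and keep the best
    N modes. tm_costs maps a split mode to its TM cost (illustrative).
    The position of a mode in the returned list is the index that
    would be signalled with a Golomb-Rice code."""
    ranked = sorted(tm_costs, key=tm_costs.get)  # small cost first
    return ranked[:n_best]
```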
The signaling of the GPM index according to the TM based reordering as disclosed in JVET-Y0135 is more efficient than the original signaling method without the TM based reordering, since only the best N GPM split modes are assigned to their respective indices and the selected index is entropy coded using a Golomb-Rice code. However, the TM based reordering as disclosed in JVET-Y0135 suffers from longer latency, as shown in the detailed description of this application. The present invention discloses methods to overcome the long-latency issue.
A method and apparatus for video coding are disclosed for the encoder side and the decoder side. According to the method for the decoder side, encoded data associated with a current block is received. A pseudo GPM in a target GPM group for the current block is determined. The current block is divided into one or more subblocks. Assigned MVs (Motion Vectors) of each subblock are determined according to the pseudo GPM. A cost for each GPM in the target GPM group is determined according to decoded data. A selected GPM is determined based on a mode syntax and a reordered target GPM group corresponding to the target GPM group reordered according to the costs, wherein the pseudo GPM is allowed to be different from the selected GPM. The encoded data is decoded using information comprising the selected GPM.
In one embodiment, the method for the decoder side may further comprise parsing the mode syntax from a bitstream comprising the encoded data for the current block.
In one embodiment, the cost is derived between a reference template for a reference block of the current block and a neighboring template of the current block using one or more GPM mode selected MV candidates and a target-tested GPM.
In one embodiment, the target GPM group comprises all GPMs in a GPM list.
In one embodiment, all GPMs in a GPM list are divided into a plurality of GPM groups and the target GPM group corresponds to one of the plurality of GPM groups. In one embodiment, the plurality of GPM groups correspond to M groups, wherein M is an integer greater than 1. In one embodiment, a GPM group syntax is parsed from a bitstream comprising the encoded data for the current block, and wherein the GPM group syntax indicates the target GPM group among the plurality of GPM groups. In one embodiment, information related to said one of the plurality of GPM groups is parsed from a bitstream comprising the encoded data for the current block. In one embodiment, the mode syntax is parsed from a bitstream comprising the encoded data for the current block. In one embodiment, the mode syntax is determined implicitly.
According to the method for the encoder side, pixel data associated with a current block is received. A cost for each GPM in a target GPM group is determined according to decoded data. A reordered target GPM group for the GPMs in the target GPM group is generated according to the costs. A selected GPM is determined for the current block. A mode syntax is determined depending on a location of the selected GPM in the reordered target GPM group. The current block is divided into one or more subblocks. A pseudo GPM in the target GPM group is determined for the current block according to the mode syntax. Assigned MVs (Motion Vectors) of each subblock are determined according to the pseudo GPM, wherein the pseudo GPM is allowed to be different from the selected GPM. The current block is then encoded using information comprising the selected GPM.
It will be readily understood that the components of the present invention, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of the embodiments of the systems and methods of the present invention, as represented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. References throughout this specification to “one embodiment,” “an embodiment,” or similar language mean that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, etc. In other instances, well-known structures, or operations are not shown or described in detail to avoid obscuring aspects of the invention. The illustrated embodiments of the invention will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of apparatus and methods that are consistent with the invention as claimed herein.
In the GPM mode, how the MV of each subblock is stored is described in the background section. One of MV1 and MV2 is selected to be stored in the subblock MV buffer according to the GPM partition mode (e.g. partition angle and offset). However, in the method disclosed in JVET-Y0135, the partition mode is reordered according to template matching costs. In a video decoder, in the parsing stage (e.g. the Entropy Decoder 140 in
According to the conventional TM-based GPM process as disclosed in JVET-Y0135, the MVs of the neighboring block (if the neighboring block is coded in the GPM mode) are unknown and the MV for the current block cannot be generated. Consequently, the reference samples cannot be loaded in the parsing stage. As is known in the field of video coding systems, the reference pictures are usually stored in external memory, such as DRAM (Dynamic Random Access Memory). The reference samples have to be loaded into internal memory for processing. The external memory access is typically slow and causes processing delay. The TM-based GPM has to wait for the reconstruction stage to complete so that the reconstructed neighboring template is available and the GPM reordering can be performed. After the GPM reordering is completed, the GPM selected for the current block can be determined based on the signaled GPM index and the reordered GPM list. After the selected GPM is determined for the current block, the MVs for the subblocks of the current block can be assigned in the reconstruction stage.
Accordingly, the reference sample pre-fetch cannot be performed in the parsing stage, which causes long latency. In order to improve the decoding throughput, a new method is disclosed in this application.
As mentioned above, one reason causing the long latency in the TM-based GPM is that the true MVs selected from a merge list for the current block cannot be generated in the parsing stage and have to wait until the reconstruction stage. In this invention, it is proposed to create or define a method of subblock MV assignment for the decoder-side MV/mode derivation tools with GPM (e.g. the template matching based reordering in JVET-Y0135) or any coding tools where the MV assignment depends on the process performed in the sample reconstruction stage. When the syntax of the GPM mode index indicating which reordered partition mode is selected is parsed, a predefined subblock MV assignment method for the GPM mode can be determined without performing the decoder-side MV/mode derivation according to embodiments of the present invention. The predefined subblock MV is referred to as a pseudo MV in this disclosure. For example, one of the partition modes in
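The parse-stage pseudo MV assignment can be sketched as follows, using a simple vertical split as the predefined pseudo partition; the subblock layout and the split representation are assumptions of this sketch:

```python
def assign_pseudo_mvs(subblocks, pseudo_split_x, mv1, mv2):
    """Assign subblock MVs in the parsing stage using a predefined
    (pseudo) GPM partition, sketched here as a vertical split at
    x = pseudo_split_x. The true partition mode, known only after the
    TM-based reordering in the reconstruction stage, may differ; the
    pseudo MVs allow reference sample pre-fetch to start early."""
    return {(x, y): (mv1 if x < pseudo_split_x else mv2)
            for (x, y) in subblocks}
```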
In another embodiment, for GPM partition signalling, some similar modes can be collected in a group, so that all the GPM partitions are classified into several groups. For each group, one predefined subblock MV assignment is designed. The decoder-side MV/mode derivation can reorder the modes within the same group. The reordered modes in each group can be further re-assigned (e.g. taking one or more modes from each group in an interleaved manner) to the final reordered mode syntax. Therefore, when the GPM syntaxes are parsed, the selected group is known and the corresponding MV assignment is also determined. In one example, the GPM mode syntax/index is classified into different groups (e.g. mode indices can be classified into four groups as 4n, 4n+1, 4n+2 and 4n+3, or, more generally, into M groups as Mn, Mn+1, Mn+2, …, Mn+(M−1)). For each group, one or more subblock MV assignment methods are predefined. Therefore, the subblock MVs can be assigned in the parsing stage, i.e., before the sample reconstruction stage. All the GPM modes within a group are reordered by the decoder-side MV/mode derivation.
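The modulo-M grouping in this example can be sketched as follows; the function names and the per-group assignment table are illustrative:

```python
def gpm_group(mode_index, num_groups=4):
    """Classify a parsed GPM mode index into one of num_groups groups
    (indices Mn, Mn+1, ..., Mn+(M-1) fall into groups 0..M-1)."""
    return mode_index % num_groups

def parse_stage_assignment(mode_index, num_groups, assignments):
    """Look up the predefined subblock MV assignment for the group of
    a parsed mode index. assignments[g] is the predefined assignment
    for group g (illustrative); the exact mode within the group is
    resolved later by the decoder-side reordering, but the assignment
    is already known at parse time."""
    return assignments[gpm_group(mode_index, num_groups)]
```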
In another embodiment, the GPM partition modes are classified/quantized into several groups. Within each group, the exact GPM mode is derived by the decoder-side MV/mode derivation. Therefore, only the selected group needs to be signalled in the bitstream. The decoder can determine the exact GPM partition mode via the decoder-side MV/mode derivation. In each group, one or more subblock MV assignment methods are predefined. Therefore, the subblock MVs can be assigned in the parsing stage, i.e., before the sample reconstruction stage.
Any of the foregoing proposed methods can be implemented in encoders and/or decoders. For example, any of the proposed methods can be implemented in an inter coding module of an encoder (e.g. Inter Pred. 112 in
The flowcharts shown are intended to illustrate an example of video coding according to the present invention. A person skilled in the art may modify each step, re-arrange the steps, split a step, or combine steps to practice the present invention without departing from the spirit of the present invention. In the disclosure, specific syntax and semantics have been used to illustrate examples to implement embodiments of the present invention. A skilled person may practice the present invention by substituting the syntax and semantics with equivalent syntax and semantics without departing from the spirit of the present invention.
The above description is presented to enable a person of ordinary skill in the art to practice the present invention as provided in the context of a particular application and its requirement. Various modifications to the described embodiments will be apparent to those with skill in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed. In the above detailed description, various specific details are illustrated in order to provide a thorough understanding of the present invention. Nevertheless, it will be understood by those skilled in the art that the present invention may be practiced.
Embodiments of the present invention as described above may be implemented in various hardware, software codes, or a combination of both. For example, an embodiment of the present invention can be one or more circuits integrated into a video compression chip or program code integrated into video compression software to perform the processing described herein. An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or a field programmable gate array (FPGA). These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.
The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.
The present invention is a non-Provisional application of and claims priority to U.S. Provisional Patent Application No. 63/304,012 filed on Jan. 28, 2022. The U.S. Provisional patent application is hereby incorporated by reference in its entirety.
Filing Document: PCT/CN2023/072055, filed Jan. 13, 2023 (WO).
Related U.S. Provisional Application: No. 63/304,012, filed Jan. 2022 (US).