One or more aspects of the disclosed subject matter are directed to video encoding or decoding and more specifically to video encoding or decoding with memory bandwidth conservation when an affine motion model is used.
VVC (Versatile Video Coding) is a new video compression standard being developed by the Joint Video Experts Team (JVET) jointly established by ISO/IEC MPEG and ITU-T. The VVC standard for single-layer coding will be finalized by the end of 2020, with a design goal of being at least 50% more efficient than the previous standard, the MPEG HEVC/ITU-T H.265 Main-10 profile.
Among the coding tools proposed for VVC, affine motion compensation prediction introduces a more complex motion model for better compression efficiency. Previous standards such as HEVC consider only a translational motion model, in which all sample positions inside a PU (prediction unit) share the same translational motion vector for motion compensated prediction. In the real world, however, there are many other kinds of motion, e.g., zoom in/out, rotation, perspective motion and other irregular motions. The affine motion model supports different motion vectors at different sample positions inside a PU, which effectively captures more complex motion; for example, the four corner points of a PU may each have a different motion vector. A PU coded in affine mode or affine merge mode may use uni-prediction (list 0 or list 1 prediction) or bi-directional prediction (i.e., list 0 and list 1 bi-prediction).
In the current VVC design (see JVET-P2001, “Versatile Video Coding (Draft 7)”), the sub-block size for the affine mode is fixed to 4×4, which makes 4×4 bi-directional prediction the worst case for memory bandwidth consumption of motion compensation. In HEVC, the worst case for motion compensation is 8×8 bi-directional prediction, while 8×4 and 4×8 PUs use uni-prediction only. The memory bandwidth budget cannot keep pace with the growth in sample rate (e.g., HEVC is typically used for 4K video at 60 fps, while VVC will be used for 8K video at 60 fps, another factor-of-4 increase in sample processing rate).
One or more aspects of the disclosed subject matter will be set forth in detail with reference to the drawings, in which:
The present application claims the benefit of U.S. Provisional Patent Application Nos. 62/757,004, filed Nov. 7, 2018; 62/769,875, filed Nov. 20, 2018; and 62/792,195, filed Jan. 14, 2019; whose disclosures are hereby incorporated by reference in their entireties into the present disclosure.
The description is set forth in detail with reference to the drawings, in which like reference numerals refer to like elements or operations throughout.
where a, b, c, d, e, f are the affine motion model parameters, which define a 6-parameter affine motion model (see
In the 4-parameter affine motion model, the model parameters a, b, e, f are determined by signaling two control point vectors at the top-left and top-right corners of a PU.
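Although the original equations are not reproduced here, the 6-parameter affine motion model implied by the surrounding definitions is commonly written as follows, where (x, y) is a sample position inside the PU and (vx, vy) is its motion vector (a hedged reconstruction, not a quotation of the original):

$$
v_x = a\,x + b\,y + e, \qquad v_y = c\,x + d\,y + f
$$

Under these conventions, the 4-parameter model corresponds to the constraints c = −b and d = a, which is why only a, b, e, f need to be determined from the two signaled control point vectors.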
It should be appreciated that in
It should be appreciated that in
Further, to constrain the memory bandwidth consumption of the affine mode for motion compensation, the motion vectors of a PU coded in affine mode are not derived for each sample in a PU. For example, as shown in
The concepts of determining the reference block bounding box size will now be described.
By substituting Equation 5 into the affine motion model, the 4 sub-block vectors can be derived by
The parameters of the affine motion model, i.e. (a, b, c, d), can be calculated in any suitable way, such as by using Equation 3 or 4.
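As a minimal sketch of this derivation (assuming the common center-evaluation form; the exact Equation 5 coordinates are not reproduced here, and the floating-point types and function names are illustrative):

```cpp
// Sketch (not the VVC reference code): derive the four sub-block motion
// vectors of a 2x2 group of m x n sub-blocks by evaluating the affine model
// v = (a*x + b*y + e, c*x + d*y + f) at each sub-block centre. Vectors are
// kept in floating point for clarity rather than fixed-point MV units.
struct MVf { double x, y; };

void deriveSubBlockMVs(double a, double b, double c, double d,
                       double e, double f, int m, int n, MVf mv[4])
{
    for (int j = 0; j < 2; ++j) {        // sub-block row within the group
        for (int i = 0; i < 2; ++i) {    // sub-block column within the group
            const double cx = i * m + m / 2.0;   // sub-block centre x
            const double cy = j * n + n / 2.0;   // sub-block centre y
            mv[2 * j + i].x = a * cx + b * cy + e;
            mv[2 * j + i].y = c * cx + d * cy + f;
        }
    }
}
```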
For motion compensation, reference blocks are loaded around the co-located sub-block locations with the offsets determined by the sub-block motion vectors.
Based on the coordinates listed in Table 1, the coordinates of the upper-left and bottom-right corners of the reference block bounding box in
where max( ) and min( ) are functions used to return the largest and the smallest value from a set of data, respectively.
By using Equations 6 and 7, the width and height of the reference block bounding box, i.e. (bxW4, bxH4), can be computed by:
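The original equations are not reproduced here; the following sketch shows one plausible form of the computation, assuming integer-pel motion vectors and the max/min formulation above (helper names are illustrative, not from the VVC specification):

```cpp
#include <algorithm>
#include <climits>

struct MV { int x, y; };  // integer-pel motion vector components

// Sketch: size (bxW4, bxH4) of the reference block bounding box fetched for
// a 2x2 group of m x n sub-blocks. Sub-block origins within the group are
// (0,0), (m,0), (0,n), (m,n); fx and fy are the interpolation filter tap
// counts used for motion compensation.
void refBoundingBox(const MV mv[4], int m, int n, int fx, int fy,
                    int& bxW4, int& bxH4)
{
    const int subX[4] = { 0, m, 0, m };
    const int subY[4] = { 0, 0, n, n };

    int minX = INT_MAX, maxX = INT_MIN, minY = INT_MAX, maxY = INT_MIN;
    for (int i = 0; i < 4; ++i) {
        minX = std::min(minX, subX[i] + mv[i].x);   // upper-left corner
        maxX = std::max(maxX, subX[i] + mv[i].x);
        minY = std::min(minY, subY[i] + mv[i].y);   // bottom-right corner
        maxY = std::max(maxY, subY[i] + mv[i].y);
    }
    // Each m x n sub-block needs (m + fx - 1) x (n + fy - 1) reference
    // samples, so the box spans the MV spread plus the filter extension.
    bxW4 = (maxX - minX) + m + fx - 1;
    bxH4 = (maxY - minY) + n + fy - 1;
}
```

With four identical sub-block vectors this reduces to (fx+2m−1)*(fy+2n−1), matching the fallback-mode size discussed below, which serves as a sanity check on the formulation.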
If the PU coded in affine mode uses uni-directional inter prediction, the reference block bounding box can also be constructed in the vertical direction (i.e., for a pair of vertically adjacent sub-blocks).
From Equations 12, 13 and 14, it can be seen that the reference block bounding box size is independent of the sub-block locations inside the PU; it depends purely on the parameters of the affine motion model (i.e. a, b, c, d), the sub-block size (i.e. m*n) and the filter tap lengths (i.e. fx*fy) used for motion compensation.
In the current VVC design (JVET-P2001), the sub-block size used for the affine mode is 4×4 (i.e. m=n=4), and the filter tap used for luma motion compensation of the affine mode is 6×6 (i.e. fx=fy=6). The reference block bounding box sizes for the VVC are defined in Equations 15, 16 and 17.
Now that the above concepts of reference block bounding box computation have been explained, various aspects of the disclosed subject matter using some or all of these concepts will be further described herein.
where m*n is the sub-block size, and fx*fy is the filter tap size used in the luma motion compensation.
where m*n is the sub-block size, and fx*fy is the filter tap size used in the luma motion compensation of the affine mode.
where δx*δy>0 defines the margin for controlling the memory bandwidth consumption.
where (x,y) for a sub-block vector can be the center location of the sub-block.
where (x0,y0) is the coordinate of the center point of the PU. (x0,y0) can be set to other locations of the PU. For example, if (x0,y0) is set to the coordinate of the top-left corner of the PU, then all the sub-block vectors {right arrow over (v)}=(vx,vy) of the PU are actually set to the control point motion vector of the PU at the top-left PU corner location.
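A minimal sketch of the fallback assignment, assuming the 6-parameter model form used above (the names are illustrative):

```cpp
// Sketch: in fallback mode every sub-block of the PU receives the single
// vector obtained by evaluating the affine model at (x0, y0), e.g. the PU
// centre or the top-left corner as described above.
struct MVf { double x, y; };

MVf fallbackVector(double a, double b, double c, double d,
                   double e, double f, double x0, double y0)
{
    return { a * x0 + b * y0 + e, c * x0 + d * y0 + f };
}
```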
By setting all the sub-block vectors to the same vector, the reference block bounding box sizes of the 2×2, 2×1 and 1×2 sub-block vectors in the fallback mode are (fx+2m−1)*(fy+2n−1), (fx+2m−1)*(fy+n−1) and (fx+m−1)*(fy+2n−1), respectively, which are guaranteed to be smaller than the pre-defined thresholds Thredb, Thredh and Thredv.
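For concreteness, with the VVC values used throughout this description (m = n = 4, fx = fy = 6), these three fallback bounding box sizes evaluate to:

$$
(f_x + 2m - 1)(f_y + 2n - 1) = 13 \times 13 = 169,\qquad
(f_x + 2m - 1)(f_y + n - 1) = 13 \times 9 = 117,\qquad
(f_x + m - 1)(f_y + 2n - 1) = 9 \times 13 = 117.
$$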
In the affine mode of the current VVC design, the sub-block size is fixed to 4×4 (i.e. m=n=4) and the filter tap is fixed to 6×6 (i.e. fx=fy=6). If δx and δy are set to δx=δy=2, the thresholds Thredb, Thredh and Thredv become
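The threshold values themselves are not reproduced here. If they take the natural form of the fallback sizes plus the margin, e.g. $\mathrm{Thred}_b = (f_x + 2m - 1 + \delta_x)(f_y + 2n - 1 + \delta_y)$ and analogously for Thredh and Thredv (an assumed form, not confirmed by the surrounding text), then δx=δy=2 would give:

$$
\mathrm{Thred}_b = 15 \times 15 = 225, \qquad
\mathrm{Thred}_h = 15 \times 11 = 165, \qquad
\mathrm{Thred}_v = 11 \times 15 = 165.
$$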
This means that the memory bandwidth consumption of the affine mode is controlled by the algorithm described in
where m*n is the sub-block size, and fx*fy is the filter tap size used in the luma motion compensation.
where δx*δy>0 defines the margin for controlling the memory bandwidth consumption.
where (x,y) for a sub-block vector can be the center location of the sub-block.
where (x0,y0) is the coordinate of the center point of the PU. (x0,y0) can be set to other locations of the PU. For example, if (x0,y0) is set to the coordinate of the top-left corner of the PU, then all the sub-block vectors {right arrow over (v)}=(vx,vy) of the PU are actually set to the control point motion vector of the PU at the top-left PU corner location.
By setting all the sub-block vectors to the same vector, the reference block bounding box size of the 2×2 sub-block vectors in the fallback mode is (fx+2m−1)*(fy+2n−1), which is guaranteed to be smaller than the pre-defined threshold values Thredb and Thredu.
It should be appreciated that in the bi-directional affine mode, a PU has both list 0 and list 1 predictions. In the disclosed subject matter, the reference bounding box size bxW4*bxH4 is computed independently for list 0 and list 1 prediction with the respective list 0/list 1 affine motion model parameters (a, b, c, d) of the PU, and the threshold Thredb is set separately for list 0 and list 1 prediction (though the values of the threshold could be the same). With the memory bandwidth control algorithms described above, the following four combinations are possible for a PU coded in bi-directional affine mode: 1) the regular sub-block motion vector fields are used for both the list 0 and list 1 motion compensation of the PU; 2) the regular sub-block motion vector field is used for list 0 motion compensation but the fallback mode (i.e. a single vector for list 1 prediction of the entire PU) is used for list 1 motion compensation; 3) the fallback mode (i.e. a single vector for list 0 prediction of the entire PU) is used for list 0 motion compensation but the regular sub-block motion vector field is used for list 1 motion compensation; and 4) the fallback mode (i.e. a first single vector for list 0 prediction and a second single vector for list 1 prediction of the entire PU) is used for both list 0 and list 1 motion compensation.
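For illustration, the following sketch applies the bounding box test independently per prediction list, yielding exactly the four combinations enumerated above (a hedged sketch; the function and type names are illustrative, and the thresholds may be shared or distinct per list):

```cpp
enum class MCMode { SubBlockField, Fallback };

// Sketch: bxW4/bxH4 are computed per list from the respective list 0/list 1
// affine parameters (a, b, c, d); ThredB holds the per-list threshold.
void selectAffineMCModes(const int bxW4[2], const int bxH4[2],
                         const int ThredB[2], MCMode mode[2])
{
    for (int list = 0; list < 2; ++list)   // 0 -> list 0, 1 -> list 1
        mode[list] = (bxW4[list] * bxH4[list] > ThredB[list])
                         ? MCMode::Fallback        // single PU-wide vector
                         : MCMode::SubBlockField;  // regular sub-block MVs
}
```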
a. If, as determined in step 810, the PU uses bi-directional affine mode (i.e. bi-pred) and bxW4*bxH4>Thred4.
b. Or if, as determined in step 812, the PU uses uni-directional affine mode and either bxWh*bxHh>Thredh or bxWv*bxHv>Thredv, where Thred4, Thredh and Thredv may be pre-defined by using Equation 18.
The estimated affine CPMVs for the PU, if available, may be further evaluated against the costs of regular motion vectors and intra prediction mode estimated for the PU to decide whether the current PU should be encoded in the affine mode.
By using the proposed method, the worst-case memory bandwidth consumption for the affine mode is restricted not only at sub-block level but also at PU level. Note that the sub-block motion vector spread within a PU of affine mode is determined by the affine parameters (a, b, c, d).
It should be appreciated that in step 810 the decision on whether the reference bounding box size exceeds the pre-defined threshold is made independently for the list 0 and list 1 predictions of a PU coded in bi-directional affine mode.
The restriction can also be imposed by a bitstream constraint. For example, the bitstream constraint can be specified as follows:
A bitstream conforming to the VVC standard shall satisfy the following conditions:
The implementation depicted in
For example, the memory bandwidth control algorithm can be modified based on the shape of the PU coded in the affine mode:
Otherwise, sub-block size m*n is used for the generation of the sub-block motion data field for the motion compensation of the PU.
In another variation, instead of adaptively selecting the sub-block size based on the size of the reference block bounding box, the selection may be based on the width and/or height of the reference block bounding box, and/or on the DDR burst size and alignments, or on any combination of the above-mentioned parameters (i.e. the size, width and height of the reference block bounding box, the DDR burst size and alignments, etc.).
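A hedged sketch of one such selection rule follows; the candidate sizes and numeric thresholds are illustrative assumptions rather than values taken from this description. A burst-aware variant could additionally round the fetched reference width up to the DDR burst alignment before comparing.

```cpp
// Sketch: choose the affine MC sub-block size from the bounding box width
// and height. A wider spread of sub-block vectors inflates the box, and
// coarsening the sub-block grid amortizes the per-fetch filter overhead.
struct SubBlockSize { int w, h; };

SubBlockSize chooseSubBlockSize(int bxW, int bxH)
{
    if (bxW > 15 && bxH > 15) return { 8, 8 };  // large spread both ways
    if (bxW > 15)             return { 8, 4 };  // spread mostly horizontal
    if (bxH > 15)             return { 4, 8 };  // spread mostly vertical
    return { 4, 4 };                            // default fine grid
}
```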
It should be appreciated that in the disclosed subject matter the bitstream constraints and/or sub-block size adaptation are done independently for the list 0 and list 1 predictions of a PU coded in bi-directional affine mode.
In another variation, the sub-block vectors of affine mode used for motion compensation may be the same as or different from the ones used for (affine) merge/AMVP list derivation (used as spatial neighboring candidates), for de-blocking filter and for storage of temporal motion vectors (TMVPs). For example, the sub-block motion vectors of the affine mode for motion compensation may be generated by the algorithm that adaptively selects sub-block sizes (e.g. 8×8/8×4 adaptively), while the sub-block motion vectors of the affine mode for (affine) merge/AMVP list derivation (used as spatial neighboring candidates), for de-blocking filter and for storage of temporal motion vectors (TMVPs) may be generated by using a fixed sub-block size (e.g. 4×4). In another example, the sub-block motion vectors of the affine mode for motion compensation, for (affine) merge/AMVP list derivation (used as spatial neighboring candidates), for de-blocking filter and for storage of temporal motion vectors (TMVPs) may be generated by the algorithm described herein that adaptively selects sub-block sizes (e.g. 8×8/8×4 adaptively).
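A minimal sketch of keeping the two motion vector fields separate (the types and field names are hypothetical):

```cpp
#include <vector>

struct MVf { double x, y; };  // toy motion vector type from earlier sketches

// Sketch: a PU may carry one sub-block MV field generated with the adaptive
// grid (e.g. 8x8/8x4) for motion compensation, and a second field on a
// fixed 4x4 grid for (affine) merge/AMVP candidate derivation, the
// de-blocking filter and TMVP storage.
struct AffineMVFields {
    std::vector<MVf> mcField;    // adaptive grid, drives motion compensation
    std::vector<MVf> predField;  // fixed 4x4 grid for merge/AMVP, de-blocking, TMVP
};
```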
A hardware description of a computer/device (e.g., the image processing device 1000) according to exemplary embodiments, or any other embodiment, which is used to encode and/or decode video is described with reference to
Further, the claimed advancements may be provided as a utility application, background daemon, or component of an operating system, or a combination thereof, executing in conjunction with CPU 1002 and an operating system such as Microsoft Windows, UNIX, Solaris, LINUX, Apple MAC-OS, or another suitable operating system.
The image processing device 1000 may be a general-purpose computer or a particular, special-purpose machine. In one embodiment, the image processing device 1000 becomes a particular, special-purpose machine when the processor 1002 is programmed to perform the video encoding and/or decoding processes described herein. The image processing device may be implemented as an encoder, a decoder, or a device which both encodes and decodes images. The image processing device can be implemented in a mobile phone, a laptop, a tablet, a general-purpose computer, a set-top box, a video decoding device such as an Amazon Fire TV Stick or device, a Roku device, a television, a video monitor, a still camera, a video camera, a scanner, a multifunction printer, an automobile display, or any desired device.
Alternatively, or additionally, the CPU 1002 may be implemented on an FPGA, ASIC, PLD or using discrete logic circuits, as one of ordinary skill in the art would recognize. Further, CPU 1002 may be implemented as multiple processors cooperatively working in parallel to perform the instructions of the inventive processes described above.
The image processing device 1000 in
The image processing device 1000 further includes a display controller 1012, such as a graphics card or graphics adaptor for interfacing with display 1014, such as a monitor. A general purpose I/O interface 1016 interfaces with a keyboard and/or mouse 1018 as well as a touch screen panel 1020 on or separate from display 1014. The general purpose I/O interface also connects to a variety of peripherals 1022, including printers and scanners.
A sound controller 1024 is also provided in the image processing device 1000 to interface with speakers/microphone 1026, thereby providing sounds and/or music.
The general-purpose storage controller 1028 connects the storage medium disk 1008 with communication bus 1030, which may be an ISA, EISA, VESA, PCI, or similar bus, for interconnecting all of the components of the image processing device 1000. A description of the general features and functionality of the display 1014, keyboard and/or mouse 1018, as well as the display controller 1012, storage controller 1028, network controller 1010, sound controller 1024, and general purpose I/O interface 1016 is omitted herein for brevity.
The exemplary circuit elements described in the context of the present disclosure may be replaced with other elements and structured differently than the examples provided herein. Moreover, circuitry configured to perform features described herein may be implemented in multiple circuit units (e.g., chips), or the features may be combined in circuitry on a single chipset. For that matter, any hardware and/or software capable of implementing any of the above embodiments, or any other embodiment, can be used instead of, or in addition to, what is disclosed above.
The functions and features described herein may also be executed by various distributed components of a system. For example, one or more processors may execute these system functions, wherein the processors are distributed across multiple components communicating in a network. The distributed components may include one or more client and server machines, which may share processing, in addition to various human interface and communication devices (e.g., display monitors, smart phones, tablets, personal digital assistants (PDAs)). The network may be a private network, such as a LAN or WAN, or may be a public network, such as the Internet. Input to the system may be received via direct user input and received remotely either in real-time or as a batch process. Additionally, some implementations may be performed on modules or hardware not identical to those described. Accordingly, other implementations are within the scope that may be claimed.
While preferred embodiments have been set forth above, those skilled in the art who have reviewed the present disclosure will readily appreciate that other embodiments can be realized within the scope of the disclosure. For example, disclosures of numerical values and of specific technologies are illustrative rather than limiting. Also, whenever technically feasible, features from different embodiments can be combined, and the order in which operations are performed can be varied. Further, wherever technically feasible, any feature disclosed herein can be used for encoding, decoding or both. The one or more aspects of the disclosed subject matter are not limited to VVC implementations and can be utilized with any video encoding/decoding system. Therefore, the one or more aspects of the disclosed subject matter should be construed as limited only by the appended claims.
The present application is a continuation of U.S. patent application Ser. No. 16/665,484, filed Oct. 28, 2019, which claims the benefit of U.S. Provisional Patent Application Nos. 62/757,004, filed Nov. 7, 2018; 62/769,875, filed Nov. 20, 2018; and 62/792,195, filed Jan. 14, 2019; each of which is hereby incorporated by reference in its entirety into the present disclosure.