The present invention relates to data structures used in coding Virtual Reality (VR) streams using either Advanced Video Coding (AVC) or High Efficiency Video Coding (HEVC or H.265). More particularly, the present system relates to reference lists and indexing for reference pictures and subpictures used in coding VR pictures for AVC or HEVC.
VR (Virtual Reality) is the term describing a three-dimensional, computer-generated environment that can be explored and interacted with by a person. One example use of VR is 360-degree vision, which can be achieved with a special device having a Head Mounted Display (HMD) that enables a user to view all around. To cover the 360 degrees of vision in VR, several projection formats have been proposed and used.
One VR format is cube projection which is illustrated using
Two VR formats other than the cube projection are described, although other formats might be used. One such VR format is the Equal Rectangular Projection (ERP), which maps meridians of a map globe onto a two dimensional surface with equally spaced vertical straight lines, and with equally spaced horizontal straight lines. This enables longitude and latitude lines on a globe to be equally spaced apart on the cube. Projection onto the surface of the cube still results in 6 surfaces that can be laid out as shown in
Another VR format is the Equal Area Projection (EAP) which maps meridians of a map globe onto a two dimensional surface with equally spaced vertical straight lines, and with circles of latitude mapped directly to horizontal lines even if they are not equally spaced. Again, projection onto the surface of the cube still results in 6 surfaces that can be laid out as shown in
The existing video coding standards, such as Advanced Video Coding (AVC) or High Efficiency Video Coding (HEVC), may be used to code VR sequences. All those video coding standards are based upon a hybrid of temporal and spatial coding. That is, the coding uses motion estimation and compensation (ME/MC) to remove the temporal redundancy between consecutive pictures, and spatial prediction and spatial transform to remove the correlation among the pixels within a picture.
For ME/MC, the past-coded pictures are used as reference pictures for the current and future pictures. A block in a current picture may find a best-matched (prediction) block in one or more reference pictures. Specifically, AVC and HEVC have two reference lists, which hold some of the past-coded pictures for future reference. A block in a current picture may find a prediction block in one of the pictures in each list of references.
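As a rough illustration of how a block in a current picture may find a best-matched block in a reference picture, the sketch below performs a full search using the sum of absolute differences (SAD). The function names, the SAD cost, and the search range are assumptions chosen for illustration; neither AVC nor HEVC mandates a particular search method, and this is not presented as the encoder's actual procedure.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
        for y in range(size)
        for x in range(size)
    )


def best_match(cur, ref, cx, cy, size, search_range):
    """Full search: return (dx, dy, cost) of the lowest-SAD block in ref."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

The returned displacement plays the role of a motion vector pointing from the current block to its prediction block in the reference picture.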
It is desirable to provide improvements for coding when VR formats are used.
Embodiments of the invention provide a method for coding video that includes VR sequences, which enables more efficient encoding by organizing the VR sequence as a single 2D block structure. Reference picture and subpicture lists are created and extended to account for coding of the VR sequence. To further improve coding efficiency, reference indexing can be provided for the temporal and spatial difference between a current VR picture block and the reference pictures and subpictures for the VR sequence. Because the reference subpictures for the VR sequence may not have the proper orientation once the VR sequence subpictures are organized into the VR sequence, embodiments of the present invention allow for reorientation of the reference subpictures so that the reference subpictures and VR subpictures are oriented the same.
For embodiments of the present invention, the VR sequence can be treated as a regular 2D sequence. That is, each VR picture is treated as a single 2D picture. In this case, all the existing video coding standards can be applied to the single VR sequence directly. Since a VR picture in a cube layout of 4×3 or 3×2 includes six subpictures at each time instance, the six subpictures can be treated as six tiles within a picture, similar to the tile concept defined in HEVC.
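A minimal sketch of treating the six subpictures as six tiles of one 2D picture follows. It assumes the 3×2 layout, a picture stored as a list of pixel rows, and face numbering in raster order; the function name and the numbering are hypothetical and are not taken from the source.

```python
def split_into_subpictures(picture, cols=3, rows=2):
    """Split a 2D picture (list of rows) into rows*cols equal subpictures.

    Subpictures are returned in raster order: left to right, top to bottom.
    """
    height, width = len(picture), len(picture[0])
    if height % rows or width % cols:
        raise ValueError("picture size must be a multiple of the tile grid")
    face_h, face_w = height // rows, width // cols
    subpictures = []
    for r in range(rows):
        for c in range(cols):
            face = [
                row[c * face_w:(c + 1) * face_w]
                for row in picture[r * face_h:(r + 1) * face_h]
            ]
            subpictures.append(face)
    return subpictures
```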
One embodiment of the present invention provides a method for coding of video with VR pictures, with the coding including a reference list of past-coded pictures and subpictures. In the method, a current VR picture in the VR pictures of the video is defined to include six subpictures as represented by the cube of
Another embodiment of the present invention provides a method for coding of video with VR pictures that includes indexing of reference subpictures relative to current subpictures to improve coding efficiency. In this embodiment also, a current VR picture in the VR pictures of the video is defined to include six subpictures. Next, a reference picture and reference subpictures are defined for the current VR picture. Then a reference list and index are built for the current VR picture and subpictures relative to the reference picture and subpictures. The indexing of subpictures is made according to temporal and spatial distances from a current block in one of the current subpictures to a reference block in the reference subpictures. The reference list and index created are then used in coding of the video and sent to a decoder.
A further embodiment of the present invention provides a method for coding of video with VR pictures that includes the ability to change subpicture orientation to enable efficient encoding of the VR pictures. In this embodiment, like the embodiments above, a current VR picture in the VR pictures of the video is defined to include six subpictures. Next, the subpictures of a reference picture for the current VR picture are identified. Finally, the subpictures of the reference picture are rotated to match the orientation of the subpictures of the current VR picture.
Further details of the present invention are explained with the help of the attached drawings in which:
A VR sequence in a video can be treated as a regular 2D sequence with six subpictures for the embodiments of the invention described herein. That is, each VR picture is treated as a single 2D picture and coding standards such as AVC and HEVC can be applied to the single VR sequence directly. The VR picture can be a 4×3 or 3×2 breakdown of a cube into six subpictures at each time instance, as illustrated in
To accomplish motion estimation and compensation (ME/MC) for embodiments of the present invention, the concepts of reference picture lists, reference indexing and orientation of references relative to a current picture can be provided for a VR sequence. A description of each of these concepts follows.
The concept of reference pictures and lists can be extended for a VR sequence. Similar to AVC and HEVC, for a block in a current subpicture within a current picture, reference pictures can be provided and reference lists built to enable ME/MC. Reference pictures can be built from the subpictures of past-coded pictures as well as the past-coded subpictures of the current picture. A listing of these reference pictures can further be created.
The past-coded pictures can be included in at least one reference list, similar to AVC and HEVC. The past-coded subpictures for the current picture may be included in a second reference list.
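The two lists described above can be sketched as follows. This is only an illustration under stated assumptions: the `RefSubpicture` record, the `build_reference_lists` function, and the use of a time instance plus a face number to identify a subpicture are all hypothetical names introduced here, not structures defined by AVC, HEVC, or the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefSubpicture:
    time: int  # time instance of the picture containing this subpicture
    face: int  # position of the subpicture within the picture (0..5)


def build_reference_lists(coded, current_time, current_face):
    """Build two reference lists from already coded subpictures.

    coded: iterable of RefSubpicture in coding order.
    The first list holds subpictures of past-coded pictures; the second
    holds past-coded subpictures of the current picture.
    """
    first_list = [s for s in coded if s.time != current_time]
    second_list = [
        s for s in coded
        if s.time == current_time and s.face != current_face
    ]
    return first_list, second_list
```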
Now for blocks, consider a current block in a current subpicture within a current picture. For the current block, the reference prediction block can be found in one of the reference subpictures per reference list. The reference subpicture in which the reference prediction block is found can be in one of the past-coded pictures, at a different picture time instance than the current time instance.
The closer the reference picture and subpictures are to the current subpicture temporally and spatially, the higher the correlation between the reference picture and subpictures and the current picture. For this reason, the reference pictures and subpictures for embodiments of the present invention may be indexed according to their temporal and spatial distance to the current subpicture.
Embodiments of the present invention provide for a default reference picture/subpicture index order. In particular, for a current block in a current subpicture of a current picture, the reference picture and subpictures in a reference picture list are indexed according to their temporal and spatial distances to the current block in the current subpicture of the current picture. In other words, the reference picture/subpicture closest to the current block temporally and spatially is assigned the index of 0, the second closest reference picture/subpicture is assigned the index of 1, and so on.
In one embodiment for providing a reference list index, a reference subpicture is assigned a temporal index, i, and a spatial index, j, or a combination of temporal and spatial indexes, i+j. The temporal index, i, can be determined by the temporal distance between the reference picture and the current picture, i.e., the closer, the smaller the index. The spatial index, j, can be determined by the spatial distance between the reference subpicture in the reference picture and the current block collocated in the reference picture.
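One possible reading of the combined index i+j is sketched below. Several simplifications are assumptions made for illustration only: the temporal index is taken as the absolute difference in time instance, the spatial index is taken as the Manhattan distance between face positions in a 3×2 grid (measured from the current subpicture rather than from the exact block position), and ties are broken by temporal index first. The function names are hypothetical.

```python
def spatial_distance(face_a, face_b, cols=3):
    """Manhattan distance between two face positions in a cols-wide grid."""
    row_a, col_a = divmod(face_a, cols)
    row_b, col_b = divmod(face_b, cols)
    return abs(row_a - row_b) + abs(col_a - col_b)


def index_references(refs, cur_time, cur_face):
    """Order reference subpictures so that list position is the index.

    refs: list of (time, face) tuples for the reference subpictures.
    The closest reference (smallest i + j) ends up at index 0.
    """
    def key(ref):
        time, face = ref
        i = abs(cur_time - time)               # temporal index
        j = spatial_distance(face, cur_face)   # spatial index
        return (i + j, i, j)

    return sorted(refs, key=key)
```

For example, with the current subpicture at time 2 and face 1, a reference at (time 1, face 1) has i+j = 1 and is placed at index 0, ahead of references at (time 1, face 0) and (time 0, face 1), which both have i+j = 2.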
Not all the subpictures in a reference picture have the same orientation as the current subpicture of a current VR picture. To enable efficient coding of the VR picture, the six subpictures making up the VR picture, which are the arranged faces of a cube, should be given the same orientation irrespective of the arrangement of the cube faces.
Accordingly, embodiments of the present invention provide for the subpictures of a reference picture to be rotated as shown in
For better temporal and spatial prediction, the subpictures in a reference picture are rotated and rearranged accordingly so that the spatial content transition from a subpicture to its neighbor subpictures within the reference picture can be continuous and smooth. It is noted that, once the subpictures are rotated so that the arrangement of subpictures of the current and reference pictures is the same, the spatial reference index, j, may not be necessary, as the reference picture of six subpictures can be treated as one single picture in the reference list.
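The rotation of a reference subpicture can be sketched as below, assuming square faces stored as lists of pixel rows and rotations restricted to multiples of 90 degrees. The specific rotation needed for each face depends on the cube layout, which is not reproduced here, so the number of quarter turns is left as a caller-supplied parameter. The function names are hypothetical.

```python
def rotate90_cw(face):
    """Rotate a 2D face (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*face[::-1])]


def rotate(face, quarter_turns):
    """Rotate a face clockwise by quarter_turns * 90 degrees."""
    result = [list(row) for row in face]  # copy so the input is unchanged
    for _ in range(quarter_turns % 4):
        result = rotate90_cw(result)
    return result


def align_reference(ref_faces, quarter_turns_per_face):
    """Rotate each reference subpicture by its own number of quarter turns."""
    return [
        rotate(face, turns)
        for face, turns in zip(ref_faces, quarter_turns_per_face)
    ]
```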
To perform motion estimation and compensation, encoder 1102 and decoder 1104 include motion estimation and compensation blocks 1104-1 and 1104-2, respectively. For bi-directional prediction, the motion estimation and compensation blocks 1104-1 and 1104-2 can use a combined bi-directional reference unit in the motion compensation process for the current unit.
For the encoder 1102 and decoder 1104 of
Although the present invention has been described above with particularity, this was merely to teach one of ordinary skill in the art how to make and use the invention. Many additional modifications will fall within the scope of the invention as that scope is defined by the following claims.
This application is a continuation of U.S. patent application Ser. No. 17/214,597 filed Mar. 26, 2021, which is a continuation of U.S. patent application Ser. No. 15/782,107 filed Oct. 12, 2017, now U.S. Pat. No. 11,062,482, which claims priority under 35 U.S.C. § 119(e) from earlier filed U.S. Provisional Application Ser. No. 62/407,108 filed on Oct. 12, 2016 and incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
62407108 | Oct 2016 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 17214597 | Mar 2021 | US
Child | 17984186 | | US
Parent | 15782107 | Oct 2017 | US
Child | 17214597 | | US