The present disclosure relates to video processing technology, and more particularly, to a method and a system for providing a viewport division scheme for virtual reality (VR) video streaming.
BACKGROUND OF THE DISCLOSURE
Virtual Reality (VR) is a computer simulation technology for creating and experiencing a virtual world. For example, a three-dimensional real-time image can be presented based on a technology which tracks a user's head, eyes or hands. In the network-based virtual reality technology, full-view video data can be pre-stored on a server, and then transmitted to a display device. A display device can be glasses, a head-mounted display, etc. A video is displayed on the display device according to a user's field of view.
However, high-resolution video data occupies a large amount of transmission bandwidth and requires high computing power from the display device. Presenting high-resolution VR video over the Internet is therefore difficult. In particular, existing video streaming technology cannot meet the requirements of virtual reality. Therefore, in order to present VR video smoothly in real time, it is desirable to improve the existing video streaming technology so as to save bandwidth and reduce the performance requirements for display devices, by providing a new way to encode and store the VR video data on the server.
It is a problem in the art to play back video data on a player end with a better viewing experience through less video data transmission.
In view of this, the present disclosure provides a method and a system for providing a viewport division scheme for virtual reality (VR) video streaming, which determine a viewport block of the VR video data and process the VR video data by different methods for a viewport area and a non-viewport area so as to have a better viewing experience at a player end through less data transmission.
According to one aspect of the present disclosure, there is provided a method for providing a viewport division scheme for VR video streaming, comprising:
obtaining a projection area of VR video;
dividing the projection area into a plurality of grid blocks;
defining a core area in the projection area;
dividing the core area into a plurality of core blocks;
combining the core blocks to provide a plurality of viewport blocks; and
establishing a mapping relation between the plurality of viewport blocks and the plurality of grid blocks, wherein the plurality of viewport blocks are related to a user's field of view.
Preferably, the step of obtaining a projection area of VR video comprises:
projecting the VR video into a rectangle with the aspect ratio of 2:1 by equivalent rectangle matrix to obtain the projection area.
Preferably, the step of dividing the projection area into a plurality of grid blocks comprises:
dividing the projection area into a plurality of equal portions; and
dividing each of the plurality of equal portions into the same number of grid blocks.
Preferably, the method further comprises creating identifications for the plurality of grid blocks, the plurality of core blocks and the viewport, and establishing a mapping relationship between the viewport and the plurality of grid blocks by the identifications.
Preferably, the step of creating identifications for the plurality of grid blocks, the plurality of core blocks and the viewport, and establishing a mapping relationship between the viewport and the plurality of grid blocks by the identifications comprises:
setting the identification $id_{block}$ of a grid block as a positive integer starting from zero, so as to satisfy Equation (1),
$id_{block} \in [0, n-1],\ id_{block} \in N^{+}$ (1);
setting an edge length of the grid block to satisfy Equation (2),
where $height_{video}$ represents an image height, and $N^{+}$ represents a positive integer;
setting the core area to be a set of grid blocks $Set_{core}$ which satisfy Equation (3),
defining a core block to have a width equal to that of the plurality of grid blocks and a height equal to that of the core area, and setting the identification $id_{core\_block}$ of the core block, so as to satisfy Equation (4),
defining a viewport block in the core area to have a width determined by a user's field of view and a height equal to that of the core area, where a width $width_{viewport\_block}$ of the viewport block satisfies Equation (5),
where $fov$ represents the user's field of view;
defining the number of the core blocks in the viewport block to be $i$, which satisfies Equation (6),
for any identification of the viewport block, the identifications of the core blocks in the viewport block can be obtained according to Equation (7),
for any identification of the core block, the identification of the viewport block including the core block can be obtained uniquely according to Equation (8),
$id_{viewport\_block} = f(id_{core\_block}) = id_{core\_block\_1st}$ (8),
where $id_{core\_block\_1st}$ represents the identification of the first core block that is included in the viewport block.
Preferably, the step of dividing the core area into the plurality of core blocks comprises:
dividing the core area into the plurality of core blocks with equal areas.
Preferably, the plurality of core blocks each have a height equal to that of the core area, and the core area has an area equal to an integral multiple of the area of a grid block.
Preferably, the method further comprises dividing the VR video data into a plurality of sub-data according to the plurality of grid blocks, wherein each sub-data corresponds to one of the grid blocks, and determining a mapping relationship between the plurality of sub-data and the viewport according to the mapping relationship between the plurality of grid blocks and the viewport.
Preferably, the method further comprises creating a plurality of viewport blocks according to a user's field of view.
According to another aspect of the present disclosure, there is provided a system for providing a viewport for VR video streaming, comprising:
a configuration module, configured to obtain a projection area of VR video;
a first division module, configured to divide the projection area into a plurality of grid blocks;
a second division module, configured to define a core area in the projection area;
a third division module, configured to divide the core area into a plurality of core blocks;
a combination module, configured to combine the core blocks to provide a plurality of viewport blocks; and
a mapping module, configured to establish a mapping relationship between the plurality of viewport blocks and the plurality of grid blocks, wherein the plurality of viewport blocks are related to a user's field of view.
Preferably, the configuration module projects the VR video into a rectangle with the aspect ratio of 2:1 by equivalent rectangle matrix to obtain the projection area.
In this embodiment, the VR source video data is divided into a plurality of grid blocks, and a mapping table between the viewport blocks and the grid blocks is established according to a specific field of view. When the user's field of view changes, the viewport block is retrieved according to the mapping table, and the video data of the viewport area and of the non-viewport area are processed by different methods to provide a better viewing experience.
The above and other objects, advantages and features of the present disclosure will become more fully understood from the detailed description given hereinbelow in connection with the appended drawings, and wherein:
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. In the drawings, like reference numerals denote like members. The figures are not drawn to scale, for the sake of clarity. Moreover, some well-known parts may not be shown.
In the above embodiment, the VR device 130 is a stand-alone head-mounted device. However, those skilled in the art should understand that the VR device 130 is not limited thereto, and the VR device 130 may also be an all-in-one head-mounted device. The all-in-one head-mounted device itself has a display screen, so that it is not necessary to connect the all-in-one head-mounted device with the external display device. For example, in this example, if the all-in-one head-mounted device is used as the VR device, the display device 120 may be omitted. At this point, the all-in-one head-mounted device is configured to obtain video data from the server 100 and to perform playback operation, and the all-in-one head-mounted device is also configured to detect user's field of view and to adjust the playback operation according to the user's field of view.
In step S10, the server performs source video data processing.
In step S20, the display device side obtains some related information by interacting with the VR device.
In step S30, according to the related information, the display device side requests the server side to provide the video data and receives the video data.
In step S40, the display device side renders the received video data.
Step S10 is used to process the video data stored on the server side. In this step, the video data can be divided into data in a viewport area and data in a non-viewport area, which can then be processed by different methods to reduce data transmission for the non-viewport area. Therefore, the present disclosure further provides a method for providing a viewport division scheme to achieve the above functions.
In step S100, a projection area of VR video is obtained.
Although the VR video data is captured in a 360-degree panoramic view of the real-world environment, a projection process may be included in a VR video production process and a rendering process to change the VR video data to two-dimensional spatial data. The original video data may be subjected to the projection process, for example, an equivalent rectangle matrix process with the aspect ratio of 2:1, to obtain a projection area. In a case that the VR video data has been subjected to the projection process, the projection area can be obtained by decoding the VR video data. Two main VR video formats include the video data with the aspect ratio of 1:1 and the aspect ratio of 2:1, which are well-known in the industry as film sources. In some cases, the server needs to perform a normalization process in view of the user's requirements. For example, the video data with the aspect ratio of 1:1 is processed into the video data with the aspect ratio of 2:1 which is suitable for the user's VR device. In the equivalent rectangle matrix projection, for example, in an equally-spaced cylindrical projection, the meridians can be rendered to be vertical straight lines having a constant pitch, or the latitude circles can be rendered to be horizontal straight lines having a constant pitch. It should be noted that the VR video data here refers to original data and/or intermediate data in any VR processing.
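By way of a non-limiting illustration, the equally-spaced cylindrical projection mentioned above can be sketched in Python as follows. The function name `sphere_to_equirect`, the longitude and latitude ranges, and the top-left image origin are assumptions made for this sketch only and are not taken from the disclosure.

```python
from typing import Tuple


def sphere_to_equirect(lon_deg: float, lat_deg: float,
                       width: int, height: int) -> Tuple[float, float]:
    """Map a direction on the sphere to (x, y) in a 2:1 projection area.

    Assumed conventions (hypothetical, for illustration only):
    longitude in [-180, 180], latitude in [-90, 90], origin at the top-left.
    """
    if width != 2 * height:
        raise ValueError("expected a projection area with aspect ratio 2:1")
    # Meridians become vertical lines with a constant pitch.
    x = (lon_deg + 180.0) / 360.0 * width
    # Latitude circles become horizontal lines with a constant pitch.
    y = (90.0 - lat_deg) / 180.0 * height
    return x, y
```

For a 4096 x 2048 projection area, the direction with longitude 0 and latitude 0 lands at the image center under these conventions.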
In step S200, a core area of the projection area is defined.
As described above, the projection area of the VR video data is obtained in the previous step S100. In this step, the core area of the projection area is defined. The data after being projected is generally distorted. For example, a full-view image may be rendered by an equivalent rectangle matrix projection process to obtain an area with the aspect ratio of 2:1. According to the stretching characteristics of the equivalent rectangle matrix projection, the closer a part of the image is to a pole, the more it is stretched and the higher the degree of pixel distortion. Therefore, a central portion of the projection area, away from the poles, is defined as the core area.
In step S300, a block size is defined according to the parameter N.
The parameter N is a given parameter. In this step, an area of each grid block is obtained according to the N value. For example, assuming that the normalized video data has an area S and N is 32, each grid block has a block size equal to S/32.
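The arithmetic of this step can be sketched in Python as follows. The helper names are hypothetical, and `square_block_edge` additionally assumes that the grid blocks are squares that tile the projection area exactly; it is a sketch under that assumption and not a reproduction of any equation of the disclosure.

```python
import math


def grid_block_area(total_area: float, n: int) -> float:
    """Area of one grid block when an area S is split into n equal blocks."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return total_area / n


def square_block_edge(width: int, height: int, n: int) -> int:
    """Edge length of a square grid block, assuming n squares tile the area."""
    area = width * height
    if n <= 0 or area % n:
        raise ValueError("the area is not divisible into n equal blocks")
    edge = math.isqrt(area // n)
    if edge * edge != area // n or width % edge or height % edge:
        raise ValueError("n square blocks do not tile this projection area")
    return edge
```

For example, a 4096 x 2048 projection area with N equal to 32 gives square blocks with an edge of 512 pixels.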
In step S400, the core area is divided into core blocks.
After the core area is determined, the core area can be divided into M core blocks based on a given value M. The core blocks may be numbered for identifying each core block. Generally, the number starts from 0.
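A minimal Python sketch of this division is given below. It assumes, for illustration only, that the core area is a horizontal band spanning the full width of the projection area and that the M core blocks are vertical strips of equal width; the function name and the rectangle layout `(x, y, width, height)` are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height), assumed layout


def divide_core_area(area_width: int, core_top: int,
                     core_height: int, m: int) -> List[Rect]:
    """Split a full-width core band into m equal strips, numbered from 0."""
    if m <= 0 or area_width % m:
        raise ValueError("the core area is not divisible into m equal blocks")
    strip = area_width // m
    # The list index of each rectangle serves as the core block number.
    return [(k * strip, core_top, strip, core_height) for k in range(m)]
```

With a 4096-pixel-wide projection area, a core band starting at row 512 with a height of 1024, and M equal to 8, core block 1 is the strip starting at x = 512.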
In step S500, the projection area is divided into grid blocks.
In this step, the projection area is divided into a plurality of equal portions, each of which corresponds to a grid block. It should be understood that the grid block here is a portion of the VR video data. After that, each grid block is numbered. Generally, the number starts from 0.
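The division and numbering of this step may be sketched in Python as follows. The row-major numbering order (left to right, then top to bottom) and the function name are assumptions of this sketch; the disclosure only states that the numbering generally starts from 0.

```python
from typing import Dict, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height), assumed layout


def divide_into_grid_blocks(width: int, height: int,
                            cols: int, rows: int) -> Dict[int, Rect]:
    """Divide the projection area into cols x rows equal grid blocks."""
    if cols <= 0 or rows <= 0 or width % cols or height % rows:
        raise ValueError("the projection area is not evenly divisible")
    bw, bh = width // cols, height // rows
    # Assumed numbering: row-major, starting from 0 at the top-left block.
    return {r * cols + c: (c * bw, r * bh, bw, bh)
            for r in range(rows) for c in range(cols)}
```

A 4096 x 2048 projection area divided into 8 columns and 4 rows yields 32 grid blocks of 512 x 512 pixels each.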
In step S600, the core blocks are combined into a viewport block.
The viewport block contains one or more core blocks. When a field of view of the VR device is changed, the viewport block is also changed. Accordingly, the core blocks in the viewport block are also changed. Therefore, the core blocks may be combined into the viewport block according to the user's field of view.
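One possible way to select the core blocks for a given field of view is sketched below in Python. The sketch assumes that the M core blocks evenly cover 360 degrees horizontally, that a yaw of 0 degrees corresponds to the left edge of the projection area, and that the viewport wraps around at that edge. These conventions, and the function name `core_blocks_in_view`, are illustrative assumptions and do not reproduce the equations of the disclosure.

```python
import math
from typing import List


def core_blocks_in_view(yaw_deg: float, fov_deg: float, m: int) -> List[int]:
    """Return the consecutive core blocks covering the field of view."""
    if m <= 0 or not 0 < fov_deg <= 360:
        raise ValueError("need m > 0 and 0 < fov_deg <= 360")
    span = 360.0 / m                          # angle covered by one core block
    left = (yaw_deg - fov_deg / 2.0) % 360    # left edge of the field of view
    first = int(left // span) % m             # core block holding the left edge
    offset = left - first * span              # left edge position inside it
    count = min(m, math.ceil((offset + fov_deg) / span))
    # Consecutive blocks, wrapping around the left/right seam (assumption).
    return [(first + k) % m for k in range(count)]
```

Under these assumptions, with 8 core blocks, a 90-degree field of view centered at a yaw of 90 degrees selects core blocks 1 and 2, and the same field of view centered at 0 degrees wraps around to core blocks 7 and 0.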
In step S700, a mapping table between the viewport blocks and the grid blocks is established.
Based on the viewport block of each VR video data, which is obtained in step S600, the mapping table between the viewport blocks and the grid blocks is established. In this embodiment, the VR source video data is divided into a plurality of grid blocks, and a viewport block is defined according to a specific field of view (fov) to establish the mapping table between the viewport blocks and the grid blocks. When the user's field of view changes, the viewport block changes and is obtained according to the mapping table. The video data is processed by different methods for the viewport area and for the non-viewport area, so as to provide a better viewing experience. Further, the user's field of view is proportional to the area of the viewport block. Multiple viewport blocks can be defined for multiple fields of view (fov). In an alternative embodiment, the grid blocks, the core blocks and the viewport blocks have identifications, and a mapping relationship between the viewport blocks and the grid blocks is established with the identifications.
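A mapping table of this kind may be sketched in Python as a dictionary from viewport block identification to grid block identifications. The sketch assumes row-major grid numbering, core blocks that are one grid column wide, viewport blocks made of consecutive core blocks with wrap-around, and a viewport block identified by its first core block; the function name and parameters are hypothetical.

```python
from typing import Dict, Iterable, List


def build_mapping_table(cols: int, core_rows: Iterable[int],
                        blocks_per_viewport: int) -> Dict[int, List[int]]:
    """Map each viewport block id to the grid block ids it contains."""
    if not 0 < blocks_per_viewport <= cols:
        raise ValueError("blocks_per_viewport must be in [1, cols]")
    core_rows = list(core_rows)
    table: Dict[int, List[int]] = {}
    for first in range(cols):
        # Assumed: a viewport block is a run of adjacent core blocks.
        core_ids = [(first + k) % cols for k in range(blocks_per_viewport)]
        # Each core block covers one grid column across all core rows.
        table[first] = sorted(r * cols + c for r in core_rows for c in core_ids)
    return table
```

When the field of view changes, the player side would only need to look up the new viewport block identification in the table to obtain the grid blocks to request.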
It should be noted that in the above embodiments, the video data is not actually segmented. Therefore, in an optional embodiment of the present disclosure, a segmentation step may also be included in which the video data is divided into a plurality of sub-data according to a plurality of grid blocks which are obtained according to the above embodiment. Each sub-data corresponds to a grid block. According to the above mapping relationship between the viewport blocks and the grid blocks, one can determine the mapping relationship between the plurality of sub-data and the plurality of viewport blocks.
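The optional segmentation step may be sketched in Python as follows, using a frame represented as a list of pixel rows. The frame representation, the row-major grid numbering and both function names are assumptions made for illustration; a practical system would more likely segment encoded video rather than raw pixel lists.

```python
from typing import Dict, Iterable, List

Frame = List[List[int]]  # assumed: one list of pixel values per image row


def segment_frame(frame: Frame, cols: int, rows: int) -> Dict[int, Frame]:
    """Cut a frame into sub-data, one per grid block (row-major ids)."""
    height, width = len(frame), len(frame[0])
    if width % cols or height % rows:
        raise ValueError("the frame is not evenly divisible")
    bw, bh = width // cols, height // rows
    return {r * cols + c: [row[c * bw:(c + 1) * bw]
                           for row in frame[r * bh:(r + 1) * bh]]
            for r in range(rows) for c in range(cols)}


def sub_data_for_viewport(sub_data: Dict[int, Frame],
                          grid_ids: Iterable[int]) -> Dict[int, Frame]:
    """Select the sub-data of the grid blocks mapped to one viewport block."""
    return {g: sub_data[g] for g in grid_ids}
```

Because each sub-data shares the identification of its grid block, the mapping between viewport blocks and sub-data follows directly from the mapping between viewport blocks and grid blocks.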
Under some limitations, mathematical equations can be established based on the identifications of the grid blocks, the core blocks and the viewport blocks, so that the identification of the viewport block can be determined according to the identifications of the grid blocks and/or the core blocks, or the identifications of the core blocks can be determined according to the identification of the viewport block. Details will be described in the following paragraphs.
Assume that an image of the video data has the aspect ratio of 2:1 and is divided into n grid units with the aspect ratio of 1:1, each grid unit being a grid block whose identification $id_{block}$ satisfies Equation (1),
$id_{block} \in [0, n-1],\ id_{block} \in N^{+}$ (1);
An edge length of the grid block satisfies Equation (2),
where $height_{video}$ represents an image height, and $N^{+}$ represents a positive integer;
According to the characteristics of the equivalent rectangle matrix projection, the core area is defined as a set of grid blocks $Set_{core}$ which satisfy Equation (3),
A core block is defined to have a width equal to that of the grid blocks and a height equal to that of the core area. The identification $id_{core\_block}$ of the core block satisfies Equation (4),
A viewport block is defined as a block in the core area whose width is determined by a user's field of view $fov$ and whose height is equal to that of the core area. A width $width_{viewport\_block}$ of the viewport block satisfies Equation (5),
The number of the core blocks in the viewport block is defined as $i$, which satisfies Equation (6),
For any identification of the viewport block, the identifications of the core blocks in the viewport block can be obtained according to Equation (7),
For any identification of the core block, the identification of the viewport block including the core block, which is the identification of the first core block in the viewport block, can be obtained uniquely according to Equation (8),
$id_{viewport\_block} = f(id_{core\_block}) = id_{core\_block\_1st}$ (8),
where $id_{core\_block\_1st}$ represents the identification of the first core block that is included in the viewport block.
In this example, 400 represents a projection area, 401 represents a core area, and N is equal to 32. That is to say, the projection area is divided into 32 grid blocks, which are numbered as 0-31. The core blocks are numbered as 0-7. Assuming that one viewport block (not shown) contains two core blocks which are numbered as 1 and 2, the viewport block contains those grid blocks being numbered as 9-10 and 17-18. Thus, it is possible to establish a mapping relationship between the viewport blocks and the grid blocks. In this case, the identifications of the grid blocks, the core blocks and the viewport blocks may be numbered and be verified by the above equations.
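The numbers of this example can be checked with a short Python sketch. The layout of 8 columns by 4 rows follows from 32 square grid blocks in a 2:1 projection area; the row-major numbering and the choice of the two middle rows as the core area are inferred from the grid block numbers 9-10 and 17-18 and are assumptions of this sketch. The function names are hypothetical.

```python
from typing import Optional, Sequence

COLS, ROWS = 8, 4          # 32 square grid blocks in a 2:1 projection area
CORE_ROWS = range(1, 3)    # assumed: the two middle rows form the core area


def core_block_of(grid_id: int) -> Optional[int]:
    """Return the core block containing a grid block, or None if outside."""
    if not 0 <= grid_id < COLS * ROWS:
        raise ValueError("grid block identification out of range")
    row, col = divmod(grid_id, COLS)   # assumed row-major numbering
    return col if row in CORE_ROWS else None


def viewport_id(core_ids: Sequence[int]) -> int:
    """Per Equation (8): the id of the first core block in the viewport block."""
    return core_ids[0]
```

Under these assumptions, grid blocks 9 and 17 fall in core block 1, grid blocks 10 and 18 fall in core block 2, and the viewport block formed by core blocks 1 and 2 has the identification 1.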
It should be understood that this example is given only for the mapping relationship among the grid blocks, the core blocks and the viewport blocks under ideal conditions. That is to say, in this example, the core blocks and the grid blocks and the viewport blocks overlap each other. However, the present disclosure is not limited to this and can be applied in a case that the grid blocks, the core blocks and the viewport blocks partially overlap each other.
The system includes a configuration module 501, a first division module 503, a second division module 502, a third division module 504, a combination module 505, and a mapping module 506.
The configuration module 501 is configured to obtain a projection area of VR video. In a case that the VR video data has been processed into two-dimensional data, the VR video data can be decoded to obtain the projection area of the video data from a parameter. In another case that the video data is not processed into two-dimensional data, the projection area can be obtained by projecting the VR video into a rectangle by an equivalent rectangle matrix projection process. Two main VR video formats include the video data with the aspect ratio of 1:1 and the aspect ratio of 2:1, which are well-known in the industry as film sources. The original VR video format may be normalized to have the aspect ratio of 2:1, and then further processed.
The first division module 503 is configured to divide the projection area into N grid blocks. For example, the projection area is divided into 32 equal portions. As a result, the 32 grid blocks each are a portion of the projection area with equal size.
The second division module 502 is configured to divide a core area of the projection area according to the user's field of view. The core area is defined in the projection area. For example, for a rectangle, a central portion in the X-axis direction or the Y-axis direction is defined as the core area.
The third division module 504 is configured to divide the core area into a plurality of core blocks. The core area is divided into a plurality of grid units, generally having equal area, but is not limited thereto. For example, if an area of an edge unit is insufficient, the area of the edge unit will be smaller.
The combination module 505 is configured to combine the core blocks into a plurality of viewport blocks. The core blocks are combined according to a predetermined rule. For example, two adjacent core blocks are combined into one viewport block. In another example, a core block at a central point and those core blocks surrounding the central point are combined into one viewport block.
The mapping module 506 is configured to establish a mapping relationship between the viewport blocks and the grid blocks. Each viewport block contains a plurality of core blocks. Each core block may overlap one or more grid blocks. Correspondingly, one or more grid blocks may be retrieved according to the core blocks.
By establishing the mapping relationship between the viewport blocks and the grid blocks, the system can locate the viewport block and improve the viewing experience by optimizing the video data of the viewport block.
Preferably, the above system for providing a viewport for VR video streaming may further comprise a segmentation module (not shown). The segmentation module divides the video data into a plurality of sub-data according to the plurality of grid blocks. Each sub-data corresponds to a grid block. The mapping relationship between the plurality of sub-data and the plurality of viewport blocks can be determined by the mapping relationship between the plurality of viewport blocks and the plurality of grid blocks.
The configuration module projects the VR video into a rectangle with the aspect ratio of 2:1 by equivalent rectangle matrix to obtain the projection area.
The embodiments were chosen and described in order to best explain the principles of the disclosure and its practical applications, to thereby enable others skilled in the art to best utilize the disclosure. The disclosure is intended to cover alternatives, modifications and equivalents that may be included within the spirit and scope of the disclosure as defined by the appended claims.
The foregoing descriptions of specific embodiments of the present disclosure have been presented, but are not intended to limit the disclosure to the precise forms disclosed. It will be readily apparent to one skilled in the art that many modifications and changes may be made in the present disclosure. Any modifications, equivalents, and variations of the preferred embodiments can be made without departing from the doctrine and spirit of the present disclosure.
This application claims the priority and benefit of U.S. provisional application 62/441,936, filed on Jan. 3, 2017, which is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
62441936 | Jan 2017 | US