This application is the U.S. National Stage, under 35 U.S.C. § 371, of International Application No. PCT/EP2019/066577 filed Jun. 24, 2019, which claims the benefit of European Application No. EP18305827.0 filed Jun. 28, 2018, the contents of which are incorporated herein by reference.
The present disclosure relates generally to the streaming of tile-based immersive videos (such as spherical videos, so-called Virtual Reality (VR) 360° videos, or panoramic videos) to an end device through a delivery network.
This section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present disclosure that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Spherical videos offer an immersive experience wherein a user can look around using a VR head-mounted display (HMD) or can navigate freely within a scene on a flat display by controlling the viewport with a controlling apparatus (such as a mouse or a remote control).
Such freedom of spatial navigation requires that the whole 360° scene be delivered to a player (embedded within the HMD or TV set) configured to extract the video portion to be visualized depending on the position of the observer's aiming point within the scene. In such a situation, a high throughput is necessary to deliver the video.
Therefore, one main issue lies in the efficient transmission of spherical videos over a bandwidth-constrained network with an acceptable quality of immersive experience (i.e. avoiding frozen screens, blockiness, black screens, etc.). Currently, for delivering a spherical video service in streaming, a compromise is sought between the quality of the immersive experience, the resolution of the video, and the available throughput of the content delivery network.
The majority of known solutions for streaming spherical videos provide the full 360° scene to the end device, even though less than 10% of the whole scene is presented to the user. Since delivery networks have limited throughput, the video resolution is decreased to meet bandwidth constraints.
Other known solutions mitigate the degradation of the video quality by reducing the resolution of the portion of the 360° scene arranged outside the current viewport of the end device. Nevertheless, when the viewport of the end device is moved, upon a user's action, to a lower-resolution area, the displayed video suffers a sudden degradation.
Besides, when the targeted usage requires that the displayed video always be at the best quality, it precludes solutions based on a transitional degradation of resolution while the user's aiming point is varying. Consequently, the delivered video must cover a part of the scene large enough to allow the user to pan without risking the disastrous display of a black area due to a lack of video data. This part of the scene can for instance include the area currently viewed (i.e. the viewport or aiming point) and the surrounding region, to prevent quality degradation when the user moves the viewport. This can be achieved by spatially tiling the scene of the immersive video with a set of tiles and temporally dividing the immersive video into a plurality of video segments defined by a plurality of tile segments, a tile covering a portion of a scene of the immersive video and a tile segment being associated with a tile of the set of tiles. One or more relevant tile segments of the immersive video (corresponding to the tile(s) comprising the viewport and its surroundings) are delivered to a player.
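By way of illustration only, the following minimal Python sketch models such a spatio-temporal tiling and builds the list of tile segments covering a viewport and a surrounding margin. All names and the segment-naming scheme are hypothetical and not part of the disclosure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tile:
    row: int
    col: int

def tiles_for_viewport(viewport_row, viewport_col, n_rows, n_cols, margin=1):
    """Return the tiles covering the viewport plus a surrounding margin.

    Rows are clamped at the poles; columns wrap around the 360° horizon.
    """
    tiles = set()
    for dr in range(-margin, margin + 1):
        row = min(max(viewport_row + dr, 0), n_rows - 1)
        for dc in range(-margin, margin + 1):
            col = (viewport_col + dc) % n_cols  # horizontal wrap-around
            tiles.add(Tile(row, col))
    return sorted(tiles, key=lambda t: (t.row, t.col))

# Example: tile segment URLs for temporal segment index 42 (hypothetical naming)
segment_urls = [f"video/seg42_tile_{t.row}_{t.col}.mp4"
                for t in tiles_for_viewport(3, 7, n_rows=8, n_cols=16)]
```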
However, while the user navigates within the scene, new tile(s) can be needed to display the viewport in order to react to changes in the eye's direction. The player then requests the corresponding tiles that match the current field of view.
The present disclosure has been devised with the foregoing in mind.
According to one or more embodiments, there is provided a terminal configured to receive an immersive video spatially tiled with a set of tiles, a tile covering a portion of a scene of the immersive video, comprising at least one processor configured for:
According to one or more embodiments, there is further provided a method configured to be implemented at a terminal adapted to receive an immersive video spatially tiled with a set of tiles, a tile covering a portion of a scene of the immersive video, comprising:
According to one or more embodiments, there is provided a computer program product at least one of downloadable from a communication network and recorded on a non-transitory computer readable medium readable by at least one of a computer and executable by a processor, comprising program code instructions for implementing a method configured to be implemented at a terminal adapted to receive an immersive video spatially tiled with a set of tiles, a tile covering a portion of a scene of the immersive video, said method comprising:
According to one or more embodiments, there is provided a non-transitory program storage device, readable by a computer, tangibly embodying a program of instructions executable by the computer to perform a method configured to be implemented at a terminal adapted to receive an immersive video spatially tiled with a set of tiles, a tile covering a portion of a scene of the immersive video, said method comprising:
The methods according to the one or more embodiments may be implemented in software on a programmable apparatus. They may be implemented solely in hardware or in software, or in a combination thereof.
Some processes implemented by elements of the one or more embodiments may be computer implemented. Accordingly, such elements may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as “circuit”, “module” or “system”. Furthermore, such elements may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
Since elements can be implemented in software, some aspects of the embodiments can be embodied as computer readable code for provision to a programmable apparatus on any suitable carrier medium. A tangible carrier medium may comprise a storage medium such as a floppy disk, a CD-ROM, a hard disk drive, a magnetic tape device or a solid state memory device and the like.
The one or more embodiments thus provide a computer-readable program comprising computer-executable instructions to enable a computer to perform above mentioned method.
Certain aspects commensurate in scope with the disclosed embodiments are set forth below. It should be understood that these aspects are presented merely to provide the reader with a brief summary of certain forms the one or more embodiments might take and that these aspects are not intended to limit the scope of the disclosure. Indeed, the disclosure may encompass a variety of aspects that may not be set forth below.
The disclosure will be better understood and illustrated by means of the following embodiments and execution examples, which are in no way limitative, with reference to the appended figures on which:
Wherever possible, the same reference numerals will be used throughout the figures to refer to the same or like parts.
The following description illustrates some embodiments. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody some aspects of the embodiments and are included within their scope.
All examples and conditional language recited herein are intended for educational purposes to aid the reader in understanding the embodiments and are to be construed as being without limitation to such specifically recited examples and conditions.
Moreover, all statements herein reciting embodiments, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying some aspects of the embodiments. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, read only memory (ROM) for storing software, random access memory (RAM), and nonvolatile storage.
In the claims hereof, any element expressed as a means and/or module for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.
In addition, it is to be understood that the figures and descriptions of the present disclosure have been simplified to illustrate elements that are relevant for a clear understanding of the present embodiments, while eliminating, for purposes of clarity, many other elements found in typical digital multimedia content delivery methods, devices and systems. However, because such elements are well known in the art, a detailed discussion of such elements is not provided herein. Embodiments herein are directed to all such variations and modifications known to those skilled in the art.
Embodiments are depicted with regard to a streaming environment to deliver an immersive or large-field of view video (e.g. a spherical video, a panoramic video, etc.) to a client terminal through a delivery network.
As shown in the exemplary embodiment of
The client terminal 100 may wish to launch a streaming session for requesting a tile-based immersive video stored on the content server 200. The server 200 is then configured to stream segments of the tile-based immersive video to the client terminal 100, upon the client's request, using a streaming protocol.
To that end, a list of tiles appearing partially or totally in a field of view of a user should be obtained in order to be requested by the client terminal 100 from the server 200.
In one embodiment, as illustrated in
To obtain the list of tiles to be requested by the client terminal, an intersection of the pyramid of vision 300 with tiles 600 covering the scene 400 is computed as described hereinafter.
As shown in the example of
As an example, the client terminal 100 is a portable media device, a mobile phone, a tablet or a laptop, a head mounted device, a TV set, a set-top box or the like. Naturally, the client terminal 100 might not comprise a complete video player, but only some sub-elements such as the ones for decoding the media content and might rely upon an external means to display the decoded content to the end user.
As shown in the embodiment of
Tiling of the Immersive Video
According to an exemplary embodiment shown in the
According to an embodiment, the server 200 (e.g. via its processor(s) 204 and/or its content generator 206) is configured to operate a tiling of the spherical video with a set of tiles in an orthogonal system R(O,x,y,z) of axes x,y,z (illustrated in
As shown in the examples of
In an embodiment, the size of the tiles can be defined to be large enough to allow a variation of the focusing point without being forced to instantaneously obtain and decode another tile. In particular, in the following, it is assumed that one tile delivered to the terminal 100 can cover at least the part of the scene 400 to be displayed through a viewport VP associated with the client terminal 100 requesting the immersive video, as shown in
While not necessary, it is further assumed that an overlap exists between consecutive tiles of the set of tiles. In addition, while a tile of rectangular shape has been illustrated in
Definition of Surfaces Bounding Tiles
As shown in
In particular, a reference tile T0 is defined by a centroid CE0 arranged on the z axis and having the polar coordinates (0,0,1). φtile and θtile respectively define the maximum horizontal amplitude and the maximum vertical amplitude of the tile T0. (φ, θ) define the polar coordinates of a point P belonging to the mesh associated with the tile T0. In addition, the cartesian coordinates (x, y, z) of a point P of tile T0 can be defined as follows:

x=sin(φ)*cos(θ)

y=sin(θ)

z=cos(φ)*cos(θ)
The minimum value zmin, over all points P of the mesh of tile T0, of the coordinate on the z axis corresponds to the minimum of cos(φ)*cos(θ), which is reached when φ=φtile and θ=θtile, so that:
zmin=cos(φtile)*cos(θtile)
As shown in
For any ray intersecting the mesh at a point Pi (xi, yi, zi), the corresponding point Pint of the bounding surface S0, located in the plane z=zmin, is given by:

Pint=λi*Pi

wherein λi=zmin/zi, λi∈[zmin, 1].
The maximum value of the coordinates (xint, yint) of a point Pint (for any ray intersecting the mesh at a point Pi (xi, yi, zi)) is for instance given by:

max(xint)=zmin*max(xi/zi)=zmin*tan(φtile)

max(yint)=zmin*max(yi/zi)=zmin*tan(θtile)/cos(φtile)

so that one can derive the maximum cartesian coordinates of point Pint:

xmax=sin(φtile)*cos(θtile)

ymax=sin(θtile)
From the foregoing, it can be derived that the vertices {A0, B0, C0, D0} of the surface S0 have the following cartesian coordinates:

A0=(−xmax, ymax, zmin)

B0=(xmax, ymax, zmin)

C0=(xmax, −ymax, zmin)

D0=(−xmax, −ymax, zmin)
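As an illustration, a short Python sketch of the above derivation follows. The assignment of the labels A0 to D0 to the four corners is an assumption, and the tile amplitudes are hypothetical; angles are in radians.

```python
import math

def reference_surface(phi_tile, theta_tile):
    """Vertices {A0, B0, C0, D0} of the plane surface S0 bounding the
    reference tile T0, whose centroid lies on the z axis."""
    z_min = math.cos(phi_tile) * math.cos(theta_tile)
    x_max = math.sin(phi_tile) * math.cos(theta_tile)  # z_min * tan(phi_tile)
    y_max = math.sin(theta_tile)          # z_min * tan(theta_tile) / cos(phi_tile)
    a0 = (-x_max,  y_max, z_min)
    b0 = ( x_max,  y_max, z_min)
    c0 = ( x_max, -y_max, z_min)
    d0 = (-x_max, -y_max, z_min)
    return a0, b0, c0, d0

S0 = reference_surface(math.radians(30), math.radians(20))
```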
Once the coordinates of the surface S0 of the reference tile T0 have been determined, the surface Sj bounding any other tile Tj of the set can be derived by applying a rotation Rot(φj,θj,ψj) to the reference surface:

Sj={Aj,Bj,Cj,Dj}=Rot(φj,θj,ψj)*{A0,B0,C0,D0}

wherein Rot(φj,θj,ψj) is a matrix product which can be defined by:

Rot(φj,θj,ψj)=Rot(z,ψj)×Rot(x,θj)×Rot(y,φj)
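A possible Python sketch of this rotation is given below. The sign conventions of the elementary rotation matrices are assumptions, as the disclosure does not fix them.

```python
import numpy as np

def rot_y(phi):    # rotation around the y axis (horizontal angle)
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_x(theta):  # rotation around the x axis (vertical angle)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_z(psi):    # rotation around the z axis (roll)
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def tile_surface(phi_j, theta_j, psi_j, s0):
    """Vertices {Aj, Bj, Cj, Dj} of surface Sj, obtained by rotating the
    reference surface S0 with Rot(z, psi_j) x Rot(x, theta_j) x Rot(y, phi_j)."""
    rot = rot_z(psi_j) @ rot_x(theta_j) @ rot_y(phi_j)
    return [rot @ np.asarray(v, dtype=float) for v in s0]
```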
Sampling the Pyramid of Vision
Besides, the pyramid of vision 300 shown in
In this position, a sampling of the rays Ri defining the pyramid of vision 300 is performed to limit the number of rays Ri to N rays. The intersections of the sampled rays Ri with the base 301 of the pyramid define N points PR_i, arranged such that they have a constant horizontal angular distance Δα and a constant vertical angular distance Δα, as illustrated in
Then, for each point PR_i of the base 301, a vector OPi (from the origin O to the point PR_i) is defined and normalized to obtain a direction vector Di. A list of N unitary direction vectors Di is then built.
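For illustration, a Python sketch of this sampling follows, assuming the same angular parametrization as used above for the tiles; the field-of-view values in the example are hypothetical.

```python
import numpy as np

def sampled_directions(h_fov, v_fov, delta_alpha):
    """Unit direction vectors D_i toward the points PR_i of the pyramid base,
    spaced by a constant angular step delta_alpha (angles in radians)."""
    dirs = []
    h_angles = np.arange(-h_fov / 2, h_fov / 2 + 1e-9, delta_alpha)
    v_angles = np.arange(-v_fov / 2, v_fov / 2 + 1e-9, delta_alpha)
    for phi in h_angles:        # horizontal angle of the ray
        for theta in v_angles:  # vertical angle of the ray
            d = np.array([np.sin(phi) * np.cos(theta),
                          np.sin(theta),
                          np.cos(phi) * np.cos(theta)])
            dirs.append(d / np.linalg.norm(d))  # already unit length; normalized defensively
    return np.array(dirs)       # shape (N, 3), one direction vector per row

D = sampled_directions(np.radians(100), np.radians(80), np.radians(5))
```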
The eye's direction in the starting position, defined by a vector D, coincides with the z axis (0,0,1). A change of the eye's direction can be defined by polar coordinates (φc, θc), wherein φc represents the horizontal angle between the eye's direction and the z axis, and θc corresponds to the vertical angle between the eye's direction and the y axis. After a change of the eye's direction (according to a rotation (φc, θc) of the initial direction vector D of coordinates (0,0,1)), the new eye-direction vector D′ can be obtained by applying a rotation matrix Rot(φc, θc), so that:

D′=Rot(φc,θc)×D

wherein Rot(φc, θc) can be defined as the product of two rotation matrices around the y axis and the x axis, i.e. Rot(φc, θc)=Rotx(θc)×Roty(φc).
Thus, when the eye's direction changes, a new set of sampled direction vectors Di (called D′i), with i belonging to [0, . . . , N−1], is computed. The new coordinates of the N direction vectors D′i can be obtained from the following equation:

D′i=Rot(φc,θc)×Di
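A minimal, self-contained Python sketch of this update follows; the sign conventions of the inlined rotation matrices are assumptions.

```python
import numpy as np

def rotate_directions(dirs, phi_c, theta_c):
    """New direction vectors D'_i = Rot(phi_c, theta_c) x D_i, with
    Rot(phi_c, theta_c) = Rot_x(theta_c) x Rot_y(phi_c)."""
    cp, sp = np.cos(phi_c), np.sin(phi_c)
    ct, st = np.cos(theta_c), np.sin(theta_c)
    r_y = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    r_x = np.array([[1, 0, 0], [0, ct, -st], [0, st, ct]])
    rot = r_x @ r_y
    return dirs @ rot.T  # dirs has shape (N, 3), one ray per row
```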
Algorithm for Selection of Tiles
For each direction D′i, with i belonging to [0, . . . , N−1], the client terminal 100 can check (e.g. via its processor(s) 105) whether the considered direction D′i (associated with a ray Ri) intercepts one or more surfaces Sj delimited by vertices {Aj,Bj,Cj,Dj}, wherein j is an integer belonging to [0, . . . , M−1] (M corresponding to the number of candidate surfaces Sj which have been preselected as defined hereinafter). Determining the interception of a direction D′i with a surface Sj amounts to solving the equation of the intersection of a vector with a plane, which can be achieved by using mathematical libraries. The number of equations to solve is then N×M.
To that end, a list of candidate tiles containing the M candidate surfaces Sj is established and a list of counters per tile is initially set to zero. Each time a direction D′i hits a candidate surface Sj amongst the M surfaces, the counter associated with the corresponding tile is incremented.
The intersection of the N vectors D′i (the N rays Ri) with all the M surfaces Sj of the candidate list is determined. Then, the proportion of intersection of the pyramid of vision 300 with a surface Sj of the M preselected surfaces is obtained from the corresponding counter (i.e. the higher the value of a counter, the higher the proportion of intersection).
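A possible Python sketch of this counting procedure is given below. The ray/plane intersection and point-in-rectangle test shown are a standard textbook approach, assumed here rather than taken from the disclosure; vertices are assumed to be given in consecutive order around the rectangle.

```python
import numpy as np

def ray_hits_surface(d, vertices):
    """True if the ray from the origin with unit direction d crosses the
    planar rectangle {A, B, C, D} (vertices in consecutive order)."""
    a, b, _, dv = [np.asarray(v, dtype=float) for v in vertices]
    normal = np.cross(b - a, dv - a)   # plane normal of the surface
    denom = normal @ d
    if abs(denom) < 1e-12:             # ray parallel to the plane
        return False
    t = (normal @ a) / denom           # ray/plane intersection parameter
    if t <= 0:                         # plane is behind the viewer
        return False
    p = t * d                          # intersection point on the plane
    u, v = b - a, dv - a               # the two edge directions of the rectangle
    pu, pv = (p - a) @ u, (p - a) @ v
    return 0 <= pu <= u @ u and 0 <= pv <= v @ v

def count_hits(directions, surfaces):
    """Counter per candidate tile: number of sampled rays hitting its surface."""
    return [sum(ray_hits_surface(d, s) for d in directions) for s in surfaces]
```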
To select (or pre-filter) the M candidate tiles amongst the T tiles of the set of tiles covering the scene 400 (assuming that the current eye's direction D in polar coordinates is (φ, θ)), one embodiment consists in considering only the tiles whose centroid's vertical angle is close to θ and/or whose centroid's horizontal angle is close to φ.
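As a sketch, assuming tiles are described by their centroid angles and using an arbitrary angular threshold max_delta (a tunable assumption, since the disclosure does not quantify "close"), the pre-filtering could read:

```python
import math

def prefilter_tiles(tiles, phi_eye, theta_eye, max_delta=math.radians(60)):
    """Keep as candidates only the tiles whose centroid angles (phi_j, theta_j)
    are close to the current eye direction (phi_eye, theta_eye).

    `tiles` is a list of (tile_id, phi_j, theta_j) triples.
    """
    def ang_diff(a, b):  # smallest absolute angular difference, wrap-aware
        return abs((a - b + math.pi) % (2 * math.pi) - math.pi)

    return [t for t in tiles
            if ang_diff(t[1], phi_eye) <= max_delta
            and ang_diff(t[2], theta_eye) <= max_delta]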
It should be understood that the intersection of the pyramid of vision 300 with the M candidate tiles is determined each time a change of eye's direction is detected.
In particular, to determine one intersection of one ray Ri (associated with a vector D′i) of the pyramid 300 with one surface Sj, the following operations can be implemented:
It should be understood that the determination of the rays intersecting a surface Sj relies on an approximation. As shown in
It should further be noted that, instead of considering a surface Sj having a rectangular shape, in another embodiment a surface Sj having a trapezoidal shape (e.g. formed by two or more contiguous trapezia) is used. In particular, as shown in the example of
Algorithm for Asking Tiles to the Server
Once the intersections have been determined for the M selected tiles, an intersection list comprising the tiles Tj, among the M selected tiles, for which the number of hitting rays is positive (i.e. the corresponding counter is different from 0) is established. In the intersection list, the tiles Tj are listed in decreasing order of their counter value.
In one embodiment, the client terminal can request all the tiles of the intersection list. In another embodiment, the number of tiles which can be requested from the server can take into account the traffic size Sizej (i.e. corresponding to the bandwidth required to transmit a tile segment associated with a considered tile between the server and the client, wherein Sizej can depend on the size of the tiles) and the maximum available bandwidth on the network N. In that case, the tiles are selected by considering the intersection list (tiles being considered in decreasing order) and their traffic size Sizej, so that the sum of the Sizej of the selected tiles is less than the maximum available bandwidth on the network. In a further variant, when tiles have a same counter value (e.g. a same number of rays hits the tiles), the selection of tiles can depend on the distance between the center I of the base 301 of the pyramid of vision 300 and the centroids CEj of the tiles Tj, the selection being performed by considering first the tile with the smallest distance.
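One possible greedy reading of this selection rule, in Python; the tile identifiers and size values in the example are hypothetical, and ties are assumed to be already broken by centroid distance when building the sorted list.

```python
def select_tiles(intersection_list, sizes, max_bandwidth):
    """Greedy selection of tiles to request.

    `intersection_list`: tile ids sorted by decreasing counter value;
    `sizes[tile]` is the traffic size Size_j of the associated tile segment;
    the sum of the selected sizes stays below `max_bandwidth`.
    """
    selected, budget = [], max_bandwidth
    for tile in intersection_list:
        if sizes[tile] <= budget:
            selected.append(tile)
            budget -= sizes[tile]
    return selected

# Example with hypothetical numbers (sizes in Mbps):
tiles_sorted = ["T3", "T4", "T2", "T7"]
sizes = {"T3": 6.0, "T4": 5.5, "T2": 4.0, "T7": 4.0}
print(select_tiles(tiles_sorted, sizes, max_bandwidth=16.0))  # ['T3', 'T4', 'T2']
```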
As shown in
In particular, in a step 701, the client terminal 100 can determine a surface S0 of a reference tile T0 and then the surfaces of the remaining tiles 600 by applying a rotation matrix to the reference surface S0.
In a step 702, the client terminal 100 can perform a sampling of the rays forming the pyramid of vision 300 by selecting rays Ri (i∈[0, . . . , N−1]) so that the intersections of the sampled rays Ri with the base 301 of the pyramid 300 define N points PR_i presenting a constant angular distance Δα, horizontally and vertically, with each other.
In a step 703, the client terminal 100 can associate a direction vector Di with each sampled ray Ri. When a change of the user's eye direction is detected (i.e. the orientation of the pyramid of vision is modified), N new direction vectors D′i associated with the corresponding sampled rays Ri of the newly oriented pyramid of vision 300 are obtained.
In a step 704, the client terminal 100 can operate a prefiltering of M tiles amongst the T tiles of the set of tiles.
In a step 705, the client terminal 100 can determine, for each of the M surfaces Sj of the candidate tiles Tj, the proportion of their intersection with the pyramid of vision 300 (i.e. for each prefiltered tile Tj, the number of rays Ri hitting the corresponding surface Sj).
In a step 706, the client terminal 100 can sort the prefiltered tiles Tj in descending order of their proportion of intersection with the pyramid of vision 300 (i.e. the tile having the largest intersection with the pyramid of vision coming first).
In a step 707, the client terminal 100 can select one or more prefiltered tiles Tj so as to comply with a bandwidth criterion (e.g. the sum of the traffic sizes associated with the selected prefiltered tiles is less than or equal to the maximum available bandwidth of the network N).
In a step 708, the client terminal 100 can request the one or more selected tiles from the content server 200.
It should be understood that, when a change of user's eyes direction is detected, steps 703 to 708 can be repeated.
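Tying steps 703 to 708 together, a hypothetical Python driver could look as follows; it reuses the helper names introduced in the sketches above (rotate_directions, prefilter_tiles, count_hits, select_tiles), all of which are assumptions rather than elements of the disclosure.

```python
def on_eye_direction_changed(phi_c, theta_c, state):
    """Steps 703-708: recompute ray directions, prefilter tiles, count
    intersections, sort, select under the bandwidth budget and request.

    `state` bundles precomputed data (surfaces from step 701, sampled base
    directions from step 702), the tile list, segment sizes and bandwidth.
    """
    dirs = rotate_directions(state.base_directions, phi_c, theta_c)       # 703
    candidates = prefilter_tiles(state.tiles, phi_c, theta_c)             # 704
    counters = count_hits(dirs, [state.surfaces[t[0]] for t in candidates])  # 705
    pairs = sorted(zip(counters, candidates), key=lambda ct: ct[0], reverse=True)
    ranked = [tile[0] for cnt, tile in pairs if cnt > 0]                  # 706
    selected = select_tiles(ranked, state.sizes, state.max_bandwidth)     # 707
    state.request(selected)                                               # 708
```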
At least some of the described embodiments make the selection of tiles much more robust and accurate, notably by triggering the downloading of tiles sooner than other approaches, as soon as an "edge" of a tile appears in the field of view. They also allow downloading, in priority, the tiles contributing most to the field of view (i.e. the tiles having a larger zone of intersection with the pyramid of vision). It should also be noted that some computations are massively parallelizable on a GPU, making the selection efficient and fast. Indeed, some of the described embodiments can be implemented in hardware or software, using a CPU or a GPU. As the N×M equations to be computed are totally independent of each other, the selection step is fully parallelizable, for instance on a GPU (e.g. each equation can then be solved by a single GPU thread).
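For illustration, a vectorized NumPy sketch of the N×M intersection computation is given below; since each (ray, surface) equation is independent, the same structure maps directly to one GPU thread per equation. The rectangle description (anchor vertex plus two edge vectors) is an assumption consistent with the earlier sketches.

```python
import numpy as np

def count_hits_vectorized(directions, normals, anchors, edges_u, edges_v):
    """Solve all N x M ray/plane intersections at once.

    directions: (N, 3) unit ray vectors; for each of the M surfaces,
    normals[j], anchors[j] (vertex A_j), edges_u[j] (B_j - A_j) and
    edges_v[j] (D_j - A_j) describe the rectangle.
    Returns the counter per tile, shape (M,).
    """
    denom = directions @ normals.T                       # (N, M)
    with np.errstate(divide="ignore", invalid="ignore"):
        t = np.einsum("md,md->m", normals, anchors) / denom  # ray parameters (N, M)
    points = t[..., None] * directions[:, None, :]       # (N, M, 3) hit points
    rel = points - anchors[None, :, :]
    pu = np.einsum("nmd,md->nm", rel, edges_u)           # projections on the edges
    pv = np.einsum("nmd,md->nm", rel, edges_v)
    inside = ((t > 0) & np.isfinite(t)
              & (pu >= 0) & (pu <= np.einsum("md,md->m", edges_u, edges_u))
              & (pv >= 0) & (pv <= np.einsum("md,md->m", edges_v, edges_v)))
    return inside.sum(axis=0)
```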
References disclosed in the description, the claims and the drawings may be provided independently or in any appropriate combination. Features may, where appropriate, be implemented in hardware, software, or a combination of the two.
Reference herein to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one implementation of the method and device described. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments necessarily mutually exclusive of other embodiments.
Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims.
Although only certain embodiments of the disclosure have been described herein, it will be understood by any person skilled in the art that other modifications and variations of the disclosure are possible. Such modifications and variations are therefore to be considered as falling within the spirit and scope of the disclosure and hence forming part of the disclosure as herein described and/or exemplified.
The flowchart and/or block diagrams in the Figures illustrate the configuration, operation and functionality of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, or blocks may be executed in an alternative order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of the blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. While not explicitly described, the present embodiments may be employed in any combination or sub-combination.