The present invention relates to the field of methods and devices for rendering a 3D video from a server to a client. In typical applications the 3D video is rendered to a user by means of a 2D display associated with the client device. The 2D image rendered on the display can correspond to a virtual viewing point of the user in a virtual 3D scene model.
There is an increasing set of applications that require high-end 3D rendering (including games and user interfaces). Existing solutions for 3D rendering are based on different principles. In 3D games or applications, the environment is described as a scene, containing objects in a 3D space. These objects are typically defined as facet structures which comprise for instance triangular facets. These facets are also provided with a predetermined “texture”, either by means of a very simple function (a static colour or gradient), by means of a picture (e.g. a JPEG file), or through more complex expressions of the physical behavior of the surface (e.g. the so-called bidirectional reflectance distributions, BRDF, as described in “Interactive Rendering with Arbitrary BRDFs using Separable Approximations”, Jan Kautz and Michael D. McCool, in the proceedings of the Eurographics Rendering Workshop 1999).
Next to the objects, light sources are defined with specific characteristics (colour, diffusion model, etc.).
Current deployments run both the 3D processing and the 2D processing either in a server node (the resulting 2D output being streamed as video towards a client for rendering on a 2D display associated with the client), or in the home (PlayStation, PC, . . . ) (see
It is an object of the present invention to provide a method according to claim 1, a server according to claim 6, and a routing means according to claim 10, which solve at least one of the above problems.
According to a first aspect of the present invention, a method is disclosed for transmitting a 3D representation of a 3D scene model, the 3D scene model being information defining geometry and material information for a set of 3D objects and light sources and being stored in a server, from the server to a first client device, over a data communication network, wherein the 3D representation corresponds to a virtual viewing point for a first user associated with the first client device, the method comprising:
wherein the representation information further comprises illumination information for the 3D objects, and wherein the illumination information for the 3D objects corresponds to more than one virtual viewing point for a user.
Geometry information is information relating to the (possibly changing) geometry of objects within the 3D scene model.
Material information is information defining the optical characteristics of the (virtual) materials which define the 3D objects. Material information may comprise information which describes the interaction of the respective 3D objects with light. Material information may comprise at least one of colour, light absorption, diffusion, refraction and reflection parameters of objects or parts of objects. Material information may comprise a plurality of or all of these parameters.
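The material information described above can be thought of as a small record of optical parameters per object. The following is a hypothetical sketch of such a record; all field names are illustrative choices, not taken from the source.

```python
from dataclasses import dataclass

# Hypothetical sketch of a material record: each 3D object carries
# parameters describing its interaction with light. Field names are
# illustrative assumptions, not defined by the source text.
@dataclass
class Material:
    colour: tuple            # base RGB colour, each channel in [0, 1]
    absorption: float        # fraction of incoming light absorbed
    diffusion: float         # diffuse (Lambertian) reflection coefficient
    refraction_index: float  # index of refraction for transparent materials
    reflectivity: float      # specular reflection coefficient

# Example: a glass-like material with high transmission, low absorption.
glass = Material(colour=(0.9, 0.9, 1.0), absorption=0.02,
                 diffusion=0.1, refraction_index=1.5, reflectivity=0.08)
```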
Light source information can comprise the number of light sources, the virtual position of the light source in the scene, type of light source, intensity of emitted light, etc.
According to embodiments of the present invention, the illumination information which is part of the representation information is compressed within said compressed representation information.
According to preferred embodiments, compression comprises at least one of temporal and spatial compression. Temporal and spatial compression on themselves are known to the skilled person, and can be found in for instance the area of 3D mesh representations (as described in “Spatially and temporally scalable compression of animated 3D meshes with MPEG-4/FAMC”, N. Stefanoski et al., In proceedings of 15th IEEE International Conference on Image Processing, ICIP, 2008).
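A minimal sketch of the temporal-compression idea, in the spirit of FAMC-style mesh coding: each frame is stored as a quantised delta against the previous reconstructed frame instead of as absolute values. The function names and the quantisation step are assumptions for illustration.

```python
# Temporal compression sketch: animated vertex positions are encoded as
# quantised deltas against the previous (decoder-side) reconstruction,
# which avoids drift between encoder and decoder.
def encode_deltas(frames, step=0.001):
    """frames: list of lists of floats (flattened x, y, z coordinates)."""
    prev = [0.0] * len(frames[0])
    encoded = []
    for frame in frames:
        delta = [round((v - p) / step) for v, p in zip(frame, prev)]
        encoded.append(delta)
        # track the decoder-side reconstruction, not the true values
        prev = [p + d * step for p, d in zip(prev, delta)]
    return encoded

def decode_deltas(encoded, step=0.001):
    prev = [0.0] * len(encoded[0])
    frames = []
    for delta in encoded:
        prev = [p + d * step for p, d in zip(prev, delta)]
        frames.append(prev)
    return frames
```

Spatial compression would additionally exploit redundancy between neighbouring vertices within one frame; the same delta principle applies along the spatial dimension.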
According to preferred embodiments, the method further comprises deriving the illumination information by photon mapping techniques.
The illumination information is, according to embodiments of the present invention, derived by applying photon mapping mechanisms (as for instance described in “Global Illumination Using Photon Maps”, Henrik Wann Jensen, In Proceedings of the Seventh Eurographics Workshop on Rendering) to create colour information related to the illumination of the scene. This includes calculating the effects of the light sources on the scene in terms of reflection, refraction, shadow and diffusion, resulting in colour information for selected points in the scene. This colour information is then stored in one or multiple summarizing photon tables, and these tables are subsequently compressed, preferably both spatially (within the scene representation at a specific point in time) and temporally (referencing the scene representation at different points in time).
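The summarising photon tables mentioned above can be illustrated as follows. This is a toy sketch (names, the voxel-grid binning and the cell size are assumptions): photon hit positions and their colours are binned into a coarse grid, which both “gathers” the photons and reduces the data spatially before any temporal compression is applied.

```python
import math

# Sketch of a summarising photon table: photon hits are binned into a
# coarse voxel grid and each cell stores the average colour of its photons.
def build_photon_table(hits, cell=0.5):
    """hits: list of ((x, y, z), (r, g, b)) photon records.
    Returns a dict mapping voxel index -> averaged colour."""
    buckets = {}
    for pos, colour in hits:
        key = tuple(math.floor(c / cell) for c in pos)
        buckets.setdefault(key, []).append(colour)
    # average the colour channels of all photons gathered per voxel
    return {key: tuple(sum(ch) / len(cols) for ch in zip(*cols))
            for key, cols in buckets.items()}
```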
According to preferred embodiments, the representation information comprises 3D facet structure information of the objects and the illumination information comprises respective illumination information for the facets.
The illumination information can be derived by applying ray tracing mechanisms, as for instance described in “Advanced animation and rendering techniques” chapters 7-9, A. Watt, ISBN 978-0201544121, to transform the scene information into colour information for each of the facets. This includes calculating the effects of the light sources on the scene in terms of reflection, refraction, shadow and diffusion, resulting in colour information for each of the facets. This colour information is then compressed, preferably spatially (within the scene representation at a specific point in time) and temporally (referencing the scene representation at different points in time).
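As a hedged illustration of per-facet colour information, the sketch below shades a single triangular facet with a simple Lambertian (diffuse) model for one point light. The function name and the restriction to a diffuse term are assumptions; a full implementation would also account for reflection, refraction and shadowing as stated above.

```python
import math

# Per-facet illumination sketch: one triangular facet, one point light,
# Lambertian diffuse shading only (an assumption for illustration).
def facet_colour(v0, v1, v2, base_colour, light_pos, intensity=1.0):
    # facet normal from the cross product of two edge vectors
    e1 = [b - a for a, b in zip(v0, v1)]
    e2 = [b - a for a, b in zip(v0, v2)]
    n = [e1[1] * e2[2] - e1[2] * e2[1],
         e1[2] * e2[0] - e1[0] * e2[2],
         e1[0] * e2[1] - e1[1] * e2[0]]
    n_len = math.sqrt(sum(c * c for c in n))
    n = [c / n_len for c in n]
    # direction from the facet centre towards the light
    centre = [(a + b + c) / 3 for a, b, c in zip(v0, v1, v2)]
    to_light = [l - c for l, c in zip(light_pos, centre)]
    l_len = math.sqrt(sum(c * c for c in to_light))
    to_light = [c / l_len for c in to_light]
    # Lambert's cosine law, clamped when the facet faces away from the light
    lambert = max(sum(a * b for a, b in zip(n, to_light)), 0.0)
    return tuple(c * intensity * lambert for c in base_colour)
```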
According to preferred embodiments of the present invention, the representation information is used further on in the data communication network, in order to provide a solution for reducing the number of necessary data flows between a server and a number of client devices being served by the server. The data communication network therefore comprises at least an intermediate node which receives the compressed representation information from the server. The intermediate node derives the required representation information for the first user from the compressed representation information and uses it to create a 2D view on the scene for the first client device. From the same compressed representation information, the intermediate node also derives the required representation information for a second client device associated with a second user, the second user being associated with the same 3D scene model and having a virtual viewing point which is the same as or different from the virtual viewing point of the first user, and for whom the illumination information is also representative for creating a 2D view on the scene from the virtual viewing point of the second user.
According to embodiments of the present invention the second user can be the same as the first user. The second client device and the first client device can then be the same or different.
According to embodiments of the present invention, the first and second client devices are the same, while the first and second users are not the same. Rendering views for a first user and a second user different from the first user can for instance be performed on a single display, whereby each user gets assigned half of the display area.
Each client device can comprise or can be associated to a single display means, as for instance a screen for rendering 2D images/video.
According to a second aspect of the present invention, a server is disclosed for transmitting a 3D representation of a 3D scene model to a client device, the 3D scene model being information defining geometry and material information for a set of 3D objects and light sources, over a data communication network, wherein the 3D representation corresponds to a virtual viewing point for a user associated with the client device, the server being adapted for:
wherein the server is further adapted for deriving representation information which comprises illumination information for the 3D objects, and wherein the illumination information for the 3D objects corresponds to more than one virtual viewing point for a user.
According to preferred embodiments, the server is further adapted for performing at least one of temporal and spatial compression of the representation information.
According to preferred embodiments, the server is further adapted for deriving the illumination information by photon mapping techniques.
According to preferred embodiments, the server is further adapted for deriving representation information by deriving 3D facet structure information of the 3D objects and respective illumination information for the respective facets.
According to a third aspect of the present invention, a routing means, as for instance a router, is disclosed for a communication network, being adapted for receiving compressed representation information from a server according to any of the claims 6 to 9, and further being adapted for:
Further aspects of the present invention are described by the dependent claims. The features from the dependent claims, features of any of the independent claims and any features of other dependent claims may be combined as considered appropriate to the person of ordinary skill, and not only in the particular combinations as defined by the claims.
As would be recognised by the skilled person, features described for one of the aspects of the present invention can also be combined with the other aspects.
The accompanying drawings are used to illustrate embodiments of the present invention.
Reference signs are chosen such that they are the same for similar or equal elements or features in different figures or drawings.
The description of aspects of the present invention is performed by means of particular embodiments and with reference to certain drawings but the invention is not limited thereto. Depicted figures are only schematic and should not be considered as limiting.
According to embodiments of the present disclosure, an intermediate representation of the 3D scene is generated, which is the result of heavy processing to create a realistic illumination. This intermediate representation can be used to create multiple virtual viewpoints by means of a relatively simple process at the client side, which can be run on devices with limited processing capabilities. This makes it possible to run a single processing instance per application, for instance at a high-end server in the cloud, while the 2D view is constructed close to the user or in the home with relatively low-end processing equipment. This is illustrated in
As a result, the transfer of information from the 3D scene can be split into (see
The following sections will detail this process and will give two example algorithms of how the scene can be encoded.
The idea of encoding a 3D animation for improved temporal and spatial compression is expressed in the existing MPEG-4 standard “Frame-based Animated Mesh Compression (FAMC)”, Amendment 2 of part 16 AFX (Animation Framework eXtension). The goal of this and other similar research has been to make it easier to exchange polygon mesh information corresponding to moving objects in a scene; the standard describes compression algorithms for the positions of the points in the mesh and their movement over time.
However, the encoding/decoding of the scene is not the heaviest subprocess of the rendering process: the lighting process still needs to be done for each animated frame. The latter can be performed according to state-of-the-art techniques, but requires a huge amount of processing power.
The efficiency of 3D information transport can be improved with respect to the illumination aspect in different ways. Part of the improvement is a gain due to the re-use of information for multiple viewing points. Within a scene, most of the illumination is independent of the virtual viewing point. More specifically, shadows and intra-scene reflections (i.e. reflections of one object on another) are re-usable across viewing points. When two users request a view on the same scene, re-use of the illumination information can avoid double processing and thus reduce the processing requirements.
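The view-independence argument above can be sketched in code. In the illustration below (all names and the Blinn-Phong specular model are assumptions), the diffuse and shadow terms depend only on the scene and its lights, so they can be computed once and shared across users, while only a cheap specular term is recomputed per virtual viewing point.

```python
# Illustrative split of shading into a view-independent part (diffuse and
# shadow terms, computed once and re-used across viewpoints) and a
# view-dependent specular part recomputed per user.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def view_independent(normal, to_light, base_colour, shadow=1.0):
    # depends only on scene geometry and lights, not on any viewpoint
    lambert = max(dot(normal, to_light), 0.0)
    return tuple(c * lambert * shadow for c in base_colour)

def view_dependent(normal, to_light, to_eye, shininess=32):
    # Blinn-Phong specular term: must be redone per virtual viewing point
    h = [a + b for a, b in zip(to_light, to_eye)]
    norm = sum(c * c for c in h) ** 0.5
    h = [c / norm for c in h]
    return max(dot(normal, h), 0.0) ** shininess

# The diffuse term is computed once for the scene...
diffuse = view_independent((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.8, 0.2, 0.2))
# ...and only the specular term is re-evaluated for each user's viewpoint.
spec_user1 = view_dependent((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0))
```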
A second improvement is due to gain which can be made by exploiting temporal and spatial redundancy in the illumination information, such as:
The reference encoder shown in
Some methods which can be used are described below.
Ray tracing (see
Basic ray tracing algorithms are single-phase techniques, so distributing the processing between a heavy and a lighter process is impossible. However, the result of one trace process for a view can be used as a first prediction for a second view. This makes it possible to use reference information from one stream as a first prediction for another, allowing the streams to be compressed spatially and temporally better than as individual 2D streams.
Next to the missing gain in processing, basic ray tracing is also limited in the representation of certain effects. Some interesting recent approaches use a dual phase: first the impact of the lights on the scene is simulated, after which the ray tracing process uses this information as a better estimate of the colour of a surface. In one popular algorithm, “photon mapping”, the impact of the light on a scene is for instance calculated by simulating a large number of photons that traverse the scene until they are absorbed by an object or leave the scene. The results are summarized (“gathering photons”) and stored in one or multiple so-called “photon maps” (multiple maps can be used to concentrate photons around certain more complex areas of the scene). When performing the basic ray tracing, multiple views and rays can make use of these tables to avoid having to send too many secondary streams. By putting the photon mapping and the ray tracing in two different processes, the complexity can be distributed over the network. The photon maps therefore need to be encoded.
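The two phases described above can be sketched as follows. This is a simplified toy illustration, not the full algorithm: phase 1 scatters photons from a light source into the scene (here, a single floor plane is assumed), and phase 2 estimates the illumination at a query point by gathering the nearby stored photons.

```python
import math
import random

# Two-phase photon-mapping sketch over a toy scene (a floor plane z = 0);
# the scene and all parameters are assumptions for illustration.
def trace_photons(n, light=(0.0, 0.0, 2.0), seed=0):
    """Phase 1: each photon lands at a random point on the floor z = 0."""
    rng = random.Random(seed)
    photons = []
    for _ in range(n):
        hit = (light[0] + rng.uniform(-1, 1),
               light[1] + rng.uniform(-1, 1), 0.0)
        photons.append((hit, 1.0 / n))  # (hit position, carried energy)
    return photons

def gather(photons, point, radius=0.5):
    """Phase 2: density estimate from photons within `radius` of `point`."""
    total = sum(energy for pos, energy in photons
                if math.dist(pos, point) <= radius)
    return total / (math.pi * radius ** 2)
```

Because the photon store produced by phase 1 is view-independent, it is exactly the kind of table that can be encoded once and shipped to intermediate nodes, as described in the surrounding text.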
In the description of certain embodiments according to the present invention, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of aiding in the understanding of one or more of the various inventive aspects. This is not to be interpreted as if all features of the group are necessarily present to solve a particular problem. Inventive aspects may lie in less than all features of such a group of features present in the description of a particular embodiment.
While some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention, and form different embodiments, as would be understood by the skilled person.
Number | Date | Country | Kind
---|---|---|---
10306336.8 | Dec 2010 | EP | regional

Filing Document | Filing Date | Country | Kind | 371(c) Date
---|---|---|---|---
PCT/EP2011/070786 | 11/23/2011 | WO | 00 | 8/12/2013