Digital stereo viewing of still and moving images has become commonplace, and equipment for viewing 3D (three-dimensional) movies is more widely available. Theatres offer 3D movies that are viewed with special glasses which present different images to the left and right eye for each frame of the movie. The same approach has been brought to home use with 3D-capable players and television sets. In practice, the movie consists of two views of the same scene, one for the left eye and one for the right eye. These views have been created by capturing the movie with a special stereo camera that directly produces content suitable for stereo viewing. When the views are presented to the two eyes, the human visual system creates a 3D view of the scene. This technology has the drawback that the viewing area (movie screen or television) occupies only part of the field of vision, and thus the experience of the 3D view is limited.
For a more realistic experience, devices occupying a larger area of the total field of view have been created. Special stereo viewing goggles are available that are meant to be worn on the head so that they cover the eyes and display pictures for the left and right eye with a small screen and lens arrangement. Compared to the fairly large TV sets commonly used for 3D viewing, such technology also has the advantage that it can be used in a small space, and even while on the move. For gaming purposes, there are games that are compatible with such stereo glasses and are able to create the two images required for stereo viewing of the artificial game world, thus creating a 3D view of the internal model of the game scene. The different pictures are rendered in real time from the model, and therefore this approach requires computing power, especially if the game's scene model is complex, very detailed and contains many objects. This synthetic-model-based approach is not applicable to real-world video playback.
There is, therefore, a need for alternative solutions that enable stereo recording and playback, that is, capturing and viewing of 3D images such as 3D video.
Now there has been invented an improved method and technical equipment implementing the method, by which the above problems are alleviated. Various aspects of the invention include a method, an apparatus, a server, a renderer, a data structure and a computer readable medium comprising a computer program stored therein, which are characterized by what is stated in the independent claims. Various embodiments of the invention are disclosed in the dependent claims.
The invention relates to forming a scene model and determining a first group of scene points, the first group of scene points being visible from a rendering viewpoint, determining a second group of scene points, the second group of scene points being at least partially obscured by the first group of scene points viewed from the rendering viewpoint, forming a first render layer using the first group of scene points and a second render layer using the second group of scene points, and providing the first and second render layers for rendering a stereo image. The invention also relates to receiving a first render layer and a second render layer comprising pixels, the first render layer comprising pixels corresponding to first parts of a scene viewed from a rendering viewpoint and the second render layer comprising pixels corresponding to second parts of the scene viewed from the rendering viewpoint, wherein the second parts of the scene are obscured by the first parts viewed from the rendering viewpoint, placing pixels of the first render layer and pixels of the second render layer in a rendering space, associating a depth value with the pixels, and rendering a stereo image using said pixels and said depth values. The first render layer therefore comprises pixels that represent those parts of the scene that are directly visible from a viewpoint and have e.g. been captured by a first camera. The second render layer and further render layers comprise pixels that represent those parts of the scene that are obscured behind one or more objects. The data for the further render layers may have been captured by further cameras placed in different locations from the first camera.
According to a first aspect, there is provided a method, comprising forming a scene model using first image data from a first source image and second image data from a second source image, said scene model comprising scene points, each scene point having a location in a coordinate space of said scene, determining a first group of scene points, said first group of scene points being visible from a viewing point, said viewing point having a location in said coordinate space of said scene, determining a second group of scene points, said second group of scene points being at least partially obscured by said first group of scene points viewed from said viewing point, forming a first render layer using said first group of scene points and a second render layer using said second group of scene points, said first and second render layer comprising pixels, and providing said first and second render layers for rendering a stereo image.
According to an embodiment, the method comprises determining a third group of scene points, said third group of scene points being at least partially obstructed by said second group of scene points viewed from said viewing point, forming a third render layer using said third group of scene points, said third render layer comprising pixels, and providing said third render layer for rendering a stereo image. According to an embodiment, said second render layer is a sparse layer comprising active pixels corresponding to scene points at least partially obstructed by said first group of scene points. According to an embodiment, the method comprises forming dummy pixels in said second render layer, said dummy pixels not corresponding to scene points, and encoding said second render layer into a data structure using an image encoder. According to an embodiment, the method comprises encoding said render layers into one or more encoded data structures using an image encoder. According to an embodiment, forming said scene model comprises determining a three-dimensional location for said scene points by utilizing depth information for said source images. According to an embodiment, forming said scene model comprises using camera position of said source images and comparing image contents of said source images. According to an embodiment, the method comprises forming one or more of said render layers to a two-dimensional image data structure, said image data structure comprising render layer pixels. According to an embodiment, render layer pixels comprise color values and a transparency value such as an alpha value. According to an embodiment, the method comprises forming data of at least two of said render layers into a collated image data structure, said collated image data structure comprising at least two segments, each segment corresponding to a respective render layer.
According to a second aspect, there is provided a method comprising receiving a first render layer and a second render layer, said first and second render layer comprising pixels, said first render layer comprising pixels corresponding to first parts of a scene viewed from a rendering viewpoint and said second render layer comprising pixels corresponding to second parts of said scene viewed from said rendering viewpoint, wherein said second parts of said scene are obscured by said first parts viewed from said rendering viewpoint, placing pixels of said first render layer and pixels of said second render layer in a rendering space, associating a depth value with said pixels, and rendering a left eye image and a right eye image using said pixels and said depth values.
According to an embodiment, said pixels of said first render layer and said second render layer comprise colour values and at least pixels of said first render layer comprise transparency values such as alpha values for rendering transparency of at least pixels of said first render layer. According to an embodiment, the method comprises determining whether a render layer to be rendered comprises semitransparent pixels, and in case said determining indicates a render layer comprises semitransparent pixels, enabling alpha blending in rendering of said render layer, otherwise disabling alpha blending in rendering said render layer. According to an embodiment, the method comprises receiving said first render layer and said second render layer from a data structure comprising pixel values as a two-dimensional image, determining colour values for said pixels of said first and second render layers by using texture mapping. According to an embodiment, the method comprises receiving said first render layer and said second render layer from a data structure comprising pixel values as a two-dimensional image, and determining depth values for said pixels of said first and second render layers by using texture mapping, said depth values indicating a distance from a rendering viewpoint. According to an embodiment, the method comprises receiving said first render layer and said second render layer from a data structure comprising pixel values as a two-dimensional image, and determining viewing angle values for said pixels of said first and second render layers by using texture mapping.
According to a third aspect, there is provided an apparatus for carrying out the method according to the first aspect and/or its embodiments.
According to a fourth aspect, there is provided an apparatus for carrying out the method according to the second aspect and/or its embodiments.
According to a fifth aspect, there is provided a system for carrying out the method according to the first aspect and/or its embodiments.
According to a sixth aspect, there is provided a system for carrying out the method according to the second aspect and/or its embodiments.
According to a seventh aspect, there is provided a computer program product for carrying out the method according to the first aspect and/or its embodiments.
According to an eighth aspect, there is provided a computer program product for carrying out the method according to the second aspect and/or its embodiments.
In the following, various embodiments of the invention will be described in more detail with reference to the appended drawings, in which
Figs. 1a, 1b, 1c and 1d show a setup for forming a stereo image to a user;
In the following, several embodiments of the invention will be described in the context of stereo viewing with 3D glasses. It is to be noted, however, that the invention is not limited to any specific display technology. In fact, the different embodiments have applications in any environment where stereo viewing is required, for example movies and television. Additionally, while the description may use a camera setup as an example of an image source, different camera setups and image source arrangements can be used. It needs to be understood that the features of various embodiments may appear alone or in combination. Thus, even though the different features and embodiments have been described one by one, their combination has inherently also been disclosed herein.
In the setup of
In
In
The system of
As explained above, a single camera device may comprise a plurality of cameras and/or a plurality of microphones. A plurality of camera devices placed at different locations may also be used, where a single camera device may comprise one or more cameras. The camera devices and their cameras may in this manner be able to capture image data of the objects in the scene in a more comprehensive manner than a single camera device. For example, if there is a second object hidden behind a first object when the objects are viewed from a certain viewpoint of a first camera device or a first camera, the second object may be visible from another viewpoint of a second camera device or a second camera. Thus, image data of the second object may be gathered e.g. for producing a 3D view where a part of the second object is partially visible from behind the first object to one eye but not the other. To produce unified picture data from two or more cameras, the picture data from different cameras needs to be combined. Also, the different objects in the scene may be determined by analyzing the data from different cameras. This may allow the determination of the three-dimensional location of objects in the scene.
Alternatively or in addition to the video capture device SRC1 creating an image stream, or a plurality of such, one or more sources SRC2 of synthetic images may be present in the system. Such a source of synthetic images may use a computer model of a virtual world to compute the various image streams it transmits. For example, the source SRC2 may compute N video streams corresponding to N virtual cameras located at a virtual viewing position. When such a synthetic set of video streams is used for viewing, the viewer may see a three-dimensional virtual world, as explained earlier for
There may be a storage, processing and data stream serving network in addition to the capture device SRC1. For example, there may be a server SERV or a plurality of servers storing the output from the capture device SRC1 or computation device SRC2. The device comprises or is functionally connected to a computer processor PROC3 and memory MEM3, the memory comprising computer program PROGR3 code for controlling the server. The server may be connected by a wired or wireless network connection, or both, to sources SRC1 and/or SRC2, as well as the viewer devices VIEWER1 and VIEWER2 over the communication interface COMM3.
For viewing the captured or created video content, there may be one or more viewer devices VIEWER1 and VIEWER2. These devices may have a rendering module and a display module, or these functionalities may be combined in a single device. The devices may comprise or be functionally connected to a computer processor PROC4 and memory MEM4, the memory comprising computer program PROGR4 code for controlling the viewing devices. The viewer (playback) devices may consist of a data stream receiver for receiving a video data stream from a server and for decoding the video data stream. The data stream may be received over a network connection through communications interface COMM4, or from a memory device MEM6 like a memory card CARD2. The viewer devices may have a graphics processing unit for processing of the data to a suitable format for viewing as described with
The system described above may function as follows. Time-synchronized video, audio and orientation data is first recorded with the cameras of one or more camera devices. This can consist of multiple concurrent video and audio streams as described above. These are then transmitted immediately or later to the storage and processing network for processing and conversion into a format suitable for subsequent delivery to playback devices. The conversion can involve post-processing steps to the audio and video data in order to improve the quality and/or reduce the quantity of the data while preserving the quality at a desired level. Finally, each playback device receives a stream of the data from the network or from a storage device, and renders it into a stereo viewing reproduction of the original location which can be experienced by a user with the head mounted display and headphones.
Image data may be captured from a real scene using multiple cameras at different locations. Pairs of cameras may be used to create estimates of depth for every point matching in both images. The point estimates are mapped into a common origin and orientation, and duplicate entries are removed by comparing their colour and position values. The points are then arranged into render layers, or layers as a shorter expression, based on their order of visibility from a render viewpoint.
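As an illustration of this depth estimation step, the following sketch uses the OpenCV library to compute a disparity map for a rectified camera pair and convert it to depth values; the function and parameter choices are illustrative assumptions, not part of the method itself.

```python
import numpy as np
import cv2  # assumption: OpenCV is available for stereo block matching

def estimate_depth(left_gray, right_gray, focal_length_px, baseline_m):
    """Estimate per-pixel depth from a rectified stereo pair (illustrative sketch)."""
    matcher = cv2.StereoSGBM_create(minDisparity=0,
                                    numDisparities=128,  # must be divisible by 16
                                    blockSize=5)
    # StereoSGBM returns fixed-point disparities scaled by 16
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    depth = np.full(disparity.shape, np.inf, dtype=np.float32)
    valid = disparity > 0
    # Pinhole stereo relation: depth = focal_length * baseline / disparity
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth, valid
```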
The top layer is typically not sparse, and contains an entry for every point of the scene viewed from the origin (the render viewpoint). Each obscured pixel is moved into a sparse subsidiary layer, with one or more sparse layers created as necessary to store the recorded data and to represent the view in sufficient detail. In addition, synthetic data can be generated into the sparse layers surrounding the recorded data in order to avoid later problems with visible holes when rendering.
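The arrangement of scene points into a full top layer and sparse subsidiary layers can be sketched as follows; the point and layer representations used here are simplifying assumptions made only for illustration.

```python
from collections import defaultdict

def build_render_layers(scene_points, project_to_pixel):
    """Group scene points into render layers by their order of visibility.

    scene_points: iterable of (position, colour, depth) tuples, with depth
                  measured from the render viewpoint (the common origin).
    project_to_pixel: function mapping a 3D position to an (x, y) pixel
                      of the top-layer image.
    """
    buckets = defaultdict(list)
    for position, colour, depth in scene_points:
        buckets[project_to_pixel(position)].append((depth, colour, position))

    layers = defaultdict(dict)  # layer index -> {pixel: (depth, colour, position)}
    for pixel, candidates in buckets.items():
        candidates.sort(key=lambda c: c[0])  # nearest point first
        for layer_index, candidate in enumerate(candidates):
            # Layer 0 becomes the full top layer; layers 1..N stay sparse,
            # holding only the points obscured at this pixel.
            layers[layer_index][pixel] = candidate
    return layers
```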
The layers may be represented as two-dimensional images, the images having pixels, and the pixels having associated color and depth values. The layers may be mapped to the rendering space via a coordinate transformation and e.g. by using texture operations of a graphics processor to interpolate colour and depth values of the pixels.
Each moment in time may be encoded with a new set of layers and mapping parameters, to allow time-based playback of changes in the 3D environment. New layer data and mapping metadata are taken into use for each new frame. Alternatively, time-based playback can be paused and a single frame can be used and rendered from different positions.
Alternatively, synthetic video sources in a virtual reality model may be used for creating images for stereo viewing. One or more virtual camera devices, possibly comprising a plurality of cameras, are positioned in the virtual world of the movie. The action taking place may be captured by the computer into video streams corresponding to the virtual cameras of the virtual camera device (corresponding to so-called multi-view video where a user may switch viewpoints). Alternatively, a single camera location may be used as the viewing point. In other words, the content delivered to a player may be generated synthetically in the same way as for a conventional 3D film, however including multiple camera views (more than 2), and multiple audio streams allowing a realistic audio signal to be created for each viewer orientation. In practical terms, the internal three-dimensional (moving) model of the virtual world is used to compute the image source images. Rendering the different objects results in an image captured by a camera, and the computations are carried out for each camera (one or more cameras). The virtual cameras do not obstruct each other in the same manner as real cameras, because virtual cameras can be made invisible in the virtual world. The image data for the render layers may be generated from a complex synthetic model (such as a CGI film content model) using processing by a graphics processor or a general purpose processor to render the world from a single viewpoint into the layer format, with a predetermined number of obscured pixels (a predetermined number of obscured pixel layers) being stored in subsidiary layers.
At least 3 overlapping images are needed in order to estimate the position of some objects which are obscured in just one of the images by another object. This then gives 2 layers of information (first objects visible from the render viewpoint and objects hidden behind the first objects). For objects which are obscured in all but one image, rough position estimates can be made by extrapolating from the position of nearby similar known objects.
Multiple images may be captured at different times from different positions by the same camera. In this case the camera position will need to be measured using another sensor, or using information about the change in position of reference objects in the scene. In this case, objects in the scene should be static.
Alternatively, multiple images can be captured using multiple cameras simultaneously in time, each with a known or pre-calibrated relative position and orientation to a reference point. In this case objects in the scene, or the camera system itself, need not be static. With this approach it is possible to create sequences of layers for each moment in time matching the moments when each set of images was captured.
Another technique for creating point data for render layers is to use sensors employing a “time of flight” technique to measure the exact time taken for a pulse of light (from a laser or LED) to travel from the measuring device, off the object, and back to the measuring device. Such a sensor should be co-located and calibrated with a normal colour image sensor with the same calibration requirements as the multiple image technique, such that each pixel can be given an estimated colour and position in space relative to the camera. However, with only one pair of such sensors, only a single layer of data can be generated. At least two such pairs covering the same scene would be needed in order to generate two layers (to estimate positions for some objects obscured in the other pair). An additional pair may be used for each additional layer.
A related technique with similar restrictions is to use a “lidar” scanner in place of the time-of-flight sensor. This typically scans a laser beam over the scene and measures the phase or amplitude of the reflected light, to create an accurate estimate of distance. Again additional pairs of lidar+image sensors may be used to generate each additional layer.
As explained earlier, a number of points P1, . . . , PN and PX1, PX2 in
Alternatively, a circular mapping function may be used to map the spherical coordinates into 2D Cartesian coordinates. These mapping functions produce a circular image where every x and y value pair can be mapped back to spherical coordinates. The functions map the angle from the optical axis (theta) to the distance of a point from the image circle center (r). For every point, the angle around the optical axis (phi) stays the same in spherical coordinates and in the mapped image circle. The relation between the x and y coordinates and r and phi in the mapped image circle is the following: x=x0+r*cos(phi), y=y0+r*sin(phi), where the point (x0,y0) is the center of the image circle.
An example of such a mapping function is the equisolid mapping, which is commonly used in fisheye lenses. The equisolid mapping depends on the focal length (f) of the lens and is the following: r=2*f*sin(theta/2). Thus, for a point that is in the center of the optical axis (theta is 0), r becomes zero and the mapped point is also in the center of the image circle. For a point that is on a vector perpendicular to the optical axis (theta is 90 degrees), r becomes 1.41*f and the point in the image circle can be calculated as follows: x=x0+1.41*f*cos(phi), y=y0+1.41*f*sin(phi). The x and y can be scaled with constant multipliers to convert the coordinates to pixels in the target resolution. Other mapping functions may be stereographic (r=2*f*tan(theta/2)), equidistant (r=f*theta) and orthographic (r=f*sin(theta)).
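The mapping functions above can be written compactly as follows; this is a sketch in which the constant multipliers for scaling to the target pixel resolution are left out.

```python
import math

def equisolid(theta, f):      return 2.0 * f * math.sin(theta / 2.0)
def stereographic(theta, f):  return 2.0 * f * math.tan(theta / 2.0)
def equidistant(theta, f):    return f * theta
def orthographic(theta, f):   return f * math.sin(theta)

def to_image_circle(theta, phi, f, x0, y0, mapping=equisolid):
    """Map spherical coordinates (theta, phi) to image circle coordinates (x, y)."""
    r = mapping(theta, f)
    return x0 + r * math.cos(phi), y0 + r * math.sin(phi)

# Example: a point perpendicular to the optical axis (theta = 90 degrees)
x, y = to_image_circle(math.pi / 2, 0.0, f=1.0, x0=0.0, y0=0.0)  # r is about 1.41 * f
```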
Each layer may fully cover the space around the camera (that is, without holes, in a continuous way), such as RENDER LAYER 1 in
In addition, the encoding of the layers may allow for scaling of rendering complexity, or reducing the delivered data quantity, while still giving good reproduction of the scene. One approach to this is to pack all layers into a 2D image, with increasingly distant sub-layers located further along one axis, for example along the increasing y axis (down). When less rendering is required, the lower data is simply not delivered, or not decoded/processed, with only the top layer and possibly a limited sub-set of the sub-layers being used.
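A minimal sketch of this packing scheme, assuming that all layers have the same width and that the pixel data is held in NumPy arrays (an assumption made only for this illustration):

```python
import numpy as np

def pack_layers(layers):
    """Stack the top layer and its sub-layers along the y axis into one 2D image."""
    return np.vstack(layers)  # layer 0 on top, increasingly distant sub-layers below

def unpack_layers(packed, layer_height, num_layers_to_use=1):
    """Decode only the top layer, and optionally some sub-layers, to scale rendering complexity."""
    return packed[:layer_height * num_layers_to_use]
```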
The invention may allow recording, distribution and reproduction of a complex 3D environment with a level of physically realistic behaviour that has not previously been possible other than with a large data processing capacity rendering a fully synthetic scene. This may improve on earlier reproduction techniques based on multiple images from different viewpoints by greatly reducing the amount of data that needs to be delivered for a particular image resolution, due to the use of the render layer structures.
In
It needs to be noticed here that a pixel of a render layer may represent a different size of an object in the render space. A pixel that is far away from the viewpoint (has a large depth value) may represent a larger object than a pixel closer to the viewpoint. This is because the render layer pixels may originally represent a certain spatial “cone” and the image content in that “cone”. Depending on how far the base of the cone is, the pixel represents a differently sized area in space. The render layers may be aligned for rendering in such a manner that the pixel grids are essentially in alignment on top of each other when viewed from the render viewpoint.
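Put differently, each render layer pixel subtends an approximately fixed angle from the render viewpoint, so the spatial size it represents grows with its depth value. A rough illustration, under the assumption of a pinhole-style projection with a uniform angular pixel size:

```python
import math

def pixel_footprint(depth, horizontal_fov_deg=90.0, image_width_px=1920):
    """Approximate width, in scene units, of the area covered by one pixel at a given depth."""
    pixel_angle = math.radians(horizontal_fov_deg) / image_width_px
    return 2.0 * depth * math.tan(pixel_angle / 2.0)  # grows roughly linearly with depth
```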
For transforming the render layers to render space, they may need to be rotated. An example of a rotational transformation Rx of coordinates around the x-axis by an angle γ (also known as pitch angle) is defined by a rotational matrix
In a similar manner rotations Ry (for yaw) and Rz (for roll) around the different axes can be formed. As a general rotation, a matrix multiplication of the three rotations by R=Rx Ry Rz can be formed. This rotation matrix can then be used to multiply any vector in a first coordinate system according to v2=R v1 to obtain the vector in the destination coordinate system.
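The rotation matrices referred to above (the matrix for Rx is not reproduced in the text) take the standard form; the following NumPy sketch builds them and applies the combined rotation v2 = R v1:

```python
import numpy as np

def rot_x(pitch):
    """Rotation about the x axis by the pitch angle (gamma)."""
    c, s = np.cos(pitch), np.sin(pitch)
    return np.array([[1, 0, 0],
                     [0, c, -s],
                     [0, s,  c]])

def rot_y(yaw):
    """Rotation about the y axis by the yaw angle."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[ c, 0, s],
                     [ 0, 1, 0],
                     [-s, 0, c]])

def rot_z(roll):
    """Rotation about the z axis by the roll angle."""
    c, s = np.cos(roll), np.sin(roll)
    return np.array([[c, -s, 0],
                     [s,  c, 0],
                     [0,  0, 1]])

# General rotation R = Rx Ry Rz, applied to a vector in the first coordinate system
R = rot_x(0.1) @ rot_y(0.2) @ rot_z(0.0)
v2 = R @ np.array([1.0, 0.0, 0.0])
```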
As an example of rotations, when the user turns his head (there is rotation represented by pitch, yaw and roll values), the head orientation of the user may be determined to obtain a new head orientation. This may happen e.g. so that there is a head movement detector in the head-mounted display. When the new head orientation has been determined, the orientation of the view and the location of the virtual eyes may be recomputed so that the rendered images match the new head orientation.
As another example, a correction of a head-mounted camera orientation is explained. A technique used here is to record the capture device orientation and use the orientation information to correct the orientation of the view presented to the user, effectively cancelling out the rotation of the capture device during playback, so that the user is in control of the viewing direction, not the capture device. If the viewer instead wishes to experience the original motion of the capture device, the correction may be disabled. If the viewer wishes to experience a less extreme version of the original motion, the correction can be applied dynamically with a filter so that the original motion is followed but more slowly or with smaller deviations from the normal orientation.
For a frame to be displayed, layers can be rendered in multiple render passes, starting from opaque layers and ending with layers containing semitransparent areas. Finally a separate post-processing render pass can be done to interpolate values for empty pixels if needed.
During rendering, the graphics processing (such as OpenGL) depth test is enabled to discard occluded fragments, and the depth buffer is enabled for writing. Alpha blending is enabled during rendering if the rendered layer contains semitransparent areas, otherwise it is disabled. The scene geometry contains a large number of unconnected vertices (GL_POINT) which each correspond to one pixel in the stored render layer data. Depending on the layer storage format, a vertex can have a different number of attributes. Vertex attributes are e.g. position (x, y, z), colour, or a texture coordinate pointing to the actual layer image data.
OpenGL vertex and fragment processing is explained next as an example. Other rendering technologies may also be used in a similar manner.
Vertex and fragment processing may be slightly different for different layer storage formats. The steps to process a layer stored in an uncompressed list format may be as follows (per vertex), with a sketch of the corresponding per-vertex math given after the list:
1. Initially all vertices are allocated and passed to vertex processing stage with their attributes including view angle, colour, and depth relative to common origin (the render viewpoint). If the processed layer has semitransparent content, vertices must be sorted according to their depth values.
2. (Yaw, pitch, depth) representation of the vertex is converted into 3d Cartesian vector (x, y, z).
3. Camera and world transformations are applied to the vertex by multiplying it with corresponding matrices.
4. Vertex colour attribute is passed to fragment processing stage.
5. The final vertex coordinate is written to the output variable (gl_Position).
6. At the fragment processing stage, colour data received from vertex processing is written directly into the output variable (gl_FragColor).
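The per-vertex math of steps 2 and 3 can be sketched outside the shader as follows; this is a NumPy illustration of the same computation, and the angle convention chosen here (yaw about the y axis, pitch about the x axis, view direction along the negative z axis) is an assumption rather than a requirement.

```python
import numpy as np

def yaw_pitch_depth_to_cartesian(yaw, pitch, depth):
    """Step 2: convert a (yaw, pitch, depth) vertex into a 3D Cartesian position.

    Assumed convention: yaw about the y axis, pitch about the x axis,
    with (yaw, pitch) = (0, 0) pointing along the negative z axis.
    """
    x = depth * np.cos(pitch) * np.sin(yaw)
    y = depth * np.sin(pitch)
    z = -depth * np.cos(pitch) * np.cos(yaw)
    return np.array([x, y, z, 1.0])  # homogeneous coordinates

def transform_vertex(vertex_h, world_matrix, camera_matrix):
    """Step 3: apply the world and camera transformations (4x4 matrices)."""
    return camera_matrix @ (world_matrix @ vertex_h)
```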
The steps to process a layer stored in a compressed image format, that is, the render layers comprising pixels with pixel colour data and depth values, may be as follows (per vertex):
1. Initially all vertices are allocated evenly around the scene having same depth value.
2. If a vertex is not inside the viewer's current field of view, a transform function is applied in order to position it inside the current field of view. A purpose of this transform is to initially concentrate all available vertices into currently visible area.
Otherwise the pixel data that is represented by that vertex would be clipped out during rendering at the fragment processing stage. Avoiding clipping in this case improves rendering quality. The position transformation can be done in a way that vertices outside the field of view get distributed evenly inside the field of view. For example, if the field of view is horizontally from 0 degrees to 90 degrees, a vertex which is originally located horizontally at direction 91 degrees would then be transformed into a horizontal position at 1 degree. Similarly, vertices from horizontal positions at 91 degrees to 180 degrees would be transformed into the 1 to 90 degree range horizontally. Vertical positions can be calculated in the same way. To avoid transformed vertices getting into precisely the same position as other vertices that are already inside the field of view, a small constant fraction (e.g. in this example case 0.25 pixels) can be added to the new vertex position value. A sketch of this wrap-around transform is given after the list.
3. Texture coordinate for vertex colour data is calculated from transformed vertex position and it is passed to fragment processing stage.
4. A depth value is fetched for the vertex using a texture lookup from a texture.
5. View angles for vertex are calculated using a mapping function.
6. (Yaw, pitch, depth) representation of the vertex is converted into a 3D Cartesian vector (x, y, z).
7. Camera and world transformations are applied to the vertex by multiplying it with corresponding matrices.
8. Pixel resolution causes small rounding errors in the final vertex position; this can be taken into account by calculating the (sub-pixel) rounding error and passing it to the fragment processing stage.
9. The final vertex coordinate is written to the shader output variable (gl_Position).
10. At the fragment processing stage, colour data is retrieved from the colour texture using the received texture coordinate and taking into account the sub-pixel rounding error value in order to interpolate a more suitable colour value using the surrounding points (this is not possible with the uncompressed list format). The colour value is then written into the output variable (gl_FragColor).
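The wrap-around position transform of step 2 can be sketched as follows; the field of view follows the example given in that step, and the simple modulo fold and the angular offset used here are illustrative simplifications (the original example applies the 0.25 offset in pixel units).

```python
def wrap_into_fov(angle_deg, fov_start=0.0, fov_end=90.0, offset_deg=0.25):
    """Fold an angle lying outside [fov_start, fov_end] back into the field of view.

    Vertices outside the field of view are distributed inside it, with a small
    constant offset so that they do not land exactly on vertices already inside.
    """
    if fov_start <= angle_deg <= fov_end:
        return angle_deg  # already inside the field of view, keep as is
    span = fov_end - fov_start
    folded = fov_start + ((angle_deg - fov_start) % span)
    return folded + offset_deg  # e.g. 91 degrees folds to 1 degree plus the offset
```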
The source pixels may be aligned during rendering in such a manner that a first pixel from a first render layer and a second pixel from a second render layer are registered on top of each other by adjusting their position in space by a sub-pixel amount. Depending on the storage format of the render layers, the vertices (pixels) may first be aligned to a kind of virtual grid (steps 1 and 2, in the “compressed” image format), or not. The vertices may finally be aligned/positioned in the steps where the camera and world transformations are applied after fetching the correct depth and transforming and mapping the coordinates (step 7). It needs to be understood that the alignment may happen in another phase as well, or as a separate step of its own.
A third group of scene points may also be determined, the third group of scene points being at least partially obscured by the second group of scene points viewed from the render viewing point. Then, a third render layer may be formed using the third group of scene points, the third render layer comprising pixels, and the third render layer may be provided for rendering a stereo image.
The second render layer may be a sparse layer comprising active pixels corresponding to scene points at least partially obstructed by the first group of scene points. Also, the third render layer may be a sparse layer. Because pixels may be “missing” in some sparse layers, dummy pixels may be formed in the second render layer, where the dummy pixels do not correspond to any real scene points. This may be done so that the second render layer can be encoded into a data structure using an image encoder. The render layers may be encoded into one or more encoded data structures using an image encoder, for the purpose of storing and/or transmitting the render layer data. For example, a file with a data structure comprising the render layers may be created. One or more of the render layers may be formed into a two-dimensional image data structure, the image data structure comprising render layer pixels. The render layer pixels may comprise color values and a transparency value such as an alpha value. Data of at least two of the render layers may be formed into a collated image data structure, as explained earlier, the collated image data structure comprising at least two segments, each segment corresponding to a respective render layer.
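A minimal sketch of filling a sparse layer with dummy pixels before passing it to a standard image encoder; the NumPy representation and the choice of dummy value are illustrative assumptions.

```python
import numpy as np

def fill_dummy_pixels(colour, active_mask, dummy_colour=(0, 0, 0)):
    """Replace inactive pixels of a sparse render layer with dummy values.

    colour:      H x W x 3 array of pixel colour values.
    active_mask: H x W boolean array, True where the pixel corresponds to a scene point.
    The dummy pixels carry no scene information; the mask (or an alpha channel)
    is stored alongside the encoded image so that the renderer can ignore them.
    """
    filled = colour.copy()
    filled[~active_mask] = dummy_colour
    return filled
```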
Forming the scene model may comprise determining a three-dimensional location for said scene points by utilizing depth information for said source images. Forming the scene model may comprise using camera position of said source images and comparing image contents of said source images, as has been explained earlier.
The pixels of the first render layer and the second render layer may comprise colour values and at least pixels of the first render layer may comprise transparency values such as alpha values for rendering transparency of at least pixels of the first render layer. To make this transparency processing more efficient, it may be determined whether a render layer to be rendered comprises semitransparent pixels, and in case the determining indicates that the render layer does comprise semitransparent pixels, alpha blending is enabled in rendering of the render layer, otherwise alpha blending is disabled in rendering the render layer.
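With OpenGL-style rendering this decision could be sketched as follows, assuming the PyOpenGL bindings; this is a minimal illustration rather than the actual renderer.

```python
from OpenGL.GL import (glEnable, glDisable, glBlendFunc,
                       GL_BLEND, GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)

def configure_blending(layer_has_semitransparent_pixels):
    """Enable alpha blending only for render layers that contain semitransparent pixels."""
    if layer_has_semitransparent_pixels:
        glEnable(GL_BLEND)
        glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)
    else:
        glDisable(GL_BLEND)
```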
The first render layer and the second render layer may be received from a data structure comprising pixel values as a two-dimensional image. For example, the render layers may be stored in image data format in an image file, or otherwise represented in a data structure (e.g. in the computer memory) in a two-dimensional format. The colour values for the pixels of the first and second render layers may be determined by texture mapping, using the data in the data structure and mapping the colour values from the data structure to the rendering space with the help of the texture processing capabilities of graphics rendering systems (such as OpenGL graphics accelerators).
In a similar manner, the first render layer and the second render layer may be received from a data structure comprising pixel values as a two-dimensional image, and depth values for the pixels of the first and second render layers may be determined by using texture mapping, where the depth values indicate a distance from a rendering viewpoint. That is, the depth data may also be stored or transmitted in an image-like data structure corresponding to the colour values of the render layers.
For the purpose of rendering light reflections and shading, the render layers may comprise information of viewing angle values for the pixels of the render layer. The first render layer and the second render layer may be received from a data structure comprising pixel values as a two-dimensional image, and the viewing angle values may be determined from these pixel values for the pixels of the first and second render layers by using texture mapping. Such determining of the viewing angle values may, for example, happen by using a so-called “bump mapping” capability of a graphics processor. In such a method, the angle of orientation of pixels is calculated using a texture, and the reflection of light from light sources by pixels depends on this angle of orientation. In other words, for the purpose of computing the image to be displayed, the pixels may have a surface normal having another direction than towards the viewer.
In
The different render layers may have their own image data structures, or the render layers may be combined together to one or more images. For example, an image may have a segment for the first render layer data, another segment for the second render layer data, and so on. The image may be compressed using conventional image compression technologies.
The various embodiments of the invention may be implemented with the help of computer program code that resides in a memory and causes the relevant apparatuses to carry out the invention. For example, a device may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the device to carry out the features of an embodiment. Yet further, a network device like a server may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the network device to carry out the features of an embodiment.
It is obvious that the present invention is not limited solely to the above-presented embodiments, but it can be modified within the scope of the appended claims.