The present disclosure generally relates to image rendering and more particularly to image rendering using point cloud techniques.
Volumetric video capture is a technique that allows moving images, often of real scenes, to be captured in a way that they can later be viewed from any angle. This is very different from regular camera capture, which is limited to capturing images of people and objects from a particular angle only. In addition, volumetric video capture records scenes in a three-dimensional (3D) space. Consequently, the acquired data can be used to establish immersive experiences that are either real or, alternatively, generated by a computer. With the growing popularity of virtual, augmented and mixed reality environments, volumetric video capture techniques are also growing in popularity. This is because the technique combines the visual quality of photography with the immersion and interactivity of spatialized content. The technique is complex and combines many of the recent advancements in the fields of computer graphics, optics, and data processing.
Volumetric visual data is typically captured from real world objects or provided through the use of computer generated tools. One popular method of providing a common representation of such objects is through the use of a point cloud. A point cloud is a set of data points in space that represent a three dimensional (3D) shape or object. Each point has its own set of X, Y and Z coordinates. Point cloud compression (PCC) is a way of compressing volumetric visual data. A subgroup of MPEG (Moving Picture Experts Group) works on the development of PCC standards. MPEG PCC requirements for point cloud representation require view-dependent attributes per 3D position. A patch, or to some extent the points of a point cloud, is viewed according to the viewer angle. However, viewing any 3D object in a scene from different angles may require modification of different attributes (e.g. color or texture), because certain visual aspects are a function of the viewing angle. For example, the properties of light can impact the rendering of an object, because the viewing angle can change its color and shading depending on the material of the object. This is because texture can be dependent on the incident light wavelength. Unfortunately, the current prior art does not provide realistic views of objects under all conditions and angles. Modulating attributes according to the viewer angle for a captured or even scanned image does not always provide a faithful rendition of the original content. Part of the problem is that, even when the preferred viewer angle is known when rendering the image, the camera settings and the angle that were used to capture the image as they relate to 3D attributes are not always documented in a way that makes a realistic rendering possible at a later time, and 3D point cloud attributes can become uncertain at some viewing angles. Consequently, techniques are needed to address these shortcomings of the prior art when rendering views and images that are realistic.
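As a purely illustrative aid (not part of any standard or of the claimed method), the following Python sketch shows a point cloud stored as parallel arrays of positions and a single color attribute per point; with only one attribute value per point, there is nothing for a renderer to modulate as the viewing angle changes, which is exactly the limitation discussed above.

```python
import numpy as np

# Hypothetical point cloud held as parallel arrays: each point carries X, Y, Z
# coordinates plus a single RGB color. With one color per point, the same
# value is rendered from every viewing angle.
positions = np.array([
    [0.0, 0.0, 0.0],   # X, Y, Z of point 0
    [0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0],
    [0.0, 0.0, 0.1],
])
colors = np.array([
    [255,   0,   0],   # one view-independent RGB per point
    [  0, 255,   0],
    [  0,   0, 255],
    [255, 255,   0],
], dtype=np.uint8)

assert positions.shape == colors.shape == (4, 3)
```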
In one embodiment, a method and device are provided for rendering an image. The method comprises receiving an image from at least two different camera positions and determining a camera orientation and at least one image attribute associated with each of the positions. A model of the image is then generated based on the attributes and camera orientations associated with the received camera positions of the image. The model is enabled to provide a virtual rendering of the image at a plurality of viewing orientations and to selectively provide the appropriate attributes associated with each viewing orientation.
In another embodiment, a decoder and an encoder are provided. The decoder has means for decoding, from a bitstream, one or more attribute data sets, each having at least an associated position corresponding to an attribute capture viewpoint. The decoder also has a processor configured to reconstruct a point cloud from the bitstream using all of the received attributes, and to provide a rendering from the point cloud. The encoder can encode the model and the rendering.
The teachings of the present disclosure can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.
Images are captured and presented in two dimensions such as the one provided in
In some embodiments, as will be discussed, both the PCC and MPEG standards are used. The MPEG PCC requirements for point cloud representation require view-dependent attributes per 3D position. A patch, for example as specified in V-PCC (FDIS ISO/IEC 23090-5, MPEG-I part 5), or to some extent the points of a point cloud, is viewed according to the viewer angle. However, viewing a 3D object in a scene, represented as a point cloud, from different angles may show different attribute values (e.g. color or texture) as a function of the viewing angle. This is due to the properties of the material composing the object. For example, the reflection of light on the surface (isotropic, non-isotropic, etc.) can change the way the image is rendered. Properties of light in general impact the rendering, as the reflection from the surfaces of an object depends on the incident light wavelength.
The prior art does not provide solutions that allow faithfully modulating rendition attributes according to the viewer angle, for either captured or scanned material under different viewpoints, because the camera settings and angles used to capture each 3D attribute are not documented in most cases, and 3D attributes become uncertain from certain angles.
In addition, when using the PCC and MPEG standards, the view-dependent attributes do not address 3D graphics as intended, despite the tiling, volumetric SEI and viewport SEI messages. Furthermore, while some information is carried in the V-PCC stream, point attributes of a same type captured by a multi-angle acquisition system (which might be virtual in the case of CGI) may be stored across the attributes "count" (ai_attribute_count in the attribute_information(j) syntax structure) and identified by an attribute index (vuh_attribute_index, indicating the index of the attribute data carried in the attribute video data unit), which causes some issues. For example, there is no information on the acquisition system position or angle used to capture a given attribute according to a given angle. Thus, such a collection of attributes stored in the attributes dimension can only be modulated arbitrarily according to the viewing angle of the viewer, as there is no relationship between the captured attributes and their capture position. This leads to a number of disadvantages and weaknesses, such as a lack of information on the position of the captured attributes, arbitrary modulation of content during rendering, and unrealistic renditions that are unfaithful to the original content attributes.
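The following hypothetical Python snippet illustrates the gap described above: attribute data sets of the same type are today distinguished only by an index, and the capture-position metadata proposed here is what would tie each index to a capture viewpoint. All values are invented for illustration and are not taken from any bitstream.

```python
# What a V-PCC-style stream conveys today (illustrative): several color
# attribute data sets, distinguished only by an index, with no record of the
# camera position each one was captured from.
attributes_without_capture_position = {
    0: "color video, captured from ?",   # vuh_attribute_index = 0
    1: "color video, captured from ?",   # vuh_attribute_index = 1
    2: "color video, captured from ?",   # vuh_attribute_index = 2
}

# What the proposed capture-position metadata would add (values invented for
# illustration): a position and rotation per attribute index, so a renderer
# can relate each attribute data set to the viewpoint it was captured from.
attributes_with_capture_position = {
    0: {"position": (0.0, 0.0, 2.0),  "rotation_quaternion": (1.0, 0.0, 0.0, 0.0)},
    1: {"position": (2.0, 0.0, 0.0),  "rotation_quaternion": (0.7, 0.0, 0.7, 0.0)},
    2: {"position": (-2.0, 0.0, 0.0), "rotation_quaternion": (0.7, 0.0, -0.7, 0.0)},
}
```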
In a point cloud arrangement, the attributes of a point may change according to the viewpoint of the viewer. In order to capture these variations, the following elements need to be considered:
The video-based PCC (or V-PCC) standard and specification does address some of these issues by providing the position of the viewer (Item 1) through the "viewport SEI messages family", which enables rendering view-dependent attributes. Unfortunately, as can be understood, this still presents rendering issues. The rendering is affected because, in some of these cases, there is no indication of the position from which attributes were captured. (It should be noted that, in one embodiment, ai_attribute_count only indexes the lists of captured attributes; there is no information on where they were captured from.) This can be resolved in different ways by storing the capture position in descriptive metadata once it is generated and calculated.
Regarding Item 2, it should be noted that a given capture camera may not capture the attributes (colors) of the whole object (for instance, considering a head, the camera in front will capture the cheeks, eyes and so on, but not the rear of the head), so that each point is not provided with an actual attribute for every angle.
The position of the camera used to capture attributes is provided in an SEI message. This SEI message has the same syntax elements and the same semantics as the viewport position SEI message, except that, as it qualifies the capture camera position:
More information about the specifics of this is provided in Table 1 as shown in
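As a non-normative illustration of the kind of payload that Table 1 specifies, the sketch below gathers the cp_* elements mentioned in this description into a Python data class. The exact syntax elements, their ordering and their bit-level descriptors are those of Table 1; the field layouts for the position and rotation used here are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class CapturePositionSEI:
    """Non-normative sketch of a capture position SEI payload.

    Field names reuse the cp_* elements mentioned in the text; the exact
    syntax and descriptors are defined by Table 1, not by this sketch.
    """
    cp_attribute_index: int            # attribute data set this capture position qualifies
    cp_attribute_partition_index: int  # optional partition within that attribute
    cp_position: Tuple[float, float, float]         # assumed capture camera position (x, y, z)
    cp_rotation: Tuple[float, float, float, float]  # quaternion rotation of the capture camera
    cp_center_view_flag: bool          # whether the capture camera points at the scene centre
```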
Alternatively, cp_attribute_index is not signaled and is derived implicitly as following the same order as the attribute data stored in the stream (i.e., the order of the derived cp_attribute_index is the same as vuh_attribute_index in decoding/stream order).
In yet another alternative embodiment, the capture position syntax structure loops over the number of attribute data sets present. The loop size may be explicitly signaled (e.g. cp_attribute_count) or inferred from ai_attribute_count[cp_atlas_id]−1. This is shown in
In addition, alternatively or optionally, a flag can be provided in the capture position SEI message to indicate whether the capture position is the same as the viewport position. When this flag is set equal to 1, the cp_rotation-related (four-component quaternion rotation) and cp_center_view_flag syntax elements are not transmitted.
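A hedged parsing sketch of these two alternatives is given below. The StubReader class and the flag name same_as_viewport are illustrative inventions; cp_attribute_count, ai_attribute_count and cp_atlas_id come from the text, while the element ordering and the loop-size inference are assumptions rather than normative syntax.

```python
from typing import Dict, List

class StubReader:
    """Toy stand-in for a bitstream reader: pops pre-decoded values in order."""
    def __init__(self, values: List):
        self._values = list(values)
    def read(self):
        return self._values.pop(0)

def parse_capture_position(reader: StubReader,
                           ai_attribute_count: Dict[int, int],
                           cp_atlas_id: int,
                           count_is_signalled: bool) -> List[dict]:
    if count_is_signalled:
        loop_size = reader.read()                    # explicit cp_attribute_count
    else:
        # inferred; the text gives ai_attribute_count[cp_atlas_id] - 1 as the
        # loop's upper index, i.e. one entry per attribute data set
        loop_size = ai_attribute_count[cp_atlas_id]
    entries = []
    for _ in range(loop_size):
        entry = {"cp_attribute_index": reader.read()}
        same_as_viewport = bool(reader.read())       # illustrative flag name
        if not same_as_viewport:
            # quaternion rotation and centre-view flag are only present when
            # the capture position differs from the viewport position
            entry["cp_rotation"] = tuple(reader.read() for _ in range(4))
            entry["cp_center_view_flag"] = bool(reader.read())
        entries.append(entry)
    return entries

# Example: two capture positions, the second one identical to the viewport.
reader = StubReader([2, 0, 0, 1.0, 0.0, 0.0, 0.0, 1, 1, 1])
print(parse_capture_position(reader, {0: 2}, 0, count_is_signalled=True))
```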
Alternatively, at least an indicator can be provided that specifies whether attributes are view-independent with respect to an axis (x, y, z) or direction. Indeed, view-dependency may only occur relative to a certain axis or position.
In another embodiment, again additionally or optionally to one of the previous examples, an indicator associates sectors around the point cloud with attribute data sets identified by cp_attribute_index. Sector parameters, such as the angle and distance from the center of the reconstructed point cloud, may be fixed or signaled.
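The sector idea can be pictured with the toy sketch below, in which the space around the reconstructed point cloud is split into equal horizontal sectors and each sector maps to a cp_attribute_index. The sector count, the plane used for the split and the mapping are all illustrative assumptions, not requirements of the embodiment.

```python
import math

def sector_for_viewer(viewer_xyz, cloud_centre_xyz, num_sectors=8):
    """Return the index of the horizontal sector containing the viewer."""
    dx = viewer_xyz[0] - cloud_centre_xyz[0]
    dz = viewer_xyz[2] - cloud_centre_xyz[2]
    azimuth = math.atan2(dz, dx) % (2.0 * math.pi)
    return int(azimuth // (2.0 * math.pi / num_sectors)) % num_sectors

# Illustrative association: sector index -> cp_attribute_index.
sector_to_attribute_index = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}

viewer, centre = (3.0, 1.5, -2.0), (0.0, 1.0, 0.0)
chosen = sector_to_attribute_index[sector_for_viewer(viewer, centre)]
```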
In an alternate embodiment, the capture position can be provided via processing of SEI messages. This is discussed in conjunction with
In one embodiment, the attributes discussed above are used. The angles are relative, in one embodiment, to a system coordinate frame. In this embodiment, the angles (or rotation) are determined, for example, with a variety of models known to those skilled in the art, such as the quaternion model (see cp_attribute_index and, optionally, cp_attribute_partition_index, which links the position of the attribute capture system to the index of the attribute information to which it relates, i.e. the matching vuh_attribute_index, the index of the attribute data carried in the attribute video data unit). This information enables matching attribute values seen from the capture system (identified by cp_attribute_index) with attribute values seen from the viewer (possibly identified by the viewport SEI message). Typically, the attribute data set selected is the one for which the viewport position parameters (as indicated by the viewport SEI message) are equal or near (according to some thresholds and some metrics, such as Mean Square Error) to the capture position parameters (as indicated by the capture position SEI message).
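A minimal sketch of that selection rule is shown below, assuming the viewport and capture positions are each flattened into a vector of position plus quaternion parameters. The threshold value and the use of a mean squared error metric follow the examples named in the text but are not fixed requirements.

```python
def mse(a, b):
    """Mean squared error between two equal-length parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def select_attribute_set(viewport_params, capture_positions, threshold=0.5):
    """Pick the cp_attribute_index whose capture parameters best match the viewport.

    viewport_params: flattened viewport position parameters (e.g. x, y, z, qx, qy, qz, qw).
    capture_positions: dict mapping cp_attribute_index -> same-shaped parameter vector.
    """
    best_index, best_error = None, float("inf")
    for cp_attribute_index, params in capture_positions.items():
        error = mse(viewport_params, params)
        if error < best_error:
            best_index, best_error = cp_attribute_index, error
    # None signals "no capture position close enough"; the renderer then falls
    # back to a default attribute data set.
    return best_index if best_error <= threshold else None
```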
In one embodiment, at rendering, the following is performed for each point of the point cloud to be rendered:
Alternatively, in a different embodiment, the set of capture viewpoints can be selected as all the capture viewpoints within a specific maximum angular distance, which are then blended in the same way as depicted previously.
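A sketch of that blending alternative is shown below, assuming each capture viewpoint is reduced to a direction toward the point and a scalar attribute value. The inverse-angle weighting is an illustrative choice; the text only requires that all capture viewpoints within a maximum angular distance contribute.

```python
import math

def angular_distance(dir_a, dir_b):
    """Angle in radians between two direction vectors."""
    dot = sum(a * b for a, b in zip(dir_a, dir_b))
    norm = math.dist(dir_a, (0, 0, 0)) * math.dist(dir_b, (0, 0, 0))
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def blend_attributes(view_dir, captures, max_angle=math.radians(45)):
    """captures: list of (capture_direction, attribute_value) pairs."""
    weights, values = [], []
    for cap_dir, value in captures:
        angle = angular_distance(view_dir, cap_dir)
        if angle <= max_angle:
            weights.append(1.0 / (angle + 1e-6))   # closer viewpoints weigh more
            values.append(value)
    if not values:
        return None   # no capture viewpoint close enough; renderer falls back
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Example: viewer looking along +z, two captures roughly 10 and 30 degrees away.
captures = [((0.17, 0.0, 0.98), 200.0), ((0.5, 0.0, 0.87), 120.0)]
print(blend_attributes((0.0, 0.0, 1.0), captures))
```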
Therefore, the only need is to encode the model type (i.e. octahedral, or others left for future use) and the discretization square size (e.g. n=11 at maximum). These two values hold for all the points and are very compact to store. As an example, the scan order of the unit square is raster scan, clockwise, or anti-clockwise. An exemplary syntax can be provided such as:
where:
Alternatively, only the same representation model is used and it is not signalled in the bitstream. Filling the actual regular values from an irregular capture rig can be done by using the algorithm presented previously at the compression stage, with a user-defined value of n.
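As a rough sketch of what such a regular model might look like, the code below discretizes viewing directions onto an n×n octahedral map and fills each regular sample with the attribute of the angularly nearest capture direction from an irregular rig. The octahedral parameterization itself is a standard mapping of the unit sphere onto a square; using nearest-capture filling here is an assumption standing in for the algorithm presented previously in this disclosure.

```python
import math

def octahedral_direction(u, v, n):
    """Direction for cell (u, v) of an n x n octahedral map of the unit sphere."""
    ox = 2.0 * (u + 0.5) / n - 1.0
    oy = 2.0 * (v + 0.5) / n - 1.0
    z = 1.0 - abs(ox) - abs(oy)
    if z < 0.0:   # fold the lower hemisphere back onto the square
        ox, oy = ((1.0 - abs(oy)) * math.copysign(1.0, ox),
                  (1.0 - abs(ox)) * math.copysign(1.0, oy))
    norm = math.sqrt(ox * ox + oy * oy + z * z)
    return (ox / norm, oy / norm, z / norm)

def fill_octahedral_map(capture_dirs_values, n):
    """Fill an n x n grid with the attribute of the nearest capture direction."""
    grid = [[None] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            d = octahedral_direction(u, v, n)
            # nearest capture direction = largest dot product (smallest angle)
            _, value = max(capture_dirs_values,
                           key=lambda cv: sum(a * b for a, b in zip(cv[0], d)))
            grid[u][v] = value
    return grid

# Example: an irregular three-camera rig, regularized onto an 11 x 11 map.
rig = [((0.0, 0.0, 1.0), 200.0), ((1.0, 0.0, 0.0), 120.0), ((0.0, 1.0, 0.0), 90.0)]
octa_map = fill_octahedral_map(rig, n=11)
```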
Alternatively, an implicit model SEI message can be used for processing as shown in
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application.
Number | Date | Country | Kind
---|---|---|---
20306195.7 | Oct 2020 | EP | regional

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/EP2021/078148 | 10/12/2021 | WO |