The invention relates to a video signal with depth information. The invention also relates to methods and systems for generating a video signal with depth information and rendering a video signal with depth information.
Since the introduction of display devices, a realistic 3-D display device has been a dream for many. Many principles that should lead to such a display device have been investigated. One such principle is a 3-D display device based on binocular disparity only. In these systems the left and right eyes of the viewer perceive different perspectives and consequently the viewer perceives a 3-D image. An overview of these concepts can be found in the book “Stereo Computer Graphics and Other True 3-D Technologies”, by D. F. McAllister (Ed.), Princeton University Press, 1993. For example, shutter glasses may be used in combination with, for instance, a CRT. If the odd frame is displayed, light is blocked for the left eye and if the even frame is displayed, light is blocked for the right eye.
Display devices that show 3-D without the need for additional appliances such as glasses are called auto-stereoscopic display devices. For example multi-view auto-stereoscopic display devices have been proposed. In the display devices as disclosed in U.S. Pat. No. 6,064,424 a slanted lenticular is used, whereby the width of the lenticular is larger than two sub-pixels. In this way there are several images next to each other and the viewer has some freedom to move to the left and right. Other types of auto-stereoscopic display devices are known in the art.
In order to generate a 3-D impression on a multi-view display device, images from different virtual viewpoints have to be rendered. This requires either multiple input views or some 3D or depth information to be present. This depth information can be recorded, generated from multi-view camera systems or generated from conventional 2D video material. For generating depth information from 2D video several types of depth cues can be applied, such as structure from motion, focus information, geometric shapes and dynamic occlusion. Preferably a dense depth map is generated, i.e. a depth value per pixel. This depth map is subsequently used in rendering a multi-view image to give the viewer a depth impression.
Existing video connections are designed to exchange sequences of images. Typically the images are represented by two-dimensional matrices of pixel values at both sides of the connection, i.e. the transmitter and receiver. The pixel values correspond to luminance and/or color values. Both transmitter and receiver have knowledge about the semantics of the data, i.e. they share the same information model. Typically, the connection between the transmitter and receiver is adapted to the information model. An example of this exchange of data is an RGB link. The image data in the context of transmitter and receiver is stored and processed in a data format comprising triplets of values: R (Red), G (Green) and B (Blue) together forming the different pixel values. The exchange of the image data is performed by means of three correlated but separated streams of data. These data streams are transferred by means of three channels. A first channel exchanges the Red values, i.e. sequences of bits representing the Red values, the second channel exchanges the Blue values and the third channel exchanges the Green values. Although the triplets of values are typically exchanged in series, the information model is such that a predetermined number of triplets together form an image, meaning that the triplets have respective spatial coordinates. These spatial coordinates correspond to the position of the triplets in the two-dimensional matrix representing the image. Examples of standards, which are based on such an RGB link, are DVI (digital visual interface), HDMI (High Definition Multimedia Interface) and LVDS (low-voltage differential signaling). However in the case of 3-D, along with the video data, the depth related data has to be exchanged too.
WO 2006/137000 A1 discloses a method of combined exchange of image data and further data being related to the image data, such as depth data, the image data being represented by a first two-dimensional matrix of image data elements and the further data being represented by a second two-dimensional matrix of further data elements. The method comprises combining the first two-dimensional matrix and the second two-dimensional matrix into a combined two-dimensional matrix of data elements. The above method however is somewhat limited with respect to the information provided and may not provide sufficient information for accurate rendering.
It would be advantageous to have an improved way of exchanging image data. To better address this concern, in a first aspect of the invention a system is presented for generating a signal representing a three dimensional scene from a primary view, comprising:
Each stripe corresponds to a rectangular area of image information within the primary view, thus a stripe may correspond to a single pixel, a one-dimensional array of pixels in the form of a line, or a two-dimensional array of pixels. Thus although a stripe corresponds to a rectangular area of image information, due to the inclusion of depth elements, the actual data represented by the stripe can describe a three-dimensional structure.
Since the stripes comprise data elements indicative of the position of the rectangular area of image information within the primary view, it becomes possible to more flexibly accommodate occlusion or side area information into the video signal. Any information about portions of the scene that might be available to the system can be inserted into one or more of such stripes. The video-like characteristics of the signal can be preserved to a large extent, because the stripes comprise familiar data elements indicative of color and depth. Consequently, these data elements may be encoded in a way known in the art of video encoding. This allows addressing backwards compatibility issues. It also allows applying standard video compression methods to information comprised within a stripe.
Since the stripes comprise data elements indicative of the position of the stripe, which may be in the form of data elements comprised in tuples of color, depth and position, it becomes easy to vary the sampling density between stripes or within a stripe. This enables inclusion of image information for occluded and/or side areas in the video signal. Also, the portions of an object which are close to parallel to a viewing direction of the primary view may be stored with improved resolution. These side areas may be occluded or poorly defined in a conventional image coded for the primary view. Consequently the improved resolution in which these portions are stored may be used to generate stereoscopic views with improved recovery of such side object portions.
Information of rear areas may also be included to further enhance the stereoscopic views. Information of rear areas also improves the possibility to look around objects: the scene may be viewed from very different perspectives, for example to allow a viewer to virtually move through the scene.
As indicated above a stripe defines a rectangular area of image information within the primary view; here a rectangular area is understood to comprise two-dimensional areas, one-dimensional areas, and/or points. An example of a two-dimensional area is a rectangular array of equidistant samples; an example of a one-dimensional area would be a one-dimensional array of equidistant samples.
It should be noted that a stripe, although it represents a rectangular area of image information within the primary view, may actually comprise more information from the underlying three-dimensional scene than is visible within the primary view. This is in fact the strength of the stripe representation, for this additional information may become visible when a different view is rendered.
A one-dimensional, line-based representation has the advantage that it enables representation of more erratically shaped objects without unnecessary storage loss. A two-dimensional, i.e. multi-line based, representation in turn has the advantage that it enables improved compression of stripe data, as spatial redundancy within a stripe can be exploited using e.g. block-based compression schemes.
The data elements can be grouped as tuples comprising color, depth and position data elements. In case color and depth are represented at one and the same resolution a representation using tuples (rgb, z, p) may be used, comprising red, green and blue-values representing a pixel color data element, a z-value representing a pixel depth data element and a p-value representing a pixel position data element.
In case the depth information is subsampled and represented at a quarter of the color resolution a representation using tuples (rgb1, rgb2, rgb3, rgb4, z, p) may be used. It will be clear to the skilled person that the use of RGB data elements is merely exemplary and other color data elements such as YUV, or subsampled YUV (4:2:0), can be used instead. In the preceding tuple a single p-value and z-value are used to indicate the position of both the color and depth information, wherein the actual position of the color and depth data-elements can be derived from the p-value. When using line based stripes the p-value may represent an offset along the line relative to the start of the line. However, in case of multi-line stripes the p-value itself may represent both an x and y coordinate, or alternatively a line number and an offset relative to the start of the line.
The above examples only comprise a single p-value for all coordinates. Alternatively when bandwidth/storage is less critical, more elaborate tuples such as:
(rgb1,rgb2,rgb3,rgb4,z,prgb1234,pz) (1)
(rgb1,rgb2,rgb3,rgb4,z,prgb13,prgb24,pz), (2)
(rgb1,rgb2,rgb3,rgb4,z,prgb1,prgb2,prgb3,prgb4) (3)
(rgb1,rgb2,rgb3,rgb4,z,prgb13,prgb24), or (4)
(rgb1,rgb2,rgb3,rgb4,z,prgb1,prgb2,prgb3,prgb4,pz) (5)
may be used wherein position information is provided for more and/or for all individual color and depth data-elements.
For example, tuple (1) above includes two p-values, one for the color data-element and one for the depth data-element. Tuple (2) in turn represents a situation where the color data-elements are spread over two lines, and wherein the color sample points 1 and 2 are on the top line, and sample points 3 and 4 are located directly below on the bottom line. As the points 1 and 3 have the same offset within their respective line, a single p-value here suffices. The tuples (3) and (4) in turn do not comprise a separate p-value for the depth data-element. In the tuples (3) and (4) the p-value for the depth data-element is derivable from the p-values of the color data-elements. Finally tuple (5) allows full control of the position of sampling points within the rectangular area of image information within the primary view.
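By way of illustration only, the tuple layouts discussed above could be modelled as follows; the type and field names are hypothetical and merely serve to make the grouping of data elements explicit.

```python
from typing import NamedTuple, Tuple

class SampleFullRes(NamedTuple):
    """Tuple (rgb, z, p): color and depth at one and the same resolution."""
    rgb: Tuple[int, int, int]   # red, green and blue values (pixel color data element)
    z: int                      # pixel depth data element
    p: int                      # pixel position data element (offset along the line)

class SampleQuarterDepth(NamedTuple):
    """Tuple (rgb1, rgb2, rgb3, rgb4, z, p): depth at a quarter of the color resolution."""
    rgb1: Tuple[int, int, int]
    rgb2: Tuple[int, int, int]
    rgb3: Tuple[int, int, int]
    rgb4: Tuple[int, int, int]
    z: int                      # one depth value shared by the four color samples
    p: int                      # one position value; the positions of the individual
                                # data elements are derived from it

# Example: a mid-grey sample at offset 17 on the line, halfway in the depth range.
sample = SampleFullRes(rgb=(128, 128, 128), z=128, p=17)
print(sample)
```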
The signal may be split into a first subset of tuples representing samples corresponding to stripes representing an image of the three dimensional scene from the primary view, and a second subset comprising stripes representing occlusion and side area information. As a result the color data elements of the first subset can be coded as a first data stream and the depth data elements of the first subset can be coded as a second data stream.
In this manner compatibility with conventional three dimensional scene representations such as image-and-depth can be achieved. The color, depth and position data elements of the occlusion or side area information in turn may be coded in a single stream, or in multiple streams.
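A minimal sketch of such a split is given below; the per-stripe primary-view flag and the list-based stream layout are assumptions of this sketch, not part of the signal definition.

```python
def split_streams(stripes):
    """Split a sequence of stripes into backwards compatible color and depth
    streams for the primary view plus a separate occlusion/side-area stream.

    `stripes` is a list of (is_primary_view, samples) pairs, where `samples`
    is a list of (rgb, z, p) tuples; this layout is an assumption of the sketch.
    """
    color_stream, depth_stream, occlusion_stream = [], [], []
    for is_primary_view, samples in stripes:
        if is_primary_view:
            # First subset: coded as a conventional image stream and a depth stream.
            color_stream.extend(rgb for rgb, z, p in samples)
            depth_stream.extend(z for rgb, z, p in samples)
        else:
            # Second subset: occlusion and side-area stripes, kept in their own
            # stream together with their position data elements.
            occlusion_stream.extend(samples)
    return color_stream, depth_stream, occlusion_stream
```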
The independent claims define further aspects of the invention. The dependent claims define advantageous embodiments.
These and other aspects of the invention will be further elucidated and described with reference to the drawing, in which
In recent years, much effort has been put in the development of 3D displays and data representations suitable to drive such displays. Auto-stereoscopic 3D displays do not require the viewer to wear special eyewear (such as the red/green glasses), but usually rely on displaying more than two views which allow users to freely look around the scene which is displayed and perceive depth because their left and right eyes “see” two of these different views. Since displays can vary in the number of views displayed, and also in other attributes, such as the depth range they can portray, a data format which is independent of such differences is needed. The image-and-depth format has been adopted in MPEG-C part 3.
While the image-and-depth format is suitable for the first generation of 3D displays, which have moderate depth range capabilities, it needs to be extended in order to allow for more look-around and fewer so-called occlusion artifacts. Occlusion artifacts may also occur in further generations of 3D displays; such artifacts would advantageously be removed by using an improved image-and-depth format.
The stripes S921 and S927 represent rectangular areas of image information comprising data elements defining color and depth of part of the background plane 943 which in the two dimensional image would be respectively above and below the cube 944. Likewise the stripes S922 and S926 represent rectangular areas of image information of parts of the background plane 943 to the left and right of the cube 944 respectively. The stripes S923 and S925 represent rectangular areas of image information comprising data elements defining color and depth of two sides of the cube, along the scan path 942.
The sequence of stripes as established by the sequence generator 104 can be used to generate the signal directly, i.e. in the order determined by the scan path. The advantage of doing so is that image information needed for rendering a line is located in stripes in relatively close proximity.
Moreover, stripes located adjacent in the scan direction could be clustered by splitting the sequence of stripes, resulting in three sequences of stripes: a first sequence corresponding to S921, a second sequence corresponding to the stripes S922, S923, S924, S925 and S926, and a third sequence corresponding to the stripe S927. Each of these sequences of stripes could be coded relative to one another such that only a horizontal offset is needed to indicate their respective position. However as can be seen in
Color, depth and position data elements may be coded together in one and the same stripe in the form of three or more valued tuples. Alternatively each of the different types of data elements may be coded in individual streams, thereby de-multiplexing the different types of data elements. In this manner a signal may be obtained that more closely resembles conventional image-and-depth representations.
Although for the sake of clarity no occlusion information was encoded in the above example, a preferred embodiment comprises both image information from side areas and occlusion areas. In this manner not only the sides of objects can be rendered more accurately for different views, but also de-occluded areas can be filled in with appropriate image information.
The above example included a cube in front of a background plane; however, the present invention may also be applied to more complex three dimensional scenes. In that case, the situation may occur that for certain regions of the rectangular area there is no image information available. This can be addressed in various manners, e.g. by adding a mask or transparency bit for those data elements.
An advantage of using stripes that correspond to rectangular areas of image information covering data elements of multiple video lines, hereafter multi-line stripes, is that image information encoded in this manner can be compressed in a way that takes into account spatial redundancy between pixels. The latter is particularly useful when using a compression scheme based on frequency-domain transforms that address multiple data elements, such as an 8×8 DCT.
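For illustration, a block of samples taken from a multi-line stripe could be transformed with an orthonormal 8×8 DCT-II as sketched below; NumPy is assumed, and this is not meant to prescribe the codec actually used for the signal.

```python
import numpy as np

def dct_8x8(block):
    """Apply an orthonormal 2-D DCT-II to an 8x8 block of stripe samples."""
    n = 8
    k = np.arange(n)
    # Rows of c are the DCT-II basis vectors.
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c @ block @ c.T

# Example: an 8x8 block of luma values cut from a multi-line stripe.
block = np.random.default_rng(0).integers(0, 256, size=(8, 8)).astype(float)
coefficients = dct_8x8(block)
# Spatially redundant blocks concentrate their energy in the low-frequency
# coefficients, which is what block-based compression schemes exploit.
print(coefficients[:2, :2])
```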
A further advantage of using multi-line stripes is that it allows the use of different sampling frequencies for color information and depth information. It may for example be possible to represent RGB color information at a first resolution and depth at a second resolution, e.g. at a quarter of the first resolution.
Although using multi-line stripes has certain advantages, it is also possible to use stripes that comprise data elements of a single video line. Hereafter, for the sake of clarity, the present invention will be further elucidated primarily using examples of stripes comprising data elements of a single video line only.
System 100 comprises a contour generator 102 for generating at least part of the contours of the objects that are visible in the cross section 1100. Such a contour generator may be implemented in a way known in the art, for example using depth-from-motion algorithms or by using more than one camera to record the scene and applying depth computation techniques. Such algorithms may not be able to reconstruct the complete contour; in particular, the rear side 1108 of an object 1102 may not be visible in any of the images, and in such a case this portion of the contour information may not be available. Also, other parts of the scene may be occluded by other objects in front of them. When more camera positions are used to record the scene, more contour information may become available. The contour 1154 in
The system 100 further comprises a sequence generator 104 for generating a sequence of stripes defining at least part of the representation of the three dimensional scene from a view. Each stripe here represents a rectangular area of image information comprising data elements defining a color, a depth and a position of the rectangular area. In this line-based embodiment the rectangular area is considered to have a height of one data element. The sample points on the contour have associated with them various data elements such as color, depth and position that may be organized as tuples. All sample points shown in
Most current multi-view displays render multiple views wherein the viewing direction for each of the views differs in the horizontal direction only. As a result, rendering of images generally can be done in a line-based manner, and the video line 1002 preferably is a horizontal video line. However, the present invention may also be applied for video lines oriented in a vertical direction.
These sample points 1202 may be selected out of a plurality of segments of contours 1102 of the objects in the scene. The data elements associated with the sample points may be indicative of a color, for example expressed in red, green, and blue (RGB) components or other formats known to those skilled in the art, corresponding to the color of the object contour at the corresponding contour point. In case a more flexible solution is desired it is possible to allow the addition of further information such as a transparency data-element, which could be a binary or a multi-value data-element, thus allowing the encoding of transparent or semi-transparent objects.
The data elements may also be indicative of a depth 1208. Such a depth may be expressed as a coordinate in the direction indicated by the arrow at 1208, i.e. providing information with regard to the distance to the view point. The depth may also be expressed as a disparity value, as known in the art. The depth as expressed corresponds to a particular primary view which corresponds to the viewing direction 1106 mentioned before. The viewing direction 1106 here relates, for example, to the direction of a line parallel to a line through the view point and the center of the background 1104. If the camera location is near the scene, the depth coordinates may correspond to divergent directions according to the projection of the scene onto the background. The data elements may also be indicative of a video line position 1210 in the direction indicated by the arrow at 1210. This video line position 1210 indicates a display position within the video line 1002 of the video image 1000, according to the primary view.
In particular when dealing with irregularly formed shapes it may be relevant to explicitly code the position and depth data elements associated with all sample points. In this manner it is possible to code any distribution of sample points with respect to the contour. For example data elements may relate to sample points chosen equidistant on the contour surface, or alternatively may be chosen equidistant with respect to a particular object contour normal. Alternatively, when coding more regular polygon structures, a more efficient position coding can be adopted, e.g. when using an equidistant sample grid on a polyline.
The sequence generator 104 selects consecutive points along the contour lines 1150. For example, if the video line is a horizontal line, the selector 104 may select the consecutive points from left to right. Alternatively the points may be selected from right to left. The selector 104 may start with the leftmost portion 1152 of the background, and work to the right until no information is present because of an object in front of the background. Then the selector 104 may continue with the contour of the object 1154. The selector may start at the leftmost endpoint of the contour 1154, work all the way along the contour 1154 until the rightmost endpoint is reached, and from there continue with the next object, which is in this case the remaining portion 1156 of the background.
The sequence generator 104 may be capable of including in the sequence of stripes a first subsequence, containing data elements of sample points near 1204, or consecutive data elements of sample points selected from a segment which is part of a side area of the at least one object 1102 in the primary view. The sequence generator 104 may also include a second subsequence, containing data elements of sample points near 1206, or consecutive data elements of sample points selected from a segment which is part of a frontal area of the at least one object in the primary view. The difference between the video line positions of two consecutive sample points of the first subsequence 1204 is smaller than a difference between the video line positions of two consecutive sample points of the second subsequence 1206. In this manner certain sequence portions are represented using data elements sampled at a higher sample frequency, to improve image quality of the rendered output, or alternatively at a lower sample frequency for the sake of representation size.
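One possible way to realize such a variable sampling density is sketched below: contour points where the depth varies rapidly, i.e. side areas, are sampled more densely than points on frontal areas. The contour representation, threshold and step sizes are illustrative assumptions.

```python
def sample_contour(contour, gradient_threshold=4.0, frontal_step=4, side_step=1):
    """Select sample points along a contour, densifying where depth varies fast.

    `contour` is a list of (x, z) points ordered along the scan direction.
    Side areas, where |dz/dx| exceeds `gradient_threshold`, are sampled every
    `side_step` points; frontal areas every `frontal_step` points.
    """
    samples = []
    i = 0
    while i < len(contour) - 1:
        x0, z0 = contour[i]
        x1, z1 = contour[i + 1]
        gradient = abs(z1 - z0) / max(abs(x1 - x0), 1e-6)
        samples.append(contour[i])
        # Steep depth gradient -> side area -> smaller step (higher sample frequency).
        i += side_step if gradient > gradient_threshold else frontal_step
    samples.append(contour[-1])
    return samples
```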
The sequence generator 104 may be arranged for including tuples indicative of one or more transparent data elements 1212 within a stripe to indicate a connection between different stripes. These transparent samples 1212 assist in efficiently rendering the sequence of stripes in a display system 150. For example a special data element may be included in the stripe, or in a tuple of data elements, indicative of whether a piece of contour is transparent or not; alternatively a particular color value or a color range may be reserved to indicate ‘transparent’. The use of a range may be particularly beneficial when the signal is subsequently subjected to lossy compression.

The system 100 further comprises a signal generator 106 for generating a video signal comprising data elements comprised in the sequence of stripes 1350. This signal generator may be implemented in any way, as long as the sequence of stripes is appropriately encoded. Use may be made of digital signal encoding methods, such as MPEG standards. Other analog and digital signals, including storage signals and transmission signals, may be generated and are within the reach of the skilled person in view of this description. For example, the digital sequence of stripes may simply be stored in a file on a magnetic disc or on a DVD. The signal may also be broadcast via satellite or cable TV, for example, or be transmitted via the Internet, or be transmitted on an interface like DVI or HDMI, to be received by a display system 150.
A plurality of respective sequences of stripes may be prepared and incorporated in the signal for a plurality of respective video lines 1002. This allows encoding a complete 3D video image 1000.
The several means 102, 104, and 106 may communicate their intermediate results via a random access memory 110, for example. Other architectural designs are also possible.
The system 100 also allows including samples from a segment which is occluded in the primary view and/or of rear areas of objects.
To further improve the compression ratio, the signal generator 106 may be arranged for aligning data elements in a first sequence of stripes with those in a second sequence of stripes, both sequences relating to a portion of the at least one object, by inserting padding data elements. For example, consider the situation wherein the first sequence of stripes relates to a first video line and the second sequence of stripes to an adjacent video line. In this case the sequences may be aligned horizontally such that data element number N in the first sequence of stripes has the same horizontal position as data element number N in the second sequence of stripes.
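A sketch of such an alignment by padding is given below; the sample layout (color, depth, x) and the reserved padding value are assumptions of this sketch.

```python
PAD = None  # reserved padding data element, assumed for this sketch

def align_sequences(seq_a, seq_b):
    """Insert padding so that data element number N in both sequences of
    (color, depth, x) samples has the same horizontal position x."""
    out_a, out_b = [], []
    i = j = 0
    while i < len(seq_a) and j < len(seq_b):
        xa, xb = seq_a[i][2], seq_b[j][2]
        if xa == xb:
            out_a.append(seq_a[i])
            out_b.append(seq_b[j])
            i, j = i + 1, j + 1
        elif xa < xb:
            # seq_b has no sample at xa: pad it.
            out_a.append(seq_a[i])
            out_b.append((PAD, 0, xa))
            i += 1
        else:
            out_a.append((PAD, 0, xb))
            out_b.append(seq_b[j])
            j += 1
    # Pad whichever sequence runs out first.
    for k in range(i, len(seq_a)):
        out_a.append(seq_a[k])
        out_b.append((PAD, 0, seq_a[k][2]))
    for k in range(j, len(seq_b)):
        out_a.append((PAD, 0, seq_b[k][2]))
        out_b.append(seq_b[k])
    return out_a, out_b
```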
When the sequence of stripes is encoded along a scan direction, the sequence of stripes can be encoded in a data stream by the sequence generator such that spatially adjacent data elements in the data stream are aligned with spatially proximate data elements in a direction perpendicular to the scan direction.
The signal generator 106 may further generate a third data stream 1306 which comprises the positions of at least the first subset of the stripes. These position values may be encoded as position values relative to a fixed reference point (for example corresponding to the left side of a video image). Preferably, the positions of consecutive samples are expressed as a delta (difference) between the video line positions of the consecutive samples. In the latter case the values may be efficiently compressed using run-length encoding, a well known lossless compression technique. However, compression is optional, and in situations wherein processing requirements are more critical than bandwidth, compression may not be necessary. For example, compression may not be necessary when using a display interface such as DVI or HDMI. In such a case, the delta values or the values relative to a fixed reference point may be encoded in uncompressed form, for example in two of the color channels, e.g. the green and blue channels, whereas the depth may be encoded in a third color channel, e.g. the red channel.
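A minimal sketch, assuming integer pixel positions, of how the position stream could be converted to deltas and then run-length encoded; the coding actually used in a stream may of course differ.

```python
def position_deltas(positions):
    """Express each video line position as the difference to the previous sample."""
    return [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]

def run_length_encode(values):
    """Lossless run-length encoding as (value, run length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

# Regularly sampled frontal areas give long runs of delta == 1, which compress well.
positions = [0, 1, 2, 3, 4, 10, 10, 10, 11, 12]
print(run_length_encode(position_deltas(positions)))
# -> [(0, 1), (1, 4), (6, 1), (0, 2), (1, 2)]
```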
It is also possible to extract other portions of the information for inclusion in one or more backwards compatible streams. For example, a plurality of image-and-depth layers, or another layered depth image (LDI) representation, may be included in a backwards compatible stream; the remaining information not included in the backwards compatible stream and/or remaining information included in the backwards compatible stream at an unsatisfactory resolution may be included separately.
An embodiment comprises a signal 1300 representing a three dimensional scene from a primary view, the signal comprising a sequence 1350 of stripes defining at least part of the representation of the three dimensional scene from the view. Each stripe in turn represents a rectangular area of image information comprising data elements defining a color, a depth 1208 and a position 1210, wherein the color and depth data elements for each stripe are derived from surface contour information 1102 of at least one object in the scene. The position data element is derived from the position of the surface contour information of the at least one object within the view 1202 and at least one stripe 1204 of the sequence of stripes represents surface contour information of the at least one object selected from an occluded area or a side area of the at least one object in the scene.
The sequence of stripes comprises a first stripe 1204 of data elements associated with consecutive points selected from a segment which is part of an occluded area or a side area of the at least one object in the primary view and a second stripe 1206 of consecutive data elements selected from a segment which is part of a frontal area of the at least one object in the primary view. Also, a first difference between the horizontal positions of two consecutive position data elements of the first stripe may be smaller than a second difference between the horizontal positions of two consecutive position data elements of the second stripe.
Referring to
The display system 150 further comprises an image generator 154 for generating a plurality of images corresponding to stereoscopic views using the sequence of stripes. The stereoscopic views have different viewing directions, i.e. they correspond to different views of the same three-dimensional scene. The views are preferably horizontally distributed, or at least distributed along a horizontal direction. An image of the plurality of images may be generated as follows. First, the position and depth data elements are transformed into video line positions and depths that correspond with the viewing direction and viewpoint of the image that is to be generated. Second, an image is rendered using these transformed values, wherein for any horizontal position only the depth value indicative of the position closest to the viewpoint needs to be taken into account. In effect, the sequence of tuples represents one or more 3D polylines in the case of line-based stripes, or polygons in the case of multi-line based stripes. These polylines may be rendered using z-buffering, as known in the art. For example, the data elements associated with the sequence of stripes may be rendered one by one, using z-buffering. The exact manner of rendering of the data elements does not limit the present invention.
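A minimal line-based rendering sketch is given below, assuming each stripe is a list of (color, depth, position) samples and a simple disparity-style shift proportional to depth; a practical renderer would interpolate between samples, but the z-buffering principle is the same.

```python
def render_line(stripes, line_width, view_offset, background=(0, 0, 0)):
    """Render one output video line for a shifted viewpoint using z-buffering.

    `stripes` is a list of stripes for this video line, each a list of
    (color, depth, position) samples; `view_offset` scales the assumed
    depth-dependent horizontal shift towards the new viewpoint.
    """
    line = [background] * line_width
    z_buffer = [float("inf")] * line_width
    for stripe in stripes:
        for color, depth, position in stripe:
            # Transform the primary-view position to the new viewpoint.
            x = int(round(position + view_offset * depth))
            if 0 <= x < line_width and depth < z_buffer[x]:
                # Keep only the sample closest to the viewpoint.
                z_buffer[x] = depth
                line[x] = color
    return line
```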
The display system may comprise a display 156 for displaying the plurality of images. The display 156 may be an autostereoscopic slanted lenticular display, for example. The several images may be rendered on such a display in an interleaved way. Alternatively, two images can be displayed time-sequentially, and shutter glasses may be used for proper 3D image perception by a human. Other kinds of display modes, including stereoscopic display modes, are known to the person skilled in the art. A plurality of images may also be displayed in sequence on either a 3D display or a 2D display, which may produce a rotating effect. Other ways of displaying the images, for example interactive virtual navigation through a scene, are also possible.
In step 204, a sequence 1350 of stripes is generated defining at least part of the representation of the three dimensional scene from the primary view, wherein each stripe represents a rectangular area of image information comprising data elements defining a color, a depth 1208 and a position 1210. The color and depth data elements for each stripe are derived from surface contour information 1102 of the at least one object in the scene. The position data element is derived from the position of the surface contour information of the at least one object within the primary view 1202. Moreover, step 204 may involve including in the sequence of stripes a first stripe 1204 comprising data elements of consecutive points selected from a segment which is part of a side area of the at least one object in the primary view, and a second stripe 1206 comprising data elements of consecutive points selected from a segment which is part of a frontal area of the at least one object in the primary view. A first difference between the horizontal positions of two consecutive position data elements of the first stripe may be smaller than a second difference between the horizontal positions of two consecutive position data elements of the second stripe.
Steps 202 and 204 may be repeated for a plurality of video lines in the image. In step 206, a video signal is generated including the resulting sequence or sequences of samples. In step 210, the process terminates. As indicated earlier the method can be applied for line based sequences of stripes and multi-line based sequences of stripes alike. The step 202 may be performed as follows. A plurality of images of the at least one object as seen from a plurality of different views is received. Depth information is established for pixels of the plurality of images, or may be provided as additional input, e.g. depth values determined using a range finder. The pixels of the secondary views are warped to the primary view, such that information indicative of a depth and a horizontal position according to the primary view of the at least one object is obtained for the pixels. This way, contour information is obtained.
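The warping of a secondary view to the primary view could, in its simplest disparity-based form, look like the sketch below; the constant shift factor standing in for camera baseline and focal length is an assumption, and occlusion is again resolved with a depth test.

```python
def warp_line_to_primary(colors, depths, shift_factor, line_width):
    """Warp one video line of a secondary view to the primary view.

    `colors[i]` and `depths[i]` belong to pixel i of the secondary view;
    `shift_factor` stands in for camera baseline and focal length. Returns a
    list with, per primary-view position, the warped (color, depth) or None
    where the warp left a gap (de-occluded area without information).
    """
    warped = [None] * line_width
    for x_secondary, (color, depth) in enumerate(zip(colors, depths)):
        # Depth-dependent horizontal shift towards the primary view.
        x_primary = int(round(x_secondary + shift_factor * depth))
        if 0 <= x_primary < line_width:
            current = warped[x_primary]
            if current is None or depth < current[1]:
                warped[x_primary] = (color, depth)
    return warped
```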
The processes and systems described herein may be implemented in part or completely in software.
The drape 600 describes the contour line along the surfaces of the objects in the scene. Preferably such a contour line is completely within a cross section of the scene. The drape 600 not only comprises parts of the contour line which are frontal sides 602 of the objects, but also the left side 601 of object 1 and the left side of object 2, as well as the right side 603 of object 3 and the right side of object 2. Consequently, compared to the image-and-depth format, more occlusion data is captured. Some parts of the drape 600 contain image data. Examples of this are parts 601, 602, and 603. Other parts of the drape 600 are transparent. Examples of transparent parts are parts 610, 611, 612 and 613. Such a transparent part does not require a lot of storage space. For example, such a part may be skipped altogether. Preferably an indication is inserted in the signal to indicate that a portion of the drape is transparent. Alternatively, when a distance between successive pieces of drape is above a predetermined threshold, the portion in between the successive pieces of drape is set to transparent.
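The rule mentioned last, in which the portion between successive pieces of drape is set to transparent when their distance exceeds a threshold, could be implemented roughly as follows; the sample layout and the reserved transparent value are assumptions of this sketch.

```python
TRANSPARENT = None  # stands in for the reserved 'transparent' indication

def insert_transparent_parts(drape, gap_threshold=2):
    """Mark the space between successive pieces of drape as transparent when
    their horizontal distance exceeds `gap_threshold` pixels.

    `drape` is a list of (color, depth, x) samples ordered along the scan.
    """
    if not drape:
        return []
    out = [drape[0]]
    for previous, current in zip(drape, drape[1:]):
        if current[2] - previous[2] > gap_threshold:
            # A single transparent marker; the receiver may skip this portion.
            out.append((TRANSPARENT, 0, previous[2] + 1))
        out.append(current)
    return out
```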
Next to the amount of tightening, also the resolution at which information is stored along the drape can be varied to balance the amount of information and storage/transmission capacity. The “transparent” parts mentioned earlier are an extreme example of this, but one could also choose to, for example, encode the sides of the objects (and especially the rear of the objects) at lower resolutions. The drape then may consist of a series of data elements associated with equidistant or non-equidistant points. These data elements may include information about color and possibly also transparency. Optionally, additional information to capture view-direction dependent effects, such as bi-directional reflectance distribution data, may also be included, as well as any other relevant information. The samples may have associated coordinates (x and z for a drape as shown in the figures, and a series for each line when a full 3D image is represented). Different methods can be used to store these series. Chain codes might be used, in particular if lossless compression is used.
It is possible to retain vertical cohesion for subsequent horizontal drape-lines. This allows achieving good compression performance. For example, the regular image-and-depth representation may be extracted or stored separately, and the additional pieces of the drape (which can be inserted back into the image-and-depth samples) may be stored as additional data. This ensures backwards compatibility with the current image-and-depth format, and adds the full drape-data as an optional extra. Moreover, the regular image-and-depth representation may be compressed using high-performance compression techniques. The remaining pieces in the additional data can then be arranged such that vertical cohesion is maximized for optimal compression. If the drape-lines correspond to vertical video lines, horizontal cohesion may be retained in a similar fashion.
A drape representation can be constructed from the images (and possibly depths) of several cameras looking at the scene from different positions, or can for example be derived from a voxel representation obtained by slicing through a (virtual) scene. Rendering a view from a drape may be realized by means of a process of depth-dependent shift with proper occlusion and de-occlusion handling.
In the field of computer graphics, boundary representations are known, such as for example described in “Relief texture mapping” by M. M. Oliveira et al., in Proceedings of the 27th annual conference on Computer graphics and interactive techniques, pages 359-368, 2000, ISBN 1-58113-208-5. These computer graphics representations are usually very geometrical in nature (for example mesh-based), whereas the drape may be used in a video-like representation in which not only colors but also depths may be represented as video signals which can be compressed very well.
It is also possible to encode vertical de-occlusion information using the technology described herein. For example, one or more sequences of samples may have vertical positions instead of horizontal positions associated with them. These “vertical drape lines” can be used instead of or in addition to the “horizontal drape lines”. Alternatively, the vertical spacing between successive sequences of samples may be made variable to accommodate visualizing an upper and/or a lower edge of an object.
A “drape” may be described as a sequence of stripes. These stripes may comprise a color value, a horizontal position value (e.g. pixel number on a line of a primary view), a depth value or disparity value, and/or a transparency indicator or value. It will be apparent that a color is not needed for a fully transparent portion or that a particular color value may be reserved for indicating “transparent”. Sides of a cube with a front side normal close to the viewing direction will be described using successive tuples having (either almost or exactly) the same position p, but different depth values, and appropriate color values. Objects which are in front of each other may be connected by means of a transparent portion of the drape. Using a “loose drape”, only frontal surfaces and side surfaces of objects are described in the drape. Using a “tight drape”, also the back surfaces of objects are described in the drape. In many cases, some side and rear surface information is present, but not all information. The drape can be used to accommodate any information available. It is not necessary to waste storage space for information which is not available or which is not needed at the receiving end. Also, it is not necessary to store redundant data. In video encodings using multiple layers, some storage space may be wasted if there is not enough information available to fill all layers, even after compression.
Using for example three images (a left, middle, and right image) of the same scene taken by three adjacent cameras (a left, middle, and right camera), it is possible to consolidate the information of the three images into a single drape. First, the depth map is reconstructed for all three images. Stereoscopic computations involving for example camera calibration may be employed. Such computations are known in the art. Next, the right and left images are warped to the geometry of the middle image. Surfaces appearing in the warped left image, the warped right image, and the middle image may be stitched together by detecting overlapping or adjacent surface areas. Next, the drape lines may be constructed by sampling or selecting from these (warped) image points.
To maintain vertical consistency, it is possible to insert transparent samples. This improves compression ratios obtained when using known video compression techniques.
Rendering of a drape line may be performed in a way similar to rendering a 3D polyline using z-buffering.
The sequences of samples representing the drape lines may be stored in a number of images. The first image may comprise color information. It is also possible to encode each of the components such as R, G, and B, or Y, U and V as three separate images. It is also possible to convert the colors to for example YUV color space which can be compressed better by subsampling U and V, as is known in the art. The second image may comprise depth information. This depth information may be encoded by means of a coordinate or by means of disparity information, for example. The third image may comprise horizontal coordinates: the video line position, for example expressed in whole pixels, or alternatively using an indication allowing sub-pixel precision (e.g. a floating point value). These images may further be compressed using standard video compression. Preferably, the image containing the x-coordinates may be expressed in deltas: the difference between the x-coordinates of consecutive samples may be stored instead of the absolute values of the x-coordinates. This allows performing efficient run-length encoding compression. These images may be stored in separate data streams.
Preferably, backward compatibility is provided by extracting a regular 2D image with optional depth information, to be stored or transmitted separately as a conventional video stream. The depth image may be added as an auxiliary stream. The remaining portions of the sequences of samples may be stored in one or more separate streams.
The image data of two of the cameras may be warped to the third camera position, e.g. the leftmost and rightmost images may be warped to the middle camera, which changes the x-values of the pixels of the warped images. It may happen, as is the case for the side surface of the cube object in
It will be appreciated that the invention also extends to computer programs, particularly computer programs on or in a carrier, adapted for putting the invention into practice. The program may be in the form of source code, object code, a code intermediate source and object code such as partially compiled form, or in any other form suitable for use in the implementation of the method according to the invention. The carrier may be any entity or device capable of carrying the program. For example, the carrier may include a storage medium, such as a ROM, for example a CD ROM or a semiconductor ROM, or a magnetic recording medium, for example a floppy disc or hard disk. Further the carrier may be a transmissible carrier such as an electrical or optical signal, which may be conveyed via electrical or optical cable or by radio or other means. When the program is embodied in such a signal, the carrier may be constituted by such cable or other device or means. Alternatively, the carrier may be an integrated circuit in which the program is embedded, the integrated circuit being adapted for performing, or for use in the performance of, the relevant method.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. Use of the verb “comprise” and its conjugations does not exclude the presence of elements or steps other than those stated in a claim. The article “a” or “an” preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Foreign Application Priority Data: 08157420.4, filed June 2008, EP (regional).
PCT Information: PCT/IB09/52225, filed May 27, 2009 (WO); 371(c) date: November 22, 2010.