The invention relates to the coding of 3D video signals, specifically the transport format used to broadcast 3D content.
The domain is that of 3D video, which includes cinema content used for cinema projection, for distribution on DVD media or for broadcast by television channels. Thus it specifically involves 3D digital cinema, 3D DVD and 3D television.
Numerous systems exist today for the display of images in relief.
3D digital cinema, known as the stereoscopic system, is based on the wearing of glasses, for example with polarizing filters, and uses a stereoscopic pair of views (left/right), or the equivalent of two “reels” for a film.
The 3D screen for digital television in relief, known as the autostereoscopic system as it does not require the wearing of glasses, is based on the use of lenticular lenses or parallax barrier strips. These systems are designed to enable the viewer to receive, within an angular cone, a different image on the right eye and on the left eye.
It should also be mentioned that the current solutions lead to a loss of spatial resolution, on account of the complementary information to be transmitted for the 3D display. For example, on a high definition panel of 1080 lines of 1920 pixels, each of the 8 or 9 views suffers a spatial resolution loss by a factor of 8 or 9, the transmission bit rate used and the number of pixels of the television remaining constant.
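As a purely illustrative aside (the figures below are not part of the invention), the order of magnitude of this loss can be checked with a few lines of arithmetic:

```python
# Illustrative check of the resolution loss mentioned above: a fixed HD panel
# shared among 8 or 9 autostereoscopic views, bit rate and pixel count constant.
panel_pixels = 1920 * 1080                      # 2,073,600 pixels in total

for n_views in (8, 9):
    pixels_per_view = panel_pixels // n_views   # each view keeps only 1/N of the panel
    print(f"{n_views} views -> about {pixels_per_view:,} pixels per view "
          f"(resolution divided by {n_views})")
```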
Studies in the field of the on-screen display of images in relief are oriented today towards:
Moreover, it is noted that content relating to 3D digital cinema can be distributed via DVD media; the systems currently studied are called, for example, Sensio or DDD.
The formats of the video elementary streams used to exchange 3D content are not harmonized; proprietary solutions coexist. A single format is standardized, a transport encapsulation format (MPEG-C Part 3), but it relates only to the encapsulation in the MPEG-2 TS transport stream and therefore does not define a new format for the elementary stream.
This multiplicity of video elementary stream formats for 3D video content, and this absence of convergence, do not facilitate conversion from one system to another, for example from digital cinema to DVD distribution or TV broadcast.
One of the purposes of the invention is to overcome the aforementioned disadvantages.
An object of the invention is a coding device intended to exploit the data from different 3D production means, data relating to a right image and a left image, data relating to depth maps associated with right images and/or left images and/or data relating to occlusion layers, characterized in that it comprises means to generate a stream structured on more than one level:
According to a particular embodiment, the data relating to level 0, level 1 or level 2 come from 3D synthesis image generation means and/or from 3D data production means operating from:
According to a particular embodiment, the 3D data production means use, for the calculation of data relating to level 1, specific means for depth information acquisition and/or means for depth map calculation from data coming from stereo cameras and/or multiview cameras.
According to a particular embodiment, the 3D data production means use, for the calculation of data relating to level 2, occlusion map calculation means from data coming from depth information acquisition means, from stereo cameras and/or multiview cameras.
Another object of the invention is a device for the decoding of 3D data from a stream, for their display on a screen, the stream being structured in several levels:
for their display on a display device, characterized in that it comprises a 3D display adaptation circuit that uses the data of one or more received data stream layers to render them compatible with the display device.
According to a particular embodiment, the 3D display adaptation circuit uses:
Another object of the invention is a video data transport stream, characterized in that the stream syntax differentiates the data layers according to the following structure:
A single “stacked” format is used to distribute the different 3D contents on different media and for different display systems, such as content for 3D digital cinema, 3D DVD and 3D TV.
Thus, from a single transmission format, 3D content coming from the different existing production modes can be recovered and the whole range of autostereoscopic display devices can be addressed.
Thanks to the definition of a format for the video itself, and to the structuring of the data in the stream, which enables the extraction and selection of the appropriate data, the compatibility of one 3D system with another is ensured.
Other specific features and advantages will emerge clearly from the following description, provided as a non-restrictive example and referring to the annexed drawings, wherein:
It appears that multiview autostereoscopic screens, for example the Newsight screen, provide better results, in terms of rendering quality, when they are supplied with N views whose extremes correspond to a pair of stereoscopic views and whose intermediate images are interpolated, than when they are supplied with the result of a multicamera acquisition. This is due to the constraints that must be respected between the focal lengths of the cameras, their apertures, their positioning (inter-camera distance, relative directions of the optical axes, etc.), and the size and distance of the subject filmed. For real scenes, interior or exterior, and “realistic” cameras, that is to say with reasonable focal lengths and apertures that do not give an impression of distortion of the scene on the display, camera systems are typically used whose optical axes must be spaced at a distance of the order of 1 cm, whereas the average human inter-ocular distance is 6.25 cm.
It would therefore appear advantageous to transform the data relating to multicameras into data relating to the right and left stereoscopic views corresponding to the inter-ocular distance. These data are processed to provide stereoscopic views with depth maps and possibly occlusion masks. It therefore becomes unnecessary to transmit multiviews, that is to say data relating to as many 2D images as cameras used.
For data relating to stereoscopic cameras, the left and right images can be processed to provide, in addition to the images, depth maps and possibly occlusion masks enabling exploitation by autostereoscopic display devices.
As for the depth information, it can be estimated by dedicated means such as laser or infra-red, or calculated by measuring the disparity between the right image and the left image, or in a more manual way by estimating the depth of the regions.
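For readers unfamiliar with this calculation, the usual basis for deriving depth from a rectified stereo pair is the classical pinhole relation Z = f·B/d. The sketch below only illustrates that textbook relation, not the patent's own method; the function name and parameters are assumptions.

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Classical rectified-stereo relation Z = f * B / d.

    disparity_px    : horizontal shift between left and right views (pixels)
    focal_length_px : camera focal length expressed in pixels
    baseline_m      : distance between the two optical centres (metres)
    """
    d = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full_like(d, np.inf)          # zero disparity -> point at infinity
    np.divide(focal_length_px * baseline_m, d, out=depth, where=d > 0)
    return depth

# Example: 6.25 cm baseline, 1200-pixel focal length, 30 pixels of disparity.
print(depth_from_disparity(30.0, 1200.0, 0.0625))   # -> 2.5 (metres)
```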
The video data from a single 2D camera can be processed to provide two images, two views producing the relief effect. A 3D model can be created from this single 2D video, with human intervention consisting, for example, in a reconstruction of the scenes by exploiting successive views, to provide stereoscopic images.
It appears that the N views exploited by a multiview display system, normally coming from N cameras, can in fact be calculated from the stereoscopic content by carrying out interpolations. Hence the stereoscopic content can serve as a basis for the transmission of television signals, the data relating to the stereoscopic pair enabling the N views for the 3D display device to be obtained by interpolation and possibly by extrapolation.
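A minimal sketch of the kind of interpolation alluded to here is given below. It naively forward-warps one view of the pair towards a virtual viewpoint using its disparity map and leaves the disoccluded areas empty, which is precisely the information the occlusion layer is meant to carry. All names are illustrative assumptions rather than the patent's processing chain.

```python
import numpy as np

def interpolate_view(left_view, left_disparity, alpha):
    """Naive forward warping: shift each pixel of the left view by a fraction
    `alpha` (0 = left viewpoint, 1 = right viewpoint) of its disparity.

    left_view      : H x W x 3 array (one decoded view of the stereoscopic pair)
    left_disparity : H x W disparity map in pixels, derived from the depth layer
    alpha          : position of the virtual camera between the two views
    """
    h, w = left_disparity.shape
    target = np.zeros_like(left_view)        # disoccluded pixels stay empty
    xs = np.arange(w)
    for y in range(h):
        new_x = np.round(xs - alpha * left_disparity[y]).astype(int)
        valid = (new_x >= 0) & (new_x < w)
        target[y, new_x[valid]] = left_view[y, xs[valid]]
    return target
```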
Taking these observations into account, it can be deduced that the different data types necessary for the display of 3D video content, according to the display device type, are the following:
The current conventional 2D content, coming for example from transmission or storage means, referenced 1, and the video data from a standard 2D camera, referenced 2, are transmitted to the production means, referenced 3, which carry out the transformation into 3D video.
The video data from stereo cameras 4 and from multiview cameras 5, and the data from distance measurement means 6, are transmitted to a 3D production circuit 7. This circuit comprises a depth map calculation circuit 8 and an occlusion mask calculation circuit 9.
The video data coming from a synthetic image generation circuit 10 are transmitted to a compression and transport circuit 11. The information from the 3D production circuits 3 and 7 is also transmitted to this circuit 11.
The compression and transport circuit 11 carries out the compression of the data using, for example, the MPEG-4 compression method. The signals are adapted for transport, the transport stream syntax differentiating the layers of the video data structure potentially available at the input of the compression circuit and described later. The data from circuit 11 can be transmitted to the reception circuits in different ways:
The signals are thus transmitted by the compression and transport circuit according to the structure of the transport stream described later; likewise, the signals are arranged on the DVD, or on the reels, according to this transport stream structure. The signals are received by an adaptation circuit for the 3D display devices, referenced 12. This block carries out, from the different layers in the transport stream or the programme stream, the calculation of the data required by the display device to which it is connected. The display devices are of the following types: screen for stereoscopic projection 13, stereoscopic 14, autostereoscopic or multiview autostereoscopic 15, autostereoscopic with servo control 16, or other.
In the vertical direction, the layers of level zero, level one and level two are defined. In the horizontal direction, for a given level, a first layer and possibly a second layer are defined.
The video data of the first image of a stereoscopic pair, for example the left view of a stereoscopic image, are assigned to a base layer, the first layer of level zero according to the naming proposed above. This base layer is the one used by a standard television, the conventional type video data, for example the 2D data relating to the image displayed by a standard television, also being assigned to this base layer. Compatibility with existing products is thus maintained, a compatibility that does not exist in the Multiview Video Coding (MVC) standardization.
The video data of the second image of the stereoscopic pair, for example the right view, are assigned to the second layer of level zero, called the stereographic layer. It constitutes an enhancement layer of the first layer of level zero.
The video data concerning the depth maps are assigned to enhancement layers of level one: the first layer of level one, called the left depth layer, for the left view; the second layer of level one, called the right depth layer, for the right view.
The video data relating to the occlusion masks are assigned to an enhancement layer of level two; this first layer of level two is called the occlusion layer.
A stacked format for the video elementary stream therefore consists of these layers: the base layer carrying the first view, the stereographic enhancement layer carrying the second view, the left and right depth layers of level one, and the occlusion layer of level two.
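To make the hierarchy easier to visualize, the (level, layer) organization just described can be summarized with a small data structure; the identifiers below are illustrative only and do not correspond to any standardized syntax.

```python
from enum import Enum

class Layer(Enum):
    """Illustrative (level, position) identifiers for the layers described above."""
    BASE        = (0, 1)   # level 0, first layer: first view (e.g. left), 2D compatible
    STEREO      = (0, 2)   # level 0, second layer: second view (e.g. right)
    LEFT_DEPTH  = (1, 1)   # level 1, first layer: depth map of the left view
    RIGHT_DEPTH = (1, 2)   # level 1, second layer: depth map of the right view
    OCCLUSIONS  = (2, 1)   # level 2, first layer: occlusion masks

    @property
    def level(self):
        return self.value[0]
```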
Due to this organization of the data in the different layers, the contents relating to stereoscopic devices for 3D digital cinema, to multiview type autostereoscopic devices, or to devices using depth maps and occlusion maps can be made to converge. The stacked format enables at least 5 different types of display device to be addressed. The configurations used for each of these types of display device are indicated in
The base layer, alone, reference 17, addresses conventional display devices.
The base layer combined with the stereographic layer, the grouping referenced 18, enables 3D cinema type projection as well as the display of DVDs on stereoscopic screens, with glasses, or on autostereoscopic screens with only two views and head tracking.
The base layer associated with the “left” depth layer, grouping 19, enables a Philips 2D+z type display device to be addressed.
The base layer associated with the “left” depth layer and with the occlusion layer, that is to say the first layer of level zero and the first enhancement layers of levels one and two, grouping 20, enables an LDV (Layered Depth Video) type display device to be addressed.
The base layer associated with the stereographic layer and with the left and right depth layers, that is to say the level zero and level one layers, grouping 21, addresses MVD (Multiview Video+Depth maps) type autostereoscopic 3DTV display devices.
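Using the illustrative Layer identifiers sketched earlier, the groupings 17 to 21 can be read as a simple selection table such as an adaptation circuit might apply; again, this is only a hedged sketch of the idea, not the device's actual implementation.

```python
# Illustrative mapping of the groupings 17 to 21 onto the layers they require
# (reuses the Layer enum from the earlier sketch).
REQUIRED_LAYERS = {
    "2D conventional":           {Layer.BASE},                                      # 17
    "stereoscopic / 3D cinema":  {Layer.BASE, Layer.STEREO},                        # 18
    "2D+z":                      {Layer.BASE, Layer.LEFT_DEPTH},                    # 19
    "LDV":                       {Layer.BASE, Layer.LEFT_DEPTH, Layer.OCCLUSIONS},  # 20
    "MVD autostereoscopic 3DTV": {Layer.BASE, Layer.STEREO,
                                  Layer.LEFT_DEPTH, Layer.RIGHT_DEPTH},             # 21
}

def select_layers(display_type, received_layers):
    """Keep only the received layers that the connected display actually needs."""
    return REQUIRED_LAYERS[display_type] & set(received_layers)

# Example: a 2D+z display connected to a decoder receiving the full stack
# keeps only the base layer and the left depth layer.
print(select_layers("2D+z", list(Layer)))
```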
Such a structuring of the transport stream enables a convergence of formats, for example formats of the Philips 2D+z, 2D+z+occlusions or LDV type with stereoscopic cinema type formats and with LDV or MVD type formats.
Returning to
Hence, conventional 2D or 3D video signals, whether they come from recording media, radio transmission or cable, can be displayed on any 2D or 3D system. The decoder, which for example contains the adaptation circuit, selects and exploits the layers according to the 3D display system to which it is connected.
Due to this structuring, it is also possible to transmit to the receiver, for example by cable, only the layers required by the 3D display system used.
The invention is described in the preceding text as an example. It is understood that those skilled in the art are capable of producing variants of the invention without departing from the scope of the invention.
| Number | Date | Country | Kind |
|---|---|---|---|
| 0854934 | Jul 2008 | FR | national |
| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/EP2009/059331 | 7/21/2009 | WO | 00 | 1/14/2011 |