The present invention relates to a method for generating, transporting and reconstructing a stereoscopic video stream.
For transmission of 3D video signals, so-called “frame-compatible” formats are commonly used. Such formats allow the two images that make up the stereoscopic pair to be entered into a Full HD frame, which is used as a container. In this way, the 3D signal, consisting of two video streams (one for the left eye and one for the right eye), becomes a signal consisting of a single video stream; it can therefore pass through the production and distribution infrastructures used for 2D TV and, most importantly, can be played by the 2D and 3D receivers currently available on the market, in particular for High Definition TV.
FIGS. 1a and 1b schematically show two HD frames composed of 1920 columns by 1080 rows of pixels (referred to as 1080p), respectively belonging to the video streams for the left eye L and for the right eye R. The left image L and the right image R can be entered into a composite frame, by selecting their respective pixels, one next to the other, thus creating the so-called “side-by-side” format, or one on top of the other, thus creating the so-called “top-and-bottom” or “over-under” format (see
A third format, called “tile format”, has also been proposed, wherein two 720p images (1280×720 progressive-scan pixels) are entered into a 1080p container frame. According to this format, one of the two images is entered unchanged into the container, while the other one is divided into three parts, which are in turn entered into the space left available by the first image (see
These entry operations are carried out at the frame rate frequency of the video stream involved, the typical values of which are approximately 24, 50 or 60 Hz (or fps, frames per second), depending on the adopted standard.
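By way of illustration only, the three entry operations can be sketched as follows. The sketch assumes that frames are plain lists of pixel rows, that side-by-side and top-and-bottom packing simply keep every second column or row (no anti-alias filtering), and that the tile format places the three parts of the second image to the right of and below the first one. This arrangement is only one possible layout, assumed here; the function names are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # hypothetical representation: a list of pixel rows


def side_by_side(left: Frame, right: Frame) -> Frame:
    """Keep every second column of each view and place the halves side by side."""
    return [l_row[::2] + r_row[::2] for l_row, r_row in zip(left, right)]


def top_and_bottom(left: Frame, right: Frame) -> Frame:
    """Keep every second row of each view and stack the halves vertically."""
    return [row[:] for row in left[::2]] + [row[:] for row in right[::2]]


def tile_pack(left: Frame, right: Frame) -> Frame:
    """Enter `left` unchanged and split `right` into three parts (assumed layout).

    For two w x h views the container is (3w/2) x (3h/2), e.g. two 1280x720
    views in a 1920x1080 container. The unused corner is filled with zeros.
    """
    h, w = len(left), len(left[0])
    half_w, half_h = w // 2, h // 2
    # Upper band: the whole left view, then the left half of the right view.
    top = [left[y] + right[y][:half_w] for y in range(h)]
    # Lower band: the right half of the right view, cut into two stacked
    # quarters that are laid out next to each other, then padding.
    bottom = [
        right[y][half_w:] + right[y + half_h][half_w:] + [0] * half_w
        for y in range(half_h)
    ]
    return top + bottom
```

With two 1280×720 inputs, `tile_pack` returns 1080 rows of 1920 pixels each; with two 1920×1080 inputs, `side_by_side` and `top_and_bottom` both return a 1920×1080 container.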
Usually, the stream images are then compressed by using a suitable coding technique and may be subjected to further treatments (multiplexing, channel coding, and the like) in order to be adapted for storage or transmission prior to reproduction.
All these three formats can be used, as aforesaid, for generation and transport (transmission or storage on a physical medium), whereas other formats, not suitable for transport purposes, are used for visualization, namely the so-called “line-alternate” and “frame-alternate” formats.
In the “line-alternate” format, the two images L and R are interleaved; for example, with reference to
In the “frame-alternate” display system, on the contrary, the image L and the image R are displayed alternately on the screen (see
However, such independence of treatment does not allow the quality of the images to be optimized. If the frame-alternate display format is used, the optimal transport format will be different from the one which would be optimal for the line-alternate display format, and vice versa. This fact is generally ignored, with the consequence that either the available band is not fully exploited or alterations are introduced into the stereoscopic image. In other words, the frame-packing formats currently used for transporting video streams are not optimized in view of their visualization on reproduction apparatuses.
For a frame-alternate display, all three of the above-mentioned frame-compatible formats can be used for transporting the video signal, the best one being the tile format because it preserves the balance between horizontal and vertical resolution. However, all three formats suffer from a drawback: the two images L and R entered into the same composite frame refer to the same time instant, in that the two video cameras are synchronized (“gen-locked”) by the same synchronism signal (“gen-lock”, for generator lock), but they are displayed in temporal succession.
If 1080p video cameras are used, the two images in question are captured simultaneously at preset time intervals Δt, but they are displayed in a delayed and alternated manner at halved intervals Δt/2. If, for example, the television system in use is the 50 Hz European one (one pair of frames L-R every 20 ms), then the display will show a succession of images at 100 Hz (one frame L or R every 10 ms), with L,R,L,R alternation, and so on.
Such errors are similar to those produced by the so-called “Pulfrich effect”, which is visible on test images containing horizontally moving objects, e.g. a pendulum oscillating in a plane perpendicular to the eyes-pendulum conjunction line (see
The pendulum's apparent direction of rotation depends on which eye is being screened; in the case of
The Pulfrich effect is very suggestive, since it causes three-dimensional images to appear on the screen of a normal 2D television set displaying a normal 2D image. This is an optical illusion, which has already been used in order to intentionally create three-dimensional effects, but it is of little use in practice because the three-dimensional effect shows in an uncontrolled manner and only in the presence of objects moving horizontally with respect to the observer.
An object of the present invention is therefore to provide a method for generating, transporting and reconstructing a stereoscopic video stream which, when reproduced on a frame-alternate display, has no depth errors.
In brief, in order to eliminate the above-described optical illusion, it is necessary that the two images L and R entered into the same composite frame not be captured simultaneously, but be mutually delayed by half a frame period (in the case of progressive formats) or by half a field period (in the case of interleaved formats), i.e. 10 ms when using the 50 Hz European television system, where one frame or one field is captured every 20 ms. This applies to all three frame-compatible formats (side-by-side, top-and-bottom and tile format).
Of course, if this time shift is made during the capturing stage, the video signal should include suitable signalling specifying which one of the two views of a stereoscopic pair has been captured first. In fact, if said pairs are displayed in the reverse order with respect to the capturing process, so that, for example, the left images are displayed on the screen after the right ones although they were captured first, the depth error in the viewer's vision will be increased, not removed.
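The effect of the half-period shift can be illustrated with a simple kinematic model, which is an assumption made here for clarity and is not stated in the text: an object moving horizontally at constant speed appears displaced by its speed multiplied by the mismatch between the display gap and the capture gap of the two views. The function name and the figure of 600 pixels per second are hypothetical.

```python
def spurious_offset_px(speed_px_per_s: float, rate_hz: float, staggered: bool) -> float:
    """Horizontal displacement caused by showing a view later than it was captured.

    Assumed model: on a frame-alternate display the second view is shown
    half a period after the first. If both views were captured at the same
    instant, the moving object is therefore shown where it was half a
    period earlier, which the viewer may perceive as extra disparity.
    """
    period_s = 1.0 / rate_hz
    display_gap_s = period_s / 2.0
    capture_gap_s = period_s / 2.0 if staggered else 0.0
    return speed_px_per_s * (display_gap_s - capture_gap_s)


# Object crossing the screen at 600 px/s in a 50 Hz system (illustrative values).
simultaneous = spurious_offset_px(600.0, 50.0, staggered=False)  # about 6 px
delayed = spurious_offset_px(600.0, 50.0, staggered=True)        # 0 px
```

Under this model the mismatch vanishes when the capture gap equals the display gap, which is the condition described above.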
This signalling is particularly simple, since only two possibilities exist: either the left image L is captured first or the right image R is captured first. Therefore, by way of example, this signalling may be assigned just one bit, the value 0 (zero) of which indicates that the former of said cases is true, whereas the value 1 (one) indicates that the latter case is true.
If, however, one also wants to signal the case wherein the two images are captured simultaneously, i.e. the case wherein the present invention is not used (e.g. because a line-alternate display is used), the signalling must comprise at least two bits, one of which may indicate, for example, whether or not the two images are contemporaneous, while the other bit may indicate which one of the two images precedes the other. The first bit may be used by the receiver to understand whether the signal being transmitted is optimized for the type of display in use: it should be recalled that the transmission of images not captured simultaneously is optimal for frame-alternate displays, while the transmission of images captured simultaneously is optimal for line-alternate displays. In the event of non-optimal transmission, the receiver can take different actions: for example, it may notify the user, by means of a message displayed on the screen, about the probable presence of depth errors and/or it may suggest that the user select the 2D mode, or it may even automatically switch to 2D mode. Another possibility for the receiver is to try to correct the depth errors by locally processing the received images L and R; however, such processing is quite burdensome in computational terms, and the correction obtained will never be perfect.
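A minimal sketch of such a two-bit signalling is given below. The bit positions, the enumeration and the function names are hypothetical, since the text only requires that one bit distinguish simultaneous from delayed capture and that the other indicate which image comes first.

```python
from enum import Enum


class CaptureOrder(Enum):
    SIMULTANEOUS = "simultaneous"
    LEFT_FIRST = "left_first"
    RIGHT_FIRST = "right_first"


# Hypothetical bit layout (not defined in the text):
#   bit 1: 0 = captured simultaneously, 1 = captured with a mutual delay
#   bit 0: 0 = left image first, 1 = right image first (ignored if bit 1 is 0)
def encode_signalling(order: CaptureOrder) -> int:
    if order is CaptureOrder.SIMULTANEOUS:
        return 0b00
    return 0b10 | (1 if order is CaptureOrder.RIGHT_FIRST else 0)


def decode_signalling(bits: int) -> CaptureOrder:
    if not bits & 0b10:
        return CaptureOrder.SIMULTANEOUS
    return CaptureOrder.RIGHT_FIRST if bits & 0b01 else CaptureOrder.LEFT_FIRST


def is_optimal(order: CaptureOrder, display: str) -> bool:
    """Delayed capture suits frame-alternate displays; simultaneous capture
    suits line-alternate displays."""
    delayed = order is not CaptureOrder.SIMULTANEOUS
    if display == "frame-alternate":
        return delayed
    if display == "line-alternate":
        return not delayed
    raise ValueError(f"unknown display type: {display}")
```

A receiver could call `is_optimal(decode_signalling(bits), display)` and, when the result is false, show a warning or switch to 2D mode as described above.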
Further features and objects of the invention are set out in the appended claims, which are intended to be an integral part of the present description, the teachings of which will become more apparent from the following detailed description of a preferred but non-limiting example of embodiment thereof with reference to the annexed drawings, wherein:
FIGS. 2a, 2b and 2c show a pair of stereoscopic images in the side-by-side, over-under and tile formats, respectively;
a and 6a schematically show a method according to the prior art for capturing and displaying temporally successive left and right frames comprising a rectangular object moving horizontally relative to the viewpoint of video cameras shooting it;
b and 6b schematically show a method according to the invention for capturing and displaying the temporally successive left and right frames of
A genlock apparatus 810 for generating the capture synchronism generates a common synchronization signal for both video cameras in order to dictate the times of video image capture, which in the European video system typically takes place at a frequency 1/Δt of 50 Hz, i.e. one image every 20 ms, equal to the interval Δt elapsing between the capture of two stereoscopic images belonging to successive pairs L-R. One of the two genlock signals, e.g. the one supplied to the second video camera 830″, is delayed by a time interval substantially equal to Δt/2, i.e. 10 ms for the 50 Hz video standard, by a delaying device 820 interposed between the genlock apparatus 810 and the second video camera 830″. If the delaying device 820 is of the multistandard type, i.e. capable of operating with both the 50 Hz European standard and the 60 Hz US standard, said time interval can be made adjustable or programmable via suitable adjusting or programming means.
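The programmable delay of a multistandard delaying device follows directly from the capture frequency, as the sketch below shows. The function names are hypothetical; the code only computes the synchronization instants and does not model any real genlock hardware.

```python
from typing import List, Tuple


def half_period_delay_ms(rate_hz: float) -> float:
    """Delay applied to one genlock signal: half the capture interval."""
    return 1000.0 / (2.0 * rate_hz)


def genlock_pulses_ms(rate_hz: float, n_pulses: int) -> List[Tuple[float, float]]:
    """Capture instants (undelayed camera, delayed camera) in milliseconds."""
    period = 1000.0 / rate_hz
    delay = half_period_delay_ms(rate_hz)
    return [(k * period, k * period + delay) for k in range(n_pulses)]
```

For the 50 Hz standard the delay is 10 ms, while for the 60 Hz standard it is approximately 8.33 ms.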
As a consequence, the left images L are captured with the same frequency 1/Δt (typically 50 or 60 Hz) as the right ones, but advanced by Δt/2 with respect to the images R of the same stereoscopic pair (see
The present invention is applicable without distinction to any type of video camera. In particular, it can operate with different video resolutions, e.g. the Full HD resolution, i.e. 1920×1080 pixels (abbreviated as 1080) or 1280×720 pixels (abbreviated as 720). Furthermore, it can output a progressive (p) or interleaved (i) video signal, at 50 or 60 Hz or fps. In particular, it is applicable, for example, to a pair of 2D video cameras capable of capturing a video stream in at least one of the following modes: 1080p@50 Hz, 1080p@60 Hz, 720p@50 Hz, 720p@60 Hz, 1080i@50 Hz and 1080i@60 Hz. Other high-end formats used for cinematographic shooting and projection utilize 24 images per second.
In the case of interleaved 1080i formats, the video cameras 830′ and 830″ output video streams consisting of an alternation of odd and even half-frames of 1920×540 pixels, respectively constituted by 540 odd rows and 540 even rows of the same Full HD 1080p frame. The two lines 83′ and 83″, therefore, carry the time-alternate odd and even half-frames of, respectively, the views L and R belonging to one stereoscopic pair, wherein the capturing of one of the two views is delayed in time.
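The row geometry of the two half-frames can be sketched as follows. The sketch only shows which rows belong to which half-frame; it assumes rows are numbered from 1, so that the first row is odd, and it does not model the fact that the two half-frames of a real interleaved stream are captured at different instants. The function name is hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]  # hypothetical representation: a list of pixel rows


def split_half_frames(frame: Frame) -> Tuple[Frame, Frame]:
    """Return (odd half-frame, even half-frame) using 1-based row numbering."""
    odd_rows = frame[0::2]   # rows 1, 3, 5, ...
    even_rows = frame[1::2]  # rows 2, 4, 6, ...
    return odd_rows, even_rows
```

Applied to a 1920×1080 frame, each returned half-frame has 540 rows of 1920 pixels.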
When the invention is applied to a TV production studio, the video cameras 830′ and 830″ output two video signals formatted in accordance with one of the standards of the SDI (Serial Digital Interface) family, standardized by the SMPTE (Society of Motion Picture and Television Engineers).
The images generated by the video cameras 830′ and 830″ are then packed by a frame packer 840 into one of the above-mentioned formats, i.e. side-by-side, top-and-bottom or tile. The stereoscopic video stream thus obtained is compressed by an encoder 850, which may possibly also add the signalling, on the basis of information coming, for example, from the genlock apparatus 810 (see the dashed connection 81 in
Capturing devices already exist, whether of the consumer or professional type, which incorporate into a single container both video cameras required for stereoscopic shooting. In this case, also the delaying device of the genlock apparatus 810 may advantageously be incorporated into the capturing device.
As aforesaid, the present invention is suitable for use in combination with display devices operating with the so-called frame-alternate technique, wherein the left and right images of each stereoscopic pair are displayed alternately in time on the screen. If the display device operates with the line-alternate technique, the present invention will not be applied.
The signalling entered into the video stream being transmitted, indicating which one of the two images contained in a given composite frame is delayed with respect to the other, must be used by the display device in order to reconstruct the correct frame-alternate sequence. In fact, if the sequence is reconstructed incorrectly, i.e. the image displayed first is the one that was delayed when capturing took place, then the depth error will be increased, not removed.
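The reconstruction of the frame-alternate sequence from the signalling can be sketched as follows, assuming that the unpacked images arrive as (left, right) pairs and that the signalling has already been reduced to a boolean flag; both the function name and the flag are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

Image = TypeVar("Image")


def frame_alternate_sequence(
    pairs: Sequence[Tuple[Image, Image]], left_first: bool
) -> List[Image]:
    """Interleave each (left, right) pair so that the image captured first
    is also displayed first."""
    sequence: List[Image] = []
    for left, right in pairs:
        if left_first:
            sequence.extend([left, right])
        else:
            sequence.extend([right, left])
    return sequence
```

For example, with `left_first=True` the pairs (L0, R0) and (L1, R1) yield the display order L0, R0, L1, R1; with `left_first=False` they yield R0, L0, R1, L1.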
The reproduction and/or reception system may comprise, for example, a television tuner 910 (DVB-T/T2, DVB-S/S2 or DVB-C/C2, ATSC, and the like) enabled to tune to a television signal comprising a stereoscopic video stream generated by a stereoscopic stream generation system according to the invention (e.g. it may be a system like the one shown in
As an alternative or in addition, the video stream 92 may come from a reading unit (not shown in
The video stream with delayed stereoscopic capture 92 is sent to a decoder 930, e.g. of the MPEG-4 AVC (H.264) type, which carries out the decompression operation inverse to that carried out at the production stage by the encoder 850. It also reads the signalling entered by the encoder 850, indicating which one of the images L and R contained in a composite frame C was captured before the other.
The decoded video stream 93 may then be subjected to a de-interleaving operation, if the input video stream comes from capturing systems operating with the interleaved capturing system. This operation can be carried out by a suitable unit 940, which receives the interleaved decoded stream 93 and produces a progressive video stream 94 with delayed stereoscopic capture. If the stream images come from progressive capturing systems, then the de-interleaving operation is not necessary and the decoded stream 93, which is already in progressive form, can be directly supplied to the unpacking unit 950, which carries out the operation inverse to that carried out by the packing unit 840.
The decoded progressive stereoscopic video stream 93 or, respectively, 94 is then broken up into two single-image video streams 95′ L and 95″ R, by extracting the left images L and the right images R from each composite frame C. The two video streams for the left eye and for the right eye need not necessarily be supplied to the next stage 960 over two separate connection lines in the form of distinct video streams, as shown by way of example in
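The text does not specify which algorithm the unit 940 uses. Purely as an illustration, the simplest conversion to progressive form merges one odd and one even half-frame row by row (often called "weave"); this choice, the 1-based row numbering and the function name are assumptions, and a practical unit would typically also compensate for motion between the two half-frames.

```python
from typing import List

Frame = List[List[int]]  # hypothetical representation: a list of pixel rows


def weave(odd_rows: Frame, even_rows: Frame) -> Frame:
    """Rebuild a progressive frame by alternating odd and even rows."""
    if len(odd_rows) != len(even_rows):
        raise ValueError("half-frames must have the same number of rows")
    frame: Frame = []
    for odd, even in zip(odd_rows, even_rows):
        frame.append(odd)   # rows 1, 3, 5, ...
        frame.append(even)  # rows 2, 4, 6, ...
    return frame
```

Two half-frames of 540 rows each produce a progressive frame of 1080 rows.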
The next stage 960 comprises a video processor enabled to create the frame-alternate sequence with the two right and left images in the correct order, which can be deduced from the signalling received by the decoder 930, which must in some way be transmitted to the device 960. By way of example,
As an alternative to the layout shown in
It should be noted that the video processing system 900 may be incorporated, for example, into a television signal receiver, whether or not equipped with a built-in screen 970; therefore it may be used, for example, within a set-top box or a television set.
Likewise, the system 900 may be incorporated into any multimedia reproduction apparatus capable of displaying three-dimensional video contents, such as, for example, a DVD or Blu-ray disk reader, a tablet, etc., whether or not equipped with a built-in screen for image display.
It must be pointed out that the present invention can also be used for generating and reproducing virtual images with the help of software and hardware means capable of entirely simulating the live capture of three-dimensional stereoscopic scenes (computer graphics). Virtual capture is commonly used for making animation videos and films, where the three-dimensional effect is based on the same general principle of shooting one scene from two points of view, so as to simulate the human visual system.
It can therefore be easily understood that what has been described herein may be subject to many modifications, improvements or replacements of equivalent parts and elements without departing from the novelty spirit of the inventive idea, as clearly specified in the following claims.
| Number | Date | Country | Kind |
|---|---|---|---|
| TO2012A000208 | Mar 2012 | IT | national |

| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/IB2013/051865 | 3/8/2013 | WO | 00 |