The present invention relates to the processing of digital stereoscopic video signals with accurate rendering of three-dimensional (3D) effects.
In the stereoscopic three-dimensional viewing of a scene, distinct but similar images are presented to the left and right eyes. Disparities between the observed left and right images act as depth cues to the human visual system (HVS), creating the illusion of depth in the perceived image when the HVS combines the left eye and right eye images in the visual cortex. Extending this idea to video, when a time varying sequence of left-eye images and right-eye images is rapidly presented with appropriate disparities between corresponding left-eye and right-eye images, an illusion of depth in a moving scene can be created.
Various stereoscopic 3D display technologies exist, or can be envisaged, that present over a given time period a sequence of correlated image pairs to the left and right eyes.
In some stereoscopic 3D display technologies, isolated images are displayed separately to the left and right eyes using independent display systems, for instance head-mounted displays. In general, such systems are only suitable for a single viewer.
In some technologies, the left-eye and right-eye images are displayed simultaneously by being merged into a single image seen by both eyes. Filters respectively placed in front of the two eyes then extract the relevant images from the merged image. The extraction of the images intended for the left and right eyes can be based on frequency separation, as in the so-called Dolby-3D system. Another technology uses different polarization states for the two eyes, as in the so-called RealD system.
On the other hand, in frame-sequential 3D systems, images intended for the left eye are displayed at one time and images intended for the right eye are displayed at another time. The display system alternates between the display of left-eye and right-eye images. During the display of the left-eye image, the path to the right eye is blocked and likewise during the display of the right-eye image, the path to the left eye is blocked. Thus, each eye sees its intended image sequence and sees blackness when an image intended for the other eye is being displayed. For maximum viewer comfort, the system alternates between image and blackness at a sufficiently high rate such that the viewer does not perceive flicker.
In such frame-sequential 3D systems, the blocking of light may be achieved by active eyewear, for example eyewear embedding a pi-cell (optically compensated bend mode LCD surface mode device with parallel rub direction) into each lens of the eyewear. The pi-cell is alternately switched between clear and opaque, synchronously to the frame rate of the television set. Therefore, if the TV alternately supplies left-eye and right-eye images, the active eyewear can steer the corresponding image to each eye, creating the 3D stereoscopic effect.
Alternatively, in another type of frame-sequential 3D system, the blocking may be achieved by passive eyewear incorporating polarizing filters and a switchable polarizer on the display device that can be switched between two opposed polarization states.
Thus, stereoscopic display systems can generally be categorized as (1) simultaneous-type, i.e. a left-eye image and a right-eye image are displayed at the same instant, or (2) staggered-type, i.e. the display shows a left-eye image followed by a right-eye image followed by a left-eye image, etc.
Likewise, two types of video recording apparatus can be distinguished. The first type of apparatus shoots two images at the same time for the left and right eyes, for example using two separate optics spaced apart to account for the stereo effect and supplying incoming light to two respective sensors sampled simultaneously. In the other type, the left-eye and right-eye images are captured at different times, for example using a single light sensor and a prism or mirror in the optical path to switch back and forth between two viewpoints providing the stereo effect. When the 3D video is made of synthesized images, the left-eye and right-eye images are generally built as representations of a 3D scene at the same instant.
Due to the various technologies used for generating and displaying stereoscopic video signals, there can be a slight discrepancy in the time sampling structure between capture and display. For an image sequence representing a static scene, the different time sampling structures between the left and right image sequences create no problem. For a dynamically changing scene, the movement of objects within the scene is slightly changed. However, the perception of movement by the viewer is almost unaffected because the fluctuation in the object speeds is hardly noticeable at the usual frame rates of video content.
It would be desirable to optimize the rendering of 3D effects in stereoscopic video applications, which is currently based on spatial disparities between the left-eye and right-eye image sequences.
A method of processing stereoscopic video signals is proposed, comprising:
The conversion into the output stereoscopic video signal comprises a time interpolation of the frames of at least one of the first and second frame sequences with interpolation parameters selected to apply a relative time shift of τ2−τ1 to the right-eye channel with respect to the left-eye channel.
The time interpolation can be applied to the frames of one of the first and second frame sequences or both depending on the implementation. In an embodiment, the interpolation parameters are selected to apply a time shift of (τ2−τ1)/2 to the right-eye channel and a time shift of (τ1−τ2)/2 to the left-eye channel.
The third and fourth frame sequences may have a frame rate different from that of the first and second frame sequences, in particular when the display uses a higher frame rate than the input video signal. The time interpolation of the conversion into the output stereoscopic video signal is then advantageously performed as part of a frame rate conversion process.
The present inventors have realized that when handling stereoscopic video signals, referring to a time sampling structure for display which is different from the time sampling structure used for generating the left-eye and right-eye frame sequences can cause artifacts in the perception of depth when the scene includes fast moving objects, unless special measures are taken. Even if the perception of speed is unaffected by a change of time reference, there is an impact on the disparities between the observed left and right images which may give rise to strange 3D effects, particularly if there are both fast moving objects and static or slow moving objects in the scene.
To illustrate this, consider an example where the stereoscopic video signal was generated with a time offset of zero between the left-eye and right-eye frame sequences (τ1=0) and is displayed with the right-eye frame sequence delayed by τ2>0 with respect to the left-eye frame sequence to account for the technology and frame refresh rate of the display device. Then, an object moving from left to right will have an increased disparity on the display and will thus be perceived as closer than it actually is, while an object moving to the left will have a lower disparity and will look farther away. Objects having opposite speeds, or a fast moving object versus a static background, thus give rise to artifacts in the 3D rendering. The processing method avoids such depth artifacts by realigning the time sampling structures on the input and output sides.
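The size of this depth artifact can be estimated with a short calculation. The sketch below assumes a purely horizontal, constant object velocity and measures disparity as left-image position minus right-image position, so that a larger value reads as closer. The function name, the units and the sign convention are illustrative assumptions and are not taken from the description.

```python
def disparity_error_px(velocity_px_per_s, tau1_s, tau2_s):
    """Apparent disparity change (pixels) when a signal sampled with offset
    tau1 is displayed with offset tau2.

    Assumptions (illustrative only): horizontal motion at constant velocity,
    positive velocity = left to right, disparity = x_left - x_right.
    """
    return velocity_px_per_s * (tau2_s - tau1_s)


# Synchronous capture (tau1 = 0) shown on a staggered display running at
# 60 frames per second per eye: the right-eye image is delayed by T'/2.
tau2 = 0.5 / 60.0
rightward = disparity_error_px(600.0, 0.0, tau2)   # object moving left to right
leftward = disparity_error_px(-600.0, 0.0, tau2)   # object moving right to left
static = disparity_error_px(0.0, 0.0, tau2)        # static background
```

With these figures the rightward object gains about 5 pixels of disparity, the leftward object loses as much, and the static background is unchanged. The error vanishes when τ2=τ1.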
The processing method can be applied in a stereoscopic video display if the technology that it uses to provide the 3D effect relies on a time sampling structure different from the one which was used to generate or transmit the 3D content. The method can also be applied on the recording side or by a broadcaster if the stereoscopic camera system samples the 3D scene with a time reference different from the one intended for display or required for transmission to receivers/displays of an unknown type.
The time offset τ2 in the output stereoscopic video signal is often known a priori. It may be fixed by the stereoscopic technology and the frame rate used in the display device, or set by some standard for transmission of stereoscopic video signals over networks or media. On the other hand, the time offset τ1 in the input stereoscopic video signal can be unknown because details of the source of the signal are often unavailable on the display side. However, τ1 may be detected by a suitable analysis of the input stereoscopic video signal. The processing method then comprises: detecting the time offset τ1 by analyzing the first and second frame sequences; and determining the relative time shift of τ2−τ1 based on the detected time offset τ1 and a predetermined time offset τ2. There are different ways of detecting the time offset τ1. A convenient one comprises: comparing at least one frame of the first frame sequence with at least one frame of the second frame sequence to identify directions of regularity between said frames; and detecting a non-zero time offset τ1 when non-horizontal directions of regularity are identified while τ1=0 is decided when only horizontal directions of regularity are identified.
Another aspect of the invention relates to a stereoscopic video signal processor, comprising:
The converter is arranged to perform time interpolation of the frames of at least one of the first and second frame sequences with interpolation parameters selected to apply a relative time shift of τ2−τ1 to the right-eye channel with respect to the left-eye channel.
Other features and advantages of the method and apparatus disclosed herein will become apparent from the following description of non-limiting embodiments, with reference to the appended drawings.
In the embodiment shown in
In the embodiment shown in
The time conversion processor 20 converts the input stereoscopic video signal SL, SR into an output signal whose left-eye and right-eye frame sequences are noted S′L, S′R. The output signal S′L, S′R is fed to the transmitter 22 which transmits or broadcasts this signal. Again, the transmission may be via telecommunication networks and/or storage media as mentioned above.
The formats on the transmission medium may be in accordance with the specification of a timing structure for the transmitted stereoscopic video signal. The timing structure typically includes a frame rate which may be one of a plurality of frame rates supported on the transmission medium. In this case, the time conversion processor 20 may apply an FRC process as in
In the context of
The general layout of the time conversion processor 10, 20, 30 may be that of an FRC component. There are many known methods for performing FRC.
In one method, a motion estimator unit forms an estimate of motion in the video by performing a block matching search, or similar spatial matching algorithm. The motion field is described by motion vectors, typically one or more candidates for each block (often 4×4 pixels, though other sizes are possible). The motion field is then smoothed using known techniques. Finally, vectors are assigned to individual pixels using a process known as dilation. Once the fine-scale motion field is known, an intermediate frame can be created by interpolating pixels along the direction of motion.
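The block matching step of this first method can be illustrated as follows. This is a minimal sketch, assuming grayscale frames stored as lists of rows, an exhaustive search and a sum-of-absolute-differences (SAD) cost. The function names, the search radius and the synthetic texture are hypothetical, and the smoothing and dilation stages are not shown.

```python
def block_sad(prev, curr, bx, by, dx, dy, size):
    """SAD between the block of `prev` with top-left corner (bx, by) and the
    block of `curr` displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(prev[by + y][bx + x] - curr[by + dy + y][bx + dx + x])
    return total


def match_block(prev, curr, bx, by, size=4, radius=2):
    """Exhaustive search for the displacement (dx, dy) that minimises the SAD."""
    height, width = len(curr), len(curr[0])
    best_cost, best_vector = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates whose displaced block leaves the frame.
            if not (0 <= bx + dx and bx + dx + size <= width):
                continue
            if not (0 <= by + dy and by + dy + size <= height):
                continue
            cost = block_sad(prev, curr, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_vector = cost, (dx, dy)
    return best_vector


def pattern(x, y):
    """Deterministic synthetic texture, used only for this demonstration."""
    return (3 * x * x + 5 * y * y + 7 * x * y) % 97


# `curr_frame` holds the content of `prev_frame` moved 1 pixel right, 2 down.
prev_frame = [[pattern(x, y) for x in range(12)] for y in range(12)]
curr_frame = [[pattern(x - 1, y - 2) for x in range(12)] for y in range(12)]
vector = match_block(prev_frame, curr_frame, bx=4, by=4)
```

On this synthetic pair the search recovers the displacement (1, 2) for the 4×4 block at position (4, 4).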
In another method, each image is first transformed, block by block, into the frequency domain using a Fourier transform. The phase information from each transform is then compared in a phase correlator. The peaks in the phase correlator correspond to motion occurring in the underlying blocks. Individual pixels within a block are then assigned to specific motions either derived from this phase correlation, or the phase correlation of neighboring blocks. This assignment is typically performed by dilation. Once the fine-scale motion field is known, an intermediate frame can be created by interpolating pixels along the direction of motion.
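The principle of phase correlation can be shown on a one-dimensional signal. The sketch below is a simplification made for brevity: it assumes a circular shift, uses a naive discrete Fourier transform instead of a fast block-wise 2D transform, and all function names are hypothetical.

```python
import cmath


def dft(signal, inverse=False):
    """Naive discrete Fourier transform (O(n^2)), adequate for a short demo."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, value in enumerate(signal):
            acc += value * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def phase_correlation_shift(a, b):
    """Return the circular shift s such that b[t] is close to a[(t - s) % n]."""
    fa, fb = dft(a), dft(b)
    cross = []
    for x, y in zip(fa, fb):
        prod = y * x.conjugate()
        mag = abs(prod)
        # Keep only the phase; drop bins that carry no energy.
        cross.append(prod / mag if mag > 1e-12 else 0j)
    corr = dft(cross, inverse=True)
    n = len(a)
    peak = max(range(n), key=lambda i: corr[i].real)
    # Map peaks in the upper half of the range to negative shifts.
    return peak if peak <= n // 2 else peak - n


row = [0, 1, 4, 2, 7, 3, 0, 6]
moved_right = [row[(t - 3) % len(row)] for t in range(len(row))]
moved_left = [row[(t + 2) % len(row)] for t in range(len(row))]
shift_right = phase_correlation_shift(row, moved_right)
shift_left = phase_correlation_shift(row, moved_left)
```

Here the correlation peaks at a shift of 3 samples for the first pair and at -2 samples for the second, which matches how the test signals were built.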
In a third method, a sparse set of candidate motion vectors is obtained using a heuristic method. Individual pixels in the underlying image are assigned to motion vectors using energies calculated from a series of L-norms. Once the fine-scale motion field is known, an intermediate frame can be created by interpolating pixels along the direction of motion.
The FRC technology can shift an image sequence in time by computing new intermediate images representing samples at intervening moments in time. Image statistics gathered in the process of FRC computation can be used to infer the temporal sampling structures of a sequence of stereoscopic image pairs.
An FRC process applied to a sequence of frames having a frame rate F=1/T converts input frames at times t, t+T, t+2T, t+3T, . . . into output frames at times t+Δt, t+Δt+T′, t+Δt+2T′, t+Δt+3T′, . . . , the output sequence having a frame rate F′=1/T′. In addition to the conversion of the frame rate F→F′ or frame interval T→T′, a time shift Δt≧0 occurs.
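The timing relation above determines, for each output frame, which pair of input frames it falls between and at what fraction of the input frame interval. The following sketch computes that schedule with exact rational arithmetic; the function name and the 24 to 60 frames per second example are assumptions chosen for illustration.

```python
from fractions import Fraction
import math


def frc_schedule(T, T_out, delta_t, count):
    """For each output frame at time delta_t + n*T_out (measured from the
    first input frame), return (k, alpha): the output lies between input
    frames k and k + 1, at fraction alpha of the input frame interval T."""
    schedule = []
    for n in range(count):
        position = (delta_t + n * T_out) / T   # time in units of input frames
        k = math.floor(position)
        schedule.append((k, position - k))
    return schedule


# Conversion from F = 24 to F' = 60 frames per second with no time shift.
T = Fraction(1, 24)
T_out = Fraction(1, 60)
plan = frc_schedule(T, T_out, Fraction(0), 6)

# Same frame rate on both sides, but a time shift of half a frame interval.
half_shift = frc_schedule(T, T, T / 2, 2)
```

In the first case the fractions cycle through 0, 2/5, 4/5, 1/5, 3/5, so two output frames out of five coincide with no input frame boundary only at alpha = 0. In the second case every output frame is interpolated midway between two input frames.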
The present invention can be implemented using any type of frame rate conversion method.
The FRC component looks for directions of regularity in the input frame sequence in order to associate each pixel of an output frame at time t+Δt+kT′ (k≧0) with one or more directions. The direction(s) associated with an output pixel are then used to interpolate the value of that output pixel. Different kinds of directional interpolation can be applied, as well as different methods for detecting the directions of regularity, as exemplified by the three methods mentioned above.
In the case of stereoscopic video signals, there are two frame sequences, one for the left eye having input frames at times tL, tL+T, tL+2T, tL+3T, . . . and one for the right eye having input frames at times tR, tR+T, tR+2T, tR+3T, . . . The frame rate F=1/T is in principle the same for both channels. However, the time sampling references tL, tR may be different, the offset being designated as τ1=tR−tL. If the input frames were acquired synchronously in the left-eye and right-eye channels, then τ1=0. Values τ1<0 result from the right-eye channel being sampled ahead of the left-eye channel, while values τ1>0 correspond to a delay in the sampling of the right-eye channel as compared to the left-eye channel.
When applied in parallel to a stereoscopic video signal, the FRC process yields output frames at times tL+ΔtL, tL+ΔtL+T′, tL+ΔtL+2T′, tL+ΔtL+3T′, . . . in the left-eye channel and output frames at times tR+ΔtR, tR+ΔtR+T′, tR+ΔtR+2T′, tR+ΔtR+3T′, . . . in the right-eye channel. Since the interpolation takes place in parallel in the two channels, we can select different values of the time shifts ΔtL, ΔtR if needed.
In particular, this is helpful when the kind of technology relied on to display the video sequence assumes a time offset τ2 of the right-eye sequence with respect to the left-eye sequence such that τ2≠τ1, e.g. τ2=0 for a simultaneous-type of stereoscopic display or τ2=±T′/2 for a staggered-type of stereoscopic display. In such a case (τ2≠τ1), the FRC parameters are adjusted such that tR+ΔtR=tL+ΔtL+τ2 or, in other words, a relative time shift ΔtR-L=ΔtR−ΔtL=τ2−τ1 is applied to the right-eye frame sequence with respect to the left-eye frame sequence.
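The constraint tR+ΔtR=tL+ΔtL+τ2 fixes only the difference ΔtR−ΔtL, so the shift can be placed on one channel or split between both. The sketch below illustrates three such choices; the function name and the mode labels are hypothetical, and signed shifts are returned without enforcing any causality constraint.

```python
from fractions import Fraction


def channel_shifts(tau1, tau2, mode="balanced"):
    """Return (delta_t_left, delta_t_right) such that
    delta_t_right - delta_t_left == tau2 - tau1."""
    relative = tau2 - tau1
    if mode == "right_only":      # interpolate the right-eye channel only
        return (0 * relative, relative)
    if mode == "left_only":       # interpolate the left-eye channel only
        return (-relative, 0 * relative)
    if mode == "balanced":        # split the shift between both channels
        return (-relative / 2, relative / 2)
    raise ValueError("unknown mode: " + mode)


# Synchronous input (tau1 = 0) converted for a staggered display (tau2 = T'/2).
T_out = Fraction(1, 60)
tau1, tau2 = Fraction(0), T_out / 2
t_left = Fraction(0)
t_right = t_left + tau1
results = {m: channel_shifts(tau1, tau2, m)
           for m in ("right_only", "left_only", "balanced")}
aligned = all(t_right + d_right == t_left + d_left + tau2
              for d_left, d_right in results.values())
```

A common delay can be added to both values without changing the relative shift, for instance to keep each Δt non-negative.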
In the embodiment shown in
The relative time shift value τ2−τ1 is provided by an offset adjustment unit 41. The detected directions of regularity are taken into account by the scene evolution analyzer 40 to identify the sampling instants of the two channels on the output side, and by respective interpolators 42, 43 which compute the pixel values of the left-eye and right-eye output frame sequences S′L, S′R. The interpolation weights used by the interpolators 42, 43 depend on the time shifts ΔtL, ΔtR which are selected to apply the relative time shift τ2−τ1 to the right-eye channel with respect to the left-eye channel.
For applying such time shifts ΔtL, ΔtR, different alternative ways are possible, for example:
The latter option has the advantage of distributing the interpolated frames between the two stereoscopic channels. Otherwise, if the two frame rates F, F′ are multiples of each other (or F=F′), one of the output sequences S′L, S′R contains more non-interpolated frames than the other, and that output sequence may be perceived as having a higher quality than the other sequence. It may be perceptually preferable to balance the slight degradation introduced by the interpolation process between the two stereoscopic channels.
In an embodiment, for determining the relative time shift τ2−τ1 to be applied between the right-eye and the left-eye channels, the offset adjustment unit 41 receives the values τ1, τ2 as settings determined in advance. For example, in a TV set (
In situations where the time offset τ1 in the input stereoscopic video signal is not known, the conversion processor 10, 20, 30 may detect it by analyzing the two frame sequences SL, SR of the input signal. This analysis for detecting τ1 can be part of the analysis carried out by the scene evolution analyzer 40. From a predetermined value of the time offset τ2 on the output side, the offset adjustment unit 41 can then determine the relative time shift τ2−τ1 to be applied in the time conversion processor 10, 20, 30.
Different analysis methods can be used in order to detect the time offset τ1 in the input video signal. A convenient method consists of providing two corresponding frames from the left-eye channel and the right-eye channel, respectively, to the analyzer 40 in order to identify directions of regularity. A motion detection process based on these two frames yields vectors which denote stereoscopic disparities and, if the two frames are misaligned in time, some motion component. If the two frames were generated simultaneously (τ1=0), then there is a very strong probability that only horizontal vectors will be detected since the two frames differ only by the disparities between the left and right channels. If, on the other hand, motion vectors having significant vertical components are detected, the analyzer 40 can decide that τ1≠0, which means, in general, that τ1=T/2. This kind of test between the frames of the left-eye and right-eye channels can be performed once for a given program, or periodically (at a frequency much lower than the frame rate) for more reliability. The test may be repeated if only horizontal vectors are detected since this may also be due to a 3D scene shot with τ1≠0 but including only objects which mostly move horizontally.
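The decision rule of this test can be sketched as follows, taking as input the vectors obtained by matching a left-eye frame against the corresponding right-eye frame. The function name and both thresholds are assumptions made for illustration; the description does not specify how a "significant" vertical component is quantified.

```python
def has_temporal_offset(vectors, vertical_threshold=0.5, min_fraction=0.1):
    """Decide whether tau1 != 0 from left/right matching vectors.

    vectors: list of (dx, dy) displacements in pixels.
    Returns True when enough vectors have a significant vertical component.
    A False result means "tau1 = 0 or undetermined": the scene may contain
    only horizontal motion, so the test may be repeated later.
    Both thresholds are illustrative assumptions.
    """
    if not vectors:
        return False
    vertical = sum(1 for _, dy in vectors if abs(dy) > vertical_threshold)
    return vertical / len(vectors) >= min_fraction


# Pure disparities: horizontal vectors only.
synchronous_pair = [(4.0, 0.0), (6.5, 0.1), (2.0, -0.2), (5.0, 0.0)]
# Disparities plus the motion of an object that moves partly vertically.
staggered_pair = [(4.0, 0.0), (6.5, 2.5), (2.0, -3.0), (5.0, 0.0)]

offset_in_synchronous = has_temporal_offset(synchronous_pair)
offset_in_staggered = has_temporal_offset(staggered_pair)
```

The first set of vectors leads to the decision τ1=0 (or undetermined), while the second leads to τ1≠0.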
It will be appreciated that the embodiments described above are illustrative of the invention disclosed herein and that various modifications can be made without departing from the scope as defined in the appended claims.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2010/068143 | 11/24/2010 | WO | 00 | 9/28/2012 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2011/120601 | 10/6/2011 | WO | A |
Number | Date | Country | |
---|---|---|---|
20130033570 A1 | Feb 2013 | US |
Number | Date | Country | |
---|---|---|---|
61320594 | Apr 2010 | US |