The present invention relates to interpolation methods used in digital video processing such as frame-rate conversion.
In digital systems, a video sequence is typically represented as an array of pixel values It(x) where t is an integer time index, and x is a 2-dimensional integer index (x1, x2) representing the position of a pixel in the image. The pixel values can for example be single numbers (e.g. gray scale values), or triplets representing color coordinates in a color space (such as RGB, YUV, YCbCr, etc.).
Temporal video interpolation is commonly performed using motion detection, or more generally detection of directions of regularity, in a sequence of input video frames. A popular technique for motion detection is referred to as “block matching”. In the case where an object is moving at a given speed in the video sequence, the best directional interpolation is well defined and corresponds to the speed of the object. This works well when the objects are visible in both reference frames at times t and t+1. In the case of an occlusion or a disocclusion, some pixels belonging to an object are visible in one of the two reference frames only. There is therefore no unambiguous direction of regularity. If, however, the algorithm applies a directional interpolation in this region, artifacts such as ghosting or halo are inevitable.
A one-sided directional interpolation can be used to reduce these artifacts. It is still necessary, however, to determine the proper motion, given that only one of the two reference frames provides the right pixel information, and to determine which reference frame will provide the pixel values to be copied.
In U.S. Pat. Nos. 4,890,160 and 7,010,039, it has been proposed to detect beforehand obscured areas (occlusions) and uncovered areas (disocclusions). For interpolation, picture information in uncovered areas is taken from the following picture in the input sequence, while information in obscured areas comes from the preceding picture. However, the preliminary detection of occlusion/disocclusion areas requires information from a number of consecutive input frames. This has a negative impact in terms of line buffer size requirement in the video processing device. Also, the occlusion/disocclusion detection can have a significant computational cost.
There is thus a need for a method in which the interpolation parameters can be determined with limited complexity.
A video interpolation method as proposed herein comprises:
Remarkably, occlusions and disocclusions are dealt with using pixels only from two consecutive frames of the input video sequence. They can be decided as and when the output pixels are processed, by comparing directional energies and/or directions of regularity determined for the current output frame and for the two consecutive input frames located before and after the current output frame. The interpolation is made very efficiently with limited internal buffer requirements. This has a strong impact in terms of component cost reduction.
In an embodiment of the method where the comparison of results of the first, second and third optimizations bears on the minimized directional energies, the determination of a pixel value for an output pixel comprises:
An occlusion typically occurs at an output pixel where the minimal directional energy for the “second pixel”, projected onto the second frame, is much lower than the minimal directional energies determined for both the output pixel and the “first pixel”, projected onto the first frame. In this case, the interpolation uses pixels of the first frame which includes the object being occluded. The interpolation will be based on a direction of regularity belonging to the object in the second image, i.e. a direction corresponding to a speed of the object being occluded. Conveniently, this can be the direction of regularity associated with the second pixel. For example, the pixel value for the output pixel is taken as a value of a pixel of the first frame aligned with the output pixel along the direction of regularity associated with the second pixel.
Likewise, a disocclusion typically occurs at an output pixel where the minimal directional energy for the “first pixel”, projected onto the first frame, is much lower than the minimal directional energies determined for both the output pixel and the “second pixel”, projected onto the second frame. In this case, the interpolation uses pixels of the second frame which includes the object being uncovered. The interpolation will be based on a direction of regularity belonging to the object in the first image, such as the direction of regularity associated with the first pixel. For example, the pixel value for the output pixel is taken as a value of a pixel of the second frame aligned with the output pixel along the direction of regularity associated with the first pixel.
In this embodiment, a number X may be said to fulfill one of the “comparison criteria” with respect to another number Y when X is markedly larger than Y. The extent meant by “markedly” here may depend on the application, the ranges for the pixel values, etc. Examples will be given further below. In addition, “markedly” can encompass different relative magnitude levels in the first and second comparison criteria. However, it is convenient to use first and second comparison criteria which are the same.
If the minimized directional energies for the first and second pixels both fulfill the first comparison criterion with respect to the minimized directional energy for the output pixel, the determination of a pixel value for the output pixel can use at least one pixel value from the first frame of the input video sequence if the minimized directional energy for the second pixel is smaller than the minimized directional energy for the first pixel, and at least one pixel value from the second frame of the input video sequence if the minimized directional energy for the first pixel is smaller than the minimized directional energy for the second pixel. In particular, it is possible to take for the output pixel a value of a pixel of one of the first and second frames aligned with the output pixel along the direction of regularity associated with the output pixel.
The output pixels of the frame of the output video sequence are typically processed in a raster order using a first buffer containing values of input pixels from the first and second frames of the input video sequence and a second buffer for containing the directions of regularity and the minimized directional energies respectively associated with said input pixels from the first and second frames. The comparison of results of the first, second and third optimizations for an output pixel then comprises reading from the second buffer the directions of regularity and/or the minimized directional energies respectively associated with the first and second pixels. The determination of a pixel value for the output pixel comprises reading at least one pixel value from the first buffer.
Prior to the comparison of results of the first, second and third optimizations for an output pixel, new values of the directions of regularity and of the minimized directional energies respectively associated with two input pixels can be computed and stored in the second buffer, the two input pixels having respective pixel positions in the first and second frames depending on a position of the output pixel being processed and on maximum vertical and horizontal magnitudes for the directions of regularity.
According to another feature, the first and second optimizations can be performed with a lower accuracy than the third optimization. An option for this is to use, in the first and second optimizations, pixel values from the first and second frames of the input video sequence represented more coarsely than pixel values from the first and second frames used in the third optimization.
Another aspect of the invention relates to a video interpolation device, comprising:
The video processor 6 performs interpolation between frames of the input video sequence using the block matching technique. The method proposed here makes it possible to determine an interpolated output frame at a time t+Δ (0<Δ<1) based on consecutive input frames at times t and t+1. In the application to frame rate conversion, the processor 6 operates at the rate of the output sequence. If an output frame is aligned in time with one of the input frames, it is simply determined as a copy of that input frame. An output frame located in time at t+Δ (0<Δ<1) is determined using the interpolation between two consecutive input frames at t and t+1.
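As an illustration of the timing described above, the following minimal sketch lists, for each output frame, the index t of the preceding input frame and the offset Δ. It assumes constant input and output frame rates and a common starting instant, neither of which is stated in the text. The function name `output_frame_schedule` and its parameters are hypothetical.

```python
from fractions import Fraction


def output_frame_schedule(rate_in, rate_out, n_out):
    """Return a list of (t, delta) pairs, one per output frame.

    t is the index of the input frame at or just before the output frame,
    and delta (0 <= delta < 1) is the offset in input-frame periods.
    Exact fractions are used so that aligned frames give delta == 0 exactly.
    """
    schedule = []
    for k in range(n_out):
        # Time of output frame k, measured in input-frame periods.
        position = Fraction(k * rate_in, rate_out)
        t = position.numerator // position.denominator
        delta = position - t
        schedule.append((t, delta))
    return schedule


def is_copy(delta):
    """An output frame aligned with an input frame is a plain copy."""
    return delta == 0
```

For a hypothetical 24 Hz to 60 Hz conversion, `output_frame_schedule(24, 60, 6)` gives offsets 0, 2/5, 4/5, 1/5, 3/5 and 0 again: the first and sixth output frames are copies, and the others are interpolated between input frames t and t+1.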
In the block matching technique as used here, estimating the motion at a pixel x=(x1, x2) and at time t+α (0≦α≦1) consists in identifying the motion vector or direction of regularity v=(v1, v2) which minimizes a matching energy Ex,t+α(v) over a spatial window W which is a set of offsets d=(d1, d2). A possible form of the matching energy (L1-energy) is
Another possible form is the L2-energy or Euclidean distance:
In the optimization process, the directions of regularity v=(v1, v2) can be selected from a limited set Ω of candidate directions in order to reduce the computation load. A suitable way of determining this set of candidate directions Ω is described in WO 2009/087493. The method described in co-pending international patent application No. PCT/EP2010/050734 can also be used.
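The search over candidate directions can be sketched as follows. This is only an illustration under stated assumptions: the L1 energy is taken to be the sum, over the offsets d in W, of |It(x+d−α·v) − It+1(x+d+(1−α)·v)|, a form chosen to agree with the one-sided copy rules given further below. Frames are assumed to be 2D lists indexed `frame[x2][x1]`, non-integer positions are rounded to the nearest pixel, and positions outside the frame are clamped. All function names are hypothetical.

```python
import math


def sample(frame, p1, p2):
    """Nearest-pixel sample with border clamping (simplifying assumptions)."""
    height, width = len(frame), len(frame[0])
    col = min(max(int(math.floor(p1 + 0.5)), 0), width - 1)
    row = min(max(int(math.floor(p2 + 0.5)), 0), height - 1)
    return frame[row][col]


def l1_energy(frame_t, frame_t1, x, alpha, v, window):
    """Assumed L1 matching energy at pixel x and time t+alpha for direction v."""
    total = 0
    for d1, d2 in window:
        a = sample(frame_t, x[0] + d1 - alpha * v[0], x[1] + d2 - alpha * v[1])
        b = sample(frame_t1,
                   x[0] + d1 + (1 - alpha) * v[0],
                   x[1] + d2 + (1 - alpha) * v[1])
        total += abs(a - b)
    return total


def best_direction(frame_t, frame_t1, x, alpha, candidates, window):
    """Return (minimal energy, direction) over the candidate set."""
    scored = [(l1_energy(frame_t, frame_t1, x, alpha, v, window), v)
              for v in candidates]
    return min(scored, key=lambda pair: pair[0])
```

With alpha set to 0 or 1, the same routine yields the energies and directions associated with pixels of the first or second input frame, and with alpha set to Δ those of the output frame.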
In
For certain pixels of the intermediate frame to be interpolated at time t+Δ, such as pixels x and y shown in
The interpolation at pixels x and y at time t+Δ can then be performed using pixels from both the first and second frames, for example linearly:
$$I_{t+\Delta}(x) = (1-\Delta)\cdot I_t(x') + \Delta\cdot I_{t+1}(x''), \qquad I_{t+\Delta}(y) = (1-\Delta)\cdot I_t(y') + \Delta\cdot I_{t+1}(y'').$$
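The two-sided linear interpolation can be sketched as below. The projection rule is an assumption: the first and second pixels are taken to be x − Δ·v and x + (1−Δ)·v, by analogy with the offsets used in the one-sided copy rules later in the description. The names `project` and `blend` are hypothetical, and pixel values may be single numbers or color triplets.

```python
def project(x, v, delta):
    """Project output pixel x along direction v onto the two input frames.

    Returns (position in first frame, position in second frame), assuming
    the offsets -delta*v and +(1-delta)*v.
    """
    first = (x[0] - delta * v[0], x[1] - delta * v[1])
    second = (x[0] + (1 - delta) * v[0], x[1] + (1 - delta) * v[1])
    return first, second


def blend(value_first, value_second, delta):
    """Linear blend (1-delta)*first + delta*second, per channel if needed."""
    if isinstance(value_first, (int, float)):
        return (1 - delta) * value_first + delta * value_second
    return tuple((1 - delta) * a + delta * b
                 for a, b in zip(value_first, value_second))
```

For example, with Δ = 0.25, gray values 100 in the first frame and 200 in the second frame blend to 125.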
However, some pixels z of the intermediate frame at time t+Δ belong to the occluded portion B′ and are thus visible in the first frame at time t but not in the second frame at time t+1. The direction of regularity vz is determined for such a pixel z as minimizing the directional energy Ez,t+Δ(v), namely
In the occlusion case, the direction vz is not reliable. It points to a first pixel z′ in the first frame at time t and to a second pixel z″ in the second frame at time t+1.
In the present interpolation method, the minimal directional energies
at pixels z′ and z″ of the first and second frames, and the associated directions of regularity
are available when processing the output pixel z. In that processing, the relative magnitudes of Ez, E′z′ and E″z″ are compared. The occlusion case typically occurs where E″z″ is markedly lower than both Ez and E′z′. Indeed, pixel z′ generally belongs to portion B′ when E″z″<<Ez and E″z″<<E′z′. Like pixel z, it does not have an unambiguous counterpart in the second frame at time t+1, while pixel z″ has a much higher probability of belonging to a part of the background object which appears in both the first and second frames. This criterion can be used to decide on the fly that pixel z pertains to an occluded area. When that decision is made (occlusion), one-sided interpolation is performed for example as follows:
A simple possibility is to copy at (z, t+Δ) the pixel value at (z̃, t) with z̃=z−Δ·v″z″ as illustrated in
Symmetrically,
A simple possibility is to copy at (z, t+Δ) the pixel value at (z̃, t+1) with z̃=z+(1−Δ)·v′z′ as illustrated in
It can happen that the comparison done when processing the output pixel z shows that both E′z′ and E″z″ are markedly lower than Ez. There are then three possibilities:
The latter situation is much less frequent than a regular occlusion or disocclusion case. If it occurs, the comparison between E′z′ and E″z″ is used to decide which of the first and second frames will provide the pixel(s) for the one-sided interpolation of frame t+Δ. Namely, if E″z″<E′z′, one or more pixels are copied from the first frame at t, while if E′z′<E″z″, one or more pixels are copied from the second frame at t+1. This one-sided interpolation can be performed based on the direction of regularity vz determined for pixel z at time t+Δ, i.e. It+Δ(z)=It(z′) if E″z″<E′z′ and It+Δ(z)=It+1(z″) if E′z′<E″z″.
In the embodiment of
in the figure.
If neither E′z′ nor E″z″ fulfills that comparison criterion with respect to Ez, linear interpolation is performed in step 50 to calculate pixel z as mentioned above:
$$I_{t+\Delta}(z) = (1-\Delta)\cdot I_t(z') + \Delta\cdot I_{t+1}(z'').$$
Otherwise, another comparison is performed between the minimal directional energies E′z′, E″z″ for the first and second pixels in step 45, 46 or 47.
test 45 is applied to check whether
where
denotes another comparison criterion which may be the same as or different from the first criterion
If test 45 is positive, it is decided that there is an occlusion at pixel z in frame t+Δ (
$$I_{t+\Delta}(z) = I_t\left(z - \Delta\cdot v''_{z''}\right).$$
If
test 46 is applied to check whether
If test 46 is positive, it is decided that there is a disocclusion at pixel z in frame t+Δ (
$$I_{t+\Delta}(z) = I_{t+1}\left(z + (1-\Delta)\cdot v'_{z'}\right).$$
If tests 43-44 show that
or if one of tests 45 and 46 is negative, the occlusion/disocclusion situation is complex as mentioned above. Test 47 is then applied to determine the larger energy among E′z′ and E″z″. If E′z′<E″z″, one-sided interpolation from the second frame is applied in step 53: It+Δ(z)=It+1(z″). If E″z″<E′z′, one-sided interpolation from the first frame is applied in step 54:
$$I_{t+\Delta}(z) = I_t(z').$$
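The decision procedure of steps 41 to 54 can be summarized by the following sketch. Several branch conditions are missing from the surviving text, so the branching here is inferred from the occlusion and disocclusion descriptions given earlier and should be read as an assumption. The comparison criteria are passed in as predicates `lower_a(x, y)` and `lower_b(x, y)`, meaning "x is markedly lower than y"; their exact form is left open. The projections z′ = z − Δ·vz and z″ = z + (1−Δ)·vz are also assumed, and all names are hypothetical.

```python
def choose_interpolation(e_z, e_first, e_second, lower_a, lower_b=None):
    """Pick an interpolation mode from the three minimal directional energies."""
    if lower_b is None:
        lower_b = lower_a  # same criterion for both comparisons
    first_low = lower_a(e_first, e_z)
    second_low = lower_a(e_second, e_z)
    if not first_low and not second_low:
        return "linear"                      # step 50
    if second_low and not first_low and lower_b(e_second, e_first):
        return "occlusion"                   # test 45 positive
    if first_low and not second_low and lower_b(e_first, e_second):
        return "disocclusion"                # test 46 positive
    # Complex case (test 47): copy from the frame opposite the lower energy.
    # The tie e_first == e_second is not specified; the first frame is used.
    return "second_frame" if e_first < e_second else "first_frame"


def source_pixels(mode, z, delta, v_z, v_first, v_second):
    """Return (frame, position, weight) terms whose weighted sum gives the pixel."""
    def shift(v, s):
        return (z[0] + s * v[0], z[1] + s * v[1])

    z_first, z_second = shift(v_z, -delta), shift(v_z, 1 - delta)
    if mode == "linear":
        return [("first", z_first, 1 - delta), ("second", z_second, delta)]
    if mode == "occlusion":
        return [("first", shift(v_second, -delta), 1)]
    if mode == "disocclusion":
        return [("second", shift(v_first, 1 - delta), 1)]
    if mode == "first_frame":
        return [("first", z_first, 1)]
    if mode == "second_frame":
        return [("second", z_second, 1)]
    raise ValueError("unknown mode: %r" % (mode,))
```

As a purely illustrative choice of predicate, `lambda x, y: 3 * x + 10 < y` treats x as markedly lower than y. With it, the energies (Ez, E′z′, E″z″) = (100, 80, 5) select the occlusion branch, and (100, 5, 8) fall into the complex case resolved by test 47.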
Possible expressions for the comparison criteria are
if and only if X<pa·Y+qa, and
if and only if X<pb·Y+qb where pa, pb are positive numbers and qa, qb are positive or negative numbers. In particular, for simplicity, the comparison criteria
can be identical; in the above example, this gives pa=pb=p (e.g. p=3) and qa=qb=q whose magnitude is chosen based on the dynamic ranges for the pixel values.
In an embodiment, the second comparison criterion
is a simple inequality, i.e. pb=1 and qb=0. Tests 45, 46 and 47 are then the same and the procedure of
in steps 43-44, test 47 is performed to determine the larger energy among E′z′ and E″z″. If E″z″<E′z′, the occlusion-type of one-sided interpolation is applied in step 51. Otherwise, the disocclusion-type of one-sided interpolation is applied in step 52. The remainder of the procedure is the same as in
Alternatively, the comparisons made to decide whether linear interpolation is suitable, or whether one-sided interpolation should be used instead, can involve the directions of regularity vz, v′z′ and v″z″ determined for the pixels z, z′ and z″. For example, a first test can be made to check whether vz=v′z′=v″z″ (or vz≈v′z′≈v″z″). If vz=v′z′=v″z″, linear interpolation 50 can be selected. If not, the minimized directional energies Ez, E′z′, E″z″ are further compared to decide, along the lines indicated above, which of the input frames is better suited for one-sided interpolation.
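A direction-based first test of this kind might look like the sketch below. The tolerance `tol`, which stands for the approximate equality mentioned above, and the function name are hypothetical. The text does not say how closeness between directions is measured, so a per-component absolute difference is assumed here.

```python
def directions_agree(v_z, v_first, v_second, tol=0):
    """True when the three directions of regularity coincide within tol.

    tol == 0 requires exact equality; a positive tol allows each component
    to differ by at most tol between any two of the three directions.
    """
    def close(a, b):
        return abs(a[0] - b[0]) <= tol and abs(a[1] - b[1]) <= tol

    return (close(v_z, v_first)
            and close(v_z, v_second)
            and close(v_first, v_second))
```

When the function returns True, linear interpolation can be selected directly. Otherwise the energy comparison decides which input frame is used for one-sided interpolation.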
The pixels of the output video sequence at time t+Δ are typically processed in the raster order.
The pixel values of regions R′ (time t) and R″ (time t+1) must be made available to the interpolator module 63 in buffers 15′ and 15″ in order to perform interpolation based on the direction of regularity v′z′, v″z″ or vz which will be selected.
In
Since the pixels are processed in the raster order, the contents of the buffers can be arranged as horizontal stripes of lines extending over the width of the frames. This is indicated by boxes T′, U′ in
Before processing the output pixel z, the bottom right pixels w′, w″ of regions S′, S″ of the first and second frames are loaded in line buffer parts 15′, 15″ to be included in the pixel regions U′, U″ stored in those buffer parts 15′, 15″. The pixels of regions S′, S″ are then available to the optimization modules 60″, 60′ for calculating the new directions of regularity at times t+1 and t, respectively. After this calculation, these directions of regularity v′z′
Various modifications can be made to the examples which have been described above.
For example, in the embodiment shown in
The decompressor 24 can operate at different resolution scales for representing the pixel values. In the example of
While a detailed description of exemplary embodiments of the invention has been given above, various alternatives, modifications, and equivalents will be apparent to those skilled in the art. Therefore the above description should not be taken as limiting the scope of the invention, which is defined by the appended claims.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2010/050744 | 1/22/2010 | WO | 00 | 8/10/2011 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2010/091937 | 8/19/2010 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
4727422 | Hinman | Feb 1988 | A |
4890160 | Thomas | Dec 1989 | A |
5151784 | Lavagetto et al. | Sep 1992 | A |
5661525 | Kovacevic et al. | Aug 1997 | A |
5784114 | Borer et al. | Jul 1998 | A |
6625333 | Wang et al. | Sep 2003 | B1 |
6744931 | Komiya et al. | Jun 2004 | B2 |
7010039 | De Haan et al. | Mar 2006 | B2 |
7343044 | Baba et al. | Mar 2008 | B2 |
7899122 | Ohwaki et al. | Mar 2011 | B2 |
7983339 | Francois et al. | Jul 2011 | B2 |
8224126 | Sartor et al. | Jul 2012 | B2 |
20020171759 | Handjojo et al. | Nov 2002 | A1 |
20030142748 | Tourapis et al. | Jul 2003 | A1 |
20040091046 | Akimoto et al. | May 2004 | A1 |
20050157792 | Baba et al. | Jul 2005 | A1 |
20050212974 | Michel et al. | Sep 2005 | A1 |
20060193535 | Mishima et al. | Aug 2006 | A1 |
20060222077 | Ohwaki et al. | Oct 2006 | A1 |
20080231745 | Ogino et al. | Sep 2008 | A1 |
20090245694 | Sartor et al. | Oct 2009 | A1 |
20110096227 | Bruna Estrach | Apr 2011 | A1 |
20110310304 | Bruna Estrach | Dec 2011 | A1 |
20120243611 | Kondo | Sep 2012 | A1 |
Number | Date | Country |
---|---|---|
1734767 | Dec 2006 | EP |
1855474 | Nov 2007 | EP |
9922520 | May 1999 | WO |
2004039074 | May 2004 | WO |
2005004479 | Jan 2005 | WO |
2005022922 | Mar 2005 | WO |
2005027525 | Mar 2005 | WO |
2009053780 | Apr 2009 | WO |
2009087493 | Jul 2009 | WO |
2010091930 | Aug 2010 | WO |
2010091934 | Aug 2010 | WO |
Entry |
---|
International Search Report dated Apr. 27, 2010 from corresponding Application No. PCT/EP2010/050744. |
Silveira et al., “Variable Block Sized Motion Segmentation for Video Coding,” 1997 IEEE International Symposium on Circuits and System, Jun. 9-12, 1997, Hong Kong, pp. 1293-1296. |
Mertens et al., “Motion vector field improvement for picture rate conversion with reduced halo,” Proc. SPIE, 2001, vol. 4310, pp. 352-362. |
Braspenning et al., “Efficient Motion Estimation with Content-Adaptive Resolution,” Sep. 23, 2002, pp. E29-E34. |
Kim et al., “Motion-adaptive alternate gamma drive for flicker-free motion-blur reduction in 100/120-Hz LCD TV,” Journal of the SID, 2009, vol. 17/3, pp. 203-212. |
International Preliminary Report on Patentability and Written Opinion dated Aug. 16, 2011 for Application No. PCT/EP2010/050744. |
Number | Date | Country | |
---|---|---|---|
20110311163 A1 | Dec 2011 | US |
Number | Date | Country | |
---|---|---|---|
61152121 | Feb 2009 | US |