This application claims the benefit, under 35 U.S.C. §119 of European Patent Application No. 14306853.4, filed Nov. 21, 2014.
The invention relates to a method and to an apparatus for tracking the motion of image content such as a point of interest or an object or a group of pixels in a video frames sequence using sub-pixel resolution motion estimation.
Companies are interested in allowing insertion or replacement of advertisements within existing image/video/film sequences. A human operator can identify areas of interest that allow carrying advertisements, such as plain walls of buildings. Such areas need to be tracked along the image sequence by means of motion estimation and tracking processing. As an alternative, areas of interest can be identified automatically or semi-automatically, provided the necessary technology is available. One approach would be to place, in a single image frame F, markers (e.g. small squares or rectangles or even points) at positions in and/or around areas of interest that seem appropriate for motion estimation and tracking along the video sequence. The corresponding image content or rather (center) pixel, found by motion estimation, feature point tracking or other methods in the next frame F+1, is used as a starting point for estimating the motion from frame F+1 to frame F+2, etc. In case of integer pixel motion estimation, a spatial deviation from the precise motion in the scene may result, and this deviation may accumulate over time and result in misplaced points. For good-quality point tracking, sub-pel resolution is required in the motion estimation. Typically, integer-pel motion estimation (by hierarchical motion estimation or other methods) is carried out first, followed by sub-pel refinement stages.
However, without further measures, spatial deviations resulting from sub-pel resolution motion estimation may also accumulate over time, and can lead to even worse processing results.
A problem to be solved is to provide improved-accuracy tracking of objects or points of interest in video sequences over time, allowing improved-accuracy image content.
The described processing can be used for motion estimation and tracking of image content, such as a point or points of interest or an object or a group of pixels, over a long sequence of frames with sub-pel resolution, with no or only marginal accumulated spatial deviations over time. Initially, motion estimation for specific image content (a term which in the following can mean a point or points of interest, an object or a group of pixels) is performed with integer-pel resolution, followed by sub-pel refinement. This advantageously reduces computational complexity. Hierarchical motion estimation or any other suitable motion estimation can be used. The spatial offset introduced when rounding sub-pel motion vector x-y coordinates found for a current frame to the integer-pel grid, prior to the motion estimation for the following frame, is compensated. This processing allows tracking of specific image content or points of interest, as determined in a specific frame out of a long sequence of frames, over that long sequence of frames with improved reliability and accuracy.
The processing is performed from the reference frame in forward and/or in backward directions, e.g. until a scene change occurs or is detected.
In principle, the described method is adapted for tracking the motion of an image content, such as a point of interest or an object or a group of pixels, in a video frames sequence using sub-pixel resolution motion estimation, comprising:
a) estimating the motion of an image content between a reference frame and a following or preceding frame, starting from an integer pixel position in said reference frame;
b) if the end point of the corresponding motion vector has a position between integer pixel positions in said following or preceding frame, replacing the coordinates of said motion vector end point by the coordinates of an adjacent integer pixel position in said following or preceding frame, and storing the error value between said end point coordinates and said replacement coordinates;
c) defining said following or preceding frame as a current frame, and estimating the motion of said point of interest or of said object between said replacement coordinates in said current frame and a following or preceding frame;
d) correcting the following or preceding frame end point coordinates of the corresponding motion vector by using said stored error value in opposite direction, so as to get a corresponding precise motion vector;
e) if the end point coordinates of said precise motion vector have a position between integer pixel positions in said following or preceding frame, replacing the coordinates of said precise motion vector end point by the coordinates of an adjacent integer pixel position in said following or preceding frame, and storing the error value between said end point coordinates and said replacement coordinates;
f) continuing with step c) for further frames of said video frames sequence.
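Steps a) to f) above can be illustrated by the following minimal sketch. It is not the claimed implementation: the names `track`, `estimate` and `round_half_up` are hypothetical, the motion estimator is assumed to be supplied by the caller as a function returning a sub-pel motion vector for an integer start position, and rounding to the nearest integer position (ties upward) is only one possible choice of "adjacent integer pixel position".

```python
import math
from typing import Callable, List, Tuple

Vec = Tuple[float, float]
# Hypothetical estimator: (frame index, integer start position) -> sub-pel motion vector.
Estimator = Callable[[int, Tuple[int, int]], Vec]


def round_half_up(value: float) -> int:
    """Round to the nearest integer, ties upward (an assumed rounding rule)."""
    return math.floor(value + 0.5)


def track(start: Tuple[int, int], num_frames: int, estimate: Estimator,
          compensate: bool = True) -> List[Vec]:
    """Return the tracked sub-pel position for each of num_frames frames."""
    positions: List[Vec] = [(float(start[0]), float(start[1]))]
    ix, iy = start          # integer start position in the current frame
    ex, ey = 0.0, 0.0       # stored rounding error (end point minus replacement)
    for frame_index in range(num_frames - 1):
        # Steps a) and c): estimate motion starting from an integer position.
        mvx, mvy = estimate(frame_index, (ix, iy))
        # Step d): add the stored error back to obtain the precise end point.
        px = ix + mvx + (ex if compensate else 0.0)
        py = iy + mvy + (ey if compensate else 0.0)
        positions.append((px, py))
        # Steps b) and e): replace the end point by an integer position
        # and store the error between end point and replacement.
        ix, iy = round_half_up(px), round_half_up(py)
        ex, ey = px - ix, py - iy
    return positions
```

With a constant estimated motion of (0.25, -0.5) pel per frame and a start at (10, 20), five frames end at (11.0, 18.0) when the stored error is added back. With `compensate=False` the tracked point remains at (10.25, 19.5) in every following frame, because the sub-pel component is discarded before each new estimation.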
In principle the described apparatus is adapted for tracking the motion of an image content, such as a point of interest or an object or a group of pixels, in a video frames sequence using sub-pixel resolution motion estimation, said apparatus comprising means adapted to:
a) estimating the motion of an image content between a reference frame and a following or preceding frame, starting from an integer pixel position in said reference frame;
b) if the end point of the corresponding motion vector has a position between integer pixel positions in said following or preceding frame, replacing the coordinates of said motion vector end point by the coordinates of an adjacent integer pixel position in said following or preceding frame, and storing the error value between said end point coordinates and said replacement coordinates;
c) defining said following or preceding frame as a current frame, and estimating the motion of said point of interest or of said object between said replacement coordinates in said current frame and a following or preceding frame;
d) correcting the following or preceding frame end point coordinates of the corresponding motion vector by using said stored error value in opposite direction, so as to get a corresponding precise motion vector;
e) if the end point coordinates of said precise motion vector have a position between integer pixel positions in said following or preceding frame, replacing the coordinates of said precise motion vector end point by the coordinates of an adjacent integer pixel position in said following or preceding frame, and storing the error value between said end point coordinates and said replacement coordinates;
f) continuing with process c) for further frames of said video frames sequence.
Exemplary embodiments of the processing are described with reference to the accompanying drawings, which show in:
(a) always rounding,
(b) rounding/cutting off on a frame basis,
(c) rounding/cutting off on an occurrence basis;
(a) rounding with storage of sub-pel offsets and adding to final estimate in next frame,
(b) rounding with storage of sub-pel offsets and adding to final estimate in next frame, with rounding of ½-pel estimates with 4 neighbors to the most similar neighbor,
(c) like (b) with 4 or 2 neighbors;
Even if not explicitly described, the following embodiments may be employed in any combination or sub-combination.
Besides estimating a motion vector for every picture element in a frame, a hierarchical motion estimator with e.g. block matching can be adapted for tracking specified image content or points/pixels of interest. Other types of motion estimation like gradient-based, phase correlation, ‘optical flow’ processing can also be used.
A number of such pixels of interest are defined for a selected frame F in a stored frame sequence. In
Because the precise motion of an object within the successive frames may correspond to sub-pixel x-y positions, the estimated motion vector end point may lie slightly beside the correct motion vector end point.
If sub-pel motion estimation is applied, the sub-pel x-y coordinates found in frame F−1 are rounded to the nearest integer-pel position, and the hierarchical search with respect to frame F−2 is started from that nearest integer-pel position.
Because a measurement window is used in the motion estimation, however small it may be in the final levels of the hierarchy, a match is actually found for a small part of the image—i.e. a small neighborhood of the point of interest or object or a piece of image content—rather than for a single picture element, such that a slightly shifted motion vector will still be valid for that object.
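The role of the measurement window can be illustrated with a small integer-pel block matching sketch. It assumes a plain full search with a sum of absolute differences (SAD) cost rather than the hierarchical estimator described here; the function names, the window radius and the search range are hypothetical and chosen for illustration only.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # frame[y][x], single-channel signal values


def window_sad(cur: Frame, nxt: Frame, cx: int, cy: int,
               nx: int, ny: int, radius: int) -> int:
    """Sum of absolute differences between two (2*radius+1)^2 windows."""
    return sum(
        abs(cur[cy + j][cx + i] - nxt[ny + j][nx + i])
        for j in range(-radius, radius + 1)
        for i in range(-radius, radius + 1)
    )


def match_integer_pel(cur: Frame, nxt: Frame, pos: Tuple[int, int],
                      radius: int = 1, search: int = 3) -> Tuple[int, int]:
    """Return the integer motion vector (vx, vy) with the lowest window SAD.

    Assumes the window around pos lies completely inside cur.
    """
    cx, cy = pos
    height, width = len(nxt), len(nxt[0])
    best_cost: Optional[int] = None
    best_vec = (0, 0)
    for vy in range(-search, search + 1):
        for vx in range(-search, search + 1):
            nx, ny = cx + vx, cy + vy
            # Skip candidates whose window would leave the next frame.
            if not (radius <= nx < width - radius and radius <= ny < height - radius):
                continue
            cost = window_sad(cur, nxt, cx, cy, nx, ny, radius)
            if best_cost is None or cost < best_cost:
                best_cost, best_vec = cost, (vx, vy)
    return best_vec
```

Because the cost is accumulated over the whole window, the match describes the neighborhood of the point rather than the single pixel, which is why a start position shifted by a fraction of a pixel can still yield a valid vector for the same object.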
By rounding/cutting off the sub-pel component before motion estimation for the next frame, some of the accuracy obtained by sub-pel motion estimation and tracking is lost. The result may be even worse—if no further measures are taken—than with integer-pel motion estimation where no position rounding errors arise.
For improving image content or point tracking processing in case of motion estimation with sub-pel x-y resolution, different methods are considered in the following, comprising dealing with ½-pel x-y coordinates, which methods make use of the sub-pel x-y components removed (i.e. ‘rounding vectors’ or offsets or errors) by storing them in connection with the rounding/cutting-off procedure and adding them again later:
In a further embodiment, when considering that a motion vector end point should still lie within the intended object after rounding/cutting off, rounding off in case a ½-pel pixel coordinate p1 in x and y directions was found is carried out towards the image signal (e.g. luminance or U or V or R or G or B value) or amplitude value of that neighboring pixel out of the four neighboring pixels a, b, c, d, which has an image signal or amplitude value most similar to the interpolated image signal or amplitude value of the ½-pel coordinate interpolated pixel (e.g. p1 to a, for instance smallest Euclidean distance or MSE or other similarity measure in RGB, YUV, Y or other color space), as shown in
In case of ½-pel resolution coordinate in x or y directions of the motion vector end point, the rounding or cutting off is carried out towards the image signal or amplitude value of that neighboring pixel out of the two horizontally or vertically neighboring pixels a, b, c, d which has an image signal or amplitude value most similar to an interpolated image signal or amplitude value of the ½-pel coordinate interpolated pixel p2 or p3, respectively.
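The neighbor selection described in the two preceding paragraphs might be sketched as follows for a single-channel signal with the absolute difference as similarity measure. The function name `round_to_most_similar` is hypothetical. The interpolated value at the ½-pel position is passed in by the caller, because the interpolation filter is not specified here; note that with a simple two-tap average both neighbors of a two-neighbor position would be equally similar, so the sketch deliberately does not compute the interpolation itself.

```python
import math
from typing import List, Tuple

Frame = List[List[float]]  # frame[y][x], single-channel signal values


def round_to_most_similar(frame: Frame, x: float, y: float,
                          interp_value: float
                          ) -> Tuple[Tuple[int, int], Tuple[float, float]]:
    """Round a 1/2-pel position to the most similar integer neighbor.

    Returns the replacement integer position and the rounding offset
    (original position minus replacement) to be stored for later use.
    """
    # One candidate per axis on the integer grid, two candidates otherwise.
    xs = [int(x)] if float(x).is_integer() else [math.floor(x), math.ceil(x)]
    ys = [int(y)] if float(y).is_integer() else [math.floor(y), math.ceil(y)]
    candidates = [(cx, cy) for cy in ys for cx in xs]  # 1, 2 or 4 neighbors
    best = min(candidates, key=lambda c: abs(frame[c[1]][c[0]] - interp_value))
    return best, (x - best[0], y - best[1])
```

For the 2x2 frame `[[10, 200], [20, 220]]` and the position (0.5, 0.5) with an interpolated value of 112.5, the neighbor with value 200 at (1, 0) is the most similar one, so the function returns that position together with the offset (-0.5, 0.5).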
A further way is rounding towards the image signal or value of the corresponding pixel specified originally in frame F in
A still further way will therefore be rounding towards the stored updated value of the pixel, wherein that updated value is stored instead of the initial one. It is updated after motion estimation of each frame, using the sub-pel interpolated signal.
Finally, the image content or point-of-interest PoI coordinates found in the frames of the sequence can be gathered in an array and written to a file.
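As an illustration of gathering the coordinates, the following sketch collects the tracked positions per point of interest and writes them as comma-separated values. The file layout (columns `poi`, `frame_offset`, `x`, `y`), the three-decimal formatting and the function names are assumptions made for this example; no particular file format is prescribed above.

```python
import csv
from typing import Dict, List, Tuple

# Hypothetical container: PoI index -> list of (x, y) positions, one per frame.
Tracks = Dict[int, List[Tuple[float, float]]]


def track_rows(tracks: Tracks) -> List[list]:
    """Flatten the per-PoI position lists into rows, ordered by PoI index."""
    rows = []
    for poi, positions in sorted(tracks.items()):
        for frame_offset, (x, y) in enumerate(positions):
            rows.append([poi, frame_offset, f"{x:.3f}", f"{y:.3f}"])
    return rows


def write_tracks(path: str, tracks: Tracks) -> int:
    """Write a header and all rows to a CSV file; return the number of rows."""
    rows = track_rows(tracks)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["poi", "frame_offset", "x", "y"])
        writer.writerows(rows)
    return len(rows)
```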
In the processing flow chart
The results differ somewhat depending on whether 4 or 6 levels of hierarchy are used. In case of 6 integer levels of hierarchy also the two points on top of the posts in the front side wall of the highway behind which a car passes by keep their position.
Sub-pel motion estimation has first been carried out before using the above-described processing. With ⅛-pel resolution as an example, the point at the small rectangular road sign at the right is kept while it is moved by the passing dark car to the small round road sign otherwise. However, for other points, e.g. in the fence, at the street lamp or the big rectangular road sign at the bottom, points only deviate somewhat from their original position.
In the motion estimation, due to the measurement window, however small it may be, a match is actually found for a small part of the image rather than for a single picture element. As described above, the motion vector rounding can lead to a difference with respect to the match found in the last integer-pel level of the hierarchy, and such differences accumulate and turn into positioning deviations over time. In case of motion estimation with ½-pel resolution and without any further measures, such deviations have been found to be much larger than with ⅛-pel estimation (see
In the enhanced processing in
Frame-based rounding/cutting off (see
Rounding with offset storage and add shows very similar results (see
Rounding with offset storage and add and with rounding of ½-pel coordinates that have four neighbors to the neighbor with the most similar image signal shows very similar results (see
Rounding with offset storage and add and with rounding of ½-pel coordinates that have four or two neighbors to the neighbor with the most similar image signal shows again slightly better results (see
Although the processing focuses on the application of targeted content, it is likewise applicable to many other applications requiring motion estimation and/or object or point tracking.
The described processing can be carried out by a single processor or electronic circuit, or by several processors or electronic circuits operating in parallel and/or operating on different parts of the complete processing.
The instructions for operating the processor or the processors according to the described processing can be stored in one or more memories. The at least one processor is configured to carry out these instructions.
Number | Date | Country | Kind |
---|---|---|---|
14306853 | Nov 2014 | EP | regional |
Number | Date | Country | |
---|---|---|---|
20160148392 A1 | May 2016 | US |