Some embodiments of the invention use a motion estimation system divided into two parts, or phases. In the first phase, an exhaustive search is performed over an entire HD (High Definition) frame for each 16×16 macroblock. In the second phase, the search is refined based on the global minima found in the first phase, and refinements may be performed for different partition sizes and for fractional vectors. Phase one has the advantage of minimizing external memory bandwidth and on-chip storage. It can be further enhanced by using higher-quality match criteria than SAD (Sum of Absolute Differences), the simplified matching criterion used in known systems. Phase two has the advantage of performing a logarithmic search directed by the minima from phase one, which reduces memory bandwidth and computation time. Also in phase two, to better balance memory bandwidth and computation time, a calculation using the quantization parameter (QP) may be used during the search, rather than after the search as in known systems. By using more complex calculations during the search, the vector chosen may be much nearer to the optimum. The phase two refinements may also be performed on more than one potential global candidate vector produced in the first phase, to allow better choices after refinement. Finally, topology features from the phase one vector field may be used to determine global motion such as pan and zoom, to determine how best to fracture the picture elements, and to smooth the vector choices so that the differential vector values do not change much as the picture is compressed.
The first phase aims to detect the best match for each 16×16 macroblock, which is found by exhaustively calculating the match signature for every possible vector across an entire video frame. In the first phase, the search may use integer-pel vectors only, without degrading the quality of the inventive motion vector system. Results from the first phase seed the phase two refinements. The best result of a match in phase one is the identification of a 16×16 macroblock that has a minimum difference within the context of matches to neighbors in the same frame and matches across frames. Multiple vector choices may be generated, so that a secondary high-quality logarithmic search completely covers all the areas where the optimum choice may lie.
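The exhaustive first phase can be illustrated with a brute-force sketch in Python with NumPy. It is not the memory-efficient hardware arrangement described here; the names `full_search`, `sad`, and `ssd`, the candidate count `k`, and the restriction to vectors that keep the block inside the reference frame are assumptions made for the example. It scores every integer-pel vector for one 16×16 macroblock and keeps the k best, mirroring the idea of handing several candidate vectors to phase two.

```python
import numpy as np

MB = 16  # macroblock size in pels


def sad(a: np.ndarray, b: np.ndarray) -> int:
    """Sum of Absolute Differences (no multiplications needed)."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())


def ssd(a: np.ndarray, b: np.ndarray) -> int:
    """Sum of Squared Differences."""
    d = a.astype(np.int64) - b.astype(np.int64)
    return int((d * d).sum())


def full_search(cur, ref, mb_x, mb_y, k=3, metric=sad):
    """Score every integer-pel vector that keeps the 16x16 block inside the
    reference frame and return the k lowest-cost (cost, (dx, dy)) pairs."""
    block = cur[mb_y:mb_y + MB, mb_x:mb_x + MB]
    h, w = ref.shape
    scored = []
    for ry in range(h - MB + 1):
        for rx in range(w - MB + 1):
            cost = metric(block, ref[ry:ry + MB, rx:rx + MB])
            scored.append((cost, (rx - mb_x, ry - mb_y)))
    scored.sort()
    return scored[:k]


# Usage: the current frame is the reference shifted down 2 rows and right 3 columns.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(48, 48), dtype=np.uint8)
cur = np.roll(ref, shift=(2, 3), axis=(0, 1))
best = full_search(cur, ref, 16, 16)
print(best[0])  # (0, (-3, -2)): the block sits 3 pels left and 2 pels up in ref
```

Returning more than one candidate costs nothing extra once every vector has been scored, which is why an exhaustive first phase can cheaply supply several seeds.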
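In phase one, the searches are performed using a match signature such as the Sum of Absolute Differences (SAD) or the Sum of Squared Differences (SSD). For a 16×16 macroblock C at position (x, y) in the current frame and a candidate vector (v_x, v_y) into the reference frame R, the SAD value is calculated using:
$$\mathrm{SAD}(v_x, v_y) = \sum_{j=0}^{15} \sum_{i=0}^{15} \left| C(x+i,\, y+j) - R(x+i+v_x,\, y+j+v_y) \right|$$
The SSD is calculated using:
$$\mathrm{SSD}(v_x, v_y) = \sum_{j=0}^{15} \sum_{i=0}^{15} \left( C(x+i,\, y+j) - R(x+i+v_x,\, y+j+v_y) \right)^2$$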
The advantage of SAD is that no multiplications are required, although both SAD and SSD require that all 256 differences of a 16×16 macroblock be summed. The SSD signature is better than SAD because it is less affected by random noise in each pel value. Another alternative is a frequency-domain signature, which allows high-frequency terms (to which the eye is insensitive) to be discarded before comparison and also allows a simple noise filter to be applied. One such signature is the DCT16 signature, which is calculated from sixteen DCT4×4 transforms as defined in the H.264 standard; as defined there, the DCT4×4 requires no multiplications. A match value for a 16×16 macroblock is determined by calculating:
A significant advantage is that many of the DCT terms in the summation can be ignored during the comparison. In preferred embodiments of the invention, the first phase uses memory bandwidth and local storage so efficiently that any or all of these signatures can be computed in the time available, whereas in known systems the balance between memory access and computation is such that only SAD can be used, and only on a limited number of vector choices.
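A DCT16 signature can be sketched with the H.264 4×4 forward core transform, which needs only additions, subtractions, and shifts. The match formula itself is not reproduced in this text, so the match value below is an assumption: the sum of absolute coefficient differences over low-frequency terms only (u + v < `keep`), with an optional dead-zone `noise` threshold standing in for the simple noise filter. The names `core4x4`, `dct16_signature`, and `dct16_match` are hypothetical.

```python
import numpy as np


def _butterfly(v: np.ndarray) -> np.ndarray:
    """1-D H.264 4-point core transform along the last axis (adds and shifts only)."""
    p0 = v[..., 0] + v[..., 3]
    p1 = v[..., 1] + v[..., 2]
    p2 = v[..., 1] - v[..., 2]
    p3 = v[..., 0] - v[..., 3]
    return np.stack([p0 + p1, (p3 << 1) + p2, p0 - p1, p3 - (p2 << 1)], axis=-1)


def core4x4(block) -> np.ndarray:
    """Forward 4x4 core transform Y = Cf X Cf^T (rows first, then columns)."""
    x = np.asarray(block, dtype=np.int32)
    return _butterfly(_butterfly(x).T).T


def dct16_signature(mb) -> np.ndarray:
    """Sixteen 4x4 transforms of a 16x16 macroblock, shape (4, 4, 4, 4)."""
    mb = np.asarray(mb, dtype=np.int32)
    return np.array([[core4x4(mb[4 * by:4 * by + 4, 4 * bx:4 * bx + 4])
                      for bx in range(4)] for by in range(4)])


def dct16_match(sig_a, sig_b, keep=3, noise=0) -> int:
    """Assumed match value: |differences| of coefficients with u + v < keep,
    ignoring differences no larger than `noise` (a simple noise filter)."""
    low_freq = np.add.outer(np.arange(4), np.arange(4)) < keep
    diff = np.abs(sig_a - sig_b)[..., low_freq]  # drop high-frequency terms
    return int(np.where(diff > noise, diff, 0).sum())


# Usage: a +/-1 checkerboard added to a flat block.
flat = np.full((16, 16), 100)
checker = 1 - 2 * (np.indices((16, 16)).sum(axis=0) % 2)
noisy = flat + checker
print(np.abs(noisy - flat).sum())                                           # 256
print(dct16_match(dct16_signature(flat), dct16_signature(noisy), keep=2))  # 0
```

The checkerboard changes every pel by 1, so its SAD is 256, but its energy lies entirely in high-frequency coefficients, so a comparison restricted to the lowest frequencies reports no difference. This is the kind of noise insensitivity attributed to a frequency-domain signature.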
To generate the DCT signature values, an entire row of macroblocks is buffered, and a DCT16 signature is calculated for every pixel location, reusing the stored DCT 4×4 results. Therefore, each new vector location needs only four new DCT 4×4 values. Computing approximately 16.8 million DCT16 signatures thus requires about 67.1 million DCT 4×4 transforms. A DCT 4×4 can be coded in fewer than 100 instructions on a typical processor.
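The reuse argument can be checked with a small sketch, assuming the horizontal scan described here: sliding the 16×16 window right by 4 pels shares 12 of its sixteen 4×4 blocks with the earlier window at the same phase, so only one new column of four blocks has to be transformed. The cache keyed by each block's top-left pel and the name `row_signatures` are illustrative assumptions; a real implementation would keep only a row buffer rather than an unbounded dictionary.

```python
import numpy as np

CF = np.array([[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]])


def core4x4(b):
    """H.264 4x4 forward core transform in matrix form: Cf X Cf^T."""
    return CF @ b @ CF.T


def row_signatures(ref, y):
    """DCT16 signatures for every x in one row of vector positions, caching
    4x4 transforms by top-left pel so each is computed only once per row."""
    cache, computed, sigs = {}, 0, []
    for x in range(ref.shape[1] - 15):
        blocks = []
        for by in range(4):
            for bx in range(4):
                r, c = y + 4 * by, x + 4 * bx
                if (r, c) not in cache:  # new 4x4 block: transform it once
                    cache[(r, c)] = core4x4(ref[r:r + 4, c:c + 4])
                    computed += 1
                blocks.append(cache[(r, c)])
        sigs.append(np.array(blocks))
    return sigs, computed


ref = np.random.default_rng(1).integers(0, 256, size=(16, 64))
sigs, computed = row_signatures(ref, 0)
print(len(sigs), computed)  # 49 244
```

The first four positions (one per horizontal phase) need all 16 blocks, and every later position adds exactly four, giving 4 × 49 + 48 = 244 transforms here. Over a long row this tends to the four-per-vector ratio behind the 16.8 million to 67.1 million estimate.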
Thus, with today's multi-processor cores, phase one can perform the matches in the frequency domain on one or two chips, using a DCT match “signature” that ignores noise and high-frequency terms. This leads to vector selections that form smooth vector fields locked to natural picture motion rather than to noise and edges. It has also been shown that phase one can potentially search all integer-pel vector values exhaustively across an entire HD frame using one or two chips, and that, if needed, quantizing the near and far searches can reduce the computation overhead without significant loss.
Phase two of the motion estimation refines the vector(s) initially determined in phase one. Phase two includes some standard elements in motion estimation.
Phase two is a “logarithmic” search using the commonly used four-step search (FSS). The FSS is effective provided that there are no false minima in the search region and that there is a good prediction of motion from the surrounding macroblocks. The methods used to select the starting seeds from phase one ensure that phase two provides near-optimum results using the FSS.
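A minimal sketch of the textbook four-step search is shown below, assuming a caller-supplied `cost(dx, dy)` function (for example, SAD against the reference block) and treating the ±16-pel window only as a clamp. The function name and the tie-breaking in favour of the current centre are assumptions for the example.

```python
def four_step_search(cost, start=(0, 0), search_range=16):
    """Four-step search around `start`. cost(dx, dy) returns a match cost.
    Returns (best_cost, (dx, dy))."""
    memo = {}

    def evaluate(v):
        if v not in memo:  # each vector is costed at most once
            memo[v] = cost(*v)
        return memo[v]

    def in_window(v):
        return (abs(v[0] - start[0]) <= search_range
                and abs(v[1] - start[1]) <= search_range)

    def best_of(center, step):
        pts = [(center[0] + i * step, center[1] + j * step)
               for j in (-1, 0, 1) for i in (-1, 0, 1)]
        # Ties are resolved in favour of the current centre.
        return min((p for p in pts if in_window(p)),
                   key=lambda p: (evaluate(p), p != center))

    center = start
    for _ in range(3):            # steps 1-3: 5x5 window, step size 2
        nxt = best_of(center, 2)
        if nxt == center:         # minimum at the centre: go straight to step 4
            break
        center = nxt
    best = best_of(center, 1)     # step 4: 3x3 window, step size 1
    return evaluate(best), best


# Usage with a smooth, single-minimum cost surface centred on (5, -3).
print(four_step_search(lambda dx, dy: (dx - 5) ** 2 + (dy + 3) ** 2))  # (0, (5, -3))
```

With three step-2 moves and a final step-1 pass, this pattern can travel at most 7 pels from its start point. That is one reason the quality of the starting seeds matters: a seed from phase one or a predicted vector has to land near the true minimum for the FSS to reach it.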
More than one vector can be used to start any FSS. The best vector candidates are either the seeds from phase one or “predicted vectors” obtained from the phase two results for the neighbors' vectors, using techniques described in the above-referenced H.264 standard. Adaptive heuristics can also be used to store “close-match” selections so that previous results for neighbors can be re-adjusted according to the result for the current macroblock. Being able to use the quantization parameter QP at this stage can help the heuristics, because after quantization many of the choices may become similar, so a vector close to the predicted value that would otherwise have been rejected or skipped may become the better choice.
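The effect of QP on choosing between a distant candidate and one near the predicted vector can be sketched with a rate-constrained cost. The cost function is not specified in this text, so the following is an assumption: distortion plus λ(QP) times the bits needed to code the vector difference, with the bit count taken from the H.264 signed Exp-Golomb code se(v) and λ borrowed from the convention of the H.264 JM reference encoder. All function names are hypothetical.

```python
import math


def se_bits(v: int) -> int:
    """Bit length of the H.264 signed Exp-Golomb code se(v)."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * ((code_num + 1).bit_length() - 1) + 1


def lambda_motion(qp: int) -> float:
    """Assumed motion lambda, following the JM reference encoder convention."""
    return math.sqrt(0.85 * 2 ** ((qp - 12) / 3))


def motion_cost(distortion, mv, pred, qp):
    """Distortion plus lambda-weighted bits for the vector difference mv - pred."""
    bits = se_bits(mv[0] - pred[0]) + se_bits(mv[1] - pred[1])
    return distortion + lambda_motion(qp) * bits


def choose_vector(candidates, pred, qp):
    """candidates: (distortion, (mvx, mvy)) pairs, vectors in quarter-pel units."""
    return min(candidates, key=lambda c: motion_cost(c[0], c[1], pred, qp))[1]


cands = [(100, (40, 0)), (120, (0, 0))]   # better match far away vs. the prediction
print(choose_vector(cands, (0, 0), qp=12))  # (40, 0)
print(choose_vector(cands, (0, 0), qp=36))  # (0, 0)
```

At low QP the lower distortion of the distant vector wins; at high QP the bits for its large vector difference dominate, and the vector at the prediction becomes the better choice, as the paragraph above suggests.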
One of the aspects of refinement using the FSS is the ability to perform the FSS on all possible partition sizes, such as 4×4, 4×8, 16×8, 8×8 etc., as defined in the H.264 standard. One method to reduce the number of FSS searches is to use the topology from phase one to encourage and discourage certain fracture patterns and so limit the number of FSS searches performed for each 16×16 macroblock.
In phase two, the phase one motion vector field is scanned (typically in display raster order) to detect the topology regions and to generate the “predicted vectors” as additional start points for the phase two refinement. Next, the FSS is performed for each of the partitions allowed by the topology regions. The integer-pel vector solution is then refined to quarter-pel resolution, where the two fractional (quarter) bits are either 00 (integer-pel) or 10 (half-pel), and both results are output to the encoder. The above processes can be repeated with the additional candidate vectors, if any are present. Further, any matches that give high difference values or distort the vector field wildly, for example when a moving object such as a ball disappears behind a player or reappears from behind a crowd, can be searched for in other frames to find a better match.
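Storing vector components in quarter-pel units makes the quarter bits explicit. The sketch below assumes components are stored as 4 × integer part plus fraction, and it uses the H.264 six-tap filter (1, −5, 20, 20, −5, 1)/32 to form a half-pel sample in one dimension only. The function names are hypothetical, and picture-edge handling is omitted.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # H.264 luma half-sample filter


def quarter_pel(int_pel: int, half: bool = False) -> int:
    """Vector component in quarter-pel units: 4 * integer (+2 for half-pel)."""
    return 4 * int_pel + (2 if half else 0)


def frac_bits(q: int) -> int:
    """Two fractional bits: 0b00 integer-pel, 0b10 half-pel (0b01/0b11 quarter)."""
    return q & 3


def half_pel(row, x: int) -> int:
    """Half-pel sample between row[x] and row[x + 1] (needs row[x-2 .. x+3])."""
    acc = sum(t * row[x - 2 + i] for i, t in enumerate(TAPS))
    return min(255, max(0, (acc + 16) >> 5))  # round, divide by 32, clip to 8 bits


ramp = [10 * i for i in range(8)]
print(half_pel(ramp, 3))  # 35, midway between 30 and 40
print(bin(frac_bits(quarter_pel(3))), bin(frac_bits(quarter_pel(-2, half=True))))  # 0b0 0b10
```

Python's `&` on negative integers behaves as on two's-complement values, so −1.5 pel (−6 quarter-pels) still reports the half-pel pattern 10.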
To produce the topology regions, each macroblock is tagged with an identifier according to its vector from phase one. Macroblocks with “similar” motion, defined within parameterized bounds (for example, vectors within a Euclidean distance of one pixel), share an identifier. Thus, a macroblock on a region edge will have a different identifier from at least one neighbor. The “predicted vector” candidate is calculated as described in the H.264 reference, as illustrated in
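The region tagging and the predicted vector can be sketched as follows. Two assumptions are made: “similar” motion is read as a Euclidean distance of at most `tol` (one pixel) between neighbouring vectors, and regions are grown by 4-connected flood fill. The predictor is the basic H.264 component-wise median of the left, above, and above-right neighbours, without the standard's availability and partition-shape special cases. All names are hypothetical.

```python
import math
from collections import deque


def label_regions(field, tol=1.0):
    """Tag each macroblock with a region id; 4-connected neighbours share an id
    when their phase one vectors are within Euclidean distance `tol`."""
    rows, cols = len(field), len(field[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue
            labels[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols and labels[ny][nx] == -1
                            and math.dist(field[y][x], field[ny][nx]) <= tol):
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return labels


def on_region_edge(labels, r, c):
    """True if any 4-connected neighbour carries a different region id."""
    rows, cols = len(labels), len(labels[0])
    return any(0 <= ny < rows and 0 <= nx < cols and labels[ny][nx] != labels[r][c]
               for ny, nx in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)))


def median_predictor(a, b, c):
    """Component-wise median of left (A), above (B) and above-right (C) vectors."""
    return tuple(sorted(comp)[1] for comp in zip(a, b, c))


field = [[(0, 0), (0, 0), (5, 0), (5, 0)] for _ in range(3)]  # two moving regions
labels = label_regions(field)
print(labels[0])                                                    # [0, 0, 1, 1]
print(on_region_edge(labels, 0, 1), on_region_edge(labels, 0, 0))  # True False
print(median_predictor((1, 2), (3, -1), (2, 5)))                    # (2, 2)
```

Because the flood fill compares each macroblock only with its neighbours, a slowly varying field (as in a zoom) can chain into a single region. Comparing against a region's seed vector instead would be one alternative, depending on how the regions are meant to bias partitioning.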
Next, an FSS is performed and partitions are selected. Significant features of this calculation include the order in which the partition sizes are searched (denoted as levels), where to start the search at each new level, and how to control the cost function for each level. These choices can be based on region biases and on the cost of the previous level.
In performing the FSS, the search covers +/− 16 pels around a “parent” vector, and the levels are searched in the following order:
a) 16×16 refined search using macroblock candidate vector
b) two 16×8 searches using result of a) as a parent
c) two 8×16 searches using result of a) as a parent
d) four 8×8 searches using result of a) as a parent
e) eight 8×4 searches using results of d) as parent
f) eight 4×8 searches using results of d) as parent
g) sixteen 4×4 searches using results of d) as parent
Each level can halt if its cost becomes too high without preventing the following levels from completing. If level d) aborts, for instance, the parent vector does not change. Note that the seven levels together amount to 7 equivalent 16×16 searches, because each level covers the full macroblock area once.
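The level schedule and its parent links can be written down as data. This sketch assumes a hypothetical callback `search(level, parent_vec)` that runs one level's FSS and returns `None` when that level aborts. It checks the count of 7 equivalent 16×16 searches and shows a parent vector passing through an aborted level unchanged; the table layout and function names are illustrative only.

```python
# Search levels a) to g): (partition width x height, count, parent level).
LEVELS = {
    "a": ((16, 16), 1, None),
    "b": ((16, 8), 2, "a"),
    "c": ((8, 16), 2, "a"),
    "d": ((8, 8), 4, "a"),
    "e": ((8, 4), 8, "d"),
    "f": ((4, 8), 8, "d"),
    "g": ((4, 4), 16, "d"),
}


def equivalent_16x16_searches():
    """Each level covers the whole 256-pel macroblock exactly once."""
    return sum(w * h * n for (w, h), n, _ in LEVELS.values()) // 256


def run_levels(search, seed):
    """search(level, parent_vec) returns a vector, or None if the level aborts.
    An aborted level hands its own parent vector on unchanged to its children."""
    result = {}
    for name, (_, _, parent) in LEVELS.items():  # dicts keep insertion order
        parent_vec = seed if parent is None else result[parent]
        found = search(name, parent_vec)
        result[name] = parent_vec if found is None else found
    return result


def demo_search(level, parent_vec):
    """Hypothetical search: level a finds (1, 1), level d aborts, others step +1 in x."""
    if level == "a":
        return (1, 1)
    if level == "d":
        return None
    return (parent_vec[0] + 1, parent_vec[1])


print(equivalent_16x16_searches())  # 7
print(run_levels(demo_search, (0, 0)))
```

In the usage run, level d) aborts, so levels e) to g) start from the vector found at level a).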
Interesting features in phase two include where to start the search for each level and how to control the cost function for each level. Embodiments of the invention start the search at each level from a parent vector, and control cost using several techniques. First, if a macroblock is on a region edge, the relative cost of a vector is reduced by a parameterized factor, such as ⅔. Second, when a decision has been made at each level, QP is applied to generate a “true cost” for that level. The vector cost at each lower level is compared to this “true cost”, and the search is aborted if the vector cost is greater. This prevents smaller partitions from being chosen when QP is high.
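The two cost controls can be sketched under stated assumptions. The “true cost” of a decided level is modelled here as distortion plus λ(QP) × bits, using the mode-decision λ of the H.264 JM reference encoder, and the edge bias multiplies each sub-partition's cost by the ⅔ factor given as an example above. The function names and the numbers in the usage lines are hypothetical.

```python
def lagrangian(distortion, bits, qp):
    """Assumed cost: distortion + lambda * bits, with the JM mode-decision lambda."""
    lam = 0.85 * 2 ** ((qp - 12) / 3)
    return distortion + lam * bits


def lower_level_aborts(parent_true_cost, sub_costs, qp, on_edge, edge_factor=2 / 3):
    """sub_costs: (distortion, bits) per sub-partition, in search order.
    Returns True as soon as the running cost exceeds the parent's true cost."""
    running = 0.0
    for distortion, bits in sub_costs:
        cost = lagrangian(distortion, bits, qp)
        running += cost * edge_factor if on_edge else cost  # edge bias, e.g. 2/3
        if running > parent_true_cost:
            return True
    return False


parent = (1000, 10)          # 16x16 level: distortion, bits
subs = [(200, 12)] * 4       # four 8x8 partitions: lower distortion, more vector bits
for qp, edge in ((12, False), (36, False), (24, False), (24, True)):
    print(qp, edge, lower_level_aborts(lagrangian(*parent, qp), subs, qp, edge))
```

Because the smaller partitions carry more vector bits, raising QP (and therefore λ) makes their running cost overtake the parent's true cost, so the search aborts and the larger partition is kept. At QP 24 in this example, the edge bias is what lets an edge macroblock keep its finer partitions.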
Thus, phase two is a refinement of phase one. Vector smoothing is helped by using parent vectors for each level, using QP to affect decisions at lower levels, and using the edges of motion regions.
The techniques of phase one and phase two are inherently scalable, and can operate on video frames of almost any size.
Different embodiments of phase two could operate on predicted vectors rather than on those determined in phase one; for example, the vectors could be predicted from the results of the first loop of phase two. In addition to predictions, the vector field could be further smoothed by using more than one candidate parent vector per macroblock, using QP during the search, and using topology features from the phase one vector field.
Embodiments implementing phase two may use QP to limit the number of partitions, use a parent hierarchy to find better matches, and may use vector field topology to bias partitioning.
From the foregoing it will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the invention.
This application claims benefit of U.S. provisional application 60/790,913, filed on Apr. 10, 2006, entitled MOTION COMPENSATION IN DIGITAL VIDEO, which is incorporated by reference herein.
Number | Date | Country
---|---|---
60790913 | Apr 2006 | US