Various embodiments generally relate to methods and devices for estimating motion in a plurality of frames.
Typically, a video sequence contains many redundancies, where successive video frames can contain the same static or moving objects. Motion estimation (ME) may be understood as being a process which attempts to obtain motion vectors that represent the movement of objects between frames. The knowledge of the object motion can be used in motion compensation to achieve compression.
In block-based video coding, the motion vectors are determined by finding the best match for each macroblock in the current frame with respect to a reference frame. A best match for an N×N macroblock in the current frame can be found by searching exhaustively in the reference frame over a search window of ±R pixels. This amounts to $(2R+1)^2$ search points, each requiring about $3N^2$ arithmetic operations to compute the sum of absolute differences (SAD) as the block distortion criterion. This cost is very high for software implementation.
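As an illustration of this cost, the following is a minimal sketch of exhaustive block matching with SAD, assuming 8-bit grayscale frames as numpy arrays; the helper names and parameters (`sad`, `exhaustive_search`, N, R) are illustrative and not from the source:

```python
import numpy as np

def sad(a, b):
    # Sum of absolute differences: one subtraction, one absolute value
    # and one accumulation per pixel, i.e. roughly 3*N*N operations
    # for an NxN block.
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def exhaustive_search(cur, ref, x, y, N=16, R=16):
    # Best match for the NxN macroblock at (x, y) of `cur`, searched
    # over the (2R+1)^2 candidate positions of a +/-R window in `ref`.
    H, W = ref.shape
    block = cur[y:y + N, x:x + N]
    best_cost, best_mv = None, (0, 0)
    for dy in range(-R, R + 1):
        for dx in range(-R, R + 1):
            ry, rx = y + dy, x + dx
            if 0 <= ry <= H - N and 0 <= rx <= W - N:
                cost = sad(block, ref[ry:ry + N, rx:rx + N])
                if best_cost is None or cost < best_cost:
                    best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```

For R=16 this already amounts to 1089 SAD evaluations per macroblock, which is the cost the fast ME techniques discussed below try to avoid.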
Some conventional ME techniques reduce the number of search points by using predefined search patterns and early termination criteria. These techniques assume a unimodal error surface, i.e., that the matching error increases monotonically away from the position of the global minimum.
When content motion is large or complex, the assumption of a unimodal error surface may no longer be valid. Consequently, fast ME methods may produce false matches, thus leading to inferior quality motion-compensated frames that degrade coding performance.
In various embodiments, a method for estimating motion in a plurality of frames is provided, the method including determining a first set of motion vectors with respect to a first frame and a second frame, the second frame being in succession with the first frame along a time direction, determining a second set of motion vectors with respect to a predicted frame and the second frame, the predicted frame being in succession with the first frame along the time direction; wherein some motion vectors of the second set of motion vectors are interpolated from motion vectors of the first set of motion vectors; and determining a third set of motion vectors based on the first set of motion vectors and the second set of motion vectors.
In the drawings, like reference characters generally refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of various embodiments. In the following description, various embodiments are described with reference to the following drawings, in which:
The following detailed description refers to the accompanying drawings that show, by way of illustration, specific details and embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. Other embodiments may be utilized and structural, logical, and electrical changes may be made without departing from the scope of the invention. The various embodiments are not necessarily mutually exclusive, as some embodiments can be combined with one or more other embodiments to form new embodiments.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration”. Any embodiment or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or designs.
In an embodiment, a “circuit” may be understood as any kind of a logic implementing entity, which may be hardware, software, firmware, or any combination thereof. Thus, in an embodiment, a “circuit” may be a hard-wired logic circuit or a programmable logic circuit such as a programmable processor, e.g. a microprocessor (e.g. a Complex Instruction Set Computer (CISC) processor or a Reduced Instruction Set Computer (RISC) processor). A “circuit” may also be software being implemented or executed by a processor, e.g. any kind of computer program, e.g. a computer program using a virtual machine code such as e.g. Java. Any other kind of implementation of the respective functions which will be described in more detail below may also be understood as a “circuit” in accordance with an alternative embodiment.
In the following, vectors and matrices will be indicated using bold letters as well as underlining interchangeably.
As will be described in more detail below, various embodiments provide a framework (which will also be referred to as Lacing in the following) that integrates seamlessly with conventional fast ME methods and may improve their motion prediction accuracy when employing the HB structure, e.g. by extending their effective motion search range through successive motion vector interpolation along the macroblock's motion trajectories across the frames within the GOP (a macroblock may include one or more blocks, each block including a plurality of pixels). It has been observed that rigid body motions may produce continuous motion trajectories spanning a number of frames across time. By exploiting these motion characteristics, Lacing may help to progressively guide the motion prediction process while locating the ‘true’ motion vector even across a relatively large temporal distance between the current and reference frames. In this context, it is to be noted that fast ME algorithms, which may be very effective for motion estimation over relatively small motion search ranges, can become ineffective when applied in the HB structure. In various embodiments, fast ME methods may thus still provide fast and simple motion estimation even with increasing temporal distance.
In the following, an implementation of the lacing process 204 will be described in more detail.
Having observed the motion continuity of rigid body motions across frames, the Lacing framework (in other words, the lacing process 204) may exploit these strong temporal correlations in the motion vector fields of neighbouring frames, such that:
$M_{t,t-2}(p) \approx M_{t,t-1}(p) + M_{t-1,t-2}\big(p + M_{t,t-1}(p)\big)$ (1)
where $M_{t_1,t_0}(p)$ denotes the motion vector of the macroblock at position p in frame $f(t_1)$ with reference frame $f(t_0)$. More generally, $M_{t_1,t_0}(p)$ may be obtained by iterating

$m_j = m_{j-1} + M_{t_1-j,\,t_1-j-1}(p + m_{j-1})$ (2)
with initial condition
$m_0 = M_{t_1,\,t_1-1}(p).$ (3)
It is noted that the updating term in equation (2) is a motion vector from $f(t_1-j)$ to $f(t_1-j-1)$, which spans only a unit temporal interval. Thus, the updating motion vector can be computed using fast (i.e., small search range) ME methods. This contrasts with the direct computation of $M_{t_1,t_0}(p)$, which spans the full temporal distance $t_1-t_0$ and would therefore require a much larger search range.
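A minimal sketch of this iteration (forward case, $t_1 > t_0$), assuming a hypothetical `fast_me(cur, ref, pos)` small-range block matcher (e.g. a diamond or adaptive rood pattern search) and frames indexed by time:

```python
def trace_mv(frames, fast_me, t1, t0, p):
    # Approximate M_{t1,t0}(p) per eqns. (2)-(3) by chaining
    # unit-interval motion vectors, each found with a small-range
    # search between consecutive frames.
    m = fast_me(frames[t1], frames[t1 - 1], p)           # eq. (3): m0
    for j in range(1, t1 - t0):
        pj = (p[0] + m[0], p[1] + m[1])                  # macroblock at p + m_{j-1}
        u = fast_me(frames[t1 - j], frames[t1 - j - 1], pj)
        m = (m[0] + u[0], m[1] + u[1])                   # eq. (2)
    return m
```

Each `fast_me` call only has to cover a unit temporal interval, which is exactly the regime in which fast ME methods are reliable.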
In various embodiments, in each iteration of equation (2), the macroblock at $p+m_{j-1}$ is motion estimated. Using the exhaustive method with a ±v motion search range, each macroblock may require an average of $(t_1-t_0)(2v+1)^2$ search points. For a GOP (e.g. GOP 202) of T frames and with $1+\log_2 T$ temporal levels in the HB structure 100, each macroblock may require an average of $(1+\log_2 T)(2v+1)^2$ search points.
The following process outlines the steps to reduce the average number of search points to $(2v+1)^2$ per macroblock.
For $t_0 \neq t_1$, $M_{t_1,t_0}(p)$ may instead be approximated by iterating

$m_j = m_{j-1} + u\big(M_{t_1-sj,\;t_1-s(j+1)},\, p_j\big)$ (4)

$p_j = p + m_{j-1}$ (5)
with $s = \operatorname{sgn}(t_1 - t_0)$ and the initial condition

$m_0 = M_{t_1,\,t_1-s}(p).$ (6)
The updating vector function u in equation (4) is a motion vector at $p_j$ interpolated from the neighboring motion vectors. In various embodiments, bilinear interpolation may be used to obtain u; other interpolation methods are also applicable, some of which will be described in more detail below.
In the following, the process will be summarized in a pseudo code form:
Equations (4)-(6) form the computing steps of the Lacing framework, which is outlined in Algorithm 1 for motion estimating frames in the HB structure (such as e.g. HB structure 100). Unlike equation (2), no motion estimation may be required when evaluating the updating vector in equation (4), since $M_{t,t\pm 1}$ can be precalculated (see steps 1 to 2 in Algorithm 1). In various embodiments, only $M_{t,t\pm 1}$ may be accessed, at fixed macroblock positions.
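Algorithm 1 itself is not reproduced here. The following is a minimal sketch of the iteration (4)-(6) for the forward case (s = 1), assuming the unit-interval fields have been precalculated as dicts `M[t]` mapping macroblock grid positions of frame t to their motion vectors with respect to frame t−1; `interp_mv` is an illustrative bilinear helper in the spirit of the interpolation described below, not the patented formula:

```python
def interp_mv(field, x, y, N=16):
    # Bilinearly interpolate a motion vector at the (possibly off-grid)
    # position (x, y) from the four neighbouring macroblock vectors.
    bx, by = int(x // N) * N, int(y // N) * N      # enclosing grid cell
    qx, qy = (x - bx) / N, (y - by) / N            # fractional offsets
    def mv(i, j):
        return field.get((bx + i * N, by + j * N), field.get((bx, by), (0, 0)))
    out = []
    for c in (0, 1):                               # per vector component
        f00, f10, f01, f11 = mv(0, 0)[c], mv(1, 0)[c], mv(0, 1)[c], mv(1, 1)[c]
        out.append((1 - qx) * (1 - qy) * f00 + qx * (1 - qy) * f10
                   + (1 - qx) * qy * f01 + qx * qy * f11)
    return tuple(out)

def lace(M, t1, t0, p):
    # Approximate M_{t1,t0}(p) with no new motion search: only the
    # precalculated unit-interval fields M[t] are accessed.
    m = M[t1][p]                                   # eq. (6): m0 = M_{t1,t1-1}(p)
    for j in range(1, t1 - t0):
        pj = (p[0] + m[0], p[1] + m[1])            # eq. (5)
        u = interp_mv(M[t1 - j], *pj)              # interpolated updating vector
        m = (m[0] + u[0], m[1] + u[1])             # eq. (4)
    return m
```

The refinement step of Algorithm 1 (step 12) is omitted in this sketch.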
In the following, a complexity analysis will be provided on the above-described lacing process.
When motion estimation is used with Lacing, the computation overheads are attributed to the following processes:
Using the exhaustive method with a search range of ±v pixels, applying Lacing to an HB-structured GOP of T frames and $1+\log_2 T$ temporal levels requires an average of $(4-3/T)(2v+1)^2$ search points per macroblock, or $2(2v+1)^2$ search points without the refinement step (step 12) in Algorithm 1.
Various embodiments provide an application of a hierarchical B-pictures structure, e.g. in the H.264/SVC video coding standard, and provide a solution to the challenge of effective motion estimation (ME) across frames with much larger temporal distance. Various embodiments provide a Lacing framework which may integrate seamlessly with conventional fast ME methods to extend their effective search range along the motion trajectories. Experiments showed that Lacing can improve motion prediction accuracy by as much as 3.11 dB in quality and give smoother motion vector fields that require fewer bits for encoding the motion vectors.
In the following, a more concrete implementation of the above described embodiment of the lacing process will be described. It is to be noted that, in the following, a modified notation will be used compared with the lacing process described above.
As already mentioned above, motion estimation (ME) is a mechanism provided in video compression. It is a process of obtaining motion information to predict video frames. The video can be compressed by coding the motion information and prediction error. This method works because similar blocks of pixels can usually be found in neighboring picture frames. The motion information coded may be the displacement between matching pixel blocks, or macroblocks.
This coded data may also be referred to as motion vectors (such as e.g. motion vectors 214). To obtain a match for an N×N macroblock, an exhaustive search can be performed over a ±M pixel range in the preceding picture frame. This requires $N^2(2M+1)^2$ computations (using the minimum sum of absolute differences (SAD) as the matching criterion), which is very high for software implementation.
Examples of fast ME techniques or ME methods that may be used in various embodiments are, inter alia: three-step search, 2D logarithmic search, new three-step search, diamond search (DS) and adaptive rood pattern search (ARPS).
As also already mentioned above, under the HB prediction structure (e.g. HB structure 100), each frame in the GOP (group of pictures) 202 may be bi-directionally estimated from reference pictures at a lower temporal level. At lower temporal levels, the distance (also referred to as temporal distance) between the estimated and reference frames increases. Motion estimation may become more difficult as the temporal distance increases. First, there are likely to be fewer well-matching macroblocks due to occluded and uncovered areas. This may lead to large prediction errors and reduced coding efficiency. Secondly, due to longer motion trajectories, a larger search area may be needed to find the matching macroblock. This may significantly increase the computation cost. Hence, when fast ME methods are applied to the HB structure (e.g. HB structure 100), they generally fail to give satisfactory performance because of their limited effective search range.
Various embodiments may improve the prediction accuracy of fast ME algorithms in the HB structure (e.g. HB structure 100). This may be achieved by extending their effective search range through tracing motion trajectories across the GOP.
Lacing is algorithmically simple with modest computation overhead. Yet, significant performance gain may be observed with the Lacing framework.
As will be described in more detail below, the Lacing framework may extend the effective search range of existing fast motion estimation methods and may improve their prediction accuracy in the hierarchical B-pictures structure. One idea of various embodiments including Lacing is to trace the motion trajectories of macroblocks across the GOP.
The ‘lace’ of macroblocks along each trajectory is likely to have high similarity. The positions of macroblocks on each ‘lace’ can be used to determine the motion vector of a macroblock with reference to any picture frame in the same GOP. The rationale is that the trajectories of moving objects in a picture sequence are generally coherent and continuous across time.
We begin by illustrating the motion trajectory tracing of macroblocks across the GOP.
Let f(t) represent a picture frame at time t. Also, let $X(t_1,t_0)$ denote the set of motion vectors of $f(t_0)$ with reference frame $f(t_1)$. If $t_0 > t_1$, then $X(t_1,t_0)$ is a set of forward motion vectors; otherwise, it is a set of backward motion vectors.
For simplicity, a motion trajectory tracing to determine forward motion vectors will be described in more detail; the adaptation of the process for a motion trajectory tracing to determine backward motion vectors is straightforward.
Consider a GOP of K frames $\{f(t)\}_{0 \le t \le K}$ with key frame f(0). Its set of forward motion vectors is denoted

$\chi_p = \{X(1,0),\, X(2,1),\, \ldots,\, X(K,K-1)\},$ (8)

which can be obtained using fast ME techniques. The Lacing algorithm then estimates the HB forward motion vectors from $\chi_p$ by tracing. As an example, X(k,k−2) will be estimated from both X(k,k−1) and X(k−1,k−2).
For each N×N macroblock positioned at [m,n] in f(k), its motion vector is denoted as

$x(k,k-1;m,n) \in X(k,k-1).$ (10)

The referenced macroblock in f(k−1) is positioned at

$[m',n'] = [m,n] + x(k,k-1;m,n).$ (11)
However, x(k−1,k−2;m′,n′) may not be in X(k−1,k−2), since m′ and n′ do not necessarily fall on the macroblock grid (i.e., they need not equal cN−1 for some integer c). To continue tracing the trajectory into f(k−2), the motion vector $\tilde{x}(k-1,k-2;m',n')$ may be interpolated bilinearly from the neighbouring motion vectors, with the weight vectors

$b_l(q) = [\,1-q,\; q\,]$,
$b_r(q) = b_l^{T}(q)\, I_2$.
Finally, the interpolated motion vector $\tilde{x}$ may be used to compute

$\tilde{x}(k,k-2;m,n) = x(k,k-1;m,n) + \tilde{x}(k-1,k-2;m',n').$ (13)
Generally, for $0 \le J < K$, x(K,J;m,n) can be obtained by iterating the following
To obtain the backward motion estimation in the HB structure, the same procedures may be repeated with the set

$\chi_b = \{X(K-1,K),\, X(K-2,K-1),\, \ldots,\, X(1,2)\},$ (15)

and iterating for $L > K$.
The following summarizes the Lacing procedures in accordance with one implementation:
In various embodiments, an effect of the Lacing technique may be low computational complexity, which may depend on the type of fast ME method applied. From step 4 in the summarized Lacing procedures above, the number of search points per macroblock in the Lacing method can be about 1.5 times that of the corresponding fast ME technique. This may be acceptable since fast ME methods have a low average number of search points per macroblock to begin with.
Another source of extra computation comes from interpolating the motion vectors in equation (12), which contributes an additional 2×(12 multiplications + 6 additions) per macroblock on average.
This is a reasonably small overhead compared to the $N^2$ absolute-value and $(2N^2-1)$ addition operations required to calculate the SAD per macroblock at each search point.
$\tilde{x}(t,t-2;0,0) = x(t,t-1;0,0) + \tilde{x}(t-1,t-2;m',n')$ (17)
where {tilde over (x)}(t−1,t−2;m′,n′) is interpolated from the neighbouring motion vectors.
In various embodiments, one or more of the following GOPs (e.g. GOP 202) may be provided.
The set $\chi_p$ is an illustrative example for forward motion estimation that follows the {IPPP} frame coding pattern. This frame coding pattern is one of the simplest and most commonly used in video coding (from the earliest standards such as H.261 and MPEG-1 to the latest, H.264).
Other representations are also possible, but the choice is of course limited by practicality.
In an alternative example, {X(1,0), X(4,1), X(5,3), . . . , X(K,K−n)}, some of the inter-frame distances are large, as in X(4,1). This means the motion estimation may have to search a wider range to obtain an accurate estimate. That is why the {IPPP . . . } pattern with unit inter-frame distance, i.e. {X(1,0), X(2,1), . . . , X(K,K−1)}, is still provided in many conventional video coding applications for speed and accuracy reasons. However, by restricting to unit inter-frame distance, the video application may be unable to utilize more advanced or more feature-enhanced frame coding patterns such as the hierarchical-B-picture (HB) structure and {IBBP} (an alternative picture structure which may be provided in alternative embodiments), since these coding patterns may require the inter-frame distance to be greater than one unit for motion estimation, i.e. $X(t_1,t_0)$ where $t_1 - t_0 > 1$. Computation complexity (for motion estimation) may increase as $t_1 - t_0$ becomes large, because a larger search area is required to maintain the quality of estimation.
In scalable video coding in accordance with various embodiments, which use the hierarchical B-pictures structure, the ME representation may depend on the different temporal levels of hierarchy in the HB structure (such as e.g. HB structure 100). It is to be noted that other non-dyadic HB structures may also be used in alternative embodiments. It should further be noted that the Lacing algorithm is not restricted by whether the HB structure is dyadic or not.
In the following, some more details about various possible implementations of interpolation processes in accordance with various embodiments will be described.
Bilinear interpolation: Suppose the function f is known at the four corners (0,0), (1,0), (0,1) and (1,1) of a unit square (e.g. a macroblock). For $0 \le x,y \le 1$, the interpolated surface p is given by

$p(x,y) = a_{00} + a_{10}\,x + a_{01}\,y + a_{11}\,xy,$ (18)

where

$a_{00} = f(0,0)$ (19)
$a_{10} = f(1,0) - f(0,0)$ (20)
$a_{01} = f(0,1) - f(0,0)$ (21)
$a_{11} = f(0,0) - f(1,0) - f(0,1) + f(1,1)$ (22)
In this description, f may be replaced by the values of the motion vectors.
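A direct transcription of equations (18)-(22), applied componentwise when f stands for the neighbouring motion vectors (a minimal sketch of the stated formulas, with the corrected sign on $a_{11}$):

```python
def bilinear(f00, f10, f01, f11, x, y):
    # Coefficients per eqns. (19)-(22).
    a00 = f00
    a10 = f10 - f00
    a01 = f01 - f00
    a11 = f00 - f10 - f01 + f11
    # Surface per eqn. (18), evaluated at (x, y) in the unit square.
    return a00 + a10 * x + a01 * y + a11 * x * y

# e.g. one motion-vector component known at the four corners:
# bilinear(2.0, 4.0, 2.0, 6.0, 0.5, 0.5) == 3.5
```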
Bicubic interpolation: Suppose the function f is known at the four corners (0,0), (1,0), (0,1) and (1,1) of a unit square (e.g. a macroblock). For $0 \le x,y \le 1$, the interpolated surface p is given by

$p(x,y) = \sum_{i=0}^{3}\sum_{j=0}^{3} a_{ij}\, x^{i} y^{j},$ (23)

where the 16 coefficients $a_{ij}$ are first obtained by solving a linear system constrained by the values of f and its derivatives $(f_x, f_y, f_{xy})$ at the four corners.
In this description, f may be replaced by the values of the motion vectors.
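For completeness, a sketch of the bicubic case: the 16 coefficients can be obtained with a generic linear solve over the 16 constraints (values and derivatives at the four corners), rather than the usual closed-form 16×16 matrix; the helper names are illustrative:

```python
import numpy as np

def bicubic_coeffs(f, fx, fy, fxy):
    # f, fx, fy, fxy: length-4 sequences of values at the corners
    # (0,0), (1,0), (0,1), (1,1). Returns the 4x4 coefficient array
    # a[i][j] of p(x,y) = sum a_ij x^i y^j (eqn. (23)).
    corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
    A, b = [], []
    for (cx, cy), vals in zip(corners, zip(f, fx, fy, fxy)):
        rows = {
            "f":   lambda i, j: cx**i * cy**j,
            "fx":  lambda i, j: i * cx**(i - 1) * cy**j if i else 0.0,
            "fy":  lambda i, j: j * cx**i * cy**(j - 1) if j else 0.0,
            "fxy": lambda i, j: i * j * cx**(i - 1) * cy**(j - 1) if i and j else 0.0,
        }
        for key, val in zip(("f", "fx", "fy", "fxy"), vals):
            A.append([rows[key](i, j) for i in range(4) for j in range(4)])
            b.append(val)
    return np.linalg.solve(np.array(A, dtype=float),
                           np.array(b, dtype=float)).reshape(4, 4)

def bicubic_eval(a, x, y):
    # Evaluate eqn. (23) at (x, y) in the unit square.
    return sum(a[i, j] * x**i * y**j for i in range(4) for j in range(4))
```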
In the following, some examples of groups of pixels which may be provided in various embodiments are illustrated:
In video coding, a picture may usually be divided into blocks also referred to as macroblocks.
There are a few reasons for doing this, such as memory efficiency, localized analysis and processing, and coding efficiency.
Conventionally, the default macroblock size is 16×16. There is no particular mathematical reasoning for this choice, and other choices may be provided in various embodiments. If the block is too big, localized analysis may not be achieved. If the block is too small, say 1×1, it may lead to poor coding efficiency and render the analysis meaningless. A 16×16 macroblock is therefore a reasonable choice.
In various conventional video codecs, there is a more varied choice of macroblock dimensions, such as 16×8, 8×8, 4×4, etc. These blocks are called sub-blocks to differentiate them from the traditional coding approach of using 16×16 blocks, i.e. the macroblocks.
When describing the above embodiments, the word “macroblock” is used as a unit of data for measurement and processing. This does not restrict the lacing algorithm to working on only 16×16 blocks; it is equally applicable to, for example, 8×8 or 16×8 blocks and all other sub-block dimensions that are used in H.264/SVC.
Some more details on the lacing process will be provided below.
For a GOP of length K, we have

$\chi^{IPP}_{forward} = \{X_{1,0},\, X_{2,1},\, \ldots,\, X_{K,K-1}\}$ (24)

and

$\chi^{IPP}_{backward} = \{X_{K-1,K},\, X_{K-2,K-1},\, \ldots,\, X_{1,2}\},$ (25)

where $X_{a,b}$ denotes the set of motion estimation results obtained by estimating f(a) from f(b).
Consider a GOP, e.g. of length K, and denote by f a picture frame, so that f(1), f(2), . . . , f(K−1), f(K) are all in the GOP.
Merely for illustration purposes, let K=8. Looking at the first GOP of the HB structure 100, the sets to compute are
$\chi^{HB}_{forward} = \{X_{8,0},\, X_{4,0},\, X_{2,0},\, X_{6,4},\, X_{1,0},\, X_{3,2},\, X_{5,4},\, X_{7,6}\}$ (26)

and

$\chi^{HB}_{backward} = \{X_{4,8},\, X_{2,4},\, X_{6,8},\, X_{1,2},\, X_{3,4},\, X_{5,6},\, X_{7,8}\}$ (27)
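These sets can be read off the dyadic decomposition of the GOP. A small illustrative helper (hypothetical, assuming the dyadic HB coding order of HB structure 100) that enumerates the (current, reference) pairs level by level:

```python
def hb_pairs(K=8):
    # (current, reference) frame pairs for a dyadic hierarchical-B
    # GOP of length K (a power of two), level by level.
    forward, backward = [(K, 0)], []
    step = K
    while step > 1:
        half = step // 2
        for mid in range(half, K, step):
            forward.append((mid, mid - half))
            backward.append((mid, mid + half))
        step = half
    return forward, backward

# hb_pairs(8) reproduces eqns. (26) and (27):
# forward  == [(8,0), (4,0), (2,0), (6,4), (1,0), (3,2), (5,4), (7,6)]
# backward == [(4,8), (2,4), (6,8), (1,2), (3,4), (5,6), (7,8)]
```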
As has been noted previously, it may be difficult to obtain the result $X_{a,b}$ when $|a-b| \gg 1$, i.e., when there is a large temporal distance between f(a) and f(b). The usage of the HB structure in H.264/SVC may require computing $\chi^{HB}_{forward}$ and $\chi^{HB}_{backward}$, which is difficult to do efficiently and accurately without resorting to exhaustive methods. Fast ME methods may be unable to compute $X_{a,b}$ accurately where $|a-b| \gg 1$.
However, fast ME methods usually work nicely if $|a-b| = 1$. Thus, in Lacing in accordance with various embodiments, the information $\chi^{IPP}_{forward}$ and $\chi^{IPP}_{backward}$ may first be computed, which may be obtained confidently with fast ME methods. It should be noted, however, that the embodiments are not restricted to fast ME methods; by way of example, any block-based ME method may be provided in alternative embodiments.
Let us restrict the following discussion to computing $\chi^{HB}_{forward}$ from $\chi^{IPP}_{forward}$, since the procedures can be mirrored similarly for computing $\chi^{HB}_{backward}$ from $\chi^{IPP}_{backward}$.
First, denote by $X_{a,b}(x,y) \in \chi^{HB}_{forward}$ the motion vector of the macroblock located at (x,y) in frame f(a), estimated from frame f(b).
Similarly, denote by $M_{a,a-1}(x,y) \in \chi^{IPP}_{forward}$ the motion vector of the macroblock located at (x,y) in frame f(a), estimated from frame f(a−1). For $t_1 > t_0$, the approximation of $X_{t_1,t_0}(x,y)$ may be computed by iterating

$[x_j,\, y_j] = [x_{j-1},\, y_{j-1}] + m^{\,j-1}_{t_1,t_0}$ (30)

$m^{\,j}_{t_1,t_0} = m^{\,j-1}_{t_1,t_0} + u$ (31)
with the initial conditions
$[x_0,\, y_0] = [x,\, y]$ (32)

$m^{\,0}_{t_1,t_0} = M_{t_1,\,t_1-1}(x,y)$ (33)
and
$b_l(q) = [\,1-q,\; q\,]$ (34)

$b_r(q) = b_l^{T}(q)\, I_2$ (35)
and u is determined by bilinear interpolation of the motion vectors from the neighboring macroblocks around the macroblock positioned at $(x_j, y_j)$, using the weight vectors $b_l$ and $b_r$. It is possible to use other conventional interpolation techniques to obtain the motion vector, as discussed earlier. The above iterative equations are the computing steps of a Lacing framework in accordance with various embodiments, which is outlined in the following for motion estimating (forward and backward) frames ordered in the hierarchical B-pictures structure:
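The outlined listing appears as a figure in the original and is not reproduced here. A minimal sketch of the overall flow, reusing the hypothetical `hb_pairs` and `lace` helpers sketched above (with `lace` generalized to both directions via $s = \operatorname{sgn}(t_1 - t_0)$, as in equations (4)-(6)); `sad_of` is an assumed callback returning the block-matching error of a candidate vector, and the refinement step is again omitted:

```python
def lace_hb_gop(M_fwd, M_bwd, K, positions, sad_of):
    # Lace every HB (current, reference) pair from the precalculated
    # unit-interval fields, in both directions; per macroblock of each
    # predicted frame, keep the candidate with the lower SAD.
    fwd_pairs, bwd_pairs = hb_pairs(K)
    best = {}
    for pairs, M in ((fwd_pairs, M_fwd), (bwd_pairs, M_bwd)):
        for cur, ref in pairs:
            for p in positions:
                mv = lace(M, cur, ref, p)
                err = sad_of(cur, ref, p, mv)
                key = (cur, p)
                if key not in best or err < best[key][2]:
                    best[key] = (ref, mv, err)
    return best
```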
Motion estimation is usually performed in the spatial picture domain (block based) unless otherwise specified, as in “motion estimation via phase correlation” or “motion estimation in the FFT domain”. Motion estimation may be understood as a process of obtaining motion information between two or more picture frames. That information is also referred to as a motion vector.
Lacing uses the motion information (computed by some motion estimation method, say, XYZ) to predict motion vectors that could not otherwise be computed by method XYZ. That is, given a set of motion vectors M, Lacing can use the information in set M to predict motion vectors that could not be computed by the same method that produced the set M.
In summary, lacing can be described by the following (a plain-English explanation of the above iteration equations and Algorithm 1):
A method of estimating motion vectors for a block-based video compression scheme including:
i) a current frame, a reference frame and a set of intermediate frames between the current frame and reference frame;
ii) a set of motion vectors (which will be described in more detail below);
iii) predicting the motion vector of the current frame and the reference frame from the set of motion vectors.
Item (i) states the setting in which various embodiments apply. Assume a current frame whose motion estimate should be obtained from a reference frame, where there are one or more frames (the intermediate frames) between the current frame and the reference frame according to their temporal display order (either incremental or decremental in time). This motion estimation scenario applies to several coding structures such as IBBP, IBPBP, and hierarchical-B pictures.
Item (ii) states the data required to compute the predicted motion vector in item (iii). This data is the set X of motion vectors, described by:
Item (iii) describes an idea of various embodiments: using item (ii) to predict the motion vector in the setting described by item (i). The steps of item (iii) are described as follows:
In various embodiments, a method for estimating motion in a plurality of frames is provided, the method including determining a first set of motion vectors with respect to a first frame and a second frame, the second frame being in succession with the first frame along a time direction, determining a second set of motion vectors with respect to a predicted frame and the second frame, the predicted frame being in succession with the first frame along the time direction; wherein some motion vectors of the second set of motion vectors are interpolated from motion vectors of the first set of motion vectors; and determining a third set of motion vectors based on the first set of motion vectors and the second set of motion vectors.
In an implementation of this embodiment, the second set of motion vectors may be determined with respect to the predicted frame and the second frame, wherein the predicted frame and the second frame are separated from the first frame by the same temporal distance along the time direction. Illustratively, the predicted frame may be at the same temporal location as the second frame along the time direction. In another implementation of this embodiment, one or more predicted frames may be selected or chosen from any temporal location along the time direction across the plurality of frames, and motion vectors associated with these predicted frames may be determined along the time direction with reference to any first frame along the time direction in the plurality of frames. In yet another implementation of this embodiment, the first set of motion vectors may be determined with respect to a group of pixels in the first frame and a group of pixels in the second frame to provide a set of motion vectors associated with the groups of pixels in the second frame. In yet another implementation of this embodiment, each motion vector in the second set of motion vectors may be determined with respect to a group of pixels in the predicted frame and the group of pixels in the second frame to provide a motion vector associated with the group of pixels in the predicted frame. In yet another implementation of this embodiment, the motion vector associated with the group of pixels in the predicted frame may be interpolated from the motion vectors associated with the groups of pixels in the second frame, wherein the groups of pixels in the second frame are adjacent to the group of pixels in the predicted frame. In yet another implementation of this embodiment, the motion vector associated with the group of pixels in the predicted frame may be interpolated from the motion vectors associated with the groups of pixels in the second frame, the groups of pixels in the second frame having pixels overlapping the group of pixels in the predicted frame. In yet another implementation of this embodiment, as such, the third set of motion vectors may include motion vectors associated with the groups of pixels in the predicted frames. In yet another implementation of this embodiment, the motion vector associated with the group of pixels in the predicted frame may be determined by interpolating the motion vectors associated with the groups of pixels in the second frame being adjacent to the position of the group of pixels in the predicted frame, wherein the position of the group of pixels in the predicted frame may be determined with respect to the group of pixels in the predicted frame and the group of pixels in the first frame. In yet another implementation of this embodiment, illustratively, the position of the group of pixels in the predicted frame may be estimated from the position of the group of pixels in the first frame. The position of the group of pixels in the predicted frame may be in the region surrounded by groups of pixels in the second frame, with two or more groups of pixels in the second frame being adjacent to or overlapping the position of the group of pixels in the predicted frame. The motion vectors associated with these two or more groups of pixels in the second frame may then be interpolated to provide the motion vector associated with the group of pixels in the predicted frame at that position.
As such, the third set of motion vectors may include interpolated motion vectors associated with the groups of pixels in the second frame. In yet another implementation of this embodiment, the motion vector associated with the group of pixels in the predicted frame may be the motion vector associated with the group of pixels in the second frame, wherein the group of pixels in the predicted frame may be at the same position as the group of pixels in the second frame. In yet another implementation of this embodiment, illustratively, the group of pixels in the predicted frame matches the position of the group of pixels in the second frame. As such, interpolation may not be required, and the motion vector of the group of pixels in the predicted frame may be updated with the motion vector associated with the group of pixels in the second frame. In yet another implementation of this embodiment, the method for estimating motion in a plurality of frames may further include determining a fourth set of motion vectors with respect to the first frame and the second frame, the second frame being in succession with the first frame along another time direction being opposite to the time direction. In yet another implementation of this embodiment, the method may further include determining a fifth set of motion vectors with respect to the predicted frame and the second frame, the predicted frame being in succession with the first frame along the other time direction being opposite to the time direction; wherein motion vectors of the fifth set of motion vectors are interpolated from motion vectors of the fourth set of motion vectors. In yet another implementation of this embodiment, illustratively, the predicted frame and the second frame may be separated from the first frame by the same temporal distance along the other time direction. The predicted frame may be at the same temporal location as the second frame along the time direction. In yet another implementation of this embodiment, illustratively, one or more predicted frames may be selected or chosen from any temporal location along the other time direction across the plurality of frames, and motion vectors associated with these predicted frames may be determined with reference to any first frame along the other time direction in the plurality of frames. In yet another implementation of this embodiment, illustratively, the direction of determining the motion vectors of the fourth set of motion vectors and of the fifth set of motion vectors may be opposite to the direction of determining the motion vectors of the first set of motion vectors and of the second set of motion vectors. The motion vectors of the fourth set of motion vectors and of the fifth set of motion vectors may be backward motion vectors, whereas the motion vectors of the first set of motion vectors and of the second set of motion vectors may be forward motion vectors. The implementations of determining the first set of motion vectors and the second set of motion vectors can be applied to the fourth set of motion vectors and the fifth set of motion vectors at the group-of-pixels level. In yet another implementation of this embodiment, the method may further include determining an estimation error of each motion vector of the second set of motion vectors, and an estimation error of each motion vector of the fifth set of motion vectors.
In yet another implementation of this embodiment, illustratively, for the second set of motion vectors and the fifth set of motion vectors, the estimation error may be computed using a minimum possible residual energy determined between the group of pixels in the predicted frame and the group of pixels in the second frame. In yet another implementation of this embodiment, the estimation error may be computed using the sum of absolute differences (SAD). In yet another implementation of this embodiment, the estimation error of each motion vector of the second set of motion vectors may be compared against the estimation error of each motion vector of the fifth set of motion vectors to provide comparison results. In yet another implementation of this embodiment, the third set of motion vectors may then be determined depending on the comparison results. In yet another implementation of this embodiment, the third set of motion vectors may include motion vectors of the fourth set of motion vectors and motion vectors of the fifth set of motion vectors if the estimation errors of the motion vectors of the fifth set of motion vectors are lower than the estimation errors of the motion vectors of the second set of motion vectors. In yet another implementation of this embodiment, illustratively, if the estimation error of the motion vector of the fifth set of motion vectors is lower than the estimation error of the motion vector of the second set of motion vectors, the motion vector of the fifth set of motion vectors may be selected and included in the third set of motion vectors. The motion vector of the second set of motion vectors may be retained or selected otherwise. In yet another implementation of this embodiment, the groups of pixels in the first frame, the groups of pixels in the second frame, and the group of pixels in the predicted frame may have the same number of pixels. In yet another implementation of this embodiment, the group of pixels may be a square block of pixels, a rectangular block of pixels, or a polygonal block of pixels. In yet another implementation of this embodiment, each group of pixels may be a macroblock, the macroblock size being selected from 16 pixels by 16 pixels, 16 pixels by 8 pixels, 8 pixels by 8 pixels, 8 pixels by 16 pixels, 8 pixels by 4 pixels, 4 pixels by 8 pixels, and 4 pixels by 4 pixels. In yet another implementation of this embodiment, the temporal distance between the first frame and the second frame may be less than or equal to three frames. In yet another implementation of this embodiment, the temporal distance between the first frame and the second frame may be exactly one frame. In yet another implementation of this embodiment, the temporal distance between the first frame and the predicted frame may be between 1 and K−1, where K is the number of frames in the plurality of frames. In yet another implementation of this embodiment, the first frame may be the reference frame. The second frame may be the intermediate frame. The predicted frame may be the current or target frame. In yet another implementation of this embodiment, the third set of motion vectors may include a series of motion vectors that represent the motion information obtained iteratively between the predicted frames or current frames and a first frame or reference frame.
The third set of motion vectors may further represent the motion trajectory from one frame in the plurality of frames to the target or current frame, across the plurality of frames, the plurality of frames being a group of pictures (GOP) including three or more frames. In yet another implementation of this embodiment, the first set of motion vectors and the fourth set of motion vectors may be determined using a fast search algorithm. The fast search algorithm may be selected from, but is not limited to, three-step search, two-dimensional logarithmic search, diamond search, and adaptive rood pattern search. In yet another implementation of this embodiment, the plurality of frames may be associated with a group of pictures coded according to an Advanced Video Coding structure. In yet another implementation of this embodiment, the plurality of frames may be associated with a group of pictures coded according to a Scalable Video Coding structure. In yet another implementation of this embodiment, the plurality of frames may be associated with a group of pictures encoded according to a Hierarchical B-picture prediction structure, wherein motion estimation across the GOP may be determined in accordance with the direction and coding order of the Hierarchical B-picture prediction structure. In yet another implementation of this embodiment, the method may be referred to as lacing, with a possible effect of improving the prediction accuracy of fast motion estimation in the Hierarchical B-picture prediction structure. In yet another implementation of this embodiment, the group of pixels in each frame may be transformed using a domain transform to provide a set of domain transformed coefficients for each frame. The domain transform may be, e.g., a type-I DCT, type-IV DCT, type-I DST, type-IV DST, type-I DFT, or type-IV DFT. In yet another implementation of this embodiment, the domain transform may be a linear transform such as, e.g., the Karhunen-Loève transform, Hotelling transform, fast Fourier transform (FFT), short-time Fourier transform, discrete wavelet transform (DWT), or dual-tree wavelet transform (DT-WT).
In another embodiment, a device for estimating motion in a plurality of frames is provided. The device may include a first circuit configured to determine a first set of motion vectors with respect to a first frame and a second frame, the second frame being in succession with the first frame along a time direction, a second circuit configured to determine a second set of motion vectors with respect to a predicted frame and the second frame, the predicted frame being in succession with the first frame along the time direction; wherein some motion vectors of the second set of motion vectors are interpolated from motion vectors of the first set of motion vectors; and a third circuit configured to determine a third set of motion vectors based on the first set of motion vectors and the second set of motion vectors.
In an implementation of this embodiment, the device may include an interpolating circuit configured to interpolate the motion vector associated with the group of pixels in the predicted frame from the motion vectors associated with the groups of pixels in the second frame, the groups of pixels in the second frame being adjacent to the group of pixels in the predicted frame. The interpolating circuit may further be configured to interpolate the motion vector associated with the group of pixels in the predicted frame from the motion vectors associated with the groups of pixels in the second frame, the groups of pixels in the second frame having pixels overlapping the group of pixels in the predicted frame. In another implementation of this embodiment, the device may include a fourth circuit configured to determine a fourth set of motion vectors with respect to the first frame and the second frame, the second frame being in succession with the first frame along another time direction being opposite to the time direction. In yet another implementation of this embodiment, the device may in addition include a fifth circuit configured to determine a fifth set of motion vectors with respect to the predicted frame and the second frame, the predicted frame being in succession with the first frame along the other time direction being opposite to the time direction; wherein motion vectors of the fifth set of motion vectors are interpolated from motion vectors of the fourth set of motion vectors. In yet another implementation of this embodiment, the device may further include an estimation error circuit configured to determine an estimation error of each motion vector of the second set of motion vectors, and an estimation error of each motion vector of the fifth set of motion vectors. In yet another implementation of this embodiment, the device may further include a comparator circuit configured to compare the estimation error of each motion vector of the second set of motion vectors against the estimation error of each motion vector of the fifth set of motion vectors, wherein the third set of motion vectors may be determined depending on the comparison results.
While the invention has been particularly shown and described with reference to specific embodiments, it should be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. The scope of the invention is thus indicated by the appended claims and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced.
Filing Document | Filing Date | Country | Kind | 371(c) Date
---|---|---|---|---
PCT/SG2009/000200 | 6/5/2009 | WO | 00 | 4/14/2011

Number | Date | Country
---|---|---
61059502 | Jun 2008 | US