The invention is related to the field of video compression.
Moving object extraction methods are traditionally used in video compression techniques to extract a contour of a moving object in a video sequence. Traditional methods often explicitly introduce a model to extract the contour. These traditional methods can have significant problems such as discretization of the contour and difficulty controlling the length and curvature as the contour evolves.
For example, simple segmentation of a motion block can be performed to capture multiple moving objects so as to reduce the prediction error. This process can be achieved by using a quadtree segmentation of a block having a large prediction error into sub-blocks for improved motion estimation. The block having the large prediction error is typically quadtree segmented using a straight line model of the moving object's boundary.
Other approaches to motion segmentation rely on optical flow estimates or parametric (e.g., affine) motion models. These approaches suffer from problems near object boundaries, such as occlusion effects. Some degree of smoothness in the segmentation field, and hence in object boundaries, can be achieved using MAP/Bayesian methods, which include a prior probability term. These methods constrain the connectivity of the segmentation field without any explicitly coupled model to account for the object boundary and motion fields.
In some conventional approaches, a curvature evolution model is used to capture the moving object boundary. However, these approaches do not involve motion estimations, and they rely only on a temporal difference operator in the model for object boundary evolution.
There is a need for a moving object extraction method that performs a region competition so as to grow the object from an initial condition, and to reach a state that provides a balance among prediction error reduction, boundary stability (i.e., no holes in the object, and a smoothness to the contour), and a coupling to image features.
A method of extracting a moving object boundary includes estimating an initial motion vector for an object whose motion is represented by a change in position between a target image and a reference image, estimating an initial vector for a background area over which the object appears to move, using the estimated vectors to find a first iteration of a dynamical model solution, and completing at least one subsequent iteration of the dynamical model solution so as to extract a boundary of the object.
The present invention is illustrated by way of example and may be better understood by referring to the following description in conjunction with the accompanying drawings.
In the following description, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration a specific embodiment in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention. For example, skilled artisans will understand that the terms field or frame or image that are used to describe the various embodiments are generally interchangeable as used with reference to video data.
An object extraction method estimates the contours of a moving foreground object represented in video images by using a dynamical model to evolve the boundary of the object. The dynamical model allows compact and coherent structures to emerge. In some embodiments, the dynamical model uses a two-dimensional boundary field, which is defined at each pixel in the image so that no constrained or parameterized boundary model is needed, to extract the object boundary. The dynamical model also uses a local object motion field to provide motion vectors along the boundary that account for non-rigid motion of the object. The dynamical model incorporates a diffusion term and an annealing term to couple the boundary to image gradient features. The object extraction method performs a hypothesis testing of past and future prediction errors so as to minimize errors caused by occlusion. The method allows object motion vectors at the boundary of the object to handle more complex local object motion. The method can be used in motion segmentation or video coding applications to generate motion vector sampling for improved temporal prediction of a target image.
An example of a method of extracting a moving object boundary from a background region, which extracts the contour of a single object from the background, is shown in
Dynamical Model for the Boundary Field
A dynamical model of a two-dimensional boundary field B(x, y) is defined, where B is a value for a pixel at location (x, y) in the image. A positive value of B indicates that the corresponding pixel is within an object, and a negative value indicates that the pixel is in the background. The method starts with an initial condition for B(x, y) and iteratively evolves the boundary field to form a better estimate of the object boundary.
In some embodiments, the method evolves the boundary field by numerically solving the dynamical model and advancing it forward in time. The boundary field is then expressed as a function of time, B(x, y, t), where the initial condition B_0(x, y) starts at an initial time of 0, such that
B_0(x,y) = B(x,y,t=0).
The initial condition is
B_0(x,y) ≈ 1
for a region within an object, and
B_0(x,y) ≈ −1
elsewhere, with a gradual transition between the two states. The initial condition may also be seeded with prior knowledge about the object boundary to improve performance, if such knowledge is available. In some embodiments, the initial value for the boundary field is:
where (x_0, y_0) is the center of the seed, and a measures the size of the seed.
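The specific seed function is not reproduced above. As a minimal sketch in Python (the Gaussian profile, array shapes, and function name below are assumptions for illustration, not the form required by the method), a seed centered at (x_0, y_0) with size a could be generated as:

```python
import numpy as np

def seed_boundary_field(height, width, x0, y0, a):
    """Illustrative seed B_0(x, y): about +1 near (x0, y0), about -1 elsewhere,
    with a gradual (here Gaussian, by assumption) transition of scale a."""
    y, x = np.mgrid[0:height, 0:width]
    r2 = (x - x0) ** 2 + (y - y0) ** 2
    return 2.0 * np.exp(-r2 / (2.0 * a ** 2)) - 1.0  # values in (-1, +1]

B0 = seed_boundary_field(120, 160, x0=80, y0=60, a=10.0)
```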
The method grows the seed around the gradient ∇B of the boundary field, i.e., in the region where B(x, y) ≈ 0, using an evolution equation:
∂tB(x,y,t)=T|∇B(x,y)| (1)
to evolve the dynamical model according to:
B(x,y,t+τ)=B(x,y,t)+τT|∇B(x,y,t)| (2)
where T is a composite driving term, and τ is a time step parameter to advance the state from time t to t+τ. The method is repeated until a stopping criterion, such as convergence, is reached.
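As a rough numerical sketch of eqs. (1) and (2), the following Python advances the boundary field by one explicit Euler step of size τ, with |∇B| approximated by central differences; the composite driving term T is passed in as a precomputed array. This is an assumed discretization, not necessarily the one used in the method.

```python
import numpy as np

def gradient_magnitude(B):
    """|∇B| via central differences."""
    gy, gx = np.gradient(B)
    return np.hypot(gx, gy)

def evolve_step(B, T, tau):
    """One step of eq. (2): B(t + τ) = B(t) + τ · T · |∇B(t)|."""
    return B + tau * T * gradient_magnitude(B)
```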
An example of boundary fields that are grown with the dynamical model is shown in
The composite driving term T in eq. (2) is a combination of terms, such as the prediction error, stability, coupling, and template driving terms:
T = λ1T_error + λ2T_stability + λ3T_image_coupling + λ4T_template  (3)
Each weighting value {λi} determines the relative strength of the corresponding driving term. The past/future prediction error driving term includes the error from past and future reference image processing. This driving term is considered during a hypothesis testing method to account for occlusions and uncovered regions, as discussed below. The stability driving term T_stability is used to ensure that B(x, y) maintains smoothness so that the extracted motion layers have a degree of compactness and the extracted moving object boundary has a degree of connectivity. The coupling driving term T_image_coupling couples the boundary field to local image gradient features, as discussed below.
Including the expression of the driving terms from eq. (3) into the dynamical model from eq. (2) yields:
B(x,y,t+τ) = B(x,y,t) + τ(λ1T_error + λ2T_stability + λ3T_image_coupling + λ4T_template)|∇B(x,y,t)|  (4)
Past and Future Prediction Error Driving Term
The past and future prediction error driving term represents the prediction error difference between using the background motion vector or the object motion vector at some pixel location. The estimate of the background motion vector is denoted as ν_b(x, y) and the object motion vector is denoted as ν_o(x, y). This driving term is expressed as:
T_error = ε²(ν_b) − ε²(ν_o)  (5)
where ε²(ν_o) is the prediction error at some pixel location when the motion vector ν_o(x, y) is used, and ε²(ν_b) is the prediction error at the pixel location when the motion vector ν_b(x, y) is used. Placing this term in the dynamical model of eq. (3) yields the contribution of this term as:
∂tB1(x,y) = (ε²(ν_b) − ε²(ν_o))|∇B(x,y)|  (6)
Thus, at pixel location (x, y), if the prediction error is smaller when the object motion vector is used (that is, (ε²(ν_b) − ε²(ν_o)) is positive), then B(x, y) increases since the time derivative is positive, and the pixel moves toward the object, which is expressed as pixel locations having positive values of B(x, y). Similarly, at pixel location (x, y), if the prediction error is smaller when the background motion vector is used (that is, (ε²(ν_b) − ε²(ν_o)) is negative), then B(x, y) decreases since the time derivative is negative, and the pixel moves toward the background, which is expressed as pixel locations having negative values of B(x, y).
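A per-pixel sketch of eqs. (5) and (6) follows; here ε² is taken, by assumption, as the squared difference between a target pixel and the reference pixel displaced by an integer motion vector, with wrap-around shifts and no block support or subpixel interpolation. The names are illustrative.

```python
import numpy as np

def squared_prediction_error(target, reference, v):
    """ε²(ν): per-pixel squared error when predicting the target from the
    reference shifted by an integer motion vector v = (dx, dy). Simplified:
    np.roll wraps around image borders."""
    dx, dy = v
    pred = np.roll(np.roll(reference, dy, axis=0), dx, axis=1)
    return (target.astype(float) - pred.astype(float)) ** 2

def prediction_error_driving_term(target, reference, v_obj, v_bg):
    """T_error = ε²(ν_b) − ε²(ν_o); positive values push a pixel into the object."""
    return (squared_prediction_error(target, reference, v_bg)
            - squared_prediction_error(target, reference, v_obj))
```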
The prediction error increases if an occlusion is present in the reference image.
The occlusion regions shown in
A more detailed pixel-wise decision on whether to use past or future reference field is made by extending the prediction error driving term as follows:
∂tB1(x,y) = (ε²(ν_b^hyp) − ε²(ν_o^hyp))|∇B(x,y)|  (7)
where ν^hyp denotes the hypothesis for the motion vector at pixel location (x, y). The hypothesis test for the background motion vector is performed as shown in
ν_b^hyp(x,y) = min_ν (ε²(ν = ν_b^past), ε²(ν = ν_b^future)).
The hypothesis test for the foreground object motion vector is performed as shown in
In this example, the parameter S is set to 0, since the object and background are separated by the zero values of B(x, y). The hypothesis selection of the object or background motion vector uses past and future reference image information, along with current object state information, to better handle the occlusion.
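The past/future hypothesis test of eq. (7) can be sketched as below, reusing squared_prediction_error from the previous sketch; for each of the background and object vectors, the reference (past or future) giving the smaller per-pixel error is kept. The exact foreground test involving the parameter S and the current state of B(x, y) is shown in the figures and is not reproduced here; this sketch only illustrates the minimum-error selection.

```python
import numpy as np

def hypothesis_error(target, ref_past, ref_future, v_past, v_future):
    """Per-pixel ε²(ν^hyp): the smaller of the past and future prediction errors."""
    e_past = squared_prediction_error(target, ref_past, v_past)
    e_future = squared_prediction_error(target, ref_future, v_future)
    return np.minimum(e_past, e_future)

def hypothesis_driving_term(target, ref_past, ref_future,
                            vb_past, vb_future, vo_past, vo_future):
    """Eq. (7): T_error using hypothesis-tested background and object errors."""
    e_bg = hypothesis_error(target, ref_past, ref_future, vb_past, vb_future)
    e_obj = hypothesis_error(target, ref_past, ref_future, vo_past, vo_future)
    return e_bg - e_obj
```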
Stability Driving Term
The stability driving term allows for a compact, stable structure to emerge from the nonlinear model, and the term is expressed as:
T_stability = −∇·n̂  (8)
where n̂ is the normal vector for the boundary field, defined as:
n̂ = ∇B(x,y)/|∇B(x,y)|,
which is the direction normal to the curve where B(x, y) = constant. Placing this term in eq. (3) yields:
∂tB2(x,y) = −(∇·n̂)|∇B(x,y)|  (9)
Thus, if the contour of the object near the boundary, where |∇B(x, y)| is nonzero, has a positive curvature (i.e., an outward shape from the positive region), then ∇·n̂ is positive, and B(x, y) decreases to straighten the curve. Similarly, if the contour of the object near the boundary, where |∇B(x, y)| is nonzero, has a negative curvature (i.e., an inward shape from the positive region), then ∇·n̂ is negative, and B(x, y) increases to straighten the curve.
The stability driving term controls the degree of curvature of the object boundary topology. This term acts as a diffusion term that reduces the length of the contour. In some embodiments, an additional diffusion term is included, so that the contribution becomes:
∂tB2(x,y) = −(∇·n̂)|∇B(x,y)| − ∇²B(x,y)  (10)
The Laplacian term on the right of eq. (10) causes the boundary field to be relatively smooth and homogeneous.
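The curvature term −∇·n̂ of eqs. (8) and (9) can be approximated with finite differences as in the sketch below; a small constant guards against division by zero where |∇B| vanishes. The discretization is an assumption, not the exact scheme of the method.

```python
import numpy as np

def stability_driving_term(B, eps=1e-8):
    """T_stability = −∇·n̂ with n̂ = ∇B/|∇B| (eq. (8)), via central differences."""
    gy, gx = np.gradient(B)
    mag = np.sqrt(gx ** 2 + gy ** 2) + eps  # eps avoids division by zero
    nx, ny = gx / mag, gy / mag
    div_n = np.gradient(nx, axis=1) + np.gradient(ny, axis=0)
    return -div_n
```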
Image Coupling Driving Term
The moving object boundary may have a correlation to some local spatial image activity. For example, often an object boundary has an intensity gradient normal to the boundary. This type of local spatial activity correlation is incorporated into the model using the image coupling driving term:
T_image_coupling = ∇·(n̂|∇I(x,y)|)  (11)
where n̂ is the normal to the boundary field, and |∇I(x, y)| is the magnitude of the image intensity gradient. Placing this term in eq. (3) yields the contribution of this factor as:
∂tB3(x,y) = (∇·(n̂|∇I(x,y)|))|∇B(x,y)|  (12)
Thus, if an image intensity gradient lies within the width of the object boundary, then the boundary aligns along the image gradient.
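A matching finite-difference sketch of eq. (11), assuming a grayscale intensity image I stored as a two-dimensional float array:

```python
import numpy as np

def image_coupling_driving_term(B, I, eps=1e-8):
    """T_image_coupling = ∇·(n̂ |∇I|): couples the boundary to image gradients."""
    gy, gx = np.gradient(B)
    mag_b = np.sqrt(gx ** 2 + gy ** 2) + eps
    nx, ny = gx / mag_b, gy / mag_b
    iy, ix = np.gradient(I.astype(float))
    mag_i = np.hypot(ix, iy)  # |∇I(x, y)|
    return np.gradient(nx * mag_i, axis=1) + np.gradient(ny * mag_i, axis=0)
```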
Template Driving Term
The template driving term is used, for instance, in an embodiment that learns information about the objects in the scene from previous sequences, or that has prior information about the expected shape of an object. This information provides a template for the object boundary. The object boundary template driving factor may be characterized by the crossing point of a two-dimensional function B̃_template(x, y). The template driving term is expressed as:
∂tB4(x,y) = −(B(x,y) − B̃_template(x,y))|∇B(x,y)|  (13)
Thus, if the boundary field B(x, y) is larger than the template B̃_template(x, y) at a pixel position near the object boundary, then B(x, y) decreases. Similarly, if the boundary field B(x, y) is smaller than the template B̃_template(x, y) at a pixel position near the object boundary, then B(x, y) increases. Eventually, an equilibrium is reached where B(x, y) ≈ B̃_template(x, y).
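The template term of eq. (13) is simply a relaxation of B toward the template near the boundary; a one-line sketch:

```python
def template_driving_term(B, B_template):
    """T_template = −(B − B̃_template): drives B toward the template boundary."""
    return -(B - B_template)
```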
The dynamical model evolves the spatial two-dimensional boundary field B(x, y) according to eq. (4). The parameters {λ1, λ2, λ3, λ4} determine the relative weights of each term. In some embodiments, λ3 is initially set to 0 and slowly increases so that it becomes more effective in later stages of the growth of the object boundary. Often, λ4 is set to 0 for the entire method because no prior knowledge about the object boundary is available. The driving terms are functions of the boundary field B(x, y) and the motion field. The nonlinear cooperative effects of the driving terms allow for a stable evolution and emergence of the boundary field for the moving object.
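Putting the pieces together, the evolution of eq. (4) can be iterated roughly as below, reusing the helper sketches above (evolve_step, stability_driving_term, image_coupling_driving_term, template_driving_term). The weights, step size, iteration count, and convergence test are illustrative assumptions; in particular, a full implementation would recompute the hypothesis-tested prediction error term at each iteration, as described in the text.

```python
import numpy as np

def extract_boundary(B0, T_error, I, B_template=None,
                     lam=(1.0, 0.5, 0.0, 0.0), tau=0.1, iters=200, tol=1e-4):
    """Iterate eq. (4): B += τ (λ1 T_error + λ2 T_stability
    + λ3 T_image_coupling + λ4 T_template) |∇B|."""
    B = B0.copy()
    for _ in range(iters):
        T = lam[0] * T_error + lam[1] * stability_driving_term(B)
        if lam[2]:
            T = T + lam[2] * image_coupling_driving_term(B, I)
        if lam[3] and B_template is not None:
            T = T + lam[3] * template_driving_term(B, B_template)
        B_next = evolve_step(B, T, tau)
        if np.max(np.abs(B_next - B)) < tol:  # crude convergence check
            return B_next
        B = B_next
    return B
```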
As the boundary field is updated in the dynamical model, the prediction error between using the background and object motion for each pixel “x” needs to be computed at every iteration. The background motion is usually very robust and stable, and so a single background motion may be used for all pixels. In some instances, however, the object motion may involve non-local or non-rigid motion. In these instances, bulk and local/boundary motion vectors for the object motion are used in some embodiments.
Generally, the bulk motion vector vbulk can be used to represent the object motion for each pixel in the object. For a pixel along the boundary, a boundary motion vector that is near, or has local spatial correlation with, the pixel can be used to represent the object motion, in order to handle non-rigid motion in which several parts of the object move in different directions. For example, if an object is a person, a bulk motion vector can indicate that the person is moving to the right, and a boundary motion vector along a hand of the person can indicate that the hand is moving to the left relative to the bulk motion vector.
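One possible realization of this choice (an assumption, not the exact rule given in the text) is to use the bulk vector for interior object pixels and, for pixels close to the zero level of B(x, y), the motion vector of the spatially nearest boundary sample:

```python
def object_motion_vector(x, y, B, v_bulk, boundary_vectors, band=1.0):
    """Pick the object motion vector for pixel (x, y): the bulk vector in the
    object interior, or the nearest boundary sample's vector near the contour.
    boundary_vectors: list of ((bx, by), (dx, dy)) samples along the boundary."""
    if abs(B[y, x]) > band or not boundary_vectors:
        return v_bulk
    nearest = min(boundary_vectors,
                  key=lambda s: (s[0][0] - x) ** 2 + (s[0][1] - y) ** 2)
    return nearest[1]
```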
In one embodiment, the boundary extraction method is used in video coding for encoding an image (or frame, or field) of video data, as shown in
At 940, a temporal prediction filtering process is applied to the irregular motion sampling pattern. This adaptive filtering process uses the motion vectors, irregular sampling pattern, and reference images to generate a prediction of the target image. At 950, the motion vector values are coded and sent to the decoder. At 960, a residual is generated, which is the actual target data of the target image minus the prediction generated by the adaptive filtering process. At 970, the residual is coded, and at 980 it is sent to the decoder.
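As a schematic of the residual step at 960 (array types and names assumed; entropy coding of the motion vectors and residual at 950, 970, and 980 is not shown):

```python
import numpy as np

def compute_residual(target, prediction):
    """Residual = actual target data − temporal prediction (step 960)."""
    return target.astype(float) - prediction.astype(float)
```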
In another embodiment, the adaptive sampling pattern of motion vectors is used in decoding an image (or frame, or field) of video data, as shown in
While the invention is described in terms of embodiments in a specific system environment, those of ordinary skill in the art will recognize that the invention can be practiced, with modification, in other and different hardware and software environments within the spirit and scope of the appended claims.