This disclosure relates in general to estimating motion trails in video image sequences.
Recent advances in digital technology have led to new communication media in which video information plays a significant role. Digital television, high definition TV (HDTV), video-conferencing, video-telephony, medical imaging, and multi-media are but a few examples of emerging video information applications.
When compared with text or audio media, video media require a much larger bandwidth, and therefore benefit more from compression that removes redundancies in the data. In the framework of video coding (encoding and decoding), statistical redundancies can be characterized as spatial or temporal. Due to differences between the spatial and temporal dimensions, the two kinds of redundancy are usually handled separately.
Motion of an object is a prominent source of temporal variations in image sequences. In order to model and compute motion, an understanding is needed as to how images (and therefore image motion) are formed. In video compression, the knowledge of motion helps remove temporal data redundancy and therefore attain high compression ratios. Motion estimation is a fundamental component of such standards as H.261, H.263 and the MPEG family.
A moving object may be characterized by coherent motion characteristics over its entire region of support. An accurate estimate of the motion therefore facilitates an accurate segmentation of the object, and, conversely, an accurate segmentation of the object is needed in order to estimate its motion accurately. The process of partitioning frames into motion regions is referred to as image segmentation; efficient image detection and segmentation operations are needed to divide the image contents into semantic regions that can be dealt with as separate objects. Image segmentation may be block-based, region-based, or pixel-based, and sometimes depends upon the results of motion estimation. Motion estimation, in turn, tries to predict the current frame from the previous one by estimating the motion between the two frames. Hence, the motion and prediction error information are transmitted instead of the image itself.
While there are a number of standards for video coding, e.g., MPEG-1, MPEG-2, MPEG-4, and H.263, these standards only define the syntax and semantics of the compressed bitstream. The methods used to produce the bitstream are not specified. In other words, the above standards specify how the bitstream should appear so that decoders will operate properly, but do not specify the details of how the bitstream is actually produced.
Most standards, such as MPEG-1, MPEG-2, MPEG-4, and H.263, use block-based segmentation. With block-based segmentation, the optical flow or “motion” of the pixels in the blocks is analyzed to estimate motion information. Compression is achieved, for example, by sending a block once, and then sending the motion information that indicates how the block “moves” in following frames. The efficiency of block-based video compression relies on its ability to predict the next frame using blocks of image elements, a method known as block-based “motion compensation.” Accurate prediction reduces the amount of data needed to correct errors made by frame-to-frame prediction (residue coding). Over the years, refinements in motion compensation and residue coding techniques have played a major role in improving prediction in block-based compressors. However, these approaches have long since exhausted their potential for further dramatic improvements, because the arbitrary blocks inherent in MPEG-like coding rarely correspond to the boundaries of real objects in natural images, and thus have no relationship to the real objects and their motion.
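A minimal sketch of block-based motion compensation by exhaustive search may clarify the preceding description. The function name, block size, search range, and the use of a sum-of-absolute-differences (SAD) criterion are illustrative assumptions, not details from this disclosure:

```python
import numpy as np

def block_match(prev, curr, bx, by, bsize=8, search=4):
    """Find the displacement of the block at (by, bx) in `curr` by an
    exhaustive SAD search over `prev` within +/- `search` pixels."""
    block = curr[by:by + bsize, bx:bx + bsize]
    best_sad, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bsize > prev.shape[0] or x + bsize > prev.shape[1]:
                continue
            cand = prev[y:y + bsize, x:x + bsize]
            sad = np.abs(block.astype(int) - cand.astype(int)).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad
```

The best (dy, dx) per block is the motion information sent in place of the block itself; the remaining SAD reflects the prediction error that residue coding must correct.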
Unlike block-matching operations, which may require costly searches for image displacement, other compression techniques have been developed that approach image displacement using estimation techniques. Two such techniques are regression, for datasets with chosen response variables, and clustering, for datasets that have no response information. Regression is a method for finding dependencies between attributes, e.g., motion vectors. Basically, regression takes a numerical dataset and develops a mathematical formula that fits the data; the result may then be used to predict future behavior by plugging new data into the developed formula. Robust regression methods have been shown to provide some improvements in motion estimates in a variety of situations. For example, based on the motion data for a frame n, the scores and residual vector can be estimated using a number of different regression estimation methods.
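As a brief illustration of the fit-then-predict idea described above, the sketch below fits a linear formula to per-frame displacements and plugs a future frame index into it. The data values and variable names are invented for the example and are not from this disclosure:

```python
import numpy as np

# Hypothetical motion data: horizontal displacement of a tracked region
# observed over ten frames (values assumed for illustration).
frames = np.arange(10, dtype=float)
dx = 1.5 * frames + 0.5

# Regression develops a formula dx ~ a*frame + b that fits the dataset.
A = np.vstack([frames, np.ones_like(frames)]).T
(a, b), *_ = np.linalg.lstsq(A, dx, rcond=None)

# Plugging a new frame index into the developed formula yields a prediction.
dx_predicted = a * 12.0 + b
```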
Clustering is used to reveal the structure within a complex distribution of data, for example, video media. Cluster analysis is a classification of objects from the data, where classification involves labeling objects with class (group) labels. As such, clustering does not use previously assigned class labels, except perhaps for verification of how well the clustering worked. Thus, cluster analysis is distinct from pattern recognition and the areas of statistics known as discriminant analysis and decision analysis, which seek to find rules for classifying objects given a set of pre-classified objects.
For example, data clustering may be used to partition a data set into groups of similar items, as measured by some distance metric. The partition indices provide additional supervision to the K regressions so that each works on a subset of similar data; in turn, the similarity, or rather the dissimilarity, measured by the K regressions is used in the clustering phase to partition the dataset.
The Regression Clustering (RC) operation handles the case in between regression and clustering operations, i.e., the datasets that have response variables, but the response variables do not contain enough information to guarantee high quality learning. Missing information is generally caused by insufficiently controlled data collection due to lack of means, lack of understanding or other reasons.
Regression Clustering provides an advantage because, without separating clusters that have very different response properties, the residue error of the regression is large, and the mixture can also misguide input variable selection toward higher complexity. In RC, K (>1) regression functions are applied to the dataset simultaneously; these guide the clustering of the dataset into K subsets, each with a simpler distribution matching its guiding function. Each function is regressed on its own subset of data with a much smaller residue error. Both the regressions and the clustering optimize a common objective function.
Regression clustering has been studied under a number of different names. For example, clusterwise linear regression uses linear regression and a partition of the dataset to locally minimize the total mean square error over all K regressions. An incremental version of this operation was developed to allow adding new observations to the dataset. This operation is similar to the K-Means operation. The K-Means (KM) operation is a popular clustering operation that attempts to find a K-clustering that minimizes the mean square error (MSE). It involves a two-step iteration: first, each data item is assigned to the closest center; second, all centers are recalculated, and each center is moved to the geometric centroid of the points assigned to it. Alternative methods for performing clusterwise linear regression have also been proposed; for example, a maximum likelihood methodology has been used, wherein the objective function is locally minimized.
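The two-step K-Means iteration just described can be sketched as follows. The helper name, array shapes, and iteration count are assumptions made for the example:

```python
import numpy as np

def kmeans(points, centers, iters=20):
    """Plain two-step K-Means: assign each point to its closest center,
    then move each center to the geometric centroid of its points."""
    centers = centers.astype(float).copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # Step 1: assign each data item to the closest center.
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 2: recalculate each center as the centroid of its points.
        for k in range(len(centers)):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    return centers, labels
```

Note that the result depends on the initial `centers`, which is precisely the initialization sensitivity discussed later in this disclosure.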
However, all of the above clustering operations have disadvantages. For example, the dependency of K-Means performance on the initialization of the centers is a major problem, and previous regression clustering methods that used K-Means and expectation-maximization (EM) demonstrated the same problem of convergence being sensitive to initialization. The present invention may address one or more of the above issues.
The various embodiments of the present invention estimate motion trails in video image sequences using regression clustering operations that may be less sensitive to the initialization of the center choices. The various embodiments include a method, an apparatus, and a program storage device. Data points representing information from an image sequence are provided, and regression clustering using K-Harmonic Means functions is performed to cluster the data points and to provide motion information for the data points.
Referring now to the drawings in which like reference numbers represent corresponding parts throughout:
In the following description of the embodiments, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration the specific embodiments in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention.
The embodiments of the present invention provide a method, apparatus and program storage device for estimating motion trails in video image sequences. Embodiments of the present invention use regression clustering operations that may be far less sensitive to the initialization of the center choices.
The system 500 also includes memory 530 capable of storing data processed by processor 510 and data sent to or received from I/O device 520. System 500 may be connected to a display 540, such as a cathode ray tube (CRT), for displaying information. Processor 510, I/O device 520, memory 530, and display 540 are connected via a bus 560.
The ME module 610 generates one or more motion vectors (MVs) or motion paths for predicting motion in the new frame with reference to positions in the current frame. The ME module 610 computes these MVs using a method for simultaneously estimating multiple motion trails in video image sequences according to an embodiment of the present invention. A prediction error (PE) is then computed for each MV.
The encode module 612 within the server 606 receives the MVs and PEs from the ME module 610. The encode module 612 encodes the frames into a compressed bit-stream 614. The compressed bit-stream 614 is then transmitted to the client 602. A decoder 616 within the client 602 decodes the bit-stream into the new frame to be presented 630.
Static or video images contain regions of continuous changes and boundaries of sudden changes in color. A static image can be treated as a mapping from a 2D space to the 3D RGB color space,
image: [a,b]×[c,d]→[0,255]×[0,255]×[0,255].
Similarly, a video image can be treated as a mapping from a 3D space to another 3D space,
video: [a,b]×[c,d]×T→[0,255]×[0,255]×[0,255].
Regression clustering is capable of automatically identifying the regions of continuous change and assigning to each a regression function that interpolates that part of the image. Both image segmentation and interpolation (compression) may be performed using RC.
For example, the data may be partitioned into K partitions; there have been many methods for determining the right K, i.e., the optimal number of clusters. Given a dataset with supervising responses, Z=(X,Y)={(xi, yi)|i=1, . . . , N}, a family of functions Φ={f} and a loss function e(·)≥0, regression solves the following minimization problem,
fopt=arg minf∈Φ Σi=1N e(f(xi),yi). (1)
Commonly, Φ is a class of linear expansions of simple parametric functions, such as polynomials of degree up to m, Fourier series of bounded frequency, neural networks, Radial Basis Function (RBF) techniques, etc. Further, usually, e(f(x),y)=∥f(x)−y∥p, with p=1, 2 most widely used. However, equation 1 is not effective when the data set contains a mixture of very different response characteristics 700. Rather, it is much better to find the partitions in the data and learn a separate function on each partition, as shown in the graph of the three regression functions 750.
In RC operations, K regression functions M={f1, . . . , fK}⊂Φ are applied to the data, and each finds its own partition Zk (Zk∩Zk′=Ø for k≠k′) and regresses on it. The functions and partitions together form the solution of the following optimization problem,
Accordingly, the RC-KM operation includes picking K functions f1(0), . . . , fK(0)∈Φ randomly, or by any heuristic that is believed to give a good start. Then, in the clustering phase, the dataset is repartitioned in the r-th iteration, r=1, 2, . . . , as:
Zk(r)={(x,y)∈Z|e(fk(r−1)(x),y)≦e(fk′(r−1)(x),y) ∀k′≠k}. (5)
A tie may be resolved randomly among the winners. Intuitively, each data point is associated with the regression function that gives the smallest approximation error on it. Algorithmically, for r>1, a data point in Zk(r−1) is moved to Zk′(r) if and only if
a) e(fk′(r−1)(x),y)<e(fk(r−1)(x),y), and
b) e(fk′(r−1)(x),y)≦e(fk″(r−1)(x),y) for all k″≠k, k′.
Zk(r) inherits all the data points in Zk(r−1) that are not moved. In the regression phase, any regression optimization operation that gives the following
Nevertheless, K-Means clustering operations are known to be sensitive to the initialization of their centers due to the “hard” partitioning of the data set. Since the same partitioning policy is used by RC-KM, it is also sensitive to initialization. Further, as described above, previous regression clustering methods that used K-Means and EM demonstrated the same problem of convergence being sensitive to initialization, which is a well-known problem of the K-Means and EM clustering operations.
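The RC-KM alternation described above can be sketched as follows, assuming linear regression functions fk(x)=ak·x+bk and a squared-error loss. The helper name and its signature are illustrative assumptions, not from this disclosure:

```python
import numpy as np

def rc_km(x, y, init_coefs, iters=20):
    """Sketch of RC-KM with linear functions f_k(x) = a_k*x + b_k.
    Clustering phase: each point goes to the function with the smallest
    squared error (the repartition rule of equation (5)).
    Regression phase: each f_k is refit on its own subset Z_k."""
    coefs = np.asarray(init_coefs, dtype=float).copy()
    X = np.vstack([x, np.ones_like(x)]).T           # design matrix [x, 1]
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        preds = coefs @ X.T                         # (K, N) predictions
        labels = ((preds - y) ** 2).argmin(axis=0)  # hard assignment
        for k in range(len(coefs)):
            if (labels == k).sum() >= 2:            # need points to refit
                coefs[k] = np.linalg.lstsq(X[labels == k], y[labels == k],
                                           rcond=None)[0]
    return coefs, labels
```

With a poor choice of `init_coefs`, the hard assignment can lock onto a bad partition, which is exactly the initialization sensitivity noted above.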
In contrast to previous regression clustering methods, embodiments of the present invention use the K-Harmonic Means clustering operation, which demonstrates very strong insensitivity to initialization due to its dynamic weighting of the data points and its non-partitioning membership function.
RC-KHMp's objective function is defined by replacing the MIN( ) function in equation (4) with the harmonic average HA( ), and the error function is
e(fk(xi),yi)=∥fk(xi)−yi∥p
In the last step of equation (9), Lp is used instead of L2. An iterative operation is then used for finding a local optimum of equation (9). First, K functions f1(0), . . . , fK(0)∈Φ are selected. In the clustering phase, in the r-th iteration, let
di,k=∥fk(r−1)(xi)−yi∥. (10)
The hard partition of RC-KM is replaced by a “soft” membership function, i.e., the i-th data point is associated with the k-th regression function with the probability
The choice of q puts the regression's error function in Lq-space. For simpler notation, p(Zk|zi) and ap(zi) in equation (12) are not indexed by q. The quantities di,k, p(Zk|zi), and ap(zi) should also be indexed by the iteration r, which is likewise dropped. In RC-KHM, not all data points fully participate in all iterations as in RC-KM. Each data point's participation is weighted by
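The soft membership and dynamic data-point weighting just described can be sketched as follows. The exact exponent forms are assumptions taken from the K-Harmonic Means literature, not quoted from this disclosure:

```python
import numpy as np

def khm_soft_membership(d, p=2):
    """Soft membership and dynamic weights in the spirit of RC-KHM.
    `d` is the (N, K) matrix of residuals d_ik = ||f_k(x_i) - y_i||.
    Assumed forms:
        p(Z_k|z_i) = d_ik^(-p-2) / sum_k' d_ik'^(-p-2)
        a_p(z_i)   = sum_k d_ik^(-p-2) / (sum_k d_ik^(-p))**2
    """
    inv = d ** (-p - 2.0)
    # Each point is softly shared among all K functions (rows sum to 1).
    membership = inv / inv.sum(axis=1, keepdims=True)
    # Points far from all current functions get down-weighted dynamically.
    weights = inv.sum(axis=1) / (d ** (-p)).sum(axis=1) ** 2
    return membership, weights
```

Because every point keeps a nonzero membership in every function, no single bad initialization can permanently remove a point from a function's regression, which is the intuition behind KHM's insensitivity to initialization.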
In the regression phase, any regression optimization operation that gives the following
For linear regression, q has been chosen to be equal to 2. However, other values of q may also be used. Equation (13) may then be rewritten in matrix form as:
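The matrix form of such a weighted linear regression can be sketched as follows; the helper name is an assumption, and in RC-KHM the weight of point i for function k would be the product of its dynamic weight and its soft membership:

```python
import numpy as np

def weighted_linreg(X, y, w):
    """Weighted least squares in matrix form: with A = diag(w),
    solve (X^T A X) c = X^T A y for the coefficient vector c."""
    A = np.diag(w)
    return np.linalg.solve(X.T @ A @ X, X.T @ A @ y)
```

Setting a point's weight to zero removes it from the fit entirely; the soft weights of RC-KHM instead shade each point's influence continuously.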
Thus, clustering recovers a discrete estimation of the missing part of the responses and provides each regression function with the correct subset of data. The performance of LinReg-KHM increases over LinReg-EM and LinReg-KM as K and D become larger. In the general form of RC, the regression part of the operation is completely general, and the RC operation adds no requirements to it. This implies that RC operations work with any type of regression, and that RC operations can be built on top of existing regression libraries, with the existing regression program called as a subroutine. Regression aids in understanding the data by replacing it with an analytical function plus a residue noise; when the noise is small, the function describes the data well, and RC, by fitting a separate function to each cluster, does a much better job of this. The compact representation of data by a regression function can also be considered as (or as part of) data compression; with a significantly smaller mean residue noise, RC does a much better job of this as well.
(x,y)=(fk,x(t),fk,y(t)),k=1, . . . ,K
(x,y,color)=(fk,x(t),fk,y(t),fk,color(t)),k=1, . . . ,K
The process illustrated with reference to
The foregoing description of the exemplary embodiment of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not with this detailed description, but rather by the claims appended hereto.