This invention relates generally to video processing, and more particularly to predicting using a dimensionality reduction process.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies: Copyright© 2008, Sony Electronics Inc., All Rights Reserved.
A fundamental problem in video coding is the prediction of high resolution images from available low resolution data. Least square (LS) filters are commonly used in these cases; the LS filter coefficients are used to produce higher resolution images from the available low resolution data, possibly from different times.
Zooming requires utilizing data, possibly from different times, and combining the available information to obtain a high resolution image for the current time. The LS filters may be trained by generating decimated images and finding the set of LS filters that best recovers the original images, minimizing the difference between the original and predicted images.
A filter tap is a pattern overlaid on a region of pixels. The pixels covered by the filter tap determine the filter coefficients associated with that pixel.
Multi-field taps are defined based on a decimated field. The multi-field taps are used to generate correlation matrices, the elements of which are used to generate covariance matrices. A principal component space is obtained by projecting the correlation matrix elements onto eigenvectors. The principal component space is partitioned into classes and a least square filter set is generated for each class.
In the following detailed description of embodiments of the invention, reference is made to the accompanying drawings in which like references indicate similar elements, and in which is shown by way of illustration specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that logical, mechanical, electrical, functional and other changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims.
Beginning with an overview of the operation of the invention,
In one embodiment, encoder 105 may classify frames of video data and assign individual prediction filters to each class in the frame. Adaptive prediction with classification involves partitioning the pixels of a frame into a number of classes and adapting a filter to each class. A filter may include coefficients for each pixel which may be used to predict or decimate that pixel. The filter coefficients and the corresponding pixel are related through a filter tap. A filter tap is a pattern overlaid on the corresponding pixel and pixels around that pixel. The pattern defines which pixels will be used to generate coefficients. In one embodiment, Principal Component Analysis (PCA) is used to classify the pixels of video frames into classes. The PCA may represent each pixel in a low resolution frame with the corresponding correlation matrix elements defined by the filter taps. PCA uses the correlation information for each pixel to group pixels with similar spatiotemporal variations in the images into classes.
Defining classification filters depends on the relationship between decimated images and their corresponding original resolution images.
The LS filter coefficients minimize the prediction error

e = Σ_x (I(x) − P(x))^2    (1)

where I(x) is the original image and P(x) is the predicted image, and where P(x) is defined as

P(x) = Σ_{y∈T} f(y) I_D(y)    (2)

where I_D(y) represents the decimated image, T is the filter tap and f(y) are the filter coefficients.
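As an illustration of this formulation, the following sketch fits LS coefficients on toy data with NumPy; the 2× decimation, the 3×3 spatial tap, and all names are illustrative assumptions rather than details taken from the specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "original" image and its 2x-decimated version (simple subsampling).
orig = rng.random((64, 64))
dec = orig[::2, ::2]                       # decimated image I_D

# Illustrative 3x3 spatial tap around each decimated pixel.
tap = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

rows, cols = dec.shape
A, b = [], []
for y in range(1, rows - 1):
    for x in range(1, cols - 1):
        A.append([dec[y + dy, x + dx] for dy, dx in tap])
        b.append(orig[2 * y + 1, 2 * x + 1])   # high-resolution pixel I(x) missing from the decimated grid

A, b = np.asarray(A), np.asarray(b)

# Least-squares filter coefficients f minimizing ||A f - b||^2,
# i.e. sum_x (I(x) - P(x))^2 with P(x) = sum_{y in T} f(y) I_D(y).
f, *_ = np.linalg.lstsq(A, b, rcond=None)
pred = A @ f
print("mean squared prediction error:", np.mean((pred - b) ** 2))
```

Here np.linalg.lstsq solves the normal equations directly; a later sketch shows the equivalent per-class form with explicit correlation matrices and observation vectors.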
At block 210, the method defines multi-field taps over the decimated fields. Multi-field filter taps straddle more than one field of data and define the pixels used to generate coefficients for the corresponding pixel. Multi-field filter taps are described in greater detail below in conjunction with
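A minimal sketch of how such a multi-field tap might be represented follows; the particular field offsets, the tap geometry, and the gather_tap helper are hypothetical choices for illustration only.

```python
import numpy as np

# Illustrative multi-field tap: each entry is (field_offset, dy, dx),
# so the tap straddles the current decimated field and its neighbours in time.
multi_field_tap = [
    (-1, 0, 0), (-1, 0, 1),                                    # previous field
    (0, -1, 0), (0, 0, -1), (0, 0, 0), (0, 0, 1), (0, 1, 0),   # current field
    (1, 0, 0), (1, 0, 1),                                      # next field
]

def gather_tap(fields, t, y, x, tap):
    """Collect the tap pixels for decimated pixel (y, x) in field t."""
    return np.array([fields[t + dt][y + dy, x + dx] for dt, dy, dx in tap])

# Toy stack of decimated fields indexed by time.
rng = np.random.default_rng(1)
fields = [rng.random((32, 32)) for _ in range(3)]
print(gather_tap(fields, t=1, y=5, x=5, tap=multi_field_tap))
```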
At block 215, the method generates correlation matrices that include the cross relationships between the pixels in the multi-field filter taps. Elements of the correlation matrices represent the pixels in the decimated image in a high dimensional space. The method carries the problem over to a lower dimensional space using PCA to classify the pixels more efficiently.
At block 220, the method generates a covariance matrix of the correlation matrix elements as in equation 3. The method places the correlation matrix elements for each decimated pixel into a column and generates an observation matrix X with m columns after removing the mean vector from each column. Mean vector removal emphasizes the local changes around the pixels. The method applies eigendecomposition to the covariance matrix of X. The decomposition produces n eigenvectors and n eigenvalues, where n is the number of elements in the correlation matrices. The covariance matrix has rank r, where r is the smaller of m and n, m being the number of pixels. The covariance matrix of X may be defined as:

Ĉ_xx = (1/m) Σ_{i=1}^{m} (x_i − x̄)(x_i − x̄)^T    (3)

where

x̄ = (1/m) Σ_{i=1}^{m} x_i    (4)

is the mean vector.
The eigendecomposition operation can be summarized as in equation 5. Q includes the eigenvectors in its columns and Λ is a diagonal matrix with the corresponding eigenvalues:

Ĉ_xx = Q Λ Q^T = Q Λ Q^{-1}    (5)

Q^T can be replaced with Q^{-1} because the eigenvectors constitute an orthonormal basis.
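The covariance and eigendecomposition steps can be sketched as follows on synthetic stand-in data; the dimensions n and m and all variable names are illustrative assumptions, and random numbers stand in for the correlation matrix elements produced at block 215.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in: each column holds the n correlation-matrix elements for one
# of m decimated pixels (in practice these come from the multi-field taps).
n, m = 16, 500
raw = rng.random((n, m))

# Observation matrix X: the mean vector (equation 4) is removed from each
# column so that local variation around each pixel is emphasised.
mean_vec = raw.mean(axis=1, keepdims=True)
X = raw - mean_vec

# Covariance of X (equation 3) and its eigendecomposition C = Q Lambda Q^T (equation 5).
C = (X @ X.T) / m
eigvals, Q = np.linalg.eigh(C)            # eigh: C is symmetric

# Sort the eigenvectors by decreasing eigenvalue.
order = np.argsort(eigvals)[::-1]
eigvals, Q = eigvals[order], Q[:, order]
print("largest eigenvalues:", eigvals[:3])
```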
At block 225, the method projects the matrix X onto the eigenvectors by projecting the columns of X onto the columns of Q to obtain the principal components and generate a space with the distribution of principal components as in equation 6. The principal components indicate how similar each column of X is to each eigenvector. The matrix P includes the PCs. Each pixel is represented by a predetermined number of eigenvectors.
P = Q^T X    (6)
Filter coefficients are functions of the correlation matrices corresponding to the decimated pixels around the center pixel. The filter tap identifies which pixels contribute to the filter coefficients. The method classifies the decimated image pixels based on the similarities of the corresponding correlation matrices and generates a different set of filter coefficients for each class. The method represents each decimated image pixel with the correlation information of its filter tap, obtains eigenvectors from this information, and sorts the eigenvectors according to their eigenvalues.
The method 200 may use PCA to transform the classification problem into a space with axes defined by the eigenvectors. The method uses the similarity between columns of X (each column represents a different pixel) and the eigenvectors as the coordinates of the decimated pixels in the reduced dimensional space.
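Continuing the same kind of toy setup, the projection of equation 6 and the selection of the first three coordinates per pixel might look like the following sketch; the data are synthetic stand-ins and the names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 16, 500
X = rng.random((n, m))
X -= X.mean(axis=1, keepdims=True)        # mean-removed observation matrix

eigvals, Q = np.linalg.eigh((X @ X.T) / m)
order = np.argsort(eigvals)[::-1]
Q = Q[:, order]

# Principal components: similarity of each pixel's column of X to each
# eigenvector (equation 6). Keeping the first three rows gives each pixel
# three coordinates in the reduced-dimensional PC space.
P = Q.T @ X
coords = P[:3, :]                          # 3 coordinates per decimated pixel
print(coords.shape)                        # (3, m)
```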
At block 230, the method 200 partitions the PC space into classes. The PC space represents the decimated image pixels in the lower dimensional space, and regions with similar correlation matrices are grouped together in the PC space. The method initially partitions each of the axes into three regions such that there will be an equal number of pixels in each region, which results in a total of twenty-seven regions in three dimensions, with the pixels classified into twenty-seven groups. In other embodiments, the number of classes is not statically defined as twenty-seven. Selecting different numbers of classes is described in greater detail below in conjunction with
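One plausible way to realize this equal-population partition is sketched below, assuming three PC coordinates per pixel; the quantile-based split and the label encoding are illustrative implementation choices rather than details from the specification.

```python
import numpy as np

rng = np.random.default_rng(4)
coords = rng.standard_normal((3, 1000))    # 3 PC coordinates per pixel

# Split each axis at its 1/3 and 2/3 quantiles so the three regions per
# axis hold roughly equal numbers of pixels -> 3 * 3 * 3 = 27 classes.
labels = np.zeros(coords.shape[1], dtype=int)
for axis in range(3):
    cuts = np.quantile(coords[axis], [1 / 3, 2 / 3])
    region = np.digitize(coords[axis], cuts)   # 0, 1 or 2 per axis
    labels = labels * 3 + region

print("number of classes:", len(np.unique(labels)))   # up to 27
```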
The method 200, as illustrated in
where A_{j,k} is the correlation matrix (indices j and k run over the filter taps T) and b_k is the observation vector. I_D represents the decimated image.
The method 200 generates the LS filter using an observation vector and an inverse of the correlation matrices. The method uses different LS filters specific to each region. Assuming that there are m pixels and c classes (regions) in the decimated image,
m = m_1 + m_2 + … + m_c    (11)
shows the number of pixels in each class. For a given class u,
is the prediction filter classification map for the target pixel x.
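A sketch of generating a separate LS filter for each class of decimated pixels follows; the toy data, the tap size, the class count, and the direct solve of A_u f_u = b_u are illustrative assumptions consistent with the description above.

```python
import numpy as np

rng = np.random.default_rng(5)
m, taps, c = 2000, 9, 27

# Toy data: tap-pixel vectors, target high-resolution pixels and a class
# label (0..c-1) for every decimated pixel, e.g. from the PC-space partition.
tap_vecs = rng.random((m, taps))           # I_D values under the filter tap
targets = rng.random(m)                    # original-resolution pixels I(x)
labels = rng.integers(0, c, size=m)

filters = np.zeros((c, taps))
for u in range(c):
    sel = labels == u                       # the m_u pixels of class u
    Au = tap_vecs[sel].T @ tap_vecs[sel]    # correlation matrix A for class u
    bu = tap_vecs[sel].T @ targets[sel]     # observation vector b for class u
    filters[u] = np.linalg.solve(Au, bu)    # LS coefficients f_u = A_u^{-1} b_u

# Predicting a pixel applies the filter of its class to its tap pixels.
print(tap_vecs[0] @ filters[labels[0]])
```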
In one embodiment, the method obtains improved results using the first three eigenvectors in a three-dimensional space, representing each of the decimated pixels with three coordinates.
In another embodiment, at block 230 the method 200 uses clustering to classify pixels into finer groups in the reduced dimensional space using a dynamic validity index. The dynamic validity index is a cost function that measures the compactness of a pixel class and the distance between the classes themselves. The goal of clustering is to minimize the distance between pixels in the same class, which increases the compactness of the class, and to maximize the distance between classes. Empirical data suggests that iterative clustering classifies the pixels efficiently and provides an SNR improvement.
At block 1410, the method labels each pixel with a number and selects c numbers uniformly distributed over this range of labels. The method uses this uniform selection as the initial values for the centroids (weighted centers) of the c classes to be clustered.
At block 1415, the method assigns each pixel to the closest centroid (and cluster) based on the distance of the pixel to each of the centroids.
At block 1420, the method determines new cluster centroids for each cluster using the pixel assignments.
At block 1425, the method evaluates the dynamic validity index of the current clustering configuration. In other embodiments, other cost functions may be used. The dynamic validity index may be defined as:

DVI = [(1/N) Σ_{i=1}^{k} Σ_{x∈C_i} ||x − z_i||] / [min_{i≠j} ||z_i − z_j||]

where N is the number of points (data objects), k is the number of clusters, x is the location of a data point, z_i is the cluster center location (centroid) of the ith cluster and C_i is the set of points in the ith cluster. The dynamic validity index includes an intra-cluster term (the numerator) and an inter-cluster term (the denominator). The intra-cluster term is the average distance of each point to its corresponding centroid and measures the average level of compactness of all clusters. The inter-cluster term is the distance between the pair of clusters that are closest to each other among all pairs of clusters.
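A small sketch of computing such an index follows, using plain Euclidean distances as suggested by the description above; the function name and the synthetic data are illustrative assumptions.

```python
import numpy as np

def dynamic_validity_index(points, labels, centroids):
    """Ratio of the average point-to-centroid distance (intra-cluster term)
    to the smallest centroid-to-centroid distance (inter-cluster term)."""
    intra = sum(np.linalg.norm(points[labels == i] - z, axis=1).sum()
                for i, z in enumerate(centroids)) / len(points)
    inter = min(np.linalg.norm(zi - zj)
                for i, zi in enumerate(centroids)
                for j, zj in enumerate(centroids) if i < j)
    return intra / inter

rng = np.random.default_rng(6)
pts = rng.standard_normal((500, 3))     # pixels in the 3-D PC space
ctr = rng.standard_normal((4, 3))       # four trial cluster centroids
lbl = np.argmin(np.linalg.norm(pts[:, None] - ctr, axis=2), axis=1)
print(dynamic_validity_index(pts, lbl, ctr))
```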
At block 1430, the method assigns new class membership to each point (pixel) in the PC space based on the closest cluster centroid.
At block 1435, the method determines new cluster centroids of each cluster based on the new class assignments.
At block 1440, the method evaluates the validity index of the new configuration, and compares this validity index with the validity index generated at block 1425. If the difference between the indexes exceeds a threshold, the method returns to block 1430 and goes through an additional iteration of clustering. Otherwise, the method uses the current clustering scheme. During the first iteration, the method compares the index evaluated at block 1440 with the index evaluated at block 1425. During subsequent iterations, the method compares the index evaluated at block 1440 with the index evaluated at block 1440 during the previous iteration.
The method evaluates the dynamic validity index at each iteration and continues until the change between iterations drops below a threshold. When this occurs, no further assignments are made and the method 1400 uses the current clustering scheme to partition the PC space. Empirical data suggests that between twenty and thirty iterations are required before the results saturate and the change drops below the threshold.
The method 1400 uses the dynamic validity index to provide a point of comparison for clustering configurations. The method recursively adjusts the class assignments based on the current set of centroids and adjusts the centroids based on the current class assignments. In this way, the method iteratively improves the clustering until the improvement saturates.
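Putting these steps together, the iterative loop might be sketched as follows; the initialization, the stopping threshold, and the maximum iteration count are illustrative assumptions, and the validity index is restated so that the sketch is self-contained.

```python
import numpy as np

def validity_index(points, labels, centroids):
    # Average point-to-centroid distance over the smallest inter-centroid distance.
    intra = sum(np.linalg.norm(points[labels == i] - z, axis=1).sum()
                for i, z in enumerate(centroids)) / len(points)
    inter = min(np.linalg.norm(zi - zj)
                for i, zi in enumerate(centroids)
                for j, zj in enumerate(centroids) if i < j)
    return intra / inter

def cluster_pc_space(points, c, threshold=1e-3, max_iter=50):
    # Spread the initial centroids uniformly over the numbered pixels.
    centroids = points[np.linspace(0, len(points) - 1, c, dtype=int)]
    prev = None
    for _ in range(max_iter):
        # Assign every pixel to its closest centroid ...
        labels = np.argmin(np.linalg.norm(points[:, None] - centroids, axis=2), axis=1)
        # ... then move each centroid to the mean of its assigned pixels.
        centroids = np.array([points[labels == i].mean(axis=0)
                              if np.any(labels == i) else centroids[i]
                              for i in range(c)])
        cur = validity_index(points, labels, centroids)
        # Stop once the change in the index between iterations drops below the threshold.
        if prev is not None and abs(prev - cur) < threshold:
            break
        prev = cur
    return labels, centroids

rng = np.random.default_rng(8)
pixels = rng.standard_normal((1000, 3))          # pixels in the 3-D PC space
labels, centroids = cluster_pc_space(pixels, c=30)
print(np.bincount(labels, minlength=30))         # pixels per class
```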
After the method 1400 completes the clustering operation at block 230 of
Iterative clustering classifies pixels in the decimated images and generates a different set of LS coefficients for each class.
The particular methods of the invention are described in terms of computer software with reference to a series of flow diagrams illustrated in
In one embodiment, as shown in
One embodiment of a computer system suitable for use as server 701 is illustrated in
The description of
Adaptive prediction using dimensionality reduction has been described. Although specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that any arrangement which is calculated to achieve the same purpose may be substituted for the specific embodiments shown. This application is intended to cover any adaptations or variations of the present invention.
The terminology used in this application with respect to adaptive prediction using dimensionality reduction is meant to include all of these environments. Therefore, it is manifestly intended that this invention be limited only by the following claims and equivalents thereof.
The present application is related to the following commonly-owned, concurrently-filed applications: application Ser. No. ______ (Attorney Docket No. 080398.P738), filed ______, entitled “Dynamic Clustering for Adaptive Prediction filters”.