The comparing videos application 120 can comprise: a model module 125, an identify parameters module 135, a canonical form module 130, a register module 140, and a compare module 145. The model module 125 can model each video sequence. In one embodiment, each video sequence can be modeled as the output of a Linear Dynamical System (LDS). The identify parameters module 135 can identify the parameters of the models (e.g., the LDSs obtained from the multiple video sequences). The canonical form module 130 can transform the parameters into a canonical form so that the parameters from all video sequences can be expressed with respect to the same basis. The register module 140 can associate scenes from the video sequences using the transformed parameters and register the associated scenes with each other. The compare module 145 can compare the registered images to each other using various image-comparing mechanisms.
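By way of a hypothetical illustration only, the flow through these modules can be sketched as the following skeleton. The class and method names below are placeholders introduced here, not names used by the comparing videos application 120, and each stub corresponds to a step sketched later in this description.

```python
class VideoComparer:
    """Hypothetical skeleton mirroring modules 125-145 described above."""

    def model_and_identify(self, video):
        # model module 125 / identify parameters module 135:
        # model the video as an LDS and identify its parameters (e.g., A and Ci)
        raise NotImplementedError("see the identification sketch below")

    def to_canonical_form(self, A, Ci):
        # canonical form module 130: express the parameters in a common basis
        raise NotImplementedError("see the canonical-form sketch below")

    def register(self, canonical_models, videos):
        # register module 140: associate scenes and spatially register them
        raise NotImplementedError("see the homography sketch below")

    def compare(self, registered_pairs):
        # compare module 145: compare the registered images
        raise NotImplementedError

    def run(self, videos):
        models = [self.model_and_identify(v) for v in videos]
        canonical = [self.to_canonical_form(A, Ci) for (A, Ci) in models]
        pairs = self.register(canonical, videos)
        return self.compare(pairs)
```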
In 305, each video sequence can be modeled, for example, as the output of an LDS. For example, the video sequence {I(t)}, t=1, . . . , F (where I(t) is a p-dimensional vector representing the image frame of the video at time t, p is the number of pixels in each image, and F is the number of frames in the video) can be modeled as the output of an LDS as follows:
z(t+1)=Az(t)+Bv(t)
I(t)=C0+Cz(t)+w(t)
In the above formulas, z(t) is an n-dimensional vector, with n much smaller than p, representing a compressed version of the image at time t. Specifically, z(t) contains the coefficients needed to express the image I(t) in terms of the appearance images (the columns of C), which represent the appearance of the video, and the mean image C0. Together, C0 and C form a basis for all the images in a video. This LDS model decouples the appearance of the video (represented by C0 and C) from the temporal evolution of the video (represented by A). Since C0 and C are the only model parameters that depend on the image pixels, the spatial registration can be recovered independently from the temporal lag between the video sequences. In addition, Bv(t) and w(t) model the errors between the output of the LDS and the image I(t) due to noise. These errors can be ignored (if there is no noise) or approximated (when there is noise).
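Purely as an illustration of the model above, the following sketch synthesizes frames from an LDS with small, randomly chosen parameters. The dimensions p, n, and F and the parameter values are arbitrary assumptions, not values identified from an actual video.

```python
import numpy as np

rng = np.random.default_rng(0)

p, n, F = 100, 5, 60            # pixels per frame, state dimension (n << p), number of frames

# Assumed (random) LDS parameters, for illustration only.
A = 0.95 * np.linalg.qr(rng.standard_normal((n, n)))[0]   # dynamics with spectral radius < 1
C0 = rng.standard_normal(p)                                # mean image
C = rng.standard_normal((p, n))                            # appearance images (columns of C)
B = 0.01 * np.eye(n)                                       # process-noise gain

z = rng.standard_normal(n)       # initial state z(1)
frames = []
for t in range(F):
    w = 0.01 * rng.standard_normal(p)          # measurement noise w(t)
    frames.append(C0 + C @ z + w)              # I(t) = C0 + C z(t) + w(t)
    z = A @ z + B @ rng.standard_normal(n)     # z(t+1) = A z(t) + B v(t)

I = np.stack(frames, axis=1)     # p-by-F matrix with one frame per column
```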
Note that, according to the model above, the images of the video can be stacked into a p by F matrix [I(1), . . . , I(F)] whose columns are, up to noise, given by the mean image C0 plus linear combinations Cz(t) of the appearance images.
Referring back to 310, the parameters of the LDS can be identified from this matrix. For example, the mean image C0 can be computed as the average of the frames, and the singular value decomposition (SVD) [I(1)−C0, . . . , I(F)−C0]=USVT can be computed, where Z=SVT and C=U. The parameter A can then be computed as follows:
A=[z(2), . . . , z(F)][z(1), . . . , z(F−1)]†, where † denotes the Moore-Penrose pseudoinverse.
From the above, the parameters A and Ci can be found. Then, if the pi by n matrix Ci is the matrix formed by rows (p1+ . . . +pi−1)+1 to p1+ . . . +pi of C (that is, the block of rows of C corresponding to the i-th video sequence, where pi is the number of pixels in that sequence), the pair (A, Ci) can be converted to a canonical form, as explained below with respect to 315.
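For illustration only, a minimal numerical sketch of this identification step is shown below. It assumes the frames (possibly from several stacked video sequences) are available as the columns of a p by F matrix, uses a rank-n truncated SVD for C and Z, and estimates A by least squares with the pseudoinverse, mirroring the formulas above; the function name identify_lds is a placeholder, not a name used in this disclosure.

```python
import numpy as np

def identify_lds(I, n):
    """Identify (C0, C, Z, A) from a p-by-F matrix I whose columns are frames.

    C0 is the mean image, C and Z come from a rank-n SVD of the
    mean-subtracted frames, and A is the least-squares fit of z(t+1) from z(t).
    """
    C0 = I.mean(axis=1, keepdims=True)                  # mean image
    U, S, Vt = np.linalg.svd(I - C0, full_matrices=False)
    C = U[:, :n]                                        # appearance images (basis)
    Z = np.diag(S[:n]) @ Vt[:n, :]                      # Z = S V^T, i.e. states z(1)..z(F)
    # A = [z(2),...,z(F)] [z(1),...,z(F-1)]^dagger (Moore-Penrose pseudoinverse)
    A = Z[:, 1:] @ np.linalg.pinv(Z[:, :-1])
    return C0, C, Z, A

# Example with two stacked sequences of p1 and p2 pixels (joint identification):
# C0, C, Z, A = identify_lds(np.vstack([I1, I2]), n)
# C1, C2 = C[:p1, :], C[p1:, :]        # row blocks of C for each sequence
```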
Referring back to 315, the identified pair (A, Ci) can be converted into a canonical form. In one embodiment, the pair can be converted into the Jordan Canonical Form (JCF) (Ac, Cc), in which Ac is a block-diagonal matrix whose eigenvalues are given by {σ1±√−1ω1, . . . , σq±√−1ωq, λ1, . . . , λn−2q}, where σi and ωi are parameters capturing the oscillatory behavior of image intensities in the video, λi captures transient behaviors, and q is the number of complex-conjugate pairs of eigenvalues of Ac. Additional information on the JCF can be found in W. J. Rugh, Linear System Theory (Prentice Hall, 2d ed. 1996). It should be noted that other canonical forms, which use a different basis (e.g., a Reachability Canonical Form (RCF) or an Observability Canonical Form (OCF)), can be used in other embodiments. Additional information on the RCF and the OCF can be found in W. J. Rugh, Linear System Theory (Prentice Hall, 2d ed. 1996).
The procedure for converting the LDS parameters (A, Ci) into any canonical form involves finding an n by n invertible matrix Mi such that (MiAMi−1, γTCiMi−1)=(Ac, Cc), where the subscript c represents any canonical form, and the p-dimensional vector γ (where p is the number of pixels) is an arbitrary weighting vector. In one embodiment, γ can be chosen to be [1 1 . . . 1], so that all rows of Ci are weighted equally. Once Mi is found, Mi can be used to convert the LDS (A, Ci) into the canonical form using the formula (MiAMi−1, CiMi−1).
It should be noted that the JCF is unique only up to a permutation of the eigenvalues. However, if the eigenvalues are distinct, a predefined way can be used to sort the eigenvalues to obtain a unique JCF.
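The following is a rough numerical sketch of this conversion under the simplifying assumption that A has n distinct eigenvalues, in which case a complex diagonal form can stand in for the JCF. The function name, the particular eigenvalue-sorting rule, and the use of a diagonal form (rather than 2 by 2 blocks for the complex-conjugate pairs σi±√−1ωi) are illustrative choices, not the exact procedure of this disclosure.

```python
import numpy as np

def to_canonical(A, Ci, gamma=None):
    """Sketch: convert (A, Ci) to a canonical pair (Ac, Cc).

    Assumes A has distinct eigenvalues so that it can be diagonalized.
    gamma weights the rows of Ci; the default [1 1 ... 1] weights them equally.
    """
    if gamma is None:
        gamma = np.ones(Ci.shape[0])
    evals, V = np.linalg.eig(A)                    # A V = V diag(evals)
    order = np.lexsort((evals.imag, evals.real))   # a predefined sort makes the form unique
    evals, V = evals[order], V[:, order]
    b = gamma @ Ci @ V                             # output row gamma^T Ci expressed in the eigenbasis
    Minv = V / b                                   # rescale columns so gamma^T Ci Minv = [1 1 ... 1]
    Ac = np.diag(evals)                            # equals M A M^{-1} with M = Minv^{-1}
    Cc = Ci @ Minv                                 # appearance images in the canonical basis
    return Ac, Cc
```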
Referring back to 320, the matched features from the two video sequences can be collected into matrices X1 and X2, each of size 3 by M, whose corresponding columns are the locations of the matched features in homogeneous coordinates, and where M is the total number of matches from the n+1 image pairs. The homography H (such that X2˜HX1) is then recovered. This can be done by running Random Sample Consensus (RANSAC) to obtain the inliers from the matches. RANSAC is a method that can be used to remove outliers from a set. It is based on randomly sampling points and analyzing how consistent the model obtained from the sampled points is with the rest of the data. More information on RANSAC can be found in M. A. Fischler et al., “Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography”, Communications of the ACM, vol. 24 (1981) 381-395. A homography can then be fit to the inliers using a non-linear method. More information about such a non-linear method can be found in R. Hartley et al., Multiple View Geometry in Computer Vision (Cambridge 2000). This allows the contribution of the correspondences from every image pair to be weighted equally, because the best matches given by RANSAC could arise from the mean image or the dynamic appearance images or both.
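As a sketch of this step only, the example below uses OpenCV's RANSAC-based homography estimation on the matched feature locations. The reprojection threshold and the final least-squares refit on the inliers (in place of the non-linear refinement mentioned above) are illustrative assumptions, not the exact procedure of this disclosure.

```python
import numpy as np
import cv2

def fit_homography(X1, X2, reproj_thresh=3.0):
    """Estimate H with X2 ~ H X1 from 3-by-M homogeneous feature locations.

    RANSAC finds the inliers among the M matches; the homography is then
    refit using only those inliers so every retained match is weighted equally.
    """
    # convert homogeneous columns to M-by-2 inhomogeneous points
    p1 = (X1[:2] / X1[2]).T.astype(np.float32)
    p2 = (X2[:2] / X2[2]).T.astype(np.float32)

    # robust estimate: RANSAC rejects outlying matches
    H, mask = cv2.findHomography(p1, p2, cv2.RANSAC, reproj_thresh)

    # refit on the inliers only
    inliers = mask.ravel().astype(bool)
    H, _ = cv2.findHomography(p1[inliers], p2[inliers], 0)
    return H, inliers
```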
As set forth above in
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example, and not limitation. It will be apparent to persons skilled in the relevant art(s) that various changes in form and detail can be made therein without departing from the spirit and scope of the present invention. Thus, the present invention should not be limited by any of the above-described exemplary embodiments.
For example, multiple examples of different types of video sequences are illustrated in
In addition, it should be understood that the figures described above, which highlight the functionality and advantages of the present invention, are presented for example purposes only. The architecture of the present invention is sufficiently flexible and configurable, such that it may be utilized in ways other than that shown in the figures.
Further, the purpose of the Abstract of the Disclosure is to enable the U.S. Patent and Trademark Office and the public generally, and especially the scientists, engineers and practitioners in the art who are not familiar with patent or legal terms or phraseology, to determine quickly from a cursory inspection the nature and essence of the technical disclosure of the application. The Abstract of the Disclosure is not intended to be limiting as to the scope of the present invention in any way.
Finally, it is the applicant's intent that only claims that include the express language “means for” or “step for” be interpreted under 35 U.S.C. 112, paragraph 6. Claims that do not expressly include the phrase “means for” or “step for” are not to be interpreted under 35 U.S.C. 112, paragraph 6.
This application is based on provisional application 61/167,715, which was filed on Apr. 8, 2009, and which is herein incorporated by reference in its entirety.
This invention was made with government support under ISS-0447739 and EHS-0509101 awarded by the National Science Foundation and under N00014-05-1-0836 awarded by the Office of Naval Research. The government has certain rights in this invention.
Other Publications:
Richard Szeliski, “Image Alignment and Stitching: A Tutorial”, Foundations and Trends in Computer Graphics and Vision, vol. 2, No. 1, pp. 1-104 (2006).
Chris Harris et al., “A Combined Corner and Edge Detector”, In Proceedings of the Fourth Alvey Vision Conference, pp. 147-151 (1988).
David G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, International Journal of Computer Vision, vol. 60, No. 2, pp. 91-110 (2004).
Matthew Brown et al., “Multi-Image Matching Using Multi-Scale Oriented Patches”, Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp. 510-517, Jun. 2005.
Martin A. Fischler et al., “Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography”, Communications of the ACM, vol. 24, No. 6, pp. 381-395, Jun. 1981.
Yaron Caspi et al., “Spatio-Temporal Alignment of Sequences”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, No. 11, pp. 1409-1424, Nov. 2002.
Yaron Caspi et al., “Feature-Based Sequence-to-Sequence Matching”, International Journal of Computer Vision, vol. 68, No. 1, pp. 53-64 (2006).
Yaron Ukrainitz et al., “Aligning Sequences and Actions by Maximizing Space-Time Correlations”, In European Conference on Computer Vision, pp. 538-550 (2006).
Avinash Ravichandran et al., “Mosaicing Non-Rigid Dynamical Scenes”, In Workshop on Dynamic Vision, pp. 1-13 (2007).
Gianfranco Doretto et al., “Dynamic Textures”, International Journal of Computer Vision, vol. 51, No. 2, pp. 91-109 (2003).
Peter Van Overschee et al., “N4SID: Subspace Algorithms for the Identification of Combined Deterministic-Stochastic Systems”, Automatica, Special Issue in Statistical Signal Processing and Control, pp. 75-93 (1994).
Antoni B. Chan et al., “Probabilistic Kernels for the Classification of Auto-Regressive Visual Processes”, In Conference on Computer Vision and Pattern Recognition (CVPR'05), vol. I, pp. 846-851 (2005).
Rene Vidal et al., “Optical Flow Estimation and Segmentation of Multiple Moving Dynamic Textures”, In Conference on Computer Vision and Pattern Recognition (CVPR'05), vol. II, pp. 516-521 (2005).
Wilson J. Rugh, “Linear System Theory: Second Edition”, Prentice Hall, Copyright 1996 (600 pages).
Richard Hartley et al., “Multiple View Geometry in Computer Vision: Second Edition”, Cambridge University Press, Copyright 2003 (670 pages).
Yaron Caspi et al., “Sequence-to-Sequence Alignment”, http://www.wisdom.weizmann.ac.il/˜vision/VideoAnalysis/Demos/Seq2Seq/Seq2Seq.html, printed Apr. 7, 2014 (7 pages).
Yaron Caspi et al., “Feature-Based Sequence-to-Sequence Matching”, http://www.wisdom.weizmann.ac.il/˜vision/VideoAnalysis/Demos/Traj2Tray, Oct. 6, 2006 (2 pages).
Y. Ukrainitz et al., “Aligning Sequences and Actions by Maximizing Space-Time Correlations”, http://www.wisdom.weizmann.ac.il/˜vision/SpaceTimeCorrelations.html, Feb. 25, 2006 (8 pages).
Ali Kemal Sinop et al., “A seeded image segmentation framework unifying graph cuts and random walker which yields a new algorithm,” ICCV 2007, Oct. 2007 (8 pages).
Grady et al., “Three interactive graph-based segmentation methods applied to cardiovascular imaging”, in Nikos Paragios, Yunmei Chen, and Olivier Faugeras, editors, Handbook of Mathematical Models in Computer Vision, pp. 453-469, Springer, 2006.
Avinash Ravichandran et al., “Video Registration using Dynamic Textures”, European Conference on Computer Vision, 2008 (12 pages).
Avinash Ravichandran et al., “Mosaicing Nonrigid Dynamical Scenes”, International Workshop on Dynamical Vision, Oct. 2007 (13 pages).
U.S. Appl. No. 12/558,649.