Method for optimizing off-line facial feature tracking

Information

  • Patent Grant
  • Patent Number
    6,834,115
  • Date Filed
    Monday, August 13, 2001
  • Date Issued
    Tuesday, December 21, 2004
Abstract
The present invention relates to a technique for optimizing off-line facial feature tracking. Facial features in a sequence of image frames are automatically tracked while a visual indication is presented of the plurality of tracking node locations on the respective image frames. The sequence of image frames may be manually paused at a particular image frame in the sequence of image frames if the visual indication of the tracking node locations indicates that at least one location of a tracking node for a respective facial feature is not adequately tracking the respective facial feature. The location of the tracking node may be reinitialized by manually placing the tracking node location at a position on the particular image frame in the monitor window that corresponds to the respective facial feature. Automatic tracking of the facial feature may be continued based on the reinitialized tracking node location.
Description




BACKGROUND OF THE INVENTION




The present invention relates to avatar animation, and more particularly, to facial feature tracking.




Animation of photo-realistic avatars or of digital characters in movie or game production generally requires tracking of an actor's movements, particularly the actor's facial features. Accordingly, there exists a significant need for improved facial feature tracking. The present invention satisfies this need.




SUMMARY OF THE INVENTION




The present invention is embodied in a method, and related apparatus, for optimizing off-line facial feature tracking. In the method a monitor window is provided that has a visual indication of a plurality of tracking node locations with respect to facial features in a sequence of image frames. The monitor window has a control for pausing at an image frame in the sequence of image frames. The facial features in the sequence of image frames are automatically tracked while the visual indication is presented of the plurality of tracking node locations on the respective image frames. The sequence of image frames may be manually paused at a particular image frame in the sequence of image frames if the visual indication of the tracking node locations indicates that at least one location of a tracking node for a respective facial feature is not adequately tracking the respective facial feature. The location of the tracking node may be reinitialized by manually placing the tracking node location at a position on the particular image frame in the monitor window that corresponds to the respective facial feature. Automatic tracking of the facial feature may be continued based on the reinitialized tracking node location.




In other more detailed features of the invention, the tracking of facial features in the sequence of facial image frames of the speaking actor may be performed using bunch graph matching, or using transformed facial image frames generated based on wavelet transformations, such as Gabor wavelet transformations.




Other features and advantages of the present invention should be apparent from the following description of the preferred embodiments taken in conjunction with the accompanying drawings, which illustrate, by way of example, the principles of the invention.











BRIEF DESCRIPTION OF THE DRAWINGS





FIG. 1 is a flow diagram for illustrating a method for optimizing off-line facial feature tracking using manual reinitialization of track node location, according to the present invention.

FIG. 2 is a schematic diagram of a monitor window for providing a visual indication of a plurality of tracking node locations with respect to facial features in a sequence of image frames for use in the method for off-line facial feature tracking of FIG. 1.

FIG. 3 is a schematic diagram of a monitor window for providing a visual indication of a plurality of tracking node locations with respect to facial features in a sequence of image frames for use in the method for optimizing off-line facial feature tracking of FIG. 1.











DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS




The present invention provides a technique for optimizing facial feature tracking for extraction of animation values or parameters. Automatic extraction generally increases the speed and reduces the tedium associated with the extraction task. Manual intervention permits correction of tracking inaccuracies or imperfections that may reduce the desirability of automatic extraction.




With reference to FIG. 1, the invention may be embodied in a method, and related apparatus, for optimizing off-line facial feature tracking. In the method, a monitor window 22 is provided that has a visual indication of a plurality of tracking node locations 24 with respect to facial features in a sequence of image frames 26 (step 12). The monitor window has a control 28 for pausing at an image frame in the sequence of image frames. The facial features in the sequence of image frames are automatically tracked while the visual indication is presented of the plurality of tracking node locations on the respective image frames (step 14). The sequence of image frames may be manually paused at a particular image frame in the sequence of image frames if the visual indication of the tracking node locations indicates that at least one location 24-1 of a tracking node for a respective facial feature is not adequately tracking the respective facial feature (step 16). The location of the tracking node may be reinitialized by manually placing the tracking node location at a position 24-1′ on the particular image frame in the monitor window that corresponds to the respective facial feature (step 18). Automatic tracking of the facial feature may be continued based on the reinitialized tracking node location (step 20).
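The flow of steps 12 through 20 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `track_with_corrections` and the callbacks `track_step` (any automatic tracker) and `review` (a stand-in for the operator watching the monitor window, pausing, and re-placing nodes) are hypothetical names introduced only for this example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Nodes = Dict[str, Point]  # tracking node name -> (x, y) location


def track_with_corrections(
    frames: Sequence[object],
    initial_nodes: Nodes,
    track_step: Callable[[object, object, Nodes], Nodes],
    review: Callable[[int, object, Nodes], Nodes],
) -> List[Nodes]:
    """Track nodes through `frames`, applying operator corrections.

    track_step(prev_frame, frame, prev_nodes) is a hypothetical automatic
    tracker that returns updated node locations (step 14).
    review(index, frame, nodes) is a hypothetical stand-in for the monitor
    window: it returns an empty dict to let playback continue, or a dict of
    manually re-placed node locations when the operator pauses (steps 16, 18).
    """
    if not frames:
        return []
    nodes = dict(initial_nodes)
    history = [dict(nodes)]
    for i in range(1, len(frames)):
        # Step 14: automatic tracking from the previous frame's locations.
        nodes = track_step(frames[i - 1], frames[i], nodes)
        # Steps 12 and 16: the locations are shown; the operator may pause.
        corrections = review(i, frames[i], nodes)
        if corrections:
            # Step 18: reinitialize only the nodes the operator re-placed.
            nodes = {**nodes, **corrections}
        history.append(dict(nodes))
        # Step 20: the next iteration continues from the corrected locations.
    return history
```

Because the corrected locations are fed back into `track_step` on the next frame, a single manual placement influences all subsequent automatic tracking, which is the behavior the paragraph above describes.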




The tracking of facial features in the sequence of facial image frames of the speaking actor may be performed using bunch graph matching, or using transformed facial image frames generated based on wavelet transformations, such as Gabor wavelet transformations. Wavelet-based tracking techniques are described in U.S. Pat. No. 6,272,231. The entire disclosure of U.S. Pat. No. 6,272,231 is hereby incorporated herein by reference. The techniques of the invention may be accomplished using generally available image processing systems.
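As a rough illustration of what a Gabor wavelet transformation computes at a tracking node, the following Python sketch evaluates a textbook complex Gabor kernel (a Gaussian envelope multiplied by a complex sinusoid) around one pixel. It is a generic formulation, not the method of U.S. Pat. No. 6,272,231; the function names, the kernel size, and the choice of wavelengths and orientations are all assumptions made for the example, and the kernel is not corrected for its DC component.

```python
import cmath
import math
from typing import List, Sequence

Image = Sequence[Sequence[float]]  # grayscale image as rows of pixel values


def gabor_kernel(size: int, wavelength: float, orientation: float,
                 sigma: float) -> List[List[complex]]:
    """Return a size x size complex Gabor kernel (size should be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate along the direction in which the wave oscillates.
            xr = x * math.cos(orientation) + y * math.sin(orientation)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            carrier = cmath.exp(1j * 2.0 * math.pi * xr / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_response(image: Image, cx: int, cy: int,
                   kernel: List[List[complex]]) -> complex:
    """Correlate the kernel with the image around pixel (cx, cy).

    Pixels that fall outside the image are treated as zero.
    """
    half = len(kernel) // 2
    height, width = len(image), len(image[0])
    total = 0j
    for ky, row in enumerate(kernel):
        for kx, value in enumerate(row):
            y, x = cy + ky - half, cx + kx - half
            if 0 <= y < height and 0 <= x < width:
                total += image[y][x] * value.conjugate()
    return total


def node_responses(image: Image, cx: int, cy: int,
                   wavelengths: Sequence[float] = (4.0, 8.0),
                   n_orientations: int = 4,
                   size: int = 9) -> List[complex]:
    """Collect responses over several wavelengths and orientations.

    The parameter values are illustrative assumptions, not taken from
    the patent.
    """
    responses = []
    for wavelength in wavelengths:
        for j in range(n_orientations):
            theta = math.pi * j / n_orientations
            kernel = gabor_kernel(size, wavelength, theta, wavelength / 2.0)
            responses.append(gabor_response(image, cx, cy, kernel))
    return responses
```

A tracker built on such responses would typically compare the response vector at a candidate position in the new frame with the vector stored for the node, and move the node to the best match; that comparison step is not shown here.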




The manual intervention allows scalability of the animation tracking. For high quality animation, frequent manual interaction may be employed to ensure accurate tracking. For lower quality animation, manual interaction may be employed less frequently for correcting only significant inconsistencies.




Although the foregoing discloses the preferred embodiments of the present invention, it is understood that those skilled in the art may make various changes to the preferred embodiments without departing from the scope of the invention. The invention is defined only by the following claims.



Claims
  • 1. Method for optimizing off-line facial feature tracking, comprising the steps for: providing a monitor window that has a visual indication of a plurality of tracking node locations with respect to facial features in a sequence of image frames, the monitor window having a control for pausing at an image frame in the sequence of image frames; automatically tracking the facial features in the sequence of image frames while presenting the visual indication of the plurality of tracking node locations on the respective image frames; manually pausing the sequence of image frames at a particular image frame in the sequence of image frames if the visual indication of the tracking node locations indicates that at least one location of a tracking node for a respective facial feature is not adequately tracking the respective facial feature; reinitializing the at least one location of the tracking node by manually placing the tracking node location at a position on the particular image frame in the monitor window that corresponds to the respective facial feature; and continuing automatic tracking of the facial feature based on the reinitialized at least one tracking node location.
  • 2. Method for optimizing off-line facial feature tracking as defined in claim 1, wherein the tracking of facial features in the sequence of facial image frames of the speaking actor is performed using bunch graph matching.
  • 3. Method for optimizing off-line facial feature tracking as defined in claim 1, wherein the tracking of facial features in the sequence of facial image frames of the speaking actor is performed using transformed facial image frames generated based on wavelet transformations.
  • 4. Method for optimizing off-line facial feature tracking as defined in claim 1, wherein the tracking of facial features in the sequence of facial image frames of the speaking actor is performed using transformed facial image frames generated based on Gabor wavelet transformations.
  • 5. Apparatus for optimizing off-line facial feature tracking, comprising: a monitor window means for providing a visual indication of a plurality of tracking node locations with respect to facial features in a sequence of image frames, the monitor window having a control for pausing at an image frame in the sequence of image frames; means for automatic tracking of the facial features in the sequence of image frames while presenting the visual indication of the plurality of tracking node locations on the respective image frames; means for manually pausing the sequence of image frames at a particular image frame in the sequence of image frames if the visual indication of the tracking node locations indicates that at least one location of a tracking node for a respective facial feature is not adequately tracking the respective facial feature; means for reinitializing the at least one location of the tracking node by manually placing the tracking node location at a position on the particular image frame in the monitor window that corresponds to the respective facial feature; and means for continuing automatic tracking of the facial feature based on the reinitialized at least one tracking node location.