The present invention relates generally to motion capture, and more particularly to integrated motion capture where body motion capture and facial motion capture are performed substantially simultaneously and the results are integrated into a single motion capture output.
Existing methods and systems for motion capture (“MOCAP”) utilize certain specialized techniques for facial and body motion capture. The techniques share certain common elements, such as acquiring a motion with a plurality of MOCAP cameras, reconstructing a three-dimensional (“3-D”) virtual space modeling of the physical space in which the motion was captured, and tracking and labeling images of markers coupled at various places on the actor's body through a temporal sequence of volumetric frames comprising the virtual space. Each type of motion capture, however, has unique inherent difficulties that can be overcome in different ways.
Certain implementations as disclosed herein provide for integrated motion capture.
In one aspect, an integrated motion capture method is disclosed. The method includes: applying a marking material having a known pattern to a body and a face of an actor; configuring at least one first video motion capture camera to capture the marking material on the body of the actor; configuring at least one second video motion capture camera to capture the marking material on the face of the actor; substantially simultaneously capturing body motion data using the at least one first video motion capture camera and facial motion data using the at least one second video motion capture camera; and integrating the body motion data and the facial motion data.
In another aspect, an integrated motion capture system is disclosed. The system includes: marking material having a known pattern applied to a body and a face of an actor; at least one first video motion capture camera to capture the marking material on the body of the actor; at least one second video motion capture camera to capture the marking material on the face of the actor; and a processor configured to: substantially simultaneously capture body motion data using the at least one first video motion capture camera and facial motion data using the at least one second video motion capture camera; and integrate the body motion data and the facial motion data.
Other features and advantages of the present invention will become more readily apparent to those of ordinary skill in the art after reviewing the following detailed description and accompanying drawings.
The details of the present invention, both as to its structure and operation, may be gleaned in part by study of the accompanying drawings, in which:
Certain implementations of the present invention as disclosed herein provide for integrated motion capture. One implementation utilizes sparse camera coverage. In this implementation, one high-definition (“HD”) motion capture (“MOCAP”) video camera is used for the body of an actor, another HD MOCAP video camera is used for the face of the actor, and a film camera is used to capture the entire performance (i.e., a “film plate”). During a motion capture performance, integrated motion capture is achieved by acquiring both the face and body data substantially simultaneously, along with the film plate.
After reading this description it will become apparent to one skilled in the art how to practice the invention in various alternative implementations and alternative applications. However, although various implementations of the present invention will be described herein, it is understood that these embodiments are presented by way of example only, and not limitation. As such, this detailed description of various alternative implementations should not be construed to limit the scope or breadth of the present invention as set forth in the appended claims.
Body motion capture typically involves capturing the motion of an actor's torso, head, limbs, hands, and feet. These motions may be regarded as relatively gross movements. MOCAP cameras are placed about a “capture volume” large enough to encompass the actor's performance. The resulting reconstructed 3-D virtual space models the capture volume, and images of the markers coupled to the actor's body are temporally tracked through the frames of the reconstructed virtual space. Because the actor's body movements are relatively gross, large markers may be used to identify specific spots on the actor's body, head, limbs, hands, and feet. The large markers are more easily locatable in the resulting volumetric frames than smaller markers.
By contrast, facial motion capture involves capturing the movements only of the actor's face. These motions are regarded as relatively fine movements due to the subtle use of facial muscles required to manifest various human expressions. Consequently, the capture volume is usually only large enough to encompass the head, or even just the face. Further, many more, comparatively small markers are required to capture subtle expressive facial movements than are required for the more gross body movements. As shown in
Because of the differences in these types of motion capture, and the elaborate requirements for pluralities of specialized cameras and capture volumes, MOCAP systems and methods for improving the efficiency of capturing both facial and body motion significantly advance the state of the art.
One implementation illustrated in
As shown, two motion capture cameras 920, 922 and one film camera 924 are connected to the motion capture processor 910. One HD MOCAP video camera 920 is used for the body of an actor, another HD MOCAP video camera 922 is used for the face of the actor, and a film camera 924 is used to capture the entire performance. The MOCAP video camera 920 is focused on the actor's body 940 on which markers 960B-960F have been applied, and the MOCAP video camera 922 is focused on the actor's face 950 on which ink markers 960A have been applied. In some implementations, the camera 922 configured to be focused on the actor's face 950 can be attached to the head of the actor (e.g., on a helmet worn by the actor). In other implementations, other markers or facial features on the face 950 can be tracked by the camera 922.
The placement of the markers/features 960A is configured to capture movements of the face 950, while the placement of the markers 960B-960F is configured to capture motions of the body 940 including hands 970, arms 972, legs 974, 978, and feet 976 of the actor.
Example placements of ink markers on a model of an actor's face are shown in
Facial surface capture scans may also be acquired from the HD video used to capture the facial motion. In one implementation, a special pattern is projected onto the actor's face and captured along with the MOCAP data. The pattern may comprise visible light, IR light, or light of virtually any wavelength, and a matched band-pass filter may be used to isolate the pattern in real-time or during post-processing. The pattern may be projected only on a first frame and one other frame, or periodically, such as at every other frame. Many different frequencies of projection may be used depending upon circumstances. The pattern may also comprise, for example, a known (identifiable) random pattern, or a grid, or virtually any type of pattern.
Retroreflective markers may also be used with conventional MOCAP camera configuration, in addition to ink markings acquired using HD cameras. Such a configuration may provide real time face (and body) capture and display, while the HD camera arrangement provides for higher resolution and improved labeling during post-processing.
In one implementation, 2-D tracking is performed using video data obtained with one HD camera to capture facial motion. Ink markers on the face, for example, are tracked from frame to frame of the HD video data. Tracking relatively small ink dots is facilitated by the high resolution available using the HD camera. In another implementation, two or more HD cameras are used, from which 2-D tracking may be performed. Additionally, 3-D tracking may be performed, including reconstructing a 3-D virtual space as described above, with additional benefits stemming from the high resolution of the HD cameras. Further, FACS type processing may enhance tracking and facial model reconstruction in 3-D.
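The frame-to-frame 2-D tracking of ink markers described above can be sketched as follows. This is a minimal illustration only, assuming that marker centroids have already been detected in each HD frame; the greedy nearest-neighbor matching, the function name `track_markers`, the `max_dist` pixel threshold, and the marker labels are assumptions made for the example and are not specified by this disclosure.

```python
import math


def track_markers(prev, curr, max_dist=10.0):
    """Greedily match labeled markers from the previous frame to
    unlabeled detections in the current frame.

    prev: dict mapping label -> (x, y) pixel position in the previous frame.
    curr: list of (x, y) pixel positions detected in the current frame.
    Returns a dict mapping label -> (x, y) for every marker that has a
    detection within max_dist pixels (hypothetical threshold).
    """
    # Every (distance, label, detection index) candidate, closest first.
    pairs = sorted(
        (math.dist(p, c), label, i)
        for label, p in prev.items()
        for i, c in enumerate(curr)
    )
    matched = {}
    used = set()
    for dist, label, i in pairs:
        if dist > max_dist:
            break  # all remaining candidates are even farther apart
        if label in matched or i in used:
            continue  # this marker or this detection is already assigned
        matched[label] = curr[i]
        used.add(i)
    return matched


# Hypothetical labels and pixel coordinates, for illustration only.
previous = {"brow_l": (100.0, 50.0), "brow_r": (140.0, 50.0)}
current = [(141.0, 51.0), (101.0, 49.0), (300.0, 300.0)]
tracked = track_markers(previous, current)
```

Because small ink dots move only a few pixels between consecutive HD frames, a tight distance threshold of this kind keeps stray detections (such as the third point above) from being assigned a label.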
In the implementation illustrated in
In another implementation, the encoding scheme for the markers includes “active” as well as “passive” encoding. For example, as discussed above, passively encoded patterns include a code that is captured by the motion capture cameras and decoded. The decoded data can be further used for integration of the motion of a digital character. However, active encoding may be used where the visual/optical signal of the marker to be captured is changing temporally.
In yet another implementation, the patterns can use fluorescent material. These patterns operate as “primary markers,” which have an “active identity” but are “passively powered.” (By comparison, an “actively powered” marker typically emits energy of some kind, e.g., an LED, which emits light).
Referring to
Also, the marker placements on the 3-D model depicted in
Referring back to
The motion capture processor 910 performs the integration (i.e., performs a “reconstruction”) of the 2-D images to generate the frame sequence of three-dimensional (“3-D,” or “volumetric”) marker data. This sequence of volumetric frames is often referred to as a “beat,” which can also be thought of as a “take” in cinematography. Conventionally, the markers are discrete objects or visual points, and the reconstructed marker data comprise a plurality of discrete marker data points, where each marker data point represents a spatial (i.e., 3-D) position of a marker coupled to a target, such as an actor. Thus, each volumetric frame includes a plurality of marker data points representing a spatial model of the target. The motion capture processor 910 retrieves the volumetric frame sequence and performs a tracking function to accurately associate (or, “map”) the marker data points of each frame with the marker data points of preceding and subsequent frames in the sequence.
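As one illustration of how a single 3-D marker data point might be reconstructed from two 2-D views, the sketch below uses midpoint triangulation of two viewing rays. The disclosure does not specify the reconstruction algorithm, so this technique is swapped in purely as an example; it assumes each camera's optical center and the ray direction through the observed marker image are already known from calibration, and the name `triangulate_midpoint` is hypothetical.

```python
def triangulate_midpoint(o1, d1, o2, d2):
    """Estimate a 3-D marker position from two viewing rays.

    o1, o2: camera centers as (x, y, z).
    d1, d2: ray directions from each camera through the marker image.
    Returns the midpoint of the shortest segment joining the two rays.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    w = tuple(a - b for a, b in zip(o1, o2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; position is not recoverable")
    # Ray parameters of the mutually closest points on each ray.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(o + s * k for o, k in zip(o1, d1))
    p2 = tuple(o + t * k for o, k in zip(o2, d2))
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


# Two hypothetical cameras observing a marker located at (1, 2, 5).
point = triangulate_midpoint(
    (0.0, 0.0, 0.0), (1.0, 2.0, 5.0),
    (4.0, 0.0, 0.0), (-3.0, 2.0, 5.0),
)
```

Repeating such a computation for every marker in every frame would yield the sequence of volumetric frames on which the tracking function then operates.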
In one implementation, one or more known patterns are printed onto strips 960D. The strips 960D are then wrapped around each limb (i.e., appendage) of an actor such that each limb has at least two strips. For example, two strips 960D are depicted in
In addition to the known pattern markers, the actor may be outfitted with a large number of LEDs on the body. In one implementation, the actor wears a special suit on which the LEDs are disposed. In one example, the LEDs are disposed in a pattern comprising lines. The lines of LEDs may be separated by known distances, thus forming a grid. Such a grid of LEDs is tracked in conjunction (and in one implementation, simultaneously) with the known pattern markers. The known pattern markers serve to improve tracking resolution and labeling of the grid pattern by providing unique identity information to the otherwise substantially uniformly disposed plurality of identical LEDs. Thus, temporal tracking and labeling continuity in the virtual space are enhanced.
In another implementation, further improvement in tracking resolution and labeling of the LEDs is achieved by using differently colored LEDs for the lines comprising the grid. Intersections of the differently colored lines (i.e., vertices of the grid) therefore gain greater identifiability during tracking. By comparison, like-colored LEDs comprising the grid would be individually difficult to track, and rotation and orientation information would be difficult to derive. That is, like-colored LEDs may be considered as “passive identity,” “actively powered,” “secondary markers.” In one implementation, however, the LEDs are given “active identity” characteristics by configuring them to pulse or blink according to identifiable temporal sequences.
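Giving LEDs an “active identity” by blinking can be illustrated with a small decoder. This is a sketch under stated assumptions: each LED repeats a fixed on/off code once per window of frames, the observation window may begin at any point in the cycle, and the codes in the codebook are distinct under rotation. The codebook contents, the brightness threshold, and the name `decode_blink_id` are hypothetical.

```python
def decode_blink_id(samples, codebook, threshold=0.5):
    """Identify an LED from its per-frame brightness samples.

    samples: normalized brightness values, one per frame, covering exactly
             one repetition of the blink code.
    codebook: dict mapping a tuple of 0/1 bits to an LED identity.
    Returns the identity, or None if no code matches.
    """
    bits = tuple(1 if s > threshold else 0 for s in samples)
    # The window may start mid-cycle, so try every cyclic rotation.
    for shift in range(len(bits)):
        rotated = bits[shift:] + bits[:shift]
        if rotated in codebook:
            return codebook[rotated]
    return None


# Hypothetical 5-frame codes for two LEDs.
codes = {(1, 1, 0, 0, 0): "led_A", (1, 0, 1, 0, 0): "led_B"}
identity = decode_blink_id([0.1, 0.9, 0.2, 0.8, 0.0], codes)
```

Once an LED is identified in this way, its label can be carried through subsequent frames, which addresses the difficulty of telling otherwise identical, like-colored LEDs apart.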
Motion capture cameras are then set up in a capture space. In one implementation, at least one HD MOCAP video camera is configured to be used for motion capturing the body of an actor (at box 1020), and at least one other HD MOCAP video camera is configured to be used for motion capturing the face of the actor (at box 1030). Further, a film camera is set up to capture the entire performance on a film plate. Then, at box 1040, body motion data and face motion data are captured substantially simultaneously. The captured body motion data and the facial motion data are integrated, at box 1050.
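The integration at box 1050 can be pictured, in its simplest form, as pairing body and facial frames that were captured at substantially the same time. The sketch below assumes each camera stream supplies timestamped frames sorted by time; the `tolerance` value, the record layout, and the name `integrate_streams` are assumptions for illustration and are not taken from this disclosure.

```python
from bisect import bisect_left


def integrate_streams(body_frames, face_frames, tolerance=0.02):
    """Pair each body frame with the nearest-in-time facial frame.

    body_frames, face_frames: lists of (timestamp_seconds, data),
    each sorted by timestamp.
    Returns a list of dicts holding the combined body and face data for
    every body frame that has a facial frame within `tolerance` seconds.
    """
    face_times = [t for t, _ in face_frames]
    merged = []
    for t, body in body_frames:
        i = bisect_left(face_times, t)
        best = None
        # The nearest facial frame is either just before or just after t.
        for j in (i - 1, i):
            if 0 <= j < len(face_times):
                if best is None or abs(face_times[j] - t) < abs(face_times[best] - t):
                    best = j
        if best is not None and abs(face_times[best] - t) <= tolerance:
            merged.append({"time": t, "body": body, "face": face_frames[best][1]})
    return merged


# Hypothetical streams; the string payloads stand in for marker data.
body = [(0.000, "b0"), (0.040, "b1"), (0.080, "b2")]
face = [(0.001, "f0"), (0.042, "f1"), (0.200, "f2")]
combined = integrate_streams(body, face)
```

In this example the third body frame is dropped because no facial frame falls within the tolerance, which is one simple way of enforcing the "substantially simultaneously" condition on the integrated output.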
In one implementation, 2-D tracking is performed using video motion data obtained with one HD camera to capture body motion. Known pattern markers on the body and limbs, for example, are tracked from frame to frame of the HD video data. Tracking the known patterns is facilitated by the high resolution available using the HD camera. In another implementation, two or more HD cameras are used, from which 2-D tracking may be performed. Additionally, 3-D tracking may be performed, including reconstructing a 3-D virtual space as described above, with additional benefits stemming from the high resolution of the HD cameras. Also, FACS type solving may enhance tracking and body model reconstruction in 3-D. A predefined skeleton model may be used to aid construction of a skeleton modeling the actual data obtained using multiple HD cameras to capture the body motion data.
In one implementation, a system implementing facial and body motion capture methods described in the foregoing is augmented with improved tracking methods. A multi-point tracker is implemented for tracking both the primary and secondary patterns. A solver then resolves the translation information from the secondary markers (secondary markers providing no rotation or orientation information), and the translations and rotations from the primary markers onto a skeleton model. The solver may be used to re-project the skeleton data and position information for the primary and secondary markers onto the original film plate. Thus, inconsistencies in tracking, labeling, and other stages of processing may be identified and/or rectified at an early stage by ensuring that the resolved data are in lock step with the images acquired on the film plate.
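The re-projection check described above can be sketched with a simple pinhole camera model. This is an illustrative example only: it assumes the solved marker positions are expressed in the film camera's coordinate frame, and that the focal length in pixels and the image center are known. The function names, the marker labels, and the numeric values are hypothetical.

```python
import math


def project_point(point, focal_px, center):
    """Project a 3-D point (camera coordinates, z forward) to pixels."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (focal_px * x / z + center[0], focal_px * y / z + center[1])


def reprojection_errors(solved_points, plate_points, focal_px, center):
    """Return the pixel distance between each re-projected marker and
    the position of the same marker observed on the film plate.

    solved_points: dict mapping label -> (x, y, z) from the solver.
    plate_points: dict mapping label -> (u, v) measured on the plate.
    """
    errors = {}
    for label, p3d in solved_points.items():
        projected = project_point(p3d, focal_px, center)
        errors[label] = math.dist(projected, plate_points[label])
    return errors


# Hypothetical solved positions and film plate observations.
solved = {"wrist": (0.5, 0.0, 2.0), "knee": (0.0, -1.0, 4.0)}
plate = {"wrist": (1210.0, 540.0), "knee": (963.0, 294.0)}
errors = reprojection_errors(solved, plate, 1000.0, (960.0, 540.0))
```

A marker whose error exceeds a chosen pixel budget would then be flagged for review, so that tracking or labeling inconsistencies surface early rather than after later processing stages.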
Various illustrative implementations of the present invention have been described. However, one of ordinary skill in the art will recognize that additional implementations are also possible and within the scope of the present invention. For example, known and identifiable random patterns may be printed, painted, or inked onto a surface of an actor or object. Further, any combination of printing, painting, inking, tattoos, quantum nanodots, and inherent body features may be used to obtain a desired pattern.
Accordingly, the present invention is not limited to only those embodiments described above.
Filing Document | Filing Date | Country | Kind | 371c Date
---|---|---|---|---
PCT/US08/75284 | 9/4/2008 | WO | 00 | 6/30/2010

Number | Date | Country
---|---|---
60969908 | Sep 2007 | US