Embodiments are generally related to systems for creating a graphical model of a motion capture subject from information collected from camera images and optionally from position sensors, and more specifically to a method for accurately calibrating motion capture images with a scale frame.
An articulated, movable graphical model of a person may be created by measuring the movements of a human body performing motions such as walking, flexing arms or legs, rotating the head, and so on. The graphical model may take the form of a biomechanical skeleton. The positions of a person's limbs and joints may be recorded and mapped onto the biomechanical skeleton to simulate the motions of a human being or other motion capture subject. An image of a character may be superimposed over the biomechanical skeleton in a scene in a video, computer game, or motion picture. A biomechanical skeleton may be articulated differently than a human skeleton, possibly by modeling fewer joints or by aggregating some complicated structures, such as a hand or foot, into a simpler model. For example, a foot in a biomechanical skeleton may lack individually movable toes.
Motion capture systems have used several different approaches for recording and measuring a subject's motions and determining parameters for a model such as a biomechanical skeleton. Some motion capture systems use triangulation to detect limb and joint positions in a camera image, for example by recording a scene with more than one camera simultaneously and comparing images captured by each camera with known camera positions, camera angles, and other factors to compute skeleton parameters such as limb length, limb angle, joint position, head position, head tilt and rotation angles, waist and torso positions and angles, and so on. Motion capture systems using triangulation may require space for mounting more than one camera outside the field of view representing a scene to be captured. Such systems may be very expensive to set up, difficult to calibrate, complicated to operate, and may require sophisticated post-acquisition data analysis to process images from different cameras, each with a different view of a scene and motion capture subject.
Some motion capture systems place one or more capture targets on a motion capture subject to provide reference positions or reference points for triangulation. The capture targets, for example reflective patches, reflective hemispheres, paint dots, and the like, may require intense illumination, illumination with infrared or other frequencies not visible to the human eye, cameras sensitive to infrared light, or other specialized photography equipment to be effective. Capture targets may interfere with the appearance or responses of the motion capture subject. A capture target may be blocked from the field of view of a camera when a motion capture subject moves about, possibly impairing accurate motion capture. For example, one or more capture targets may be occluded by another target, by a limb or other part of a motion capture subject's body, or by an object near the motion capture subject. Capture targets on the front of a person's torso may be obscured from the camera's position when the person turns his back to a camera, preventing accurate motion capture. Target occlusion is a well-known problem in prior art systems and leads to the use of more cameras, longer post-processing, and possibly artistic limitations in the scenes which can be created. The greater the number of capture targets a motion capture system uses for forming a graphical model, the more likely the occurrence of target occlusion from one or more cameras. Motion capture systems using triangulation of capture targets have been too complicated to set up and operate, too expensive, and too slow at mapping a biomechanical skeleton or a character into a scene for mass-market applications such as computer games.
Other motion capture systems attach one or more position sensors to limbs, joints or other reference positions to be represented in a graphical model of a motion capture subject. For motion capture systems previously known in the art, each separately movable portion of an articulated model may use a separate position sensor to measure the movement and position of the corresponding part of the subject's body. Parts of the subject's body which are collectively represented by one sensor may be positioned inaccurately in the resulting graphical model. For example, placing one sensor on a subject's wrist may allow a model wrist to mimic the subject's motions, but the model's elbow may not move the same way as the subject's elbow unless another sensor is placed on the subject's elbow.
Some motion capture systems require a person to wear an articulated frame for measuring angles between parts of a limb, spine, torso, or other parts of a person's body. Examples of articulated frames and biomechanical skeletons are described in U.S. Pat. No. 5,826,578, although articulated frames and biomechanical skeletons may take other forms. Articulated frames may be useful for measuring relative limb angles but do not provide direct measurement of translational changes in limb position, that is, displacements with a component of motion parallel to one or more of the three conventional spatial axes in a motion capture coordinate system. The articulated frame may be susceptible to damage during vigorous activity, may interfere with a person's speed of motion or impair a full range of motion, and may have a visual appearance that detracts from a preferred aesthetic effect in a camera image.
A biomechanical skeleton may model a motion capture subject as a combination of rigid links joined to one another by rotatable joints. An image from a camera of a motion capture subject may be analyzed to map selected locations in the image to joints and links in the biomechanical skeleton. Images may be combined with data from inertial measurement sensors, accelerometers, or articulated frames to assign positions and lengths to links, positions and angles to joints, and positions and postures for a biomechanical skeleton. However, sensors used for measuring position data, direction of motion, or angles may be subject to measurement error and drift. Measurement errors may be cumulative, especially for repetitive motions such as walking, leading to cumulative errors in the location of a biomechanical skeleton relative to other objects or to an absolute position reference, and possibly leading to errors in relative positions or angles between parts of the skeleton. Cumulative errors may cause an abrupt, undesirable jump in the position of a biomechanical skeleton or of part of the skeleton such as a foot or hand. Or, cumulative errors may cause a biomechanical skeleton to be positioned incorrectly in a scene, for example with part of a character's foot below the surface of the floor or with a character's hand intersecting the volume occupied by another solid object in the scene. Cumulative errors may prevent a biomechanical skeleton from achieving a preferred posture or arrangement of limbs or may locate the skeleton incorrectly relative to other objects in a scene. For example, a motion capture subject may rise from a chair, walk around a table, and return to the chair, but a biomechanical skeleton executing the same sequence may end the series of motions by stopping in a seated position in empty space near the chair or with part of a leg from the skeleton occupying the same volume as a solid part of the chair.
Motion capture accuracy, for example accurate determination of link lengths and joint positions in a biomechanical skeleton, may be improved by determining a distance from the motion capture subject to the camera used for recording images of the motion capture subject. Some motion capture systems use a noncontact distance measuring instrument that measures the time of flight of a radio frequency pulse or acoustic pulse to determine a separation distance between the camera and a reference location on a motion capture subject. The distance between the motion capture subject and the camera may be referred to as the camera-subject distance or the object distance. The distance measuring system may measure an incorrect camera-subject distance when the reference location is blocked from the field of view of the measuring instrument. Systems using triangulation may report incorrect camera-subject distances when a capture target on a motion capture subject is not visible from the viewing angle of a motion capture camera. For example, a person may interpose a hand between a motion capture camera and a reference position on the person's body, preventing the camera from viewing the reference position and preventing accurate motion capture.
An example of an apparatus embodiment includes a scale frame having at least three struts and at least four calibration markers. One of the at least four calibration markers is attached to an end of each of the at least three struts, and the at least three struts are joined at right angles to one another by one of the at least four calibration markers. The apparatus embodiment further includes a camera and a computer implemented in hardware. The computer is in data communication with the camera. The computer is adapted to receive an image from the camera, convert the image to a silhouette, and extract parameters for a biomechanical skeleton from the image. The apparatus embodiment optionally includes a motion capture sensor in data communication with the computer.
An example of a method embodiment includes positioning a camera facing a scale frame with an optical axis for a lens on the camera horizontal and directed at a front side of the scale frame; positioning a motion capture subject inside the scale frame; and recording at least two images, each image including the motion capture subject and the scale frame. The example of a method embodiment further includes converting a first image of the motion capture subject to a first silhouette image; converting a second image of the motion capture subject to a second silhouette image; assigning a first biomechanical reference location for a biomechanical skeleton from a comparison of the first silhouette image to the second silhouette image; and assigning a second biomechanical reference location for the biomechanical skeleton from a comparison of the first silhouette image to the second silhouette image. The example of a method embodiment also includes connecting a link in the biomechanical skeleton between the first and second biomechanical reference locations; assigning the projected length of the link from the positions of the first and second biomechanical reference locations measured from the first and second images of the motion capture subject; measuring the projected length of a selected strut on the scale frame in the first and second images; determining a true length of the link from the projected length of the link and the projected length of the strut in the first image and the projected length of the strut in the second image; and assigning the true length of the link to the biomechanical skeleton.
An embodiment, also referred to herein as a motion capture system or mocap system, employs one camera to record a sequence of images of a motion capture subject, for example a person. Images from the sequence may be processed to produce a corresponding sequence of silhouettes of the motion capture subject. Each silhouette is processed to assign values to parameters for a graphical model capable of accurately emulating selected body positions, postures, and motions performed by the motion capture subject. The graphical model, also referred to as a biomechanical skeleton, is calibrated by an apparatus embodiment to accurately simulate motions that could be made by the motion capture subject. An accurate camera-subject distance for each silhouette may be determined from the calibrated biomechanical skeleton. Depth cues may be determined for different parts of a silhouette from the biomechanical skeleton. One or more motion capture sensors may optionally be included to improve positioning accuracy of the biomechanical skeleton or graphical model in a scene. The motion capture sensors may provide a real-time position estimate of the motion capture subject while the subject's movements are being captured.
An apparatus embodiment includes a scale frame having known linear dimensions, used for measuring the lengths of objects in or near the frame and for calibrating images from a camera, and a computer system implemented as hardware to analyze images collected by the camera. By comparing known sizes of scale frame components to sizes of the same frame components measured in captured images, dimensions, angles, and positions of objects in the frame, adjacent the frame, or within a known distance of the frame in captured images may be determined accurately. Examples of parameters which may be determined accurately from an image of a motion capture subject include, but are not limited to, limb angles, limb length, joint position, limb and joint positions with respect to an absolute position reference, and distances traversed by the motion capture subject or by parts of the subject's body. After calibration is performed, the frame may be removed from the scene and distances traversed by the motion capture subject, positions of the subject relative to other objects in a scene, and positions of limbs and other parts of the person's body may be determined with high accuracy.
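By way of a non-limiting illustration, the comparison of a known strut length to the length of the same strut measured in a captured image may be sketched as follows. The sketch assumes that the strut and the measured object lie at about the same distance from the camera, so that a single scale factor applies to both, and that image lengths are measured in pixels. The function names, units, and numeric values are hypothetical and are not taken from the embodiments.

```python
def scale_factor(known_strut_length_m, strut_length_px):
    """Return metres per pixel from a strut of known length measured in an image."""
    if strut_length_px <= 0:
        raise ValueError("strut length in pixels must be positive")
    return known_strut_length_m / strut_length_px


def true_length(measured_px, known_strut_length_m, strut_length_px):
    """Convert an image measurement in pixels to a physical length in metres."""
    return measured_px * scale_factor(known_strut_length_m, strut_length_px)


# Hypothetical values: a 2.0 m strut spans 800 px and a forearm spans 104 px.
forearm_m = true_length(104, 2.0, 800)  # about 0.26 m
```

An object farther from or nearer to the camera than the strut would need a different scale factor, which is one reason the subject is kept close to the plane of the front side of the scale frame during calibration.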
Embodiments are capable of making a new, accurate measurement of camera-subject distance for each image of a motion capture subject in a sequence of camera images. A measured camera-subject distance may be compared to a calculated camera-subject distance to detect and remove accumulated errors in the position or posture of a biomechanical skeleton, thereby improving motion capture accuracy compared to motion capture systems previously known in the art.
Embodiments are well suited to real-time motion capture and display of mapped images. An embodiment is considered to be real time because capture, processing, and display steps can be performed on each frame in a sequence of image frames streamed at conventional video display rates in television images, computer games, and video recordings.
The model used in an embodiment, also referred to as an actor file, represents a person as an articulated biomechanical skeleton comprising rigid links joined to one another at biomechanical reference locations. A biomechanical reference location may also be referred to as a biomechanical joint centroid. Some biomechanical reference locations represent the position of a joint in a human skeleton, for example the position of a wrist joint, knee joint, or hip joint. Other biomechanical reference locations represent a length, width, or thickness of part of a human body, for example the length of the upper arm or the separation distance between two reference points on a spine. A biomechanical reference location may optionally represent a compound structure comprising more than one joint or more than one link. For example, a single biomechanical reference location may be assigned to represent a human hand. A biomechanical skeleton used in a model may have different articulation and possibly different connections between joints than a human skeleton.
Parameters to be supplied to an actor file are collected by recording a sequence of images from a person who follows a sequence of motions for each extremity to be captured in the actor file while maintaining close proximity to the scale frame. Following a sequence of isolated motions improves model accuracy and reduces cumulative error in the positions and angles of limbs and other body parts represented in the model. Each image to be analyzed is converted to a silhouette representing the edges of the motion capture subject's limbs, torso, head, and other parts of the subject's body. Biomechanical reference locations may be placed on each image at the ends of extremities, for example the top of a person's head or the bottom of the person's heel, at the centroid of each area determined to represent a skeletal joint on the motion capture subject, on a position selected to represent a complex structure such as a hand, or at any location on the biomechanical skeleton that may be used to represent the position of the person's body with respect to some external position reference, such as the origin of a coordinate system or the position of another object in the field of view of the camera. Embodiments may optionally be adapted to capture images and extract parameters for use in commercially available biomechanical models.
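By way of a non-limiting illustration, conversion of an image to a silhouette and location of the extremities at the top and bottom of the silhouette may be sketched as follows. The description does not specify how the silhouette is formed; the sketch assumes simple background subtraction on grayscale images held as nested lists, with a hypothetical threshold value. All names are illustrative.

```python
def silhouette(image, background, threshold=30):
    """Return a binary mask: 1 where a pixel differs from the background by more than threshold."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(img_row, bg_row)]
        for img_row, bg_row in zip(image, background)
    ]


def vertical_extremes(mask):
    """Return (top_row, bottom_row) of the silhouette, or None if the mask is empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    return rows[0], rows[-1]


# Hypothetical 5x4 scene: uniform background, subject occupying rows 1 to 3.
background = [[10] * 4 for _ in range(5)]
image = [row[:] for row in background]
for r in (1, 2, 3):
    image[r][1] = 200
    image[r][2] = 200

mask = silhouette(image, background)
top_row, bottom_row = vertical_extremes(mask)  # candidate head and heel rows
```

The top and bottom rows are candidates for reference locations such as the top of the head and the bottom of the heel; placing joint centroids would require further analysis that is not sketched here.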
An example of an apparatus in accord with an embodiment appears in
The camera 114 includes a lens 126 with an optical axis 128 positioned at a height 120 above a horizontal reference surface 156 parallel to the XY plane and tangent to the bottom side of a scale frame 102. The camera may optionally be mounted on an adjustable-height tripod 116 or similar camera support. The camera lens 126 is separated from a front side of the scale frame 102 by a separation distance 118. The optical axis in the example of
A motion capture subject 148 stands with back and legs straight inside an example of a scale frame 102 in
A computer 122 receives images 162 captured by the camera 114 over a data communications connection 124. The computer, a computing device implemented in hardware, includes volatile and nonvolatile memory, a central processing unit (CPU) comprising semiconductor devices, at least one data input device such as a keyboard or mouse, and an image display, for example a liquid crystal display, a plasma display, or a light-emitting diode display. Examples of a data communication connection between the computer 122 and camera 114 include, but are not limited to, a wired connection, a wireless connection, a computer network such as a local area network, and the Internet. Alternatively, the computer 122 may receive images from the camera 114 on nonvolatile computer-readable media such as an optical disk, a magnetic disk, magnetic tape, a memory stick, a solid-state disk, or the like.
In the example of
The calibration markers 106 at each corner of the scale frame may all have a same diameter 130 or may alternatively have different diameters. The diameter 130 may be selected to raise the bottom side of the scale frame sufficiently to permit a person's foot to slide under a strut 104, thereby permitting the person to position their legs and torso as close as possible to the plane of the front side of the scale frame, where the front side of the scale frame is the side closest to the camera 114 and approximately perpendicular to the optical axis 128 of the camera lens 126.
The scale frame 102 in the examples of
An image captured by the camera may be processed to extract parameters for an actor file.
A biomechanical reference location 152A may represent a complex combination of links and joints. For example, reference location 152A in
A length dimension may be assigned to each link from the coordinates of the biomechanical reference locations at opposite ends of the link. A calibrated biomechanical skeleton comprises a measured coordinate position for each biomechanical reference location in the skeleton, possibly referenced to a root position for the skeleton, and may include length and orientation properties for each link and optionally a range of angular motion for each link and joint. A position may be calculated for each biomechanical reference location at an end of a segment by comparing sequential images of the motion capture subject having a different rotational position of a distal end of the segment in each of the sequential images.
An embodiment may optionally include any one or more of the following steps for calibrating a biomechanical skeleton, in which spatial directions are defined with respect to the orientation of the x-, y-, and z-axes as shown in
positioning the camera 114 along the y-axis at a distance 118 selected to fit the motion capture subject 148 into the camera's field of view;
positioning the camera 114 at a height 120 of about half the height of the subject 148;
viewing a video image output from the camera 114 on the image display for the computer 122;
as shown in the example of
positioning the motion capture subject 148 with the right edge of the right foot in contact with the lower right front calibration marker 136 on the scale frame 102, the upper right front calibration marker 132 visible in the camera image, and the front of the subject's torso, hands, and legs as close to the plane of the front side of the scale frame as possible;
converting the image of the motion capture subject (for example image 162 in
overlaying an actor file, optionally an actor file compatible with a commonly used data format such as Biovision Hierarchical Data (BVH), on the silhouette;
optimizing the positioning of biomechanical reference locations on the silhouette 150;
as shown in the example of
repeating the step of optimizing the positioning of biomechanical reference locations;
wherein the step of optimizing further comprises any one or more of the following steps, singly or in combination, optionally performed with the right side of the torso facing the camera or alternatively with the subject in the initialization pose:
flapping hands around an axis parallel to the optical axis of the camera lens;
flapping upper arms around the axis parallel to the optical axis of the camera lens;
raising arms to a horizontal position, also referred to as a “T-pose”, and raising shoulder tips (clavioscapular) along the axis parallel to the optical axis of the camera lens while keeping arms parallel to the ground;
relaxing (dropping) shoulder tips;
returning to the T-pose without using clavicles and then rotating only the elbows around an axis parallel to the optical axis of the camera lens;
returning to the initialization pose;
rotating the head and neck about a horizontal axis parallel to the optical axis of the camera lens;
rotating the rib cage about a horizontal axis parallel to the optical axis of the camera lens;
moving the torso from the waist up in rotation about a horizontal axis parallel to the optical axis of the camera lens;
putting weight on the right foot, slipping the left foot out from under the strut at the front of the scale frame by bending the left knee and straightening the left knee after the left foot has passed the lower front left calibration marker;
raising and lowering the left upper thigh in rotation about a horizontal axis parallel to the optical axis of the camera lens;
raising the left foot slightly higher than the lower front left calibration marker and rotating the ankle;
putting weight on the left foot, slipping the right foot out from under the strut at the front of the scale frame by bending the right knee and straightening the right knee after the right foot has passed behind the lower front right calibration marker;
raising and lowering the right upper thigh in rotation about a horizontal axis parallel to the optical axis of the camera lens;
raising the right foot slightly higher than the lower front right calibration marker and rotating the ankle;
raising the right upper arm with relaxed clavicle joint, stiff elbow, and stiff wrist while keeping the arm parallel to the ground and thumbs pointed up, then rotating the wrist about an axis parallel to the optical axis of the camera lens;
raising the clavicle so the scapula rotates around a horizontal axis parallel to the optical axis of the camera lens;
keeping arms parallel to ground and pointing forward, pushing the clavicles forward and back, corresponding to rotation about the z axis;
slouching forward to rotate the ribcage around a horizontal axis parallel to the optical axis of the camera lens;
rotating the head and neck forward and back around a horizontal axis parallel to the optical axis of the camera lens;
bending from the waist forward around a horizontal axis parallel to the optical axis of the camera lens;
putting weight on the left foot, moving the right foot from under the front lower strut and raising the right leg from behind, raising the leg as far as possible without moving other limbs, around the right upper leg's pelvis joint;
without bumping the scale frame, raising the right leg without bending the knee joint, toes pointed up, preferably moving the leg in rotation about an axis parallel to the optical axis of the camera lens, and preferably keeping the torso and pelvis stationary;
with elbows locked, swinging both arms in rotation about a horizontal axis parallel to the optical axis of the camera lens;
raising the right knee so the thigh is parallel to the ground and motionless, then rotating the lower leg around a horizontal axis parallel to the optical axis of the camera lens;
subtracting the radius of motion for the lower leg from the radius of motion for the entire leg;
keeping the thigh parallel to the ground with the lower leg dangling down, rotating the ankle joint around a horizontal axis parallel to the optical axis of the camera lens;
activating a warning to the subject when unwanted motions in model segments are detected;
calculating a position for each joint from a radius created by the rotation of each segment's distal end; and
after calculating a position for each joint, assigning a length for each link between adjacent joints by comparing a measured distance between joints to a known dimension on the scale frame.
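By way of a non-limiting illustration, the step of calculating a joint position from the radius swept by a segment's distal end may be sketched as follows. The sketch assumes that the distal end moves in a plane parallel to the image plane and that three image positions of the distal end, taken from sequential silhouettes, are available; the joint is then taken to be the center of the circle through the three positions. A practical system would likely fit many positions rather than three. The function name and the coordinate values, given in image pixels as (x, z) pairs, are hypothetical.

```python
import math


def rotation_center(p1, p2, p3):
    """Return ((cx, cz), radius) of the circle through three 2-D points."""
    (x1, z1), (x2, z2), (x3, z3) = p1, p2, p3
    d = 2.0 * (x1 * (z2 - z3) + x2 * (z3 - z1) + x3 * (z1 - z2))
    if abs(d) < 1e-12:
        raise ValueError("points are collinear; no unique center of rotation")
    s1 = x1 * x1 + z1 * z1
    s2 = x2 * x2 + z2 * z2
    s3 = x3 * x3 + z3 * z3
    cx = (s1 * (z2 - z3) + s2 * (z3 - z1) + s3 * (z1 - z2)) / d
    cz = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return (cx, cz), math.hypot(x1 - cx, z1 - cz)


# Hypothetical positions of a fingertip in three sequential images (pixels).
joint, radius_px = rotation_center((150, 200), (100, 250), (50, 200))

# Subtracting radii of motion, as in the lower-leg step above
# (hypothetical radii in pixels).
whole_leg_radius_px = 410.0
lower_leg_radius_px = 200.0
upper_leg_px = whole_leg_radius_px - lower_leg_radius_px
```

The radii and link lengths obtained this way are in image pixels; comparison with a known dimension on the scale frame converts them to physical lengths.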
Conventional optical principles permit calculation of a value for camera-subject distance for an object having a known dimension from measurements of the corresponding dimension on an image of the object and parameters of an optical system used to make the image. For example, a value for camera-subject distance (also referred to as “object distance”) may be determined from values for image distance, image height, and object height, or from angular resolution values applicable to a particular combination of image sensor pixel size, pixel counts, and lens focal length. A camera-subject distance may be calculated by comparing the height of a silhouette in a camera image to the known height of a motion capture subject standing in an initialization pose, for example a posture with the back, legs, and neck straight. However, when a motion capture subject is in a posture with flexed knees, a bent neck, or a bent torso, as may occur during running, sitting, jumping, and so on, the measured height of a silhouette in an image collected in a prior-art motion capture system may not be related to the height measured when the subject was standing straight. Occlusion of a motion capture target may prevent a prior art motion capture system from making any determination of limb and joint positions and would therefore prevent determination of camera-subject distance.
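By way of a non-limiting illustration, the conventional relationship among object height, image height, image distance, and object distance may be sketched as follows. The sketch uses the lateral magnification relation, image height divided by object height equals image distance divided by object distance, and assumes that image height is obtained from a pixel count and a pixel pitch. The function names and numeric values are hypothetical.

```python
def image_height_from_pixels(pixel_count, pixel_pitch_m):
    """Return the height of an image feature on the sensor, in metres."""
    return pixel_count * pixel_pitch_m


def camera_subject_distance(object_height_m, image_height_m, image_distance_m):
    """Return object distance d_o = d_i * h_o / h_i (from h_i / h_o = d_i / d_o)."""
    if image_height_m <= 0:
        raise ValueError("image height must be positive")
    return image_distance_m * object_height_m / image_height_m


# Hypothetical values: a 1.8 m subject spans 900 pixels of 4 micrometre pitch,
# with the image formed 8 mm behind the lens.
h_i = image_height_from_pixels(900, 4e-6)
d_o = camera_subject_distance(1.8, h_i, 0.008)  # about 4.0 m
```

The calculation is only as good as the object height supplied to it, which is why a standing height becomes unreliable once the subject crouches, sits, or bends.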
Embodiments are capable of determining a value for camera-subject distance by using the calibrated biomechanical skeleton to compensate for the posture of the motion capture subject by calculating an accurate value of object height from scaled values of link lengths in a biomechanical skeleton overlaid on a silhouette of the subject. The value of object height that applies to the particular posture of the motion capture subject may be entered into a conventional lens formula with image height measured from a silhouette and image distance determined by camera parameters to calculate camera-subject distance. A scaling ratio may be determined by dividing the measured length of a link in a biomechanical skeleton overlaid on the silhouette by the true length of the corresponding link in the calibrated biomechanical skeleton. A separate scaling ratio may apply to each link in the biomechanical skeleton overlaid on the silhouette. Measurements of the length of each link's z-axis component (i.e., the component contributing to height) in the image may be scaled and summed to give an overall dimension in the direction of the z-axis for the motion capture subject that may be used with the image height measured from the silhouette to calculate a camera-subject distance.
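By way of a non-limiting illustration, one possible reading of the scaling and summing described above may be sketched as follows. The sketch assumes that each link lies roughly parallel to the image plane, so that dividing a link's z-axis component in the image by that link's scaling ratio gives the link's true contribution to height. The function name, the tuple layout, and the numeric values are hypothetical.

```python
def object_height(links):
    """Sum the true z-axis extent of each link for the current posture.

    links: iterable of (measured_length_px, measured_z_px, true_length_m).
    """
    total = 0.0
    for measured_length_px, measured_z_px, true_length_m in links:
        if measured_length_px <= 0 or true_length_m <= 0:
            raise ValueError("link lengths must be positive")
        # Scaling ratio: measured link length over true link length (px per metre).
        ratio = measured_length_px / true_length_m
        total += measured_z_px / ratio
    return total


# Hypothetical crouched posture: torso upright, thigh and shin inclined.
links = [
    (200.0, 200.0, 0.5),  # torso
    (160.0, 96.0, 0.4),   # thigh
    (160.0, 128.0, 0.4),  # shin
]
height_m = object_height(links)  # about 1.06 m
```

The resulting height, together with the image height measured from the silhouette and the image distance, may then be entered into the conventional lens relation to obtain a camera-subject distance for that posture.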
In
An alternative method embodiment includes one or more of the following steps:
capturing a first sequence of images of a motion capture subject;
for each image in the first sequence, determining a silhouette for the motion capture subject;
determining a calibrated biomechanical skeleton from the sequence of silhouettes;
capturing a second sequence of images of the motion capture subject, each image optionally at a different camera-subject distance and each image optionally representing a different posture of the motion capture subject from the preceding image; and
for each image in the second sequence of images:
forming a silhouette of the motion capture subject;
mapping an image biomechanical skeleton onto the silhouette;
determining an image height from the image biomechanical skeleton;
determining an object height from the image biomechanical skeleton and the calibrated biomechanical skeleton; and
calculating a camera-subject distance from the image height and object height for each image.
Some embodiments are capable of resolving positioning ambiguities resulting from similar link projections for different body postures by comparing biomechanical reference locations 152 to measurements from at least one motion capture sensor worn by a motion capture subject. An example of a motion capture sensor 170 in accord with an embodiment is shown in the prior art illustration of
Motion capture sensors may be worn by a motion capture subject at positions corresponding to biomechanical reference locations. For example, a motion capture sensor 170 is shown at the biomechanical reference location 152H in
Unless expressly stated otherwise herein, ordinary terms have their corresponding ordinary meanings within the respective contexts of their presentations, and ordinary terms of art have their corresponding regular meanings.
Number | Date | Country
---|---|---
61895052 | Oct 2013 | US