The present invention relates generally to motion capture. More particularly, the present invention relates to a system and method for gathering and processing motion capture data.
Motion capture technology has been used for some time to generate data and images based on a user's body position and movement. More recently, this same motion capture technology has been combined with feedback systems to provide the user with position and movement correction.
Work in the area of motion guidance can include a range of motion capture and feedback methods and displays. However, many of these systems can be cumbersome, complicated to use, and prohibitively expensive for widespread use. In one example, a full-body vibrotactile display for correcting posture in sports uses spatial cues activated remotely by an instructor. In another, a tactile sleeve for collision avoidance and skill training in virtual environments uses infrared motion capture, and in a third, a wearable jacket with vibration motors for correcting posture and violin bowing technique uses inertial motion capture.
More particularly, one example includes an upper-body suit with voice coil actuators providing vibrotactile feedback for 5-DOF motion of the arm. Using an optical tracking system, a user's motion was compared to the desired trajectory of a teacher, and vibrotactile feedback proportional to joint error was delivered simultaneously to the wrist, elbow, and shoulder to help the user mimic the teacher's motion. Sensory saltation, requiring a sequence of spatially and temporally specific vibration pulses, signaled joint rotation. In a motor task, it was found that tactile feedback improved performance by 27%, most significantly at the flexion joints of the wrist and elbow, and further improved task learning by 23%.
In another example, the system focused on regulating the orientation of the forearm in Cartesian space, using a magnetic tracking system. Tactile feedback was delivered to the wrist via an instrumented bracelet with four motors oriented in quadrants. Vibration proportional to error was delivered in the direction of suggested movement. In the task of 2-DOF forearm orientation, a combination of visual and tactile feedback was found to be most effective, where the former dominated the initial trajectory and the latter corrected small-scale positioning error.
In another example, the spatially distributed tactile feedback sleeve incorporated sensing via a magnetic system and delivered feedback to 4 DOF of the arm. Eccentric mass motors provided localized, repulsive vibration that “pushed” users towards the correct posture or motion trajectory. Vibrational amplitude was proportional to errors in joint space and an angular tolerance dictated the acceptable joint regions. This system targeted apraxic stroke patients who may benefit from haptic versus visual stimuli in virtual reality interaction.
It would therefore be advantageous to provide a system and method that would gather motion capture data and provide feedback to the user in a cost-effective manner.
The foregoing needs are met, to a great extent, by the present invention, wherein in one aspect depth imaging data from a single range camera is analyzed in conjunction with data from inertial sensors to capture a user's position with six degrees-of-freedom, wherein three degrees-of-freedom are captured from the position and three degrees-of-freedom are captured from the orientation.
In accordance with one aspect of the present invention, a system for motion capture of a user includes a single range camera configured to collect depth and color data and inertial sensors configured to collect inertial motion data. The system can also include a computer readable medium configured to analyze data from the single range camera and the inertial sensors in order to determine a position of the user.
In accordance with another aspect of the present invention, the computer can be programmed to compare the position of the user to data for an exemplary position or motion. The data for an exemplary position can include data from a teacher also using the system for motion capture and predetermined data for the exemplary position stored on a computer readable medium. Additionally, the computer can be programmed to provide a calculation of the error between the position of the user and the data for the exemplary position. The system can also include a device for providing feedback to the user to correct the error between the position or motion of the user and the data for the exemplary position or motion, and the device for providing feedback to the user can include feedback in the form of at least one of the following: visual feedback, tactile feedback, vibrotactile feedback, and sound feedback.
In accordance with yet another aspect of the present invention, the system for motion capture can include an eccentric mass motor for providing feedback to the user. The inertial sensor takes the form of an accelerometer, and more particularly the accelerometer further takes the form of a 3-axis accelerometer. The inertial sensor is positioned on a body part of the user for which it is to collect inertial motion data. Specifically, the inertial sensors are positioned to collect inertial motion data for a whole body of the user, and can be mounted in bands to allow the inertial sensors to be secured to the user. The inertial sensors are further positioned at a distal-most point of a limb-segment of the user. The system for motion capture can also include a visual display. The visual display is further configured to show the user a visual representation of the user's movement and configured to show the user a visual representation of the user's movement relative to an exemplary movement. Additionally, the system for motion capture includes a microcontroller. The microcontroller is configured to control the inertial sensors as well as feedback to the user.
In accordance with another aspect of the present invention, a method for motion capture of a user includes collecting data related to the user's motion using a single range camera and collecting inertial motion data for the user's motion. The method also includes analyzing the data from the single range camera and the inertial motion data in order to determine a position of the user.
In accordance with yet another aspect of the present invention, the method further includes comparing the position of the user to data for an exemplary position. The method also includes calculating error between the position of the user and the data for the exemplary position. Additionally, the method includes providing feedback to the user to correct the error between the position of the user and the data for the exemplary position and providing visual feedback to the user.
The invention will now be described with reference to the drawing figures, in which like reference numerals refer to like parts throughout. An embodiment in accordance with the present invention provides a motion capture system that can be coupled with a feedback system. The system can be used for collecting motion capture data about a user's position and/or movement and comparing that data to an exemplary position or movement to determine errors in the position or movement. The system can include a single range camera to provide color and depth data coupled with inertial sensors to provide inertial measurements. Additionally, the system can provide feedback to the user to encourage correction of the errors the system finds with respect to the user's position and/or movement.
As illustrated in
Also illustrated in
Additionally, as illustrated in
Also, as illustrated in
As illustrated in
The data from the inertial sensors 30, transmitted by the microcontroller 31, is also processed by the host computer illustrated in
Complete 6-DOF joint sensing was realized via an algorithm that combines 3D single range camera position information with inertial sensor orientation data. The pose of each joint is described by a homogeneous transform (rotation matrix and translational vector) between world (single range camera) and joint frames. The rotation matrix is used to calculate joint error, whereas the translational vector is used to plot the joint coordinate frames at the proper location on the visual skeleton. Joint frames are shown during system testing only. The algorithms for measuring translation and rotation of the torso and limbs are described below.
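For illustration only, a non-limiting sketch (in Python with NumPy; function names are illustrative and not part of the described system) of this representation packs a joint's rotation matrix and translation vector into a single homogeneous transform, and inverts it in closed form:

```python
import numpy as np

def homogeneous(R, t):
    """Pack a 3x3 rotation R and a 3-vector translation t into the 4x4
    homogeneous transform between world and joint frames."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def invert(T):
    """Closed-form inverse of a rigid transform: rotation R^T,
    translation -R^T t."""
    R, t = T[:3, :3], T[:3, 3]
    return homogeneous(R.T, -R.T @ t)

# Composing a transform with its inverse recovers the identity.
T = homogeneous(np.eye(3), np.array([0.1, 0.2, 0.3]))
I = invert(T) @ T
```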
The translation vector from world frame, W, to joint frame, J, was measured directly by the single range camera and existing software in the form of x, y, and z joint coordinates:

WTJ = [xKinect yKinect zKinect]T  (1)
The rotation of the torso frame with respect to the world frame was computed from the torso center PC, neck PN, and left shoulder PLS skeleton points. It was assumed that the torso moved as a single plane connecting these points. The form of the torso rotation matrix was:
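As a non-limiting sketch of one plausible construction (the axis conventions below are assumptions for illustration, not taken from the described algorithm), an orthonormal torso frame spanning the plane of the three skeleton points might be built as follows:

```python
import numpy as np

def torso_rotation(p_c, p_n, p_ls):
    """Build a world-to-torso rotation matrix from three skeleton points:
    torso center p_c, neck p_n, and left shoulder p_ls. The torso is
    assumed to move as the single plane these points span; the axis
    assignments here are illustrative."""
    y = p_n - p_c                      # torso "up" axis
    y = y / np.linalg.norm(y)
    x = p_ls - p_c                     # provisional lateral axis
    z = np.cross(x, y)                 # plane normal
    z = z / np.linalg.norm(z)
    x = np.cross(y, z)                 # re-orthogonalized lateral axis
    return np.column_stack((x, y, z))  # columns are torso axes in world frame

R = torso_rotation(np.array([0.0, 0.0, 2.0]),
                   np.array([0.0, 0.3, 2.0]),
                   np.array([-0.2, 0.25, 2.0]))
```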
For both shoulder and elbow joints, the rotation of the joint frame with respect to the world frame, WRJ, was constructed from the multiplication of two rotation matrices individually derived from single range camera and inertial sensor data:
In this notation, W, J, and Ji represent the world, joint, and joint intermediate frames, respectively, and cosine and sine functions are abbreviated as c and s, respectively. WRJi is defined by the rotations about the world X-axis by θ and about the world Y-axis by φ; JiRJ describes rotation about the intermediate Z-axis by ζ. The Z-axis of the joint frame is constrained to be aligned with the limb axis, such that the Z-axis of the joint frame with respect to world frame, WZJ, relates to the limb axis by the rotation, WRJi:
WZJ = WRJi JiZJ  (9)
Here, WZJ = [px py pz]T is the normalized vector from joint n to joint n+1 and JiZJ = [0 0 1]T is the Z-axis of joint frame n. Substitution of (7) in equation (9) yields the relationship between the position vector and the third column of the rotation matrix, WRJi:
By equating the left and right sides of this equation, the equation can be sequentially solved for θ and φ:
θ = atan2(−py, pz)  (11)
φ = atan2(cθ px, pz)  (12)
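Equations (11) and (12) can be checked numerically. The following non-limiting sketch (Python/NumPy; names illustrative) rotates the joint Z-axis by known angles about the world X- and Y-axes and recovers those angles from the resulting limb vector:

```python
import numpy as np

def R_x(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def R_y(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def limb_angles(p):
    """Recover theta (rotation about world X) and phi (rotation about
    world Y) from the normalized limb vector p = [px, py, pz],
    per equations (11) and (12)."""
    px, py, pz = p / np.linalg.norm(p)
    theta = np.arctan2(-py, pz)                # equation (11)
    phi = np.arctan2(np.cos(theta) * px, pz)   # equation (12)
    return theta, phi

# Round trip: rotate the joint Z-axis by known angles, then recover them.
theta0, phi0 = 0.3, 0.5
p = R_x(theta0) @ R_y(phi0) @ np.array([0.0, 0.0, 1.0])
theta, phi = limb_angles(p)
```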
Inertial sensor data was used to directly identify the last rotation angle, ζ. Given that the total acceleration points downwards, in the −Y direction of the world frame, WA describes the total acceleration of magnitude A in the world frame, defined as WA = [0 −A 0]T. The reading from the three-axis inertial sensor is the total acceleration written in the joint frame, JA. The two are related by the rotation from world to joint frame:
JA = JRw WA  (13)
Further, by expressing JRw as (WRJ)T, equation (13) becomes:
The measured acceleration data is then equated to the scaled rotation matrix column:
To solve for ζ:
ζ = atan2(cθ ax − sφ sθ ay, sφ sθ ax + cθ ay)  (16)
Finally, the rotation matrix describing the world frame written in joint frame is calculated:
JRw=Rz(ζ)TRy(φ)T Rx(θ)T (17)
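A non-limiting numerical sketch of equations (16) and (17) follows (Python/NumPy; names illustrative). One caveat, which is an assumption of this sketch rather than part of the description above: the sign convention of the gravity vector must match the sensor's. An accelerometer at rest reports the upward specific force, so the synthetic reading below places the acceleration along +Y of the world frame, under which equation (16) recovers ζ exactly.

```python
import numpy as np

def R_x(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]], dtype=float)

def R_y(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]], dtype=float)

def R_z(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)

def solve_zeta(ax, ay, theta, phi):
    """Equation (16): recover the axial rotation zeta from the joint-frame
    accelerometer channels (ax, ay), given theta and phi."""
    ct, st, sp = np.cos(theta), np.sin(theta), np.sin(phi)
    return np.arctan2(ct * ax - sp * st * ay, sp * st * ax + ct * ay)

def world_in_joint(theta, phi, zeta):
    """Equation (17): JRw = Rz(zeta)^T Ry(phi)^T Rx(theta)^T."""
    return R_z(zeta).T @ R_y(phi).T @ R_x(theta).T

# Synthetic check: rotate the world gravity direction into the joint frame
# and recover zeta. Convention assumed here: the specific force of a
# resting sensor points along +Y of the world frame.
theta0, phi0, zeta0 = 0.2, 0.4, 0.6
a_joint = world_in_joint(theta0, phi0, zeta0) @ np.array([0.0, 9.81, 0.0])
zeta = solve_zeta(a_joint[0], a_joint[1], theta0, phi0)
```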
For every recorded time step, a frame is generated for each joint, including (1) torso, (2) right shoulder, (3) left shoulder, (4) right elbow, and (5) left elbow.
When a user is in a desired pose, a key press activates the recording of all joint frames. This recorded pose is assigned as the prior. For every future time step, comparisons are made between this prior and the current pose, denoted as the active. The joint angle error, JRerr, between prior and active frames is found for each joint:
where JRerr is the joint error, JRw is the active joint frame, and J{circumflex over (R)}w is the prior joint frame. For the calculation of torso error:
JRerr = TorsoRw W{circumflex over (R)}Torso  (20)
X-Y-Z fixed angles (γ, β, and α, respectively) are calculated directly from JRerr. For each time point when active is being compared to prior, a total of 15 angular errors are calculated.
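As a non-limiting sketch of the error computation and angle extraction (Python/NumPy; the fixed-angle convention R = Rz(α)Ry(β)Rx(γ) is an assumption consistent with standard X-Y-Z fixed angles, and the sample poses are illustrative):

```python
import numpy as np

def R_x(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]], dtype=float)

def R_y(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]], dtype=float)

def R_z(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)

def joint_error(JRw_active, JRw_prior):
    """Rotation error between the active joint frame and the recorded
    prior; the prior enters via its transpose, since the world-written-
    in-prior-joint rotation is the transpose of the prior JRw."""
    return JRw_active @ JRw_prior.T

def fixed_xyz_angles(R):
    """Extract X-Y-Z fixed angles (gamma, beta, alpha) assuming
    R = R_z(alpha) R_y(beta) R_x(gamma)."""
    beta = np.arctan2(-R[2, 0], np.hypot(R[0, 0], R[1, 0]))
    alpha = np.arctan2(R[1, 0], R[0, 0])
    gamma = np.arctan2(R[2, 1], R[2, 2])
    return gamma, beta, alpha

# A prior frame and an active frame differing by a known misalignment:
prior = R_z(0.7)                       # arbitrary recorded pose
err = R_z(0.1) @ R_y(0.2) @ R_x(0.3)   # known error rotation
active = err @ prior
gamma, beta, alpha = fixed_xyz_angles(joint_error(active, prior))
```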
For both
With respect to
With respect to the potential for correction and the examples shown in
To identify a maximum error, each joint DOF is assigned a threshold value that creates a deadband for angular misalignment, specifying when a joint is close enough to the target position. Angular thresholds, shown in Table 1, below, ranged from ±3° for the highly sensitive torso to ±15° for shoulder rotation about Z. For each iteration of the control loop, all above-threshold joint errors are identified.
Joint errors are then weighted relative to a baseline of 1, allowing the correction of some joints to be prioritized over others. For the application of standing postural correction, trunk stability and the stability of proximal joints are preferably prioritized. Torso errors are weighted most heavily, by applying a scaling factor of 4. Shoulder error is prioritized over wrist error, and wrist errors about the X and Y axes are weighted more heavily than wrist rotation about Z. The weighted values of all above-threshold errors are sorted, and the maximum is identified.
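A non-limiting sketch of this deadband-and-weighting selection follows (plain Python; the threshold and weight values below are placeholders in the spirit of Table 1, not the actual tabulated values; only the ±3° torso threshold, ±15° shoulder-Z threshold, and torso scaling factor of 4 come from the text):

```python
# Illustrative thresholds (degrees) and weights per joint DOF.
THRESHOLDS = {"torso_x": 3, "torso_y": 3, "torso_z": 3,
              "shoulder_x": 10, "shoulder_y": 10, "shoulder_z": 15,
              "wrist_x": 8, "wrist_y": 8, "wrist_z": 12}
WEIGHTS = {"torso_x": 4.0, "torso_y": 4.0, "torso_z": 4.0,
           "shoulder_x": 2.0, "shoulder_y": 2.0, "shoulder_z": 1.5,
           "wrist_x": 1.2, "wrist_y": 1.2, "wrist_z": 1.0}

def max_weighted_error(errors_deg):
    """Return the DOF whose weighted, above-threshold error is largest,
    or None when every joint is inside its deadband."""
    candidates = {
        dof: WEIGHTS[dof] * abs(err)
        for dof, err in errors_deg.items()
        if abs(err) > THRESHOLDS[dof]   # deadband: ignore small errors
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

# Example: a 4 deg torso error (weighted 16) outranks a 14 deg wrist error
# (weighted 14); the 5 deg shoulder error is inside its deadband.
dof = max_weighted_error({"torso_x": 4.0, "wrist_z": 14.0, "shoulder_x": 5.0})
```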
The appropriate motor or sequence of motors, described above with respect to
An experiment was performed to evaluate the accuracy of the combined single range camera and inertial sensor sensing by comparison to the Optotrak Certus optical tracking system (Northern Digital Inc., Waterloo, Canada). The Optotrak, widely used in research, provides 6-DOF sensing using body-mounted infrared LED markers and camera sensors. The system is accurate to within 0.1 mm, with a sensing resolution of 0.01 mm.
As illustrated in
A human user study of movement performance was also conducted with haptic feedback (H) in comparison to no feedback (N) and visual feedback (V). Subjects were trained on the present bands system and made targeted, single-DOF movements with the torso and arms across the three feedback modalities. The participants were 12 right-handed subjects (8 male and 4 female) between the ages of 22 and 41, with an average age of 27. To investigate the effect of hand dominance on movement performance, subjects were divided into two groups, each n=6, that performed arm movements using the dominant or non-dominant arm.
Prior to training, each subject was fit with the present bands system and positioned facing a visual display screen on which the single range camera was mounted, as shown in
After training, subjects performed the experiment with the four 1-DOF movements indicated in
In the recorded trials, natural movements to each target were made in random order; movement recording was ended when the subject vocalized that he believed he had reached the target. This task was repeated across the three feedback conditions, presented in random order to counteract any learning effects.
The procedure was repeated for each of the four movements with rest periods enforced to reduce fatigue. End-point error was measured for each trial. Error was computed as the absolute value of angular difference between current and target position at the end of movement and was averaged over the 4 targets. After trials, subjects completed a survey indicating the intuitiveness of the haptic feedback patterns and the comfort of the system.
The sensing error of the system ranged from approximately 1° to 12°.
Plots of average error for each movement and feedback condition are shown in
Results for lateral torso bend (F(1.05, 11.60) = 43.49, p = 0.000026, ε̂ = 0.5272) and torso twist (F(1.03, 11.34) = 17.66, p = 0.0013, ε̂ = 0.5153) showed a significant difference in average error across feedback conditions. Torso movements made with N produced the largest endpoint errors, between approximately 5° and 10°.
Pairwise F-tests of group differences with Bonferroni adjustment (αAdj = 0.0167) confirmed that average errors for the V and H conditions were significantly less than error with N. A two-tailed Student's t-test between average error for torso rotation and torso bend with H indicated comparable performance of single-axis and rotational cues (p = 0.121).
For arm movements, a mixed ANOVA test with ε̂ adjustment was performed, using arm group as the between-subjects factor (levels: dominant and non-dominant) and feedback condition as the within-subjects factor. Both main effects and their interaction were statistically analyzed.
In arm raise, arm dominance (p = 0.0365), feedback condition (p = 0.0003), and their interaction (p = 0.0294) were all significant factors influencing average endpoint error. To further measure the degree of effect, ω̂2 was calculated for each factor; feedback condition had the greatest influence on data variance (ω̂2cond = 0.6182).
Investigation of sensing accuracy showed that the system can most accurately capture torso movements as compared to shoulder and elbow movements. This difference in accuracy across joints may be explained by (1) inherent inertial sensor sensing limitations or (2) the method for computing relative joint angles using the Optotrak.
Torso position relies solely on the single range camera and vector computation, whereas shoulder and elbow measurements incorporate inertial sensor sensing that is susceptible to drift and dependent on limb position. While recorded movements were made with the arms stretched outwards to the user's front or side, inertial sensor sensing breaks down as the arm approaches a vertical pose. When completely vertical, the gravity vector projects only onto the longitudinal Z-axis of the arm, yielding too little information to determine orientation.
Potential alternatives for resolving inertial sensor error could involve more aggressive filtering or the implementation of an IMU which incorporates gyroscopes to determine rotation; the tradeoff is both higher cost and introduction of accumulated error.
More likely, discrepancies between single-range-camera-tracked body points and Optotrak marker locations led to an offset or scaling error, especially in forearm rotation. The position of the "wrist" tracked by the single range camera can actually represent a position localized within the point cloud of the hand. It is possible that the single range camera tracks a more distal position compared to the Optotrak wrist markers, resulting in further rotational misalignment. The Microsoft Kinect SDK (software development kit) released in July 2011 has the capability of tracking separate coordinates for the hand and wrist. Integrating this software in the next project iteration may resolve these accuracy flaws.
Further, in this initial test, error was based on a single recorded data point. More thorough investigation of error across multiple trials and movements that begin and end in the same position would better address both sensing accuracy and repeatability. For the application of postural correction, repeatability may be an important parameter, ensuring that the target pose remains constant in relation to the world frame over time. The Optotrak measurements may not be perfect, with a claimed accuracy of 0.1 mm.
With respect to human performance, the results of all single-DOF movements showed that without feedback, large errors are made across all joints, as users have imperfect proprioceptive memory of joint position. Users commented that without feedback, quick, but less accurate movements were made. The fact that haptic feedback was as effective as visual feedback in all movements indicates that in posture holding, where movement accuracy is a priority, the present bands system can be used as a substitute for visual feedback.
While endpoint error was the single metric under investigation in this study, user comments indicated a focus on the time-scale of response. With visual feedback the tendency was to “quickly overshoot the target and come back,” using visual cues for position refinement close to the target, but movements with haptic feedback were more “gradual and slow,” requiring users to wait for directional signals. Users felt that the haptic feedback introduced a delay in movement; one user suggested the presentation of a continuous vibration signal over pulsed vibration. These comments indicated that movement response time is another important variable for future exploration.
When rating intuitiveness of the haptic displays, users scored single-axis feedback at 8.8 out of 10, compared to 7.1 for rotational (saltatory) feedback. Responses for rotational feedback ranged from 4 to 10, indicating that some users found it very difficult to distinguish directionality (especially at the wrist), while others thought it was easy to follow. Large variance in user opinion suggests the need for tuning rotational speed to match individual sensitivity. While rotation was harder to learn for some, average error with rotational and single-axis haptic displays was comparable in torso movement, indicating user ability to adapt to rotational cues. Diagonal feedback about 2 axes was rated a 7.6, averaged across the 10 subjects that engaged this type of feedback. While the diagonal pattern had a favorable response, it was suggested that additional motors would improve the directional resolution of the feedback.
Arm dominance played a role in arm raise error but not in forearm rotation error. Forearm rotation required only small angular deviations of the wrist, whereas arm raise required displacement of the whole arm mass, which may trigger greater activation of joint sensors and proprioceptive cues. Within the arm raise movement, arm dominance improved error in the no feedback condition. This result contradicts previous studies that found no significant difference in proprioception between dominant and non-dominant shoulders for normal subjects. It is possible that the added mass or the setup of the experiment led to this difference. When only haptic feedback was considered, arm dominance did not have a significant effect, suggesting that vibrotactile cues are interpreted equivalently by both arms.
Improved accuracy and positive user feedback in the human performance study suggest that the present bands system may be an effective tool in guiding pose. Subjects found the system to be favorable in terms of overall comfort (7.9) and ability to move without restriction (8.5). Future testing is planned to evaluate the effectiveness of haptic feedback in maintaining challenging user poses, such as those required for yoga. Two considerations in the design of this study will be refinement of the error correction algorithm and implementation of directional-based visual feedback. According to one user, haptic display for a single joint axis was frustrating when he got "stuck" trying to correct a single error despite the drift of other joints. This suggests an alternative correction scheme where joint errors may be corrected in groups. Presentation of grouped errors would still reduce the high cognitive load required to attend to multiple joints at once. Secondly, in comparing the feedback modalities, visual feedback included only a color change to indicate joint misalignment, whereas haptic feedback included a directionality component that relayed how to move to correct error. According to post-training comments, visual feedback without such directionality was more difficult to interpret. In future studies, directional-based visual cues are necessary as a better comparison to vibrotactile feedback, perhaps in the form of a color-based paradigm.
The novel algorithm for the fusion of single range camera and inertial sensor sensing was implemented and found to be accurate for capturing torso, shoulder, and elbow joint rotation to within 4°, 12°, and 10°, respectively. In the scope of this implementation, pose correction is most accurate when the wrist remains fixed and body segments remain un-occluded. Human performance testing indicated that haptic feedback was as effective in reducing targeting error as visual feedback. Improved movement accuracy across all joints was observed using vibrotactile feedback.
The next step in system testing and design is investigating the effectiveness of haptic feedback in a pose-maintenance task. Though there are many potential applications for a posture-guidance system, a possible avenue for application-based testing will be instruction for yoga pose. Collaboration with yoga experts is already in progress.
The bands system can also be implemented as a full body wearable system that includes the legs and feet. Resolution of inertial sensor sensing limitations, integration of additional control hardware, and implementation of wireless solutions will guide the expansion of the system to the lower body. Haptic display will continue to be refined through the use of more sophisticated tactors and control that address the user feedback collected in this study. It is also possible that, in a classroom learning environment, the single range camera's ability to simultaneously track both teacher and student in real time could be used to compare the student's pose to that of a teacher, instead of to the student's own prior.
While this system has been described for use with positions and postures, such as for yoga, it is possible that this system could be applied to provide motion capture and feedback for numerous activities, including but not limited to: sports, rehabilitation, dance, occupational therapy, physical therapy, entertainment, and gaming.
The many features and advantages of the invention are apparent from the detailed specification, and thus, it is intended by the appended claims to cover all such features and advantages of the invention which fall within the true spirit and scope of the invention. Further, since numerous modifications and variations will readily occur to those skilled in the art, it is not desired to limit the invention to the exact construction and operation illustrated and described, and accordingly, all suitable modifications and equivalents may be resorted to, falling within the scope of the invention.
This application claims the benefit of U.S. Provisional Patent Application No. 61/604,244 filed Feb. 28, 2012, which is incorporated by reference herein, in its entirety.