As an increasingly popular sport, golf has attracted millions of people around the world, and athletes and amateurs alike are always looking for ways to improve their skills. Sensor-based golf coaching systems are commercially available. One such system provides an IMU (inertial measurement unit) sensor, denoted as M-Tracer™, on the golf club. The sensor tracks the golf club and outputs a high-frequency swing trajectory as well as many other metrics, such as impact speed and shaft angle. Although sensor-based golf coaching systems provide useful information, it is still difficult for a typical user to understand that information and link it to his or her performance. It is within this context that the embodiments arise.
In some embodiments, a method for spatial alignment of golf-club inertial measurement data and a three-dimensional human model for golf club swing analysis is provided. The method includes capturing inertial measurement data of a golf club swing through an inertial measurement unit (IMU), and sending the inertial measurement data of the golf club swing from the inertial measurement unit to a computing device. The computing device is configured to determine a three-dimensional trajectory of the golf club swing in IMU coordinate space, determine in human model coordinate space a three-dimensional trajectory of an infrared marker in a video of the golf club swing with the video having depth or depth information, determine a transformation matrix from human model coordinate space to IMU coordinate space, perform spatial alignment of the three-dimensional trajectory of the golf club swing and a three-dimensional human model based on the video having depth or depth information, using the transformation matrix, and overlay a projected golf club trajectory onto the three-dimensional human model in a sequence representing the golf club swing.
In some embodiments, a method for spatial alignment of golf-club inertial measurement data and a three-dimensional human model for golf club swing analysis, performed by a computing device, is provided. The method includes receiving captured inertial measurement data of a golf club swing from an inertial measurement unit (IMU) and receiving or capturing a video of the golf club swing with depth or depth information. The method includes determining a three-dimensional trajectory in human model coordinate space of an infrared marker, based on detecting and tracking the infrared marker in the video with depth or depth information, and determining a three-dimensional trajectory in IMU coordinate space of the IMU attached to the golf club, from the inertial measurement data of the golf club swing. The method includes estimating a transformation matrix from the human model coordinate space to the IMU coordinate space, and overlaying a projected golf club trajectory onto a three-dimensional human model sequence of the golf club swing, based on spatial alignment of the inertial measurement data of the golf club swing and a three-dimensional human model, using the transformation matrix.
In some embodiments, a tangible, non-transitory, computer-readable medium is provided, having instructions thereupon which, when executed by a processor, cause the processor to perform a method. The method includes receiving, from an inertial measurement unit (IMU), inertial measurement data of a golf club swing, and receiving, from at least a camera, a video of the golf club swing having depth or depth information. The method includes determining, in human model coordinate space, a three-dimensional trajectory of an infrared marker, based on detecting and tracking the infrared marker in the video having depth or depth information, and determining, in IMU coordinate space, a three-dimensional trajectory of the IMU, based on the inertial measurement data of the golf club swing. The method includes determining a transformation matrix from the human model coordinate space to the IMU coordinate space, and overlaying a projected golf club trajectory, generated from the inertial measurement data of the golf club swing, onto a three-dimensional human model sequence of the golf club swing, generated from the video with depth or depth information, with the overlaying based on spatial alignment of the inertial measurement data of the golf club swing and a three-dimensional human model, using the transformation matrix.
Other aspects and advantages of the embodiments will become apparent from the following detailed description taken in conjunction with the accompanying drawings which illustrate, by way of example, the principles of the described embodiments.
The described embodiments and the advantages thereof may best be understood by reference to the following description taken in conjunction with the accompanying drawings. These drawings in no way limit any changes in form and detail that may be made to the described embodiments by one skilled in the art without departing from the spirit and scope of the described embodiments.
A golf coaching system for golf swing analysis performs spatial and temporal alignment of an inertial measurement unit (IMU) captured golf swing and a three-dimensional human model, based on a three-dimensional (3D) video of the golf swing, using an apparatus and various methods described herein. In one embodiment, the golf coaching system detects and tracks an infrared reflective marker, and uses this tracking for the spatial alignment. In another embodiment, the golf coaching system detects and tracks skeleton points, and uses this tracking for the spatial alignment. Also, the golf coaching system calculates an arm-golf club angle and an arm-floor angle for the golfer, from a three-dimensional human skeleton model based on the three-dimensional video and the spatial alignment with the IMU captured golf swing. These angles and various videos with overlays can be output by the system, for use in coaching a golfer. The methods can be performed on one or more processors, such as a processor of an IMU, a processor of a computing device and/or a processor of a mobile device (which could also be a computing device).
One device that is suitable for performing portions of various methods and serving as a portion of a suitable apparatus is the M-Tracer™ of the assignee, which is an IMU that can be mounted to a golf club. The M-Tracer™ is equipped with wireless communication and can send IMU data to another wireless device. Although embodiments are described herein using the M-Tracer™ as the IMU, it should be appreciated that variations and further embodiments are readily devised using other IMU systems, as the embodiments are not limited to the M-Tracer™ product.
The embodiments provide a method to automatically calibrate the IMU system and the 3D human model system. After calibration, the IMU system captured trajectory can be overlaid on top of the 3D video. The following is a method to spatially align the IMU system captured golf swing trajectory with a 3D human model based on 3D video captured by one or multiple depth sensors. Variations using other types of 3D video (e.g., stereo video) or 3D video captured by other types of cameras are readily devised, in keeping with the teachings herein. The method automatically estimates the transformation matrix from IMU system coordinate space to 3D human model coordinate space by aligning the IMU system swing trajectory with a detected IR (infrared) marker, which is attached to a hand or the golf club. Variations with other types of markers are readily devised. The method has the following steps, which are explained in more detail further below:
(1) Reconstruct the 3D human model from the 3D video and camera calibration parameters.
(2) Detect and track the IR reflective marker in each video frame.
(3) Form the 3D marker trajectory in human model coordinate space.
(4) Estimate the transformation from human model coordinate space to IMU system coordinate space, using the time bias between the IMU system trajectory and the 3D video.
(5) Overlay the IMU system trajectory onto the 3D human model sequence.
Although in theory only a rigid transformation exists between the two coordinate systems, a perfect alignment cannot always be achieved due to error from the IMU system trajectory as well as from marker detection. For a better visual alignment, a non-rigid transformation process can be followed in some embodiments.
Considering the fact that the IMU system trajectory may be inaccurate due to drifting error, a method to correct the IMU system trajectory includes the following steps:
(1) Build correspondences between the marker positions in human model coordinate space and in IMU system coordinate space.
(2) Estimate the transformation between the two coordinate spaces from the correspondences.
(3) Correct the observed IMU system trajectory according to a pre-defined error model.
(4) Repeat the estimation and correction until the alignment error converges.
In an action 202, a 3D human model is reconstructed, based on three-dimensional video 214 and camera calibration parameters 216. In an action 204, an IR reflective marker is detected and tracked in the 3D video 214. In an action 206, a 3D marker trajectory is formed, based on the detection and tracking. In an action 208, a transformation from human model space to IMU system space is estimated, based on the IMU system trajectory 220 and a time bias 218 between the IMU system trajectory 220 and the 3D video 214. In an action 210, the IMU system trajectory 220 is overlaid onto a 3-D human model sequence, from the reconstructed 3-D human model in the action 202 and based on the transformation developed in the action 208. The output of these actions is a 3D video sequence with IMU system trajectory overlaid 212. The above actions can be performed by a computing device, more specifically by a processor, and can be performed by various modules which could be implemented in software executing on a processor, hardware, firmware, or combinations thereof.
The inputs to the system are one or multiple RGBD videos, camera calibration parameters, temporal synchronization information represented as a time bias, and the IMU system trajectory. The system first reconstructs the human model from the RGBD videos and detects the IR reflective marker for each video frame. The 3D marker trajectory is then calculated by projecting the 2D (two-dimensional) marker location into 3D space with the known camera parameters. With the known temporal information, represented as the time bias between the first video frame and the first IMU system frame, correspondences of the marker location in both coordinate spaces are then built. With such point correspondences, the transformation between these two coordinate systems can thus be estimated. An algorithm is described below, in which an infrared (IR) reflective marker is used for detecting the location of the IMU on a golf club, in a video with depth or depth information, i.e., 3D video.
Reconstruction of a human model is described next. To capture RGBD video, a depth sensor is used in various embodiments. Due to advances in depth sensing technology, depth sensors are becoming more accessible and affordable for a wide range of users. Examples of RGBD sensors are given above, and the use of further types of depth sensors to capture video with depth or depth information is readily devised.
Given one or multiple RGBD signals, the reconstruction of a 3D object model generally includes the following steps:
(1) Convert each depth frame into a point cloud using the camera intrinsic parameters.
(2) Register the point clouds from the one or multiple sensors into a common coordinate space using the camera extrinsic parameters.
(3) Fuse the registered point clouds into a surface model.
(4) Texture the surface model with the RGB data.
Detection of an IR reflective marker is described below.
Estimating the three-dimensional location of the marker is described below. If only one depth sensor is used, the 3D location of the marker, denoted as P = [X, Y, Z]^T, can be directly obtained from its 2D location, denoted as q = [x, y]^T, with the known camera intrinsics, i.e.,

X = (x − C_x) · depth(x, y)/f_x, Y = (y − C_y) · depth(x, y)/f_y, Z = depth(x, y),

where C_x, C_y, f_x, f_y are the intrinsic parameters and depth(x, y) denotes the depth reading at location (x, y).
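As a minimal sketch of this single-sensor back-projection, assuming a depth map indexed as depth_map[row, column] that returns depth in the same units as the desired 3D coordinates:

```python
import numpy as np

def back_project(q, depth_map, fx, fy, cx, cy):
    """Lift a 2D pixel q = (x, y) to a 3D point P = [X, Y, Z]^T using the
    depth reading and the known camera intrinsics (pinhole model)."""
    x, y = q
    z = depth_map[int(y), int(x)]          # depth(x, y)
    X = (x - cx) * z / fx
    Y = (y - cy) * z / fy
    return np.array([X, Y, z])
```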
If multiple depth sensors are used to capture the swing simultaneously, the 3D location in human model coordinate space, denoted as Pt, is found by minimizing the re-projection error for all N sensors:
P_t = argmin_P Σ_{i=1}^{N} ‖q_{i,t} − π(K_i, T_i, P)‖²,
where q_{i,t} is the detected 2D marker location at frame t for the ith sensor, K_i and T_i are the intrinsic and extrinsic matrices of the ith sensor, and π is the projection operator transforming a 3D point from model coordinate space to the ith sensor image space. Various optimization procedures can be used to solve the above minimization problem to get the optimal 3D marker location P_t. One implementation of the optimization is detailed below with reference to
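One way to solve this minimization is with a generic nonlinear least-squares solver; the sketch below uses SciPy and assumes each extrinsic matrix T_i is a 4×4 transform from model coordinate space to camera space:

```python
import numpy as np
from scipy.optimize import least_squares

def project(P, K, T):
    """The projection operator pi: map a 3D model-space point into the
    image space of a sensor with intrinsics K and 4x4 extrinsics T."""
    Pc = T[:3, :3] @ P + T[:3, 3]          # model space -> camera space
    p = K @ Pc
    return p[:2] / p[2]                    # perspective divide

def estimate_marker_3d(q_list, K_list, T_list, P0):
    """Minimize the re-projection error over all N sensors.
    q_list: detected 2D marker locations, one per sensor;
    P0: an initial guess, e.g., a back-projection from one sensor."""
    def residuals(P):
        return np.concatenate([project(P, K, T) - q
                               for q, K, T in zip(q_list, K_list, T_list)])
    return least_squares(residuals, np.asarray(P0, dtype=float)).x
```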
Estimating the transformation is described next. Assuming the temporal synchronization is completed, e.g., by aligning an IMU system sampling frame to a video frame and/or optimizing such alignment, marker position correspondences can then be built. Let P_t, t = 1, . . . , N denote the 3D coordinates of the marker at time t represented in human model coordinate space, and let M_t denote the corresponding marker positions represented in M-Tracer™ coordinate space. Thus, the goal is to estimate a rigid transformation that includes a rotation R, a translation T and a scaling factor s such that P_t = sRM_t + T.
The closed-form solution of absolute orientation using unit quaternions is used to find s, R and T.
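A sketch of this closed-form solution, following Horn's quaternion-based method, assuming corresponding points supplied as (N, 3) arrays; the symmetric scale estimate used here is one common choice:

```python
import numpy as np

def absolute_orientation(M, P):
    """Find s, R, T such that P_t ~ s * R @ M_t + T (Horn's method).
    M, P: (N, 3) arrays of corresponding 3D points."""
    mu_M, mu_P = M.mean(axis=0), P.mean(axis=0)
    Mc, Pc = M - mu_M, P - mu_P                    # centered point sets
    S = Mc.T @ Pc                                  # 3x3 cross-covariance
    Sxx, Sxy, Sxz = S[0]
    Syx, Syy, Syz = S[1]
    Szx, Szy, Szz = S[2]
    # Symmetric 4x4 matrix whose top eigenvector is the optimal quaternion.
    Nm = np.array([
        [Sxx + Syy + Szz, Syz - Szy,       Szx - Sxz,        Sxy - Syx],
        [Syz - Szy,       Sxx - Syy - Szz, Sxy + Syx,        Szx + Sxz],
        [Szx - Sxz,       Sxy + Syx,      -Sxx + Syy - Szz,  Syz + Szy],
        [Sxy - Syx,       Szx + Sxz,       Syz + Szy,       -Sxx - Syy + Szz]])
    w, v = np.linalg.eigh(Nm)
    q0, q1, q2, q3 = v[:, -1]                      # unit quaternion, max eigenvalue
    R = np.array([
        [1 - 2*(q2*q2 + q3*q3), 2*(q1*q2 - q0*q3),     2*(q1*q3 + q0*q2)],
        [2*(q1*q2 + q0*q3),     1 - 2*(q1*q1 + q3*q3), 2*(q2*q3 - q0*q1)],
        [2*(q1*q3 - q0*q2),     2*(q2*q3 + q0*q1),     1 - 2*(q1*q1 + q2*q2)]])
    s = np.sqrt((Pc ** 2).sum() / (Mc ** 2).sum()) # symmetric scale estimate
    T = mu_P - s * (R @ mu_M)
    return s, R, T
```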
Although in theory only a rigid transformation exists between the two coordinate systems, a perfect alignment is difficult to achieve due to error from the M-Tracer™ trajectory as well as from marker detection. For a better visual alignment, a non-rigid transformation process can be followed. For instance, Gaussian Mixture Models provide a robust method that can handle noise and outliers well. The non-rigid transformation provides the final 3D location of the golf club head and grip in the human model coordinate space.
Correction of the IMU captured golf club trajectory is described next. The M-Tracer™ trajectory is not always accurate. As with any IMU sensor tracking algorithm based on integrating acceleration and rotational velocity, the IMU system trajectory suffers from drifting error, as any small error is accumulated through the integration process. The trajectory usually requires another piece of signal information to correct it. Thus, some embodiments of the system correct M-Tracer™ drifting error by using information from the marker trajectory. Let M̃_t denote the unknown true value of the M-Tracer™ trajectory, which can be obtained from the observed trajectory M_t according to a pre-defined error model M̃_t = F(M_t, ε_t), where ε_t denotes the error vector and F(.) defines the error model. For one embodiment, pseudo code for the method is illustrated below.
Input: P_t, t = 1, . . . , N, the marker 3D positions at time t represented in human model coordinate space; M_t, t = 1, . . . , N, the corresponding marker positions at time t represented in M-Tracer™ coordinate space.
Output: corrected trajectory M̃_t and transformation matrix Tr.
(1) Initialize M̃_t = M_t and the iteration counter n = 0.
(2) Estimate the transformation Tr between P_t and M̃_t using the closed-form absolute orientation solution.
(3) If the alignment error converges or a maximum number of iterations is reached, output the corrected trajectory M̃_t and the transformation matrix Tr;
otherwise
correct M̃_t according to the error model M̃_t = F(M̃_t, ε_t);
n = n + 1;
go to (2)
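A minimal sketch of this iterative correction, reusing the absolute_orientation() helper sketched earlier; the error_model callable and the convergence test are illustrative placeholders, since the disclosure does not fix a specific error model:

```python
import numpy as np

def correct_trajectory(P, M, error_model, max_iter=20, tol=1e-4):
    """Iteratively refine the observed IMU trajectory M against the marker
    trajectory P; error_model(M_tilde, residual) plays the role of F(.)."""
    M_tilde = M.copy()
    prev_err = np.inf
    for _ in range(max_iter):
        s, R, T = absolute_orientation(M_tilde, P)   # step (2)
        proj = (s * (R @ M_tilde.T)).T + T           # map into model space
        residual = P - proj
        err = np.linalg.norm(residual, axis=1).mean()
        if abs(prev_err - err) < tol:                # step (3): converged
            break
        M_tilde = error_model(M_tilde, residual)     # correct per error model
        prev_err = err
    return M_tilde, (s, R, T)
```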
The definition of the error model depends on the sensor properties; the embodiments are not limited to any specific error model. The above method can also be used to correct the club head trajectory if the golf club head can be detected and tracked in the RGBD video sequence.
For IMU two-dimensional location detection, an algorithm to detect the white ball marker works as follows. For each sensor:
(1) Build a background model BG and subtract it from each frame: I_j = I_j − BG.
(2) Detect circle candidates with the circle Hough transform and compute a detection confidence for each candidate; keep the candidate with the maximum confidence.
(3) If the confidence is above a threshold, accept the detection; else re-estimate the marker position by interpolating from neighboring high-confidence frames.
The next subsections explain the details of the algorithm to detect the location of the IMU system (e.g., a white IR reflective ball or other marker attached to the golf club) in NIR images of the 3D video.
Background subtraction is performed. In order to find the location of the marker (e.g., the white ball around the IMU system) in a frame, first a background model is constructed. The background model is the average of the frames:

BG = (1/(e − s + 1)) Σ_{i=s}^{e} I_i,

where s and e represent the first and last frames to average. The best range [s, e] is the range covering the fast-moving frames (e.g., golf club top position to impact). For each frame I_i, the background is subtracted, and the result is used for the next processing step:
I_i = I_i − BG.
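A minimal sketch of the background model and subtraction, assuming the NIR frames are supplied as a list of 2D arrays; clipping negative values at zero is an illustrative choice:

```python
import numpy as np

def subtract_background(frames, s, e):
    """Average frames s..e into a background model BG and subtract it
    from every frame: I_i = I_i - BG."""
    stack = np.stack(frames[s:e + 1]).astype(np.float32)
    BG = stack.mean(axis=0)                      # background model
    return [np.clip(f.astype(np.float32) - BG, 0, None) for f in frames]
```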
Circle detection and confidence determination are performed. First, edges in the given image are detected by finding pixels with high gradient magnitude. Then, the circle Hough transform is applied to find the center and radius of the ball marker candidates, i.e., [x_i^ball, y_i^ball, r_i^ball]. The detection confidence c_i of a circle i in image I is computed as

c_i = I * k_i,

where k_i is the kernel defined for circle i, [x_i^ball, y_i^ball, r_i^ball]. The ball marker [x^ball, y^ball] is thus detected as the circle candidate that has the maximum confidence value.
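A sketch of the candidate detection and confidence scoring, using OpenCV's circle Hough transform on an 8-bit NIR frame; the Hough parameters and the circle_kernel() definition are assumptions, since the disclosure leaves the kernel unspecified:

```python
import cv2
import numpy as np

def circle_kernel(r):
    """One plausible kernel for circle confidence: positive on the circle
    perimeter, slightly negative elsewhere, normalized to unit L1 mass."""
    size = 2 * r + 1
    yy, xx = np.mgrid[:size, :size] - r
    ring = np.abs(np.sqrt(xx**2 + yy**2) - r) < 1.5
    k = np.where(ring, 1.0, -0.1)
    return k / np.abs(k).sum()

def detect_ball_marker(img, r_min=5, r_max=30):
    """Return the circle candidate [x, y, r] with maximum confidence
    c_i = I * k_i, correlating an image patch with the circle kernel."""
    circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, dp=1, minDist=10,
                               param1=150, param2=20,
                               minRadius=r_min, maxRadius=r_max)
    if circles is None:
        return None, 0.0
    best, best_conf = None, -np.inf
    for x, y, r in circles[0]:
        k = circle_kernel(int(r))
        x0, y0 = int(x) - int(r), int(y) - int(r)
        patch = img[y0:y0 + k.shape[0], x0:x0 + k.shape[1]]
        if x0 < 0 or y0 < 0 or patch.shape != k.shape:
            continue                               # too close to the border
        conf = float((patch.astype(np.float32) * k).sum())
        if conf > best_conf:
            best, best_conf = (float(x), float(y), float(r)), conf
    return best, best_conf
```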
Detection refinement is performed. To refine the results, the system reviews the confidence values of the detected marker positions of all frames and re-estimates those frames with a low confidence value (below a given threshold) by interpolating the results from one or more neighboring frames that have a high confidence value, as illustrated in
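A minimal sketch of this refinement, assuming per-frame 2D marker positions and confidence values, with simple linear interpolation between high-confidence neighbors:

```python
import numpy as np

def refine_detections(positions, confidences, threshold):
    """Replace low-confidence detections by interpolating each coordinate
    from the nearest high-confidence frames."""
    pos = np.asarray(positions, dtype=np.float64)    # shape (N, 2)
    conf = np.asarray(confidences)
    good = conf >= threshold
    idx = np.arange(len(conf))
    for d in range(pos.shape[1]):                    # x and y separately
        pos[~good, d] = np.interp(idx[~good], idx[good], pos[good, d])
    return pos
```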
IMU system 3D location estimation can be performed by the system as follows. In order to find the 3D location of the IMU system from the 2D positions detected in the previous step, the re-projection error is minimized over all N sensors:

P_t = argmin_P Σ_{i=1}^{N} ‖q_{i,t} − π(K_i, T_i, P)‖²,
where q_{i,t} is the detected 2D point at time t for sensor i, K_i and T_i are the intrinsic and extrinsic matrices of sensor i, π is the projection operator, and P_t is the 3D coordinate of the marker in model coordinate space that is estimated from all sensors. The algorithm is explained below:
Step 1: Find an initial estimate of P_t.
Step 2: Find the final P_t that minimizes the re-projection error.
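A minimal sketch of these two steps, reusing the back_project() and estimate_marker_3d() helpers sketched earlier; choosing a single sensor (e.g., the most confident detection) for the initial estimate is an assumption:

```python
import numpy as np

def initial_estimate(q, depth_map, K, T):
    """Step 1: back-project one sensor's 2D detection and map the result
    from that sensor's camera space into model coordinate space."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    Pc = back_project(q, depth_map, fx, fy, cx, cy)   # camera-space point
    T_inv = np.linalg.inv(T)                          # camera -> model space
    return T_inv[:3, :3] @ Pc + T_inv[:3, 3]

# Step 2: refine over all sensors by minimizing the re-projection error:
# P_t = estimate_marker_3d(q_list, K_list, T_list, initial_estimate(...))
```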
A method performed by the golf coaching system to automatically calibrate the IMU system and the 3D human model system is described below. After calibration, the IMU system captured trajectory can be overlaid on top of the 3D video. The apparatus and method spatially align the IMU system captured golf swing trajectory with the 3D human model based on 3D video captured by one or multiple depth sensors. The method can automatically estimate the transformation matrix from IMU system coordinate space to 3D human model coordinate space by aligning the detected human skeleton points with the swing trajectory. One embodiment has the following steps:
(1) Reconstruct the 3D human model from the RGBD videos.
(2) Detect and track the skeleton points in each video frame.
(3) Extract the hand trajectory by averaging the left and right hand (or wrist) skeleton points.
(4) Estimate the transformation between the hand trajectory and the IMU system grip position trajectory.
(5) Overlay the IMU system trajectory onto the 3D human model sequence.
Although in theory only a rigid transformation exists between the two coordinate systems, a perfect alignment cannot always be achieved due to error from the IMU system trajectory as well as from skeleton detection. For a better visual alignment, a non-rigid transformation process can be followed.
Considering that the IMU system trajectory may be inaccurate due to drifting error, the system can perform a process to correct the IMU system trajectory that includes the following steps:
(1) Build correspondences between the hand positions from skeleton tracking and the grip positions from the IMU system.
(2) Estimate the transformation between the two coordinate spaces from the correspondences.
(3) Correct the observed IMU system trajectory according to a pre-defined error model.
(4) Repeat the estimation and correction until the alignment error converges.
One goal of the present embodiments is to align the IMU system trajectory with a 3D human model by estimating the transformation from the IMU system coordinate system to the 3D human model coordinate system. One goal of a further embodiment, described below, is to determine and output an angle between an arm of a golfer and the golf club, and also to determine and output an angle between the arm of the golfer and the floor. Knowledge of these angles is useful in coaching the golfer for improvement of the golf swing.
The inputs to the system are one or multiple RGBD videos, camera calibration parameters, the IMU system trajectory, and temporal synchronization information represented as a time bias between the first video frame and the first frame of the IMU system signal. The system reconstructs the human model from the RGBD videos and detects the skeleton points for each video frame. The hand trajectory is then extracted by averaging either the left and right hand or the left and right wrist skeleton points. With the known temporal information, the hand and IMU system grip position trajectory correspondences are then built. With such point correspondences, the transformation between these two coordinate systems can thus be estimated.
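A minimal sketch of the hand trajectory extraction; the joint names are placeholders for whatever identifiers the skeleton tracker outputs:

```python
import numpy as np

def hand_trajectory(skeletons, left="hand_left", right="hand_right"):
    """Average the left and right hand (or wrist) skeleton points per frame;
    skeletons is a sequence of {joint_name: (x, y, z)} dictionaries."""
    return np.array([(np.asarray(s[left]) + np.asarray(s[right])) / 2.0
                     for s in skeletons])
```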
Similarly to previously described embodiments, the system estimates a transformation. Assuming temporal synchronization is completed, e.g., using temporal synchronization information provided as a time bias, hand position correspondences can then be built. Let Pi, i=1, . . . , N denote the 3D coordinates of the hand at time i represented in human model coordinate space obtained from skeleton tracking, and let Mi denote the corresponding grip point positions represented in M-Tracer™ coordinate space. Thus the goal is to estimate a rigid transformation matrix that includes a rotation R, a translation T and a scaling factor s such that
P_i = sRM_i + T.
The closed-form solution of absolute orientation using unit quaternions [4] is used to find s, R and T.
Although in theory only a rigid transformation exists between the two coordinate systems, a perfect alignment is difficult to achieve due to error from the IMU system trajectory as well as from skeleton detection. For a better visual alignment, a non-rigid transformation process can be followed. For instance, Gaussian Mixture Models provide a robust method that can handle noise and outliers well. The non-rigid transformation provides the final 3D location of the golf club head and grip in the human model coordinate space.
The arm-club angle and the arm-floor angle are two important measurements that can help golfers improve their skills. With the calculated transformation matrix, the IMU system trajectory is projected to human model coordinate space. Let P_i^e and P_i^ha denote the 3D coordinates of the elbow position and hand position in the human model coordinate space obtained from skeleton tracking. Let M_i^h be the 3D coordinate of the club head position at time i output by the IMU system; its corresponding coordinate in human model space, denoted as P_i^h, can then be calculated as
P_i^h = sRM_i^h + T.
Then, the angle between arm and club, denoted as θ_ac, is defined as the angle between the club line [P_i^ha, P_i^h] and the arm line [P_i^ha, P_i^e]. This angle can be calculated as

θ_ac = arccos( ((P_i^h − P_i^ha) · (P_i^e − P_i^ha)) / (|P_i^h − P_i^ha| |P_i^e − P_i^ha|) ),

where (·) denotes the dot product operator and |·| is the norm operator.
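A minimal sketch of the arm-club angle computation from the three projected points:

```python
import numpy as np

def arm_club_angle(P_hand, P_elbow, P_club_head):
    """Angle (degrees) between the club line [hand, club head] and the
    arm line [hand, elbow]."""
    club = P_club_head - P_hand
    arm = P_elbow - P_hand
    cos_t = np.dot(club, arm) / (np.linalg.norm(club) * np.linalg.norm(arm))
    return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))
```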
To calculate the arm-floor angle, the floor plane normal, denoted as n, is defined. In one embodiment, Holocam technology is used by the system to reconstruct the human model. In the Holocam space definition, the z axis is defined as the normal to the floor plane pointing upward, i.e., n = [0, 0, 1]^T. If some other model reconstruction method is used and the floor plane is not explicitly defined, an embodiment of the system could estimate the floor plane using the 3D positions of the golf club head, left foot and right foot during address time. Let P_ad^lf and P_ad^rf denote the 3D coordinates of the left and right foot at address time in the human model coordinate space obtained from skeleton tracking. Let P_ad^h be the 3D coordinates of the golf club head at address time in human model coordinate space. The floor plane normal can thus be estimated as the normalized cross product

n = ((P_ad^lf − P_ad^h) × (P_ad^rf − P_ad^h)) / |(P_ad^lf − P_ad^h) × (P_ad^rf − P_ad^h)|.

Then, the angle between the arm and the floor plane, denoted as θ_af, can be calculated as the complement of the angle between the arm line and the floor normal, i.e.,

θ_af = arcsin( |(P_i^e − P_i^ha) · n| / (|P_i^e − P_i^ha| |n|) ).
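A minimal sketch of the floor normal estimation and the arm-floor angle, assuming the arm line direction runs from the hand to the elbow:

```python
import numpy as np

def floor_normal(P_left_foot, P_right_foot, P_club_head):
    """Estimate the floor normal at address time as the normalized cross
    product of two in-plane vectors, oriented upward."""
    n = np.cross(P_left_foot - P_club_head, P_right_foot - P_club_head)
    n = n / np.linalg.norm(n)
    return n if n[2] >= 0 else -n

def arm_floor_angle(P_hand, P_elbow, n):
    """Angle (degrees) between the arm line and the floor plane, i.e., the
    complement of the angle between the arm and the floor normal n."""
    arm = P_elbow - P_hand
    sin_t = abs(np.dot(arm, n)) / (np.linalg.norm(arm) * np.linalg.norm(n))
    return np.degrees(np.arcsin(np.clip(sin_t, 0.0, 1.0)))
```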
Various embodiments of the golf coaching system correct the IMU system trajectory. The IMU system trajectory is not always accurate. As with any IMU sensor tracking algorithm based on integrating acceleration and rotational velocity, the IMU system trajectory suffers from drifting error, as any small error is accumulated through the integration process. The trajectory may require another piece of signal information to correct it. Thus, some embodiments of the system correct IMU system drifting error by using information from the skeleton trajectory. Let M̃_i denote the unknown true value of the IMU system trajectory, which can be obtained from the observed trajectory M_i according to a pre-defined error model M̃_i = F(M_i, ε_i), where ε_i denotes the error vector and F(.) defines the error model. Pseudo code for one embodiment of the method is illustrated below.
Input: P_i, i = 1, . . . , N, the hand 3D positions at time i represented in human model coordinate space; M_i, i = 1, . . . , N, the corresponding grip point positions at time i represented in M-Tracer™ coordinate space.
Output: corrected trajectory M̃_i and transformation matrix Tr.
(1) Initialize M̃_i = M_i and the iteration counter t = 0.
(2) Estimate the transformation Tr between P_i and M̃_i using the closed-form absolute orientation solution.
(3) If the alignment error converges or a maximum number of iterations is reached, output the corrected trajectory M̃_i and the transformation matrix Tr;
otherwise
correct M̃_i according to the error model M̃_i = F(M̃_i, ε_i);
t = t + 1;
go to (2)
The definition of the error model depends on the sensor properties. In this disclosure, embodiments are not limited to any specific error model. The above method can also be used to correct club head trajectory if the golf club head can be detected and tracked in the RGBD video sequence.
The computing device 1008 has a 3D human model module 1016, a marker detection and tracking module 1018, a 3D marker trajectory module 1020, a transformation module 1022, an overlay module 1024, a skeleton point extraction and tracking module 1026, a hand trajectory module 1028, and an arm-club, arm-floor angle and trajectory module 1030. Each of these modules could be implemented in software executing on the processor 1012, hardware, firmware, or combinations thereof. These modules implement functions described above with reference to
It should be appreciated that the methods described herein may be performed with a digital processing system, such as a conventional, general-purpose computer system. Special purpose computers, which are designed or programmed to perform only one function, may be used in the alternative.
Display 1211 is in communication with CPU 1201, memory 1203, and mass storage device 1207, through bus 1205. Display 1211 is configured to display any visualization tools or reports associated with the system described herein. Input/output device 1209 is coupled to bus 1205 in order to communicate information in command selections to CPU 1201. It should be appreciated that data to and from external devices may be communicated through the input/output device 1209. CPU 1201 can be defined to execute the functionality described herein to enable the functionality described with reference to
Detailed illustrative embodiments are disclosed herein. However, specific functional details disclosed herein are merely representative for purposes of describing embodiments. Embodiments may be embodied in many alternate forms and should not be construed as limited to only the embodiments set forth herein.
It should be understood that although the terms first, second, etc. may be used herein to describe various steps or calculations, these steps or calculations should not be limited by these terms. These terms are only used to distinguish one step or calculation from another. For example, a first calculation could be termed a second calculation, and, similarly, a second step could be termed a first step, without departing from the scope of this disclosure. As used herein, the term “and/or” and the “/” symbol include any and all combinations of one or more of the associated listed items.
As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises”, “comprising”, “includes”, and/or “including”, when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Therefore, the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.
It should also be noted that in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two operations shown in succession may in fact be executed substantially concurrently or may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
With the above embodiments in mind, it should be understood that the embodiments might employ various computer-implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. Further, the manipulations performed are often referred to in terms, such as producing, identifying, determining, or comparing. Any of the operations described herein that form part of the embodiments are useful machine operations. The embodiments also relate to a device or an apparatus for performing these operations. The apparatus can be specially constructed for the required purpose, or the apparatus can be a general-purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general-purpose machines can be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
A module, an application, a layer, an agent or other method-operable entity could be implemented as hardware, firmware, or a processor executing software, or combinations thereof. It should be appreciated that, where a software-based embodiment is disclosed herein, the software can be embodied in a physical machine such as a controller. For example, a controller could include a first module and a second module. A controller could be configured to perform various actions, e.g., of a method, an application, a layer or an agent.
The embodiments can also be embodied as computer readable code on a tangible non-transitory computer readable medium. The computer readable medium is any data storage device that can store data, which can be thereafter read by a computer system. Examples of the computer readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over network-coupled computer systems so that the computer readable code is stored and executed in a distributed fashion. Embodiments described herein may be practiced with various computer system configurations including hand-held devices, tablets, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers and the like. The embodiments can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a wire-based or wireless network.
Although the method operations were described in a specific order, it should be understood that other operations may be performed in between described operations, described operations may be adjusted so that they occur at slightly different times or the described operations may be distributed in a system which allows the occurrence of the processing operations at various intervals associated with the processing.
In various embodiments, one or more portions of the methods and mechanisms described herein may form part of a cloud-computing environment. In such embodiments, resources may be provided over the Internet as services according to one or more various models. Such models may include Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). In IaaS, computer infrastructure is delivered as a service. In such a case, the computing equipment is generally owned and operated by the service provider. In the PaaS model, software tools and underlying equipment used by developers to develop software solutions may be provided as a service and hosted by the service provider. SaaS typically includes a service provider licensing software as a service on demand. The service provider may host the software, or may deploy the software to a customer for a given period of time. Numerous combinations of the above models are possible and are contemplated.
Various units, circuits, or other components may be described or claimed as “configured to” perform a task or tasks. In such contexts, the phrase “configured to” is used to connote structure by indicating that the units/circuits/components include structure (e.g., circuitry) that performs the task or tasks during operation. As such, the unit/circuit/component can be said to be configured to perform the task even when the specified unit/circuit/component is not currently operational (e.g., is not on). The units/circuits/components used with the “configured to” language include hardware—for example, circuits, memory storing program instructions executable to implement the operation, etc. Reciting that a unit/circuit/component is “configured to” perform one or more tasks is expressly intended not to invoke 35 U.S.C. 112, sixth paragraph, for that unit/circuit/component. Additionally, “configured to” can include generic structure (e.g., generic circuitry) that is manipulated by software and/or firmware (e.g., an FPGA or a general-purpose processor executing software) to operate in a manner that is capable of performing the task(s) at issue. “Configured to” may also include adapting a manufacturing process (e.g., a semiconductor fabrication facility) to fabricate devices (e.g., integrated circuits) that are adapted to implement or perform one or more tasks.
The foregoing description, for the purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the embodiments and their practical applications, to thereby enable others skilled in the art to best utilize the embodiments and various modifications as may be suited to the particular use contemplated. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.