As an increasingly popular sport, golf has attracted millions of people around the world, and athletes and amateurs alike are always looking for ways to improve their skills. Sensor-based golf coaching systems are commercially available. One such system provides an IMU (inertial measurement unit) sensor, denoted as M-Tracer™, on the golf club. The sensor tracks the golf club and outputs a high-frequency swing trajectory as well as many other metrics, such as impact speed and shaft angle. Although sensor-based golf coaching systems provide useful information, it is still difficult for a typical user to understand that information and link it to his or her performance. It is within this context that the embodiments arise.
In some embodiments, a method for spatial alignment of golf-club inertial measurement data and a three-dimensional human model for golf club swing analysis is provided. The method includes capturing inertial measurement data of a golf club swing through an inertial measurement unit (IMU), and sending the inertial measurement data of the golf club swing from the inertial measurement unit to a computing device. The computing device is configured to determine a three-dimensional trajectory of the golf club swing in IMU coordinate space, determine in human model coordinate space a three-dimensional trajectory of an infrared marker in a video of the golf club swing with the video having depth or depth information, determine a transformation matrix from human model coordinate space to IMU coordinate space, perform spatial alignment of the three-dimensional trajectory of the golf club swing and a three-dimensional human model based on the video having depth or depth information, using the transformation matrix, and overlay a projected golf club trajectory onto the three-dimensional human model in a sequence representing the golf club swing.
In some embodiments, a method for spatial alignment of golf-club inertial measurement data and a three-dimensional human model for golf club swing analysis, performed by a computing device, is provided. The method includes receiving captured inertial measurement data of a golf club swing from an inertial measurement unit (IMU) and receiving or capturing a video of the golf club swing with depth or depth information. The method includes determining a three-dimensional trajectory in human model coordinate space of an infrared marker, based on detecting and tracking the infrared marker in the video with depth or depth information, and determining a three-dimensional trajectory in IMU coordinate space of the IMU attached to the golf club, from the inertial measurement data of the golf club swing. The method includes estimating a transformation matrix from the human model coordinate space to the IMU coordinate space, and overlaying a projected golf club trajectory onto a three-dimensional human model sequence of the golf club swing, based on spatial alignment of the inertial measurement data of the golf club swing and a three-dimensional human model, using the transformation matrix.
In some embodiments, a tangible, non-transitory, computer-readable medium is provided, having instructions thereupon which, when executed by a processor, cause the processor to perform a method. The method includes receiving, from an inertial measurement unit (IMU), inertial measurement data of a golf club swing, and receiving, from at least a camera, a video of the golf club swing having depth or depth information. The method includes determining, in human model coordinate space, a three-dimensional trajectory of an infrared marker, based on detecting and tracking the infrared marker in the video having depth or depth information, and determining, in IMU coordinate space, a three-dimensional trajectory of the IMU, based on the inertial measurement data of the golf club swing. The method includes determining a transformation matrix from the human model coordinate space to the IMU coordinate space, and overlaying a projected golf club trajectory, generated from the inertial measurement data of the golf club swing, onto a three-dimensional human model sequence of the golf club swing, generated from the video with depth or depth information, with the overlaying based on spatial alignment of the inertial measurement data of the golf club swing and a three-dimensional human model, using the transformation matrix.
Other aspects and advantages of the embodiments will become apparent from the following detailed description taken in conjunction with the accompanying drawings which illustrate, by way of example, the principles of the described embodiments.
The described embodiments and the advantages thereof may best be understood by reference to the following description taken in conjunction with the accompanying drawings. These drawings in no way limit any changes in form and detail that may be made to the described embodiments by one skilled in the art without departing from the spirit and scope of the described embodiments.
A golf coaching system for golf swing analysis performs spatial and temporal alignment of an inertial measurement unit (IMU) captured golf swing and a three-dimensional human model, based on a three-dimensional (3D) video of the golf swing, using an apparatus and various methods described herein. In one embodiment, the golf coaching system detects and tracks an infrared reflective marker, and uses this tracking for the spatial alignment. In another embodiment, the golf coaching system detects and tracks skeleton points, and uses this tracking for the spatial alignment. Also, the golf coaching system calculates an arm-golf club angle and an arm-floor angle for the golfer, from a three-dimensional human skeleton model based on the three-dimensional video and the spatial alignment with the IMU captured golf swing. These angles and various videos with overlays can be output by the system, for use in coaching a golfer. The methods can be performed on one or more processors, such as a processor of an IMU, a processor of a computing device and/or a processor of a mobile device (which could also be a computing device).
One device that is suitable for performing portions of various methods and serving as a portion of a suitable apparatus is the M-Tracer™ of the assignee, which is an IMU that can be mounted to a golf club. The M-Tracer™ is equipped with wireless communication and can send IMU data to another wireless device. Although embodiments are described herein using the M-Tracer™ as the IMU, it should be appreciated that variations and further embodiments are readily devised using other IMU systems, as the embodiments are not limited to the M-Tracer™ product.
The embodiments provide a method to automatically calibrate the IMU system and the 3D human model system. After calibration, the IMU system captured trajectory can be overlaid on top of the 3D video. The following is a method to spatially align the IMU system captured golf swing trajectory with a 3D human model based on 3D video captured by one or multiple depth sensors. Variations using other types of 3D video (e.g., stereo video) or 3D video captured by other types of cameras are readily devised, in keeping with the teachings herein. The method automatically estimates the transformation matrix from IMU system coordinate space to 3D human model coordinate space by aligning the IMU system swing trajectory with a detected IR (infrared) marker, which is attached to a hand or the golf club. Variations with other types of markers are readily devised. The method has the following steps, which are explained in more detail further below:
(1) Reconstruct the 3D human model from the 3D video and camera calibration parameters.
(2) Detect and track the IR reflective marker in each video frame.
(3) Form the 3D marker trajectory in human model coordinate space.
(4) Estimate the transformation from human model coordinate space to IMU system coordinate space, using the time bias between the IMU system trajectory and the 3D video.
(5) Overlay the IMU system trajectory onto the 3D human model sequence.
Although in theory only a rigid transformation exists between the two coordinate systems, a perfect alignment cannot always be achieved due to error from the IMU system trajectory as well as from marker detection. For a better visual alignment, a non-rigid transformation process can be followed in some embodiments.
Considering the fact that the IMU system trajectory may be inaccurate due to drifting error, a method to correct the IMU system trajectory includes the following steps:
(1) Build correspondences between the marker positions in human model coordinate space and in IMU system coordinate space.
(2) Estimate the transformation between the two coordinate spaces from the correspondences.
(3) Correct the observed IMU system trajectory according to a pre-defined error model.
(4) Repeat the estimation and correction until the alignment error converges.
In an action 202, a 3D human model is reconstructed, based on three-dimensional video 214 and camera calibration parameters 216. In an action 204, an IR reflective marker is detected and tracked in the 3D video 214. In an action 206, a 3D marker trajectory is formed, based on the detection and tracking. In an action 208, a transformation from human model space to IMU system space is estimated, based on the IMU system trajectory 220 and a time bias 218 between the IMU system trajectory 220 and the 3D video 214. In an action 210, the IMU system trajectory 220 is overlaid onto a 3-D human model sequence, from the reconstructed 3-D human model in the action 202 and based on the transformation developed in the action 208. The output of these actions is a 3D video sequence with IMU system trajectory overlaid 212. The above actions can be performed by a computing device, more specifically by a processor, and can be performed by various modules which could be implemented in software executing on a processor, hardware, firmware, or combinations thereof.
The inputs to the system are one or multiple RGBD videos, camera calibration parameters, temporal synchronization information represented as a time bias, and the IMU system trajectory. The system first reconstructs the human model from the RGBD videos and detects the IR reflective marker for each video frame. The 3D marker trajectory is then calculated by projecting the 2D (two-dimensional) marker location into 3D space with the known camera parameters. With the known temporal information, represented as the time bias between the first video frame and the first IMU system frame, correspondences of the marker location in both coordinate spaces are then built. With such point correspondences, the transformation between these two coordinate systems can thus be estimated. An algorithm is described below, in which an infrared (IR) reflective marker is used for detecting the location of the IMU on a golf club, in a video with depth or depth information, i.e., 3D video.
Reconstruction of a human model is described next. To capture RGBD video, a depth sensor is used in various embodiments. Due to advances in depth sensing technology, depth sensors are becoming more accessible and affordable for a wide range of users. Examples of RGBD sensors are given above, and the use of further types of depth sensors to capture video with depth or depth information is readily devised.
Given one or multiple RGBD signals, the reconstruction of a 3D object model generally includes the following steps:
(1) Convert each depth frame into a point cloud using the camera intrinsic parameters.
(2) Register the point clouds from the one or multiple sensors into a common coordinate space using the camera extrinsic parameters.
(3) Fuse the registered point clouds into a surface model.
(4) Texture the surface model with the RGB data.
Detection of an IR reflective marker is described below.
Estimating the three-dimensional location of the marker is described below. If only one depth sensor is used, the 3D location of the marker, denoted as P = [X, Y, Z]^T, can be directly obtained from its 2D location, denoted as q = [x, y]^T, with the known camera intrinsics, i.e.,

X = (x − C_x) · depth(x, y)/f_x, Y = (y − C_y) · depth(x, y)/f_y, Z = depth(x, y),

where C_x, C_y, f_x, f_y are the intrinsic parameters and depth(x, y) denotes the depth reading at location (x, y).
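As a minimal sketch of this single-sensor back-projection, assuming a depth map indexed as depth_map[row, column] that returns depth in the same units as the desired 3D coordinates:

```python
import numpy as np

def back_project(q, depth_map, fx, fy, cx, cy):
    """Lift a 2D pixel q = (x, y) to a 3D point P = [X, Y, Z]^T using the
    depth reading and the known camera intrinsics (pinhole model)."""
    x, y = q
    z = depth_map[int(y), int(x)]          # depth(x, y)
    X = (x - cx) * z / fx
    Y = (y - cy) * z / fy
    return np.array([X, Y, z])
```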
If multiple depth sensors are used to capture the swing simultaneously, the 3D location in human model coordinate space, denoted as Pt, is found by minimizing the re-projection error for all N sensors:
P_t = argmin_P Σ_{i=1}^{N} ‖q_{i,t} − π(K_i, T_i, P)‖²,
where q_{i,t} is the detected 2D marker location at frame t for the ith sensor, K_i and T_i are the intrinsic and extrinsic matrices of the ith sensor, and π is the projection operator transforming a 3D point from model coordinate space to the ith sensor image space. Various optimization procedures can be used to solve the above minimization problem to get the optimal 3D marker location P_t. One implementation of the optimization is detailed below with reference to
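One way to solve this minimization is with a generic nonlinear least-squares solver; the sketch below uses SciPy and assumes each extrinsic matrix T_i is a 4×4 transform from model coordinate space to camera space:

```python
import numpy as np
from scipy.optimize import least_squares

def project(P, K, T):
    """The projection operator pi: map a 3D model-space point into the
    image space of a sensor with intrinsics K and 4x4 extrinsics T."""
    Pc = T[:3, :3] @ P + T[:3, 3]          # model space -> camera space
    p = K @ Pc
    return p[:2] / p[2]                    # perspective divide

def estimate_marker_3d(q_list, K_list, T_list, P0):
    """Minimize the re-projection error over all N sensors.
    q_list: detected 2D marker locations, one per sensor;
    P0: an initial guess, e.g., a back-projection from one sensor."""
    def residuals(P):
        return np.concatenate([project(P, K, T) - q
                               for q, K, T in zip(q_list, K_list, T_list)])
    return least_squares(residuals, np.asarray(P0, dtype=float)).x
```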
Estimating the transformation is described next. Assuming the temporal synchronization is completed, e.g., by aligning an IMU system sampling frame to a video frame and/or optimizing such alignment, marker position correspondences can then be built. Let P_t, t = 1, . . . , N denote the 3D coordinates of the marker at time t represented in human model coordinate space, and let M_t denote the corresponding marker positions represented in M-Tracer™ coordinate space. Thus, the goal is to estimate a rigid transformation that includes a rotation R, a translation T and a scaling factor s such that P_t = sRM_t + T.
The closed-form solution of absolute orientation using unit quaternions is used to find s, R and T.
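A sketch of this closed-form solution, following Horn's quaternion-based method, assuming corresponding points supplied as (N, 3) arrays; the symmetric scale estimate used here is one common choice:

```python
import numpy as np

def absolute_orientation(M, P):
    """Find s, R, T such that P_t ~ s * R @ M_t + T (Horn's method).
    M, P: (N, 3) arrays of corresponding 3D points."""
    mu_M, mu_P = M.mean(axis=0), P.mean(axis=0)
    Mc, Pc = M - mu_M, P - mu_P                    # centered point sets
    S = Mc.T @ Pc                                  # 3x3 cross-covariance
    Sxx, Sxy, Sxz = S[0]
    Syx, Syy, Syz = S[1]
    Szx, Szy, Szz = S[2]
    # Symmetric 4x4 matrix whose top eigenvector is the optimal quaternion.
    Nm = np.array([
        [Sxx + Syy + Szz, Syz - Szy,       Szx - Sxz,        Sxy - Syx],
        [Syz - Szy,       Sxx - Syy - Szz, Sxy + Syx,        Szx + Sxz],
        [Szx - Sxz,       Sxy + Syx,      -Sxx + Syy - Szz,  Syz + Szy],
        [Sxy - Syx,       Szx + Sxz,       Syz + Szy,       -Sxx - Syy + Szz]])
    w, v = np.linalg.eigh(Nm)
    q0, q1, q2, q3 = v[:, -1]                      # unit quaternion, max eigenvalue
    R = np.array([
        [1 - 2*(q2*q2 + q3*q3), 2*(q1*q2 - q0*q3),     2*(q1*q3 + q0*q2)],
        [2*(q1*q2 + q0*q3),     1 - 2*(q1*q1 + q3*q3), 2*(q2*q3 - q0*q1)],
        [2*(q1*q3 - q0*q2),     2*(q2*q3 + q0*q1),     1 - 2*(q1*q1 + q2*q2)]])
    s = np.sqrt((Pc ** 2).sum() / (Mc ** 2).sum()) # symmetric scale estimate
    T = mu_P - s * (R @ mu_M)
    return s, R, T
```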
Although in theory only a rigid transformation exists between the two coordinate systems, a perfect alignment is difficult to achieve due to error from the M-Tracer™ trajectory as well as from marker detection. For a better visual alignment, a non-rigid transformation process can be followed. For instance, Gaussian Mixture Models provide a robust method that can handle noise and outliers well. The non-rigid transformation provides the final 3D location of the golf club head and grip in the human model coordinate space.
Correction of the IMU captured golf club trajectory is described next. The M-Tracer™ trajectory is not always accurate. As with any IMU sensor tracking algorithm based on integrating acceleration and rotational velocity, the IMU system trajectory suffers from drifting error, as any small error is accumulated through the integration process. The trajectory usually requires another piece of signal information to correct it. Thus, some embodiments of the system correct M-Tracer™ drifting error by using information from the marker trajectory. Let M̃_t denote the unknown true value of the M-Tracer™ trajectory, which can be obtained from the observed trajectory M_t according to a pre-defined error model M̃_t = F(M_t, ε_t), where ε_t denotes the error vector and F(.) defines the error model. For one embodiment, pseudo code for the method is illustrated below.
Input: P_t, t = 1, . . . , N, the marker 3D positions at time t represented in human model coordinate space; M_t, t = 1, . . . , N, the corresponding marker positions at time t represented in M-Tracer™ coordinate space.
Output: corrected trajectory M̃_t and transformation matrix Tr.
(1) Initialize M̃_t = M_t and the iteration counter n = 0.
(2) Estimate the transformation Tr between P_t and M̃_t using the closed-form absolute orientation solution.
(3) If the alignment error converges or a maximum number of iterations is reached, output the corrected trajectory M̃_t and the transformation matrix Tr;
otherwise
correct M̃_t according to the error model M̃_t = F(M̃_t, ε_t);
n = n + 1;
go to (2)
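A minimal sketch of this iterative correction, reusing the absolute_orientation() helper sketched earlier; the error_model callable and the convergence test are illustrative placeholders, since the disclosure does not fix a specific error model:

```python
import numpy as np

def correct_trajectory(P, M, error_model, max_iter=20, tol=1e-4):
    """Iteratively refine the observed IMU trajectory M against the marker
    trajectory P; error_model(M_tilde, residual) plays the role of F(.)."""
    M_tilde = M.copy()
    prev_err = np.inf
    for _ in range(max_iter):
        s, R, T = absolute_orientation(M_tilde, P)   # step (2)
        proj = (s * (R @ M_tilde.T)).T + T           # map into model space
        residual = P - proj
        err = np.linalg.norm(residual, axis=1).mean()
        if abs(prev_err - err) < tol:                # step (3): converged
            break
        M_tilde = error_model(M_tilde, residual)     # correct per error model
        prev_err = err
    return M_tilde, (s, R, T)
```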
The definition of the error model depends on the sensor properties; the embodiments are not limited to any specific error model. The above method can also be used to correct the club head trajectory if the golf club head can be detected and tracked in the RGBD video sequence.
For IMU two-dimensional location detection, an algorithm to detect the white ball marker works as follows. For each sensor:
(1) Build a background model BG and subtract it from each frame: I_j = I_j − BG.
(2) Detect circle candidates with the circle Hough transform and compute a detection confidence for each candidate; keep the candidate with the maximum confidence.
(3) If the confidence is above a threshold, accept the detection; else re-estimate the marker position by interpolating from neighboring high-confidence frames.
The next subsections explain the details of the algorithm to detect the location of the IMU system (e.g., a white IR reflective ball or other marker attached to the golf club) in NIR images of the 3D video.
Background subtraction is performed. In order to find the location of the marker (e.g., the white ball around the IMU system) in a frame, first a background model is constructed. The background model is the average of the frames:

BG = (1/(e − s + 1)) Σ_{i=s}^{e} I_i,

where s and e represent the first and last frames to average. The best range [s, e] is the range covering the fast-moving frames (e.g., golf club top position to impact). For each frame I_i, the background is subtracted, and the result is used for the next processing step:
I_i = I_i − BG.
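A minimal sketch of the background model and subtraction, assuming the NIR frames are supplied as a list of 2D arrays; clipping negative values at zero is an illustrative choice:

```python
import numpy as np

def subtract_background(frames, s, e):
    """Average frames s..e into a background model BG and subtract it
    from every frame: I_i = I_i - BG."""
    stack = np.stack(frames[s:e + 1]).astype(np.float32)
    BG = stack.mean(axis=0)                      # background model
    return [np.clip(f.astype(np.float32) - BG, 0, None) for f in frames]
```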
Circle detection and confidence determination are performed. First, edges in the given image are detected by finding pixels with high gradient magnitude. Then, the circle Hough transform is applied to find the center and radius of the ball marker candidates, i.e., [x_i^ball, y_i^ball, r_i^ball]. The detection confidence c_i of a circle i in image I is computed as

c_i = I * k_i,

where k_i is the kernel defined for circle i, [x_i^ball, y_i^ball, r_i^ball]. The ball marker [x^ball, y^ball] is thus detected as the circle candidate that has the maximum confidence value.
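A sketch of the candidate detection and confidence scoring, using OpenCV's circle Hough transform on an 8-bit NIR frame; the Hough parameters and the circle_kernel() definition are assumptions, since the disclosure leaves the kernel unspecified:

```python
import cv2
import numpy as np

def circle_kernel(r):
    """One plausible kernel for circle confidence: positive on the circle
    perimeter, slightly negative elsewhere, normalized to unit L1 mass."""
    size = 2 * r + 1
    yy, xx = np.mgrid[:size, :size] - r
    ring = np.abs(np.sqrt(xx**2 + yy**2) - r) < 1.5
    k = np.where(ring, 1.0, -0.1)
    return k / np.abs(k).sum()

def detect_ball_marker(img, r_min=5, r_max=30):
    """Return the circle candidate [x, y, r] with maximum confidence
    c_i = I * k_i, correlating an image patch with the circle kernel."""
    circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, dp=1, minDist=10,
                               param1=150, param2=20,
                               minRadius=r_min, maxRadius=r_max)
    if circles is None:
        return None, 0.0
    best, best_conf = None, -np.inf
    for x, y, r in circles[0]:
        k = circle_kernel(int(r))
        x0, y0 = int(x) - int(r), int(y) - int(r)
        patch = img[y0:y0 + k.shape[0], x0:x0 + k.shape[1]]
        if x0 < 0 or y0 < 0 or patch.shape != k.shape:
            continue                               # too close to the border
        conf = float((patch.astype(np.float32) * k).sum())
        if conf > best_conf:
            best, best_conf = (float(x), float(y), float(r)), conf
    return best, best_conf
```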
Detection refinement is performed. To refine the results, the system reviews the confidence values of the detected marker positions of all frames and re-estimates those frames with a low confidence value (below a given threshold) by interpolating the results from one or more neighboring frames that have a high confidence value, as illustrated in
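A minimal sketch of this refinement, assuming per-frame 2D marker positions and confidence values, with simple linear interpolation between high-confidence neighbors:

```python
import numpy as np

def refine_detections(positions, confidences, threshold):
    """Replace low-confidence detections by interpolating each coordinate
    from the nearest high-confidence frames."""
    pos = np.asarray(positions, dtype=np.float64)    # shape (N, 2)
    conf = np.asarray(confidences)
    good = conf >= threshold
    idx = np.arange(len(conf))
    for d in range(pos.shape[1]):                    # x and y separately
        pos[~good, d] = np.interp(idx[~good], idx[good], pos[good, d])
    return pos
```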
IMU system 3D location estimation can be performed by the system as follows. In order to find the 3D location of the IMU system from the 2D positions detected in the previous step, the re-projection error is minimized over all N sensors:

P_t = argmin_P Σ_{i=1}^{N} ‖q_{i,t} − π(K_i, T_i, P)‖²,
where q_{i,t} is the detected 2D point at time t for sensor i, K_i and T_i are the intrinsic and extrinsic matrices of sensor i, π is the projection operator, and P_t is the 3D coordinate of the marker in model coordinate space that is estimated from all sensors. The algorithm is explained below:
Step 1: Find an initial estimate of P_t.
Step 2: Find the final P_t that minimizes the re-projection error.
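A minimal sketch of these two steps, reusing the back_project() and estimate_marker_3d() helpers sketched earlier; choosing a single sensor (e.g., the most confident detection) for the initial estimate is an assumption:

```python
import numpy as np

def initial_estimate(q, depth_map, K, T):
    """Step 1: back-project one sensor's 2D detection and map the result
    from that sensor's camera space into model coordinate space."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    Pc = back_project(q, depth_map, fx, fy, cx, cy)   # camera-space point
    T_inv = np.linalg.inv(T)                          # camera -> model space
    return T_inv[:3, :3] @ Pc + T_inv[:3, 3]

# Step 2: refine over all sensors by minimizing the re-projection error:
# P_t = estimate_marker_3d(q_list, K_list, T_list, initial_estimate(...))
```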
A method performed by the golf coaching system to automatically calibrate the IMU system and the 3D human model system is described below. After calibration, the IMU system captured trajectory can be overlaid on top of the 3D video. The apparatus and method spatially align the IMU system captured golf swing trajectory with the 3D human model based on 3D video captured by one or multiple depth sensors. The method can automatically estimate the transformation matrix from IMU system coordinate space to 3D human model coordinate space by aligning the detected human skeleton points with the swing trajectory. One embodiment has the following steps:
(1) Reconstruct the 3D human model from the RGBD videos.
(2) Detect and track the skeleton points in each video frame.
(3) Extract the hand trajectory by averaging the left and right hand (or wrist) skeleton points.
(4) Estimate the transformation between the hand trajectory and the IMU system grip position trajectory.
(5) Overlay the IMU system trajectory onto the 3D human model sequence.
Although in theory only a rigid transformation exists between the two coordinate systems, a perfect alignment cannot always be achieved due to error from the IMU system trajectory as well as from skeleton detection. For a better visual alignment, a non-rigid transformation process can be followed.
Considering that the IMU system trajectory may be inaccurate due to drifting error, the system can perform a process to correct the IMU system trajectory that includes the following steps:
(1) Build correspondences between the hand positions from skeleton tracking and the grip positions from the IMU system.
(2) Estimate the transformation between the two coordinate spaces from the correspondences.
(3) Correct the observed IMU system trajectory according to a pre-defined error model.
(4) Repeat the estimation and correction until the alignment error converges.
One goal of the present embodiments is to align the IMU system trajectory with a 3D human model by estimating the transformation from the IMU system coordinate system to the 3D human model coordinate system. One goal of a further embodiment, described below, is to determine and output an angle between an arm of a golfer and the golf club, and also to determine and output an angle between the arm of the golfer and the floor. Knowledge of these angles is useful in coaching the golfer for improvement of the golf swing.
The inputs to the system are one or multiple RGBD videos, camera calibration parameters, the IMU system trajectory, and temporal synchronization information represented as a time bias between the first video frame and the first frame of the IMU system signal. The system reconstructs the human model from the RGBD videos and detects the skeleton points for each video frame. The hand trajectory is then extracted by averaging either the left and right hand or the left and right wrist skeleton points. With the known temporal information, the hand and IMU system grip position trajectory correspondences are then built. With such point correspondences, the transformation between these two coordinate systems can thus be estimated.
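A minimal sketch of the hand trajectory extraction; the joint names are placeholders for whatever identifiers the skeleton tracker outputs:

```python
import numpy as np

def hand_trajectory(skeletons, left="hand_left", right="hand_right"):
    """Average the left and right hand (or wrist) skeleton points per frame;
    skeletons is a sequence of {joint_name: (x, y, z)} dictionaries."""
    return np.array([(np.asarray(s[left]) + np.asarray(s[right])) / 2.0
                     for s in skeletons])
```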
Similarly to previously described embodiments, the system estimates a transformation. Assuming temporal synchronization is completed, e.g., using temporal synchronization information provided as a time bias, hand position correspondences can then be built. Let Pi, i=1, . . . , N denote the 3D coordinates of the hand at time i represented in human model coordinate space obtained from skeleton tracking, and let Mi denote the corresponding grip point positions represented in M-Tracer™ coordinate space. Thus the goal is to estimate a rigid transformation matrix that includes a rotation R, a translation T and a scaling factor s such that
P_i = sRM_i + T.
The closed-form solution of absolute orientation using unit quaternions [4] is used to find s, R and T.
Although in theory only a rigid transformation exists between the two coordinate systems, a perfect alignment is difficult to achieve due to error from the IMU system trajectory as well as from skeleton detection. For a better visual alignment, a non-rigid transformation process can be followed. For instance, Gaussian Mixture Models provide a robust method that can handle noise and outliers well. The non-rigid transformation provides the final 3D location of the golf club head and grip in the human model coordinate space.
The arm-club angle and the arm-floor angle are two important measurements that can help golfers improve their skills. With the calculated transformation matrix, the IMU system trajectory is projected to human model coordinate space. Let P_i^e and P_i^ha denote the 3D coordinates of the elbow position and hand position in the human model coordinate space obtained from skeleton tracking. Let M_i^h be the 3D coordinate of the club head position at time i output by the IMU system; its corresponding coordinate in human model space, denoted as P_i^h, can then be calculated as
P_i^h = sRM_i^h + T.
Then, the angle between arm and club, denoted as θ_ac, is defined as the angle between the club line [P_i^ha, P_i^h] and the arm line [P_i^ha, P_i^e]. This angle can be calculated as

θ_ac = arccos( ((P_i^h − P_i^ha) · (P_i^e − P_i^ha)) / (|P_i^h − P_i^ha| |P_i^e − P_i^ha|) ),

where (·) denotes the dot product operator and |·| is the norm operator.
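A minimal sketch of the arm-club angle computation from the three projected points:

```python
import numpy as np

def arm_club_angle(P_hand, P_elbow, P_club_head):
    """Angle (degrees) between the club line [hand, club head] and the
    arm line [hand, elbow]."""
    club = P_club_head - P_hand
    arm = P_elbow - P_hand
    cos_t = np.dot(club, arm) / (np.linalg.norm(club) * np.linalg.norm(arm))
    return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))
```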
To calculate the arm-floor angle, the floor plane normal, denoted as n, is defined. In one embodiment, Holocam technology is used by the system to reconstruct the human model. In the Holocam space definition, the z axis is defined as the normal to the floor plane pointing upward, i.e., n = [0, 0, 1]^T. If some other model reconstruction method is used and the floor plane is not explicitly defined, an embodiment of the system could estimate the floor plane using the 3D positions of the golf club head, left foot and right foot during address time. Let P_ad^lf and P_ad^rf denote the 3D coordinates of the left and right foot at address time in the human model coordinate space obtained from skeleton tracking. Let P_ad^h be the 3D coordinates of the golf club head at address time in human model coordinate space. The floor plane normal can thus be estimated as the normalized cross product

n = ((P_ad^lf − P_ad^h) × (P_ad^rf − P_ad^h)) / |(P_ad^lf − P_ad^h) × (P_ad^rf − P_ad^h)|.

Then, the angle between the arm and the floor plane, denoted as θ_af, can be calculated as the complement of the angle between the arm line and the floor normal, i.e.,

θ_af = arcsin( |(P_i^e − P_i^ha) · n| / (|P_i^e − P_i^ha| |n|) ).
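A minimal sketch of the floor normal estimation and the arm-floor angle, assuming the arm line direction runs from the hand to the elbow:

```python
import numpy as np

def floor_normal(P_left_foot, P_right_foot, P_club_head):
    """Estimate the floor normal at address time as the normalized cross
    product of two in-plane vectors, oriented upward."""
    n = np.cross(P_left_foot - P_club_head, P_right_foot - P_club_head)
    n = n / np.linalg.norm(n)
    return n if n[2] >= 0 else -n

def arm_floor_angle(P_hand, P_elbow, n):
    """Angle (degrees) between the arm line and the floor plane, i.e., the
    complement of the angle between the arm and the floor normal n."""
    arm = P_elbow - P_hand
    sin_t = abs(np.dot(arm, n)) / (np.linalg.norm(arm) * np.linalg.norm(n))
    return np.degrees(np.arcsin(np.clip(sin_t, 0.0, 1.0)))
```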
Various embodiments of the golf coaching system correct the IMU system trajectory. The IMU system trajectory is not always accurate. As with any IMU sensor tracking algorithm based on integrating acceleration and rotational velocity, the IMU system trajectory suffers from drifting error, as any small error is accumulated through the integration process. The trajectory may require another piece of signal information to correct it. Thus, some embodiments of the system correct IMU system drifting error by using information from the skeleton trajectory. Let M̃_i denote the unknown true value of the IMU system trajectory, which can be obtained from the observed trajectory M_i according to a pre-defined error model M̃_i = F(M_i, ε_i), where ε_i denotes the error vector and F(.) defines the error model. Pseudo code for one embodiment of the method is illustrated below.
Input: P_i, i = 1, . . . , N, the hand 3D positions at time i represented in human model coordinate space; M_i, i = 1, . . . , N, the corresponding grip point positions at time i represented in M-Tracer™ coordinate space.
Output: corrected trajectory M̃_i and transformation matrix Tr.
(1) Initialize M̃_i = M_i and the iteration counter t = 0.
(2) Estimate the transformation Tr between P_i and M̃_i using the closed-form absolute orientation solution.
(3) If the alignment error converges or a maximum number of iterations is reached, output the corrected trajectory M̃_i and the transformation matrix Tr;
otherwise
correct M̃_i according to the error model M̃_i = F(M̃_i, ε_i);
t = t + 1;
go to (2)
The definition of the error model depends on the sensor properties. In this disclosure, embodiments are not limited to any specific error model. The above method can also be used to correct club head trajectory if the golf club head can be detected and tracked in the RGBD video sequence.
The computing device 1008 has a 3D human model module 1016, a marker detection and tracking module 1018, a 3D marker trajectory module 1020, a transformation module 1022, an overlay module 1024, a skeleton point extraction and tracking module 1026, a hand trajectory module 1028, and an arm-club, arm-floor angle and trajectory module 1030. Each of these modules could be implemented in software executing on the processor 1012, hardware, firmware, or combinations thereof. These modules implement functions described above with reference to
It should be appreciated that the methods described herein may be performed with a digital processing system, such as a conventional, general-purpose computer system. Special purpose computers, which are designed or programmed to perform only one function, may be used in the alternative.
Display 1211 is in communication with CPU 1201, memory 1203, and mass storage device 1207, through bus 1205. Display 1211 is configured to display any visualization tools or reports associated with the system described herein. Input/output device 1209 is coupled to bus 1205 in order to communicate information in command selections to CPU 1201. It should be appreciated that data to and from external devices may be communicated through the input/output device 1209. CPU 1201 can be defined to execute the functionality described herein to enable the functionality described with reference to
Detailed illustrative embodiments are disclosed herein. However, specific functional details disclosed herein are merely representative for purposes of describing embodiments. Embodiments may be embodied in many alternate forms and should not be construed as limited to only the embodiments set forth herein.
It should be understood that although the terms first, second, etc. may be used herein to describe various steps or calculations, these steps or calculations should not be limited by these terms. These terms are only used to distinguish one step or calculation from another. For example, a first calculation could be termed a second calculation, and, similarly, a second step could be termed a first step, without departing from the scope of this disclosure. As used herein, the term “and/or” and the “/” symbol include any and all combinations of one or more of the associated listed items.
As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises”, “comprising”, “includes”, and/or “including”, when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Therefore, the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.
It should also be noted that in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two operations shown in succession may in fact be executed substantially concurrently or may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
With the above embodiments in mind, it should be understood that the embodiments might employ various computer-implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. Further, the manipulations performed are often referred to in terms, such as producing, identifying, determining, or comparing. Any of the operations described herein that form part of the embodiments are useful machine operations. The embodiments also relate to a device or an apparatus for performing these operations. The apparatus can be specially constructed for the required purpose, or the apparatus can be a general-purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general-purpose machines can be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
A module, an application, a layer, an agent or other method-operable entity could be implemented as hardware, firmware, or a processor executing software, or combinations thereof. It should be appreciated that, where a software-based embodiment is disclosed herein, the software can be embodied in a physical machine such as a controller. For example, a controller could include a first module and a second module. A controller could be configured to perform various actions, e.g., of a method, an application, a layer or an agent.
The embodiments can also be embodied as computer readable code on a tangible non-transitory computer readable medium. The computer readable medium is any data storage device that can store data, which can be thereafter read by a computer system. Examples of the computer readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over network-coupled computer systems so that the computer readable code is stored and executed in a distributed fashion. Embodiments described herein may be practiced with various computer system configurations including hand-held devices, tablets, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers and the like. The embodiments can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a wire-based or wireless network.
Although the method operations were described in a specific order, it should be understood that other operations may be performed in between described operations, described operations may be adjusted so that they occur at slightly different times or the described operations may be distributed in a system which allows the occurrence of the processing operations at various intervals associated with the processing.
In various embodiments, one or more portions of the methods and mechanisms described herein may form part of a cloud-computing environment. In such embodiments, resources may be provided over the Internet as services according to one or more various models. Such models may include Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). In IaaS, computer infrastructure is delivered as a service. In such a case, the computing equipment is generally owned and operated by the service provider. In the PaaS model, software tools and underlying equipment used by developers to develop software solutions may be provided as a service and hosted by the service provider. SaaS typically includes a service provider licensing software as a service on demand. The service provider may host the software, or may deploy the software to a customer for a given period of time. Numerous combinations of the above models are possible and are contemplated.
Various units, circuits, or other components may be described or claimed as “configured to” perform a task or tasks. In such contexts, the phrase “configured to” is used to connote structure by indicating that the units/circuits/components include structure (e.g., circuitry) that performs the task or tasks during operation. As such, the unit/circuit/component can be said to be configured to perform the task even when the specified unit/circuit/component is not currently operational (e.g., is not on). The units/circuits/components used with the “configured to” language include hardware—for example, circuits, memory storing program instructions executable to implement the operation, etc. Reciting that a unit/circuit/component is “configured to” perform one or more tasks is expressly intended not to invoke 35 U.S.C. 112, sixth paragraph, for that unit/circuit/component. Additionally, “configured to” can include generic structure (e.g., generic circuitry) that is manipulated by software and/or firmware (e.g., an FPGA or a general-purpose processor executing software) to operate in a manner that is capable of performing the task(s) at issue. “Configured to” may also include adapting a manufacturing process (e.g., a semiconductor fabrication facility) to fabricate devices (e.g., integrated circuits) that are adapted to implement or perform one or more tasks.
The foregoing description, for the purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the embodiments and their practical applications, to thereby enable others skilled in the art to best utilize the embodiments and various modifications as may be suited to the particular use contemplated. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.