The device and method disclosed in this document relate to augmented reality and, more particularly, to a sports training system with embodied and detached augmented reality visualization.
Unless otherwise indicated herein, the materials described in this section are not admitted to be prior art by inclusion in this section.
The realm of table tennis technique encompasses a wide array of elements, spanning technical skills (grip, stroke, spin, and speed), tactical understanding (game analysis, serve and receive strategies, gameplay tactics, strategic shot placement), and physical and mental aspects (footwork, adaptability, quick decision making, mental toughness, physical conditioning). Stroke training can be considered the most fundamental for all players because it forms the basis for executing various shots effectively. Traditional table tennis stroke training involves coaches who demonstrate specific techniques; players mimic the strokes and receive feedback and corrections from the coach. However, conventional coaching methods may face challenges in addressing trainees' self-awareness and perception of their own performance, skill acquisition through observation, coach availability, real-time feedback, individual learning styles and pace, and the diverse schedules, resources, and skill levels among trainees. These limitations underscore the need for new training methods that address these challenges and provide more personalized and accessible learning opportunities for players.
In the realm of coach-less training, prior research has delved into video-based approaches because of their convenience. Leveraging pose estimation algorithms to glean 2D or 3D body poses from videos, researchers have efficiently assessed novice athletes' performance compared with experts' and generated corresponding feedback. Although video-based analysis is a valuable supplement to training, it has limitations including the absence of real-time correction and the potential misinterpretation of techniques, thus hindering effective skill development.
Virtual reality (VR) and augmented reality (AR) technologies provide immersive and in-situ visualization that video-based methods cannot offer. Researchers have increasingly turned their focus towards harnessing these innovations to create sports training systems across diverse disciplines, with noteworthy examples spanning a spectrum of sports and physical practices. Elements such as grip, stance, racket movement, contact point, and follow-through are key components of stroke, posture, and movement. Proper posture provides a stable and balanced foundation for a stroke, and good footwork and movement enable a player to position themselves for each stroke. When it comes to posture or movement training, previous works demonstrate the target content in a fashion similar to the video-based methods, by placing the visual cues “detached” from the user in a third-person view. Although observing in a third-person view is useful, mimicking skills by observing detached cues might not be straightforward: the trainee lacks a first-person perspective and must mentally transfer the pose and orientation of the entire body, a cognitive burden that can be avoided by embodied visualization, such as virtual cues overlaid on the physical body, that newer AR and VR systems can provide.
These groundbreaking initiatives within the realms of AR and VR underscore the growing recognition of their potential to offer intuitive feedback and elevate training experiences. In the context of table tennis training, extensive research has been conducted in the VR and AR domains, investigating the efficacy of enhancing table tennis performance in various respects. Examples include VR simulations of gameplay, the provision of multi-modal cues within VR environments, and the visualization of ball trajectories on AR tables, among others. However, the aforementioned applications in the realm of table tennis training have yet to prioritize stroke training or to provide visualization of paddle and body movements.
A method for providing visual guidance for sports training is described herein. The method includes storing, in a memory, previously recorded motions of a first person holding first hand-held sports equipment. The previously recorded motions include motions of a body of the first person and motions of the first hand-held sports equipment. The method further includes capturing, with at least one motion sensor, real-time motions of a second person holding second hand-held sports equipment. The second hand-held sports equipment is of a same type as the first hand-held sports equipment. The real-time motions include motions of a body of the second person and motions of the second hand-held sports equipment. The method further includes determining, with a processor, differences between the real-time motions of the second person and the previously recorded motions of the first person. The method further includes displaying, on a display of an augmented reality device, a visualization of at least one of (i) the previously recorded motions of the first person and (ii) the real-time motions of the second person. The visualization includes visual indications of the determined differences between the real-time motions of the second person and the previously recorded motions of the first person.
The foregoing aspects and other features of the system and method are explained in the following description, taken in connection with the accompanying drawings.
For the purposes of promoting an understanding of the principles of the disclosure, reference will now be made to the embodiments illustrated in the drawings and described in the following written specification. It is understood that no limitation to the scope of the disclosure is thereby intended. It is further understood that the present disclosure includes any alterations and modifications to the illustrated embodiments and includes further applications of the principles of the disclosure as would normally occur to one skilled in the art to which this disclosure pertains.
With reference to
With reference to
With reference to
As the novice athlete 120 practices the selected motion, the AR sports training system 200 advantageously provides both embodied and detached visual guidance for the novice athlete 120 and, in particular, provides feedback that helps the novice athlete 120 to quickly and intuitively understand the differences between their real-time motions and the previously recorded motions of the expert athlete 100. In the context of table tennis, the AR sports training system 200 could be used in common table tennis training sessions such as shadow practice and multi-ball practice.
The AR graphical user interface 130 incorporates an expert avatar 140 holding virtual sports equipment that is animated to represent the previously recorded motions of the expert athlete 100. Similarly, the AR graphical user interface 130 incorporates a novice avatar 150 (also referred to as the user avatar) holding virtual sports equipment that is continuously animated to represent the real-time motions of the novice athlete 120. The expert avatar 140 and the novice avatar 150 are superimposed upon the environment of the novice athlete 120 within the AR graphical user interface 130, such that they appear in front of the novice athlete 120.
The AR sports training system 200 advantageously analyzes the real-time motions of the novice athlete 120 and compares them with the previously recorded motions of the expert athlete 100. Based on this analysis, the AR sports training system 200 advantageously renders the novice avatar 150 to incorporate visual indications of the differences between the real-time motions of the novice athlete 120 and the previously recorded motions of the expert athlete 100. In the illustrated example, specific portions of the novice avatar 150 (e.g., specific joints) are highlighted in a different color (e.g., pink) to indicate that the motions of corresponding portions of the body of the novice athlete 120 are incorrect (i.e., they differ from the motions that were recorded by the expert athlete 100).
With reference to
Table tennis stroke training is a critical aspect of player development. Traditional coaching methods face limitations in trainee self-awareness and perception, skill acquisition by third-person observation, and coach availability. Video-based methods, using pose estimation, have efficiently compared user performance with experts, but face real-time correction challenges and potential misinterpretations. Meanwhile, previous extended reality approaches also fail to provide a comprehensive visualization to help the player learn the movement of the body and paddle.
The AR sports training system 200 bridges these gaps by employing a combination of pose estimation algorithms and IMU sensors to capture and reconstruct the 3D body pose and paddle orientation of novice athletes during practice, allowing real-time comparison with expert strokes. This visual-sensor setup, combined with a motion analysis algorithm, effectively captures the user's body and paddle movements in 3D, enabling the reconstruction of their virtual avatar while providing feedback through detached and embodied visual cues for comprehensive pose and paddle analysis.
The AR sports training system 200 advantageously promotes immersive, embodied learning experiences by providing an AR graphical user interface that includes both detached and embodied visual cues, enabling novice athletes to visualize and correct their strokes effectively. The AR graphical user interface 130 is implemented to enable novice athletes to interact with visual cues in a personalized way, enhancing the overall training experience.
In the illustrated exemplary embodiment, the AR sports training system 200 includes a processing system 210 and the head-mounted AR device 230 (e.g., Microsoft's HoloLens, Oculus Rift, or Oculus Quest). However, it should be appreciated that, in some embodiments, a tablet computer or mobile phone can be used in place of the head-mounted AR device 230. Thus, similar AR graphical user interfaces and features would be provided on the tablet computer or mobile phone. In some embodiments, the processing system 210 may comprise a discrete computer that is configured to communicate with the head-mounted AR device 230 via one or more wired or wireless connections. However, in alternative embodiments, the processing system 210 is integrated with the head-mounted AR device 230. Additionally, the AR sports training system 200 includes external sensors including an external camera 250, an IMU 252 (inertial measurement unit), as well as any other external sensors used by the head-mounted AR device 230 (e.g., Oculus IR-LED Sensors).
In the illustrated exemplary embodiment, the processing system 210 comprises a processor 212 and a memory 214. The memory 214 is configured to store data and program instructions that, when executed by the processor 212, enable the AR sports training system 200 to perform various operations described herein. The memory 214 may be of any type of device capable of storing information accessible by the processor 212, such as a memory card, ROM, RAM, hard drives, discs, flash memory, or any of various other computer-readable media serving as data storage devices, as will be recognized by those of ordinary skill in the art. Additionally, it will be recognized by those of ordinary skill in the art that a “processor” includes any hardware system, hardware mechanism, or hardware component that processes data, signals, or other information. The processor 212 may include a system with a central processing unit, graphics processing units, multiple processing units, dedicated circuitry for achieving functionality, programmable logic, or other processing systems.
The processing system 210 further comprises one or more transceivers, modems, or other communication devices configured to enable communications with various other devices, at least including the head-mounted AR device 230, the external camera 250, and the IMU 252. Particularly, in the illustrated embodiment, the processing system 210 comprises a Wi-Fi module 216. The Wi-Fi module 216 is configured to enable communication with a Wi-Fi network and/or Wi-Fi router (not shown) and includes at least one transceiver with a corresponding antenna, as well as any processors, memories, oscillators, or other hardware conventionally included in a Wi-Fi module. It will be appreciated, however, that other communication technologies, such as Bluetooth, Z-Wave, Zigbee, or any other radio frequency-based communication technology or wired communication technology can be used to enable data communications between devices in the AR sports training system 200.
The head-mounted AR device 230 is in the form of an AR or virtual reality (VR) headset, generally comprising a display screen 232 and a camera 234 (e.g., ZED Dual 4MP Camera (720p)). The camera 234 may be an integrated or attached camera and is configured to capture a plurality of images of the environment as the head-mounted AR device 230 is moved through the environment by the athlete. The camera 234 is configured to generate image frames of the environment, each of which comprises a two-dimensional array of pixels. Each pixel has corresponding photometric information (intensity, color, and/or brightness). In some embodiments, the camera 234 is configured to generate RGB-D images in which each pixel has corresponding photometric information and geometric information (depth and/or distance). In such embodiments, the camera 234 may, for example, take the form of two RGB cameras configured to capture stereoscopic images, from which depth and/or distance information can be derived, or an RGB camera with an associated IR camera configured to provide depth and/or distance information.
The display screen 232 may comprise any of various known types of displays, such as LCD or OLED screens. In at least one embodiment, the display screen 232 is a transparent screen, through which a user can view the outside world, on which certain graphical elements are superimposed onto the user's view of the outside world. In the case of a non-transparent display screen 232, the graphical elements may be superimposed on real-time images/video captured by the camera 234.
In some embodiments, the head-mounted AR device 230 may further comprise a variety of sensors 236. In some embodiments, the sensors 236 include sensors configured to measure one or more accelerations and/or rotational rates of the head-mounted AR device 230. In one embodiment, the sensors 236 comprise one or more accelerometers configured to measure linear accelerations of the head-mounted AR device 230 along one or more axes (e.g., roll, pitch, and yaw axes) and/or one or more gyroscopes configured to measure rotational rates of the head-mounted AR device 230 along one or more axes (e.g., roll, pitch, and yaw axes). In some embodiments, the sensors 236 may include inside-out motion tracking sensors configured to track the human body motion of the user within the environment, in particular positions and movements of the head and hands of the user.
The head-mounted AR device 230 may also include a battery or other power source (not shown) configured to power the various components within the head-mounted AR device 230, which may include the processing system 210, as mentioned above. In one embodiment, the battery of the head-mounted AR device 230 is a rechargeable battery configured to be charged when the head-mounted AR device 230 is connected to a battery charger configured for use with the head-mounted AR device 230.
The program instructions stored on the memory 214 include an AR sports training program 218. As discussed in further detail below, the processor 212 is configured to execute the AR sports training program 218 to provide detached and embodied AR visualizations and feedback for sports training. In one embodiment, the program instructions stored on the memory 214 further include an AR graphics engine 220 (e.g., Unity3D engine), which is used to render the intuitive visual interface of the AR sports training program 218. Particularly, the processor 212 is configured to execute the AR graphics engine 220 to superimpose on the display screen 232 graphical elements to provide detached and embodied AR visualizations and feedback for sports training. In the case of a non-transparent display screen 232, the graphical elements may be superimposed on real-time images/video captured by the camera 234.
The external camera 250 (e.g., a webcam) is arranged externally to the head-mounted AR device 230 at a location (e.g., in front of the user) suitable for capturing video of the user as he or she engages in sports training using the AR sports training system 200. The external camera 250 is configured to generate image frames of the environment including the user, each of which comprises a two-dimensional array of pixels. Each pixel has corresponding photometric information (intensity, color, and/or brightness). In some embodiments, the external camera 250 is configured to generate RGB-D images in which each pixel has corresponding photometric information and geometric information (depth and/or distance). In such embodiments, the external camera 250 may, for example, take the form of two RGB cameras configured to capture stereoscopic images, from which depth and/or distance information can be derived, or an RGB camera with an associated IR camera configured to provide depth and/or distance information.
The IMU 252 is attached to the hand-held sports equipment 110 that is held by the athlete and is configured to measure inertial data as he or she engages in sports training using the AR sports training system 200. In some embodiments, the IMU 252 is configured to measure inertial data including one or more accelerations of the hand-held sports equipment 110. Particularly, the IMU 252 may be configured to measure linear accelerations of the hand-held sports equipment 110 along one or more axes (e.g., roll, pitch, and yaw axes). In some embodiments, the IMU 252 is configured to measure inertial data including one or more rotational rates (or angular velocities) of the hand-held sports equipment 110. Particularly, the IMU 252 may be configured to measure rotational rates or angular velocities of the hand-held sports equipment 110 along one or more axes (e.g., roll, pitch, and yaw axes). In some embodiments, the IMU 252 is further configured to measure inertial data including one or more magnetic field strengths. Particularly, the IMU 252 may be configured to measure magnetic field strengths along one or more axes (e.g., roll, pitch, and yaw axes). Finally, in some embodiments, the IMU 252 is configured to directly provide orientation data in the form of a time series of quaternions.
A variety of methods, workflows, and processes are described below for enabling the operations and interactions of the AR sports training system 200. In these descriptions, statements that a method, workflow, processor, and/or system is performing some task or function refer to a controller or processor (e.g., the processor 212) executing programmed instructions (e.g., the AR sports training program 218, the AR graphics engine 220) stored in non-transitory computer-readable storage media (e.g., the memory 214) operatively connected to the controller or processor to manipulate data or to operate one or more components in the AR sports training system 200 to perform the task or function. Additionally, the steps of the methods may be performed in any feasible chronological order, regardless of the order shown in the figures or the order in which the steps are described.
Additionally, various AR graphical user interfaces are described for operating the AR sports training system 200. In many cases, the AR graphical user interfaces include graphical elements that are superimposed onto the user's view of the outside world or, in the case of a non-transparent display screen 232, superimposed on real-time images/video captured by the camera 234. To provide these AR graphical user interfaces, the processor 212 executes instructions of the AR graphics engine 220 to render these graphical elements and operates the display screen 232 to superimpose the graphical elements onto the user's view of the outside world or onto the real-time images/video of the outside world. In many cases, the graphical elements are rendered at a position that depends upon positional or orientation information received from any suitable combination of the sensors 236, the camera 234, the external camera 250, or the IMU 252 to simulate the presence of the graphical elements in a real-world environment. However, it will be appreciated by those of ordinary skill in the art that, in some cases, an equivalent non-AR graphical user interface can also be used to operate the AR sports training program 218, such as a user interface provided on a further computing device such as a laptop computer, tablet computer, desktop computer, or smartphone.
Moreover, various user interactions with the AR graphical user interfaces and with interactive graphical elements thereof are described. To provide these user interactions, the processor 212 may render interactive graphical elements in the AR graphical user interface, receive user inputs, for example via gestures performed in view of the camera 234, the external camera 250, or another sensor, and execute instructions of the AR sports training program 218 to perform some operation in response to the user inputs.
Finally, various forms of motion tracking are described in which spatial positions and motions of the user or of objects in the environment are tracked. To provide this tracking of spatial positions and motions, the processor 212 executes instructions of the AR sports training program 218 to receive and process sensor data from any suitable combination of the sensors 236, the camera 234, the external camera 250, or the IMU 252, and may optionally utilize visual and/or visual-inertial odometry methods such as simultaneous localization and mapping (SLAM) techniques.
The method 300 begins with storing previously recorded motions of an expert person holding sports equipment, the previously recorded motions including motions of a body of the expert person and motions of the sports equipment (block 310). Particularly, the memory 214 stores, as a part of the AR sports training program 218, previously recorded motions of an expert athlete holding hand-held sports equipment. In some embodiments, the AR sports training program 218 includes a library of recorded motions of the expert athlete performing a variety of different activities associated with a sport in which the hand-held sports equipment is utilized. In the case of table tennis, the AR sports training program 218 might include previously recorded motions of the expert athlete for a variety of different stroke motions that are performed using a table tennis paddle while playing table tennis (e.g., a ‘Drive’ stroke, a ‘Loop’ stroke, or a ‘Push’ stroke).
The previously recorded motions of the expert athlete include both motions of the body of the expert athlete and motions of the hand-held sports equipment that was held by the expert athlete. Particularly, in some embodiments, the previously recorded motions of the expert athlete take the form of a sequence of human pose data and a sequence of object pose data. The sequence of human pose data is a time-series sequence of frames that each define positions and/or orientations of a plurality of joints of the expert athlete at a respective time. It should be appreciated that the collective positions of the plurality of joints define the pose of the expert athlete at each respective time and orientation data of each individual joint needn't necessarily be stored. Similarly, the sequence of object pose data is a time-series sequence of frames that each define a position and orientation of the hand-held sports equipment that was held by the expert athlete. In one embodiment, each frame in the sequence of object pose data includes a quaternion that represents the orientation of the hand-held sports equipment at the respective time. In some embodiments, the position of the hand-held sports equipment at the respective time is set to be fixed relative to or to be equal to a position of a particular joint (e.g., a hand-wrist joint) of the plurality of joints of the expert athlete. Thus, in some embodiments, the position of the hand-held sports equipment needn't necessarily be stored in the sequence of object pose data. In at least some embodiments, previously recorded motions of the expert athlete were recorded using a process similar to the process described below for capturing the real-time motions of a novice athlete holding hand-held sports equipment.
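By way of illustration only, and not limitation, the stored sequences described above could be represented as simple arrays such as those in the following sketch; the container name, the quaternion component order (w, x, y, z), and the use of a wrist joint index are assumptions made for this example rather than requirements of the disclosure.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class RecordedMotion:
    """Hypothetical container for one previously recorded stroke."""
    joint_positions: np.ndarray        # (T, J, 3): position of each of J joints per frame
    equipment_orientation: np.ndarray  # (T, 4): unit quaternion (w, x, y, z) per frame
    timestamps: np.ndarray             # (T,): frame times in seconds

    def equipment_position(self, wrist_joint_index: int) -> np.ndarray:
        # As described above, the equipment position can simply follow the
        # wrist joint, so it need not be stored separately.
        return self.joint_positions[:, wrist_joint_index, :]
```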
Prior to beginning sports training using the AR sports training program 218, a novice athlete wears the head-mounted AR device 230 on his or her head to access an AR graphical user interface provided by the head-mounted AR device 230. Initially, the AR sports training system 200 calibrates according to the novice athlete's height to adjust for the scale of the embodied and detached visualization guidance. Next, the novice athlete interacts with the AR graphical user interface to select a desired sports-related motion (e.g., a table tennis stroke) to practice.
The method 300 continues with capturing real-time motions of a novice person holding sports equipment, the real-time motions including motions of a body of the novice person and motions of the sports equipment (block 320). Particularly, after the user has selected a motion to be practiced, the processor 212 operates the external camera 250 and the IMU 252 to capture real-time motions of the novice athlete holding hand-held sports equipment. In particular, the processor 212 operates the external camera 250 to capture video of the real-time motions of the novice athlete holding hand-held sports equipment. Likewise, the processor 212 operates the IMU 252 to capture inertial data of the real-time motions of the novice athlete holding hand-held sports equipment. As mentioned above, the inertial data may include accelerations, rotational rates, and magnetic field strengths, each along three axes, as well as quaternion orientation data. The combined data from the external camera 250 and the IMU 252 gives the AR sports training system 200 a complete picture of how the novice athlete and the hand-held sports equipment move. It should be appreciated that the hand-held sports equipment held by the novice athlete is of the same type as the hand-held sports equipment that was held by the expert athlete 100 during the recording of the previously recorded motions in the library of the AR sports training system 200.
As will be discussed in greater detail below, the real-time motions of the novice athlete holding hand-held sports equipment that are captured by the AR sports training system 200 are used to provide analysis and visual feedback on the performance of the novice athlete as he or she practices the motions that were previously selected in the AR graphical user interface. Concurrently with the capturing of real-time motions of the novice athlete holding hand-held sports equipment, the AR graphical user interface provides both embodied and detached visual guidance for the novice athlete and, in particular, provides feedback that helps the novice athlete to quickly and intuitively understand the differences between their real-time motions and the previously recorded motions of expert athlete 100.
The real-time motions of the novice athlete include both motions of the body of the novice athlete and motions of the hand-held sports equipment that is held by the novice athlete. Particularly, in some embodiments, the real-time motions of the novice athlete take the form of a sequence of human pose data and a sequence of object pose data. In at least some embodiments, the processor 212 determines the sequence of human pose data based on the video captured by the external camera 250 using a vision-based human pose estimation algorithm or pose tracking algorithm. Particularly, based on the video, the processor 212 determines the positions and/or orientations of the plurality of joints of the novice athlete and outputs frames in the sequence of human pose data in real-time. The sequence of human pose data is a time-series sequence of frames that each define positions and/or orientations of a plurality of joints of the novice athlete at a respective time. It should be appreciated that the collective positions of the plurality of joints define the pose of the novice athlete at each respective time and orientation data of each individual joint needn't necessarily be calculated.
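As one non-limiting illustration of the vision-based human pose estimation step, the sketch below uses the off-the-shelf MediaPipe Pose estimator to turn a video into the time-series sequence of joint positions described above; the disclosure does not mandate this particular library, and the 33-joint output is specific to MediaPipe.

```python
import cv2
import numpy as np
import mediapipe as mp

def pose_frames_from_video(video_path: str) -> np.ndarray:
    """Return a (T, 33, 3) array of 3D joint positions estimated from a video file."""
    pose = mp.solutions.pose.Pose(static_image_mode=False)
    capture = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, bgr = capture.read()
        if not ok:
            break
        result = pose.process(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))
        if result.pose_world_landmarks is None:
            continue  # skip frames in which no person was detected
        frames.append([(lm.x, lm.y, lm.z)
                       for lm in result.pose_world_landmarks.landmark])
    capture.release()
    pose.close()
    return np.asarray(frames)
```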
In contrast, in at least some embodiments, the processor 212 determines the sequence of object pose data based on the video captured by the external camera 250 and the inertial data captured by the IMU 252. The sequence of object pose data is a time-series sequence of frames that each define a position and orientation of the hand-held sports equipment that is held by the novice athlete. In some embodiments, the processor 212 calculates the sequence of object pose data as a time series of quaternions. Thus, each frame in the sequence of object pose data includes a quaternion that represents the orientation of the hand-held sports equipment at a respective time. As noted previously, in some embodiments, the quaternions may be provided directly by the IMU 252, but may also be calculated from other inertial data or from the video data.
In some embodiments, the processor 212 calculates the orientation of the hand-held sports equipment for each frame in the sequence of object pose data based on the inertial data captured by the IMU 252, but calculates the position of the hand-held sports equipment for each frame in the sequence of object pose data based on the video data. Particularly, in one embodiment, the processor 212 sets the position of the hand-held sports equipment to be fixed relative to or to be equal to a position of a particular joint (e.g., a hand-wrist joint) of the plurality of joints of the novice athlete. In this way, the position of the hand-held sports equipment is properly aligned with the sequence of human pose data. Thus, in some embodiments, the position of the hand-held sports equipment needn't necessarily be provided in the sequence of object pose data at all.
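The following sketch illustrates, under the assumptions stated in the comments, how the orientation stream from the IMU 252 and the wrist position from the human pose sequence could be paired to form the sequence of object pose data; the wrist joint index and the assumption that the IMU samples are already resampled to the video frame times are illustrative only.

```python
import numpy as np

def paddle_pose_sequence(joint_positions: np.ndarray,
                         imu_quaternions: np.ndarray,
                         wrist_joint_index: int = 16) -> list[dict]:
    """Pair each IMU orientation with the wrist position of the same frame.

    joint_positions: (T, J, 3) from the pose estimator.
    imu_quaternions: (T, 4) unit quaternions from the paddle-mounted IMU, assumed
                     to have been resampled to the video frame times beforehand.
    wrist_joint_index: index of the hand-wrist joint (16 is MediaPipe's right
                       wrist and is used here only as an illustrative default).
    """
    poses = []
    for t in range(min(len(joint_positions), len(imu_quaternions))):
        q = imu_quaternions[t]
        poses.append({
            "position": joint_positions[t, wrist_joint_index],  # anchored to the wrist joint
            "orientation": q / np.linalg.norm(q),                # normalized orientation
        })
    return poses
```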
In some embodiments, which lack the IMU 252, the processor 212 determines the sequence of object pose data based only on the video captured by the external camera 250 using a vision-based object pose estimation algorithm or pose tracking algorithm.
With reference again to
Based on the determined differences, the processor 212 determines whether the real-time motions of the novice athlete are performed incorrectly. In particular, the processor 212 compares the determined differences with one or more thresholds. If a determined difference is greater than a respective threshold, then the processor 212 determines that a corresponding portion of the real-time motions is performed incorrectly. In some embodiments, the processor 212 determines, based on the determined differences, that one or more particular joints of the novice athlete were incorrectly positioned during the real-time motions of the novice athlete during a particular time interval. In some embodiments, the processor 212 determines, based on the determined differences, that the hand-held sports equipment held by the novice athlete was incorrectly positioned or oriented during the real-time motions of the novice athlete during a particular time interval.
In some embodiments, for the purpose of comparing the real-time motions of the body of the novice athlete and the previously recorded motions of the body of the expert athlete, the AR sports training system 200 utilizes joint angle quaternions when analyzing body movements and uses dynamic time warping to correctly compare between the body movements of the novice athlete and the body movements of the expert athlete.
Next, the processor 212 calculates a sequence of joint angle quaternion vectors, denoted Q1 ∈ R^(N×J×4), for the novice athlete based on the sequence P1 (line 4 of Algorithm 1). Likewise, in one embodiment, the processor 212 calculates a sequence of joint angle quaternion vectors, denoted Q2 ∈ R^(M×J×4), for the expert athlete based on the sequence P2 (line 3 of Algorithm 1). These quaternion vectors represent the rotations required to transform a vertex v1 into a vertex v2. In some embodiments, the processor 212 filters the sequences Q1 and Q2 using a Kalman filter (line 4 of Algorithm 1).
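A minimal sketch of how joint angle quaternions could be derived from the joint positions is given below; it assumes each joint angle is the rotation carrying a reference bone direction onto the measured bone direction, and it omits the Kalman filtering step for brevity. The bone list and reference pose are hypothetical inputs, not elements defined by the disclosure.

```python
import numpy as np

def quat_between(v1: np.ndarray, v2: np.ndarray) -> np.ndarray:
    """Unit quaternion (w, x, y, z) that rotates vector v1 onto vector v2."""
    v1 = v1 / np.linalg.norm(v1)
    v2 = v2 / np.linalg.norm(v2)
    axis = np.cross(v1, v2)
    w = 1.0 + float(np.dot(v1, v2))
    if w < 1e-8:  # v1 and v2 are (nearly) opposite: rotate 180 deg about an orthogonal axis
        axis = np.cross(v1, np.array([1.0, 0.0, 0.0]))
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(v1, np.array([0.0, 1.0, 0.0]))
        w = 0.0
    q = np.concatenate(([w], axis))
    return q / np.linalg.norm(q)

def joint_angle_quaternions(positions: np.ndarray,
                            bones: list[tuple[int, int]],
                            reference: np.ndarray) -> np.ndarray:
    """Return Q of shape (T, len(bones), 4): rotation of each bone relative to a reference pose.

    positions: (T, J, 3) joint positions; reference: (J, 3) joint positions of a reference pose.
    """
    T = positions.shape[0]
    Q = np.zeros((T, len(bones), 4))
    for t in range(T):
        for j, (parent, child) in enumerate(bones):
            bone_vec = positions[t, child] - positions[t, parent]
            ref_vec = reference[child] - reference[parent]
            Q[t, j] = quat_between(ref_vec, bone_vec)
    return Q
```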
Next, the processor 212 uses this joint angle data to perform a comparative analysis between the real-time motions of the body of the novice athlete and the previously recorded motions of the body of the expert athlete. Particularly, the processor 212 initializes and calculates a DTW distance matrix D having dimensions N×M×J. The DTW distance matrix D consists of error values that represent the differences, for each joint, between each frame of the sequence Q1 and each frame of the sequence Q2. A lower error value indicates a better performance by the novice athlete. Because the novice athlete may act slightly faster or slower than the expert athlete, for real-time comparison the processor 212 uses, for example, N=M=10 to reduce the computation cost; otherwise, N and M refer to the lengths of the sequences.
To calculate the DTW distance matrix D, the processor 212 determines quaternion dissimilarities between each joint in each frame of the sequence Q1 and the corresponding joint in each frame of the sequence Q2 (line 16 of Algorithm 1). The processor 212 calculates each quaternion dissimilarity using the quaternion dissimilarity equation:
where q1, q2 represent two quaternion vectors. Next, the processor 212 determines the error values in the DTW distance matrix D by aligning the sequence Q2 with the sequence Q1 based on the quaternion dissimilarities and using dynamic time warping (line 17 of Algorithm 1).
Based on the DTW distance matrix D, the processor 212 determines whether the real-time motions of the novice athlete are performed incorrectly. In particular, the processor 212 determines, based on the determined differences, that one or more particular joints of the novice athlete were incorrectly positioned during the real-time motions of the novice athlete during a particular time interval. To this end, the processor 212 compares the error values of the DTW distance matrix D with a universal threshold ξjoint. If a determined difference is greater than a respective threshold, then the processor 212 determines that a corresponding joint is positioned incorrectly during the real-time motions. In some embodiments, the novice athlete's movement is considered incorrect if:
D[N][M][k]/N<1−ξjoint
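The sketch below illustrates the structure of the per-joint dynamic time warping comparison described above. Because the exact quaternion dissimilarity equation and threshold inequality are not reproduced here, the sketch assumes the common dissimilarity 1 − |⟨q1, q2⟩| and simply flags a joint when its averaged DTW error exceeds a tolerance; these choices are assumptions made for illustration rather than the specific formulation of Algorithm 1.

```python
import numpy as np

def quat_dissimilarity(q1: np.ndarray, q2: np.ndarray) -> float:
    # Assumed measure: 1 - |<q1, q2>| is 0 for identical orientations and grows
    # as they diverge; the disclosure's exact equation is not reproduced here.
    return 1.0 - abs(float(np.dot(q1, q2)))

def dtw_joint_errors(Q1: np.ndarray, Q2: np.ndarray) -> np.ndarray:
    """Per-joint DTW error between novice (Q1: N x J x 4) and expert (Q2: M x J x 4)."""
    N, J, _ = Q1.shape
    M = Q2.shape[0]
    errors = np.zeros(J)
    for k in range(J):
        D = np.full((N + 1, M + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, N + 1):
            for j in range(1, M + 1):
                cost = quat_dissimilarity(Q1[i - 1, k], Q2[j - 1, k])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        errors[k] = D[N, M] / N  # averaged accumulated error for joint k
    return errors

def incorrect_joints(errors: np.ndarray, tolerance: float = 0.15) -> np.ndarray:
    """Indices of joints whose averaged DTW error exceeds the (assumed) tolerance."""
    return np.where(errors > tolerance)[0]
```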
Similarly, for the purpose of comparing the real-time motions of the hand-held sports equipment held by the novice athlete and the previously recorded motions of the hand-held sports equipment held by the expert athlete, the AR sports training system 200 also uses quaternion-dissimilarity-based dynamic time warping.
Next, the processor 212 uses the sequences Q1 and Q2 to perform a comparative analysis between the real-time motions of the hand-held sports equipment held by the novice athlete and the previously recorded motions of the hand-held sports equipment held by the expert athlete. Particularly, the processor 212 initializes and calculates a DTW distance matrix D having dimensions N×M. The DTW distance matrix D consists of error values that represent the differences between each frame of the sequence Q1 and each frame of the sequence Q2. A lower error value indicates a better performance by the novice athlete. Because the novice athlete may act slightly faster or slower than the expert athlete, for real-time comparison the processor 212 uses, for example, N=M=10 to reduce the computation cost; otherwise, N and M refer to the lengths of the sequences.
To calculate the DTW distance matrix D, the processor 212 determines quaternion dissimilarities between each frame of the sequence Q1 and each frame of the sequence Q2 (line 10 of Algorithm 2). Next, the processor 212 determines the error values in the DTW distance matrix D by aligning the sequence Q2 with the sequence Q1 based on the quaternion dissimilarities and using dynamic time warping (line 11 of Algorithm 2).
In some embodiments, the processor 212 determines, based on the determined differences, that the hand-held sports equipment held by the novice athlete was incorrectly positioned or oriented during the real-time motions of the novice athlete during a particular time interval.
Based on the DTW distance matrix D, the processor 212 determines whether the real-time motions of the novice athlete are performed incorrectly. In particular, the processor 212 determines, based on the determined differences, that the hand-held sports equipment held by the novice athlete was incorrectly positioned or oriented during a particular time interval. To this end, the processor 212 compares the error values of the DTW distance matrix D with a universal threshold ξpaddle. If a determined difference is greater than a respective threshold, then the processor 212 determines that the hand-held sports equipment held by the novice athlete was incorrectly positioned or oriented during the real-time motions during a particular time interval. In some embodiments, the novice athlete's movement is considered incorrect if:
D[N][M]/N<1−ξpaddle
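For completeness, a correspondingly reduced sketch for the paddle comparison is shown below; it uses the same assumed dissimilarity over a single orientation stream (no joint dimension) and, like the previous sketch, treats the threshold test as "flag when the averaged error exceeds a tolerance," which is an assumption rather than the exact inequality of Algorithm 2.

```python
import numpy as np

def dtw_paddle_error(Q1: np.ndarray, Q2: np.ndarray) -> float:
    """Averaged DTW error between novice (N x 4) and expert (M x 4) paddle orientations."""
    N, M = Q1.shape[0], Q2.shape[0]
    D = np.full((N + 1, M + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            cost = 1.0 - abs(float(np.dot(Q1[i - 1], Q2[j - 1])))  # assumed dissimilarity
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[N, M] / N

def paddle_incorrect(error: float, tolerance: float = 0.15) -> bool:
    """Illustrative check: flag the paddle when its averaged error exceeds a tolerance."""
    return error > tolerance
```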
With reference again to
With reference again to
In some embodiments, the AR graphical user interface of the AR sports training system 200 employs two distinct types of visualizations to enhance the learning experience and provide valuable guidance during sports training. First, the AR graphical user interface incorporates detached visualizations that are superimposed on the environment around the novice athlete. Second, the AR graphical user interface incorporates embodied visualizations that are superimposed on the body of the novice athlete. In at least some embodiments, the visualizations include skeleton-like or otherwise human-like virtual avatars. In some embodiments, the virtual avatars are animated based on the real-time motions of the novice athlete or the previously recorded motions of the expert athlete.
As discussed above, during sports training, the AR sports training system 200 compares the real-time motions of the novice athlete with the previously recorded ‘ideal’ or ‘correct’ motions of the expert athlete. Based on determined differences, the skeleton-like virtual avatars incorporate visual indications, such as highlighting, that convey to the novice athlete any deviations (differences) between their real-time motions and the previously recorded ‘correct’ motions of the expert athlete.
By incorporating both detached and embodied visual cues, the AR sports training system 200 provides a comprehensive and immersive training environment. Novice athletes can learn from expert athletes, compare their movements, and receive targeted guidance, all of which contribute to an effective and engaging learning experience. These visual cues are integral to helping novice athletes master complex movements and improve their skills in real-time.
Referring back to
Additionally, as the novice athlete performs motions in real-time, i.e., to practice the sports-related motion, the AR sports training system 200 captures and analyzes the real-time motions of the novice athlete. The processor 212 operates the display screen 232 to also display a visualization of the real-time motions of the novice athlete. In some embodiments, the visualization takes the form of another virtual human avatar, referred to herein as the novice avatar or user avatar, that holds virtual sports equipment and is animated according to the real-time motions of the novice athlete. The novice avatar is superimposed on an environment in front of the novice athlete (e.g., next to the expert avatar).
In at least some embodiments, the novice avatar is rendered to include visual indications of the determined differences between the real-time motions of the novice athlete and the previously recorded motions of the expert athlete. In particular, in one embodiment, based on the DTW distance matrix D representing differences in the sequences of human pose data P1 and P2, the processor 212 identifies particular joints or particular portions of the body that are incorrectly positioned during the real-time motions of the novice athlete during a particular time interval, e.g., by comparing the values in the DTW distance matrix D with the universal threshold ξjoint. In response to determining that particular joints or particular portions of the body are incorrectly positioned during the real-time motions, the processor 212 renders the novice avatar to include a visual indication of particular joints or particular portions of the body that are incorrectly positioned. With reference again to
Similarly, in one embodiment, based on the DTW distance matrix D representing differences in the sequences of object pose data P1 and P2, the processor 212 identifies whether the hand-held sports equipment held by the novice athlete is incorrectly positioned or oriented during the real-time motions of the novice athlete during a particular time interval, e.g., by comparing the values in the DTW distance matrix D with the universal threshold ξpaddle. In response to determining that the hand-held sports equipment held by the novice athlete is incorrectly positioned or oriented during the real-time motions, the processor 212 renders the virtual sports equipment held by the novice avatar to include a visual indication that the hand-held sports equipment held by the novice athlete is incorrectly positioned or oriented. In one example (not shown), the virtual sports equipment (e.g., the virtual table tennis paddle 824) is highlighted in a different color (e.g., pink) than the rest of the novice avatar 820 (or otherwise highlighted in some way) to indicate that the hand-held sports equipment is or was incorrectly positioned or oriented by the novice athlete.
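Purely as an illustrative bridge between the analysis and the rendering described above, the per-joint and paddle errors could be reduced to a small feedback structure consumed by the rendering layer; the joint names, tolerances, and highlight color below are hypothetical.

```python
def build_feedback(joint_errors, paddle_error,
                   joint_tolerance=0.15, paddle_tolerance=0.15,
                   joint_names=("shoulder", "elbow", "wrist")):
    """Return which avatar parts should be highlighted (e.g., tinted pink) by the renderer."""
    return {
        "joints": {name: float(err) > joint_tolerance
                   for name, err in zip(joint_names, joint_errors)},
        "paddle": float(paddle_error) > paddle_tolerance,
        "highlight_color": "#FF69B4",  # hypothetical pink tint for incorrect parts
    }
```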
This immediate feedback empowers novice athletes to make real-time adjustments and improvements to their performance. By providing both the expert avatar and the novice avatar, the AR sports training system 200 enables novice athletes to visually compare their movements with those of experts, fostering a deeper understanding of the desired actions and encouraging continuous improvement.
In addition to the expert avatar and the novice avatar, which are superimposed on the environment in a detached manner, the AR sports training system 200 also provides embodied visualizations that are superimposed on the body of the novice athlete. Particularly, the processor 212 operates the display screen 232 to display an embodied visualization of the previously recorded motions of the expert athlete. Embodied visualizations take the form of guidance provided directly within the user's field of view and superimposed upon their body, offering real-time feedback and assistance when needed. These visual cues are particularly valuable for refining specific aspects of movement. During training, the AR sports training system 200 overlays the correct movement trajectories onto the novice athlete's body.
In some embodiments, the embodied visualizations are dynamic and context-sensitive, appearing only when novice athletes make incorrect movements. Particularly, in response to determining that particular joints or particular portions of the body are incorrectly positioned during the real-time motions, the processor 212 operates the display screen 232 to display an embodied visualization of the previously recorded motions of the expert athlete. Likewise, in response to determining that the hand-held sports equipment held by the novice athlete is incorrectly positioned or oriented during the real-time motions, the processor 212 operates the display screen 232 to display an embodied visualization of the previously recorded motions of the expert athlete.
In some embodiments, the visualization takes the form of a virtual avatar, referred to herein as the embodied avatar, that holds virtual sports equipment and is animated according to the previously recorded motions of the expert athlete. Unlike the expert avatar, the embodied avatar is superimposed upon the body of the novice athlete.
In some embodiments, the embodied avatar is rendered to include visual indications of the determined differences between the real-time motions of the novice athlete and the previously recorded motions of the expert athlete, in a similar manner that was discussed with respect to the novice avatar. In response to determining that particular joints or particular portions of the body are incorrectly positioned during the real-time motions, the processor 212 renders the embodied avatar to include a visual indication of particular joints or particular portions of the body that are incorrectly positioned. Likewise, in response to determining that the hand-held sports equipment held by the novice athlete is incorrectly positioned or oriented during the real-time motions, the processor 212 renders the virtual sports equipment held by the embodied avatar to include a visual indication that the hand-held sports equipment held by the novice athlete is incorrectly positioned or oriented.
Additionally, in some embodiments, the embodied avatar is rendered to include visual indications of correct motion trajectories. Particularly, in one embodiment, the processor 212 renders the embodied virtual arm of the embodied avatar to include visual indications of a motion trajectory of an arm of the expert athlete in the previously recorded motions of the expert athlete. Similarly, in one embodiment, the processor 212 renders the virtual sports equipment of the embodied avatar to include visual indications of a motion trajectory of the hand-held sports equipment held by the expert athlete in the previously recorded motions of the expert athlete.
In at least some embodiments, during training, novice athletes have the flexibility to manipulate the position, viewpoint, and scale of both the novice avatar and the expert avatar. Furthermore, novice athletes can pause, resume, and adjust the playback speed of the expert avatar and the embodied avatar, allowing for a detailed slow-motion examination of the stroke. All of these interactions are facilitated through the user-friendly AR graphical user interface. For both the expert avatar and the novice avatar, novice athletes have the capability to adjust the avatar's scale through gestures. Additionally, novice athletes can customize the feedback provided by the AR sports training system 200 by enabling or disabling body and paddle feedback.
With reference again to
Embodiments within the scope of the disclosure may also include non-transitory computer-readable storage media or machine-readable medium for carrying or having computer-executable instructions (also referred to as program instructions) or data structures stored thereon. Such non-transitory computer-readable storage media or machine-readable medium may be any available media that can be accessed by a general-purpose or special-purpose computer. By way of example, and not limitation, such non-transitory computer-readable storage media or machine-readable medium can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions or data structures. Combinations of the above should also be included within the scope of the non-transitory computer-readable storage media or machine-readable medium.
Computer-executable instructions include, for example, instructions and data that cause a general-purpose computer, special-purpose computer, or special-purpose processing device to perform a certain function or group of functions. Computer-executable instructions also include program modules that are executed by computers in stand-alone or network environments. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of the program code means for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.
While the disclosure has been illustrated and described in detail in the drawings and foregoing description, the same should be considered illustrative and not restrictive in character. It is understood that only the preferred embodiments have been presented and that all changes, modifications, and further applications that come within the spirit of the disclosure are desired to be protected.
This application claims the benefit of priority of U.S. provisional application Ser. No. 63/621,800, filed on Jan. 17, 2024, the disclosure of which is herein incorporated by reference in its entirety.
This invention was made with government support under contract number DUE1839971 awarded by the National Science Foundation. The government has certain rights in the invention.
| Number | Date | Country |
|---|---|---|
| 63621800 | Jan 2024 | US |