The present disclosure relates to motion posture analysis and machine learning, in particular to an auxiliary method for motion guidance.
Golf is an outdoor sport in which players can maintain social distance. During the epidemic, consumers suddenly had a lot of free time and money. As a result, more and more people have taken up golf, including beginners trying it for the first time.
However, golf is a sport that requires high precision. Whether beginners are building basic knowledge or advanced players are adjusting their swing postures, they all rely on a coach's guidance. While novice players have plenty of time and disposable income to take lessons and practice, the biggest problem is finding a suitable and available coach to teach them. Limited by geographical factors, it is difficult for a single coach to travel to different regions to guide local students in a short period of time. In addition, the coach often needs to spend a lot of time correcting the student's body postures during the class, thus delaying the teaching progress. Although some applications (apps) installed on smartphones allow students to self-detect their swing posture, this approach is still not as good as a coach's personal guidance.
In view of the above, the present disclosure proposes an auxiliary method for motion guidance, which enables the coach to handle guidance requests from a plurality of students in different locations at the same time, thereby improving the coach's guidance efficiency.
According to an embodiment of the present disclosure, an auxiliary method for motion guidance includes the following steps: obtaining a video recording a human body performing a motion; analyzing, by a skeleton analysis module, the video to generate a plurality of skeleton coordinate sequences corresponding to the human body; specifying, by a key frame analysis model, a key frame at least according to the video and the plurality of skeleton coordinate sequences; generating, by a pose correction model, an auxiliary image according to the key frame, a key skeleton coordinate corresponding to the key frame, and a skeleton coordinate template; and sending, by the server, the video and a recommended guidance to an electronic device, wherein the recommended guidance comprises a composite result of the key frame and auxiliary image.
According to an embodiment of the present disclosure, an auxiliary system for motion guidance includes a server and an electronic device. The server includes a communication circuit, a computing circuit, and a storage circuit. The communication circuit is configured to receive a video, send the video and send a recommended guidance, where the video records a human body performing a motion. The computing circuit is electrically connected to the communication circuit. The computing circuit is configured to perform a skeleton analysis module to analyze the video to generate a plurality of skeleton coordinate sequences corresponding to the human body; perform a key frame analysis model to specify a key frame at least according to the video and the plurality of skeleton coordinate sequences; and perform a pose correction model to generate an auxiliary image according to the key frame, a key skeleton coordinate corresponding to the key frame, and a skeleton coordinate template. The storage circuit is electrically connected to the computing circuit. The storage circuit is configured to store the skeleton analysis module, the plurality of skeleton coordinate sequences, the key frame analysis model, an index of the key frame, the pose correction model, and the auxiliary image. The electronic device is communicably connected to the server. The electronic device is configured to receive the video and the recommended guidance, and the recommended guidance comprises a composite result of the key frame and the auxiliary image.
The present disclosure will become more fully understood from the detailed description given hereinbelow and the accompanying drawings which are given by way of illustration only and thus are not limitative of the present disclosure and wherein:
In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the disclosed embodiments. According to the description, claims and the drawings disclosed in the specification, one skilled in the art may easily understand the concepts and features of the present invention. The following embodiments further illustrate various aspects of the present invention, but are not meant to limit the scope of the present invention.
The present disclosure proposes an auxiliary system for motion guidance and a method thereof. The following paragraphs adopt golf as an example to explain the proposed system and method. However, the sports to which the present disclosure is applicable are not limited thereto.
The electronic device 1 and electronic device 5 held by the student B and coach C respectively may be a smart phone, a tablet computer or a personal computer with a camera. The present disclosure does not limit the hardware types of electronic devices 1 and 5.
The auxiliary system 100 for motion guidance according to an embodiment of the present disclosure includes a server 3 that may automatically generate the recommended guidance based on video and an electronic device 5 that coach C uses to generate the formal guidance.
The communication circuit 31 is configured to receive the video from the electronic device 1 of the student B, send the video and the recommended guidance to electronic device 5 of coach C, and send the formal guidance to the electronic device 1 of the student B. In an embodiment, the communication circuit 31 is, for example, a network card or a network chip, and the communication circuit 31 may use wired communication or wireless communication. The present disclosure does not limit the hardware type and communication mode of the communication circuit 31.
Please refer to
As shown in
In another embodiment, the module or model further includes a speech meaning analysis model, which generates an auxiliary text according to the key frame, the key skeleton coordinate corresponding to the key frame, and the skeleton coordinate template. In yet another embodiment, the module or model further includes a training module, which adjusts hyper-parameters of the key frame analysis model and/or the pose correction model according to the formal guidance.
In an embodiment, the computing circuit 32 may be implemented by one or more of the following examples: a central processing unit (CPU), a microcontroller (MCU), an application processor (AP), a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), a digital signal processor (DSP), a system-on-a-chip (SOC), a deep learning accelerator, or any electronic device used to execute the described models and modules.
Please refer to
The electronic device 5 is communicably connected with the server 3 through the communication circuit 52. The communication circuit 52 is configured to receive the video 81 and the recommended guidance, and send the formal guidance. In an embodiment, the recommended guidance includes a composite result of the key frame and the auxiliary image 84. In another embodiment, the recommended guidance further includes the auxiliary text 85 displayed in the key frame.
In step S1, the communication circuit 31 obtains the video 81 recording the human body performing the motion from the electronic device 1. In an embodiment, in addition to receiving the video 81, the communication circuit 31 also receives the stage partition information 82 from the electronic device 1.
In step S2 to step S5, the computing circuit 32 of the server 3 reads from the storage circuit 33 and executes the model or module mentioned in each step.
In step S2, the skeleton analysis module 71 analyzes the video 81 to generate a plurality of skeleton coordinate sequences corresponding to the human body. In an embodiment, in order to reduce the overhead of the server 3, the skeleton analysis module 71 may be implemented through the functions provided by a skeleton analysis software development kit (SDK) of a mobile device platform, for example, ARKit on the iOS platform or the MediaPipe Pose framework released by Google. The video 81 includes a plurality of frames, and each frame has a skeleton coordinate sequence.
In step S3, the key frame analysis model 72 specifies a key frame at least according to the video 81 and the plurality of skeleton coordinate sequences. The key frame analysis model 72 is configured to identify the motion poses in the video 81 that need to be corrected. Depending on the amount of collected data, the key frame analysis model 72 adopts one of the following two implementations. Please refer to
The first implementation: in the early stage of the operation of the auxiliary system 100 for motion guidance, the server 3 has not yet collected a sufficient number of guidance cases (including the video 81, the recommended guidance and the formal guidance). Therefore, the key frame analysis model 72 scores each stage of the motion according to a rule set defined in advance, and finds the stage and time point with the lowest score. The rule set includes a plurality of rules. In an embodiment, each stage of the motion has multiple rules obtained from expert interviews, such as: the hands should be straight, the hands and shoulders should form an inverted triangle, the center of gravity should be on the left of the middle, and so on. The score is determined by the similarity between the student's posture and the rules. The more closely the posture follows the rules, the higher the score. The more the posture deviates from the rules, the lower the score.
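The rule-based scoring of the first implementation may be sketched as follows. This is only a minimal illustration under stated assumptions, not the disclosed implementation: the joint names, the `straight_arm_rule` function, the averaging of rule similarities, and the representation of the stage partition as frame ranges are all hypothetical choices made for the example.

```python
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Skeleton = Dict[str, Point]         # hypothetical layout: joint name -> (x, y)
Rule = Callable[[Skeleton], float]  # returns a similarity in [0, 1]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle at b, in degrees, between the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        return 0.0
    cosine = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def straight_arm_rule(skeleton: Skeleton) -> float:
    """Illustrative rule: 1.0 when shoulder, elbow and wrist are collinear."""
    angle = joint_angle(skeleton["shoulder"], skeleton["elbow"], skeleton["wrist"])
    return angle / 180.0


def lowest_scoring_frame(
    frames: List[Skeleton],
    stages: Dict[str, range],
    rules: Dict[str, List[Rule]],
) -> Tuple[str, int, float]:
    """Score every frame of every stage and return the worst (stage, frame, score)."""
    worst = None
    for stage, frame_indices in stages.items():
        stage_rules = rules.get(stage, [])
        if not stage_rules:
            continue  # a stage without rules cannot be scored
        for index in frame_indices:
            # Assumption: the frame score is the mean similarity over the rules.
            score = sum(rule(frames[index]) for rule in stage_rules) / len(stage_rules)
            if worst is None or score < worst[2]:
                worst = (stage, index, score)
    if worst is None:
        raise ValueError("no stage has any rules to score")
    return worst


frames = [
    {"shoulder": (0.0, 0.0), "elbow": (1.0, 0.0), "wrist": (2.0, 0.0)},  # straight arm
    {"shoulder": (0.0, 0.0), "elbow": (1.0, 0.0), "wrist": (1.0, 1.0)},  # elbow bent 90 degrees
]
stages = {"address": range(0, 1), "backswing": range(1, 2)}
rules = {"address": [straight_arm_rule], "backswing": [straight_arm_rule]}
stage, index, score = lowest_scoring_frame(frames, stages, rules)
```

In this sketch the frame with the bent elbow receives the lower score, so its stage and frame index would be the candidate reported as the key frame.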
As shown in
As shown in
The second implementation: after the auxiliary system 100 for motion guidance has been running for a period of time, the server 3 has collected a sufficient number of guidance cases. Therefore, the server 3 may find out the relationship between the historical skeleton information and the time points according to the historical guidance, and further find out the time point with the highest probability of being selected from the video 81.
As shown in
A piece of the first training data is generated every time the first collection procedure is performed.
In step S351, the computing circuit 32 of the server 3 obtains a reference video and a key frame index 83 corresponding to the reference video, where the reference video records a reference human body performing the motion. Specifically, after the auxiliary system 100 for motion guidance has been running for a period of time, the server 3 has collected the videos 81 (reference videos) uploaded by different students B (reference human bodies), as well as the key frame specifically picked out by the coach C in each reference video. The key frame is the frame in which the student B made a wrong posture during the motion.
In step S352, the skeleton analysis module 71 analyzes the reference video to generate a plurality of reference skeleton coordinate sequences corresponding to the reference human body. The implementation of step S352 may refer to step S2 in
In step S353, the computing circuit 32 obtains a candidate skeleton coordinate sequence from the plurality of reference skeleton coordinate sequences according to a start frame index and a sliding window length. Specifically, the computing circuit 32 scans the entire video 81 through the sliding window. For example, if the video 81 has N frames and the sliding window length is L frames, the candidate skeleton coordinate sequences extracted for the first time correspond to the 1st frame to the Lth frame, the candidate skeleton coordinate sequences extracted for the second time correspond to the 2nd frame to (L+1)th frame, . . . , and the candidate skeleton coordinate sequences extracted for the last time correspond to the (N-L+1)th frame to Nth frame.
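The sliding-window extraction described in step S353 may be sketched as follows. This is a minimal illustration under stated assumptions: the function name `sliding_windows` and the dictionary keys are hypothetical, and the start frame index is 0-based here, whereas the description above counts frames from 1.

```python
from typing import Dict, List, Sequence


def sliding_windows(sequences: Sequence, window_length: int) -> List[Dict]:
    """Return one candidate record per window position.

    For N per-frame skeleton coordinate sequences and a window of L frames,
    N - L + 1 candidates are produced; an empty list is returned when the
    window does not fit.
    """
    n = len(sequences)
    if window_length <= 0 or window_length > n:
        return []
    return [
        {
            "start_frame_index": start,  # 0-based start of the window
            "sliding_window_length": window_length,
            "candidate": list(sequences[start:start + window_length]),
        }
        for start in range(n - window_length + 1)
    ]


# Five frames, each stood in for by a label instead of real joint coordinates.
per_frame = ["f1", "f2", "f3", "f4", "f5"]
candidates = sliding_windows(per_frame, window_length=3)
```

With N = 5 and L = 3, the sketch yields three candidates covering frames 1 to 3, 2 to 4, and 3 to 5 in the 1-based numbering of the description.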
In step S354, the start frame index, the sliding window length, and the candidate skeleton coordinate sequence serve as one piece of the first training data. In an embodiment, the first training data further includes a coach number. Specifically, different coaches have different guidance styles. Therefore, in step S351, in addition to the reference video and the key frame index 83, the coach number corresponding to the coach C who generated the key frame index 83 is also collected, and in step S354 this coach number is added to the first training data. In this way, in step S37 of
In an embodiment, the key frame analysis model 72 of the second implementation outputs not only the key frame index 83 but also the score of the key frame and a coach bias. The coach bias represents the tendency of the coach's style in frame selection.
As shown in
Please refer to
In step S41, the server 3 obtains the reference video and an annotated image. The reference video records the first reference human body performing the motion. The annotated image corresponds to the first reference human body in the reference video. Specifically, the annotated image is a part of the formal guidance generated by the coach C.
In step S42, the skeleton analysis module 71 analyzes the reference video to generate a plurality of first reference skeleton coordinate sequences corresponding to the first reference human body. The implementation of step S42 may refer to step S2 in
In step S43, the computing circuit 32 performs a classification procedure to classify the first reference human body into a group according to the plurality of first reference skeleton coordinate sequences and the physiological information of the first reference human body. The group has a plurality of data corresponding to a plurality of second reference human bodies, and each of the plurality of data includes a second reference skeleton coordinate sequence and a score. Each second reference human body is different from the first reference human body. Each second reference human body corresponds to a reference video that was classified before step S43 is executed.
In an embodiment, after the auxiliary system 100 for motion guidance has been running for a period of time, the data block 332 of the storage circuit 33 stores not only the plurality of videos 81 of different students B, but also the physiological information and group numbers of these students B, as well as the skeleton coordinate sequences and the scores output by the key frame analysis model 72. The physiological information is measurable information of the human body of the student B, such as height, weight, length of upper limbs or length of lower limbs, but the present disclosure is not limited to the above examples. The group number is the result of classifying each student B according to his/her physiological information and skeleton coordinate sequence. In other words, although the body shape of each student B is different, these students B may be classified (for example, using the K-means algorithm) according to the physiological information and skeleton coordinate sequence, so that the students B in the same group have similar body shapes.
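The grouping of students by body shape may be sketched with a plain K-means loop. This is a minimal illustration under stated assumptions: the feature vector (height in cm, weight in kg), the use of the first k points as initial centroids, and the fixed iteration count are hypothetical choices. A practical system would likely normalize the features and append values derived from the skeleton coordinate sequence.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: List[Vector], k: int, iterations: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Cluster points into k groups and return (group numbers, centroids)."""
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: deterministic initialization from the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical physiological information: (height in cm, weight in kg).
students = [(160.0, 50.0), (185.0, 90.0), (162.0, 52.0), (183.0, 88.0)]
group_numbers, centroids = k_means(students, k=2)
```

Here the two shorter, lighter students end up in one group and the two taller, heavier students in the other, which corresponds to the group numbers stored alongside each student B.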
In step S44, the annotated image, the plurality of first reference skeleton coordinate sequences, the physiological information, the group number and the reference skeleton coordinate template serve as one piece of the second training data. The reference skeleton coordinate template is the second reference skeleton coordinate sequence with the maximum score in the plurality of data. In an embodiment, the second training data further includes the coach number. For more implementation details, please refer to step S354.
After the training of the pose correction model 73 is completed, the auxiliary system 100 for motion guidance may first find the group to which a new student B belongs according to his/her physiological information and skeleton coordinate sequence, and then find the best role model (reference skeleton coordinate template) in this group. Since the students B belonging to the same group have similar body shapes, comparing the new student B with an earlier student who was certified by the coach C to have a standard posture helps the coach C make the formal guidance.
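The lookup of the reference skeleton coordinate template for a new student may be sketched as follows. This is a minimal illustration under stated assumptions: the `GroupEntry` record, the centroid-distance group assignment, and all sample values are hypothetical; the only behavior taken from the description is that the template is the stored skeleton coordinate sequence with the maximum score in the student's group.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class GroupEntry:
    student_id: str
    skeleton_sequence: List[List[float]]  # stored skeleton coordinate sequence
    score: float                          # score recorded for this student


def nearest_group(features: Sequence[float], centroids: Dict[int, Sequence[float]]) -> int:
    """Assumption: a new student joins the group with the closest centroid."""
    return min(
        centroids,
        key=lambda g: sum((x - y) ** 2 for x, y in zip(features, centroids[g])),
    )


def select_template(
    features: Sequence[float],
    centroids: Dict[int, Sequence[float]],
    groups: Dict[int, List[GroupEntry]],
) -> GroupEntry:
    """Return the highest-scoring entry of the new student's group."""
    group_number = nearest_group(features, centroids)
    return max(groups[group_number], key=lambda entry: entry.score)


# Hypothetical data: features are (height in cm, weight in kg).
centroids = {0: (161.0, 51.0), 1: (184.0, 89.0)}
groups = {
    0: [GroupEntry("s1", [[0.1, 0.2]], 72.0), GroupEntry("s3", [[0.1, 0.3]], 91.0)],
    1: [GroupEntry("s2", [[0.2, 0.2]], 85.0), GroupEntry("s4", [[0.2, 0.4]], 64.0)],
}
template = select_template((163.0, 54.0), centroids, groups)
```

In this sketch the new student falls into group 0, and the entry with the highest score in that group is returned as the role model.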
As shown in
Please refer to
In step S51, the server 3 obtains the reference video and guidance voice. The reference video records the first reference human body performing the motion. The guidance voice corresponds to the first reference human body in the reference video. Specifically, similar to step S41, the guidance voice is a part of the formal guidance generated by the coach C. In step S41, the coach C guides the student B by drawing the auxiliary lines. In step S51, the coach C guides the student B by recording voice.
In step S52, the computing circuit 32 executes a conversion program to convert the guidance voice into a text vector. The conversion program adopts speech-to-text (STT) technology.
In step S53, the skeleton analysis module 71 analyzes the reference video to generate a plurality of first reference skeleton coordinate sequences corresponding to the first reference human body. In step S54, the computing circuit 32 performs a classification procedure to classify the first reference human body into a group according to the plurality of first reference skeleton coordinate sequences and the physiological information of the first reference human body. The group has a plurality of data corresponding to a plurality of second reference human bodies, and each of the plurality of data includes a second reference skeleton coordinate sequence and a score. The implementation of steps S53 and S54 may refer to steps S42 and S43 in
In step S55, the text vector, the plurality of first reference skeleton coordinate sequences, the physiological information, the group number and the reference skeleton coordinate template serve as one piece of the third training data. The reference skeleton coordinate template is the second reference skeleton coordinate sequence with the maximum score in the plurality of data. In an embodiment, the third training data further includes the coach number. For more implementation details, please refer to step S354. Step S55 is similar to step S44 in
As shown in
In step S7, the coach C may generate the formal guidance according to the recommended guidance by the input circuit 55 of the electronic device 5. The electronic device 5 sends the formal guidance to the server 3 by the communication circuit 51. The formal guidance includes at least one of the annotated image and the guidance voice. The annotated image and the guidance voice correspond to the human body in the video 81. The server 3 then sends the formal guidance to the electronic device 1 of student B.
In an embodiment, after the server 3 receives the formal guidance, the computing circuit 32 may execute the training module 75 to adjust the hyper-parameters of at least one of the key frame analysis model 72, the pose correction model 73 and the speech meaning analysis model 74 according to the formal guidance.
In view of the above, the auxiliary system and method for motion guidance proposed by the present disclosure may achieve the following contributions and effects: in the remote teaching of sports (such as golf), the student may obtain the non-real-time guidance of the coach through the present disclosure. The guidance includes auxiliary lines and guidance voice with respect to movement posture. Since the coach does not need to respond immediately, the present disclosure allows multiple students to send guidance requests to the same coach, so that an excellent coach may maximize his/her guidance efficiency. For students, when they are practicing on their own, they may obtain the coach's remote guidance through the present disclosure, and may clearly see their motion videos with audio-visual guidance suggestions. For the coach, the present disclosure proposes a remote teaching method that overcomes geographical restrictions. The proposed method is not limited to one-to-one teaching at a single time, thereby greatly improving the teaching efficiency of the coach.