This application claims priority to Taiwan Application Serial Number 111126165, filed Jul. 12, 2022, which is herein incorporated by reference.
The present disclosure relates to a team sports vision training system and a method thereof. More particularly, the present disclosure relates to a team sports vision training system based on extended reality, voice interaction and action recognition and a method thereof.
Athlete training includes different aspects, such as skills, reactions, tactics, cognitive psychology, etc. Team competitive ball sports (e.g., basketball or football) particularly emphasize tactics and cooperation between teammates.
For example, when the team sport is basketball, the player needs to pay attention not only to the ball, but also to grasp every movement of the other nine players on the court. However, most players often focus only on close-range teammates or on defensive players on the near side when they hold the ball while being defended, thus ignoring another teammate who is available on the far side or another defensive player who lies in ambush on the far side. Accordingly, the originally planned tactics fail to be executed smoothly, or mistakes even occur.
Therefore, a team sports vision training system based on extended reality, voice interaction and action recognition and a method thereof which are capable of effectively assisting the athlete in conducting vision training are commercially desirable.
According to one aspect of the present disclosure, a team sports vision training system based on extended reality, voice interaction and action recognition is configured to train vision and an action of a user and includes a head-mounted display device, an action capture device and a computing server. The head-mounted display device is disposed on the user and includes a task scenario player and a speech sensing module. The task scenario player shows a virtual task scenario image. The speech sensing module senses a speech signal of the user to generate a speech message. The action capture device captures the action of the user to generate an action message. The computing server is signally connected to the head-mounted display device and the action capture device. The computing server stores a scenario setting parameter group and receives the action message and the speech message, and the computing server includes a task scenario generating module, a speech recognition module and an action recognition module. The task scenario generating module generates the virtual task scenario image and a task parameter group according to the scenario setting parameter group, and transmits the virtual task scenario image to the head-mounted display device for the user to watch and then generate the speech signal and the action. The speech recognition module receives the speech message and recognizes the speech message according to a speech recognition procedure to generate a speech recognition result. The speech recognition module judges the task parameter group of the task scenario generating module and the speech recognition result to generate a vision training result. The action recognition module receives the action message and recognizes the action message according to an action recognition procedure to generate an action recognition result. The action recognition module judges the scenario setting parameter group and the action recognition result to generate a sport training result. The vision training result and the sport training result are configured to judge whether the user meets a training requirement.
According to another aspect of the present disclosure, a team sports vision training method based on extended reality, voice interaction and action recognition is configured to train vision and an action of a user and includes performing a virtual task scenario showing step, a speech recognizing step and an action recognizing step. The virtual task scenario showing step includes disposing a head-mounted display device on the user, configuring a task scenario generating module of a computing server to generate a virtual task scenario image and a task parameter group according to a scenario setting parameter group and transmit the virtual task scenario image to the head-mounted display device, and then configuring a task scenario player of the head-mounted display device to show the virtual task scenario image for the user to watch and then generate a speech signal and the action. The speech recognizing step includes configuring a speech sensing module of the head-mounted display device to sense the speech signal of the user to generate a speech message, and then configuring a speech recognition module of the computing server to receive the speech message, recognize the speech message according to a speech recognition procedure to generate a speech recognition result, and judge the task parameter group of the task scenario generating module and the speech recognition result to generate a vision training result. The action recognizing step includes configuring an action capture device to capture the action of the user to generate an action message, and then configuring an action recognition module of the computing server to receive the action message, recognize the action message according to an action recognition procedure to generate an action recognition result, and judge the scenario setting parameter group and the action recognition result to generate a sport training result. The vision training result and the sport training result are configured to judge whether the user meets a training requirement.
The present disclosure can be more fully understood by reading the following detailed description of the embodiment, with reference made to the accompanying drawings as follows:
The embodiments will be described with reference to the drawings. For clarity, some practical details will be described below. However, it should be noted that the present disclosure should not be limited by these practical details; that is, in some embodiments, the practical details are unnecessary. In addition, for simplifying the drawings, some conventional structures and elements will be illustrated simply, and repeated elements may be represented by the same reference labels.
It will be understood that when an element (or device, module) is referred to as being “connected to” another element, it can be directly connected to the other element, or it can be indirectly connected to the other element, that is, intervening elements may be present. In contrast, when an element is referred to as being “directly connected to” another element, there are no intervening elements present. In addition, although the terms first, second, third, etc. are used herein to describe various elements or components, these elements or components should not be limited by these terms. Consequently, a first element or component discussed below could be termed a second element or component.
Reference is made to the drawings.
The team sports vision training system 100 based on extended reality, voice interaction and action recognition is configured to train vision and an action of a user and includes a head-mounted display device 200, an action capture device 300 and a computing server 400. The head-mounted display device 200 is disposed on the user and includes a task scenario player 210 and a speech sensing module 220. The task scenario player 210 shows a virtual task scenario image. The speech sensing module 220 senses a speech signal of the user to generate a speech message. The action capture device 300 captures the action of the user to generate an action message. In addition, the computing server 400 is signally connected to the head-mounted display device 200 and the action capture device 300. The computing server 400 stores a scenario setting parameter group and receives the action message and the speech message, and the computing server 400 includes a task scenario generating module 410, a speech recognition module 420 and an action recognition module 430. The task scenario generating module 410 generates the virtual task scenario image and a task parameter group according to the scenario setting parameter group, and transmits the virtual task scenario image to the head-mounted display device 200 for the user to watch and then generate the speech signal and the action. The speech recognition module 420 receives the speech message and recognizes the speech message according to a speech recognition procedure to generate a speech recognition result. The speech recognition module 420 judges the task parameter group of the task scenario generating module 410 and the speech recognition result to generate a vision training result. The action recognition module 430 receives the action message and recognizes the action message according to an action recognition procedure to generate an action recognition result. The action recognition module 430 judges the scenario setting parameter group and the action recognition result to generate a sport training result. The vision training result and the sport training result are configured to judge whether the user meets a training requirement. Therefore, the team sports vision training system 100 based on extended reality, voice interaction and action recognition of the present disclosure utilizes an extended reality helmet combined with voice interaction and action recognition technologies to effectively assist the user (e.g., an athlete or a player) in conducting vision training, making it easier for the user to grasp the movements of teammates on an ever-changing court, and then helping the team score and win. In addition, the present disclosure can realize individual training, thereby avoiding the problem of high manpower costs in training caused by the need for repeated practice by multiple people on a physical court in a conventional technique. The following is a detailed description of the above-mentioned devices.
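To make the division of labor among the modules concrete, the following is a minimal sketch of one training round; every identifier is hypothetical and chosen for illustration, since the disclosure does not specify an implementation or an API.

```python
# Hypothetical end-to-end flow of one training round, mirroring the modules
# named above; all identifiers are illustrative, not from the disclosure.
from dataclasses import dataclass


@dataclass
class TrainingResults:
    vision_training_result: bool  # produced by the speech recognition module
    sport_training_result: bool   # produced by the action recognition module


def run_training_round(scenario_setting_parameter_group, task_scenario_generator,
                       speech_recognizer, action_recognizer, hmd,
                       action_capture_device) -> TrainingResults:
    # Task scenario generating module: build the image and the task parameters.
    image, task_parameter_group = task_scenario_generator.generate(
        scenario_setting_parameter_group)
    hmd.show(image)  # the user watches, then speaks and moves

    # Speech path: speech message -> speech recognition -> vision training result.
    speech_message = hmd.sense_speech()
    speech_result = speech_recognizer.recognize(speech_message)
    vision_ok = speech_recognizer.judge(task_parameter_group, speech_result)

    # Action path: action message -> action recognition -> sport training result.
    action_message = action_capture_device.capture()
    action_result = action_recognizer.recognize(action_message)
    sport_ok = action_recognizer.judge(scenario_setting_parameter_group,
                                       action_result)
    return TrainingResults(vision_ok, sport_ok)


def meets_training_requirement(results: TrainingResults) -> bool:
    # One plausible combination rule (assumption): both results must pass.
    return results.vision_training_result and results.sport_training_result
```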
Reference is made to the drawings.
The head-mounted display device 200 is disposed on the user 110 and includes a task scenario player 210, a speech sensing module 220 and a gesture sensing module 230. The task scenario player 210 is configured to show a virtual task scenario image. The speech sensing module 220 senses a speech signal of the user 110 to generate a speech message. The gesture sensing module 230 is configured to sense a gesture of the user 110 to generate a gesture sensing result. In one embodiment, the head-mounted display device 200 can be a mixed reality (MR) helmet or a virtual reality (VR) helmet, and can be worn on the head of the user 110. The head-mounted display device 200 may transmit related information in a wireless manner (e.g., via a wireless network or Bluetooth) or in a wired manner (the related information includes the virtual task scenario image transmitted from the computing server 400a to the task scenario player 210). The task scenario player 210 can be a screen. The speech sensing module 220 can be a microphone. The gesture sensing module 230 can be a camera. When the user 110 wears the head-mounted display device 200, the eyes, facing the screen, can view an MR image or a VR image (i.e., the virtual task scenario image), and the microphone, positioned near the mouth, can collect sound for subsequent processing, but the present disclosure is not limited thereto.
The action capture device 300a is configured to capture the action of the user 110 to generate an action message. In detail, the action capture device 300a includes an inertial sensor 310 and a vision-based sensor 320. The inertial sensor 310 is disposed on the user 110 and senses the action of the user 110 to generate an inertial action message. The inertial sensor 310 transmits the inertial action message to an action recognition module 430a of the computing server 400a. For example, when the user 110 dribbles, and the inertial sensor 310 is worn on a hand of the user 110, the inertial sensor 310 captures the dribbling action of the hand of the user 110, and the inertial action message generated by the inertial sensor 310 is equivalent to information about the movement of the ball. In other words, when the ball touches the hand during dribbling, the trajectory of the hand is the same as the movement trajectory of the ball. In addition, the vision-based sensor 320 includes a camera facing the user 110. The vision-based sensor 320 captures the action of the user 110 via the camera to generate a vision-based action message, and transmits the vision-based action message to the action recognition module 430a of the computing server 400a. The action message includes the inertial action message and the vision-based action message. The vision-based sensor 320 can be a camera or a mobile phone. It is also worth mentioning that if the team sport is basketball, the inertial sensor 310 is worn on the hand of the user 110, and if the team sport is football, the inertial sensor 310 is worn on a foot of the user 110, depending on the needs of the training.
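As a rough illustration of the action message described above, the inertial and vision-based messages might be carried as one combined record; the field names, sample formats and units below are assumptions, not part of the disclosure.

```python
# Sketch of the action message combining both sensor paths; all field names
# and formats are assumed for illustration.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class InertialActionMessage:
    # Accelerometer/gyroscope samples from the sensor worn on the dribbling
    # hand; while the ball touches the hand, the hand trajectory tracks the ball.
    accel_samples: List[Tuple[float, float, float]] = field(default_factory=list)
    gyro_samples: List[Tuple[float, float, float]] = field(default_factory=list)


@dataclass
class VisionActionMessage:
    # Frames from the camera facing the user (e.g., encoded JPEG bytes).
    frames: List[bytes] = field(default_factory=list)


@dataclass
class ActionMessage:
    # The action message includes the inertial and vision-based messages.
    inertial: InertialActionMessage
    vision: VisionActionMessage
```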
The computing server 400a is signally connected to the head-mounted display device 200 and the action capture device 300a. The computing server 400a stores the scenario setting parameter group 402 and receives the action message and the speech message. The scenario setting parameter group 402 includes a player tactical parameter 4021, a defensive player generating parameter 4022, a task execution parameter 4023, a dribble execution parameter 4024 and a task difficulty adjustment parameter 4025. The player tactical parameter 4021 includes an enable tactical item 4021a and a disable tactical item 4021b. One of the enable tactical item 4021a and the disable tactical item 4021b is selected according to the gesture sensing result. The enable tactical item 4021a represents that a virtual player can move in the virtual task scenario image. The disable tactical item 4021b represents that the virtual player is stationary. In addition, the defensive player generating parameter 4022 includes an enable defense item 4022a and a disable defense item 4022b. One of the enable defense item 4022a and the disable defense item 4022b is selected according to the gesture sensing result. The defensive player is an opponent. The enable defense item 4022a represents that the virtual task scenario image will display virtual defensive players, i.e., the virtual task scenario image will simultaneously display a plurality of virtual teammates and a plurality of virtual defensive players. For example, when the team sport is basketball, and the enable defense item 4022a is selected, the virtual task scenario image will display four virtual teammates and five virtual defensive players. The disable defense item 4022b represents that the virtual task scenario image will only display the virtual teammates without the virtual defensive players.
The task execution parameter 4023 includes a number add item 4023a and a color change item 4023b. One of the number add item 4023a and the color change item 4023b is selected according to the gesture sensing result. The number add item 4023a represents that numbers are displayed around the virtual objects (e.g., above the heads of the virtual teammates) of the virtual task scenario image, respectively, for the user 110 to watch and then generate the speech signal. The color change item 4023b represents that the numbers are displayed around the virtual objects, respectively, and the clothing of one of the virtual objects is changed from a first color to a second color for the user 110 to watch and then generate the speech signal. In addition, the dribble execution parameter 4024 includes a one-hand dribble item 4024a, a crossover dribble item 4024b, a cross-leg dribble item 4024c and a behind-the-back dribble item 4024d. One of the one-hand dribble item 4024a, the crossover dribble item 4024b, the cross-leg dribble item 4024c and the behind-the-back dribble item 4024d is selected according to the gesture sensing result. The one-hand dribble item 4024a represents that the user 110 should execute a one-hand dribble action. The crossover dribble item 4024b represents that the user 110 should execute a crossover dribble action. The cross-leg dribble item 4024c represents that the user 110 should execute a cross-leg dribble action. The behind-the-back dribble item 4024d represents that the user 110 should execute a behind-the-back dribble action, thus allowing subsequent judgment of dribble posture and dribble stability.
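The items of the scenario setting parameter group 402 lend themselves to a plain-data representation. The following is a minimal sketch whose names are taken from the items above; the types and string values are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class PlayerTactical(Enum):
    ENABLE_TACTICAL = "enable"    # virtual players can move
    DISABLE_TACTICAL = "disable"  # virtual players stay stationary


class DefensivePlayerGenerating(Enum):
    ENABLE_DEFENSE = "enable"    # show virtual teammates and virtual defenders
    DISABLE_DEFENSE = "disable"  # show virtual teammates only


class TaskExecution(Enum):
    NUMBER_ADD = "number_add"      # numbers displayed above virtual teammates
    COLOR_CHANGE = "color_change"  # numbers displayed and one jersey recolored


class DribbleExecution(Enum):
    ONE_HAND = "one_hand"
    CROSSOVER = "crossover"
    CROSS_LEG = "cross_leg"
    BEHIND_THE_BACK = "behind_the_back"


@dataclass
class ScenarioSettingParameterGroup:
    player_tactical: PlayerTactical
    defensive_player_generating: DefensivePlayerGenerating
    task_execution: TaskExecution
    dribble_execution: DribbleExecution
```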
The task difficulty adjustment parameter 4025 represents that the degree of difficulty of the task is controlled by adjusting parameters. The adjustable parameters include the player tactical parameter 4021, the defensive player generating parameter 4022, the task execution parameter 4023, the dribble execution parameter 4024, a movement speed of the virtual player and a time limit for voice interaction, but the present disclosure is not limited thereto. It can be seen from the above that the scenario setting parameter group 402 can be displayed in the virtual task scenario image, and the combination of the scenario setting parameter group 402, the virtual reality and the selection action allows the user 110 to select the desired scenario parameters in the virtual task scenario image. In one embodiment, the virtual task scenario image correspondingly changes the color of a checkbox and the check content therein according to a position of a virtual hand of the user 110, thereby completing the selection of the scenario parameters. In addition, the degree of difficulty of the task can be set by a coach. For example, the coach utilizes a specific device (e.g., the MR/VR helmet, a mobile device or a tablet computer) to set the degree of difficulty of the task. The specific device and the computing server 400a can transmit the related information corresponding to the degree of difficulty of the task in the wireless or wired manner.
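The checkbox selection described above can be pictured as a simple hit test between the virtual hand position and each checkbox region; the 2D geometry and names below are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Checkbox:
    label: str          # e.g., "enable defense item"
    x: float            # top-left corner of the checkbox region
    y: float
    width: float
    height: float
    checked: bool = False

    def contains(self, hand_x: float, hand_y: float) -> bool:
        return (self.x <= hand_x <= self.x + self.width
                and self.y <= hand_y <= self.y + self.height)


def update_selection(checkboxes: List[Checkbox],
                     hand_x: float, hand_y: float) -> None:
    # Toggle (and, in a real renderer, recolor) the checkbox under the hand.
    for box in checkboxes:
        if box.contains(hand_x, hand_y):
            box.checked = not box.checked
```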
The computing server 400a includes a task scenario generating module 410, a speech recognition module 420, an action recognition module 430a and a task difficulty adjusting module 440. The task scenario generating module 410 generates the virtual task scenario image and a task parameter group according to the scenario setting parameter group 402, and transmits the virtual task scenario image to the head-mounted display device 200 for the user 110 to watch and then generate the speech signal and the action. The speech recognition module 420 receives the speech message and recognizes the speech message according to a speech recognition procedure to generate a speech recognition result. The speech recognition module 420 judges the task parameter group of the task scenario generating module 410 and the speech recognition result to generate a vision training result. In one embodiment, the speech recognition procedure can be Microsoft speech recognition software (Azure Cognitive Services), but the present disclosure is not limited thereto.
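How the speech recognition result is judged against the task parameter group is not spelled out in the disclosure; one plausible rule, sketched below under the assumption that the task parameter group carries the expected answer (e.g., the number above the relevant virtual teammate), is a normalized string comparison.

```python
def judge_vision_training(task_parameter_group: dict,
                          speech_recognition_result: str) -> bool:
    # Assumed key "expected_answer"; the disclosure does not name the fields.
    expected = str(task_parameter_group["expected_answer"])  # e.g., "3"
    spoken = speech_recognition_result.strip().lower()
    # Accept either the bare digit or a spoken English form, e.g., "three".
    number_words = {"0": "zero", "1": "one", "2": "two", "3": "three",
                    "4": "four", "5": "five", "6": "six", "7": "seven",
                    "8": "eight", "9": "nine"}
    return spoken == expected or spoken == number_words.get(expected, "")
```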
The action recognition module 430a receives the action message and recognizes the action message according to an action recognition procedure to generate an action recognition result. The action recognition module 430a judges the scenario setting parameter group 402 and the action recognition result to generate a sport training result. The vision training result and the sport training result are configured to judge whether the user 110 meets a training requirement. The action recognition procedure is realized by computer vision, signal processing and artificial intelligence technologies. In detail, the action recognition module 430a includes an inertial sensor-based action recognition module 432 and a vision-based action recognition module 434. The inertial sensor-based action recognition module 432 recognizes the inertial action message to generate an inertial action recognition result, and judges whether the inertial action recognition result is the same as or similar to the item selected by the user 110 from among the one-hand dribble item 4024a, the crossover dribble item 4024b, the cross-leg dribble item 4024c and the behind-the-back dribble item 4024d of the dribble execution parameter 4024 of the scenario setting parameter group 402 to generate a first sport training result. Moreover, the vision-based action recognition module 434 recognizes the vision-based action message to generate a vision-based action recognition result, and judges whether the vision-based action recognition result is the same as or similar to the selected item of the dribble execution parameter 4024 of the scenario setting parameter group 402 to generate a second sport training result. The sport training result includes the first sport training result and the second sport training result. The present disclosure can effectively improve the accuracy of recognition via this dual recognition of inertial sensor-based actions and vision-based actions.
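The dual judgment can be sketched as two independent comparisons against the selected dribble item. The similarity threshold below stands in for the "same as or similar to" criterion and is an assumption, as is the shape of each recognizer's output.

```python
from typing import Tuple


def judge_sport_training(selected_dribble_item: str,
                         inertial_result: Tuple[str, float],
                         vision_result: Tuple[str, float],
                         similarity_threshold: float = 0.8) -> Tuple[bool, bool]:
    # Each recognizer is assumed to return (predicted item, similarity score).
    first_ok = (inertial_result[0] == selected_dribble_item
                and inertial_result[1] >= similarity_threshold)
    second_ok = (vision_result[0] == selected_dribble_item
                 and vision_result[1] >= similarity_threshold)
    # The sport training result includes the first and second results.
    return first_ok, second_ok
```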
The task difficulty adjusting module 440 adjusts the selection of the enable tactical item 4021a and the disable tactical item 4021b of the player tactical parameter 4021, the enable defense item 4022a and the disable defense item 4022b of the defensive player generating parameter 4022, the number add item 4023a and the color change item 4023b of the task execution parameter 4023, and the one-hand dribble item 4024a, the crossover dribble item 4024b, the cross-leg dribble item 4024c and the behind-the-back dribble item 4024d of the dribble execution parameter 4024 according to the task difficulty adjustment parameter 4025, thereby generating tasks of different degrees of difficulty. For example, a high-difficulty task may correspond to the enable tactical item 4021a, the enable defense item 4022a, the number add item 4023a and/or the behind-the-back dribble item 4024d. A low-difficulty task may correspond to the disable tactical item 4021b, the disable defense item 4022b, the color change item 4023b and/or the one-hand dribble item 4024a.
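Following the high/low examples above, the task difficulty adjusting module 440 might map a requested difficulty to a set of item selections. The mapping below reproduces only the example pairings from the text; it is a sketch, not an exhaustive policy.

```python
def select_items_for_difficulty(difficulty: str) -> dict:
    # Pairings taken from the high/low difficulty examples above.
    if difficulty == "high":
        return {"player_tactical": "enable_tactical",
                "defensive_player_generating": "enable_defense",
                "task_execution": "number_add",
                "dribble_execution": "behind_the_back"}
    return {"player_tactical": "disable_tactical",
            "defensive_player_generating": "disable_defense",
            "task_execution": "color_change",
            "dribble_execution": "one_hand"}
```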
The computing server 400a includes a memory and a high-performance arithmetic processor for processing images. The memory can store the scenario setting parameter group 402, a plurality of virtual sport scenes, the speech recognition procedure and the action recognition procedure. The high-performance arithmetic processor for processing images is configured to process the MR image or the VR image (i.e., the virtual task scenario image) in real time, and can be, for example, a central processing unit (CPU) or a graphics processing unit (GPU). The computing server 400a can be a computer, a mobile device or another high-speed electronic computing device, but the present disclosure is not limited thereto. Therefore, the team sports vision training system 100a based on extended reality, voice interaction and action recognition of the present disclosure utilizes an extended reality helmet combined with voice interaction and action recognition technologies to effectively assist the user 110 (e.g., an athlete or a player) in conducting vision training, making it easier for the user 110 to grasp the movements of teammates on an ever-changing court, and then helping the team score and win. In addition, the present disclosure can realize individual training, thereby avoiding the problem of high manpower costs in training caused by the need for repeated practice by multiple people on a physical court in a conventional technique.
According to the aforementioned embodiments and examples, the advantages of the present disclosure are described as follows.
1. The team sports vision training system based on extended reality, voice interaction and action recognition and the method thereof of the present disclosure utilize an extended reality helmet combined with voice interaction and action recognition technologies to effectively assist the user in conducting vision training, making it easier for the user to grasp the movements of teammates on an ever-changing court and then helping the team score and win, so that the present disclosure can avoid the problem of high manpower costs in training caused by the need for repeated practice by multiple people on a physical court in a conventional technique.
2. The team sports vision training system based on extended reality, voice interaction and action recognition and the method thereof of the present disclosure allow the user to wear the extended reality helmet to conduct first-person tactical execution in a simulated situation, and can be combined with a simple action capture system (the inertial sensor or the vision-based sensor) to record the action of the user. When the user watches the virtual content to complete the vision training task, the action capture system recognizes the action of the user in real time and judges whether the user can conduct the specified dribble action synchronously, and then trains the dribble stability of the user.
Although the present disclosure has been described in considerable detail with reference to certain embodiments thereof, other embodiments are possible. Therefore, the spirit and scope of the appended claims should not be limited to the description of the embodiments contained herein.
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present disclosure without departing from the scope or spirit of the disclosure. In view of the foregoing, it is intended that the present disclosure cover modifications and variations of this disclosure provided they fall within the scope of the following claims.
Number | Date | Country | Kind
---|---|---|---
111126165 | Jul. 12, 2022 | TW | national