This patent application claims priority to Indian Provisional Patent Application No. 202011017127, filed on Apr. 21, 2020, the complete disclosure of which, in its entirety, is hereby incorporated by reference.
The embodiments herein relate to devices and systems for trajectory analysis, and more particularly to determining a trajectory of a ball.
Analytics of table tennis matches is still a nascent field, and performance analytics has hitherto been limited to simple metrics such as the count of shots played in a rally and simple ball placement heatmaps. Analytics based on such simple metrics has been performed primarily through manual annotation. Further, for players who enjoy games or are motivated by competition, there are no simple, non-intrusive, cost-effective ways to track their game in a normal table tennis environment without using special equipment. Traditional ball-tracking approaches rely on deep learning-based object localization. These approaches build on large deep neural network backbones and train them specifically to track a table tennis ball. This approach, however, requires that the ball be clearly visible and, more particularly, that the object in a frame not have any distortion. This in turn requires recording the videos at a high resolution as well as a high frame rate (i.e., frames per second). Such approaches are also computationally intensive and require Graphics Processing Units (GPUs).
The traditional approaches are difficult to implement when there are multiple moving objects in the background, or when noise exists due to people moving because of other matches happening in the background, such as in a local tournament or a coaching center. Another drawback of the traditional approaches is that they are developed for generic fast-moving object tracking; various parameters, such as a pixel-to-centimeter ratio and an expected shape of the fast-moving object, need to be fed into the system. This makes them practically unusable in real-life table tennis match scenarios.
Accordingly, there remains a need to address the aforementioned technical drawbacks using a system and method for determining a trajectory of a ball.
In view of the foregoing, an embodiment herein provides a method of determining a trajectory of a ball from a two-dimensional (2D) media-content using computer vision. The method includes one or more of (a) eliminating a background of a 2D media-content of a global area that is captured from an image capturing device to obtain a background-subtracted media-content that identifies moving objects in a foreground table tennis (TT) game, (b) locating one or more edges of a TT table associated with the foreground TT game from the background-subtracted media-content using computer vision methods that automatically extract and analyze useful information from a single image or a sequence of images associated with the 2D media-content, (c) determining a pixel-to-metric unit ratio between the 2D media-content and the TT table by superimposing the one or more edges with dimensions of the TT table, (d) identifying a contour information of the trajectory of the ball after the ball is serviced using computer vision methods, the contour information including a contour length, a thickness, end points, and a centroid of the trajectory, (e) determining a speed of the ball in the trajectory using the pixel-to-metric unit ratio and the contour information, (f) transforming the speed that is based on the 2D media-content into a three-dimensional speed, and (g) determining the trajectory of the ball in the global area using spline interpolation based on the contour information and the three-dimensional speed of the ball.
In some embodiments, a linear or polynomial regression model may be used for transforming the speed of the ball into the three-dimensional speed.
In some embodiments, the method includes localizing the ball immediately after the ball is serviced by performing a global search in the 2D media-content for a colored streak of an expected dimension and locating a region closer to a player associated with servicing the ball.
In some embodiments, identifying the contour information is optimized by (a) determining an expected trajectory of the ball using spline extrapolation based on the contour information and the three-dimensional speed of the ball, and (b) obtaining a region of interest (ROI) in the 2D media-content based on the expected trajectory to localize the global search, where dimensions of the region of interest are based on the three-dimensional speed of the ball.
In some embodiments, the method obtains a look-ahead ROI and a look-behind ROI that vary based on the three-dimensional speed and a direction of the ball. In some embodiments, the method includes (a) processing one or more frames obtained from an image search of the global area in a dense-layer neural network to obtain learned image transformations, and (b) classifying a shot-type executed by the player by processing the learned image transformations using a recurrent neural network.
In some embodiments, a heatmap of pitch locations is generated to visually analyze the foreground TT game, the heatmap including the three-dimensional speed of the ball, a placement of the ball into one or more zones in the TT table, and an indicator that classifies whether the placement resulted in a point win or a point loss.
In some embodiments, the method includes generating a top-down view of the pitch locations on a TT table by (a) interpolating the one or more coordinates of the 2D media-content, and (b) generating the top-down view using a perspective transformation.
In some embodiments, the top-down view is combined with the shot-type and the three-dimensional ball speed to enable placement analytics of the player.
In some embodiments, actions of the player in the foreground TT game are classified using a deep learning model that is based on face recognition and human pose models.
In some embodiments, the foreground TT game is localized in the 2D media-content using facial recognition of one or more players associated with the foreground TT game.
In another aspect, there is provided a system for determining a trajectory of a ball from a two-dimensional (2D) media-content using computer vision. The system includes an image capturing device that captures the 2D media-content of at least one TT table in a global area, and a trajectory tracking system that analyzes one or more frames of the 2D media-content associated with said at least one TT table in the global area and determines a trajectory of the ball, the trajectory tracking system being communicatively connected to the image capturing device. The trajectory tracking system includes (i) a memory that stores a database and a set of modules, and (ii) a device processor that executes said set of modules, where said set of modules includes (a) a background elimination module that eliminates a background of the 2D media-content to obtain a background-subtracted media-content that identifies moving objects in a foreground TT game, (b) a pixel-to-metric determination module that locates one or more edges of a TT table associated with the foreground TT game from the background-subtracted media-content using computer vision methods that automatically extract and analyze useful information from a single image or a sequence of images associated with the 2D media-content, and determines a pixel-to-metric unit ratio between the 2D media-content and the TT table by superimposing the one or more edges with dimensions of the TT table, (c) a contour information module that identifies a contour information of the trajectory of the ball after the ball is serviced using computer vision methods, the contour information including a contour length, a thickness, end points, and a centroid of the trajectory, (d) a speed determination module that determines a speed of the ball in the trajectory using the pixel-to-metric unit ratio and the contour information, and transforms the speed that is based on the 2D media-content into a three-dimensional speed, and (e) a trajectory determination module that determines the trajectory of the ball in the global area using spline interpolation based on the contour information and the three-dimensional speed of the ball.
In another aspect, there is provided one or more non-transitory computer-readable storage media storing one or more sequences of instructions which, when executed by a processor, cause the processor to perform a method of determining a trajectory of a ball from a two-dimensional (2D) media-content using computer vision, the method including (a) eliminating a background of a 2D media-content of a global area that is captured from an image capturing device to obtain a background-subtracted media-content that identifies moving objects in a foreground TT game, (b) locating one or more edges of a TT table associated with the foreground TT game from the background-subtracted media-content using computer vision methods that automatically extract and analyze useful information from a single image or a sequence of images associated with the 2D media-content, (c) determining a pixel-to-metric unit ratio between the 2D media-content and the TT table by superimposing the one or more edges with dimensions of the TT table, (d) identifying a contour information of the trajectory of the ball after the ball is serviced using computer vision methods, the contour information including a contour length, a thickness, end points, and a centroid of the trajectory, (e) determining a speed of the ball in the trajectory using the pixel-to-metric unit ratio and the contour information, (f) transforming the speed that is based on the 2D media-content into a three-dimensional speed, and (g) determining the trajectory of the ball in the global area using spline interpolation based on the contour information and the three-dimensional speed of the ball.
These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating preferred embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.
The embodiments herein will be better understood from the following detailed description with reference to the drawings, in which:
The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
There remains a need to address the aforementioned technical drawbacks using a system and method for determining a trajectory of a ball. Referring now to the drawings, and more particularly to
In an exemplary embodiment, various modules described herein and illustrated in the figures are embodied as hardware-enabled modules and may be configured as a plurality of overlapping or independent electronic circuits, devices, and discrete elements packaged onto a circuit board to provide data and signal processing functionality within a computer. An example might be a comparator, inverter, or flip-flop, which could include a plurality of transistors and other supporting devices and circuit elements. The modules that are configured with electronic circuits process computer logic instructions capable of providing at least one digital signal or analog signal for performing various functions as described herein.
The trajectory tracking server 112 locates a plurality of edges of a TT table (e.g., one of TT tables 104A-N) associated with the foreground TT game from the background-subtracted media-content using computer vision methods. Computer vision methods are concerned with the automatic extraction, analysis, and understanding of useful information from a single image or a sequence of images associated with the 2D media-content, and involve the development of a theoretical and algorithmic basis to achieve automatic visual understanding. Computer vision methods may include image classification, object detection, object tracking, semantic segmentation, or instance segmentation.
The trajectory tracking server 112 determines a pixel-to-metric unit ratio between the 2D media-content and the TT table 104A-N by superimposing the plurality of edges with dimensions of the TT table 104A-N. The pixel-to-metric unit ratio is a ratio between a pixel of the 2D media-content and a metric unit. The trajectory tracking server 112 identifies a contour information of the trajectory of the ball after the ball is serviced using computer vision methods. The contour information includes a contour length, a thickness, endpoints, and a centroid of the trajectory of the ball. The trajectory tracking server 112 determines a speed of the ball in the trajectory using the pixel-to-metric unit ratio and the contour information. The trajectory tracking server 112 transforms the speed that is based on the 2D media-content into a three-dimensional speed. The three-dimensional speed is a real-world speed of the ball in the global area 102. A linear or polynomial regression model may be used for transforming the speed of the ball into the three-dimensional speed. The trajectory tracking server 112 may determine the trajectory of the ball in the global area 102 using spline interpolation based on the contour information and the three-dimensional speed of the ball. The ball in the 2D media-content may be tracked based on the trajectory. The trajectory tracking server 112 may output the trajectory on the electronic display screen 114. The trajectory tracking server 112 may use computer-generated mathematical methods, computer vision methods, and a deep learning model, which may include artificial intelligence and machine learning, to track the ball from the 2D media-content. The trajectory tracking server 112 is configured to determine the trajectory of the ball based on the 2D media-content captured from the image capturing device 108 using computer vision methods and deep neural network models, and hence does not require a graphics processing unit (GPU), thereby reducing computer system requirements for the trajectory tracking server 112, as a GPU is an expensive computational unit. Optionally, the image capturing device 108 may be a general-purpose mobile camera or a closed-circuit television camera. The trajectory tracking server 112 may be able to begin tracking the ball automatically after the ball is serviced.
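By way of a non-limiting illustration, the overall processing loop of the trajectory tracking server 112 could be sketched as follows in Python using OpenCV and NumPy. The thresholds and the two helper functions named here are assumptions introduced only for illustration, not a prescribed implementation; similar illustrative sketches of those helpers appear alongside the corresponding steps below.

```python
# Illustrative sketch of the tracking loop (assumed libraries: OpenCV, NumPy).
# Thresholds and helper names are hypothetical; the embodiments do not prescribe them.
import cv2
import numpy as np

def track_ball(video_path, table_length_cm=274.0):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    back_sub = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=25)

    centroids = []          # per-frame ball centroids (pixels)
    px_per_cm = None        # pixel-to-metric unit ratio

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        fg_mask = back_sub.apply(frame)   # background-subtracted media-content
        fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))

        if px_per_cm is None:
            # Locate table edges once and superimpose the known table dimensions
            px_per_cm = estimate_pixel_to_metric(frame, table_length_cm)   # assumed helper

        # Contours of moving objects; the ball appears as a short colored streak
        contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        streak = select_ball_streak(contours, px_per_cm)                   # assumed helper
        if streak is not None:
            m = cv2.moments(streak)
            if m["m00"] > 0:
                centroids.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))

    cap.release()
    return centroids, px_per_cm, fps
```

The list of per-frame centroids, the pixel-to-metric unit ratio, and the frame rate returned by such a loop would then feed the speed estimation and spline interpolation steps described below.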
Optionally, a heatmap of pitch locations may be generated to visually analyze the foreground TT game. The heatmap may indicate the three-dimensional speed of the ball, a placement of the ball into a plurality of zones in the TT table 104A-N, and an indicator that classifies whether the placement resulted in a point win or a point loss.
In some embodiments, the trajectory tracking server 112 may utilize face recognition models and human pose models with the trajectory of the ball to track a game of interest in the global area 102 from the 2D media-content of the global area 102. Human pose models are computer models based on computer vision methods that detect and analyze human posture.
In some embodiments, actions of the player in the foreground TT game are classified using a deep learning model that is based on face recognition and human pose models.
In some embodiments, the foreground TT game is localized in the 2D media-content using facial recognition of one or more players 106A-N associated with the foreground TT game. In some embodiments, the method includes generating a top-down view of the pitch locations on the TT table 104A-N by (a) interpolating the plurality of coordinates of the 2D media-content, and (b) generating the top-down view using a perspective transformation.
In some embodiments, the top-down view is combined with the shot-type and the three-dimensional ball speed to enable placement analytics of the player.
In some embodiments, the trajectory of the ball may be used to enable performance analysis of the one or more players 106A-N. The performance analysis may be used to recommend coaching interventions.
To ensure that the correct ball is being tracked and that noise in the 2D media-content is eliminated, an expected breadth (thickness) of a colored streak caused by the ball may be determined by adopting a pixel-to-metric unit ratio. The pixel-to-metric determination module 208 locates a plurality of edges of a TT table 104A-N associated with the foreground TT game from the background-subtracted media-content using computer vision. The pixel-to-metric determination module 208 determines the pixel-to-metric unit ratio between the 2D media-content and the TT table 104A-N by superimposing the plurality of edges with dimensions of the TT table 104A-N. The pixel-to-metric unit ratio may be computed by using computer vision methods to first locate the plurality of edges of the TT table 104A-N, and then using knowledge of the real-world dimensions of the TT table 104A-N and the apparent length of the TT table in the grayscale video frame in pixels. In some embodiments, a set of regions and objects of interest in the grayscale video frame is identified. Regions and objects of interest that are falsely identified due to small noise from poor lighting or low contrast, due to large noise from movement of people, or as long horizontal contours caused by accidental movement or zooming of the image capturing device 108, are eliminated by computing contours on the grayscale background frame or the background-subtracted media-content.
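As a non-limiting illustration of this calibration step, the sketch below assumes a standard TT table length of 274 cm and derives the pixel-to-metric unit ratio (here expressed in pixels per centimeter) from the longest flat contour found after Canny edge detection; the contour-selection heuristic is an assumption made only for illustration.

```python
# Hypothetical sketch: derive the pixel-to-metric unit ratio from the table's apparent length.
# Assumes the table's long edge appears as the longest roughly-horizontal contour in the frame.
import cv2

TABLE_LENGTH_CM = 274.0   # standard TT table length; the width is 152.5 cm

def estimate_pixel_to_metric(frame, table_length_cm=TABLE_LENGTH_CM):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

    best_width_px = 0
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        # Keep long, flat contours (candidate table edges); discard small or tall noise
        if w > best_width_px and w > 5 * h:
            best_width_px = w

    if best_width_px == 0:
        return None
    # Pixels of the image that correspond to one real-world centimeter
    return best_width_px / table_length_cm
```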
After the ball is serviced by a player 106A, the ball is immediately localized by performing a global search in the 2D media-content for a colored streak of an expected dimension and locating a region closer to a player associated with servicing the ball. In an embodiment, the ball may be localized after 0.01 seconds of being serviced by the player 106A. In some embodiments, one or more players 106A-N may be present in the background-subtracted media-content. The global search is an action of performing a search for a colored streak of an expected dimension of the ball in the 2D media-content. The one or more players 106A-N may be localized in the background-subtracted media-content using a facial recognition model. For identifying a start of a service in the foreground TT game in the background-subtracted media-content, the global search may be combined with human pose models to ensure that the ball that is associated with the foreground TT game is localized and not some other ball in the 2D media-content.
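A minimal sketch of this global search is given below, assuming an orange ball of roughly 4 cm diameter whose HSV color range is used to isolate the streak and whose diameter, combined with the pixel-to-metric unit ratio, gives the expected streak breadth; the color thresholds and tolerance factors are illustrative assumptions.

```python
# Hypothetical global search for the ball's colored streak right after service.
# The HSV range below assumes an orange ball; a white ball would need a different range.
import cv2

def global_search(frame, px_per_cm, ball_diameter_cm=4.0):
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (5, 100, 100), (25, 255, 255))   # assumed orange range
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    expected_thickness_px = ball_diameter_cm * px_per_cm      # expected streak breadth
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        thickness = min(w, h)
        # A motion-blurred ball shows as a streak whose breadth matches the ball diameter
        if 0.5 * expected_thickness_px <= thickness <= 2.0 * expected_thickness_px:
            return c
    return None
```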
The contour information module 210 identifies a contour information of the trajectory of the ball after the ball is serviced using computer vision methods. The contour information includes a contour length, a thickness, endpoints, and a centroid of the trajectory. The speed determination module 212 determines a speed of the ball in the trajectory using the pixel-to-metric unit ratio and the contour information. In an embodiment, the speed determination module 212 determines the speed of the ball as shown below:
Speed of the ball in the 2-D plane = [contour length × pixel-to-metric unit ratio] / frequency of the 2D media-content.
The pixel-to-metric unit ratio is multiplied by the contour length, which is determined from the trajectory of the ball, and the result is divided by a value of the frequency of the 2D media-content. The speed is thus estimated in a 2-D plane.
The speed determination module 212 transforms the speed that is based on the 2D media-content into a three-dimensional speed. A linear or polynomial regression model may be used for transforming the speed of the ball into the three-dimensional speed. The three-dimensional speed is a real-world speed of the ball in the global area 102. In an embodiment, the linear or polynomial regression model may be based on speed radar measurements. An actual speed may be estimated as shown below:
three-dimensional speed = speed of the ball in the 2-D plane × regression coefficient + beta,
where the regression coefficient and the intercept beta are outputs of the linear regression model.
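A hedged sketch of the speed estimate and the regression-based 2-D to 3-D transformation follows. It interprets the "frequency of the 2D media-content" as the frame interval (the time between consecutive frames), so that dividing the streak distance by it yields a speed, and it fits the regression coefficient and intercept beta from assumed radar-measured reference speeds using NumPy; the calibration data and units are assumptions for illustration.

```python
# Hypothetical sketch of the 2-D speed estimate and its regression-based 3-D correction.
# Assumes radar-measured ground-truth speeds are available for fitting the regression.
import numpy as np

def speed_2d(contour_length_px, px_per_cm, frame_interval_s):
    # The streak length is the distance covered within one frame interval, so dividing
    # that distance (in centimeters) by the interval yields the apparent 2-D speed.
    distance_cm = contour_length_px / px_per_cm
    return distance_cm / frame_interval_s        # cm per second in the image plane

def fit_speed_regressor(speeds_2d, radar_speeds):
    # Linear regression: three-dimensional speed = coefficient * 2-D speed + beta
    coefficient, beta = np.polyfit(np.asarray(speeds_2d), np.asarray(radar_speeds), deg=1)
    return coefficient, beta

def speed_3d(speed_2d_value, coefficient, beta):
    return coefficient * speed_2d_value + beta
```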
The trajectory determination module 214 determines the trajectory of the ball in the global area 102 using spline interpolation based on the contour information and the three-dimensional speed of the ball.
One or more parameters associated with the trajectory of the ball may be normalized or otherwise modified to allow for comparisons, such as comparisons between a plurality of game sessions or between the one or more players 106A-N. Factors that may be considered in a normalization process include, but are not limited to, ambient conditions such as wind speed, temperature, and humidity; physical characteristics of the one or more players 106A-N such as sex, weight, age, and height; and a skill level of the one or more players 106A-N.
An expected trajectory of the ball may be determined using spline extrapolation based on the contour information and the three-dimensional speed of the ball. In a period of time when the ball is lost in play or is occluded due to movement of the one or more players 106A-N or people in the global area 102, identification of the contour information may be optimized by obtaining a region of interest (ROI) in the 2D media-content based on the expected trajectory to localize the global search, where dimensions of the region of interest are based on the three-dimensional speed of the ball. The ROI is a region in the 2D media-content that enables the trajectory tracking server 112 to localize the global search. If the ball is lost in play or is occluded due to movement of the one or more players 106A-N or people, the global search is re-initiated. In some embodiments, the ROI may be based on the three-dimensional speed of the ball. In some embodiments, a look-ahead ROI and a look-behind ROI that vary based on the three-dimensional speed and a direction of the ball may be obtained. The look-ahead ROI is a region in the 2D media-content where the ball is estimated to travel next in the trajectory. The look-behind ROI is a region in the 2D media-content where the ball is estimated to be returned. Optionally, both the look-ahead ROI and the look-behind ROI for the ball may be obtained to determine the trajectory of the ball when the ball is being returned. The ROI may have both a directional and a speed-based variation.
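The look-ahead search could be sketched as below, assuming SciPy's spline interpolation for the extrapolated position and an illustrative speed-dependent window size; the scaling factor and minimum window are assumptions, not prescribed values.

```python
# Hypothetical sketch of the spline-based look-ahead ROI used to localize the search
# when the ball is briefly occluded. Window sizing is an illustrative assumption.
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

def look_ahead_roi(centroids, speed_3d_cm_s, px_per_cm, frames_ahead=1):
    # Fit one spline per image coordinate against the frame index, then extrapolate forward.
    t = np.arange(len(centroids))
    xs = np.array([c[0] for c in centroids])
    ys = np.array([c[1] for c in centroids])
    k = min(3, len(centroids) - 1)                  # spline order limited by sample count
    sx = InterpolatedUnivariateSpline(t, xs, k=k)
    sy = InterpolatedUnivariateSpline(t, ys, k=k)

    t_next = len(centroids) - 1 + frames_ahead
    cx, cy = float(sx(t_next)), float(sy(t_next))   # expected ball position (extrapolated)

    # ROI grows with ball speed: a faster ball needs a wider search window.
    half_size = max(20, int(0.02 * speed_3d_cm_s * px_per_cm))
    return (int(cx - half_size), int(cy - half_size),
            int(cx + half_size), int(cy + half_size))
```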
Actions of the one or more players 106A-N in the foreground TT game may be classified using the deep learning model. The deep learning model may be based on face recognition and human pose models.
In some embodiments, a plurality of key-points on a body of the one or more players 106A-N may be determined using human pose estimation models or a convolutional neural network. The human pose estimation models may provide human pose tracking by using a detector (not shown) that employs detection algorithms that identify, map, and localize human joints to locate a pose region-of-interest (PROI) within the 2D media-content, and by employing machine learning (ML) to infer 33 key-points and 2D landmarks of the body from a single frame of the 2D media-content. In an embodiment, the human pose estimation models are run only on a first frame of the 2D media-content. For subsequent frames of the 2D media-content, the PROI may be determined based on the plurality of key-points of the first frame.
The plurality of key-points may be provided as an input to an encoder network (not shown) for creating a latent space, from which inputs are fed into a long short-term memory (LSTM) network (not shown) to classify actions of the player in the foreground TT game. An LSTM is an artificial recurrent neural network having feedback connections. In an embodiment, a deep learning model combining the encoder network and the LSTM network may be trained on video data associated with the 2D media-content to classify the actions into the following classes, including but not limited to (a) Player(s) idle, (b) Player(s) ready to serve, (c) Service, (d) Backhand shot, and (e) Forehand shot.
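One possible realization of this encoder-plus-LSTM classifier is sketched below in PyTorch. The use of 33 two-dimensional key-points per frame and the five action classes follow the description above, while the layer widths and all other details are assumptions for illustration only.

```python
# Hypothetical PyTorch sketch of the encoder + LSTM action classifier described above.
import torch
import torch.nn as nn

ACTIONS = ["idle", "ready_to_serve", "service", "backhand_shot", "forehand_shot"]

class PoseActionClassifier(nn.Module):
    def __init__(self, num_keypoints=33, latent_dim=64, hidden_dim=128):
        super().__init__()
        # Encoder maps the 33 (x, y) key-points of one frame into a latent vector
        self.encoder = nn.Sequential(
            nn.Linear(num_keypoints * 2, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # LSTM consumes the per-frame latent vectors across the clip
        self.lstm = nn.LSTM(latent_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, len(ACTIONS))

    def forward(self, keypoints):               # keypoints: (batch, frames, 33 * 2)
        latents = self.encoder(keypoints)        # (batch, frames, latent_dim)
        _, (h_n, _) = self.lstm(latents)         # final hidden state summarizes the sequence
        return self.head(h_n[-1])                # (batch, num_actions) class scores
```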
Optionally, the backhand shot may be sub-classified into a further 8 shot-types by the deep learning model, and the forehand shot may be sub-classified into a further 8 shot-types by the deep learning model, which processes the plurality of frames obtained from an image search of the global area 102 in a dense-layer neural network to obtain a learned image transformation and classifies a shot-type executed by the player by processing the learned image transformations using a recurrent neural network. The player action tracking module 302 may produce the shot-type information as output. The shot-type information may include the speed of the ball determined by the speed determination module 212 and the trajectory of the ball determined by the trajectory determination module 214.
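A comparable, purely illustrative sketch of the dense-layer plus recurrent shot-type classifier is given below; the per-frame feature dimension is an assumption, while the 16 sub-type output count (8 backhand plus 8 forehand) reflects the text above.

```python
# Hypothetical sketch of the dense-layer + recurrent shot-type classifier.
import torch
import torch.nn as nn

class ShotTypeClassifier(nn.Module):
    def __init__(self, frame_feature_dim=2048, num_shot_types=16):
        super().__init__()
        # Dense layer producing the "learned image transformation" for each frame
        self.dense = nn.Sequential(nn.Linear(frame_feature_dim, 256), nn.ReLU())
        # Recurrent network classifies the shot from the sequence of transformations
        self.rnn = nn.GRU(256, 128, batch_first=True)
        self.head = nn.Linear(128, num_shot_types)

    def forward(self, frame_features):           # (batch, frames, frame_feature_dim)
        transformed = self.dense(frame_features)
        _, h_n = self.rnn(transformed)
        return self.head(h_n[-1])                 # (batch, num_shot_types) class scores
```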
In an embodiment, a spin of the ball may also be determined using computer vision and included in the shot-type information. The player action tracking module 302 may be designed or configured to provide feedback and store the shot-type information, which may include an impact position height of a shot, a shot velocity or spin velocity as the ball leaves a racquet associated with the player 106A, a consistency of a shot parameter (such as racquet head speed), and shot results (such as in or out).
A surface of the TT table 104A where the ball is placed by the one or more players 106A-N may be divided into one or more zones that may be identified through user input such as professional expertise in TT coaching. An illustration of the surface 600 divided into one or more zones is illustrated in
The view production module 510 generates a top-down view of the pitch locations on a TT table 104A-N by interpolating the one or more coordinates of the 2D media-content and generating the top-down view using a perspective transformation. In an embodiment, the precise location of pitching in a sideways view of the TT table 104A-N that is obtained from the 2D media-content may be converted to a top-down view using perspective transformation. The view production module 510 may combine the top-down view with the shot-type and the three-dimensional ball speed to enable placement analytics of the player. The top-down view may include ball pitch locations as well as a total count of placements represented in each of the plurality of zones. The ball pitch locations and the total count of placements may further be aggregated at a point, a set, and a match level of the foreground TT game. Along with the ball pitch locations and the total count of placements, a representation of the placement associated with a shot executed by a player 106C may also include the speed of the ball in that particular placement and a marker that indicates whether or not the placement resulted in a point win or loss for the player 106C associated with the shot. The representation enables analysis of the placement performance of the player 106C and tailoring of specific coaching interventions for the player 106C. An exemplary view of placement analytics of the one or more players 106A-N associated with the foreground TT game generated by the system 500 is described in
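As a non-limiting sketch, the perspective transformation and per-zone aggregation could be implemented with OpenCV as shown below; the four image-plane table corners are assumed to come from the earlier edge localization, and the 3 × 2 zone grid is only one possible division of the table surface.

```python
# Hypothetical sketch of the top-down projection of pitch locations and zone counts.
import cv2
import numpy as np

def pitch_locations_top_down(image_corners, pitch_points_img,
                             table_cm=(274.0, 152.5), zones=(3, 2)):
    # Map the four image-plane corners of the table to a metric top-down template
    dst_corners = np.float32([[0, 0], [table_cm[0], 0],
                              [table_cm[0], table_cm[1]], [0, table_cm[1]]])
    H = cv2.getPerspectiveTransform(np.float32(image_corners), dst_corners)

    pts = np.float32(pitch_points_img).reshape(-1, 1, 2)
    top_down = cv2.perspectiveTransform(pts, H).reshape(-1, 2)

    # Aggregate a placement count per zone for the heatmap / placement analytics
    counts = np.zeros(zones, dtype=int)
    for x, y in top_down:
        zx = min(max(int(x / (table_cm[0] / zones[0])), 0), zones[0] - 1)
        zy = min(max(int(y / (table_cm[1] / zones[1])), 0), zones[1] - 1)
        counts[zx, zy] += 1
    return top_down, counts
```

In such a sketch, the per-zone counts would be augmented with the three-dimensional ball speed and a win/loss marker for each placement to produce the representation described above.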
The game analytics module 502 enables visually tracking a game of interest in the 2D media-content that is captured from the image capturing device 108 using the heatmap of pitch locations. Visually tracking the game of interest enables accurate comprehension of the weaknesses and strengths of the one or more players 106A-N associated with the game of interest. For example, a player 106D may be weak in picking up short placements, or weak in responding to long placements as returns due to footwork and not coming into the right position.
The embodiments herein may include a computer program product configured to include a pre-configured set of instructions, which when performed, can result in actions as stated in conjunction with the methods described above. In an example, the pre-configured set of instructions can be stored on a tangible non-transitory computer readable medium or a program storage device. In an example, the tangible non-transitory computer readable medium can be configured to include the set of instructions, which when performed by a device, can cause the device to perform acts similar to the ones described here. Embodiments herein may also include tangible and/or non-transitory computer-readable storage media for carrying or having computer executable instructions or data structures stored thereon.
Generally, program modules utilized herein include routines, programs, components, data structures, objects, and the functions inherent in the design of special-purpose processors, etc. that perform particular tasks or implement particular abstract data types. Computer executable instructions, associated data structures, and program modules represent examples of the program code means for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.
The embodiments herein can include both hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc.
A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
Input/output (I/O) devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters.
A representative hardware environment for practicing the embodiments herein is depicted in
The system is able to determine the trajectory of the ball based on the 2D media-content captured from the image capturing device 108 without requiring a graphics processing unit (GPU). Optionally, the image capturing device 108 may be a low-end mobile camera, as the 2D media-content may be generated from a camera recording at a low resolution; the system does not require the image capturing device 108 to be a high-end camera that captures media content in high definition. Further, the requirement of manual inputs associated with characteristics of the ball is minimized. Accordingly, the system may be able to track the ball automatically.
The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the appended claims.
Number | Date | Country | Kind
---|---|---|---
202011017127 | Apr. 21, 2020 | IN | national