Multiple cameras are used to capture activity in a scene. Subsequent processing of the captured images enables end users to view the scene and move throughout the scene in over a full 360-degree range of motion. For example, multiple cameras may be used to capture a sports game and end users can move throughout the field of play freely. The end user may also view the game from a virtual camera.
The same numbers are used throughout the disclosure and the figures to reference like components and features. Numbers in the 100 series refer to features originally found in
Sporting events and other competitions are often broadcast for the entertainment of end users. These games may be rendered in a variety of formats. For example, a game can be rendered as a two-dimensional video or a three-dimensional video. The games may be captured using one or more high-resolution cameras positioned around an entire field of play. The plurality of cameras may capture an entire three-dimensional volumetric space, including the field of play. In embodiments, the camera system may include multiple super high-resolution cameras for volumetric capture. The end users can view the action of the game and move through the captured volume freely by being presented with a sequence of images representing the three-dimensional volumetric space. Additionally, an end user can view the game from a virtual camera that follows the action within the field by following the ball or a specific player in the three-dimensional volumetric space.
The present techniques enable jersey number recognition in a multiple camera system. In embodiments, providing an immersive media experience for an end user may be based, in part, on identifying the jersey number, team identity, and player location for each player in real time. The stable and highly accurate jersey number recognition system according to the present techniques can extract small jersey numbers (or other indicators/identifiers) on the body of a player even during constant movement by the player. For example, in a video at a 4K resolution, player jersey numbers are a very small portion of each captured image frame. Furthermore, a player's body posture changes drastically during the video, which causes deformation of the jersey number image or the indicator image. This deformation negatively impacts jersey number recognition accuracy. Additionally, when the player is oriented in a semi-profile position and is wearing a double-digit jersey number, it is very likely that only one digit of the jersey number is visible. This causes jersey number recognition results that are unreliable and error-prone. Often, conventional techniques recognize the player's jersey number only when the jersey number is clearly visible, which is not generally applicable in a single camera system. Therefore, the present techniques enable a multiple camera jersey number recognition solution to address all these challenges. In this manner, an immersive media experience is provided to end users in real-time.
As used herein, a game may refer to a form of play according to a set of rules. The game may be played for recreation, entertainment, or achievement. A competitive game may be referred to as a sport, sporting event, or competition. Accordingly, a sport may also be a form of competitive physical activity. The game may have an audience of spectators that observe the game. The spectators may be referred to as end-users when the spectators observe the game via an electronic device, as opposed to viewing the game live and in person. The game may be competitive in nature and organized such that opposing individuals or teams compete to win. A win refers to a first individual or first team being recognized as triumphing over other individuals or teams. A win may also result in an individual or team meeting or securing an achievement. Often, the game is played on a field, court, within an arena, or some other area designated for game play. The area designated for game play typically includes markings, goal posts, nets, and the like to facilitate game play.
A game may be organized as any number of individuals configured in an opposing fashion and competing to win. A team sport is a game where a plurality of individuals is organized into opposing teams. The individuals may be generally referred to as players. The opposing teams may compete to win. Often, the competition includes each player making a strategic movement to successfully overcome one or more players to meet a game objective. An example of a team sport is football.
Generally, football describes a family of games where a ball is kicked at various times to ultimately score a goal. Football may include, for example, association football, gridiron football, and rugby football. American football may be a variation of gridiron football. In embodiments, the American football described herein may be as played according to the rules and regulations of the National Football League (NFL). While American football is described, the present techniques may apply to any event where an individual makes strategic movements within a defined space. In embodiments, a strategic movement may be referred to as a trajectory. An end user can be immersed in a rendering of the event based on this trajectory according to the techniques described herein. In particular, the present techniques enable the identification of all the players in the field of play by deriving the corresponding jersey and team information. Again, for ease of description, the present techniques are described using an American football game as an example. However, any game, sport, sporting event, or competition may be used according to the present techniques. For example, the game types may include major sports such as basketball, baseball, hockey, lacrosse, and the like.
At block 102, a camera system 102 is to capture a field of play. In embodiments, the camera system may include one or more physical cameras with a 5120×3072 resolution, configured throughout a stadium to capture the field of play. For example, the number of cameras in the camera system may be thirty-eight. Although particular camera resolutions are described, any camera resolution may be used according to the present techniques. A subset of cameras may be selected, such as eighteen cameras from among the thirty-eight cameras, to cover the entire field of play and ensure that each pixel in the field of play is captured by at least three cameras. The camera system 102 captures a real-time video stream from a plurality of cameras. The plurality of cameras may capture the field of play at 30 frames per second (fps). The subset of cameras selected may be different in different scenarios. For example, depending on the structure surrounding the field of play, each location may be captured by at least three cameras using a smaller or larger subset of cameras. Thus, in embodiments, the number of cameras used in the camera system is calculated by determining the number of cameras needed to capture each point within the field of play with at least three cameras.
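The three-camera coverage requirement described above can be framed as a set-cover-style selection problem. The following is a minimal, hypothetical sketch: the greedy strategy and the `coverage` input format (camera id mapped to the set of field grid points it sees) are assumptions for illustration, not the selection method of the present techniques.

```python
def select_camera_subset(coverage, min_views=3):
    """Greedily pick cameras until every field point is seen by at
    least `min_views` cameras, or no remaining camera helps."""
    # how many more views each point still needs
    remaining = {p: min_views for pts in coverage.values() for p in pts}
    selected = []
    cameras = dict(coverage)
    while any(n > 0 for n in remaining.values()) and cameras:
        # camera that satisfies the most outstanding coverage needs
        best = max(cameras,
                   key=lambda c: sum(1 for p in cameras[c] if remaining[p] > 0))
        if sum(1 for p in cameras[best] if remaining[p] > 0) == 0:
            break  # no camera reduces the deficit further
        for p in cameras.pop(best):
            if remaining[p] > 0:
                remaining[p] -= 1
        selected.append(best)
    return selected
```

In practice the coverage sets would come from projecting the field-of-play geometry through each calibrated camera's frustum; the sketch only shows the counting logic.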
At block 104, multiple camera player detection is executed to determine isolated bounding boxes surrounding each player in each camera view captured by the camera system 202. The multiple camera player detection module detects and associates a player from multiple cameras and outputs player orientation labels. In embodiments, bounding boxes of a player in each camera view captured by a camera system may be determined. In particular, player detection is performed for each camera view. A person detection algorithm based on a you only look once (YOLO) approach in a multiple camera framework may be executed for each frame captured by a camera. The person detection algorithm is executed to detect all the players in the field of play.
The bounding boxes derived for each player from each camera of the camera system may be used as input for single view jersey number recognition. In particular, single view jersey number recognition uses a pre-designed template to crop a player detection image, followed by a lightweight but powerful feature extraction and classification network. Accordingly, at block 106, single view jersey number recognition is executed. The single view jersey number recognition as described herein includes pre-processing, feature extraction, feature matching, and hard non-maximum suppression. As illustrated at block 110, a single view jersey number recognition process takes as input the detected non-profile player images as defined by a bounding box. At block 112, features are extracted from the detected non-profile player images. At block 114, a you only look once (YOLO) regression is applied to the extracted features. Finally, at block 116, a hard non-maximum suppression (NMS) algorithm is applied to the features. In particular, a hard NMS algorithm is executed within single camera jersey number recognition to handle double digit number failure cases. The single view jersey number recognition technique at block 106 may take as input detected non-profile player images from block 104 and extract jersey numbers from each image.
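The four stages of the single view recognition process can be pictured as a simple pipeline. The sketch below is illustrative only; the four callables are hypothetical stand-ins for the pre-processing, CNN feature extraction, YOLO-style regression, and hard-NMS steps, whose internals are described later in this disclosure.

```python
def single_view_jersey_recognition(player_crop, preprocess, extract_features,
                                   regress_boxes, hard_nms):
    """Orchestrate the four single-view stages; each callable is a
    hypothetical stand-in for the corresponding processing step."""
    padded = preprocess(player_crop)        # pad the crop to a square template
    features = extract_features(padded)     # CNN feature maps
    candidates = regress_boxes(features)    # candidate (label, score, ...) tuples
    return hard_nms(candidates)             # suppress overlapping digit boxes
```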
At block 108, a voting policy is implemented to select a final jersey number. As described herein, the voting policy is implemented to improve the multiple camera jersey number recognition stability, and generates the final jersey number from all single camera jersey number recognition results. As illustrated by
Specifically, orientation detection is incorporated in the jersey recognition, which accounts for the importance of jersey number location. An orientation attribute is defined that can be used as input for a single camera player recognition process. The present techniques also include a light-weight convolutional neural network (CNN) to efficiently leverage both the high-level and low-level semantic features extracted from an image of a player. These features include, but are not limited to, words, symbols, phrases, and the like. A hard-NMS may be executed to eliminate the single digit and double digit confusion that can occur according to player orientation. A multiple camera voting policy is used to fuse and infer the final jersey number result with high accuracy. Thus, the present techniques enable real-time, stable, and highly accurate player jersey number recognition. The player jersey recognition may be further used to create engaging live broadcasts and analysis of a game in real-time.
As illustrated in the example of
The field of play 200 includes end zones at each end of the field of play. During play, a first team is designated as the offense, and a second team is designated as the defense. The ball used during play is an oval or prolate spheroid. The offense controls the ball, while the defense is without control of the ball. The offense attempts to advance the ball down the length of the rectangular field by running or passing the ball while the defense simultaneously attempts to prevent the offense from advancing the ball down the length of the field. The defense may also attempt to take control of the ball. Generally, to begin a round of play opposing teams line up in a particular format. A round of play may be referred to as a down. During each down, the offense is given an opportunity to execute a play to advance down the field. To begin a play, the offense and defense line up along a line of scrimmage according to various schemes. For example, an offense will line up in a formation in an attempt to overcome the defense and advance the ball toward the goal line. If the offense can advance the ball past the goal line and into the end zone, the offense will score a touchdown and is awarded points. The offense is also given a try to obtain points after the touchdown.
An American football game is about four hours in duration including all breaks where no gameplay occurs. In some cases, about half of the four hours includes active gameplay, while the other half is some sort of break. As used herein, a break may refer to team timeouts, official timeouts, commercial timeouts, halftime, time during transition after a turnover, and the like. The game may begin with a kickoff, where the kicking team kicks the ball to the receiving team. During the kickoff, the team who will be considered the offense after the kickoff is the receiving team, while the kicking team is typically considered the defense. After the kickoff, the offense must advance the ball at least ten yards downfield within four downs, or the offense turns the football over to the defense. If the offense succeeds in advancing the ball ten yards or more, a new set of four downs is given to the offense to use in advancing the ball another ten yards. Generally, points are given to the team that advances the ball into the opposing team's end zone or kicks the ball through the goal posts of the opposing team. The team with the most points at the end of a game wins. There are also a number of special plays that may be executed during a down, including but not limited to, punts, field goals, and extra point attempts.
Each team may include a plurality of players. The players that belong to a same team generally wear the same colors for uniforms during game play. To distinguish players of the same team, each player may have an identifier that is unique among players of the same team. For example, in American football an identifier is a number worn on the uniform of the player. The number often is found on a jersey worn by the player, and is typically found on the front and back of the jersey. Accordingly, the identifier may be referred to as a jersey number. In some cases, the identifier is also found on the helmet, shoulders, pants, or shoes worn by the player.
Multiple calibrated cameras may be deployed in the stadium 202 to capture high-resolution images of the field 200. The images may be processed via segmentation and three-dimensional (3D) reconstruction to create a 3D volumetric model. In embodiments, a subset of cameras from a set of all available cameras may be selected for image capture, such as eighteen cameras from among the thirty-six cameras as illustrated in
By capturing the game on a field of play with multiple cameras, an immersive viewing experience may be generated for an end user. In embodiments, based on the player trajectory, an immersive media experience may be provided. In some cases, the immersive media experience is provided in real-time. Alternatively, the immersive media experience may be a replay of a previously captured game. In the immersive media experience, an end user can follow the ball and players with a full 360-degree freedom of movement within the field of play. In embodiments, the present techniques enable a virtual camera that follows the player to generate volumetric video.
In embodiments, the present techniques may enable tracking of all players or individuals during a game or an event. The tracking of a player may be based, at least in part, on identifying the player across multiple camera views, wherein each camera of the camera system corresponds to a camera view. The present techniques enable an identification of a player in each camera view based on a number or other identifier worn on the body of the player. Moreover, the present techniques enable an optical solution to track each player, including when players are substituted between downs, according to jersey recognition via a single camera.
The diagram of
In embodiments, if the player's body orientation is nearly parallel to the image plane of the camera view, the jersey number is likely clearly visible. When the identifier or jersey number is clearly visible, the player may be classified as a non-profile player (NP). Otherwise, the player is classified as a profile player (P). In embodiments, the profile player may be oriented such that a substantially side view of the player is captured in a particular camera view. In this side view, the identifier worn by the player is not visible. By contrast, a non-profile player is not oriented such that a side view of the player is captured. In the capture of the non-profile player, the identifier worn by the player is visible.
In embodiments, an identifier may be considered visible in a camera view when a plane of the identifier is substantially parallel with the image plane of the camera view. The plane of the identifier refers to the plane where most of the identifier is visible when worn on the uniform of the player. As used herein, the plane of the identifier is substantially parallel with the image plane of the camera view when an angle between the plane of the identifier and the image plane is less than approximately sixty-seven degrees. Note that in the example of a football player, the jersey number may be distorted or otherwise not smooth as applied to the jersey worn on the body of the player, even when the plane of the identifier (jersey number) is substantially parallel with the image plane of the camera. This is due to padding and body shape causing some stretching, deforming, or folding of the number as it is worn on the player. However, the present techniques enable the determination of the identifier even when it is stretched, deformed, or otherwise distorted.
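The sixty-seven degree visibility criterion described above can be sketched as an angle check between the normal of the identifier's plane and the camera's optical axis (the angle between two planes equals the angle between their normals). The vector representation and the use of plane normals here are assumptions made for illustration.

```python
import math

def identifier_visible(identifier_normal, camera_axis, threshold_deg=67.0):
    """Return True when the angle between the identifier plane and the
    image plane is under the threshold. Inputs are 3-vectors (tuples);
    the 67-degree default is the exemplary value from the text."""
    dot = sum(a * b for a, b in zip(identifier_normal, camera_axis))
    na = math.sqrt(sum(a * a for a in identifier_normal))
    nb = math.sqrt(sum(b * b for b in camera_axis))
    # abs() folds front/back facing into one angle in [0, 90] degrees
    cos_angle = max(-1.0, min(1.0, abs(dot) / (na * nb)))
    return math.degrees(math.acos(cos_angle)) < threshold_deg
```

A number squarely facing the camera (normals aligned) passes the check, while a pure side view (normals perpendicular) fails it.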
An identifier should be substantially visible in a camera view in order to recognize the identifier. As described above, the identifier is substantially visible for non-profile players and the identifier is not substantially visible for profile players. Accordingly, images of a player where the orientation of that player is a non-profile player orientation are used for jersey number recognition. Images where a player is oriented as a profile player in a camera view are not used for jersey number recognition. In embodiments, the players are detected according to player detection techniques and classified as a non-profile player or a profile player for each frame of each camera view based on the orientation of the player. The player's orientation changes from frame to frame in each camera view. The detected player in each frame of each camera view may be used for single camera jersey recognition. As described below, the present techniques can ensure detection of double-digit jersey numbers as two digits as opposed to a single digit. Additionally, the present techniques avoid additional computational cost by not attempting single camera jersey recognition on profile players. Conventional techniques may misrepresent double digit jersey numbers as single digit jersey numbers due to occlusion. Further, conventional techniques incur additional computational costs when processing all detected players.
In embodiments, for each view, the entire field of play including multiple players is captured by each of the cameras. A person detection algorithm based on a you only look once (YOLO) approach is executed to detect all the players in the field of play. An association of the bounding boxes of an identical player between frames in each camera view is found. Thus, bounding boxes identifying the player with jersey number 55 are found in each camera view as captured by cameras C03, C07, C11, C14, C20, C24, C27, C29, and C32. For each camera view 404, 406, 408, 410, 412, 414, 416, 418, and 420, each detected player is assigned a unique track ID with respect to each camera. Each bounding box may be described by a location of the bounding box within the image according to xy coordinates. The width (w) and the height (h) of the bounding box are also given.
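A per-camera detection record of the kind described above (xy location, width, height, and a per-camera track ID) might be represented as follows; the field names are illustrative stand-ins, not taken from the source.

```python
from dataclasses import dataclass

@dataclass
class PlayerDetection:
    """One detected player in one camera view: bounding box in image
    xy coordinates plus width/height, and the per-camera track ID."""
    camera_id: str   # e.g. "C03"
    track_id: int    # unique with respect to this camera
    x: float         # top-left x of the bounding box
    y: float         # top-left y of the bounding box
    w: float         # bounding box width
    h: float         # bounding box height

    def area(self) -> float:
        """Pixel area of the bounding box."""
        return self.w * self.h
```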
As illustrated in the example of
In this manner, body orientation is used along with the position and size to describe a person/player. In
The diagram of
After obtaining player detection results for all cameras, jersey number recognition can be executed for all non-profile players. As illustrated in
The diagram of
To determine a precise jersey number location, a convolutional neural network may be used. In particular, the present techniques enable an end-to-end detection and classification approach to jersey number recognition, where each number is assigned to a unique object category. For instance, in an American football game, there are 99 possible jersey numbers, resulting in 99 classification categories ranging from 1 to 99, where each category represents one unique number. Note that the jersey number is a player identifier. The present techniques may apply to other player identifiers with a greater or fewer number of possible classification categories.
In embodiments, in preparation for processing by the convolutional neural network, the bounding box for each detected player is padded to correspond to an input size of the CNN. The bounding boxes as obtained from the player detection results will likely vary in size and aspect ratio, as a player's body posture changes drastically during game play. By padding the bounding boxes, detection results are not resized. Put another way, the cropped image is not resized or resampled, nor is the resolution of the image changed. Instead, as illustrated in
At block 602, each bounding box of a camera view is cropped according to the size of the player detection bounding box. In embodiments, the player image is cropped according to the player detection bounding box, and then the maximum of the height and width of the bounding boxes in this camera view is used as the square template length for this camera view. Accordingly, the padding described herein uses the maximum height and/or width of the bounding boxes for a current view as the square template length. At block 604, the bounding boxes/cropped images that are smaller than the template size are padded by placing the cropped image into the middle of the template and padding the remainder of the template with nonce values to achieve a same image size for each detected player. At block 608, each padded image is then resized for input into a convolutional neural network 610 for feature extraction. Directly resizing the cropped image would change the aspect ratio of the jersey number. By padding the image as described herein, the aspect ratio of the jersey number remains the same, with no deformation. Accordingly, padding the image avoids deformation and significantly improves the jersey number recognition accuracy.
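The pad-to-square step at blocks 602 and 604 can be sketched with NumPy as follows. Treating zero as the nonce fill value, and an H x W x C uint8 crop as the input format, are assumptions for illustration.

```python
import numpy as np

def pad_to_square_template(crop, template_len, fill=0):
    """Center `crop` inside a square template of side `template_len`
    (the max bounding-box dimension for this camera view), filling the
    border with a nonce value. The crop itself is never resized, so
    the jersey number's aspect ratio is preserved."""
    h, w = crop.shape[:2]
    template = np.full((template_len, template_len) + crop.shape[2:],
                       fill, dtype=crop.dtype)
    top = (template_len - h) // 2
    left = (template_len - w) // 2
    template[top:top + h, left:left + w] = crop
    return template
```

Only after this padding would the square template be resized to the CNN input size (block 608), which scales the number uniformly instead of stretching it.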
As illustrated in
For jersey number recognition, there are normally two types of jersey numbers, i.e., single digit and double-digit numbers. A double-digit number is a combination of two single digits. If a double-digit jersey number has a location overlap with a single digit number, it is very likely that the single digit number is part of the double-digit number.
Hard-NMS may be implemented according to the present techniques. First, a hard sort is executed instead of a traditional sort, which depends only on the scores of the bounding boxes. The hard sort depends on both the scores and the bounding box location/size. In the hard sort, the bounding boxes are further sorted based on the rectangle size (height*width), since the bigger bounding box is likely correct when two bounding boxes have equivalent scores. Then, an intersection over union (IOU) is computed for all labels of bounding boxes. This assumes that one player image only contains one unique jersey number. In addition, the IOU may be modified. In particular, a conventional IOU (bi, bj) is the overlap area of bi and bj divided by the union area of bi and bj. The IOU according to the present techniques yields a hard IOU (bi, bj), with the overlap area of bi and bj divided by the area of bj. The IOU according to the present techniques improves the sensitivity of the hard NMS to bounding box intersection.
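A minimal sketch of the hard sort and hard IOU described above might look like the following, assuming boxes are (x, y, w, h) tuples and that a candidate is suppressed when it lies mostly inside an already-kept box; the 0.5 threshold is an illustrative assumption.

```python
def hard_iou(bi, bj):
    """Hard IOU: intersection area divided by the area of bj only,
    per the modified IOU described above. Boxes are (x, y, w, h)."""
    ax, ay, aw, ah = bi
    bx, by, bw, bh = bj
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    return (ix * iy) / (bw * bh)

def hard_nms(detections, iou_thresh=0.5):
    """Hard-NMS sketch over (label, score, box) tuples: sort by score,
    breaking ties in favor of the larger box (hard sort), then keep a
    candidate only if it does not lie mostly inside a kept box."""
    order = sorted(detections,
                   key=lambda d: (d[1], d[2][2] * d[2][3]), reverse=True)
    kept = []
    for label, score, box in order:
        # hard_iou(kept_box, box) = overlap / area(candidate box), so a
        # single digit fully inside a kept double-digit box scores 1.0
        if all(hard_iou(kbox, box) < iou_thresh for _, _, kbox in kept):
            kept.append((label, score, box))
    return kept
```

Normalizing by the candidate's own area (rather than the union) is what makes the suppression sensitive to a small single-digit box contained in a larger double-digit box, which a conventional IOU would score low.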
For example, the algorithm below describes hard non-maximum suppression according to the present techniques.
As illustrated in
Jersey numbers of some players may be erroneously recognized as a single digit number due to partial jersey number visibility. After obtaining all players' bounding boxes and jersey numbers, correspondences may be found for the same player from different cameras through multiple camera association. Then, single view jersey number recognition may be applied to the non-profile players in each frame. This results in a set of initial multiple camera jersey number results for the player, along with a frequency of occurrence for each result. In particular, cumulative voting may be used to determine a final jersey number.
Accordingly, at block 1202, detection results are obtained for each player. The detection results include each jersey number associated with the player as well as a frequency with which each jersey number was detected. As described above, each player can be located across camera views via a player detection module. At block 1204, each candidate jersey number is sorted according to frequency.
For each candidate jersey number, at block 1206 it is determined if the frequency of the candidate jersey number is less than nine. The number nine is selected here for exemplary purposes only. The number selected at block 1206 can be set according to a certain percentage of cameras or any other subset of cameras. If the candidate jersey number has a maximum frequency that is less than nine, process flow continues to block 1208. If the candidate jersey number has a maximum frequency that is nine or greater, process flow continues to block 1216.
At block 1208, processing begins for the candidate jersey number results with frequencies that are less than nine. In particular, in response to the candidate jersey number being a double digit, the candidate jersey number is separated into a single digit part and a double-digit part. At block 1210, it is determined if the double-digit part contains the single digit part of the candidate jersey number. If the double-digit part contains the single digit part of the candidate jersey number, process flow continues to block 1212. If the double-digit part does not contain the single digit part of the candidate jersey number, process flow proceeds to block 1216.
At block 1212, the frequency of the single digit part of the candidate jersey number is added to the frequency of the double-digit part of the candidate jersey number. At block 1214, the jersey number results are again sorted according to frequency. At block 1216, the candidate jersey number with the maximum frequency is selected as the final jersey number.
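The voting flow of blocks 1202 through 1216 can be condensed into a short sketch. The input format (a mapping from candidate number strings to detection frequencies) and the folding of every matching single digit into its containing double digit are assumptions for illustration.

```python
def vote_final_jersey(freqs, threshold=9):
    """Cumulative voting sketch: `freqs` maps candidate jersey-number
    strings to how many camera views reported them. If the top
    candidate's frequency is below `threshold` (nine is the exemplary
    value from the text), each single-digit candidate's count is added
    to any double-digit candidate containing that digit, then the
    maximum-frequency candidate is selected."""
    counts = dict(freqs)
    best = max(counts, key=counts.get)
    if counts[best] < threshold:
        for single, n in freqs.items():
            if len(single) != 1:
                continue
            for double in freqs:
                if len(double) == 2 and single in double:
                    counts[double] += n  # fold single-digit votes in
        best = max(counts, key=counts.get)
    return best
```

For example, if "55" was recognized in five views but only "5" was visible in four others, the folded count of nine selects "55" as the final jersey number.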
At block 1304, for each detected player, the player's location is determined. In embodiments, the location of the player may be a point within the captured 3D volume. In order to determine the location of the players within the 3D volume, the location of the player as captured by each camera at a time T is processed to derive the player location.
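One generic way to derive a single 3D point from the per-camera observations at time T is least-squares ray triangulation: each calibrated camera contributes a ray through the player's image location, and the point minimizing the squared distance to all rays is solved for. This is a standard multi-view technique offered for illustration, not the specific solver of the present techniques.

```python
import numpy as np

def triangulate_player(origins, directions):
    """Return the 3D point closest (in least squares) to a set of rays,
    one per camera that sees the player. `origins` and `directions`
    are lists of 3-vectors; directions need not be normalized."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        d = np.asarray(d, dtype=float)
        d = d / np.linalg.norm(d)
        o = np.asarray(o, dtype=float)
        # projector onto the plane perpendicular to the ray direction
        P = np.eye(3) - np.outer(d, d)
        A += P
        b += P @ o
    return np.linalg.solve(A, b)
```

With at least two non-parallel rays the normal matrix is invertible; the three-camera coverage requirement described earlier ensures this holds everywhere in the field of play.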
At block 1306, each player is classified as a non-profile player or a profile player. In embodiments, the player may be classified based on an orientation of the player with respect to the image plane of the camera. Additionally, in embodiments, the player may be classified as a profile or a non-profile player based on visibility of an identifier worn on the player. As described herein, the identifier is a jersey number. At block 1308, single view jersey number recognition is executed. The single view jersey number recognition takes as input a bounding box of the player in a camera image/frame/view, and an orientation of the player. Based on the input, the single view jersey number recognition extracts a plurality of features from the image of the player and determines a candidate jersey number for each camera view. At block 1310, the candidate jersey numbers are subjected to a cumulative voting process to determine a final jersey number. The cumulative voting process may be the process as described with regard to
The diagram of
Referring now to
The computing device 1400 may also include a graphics processing unit (GPU) 1408. As shown, the CPU 1402 may be coupled through the bus 1406 to the GPU 1408. The GPU 1408 may be configured to perform any number of graphics operations within the computing device 1400. For example, the GPU 1408 may be configured to render or manipulate graphics images, graphics frames, videos, or the like, to be displayed to a viewer of the computing device 1400.
The CPU 1402 may also be connected through the bus 1406 to an input/output (I/O) device interface 1410 configured to connect the computing device 1400 to one or more I/O devices 1412. The I/O devices 1412 may include, for example, a keyboard and a pointing device, wherein the pointing device may include a touchpad or a touchscreen, among others. The I/O devices 1412 may be built-in components of the computing device 1400, or may be devices that are externally connected to the computing device 1400. In some examples, the memory 1404 may be communicatively coupled to I/O devices 1412 through direct memory access (DMA).
The CPU 1402 may also be linked through the bus 1406 to a display interface 1414 configured to connect the computing device 1400 to a display device 1416. The display devices 1416 may include a display screen that is a built-in component of the computing device 1400. The display devices 1416 may also include a computer monitor, television, or projector, among others, that is internal to or externally connected to the computing device 1400. The display device 1416 may also include a head mounted display.
The computing device 1400 also includes a storage device 1418. The storage device 1418 is a physical memory such as a hard drive, an optical drive, a thumbdrive, an array of drives, a solid-state drive, or any combinations thereof. The storage device 1418 may also include remote storage drives.
The computing device 1400 may also include a network interface controller (NIC) 1420. The NIC 1420 may be configured to connect the computing device 1400 through the bus 1406 to a network 1422. The network 1422 may be a wide area network (WAN), local area network (LAN), or the Internet, among others. In some examples, the device may communicate with other devices through a wireless technology. For example, the device may communicate with other devices via a wireless local area network connection. In some examples, the device may connect and communicate with other devices via Bluetooth® or similar technology.
The computing device 1400 further includes an immersive viewing manager 1424. The immersive viewing manager 1424 may be configured to enable a 360° view of a sporting event from any angle. In particular, images captured by a plurality of cameras may be processed such that an end user can virtually experience any location within the field of play. Moreover, the end user may establish a viewpoint in the game, regardless of the particular camera locations used to capture images of the sporting event. The immersive viewing manager 1424 includes an SCD module 1426 to determine isolated bounding boxes of each player in each captured camera view. An SCT module 1428 is to obtain the association of the bounding boxes of an identical player between frames in each camera view, assigning identical players a unique track ID between different frames.
An SJR module 1430 is to recognize the jersey number of a player. In embodiments, the jersey number is recognized for each player in real-time. The single view jersey number recognition as described herein includes pre-processing, feature extraction, feature matching, and non-maximum suppression. A single view jersey number recognition process takes as input the detected non-profile player images as defined by a bounding box. Features are extracted from the detected non-profile player images. A you only look once (YOLO) regression is applied to the extracted features. Finally, a hard NMS algorithm is applied to the features to obtain jersey number results.
An STC module 1432 is to recognize the team tag of a player. An MCA module 1434 uses bounding boxes of a player in one frame from each camera view to derive a 2D/3D location of the player in the field of play. An MCT module 1436 derives correspondences and connects the temporal and spatial associations to determine a global player identification of each player in the field of play. Finally, a PTO module 1438 takes as input the jersey/team information and locations and generates player trajectories.
The block diagram of
The various software components discussed herein may be stored on one or more computer readable media 1500, as indicated in
An SJR module 1510 is to recognize the jersey number of a player. The single view jersey number recognition as described herein includes pre-processing, feature extraction, feature matching, and non-maximum suppression. A single view jersey number recognition process takes as input the detected non-profile player images as defined by a bounding box. Features are extracted from the detected non-profile player images. A you only look once (YOLO) regression is applied to the extracted features. Finally, a hard non-maximum suppression (NMS) algorithm is applied to the features to obtain jersey number results.
An STC module 1512 is to recognize the team tag of a player. An MCA module 1514 uses the bounding boxes of a player in one frame from each camera view to derive a 2D/3D location of the player in the field of play. An MCT module 1516 derives correspondences and connects the temporal and spatial associations to determine a global player identification for each player in the field of play. Finally, a PTO module 1518 takes as input the jersey/team information and locations and generates player trajectories.
The block diagram of
Example 1 is a method. The method includes detecting a player in a camera view captured by a camera; determining a player location of the player in each camera view, wherein the player location is defined by a bounding box; classifying the player as a profile player or a non-profile player based on a visibility of an identifier; in response to the player being a non-profile player: extracting features from the detected player within the bounding box; classifying a plurality of labels according to the extracted features; and selecting a label from the plurality of labels with a highest number of votes according to a voting policy as a final label.
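The voting policy of Example 1, selecting the label with the highest number of votes as the final label, can be sketched as a simple majority vote over the labels observed across frames and views. The function name and use of a counter are illustrative assumptions only.

```python
from collections import Counter

def majority_label(labels):
    # Voting policy: the label observed most often across the
    # collected frames/camera views becomes the final label.
    if not labels:
        return None
    return Counter(labels).most_common(1)[0][0]
```

For instance, if a player's jersey is read as "23" in three views and misread as "8" in two, the final label resolves to "23".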
Example 2 includes the method of example 1, including or excluding optional features. In this example, the method includes applying hard non-maximum suppression to the extracted features to obtain bounding boxes with the plurality of labels to be classified.
Example 3 includes the method of any one of examples 1 to 2, including or excluding optional features. In this example, the identifier is a jersey number worn by the player during game play.
Example 4 includes the method of any one of examples 1 to 3, including or excluding optional features. In this example, the classification of the player as a profile player or a non-profile player indicates the orientation of the player with respect to an image plane of the camera.
Example 5 includes the method of any one of examples 1 to 4, including or excluding optional features. In this example, the identifier of a non-profile player is substantially visible, wherein the camera view of the identifier is used to derive the entire identifier.
Example 6 includes the method of any one of examples 1 to 5, including or excluding optional features. In this example, the identifier of each profile player is not substantially visible, wherein the camera view of the identifier cannot be used to derive the entire identifier.
Example 7 includes the method of any one of examples 1 to 6, including or excluding optional features. In this example, in response to the player being classified as a profile player, not using the camera view for jersey number recognition.
Example 8 includes the method of any one of examples 1 to 7, including or excluding optional features. In this example, in preparation for processing the extracted features by a convolutional neural network (CNN), the bounding box for the player is padded to correspond to an input size of the CNN.
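The padding of Example 8 can be illustrated with a short, hypothetical sketch that computes symmetric padding growing a player crop to the CNN input size without resampling; the function name and the assumption that the crop fits inside the input are illustrative only.

```python
def pad_to_input_size(crop_w, crop_h, net_w, net_h):
    # Compute symmetric (left, right, top, bottom) padding that grows a
    # player crop to the CNN input size without resampling the pixels.
    assert crop_w <= net_w and crop_h <= net_h
    pad_x, pad_y = net_w - crop_w, net_h - crop_h
    left, top = pad_x // 2, pad_y // 2
    return left, pad_x - left, top, pad_y - top
```

Padding rather than resizing preserves the aspect ratio and pixel scale of the small jersey digits, which would otherwise be distorted by stretching the crop to the network input.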
Example 9 includes the method of any one of examples 1 to 8, including or excluding optional features. In this example, extracting features from the detected player within the bounding box precisely locates a candidate identifier.
Example 10 includes the method of any one of examples 1 to 9, including or excluding optional features. In this example, extracting features from the detected player within the bounding box extracts high-resolution low-level features and higher-level semantic low-resolution features.
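Combining the high-resolution low-level features and low-resolution semantic features of Example 10 is often done, in feature-pyramid fashion, by upsampling the semantic map to the high-resolution grid and concatenating the two along the channel axis. The sketch below is a generic illustration of that fusion pattern under assumed array shapes, not the claimed network.

```python
import numpy as np

def fuse_features(low_level, high_level):
    # low_level: high-resolution (H, W, C1) feature map.
    # high_level: low-resolution (H/s, W/s, C2) semantic feature map.
    # Nearest-neighbor upsample the semantic map, then concatenate channels.
    scale = low_level.shape[0] // high_level.shape[0]
    upsampled = np.repeat(np.repeat(high_level, scale, axis=0), scale, axis=1)
    return np.concatenate([low_level, upsampled], axis=2)
```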
Example 11 is a system. The system includes a processor to: detect a player in a camera view captured by a camera; determine a player location of the player in each camera view, wherein the player location is defined by a bounding box; classify the player as a profile player or a non-profile player based on a visibility of an identifier; and in response to the player being a non-profile player: extract features from the detected player within the bounding box; classify the features according to a label; and select a label with a highest number of votes according to a voting policy as a final label.
Example 12 includes the system of example 11, including or excluding optional features. In this example, the identifier is a jersey number worn by the player during game play.
Example 13 includes the system of any one of examples 11 to 12, including or excluding optional features. In this example, the classification of the player as a profile player or a non-profile player indicates the orientation of the player with respect to an image plane of the camera.
Example 14 includes the system of any one of examples 11 to 13, including or excluding optional features. In this example, the identifier of a non-profile player is substantially visible, wherein the camera view of the identifier is used to derive the entire identifier.
Example 15 includes the system of any one of examples 11 to 14, including or excluding optional features. In this example, the identifier of each profile player is not substantially visible, wherein the camera view of the identifier cannot be used to derive the entire identifier.
Example 16 includes the system of any one of examples 11 to 15, including or excluding optional features. In this example, in response to the player being classified as a profile player, not using the camera view for jersey number recognition.
Example 17 includes the system of any one of examples 11 to 16, including or excluding optional features. In this example, in preparation for processing the extracted features by a convolutional neural network (CNN), the bounding box for the player is padded to correspond to an input size of the CNN.
Example 18 includes the system of any one of examples 11 to 17, including or excluding optional features. In this example, extracting features from the detected player within the bounding box precisely locates a candidate identifier.
Example 19 includes the system of any one of examples 11 to 18, including or excluding optional features. In this example, extracting features from the detected player within the bounding box extracts high-resolution low-level features and higher-level semantic low-resolution features.
Example 20 includes the system of any one of examples 11 to 19, including or excluding optional features. In this example, hard non-maximum suppression is applied to the extracted features.
Example 21 is at least one non-transitory computer-readable medium. The computer-readable medium includes instructions that direct the processor to detect a player in a camera view captured by a camera; determine a player location of the player in each camera view, wherein the player location is defined by a bounding box; classify the player as a profile player or a non-profile player based on a visibility of an identifier; in response to the player being a non-profile player: extract features from the detected player within the bounding box; classify a plurality of labels according to the extracted features; and select a label from the plurality of labels with a highest number of votes according to a voting policy as a final label.
Example 22 includes the computer-readable medium of example 21, including or excluding optional features. In this example, the computer-readable medium includes applying hard non-maximum suppression to the extracted features to obtain bounding boxes with the plurality of labels to be classified.
Example 23 includes the computer-readable medium of any one of examples 21 to 22, including or excluding optional features. In this example, the identifier is a jersey number worn by the player during game play.
Example 24 includes the computer-readable medium of any one of examples 21 to 23, including or excluding optional features. In this example, the classification of the player as a profile player or a non-profile player indicates the orientation of the player with respect to an image plane of the camera.
Example 25 includes the computer-readable medium of any one of examples 21 to 24, including or excluding optional features. In this example, the identifier of a non-profile player is substantially visible, wherein the camera view of the identifier is used to derive the entire identifier.
Not all components, features, structures, characteristics, etc. described and illustrated herein need be included in a particular aspect or aspects. If the specification states a component, feature, structure, or characteristic “may”, “might”, “can” or “could” be included, for example, that particular component, feature, structure, or characteristic is not required to be included. If the specification or claim refers to “a” or “an” element, that does not mean there is only one of the element. If the specification or claims refer to “an additional” element, that does not preclude there being more than one of the additional element.
It is to be noted that, although some aspects have been described in reference to particular implementations, other implementations are possible according to some aspects. Additionally, the arrangement and/or order of circuit elements or other features illustrated in the drawings and/or described herein need not be arranged in the particular way illustrated and described. Many other arrangements are possible according to some aspects.
In each system shown in a figure, the elements in some cases may each have a same reference number or a different reference number to suggest that the elements represented could be different and/or similar. However, an element may be flexible enough to have different implementations and work with some or all of the systems shown or described herein. The various elements shown in the figures may be the same or different. Which one is referred to as a first element and which is called a second element is arbitrary.
It is to be understood that specifics in the aforementioned examples may be used anywhere in one or more aspects. For instance, all optional features of the computing device described above may also be implemented with respect to either of the methods or the computer-readable medium described herein. Furthermore, although flow diagrams and/or state diagrams may have been used herein to describe aspects, the techniques are not limited to those diagrams or to corresponding descriptions herein. For example, flow need not move through each illustrated box or state or in exactly the same order as illustrated and described herein.
The present techniques are not restricted to the particular details listed herein. Indeed, those skilled in the art having the benefit of this disclosure will appreciate that many other variations from the foregoing description and drawings may be made within the scope of the present techniques. Accordingly, it is the following claims including any amendments thereto that define the scope of the present techniques.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/CN2019/098518 | 7/31/2019 | WO | 00 |