The present invention generally relates to a method and associated apparatus for using a trajectory-based technique to detect a moving object in a video sequence, such as the ball in a soccer game. In one embodiment, the method comprises steps of identifying and evaluating sets of connected components in a video frame, filtering the list of connected components by comparing features of the connected components to predetermined criteria, identifying candidate trajectories across multiple frames, evaluating the candidate trajectories to determine a selected trajectory, and processing images in the video sequence based at least in part upon the selected trajectory.
This section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present invention that are described below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
As mobile devices have become more capable and mobile digital television standards have developed, it has become increasingly practical to view video programming on such devices. The small screens of these devices, however, present some limitations, particularly for the viewing of sporting events. Small objects, such as the ball in a sports program, can be difficult to see. The use of high video compression ratios can exacerbate the situation by significantly degrading the appearance of small objects like a ball, particularly in a far-view scene.
It can therefore be desirable to apply image processing to enhance the appearance of the ball. However, detecting the ball in sports videos is a challenging problem. For instance, the ball can be occluded or merged with field lines. Even when it is completely visible, its properties, such as shape, area, and color, may vary from frame to frame. Furthermore, if there are many objects with ball-like properties in a frame, it is difficult to make a decision as to which is the ball based upon only one frame, and thus difficult to perform image enhancement. The invention described herein addresses these and/or other problems.
In order to solve the problems described above, the present invention concerns a method and associated apparatus for using a trajectory-based technique to detect a moving object in a video sequence, such as the ball in a soccer game. In one embodiment, the method comprises steps of identifying and evaluating sets of connected components in a video frame, filtering the list of connected components by comparing features of the connected components to predetermined criteria, identifying candidate trajectories across multiple frames, evaluating the candidate trajectories to determine a selected trajectory, and processing images in the video sequence based at least in part upon the selected trajectory. This and other aspects of the invention will be described in detail with reference to the accompanying drawings.
The above-mentioned and other features and advantages of this invention, and the manner of attaining them, will become more apparent, and the invention will be better understood, by reference to the following description of embodiments of the invention taken in conjunction with the accompanying drawings, wherein:
The exemplifications set out herein illustrate preferred embodiments of the invention, and such exemplifications are not to be construed as limiting the scope of the invention in any manner.
As described herein, the present invention provides a method and associated apparatus for using a trajectory-based technique to detect a moving object in a video sequence, such as the ball in a soccer game. In one embodiment, the method comprises steps of identifying and evaluating sets of connected components in a video frame, filtering the list of connected components by comparing features of the connected components to predetermined criteria, identifying candidate trajectories across multiple frames, evaluating the candidate trajectories to determine a selected trajectory, and processing images in the video sequence based at least in part upon the selected trajectory.
While this invention has been described as having a preferred design, the present invention can be further modified within the spirit and scope of this disclosure. This application is therefore intended to cover any variations, uses, or adaptations of the invention using its general principles. Further, this application is intended to cover such departures from the present disclosure as come within known or customary practice in the art to which this invention pertains and which fall within the limits of the appended claims.
The present invention may be implemented in signal processing hardware or software within a television production or transmission environment. The method may be performed off-line or in real-time through the use of a look-ahead window.
At step 120, input frames from the video sequence 110 are processed into binary field masks. The mask generation process comprises detecting the grass regions to generate a grass mask GM and then computing the playfield mask, PM, which is the solid area covering these grass regions. In a simple case, the pixels representing the playing field are identified using the knowledge that the field is generally covered in grass or grass-colored material. The result is a binary mask classifying all field pixels with a value of 1 and all non-field pixels, including objects in the field, with a value of 0. Various image processing techniques may then be used to identify the boundaries of the playing field and create a solid field mask. For instance, all pixels within a simple bounding box encompassing all of the contiguous regions of field pixels above a certain area threshold may be included in the field mask. Other techniques, including the use of filters, may also be used to identify the field and eliminate foreground objects from the field mask. The mask generation process is further described below with respect to
At step 130, an initial set of candidate objects that may be the ball are identified. First, local luminance maxima in the video frame are detected by convolving the luminance component Y of the frame F with a normalized Gaussian kernel Gnk, generating the output image Yconv. A pixel (x,y) is designated as a local maximum if Y(x,y)>Yconv(x,y)+Tlmax, where Tlmax is a preset threshold. This approach generally isolates pixels representing the ball, but also isolates parts of the players, field lines, goalmouths, and other features, since these features also contain bright spots. In a preferred embodiment, Gnk is a 9×9 Gaussian kernel with variance 4 and the threshold Tlmax is 0.1.
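The luminance test described above can be sketched as follows. This is a minimal illustration in plain Python, not the implementation of the invention: the function names `gaussian_kernel` and `local_maxima` are hypothetical, the luminance values are assumed to be normalized to the range 0 to 1 (suggested by, but not stated with, the threshold of 0.1), and border pixels are handled by clamping coordinates, which is likewise an assumption.

```python
import math

def gaussian_kernel(size=9, variance=4.0):
    """Return a normalized size x size Gaussian kernel as a list of rows."""
    half = size // 2
    raw = [[math.exp(-(dx * dx + dy * dy) / (2.0 * variance))
            for dx in range(-half, half + 1)]
           for dy in range(-half, half + 1)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]

def local_maxima(lum, kernel, t_lmax=0.1):
    """Return a binary image with 1 where Y(x,y) > Yconv(x,y) + t_lmax."""
    h, w = len(lum), len(lum[0])
    half = len(kernel) // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0  # value of Yconv at (x, y)
            for ky, row in enumerate(kernel):
                # Clamp coordinates at the image border (an assumption).
                yy = min(max(y + ky - half, 0), h - 1)
                for kx, k in enumerate(row):
                    xx = min(max(x + kx - half, 0), w - 1)
                    acc += k * lum[yy][xx]
            if lum[y][x] > acc + t_lmax:
                out[y][x] = 1
    return out
```

Because the kernel is symmetric, the weighted sum above is equivalent to a convolution. A production system would more likely use an optimized separable filter, which this sketch omits for brevity.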
The result of the luminance maxima detection process is a binary image Ilm with 1's denoting bright spots. Various clusters of pixels, or connected components, will appear in the image Ilm. The connected components in Ilm, Z={Z1, Z2, . . . , Zn}, are termed “candidates,” one of which is likely to represent the ball. Information from the playfield detection of step 120 may be used at step 130, or at step 140 described below, to reduce the number of candidates. In far-view scenes, the assumption can be made that the ball will be inside the playfield, and that objects outside the playfield may be ignored. The candidate generation process is also further described below with respect to
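As an illustration of how the candidates Z might be extracted from the binary image Ilm, the following sketch groups bright pixels with a breadth-first flood fill. The function name `connected_components` is hypothetical, and the use of 8-connectivity is an assumption, since the text does not specify the connectivity.

```python
from collections import deque

def connected_components(binary):
    """Group 1-pixels into components; returns a list of lists of (x, y)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y0 in range(h):
        for x0 in range(w):
            if binary[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Start a new component and flood-fill from this pixel.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            pixels = []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                # Visit all 8 neighbours (connectivity is an assumption).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and binary[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            components.append(pixels)
    return components
```

Each returned pixel list corresponds to one candidate, from which features such as area can then be measured.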
At step 140, those candidates from step 130 that are unlikely to be the ball are eliminated using a sieving and qualification process. To determine which candidates should be discarded, a score is computed for each candidate, providing a quantification of how similar each candidate is to a pre-established model of the ball. In a preferred embodiment, three features of the ball are considered:
Analysis of sample video has shown that both area and whiteness histograms follow a Gaussian distribution. The eccentricity histogram also follows a Gaussian distribution after a symmetrization to account for the minimum value of eccentricity being 1. Candidates can be rejected if their feature values lie outside the range μ±nσ, where μ is the mean and σ is the standard deviation of the corresponding feature distribution. Based on this sieving process, candidates in Z can be accepted as ball-like or rejected. A loose range is used because the features of the ball could vary significantly from frame to frame. For objects of colors other than white, the “whiteness” component used in this exemplary embodiment can be replaced with a measure of the appropriate color, such as orange for a basketball, brown for a football, or black for a puck.
In a preferred embodiment, A is modeled as a Gaussian distribution with μA=7.416 and σA=2.7443, and the range is controlled by nA=3. E is modeled as a Gaussian distribution with μE=1 and σE=1.2355, and the range is controlled by nE=3. W is modeled as a Gaussian distribution with μw=0.14337 and σw=0.034274, and the range is controlled by nw=3. Candidates must meet all three criteria to be kept. The sieving process may be repeated with tighter values of n to produce smaller numbers of candidates.
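The sieving test can be sketched as below, using the preferred-embodiment parameters quoted above. The dictionary layout, the feature key names, and the function names `within_range` and `sieve` are hypothetical; the feature values themselves are assumed to have been measured elsewhere.

```python
# (mean, standard deviation, n) per feature, from the preferred embodiment.
BALL_MODEL = {
    "area": (7.416, 2.7443, 3.0),
    "eccentricity": (1.0, 1.2355, 3.0),
    "whiteness": (0.14337, 0.034274, 3.0),
}

def within_range(value, mu, sigma, n):
    """True if value lies inside the interval mu +/- n * sigma."""
    return abs(value - mu) <= n * sigma

def sieve(candidates, model=BALL_MODEL):
    """Keep only candidates whose features all fall inside their ranges."""
    return [c for c in candidates
            if all(within_range(c[name], *params)
                   for name, params in model.items())]
```

Repeating the sieve with tighter ranges amounts to calling `sieve` again with a model whose n values are smaller.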
Also in step 140, the candidates C that pass the initial sieving process are further qualified based upon factors including:
In a preferred embodiment, the ball is expected to be an isolated object inside the playfield most of the time, in contrast to objects like the socks of players, which are always close to each other. Hence, candidates without a close neighbor, and with a high value of DCC, are more likely to be the ball. Likewise, the ball is also not expected to be near the boundaries of the field. This assumption is especially important if there are other spare balls inside the grass but outside the bounding lines of the playfield.
The object mask OM provides information about which pixels inside the playfield are not grass. This includes players and field lines, which may contain “ball-like” blobs inside them (e.g., socks of players or line fragments). Ideally, ball candidates should not lie inside other larger blobs. As we expect only one candidate Ci inside a connected component of the OM, NCOMi is expected to be 1 in our ideal model.
A score Si for a candidate Ci is computed as:
At this point, candidates having a score equal to 0 are rejected. For the remaining candidates, the score Si is penalized using the other features as follows:
In a preferred embodiment, μA=7.416, σA=2.7443, nA=1.3; μE=1, σE=1.2355, nE=1.3; μw=0.14337, σw=0.034274, nw=1.3; DCCthr=7 pixels, DFthr=10 pixels and NCOMthr=1. The candidate generation process is further described and illustrated below with respect to
At step 150, starting points of trajectories, or “seeds,” are identified. A seed SEEDk is a pair of ball candidates {Ci, Cj} in two consecutive frames Ft, Ft+1, where Ci belongs to Ft and Cj belongs to Ft+1, such that the candidates of the pair {Ci, Cj} are spatially closer to each other than a threshold value SEEDthr, and furthermore meet either the criteria that the score of one candidate is three, or that the score of both candidates is two. In a preferred embodiment, SEEDthr=8 pixels. Criteria may be altered to address other concerns, such as time complexity.
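The seed criteria can be illustrated with a short sketch. The `Candidate` tuple and the function name `find_seeds` are hypothetical; each candidate is assumed to carry a pixel position and the score computed at step 140.

```python
import math
from collections import namedtuple

Candidate = namedtuple("Candidate", "x y score")

def find_seeds(frame_t, frame_t1, seed_thr=8.0):
    """Pair candidates of consecutive frames Ft and Ft+1 into seeds."""
    seeds = []
    for ci in frame_t:
        for cj in frame_t1:
            # Spatial criterion: closer than SEEDthr pixels.
            close = math.hypot(ci.x - cj.x, ci.y - cj.y) < seed_thr
            # Score criterion: one score is three, or both scores are two.
            strong = (ci.score == 3 or cj.score == 3
                      or (ci.score == 2 and cj.score == 2))
            if close and strong:
                seeds.append((ci, cj))
    return seeds
```

The nested loop examines every candidate pair; this is the kind of cost that the remark about altering the criteria for time complexity would address.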
At step 160, candidate trajectories are created from the seeds from step 150. A trajectory Ti={C1i, C2i, . . . , CNi} is defined as a set of candidates in contiguous frames, one per frame, which form a viable hypothesis of a smoothly moving object in a certain time interval or frame range generated using the seed SEEDi.
A linear Kalman filter is used to create the trajectories by growing the seed in both directions. The two samples that compose the seed determine the initial state for the filter. Using this information, the filter predicts the position of the ball candidate in the next frame. If there is a candidate in the next frame inside a search window centered at the predicted position, the candidate nearest to the predicted position is added to the trajectory and its position is used to update the filter. If no candidate is found in the window, the predicted position is added to the trajectory as an unsupported point and is used to update the filter.
In a preferred embodiment, a trajectory building procedure is terminated if a) there are no candidates near the predicted positions for N consecutive frames, or b) there are more than K candidates near the predicted position (e.g., K=1). The filter works in a bidirectional manner, so after growing the trajectory forward in time, the Kalman filter is re-initialized and grown backward in time. The first criterion to terminate a trajectory produces a set of unsupported points at its extremes. These unsupported points are then eliminated from the trajectory. The trajectory generation and selection process is further described and illustrated below with respect to
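The forward growth of a trajectory from a seed might be sketched as follows. This is a simplified illustration under several assumptions: the motion model is constant velocity with independent x and y axes, the noise values `q` and `r`, the initial covariance, the square search window size, and the default `max_misses` (standing in for N) are placeholders, and the names `AxisKalman` and `grow_forward` are hypothetical.

```python
import math

class AxisKalman:
    """Constant-velocity linear Kalman filter for one coordinate."""
    def __init__(self, pos, vel, q=1.0, r=1.0):
        self.pos, self.vel = pos, vel
        self.p = [[1.0, 0.0], [0.0, 1.0]]  # state covariance (assumed)
        self.q, self.r = q, r              # process / measurement noise

    def predict(self):
        (a, b), (c, d) = self.p
        self.pos += self.vel
        # P <- F P F^T + Q with F = [[1, 1], [0, 1]] and Q = q * I
        self.p = [[a + b + c + d + self.q, b + d], [c + d, d + self.q]]
        return self.pos

    def update(self, z):
        (a, b), (c, d) = self.p
        k0, k1 = a / (a + self.r), c / (a + self.r)  # Kalman gain
        resid = z - self.pos
        self.pos += k0 * resid
        self.vel += k1 * resid
        self.p = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]

def grow_forward(seed, frames, start, window=10.0, max_misses=3,
                 max_in_window=1):
    """Grow a seed located at frames start and start+1 forward in time.

    frames is a list (one entry per frame) of lists of (x, y) candidates.
    Returns a list of (frame, x, y, supported) tuples.
    """
    (x0, y0), (x1, y1) = seed
    fx, fy = AxisKalman(x1, x1 - x0), AxisKalman(y1, y1 - y0)
    traj = [(start, x0, y0, True), (start + 1, x1, y1, True)]
    misses = 0
    for t in range(start + 2, len(frames)):
        px, py = fx.predict(), fy.predict()
        near = [c for c in frames[t]
                if abs(c[0] - px) <= window and abs(c[1] - py) <= window]
        if len(near) > max_in_window:
            break  # criterion b): too many candidates near the prediction
        if near:
            zx, zy = min(near,
                         key=lambda c: math.hypot(c[0] - px, c[1] - py))
            misses = 0
        else:
            zx, zy = px, py  # unsupported point: use the prediction itself
            misses += 1
        fx.update(zx)
        fy.update(zy)
        traj.append((t, zx, zy, bool(near)))
        if misses >= max_misses:
            break  # criterion a): N consecutive frames without support
    while traj and not traj[-1][3]:
        traj.pop()  # eliminate unsupported points at the extreme
    return traj
```

Backward growth could reuse the same routine on a time-reversed frame list with the seed order swapped, which is one possible reading of re-initializing the filter.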
Some of the candidate trajectories T={T1, T2, . . . , TM} may be parts of the path described by the actual ball, while others are trajectories related to other objects. The goal of the algorithm is to create a trajectory BT by selecting a subset of trajectories likely to represent the path of the actual ball, while rejecting the others. The algorithm comprises the use of a trajectory confidence index, a trajectory overlap index, and a trajectory distance index. A score for each trajectory is generated based on the length of the trajectory, the scores of the candidates that compose the trajectory, and the number of unsupported points in the trajectory.
A confidence index Ω(Tj) is computed for the trajectory Tj as:
$$\Omega(T_j)=\sum_{i=1}^{3}\lambda_i p_i+\sum_{i=2}^{3}\omega_i q_i-\tau r$$
where:
In a preferred embodiment λ1=0.002, λ2=0.2, λ3=5, ω2=0.8, ω3=2, and τ=10.
For each selected trajectory, there may be others that overlap in time. If the overlap index is high, the corresponding trajectory will be discarded. If the index is low, the overlapping part of the competing trajectory will be trimmed.
The overlap index penalizes the number of overlapping frames while rewarding long trajectories with a high confidence index, and is computed as:
where:
The use of the trajectory distance index increases the spatial-temporal consistency of BT. Using the assumption that the ball moves at a maximum velocity Vmax pixels/frame, two trajectories BT and Ti are incompatible if the spatial distance of the ball candidates between the closest extremes of the trajectories is higher than Vmax times the number of frames between the extremes plus a tolerance D. Otherwise, they are compatible and Ti can be part of BT.
The distance index is given by:
and where:
If DI(BT, Ti)=1, then the trajectory Ti is consistent with BT. Without this criterion, adding Ti to BT can present the problem of temporal inconsistency, where the ball may jump from one spatial location to another in an impossibly small time interval. By adding the distance index criterion in the trajectory selection algorithm, this problem is solved. In a preferred embodiment, Vmax=10 pixels/frame and D=10 pixels.
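The compatibility test behind the distance index can be sketched as below, with the preferred-embodiment values as defaults. The function name `distance_index` and the trajectory layout, a list of (frame, x, y) tuples sorted by frame, are hypothetical. The sketch also assumes that the two trajectories do not overlap in time, for example because overlapping frames have already been trimmed.

```python
import math

def distance_index(bt, ti, v_max=10.0, tol=10.0):
    """Return 1 if ti is spatially and temporally consistent with bt, else 0.

    bt and ti are lists of (frame, x, y) sorted by frame and are assumed
    not to overlap in time.
    """
    # Pick the closest extremes: the end of the earlier trajectory and
    # the start of the later one.
    if bt[-1][0] <= ti[0][0]:
        a, b = bt[-1], ti[0]
    else:
        a, b = ti[-1], bt[0]
    gap = abs(b[0] - a[0])                        # frames between extremes
    dist = math.hypot(b[1] - a[1], b[2] - a[2])   # pixels between extremes
    return 1 if dist <= v_max * gap + tol else 0
```

For example, with a gap of 5 frames the ball may have moved at most 10×5+10=60 pixels, so a separation of 40 pixels is compatible while a separation of 100 pixels is not.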
Given T, the set of candidate trajectories, the algorithm produces as output BT, a subset of candidate trajectories that describe the trajectory of the ball along the video sequence. The algorithm iteratively takes the trajectory from T with the highest confidence index and moves it to BT. Then, all the trajectories in T overlapping with BT are processed, trimming or deleting them depending on the overlapping index χ(BT, Ti) and the distance index DI(BT,Ti). The algorithm stops when there are no more trajectories in T.
The algorithm can be described as follows:
The trim operation trim(BT, Ti) consists of removing from the trajectory Ti all candidates lying in the overlapping frames between BT and Ti. If this process leads to temporal fragmentation of Ti (i.e., candidates are removed from the middle), the fragments are added as new trajectories to T and Ti is removed from T. In a preferred embodiment, the overlap index threshold Othr=0.5 is used.
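One way to express the trim operation, including the splitting of a fragmented trajectory, is sketched here. The data layout is an assumption: `bt_frames` is the set of frame indices covered by BT, and `ti` is a dictionary mapping contiguous frame indices to candidates. The function name `trim` mirrors the text, but the signature is hypothetical.

```python
def trim(bt_frames, ti):
    """Remove from ti the frames it shares with BT.

    Returns the surviving fragments as a list of dictionaries. An empty
    list means ti was removed entirely; more than one entry means ti was
    fragmented and each fragment becomes a new trajectory.
    """
    fragments, current, prev = [], {}, None
    for frame in sorted(ti):
        if frame in bt_frames:
            continue  # overlapping frame: drop this candidate
        if prev is not None and frame != prev + 1 and current:
            fragments.append(current)  # a gap closes the current fragment
            current = {}
        current[frame] = ti[frame]
        prev = frame
    if current:
        fragments.append(current)
    return fragments
```

The caller would then replace Ti in T with the returned fragments.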
With the ball trajectory selected, frames may be processed so as to enhance the appearance of the ball. For instance, a highlight color may be placed over the location or path of the ball to allow the viewer to more easily identify its location. The trajectory may also be used at the encoding stage to control local or global compression ratios to preserve sufficient image quality for the ball to be viewable.
The results of various steps of method 100 are illustrated in
An alternative method to create the final ball trajectory is based on Dijkstra's shortest path algorithm. The candidate trajectories are seen as nodes in a graph. The edge between two nodes (or trajectories) is weighted by a measure of compatibility between the two trajectories. The reciprocal of the compatibility measure can be seen as the distance between the nodes. If the start and end trajectories (Ts, Te) of the entire ball path are known, the trajectories in between can be selected using Dijkstra's algorithm which finds the shortest path in the graph between nodes Ts and Te by minimizing the sum of distances along the path.
As a first step, a compatibility matrix containing the compatibility scores between trajectories is generated. The cell (i, j) of the N×N compatibility matrix contains the compatibility score between the trajectories Ti and Tj, where N is the number of candidate trajectories.
If two trajectories Ti and Tj overlap by more than a certain threshold, or Ti ends after Tj, the compatibility index between them will be infinite. Excluding pairs in which Ti ends after Tj ensures that the path always goes forward in time. Note that this criterion means that the compatibility matrix is not symmetric, as φ(Ti, Tj) need not be the same as φ(Tj, Ti). If the overlapping index between Ti and Tj is small, the trajectory with lower confidence index will be trimmed for purposes of computing the compatibility index.
The compatibility index between the two trajectories is defined as:
where:
In a preferred embodiment, α=−1/70, β=−0.1 and γ=−0.1.
Once the compatibility matrix is created, Dijkstra's shortest path algorithm can be used to minimize the distance (i.e., the reciprocal of compatibility) to travel from one trajectory node to another.
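The shortest-path search over trajectory nodes can be sketched with a standard priority-queue form of Dijkstra's algorithm. The sketch assumes that an N×N matrix of non-negative edge distances has already been derived from the compatibility matrix, with `math.inf` marking pairs that cannot be linked; the function name `shortest_path` and the matrix layout are hypothetical.

```python
import heapq
import math

def shortest_path(dist, start, end):
    """Dijkstra's algorithm on a dense, possibly asymmetric distance matrix.

    dist[u][v] is the non-negative distance from node u to node v, or
    math.inf when no edge exists. Returns (path, total_distance), or
    (None, math.inf) when end cannot be reached from start.
    """
    n = len(dist)
    best = [math.inf] * n
    prev = [None] * n
    best[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u]:
            continue  # stale queue entry
        if u == end:
            break
        for v in range(n):
            w = dist[u][v]
            if v == u or math.isinf(w):
                continue
            nd = d + w
            if nd < best[v]:
                best[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if math.isinf(best[end]):
        return None, math.inf
    # Walk the predecessor links back from end to start.
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], best[end]
```

Because the matrix is not symmetric, the search only follows edges in the forward direction, so a reversed query between the same two nodes may find no path.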
If the start and end trajectories (Ts, Te) of the entire ball path are known, the intermediate trajectories can be found using the shortest path algorithm. However, Ts and Te are not known a priori. In order to reduce the complexity of checking all combinations of start and end trajectories, only a subset of all combinations is considered, using trajectories with a confidence index higher than a threshold. Each combination of start and end trajectories (nodes) is considered in turn and the shortest path is computed as described earlier. Finally, the overall best path among all these combinations is selected.
The best ball trajectory will have a low cost and be temporally long, minimizing the function:
$$SC(Q)=w\times\frac{CD(Q)}{\mathit{max}_c}+(1-w)\times\frac{1-\mathrm{length}(Q)}{\mathit{max}_l}$$
where:
In a preferred embodiment, w=0.5.
While the present invention has been described in terms of a specific embodiment, it will be appreciated that modifications may be made which will fall within the scope of the invention. For example, various processing steps may be implemented separately or combined, and may be implemented in general purpose or dedicated data processing hardware or in software, and thresholds and other parameters may be adjusted to suit varying types of video input.
This application claims priority to and all benefits accruing from provisional application filed in the United States Patent and Trademark Office on Jul. 21, 2009 and assigned Ser. No. 61/271,396.
| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/US2010/002039 | 7/20/2010 | WO | 00 | 1/20/2012 |

| Number | Date | Country |
|---|---|---|
| 61271396 | Jul 2009 | US |