Embodiments of the present invention relate to methods and systems for predicting a future event based on spatiotemporal data.
The professional coverage of sporting events relies on extensive state-of-the-art technologies to provide unique experiences and better insights for viewers. Emerging technologies, including advanced data-capturing sensors and their calibration techniques, event recognition methods, and automatic detection and tracking systems, generate live raw data that are instrumental for processes that augment the broadcast video with instantaneous game-dependent graphics. These readily available raw data enable analyses that improve viewer understanding of live game developments and enrich coverage with contextual information about the players' and the teams' present and historical performances. In particular, knowledge of the teams' playing strategies and tactics is instrumental in capturing and covering their plays; the way a certain team interacts with another may be characterized and used to predict its future actions. Similarly, patterns of interactions among players may be learned and then used to predict a player's next moves and their outcome.
Being able to predict a player's future moves may be applicable to many tasks pertaining to delivering live coverage of a sporting event. For example, future event prediction may allow for informed camera steering, or may provide commentators, coaches, or viewers with immediate highlights of the teams' maneuvers throughout the game. For instance, in a team-game that is focused on the whereabouts of the ball (or any other playing object, such as the puck in a hockey game), knowing who might be the next player to own or handle the ball may be useful in improving automatic tracking of game participants. Likewise, in a tennis game, predicting the next shot's location may facilitate live predictive analyses. Other application domains that include observations of elements that interact with each other according to some pattern may also benefit from future event prediction. For example, surveillance systems monitoring people's movements, gestures, or communications may benefit from prediction of their future actions.
Probabilistic estimation methods utilize the statistical dependency among a problem domain's random variables to estimate (or classify) a subset of random variables based on another subset. Specifically, structured classification models use statistical dependency to label state variables based on other states and observed variables (i.e. input measurements). Such structured classification models may be represented by a graph wherein random variables (i.e. state variables or observation variables) are assigned to the graph's nodes and the graph's edges denote an assumed statistical dependency among the variables assigned to those nodes. Typically, in a multivariate estimation problem the objective is to estimate the value of a state vector y based on an observation vector x. The optimal approach for solving this involves modeling the Joint Probability Distribution Function (j-PDF) p(y,x). However, constructing a j-PDF over y and x may lead to intractable formulations, especially in cases where vector x is of high dimensionality and includes complex inter-dependencies. One way to reduce such complexity is to assume statistical independence among subsets of model variables. This allows factorization of the j-PDF into products of local functions. As will be shown below, graphical modeling is helpful in depicting an assumed factorization of p(y,x).
A graph may be constructed to represent a sequence of state variables y and their associated observation variables x where the goal is, for example, to label (classify) the state variables based on the observation variables. For instance, Hidden Markov Models (HMMs) have often been used to label variables in segmentation tasks. An HMM includes states y={yj}j=1..m and associated observations x={xj}j=1..m, where an observation vector xj includes any observable (measurable) data that may influence any of the problem-defined state variables yj. To reduce the complexity of naive HMM joint distribution modeling, it is assumed 1) that each state yj depends only on its immediate predecessor state yj−1 and 2) that each observation xj depends only on the corresponding state yj. These assumptions lead to the following factorization of the j-PDF:

p(y,x)=Πj=1..m p(yj|yj−1)·p(xj|yj), with p(y1|y0)≡p(y1)  (1)
A graphical description of this factorization is shown in
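The HMM factorization just described can be illustrated numerically. The following minimal sketch (all transition, emission, and prior probabilities are hypothetical toy values, not taken from the disclosure) evaluates p(y,x) as a product of transition and emission factors:

```python
import itertools

import numpy as np

# Toy two-state, two-symbol HMM; all probability values are hypothetical.
prior = np.array([0.5, 0.5])          # p(y1)
trans = np.array([[0.7, 0.3],         # p(yj | yj-1)
                  [0.4, 0.6]])
emit = np.array([[0.9, 0.1],          # p(xj | yj)
                 [0.2, 0.8]])

def hmm_joint(y, x):
    """Evaluate p(y, x) as the product of transition and emission factors."""
    p = prior[y[0]] * emit[y[0], x[0]]
    for j in range(1, len(y)):
        p *= trans[y[j - 1], y[j]] * emit[y[j], x[j]]
    return p

# Sanity check: joint probabilities over all length-3 sequences sum to 1,
# confirming the factorization defines a valid joint distribution.
total = sum(hmm_joint(y, x)
            for y in itertools.product([0, 1], repeat=3)
            for x in itertools.product([0, 1], repeat=3))
```

Summing the factorized joint over every state and observation sequence of a fixed length yields 1, which is what makes the factorized form a usable stand-in for the full j-PDF.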
In general, to classify or label y based on the given observations in x, the conditional distribution function p(y|x) (i.e. the posterior probability) is required. Given the HMM modeling of the joint distribution in (1), the conditional distribution p(y|x) may be calculated from p(y,x) using Bayes' rule. Note that the HMM is considered in the art a generative model: p(xj|yj) describes how a label yj statistically “generates” a feature vector xj. An alternative approach is a discriminative model wherein the conditional probability p(y|x) is modeled directly. A popular discriminative model is the Conditional Random Field (CRF). A CRF model need not explicitly model complex dependencies among the variables in x. Thus, the expression for the conditional probability is simpler than that for the joint probability of the HMM. CRF-based models are better suited when a larger, overlapping set of observation variables is required to closely approximate the problem domain.
CRF models differ based on the way the conditional distribution p(y|x) is factored. For example, yj may be influenced by (or statistically dependent on) yj−1, xj−1, xj, and xj+1. Alternatively, in a linear-chain CRF, yj is assumed to be influenced merely by yj−1 and xj, as demonstrated by the undirected graph 110 in
p(y|x; θ)=exp(Ψ(y,x; θ))/Σy′ exp(Ψ(y′,x; θ)),  (2)

where Ψ(y,x; θ) ∈ ℝ is a potential function parameterized by θ:

Ψ(y,x; θ)=Σj=1..m Σk θk·fk(yj−1, yj, xj),  (3)

where the fk are feature functions defined over adjacent states and the local observations.
CRFs were introduced by Lafferty et al. (see “Conditional random fields: probabilistic models for segmenting and labeling sequence data,” ICML 2001). CRFs have since been widely used for various applications such as tracking, image segmentation, and activity/object recognition. As mentioned above, to maintain tractability, the HMM assumes conditional independence among the observation variables. In contrast, the CRF, by virtue of directly modeling the conditional distribution function, allows for direct interactions among the observation variables. The CRF is limited by the assumption of Markovian behavior (i.e. a state depends only on its previous state), but this limitation is relaxed by a high-order CRF where a state may depend on several previous states. Nonetheless, in a CRF model, the parameter vector θ is optimized to estimate the most likely sequence y based on the given x, while in a prediction problem what is required is to estimate the most likely future state yj+1 based on {yj, yj−1, . . . yj−m+1} and x. As will be explained below, this problem may be solved by defining the states {yj, yj−1, . . . yj−m+1} as hidden states and optimizing for only yj+1.
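The linear-chain CRF described above can be sketched on a toy problem. In this illustration the feature functions are reduced to hypothetical per-position and transition score tables, and the normalizer is computed by brute-force enumeration, which is tractable only for tiny label sets and sequence lengths (practical implementations use dynamic programming):

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
S, m = 2, 3                          # label-set size and sequence length (toy)
unary = rng.normal(size=(m, S))      # stand-in scores coupling yj to xj
pairwise = rng.normal(size=(S, S))   # stand-in scores coupling yj-1 to yj

def psi(y):
    """Potential of a label sequence y given the (implicit) observations."""
    score = sum(unary[j, y[j]] for j in range(m))
    score += sum(pairwise[y[j - 1], y[j]] for j in range(1, m))
    return score

def posterior(y):
    """p(y | x; theta): exponentiated potential, normalized over all y'."""
    z = sum(np.exp(psi(yp)) for yp in itertools.product(range(S), repeat=m))
    return np.exp(psi(y)) / z
```

Because the conditional distribution is modeled directly, nothing in the sketch needs to describe how the observations were generated; the observations enter only through the score tables.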
Generally, models that include hidden-state structures provide more flexibility in representing the problem domain relative to fully observable models (e.g. CRF). Hence, a Hidden-state Conditional Random Field (HCRF) model was proposed by Quattoni et al., wherein intermediate variables are used to model the latent structure of the problem domain (see “Hidden Conditional Random Fields,” PAMI, 2007).
where x={xj}j=1..m is the vector of local observations. The hidden states are represented by h={hj}j=1..m. Each hj may take a value out of a set of possible hidden-state values H. The HCRF model is defined as follows:

p(y|x; θ)=Σh exp(Ψ(y,h,x; θ))/Σy′Σh exp(Ψ(y′,h,x; θ)),  (4)
where the potential function in this model may be:

Ψ(y,h,x; θ)=Σj φ(xj)·θh[hj] + Σj θy[y,hj] + Σ(j,k)∈E θo[y,hj,hk],  (5)

where E denotes the set of edges connecting pairs of hidden states in the model's graph.
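The HCRF posterior and potential just described can be sketched by brute force on a toy problem. All dimensions and parameter values below are hypothetical, and the sums over hidden-state combinations are enumerated directly, which only works at toy scale:

```python
import itertools

import numpy as np

rng = np.random.default_rng(1)
Y, H, m, d = 2, 3, 3, 4            # labels, hidden values, chain length, feature dim
phi = rng.normal(size=(m, d))      # stand-in local feature vectors phi(xj)
th_h = rng.normal(size=(H, d))     # parameters coupling observations to hj
th_y = rng.normal(size=(Y, H))     # parameters coupling y to hj
th_o = rng.normal(size=(Y, H, H))  # parameters coupling y to chain edges (hj, hk)

def psi(y, h):
    """Potential: observation-state, label-state, and label-pairwise terms."""
    s = sum(phi[j] @ th_h[h[j]] + th_y[y, h[j]] for j in range(m))
    s += sum(th_o[y, h[j - 1], h[j]] for j in range(1, m))
    return s

def hcrf_posterior(y):
    """p(y | x; theta): marginalize the hidden states, then normalize."""
    num = sum(np.exp(psi(y, h)) for h in itertools.product(range(H), repeat=m))
    den = sum(np.exp(psi(yp, h))
              for yp in range(Y)
              for h in itertools.product(range(H), repeat=m))
    return num / den
```

Note that the label y is scored only through the hidden states hj; there is no term tying y directly to the observations, which is precisely the limitation the a-HCRF model below addresses.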
The model parameter vector θ is computed in a training process wherein a training dataset, including labeled examples {xi,yi}i=1..N, is used to estimate the parameter vector utilizing an objective function such as:

L(θ)=Σi=1..N log p(yi|xi; θ) − ||θ||²/2σ²,  (6)

where log p(yi|xi; θ) is the log-likelihood of the data and −||θ||²/2σ² is the log of a Gaussian prior over θ with variance σ². The optimal parameter vector θ* is derived by maximizing L(θ):

θ*=arg maxθ L(θ)  (7)
Known-in-the-art optimization methods may be used to search for θ* (e.g. gradient-ascent-based methods). In cases where the objective function is not convex, global searching schemes are typically applied to prevent the search from getting trapped in a local maximum.
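The gradient-ascent search mentioned above can be sketched with a simple stand-in discriminative model (logistic regression), whose regularized log-likelihood has the same form as (6). The data, learning rate, and prior variance below are all synthetic and hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))                   # synthetic observations
w_true = np.array([1.5, -2.0, 0.5])
y = (X @ w_true + 0.1 * rng.normal(size=200) > 0).astype(float)

sigma2 = 10.0                                   # Gaussian-prior variance
theta = np.zeros(3)

def objective(t):
    """Log-likelihood minus the Gaussian-prior penalty, as in the form of (6)."""
    p = 1.0 / (1.0 + np.exp(-(X @ t)))
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)) - t @ t / (2 * sigma2)

for _ in range(300):                            # plain gradient ascent
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    grad = X.T @ (y - p) - theta / sigma2       # gradient of the objective
    theta += 0.05 * grad / len(y)
```

Because this stand-in objective is concave, plain gradient ascent suffices; for the non-convex HCRF objective, the global searching schemes mentioned above would be layered on top of such a local update.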
Hence, a classification task of labeling the event y generally comprises a learning phase and a testing phase. The learning phase is typically accomplished offline and, as explained above, is directed at finding the optimal parameter vector θ* based on any suitable objective function such as (6). Having the optimal parameter vector, the classifier is operative and ready for labeling in the subsequent testing phase. In the testing phase, given an input x (out of a testing dataset) and the optimal parameter vector θ*, the label of event y is estimated by y* as follows:

y*=arg maxy p(y|x; θ*)  (8)
The computation of y*, referred to in the art as inference, results in the labeling of event y. The accuracy of this labeling depends, in part, on how representative the training dataset is of the testing dataset.
An HCRF model introduces an improvement with respect to a basic CRF model, as it optimizes yj+1 directly and allows statistical dependency between yj+1 and previous states (as in a high-order CRF). However, yj+1 is assumed not to be directly influenced by the observations x={xj}j=1..m (they are not edge-connected in the HCRF graph 120). Depending on the problem domain, event yj+1 may be influenced by local observations xj captured within the temporal neighborhood of tj as well as by relatively more global observations. In particular, with today's advanced and accessible capturing technologies, rich spatiotemporal data may be collected and made readily available for processing by efficient computing systems. Future events are likely to be statistically dependent on these spatiotemporal data, and, therefore, the predictive capability of these data should be leveraged. Systems and methods that directly model the influence that observed spatiotemporal data have on future events are needed.
Methods known in the art have employed HMMs and CRFs for controlling autonomous cars and for Natural Language Processing (NLP) pattern recognition, for instance. In these application domains the problem space can be formulated into states that may be reliably labeled by a human to form a training dataset. As these are cooperative environments, they give rise to predictable outcomes. For example, in controlling autonomous cars the behavior of pedestrians is foreseeable (e.g. people tend to stand at the street corner while waiting for the lights to change). Likewise, in NLP, sentences are expected to consist of sentence parts (e.g. nouns, verbs, etc.). Therefore, in these domains reliable labeling of a model's states in the training phase may be achieved, and future behavior may be approximated by a Markovian assumption.
On the other hand, sporting events are non-cooperative environments. Players in a team-game exhibit continuous and adversarial behavior, and, therefore, labeling game states may be a more difficult task. Moreover, predicting future behavior is complex, as interactions among multiple factors require modeling longer term dependencies. As mentioned above, HCRF and high-order CRF models have been introduced to counter this complexity, where a-priori knowledge of the hidden-states is not required and longer-term dependencies can be incorporated, respectively. Accordingly, in the HCRF model prediction is done based on the hidden-states. This allows for capturing contextual information about the future event. To further improve prediction accuracy in a dynamic environment, such as a team-game, methods that directly condition the final prediction on the input observations as well as on the hidden states are required.
Embodiments of the invention are described with reference to the accompanying drawings.
Methods and systems for predicting a future event are provided. Embodiments of the invention disclosed herein describe future event prediction in the context of predicting the future owner of the ball in a soccer game as well as predicting the future location of the next shot in a tennis game. While particular application domains are used to describe aspects of this invention, it should be understood that the invention is not limited thereto. Those skilled in the art with access to the teachings provided herein will recognize additional modifications, applications, and embodiments within the scope thereof and additional fields in which the invention would be of significant utility.
A new model is presented herein, namely the Augmented Hidden-states Conditional Random Field (a-HCRF), that may be used for the prediction of a future event. The a-HCRF is a discriminative classifier that leverages the assumed direct interaction between a future event and observed spatiotemporal data measured at a time segment prior to the predicted event. The influence of current and past states on the future event is also factored into the proposed a-HCRF model.
The a-HCRF model disclosed herein is described in the context of labeling a future event (e.g. labeling ball possession in a soccer game and shot location in a tennis game) based on a temporal series of hidden states and associated observation measurements. A person skilled in the art will appreciate that applications of the a-HCRF model to other problem domains may be used without departing from the spirit and scope of this invention's embodiments. For example, an a-HCRF may include hidden states that correspond to points in time that are ahead of the "future event" or hidden states that correspond to points in domains other than time.
In an embodiment, the goal may be to classify a future event y, meaning to assign the most likely label to y, out of a set of possible labels Y, based on both a series of current and historical events h={hj}j=1..m and given corresponding observations x={xj}j=1..m. hj may share the same set of labels with y (i.e. hj ∈ Y) or assume membership of another set of labels (i.e. hj ∈ H, where H ≠ Y), depending on the application domain. An observation xj may include any measurements, such as an image or a sequence of video-frames. Typically, an observation is represented by a feature vector φ(xj) that compactly characterizes the raw observation data. For example, xj may represent a local observation such as a video-frame that was captured at time tj. In this case, the feature-vector φ(xj) may include positional data of objects (e.g. players/ball) as well as any descriptors that may be extracted from the objects' image in the video frame. These descriptors may measure texture, color, and shape, from which further information, such as the objects' identity, may be deduced. Notice that the feature-vector extracted from x may also include information that is more global in nature, for example, the most recent soccer game phase (e.g. passes, shots, free-kicks, corners, substitutions, etc.).
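As an illustration only, a feature-vector of the kind described above might be assembled as follows. The field names, coordinates, and phase vocabulary are hypothetical stand-ins, not the disclosure's actual feature design:

```python
import numpy as np

PHASES = ["pass", "shot", "free-kick", "corner", "substitution"]

# Hypothetical raw observation extracted from one video frame at time tj.
frame_obs = {
    "ball_xy": (52.3, 30.1),
    "players_xy": [(50.0, 28.0), (60.5, 33.2)],  # two tracked players
    "game_phase": "pass",                        # most recent global phase
}

def feature_vector(obs):
    """Concatenate positional data with a one-hot encoding of the game phase."""
    pos = list(obs["ball_xy"])
    for xy in obs["players_xy"]:
        pos.extend(xy)
    phase = [1.0 if p == obs["game_phase"] else 0.0 for p in PHASES]
    return np.array(pos + phase)
```

The resulting vector mixes local positional measurements with a more global descriptor (the game phase), mirroring the combination of local and global information discussed above.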
Similar to the HCRF, the posterior of the a-HCRF model may be specified by the expression in (4). The difference is in the formulation of the a-HCRF model's potential function Ψ(y,h,x; θ):

Ψ(y,h,x; θ)=Σj φ(x,j,ω)·θh[hj] + Σj θy[y,hj] + Σ(j,k)∈E θo[y,hj,hk] + φ(x,ω)·θp[y]/K  (9)
Thus, φ(x,j,ω) is a feature-vector computed based on the observation xj, including measurements that were recorded within a time window ω relative to tj. The a-HCRF model's parameters include: 1) parameters θh associated with the hidden states hj, 2) parameters θy associated with event y and the hidden states hj, 3) parameters θo associated with event y and a pair of edge-connected states hj and hk, and 4) parameters θp associated with event y given all observations x. Jointly, the model parameter-vector includes θ=[θh, θy, θo, θp].
It is apparent that the terms in (9) correspond to a factorization that is consistent with graph 200. Each term measures the joint compatibility of variables that are assigned to nodes connected by edges. The first term, φ(x,j,ω)·θh[hj], reflects the compatibility between hidden state hj and observation xj. The second term, θy[y,hj], reflects the compatibility between event y and hidden state hj, while the third term, θo[y,hj,hk], reflects the compatibility between event y and a pair of connected hidden states hj and hk. The last term, φ(x,ω)·θp[y]/K, reflects the compatibility between all the observations and event y, where K denotes the number of possible combinations of h.
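The four terms of (9) can be sketched numerically as follows. All dimensions and parameter values are hypothetical toy choices, φ(x,j,ω) and φ(x,ω) are replaced by random stand-in vectors, and the posterior is evaluated by brute-force enumeration over hidden-state combinations:

```python
import itertools

import numpy as np

rng = np.random.default_rng(3)
Y, H, m, d = 2, 3, 3, 5
phi_local = rng.normal(size=(m, d))    # stand-in phi(x, j, w) per time step
phi_global = rng.normal(size=d)        # stand-in phi(x, w) over all observations
th_h = rng.normal(size=(H, d))
th_y = rng.normal(size=(Y, H))
th_o = rng.normal(size=(Y, H, H))
th_p = rng.normal(size=(Y, d))
K = H ** m                             # number of hidden-state combinations

def psi_ahcrf(y, h):
    """Eq. (9): HCRF terms plus a term tying y directly to the observations."""
    s = sum(phi_local[j] @ th_h[h[j]] + th_y[y, h[j]] for j in range(m))
    s += sum(th_o[y, h[j - 1], h[j]] for j in range(1, m))
    s += (phi_global @ th_p[y]) / K
    return s

def ahcrf_posterior(y):
    """Same posterior form as (4), evaluated with the augmented potential."""
    num = sum(np.exp(psi_ahcrf(y, h))
              for h in itertools.product(range(H), repeat=m))
    den = sum(np.exp(psi_ahcrf(yp, h))
              for yp in range(Y)
              for h in itertools.product(range(H), repeat=m))
    return num / den
```

The only structural difference from the plain HCRF sketch is the final term, which lets the global observation features influence y directly rather than only through the hidden states.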
Exemplary embodiments of this invention utilize the a-HCRF model to perform prediction of future game-events, such as what player will next own the ball in a team-game such as soccer.
Hence, according to an embodiment and in reference to the a-HCRF graph 200, the hidden state hj is defined as the owner of the ball at time tj. Similarly, the hidden state hj−1 is defined as the owner of the ball at a point in time previous to tj, denoted by tj−1. The predicted event y is defined as the “future ball owner” at time tj+1 (after tj). The time steps between two successive states, tj−1 and tj, may vary, depending on the application, on the order of seconds. xj to xj−m+1 in graph 200 represent the observations and, by extension, the feature-vectors φ(x,j,ω) derived from them. Features may be extracted from data captured during a time window ω. For example, φ(x,j,ω) may represent a feature-vector that was extracted from video frames captured in a time window between tj and tj−ω.
As mentioned above, the potential function comprises products of factor functions consistent with the model's graph topology 200. Each factor function is indicative of an influence (or statistical dependency) among the participating variables (i.e. state and observation variables) it includes. In the context of predicting ball possession and with reference to (9), for example, the pairwise potential θo[y,hj,hk] may measure the tactics used in a team's passing pattern (e.g. the frequency with which a certain player passes the ball to another certain player). The potential φ(x,j,ω)·θh[hj] may measure the compatibility between a certain player and a set of features. Therefore, in embodiments of this invention, a future event y (i.e. a future owner of the ball) is influenced by previous ownerships of the ball and by observation data captured in past or current times.
Prior to employing the prediction method, the parameters of the a-HCRF predictor need to be estimated in a process known as training.
According to embodiments of this invention, a continuous segment of time wherein events (represented by the hidden states) unfold is utilized. When employed for predicting the future owner of the ball in a soccer game, a continuous segment of time wherein a team is in possession of the ball precedes the prediction of that team's upcoming (future) passing of the ball. Assuming that the a-HCRF model includes m states, as depicted in graph 200, and that δtj≡tj−tj−1, the length of this continuous segment may in general be S=δtj+δtj−1+δtj−2+ . . . +δtj−m+1 seconds, or S=m·δt seconds when δt≡δtj for all j. Hence, training of team-A's model 540 or team-B's model 560 is done based on training data extracted from continuous segments in which the ball is in team-A's possession or in team-B's possession, respectively.
Consequently, in
Following the models' construction in 545 and 565, the models' parameters, θA and θB, are estimated in steps 550 and 570 using the training datasets of team-A and team-B, respectively. As mentioned above, a training dataset comprises examples of a model's variables, {xk}k=j−m+1..j, for which the future event y is known. For instance, training sets, with respect to each team, may include N pairs of labeled data: {xi,yi}i=1..N.
Embodiments of the current invention may also be employed for predicting the location of the next tennis shot. As illustrated in
Similar to predicting the ball's ownership, predicting the location of the next shot in tennis (i.e. a future game-event) may be carried out by employing training and testing processes, as shown in
For both the soccer and tennis embodiments described above, the a-HCRF models were trained based on data captured from games in which a team (or player) of interest played against various opponent teams (or players). In adversarial sports, the behavior of the team of interest throughout the match depends on the team it plays against. In practice, though, training a probabilistic model for each pair of specific teams (or players) is challenging, as not enough data is available for training. Thus, embodiments of this invention employ model adaptation, wherein two models are combined. The first model is one that was trained using data from all games including the team (or players) of interest, namely a Generic Behavior Model (GBM). The second model is one that was trained using data from all games in which the team (or players) of interest played against a specific opposition, namely an Opposition Specific Model (OSM). The GBM and OSM models may be combined to improve the predictive capability of each model when used independently. Fusion, then, may be done at different levels. For example, the feature-vectors or the parameter-vectors of each model may be combined. Alternatively, the outputs of the GBM's and the OSM's predictors may be combined, for instance, by the linear combination:
Pcomb=w1·PGBM+w2·POSM, (10)
where wi≥0, i=1,2 and w1+w2=1. The wi values may be estimated through an optimization process wherein the optimal wi minimizes the prediction error (or maximizes the prediction rate).
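A minimal sketch of fusing the two predictors per (10), with the weight chosen by a grid search that maximizes the prediction rate on a held-out set, is shown below. The prediction values, ground-truth outcomes, and the 0.5 decision threshold are all hypothetical:

```python
import numpy as np

# Hypothetical held-out predictions: each model's probability that the
# candidate future event occurs, plus the ground-truth outcome.
p_gbm = np.array([0.7, 0.4, 0.9, 0.3, 0.6])
p_osm = np.array([0.5, 0.8, 0.6, 0.7, 0.2])
truth = np.array([1, 1, 1, 0, 1])

def prediction_rate(w1):
    """Accuracy of the combined predictor P = w1*P_GBM + (1 - w1)*P_OSM."""
    p_comb = w1 * p_gbm + (1.0 - w1) * p_osm
    return float(np.mean((p_comb > 0.5).astype(int) == truth))

# Grid-search w1 in [0, 1]; w2 = 1 - w1 is implied by the sum-to-one constraint.
grid = np.linspace(0.0, 1.0, 101)
best_w1 = float(grid[np.argmax([prediction_rate(w) for w in grid])])
```

Because the grid includes the endpoints w1=0 and w1=1, the fused predictor chosen this way performs at least as well on the held-out set as either the GBM or the OSM used alone.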
Myriad applications may benefit from the future event prediction method provided by embodiments of this invention. For example, knowledge of the next shot's location in a tennis game may be used to assist automatic steering of a measurement device (e.g. a broadcast camera). Similarly, knowing the position or identity of the next player to own the ball in a soccer game may be used to insert graphical highlights into a video stream capturing the game activities. Such highlights may include graphical overlays containing information related to the future owner of the ball (i.e. the predicted future event).
Although embodiments of this invention have been described following certain structures or methodologies, it is to be understood that embodiments of this invention defined in the appended claims are not limited by the certain structures or methodologies. Rather, the certain structures or methodologies are disclosed as exemplary implementation modes of the claimed invention. Modifications may be devised by those skilled in the art without departing from the spirit or scope of the present invention.