Method and apparatus to disambiguate state information for multiple items tracking

Description

TECHNICAL FIELD

This invention relates generally to the tracking of multiple items.

BACKGROUND

The tracking of multiple objects (such as, but not limited to, objects in a video sequence) is known in the art. Considerable interest exists in this regard as successful results find application in various use case settings, including but not limited to target identification, surveillance, video coding, and communications. The tracking of multiple objects becomes particularly challenging when objects that are similar in appearance draw close to one another or present partial or complete occlusions. In such cases, modeling the interaction amongst objects and solving the corresponding data association problem comprises a significant problem.

A widely adopted approach to this need uses a centralized solution that introduces a joint state space representation that concatenates all of the objects' states together to form a large resultant meta state. This approach provides for inferring the joint data association by characterization of all possible associations between objects and observations using any of a variety of known techniques. Though successful for many purposes, such approaches are unfortunately neither a comprehensive solution nor always a desirable approach in and of themselves.

As one example in this regard, these approaches tend to handle an error merge problem at tremendous computational cost due to the complexity inherent to the high dimensionality of the joint state representation. In general, this complexity tends to grow exponentially with respect to the number of objects being tracked. As a result, in many real world applications these approaches are simply impractical for real-time purposes.

BRIEF DESCRIPTION OF THE DRAWINGS

The above needs are at least partially met through provision of the method and apparatus to facilitate disambiguating state information for multiple items described in the following detailed description, particularly when studied in conjunction with the drawings, wherein:

FIG. 1 comprises a flow diagram as configured in accordance with various embodiments of the invention;

FIG. 2 comprises a block diagram as configured in accordance with various embodiments of the invention;

FIG. 3 comprises a model as configured in accordance with various embodiments of the invention;

FIG. 4 comprises a model as configured in accordance with various embodiments of the invention;

FIG. 5 comprises a model as configured in accordance with various embodiments of the invention; and

FIG. 6 comprises a model as configured in accordance with various embodiments of the invention.

Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions and/or relative positioning of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of various embodiments of the present invention. Also, common but well-understood elements that are useful or necessary in a commercially feasible embodiment are often not depicted in order to facilitate a less obstructed view of these various embodiments of the present invention. It will further be appreciated that certain actions and/or steps may be described or depicted in a particular order of occurrence while those skilled in the art will understand that such specificity with respect to sequence is not actually required. It will also be understood that the terms and expressions used herein have the ordinary meaning as is accorded to such terms and expressions with respect to their corresponding respective areas of inquiry and study except where specific meanings have otherwise been set forth herein.

DETAILED DESCRIPTION

Generally speaking, pursuant to these various embodiments, automatic use of a disjoint probabilistic analysis of captured temporally parsed data regarding at least a first and a second item serves to facilitate disambiguating state information as pertains to the first item from information as pertains to the second item. This can also comprise, for example, using a joint probability as pertains to the temporally parsed data for the first item and the temporally parsed data for the second item, by using, for example, a Bayesian-based probabilistic analysis of the temporally parsed data.

The latter can comprise using, if desired, a transitional probability as pertains to temporally parsed data for the first item as was captured at a first time and temporally parsed data for the first item as was captured at a second time that is different than the first time (by using, for example, a transitional probability as pertains to first state information for the first item as pertains to the first time and second state information for the first item as pertains to the second time) as well as using a transitional probability as pertains to temporally parsed data for the second item as was captured at the first time and temporally parsed data for the second item as was captured at the second time (by using, for example, a transitional probability as pertains to first state information for the second item as pertains to the first time and second state information for the second item as pertains to the second time).

This approach can further comprise, if desired, using a conditional probability as pertains to temporally parsed data for the first item and state information for the first item as well as a conditional probability as pertains to temporally parsed data for the second item and state information for the second item.

In effect, these teachings relate to providing multiple interactive trackers in a manner that extends beyond a traditional use of Bayesian tracking in a tracking structure. In particular, this approach avoids using a joint state representation that introduces high complexity and that requires corresponding high computational costs. By these teachings, as objects exhibit interaction, such interaction can be modeled in terms of potential functions. By one approach, this can comprise modeling the interactive likelihood densities by a so-called gravitational attraction versus a so-called magnetic repulsion scheme. In addition, if desired, one can approximate a 2nd-order state transition density by an ad hoc 1st-order inertia Markov chain in a unified particle filtering implementation. The proposed models represent the cumulative effect of virtual physical forces that objects undergo while interacting with one another. Those skilled in the art will recognize and appreciate that these approaches implicitly handle the error merge problems of the prior art and further serve to minimize corresponding object labeling problems.

These and other benefits may become clearer upon making a thorough review and study of the following detailed description. Referring now to the drawings, and in particular to FIG. 1, a general overall view of these teachings suggests a process 100 that provides for capturing 101 temporally parsed data regarding at least a first and a second item. These items could comprise any of a wide variety of objects including but not limited to discernable energy waves such as discrete sounds, continuous or discontinuous sound streams from multiple sources, radar images, and so forth. In many application settings, however, these items will comprise physical objects or, perhaps more precisely, images of physical objects.

This step of capturing temporally parsed data can therefore comprise, for example, providing a video stream as provided by a single data capture device of a particular scene (such as a scene of a sidewalk, an airport security line, and so forth) where various of the frames contain data (that is, images of objects) that represent samples captured at different times. Although, as noted, such data can comprise a wide variety of different kinds of objects, for the sake of simplicity and clarity the remainder of this description shall presume that the objects are images of physical objects unless stated otherwise. Those skilled in the art will recognize and understand that this convention is undertaken for the sake of illustration and is not intended as any suggestion of limitation with respect to the scope of these teachings.

This process 100 then provides for automatically using 102, at least in part, disjoint probabilistic analysis of the temporally parsed data to disambiguate state information as pertains to a first such item from information (such as, but not limited to, state information) as pertains to a second such item. Those skilled in the art will understand that this process 100 does not require use of a disjoint probabilistic analysis in this regard under all operating circumstances; in many cases such an approach will only be automatically occasioned when such items approach near (and/or impinge upon) one another. In cases where such items are further apart from one another, if desired, alternative approaches can be employed.

Generally speaking, by one approach, this probabilistic analysis can comprise using, at least in part, a Bayesian-based probabilistic analysis of the temporally parsed data. This can comprise, at least in part, using a joint probability as pertains to the temporally parsed data for the first item and the temporally parsed data for the second item. More detailed examples will be provided below in this regard.

This step can further comprise, if desired, using transitional probabilities as pertain to these items. For example, this step will accommodate using a first transitional probability as pertains to temporally parsed data (such as, but not limited to, first state information) for the first item as was captured at a first time and temporally parsed data (such as, but not limited to, second state information) for this same first item as was captured at a second time that is different than the first time. In a similar fashion, this step will accommodate using another transitional probability as pertains to temporally parsed data (such as, but not limited to, first state information) for the second item as was captured at the first time and temporally parsed data (such as, but not limited to, second state information) for this same second item as was captured at that second time.

This step will also further accommodate, if desired, effecting the aforementioned Bayesian-based probabilistic analysis of the temporally parsed data by using conditional probabilities. In particular, for example, this can comprise using a first conditional probability as pertains to temporally parsed data and state information for the first item and a second conditional probability as pertains to temporally parsed data and state information for the second item. Again, more details regarding such approaches are provided below.

Those skilled in the art will appreciate that the above-described processes are readily enabled using any of a wide variety of available and/or readily configured platforms, including partially or wholly programmable platforms as are known in the art or dedicated purpose platforms as may be desired for some applications. Referring now to FIG. 2, an illustrative approach to such a platform 200 will now be provided.

In this illustrative example, a processor 201 operably couples to a memory 202. The memory 202 serves to store the aforementioned captured temporally parsed data regarding at least a first and a second item. By one approach, this memory 202 can be operably coupled to a single image capture device 203 such as, but not limited to, a video camera that provides sequential frames of captured video content of a particular field of view.

The processor 201 is configured and arranged to effect the above-described automatic usage of a disjoint probabilistic analysis of the temporally parsed data to facilitate disambiguation of state information as pertains to the first item from information (such as, but not limited to, state information) as pertains to the second item. This can comprise some or all of the above-mentioned approaches in this regard as well as the more particular examples provided below. By one approach, this processor 201 can comprise a partially or wholly programmable platform as are known in the art. Accordingly, such a configuration can be readily achieved via programming of the processor 201 as will be well understood by those skilled in the art.

Those skilled in the art will recognize and understand that such an apparatus 200 may be comprised of a plurality of physically distinct elements as is suggested by the illustration shown in FIG. 2. It is also possible, however, to view this illustration as comprising a logical view, in which case one or more of these elements can be enabled and realized via a shared platform. It will also be understood that such a shared platform may comprise a wholly or at least partially programmable platform as are known in the art.

A more detailed presentation of a particular approach to effecting such distributed multi-object tracking by use of multiple interactive trackers will now be provided. Again, those skilled in the art will understand and appreciate that this more-detailed description is provided for the purpose of illustration and not by way of limitation with respect to the scope or reach of these teachings.

The described process uses a four-dimensional parametric ellipse to model the boundaries of visual objects. The state of an individual object is denoted here by $x_t^i = (cx_t^i, cy_t^i, a_t^i, \rho_t^i)$ where $i = 1, \ldots, M$ is the index of objects, $t$ is the time index, $(cx, cy)$ is the center of the ellipse, $a$ is the major axis, and $\rho$ is the orientation in radians. The ratio of the major and minor axes of the ellipse is held constant at the value computed during initialization in this example. This approach also denotes the image observation of $x_t^i$ by $z_t^i$, the set of all states up to time $t$ by $x_{0:t}^i$, where $x_0^i$ is a prior initialization, and the set of all observations up to time $t$ by $z_{1:t}^i$. This approach also denotes the interactive observations of $z_t^i$ at time $t$ by $z_t^{J_t}$, where $J_t = \{j_{l_1}, j_{l_2}, \ldots\}$. The elements $j_{l_1}, j_{l_2}, \ldots \in \{1, \ldots, M\}$, with $j_{l_1}, j_{l_2}, \ldots \neq i$, are the indexes of objects whose observations interact with $z_t^i$. Similarly, $z_{1:t}^{J_{1:t}}$ represents the collection of the interactive observation sets up to time $t$. Those skilled in the art will recognize that a wide variety of modifications, alterations, and combinations can be made with respect to the above described embodiments without departing from the spirit and scope of the invention, and that such modifications, alterations, and combinations are to be viewed as being within the ambit of the inventive concept.

Since the interactive relationship among observations is likely to change, $J_t$ may also differ over time. For example, in the graphical model 300 shown in FIG. 3, the interactive observation set for $z_{t-1}^2$ at time $t-1$ is $z_{t-1}^{J_{t-1}} = \{z_{t-1}^3, z_{t-1}^4\}$. At time $t$, however, $z_t^{J_t} = \{z_t^1\}$.

When multiple visual objects move close to one another or otherwise present partial or complete occlusions, it can be generally difficult for the trackers to segment and distinguish these spatially adjacent objects from image observations, as the interactive observations are not independent (note that $p(z_t^1, \ldots, z_t^M) \neq \prod_{i=1}^{M} p(z_t^i)$). As a result, one cannot reliably factorize the posteriors of different objects. This conditional dependency of objects comprises, in the view of the inventors, a significant reason why multiple independent trackers have difficulty coping with the aforementioned error merge problem as well as the object labeling problem.

By one approach, the present teachings espouse using a separate tracker for each object. In such a case, an error merge problem can occur in at least two cases. First, when two visual objects move closer or begin to present occlusion, the object with the strong observation (in the sense of a large visual image) effectively pulls the tracker of the object with the weaker observation. Second, after occlusion, when two objects move apart, their associated optical trackers often cannot detach and remain bonded while simultaneously tracking the object with the stronger observation.

In these scenarios, it may be helpful to imagine the influence of an invisible force among the interactive trackers that attracts them to merge together when objects move closer and that prevents them from separating when these objects move apart. With this in mind, by analogy, one may then imagine these effects to be associated with each tracker's “mass.” When objects are far apart, the corresponding gravitational force between their trackers is relatively weak and can be effectively ignored. Conversely, when such objects are adjacent or occluded, this attractive force becomes relatively strong. This imaginary construct permits an interesting application of Newton's Laws.

By Newton's Third Law, the relative forces between two such trackers will remain equal. At the same time, however, Newton's Second Law would hold that trackers corresponding to different masses will have corresponding different accelerations. As a result, after several frames of captured data, the tracker having a smaller mass (which will correlate to a larger acceleration) will be attracted to merge with the object having the larger mass (i.e., the larger observation which correlates to a small acceleration) and thus error merge will likely occur. To resist the excessive attraction that is viewed as occurring, in this analogical example, a repulsive force can be introduced between these interacting trackers.

In particular, when objects move closer, a repulsive force can be introduced and used to prevent the trackers from falsely merging. As the objects move away, this repulsive force can also help the trackers to detach from one another. As will be demonstrated below, another analogy can be introduced to facilitate the introduction of such a repulsive force: magnetic field theory.

Referring again to FIG. 3, the illustrated dynamic graphical model 300 is shown as depicting two consecutive frames 301 and 302 for multiple objects with interactive observations. Two layers are shown. A so-called hidden layer is noted with circle nodes that represent the states of objects $x^i$. A counterpart so-called observable layer represents the observations $z^i$ that are associated with the hidden states. A directed link between consecutive states associated with a same object represents the state transition density, which comprises a Markov chain. Here, however, the illustrated example relaxes the usual 1st-order Markov chain assumption of regular Bayesian tracking approaches and allows instead higher-order Markov chains for generality.

The directed link from object $x^i$ to its observation $z^i$ represents a generative relationship and can be characterized by the local observation likelihood $p(z^i \mid x^i)$. The undirected link between observation nodes represents the interaction itself. The structure of the observation layer at each time depends on the spatial relationships among observations for the objects. That is, when observations for two or more visual objects are sufficiently close or lead to occlusion, an undirected link between them is constructed to represent that dependency event.
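As a minimal sketch of how the undirected links of the observation layer might be rebuilt for each frame, the following Python snippet links two observations whenever their centers lie within a threshold distance. The function name `interactive_sets`, the center-distance test, and the threshold `max_dist` are illustrative assumptions; the description above only requires that the observations be sufficiently close or lead to occlusion.

```python
import math


def interactive_sets(centers, max_dist):
    """Return, for each object index, the indexes of objects it interacts with.

    centers maps an object index to the (cx, cy) center of its observation.
    Using center distance as the closeness test is an assumption made for
    this sketch; an overlap test on the ellipses would serve equally well.
    """
    links = {i: set() for i in centers}
    ids = sorted(centers)
    for pos, i in enumerate(ids):
        for j in ids[pos + 1:]:
            if math.dist(centers[i], centers[j]) <= max_dist:
                # An undirected link is stored in both directions.
                links[i].add(j)
                links[j].add(i)
    return links
```

Each value of the returned mapping plays the role of the index set $J_t$ for the corresponding object, and it would be recomputed at every time step because the spatial relationships change.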

Those skilled in the art will note that the graphical model 300 illustrated in FIG. 3 can lead to complicated analysis. Therefore, if desired, this graphical model for M objects can be further decomposed into M submodels using three rules. Rule 1: each submodel focuses on only one object. Rule 2: only the interactive observations that have direct links to the analyzed object's observation are kept, with noninteractive observations and all other objects' state nodes being removed. Rule 3: each undirected link between two interactive observations is decomposed into two different directed links (with each direction running from the other object's observation to the analyzed object's observation).
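The three decomposition rules can be sketched as follows. This is only an illustration under stated assumptions: the function name `decompose`, the dictionary layout of a submodel, and the example link set are hypothetical and are not taken from the figures.

```python
def decompose(links):
    """Split an undirected interaction graph into one submodel per object.

    links maps an object index to the set of indexes whose observations
    are directly linked to that object's observation.
    """
    submodels = {}
    for i, neighbours in links.items():
        kept = sorted(neighbours)
        submodels[i] = {
            # Rule 1: the submodel focuses on a single object.
            "object": i,
            # Rule 2: keep only directly linked interactive observations.
            "interactive_observations": kept,
            # Rule 3: each undirected link becomes a directed link that
            # points from the other observation to the analyzed one.
            "directed_links": [(j, i) for j in kept],
        }
    return submodels


# Hypothetical link set in which object 2 interacts with objects 3 and 4.
links = {1: set(), 2: {3, 4}, 3: {2}, 4: {2}}
submodels = decompose(links)
```

Because every undirected link appears once in each of the two submodels it touches, the set of submodels taken together keeps all of the links of the original graph.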

FIG. 4 illustrates an exemplary part of such decomposition rules as applied to the model shown in FIG. 3 for object 3 401 and object 4 402. Those skilled in the art will note that such an approach neglects the temporal state correlation of certain interactive observations $z^j$ when considering object $i$, but such information in fact is taken into account when considering object $j$. Therefore, when running all of the trackers simultaneously, the decomposed submodels together are able to retain all the information (regarding nodes and links) from the original model. For many purposes this can comprise a powerful and useful simplification.

By one approach these decomposed graphs all comprise directed acyclic independence graphs as are known in the art. By then applying the separation theorem to the associated moral graphs (where again both such notions are well known in the art) one then obtains the corresponding Markov properties (namely, the conditional independence of the decomposed graphs).

To model the density propagation for each object, one may then estimate the posterior based on all of the involved observations, $p(x_{0:t}^i \mid z_{1:t}^i, z_{1:t}^{J_{1:t}})$. In such a case, the resultant formulation will be seen and understood to be consistent with a typical Bayesian tracker.

The density propagation for each interactive tracker can be formulated as:

$\begin{aligned} p(x_{0:t}^{i} \mid z_{1:t}^{i}, z_{1:t}^{J_{1:t}}) &= \frac{p(z_{t}^{i} \mid x_{0:t}^{i}, z_{1:t-1}^{i}, z_{1:t}^{J_{1:t}}) \, p(x_{0:t}^{i}, z_{1:t-1}^{i}, z_{1:t}^{J_{1:t}})}{p(z_{1:t}^{i}, z_{1:t}^{J_{1:t}})} \\ &= \frac{p(z_{t}^{i} \mid x_{0:t}^{i}, z_{1:t-1}^{i}, z_{1:t}^{J_{1:t}}) \, p(x_{0:t}^{i} \mid z_{1:t-1}^{i}, z_{1:t}^{J_{1:t}})}{p(z_{t}^{i} \mid z_{1:t-1}^{i}, z_{1:t}^{J_{1:t}})} \\ &= \frac{p(z_{t}^{i} \mid x_{t}^{i}, z_{t}^{J_{t}}) \, p(x_{0:t}^{i} \mid z_{1:t-1}^{i}, z_{1:t}^{J_{1:t}})}{p(z_{t}^{i} \mid z_{1:t-1}^{i}, z_{1:t}^{J_{1:t}})} \end{aligned} \qquad (1)$

Equation 1 uses the conditional independence property $p(z_t^i \mid x_{0:t}^i, z_{1:t-1}^i, z_{1:t}^{J_{1:t}}) = p(z_t^i \mid x_t^i, z_t^{J_t})$. Here, $p(z_t^i \mid x_t^i, z_t^{J_t})$ represents the interactive likelihood while $p(x_{0:t}^i \mid z_{1:t-1}^i, z_{1:t}^{J_{1:t}})$ represents the interactive prior density. These two densities can be further developed as follows.

The interactive likelihood can be expressed as shown in equation 2:

$p(z_{t}^{i} \mid x_{t}^{i}, z_{t}^{J_{t}}) = p(z_{t}^{i} \mid x_{t}^{i}) \, \frac{p(z_{t}^{J_{t}} \mid x_{t}^{i}, z_{t}^{i})}{p(z_{t}^{J_{t}} \mid x_{t}^{i})} . \qquad (2)$

The local likelihood $p(z_t^i \mid x_t^i)$ characterizes the so-called gravitational force between interactive observations.

The interactive prior density of $x_{0:t}^i$ can be expressed as shown below in equations 3 and 4:

$\begin{aligned} p(x_{0:t}^{i} \mid z_{1:t-1}^{i}, z_{1:t}^{J_{1:t}}) &= \frac{p(x_{t}^{i}, z_{t}^{J_{t}} \mid x_{0:t-1}^{i}, z_{1:t-1}^{i}, z_{1:t-1}^{J_{1:t-1}})}{p(z_{t}^{J_{t}} \mid z_{1:t-1}^{i}, z_{1:t-1}^{J_{1:t-1}})} \, p(x_{0:t-1}^{i} \mid z_{1:t-1}^{i}, z_{1:t-1}^{J_{1:t-1}}) \\ &= \frac{p(x_{t}^{i}, z_{t}^{J_{t}} \mid x_{0:t-1}^{i})}{p(z_{t}^{J_{t}} \mid z_{1:t-1}^{i}, z_{1:t-1}^{J_{1:t-1}})} \, p(x_{0:t-1}^{i} \mid z_{1:t-1}^{i}, z_{1:t-1}^{J_{1:t-1}}) && (3) \\ &= \frac{p(z_{t}^{J_{t}} \mid x_{t}^{i}, x_{0:t-1}^{i})}{p(z_{t}^{J_{t}} \mid z_{1:t-1}^{i}, z_{1:t-1}^{J_{1:t-1}})} \, p(x_{t}^{i} \mid x_{0:t-1}^{i}) \, p(x_{0:t-1}^{i} \mid z_{1:t-1}^{i}, z_{1:t-1}^{J_{1:t-1}}) \\ &= \frac{p(z_{t}^{J_{t}} \mid x_{t}^{i})}{p(z_{t}^{J_{t}} \mid z_{1:t-1}^{i}, z_{1:t-1}^{J_{1:t-1}})} \, p(x_{t}^{i} \mid x_{0:t-1}^{i}) \, p(x_{0:t-1}^{i} \mid z_{1:t-1}^{i}, z_{1:t-1}^{J_{1:t-1}}) . && (4) \end{aligned}$

In equation 3 the conditional independence property $p(x_t^i, z_t^{J_t} \mid x_{0:t-1}^i, z_{1:t-1}^i, z_{1:t-1}^{J_{1:t-1}}) = p(x_t^i, z_t^{J_t} \mid x_{0:t-1}^i)$ has been used. Equation 4 uses the property that $p(z_t^{J_t} \mid x_t^i, x_{0:t-1}^i) = p(z_t^{J_t} \mid x_t^i)$.

By substituting equations 2 and 4 back into equation 1 and then rearranging the order, one obtains:

$\begin{aligned} p(x_{0:t}^{i} \mid z_{1:t}^{i}, z_{1:t}^{J_{1:t}}) &= p(z_{t}^{i} \mid x_{t}^{i}) \, p(x_{t}^{i} \mid x_{0:t-1}^{i}) \, p(x_{0:t-1}^{i} \mid z_{1:t-1}^{i}, z_{1:t-1}^{J_{1:t-1}}) \, p(z_{t}^{J_{t}} \mid x_{t}^{i}, z_{t}^{i}) \\ &\quad \cdot \frac{1}{p(z_{t}^{i} \mid z_{1:t-1}^{i}, z_{1:t}^{J_{1:t}}) \, p(z_{t}^{J_{t}} \mid z_{1:t-1}^{i}, z_{1:t-1}^{J_{1:t-1}})} && (5) \\ &= k_{t} \, p(z_{t}^{i} \mid x_{t}^{i}) \, p(x_{t}^{i} \mid x_{0:t-1}^{i}) \, p(x_{0:t-1}^{i} \mid z_{1:t-1}^{i}, z_{1:t-1}^{J_{1:t-1}}) \\ &\quad \cdot p(z_{t}^{J_{t}} \mid x_{t}^{i}, z_{t}^{i}) . && (6) \end{aligned}$

The densities in the denominator of equation 5 do not depend on $x^i$, and thus the fraction in the second line of equation 5 becomes a normalization constant $k_t$. In equation 6, $p(z_t^i \mid x_t^i)$ is the local likelihood, and $p(x_t^i \mid x_{0:t-1}^i)$ is the state transition density. By the present teachings one introduces a new density $p(z_t^{J_t} \mid x_t^i, z_t^i)$, referred to here as an interactive function, to characterize the interaction among the objects' observations. When the interaction among the objects' observations is not activated, this formulation reduces to multiple independent particle filters. This can easily be achieved by switching $p(z_t^{J_t} \mid x_t^i, z_t^i)$ to a uniform distribution.

To estimate the posterior derived in the preceding, different density estimation methods (such as the Gaussian mixture model, kernel density estimation, and so forth) can be applied. By one approach a sequential importance sampling method as is known in the art can provide a useful paradigm. Let $\{x_{0:t}^{i,n}, w_t^{i,n}\}_{n=1}^{N_s}$ denote a random measure that characterizes the posterior density $p(x_{0:t}^i \mid z_{1:t}^i, z_{1:t}^{J_{1:t}})$, where $\{x_{0:t}^{i,n}, n = 1, \ldots, N_s\}$ is a set of support particles with associated weights $\{w_t^{i,n}, n = 1, \ldots, N_s\}$. In this example the weights are normalized so that $\sum_n w_t^{i,n} = 1$. The posterior density at $t$ can then be approximated as shown in equation 7:

$p(x_{0:t}^{i} \mid z_{1:t}^{i}, z_{1:t}^{J_{1:t}}) \approx \sum_{n=1}^{N_{s}} w_{t}^{i,n} \, \delta(x_{0:t}^{i} - x_{0:t}^{i,n}) \qquad (7)$

where $\delta(\cdot)$ is the Dirac delta function.
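One practical consequence of the weighted-particle form of equation 7 is that an expectation under the posterior becomes a weighted sum over the particles. The following Python sketch computes a point estimate of the current state in that way. The function names `normalize` and `weighted_mean` are illustrative, and representing each particle's current state as a plain tuple of numbers is an assumption made for brevity.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def weighted_mean(particles, weights):
    """Return the weighted mean of equally sized state tuples."""
    w = normalize(weights)
    dim = len(particles[0])
    return tuple(
        sum(wn * p[d] for wn, p in zip(w, particles)) for d in range(dim)
    )


# Two particles with unnormalized weights 1 and 3.
estimate = weighted_mean([(0.0, 0.0), (2.0, 4.0)], [1.0, 3.0])
```

A plain mean is only a reasonable summary for components such as the center coordinates; an angular component such as the orientation would need a circular mean instead.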

This results in a discrete weighted approximation to the true posterior density $p(x_{0:t}^i \mid z_{1:t}^i, z_{1:t}^{J_{1:t}})$. The weights can be chosen according to known importance sampling theory. When the particles $x_{0:t}^{i,n}$ are drawn from an importance density $q(x_{0:t}^i \mid z_{1:t}^i, z_{1:t}^{J_{1:t}})$, then the corresponding weights in equation 7 can be represented as shown in equation 8:

$w_{t}^{i,n} \propto \frac{p(x_{0:t}^{i,n} \mid z_{1:t}^{i}, z_{1:t}^{J_{1:t}})}{q(x_{0:t}^{i,n} \mid z_{1:t}^{i}, z_{1:t}^{J_{1:t}})} \qquad (8)$

In the sequential case, one could have particles constituting an approximation to $p(x_{0:t-1}^{i,n} \mid z_{1:t-1}^i, z_{1:t-1}^{J_{1:t-1}})$ and then need to approximate $p(x_{0:t}^{i,n} \mid z_{1:t}^i, z_{1:t}^{J_{1:t}})$ with a new set of particles at each iteration. The importance density can be chosen to factorize as shown in equation 9:

$q(x_{0:t}^{i,n} \mid z_{1:t}^{i}, z_{1:t}^{J_{1:t}}) = q(x_{t}^{i,n} \mid x_{0:t-1}^{i,n}, z_{1:t}^{i}, z_{1:t}^{J_{1:t}}) \, q(x_{0:t-1}^{i,n} \mid z_{1:t-1}^{i}, z_{1:t-1}^{J_{1:t-1}}) . \qquad (9)$

One can then obtain particles $x_{0:t}^{i,n} \sim q(x_{0:t}^{i,n} \mid z_{1:t}^i, z_{1:t}^{J_{1:t}})$ by augmenting each of the existing particles $x_{0:t-1}^{i,n} \sim q(x_{0:t-1}^{i,n} \mid z_{1:t-1}^i, z_{1:t-1}^{J_{1:t-1}})$ with the new state $x_t^{i,n} \sim q(x_t^{i,n} \mid x_{0:t-1}^{i,n}, z_{1:t}^i, z_{1:t}^{J_{1:t}})$. By substituting equations 6 and 9 into equation 8, the weight updating rule can be shown to be as illustrated in equation 10:

$w_{t}^{i,n} \propto w_{t-1}^{i,n} \, \frac{p(z_{t}^{i} \mid x_{t}^{i,n}) \, p(x_{t}^{i,n} \mid x_{0:t-1}^{i,n}) \, p(z_{t}^{J_{t}} \mid x_{t}^{i,n}, z_{t}^{i})}{q(x_{t}^{i,n} \mid x_{0:t-1}^{i,n}, z_{1:t}^{i}, z_{1:t}^{J_{1:t}})} . \qquad (10)$
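The augmentation of each particle path and the weight update of equation 10 can be sketched together as one sequential importance sampling step. This is a sketch under stated assumptions rather than a definitive implementation: the function `sis_step` and its callable parameters (`propose`, `proposal_pdf`, `transition_pdf`, `local_lik`, `interactive`) are hypothetical names, and the densities are supplied by the caller because the description does not fix their form.

```python
def sis_step(paths, weights, propose, proposal_pdf, transition_pdf,
             local_lik, interactive):
    """Extend every particle path by one state and update its weight.

    paths          list of particle paths (each a list of states)
    weights        previous normalized weights, one per path
    propose        draws a new state given a path
    proposal_pdf   evaluates the importance density q(x_new | path)
    transition_pdf evaluates the state transition density p(x_new | path)
    local_lik      evaluates the local likelihood p(z | x_new)
    interactive    evaluates the interactive function at x_new
    """
    new_paths, raw = [], []
    for path, w in zip(paths, weights):
        x_new = propose(path)
        q = proposal_pdf(x_new, path)
        # Weight update in the form of equation 10, before normalization.
        raw.append(
            w * local_lik(x_new) * transition_pdf(x_new, path)
            * interactive(x_new) / q
        )
        new_paths.append(path + [x_new])
    total = sum(raw)
    return new_paths, [r / total for r in raw]
```

When the proposal is chosen equal to the transition density, the two terms cancel and each weight is simply scaled by the product of the local likelihood and the interactive function. Passing an `interactive` callable that always returns the same constant gives the independent-tracker behavior.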

For most application purposes, only $x_t^{i,n}$, $x_{t-1}^{i,n}$, and $x_{t-2}^{i,n}$ need to be stored, and one can effectively disregard the path $x_{0:t-3}^{i,n}$ and the history of observations $z_{1:t-1}$. By this approach the modified weight becomes as shown in equation 11:

$w_{t}^{i,n} \propto w_{t-1}^{i,n} \, \frac{p(z_{t}^{i} \mid x_{t}^{i,n}) \, p(x_{t}^{i,n} \mid x_{t-1}^{i,n}, x_{t-2}^{i,n}) \, p(z_{t}^{J_{t}} \mid x_{t}^{i,n}, z_{t}^{i})}{q(x_{t}^{i,n} \mid x_{t-1}^{i,n}, z_{t}^{i}, z_{1:t}^{J_{1:t}})} . \qquad (11)$
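The reduced storage requirement can be illustrated with a bounded buffer that keeps only the three most recent states of a particle. The snippet is a minimal sketch; the helper name `new_history` and the use of scalar states are assumptions made for illustration.

```python
from collections import deque


def new_history():
    """Create a per-particle buffer holding at most three states."""
    return deque(maxlen=3)


history = new_history()
for state in [0.0, 1.0, 2.0, 3.0, 4.0]:
    # Appending to a full buffer silently discards the oldest state.
    history.append(state)

oldest, previous, current = history
```

With this layout the memory used per particle stays constant regardless of how many frames have been processed.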

As mentioned above, it becomes useful to introduce a so-called repulsion force to resist excessive attraction among the interactive observations, and magnetic field theory provides an analogy to facilitate the description of this force. Consider, for the purposes of example and explanation, a simple case where $z_t^{J_t} = \{z_t^j\}$ and where the two objects $i$ and $j$ are two magnetic monopoles having the same polarity. Since each object generates an observation while the corresponding magnet produces a magnetic field, the observations are analogous to the magnetic fields. Such assumptions are in fact consistent with the earlier assumptions made with respect to the graphical model. That is, different objects' states (here, the magnets) at a given time are independent, while they interact with each other only through their observations (here, the magnetic fields).

In this analogy the local likelihood $p(z_t^i \mid x_t^i)$ only characterizes the intensity of the corresponding local magnetic field while the interactive function $p(z_t^{J_t} \mid x_t^i, z_t^i)$ represents the mutual repulsion between two magnetic fields. This constitutes a useful analogy to the concept of potential difference in magnetic theory that is related to the distance between two points in repulsive magnetic fields. In particular, when the distance is small the repulsion is strong and vice versa. Therefore, as a specific example, for each particle $x_t^{i,n}$ one can calculate a magnetic repulsion weight defined as shown in equation 12:

$\phi_{t}^{i,n}(z_{t}^{J_{t}}, z_{t}^{i} \mid x_{t}^{i,n}) = 1 - \frac{1}{\alpha_{1}} \exp\left\{-\frac{d_{i,n,t}^{2}}{\sigma_{1}^{2}}\right\} \qquad (12)$

where $\alpha_1$ is a normalization constant, $\sigma_1$ is a prior constant that characterizes the allowable maximal interaction distance, and $d_{i,n,t}$ is the distance between the current particle's observation and the interactive observation $z_t^j$. This distance can be, for example, the Euclidean distance $d_{i,n,t} = \lVert z_t^j - (z_t^i \mid x_t^{i,n}) \rVert$. For some practical purposes it can be acceptable, for simplicity, to use the reciprocal of the area of the objects' overlapping region to represent this distance and also to set $\alpha_1 = 1$ and to choose $\sigma_1$ in the range $10/A_o \sim 50/A_o$, where $A_o$ is the average area of objects (ellipses) in the initial frame. In such a case the interactive function can be approximately estimated as shown in equation 13:

$p(z_{t}^{J_{t}} \mid x_{t}^{i}, z_{t}^{i}) = \phi_{t}^{i}(\cdot) \approx \sum_{n=1}^{N_{s}} \frac{\phi_{t}^{i,n}}{\sum_{n'=1}^{N_{s}} \phi_{t}^{i,n'}} \, \delta(x_{t}^{i} - x_{t}^{i,n}) \qquad (13)$
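Equations 12 and 13 can be sketched in a few lines of Python. The function names `repulsion_weight` and `interactive_function` and the default parameter values are illustrative assumptions; the fallback to uniform weights when every raw weight is zero is also an assumption, added only so that the sketch never divides by zero.

```python
import math


def repulsion_weight(d, alpha=1.0, sigma=1.0):
    """Magnetic repulsion weight in the form of equation 12.

    d is the distance between a particle's observation and the
    interactive observation. With alpha >= 1 the result lies in [0, 1).
    """
    return 1.0 - (1.0 / alpha) * math.exp(-(d * d) / (sigma * sigma))


def interactive_function(distances, alpha=1.0, sigma=1.0):
    """Normalized per-particle repulsion weights, as in equation 13."""
    phis = [repulsion_weight(d, alpha, sigma) for d in distances]
    total = sum(phis)
    if total == 0.0:
        # Assumption: fall back to uniform weights if all are zero.
        return [1.0 / len(phis)] * len(phis)
    return [phi / total for phi in phis]


weights = interactive_function([0.0, 1.0, 2.0])
```

A particle whose observation coincides with the interactive observation receives zero weight, and the weight grows toward its maximum as the distance increases, which is the repulsive behavior described above.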

By one approach it can be useful to recursively locate the interactive observations and iterate the repulsion process to reach a relatively stable state. FIG. 5 illustrates one half of one repulsion iteration cycle 500. In this example the subscript k=1, . . . , K represents the iteration time. In the illustration the dashed ellipses represent the particles while the solid ellipses represent the temporary estimates of the objects' observations. At the beginning of iterating at time t, one can first roughly estimate the observations' regions ẑ_t,0ⁱ and ẑ_t,0^J^t using two independent trackers. When they have an overlapping area, one can determine that they are interacting and then trigger this recursive estimation. Subsequently, each particle's observation of object i, z_t,kⁱ|x_t,k^i,n, is repelled by the temporary estimate ẑ_t,k^j by calculating the here-styled magnetic repulsion weight. The weighted mean of all the particles can serve to specify the new temporary estimate of object i's observation ẑ_t,kⁱ. Then, one can similarly calculate the here-styled magnetic repulsion weight for object j's particles and thus estimate ẑ_t,k^j,n to complete one iteration cycle.
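The iteration just described can be sketched as follows. This is a simplified illustration only: observations are reduced to points, each particle's weight is its repulsion weight alone (the local likelihood is omitted), the overlap test that triggers the recursion is not modeled, and a fixed iteration count stands in for a stability check. All names and parameter values are hypothetical.

```python
import math

def repulsion_weight(distance, alpha=1.0, sigma=1.0):
    """Magnetic repulsion weight of equation 12."""
    return 1.0 - (1.0 / alpha) * math.exp(-(distance ** 2) / (sigma ** 2))

def repelled_estimate(particles, other_estimate, sigma):
    """Weighted mean of one object's particles, each weighted by its
    repulsion from the other object's temporary estimate."""
    weights = [repulsion_weight(math.dist(p, other_estimate), sigma=sigma)
               for p in particles]
    total = sum(weights)
    if total == 0.0:
        # Assumption: degenerate case, use the unweighted mean.
        weights, total = [1.0] * len(particles), float(len(particles))
    dims = len(particles[0])
    return tuple(sum(w * p[d] for w, p in zip(weights, particles)) / total
                 for d in range(dims))

def iterate_repulsion(particles_i, particles_j, est_i, est_j, sigma, iterations=4):
    """Alternate the two half-cycles for a fixed number of iterations."""
    for _ in range(iterations):
        est_i = repelled_estimate(particles_i, est_j, sigma)  # first half-cycle
        est_j = repelled_estimate(particles_j, est_i, sigma)  # second half-cycle
    return est_i, est_j

# Usage: two particle clouds that touch at x = 2 are pushed apart.
cloud_i = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
cloud_j = [(2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
est_i, est_j = iterate_repulsion(cloud_i, cloud_j, (1.0, 0.0), (3.0, 0.0), sigma=2.0)
```

Each half-cycle moves one object's temporary estimate away from the other's, so the two estimates end up farther apart than the unweighted means of their particle clouds.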

When z_tⁱ has two interactive observations z_t^J^t={z_t^J¹, z_t^J²}, it should be repelled by both of them simultaneously. This, in turn, can lead to revising equation 12 to be:

$\begin{matrix} ϕ_{t}^{i, n} (\cdot) = \left( 1 - \frac{1}{α_{11}} \exp \left( - \frac{d_{i, j_{1}, n, t}^{2}}{σ_{11}^{2}} \right) \right) \left( 1 - \frac{1}{α_{12}} \exp \left( - \frac{d_{i, j_{2}, n, t}^{2}}{σ_{12}^{2}} \right) \right) & (14) \end{matrix}$

where α₁₁ and α₁₂ are normalization constants, σ₁₁ and σ₁₂ are again prior constants, and d_i,j1,n,t and d_i,j2,n,t are the distances between the current particle's observation z_tⁱ|x_t^i,n and the other interactive observations z_t,k^j1 and z_t,k^j2, respectively. For some application purposes it can be acceptable to set α₁₁=α₁₂=1 and choose σ₁₁ and σ₁₂=10/A_o~50/A_o, where A_o is the average area of objects (ellipses) in the initial frame.
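Equation 14 is a product of one repulsion factor per interactive observation. A short sketch of that product follows, written for any number of interactive observations; the generalization beyond two, the point representation of observations, and the function name are assumptions made for illustration.

```python
import math

def multi_repulsion_weight(particle_obs, interactive_obs, alphas, sigmas):
    """Product of repulsion factors, one per interactive observation.

    With two interactive observations this is the form of equation 14.
    """
    weight = 1.0
    for obs, alpha, sigma in zip(interactive_obs, alphas, sigmas):
        d = math.dist(particle_obs, obs)
        weight *= 1.0 - (1.0 / alpha) * math.exp(-(d ** 2) / (sigma ** 2))
    return weight

# Usage: a particle between two neighbours is penalized by both of them.
w = multi_repulsion_weight((2.0, 0.0), [(0.0, 0.0), (5.0, 0.0)], [1.0, 1.0], [2.0, 2.0])
```

Because the factors multiply, a particle that lies on any one interactive observation receives zero weight (for a normalization constant of 1) regardless of its distance to the others.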

By leveraging this magnetic potential model, the interactive function p(z_t^J^t|x_tⁱ, z_tⁱ) reduces the probability that object estimates will occupy the same position in the feature space. In a sense, it may be helpful to regard this use of gravitational attraction versus magnetic repulsion as a competitive exclusion principle. By using the above-described magnetic potential model to estimate the interactive function, a given tracker can successfully separate the image observations in occlusion and thus solve the error merge problem. It is possible, however, for the mutual repulsion techniques described to lead to false object labeling (particularly following severe occlusion). If desired, then, these teachings may further accommodate use of an inertia potential model to address this issue.

By one approach, an ad hoc 1st-order inertia Markov chain can serve to estimate the 2nd-order state transition density p(x_tⁱ|x_t-1ⁱ, x_t-2ⁱ) and solve the aforementioned object labeling problem with considerably reduced computational cost. This approach is exemplified in equation 15 as follows:

$\begin{matrix} \begin{matrix} p (x_{t}^{i} \mid x_{t - 1}^{i}, x_{t - 2}^{i}) = p (x_{t}^{i} \mid x_{t - 1}^{i}) \frac{p (x_{t - 2}^{i} \mid x_{t}^{i}, x_{t - 1}^{i})}{p (x_{t - 2}^{i} \mid x_{t - 1}^{i})} \\ = p (x_{t}^{i} \mid x_{t - 1}^{i}) φ_{t}^{i} (x_{t}^{i}, x_{t - 1}^{i}, x_{t - 2}^{i}) \end{matrix} & (15) \end{matrix}$

where the state transition density p(x_tⁱ|x_t-1ⁱ) can be modeled by a 1st-order Markov chain as usual in a typical Bayesian tracking method. This can be estimated by either a constant acceleration model or by a Gaussian random walk model. φ_tⁱ(.) comprises an inertia function and relates with two posteriors.
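For reference, a factorization of this kind follows from the definition of conditional probability and the product rule. The intermediate step below is a standard identity and is supplied here only as an aid to the reader:

$\begin{aligned} p (x_{t}^{i} \mid x_{t - 1}^{i}, x_{t - 2}^{i}) &= \frac{p (x_{t}^{i}, x_{t - 2}^{i} \mid x_{t - 1}^{i})}{p (x_{t - 2}^{i} \mid x_{t - 1}^{i})} \\ &= p (x_{t}^{i} \mid x_{t - 1}^{i}) \frac{p (x_{t - 2}^{i} \mid x_{t}^{i}, x_{t - 1}^{i})}{p (x_{t - 2}^{i} \mid x_{t - 1}^{i})} \end{aligned}$

Read this way, the inertia function is the ratio of two conditional densities of x_t-2ⁱ, which is one way to understand the remark that it relates with two posteriors.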

FIG. 6 illustrates a corresponding analysis 600 of object i's motion in three consecutive frames where shadow ellipses represent the states and dashed line ellipses represent the particles. The illustrated motion vector comprises a reference motion vector from x_t-2ⁱ to x_t-1ⁱ. By shifting the motion vector along its direction, one can establish the inertia state x̂_tⁱ and its inertia motion vector for the current frame. Even if there are external forces present, so long as the frame rate is sufficiently high one can assume that x_tⁱ is not too distant from x̂_tⁱ. Note also that x_t^i,n1 and x_t^i,n2 are particles of state x_tⁱ.

The inertia weights are defined as shown below in equation 16:

$\begin{matrix} φ_{t}^{i, n} (x_{t}^{i, n}, x_{t - 1}^{i}, x_{t - 2}^{i}) \propto \frac{1}{α_{2}} \exp \left( - \frac{{(θ_{t, n}^{i})}^{2}}{σ_{21}^{2}} \right) \cdot \exp \left( - \frac{{( \| {\vec{v}}_{t}^{i, n} \| - \| {\hat{\vec{v}}}_{t}^{i} \| )}^{2}}{σ_{22}^{2}} \right) & (16) \end{matrix}$

where α₂ is a normalization term and σ₂₁ and σ₂₂ are prior constants that characterize the allowable variances of a motion vector's direction and speed respectively. In equation 16, ${\vec{v}}_{t}^{i, n} = x_{t}^{i, n} - x_{t - 1}^{i}$ and ${\hat{\vec{v}}}_{t}^{i} = x_{t - 1}^{i} - x_{t - 2}^{i}$; $θ_{t, n}^{i} = \angle ({\vec{v}}_{t}^{i, n}, {\hat{\vec{v}}}_{t}^{i})$ is the angle between ${\vec{v}}_{t}^{i, n}$ and ${\hat{\vec{v}}}_{t}^{i}$. The norms $\| {\vec{v}}_{t}^{i, n} \|$ and $\| {\hat{\vec{v}}}_{t}^{i} \|$ are the Euclidean metrics. Accordingly, the inertia function can be approximated as shown in equation 17 below:

$\begin{matrix} φ_{t}^{i} (x_{t}^{i}, x_{t - 1}^{i}, x_{t - 2}^{i}) \approx \sum_{n = 1}^{N_{i}} \frac{φ_{t}^{i, n}}{\sum_{n^{'} = 1}^{N_{i}} φ_{t}^{i, n^{'}}} δ (x_{t}^{i} - x_{t}^{i, n}) & (17) \end{matrix}$
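The inertia weights of equations 16 and 17 can be sketched as below. States are treated as 2-D positions, the angle is taken as zero when either motion vector has zero length, and a uniform fallback is used if all weights underflow to zero; these choices, along with the function and parameter names, are assumptions for illustration only.

```python
import math

def inertia_weight(particle, prev_state, prev_prev_state,
                   sigma_dir, sigma_speed, alpha=1.0):
    """Equation 16: penalize deviation in direction and in speed from the
    reference motion vector (prev_state - prev_prev_state)."""
    v = tuple(a - b for a, b in zip(particle, prev_state))
    v_ref = tuple(a - b for a, b in zip(prev_state, prev_prev_state))
    speed, speed_ref = math.hypot(*v), math.hypot(*v_ref)
    if speed == 0.0 or speed_ref == 0.0:
        theta = 0.0  # assumption: no direction penalty for a zero-length vector
    else:
        cos_theta = sum(a * b for a, b in zip(v, v_ref)) / (speed * speed_ref)
        theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp rounding error
    direction_term = math.exp(-(theta ** 2) / (sigma_dir ** 2))
    speed_term = math.exp(-((speed - speed_ref) ** 2) / (sigma_speed ** 2))
    return (1.0 / alpha) * direction_term * speed_term

def normalized_inertia_weights(particles, prev_state, prev_prev_state,
                               sigma_dir, sigma_speed):
    """Equation 17: inertia weights scaled to sum to one over the particles."""
    raw = [inertia_weight(p, prev_state, prev_prev_state, sigma_dir, sigma_speed)
           for p in particles]
    total = sum(raw)
    if total == 0.0:
        return [1.0 / len(raw)] * len(raw)  # assumption: uniform fallback
    return [w / total for w in raw]

# Usage: the object moved from (0, 0) to (1, 0), so the inertia state is (2, 0).
weights = normalized_inertia_weights([(2.0, 0.0), (1.0, 1.0), (0.0, 0.0)],
                                     (1.0, 0.0), (0.0, 0.0), 1.0, 1.0)
```

The particle that continues the previous motion unchanged receives the largest weight, while a particle that reverses direction is penalized most heavily.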

The prior art has leveraged other image cues such as gradient, color, and motion in order to estimate a local observation likelihood. Here, if desired, one can combine existing color histogram models and a principal component analysis (PCA)-based model to efficiently estimate the local likelihood as exemplified by equation 18:

p(z_tⁱ|x_tⁱ)=p_c·p_p. (18)

where p_c and p_p are the likelihood densities estimated by the color histogram and PCA models respectively.

For a color cue, one can use a Bhattacharyya distance to measure the similarity between a reference histogram h_0ⁱ that is obtained prior to tracking and the histogram h_t^i,n that is determined by particle x_t^i,n for object i. Equation 19 exemplifies such an approach:

$\begin{matrix} d_{c} = \sqrt{1 - \sum_{b = 1}^{B} \sqrt{h_{0}^{i} (b) h_{t}^{i, n} (b)}} . & (19) \end{matrix}$

where b is the index of bins and B is the number of bins. The color factor can then be specified by a Gaussian distribution with variance σ_c² as illustrated in equation 20:

$\begin{matrix} p_{c} (z_{t}^{i} \mid x_{t}^{i, n}) = \frac{1}{\sqrt{2 π} σ_{c}} \exp \left( - \frac{d_{c}^{2}}{2 σ_{c}^{2}} \right) . & (20) \end{matrix}$

In this example, the color space employed is simply the normalized YCbCr space with 8 bins for CbCr and only 4 bins coarsely provided for luminance.
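A minimal sketch of the color factor follows. It uses the standard sum form of the Bhattacharyya coefficient and assumes that both histograms are normalized to sum to one and are supplied as flat lists of bin values; the function names and the example values are hypothetical.

```python
import math

def bhattacharyya_distance(ref_hist, cand_hist):
    """Equation 19: sqrt(1 - sum_b sqrt(h_0(b) * h_t(b)))."""
    coefficient = sum(math.sqrt(a * b) for a, b in zip(ref_hist, cand_hist))
    # Clamp at zero so rounding error cannot produce a negative radicand.
    return math.sqrt(max(0.0, 1.0 - coefficient))

def color_likelihood(ref_hist, cand_hist, sigma_c):
    """Equation 20: Gaussian density evaluated at the Bhattacharyya distance."""
    d_c = bhattacharyya_distance(ref_hist, cand_hist)
    return math.exp(-(d_c ** 2) / (2.0 * sigma_c ** 2)) / (math.sqrt(2.0 * math.pi) * sigma_c)

# Usage: a candidate close to the reference scores higher than a distant one.
reference = [0.25, 0.25, 0.25, 0.25]
near = color_likelihood(reference, [0.3, 0.2, 0.25, 0.25], sigma_c=0.2)
far = color_likelihood(reference, [1.0, 0.0, 0.0, 0.0], sigma_c=0.2)
```

Identical histograms give a distance of zero and histograms with no common non-empty bin give a distance of one, so the likelihood falls as the candidate region's colors depart from the reference.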

To apply principal component analysis here, one may first collect a set of training examples of tracking objects. One may then use singular value decomposition to obtain the Karhunen-Loève basis vectors. To measure the likelihood of an image region determined by x_t^i,n, one can calculate the Mahalanobis distance d_p between the image region and the mean of the training examples. The PCA factor can be defined as a Gaussian distribution with variance σ_p² as illustrated in equation 21:

$\begin{matrix} p_{p} (z_{t}^{i} \mid x_{t}^{i, n}) = \frac{1}{\sqrt{2 π} σ_{p}} \exp \left( - \frac{d_{p}^{2}}{2 σ_{p}^{2}} \right) . & (21) \end{matrix}$
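The PCA factor and its combination with the color factor per equation 18 can be sketched as follows. The sketch assumes the basis vectors and their eigenvalues have already been obtained from the training examples (the decomposition itself is not shown), that the basis is orthonormal, and that the Mahalanobis distance is evaluated within the retained subspace; all names are hypothetical.

```python
import math

def mahalanobis_in_subspace(region, mean, basis, eigenvalues):
    """Mahalanobis distance between a flattened image region and the training
    mean, computed from projections onto the given basis vectors."""
    centered = [r - m for r, m in zip(region, mean)]
    squared = 0.0
    for vector, eigenvalue in zip(basis, eigenvalues):
        projection = sum(c * u for c, u in zip(centered, vector))
        squared += projection * projection / eigenvalue
    return math.sqrt(squared)

def gaussian_factor(distance, sigma):
    """Form shared by equations 20 and 21."""
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def local_likelihood(d_c, d_p, sigma_c, sigma_p):
    """Equation 18: product of the color factor and the PCA factor."""
    return gaussian_factor(d_c, sigma_c) * gaussian_factor(d_p, sigma_p)

# Usage with a toy two-dimensional "image region".
d_p = mahalanobis_in_subspace((3.0, 4.0), (0.0, 0.0),
                              [(1.0, 0.0), (0.0, 1.0)], [9.0, 16.0])
likelihood = local_likelihood(0.1, d_p, sigma_c=0.2, sigma_p=1.0)
```

Multiplying the two factors means a particle scores well only when its image region agrees with both the reference colors and the learned appearance subspace.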

So configured, those skilled in the art will recognize and understand that these teachings comprise a distributed multiple objects tracking architecture that uses multiple interactive trackers and that extends traditional Bayesian tracking structures in a unique way. In particular, this approach eschews the joint state representation approach that tends, in turn, to require high complexity and considerable computational capabilities. Instead, a conditional density propagation mathematical structure is derived for each tracked object by modeling the interaction among the object's observations in a distributed scheme. By estimating the interactive function and the state transition density using a magnetic-inertia potential model in the particle filtering implementation, these teachings implicitly handle the error merge problems and further lead to resolution of object labeling problems as well. These teachings are sufficiently respectful of computational requirements to readily permit use in a real-time application setting.

Those skilled in the art will recognize that a wide variety of modifications, alterations, and combinations can be made with respect to the above described embodiments without departing from the spirit and scope of the invention, and that such modifications, alterations, and combinations are to be viewed as being within the ambit of the inventive concept.

Claims

1. A method comprising: capturing temporally parsed data regarding at least a first and a second item; automatically using, at least in part, disjoint probabilistic analysis of the temporally parsed data to disambiguate state information as pertains to the first item from information as pertains to the second item.
2. The method of claim 1 wherein automatically using, at least in part, probabilistic analysis of the temporally parsed data to disambiguate state information as pertains to the first item from information as pertains to the second item comprises using a joint probability as pertains to the temporally parsed data for the first item and the temporally parsed data for the second item.
3. The method of claim 2 wherein automatically using, at least in part, probabilistic analysis of the temporally parsed data comprises using, at least in part, a Bayesian-based probabilistic analysis of the temporally parsed data.
4. The method of claim 3 wherein using, at least in part, a Bayesian-based probabilistic analysis of the temporally parsed data comprises using: a transitional probability as pertains to temporally parsed data for the first item as was captured at a first time and temporally parsed data for the first item as was captured at a second time that is different than the first time; a transitional probability as pertains to temporally parsed data for the second item as was captured at the first time and temporally parsed data for the second item as was captured at the second time.
5. The method of claim 4 wherein: using a transitional probability as pertains to temporally parsed data for the first item as was captured at a first time and temporally parsed data for the first item as was captured at a second time further comprises using a transitional probability as pertains to first state information for the first item as pertains to the first time and second state information for the first item as pertains to the second time; using a transitional probability as pertains to temporally parsed data for the second item as was captured at the first time and temporally parsed data for the second item as was captured at the second time further comprises using a transitional probability as pertains to first state information for the second item as pertains to the first time and second state information for the second item as pertains to the second time.
6. The method of claim 5 wherein using, at least in part, a Bayesian-based probabilistic analysis of the temporally parsed data further comprises using: a conditional probability as pertains to temporally parsed data for the first item and state information for the first item; a conditional probability as pertains to temporally parsed data for the second item and state information for the second item.
7. The method of claim 1 wherein the first and second item each comprise an object.
8. The method of claim 1 wherein the first and second item each comprise a discernable energy wave.
9. The method of claim 1 wherein automatically using, at least in part, disjoint probabilistic analysis of the temporally parsed data to disambiguate state information as pertains to the first item from information as pertains to the second item comprises automatically using, at least in part, disjoint probabilistic analysis of the temporally parsed data to disambiguate state information as pertains to the first item from state information as pertains to the second item.
10. The method of claim 1 wherein capturing temporally parsed data regarding at least a first and a second item comprises capturing temporally parsed data regarding at least a first and a second item using only a single data capture device.
11. An apparatus comprising: a memory having captured temporally parsed data regarding at least a first and a second item stored therein; a processor operably coupled to the memory and being configured and arranged to automatically use, at least in part, disjoint probabilistic analysis of the temporally parsed data to disambiguate state information as pertains to the first item from information as pertains to the second item.
12. The apparatus of claim 11 wherein the processor is further configured and arranged to automatically use a joint probability as pertains to the temporally parsed data for the first item and the temporally parsed data for the second item.
13. The apparatus of claim 12 wherein the processor is further configured and arranged to automatically use, at least in part, a Bayesian-based probabilistic analysis of the temporally parsed data.
14. The apparatus of claim 13 wherein the Bayesian-based probabilistic analysis of the temporally parsed data comprises using: a transitional probability as pertains to temporally parsed data for the first item as was captured at a first time and temporally parsed data for the first item as was captured at a second time that is different than the first time; a transitional probability as pertains to temporally parsed data for the second item as was captured at the first time and temporally parsed data for the second item as was captured at the second time.
15. The apparatus of claim 14 wherein the processor is further configured and arranged to: use a transitional probability as pertains to first state information for the first item as pertains to the first time and second state information for the first item as pertains to the second time; use a transitional probability as pertains to first state information for the second item as pertains to the first time and second state information for the second item as pertains to the second time.
16. The apparatus of claim 15 wherein the processor is further configured and arranged, at least in part, to use the Bayesian-based probabilistic analysis of the temporally parsed data by using: a conditional probability as pertains to temporally parsed data for the first item and state information for the first item; a conditional probability as pertains to temporally parsed data for the second item and state information for the second item.
17. The apparatus of claim 11 wherein the first and second item each comprise an object.
18. The apparatus of claim 11 wherein the first and second item each comprise a discernable energy wave.
19. The apparatus of claim 11 wherein the processor is configured and arranged to automatically use, at least in part, disjoint probabilistic analysis of the temporally parsed data to disambiguate state information as pertains to the first item from information as pertains to the second item by automatically using, at least in part, disjoint probabilistic analysis of the temporally parsed data to disambiguate state information as pertains to the first item from state information as pertains to the second item.
20. The apparatus of claim 11 further comprising: a single image capture device operably coupled to the memory such that the captured temporally parsed data is captured via the single image capture device.
