A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The pages that follow describe experimental work, presentations and progress reports that disclose currently preferred embodiments consistent with the above-entitled invention. All of these documents form a part of this disclosure and are fully incorporated by reference. This description incorporates many details and specifications that are not intended to limit the scope of protection of any utility patent application which might be filed in the future based upon this provisional application. Rather, it is intended to describe an illustrative example with specific requirements associated with that example. Therefore, the description that follows should only be considered as exemplary of the many possible embodiments and broad scope of the present invention. Those skilled in the art will appreciate the many advantages and variations possible on consideration of the following description.
When constructing a system for tracking atomic objects within an environment, it is critical that the definition of an object be clearly stated. In a video sequence, a person can appear in the scene carrying a bag. It is not immediately apparent whether the bag should be treated as an object separate from the person. For our purposes, we have chosen a functional definition for objects: any group of pixels which tends to move as a group is considered a single object. In our example case, if the motion of the bag were sufficiently distinct from that of the person, the bag would be treated as a separate entity. This effectively groups together pixels which maintain a strong spatial dependence over time, and tracks them as a whole.
Regarding
Regarding
The primary purpose of the shape model is to capture this spatial dependency between pixels corresponding to the same object. This not only enables data association, that is, finding the component pixels of an object in order to update its models, but also provides strong predictive power for the set of assignments within a specific region of the image when the object's location is known. Therefore, the probability of a set of assignments, A, given an object's shape model, S, and its current position, μ, written p(A|S,μ), is easily computed.
A novel method of representing these spatial dependencies has been developed, using a dynamic type of stochastic occupancy grid. A template grid, whose cells correspond to individual pixels, is maintained for each object, centered on an arbitrary point of reference. Each grid cell contains a predictive probability that a pixel will be observed at that given position. An autoregressive model is used to update this probability estimate, based on the observed behavior. If, in an exemplary embodiment, an object is designated as a person-shaped object, the stochastic nature of this model allows more mobile sections of the object, such as a person's limbs, to be modeled as an area of more diffuse probability, while the more stable areas, such as a person's head and torso, maintain a more certain and clearly delineated model. Also, persistent changes in the shape of an object, for example, when a car changes its orientation by turning, are easily accommodated, as the auto-regression allows more recent information to outweigh older, perhaps outdated, evidence. One of the strengths of this approach to object shape estimation is its invariance to object-sensor distance and its flexibility to describe multiple types of objects (people, vehicles, people on horses, or any object of interest).
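By way of non-limiting illustration, the occupancy grid and its autoregressive update described above might be sketched as follows. The class name `ShapeGrid`, the first-order update rule, the learning rate `alpha`, the prior of 0.5, and the assumption that grid cells are independent when scoring a set of assignments are all illustrative choices not taken from this disclosure. The observation mask is assumed to be already aligned with the object's reference point.

```python
import math


class ShapeGrid:
    """Per-object template grid of occupancy probabilities (hypothetical helper)."""

    def __init__(self, height, width, prior=0.5, alpha=0.1):
        self.alpha = alpha  # assumed autoregressive learning rate
        self.p = [[prior] * width for _ in range(height)]

    def update(self, mask):
        """Blend each cell toward the newest observation (1 = object pixel seen).

        Recent frames outweigh older ones because old evidence decays
        by a factor of (1 - alpha) at every update.
        """
        a = self.alpha
        for r, row in enumerate(mask):
            for c, seen in enumerate(row):
                self.p[r][c] = (1.0 - a) * self.p[r][c] + a * float(seen)

    def log_likelihood(self, mask, eps=1e-9):
        """Log-probability of an assignment mask, assuming independent cells."""
        total = 0.0
        for r, row in enumerate(mask):
            for c, seen in enumerate(row):
                p = min(max(self.p[r][c], eps), 1.0 - eps)
                total += math.log(p if seen else 1.0 - p)
        return total


grid = ShapeGrid(2, 2, alpha=0.5)
torso_left = [[1, 0], [1, 0]]
for _ in range(2):
    grid.update(torso_left)
```

In this sketch, cells that are observed consistently drift toward a probability of 1, while cells that flicker between observed and unobserved settle at intermediate values, which corresponds to the more diffuse regions described for limbs.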
This novel method of stochastic shape modeling provides a seamless and effective way to handle occlusions and color ambiguity. Occlusions occur when objects of interest overlap (dynamic occlusions), when objects of interest pass behind a background object (static occlusions), or when objects deform so as to overlap themselves (self occlusions). Color ambiguity may occur when object and background pixels are similar in color intensity, resulting in high background likelihood values for these pixels. To address these issues, a detailed set of object assignments is used, where each label consists of either background or a set of objects. Thus a single pixel can be labeled with multiple object IDs during a dynamic occlusion. This method has proven effective in dealing with complex scenes and can seamlessly accommodate additional evidence and models in the future.
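As a minimal illustration of the label structure described above, each pixel label may be represented as a set of object IDs, with the empty set standing for background. The function name `label_space` and the use of `frozenset` are assumptions made only for this sketch.

```python
from itertools import combinations

BACKGROUND = frozenset()  # assumed convention: empty set means background


def label_space(object_ids):
    """Enumerate every label: background, or any non-empty set of objects."""
    ids = sorted(object_ids)
    labels = [BACKGROUND]
    for k in range(1, len(ids) + 1):
        labels.extend(frozenset(combo) for combo in combinations(ids, k))
    return labels


labels = label_space([1, 2])
```

With two tracked objects, the sketch yields four labels: background, object 1 alone, object 2 alone, and the overlap of objects 1 and 2, the last being the label a pixel would carry during a dynamic occlusion.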
In another exemplary embodiment, cameras may be used as remote sensors for gathering video and audio data sets for use in tracking. Regarding nonlinear object ID and tracking methods, the objects within a scene are characterized via a feature-based representation of each object. Kalman filters and particle filters have been implemented to track object position and velocity through a video sequence. A point of reference for each object (e.g., its center of mass) is tracked through the video sequence. Given an adequate frame rate, greater than 3 frames per second, we can assume that this motion is approximately linear between frames. Kalman filters provide a closed-form solution for tracking the position and velocity of an object, given Gaussian noise, and produce a full probability distribution for the given objects in the scene.
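By way of example only, one predict-and-update cycle of a Kalman filter tracking a single coordinate of an object's reference point might be sketched as below. The constant-velocity motion model, the diagonal process noise `q`, the measurement noise `r`, and the function name `kalman_step` are assumptions made for this sketch and are not specified in this disclosure.

```python
def kalman_step(x, P, z, dt=1.0, q=1e-3, r=1.0):
    """One constant-velocity Kalman cycle for a single image coordinate.

    x: (position, velocity) estimate; P: 2x2 covariance as nested lists;
    z: measured position of the reference point in the current frame.
    """
    pos, vel = x
    # Predict: the motion is assumed linear between consecutive frames.
    pos_pred = pos + dt * vel
    a00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q
    a01 = P[0][1] + dt * P[1][1]
    a10 = P[1][0] + dt * P[1][1]
    a11 = P[1][1] + q
    # Update: only the position is observed (H = [1, 0]).
    s = a00 + r                # innovation variance
    k0, k1 = a00 / s, a10 / s  # Kalman gain
    resid = z - pos_pred
    x_new = (pos_pred + k0 * resid, vel + k1 * resid)
    P_new = [[(1.0 - k0) * a00, (1.0 - k0) * a01],
             [a10 - k1 * a00, a11 - k1 * a01]]
    return x_new, P_new


# Usage: an object moving 2 pixels per frame, observed without noise.
state, cov = (0.0, 0.0), [[10.0, 0.0], [0.0, 10.0]]
for t in range(1, 51):
    state, cov = kalman_step(state, cov, z=2.0 * t)
```

The returned mean and covariance together describe a Gaussian distribution over position and velocity. A two-dimensional tracker could run one such filter per image axis, again as an assumption of this sketch.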
An objective in this exemplary embodiment is to track level-set-derived target silhouettes through occlusions caused by moving objects passing through one another in the video. A particle filter is used to estimate the conditional probability distribution of the contour of the objects at time τ, conditioned on observations up to time τ. The video/data evolution time τ should be contrasted with the time-evolution t of the level-sets, the latter yielding the target silhouette (
The algorithm used for tracking objects during occlusions consists of a particle filtering framework that uses level-sets results for each update step.
This technique allows the inventive system to track moving people during occlusions. In occlusion scenarios, the level-set algorithm alone would fail to detect the boundaries of the moving objects. Using particle filtering, we obtain an estimate of the state for the next moment in time, p(Xτ|Y1:τ−1), update the state, and then run the level-set algorithm for only a few iterations to update the image contour γ(τ+1). With this algorithm, objects are tracked through occlusions and the system is capable of approximating the silhouette of the occluded objects.
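The predict and update steps of such a particle filtering framework might be sketched, purely for illustration, as follows. For brevity, the state here is a single scalar (for example, one coordinate of a contour's reference point) rather than a full contour, and the level-set refinement step is omitted. The random-walk motion model, the Gaussian observation likelihood, the multinomial resampling, and the function name `particle_filter_step` are assumptions of this sketch.

```python
import math
import random


def particle_filter_step(particles, observation, motion_std=1.0,
                         obs_std=1.0, rng=random):
    """One bootstrap particle filter cycle over a scalar state.

    Pass observation=None when the target is occluded: the particles are
    then only propagated, so the prediction carries the track forward.
    """
    # Predict: sample from the assumed random-walk motion model.
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]
    if observation is None:
        return predicted, sum(predicted) / len(predicted)
    # Update: weight each particle by an assumed Gaussian likelihood.
    weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2)
               for p in predicted]
    if sum(weights) == 0.0:
        weights = [1.0] * len(predicted)  # every weight underflowed
    # Resample in proportion to the weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, sum(resampled) / len(resampled)


rng = random.Random(0)
particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
for _ in range(20):
    particles, estimate = particle_filter_step(particles, 3.0, rng=rng)
occluded, occluded_estimate = particle_filter_step(particles, None, rng=rng)
```

In a fuller implementation, each resampled state would seed a small number of level-set iterations to refine the contour, as described above. That step depends on image data and is therefore not shown.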
Regarding
While certain illustrative embodiments have been described, it is evident that many alternatives, modifications, permutations and variations will become apparent to those skilled in the art in light of the description.
This application is a Continuation-in-part of co-pending application Ser. No. 11/727,668 which was filed Mar. 28, 2007, and which is incorporated by reference.
 | Number | Date | Country |
---|---|---|---|
Parent | 11727668 | Mar 2007 | US |
Child | 11808941 | | US |