This invention relates generally to the security surveillance field, and more specifically to a new and useful method for tracking an object through an environment across multiple cameras.
The evolving requirements for surveillance are particularly demanding, as the effective cost of system failure has increased dramatically. A single mistake can allow terrorist or other illegal activity that results in theft of property or information, destruction of property, an attack, or, worse, loss of human life. Attacks can happen in a variety of locations, including airplanes, trains, corporate headquarters, government buildings, nuclear power plants, military facilities, and many other potential targets. Monitoring secure zones requires a tremendous amount of infrastructure: cameras, monitors, computers, networks, etc. The system also requires personnel to operate and monitor it. Even after all this investment and continuing operating cost, tracking a person or vehicle through an environment across multiple cameras remains highly error-prone. Thus, there is a need in the visual surveillance field to create a new and useful method for tracking an object. This invention provides such a new and useful method.
The following description of the preferred embodiments of the invention is not intended to limit the invention to these preferred embodiments, but rather to enable any person skilled in the art to make and use this invention.
As shown in
Step S110, which includes collecting visual data representing a physical environment from a plurality of cameras, functions to monitor an environment from cameras with differing vantage points in the environment as shown in
Step S120, which includes constructing a model of the environment, functions to create a virtual description of object position and layout of a physical environment. The model is preferably a 3D computer representation created in any suitable 3D modeling program as shown in
The modeled camera components preferably include a representation of all the cameras in the vision system (the plurality of cameras). The location and orientation of each camera is preferably specified in the camera models. Obtaining relatively precise agreement between the location and orientation of each actual camera in the environment and its camera component in the model is significant for accurate tracking of an object. The mounting bracket of a camera may additionally be modeled, which preferably includes the positioning of the bracket, the angles of bracket joints, periodic motion of the bracket (e.g., a rotating bracket), and/or any suitable parameters of the bracket. The focal length, sensor width, aspect ratio, and other imaging parameters of the cameras are also modeled. The camera components may be used to relate visual data from different cameras to determine the position of an object. Positioning information of the cameras is particularly important for tracking an object as it transitions between regions of the environment that are inspected by different cameras.
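To illustrate how the camera parameters listed above relate a point in the 3D model to a pixel in that camera's image, here is a minimal sketch assuming an ideal pinhole camera with only pan and tilt (no bracket offsets, roll, or lens distortion). The `CameraModel` class and its field names are hypothetical and not taken from the source.

```python
import math
from dataclasses import dataclass


@dataclass
class CameraModel:
    """Hypothetical camera component: position, orientation, imaging params."""
    x: float
    y: float
    z: float               # mounting height in model units (e.g. metres)
    pan: float             # rotation about the vertical axis, radians (0 = +x)
    tilt: float            # downward tilt of the optical axis, radians
    focal_mm: float
    sensor_width_mm: float
    image_width_px: int
    aspect: float          # image width / image height

    def project(self, px, py, pz):
        """Return (u, v) pixel coordinates of a world point, or None if the
        point lies behind the camera. u grows rightward, v grows downward."""
        dx, dy, dz = px - self.x, py - self.y, pz - self.z
        # Undo pan so the optical axis points along the forward direction.
        fwd = math.cos(self.pan) * dx + math.sin(self.pan) * dy
        left = -math.sin(self.pan) * dx + math.cos(self.pan) * dy
        # Undo tilt: depth along the tilted optical axis, and image-up offset.
        depth = math.cos(self.tilt) * fwd - math.sin(self.tilt) * dz
        up = math.sin(self.tilt) * fwd + math.cos(self.tilt) * dz
        if depth <= 0:
            return None
        # Focal length in pixels from focal length, sensor width, image width.
        f_px = self.focal_mm / self.sensor_width_mm * self.image_width_px
        height_px = self.image_width_px / self.aspect
        u = self.image_width_px / 2 - f_px * left / depth
        v = height_px / 2 - f_px * up / depth
        return (u, v)
```

For example, a level camera at height 3 looking along +x maps the point (10, 0, 3) to the image center. Two cameras that each report a pixel for the same subject can then be related through these shared model coordinates.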
The modeled object components are preferably static or dynamic components. Static components of the environment are preferably permanent, non-moving objects in an environment, such as structures of a building (e.g., walls, beams, windows, ceilings), terrain elevations, furniture, or any features or objects that remain substantially constant in the environment. The model additionally includes dynamic components, which are objects or features of the environment that change, such as escalators, doors, trees moving in the wind, changing traffic lights, or any suitable object that may change slightly. The object components may also inform how the image processing is updated. Modeling object components preferably prevents unintentionally tracking an object that is in reality a part of the environment. For example, when trying to track an object through an environment, one algorithm may look for portions of the image that differ from the unpopulated static environment. However, if a tree in the background were waving in the wind, this image difference should not be tracked as an object. Modeling the tree as an object component is preferably used to prevent this error. Additionally, static components in the environment can be used to understand when occlusions occur. For example, by modeling a counter, a person walking behind the counter may be properly tracked, because the modeled counter indicates that a portion of the person may not be visible.
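The tree example above can be sketched as background subtraction that skips pixels covered by a modeled dynamic component. This is a minimal sketch assuming grayscale frames stored as nested lists and a per-pixel mask already rendered from the model for this camera; the function name and the default threshold of 25 are illustrative assumptions.

```python
def foreground_mask(frame, background, dynamic_mask, threshold=25):
    """Return a binary mask of pixels that differ from the unpopulated static
    background, ignoring pixels covered by modeled dynamic components
    (e.g. a tree moving in the wind) so they are not tracked as subjects."""
    rows, cols = len(frame), len(frame[0])
    return [
        [
            1 if (not dynamic_mask[r][c]
                  and abs(frame[r][c] - background[r][c]) > threshold) else 0
            for c in range(cols)
        ]
        for r in range(rows)
    ]
```

A large difference inside the dynamic-component mask is suppressed, while the same difference elsewhere is reported as foreground.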
The modeled subjects of the environment are preferably the moving objects that populate an environment. The subjects are preferably people, vehicles, animals, and/or objects that convey other objects. The subjects are preferably the objects that will be tracked through an environment. However, some subjects may be left untracked. Some subjects may be selectively tracked (as instructed by a security system operator). Subjects may alternatively be automatically tracked based on subject-tracking rules. The subject-tracking rules may include a subject being in a specified zone, moving in a particular way (too fast, wrong direction, etc.), having a particular size, an image recognition trigger, or any suitable rule. Additionally, a time limit may be imposed before a subject is tracked, to prevent automatic tracking caused by the motion of random objects. The model preferably represents each subject by an avatar, which is a dynamic representation of the subject. The avatars are preferably positioned in the model as determined from the video data of the physical environment. Body or detailed movements of a subject are preferably not modeled, but coarse behavior descriptions such as standing, walking, sitting, or running may be represented. A subject component may include descriptors such as weight, inertia, friction, orientation, position, steering, braking, motion capabilities (e.g., maximum speed, minimum speed, turning radius), environment permissions (areas allowed or actions allowed in areas of the environment), and/or any suitable descriptor. The descriptors are preferably parameters that determine the subject's possible interactions and representation in the environment.
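The subject-tracking rules and the time limit described above can be combined as follows. This is a minimal sketch assuming subject positions arrive as time-stamped floor-plan coordinates, zones are axis-aligned rectangles, and only two of the listed rules (zone membership and excessive speed) are checked; `Zone`, `TrackingRules`, and `should_track` are hypothetical names.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Zone:
    """Axis-aligned floor-plan zone in model coordinates (e.g. metres)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, x, y):
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


@dataclass
class TrackingRules:
    restricted_zones: list = field(default_factory=list)
    max_speed: float = 3.0     # faster motion triggers tracking (units/s)
    min_dwell_s: float = 2.0   # a rule must hold this long before tracking


def should_track(observations, rules):
    """observations: (t, x, y) tuples sorted by time. Returns True once any
    rule has held continuously for at least rules.min_dwell_s seconds."""
    triggered_since = None
    prev = None
    for t, x, y in observations:
        in_zone = any(z.contains(x, y) for z in rules.restricted_zones)
        too_fast = False
        if prev is not None:
            pt, px, py = prev
            if t > pt:
                too_fast = math.hypot(x - px, y - py) / (t - pt) > rules.max_speed
        prev = (t, x, y)
        if in_zone or too_fast:
            if triggered_since is None:
                triggered_since = t
            if t - triggered_since >= rules.min_dwell_s:
                return True
        else:
            triggered_since = None  # rule broken: restart the time limit
    return False
```

The dwell timer resets whenever no rule holds, so a brief pass through a zone or a single noisy fast step does not start automatic tracking.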
The sub-step of modeling conceptual components S122 functions to facilitate computing object tracks from 3D geometry. A conceptual component is preferably constructed virtually and associated with the imaging and modeling of the environment, but it need not be a physical element of the environment. The conceptual components preferably include screens, shadows, and sprites as shown in
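One way to picture how a sprite produces a shadow on a camera's screen is to project the corners of the sprite's volume into that camera. This is a minimal sketch under strong simplifying assumptions: the sprite is an upright square box, the camera is level and looks along +x, and the shadow is approximated by the bounding box of the projected corners rather than a true flat polygon. The function `shadow_box` and its parameters are hypothetical.

```python
def shadow_box(sprite_xy, width, height, cam_xyz, f_px, cx=320.0, cy=240.0):
    """Project the 8 corners of an upright sprite box into a level camera
    looking along +x and return the shadow's bounding box (u0, v0, u1, v1),
    or None if any corner lies behind the camera."""
    sx, sy = sprite_xy
    half = width / 2
    us, vs = [], []
    for x in (sx - half, sx + half):
        for y in (sy - half, sy + half):
            for z in (0.0, height):
                forward = x - cam_xyz[0]
                left = y - cam_xyz[1]
                up = z - cam_xyz[2]
                if forward <= 0:
                    return None  # sprite not in front of this camera
                us.append(cx - f_px * left / forward)
                vs.append(cy - f_px * up / forward)
    return (min(us), min(vs), max(us), max(vs))
```

Repeating this for every camera gives one shadow per screen for each sprite, which is the form later compared against the detected blobs.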
Additionally, Step S120 preferably includes predicting motion of a subject S124, which functions to model the motion of a subject and calculate the future position of a subject from previous information. The motion is preferably calculated from descriptors of the sprite representing a subject. The previous direction of the subject, motion patterns, velocity, acceleration, and/or any other motion descriptors are preferably used to calculate a trajectory and/or the position of a subject at a given time. The model preferably predicts the location of the subject without current input from the vision system. Furthermore, motion through unmonitored areas may be predicted. For example, if a subject leaves the inspection zone of a camera at one end of a hallway, the velocity of the subject may be used to predict when the subject should appear in an inspection zone at the other end of the hallway. The motion prediction may additionally be used to assign a probability to where a subject may be found. This may be useful in situations where a tracked subject is lost from visual inspection, and a range of locations may be inspected based on the probability of the subject's location. The model may additionally use the motion predictions to construct a blob prediction. Blob prediction is a preferred pattern detection process for the images of the cameras and is described further below. The model preferably constructs the predictions such that the current prediction is compared to current visual data. If the model predictions and the visual data do not agree to a satisfactory level, the differences are preferably resolved either by adjusting the dynamics of the tracked subject to match the processed visual data or by ignoring the visual data as incompatible with the dynamics of a tracked subject of a particular type and behavior.
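The prediction, hallway-arrival, and reconciliation behavior above can be sketched with a constant-acceleration motion model. This is a minimal sketch assuming 2D floor-plan coordinates and a fixed distance gate for deciding whether an observation is compatible with the subject's dynamics; `SpriteState`, `predict`, `arrival_time`, and `reconcile` are hypothetical names, and the gate test stands in for whatever agreement measure the model actually uses.

```python
import math
from dataclasses import dataclass


@dataclass
class SpriteState:
    """Hypothetical motion descriptors of a sprite in model coordinates."""
    x: float
    y: float
    vx: float
    vy: float
    ax: float = 0.0
    ay: float = 0.0


def predict(s, dt):
    """Position after dt seconds under constant acceleration."""
    return (s.x + s.vx * dt + 0.5 * s.ax * dt * dt,
            s.y + s.vy * dt + 0.5 * s.ay * dt * dt)


def arrival_time(distance, speed):
    """Time to cross an unmonitored stretch (e.g. a hallway) at constant speed."""
    return distance / speed if speed > 0 else math.inf


def reconcile(s, observed, dt, gate):
    """Compare an observation with the prediction. Within the gate, adjust the
    sprite's dynamics to match; outside it, ignore the visual data."""
    px, py = predict(s, dt)
    ox, oy = observed
    if math.hypot(ox - px, oy - py) > gate:
        return s  # incompatible with this subject's dynamics: keep the model
    return SpriteState(ox, oy, (ox - s.x) / dt, (oy - s.y) / dt, s.ax, s.ay)
```

For instance, a subject walking at 1.5 units/s should reappear at the far end of a 30-unit hallway about 20 s after leaving the first camera's view.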
Additionally, Step S120 preferably includes setting processing parameters based on the model S126, which functions to use the model to determine the processing algorithms and/or settings for processing visual data. Using the model to predict appropriate processing algorithms and settings allows for optimization of limited processing resources. As described above, static and dynamic object components, shadow components, subject motion predictions, blob predictions, and/or any suitable modeled component may be used to determine processing parameters. The shadows preferably determine processing parameters of the camera associated with the screen of the shadow. The processing parameters are preferably determined based on discrepancies between the model and the visual data of the environment. The processing operations are preferably set in order to maintain a high degree of confidence in the accuracy of the model of the tracked subjects.
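As one illustration of model-driven processing parameters, the model can restrict processing to a region of interest around a predicted blob and make detection more sensitive when the model and the visual data disagree. This is a minimal sketch of a hypothetical policy: the function name, the linear threshold schedule, the normalized `discrepancy` in [0, 1], and the default thresholds are all assumptions rather than the source's method.

```python
def processing_params(pred_box, uncertainty_px, image_w, image_h,
                      discrepancy, base_threshold=30, min_threshold=10):
    """Derive per-camera processing settings from the model (hypothetical).

    pred_box: predicted blob bounding box (x0, y0, x1, y1) in pixels.
    uncertainty_px: margin reflecting prediction uncertainty.
    discrepancy: model/visual-data disagreement, clamped to [0, 1].
    """
    x0, y0, x1, y1 = pred_box
    # Region of interest: predicted box grown by the uncertainty, clipped
    # to the image so limited processing is spent only where it is needed.
    roi = (max(0, x0 - uncertainty_px), max(0, y0 - uncertainty_px),
           min(image_w, x1 + uncertainty_px), min(image_h, y1 + uncertainty_px))
    # Larger disagreement -> lower difference threshold (more sensitive).
    d = min(max(discrepancy, 0.0), 1.0)
    threshold = round(base_threshold - d * (base_threshold - min_threshold))
    return {"roi": roi, "threshold": threshold}
```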
Step S130, which includes processing images from the cameras, functions to analyze the image data of the vision system for tracking objects. The processed image data preferably provides the model with information regarding patterns in the video imagery. The processing algorithms may operate on a frame-by-frame or frame-difference basis. The algorithms used for processing the image data may include connected component analysis, background subtraction, mathematical morphology, image correlation, and/or any suitable image tracking process. The processing algorithms include a set of parameters that determine their behavior on the processed image. The processing parameters are preferably partially or fully set by the model. The visual data from the plurality of cameras is preferably acquired and processed at the same time, with the visual data from each camera preferably processed individually. The processed results are preferably chain codes of image coordinates for the binary patterns that arise after processing the image data. Each binary pattern preferably has coordinates that locate specific features in the pattern.
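Connected component analysis, one of the algorithms named above, turns a binary foreground mask into blobs. This is a minimal sketch assuming 4-connectivity and a mask stored as nested lists; for brevity it summarizes each blob by area, bounding box, and centroid instead of emitting the chain codes the text prefers. The function name and the `min_area` filter are illustrative assumptions.

```python
from collections import deque


def label_blobs(mask, min_area=1):
    """4-connected component labeling of a binary mask via breadth-first
    search. Returns one dict per blob with area, bounding box
    (x0, y0, x1, y1), and centroid (x, y), in scan order."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area:  # drop tiny noise regions
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                blobs.append({
                    "area": len(pixels),
                    "bbox": (min(xs), min(ys), max(xs), max(ys)),
                    "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
                })
    return blobs
```

Run separately on each camera's mask, this yields the per-camera patterns that are handed to the model.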
The patterns detected in the processed visual data are preferably in the form of binary connected regions, also referred to as blobs. Blob detection preferably provides an outline and a designating coordinate to denote the location of the distinguishing features of the blob. The outline of detected blobs preferably corresponds to the outline of a subject. As shown in
Step S140, which includes cooperatively tracking the object by comparing the processed video images and the model, functions to compare the model and the processed video images to determine the location of a tracked subject. The model preferably moves each sprite to a predicted position and constructs shadows of each sprite on each screen. The shadows are preferably flat polygons in the model, as are the blobs that have been received from the vision system and drawn on the screens. As shown in
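Comparing shadows with blobs on a screen can be sketched as an overlap-based assignment. This is a minimal sketch that approximates both shadows and blobs by axis-aligned bounding boxes instead of general flat polygons, and uses greedy best-first matching on intersection-over-union; the function names and the `min_iou` default are assumptions, not the source's method.

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def match_shadows_to_blobs(shadows, blobs, min_iou=0.3):
    """shadows: {sprite_id: box}; blobs: {blob_id: box} on one screen.
    Greedily pair the most-overlapping shadow and blob until no pair
    reaches min_iou. Returns {sprite_id: blob_id}."""
    pairs = sorted(
        ((iou(s, b), sid, bid) for sid, s in shadows.items()
         for bid, b in blobs.items()),
        key=lambda p: p[0], reverse=True)
    used_s, used_b, matches = set(), set(), {}
    for score, sid, bid in pairs:
        if score < min_iou:
            break
        if sid in used_s or bid in used_b:
            continue
        matches[sid] = bid
        used_s.add(sid)
        used_b.add(bid)
    return matches
```

A sprite whose shadow finds no matching blob is a candidate for occlusion or loss, and a blob with no shadow is a candidate new subject.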
Additionally, the method may include the step of calibrating alignment of the model and the visual data S150, which functions to modify the static model to compensate for discrepancies between the model and the visual data. Imperfect alignment of cameras in an environment can introduce error into the tracking process, and this step preferably adjusts the camera model components to reduce that source of error. Specific, well-measured features in the 3D model that are highly visible in the camera image are preferably selected as calibration features. The calibration process preferably includes simulating the camera image in the model and aligning the simulated image to the camera image at all of the specified calibration features. The camera-bracket-lens geometry of the camera model is preferably adjusted until the simulated and video images align at the specified features. Additionally, a mesh distortion may be applied within the model to account for optical properties or aberrations of camera lenses that distort the visual data. The 3D model's camera-bracket-lens geometry can be adjusted manually or automatically. Automatic adjustment requires an appropriate optimization algorithm, such as gradient hill climbing. For camera calibration to be accurate, the calibration features must be accurately located in 3D in the model, and the position of the camera being calibrated must be known with high precision. If camera and feature locations are accurately known in three dimensions, a camera can preferably be calibrated using only two specified features in its image. If the camera's height is uncertain, the camera can preferably be calibrated using three specified features. Camera and feature locations are best determined by direct measurement.
Modern surveying techniques preferably yield sufficient accuracy for camera calibration in situations requiring a high degree of tracking accuracy.
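Automatic adjustment can be sketched as minimizing the pixel distance between simulated and observed calibration features. This is a minimal sketch assuming the camera position is known, only pan, tilt, and focal length (in pixels) are adjusted, and there is no bracket or lens-distortion model. A simple derivative-free coordinate hill climber stands in for the gradient hill climbing mentioned above; `project`, `reprojection_error`, and `calibrate` are hypothetical names.

```python
import math


def project(params, cam_pos, point, cx=320.0, cy=240.0):
    """Pinhole projection with pan, downward tilt, and focal length (px)."""
    pan, tilt, f = params
    dx, dy, dz = (p - c for p, c in zip(point, cam_pos))
    fwd = math.cos(pan) * dx + math.sin(pan) * dy
    left = -math.sin(pan) * dx + math.cos(pan) * dy
    depth = math.cos(tilt) * fwd - math.sin(tilt) * dz
    up = math.sin(tilt) * fwd + math.cos(tilt) * dz
    if depth <= 0:
        return None
    return (cx - f * left / depth, cy - f * up / depth)


def reprojection_error(params, cam_pos, features):
    """Sum of squared pixel distances between simulated and observed features.
    features: list of (world_xyz, (u_obs, v_obs))."""
    total = 0.0
    for world, (u_obs, v_obs) in features:
        uv = project(params, cam_pos, world)
        if uv is None:
            return math.inf
        total += (uv[0] - u_obs) ** 2 + (uv[1] - v_obs) ** 2
    return total


def calibrate(params, cam_pos, features, steps=(0.05, 0.05, 50.0),
              tol=1e-6, max_iter=20000):
    """Coordinate hill climbing: try +/- step on each parameter, keep any
    improvement, and halve all steps when no move improves the error."""
    params, steps = list(params), list(steps)
    best = reprojection_error(params, cam_pos, features)
    for _ in range(max_iter):
        improved = False
        for i in range(len(params)):
            for sign in (1.0, -1.0):
                trial = params.copy()
                trial[i] += sign * steps[i]
                err = reprojection_error(trial, cam_pos, features)
                if err < best:
                    params, best, improved = trial, err, True
                    break
        if not improved:
            steps = [s / 2 for s in steps]
            if max(steps) < tol:
                break
    return tuple(params), best
```

With three surveyed features and a starting guess near the true pose, the climber drives the reprojection error toward zero. A poor starting guess can still end in a local minimum, which is one reason manual adjustment remains an option.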
As a person skilled in the art will recognize from the previous detailed description and from the figures and claims, modifications and changes can be made to the preferred embodiments of the invention without departing from the scope of this invention defined in the following claims.
This application claims the benefit of U.S. Provisional Application No. 61/261,300 filed 13 Nov. 2009, titled “METHOD FOR TRACKING AN OBJECT THROUGH AN ENVIRONMENT ACROSS MULTIPLE CAMERAS” which is incorporated in its entirety by this reference.
| Application Number | Date | Country |
|---|---|---|
| 61261300 | Nov 2009 | US |