The present invention relates generally to the field of video image processing and context-based scene understanding and behavior analysis. More specifically, the present invention pertains to systems and methods for performing 3D dense range calculations using data fusion techniques.
Video surveillance systems are used in a variety of applications to detect and monitor objects within an environment. In security applications, for example, such systems are sometimes employed to detect and track individuals or vehicles entering or leaving a building facility or security gate, or to monitor individuals within a store, office building, hospital, or other such setting where the health and/or safety of the occupants may be of concern. In the aviation industry, such systems have been used to detect the presence of individuals at key locations within an airport, such as at a security gate or parking garage.
Automation of digital image processing sufficient to perform scene understanding (SU) and/or behavioral analysis of video images is typically accomplished by acquiring images from one or more video cameras and then comparing those images with a previously stored reference model that represents a particular region of interest. In certain applications, for example, scene images from multiple video cameras are obtained and then compared against a previously stored CAD site model or map containing the pixel coordinates for the region of interest. Using the previously stored site model or map, functions such as motion detection, motion tracking, and/or object classification/scene understanding can be performed on any new objects that may have moved in any particular region and/or across multiple regions using background subtraction or other known techniques. In some techniques, a stereo triangulation technique employing multiple image sensors can be used to compute the location of an object within the region of interest.
One problem endemic in many video image-processing systems is that of correlating the pixels in each image frame with real world coordinates. Errors in pixel correspondence can often result from one or more of the video cameras becoming uncalibrated due to undesired movement, which often complicates the automation process used to perform functions such as motion detection, motion tracking, and object classification. Such errors in pixel correlation can also affect further reasoning about the dynamics of the scene such as the object's behavior and its interrelatedness with other objects. The movement of stationary objects within the scene as well as changes in the lighting across multiple image frames can also affect system performance in certain cases.
In some applications, pixel correlation of the 2D image data with the real world 3D coordinates is accomplished using an algorithm or routine that estimates the internal camera geometric and optical parameters, and then computes the external 3D position and orientation of the camera. In one conventional method, for example, camera calibration is accomplished using a projection equation matrix containing various intrinsic and extrinsic camera parameters such as focal length, lens distortion, origin position of the 2D image coordinates, scaling factors, origin position of the 3D coordinates, and camera orientation. From these parameters, a least squares method can then be used to solve for values of the matrix in order to ascertain the 3D coordinates within the field of view of the camera.
In many instances, the computational power required to perform such matrix calculations is significant, particularly in those applications where multiple video cameras are tasked to acquire image data and/or where numerous reference points are to be determined. In those applications where multiple video cameras are used to determine the 3D coordinates, for example, the computational power required to perform 3D dense range calculations from the often large amount of image data acquired may burden available system resources. In some instances, the resolution of images acquired may need to be adjusted in order to reduce processor demand, affecting the ability to detect subtle changes in scene information often necessary to perform scene understanding and/or behavior analysis.
The present invention pertains to systems and methods of establishing 3D coordinates from 2D image domain data acquired from an image sensor. An illustrative method in accordance with an exemplary embodiment may include the steps of acquiring at least one image frame from an image sensor, selecting via manual and/or algorithm-assisted segmentation the key physical background regions of the image, determining the geo-location of three or more reference points within each selected region of interest, and transforming 2D image domain data from each selected region of interest into a 3D dense range map containing physical features of one or more objects within the image frame. A manual segmentation process can be performed to define a number of polygonal zones within the image frame, each polygonal zone representing a corresponding region of interest. The polygonal zones may be defined, for example, by selecting a number of reference points on the image frame using a graphical user interface. A software tool can be utilized to assist the user to hand-segment and label (e.g. “road”, “parking lot”, “building”, etc.) the selected physical regions of the image frame.
The graphical user interface can be configured to prompt the user to establish a 3D coordinate system to determine the geo-location of pixels within the image frame. In certain embodiments, for example, the graphical user interface may prompt the user to enter values representing the distances from the image sensor to a first and a second reference point used in defining a polygonal zone, and then measure the distance between those reference points. Alternatively, and in other embodiments, the graphical user interface can be configured to prompt the user to enter values representing the distance to first and second reference points of a planar triangle defined by the polygonal zone, and then measure the included angle between the lines forming the two distances.
Once the values for the reference points used in defining the polygonal zone have been entered, an algorithm or routine can be configured to calculate the 3D coordinates for the reference points originally represented by coordinate pairs in 2D. In some embodiments, for example, the algorithm or routine can be configured to calculate the 3D coordinates for the reference points by a data fusion technique, wherein 2D image data is converted into real world 3D coordinates based at least in part on the measured distances inputted into the graphical user interface.
Once the 3D coordinates have been determined, the 2D image domain data inputted within the polygonal zone is transformed into a 3D dense range map using an interpolation technique, which converts 2D image domain data (i.e. pixels) into a 3D look-up table so that each pixel within the image frame corresponds to real-world coordinates defined by the 3D coordinate system. After that, the same procedure can be applied to another polygonal zone defined by the user, if desired. Using the pixel features obtained from the image frame as well as parameters stored within the 3D look-up table, the physical features of one or more objects located within a region of interest may then be calculated and outputted to a user and/or other algorithms. In some embodiments, the physical features may be expressed as a physical feature vector containing those features associated with each object as well as features relating to other objects and/or static background within the image frame. If desired, the algorithm or routine can be configured to dynamically update the 3D look-up table with new or modified information for each successive image frame acquired and/or for each new region of interest defined by the user.
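The look-up idea described above can be sketched as a mapping from pixel coordinates to real-world coordinates. This is only a minimal illustration under stated assumptions: the class name `LookupTable` and the methods `set_zone` and `locate` are hypothetical and do not appear in the description, and the per-pixel 3D coordinates are assumed to have been computed already.

```python
from typing import Dict, Optional, Tuple

Pixel = Tuple[int, int]                 # (column, row) in the image frame
Point3D = Tuple[float, float, float]    # (X, Y, Z) in the 3D coordinate system


class LookupTable:
    """Hypothetical 3D look-up table: pixel -> ((X, Y, Z), zone label)."""

    def __init__(self) -> None:
        self._entries: Dict[Pixel, Tuple[Point3D, str]] = {}

    def set_zone(self, label: str, coords: Dict[Pixel, Point3D]) -> None:
        # Add or overwrite the entries for one polygonal zone, so the table
        # can be updated for a new frame or a newly defined region.
        for pixel, xyz in coords.items():
            self._entries[pixel] = (xyz, label)

    def locate(self, pixel: Pixel) -> Optional[Tuple[Point3D, str]]:
        # Return the stored coordinates and label, or None if the pixel
        # does not belong to any zone entered so far.
        return self._entries.get(pixel)
```

Calling `set_zone` again with the same label simply overwrites the affected pixels, which is one simple way to model the dynamic update mentioned above.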
An illustrative video surveillance system in accordance with an exemplary embodiment may include an image sensor, a graphical user interface adapted to display images acquired by the image sensor within an image frame and including a means for manually segmenting a polygon within the image frame defining a region of interest, and a processing means for determining 3D reference coordinates for one or more points on the polygon. The processing means may include a microprocessor/CPU or other suitable processor adapted to run an algorithm or routine for geometrically fusing 2D image data measured and inputted into the graphical user interface into 3D coordinates corresponding to the real world coordinates of the region of interest.
The following description should be read with reference to the drawings, in which like elements in different drawings are numbered in like fashion. The drawings, which are not necessarily to scale, depict selected embodiments and are not intended to limit the scope of the invention. Although examples of various programming and operational steps are illustrated in the various views, those skilled in the art will recognize that many of the examples provided have suitable alternatives that can be utilized.
The computer 18 can include software and/or hardware adapted to process real-time images received from one or more of the image sensors 12,14,16 to detect the occurrence of a particular event. In certain embodiments, and as further described below with respect to
Once one or more image frames have been acquired by an image sensor 12,14,16, the user may next input various parameters relating to at least one region of interest to be monitored by the surveillance system 10, as indicated generally by block 30. The selection of one or more regions of interest, where the 3D range information is desired, can be accomplished using a manual segmentation process on the image frame, wherein the computer 18 prompts the user to manually select a number of points using the graphical user interface 24 to define a closed polygon structure that outlines the particular region of interest. In certain techniques, for example, the computer 18 may prompt the user to select at least three separate reference points on the graphical user interface 24 to define a particular region of interest such as a road, parking lot, building, security gate, tree line, sky, or other desired geo-location. The context information for each region of interest selected can then be represented on the graphical user interface 24 as a closed polygonal line, a closed curved line, or a combination of the two. The polygonal lines and/or curves may be used to demarcate the outer boundaries of a planar or non-planar region of interest, forming a polygonal zone wherein all of the pixels within the zone represent a single context class (e.g. “road”, “building”, “parking lot”, “tree line”, “sky”, etc.). Typically, at least three reference points are required to define a polygonal zone, although a greater number of points may be used for selecting more complex regions on the graphical user interface 24, if desired.
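One way to decide which pixels belong to a polygonal zone is a ray-casting point-in-polygon test. The sketch below is an illustration only: the description does not name a specific test, and the function names `point_in_polygon` and `pixels_in_zone`, as well as the choice to test pixel centers, are assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the closed polygon."""
    if len(polygon) < 3:
        raise ValueError("a polygonal zone needs at least three reference points")
    x, y = point
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def pixels_in_zone(polygon: List[Point], width: int, height: int) -> List[Tuple[int, int]]:
    """List the (column, row) pixels whose centers fall inside the zone."""
    return [
        (col, row)
        for row in range(height)
        for col in range(width)
        if point_in_polygon((col + 0.5, row + 0.5), polygon)
    ]
```

All pixels returned for a zone would then share that zone's context class label.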
Once the user has performed manual segmentation and defined a polygonal zone graphically representing the region of interest, the algorithm or routine 26 may next prompt the user to set-up a 3D camera coordinate system that can be utilized to determine the distance of the image sensor from each reference point selected on the graphical user interface 24, as indicated generally by block 32. An illustrative step 32 showing the establishment of a 3D camera coordinate system may be understood by reference to FIG. 3, which shows a 3D camera coordinate system 34 for a planar polygonal zone 36 defined by four reference points R1, R2, R3, and R4. As shown in
To measure the distances D1, D2, D3, and D4 from the image sensor 40 to each of the four reference points R1, R2, R3, and R4, the user may first measure the distance from one of the reference points to the image sensor 40 using a laser range finder or other suitable instrument, measure the distance from that reference point to another reference point, and then measure the distance from the other reference point back to the image sensor 40. The process may then be repeated for every pair of reference points.
In one illustrative embodiment, such a process may include the steps of measuring the distance D2 between the image sensor 40 and reference point R2, measuring the distance D2-4 between reference point R2 and another reference point such as R4, and then measuring the distance D4 from reference point R4 back to the origin 38 of the image sensor 40. Using the measured distances D2, D4, and D2-4, a triangle 42 can then be displayed on the graphical user interface 24 along with the pixel coordinates of each reference point R2, R4 forming that triangle 42. A similar process can then be performed to determine the pixel coordinates for the other pairs of reference points (R1 and R3, R1 and R2, and R4 and R3), producing three additional triangles that, in conjunction with triangle 42, form a polyhedron having a vertex located at the origin 38 and a base representing the planar polygonal zone 36.
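When all three sides of a triangle such as triangle 42 have been measured, the included angle at the camera origin follows from the Law of Cosines. The sketch below illustrates that step only; the function name `included_angle` and the input checks are assumptions rather than part of the described routine.

```python
import math


def included_angle(d_i: float, d_j: float, d_ij: float) -> float:
    """Angle in radians at the camera origin between the rays to two
    reference points, given the two camera-to-point distances (d_i, d_j)
    and the point-to-point distance (d_ij)."""
    if min(d_i, d_j, d_ij) <= 0:
        raise ValueError("all measured distances must be positive")
    if d_ij > d_i + d_j or d_ij < abs(d_i - d_j):
        raise ValueError("measurements violate the triangle inequality")
    cos_theta = (d_i ** 2 + d_j ** 2 - d_ij ** 2) / (2.0 * d_i * d_j)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_theta)))
```

For example, `included_angle(d2, d4, d2_4)` would take the three measured sides D2, D4, and D2-4; rejecting inconsistent measurements early can help catch data-entry mistakes.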
In an alternative technique, the distance to two points and their included angle from the camera can be measured. The angle can be determined using a protractor or other suitable instrument for measuring the angle θ between the two reference points R2 and R4 from the image sensor 40 instead of determining the distance D2-4 between those two points. This situation may arise, for example, when one of the reference points is not easily accessible. A laser range finder or other suitable instrument can be utilized to measure the distances D2 and D4 between each of the reference points R2 and R4 and the origin 38. A similar process can then be performed to determine the pixel coordinates of the other reference points R1 and R3, R1 and R2, and R4 and R3.
In some cases where the camera is installed very high or is otherwise inaccessible, where one of the reference points (e.g. R2) on the ground is inaccessible, and/or where the other reference point (e.g. R4) is accessible, a protractor or other suitable instrument located at R4 can then be used to measure the angle θ between the reference point R2 and the origin 38 at R4. A laser range finder or other suitable instrument can then be utilized to measure the distances D2-4 and D4.
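Both angle-based alternatives reduce to finding the third side of a triangle from two measured sides and the angle between them. The sketch below illustrates this with the Law of Cosines; the function name `third_side` is hypothetical, and angles are assumed to be in radians.

```python
import math


def third_side(side_1: float, side_2: float, included_angle_rad: float) -> float:
    """Length of the side opposite the included angle (Law of Cosines)."""
    if side_1 <= 0 or side_2 <= 0:
        raise ValueError("side lengths must be positive")
    if not 0.0 < included_angle_rad < math.pi:
        raise ValueError("included angle must lie strictly between 0 and pi")
    return math.sqrt(
        side_1 ** 2 + side_2 ** 2
        - 2.0 * side_1 * side_2 * math.cos(included_angle_rad)
    )
```

With the angle θ measured at the image sensor, `third_side(d2, d4, theta)` would give D2-4. With the angle measured at R4 instead, `third_side(d2_4, d4, angle_at_r4)` would give the distance D2 to the inaccessible point.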
Referring now to
Given a plane ABC passing through point “A” parallel with the image plane, the plane ABC can be seen to intersect with $\overline{OB_0}$ at point “B” and with $\overline{OC_0}$ at point “C”. The plane ABC is thus perpendicular with the optical axis Z in
As can be further seen in
The lengths of $\overline{A'B'}$, $\overline{P'A'}$, and $\overline{P'B'}$ can all be obtained from the 2D image coordinates acquired from the image sensor 40, allowing the above expressions to be rewritten generally as:
$$m = k_1 t, \quad n = k_2 t. \tag{2}$$
Since PA⊥OP and PB⊥OP, the Pythagorean Theorem can be applied as follows:
$$b^2 - z^2 = m^2, \quad a^2 - z^2 = n^2. \tag{3}$$
Substituting the expressions for “m” and “n” from (2) into (3) thus yields:
$$b^2 - z^2 = k_1^2 t^2, \quad a^2 - z^2 = k_2^2 t^2. \tag{4}$$
Subtraction of the two expressions above in (4) yields:
$$a^2 - b^2 = k_3 t^2, \quad k_3 = k_2^2 - k_1^2. \tag{5}$$
For ΔOAB, according to the Law of Cosines:
$$a^2 + b^2 - 2ab\cos(\angle AOB) = t^2. \tag{6}$$
Substituting “t” in (6) above by “a” and “b” in (5) thus yields the following equation that can be used to solve for “b”:
$$(1 + k_4)b^2 - 2k_5 ab + (1 - k_4)a^2 = 0; \tag{7}$$
where:
$k_4 = 1/k_3$; and
$k_5 = \cos(\angle AOB)$.
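Equation (7) is a quadratic in “b” once “a”, k4, and k5 are known, so it can be solved with the standard quadratic formula. The sketch below is an illustration under assumptions: the function name `solve_for_b` is hypothetical, and because the description does not state how a root is selected, the sketch simply returns every positive real root.

```python
import math
from typing import List


def solve_for_b(a: float, k4: float, k5: float) -> List[float]:
    """Positive real roots b of (1 + k4) b^2 - 2 k5 a b + (1 - k4) a^2 = 0."""
    qa = 1.0 + k4
    qb = -2.0 * k5 * a
    qc = (1.0 - k4) * a * a
    if abs(qa) < 1e-12:
        # Degenerate case (k4 close to -1): the equation becomes linear in b.
        if abs(qb) < 1e-12:
            return []
        roots = [-qc / qb]
    else:
        discriminant = qb * qb - 4.0 * qa * qc
        if discriminant < 0.0:
            return []  # no real solution for these inputs
        root = math.sqrt(discriminant)
        roots = [(-qb + root) / (2.0 * qa), (-qb - root) / (2.0 * qa)]
    # Only positive values are meaningful as a distance.
    return sorted(r for r in roots if r > 0.0)
```

If two positive roots are returned, some additional constraint (for instance a rough estimate of the distance) would be needed to pick between them; that choice is outside this sketch.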
With respect to the 3D coordinates (X,Y,Z) at point “A” in
$$t = \sqrt{a^2 + b^2 - 2ab\cos(\angle AOB)}. \tag{8}$$
Using (2) above, the value of “n” can then be obtained from the following expression:
$$n = k_2 t. \tag{9}$$
From this, the value of “Z” can then be calculated based on (3) above, as follows:
$$Z = \sqrt{a^2 - n^2}. \tag{10}$$
Once the value of “Z” has been obtained from the above steps, the X and Y coordinates for point “A” can then be computed.
As further shown in
$$x = L_{AP}\cos(\alpha), \quad y = L_{AP}\sin(\alpha); \tag{11}$$
where:
$L_{AP}$ is the length of $\overline{AP}$ and is a known value; and
$\alpha$ is the angle between $\overline{AP}$ and the X axis.
Since ΔOPA is similar to ΔOP′A′, $\overline{AP}$ is thus parallel with $\overline{P'A'}$, and the orientation of $\overline{AP}$ is the same as the orientation of $\overline{P'A'}$. Accordingly, the slope of $\overline{P'A'}$ can be obtained from the 2D image data received from the image sensor 40 based on the following equation:
$$\alpha = \arctan(y_{pixel}/x_{pixel}); \tag{12}$$
where:
$y_{pixel}$ is the y pixel coordinate for point A′; and
$x_{pixel}$ is the x pixel coordinate for point A′.
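Equations (8) through (12) can be chained to obtain (X, Y, Z) for point “A”. The sketch below is illustrative only and rests on several assumptions: the function name `point_a_coordinates` is hypothetical; the length of AP is taken to equal “n” from (9); the pixel coordinates of A′ are taken relative to P′; and `math.atan2` is used in place of a plain arctangent so the quadrant is preserved.

```python
import math
from typing import Tuple


def point_a_coordinates(
    a: float, b: float, angle_aob: float, k2: float,
    x_pixel: float, y_pixel: float,
) -> Tuple[float, float, float]:
    """Return (X, Y, Z) of point A from the quantities in (8)-(12)."""
    # (8): distance between A and B from the Law of Cosines
    t = math.sqrt(a * a + b * b - 2.0 * a * b * math.cos(angle_aob))
    # (9): n from the image-derived ratio k2
    n = k2 * t
    # (10): depth along the optical axis
    radicand = a * a - n * n
    if radicand < 0.0:
        raise ValueError("inconsistent inputs: a^2 - n^2 is negative")
    z = math.sqrt(radicand)
    # (12): orientation of P'A' (assumption: atan2 instead of arctan)
    alpha = math.atan2(y_pixel, x_pixel)
    # (11): X and Y (assumption: the length of AP equals n)
    return n * math.cos(alpha), n * math.sin(alpha), z
```

The same function could be applied to the other reference points by swapping in their measured distances and image ratios.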
In similar fashion, the values of points B0 and C0 can be processed to obtain their 3D coordinates (X,Y,Z) in 3D space. If desired, more points can be processed by the same method to determine the 3D coordinates for other regions detected by the image sensor 40. After computing the 3D coordinates for each of the reference points, an interpolation method can then be utilized to calculate the 3D coordinates for all of the pixels within the region.
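The description does not say which interpolation method is used. As one possible stand-in, the sketch below applies linear barycentric interpolation inside a triangle of three reference points; the function names are hypothetical, and linear interpolation in pixel space is only an approximation under perspective projection.

```python
from typing import Sequence, Tuple

Pixel = Tuple[float, float]
Point3D = Tuple[float, float, float]


def barycentric_weights(p: Pixel, a: Pixel, b: Pixel, c: Pixel) -> Tuple[float, float, float]:
    """Barycentric weights of pixel p with respect to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if abs(det) < 1e-12:
        raise ValueError("reference pixels are collinear")
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w_a, w_b, 1.0 - w_a - w_b


def interpolate_3d(p: Pixel, pixels: Sequence[Pixel], points: Sequence[Point3D]) -> Point3D:
    """Blend the 3D coordinates of three reference points for pixel p."""
    weights = barycentric_weights(p, pixels[0], pixels[1], pixels[2])
    x, y, z = (
        sum(weights[i] * points[i][axis] for i in range(3))
        for axis in range(3)
    )
    return (x, y, z)
```

A zone with more than three reference points could be split into triangles and each pixel interpolated within the triangle that contains it.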
Referring back to
Once the geo-location of each object within the polygonal zone 36 has been determined at step 44 in
An illustrative step 50 of transforming 2D image domain data into a 3D look-up table 52 may be understood by reference to
The 3D look-up table 52 may include parameter blocks 60 from multiple ROIs located within an image frame 56. In certain embodiments, for example, the 3D look-up table 52 may include a first number of parameter blocks 60a representing a first ROI in the image frame 56 (e.g. a parking lot), and a second number of parameter blocks 60b representing a second ROI in the image frame 56 (e.g. a building entranceway). In certain embodiments, the 3D look-up table 52 can include parameter blocks 60 for multiple image frames 56 acquired either from a single image sensor, or from multiple image sensors. If, for example, the surveillance system comprises a multi-sensor surveillance system similar to that described above with respect to
Using the pixel features obtained from the image frame 56 as well as the parameter blocks 60 stored within the 3D look-up table 52, the physical features of one or more objects located within an ROI may then be calculated and outputted to the user and/or other algorithms, as indicated generally by blocks 62 and 64 in
The physical features may be expressed as a feature vector containing those features associated with the tracked object as well as features relating to other objects and/or static background within the image frame 56. In certain embodiments, for example, the feature vector may include information regarding the object's velocity, trajectory, starting position, ending position, path length, path distance, aspect ratio, orientation, height, and/or width. Other information such as the classification of the object (e.g. “individual”, “vehicle”, “animal”, “inanimate”, “animate”, etc.) may also be provided. The physical features can be outputted as raw data in the 3D look-up table 52, as graphical representations of the object via the graphical user interface 24, or as a combination of both, as desired.
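A feature vector of this kind can be sketched as a small record built from an object's real-world positions over time. The sketch is an illustration only: the `FeatureVector` fields are a subset of the features listed above, the names `FeatureVector` and `build_feature_vector` are hypothetical, and the timestamped 3D positions are assumed to come from looking up the tracked object's pixels in the 3D look-up table.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class FeatureVector:
    start: Point3D
    end: Point3D
    path_length: float      # sum of segment lengths, in world units
    mean_speed: float       # path_length divided by elapsed time
    classification: str


def build_feature_vector(
    track: List[Tuple[float, Point3D]], classification: str = "unknown"
) -> FeatureVector:
    """Build a feature vector from (timestamp, position) samples."""
    if len(track) < 2:
        raise ValueError("at least two samples are needed")
    elapsed = track[-1][0] - track[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must increase")
    # Accumulate the distance travelled between consecutive samples.
    path_length = sum(
        math.dist(track[i][1], track[i + 1][1]) for i in range(len(track) - 1)
    )
    return FeatureVector(
        start=track[0][1],
        end=track[-1][1],
        path_length=path_length,
        mean_speed=path_length / elapsed,
        classification=classification,
    )
```

Further fields such as height, width, or orientation could be added in the same way once the corresponding measurements are available.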
In certain embodiments, and as further indicated by line 66 in
Turning now to
The CAMERA POSITION section 76 of the graphical user interface 68 can be configured to display a frame 78 showing the 3D camera coordinate system to be applied to the image sensor as well as a status box 80 indicating the current position of the image sensor within the coordinate system. In the illustrative view of
Once the positioning of the image sensor has been selected via the CAMERA POSITION section 76, the user may select a “Done” button 90, causing the surveillance system to accept the selected position. Once button 90 has been selected, the graphical user interface 68 can be configured to prompt the user to enter various parameter values into a VALUE INPUT section 92 of the display screen 70, as shown in a second view in
To define an ROI on the image frame 74, the user may select a “Point” button 100 on the VALUE INPUT section 92, and then select at least four reference points on the image frame 74 to define the outer boundaries of the ROI. In the illustrative view of
Once a polygonal zone 102 is defined on the image frame 74, the user may then assign a name and region type to the zone 102 using the REGION NAME and REGION TYPE text boxes 96,98. After entering the text of the region name and type within these text boxes 96,98, the user may then select an “Add” button 104, causing the graphical user interface 68 to display a still image 106 of the scene in the CAMERA POSITION section 76 along with a polyhedron 108 formed by drawing lines between the camera origin “V” and at least four selected reference points of the polygonal zone 102, as shown in a third view in
A FACET INPUT section 114 of the graphical user interface 68 can be configured to receive values for the various sides of the polyhedron 108, which can later be used to form a 3D look-up table that correlates pixel coordinates in the image frame 74 with physical features in the image sensor's field of view. The FACET INPUT section 114 can be configured to display the various sides and/or angles forming the polyhedron 108 in tabular form, and can include an icon tab 116 indicating the name (i.e. “First”) of the current ROI that is selected. With the INPUT MODE selection box 92 set to “Side only” mode, as shown in
Alternatively, and in other embodiments, the user may select the “Angle & Side” button on the INPUT MODE frame 92 to calculate the coordinates of each reference point using both angle and side measurements. In certain embodiments, and also as described above with respect to
Once the values for each region of interest are entered via the FACET INPUT section 114, the user may then select a “3D_CAL” button 134, causing the surveillance system to create a 3D dense range map containing the feature vectors for that region of interest. In certain embodiments, for example, selection of the “3D_CAL” button 134 may cause the surveillance system to create a 3D look-up table similar to that described above with respect to
Once the 2D image domain data has been transformed into a 3D look-up table, the graphical user interface 68 can then output the table to a file for subsequent use by the surveillance system. The graphical user interface 68 can be configured to prompt the user whether to save a file containing the 3D look-up table data, as indicated by reference to window 136 in
Having thus described the several embodiments of the present invention, those of skill in the art will readily appreciate that other embodiments may be made and used which fall within the scope of the claims attached hereto. Numerous advantages of the invention covered by this document have been set forth in the foregoing description. It will be understood that this disclosure is, in many respects, only illustrative. Changes can be made with respect to various elements described herein without exceeding the scope of the invention.
The present application is a continuation-in-part of U.S. patent application Ser. No. 10/907,877, entitled “Systems and Methods for Transforming 2D Image Domain Data Into A 3D Dense Range Map”, filed on Apr. 19, 2005.
Relation | Number | Date | Country
---|---|---|---
Parent | 10907877 | Apr 2005 | US
Child | 11213527 | Aug 2005 | US