1. Field of Invention
The present invention relates in general to the video processing field. More particularly, the present invention relates to a method, system and program product for a plurality of cameras to track an object using motion vector data.
2. Background Art
Video cameras are increasingly used to monitor sites for various purposes including security, surveillance, tracking, and reconnaissance. Because of the increased interest in security since the terrorist attacks of Sep. 11, 2001; extended business hours in stores, offices and factories; increased numbers of unattended facilities; and increased use of traffic monitoring, the demand for monitoring by video cameras is growing rapidly. Typically, one or more video cameras are located in areas to be monitored. The images taken by the video cameras are typically viewed and/or recorded at one or more monitoring stations, which may be remote from the areas to be monitored. The video cameras may be fixed and/or mobile. Similarly, the monitoring stations may be fixed and/or mobile. For example, video cameras may be fixedly mounted at several locations of an airport, e.g., along walkways, perimeter fences, runways, and gates. The images taken by the video cameras at the airport locations may be monitored at one or more monitoring stations. A fixedly-mounted video camera may have the ability to pan, tilt, and zoom its current field of view within an overall field of view. Alternatively, video cameras may be mounted for mobility on one or more reconnaissance aircraft or other vehicle, with each such aircraft or other vehicle traveling to cover a reconnaissance area. The images taken by the video cameras within the reconnaissance areas may be monitored at one or more monitoring stations. In addition to the mobility provided by the vehicle, a vehicle-mounted video camera may have the ability to pan, tilt, and zoom its current field of view within an overall field of view.
Typically, the ability to pan, tilt, and zoom a video camera is controlled by an operator in a monitoring station. The operator may notice or be alerted that an event of interest, e.g., unauthorized activity, has occurred in an area being monitored. The alert may be generated by a motion detecting apparatus using the output of the video camera or some other detector or sensor. For example, it is conventional to use a motion detecting apparatus that detects motion using motion vectors generated from the output of a video camera. Once aware that an event of interest has occurred, the operator may then cause the video camera covering the area in which the event has occurred to pan, tilt, and zoom to follow or track an object associated with the event. Typically, the operator controls the video camera to track the object using an input device, such as a mouse or joystick, which causes transmission of pan, tilt, and zoom adjustment commands to a pan, tilt, and zoom adjustment mechanism associated with the video camera. Because this is an open-loop system, tracking the object is difficult and requires a skilled operator. The difficulty increases as the object moves out of the area being monitored by the video camera and into another area being monitored by a second video camera.
According to the preferred embodiments, a method, system and program product use motion vector data to track an object moving between areas being monitored by a plurality of video cameras. Motion vector data are used to predict whether an object in a first field of view covered by a first camera system will enter a second field of view covered by a second camera system. For example, motion vector data may be provided to a motion tracking processor of the first camera system at a macroblock level by an MPEG compression processor of the first camera system. Alternatively, motion vector data may be provided to a motion tracking processor of the first camera system at a pixel level by a pre-processor of the first camera system. If the prediction is that the object will enter the second field of view, tracking data are provided to the second camera system. The tracking data provided to the second camera system may include pan, tilt and/or zoom adjustment data, which may be provided to a PTZ adjustment mechanism of the second camera system, for example. Alternatively, or in addition, the tracking data provided to the second camera system may include pan/tilt motion vector data, zoom factor data and/or shrinkage/expansion data, which are provided to a motion tracking processor of the second camera system. Pan, tilt and/or zoom adjustment data may also be provided to a PTZ adjustment mechanism of the first camera system irrespective of the prediction of whether the object will enter the second field of view. Because the preferred embodiments use a closed loop system, tracking the object is made easier and does not require a skilled operator even as the object moves between areas being monitored by different cameras.
The foregoing and other features and advantages of the invention will be apparent from the following more particular description of the preferred embodiments of the invention, as illustrated in the accompanying drawings.
The preferred exemplary embodiments of the present invention will hereinafter be described in conjunction with the appended drawings, where like designations denote like elements.
1. Overview
A method, system and program product in accordance with the preferred embodiments use motion vector data to track an object moving between areas being monitored by a plurality of video cameras. Motion vector data are used to predict whether an object in a first field of view covered by a first camera system will enter a second field of view covered by a second camera system. For example, motion vector data may be provided to a motion tracking processor of the first camera system at a macroblock level by an MPEG compression processor of the first camera system. Alternatively, motion vector data may be provided to a motion tracking processor of the first camera system at a pixel level by a pre-processor of the first camera system. If the prediction is that the object will enter the second field of view, tracking data are provided to the second camera system. The tracking data provided to the second camera system may include pan, tilt and/or zoom adjustment data, which may be provided to a PTZ adjustment mechanism of the second camera system, for example. Alternatively, or in addition, the tracking data provided to the second camera system may include pan/tilt motion vector data, zoom factor data and/or shrinkage/expansion data, which are provided to a motion tracking processor of the second camera system. Pan, tilt and/or zoom adjustment data may also be provided to a PTZ adjustment mechanism of the first camera system irrespective of the prediction of whether the object will enter the second field of view.
Because the preferred embodiments use a closed loop system, tracking the object is made easier and does not require a skilled operator, even as the object moves between areas being monitored by different cameras. The tracking data, including handoff information between cameras, are generated by the system in response to the behavior of the motion vector fields created by the object being tracked. This is in stark contrast with prior art open loop systems, which require the operator to manually generate PTZ adjustment data by manipulating an input device separately for each camera.
2. Single Camera Embodiment
The advantages of the preferred embodiments are best understood by first considering a method, system and program product for a single camera to track an object using motion vector data. Referring to
A video data processor 20 receives the output 14 of video camera 12 and provides digital video data including motion vector data 22 for an object in the field of view based on the sequence of video fields provided by video camera 12. The output 14 of video camera 12 may be provided to video data processor 20 via any type of connection, including wireless. Video data processor 20 may be separate from video camera 12 as shown in
The MPEG standard is a well known video compression standard. Within the MPEG standard, video compression is defined both within a given frame (also referred to herein as a “field”) and between frames. Video compression within a frame, i.e., spatial compression, is accomplished via a process of discrete cosine transformation, quantization, and run length encoding. Video compression between frames, i.e., temporal compression, is accomplished via a process referred to as motion estimation, in which a motion vector is used to describe the translation of a set of picture elements (pels) from one frame to another. These motion vectors track the movement of like pixels from frame to frame typically at a macroblock level. A macroblock is composed of 16×16 pixels or 8×8 pixels. The movement is broken down into mathematical vectors which identify the direction and distance traveled between video frames. These motion vectors are themselves typically encoded. A pre-processor is typically necessary to provide motion vector data at a sub-macroblock level, e.g., at a pixel level.
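By way of illustration only (this sketch is not part of the specification), an object's frame-to-frame displacement might be estimated by averaging the decoded macroblock motion vectors that overlap the object's bounding region. The function name, the dictionary layout, and the 16×16 macroblock size are assumptions; a real MPEG decoder would supply the motion vector data.

```python
# Hypothetical sketch: estimate an object's frame-to-frame displacement by
# averaging the macroblock motion vectors that fall inside the object's
# bounding box. Motion vectors are assumed to be given in pixels as (dx, dy)
# per 16x16 macroblock, keyed by (column, row) macroblock index.

def object_motion_vector(motion_vectors, bbox, macroblock_size=16):
    """motion_vectors: dict mapping (col, row) -> (dx, dy).
    bbox: (x0, y0, x1, y1) object bounding box in pixels.
    Returns the mean (dx, dy) of macroblocks overlapping the box."""
    x0, y0, x1, y1 = bbox
    dxs, dys = [], []
    for (col, row), (dx, dy) in motion_vectors.items():
        mb_x, mb_y = col * macroblock_size, row * macroblock_size
        # Keep the macroblock if its 16x16 footprint overlaps the box.
        if mb_x < x1 and mb_x + macroblock_size > x0 and \
           mb_y < y1 and mb_y + macroblock_size > y0:
            dxs.append(dx)
            dys.append(dy)
    if not dxs:
        return (0.0, 0.0)
    return (sum(dxs) / len(dxs), sum(dys) / len(dys))
```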
A motion tracking processor 24 receives motion vector data 22 from video data processor 20. In addition, motion tracking processor 24 may receive other digital video data from video data processor 20. The motion vector data 22 (and any other digital video data) from video data processor 20 may be provided to motion tracking processor 24 via any type of connection, including wireless. The motion tracking processor 24 may be separate from video data processor 20 as shown in
The motion tracking processor 24 will now be described with reference to
Note that system processor 101 shown in
Main memory 102 in accordance with the preferred embodiments contains data 116, an operating system 118, and a tracking mechanism 120 that will be described in detail below. While the tracking mechanism 120 is shown separate and discrete from operating system 118 in
Computer system 100 utilizes well known virtual addressing mechanisms that allow the programs of computer system 100 to behave as if they have access to a large, single storage entity instead of access to multiple, smaller storage entities such as main memory 102 and DASD device 112. Therefore, while data 116, operating system 118, and tracking mechanism 120 are shown to reside in main memory 102, those skilled in the art will recognize that these items are not necessarily all completely contained in main memory 102 at the same time. It should also be noted that the term “memory” is used herein to generically refer to the entire virtual memory of the computer system 100, including a memory on digital video surveillance card 109.
Data 116 represents any data that serves as input to or output from any program in computer system 100. Operating system 118 is a multitasking operating system known in the industry as OS/400; however, those skilled in the art will appreciate that the spirit and scope of the present invention is not limited to any one operating system.
Processor 101 may be constructed from one or more microprocessors and/or integrated circuits. Processor 101 executes program instructions stored in main memory 102. In addition, if motion tracking processor 24 (shown in
Although computer system 100 is shown to contain only a single processor and a single system bus, those skilled in the art will appreciate that the present invention may be practiced using a computer system that has multiple processors and/or multiple buses. In addition, the interfaces that are used in the preferred embodiments each include separate, fully programmed microprocessors that are used to off-load compute-intensive processing from processor 101. However, those skilled in the art will appreciate that the present invention applies equally to computer systems that simply use I/O adapters to perform similar functions.
Display interface 106 is used to directly connect one or more displays 122 to computer system 100. These displays 122, which may be non-intelligent (i.e., dumb) terminals or fully programmable workstations, are used to allow system administrators and users (also referred to herein as “operators”) to communicate with computer system 100. Note, however, that while display interface 106 is provided to support communication with one or more displays 122, computer system 100 does not necessarily require a display 122, because all needed interaction with users and processes may occur via network interface 108.
Network interface 108 is used to connect other computer systems and/or workstations (e.g., 124 in
Alternatively, an I/O adapter may be used to connect PTZ adjustment mechanism 16 and/or video data processor 20 to computer system 100. In addition, if video data processor 20 is resident on digital video surveillance card 109 or elsewhere in computer system 100, an I/O adapter may also be used to connect video camera 12 to computer system 100. For example, video camera 12, PTZ adjustment mechanism 16 and/or video data processor 20 may be connected to computer system 100 using an I/O adapter on digital video surveillance card 109. In a variation of this alternative, a system I/O adapter may be used to connect video camera 12, PTZ adjustment mechanism 16 and/or video data processor 20 to computer system 100 through system bus 110.
At this point, it is important to note that while the present invention has been and will be described in the context of a fully functional computer system, those skilled in the art will appreciate that the present invention is capable of being distributed as a program product in a variety of forms, and that the present invention applies equally regardless of the particular type of signal bearing media used to actually carry out the distribution. Examples of suitable signal bearing media include: recordable type media such as floppy disks and CD ROM (e.g., 114 of
Referring to
Method 300 continues by computing camera pan and tilt adjustment data (step 325) and computing camera zoom adjustment data (step 330). Steps 325 and 330 are described in more detail below with reference to
Having calculated the pan, tilt and/or zoom adjustment data, method 300 proceeds (step 335) to send the pan, tilt and/or zoom adjustment data to PTZ adjustment mechanism 16 (shown in
Steps 325 and 330 are best understood by initially describing the camera field of view and the camera field of view progression, as well as defining the variables and formulas used in the calculation of the pan, tilt and/or zoom adjustment data.
Variables and formulas used in the calculation of the pan, tilt and/or zoom adjustment data in accordance with the preferred embodiments are described below. Initially, variables and formulas used in the calculation of pan and tilt adjustment data are described. Thereafter, variables and formulas used in the calculation of zoom adjustment data are described.
The following variables are used in the calculation of pan and tilt adjustment data: $loc_{oc}(n)$, the location of the object center in field (n) relative to the field origin; $mv_{camera}(n)$, the camera motion vector applied for field (n); $mv_{oc}(n)$, the object-center motion vector from field (n−1) to field (n); $mv_{oc\text{-}rel}[(n-1)(n)]$, the relative object motion vector from field (n−1) to field (n); $mv_{oc\text{-}accel}[(n-2)(n)]$, the object acceleration component from field (n−2) to field (n); and ISF, a scale factor applied when only one field of data is available.
Formulas used in the calculation of pan and tilt adjustment data according to the preferred embodiments are now described below. In accordance with the preferred embodiments, the formulas used in the calculation of pan and tilt adjustment data vary depending on the number of field(s) of data available. When three fields of data are available (i.e., current field (n), reference #1 field (n−1), and reference #2 field (n−2)), the object trajectory is fully characterized. In this case, the camera pan and tilt calculations assume the object travels at constant velocity or acceleration.
$mv_{camera}(n+1) = loc_{oc}(n) + mv_{oc\text{-}rel}[(n-1)(n)] + mv_{oc\text{-}accel}[(n-2)(n)]$;
$mv_{oc\text{-}rel}[(n-1)(n)] = loc_{oc}(n) - loc_{oc}(n-1) + mv_{camera}(n)$;
$mv_{oc\text{-}accel}[(n-2)(n)] = mv_{oc\text{-}rel}[(n-1)(n)] - mv_{oc\text{-}rel}[(n-2)(n-1)]$
$= [loc_{oc}(n) - loc_{oc}(n-1) + mv_{camera}(n)] - [loc_{oc}(n-1) - loc_{oc}(n-2) + mv_{camera}(n-1)]$;
$mv_{camera}(n+1) = loc_{oc}(n) + [loc_{oc}(n) - loc_{oc}(n-1) + mv_{camera}(n)] + \{[loc_{oc}(n) - loc_{oc}(n-1) + mv_{camera}(n)] - [loc_{oc}(n-1) - loc_{oc}(n-2) + mv_{camera}(n-1)]\}$
$= 3 \times [loc_{oc}(n) - loc_{oc}(n-1)] + loc_{oc}(n-2) + [2 \times mv_{camera}(n)] - mv_{camera}(n-1)$;
$mv_{oc}(n) = loc_{oc}(n) - loc_{oc}(n-1)$;
$mv_{camera}(n+1) = [3 \times mv_{oc}(n)] + loc_{oc}(n-2) + [2 \times mv_{camera}(n)] - mv_{camera}(n-1)$.
When two fields of data are available (i.e., current field (n), and reference #1 field (n−1)), the object trajectory is partially characterized. In this case, the camera pan and tilt calculations assume the object travels at constant velocity and follows a linear path.
$mv_{camera}(n+1) = loc_{oc}(n) + mv_{oc\text{-}rel}[(n-1)(n)]$;
$mv_{oc\text{-}rel}[(n-1)(n)] = loc_{oc}(n) - loc_{oc}(n-1) + mv_{camera}(n)$;
$mv_{camera}(n+1) = loc_{oc}(n) + [loc_{oc}(n) - loc_{oc}(n-1) + mv_{camera}(n)]$;
$mv_{oc}(n) = loc_{oc}(n) - loc_{oc}(n-1)$;
$mv_{camera}(n+1) = [2 \times mv_{oc}(n)] + loc_{oc}(n-1) + mv_{camera}(n)$.
When one field of data is available (i.e., current field (n)), the object trajectory is unknown. In this case, the camera pan and tilt calculations reposition the video camera along a linear trajectory defined by the current object center location and current field origin.
$mv_{camera}(n+1) = ISF \times loc_{oc}(n)$;
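The three cases above can be illustrated with the following Python sketch, applied per axis (once for pan/x and once for tilt/y). The list-based interface, function name, and default ISF value are assumptions for illustration, not part of the specification.

```python
# Hypothetical sketch of the pan/tilt prediction formulas above, one axis
# at a time. "loc" holds object-center locations relative to the field
# origin, most recent last; "mv_cam" holds prior camera motion vectors.

def predict_camera_mv(loc, mv_cam, isf=0.5):
    """Returns mv_camera(n+1) for this axis given available history."""
    if len(loc) >= 3:
        # Three fields: constant velocity or acceleration.
        mv_oc = loc[-1] - loc[-2]
        return 3 * mv_oc + loc[-3] + 2 * mv_cam[-1] - mv_cam[-2]
    if len(loc) == 2:
        # Two fields: constant velocity along a linear path.
        mv_oc = loc[-1] - loc[-2]
        return 2 * mv_oc + loc[-2] + mv_cam[-1]
    # One field: step toward the object center along the line from the
    # field origin, scaled by ISF.
    return isf * loc[-1]
```

For example, with loc = [4.0, 6.0, 9.0] and mv_cam = [1.0, 2.0], the three-field branch returns 3×3 + 4 + 2×2 − 1 = 16.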
Now referring to
As mentioned above,
Referring now to
$fl_{ideal}(2) = fl_{ideal}(1) - \Delta fl_{ideal}(2)$;
$\tan[\Theta(2)] = [y(1) + mv_{y\text{-}net}(2)] / fl_{ideal}(1) = mv_{y\text{-}net}(2) / \Delta fl_{ideal}(2)$;
$\Delta fl_{ideal}(2) = fl_{ideal}(1) \times \{mv_{y\text{-}net}(2) / [y(1) + mv_{y\text{-}net}(2)]\}$;
$ZF_{ideal}(2) = y(1) / [y(1) + mv_{y\text{-}net}(2)]$.
Referring now to
$fl_{ideal}(2) = fl_{ideal}(1) + \Delta fl_{ideal}(2)$;
$\tan[\Theta(2)] = [y(1) + mv_{y\text{-}net}(2)] / fl_{ideal}(1) = -mv_{y\text{-}net}(2) / \Delta fl_{ideal}(2)$;
$\Delta fl_{ideal}(2) = fl_{ideal}(1) \times \{-mv_{y\text{-}net}(2) / [y(1) + mv_{y\text{-}net}(2)]\}$;
$ZF_{ideal}(2) = y(1) / [y(1) + mv_{y\text{-}net}(2)]$.
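As a worked example of the geometry above: if y(1) = 100 pixels and the object expands so that $mv_{y\text{-}net}(2)$ = 25 pixels, the first set of formulas gives $ZF_{ideal}(2)$ = 100/125 = 0.8 and $\Delta fl_{ideal}(2)$ = 0.2 × $fl_{ideal}(1)$; the ideal focal length shortens to 0.8 × $fl_{ideal}(1)$, zooming out to restore the object's original apparent height. Conversely, a contraction of 20 pixels ($mv_{y\text{-}net}(2)$ = −20) gives $ZF_{ideal}(2)$ = 100/80 = 1.25, lengthening the focal length to zoom in.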
Additional formulas used in the calculation of zoom according to the preferred embodiments are described below.
In accordance with the preferred embodiments, formulas follow for calculating the weighted average object size $os_{wa}$; the net change in object height from field 1 to field (n), $mv_{y\text{-}net}(n)$; the net change in object width from field 1 to field (n), $mv_{x\text{-}net}(n)$; the field (n) object height in pixels, $y(n)$; and the field (n) object width in pixels, $x(n)$.
$os_{wa} = [y(1)^2 + x(1)^2] / [y(1) + x(1)]$;
$mv_{y\text{-}net}(n) = mv_{y\text{-}net}[(n-1)(n)] + mv_{y\text{-}net}(n-1)$;
$mv_{x\text{-}net}(n) = mv_{x\text{-}net}[(n-1)(n)] + mv_{x\text{-}net}(n-1)$;
$y(n) = y(1) + mv_{y\text{-}net}(n)$;
$x(n) = x(1) + mv_{x\text{-}net}(n)$.
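The running accumulation above reduces to summing the per-field boundary changes, as the following minimal Python sketch shows (the interface and names are illustrative only):

```python
# Hypothetical sketch: per-field boundary changes are summed into net
# changes, from which the current object height and width in pixels follow.

def accumulate_size(y1, x1, per_field_dy, per_field_dx):
    """per_field_dy, per_field_dx: lists of mv_y-net[(k-1)(k)] and
    mv_x-net[(k-1)(k)] for k = 2..n.
    Returns (y(n), x(n), mv_y-net(n), mv_x-net(n))."""
    mv_y_net = sum(per_field_dy)
    mv_x_net = sum(per_field_dx)
    return y1 + mv_y_net, x1 + mv_x_net, mv_y_net, mv_x_net
```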
In accordance with the preferred embodiments, the formula used to determine whether the object aspect ratio is within tolerance follows.
$[(1 + \Delta AR) \times \{y(1)/x(1)\}] \geq [y(n)/x(n)] \geq [(1 - \Delta AR) \times \{y(1)/x(1)\}]$
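A minimal Python sketch of this tolerance test follows; the function name is illustrative, and delta_ar corresponds to the user-specified tolerance ΔAR (e.g., 0.2 for ±20%):

```python
# Hypothetical sketch of the aspect-ratio tolerance test above.

def aspect_ratio_in_tolerance(y1, x1, yn, xn, delta_ar):
    """True if y(n)/x(n) stays within (1 +/- delta_ar) of y(1)/x(1)."""
    original = y1 / x1
    current = yn / xn
    return (1 - delta_ar) * original <= current <= (1 + delta_ar) * original
```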
In accordance with the preferred embodiments, formulas follow for calculating the weighted net object expansion or contraction $mv_{wa}(n)$ in field (n); the ideal zoom factor $ZF_{ideal}(n)$ for field (n) based on actual object expansion or contraction from field (n−1) to field (n); the field (n) ideal focal length $fl_{ideal}(n)$ for optimal viewing of the moving object; and the change in ideal focal length $\Delta fl_{ideal}(n)$ from field (n−1) to field (n). The ideal zoom factor $ZF_{ideal}(n)$ is calculated based on the estimated zoom factor $ZF_{est}(n)$ and a net object expansion or contraction factor. The net object expansion or contraction factor is essentially a correction factor based on $os_{wa}$, $y(1)$, $mv_{y\text{-}net}(n)$, $x(1)$, and $mv_{x\text{-}net}(n)$.
$mv_{wa}(n) = \{[y(1) \times mv_{y\text{-}net}(n)] + [x(1) \times mv_{x\text{-}net}(n)]\} / [y(1) + x(1)]$;
$ZF_{ideal}(n) = [ZF_{est}(n) \times os_{wa}] / [os_{wa} + mv_{wa}(n)]$;
$fl_{ideal}(n) = fl_{act}(n-1) \times ZF_{ideal}(n)$;
$\Delta fl_{ideal}(n) = fl_{ideal}(n) - fl_{ideal}(n-1)$.
In accordance with the preferred embodiments, the formulas used in the calculation of $\Delta fl_{est}$ vary depending on the number of field(s) of data available, similar to the calculation of pan and tilt adjustment data described above. When two fields of data are available (i.e., field 1 and field 2), the estimated field 3 focal length change $\Delta fl_{est}(3)$ is estimated based on object velocity as follows.
$\Delta fl_{est}(3) = \Delta fl_{ideal\text{-}vel}[(1)(2)]$
$= \Delta fl_{ideal}(2)$.
When three fields of data are available (i.e., field (n−2), field (n−1) and field (n)), the estimated field (n+1) focal length change $\Delta fl_{est}(n+1)$ is estimated based on object velocity and object acceleration as follows (see step 960 below).
$\Delta fl_{est}(n+1) = [2 \times \Delta fl_{ideal}(n)] - \Delta fl_{ideal}(n-1)$.
In accordance with the preferred embodiments, formulas follow for calculating the field (n) actual focal length $fl_{act}(n)$, the field (n+1) estimated focal length $fl_{est}(n+1)$ and the field (n+1) estimated zoom factor $ZF_{est}(n+1)$.
$fl_{act}(n) = fl_{act}(n-1) \times ZF_{est}(n)$;
$fl_{est}(n+1) = fl_{ideal}(n) + \Delta fl_{est}(n+1)$;
$ZF_{est}(n+1) = fl_{est}(n+1) / fl_{act}(n)$.
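The zoom formulas above chain together once per field. The following Python sketch applies them for one new field (n); the function name, argument order, and explicit threading of state between calls are illustrative assumptions:

```python
# Hypothetical sketch chaining the zoom formulas above for one field (n).
# State from field (n-1) is passed in and the updated state is returned.
# For field 2 (no acceleration history yet), step 955 of the text applies:
# pass dfl_ideal_prev equal to the computed dfl_ideal so that
# dfl_est_next reduces to the constant-velocity estimate.

def zoom_update(y1, x1, mv_y_net_n, mv_x_net_n,
                fl_act_prev, fl_ideal_prev, dfl_ideal_prev, zf_est_n):
    os_wa = (y1**2 + x1**2) / (y1 + x1)                      # weighted object size
    mv_wa = (y1 * mv_y_net_n + x1 * mv_x_net_n) / (y1 + x1)  # weighted expansion
    zf_ideal = (zf_est_n * os_wa) / (os_wa + mv_wa)
    fl_ideal = fl_act_prev * zf_ideal
    dfl_ideal = fl_ideal - fl_ideal_prev
    # Velocity-plus-acceleration estimate for the next field (step 960).
    dfl_est_next = 2 * dfl_ideal - dfl_ideal_prev
    fl_act = fl_act_prev * zf_est_n
    fl_est_next = fl_ideal + dfl_est_next
    zf_est_next = fl_est_next / fl_act      # zoom adjustment for field (n+1)
    return zf_est_next, fl_act, fl_ideal, dfl_ideal
```

The returned fl_act, fl_ideal and dfl_ideal become fl_act_prev, fl_ideal_prev and dfl_ideal_prev on the next call, mirroring the "stored for use in future calculations" steps described below.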
Referring now to
Referring now to
If the object aspect ratio is within tolerance (step 920=YES), method 900 calculates (at step 930) the variable $mv_{wa}(n) = \{[y(1) \times mv_{y\text{-}net}(n)] + [x(1) \times mv_{x\text{-}net}(n)]\} / [y(1) + x(1)]$. Next, method 900 calculates (at step 935) the variable $ZF_{ideal}(n) = [ZF_{est}(n) \times os_{wa}] / [os_{wa} + mv_{wa}(n)]$. The method 900 then calculates (at step 940) the variable $fl_{ideal}(n) = fl_{act}(n-1) \times ZF_{ideal}(n)$. The variable $fl_{ideal}(n)$ is stored for use in future calculations. Next, method 900 calculates (at step 945) the variable $\Delta fl_{ideal}(n) = fl_{ideal}(n) - fl_{ideal}(n-1)$. The variable $\Delta fl_{ideal}(n)$ is stored for use in future calculations. The method 900 proceeds to determine whether the current field (n) is field 2 (step 950). If the current field (n) is field 2 (step 950=YES), method 900 calculates (at step 955) the variable $\Delta fl_{est}(3) = \Delta fl_{ideal}(2)$. This estimate represents the constant velocity case, assuming no additional object history data are available. In the event such additional object history data are available (for example, where one camera hands off tracking data to a second camera), a focal length change estimate based on both object velocity and acceleration is possible, as described in step 960. On the other hand, if the current field (n) is not field 2 (step 950=NO), method 900 calculates (at step 960) the variable $\Delta fl_{est}(n+1) = [2 \times \Delta fl_{ideal}(n)] - \Delta fl_{ideal}(n-1)$. After either step 955 or step 960, the method 900 continues by calculating (at step 965) the variables $fl_{act}(n) = fl_{act}(n-1) \times ZF_{est}(n)$; $fl_{est}(n+1) = fl_{ideal}(n) + \Delta fl_{est}(n+1)$; and $ZF_{est}(n+1) = fl_{est}(n+1) / fl_{act}(n)$, which is the estimated zoom adjustment for field (n+1). The variables $fl_{act}(n)$ and $ZF_{est}(n+1)$ are stored for use in future calculations, and method 900 returns to method 300 (step 335 shown in
As mentioned above, the variables $mv_{y\text{-}net}[(n-1)(n)]$ and $mv_{x\text{-}net}[(n-1)(n)]$ are calculated internally by motion tracking processor 24 (shown in
$\{mv_{x\text{-}net}[(n-1)(n)],\ mv_{y\text{-}net}[(n-1)(n)]\} = \{\Sigma \Delta mv_i(x)/i,\ \Sigma \Delta mv_j(y)/j\}$.
The symbol Σ denotes summation across all relevant expansion and compression reference vectors along the object boundary. It will be appreciated by one of skill in the art that the number of reference vectors in the x and y dimensions need not be the same. The symbol Δmvi(x) denotes the difference in motion vector length between a pair of relevant reference points along the object boundary in the x dimension. On the other hand, the symbol Δmvj(y) denotes the difference in motion vector length between a pair of relevant reference points along the object boundary in the y dimension. The symbols i and j denote the number of relevant reference point pairs along the object boundary in the x and y dimensions, respectively.
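Purely as an illustration of the averaging just described, the per-dimension net change might be computed as follows; the pairing of boundary reference points and the function name are assumptions:

```python
# Hypothetical sketch of the boundary-pair averaging above: each pair holds
# the motion-vector lengths at two relevant reference points on the object
# boundary in one dimension, and the mean of the per-pair differences gives
# the net expansion (positive) or contraction (negative) in that dimension.

def net_change(reference_pairs):
    """reference_pairs: list of (mv_a, mv_b) motion-vector lengths for
    paired boundary reference points in one dimension.
    Returns the mean per-pair difference."""
    if not reference_pairs:
        return 0.0
    return sum(b - a for a, b in reference_pairs) / len(reference_pairs)

# mv_x_net_step = net_change(x_pairs)  # the x-pair count may differ
# mv_y_net_step = net_change(y_pairs)  # from the y-pair count
```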
Referring now to
The ability to continue tracking an object depends on the object maintaining its aspect ratio within a certain user-specified tolerance. This tolerance (ΔAR) may be specified in a lookup table based on the type of object being tracked.
To account for the relative amount of expansion or contraction in both the x and y dimensions, $os_{wa}$ and $mv_{wa}(n)$ are defined. The variable $os_{wa}$ represents the weighted average size of the object when viewed at the ideal focal length in field 1, where the object's width $x(1)$ and height $y(1)$ are weighted relative to the object's aspect ratio via multiplication by $\{x(1)/[x(1)+y(1)]\}$ and $\{y(1)/[x(1)+y(1)]\}$, respectively. The variable $mv_{wa}(n)$ represents the weighted average amount of object expansion or contraction in field (n), where the object's net change in width $mv_{x\text{-}net}(n)$ and height $mv_{y\text{-}net}(n)$ are weighted relative to the object's original aspect ratio via multiplication by $\{x(1)/[x(1)+y(1)]\}$ and $\{y(1)/[x(1)+y(1)]\}$, respectively.
3. Multiple Camera Embodiment
A method, system and program product in accordance with the preferred embodiments use motion vector data to track an object moving between areas being monitored by a plurality of video cameras. According to the preferred embodiments, motion vector data are used to predict whether an object in a first field of view covered by a first video camera will enter a second field of view covered by a second video camera. The video cameras may be fixed and/or mobile. For example, video cameras may be fixedly mounted at several locations of an airport, e.g., along walkways, perimeter fences, runways, and gates. The images taken by the video cameras at the airport locations may be monitored at one or more monitoring stations, which may be fixed and/or mobile. A fixedly-mounted video camera may have the ability to pan, tilt, and/or zoom its current field of view within an overall field of view. Alternatively, video cameras may be mounted for mobility on one or more reconnaissance aircraft or other vehicle, with each such aircraft or other vehicle traveling to cover a reconnaissance area. The images taken by the video cameras within the reconnaissance areas may be monitored at one or more monitoring stations, which may be fixed and/or mobile. In addition to the mobility provided by the vehicle, a vehicle-mounted video camera may have the ability to pan, tilt, and zoom its current field of view within an overall field of view.
Referring now to
One or more of video cameras 1105, 1110 may correspond to video camera 12 (shown in
Referring to
The motion tracking processors 24′ are identical to motion tracking processor 24 (shown in
Motion tracking processors 24′ and/or system processor 1215 predict whether an object will move from one video camera's field of view to the other camera's field of view based on motion vector data. If the object is predicted to enter the other camera's field of view, system processor 1215 provides tracking data to the other camera system's PTZ adjustment mechanism 16′ and/or motion tracking processor 24′. For example, system processor 1215 may calculate and provide pan, tilt and/or zoom adjustment data 1225 to PTZ adjustment mechanism 16′ of video camera system 1210 for camera 12 of video camera system 1210 to track the object as it moves between the fields of view based on tracking data 1230 provided to system processor 1215 by the motion tracking processor 24′ of video camera system 1205. Alternatively or in addition, system processor 1215 may provide tracking data 1235 to motion tracking processor 24′ of video camera system 1210 based on tracking data 1230 provided to system processor 1215 by the motion tracking processor 24′ of video camera system 1205. The tracking data 1230, 1235 may include at least one of object motion vector data, object shrinkage/expansion data, and other digital video data.
The PTZ adjustment data 1220, 1225 provided to PTZ adjustment mechanisms 16′ by system processor 1215 are calculated in the same manner as described above with respect to the single camera embodiment. The PTZ adjustment data 1220, 1225 provided to PTZ adjustment mechanisms 16′ by system processor 1215 may be calculated in system processor 1215. For example, PTZ adjustment data 1225 provided to PTZ adjustment mechanism 16′ of camera system 1210 by system processor 1215 may be calculated by system processor 1215 based on tracking data 1230 provided to system processor 1215 by motion tracking processor 24′ of camera system 1205 and tracking data 1235 provided to system processor 1215 by motion tracking processor 24′ of camera system 1210. In this example, the tracking data 1230 provided to system processor 1215 by motion tracking processor 24′ of camera system 1205 may include object motion vector data and object shrinkage/expansion data relative to an object in the field of view of camera system 1205. Also in this example, the tracking data 1235 provided to system processor 1215 by motion tracking processor 24′ of camera system 1210 may include input variables relating to camera system 1210.
Alternatively, the PTZ adjustment data 1220, 1225 provided to PTZ adjustment mechanism 16′ by system processor 1215 may be at least partially calculated in one camera system's motion tracking processor 24′ before being received as tracking data 1230, 1235 by system processor 1215 which in turn provides the PTZ adjustment data 1220, 1225 to the other camera system's PTZ adjustment mechanism 16′. In this alternative case, tracking data 1230, 1235 provided to one camera system's motion tracking processors 24′ by system processor 1215 may include input variables relating to the other camera system. For example, tracking data 1230 provided to motion tracking processor 24′ of camera system 1205 by system processor 1215 may include input variables relating to camera system 1210.
Similarly, tracking data 1230, 1235 provided to motion tracking processors 24′ by system processor 1215 may include many of the same variables described above with respect to the single camera embodiment. That is, the variables and equations described above with respect to the single camera embodiment are also used in the calculation of tracking data 1230, 1235 provided to motion tracking processors 24′ by system processor 1215. The tracking data 1230, 1235 provided to motion tracking processors 24′ by system processor 1215 may be calculated in system processor 1215. For example, tracking data 1235 provided to motion tracking processor 24′ of camera system 1210 by system processor 1215 may be calculated by system processor 1215 based on tracking data 1230 provided to system processor 1215 by motion tracking processor 24′ of camera system 1205. In this example, the tracking data 1235 provided to system processor 1215 by motion tracking processor 24′ of camera system 1210 may include field of view boundary data of camera system 1210 so that system processor 1215 may predict whether an object in the field of view of camera system 1205 will enter the field of view of camera system 1210.
Alternatively, the tracking data 1230, 1235 provided by system processor 1215 may be at least partially calculated in one camera system's motion tracking processor 24′ before being received by system processor 1215 which in turn provides the tracking data 1230, 1235 to the other camera system's motion tracking processor. In this alternative case, tracking data 1230, 1235 provided to one camera system's motion tracking processors 24′ by system processor 1215 may include digital video data relating to the other camera system, such as the other camera system's field of view boundary data. For example, tracking data 1230 provided to motion tracking processor 24′ of camera system 1205 by system processor 1215 may include field of view boundary data of camera system 1210 so that motion tracking processor 24′ of camera system 1205 may predict whether an object in the field of view of camera system 1205 will enter the field of view of camera system 1210.
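The specification leaves the handoff format open; purely as an illustration, the tracking data relayed by system processor 1215 between camera systems might be bundled as follows, where all field names are hypothetical:

```python
# Hypothetical sketch of a handoff payload the system processor might relay
# from one camera system's motion tracking processor to the other's.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class TrackingHandoff:
    camera_id: str                       # originating camera system
    object_mv: Tuple[float, float]       # pan/tilt motion vector data
    zoom_factor: float                   # zoom factor data, e.g. ZF_est(n+1)
    shrink_expand: float                 # shrinkage/expansion data, e.g. mv_wa(n)
    # Prior object-center locations, so the receiving camera system can apply
    # the velocity-and-acceleration formulas immediately upon handoff rather
    # than re-acquiring the object from scratch.
    object_history: List[Tuple[float, float]] = field(default_factory=list)
```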
The motion tracking processors 24′ and system processor 1215 will now be described with reference to
As shown in
Digital video surveillance cards 1309 each include a motion tracking processor 24′. One digital surveillance card 1309 includes a first motion tracking processor 24′ associated with a first video camera system 1205 (shown in
Motion tracking processors 24′ resident on digital video surveillance cards 1309 are connected to the various system components via system bus 1310 and/or one or more other buses. In addition to motion tracking processors 24′, digital video surveillance cards 1309 may each include a video data processor 20 (shown in
Main memory 1302 in accordance with the preferred embodiments contains data 1316, an operating system 1318, a tracking mechanism 1319, a prediction mechanism 1320 and a handoff mechanism 1321. While these mechanisms are shown separate and discrete from operating system 1318 in
Computer system 1300 utilizes well known virtual addressing mechanisms that allow the programs of computer system 1300 to behave as if they have access to a large, single storage entity instead of access to multiple, smaller storage entities such as main memory 1302 and DASD device 1312. Therefore, while data 1316, operating system 1318, tracking mechanism 1319, prediction mechanism 1320 and handoff mechanism 1321 are shown to reside in main memory 1302, those skilled in the art will recognize that these items are not necessarily all completely contained in main memory 1302 at the same time. It should also be noted that the term “memory” is used herein to generically refer to the entire virtual memory of the computer system 1300, including one or more memories on digital video surveillance cards 1309.
Data 1316 represents any data that serves as input to or output from any program in computer system 1300. Operating system 1318 is a multitasking operating system known in the industry as OS/400; however, those skilled in the art will appreciate that the spirit and scope of the present invention is not limited to any one operating system.
System processor 1215 may be constructed from one or more microprocessors and/or integrated circuits. System processor 1215 executes program instructions stored in main memory 1302. In addition, motion tracking processors 24′ may execute program instructions stored in main memory 1302 by virtue of being resident on digital video surveillance cards 1309. Main memory 1302 stores programs and data that system processor 1215 and motion tracking processors 24′ may access. When computer system 1300 starts up, system processor 1215 initially executes the program instructions that make up operating system 1318. Operating system 1318 is a sophisticated program that manages the resources of computer system 1300. Some of these resources are system processor 1215, main memory 1302, mass storage interface 1304, display interface 1306, network interface 1308, digital surveillance cards 1309, and system bus 1310.
Although computer system 1300 is shown to contain only a single system processor and a single system bus, those skilled in the art will appreciate that the present invention may be practiced using a computer system that has multiple system processors and/or multiple buses. In addition, the interfaces that are used in the preferred embodiments each include separate, fully programmed microprocessors that are used to off-load compute-intensive processing from system processor 1215. However, those skilled in the art will appreciate that the present invention applies equally to computer systems that simply use I/O adapters to perform similar functions.
Display interface 1306 is used to directly connect one or more displays 1322 to computer system 1300. These displays 1322, which may be non-intelligent (i.e., dumb) terminals or fully programmable workstations, are used to allow system administrators and users (also referred to herein as “operators”) to communicate with computer system 1300. Note, however, that while display interface 1306 is provided to support communication with one or more displays 1322, computer system 1300 does not necessarily require a display 1322, because all needed interaction with users and processes may occur via network interface 1308.
Network interface 1308 is used to connect other computer systems and/or workstations (e.g., 1324 in
Alternatively, I/O adapters may be used to connect PTZ adjustment mechanisms 16′, video data processors 20, and/or motion tracking processors 24′ (in lieu of connection via digital surveillance cards 1309) to computer system 1300. In addition, if video data processors 20 are resident on digital surveillance cards 1309 or elsewhere in computer system 1300, I/O adapters may be used to connect video cameras 12 (shown in
At this point, it is important to note that while the present invention has been and will be described in the context of a fully functional computer system, those skilled in the art will appreciate that the present invention is capable of being distributed as a program product in a variety of forms, and that the present invention applies equally regardless of the particular type of signal bearing media used to actually carry out the distribution. Examples of suitable signal bearing media include: recordable type media such as floppy disks and CD ROM (e.g., 1314 of
Method 1500 continues by computing camera pan and tilt adjustment data (step 1525) and computing camera zoom adjustment data (step 1530). Steps 1525 and 1530 are described in detail above with respect to the single camera embodiment. In the preferred embodiments, all adjustments assume constant velocity or acceleration of the object to determine the next camera location. However, those skilled in the art will appreciate that the present invention applies equally to adjustments made without these assumptions. The pan and tilt adjustment data are calculated relative to the object center point and the center point of the camera field of view. The zoom adjustment data are based on net contraction or expansion of the object boundary. Preferably, steps 1525 and 1530 are executed concurrently (in a multitasking fashion) as shown in
Having calculated the pan, tilt and/or zoom adjustment data, method 1500 proceeds (step 1535) to send the pan, tilt and/or zoom adjustment data to PTZ adjustment mechanisms 16′ (shown in
Method 1500 then determines whether the object is moving toward another camera system's field of view (step 1540). The determination of step 1540 is based on motion vector data. In one embodiment, the pan, tilt and/or zoom adjustment data calculated in steps 1525 and 1530 may be used along with knowledge of the other camera system's field of view boundary data. In this embodiment, step 1540 may determine whether the predicted camera pan and tilt adjustments $mv_{camera}(n+1)$ and/or the estimated zoom factor $ZF_{est}(n+1)$ point toward the other camera system's field of view boundary. In another embodiment, motion vector data from video data processors 20 (shown in
If the object is determined not to be moving toward another camera system's field of view (step 1540=NO), method 1500 continues by determining whether tracking is to continue (step 1545). If tracking is to continue (step 1545=YES), method 1500 loops back to step 1520. On the other hand, if tracking is not to continue (step 1545=NO), method 1500 loops back to step 1505. As mentioned above, step 1540 may be executed by video data processors 20, motion tracking processors 24′, and/or system processor 1215. If this step is executed by video data processors 20, video data generated by video data processors 20 in the execution of the step are provided to motion tracking processors 24′. On the other hand, if this step is executed by motion tracking processors 24′, video data processors 20 provide to motion tracking processors 24′ any video data necessary for execution of the step.
If the object is determined to be moving toward another camera system's field of view (step 1540=YES), method 1500 continues by predicting whether the object will enter the other camera system's field of view (step 1550). The prediction of step 1550 is based on motion vector data. In one embodiment, the pan, tilt and/or zoom adjustment data calculated in steps 1525 and 1530 may be used along with knowledge of the camera center location and the other camera system's field of view boundary data. Step 1550 may, for example, determine whether the predicted camera pan and tilt adjustments $mv_{camera}(n+1)$ extend to the other camera system's field of view boundary. Alternatively, or in addition, step 1550 may determine whether the estimated focal length $fl_{est}(n+1)$ extends to the other camera system's field of view boundary. Preferably, step 1550 is executed by motion tracking processors 24′ (shown in
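A minimal Python sketch of the step-1550 style prediction follows: it extrapolates the predicted camera motion vector from the current camera-center location and tests whether it reaches the neighboring camera system's field of view boundary. Modeling the boundary as an axis-aligned rectangle in a shared coordinate frame, and the extrapolation horizon, are assumptions; the specification leaves the boundary representation open.

```python
# Hypothetical sketch: predict whether the tracked object will enter the
# other camera system's field of view by extrapolating the predicted camera
# motion vector over a few future fields.

def will_enter_fov(center, mv_camera_next, other_fov, horizon=3):
    """center: (x, y) current camera-center location.
    mv_camera_next: (dx, dy) predicted camera pan/tilt motion vector.
    other_fov: (x0, y0, x1, y1) other camera's field-of-view boundary.
    horizon: number of future fields over which to extrapolate."""
    x0, y0, x1, y1 = other_fov
    for k in range(1, horizon + 1):
        x = center[0] + k * mv_camera_next[0]
        y = center[1] + k * mv_camera_next[1]
        if x0 <= x <= x1 and y0 <= y <= y1:
            return True
    return False
```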
If the object is predicted not to enter the other camera system's field of view (step 1550=NO), method 1500 continues to step 1545 and determines whether tracking is to continue.
If the object is predicted to enter the other camera system's field of view (step 1550=YES), method 1500 proceeds (at step 1555) to send tracking data to the system processor. Step 1555 is executed by the motion tracking processors 24′ (shown in
The embodiments and examples set forth herein were presented in order to best explain the present invention and its practical application and to thereby enable those skilled in the art to make and use the invention. However, those skilled in the art will recognize that the foregoing description and examples have been presented for the purpose of illustration and example only. The description as set forth is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible without departing from the spirit and scope of the forthcoming claims. For example, the preferred embodiments expressly extend to mobile video camera systems as well as fixed video camera systems. In another modification, non-constant acceleration may be accounted for by expanding object motion history data across more than two past reference fields. This modification increases tracking accuracy, but with a tradeoff of a requirement for more history data storage. In yet another modification, the tracking time interval may be extended beyond every field, e.g., 1/30th second or ½ second rather than 1/60th second. This modification reduces the history data storage requirement, but with a tradeoff of decreased tracking accuracy.
This patent application is related to a pending U.S. patent application Ser. No. __/______ (docket no. ROC920040315US1), filed ______, entitled “METHOD, SYSTEM AND PROGRAM PRODUCT FOR A CAMERA TO TRACK AN OBJECT USING MOTION VECTOR DATA”, which is assigned to the assignee of the instant application.