The present invention pertains to tracking and particularly tracking with cameras. More particularly, the invention pertains to tracking with static cameras.
The invention is a system for object tracking with a pan-tilt-zoom camera in conjunction with an object range sensor.
FIGS. 3a and 3b show image and screen coordinates, respectively;
FIGS. 11a, 11b and 11c show measurements of pixel information;
FIGS. 11d, 11e and 11f show plots of control inputs corresponding to the measurements of FIGS. 11a, 11b and 11c;
FIGS. 12a, 12b and 12c show object motion plots;
FIGS. 12d, 12e and 12f show plots of camera motion inputs corresponding to the plots of FIGS. 12a, 12b and 12c;
FIGS. 13a, 13b and 13c show the measurements of
FIGS. 13d, 13e and 13f show the control inputs of
FIGS. 14a, 14b and 14c show the object motion plots of
FIGS. 14d, 14e and 14f show the camera motion input plots of
The present system, the invention, may involve autonomous tracking with static cameras. One of the challenges is maintaining an image at the center of a camera screen at a commanded pixel width of the image. Related art methods of tracking objects with cameras appear to need enormous tweaking or tuning by experts. The present system is a model based control approach that makes the tweaking possible by technicians as it reduces the control tuning to the tuning of three independent parameters. Thus, the present approach or system may make the installation of surveillance networks of pan-tilt-zoom (PTZ) cameras easy and economical.
The present system may provide controls for object tracking by a static PTZ camera in conjunction with an object range sensor. Measurements of the image centroid position and image width obtained from image processing, and object depth obtained from the range sensor, may be used to drive the pan, tilt and zoom rates. The system may include an exponential observer for the object world coordinates based on the constant acceleration point mass model, and an exponentially stabilizing nonlinear control law for the pan, tilt and zoom rates. “Control law” may be regarded as a term of art relating to a specific algorithm or pattern of control generating commands from a controller or a control system.
The overall system may have stable performance in a wide variety of conditions of interest. The results for static cameras may be extended to those on moving platforms. With the present approach, depth may be estimated when the object is within the view of two cameras.
Much tracking of objects by cameras may use local models of image formation in conjunction with both model-based, such as a linear quadratic regulator (LQR) approach, and non-model-based control, such as a proportional-integral-derivative (PID) approach. A challenge in these approaches is that the controller for each camera should be specially tuned to the unique environmental conditions at its location. This may make the establishment of large networks of cameras rather complex and expensive. The present approach or system should not require special tuning for a change of location or place of the camera.
The present system may begin as an attempt to integrate image processing and control to create a scalable and inexpensive network of tracking cameras. It may include an additional measurement of depth in conjunction with a detailed model of the image formation process. This component may be part of the control system. The depth measurement may be regarded as an important component of the present approach or system.
The dynamics of image processing between camera control inputs and image measurements tend to be highly nonlinear. Also, image processing may result in very noisy measurements. Besides, there may be several latencies in the system, which include those of the image processing, the network, and the actuators.
The parameters to note or track may include the coordinates of the center of mass (or equivalent) of the pixel pattern and a relevant measure of the pattern size such as image width, or the number of pixels in the pattern, or the distance between specific pixels inside the pattern, or any related measure whose variance is small. However, the present control laws may regulate the image coordinates rather than pixel coordinates. This approach may permit a decoupling of the pan and tilt controls from the zoom control. The object motion may be modeled with point mass constant acceleration models for each of its three-dimensional (3D) coordinates for the purpose of tracking (i.e., an application of an internal model principle in control theory).
The present control system may overcome the challenges associated with nonlinear dynamics, noise and multiple latencies, and provide exponential tracking. Moreover, this control design may involve only the selection of three independent parameters, implementable even by a technician, or, better still, the selection may be automated.
Several steps of the processing of system 10 shown in
Motion and camera models may be significant in the present system 10. Two different models may be dealt with—a motion model of the object and a processing model of the camera adequate for the purpose of tracking control.
The camera model may be described. Both for the purpose of control design and building a simulation test bed for the control system, one may model all of the necessary steps of the image formation process, and the processing of the image. Since one may control pan, tilt and zoom (focal length), and measure camera outputs of image center position and image width (or an equivalent size parameter with minimum variance), one needs the mapping between the position and size of the object to its position and size in the camera image plane.
One may treat the camera as mounted on a ceiling and with an inertial coordinate system fixed to it. An image coordinate system 40 with relevant axis orientations is shown in
An initial step of processing may include a transformation from inertial coordinates to camera coordinates. Since the camera is capable of two rotations, pan and tilt, the coordinate transformation may be obtained by rotating the inertial coordinates through the tilt and pan angles,
xi=T(Φ, ω)xo+Oc, 2.1
where xi is the position of the object in camera coordinates, xo is the position of the object in the inertial world coordinate system, Oc is the origin of the camera coordinate system in the inertial coordinate system, Φ is the pan angle, and ω is the tilt angle.
FIG. 3b shows a screen coordinate system 50.
where f, item 58, is the focal length, Sx and Sy are pixel scaling factors, (xp,yp) are the pixel coordinates of the point, and (xp0,yp0) is the origin of the pixel coordinate system (e.g., it may be at (160, 119) pixels in the present camera).
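The mapping from camera coordinates to pixel coordinates described above may be sketched in Python. This is a minimal sketch assuming an ideal pinhole projection followed by pixel scaling and an offset; the function name `camera_to_pixel` and the numerical values of the focal length and scaling factors are illustrative assumptions, and only the pixel origin (160, 119) is taken from the text.

```python
def camera_to_pixel(x_i, y_i, z_i, f, s_x, s_y, x_p0=160.0, y_p0=119.0):
    """Project a point given in camera coordinates onto the pixel grid.

    Assumes an ideal pinhole camera with no lens distortion; z_i is the
    depth along the optical axis and must be positive.
    """
    if z_i <= 0.0:
        raise ValueError("point must be in front of the camera (z_i > 0)")
    # Perspective projection onto the image plane (same units as f).
    x_s = f * x_i / z_i
    y_s = f * y_i / z_i
    # Scale to pixels and shift to the origin of the pixel coordinates.
    return s_x * x_s + x_p0, s_y * y_s + y_p0


# Illustrative values only: 4 mm focal length, 1e5 pixels per metre.
on_axis = camera_to_pixel(0.0, 0.0, 3.0, f=0.004, s_x=1.0e5, s_y=1.0e5)
off_axis = camera_to_pixel(0.3, -0.15, 3.0, f=0.004, s_x=1.0e5, s_y=1.0e5)
```

A point on the optical axis lands on the pixel origin, which is why regulating the image coordinates to zero may center the image on the screen.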
Tangential and radial distortion in the optical system may be ignored as the present camera should have little distortion. If the distortions are monotonic functions, their inverses may be used (for compensation) within the control laws derived to provide essentially the same results as a camera with no distortion.
Magnification may be noted. For an object of constant width w that is orthogonal to the optical axis of the camera, the width of the image on the screen may be obtained (i.e., this is usually an approximation, but generally a good one) from the equation for magnification by a thin lens,
where ws, item 56, is the width of the object's image 55 (
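The thin-lens magnification noted above may be illustrated with a short Python sketch. It assumes the standard thin-lens relation, magnification m = f/(zi − f), and compares it with the far-field approximation m ≈ f/zi; the function names and the focal length and depth values are illustrative assumptions, and the sketch is not a reproduction of the numbered equation in the specification.

```python
def image_width_thin_lens(w, z_i, f):
    """Image width from the thin-lens magnification m = f / (z_i - f)."""
    if z_i <= f:
        raise ValueError("object must be farther away than the focal length")
    return w * f / (z_i - f)


def image_width_far_field(w, z_i, f):
    """Far-field approximation m ~ f / z_i, reasonable when z_i >> f."""
    return w * f / z_i


# Illustrative values in metres: object width, depth and focal length.
w, z_i, f = 0.141, 3.0, 0.01
exact = image_width_thin_lens(w, z_i, f)
approx = image_width_far_field(w, z_i, f)
relative_gap = abs(exact - approx) / exact   # equals f / z_i
```

For these values the two widths differ by a fraction f/zi, about a third of one percent, which suggests why the simpler form may be adequate for control design.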
Image processing and actuation may be noted. One may model the image processing that yields the position and size of the object on the image plane as a time delay τp since its time of calculation is fairly predictable. Even if this latency cannot be calculated a priori for an image processing algorithm, one may simply calculate it at every measurement through use of time stamps, for use in the estimation and prediction. The control inputs may include the pan, tilt and zoom rates,
{dot over (Φ)}=u1(t−τΦ), 2.6
{dot over (ω)}=u2(t−τω), and 2.7
{dot over (f)}=u3(t−τf), 2.8
where τΦ, τω and τf are the latencies of the motors controlling pan, tilt and zoom rates. In the case where the camera platform is rotating, its yaw rate δ1(t) and pitch rate δ2(t) enter as disturbances into equations 2.6 and 2.7,
{dot over (Φ)}=u1(t−τΦ)+δ1(t) and 2.9
{dot over (ω)}=u2(t−τω)+δ2(t). 2.10
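The delayed rate inputs of equations 2.6 through 2.10 may be sketched as a discrete-time integrator fed through a first-in, first-out buffer. This is a minimal sketch assuming a fixed time step and a latency that is a whole number of steps; the class name `DelayedRateActuator` and the numerical values are hypothetical and chosen only for illustration.

```python
from collections import deque


class DelayedRateActuator:
    """Integrates a rate command that takes effect after a fixed latency."""

    def __init__(self, latency, dt, angle=0.0):
        self.dt = dt
        self.angle = angle
        # FIFO of commands still "in flight"; its length sets the delay.
        steps = max(1, round(latency / dt))
        self.pending = deque([0.0] * steps, maxlen=steps)

    def step(self, command, disturbance=0.0):
        """Advance one time step; `disturbance` models platform rotation."""
        delayed = self.pending[0]        # command issued `latency` seconds ago
        self.pending.append(command)     # maxlen drops the oldest entry
        self.angle += (delayed + disturbance) * self.dt
        return self.angle


# Illustrative run: 100 ms latency, 10 ms steps, constant 0.5 rad/s command.
pan = DelayedRateActuator(latency=0.1, dt=0.01)
history = [pan.step(0.5) for _ in range(30)]
```

In this run the pan angle stays at zero for the first ten steps and then ramps at the commanded rate, which mirrors the way actuator latency may be measured from the first sign of motion in the image.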
Motion modeling may be done in world coordinates. Object motion may be modeled with constant acceleration models for each of its 3D coordinates. Denoting the state of each of the coordinates by sj=(pj, vj, aj), where pj=xo, yo, or zo, each of the motion models may then be of the following form,
Using the measurements of pixel coordinates and depth, observers and predictors may be designed for object motion using the model herein.
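A constant acceleration model for one coordinate may be written in state-space form and propagated forward in time, as in the following Python sketch. The triple-integrator matrices below are the standard form for such a model and are assumed here; the names `A`, `C`, `transition` and `predict`, and the numerical values, are illustrative.

```python
import numpy as np

# Continuous-time constant-acceleration model for one coordinate:
# state s = (position, velocity, acceleration), ds/dt = A s, y = C s.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
C = np.array([[1.0, 0.0, 0.0]])   # only position is measured


def transition(dt):
    """Exact state transition matrix exp(A*dt); A is nilpotent (A^3 = 0)."""
    return np.eye(3) + A * dt + (A @ A) * (dt ** 2 / 2.0)


def predict(state, horizon):
    """Propagate a state estimate `horizon` seconds into the future."""
    return transition(horizon) @ np.asarray(state, dtype=float)


# Illustrative: 1 m, 0.5 m/s, 0.2 m/s^2 predicted 0.3 s ahead.
s_future = predict([1.0, 0.5, 0.2], 0.3)
```

Because the matrix is nilpotent, the prediction is exact for any horizon, which may be convenient when each actuator needs a prediction at a different latency.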
Estimation and prediction is a significant aspect. The world coordinates of the object may be calculated from pixel coordinates and depth by inverting the operations of projection and coordinate transformation at time (t−τp), where τp is the image processing delay,
T⁻¹(Φ, ω)=Tᵀ(Φ, ω) because T is an orthogonal rotation matrix.
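The inversion of projection and coordinate transformation may be sketched as follows. This is a sketch under stated assumptions: the rotation axis convention (pan about the vertical axis, then tilt about the horizontal axis) is hypothetical, the transformation is written in the form implied by equation 3.6, and the function names and numerical values are illustrative. In use, the pan and tilt angles would be those at the time the image was captured.

```python
import numpy as np


def rotation(pan, tilt):
    """Rotation T(pan, tilt); the axis convention here is an assumption:
    pan about the vertical (y) axis followed by tilt about the x axis."""
    cp, sp = np.cos(pan), np.sin(pan)
    ct, st = np.cos(tilt), np.sin(tilt)
    r_pan = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    r_tilt = np.array([[1.0, 0.0, 0.0], [0.0, ct, -st], [0.0, st, ct]])
    return r_tilt @ r_pan


def world_to_camera(x_o, pan, tilt, o_c):
    """Camera coordinates, written as x_i = T x_o + O_c."""
    return rotation(pan, tilt) @ x_o + o_c


def pixel_depth_to_world(x_p, y_p, z_i, f, s_x, s_y, x_p0, y_p0, pan, tilt, o_c):
    """Invert the projection, then the coordinate transformation."""
    # Undo pixel scaling and perspective division using the measured depth.
    x_i = (x_p - x_p0) / s_x * z_i / f
    y_i = (y_p - y_p0) / s_y * z_i / f
    cam = np.array([x_i, y_i, z_i])
    # T is orthogonal, so its inverse is its transpose.
    return rotation(pan, tilt).T @ (cam - o_c)


# Round trip with illustrative values.
x_o = np.array([0.4, -0.2, 3.0])
o_c = np.array([0.1, 0.0, -0.2])
pan, tilt, f, s = 0.1, -0.05, 0.004, 1.0e5
cam = world_to_camera(x_o, pan, tilt, o_c)
x_p = s * f * cam[0] / cam[2] + 160.0
y_p = s * f * cam[1] / cam[2] + 119.0
recovered = pixel_depth_to_world(x_p, y_p, cam[2], f, s, s, 160.0, 119.0,
                                 pan, tilt, o_c)
```

The round trip recovers the original world coordinates, which relies on the depth measurement supplying the one quantity that the pixel coordinates alone cannot.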
Some filtering of measurements may be necessary before the algebraic operations mentioned herein. Where needed, this filtering may be tailored to the specific noise characteristics of the measurements. For the most part, the filtering may be done by the observers for the world coordinates. One purpose may be to maintain consistency in the system modeling assumptions. Observers for the motion models indicated herein may be of the standard Luenberger form,
where Lj is the observer gain, which can be set using a variety of design procedures (such as from a Riccati equation in a Kalman filter), and τk=τΦ, τω or τf, depending upon the control law which uses the prediction. The reason for using predictions at different points in the future may be that each of the actuators has a different latency. This way, one may be able to accurately handle the multiple latencies in the system to produce an exponential observer. The current framework may also permit adding the latencies of the observer and control law calculations. Finally, the approach herein may permit more complicated linear-time-invariant dynamic models for the object world coordinates. For example, one may be able to use models of gait, and typical time constants of human walking or running. Predictions of image coordinates and their derivatives may be obtained with equation 3.5 to attain state predictions, at the appropriate time, of the world coordinates. Equation 2.1 may yield the image coordinates, and differentiating may yield an equation for higher derivatives of image coordinates. For example, image coordinate velocities are given by
{dot over (x)}i={dot over (T)}(Φ, ω)x0+T(Φ, ω){dot over (x)}0+{dot over (O)}c, 3.6
where {dot over (T)}(Φ, ω) refers to an element by element differentiation of the matrix T(Φ, ω), and {dot over (O)}c is the translational velocity of the camera.
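A Luenberger observer with a prediction step for one world coordinate may be sketched as follows. This is a minimal sketch assuming a forward-Euler implementation, noise-free position measurements and the constant acceleration model; the gain is the value quoted later in the text for the simulated tracking, and the function names, the object trajectory and the 0.3 s horizon are illustrative assumptions rather than a reproduction of equation 3.5.

```python
import numpy as np

A = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]])
C = np.array([1.0, 0.0, 0.0])
L = np.array([26.25, 131.25, 125.0])   # gain quoted later in the text


def observer_step(s_hat, measurement, dt):
    """One forward-Euler step of ds_hat/dt = A s_hat + L (y - C s_hat)."""
    innovation = measurement - C @ s_hat
    return s_hat + dt * (A @ s_hat + L * innovation)


def predict(s_hat, horizon):
    """Constant-acceleration prediction `horizon` seconds ahead."""
    phi = np.eye(3) + A * horizon + (A @ A) * (horizon ** 2 / 2.0)
    return phi @ s_hat


def true_state(t):
    """Illustrative object motion with constant acceleration."""
    return np.array([1.0 + 0.5 * t + 0.1 * t * t, 0.5 + 0.2 * t, 0.2])


dt, s_hat = 0.001, np.zeros(3)
for k in range(10000):                  # 10 s of noise-free measurements
    s_hat = observer_step(s_hat, true_state(k * dt)[0], dt)
estimation_error = np.abs(s_hat - true_state(10.0))
prediction_error = np.abs(predict(s_hat, 0.3) - true_state(10.3))
```

In a complete system the horizon passed to `predict` would presumably combine the image processing delay with the latency of the actuator in question; that combination is an interpretation of the text and is not shown here.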
The control system may include two parts. The first part is the tracking of the image of the object on the screen through pan and tilt inputs, and the second is the regulation of image size on the screen by control of focal length (zoom control). In developing the equations for tracking on the screen, one may assume that the image of the object being tracked remains within the screen. The zoom control may ensure this over most of the camera's field of view (FOV). However, this control may naturally degrade when the tracked object is very far from the camera or very close, and the zoom limits are reached. This situation may be ameliorated in the following ways. For instance, when the object is closer to the camera, the detector may focus on a smaller portion of the pattern, and when the object is far away, the detector may focus on a larger portion of the pattern. Moreover, for the near field problem—where the object approaches the camera—one may increase the time of prediction and move the camera into position to view the object once it is sufficiently far away. In addition, one may note that the control inputs are computed for a future time, t+τk, taking into account the actuator latencies.
One may do position tracking of an object on a screen. The controller may implement detection in conjunction with a particle filter, and with predictions from delayed measurements to regulate a pattern position of the tracked object at the center of the screen.
Screen position tracking may be done. An objective of the tracking is to maintain the center of the image at the center of the image plane. One may use the measurements of the image center from the particle filter and control the pan and tilt rates to control the center point (or any other reference point) of the image plane. Since the actuation may control the pan and tilt angular rates, i.e., velocities, one can use an integrator backstepping type control approach. In the control with the present system, one may ignore actuator dynamics because they appear insignificant (less than 30 ms) compared to the latencies of the actuators themselves (100 ms), the latency of image processing (200 ms), the network (100 ms), and the implementation of the control law (50-100 ms). Because of the speed of the responses of the camera actuators, one may also ignore the rigid body dynamics of the camera itself. Note however, that first order actuator lags may be accommodated within the current estimation plus control framework—although the resulting control laws may be more complex and use acceleration estimates.
A key aspect of the control approach is that regulation of the image coordinates xi and yi to zero may automatically result in the image being centered at (xp0,yp0) in the pixel coordinates and permit decoupling of the pan and tilt controls from the zoom control. The pan and tilt control laws, respectively, may be as in the following,
where αΦ>0 and αω>0 set the convergence rates of xi and yi. The control patterns may be based on feedback linearization, and are exponentially stable in conditions where,
under a full state feedback. The result may be immediate when the expressions for {dot over (x)}i and {dot over (y)}i are derived from expansion of equation 3.6, and the control inputs are substituted for the pan and tilt rates.
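The idea of feedback linearization behind such control laws may be illustrated on a generic scalar example. The sketch below is not the pan or tilt control law of the specification; it assumes a hypothetical plant dx/dt = drift + gain·u, where the drift and gain functions are made up for illustration, and shows that cancelling both yields exponential decay at the chosen rate wherever the gain is nonzero.

```python
import math


def linearizing_rate(x, drift, gain, alpha):
    """Rate command u that turns dx/dt = drift + gain*u into dx/dt = -alpha*x."""
    if abs(gain) < 1e-9:
        raise ZeroDivisionError("control law is singular when the input gain is zero")
    return (-alpha * x - drift) / gain


# Illustrative scalar plant: drift and gain are made-up functions of time.
alpha, dt, x, t = 0.5, 0.001, 0.2, 0.0
x0 = x
for _ in range(4000):                    # simulate 4 s with forward Euler
    drift = 0.3 * math.sin(t)            # stands in for object motion
    gain = -(2.0 + math.cos(t))          # stands in for a depth-dependent gain
    u = linearizing_rate(x, drift, gain, alpha)
    x += dt * (drift + gain * u)         # closed loop: dx/dt = -alpha*x
    t += dt
expected = x0 * math.exp(-alpha * t)
```

The guard on the gain corresponds to the kind of singularity condition that has to be excluded for the stability result to hold.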
Singularity in the control law may be reviewed. The pan control law generally never goes singular in practice because the object is well out of view of the camera before zi=0—the object passing through the image plane of the camera. Thus, for cases where tracking is possible, zi>0, i.e., the object may be imaged by the camera. Secondly, zi+sin(Φ)x0=0 needs the pan angle and the xo to have opposite signs for zi≠0, and this may mean that the object is on one side and the camera axis is looking the other way. This may also mean that the object is not within the field of view, unless it is very close to a camera with a wide view (e.g., a few centimeters), a situation which surveillance cameras may be positioned to avoid. For a camera that is used in the present system, the maximum lateral distance at which an object may be picked up by the imager is
and thus the singularity will not occur since
sin Φ=−zi/xo
will not be satisfied.
Although the control law is exponentially stable under full state feedback, output feedback using the observers and predictors as noted herein may blow up under specific conditions, such as high speed motion of the object (this means angular motion with respect to the camera—no human being can move fast enough to evade the camera), and large initial estimation errors of object velocity and acceleration. This appears inescapable due to the latencies in the system. Besides, there is the possibility of the object escaping the finite screen of the camera before tracking is achieved.
There may be image width regulation through zoom control. To derive this control law, one may assume that the width of the object w is a constant. This may be equivalent to assuming that either the object does not rotate and change its width fast, or that the detector keeps track of some measure of an object dimension that does not change very fast. Using the formula for magnification in equation 2.5, and approximating it as ws=(f/zi)w and rearranging, one may have
and differentiating it yields
which may permit a control approach for {dot over (f)}=u3(t−τf) to exponentially stabilize the screen image width ws relative to a reference width wref,
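One zoom law consistent with this derivation may be sketched in Python; it is not necessarily the numbered equation of the specification. Starting from ws=(f/zi)w with w constant, the time derivative of ws equals (w/zi) times the rate of f, minus ws times the rate of zi divided by zi. Requesting that the width error ws−wref decay at a rate αf, and eliminating w through w=ws·zi/f, gives the rate computed by the hypothetical function `zoom_rate` below. Latency and actuator limits are ignored, and all numerical values other than the 0.141 m target width and αf=0.1 are illustrative assumptions.

```python
def zoom_rate(w_s, w_ref, f, z_i, z_i_rate, alpha_f):
    """Focal-length rate that drives w_s toward w_ref at rate alpha_f.

    Derived from w_s = f*w/z_i with constant w; w is eliminated using
    w = w_s*z_i/f, so only measured or estimated quantities appear.
    """
    return f * z_i_rate / z_i - alpha_f * f * (w_s - w_ref) / w_s


# Illustrative simulation: object receding at 0.05 m/s, no latency or limits.
w, alpha_f, dt = 0.141, 0.1, 0.01         # width (m), gain (1/s), step (s)
f, z_i, z_i_rate, w_ref = 0.01, 3.0, 0.05, 1.0e-3
initial_error = abs(f * w / z_i - w_ref)
for _ in range(6000):                     # 60 s of forward-Euler steps
    w_s = f * w / z_i                     # image width from the far-field model
    f += dt * zoom_rate(w_s, w_ref, f, z_i, z_i_rate, alpha_f)
    z_i += dt * z_i_rate
final_error = abs(f * w / z_i - w_ref)
```

The first term of the rate follows the change in depth and the second term removes the width error, so the image width may stay regulated while the object moves toward or away from the camera.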
The present approach may record the 3D trajectory of an object moved in front of the PTZ camera along with the trajectory of its image centroid and a time history of its image width, and then test the performance of the control laws in a high fidelity simulation.
The present system may use PTZ devices for surveillance. Measurements may be taken and the resultant signals can be converted to drive or control actuators. There may be control inputs with pan, tilt and zoom rates to the respective actuating device or motor.
There may be several cameras, or there may be minimally one camera and a range or depth finder, e.g., ladar. Depth may be along the camera's axis. The depth is one significant characteristic of the present system. The controller 70 may provide an implementation of the control laws which can be incorporated by equations 4.1, 4.2 and 4.5 herein. Equation 4.1 may exploit the camera operation. There may be a state estimator or predictor 60 for solving a non-linear state estimation law.
There may be object tracking with static cameras for surveillance. There may be a large or small network of cameras. There may be at least two sensing-like devices or cameras at various surveillance posts or stations. One device may be used to track an object (e.g., a person) and another device to track the object's three-dimensional (3-D) coordinate location. At another surveillance post or station there may be another set of devices that can handle a field of view that overlaps, though not necessarily, the field of view of the previous devices or cameras, which may hand off the tracking of the object to the next set of devices (or cameras). The object, such as a person, being tracked may be marked. If the person is standing still, e.g., in a queue, then the present tracking system may obtain some data of the person for facial recognition, or for a close match, to reduce the number of searches needed for identification of the tracked object. There may be several identifying markers on the object or person.
The present system 10 may eliminate some guards in secure areas with its tracking capabilities. The cameras may be placed in strategic places or where the needs are critical. It is difficult for guards to track numerous objects or persons simultaneously. The present system may be very helpful under such situations. Related art surveillance camera systems, e.g., those having PID control, need to be tuned or replaced with different control schemes adjusted for particular places. The present system may have a global law that is applicable at various places of camera placement. Control tweaking may also be diminished or eliminated with the present control law (i.e., equation 4.1).
Significant hardware components of the present system 10 may include the PTZ camera, range finder and a processor. The camera system may utilize wireless networks such as for communication among the cameras so as, for example, to provide a handoff on tracking of a particular subject or object to another set of cameras.
The processing and networking of the system 10, particularly the system for tracking objects with static cameras, may incorporate algorithms for various operations such as image processing which may be done using various techniques. The algorithms may include the control laws. There may be invariant space and detection relative to rotation of the target, multi-resolution histograms, and the significant characteristic of depth information.
Camera parameters and data generation may be considered. The actuator saturations of the camera (from its manual) may be noted as
These limits may be used in the simulation of camera control. The rate limits may be
−2π≦{dot over (Φ)}≦2π rad/sec
−2π≦{dot over (ω)}≦2π rad/sec
−15≦{dot over (f)}≦15 mm/sec. 5.2
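The rate limits above may be applied to commanded rates with a simple clamp, as in the following sketch. The dictionary layout and the function name `saturate_rates` are illustrative assumptions; only the limit values are taken from the text.

```python
import math

# Rate limits quoted above: pan and tilt in rad/s, zoom (focal length) in mm/s.
RATE_LIMITS = {"pan": 2.0 * math.pi, "tilt": 2.0 * math.pi, "zoom": 15.0}


def saturate_rates(commands):
    """Clamp each commanded rate to its symmetric limit."""
    return {axis: max(-RATE_LIMITS[axis], min(RATE_LIMITS[axis], value))
            for axis, value in commands.items()}


limited = saturate_rates({"pan": 7.5, "tilt": -0.3, "zoom": -40.0})
```

Commands inside the limits pass through unchanged, while the pan and zoom commands in this example are clipped to their respective bounds.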
The scaling factor from physical units to pixels may be determined as S=88300, Sx=1.1S and Sy=S from a calibration.
Approximate latencies of the actuation may be determined from a difference between the time of issue of a command and the time of the first sign of motion in the image. Thus, τΦ and τω appear to be in the range of 50 to 75 ms, while τf appears in the range of 125 to 150 ms. The accuracy of this measurement may be limited by the 1/60 sec (17 ms) frame acquisition time.
In an illustrative example, a planar black target with a width of about 0.141 m may be moved in front of the camera at an almost constant depth, and its images may be acquired at a frequency of approximately 10 Hz. The position and orientation of the coordinate system of the camera may be calculated with respect to an inertial coordinate system in a laboratory in a test. The measured positions of the black target may be transformed to a coordinate system fixed to the camera and corresponding to the camera axis at zero pan and tilt. A time history of points may be generated for about 100 seconds with a known pan, tilt and zoom for the purpose of testing the tracking control system.
Simulated tracking of an experimental trajectory may be performed. For the simulated tracking, the observers in equation 3.5 may be designed by pole placement to yield L=(26.25 131.25 125) for all of the three observers. The poles can be placed at (−20 −5 −1.25) with the maximum speed of convergence for the position and slower convergence for velocity and acceleration. The control laws may be designed as the following,
αΦ=0.001; αω=0.05; αf=0.1.
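The observer gain quoted above may be cross-checked against the stated pole locations. The sketch below assumes the constant acceleration model with a position-only measurement, for which the characteristic polynomial of the observer error dynamics has the gain entries as its coefficients; the variable names are illustrative.

```python
import numpy as np

A = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]])
C = np.array([[1.0, 0.0, 0.0]])
poles = [-20.0, -5.0, -1.25]

# With position-only measurement, det(sI - (A - L C)) = s^3 + l1 s^2 + l2 s + l3,
# so the gain is read directly from the desired characteristic polynomial.
coefficients = np.poly(poles)            # [1, l1, l2, l3]
L = coefficients[1:].reshape(3, 1)

observer_poles = np.sort(np.linalg.eigvals(A - L @ C).real)
```

The computed gain reproduces (26.25 131.25 125), and the eigenvalues of the error dynamics fall at the three stated poles.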
Signal 111 may be multiplexed with signal 94 to result in a signal 112 that goes to an f(u) module 113. From module 113 may proceed a Zi2 signal 114 as an output of module 60. Signal 107 from module 103 may be demultiplexed into signals 115, 116 and 117 to be inputs u0, u1 and u2, respectively, of a translation and rotation (Fcn) module 118. Signal 94 from selector module 93 may be demultiplexed into signals 119 and 121 to be inputs u3 and u4, respectively, of module 118. The y0, y1 and y2 outputs of module 118 may be multiplexed into a signal 122 to combiner module 123. A signal 124 of a constant [x0;y0;z0] module 253 may go as another input to module 123. A resultant signal 125 from module 123 may be an xi signal at an output of state estimator module 60. A signal 126 may provide the ws signal through module 60 to an output of it.
A controller or control law module 70 of
A signal 135 may be output from the summer module 254. Signals 107, 108, 125 and 135 may be multiplexed into a signal 141. Signal 141 may go to an f(u) (Fcn) module 142 for tilt control. An output signal 143 from module 142, and signals 107, 108, 125 and 135 may be multiplexed into a signal 136 which may go to an f(u) (Fcn) module 137 for pan control. An output signal 138 from module 137 may go to a saturation module 139. Signal 143 from module 142 may go to a saturation module 144.
The signal 135 may go to a selector module 147. Module 147 may have an output signal 148. The signal 114 may go to a state-space [x′=Ax+Bu; y=Cx+Du] module 145. Module 145 may have an output (zi) signal 146. A w reference module 149 may provide a wref output signal 151. The input signal 126 to module 70 may be regarded as an estimate of w. Signals 126, 146, 148 and 151 may be multiplexed into a signal 152. The signal 152 may go to an f(u) (Fcn) module 153 for zoom control. An output 154 from module 153 may go to a saturation module 155. Output signals 156, 157 and 158, from saturation modules 139, 144 and 155, respectively, may be multiplexed into the rates signal 134.
Signal 208 and signal 187 may be input to a product (x) module 209 for a product output signal 211. Signal 211 may go to a matrix gain (K*u) amplifier module 212. An output signal 213 and an output signal 214 from a (xp0, yp0) module 215 may go to a summing module 216. An output signal 217 may proceed from module 216 to an input of a transport delay module 218. An output signal 219 from module 218 and an output signal 222 from a random number source or generator module 221 may go to a summer module 223. An output signal 224 from module 223 may be input to a saturation module 225. A signal 226 from module 225 may be an Xp, Yp output from camera module 90.
The signals 126, 187 and 202 may be multiplexed into a signal 227. Signal 227 may be an input to a magnification (u(4)/(1*0+u(3)/u(5))) module 228. The signal 208 may go to an off-axis correction (f(u)) module 231. A signal 229 from module 228 and a signal 232 from module 231 may go to a product (x) module 233. An output 234 from module 233 may go to an amplifier module 235 with a gain K. A transport delay module 236 may receive a signal 237 from module 235. A signal 238 from module 236 and a signal 241 from a uniform random generator module 242 may be input to a summer module 239. An output signal 243 from module 239 may be a ws output for the camera module 90.
Signal 201 may go to a transport delay module 244. An output signal 245 from module 244 and a signal 246 from a uniform random number generator 247 may go to a summer 248. An output signal 249 of summer 248 may be the z output of the camera module 90.
FIGS. 11a, 11b and 11c show the measurements of pixel positions and widths from experimental data (dotted lines), i.e., with no control input, and the corresponding positions and width using the control laws herein in conjunction with the estimation and prediction (solid lines). The set points for each of the measurements are also shown in these Figs. as solid lines, xp0=160; yp0=119; wref=180.
For the purpose of illustrating the immunity of the control system to noise, results corresponding to those in
The actuator chatter produced by noise in measurements may be greatly ameliorated by the quantization of actuator position, or the discrete number of actuator positions available (as the actuators are stepper motors).
The measurements in
Exponential tracking of object motion may be demonstrated with PTZ cameras. While both the control law and the observer are exponentially stable, their combination will not necessarily be exponentially stable under all initial conditions. However, this stability appears achievable for most human motion under the cameras, given the camera's field of view, actuator saturation and rate limits, and latencies.
While there may be an objection to the need for depth measurements, the latter might not be that expensive to implement. Simply ensuring that each point is in the field of view of two cameras may give a depth measurement of adequate accuracy. Other mechanisms for providing depth measurements may include laser range-finders, ladars, and radars. Automobile deer detection radars may be adequate as their cost appears to be dropping significantly.
One may demonstrate coordinated tracking of an object with multiple cameras, include motion compensation in the control law to track objects from moving platforms, such as uninhabited aerial vehicles (UAVs) and unmanned ground vehicles (UGVs), improve target identification and acquisition, and exploit synergy between image processing and control to render the image static for longer periods of time, permitting faster and more reliable image processing.
In the present specification, some of the matter may be of a hypothetical or prophetic nature although stated in another manner or tense.
Although the invention has been described with respect to at least one illustrative example, many variations and modifications will become apparent to those skilled in the art upon reading the present specification. It is therefore the intention that the appended claims be interpreted as broadly as possible in view of the prior art to include all such variations and modifications.
Number | Date | Country |
---|---|---|
20070286456 A1 | Dec 2007 | US |