This invention relates generally to surveillance systems, and more particularly to directing cameras based on time-series surveillance data acquired from an environment.
Surveillance and sensor systems are used to make an environment safer and more efficient. Typically, surveillance systems detect events in signals acquired from the environment. The events can be due to people, animals, vehicles, or changes in the environment itself. The signals can be complex, for example, visual and acoustic, or the signals can represent simpler quantities such as temperature, motion, and humidity in the environment.
The detecting can be done in real-time as the events occur, or off-line after the events have occurred. Some real-time and off-line processing requires means for storing, searching, and retrieving recorded events. It is desired to automate the processing of surveillance data to detect significant events.
Surveillance and monitoring of indoor and outdoor environments have been gaining importance in recent years. Currently, surveillance systems are used in a wide variety of settings, e.g., at homes, offices, airports, and industrial facilities. Most conventional surveillance systems rely on a single modality, e.g., video, occasionally augmented with audio. Such video-based systems generate massive amounts of video data. It is a challenge to store, retrieve, and detect events in a video. Computer vision procedures configured to detect events or persons are either not fast enough for use in a real-time system, or do not have sufficient accuracy for reliable detection. In addition, video invades the privacy of the occupants of the environment. For example, it may be illegal to acquire videos from designated spaces.
For some applications, it is desired to detect patterns in the surveillance data, e.g., human movement patterns, and provide an interface for identifying and selecting those patterns of interest.
One embodiment of the invention discloses a method for directing a camera based on time-series data, wherein the time-series data represent atomic activities sensed by sensors in an environment, and wherein each atomic activity includes a time and a location at which the atomic activity is sensed, comprising: providing a spatio-temporal pattern of a specified primitive activity, wherein the spatio-temporal pattern is based only on the time and the location of the atomic activities, such that a spatio-temporal sequence of the atomic activities forms the specified primitive activity; detecting, in the time-series data, a sensed primitive activity corresponding to the spatio-temporal pattern to produce a result, wherein the detecting is performed by a processor; and directing the camera based on the result.
Another embodiment of the invention discloses a system for directing a camera based on time-series data, wherein the time-series data represent atomic activities sensed by sensors in an environment, and wherein each atomic activity includes a time and a location at which the atomic activity is sensed, comprising: means for providing a spatio-temporal pattern of a specified primitive activity, wherein the spatio-temporal pattern is based only on the time and the location of the atomic activities, such that a spatio-temporal sequence of the atomic activities forms the specified primitive activity; a control module configured to detect, in the time-series data, a sensed primitive activity corresponding to the spatio-temporal pattern to produce a result; and means for directing the camera based on the result.
In some embodiments, the system includes a surveillance database 130. The processor 111 is conventional and includes memory, buses, and I/O interfaces. The environment 105 includes sensors 129 for acquiring surveillance data 131. As described below, the sensors include, but are not limited to, video sensors, e.g., cameras, and motion sensors. The sensors are arranged in the environment according to a plan 220, e.g., a floor plan for an indoor space, such that locations of the sensors are identified.
The control module receives the time-series surveillance data 131 from the sensors. The time-series data represent atomic activities sensed by the sensors in the environment. Each atomic activity is sensed by any one of the sensors and includes a time and a location at which the atomic activity is sensed. Examples of the atomic activity are a motion sensed by a motion sensor, or a motion observed in an image acquired by a camera. The location of the atomic activity is typically determined based on a location of the sensor on the plan 220. In one embodiment, the locations of the sensors and the atomic activities are stored in the surveillance database.
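As a minimal illustrative sketch (not part of the claimed system), an atomic activity can be represented as a small record pairing the sensor identifier with the activation time and the location looked up from the plan 220; the class and dictionary names below are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical lookup from sensor identifier to its (x, y) location on the plan 220.
SENSOR_LOCATIONS = {"motion-17": (12.5, 4.0), "motion-18": (14.5, 4.0)}

@dataclass
class AtomicActivity:
    """One sensor activation: which sensor fired, when, and where."""
    sensor_id: str
    time: datetime
    location: tuple  # (x, y) coordinates on the floor plan

def make_atomic_activity(sensor_id: str, time: datetime) -> AtomicActivity:
    # The location of the atomic activity is derived from the sensor's
    # position on the plan, here a simple dictionary lookup.
    return AtomicActivity(sensor_id, time, SENSOR_LOCATIONS[sensor_id])

activity = make_atomic_activity("motion-17", datetime.now())
```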
The control module detects the pattern 122 in the time-series data 131 producing a result 190. In one embodiment, the pattern is acquired via the interface 121. Typically, the pattern is specified by a user via an input device 140.
Based on the result, a command is executed. The type of the command can be specified by the user. Non-limiting examples of the commands are displaying a relevant video on the interface, controlling, e.g., directing, a camera, signaling an alarm, and/or transmitting a message.
In one embodiment, the control module detects the pattern in real-time directly from the time-series data 131. In another embodiment, the time-series data are stored in the surveillance database, and the control module queries the database. In one embodiment, the control module detects the pattern upon receiving the atomic activity. In yet another embodiment the control module detects the pattern periodically.
Sensors
The time-series data 131 are acquired by a network of sensors 129. The sensors can be heterogeneous or homogeneous. The sensors 129 can include video cameras and motion detectors. Other types of sensors as known in the art can also be included, e.g., temperature sensors and smoke detectors.
Because of the relatively high cost of the cameras and the relatively low cost of the other sensors, the number of sensors can be substantially larger than the number of cameras; i.e., the cameras are sparse and the detectors are dense in the environment. For example, one area viewed by one camera can include dozens of detectors. In a large building, there can be hundreds of cameras but many thousands of detectors. Even though the number of sensors can be relatively large compared with the number of cameras, the amount of data acquired by the sensors is small compared with the video data.
In one embodiment, the cameras do not respond to activities sensed in a fixed field of view, but simply record images of the environment. It should be noted that the videos can be analyzed using conventional computer vision techniques. This can be done in real-time, or off-line after the videos are acquired. The computer vision techniques include object detection, object tracking, object recognition, face detection, and face recognition. For example, the system can determine whether a person entered a particular area in the environment, and record this as a time-stamped event in the database.
However, in another embodiment, the cameras include pan-tilt-zoom (PTZ) cameras configured to be oriented and zoomed in response to the atomic activities detected by the sensors.
Atomic Activity
A timeline 230 shows the atomic activities in a “player piano roll” format, with time running from left to right. A current time is marked by a vertical line 221. The atomic activities for the various detectors are arranged along the vertical axis. The rectangles 122 represent the atomic activities (vertical position) being active for a time (horizontal position and extent). Each horizontal arrangement for a particular sensor forms a track outlined by a rectangular block 125.
The visualization of the video uses a common highlighting scheme. The locations of the atomic activities 233 can be highlighted with color on the floor plan 220. Sensors that correspond to the atomic activities are indicated on the timeline by horizontal bars 123 rendered in the same color. A video can be played that corresponds to events at a particular time and in a particular area of the environment.
Primitive Activity
According to one embodiment, the atomic activities are related in space and time to form the primitive activity. For example, a person walking down a hallway causes a subset of the motion sensors mounted in the ceiling to signal atomic activities serially at predictable time intervals, depending on a velocity of the person.
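To make this relation concrete, a short sketch with assumed numbers (the sensor spacing and walking speed are illustrative, not taken from the description) shows the predictable interval between consecutive activations.

```python
# Assumed values: 3 m between consecutive ceiling sensors, walking speed of 1.5 m/s.
sensor_spacing_m = 3.0
walking_speed_mps = 1.5

# A person walking down the hallway triggers consecutive sensors roughly every
# spacing / velocity seconds; a detector can accept activations that arrive
# within a tolerance window around this expected delay.
expected_delay_s = sensor_spacing_m / walking_speed_mps
print(expected_delay_s)  # 2.0 seconds between serial activations
```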
Live Alarms
One requirement of an on-line surveillance system is the ability to set and signal “live alarms” immediately. Live alarms allow a user to acquire visual evidence of activities of interest as the activities happen in the environment. The alarms can correspond to abnormal activities such as someone entering an unauthorized space, or as an intermediate step toward performing some other task such as counting the number of people who access a printer-room during the day.
One embodiment uses the motion sensors to detect the primitive activities, and to direct the PTZ camera at the activities of interest. Typically, the primitive activities correspond to a sequence of sensor activations. The sequence of activations can be specified by the user by tracing the path 160 of interest on the plan forming an ordered sequence of a subset of the sensors. The alarm “goes off” whenever the specified arrangement of activations occurs along the path.
In one embodiment, the primitive activity is modeled as a finite state machine (FSM), where each sensor acts as an input, and the specified arrangement is parsed by the FSM. For incoming sensor data, all specified FSMs are updated. When one of the FSMs detects the specified arrangement, an alarm is signaled, and a command is sent to the control module to direct the cameras toward the location of the sensor that caused the alarm. After the camera(s) are directed to the appropriate location, visual evidence of the activity at the scene is acquired for further image analysis.
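A minimal sketch of such an FSM, assuming each activation arrives as a sensor identifier, might track its position along the user-specified path and report when the last sensor in the ordered sequence is reached. The class and sensor names are illustrative only, and whether unrelated activations reset the machine or, as here, are simply ignored is a design choice not specified above.

```python
class PathAlarmFSM:
    """Parses incoming sensor activations against an ordered path of sensors."""

    def __init__(self, path):
        self.path = list(path)  # ordered subset of sensors traced on the plan
        self.state = 0          # index of the next expected sensor

    def update(self, sensor_id):
        """Advance on the expected sensor; return True when the whole path is seen."""
        if sensor_id == self.path[self.state]:
            self.state += 1
            if self.state == len(self.path):
                self.state = 0  # re-arm the alarm
                return True     # the alarm "goes off"
        return False

# For incoming sensor data, all specified FSMs are updated.
fsms = [PathAlarmFSM(["motion-17", "motion-18", "motion-19"])]
for activation in ["motion-17", "motion-05", "motion-18", "motion-19"]:
    for fsm in fsms:
        if fsm.update(activation):
            print("alarm at", activation)  # e.g., direct the PTZ cameras here
```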
The control module, upon receiving the atomic activity, detects the primitive activity and outputs a command 535 to the camera. The command may include navigation parameters of the camera optimal to acquire the activity of interest. In one embodiment, the control module uses a policy module 540 to determine the command. In another embodiment, the control module queries the surveillance database 130 to determine the command. An example of the command is tracking a movement of a user 550 sensed by the sensors 520.
As described in more detail below, in one embodiment, the control module detects the events to issue the command.
Policy Module
In general, there is more than one camera 930-931 that can observe the corresponding location of the activity. For each ordered pair of sensor activation and camera, we define a cost of allocation. If a camera is not in the visibility set $\mathrm{vis}(A_i(t))$, the cost of allocation is infinite. For cameras in the visibility set $\mathrm{vis}(A_i(t))$, the allocation cost is the change in PTZ parameters required to acquire the activity sensed by the sensor, i.e., the required change in the state of the camera to acquire the sensed primitive activity.
If $S_k(t)$ is the current state of a camera $C_k \in \mathrm{vis}(A_i(t))$, then $\hat{S}_k$ is the state required to observe the sensor. In one embodiment, $\hat{S}_k$ is determined from a calibration database. Then, the cost of allocation is

$$\mathrm{cost}(A_i(t), C_k) = d(S_k(t), \hat{S}_k),$$

where $d(\cdot,\cdot)$ is a distance metric on the state-space of the cameras.
In another embodiment, the state of a camera is defined by the current PTZ values, i.e., $S_k(t) = (p, t, z)$. In one variation of this embodiment, instead of a zoom parameter, image analysis is used to enhance the resolution of images of faces. Thus, the distance metric $d(\cdot,\cdot)$ is defined as the Euclidean norm between the current and required pan-and-tilt values. Accordingly, the required state to observe the $i$th event $A_i(t)$ is $\hat{S}_k = (\hat{p}, \hat{t})$, and the cost is

$$\mathrm{cost}(A_i(t), C_k) = \sqrt{(p - \hat{p})^2 + (t - \hat{t})^2}.$$
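A sketch of this allocation cost and of selecting the lowest-cost camera from the visibility set is shown below; the infinite cost for cameras outside the visibility set follows the definition above, while the function names, camera identifiers, and numeric values are assumptions for illustration.

```python
import math

def allocation_cost(camera_state, required_state, camera, visibility_set):
    """cost(A_i(t), C_k): Euclidean distance between current and required pan/tilt."""
    if camera not in visibility_set:
        return math.inf                # camera cannot observe the activity
    p, t = camera_state                # current pan and tilt of camera C_k
    p_hat, t_hat = required_state      # pan and tilt required to observe the sensor
    return math.hypot(p - p_hat, t - t_hat)

def select_camera(cameras, states, required_state, visibility_set):
    """Direct the camera whose required change of state (cost) is smallest."""
    return min(cameras,
               key=lambda c: allocation_cost(states[c], required_state, c, visibility_set))

# Hypothetical example: camera 930 needs a smaller pan/tilt change than camera 931.
states = {"cam-930": (10.0, 5.0), "cam-931": (80.0, 20.0)}
required = (15.0, 8.0)  # state required to observe the sensor, e.g., from calibration
print(select_camera(["cam-930", "cam-931"], states, required, {"cam-930", "cam-931"}))
```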
Events
In some embodiments, it is desired to specify more complex patterns. As defined herein, the event is a pattern of activities involving multiple primitive activities and constraints on the primitive activities, wherein the constraints on the primitive activities are spatio-temporal, sequential and/or concurrent. In some embodiments, the event is mapped to a Petri net (PN) as described below.
General activities in indoor and outdoor environments often involve a number of people and objects and usually imply some form of temporal sequencing and coordination. For example, the activity of two people meeting in the lounge of a large office space and exchanging an object, e.g., a briefcase, includes several primitives:
two people enter the lounge independently;
the people stop near each other;
the object is transferred from one person to the other; and
the people leave.
The activity starts with two independent movements, which occur concurrently. The movements come to a temporal synchronization point, at which time the briefcase is exchanged, and then diverge again into two independent motions as the people leave the room. Such situations, where observations form independent streams coming into synchrony at discrete points in time, are modeled by embodiments of the invention using a formalism of Petri nets.
Petri Nets
A Petri net (PN) is a tool for describing relations between conditions and events. Some embodiments use the PN to model and analyze behaviors such as concurrency, synchronization, and resource sharing.
Formally, the Petri net is defined as

$$PN = \{P, T, \rightarrow\},$$

where $P$ and $T$ are finite disjoint sets of places and transitions, respectively, i.e., $P \cap T = \emptyset$, and the operator $\rightarrow$ is a relation between places and transitions, i.e., $\rightarrow \subset (P \times T) \cup (T \times P)$.
Also, in the PN there exists at least one end place and at least one start place. A preset of a node $x \in P \cup T$ is the set ${}^{\bullet}x = \{y \mid y \rightarrow x\}$. A postset of the node $x$ is the set $x^{\bullet} = \{y \mid x \rightarrow y\}$.
A transition 850 is enabled if and only if all the input places have a token. When a transition is enabled, the transition can fire. “Fire” is a term of art used when describing Petri nets. In the simplest case, all the enabled transitions can fire. The embodiments can also associate other constraints that must be satisfied before an enabled transition can fire. When a transition fires, all enabling tokens are removed, and a token is placed in each of the output places of the transition (the postset).
When each of the enabling places 731 and 732 of a transition t3 740 has the token, the transition t3 is ready to fire when the associated constraint occurs, i.e., when the two persons A and B come near each other.
Then, the transition t3 fires and both tokens are removed and a token is placed in the output place p5 750. Now a transition t4 760 is enabled and ready to fire. The transition t4 fires when the briefcase is exchanged between the two people, and the token is removed from the place p5 and placed in the end place p6. When the token reaches the end place, the PN 700 is completed.
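A minimal token-game sketch of this fragment of the net, assuming the enabling places 731 and 732 are named p3 and p4 and with the constraints stubbed out as booleans, illustrates the enabling and firing described above; it is illustrative rather than the patented implementation.

```python
class PetriNet:
    """Minimal Petri net: a transition fires when all its input places hold a token."""

    def __init__(self, places, transitions):
        self.tokens = {p: 0 for p in places}
        self.transitions = transitions  # name -> (input places, output places)

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.tokens[p] > 0 for p in inputs)

    def fire(self, name):
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.tokens[p] -= 1  # remove the enabling tokens
        for p in outputs:
            self.tokens[p] += 1  # place a token in each output place (the postset)

# Briefcase-exchange fragment: t3 fires when persons A and B come near each other,
# t4 fires when the briefcase is exchanged; p6 is the end place.
net = PetriNet(places=["p3", "p4", "p5", "p6"],
               transitions={"t3": (["p3", "p4"], ["p5"]),
                            "t4": (["p5"], ["p6"])})
net.tokens["p3"] = net.tokens["p4"] = 1     # both enabling places hold a token
persons_near = briefcase_exchanged = True   # constraints, stubbed for illustration
if net.enabled("t3") and persons_near:
    net.fire("t3")
if net.enabled("t4") and briefcase_exchanged:
    net.fire("t4")
print(net.tokens["p6"])  # 1: the token reached the end place, so the PN is completed
```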
The Petri net is used by some embodiments to represent and recognize events in the time-series data. Those embodiments define the events based on primitive activities and constraints for those activities. In those embodiments, the primitive activities are human movement patterns, which are detected using the sensors. In some embodiments, the constraints are described using conjunction operators, e.g., “AND,” “OR,” “AFTER,” “BEFORE.” The events and constraints are mapped to the Petri nets.
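One possible textual form of such an event specification, before it is mapped to a Petri net (this particular representation is an assumption, not given in the description), lists the primitive activities and the pairwise constraints between them:

```python
# Hypothetical specification of the briefcase-exchange event, built from primitive
# human-movement activities and conjunction-operator constraints.
briefcase_exchange_event = {
    "primitives": ["A_enters_lounge", "B_enters_lounge",
                   "A_B_stop_near_each_other", "object_exchanged", "people_leave"],
    "constraints": [
        ("A_enters_lounge", "AND", "B_enters_lounge"),            # concurrent entries
        ("A_B_stop_near_each_other", "AFTER", "A_enters_lounge"),
        ("A_B_stop_near_each_other", "AFTER", "B_enters_lounge"),
        ("object_exchanged", "BEFORE", "people_leave"),
    ],
}
```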
It is to be understood that various other adaptations and modifications may be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.
This application is related to U.S. patent application Ser. No. (MERL-2104) 12/______ filed Dec. 28, 2009, entitled “Method and System for Detecting Events in Environments” filed by Yuri Ivanov, co-filed herewith and incorporated herein by reference.