This invention relates to surveillance systems. Specifically, the invention relates to video-based surveillance systems.
In its original form, a tripwire was an arrangement in which a wire, string, or the like was stretched across a path, and if someone or something happened to trip over the wire or otherwise pull it, some response was triggered. For example, such a response could be detonating a landmine, sounding an alarm, or recording an event (e.g., triggering a counter, camera, etc.). Today, tripwires are often, for example, implemented as beams of light (e.g., laser, infrared, or visible); when someone or something breaks the beam, a response is triggered.
An example of a conventional tripwire using a light beam is shown schematically in
Conventional tripwires are advantageous in that they are at least conceptually simple to use. They also require a minimum of human intervention, once they have been installed.
Conventional tripwires, however, have a number of disadvantages. For example, they cannot discriminate between triggering objects of interest and those not of interest. As an example, one may be interested in how many people, but not dogs, walk down a path; however, either a person or a dog would trigger the tripwire. It is also problematic if a group of people walk together, resulting in a single triggering of the tripwire, rather than one for each person.
Furthermore, conventional tripwire arrangements generally involve the installation of dedicated equipment. For example, considering the example of a laser tripwire, a laser source and a laser detector must be installed across a path of interest. Additionally, such dedicated equipment may be difficult to install in such a manner that it is not easily detectable.
Additionally, a conventional tripwire does not afford a high degree of flexibility. A conventional tripwire typically detects only whether someone or something crosses it, without regard to the direction of crossing. Furthermore, because conventional tripwires extend only in straight lines, they are limited as to the regions across which they may be set up.
Conventional video surveillance systems are also in common use today. They are, for example, prevalent in stores, banks, and many other establishments. Video surveillance systems generally involve the use of one or more video cameras, and the video output from the camera or cameras is either recorded for later review or is monitored by a human observer, or both. Such a system is depicted in
In contrast with conventional tripwires, video surveillance systems can differentiate between people and animals (i.e., between objects of interest and objects not of interest) and can differentiate the individuals within a group of people walking together. They further provide flexibility over tripwires, in terms of the shape of the regions they can monitor. Also, because video surveillance systems are so widely used, there is no need to install further equipment. However, video surveillance systems also suffer some drawbacks.
Perhaps the most significant drawback of conventional video surveillance systems is that they require a high degree of human intervention in order to extract information from the video generated. That is, either someone has to be watching the video as it is generated, or someone has to review stored video.
An example of a prior-art video-based surveillance system can be found in U.S. Pat. Nos. 6,097,429 and 6,091,771 to Seeley et al. (collectively referred to below as “Seeley et al.”). Seeley et al. is directed to a video security system that includes taking snapshots when an intrusion is detected. Seeley et al. addresses some of the problems relating to false alarms and the need to detect some intrusions/intruders but not others. Image differencing techniques and object recognition techniques are used in this capacity. However, there are many differences between Seeley et al. and the present invention, as described below. Among the most severe shortcomings of Seeley et al. is a lack of disclosure as to how detection and recognition are performed. What is disclosed in these areas is in contrast to what is presented in regard to the present invention.
Another example of a video- and other-sensor-based surveillance system is discussed in U.S. Pat. Nos. 5,696,503 and 5,801,943 to Nasburg (collectively referred to below as “Nasburg”). Nasburg deals with the tracking of vehicles using multiple sensors, including video sensors. “Fingerprints” are developed for vehicles to be tracked and are used to subsequently detect the individual vehicles. While Nasburg does mention the concept of a video tripwire, there is no disclosure as to how such a video tripwire is implemented. Nasburg further differs from the present invention in that it is focused exclusively on detecting and tracking vehicles. In contrast, the present invention, as disclosed and claimed below, is aimed toward detecting arbitrary moving objects, both rigid (like a vehicle) and non-rigid (like a human).
In view of the above, it would be advantageous to have a surveillance system that combines the advantages of tripwires with those of video surveillance systems, and this is a goal of the present invention.
The present invention implements a video tripwire system, in which a virtual tripwire, of arbitrary shape, is placed in digital video using computer-based video processing techniques. The virtual tripwire is then monitored, again using computer-based video processing techniques. As a result of the monitoring, statistics may be compiled, intrusions detected, events recorded, responses triggered, etc. For example, in one embodiment of the invention, the event of a person crossing a virtual tripwire in one direction may trigger the capture of a snapshot of that person, for future identification.
The inventive system may be implemented using existing video equipment in conjunction with computer equipment. It thus has the advantage of not requiring extensive installation of monitoring equipment. The inventive system may be embodied, in part, in the form of a computer-readable medium containing software implementing various steps of a corresponding method, or as a computer system, which may include a computer network, executing such software.
The inventive system may also be used in conjunction with imaging devices other than conventional video, including heat imaging systems or infrared cameras.
One embodiment of the invention comprises a method for implementing a video tripwire system, comprising steps of: installing a sensing device (which may be a video camera or other such device), if one does not already exist; calibrating the sensing device; establishing a boundary as a virtual tripwire; and gathering data.
Further objectives and advantages will become apparent from a consideration of the description, drawings, and examples.
In describing the invention, the following definitions are applicable throughout (including above).
A “computer” refers to any apparatus that is capable of accepting a structured input, processing the structured input according to prescribed rules, and producing results of the processing as output. Examples of a computer include a computer; a general-purpose computer; a supercomputer; a mainframe; a super mini-computer; a mini-computer; a workstation; a microcomputer; a server; an interactive television; a hybrid combination of a computer and an interactive television; and application-specific hardware to emulate a computer and/or software. A computer can have a single processor or multiple processors, which can operate in parallel and/or not in parallel. A computer also refers to two or more computers connected together via a network for transmitting or receiving information between the computers. An example of such a computer includes a distributed computer system for processing information via computers linked by a network.
A “computer-readable medium” refers to any storage device used for storing data accessible by a computer. Examples of a computer-readable medium include a magnetic hard disk; a floppy disk; an optical disk, like a CD-ROM or a DVD; a magnetic tape; a memory chip; and a carrier wave used to carry computer-readable electronic data, such as those used in transmitting and receiving e-mail or in accessing a network.
“Software” refers to prescribed rules to operate a computer. Examples of software include software; code segments; instructions; computer programs; and programmed logic.
A “computer system” refers to a system having a computer, where the computer comprises a computer-readable medium embodying software to operate the computer.
A “network” refers to a number of computers and associated devices that are connected by communication facilities. A network involves permanent connections such as cables or temporary connections such as those made through telephone or other communication links. Examples of a network include an internet, such as the Internet; an intranet; a local area network (LAN); a wide area network (WAN); and a combination of networks, such as an internet and an intranet.
“Video” refers to motion pictures represented in analog and/or digital form. Examples of video include television, movies, image sequences from a camera or other observer, and computer-generated image sequences. These can be obtained from, for example, a live feed, a storage device, an IEEE 1394-based interface, a video digitizer, a computer graphics engine, or a network connection.
“Video processing” refers to any manipulation of video, including, for example, compression and editing.
A “frame” refers to a particular image or other discrete unit within a video.
The invention is better understood by reading the following detailed description with reference to the accompanying figures, in which like reference numerals refer to like elements throughout, and in which:
In describing preferred embodiments of the present invention illustrated in the drawings, specific terminology is employed for the sake of clarity. However, the invention is not intended to be limited to the specific terminology so selected. It is to be understood that each specific element includes all technical equivalents that operate in a similar manner to accomplish a similar purpose. Each reference cited here is incorporated by reference as if each were individually incorporated by reference.
Furthermore, the embodiments discussed below are generally discussed in terms of detection of people. However, the invention is not to be understood as being limited to the detection of people. On the contrary, the video tripwire system in the embodiments discussed below can be used to detect objects of all sorts, animate or inanimate. Examples include vehicles, animals, plant growth (e.g., a system that detects when it is time to trim hedges), falling objects (e.g., a system that detects when a recyclable can is dropped into a garbage chute), and microscopic entities (e.g., a system that detects when a microbe has permeated a cell wall).
Analysis system 5 performs analysis tasks, including necessary processing to implement the video tripwire. An embodiment of analysis system 5 is shown in more detail in
Computer system 52 is provided with memory 53, which may be external to, as shown, or incorporated into computer system 52, or a combination of both. Memory 53 includes all memory resources required by analysis system 5 and may also include one or more recording devices for storing signals received from communication medium 2.
In a further embodiment of the invention, sensing device 1 may be implemented in the form of more than one sensing device monitoring the same location. In this case, the data output by each sensing device may be integrated prior to transmitting data over communication medium 2, or the outputs of all sensing devices may be transmitted to analysis system 5 and dealt with there.
In yet a further embodiment of the invention, sensing device 1 may comprise a number of sensing devices monitoring different locations and sending their data to a single analysis system 5. In this way, a single system can be used for surveillance of multiple sites.
The processes performed by the components shown in
Once sensing device 1 has been installed, it is necessary to calibrate it with analysis system 5. System calibration may be performed, generally speaking, by either explicit calibration, in which the system is told (or automatically determines) the necessary calibration parameters of sensing device 1, or by implicit calibration, in which the system is told (or automatically determines) the size of an object of interest at various locations in the field-of-view of sensing device 1. The purpose of calibration is to provide scale information, i.e., so that the system knows what the size of a person or other object of interest should be in different image areas. This information is especially important for the data analysis step 74. Calibration may be performed in one of, or in a combination of two or more of, three ways: manual numeric calibration, aided segmentation calibration, and fully automatic calibration. Flowcharts of embodiments of these methods are shown in
An embodiment of the aided segmentation calibration method, which uses implicit calibration and may also involve at least some degree of explicit calibration (see below), is shown in
An embodiment of the fully automatic calibration method, which involves implicit calibration, is shown in
The step of determining the average size of a person in an image region 724B is carried out only if a sufficient number of objects to result in a meaningful determination are logged in a given region. The number of determinations needed for a meaningful histogram may be determined empirically and may depend, for example, on the amount and type of activity to which the tripwire will be exposed. For such regions, peaks are detected in the histograms. The highest peak in each image region, i.e., the most frequent occurrence, is assumed to be a single person. If this information is determined, then calibration is successfully carried out 725B, and the system is able to signal its readiness for actual operation.
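By way of a non-limiting illustration, the histogram step described above may be sketched in Python as follows. Object sizes logged for one image region are binned, and the most frequent bin is taken as the size of a single person, provided that enough objects have been logged. The function names, the bin width, and the minimum sample count are assumptions made for this sketch and are not taken from the specification.

```python
from collections import Counter

def estimate_person_size(sizes, bin_width=50, min_samples=20):
    """Return the centre of the most frequent size bin, or None.

    sizes: logged object sizes (e.g., pixel areas) for one image region.
    bin_width and min_samples are illustrative values; the specification
    states only that the required number is determined empirically.
    """
    if len(sizes) < min_samples:
        return None  # too few objects logged for a meaningful histogram
    histogram = Counter(int(s // bin_width) for s in sizes)
    # Highest peak wins; on a tie, the smaller size bin is preferred
    # (a tie-breaking choice assumed for this sketch).
    peak_bin, _ = max(histogram.items(), key=lambda kv: (kv[1], -kv[0]))
    return (peak_bin + 0.5) * bin_width

def calibrate(region_sizes, **kwargs):
    """Map each region id to its estimated single-person size."""
    return {region: estimate_person_size(sizes, **kwargs)
            for region, sizes in region_sizes.items()}
```

A region for which `estimate_person_size` returns `None` has not yet logged enough objects, so calibration for that region would remain pending.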
The process of
Each of the automated calibration methods (aided and fully automatic) requires the segmentation of images into foreground objects and background (see steps 722A and 722B in
The objective of pixel-level background modeling 7221 is to maintain an accurate representation of the image background and to differentiate background (BG) pixels from foreground (FG) pixels. In an exemplary embodiment, this step implements the process disclosed in commonly-assigned U.S. patent application Ser. No. 09/815,385, entitled, “Video Segmentation Using Statistical Pixel Modeling,” filed Mar. 23, 2001, and incorporated herein by reference in its entirety. The general idea of the exemplary method is that a history of all pixels is maintained over several frames, including pixel values and their statistics. A stable, unchanging pixel is treated as BG. If the statistics of a pixel change significantly, it will be considered to be FG. If the pixel stabilizes again, it will revert to being considered a BG pixel. This method serves to alleviate sensor noise and to automatically address changes to the background (for example, in a store, when a person removes an item from a shelf, the shelf will instantaneously be treated as FG but will revert to BG after the scene re-stabilizes).
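The general idea described above may be illustrated with a minimal Python sketch for a single grayscale pixel. It is not the method of the referenced application; it substitutes a simple running mean and variance, and the class name `PixelModel` and the parameters `alpha`, `k`, and `min_std` are assumptions chosen for illustration. A pixel whose value departs significantly from its history is labeled FG, and it reverts to BG once its statistics stabilize again.

```python
class PixelModel:
    """Running mean/variance for one grayscale pixel (illustrative only)."""

    def __init__(self, value, alpha=0.05, k=3.0, min_std=2.0):
        self.mean = float(value)
        self.var = min_std ** 2
        self.alpha = alpha            # learning rate (assumed value)
        self.k = k                    # FG threshold in standard deviations
        self.min_var = min_std ** 2   # floor that absorbs sensor noise

    def update(self, value):
        """Classify the new value as 'FG' or 'BG', then adapt the model."""
        diff = value - self.mean
        is_fg = diff * diff > (self.k ** 2) * self.var
        # Adapt in both cases so that a persistent change is eventually
        # absorbed into the background.
        self.mean += self.alpha * diff
        self.var = max(self.min_var,
                       (1 - self.alpha) * self.var + self.alpha * diff * diff)
        return "FG" if is_fg else "BG"
```

In practice one such model would be kept per pixel (and per color channel, if applicable); the variance floor is one simple way to keep sensor noise from being reported as FG.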
The objective of foreground detection and tracking 7222 is to combine the FG pixels into FG objects and to track them over a number of frames, to guarantee spatio-temporal consistency. This step obtains sets of pixels determined to be FG pixels, as well as their statistical properties, from the pixel-level background modeling 7221. In an exemplary embodiment, the FG pixels are spatially merged into larger FG objects using simple morphology and connected component detection, techniques that are well-known in the art. These objects are tracked using correlation methods over several frames to obtain reliable size information. Exemplary tracking techniques are discussed in, for example, commonly-assigned co-pending U.S. patent application Ser. No. 09/694,712, entitled, “Interactive Video Manipulation,” filed Oct. 24, 2000, and incorporated herein by reference in its entirety. See, also, e.g., Wren, C. R. et al., “Pfinder: Real-Time Tracking of the Human Body,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 19, pp. 780–784, 1997; Grimson, W. E. L. et al., “Using Adaptive Tracking to Classify and Monitor Activities in a Site,” CVPR, pp. 22–29, June 1998; and Olson, T. J. and Brill, F. Z., “Moving Object Detection and Event Recognition Algorithm for Smart Cameras,” IUW, pp. 159–175, May 1997. Each of these references is to be considered as being incorporated by reference herein in its entirety.
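The merging of FG pixels into FG objects by connected component detection may be sketched, purely for illustration, as a breadth-first search over a binary mask. The function names and the choice of 4-connectivity are assumptions; the morphological clean-up and the correlation-based tracking mentioned above are omitted from this sketch.

```python
from collections import deque

def connected_components(mask):
    """Group 4-connected foreground pixels of a binary mask into objects.

    mask: list of rows, each a list of 0/1 values (1 = FG pixel).
    Returns a list of objects, each a list of (row, col) pixels.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            objects.append(pixels)
    return objects

def bounding_box(pixels):
    """Return (top, left, bottom, right) of one object, inclusive."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return min(ys), min(xs), max(ys), max(xs)
```

The pixel count or bounding box of each object gives the size information that the calibration and analysis steps rely upon.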
The third step, object analysis 7223, has a number of functions. Object analysis 7223 may serve to separate and count objects; to discriminate between objects of interest (e.g., people) and “confusers” (e.g., shopping carts); to determine an object's direction of motion; and to account for occlusions of objects. In an illustrative embodiment, determinations regarding an object are made based on one or more of: its size; its internal motion; the number of head-like protrusions (e.g., if people are the objects of interest); and face detection (for example, again, in the case in which people are the objects of interest). Techniques for performing such functions are known in the art, and examples of such techniques are discussed in, for example, Allmen, M., and Dyer, C., “Long-range Spatiotemporal Motion Understanding Using Spatiotemporal Flow Curves,” Proc. IEEE CVPR, Lahaina, Maui, Hi., pp. 303–309, 1991; Gavrila, D. M., “The Visual Analysis of Human Movement: A Survey,” CVIU, Vol. 73, No. 1, pp. 82–98, January 1999; Collins, Lipton, et al., “A System for Video Surveillance and Monitoring: VSAM Final Report,” Robotics Institute, Carnegie-Mellon University, Tech. Rept. No. CMU-RI-TR-00-12, May 2000; Lipton, A. J., et al., “Moving Target Classification and Tracking from Real-Time Video,” 1998 DARPA IUW, Nov. 20–23, 1998; and Haering, N., et al., “Visual Event Detection,” Video Computing Series, M. Shah, Ed., 2001. Each of these references is to be considered as being incorporated by reference herein in its entirety.
Returning now to
Other parameters that may be initialized include a time interval of active detection; a direction of crossing each line as a criterion for event detection (for example, to determine when a person enters an area, as opposed to when it is desired to determine when a person either enters or exits the area); and sensitivity of the detection.
Embodiments of this invention may include various different types of tripwires. For example, a video tripwire need not be straight; one or more curved tripwires may be drawn that follow the contour of one or more regions in a scene. In a similar vein, a video tripwire need not be a single linear segment; a video tripwire may comprise a multi-segment tripwire that is made up of more than one linear segment. Furthermore, a video tripwire need not merely comprise a single tripwire; on the contrary, a video tripwire may comprise “multiple” parallel tripwires that may, for example, require an object to cross all of the tripwires in a particular order or within a particular period of time. Other variations may be possible, as well, and the invention is not limited to these examples.
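As a hedged illustration of how a multi-segment tripwire might be tested for crossings, the following Python sketch treats the tripwire as a list of vertices and checks whether the movement of a tracked object between two consecutive frames intersects any segment. A curved tripwire could be approximated in the same way by many short segments. The function names, the use of a single tracked point per object, and the sign convention for direction are assumptions of this sketch, not details taken from the specification.

```python
def _cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def segment_crossing(p_prev, p_curr, w_start, w_end):
    """Return +1 or -1 if the move p_prev -> p_curr crosses the wire
    segment w_start -> w_end, else 0. The sign reflects the side of the
    directed wire on which the object started (an assumed convention)."""
    d1 = _cross(w_start, w_end, p_prev)
    d2 = _cross(w_start, w_end, p_curr)
    d3 = _cross(p_prev, p_curr, w_start)
    d4 = _cross(p_prev, p_curr, w_end)
    # Proper intersection: the endpoints of each segment lie strictly on
    # opposite sides of the other segment. Touching cases are ignored.
    if d1 * d2 < 0 and d3 * d4 < 0:
        return 1 if d1 > 0 else -1
    return 0

def polyline_crossing(p_prev, p_curr, wire):
    """Check a multi-segment tripwire given as a list of vertices."""
    for a, b in zip(wire, wire[1:]):
        direction = segment_crossing(p_prev, p_curr, a, b)
        if direction:
            return direction
    return 0
```

For example, with `wire = [(0, 0), (10, 0), (10, 10)]`, a move from `(5, -1)` to `(5, 1)` and the reverse move yield opposite signs, which is one way a directional criterion could be evaluated.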
Embodiments of this invention may include a graphical user interface (GUI). In such embodiments, the user may initialize the system by literally drawing a tripwire on a video image, or an image that is a snapshot from a video stream (e.g., such a “snapshot” may be a frame of a video stream or may be separately acquired). This may be done using a “point and click” interface, wherein a user may select a point on an image using a pointing device, such as a mouse, and then drag a tripwire along the image, thus designating the tripwire. Other components of a tripwire rule, such as directionality (left-to-right, right-to-left, either), object type (human, vehicle, animal, etc.), object speed, etc., may also be selected using a “point-and-click” interface. For example, directionality may be selected as options on a graphical menu selected using, for example, a pointing device, such as a mouse; object type may be selected from a list or pull-down menu using, for example, a pointing device, such as a mouse; and so on.
Another function of initialization 73 is for the user to select various logging options. These options determine what data is collected and may include, but are not limited to:
These various options, in combination, may be considered a video event rule. A video event rule may comprise a prescribed action (such as a “human” crossing a “virtual tripwire” in a prescribed direction) and a prescribed response (such as logging the alert with text and video to a database and sending an e-mail to a particular email address). Video event rules may encompass more complex activities involving other virtual video features, such as areas of interest, along with other classes of activities, such as loitering, leaving a bag behind, or stealing an item, and other types of response, such as activating a Digital Video Recorder (DVR) or sounding an audible alarm.
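A video event rule of the kind described above may be represented, as one possible sketch, by a small data structure that pairs a prescribed action with a list of prescribed responses. The class name `VideoEventRule`, its field names, and the dictionary keys used for detected events are hypothetical and are chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VideoEventRule:
    """Illustrative container pairing a prescribed action with responses."""
    object_type: str                      # e.g., "human"
    feature: str                          # e.g., "tripwire-1"
    direction: Optional[str] = None       # None means either direction
    responses: List[str] = field(default_factory=list)

    def matches(self, event: dict) -> bool:
        """True if a detected event satisfies the prescribed action."""
        return (event.get("object_type") == self.object_type
                and event.get("feature") == self.feature
                and (self.direction is None
                     or event.get("direction") == self.direction))

def responses_for(rules, event):
    """Collect the responses of every rule that the event triggers."""
    triggered = []
    for rule in rules:
        if rule.matches(event):
            triggered.extend(rule.responses)
    return triggered
```

Each string in `responses` stands in for an action such as logging the alert or sending an e-mail; dispatching those actions is outside the scope of this sketch.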
After initialization 73, the system operates to collect and analyze data 74. If the user has entered a time window, the system starts processing when it is within this time window. When it detects a tripwire event (of a particular type, if specified by the user), it is logged along with accompanying information; types of accompanying information will become apparent below in the discussion of data reporting. In the context of some applications, a tripwire event may trigger an alarm or other response 76 (e.g., taking a snapshot).
An embodiment of an exemplary technique for performing analysis and detecting tripwire events is shown in
Several methods for implementing the determination of direction of a crossing 744 are possible. As a first example, it may be implemented through the use of optical flow methods to objects detected as crossing the tripwire; the use of optical flow methods could also serve to obviate the need for object segmentation. As a second example, trajectory information may be used from object tracking (in step 7222 of
Calibration 72 is of particular importance in the execution of step 74, particularly if only a particular type of object is of interest. For example, if people are the objects of interest, calibration 72 permits step 74 to discriminate between, for example, people and objects that are either smaller (e.g., cats and mice) or larger (e.g., groups of people and cars) than people.
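The role of calibration in this discrimination may be illustrated by a short Python sketch that compares a detected object's size with the calibrated single-person size for the image region in which the object appears. The function name and the relative `tolerance` band are assumptions made for this sketch; an actual embodiment may combine size with the other cues discussed above.

```python
def classify_by_size(object_size, person_size, tolerance=0.4):
    """Compare an object's size with the calibrated single-person size.

    tolerance is an assumed relative band around person_size.
    Returns 'smaller', 'person', or 'larger'.
    """
    if person_size <= 0:
        raise ValueError("person_size must be positive")
    ratio = object_size / person_size
    if ratio < 1 - tolerance:
        return "smaller"   # e.g., a cat or a mouse
    if ratio > 1 + tolerance:
        return "larger"    # e.g., a group of people or a car
    return "person"
```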
When data has been gathered, it can then be reported to a user 75. In an exemplary embodiment of the invention, a user can query the system for results using a graphical user interface (GUI). In this embodiment, summary information and/or detailed data on one or more individual detections may be displayed. Summary information may include one or more of the following: number of detections, number of people (or other objects of interest) detected, number of multi-person (multi-object) detections (i.e., when multiple persons (or other objects of interest) cross simultaneously), number of people (objects) crossing in each direction, any or all of the preceding within a user-selected time window, and one or more time histograms of any or all of the preceding. Details on a single detection may include one or more of the following: time, direction, number of people (objects) crossing, size of object(s) crossing, and one or more snapshots or videos taken around the time of the detection.
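The summary information listed above may be compiled, for example, along the lines of the following Python sketch. The record layout (a dictionary with `time`, `direction`, and `count` keys), the function name, and the one-hour histogram bin are assumptions made for illustration only.

```python
from collections import Counter

def summarize(detections, start=None, end=None, bin_seconds=3600):
    """Summarize detections, optionally restricted to [start, end).

    Each detection is a dict with assumed keys:
    'time' (seconds), 'direction' (str), and 'count' (objects crossing).
    """
    selected = [d for d in detections
                if (start is None or d["time"] >= start)
                and (end is None or d["time"] < end)]
    per_direction = Counter()
    histogram = Counter()
    for d in selected:
        per_direction[d["direction"]] += d["count"]
        histogram[int(d["time"] // bin_seconds)] += 1
    return {
        "detections": len(selected),
        "objects": sum(d["count"] for d in selected),
        "multi_object_detections": sum(1 for d in selected if d["count"] > 1),
        "per_direction": dict(per_direction),
        "time_histogram": dict(histogram),
    }
```

Passing `start` and `end` restricts every statistic to a user-selected time window, and `time_histogram` maps each time bin index to the number of detections in that bin.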
An example of an application of the inventive video tripwire is the detection of “tailgating.” Tailgating describes an event in which a certain number of people (often one person) is permitted to enter an area (or the like) and one or more others try to follow closely to also gain entry.
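One simple way such tailgating might be flagged, offered only as a sketch under stated assumptions, is to group crossings in the entry direction that follow one another closely in time and to flag every crossing beyond the permitted number within a group. The function name, the `permitted` count, and the `window` length in seconds are hypothetical, application-specific parameters.

```python
def detect_tailgating(crossing_times, permitted=1, window=5.0):
    """Return crossing times that exceed the permitted count.

    crossing_times: times (seconds) of crossings in the entry direction.
    A new group starts when a crossing occurs more than `window` seconds
    after the previous one; crossings beyond `permitted` in a group are
    flagged. permitted and window are assumed parameters.
    """
    flagged = []
    in_group = 0
    previous = None
    for t in sorted(crossing_times):
        if previous is None or t - previous > window:
            in_group = 0   # a new entry event begins
        in_group += 1
        if in_group > permitted:
            flagged.append(t)
        previous = t
    return flagged
```

Each flagged time could then trigger a response such as an alarm or a snapshot.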
The embodiments illustrated and discussed in this specification are intended only to teach those skilled in the art the best way known to the inventors to make and use the invention. Nothing in this specification should be considered as limiting the scope of the present invention. The above-described embodiments of the invention may be modified or varied, and elements added or omitted, without departing from the invention, as appreciated by those skilled in the art in light of the above teachings. It is therefore to be understood that, within the scope of the claims and their equivalents, the invention may be practiced otherwise than as specifically described.
This application is a continuation-in-part of U.S. patent application Ser. No. 09/972,039, filed on Oct. 9, 2001, now issued as U.S. Pat. No. 6,696,945, entitled, “Video Tripwire,” commonly-assigned, and incorporated herein by reference in its entirety.
Number | Name | Date | Kind |
---|---|---|---|
3812287 | Lemelson | May 1974 | A |
4249207 | Harman et al. | Feb 1981 | A |
4257063 | Loughry et al. | Mar 1981 | A |
5491511 | Odle | Feb 1996 | A |
5623249 | Camire | Apr 1997 | A |
5696503 | Nasburg | Dec 1997 | A |
5801943 | Nasburg | Sep 1998 | A |
5926210 | Hackett et al. | Jul 1999 | A |
5956081 | Katz et al. | Sep 1999 | A |
6069653 | Hudson | May 2000 | A |
6075560 | Katz | Jun 2000 | A |
6091771 | Seeley et al. | Jul 2000 | A |
6097429 | Seeley et al. | Aug 2000 | A |
6177886 | Billington et al. | Jan 2001 | B1 |
6201473 | Schaffer | Mar 2001 | B1 |
6226388 | Qian et al. | May 2001 | B1 |
6297844 | Schatz et al. | Oct 2001 | B1 |
6696945 | Venetianer et al. | Feb 2004 | B1 |
20020008758 | Broemmelsiek et al. | Jan 2002 | A1 |
20020082769 | Church et al. | Jun 2002 | A1 |
20020171546 | Evans et al. | Nov 2002 | A1 |
20040137915 | Diener et al. | Jul 2004 | A1 |
20040151374 | Lipton et al. | Aug 2004 | A1 |
Number | Date | Country | |
---|---|---|---|
20040105570 A1 | Jun 2004 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 09972039 | Oct 2001 | US |
Child | 10704645 | | US |