The invention relates to the field of intelligent video surveillance and, more specifically, to a surveillance system that analyzes the behavior of objects such as people and vehicles moving in a video scene.
Intelligent video surveillance connotes the use of processor-driven, that is, computerized video surveillance involving automated screening of security cameras, as in security CCTV (Closed Circuit Television) systems.
The invention makes use of Boolean logic. Boolean logic is the invention of George Boole (1815-1864) and is a form of algebra in which all values are reduced to either True or False. Boolean logic symbolically represents relationships between entities. There are three Boolean operators AND, OR and NOT, which may be regarded and implemented as “gates.” Thus, it provides a process of analysis that defines a rigorous means of determining a binary output from various gates for any combination of inputs. For example, an AND gate will have a True output only if all inputs are true while an OR gate will have a True output if any input is True. So also, a NOT gate will have a True output if the input is not True. A NOR gate can also be defined as a combination of an OR gate and a NOT gate. So also, a NAND gate is defined as a combination of a NOT gate and an AND gate. Further gates that can be considered are XOR and XNOR gates, known respectively as “exclusive OR” and “exclusive NOR” gates, which can be realized by assembly of the foregoing gates.
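By way of illustration only, the gates described above can be sketched in a few lines of Python. The function names are chosen here for convenience and are not part of the disclosed system; the sketch merely shows how NAND, NOR, XOR and XNOR can be assembled from the three basic operators.

```python
# Illustrative sketch only: the three basic Boolean operators as functions,
# and the derived gates assembled from them.

def AND(*inputs: bool) -> bool:
    """True only if all inputs are True."""
    return all(inputs)

def OR(*inputs: bool) -> bool:
    """True if any input is True."""
    return any(inputs)

def NOT(value: bool) -> bool:
    """True if the input is not True."""
    return not value

def NAND(*inputs: bool) -> bool:
    """NOT gate applied to the output of an AND gate."""
    return NOT(AND(*inputs))

def NOR(*inputs: bool) -> bool:
    """NOT gate applied to the output of an OR gate."""
    return NOT(OR(*inputs))

def XOR(a: bool, b: bool) -> bool:
    """Exclusive OR: True if exactly one input is True."""
    return AND(OR(a, b), NAND(a, b))

def XNOR(a: bool, b: bool) -> bool:
    """Exclusive NOR: True if both inputs are equal."""
    return NOT(XOR(a, b))
```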
Boolean logic is compatible with binary logic. Thus, Boolean logic underlies generally all modern digital computer designs including computers designed with complex arrangements of gates allowing mathematical operations and logical operations.
Logic Inference Module
A configurable logic inference engine is a software implementation in the present system to allow a user to set up a Boolean logic equation based on high-level descriptions of inputs, and to solve the equation without requiring the user to understand the notation, or even the rules of the underlying logic.
Such a logic inference engine is highly useful in the system of a copending patent application owned by the present applicant's assignee/intended assignee, namely application Ser. No. 09/773,475, filed Feb. 1, 2001, published as Pub. No.: US 2001/0033330 A1, Pub. Date: Oct. 25, 2001, entitled System for Automated Screening of Security Cameras, and corresponding International Patent Application PCT/US01/03639, of the same title, filed Feb. 5, 2001, both also called a security system, and hereinafter referred to as the PERCEPTRAK disclosure or system, and herein incorporated by reference. That system may be identified by the trademark PERCEPTRAK herein. PERCEPTRAK is a registered trademark (Regis. No. 2,863,225) of Cernium, Inc., applicant's assignee/intended assignee, to identify video surveillance security systems, comprised of computers; video processing equipment, namely a series of video cameras, a computer, and computer operating software; computer monitors and a centralized command center, comprised of a monitor, computer and a control panel. Events in the PERCEPTRAK system described in said application Ser. No. 09/773,475 are defined as:
Software-driven processing of the PERCEPTRAK system performs a unique function within the operation of such system to provide intelligent camera selection for operators, resulting in a marked decrease of operator fatigue in a CCTV system. Real-time video analysis of video data is performed wherein a single pass or at least one pass of a video frame produces a terrain map which contains elements termed primitives which are low level features of the video. Based on the primitives of the terrain map, the system is able to make decisions about which camera an operator should view based on the presence and activity of vehicles and pedestrians and furthermore, discriminates vehicle traffic from pedestrian traffic. The PERCEPTRAK system provides a processor-controlled selection and control system (“PCS system”), serving as a key part of the overall security system, for controlling selection of the CCTV cameras. The PERCEPTRAK PCS system is implemented to enable automatic decisions to be made about which camera view should be displayed on a display monitor of the CCTV system, and thus watched by supervisory personnel, and which video camera views are ignored, all based on processor-implemented interpretation of the content of the video available from each of at least a group of video cameras within the CCTV system.
Thus, the PERCEPTRAK system uses video analysis techniques which allow the system to make decisions automatically about which camera an operator or security guard should view based on the presence and activity of vehicles and pedestrians, as examples of subjects of interest. Events, e.g., activities or attributes, are associated with subjects of interest, including both vehicles and pedestrians, as primary examples. They include, but are not limited to, single pedestrian, multiple pedestrians, fast pedestrian, fallen pedestrian, lurking pedestrian, erratic pedestrian, converging pedestrians, single vehicle, multiple vehicles, fast vehicles, and sudden stop vehicle. More is said about them in the following description.
The present invention is an improvement of said PERCEPTRAK system and disclosure.
Intelligent Video Events
In current state-of-the-art intelligent video systems, such as the PERCEPTRAK system, individual targets (subjects of interest) are tracked in the video scene and their behavior is analyzed based on motion history and other symbolic data characteristics, including events, that are available from the video as disclosed in the PERCEPTRAK system disclosure.
Intelligent video systems such as the PERCEPTRAK system have had heretofore at most one mask to determine if a detected event should be reported (a so-called active mask).
A surveillance system disclosed in Venetianer et al. U.S. Pat. No. 6,696,945 employs what is termed a video “tripwire” where the event is generated by an object “crossing” a virtually-defined tripwire but without regard to the object's prior location history. Such a system merely recognizes the tripwire crossing movement, rather than tracking a target so crossing, and does not take into consideration the tracking history of targets or the activity of subjects of interest within a sector, region or area of the image. Another basic difference between line crossing and the multiple mask concept of the present invention is the distinction between lines (with a single crossing point) and areas, where the areas may not be contiguous. It is possible for a subject of interest to have been in a public mask and then take multiple paths to the secure mask.
In view of the foregoing, it can be understood that it would be advantageous for an intelligent video surveillance system to provide not only current event detection as well as active area masking but also to provide means and capability to analyze and report on behavior based on the location of a target (subject of interest) at the time of behavior for multiple events and to so analyze and report based on the target location history.
Among the several objects, features and advantages of the invention may be noted the provision of a system and methodology which provides a capability for the use of multiple masks to divide the scene into logical areas along with the means to detect behavior events and adds a flexible logic inference engine in line with the event detection to configure and determine complex combinations of events and locations.
Briefly, an intelligent video system as configured in accordance with the invention captures video of scenes and provides software-implemented segmentation of targets in said scenes based on processor-implemented interpretation of the content of the captured video. The system is an improvement therein comprising software implementation for:
providing a configurable logic inference engine;
establishing masks for a video scene, the masks defining areas of the scene in which logic-defined events may occur;
establishing at least one Boolean equation for analysis of activities in the scenes relative to the masks by the logic inference engine mask according to rules established by the Boolean equation; and
a user input interface providing preselection of the rules by a user of the system according to possible activity in the areas defined by the masks;
the logic inference engine using such Boolean equation to report to a user of the system the logic-defined events, thereby indicative of what, when and where a target has activities in one or more of the areas.
Thus, the logic inference engine or module reports within the system the results of the analysis, so as to allow reporting to a user of the system, such as a security guard, the logic-defined events as indicative of what, when and where a target has activities in one or more of the areas. The logic-defined event is a behavioral event connoting behavior, activities, characteristics, attributes, locations and/or patterns of a target subject of interest, and further comprises a user interface for allowing user selection of such behavior events for logic definition by the Boolean equation in accordance with a perceived advantage, need or purpose arising from context of system use.
Considered in another way, the invention provides a method of implementing complex behavior recognition in an intelligent video system, such as the PERCEPTRAK system, including detection of multiple events which are defined activities of subjects of interest in different areas of the scene, where the events are of interest for behavior recognition and reporting purposes in the system. The method comprises:
creating one or more of multiple possible masks defining areas of a scene to determine where a subject of interest is located;
setting configurable time parameters to determine when such activity occurs; and
using a configurable logic inference engine to perform Boolean logic analysis based on a combination of such events and masks.
According to a system aspect, the invention is used in a system for capturing video of scenes, including a processor-controlled segmentation system for providing software-implemented segmentation of subjects of interest in said scenes based on processor-implemented interpretation of the content of the captured video, and is an improvement comprising software implementation for:
providing a configurable logic inference engine;
establishing at least one mask for a video scene, the mask defining at least one of possible types of areas of the scene where a logic-defined event may occur;
creating a Boolean equation for analysis of activities relative to the at least one mask by the logic inference engine mask according to rules established by the Boolean equation;
providing preselection of the rules by a user of the system according to what, when and where a subject of interest might have an activity relative to the at least one of possible types of areas;
analysis by the logic inference engine in accordance with the Boolean equation of what, when and where subjects of interest have activities in the at least one of possible types of areas; and
reporting within the system the results of the analysis so to inform thereby a user of the system what, when and where a target, i.e., a subject of interest, has or did have an activity or event in any of such areas.
The invention thus allows an open-ended means of detecting complex events as a combination of individual behavior events and locations. For example, such a complex event is described in this descriptive way:
Events detected by the intelligent video system can vary widely by system, but for the purposes of this invention the following list from the previously referenced PERCEPTRAK system includes events or activities or attributes or behaviors of subjects of interest (targets), which for convenience may be referred to as “behavioral events”:
SINGLE_PERSON
MULTIPLE_PEOPLE
CONVERGING_PEOPLE
FAST_PERSON
FALLEN_PERSON
ERRATIC_PERSON
LURKING_PERSON
SINGLE_CAR
MULTIPLE_CARS
FAST_CAR
SUDDEN_STOP_CAR
SLOW_CAR
STATIONARY_OBJECT
ANY_MOTION
CROWD_FORMING
CROWD_DISPERSING
COLOR_OF_INTEREST_1
COLOR_OF_INTEREST_2
COLOR_OF_INTEREST_3
WALKING_GAIT
RUNNING_GAIT
ASSAULT_GAIT
These behavioral events of subjects of interest are combined with locations defined by mask configuration to add the dimension of “where” to a “what” dimension of the event. Note that an example, described herein, of assigning symbols advantageously includes examples of a target that “was in” a given mask and so adds an additional dimension of “when” to the equation. A representative sample of named masks is shown below but is not intended to limit the invention to only these mask examples:
It will be appreciated that many other characteristics, attributes, locations, patterns and mask elements or events in addition to the above may be selected, as by use of the GUI (Graphical User Interface) herein described, for logic definition by the Boolean equation in accordance with a perceived advantage, need or purpose arising from context of system use.
Definitions Used Herein
Boolean Notation
A technique of expressing Boolean equations with symbols and operators. The basic operators are OR, AND, and NOT using the symbols shown below.
+ = OR operator, where (A+B) is read as A or B
• = AND operator, where (A•B) is read as A and B
Ā = NOT operator, where (Ā+B) is read as (Not A) or (B)
CCTV
Closed Circuit Television; a television system consisting of one or more cameras and one or more means to view or record the video, intended as a “closed” system, rather than broadcast, to be viewed by only a limited number of viewers.
Intelligent Video System
A coordinated intelligent video system, as provided by the present invention, comprises one or more computers, at least one of which has at least one video input that is analyzed at least to the degree of tracking moving objects (targets), i.e., subjects of interest, in the video scene and recognizing objects seen in prior frames as being the same object in subsequent frames. Such an intelligent video system, for example, the PERCEPTRAK system, has within the system at least one interface to present the results of the analysis to a person (such as a user or security guard) or to an external system.
Mask
As used in this document, a mask is an array of contiguous or separated cells, arranged in rows and columns aligned with and evenly spaced over an image, where each cell is either “On” or “Off”, with the understanding that the cells must cover the entire scene so that every area of the scene is either On or Off. The cells, and thus the mask, are user defined according to GUI selection by a user of the system. The image below illustrates a mask of 32 columns by 24 rows. The cells where the underlying image is visible are “On” and the cells with a fill concealing the image are “Off”. The areas defined by “Off” cells do not have to be contiguous. The areas defined by “On” cells do not have to be contiguous. The array defining or corresponding to an area image may be one of multiple arrays, and such arrays need not be contiguous.
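As a minimal sketch of the mask definition above, a mask can be modeled as a grid of On/Off cells laid evenly over the image. The class name, method names and the 640×480 image size used in the usage note are assumptions made for illustration and do not come from the PERCEPTRAK disclosure.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

@dataclass
class Mask:
    """A named grid of On/Off cells evenly spaced over an image (hypothetical type)."""
    name: str
    cells: List[List[bool]]  # cells[row][column]; True means "On"

    def is_on_at(self, x: int, y: int, image_width: int, image_height: int) -> bool:
        """Return True if the pixel (x, y) falls inside an "On" cell."""
        rows = len(self.cells)
        cols = len(self.cells[0])
        # Scale pixel coordinates to cell coordinates; clamp to the last cell.
        col = min(x * cols // image_width, cols - 1)
        row = min(y * rows // image_height, rows - 1)
        return self.cells[row][col]

def make_mask(name: str, on_cells: Iterable[Tuple[int, int]],
              rows: int = 24, cols: int = 32) -> Mask:
    """Build a mask that is Off everywhere except the listed (row, column) cells."""
    cells = [[False] * cols for _ in range(rows)]
    for row, col in on_cells:
        cells[row][col] = True
    return Mask(name, cells)
```

For instance, with an assumed 640×480 image and the 32 by 24 grid, each cell covers 20×20 pixels, so `make_mask("Secure", [(2, 1), (23, 31)])` is On at pixel (25, 45) and at pixel (639, 479) and Off elsewhere. The two On cells are deliberately non-contiguous, as the definition permits.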
Scene
The area/areas/portions of areas within view of one or more CCTV cameras (Virtual View). Where a scene spans more than one camera, it is not required that the views of the cameras be contiguous to be considered as portions of the same scene. Thus area/areas/portions of areas need not be contiguous.
Target
An object or subject of interest that is given a unique Target Number and tracked while moving within a scene while recognized as the same object. A target may be real, such as a person, animal, or vehicle, or may be a visual artifact, such as a reflection, shadow or glare.
Video
A series of images (frames) of a scene in order of time, such as 30 frames per second for broadcast television using the NTSC protocol, for example. The definition of video for this document is independent of the transport means, or coding technique. For example, video may be broadcast over the air, connected as baseband as over copper wires or fiber, or digitally encoded and communicated over a computer network. Intelligent video as here employed involves analyzing the differences between frames of video independently of the communication means.
Virtual View
The field of view of one or more CCTV cameras that are all assigned to the same scene for event detection. Objects are recognized in the different camera views of the Virtual View in the same manner as in a single camera view. Target ID Numbers assigned when a target is first recognized are used for the recognized target when it is in another camera view. Masks of the same name defined for each camera view are recognized as the same mask in the Boolean logic analysis of the events.
Software
The general term “software” is herein simply intended for convenience to mean programs, programming, program instructions, code or pseudo code, process or instruction sets, source code and/or object code processing hardware, firmware, drivers and/or utilities, and/or other digital processing devices and means, as well as software per se.
This patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
The above-identified PERCEPTRAK system brings about the attainment of a CCTV security system capable of automatically carrying out decisions about which video camera should be watched, and which to ignore, based on video content of each such camera, as by use of video motion detectors, in combination with other features of the presently inventive electronic subsystem, thus achieving a processor-controlled selection and control system (“PCS system”), which serves as a key part of the overall security system, for controlling selection of the CCTV cameras. The PCS system is implemented in order to enable automatic decisions to be made about which camera view should be displayed on a display monitor of the CCTV system, and thus watched by supervisory personnel, such as a security guard, and which video camera views are ignored, all based on processor-implemented interpretation of the content of the video available from each of at least a group of video cameras within the CCTV system.
Included as a part of the PCS system are novel image analysis techniques which allow the system to make decisions about which camera an operator should view based on the presence and activity of vehicles and pedestrians. Events are associated with both vehicles and pedestrians and include, but are not limited to, single pedestrian, multiple pedestrians, fast pedestrian, fallen pedestrian, lurking pedestrian, erratic pedestrian, converging pedestrians, single vehicle, multiple vehicles, fast vehicles, and sudden stop vehicle.
The image analysis techniques are also able to discriminate vehicular traffic from pedestrian traffic by tracking background images and segmenting moving targets. Vehicles are distinguished from pedestrians based on multiple factors, including the characteristic movement of pedestrians compared with vehicles, i.e. pedestrians move their arms and legs when moving and vehicles maintain the same shape when moving. Other factors include the aspect ratio and smoothness, for example, pedestrians are taller than vehicles and vehicles are smoother than pedestrians.
The primary image analysis techniques of the PERCEPTRAK system are based on an analysis of a Terrain Map. Generally, the function herein called Terrain Map is generated from at least a single pass of a video frame, resulting in characteristic information regarding the content of the video. Terrain Map creates a file with the characteristic information based on each of the 2×2 kernels of pixels in an input buffer, which contains six bytes of data describing the relationship of each of sixteen pixels in a 4×4 kernel surrounding the 2×2 kernel.
The informational content of the video generated by Terrain Map is the basis for all image analysis techniques of the present invention and results in the generation of several parameters for further image analysis. The parameters include: (1) Average Altitude; (2) Degree of Slope; (3) Direction of Slope; (4) Horizontal Smoothness; (5) Vertical Smoothness; (6) Jaggyness; (7) Color Degree; and (8) Color Direction.
The PCS system as contemplated by the PERCEPTRAK disclosure comprises seven primary software components:
Analysis Worker(s)
Video Supervisor(s)
Video Worker(s)
Node Manager(s)
Administrator (Set Rules) GUI (Graphical User Interface)
Arbitrator
Console
The PCS system as contemplated by the PERCEPTRAK disclosure comprises six primary software components:
Analysis Worker(s)
Video Supervisor(s)
Video Worker(s)
Node Manager(s)
Set Rules GUI (Graphical User Interface); and
Arbitrator
Such a system is improved by employing, in accordance with the present disclosure, a logic inference engine capable of handling a Boolean equation of indefinite length. A simplified example in Equation 1 below is based on two pairs of lists. Each pair has a list of values that are all connected by the AND operator and a list of values that are connected by the OR operator. Each pair of lists is connected by a configurable AND/OR operator and the intermediate results of each pair are connected by a configurable AND/OR operator. The equation below is the generalized form where the tilde (˜) represents an indefinite number of values, and (+/•) represents a configurable selection of either the AND operator or the OR operator. The NOT operators (Ā) are randomly applied in the example to indicate that any value in the equation can be either in its “normal” state or in its inverted state according to a NOT operator.
While the connector operators in Equation 1 are shown as configurable as either the AND or OR operators, the concept includes other derived Boolean operators including the XOR, NAND, and NOR gates.
For ease of Boolean notation mask status of targets and the results of target event analysis are assigned to single character or target symbols according to descriptions and event derivations such as the following.
The Logic Inference Engine (LIF) or module (LIM) of the PERCEPTRAK system evaluates the states of the associated inputs based on the rules defined in the PtrakEvent structure. If all of the rules are met the LIF returns the output True.
The system need not be limited to a single LIF, but a practical system can employ with advantage a single LIF. All events are constrained by the same rules so that a single LIF can evaluate all current and future events monitored and considered by the system. Evaluation, as according to the rules established by the Boolean equation of evaluating an event, yields a logic-defined event (“Logic Defined Event”), which is to say an activity of a subject of interest (target) which the system can report in accordance with the rules preselected by a user of the system.
In this example, events are limited for convenience to four lists of inputs organized as two pairs of input lists. Each pair has a list of inputs that are connected by AND operators and one list of inputs that are connected by OR operators. There is no arbitrary limit to the length of the lists, but the GUI design will, as a practical matter, dictate some limit.
The GUI should not present the second pair of lists until the first pair has been configured. The underlying code will assume that if the second pair is in use then the first pair must also be in use.
Individual inputs in all four lists can be evaluated in either their native state or inverted to yield the NOT condition. For example, TenMinTimeTick and NOT SinglePerson with a one hour valid status will detect that an hour has passed without seeing a roving security guard.
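The native or inverted evaluation of an input can be sketched as an exclusive OR between the input state and a per-input inversion flag. The function name and parameter names below are hypothetical; only the XOR technique itself is taken from this description.

```python
def normalize_for_not(input_state: bool, inverted: bool) -> bool:
    """Apply an optional NOT to an input.

    XOR of two booleans: the result is True when exactly one of
    input_state and inverted is True.
    """
    return input_state != inverted

# Roving-guard style example: the time tick has fired and no single
# person has been seen, with the SinglePerson input marked as inverted.
ten_min_time_tick = normalize_for_not(True, inverted=False)
not_single_person = normalize_for_not(False, inverted=True)
guard_missing = ten_min_time_tick and not_single_person
```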
Inputs do not have to be currently True to be evaluated as True by the LIF. The parameter ValidTimeSpan can be used to control the time that inputs may be considered as True. For example if ValidTimeSpan is set to 20, a time in seconds, any input that has been True in the last 20 seconds is still considered to be True.
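A minimal sketch of the ValidTimeSpan check follows. It assumes that each input records the time, in seconds, at which it was last True, and that the boundary is inclusive; both are assumptions made for illustration, as is the function name.

```python
from typing import Optional

def considered_true(last_true_time: Optional[float], now: float,
                    valid_time_span: float) -> bool:
    """Return True if the input was True within the last valid_time_span seconds.

    last_true_time is None for an input that has never been True (assumed
    convention). The inclusive comparison is also an assumption.
    """
    if last_true_time is None:
        return False
    return (now - last_true_time) <= valid_time_span
```

With ValidTimeSpan set to 20, an input last True at t=100 s is still considered True at t=115 s, and is no longer considered True at t=121 s.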
Each pair of lists can be logically connected by an AND operator, an OR operator, or an XOR operator, to yield two results. The two results may be connected by an AND operator, an OR operator or an XOR operator to yield the final result of the event evaluation.
Prior to evaluation each input is checked for ValidTimeSpan. Each input is considered True if it has been True within ValidTimeSpan.
If the List2Last element of PtrakEvent is True, the oldest input from the second pair of lists must be newer than (or equal to, using the Or Equal operator) the newest input of the first pair of lists. This condition allows specifying events where inputs are required to “fire” (occur) in a particular order rather than just within a given time in any order.
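The ordering condition can be sketched as a comparison of firing times. The sketch reads the condition as requiring that the oldest input of the second pair be no older than the most recent input of the first pair, which is an interpretation; the function name, the use of numeric timestamps (larger means newer) and the handling of empty lists are assumptions.

```python
from typing import Sequence

def second_pair_fired_last(first_pair_times: Sequence[float],
                           second_pair_times: Sequence[float]) -> bool:
    """Return True if every input of the second pair fired at or after
    every input of the first pair.

    Timestamps are in seconds; a larger value is a newer firing.
    Returning False when either pair has no firing times is an assumed
    convention, not taken from the description.
    """
    if not first_pair_times or not second_pair_times:
        return False
    oldest_second = min(second_pair_times)
    newest_first = max(first_pair_times)
    return oldest_second >= newest_first
```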
After normalization for valid time span, each input is normalized for the NOT operator. The NOT operator can be applied to any input in any list, allowing events such as EnteredStairway AND NOT ExitedStairway. The inversion can be performed by XORing the input with the Inverted (NOT) value for that input. If exactly one of the input and Inverted is True, but not both, then the input is evaluated as True in the following generic Boolean equation.
ThisEvent.EventState=(AndIn1 AND AndIn2 AND AndIn3 . . . ) AND/OR (OrIn1 OR OrIn2 OR OrIn3 . . . )
AND/OR
(AndIn4 AND AndIn5 AND AndIn6 . . . ) AND/OR (OrIn4 OR OrIn5 OR OrIn6 . . . ) (Equation 2)
If EventState is evaluated as True then the Logic Defined Event is considered to have “fired”.
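The evaluation of Equation 2 can be sketched as follows, assuming that the inputs have already been normalized for valid time span and for the NOT operator. The names `ListPair`, `evaluate_pair` and `evaluate_event` are hypothetical and do not reproduce the PtrakEvent structure; the treatment of an empty list as "not in use" is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Configurable connector operators named in the description.
CONNECTORS: Dict[str, Callable[[bool, bool], bool]] = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "XOR": lambda a, b: a != b,
}

@dataclass
class ListPair:
    """One pair of input lists: one ANDed list and one ORed list."""
    and_inputs: List[bool] = field(default_factory=list)
    or_inputs: List[bool] = field(default_factory=list)
    connector: str = "AND"  # joins the AND-list result and the OR-list result

def evaluate_pair(pair: ListPair) -> Optional[bool]:
    """Evaluate one pair; None means the pair is not in use (assumed convention)."""
    partial: List[bool] = []
    if pair.and_inputs:
        partial.append(all(pair.and_inputs))
    if pair.or_inputs:
        partial.append(any(pair.or_inputs))
    if not partial:
        return None
    if len(partial) == 1:
        return partial[0]
    return CONNECTORS[pair.connector](partial[0], partial[1])

def evaluate_event(first: ListPair, second: Optional[ListPair] = None,
                   connector: str = "AND") -> bool:
    """Return the event state: True means the Logic Defined Event has fired."""
    first_result = evaluate_pair(first)
    if first_result is None:
        return False  # the first pair must be in use
    second_result = evaluate_pair(second) if second is not None else None
    if second_result is None:
        return first_result
    return CONNECTORS[connector](first_result, second_result)
```

A single function of this kind can serve every configured event, since each event differs only in the contents of its lists and in its choice of connectors.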
PtrakEventInputs Array
An array identified as PtrakEventInputs contains one element for each possible input in the system such as identified above with the symbols A to K. Each letter symbol is mapped to a Flat Number for the array element. For example A=1, B=2, etc.
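The mapping from letter symbols to array element numbers (A=1, B=2, and so on through K) can be sketched as below. The function name and the plain list standing in for the PtrakEventInputs array are illustrative assumptions; the actual element type is not reproduced here.

```python
FIRST_SYMBOL = "A"
LAST_SYMBOL = "K"

def symbol_to_element_number(symbol: str) -> int:
    """Map a single-letter input symbol to its array element number (A=1, B=2, ...)."""
    if len(symbol) != 1 or not (FIRST_SYMBOL <= symbol <= LAST_SYMBOL):
        raise ValueError(f"unknown input symbol: {symbol!r}")
    return ord(symbol) - ord(FIRST_SYMBOL) + 1

# Stand-in for the inputs array: index 0 is left unused so that the
# element numbers match the 1-based mapping.
input_states = [False] * (symbol_to_element_number(LAST_SYMBOL) + 1)
input_states[symbol_to_element_number("B")] = True
```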
The elements are of type PtrakEventInputsType as defined below.
After the Boolean equation is parsed, a structure is filled out to map the elements of the equation to common data elements for all events. This step allows a common LIF to evaluate any combination of events. The following is the declaration of the event type structure.
Public Type PtrakEventType
A graphical user interface (GUI) is employed. It includes forms to enter events, mask names and configurable times to define a Boolean equation from which an LIF will evaluate any combination of events.
Configuration Variables
In order to allow configuration of different cameras to respond to behavior differently, individual cameras used as part of the PERCEPTRAK system can have configuration variables assigned to program variables from a database at process start up time. Following are some representative configuration variables and so-called constants, with comments on their use in the system.
Constants for Mask Timing
Mask assignment is carried out in accordance with a predetermined need for establishing security criteria within a scene. As an example,
To generate a PERCEPTRAK event determinative of unauthorized entry for this scene, the following Boolean equation is to be evaluated by the PERCEPTRAK system.
(IsInSecureMask And IsInActiveMask And WasInPublicMask) (Equation 3)
In operation, solving Boolean equation (3) against the mask data by the PERCEPTRAK system provides a video solution indicating impermissible presence of a subject in the private area. Further Boolean analysis, parsing with the above-identified constants for erratic behavior or movement, or other attributes, provides more information about the subject, such as that the person is running. Tracking shows the movement of the person, who remains subject to intelligent video analysis.
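Equation 3 can be sketched for a single tracked target by keeping both the masks the target is in now and the masks it has been in before. The class name, its methods and the mask names "Secure", "Active" and "Public" are hypothetical labels chosen for illustration, and the sketch ignores the configurable mask timing for simplicity.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set

@dataclass
class TargetMaskState:
    """Per-target record of current and past mask membership (hypothetical type)."""
    current_masks: Set[str] = field(default_factory=set)
    visited_masks: Set[str] = field(default_factory=set)

    def update(self, masks_now: Iterable[str]) -> None:
        """Record the masks the target occupies in the current frame."""
        self.current_masks = set(masks_now)
        self.visited_masks |= self.current_masks

    def is_in(self, mask_name: str) -> bool:
        return mask_name in self.current_masks

    def was_in(self, mask_name: str) -> bool:
        return mask_name in self.visited_masks

def unauthorized_entry(target: TargetMaskState) -> bool:
    """IsInSecureMask And IsInActiveMask And WasInPublicMask."""
    return (target.is_in("Secure") and target.is_in("Active")
            and target.was_in("Public"))
```

A target first seen in the public area and later seen in the secure area satisfies the equation, whereas a target that has only ever been in the secure area does not, which is the distinction that location history adds over a simple line crossing.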
Many other types of intelligent video analysis can be appreciated.
In view of the foregoing, one can appreciate that the several objects of the invention are achieved and other advantages are attained.
Although the foregoing includes a description of the best mode contemplated for carrying out the invention, various modifications are contemplated.
As various modifications could be made in the constructions and methods herein described and illustrated without departing from the scope of the invention, it is intended that all matter contained in the foregoing description or shown in the accompanying drawings shall be interpreted as illustrative rather than limiting.
This application claims the priority of U.S. provisional patent application Ser. No. 60/666,429, filed Mar. 30, 2005, entitled INTELLIGENT VIDEO BEHAVIOR RECOGNITION WITH MULTIPLE MASKS AND CONFIGURABLE LOGIC INFERENCE MODULE.
Number | Date | Country
---|---|---
60666429 | Mar 2005 | US