The present disclosure relates generally to surveillance systems and more particularly to multi-camera, multi-sensor surveillance systems. The disclosure describes a system and method that exploit data mining to make it significantly easier for the surveillance operator to understand a situation taking place within a scene.
Surveillance systems and sensor networks used in sophisticated surveillance operations typically employ many cameras and sensors, which collectively generate huge amounts of data, including video data streams from multiple cameras and other forms of sensor data harvested from the surveillance site. Given this volume of data, it can be quite difficult for an operator to understand the current situation.
In a conventional surveillance monitoring station, the surveillance operator is seated in front of a collection of video screens, such as illustrated in
The present system and method seek to overcome these surveillance problems by employing sophisticated visualization techniques which allow the operator to see the big picture while being able to quickly explore potential abnormalities using powerful data mining techniques and multimedia visualization aids. The operator can perform explorative analysis, without predetermined hypotheses, to discover abnormal surveillance situations. Data mining techniques explore the metadata associated with video data streams and sensor data. These data mining techniques assist the operator by finding potential threats and by discovering “hidden” information from surveillance databases.
In a presently preferred embodiment, the visualization can readily represent multi-dimensional data, providing an immersive visual surveillance environment where the operator can comprehend a situation and respond to it quickly and efficiently.
While the visualization system has important uses for private and governmental security applications, the system can also be deployed in applications where members of a community may access the system to take advantage of the security and surveillance features it offers. The system implements different levels of dynamically assigned privacy. Thus, users can register with and use the system without encroaching on the privacy of others, unless alert conditions warrant.
Further areas of applicability will become apparent from the description provided herein. It should be understood that the description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the present disclosure.
The drawings described herein are for illustration purposes only and are not intended to limit the scope of the present disclosure in any way.
FIGS. 2a and 2b are display diagrams showing panoramic views generated by the surveillance visualization system of the invention;
FIGS. 6a, 6b and 6c are illustrations of the power lens performing different visualization functions;
The description of the invention is merely exemplary in nature and, thus, variations that do not depart from the gist of the invention are intended to be within the scope of the invention. Such variations are not to be regarded as a departure from the spirit and scope of the invention.
Before a detailed description of the visualization system is presented, an overview will be given.
In the conventional system, the operator must continually scan the bank of monitors, looking for any movement or activity that might be deemed unusual. When such movement or activity is detected, the operator may use a PTZ control to zoom in on the activity of interest and may also adjust the angle of other monitors in an effort to get additional views of the suspicious activity. The surveillance operator's job is a difficult one. During quiet times, the operator may see nothing of interest on any of the monitors for hours at a time. There is a risk that the operator may become mesmerized with boredom during these times and thus may fail to notice a potentially important event. Conversely, during busy times, it may be virtually impossible for the operator to mentally screen out a flood of normal activity in order to notice a single instance of abnormal activity. Because the images displayed on the plural monitors are not correlated to each other, the operator must mentally piece together what several monitors may be showing about a common event.
FIGS. 2a and 2b give an example of how the situation is dramatically improved by our surveillance visualization system and methods. Instead of requiring the operator to view multiple, disparate video monitors, the preferred embodiment may be implemented using a single monitor (or a group of side-by-side monitors showing one panoramic view) such as illustrated at 10. As will be more fully explained, video streams and other data are collected and used to generate a composite image comprised of several different layers, which are then mapped onto a computer-generated three-dimensional image that the operator can rotate and zoom at will. Permanent stationary objects are modeled in the background layer, moving objects are modeled in the foreground layer, and normal trajectories extracted from historical movement data are modeled in one or more intermediate layers. Thus, in
Because modeling techniques are used, the surveillance operator can readily rotate the image in virtual three-dimensional space to get a better view of a situation. In
Because modeling techniques and layered presentation are used, the operator can choose whether to see computer simulated models of a scene, or the actual video images, or a combination of the two. In this regard, the operator might wish to have the building modeled using computer-generated images and yet see the person shown by the video data stream itself. Alternatively, the moving person might be displayed as a computer-generated avatar so that the privacy of the person's identity may be protected. Thus, the layered presentation techniques employed by our surveillance visualization system allow for multimedia presentation, mixing different types of media in the same scene if desired.
The visualization system goes further, however. In addition to displaying visual images representing the selected scene of interest, the visualization system can also display other metadata associated with selected elements within the scene. In a presently preferred embodiment, a power lens 20 may be manipulated on screen by the surveillance operator. The power lens has a viewing port or reticle (e.g., cross-hairs) which the operator places over an area of interest. In this case, the viewing port of the power lens 20 has been placed over the fourth floor office 16. What the operator chooses to see using this power lens is entirely up to the operator. Essentially, the power lens acts as a user-controllable data mining filter. The operator selects parameters upon which to filter; the power lens then uses these parameters as query parameters and displays the data mining results either as a visual overlay within the portal or within a call-out box 22 associated with the power lens.
For example, assume that the camera systems include data mining facilities to generate metadata extracted from the visually observed objects. By way of example, the system might be configured to provide data indicative of the dominant color of an object being viewed. Thus, a white delivery truck would produce metadata indicating that the object is “white,” and a pizza delivery person wearing a red jacket would generate metadata indicating that the person's dominant color is “red.” If the operator wishes to examine objects based upon dominant color, the power lens is configured to extract that metadata and display it for the object identified within the portal of the power lens.
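By way of a non-limiting illustration, the following Python sketch shows one way such a dominant-color metadata filter might be realized; the object records and field names are hypothetical and are not part of the disclosed system:

    from dataclasses import dataclass

    @dataclass
    class TrackedObject:
        object_id: int
        kind: str            # e.g. "vehicle" or "person"
        dominant_color: str  # metadata extracted by the camera system

    def filter_by_dominant_color(objects, color):
        # Return only the objects whose dominant-color metadata matches.
        return [obj for obj in objects if obj.dominant_color == color]

    scene = [TrackedObject(1, "vehicle", "white"),   # the white delivery truck
             TrackedObject(2, "person", "red")]      # the pizza delivery person
    print(filter_by_dominant_color(scene, "red"))    # matches object 2 only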
In a more sophisticated system, face recognition technology might be used. At great distances, the face recognition technology may not be capable of discerning a person's face, but as the person moves closer to a surveillance camera, the data may be sufficient to generate a face recognition result. Once that result is attained, the person's identity may be associated as metadata with the detected person. If the surveillance operator wishes to know the identity of the person, he or she would simply include the face recognition identification information as one of the factors to be filtered by the power lens.
Although color and face recognition have been described here, it will of course be understood that the metadata capable of being exploited by the visualization system can be anything capable of being ascertained by cameras or other sensors, or by lookup from other databases using data from these cameras or sensors. Thus, for example, once the person's identity has been ascertained, the person's license plate number may be looked up using motor vehicle bureau data. Comparing the looked up license plate number with the license plate number of the vehicle from which the user exited (in
Referring now to
The video data feeds from cameras 30 are input to a background subtraction processing module 40 which analyzes the collective video feeds to identify portions of the collective images that do not move over time. These non-moving regions are relegated to the background 42. Moving portions within the images are relegated to a collection of foreground objects 44. Separation of the video data feeds into background and foreground portions represents one generalized embodiment of the surveillance visualization system. If desired, the background and foreground components may be further subdivided based on movement history over time. Thus, for example, a building that remains forever stationary may be assigned to a static background category, whereas furniture within a room (e.g., chairs) may be assigned to a different background category corresponding to normally stationary objects which can be moved from time to time.
The background subtraction process not only separates background from foreground, but it also separately identifies individual foreground objects as separate entities within the foreground object grouping. Thus, the image of a red car arriving in the parking lot at 8:25 a.m. is treated as a separate foreground object from the green car that arrived in the parking lot at 6:10 a.m. Likewise, the persons exiting from these respective vehicles would each be separately identified.
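One plausible realization of this background subtraction and per-object separation, sketched in Python using OpenCV's MOG2 background model; the disclosure does not mandate this particular algorithm, and the file name and area threshold are assumptions:

    import cv2

    cap = cv2.VideoCapture("parking_lot.mp4")   # hypothetical video data feed
    subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Pixels matching the learned background model go to 0; movers go to 255.
        mask = subtractor.apply(frame)
        _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)  # drop shadows
        mask = cv2.medianBlur(mask, 5)          # suppress noise speckles
        # Label each connected foreground region as a separate entity, so the
        # red car and the green car become distinct foreground objects.
        count, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
        for i in range(1, count):               # label 0 is the background
            x, y, w, h, area = stats[i]
            if area > 200:                      # ignore tiny blobs
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cap.release()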
As shown in
The three-dimensional modeling process develops vector graphic wire frame models based on the underlying video data. One advantage of using such models is that the wire frame model requires considerably less data than the video images. Thus, the background images represented as wire frame models can be manipulated with far less processor loading. In addition, the models can be readily manipulated in three-dimensional space. As was illustrated in
Foreground objects receive different processing, depicted at processing module 48. Foreground objects are presented on the panoramic background according to the spatial and temporal information associated with each object. In this way, foreground objects are placed at the location and time that synchronizes with the video data feeds. If desired, the foreground objects may be represented using bit-mapped data extracted from the video images, or using computer-generated images such as avatars to represent the real objects.
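A minimal sketch of how a foreground object might carry the spatial and temporal attributes needed for this synchronization; the field names and the avatar flag are illustrative assumptions:

    from dataclasses import dataclass

    @dataclass
    class ForegroundObject:
        object_id: int
        timestamp: float          # seconds, taken from the source video feed
        position: tuple           # (x, y, z) in the model's world coordinates
        use_avatar: bool = False  # render a computer-generated stand-in
                                  # instead of bit-mapped video data

    def objects_at(objects, display_time, tolerance=0.5):
        # Select the objects to draw at a given display time, within a
        # synchronization tolerance in seconds.
        return [o for o in objects
                if abs(o.timestamp - display_time) <= tolerance]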
In applications where individual privacy must be respected, persons appearing within a scene may be represented as computer-generated avatars so that the person's position and movement may be accurately rendered without revealing the person's face or identity. In a surveillance system, where detection of an intruder is an important function, the ability to maintain personal privacy might seem counterintuitive. However, there are many security applications where the normal building occupants do not wish to be continually watched by the security guards. The surveillance visualization system described here will accommodate this requirement. Of course, if a thief is detected within the building, the underlying video data captured from one or more cameras 30 may still be readily accessed to determine the thief's identity.
So far, the system description illustrated in
In addition to the metadata available from the cameras themselves, the surveillance and sensor network may be linked to other networked data stores and image processing engines. For example, a face recognition processing engine might be deployed on the network and configured to provide services to the cameras or camera systems, whereby facial images are compared to data banks of stored images and used to associate a person's identity with his or her facial image. Once the person's identity is known, other databases can be consulted to acquire additional information about the person.
Similarly, character recognition processing engines may be deployed, for example, to read license plate numbers and then use that information to look up information about the registered owner of the vehicle.
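The lookup chain can be sketched as follows; the stand-in for the character-recognition engine and the registry contents are fabricated for illustration only, not an actual motor vehicle bureau interface:

    def read_plate(plate_image):
        # Placeholder for a character-recognition (ANPR/OCR) engine.
        raise NotImplementedError

    def owner_of(plate_text, registry):
        # Use recognized plate text as a key into vehicle-registry records.
        return registry.get(plate_text, "unknown")

    demo_registry = {"ABC1234": "registered owner record"}   # hypothetical entry
    print(owner_of("ABC1234", demo_registry))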
All of this information comprises metadata, which may be associated with the backgrounds and foreground objects displayed within the panoramic scene generated by the surveillance visualization system. As will be discussed more fully below, this additional metadata can be mined to provide the surveillance operator with a great deal of useful information at the click of a button.
In addition to displaying scene information and metadata information in a flexible way, the surveillance visualization system is also capable of reacting to events automatically. As illustrated in
One of the very useful aspects of the surveillance visualization system is the device which we call the power lens. The power lens is a tool that provides the capability to observe and predict behavior and events within a 3D global space. The power lens allows users to define the observation scope of the lens as applied to one or more regions of interest. The lens can apply one or more criteria filters, selected from a set of analysis, scoring and query filters, for observation and prediction. The power lens provides a dynamic, interactive analysis, observation and control interface. It allows users to construct, place and observe behavior detection scenarios automatically. The power lens can dynamically configure the activation of, and linkage between, analysis nodes using a predictive model.
In a presently preferred form, the power lens comprises a graphical viewing tool that may take the form and appearance of a modified magnifying glass as illustrated at 20 in
Associated with the power lens is a query generation system that allows metadata associated with objects within the image to be filtered and the output used for data mining. In the preferred embodiment, the power lens 20 can support multiple different scoring and filter criteria functions, and these may be combined by using Boolean operators such as AND, OR and NOT. The system operator can construct his or her own queries by selecting parameters from a parameter list in an interactive dynamic query building process performed by manipulating the power lens.
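One way such Boolean composition might be implemented is as predicate combinators over object metadata, as in the following Python sketch; the metadata keys are assumptions:

    def AND(*preds): return lambda m: all(p(m) for p in preds)
    def OR(*preds):  return lambda m: any(p(m) for p in preds)
    def NOT(pred):   return lambda m: not pred(m)

    def field_equals(field, value):
        return lambda m: m.get(field) == value

    # Example query: red vehicles, or any person not wearing an RF ID badge.
    query = OR(AND(field_equals("kind", "vehicle"),
                   field_equals("dominant_color", "red")),
               AND(field_equals("kind", "person"),
                   NOT(field_equals("badged", True))))

    objects = [{"kind": "vehicle", "dominant_color": "red"},
               {"kind": "person", "badged": True}]
    print([m for m in objects if query(m)])   # only the red vehicle matches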
In
The power lens allows the user to select a query template from existing power lens query and visualization template models. These models may contain (1) applied query application domains, (2) sets of criteria parameter fields, (3) real-time mining score model and suggested threshold values, and (4) visualization models. These models can then be extended and customized to meet the needs of an application by utilizing a power lens description language, preferably in XML format. In use, the user can click or drag and drop a power lens into the panoramic video display and then use the power lens as an interface for defining queries to be applied to a region of interest and for subsequent visual display of the query results.
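A hypothetical template in such a description language, together with a Python sketch that reads it, might look as follows; all element and attribute names are invented for illustration and do not define the actual language:

    import xml.etree.ElementTree as ET

    TEMPLATE = """
    <powerlens domain="parking-surveillance">
      <criteria>
        <param field="dominant_color" op="equals" value="red"/>
        <param field="loiter_minutes" op="greater" value="10"/>
      </criteria>
      <scoring model="realtime-mining" threshold="0.8"/>
      <visualization model="3d-trajectory-map"/>
    </powerlens>
    """

    root = ET.fromstring(TEMPLATE)
    for p in root.iter("param"):          # the criteria parameter fields
        print(p.get("field"), p.get("op"), p.get("value"))
    print("threshold:", root.find("scoring").get("threshold"))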
The power lens can be applied and used between video analyzers and monitor stations. Thus, the power lens can continuously query a video analyzer's output or the output from a real-time event manager and then filter and search this input data based on predefined mining scoring or semantic relationships.
As depicted at 73, the data mining results are sent to a visual display engine so that the results can be displayed graphically, if desired. In some cases, it may be most suitable to display retrieved results in textual or tabular form. This is often most useful where the specific result is meaningful, such as the name of a recognized person. However, the visualization engine depicted at 74 is capable of producing other types of visual displays, including a variety of different graphical displays. Examples of such graphical displays include tree maps, 2D/3D scatter plots, parallel coordinates plots, landscape maps, density maps, waterfall diagrams, time wheel diagrams, map-based displays, 3D multi-comb displays, city tomography maps, information tubes and the like. In this regard, it should be appreciated that the form of display is essentially limitless. Whatever best suits the type of query being performed may be selected. Moreover, in addition to these more sophisticated graphical outputs, the visualization engine can also be used to simply provide a color or other attribute to a computer-generated avatar or other icon used to represent an object within the panoramic view. Thus, in an office building surveillance system, all building occupants possessing RF ID badges might be portrayed in one color and all other persons portrayed in a different color.
b illustrates a different view, namely, a 3D trajectory map.
For wide area surveillance monitoring or investigations, information from several regions may need to be monitored and assimilated. The surveillance visualization system permits multiple power lenses to be defined and then the results of those power lenses may be merged or fused to provide aggregate visualization information. In a presently preferred embodiment, grid nodes are employed to map relationships among different data sources, and from different power lenses.
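For illustration, if each power lens yields the set of object identifiers satisfying its query, fusion can be reduced to set operations over those results; the grid-node linkage is simplified here to its barest form, and the identifiers are hypothetical:

    lens_entrance = {101, 102, 105}       # flagged while loitering at an entrance
    lens_parking  = {102, 105, 110}       # flagged for long parking-lot dwell
    fused = lens_entrance & lens_parking  # aggregate: flagged by both regions
    print(sorted(fused))                  # [102, 105]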
Referring to
A user's query is decomposed into multiple layers of a query or mining process. In
As
The processing engines 124 include a query engine 134 that supports query statement generation and user interaction. When the user wishes to define a new query, for example, the user would communicate through the query creation user interface 126, which would in turn invoke the query engine 134.
The processing engines of the power lens also include a visualization engine 136. The visualization engine is responsible for handling visualization rendering and is also interactive. The interactive visualization user interface 128 communicates with the visualization engine to allow the user to interact with the visualized image.
The processing engines 124 also include a geometric location processing engine 138. This engine is responsible for ascertaining and manipulating the time and space attributes associated with data to be displayed in the panoramic video display and in other types of information displays. The geometric location processing engine acquires and stores location information for each object to be placed within the scene, and it also obtains and stores information to map pre-defined locations to pre-defined zones within a display. A zone might be defined to comprise a pre-determined region within the display in which certain data mining operations are relevant. For example, if the user wishes to monitor a particular entry way, the entry way might be defined as a zone and then a set of queries would be associated with that zone.
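A zone test of this kind might be sketched as a point-in-polygon check; the entry-way polygon coordinates are hypothetical, and a production engine would first apply the calibrated camera-to-world mapping:

    def in_zone(point, polygon):
        # Standard ray-casting point-in-polygon test.
        x, y = point
        inside = False
        n = len(polygon)
        for i in range(n):
            x1, y1 = polygon[i]
            x2, y2 = polygon[(i + 1) % n]
            if (y1 > y) != (y2 > y):
                if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                    inside = not inside
        return inside

    ENTRY_WAY = [(0, 0), (4, 0), (4, 2), (0, 2)]   # hypothetical zone polygon
    print(in_zone((1.5, 1.0), ENTRY_WAY))          # True: run the zone's queries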
Some of the data mining components of the flexible surveillance visualization system can involve assigning scores to certain events. A set of rules is then used to assess whether, based on the assigned scores, a certain action should be taken. In the preferred embodiment illustrated in
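A minimal sketch of such score-based assessment, assuming per-event scores and an action threshold; the weights and threshold values below are invented for illustration:

    EVENT_SCORES = {"loitering": 0.4,
                    "zone_entry_after_hours": 0.5,
                    "unrecognized_face": 0.3}

    def assess(events, threshold=0.7):
        # Sum the scores assigned to the observed events and compare the
        # total against the action threshold.
        score = sum(EVENT_SCORES.get(e, 0.0) for e in events)
        return ("raise_alert" if score >= threshold else "log_only"), score

    print(assess(["loitering", "zone_entry_after_hours"]))  # ('raise_alert', 0.9)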
Finally, the information processing engines 124 also preferably include a configuration extender module 142 that can be used to create and/or update configuration data and criteria parameter sets. Referring back to
In the preferred embodiment illustrated in
Where the results of one grid are to be used by another grid, a query fusion operation is invoked. The distributed grid node manager 120 thus supports the instantiation of one or more query fusion grids 146 to define links between nodes and to store the aggregation results. Thus, the query fusion grid 146 defines the connecting lines between mining query grids 100 of
The distributed grid node manager 120 is also responsible for controlling the mining visualization grids 102 and 104 of
In the preferred embodiment depicted in
The system depicted in
In a presently preferred embodiment, the 3D global data space includes shared data of:
Preferably, the 3D global data space may be configured to preserve privacy while allowing multiple users to share one global space of metadata and location data. Multiple users can use data from the global space to display a field of view and to display objects under surveillance within the field of view, but privacy attributes are employed to preserve privacy. Thus user A will be able to explore a given field of view, but may not be able to see certain private details within the field of view.
The presently preferred embodiment employs a privacy preservation manager to implement the privacy preservation functions. The display of objects under surveillance is mediated by a privacy preservation score, associated as part of the metadata with each object. If the privacy preservation function (PPF) score is lower than full access, the video clips of surveillance objects will either be encrypted or will include only metadata, from which the identity of the object cannot be ascertained.
The privacy preservation function may be calculated based on the following input parameters:
Preferably, the privacy preservation level is context sensitive. The privacy preservation manager can promote or demote the privacy preserving level based on the current context.
For example, users within a community may share the same global space that contains time, location, and event metadata for foreground surveillance objects such as people and cars. A security guard with full privileges can select any physical geometric field of view covered by this global space and can view all historical, current, and predictive information. A non-guard user, such as a homeowner within the community, can view people who walk into his driveway with a full video view (e.g., with the person's face visible), can view only a partial video view in the community park, and cannot view areas within other people's houses, based on his privileges and the privacy preservation function. If the context is an alarm event, such as when a person breaks into the user's house and triggers an alarm, the user is granted full viewing privileges under the privacy preservation function for tracking that person's activities, including the ability to continue viewing the person should that person run next door and then on to a public park or public road. Under this alarm context, the user has a full rendering display on the 3D GUI and full video.
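A hedged sketch of how a PPF score might gate rendering, including the context-based promotion just described; the score levels, mode names and alarm override are assumptions rather than the disclosed calculation:

    FULL_ACCESS = 1.0

    def render_mode(ppf_score, alarm_context=False):
        if alarm_context:
            ppf_score = FULL_ACCESS   # an alarm context promotes the privacy level
        if ppf_score >= FULL_ACCESS:
            return "full_video"       # raw video, identity visible
        if ppf_score >= 0.5:
            return "encrypted_clip"   # clip retained but not viewable in the clear
        return "metadata_avatar"      # position and trajectory only, no identity

    print(render_mode(0.4))                       # metadata_avatar
    print(render_mode(0.4, alarm_context=True))   # full_video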
In order to support access by a community of users, the system uses a registration system. A user wishing to utilize the surveillance visualization features of the system goes through a registration phase that confirms the user's identity and sets up the appropriate privacy attributes, so that the user will not encroach on the privacy of others. The following is a description of the user registration phase which might be utilized when implementing a community safety service whereby members of a community can use the surveillance visualization system to perform personal security functions. For example, a parent might use the system to ensure that his or her child made it home from school safely.
The architecture defined above supports collaborative use of the visualization system in at least two respects. First, users may collaborate by supplying metadata to the data store of metadata associated with objects in the scene. For example, a private citizen, looking through a wire fence, may notice that the padlock on a warehouse door has been left unlocked. That person may use the power lens to zoom in on the warehouse door and provide an annotation that the lock is not secure. A security officer having access to the same data store would then be able to see the annotation and take appropriate action.
Second, users may collaborate by specifying data mining query parameters (e.g., search criteria and threshold parameters) that can be saved in the data store and then used by other users, either as a stand-alone query or as part of a data mining grid (
For example, using the power lens or other specification tool, a first user may configure a query that will detect how long a vehicle has been parked based on its heat signature. This might be accomplished using thermal sensors and mapping the measured temperatures across a color spectrum for easy viewing. The query would receive thermal readings as input and would provide a colorized output so that each vehicle's color indicates how long the vehicle has been sitting (how long its engine has had time to cool).
A second person could use this heat signature query in a power lens to assess parking lot usage throughout the day. This might be easily accomplished by using the vehicle color spectrum values (heat signature measures) as inputs for a search query that differently marks vehicles (e.g., applies different colors) to distinguish cars that park for five to ten minutes from those that are parked all day. The query output might be a statistical report or histogram, showing aggregate parking lot usage figures. Such information might be useful in managing a shopping center parking lot, where customers are permitted to park for brief times, but employees and commuters should not be permitted to take up prime parking spaces for the entire day.
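By way of illustration, such a heat-signature query can be reduced to a mapping from measured engine temperature to a dwell-time bucket, followed by aggregation into usage figures; the cooling thresholds below are invented, not calibrated values:

    from collections import Counter

    def dwell_bucket(engine_temp_c):
        # Hotter engines imply shorter dwell times (less time to cool).
        if engine_temp_c > 60:
            return "< 10 min"
        if engine_temp_c > 35:
            return "10-60 min"
        return "> 1 hour"             # near ambient: parked most of the day

    readings = {"car_a": 72.0, "car_b": 41.5, "car_c": 21.0}   # hypothetical
    usage = Counter(dwell_bucket(t) for t in readings.values())
    print(usage)   # aggregate usage figures for a report or histogram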
From the foregoing, it should also be appreciated that the surveillance visualization system offers powerful visualization and data mining features that may be invoked by private and government security officers, as well as by individual members of a community. In the private and government security applications, the system of cameras and sensors may be deployed on a private network, preventing members of the public from gaining access. In the community service application, the network is open and members of the community are permitted to have access, subject to logon rules and applicable privacy constraints. To demonstrate the power that the surveillance visualization system offers, an example use of the system will now be described. The example features a community safety service, where the users are members of a participating community.
This example assumes a common scenario. Parents worry whether their children have gotten home from school safely. Perhaps the child must walk from the school bus to his or her home a block away. Along the way there may be many stopping-off points that may tempt the child to linger. The parent wants to know that the child went straight home and was not diverted along the way.
As the system learns the child's behavior, a trajectory path representing the “normal” return-home route is learned. This normal trajectory is then available for detecting when the child does not follow the normal route. The system learns not only the path taken, but also the time pattern. The time pattern can include both absolute time (time of day) and relative time (minutes from when the bus was detected as arriving at the stop). These time patterns are used to model the normal behavior and to detect abnormal behavior.
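One plausible realization of the time-pattern model, sketched in Python: learn the mean and spread of historical arrival offsets (minutes after the bus is detected) and flag observations outside a tolerance band. The disclosure does not fix the model, and the history values are hypothetical:

    import statistics

    history = [11.0, 12.5, 10.8, 11.9, 12.2]   # past offsets, in minutes
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)

    def is_abnormal(observed_offset, k=3.0):
        # Flag offsets more than k standard deviations from the learned norm.
        return abs(observed_offset - mean) > k * stdev

    print(is_abnormal(11.5))   # False: within the normal time pattern
    print(is_abnormal(35.0))   # True: trigger capture and alert processing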
In the event abnormal behavior is detected, the system may be configured to start capturing and analyzing data surrounding the abnormal detection event. Thus, if a child gets into a car (abnormal behavior) on the way home from school, the system can be configured to capture the image and license plate number of the car and to send an alert to the parent. The system can then also track the motion of the car and detect if it is speeding. Note that it is not necessary to wait until the child gets into a car before triggering an alarm event. If desired, the system can monitor and alert each time a car approaches the child. That way, if the child does enter the car, the system is already set to actively monitor and process the situation.
With the foregoing examples of collaborative use in mind, refer now to
Based on a set of predefined data mining and scoring processes, the data within the data store is analyzed at 202. The analysis can include preprocessing (e.g., to remove spurious outlying data and noise, supply missing values, correct inconsistent data), data integration and transformation (e.g., removing redundancies, applying weights, data smoothing, aggregating, normalizing and attribute construction), data reduction (e.g., dimensionality reduction, data cube aggregation, data compression) and the like.
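A simplified Python sketch of this preprocessing stage for scalar sensor readings, with None marking a missing value; the outlier rule (median absolute deviation) and the imputation choice (median fill) are assumptions for illustration:

    import statistics

    def preprocess(readings):
        present = [r for r in readings if r is not None]
        med = statistics.median(present)
        filled = [med if r is None else r for r in readings]   # supply missing values
        # Remove spurious outliers: beyond 3 median absolute deviations.
        mad = statistics.median(abs(r - med) for r in present)
        cleaned = [r for r in filled if abs(r - med) <= 3 * mad]
        lo, hi = min(cleaned), max(cleaned)                    # min-max normalize
        return [(r - lo) / (hi - lo) if hi > lo else 0.0 for r in cleaned]

    print(preprocess([10.0, None, 11.5, 10.8, 99.0]))   # the outlier 99.0 is dropped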
The analyzed data is then available for data mining as depicted at 204. The data mining may be performed by any authorized collaborative user, who manipulates the power lens to perform dynamic, on-demand filtering and/or correlation linking.
The results of the user's data mining are returned at 206, where they are displayed as an on-demand, multimodal visualization (shown in the portal of the power lens) together with the associated semantics that define the context of the data mining operation (shown in a call-out box associated with the power lens). The visual display is preferably superimposed on the panoramic 3D view, through which the user can move in virtual 3D space (fly in, fly through, pan, zoom, rotate). The view gives the user heightened situational awareness of past, current (real-time) and forecast (predictive) scenarios. Because the system is collaborative, many users can share information and data mining parameters; yet individual privacy is preserved because individual displayed objects are subject to privacy attributes and associated privacy rules.
While the collaborative environment can be architected in many ways, one presently preferred architecture is shown in
As illustrated, the central station terminal communicates with a computer system 216 that defines the collaborative automated surveillance operation center. This is a software system, which may run on a single computer or on a network of distributed computer systems. The system further includes a server or server system 218 that provides collaborative automated surveillance operation center services. The server communicates with and coordinates data received from the devices 214. The server 218 thus functions to harvest information received from the devices 214 and to supply that information to the mobile stations and the central station(s).