This invention relates to surveillance systems. Specifically, the invention relates to video-based human verification systems and methods.
Physical security is of critical concern in many areas of life, and video has become an important component of security over the last several decades. One problem with video as a security tool is that monitoring it is highly labor-intensive. Recently, there have been solutions to the problem of automating video monitoring in the form of intelligent video surveillance systems. Two examples of intelligent video surveillance systems are described in U.S. Pat. No. 6,696,945, titled “Video Tripwire,” and U.S. patent application Ser. No. 09/987,707, titled “Surveillance System Employing Video Primitives,” both of which are commonly owned by the assignee of the present application and incorporated herein by reference in their entirety. These systems are usually deployed on large-scale personal computer (PC) platforms with large footprints and a broad spectrum of functionality. There are applications for this technology that are not addressed by such systems, such as, for example, the monitoring of residential and light commercial properties. Such monitoring may include, for example, detecting intruders or loiterers on a particular property.
Typical security monitoring systems for residential and light commercial properties may consist of a series of low-cost sensors that detect specific things such as motion, smoke/fire, glass breaking, door/window opening, and so forth. Alarms from these sensors may be situated at a central control panel, usually located on the premises. The control panel may communicate with a central monitoring location via a phone line or other communication channel. Conventional sensors, however, have a number of disadvantages. For example, many sensors cannot discriminate between triggering objects of interest, such as a human, and those not of interest, such as a dog. Thus, false alarms can be one problem with prior art systems. The cost of such false alarms can be quite high. Typically, alarms might be handled by local law enforcement personnel or a private security service. In either case, dispatching human responders when there is no actual security breach can be a waste of time and money.
Conventional video surveillance systems are also in common use today and are, for example, prevalent in stores, banks, and many other establishments. Video surveillance systems generally involve the use of one or more video cameras trained on a specific area to be observed. The video output from the video camera or video cameras is either recorded for later review or is monitored by a human observer, or both. In operation, the video camera generates video signals, which are transmitted over a communications medium to one or both of a visual display device and a recording device.
In contrast with conventional sensors, video surveillance systems allow differentiation between objects of interest and objects not of interest (e.g., differentiating between people and animals). However, a high degree of human intervention is generally required in order to extract such information from the video. That is, someone must either be watching the video as the video is generated or later reviewing stored video. This intensive human interaction can delay an alarm and/or any response by human responders.
In view of the above, it would be advantageous to have a video-based human verification system that can verify the presence of a human in a given scene. The system may, in addition, be able to provide alerts based on other situations, such as the presence of a non-human object (e.g., a vehicle, a house pet, or a moving inanimate object (e.g., curtains blowing in the wind)) or the presence of any motion at all. In an exemplary embodiment, the video-based human verification system may include a video sensor adapted to capture video and produce video output. The video sensor may include a video camera. The video-based human verification system may further include a processor adapted to process video to verify the presence of a human. An alarm processing device may be coupled to the video sensor by a communication channel and may be adapted to receive at least video output through the communication channel.
In an exemplary embodiment, the processor may be included on the video sensor. The video sensor may be adapted to transmit alert information and/or video output in the form of, for example, a data packet or a dry contact closure, to the alarm processing device if the presence of a human, a non-human, or any motion at all is verified. The alarm processing device or a central monitoring center interface device may be adapted to transmit at least a verified human alarm to a central monitoring center and may also be adapted to transmit at least the video output to the central monitoring center. The alarm, optionally along with associated video and/or imagery, may also be sent directly to the property owner via a remote access web-page or via a wireless alarm receiving device.
In an exemplary embodiment, the processor may be included on the alarm processing device. The alarm processing device or interface device may be adapted to receive video output from the video sensor. The alarm processing device or the central monitoring center interface device may be further adapted to transmit alert information and/or video output to the central monitoring center if the presence of a human, a non-human, or any motion at all is verified. The alarm processing device or the central monitoring center interface device may also transmit the alarm, and optionally associated video and/or imagery, directly to the property owner via a remote access web-page or via a wireless alarm receiving device.
In an exemplary embodiment, the processor may be included at the central monitoring center. The alarm processing device or the central monitoring center interface device may be adapted to receive video output from the video sensor and may further be adapted to retransmit the video output to the central monitoring center where the presence of a human, a non-human, or any motion at all may be verified.
Further objectives and advantages will become apparent from a consideration of the description, drawings, and examples.
In describing the invention, the following definitions are applicable throughout (including above).
A “computer” may refer to one or more apparatus and/or one or more systems that are capable of accepting a structured input, processing the structured input according to prescribed rules, and producing results of the processing as output. Examples of a computer may include: a computer; a stationary and/or portable computer; a computer having a single processor or multiple processors, which may operate in parallel and/or not in parallel; a general purpose computer; a supercomputer; a mainframe; a super mini-computer; a mini-computer; a workstation; a micro-computer; a server; a client; an interactive television; a web appliance; a telecommunications device with internet access; a hybrid combination of a computer and an interactive television; a portable computer; a personal digital assistant (PDA); a portable telephone; application-specific hardware to emulate a computer and/or software, such as, for example, a digital signal processor (DSP) or a field-programmable gate array (FPGA); a distributed computer system for processing information via computer systems linked by a network; two or more computer systems connected together via a network for transmitting or receiving information between the computer systems; and one or more apparatus and/or one or more systems that may accept data, may process data in accordance with one or more stored software programs, may generate results, and typically may include input, output, storage, arithmetic, logic, and control units.
“Software” may refer to prescribed rules to operate a computer. Examples of software may include software; code segments; instructions; computer programs; and programmed logic.
A “computer system” may refer to a system having a computer, where the computer may include a computer-readable medium embodying software to operate the computer.
A “network” may refer to a number of computers and associated devices that may be connected by communication facilities. A network may involve permanent connections such as cables or temporary connections such as those made through telephone or other communication links. Examples of a network may include: an internet, such as the Internet; an intranet; a local area network (LAN); a wide area network (WAN); and a combination of networks, such as an internet and an intranet.
“Video” may refer to motion pictures represented in analog and/or digital form. Examples of video may include television, movies, image sequences from a camera or other observer, and computer-generated image sequences. Video may be obtained from, for example, a live feed, a storage device, an IEEE 1394-based interface, a video digitizer, a computer graphics engine, or a network connection.
A “video camera” may refer to an apparatus for visual recording. Examples of a video camera may include one or more of the following: a video imager and lens apparatus; a video camera; a digital video camera; a color camera; a monochrome camera; a camera; a camcorder; a PC camera; a webcam; an infrared (IR) video camera; a low-light video camera; a thermal video camera; a closed-circuit television (CCTV) camera; a pan, tilt, zoom (PTZ) camera; and a video sensing device. A video camera may be positioned to perform surveillance of an area of interest.
“Video processing” may refer to any manipulation of video, including, for example, compression and editing.
A “frame” may refer to a particular image or other discrete unit within a video.
The foregoing and other features and advantages of the invention will be apparent from the following, more particular description of exemplary embodiments of the invention, as illustrated in the accompanying drawings wherein like reference numerals generally indicate identical, functionally similar, and/or structurally similar elements. The left-most digits in the corresponding reference numerals indicate the drawing in which an element first appears.
Exemplary embodiments of the invention are discussed in detail below. While specific exemplary embodiments are discussed, it should be understood that this is done for illustration purposes only. A person skilled in the relevant art will recognize that other components and configurations can be used without departing from the spirit and scope of the invention.
The video sensor 101 may include an infrared (IR) video camera 102, an associated IR illumination source 103, and a processor 104. The IR illumination source 103 may illuminate an area so that the IR video camera 102 may obtain video of the area. The processor 104 may be capable of receiving and/or digitizing video provided by the IR video camera 102, analyzing the video for the presence of humans, non-humans, or any motion at all, and controlling communications with the alarm processing device 111. The video sensor 101 may also include a programming interface (not shown) and communication hardware (not shown) capable of communicating with the alarm processing device 111 via communication channel 105. The processor 104 may be, for example: a digital signal processor (DSP), a general purpose processor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a programmable device.
The human (or other object) verification technology employed by the processor 104 that may be used to verify the presence of a human, a non-human, and/or any motion at all in a scene may be the computer-based object detection, tracking, and classification technology described in, for example, the following, all of which are incorporated by reference herein in their entirety: U.S. Pat. No. 6,696,945, titled “Video Tripwire”; U.S. patent application Ser. No. 09/987,707, titled “Surveillance System Employing Video Primitives”; and U.S. patent application Ser. No. 11/139,986, titled “Human Detection and Tracking for Security Applications.” Alternatively, the human verification technology that is used to verify the presence of a human in a scene may be any other human detection and recognition technology that is available in the literature or is known to one sufficiently skilled in the art of computer-based human verification technology.
The communication channel 105 may be, for example: a computer serial interface such as recommended standard 232 (RS232); a twisted-pair modem line; a universal serial bus connection (USB); an Internet protocol (IP) network managed over category 5 unshielded twisted pair network cable (CAT5), fibre, wireless fidelity network (WiFi), or power line network (PLN); a global system for mobile communications (GSM), a general packet radio service (GPRS) or other wireless data standard; or any other communication channel capable of transmitting a data packet containing at least one video image.
The alarm processing device 111 may be, for example, an alarm panel or other associated hardware device (e.g., a set-top box, a digital video recorder (DVR), a personal computer (PC), a residential router, a custom device, a computer, or other processing device (e.g., a Slingbox by Sling Media, Inc. of San Mateo, Calif.)) for use in the system. The alarm processing device 111 may be capable of receiving alert information from the video sensor 101 in the form of, for example, a dry contact closure or a data packet including, for example: alert time, location, video sensor information, and at least one image or video frame depicting the human in the scene. The alarm processing device 111 may further be capable of retransmitting the data packet to the CMC 113 via connection 112. Examples of the connection 112 may include: a plain old telephone system (POTS), a digital service line (DSL), a broadband connection or a wireless connection.
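The alert data packet described above can be sketched as a simple structure. This is an illustrative sketch only; the field names (`alert_time`, `location`, `sensor_id`, `frames`) are assumptions, not names taken from the specification.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class AlertPacket:
    """Illustrative alert packet; field names are hypothetical."""
    alert_time: datetime                    # when the alert was raised
    location: str                           # premises or zone identifier
    sensor_id: str                          # which video sensor raised the alert
    frames: List[bytes] = field(default_factory=list)  # image/video frames

    def is_valid(self) -> bool:
        # The packet must carry at least one image or video frame
        # depicting the human in the scene.
        return len(self.frames) >= 1

# Example packet as it might be retransmitted to the CMC:
packet = AlertPacket(datetime(2006, 7, 13, 2, 15), "front door", "sensor-101",
                     frames=[b"jpeg-bytes"])
```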
The CMC 113 may be capable of receiving alert information in the form of a data packet that may be retransmitted from the alarm processing device 111 via the connection 112. The CMC 113 may further allow the at least one image or video frame depicting the human in the scene to be viewed and may dispatch human responders.
The video-based human verification system 100 may also include other sensors, such as dry contact sensors and/or manual triggers, coupled to the alarm processing device 111 via a dry contact connection 106. Examples of dry contact sensors and/or manual triggers may include: a door/window contact sensor 107, a glass-break sensor 108, a passive infrared (PIR) sensor 109, an alarm keypad 110, or any other motion or detection sensor capable of activating the video sensor 101. A strobe and/or a siren (not shown) may also be coupled to the alarm processing device 111 or to the video sensor 101 via the dry contact connection 106 as an output for indicating a human presence once such presence is verified. The dry contact connection 106 may be, for example: a standard 12 volt direct current (DC) connection, a 5 volt DC solenoid, a transistor-transistor logic (TTL) dry contact switch, or another known dry contact switch.
In an exemplary embodiment, the dry contact sensors, such as, for example, the PIR sensor 109 or other motion or detection sensor, may be connected to the alarm processing device 111 via the dry contact connection 106 and may be capable of detecting the presence of a moving object in the scene. The video sensor 101 may only be employed to verify that the moving object is actually human. That is, the video sensor 101 may not be operating (to save processing power) until it is activated by the PIR sensor 109 through the alarm processing device 111 and communication channel 105. As an option, at least one dry contact sensor or manual trigger may also trigger the video sensor 101 via a dry contact connection 106 directly connected (not shown) to the video sensor 101. The IR illumination source 103 may also be activated by the PIR sensor 109 or other dry contact sensor. In another exemplary embodiment, the video sensor 101 may be continually active.
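The trigger-activated mode described above can be sketched as follows. The class and method names here are hypothetical, not taken from the specification; the sketch only illustrates the power-saving behavior in which the video sensor stays idle until a dry contact sensor such as the PIR activates it.

```python
class VideoSensor:
    """Sketch of a trigger-activated video sensor (names are hypothetical)."""
    def __init__(self, always_on=False):
        # In power-saving mode the sensor does not process video until
        # activated; setting always_on models the continually-active mode.
        self.active = always_on

    def activate(self):
        # Called via the alarm processing device or a direct dry contact.
        self.active = True

    def verify(self, frame):
        if not self.active:
            return "idle"          # not processing, saving power
        return "human" if frame.get("human") else "no-human"

def on_pir_motion(sensor):
    # The PIR detects a moving object; the video sensor is woken only to
    # verify that the moving object is actually human.
    sensor.activate()
```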
The video capturer 315 of the video sensor 101 may capture video from the IR video camera 102. The video capturer 315 of the video sensor 201 may capture video from the low-light video camera 202. In either case, the video may then be encoded with the video encoder 316 and may also be processed by the processor 104. The processor 104 may include a content analyzer 317 to analyze the video content and may further include a thin activity inference engine 318 to verify the presence of a human, a non-human, and/or any motion at all in the video (see, e.g., U.S. patent application Ser. No. 09/987,707, titled “Surveillance System Employing Video Primitives”).
In an exemplary embodiment, the content analyzer 317 models the environment, filters out background noise, detects, tracks, and classifies the moving objects, and the thin activity inference engine 318 determines that one of the objects in the scene is, in fact, a human, a non-human, and/or any motion at all, and that this object is in an area where a human, a non-human, or motion should not be.
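The two-stage split between the content analyzer and the thin activity inference engine can be sketched as below. This is a highly simplified illustration under stated assumptions: real detections come from background modeling, tracking, and classification, whereas here each detection is assumed to already carry a class label and a position, and the "forbidden area" is a simple axis-aligned rectangle.

```python
def content_analyzer(detections):
    """Sketch of stage one: keep classified foreground objects, drop noise.
    Class labels here are illustrative assumptions."""
    return [d for d in detections if d["class"] in ("human", "vehicle", "animal")]

def activity_inference(objects, restricted_area):
    """Sketch of the thin inference step: is a human inside an area
    where a human should not be? restricted_area = (x0, y0, x1, y1)."""
    def inside(pos, area):
        (x, y), (x0, y0, x1, y1) = pos, area
        return x0 <= x <= x1 and y0 <= y <= y1
    return any(o["class"] == "human" and inside(o["pos"], restricted_area)
               for o in objects)
```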
The programming interface 320 may control functions such as, for example, parameter configuration, human verification rule configuration, a stand-alone mode, and/or video camera calibration and/or setup to configure the camera for a particular scene. The programming interface 320 may support parameter configuration to allow parameters for a particular scene to be employed. Parameters for a particular scene may include, for example: no parameters; parameters describing a scene (indoor, outdoor, trees, water, pavement); parameters describing a video camera (black and white, color, omni-directional, infrared); and parameters describing a human verification algorithm (for example, various detection thresholds, tracking parameters, etc.). The programming interface 320 may also support a human verification rule configuration. Human verification rule configuration information may include, for example: no rule configuration; an area of interest for human detection and/or verification; a tripwire over which a human must walk before he/she is detected; one or more filters that depict minimum and maximum sizes of human objects in the view of the video camera; and one or more filters that depict human shapes in the view of the video camera. Similarly, the programming interface 320 may also support a non-human and/or a motion verification rule configuration. Non-human and/or motion verification rule configuration information may include, for example: no rule configuration; an area of interest for non-human and/or motion detection and/or verification; a tripwire over which a non-human must cross before detection; a tripwire over which motion must be detected; and one or more filters that depict minimum and maximum sizes of non-human objects in the view of the video camera. The programming interface 320 may further support a stand-alone mode.
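A rule configuration of the kind listed above might be represented as a simple structure. The field names and values here are illustrative assumptions, not the specification's own format; the sketch shows an area-of-interest polygon, a tripwire segment, and a size filter, together with the check the size filter implies.

```python
# Hypothetical rule configuration mirroring the options listed above.
rule_config = {
    "area_of_interest": [(10, 10), (300, 10), (300, 200), (10, 200)],  # polygon
    "tripwire": ((50, 150), (250, 150)),  # segment a human must cross
    "size_filter": {"min_pixels": 400, "max_pixels": 40000},  # human sizes
}

def passes_size_filter(blob_pixels, cfg):
    """Does a detected object's pixel area fall within the configured
    minimum and maximum sizes of human objects in the camera view?"""
    f = cfg["size_filter"]
    return f["min_pixels"] <= blob_pixels <= f["max_pixels"]
```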
In the stand-alone mode, the system may detect and verify the presence of a human without any explicit calibration, parameter configuration, or rule set-up. The programming interface 320 may additionally support video camera calibration and/or setup to configure the camera for a particular scene. Examples of camera calibration include: no calibration; self-calibration (for example,
The video sensor data packet interface 319 may receive encoded video output from the video encoder 316 as well as data packet output from the processor 104. The video sensor data packet interface 319 may be connected to and may transmit data packet output to the alarm processing device 111 via communication channel 105.
The software architecture of the alarm processing device 111 may include a data packet interface 321, a dry contact interface 322, an alarm generator 323, and a communication interface 324 and may further be capable of communicating with the CMC 113 via the connection 112. The dry contact interface 322 may be adapted to receive output from one or more dry contact sensors (e.g., the PIR sensor 109) and/or one or more manual triggers (e.g., the alarm keypad 110), for example, in order to activate the video sensor 101 and/or video sensor 201 via the communication channel 105. The alarm processing device data packet interface 321 may receive the data packet from the video sensor data packet interface 319 via communication channel 105. The alarm generator 323 may generate an alarm in the event that the data packet output transmitted to the alarm processing device data packet interface 321 includes a verification that a human is present. The communication interface 324 may transmit at least the video output to the CMC 113 via the connection 112. The communication interface 324 may further transmit an alarm signal generated by the alarm generator 323 to the CMC 113.
The video capturer 315 of the “dumb” video sensor 401 may capture video from the IR video camera 102. The video capturer 315 of the “dumb” video sensor 501 may capture video from the low-light video camera 202. In either case, the video may then be encoded with the video encoder 316 and output from a video streaming interface 625 to the alarm processing device 411 via communication channel 405.
The software architecture of the alarm processing device 411 may include the dry contact interface 322, a control logic 626, a video decoder/capturer 627, the processor 104, the programming interface 320, the alarm generator 323, and the communication interface 324. The dry contact interface 322 may be adapted to receive output from one or more dry contact sensors (e.g., the PIR sensor 109) and/or one or more manual triggers (e.g., the alarm keypad 110), for example, in order to activate the video sensor 401 and/or video sensor 501 via the communication channel 405. In a system having multiple video sensors 401, the dry contact output may pass to control logic 626. The control logic 626 determines from which video source and over which time range to retrieve video. For example, for a system with twenty non-video sensors and five partially overlapping video sensors 401 and/or 501, the control logic 626 determines which video sensors 401 and/or 501 are looking at the same area as which non-video sensors. The alarm processing device video decoder/capturer 627 may capture and decode the video output received from the video sensor video streaming interface 625 via communication channel 405. The alarm processing device video decoder/capturer 627 may also receive output from the control logic 626. The video decoder/capturer 627 may then output the video to the processor 104 for processing.
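The control logic's sensor-to-camera mapping can be sketched as a lookup table plus a time window. The sensor and camera names, and the pre/post-roll parameters, are illustrative assumptions; the specification leaves the mapping and timing unspecified.

```python
# Hypothetical mapping from non-video sensors to the video sensors whose
# fields of view cover the same area (overlapping views share cameras).
COVERAGE = {
    "pir-kitchen":  ["cam-1"],
    "door-front":   ["cam-2", "cam-3"],  # two overlapping views of the entrance
    "glass-lounge": ["cam-3"],
}

def select_video_sources(sensor_id, trigger_time, pre_roll=5.0, post_roll=10.0):
    """Sketch of the control logic: given a triggered non-video sensor,
    return the cameras covering its area and the time range of interest."""
    cameras = COVERAGE.get(sensor_id, [])
    return cameras, (trigger_time - pre_roll, trigger_time + post_roll)
```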
The software architecture for the video-based human verification system with centralized processing as shown in
The video sensor data may also be shared, for example, wirelessly with the residential or commercial customer by using the home computer 932 as a server to transmit the video sensor data from the video-based human verification system 900 to one or more wireless receiving devices 934 via one or more wireless connections 933. The wireless receiving device 934 may be, for example: a computer wirelessly connected to the Internet, a laptop wirelessly connected to the Internet, a wireless PDA, a cell phone, a Blackberry, a pager, a text messaging receiving device, or any other computing device wirelessly connected to the Internet via a virtual private network (VPN) or other secure wireless connection.
In another embodiment, data may be shared by the customer through the CMC 113. The CMC 113 may host a web-service through which subscribers may view alerts through web-pages. Alternatively, or in addition, the CMC 113 may broadcast alerts to customers via wireless alarm receiving devices. Examples of such wireless alarm receiving devices include: a cell phone, a portable laptop, a PDA, a text message receiving device, a pager, a device able to receive an email, or other wireless data receiving device.
In summary, an alarm, along with optional video and/or imagery, may be provided to the customer in a number of ways. For example, first, a home PC may host a web page for posting an alarm, along with optional video and/or imagery. Second, a home PC may provide an alarm, along with optional video and/or imagery, to a wireless receiving device. Third, a CMC may host a web page for posting an alarm, along with optional video and/or imagery. Fourth, a CMC may provide an alarm, along with optional video and/or imagery, to a wireless receiving device.
There may be three modes of operation for the obfuscation module. In a first obfuscation mode, the obfuscation technology may be on all the time. In this mode, the appearance of any human and/or their faces may be obfuscated in all imagery generated by the system. In a second obfuscation mode, the appearance of non-violators and/or their faces may be obfuscated in imagery generated by the system. In this mode, any detected violators (i.e., unknown humans) may not be obscured. In a third obfuscation mode, all humans in the view of the video camera may be obfuscated until a user specifies which humans to reveal. In this mode, once the user specifies which humans to reveal, the system may turn off obfuscation for those individuals.
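The three obfuscation modes can be sketched as a simple dispatch. The mode constants and the `person` dictionary keys (`id`, `violator`) are hypothetical; the sketch only captures the decision of whether a given detected person's face is obscured.

```python
# The three obfuscation modes described above (constants are illustrative).
ALWAYS_ON, NON_VIOLATORS_ONLY, USER_REVEAL = 1, 2, 3

def should_obfuscate(mode, person, revealed=frozenset()):
    """Decide whether a detected person's face is obscured.
    `person` is a dict with hypothetical keys 'id' and 'violator';
    `revealed` is the set of ids the user has chosen to reveal."""
    if mode == ALWAYS_ON:
        return True                          # every human/face is obscured
    if mode == NON_VIOLATORS_ONLY:
        return not person["violator"]        # detected violators stay visible
    if mode == USER_REVEAL:
        return person["id"] not in revealed  # blurred until the user reveals
    raise ValueError("unknown obfuscation mode")
```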
In addition to obfuscating face images, it might be desirable to extract a “best face” image from the video. To achieve this, human head detection and “best face” detection may be added to the system. One technique for human head detection (as well as face detection) is discussed in, for example, U.S. patent application Ser. No. 11/139,986, titled “Human Detection and Tracking for Security Applications,” which is incorporated by reference in its entirety.
One technique for “best face” detection is as follows. Once a face has been successfully detected in the frame with the human head detection, a best shot analysis is performed on each frame with the detected face. The best shot analysis computes, for example, a weighted best shot score based on the following exemplary metrics: face size and skin tone ratio. With the face size metric, a large face region implies more pixels on the face, and a frame with a larger face region receives a higher score. With the skin tone ratio metric, the quality of the face shot is directly proportional to the percentage of skin-tone pixels in the face region, and a frame with a higher percentage of skin-tone pixels in the face region receives a higher score. The appropriate weighting of the metrics may be determined by testing on a generic test data set or an available test data set for the scene under consideration. The frame with the highest best shot score is determined to contain the best face.
As an alternative to the various exemplary embodiments of the invention, the system may include one or more video sensors.
As an alternative to the various exemplary embodiments of the invention, the video sensors 101, 201, 401, or 501 may communicate with an interface device instead of or in addition to communicating with the alarm processing device 111 or 411. This alternative may be useful in fitting the invention to an existing alarm system. The video sensor 101, 201, 401, or 501 may transmit video output and/or alert information to the interface device. The interface device may communicate with the CMC 113. The interface device may transmit video output and/or alert information to the CMC 113. As an option, if the video sensor 101 or 201 does not include the processor 104, the interface device or the CMC 113 may include the processor 104.
As an alternative to the various exemplary embodiments, the video sensors 101, 201, 401, or 501 may communicate with an alarm processing device 111 or 411 via a connection with a dry contact switch.
The various exemplary embodiments of the invention have been described as including an IR video camera 102 or a low-light video camera 202. Other types and combinations of video cameras may be used with the invention as will become apparent to those skilled in the art.
The exemplary embodiments and examples discussed herein are non-limiting examples.
The embodiments illustrated and discussed in this specification are intended only to teach those skilled in the art the best way known to the inventors to make and use the invention. Nothing in this specification should be considered as limiting the scope of the present invention. The above-described embodiments of the invention may be modified or varied, and elements added or omitted, without departing from the invention, as appreciated by those skilled in the art in light of the above teachings. It is therefore to be understood that, within the scope of the claims and their equivalents, the invention may be practiced otherwise than as specifically described.
This application claims priority to U.S. patent application Ser. No. 11/139,972, filed on May 31, 2005, titled “Video-Based Human Verification System and Method,” and U.S. Provisional Patent Application No. 60/672,525, filed on Apr. 19, 2005, titled “Human Verification Sensor for Residential and Light Commercial Applications,” both commonly-assigned, and both of which are incorporated herein by reference in their entirety.
Related U.S. Application Data

Provisional application:
- No. 60/672,525, filed Apr. 2005 (US)

Continuation data:
- Parent: Ser. No. 11/139,972, filed May 2005 (US)
- Child: Ser. No. 11/486,057, filed Jul. 2006 (US)