The present invention relates generally to a system and method for surveilling a scene, and, in particular embodiments, to a system and method for surveilling a scene comprising an allowed region and a restricted region.
Surveillance systems comprising a visual image sensor and a thermal image sensor are known.
In accordance with an embodiment of the present invention, a surveillance system comprises a visual sensor configured to capture a visual image of a scene, a thermal sensor configured to capture a thermal image of the scene and a distance measuring sensor configured to capture a distance image of the scene, the scene comprising an allowed region and a restricted region. The system further comprises a processor configured to generate a combined image based on the visual image, the thermal image and the distance image, wherein an object in the scene is displayed as a representation in a visual image when in the allowed region and displayed as a representation in a thermal image when in the restricted region.
In accordance with another embodiment of the present invention, a method for surveilling a scene having an allowed region and a restricted region comprises capturing a visual image of a scene, capturing a thermal image of the scene, and capturing a distance image of the scene, the scene comprising an allowed region and a restricted region. The method further comprises generating a combined image based on the visual image, the thermal image and the distance image, wherein an object in the scene is displayed as a representation in a visual image when in the allowed region and displayed as a representation in a thermal image when in the restricted region.
In accordance with yet another embodiment of the present invention, a camera comprises a processor and a computer readable storage medium storing programming for execution by the processor. The programming includes instructions to capture a visual image of a scene, capture a thermal image of the scene and capture a distance image of the scene, the scene comprising an allowed region and a restricted region. The programming further includes instructions to generate a combined image based on the visual image, the thermal image and the distance image, wherein an object in the scene is displayed as a representation in a visual image when in the allowed region and displayed as a representation in a thermal image when in the restricted region.
For a more complete understanding of the present invention, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
Video surveillance systems that monitor private and public properties may be in tension between security needs and general personal rights. This is especially true for surveillance systems that are located on a private property but capture not only activities on the private property but also activities on a neighboring property such as public land. For example, cameras that surveil the perimeter of the private property may inevitably surveil the border area and the neighboring property. The video surveillance system can restrict the capturing or the displaying of scenes outside the private property by masking off activities outside of the private property. For example, the viewing angle of the cameras can be restricted by mechanical apertures or lens covers. Alternatively, areas of the displayed image can be darkened or blackened.
Embodiments of the invention provide a surveillance system comprising a visual sensor, a thermal sensor and a distance measuring sensor. The images of a scene captured by the visual sensor and the thermal sensor may be assembled to form a combined image with input from the distance measuring sensor. The combined image may be masked to reflect an allowed region and a restricted region of the scene. The distance measuring sensor may be a three-dimensional measurement sensor (3D sensor). The distance measuring sensor is able to determine whether an object moving through the scene is moving within the allowed region, moving within the restricted region or moving between the allowed and restricted regions. The object here may include subjects such as people or animals and movable objects such as vehicles or other movable devices. The distance measuring sensor is able to detect an object and determine a three-dimensional coordinate set for it in order to decide whether the object or subject is within or outside the perimeter.
In various embodiments the surveillance camera(s) 150 are located at or near the building 110 and surveil the border 140 of the property where the building 110 is located. The surveillance camera(s) 150 face the inside area 120 and the outside area 130. In an embodiment the surveillance camera(s) 150 face the inside and outside areas 120, 130 at a substantially orthogonal angle in a horizontal plane parallel to the ground. The surveillance camera(s) may face the inside and outside areas 120, 130 at a different angle in other embodiments. The camera(s) 150 may face the inside and outside areas (allowed and restricted regions) 120, 130 in their respective fields of view (see discussion later). A top view of the surveillance location is shown in
The three different sensors (visual image sensor, thermal image sensor and distance measuring sensor) may be located within one single housing (a single camera) or may be located in two or more different housings (several cameras). For example, a visual image sensor and a thermal image sensor may be located in a single housing while the distance measuring sensor is located in a separate housing.
The visual camera comprises a visual image sensor, i.e., a “normal” image sensor. The visual image sensor produces an image that is similar to what is seen by the human eye. The visual image sensor may be configured to receive and process signals in the visible spectrum of light, such as between 390 nm and 700 nm. The visual image sensor may be a CCD sensor or a CMOS sensor. The visual camera may be a video camera. The visual image sensor could be a color image sensor, a color-independent intensity image sensor or a grayscale sensor.
The thermal camera comprises a thermal image sensor. The thermal image sensor (such as a microbolometer) is sensitive to radiation in the infrared spectrum and produces a thermal image, or thermogram, showing the heat radiated by different objects. The thermal image sensor may be configured to receive signals in the infrared spectrum, i.e., infrared radiation in the spectral range between about 3 μm and 15 μm (mid infrared) or between about 15 μm and 1 mm (far infrared). Images captured by the thermal camera may not infringe on the privacy of third parties. The captured images of the thermal camera allow detection and classification of objects into broad categories such as humans, animals, vehicles, etc. However, these sensors do not allow the identification of individuals. In other words, the thermal sensor makes it possible to capture that something is happening and what is happening, but does not make it possible to identify the object (person) involved. Moreover, the thermal camera can “see” in total darkness without any lighting.
The distance measuring device may comprise a distance measuring sensor. The distance measuring sensor may be a 3D sensor or a sensor that is configured to capture depth data (3D data, or a depth image in the case of a depth camera). The distance measuring device is configured to determine whether an object is within a perimeter or outside that perimeter. For example, a distance measuring device such as a depth camera (especially a time-of-flight camera) can incorporate additional imaging sensors to generate a thermal or visual image of the scene in addition to the depth image.
The three dimensions at each pixel in a depth image of a scene correspond to the x and y coordinates in the image plane, and a z coordinate that represents the depth (or distance) of that physical point from the distance measuring sensor. Examples of depth sensors/cameras include stereoscopic sensors/cameras, structured light sensors/cameras, and time-of-flight (TOF) sensors/cameras. A stereoscopic sensor/camera performs stereo imaging in which 2D images from two (or more) passive image sensors (e.g., visual image sensors) are used to determine a depth image from disparity measurements between the two 2D images. A structured light sensor/camera projects a known pattern of light onto a scene and analyzes the deformation of the pattern from striking the surfaces of objects in the scene to determine the depth. A TOF sensor/camera emits light or laser pulses into the scene and measures the time between an emitted light pulse and the corresponding incoming light pulse to determine scene depth. Other 3D imaging technologies may also be used to gather depth data of a scene. For example, a LiDAR (Light Detection And Ranging) sensor/camera emits light to scan the scene and calculates distances by measuring the time for a signal to return from an object hit by the emitted light. By taking into account the angle of the emitted light, relative (x, y, z) coordinates of the object with respect to the LiDAR sensor can be calculated and provided, representing the 3D data of the object. If the specific location of the LiDAR sensor (on the property) is known, absolute (x, y, z) coordinates can be calculated.
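As a rough illustration of the geometry described above, the following Python sketch converts a single LiDAR return (a measured range plus the azimuth and elevation of the emitted beam) into relative (x, y, z) coordinates, and into absolute coordinates when the sensor's position on the property is known. The function name, angle convention and numbers are illustrative assumptions, not part of the described system.

```python
import math

def lidar_point(distance_m, azimuth_deg, elevation_deg, sensor_origin=(0.0, 0.0, 0.0)):
    """Convert one LiDAR return (range plus beam angles) into (x, y, z) coordinates.

    The coordinates are relative to the sensor; adding a known sensor position
    (sensor_origin) yields absolute coordinates on the property.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Spherical-to-Cartesian conversion for a single emitted beam.
    x = distance_m * math.cos(el) * math.cos(az)
    y = distance_m * math.cos(el) * math.sin(az)
    z = distance_m * math.sin(el)
    ox, oy, oz = sensor_origin
    return (x + ox, y + oy, z + oz)

# Example: a return at 12.5 m, 30 degrees azimuth, 5 degrees elevation,
# from a sensor mounted at a (hypothetical) position 3 m above the ground.
print(lidar_point(12.5, 30.0, 5.0, sensor_origin=(2.0, 0.0, 3.0)))
```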
A camera (the housing) may not only include the image, thermal or measurement sensors but may also include other sensing components (such as an alarm sensor), optical components or equipment (such as lenses) and further electronics to produce images or to transmit (image) data or signals. For example, to minimize deviation, the sensors in a single camera could gather electromagnetic radiation from a common optical path that is split by a mirror, prism or lens before entering the sensors.
In order to produce images of the same view of a scene, the different sensors or cameras may be placed in close proximity to each other (distance up to 50 cm or up to 3 meters). However, in other embodiments the different cameras or sensors could be placed in different locations as long as they cover the same scene.
In some embodiments the field of view (mainly in the vertical direction) of the 3D sensor may be a limiting factor. In alternative embodiments the field of view of the thermal sensor may be the limiting factor.
In a first step 410 the sensors are mechanically installed to cover a scene or a region of interest. This means that the visual and thermal sensors and the distance measuring sensor (3D sensor) are coordinated and adjusted with respect to each other. If the units are separate, they must be aligned when installed so that they provide the best possible and most suitable coverage of the scene. As mentioned above, the unit with the smallest field of view (angle) is the limiting factor. This is often the distance measuring device (e.g., 3D sensor). According to an embodiment,
In a second step 420, the sensors are calibrated. The sensors are calibrated for reliable functioning of the surveillance system. According to embodiments, the sensors are calibrated (and a three-dimensional image is constructed) by assigning measurement points of the distance measuring device (e.g., 3D sensor) to visual image pixels and thermal image pixels. In other words, the pixels of the captured 3D image (e.g., measurement points, spatial positions, or (x, y, z) space coordinates) are assigned to the pixels of the captured image(s) of the visual image sensor and to the pixels of the captured image(s) of the thermal sensor. The pixels of the various captured images must be known in order to correctly assign or map them to each other. In various embodiments, the pixels of the 3D image (e.g., (x, y, z) space coordinates) are uniquely mapped to the pixels of the thermal image and the visual image. In various embodiments, each identified spatial position is mapped to a pixel(s) in the thermal image and a pixel(s) in the color image: (x, y, z)→pixel thermal image (xt, yt) and (x, y, z)→pixel color image (xv, yv).
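One conventional way to realize such an assignment, offered here only as a hedged sketch, is to project each (x, y, z) measurement point into the image plane of the thermal sensor and of the visual sensor using a pinhole camera model. The intrinsic matrices, rotation/translation values and function names below are hypothetical placeholders; the embodiments above instead determine the mapping from sampled calibration points, without prescribing a specific projection model.

```python
import numpy as np

def project_point(point_xyz, K, R, t):
    """Project a 3D measurement point (x, y, z) into a sensor's image plane.

    K is the 3x3 intrinsic matrix; R and t rotate/translate the point from the
    distance sensor's coordinate frame into the image sensor's frame.
    """
    p_cam = R @ np.asarray(point_xyz, dtype=float) + t
    uvw = K @ p_cam
    return uvw[0] / uvw[2], uvw[1] / uvw[2]   # pixel coordinates (u, v)

# Hypothetical calibration data for the thermal and the visual (color) sensor.
K_thermal = np.array([[200.0, 0.0, 160.0], [0.0, 200.0, 120.0], [0.0, 0.0, 1.0]])
K_visual  = np.array([[1000.0, 0.0, 960.0], [0.0, 1000.0, 540.0], [0.0, 0.0, 1.0]])
R_ident, t_zero = np.eye(3), np.zeros(3)

xyz = (1.2, 0.4, 8.0)                       # a measurement point in metres
xt, yt = project_point(xyz, K_thermal, R_ident, t_zero)   # pixel in thermal image
xv, yv = project_point(xyz, K_visual,  R_ident, t_zero)   # pixel in color image
```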
The calibration of the sensors may be done for a plurality of sampling points in the scene. For example, during the calibration phase, a special test object may be moved to different sampling positions in the scene. The sensors (visual, thermal and 3D sensor) can identify and record the special test object (specimen). For example, the test object (specimen) may be a colored, highly reflective specimen with a temperature different from the ambient temperature. The size of the test specimen may be selected such that the specimen has a size of several pixels at the maximum distance from the sensors (but still within the image region of interest) and such that it can be detected by the distance measuring device (e.g., 3D sensor).
The test object may be moved to several positions in the scene. For example, the test object may be positioned at several locations at edges and diagonals of the region of interest. Alternatively, a random coverage of the region of interest is possible too. At all these positions, each sensor detects the test object, and for each position a color image, a thermal image and a spatial position are captured. As discussed supra, based on these measurements each identified spatial position is mapped to pixels in the thermal image and pixels in the color image: (x, y, z)→pixel thermal image (xt, yt) and (x, y, z)→pixel color image (xv, yv). Values between the selected positions (e.g., edges or certain positions on the diagonals) of the test object can be calculated by interpolation.
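The interpolation between calibrated sampling positions could, for instance, be implemented as scattered-data interpolation over the recorded (position, pixel) pairs. The following Python sketch uses SciPy's griddata for this purpose; the sampling positions and pixel values are invented for illustration only.

```python
import numpy as np
from scipy.interpolate import griddata

# Calibration samples: test-object positions (x, y, z) recorded by the 3D sensor
# and the pixel at which the test object appeared in the thermal image.
# The numbers are hypothetical sampling points at the edges of the scene.
positions = np.array([
    [0.0, 0.0, 5.0], [4.0, 0.0, 5.0], [0.0, 3.0, 5.0], [4.0, 3.0, 5.0],
    [0.0, 0.0, 15.0], [4.0, 0.0, 15.0], [0.0, 3.0, 15.0], [4.0, 3.0, 15.0],
])
thermal_x = np.array([20, 300, 20, 300, 120, 220, 120, 220], dtype=float)
thermal_y = np.array([30, 30, 220, 220, 90, 90, 160, 160], dtype=float)

def thermal_pixel(xyz):
    """Interpolate the thermal-image pixel for a position between calibrated samples."""
    xt = griddata(positions, thermal_x, [xyz], method='linear')[0]
    yt = griddata(positions, thermal_y, [xyz], method='linear')[0]
    return xt, yt

# A position roughly in the middle of the sampled volume.
print(thermal_pixel((2.0, 1.5, 10.0)))
```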
The different sensors may have different resolutions. In various embodiments, a measurement point (pixel of the measurement image) of the distance measuring sensor may be assigned to a plurality of pixels of the visual image of the visual sensor. However, a measurement point of the distance sensor may not be assignable to a pixel of the thermal image of the thermal camera, or, alternatively, several measurement points of the distance sensor may be assigned to a single pixel of the thermal image. This effect needs to be considered when the combined image is constructed. For example, “intermediate pixels” may be calculated for an improved thermal image so that a thermal pixel (if necessary an “intermediate pixel”) can be assigned to each measurement point (pixel of the measurement image).
In an alternative embodiment, the visual and thermal sensors can be directly calibrated with respect to each other. Calibration can be carried out by overlapping the captured images of the visual and the thermal sensors. This may include superimposing the two images of the two sensors and displaying the superimposed (or mixed) image as a single image. For example, the image of the visual image sensor (or color sensor) may be used as background and the image of the thermal image sensor is superimposed with 50% opacity (or an opacity between 30% and 70%, etc.). The thermal image may be moved with respect to the color image (up, down, left, right). Moreover, the image of the thermal sensor may be scaled (up or down) in order to compensate for the different angles of view of the lenses. The overlapping can be performed manually by using operating control elements.
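A possible realization of this manual overlap calibration, sketched here in Python with OpenCV under the assumption of 8-bit images, shifts and scales the thermal image into the color image's pixel grid and blends the two with a chosen opacity. The function and parameter names are illustrative.

```python
import cv2
import numpy as np

def overlay_for_calibration(color_img, thermal_img, dx=0, dy=0, scale=1.0, opacity=0.5):
    """Superimpose the thermal image on the color image for manual alignment.

    color_img is assumed to be an 8-bit 3-channel image. dx/dy shift the thermal
    image (up/down/left/right), scale compensates for the different lens angles
    of view, and opacity is typically around 50%.
    """
    h, w = color_img.shape[:2]
    # Scale and shift the thermal image into the color image's pixel grid.
    M = np.float32([[scale, 0, dx], [0, scale, dy]])
    thermal_8bit = cv2.normalize(thermal_img, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    thermal_rgb = cv2.applyColorMap(thermal_8bit, cv2.COLORMAP_JET)
    thermal_warped = cv2.warpAffine(thermal_rgb, M, (w, h))
    # Blend: color image as background, thermal image with the chosen opacity.
    return cv2.addWeighted(color_img, 1.0 - opacity, thermal_warped, opacity, 0)
```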
The superposition of the thermal image on the visual image is calibrated for a specific distance, e.g., several meters. For objects that are substantially closer to or farther away from the sensors, an offset appears between the thermal image and the color image. In a particular example, (
In various embodiments, the sensors need to be recalibrated at certain time instances because environmental effects (temperature, wind, etc.) can impact the accuracy of the surveillance system. Such a recalibration may be performed once a month, once a year or once every two to three years. In other embodiments the recalibration is a permanent or continuous recalibration. In various embodiments, moving objects in the scene can be identified (measured, captured) by all the sensors and can be used for recalibration of these sensors.
In the next step, at 430, a masking map (masking card) of the scene to be monitored is defined and generated. The masking map reflects the allowed region and the restricted region of the scene. The masking map may be a three-dimensional masking map. The map may be generated by separating the three-dimensional image of the scene constructed in the previous step 420 into two or more different portions, regions or areas. For example, the masking map may define an allowed region (fully surveilled) and a restricted region (restrictively surveilled). The two areas can be separated by defining a separating region between the inside area and the outside area. The separating region may be a two-dimensional plane, surface plane or hyperplane. Alternatively, the separating surface may be a three-dimensional volume. The two regions may be separated by other methods too.
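As a minimal sketch of such a masking map, assuming the simplest case of a single separating plane between the inside and outside areas, the test below classifies a measured (x, y, z) point by the sign of its distance from the plane. The plane parameters and coordinates are hypothetical.

```python
import numpy as np

# A minimal 3D masking map: a single separating plane between the allowed
# (inside) region and the restricted (outside) region. The plane is defined by
# a point on the property border and a normal that points into the allowed region.
PLANE_POINT  = np.array([0.0, 10.0, 0.0])   # a point on the property border
PLANE_NORMAL = np.array([0.0, -1.0, 0.0])   # points toward the allowed region

def in_allowed_region(point_xyz):
    """Return True if a measured (x, y, z) point lies in the allowed region."""
    return float(np.dot(np.asarray(point_xyz) - PLANE_POINT, PLANE_NORMAL)) >= 0.0

print(in_allowed_region((1.0, 4.0, 1.5)))    # inside the property -> True
print(in_allowed_region((1.0, 12.0, 1.5)))   # beyond the border   -> False
```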
In an embodiment, shown in
In an alternative embodiment, shown in
In a yet further embodiment, shown in
In the next step, at 440, a combined image is generated. Based on the calibration, the system or the distance measuring sensor (e.g., 3D sensor) knows, for each measurement point, the corresponding pixels of the image of the visual (color) sensor and of the image of the thermal sensor. For an object detected by the distance measuring sensor (e.g., 3D sensor) within the region of interest (scene), the 3D sensor provides the distance and spatial coordinates. By comparing the spatial coordinates of the object with the three-dimensional masking map, the processor can decide whether the object is located in the allowed region or in the restricted region and therefore whether the object is to be represented by the pixels of the thermal sensor or the pixels of the visual sensor. Based on this mapping, the combined image of the thermal sensor and the visual sensor is determined and displayed. The combined image can be displayed at a monitoring station or at the camera. If the object is identified between two calibrated test points (see above at step 420, e.g., edges or certain positions on the diagonals), the object is represented by pixels of the visual image or pixels of the thermal image, and these pixels are calculated by interpolation. The calculation can be based on an interpolation of the measurement points (e.g., pixels of the depth image) and/or on an interpolation of the pixels of the thermal sensor or of the visual sensor. If the object is detected at one of the calibrated test points, the pixels of the thermal image or the visual image are already defined and no interpolation may be necessary.
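A simplified Python sketch of this step is given below: starting from the visual image, the pixels belonging to measurement points that fall in the restricted region are replaced with the corresponding thermal pixels. The helper functions (the calibrated pixel mappings and the masking test) are assumed to exist from the previous steps, and interpolation between calibration points is omitted for brevity.

```python
import numpy as np

def build_combined_image(visual_img, thermal_img, object_points,
                         to_visual_px, to_thermal_px, in_allowed_region):
    """Sketch of step 440: keep the visual image and overwrite the pixels of
    objects located in the restricted region with thermal pixels.

    object_points     : iterable of (x, y, z) measurement points of detected objects
    to_visual_px      : maps (x, y, z) -> (xv, yv) in the visual image (from calibration)
    to_thermal_px     : maps (x, y, z) -> (xt, yt) in the thermal image (from calibration)
    in_allowed_region : the masking-map test from the previous step
    The thermal image is assumed to have the same channel layout as the visual image.
    """
    combined = visual_img.copy()
    for p in object_points:
        if in_allowed_region(p):
            continue                                   # allowed region: keep visual pixel
        xv, yv = (int(round(c)) for c in to_visual_px(p))
        xt, yt = (int(round(c)) for c in to_thermal_px(p))
        if (0 <= yv < combined.shape[0] and 0 <= xv < combined.shape[1]
                and 0 <= yt < thermal_img.shape[0] and 0 <= xt < thermal_img.shape[1]):
            combined[yv, xv] = thermal_img[yt, xt]     # restricted region: thermal pixel
    return combined
```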
In various embodiments, the method 400 above may be modified such that the combined image only displays thermal pixels in a certain temperature range in the outside area. For example, if an object moves in the restricted area surveilled by the sensors and the object is not a living being, the object may be shown in a visual representation because no privacy aspect may be violated. Only if the moving object is a human being moving in the restricted area should the combined image display this movement using pixels of the thermal sensor. This can be achieved by setting the thermal sensor to capture only specific temperature ranges, such as a temperature range of 30 degrees Celsius to 40 degrees Celsius. Alternatively, other temperature ranges can also be selected. An advantage of this is that the displayed image provides a more complete and comprehensive picture of the scene.
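The temperature gate described here could, as a hedged sketch, be a simple per-object check layered on top of the combination step above; the threshold values match the example range in the text and the function name is illustrative.

```python
def pixel_for_restricted_region(visual_px, thermal_px, temperature_c,
                                t_min=30.0, t_max=40.0):
    """Variation of the combined image: in the restricted region an object is
    rendered from the thermal image only if its measured temperature falls in a
    range typical for humans (here 30-40 degrees Celsius; other ranges are
    possible). Otherwise the visual pixel is kept, e.g., for a vehicle."""
    if t_min <= temperature_c <= t_max:
        return thermal_px
    return visual_px
```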
The camera 610 provides color image data and thermal image data to the analytics server 650 via the network 630. The distance measuring device (3D sensor) 620 provides depth image data or 3D data to the analytics server 650 via the network 630. The analytics server 650 generates a combined thermal/color image using the color image data and the thermal image data from the camera 610. The combined thermal/color image is generated based on the 3D data and masking as described in the previous embodiments. The combined images can be stored continuously, upon an alarm or based on time at the storage device 640. The combined images can also be displayed continuously, on request, or upon an alarm at the monitoring station 660.
The image analysis unit 730 is configured to process data acquired by the different sensors to detect moving objects present in the scene as well as the test object, even if it is not moving. Any suitable type of object detection algorithm could be used, and different algorithms could be selected for different sensor types. When an object is found, the object position and pixel information, as well as an indication of whether the object is the special test object, are provided. Additionally, information about the observed scene may be provided by the image analysis unit (e.g., detected structures, boundaries of detected objects, walls, etc.).
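As an example of one such detection algorithm (not mandated by the description, which leaves the choice open), the following Python/OpenCV sketch detects moving objects in the visual stream via background subtraction and returns their bounding boxes.

```python
import cv2

# One possible detector for the image analysis unit: background subtraction on
# the visual stream. Any suitable detector may be substituted, and different
# algorithms may be chosen per sensor type.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

def detect_moving_objects(frame, min_area=500):
    """Return bounding boxes (x, y, w, h) of moving objects in one video frame."""
    mask = subtractor.apply(frame)                 # foreground mask for this frame
    mask = cv2.medianBlur(mask, 5)                 # suppress isolated noise pixels
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]
```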
The mapping unit 732 is configured to perform calibration of and the mapping between spatial measurement points captured by the distance measuring sensor and the pixels of the images captured by the thermal and visual sensors. The mapping unit may implement different algorithms to interpolate values in between the sampling points acquired for calibration.
The masking unit 735 is configured to define the three-dimensional masking map and to determine whether a position is located in the allowed region or the restricted region. The masking unit 735 may receive or access a predefined masking map definition. The masking map may also be defined via a graphical user interface operated by a user, e.g., by drawing in a 2D or 3D representation of the observed scene or by the user entering coordinates. Additional information provided by the image analysis unit 730 may be used when defining the masking map.
The image combiner is configured to generate a combined image. The image combiner receives positional data from the distance measuring sensor and image data from the visual image sensor and the thermal sensor. Based on the determination of the masking unit 735, the image combiner uses the appropriate pixel from the respective sensor to generate the combined image.
The video encoder 750 is configured to compress the generated image(s) in accordance with an image compression standard, such as JPEG, or in accordance with a video compression standard, such as H.264 or Motion JPEG, and delivers the compressed data to the network interface.
The network interface 780 is configured to transmit the data over a specific network. Any suitable network protocol(s) may be used. The network interface allows the camera to communicate with a monitoring station or an administrative station adapted to configure the camera.
The storage device 770 is adapted to store depth, image or video data acquired by the sensors as well as compressed image or video data.
While the sensors, optics and electronics are described as being in one and the same housing, as mentioned above this is not mandatory; they could be provided in different housings. Additionally, calculations that place a substantial burden on the resources of a processor may be offloaded to a separate and dedicated computing device such as a computer. For example, the definition and drawing of the masking map may be done on a separate computer connected to the camera via a network. The separate computer (e.g., PC) receives depth data and image data of the scene acquired by the distance measuring sensor and the other image sensors. This may allow the user to use computationally complex virtual reality methods to configure the masking map or to run computationally demanding image analysis algorithms to detect structures in the scene to support the user in configuring the masking map.
While this invention has been described with reference to illustrative embodiments, this description is not intended to be construed in a limiting sense. Various modifications and combinations of the illustrative embodiments, as well as other embodiments of the invention, will be apparent to persons skilled in the art upon reference to the description. It is therefore intended that the appended claims encompass any such modifications or embodiments.