1. Field of the Invention
The present invention generally relates to the detection of fire and smoke, and in particular to use of image and video analysis techniques to detect the presence of indicators of fire and smoke.
2. Background Description
Conventional point smoke and fire detectors typically detect the presence of certain particles generated by smoke and fire by ionization or photometry. Point detectors cannot be operated in open spaces, and it may take a long time for smoke particles to reach a detector in large rooms, atriums, etc. This, in turn, slows the response time of the point detectors, which is very critical especially at the early stages of fire. The strength of using video in fire detection is the ability to serve large and open spaces. Current fire detection algorithms and methods are based on the use of color in video to detect the flames, as described, for example, in the article “Flame recognition in video” by W. Phillips III, M. Shah, and N. V. Lobo in Pattern Recognition Letters, vol. 23 (1-3), pp. 319-327, January 2002; the article “A system for real-time fire detection” by G. Healey, D. Slater, T. Lin, B. Drda, and A. D. Goedeke in IEEE Computer Vision and Pattern Recognition Conference (CVPR) Proceedings '93, pp. 605-606, 15-17 June 1993; and U.S. Pat. No. 6,844,818 to Grech-Cini et al (“Grech-Cini”).
U.S. Pat. No. 6,011,464 to Thuillard (“Thuillard”) describes a wavelet transform based method analyzing one dimensional (1-D) signals coming from a sensor belonging to a hazard detector system. The original sensor output signal is fed to multi-stage cascaded pairs of high-pass/low-pass filters. Association functions are assigned for high-pass filter outputs which are then analyzed using a set of fuzzy logic rules. An alarm is issued according to fuzzy logic rules. Thuillard fails to extend his method to two-dimensional (2-D) image sequences forming the video.
Japanese patent JP11144167 to Takatoshi et al (“Takatoshi”) describes a fire detecting device based on flame detection only, with the aim of eliminating false alarms due to artificial light sources, “especially rotating lamps”. Takatoshi fails to take advantage of smoke detection to eliminate false alarms.
An attempt has been made to use flicker on the flame boundaries and within flame regions as an indicator of the existence of flames within the viewing range of a visible or IR spectrum camera. PCT publication number WO02/069292 describes computing Fast Fourier Transforms (FFT) of temporal object boundary pixels to detect peaks, especially around 10 Hz, in the Fourier domain. An important weakness of this method is that flame flicker is not purely sinusoidal but random. As a result, peaks are hard to detect in FFT plots, because the random nature of flames means there may be no clear peak at 10 Hz.
It is therefore an object of the present invention to provide a technique that improves on the prior art by using smoke detection to eliminate false alarms and to provide an early indication of fire.
Another object of the invention is to improve on the prior art by employing a technique that reduces the computational requirements of fire and smoke detection.
It is also an object of the invention to provide a robust alternative to Fast Fourier Transforms for detection of flame flicker.
The invention provides a novel method and a system to detect smoke, fire and/or flame by processing the data generated by a group of sensors, including ordinary cameras, monitoring a scene in the visible and infra-red spectrum. Video generated by the cameras is processed by a two-dimensional (2-D) nonlinear filter based on the median operation. Flame and smoke flicker behavior is detected using Hidden Markov Models employing the output of the 2-D nonlinear filter to reach a decision.
One aspect of the invention is a method, a system and a device for accurately determining the location and presence of smoke due to fire and flames using video data captured by a camera. The method and the system detect smoke by a) transforming a plurality of images forming the video into the Nonlinear Median Transform (NMT) domain, b) implementing an “l1”-norm based energy measure indicating the existence of smoke from the NMT domain data, c) detecting slowly decaying NMT coefficients, d) performing color analysis in low-resolution NMT sub-images, e) using a Markov model based decision engine to model the turbulent behavior of smoke, and f) fusing the above information to reach a final decision.
In a further aspect, the system and method computes the Nonlinear Median (NM) filter transforms of video image frames without performing any multiplication operations. Another aspect of the invention provides for searching all sub-images of NM transformed video data for slowly disappearing high amplitude NMT coefficients compared to the reference background NMT image, thereby indicating smoke activity.
It is also an aspect of the invention to provide a method and system that searches all NMT sub-images of transformed video data for newly appeared regions having energy less than the reference background NMT sub-images, thereby indicating existence of smoke. In a further aspect, the method and system of the invention calculates “L1”-norm based NMT energy function which does not require any multiplication operations. Another aspect of the invention carries out color content analysis on the low resolution sub-images of the NMT transformed video data to detect gray colored regions. In yet a further aspect, the invention is implemented by carrying out flicker and turbulent behavior analysis of smoke regions in video by using Markov models trained with NMT coefficients.
The method and system of the invention additionally g) performs an adaptive decision fusion mechanism based on the LMS (Least Mean Square) algorithm, h) creates a weighted mechanism for processed data fusion, i) combines processed data from a variety of camera outputs, and j) has memory and is able to recall on previously recorded decisions.
The foregoing and other objects, aspects and advantages will be better understood from the following detailed description of a preferred embodiment of the invention with reference to the drawings, in which:
The method of the invention constructs a 2-D nonlinearly filtered background image from a plurality of image frames and monitors the changes in some parts of the image by comparing the current nonlinearly filtered image to the constructed background image. This 2-D subband-energy analysis of image frames is distinct from the approach taken by Thuillard. Thuillard uses a Euclidean norm requiring squared sums, and cannot locate the exact location of the fire because his method makes use of a 1-D sensor output signal. The present invention does not use any multiplications. It uses median filtering and an l1-norm requiring only absolute values, which is computationally much faster than Euclidean norm based energy calculations. Furthermore, the approach of the present invention uses hidden Markov model (HMM) technology as the decision engine to detect fire within the viewing range of the camera. Also, the 2-D nonlinear analysis of image frames makes it possible to estimate the location of smoke regions within image sequences.
As indicated above, Takatoshi fails to take advantage of smoke detection to eliminate false alarms. However, in many fires, smoke rises into the view of sensors well before flames become visible. Takatoshi uses 2-D Continuous Wavelet Transform (CWT) for image analysis. By contrast, the present invention uses a discrete-time nonlinear filtering structure (
As indicated above, the prior art uses the FFT to detect flicker on the flame boundaries and within flame regions, but it is difficult to use the FFT as an indicator of the existence of flames within the viewing range of the visible or IR spectrum camera because flicker is random rather than sinusoidal. The improvement of the present invention models flame flicker processes with Markov models. Also, the prior art Grech-Cini reference describes how edges are determined using the image space domain Sobel edge filter, which requires 8 multiplications to produce an output sample. The improvement provided by the present invention is the use of a nonlinear filter that does not use multiplication, which is computationally faster than the linear Sobel edge-detection filter. Furthermore, the sub-images used in the analysis are smaller in size than the output of the Sobel filter. The present invention does not require any multiplications, which leads to a low-cost field-programmable gate array (FPGA) implementation, although the invention may be implemented in other physical configurations. Another improvement of the present invention over the wavelet and Sobel operator based methods is that those methods detect only the edges of an image, whereas the median filter does not smooth out the textured parts of an image, as is well known to those skilled in the art. This is an advantage over the prior art because blurred textured regions in the video may be due to smoke, and can therefore be used as an important clue for smoke detection.
The invention not only detects smoke colored moving regions in video but also analyzes the motion of such regions for flicker estimation. The proposed method for smoke detection is based on comparing the nonlinearly filtered current image with a nonlinearly estimated background image. Smoke gradually smoothens sharp transitions in an image when it is not thick enough to cover the scene. This feature of smoke is a good indicator of its presence in the field of view of the camera. Sharp transitions and textured regions in an image frame produce high amplitude regions in a nonlinearly filtered image. Here is an overview of the nonlinear image analysis method.
The nonlinear filtering of a signal or an image or a video frame consists of processing discrete coefficients (pixels). In discrete nonlinear filtering structure shown in
xh(n)=xo(n)−median[xe(n), xe(n−1), xe(n+1)]  (1)
The median operation simply determines the middle value of xe(n), xe(n−1), xe(n+1) and does not require any multiplications. If the signal is smooth, the median value will be close to xo(n) and xh(n) will be very close to zero. However, if there is a transition in the processed row of the image (e.g., xe(n) and xe(n+1) are significantly different from xe(n−1)), then the median value will be either xe(n) or xe(n+1) and xh(n) will be significantly different from zero. Therefore a high valued xh(n) indicates that there is a change in the value of the original signal x around the index 2n. In Eq. 1 the median filter is implemented using three samples, but it can be implemented using four or more samples as well.
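By way of illustration only (this sketch is not part of the disclosure; the function name, the use of NumPy, and the edge-sample boundary handling are assumptions made here), the one-dimensional filtering step of Eq. 1 may be realized as follows:

```python
import numpy as np

def median_highpass(x):
    """One level of the nonlinear (median) filter of Eq. 1.

    Even-indexed samples form xe, odd-indexed samples form xo.
    Returns xh(n) = xo(n) - median[xe(n), xe(n-1), xe(n+1)].
    """
    x = np.asarray(x, dtype=float)
    xe = x[0::2]
    xo = x[1::2]
    xh = np.empty(len(xo))
    for n in range(len(xo)):
        prev = xe[n - 1] if n > 0 else xe[n]            # repeat the edge sample at the boundary
        nxt = xe[n + 1] if n + 1 < len(xe) else xe[n]
        # Median of three values: middle value only, no multiplications needed.
        xh[n] = xo[n] - np.median([xe[n], prev, nxt])
    return xh
```

On a smooth (here, constant) row the output stays near zero, while a step transition produces a large output sample, consistent with the discussion above.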
The output of the nonlinear filtering structure shown in
In this invention it is assumed that each image of the video is represented in median filter domain as described above. Other video formats have to be converted to raw data format first, and then converted to the nonlinear median transform representation.
Each image of a color video consists of three matrices corresponding to three color components: red, green, and blue, or widely used luminance (Y) and two color difference or chrominance (U and V fields) components. The method and the system can handle other color representation formats as well. A nonlinear median transform (NMT) can be computed separately for each color component, as shown in
NMT coefficients contain spatial information about the original image. For example, the (n,m)-th coefficient of the sub-image H11 (or the other sub-images H21, H31, L11) of the current image I is related to a two pixel by two pixel region of the original image, I(k,l), k=2n,2n−1, l=2m,2m−1, because of the sub-sampling operation during the nonlinear median transform computation. In general, a change in the p-th level transform coefficient corresponds to a 2^p by 2^p region in the original image frame. If there is a significantly large value in the (n,m)-th coefficient of the H11 (H21) sub-image, then this means that there is a significant vertical (horizontal) change around the (k,l)-th pixel of the original image. In other words, there is an object boundary going through the (k,l)-th pixel of the original image or a textured object around the (k,l)-th pixel of the image.
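For illustration (the function name and the 1-indexed convention are assumptions made here, following the text's k=2n,2n−1 numbering), the coefficient-to-pixel correspondence described above can be sketched as:

```python
def coeff_region(n, m, p=1):
    """Pixel block of the original image covered by the (n,m)-th level-p NMT coefficient.

    At level 1 the (n,m)-th coefficient corresponds to rows 2n-1..2n and
    columns 2m-1..2m of the original image; at level p it corresponds to a
    2^p-by-2^p block (inclusive 1-indexed ranges are returned).
    """
    s = 2 ** p
    rows = ((n - 1) * s + 1, n * s)
    cols = ((m - 1) * s + 1, m * s)
    return rows, cols
```

For example, the (3,4)-th level-1 coefficient maps to rows 5-6 and columns 7-8 of the original image.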
In the present invention, a median filter based method known in the art is used for background image estimation (see e.g. the public domain document: I. Haritaoglu, D. Harwood, L. S. Davis, “W4S: Real-time surveillance of people and their activities,” IEEE Trans. Pattern Anal. Mach. Intell., 2000). Other background estimation methods, described in “Algorithms for cooperative multisensor surveillance” by R. T. Collins, A. J. Lipton, H. Fujiyoshi, and T. Kanade, published in Proceedings of the IEEE, 2001, can also be used to estimate a background image.
The main assumption of the above methods is that the camera capturing the image frames should be stationary. Once moving regions are estimated by this known method, a nonlinear median transform based image analysis method is implemented to discriminate between smoke and other regular moving regions. When there is smoke in some parts of the image, the smoke obstructs the texture and edges in the background. Since the edges and texture contribute to high amplitude values in the H11, H21 and H31 sub-images, the energies of these sub-images drop due to smoke in an image sequence. It is also possible to determine the location of smoke using the sub-images, because they also contain spatial information as described above. In the Grech-Cini reference, edges are determined using the image space domain Sobel edge filter. The NMT domain analysis of the present invention is computationally faster than Grech-Cini's image space domain analysis because nonlinear median transformed images are smaller in size than the actual image and they can be computed without performing any multiplications.
Let
wn(x,y)=|H1n(x,y)|+|H2n(x,y)|+|H3n(x,y)|  (2)
represent a composite image containing the median difference sub-images corresponding to the n-th level nonlinear median transform. In Eq. 2 we construct an “l1-norm” based energy function, which also does not require any multiplications. This image is divided into small blocks of size (K1, K2) and the energy e(l1,l2) of each block is computed as follows
e(l1,l2)=Σ(x,y)wn(x+l1K1,y+l2K2) (3)
This is shown in
The above local energy values computed for the NMT of the current image are compared to the corresponding NMT of the background image which contains information about the past state of the scene under observation. If there is a decrease in value of a certain e(l1, l2) then this means that the texture or edges of the scene monitored by the camera no longer appear as sharp as they used to be in the current image of the video. Therefore, there may be smoke in the image region corresponding to the (l1, l2)-th block.
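As an illustrative sketch (the block size, the 50% drop ratio, and the function names are assumptions made here, not part of the disclosure), the block energies of Eqs. 2-3 and the comparison against the background energies can be computed as:

```python
import numpy as np

def block_energies(H1, H2, H3, K1=8, K2=8):
    """Eq. 2 composite image and Eq. 3 per-block energies."""
    # Eq. 2: l1-norm composite of the three difference sub-images (absolute values only).
    w = np.abs(H1) + np.abs(H2) + np.abs(H3)
    # Eq. 3: sum w over non-overlapping (K1, K2) blocks.
    e = np.add.reduceat(np.add.reduceat(w, np.arange(0, w.shape[0], K1), axis=0),
                        np.arange(0, w.shape[1], K2), axis=1)
    return e

def smoke_blocks(e_current, e_background, drop_ratio=0.5):
    """Flag blocks whose energy fell below drop_ratio of the background block energy."""
    return e_current < drop_ratio * e_background
```

A block flagged by smoke_blocks corresponds to an (l1,l2) region where edges and texture have become less sharp than in the background, i.e., a candidate smoke region.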
One can set up thresholds for comparison. If a certain e(l1, l2) value drops below the pre-set threshold this may be an indicator of existence of smoke in the region. Let D1 be a decision variable which becomes 1 when the e(l1, l2) value drops below the pre-set threshold in some part of the image frame of the video. Otherwise D1 is zero. One can also define different sensitivity levels to different parts of the image by defining different threshold values for different (l1, l2) indices.
Edges in the current image frame of the video produce high amplitude values in NMT difference sub-images because of the subtraction operation in Eq. 1. If smoke covers one of the edges of the current image then the edge initially becomes less visible and after some time it may disappear from the scene as the smoke gets thick.
Let the NMT coefficient H1n(x,y) be one of the transform coefficients corresponding to the edge covered by the smoke. Initially, its value decreases due to reduced visibility, and in subsequent image frames it becomes zero or close to zero whenever there is very little visibility due to thick smoke. Therefore, locations of the edges of the original image are determined from the high amplitude coefficients of the NM transform of the background image in the system of the invention. Slow fading of an NMT coefficient is an important clue for smoke detection. If the values of a group of NMT coefficients along a curve corresponding to an edge decrease in value in consecutive frames, then this means that there is less visibility in the scene. In turn, this may be due to the existence of smoke.
An instantaneous disappearance of a high valued NMT coefficient in the current frame cannot be due to smoke. Such a change corresponds to a moving object, and such changes are ignored. One can set up thresholds for comparison. If the value of a high-valued NMT coefficient drops below a preset threshold or drops a pre-determined percentage of its original value, this is an indicator of smoke. Let D2 be a decision variable which becomes 1 when the value of a certain NMT coefficient drops below the preset threshold in some part of the image frame of the video. Otherwise D2 is zero. We can assign fractional values to the decision variable according to the rate of decrease as well (e.g., a 10% decrease may make D2=0.1, a 20% decrease may make D2=0.2, etc.). One can also define different sensitivity levels for different parts of the image by defining different threshold or percentage values for different image regions.
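A minimal sketch of the fractional D2 assignment described above (the helper name and the 10% minimum-drop sensitivity are illustrative assumptions):

```python
def d2_from_fading(coef_now, coef_bg, min_drop=0.1):
    """Fractional decision variable D2 from the relative decrease of a
    high-valued NMT coefficient: a 20% decrease gives D2 = 0.2, etc.
    Drops smaller than min_drop are treated as no evidence (D2 = 0)."""
    if coef_bg == 0:
        return 0.0
    drop = (abs(coef_bg) - abs(coef_now)) / abs(coef_bg)
    if drop < min_drop:
        return 0.0
    # Quantize to one decimal, capped at 1 (total disappearance).
    return min(1.0, round(drop, 1))
```

An instantaneous drop to zero would also yield D2 = 1 here; in the full system such abrupt changes are first filtered out as moving objects, as stated above.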
Smoke colored regions are detected in the low resolution L1 sub-images. This is possible because the L1 family of sub-images contains essentially the actual image pixel values. Although there are various types of fires, smoke does not have any color. Therefore, the color difference U and V components of a smoke pixel should ideally be equal to zero. Small threshold values can be put around the U and V values to check whether a moving region in video has no color. If the U and V pixel values are close to zero, this is also an indicator of the existence of smoke in the scene. If the color space of the video is Red (R), Green (G), Blue (B), it can be transformed into the <Y,U,V> or <Y,Cb,Cr> color spaces (the chrominance Cb and Cr values must ideally be equal to 128 for a colorless object).
NMT domain color analysis is computationally faster than image space domain color analysis because the L1 family of sub-images is smaller in size than the actual image. If a moving region is gray colored, then the decision variable D3 may become 1. Otherwise D3 will be equal to 0. Fractional values can be assigned to the decision variable D3 as well.
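For illustration, the chrominance test described above can be sketched as follows; the conversion coefficients are the commonly used YUV definition, and the tolerance value and function names are assumptions made here:

```python
def rgb_to_yuv(r, g, b):
    """Common RGB-to-YUV conversion: luminance plus two color-difference components."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v

def is_gray(r, g, b, tol=12.0):
    """Smoke has no color: both chrominance components should be near zero."""
    _, u, v = rgb_to_yuv(r, g, b)
    return abs(u) < tol and abs(v) < tol
```

For a Cb/Cr representation the same test would subtract the 128 offset first and check that the remainders are near zero.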
Flicker on the flame boundaries and within flame regions can be used as an indicator of the existence of flames and smoke within the viewing range of the camera. It is known in the art to compute Fast Fourier Transforms (FFT) of temporal object boundary pixels to detect peaks, especially around 10 Hz, in the Fourier domain (PCT publication number WO02/069292). An important weakness of Fourier domain methods is that flame flicker is not purely sinusoidal but random. Consequently, peaks cannot be detected with precision in FFT plots. In order to overcome this deficiency the present invention uses a different approach, which is to model the flame flicker process using Markov models. Smoke does not flicker as much as flames, but it has a turbulent behavior related to flame flicker. Therefore, a Markov model based stochastic approach is ideal to represent the smoke motion in video.
In the prior art, shapes of fire regions have been represented in the Fourier domain. The Fourier Transform does not carry any time (space) information. In order for FFTs to also carry time information, they have to be computed over windows of data. Hence, the temporal window size is very important for detection. If the window size is too long, one may not observe the incidence of peaks in the FFT data. If it is too short, one may completely miss flicker cycles, and therefore no peaks can be observed in the Fourier domain.
A smoke behavior process is modeled with three state hidden Markov models as shown in
In the system according to the invention, candidate smoke regions are detected by color (brightness) analysis in the L1 sub-band images captured by a visible range camera. Twenty-frame-long state sequences for each of the pixels in these candidate regions are determined by the Markov model analysis described above. The model yielding the higher probability is determined as the result of the analysis for each of the candidate pixels. The probability of a Markov model can also be computed without performing any multiplication (see the book Fundamentals of Speech Recognition by L. R. Rabiner and B. H. Juang, 1993, Prentice-Hall). If the probability of model A is higher than the probability of model B for a given pixel, then the decision variable D4 is set to 1. Otherwise the decision variable is D4=0.
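The model comparison can be sketched as below (the state sequences, transition probabilities, and function names are illustrative assumptions, not the trained models of the invention); working in log-probabilities turns products into sums, so the comparison uses additions only:

```python
import numpy as np

def seq_log_prob(states, log_pi, log_A):
    """Log-likelihood of a state sequence under a Markov chain.

    Products of transition probabilities become sums of log-probabilities,
    mirroring the multiplication-free evaluation mentioned in the text."""
    lp = log_pi[states[0]]
    for s, t in zip(states, states[1:]):
        lp += log_A[s, t]
    return lp

def d4(states, model_smoke, model_other):
    """D4 = 1 if the smoke/flame model explains the 20-frame sequence better."""
    return int(seq_log_prob(states, *model_smoke) > seq_log_prob(states, *model_other))
```

A turbulent model (frequent state changes) wins on rapidly changing pixel state sequences, while a calm model (strong self-transitions) wins on static ones.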
Decision Fusion
Decision variables D1, D2, D3 and D4, obtained via the NMT based analysis of a video signal, are fused to reach a final decision. Multi-sensor data fusion methods include decision fusion based on voting, Bayesian inference, and Dempster-Shafer methods. We can use these multi-sensor decision fusion methods to combine the decision results. In this section, we describe two methods, a voting based decision fusion strategy and an LMS (least mean square) based decision fusion strategy. However, other data fusion methods can also be used to combine the decisions of individual sensors.
Voting schemes include unanimity voting, majority voting, and m-out-of-n voting, in which an output choice is accepted if at least m votes agree out of the decisions of n sensors. A variant of m-out-of-n voting is the so-called t-out-of-V voting, in which the output is accepted if
H=Σi wiDi>T  (4)
where the wi's are user-defined weights, the Di's are the decisions of the sensors, and T is a user-defined threshold. The decision parameters Di of the sensors can take binary values 0 and 1, corresponding to the normal case and the existence of fire, respectively. Each Di can also take any real value between 0 and 1 if there is an associated model for the i-th decision variable.
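A minimal sketch of the weighted voting rule of Eq. 4 (the function name, the example weights, and the threshold are illustrative assumptions):

```python
def weighted_vote(decisions, weights, T):
    """Eq. 4: accept (fire) when H = sum_i w_i * D_i exceeds the threshold T."""
    H = sum(w * d for w, d in zip(weights, decisions))
    return H > T
```

With binary Di and integer weights, this step can also be realized with additions and a comparison only, consistent with the multiplication-free implementation discussed below.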
With the use of binary decision variables it is possible to have a smoke detection scheme without requiring any multiplications because the NMT transform, the Markov model probability computation and the decision fusion step do not require any multiplications. This is an important advantage in FPGA implementation because multiplication units occupy a huge area in the FPGA preventing a low-cost solution.
In the LMS method, let the final decision be composed of N decision functions, D1, . . . , DN, corresponding to different sensors. Upon receiving a sample input x at time step n, each sensor yields a decision Di(x,n) which takes real values in the range [0,1]. As the value gets closer to 1, the decision is fire; as it gets closer to 0, it corresponds to the normal case. The type of sample input x may vary depending on the algorithm. In our case, each incoming image frame is considered as a sample input.
In the adaptive decision fusion scheme of the invention, weights are updated according to the LMS algorithm which is the most widely used adaptive filtering method. Another innovation that we introduced is that individual decision algorithms do not produce binary values 1 (correct) or 0 (false). They produce a real number between 1 and 0, i.e., Di(x,n) takes real values in the range [0,1].
Let D(x,n)=[D1(x,n) . . . DN(x,n)]T be the vector of decisions of the sensors for the input image frame x at time step n. The weight adaptation equation is as follows:

w(n+1)=w(n)+μ e(x,n) D(x,n)/∥D(x,n)∥2  (5)

where w(n)=[w1(n) . . . wN(n)] is the current weight vector. The adaptive algorithm converges if the Di(x,n) are wide-sense stationary random processes and the update parameter μ lies between 0 and 2. The computational cost can be reduced by omitting the normalization norm ∥D(x,n)∥2 and by selecting a μ close to zero.
The weights are unconditionally updated using the LMS adaptation in Eq. (5). The error e(x,n) is estimated as follows:

e(x,n)=y(x,n)−DT(x,n)w(n)

where y(x,n)∈{−1,1} is the user's classification result.
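A sketch of one step of the adaptive fusion described here, assuming the standard normalized LMS form of the weight update in Eq. (5) (the function name and the step size μ are illustrative assumptions):

```python
import numpy as np

def lms_fuse_update(w, D, y, mu=0.2):
    """One normalized LMS step for the decision-fusion weights.

    e is the error between the user's label y and the fused decision w.D;
    the ||D||^2 normalization can be omitted (with a small mu) to cut cost."""
    e = y - w @ D
    return w + mu * e * D / (D @ D)
```

Repeated updates drive the fused output toward the user's classification for the observed decision vectors, which is the adaptation behavior described above.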
The user participates actively in the learning process by disclosing his or her classification result, y(x,n), on the input image frame x at time step n.
The decision fusion method, as well as the other methods such as the nonlinear transform computation, transform domain energy calculations, hidden Markov model computations, etc., described herein, are preferably implemented using program instructions (software, firmware, etc.) that can be executed by a computer system and are stored on a computer readable medium, such as memory, a hard drive, an optical disk (CD-ROM, DVD-ROM, etc.), a magnetic disk, etc.
Alternatively, these methods can be implemented in hardware (logic gates, Field Programmable Gate Arrays, etc.) or a combination of hardware and software.
While the invention has been described in terms of preferred embodiments, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims.
Filing Document | Filing Date | Country | Kind | 371c Date
---|---|---|---|---
PCT/US11/21486 | 1/17/2011 | WO | 00 | 7/15/2013

Number | Date | Country
---|---|---
61295686 | Jan 2010 | US