1. Field of the Invention
The present invention relates to an image processing method and, more particularly, to a motion detection method for extracting information regarding moving objects from video streams.
2. Description of the Related Art
Motion detection is a key technique used in automatic video surveillance systems to extract information regarding moving objects from video streams. Motion detection methods can be divided into three major categories: temporal difference, optical flow, and background subtraction methods. The temporal difference method is easy to implement and computationally inexpensive; however, it very often generates holes inside moving objects and inevitably extracts incomplete shapes of moving objects. The optical flow method can extract complete shapes of moving objects and can detect moving objects even when the camera is moving; however, it is not suitable for real-time applications because of its heavy computational load and its sensitivity to noise. The background subtraction method is easy to implement and, by using background models under a relatively static background, can extract reasonable information regarding moving objects at moderate computational cost, although it is extremely sensitive to luminance changes. Because of these advantages, background subtraction methods have been popularly used in motion detection applications, and various types of background subtraction methods such as Gaussian mixture model (GMM), sigma difference estimation (SDE), multiple SDE (MSDE), multiple temporal difference (MTD), and simple statistical difference (SSD) methods have been developed.
With recent advances in video communication technology, wireless communication has become more viable for motion detection applications as a way to extend the detection of moving objects over a wide range of environments. Unfortunately, wireless communication is especially prone to network congestion and server crashes due to the bandwidth constraints of real-world networks. In response, video rate control techniques have been introduced into video formats such as H.264/AVC, which supports variable-bit-rate (VBR) encoding to adapt to real-world network conditions. The aforementioned conventional background subtraction methods can detect moving objects in video streams which have fixed bit rates. In such an ideal, stable environment, moving objects are easily distinguished by the background models of these methods. However, because real-world networks rarely offer an ideal, stable environment, the aforementioned conventional background subtraction methods may misinterpret moving objects when the bit rate suddenly changes, and effective detection of moving objects in VBR video streams remains a very difficult problem for these methods.
The present invention is directed to providing a motion detection method which can correctly interpret moving objects in VBR video streams to implement more accurate and complete motion detection, and which can eliminate the influence of luminance changes.
According to an aspect of the present invention, there is provided a motion detection method based on grey relational analysis. The motion detection method includes the following steps: S1) receiving a video frame including a plurality of input pixels; S2) establishing a multi-quality background model; S3) detecting moving objects to generate a binary motion mask; S4) detecting luminance changes of the video frame; and S5) examining false detections of the binary motion mask.
Moreover, the step of S2 includes: S21) calculating a Euclidean distance between a pixel value of each input pixel and a pixel value of each of a plurality of corresponding candidate background pixels; S22) according to the Euclidean distances, calculating a grey relational coefficient between the pixel value of each input pixel and the pixel value of each of the corresponding candidate background pixels; S23) for each input pixel, determining whether the minimum value of the grey relational coefficients is smaller than or equal to a first threshold value; if yes, determining that a bit rate of the input pixel has been changed, and the input pixel is regarded as a new candidate background pixel accordingly; if no, determining that the bit rate of the input pixel is not changed.
Moreover, the step of S3 includes: S31) dividing the video frame into a plurality of blocks, and for each block, summing up the maximum values of the grey relational coefficients between the pixel value of each input pixel and the pixel values of the corresponding candidate background pixels within the block to generate a grey relational coefficient sum regarding the block; S32) determining whether the grey relational coefficient sum is larger than or equal to a second threshold value; if yes, determining that the block is a background block; if no, determining that the block is a motion block; S33) for each input pixel within each motion block, determining whether the maximum value of the grey relational coefficients between the pixel value of each input pixel and the pixel values of the corresponding candidate background pixels is larger than or equal to a third threshold value; if yes, determining that the input pixel is a background pixel; if no, determining that the input pixel is a motion pixel; S34) generating a binary motion mask.
Moreover, the step of S4 includes: S41) calculating an entropy of the grey relational coefficient sum of each block, and summing up the entropies of the grey relational coefficient sums of the blocks to generate an entropy sum regarding the video frame; S42) determining whether a difference between the entropy sum of the video frame and an entropy sum of a previous video frame is larger than or equal to a fourth threshold value; if yes, determining that the video frame has luminance changes, and updating a candidate background pixel corresponding to each input pixel according to the video frame; if no, determining that the video frame has no luminance changes.
Moreover, the step of S5 includes: S51) providing a setting interface for a user to set a detection sensitivity; S52) dividing a total number of the motion pixels and the background pixels of the binary motion mask by an area of the binary motion mask to generate an examination value; S53) determining whether the examination value is larger than a product of a fifth threshold value and the detection sensitivity, wherein a prediction number of true positive pixels divided by the area of the binary motion mask is the fifth threshold value; if yes, determining that there are false detections; if no, determining that there are no false detections.
The motion detection method determines bit-rate changes of the input pixels by the grey relational analysis technique to establish the multi-quality background model, and therefore can correctly interpret moving objects in VBR video streams. The motion detection method further detects moving objects by a two-stage detection procedure (i.e., block-based and pixel-based detection procedures) to generate the binary motion mask, and therefore can implement more accurate and complete motion detection. The motion detection method further detects luminance changes of the video frame by entropy calculation so as to timely update the background model, and therefore can eliminate the influence of luminance changes. The motion detection method further provides the setting interface for the user to set the detection sensitivity, with higher values meaning higher detection accuracy, and examines false detections of the binary motion mask; therefore, it can increase the detection accuracy and decrease false detections.
The present invention will be described in further detail below with reference to the accompanying drawings.
Referring to the drawings, the motion detection method according to an embodiment of the present invention includes steps S1 to S5. In a step S1, the method receives a video frame It of a VBR video stream, the video frame It including a plurality of input pixels pt(x,y).
In a step S2, the motion detection method determines bit-rate changes of the input pixels by the grey relational analysis technique to establish a multi-quality background model. In the embodiment, the step S2 includes steps S21, S22, S23, S231, and S232. In the step S21, the method calculates a Euclidean distance Δ(k) between the pixel value of each input pixel pt(x,y) and the pixel value of each of a plurality of corresponding candidate background pixels B(x,y)1 to B(x,y)M. The Euclidean distance Δ(k) can be expressed as follows:
Δ(k)=∥pt(x,y)−B(x,y)k∥
where k is an integer ranging from 1 to M, and M is the number of corresponding candidate background pixels. Because the motion detection method is implemented in the YCbCr color space, each candidate background pixel B(x,y)k has a corresponding color or pixel value represented by three variables: a luminance value (Y), a blue-difference chroma value (Cb), and a red-difference chroma value (Cr). A longer Euclidean distance Δ(k) indicates a larger difference between the input pixel pt(x,y) and the candidate background pixel B(x,y)k, whereas a shorter Euclidean distance Δ(k) indicates a smaller difference between them.
In a step S22, the method calculates, according to the Euclidean distances Δ(1) to Δ(M), a grey relational coefficient γ between the pixel value of each input pixel pt(x,y) and the pixel value of each of the corresponding candidate background pixels B(x,y)1 to B(x,y)M. The grey relational coefficient γ can be expressed for each input pixel pt(x,y) as follows:

γ(pt(x,y), B(x,y)k) = (Δmin + ξΔmax) / (Δ(k) + ξΔmax)

where k is an integer ranging from 1 to M, Δmin represents the minimum difference, and Δmax represents the maximum difference. Because each of the luminance value (Y), the blue-difference chroma value (Cb), and the red-difference chroma value (Cr) is represented by 8 bits with values of 0 to 255, Δmin and Δmax can be set to 0 and 255, respectively. In addition, ξ represents a distinguishing coefficient ranging from 0 to 1, and in the embodiment, ξ is set to 0.2. A smaller grey relational coefficient γ(pt(x,y), B(x,y)k) indicates a lower correlation (i.e. a larger difference) between the input pixel pt(x,y) and the candidate background pixel B(x,y)k, whereas a larger grey relational coefficient γ(pt(x,y), B(x,y)k) indicates a higher correlation (i.e. a smaller difference) between them.
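As an illustration, the following Python sketch computes the Euclidean distances of the step S21 and the grey relational coefficients of the step S22 for a single input pixel. The function name and the use of NumPy are illustrative assumptions; the constants follow the embodiment (Δmin = 0, Δmax = 255, ξ = 0.2).

```python
import numpy as np

XI = 0.2                    # distinguishing coefficient (0.2 in the embodiment)
D_MIN, D_MAX = 0.0, 255.0   # Delta_min and Delta_max for 8-bit YCbCr channels

def grey_relational_coeffs(p, candidates):
    """p: (3,) YCbCr input pixel p_t(x,y); candidates: (M, 3) array of the
    candidate background pixels B(x,y)_1 to B(x,y)_M.
    Returns the (M,) grey relational coefficients gamma(p_t(x,y), B(x,y)_k)."""
    deltas = np.linalg.norm(candidates.astype(float) - np.asarray(p, float),
                            axis=1)                        # step S21: Delta(k)
    return (D_MIN + XI * D_MAX) / (deltas + XI * D_MAX)    # step S22: gamma
```

With these constants, γ equals 1 for an exact match (Δ(k) = 0) and decreases toward roughly 0.17 as Δ(k) approaches 255.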
In a step S23, the method determines, for each input pixel pt(x,y), whether the minimum value of the grey relational coefficients γ(pt(x,y), B(x,y)1) to γ(pt(x,y), B(x,y)M) is smaller than or equal to a first threshold value ε. If yes, it indicates that there is substantially no correlation between the input pixel pt(x,y) and the candidate background pixels B(x,y)1 to B(x,y)M, and the method goes to a step S231 accordingly. In the step S231, the method determines that a bit rate of the input pixel pt(x,y) has been changed, and a bit-rate-change indication Vk of the input pixel pt(x,y) is labelled as ‘1’. Moreover, the input pixel pt(x,y) can be regarded as a new candidate background pixel. If no, it indicates that there is a very high correlation between the input pixel pt(x,y) and some candidate background pixel, and the method goes to a step S232 accordingly. In the step S232, the method determines that the bit rate of the input pixel pt(x,y) is not changed, and the bit-rate-change indication Vk of the input pixel pt(x,y) is labelled as ‘0’. By doing so, the multi-quality background model can be flexibly established in the VBR video stream. The bit-rate-change indication Vk can be expressed for each input pixel pt(x,y) as follows:

Vk = 1, if min_{1≤k≤M} γ(pt(x,y), B(x,y)k) ≤ ε; Vk = 0, otherwise

where k is an integer ranging from 1 to M, and the first threshold value ε can be empirically set to 0.6.
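Continuing the sketch above, the step S23 might look as follows; note that the text does not specify how the pool of M candidates is initialized or bounded, so the unbounded append below is an assumption.

```python
def check_bit_rate_change(p, candidates, eps=0.6):
    """Step S23: the bit rate is deemed changed when the minimum grey
    relational coefficient is <= the first threshold value epsilon.
    Returns (changed, candidates)."""
    gammas = grey_relational_coeffs(p, candidates)
    if gammas.min() <= eps:
        # Step S231: bit rate changed (V_k = 1); keep p as a new candidate.
        return True, np.vstack([candidates, p])
    # Step S232: bit rate unchanged (V_k = 0).
    return False, candidates
```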
Referring again to the drawings, in a step S3, the method detects moving objects by a two-stage detection procedure which includes a block-based detection procedure and a pixel-based detection procedure. The block-based detection procedure includes steps S31, S32, S321, and S322. In the step S31, the method divides the video frame It into a plurality of blocks M(i,j) and, for each block M(i,j), sums up the maximum values of the grey relational coefficients between the pixel value of each input pixel pt(x,y) and the pixel values of the corresponding candidate background pixels B(x,y)1 to B(x,y)M within the block to generate a grey relational coefficient sum S(i,j) regarding the block M(i,j). The grey relational coefficient sum S(i,j) can be expressed as follows:

S(i,j) = Σ_{pt(x,y)∈M(i,j)} max_{1≤k≤M} γ(pt(x,y), B(x,y)k)

where k is an integer ranging from 1 to M.
In the step S32, the method determines whether the grey relational coefficient sum S(i,j) is larger than or equal to a second threshold value α. If yes, it indicates that the block M(i,j) is not a motion block, and the method goes to the step S321 accordingly. In the step S321, the method determines that the block M(i,j) is a background block, and a motion block indication R(i,j) of the block M(i,j) is labelled as ‘0’. If no, it indicates that many input pixels pt(x,y) within the block M(i,j) are parts of moving objects, and the method goes to the step S322. In the step S322, the method determines that the block M(i,j) is a motion block, and the motion block indication R(i,j) of the block M(i,j) is labelled as ‘1’. The motion block indication R(i,j) can be expressed for each block M(i,j) as follows:

R(i,j) = 0, if S(i,j) ≥ α; R(i,j) = 1, otherwise

where the second threshold value α can be experimentally set to 245.
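A sketch of the block-based detection stage (steps S31 and S32) is given below; the 16×16 block size is an assumption, chosen because the maximum possible sum of 256 per-pixel coefficients then sits just above the threshold α = 245, and the frame dimensions are assumed to be multiples of the block size.

```python
def block_based_detection(gamma_max, block=16, alpha=245.0):
    """gamma_max: (H, W) array holding max_k gamma(p_t(x,y), B(x,y)_k) for
    every input pixel. Returns (R, S): the motion block indications
    (1 = motion block) and the grey relational coefficient sums S(i,j)."""
    H, W = gamma_max.shape
    # Step S31: sum the per-pixel maxima inside each block to obtain S(i,j).
    S = gamma_max.reshape(H // block, block, W // block, block).sum(axis=(1, 3))
    # Step S32: S(i,j) >= alpha -> background block (R = 0); else motion block.
    R = (S < alpha).astype(np.uint8)
    return R, S
```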
The pixel-based detection procedure includes steps S33, S331, S332, and S34. In the step S33, for each input pixel pt(x,y) within each motion block, the method determines whether the maximum value of the grey relational coefficients γ(pt(x,y), B(x,y)1) to γ(pt(x,y), B(x,y)M) between the pixel value of the input pixel pt(x,y) and the pixel values of the corresponding candidate background pixels B(x,y)1 to B(x,y)M is larger than or equal to a third threshold value β. If yes, it indicates that there is a very high correlation between the input pixel pt(x,y) and some candidate background pixel, and the method goes to the step S331 accordingly. In the step S331, the method determines that the input pixel pt(x,y) is a background pixel. If no, it indicates that there is substantially no correlation between the input pixel pt(x,y) and the candidate background pixels B(x,y)1 to B(x,y)M, and the method goes to the step S332 accordingly. In the step S332, the method determines that the input pixel pt(x,y) is a motion pixel. Next, in the step S34, the method generates a binary motion mask BM which can be expressed as follows:

BM(x,y) = 0 (background pixel), if max_{1≤k≤M} γ(pt(x,y), B(x,y)k) ≥ β; BM(x,y) = 1 (motion pixel), otherwise

where k is an integer ranging from 1 to M, and the third threshold value β can be empirically set to 0.6.
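Continuing the sketch, the pixel-based stage (steps S33 and S34) examines only the pixels inside motion blocks and emits the binary motion mask BM:

```python
def pixel_based_detection(gamma_max, R, block=16, beta=0.6):
    """Builds the binary motion mask BM (1 = motion pixel) from the per-pixel
    maxima and the motion block indications R of the block-based stage."""
    # Expand each block indication R(i,j) to cover its block x block pixels.
    in_motion_block = np.kron(R, np.ones((block, block), dtype=np.uint8))
    # Step S33: inside motion blocks, max_k gamma >= beta -> background pixel;
    # otherwise the pixel is a motion pixel. Step S34: assemble the mask.
    return ((gamma_max < beta) & (in_motion_block == 1)).astype(np.uint8)
```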
Referring again to the drawings, in a step S4, the method detects luminance changes of the video frame It by entropy calculation. The step S4 includes steps S41, S42, S421, and S422. In the step S41, the method calculates an entropy of the grey relational coefficient sum S(i,j) of each block M(i,j), and sums up the entropies of the grey relational coefficient sums of the blocks to generate an entropy sum Et regarding the video frame It.
In a step S42, the method determines whether a difference between the entropy sum Et of the video frame It and an entropy sum Et-1 of a previous video frame It-1, i.e. |Et−Et-1|, is larger than or equal to a fourth threshold value μ. If yes, it indicates that the video frame It has sudden luminance changes, and the method goes to a step S421. In the step S421, the method determines that the video frame It has luminance changes, and a luminance-change indication Lt is labelled as ‘1’. If no, the method goes to a step S422 to determine that the video frame It has no luminance changes, and the luminance-change indication Lt is labelled as ‘0’. The luminance-change indication Lt can be expressed as follows:

Lt = 1, if |Et−Et-1| ≥ μ; Lt = 0, otherwise

where the fourth threshold value μ can be empirically set to 0.05. Furthermore, in the step S421, because the method determines that the video frame It has luminance changes, it can update a candidate background pixel B(x,y)s corresponding to each input pixel pt(x,y) according to the video frame It as follows:
B(x,y)′s=B(x,y)s+ρ(pt(x,y)−B(x,y)s)
where B(x,y)s represents a candidate background pixel at a position (x,y) which will be updated, B(x,y)′s represents an updated candidate background pixel at the position (x,y), and ρ represents a default parameter. The candidate background pixel B(x,y)s which will be updated is chosen as the candidate background pixel corresponding to the maximum value of the Euclidean distances Δ(1) to Δ(M); that is, its index s can be expressed as follows:

s = arg max_{1≤k≤M} Δ(k)
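The following sketch combines the steps S41, S42, and S421. The exact entropy formula is not given in the text, so a Shannon-style entropy over the block sums S(i,j), normalized across the frame, is assumed here, as is the value ρ = 0.05 for the default parameter.

```python
def entropy_sum(S):
    """Step S41 (assumed form): sum of Shannon entropy terms -P*log2(P) over
    the blocks, with P the normalized grey relational coefficient sums."""
    P = S.ravel() / S.sum()
    P = P[P > 0]                         # skip empty bins to avoid log(0)
    return float(-(P * np.log2(P)).sum())

def update_on_luminance_change(p, candidates, E_t, E_prev, mu=0.05, rho=0.05):
    """Steps S42/S421: on a luminance change (|E_t - E_prev| >= mu), move the
    candidate farthest from the input pixel toward it: B' = B + rho*(p - B)."""
    if abs(E_t - E_prev) < mu:
        return False, candidates         # step S422: L_t = 0, no update
    p = np.asarray(p, float)
    candidates = candidates.astype(float)
    deltas = np.linalg.norm(candidates - p, axis=1)
    s = int(np.argmax(deltas))           # s = argmax_k Delta(k)
    candidates[s] += rho * (p - candidates[s])
    return True, candidates              # step S421: L_t = 1
```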
Referring again to the drawings, in a step S5, the method examines false detections of the binary motion mask BM. The step S5 includes steps S51, S52, S53, S531, and S532. In a step S51, the method provides a setting interface for a user to set a detection sensitivity ds, with higher values of the detection sensitivity ds meaning higher detection accuracy.
In a step S52, the method divides a total number ndp of the motion pixels and the background pixels of the binary motion mask BM by an area of the binary motion mask BM to generate an examination value EBM. The examination value EBM can be expressed as follows:

EBM = ndp / (dimX × dimY)

where ndp=pp+pn, pp represents the number of the motion pixels of the binary motion mask BM, pn represents the number of the background pixels of the binary motion mask BM, and dimX and dimY represent the width and the height of the binary motion mask BM, respectively.
In a step S53, the method determines whether the examination value EBM is larger than a product of a fifth threshold value δ and the detection sensitivity ds, wherein a prediction number Ω of true positive pixels divided by the area of the binary motion mask BM is the fifth threshold value δ. The fifth threshold value δ can be expressed as follows:

δ = Ω / (dimX × dimY)
If yes, i.e. EBM > δ×ds, the method goes to a step S531. In the step S531, the method determines that there are false detections in the binary motion mask BM, and a false alarm indication F is labelled as ‘1’. If no, i.e. EBM ≤ δ×ds, the method goes to a step S532. In the step S532, the method determines that there are no false detections in the binary motion mask BM, and the false alarm indication F is labelled as ‘0’. The false alarm indication F can be expressed as follows:

F = 1, if EBM > δ × ds; F = 0, otherwise

where the prediction number Ω of true positive pixels can be experimentally set to 30×30.
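A sketch of the steps S52 and S53 is given below, following the text's definitions literally (ndp = pp + pn, EBM = ndp/(dimX × dimY), δ = Ω/(dimX × dimY)); the value Ω = 30×30 comes from the embodiment.

```python
def examine_false_detections(BM, ds, omega=30 * 30):
    """Steps S52/S53: returns the false alarm indication F as a bool
    (True = false detections present). ds is the user-set sensitivity."""
    dimY, dimX = BM.shape
    pp = int((BM == 1).sum())            # motion pixels of BM
    pn = int((BM == 0).sum())            # background pixels of BM
    # Note: read literally, pp + pn counts every pixel of the mask.
    E_BM = (pp + pn) / (dimX * dimY)     # examination value (step S52)
    delta = omega / (dimX * dimY)        # fifth threshold value
    return E_BM > delta * ds             # step S53: F = 1 when exceeded
```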
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present invention without departing from the scope or spirit of the present invention. In view of the foregoing, it is intended that the present invention cover modifications and variations of this invention provided they fall within the scope of the following claims and their equivalents.