This application claims the priority of Korean Patent Application No. 2003-81885, filed on Nov. 18, 2003 in the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference.
1. Field of the Invention
The present invention relates to object detection, and more particularly, to a person detecting apparatus and method of accurately and speedily detecting the presence of a person from an input image, and to a privacy protection system protecting personal privacy by displaying a mosaicked image of a detected person's face.
2. Description of the Related Art
As modern society becomes more complex and crime becomes more sophisticated, society's interest in protection is increasing and more and more public facilities are being equipped with a large number of security cameras. Since it is difficult to manually control a large number of security cameras, an automatic control system has been developed.
Several face detection apparatuses for detecting a person have been developed. In most of the face detection apparatuses, the motion of an object is detected by using a difference image between a background image stored in advance and an input image. Alternatively, a person is detected by using only shape information about the person, indoors or outdoors. The method using the difference of an image between the input image and the background image is effective when the camera is fixed. However, if the camera is attached to a moving robot, the background image continuously changes. Therefore, the method using the difference of the image is not effective. On the other hand, in the method using the shape information, a large number of model images must be prepared, and an input image must be compared with all the model images in order to detect the person. Thus, the method using the shape information is overly time-consuming.
Today, because so many security cameras are installed, there is a problem in that personal privacy may be invaded. Therefore, there has been a demand for a system for storing detected persons and rapidly searching for a person while protecting personal privacy.
According to an aspect of the present invention, there is provided a person detecting apparatus and method of accurately and speedily detecting the presence of a person from an input image by using motion information and shape information of an input image.
According to another aspect of the present invention, there is also provided a privacy protection system protecting a right to a personal portrait by displaying a mosaicked image of a detected person's face.
According to an aspect of the present invention, there is provided a person detection apparatus including: a motion region detection unit, which detects a motion region from a current frame image by using motion information between frames; and a person detecting/tracking unit, which detects a person in the detected motion region by using shape information of persons, and performs a tracking process on a motion region detected as a person in a previous frame image within a predetermined tracking region.
According to another aspect of the present invention, there is provided a person detection method including: detecting a motion region from a current frame image by using motion information between frames; and detecting a person in the detected motion region by using shape information of persons, and performing a tracking process on a motion region detected as a person in a previous frame image within a predetermined tracking region.
According to still another aspect of the present invention, there is provided a privacy protection system including: a motion region detection unit, which detects a motion region from a current frame image by using motion information between frames; a person detecting/tracking unit, which detects a person in the detected motion region by using shape information of persons, and performs a tracking process on a motion region detected as a person in a previous frame image within a predetermined tracking region; a mosaicking unit, which detects the face in the motion region, which is determined to correspond to the person, performs a mosaicking process on the detected face, and displays the mosaicked face; and a storage unit, which stores the motion region, which is detected or tracked as a person, and stores predetermined labels and position information used for searching frame units.
Additional aspects and/or advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
These and/or other aspects and advantages of the invention will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
Reference will now be made in detail to the embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below to explain the present invention by referring to the figures.
In the image input unit 110, an image picked up by a camera is input in units of a frame.
The motion region detection unit 120 detects a background image by using motion information between a current frame image and a previous frame image transmitted from the image input unit 110, and detects at least one motion region from a difference image between the current frame image and the background image. Here, the background image is a motionless image, that is, an image in which there is no motion.
The person detecting/tracking unit 130 detects a person candidate region from the motion regions provided from the motion region detection unit 120 and determines whether the person candidate region corresponds to a person. On the other hand, a motion region in the current frame image which is determined to correspond to the person is not subjected to a general detection process for the next frame image. A tracking region is allocated to the motion region, and a tracking process is performed on the tracking region.
The first storage unit 140 stores the motion regions, each of which is determined to correspond to a person in the person detecting/tracking unit 130, their labels, and their position information. The motion regions are stored in units of a frame. The first storage unit 140 provides the motion regions, their labels, and their position information to the person detecting/tracking unit 130 in response to the input of the next frame image.
The mosaicking unit 150 detects a face from the motion region which is determined to correspond to the person in the person detecting/tracking unit 130, performs a well-known mosaicking process on the detected face, and provides the mosaicked face to the display unit 160. In general, there are various methods of detecting a face from a motion region. For example, a face detection method using a Gabor filter or a support vector machine (SVM) may be used. The face detection method using the Gabor filter is disclosed in an article entitled “Face Recognition Using Principal Component Analysis of Gabor Filter Responses” by Ki-chung Chung, Seok-Cheol Kee, and Sang-Ryong Kim, International Workshop on Recognition, Analysis and Tracking of Faces and Gestures in Real-Time Systems, Sep. 26-27, 1999, Corfu, Greece. The face detection method using the SVM is disclosed in an article entitled “Training Support Vector Machines: an application to face detection” by E. Osuna, R. Freund, and F. Girosi, In Proc. of CVPR, Puerto Rico, pp. 130-136, 1997.
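By way of illustration only, the mosaicking process applied to a detected face may be sketched as block averaging over a rectangular region of a grayscale image. The function name `mosaic`, its parameters, and the block size are hypothetical and are not taken from the embodiment; the face rectangle is assumed to have been found by a separate face detector.

```python
def mosaic(image, top, left, height, width, block=4):
    """Return a copy of a grayscale image (list of rows of ints) in which the
    rectangle starting at (top, left) is replaced by block-wise mean values.

    Illustrative sketch only: block size and rectangle layout are assumptions.
    """
    out = [row[:] for row in image]          # leave the input image untouched
    bottom = min(top + height, len(image))
    right = min(left + width, len(image[0]))
    for by in range(top, bottom, block):
        for bx in range(left, right, block):
            cells = [(y, x)
                     for y in range(by, min(by + block, bottom))
                     for x in range(bx, min(bx + block, right))]
            mean = sum(image[y][x] for y, x in cells) // len(cells)
            for y, x in cells:
                out[y][x] = mean             # every pixel in the block shows the mean
    return out
```

Only the face rectangle is altered, so the rest of the motion region remains available for display and storage.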
In response to a user's request, the searching unit 170 searches the motion regions determined to correspond to a person stored in the first storage unit 140.
Referring to
The average accumulated image generation unit 230 obtains an average image between the black-and-white image of the current frame image and the previous frame image stored in the second storage unit 220, and adds the average image to the average accumulated image from the previous frame to generate the average accumulated image for the current frame. In the average accumulated image for a predetermined number of frames, a region where the same pixel values are added is determined to be a motionless region, and a region where different pixel values are added is determined to be a motion region. More specifically, the motion region is determined by using a difference between a newly added pixel value and the previous average accumulated pixel value.
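By way of illustration only, the accumulation described above may be sketched as follows. The class name `AverageAccumulator`, the tolerance `tol`, and the choice to normalize the accumulated image by the number of accumulated frames before comparison are assumptions made for this sketch and are not stated in the embodiment.

```python
def pair_average(curr, prev):
    """Pixel-wise average of two grayscale frames given as lists of rows."""
    return [[(c + p) / 2 for c, p in zip(rc, rp)] for rc, rp in zip(curr, prev)]


class AverageAccumulator:
    """Accumulates average images and flags pixels whose newly added value
    differs from the previously accumulated mean (hypothetical helper)."""

    def __init__(self):
        self.total = None   # running sum of average images
        self.count = 0      # number of average images accumulated so far

    def update(self, curr, prev, tol=1.0):
        avg = pair_average(curr, prev)
        if self.total is None:
            # Nothing accumulated yet: no pixel can be flagged as moving.
            self.total = [[0.0] * len(avg[0]) for _ in avg]
            motion = [[False] * len(avg[0]) for _ in avg]
        else:
            # Compare each new value with the previous accumulated mean.
            motion = [[abs(a - t / self.count) > tol for a, t in zip(ra, rt)]
                      for ra, rt in zip(avg, self.total)]
        self.total = [[t + a for t, a in zip(rt, ra)]
                      for rt, ra in zip(self.total, avg)]
        self.count += 1
        return motion
```

Pixels that stay unflagged over the predetermined number of frames correspond to the motionless region in this sketch.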
In the background image detection unit 240, a region where the same pixel values are continuously added to the average accumulated image for the predetermined frames, that is, a region where the pixel values do not change, is detected as a background image in the current frame. The background image is updated every frame. If the number of frames for use in detecting the background image increases, the accuracy of the background image increases. An example of the background image in the current frame is shown in
The difference image generation unit 250 obtains a difference between pixel values of the background image in the current frame and the current frame image in units of a pixel. A difference image is constructed with pixels where the difference between the pixel values is more than a predetermined threshold value. The difference image represents all moving objects. On the other hand, if the predetermined threshold value is small, a small-motion region may not be discarded but may instead be used to detect a person candidate region.
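By way of illustration only, the thresholded difference image may be sketched as below. The function names `difference_mask` and `bounding_box` are hypothetical, and representing a motion region by a single bounding box is a simplification assumed for this sketch.

```python
def difference_mask(frame, background, threshold):
    """True where the current frame differs from the background by more
    than the threshold; frames are lists of rows of grayscale values."""
    return [[abs(f - b) > threshold for f, b in zip(rf, rb)]
            for rf, rb in zip(frame, background)]


def bounding_box(mask):
    """Smallest (top, left, bottom, right) box enclosing all True pixels,
    or None when no pixel exceeds the threshold."""
    points = [(y, x) for y, row in enumerate(mask)
              for x, flag in enumerate(row) if flag]
    if not points:
        return None
    ys = [y for y, _ in points]
    xs = [x for _, x in points]
    return (min(ys), min(xs), max(ys), max(xs))
```

Lowering the threshold keeps small-motion pixels in the mask, which enlarges the region passed on for person candidate detection.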
As shown in
In the normalization unit 410, information on the sizes and weight centers of the motion regions is input, and each of the sizes of the motion regions is normalized into a predetermined size. The normalized vertical length of the motion region is longer than the normalized horizontal length of the motion region. Referring to
The size/weight center changing unit 430 changes the sizes and weight centers of the normalized motion regions. For example, in a case where the sizes of the motion regions are scaled into s steps and the weight centers are shifted in t directions, s×t modified shapes of the motion regions can be obtained. Here, the sizes of the motion regions change in accordance with the normalized lengths xnorm and ynorm of the to-be-changed motion regions. For example, the sizes can increase or decrease by a predetermined number of pixels, for example, 5 pixels, in the up, down, left, and right directions. The weight center can be shifted in the up, down, left, right, and diagonal directions, and the changeable range of the weight center is determined based on the distance x from the weight center ycm to the start point ysp in the y axis. By changing the sizes and weight centers, it is possible to prevent an upper or lower half of the person's body from being excluded when only some portion of the body moves.
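By way of illustration only, the generation of the s×t modified shapes may be sketched as below. The function name `modified_shapes`, the tuple layout (center x, center y, width, height), and the fixed shift step are hypothetical; in particular, the sketch does not derive the shift range from the distance between ycm and ysp.

```python
from itertools import product

# Eight shift directions: up, down, left, right, and the four diagonals.
DIRECTIONS = [(0, -1), (0, 1), (-1, 0), (1, 0),
              (-1, -1), (1, -1), (-1, 1), (1, 1)]


def modified_shapes(cx, cy, width, height, size_deltas, step):
    """Return one (cx, cy, width, height) tuple per combination of a size
    change and a weight-center shift.

    size_deltas: pixels added on every side (negative values shrink).
    step: shift distance in pixels along each direction (assumed constant).
    """
    shapes = []
    for delta, (dx, dy) in product(size_deltas, DIRECTIONS):
        shapes.append((cx + dx * step, cy + dy * step,
                       width + 2 * delta, height + 2 * delta))
    return shapes
```

With three size steps and eight directions, the sketch yields 24 modified shapes for each motion region.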
The candidate region detection unit 450 normalizes the motion regions having s×t modified shapes in units of predetermined pixels, for example, 30×40 pixels, and detects a person candidate region from the motion regions. A Mahalanobis distance map D can be used to detect the person candidate regions from the motion regions. The Mahalanobis distance map D is described with reference to
Here, p and q denote pixel numbers in the horizontal and vertical directions of a block l, respectively. Xl denotes total blocks, and x denotes a pixel value in a block l.
The variance of pixel values of the blocks is represented by Equation 2.
A Mahalanobis distance d(i, j) of each of the blocks is calculated by using the average and variance of pixel values of the blocks, as shown in Equation 3. The Mahalanobis distance map D is calculated using the Mahalanobis distances d(i, j), as shown in Equation 4. Referring to
Here, M and N denote partition numbers of the normalized motion region 610 in the horizontal and vertical directions, respectively. When the normalized motion region 610 is partitioned into 6 (horizontal) by 8 (vertical) blocks, there are MN = 48 blocks, and the Mahalanobis distance map D is represented by a 48×48 matrix.
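By way of illustration only, the construction of the distance map may be sketched as below for scalar grayscale pixel values. Because Equations 1 to 4 are not reproduced in this text, the scalar form d(i, j) = (mean_i - mean_j)^2 / (var_i + var_j) used here is an assumption, as are the function names and the small constant `eps` added to avoid division by zero.

```python
def block_stats(region, m, n):
    """Mean and variance of pixel values for each of the m x n blocks.

    region: list of rows; m blocks horizontally, n blocks vertically.
    Blocks are listed row by row.
    """
    rows, cols = len(region), len(region[0])
    q, p = rows // n, cols // m              # block height and width in pixels
    stats = []
    for bi in range(n):
        for bj in range(m):
            pixels = [region[y][x]
                      for y in range(bi * q, (bi + 1) * q)
                      for x in range(bj * p, (bj + 1) * p)]
            mean = sum(pixels) / len(pixels)
            var = sum((v - mean) ** 2 for v in pixels) / len(pixels)
            stats.append((mean, var))
    return stats


def mahalanobis_map(region, m=6, n=8, eps=1e-6):
    """(m*n) x (m*n) map of pairwise block distances with a zero diagonal."""
    stats = block_stats(region, m, n)
    size = len(stats)
    return [[0.0 if i == j else
             (stats[i][0] - stats[j][0]) ** 2 / (stats[i][1] + stats[j][1] + eps)
             for j in range(size)]
            for i in range(size)]
```

For a 30×40-pixel region and the default 6 by 8 partition, each block covers 5×5 pixels and the resulting map has 48 rows and 48 columns.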
As described above, a Mahalanobis distance map is constructed for each of the s×t modified shapes of the motion regions. Next, the dimension of the Mahalanobis distance map (matrix) may be reduced using a principal component analysis. Next, it is determined whether or not the s×t modified shapes of the motion regions belong to the person candidate region using the SVM trained in an eigenface space. If at least one of the s×t modified shapes belongs to the person candidate region, the associated motion region is detected as a person candidate region.
Returning to
The edge image generation unit 710 detects edges from the person candidate regions out of the normalized motion regions shown in
The model image storage unit 730 stores an edge image of at least one model image. Preferably, but not necessarily, the edge image of the model image includes an edge image of a long distance model image and an edge image of a short distance model image. For example, as shown in
The Hausdorff distance calculation unit 750 calculates a Hausdorff distance between an edge image A generated by the edge image generation unit 710 and an edge image B of a model image stored in the model image storage unit 730 to evaluate similarity between both images. Here, the Hausdorff distance may be represented with Euclidean distances between one specific point, that is, one edge of the edge image A, and all the specific points, that is, all the edges, of the edge image B of the model image. In a case where an edge image A has m edges and an edge image B of the model image has n edges, the Hausdorff distance H(A, B) is represented by Equation 5.
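Equation 5 is not reproduced in this text. Based on the step-by-step description that follows, it may be written in the standard form below, where the symbols a_i and b_j for the individual edge points are notation assumed here for illustration.

```latex
H(A, B) = \max\bigl( h(A, B),\; h(B, A) \bigr),
\qquad
h(A, B) = \max_{a \in A} \, \min_{b \in B} \, \lVert a - b \rVert,
\qquad
A = \{a_1, \ldots, a_m\}, \quad B = \{b_1, \ldots, b_n\}
```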
More specifically, the Hausdorff distance H(A, B) is obtained as follows. First, h(A, B) is obtained by selecting the minimum value out of the distances between each edge of the edge image A and all the edges of the model image B, and then selecting the maximum value out of these minimum values for the m edges of the edge image A. Similarly, h(B, A) is obtained by selecting the minimum value out of the distances between each edge of the model image B and all the edges of the edge image A, and then selecting the maximum value out of these minimum values for the n edges of the model image B. The Hausdorff distance H(A, B) is the maximum value out of h(A, B) and h(B, A). By analyzing the Hausdorff distance H(A, B), it is possible to evaluate the mismatching between the two images A and B. With respect to the input edge image A, the Hausdorff distances for all of the model images stored in the model image storage unit 730, such as an edge image of a long distance model image and an edge image of a short distance model image, are calculated, and a maximum of the Hausdorff distances is output as a final Hausdorff distance.
The determination unit 770 compares the Hausdorff distance H(A, B) between the input edge image and the edge image of model images calculated by the Hausdorff distance calculation unit 750 with a predetermined threshold value. If the Hausdorff distance H(A, B) is equal to or more than the threshold value, the person candidate region is detected as a non-person image. Otherwise, the person candidate region is detected as a person region.
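By way of illustration only, the Hausdorff distance calculation and the subsequent threshold decision may be sketched as below. Edge images are assumed to be given as non-empty lists of (x, y) edge point coordinates, and the function names are hypothetical.

```python
from math import hypot


def directed_hausdorff(a, b):
    """h(A, B): for each point of a take the distance to its nearest point
    of b, then return the largest of those nearest distances."""
    return max(min(hypot(ax - bx, ay - by) for bx, by in b) for ax, ay in a)


def hausdorff(a, b):
    """H(A, B): the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def is_person(edge_points, model_edge_sets, threshold):
    """Decide whether a candidate region is a person region.

    As stated in the description, the maximum distance over all stored
    model edge images is taken as the final Hausdorff distance, and a value
    equal to or more than the threshold means a non-person image.
    """
    final = max(hausdorff(edge_points, model) for model in model_edge_sets)
    return final < threshold
```

For example, with A = [(0, 0), (1, 0)] and B = [(0, 0), (4, 0)], h(A, B) is 1 and h(B, A) is 3, so H(A, B) is 3; the candidate is rejected at a threshold of 3 and accepted at a threshold of 3.5.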
The invention can also be embodied as computer-readable codes stored on a computer-readable recording medium. The computer-readable recording medium is any data storage device that can store data which can thereafter be read by a computer. Examples of the computer-readable recording medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission over the Internet). The computer-readable recording medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion. Functional programs, codes, and code segments for accomplishing the present invention can be easily written by computer programmers of ordinary skill.
As described above, according to an aspect of the present invention, a plurality of person candidate regions are detected from an image picked up by a camera indoors or outdoors using motion information between the frames. Thereafter, by determining whether or not each of the person candidate regions corresponds to a person based on shape information of persons, it is possible to speedily and accurately detect a plurality of persons in one frame image. In addition, a person detected in the previous frame is not subjected to an additional detecting process in the current frame but is passed directly to a tracking process. For the tracking process, a predetermined tracking region including the detected person is allocated in advance. Therefore, it is possible to save processing time associated with person detection.
In addition, frame numbers and labels of motion regions where a person is detected can be stored and searched, and the face of a detected person is subjected to a mosaicking process before being displayed. Therefore, it is possible to protect the privacy of the person.
In addition, a privacy protection system according to an aspect of the present invention can be adapted to broadcast and image communication as well as an intelligent security surveillance system in order to protect the privacy of a person.
Although a few embodiments of the present invention have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
10-2003-0081885 | Nov 2003 | KR | national |