This application claims the priority of Korean Patent Application No. 2003-85828, filed on Nov. 28, 2003, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein in its entirety by reference.
1. Field of the Invention
The present invention relates to object detection, and, more particularly, to a multiple person detection apparatus and a method of accurately and speedily detecting the presence of a person from an input image.
2. Description of the Related Art
As modern society becomes more complex and crime becomes more sophisticated, society's interest in protection is increasing, and more and more public facilities are being equipped with a large number of security cameras. Since manually controlling a large number of security cameras is difficult, automatic control systems have been developed. In addition, robots have recently come into use in place of people for work in dangerous places or in the home. At present, most robots merely repeat simple operations; to work intelligently, robots must communicate well with people. To enable such communication, robots must be able to accurately detect a person and operate in accordance with the person's commands.
Several face detection apparatuses to detect a person have been developed. In most of the face detection apparatuses, the motion of an object is detected by using a difference image between a background image stored in advance and an input image. Alternatively, a person is detected by using only shape information about the person, indoors or outdoors. The method using the difference image between the input image and the background image is effective when the camera is fixed. However, if the camera is attached to a moving robot, the background image continuously changes, and the method using the difference image is therefore not effective. On the other hand, in the method using the shape information, a large number of model images must be prepared, and an input image must be compared with all the model images in order to detect the person. Thus, the method using the shape information is overly time-consuming.
The present invention provides a multiple person detection apparatus and method of accurately and speedily detecting the presence of a person by using skin color information and shape information from an input image.
According to an aspect of the present invention, a multiple person detection apparatus comprises a skin color detection unit, which detects at least one skin color region from a picked-up frame image by using skin color information; a candidate region determination unit, which determines whether or not the skin color region belongs to a person candidate region; and a person determination unit, which determines whether or not the skin color region belonging to the person candidate region corresponds to a person by using person shape information.
According to another aspect of the present invention, a multiple person detection method comprises detecting at least one skin color region from a picked-up frame image by using skin color information; determining whether or not the skin color region belongs to a person candidate region; and determining whether or not the skin color region belonging to the person candidate region corresponds to a person by using person shape information.
According to still another aspect of the present invention, a computer-readable recording medium stores a program to execute the multiple person detection method.
Additional and/or other aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
These and/or other aspects and advantages of the invention will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
Reference will now be made in detail to the embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below to explain the present invention by referring to the figures.
The skin color detection unit 110 detects a skin color region from an input image that is transmitted from a moving or fixed camera. A color range is set in advance to cover human skin colors. In the skin color detection unit 110, skin color regions including colors that are similar to human skin color, that is, colors belonging to the color range are detected from the input image. The skin color detection unit 110 labels the skin color regions and calculates a size and a weight center of each of the labeled skin color regions.
In response to the calculation of the sizes and weight centers of the skin color regions, the size normalization unit 130 normalizes the skin color regions with a predetermined size. This normalization will be described later with reference to
The candidate region determination unit 150 then determines whether each of the skin color regions that are provided from the size normalization unit 130 corresponds to a person candidate region. A skin color region that does not correspond to the person candidate region is detected as background. A skin color region that corresponds to the person candidate region is provided to the person determination unit 170.
The person determination unit 170 determines whether or not each of the person candidate regions that are provided from the candidate region determination unit 150 corresponds to a person. A person candidate region corresponding to a person is detected as a person. A person candidate region not corresponding to a person is detected as background.
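As a non-limiting illustration, the two-stage decision described above may be sketched as follows. The function name `detect_persons` and the callables `is_candidate` and `is_person` are hypothetical stand-ins for the candidate region determination unit 150 and the person determination unit 170; they are not taken from the description.

```python
def detect_persons(regions, is_candidate, is_person):
    """Split skin color regions into (persons, background) in two stages."""
    persons, background = [], []
    for region in regions:
        # Stage 1 rejects non-candidates; stage 2 runs only on candidates
        # because `and` short-circuits.
        if is_candidate(region) and is_person(region):
            persons.append(region)
        else:
            background.append(region)
    return persons, background
```

Because the second test runs only on regions that pass the first, the more expensive shape comparison is limited to person candidate regions.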
Referring to
The color normalization unit 230 color-normalizes the equalized image in units of a pixel to reduce the influence of illumination on pixels of the equalized image. Color normalization is performed as follows. First, the RGB color space of the pixels of the equalized image is transformed to an rgb color space using Equation 1 so as to generate the color-normalized image shown in
The influence of illumination on the input image is removed by the equalization and color normalization processes. Therefore, the obtained image has colors unique to the object.
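As a non-limiting illustration, the per-pixel color normalization may be sketched as follows. Equation 1 is not reproduced in this text, so the sketch assumes the widely used chromaticity form r = R/(R+G+B), g = G/(R+G+B), b = B/(R+G+B). The function names and the handling of pure black pixels are likewise assumptions.

```python
def normalize_rgb(pixel):
    """Map an (R, G, B) pixel to chromaticity (r, g, b) with r + g + b = 1."""
    R, G, B = pixel
    total = R + G + B
    if total == 0:
        # Assumption: pure black has no defined chromaticity; split evenly.
        return (1 / 3, 1 / 3, 1 / 3)
    return (R / total, G / total, B / total)


def normalize_image(image):
    """Apply normalize_rgb to every pixel of an image given as a list of rows."""
    return [[normalize_rgb(px) for px in row] for row in image]
```

Under this assumed form, a pixel and a uniformly brighter copy of it, such as (100, 50, 50) and (200, 100, 100), map to the same normalized color, which is the sense in which the influence of illumination is reduced.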
The modeling unit 250 produces the modeling-processed image shown in
As a result of the Gaussian modeling process, the skin color region of the modeling-processed image is highlighted, and the other regions are blackened.
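As a non-limiting illustration, a Gaussian skin color model of this kind may be sketched as follows. The model equation is not reproduced in this text, so the sketch assumes an axis-aligned two-dimensional Gaussian over the normalized (r, g) components, scaled to the range 0 to 255. The mean and standard deviation values are hypothetical placeholders, not values taken from the description.

```python
import math

# Hypothetical skin color model in normalized (r, g) space. Real values would
# be estimated from training images and are not given in the description.
SKIN_MEAN = (0.45, 0.31)
SKIN_STD = (0.05, 0.03)


def skin_likelihood(r, g, mean=SKIN_MEAN, std=SKIN_STD):
    """Return an intensity in 0..255 that peaks where (r, g) equals the mean."""
    zr = (r - mean[0]) / std[0]
    zg = (g - mean[1]) / std[1]
    return 255.0 * math.exp(-0.5 * (zr * zr + zg * zg))
```

Pixels whose normalized color lies near the model mean receive values close to 255 (highlighted), while pixels far from the mean receive values near 0 (blackened).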
In the labeling unit 270, the pixel value of each pixel of the modeling-processed image is compared with a predetermined threshold value, for example, 240. The color “black” is allocated to pixels having pixel values below the predetermined threshold value, and the color “white” is allocated to pixels having pixel values above the predetermined threshold value. Thus, a kind of binarization is performed, and at least one skin color region is extracted. Next, a labeling process is performed to allocate labels to the extracted skin color regions. In an embodiment of the invention, the labeling process is performed in accordance with the sizes of the skin color regions. Next, the size and the coordinates of the weight center 310 of each of the labeled skin color regions are output. The size of each labeled skin color region is represented by start and end points along the x and y axes. The coordinates of the weight center 310 are calculated from the sum of the pixel values of the pixels of the labeled skin color region and the sum of the coordinates of the pixels of the labeled skin color region.
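As a non-limiting illustration, the binarization, labeling, and weight center calculation may be sketched as follows. The sketch assumes 4-connectivity, treats pixels equal to the threshold as black, and computes the weight center as the mean coordinate of the white pixels of a region; none of these choices is specified in the description, and the function name `label_regions` is hypothetical.

```python
from collections import deque


def label_regions(image, threshold=240):
    """Binarize `image` (a list of rows) and return its connected regions.

    Each region is a dict with its pixel count ("size"), its bounding box
    (x_start, x_end, y_start, y_end) and its weight center (cx, cy).
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0] or image[y0][x0] <= threshold:
                continue
            # Breadth-first flood fill collects one connected white region.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            pixels = []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            xs = [p[0] for p in pixels]
            ys = [p[1] for p in pixels]
            regions.append({
                "size": len(pixels),
                "box": (min(xs), max(xs), min(ys), max(ys)),
                "center": (sum(xs) / len(xs), sum(ys) / len(ys)),
            })
    # Order the labels by decreasing region size.
    regions.sort(key=lambda region: region["size"], reverse=True)
    return regions
```

The index of a region in the returned list may serve as its label, so that the largest skin color region receives the first label.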
In response to the 30×40-pixel normalized image for the skin color regions provided from the size normalization unit 130 and the sizes and weight centers of the skin color regions, the distance map generation unit 510 generates a Mahalanobis distance map D to determine whether the skin color regions belong to person candidate regions. The Mahalanobis distance map D is described with reference to
Here, p and q denote pixel numbers in the horizontal and vertical directions of a block, respectively. X denotes total blocks, and x denotes a pixel value in a block.
The variance of pixel values of the blocks is represented by Equation 4.
A Mahalanobis distance d(i, j) of each of the blocks is calculated by using the average and variance of pixel values of the blocks, as shown in Equation 5. The Mahalanobis distance map D is calculated by using the Mahalanobis distances d(i, j), as shown in Equation 6. Referring to
Here, M and N denote partition numbers of the image 610 in the horizontal and vertical directions, respectively. When the image 610 is partitioned into 6 (horizontal) by 8 (vertical) blocks, the Mahalanobis distance map D is represented by an MN×MN matrix, for example, a 48×48 matrix.
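As a non-limiting illustration, the construction of the distance map for a 30×40-pixel normalized image may be sketched as follows. Equations 3 to 6 are not reproduced in this text, so the sketch assumes scalar pixel values and a distance of the form d(i, j) = (mean_i − mean_j)² / (var_i + var_j). The small constant `eps`, added to avoid division by zero for uniform blocks, and the function name `distance_map` are also assumptions.

```python
def distance_map(image, m_blocks=6, n_blocks=8, eps=1e-6):
    """Return the MN x MN block-to-block distance map of `image` (list of rows)."""
    height, width = len(image), len(image[0])
    p, q = width // m_blocks, height // n_blocks  # pixels per block (x, y)
    means, variances = [], []
    for by in range(n_blocks):
        for bx in range(m_blocks):
            values = [image[y][x]
                      for y in range(by * q, (by + 1) * q)
                      for x in range(bx * p, (bx + 1) * p)]
            mean = sum(values) / len(values)
            means.append(mean)
            variances.append(sum((v - mean) ** 2 for v in values) / len(values))
    count = len(means)
    # Assumed form of d(i, j); the map is symmetric with a zero diagonal.
    return [[(means[i] - means[j]) ** 2 / (variances[i] + variances[j] + eps)
             for j in range(count)]
            for i in range(count)]
```

With the default 6 by 8 partition of a 30×40-pixel image, each block is 5×5 pixels and the returned map has 48 rows and 48 columns.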
The dimension of the Mahalanobis distance map (matrix) may be reduced by using a principal component analysis.
First, the first determination unit 550 compares the Mahalanobis distance map provided from the distance map generation unit 510 with a Mahalanobis distance map stored in the person/background image database 530. As described above, the Mahalanobis distance map provided from the distance map generation unit 510 is obtained from normalized skin color regions. On the other hand, the Mahalanobis distance map stored in the person/background image database 530 is obtained by a preparatory training method. The first determination unit 550 determines whether each of the normalized skin color regions belongs to a person candidate region based on the result of the Mahalanobis distance map comparison. A normalized skin color region that does not belong to the person candidate region is detected as a background region. The person/background image database 530 and the first determination unit 550 are implemented by using a support vector machine (SVM) that is trained in advance to recognize thousands of person and background image models. The skin color regions determined to be person candidate regions by the first determination unit 550 are provided to the person determination unit 170.
The edge image generation unit 710 detects edges from the person candidate regions out of the normalized skin color regions shown in
The model image storage unit 730 stores at least one edge image of a model image. In an embodiment of the invention, the edge images of the model image include a front edge image showing the front of a person, a left edge image showing the same person facing a predetermined angle to the left, and a right edge image showing the same person facing a predetermined angle to the right. As an example, as shown in
The Hausdorff distance calculation unit 750 calculates a Hausdorff distance between an edge image A generated by the edge image generation unit 710 and an edge image B of a model image stored in the model image storage unit 730 to evaluate the similarity between the two images. Here, the Hausdorff distance may be represented with Euclidean distances between one specific point, that is, one edge of the edge image A, and all the specific points, that is, all the edges, of the edge image B of the model image. In a case where an edge image A has m edges and an edge image B of the model image has n edges, the Hausdorff distance H(A, B) is represented by Equation 7.
More specifically, the Hausdorff distance H(A, B) is obtained as follows. First, h(A, B) is obtained by selecting the minimum value of the distances between each edge of the edge image A and all edges of the edge image B of the model image, and selecting the maximum value from among the minimum values for the m edges of the edge image A. Similarly, h(B, A) is obtained by selecting the minimum value of the distances between each edge of the edge image B of the model image and all edges of the edge image A, and selecting the maximum value from among the minimum values for the n edges of the edge image B of the model image. The Hausdorff distance H(A, B) is the larger of h(A, B) and h(B, A). By analyzing the Hausdorff distance H(A, B), evaluating a mismatch between the two images A and B is possible. With respect to the input edge image A, the Hausdorff distances for all the model images stored in the model image storage unit 730 are calculated, and the largest of the Hausdorff distances is output as a final Hausdorff distance.
The second determination unit 770 compares the Hausdorff distance H(A, B) between the input edge image and the edge image of model images calculated by the Hausdorff distance calculation unit 750 with a predetermined threshold value. If the Hausdorff distance H(A, B) is equal to or greater than the threshold value, the person candidate region (skin color region) is detected as a background region. Otherwise, the person candidate region (skin color region) is detected as a person region.
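As a non-limiting illustration, the Hausdorff distance calculation and the subsequent threshold comparison may be sketched as follows. Edges are assumed to be given as lists of (x, y) points. The function names are hypothetical, and the threshold is supplied by the caller because the description gives no value for it.

```python
import math


def directed_hausdorff(a_points, b_points):
    """h(A, B): the largest distance from a point of A to its nearest point of B."""
    return max(min(math.dist(a, b) for b in b_points) for a in a_points)


def hausdorff(a_points, b_points):
    """H(A, B): the larger of h(A, B) and h(B, A)."""
    return max(directed_hausdorff(a_points, b_points),
               directed_hausdorff(b_points, a_points))


def final_distance(a_points, model_edge_images):
    """Largest H(A, B) over all stored model edge images, as described above."""
    return max(hausdorff(a_points, model) for model in model_edge_images)


def classify_candidate(distance, threshold):
    """A distance equal to or greater than the threshold means background."""
    return "background" if distance >= threshold else "person"
```

For example, with A = [(0, 0), (1, 0)] and B = [(0, 0), (4, 0)], h(A, B) is 1 and h(B, A) is 3, so H(A, B) is 3.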
In operation 911, at least one skin color region is detected from a single frame image picked up by a camera by using predetermined skin color information. Before the skin color regions are detected, a color normalization process is performed on the entire frame image and the pixels of the frame image in order to reduce the effects of illumination on the frame image. A Gaussian modeling process is then performed on the frame image to highlight pixels having colors similar to skin color, and skin color regions including pixels having pixel values above a predetermined threshold value are detected.
In operation 913, the skin color regions detected in operation 911 are labeled and sizes and centers of weight of the labeled skin color regions are generated. The skin color regions are normalized with a predetermined size by using the sizes and centers of weight of the skin color regions.
In operation 915, a first skin color region is selected from at least one detected skin color region.
In operations 917 and 919, whether the selected skin color region belongs to a person candidate region is determined using the Mahalanobis distance map D and the SVM that are shown in
In operations 925 and 927, if the current skin color region belongs to the person candidate region, whether the current skin color region corresponds to a person is determined. If the current skin color region corresponds to a person, the current skin color region is detected as a person in operation 929. If the current skin color region does not correspond to a person, the current skin color region is detected as background in operation 931.
As is described above, a multiple person detection method and apparatus according to the present invention may be adapted to be used in security surveillance systems, broadcast and image communications, speech recognition robots, and as an intelligent interface with household electronic appliances. As an example, a robot may be controlled to turn toward a detected person, or the direction and/or strength of an air-conditioner may be controlled so that air is blown toward a detected person.
The invention may also be embodied as computer-readable codes stored on a computer-readable recording medium. The computer-readable recording medium is any data storage device that can store data that may thereafter be read by a computer. Examples of the computer-readable recording medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission over the Internet). The computer-readable recording medium may also be distributed over a network of coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion. Computer programmers having ordinary skill in the art may relatively easily write operational programs, codes, and code segments to accomplish the present invention.
As is described above, according to the present invention, a plurality of person candidate regions are detected from an image picked up by a camera indoors or outdoors by using skin color information. Next, by determining whether or not the person candidate region corresponds to a person based on person shape information, it is possible to speedily and accurately detect a plurality of persons in one frame image. In addition, in a multiple person detection method and apparatus according to the present invention, it is possible to accurately detect a person even if the person's pose and/or illumination conditions change.
Although a few embodiments of the present invention have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
10-2003-0085828 | Nov 2003 | KR | national |