The present invention relates to image processing, and particularly to a method and apparatus for processing an image in which human faces are to be detected.
A number of techniques are known for detecting areas of interest in an image, such as a face or other identified object of interest. Face detection is an area of particular interest, as face recognition has importance not only for image processing, but also for identification and security purposes, and for human-computer interfaces. A human-computer interface not only identifies the location of a face, if a face is present; it may also identify the particular face, and may interpret facial expressions and gestures.
Many studies on automatic face detection have been reported in recent years. References include, for example, “Face Detection and Rotation Estimation using Color Information,” the 5th IEEE International Workshop on Robot and Human Communication, 1996, pp. 341-346, and “Face Detection from Color Images Using a Fuzzy Pattern Matching Method,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 6, June 1999.
All conventional methods of detecting human faces have their own advantages and shortcomings, depending upon the algorithms used for processing images. Some methods are accurate, but complex and time-consuming.
The objective of the present invention is to provide a method and apparatus for processing an image in which human faces will be detected based on the pixels located in the mouth neighborhood, and specifically based on the edge information calculated in relation to the mouth neighborhood.
For the attainment of the above objective, the present invention provides a method of processing an image, characterized by comprising steps of:
identifying one candidate for human face region within said image;
selecting a mouth neighborhood within said candidate for human face region;
processing said mouth neighborhood; and
classifying said candidate for human face region based on results of said processing step.
The present invention also provides an apparatus for processing an image, characterized by comprising:
a candidate identifier, for identifying one candidate for human face region within said image;
a mouth neighborhood selector, for selecting a mouth neighborhood within said candidate for human face region that has been identified by said candidate identifier;
a mouth neighborhood processor, for processing said mouth neighborhood that has been selected by said mouth neighborhood selector;
a classifier, for classifying said candidate for human face region that has been identified by said candidate identifier based on outputs of said mouth neighborhood processor.
According to the method and apparatus of the present invention, human faces are detected based only on the pixels included in the mouth neighborhood, not on all the pixels of the entire face. Moreover, the use of edge information in relation to the mouth neighborhood increases the accuracy of detecting human faces.
Additionally, the method of the present invention can be easily combined with various conventional methods of determining candidates for human face regions so as to fit in different situations.
Other features and advantages of the present invention should be apparent from the following description of the preferred embodiments, taken in conjunction with the accompanying drawings, which illustrate, by way of example, the principles of the invention.
The present invention will be described in detail. In the following description, as to how to identify one candidate for human face region within an image, reference can be made to Chinese Patent Application No. 00127067.2, which was filed by the same applicant on Sep. 15, 2000, and made public on Apr. 10, 2002, and Chinese Patent Application No. 01132807.X, which was filed by the same applicant on Sep. 6, 2001. These applications are incorporated herein by reference. However, the method of identifying candidates for human face regions disclosed in Chinese Patent Applications No. 00127067.2 and No. 01132807.X constitutes no restriction to the present invention. Any conventional method of identifying candidates for human face regions within an image may be utilized in the present invention.
The process begins at step 101. At step 102, an image to be processed is inputted. At step 103, one candidate for human face region is identified within the image inputted at step 102. The size of the candidate for human face region identified at step 103 is denoted as S1. Here, the size of an image is defined as the number of the pixels composing the image.
In steps 102 and 103, any conventional methods of identifying one candidate for human face region within an image can be adopted, and constitute no restriction to the present invention.
At step 104, a mouth neighborhood is selected within the candidate for human face region identified at step 103. As to how to select the mouth neighborhood within one human face, detailed description will be given later with reference to
Then, the mouth neighborhood is processed.
Two different methods of processing the mouth neighborhood are shown at steps 105 to 111 enclosed in the broken line in
Of course, other methods of processing can be applied to the mouth neighborhood as long as the results of the processing of the mouth neighborhood will be sufficient for classifying the candidate for human face region as one candidate with high possibility of being a real human face, one candidate with high possibility of being a false human face, a real human face or a false human face. And other edge information, in addition to the size of special areas (e.g., bright areas formed at step 106 in
Different methods of processing the mouth neighborhood and different edge information do not constitute any restriction to the present invention.
Specifically speaking, in
It should be noted that if the entire candidate for human face region is converted into an edge map, the order of steps 104 and 105 will not be critical.
That is, an edge map can be formed for the entire candidate for human face region and then a mouth neighborhood is selected within the edge map.
If the candidate for human face region is represented as a gray level diagram, the formed edge map will show a plurality of bright edges against a black background. Each bright edge in the edge map represents that the gray level of the pixels at the corresponding place in the gray level diagram of the candidate for human face region changes significantly. In order to change a gray level diagram into an edge map, conventional “Sobel” operator can be utilized. At the end of the specification, four examples are given in order to explain the methods shown in
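The conversion from a gray level diagram to an edge map can be sketched as follows. This is an illustrative implementation of the conventional “Sobel” operator mentioned above, not the patent's exact code; the function name and the NumPy-based formulation are my own.

```python
# Illustrative sketch: converting a gray level diagram into an edge map with
# the conventional "Sobel" operator, using NumPy only.
import numpy as np

def sobel_edge_map(gray):
    """Return an edge-intensity map; bright pixels mark places where the
    gray level of the input changes significantly."""
    gray = np.asarray(gray, dtype=float)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # horizontal gradient kernel
    ky = kx.T                                                          # vertical gradient kernel
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")  # replicate border pixels
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            window = padded[dy:dy + h, dx:dx + w]
            gx += kx[dy, dx] * window
            gy += ky[dy, dx] * window
    return np.hypot(gx, gy)  # edge intensity = gradient magnitude
```

Applied to a flat region the result is zero; applied across a sharp gray-level step it produces a bright edge, which is exactly the structure the later “binarization” step thresholds.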
At step 106, from the edge map formed at step 105, a plurality of pixels whose characteristic values are greater than a first threshold are selected. These selected pixels, at their own positions, compose a series of bright areas within the edge map.
A unique method of selecting such pixels, referred to as “binarization,” will be illustrated in the four examples described at the end of the specification.
Among the series of bright areas, the size of the biggest bright area is denoted as S2, and the size of the second biggest bright area is denoted as S3. As before, the size of an area is the number of pixels located within the area. Many methods can be utilized to identify the biggest and the second biggest areas among the series of bright areas. For instance, an algorithm called “labeling” is applicable here, and is used in the four examples described at the end of the specification.
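The “binarization” and “labeling” steps described above can be sketched as follows. This is one plausible implementation (threshold, then group bright pixels into 4-connected areas and report their sizes and centers); the function name is my own, and the patent does not prescribe this particular flood-fill formulation.

```python
# Sketch of "binarization" plus "labeling": select pixels whose characteristic
# value exceeds the first threshold, then group them into connected bright
# areas, returning sizes and centers sorted biggest-first.
from collections import deque

def label_bright_areas(edge_map, first_threshold):
    """Return a list of (size, (row_center, col_center)), biggest area first."""
    h, w = len(edge_map), len(edge_map[0])
    bright = [[edge_map[y][x] > first_threshold for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    areas = []
    for y in range(h):
        for x in range(w):
            if bright[y][x] and not seen[y][x]:
                # flood-fill one 4-connected bright area
                queue, pixels = deque([(y, x)]), []
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and bright[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                center = (sum(p[0] for p in pixels) / len(pixels),
                          sum(p[1] for p in pixels) / len(pixels))
                areas.append((len(pixels), center))
    areas.sort(reverse=True)  # biggest bright area (size S2) first
    return areas
```

With this output, S2 is `areas[0][0]` and S3 is `areas[1][0]` when at least two bright areas exist.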
Then, at step 107, it is decided whether the ratio of S2 to S1 (i.e., S2/S1) is greater than or equal to a second threshold. The second threshold ranges from 0 to 0.1, and preferably takes the value of 0.003. The purpose of step 107 is to decide whether there is a prominent bright area.
If the result of step 107 is Yes, the process goes to step 108; else to step 112.
At step 112, the candidate for human face region is classified as a false human face, or one candidate with high possibility of being a false human face.
At step 108, it is decided whether the ratio of S3 to S2 (i.e., S3/S2) is less than a third threshold. The third threshold ranges from 0.2 to 0.8 and preferably takes the value of 0.5. The purpose of step 108 is to decide whether there is a most prominent bright area.
If the result of step 108 is Yes, the process goes to step 109; else to step 110.
If the process goes to step 109, it means that there is a bright area which is the most prominent among the series of bright areas. If so, this bright area must be the biggest bright area.
At step 109, it is decided whether the center of the biggest bright area is within the mouth area. The mouth area is a predefined area within the mouth neighborhood. As to the definition of the mouth area, detailed description will be given later with reference to
If the result of step 109 is No, the process goes to step 112; else to step 113.
At step 113, the candidate for human face region is classified as a real human face, or one candidate with high possibility of being a real human face.
At step 110, the mid-point between the centers of the first two biggest bright areas is identified. Then, the process goes to step 111.
At step 111, it is decided whether the mid-point identified at step 110 is within the mouth area.
If the result of step 111 is Yes, the process goes to step 113; else to step 112.
After steps 112 or 113, the process goes to step 114, where the process of detecting a human face within an image is ended.
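The decision cascade of steps 107 through 113 can be sketched as follows. The second and third thresholds take their stated preferred values (0.003 and 0.5); `areas` is a list of (size, center) pairs sorted biggest-first as produced by the labeling step, and `in_mouth_area` is a hypothetical containment test for the predefined mouth area. The function name is my own.

```python
# Sketch of steps 107-113: decide real vs. false human face from the
# prominence of bright areas in the mouth neighborhood.
def classify_candidate(s1, areas, in_mouth_area,
                       second_threshold=0.003, third_threshold=0.5):
    """Return True for a (likely) real human face, False for a (likely) false one.
    s1: size of the candidate for human face region (pixel count).
    areas: [(size, (row, col)), ...] sorted biggest-first."""
    if not areas:
        return False                       # no bright area at all
    s2, center2 = areas[0]                 # biggest bright area
    if s2 / s1 < second_threshold:
        return False                       # step 107: no prominent bright area
    if len(areas) == 1 or areas[1][0] / s2 < third_threshold:
        # step 109: the biggest bright area is the most prominent;
        # its center must lie within the mouth area
        return in_mouth_area(center2)
    # steps 110-111: no single most prominent area; test the mid-point
    # between the centers of the first two biggest bright areas
    s3, center3 = areas[1]
    mid = ((center2[0] + center3[0]) / 2, (center2[1] + center3[1]) / 2)
    return in_mouth_area(mid)
```

Using the worked example given later (S1=162000, S2=728, S3=574), S2/S1≈0.00449 passes step 107 and S3/S2≈0.7885 fails step 108, so the classification falls to the mid-point test of steps 110-111.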
Once human faces have been detected within an image, further processing can be conducted on the image, or the detected human faces.
Steps 201-205 and steps 212-214 are similar to steps 101-105 and steps 112-114 in
As described above with reference to
Specifically speaking, at step 206, the average edge intensities within the mouth area and the mouth neighborhood are respectively calculated, and denoted as I1 and I2. Then, at step 207, it is decided whether the difference between I1 and I2 is greater than a fourth threshold. The fourth threshold ranges from 0 to 50, and preferably takes the value of 5.
If the result of step 207 is No, the process goes to step 212; else to step 213.
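The intensity-difference test of steps 206 and 207 can be sketched as follows. The fourth threshold takes its stated preferred value of 5; the function name and the flat-list representation of pixel intensities are my own simplifications.

```python
# Sketch of steps 206-207: compare the average edge intensity within the
# mouth area (I1) against that within the whole mouth neighborhood (I2).
def intensity_difference_test(mouth_area_pixels, neighborhood_pixels,
                              fourth_threshold=5):
    """Return True when the mouth area is markedly brighter in the edge map
    than its surrounding neighborhood, i.e. (I1 - I2) > fourth threshold."""
    i1 = sum(mouth_area_pixels) / len(mouth_area_pixels)       # average within mouth area
    i2 = sum(neighborhood_pixels) / len(neighborhood_pixels)   # average within neighborhood
    return (i1 - i2) > fourth_threshold
```

For the worked example given later (I1=28, I2=20), the difference of 8 exceeds the threshold and the candidate is classified as a real human face.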
As shown in
Candidate identifier 301 is used to identify one candidate for human face region within the image to be processed. Any conventional algorithm for identifying one candidate for human face region within an image can be adopted in candidate identifier 301, and constitutes no restriction to the present invention.
Mouth neighborhood selector 303 is used to select a mouth neighborhood within the candidate for human face region that has been identified by candidate identifier 301. As to how to select a mouth neighborhood within a human face, detailed description will be given later with reference to
Mouth neighborhood processor 304 is used to process the mouth neighborhood that has been selected by mouth neighborhood selector 303, and outputs processing results, which are usually characteristic values of the mouth neighborhood selected by mouth neighborhood selector 303.
Classifier 302 is used to classify the candidate for human face region that has been identified by candidate identifier 301 as one candidate with high possibility of being a real human face, one candidate with high possibility of being a false human face, a real human face or a false human face, based on the outputs of mouth neighborhood processor 304.
Usually, the outputs of mouth neighborhood processor 304 are characteristic values of the mouth neighborhood selected by mouth neighborhood selector 303, provided these characteristic values are sufficient for classifier 302 to classify the candidate for human face region as one candidate with high possibility of being a real human face, one candidate with high possibility of being a false human face, a real human face or a false human face.
Although it is shown in
The classification result of classifier 302 can be used for further processing of the image.
It should be noted that any methods of processing can be applied by mouth neighborhood processor 304 to the mouth neighborhood as long as the results of the processing of the mouth neighborhood will be sufficient for classifier 302 to classify the candidate for human face region as one candidate with high possibility of being a real human face, one candidate with high possibility of being a false human face, a real human face or a false human face.
If edge information is utilized in the processing conducted by mouth neighborhood processor 304, mouth neighborhood processor 304 will comprise at least two components, i.e., converter 305 and edge information calculator 306, as shown in
Converter 305 receives the mouth neighborhood or the entire candidate for human face region outputted by mouth neighborhood selector 303, and converts at least the mouth neighborhood into an edge map. The edge map has been described above.
Edge information calculator 306 is used to calculate edge information for the mouth neighborhood based on the edge map formed by converter 305, and outputs edge information to classifier 302.
Depending upon different types of edge information, edge information calculator 306 has a lot of embodiments, two of which are shown in
As shown in
A unique method of selecting such pixels, referred to as “binarization,” will be illustrated in the four examples described at the end of the specification.
Size calculator 308 is used to calculate the size S1 of the candidate for human face region and the size S2 of the biggest bright area. If S2/S1 is less than a second threshold, classifier 302 will classify the candidate for human face region as a false human face, or one candidate with high possibility of being a false human face.
Size calculator 308 may also calculate the size S3 of the second biggest bright area among the series of bright areas identified by bright area identifier 307. Then, if S3/S2 is less than a third threshold and the center of the biggest bright area is not in a mouth area, classifier 302 will classify the candidate for human face region as a false human face, or one candidate with high possibility of being a false human face. The mouth area is a predefined portion of the mouth neighborhood, and will be described in detail later with reference to
If S3/S2 is less than the third threshold and the center of the biggest bright area is in the mouth area, classifier 302 will classify the candidate for human face region as a real human face, or one candidate with high possibility of being a real human face.
If S3/S2 is not less than the third threshold and the mid-point between the centers of the first two biggest bright areas is not in the mouth area, classifier 302 will classify the candidate for human face region as a false human face, or one candidate with high possibility of being a false human face.
If S3/S2 is not less than the third threshold and the mid-point between the centers of the first two biggest bright areas is in said mouth area, classifier 302 will classify the candidate for human face region as a real human face, or one candidate with high possibility of being a real human face.
As shown in
If the difference (I1−I2) between the average edge intensity within the mouth area and the average edge intensity within the mouth neighborhood is not greater than a fourth threshold, classifier 302 will classify the candidate for human face region as a false human face, or one candidate with high possibility of being a false human face.
If the difference (I1−I2) between the average edge intensity within the mouth area and the average edge intensity within the mouth neighborhood is greater than the fourth threshold, classifier 302 will classify the candidate for human face region as a real human face, or one candidate with high possibility of being a real human face.
Reference 401 represents a human face; 402, mouth neighborhood; 403, mouth area; and 404, eye areas.
The width of human face 401 is W, and the height of human face 401 is H.
The width of mouth neighborhood 402 is W2, and the height of mouth neighborhood 402 is H1.
The width of mouth area 403 is W1, and the height of mouth area 403 is at most H1.
The width of eye areas 404 is W4, and the height of eye areas 404 is H2.
The space between eye areas 404 and the vertical border of human face 401 is W3, and the space between eye areas 404 and the top border of human face 401 is H3.
The above width and height satisfy the following equations:
H1=H*r1, 0.1<r1<0.7
W1=W*r2, 0.1<r2<0.7
W2=W*r3, 0.2<r3<1
W3=W*r4, 0<r4<0.2
W4=W*r5, 0.1<r5<0.4
H2=H*r6, 0.1<r6<0.4
H3=H*r7, 0.2<r7<0.4
Preferably, the following is satisfied:
r1=0.3,
r2=0.5,
r3=0.7,
r4=0.125,
r5=0.25,
r6=0.25,
r7=0.3.
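The geometry above can be computed directly from the face width W and height H. The sketch below uses the preferred ratios; the function and dictionary names are my own. For W=360 and H=450 it reproduces the dimensions of the worked example (H1=135, W1=180, W2=252, W3=45, W4=90, H2=112.5, H3=135).

```python
# Dimensions of the mouth neighborhood, mouth area and eye areas, computed
# from the face size using the preferred ratios r1..r7.
PREFERRED_RATIOS = dict(r1=0.3, r2=0.5, r3=0.7, r4=0.125, r5=0.25, r6=0.25, r7=0.3)

def face_geometry(w, h, r=PREFERRED_RATIOS):
    """Return the derived dimensions for a face of width w and height h."""
    return {
        "H1": h * r["r1"],  # height of mouth neighborhood
        "W1": w * r["r2"],  # width of mouth area
        "W2": w * r["r3"],  # width of mouth neighborhood
        "W3": w * r["r4"],  # space between eye areas and vertical face border
        "W4": w * r["r5"],  # width of eye areas
        "H2": h * r["r6"],  # height of eye areas
        "H3": h * r["r7"],  # space between eye areas and top face border
    }
```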
The selection of mouth neighborhood and mouth area conducted in
As shown in
Next, a mouth neighborhood A7 is selected within edge map A2. This process corresponds to step 104 in
Then, a process called “binarization” is executed with respect to mouth neighborhood A7. This process corresponds to step 106 in
In this process, a predetermined threshold is used. This threshold is referred to as the first threshold in step 106 of
Let the width of candidate for human face region A1 be 360, and the height of candidate for human face region A1 be 450. The constants r1, r2, . . . , r7 take their preferred values. Thus,
H1=H*r1=450*0.3=135,
W1=W*r2=360*0.5=180,
W2=W*r3=360*0.7=252,
W3=W*r4=360*0.125=45,
W4=W*r5=360*0.25=90,
H2=H*r6=450*0.25=112.5, and
H3=H*r7=450*0.3=135.
Based on the above constants, mouth area A6, mouth neighborhood A7, eye areas A4 and A5 are obtained as shown in
Let the threshold for the “binarization” be:
r9=(the average edge intensity of right eye area+the average edge intensity of left eye area)/2*r8.
Here, r8 is a proportional threshold, ranging from 0.4 to 1.3, and preferably takes the value of 0.8.
Suppose the average edge intensity of right eye area A4 is 35, and that of left eye area A5 is 31. Let r8=0.8. Then, r9=(35+31)/2*0.8=26.4.
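This adaptive threshold computation can be sketched as a one-line function (the name is my own):

```python
# Sketch of the "binarization" threshold r9: the mean of the two eye-area
# average edge intensities, scaled by the proportional threshold r8.
def binarization_threshold(right_eye_avg, left_eye_avg, r8=0.8):
    """Return r9 = (right + left) / 2 * r8."""
    return (right_eye_avg + left_eye_avg) / 2 * r8
```

With the example values (35 and 31, r8=0.8) this yields r9=26.4, matching the computation above.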
After the “binarization” of mouth neighborhood A7, area A8 is obtained.
Then, a “labeling” process is executed with respect to area A8. The “labeling” process calculates the number of bright areas located in A8, and the center and size of each bright area. This process also corresponds to step 106 in
After the “labeling” process, it is calculated that there are three bright areas, i.e., A9, A10 and A11, as shown in
Then, it is decided whether there is a prominent bright area among the three bright areas. The size of A1 is S1=360*450=162000. The size of the biggest bright area which is A11 is S2=728. Then, S2/S1=728/162000=0.00449. Let the second threshold be 0.003. Then, S2/S1 is greater than the second threshold. This process corresponds to step 107 in
Next, it is decided whether there is a bright area which is the most prominent. Among A9, A10 and A11, the first two biggest bright areas are A9 and A11, and their sizes are 574 and 728 respectively. That is, S2=728 and S3=574. Then, S3/S2=574/728=0.7885. Let the third threshold be 0.5. Then, S3/S2 is greater than the third threshold. This process corresponds to step 108 in FIG. 1.
Then, the mid-point between the centers of the first two biggest bright areas A9 and A11 is calculated. Since the centers of A9 and A11 are (165, 364) and (180, 397) respectively, the mid-point between A9 and A11 is ((165+180)/2, (364+397)/2)=(172, 380), using integer division. The position of the mid-point is shown as A12 in
Since the mid-point is located within mouth area A6, candidate for human face region A1 is classified as a real human face, or one candidate with high possibility of being a real human face. This process corresponds to steps 111 and 113 in
Also as shown in
A “binarization” process is executed with respect to mouth neighborhood B4, obtaining B5.
A “labeling” process is executed with respect to B5, and it is found that there is no bright area within B5.
Then, candidate for human face region B1 is classified as a false human face, or one candidate with high possibility of being a false human face. This process corresponds to step 112 in
First, the candidate for human face region A1 is identified. Next, with respect to the gray level diagram of candidate for human face region A1, “Sobel” operator is applied, and an edge map A2 is obtained. These processes correspond to steps 203 and 205 in
Then, the average edge intensities of mouth area A6 and mouth neighborhood A7 are calculated as I1=28 and I2=20. This process corresponds to step 206 in
Then, it is decided whether the difference between I1 and I2 is greater than the fourth threshold, which preferably takes the value of 5. Since I1−I2=28−20=8, which is greater than 5, candidate for human face region A1 is classified as a real human face, or one candidate with high possibility of being a real human face. These processes correspond to steps 207 and 213 in
First, one candidate for human face region B1 is identified. Second, with respect to the gray level diagram of candidate for human face region B1, “Sobel” operator is applied, and an edge map B2 is obtained.
Next, the average edge intensities of mouth area B6 and mouth neighborhood B7 are calculated as I1=0 and I2=0. This process corresponds to step 206 in
Then, it is decided whether the difference between I1 and I2 is greater than the fourth threshold, which preferably takes the value of 5. Since I1−I2=0−0=0, which is less than 5, candidate for human face region B1 is classified as a false human face, or one candidate with high possibility of being a false human face. These processes correspond to steps 207 and 212 in
The functions of each component in
The whole system shown in
It involves no inventive work for persons skilled in the art to develop one or more pieces of software based on one or more of the flowcharts shown in
In some sense, the image processing system shown in
While the foregoing has been with reference to specific embodiments of the invention, it will be appreciated by those skilled in the art that these are illustrations only and that changes in these embodiments can be made without departing from the principles of the invention, the scope of which is defined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
02 1 60016 | Dec 2002 | CN | national |
Number | Name | Date | Kind |
---|---|---|---|
5550581 | Zhou | Aug 1996 | A |
5596362 | Zhou | Jan 1997 | A |
5689575 | Sako et al. | Nov 1997 | A |
5781650 | Lobo et al. | Jul 1998 | A |
5835616 | Lobo et al. | Nov 1998 | A |
5852669 | Eleftheriadis et al. | Dec 1998 | A |
5870138 | Smith et al. | Feb 1999 | A |
6611613 | Kang et al. | Aug 2003 | B1 |
6633655 | Hong et al. | Oct 2003 | B1 |
6928231 | Tajima | Aug 2005 | B2 |
7092554 | Chen et al. | Aug 2006 | B2 |
20010036298 | Yamada et al. | Nov 2001 | A1 |
20020081032 | Chen et al. | Jun 2002 | A1 |
20030021448 | Chen et al. | Jan 2003 | A1 |
20030108225 | Li | Jun 2003 | A1 |
20030133599 | Tian et al. | Jul 2003 | A1 |
20040131236 | Chen et al. | Jul 2004 | A1 |
20060018517 | Chen et al. | Jan 2006 | A1 |
20070263909 | Ojima et al. | Nov 2007 | A1 |
20070263933 | Ojima et al. | Nov 2007 | A1 |
20070263934 | Ojima et al. | Nov 2007 | A1 |
Number | Date | Country | |
---|---|---|---|
20040131236 A1 | Jul 2004 | US |