The present invention relates to classifying videos or images, that is, determining whether they contain objects to be detected, and more particularly to a method of and an apparatus for generating a classifier for discriminating whether objects to be detected are contained in videos or images, and to a method of and an apparatus for classifying images with the generated classifier.
With the wide spread of applications such as video monitoring, artificial intelligence and computer vision, there are increasing demands for techniques of detecting specific objects such as humans, animals or vehicles present in videos and images. Among methods of detecting objects in videos or images, there is a class of methods in which static image features are used to create classifiers for discriminating whether objects are contained in the videos or images; the classifiers are then employed to classify the images, i.e., to detect objects in the images, whereas for videos the detection is performed by regarding each frame as an image.
One such technique has been disclosed in Paul Viola and Michael Jones, “Robust Real-time Object Detection”, Second International Workshop On Statistical And Computational Theories Of Vision-Modeling, Learning, Computing, And Sampling, Vancouver, Canada, Jul. 13, 2001. In the technique of Paul Viola et al., differences between pixel value sums of rectangular blocks are extracted from images as features, the features more suitable for discriminating objects and non-objects are selected from the extracted features to form weak classifiers through the AdaBoost method, and the weak classifiers are merged to form a strong classifier. This kind of method is suitable for detecting objects such as human faces in images, but its robustness in detecting objects such as human bodies is not high.
In view of the above deficiencies of the prior art, the present invention is intended to provide a method of and an apparatus for generating a classifier, and a method of and an apparatus for classifying images, so as to increase the robustness of object detection in images.
According to an embodiment of the present invention, a method of generating a classifier for discriminating object images from non-object images includes: extracting from each of a plurality of input images a set of features as a feature vector, wherein the extracting comprises: for each of the features in the feature vector, determining a plurality of first areas arranged along a direction of a first axis, and a plurality of second areas arranged along a direction of a second axis intersecting the first axis at an intersection; calculating a first difference of pixel value sums or mean values of the plurality of first areas, and a second difference of pixel value sums or mean values of the plurality of second areas; and calculating a gradient intensity and a gradient orientation based on the first difference and the second difference to form each of the features; and training the classifier according to the extracted feature vectors.
According to another embodiment of the present invention, an apparatus for generating a classifier for discriminating object images from non-object images is provided, wherein the apparatus extracts from each of a plurality of input images a set of features as a feature vector, and wherein the apparatus comprises: a determining unit which, for each of the features in the feature vector, determines a plurality of first areas arranged along a direction of a first axis, and a plurality of second areas arranged along a direction of a second axis intersecting the first axis at an intersection; a difference calculating unit which calculates a first difference of pixel value sums or mean values of the plurality of first areas, and a second difference of pixel value sums or mean values of the plurality of second areas; a gradient calculating unit which calculates a gradient intensity and a gradient orientation based on the first difference and the second difference to form each of the features; and a training unit for training the classifier according to the extracted feature vectors.
According to the above embodiments of the present invention, because the features including a gradient orientation and a gradient intensity are calculated based on pixels of areas arranged in two directions, the extracted features can reflect the distribution of object edges in respective image portions more faithfully. The classifiers generated based on such features can be used to detect objects such as humans or animals, especially those with various postures, in images more robustly.
Further, in the above methods and apparatuses, respective areas may be rectangular, where respective first areas are adjoined, and respective second areas are also adjoined.
In the above methods and apparatuses, in the case where the numbers of the first areas and of the second areas are both two, the first areas are adjoined and the second areas are adjoined, the intersection of the first axis and the second axis is located on a connecting line along which the first areas adjoin, or within a predetermined range from a connecting point at which the first areas adjoin, and is located on a connecting line along which the second areas adjoin, or within a predetermined range from a connecting point at which the second areas adjoin.
In the above methods and apparatuses, in the case where the numbers of the first areas and of the second areas are both two, the first areas are separated from each other and the second areas are separated from each other, the intersection of the first axis and the second axis is located within a predetermined range from the middle point between the respective center positions of the first areas, and within a predetermined range from the middle point between the respective center positions of the second areas.
In the above methods and apparatuses, in the case where the numbers of the first areas and of the second areas are both three, the intersection of the first axis and the second axis is located in the intermediate one of the first areas and in the intermediate one of the second areas.
In the above methods and apparatuses, the difference between the area arrangements on which at least two of the features are based comprises one or more of the following: the relative positional relation of the areas, the shape of the areas, the size of the areas, and the aspect ratio of the areas. This enriches the features under consideration, thereby facilitating the selection of features suitable for discriminating objects and non-objects.
In the above methods and apparatuses, the features of at least one dimension in a plurality of feature vectors are transformed, where the transformed features include a gradient orientation and a gradient intensity, and the transforming comprises transforming the gradient orientation into the one of a plurality of predetermined intervals that contains the gradient orientation of the feature. With respect to each of the at least one dimension, a classifier including sub-classifiers corresponding to the predetermined intervals is generated, where for each of the predetermined intervals, a threshold for the corresponding sub-classifier is obtained based on the distribution of the gradient intensities of those features of the feature vectors which are in the dimension and whose interval is the predetermined interval.
According to another embodiment of the present invention, a method of classifying an image includes: extracting from the image a set of features as a feature vector, wherein the extracting comprises: for each of the features in the feature vector, determining a plurality of first areas arranged along a direction of a first axis, and a plurality of second areas arranged along a direction of a second axis intersecting the first axis at an intersection; calculating a first difference of pixel value sums or mean values of the plurality of first areas, and a second difference of pixel value sums or mean values of the plurality of second areas; and calculating a gradient intensity and a gradient orientation based on the first difference and the second difference to form each of the features; and classifying the image according to the extracted feature vector.
According to another embodiment of the present invention, an apparatus for classifying an image includes: a feature extracting device for extracting from the image a set of features as a feature vector, comprising: a determining unit which, for each of the features in the feature vector, determines a plurality of first areas arranged along a direction of a first axis, and a plurality of second areas arranged along a direction of a second axis intersecting the first axis at an intersection; a difference calculating unit which calculates a first difference of pixel value sums or mean values of the plurality of first areas, and a second difference of pixel value sums or mean values of the plurality of second areas; and a gradient calculating unit which calculates a gradient intensity and a gradient orientation based on the first difference and the second difference to form each of the features; and a classifying unit which classifies the image according to the extracted feature vector.
In the above methods and apparatuses, as described above, because the gradients of portions in the image are calculated based on pixels of a plurality of areas, the extracted features can reflect the distribution of object edges in the respective image portions more completely, and are less affected by changes in object posture. The classifiers generated based on such features can be used to detect objects such as humans or animals, especially those with various postures, in images more robustly.
In the above methods and apparatuses, the areas may be rectangular, wherein the first areas are adjoined, and the second areas are adjoined too.
In the above methods and apparatuses, in the case where the numbers of the first areas and of the second areas are both two, the first areas are adjoined and the second areas are adjoined, the intersection of the first axis and the second axis is located on a connecting line along which the first areas adjoin, or within a predetermined range from a connecting point at which the first areas adjoin, and is located on a connecting line along which the second areas adjoin, or within a predetermined range from a connecting point at which the second areas adjoin.
In the above methods and apparatuses, in the case where the numbers of the first areas and of the second areas are both two, the first areas are separated from each other and the second areas are separated from each other, the intersection of the first axis and the second axis is located within a predetermined range from the middle point between the respective center positions of the first areas, and within a predetermined range from the middle point between the respective center positions of the second areas.
In the above methods and apparatuses, in the case where the numbers of the first areas and of the second areas are both three, the intersection of the first axis and the second axis is located in the intermediate one of the first areas and in the intermediate one of the second areas.
Further, in the above methods and apparatuses, the difference between the area arrangements on which at least two of the features are based comprises one or more of the following: the relative positional relation of the areas, the shape of the areas, the size of the areas, and the aspect ratio of the areas. This enriches the features under consideration, thereby facilitating the selection of features suitable for discriminating objects and non-objects.
Further, in the above methods and apparatuses, the classifying of the image comprises: for the gradient orientation and gradient intensity of each of the features, determining the one of a plurality of gradient orientation intervals that contains the gradient orientation of the feature, wherein each of the gradient orientation intervals has a corresponding threshold; comparing the gradient intensity of the feature with the threshold corresponding to the determined gradient orientation interval; and generating a classification result according to the comparison result.
The above and/or other aspects, features and/or advantages of the present invention will be easily appreciated in view of the following description by referring to the accompanying drawings. In the accompanying drawings, identical or corresponding technical features or components are represented with identical or corresponding reference numbers. In the accompanying drawings, sizes and relative positions of elements are not necessarily drawn to scale.
FIG. 3a illustrates an example of the distribution of outline edges of an object (human body).
FIGS. 3b and 3c are schematic diagrams respectively illustrating how to determine first areas and second areas in the portion illustrated in FIG. 3a.
FIG. 4a is a schematic diagram illustrating object outline edges included in the portion 302 illustrated in FIG. 3a.
FIG. 4b is a schematic diagram illustrating the gradient calculated by the gradient calculating unit from the first difference and the second difference calculated by the difference calculating unit based on the first areas and the second areas illustrated in FIGS. 3b and 3c.
The embodiments of the present invention are described below by referring to the drawings. It is to be noted that, for the purpose of clarity, representations and descriptions of components and processes which are known to those skilled in the art but unrelated to the present invention are omitted from the drawings and the description.
As illustrated in FIG. 1, the apparatus 100 for generating a classifier for discriminating object images from non-object images comprises a determining unit 101, a difference calculating unit 102, a gradient calculating unit 103 and a training unit.
In the technique of employing static image features to create a classifier, object images and non-object images are collected, features are extracted from the collected object images and non-object images, and the extracted features are filtered and merged by using filtering methods such as AdaBoost to obtain a classifier for discriminating object images and non-object images. A method of collecting and preparing such object images and non-object images has been disclosed in patent application WO 2008/151470, Ding et al., “A Robust Human Face Detecting Method In Complicated Background Image” (see page 2 to page 3 of the description). The object images and the non-object images as collected and prepared may serve as input images to the apparatus 100. The apparatus 100 extracts a group of features from each of a plurality of input images as a feature vector.
For each of the features in the feature vector, the determining unit 101 determines a plurality of first areas arranged along the direction of a first axis, and a plurality of second areas arranged along the direction of a second axis intersecting the first axis at an intersection (for example, in a right angle or a non-right angle).
The features to be extracted are based on pixels in the input image. The determining unit 101 is adapted to determine the pixels in the input image on which each feature to be extracted is based. The determining unit 101 may determine these pixels according to a predetermined area arrangement.
The arrangement of the first areas and the second areas may be various. In an example, the weighted mean position of the positions of pixels in the plurality of first areas and the weighted mean position of the positions of pixels in the plurality of second areas fall within a predetermined range from the intersection of the first axis and the second axis. Specifically, taking the first areas as an example, the positions of pixels in the first areas may be represented as (xij, yij), wherein xij represents the coordinate of the j-th pixel of the i-th first area on the first axis (i.e., the X-axis), and yij represents the coordinate of the j-th pixel of the i-th first area on the second axis (i.e., the Y-axis). The weighted mean position (xa, ya) of the positions of pixels in the first areas may be defined as follows:
wherein N is the number of the first areas, Mi is the number of pixels in the i-th first area, and wi is the weight of the i-th first area.
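The equation referenced above is not reproduced in the text. A plausible reconstruction, offered only under the assumption that each pixel position is weighted by the weight wi of the first area it belongs to, is:

```latex
x_a = \frac{\sum_{i=1}^{N} w_i \sum_{j=1}^{M_i} x_{ij}}{\sum_{i=1}^{N} w_i M_i},
\qquad
y_a = \frac{\sum_{i=1}^{N} w_i \sum_{j=1}^{M_i} y_{ij}}{\sum_{i=1}^{N} w_i M_i}.
```

Under identical weights this reduces to the plain mean of all pixel positions in the first areas.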
Further or alternatively, in the above example, the weights of all the first areas may be identical, or may be at least in part different. In the case of different weights, it is possible to allocate smaller weights to first areas including more pixels, and larger weights to first areas including fewer pixels.
Although the description has been provided by taking the first areas as an example in the above, the above description is also applicable to the second areas.
In another example, the areas may be rectangular, wherein the first areas are adjoined, and the second areas are adjoined too.
According to one area arrangement, the numbers of the first areas and of the second areas are both two, the first areas are adjoined and the second areas are adjoined. According to this arrangement, the intersection of the first axis and the second axis is located on a connecting line along which the first areas adjoin, or within a predetermined range from a connecting point at which the first areas adjoin (for example, when vertex points of rectangular areas are adjoined, substantially coinciding with that point), and is located on a connecting line along which the second areas adjoin, or within a predetermined range from a connecting point at which the second areas adjoin.
FIGS. 2a and 2b illustrate examples of this area arrangement.
According to another area arrangement, the numbers of the first areas and of the second areas are both two, the first areas are separated from each other and the second areas are separated from each other. According to this arrangement, the intersection of the first axis and the second axis is located within a predetermined range from the middle point between the respective center positions of the first areas, and within a predetermined range from the middle point between the respective center positions of the second areas.
FIGS. 2c and 2d illustrate examples of this area arrangement.
FIGS. 2g and 2h illustrate further examples of this area arrangement.
According to another area arrangement, the numbers of the first areas and of the second areas are both three. According to this arrangement, the intersection of the first axis and the second axis is located in the intermediate one of the first areas and in the intermediate one of the second areas.
FIGS. 2e and 2f illustrate examples of this area arrangement.
It should be noted that the shape of the first areas and the second areas is not limited to a rectangle, and may be another shape such as a polygon, a triangle, a circle, a ring, or an irregular shape. The shapes of the first areas and the second areas may also differ from each other, and within the feature area for the same feature, the shapes of different first areas or of different second areas may also differ.
In addition, in the case of rectangular shapes, the sides of different first areas may be parallel to each other, or may be rotated by an angle relative to each other. Likewise, the sides of different second areas may be parallel to each other, or may be rotated by an angle relative to each other. In the case of rectangular shapes, the adjoining of rectangular areas comprises the case where the rectangular areas are adjoined via respective sides (i.e., the intersection of the first axis and the second axis is located on these sides), and the case where the rectangular areas are adjoined via vertex points of respective corners (i.e., the intersection of the first axis and the second axis is located at these vertex points).
It should also be noted that the number of first areas arranged in the direction of the first axis and the number of second areas arranged in the direction of the second axis are not limited to the numbers illustrated in the drawings.
It should also be noted that within the feature area for the same feature, the relative positional relation of the first areas and the relative positional relation of the second areas may be arbitrary. For example, the first areas arranged in the direction of the first axis may be adjoined, separated, or partly adjoined and partly separated, and the second areas arranged in the direction of the second axis may be adjoined, separated, or partly adjoined and partly separated, as long as the weighted mean position of the positions of pixels in the first areas and the weighted mean position of the positions of pixels in the second areas fall within a predetermined range from the intersection of the first axis and the second axis.
In the collected object images, the outline edges of objects present characteristics distinct from those of non-objects. The outline edges of objects in the object images may have various distributions. To extract features sufficient to reflect the outline edges of objects, the determining unit 101 may determine first areas and second areas in portions of different sizes and at different positions in the input image, to obtain features of the edge outlines in those portions.
FIG. 3a illustrates an example of the distribution of outline edges of an object (human body).
FIGS. 3b and 3c are schematic diagrams respectively illustrating how to determine first areas and second areas in the portion 302 illustrated in FIG. 3a.
In an embodiment, the determining unit 101 may determine first areas and second areas at different positions in the input image according to an area arrangement. New area arrangements are then obtained by changing area size and/or area aspect ratio in this area arrangement, and first areas and second areas are determined at different positions in the input image based on the new area arrangements. This process is repeated until all the possible area sizes or area aspect ratios have been attempted for this area arrangement.
In addition or alternatively, in the above embodiments, the determining unit 101 may obtain new area arrangements by changing relative position relation of areas in the area arrangement.
In addition or alternatively, in the above embodiments, the determining unit 101 may obtain new area arrangements by changing the number of areas in the area arrangement.
In addition or alternatively, in the above embodiments, the determining unit 101 may obtain new area arrangements by changing the shape of areas in the area arrangement.
First areas and second areas determined by the determining unit 101 based on one position of an area arrangement in the input image determine one feature to be extracted. In brief, area arrangements of feature areas on which at least two features in a feature vector are based are different. For example, the difference between the area arrangements on which at least two of the features are based comprises one or more of the followings: relative positional relation of the areas, shape of the areas, size of the areas and aspect ratio of the areas.
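Purely as an illustrative sketch of how such candidate feature areas might be enumerated, the fragment below scans one assumed arrangement (two blocks adjoined along each axis) over all positions and a few block sizes; the block sizes, the step and the function names are assumptions, not values taken from this description.

```python
from itertools import product

def enumerate_arrangements(img_w, img_h,
                           block_sizes=((4, 4), (8, 4), (4, 8), (8, 8)),
                           step=4):
    """Enumerate candidate positions of one simple area arrangement:
    two blocks adjoined along the X axis and two blocks adjoined along
    the Y axis, all meeting at the intersection point (cx, cy).

    Yields tuples (cx, cy, bw, bh), where bw and bh are the block width
    and height. Illustrative only; the description allows many other
    arrangements (separated blocks, three blocks per axis, other shapes).
    """
    for (bw, bh), cx, cy in product(block_sizes,
                                    range(0, img_w + 1, step),
                                    range(0, img_h + 1, step)):
        # keep only arrangements that fit entirely inside the image
        if cx - bw >= 0 and cx + bw <= img_w and cy - bh >= 0 and cy + bh <= img_h:
            yield cx, cy, bw, bh
```

Changing the relative positions, the number of blocks, their shapes or their aspect ratios, as described above, simply adds further loops of the same kind.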
Returning to FIG. 1, for the first areas and the second areas determined by the determining unit 101, the difference calculating unit 102 calculates a first difference dx between the pixel value sums or mean values (grey scale) of the first areas, and a second difference dy between the pixel value sums or mean values of the second areas.
For example, with respect to the area arrangement illustrated in FIG. 2a, the first difference and the second difference may be calculated as follows:
The first difference=pixel value sum or mean value of the rectangular block 202−pixel value sum or mean value of the rectangular block 201,
The second difference=pixel value sum or mean value of the rectangular block 204−pixel value sum or mean value of the rectangular block 203.
For another example, with respect to the area arrangement illustrated in FIG. 2c, the first difference and the second difference may be calculated as follows:
The first difference=pixel value sum or mean value of the rectangular block 206−pixel value sum or mean value of the rectangular block 205,
The second difference=pixel value sum or mean value of the rectangular block 208−pixel value sum or mean value of the rectangular block 207.
For another example, with respect to the area arrangement illustrated in FIG. 2e, the first difference and the second difference may be calculated as follows:
The first difference=pixel value sum or mean value of the rectangular block 209+pixel value sum or mean value of the rectangular block 211−2×(pixel value sum or mean value of the rectangular block 210),
The second difference=pixel value sum or mean value of the rectangular block 212+pixel value sum or mean value of the rectangular block 214−2×(pixel value sum or mean value of the rectangular block 213).
For another example, with respect to the area arrangement illustrated in FIG. 2g, the first difference and the second difference may be calculated as follows:
The first difference=pixel value sum or mean value of the rectangular block 216−pixel value sum or mean value of the rectangular block 215,
The second difference=pixel value sum or mean value of the rectangular block 218−pixel value sum or mean value of the rectangular block 217.
The difference between the pixel value sums or mean values (grey scale) of the areas on an axis is calculated for the purpose of obtaining information reflecting the change in pixel grey scale in the direction of the corresponding axis. For different area arrangements, there are corresponding methods of calculating the first difference and the second difference, as long as they are able to reflect this change.
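For illustration, the block sums appearing in the above differences can be read from an integral image (summed-area table). The sketch below assumes the two-adjoined-block arrangement and uses hypothetical helper names; it is not the implementation prescribed by this description.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended."""
    return np.pad(img.astype(np.int64), ((1, 0), (1, 0))).cumsum(0).cumsum(1)

def block_sum(ii, x0, y0, x1, y1):
    """Pixel sum over the rectangle [x0, x1) x [y0, y1) from integral image ii."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

def adjoined_differences(ii, cx, cy, bw, bh):
    """First and second differences for blocks adjoined at (cx, cy):
    dx = sum(right block) - sum(left block) along the first (X) axis,
    dy = sum(lower block) - sum(upper block) along the second (Y) axis."""
    dx = (block_sum(ii, cx, cy - bh // 2, cx + bw, cy + bh // 2)
          - block_sum(ii, cx - bw, cy - bh // 2, cx, cy + bh // 2))
    dy = (block_sum(ii, cx - bw // 2, cy, cx + bw // 2, cy + bh)
          - block_sum(ii, cx - bw // 2, cy - bh, cx + bw // 2, cy))
    return dx, dy
```

The same block_sum helper covers the separated and three-block arrangements; only the rectangles and the signs of the terms change.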
Returning to FIG. 1, the gradient calculating unit 103 calculates a gradient intensity and a gradient orientation based on the first difference and the second difference calculated by the difference calculating unit 102, to form a feature to be extracted.
It is possible to calculate the gradient orientation and the gradient intensity according to the following equations:
According to the above equation (1), the gradient orientation has an angle range from 0 to 180 degrees. In an alternative embodiment, it is possible to calculate the gradient orientation according to the following equation:
According to the above equation (1′), the gradient orientation has an angle range from 0 to 360 degrees.
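The text of equations (1), (1′) and (2) is not reproduced above. A plausible reconstruction consistent with the stated angle ranges, with dx and dy denoting the first and second differences, is given below; the exact intensity formula (for example the Euclidean norm rather than |dx|+|dy|) is an assumption.

```latex
\theta  = \arctan\left(\frac{dy}{dx}\right) \in [0^{\circ}, 180^{\circ}) \qquad (1)
\theta' = \operatorname{atan2}(dy, dx)      \in [0^{\circ}, 360^{\circ}) \qquad (1')
I       = \sqrt{dx^{2} + dy^{2}}                                         \qquad (2)
```

In equation (1), negative values of the arctangent are shifted by 180 degrees so that the orientation falls in the stated range.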
FIG. 4a is a schematic diagram illustrating the object outline edges included in the portion 302 illustrated in FIG. 3a.
FIG. 4b is a schematic diagram illustrating the gradient orientation calculated by the gradient calculating unit 103 from the first difference and the second difference calculated by the difference calculating unit 102 based on the first areas and the second areas illustrated in FIGS. 3b and 3c.
Because each feature, including a gradient orientation and a gradient intensity, is calculated based on pixels of areas arranged in two directions and co-located, the extracted features can reflect the distribution of object edges in the respective image portions more faithfully. Accordingly, the classifiers generated based on such features can be used to detect objects such as humans or animals, especially those with various postures, in images more robustly.
All the features extracted for each input image form one feature vector.
Returning to FIG. 1, the training unit trains the classifier according to the feature vectors extracted from the input images.
It is possible to train the classifier through a machine learning method such as SVM (support vector machine) based on the feature vectors obtained in the above embodiments, by adopting the histogram of oriented gradients. Such methods of training classifiers based on gradient features are described in literatures such as Dalal et al., “Histograms of Oriented Gradients for Human Detection”, Proc. of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005: 886-893, and Triggs et al., “Human Detection Using Oriented Histograms of Flow and Appearance”, Proc. European Conference on Computer Vision, 2006.
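Purely as an illustration of this kind of training (not the exact formulation of the cited literature or of this description), the gradient orientations and intensities can be pooled into a HOG-like histogram per image and passed to a linear SVM; the library, bin count and normalization below are assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

def orientation_histogram(features, n_bins=9):
    """Accumulate each feature's gradient intensity into the bin that
    contains its gradient orientation (assumed to lie in [0, 180) degrees)."""
    hist = np.zeros(n_bins)
    for intensity, orientation in features:
        hist[int(orientation // (180.0 / n_bins)) % n_bins] += intensity
    return hist / (np.linalg.norm(hist) + 1e-6)  # simple normalization

def train_svm(feature_sets, labels):
    """feature_sets: one list of (intensity, orientation) pairs per image;
    labels: 1 for object images, 0 for non-object images."""
    X = np.array([orientation_histogram(f) for f in feature_sets])
    return LinearSVC(C=1.0).fit(X, np.asarray(labels))
```

A full histogram-of-oriented-gradients descriptor would use per-cell histograms with block normalization; the single per-image histogram here only illustrates the principle.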
As illustrated in FIG. 5, the method 500 of generating a classifier for discriminating object images from non-object images starts at step 501.
As described by referring to the embodiment of FIG. 1, at step 503, for each of the features in the feature vector, a plurality of first areas arranged along the direction of a first axis, and a plurality of second areas arranged along the direction of a second axis intersecting the first axis at an intersection, are determined.
The arrangement of the first areas and the second areas may be that described in connection with the embodiment of FIG. 1.
At step 503, it is possible to determine first areas and second areas in portions with different sizes and at different positions in the input image, to obtain features of edge outlines in the portions.
In a modification of the method 500, at step 503, it is possible to determine first areas and second areas at different positions in the input image according to an area arrangement. New area arrangements are then obtained by changing area size and/or area aspect ratio in this area arrangement, and first areas and second areas are determined at different positions in the input image based on the new area arrangements. This process is repeated until all the possible area sizes or area aspect ratios have been attempted for this area arrangement.
In addition or alternatively, in the above embodiments, at step 503, it is possible to obtain new area arrangements by changing relative position relation of areas in the area arrangement.
In addition or alternatively, in the above embodiments, at step 503, it is possible to obtain new area arrangements by changing the number of areas in the area arrangement.
In addition or alternatively, in the above embodiments, at step 503, it is possible to obtain new area arrangements by changing the shape of areas in the area arrangement.
At step 503, first areas and second areas determined based on one position of an area arrangement in the input image determine one feature to be extracted. In brief, area arrangements of feature areas on which at least two features in a feature vector are based are different. For example, the difference between the area arrangements on which at least two of the features are based comprises one or more of the followings: relative positional relation of the areas, shape of the areas, size of the areas and aspect ratio of the areas.
At step 505, a first difference of the pixel value sums or mean values of the plurality of first areas, and a second difference of the pixel value sums or mean values of the plurality of second areas, are calculated. It is possible to calculate the first difference and the second difference through the method described in connection with the embodiment of FIG. 1.
Then at step 507, a gradient intensity and a gradient orientation are calculated based on the first difference and the second difference as calculated to form a feature to be extracted. It is possible to calculate the gradient orientation and the gradient intensity according to equations (1) (or (1′)) and (2).
At step 509 then, it is determined whether there is any feature not extracted for the present input image. If there is a candidate feature not extracted, the process returns to step 503 to extract the next candidate feature; if otherwise, the process proceeds to step 511.
At step 511, it is determined whether there is any input image with feature vectors not extracted. If there is an input image with feature vectors not extracted, the process returns to step 503 to extract the feature vectors of the next input image; if otherwise, the process proceeds to step 513.
In the method 500, because the features including a gradient orientation and a gradient intensity are calculated based on pixels of areas arranged in two directions and co-located, the extracted features can reflect the distribution of object edges in respective image portions more faithfully. Accordingly, the classifiers generated based on such features can be used to detect objects such as humans or animals, especially those with various postures, in images more robustly.
All the features extracted for each input image form one feature vector.
At step 513, the classifier is trained according to the extracted feature vectors.
It is possible to train the classifier through a machine learning method such as SVM (support vector machine) based on the feature vectors obtained in the above embodiments, by adopting the histogram of oriented gradients. Such methods of training classifiers based on gradient features are described in literatures such as Dalal et al., “Histograms of Oriented Gradients for Human Detection”, Proc. of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005: 886-893, and Triggs et al., “Human Detection Using Oriented Histograms of Flow and Appearance”, Proc. European Conference on Computer Vision, 2006.
The method 500 ends at step 515.
As will be described below, it is also possible to train the classifier based on the gradient features obtained in the above embodiments without adopting the histogram of oriented gradients.
As illustrated in FIG. 6, the apparatus for generating a classifier comprises a transforming unit 601 and a classifier generating unit 602.
The transforming unit 601 transforms the features of at least one dimension in a plurality of feature vectors, where the transformed features include a gradient orientation and a gradient intensity. For example, the feature vectors may be those generated in the embodiments described with reference to FIG. 1 and FIG. 5.
For example, the gradient orientation has an angle range from 0 to 180 degrees (i.e., the angle range covered by the plurality of predetermined intervals). It is possible to divide this range into a number of predetermined intervals (also called gradient orientation intervals), for example, into three intervals from 0 to 60 degrees, from 60 to 120 degrees, and from 120 to 180 degrees. Of course, other divisions are also possible, and the angle range of the gradient orientation may also be 360 degrees. Preferably, the number of predetermined intervals ranges from 3 to 15. A larger number of predetermined intervals makes the angle division finer and is more advantageous for achieving a stronger classification ability (a lower error rate); however, it is prone to cause over-learning in detection, degrading the classification performance. A smaller number of predetermined intervals makes the angle division coarser and yields a weaker classification ability; however, it results in lower sensitivity to changes in angle and is more advantageous for improving robustness against posture changes. It is possible to determine the number of predetermined intervals by making a trade-off between classification ability and posture robustness according to the demands of the specific implementation.
The transforming unit 601 transforms the gradient orientation of a feature into the interval within which that gradient orientation falls.
It is assumed that there are N predetermined intervals, and feature vectors are represented as <f1, . . . , fM>, where fi includes a gradient intensity Ii and a gradient orientation Oi. For a feature fi to be transformed, the transformed feature is represented as f′i, where f′i includes the gradient intensity Ii and an interval Ri.
It is possible to generate a classifier corresponding to a dimension based on the features fi in that dimension of the feature vectors. The classifier may be represented as hi(I, O), where I represents the gradient intensity and O represents the gradient orientation. The classifier includes N sub-classifiers hij(I), 0<j<N+1, corresponding to the N predetermined intervals Kj respectively, for performing classification on features whose gradient orientations fall within the corresponding predetermined intervals. Each sub-classifier hij(I) has a corresponding threshold θij, and classes aij and bij (object, non-object) determined according to the threshold. The processing of hij(I) may be represented as: if I<θij, then hij(I)=aij; otherwise hij(I)=bij. For each sub-classifier hij(I), it is possible to obtain the threshold θij by learning, based on the distribution of the gradient intensities of the features having the same interval Ri as the interval Kj among the features f′i of the transformed feature vectors, and thereby obtain the classes aij and bij.
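A minimal sketch of learning one such classifier hi is given below, under the assumption that each per-interval threshold θij and the classes aij, bij are chosen to minimize the weighted error of a decision stump over the training features whose interval is Kj; the description only states that the threshold is learned from the intensity distribution, so the concrete criterion and all names here are illustrative.

```python
import numpy as np

def train_interval_stumps(intensities, intervals, labels, weights, n_intervals):
    """Learn, for each gradient orientation interval, a threshold on the
    gradient intensity and the classes assigned below and above it.

    intensities, intervals, labels, weights: 1-D arrays over training samples,
    with intervals[k] in {0, ..., n_intervals - 1} and labels[k] in {+1, -1}
    (+1 for object, -1 for non-object).
    Returns one (threshold, class_below, class_above) triple per interval.
    """
    stumps = []
    for j in range(n_intervals):
        mask = intervals == j
        I, y, w = intensities[mask], labels[mask], weights[mask]
        best, best_err = (0.0, -1, -1), np.inf  # default: classify as non-object
        for thr in np.unique(I):
            for below, above in ((-1, 1), (1, -1)):
                pred = np.where(I < thr, below, above)
                err = w[pred != y].sum()
                if err < best_err:
                    best_err, best = err, (float(thr), below, above)
        stumps.append(best)
    return stumps

def apply_stump(stumps, intensity, interval):
    thr, below, above = stumps[interval]
    return below if intensity < thr else above
```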
With respect to each of the at least one dimension, the classifier generating unit 602 generates a classifier including sub-classifiers corresponding to the predetermined intervals respectively, where for each of the predetermined intervals, a threshold for the corresponding sub-classifier is obtained based on the distribution of gradient intensity of features of the feature vectors, which are in the dimension and have the same interval as the predetermined interval, and the class determined based on the threshold is obtained. Alternatively, it is also possible to obtain a measure on the reliability of the determined class.
In a simple implementation, it is possible to perform the transformation and the classifier generation for only one dimension, and the generated classifier functions as the classifier for distinguishing object images and non-object images.
Preferably, the above at least one dimension may include at least two or all the dimensions of the feature vectors. In this case, it is possible to generate a classifier corresponding to each dimension respectively, and obtain a final classifier based on the generated classifiers.
It is possible to combine the classifiers corresponding to the dimensions into the final classifier through a known method. For example, the AdaBoost method can be used to merge the classifiers generated for the respective dimensions into a new strong classifier.
In the AdaBoost method, a weight is set for each sample, and the classifiers are combined through an iterative method. In each iteration, when samples are correctly classified by the selected classifier, the weights of these samples are reduced; when samples are wrongly classified, their weights are increased, so that the learning algorithm focuses on the more difficult training samples in subsequent iterations and finally obtains a classifier with high recognition accuracy.
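A compact sketch of this weight-update loop (discrete AdaBoost over a pool of pre-computed weak classifiers; the exact weighting scheme of the cited method may differ) is:

```python
import numpy as np

def adaboost(predictions, labels, n_rounds):
    """predictions: array of shape (n_weak, n_samples) with values in {+1, -1};
    labels: array of shape (n_samples,) with values in {+1, -1}.
    Returns a list of (weak_classifier_index, alpha) pairs."""
    n_weak, n_samples = predictions.shape
    w = np.full(n_samples, 1.0 / n_samples)
    ensemble = []
    for _ in range(n_rounds):
        errors = np.array([w[predictions[k] != labels].sum() for k in range(n_weak)])
        k = int(errors.argmin())
        eps = min(max(errors[k], 1e-10), 1.0 - 1e-10)  # avoid division by zero
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        # decrease weights of correctly classified samples, increase the others
        w *= np.exp(-alpha * labels * predictions[k])
        w /= w.sum()
        ensemble.append((k, float(alpha)))
    return ensemble
```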
Such a technique for selecting and merging a plurality of classifiers to form a final classifier has been disclosed in Paul Viola and Michael Jones, “Robust Real-time Object Detection”, Second International Workshop On Statistical And Computational Theories Of Vision-Modeling, Learning, Computing, And Sampling, Vancouver, Canada, Jul. 13, 2001.
In a preferred embodiment, one of the predetermined intervals represents weak gradients. In this case, if the gradient intensity of a feature is smaller than a predetermined threshold, the transforming unit 601 transforms the gradient orientation into the interval representing weak gradients. Regardless of the gradient intensity, the sub-classifier corresponding to the interval representing weak gradients classifies the corresponding features as non-object.
As illustrated in FIG. 7, the method 700 of generating a classifier starts at step 701. At step 703, the features of at least one dimension in a plurality of feature vectors are transformed, where the transformed features include a gradient orientation and a gradient intensity, and the gradient orientation of each feature is transformed into the one of a plurality of predetermined intervals within which it falls.
At step 705, with respect to the present dimension of the transformed feature vectors, a classifier including sub-classifiers corresponding to the predetermined intervals respectively is generated, where for each of the predetermined intervals, a threshold for the corresponding sub-classifier is obtained based on the distribution of gradient intensity of features of the feature vectors, which are in the present dimension and have the same interval as the predetermined interval, and the class determined based on the threshold is obtained. Alternatively, it is also possible to obtain a measure on the reliability of the determined class.
At step 707, it is determined whether there is any dimension with no classifier being generated. If any, the method returns to step 705 to generate the classifier for the next dimension; otherwise the method ends at step 709.
In a simple implementation, it is possible to perform the transformation and the classifier generation for only one dimension, and the generated classifier functions as the classifier for distinguishing object images and non-object images.
Preferably, the above at least one dimension may include at least two or all the dimensions of the feature vectors. In this case, it is possible to generate a classifier corresponding to each dimension respectively, and obtain a final classifier based on the generated classifiers.
It is possible to combine the classifiers corresponding to the dimensions into the final classifier through a known method. For example, the AdaBoost method proposed by Paul Viola et al. may be used to form a final classifier based on the generated classifiers.
In a preferred embodiment, one of the predetermined intervals represents weak gradients. In this case, at step 703, if the gradient intensity of a feature is smaller than a predetermined threshold, the gradient orientation is transformed into the interval representing weak gradients. Regardless of the gradient intensity, the sub-classifier corresponding to the interval representing weak gradients classifies the corresponding features as non-object.
As illustrated in FIG. 8, the apparatus 800 for classifying an image comprises a feature extracting device including a determining unit 801, a difference calculating unit 802 and a gradient calculating unit 803, and a classifying unit 804.
The images input to the apparatus 800 may be those of a predetermined size obtained from the images to be processed through a scanning window. The images may be obtained through a method disclosed in patent application WO 2008/151470, Ding et al., “A Robust Human Face Detecting Method In Complicated Background Image” (see page 5 of the description).
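As a simple illustration of such scanning (the window size and step below are assumptions, not values from the cited application):

```python
def scan_windows(image_h, image_w, win_h=64, win_w=32, step=8):
    """Yield the (top, left) corner of every window of size win_h x win_w
    placed inside an image of size image_h x image_w."""
    for top in range(0, image_h - win_h + 1, step):
        for left in range(0, image_w - win_w + 1, step):
            yield top, left
```

In practice the image to be processed is usually also rescaled several times so that objects of different sizes fit the fixed window size.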
In this embodiment, the feature vector to be extracted is the one on which the classifier(s) used by the classifying unit 804 is based.
For each of the features in the feature vector, the determining unit 801 determines a plurality of first areas arranged along the direction of a first axis, and a plurality of second areas arranged along the direction of a second axis intersecting the first axis at an intersection (for example, in a right angle or a non-right angle).
The area arrangements of the first areas and the second areas which the determining unit 801 is based on may be that described in connection with the determining unit 101 in the above.
For the first areas and the second areas determined by the determining unit 801 according to each position of each area arrangement in the input image, the difference calculating unit 802 calculates a first difference dx between the pixel value sums or mean values (grey scale) of the first areas, and a second difference dy between the pixel value sums or mean values of the second areas. It is possible to calculate the first difference and the second difference by using the method described in connection with the difference calculating unit 102 in the above.
The gradient calculating unit 803 calculates a gradient intensity and a gradient orientation based on the first difference and the second difference calculated by the difference calculating unit 802 to form a feature to be extracted. It is possible to calculate the gradient intensity and the gradient orientation by using the method described in connection with the gradient calculating unit 103 in the above.
All the features extracted from the input image form one feature vector. The classifying unit 804 classifies the input image according to the extracted feature vector. The classifier adopted by the classifying unit 804 may be one generated in the above embodiments, for example, the classifier generated by adopting the histogram of oriented gradients, or the classifier generated based on the gradient orientation intervals.
As illustrated in FIG. 9, the method 900 of classifying an image starts at step 901.
At step 903, for each of the features in the feature vector, a plurality of first areas arranged along the direction of a first axis, and a plurality of second areas arranged along the direction of a second axis intersecting the first axis at an intersection (for example, at a right angle or a non-right angle), are determined. The area arrangements of the first areas and the second areas on which step 903 is based may be those described in connection with the determining unit 101 in the above.
At step 905, a first difference of the pixel value sums or mean values of the plurality of first areas, and a second difference of the pixel value sums or mean values of the plurality of second areas, are calculated, for example by using the method described in connection with the difference calculating unit 102 in the above.
Then at step 907, a gradient intensity and a gradient orientation are calculated based on the first difference and the second difference as calculated to form a feature to be extracted. It is possible to calculate the gradient orientation and the gradient intensity according to equations (1) (or (1′)) and (2).
At step 909 then, it is determined whether there is any feature not extracted for the present input image. If there is a feature not extracted, the process returns to step 903 to extract the next feature; if otherwise, the process proceeds to step 911.
All the features extracted from the input image form one feature vector. At step 911, the input image is classified according to the extracted feature vector. The classifier adopted at step 911 may be one generated in the above embodiments, for example, the classifier generated by adopting the histogram of oriented gradients, or the classifier generated based on the gradient orientation intervals.
The method 900 ends at step 913.
As illustrated in FIG. 10, the classifying unit 804 may comprise a plurality of classifiers corresponding respectively to the features in the feature vector, each classifier including sub-classifiers corresponding to a plurality of gradient orientation intervals.
For each feature in the extracted feature vector, in the corresponding classifier (for example classifier 1001), in case that the gradient orientation of the feature falls within a gradient orientation interval corresponding to a sub-classifier (for example, one of the sub-classifiers 1001-1 to 1001-N), the sub-classifier compares the gradient intensity of the feature with the threshold corresponding to the gradient orientation interval, and generates a classification result based on the comparison result. The classification result may be a class of the image (object, non-object). Alternatively, the classification result may also include the reliability of the image class.
In a unit not illustrated, it is possible to combine, via a known method, the classification results generated by the classifiers based on the corresponding features in the feature vector to form a final classification result. For example, it is possible to adopt the AdaBoost method.
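Combining the pieces, the classification of one input image could be sketched as follows, reusing the hypothetical per-interval stumps and AdaBoost ensemble from the earlier sketches; this is illustrative only, not the exact combination rule of this description.

```python
def classify_image(features, stumps_per_feature, ensemble):
    """features: one (intensity, interval) pair per feature dimension;
    stumps_per_feature[d]: per-interval (threshold, class_below, class_above);
    ensemble: list of (feature_index, alpha) pairs from AdaBoost.
    Returns +1 (object) or -1 (non-object)."""
    score = 0.0
    for d, alpha in ensemble:
        intensity, interval = features[d]
        thr, below, above = stumps_per_feature[d][interval]
        score += alpha * (below if intensity < thr else above)
    return 1 if score >= 0 else -1
```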
As illustrated in FIG. 11, the method 1100 of classifying an image starts at step 1101. At step 1103, for the gradient orientation and gradient intensity of a feature in the extracted feature vector, the one of a plurality of gradient orientation intervals within which the gradient orientation falls is determined, wherein each of the gradient orientation intervals has a corresponding threshold.
At step 1105, the gradient intensity of the feature is compared with the threshold corresponding to the determined gradient orientation interval.
At step 1107, a classification result is generated according to the comparison result. The classification result may be a class of the image (object, non-object). Alternatively, the classification result may also include the reliability of the image class.
At step 1109, it is determined whether there is any feature not processed in the feature vector. If any, the method returns to step 1103 to process the next feature. If no, the method ends at step 1111.
An environment for implementing the apparatus and the method of the present invention is illustrated in FIG. 12.
In FIG. 12, a central processing unit (CPU) 1201 performs various processes according to programs stored in a read only memory (ROM) 1202 or programs loaded from a storage section 1208 to a random access memory (RAM) 1203, in which data required when the CPU 1201 performs the various processes are also stored as required.
The CPU 1201, the ROM 1202 and the RAM 1203 are connected to one another via a bus 1204. An input/output interface 1205 is also connected to the bus 1204.
The following components are connected to the input/output interface 1205: an input section 1206 including a keyboard, a mouse, or the like; an output section 1207 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a loudspeaker or the like; the storage section 1208 including a hard disk or the like; and a communication section 1209 including a network interface card such as a LAN card, a modem, or the like. The communication section 1209 performs a communication process via a network such as the Internet.
A drive 1210 is also connected to the input/output interface 1205 as required. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 1210 as required, so that a computer program read therefrom is installed into the storage section 1208 as required.
In the case where the above-described steps and processes are implemented by software, the program that constitutes the software is installed from a network such as the Internet or from a storage medium such as the removable medium 1211.
One skilled in the art should note that this storage medium is not limited to the removable medium 1211 having the program stored therein as illustrated in FIG. 12.
The present invention is described in the above by referring to specific embodiments. One skilled in the art should understand that various modifications and changes can be made without departing from the scope as set forth in the following claims.
Priority application: No. 200910135298.6, May 2009, CN, national.
PCT filing: PCT/CN2010/072867, filed 5/18/2010, WO, kind 00, 371(c) date 1/5/2012.