The present invention relates to classifying videos or images, that is, determining whether they contain objects to be detected, and more particularly to a method of and an apparatus for generating a classifier for discriminating whether objects to be detected are contained in videos or images, and to a method of and an apparatus for classifying images with the generated classifier.
With the wide spread of applications such as video monitoring, artificial intelligence and computer vision, there are increasing demands for techniques of detecting specific objects such as humans, animals or vehicles present in videos and images. Among methods of detecting objects in videos or images, there is a class of methods in which static image features are used to create classifiers for discriminating whether objects are contained in the videos or images; the classifiers are then employed to classify the images, i.e., to detect objects in the images, whereas for videos the detection is performed by regarding each frame as an image.
One such technique has been disclosed in Paul Viola and Michael Jones, “Robust Real-time Object Detection”, Second International Workshop On Statistical And Computational Theories Of Vision-Modeling, Learning, Computing, And Sampling, Vancouver, Canada, Jul. 13, 2001. In the technique of Paul Viola et al., differences between pixel value sums of rectangular blocks are extracted from images as features, the features more suitable for discriminating objects and non-objects are selected from the extracted features to form weak classifiers through the AdaBoost method, and the weak classifiers are merged to form a strong classifier. This kind of method is suitable for detecting objects such as human faces in images, but its robustness in detecting objects such as human bodies is not high.
In view of the above deficiencies of the prior art, the present invention is intended to provide a method of and an apparatus for generating a classifier, and a method of and an apparatus for classifying images, so as to increase the robustness of object detection in images.
According to an embodiment of the present invention, a method of generating a classifier for discriminating object images from non-object images includes: extracting from each of a plurality of input images a set of features as a feature vector, wherein the extracting comprises: for each of the features in the feature vector, determining a plurality of first areas arranged along a direction of a first axis, and a plurality of second areas arranged along a direction of a second axis intersecting the first axis at an intersection; calculating a first difference of pixel value sums or mean values of the plurality of first areas, and a second difference of pixel value sums or mean values of the plurality of second areas; and calculating a gradient intensity and a gradient orientation based on the first difference and the second difference to form each of the features; and training the classifier according to the extracted feature vectors.
According to another embodiment of the present invention, an apparatus for generating a classifier for discriminating object images from non-object images is provided, wherein the apparatus extracts from each of a plurality of input images a set of features as a feature vector, and wherein the apparatus comprises: a determining unit which, for each of the features in the feature vector, determines a plurality of first areas arranged along a direction of a first axis, and a plurality of second areas arranged along a direction of a second axis intersecting the first axis at an intersection; a difference calculating unit which calculates a first difference of pixel value sums or mean values of the plurality of first areas, and a second difference of pixel value sums or mean values of the plurality of second areas; a gradient calculating unit which calculates a gradient intensity and a gradient orientation based on the first difference and the second difference to form each of the features; and a training unit for training the classifier according to the extracted feature vectors.
According to the above embodiments of the present invention, because the features including a gradient orientation and a gradient intensity are calculated based on pixels of areas arranged in two directions, the extracted features can reflect the distribution of object edges in respective image portions more faithfully. The classifiers generated based on such features can be used to detect objects such as humans or animals, especially those with various postures, in images more robustly.
Further, in the above methods and apparatuses, respective areas may be rectangular, where respective first areas are adjoined, and respective second areas are also adjoined.
In the above methods and apparatuses, in the case where the numbers of the first areas and of the second areas are both two, the first areas are adjoined and the second areas are adjoined, the intersection of the first axis and the second axis is located on a connecting line along which the first areas adjoin, or within a predetermined range from a connecting point at which the first areas adjoin, and is located on a connecting line along which the second areas adjoin, or within a predetermined range from a connecting point at which the second areas adjoin.
In the above methods and apparatuses, in the case where the numbers of the first areas and of the second areas are both two, the first areas are separated from each other and the second areas are separated from each other, the intersection of the first axis and the second axis is located within a predetermined range from the middle point between the respective center positions of the first areas, and within a predetermined range from the middle point between the respective center positions of the second areas.
In the above methods and apparatuses, in the case where the numbers of the first areas and of the second areas are both three, the intersection of the first axis and the second axis is located in the intermediate one of the first areas and in the intermediate one of the second areas.
In the above methods and apparatuses, the difference between the area arrangements on which at least two of the features are based comprises one or more of the following: the relative positional relation of the areas, the shape of the areas, the size of the areas, and the aspect ratio of the areas. This enriches the features under consideration, thereby facilitating the selection of features suitable for discriminating objects and non-objects.
In the above methods and apparatuses, the features of at least one dimension in a plurality of feature vectors are transformed, where the transformed features include a gradient orientation and a gradient intensity, and the transforming comprises transforming the gradient orientation into the one of a plurality of predetermined intervals that contains the gradient orientation of the feature. With respect to each of the at least one dimension, a classifier including sub-classifiers corresponding to the predetermined intervals is generated, where for each of the predetermined intervals, a threshold for the corresponding sub-classifier is obtained based on the distribution of the gradient intensities of those features of the feature vectors which are in the dimension and whose interval is the predetermined interval.
According to another embodiment of the present invention, a method of classifying an image includes: extracting from the image a set of features as a feature vector, wherein the extracting comprises: for each of the features in the feature vector, determining a plurality of first areas arranged along a direction of a first axis, and a plurality of second areas arranged along a direction of a second axis intersecting the first axis at an intersection; calculating a first difference of pixel value sums or mean values of the plurality of first areas, and a second difference of pixel value sums or mean values of the plurality of second areas; and calculating a gradient intensity and a gradient orientation based on the first difference and the second difference to form each of the features; and classifying the image according to the extracted feature vector.
According to another embodiment of the present invention, an apparatus for classifying an image includes: a feature extracting device for extracting from the image a set of features as a feature vector, comprising: a determining unit which, for each of the features in the feature vector, determines a plurality of first areas arranged along a direction of a first axis, and a plurality of second areas arranged along a direction of a second axis intersecting the first axis at an intersection; a difference calculating unit which calculates a first difference of pixel value sums or mean values of the plurality of first areas, and a second difference of pixel value sums or mean values of the plurality of second areas; and a gradient calculating unit which calculates a gradient intensity and a gradient orientation based on the first difference and the second difference to form each of the features; and a classifying unit which classifies the image according to the extracted feature vector.
In the above methods and apparatuses, as described above, because the gradients of portions in the image are calculated based on pixels of a plurality of areas, the extracted features can reflect the distribution of object edges in the respective image portions more completely, and are less affected by changes in object posture. The classifiers generated based on such features can be used to detect objects such as humans or animals, especially those with various postures, in images more robustly.
In the above methods and apparatuses, the areas may be rectangular, wherein the first areas are adjoined, and the second areas are adjoined too.
In the above methods and apparatuses, in the case where the numbers of the first areas and of the second areas are both two, the first areas are adjoined and the second areas are adjoined, the intersection of the first axis and the second axis is located on a connecting line along which the first areas adjoin, or within a predetermined range from a connecting point at which the first areas adjoin, and is located on a connecting line along which the second areas adjoin, or within a predetermined range from a connecting point at which the second areas adjoin.
In the above methods and apparatuses, in the case where the numbers of the first areas and of the second areas are both two, the first areas are separated from each other and the second areas are separated from each other, the intersection of the first axis and the second axis is located within a predetermined range from the middle point between the respective center positions of the first areas, and within a predetermined range from the middle point between the respective center positions of the second areas.
In the above methods and apparatuses, in the case where the numbers of the first areas and of the second areas are both three, the intersection of the first axis and the second axis is located in the intermediate one of the first areas and in the intermediate one of the second areas.
Further, in the above methods and apparatuses, the difference between the area arrangements on which at least two of the features are based comprises one or more of the following: the relative positional relation of the areas, the shape of the areas, the size of the areas, and the aspect ratio of the areas. This enriches the features under consideration, thereby facilitating the selection of features suitable for discriminating objects and non-objects.
Further, in the above methods and apparatuses, the classifying of the image comprises: for the gradient orientation and gradient intensity of each of the features, determining the one of a plurality of gradient orientation intervals that contains the gradient orientation of the feature, wherein each of the gradient orientation intervals has a corresponding threshold; comparing the gradient intensity of the feature with the threshold corresponding to the determined gradient orientation interval; and generating a classification result according to the comparison result.
The above and/or other aspects, features and/or advantages of the present invention will be easily appreciated in view of the following description by referring to the accompanying drawings. In the accompanying drawings, identical or corresponding technical features or components are represented with identical or corresponding reference numbers. In the accompanying drawings, sizes and relative positions of elements are not necessarily drawn to scale.
FIG. 3a illustrates an example of the distribution of outline edges of an object (human body).
FIGS. 3b and 3c are schematic diagrams respectively illustrating how to determine first areas and second areas in the portion illustrated in FIG. 3a.
FIG. 4a is a schematic diagram illustrating object outline edges included in the portion 302 illustrated in FIG. 3a.
FIG. 4b is a schematic diagram illustrating the gradient calculated by the gradient calculating unit from the first difference and the second difference calculated by the difference calculating unit based on the first areas and the second areas illustrated in FIGS. 3b and 3c.
The embodiments of the present invention are described below by referring to the drawings. It is to be noted that, for the purpose of clarity, representations and descriptions of components and processes which are known to those skilled in the art but unrelated to the present invention are omitted from the drawings and the description.
As illustrated in FIG. 1, the apparatus 100 for generating a classifier for discriminating object images from non-object images comprises a determining unit 101, a difference calculating unit 102, a gradient calculating unit 103 and a training unit.
In the technique of employing static image features to create a classifier, object images and non-object images are collected, features are extracted from the collected object images and non-object images, and the extracted features are filtered and merged by using filtering methods such as AdaBoost to obtain a classifier for discriminating object images and non-object images. A method of collecting and preparing such object images and non-object images has been disclosed in patent application WO 2008/151470, Ding et al., “A Robust Human Face Detecting Method In Complicated Background Image” (see page 2 to page 3 of the description). The object images and the non-object images as collected and prepared may serve as input images to the apparatus 100. The apparatus 100 extracts a group of features from each of a plurality of input images as a feature vector.
For each of the features in the feature vector, the determining unit 101 determines a plurality of first areas arranged along the direction of a first axis, and a plurality of second areas arranged along the direction of a second axis intersecting the first axis at an intersection (for example, in a right angle or a non-right angle).
The features to be extracted are based on pixels in the input image. The determining unit 101 is adapted to determine the pixels in the input image on which each feature to be extracted is based. The determining unit 101 may determine these pixels according to a predetermined area arrangement.
The arrangement of the first areas and the second areas may be various. In an example, the weighted mean position of the positions of pixels in the plurality of first areas and the weighted mean position of the positions of pixels in the plurality of second areas fall within a predetermined range from the intersection of the first axis and the second axis. Specifically, taking the first areas as an example, the positions of pixels in the first areas may be represented as (xij, yij), wherein xij represents the coordinate of the j-th pixel of the i-th first area on the first axis (i.e., the X-axis), and yij represents the coordinate of the j-th pixel of the i-th first area on the second axis (i.e., the Y-axis). The weighted mean position (xa, ya) of the positions of pixels in the first areas may be defined as follows:
wherein N is the number of the first areas, Mi is the number of pixels in the i-th first area, and wi is the weight of the i-th first area.
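The equation referenced above is not reproduced in the text. A plausible reconstruction, offered only under the assumption that each pixel position is weighted by the weight wi of the first area it belongs to, is:

```latex
x_a = \frac{\sum_{i=1}^{N} w_i \sum_{j=1}^{M_i} x_{ij}}{\sum_{i=1}^{N} w_i M_i},
\qquad
y_a = \frac{\sum_{i=1}^{N} w_i \sum_{j=1}^{M_i} y_{ij}}{\sum_{i=1}^{N} w_i M_i}.
```

Under identical weights this reduces to the plain mean of all pixel positions in the first areas.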
Further or alternatively, in the above example, the weights of all the first areas may be identical, or may be at least in part different. In the case of different weights, it is possible to allocate smaller weights to first areas including more pixels, and larger weights to first areas including fewer pixels.
Although the description has been provided by taking the first areas as an example in the above, the above description is also applicable to the second areas.
In another example, the areas may be rectangular, wherein the first areas are adjoined, and the second areas are adjoined too.
According to one area arrangement, the numbers of the first areas and of the second areas are both two, the first areas are adjoined and the second areas are adjoined. According to this arrangement, the intersection of the first axis and the second axis is located on a connecting line along which the first areas adjoin, or within a predetermined range from a connecting point at which the first areas adjoin (for example, when vertex points of rectangular areas are adjoined, substantially coinciding with that point), and is located on a connecting line along which the second areas adjoin, or within a predetermined range from a connecting point at which the second areas adjoin.
FIGS. 2a and 2b illustrate examples of this area arrangement.
According to another area arrangement, the numbers of the first areas and of the second areas are both two, the first areas are separated from each other and the second areas are separated from each other. According to this arrangement, the intersection of the first axis and the second axis is located within a predetermined range from the middle point between the respective center positions of the first areas, and within a predetermined range from the middle point between the respective center positions of the second areas.
FIGS. 2c and 2d illustrate examples of this area arrangement.
FIGS. 2g and 2h illustrate further examples of this area arrangement.
According to another area arrangement, the numbers of the first areas and of the second areas are both three. According to this arrangement, the intersection of the first axis and the second axis is located in the intermediate one of the first areas and in the intermediate one of the second areas.
FIGS. 2e and 2f illustrate examples of this area arrangement.
It should be noted that the shape of the first areas and the second areas is not limited to a rectangle, and may be another shape such as a polygon, a triangle, a circle, a ring, or an irregular shape. The shapes of the first areas and the second areas may also differ from each other, and within the feature area for the same feature, the shapes of different first areas or of different second areas may also differ.
In addition, in the case of rectangular shapes, the sides of different first areas may be parallel to each other, or may be rotated by an angle relative to each other. Likewise, the sides of different second areas may be parallel to each other, or may be rotated by an angle relative to each other. In the case of rectangular shapes, the adjoining of rectangular areas comprises the case where the rectangular areas are adjoined via respective sides (i.e., the intersection of the first axis and the second axis is located on these sides), and the case where the rectangular areas are adjoined via vertex points of respective corners (i.e., the intersection of the first axis and the second axis is located at these vertex points).
It should also be noted that the number of first areas arranged in the direction of the first axis and the number of second areas arranged in the direction of the second axis are not limited to the numbers illustrated in the drawings.
It should also be noted that within the feature area for the same feature, the relative positional relation of the first areas and the relative positional relation of the second areas may be arbitrary. For example, the first areas arranged in the direction of the first axis may be adjoined, separated, or partly adjoined and partly separated, and the second areas arranged in the direction of the second axis may be adjoined, separated, or partly adjoined and partly separated, as long as the weighted mean position of the positions of pixels in the first areas and the weighted mean position of the positions of pixels in the second areas fall within a predetermined range from the intersection of the first axis and the second axis.
In the collected object images, the outline edges of objects present characteristics distinct from those of non-objects. The outline edges of objects in the object images may have various distributions. To extract features sufficient to reflect the outline edges of objects, the determining unit 101 may determine first areas and second areas in portions of different sizes and at different positions in the input image, to obtain features of the edge outlines in those portions.
FIG. 3a illustrates an example of the distribution of outline edges of an object (human body).
FIGS. 3b and 3c are schematic diagrams respectively illustrating how to determine first areas and second areas in the portion 302 illustrated in FIG. 3a.
In an embodiment, the determining unit 101 may determine first areas and second areas at different positions in the input image according to an area arrangement. New area arrangements are then obtained by changing area size and/or area aspect ratio in this area arrangement, and first areas and second areas are determined at different positions in the input image based on the new area arrangements. This process is repeated until all the possible area sizes or area aspect ratios have been attempted for this area arrangement.
In addition or alternatively, in the above embodiments, the determining unit 101 may obtain new area arrangements by changing relative position relation of areas in the area arrangement.
In addition or alternatively, in the above embodiments, the determining unit 101 may obtain new area arrangements by changing the number of areas in the area arrangement.
In addition or alternatively, in the above embodiments, the determining unit 101 may obtain new area arrangements by changing the shape of areas in the area arrangement.
First areas and second areas determined by the determining unit 101 based on one position of an area arrangement in the input image determine one feature to be extracted. In brief, area arrangements of feature areas on which at least two features in a feature vector are based are different. For example, the difference between the area arrangements on which at least two of the features are based comprises one or more of the followings: relative positional relation of the areas, shape of the areas, size of the areas and aspect ratio of the areas.
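Purely as an illustrative sketch of how such candidate feature areas might be enumerated, the fragment below scans one assumed arrangement (two blocks adjoined along each axis) over all positions and a few block sizes; the block sizes, the step and the function names are assumptions, not values taken from this description.

```python
from itertools import product

def enumerate_arrangements(img_w, img_h,
                           block_sizes=((4, 4), (8, 4), (4, 8), (8, 8)),
                           step=4):
    """Enumerate candidate positions of one simple area arrangement:
    two blocks adjoined along the X axis and two blocks adjoined along
    the Y axis, all meeting at the intersection point (cx, cy).

    Yields tuples (cx, cy, bw, bh), where bw and bh are the block width
    and height. Illustrative only; the description allows many other
    arrangements (separated blocks, three blocks per axis, other shapes).
    """
    for (bw, bh), cx, cy in product(block_sizes,
                                    range(0, img_w + 1, step),
                                    range(0, img_h + 1, step)):
        # keep only arrangements that fit entirely inside the image
        if cx - bw >= 0 and cx + bw <= img_w and cy - bh >= 0 and cy + bh <= img_h:
            yield cx, cy, bw, bh
```

Changing the relative positions, the number of blocks, their shapes or their aspect ratios, as described above, simply adds further loops of the same kind.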
Returning to FIG. 1, for the first areas and the second areas determined by the determining unit 101, the difference calculating unit 102 calculates a first difference dx between the pixel value sums or mean values (grey scale) of the first areas, and a second difference dy between the pixel value sums or mean values of the second areas.
For example, with respect to the area arrangement illustrated in FIG. 2a, the first difference and the second difference may be calculated as follows:
The first difference=pixel value sum or mean value of the rectangular block 202−pixel value sum or mean value of the rectangular block 201,
The second difference=pixel value sum or mean value of the rectangular block 204−pixel value sum or mean value of the rectangular block 203.
For another example, with respect to the area arrangement illustrated in FIG. 2c, the first difference and the second difference may be calculated as follows:
The first difference=pixel value sum or mean value of the rectangular block 206−pixel value sum or mean value of the rectangular block 205,
The second difference=pixel value sum or mean value of the rectangular block 208−pixel value sum or mean value of the rectangular block 207.
For another example, with respect to the area arrangement illustrated in FIG. 2e, the first difference and the second difference may be calculated as follows:
The first difference=pixel value sum or mean value of the rectangular block 209+pixel value sum or mean value of the rectangular block 211−2×(pixel value sum or mean value of the rectangular block 210),
The second difference=pixel value sum or mean value of the rectangular block 212+pixel value sum or mean value of the rectangular block 214−2×(pixel value sum or mean value of the rectangular block 213).
For another example, with respect to the area arrangement illustrated in FIG. 2g, the first difference and the second difference may be calculated as follows:
The first difference=pixel value sum or mean value of the rectangular block 216−pixel value sum or mean value of the rectangular block 215,
The second difference=pixel value sum or mean value of the rectangular block 218−pixel value sum or mean value of the rectangular block 217.
The difference between the pixel value sums or mean values (grey scale) of the areas on an axis is calculated for the purpose of obtaining information reflecting the change in pixel grey scale in the direction of the corresponding axis. For different area arrangements, there are corresponding methods of calculating the first difference and the second difference, as long as they are able to reflect this change.
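For illustration, the block sums appearing in the above differences can be read from an integral image (summed-area table). The sketch below assumes the two-adjoined-block arrangement and uses hypothetical helper names; it is not the implementation prescribed by this description.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended."""
    return np.pad(img.astype(np.int64), ((1, 0), (1, 0))).cumsum(0).cumsum(1)

def block_sum(ii, x0, y0, x1, y1):
    """Pixel sum over the rectangle [x0, x1) x [y0, y1) from integral image ii."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

def adjoined_differences(ii, cx, cy, bw, bh):
    """First and second differences for blocks adjoined at (cx, cy):
    dx = sum(right block) - sum(left block) along the first (X) axis,
    dy = sum(lower block) - sum(upper block) along the second (Y) axis."""
    dx = (block_sum(ii, cx, cy - bh // 2, cx + bw, cy + bh // 2)
          - block_sum(ii, cx - bw, cy - bh // 2, cx, cy + bh // 2))
    dy = (block_sum(ii, cx - bw // 2, cy, cx + bw // 2, cy + bh)
          - block_sum(ii, cx - bw // 2, cy - bh, cx + bw // 2, cy))
    return dx, dy
```

The same block_sum helper covers the separated and three-block arrangements; only the rectangles and the signs of the terms change.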
Returning to FIG. 1, the gradient calculating unit 103 calculates a gradient intensity and a gradient orientation based on the first difference and the second difference calculated by the difference calculating unit 102, to form a feature to be extracted.
It is possible to calculate the gradient orientation and the gradient intensity according to the following equations:
According to the above equation (1), the gradient orientation has an angle range from 0 to 180 degrees. In an alternative embodiment, it is possible to calculate the gradient orientation according to the following equation:
According to the above equation (1′), the gradient orientation has an angle range from 0 to 360 degrees.
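The text of equations (1), (1′) and (2) is not reproduced above. A plausible reconstruction consistent with the stated angle ranges, with dx and dy denoting the first and second differences, is given below; the exact intensity formula (for example the Euclidean norm rather than |dx|+|dy|) is an assumption.

```latex
\theta  = \arctan\left(\frac{dy}{dx}\right) \in [0^{\circ}, 180^{\circ}) \qquad (1)
\theta' = \operatorname{atan2}(dy, dx)      \in [0^{\circ}, 360^{\circ}) \qquad (1')
I       = \sqrt{dx^{2} + dy^{2}}                                         \qquad (2)
```

In equation (1), negative values of the arctangent are shifted by 180 degrees so that the orientation falls in the stated range.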
FIG. 4a is a schematic diagram illustrating the object outline edges included in the portion 302 illustrated in FIG. 3a.
FIG. 4b is a schematic diagram illustrating the gradient orientation calculated by the gradient calculating unit 103 from the first difference and the second difference calculated by the difference calculating unit 102 based on the first areas and the second areas illustrated in FIGS. 3b and 3c.
Because each feature, including a gradient orientation and a gradient intensity, is calculated based on pixels of areas arranged in two directions and co-located, the extracted features can reflect the distribution of object edges in the respective image portions more faithfully. Accordingly, the classifiers generated based on such features can be used to detect objects such as humans or animals, especially those with various postures, in images more robustly.
All the features extracted for each input image form one feature vector.
Returning to FIG. 1, the training unit trains the classifier according to the feature vectors extracted from the input images.
It is possible to train the classifier through a machine learning method such as SVM (support vector machine) based on the feature vectors obtained in the above embodiments, by adopting the histogram of oriented gradients. Such methods of training classifiers based on gradient features are described in literatures such as Dalal et al., “Histograms of Oriented Gradients for Human Detection”, Proc. of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005: 886-893, and Triggs et al., “Human Detection Using Oriented Histograms of Flow and Appearance”, Proc. European Conference on Computer Vision, 2006.
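Purely as an illustration of this kind of training (not the exact formulation of the cited literature or of this description), the gradient orientations and intensities can be pooled into a HOG-like histogram per image and passed to a linear SVM; the library, bin count and normalization below are assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

def orientation_histogram(features, n_bins=9):
    """Accumulate each feature's gradient intensity into the bin that
    contains its gradient orientation (assumed to lie in [0, 180) degrees)."""
    hist = np.zeros(n_bins)
    for intensity, orientation in features:
        hist[int(orientation // (180.0 / n_bins)) % n_bins] += intensity
    return hist / (np.linalg.norm(hist) + 1e-6)  # simple normalization

def train_svm(feature_sets, labels):
    """feature_sets: one list of (intensity, orientation) pairs per image;
    labels: 1 for object images, 0 for non-object images."""
    X = np.array([orientation_histogram(f) for f in feature_sets])
    return LinearSVC(C=1.0).fit(X, np.asarray(labels))
```

A full histogram-of-oriented-gradients descriptor would use per-cell histograms with block normalization; the single per-image histogram here only illustrates the principle.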
As illustrated in FIG. 5, the method 500 of generating a classifier for discriminating object images from non-object images starts at step 501.
As described by referring to the embodiment of FIG. 1, at step 503, for each of the features in the feature vector, a plurality of first areas arranged along the direction of a first axis, and a plurality of second areas arranged along the direction of a second axis intersecting the first axis at an intersection, are determined.
The arrangement of the first areas and the second areas may be that described in connection with the embodiment of FIG. 1.
At step 503, it is possible to determine first areas and second areas in portions with different sizes and at different positions in the input image, to obtain features of edge outlines in the portions.
In a modification of the method 500, at step 503, it is possible to determine first areas and second areas at different positions in the input image according to an area arrangement. New area arrangements are then obtained by changing area size and/or area aspect ratio in this area arrangement, and first areas and second areas are determined at different positions in the input image based on the new area arrangements. This process is repeated until all the possible area sizes or area aspect ratios have been attempted for this area arrangement.
In addition or alternatively, in the above embodiments, at step 503, it is possible to obtain new area arrangements by changing relative position relation of areas in the area arrangement.
In addition or alternatively, in the above embodiments, at step 503, it is possible to obtain new area arrangements by changing the number of areas in the area arrangement.
In addition or alternatively, in the above embodiments, at step 503, it is possible to obtain new area arrangements by changing the shape of areas in the area arrangement.
At step 503, first areas and second areas determined based on one position of an area arrangement in the input image determine one feature to be extracted. In brief, area arrangements of feature areas on which at least two features in a feature vector are based are different. For example, the difference between the area arrangements on which at least two of the features are based comprises one or more of the followings: relative positional relation of the areas, shape of the areas, size of the areas and aspect ratio of the areas.
At step 505, a first difference of the pixel value sums or mean values of the plurality of first areas, and a second difference of the pixel value sums or mean values of the plurality of second areas, are calculated. It is possible to calculate the first difference and the second difference through the method described in connection with the embodiment of FIG. 1.
Then at step 507, a gradient intensity and a gradient orientation are calculated based on the first difference and the second difference as calculated to form a feature to be extracted. It is possible to calculate the gradient orientation and the gradient intensity according to equations (1) (or (1′)) and (2).
At step 509 then, it is determined whether there is any feature not extracted for the present input image. If there is a candidate feature not extracted, the process returns to step 503 to extract the next candidate feature; if otherwise, the process proceeds to step 511.
At step 511, it is determined whether there is any input image with feature vectors not extracted. If there is an input image with feature vectors not extracted, the process returns to step 503 to extract the feature vectors of the next input image; if otherwise, the process proceeds to step 513.
In the method 500, because the features including a gradient orientation and a gradient intensity are calculated based on pixels of areas arranged in two directions and co-located, the extracted features can reflect the distribution of object edges in respective image portions more faithfully. Accordingly, the classifiers generated based on such features can be used to detect objects such as humans or animals, especially those with various postures, in images more robustly.
All the features extracted for each input image form one feature vector.
At step 513, the classifier is trained according to the extracted feature vectors.
It is possible to train the classifier through a machine learning method such as SVM (support vector machine) based on the feature vectors obtained in the above embodiments, by adopting the histogram of oriented gradients. Such methods of training classifiers based on gradient features are described in literatures such as Dalal et al., “Histograms of Oriented Gradients for Human Detection”, Proc. of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005: 886-893, and Triggs et al., “Human Detection Using Oriented Histograms of Flow and Appearance”, Proc. European Conference on Computer Vision, 2006.
The method 500 ends at step 515.
As will be described below, it is also possible to train the classifier based on the gradient features obtained in the above embodiments without adopting the histogram of oriented gradients.
As illustrated in FIG. 6, the apparatus for generating a classifier comprises a transforming unit 601 and a classifier generating unit 602.
The transforming unit 601 transforms the features of at least one dimension in a plurality of feature vectors, where the transformed features include a gradient orientation and a gradient intensity. For example, the feature vectors may be those generated in the embodiments described with reference to FIG. 1 and FIG. 5.
For example, the gradient orientation has an angle range from 0 to 180 degrees (i.e., the angle range covered by the plurality of predetermined intervals). It is possible to divide this range into a number of predetermined intervals (also called gradient orientation intervals), for example, into three intervals from 0 to 60 degrees, from 60 to 120 degrees, and from 120 to 180 degrees. Of course, other divisions are also possible, and the angle range of the gradient orientation may also be 360 degrees. Preferably, the number of predetermined intervals ranges from 3 to 15. A larger number of predetermined intervals makes the angle division finer and is more advantageous for achieving a stronger classification ability (a lower error rate); however, it is prone to cause over-learning in detection, degrading the classification performance. A smaller number of predetermined intervals makes the angle division coarser and yields a weaker classification ability; however, it results in lower sensitivity to changes in angle and is more advantageous for improving robustness against posture changes. It is possible to determine the number of predetermined intervals by making a trade-off between classification ability and posture robustness according to the demands of the specific implementation.
The transforming unit 601 transforms the gradient orientation of a feature into the interval within which that gradient orientation falls.
It is assumed that there are N predetermined intervals, and feature vectors are represented as <f1, . . . , fM>, where fi includes a gradient intensity Ii and a gradient orientation Oi. For a feature fi to be transformed, the transformed feature is represented as f′i, where f′i includes the gradient intensity Ii and an interval Ri.
It is possible to generate a classifier corresponding to a dimension based on the features fi in that dimension of the feature vectors. The classifier may be represented as hi(I, O), where I represents the gradient intensity and O represents the gradient orientation. The classifier includes N sub-classifiers hij(I), 0<j<N+1, corresponding to the N predetermined intervals Kj respectively, for performing classification on features whose gradient orientations fall within the corresponding predetermined intervals. Each sub-classifier hij(I) has a corresponding threshold θij, and classes aij and bij (object, non-object) determined according to the threshold. The processing of hij(I) may be represented as: if I<θij, then hij(I)=aij; otherwise hij(I)=bij. For each sub-classifier hij(I), it is possible to obtain the threshold θij by learning, based on the distribution of the gradient intensities of the features having the same interval Ri as the interval Kj among the features f′i of the transformed feature vectors, and thereby obtain the classes aij and bij.
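A minimal sketch of learning one such classifier hi is given below, under the assumption that each per-interval threshold θij and the classes aij, bij are chosen to minimize the weighted error of a decision stump over the training features whose interval is Kj; the description only states that the threshold is learned from the intensity distribution, so the concrete criterion and all names here are illustrative.

```python
import numpy as np

def train_interval_stumps(intensities, intervals, labels, weights, n_intervals):
    """Learn, for each gradient orientation interval, a threshold on the
    gradient intensity and the classes assigned below and above it.

    intensities, intervals, labels, weights: 1-D arrays over training samples,
    with intervals[k] in {0, ..., n_intervals - 1} and labels[k] in {+1, -1}
    (+1 for object, -1 for non-object).
    Returns one (threshold, class_below, class_above) triple per interval.
    """
    stumps = []
    for j in range(n_intervals):
        mask = intervals == j
        I, y, w = intensities[mask], labels[mask], weights[mask]
        best, best_err = (0.0, -1, -1), np.inf  # default: classify as non-object
        for thr in np.unique(I):
            for below, above in ((-1, 1), (1, -1)):
                pred = np.where(I < thr, below, above)
                err = w[pred != y].sum()
                if err < best_err:
                    best_err, best = err, (float(thr), below, above)
        stumps.append(best)
    return stumps

def apply_stump(stumps, intensity, interval):
    thr, below, above = stumps[interval]
    return below if intensity < thr else above
```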
With respect to each of the at least one dimension, the classifier generating unit 602 generates a classifier including sub-classifiers corresponding to the predetermined intervals respectively, where for each of the predetermined intervals, a threshold for the corresponding sub-classifier is obtained based on the distribution of gradient intensity of features of the feature vectors, which are in the dimension and have the same interval as the predetermined interval, and the class determined based on the threshold is obtained. Alternatively, it is also possible to obtain a measure on the reliability of the determined class.
In a simple implementation, it is possible to perform the transformation and the classifier generation for only one dimension, and the generated classifier functions as the classifier for distinguishing object images and non-object images.
Preferably, the above at least one dimension may include at least two or all the dimensions of the feature vectors. In this case, it is possible to generate a classifier corresponding to each dimension respectively, and obtain a final classifier based on the generated classifiers.
It is possible to combine the classifiers corresponding to the dimensions into the final classifier through a known method. For example, the AdaBoost method can be used to merge the classifiers generated for the respective dimensions into a new strong classifier.
In the AdaBoost method, a weight is set for each sample, and the classifiers are combined through an iterative method. In each iteration, when samples are correctly classified by the selected classifier, the weights of these samples are reduced; when samples are wrongly classified, their weights are increased, so that the learning algorithm focuses on the more difficult training samples in subsequent iterations and finally obtains a classifier with high recognition accuracy.
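A compact sketch of this weight-update loop (discrete AdaBoost over a pool of pre-computed weak classifiers; the exact weighting scheme of the cited method may differ) is:

```python
import numpy as np

def adaboost(predictions, labels, n_rounds):
    """predictions: array of shape (n_weak, n_samples) with values in {+1, -1};
    labels: array of shape (n_samples,) with values in {+1, -1}.
    Returns a list of (weak_classifier_index, alpha) pairs."""
    n_weak, n_samples = predictions.shape
    w = np.full(n_samples, 1.0 / n_samples)
    ensemble = []
    for _ in range(n_rounds):
        errors = np.array([w[predictions[k] != labels].sum() for k in range(n_weak)])
        k = int(errors.argmin())
        eps = min(max(errors[k], 1e-10), 1.0 - 1e-10)  # avoid division by zero
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        # decrease weights of correctly classified samples, increase the others
        w *= np.exp(-alpha * labels * predictions[k])
        w /= w.sum()
        ensemble.append((k, float(alpha)))
    return ensemble
```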
Such a technique for selecting and merging a plurality of classifiers to form a final classifier has been disclosed in Paul Viola and Michael Jones, “Robust Real-time Object Detection”, Second International Workshop On Statistical And Computational Theories Of Vision-Modeling, Learning, Computing, And Sampling, Vancouver, Canada, Jul. 13, 2001.
In a preferred embodiment, one of the predetermined intervals represents weak gradients. In this case, if the gradient intensity of a feature is smaller than a predetermined threshold, the transforming unit 601 transforms the gradient orientation into the interval representing weak gradients. Regardless of the gradient intensity, the sub-classifier corresponding to the interval representing weak gradients classifies the corresponding features as non-object.
As illustrated in FIG. 7, the method 700 of generating a classifier starts at step 701. At step 703, the features of at least one dimension in a plurality of feature vectors are transformed, where the transformed features include a gradient orientation and a gradient intensity, and the gradient orientation of each feature is transformed into the one of a plurality of predetermined intervals within which it falls.
At step 705, with respect to the present dimension of the transformed feature vectors, a classifier including sub-classifiers corresponding to the predetermined intervals respectively is generated, where for each of the predetermined intervals, a threshold for the corresponding sub-classifier is obtained based on the distribution of gradient intensity of features of the feature vectors, which are in the present dimension and have the same interval as the predetermined interval, and the class determined based on the threshold is obtained. Alternatively, it is also possible to obtain a measure on the reliability of the determined class.
At step 707, it is determined whether there is any dimension with no classifier being generated. If any, the method returns to step 705 to generate the classifier for the next dimension; otherwise the method ends at step 709.
In a simple implementation, it is possible to perform the transformation and the classifier generation for only one dimension, and the generated classifier functions as the classifier for distinguishing object images and non-object images.
Preferably, the above at least one dimension may include at least two or all the dimensions of the feature vectors. In this case, it is possible to generate a classifier corresponding to each dimension respectively, and obtain a final classifier based on the generated classifiers.
It is possible to combine the classifiers corresponding to the dimensions into the final classifier through a known method. For example, the AdaBoost method proposed by Paul Viola et al. may be used to form a final classifier based on the generated classifiers.
In a preferred embodiment, one of the predetermined intervals represents weak gradients. In this case, at step 703, if the gradient intensity of a feature is smaller than a predetermined threshold, the gradient orientation is transformed into the interval representing weak gradients. Regardless of the gradient intensity, the sub-classifier corresponding to the interval representing weak gradients classifies the corresponding features as non-object.
As illustrated in FIG. 8, the apparatus 800 for classifying an image comprises a feature extracting device including a determining unit 801, a difference calculating unit 802 and a gradient calculating unit 803, and a classifying unit 804.
The images input to the apparatus 800 may be those of a predetermined size obtained from the images to be processed through a scanning window. The images may be obtained through a method disclosed in patent application WO 2008/151470, Ding et al., “A Robust Human Face Detecting Method In Complicated Background Image” (see page 5 of the description).
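As a simple illustration of such scanning (the window size and step below are assumptions, not values from the cited application):

```python
def scan_windows(image_h, image_w, win_h=64, win_w=32, step=8):
    """Yield the (top, left) corner of every window of size win_h x win_w
    placed inside an image of size image_h x image_w."""
    for top in range(0, image_h - win_h + 1, step):
        for left in range(0, image_w - win_w + 1, step):
            yield top, left
```

In practice the image to be processed is usually also rescaled several times so that objects of different sizes fit the fixed window size.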
In this embodiment, the feature vector to be extracted is the one on which the classifier(s) used by the classifying unit 804 is based.
For each of the features in the feature vector, the determining unit 801 determines a plurality of first areas arranged along the direction of a first axis, and a plurality of second areas arranged along the direction of a second axis intersecting the first axis at an intersection (for example, in a right angle or a non-right angle).
The area arrangements of the first areas and the second areas which the determining unit 801 is based on may be that described in connection with the determining unit 101 in the above.
For the first areas and the second areas determined by the determining unit 801 according to each position of each area arrangement in the input image, the difference calculating unit 802 calculates a first difference dx between the pixel value sums or mean values (grey scale) of the first areas, and a second difference dy between the pixel value sums or mean values of the second areas. It is possible to calculate the first difference and the second difference by using the method described in connection with the difference calculating unit 102 in the above.
The gradient calculating unit 803 calculates a gradient intensity and a gradient orientation based on the first difference and the second difference calculated by the difference calculating unit 802 to form a feature to be extracted. It is possible to calculate the gradient intensity and the gradient orientation by using the method described in connection with the gradient calculating unit 103 in the above.
All the features extracted from the input image form one feature vector. The classifying unit 804 classifies the input image according to the extracted feature vector. The classifier adopted by the classifying unit 804 may be one generated in the above embodiments, for example, the classifier generated by adopting the histogram of oriented gradients, or the classifier generated based on the gradient orientation intervals.
As illustrated in FIG. 9, the method 900 of classifying an image starts at step 901.
At step 903, for each of the features in the feature vector, a plurality of first areas arranged along the direction of a first axis, and a plurality of second areas arranged along the direction of a second axis intersecting the first axis at an intersection (for example, at a right angle or a non-right angle), are determined. The area arrangements of the first areas and the second areas on which step 903 is based may be those described in connection with the determining unit 101 in the above.
At step 905, a first difference of the pixel value sums or mean values of the plurality of first areas, and a second difference of the pixel value sums or mean values of the plurality of second areas, are calculated, for example by using the method described in connection with the difference calculating unit 102 in the above.
Then at step 907, a gradient intensity and a gradient orientation are calculated based on the first difference and the second difference as calculated to form a feature to be extracted. It is possible to calculate the gradient orientation and the gradient intensity according to equations (1) (or (1′)) and (2).
At step 909 then, it is determined whether there is any feature not extracted for the present input image. If there is a feature not extracted, the process returns to step 903 to extract the next feature; if otherwise, the process proceeds to step 911.
All the features extracted from the input image form one feature vector. At step 911, the input image is classified according to the extracted feature vector. The classifier adopted at step 911 may be one generated in the above embodiments, for example, the classifier generated by adopting the histogram of oriented gradients, or the classifier generated based on the gradient orientation intervals.
The method 900 ends at step 913.
As illustrated in FIG. 10, the classifying unit 804 may comprise a plurality of classifiers corresponding respectively to the features in the feature vector, each classifier including sub-classifiers corresponding to a plurality of gradient orientation intervals.
For each feature in the extracted feature vector, in the corresponding classifier (for example classifier 1001), in case that the gradient orientation of the feature falls within a gradient orientation interval corresponding to a sub-classifier (for example, one of the sub-classifiers 1001-1 to 1001-N), the sub-classifier compares the gradient intensity of the feature with the threshold corresponding to the gradient orientation interval, and generates a classification result based on the comparison result. The classification result may be a class of the image (object, non-object). Alternatively, the classification result may also include the reliability of the image class.
In a unit not illustrated, it is possible to combine, via a known method, the classification results generated by the classifiers based on the corresponding features in the feature vector to form a final classification result. For example, it is possible to adopt the AdaBoost method.
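Combining the pieces, the classification of one input image could be sketched as follows, reusing the hypothetical per-interval stumps and AdaBoost ensemble from the earlier sketches; this is illustrative only, not the exact combination rule of this description.

```python
def classify_image(features, stumps_per_feature, ensemble):
    """features: one (intensity, interval) pair per feature dimension;
    stumps_per_feature[d]: per-interval (threshold, class_below, class_above);
    ensemble: list of (feature_index, alpha) pairs from AdaBoost.
    Returns +1 (object) or -1 (non-object)."""
    score = 0.0
    for d, alpha in ensemble:
        intensity, interval = features[d]
        thr, below, above = stumps_per_feature[d][interval]
        score += alpha * (below if intensity < thr else above)
    return 1 if score >= 0 else -1
```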
As illustrated in FIG. 11, the method 1100 of classifying an image starts at step 1101. At step 1103, for the gradient orientation and gradient intensity of a feature in the extracted feature vector, the one of a plurality of gradient orientation intervals within which the gradient orientation falls is determined, wherein each of the gradient orientation intervals has a corresponding threshold.
At step 1105, the gradient intensity of the feature is compared with the threshold corresponding to the determined gradient orientation interval.
At step 1107, a classification result is generated according to the comparison result. The classification result may be a class of the image (object, non-object). Alternatively, the classification result may also include the reliability of the image class.
At step 1109, it is determined whether there is any feature not processed in the feature vector. If any, the method returns to step 1103 to process the next feature. If no, the method ends at step 1111.
An environment for implementing the apparatus and the method of the present invention is illustrated in FIG. 12.
In FIG. 12, a central processing unit (CPU) 1201 performs various processes according to programs stored in a read only memory (ROM) 1202 or programs loaded from a storage section 1208 to a random access memory (RAM) 1203, in which data required when the CPU 1201 performs the various processes are also stored as required.
The CPU 1201, the ROM 1202 and the RAM 1203 are connected to one another via a bus 1204. An input/output interface 1205 is also connected to the bus 1204.
The following components are connected to the input/output interface 1205: an input section 1206 including a keyboard, a mouse, or the like; an output section 1207 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a loudspeaker or the like; the storage section 1208 including a hard disk or the like; and a communication section 1209 including a network interface card such as a LAN card, a modem, or the like. The communication section 1209 performs a communication process via a network such as the Internet.
A drive 1210 is also connected to the input/output interface 1205 as required. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 1210 as required, so that a computer program read therefrom is installed into the storage section 1208 as required.
In the case where the above-described steps and processes are implemented by software, the program that constitutes the software is installed from a network such as the Internet or from a storage medium such as the removable medium 1211.
One skilled in the art should note that this storage medium is not limited to the removable medium 1211 having the program stored therein as illustrated in FIG. 12.
The present invention is described in the above by referring to specific embodiments. One skilled in the art should understand that various modifications and changes can be made without departing from the scope as set forth in the following claims.
Priority application: No. 200910135298.6, May 2009, CN, national.
PCT filing: PCT/CN2010/072867, filed 5/18/2010, WO, kind 00, 371(c) date 1/5/2012.