1. Field of the Invention
The present invention relates to an apparatus and a method for detecting a multi-view specific object, and more particularly relates to an apparatus and a method for detecting a multi-view specific object which can increase detection speed of the specific object on a condition where detection accuracy is not influenced.
2. Description of the Related Art
A rapid and accurate object detection algorithm is the basis of many applications in the fields of image processing and video content analysis; these applications include, for example, human face detection and affect analysis, video conference control and analysis, and passerby protection systems. The AdaBoost human face detection algorithm can be applied effectively to frontal view human face recognition, and many products on the market are based on this algorithm, for example, digital cameras having a human face detection function. However, with the rapid development of digital cameras and cell phones, a technique that can carry out only frontal view object detection cannot satisfy demand; rapid and accurate multi-view object detection techniques are beginning to attract attention around the world.
In U.S. Pat. No. 7,324,671 B2, an algorithm and an apparatus able to carry out human face detection are disclosed. In this patent, a human face detection system uses a sequence of strong classifiers of gradually increasing complexity to quickly discard non-human-face data at earlier stages (i.e. stages having lower complexity) in a multi-stage classifier structure. The multi-stage classifier structure has a pyramid-like architecture, and uses a coarse-to-fine and simple-to-complex scheme; as a result, by using relatively simple features (i.e. features employed at earlier stages in the multi-stage classifier structure), it is possible to discard a large amount of non-human-face data. In this way, a real-time multi-view human face detection system is achieved. However, the biggest problem of the algorithm is that the pyramid-like architecture includes a large amount of redundant information in the detection process; as a result, both the detection speed and the detection accuracy are adversely affected.
In U.S. Pat. No. 7,457,432 B2, a method and an apparatus able to carry out specific object detection are disclosed. In this patent, HAAR features are employed as weak features. The Real AdaBoost algorithm is employed to train a strong classifier at each stage in a multi-stage classifier structure so as to further improve detection accuracy, and a LUT (i.e. look-up table) data structure is proposed to improve the speed of feature selection. Here it should be noted that “strong classifier” and “weak feature” are well-known concepts in the art. However, one major drawback of this patent is that the method can be applied only to specific object detection within a certain range of angles, i.e., frontal view human face detection is mainly carried out; as a result, its application is limited to some extent.
In International Publication WO No. 2008/151470 A1, a method and an apparatus able to carry out robust human face detection in a complicated background image are disclosed. In this publication, microstructure features having low calculation complexity and high redundancy are adopted to express human face features. The AdaBoost algorithm that is sensitive to loss is adopted to choose the most effective weak features of a human face so as to form a strong classifier at each stage in a multi-stage classifier structure; in this way, human face data and non-human-face data are separated. Since the strong classifier at each stage can reduce the false acceptance rate of non-human-face data as much as possible while ensuring the detection rate, the final classifier structure can realize high-performance human face detection in the complicated background image while having only a simple structure. Here it should be noted that “weak feature” is a well-known concept in the art. However, one major drawback of this publication is that the method can be applied only to specific object detection within a certain range of angles, i.e., frontal view human face detection is mainly carried out; as a result, its application is limited to some extent.
Although a multi-stage classifier structure formed of plural classifiers used to detect angles can achieve multi-view detection in theory, a normal multi-stage classifier structure used to carry out multi-view detection cannot overcome the following two major problems: (1) as the number of the classifiers increases, detection time of the normal multi-stage classifier structure is increased, and then detection speed of the whole detection system becomes slow; as a result, it may be hard or even impossible to achieve real time detection, and (2) it may be hard or even impossible to reach detection accuracy equal to that of single-view object detection carried out under a certain angle; in other words, detection accuracy of the normal multi-stage classifier structure is low.
The present invention is proposed for overcoming the disadvantages of the prior art. The present invention focuses on a key point of determining whether a window image is a specific object image in a specific object detection process so as to provide a multi-view specific object detection apparatus and a multi-view specific object detection method with regard to the key point. In embodiments of the present invention, by utilizing a multi-stage classifier structure formed of plural cascade classifiers, the speed and accuracy of determining whether the window image is the specific object image are improved; in this way, the detection process is sped up, and the detection accuracy is improved at the same time.
According to one aspect of the present invention, a multi-view specific object detection apparatus is provided. The multi-view specific object detection apparatus comprises an input device for inputting image data; and plural cascade classifiers in which each of the plural cascade classifiers is formed of plural stage classifiers corresponding to the same detection angle and corresponding to different features, and each of the plural stage classifiers is used to calculate degree of confidence of the image data for a specific object corresponding to the detection angle based on the aspect of the corresponding feature and used to determine whether the image data belongs to the specific object based on the degree of confidence. Between two stage classifiers in each of the plural cascade classifiers, a self-adaptive posture prediction device is disposed to determine, based on the degree of confidence calculated by the respective plural stage classifiers corresponding to the same detection angle and located before the self-adaptive posture prediction device, whether the image data enters the plural stage classifiers corresponding to the same detection angles and located after the self-adaptive posture prediction device.
According to another aspect of the present invention, a multi-view specific object detection method is provided. The multi-view specific object detection method comprises an inputting step of inputting image data; and plural parallel classification steps in which each of the plural parallel classification steps is sequentially formed of plural sub classification steps corresponding to the same detection angle and corresponding to different features, and each of the plural sub classification steps calculates a degree of confidence of the image data for a specific object corresponding to the detection angle based on the aspect of the corresponding feature and determines whether the image data belongs to the specific object based on the degree of confidence. Between the sub classification steps in each of the plural parallel classification steps, a self-adaptive posture prediction step is executed for determining, based on the degree of confidence calculated by the plural sub classification steps corresponding to the same detection angles and located before the self-adaptive posture prediction step, whether the plural sub classification steps corresponding to the same detection angles and located after the self-adaptive posture prediction step are executed with regard to the image data.
As a result, by adding the self-adaptive posture prediction process, some stage classifiers that are not related to the posture of the image data may be discarded at the earlier stages of the structure so that the determination speed is increased; at the same time, the self-adaptive posture prediction process can ensure that the stage classifiers related to the posture of the image data can be selected to carry out the follow-on determination, so that the determination accuracy is guaranteed. Therefore, according to the embodiments of the present invention, the determination speed of the specific object can be increased on a condition where the determination accuracy is not influenced. Here it should be noted that the posture generally refers to a rotation angle of a specific object with regard to a frontal view image in the art, for example, as shown in
Hereinafter, embodiments of the present invention will be concretely described with reference to the drawings.
Furthermore it should be noted that, in
The input image data enters each of the cascade classifiers. First the stage classifier at the first stage in each of the cascade classifiers calculates a degree of confidence of the image data for a specific object corresponding to the detection angle (i.e. the cascade classifier) based on the aspect of the corresponding feature, and then determines, based on the degree of confidence, whether the image data belongs to the specific object; here the specific object is, for example, a human face. If a stage classifier determines that the image data belongs to a non-human-face, the determination result is F (false), the image data is classed in a non-human-face image group, and the determination of the image data with regard to the corresponding detection angle ends; if a stage classifier determines that the image data belongs to a human face, the determination result is T (true), and the image data enters the next stage classifier corresponding to the detection angle to be determined. This process continues up to the last stage classifier in each of the cascade classifiers. For example, if the stage classifier at the n-th stage determines that the image data belongs to a human face, the determination result is T, and then the image data is classed in a human face image group.
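The stage-by-stage determination within one cascade classifier can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the names `run_cascade` and `StageClassifier`, the toy scoring functions, and the rejection threshold of 0 are all assumptions introduced for the example.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a stage classifier maps a window image to a degree of confidence.
StageClassifier = Callable[[Sequence[float]], float]


def run_cascade(window: Sequence[float],
                stages: List[StageClassifier],
                threshold: float = 0.0) -> Tuple[bool, List[float]]:
    """Pass one window through the stages of a single cascade (one detection angle).

    Returns (is_object, confidences). Evaluation stops at the first stage whose
    degree of confidence falls below `threshold` (determination result F).
    The threshold value of 0 is an assumption made for this sketch.
    """
    confidences: List[float] = []
    for stage in stages:
        conf = stage(window)
        confidences.append(conf)
        if conf < threshold:
            return False, confidences  # F: classed as non-object, cascade ends here
    return True, confidences           # T at every stage: classed as the object


# Toy stage classifiers standing in for trained strong classifiers (assumed).
stages: List[StageClassifier] = [
    lambda w: sum(w) / len(w) - 0.2,
    lambda w: max(w) - 0.5,
    lambda w: min(w) - 0.1,
]

face_like = [0.6, 0.7, 0.8]
background = [0.3, 0.4, 0.2]

accepted = run_cascade(face_like, stages)
rejected = run_cascade(background, stages)
```

In this toy run the second window is rejected at the second stage, so the third stage is never evaluated for it.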
Each of the stage classifiers may be any kind of strong classifier; for example, it is possible to adopt a known stage classifier from algorithms such as the Support Vector Machine (SVM), AdaBoost, etc. In each of the strong classifiers, it is possible to use various weak features expressing local texture structures, or a combination of them, to make a calculation; the weak features are those usually adopted in the art, for example, HAAR features, multi-scale LBP features, etc.
A stage classifier with regard to a specific object is obtained according to training regarding the property of the specific object carried out under a specific posture; here the posture generally refers to a rotation angle of the specific object with regard to a frontal view image in the art as shown in
In the conventional technique shown in
The multi-view specific object detection apparatus shown in
The result output by the multi-view specific object detection apparatus is a determination result of whether the input image data belongs to the specific object, or in other words, whether the input image data is the image data of the specific object. The input image data whose determination result is T (true) is output as the detection result of the specific object. If there are plural window images, there may be plural detection results. However, it is possible to adopt a grouping device to group plural window images whose determination results are T (true) and which actually belong to the same specific object in the original whole image into one image, so that one specific object has only one detection result.
The cascade classifiers 210, 220, and 230 correspond to different detection angles; the cascade classifier 210 is formed of stage classifiers 211, 212, . . . , 21n; the cascade classifier 220 is formed of stage classifiers 221, 222, . . . , 22n; the cascade classifier 230 is formed of stage classifiers 231, 232, . . . , 23n; here n is a counting number. The second number from the left in a stage classifier symbol, for example, the second number 2 from the left in the stage classifier symbol 221, refers to the detection angle of the stage classifier. The third number from the left in a stage classifier symbol, for example, the third number 1 from the left in the stage classifier symbol 221, refers to the position of the stage classifier in the corresponding cascade classifier. That is, stage classifiers, whose symbols have the same third number from the left, in the cascade classifiers can be considered to be at the same stage; here it should be noted that, like the conventional multi-view specific object detection apparatus shown in
Furthermore it should be noted that, in
Like the conventional multi-view specific object detection apparatus shown in
By comparing the multi-view specific object detection apparatuses shown in
In each cascade classifier, the stage classifiers may be arranged in ascending order of feature complexity. That is, the feature calculated by a stage classifier at an earlier stage is relatively simple, and the calculation complexity is relatively low; the later the stage is, the more complicated the feature calculated by the stage classifier is, and the higher the calculation complexity is. However, it can be understood by those skilled in the art that, in a cascade classifier, the stage classifiers may also be arranged in any other order, which may or may not be related to the features. The self-adaptive posture prediction device 250 may be disposed at any position inside the respective cascade classifiers; for example, it may be disposed between the first stage and the second stage, or between the second stage and the third stage. It can be understood by those skilled in the art that the self-adaptive posture prediction device 250 disposed between any other two stage classifiers may also realize the goal of discarding the stage classifiers which are not related to the posture of the input image data so as to save detection time and improve determination accuracy.
Since the self-adaptive posture prediction device 250 is located between the stage classifiers at the same stage in each of the cascade classifiers, the self-adaptive posture prediction device 250 and its units i.e. the normalization calculation unit 252, the merger calculation unit 254, the posture prediction unit 256, and the cascade classifier selection unit 258 carry out the prediction with regard to the determination result before the stage; that is, the operation of the self-adaptive posture prediction device 250 and its units is carried out with regard to the stage classifiers before the stage in each of the cascade classifiers.
The task of the normalization calculation unit 252 is to normalize the data output by the strong classifiers at each stage in each cascade classifier located before the self-adaptive posture prediction device 250 into the same measurement space. It is supposed that, in the i-th cascade classifier currently being handled, there are m stages before the self-adaptive posture prediction device 250, the stage classifier of the j-th stage in the i-th cascade classifier is currently being handled (here m is a counting number; i and j are positive integer indexes), and the degree of confidence, calculated by this stage classifier, of the image data for the specific object corresponding to the detection angle based on the aspect of the corresponding feature is $\mathrm{val}_{i,j}$. The normalization calculation unit 252 may adopt various conventional normalization methods, for example, the Min-Max method, the Z-Score method, the MAD method, the Double-Sigmoid method, the Tanh-Estimator method, etc.
For example, in a case where the Min-Max method is adopted, the normalization value $\mathrm{nval}_{i,j}$ of the stage classifier at the j-th stage in the i-th cascade classifier can be calculated by the following equation (1).
$$\mathrm{nval}_{i,j} = \frac{\mathrm{val}_{i,j} - \mathrm{val}_{min}}{\mathrm{val}_{max} - \mathrm{val}_{min}} \qquad (1)$$
Here $\mathrm{val}_{max}$ and $\mathrm{val}_{min}$ are values obtained by the stage classifier in a training process. In particular, $\mathrm{val}_{max}$ refers to the maximum value among the degrees of confidence obtained in the training process carried out with regard to the feature adopted by the j-th stage classifier of the detection angle corresponding to the i-th cascade classifier, i.e., the maximum value which can be acquired by this strong classifier with regard to all the input sample data; $\mathrm{val}_{min}$ refers to the minimum value among the degrees of confidence obtained in the training process carried out by the stage classifier, i.e., the minimum value which can be acquired by this strong classifier with regard to all the input sample data.
In a case where a non-human-face sample image is adopted in training, since the variation range of the degrees of confidence calculated with regard to the non-human-face is relatively wide, noise data is easily introduced when measuring data; as a result, the accuracy of the normalization result is influenced. The classified result, i.e. the degree of confidence calculated by the stage classifier with regard to the non-human-face sample, generally is a negative value, whereas the degree of confidence calculated with regard to the human face sample generally is a positive value. In order to solve this problem, it is possible to directly let the value of $\mathrm{val}_{min}$ in the equation (1) be zero so that the influence on the normalization caused by the noise data departing from an accurate data distribution can be removed. By improving the equation (1) in this way, the following equation (2), i.e. the normalization equation, can be obtained.
$$\mathrm{nval}_{i,j} = \frac{\mathrm{val}_{i,j} - 0}{\mathrm{val}_{max} - 0} \qquad (2)$$
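As a small illustration of equations (1) and (2), the following sketch implements Min-Max normalization with a default minimum of zero. The function name `min_max_normalize` and the sample values are assumptions made for the example; in practice the maximum and minimum would come from the training process described above.

```python
def min_max_normalize(val: float, val_max: float, val_min: float = 0.0) -> float:
    """Min-Max normalization of one stage classifier's degree of confidence.

    With an explicit val_min this corresponds to equation (1); with the
    default val_min = 0 it reduces to equation (2).
    """
    if val_max == val_min:
        raise ValueError("val_max and val_min must differ")
    return (val - val_min) / (val_max - val_min)


# Illustrative values only (not taken from the source).
nval_eq2 = min_max_normalize(3.0, val_max=6.0)               # equation (2)
nval_eq1 = min_max_normalize(3.0, val_max=5.0, val_min=1.0)  # equation (1)
```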
The normalization calculation unit 252 may also adopt, for example, the Z-Score method; in this case, the normalization value $\mathrm{nval}_{i,j}$ of the stage classifier at the j-th stage in the i-th cascade classifier can be calculated by the following equation (3).
$$\mathrm{nval}_{i,j} = \frac{\mathrm{val}_{i,j} - \mu}{\sigma} \qquad (3)$$
Here μ and σ are, respectively, the average value and the standard deviation of the values obtained in a training process carried out with regard to the feature adopted by the j-th stage classifier of the detection angle corresponding to the i-th cascade classifier.
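A Z-Score normalization along the lines of equation (3) might look like the sketch below. It treats σ as the population standard deviation of the training confidences, which is an assumption about how the statistic is computed; the function names and training values are likewise hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def fit_z_score(training_confidences: Sequence[float]) -> Tuple[float, float]:
    """Estimate mu and sigma from the degrees of confidence seen in training.

    Assumption: sigma is the population standard deviation.
    """
    mu = statistics.mean(training_confidences)
    sigma = statistics.pstdev(training_confidences)
    return mu, sigma


def z_score_normalize(val: float, mu: float, sigma: float) -> float:
    """Equation (3): shift by the mean and scale by sigma."""
    if sigma == 0:
        raise ValueError("sigma must be non-zero")
    return (val - mu) / sigma


# Illustrative training confidences for one stage classifier (assumed values).
mu, sigma = fit_z_score([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
nval = z_score_normalize(9.0, mu, sigma)
```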
The merger calculation unit 254 is used for merging data. It can merge the calculation results of the strong classifiers at all the stages of the respective cascade classifiers located before the self-adaptive posture prediction device 250 so as to acquire a merger value with regard to each cascade classifier. The merger calculation unit 254 may adopt various data-based merger methods, for example, the sum method, the product method, the MAX method, etc.
For example, when the merger calculation unit 254 adopts the sum method to merge the output data of the strong classifiers at preceding stages, it is possible not only to utilize historic information at the preceding stages of each cascade classifier efficiently but also to further increase the robustness of the merger. In this circumstance, the merger value $\mathrm{snval}_{i}$ can be calculated based on the degree-of-confidence normalization values $\mathrm{nval}_{i,j}$ of the stage classifiers at the m stages before the normalization calculation unit 252 in the i-th cascade classifier by using the following equation (4).
$$\mathrm{snval}_{i} = \sum_{j=1}^{m} \mathrm{nval}_{i,j} \qquad (4)$$
Alternatively, instead of the sum method, the merger calculation unit 254 may also adopt, for example, the product method to merge the output data of the stage classifiers at the preceding stages, and then the merger value $\mathrm{snval}_{i}$ can be calculated based on the degree-of-confidence normalization values $\mathrm{nval}_{i,j}$ of the stage classifiers at the m stages before the normalization calculation unit 252 in the i-th cascade classifier by using the following equation (5).
$$\mathrm{snval}_{i} = \prod_{j=1}^{m} \mathrm{nval}_{i,j} \qquad (5)$$
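The merger methods named above can be sketched in a few lines. The function names are hypothetical, and the MAX variant is included only because the text lists it as an option; its exact form is an assumption.

```python
import math
from typing import Sequence


def merge_sum(nvals: Sequence[float]) -> float:
    """Sum method, equation (4): add the m normalized confidences of one cascade."""
    return sum(nvals)


def merge_product(nvals: Sequence[float]) -> float:
    """Product method, equation (5): multiply the m normalized confidences."""
    return math.prod(nvals)


def merge_max(nvals: Sequence[float]) -> float:
    """MAX method (assumed form): keep the largest normalized confidence."""
    return max(nvals)


# Normalized confidences of the preceding stages in one cascade (assumed values).
nvals_i = [0.5, 0.25, 0.25]
snval_sum = merge_sum(nvals_i)
snval_product = merge_product([0.5, 0.5])
snval_max_method = merge_max(nvals_i)
```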
The posture prediction unit 256 may self-adaptively predict the most proper posture of the specific object based on the merger result obtained by the merger calculation unit 254; here the most proper posture of the specific object is the actual angle of the specific object in the handled image data. Then degree of belonging of the image data to the corresponding detection angle is calculated based on the relationship of the angle of the specific object in the image data and the corresponding detection angle. The self-adaptivity is presented as follows: the adopted calculation formula may self-adaptively make a posture prediction based on the data distribution of the stage classifiers at the preceding stages.
For example, the posture prediction unit 256 may utilize the following self-adaptive equation (6) to calculate the degree of belonging $\mathrm{ratio}_{i}$ of the image data to the detection angle corresponding to the i-th cascade classifier based on the degree-of-confidence merger value $\mathrm{snval}_{i}$ of the preceding m stage classifiers in the i-th cascade classifier calculated by the merger calculation unit 254 and the maximum value $\mathrm{snval}_{max}$ of the degree-of-confidence merger values of the preceding m stage classifiers in the cascade classifiers corresponding to all the detection angles which are covered by the self-adaptive posture prediction device 250.
$$\mathrm{ratio}_{i} = \frac{\mathrm{abs}(\mathrm{snval}_{i} - \mathrm{snval}_{max})}{\mathrm{snval}_{i}} \qquad (6)$$
Here abs refers to the calculation of absolute value.
Alternatively, the posture prediction unit 256 may also utilize the following self-adaptive equation (7) to calculate the degree of belonging $\mathrm{ratio}_{i}$ of the image data to the detection angle corresponding to the i-th cascade classifier based on the degree-of-confidence merger value $\mathrm{snval}_{i}$ of the preceding m stage classifiers in the i-th cascade classifier calculated by the merger calculation unit 254 and the maximum value $\mathrm{snval}_{max}$ of the degree-of-confidence merger values of the preceding m stage classifiers in the cascade classifiers corresponding to all the detection angles which are covered by the self-adaptive posture prediction device 250.
$$\mathrm{ratio}_{i} = \frac{\mathrm{snval}_{i}}{\mathrm{snval}_{max}} \qquad (7)$$
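The two degree-of-belonging formulas can be written out as follows. This is a sketch under the assumption that the merger values are non-zero; the function names and the sample merger values are introduced for illustration only.

```python
from typing import List, Sequence


def belonging_ratios_eq7(snvals: Sequence[float]) -> List[float]:
    """Equation (7): each cascade's merger value relative to the largest one."""
    snval_max = max(snvals)
    if snval_max == 0:
        raise ValueError("snval_max must be non-zero")
    return [s / snval_max for s in snvals]


def belonging_ratios_eq6(snvals: Sequence[float]) -> List[float]:
    """Equation (6): absolute gap to the largest merger value, relative to snval_i."""
    snval_max = max(snvals)
    if any(s == 0 for s in snvals):
        raise ValueError("every snval_i must be non-zero")
    return [abs(s - snval_max) / s for s in snvals]


# Merger values for three detection angles (assumed values).
snvals = [2.0, 4.0, 1.0]
ratios_eq7 = belonging_ratios_eq7(snvals)
ratios_eq6 = belonging_ratios_eq6(snvals)
```

With these sample values the second detection angle has the largest merger value, so its ratio under equation (7) is 1.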
The cascade classifier selection unit 258 is used to choose the most proper one or plural detection angles from the plural detection angles for being employed in the object recognition determination at the follow-on stages; that is, if the degree of belonging of the angle of the specific object in the image data with regard to a detection angle is too low, the stage classifier of this detection angle is not utilized anymore in the determination at the follow-on stages.
In the process of choosing the stage classifiers of each of the detection angles, a predetermined threshold value thr is employed to determine whether the degree of belonging calculated by the posture prediction unit 256 for each of the detection angles can pass through the cascade classifier selection unit 258. For example, in a case where the following equation (8) is used to determine whether the i-th cascade classifier is selected, if $\mathrm{ratio}_{i}$ is greater than or equal to the predetermined threshold value thr, then the selection result res is 1, which means that the stage classifiers in the i-th cascade classifier after the self-adaptive posture prediction device 250 continue to be adopted; if $\mathrm{ratio}_{i}$ is less than the predetermined threshold value thr, then the selection result res is 0, which means that the stage classifiers in the i-th cascade classifier after the self-adaptive posture prediction device 250 are not adopted anymore.
$$\mathrm{res} = \begin{cases} 1, & \mathrm{ratio}_{i} \ge \mathrm{thr} \\ 0, & \mathrm{ratio}_{i} < \mathrm{thr} \end{cases} \qquad (8)$$
It is apparent that those skilled in the art can understand that the above-mentioned criteria may be rewritten as follows: if $\mathrm{ratio}_{i}$ is greater than the predetermined threshold value thr, then the selection result res is 1; if $\mathrm{ratio}_{i}$ is less than or equal to the predetermined threshold value thr, then the selection result res is 0.
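The selection rule can be sketched as a simple comparison against thr. The function name `select_cascades` and the `strict` flag, which switches between the two formulations of the criterion described above, are assumptions made for this example.

```python
from typing import List, Sequence


def select_cascades(ratios: Sequence[float], thr: float,
                    strict: bool = False) -> List[int]:
    """Return res for each cascade: 1 if it stays in use, 0 if it is dropped.

    strict=False uses "greater than or equal to thr";
    strict=True uses the rewritten criterion "greater than thr".
    """
    if strict:
        return [1 if r > thr else 0 for r in ratios]
    return [1 if r >= thr else 0 for r in ratios]


# Degrees of belonging for three detection angles and a threshold (assumed values).
ratios = [0.5, 1.0, 0.25]
res_inclusive = select_cascades(ratios, thr=0.5)
res_strict = select_cascades(ratios, thr=0.5, strict=True)
```

The two variants differ only for a cascade whose degree of belonging equals thr exactly, as the first cascade does here.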
Here the predetermined threshold value thr may be obtained by adopting a certain amount of sample data to carry out training; it may be determined as follows: when carrying out the training, it is necessary to ensure that, for most of the positive samples in the sample data, the degrees of belonging calculated by the above-mentioned calculation are greater than the predetermined threshold value. For example, it is necessary to ensure that 95% of the human face samples can be determined as human face data. However, it is apparent that those skilled in the art can understand that the predetermined threshold value thr may also be obtained by ensuring that 80%, 90%, etc. of the human face samples can be determined as human face data.
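One way to derive thr from training data is to take it as a quantile of the degrees of belonging measured on positive samples. The sketch below is only one possible reading of the procedure: the function name, the use of an integer percentage, and the inclusive comparison (a sample passes when its ratio is at least thr) are all assumptions.

```python
from typing import List, Sequence


def threshold_for_pass_rate(positive_ratios: Sequence[float],
                            keep_percent: int = 95) -> float:
    """Pick thr so that at least keep_percent % of positive samples have ratio >= thr."""
    if not positive_ratios:
        raise ValueError("at least one positive sample is required")
    n = len(positive_ratios)
    must_pass = (n * keep_percent + 99) // 100   # ceiling of n * keep_percent / 100
    must_pass = max(must_pass, 1)
    ordered: List[float] = sorted(positive_ratios, reverse=True)
    return ordered[must_pass - 1]                # the must_pass-th largest ratio


# Degrees of belonging of 20 positive training samples (assumed values).
positive_ratios = [i / 20 for i in range(1, 21)]
thr = threshold_for_pass_rate(positive_ratios, keep_percent=95)
passing = sum(1 for r in positive_ratios if r >= thr)
```

Lowering `keep_percent` to 90 or 80 raises thr, which discards more cascades at the cost of rejecting more positive samples.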
According to
In the experiment with regard to
Furthermore a multi-view specific object detection method is provided too. The multi-view specific object detection method comprises an input step executed by the input device 200, of inputting image data; plural parallel classification steps executed by the plural cascade classifiers, respectively, wherein, each of the plural classification steps is sequentially formed of plural sub classification steps corresponding to the same detection angle, each of the sub classification steps executed by each of the stage classifiers, different sub classification steps correspond to different features, and in each of the sub classification steps, a degree of confidence of the image data of a specific object corresponding to the detection angle based on the aspect of the corresponding feature is calculated and whether the image data belongs to the specific object is determined based on the degree of confidence; and a self-adaptive posture prediction step between the sub classification steps of each of the plural classification steps, executed by the self-adaptive posture prediction device 250, wherein, based on the degree of confidence calculated in each of the sub classification steps corresponding to the same detection angle and located before the self-adaptive posture prediction step, whether each of the sub classification steps corresponding to the detection angle, located after the self-adaptive posture prediction step is carried out with regard to the image data is determined.
The self-adaptive posture prediction step comprises a normalization calculation step, executed by the normalization calculation unit 252, of normalizing the degree of confidence calculated in each of the sub classification steps corresponding to the same detection angle and located before the self-adaptive posture prediction step so as to obtain a degree-of-confidence normalization value; a merger calculation step, executed by the merger calculation unit 254, of merging the degree-of-confidence normalization values corresponding to the detection angle obtained in the normalization calculation step so as to obtain a merger value corresponding to the detection angle; a posture prediction step, executed by the posture prediction unit 256, of calculating a degree of belonging of the image data to each of the detection angles based on the merger values obtained in the merger calculation step; and a classification step selection step, executed by the cascade classifier selection unit 258, of selecting, by comparing the degrees of belonging corresponding to the detection angles with a predetermined threshold value, the sub classification steps that correspond to at least one detection angle whose degree of belonging is greater than the predetermined threshold value and that are located after the self-adaptive posture prediction step to handle the image data.
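The four sub-steps can be chained into a single routine, sketched below. It combines equation (2) for normalization, the sum method of equation (4), equation (7) for the degree of belonging, and an inclusive threshold comparison; this particular combination, the function name, and the sample numbers are assumptions chosen for illustration, since the source allows other choices at each sub-step.

```python
from typing import List, Sequence


def posture_prediction_step(confidences: Sequence[Sequence[float]],
                            val_max: Sequence[Sequence[float]],
                            thr: float) -> List[int]:
    """Return the indexes of the detection angles whose later stages stay active.

    confidences[i][j] is the degree of confidence of stage j in cascade i
    (stages located before the prediction step); val_max[i][j] is the
    corresponding training maximum.
    """
    merged: List[float] = []
    for conf_i, max_i in zip(confidences, val_max):
        nvals = [c / m for c, m in zip(conf_i, max_i)]  # normalization, equation (2)
        merged.append(sum(nvals))                       # merger, equation (4)
    snval_max = max(merged)
    if snval_max <= 0:
        return []                                       # no angle is plausible (assumed handling)
    ratios = [s / snval_max for s in merged]            # degree of belonging, equation (7)
    return [i for i, r in enumerate(ratios) if r >= thr]  # cascade selection


# Two preceding stages for each of three detection angles (assumed values).
confidences = [[1.0, 2.0], [4.0, 4.0], [2.0, 1.0]]
val_max = [[4.0, 4.0], [4.0, 4.0], [4.0, 4.0]]
selected = posture_prediction_step(confidences, val_max, thr=0.5)
```

Here only the second detection angle is kept, so the remaining stages of the other two cascades would be skipped for this window.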
Each of the classification steps comprises a sub classification arrangement step of arranging the sub classification steps in ascending order of feature complexity. The sub classification steps, whose positions in the arranged results obtained in the sub classification arrangement step are the same, belong to the same stage. The self-adaptive posture prediction step is executed between the first stage and the second stage or between the second stage and the third stage.
A series of operations described in this specification can be executed by hardware, software, or a combination of hardware and software. When the operations are executed by software, a computer program can be installed in a dedicated built-in storage device of a computer so that the computer can execute the computer program. Alternatively the computer program can be installed in a common computer by which various types of processes can be executed so that the common computer can execute the computer program.
For example, the computer program may be stored in a recording medium such as a hard disk or a read-only memory (ROM) in advance. Alternatively, the computer program may be temporarily or permanently stored (or recorded) in a removable recording medium such as a floppy disk, a CD-ROM, an MO disk, a DVD, a magnetic disk, or a semiconductor storage device.
While the present invention is described with reference to the specific embodiments chosen for purpose of illustration, it should be apparent that the present invention is not limited to these embodiments, but numerous modifications could be made thereto by those skilled in the art without departing from the basic concept and scope of the present invention.
The present application is based on Chinese Priority Patent Application No. 201010108579.5 filed on Feb. 8, 2010, the entire contents of which are hereby incorporated by reference.
Number | Date | Country | Kind |
---|---|---|---|
201010108579.5 | Feb 2010 | CN | national |