The present invention generally relates to object detection, and more particularly to an adaptive system and method for object detection.
Object detection (for example, face detection) is a computer technology, used in a variety of applications, that identifies the locations and sizes of all objects in a digital image. In 2001, Paul Viola and Michael Jones proposed an object detection framework that provides competitive object detection rates in real time. The Viola-Jones method is robust, with a high detection rate, and is suitable for real-time applications in which, for example, at least two frames per second should be processed. The Viola-Jones method adopts a cascade training mechanism to achieve better detection rates.
There is a growing trend towards low-power applications (e.g., smart phones) that have limited electric and processing power, and/or fast applications that require fast (though usually rough) object detection. Accurate or real-time object detection may be difficult or impossible to achieve in such applications using existing methods. A need has thus arisen to propose a novel method to effectively accelerate object detection.
In view of the foregoing, it is an object of the embodiment of the present invention to provide an adaptive system and method for object detection that is capable of quickly detecting objects by skipping window images or early terminating adaptively according to background and/or foreground locality.
According to one embodiment, object detection is performed on a current window image, thereby generating a current likelihood value indicating how likely an object is detected. A predetermined number of next window images following the current window image are skipped, if the current likelihood value is less than a predetermined background threshold.
According to another embodiment, object detection is performed on a current window image, thereby generating a current likelihood value indicating how likely an object is detected. The object detection terminates early if a previous window image preceding the current window image contains the object to be detected and the current likelihood value is greater than or equal to a predetermined foreground threshold.
In the embodiment, the adaptive system 100 may include a plurality of classifiers 11 (e.g., first stage classifier to nth stage classifier as exemplified in
As shown in
The classifier 11 of the embodiment may include a summing device 112 that is configured to collect and sum up scores generated by the weak classifiers 112, thereby generating a score sum. In the specification, the score of the weak classifier 112 may be a numerical value indicating a level of confidence that a stage will produce a stage decision of face or non-face (e.g., corresponding to a measure of how likely it is that a face is present or not present within a scanning window). The score sum is then compared with a predetermined stage threshold by a comparator 113. The classifier 11 can decide, based on the comparison result of the comparator 113, whether the scanning window 110 contains at least a portion of the object (e.g., the face). If the classifier 11 decides in the affirmative, the stage thereof passes; otherwise that stage fails. If one stage passes, the image of the same scanning window 110 is then subjected to detection in the next stage, which uses more features and consumes more time. According to pass/fail conditions of the cascading classifiers 11, the adaptive system 100 (
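The stage decision described above (summing the weak-classifier scores and comparing the sum with a stage threshold) may be illustrated with a minimal Python sketch. The names `stage_passes` and `cascade_likelihood` are hypothetical, each weak classifier is modeled as a plain callable returning a score, the comparison is assumed to be "greater than or equal to", and using the count of passed stages as the likelihood value is an assumption made for illustration only, not a statement of the claimed implementation.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical model: a weak classifier maps a window image to a numeric score.
WeakClassifier = Callable[[float], float]
Stage = Tuple[Sequence[WeakClassifier], float]  # (weak classifiers, stage threshold)


def stage_passes(window: float,
                 weak_classifiers: Sequence[WeakClassifier],
                 stage_threshold: float) -> bool:
    """Sum the weak-classifier scores and compare with the stage threshold."""
    score_sum = sum(wc(window) for wc in weak_classifiers)  # summing device
    return score_sum >= stage_threshold                     # comparator (>= is assumed)


def cascade_likelihood(window: float, stages: List[Stage]) -> int:
    """Run the stages in order; stop at the first failing stage.

    Assumption: the likelihood is taken as the number of stages passed.
    """
    passed = 0
    for weak_classifiers, stage_threshold in stages:
        if not stage_passes(window, weak_classifiers, stage_threshold):
            break  # a failed stage rejects the window; later stages are not run
        passed += 1
    return passed


# Toy cascade: a window is represented by a single number for illustration.
toy_stages: List[Stage] = [
    ([lambda w: w, lambda w: 1.0], 2.0),  # stage 1: two weak classifiers
    ([lambda w: 2.0 * w], 5.0),           # stage 2: one weak classifier
]
```

With the toy cascade, `cascade_likelihood(3.0, toy_stages)` passes both stages, while `cascade_likelihood(0.5, toy_stages)` is rejected at the first stage, so later (more expensive) stages are never evaluated for it.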
In step 33, a current likelihood value L is compared with the predetermined background threshold θbg. If the current likelihood value L is less than the predetermined background threshold θbg (i.e., L<θbg), it indicates that the current window image and neighboring window images are background images not containing the object to be detected. That is, the current window image is in a background locality. Therefore, a predetermined number δ of next window images following the current window image are skipped in step 34, where δ is a preset value representing a degree of locality. In other words, the skipped window images are not subjected to detection, thereby accelerating the object detection. Moreover, in step 34 of the embodiment, likelihood values of the skipped window images are set to a minimum likelihood value Lmin (e.g., L=0), which represents absence of the object to be detected. In an alternative embodiment, likelihood values of the skipped window images are set to a predetermined value less than the predetermined background threshold θbg.
If the result of step 33 is negative (i.e., L≥θbg), indicating that the current window image and neighboring window images are not background images, a previous likelihood value L (associated with a previous window image) is compared in step 35 with a maximum likelihood value Lmax (e.g., 25), which represents presence of the object to be detected. In an alternative embodiment, step 35 determines whether a previous likelihood value L (of a previous window image) is greater than a predetermined value that is greater than the predetermined foreground threshold θfg.
If the previous likelihood value L is equal to the maximum likelihood value Lmax in step 35, indicating that the previous window image preceding the current window image contains the object to be detected, the current likelihood value L is further compared with the predetermined foreground threshold θfg in step 36. If the current likelihood value L is greater than or equal to the predetermined foreground threshold θfg (i.e., L≥θfg), it indicates that the current window image is a foreground image containing the object to be detected. That is, the current window image is in a foreground locality. Therefore, remaining window images that have not yet been subjected to detection are skipped in step 37. In other words, the skipped window images are not subjected to detection, or the flow of the adaptive method 300 terminates early, thereby accelerating the object detection. Moreover, in step 37 of the embodiment, likelihood values of the skipped window images are set to the maximum likelihood value Lmax, which represents presence of the object to be detected. In an alternative embodiment, likelihood values of the skipped window images are set to a predetermined value that is greater than the predetermined foreground threshold θfg.
If the result of either step 35 or step 36 is negative, the flow of the adaptive method 300 goes to step 38 to determine whether any window image remains undetected. If the determination is affirmative, the flow of the adaptive method 300 goes to step 32 for detecting a subsequent window image; otherwise the flow goes to step 39, in which the likelihood values L for the window images in the row are outputted.
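Steps 32 through 39 described above may be illustrated with the following minimal Python sketch of a scan over one row of window images. It is a sketch under stated assumptions rather than the claimed implementation: the function name `scan_row` and the `detect` callable are hypothetical, the flow is assumed to proceed to step 38 after the skip of step 34, and the first window image of a row is assumed to have no previous likelihood value. The defaults for Lmin and Lmax follow the example values given in the text (0 and 25).

```python
from typing import Callable, List, Sequence, TypeVar

W = TypeVar("W")


def scan_row(windows: Sequence[W],
             detect: Callable[[W], float],
             theta_bg: float,
             theta_fg: float,
             delta: int,
             l_min: float = 0,
             l_max: float = 25) -> List[float]:
    """Return one likelihood value per window image in the row."""
    n = len(windows)
    likelihoods: List[float] = [l_min] * n
    i = 0
    while i < n:                               # step 38: any window left?
        current = detect(windows[i])           # step 32: detect current window
        likelihoods[i] = current
        if current < theta_bg:                 # step 33: background locality
            skipped = min(delta, n - i - 1)    # do not run past the row end
            for j in range(i + 1, i + 1 + skipped):
                likelihoods[j] = l_min         # step 34: skipped windows get Lmin
            i += 1 + skipped                   # assumed: continue at step 38
            continue
        has_previous_object = i > 0 and likelihoods[i - 1] == l_max  # step 35
        if has_previous_object and current >= theta_fg:              # step 36
            for j in range(i + 1, n):
                likelihoods[j] = l_max         # step 37: remaining windows get Lmax
            break                              # early termination
        i += 1
    return likelihoods                         # step 39: output the row


# Illustration: each "window" is already a score and the detector counts its calls.
calls: List[int] = []


def counting_detect(score: float) -> float:
    calls.append(1)
    return score


row = scan_row([0, 9, 9, 25, 22, 3, 3], counting_detect,
               theta_bg=5, theta_fg=20, delta=2)
```

In this illustration only three of the seven window images are actually passed to the detector: the first window falls in a background locality and causes the next two to be skipped, and the fifth window triggers early termination because the fourth reached Lmax.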
According to the embodiment proposed above, a plurality of window images may be skipped when the current window image is in a background locality, or the adaptive method 300 may terminate early when the current window image is in a foreground locality, thereby saving substantial processing time and associated power. Accordingly, the embodiment of the present invention may, for example, be adapted to a normally-operated low-power (or power-limited) camera that is capable of quickly detecting objects.
Although specific embodiments have been illustrated and described, it will be appreciated by those skilled in the art that various modifications may be made without departing from the scope of the present invention, which is intended to be limited solely by the appended claims.