The present invention, in general, relates to the field of object detection in an image. More particularly, the present invention provides a cascade structure for classifying various types of objects in the image in real time.
Face detection in images and videos is a key component in a wide variety of applications in human-computer interaction, search, security, and surveillance. Recently, the technology has made its way into digital cameras and mobile phones as well. Implementation of face detection technology in these devices facilitates enhanced precision in applications such as auto focus and exposure control, thereby helping the camera to take better images. Further, some of the advanced features in these devices, such as smile shot, blink shot, human detection, face beautification, red-eye reduction, and face emoticons, use face detection as their first step.
Various techniques have been employed over the last couple of decades to obtain an efficient face detector. These techniques vary from simple color-based methods for rough face localization to structures that make use of complex classifiers such as neural networks and support vector machines (SVMs). One of the most widely used techniques has been the AdaBoost algorithm. The AdaBoost algorithm for face detection was proposed by Viola and Jones in “Robust Real-Time Object Detection,” Compaq Cambridge Research Laboratory, Cambridge, Mass., 2001. In the AdaBoost algorithm, Haar features are used as weak classifiers. Each weak classifier of the face detection structure is configured to classify an image sub-window as either face or non-face. To accelerate face detection, Viola and Jones introduced the concepts of an integral image and a cascaded framework. A conventional cascade detection structure 100 proposed by Viola and Jones is illustrated in
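By way of illustration only, the integral-image idea underlying the Viola and Jones framework can be sketched as follows; the function names and the boundary-handling convention are assumptions of this sketch and are not taken from the cited reference.

```python
import numpy as np

def integral_image(gray):
    # Each entry holds the sum of all pixels above and to the left (inclusive),
    # so the sum over any rectangle later costs only four look-ups.
    return gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    # Sum of the pixels in rows top..bottom-1 and columns left..right-1.
    total = ii[bottom - 1, right - 1]
    if top > 0:
        total -= ii[top - 1, right - 1]
    if left > 0:
        total -= ii[bottom - 1, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return int(total)
```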
However, the face detection technique developed by Viola and Jones primarily deals with frontal faces. Many real-world applications would benefit from multi-view detectors that can detect objects with different orientations in 3-Dimension space, such as faces looking left or right, faces looking up or down, or faces that are tilted left or right. Further, detecting multi-view faces is complicated by the large amount of variation and complexity introduced by changes in facial appearance, lighting, and expression. With the conventional cascade detection structure 100, it is not feasible to train a single cascade structure to classify multi-view faces. Hence, to detect multi-view faces using the Viola and Jones cascade structure, multiple cascade structures trained on multi-view faces may be employed. However, the use of multiple cascade structures increases the overall computational complexity of the structure.
In light of the foregoing, and to exploit the synergy between face detection and pose estimation, there is a well-felt need for a cascade structure and method capable of classifying faces and objects with different orientations in 3-Dimension space without increasing the computational complexity.
The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.
In order to address the problem of classifying faces and objects with different orientations, the present invention provides a cascade structure for classifying one or more objects in an image without increasing computational complexity.
In accordance with an embodiment of the present invention, a cascade object classification structure for classifying one or more objects in an image is provided. The cascade object classification structure includes a plurality of nodes that are arranged in one or more layers. Each layer includes at least one parent node and each subsequent layer includes at least two child nodes such that at least one child node in at least one of the subsequent layers is operatively linked to two or more parent nodes in a preceding layer. Each node includes one or more classifiers for classifying the one or more objects as a positive object and a negative object. Each of the positive objects and the negative objects as classified by the at least one parent node in each layer are further classified by one or more operatively linked child nodes in the corresponding subsequent layer.
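A minimal sketch, offered only for illustration, of how such a layered structure with a shared child node might be represented; the node names are illustrative, the linkage helper is an assumption of this sketch, and the classifiers held by each node are omitted.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # One node of the cascade; its classifiers are omitted from this sketch.
    name: str
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)

def link(parent, child):
    # Operatively link a child node to a parent node in the preceding layer.
    parent.children.append(child)
    child.parents.append(parent)

# Three layers in which the middle node of the last layer has two parents,
# so at least one child node is operatively linked to two or more parent nodes.
root = Node("202a")
layer2 = [Node("204a"), Node("204b")]
layer3 = [Node("206a"), Node("206b"), Node("206c")]
for child in layer2:
    link(root, child)
link(layer2[0], layer3[0]); link(layer2[0], layer3[1])
link(layer2[1], layer3[1]); link(layer2[1], layer3[2])
```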
In accordance with another embodiment of the present invention, a method for classifying one or more objects in an image is provided. The method includes determining one or more features, associated with the one or more objects, from the image. The one or more objects are evaluated at each node of a plurality of nodes, wherein the plurality of nodes are arranged in one or more layers. In at least one of the one or more evaluations, the node receives the evaluated objects from two or more nodes of a preceding layer. At each node, the one or more objects are classified as a positive object and a negative object based at least in part on the evaluation. At least one of the one or more classifications includes further classifying the positive object and the negative object in the subsequent layer.
Additional features of the invention will be set forth in the description that follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The features and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other features of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.
To further clarify the above and other advantages and features of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
a illustrates a net cascade object classification structure, in accordance with another embodiment of the present invention;
b is a schematic diagram illustrating the training of the net cascade object classification structure for classifying multi-view faces, in accordance with an exemplary embodiment of the present invention;
c is a schematic diagram illustrating the detection of the multi-view faces in the image using the net cascade object classification structure, in accordance with an exemplary embodiment of the present invention;
Embodiments of the present invention provide a cascade object classification structure and a method for classifying one or more objects in an image. For purposes of the following description, the term “object” may refer to various 3-dimensional objects such as faces, cars, houses, and so forth. The cascade object classification structure is capable of classifying faces and objects with different orientations in 3-Dimension space without increasing overall computational complexity. Further, the objects classified by a parent node are further classified by one or more operatively linked child nodes, thereby increasing the detection rate of the structure. For example, the cascade object classification structure of the present invention achieves a frontal face detection rate of greater than 95% and a profile face detection rate of approximately 85%. Furthermore, one or more classifiers of the structure may be configured to detect various types of objects in the image. Moreover, at least one child node in the structure is operatively linked to two or more parent nodes, thereby reducing the number of nodes in the structure and consequently increasing the detection speed.
Referring now to
Each node in a layer is operatively linked to two nodes in a subsequent layer. As depicted in
Each node of the plurality of nodes implements one or more classifiers (not shown). As known in the field of pattern recognition, a classifier is an algorithm that is configured to analyze a received input and to provide a decision based on the analysis. Examples of the classifier include, but are not limited to, an AdaBoost classifier, a support vector machine (SVM) classifier, and a Gaussian mixture model (GMM) classifier. It may be appreciated by a person skilled in the art that any other classifier known in the art may be used for the purpose of classification at each node.
For classifying one or more objects in an image, the classifiers of the nodes are trained in a supervised mode with input data in a preliminary or one-time learning phase. The input data includes a set of samples relating to the objects that may be present in the image, such as face samples and object samples having different orientations in 3-Dimension space. In various embodiments of the present invention, the classifiers are trained with either similar or different types of input data. The samples of the input data are termed positive training samples and negative training samples based on the type of object that the classifier is configured to classify. For example, to detect objects such as faces in the image, the input data may include samples of face images with different orientations and samples of non-face images. The face image samples are termed the positive training samples and the non-face image samples are termed the negative training samples. These samples are then fed into the training code of the classifiers.
During the training, the classifiers compute one or more features from the images relating to the input data. Examples of the features include, but are not limited to, DCT features, wavelet transform features, and Haar features. For example, in the case of computing Haar features, the classifier may perform simple additions and subtractions on the intensity values of the pixels in the image. The intensity values of the pixels in the white region and in the black region are summed separately. Thereafter, the sum of the intensity values of the pixels lying within the black region is subtracted from the sum of the intensity values of the pixels lying within the white region. The resulting value is known as the Haar feature value. It may be appreciated that the Haar features may correspond to various other features, such as dimension coordinates, pixel values, etc., associated with the images. Thereafter, the computed feature information corresponding to each node is stored in a look up table.
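For illustration, the two-rectangle Haar feature value described above may be computed as in the following sketch, which assumes a grayscale sub-window given as a NumPy array; the rectangle coordinates used in the example are arbitrary.

```python
import numpy as np

def haar_feature_value(window, white_rect, black_rect):
    # Two-rectangle Haar feature: sum the pixel intensities inside the white
    # rectangle and the black rectangle separately, then subtract the black
    # sum from the white sum.  Rectangles are (top, left, bottom, right).
    t, l, b, r = white_rect
    white_sum = window[t:b, l:r].sum()
    t, l, b, r = black_rect
    black_sum = window[t:b, l:r].sum()
    return float(white_sum) - float(black_sum)

# Example: a vertical edge feature on a 24x24 sub-window (coordinates are illustrative).
window = np.random.randint(0, 256, (24, 24))
value = haar_feature_value(window, white_rect=(0, 0, 24, 12), black_rect=(0, 12, 24, 24))
```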
The classifiers are trained layer-by-layer based on the input data (face image samples and non-face image samples) and the corresponding location of the nodes in the pyramid. Further, the training of the classifiers of the operatively linked child nodes also depends on the output provided by the parent nodes in the corresponding preceding layer. For example, the classifiers of node 206a are trained based on the input data, its location in the pyramid, and the output provided by its parent node, i.e., node 204a in preceding layer 204. However, the classifiers of node 206b are trained based on the input data, its location in the pyramid, and the output provided by its two parent nodes, i.e., node 204a and node 204b in preceding layer 204. In accordance with an embodiment of the present invention, the classifiers are trained to pass the objects relating to the positive training samples and to reject the objects relating to the negative training samples. Further, the classifiers are trained in such a way that most of the objects relating to the negative training samples are passed through the low-complexity nodes of the pyramid, whereas the objects relating to the positive training samples are passed through the high-complexity nodes of the pyramid.
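The parent-dependent, layer-by-layer training described above may be sketched as follows; the single-feature threshold learner stands in for whatever boosted classifier a node actually uses, and the list-of-dictionaries layer format is an assumption of this sketch rather than anything specified by the embodiments.

```python
import numpy as np

def train_stump(x, y):
    # Toy one-feature threshold learner standing in for a node's boosted classifier.
    thresholds = np.unique(x)
    errors = [np.mean((x >= t) != y) for t in thresholds]
    return thresholds[int(np.argmin(errors))]

def train_layer_by_layer(layers, x, y):
    # The root node is trained on the full sample set; every child node is
    # trained on the union of the samples forwarded by its operatively linked
    # parent node(s) in the preceding layer.
    forwarded = {}
    for depth, layer in enumerate(layers):
        for node in layer:
            if depth == 0:
                idx = np.arange(len(y))
            else:
                idx = np.unique(np.concatenate([forwarded[p] for p in node["parents"]]))
            node["threshold"] = train_stump(x[idx], y[idx])
            # Both accepted and rejected samples stay available downstream, since
            # positives and negatives are each classified further by the children.
            forwarded[node["name"]] = idx
    return layers

# Illustrative two-layer net; the shared-child arrangement is omitted for brevity.
layers = [[{"name": "202a", "parents": []}],
          [{"name": "204a", "parents": ["202a"]}, {"name": "204b", "parents": ["202a"]}]]
x = np.random.rand(200)
y = x > 0.5
train_layer_by_layer(layers, x, y)
```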
While classifying the objects in the image, the classifier evaluates the objects of the image by selecting features from the computed feature information stored in the look up table and classifies the objects of the image as a positive object and a negative object. In accordance with various embodiments of the present invention, the classifiers of the nodes are configured to detect either similar or different types of objects, such as faces, houses, cars, and so forth. Moreover, each of the positive objects and the negative objects as classified by each node in a layer are further classified by the operatively linked nodes in the corresponding subsequent layer.
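A traversal sketch of this forwarding behaviour is given below; it assumes each node exposes a name, a list of children, and a classify method as in the structure sketch above, and the breadth-first order and single evaluation of a shared child are assumptions of the sketch rather than requirements of the embodiments.

```python
from collections import deque

def classify_through_net(root_nodes, features):
    # Every node labels the candidate window as a positive or a negative object,
    # and the window is then handed to that node's operatively linked children in
    # the subsequent layer regardless of the label, so each child can refine the
    # decision (for example, for a different face orientation).
    decisions = {}
    queue = deque(root_nodes)
    visited = set()
    while queue:
        node = queue.popleft()
        if id(node) in visited:
            continue  # a child shared by two parents is evaluated only once
        visited.add(id(node))
        decisions[node.name] = node.classify(features)  # True -> positive object
        queue.extend(node.children)
    return decisions
```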
Referring now to
Referring now to
As illustrated in
Referring now to
Referring now to
At step 404, the objects are evaluated at each node of a plurality of nodes of the cascade object classification structure (discussed with reference to
At step 406, at each node, the objects of the image are classified as a positive object and a negative object based on the evaluation. Examples of the objects include, but are not limited to, faces and objects with different orientations in 3-Dimension space. For example, a node may classify an object as the positive object if the accumulated sum of the threshold functions is above a given node classifier threshold; otherwise, the object is classified as the negative object. The classification of the positive object and the negative object depends on the training provided to the classifiers, as discussed above with reference to
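The node-level decision of step 406 may be sketched as follows, assuming each weak classifier is summarised by a weight, a feature index, and a feature threshold obtained during training; the particular weights and thresholds shown are illustrative only.

```python
def node_decision(weak_classifiers, node_threshold, features):
    # Each weak classifier contributes its weight when its threshold function
    # fires on the selected feature; the window is labelled a positive object
    # only when the accumulated sum clears the node classifier threshold.
    accumulated = 0.0
    for weight, feature_index, feature_threshold in weak_classifiers:
        if features[feature_index] >= feature_threshold:
            accumulated += weight
    return accumulated >= node_threshold  # True -> positive object

# Two illustrative weak classifiers given as (weight, feature index, feature threshold).
weak = [(0.7, 0, 4.2), (0.4, 3, -1.0)]
is_positive = node_decision(weak, node_threshold=0.6, features=[5.0, 0.0, 0.0, 0.5])
```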
Program data 508 stores all static and dynamic data for processing by the processor in accordance with the one or more program modules. In particular, program data 508 includes image data 518 to store information representing image characteristics and statistical data, for example, DCT coefficients, absolute mean values of the DCT coefficients, etc. Program data 508 also stores cascade data 520, classification data 522, look up table data 524, and other data 526. Although only selected modules and blocks have been illustrated in
Having described a general system 500 with respect to
In operation, image acquisition module 510 invokes image capturing device 528 to capture an image. System 500 receives the captured image and stores the information representing image characteristics and statistical data in image data 518. Object detection module 512 generates a cascade object classification structure, as discussed above with reference to
Thereafter, object detection module 512 executes one or more evaluation operations on the objects (based on the computed feature information stored in look up table data 524) at each node of the cascade object classification structure. In succession, object detection module 512 executes one or more classification operations on the evaluated objects at each node of the cascade object classification structure. Such classification operations result in the one or more objects in the image being classified as a positive object and a negative object. Object detection module 512 stores the classified objects in classification data 522. Thereafter, object detection module 512 detects the classified objects as faces and objects with different orientations in 3-Dimension space. Object detection module 512 stores the detected objects in other data 526.
Feature processing module 604 determines and evaluates the one or more objects in the image. It may be appreciated by a person skilled in the art that existing systems and methods for determination of features and evaluation of objects may be employed for the purposes of the ongoing description.
Object classification module 606 executes one or more classifications at each of the nodes of the structure and classifies the one or more objects as a positive object and a negative object. The execution of the classifications depends at least in part on one or more evaluated objects and the corresponding location of each of the nodes in the cascade object classification structure. Each of the positive objects and the negative objects as classified by the nodes in each layer, using object classification module 606, are further classified by one or more operatively linked nodes in the corresponding subsequent layer.
The embodiments described herein may include the use of a special purpose or general-purpose computer including various computer hardware or software modules, as discussed in greater detail below.
Embodiments within the scope of the present invention also include computer-readable media for carrying or having computer-executable instructions or data structures stored thereon. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer properly views the connection as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of computer-readable media.
Computer-executable instructions comprise, for example, instructions and data, which cause a general-purpose computer, special purpose computer, or special purpose-processing device to perform a certain function or group of functions. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
As used herein, the term “module” or “component” can refer to software objects or routines that execute on the computing system. The different components, modules, engines, and services described herein may be implemented as objects or processes that execute on the computing system (e.g., as separate threads). While the system and methods described herein are preferably implemented in software, implementations in hardware or a combination of software and hardware are also possible and contemplated. In this description, a “computing entity” may be any computing system as previously defined herein, or any module or combination of modules running on a computing system.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.