This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2006-241804, filed on Sep. 6, 2006; the entire contents of which are incorporated herein by reference.
The present invention relates to a recognizing apparatus configured to recognize feature vectors and a method thereof.
In recent years, the importance of security has increased, and automatic recognizing apparatuses such as face recognition and fingerprint recognition systems are becoming widespread. From the viewpoint of people's convenience and safety, automatic recognition of various objects (such as pedestrians or vehicles in the vicinity) using a sensor mounted on a vehicle to prevent traffic accidents is now attracting public attention.
In such recognition, an automatic recognition system is realized by ultimately recognizing patterns in the various types of input sensor information. In general, these “patterns” take the form of “feature vectors” obtained by extracting features from the input sensor information.
Various multivariate analysis methods may be employed to classify the vectors, and they are generally divided into linear classification and non-linear classification. The term “linear classification” indicates classification achieved by applying a linear transformation to an input vector, and the term “non-linear classification” indicates classification achieved by applying a non-linear transformation to the vector.
Regarding linear classification, learning methods employing various statistical methods have been proposed, such as the linear discriminant analysis described in L. Chen, H. Liao, M. Ko, J. Lin, and G. Yu, “A new LDA-based face recognition system which can solve the small sample size problem,” Pattern Recognition, Vol. 33, No. 10, pp. 1713-1726, 2000, and the Support Vector Machine (SVM) described in Christopher J. C. Burges, “A Tutorial on Support Vector Machines for Pattern Recognition,” Data Mining and Knowledge Discovery, Vol. 2, No. 2, pp. 121-167, 1998, both incorporated herein by reference.
On the other hand, for non-linear classification there are few effective learning methods, because a suitable non-linear transformation cannot be obtained easily. However, Kernel SVM, which uses the Kernel method disclosed in “A Tutorial on Support Vector Machines for Pattern Recognition,” and Boosting (AdaBoost, Real AdaBoost, Joint Boosting) have produced good results. Boosting combines a plurality of weak classifiers, as disclosed in Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” Journal of Computer and System Sciences 55(1), 1997; R. Schapire and Y. Singer, “Improved Boosting Algorithms using confidence-rated predictions,” Machine Learning, Vol. 37, No. 3, 1999; and A. Torralba, K. Murphy and W. Freeman, “Sharing Features: efficient boosting procedures for multiclass object detection,” In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2004. In Kernel SVM, the non-linear transformation is performed by replacing the inner product of the vectors with a Kernel function, whereas in Boosting the non-linear property is expressed by effectively combining the weak classifiers.
The classifiers in the related art as described above have advantages and disadvantages as described below.
Since most linear classifiers rely on an inner product of vectors, the calculation cost required for classification is low. However, these classifiers are effective only when the distribution of the target vectors is linearly separable, so they cannot classify non-linear distributions effectively.
On the other hand, non-linear classifiers such as Kernel SVM and Boosting are effective for most non-linear distributions. However, because the Kernel function must be evaluated many times in Kernel SVM (once per support vector) and every weak classifier must be evaluated in Boosting, the calculation cost increases significantly.
For example, video surveillance applications may run on hardware that can process a large amount of calculation, such as a personal computer (PC) or equivalent apparatus. Classifiers such as Kernel SVM or Boosting, which offer high classification performance at a high calculation cost, may therefore be used in such applications.
However, in more general applications, such as a recognizing apparatus mounted in a vehicle, the hardware that can be installed is limited to small devices whose performance is lower than that of personal computers.
Even in environments such as the video surveillance described above, costs can be reduced correspondingly if low-performance hardware can be used.
Therefore, it is desirable to use a classifier with a low calculation cost. However, although a linear classifier achieves a low calculation cost, its “linear” constraint may result in significantly insufficient classification performance.
Accordingly, an object of the present invention is to provide a non-linear recognizing apparatus that configures a non-linear classification plane with high classification performance at a calculation cost on the same level as that of a linear classifier.
According to embodiments of the invention, a recognizing apparatus includes a training vector input unit configured to enter a plurality of training vectors as feature vectors for training; a weak classifier generator configured to obtain a plurality of weak classifiers based on the value of an element of a dimension common to the plurality of training vectors using a learning method of Boosting, the plurality of weak classifiers each classifying the plurality of training vectors based on an element of each dimension of the plurality of training vectors; a classifier integrator configured to obtain non-linear mappings for each dimension of the plurality of training vectors by combining the plurality of weak classifiers; a test vector input unit configured to input a test vector to be classified; a non-linear transformer configured to obtain a transformed vector by transforming the values of the elements of the test vector using the respective non-linear mapping corresponding to the dimension of the element; and a score calculator configured to obtain a classification score by summing the values of the respective elements of the transformed vector and to recognize the test vector using the classification score.
According to the embodiments of the invention, non-linear distributions may be recognized at a calculation cost equivalent to that of a linear classifier.
Referring now to
As shown in
The learner 15 includes a training vector input unit 13 for entering training feature vectors (hereinafter referred to simply as “training vectors”), a pre-processor 14 identical to the one described above, a weak classifier generator 16 that obtains a plurality of weak classifiers, and a classifier integrator 18 for the non-linear mappings. The functions of the respective components 12 to 20 may be implemented by a software program stored in a memory 801 in a computer 800, as shown in
(1) Learning Method in Learner 15
Referring now to
(1-1) Training Vector Input Unit 13
The following data are supplied to the training vector input unit 13 as training vectors.
$$(x_1, y_1), \ldots, (x_N, y_N)$$
$$x \in \mathbb{R}^d, \quad y \in \{+1, -1\}$$
where N denotes the number of training vectors, x denotes a d-dimensional training vector, and y denotes the teacher label attached to it. In this embodiment, the teacher label takes one of two classes, {+1, −1}, for simplicity.
(1-2) Pre-Processor 14
The pre-processor 14 operates in one of two ways, with or without the pre-process, and each is described separately.
(1-2-1) When the Pre-Process is Performed
Firstly, a case in which the pre-process is performed in the pre-processor 14 will be described.
The pre-processor 14 transforms the training vector x by using a statistical method.
For example, when principal component analysis is performed, principal component axes are obtained from the training vector x by solving an eigenvalue problem shown below.
where A denotes a matrix of eigenvectors (principal component axes), and Λ denotes the eigenvalues. The training vector x is transformed using the A learned in this manner:
$$\tilde{x} = A^{t} x$$
That is, the training vectors are expressed as:
$$(\tilde{x}_1, y_1), \ldots, (\tilde{x}_N, y_N)$$
$$\tilde{x} \in \mathbb{R}^d, \quad y \in \{+1, -1\}$$
The same applies to other statistical methods, for example, independent component analysis. Here the dimension of the vector remains d because principal component analysis is used as the example; however, the following process can also be applied when the statistical method changes the dimension from d after transformation.
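The principal component pre-process above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes the eigenvalue problem is the one for the sample covariance of the training vectors, and the function names `fit_pca` and `preprocess` are hypothetical. As in $\tilde{x} = A^{t} x$, the vectors are projected without subtracting the mean.

```python
import numpy as np

def fit_pca(X: np.ndarray) -> np.ndarray:
    """Columns of the returned A are principal component axes, by decreasing eigenvalue."""
    Xc = X - X.mean(axis=0)                 # centre the N x d training matrix
    cov = Xc.T @ Xc / len(X)                # d x d sample covariance (assumed formulation)
    eigvals, A = np.linalg.eigh(cov)        # symmetric solver, eigenvalues ascending
    order = np.argsort(eigvals)[::-1]       # reorder to descending eigenvalue
    return A[:, order]

def preprocess(A: np.ndarray, x: np.ndarray) -> np.ndarray:
    """x_tilde = A^t x; for an N x d matrix of row vectors this is X @ A."""
    return x @ A

rng = np.random.default_rng(0)
mix = np.array([[2.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 0.1]])
X = rng.normal(size=(200, 3)) @ mix         # hypothetical correlated training vectors
A = fit_pca(X)
X_tilde = preprocess(A, X)
```

After this transformation the elements of `X_tilde` are mutually uncorrelated, which is the property the per-dimension learning below relies on.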
(1-2-2) When the Pre-Process is not Performed
Alternatively, no pre-process may be performed at all. In this case,
$$\tilde{x} = x$$
holds.
(1-2-3) Comparison Between Both Operations
As described above, there are two types of pre-process: one in which a statistical method is applied, and one in which no process is performed.
When the pre-process is performed, the transformation increases the independence of the respective vector elements. The non-linear mappings of the respective vector elements can therefore be learned effectively by the Boosting in the learner 15, described below, which increases classification performance.
However, because the pre-processor 14 is also used for the pre-process in the non-linear classifier 10, performing the statistical process as the pre-process slightly increases the calculation cost required for classification. When the pre-process is not performed, the calculation cost therefore remains lower.
(1-3) Main Functions of Learner 15
A weak classifier generator 16 and a classifier integrator 18 learn the non-linear mapping of the respective vector elements by applying the Boosting learning method to the training vectors.
An example using the AdaBoost described in “A decision-theoretic generalization of on-line learning and an application to boosting” will be described.
(1-3-1) Weak Classifier Generator 16
In this embodiment, the following classification function is given as the weak classifier.
where $L \in \mathbb{R}$ and $U \in \mathbb{R}$ denote a lower limit and an upper limit respectively, $s \in \{-1, +1\}$ is a sign that adjusts the direction of the inequalities, $i \in \{1, \ldots, d\}$ denotes the element number of the vector $\tilde{x}$, and $\tilde{x}^{\langle i\rangle}$ denotes the i-th element of the vector $\tilde{x}$.
The weak classifier is configured to return +1 when s = +1 and the value of the vector element $\tilde{x}^{\langle i\rangle}$ lies between the lower limit L and the upper limit U, which is expressed as in
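Because Equation (1) itself is not reproduced here, the following Python sketch assumes one plausible form of the weak classifier $W[L, U, s, i]$: it returns s when $L \le \tilde{x}^{\langle i\rangle} < U$ and −s otherwise. The half-open interval, the value −s outside it, the class name `WeakClassifier`, and 0-based indexing of i are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class WeakClassifier:
    """Hypothetical representation of W[L, U, s, i]."""
    L: float  # lower limit
    U: float  # upper limit
    s: int    # sign, +1 or -1
    i: int    # element (dimension) index, 0-based here

    def __call__(self, x_tilde: Sequence[float]) -> int:
        # Assumed form: +s inside the half-open interval [L, U), -s outside.
        inside = self.L <= x_tilde[self.i] < self.U
        return self.s if inside else -self.s

w = WeakClassifier(L=0.0, U=1.0, s=+1, i=1)
print(w([5.0, 0.5]), w([5.0, 1.5]))  # prints: 1 -1
```

Only the element selected by i is inspected, which is what later allows the weak classifiers to be grouped dimension by dimension.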
(1-3-2) Classifier Integrator 18
The final classifier of AdaBoost is the weighted sum of the weak classifiers, and hence the expression:
is established, where T is the number of AdaBoost learning iterations, $\alpha_t$ is the weight applied to each weak classifier, and $\{t \mid i_t = i\}$ is the set of values of t, among $t \in \{1, \ldots, T\}$, that satisfy $i_t = i$. Note that in the second row of expression (2), the outer sum runs over the d dimensions of the vector rather than over the T iterations. The sum of the weak classifiers for each dimension i may be expressed as a non-linear mapping $\phi_i$, as shown in:
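The regrouping into per-dimension sums can be checked numerically. Since Equation (1) is not reproduced here, the sketch below assumes for illustration that a weak classifier returns s inside $[L, U)$ and −s outside, and represents each weak classifier as a hypothetical tuple $(\alpha_t, L_t, U_t, s_t, i_t)$. It shows that summing over t gives the same score as first collecting each set $\{t \mid i_t = i\}$ into $\phi_i$ and then summing over the d dimensions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# A weak classifier is a hypothetical tuple (alpha_t, L_t, U_t, s_t, i_t); i_t is 0-based.
Stump = Tuple[float, float, float, int, int]

def weak(L: float, U: float, s: int, v: float) -> int:
    # Assumed weak-classifier form: +s inside [L, U), -s outside.
    return s if L <= v < U else -s

def adaboost_score(stumps: List[Stump], x: Sequence[float]) -> float:
    """AdaBoost weighted sum over t = 1..T."""
    return sum(a * weak(L, U, s, x[i]) for a, L, U, s, i in stumps)

def group_by_dimension(stumps: List[Stump]) -> Dict[int, List[Stump]]:
    """Collect the set {t | i_t = i} for every dimension i."""
    groups: Dict[int, List[Stump]] = defaultdict(list)
    for stump in stumps:
        groups[stump[4]].append(stump)
    return groups

def phi(group: List[Stump], v: float) -> float:
    """phi_i(v): weighted sum of the weak classifiers whose i_t equals i."""
    return sum(a * weak(L, U, s, v) for a, L, U, s, _ in group)

def regrouped_score(stumps: List[Stump], x: Sequence[float]) -> float:
    """Same score regrouped per dimension: sum of phi_i(x[i]) over the d dimensions."""
    groups = group_by_dimension(stumps)
    return sum(phi(groups.get(i, []), x[i]) for i in range(len(x)))

stumps = [(0.8, 0.0, 1.0, +1, 0), (0.5, -1.0, 0.5, -1, 1), (0.3, 0.2, 2.0, +1, 0)]
x = [0.4, 0.1]
```

Both functions evaluate to 0.6 for `x` here (up to floating-point rounding); the second form is the one that can be tabulated per dimension.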
(1-3-3) Obtaining a Table Function for the Non-Linear Mapping
In practice, the non-linear mapping $\phi_i$ can be implemented as a table function.
The table function (hereinafter referred to simply as the “table”) is obtained by the following procedure.
First, when the range of values of $\tilde{x}^{\langle i\rangle}$ is divided into n ranges (bins), as shown in
is established, where $\Delta z$ is the width of each bin.
Because the weak classifiers in Equation (1) are considered over the divided ranges, L and U in Equation (1) each take one of the values $z_0, \ldots, z_n$. The classifier can therefore be expressed without approximating these values, so dividing the range does not degrade the accuracy of the classifier. Accordingly, the non-linear mapping $\phi_i$ is obtained by calculating the expression:
for each of the bins $(z_0, \ldots, z_n)$ in advance and storing the calculated results as the table $\phi_i$.
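A minimal sketch of building and referencing the table $\phi_i$, assuming equally spaced bin edges $z_j = z_0 + j\,\Delta z$ for $j = 0, \ldots, n$ and, for illustration, a weak classifier that returns s inside $[L, U)$ and −s outside. Because every L and U lies on a bin edge, each weak classifier is constant on $[z_j, z_{j+1})$, so the value stored for $z_j$ is exact for the whole bin. The names `build_table` and `lookup`, and clamping of out-of-range keys, are illustrative assumptions.

```python
import math
from typing import List, Tuple

# Hypothetical weak classifiers of one dimension i: (alpha_t, L_t, U_t, s_t).
Group = List[Tuple[float, float, float, int]]

def build_table(group: Group, z0: float, dz: float, n: int) -> List[float]:
    """Precompute phi_i at every bin edge z_j = z0 + j*dz, j = 0..n."""
    table = []
    for j in range(n + 1):
        z = z0 + j * dz
        # Assumed weak-classifier form: +s inside [L, U), -s outside.
        table.append(sum(a * (s if L <= z < U else -s) for a, L, U, s in group))
    return table

def lookup(table: List[float], z0: float, dz: float, v: float) -> float:
    """phi_i(v) with a single table reference; independent of the number T."""
    j = math.floor((v - z0) / dz)
    j = min(max(j, 0), len(table) - 1)  # assumption: clamp keys outside 0..n
    return table[j]

z0, dz, n = 0.0, 0.25, 8
grid = [z0 + j * dz for j in range(n + 1)]
group = [(0.7, grid[1], grid[5], +1), (0.4, grid[3], grid[8], -1), (0.2, grid[0], grid[2], +1)]
table = build_table(group, z0, dz, n)
```

Evaluating `lookup` costs one subtraction, one division and one list reference, however many weak classifiers were folded into `table`.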
The flow of calculating the non-linear mapping $\phi_i(\tilde{x}^{\langle i\rangle})$ from $\tilde{x}^{\langle i\rangle}$ is as shown in
The key j of the table $\phi_i[j]$ is obtained by determining the bin value $z_j$ that corresponds to the value of $\tilde{x}^{\langle i\rangle}$.
Next, assuming that $t_1$ is one of the elements of the set $\{t \mid i_t = i\}$, the term $\alpha_{t_1} W[L_{t_1}, U_{t_1}, s_{t_1}, i_{t_1}]$ is one of the weak classifiers summed on the right side of Equation (5), that is, one of the components of the non-linear mapping $\phi_i$. The component $\alpha_{t_1} W[L_{t_1}, U_{t_1}, s_{t_1}, i_{t_1}]$ is shown as the first weak classifier 601 in
Here, for example, assume that the set $\{t \mid i_t = i\}$ has six elements, $t_1, t_2, t_3, t_4, t_5$ and $t_6$, represented as the first weak classifier 601, a second weak classifier 602, a third weak classifier 603, a fourth weak classifier 604, a fifth weak classifier 605, and a sixth weak classifier 606 shown on the left side in
In other words, assuming that the i-th elements of the training vectors satisfy the relation $t_1 < t_2 < t_3 < t_4 < t_5 < t_6$ in
Therefore, because the non-linear mapping $\phi_i(\tilde{x}^{\langle i\rangle})$, whose values have been updated by each of the weak classifiers, is calculated simply by referencing the updated table, the calculation cost is very low.
The number T of AdaBoost learning iterations has no bearing on the calculation of the non-linear mapping. That is, the calculation cost does not depend on T and is constant, because the calculation consists only of referencing the non-linear mapping stored in the updated table.
It is generally known that classifier performance improves as the number of learning iterations increases. In the related art, the number of iterations T is capped by the calculation cost permitted for classification; in this embodiment, by contrast, T can be made arbitrarily large, approaching infinity, as far as the learning time permits.
Therefore, the non-linear mapping obtained by this learning can achieve very high classification performance.
Thus, the vector $\tilde{x}$ is non-linearly transformed by the non-linear mappings $\phi_i$ into:
(1-3-4) Calculating a Classification Score of a Feature Vector
Linear classification is then applied to the transformed vector $\phi(\tilde{x})$ obtained through Equation (6), giving the classification score function H in the following expression:
$$H(\tilde{x}) = \operatorname{sign}\left[ a^{t} \phi(\tilde{x}) + b \right] \quad (7)$$
Since the normal vector a and the bias b of the linear classification plane in Equation (7) are unknown, the training vectors are substituted in turn into Equation (7), the function of the classification score H with a and b still unknown, and the classifier integrator 18 learns the optimal weight a and bias b.
As is clear from Equation (2), the classifier h obtained through AdaBoost satisfies a = 1 (every element of a equal to 1) and b = 0. Alternatively, a and b may be obtained through a statistical method. In this case, the training vectors are expressed as:
$$(\phi(\tilde{x}_1), y_1), \ldots, (\phi(\tilde{x}_N), y_N)$$
$$\phi(\tilde{x}) \in \mathbb{R}^d, \quad y \in \{+1, -1\}$$
using the non-linear function $\phi$.
For example, when the SVM learning disclosed in the aforementioned document “A Tutorial on Support Vector Machines for Pattern Recognition” is used, the vector a and the bias b that are optimal for classification are obtained.
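As a stand-in for SVM learning, the sketch below fits a and b of Equation (7) by ordinary least squares on the transformed vectors $\phi(\tilde{x}_n)$ with targets $y_n \in \{+1, -1\}$. Least squares is a swapped-in technique chosen only because it needs nothing beyond NumPy; an SVM solver would instead return the maximum-margin a and b. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_linear(Phi: np.ndarray, y: np.ndarray):
    """Least-squares estimate of (a, b) in a^t phi + b ~ y; a stand-in for SVM learning."""
    design = np.hstack([Phi, np.ones((Phi.shape[0], 1))])  # extra column of ones carries b
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[:-1], coef[-1]

def score_sign(Phi: np.ndarray, a: np.ndarray, b: float) -> np.ndarray:
    """Equation (7) applied row-wise: sign(a^t phi(x_tilde) + b)."""
    return np.sign(Phi @ a + b)

# Hypothetical transformed training vectors phi(x_tilde_n) and labels y_n.
rng = np.random.default_rng(1)
Phi = rng.normal(size=(100, 4))
y = np.sign(Phi @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.3)
a, b = fit_linear(Phi, y)
accuracy = np.mean(score_sign(Phi, a, b) == y)
```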
Finally, each non-linear mapping is transformed into
$$\hat{\phi}_i = a_i \phi_i + b_i$$
taking its corresponding weight $a_i$ and bias $b_i$ into account, where the biases $b_i$ sum to b; that is, the values in the table are updated, so that the classifier in Equation (7) is expressed as:
and hence the non-linear classifier 10 can be configured using only references to the tables of the non-linear functions and their sum (that is, the classification score H). The function of Equation (8) configured by the classifier integrator 18 is used in the non-linear classifier 10, as described later.
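Since Equation (8) is not reproduced here, the following sketch assumes it amounts to $H(\tilde{x}) = \operatorname{sign}\left[\sum_i \hat{\phi}_i(\tilde{x}^{\langle i\rangle})\right]$. Folding $a_i$ and $b_i$ into the stored tables then leaves one table reference and one addition per dimension. Splitting the bias evenly as $b_i = b/d$ is an assumption; any split with $\sum_i b_i = b$ gives the same score. The helpers `fold_tables` and `folded_score` are hypothetical.

```python
from typing import List, Sequence

def fold_tables(tables: List[List[float]], a: Sequence[float], b: float) -> List[List[float]]:
    """hat_phi_i = a_i * phi_i + b_i for every stored entry, with b_i = b / d (assumed split)."""
    d = len(tables)
    return [[a[i] * v + b / d for v in tables[i]] for i in range(d)]

def folded_score(folded: List[List[float]], keys: Sequence[int]) -> float:
    """Sum of one table reference per dimension; keys[i] is the bin index j for dimension i."""
    return sum(folded[i][j] for i, j in enumerate(keys))

tables = [[0.5, -0.2, 1.0], [0.3, 0.7, -0.4]]  # hypothetical phi_i tables, d = 2
a, b = [2.0, -1.0], 0.6
keys = [2, 0]
folded = folded_score(fold_tables(tables, a, b), keys)
unfolded = a[0] * tables[0][2] + a[1] * tables[1][0] + b  # a^t phi + b
```

Both `folded` and `unfolded` equal 2.3 up to rounding, so the weights and bias cost nothing extra at classification time.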
It is also possible, for example, to fix the weights to $a_i = 1$ and the biases to $b_i = 0$ and omit the classifier integrator 18. In this case, the value of the classification score H corresponds to the sum of the elements of the vector transformed by the non-linear mappings obtained by the learner 15.
(2) Non-Linear Classifier 10
Next, the recognition method performed by the non-linear classifier 10 on the basis of the non-linear mappings learned as described above will be described.
(2-1) Test Input Unit 12
The test vector $x^{\langle i\rangle}$ to be recognized in this embodiment is supplied to the test input unit 12. Like the training vector x, the test vector $x^{\langle i\rangle}$ is d-dimensional.
(2-2) Pre-Processor 14
The pre-processor 14 obtains $\tilde{x}^{\langle i\rangle}$ through the same process as the pre-process in the learning method described above.
In other words, either no pre-process is performed or a statistical process such as principal component analysis or independent component analysis is applied.
(2-3) Non-Linear Transformer 19 and Score Calculator 20
The non-linear classifier 10 includes a non-linear transformer 19 and a score calculator 20.
A classification result is obtained by applying the non-linear mappings of Equation (6) to $\tilde{x}^{\langle i\rangle}$ and then calculating the classification score H with Equation (8), obtained by the learner 15. In other words, the non-linear transformer 19 substitutes the test vector into Equation (6), the non-linear mappings, and the score calculator 20 then substitutes the results into Equation (8), the function of the classification score H in which the weight a and the bias b have been determined, to obtain the value of the classification score H for the test vector $x^{\langle i\rangle}$. The classification result is then obtained from the value of the classification score H.
The non-linear mapping $\tilde{\phi}$ in Equation (6) is calculated by referencing the non-linear mappings of the respective dimensions stored in the table that was finally updated through the learning procedure shown as in
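Putting the classification steps together, the following hypothetical sketch pre-processes a test vector (optionally with $\tilde{x} = A^{t} x$), looks up one table entry per dimension, sums them, and takes the sign. It assumes the tables already contain the folded values $\hat{\phi}_i$, that all dimensions share the bin origin $z_0$ and width $\Delta z$, and that a score of exactly zero is assigned to +1; the function name `recognize` is illustrative.

```python
import math
from typing import List, Optional, Sequence

def recognize(x: Sequence[float],
              tables: List[List[float]],
              z0: float,
              dz: float,
              A: Optional[List[List[float]]] = None) -> int:
    """Hypothetical end-to-end classification of one test vector x."""
    # Pre-processor 14: x_tilde = A^t x, or x_tilde = x when no pre-process is used.
    if A is None:
        x_tilde = list(x)
    else:
        x_tilde = [sum(A[k][i] * x[k] for k in range(len(x))) for i in range(len(A[0]))]
    score = 0.0
    for i, v in enumerate(x_tilde):
        # Non-linear transformer 19: one table reference per dimension.
        j = min(max(math.floor((v - z0) / dz), 0), len(tables[i]) - 1)
        # Score calculator 20: accumulate the transformed elements.
        score += tables[i][j]
    return 1 if score >= 0 else -1  # assumption: a zero score is assigned to +1

tables = [[-1.0, 2.0], [0.5, -3.0]]  # hypothetical folded tables, d = 2, two bins each
print(recognize([1.5, 0.2], tables, 0.0, 1.0))  # prints 1
```

The per-vector work is d table references and d additions, the same order as evaluating a linear classifier.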
(3) Modification
The invention is not limited to the above-described embodiment and may be modified in various ways without departing from the scope of the invention.
For example, in the above-described embodiment, the AdaBoost described in “A decision-theoretic generalization of on-line learning and an application to boosting” is used as the example in the description of the learner 15. However, the non-linear mapping may also be obtained through Real AdaBoost, described in “Improved Boosting Algorithms using confidence-rated predictions.” Rather than determining whether a value lies within a range as in Equation (1), Real AdaBoost realizes the weak classifier by allocating a value to each range after the range is divided as in Equation (4), so the superposition in Equation (5) can be performed naturally.
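For Real AdaBoost over the divided ranges, Schapire and Singer's domain-partitioning weak hypothesis assigns each bin j the value $\tfrac{1}{2}\ln\left((W_+^j + \varepsilon)/(W_-^j + \varepsilon)\right)$, where $W_\pm^j$ is the total boosting weight of the positive or negative training samples whose element falls in bin j. The sketch below computes these per-bin values for one dimension; the function name, the smoothing constant ε, and clamping of out-of-range values into the end bins are assumptions.

```python
import math
from typing import List, Sequence

def real_adaboost_bin_values(values: Sequence[float], labels: Sequence[int],
                             weights: Sequence[float], z0: float, dz: float,
                             n: int, eps: float = 1e-6) -> List[float]:
    """Per-bin confidence 0.5 * ln((W+ + eps) / (W- + eps)) for one dimension."""
    w_pos = [0.0] * n
    w_neg = [0.0] * n
    for v, y, w in zip(values, labels, weights):
        j = min(max(math.floor((v - z0) / dz), 0), n - 1)  # assumed clamping
        if y > 0:
            w_pos[j] += w
        else:
            w_neg[j] += w
    return [0.5 * math.log((p + eps) / (q + eps)) for p, q in zip(w_pos, w_neg)]

h = real_adaboost_bin_values([0.1, 0.2, 0.8, 0.9], [1, 1, -1, -1], [0.25] * 4, 0.0, 0.5, 2)
```

Each round's per-bin values can be added bin by bin into the same table, which is the natural superposition the text refers to.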
Although AdaBoost targets two classes in the embodiment described above, the non-linear functions may be obtained in the same manner for a plurality of classes by applying Joint Boosting in “Sharing Features: efficient boosting procedures for multiclass object detection” using the weak classifier in Equation (1).
Number | Date | Country | Kind |
---|---|---|---|
2006-241804 | Sep 2006 | JP | national |
Number | Date | Country | |
---|---|---|---|
20080077543 A1 | Mar 2008 | US |