The present invention relates to a learning technique for a nearest neighbor classifier.
As one typical pattern classifier, a nearest neighbor classifier (hereinafter also referred to as an NNC) is known. The NNC holds reference vectors (also referred to as templates or prototypes) classified into categories and outputs a category (class) to which a reference vector having the shortest distance to an input vector is assigned, as a recognition (classification) result for the input vector. In the NNC, by changing reference vectors and categories to which the reference vectors are assigned, a two-class classifier or a multi-class classifier may be configured.
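As a point of reference only, the following is a minimal sketch, not taken from the original disclosure, of how such an NNC could classify an input vector; the array names and the use of a square Euclidean distance are illustrative assumptions.

```python
import numpy as np

def nnc_classify(x, ref_vectors, ref_categories):
    """Return the category assigned to the reference vector nearest to input x.

    ref_vectors:    (N, D) array of reference vectors (templates / prototypes)
    ref_categories: length-N sequence of category labels assigned to them
    """
    # Square Euclidean distance from x to every reference vector
    distances = np.sum((ref_vectors - x) ** 2, axis=1)
    return ref_categories[int(np.argmin(distances))]

# Illustrative two-class example
refs = np.array([[0.0, 0.0], [1.0, 1.0]])
cats = ["class_A", "class_B"]
print(nnc_classify(np.array([0.2, 0.1]), refs, cats))  # -> "class_A"
```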
A classification boundary surface of the NNC is a boundary surface of feature spaces that are Voronoi-divided by reference vectors. When the reference vectors that determine the classification boundary surface are learned by using a learning sample previously prepared, a recognition accuracy of the NNC can be enhanced. As a learning method of reference vectors of the NNC, a method referred to as LVQ (Learning Vector Quantization) (see NPL 1 described below) and a method referred to as GLVQ (Generalized Learning Vector Quantization) (see NPL 2 described below), that is an improvement method of the former method, are known. In these learning methods, reference vectors are updated respectively in accordance with their own criteria.
As another pattern classifier, an SVM (Support Vector Machine) is known (see NPL 3 described below). The SVM learns so as to maximize a distance (margin) between a classification boundary and a learning sample and thereby suppresses over-learning. For example, a linear SVM is applicable to a two-class classification problem in which the classification boundary surface is planar (see NPL 4 described below). Through this application, learning of the SVM is performed so as to maximize a margin between the classification boundary surface and a learning sample, and therefore a classifier having high classification performance may be obtained. PTL 1 described below proposes a method in which a discrimination function utilizing a learning mechanism of an SVM is derived by the use of feature information for learning and discrimination result information for learning, and the derived discrimination function is corrected by adjusting an influence coefficient indicating a degree of influence exerted upon the discrimination function by erroneously discriminated feature information, that is, feature information for which a discrimination result is wrong.
As a learning method of a pattern classifier which sets margin maximization as a criterion, an SVM using a kernel technique is also well-known (see NPL 5 described below). Further, PTL 2 described below proposes a voice recognition method using a continuous HMM (Hidden Markov Model) during learning, and using a discrete type HMM during recognition.
However, in the aforementioned learning methods such as LVQ and GLVQ, a phenomenon referred to as over-learning occurs, and a classification accuracy of an NNC may be deteriorated. This is because it is not guaranteed that a margin between a classification boundary and a learning sample of the NNC is maximized. Further, the aforementioned SVM is a method for learning a plane to be a classification boundary, and therefore the SVM may not be applicable to learning of reference vectors of the NNC. Likewise, the aforementioned SVM using a kernel technique may also not be applicable to learning of reference vectors of the NNC.
Thus, it is not possible for the aforementioned methods to execute learning which sets margin maximization as a criterion, with regard to the NNC, and therefore it is difficult to suppress over-learning. As a result, it is difficult to enhance a classification accuracy of the NNC.
The present invention has been achieved in view of these circumstances. An objective of the present invention is to provide a learning technique of reference vectors of an NNC that is capable of enhancing classification accuracy.
To solve the aforementioned problem, each aspect of the present invention employs the following configuration respectively.
A first aspect relates to a classifier learning apparatus. The classifier learning apparatus according to the first aspect includes: an object acquisition unit that acquires a set of reference vectors and assigned category information of the respective reference vectors as a processing object; a specifying unit that specifies an internal nearest neighbor reference vector nearest to a sample vector from among the reference vectors of the processing object assigned to the same category as the sample vector and specifies an external nearest neighbor reference vector nearest to the sample vector from among the reference vectors of the processing object assigned to a category different from the sample vector; a calculation unit that calculates an evaluation value of the processing object by using a distance between the sample vector and a classification boundary formed by the internal nearest neighbor reference vector and the external nearest neighbor reference vector; and an updating unit that updates an original set of reference vectors and originally assigned category information with the processing object based on the evaluation value of the processing object calculated by the calculation unit.
A second aspect of the present invention relates to a classifier learning method. The classifier learning method according to the second aspect is executed by at least one computer and includes: acquiring a set of reference vectors and assigned category information of the respective reference vectors as a processing object; specifying an internal nearest neighbor reference vector nearest to a sample vector from among the reference vectors of the processing object assigned to the same category as the sample vector; specifying an external nearest neighbor reference vector nearest to the sample vector from among the reference vectors of the processing object assigned to a category different from the sample vector; calculating an evaluation value of the processing object by using a distance between the sample vector and a classification boundary formed by the internal nearest neighbor reference vector and the external nearest neighbor reference vector; and updating an original set of reference vectors and originally assigned category information by using the processing object based on the calculated evaluation value of the processing object.
Another aspect of the present invention may be a program that causes at least one computer to execute the method of the second aspect or may be a computer-readable recording medium recorded with such a program. This recording medium includes a non-transitory tangible medium.
The aforementioned respective aspects can provide a learning technique for reference vectors of an NNC capable of enhancing classification accuracy.
The aforementioned object and other objects as well as features and advantages will become more apparent from the following description of suitable exemplary embodiments and the following drawings that accompany the exemplary embodiments.
Exemplary embodiments of the present invention will now be described below. The following exemplary embodiments are illustrative, and therefore the present invention is not limited to configurations of the following exemplary embodiments.
The classifier learning apparatus 100 has the same hardware configuration as a nearest neighbor classifier learning apparatus 1 in detailed exemplary embodiments to be described later, for example, and the respective processing units described above are realized by processing a program in the same manner as the nearest neighbor classifier learning apparatus 1.
A classifier learning method according to the exemplary embodiment of the present invention is executed by at least one computer such as the classifier learning apparatus 100 and includes: acquiring a set of reference vectors and assigned category information of the respective reference vectors as a processing object; specifying an internal nearest neighbor reference vector nearest to a sample vector from among the reference vectors of the processing object assigned to the same category as the sample vector; specifying an external nearest neighbor reference vector nearest to the sample vector from among the reference vectors of the processing object assigned to a category different from the sample vector; calculating an evaluation value of the processing object by using a distance between a classification boundary formed by the internal nearest neighbor reference vector and the external nearest neighbor reference vector, and the sample vector; and updating an original set of reference vectors and originally assigned category information with the processing object based on the calculated evaluation value of the processing object. However, the respective steps included in the present classifier learning method may be executed sequentially in an order different from the described order, or may be executed at the same time.
Thus, in the present exemplary embodiment, from a relation between a sample vector and a set of reference vectors acquired as a processing object, an evaluation value of the processing object is calculated, and on the basis of the evaluation value, the set of reference vectors and the assigned category information thereof of an NNC are learned. The term “vector” means not only data having a magnitude and a direction but also any data consisting of a plurality of elements. The respective reference vectors of a set of reference vectors are assigned to categories, and the assigned category information indicates the categories to which the respective reference vectors are assigned. The “sample vector” refers to learning data having the same number of elements as the respective reference vectors of the processing object and is assigned to a certain category. The processing object and the sample vector may be generated by the classifier learning apparatus 100 or may be acquired from another apparatus or a portable recording medium.
Specifically, an internal nearest neighbor reference vector (hereinafter referred to as an IN-NN reference vector), that is assigned to the same category as a sample vector and is nearest to the sample vector, is specified from among the reference vectors of the processing object. Further, an external nearest neighbor reference vector (hereinafter referred to as an EX-NN reference vector), that is assigned to a category different from the sample vector and is nearest to the sample vector is specified from among the reference vectors of the processing object. An evaluation value of the processing object is calculated by using a distance between a classification boundary formed by the IN-NN reference vector and the EX-NN reference vector, and the sample vector.
Therefore, according to the present exemplary embodiment, it is possible to learn reference vectors of an NNC so as to maximize a distance, i.e. a margin, between a learning sample vector and a classification boundary in a set of reference vectors. As a result, according to the present exemplary embodiment, it is possible to enhance a classification accuracy of the NNC.
The aforementioned exemplary embodiment will be described below in more detail. In the following, as detailed exemplary embodiments, a first exemplary embodiment to a third exemplary embodiment will be exemplified. The following exemplary embodiments are examples in which the classifier learning apparatus 100 and the classifier learning method are applied to a nearest neighbor classifier (NNC) learning apparatus. Applications of the classifier learning apparatus 100, the classifier learning method, and a classifier learned in the following detailed exemplary embodiments are not limited. The classifier is usable in various types of pattern recognition such as character recognition, face recognition, vehicle detection, and voice classification.
The parameter setting unit 11 sets a set of reference vectors and assigned category information thereof in an NNC to be a processing object. The processing object set by the parameter setting unit 11 can be expressed by N (N is an integer equal to or greater than 2) reference vectors ri (i is an integer of 1 to N) and categories ci corresponding to the respective reference vectors. The object acquisition unit 101 acquires, for example, such a processing object set by the parameter setting unit 11.
The parameter setting unit 11 sets reference vectors that are parameters of the NNC, for example, by using a technique used for a method such as MDS (Multi-Directional Search), a simplex method, and Alternating Directional Search described in the following reference documents.
The parameter setting unit 11 may use, for the reference vectors, a mean vector of a cluster obtained by clustering learning sample vectors by using a K-means method (see Reference Document 3 described below).
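A minimal sketch of such an initialization, assuming scikit-learn's KMeans is available; clustering each category separately and the variable names are illustrative assumptions rather than part of the original description.

```python
import numpy as np
from sklearn.cluster import KMeans

def init_reference_vectors(samples, labels, clusters_per_category=2):
    """Initialize reference vectors as cluster mean vectors of the learning samples.

    samples: (M, D) array of learning sample vectors
    labels:  length-M array of the categories assigned to the samples
    Returns (reference_vectors, reference_categories).
    """
    ref_vectors, ref_categories = [], []
    for category in np.unique(labels):
        members = samples[labels == category]
        k = min(clusters_per_category, len(members))
        km = KMeans(n_clusters=k, n_init=10).fit(members)
        ref_vectors.append(km.cluster_centers_)   # one mean vector per cluster
        ref_categories.extend([category] * k)
    return np.vstack(ref_vectors), np.array(ref_categories)
```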
The learning sample holding unit 12 holds a plurality of learning sample vectors and assigned category information of the respective sample vectors. The information held by the learning sample holding unit 12 can be expressed by M (M is an integer equal to or greater than 2) sample vectors sj (j is an integer of 1 to M) and categories cj corresponding to the respective sample vectors. The sample vectors sj and the reference vectors ri have the same number of elements.
The optimum parameter holding unit 16 holds a set of reference vectors and assigned category information thereof which may be obtained from learning results by the NNC learning apparatus 1, and which are to be optimum parameters of the NNC.
The specifying unit 13 specifies an IN-NN reference vector and an EX-NN reference vector respectively in the same manner as the specifying unit 102 with respect to each of a plurality of sample vectors held in the learning sample holding unit 12. The specifying unit 13 calculates, for example, distances between reference vectors ri and a sample vector sj respectively, and specifies an IN-NN reference vector and an EX-NN reference vector based on the calculated distances. In the present exemplary embodiment, a distance between a sample vector sj and a reference vector ri having the same number of elements is calculated by using a square distance as represented by equation (1) described below.
$$d(\vec{s}_j, \vec{r}_i) = |\vec{s}_j - \vec{r}_i|^2 \qquad (1)$$
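A minimal sketch of how the specifying unit's processing could look with the square distance of equation (1); the function and variable names are illustrative assumptions, not identifiers from the apparatus.

```python
import numpy as np

def specify_nearest_neighbors(s, s_category, ref_vectors, ref_categories):
    """Specify the IN-NN and EX-NN reference vectors for a sample vector s.

    Returns (r_w, r_b): the nearest reference vector assigned to the same
    category as s (internal) and the nearest one assigned to a different
    category (external), using the square distance of equation (1).
    """
    d = np.sum((ref_vectors - s) ** 2, axis=1)        # d(s, r_i) for all i
    same = np.asarray(ref_categories) == s_category
    r_w = ref_vectors[same][np.argmin(d[same])]       # internal nearest neighbor
    r_b = ref_vectors[~same][np.argmin(d[~same])]     # external nearest neighbor
    return r_w, r_b
```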
The calculation unit 14 calculates an evaluation value of a processing object so that when a sample vector is closer to an EX-NN reference vector than to an IN-NN reference vector and a distance between the classification boundary and the sample vector is longer, a lower evaluation is indicated. The calculation unit 14 also calculates the evaluation value of a processing object so that when the sample vector is closer to the IN-NN reference vector than to the EX-NN reference vector and the distance is longer, a higher evaluation is indicated. When a category of the reference vector that is the nearest neighbor to the sample vector from among all the reference vectors is the same as a category of the sample vector, the sample vector is correctly classified, and therefore it is desirable to maximize the distance (margin) in that case. Therefore, in this case, as described above, the calculation unit 14 calculates the evaluation value of the processing object so as to present a higher evaluation when the distance is longer. On the other hand, when a category of the reference vector that is the nearest neighbor to the sample vector, from among all the reference vectors, is different from the category of the sample vector, the sample vector is incorrectly classified, and therefore it is desirable to minimize the distance (margin) in that case. Therefore, in this case, as described above, the calculation unit 14 calculates the evaluation value of the processing object so as to present a lower evaluation when the distance is longer. The present exemplary embodiment does not limit the calculation method by which the calculation unit 14 calculates an evaluation value of a processing object from a distance between the classification boundary and a sample vector, as long as the calculation method embodies this technical idea.
The calculation unit 14 may use, for example, the following calculation method for embodying a technical idea as described above. That is, the calculation unit 14 calculates the distance as a negative value when a sample vector is closer to an EX-NN reference vector than to an IN-NN reference vector. Further, the calculation unit 14 calculates the distance as a positive value when the sample vector is closer to the IN-NN reference vector than to the EX-NN reference vector. Then, the calculation unit 14 calculates an evaluation value of a processing object based on an output value of a sigmoid function which uses the calculated distance as an input. The following equation (2) represents one example of a calculation method for the distance, and the following equation (3) represents a sigmoid function. In the following equation, rw represents an IN-NN reference vector, and rb represents an EX-NN reference vector. A coefficient σ in equation (3) represents a positive constant experimentally set beforehand.
The calculation unit 14 calculates a total value of evaluation values calculated respectively for each of a plurality of sample vectors held in the learning sample holding unit 12, and uses the total value as a final evaluation value of the processing object. According to the example of equation (2) and equation (3) described above, the calculation unit 14 calculates a final evaluation value J by using the following equation (4).
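Equations (2) to (4) themselves are not reproduced in this text, so the following is a minimal sketch under stated assumptions: the margin is taken as the signed perpendicular distance from the sample to the bisector plane between r_w and r_b (positive when the sample is closer to r_w), the gain function is a standard sigmoid with the coefficient σ, and the final evaluation value J is the total over all samples. It reuses specify_nearest_neighbors from the sketch above; none of this should be read as the exact formulation of the original equations.

```python
import numpy as np

def margin(s, r_w, r_b):
    """Signed distance from s to the boundary between r_w and r_b (square-distance case).

    Positive when s is closer to the internal nearest neighbor r_w,
    negative when s is closer to the external nearest neighbor r_b.
    """
    d_w = np.sum((s - r_w) ** 2)
    d_b = np.sum((s - r_b) ** 2)
    return (d_b - d_w) / (2.0 * np.linalg.norm(r_b - r_w))

def sigmoid_gain(m, sigma=1.0):
    """Evaluation value of one margin; longer positive margins give higher evaluations."""
    return 1.0 / (1.0 + np.exp(-m / sigma))

def final_evaluation(samples, labels, ref_vectors, ref_categories, sigma=1.0):
    """Final evaluation value J: total of the per-sample evaluation values."""
    J = 0.0
    for s, c in zip(samples, labels):
        r_w, r_b = specify_nearest_neighbors(s, c, ref_vectors, ref_categories)
        J += sigmoid_gain(margin(s, r_w, r_b), sigma)
    return J
```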
Here, a classification boundary of an NNC will be described with reference to
When a square distance is used for an inter-vector distance, a Voronoi boundary is a plane that passes through the midpoint between adjacent reference vectors and is perpendicular to the straight line connecting them. For example, the Voronoi boundary of the dashed line B12 is such a plane with respect to the reference vector r1 and the reference vector r2. However,
Next, a margin will be described with reference to
The updating unit 15 compares a final evaluation value for the set of reference vectors and the assigned category information thereof held in the optimum parameter holding unit 16 with the final evaluation value, calculated by the calculation unit 14, for the processing object set in the parameter setting unit 11. Then, based on the comparison result, the updating unit 15 updates the set of reference vectors and the assigned category information thereof held in the optimum parameter holding unit 16 with the set having the higher evaluation value. In this manner, the optimum parameter holding unit 16 is not updated when the final evaluation value of the processing object is smaller than the final evaluation value for the set of reference vectors and the assigned category information thereof held in the optimum parameter holding unit 16.
The NNC learning apparatus 1 causes the parameter setting unit 11 to set new processing objects for a predetermined number of times, and causes the specifying unit 13, the calculation unit 14, and the updating unit 15 to handle each processing object. Consequently, the NNC learning apparatus 1 sequentially updates a set of reference vectors and assigned category information thereof held in the optimum parameter holding unit 16. The NNC learning apparatus 1 may terminate learning processing when information of the optimum parameter holding unit 16 is not updated by the updating unit 15 for a predetermined number of times or more.
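A minimal sketch of this outer learning loop, reusing final_evaluation from the sketch above. For simplicity, a random perturbation of the current best reference vectors stands in for the candidate generation performed by the parameter setting unit (which the text describes in terms of MDS, the simplex method, and similar techniques), and the assigned categories are kept fixed; all names and the perturbation scheme are illustrative assumptions.

```python
import numpy as np

def learn_nnc(samples, labels, init_refs, init_cats, sigma=1.0,
              max_iters=1000, patience=50, step=0.1, seed=0):
    """Keep the best-scoring set of reference vectors over repeated candidate proposals."""
    rng = np.random.default_rng(seed)
    best_refs, best_cats = np.array(init_refs, dtype=float), list(init_cats)
    best_J = final_evaluation(samples, labels, best_refs, best_cats, sigma)
    no_update = 0
    for _ in range(max_iters):
        # New processing object: a perturbed copy of the current optimum parameters
        candidate = best_refs + step * rng.standard_normal(best_refs.shape)
        J = final_evaluation(samples, labels, candidate, best_cats, sigma)
        if J > best_J:                 # update optimum parameters only on improvement
            best_refs, best_J, no_update = candidate, J, 0
        else:
            no_update += 1
            if no_update >= patience:  # terminate when not updated for a while
                break
    return best_refs, best_cats, best_J
```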
A classifier learning method in the first exemplary embodiment will be described below with reference to
The NNC learning apparatus 1 sets a processing object (S60). The processing object is a set of reference vectors and assigned category information thereof that are parameters of an NNC. The NNC learning apparatus 1 sets the processing object by using the method described above for the parameter setting unit 11. The processing object set in (S60) can be expressed by N (N is an integer equal to or greater than 2) reference vectors ri (i is an integer of 1 to N), and categories ci corresponding to the respective reference vectors.
Further, the NNC learning apparatus 1 acquires a sample vector s1 (S61). All the sample vectors sj can be expressed as M (M is an integer equal to or greater than 2) sample vectors sj (j is an integer of 1 to M) each having the same number of elements as the reference vectors ri. In (S61), one of all the sample vectors sj is acquired.
The NNC learning apparatus 1 respectively calculates distances d(s1,ri) between the sample vector s1 and the respective reference vectors ri (S62). In the present exemplary embodiment, the distances d(s1,ri) are calculated by using the square distance represented by the above equation (1).
The NNC learning apparatus 1 specifies an IN-NN reference vector rw and an EX-NN reference vector rb in regard to the sample vector s1 based on the distances d(s1,ri) calculated in (S62) (S63 and S64). The IN-NN reference vector rw is one of the reference vectors ri assigned to the same category as the sample vector s1, and the EX-NN reference vector rb is one of the reference vectors ri assigned to a category different from the sample vector s1.
The NNC learning apparatus 1 calculates a margin m(s1) that is a distance between the sample vector s1 and a classification boundary formed by the IN-NN reference vector rw and the EX-NN reference vector rb (S65).
The NNC learning apparatus 1 calculates an evaluation value g(m(s1)) of the margin m(s1) by inputting the calculated margin m(s1) to a gain function g(m) such as a sigmoid function (S66).
According to (S65) and (S66), the NNC learning apparatus 1 calculates the evaluation value g(m(s1)) of the margin m(s1) so that when the sample vector s1 is closer to the EX-NN reference vector rb than to the IN-NN reference vector rw and a distance between the classification boundary and the sample vector s1 is longer, a lower evaluation is indicated, and when the sample vector s1 is closer to the IN-NN reference vector rw than to the EX-NN reference vector rb and the distance is longer, a higher evaluation is indicated. For example, in (S65), the NNC learning apparatus 1 calculates the margin m(s1) as a negative value when the sample vector s1 is closer to the EX-NN reference vector rb than to the IN-NN reference vector rw. And, for example, the NNC learning apparatus 1 calculates the margin m(s1) as a positive value when the sample vector s1 is closer to the IN-NN reference vector rw than to the EX-NN reference vector rb. Then, in (S66), the NNC learning apparatus 1 may calculate the evaluation value g(m(s1)) of the margin m(s1) by inputting the calculated margin m(s1) to the sigmoid function g(m).
The NNC learning apparatus 1 adds the calculated evaluation value g(m(s1)) to a final evaluation value J (S66).
The NNC learning apparatus 1 determines whether unprocessed sample vectors sj are still present (S67). When j is equal to or smaller than M, i.e., unprocessed sample vectors sj are still present (S67: YES), the NNC learning apparatus 1 acquires an unprocessed sample vector sj (j=j+1) (S68). Here, an unprocessed sample vector s2 is acquired. Thereafter, (S62) to (S66) are executed for the sample vector s2 in the same manner as for the sample vector s1. Thereby, in (S66), an evaluation value g(m(s2)) with respect to the sample vector s2 is further added to the final evaluation value J. Such processing is executed for all the sample vectors sj respectively.
When a total value of evaluation values g(m(sj)) with respect to all the sample vectors sj has been calculated as the final evaluation value J (S67: NO), the NNC learning apparatus 1 determines whether the final evaluation value J is higher than the final evaluation value calculated with respect to the original set of reference vectors and assigned category information thereof (S69). When the final evaluation value J is enhanced (S69: YES), the NNC learning apparatus 1 updates the optimum parameters with the processing object set in (S60) (S70). In other words, when the final evaluation value J is enhanced (S69: YES), the NNC learning apparatus 1 uses the set of reference vectors and the assigned category information thereof set in (S60) as the optimum parameters. On the other hand, when the final evaluation value J is not enhanced (S69: NO), the NNC learning apparatus 1 does not update the optimum parameters because the processing object set in (S60) is inferior to the current optimum parameters.
The NNC learning apparatus 1 determines whether learning has been terminated (S71). The termination of learning is determined in accordance with a criterion such that the number of repetitions of the aforementioned processing has reached a predetermined number of times, or that the final evaluation value J has not been enhanced (S69: NO) even though the aforementioned processing has been repeated a predetermined number of times or more, for example. When the learning is not terminated (S71: NO), the NNC learning apparatus 1 sets a new processing object in (S60), and executes the steps succeeding (S60) in regard to the new processing object.
As described above, in the first exemplary embodiment, in regard to a processing object, a total value of evaluation values of respective margins of a plurality of sample vectors is calculated as a final evaluation value, and the optimum parameters of an NNC are updated with the processing object when the final evaluation value is enhanced. The evaluation value of each margin is set to a higher value when the nearest neighbor reference vector of the sample vector is assigned to the same category as the sample vector and the distance between the classification boundary and the sample vector is longer. The evaluation value of each margin is set to a lower value when the nearest neighbor reference vector of the sample vector is assigned to a category different from the sample vector and the distance is longer.
Therefore, according to the first exemplary embodiment, it is possible to learn reference vectors of an NNC in accordance with a criterion to maximize margin, and consequently, it is possible to enhance a classification accuracy of the NNC.
In the aforementioned first exemplary embodiment, a distance between a sample vector sj and a reference vector ri having the same number of elements was calculated by using a square distance. In the second exemplary embodiment, the distance between the sample vector sj and the reference vector ri is calculated by weighting the square distance. The NNC learning apparatus 1 in the second exemplary embodiment will be described below by focusing on differences from the first exemplary embodiment. In the following description, the same contents as the first exemplary embodiment will be omitted appropriately.
Initially, a relation between a pattern distribution and a classification boundary in an NNC will be described. When patterns are isotropically distributed based on distribution functions of which only the center locations are different and the rest are the same, the classification boundary forms a plane. Assume, for example, a case in which m-dimensional vectors x of a category CA and a category CB are respectively distributed based on the following two normal distributions pA and pB. In the following equations, Σ represents a variance-covariance matrix, and μ represents a mean vector.
In the equations, when ΣA and ΣB are the same and are the identity matrix I, the category CA and the category CB are represented by isotropic normal distributions. When classification is performed on these distributions, the classification boundary having the least classification error is the boundary on which pA and pB are equal, and is a plane passing through the central point between a mean vector μA and a mean vector μB and orthogonal to the straight line connecting the mean vector μA and the mean vector μB. In other words, an ideal NNC that classifies patterns drawn from the distributions pA and pB is obtained when the reference vector of the category CA is μA and the reference vector of the category CB is μB.
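Making this reasoning explicit (a standard derivation, assuming equal normalization constants $k_A = k_B$): equating the two densities when $\Sigma_A = \Sigma_B = I$ gives

$$p_A(\vec{x}) = p_B(\vec{x}) \;\Longleftrightarrow\; |\vec{x}-\vec{\mu}_A|^2 = |\vec{x}-\vec{\mu}_B|^2 \;\Longleftrightarrow\; (\vec{\mu}_B-\vec{\mu}_A)^{T}\!\left(\vec{x}-\tfrac{1}{2}(\vec{\mu}_A+\vec{\mu}_B)\right) = 0,$$

which is exactly the plane passing through the midpoint of $\vec{\mu}_A$ and $\vec{\mu}_B$ and orthogonal to the straight line connecting them.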
However, when ΣA and ΣB are different from each other, the classification boundary having the least classification error does not always become the plane described above, which passes through the central point between the mean vectors and is orthogonal to the line segment connecting the mean vectors. This is the case for a distribution in which, for example, the respective variances of the category CA and the category CB are ΣA=sAI and ΣB=sBI.
A classification boundary having the least error at this time is a plane in which the distributions pA and pB are equal. Logarithm likelihoods of the distributions pA and pB are represented by the following equations.
$$d_A = -2\log p_A = s_A\,|\vec{x} - \vec{\mu}_A|^2 - 2\log k_A$$
$$d_B = -2\log p_B = s_B\,|\vec{x} - \vec{\mu}_B|^2 - 2\log k_B$$
A classification plane in which pA and pB are equal is a classification plane in which dA and dB are equal, and therefore it is understood that this classification plane is a quadratic hypersurface. The aforementioned first exemplary embodiment employs a square distance, and the classification plane becomes a plane (a plane passing through the central point between a reference vector and a reference vector, and being orthogonal to a straight line connecting the reference vectors). Therefore, it is difficult for the classification plane of the first exemplary embodiment to directly express a quadratic hypersurface. Thus, when a square distance is used, it is desirable to approximately express the quadratic hypersurface by using as many reference vectors as possible and setting a plurality of classification planes. However, this causes deterioration of a classification accuracy of an NNC, an increase of the number of reference vectors necessary for the NNC, and a decrease of processing speed.
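Written out with the quantities defined above, the boundary $d_A = d_B$ satisfies

$$s_A|\vec{x}-\vec{\mu}_A|^2 - s_B|\vec{x}-\vec{\mu}_B|^2 = 2\log\frac{k_A}{k_B},$$

and since $s_A \neq s_B$, the quadratic term $(s_A - s_B)|\vec{x}|^2$ does not cancel, so the boundary is a quadratic hypersurface (in this isotropic case, a hypersphere) rather than a plane.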
In regard to such problems, the NNC may be configured with a smaller number of reference vectors by modifying the distance function. These problems can be resolved by using, for example, the following equation (5) as the distance function. Hereinafter, the following equation (5) will be referred to as a CWP (Compoundly Weighted Power) distance. In this CWP distance, the square distance is weighted by weighting coefficients αi and βi.
$$\alpha_i\,|\vec{s}_j - \vec{r}_i|^2 + \beta_i \qquad (5)$$
An ideal NNC in which the CWP distance is used is a case when there is one reference vector in regard to the category CA and the category CB respectively; a reference vector of the category CA is rA=μA and a distance to the reference vector rA is calculated by using αA=sA and βA=−2 log kA; and a reference vector of the category CB is rB=μB and a distance to the reference vector rB is calculated by using αB=sB and βB=−2 log kB. In other words, the ideal NNC can be configured with two reference vectors by using the CWP distance instead of the square distance as the distance function.
Further, an NNC suitable for classifying patterns whose variances are equal but whose prior probabilities are different can also be configured by using the following equation (6) as a distance function. Hereinafter, the following equation (6) will be referred to as an AWP (Additively Weighted Power) distance. In this AWP distance, the square distance is weighted by a weighting coefficient βi.
$$|\vec{s}_j - \vec{r}_i|^2 + \beta_i \qquad (6)$$
The NNC learning apparatus 1 in the second exemplary embodiment calculates a distance between the sample vector sj and the reference vector ri having the same number of elements by using the CWP distance or the AWP distance.
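A minimal sketch of the two weighted distance functions of equations (5) and (6); the function names are illustrative. Each reference vector r_i carries its own coefficients α_i and β_i, which in this embodiment are set and learned as part of the processing object.

```python
import numpy as np

def cwp_distance(s, r, alpha, beta):
    """CWP (Compoundly Weighted Power) distance of equation (5): alpha * |s - r|^2 + beta."""
    return alpha * np.sum((s - r) ** 2) + beta

def awp_distance(s, r, beta):
    """AWP (Additively Weighted Power) distance of equation (6): |s - r|^2 + beta."""
    return np.sum((s - r) ** 2) + beta
```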
An apparatus configuration of the NNC learning apparatus 1 in the second exemplary embodiment is the same as the first exemplary embodiment as illustrated in
The parameter setting unit 11 further sets a weighting coefficient used for calculating an inter-vector distance as the processing object, in addition to a set of reference vectors, and assigned category information thereof. When the CWP distance (equation (5) described above) is used as the distance function, the parameter setting unit 11 further sets weighting coefficients αi and βi as the processing object. Further, when the AWP distance (equation (6) described above) is used as the distance function, the parameter setting unit 11 further sets the weighting coefficient βi as the processing object. The parameter setting unit 11 sets the weighting coefficients αi and βi for each reference vector respectively.
The specifying unit 13 calculates a distance between the sample vector sj and the reference vector ri by using the CWP distance or the AWP distance including the weighting coefficient set in the parameter setting unit 11. Further, a classification boundary is formed as a quadratic hypersurface in the CWP distance, and therefore the specifying unit 13 calculates, for example, a distance m (sj) between the sample vector sj and the classification boundary by using the following equations. In the following equations, the case of “d(sj,rw)<d(sj,rb)” indicates a case in which the sample vector is closer to an IN-NN reference vector than to an EX-NN reference vector, and “other cases” indicate a case in which the sample vector is closer to the EX-NN reference vector than to the IN-NN reference vector.
When the AWP distance is used, the specifying unit 13 calculates, for example, a distance m(sj) between the sample vector sj and the classification boundary by using the following equation.
The optimum parameter holding unit 16 holds the aforementioned weighting coefficients together with a set of reference vectors and assigned category information thereof as optimum parameters of the NNC.
The updating unit 15 also reflects the weighting coefficients set as the processing object in the optimum parameter holding unit 16 when updating the optimum parameter holding unit 16.
A learning method for the classifier in the second exemplary embodiment will be described below with reference to
In (S60), the NNC learning apparatus 1 further sets a weighting coefficient used for a distance function, as a processing object. In (S62), the NNC learning apparatus 1 respectively calculates a distance d(s1,ri) between a sample vector s1 and each of reference vectors ri by using the CWP distance or the AWP distance. In (S70), the NNC learning apparatus 1 updates the optimum parameters with the weighting coefficient, in addition to the set of reference vectors and the assigned category information thereof, as the processing object set in (S60).
Thus, in the second exemplary embodiment, the AWP distance or the CWP distance that weights the square distance with the weighting coefficient is used so as to calculate the distance between the sample vector s1 and each of reference vectors ri. Then, in the second exemplary embodiment, the weighting coefficient used for the distance function is also learned and optimized, in addition to the set of reference vectors and assigned category information thereof, that are parameters of the NNC.
Thus, according to the second exemplary embodiment, learning of reference vectors of an NNC becomes possible for various pattern distributions. In other words, a classification accuracy of the NNC can be enhanced for various pattern distributions according to the second exemplary embodiment. Further, according to the second exemplary embodiment, an NNC can be configured with fewer reference vectors by calculating an inter-vector distance using the CWP distance or the AWP distance.
A pattern classification problem that detects an object such as a face will be taken as an example. When this problem is handled by a pattern classifier, pattern classification is processed as a two-class classification problem of separating an object (e.g., a face) that is a detection target from a non-object (e.g., a background) that is not a detection target. In other words, the pattern classifier determines whether input data belongs to the object class or the non-object class. As typical measures of classification accuracy in object detection, there are a detection failure rate and an excessive detection rate. The detection failure rate is an error rate of failing to detect an object that should be detected, and the excessive detection rate is an error rate of detecting a non-object that should not be detected. In general, the detection failure rate and the excessive detection rate are in a trade-off relation. When the detection failure rate is adjusted to be smaller, the excessive detection rate increases, and conversely, when the excessive detection rate is adjusted to be smaller, the detection failure rate increases. In object detection, there are cases in which the excessive detection rate is set at a certain value and the detection failure rate is minimized, cases in which the detection failure rate is set at a certain value and the excessive detection rate is minimized, cases in which the excessive detection rate and the detection failure rate are made equal, and the like.
The NNC learning apparatus 1 in the third exemplary embodiment learns reference vectors of an NNC so as to bring classification accuracy such as detection failure rate, excessive detection rate, or the like close to a specified value. The NNC learning apparatus 1 in the third exemplary embodiment will be described below by focusing on contents different from the first exemplary embodiment and the second exemplary embodiment. In the following description, the same contents as the first exemplary embodiment and the second exemplary embodiment will be omitted appropriately.
On the basis of the assigned category information of the nearest neighbor reference vector for each sample vector sj and the assigned category information of each sample vector sj, the correction unit 21 calculates a classification accuracy of a processing object set by the parameter setting unit 11 in regard to the sample vectors sj held in the learning sample holding unit 12. The correction unit 21 corrects a final evaluation value of the processing object corresponding to the sample vectors sj, by using a correction value corresponding to the calculated classification accuracy and specified classification accuracy information. The specified classification accuracy information may be input by a user operating an input unit or the like while referring to an input screen or the like, or may be acquired via the input/output I/F 4 from a portable recording medium, or from another computer or the like. The specified classification accuracy information indicates a desired classification accuracy, for example, a desired detection failure rate, a desired excessive detection rate, or a request for causing the detection failure rate and the excessive detection rate to be equal.
A specific example of the correction unit 21 will be described below. However, the correction unit 21 is not limited only to the specific example as described below. In the following, an embodiment in which an NNC, which is the learning object of the NNC learning apparatus 1, is used for pattern recognition in object detection, will be exemplified.
In learning of the NNC for object detection, a plurality of sample vectors, each assigned to either the object class or the non-object class, are used. A classification error at that time is of two types: detection failure and excessive detection. Detection failure means a classification error in which a sample vector of the object class is classified as the non-object class. Therefore, when the nearest neighbor reference vector of a sample vector of the object class is assigned to the non-object class, the sample vector corresponds to detection failure. On the other hand, excessive detection means a classification error in which a sample vector of the non-object class is classified as the object class. Therefore, when the nearest neighbor reference vector of a sample vector of the non-object class is assigned to the object class, the sample vector corresponds to excessive detection.
When a desired detection failure rate is specified as the classification accuracy information, the correction unit 21 calculates, as a detection failure rate Eobj, the ratio of the number of sample vectors corresponding to detection failure to the number of sample vectors held in the learning sample holding unit 12. When a desired excessive detection rate is specified as the classification accuracy information, the correction unit 21 calculates, as an excessive detection rate Ebg, the ratio of the number of sample vectors corresponding to excessive detection to the number of sample vectors held in the learning sample holding unit 12. When the calculation unit 14 calculates a margin m(sj) by using equation (2) described above, the correction unit 21 can determine whether the sample vector sj corresponds to detection failure or excessive detection, or to classification success, depending on whether the value of the margin m(sj) is negative or positive.
When a final evaluation value J is calculated by the calculation unit 14 as in each of the aforementioned exemplary embodiments, the correction unit 21 corrects the final evaluation value J based on a correction value corresponding to the classification accuracy calculated as described above and the specified classification accuracy information. This correction can be represented by the following equations corresponding to the specified classification accuracy information. The following equation (7) is used when the specified classification accuracy information indicates a desired detection failure rate e. The following equation (8) is used when the specified classification accuracy information indicates a desired excessive detection rate e. The following equation (9) is used when the specified classification accuracy information indicates a request for causing the detection failure rate and the excessive detection rate to be equal. The coefficient λ in each of the following equations is a negative value and is set beforehand so that its absolute value is sufficiently large compared with the value of J.
$$J' = J + \lambda\,(E_{obj} - e)^2 \qquad (7)$$
$$J' = J + \lambda\,(E_{bg} - e)^2 \qquad (8)$$
$$J' = J + \lambda\,(E_{obj} - E_{bg})^2 \qquad (9)$$
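A minimal sketch of this correction, assuming the per-sample margins m(sj) have already been computed with the sign convention above (a negative margin means the nearest reference vector belongs to a different category); the error-rate computation and all names are illustrative assumptions.

```python
import numpy as np

def classification_error_rates(margins, labels, object_class):
    """Detection failure rate E_obj and excessive detection rate E_bg.

    margins: per-sample margins m(s_j); negative means a classification error.
    labels:  per-sample assigned categories.
    Both rates are taken relative to the total number of sample vectors.
    """
    margins = np.asarray(margins)
    is_object = np.asarray(labels) == object_class
    n = len(margins)
    e_obj = np.sum((margins < 0) & is_object) / n    # object samples classified as non-object
    e_bg = np.sum((margins < 0) & ~is_object) / n    # non-object samples classified as object
    return e_obj, e_bg

def corrected_evaluation(J, e_obj, e_bg, lam, target=None, mode="equal"):
    """Apply equation (7), (8), or (9); lam is a negative constant with |lam| much larger than |J|."""
    if mode == "failure":      # equation (7): bring E_obj close to the specified value e
        return J + lam * (e_obj - target) ** 2
    if mode == "excessive":    # equation (8): bring E_bg close to the specified value e
        return J + lam * (e_bg - target) ** 2
    return J + lam * (e_obj - e_bg) ** 2   # equation (9): make the two rates equal
```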
A classifier learning method in the third exemplary embodiment will be described below with reference to
The NNC learning apparatus 1 detects a classification error by processing in (S63), (S64), and (S65). As the classification error, at least either one of detection failure or excessive detection may be detected as mentioned above. Or, as the classification error, it may be detected that a nearest neighbor reference vector of the sample vector sj is an EX-NN reference vector.
When a total value of evaluation values g(m(sj)) with respect to all the sample vectors sj has been calculated as the final evaluation value J (S67: NO), the NNC learning apparatus 1 calculates a classification accuracy of the processing object in regard to the sample vectors sj based on the specified classification accuracy information (S81). For example, the NNC learning apparatus 1 calculates, as the classification accuracy, a ratio of the number of sample vectors sj whose nearest neighbor reference vector is an EX-NN reference vector to the number of all the sample vectors sj. As described above, the NNC learning apparatus 1 may calculate the detection failure rate and the excessive detection rate.
The NNC learning apparatus 1 corrects the final evaluation value J by using a correction value corresponding to the classification accuracy calculated in (S81) and specified classification accuracy information (S82). The NNC learning apparatus 1 determines whether a final evaluation value is enhanced based on the corrected final evaluation value (S69).
Thus, in the third exemplary embodiment, a classification accuracy of a processing object is calculated in regard to a learning sample vector, and a final evaluation value of the processing object is corrected by using a correction value corresponding to this classification accuracy and specified classification accuracy information. That is, in the third exemplary embodiment, a set of reference vectors and assigned category information thereof are updated so that a classification accuracy of an NNC approaches a specified value. Therefore, according to the third exemplary embodiment, a classification accuracy such as a detection failure rate, an excessive detection rate, and the like can be controlled to become a desired value.
In each of the aforementioned exemplary embodiments, examples using a square distance, a CWP distance, and an AWP distance as the distance function are illustrated. However, a distance function other than these may be used. In the case of classifying a pattern having a distribution that is not isotropic, for example, the following equation (10) may be used as the distance function. Hereinafter, the following equation (10) will be referred to as an anisotropic weighting distance. In the following equation, Σ represents a variance-covariance matrix of the sample vector sj and the reference vector ri.
$$(\vec{s}_j - \vec{r}_i)^{T}\,\Sigma^{-1}\,(\vec{s}_j - \vec{r}_i) \qquad (10)$$
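A minimal sketch of the anisotropic weighting distance of equation (10); the covariance matrix passed in is a placeholder for whatever estimate of Σ is used.

```python
import numpy as np

def anisotropic_distance(s, r, cov):
    """Anisotropic weighting distance of equation (10): (s - r)^T Σ^{-1} (s - r)."""
    diff = s - r
    # Solve cov * y = diff instead of forming the inverse explicitly
    return float(diff @ np.linalg.solve(cov, diff))
```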
When the anisotropic weighting distance is used, a classification boundary also becomes a quadratic hypersurface, and a distance between the sample vector sj and the quadratic hypersurface can be calculated by using a method described in the following reference document.
In the aforementioned exemplary embodiments, the NNC learning apparatus 1 includes the parameter setting unit 11, the learning sample holding unit 12, and the optimum parameter holding unit 16. However, an apparatus other than the NNC learning apparatus 1 may include the parameter setting unit 11, the learning sample holding unit 12, and the optimum parameter holding unit 16. In this case, the NNC learning apparatus 1 may access the learning sample holding unit 12 and the optimum parameter holding unit 16 via the other apparatus and acquire a processing object from the other apparatus.
In the plurality of flowcharts used in the above description, a plurality of steps (processing) are described in order, but the execution order of the steps executed in the present exemplary embodiment is not limited to the described order. In the present exemplary embodiment, the order of the illustrated steps can be changed as long as the change does not affect the contents. Further, the aforementioned exemplary embodiments and the modified example can be combined in any way as long as their contents do not conflict.
A part or the whole of the aforementioned exemplary embodiments and the modified example can be specified also as in the following supplemental notes. However, the exemplary embodiments and the modified example are not limited to the following description.
(Supplemental Note 1)
A classifier learning apparatus including:
an object acquisition unit that acquires a set of reference vectors and assigned category information of the respective reference vectors as a processing object;
a specifying unit that specifies an internal nearest neighbor reference vector nearest to a sample vector among the reference vectors of the processing object assigned to the same category as the sample vector and specifies an external nearest neighbor reference vector nearest to the sample vector among the reference vectors of the processing object assigned to a category different from that of the sample vector;
a calculation unit that calculates an evaluation value of the processing object using a distance between a classification boundary formed by the internal nearest neighbor reference vector and the external nearest neighbor reference vector and the sample vector; and an updating unit that updates an original set of reference vectors and original assigned category information with the processing object based on the evaluation value of the processing object calculated by the calculation unit.
(Supplemental Note 2)
The classifier learning apparatus according to Supplemental Note 1, wherein
the calculation unit calculates an evaluation value of the processing object so that a lower evaluation is indicated with an increase in the distance when the sample vector is closer to the external nearest neighbor reference vector than the internal nearest neighbor reference vector and a higher evaluation is indicated with an increase in the distance when the sample vector is closer to the internal nearest neighbor reference vector than the external nearest neighbor reference vector.
(Supplemental Note 3)
The classifier learning apparatus according to Supplemental Note 2, wherein
the calculation unit calculates the distance as a negative value when the sample vector is closer to the external nearest neighbor reference vector than the internal nearest neighbor reference vector, calculates the distance as a positive value when the sample vector is closer to the internal nearest neighbor reference vector than the external nearest neighbor reference vector, and calculates an evaluation value of the processing object based on an output value of a sigmoid function using the calculated distance as an input.
(Supplemental Note 4)
The classifier learning apparatus according to any one of Supplemental Notes 1 to 3, wherein
the specifying unit specifies the internal nearest neighbor reference vector and the external nearest neighbor reference vector for each of a plurality of sample vectors,
the calculation unit calculates a total value of evaluation values respectively calculated for each of the plurality of sample vectors, and
the updating unit compares a total value of evaluation values calculated by the calculation unit for the original set of reference vectors and the original assigned category information, and a total value of evaluation values calculated by the calculation unit for the processing object, and determines whether or not to perform updating with the processing object, based on the comparison result.
(Supplemental Note 5)
The classifier learning apparatus according to Supplemental Note 4, wherein
the calculation unit includes
a correction unit that calculates a classification accuracy of the processing object with respect to the plurality of sample vectors based on assigned category information of a nearest neighbor reference vector for each of the plurality of sample vectors and assigned category information of the each sample vector and corrects a total value of evaluation values of the processing object corresponding to the plurality of sample vectors with a correction value corresponding to the calculated classification accuracy and specified classification accuracy information.
(Supplemental Note 6)
The classifier learning apparatus according to any one of Supplemental Notes 1 to 5, wherein
the specifying unit calculates a distance between the sample vector and the reference vector using any one of equation (1), equation (5), equation (6), and equation (10) described above.
(Supplemental Note 7)
The classifier learning apparatus according to Supplemental Note 6, wherein
the object acquisition unit further acquires the weighting coefficient as the processing object,
the specifying unit calculates a distance between the sample vector and the reference vector using equation (5) or equation (6) including the weighting coefficient acquired by the object acquisition unit, and
the updating unit further updates an original weighting coefficient with the weighting coefficient acquired as the processing object.
(Supplemental Note 8)
A classifier learning method executed by at least one computer, the method including:
acquiring a set of reference vectors and assigned category information of the respective reference vectors as a processing object;
specifying an internal nearest neighbor reference vector nearest to a sample vector among the reference vectors of the processing object assigned to the same category as the sample vector;
specifying an external nearest neighbor reference vector nearest to the sample vector among the reference vectors of the processing object assigned to a category different from that of the sample vector;
calculating an evaluation value of the processing object using a distance between the sample vector and a classification boundary formed by the internal nearest neighbor reference vector and the external nearest neighbor reference vector; and
updating an original set of reference vectors and original assigned category information with the processing object based on the calculated evaluation value of the processing object.
(Supplemental Note 9)
The classifier learning method according to Supplemental Note 8, wherein
calculating the evaluation value of the processing object so that a lower evaluation is indicated with an increase in the distance when the sample vector is closer to the external nearest neighbor reference vector than the internal nearest neighbor reference vector, and a higher evaluation is indicated with an increase in the distance when the sample vector is closer to the internal nearest neighbor reference vector than the external nearest neighbor reference vector.
(Supplemental Note 10)
The classifier learning method according to Supplemental Note 9, wherein
in order to calculate the evaluation value, calculating the distance as a negative value when the sample vector is closer to the external nearest neighbor reference vector than the internal nearest neighbor reference vector and calculating the distance as a positive value when the sample vector is closer to the internal nearest neighbor reference vector than the external nearest neighbor reference vector; and
calculating an evaluation value of the processing object based on an output value of a sigmoid function using the calculated distance as an input.
(Supplemental Note 11)
The classifier learning method according to any one of Supplemental Notes 8 to 10, wherein
specifying the internal nearest neighbor reference vector for each of a plurality of sample vectors in order to specify the internal nearest neighbor reference vector;
specifying the external nearest neighbor reference vector for each of a plurality of sample vectors in order to specify the external nearest neighbor reference vector;
calculating a total value of evaluation values respectively calculated for each of the plurality of sample vectors in order to calculate the evaluation value; and
comparing a total value of evaluation values calculated for the original set of reference vectors and the original assigned category information and a total value of evaluation values calculated for the processing object, and determining whether or not to perform updating with the processing object, based on the comparison result, in order to perform the update.
(Supplemental Note 12)
The classifier learning method according to Supplemental Note 11, further including:
calculating a classification accuracy of the processing object with respect to the plurality of sample vectors based on assigned category information of a nearest neighbor reference vector for each of the plurality of sample vectors and assigned category information of the each sample vector; and
correcting a total value of evaluation values of the processing object corresponding to the plurality of sample vectors using a correction value corresponding to the calculated classification accuracy and specified classification accuracy information.
(Supplemental Note 13)
The classifier learning method according to any one of Supplemental Notes 8 to 12, further including
calculating a distance between the sample vector and the reference vector using any one of equation (1), equation (5), equation (6), and equation (10) described above, wherein
in the equations, a vector sj represents the sample vector, a vector ri represents the reference vector, αi and βi represent weighting coefficients corresponding to the reference vector ri, and Σ represents a variance-covariance matrix.
(Supplemental Note 14)
The classifier learning method according to Supplemental Note 13, further including:
acquiring the weighting coefficient as the processing object; and
updating an original coefficient with the weighting coefficient acquired as the processing object based on the calculated evaluation value of the processing object, wherein
a distance between the sample vector and the reference vector is calculated using equation (5) or equation (6) including the weighting coefficient, respectively.
(Supplemental Note 15)
A computer program causing at least one computer to execute the classifier learning method according to any one of Supplemental Notes 8 to 14.
(Supplemental Note 16)
A computer-readable recording medium recorded with the program according to Supplemental Note 15.
This application is based upon and claims the benefit of priority from Japanese patent application No. 2013-013674, filed on Jan. 28, 2013, the disclosure of which is incorporated herein in its entirety by reference.
| Number | Date | Country | Kind |
| --- | --- | --- | --- |
| 2013-013674 | Jan 2013 | JP | national |

| Filing Document | Filing Date | Country | Kind |
| --- | --- | --- | --- |
| PCT/JP2013/072665 | 8/26/2013 | WO | 00 |