1. Field of the Invention
The present invention relates to a learning method for a support vector machine, and more particularly to a learning method for a support vector machine that uses a large set of training data.
2. Description of the Related Art
The principal process in the learning of a support vector machine (hereinafter, SVM) is to solve the quadratic programming problem (hereinafter, QP problem) given in the following equation (1) when a set of training data xi (here, i=1, 2, . . . , l), each having a label yi∈{−1, +1}, is provided.
where K(xi, xj) represents a kernel function that calculates the dot product between the two vectors xi and xj in a certain feature space, and C represents a parameter that imposes a penalty on those training data into which noise has entered.
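Equation (1) itself does not appear in this text. For reference only, the standard soft-margin SVM dual consistent with the symbols described above reads, in one common formulation (a sketch, not necessarily the patent's exact equation):

\[
\max_{\alpha}\;\sum_{i=1}^{l}\alpha_i-\frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j\,y_i y_j\,K(x_i,x_j)
\quad\text{subject to}\quad\sum_{i=1}^{l}\alpha_i y_i=0,\qquad 0\le\alpha_i\le C\;(i=1,\dots,l),
\]

where the coefficients αi are the variables of the QP problem and the support vectors are the training data with αi>0.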
Conventional SVM learning methods include the decomposition algorithm, the SMO (Sequential Minimal Optimization) algorithm, and the CoreSVM.
The decomposition algorithm is a method in which at the time of the SVM learning, an initial QP problem is decomposed into a plurality of small QP problems, and these small problems are repeatedly optimized. This method is mentioned in Non-Patent Documents 1 and 2 given below.
The SMO algorithm is a method in which, in order to solve the QP problem, two pieces of training data are selected at a time and their coefficients are analytically updated. This method is mentioned in Non-Patent Documents 3 and 4 given below.
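For background only (this concerns the related art, not the method of the present invention), the core of the SMO update can be sketched in Python as follows. Here E1 and E2 denote the prediction errors f(xi)−yi of the two selected points and K11, K12, K22 the corresponding kernel values; all names are assumptions of this sketch rather than notation taken from the cited documents.

def smo_pair_update(alpha1, alpha2, y1, y2, E1, E2, K11, K12, K22, C):
    # Curvature of the objective along the feasible direction for this pair.
    eta = K11 + K22 - 2.0 * K12
    if eta <= 0.0:
        # Degenerate case; a complete implementation treats it separately.
        return alpha1, alpha2
    # Unclipped analytic update of the second coefficient.
    a2 = alpha2 + y2 * (E1 - E2) / eta
    # Box constraints 0 <= alpha <= C restrict the pair to a feasible segment.
    if y1 == y2:
        lo, hi = max(0.0, alpha1 + alpha2 - C), min(C, alpha1 + alpha2)
    else:
        lo, hi = max(0.0, alpha2 - alpha1), min(C, C + alpha2 - alpha1)
    a2 = min(max(a2, lo), hi)
    # The first coefficient moves so that y1*alpha1 + y2*alpha2 is preserved.
    a1 = alpha1 + y1 * y2 * (alpha2 - a2)
    return a1, a2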
Further, the CoreSVM is an SVM variant in which random sampling is used. The CoreSVM is a method in which the QP problem is converted into a minimum enclosing ball (MEB) problem in computational geometry, and a solution of the QP problem is obtained by solving the MEB problem. This method is mentioned in Non-Patent Documents 5 and 6 given below.
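Likewise for background only, the MEB subproblem mentioned above can be approximated with the simple iterative scheme commonly used in the CoreSVM literature; the following sketch illustrates only this geometric subproblem, not the conversion from the QP problem or the full CoreSVM algorithm.

import numpy as np

def approx_meb_center(points, eps=0.1):
    # points: (n, d) array; returns an approximate center of the minimum enclosing ball.
    c = points[0].astype(float)
    for i in range(1, int(np.ceil(1.0 / eps ** 2)) + 1):
        # Step toward the point currently furthest from the tentative center.
        far = points[np.argmax(np.linalg.norm(points - c, axis=1))]
        c = c + (far - c) / (i + 1)
    return c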
Non-Patent Document 1: E. Osuna, R. Freund, and F. Girosi, “An improved training algorithm for support vector machines,” in Neural Networks for Signal Processing VII—Proceedings of the 1997 IEEE Workshop, N. M. J. Principe, L. Gile and E. Wilson, Eds., New York, pp. 276-285, 1997.
Non-Patent Document 2: T. Joachims, “Making large-scale support vector machine learning practical,” in Advances in Kernel Methods: Support Vector Machines, A. S. B. Scholkopf, C. Burges, Ed., MIT Press, Cambridge, Mass., 1998.
Non-Patent Document 3: J. Platt, “Fast training of support vector machines using sequential minimal optimization,” in Advances in Kernel Methods—Support Vector Learning, B. Scholkopf, C. J. C. Burges, and A. J. Smola, Eds., Cambridge, Mass.: MIT Press, 1999.
Non-Patent Document 4: R. Fan, P. Chen, and C. Lin, “Working Set Selection Using Second Order Information for Training Support Vector Machines,” J. Mach. Learn. Res. 6, 1889-1918, 2005.
Non-Patent Document 5: I. W. Tsang, J. T. Kwok, and P. M. Cheung, “Core vector machines: Fast SVM training on very large datasets,” in J. Mach. Learn. Res., vol. 6, pp. 363-392, 2005.
Non-Patent Document 6: I. W. Tsang, A. Kocsor, and J. T. Kwok, “Simpler core vector machines with enclosing balls,” Proceedings of the Twenty-Fourth International Conference on Machine Learning (ICML), pp. 911-918, Corvallis, Oreg., USA, June 2007.
In the decomposition algorithm and the SMO algorithm, all of the training data must be taken into consideration in order to optimize the SVM, which causes the following problems: learning with all of the training data after the decomposition is time-consuming, and in particular, when a large amount of the training data consists of non-support vectors, the efficiency is very poor. In the CoreSVM, the training data are subjected to random sampling, and as a result, the learning effect becomes unstable unless a stopping condition is appropriately set.
An object of the present invention is to provide a learning method for an SVM capable of speeding up learning while maintaining the accuracy of the SVM.
In order to achieve the object, a first feature of the present invention is that a learning method for a support vector machine (hereinafter, SVM) comprises a step of selecting two training vectors from two opposite classes to learn an SVM, a step of arbitrarily selecting a plurality of unused training vectors from a set of previously prepared training vectors to extract an unused training vector having a largest error amount, a step of adding the extracted unused training vector to an already used training vector to update the training vector, a step of learning the SVM by using the updated training vector, and a step of stopping the learning when the number of updated training vectors is equal to or more than a predetermined number or when an error amount of the extracted unused training vector is smaller than a predetermined value.
A second feature of the present invention is that a learning method for an SVM, performed after the learning of the SVM according to the first feature, comprises a step of arbitrarily selecting one training vector from a set of previously prepared training vectors, a step of adding the training vector to an already used training vector to update the training vector when an error amount of the selected training vector is larger than a predetermined value, a step of learning the SVM by using the updated training vector, and a step of stopping the learning when the number of unused training vectors is smaller than a previously determined number.
According to the present invention, SVM learning is possible by using training vectors having a large error amount, and thus, the SVM can be effectively learned and the learning can be speeded up. Also, the learning is stopped when the error amount in the training vector is smaller than the previously set value or when the number of unused training vectors is smaller than a certain value, and thus, the stopping condition of the learning can be appropriately set and the learning effect can be stabilized.
The present invention provides a two-stage learning method for expanding and updating training data. The present invention is characterized in that, in a first stage (first phase), an approximate solution is found as quickly as possible, while in a second stage (second phase), the remaining training data (vectors) are examined one by one, either all of them or until a previously determined number "n" of them remains, and the solution is refined accordingly. This will be described in the following embodiment.
At step S105, a solution S0 is derived by learning the SVM with the help of the initial training vector set W0. At step S110, a set T0 of unused training vectors is derived, where t, representing a repeat count, is set to t=0 and T represents the set of all training vectors. The set T0 of unused training vectors is obtained by removing W0 from T; as a result, T0=T−W0.
At step S115, it is determined whether the number of unused training vectors |Tt| has reached 0 or the number of used training vectors |Wt| has become larger than a previously determined number "m". It is noted that the symbol "| |" represents the number of elements in a set. When this determination is positive, the first phase is stopped; when it is negative, the process proceeds to step S120. At step S120, 59 training vectors are randomly sampled from the set Tt of unused training vectors. It is noted that the random sampling may be performed with any number of vectors, not only 59.
At step S125, the training vector vt having the largest error amount Et(vk) is selected from among the 59 sampled training vectors. In this case, the training vector vt can be derived by the following equations (2) and (3):
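Equations (2) and (3) themselves are not reproduced in this text. Consistent with the description above, the selection at step S125 amounts to taking, from the randomly sampled subset of (in this example) 59 vectors, the vector whose error amount under the current solution St is largest; as a notational sketch only (the symbol Vt for the sampled subset is introduced here and is not the patent's notation):

\[
v_t=\arg\max_{v_k\in V_t} E_t(v_k),
\]

where Vt⊂Tt denotes the sampled subset and Et(vk) denotes the error amount of vk evaluated with the current solution St.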
At step S130, it is determined whether the error amount Et(vt) is smaller than a certain setting value ε. When this determination is positive, the first phase is stopped; when it is negative, the process proceeds to step S135. At step S135, the training vector vt is added to the used training vector set Wt to obtain Wt+1; on the other hand, the training vector vt is removed from the unused training vector set Tt, so that Tt+1=Tt−vt. Subsequently, the process proceeds to step S140, at which the SVM is learned by using the training vector set Wt+1 so as to obtain a solution St+1. Thereafter, although not shown, depending on the case, non-support vectors are removed based on the parameters α obtained from St+1. At step S145, the repeat count t is incremented by one. The process then returns to step S115 to repeat the aforementioned processing.
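To make the flow of the first phase concrete, the following Python sketch re-implements steps S105 through S145 under stated assumptions: scikit-learn's SVC (an SMO-type solver) stands in for the SVM learner, and error_amount() uses a margin-violation measure as a stand-in for the error amount Et, whose exact definition (equations (2) and (3)) is not reproduced here. All names and parameter values are illustrative, not the patent's.

import numpy as np
from sklearn.svm import SVC

def error_amount(model, x, y):
    # Assumed stand-in for Et(v): how strongly (x, y) violates the current margin.
    return 1.0 - y * model.decision_function(x.reshape(1, -1))[0]

def phase1(X, Y, m=1000, eps=1e-3, sample_size=59, rng=None):
    rng = rng or np.random.default_rng(0)
    # Step S105: start from one training vector of each class (W0) and learn an initial SVM.
    used = [int(np.flatnonzero(Y == +1)[0]), int(np.flatnonzero(Y == -1)[0])]
    unused = [i for i in range(len(Y)) if i not in used]            # S110: T0 = T - W0
    model = SVC(C=1.0, kernel="rbf").fit(X[used], Y[used])
    while unused and len(used) <= m:                                # S115: stopping test
        sample = rng.choice(unused, size=min(sample_size, len(unused)), replace=False)
        errs = [error_amount(model, X[i], Y[i]) for i in sample]    # S120, S125
        best = int(sample[int(np.argmax(errs))])
        if max(errs) < eps:                                         # S130: stopping test
            break
        used.append(best)                                           # S135: Wt+1 = Wt + vt
        unused.remove(best)                                         #        Tt+1 = Tt - vt
        model = SVC(C=1.0, kernel="rbf").fit(X[used], Y[used])      # S140: relearn the SVM
        # S145: the repeat count advances implicitly with each loop iteration.
    return model, used, unused

Called as phase1(X, Y) on a labeled data set (features X, labels Y in {−1, +1}), the sketch returns the current model together with the used and unused index sets, which feed directly into the second phase described below.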
As is obvious from the aforementioned description, in the first phase, the processes from step S115 to step S145 are repeated until the determination at step S115 or step S130 becomes positive; at that point, the first phase is stopped and the process moves to the second phase.
As described above, in the first phase, the best vector with respect to learning, i.e., the training vector vt having the largest error amount, is selected from among the randomly sampled training vectors (59 vectors in the above example); the training vector vt is added to the already used training vector set Wt so as to update it to the training vector set Wt+1; and the updated training vector set Wt+1 is used to learn the SVM. Thus, an approximate solution of the SVM can be derived promptly.
Further, when the error amount is smaller than the setting value ε, the first phase is stopped. Thus, unnecessary SVM learning, namely learning performed with a training vector whose error amount is smaller than the setting value ε, can be avoided, and the learning can be speeded up.
Subsequently, the process of the second phase will be described with reference to the accompanying flowchart.
Initially, the determination at step S205 is negative, and thus the process proceeds to step S210. At step S210, one training vector v is randomly selected from among the unused training vectors Tt. At step S215, the training vector v is removed from the unused training vector set Tt. At step S220, it is determined whether the error amount Et(v) of the training vector v is larger than a certain value ε. When the error amount of the training vector v is equal to or less than ε, the determination at step S220 is negative. After t is incremented by one at step S235, the process returns to step S205, at which it is determined whether the number of unused training vectors |Tt| has become equal to or less than the setting value n.
On the other hand, when the error amount Et(v) is larger than ε, the process proceeds to step S225. At step S225, the training vector v is added to the already used training vector set Wt, and the training vector set is updated to Wt+1. At step S230, SVM learning is performed by using the updated training vector set Wt+1 so that a solution St+1 is derived. Subsequently, t is incremented by one at step S235, and the process returns to step S205. Thereafter, the procedure from step S205 to step S235 mentioned previously is repeated, and when the determination at step S205 becomes positive, the second phase is stopped.
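Continuing the sketch given for the first phase (and reusing its error_amount() helper and SVC-based relearning, both of which are assumptions of the sketch rather than the patent's implementation), the second phase of steps S205 through S235 can be outlined as follows.

def phase2(model, X, Y, used, unused, n=0, eps=1e-3, rng=None):
    # Assumes error_amount(), SVC, and the index lists produced by the phase-1 sketch above.
    rng = rng or np.random.default_rng(1)
    while len(unused) > n:                                     # S205: stop once |Tt| <= n
        v = int(rng.choice(unused))                            # S210: pick one unused vector at random
        unused.remove(v)                                       # S215: remove it from Tt
        if error_amount(model, X[v], Y[v]) > eps:              # S220: keep only vectors that still violate
            used.append(v)                                     # S225: Wt+1 = Wt + v
            model = SVC(C=1.0, kernel="rbf").fit(X[used], Y[used])  # S230: relearn the SVM
        # S235: the repeat count advances implicitly with each loop iteration.
    return model, used

Taking the model and index sets returned by phase1(X, Y), a call such as phase2(model, X, Y, used, unused) refines the solution while examining each remaining training vector at most once.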
As is obvious from the aforementioned description, in the second phase, learning is performed by using only training vectors having an error amount larger than the value ε; thus, the accuracy of the SVM is maintained or improved, and, by the process at step S205, the stopping condition of the second phase can be set appropriately.
Also, although the SMO algorithm is used for the learning processes at steps S105, S140, and S230, the learning efficiency improves greatly because the training vector set Wt is much smaller than the set T of all training data.
Subsequently, learning results obtained by using "web," "zero-one," and "KDD-CUP," which are well-known evaluation reference data sets, are shown in the accompanying drawings.
This application claims priority from Japanese Patent Application No. 2008-057922, filed in Japan in March 2008.