1. Field of the Invention
The present invention relates to a re-learning method for a support vector machine, and more particularly, to a re-learning method for a support vector machine capable of improving classification performance while reducing the amount of computation.
2. Description of the Related Art
For systems that search or manage video archives, a shot boundary detection function that detects, from an existing video file, shot boundaries introduced during editing is essential. Therefore, a support vector machine (hereinafter referred to as SVM) is applied so as to realize a high-performance shot boundary detector.
In Patent Document 1 described below, a feature extraction method for detecting a shot boundary is disclosed. As specified in Patent Document 1, the obtained feature amount is classified by a pattern recognition device such as an SVM. The precondition of the SVM is that previously prepared training samples are used for learning so as to construct an SVM for classification. Patent Document 2 discloses an invention relating to a data classifier in which a support vector machine classifies data based on a learning result obtained by an active learning method.
Moreover, there is also a related art called semi-supervised learning. In semi-supervised learning, a learning machine constructed from a set of samples with known labels is used to extract, from a set of unlabeled samples, those samples close to a labeled instance; on the assumption that this extraction is largely successful, further learning (called "re-learning") is performed with the aim of improving the performance of the classifier. A technique extending this approach to the SVM is described in Non-Patent Document 1.
Patent Document 1: Japanese Published Unexamined Patent Application No. 2007-142633
Patent Document 2: Japanese Published Unexamined Patent Application No. 2004-21590
Non-Patent Document 1: Operations Research Society of Japan, "Semi-Supervised Learning based on SVM," Abstracts, the 2005 Fall Research Presentation Forum of the Operations Research Society of Japan, Vol. 2005 (20050914), pp. 32-33
The classification performance might be improved if the technologies described in Patent Document 1 and Non-Patent Document 1 were combined, i.e., if semi-supervised learning were applied to the classifier (SVM) for shot detection. However, in normal semi-supervised learning, the labels of the samples to be added for re-learning are often wrong, because they are imparted by the classifier before re-learning. When samples including those with wrongly attached labels are learned, the performance after re-learning is not sufficiently improved.
Moreover, in the technique presented in Non-Patent Document 1, the number of added samples is enormous, which makes the re-learning very difficult.
An object of the present invention is to provide a re-learning method for a support vector machine capable of improving the accuracy of an SVM and reducing the amount of calculation by re-learning with a small number of high-quality samples.
In order to achieve the object, a first feature of the present invention is that a re-learning method for a support vector machine comprises a step of learning an SVM by using a set of training samples for initial learning which have known labels, a step of perturbation-processing the training samples for initial learning, a step of using the perturbation-processed samples as training samples for addition, and a step of re-learning the learned SVM by using the training samples for initial learning and the training samples for addition.
A second feature is that the training samples for initial learning to be perturbation-processed are those remaining after removing the training samples for initial learning that correspond to non-support vectors.
A third feature is that the training sample for initial learning to be perturbation-processed is a training sample corresponding to a support vector existing on a soft margin hyperplane.
A fourth feature is that the training sample for initial learning to be perturbation-processed is a training sample corresponding to a support vector existing on a soft margin hyperplane and having an inferior determination performance, as evaluated by the conditional probability that a support vector on the soft margin hyperplane belongs to another class, using a logistic function derived by maximum likelihood estimation.
In the perturbation learning according to the present invention, the training samples having a new feature amount are generated by making use of the fact that the position of the shot boundary does not change even if an image process such as luminance conversion is performed on video data. As such, the present invention differs greatly from the normal semi-supervised learning in that label imparting of the training sample to be newly added is precise, and thus, the effect of the re-learning is improved.
Moreover, even if a sample far from the existing boundary surface is subjected to perturbation, it is, as a non-support vector, highly unlikely to affect the position of the boundary surface. Therefore, non-support vectors are excluded from perturbation, and in this way, accuracy improvement and a reduction in the calculation amount can be achieved.
Further, a support vector with α=C lying near the classification boundary is highly likely to be an outlier. Consequently, when a new sample is added by perturbing it, the effect is limited and the risk is greater. Therefore, when the perturbation targets are limited to support vectors existing on a margin hyperplane, it becomes possible to achieve the accuracy improvement and the reduction in the calculation amount.
Furthermore, when the number of samples is biased among classes, as in shot boundary detection, the separation accuracy from other classes is not very good near the margin hyperplane. Thus, a logistic function derived by maximum likelihood estimation is used to evaluate the conditional probability that a support vector on the soft margin hyperplane belongs to another class, and only those hyperplane support vectors having an inferior determination performance are perturbed. In this way, the accuracy improvement and the reduction in the calculation amount can be achieved.
In this embodiment, luminance conversion and contrast conversion are performed on video data used for learning so as to change a value of a feature amount used for boundary detection (hereinafter, referred to as “perturbation”), whereby a new learning sample is generated.
First, at step S1, a set of training samples for initial learning is prepared. For the set of training samples for initial learning, data {x1, x2, x3, . . . , xm} having known class labels {y1, y2, y3, . . . , ym} is prepared. At step S2, the set of training samples for initial learning is used to perform initial learning (pilot learning) of SVM. Through this process, a parameter (α value) corresponding to the training sample for initial learning is obtained, as well as an initially learned SVM (1). The meaning of this parameter (α value) will be described later. At step S3, the training sample for initial learning is subjected to a perturbation process. The perturbation process will be described in detail later.
As a matter of course, the feature amount of the perturbation-processed training sample for initial learning (hereinafter, "new sample") differs from the feature amount of the training sample for initial learning. However, the class label of the new sample carries over the class label of the training sample for initial learning. At step S4, the perturbed sample is set as a training sample to be added. At step S5, the training samples for initial learning and the training samples for addition are used to re-learn the SVM, thereby generating a re-learned SVM (2). At this time, a parameter (α value) corresponding to each training sample is obtained. At step S6, it is determined whether to stop the re-learning process. When the determination is negative, the process returns to step S3 to repeat the aforementioned process. When the process is repeated, further re-learned SVMs (3), (4), . . . , can be obtained. On the other hand, when the determination at step S6 is positive, the re-learning process is stopped.
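The flow of steps S1 through S6 described above can be sketched in code. The sketch below uses scikit-learn's SVC; the `perturb` function, the fixed round count used as the stop condition of step S6, and the parameter values are illustrative assumptions, not part of the claimed method.

```python
import numpy as np
from sklearn.svm import SVC

def relearn_svm(X, y, perturb, n_rounds=2, C=1.0, gamma=0.5):
    """Steps S1-S6: initial learning, perturbation, and re-learning.

    X, y     : training samples for initial learning and their known labels
    perturb  : hypothetical function mapping a sample to a perturbed copy
               (e.g. a feature vector recomputed after brightness conversion)
    n_rounds : simple stop criterion standing in for step S6
    """
    svm = SVC(C=C, gamma=gamma)
    svm.fit(X, y)                                      # step S2: pilot learning
    X_all, y_all = X, y
    for _ in range(n_rounds):                          # repeat until S6 says stop
        X_new = np.array([perturb(x) for x in X_all])  # step S3: perturbation
        y_new = y_all                                  # step S4: labels carry over
        X_all = np.vstack([X_all, X_new])
        y_all = np.concatenate([y_all, y_new])
        svm = SVC(C=C, gamma=gamma).fit(X_all, y_all)  # step S5: re-learning
    return svm
```

Because the class labels of the added samples are inherited rather than predicted, no label noise is introduced by the re-learning rounds.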
According to this embodiment, the training sample for addition carries over the class label of the training sample for initial learning. Thus, the accuracy of the SVM can be further improved and the calculation amount further reduced, as compared to conventional re-learning using samples without class labels.
Subsequently, a second embodiment of the present invention will be described with reference to
Steps S1 and S2 in
The non-support vector samples are far from the classification boundary surface; thus, even when they are perturbed, they are highly unlikely to affect the position of the boundary surface. Therefore, according to this embodiment, by excluding the non-support vectors from perturbation, it becomes possible to achieve the accuracy improvement and the reduction in the calculation amount.
Subsequently, a third embodiment of the present invention will be described with reference to
Steps S1 and S2 in
In the present embodiment, as in the first and second embodiments, the label imparting of the training samples to be newly added is precise, and thus the effect of the re-learning is increased, unlike in the semi-supervised learning of the conventional technology.
The third embodiment will be more specifically described below. In the following description, detection of a shot boundary at an instantaneous cut in a video will be described as an example. However, the present invention is not limited thereto. The present invention can also be applied to detection of various shot boundaries such as "fade out," in which a current shot transitions to the next shot while the screen gradually darkens, or "dissolve," in which videos are gradually switched while being overlapped. Moreover, the present invention can be applied not only to the detection of shot boundaries in videos but also to classification or identification of other objects.
In the normal SVM, a soft margin for performing linear separation allowing some classification errors is used.
Obviously, the data for shot boundary detection also cannot be linearly separated in the kernel space; therefore, learning is performed by the SVM with a soft margin. The hyperparameter for this soft margin is denoted by C. The classification function Φ(x) is written as follows:
Φ(x) = Σi αi·yi·k(xi, x) + b  (Equation 1)
However, 0 ≤ αi ≤ C.
In Equation 1, xi represents the sample data for learning, x represents the sample to be classified, yi (=+1 or −1) represents the class label, and αi represents an internal parameter, for example a Lagrange multiplier. In the present embodiment, a sample with y=−1 is a shot boundary, and a sample with y=+1 is not a shot boundary. k(xi, xj) represents a kernel function; in the case of a Gaussian kernel, k(xi, xj)=exp{−γ·∥xi−xj∥²}.
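For illustration, Equation 1 with a Gaussian kernel can be written directly as a small function. This is a sketch; the function names and the default γ value are assumptions.

```python
import numpy as np

def gaussian_kernel(xi, xj, gamma=0.5):
    # k(xi, xj) = exp(-gamma * ||xi - xj||^2), the Gaussian (RBF) kernel
    return np.exp(-gamma * np.sum((np.asarray(xi) - np.asarray(xj)) ** 2))

def decision_value(x, X_train, y_train, alpha, b, gamma=0.5):
    """Phi(x) = sum_i alpha_i * y_i * k(x_i, x) + b  (Equation 1)."""
    return sum(a * yi * gaussian_kernel(xi, x, gamma)
               for a, yi, xi in zip(alpha, y_train, X_train)) + b
```

The sign of the decision value determines the predicted class: negative for the shot-boundary class (y=−1) and positive otherwise.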
A sample corresponding to 0<αi is called a support vector. In particular, a support vector of 0<αi<C exists on margin hyperplanes H1 and H2.
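In a trained scikit-learn SVC, the αi values of the support vectors can be recovered from the `dual_coef_` attribute (which stores yi·αi), allowing the two kinds of support vectors to be separated. This is an illustrative sketch; the tolerance used when comparing against C is an assumption.

```python
import numpy as np
from sklearn.svm import SVC

def split_support_vectors(svm, C):
    """Separate the support vectors of a fitted SVC by their alpha value.

    - 0 < alpha_i < C : on the margin hyperplanes H1/H2 (non-bounded SVs)
    - alpha_i = C     : bounded SVs near or inside the margin
    Samples with alpha_i = 0 are non-support vectors and are not stored.
    """
    alphas = np.abs(svm.dual_coef_.ravel())   # |y_i * alpha_i| = alpha_i
    tol = 1e-8                                 # assumed numerical tolerance
    on_margin = svm.support_vectors_[alphas < C - tol]
    bounded = svm.support_vectors_[alphas >= C - tol]
    return on_margin, bounded
```

The non-bounded set returned first is the candidate pool for perturbation in the second and third embodiments.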
If the distribution of class estimation results obtained by using the learned SVM is approximated with a logistic function, the classification performance often improves. Actually, in the shot boundary detection, using the logistic function further improves the accuracy.
With this, the logistic function P representing the conditional probability of each class is represented by the following equation:
P(y=+1|Φ(x)) = 1 ÷ (1 + exp(A·Φ(x) + B))
A and B are calculated by using maximum likelihood estimation from the sample data for training.
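The maximum likelihood estimation of A and B can be sketched as a direct minimization of the negative log-likelihood of the training data. This is an unregularized, illustrative version; the optimizer choice and starting point are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def fit_sigmoid(f, y):
    """Maximum-likelihood estimate of A, B in P(y=+1|f) = 1/(1+exp(A*f+B)).

    f : decision values Phi(x_i) of the learned SVM on training samples
    y : class labels (+1 / -1)
    """
    t = (np.asarray(y) + 1) / 2.0             # map labels {-1,+1} to {0,1}
    f = np.asarray(f, dtype=float)

    def nll(params):                          # negative log-likelihood
        A, B = params
        p = 1.0 / (1.0 + np.exp(A * f + B))
        p = np.clip(p, 1e-12, 1 - 1e-12)      # numerical safety
        return -np.sum(t * np.log(p) + (1 - t) * np.log(1 - p))

    res = minimize(nll, x0=[-1.0, 0.0], method="Nelder-Mead")
    return res.x                              # (A, B)
```

With well-separated data, A converges to a negative value so that large positive decision values map to probabilities near 1.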
When the SVM learning is executed once (step S2), the value of the parameter αi corresponding to each training sample i is obtained. In principle, a non-support vector (where αi=0) does not affect the position of the classification boundary surface. As shown in
It is highly likely that a support vector near the classification boundary (where αi=C is established) is an outlier. It is difficult to automatically determine whether the outlier is caused by a mislabel or by uncommon noise. Adding such a support vector (where αi=C is established) as a new sample carries a higher risk, and as such, the perturbation targets are limited to the support vectors existing on the margin hyperplanes (where 0<αi<C is established), i.e., the non-bounded support vectors. This process corresponds to step S21.
Subsequently, a generation process of samples attached with labels by perturbation (the step S22) will be described.
As an example of perturbation, image quality conversion of a video is considered. In image quality conversion, the luminance may be collectively increased or decreased (brightness conversion), or the contrast may be strengthened or weakened (contrast conversion). The luminance conversion equation in each case is given below.
In a case of brightness conversion
Z′ = 256.0 × (Z ÷ 256.0)^δ
Z: Input luminance information (0 to 255)
Z′: Output luminance information (0 to 255)
δ: Brightness conversion adjustable parameter
In a case of contrast conversion
Z′ = 256.0 ÷ (1.0 + exp(−η × (Z − 128.0)))
Z: Input luminance information (0 to 255)
Z′: Output luminance information (0 to 255)
η: Contrast conversion adjustable parameter
It is noted that besides the brightness conversion and contrast conversion, other perturbations such as blurring conversion, edge enhancement, etc., may also be used.
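The two conversion equations above can be written directly as follows. This is a sketch; clipping the outputs to the stated 0-255 range is an added assumption.

```python
import numpy as np

def brightness_convert(Z, delta):
    """Z' = 256.0 * (Z / 256.0)**delta  (gamma-style brightness conversion)."""
    Z = np.asarray(Z, dtype=float)
    return np.clip(256.0 * (Z / 256.0) ** delta, 0, 255)

def contrast_convert(Z, eta):
    """Z' = 256.0 / (1.0 + exp(-eta * (Z - 128.0)))  (sigmoid contrast)."""
    Z = np.asarray(Z, dtype=float)
    return np.clip(256.0 / (1.0 + np.exp(-eta * (Z - 128.0))), 0, 255)
```

With δ>1 the image darkens and with δ<1 it brightens; larger η strengthens the contrast around the mid-level luminance of 128.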
In the perturbation learning of the present invention, the fact that the position of the shot boundary does not change, even if the image process such as luminance conversion is performed on the video data, is utilized to generate the training sample having a new feature amount. Unless there is an error in imparting the class label in the data for initial learning (original), the imparting of the class label of the training sample to be newly added is precise, which is greatly different from the normal semi-supervised learning.
Subsequently, a fourth embodiment of the present invention will be described. In the shot boundary detection problem addressed in the present embodiment, the number of shot-boundary instances is significantly smaller than that of non-shot-boundary instances. Therefore, when the conditional probability indicated by the logistic function obtained by sigmoid training is evaluated, for the support vectors existing on the margin hyperplane on the side of the class of non-shot-boundary instances, the probability of the class of shot-boundary instances is almost zero. In contrast, for the support vectors existing on the margin hyperplane of the class of shot-boundary instances, the probability of the class of non-shot-boundary instances is somewhat high. Accordingly, in the present embodiment, the perturbation targets are limited to support vectors on a margin hyperplane for which the conditional probability of the other class is equal to or greater than a certain threshold value.
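The threshold-based selection of perturbation targets can be sketched as follows. This is illustrative; the function name, the threshold value, and the convention that P gives the other-class probability for the passed decision values are assumptions.

```python
import numpy as np

def select_perturb_targets(sv_decision_values, A, B, threshold=0.1):
    """Keep only on-margin support vectors whose conditional probability of
    the OTHER class, P = 1/(1+exp(A*f+B)), is at or above the threshold.

    sv_decision_values : Phi(x) for support vectors on the margin hyperplane
    A, B               : sigmoid parameters from maximum likelihood estimation
    Returns the indices of the support vectors selected for perturbation.
    """
    f = np.asarray(sv_decision_values, dtype=float)
    p_other = 1.0 / (1.0 + np.exp(A * f + B))
    return np.where(p_other >= threshold)[0]
```

Only the selected indices are passed to the perturbation step, which further reduces the number of added samples compared to the third embodiment.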
As mentioned above, in the shot boundary detection problem addressed in the present embodiment, since the number of shot-boundary instances is significantly smaller than that of non-shot-boundary instances, the determined position in the logistic function in
Thus, according to each of the above mentioned embodiments, the accuracy improvement of the SVM and reduction in the calculation amount can be achieved. Further, the present invention is not limited to each of the above-described embodiments, and it is obvious that various modifications that fall within the scope of the present invention are included.
Number | Date | Country | Kind |
---|---|---|---|
2008-057921 | Mar 2008 | JP | national |