This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2016-150717, filed on Jul. 29, 2016, the entire contents of which are incorporated herein by reference.
The present invention relates to a data processing method, a data processing apparatus, and a computer readable medium and, more particularly, to a technique of reducing data used in machine learning.
In recent years, supervised machine learning methods such as neural networks, support vector machines, and boosting have been developed rapidly. These machine learning methods generally tend to obtain a learning result of high generalization capability as the number of training data used in learning increases. On the other hand, as the number of training data used in learning increases, the time needed for the learning also increases. For this reason, Japanese Patent No. 5291478 proposes a method of repetitively performing a procedure of selecting a plurality of training data to be used in a support vector machine and obtaining one optimum training vector from them, thereby reducing the training data.
For each training data used in a supervised machine learning method, the class to which the training data belongs is defined. Supervised machine learning can also be regarded as a procedure of generating a criterion used to discriminate the class of given data. Hence, reducing training data is equivalent to changing the training data, and may therefore greatly affect generation of the criterion by supervised machine learning. Against this backdrop, there is a demand to raise the appropriateness of reduction of training data.
According to an aspect of the present invention, there is provided a data processing method executed by a processor, comprising: mapping each of a plurality of data, for which the classes the data belong to are known, to one point on an N-dimensional (N is an integer of not less than 2, or infinity) feature space using at least two feature amounts; dividing a set of points corresponding to the plurality of data mapped on the feature space into a plurality of N-dimensional simplexes having each point as an apex; classifying a set of points that constitute a hyperplane of each simplex obtained by the division into subsets each including, as elements, points that belong to the same class; and reducing, for each of the classified subsets, the elements of the subset, wherein the dividing comprises dividing the set of points into the plurality of simplexes so that a hypersphere circumscribed on each simplex does not include a point that constitutes another simplex.
Further features of the present invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
<Outline of Support Vector Machine>
As for machine learning that is the premise of a data processing technique according to an embodiment, the outline will be described first using a support vector machine (to be referred to as “SVM” hereinafter) as an example.
SVM is a kind of supervised machine learning, and is a method of generating a two-class discriminator using a linear input element. The main task of SVM is to solve the constrained quadratic programming problem (QP problem) of equation (1) when l training data xi (i=1, 2, . . . , l), each having a label yi of −1 or +1, are given. Note that the training data xi having the label yi of −1 and the training data xi having the label yi of +1 correspond to the above-described data of two classes.
Each element of training data is mapped to one point on a multidimensional feature space by a plurality of feature amounts. For this reason, each training data can be specified using a position vector xi on the feature space. Hence, each element of training data will be referred to using the position vector xi on the feature space hereinafter. That is, if given training data is mapped to the position vector xi on the feature space, the training data will be expressed as “vector xi”.
K(xi, xj) in equation (1) is a kernel function that calculates the inner product between two vectors xi and xj on the feature space, and Ci (i=1, 2, . . . , l) is a parameter for giving a penalty to training data with noise out of the given training data.
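Equation (1) itself is not reproduced in this text. For reference, the dual QP problem of a soft-margin SVM with a kernel K(xi, xj) and per-sample penalty parameters Ci, which equation (1) presumably takes given the definitions above, is in its standard form:

```latex
\max_{\alpha}\;\; \sum_{i=1}^{l}\alpha_i \;-\; \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i \alpha_j\, y_i y_j\, K(x_i, x_j)
\qquad \text{s.t.}\quad 0 \le \alpha_i \le C_i \;\; (i=1,2,\ldots,l),\quad \sum_{i=1}^{l}\alpha_i y_i = 0
```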
In solving the above-described problem, if the number l of training data is large, the following three problems arise.
1) A problem of the capacity of a memory for storing the kernel matrix Kij=K(xi, xj) (i, j=1, 2, . . . , l). That is, the data amount of the kernel matrix may exceed the normal memory capacity of a computer.
2) A problem of the computational cost, for the computer, of calculating the kernel values Kij (i, j=1, 2, . . . , l).
3) A problem of the computational cost, for the computer, of solving the QP problem.
In a test phase, that is, in a phase in which the class of unknown data x is verified using a discriminator generated from training data, the decision function ƒ(x) of SVM is expressed by
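equation (2), which is not reproduced in this text. In its standard form, with coefficients αi obtained from the QP problem and a bias b, the SVM decision function is:

```latex
f(x) = \sum_{i=1}^{N_s} \alpha_i\, y_i\, K(x_i, x) + b
```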
and is formed from Ns data xi (i=1, 2, . . . , Ns), called support vectors, selected from the training data.
In equation (2), if ƒ(x)>0, the unknown data x is classified into a class of a positive label. Similarly, if ƒ(x)<0, the unknown data x is classified into a class of a negative label.
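As an illustration outside the patent text itself, this sign rule can be checked with scikit-learn, whose SVC classifier exposes both the decision function of equation (2) and the selected support vectors; the toy data below are hypothetical:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical two-class training data with labels -1 / +1.
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0],
              [0.9, 0.8], [0.1, 0.3], [0.8, 1.1]])
y = np.array([-1, -1, 1, 1, -1, 1])

clf = SVC(kernel="rbf").fit(X, y)

# f(x) > 0 -> positive label, f(x) < 0 -> negative label.
scores = clf.decision_function(X)
assert all(np.sign(scores) == clf.predict(X))

# The decision function is built from Ns support vectors,
# a subset of the training data.
print(len(clf.support_vectors_))
```

Note how the cost of evaluating `decision_function` grows with the number of rows in `clf.support_vectors_`, which motivates reducing them.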
The complexity of the decision function ƒ(x) of SVM in equation (2) increases linearly with the number Ns of support vectors. If the number of support vectors increases, the calculation speed of SVM in the test phase decreases because the calculation amount of the kernel values K(xi, x) (i=1, 2, . . . , Ns) increases.
In summary, if the number l of training data increases, the time needed for training to generate discriminators increases. If the number of support vectors obtained as discriminators increases, the time needed for discrimination of unknown data in the test phase increases.
Concerning each of a plurality of data prepared as training data, the class to which the data belongs, that is, the value of the above-described label yi, is known. Also, for each of one or more support vectors selected from the training data by the learning method of SVM, the class to which the support vector belongs is known. This is because a support vector is data selected from a plurality of training data for which the classes the data belong to are known. Hence, data for which the class the data belongs to is known will simply be referred to as “known data” in this specification, except where training data and support vectors serving as discriminators need to be particularly distinguished.
Japanese Patent No. 5291478 proposes a method of reducing N training data to M (M<<N) training data called reduced vectors to speed up the calculation of SVM. Since both training data and support vectors are known data, the reduction method is applicable to reduction of support vectors as well.
On the other hand, since reduction of training data may greatly affect generation of a criterion (a support vector in SVM) by supervised machine learning, it is preferable to raise the appropriateness of reduction of training data.
A data processing method according to the embodiment is directed to a method of selecting known data as reduction targets when reducing known data including training data and support vectors. A data processing apparatus according to the embodiment maps each known data to a point on a feature space and executes Delaunay triangulation for the mapped point group on a multidimensional space.
“Delaunay triangulation” is a method of completely dividing a two-dimensional plane, without overlap, into triangles having apexes at points discretely distributed on the two-dimensional plane. Triangles obtained by Delaunay triangulation have the following characteristic: a circle circumscribed on an arbitrary triangle obtained by Delaunay triangulation does not include a point that constitutes another triangle.
Delaunay triangulation is known to be extendable to a space division method for a point group on a multidimensional space with three or more dimensions. In the extended Delaunay triangulation, a multidimensional space is divided by simplexes having apexes at points discretely distributed on the multidimensional space.
For example, a simplex in a three-dimensional space is a tetrahedron. Hence, in Delaunay triangulation of a three-dimensional space, the three-dimensional space is divided by tetrahedrons having apexes at points discretely distributed on the three-dimensional space. When Delaunay triangulation is executed in a three-dimensional space, a sphere circumscribed on an arbitrary tetrahedron does not include a point that constitutes another tetrahedron.
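The empty-circumsphere characteristic described above can be checked numerically with SciPy's Delaunay routine. The sketch below uses a two-dimensional point group with hypothetical random coordinates; the circumcircle center is found by solving the linear system that makes it equidistant from the three vertices:

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
points = rng.random((20, 2))          # hypothetical point group
tri = Delaunay(points)

def circumcircle(a, b, c):
    # Center x satisfies |x-a| = |x-b| = |x-c|, i.e.
    # 2(b-a)·x = b·b - a·a and 2(c-a)·x = c·c - a·a.
    A = 2 * np.array([b - a, c - a])
    rhs = np.array([b @ b - a @ a, c @ c - a @ a])
    center = np.linalg.solve(A, rhs)
    return center, np.linalg.norm(center - a)

for simplex in tri.simplices:
    center, r = circumcircle(*points[simplex])
    others = np.delete(np.arange(len(points)), simplex)
    dist = np.linalg.norm(points[others] - center, axis=1)
    # No other point lies inside this triangle's circumcircle.
    assert (dist >= r - 1e-9).all()
```

The same check extends to three or more dimensions with circumscribed hyperspheres of tetrahedrons, 5-cells, and so on.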
Similarly, a simplex in a four-dimensional space is a 5-cell. Hence, in Delaunay triangulation of a four-dimensional space, the four-dimensional space is divided by 5-cells having apexes at points discretely distributed on the four-dimensional space. When Delaunay triangulation is executed in a four-dimensional space, a hypersphere circumscribed on an arbitrary 5-cell does not include a point that constitutes another 5-cell.
Note that a “hyperplane” in a tetrahedron is a triangle, and a hyperplane in a 5-cell is a tetrahedron. In general, a hyperplane that constitutes an N-dimensional simplex is an (N−1)-dimensional simplex.
As described above, properly speaking, Delaunay triangulation for a point group on a multidimensional space with three or more dimensions is “simplex division”. In this specification, division of a multidimensional space with two or more dimensions will simply be referred to as “Delaunay division” for descriptive convenience, and a simplex of two or more dimensions obtained by Delaunay division will simply be referred to as a “simplex”. As for an arbitrary simplex obtained by executing Delaunay division, a hypersphere circumscribed on the simplex does not include a point that constitutes another simplex. This characteristic is a broad characteristic that holds over the entirety of the space on which the known data are distributed.
The data processing apparatus according to the embodiment selects, as a reduction target, the hyperplane of each simplex obtained by executing multidimensional Delaunay division for known data discretely distributed on a feature space. The data processing apparatus according to the embodiment classifies the known data distributed on the feature space using Delaunay division and then executes reduction. For this reason, it is possible to incorporate not simple local information such as the distance between two known data on a feature space but the broad characteristic of Delaunay division in reduction. It is therefore considered that the appropriateness of reduction processing of data used in the machine learning method rises.
The data processing apparatus according to the embodiment will be described below in more detail. Note that a data processing apparatus 1 is assumed below to execute machine learning using the SVM method.
<Functional Arrangement of Data Processing Apparatus>
The control unit 10 is a computer, for example, a PC (Personal Computer) or server including calculation resources such as a CPU (Central Processing Unit) and memories. The control unit 10 executes a computer program and thus functions as the mapping unit 11, the data division unit 12, the classification unit 13, the data reduction unit 14, the training unit 15, the unknown data acquisition unit 16, and the verification unit 17.
The database 20 is a known mass storage device, for example, an HDD (Hard Disc Drive) or SSD (Solid State Drive). Both the training data database 21 and the support vector database 22 included in the database 20 are databases for storing a plurality of known data.
More specifically, the training data database 21 stores a plurality of training data for which the classes the data belong to are known. The support vector database 22 stores support vectors generated from the training data using SVM. The database 20 also stores an operating system configured to control the data processing apparatus 1, a computer program configured to cause the control unit 10 to implement the function of each unit, and a plurality of feature amounts to be used in SVM.
The mapping unit 11 maps each of the plurality of known data stored in the database 20 to one point on an N-dimensional feature space using two or more feature amounts. Here, N is an integer of 2 or more or infinity, and changes depending on the type of K(xi, xj) in equation (1).
The data division unit 12 divides a set of points corresponding to the plurality of data mapped on the feature space by the mapping unit 11 into a plurality of N-dimensional simplexes having each point as an apex using the Delaunay division method. More specifically, the data division unit 12 divides the point group into a plurality of simplexes so that a hypersphere circumscribed on each simplex does not include a point that constitutes another simplex.
The classification unit 13 classifies a set of points that constitute the hyperplane of each simplex obtained by Delaunay division executed by the data division unit 12 into a subset including points that belong to the same class as elements. The data reduction unit 14 reduces the elements of each subset classified by the classification unit 13.
Note that a side in a two-dimensional simplex corresponds to a hyperplane in a multidimensional simplex. Like the two-dimensional simplex, the hyperplanes of multidimensional simplexes include three types of hyperplanes, that is, a hyperplane formed from only points corresponding to data of a positive label, a hyperplane formed from only points corresponding to data of a negative label, and a hyperplane including both points.
The data reduction unit 14 reduces, of the elements constituting each of the subsets classified by the classification unit 13, two elements having the minimum Euclidean distance on the feature space into one new element. For example, in the example shown in
The tetrahedron as the hyperplane of the simplex shown in
Since the subset having the positive label includes a plurality of points, the points are selected as the targets of reduction processing by the data reduction unit 14. In
The data reduction unit 14 sets the class of the new element obtained by reduction to the same class as the class to which the two elements of the reduction targets belong. In the example shown in
Note that in
The data division unit 12 executes Delaunay division again for the new data set. The classification unit 13 reclassifies a set of points that constitute the hyperplane of each simplex obtained by the Delaunay division executed again by the data division unit 12 into subsets each including points of the same class as elements. While referring to the subsets reclassified by the classification unit 13, the data reduction unit 14 executes the reduction processing again for the hyperplanes of all simplexes newly divided by the data division unit 12, thereby generating a new data set. The data processing apparatus 1 can decrease the number of known data by repeating the above-described processing.
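One division–classification–reduction round can be sketched in Python with SciPy's Delaunay routine. The midpoint placement of the new element and the greedy per-facet merging order below are assumptions made for illustration, not details taken from this description:

```python
import numpy as np
from itertools import combinations
from scipy.spatial import Delaunay

def reduce_once(points, labels):
    """One reduction round (an illustrative sketch): Delaunay-divide the
    point set, collect same-class vertex subsets on each simplex facet,
    and merge the closest pair in each subset into one new point (placed
    at the midpoint, an assumption)."""
    tri = Delaunay(points)
    merged = set()                       # indices consumed this round
    new_pts, new_lbls = [], []
    for simplex in tri.simplices:
        # Facets of an N-simplex: drop one vertex at a time.
        for facet in combinations(simplex, len(simplex) - 1):
            for lbl in (-1, +1):         # same-class subsets on this facet
                subset = [i for i in facet
                          if labels[i] == lbl and i not in merged]
                if len(subset) < 2:
                    continue
                # Closest pair within the same-class subset.
                i, j = min(combinations(subset, 2),
                           key=lambda p: np.linalg.norm(points[p[0]]
                                                        - points[p[1]]))
                merged.update((int(i), int(j)))
                new_pts.append((points[i] + points[j]) / 2)
                new_lbls.append(lbl)     # new element keeps the class
    keep = [k for k in range(len(points)) if k not in merged]
    if not new_pts:
        return points[keep], labels[keep]
    return (np.vstack([points[keep], np.array(new_pts)]),
            np.concatenate([labels[keep], np.array(new_lbls)]))

rng = np.random.default_rng(0)
points = rng.random((30, 2))
labels = np.where(points[:, 0] < 0.5, -1, 1)   # hypothetical classes
new_points, new_labels = reduce_once(points, labels)
print(len(points), "->", len(new_points))      # the data set shrinks
```

Repeating `reduce_once` on its own output corresponds to the iterated reduction described above.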
Referring back to
The unknown data acquisition unit 16 acquires unknown data for which the class the data belongs to is unknown. The verification unit 17 applies the discriminator generated by the training unit 15 to the unknown data acquired by the unknown data acquisition unit 16, thereby discriminating the class of the unknown data.
When executing reduction processing for training data stored in the training data database 21 as known data, the data processing apparatus 1 can decrease the number of training data as the SVM execution targets. In this case, since the data processing apparatus 1 can decrease the calculation amount needed for training, the training can be speeded up.
On the other hand, when executing reduction processing for support vectors stored in the support vector database 22 as known data, the data processing apparatus 1 can decrease the number of support vectors. In this case, since the data processing apparatus 1 can decrease the calculation amount needed for test processing that is processing of discriminating the class of unknown data, the test processing can be speeded up.
<Processing Procedure of Data Reduction Processing>
In step S2, the mapping unit 11 acquires known data from the database 20. In step S4, the mapping unit 11 maps each known data to one point on the feature space. In step S6, the data division unit 12 executes Delaunay division for the point group of known data mapped on the feature space by the mapping unit 11.
In step S8, the classification unit 13 classifies points that constitute the hyperplanes of a plurality of simplexes obtained by the Delaunay division into subsets for each class to which corresponding data belongs. In step S10, for each of the classified subsets, the data reduction unit 14 reduces data that constitute the subset. In step S12, the data division unit 12 stores new known data obtained by the reduction in the database 20.
Until the iteration count reaches a predetermined count, the data processing apparatus 1 does not end the reduction processing (NO in step S14), and continues each of the above-described processes. If the data processing apparatus 1 executes the reduction processing as many times as the predetermined iteration count (YES in step S14), the processing of this flowchart ends.
As described above, according to the data processing apparatus 1 of the embodiment, it is possible to raise the appropriateness of reduction processing of data used in the supervised machine learning method.
In particular, when the data processing apparatus 1 executes reduction processing for training data, the time needed for machine learning can be shortened. In addition, when the data processing apparatus 1 executes reduction processing for support vectors, the time needed for the test phase for discriminating the class of unknown data can be shortened.
The present invention has been described above using the embodiment. However, the present invention is not limited to the technical scope described in the embodiment. Various modifications or improvements can be made for the embodiment, as is apparent to those skilled in the art. In particular, a detailed embodiment of distribution/integration of devices is not limited to that illustrated, and all or some of the devices can be functionally or physically distributed/integrated in an arbitrary unit in accordance with various additions or a functional load.
For example, in the above example, SVM has mainly been exemplified as machine learning. However, training data reduction can also be applied to another machine learning method other than SVM, for example, a neural network or boosting.
In the above-described example, the data division unit 12 executes Delaunay triangulation for data mapped on the feature space. The Voronoi diagram exists as the dual of Delaunay triangulation. More specifically, a division diagram obtained by Delaunay triangulation represents the adjacency relationship of Voronoi regions. Hence, executing Delaunay triangulation and obtaining a Voronoi diagram have a one-to-one relationship. In this sense, the data division unit 12 may obtain a Voronoi diagram instead of executing Delaunay triangulation for data mapped on the feature space.
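This duality can be observed with SciPy, where `Voronoi.ridge_points` lists the pairs of input points whose Voronoi regions are adjacent; for a point group in general position, those pairs coincide with the Delaunay edges (a two-dimensional sketch with hypothetical random points):

```python
import numpy as np
from scipy.spatial import Delaunay, Voronoi

rng = np.random.default_rng(1)
pts = rng.random((15, 2))               # hypothetical point group

# Delaunay edges: all vertex pairs of each triangle.
tri = Delaunay(pts)
d_edges = set()
for a, b, c in tri.simplices:
    d_edges |= {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))}

# Voronoi ridges separate exactly the Delaunay-adjacent sites.
vor = Voronoi(pts)
v_edges = {frozenset(pair) for pair in vor.ridge_points}

assert d_edges == v_edges
```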
Number | Date | Country | Kind
---|---|---|---
2016-150717 | Jul 2016 | JP | national