This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2015-229626, filed on Nov. 25, 2015, the entire contents of which are incorporated herein by reference.
This invention relates to machine learning.
Machine learning is also performed on series data that changes continuously as time elapses.
As a method for performing machine learning on series data, there is a known method in which a feature value extracted from the series data is used as input. The feature value that is used is, for example, (a) a statistical amount such as an average value, a maximum value or a minimum value, (b) a moment of a statistical amount such as variance or kurtosis, and (c) frequency data calculated using a Fourier transform or the like.
However, a rule of change (in other words, an original feature) in series data does not always appear in the waveform. For example, in the case of a chaotic time series, completely different waveforms appear due to the butterfly effect even when the rules of change are the same. Therefore, a feature value extracted from the actual series data does not reflect the rule of change, and there are cases where the series data cannot be classified according to the rule of change.
As an analysis method in chaos theory, there is a method of artificially generating, from series data, an attractor that is a set of points in N-dimensional space, each point including N (N is an embedding dimension; typically, N = 3 or 4) values sampled at equal intervals. Hereinafter, an attractor that is generated in this way will be referred to as a pseudo attractor.
Non-Patent Document 1: David Ruelle, "WHAT IS . . . a Strange Attractor?", Notices of the American Mathematical Society, August 2006, Vol. 53, No. 7, pp. 764-765
Non-Patent Document 2: J. Jimenez, J. A. Moreno, and G. J. Ruggeri, "Forecasting on chaotic time series: A local optimal linear-reconstruction method", Physical Review A, Mar. 15, 1992, Vol. 45, No. 6, pp. 3553-3558
Non-Patent Document 3: J. Doyne Farmer and John J. Sidorowich, "Predicting Chaotic Time Series", Physical Review Letters, Aug. 24, 1987, Vol. 59, No. 8, pp. 845-848
By using the method described above, it is possible to express a rule of change in series data according to a mutual relationship among points in N-dimensional space; however, the coordinates themselves of each point have no meaning. Therefore, even if machine learning is performed on a set of points in N-dimensional space by using the coordinates of each point, the series data is classified independently of its original features.
Moreover, there are cases where not only white noise but also noise other than white noise is included in series data, and the effect of that noise may remain in a pseudo attractor generated from the series data. Therefore, when machine learning is performed based on a mutual relationship among points in N-dimensional space, accuracy of classification decreases due to that noise. Particularly, when time resolution with respect to change in the series data is not sufficient, the effect of that noise appears remarkably.
In other words, there has been no technique for classifying series data by using a pseudo attractor generated from the series data.
A machine learning method related to this invention includes: first generating a pseudo attractor from each of plural series data sets, the pseudo attractor being a set of points in N-dimensional space, each of the points including N values sampled at an equal interval; second generating a series data set of Betti numbers from each of plural pseudo attractors generated in the first generating by calculation of persistent homology, each of the Betti numbers being a number of holes for a radius of an N-dimensional sphere in the N-dimensional space; and performing machine learning for each of plural series data sets of Betti numbers generated in the second generating, the series data set of Betti numbers being used as input in the machine learning.
The object and advantages of the embodiment will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the embodiment, as claimed.
The first generator 103 generates a pseudo attractor from series data that is stored in the first series data storage unit 101, and stores the generated pseudo attractor in the pseudo attractor data storage unit 105. The second generator 107 generates, for each dimension of elements (in other words, holes) of a persistent homology group, barcode data from the pseudo attractor that is stored in the pseudo attractor data storage unit 105, and stores the generated barcode data in the barcode data storage unit 109. The removal unit 119 deletes data related to noise from the data stored in the barcode data storage unit 109. The third generator 111 generates series data from the barcode data that is stored in the barcode data storage unit 109, and stores the generated series data in the second series data storage unit 113. The machine learning unit 115 executes machine learning in which the series data that is stored in the second series data storage unit 113 is used as input, and stores the machine learning result (for example, classification result) in the learning result storage unit 117.
Here, time series data of a heart rate is exemplified as series data; however, the series data is not limited to this kind of time series data. For example, the series data may also be biological data other than heart rate data (time series data of brain waves, pulse, body temperature and the like), wearable sensor data (time series data of a gyro sensor, an acceleration sensor, a geomagnetic sensor and the like), financial data (time series data of interest rates, commodity prices, balance of international payments, stock prices and the like), natural environment data (time series data of temperature, humidity, carbon dioxide concentration and the like), or social data (data of labor statistics, population statistics and the like). However, series data that is a target of this embodiment is data that changes according to at least the following rule.
x(i) = f(x(i−1), x(i−2), . . . , x(i−N))
For example, irregular time series data or data related to artificial movement such as tracks of handwritten characters and the like is not a target of this embodiment.
Machine learning of this embodiment may be supervised learning or unsupervised learning. In the case of supervised learning, series data that is stored in the first series data storage unit 101 is labeled series data, and parameters of calculation processing are adjusted based on a comparison of output results of machine learning and the label. The label is called teacher data. Supervised learning and unsupervised learning are well-known techniques, and a detailed explanation is omitted here.
Next, the operation of the information processing apparatus 1 of the first embodiment will be explained with reference to the drawings.
First, the first generator 103 of the information processing apparatus 1 reads out unprocessed series data that is stored in the first series data storage unit 101. When there are plural sets of unprocessed series data stored in the first series data storage unit 101, one set of unprocessed series data is read out. Then, the first generator 103 generates a pseudo attractor from the read out series data according to Takens' embedding theorem (step S1), and stores the generated pseudo attractor in the pseudo attractor data storage unit 105.
The generation of a pseudo attractor will be explained with reference to the drawings. For example, when N = 3 and the delay is τ = 1, points (f(1), f(2), f(3)), (f(2), f(3), f(4)), . . . , each consisting of consecutive values, are extracted from series data f(1), f(2), f(3), . . . . When τ = 2, elements are extracted alternately, and a pseudo attractor that includes points (f(1), f(3), f(5)), (f(2), f(4), f(6)), . . . is generated.
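For reference, the following is a minimal sketch, in Python, of this delay embedding; the function name, parameter names, and sample values are illustrative and are not part of this embodiment.

import numpy as np

def pseudo_attractor(series, n_dim=3, tau=1):
    # Build points (f(i), f(i + tau), ..., f(i + (n_dim - 1) * tau))
    # from one-dimensional series data, per Takens' embedding theorem.
    series = np.asarray(series, dtype=float)
    n_points = len(series) - (n_dim - 1) * tau
    columns = [series[k : k + n_points] for k in range(0, n_dim * tau, tau)]
    return np.stack(columns, axis=1)

# With tau=2, the points are (f(1), f(3), f(5)), (f(2), f(4), f(6)), ...
points = pseudo_attractor([0.1, 0.5, 0.9, 0.2, 0.7, 0.3, 0.8], n_dim=3, tau=2)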
In the generation of a pseudo attractor, the effect of differences in appearance due to the butterfly effect and the like is removed, and the rule of change of the original series data is reflected in the pseudo attractor. A similarity relationship among pseudo attractors is equivalent to a similarity relationship among rules of change: that a certain pseudo attractor is similar to a different pseudo attractor means that the rules of change in the original series data are similar. Accordingly, similar pseudo attractors are generated from series data whose rules of change are the same even when the phenomena (appearances) are different, and different pseudo attractors are generated from series data whose rules of change are different even when the phenomena are similar.
Moreover, when series data is used directly as input for machine learning, the starting positions of the series data must be adequately aligned. By using pseudo attractors, there is no such limitation.
Returning to the explanation of the processing flow, the second generator 107 reads out the pseudo attractor that was generated in step S1 from the pseudo attractor data storage unit 105. Then, the second generator 107 generates barcode data from the pseudo attractor for each hole dimension by calculation processing of persistent homology (step S3), and stores the generated barcode data in the barcode data storage unit 109.
Here, persistent homology will be explained. First, "homology" is a method for expressing features of an object by the number of holes in m (m ≧ 0) dimensions. A "hole" referred to here is an element of a homology group: a 0-dimensional hole is a cluster (connected component), a 1-dimensional hole is a hole (tunnel), and a 2-dimensional hole is a void. The number of holes of each dimension is called a Betti number.
Homology will be explained in more detail with reference to the drawings.
Here, “persistent homology” is a method for characterizing transition of m-dimensional holes in an object (here, a set of points), and it is possible to find features related to arrangement of points by using persistent homology. In this method, each point in an object is gradually made to inflate into a sphere, and in that process, a time at which each hole is born (expressed by a radius of a sphere at birth) and a time at which each hole dies (expressed by a radius of a sphere at death) are identified.
Persistent homology will be explained in more detail with reference to the drawings.
In the calculation processing of persistent homology, a birth radius and a death radius of elements (or in other words, holes) of a homology group are calculated.
Moreover, by using the birth radius and the death radius of each hole, it is possible to generate a barcode diagram such as illustrated in the drawings.
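This embodiment does not prescribe a particular implementation of the persistent homology calculation. The following is a hedged sketch that assumes the open-source ripser package for Python as one possible implementation; note that ripser uses a Vietoris-Rips filtration, whose scale parameter corresponds to a diameter (twice the radius of the inflating spheres).

import numpy as np
from ripser import ripser  # assumed third-party persistent homology library

def barcode_data(points, max_dim=2):
    # Returns, for each hole dimension 0..max_dim, an array of
    # (birth scale, death scale) pairs, i.e. the persistent intervals.
    dgms = ripser(np.asarray(points), maxdim=max_dim)['dgms']
    return {dim: dgm for dim, dgm in enumerate(dgms)}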
By executing processing such as described above, a similarity relationship between barcode data that is generated from a certain pseudo attractor and barcode data that is generated from another pseudo attractor is equivalent to the similarity relationship between the pseudo attractors. Therefore, the relationship between a pseudo attractor and barcode data is a one-to-one relationship.
In other words, when pseudo attractors are the same, the generated barcode data are the same; that is, when rules of change in series data are the same, the generated barcode data are the same. On the other hand, when barcode data are the same, the pseudo attractors are also the same. Moreover, when pseudo attractors are similar, barcode data are also similar, and thus the conditions necessary for machine learning are satisfied. When pseudo attractors are different, barcode data are also different.
For details about persistent homology, refer to “Yasuaki Hiraoka, ‘Protein Structure and Topology: Introduction to Persistent Homology’, Kyoritsu Shuppan”, for example.
Returning to the explanation of the processing flow, when barcode data is stored in the barcode data storage unit 109, the removal unit 119 deletes, from the barcode data storage unit 109, data of persistent intervals whose length is less than a predetermined length (step S5).
Elements whose time from birth to death is short mostly occur due to noise that is added to a time series. By deleting data of persistent intervals whose lengths are less than the predetermined length, it is possible to lessen the effect of noise, and it thus becomes possible to improve classification performance. However, targets of deletion are limited to data of persistent intervals whose hole dimension is 1 or more.
The effect of noise will be explained with reference to the drawings. Here, attention will be paid to an effect due to shifting of point b2: when point b2 is shifted slightly by noise, the birth radius and the death radius of the holes formed by point b2 and its surrounding points change only slightly, and any hole that appears or disappears only because of the shift has a short persistent interval. In other words, as explained with reference to the drawings, the effect of noise appears mainly in persistent intervals whose length is short.
Since data of persistent intervals having a length less than the predetermined length is deleted, a similarity relationship among barcode data after the deletion is not strictly equivalent to a similarity relationship among the original barcode data. When no data is deleted, the similarity relationships are equivalent.
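A minimal sketch of the deletion performed by the removal unit 119 is as follows, assuming barcode data is held as arrays of (birth, death) pairs per hole dimension as in the sketch above; the predetermined length is given as a parameter.

import numpy as np

def remove_short_intervals(bars_by_dim, min_length):
    cleaned = {}
    for dim, bars in bars_by_dim.items():
        if dim == 0:
            cleaned[dim] = bars  # 0-dimensional intervals are not deleted
        else:
            lengths = bars[:, 1] - bars[:, 0]
            cleaned[dim] = bars[lengths >= min_length]
    return cleaned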
Returning to the explanation of the processing flow, the third generator 111 reads out the barcode data that is stored in the barcode data storage unit 109. Then, the third generator 111 integrates the read out barcode data, and generates series data from the integrated barcode data (step S7). The third generator 111 stores the generated series data in the second series data storage unit 113.
As described above, barcode data is generated for each hole dimension, and thus the third generator 111 generates one block of barcode data by combining the barcode data of plural hole dimensions. The series data is data that represents a relationship between the radius (in other words, time) of the spheres in persistent homology and the Betti number, which is the number of persistent intervals containing that radius. The relationship between barcode data and the generated series data will be explained with reference to the drawings.
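A minimal sketch of this conversion is as follows; the sampling grid of radii and the sample interval values are illustrative choices that this embodiment does not specify.

import numpy as np

def betti_series(bars_by_dim, radii):
    # Combine the barcode data of all hole dimensions and count, at each
    # sampled radius, the persistent intervals that contain that radius.
    counts = np.zeros(len(radii), dtype=int)
    for bars in bars_by_dim.values():
        for birth, death in bars:
            counts += (radii >= birth) & (radii < death)
    return counts

radii = np.linspace(0.0, 2.0, 100)                      # illustrative grid
example_bars = {1: np.array([[0.2, 0.9], [0.4, 1.5]])}  # illustrative barcode data
series = betti_series(example_bars, radii)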
Basically, the same series data is obtained from the same barcode data; in other words, when the original pseudo attractors are the same, the same series data is obtained. However, in rare cases, the same series data may be obtained from different barcode data. For example, consider barcode data consisting of two persistent intervals [r1, r3] and [r2, r4], and barcode data consisting of two persistent intervals [r1, r4] and [r2, r3] (where r1 < r2 < r3 < r4): the number of intervals containing any given radius is the same in both cases.
In such a case, completely the same series data is obtained from the barcode data in both cases, and thus it is not possible to distinguish between the two cases by the series data. However, the possibility that such a phenomenon occurs is extremely low. Moreover, the pseudo attractors in both cases are similar to begin with, so the effect on classification by machine learning is extremely small, and there is no problem even when such a phenomenon occurs.
Therefore, as long as a rare case such as described above does not occur, a similarity relationship between series data that is generated from certain barcode data and series data that is generated from different barcode data is equivalent to the similarity relationship between the barcode data. From the above, even though the definition of distance between data changes, a similarity relationship among series data generated from barcode data is mostly equivalent to the similarity relationship among the original series data.
An image of the point set represented by a pseudo attractor is sparse image data, which is difficult to identify, and thus classification by machine learning is difficult. Moreover, in barcode data such as described above, the number of barcodes is not fixed, and thus it is difficult to handle barcodes as input for machine learning. However, series data such as described above oscillates less than the original series data and is suitable as input for machine learning.
Returning to the explanation of the processing flow, the machine learning unit 115 executes machine learning in which the series data that is stored in the second series data storage unit 113 is used as input (step S9), and stores the machine learning result in the learning result storage unit 117.
The machine learning unit 115 determines whether there is unprocessed series data (step S11). When there is unprocessed series data (step S11: YES route), the processing returns to step S1. When there is no unprocessed series data (step S11: NO route), the processing ends.
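As a hedged sketch of the supervised case, fixed-length series data of Betti numbers can be fed to any standard learner; scikit-learn's support vector classifier is assumed here merely as one example, since this embodiment does not prescribe a particular learning algorithm.

import numpy as np
from sklearn.svm import SVC  # assumed third-party learner

def learn(betti_series_list, labels):
    # One fixed-length Betti series per sample; labels are the teacher data.
    X = np.stack(betti_series_list)
    classifier = SVC()
    return classifier.fit(X, labels)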
As described above, by executing the persistent homology calculation, it is possible to reflect the rules of change of the original series data in the barcode data. As a result, it becomes possible to perform classification according to the rules of change of the original series data by using machine learning.
Calculation of persistent homology is a topological method, and has been used for analyzing the structure of a static object (for example, a protein, a molecular crystal, a sensor network or the like) that is represented by a set of points. On the other hand, in this embodiment, a set of points (in other words, a pseudo attractor) that expresses a rule of change of data that changes continuously as time passes is the target of calculation. In this embodiment, analyzing the structure of the set of points itself is not the purpose of the calculation, and thus the target and purpose are completely different from those of typical calculation of persistent homology.
Moreover, the number of barcodes in the barcode data that is generated by calculation of persistent homology is not fixed, and thus it is difficult to use the barcode data itself as input for machine learning. Therefore, in this embodiment, barcode data that is derived from series data is converted back into series data, which makes it possible to use the barcode data as input for machine learning; the converted series data also oscillates less, which improves accuracy of classification.
Furthermore, as described above, by applying this embodiment, it is possible to remove the effect of noise that is included in series data. This will be explained below using concrete examples.
Examples of pseudo attractors, and the data conversion from the original series data through barcode data to the final series data (hereinafter referred to as a Betti time series), are illustrated in the drawings. As illustrated there, even when the waveforms of original series data appear completely different, Betti time series generated from series data having the same rule of change resemble each other.
Therefore, by using the Betti time series of this embodiment, it becomes possible to properly classify original series data according to original rules of change, and thus to improve accuracy of classification.
As was described in the explanation of the first embodiment, a similarity relationship among original series data is mostly equivalent (in other words, in a 1-to-1 relationship) to a similarity relationship among series data that is generated from barcode data. However, when it is possible to translate certain series data (in other words, to add a bias) and superimpose that series data over other series data, the 1-to-1 relationship is not established.
For example, when certain series data and other series data differ only by a fixed bias, pseudo attractors having the same shape are generated from them, and thus the same barcode data, and consequently the same series data of Betti numbers, are obtained from both.
In the following, a method for establishing a 1-to-1 relationship even when handling series data that are capable of being superimposed by translation will be explained.
The first generator 103 generates pseudo attractors from series data that is stored in the first series data storage unit 101, and stores the generated pseudo attractors in the pseudo attractor data storage unit 105. The second generator 107 generates, for each dimension of elements (in other words, holes) of a persistent homology group, barcode data from the pseudo attractors that are stored in the pseudo attractor data storage unit 105, and stores the generated barcode data in the barcode data storage unit 109. The removal unit 119 deletes data related to noise from the data stored in the barcode data storage unit 109. The third generator 111 generates series data from the barcode data that is stored in the barcode data storage unit 109, and stores the generated series data in the second series data storage unit 113. The machine learning unit 115 executes machine learning using the series data that is stored in the second series data storage unit 113 as input, and stores the machine learning result (for example, a classification result) in the learning result storage unit 117. The addition unit 121 generates additional data based on the data that is stored in the first series data storage unit 101, and adds that additional data to the series data that is stored in the second series data storage unit 113.
Next, the operation of the information processing apparatus 1 will be explained with reference to the drawings.
First, the first generator 103 of the information processing apparatus 1 reads out unprocessed series data that is stored in the first series data storage unit 101. When there are plural sets of unprocessed series data stored in the first series data storage unit 101, series data of one unprocessed set is read out. Then, the first generator 103 generates a pseudo attractor from the read out series data according to Takens' embedding theorem (FIG. 46: step S21), and stores the generated pseudo attractor in the pseudo attractor data storage unit 105. This processing is the same as the processing of step S1.
The second generator 107 reads out the pseudo attractor that was generated in step S21 from the pseudo attractor data storage unit 105. Then the second generator 107 generates barcode data from the pseudo attractor for each hole dimension by calculation processing of persistent homology (step S23). The second generator 107 stores the generated barcode data in the barcode data storage unit 109. This processing is the same as the processing of step S3.
When barcode data is stored in the barcode data storage unit 109, the removal unit 119 deletes, from the barcode data storage unit 109, data of persistent intervals that have a length that is less than a predetermined length (step S25). This processing is the same as the processing of step S5.
The third generator 111 reads out barcode data that is stored in the barcode data storage unit 109. Then, the third generator 111 integrates the read out barcode data, and generates series data from the integrated barcode data (step S27). The third generator 111 stores the generated series data in the second series data storage unit 113. This processing is the same as the processing of step S7.
The addition unit 121 reads out, from the first series data storage unit 101, the series data that was read out in step S21 (hereinafter referred to as the original series data). Then, the addition unit 121 calculates an average value of the values included in the original series data, and normalizes the calculated average value (step S29). The calculation of an average value and normalization are well-known calculations, and thus further explanation is omitted here.
The addition unit 121 generates additional data whose values over the whole period are fixed to the average value normalized in step S29 (step S31). In other words, the value of the additional data at each time is the normalized average value. Then, the addition unit 121 adds the additional data at the head or at the tail of the series data that is stored in the second series data storage unit 113 (step S33).
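A minimal sketch of steps S29 to S33 is as follows; the normalization by a fixed value range is an assumption made for illustration, since this embodiment only states that the calculated average value is normalized.

import numpy as np

def add_average_segment(betti_series, original_series,
                        segment_length=10, value_range=(0.0, 1.0)):
    avg = float(np.mean(original_series))            # step S29: average
    lo, hi = value_range
    normalized = (avg - lo) / (hi - lo)              # assumed normalization
    segment = np.full(segment_length, normalized)    # step S31: fixed values
    return np.concatenate([segment, betti_series])   # step S33: add at head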
Returning to the explanation of the processing flow, the machine learning unit 115 executes machine learning in which the series data that is stored in the second series data storage unit 113 and to which the additional data was added is used as input (step S35), and stores the machine learning result in the learning result storage unit 117.
The machine learning unit 115 determines whether there is unprocessed series data (step S37). When there is unprocessed series data (step S37: YES route), the processing returns to step S21. When there is no unprocessed series data (step S37: NO route), the processing ends.
By executing processing such as described above, it becomes possible to distinguish, in machine learning, between different series data even when one series data set can be superimposed on another by translation.
Although the embodiments of this invention were explained above, this invention is not limited to those. For example, the functional block configuration of the information processing apparatus 1, which was explained above, does not always correspond to an actual program module configuration.
Moreover, the aforementioned data configurations are mere examples, and may be changed. Furthermore, as for the processing flow, as long as the processing results do not change, the order of the steps may be exchanged or the steps may be executed in parallel.
The series data may also be data other than time series data (for example, a number sequence or a character string).
Moreover, in the second embodiment, it is also possible to use a set of series data and additional data as input for machine learning without adding the additional data to the series data. In other words, it is also possible to perform multiple input learning.
In this appendix, matters related to these embodiments are explained.
For a time series having much oscillation, the values with respect to time (in other words, vector element numbers) change variously, and thus it is difficult to assign a meaning to each element number. Therefore, for a time series having much oscillation, feature values such as those explained in the background section have been used.
However, when a target is a chaotic time series, these kinds of feature values may become completely different values even for time series having the same rule of change. Chaos is a phenomenon in which different initial values produce results that appear to be completely different even though the rules of change are the same. Such a characteristic of chaos is called initial value sensitivity, and is also commonly called the butterfly effect.
For example, assume that a time series changes according to the following rule.
x(i+1) = 0.25·(tanh(−20·(x(i)−0.75)) + tanh(−20·(x(i)−0.25))) + 0.5
Here, i is a variable that represents time. When this rule is followed, the value changes as illustrated in the drawings; for example, a time series starting from an initial value of 0.23 and a time series starting from a slightly different initial value soon come to appear completely different.
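A minimal sketch that reproduces this rule is as follows; the second initial value (0.24) is an illustrative assumption used to exhibit the initial value sensitivity described above.

import numpy as np

def generate(x0, steps=100):
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(0.25 * (np.tanh(-20.0 * (x - 0.75))
                          + np.tanh(-20.0 * (x - 0.25))) + 0.5)
    return np.array(xs)

a = generate(0.23)
b = generate(0.24)
# The two series follow the same rule, but the difference |a - b| grows
# rapidly, so their waveforms soon appear completely different.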
A feature value of a dynamical system (for example, the maximum Lyapunov exponent or the like) may be used for a chaotic time series. However, a feature value of a dynamical system becomes the same value, or a meaningless value, for all non-chaotic time series. Therefore, even when a feature value of a dynamical system is used, it is not possible to generate input for machine learning that is able to handle a chaotic time series and a non-chaotic time series at the same time.
On the other hand, by using the methods of the first embodiment and the second embodiment, it is possible to generate input for machine learning that is capable of handling both a chaotic time series and non-chaotic time series at the same time.
This is the end of the appendix.
In addition, the aforementioned information processing apparatus 1 is a computer device as illustrated in the drawings: a memory 2501, a CPU 2503, a hard disk drive (HDD) 2505, a display controller 2507 connected to a display device 2509, a drive device 2513 for a removable disk 2511, an input unit 2515, and a communication controller 2517 for connection with a network are connected through a bus 2519. An operating system (OS) and an application program for carrying out the foregoing processing in the embodiments are stored in the HDD 2505, and when executed by the CPU 2503, they are read out from the HDD 2505 to the memory 2501. As the need arises, the CPU 2503 controls the display controller 2507, the communication controller 2517, and the drive device 2513, and causes them to perform predetermined operations. Moreover, intermediate processing data is stored in the memory 2501, and if necessary, it is stored in the HDD 2505. In these embodiments of this technique, the application program to realize the aforementioned functions is stored in the computer-readable, non-transitory removable disk 2511 and distributed, and then it is installed into the HDD 2505 from the drive device 2513. It may be installed into the HDD 2505 via a network such as the Internet and the communication controller 2517. In the computer device as stated above, the hardware such as the CPU 2503 and the memory 2501, the OS and the application programs systematically cooperate with each other, so that various functions as described above in detail are realized.
The aforementioned embodiments are summarized as follows:
A machine learning method related to these embodiments includes: (A) first generating a pseudo attractor from each of plural series data sets, the pseudo attractor being a set of points in N-dimensional space, each of the points including N values sampled at an equal interval; (B) second generating a series data set of Betti numbers from each of plural pseudo attractors generated in the first generating by calculation of persistent homology, each of the Betti numbers being a number of holes for a radius of an N-dimensional sphere in the N-dimensional space; and (C) performing machine learning for each of plural series data sets of Betti numbers generated in the second generating, the series data set of Betti numbers being used as input in the machine learning.
By performing the processing described above, it becomes possible to convert a pseudo attractor into a format suitable for input of machine learning, and thus it becomes possible to classify a series data set by using a pseudo attractor generated from the series data set.
Moreover, the second generating may further include: (b1) third generating data of duration between birth and death of holes for each hole dimension by calculation of persistent homology; (b2) calculating the Betti numbers based on the data of duration for each hole dimension; and (b3) fourth generating the series data set of Betti numbers based on the Betti numbers calculated for each hole dimension. It becomes possible to classify with higher accuracy.
Moreover, each of the Betti numbers may be a number of holes whose difference between a radius at birth and a radius at death is a predetermined length or more. It becomes possible to remove an effect of noise.
Moreover, the machine learning method may further include: (D) calculating an average of values included in the series data set for each of the plural series data sets. And the performing may include (c1) performing the machine learning, the series data set of Betti numbers and the average being used as input in the machine learning. It becomes possible to classify properly even when handling series data sets that can be superimposed on each other by translation.
Moreover, each of the plural series data sets may be a labeled series data set, and (c2) the performing may include performing the machine learning for a relationship between the Betti numbers for the radius of the N-dimensional sphere and a label. It becomes possible to also handle supervised learning.
Moreover, the holes may be elements of a homology group.
Incidentally, it is possible to create a program causing a computer to execute the aforementioned processing, and such a program is stored in a computer-readable storage medium or storage device such as a flexible disk, a CD-ROM, a DVD-ROM, a magneto-optical disk, a semiconductor memory, or a hard disk. In addition, an intermediate processing result is temporarily stored in a storage device such as a main memory or the like.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present inventions have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.