This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2015-000885, filed on Jan. 6, 2015, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to an incremental learning management device, an incremental learning management method, and a computer readable recording medium storing an incremental learning management program.
Machine learning has been attracting attention for its ability to gain new knowledge and information that are useful for business from a large amount of time-series data arising from the Internet and various kinds of sensors. Achievement of both of short learning time and high accuracy is important in machine learning when dealing with a large amount of time-series data.
Zhao, J. “Parallelized incremental support vector machines based on MapReduce and Bagging technique”, 2012 discloses a related art.
According to an aspect of the embodiments, an incremental learning management method includes: extracting data by a computer from input data that are sequentially input based on a first window size and a first sampling rate; storing learning history information in which the first window size is associated with a learning time for the data and the first sampling rate; measuring a data rate of the input data; and calculating a second window size and a second sampling rate based on the data rate, the learning history information, and the first sampling rate.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
In a case where the input rate is high (for example, several thousand to several tens of thousands of pieces/second), the online learning method, in which data for a recent update are learned in several milliseconds, is selected. The incremental learning method exhibits higher accuracy than the online learning unless the input rate exceeds a specific value (for example, several tens to several hundreds of pieces/second).
With the “incremental learning” method, almost equivalent accuracy to the batch learning is maintained, and learning is continued by using a previous result, without starting learning from scratch at each time when data arise.
For example, as illustrated in
In the support vector machine algorithm, the window size, for example, the number of input data that are accumulated in the input buffer is a predetermined fixed value and does not dynamically fluctuate. The window size is fixed in incremental learning algorithms other than the support vector machine. For example, the fixed value of the window size is decided such that the learning speed becomes the same as or faster than the input rate. For example, in a case where the window size is N3 pieces and the input rate is 100 pieces/second in
In a case where not all the data are learned but data that are sampled at a specific ratio are learned, control is performed such that the time used for relearning with the M models becomes shorter, the learning speed becomes faster, and the learning time again becomes shorter than the accumulation time of the input data.
Because the sampling means that not all the input data are used for learning, the accuracy of a learning result may decrease.
In the specification and drawings, the same reference characters are given to elements that have substantially the same or similar functional configurations, and redundant descriptions thereof may be omitted or simplified.
The machine learning includes three categories of learning methods, which are “batch learning”, “online learning”, and “incremental learning” illustrated in
In learning methods that are included in the category of “online learning” or “mini-batch learning”, a large amount of time-series data are learned almost in real time because learning is fast. However, those learning methods may exhibit low prediction accuracy about data that are not linearly separable.
In learning methods that are included in the category of “incremental learning”, almost equivalent accuracy to the batch learning is maintained, and learning is continued by using a previous result, without starting learning from scratch at each time when data arise. Thus, the learning time is shorter than the batch learning, and learning may be performed on time-series data almost in real time while high accuracy is retained.
As illustrated in
Hereinafter, the window size is the number of input data that are used for one piece of learning and will be represented by “N”. The sampling rate is an extraction rate of sample data that are actually used for learning from the window size N and will be represented by “S”. The input rate is a data amount (data rate) that is input in one second and will be represented by “R”.
The incremental learning apparatus 2e combines M pieces of model data, which are learning results so far obtained, with new and additional learning data and performs the incremental learning by using the combined data in accordance with an incremental learning algorithm. Models of M pieces of learned data are saved in a model table 2f.
For example, the incremental learning device 2 receives data transmitted from a terminal of a user who is provided with a certain service and models a behavioral pattern of the user by using those data. The result of the incremental learning is used for a purpose such as prediction of next behavior of the user. For example, in a case where the behavioral pattern of another user who has withdrawn from a certain service is similar to the modeled behavioral pattern of the user, the user is predicted to withdraw from the service with high probability. The result of the incremental learning may be used for some action to avoid withdrawal of the user or the like.
An incremental learning management device 1 is a device that manages the incremental learning device 2. The incremental learning management device 1 has an input rate measurement unit 11, a storage unit 12, a learning time calculation unit 13, an accuracy calculation unit 14, and an optimization unit 15.
The input rate measurement unit 11 measures the flow rate (input rate or data rate) of data that are input to the input data table 2a, for example, additional data received via the network or the like. The input rate measurement unit 11 counts how many pieces of data are received per second, for example. The input rate measurement unit 11 may instead count over one minute or one hour in a case where the input rate is low.
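A minimal sketch of such a measurement, assuming arrival timestamps and a sliding measurement window (the class name and structure are illustrative assumptions, not the device's actual implementation), may look as follows:

```python
import collections
import time

class InputRateMeter:
    """Counts how many pieces of data arrive during a measurement window
    (one second by default; a longer window suits a low input rate)."""

    def __init__(self, window_seconds=1.0):
        self.window = window_seconds
        self.arrivals = collections.deque()   # timestamps of recent arrivals

    def record(self, now=None):
        """Register the arrival of one piece of data."""
        now = time.monotonic() if now is None else now
        self.arrivals.append(now)
        self._evict(now)

    def rate(self, now=None):
        """Input rate R in pieces per second over the current window."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        return len(self.arrivals) / self.window

    def _evict(self, now):
        # Drop arrivals that fell out of the measurement window.
        while self.arrivals and self.arrivals[0] <= now - self.window:
            self.arrivals.popleft()

meter = InputRateMeter()
for k in range(100):                 # 100 pieces over 0.5 seconds
    meter.record(now=0.005 * k)
print(meter.rate(now=0.5))           # 100 arrivals in the 1 s window -> 100.0
```

Passing explicit timestamps, as above, keeps the sketch deterministic; in live use the `time.monotonic()` default would be used instead.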
The storage unit 12 has a learning history information table 121, a learning time prediction model table 122, an accuracy history information table 123, and an accuracy prediction model table 124.
The learning time calculation unit 13 has a learning time measurement unit 131, a learning time modeling unit 132, and a learning time prediction unit 133.
The learning time measurement unit 131 receives new data for next incremental learning, for example, additional data from the input buffer 2b and measures learning time t by recording the times when the incremental learning starts and finishes. The measured learning time t is stored in the learning history information table 121 while being associated with the window size N that is set at the point in time. The learning time measurement unit 131 calculates the learning time at each time when N pieces of data are learned by the incremental learning apparatus 2e.
The learning time modeling unit 132 extracts all the learning times and data amounts of the learning performed in the past from the learning history information table 121 and performs regression processing based on those pieces of information. When the regression processing finishes, the learning time modeling unit 132 records coefficients that are obtained as a result of the regression processing in the learning time prediction model table 122. The regression processing may not necessarily be executed by extracting all the learning times and data amounts of the learning performed in the past but may be executed based on portions of the learning times and the data amounts of the learning performed in the past, for example.
A regression operation may employ regression techniques such as linear regression, polynomial regression, or non-parametric regression, for example. The regression technique to be used may be decided by the user. In a case where polynomial regression is used, it is assumed that the learning time function T(N, S) to be modeled is in a form such as T(N, S) = k2(N·S)^2 + k1(N·S) + k0, for example. The coefficient k2, coefficient k1, and coefficient k0 are determined by regression based on the learning times that are stored in the learning history information table 121 in the past. N×S is the amount of data that is newly added to the incremental learning apparatus 2e.
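As an illustration of this modeling step, the quadratic coefficients can be recovered with ordinary polynomial regression. The history values and "true" coefficients below are hypothetical, and `np.polyfit` stands in for whatever regression routine the device actually uses:

```python
import numpy as np

# Synthetic learning-history records: the amount of newly added data x = N*S
# and learning times generated from a "true" quadratic model (hypothetical
# coefficients, standing in for the learning history information table 121).
true_k2, true_k1, true_k0 = 3e-7, 1e-4, 0.05
x = np.array([500.0, 1000.0, 2000.0, 4000.0, 8000.0])   # N*S per learning run
t = true_k2 * x**2 + true_k1 * x + true_k0              # measured times [s]

# Degree-2 polynomial regression recovers the coefficients k2, k1, k0.
k2, k1, k0 = np.polyfit(x, t, deg=2)

def predict_learning_time(N, S):
    """Prediction value T of the learning time for a candidate (N, S)."""
    v = N * S
    return k2 * v * v + k1 * v + k0
```

With exact quadratic data the fit reproduces the generating coefficients, so `predict_learning_time(4000, 0.5)` returns the time of the 2000-piece run (1.45 s here).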
Modeling of the learning time may be effective in a case where the form of the learning time is in advance known to some extent. A method of non-parametric regression may be used in a case where the form of the learning time is not known in advance. Non-parametric regression may include Gaussian process regression, for example.
The learning time prediction unit 133 predicts the learning time in accordance with the regression technique that is used for modeling the learning time and designated by the user or set in advance. For example, in a case where polynomial regression is used for modeling the learning time, the coefficients that are obtained as a result of the polynomial regression (for example, k2, k1, and k0) are used to calculate the learning time function T(N, S) = k2(N·S)^2 + k1(N·S) + k0. Accordingly, a prediction value T of the learning time is calculated.
The accuracy calculation unit 14 has an accuracy measurement unit 141, an accuracy modeling unit 142, and an accuracy prediction unit 143. The accuracy measurement unit 141 receives model data (input data) that the incremental learning device 2 obtains by the incremental learning and measures the accuracy of the learning result with the set sampling rate S. The accuracy may be calculated by a function A(S) that models the accuracy, for example. The accuracy measurement unit 141 may acquire test data instead of the model data. Measured accuracy P is stored in the accuracy history information table 123.
The accuracy modeling unit 142 performs regression processing based on the accuracy history information table 123 and records the resulting coefficients in the accuracy prediction model table 124. In a case where logarithmic regression is used, it is assumed that the accuracy function A(S) (S: sampling rate) to be modeled is in a form such as A(S) = k0 + k1 log(N·S), for example, and the accuracy is modeled.
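A sketch of the logarithmic case, with hypothetical history values: because A(S) = k0 + k1 log(N·S) is linear in log(N·S), plain linear regression on the logarithm suffices.

```python
import numpy as np

# Synthetic accuracy history: pieces of sampled data x = N*S and accuracies
# generated from a "true" logarithmic model (hypothetical coefficients,
# standing in for the accuracy history information table 123).
true_k0, true_k1 = 0.5, 0.04
x = np.array([100.0, 400.0, 1600.0, 6400.0])
p = true_k0 + true_k1 * np.log(x)          # measured accuracies P

# A(S) = k0 + k1*log(N*S) is linear in log(N*S), so the coefficients can be
# recovered with ordinary linear regression on the logarithm.
k1, k0 = np.polyfit(np.log(x), p, deg=1)

def predict_accuracy(N, S):
    """Prediction value of the accuracy for a candidate (N, S)."""
    return k0 + k1 * np.log(N * S)
```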
The accuracy prediction unit 143 predicts the accuracy in accordance with the regression technique that is used for modeling the accuracy and designated by the user or set in advance.
The learning time calculation unit 13 and the accuracy calculation unit 14 may be examples of a calculation unit that calculates (optimizes) the window size N and the sampling rate S based on the measured data rate, the learning history information, and the present sampling rate S.
The optimization unit 15 optimizes the window size N and the sampling rate S based on the accuracy in accordance with the sampling rate S. Thus, the number of data of the additional data that are accumulated in the input buffer 2b is variably controlled to an appropriate value based on the optimized window size N. The number of sampled data that are output from the sampler 2c to the incremental learning apparatus 2e is variably controlled to an appropriate value based on the optimized sampling rate S.
When the process starts, the input data are received (operation S1) and accumulated in the input buffer 2b. When N pieces of data of the window size are accumulated in the input buffer 2b, N pieces of data are output from the input buffer 2b, and the output data are sampled by the sampler 2c (operation S2).
The learning time calculation unit 13 acquires N×S pieces of output data that are sampled by the sampler 2c and measures the learning time of the incremental learning apparatus 2e (operation S3).
The learning time calculation unit 13 calculates the learning time in accordance with the set regression technique by using the coefficients of the regression equation, which are calculated by the regression processing, based on the learning time prediction model table 122 (operation S4).
The accuracy calculation unit 14 calculates the accuracy in accordance with the set regression technique by using the coefficients of the regression equation based on the accuracy prediction model table 124 (operation S5).
The optimization unit 15 optimizes the window size N and the sampling rate S based on the input rate 111 at this point in time, the learning time prediction model table 122, and the accuracy prediction model table 124 (operation S6). The optimization unit 15 sets the optimized window size N, which is based on the accuracy A, and controls the input buffer 2b (operation S7). The optimization unit 15 sets the optimized sampling rate S and controls the sampler 2c (operation S8). The process returns to operation S2, and operations S2 to S8 are repeated.
In a case where it is determined that the N pieces of data are saved in the input buffer 2b, the incremental learning apparatus 2e acquires data that are output from the input buffer 2b and extracted by the sampler 2c (operation S18) and outputs the acquired data to the learning time measurement unit 131 (operation S20). The input buffer 2b thereafter waits for next input data (operation S16) and repeats operations S10 to S20 when new input data are received (operation S10). The incremental learning is performed at each time when N pieces of data are accumulated in the input buffer 2b. The learning history information that corresponds to the incremental learning is accumulated in the learning history information table 121 by a next learning time measurement process.
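The accumulate-then-sample flow above can be sketched as follows; the class and function names are illustrative assumptions rather than the device's actual implementation:

```python
class InputBuffer:
    """Accumulates input data and releases a batch each time window-size N
    pieces are saved (a simplified stand-in for input buffer 2b)."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.pieces = []

    def push(self, piece):
        """Save one piece; return a full batch of N pieces when available."""
        self.pieces.append(piece)
        if len(self.pieces) >= self.window_size:
            batch = self.pieces[:self.window_size]
            self.pieces = self.pieces[self.window_size:]
            return batch
        return None

def sample(batch, rate):
    """Extract roughly rate * len(batch) pieces (a stand-in for sampler 2c)."""
    step = max(1, round(1 / rate))
    return batch[::step]

buf = InputBuffer(window_size=10)
batches = [b for b in (buf.push(i) for i in range(25)) if b is not None]
print(len(batches))                  # 2: learning is triggered twice
print(sample(batches[0], rate=0.5))  # [0, 2, 4, 6, 8]
```

With 25 input pieces and N = 10, two full batches trigger learning and 5 pieces stay buffered, waiting for the next input data, mirroring the wait in operation S16.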
The learning time modeling unit 132 performs regression processing based on the acquired learning times (operation S52). When the regression processing finishes, the learning time modeling unit 132 records coefficients that are obtained as a result of the regression processing in the learning time prediction model table 122 (operation S54) and finishes the process. A search for the window size N and the sampling rate S is performed. Each time the model of the learning time is updated or the input rate changes largely (by a specific ratio or more), the window size N and the sampling rate S are set again, and the combination of the window size N and the sampling rate S is optimized.
A method of obtaining the optimal combination (N, S) may include using the derivative equation of the learning time function or the hill climbing method. The derivative equation of the learning time function has a short processing time but may not be applicable in all cases due to its low versatility; it is used in a case where polynomial regression is used for modeling the learning time. The hill climbing method has a longer processing time but is applicable to any case because of its high versatility; it is used in a case where non-parametric regression is used for modeling the learning time.
The optimization by using the derivative equation of the learning time function is applied in a case where the combinations of (N, S) that allow the learning speed to be equivalent to the input rate may be used for formulation of a function such as S = F(N), formulation of a derivative function dS(N)/dN of the function S = F(N), and formulation of Nmax(K) where dS(N)/dN = 0. K represents the model obtained by modeling the learning time. For example, in a case of a quadratic polynomial, K = {k0, k1, k2} is obtained.
Nmax(k2, k0) and Smax(k2, k0) are in advance formulated, the coefficients k2 and k0 that are obtained by modeling the learning time during execution are acquired from the learning time prediction model table 122, and Nmax and Smax are directly obtained.
For example, a model equation (A) that is expressed by a function T(N, S) = k2(N·S)^2 + k0 is used for the learning time, and a model equation (B) that is expressed by A(S) = k0 + k1 log(N·S) is used for the accuracy.
For example, the hill climbing method (subgradient method) is used to select the optimal values of the window size N and the sampling rate S. In the optimization of two parameters (window size N and sampling rate S) by using the hill climbing method, the combination with the highest accuracy A(S) is selected from multiple combinations of (N, S) in which a learning speed TS(N, S) is the same as or approximates an input rate R.
For example, as the modeled learning time T, the model equation that is expressed by the function T(N, S) = k2(N·S)^2 + k0 is used. As the modeled accuracy A, the model that is expressed by the function A(S) = k0 + k1 log(N·S) is used.
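A simplified sketch of this hill-climbing search under the two model equations above, with hypothetical coefficient values: it walks along the curve where the learning speed TS(N, S) equals the input rate R and stops when the accuracy improvement falls below a threshold. The step handling is simplified relative to the flowcharts, so this is an illustration of the idea, not the device's exact procedure.

```python
import math

# Hypothetical coefficients for the two models (illustration only):
#   learning time  T(N, S) = k2*(N*S)**2 + k0
#   accuracy       A(N, S) = a0 + a1*log(N*S)
k2, k0 = 1e-4, 0.5
a0, a1 = 0.5, 0.04
R = 100.0                        # input rate [pieces/second]

def T(N, S):
    return k2 * (N * S) ** 2 + k0

def TS(N, S):                    # learning speed = window size / learning time
    return N / T(N, S)

def S_for_rate(N):
    """Sampling rate that makes the learning speed equal the input rate R
    (solves N / T(N, S) = R for S); None when no such rate exists."""
    rhs = (N / R - k0) / k2
    return math.sqrt(rhs) / N if rhs > 0 else None

def accuracy(N, S):
    return a0 + a1 * math.log(N * S)

def hill_climb(N=60.0, step=50.0, min_step=1.0, threshold=1e-4):
    """Search along the TS(N, S) == R curve for the (N, S) with the best
    accuracy; treat an improvement below the threshold as a failed probe."""
    S = S_for_rate(N)
    best = accuracy(N, S)
    direction = 1.0
    while step >= min_step:
        N_next = N + direction * step
        S_next = S_for_rate(N_next) if N_next > 0 else None
        A_next = accuracy(N_next, S_next) if S_next is not None else -math.inf
        if A_next > best + threshold:
            N, S, best = N_next, S_next, A_next   # keep climbing
        else:                                     # passed the best point:
            direction = -direction                # reverse the direction and
            step /= 2.0                           # reduce the step size
    return N, S, best
```

Every candidate returned satisfies the speed constraint by construction, since S is always obtained from `S_for_rate`; only the accuracy drives the search.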
The modeling of the learning time T and the accuracy A is not limited to the above modeling. The learning time T(N, S) may be modeled by using a method of polynomial regression or linear interpolation, for example.
The learning time calculation unit 13 fixes the window size N at this point in time and searches for a sampling rate Snext that allows the learning speed TS(N, S) to be equivalent to the input rate R at this point in time (operation S62).
The learning time calculation unit 13 increases the window size N without changing the sampling rate Snext and searches for the sampling rate S in accordance with the model of the learning time T(N, S), and searches for a window size Nnext (N2 in
The learning time calculation unit 13 fixes the window size N at this point in time and searches for the sampling rate Snext that allows the learning speed TS(N, S) to be equivalent to the input rate R at this point in time (operation S66). As a result, as illustrated in (2) in
The accuracy calculation unit 14 predicts accuracy A(Snext) in accordance with the model of the accuracy A(S), which is in advance defined, based on the sampling rate Snext at this point in time (operation S68). The accuracy calculation unit 14 determines whether the accuracy A is lower than the previous accuracy (operation S70). In a case where the accuracy calculation unit 14 determines that the accuracy A is not lower than the previous accuracy, the accuracy calculation unit 14 determines whether an accuracy improvement (the difference from the previous accuracy) is higher than a threshold (operation S72). In a case where the accuracy calculation unit 14 determines that the accuracy improvement is higher than the threshold, the accuracy calculation unit 14 determines that the accuracy is improved. The process returns to operation S64, and the process of operation S64 and subsequent operations is repeated.
In a case where the accuracy calculation unit 14 determines that the accuracy A is not lower than the previous accuracy (operation S70) and the accuracy improvement is equivalent to or lower than the threshold (operation S72), the accuracy calculation unit 14 determines that a further search probably does not lead to an accuracy improvement and selects the combination of (N, S) at this point in time as the optimal values (operation S78).
In operation S70, in a case where the accuracy calculation unit 14 determines that the accuracy A is lower than the previous accuracy, the accuracy calculation unit 14 determines that the search is in a lower point than the vertex (inflection point) of the model of the learning time T(N, S), reduces the step size, and changes a step direction (the direction of search) to the opposite direction (operation S74). The learning time calculation unit 13 determines whether the step size at this point in time is equivalent to a minimum value that is in advance defined (operation S76). In a case where the learning time calculation unit 13 determines that the step size at this point in time is different from the minimum value, the process returns to operation S64 and repeats operation S64 and subsequent operations.
In a case where the learning time calculation unit 13 determines that the step size at this point in time is equivalent to the minimum value, the learning time calculation unit 13 determines that the optimal values of the window size N and the sampling rate S are obtained and selects the combination of (N, S) at this point in time as the optimal values (operation S78). The rightward search process finishes.
As indicated by “optimal solution” in
The learning time calculation unit 13 reduces the window size N without changing the sampling rate Snext and searches for the sampling rate S in accordance with the model of the learning time T(N, S), and searches for the window size Nnext (operation S80). Accordingly, the step size of the leftward search is defined.
The learning time calculation unit 13 fixes the window size N at this point in time and searches for the sampling rate Snext that allows the learning speed TS(N, S) to be equivalent to the input rate R at this point in time (operation S82). As a result, as illustrated in (2) in
The learning time calculation unit 13 predicts the accuracy A(Snext) in accordance with the model of the accuracy A(S), which is in advance defined, based on the sampling rate Snext at this point in time (operation S84). The accuracy calculation unit 14 determines whether the accuracy A is lower than the previous accuracy (operation S86). In a case where the accuracy calculation unit 14 determines that the accuracy A is not lower than the previous accuracy, the accuracy calculation unit 14 fixes the window size N at this point in time and searches for the sampling rate Snext that allows the learning speed TS(N, S) to be equivalent to the input rate R at this point in time (operation S88). In a case where the sampling rate Snext is found as a result (operation S90: “Yes”), the process returns to operation S80 and repeats operation S80 and subsequent operations.
In a case where the sampling rate Snext is not found in operation S88 (operation S90: “No”) or a determination is made that the accuracy A is lower than the previous accuracy in operation S86, the accuracy calculation unit 14 reduces the step size and changes the step direction (the direction of search) to the opposite direction (operation S92).
The learning time calculation unit 13 determines whether the step size at this point in time is equivalent to the minimum value that is in advance defined (operation S94). In a case where the learning time calculation unit 13 determines that the step size at this point in time is different from the minimum value, the process returns to operation S80, and the process of operation S80 and subsequent operations is repeated.
In a case where the learning time calculation unit 13 determines that the step size at this point in time is equivalent to the minimum value, the learning time calculation unit 13 determines that the optimal values of the window size N and the sampling rate S are obtained and selects the combination of (N, S) at this point in time as the optimal values (operation S96). The leftward search process finishes.
As indicated by “optimal solution” in
In a case where the learning time is in a form such as T(N)=log(N), only a slight accuracy improvement may be achieved regardless of how much the sampling rate S is increased. On the other hand, in the above search method, as described about operation S72, in a case where the probable accuracy improvement is smaller than a specific threshold, continuation of the process by further increasing the sampling rate S is avoided, and the window size N and sampling rate S at this point in time are set as the optimal values.
In general, the higher the sampling rate S becomes, the more the accuracy A(S) increases. However, depending on the circumstances, it is not preferable to unconditionally make the sampling rate S higher. For example, in a case where the learning time is a function such as T(N) = log(N), the learning speed TS(N, S) remains equivalent to the input rate R only if N is increased in response to the increase in S. Because the learning time for one difference also increases, the freshness of the model may be lost.
As for an accuracy function such as A(S) = log(S), the more the sampling rate S increases, the smaller the improvement in the accuracy A(S) becomes. Thus, in a case where the sampling rate S exceeds a specific value, increasing it further may have little effect. Selection of (N, S) from which an improvement in the accuracy A(S) is expected, that is, selection of the optimal values of the window size N and the sampling rate S, may therefore be performed.
When the process illustrated in
The optimization unit 15 may control the incremental learning device 2 by using either one of two optimal values of the window size N and the sampling rate S, for example, either one of the optimal values M1 and M2 in
The optimization unit 15 changes the window size N of the input buffer 2b to the set window size Nmax (operation S102) and increases or reduces the data amount that is retained in the input buffer 2b.
The optimization unit 15 changes the sampling rate S of the sampler 2c to the set sampling rate Smax (operation S104) and increases or reduces the data amount that the sampler 2c samples from the data output from the input buffer 2b.
In the incremental learning management device 1, the combination (N, S) of the optimal window size N and sampling rate S is selected based on prediction of the input rate R and the learning time T. For example, the processing speed of the incremental learning is increased by reducing the sampling rate S, the window size N is also changed, and the combination with the highest accuracy A of learning is thereby found.
In the incremental learning, the window size N and the sampling rate S are variably set within the restriction range of the learning time T in accordance with the input rate R, for example, the range where the learning time T does not exceed the accumulation time of input data into the input buffer 2b.
Learning time: T(N, S) = k2(M + S·N)^2 + k0 (M is a model size)
k2 and k0 are calculated from the polynomial equation in the following form during execution.
T(N′) = k2(N′)^2 + k0, where N′ = M + S×N
Learning speed: TS(N, S) = N/T(N, S)
During execution, for example, at each time when the input rate R changes, the combination (Nmax, Smax) of the optimal window size and sampling rate that maximizes the accuracy among the combinations of (N, S) that allow the learning speed TS(N, S) to be equivalent to the input rate R is obtained as follows.
A function S(N, k2, k0) is extracted in advance (before execution) from the conditions where the learning speed TS(N, S) becomes equivalent to the input rate R. The derivative function S′(N, k2, k0) = dS(N, k2, k0)/dN of S(N, k2, k0) is obtained. The function Nmax(k2, k0) that allows S′(N, k2, k0) to become zero is obtained in advance (before execution), and thereby the optimal value Nmax of N is simply calculated by using k2 and k0 that are decided by the regression processing during execution.
For example, the function of S(N, k2, k0) is extracted in advance (before execution) from the conditions where learning speed TS(N, S)=input rate R, as follows. For example, equation (3) may be obtained based on the following equation (1) and equation (2). The function of S(N, k2, k0) expressed by equation (4) may be obtained based on equation (3).
The following equation (5) may be obtained by calculating the derivative function S′(N, k2, k0) = dS(N, k2, k0)/dN of S(N). As expressed by equation (6), the condition that the derivative function S′(N, k2, k0) becomes zero is set based on equation (5). Accordingly, as expressed by equation (7), the function Nmax(k2, k0) that allows the derivative function S′(N, k2, k0) to become zero may be obtained in advance (before execution). The optimal value Nmax(k2, k0) of the window size N is calculated based on k2 and k0 that are decided by the regression processing during execution.
The optimal value Nmax(k2, k0) calculated based on equation (7) is substituted into equation (8), and the optimal value Smax(k2, k0) of the sampling rate S is thereby calculated.
Smax(k2, k0) = S(Nmax(k2, k0))  (8)
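Under the simplified model T(N, S) = k2(N·S)^2 + k0 (ignoring the model-size term M), the constraint TS(N, S) = R gives S(N) = sqrt((N/R − k0)/k2)/N, and setting dS/dN = 0 yields the closed forms Nmax = 2·k0·R and Smax = 1/(2R·sqrt(k0·k2)). The sketch below, with hypothetical coefficient values, checks this algebra numerically; it illustrates the derivation in equations (1) to (8) rather than reproducing the patent's exact formulas.

```python
import math

# Assumed (hypothetical) model coefficients and input rate for illustration.
k2, k0 = 1e-4, 0.5
R = 100.0

# From learning speed = input rate:  N / (k2*(N*S)**2 + k0) = R
#   =>  S(N) = sqrt((N/R - k0) / k2) / N        (cf. equation (4))
def S_of_N(N):
    return math.sqrt((N / R - k0) / k2) / N

# Setting dS/dN = 0 gives the closed forms (cf. equations (7) and (8)):
Nmax = 2 * k0 * R                       # = 100.0 here
Smax = 1 / (2 * R * math.sqrt(k0 * k2))

# Numerical check: S(N) attains its maximum Smax at N = Nmax.
assert math.isclose(S_of_N(Nmax), Smax, rel_tol=1e-9)
assert S_of_N(Nmax - 1) < Smax and S_of_N(Nmax + 1) < Smax
```

At (Nmax, Smax) the learning speed equals the input rate exactly, so this combination lies on the constraint curve while maximizing the sampling rate.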
The input device 101 includes a keyboard, a mouse, and so forth and is used to input operating signals to the incremental learning management device 1. The display device 102 includes a display and so forth and displays various kinds of processing results.
The communication I/F 107 is an interface that couples the incremental learning management device 1 with the network. The incremental learning management device 1 thereby performs data communication with other apparatuses via the communication I/F 107.
The HDD 108 is a non-volatile storage device that stores programs and data. The stored programs and data may include basic software that controls the whole device and application software. For example, the HDD 108 stores various kinds of DB information, programs, and so forth.
The external I/F 103 is an interface with external devices. The external devices may include a recording medium 103a and so forth. The incremental learning management device 1 performs readout from and/or writing in the recording medium 103a via the external I/F 103. The recording medium 103a may include compact discs (CD), digital versatile discs (DVD), SD memory cards, universal serial bus (USB) memories, and so forth.
The ROM 105 is a non-volatile semiconductor memory (storage device) that is capable of retaining internal data even if the ROM 105 is powered off. The ROM 105 stores programs and data about network settings and so forth. The RAM 104 is a volatile semiconductor memory (storage device) that temporarily retains programs and data. The CPU 106 may be a computing device that reads out programs and data from the storage devices, for example, the HDD 108 and the ROM 105, to the RAM 104, executes processing, and thereby realizes overall control of the device and the installed functions.
The incremental learning management device 1 manages the incremental learning device 2 by using the hardware configuration. For example, the CPU 106 executes an optimization process of window size/sampling rate (N/S) by using the data and programs that are stored in the ROM 105 and the HDD 108. Thus, the window size and the sampling rate are variably set by the incremental learning device 2 within the restriction range of the learning time in accordance with the input rate, and a learning result with high learning accuracy may thereby be obtained. Information about the learning history information table 121, the learning time prediction model table 122, the accuracy history information table 123, and the accuracy prediction model table 124 may be stored in the RAM 104, the HDD 108, or a cloud server or the like that is coupled with the incremental learning management device 1 via the network.
The functions of the incremental learning management device may be configured with hardware, software, or a combination of hardware and software.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiment of the present invention has been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2015-000885 | Jan 2015 | JP | national |