The present invention relates to a learning apparatus, a learning method, and a learning program.
Machine learning is applied to problems such as discrimination, regression, and clustering: the parameters of a model are learned from observation data so as to lower an error function, and the learned model is then used to make predictions on unknown data. In other words, models are created from past observation data to predict future data. Such models need to have a small deviation (error) between predicted data and measured data, and machine learning is furthermore expected to create models with small errors in a short time.
Among the existing algorithms for learning the parameters of a model, the stochastic gradient descent method is established as a general-purpose learning algorithm. The stochastic gradient descent method iteratively selects learning data at random, calculates the error function, and corrects a parameter in the gradient direction that decreases the error function. Recently, various learning algorithms based on the stochastic gradient descent method have been proposed to achieve efficient learning. Here, "efficient" means that the error function can be lowered with fewer parameter updates than in the conventional stochastic gradient descent method.
For example, an algorithm referred to as AdaGrad has been proposed that achieves efficient learning by automatically adjusting the learning rate in the stochastic gradient descent method (refer to Non Patent Document 1, for example). The learning rate is a hyperparameter that controls the update amount of a parameter during model learning, and its setting determines how quickly the error function can be minimized.
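As a point of comparison, the following is a minimal sketch of one AdaGrad step in its common diagonal form, in which each parameter's learning rate is divided by the square root of the accumulated sum of its squared past gradients. The function name `adagrad_step` and the default values of α and ε are illustrative assumptions, not values taken from this document. Because the gradients are squared, their sign (direction) plays no role in the adjustment.

```python
import numpy as np


def adagrad_step(theta, grad, accum, alpha=0.01, eps=1e-8):
    """One diagonal AdaGrad update (illustrative sketch, not from the source text).

    accum holds, per parameter, the running sum of squared past gradients, so the
    effective learning rate alpha / (sqrt(accum) + eps) shrinks as training proceeds.
    """
    accum = accum + grad ** 2                              # squaring discards the sign
    theta = theta - alpha * grad / (np.sqrt(accum) + eps)
    return theta, accum


theta, accum = adagrad_step(np.array([1.0, -2.0]), np.array([0.5, -4.0]), np.zeros(2))
print(theta)  # on the first step each coordinate moves by about alpha = 0.01
```

On the first step both coordinates move by roughly α even though the two gradients differ by a factor of eight, which shows how strongly the accumulated magnitude normalizes the update.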
In addition, an algorithm referred to as RMSProp applies automatic adjustment of the learning rate to the learning of complex models such as those used in deep learning. Other proposed algorithms include AdaDelta (refer to Non Patent Document 2, for example), which converges faster than AdaGrad and readily reaches a local optimal solution, and an efficient learning algorithm referred to as Adam (refer to Non Patent Document 3, for example). Experiments have indicated that Adam is the most efficient among these algorithms that automatically adjust learning rates.
AdaGrad, RMSProp, AdaDelta, and Adam described above automatically adjust the learning rate by dividing it by a moving average of the absolute values of past first-order gradients. Here, the first-order gradient refers to the derivative of the error function with respect to the parameters.
The first-order gradient is information that defines the direction of a parameter update, so information indicating its direction can be expected to be important in adjusting the learning rate. However, because AdaGrad, RMSProp, AdaDelta, and Adam use the absolute value of the first-order gradient, information related to its direction is lost in the learning rate, which presumably limits the efficiency of learning.
The present invention has been made in view of the above, and aims to provide a learning apparatus, a learning method, and a learning program capable of achieving efficient learning.
To solve the above problem and attain the object, a learning apparatus according to the present invention performs learning using a stochastic gradient descent method in machine learning, and includes: a gradient calculation unit that calculates a first-order gradient in the stochastic gradient descent method; a statistic calculation unit that calculates a statistic of the first-order gradient; an initialization bias removing unit that removes, from the statistic of the first-order gradient calculated by the statistic calculation unit, an initialization bias introduced when the statistic calculation unit calculates that statistic; a learning rate adjustment unit that adjusts a learning rate by dividing the learning rate by a standard deviation of the first-order gradient based on the statistic of the first-order gradient; and a parameter updating unit that updates a parameter of a learning model using the learning rate adjusted by the learning rate adjustment unit.
According to the present invention, it is possible to achieve efficient learning.
Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings. Note that the present invention is not limited by the present embodiment. Furthermore, the same portions are denoted by the same reference numerals in the description of the drawings.
The main symbols used in the embodiments are listed in the table below. The same symbols are used hereinafter in the mathematical background of the conventional technology, the mathematical background of the embodiments, and the individual descriptions of the embodiments.
[Mathematical Background of Conventional Technology]
First, the background knowledge underlying the following description is explained. Machine learning is basically a technique of learning a model from observation data so as to minimize an error function for the problem to be solved, and of making predictions on unknown data using the learned model. Examples of such problems include data classification, regression, and clustering. Examples of the error function include the squared error and cross entropy. Examples of models include logistic regression and neural networks.
Here, when the error function is f(⋅) and a parameter of the learning model is θ, learning is the problem of finding θ that minimizes f(θ). The stochastic gradient descent method is widely used among the various learning algorithms. In the stochastic gradient descent method, learning is performed by repeatedly applying the following Formula (1).
$$\theta_{i,t} = \theta_{i,t-1} - \alpha \nabla f(\theta_{i,t-1}; x_{t-1}) \qquad (1)$$
α is a hyperparameter set manually to define the update range of the parameter and is referred to as the learning rate. Because the learning rate defines the update range, it greatly affects the efficiency of learning: with an appropriate learning rate, learning can proceed with high efficiency. In recent years, research on achieving high efficiency by automatically adjusting the learning rate based on various types of information has been ongoing. Here, high efficiency means that the error function can be lowered with fewer parameter updates than in the conventional stochastic gradient descent method.
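Formula (1) can be sketched as the loop below. It assumes a squared-error linear model f(θ; x) = ½(x·θ − y)² on synthetic data; the model, the function name `sgd`, and the default α and step count are illustrative assumptions. Each step randomly selects one observation, computes the first-order gradient at the current parameters, and moves θ against it by α times that gradient.

```python
import numpy as np


def sgd(X, y, alpha=0.05, num_steps=2000, seed=0):
    """Plain SGD per Formula (1) on a squared-error linear model (illustrative sketch).

    Each step draws one observation at random and moves theta against the gradient
    of f(theta; x_t) = 0.5 * (x_t . theta - y_t) ** 2.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(num_steps):
        t = rng.integers(len(y))          # randomly select one learning datum
        residual = X[t] @ theta - y[t]
        grad = residual * X[t]            # first-order gradient of 0.5 * residual**2
        theta = theta - alpha * grad      # Formula (1)
    return theta


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.0])             # noise-free targets from a known parameter
theta = sgd(X, y)
```

With a fixed α, the whole efficiency question lies in choosing that one number; the adaptive methods discussed next replace it with a per-parameter, data-dependent value.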
For example, the learning algorithm referred to as Adam automatically adjusts the learning rate by dividing it by the moving average of the absolute value of past first-order gradients. The first-order gradient is the derivative of the error function with respect to the parameters and includes information that defines the direction of a parameter update. However, because Adam uses the absolute value of the first-order gradient for the learning rate, information on the direction of the first-order gradient is lost, and the efficiency of learning is expected to be limited as well.
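For contrast, here is a minimal sketch of one step of Adam as published by Kingma and Ba. Its denominator is the square root of a moving average of squared gradients, a magnitude that, like the absolute value discussed above, carries no sign information. The function name `adam_step` and the defaults (the commonly used α = 0.001, β1 = 0.9, β2 = 0.999, ε = 10⁻⁸) are illustrative.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (illustrative sketch); t is the 1-based repetition count."""
    m = beta1 * m + (1 - beta1) * grad          # moving average of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2     # moving average of squared gradient: sign lost
    m_hat = m / (1 - beta1 ** t)                # remove initialization bias
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


theta, m, v = adam_step(np.ones(2), np.array([0.3, -5.0]), np.zeros(2), np.zeros(2), t=1)
_, _, v_flipped = adam_step(np.ones(2), np.array([-0.3, 5.0]), np.zeros(2), np.zeros(2), t=1)
print(np.array_equal(v, v_flipped))  # True: flipping every gradient sign leaves v unchanged
```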
[Mathematical Background of Embodiment]
The present embodiment automatically adjusts the learning rate based on information indicating the direction of the gradient in the stochastic gradient descent method. The present embodiment repeats application of the following series of Formulas (2) to (7) instead of Formula (1), thereby implementing adjustment of the learning rate based on the information indicating the direction of the gradient. In the present embodiment, a repetitive calculation count is denoted by t.
First, in the present embodiment, each of the variables used in Formulas (2) to (7) described below is initialized, and the hyperparameters α, β1, and β2 are set to empirically obtained standard values. β1 and β2 are weights used in calculating the statistics of the first-order gradient in the stochastic gradient descent method: β1 is the weight for calculating an approximation of the moving average of the first-order gradient, and β2 is the weight for calculating the moving average of the variance of the first-order gradient. Subsequently, the present embodiment executes the calculation of the following Formula (2), which denotes the first-order gradient of the i-th parameter in the (t−1)-th repetition by the symbol gi,t.
$$g_{i,t} = \nabla f(\theta_{i,t-1}; x_{t-1}) \qquad (2)$$
In addition, in the present embodiment, an approximate value of the moving average of the i-th first-order gradient gi,t in the t-th repetition is obtained using the following Formula (3).
$$m_{i,t} = \beta_1 m_{i,t-1} + (1 - \beta_1)(g_{i,t} - m_{i,t-1}) \qquad (3)$$
The value mi,t in Formula (3) approximates the moving average of the first-order gradient gi,t over past repetitions and is a statistic of the first-order gradient gi,t.
Subsequently, the present embodiment applies Formula (4) to the approximate value mi,t of the moving average of the first-order gradient gi,t to remove the initialization bias from it.
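Formula (3) is applied element-wise over the parameter index i, so when all parameters are stored in one array it becomes a single vectorized expression. The minimal NumPy sketch below assumes this layout; the name `approx_moving_average` and the default β1 = 0.9 are placeholders, since the document gives no concrete standard value.

```python
import numpy as np


def approx_moving_average(m_prev, g, beta1=0.9):
    """Formula (3) for every parameter i at once: m_{i,t} from m_{i,t-1} and g_{i,t}."""
    return beta1 * m_prev + (1 - beta1) * (g - m_prev)


m_prev = np.array([0.0, 1.0])          # m_{i,t-1} for two parameters
g = np.array([2.0, 2.0])               # g_{i,t} from Formula (2)
m = approx_moving_average(m_prev, g)   # approximately [0.2, 1.0]
```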
In addition, the present embodiment uses the following Formula (5) to obtain a moving average of the variance of the i-th first-order gradient gi,t in the t-th repetition.
$$c_{i,t} = \beta_2 c_{i,t-1} + \beta_2(1 - \beta_2)(g_{i,t} - m_{i,t-1})^2 \qquad (5)$$
The value ci,t in Formula (5) is the moving average of the variance of the i-th first-order gradient gi,t over past repetitions and is a statistic of the first-order gradient gi,t. Because ci,t is determined by how the direction of the first-order gradient gi,t has been dispersed over past repetitions, it includes information indicating the direction of the first-order gradient gi,t.
Subsequently, the present embodiment applies Formula (6) to the moving average ci,t of the variance of the first-order gradient gi,t to remove the initialization bias from it.
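The claim that ci,t carries direction information can be checked numerically. The sketch below runs Formulas (3) and (5) on two scalar gradient sequences with identical magnitudes, one that keeps its sign and one that alternates. It assumes zero initial statistics and uses β1 = β2 = 0.9 only so that the averages settle within a few dozen steps; the function names are hypothetical.

```python
def variance_moving_average(c_prev, g, m_prev, beta2=0.9):
    """Formula (5): update the moving average c of the gradient's variance."""
    return beta2 * c_prev + beta2 * (1 - beta2) * (g - m_prev) ** 2


def final_variance(grads, beta1=0.9, beta2=0.9):
    """Run Formulas (3) and (5) over a scalar gradient sequence; return the last c.

    Both statistics start at 0 (assumed initialization); Formula (5) uses the
    value of m from the previous repetition, so c is updated before m.
    """
    m, c = 0.0, 0.0
    for g in grads:
        c = variance_moving_average(c, g, m, beta2)   # uses m_{t-1}
        m = beta1 * m + (1 - beta1) * (g - m)         # Formula (3)
    return c


same_sign = final_variance([1.0] * 50)
alternating = final_variance([1.0, -1.0] * 25)
print(same_sign, alternating)
```

Every gradient has magnitude 1 in both sequences, so a moving average of absolute or squared gradients would be identical for the two; c, by contrast, ends up several times larger for the alternating sequence, and dividing by its square root therefore shrinks the step when the gradient direction keeps changing.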
In addition, the present embodiment uses the following Formula (7) to adjust the learning rate.
where ϵ is a small constant for numerical stability, for example, 10⁻⁸.
The present embodiment repeats the calculation of Formulas (2) to (7) until the parameter θt of the learning model converges. As illustrated in Formula (7), the learning rate is adjusted automatically by dividing it by the square root of the bias-removed moving average ci,t of the variance of the first-order gradient gi,t, that is, by the standard deviation of the first-order gradient. This variance is determined by how the direction of the first-order gradient has been dispersed over past repetitions.
Therefore, the present embodiment can adjust the learning rate based on information on the direction of the first-order gradient, which makes it possible to lower the error function. That is, according to the present embodiment, efficient learning can be achieved.
A learning apparatus or the like according to the present embodiment will be described based on the mathematical background of the embodiment described above. Note that the following embodiments are given as an example.
[Configuration of Learning Apparatus]
The gradient calculation unit 11 calculates the first-order gradient in the stochastic gradient descent method. Specifically, the gradient calculation unit 11 takes as inputs the parameter θt updated by the parameter updating unit 15 and input data xt from an external apparatus, calculates the first-order gradient gt for the repetitive calculation count t, and outputs the calculation result to the statistic calculation unit 12.
First, the gradient calculation unit 11 initializes each of the variables: it sets the repetitive calculation count to t=0, sets the approximate value mt of the moving average of the first-order gradient gt to an initial value m0, and sets the moving average ct of the variance of the first-order gradient gt to an initial value c0. Initial values are likewise set for mt and ct after removal of the initialization bias. This initialization is performed only once, at the start.
Then, the gradient calculation unit 11 receives the input data xt and the parameter θt and increments t by 1. With this increment, the bias-removed approximate value mt of the moving average of the first-order gradient and the bias-removed moving average ct of the variance of the first-order gradient, described below, become mt-1 and ct-1, respectively. Accordingly, immediately after the variables are initialized, the increment gives t=1, and the bias-removed values set at initialization serve as mt-1 and ct-1.
Then, the gradient calculation unit 11 uses Formula (2) to calculate the first-order gradient gt and outputs the result to the statistic calculation unit 12.
The statistic calculation unit 12 calculates the statistics of the first-order gradient. Specifically, the statistic calculation unit 12 takes as inputs the first-order gradient gt output from the gradient calculation unit 11 and the standard values of the hyperparameters α, β1, and β2, and calculates, as statistics, the approximate value mt of the moving average of the first-order gradient gt using Formula (3) and the moving average ct of the variance of the first-order gradient gt using Formula (5). The statistic calculation unit 12 outputs mt and ct to the initialization bias removing unit 13.
The initialization bias removing unit 13 removes the initialization bias from the statistics of the first-order gradient calculated by the statistic calculation unit 12. Specifically, the initialization bias removing unit 13 applies Formula (4) to the approximate value mt of the moving average of the first-order gradient gt and Formula (6) to the moving average ct of the variance of the first-order gradient gt. The bias-removal calculation described in Non Patent Document 3 may be used, for example.
The learning rate adjustment unit 14 adjusts the learning rate by dividing it by the standard deviation of the first-order gradient obtained from the statistics of the first-order gradient. Specifically, the learning rate adjustment unit 14 uses Formula (7) to adjust the learning rate based on the approximate value mt of the moving average of the first-order gradient gt and the moving average ct of the variance of the first-order gradient gt, from each of which the initialization bias removing unit 13 has removed the initialization bias; the learning rate is divided by the standard deviation obtained from these bias-removed statistics.
The parameter updating unit 15 updates the parameters of the learning model using the learning rate adjusted by the learning rate adjustment unit 14. Specifically, the parameter updating unit 15 updates the model parameter θt based on the calculation result of the learning rate adjustment unit 14. If the parameter θt has converged, the parameter updating unit 15 finishes the calculation processing. Otherwise, the parameter θt is output to the gradient calculation unit 11, which increments t by 1. Then, the gradient calculation unit 11, the statistic calculation unit 12, the initialization bias removing unit 13, and the learning rate adjustment unit 14 repeat the calculation of Formulas (2) to (7).
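The division of labour among units 11 to 15 can be sketched as a small class with one method per unit, where `learn` runs Steps S3 to S10 of the learning processing. This is a hypothetical software mapping, not the apparatus itself. Because Formulas (4), (6), and (7) are not reproduced in this text, the sketch assumes Adam-style bias removal (division by 1 − β^t, in line with the note that the calculation of Non Patent Document 3 may be used) and an update of α·m̂/(√ĉ + ε). It further assumes zero initial statistics, keeps the uncorrected statistics as its running state as Adam does, and treats "converged" as every per-parameter update falling below a tolerance. `grad_fn`, `data`, and the defaults are hypothetical.

```python
import numpy as np


class LearningApparatus:
    """Hypothetical software mapping of units 11-15 (illustrative sketch).

    Assumed, not given in the text: bias removal divides by 1 - beta**t, the update
    is alpha * m_hat / (sqrt(c_hat) + eps), statistics start at zero, and
    convergence means every per-parameter update is smaller than tol.
    """

    def __init__(self, grad_fn, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.grad_fn = grad_fn  # hypothetical callable: grad_fn(theta, x) -> gradient
        self.alpha, self.beta1, self.beta2, self.eps = alpha, beta1, beta2, eps

    def gradient_calculation(self, theta, x):          # unit 11, Formula (2)
        return self.grad_fn(theta, x)

    def statistic_calculation(self, g, m, c):          # unit 12, Formulas (3) and (5)
        c = self.beta2 * c + self.beta2 * (1 - self.beta2) * (g - m) ** 2  # uses m_{t-1}
        m = self.beta1 * m + (1 - self.beta1) * (g - m)
        return m, c

    def initialization_bias_removal(self, m, c, t):    # unit 13, assumed (4) and (6)
        return m / (1 - self.beta1 ** t), c / (1 - self.beta2 ** t)

    def learning_rate_adjustment(self, m_hat, c_hat):  # unit 14, assumed second term of (7)
        return self.alpha * m_hat / (np.sqrt(c_hat) + self.eps)

    def parameter_update(self, theta, step):           # unit 15
        return theta - step

    def learn(self, theta0, data, tol=1e-6, max_iters=10_000):
        """Steps S3-S10; data(t) is a hypothetical source of the input x_t."""
        theta = np.asarray(theta0, dtype=float)
        m = np.zeros_like(theta)   # uncorrected statistics kept as state
        c = np.zeros_like(theta)
        t = 0
        while t < max_iters:
            t += 1                                                    # Step S3
            g = self.gradient_calculation(theta, data(t))             # Step S4
            m, c = self.statistic_calculation(g, m, c)                # Steps S5, S6
            m_hat, c_hat = self.initialization_bias_removal(m, c, t)  # Step S7
            step = self.learning_rate_adjustment(m_hat, c_hat)        # Step S8
            theta = self.parameter_update(theta, step)                # Step S9
            if np.max(np.abs(step)) < tol:                            # Step S10
                break
        return theta, t


# Usage: f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
app = LearningApparatus(grad_fn=lambda theta, x: theta, alpha=0.01)
theta, t = app.learn(np.array([1.0, -2.0]), data=lambda t: None)
```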
[Learning Processing]
Next, learning processing executed by the learning apparatus 10 will be described.
Then, the gradient calculation unit 11 receives the input data xt and the parameter θt and increments t by 1 (Step S3). Subsequently, the gradient calculation unit 11 uses Formula (2) to calculate the first-order gradient gt (Step S4) and outputs the result to the statistic calculation unit 12.
Then, the statistic calculation unit 12 takes the first-order gradient gt output from the gradient calculation unit 11 and the standard values of hyperparameters α, β1, and β2 as inputs, and uses Formula (3) to calculate the approximate value mt of the moving average of the first-order gradient gt (Step S5). In addition, the statistic calculation unit 12 uses Formula (5) to calculate the moving average ct of the variance of the first-order gradient gt (Step S6).
Then, the initialization bias removing unit 13 removes the initialization bias from the approximate value mt of the moving average of the first-order gradient gt and the moving average ct of the variance of the first-order gradient gt calculated by the statistic calculation unit 12 (Step S7): it applies Formula (4) to mt and Formula (6) to ct.
Subsequently, the learning rate adjustment unit 14 adjusts the learning rate (Step S8) using the second term of Formula (7), based on the approximate value mt of the moving average of the first-order gradient gt and the moving average ct of the variance of the first-order gradient gt, from each of which the initialization bias removing unit 13 has removed the initialization bias. In Formula (7), the learning rate is adjusted by multiplying it by the approximate value of the moving average of the first-order gradient divided by the standard deviation of the first-order gradient, which is the square root of the moving average of the variance of the first-order gradient.
Then, the parameter updating unit 15 updates the parameter θt of the model based on the calculation result of Step S8 (Step S9) and determines whether the parameter θt has converged (Step S10). If the parameter updating unit 15 determines that the parameter θt has converged (Step S10: Yes), the learning apparatus 10 finishes the processing. Otherwise (Step S10: No), the learning apparatus 10 returns to Step S3; that is, the gradient calculation unit 11 increments t by 1 and executes the processing from Step S4 onward again.
The learning processing described above adjusts the learning rate by dividing it by the standard deviation of the first-order gradient, which includes information defining the direction of parameter update. Therefore, the learning processing described above makes it possible to achieve efficient learning.
[Learning Algorithm]
Next, a learning algorithm used by the learning apparatus 10 will be described.
First, the learning algorithm takes α, β1, β2, and θ0 as inputs. This corresponds to Step S1 illustrated in
The learning algorithm increments t by 1 (third line in
The learning algorithm uses Formula (3) to calculate the approximate value mt of the moving average of the first-order gradient gt (fifth line in
Then, the learning algorithm uses Formula (4) onto the approximate value mt of the moving average of the first-order gradient gt to remove the initialization bias (seventh line in
The learning algorithm uses Formula (7) to adjust the learning rate based on the approximate value mt of the moving average of the first-order gradient gt and the moving average ct of the variance of the first-order gradient gt from each of which the initialization bias has been removed, and updates the parameter θt (ninth line in
The learning algorithm repeats the processing from the second line to the seventh line in
In the present embodiment, the learning rate is adjusted by dividing the learning rate by the standard deviation of the first-order gradient instead of by the absolute value of the first-order gradient in the stochastic gradient descent method, making it possible to execute more efficient learning than in conventional methods.
Specifically, experiments showed that, because the present embodiment adjusts the learning rate by dividing it by the standard deviation of the first-order gradient, each increment of the repetitive calculation count t yields a greater decrease in error than conventional Adam (refer to Non Patent Document 3, for example). That is, according to the present embodiment, the parameter θt can converge in fewer repetitions than with conventional Adam, so more efficient learning can be achieved than with conventional Adam.
Furthermore, because the present embodiment adjusts the learning rate using the standard deviation of the first-order gradient, which includes information defining the direction of parameter update, the learned model attains a smaller error function value than with Adam, and experimental results with high accuracy were obtained.
In addition, the present embodiment does not need the learning rate attenuation schedule required in conventional learning (e.g., AdaGrad (refer to Non Patent Document 1, for example)), so manual tuning of such a schedule is also unnecessary, which reduces the tuning cost.
Here, some conventional algorithms require manual tuning of a gradient clipping threshold to avoid learning failures caused by extremely large parameter updates when the gradient becomes extremely large. That is, conventionally, when the gradient becomes extremely large and exceeds the threshold, the calculation uses the threshold instead of the actual gradient value to reduce learning failures, and this threshold had to be tuned manually.
In contrast, the present embodiment divides the learning rate by the standard deviation of the first-order gradient in Formula (7), the arithmetic expression that yields the parameter θt. When the gradient becomes extremely large, the variance of the gradient also increases accordingly. Therefore, even when the gradient term in the numerator of Formula (7) becomes extremely large, the variance term in the denominator also increases, so the update amount of the parameter θt does not become extremely large, and the likelihood of a learning failure is low. For this reason, the calculation can proceed without a gradient clipping threshold, manual tuning of that threshold becomes unnecessary, and the tuning cost is reduced.
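The argument above can be illustrated with a short calculation: feed 49 ordinary gradients followed by one gradient a thousand times larger, and record the size of every update. The sketch uses Formulas (3) and (5) as written; because Formulas (4), (6), and (7) are not reproduced in this text, it assumes Adam-style bias removal (division by 1 − β^t) and an update of α·m̂/(√ĉ + ε). The function name and default values are illustrative.

```python
import math


def update_sizes(grads, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """|theta_t - theta_{t-1}| at each repetition for a scalar gradient stream.

    Formulas (3) and (5) as written; bias removal and the update rule are the
    assumed forms m / (1 - beta1**t), c / (1 - beta2**t), alpha * m_hat / (sqrt(c_hat) + eps).
    """
    m, c, sizes = 0.0, 0.0, []
    for t, g in enumerate(grads, start=1):
        c = beta2 * c + beta2 * (1 - beta2) * (g - m) ** 2   # Formula (5), uses m_{t-1}
        m = beta1 * m + (1 - beta1) * (g - m)                # Formula (3)
        m_hat = m / (1 - beta1 ** t)                         # assumed Formula (4)
        c_hat = c / (1 - beta2 ** t)                         # assumed Formula (6)
        sizes.append(abs(alpha * m_hat / (math.sqrt(c_hat) + eps)))  # assumed Formula (7)
    return sizes


sizes = update_sizes([1.0] * 49 + [1000.0])
print(max(sizes))   # stays below 2 * alpha = 0.002 in this run, spike included
```

Plain SGD under Formula (1) with the same α would move the parameter by α × 1000 = 1.0 on the spike step. In this sketch the spike raises the numerator and the standard deviation together, so the step remains on the order of α without any clipping threshold.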
[Modification]
A modification according to the present embodiment will be described. Also in the modification, the learning rate is automatically adjusted based on information indicating the direction of the gradient in the stochastic gradient descent method. The present modification repeats application of the following series of Formulas (8) to (12) instead of Formulas (2) to (7), thereby implementing adjustment of the learning rate based on the information indicating the direction of the gradient. Also in this modification, the repetitive calculation count is denoted by t.
First, in this modification, each of the variables used in Formulas (8) to (12) described below is initialized, and the hyperparameters α and β1 are set to empirically obtained standard values. β1 is the weight used in calculating the statistics of the first-order gradient in the stochastic gradient descent method, namely the moving average of the first-order gradient and the moving average of the variance of the first-order gradient. Subsequently, in this modification, the calculation of the following Formula (8) is executed, which denotes the first-order gradient of the i-th parameter in the (t−1)-th repetition by the symbol gi,t.
$$g_{i,t} = \nabla f(\theta_{i,t-1}; x_{t-1}) \qquad (8)$$
In this modification, a moving average of the i-th first-order gradient gi,t in the t-th repetition is obtained using the following Formula (9).
$$m_{i,t} = \beta_1 m_{i,t-1} + (1 - \beta_1) g_{i,t} \qquad (9)$$
The value mi,t in Formula (9) is the moving average of the first-order gradient gi,t over past repetitions and is a statistic of the first-order gradient gi,t.
In this modification, the moving average of the variance of the i-th first-order gradient gi,t in the t-th repetition is obtained using the following Formula (10).
$$c_{i,t} = \beta_1 c_{i,t-1} + \beta_1(1 - \beta_1)(g_{i,t} - m_{i,t-1})^2 \qquad (10)$$
The value ci,t in Formula (10) is the moving average of the variance of the i-th first-order gradient gi,t over past repetitions and is a statistic of the first-order gradient gi,t. Because ci,t is determined by how the direction of the first-order gradient gi,t has been dispersed over past repetitions, it includes information indicating the direction of the first-order gradient gi,t.
Subsequently, the present modification applies Formula (11) to the moving average ci,t of the variance of the first-order gradient gi,t to remove the initialization bias from it.
In addition, the present modification uses the following Formula (12) to adjust the learning rate.
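With the squared deviation, Formulas (9) and (10) coincide with the well-known incremental update for an exponentially weighted mean and variance. The sketch below checks this by comparing the recursion with a from-scratch weighted computation, assuming zero initial values, which then act as one pseudo-observation of weight β1^t. The function names and the β1 = 0.9 default are illustrative.

```python
import math
import random


def recursive_stats(grads, beta1=0.9):
    """Formulas (9) and (10) on a scalar gradient sequence, starting from m = c = 0."""
    m, c = 0.0, 0.0
    for g in grads:
        c = beta1 * c + beta1 * (1 - beta1) * (g - m) ** 2   # Formula (10), uses m_{t-1}
        m = beta1 * m + (1 - beta1) * g                      # Formula (9)
    return m, c


def direct_stats(grads, beta1=0.9):
    """Exponentially weighted mean and variance computed from scratch.

    The zero initial value counts as one extra observation with weight beta1**t;
    gradient s (1-based) gets weight (1 - beta1) * beta1**(t - s). Weights sum to 1.
    """
    t = len(grads)
    points = [0.0] + list(grads)
    weights = [beta1 ** t] + [(1 - beta1) * beta1 ** (t - s) for s in range(1, t + 1)]
    mean = sum(w * x for w, x in zip(weights, points))
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, points))
    return mean, var


rng = random.Random(0)
grads = [rng.gauss(0.0, 1.0) for _ in range(20)]
print(recursive_stats(grads))
print(direct_stats(grads))   # agrees with the line above up to floating-point rounding
```

The agreement holds only with the deviation squared; without the square, ci,t could become negative and its square root in Formula (12) would be undefined.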
where ϵ is a small constant for numerical stability, for example, 10⁻⁸.
In the present modification, the calculation of Formulas (8) to (12) is repeated until the parameter θt of the learning model converges. As illustrated in Formula (12), the learning rate is adjusted automatically by dividing it by the square root of the bias-removed moving average ci,t of the variance of the first-order gradient gi,t, that is, by the standard deviation of the first-order gradient. This variance is determined by how the direction of the first-order gradient has been dispersed over past repetitions.
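One repetition of the modification can be sketched as below. Formula (12) is described as the learning rate multiplied by the first-order gradient divided by the standard deviation; because Formulas (11) and (12) are not reproduced in this text, the sketch assumes bias removal by division by 1 − β1^t and the update θt = θt−1 − α·g/(√ĉ + ε). The names `modified_step` and `last_step_size` and the defaults are illustrative. The usage compares the final update after two gradient histories of equal magnitude.

```python
import numpy as np


def modified_step(theta, g, m, c, t, alpha=0.001, beta1=0.99, eps=1e-8):
    """One repetition of Formulas (9)-(12); g is the gradient from Formula (8).

    Assumed forms: c_hat = c / (1 - beta1**t) and
    theta - alpha * g / (sqrt(c_hat) + eps).
    """
    c = beta1 * c + beta1 * (1 - beta1) * (g - m) ** 2   # Formula (10), uses m_{t-1}
    m = beta1 * m + (1 - beta1) * g                      # Formula (9)
    c_hat = c / (1 - beta1 ** t)                         # assumed Formula (11)
    theta = theta - alpha * g / (np.sqrt(c_hat) + eps)   # assumed Formula (12)
    return theta, m, c


def last_step_size(grads, beta1=0.9):
    """Size of the final update after feeding a scalar gradient history."""
    theta, m, c = np.zeros(1), np.zeros(1), np.zeros(1)
    for t, g in enumerate(grads, start=1):
        new_theta, m, c = modified_step(theta, np.array([g]), m, c, t, beta1=beta1)
        size = float(abs(new_theta - theta)[0])
        theta = new_theta
    return size


steady = last_step_size([1.0] * 30)            # gradient keeps its direction
flipping = last_step_size([-1.0, 1.0] * 15)    # same magnitudes, sign alternates
print(steady > flipping)                       # True
```

A history that keeps its direction yields a small variance and therefore a larger step, while a sign-alternating history yields a large variance and a step of about α; an average of absolute values could not separate the two.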
Therefore, also in this modification, adjustment of the learning rate based on the information of the direction of the first-order gradient can be executed, making it possible to lower the error function. The learning apparatus according to the present modification has a configuration similar to that of the learning apparatus 10 illustrated in
[Learning Processing]
Subsequently, the gradient calculation unit 11 uses Formula (8) to calculate the first-order gradient gt (Step S14) and outputs the result to the statistic calculation unit 12. Then, the statistic calculation unit 12 takes the first-order gradient gt output from the gradient calculation unit 11 and the standard values of hyperparameters α and β1 as inputs, and uses Formula (9) to calculate the moving average mt of the first-order gradient gt (Step S15). In addition, the statistic calculation unit 12 uses Formula (10) to calculate the moving average ct of the variance of the first-order gradient gt (Step S16).
Then, the initialization bias removing unit 13 removes the initialization bias from the moving average ct of the variance of the first-order gradient gt calculated by the statistic calculation unit 12 by applying Formula (11) to it (Step S17).
Subsequently, the learning rate adjustment unit 14 adjusts the learning rate using the second term of Formula (12) based on the first-order gradient gt and the moving average ct of the variance of the first-order gradient gt from which the initialization bias has been removed (Step S18). In Formula (12), the learning rate is adjusted by calculating a product of the learning rate and the value obtained by dividing the first-order gradient by the standard deviation of the first-order gradient that is a square root of the moving average of the variance of the first-order gradient.
Steps S19 and S20 illustrated in
[Learning Algorithm According to Modification]
Next, a learning algorithm according to the present modification will be described.
As illustrated in
The learning algorithm increments t by 1 (third line in
The learning algorithm uses Formula (9) to calculate the moving average mt of the first-order gradient gt (fifth line in
The learning algorithm uses Formula (12) to adjust the learning rate based on the first-order gradient gt and the moving average ct of the variance of the first-order gradient gt, and updates the parameter θt (eighth line in
The learning algorithm repeats the processing from the second line to the eighth line in
[System Configuration of Embodiment]
Individual components of the learning apparatus 10 illustrated in
In addition, all or any part of each process performed in the learning apparatus 10 may be implemented by a central processing unit (CPU) or a program analyzed and executed by the CPU. In addition, each process performed in the learning apparatus 10 may be implemented as hardware using wired logic.
In addition, among all the processing described in the embodiments, all or a part of the processing described as being automatically performed can also be performed manually. Alternatively, all or a part of the processing described as being performed manually can be automatically performed by a known method. Besides this, information including the processing procedure, control procedure, specific nomenclature, various data, and parameters as described above or in the drawings can be appropriately changed unless otherwise noted.
[Programs]
The memory 1010 includes a read only memory (ROM) 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as basic input output system (BIOS). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100. For example, a detachable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to a mouse 1110 and a keyboard 1120, for example. The video adapter 1060 is connected to a display 1130, for example.
The hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program that defines each process of the learning apparatus 10 is implemented as the program module 1093, in which code executable by the computer 1000 is described. For example, the program module 1093 for executing processing similar to the functional configuration of the learning apparatus 10 is stored in the hard disk drive 1090. Note that the hard disk drive 1090 may be replaced by a solid state drive (SSD).
In addition, the setting data used in the processing of the above-described embodiment is stored as program data 1094 in the memory 1010 and the hard disk drive 1090, for example. Then, the CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 and the hard disk drive 1090 onto the RAM 1012 as necessary for execution.
The program module 1093 and the program data 1094 are not necessarily stored in the hard disk drive 1090; they may be stored in a detachable storage medium and read by the CPU 1020 via the disk drive 1100, for example. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (LAN, WAN, or the like) and read from that computer by the CPU 1020 via the network interface 1070.
The embodiments to which the invention made by the present inventors is applied have been described above; however, the present invention is not limited by the description and drawings of the present embodiments, which form part of the disclosure of the present invention. That is, other embodiments, examples, operation techniques, and the like made by those skilled in the art based on the present embodiments are all included in the scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
2016-083141 | Apr 2016 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2017/015337 | 4/14/2017 | WO | 00 |