This application is based upon and claims the benefit of priority from Japanese patent application No. 2022-099982, filed on Jun. 22, 2022, the disclosure of which is incorporated herein in its entirety by reference.
The present disclosure relates to a learning model generation apparatus, a learning model generation method, and a program, and particularly, relates to a learning model generation apparatus, a learning model generation method, and a program that generate a learning model for dynamically estimating an action plan.
In a medical field, a doctor records a treatment plan for treating a disease of a patient and manages an implementation situation of the treatment plan. For example, Patent Literature 1 (Japanese Unexamined Patent Application Publication No. 2002-163374) discloses a disease management system that generates a treatment plan by an operation of a doctor.
However, in Patent Literature 1 described above, a doctor analyzes the present state of a patient from various pieces of information about the patient, and then creates a treatment plan tailored to the patient according to a treatment guideline, and thus the burden of preparing the plan is large. In addition, the quality of the treatment plan depends on the doctor's level of experience. Therefore, it is desired to support creation of a treatment plan in a medical field by automatically generating a treatment plan tailored to a patient by using a learning model. The above-described problems are not limited to the medical field, but also arise in an education field, sports training, or the like.
In view of the problems described above, an example object of the present disclosure is to provide a learning model generation apparatus, a learning model generation method, and a program that suitably generate a learning model for creating an action plan tailored to a subject.
In a first example aspect of the present disclosure, a learning model generation apparatus includes:
In a second example aspect of the present disclosure, a learning model generation method includes:
The observation data include at least a state and an action of a sample at a specific time until time T, and
In a third example aspect of the present disclosure, a program causes a computer to execute:
The observation data include at least a state and an action of a sample at a specific time until time T, and
The above and other aspects, features and advantages of the present disclosure will become more apparent from the following description of certain example embodiments when taken in conjunction with the accompanying drawings, in which:
Hereinafter, example embodiments of the present disclosure will be described in detail with reference to the drawings. In each drawing, the same or corresponding elements are denoted by the same reference signs, and redundant descriptions are omitted as necessary for clarity of description.
First, a problem of at least one example embodiment of the present disclosure will be described in detail.
In order to support creation of a treatment plan of a patient in a medical field, treatment plans have been created automatically by a computer. For example, it has been considered to find a previous treatment plan of a patient suffering from the same disease as a target patient, and to create a similar treatment plan as a treatment plan for the target patient.
However, even among patients having the same disease, there are differences in information such as a characteristic and a gene related to the disease. Therefore, it is required to create a treatment plan suitable for each individual, based on patient-specific information. This is expected to enhance the treatment effect for the patient.
Further, it has been studied to sequentially select, after the start of the treatment, a treatment tailored to individual characteristics while considering a personal history of the patient. This is expected to further enhance the treatment effect, and to make it possible to propose a treatment with less burden and cost for the patient.
In such a background, a system using a learning model that sequentially and automatically selects a treatment that maximizes a treatment effect of a patient according to a response of the patient has been developed. Note that, the above-described system is not limited to a medical field, but may also be used in an education field, sports training, or the like. Therefore, hereinafter, a term “action” being a superordinate concept is used instead of a term “treatment”.
The system proposes an action to be performed at time j, which maximizes a treatment effect of a patient, by using a learning model that differs for each time j. The time may be absolute time, or may be relative time. When it is the relative time, the time may be referred to as a stage. In addition, the time may refer to a point on a time axis, or may refer to a predetermined period on the time axis. Hereinafter, j is assumed to be a natural number. For example, time j=1 indicates a first day of a treatment, time j=2 indicates a second day of the treatment, time j=t indicates a t-th day of the treatment, and time j=T (where T is a natural number larger than t) may indicate final time, i.e., a final day of the treatment.
For example, a j-th order learning model D*j regards a state Xjh of a subject h observed at time j as an input. Then, the j-th order learning model D*j estimates an action Ajh of the subject h at the time j. The estimated action Ajh is an action in which a sum of effects acquired by the subject h from the time j to the final time T is maximized. For example, in
Processing of the system is divided into a model generation phase in which the j-th order learning models at times j=1 to T are generated, and an estimation phase in which an action of the subject h is planned by using the j-th order learning models at times j=1 to T.
The j-th order learning model D*j is generated by using observation data of a target sample group (hereinafter, referred to as a T sample group) TGj at time j. The T sample group TG is a set of patients (that is, samples) whose observation data are used as training data at a time of learning. Note that, observation data at the time j of a sample i are represented by a vector {Xji, Aji, Yji} in which a state Xji, an action Aji, and an effect Yji are combined with one another. The effect Yji indicates an amount of an effect acquired by an action of the sample i having a state at a specific time in the time j=1 to T. First, observation data at times j=1 to T for samples of i=1, 2, . . . , n (n is a natural number) are prepared.
For example, observation data of the T sample group TGt at time t, which is used for generating a t-th order learning model D*t when j=t, are training data for learning the t-th order learning model D*t. The training data for learning the t-th order learning model D*t are observation data of a sample included in the T sample group among the samples of i=1, 2, . . . , n (n is a natural number).
Generation of each learning model is performed backward in such a way as to go back in the time j. In other words, the t-th order learning model D*t in a case of j=t is generated after the (t+1)-th order learning model D*t+1 in a case of j=(t+1) is generated. In learning of the (t+1)-th order learning model D*t+1, the observation data of a sample included in the T sample group TGt+1 associated to the time t+1 are used. Herein, all the observation data of a sample that is not optimal in the (t+1)-th order learning model D*t+1 are discarded without being used in the t-th order learning model D*t. Therefore, the samples included in the T sample group TGt associated to the time t are the samples that remain after excluding, from the samples included in the T sample group TGt+1 associated to the time t+1, the samples that are not optimal in the (t+1)-th order learning model D*t+1. In other words, they are the samples being optimal in the (t+1)-th order learning model D*t+1.
Therefore, generally, the number of samples of the T sample group TGt associated to the time t is smaller than the number of samples of the T sample group TGt+1 associated to the time (t+1). When the number of samples decreases, the number of pieces of observation data used as training data decreases, and therefore it becomes difficult to generate a learning model with high estimation accuracy.
Note that, a “sample that is not optimal in the j-th order learning model D*j” indicates a sample in which an error between the action Aji included in the observation data and an output of the j-th order learning model D*j is larger than a predetermined amount when the state Xji of the sample i included in the observation data is input to the j-th order learning model D*j. Hereinafter, the error is referred to as an “output error of j-th order learning model D*j”. Hereinafter, as one example, a “sample that is not optimal in the j-th order learning model D*j” indicates that the output error of the j-th order learning model D*j is larger than 0, that is, the j-th order learning model D*j has misclassified. In contrast, a “sample being optimal in the j-th order learning model D*j” indicates a sample in which the output error of the j-th order learning model D*j is equal to or less than a predetermined amount. Hereinafter, as one example, a “sample being optimal in the j-th order learning model D*j” indicates that the Aji included in the observation data matches the output of the j-th order learning model D*j, that is, the output error is 0.
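The distinction between an optimal sample and a non-optimal sample described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the names `is_optimal` and `toy_model` are hypothetical, and the output error is taken to be 0 when the observed action matches the model output and 1 otherwise, in line with the "misclassified" example in the text.

```python
from typing import Callable, Hashable, Sequence


def is_optimal(model: Callable[[Sequence[float]], Hashable],
               state: Sequence[float],
               observed_action: Hashable,
               threshold: float = 0.0) -> bool:
    """Return True when the output error of the model is within the threshold.

    Assumption: the output error is 0 on a match and 1 on a mismatch.
    """
    predicted = model(state)
    output_error = 0.0 if predicted == observed_action else 1.0
    return output_error <= threshold


def toy_model(state: Sequence[float]) -> int:
    """Hypothetical j-th order model: outputs action 1 when the first feature exceeds 0.5."""
    return 1 if state[0] > 0.5 else 0


matches = is_optimal(toy_model, (0.9,), observed_action=1)   # observed action equals the output
differs = is_optimal(toy_model, (0.9,), observed_action=0)   # observed action differs from the output
print(matches, differs)
```

With the default threshold of 0, a sample is treated as optimal only when the observed action Aji equals the model output for the state Xji.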
On the other hand, estimation of an action is performed forward with a lapse of time j. For example, assuming that current time j is t, an action Ath of the subject h at the current time t is acquired by inputting, to the t-th order learning model D*t, a state Xth of the subject h observed at the current time t. Then, when the time has elapsed and the time t+1 is reached, an action A(t+1)h of the subject h at the time t+1 is acquired by inputting, to the (t+1)-th order learning model D*t+1, a state X(t+1)h of the subject h observed at the time t+1. As described above, an action to be taken is sequentially estimated with a lapse of time. Therefore, an action plan is dynamically created. In a medical field, an action plan of a healthcare worker to be taken for a patient, who is the subject, is dynamically created.
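The forward estimation described above may be sketched as a simple loop over time. The function `plan_actions`, the dictionary `models`, and the callback `observe_state` are hypothetical names introduced for this illustration; the learning models are assumed to have been generated beforehand.

```python
from typing import Callable, Dict, Hashable, List, Sequence


def plan_actions(models: Dict[int, Callable[[Sequence[float]], Hashable]],
                 observe_state: Callable[[int], Sequence[float]],
                 final_time: int) -> List[Hashable]:
    """Estimate one action per time j = 1..T, using the j-th order model at time j."""
    actions = []
    for j in range(1, final_time + 1):
        state = observe_state(j)          # the state is observed only when time j is reached
        actions.append(models[j](state))  # the j-th order model outputs the action at time j
    return actions


# Hypothetical models and observations for a subject h over T = 2 times.
toy_models = {
    1: lambda x: "A" if x[0] > 0 else "B",
    2: lambda x: "C",
}
observed = {1: [1.0], 2: [0.0]}
plan = plan_actions(toy_models, lambda j: observed[j], final_time=2)
print(plan)
```

Because the state at time j is requested only inside the loop, the plan is built one step at a time rather than fixed in advance.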
The above-described problem can also be grasped from a mathematical expression.
Herein, πA
A function f included in the expression (1) is a function associated to the t-th order learning model D*t. Therefore, deriving the function f is equivalent to deriving the t-th order learning model D*t.
A block 900 illustrated in
Herein, a block 904 included in a block 901 indicates an output error. The block 901 indicates becoming 1 in a case where the output error is 0 at all times from the time (t+1) to the time T, but becoming 0 in other case. Becoming 0 in the block 901 means that the observation data of the sample are discarded. In other words, when there is a time at which the output error is not 0 even once between the time (t+1) and the time T, the observation data of the sample are discarded at that time. Therefore, it can be understood that the number of samples decreases as the time proceeds backward.
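Based only on the description above (the figure itself is not reproduced here), the quantity indicated by the block 901 can be written as a product of indicator functions. The symbol $I_i^{(t)}$ is a name introduced here for illustration, and the output error is assumed to be 0 exactly when the observed action matches the model output.

```latex
I_i^{(t)} \;=\; \prod_{j=t+1}^{T} \mathbf{1}\!\left[\, A_{ji} = D^{*}_{j}(X_{ji}) \,\right]
```

Each factor is 0 or 1, so $I_i^{(t)}$ can only stay the same or drop to 0 as t decreases. This is why the number of samples with $I_i^{(t)}=1$ does not increase as the time proceeds backward.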
At least one of the following example embodiments solves such a problem.
Next, a first example embodiment of the present disclosure will be described. The first example embodiment may be described as an overview of example embodiments described below.
Herein, observation data at any time j of a sample i include at least a state Xji and an action Aji of the sample i at that time j.
In addition, the j-th order learning model D*j is a learned model for outputting an action Ajh at the time j by inputting at least a state Xjh at the time j of a subject h. For example, the j-th order learning model D*j is a model obtained by ensembling weak learners. An example of such an ensemble method is boosting. Hereinafter, it is assumed that the j-th order learning model D*j adopts an AdaBoost algorithm being one example of boosting. In addition, the j-th order learning model D*j is described as being represented by a weighted sum of weak learners.
As illustrated in
The movement unit 12 executes movement processing. The movement processing is processing of moving a sample, among a plurality of samples included in a T sample group, having an output error of a (t+1)-th order learning model D*t+1 with respect to observation data at time t+1 larger than a predetermined amount, from the T sample group to a source sample group. Hereinafter, the source sample group is referred to as an S sample group. In addition, a sample included in the T sample group associated to a predetermined time may be referred to as a T sample, and a sample included in the S sample group associated to a predetermined time may be referred to as an S sample.
In addition, the “movement” may be physical or logical. Physical movement may include changing a storage destination. Logical movement may include changing an attribute (a group to which the sample belongs, or a type) of the sample.
In addition, “larger than a predetermined amount” may be, but is not limited to, larger than 0. In other words, the movement unit 12 moves a sample that is not optimal at time t+1 from the T sample group to the S sample group at time t. The T sample group at the time t does not include a sample that is not optimal.
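The movement processing described above can be sketched as a logical move between two sets of sample identifiers. The function name `move_non_optimal` and the data layout are assumptions for illustration, and "larger than a predetermined amount" is taken here as "the model output differs from the observed action".

```python
from typing import Callable, Dict, Hashable, Set, Tuple


def move_non_optimal(t_group: Set[int],
                     s_group: Set[int],
                     model_next: Callable[[float], Hashable],
                     data_next: Dict[int, Tuple[float, Hashable]]
                     ) -> Tuple[Set[int], Set[int]]:
    """Move samples that are not optimal at time t+1 from the T group to the S group.

    data_next maps a sample id to its (state, action) observed at time t+1.
    """
    moved = {i for i in t_group
             if model_next(data_next[i][0]) != data_next[i][1]}
    return t_group - moved, s_group | moved   # nothing is discarded


def model_next(state: float) -> int:
    """Hypothetical (t+1)-th order model."""
    return 1 if state > 0.5 else 0


data_next = {0: (0.9, 1), 1: (0.2, 1), 2: (0.1, 0)}
t_after, s_after = move_non_optimal({0, 1, 2}, set(), model_next, data_next)
print(t_after, s_after)
```

In this toy data only the sample 1 has an observed action that differs from the model output, so only that sample changes its group.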
The generation unit 13 generates a t-th order learning model D*t by using observation data from the time t to the time T of a sample included in the T sample group and observation data from the time t to the time T of a sample included in the S sample group.
Specifically, the generation unit 13 generates a plurality of weak learners, and generates the t-th order learning model D*t by combining the plurality of generated weak learners. The weak learner included in the t-th order learning model D*t regards at least a state Xti at the time t of a sample i as an input, and outputs an action Ati at the time t.
More specifically, first, the generation unit 13 generates a plurality of weak learners by using, as training data, observation data {Xji, Aji, Yji} (j=t, t+1, . . . , T, i=1, 2, . . . , n), from the time t to the time T, of a sample included in the T sample group after the movement processing and a sample included in the S sample group after the movement processing. At this time, the generation unit 13 may use, as the training data, observation data, from the time t to the time T, of all the samples included in the T sample group, or observation data, from the time t to the time T, of some samples. It is similar for the sample included in the S sample group.
Next, for each of the plurality of weak learners, the generation unit 13 evaluates a classification error by using the observation data at the time t of the sample included in the T sample group after the movement processing. In other words, the generation unit 13 calculates a classification error with respect to the T sample.
Finally, the generation unit 13 generates the t-th order learning model D*t, based on at least each of the plurality of weak learners and an associated classification error. For example, the generation unit 13 generates the t-th order learning model D*t by combining the weak learners weighted by a weight associated to the above-described classification error.
Next, the movement unit 12 of the learning model generation apparatus 10 repeats processing illustrated in S12 to S13 for each sample of the T sample group at time j. In S12, the movement unit 12 decides whether there is an output error of the j-th order learning model D*j with respect to the observation data at the time j of the sample. Presence of an output error indicates erroneous decision. At this time, the movement unit 12 inputs, to the j-th order learning model D*j, a state Xji included in the observation data at the time j of a sample i, and calculates, as the output error, a difference between the acquired output value and an action Aji included in the observation data. When there is the output error (Yes in S12), the movement unit 12 moves the sample from the T sample group to the S sample group without discarding the sample (S13). On the other hand, when there is no output error (No in S12), the movement unit 12 does not move the sample from the T sample group to the S sample group, and leaves the sample as it is in the T sample group.
After performing the above processing on all the samples included in the T sample group, the learning model generation apparatus 10 decrements the time j (S14). Then, when the time j is larger than 0 (Yes in S15), the learning model generation apparatus 10 returns the processing to S11, and when the time j becomes 0 (No in S15), the processing ends.
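The backward loop over the time j (S11 to S15) may be sketched as follows. This is a minimal sketch, not the apparatus itself: `fit_majority` is a hypothetical stand-in for the model generation in S11 that simply outputs the most common observed action, and the data layout `data[j][i] = (state, action)` is an assumption.

```python
from collections import Counter


def fit_majority(observations, ids):
    """Hypothetical stand-in for S11: a model that always outputs the most common action."""
    counts = Counter(observations[i][1] for i in sorted(ids))
    majority = counts.most_common(1)[0][0]
    return lambda state: majority


def generate_backward(data, final_time):
    """data[j][i] = (state, action) of sample i at time j; returns models and final groups."""
    t_group = set(data[final_time])   # at j = T every sample is a T sample
    s_group = set()
    models = {}
    j = final_time
    while j > 0:                                                 # S15
        models[j] = fit_majority(data[j], t_group | s_group)     # S11: T and S samples are used
        for i in sorted(t_group):
            state, action = data[j][i]
            if models[j](state) != action:                       # S12: is there an output error?
                t_group.remove(i)                                # S13: move without discarding
                s_group.add(i)
        j -= 1                                                   # S14
    return models, t_group, s_group


toy_data = {
    2: {0: (None, "a"), 1: (None, "a"), 2: (None, "b")},
    1: {0: (None, "x"), 1: (None, "y"), 2: (None, "y")},
}
models, t_final, s_final = generate_backward(toy_data, final_time=2)
print(t_final, s_final)
```

In this toy run the sample 2 is moved to the S group at j=2 but still contributes to the model at j=1, which is the point of keeping it rather than discarding it.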
Note that, the generation unit 13 may repeat S20 to S21 by the number of generated weak learners, and then execute S22.
As described above, according to the first example embodiment, in generation of a weak learner included in a learning model associated to target time t, in addition to the T sample, the S sample being determined to be not optimal in the learning model associated to the later time t+1 is used. Note that, the learning model associated to the later time t+1 is generated before the learning model associated to the target time t. Therefore, training data used for learning can be increased. As a result, it is possible to generate a learning model that estimates, with high accuracy, an action at each time according to each individual subject.
Next, a second example embodiment of the present disclosure will be described.
The learning model generation apparatus 10a is one example of the above-described learning model generation apparatus 10. For each time, the learning model generation apparatus 10a generates a learning model for estimating an action A to be taken at that time. The action A to be taken is an action that maximizes an effect acquired after the time.
The learning model storage apparatus 20 is a storage apparatus that stores a learning model at each time generated by the learning model generation apparatus 10a.
The estimation apparatus 30 dynamically creates an action plan of a subject h. Specifically, the estimation apparatus 30 reads a learning model stored in the learning model storage apparatus 20, and sequentially estimates the action A to be taken by the subject h at a target time by using the read learning model.
In
Then, when generating a t-th order learning model D*t, the learning model generation apparatus 10a uses observation data of the sample included in the S sample group SGt, in addition to observation data of the sample included in the T sample group TGt.
In the second example embodiment, a j-th order learning model D*j is expressed by a weighted sum of M weak learners (a first weak learner T(1), a second weak learner T(2), . . . , an M-th weak learner T(M)) (M is a natural number). Specifically, the j-th order learning model D*j is given by the following equation (2).
αj(m) is reliability of an m-th weak learner T(m) constituting the j-th order learning model.
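A reliability-weighted combination of weak learners can be sketched as a weighted vote. The equation (2) is not reproduced in this text, so the form below (choosing the action with the largest total reliability) is an assumption based on the usual multi-class AdaBoost combination. The name `weighted_vote` is hypothetical.

```python
from collections import defaultdict


def weighted_vote(weak_learners, reliabilities, state):
    """Return the action whose supporting weak learners have the largest total reliability."""
    scores = defaultdict(float)
    for learner, alpha in zip(weak_learners, reliabilities):
        scores[learner(state)] += alpha   # each weak learner votes with its reliability
    return max(scores, key=scores.get)


# Three hypothetical weak learners that ignore the state for simplicity.
learners = [lambda x: "a", lambda x: "b", lambda x: "b"]
alphas = [1.5, 0.5, 0.6]
chosen = weighted_vote(learners, alphas, state=[0.0])
print(chosen)
```

Here a single highly reliable weak learner (1.5) outweighs two less reliable ones (0.5 + 0.6 = 1.1).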
For example, the reliability αj(m) is given by the following equation (3).
K is a total number of classifications, that is, the number of types of an action Aji. As indicated in the equation (3), the reliability αj(m) is calculated based on at least a second classification error err2(m) of the weak learner T(m). The second classification error err2(m) is one example of a classification error of the first example embodiment, and is a classification error evaluated for the weak learner T(m) by using observation data at time j of a sample included in the T sample group TG.
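As a hedged illustration of how a reliability might be derived from a classification error evaluated only on T samples, the sketch below uses the multi-class AdaBoost (SAMME) form log((1 - err) / err) + log(K - 1). The equations (3) to (5) are not reproduced in this text, so this form, the function names, and the per-sample `coefficients` argument (standing in for ξi) are assumptions.

```python
import math


def second_classification_error(predictions, actions, coefficients):
    """Weighted fraction of T samples whose predicted action differs from the observed action."""
    total = sum(coefficients)
    wrong = sum(c for p, a, c in zip(predictions, actions, coefficients) if p != a)
    return wrong / total


def reliability(err, num_classes, eps=1e-12):
    """SAMME-style reliability (assumed form); err is clipped to avoid division by zero."""
    err = min(max(err, eps), 1.0 - eps)
    return math.log((1.0 - err) / err) + math.log(num_classes - 1)


# Four T samples, K = 3 kinds of actions, one misclassification.
err2 = second_classification_error([0, 1, 2, 2], [0, 1, 2, 0], [1.0, 1.0, 1.0, 1.0])
alpha = reliability(err2, num_classes=3)
print(err2, alpha)
```

Under this assumed form, a weak learner no better than random guessing among K actions (err = 1 - 1/K) receives a reliability of 0, and the reliability grows as the error on the T samples shrinks.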
For example, the second classification error err2(m) is given by the following equation (4).
XiT and AiT are, respectively, a state and an action in a case where a sample i is a sample (T sample) included in the T sample group TG.
Note that, a coefficient ξi included in the second classification error err2(m) is given by the following equation (5).
First, the weak learner T(m) is related with f included in the expression (1). The f included in the expression (1) is given by the following equation (6).
[Mathematical 6]
$$f(X_i) = \sum_{m=1}^{M} \beta_m \, g^{(m)}(X_i) \tag{6}$$
g(m)(X) is a function associated to the weak learner T(m) one-to-one. In other words, learning the weak learner T(m) corresponds to deriving the optimized g(m)(X).
g(X) represents a K-dimensional vector. A relationship between g(X) and T(X) is given by the following equation (7).
g(X) is a vector in which the k-th element takes 1 when T(X)=k and the other elements take −1/(K−1).
A deriving expression of the optimized g(X) is given by the following expression (8-1).
Then, the expression (8-1) can be expressed by using m weak learners as follows.
βm is a parameter for the m-th weak learner. zi represents Ati of a sample i. Herein, z represents a K-dimensional vector, and is given by the following equation (9).
When Ati=k, the z vector is a vector in which the k-th element takes 1 and the other elements take −1/(K−1).
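The K-dimensional encoding used for both g(X) and z can be sketched as follows. The function name `encode_class` is hypothetical, and classes are assumed to be numbered from 1 to K.

```python
def encode_class(k: int, num_classes: int) -> list:
    """Return a K-dimensional vector with 1 at position k and -1/(K-1) elsewhere (k is 1-based)."""
    off_value = -1.0 / (num_classes - 1)
    return [1.0 if idx == k else off_value for idx in range(1, num_classes + 1)]


z_example = encode_class(2, 3)
print(z_example)  # [-0.5, 1.0, -0.5]
```

The elements of such a vector sum to zero, which is the symmetric constraint commonly used in multi-class boosting.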
An objective function indicated after arg min may be referred to as a first classification error err1. The first classification error err1 corresponds to a classification error evaluated for the weak learner T(m) by using observation data at time t of the T sample and observation data at time t of the S sample.
Deriving the optimized g(X) corresponds to deriving g(X) that minimizes the first classification error err1.
A block 900′, included in
ωi included in the block 100 is a weight to be added to the loss for the sample i. The weight ωi(m-1), that is, the weight ωi at a time point when m-1 weak learners have been generated, is given by an equation (10).
Note that, f(m-1) is a weighted sum of the first m-1 weak learners, and is given by the following equation (11).
[Mathematical 11]
$$f^{(m-1)}(X_i) = \beta_1 g^{(1)}(X_i) + \beta_2 g^{(2)}(X_i) + \cdots + \beta_{m-1} g^{(m-1)}(X_i) \tag{11}$$
The weight ωi indicates a degree of influence of observation data of the sample i on optimization of g(X) (that is, learning of a weak learner T(X)). In the second example embodiment, the weight ωi may be updated every time one learned weak learner is generated. A manner of updating differs depending on whether the sample i is classified as a T sample or an S sample at the associated time. The weight is denoted as ωiT when the sample i is the T sample, and as ωiS when the sample i is the S sample.
The storage unit 11 is a storage apparatus that stores observation data of samples i=1 to n at time j=1 to T.
The movement unit 12a is one example of the movement unit 12 described above. The generation unit 13a is one example of the generation unit 13 described above. The generation unit 13a includes a weak learner generation unit 14, a reliability calculation unit 15, a weight update unit 16, and a learning model generation unit 17. The movement unit 12a and the generation unit 13a sequentially generate a j-th order learning model D*j backward from j=T, and output the generated j-th order learning model D*j to the output unit 18.
The output unit 18 outputs the generated j-th order learning model D*j. In addition, the output unit 18 stores the generated j-th order learning model D*j in a learning model storage apparatus 20.
At a time point of generating a T-th order learning model D*T associated to final time j=T, all the samples are included in the T sample group TG. Then, all the observation data stored in the storage unit 11 are classified into the observation data d_TG. Then, at this time, the number of samples included in the S sample group SG is 0, and the observation data d_SG do not exist. In addition, at this time, the number of samples included in the N sample group NG is 0, and the observation data d_NG do not exist.
Then, as j associated to the generated learning model decreases, the number of samples included in the T sample group TG decreases, and the number of samples included in any of the S sample group SG and the N sample group NG increases. Therefore, as j decreases, the number of pieces of observation data classified into the observation data d_TG decreases, and the number of pieces of observation data classified into any of the observation data d_SG and the observation data d_NG increases.
Note that, the sample included in the S sample group SG is a sample determined to be not optimal by a learning model associated to a time (for example, time t+1) immediately after a time (for example, time t) associated to the learning model to be generated.
Next, specific processing of each element will be described with reference to
First,
In S100, the movement unit 12a moves the samples of the S sample group SG to the N sample group NG, and thereby removes the samples from the S sample group SG. Specifically, the movement unit 12a re-classifies observation data classified as the observation data d_SG into the observation data d_NG. Initializing the S sample group SG in this way makes it possible to consider, in generation of the learning model, only a sample that was optimal in every learning model generated before the latest one and is not optimal only in the latest one. As a result, it is possible to suppress a decrease in estimation accuracy of a learning model due to use of observation data of a non-optimal sample as training data, and to suitably increase the training data.
Note that, in S12 to S13, the movement unit 12a moves a sample of the T sample group TG in which an output error of the (t+1)-th order learning model D*t+1 occurs with respect to the observation data at the time t+1, from the T sample group TG to the S sample group SG. Specifically, the movement unit 12a re-classifies, into the observation data d_SG, the observation data being classified as the observation data d_TG in which the output error of the (t+1)-th order learning model D*t+1 occurs.
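The re-classification performed in S100 and in S12 to S13 can be sketched as an update of a group label per sample. The labels 'T', 'S', and 'N' and the function name `reclassify` are assumptions for illustration; `has_output_error` stands for the result of checking the (t+1)-th order learning model against the observation data at the time t+1.

```python
def reclassify(groups, has_output_error):
    """Return updated group labels for the next (earlier) time.

    groups: sample id -> 'T', 'S' or 'N'.
    has_output_error: sample id -> True when the (t+1)-th order model has an output error.
    """
    updated = {}
    for sample_id, group in groups.items():
        if group == "S":
            updated[sample_id] = "N"     # S100: the previous S samples go to the N group
        elif group == "T" and has_output_error[sample_id]:
            updated[sample_id] = "S"     # S12 to S13: newly non-optimal T samples go to the S group
        else:
            updated[sample_id] = group   # optimal T samples and N samples keep their group
    return updated


groups_now = {0: "T", 1: "T", 2: "S", 3: "N"}
errors_now = {0: False, 1: True, 2: True, 3: True}
groups_next = reclassify(groups_now, errors_now)
print(groups_next)
```

After the update, the S group contains only the samples that became non-optimal in the most recently generated learning model.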
First, in step S110 in
Next, the processing indicated in steps S111 to S115 is repeated M times. M is predetermined.
In step S111 of iteration m, the weak learner generation unit 14 generates the m-th weak learner T(m) among the M weak learners included in the t-th order learning model D*t. At this time, the weak learner generation unit 14 uses the observation data d_TG, at the time t to the time T, of the T sample and the observation data d_SG, at the time t to the time T, of the S sample, which are weighted by the weight ωi being set for each sample. Then, the weak learner generation unit 14 finds a weak learner in which the first classification error err1 evaluated by using the observation data of the T sample and the observation data of the S sample is minimized, and generates the found weak learner as the weak learner T(m). Specifically, the weak learner generation unit 14 generates the weak learner T(m) by using the expression (8) as illustrated in a paragraph 5 in
In step S112, the reliability calculation unit 15 of the generation unit 13a evaluates the second classification error err2(m) of the weak learner T(m) by using the observation data at the time t of the T sample. Specifically, as illustrated in paragraphs 6 to 7 in
In S113, the reliability calculation unit 15 calculates the reliability αj(m) of the weak learner T(m), based on the second classification error err2(m). Specifically, as illustrated in a paragraph 8 in
Next, the weight update unit 16 of the generation unit 13a repeats processing indicated in S114 to S115 for each sample. Note that, in the present processing, the weight update unit 16 executes different pieces of processing for the T sample and the S sample. Specifically, for a T sample having the output error of the weak learner T(m) with respect to the observation data d_TG at the time t, the weight update unit 16 increases the weight ωiT (Yes in S114→S115). In addition to or instead of this, for an S sample having the output error of the weak learner T(m) with respect to the observation data d_SG at the time t, the weight update unit 16 reduces the weight ωiS (Yes in S114→S115). On the other hand, regardless of whether the sample is the T sample or the S sample, the weight update unit 16 does not update the weight for a sample having the output error equal to or less than a predetermined amount or having no output error (No in S114). As a result, as the repetition proceeds, the observation data of the sample being optimal at the time t come to have a relatively larger degree of influence than the observation data of the sample that is not optimal at the time t. In other words, the t-th order learning model can be generated with emphasis on the sample being optimal at the time t rather than the sample that is not optimal. Therefore, estimation accuracy of the t-th order learning model D*t is improved.
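The asymmetric weight update in S114 to S115 can be sketched as follows. The concrete update equations are not reproduced in this text, so the multiplicative `factor` used here is purely an assumption; only the direction of each update (increase for a T sample, decrease for an S sample, no change without an output error) follows the description above.

```python
def update_weights(weights, groups, misclassified, factor=2.0):
    """Update per-sample weights after one weak learner has been generated.

    weights: sample id -> current weight.
    groups: sample id -> 'T' or 'S'.
    misclassified: sample id -> True when the weak learner has an output error at time t.
    factor: hypothetical multiplicative step (> 1).
    """
    updated = {}
    for sample_id, weight in weights.items():
        if not misclassified[sample_id]:
            updated[sample_id] = weight            # No in S114: weight is left unchanged
        elif groups[sample_id] == "T":
            updated[sample_id] = weight * factor   # T sample with an output error: increase
        elif groups[sample_id] == "S":
            updated[sample_id] = weight / factor   # S sample with an output error: decrease
        else:
            updated[sample_id] = weight
    return updated


new_weights = update_weights(
    weights={0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0},
    groups={0: "T", 1: "T", 2: "S", 3: "S"},
    misclassified={0: True, 1: False, 2: True, 3: False},
)
print(new_weights)
```

The design mirrors the asymmetry in the text: an error on a T sample is treated as something the next weak learner should focus on, whereas an error on an S sample makes that sample count for less.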
More specifically, the weight update unit 16 may update the weights ωiT and ωiS in the manner illustrated in paragraphs 9 to 11 in
After executing the processing indicated in S114 to S115 for all the samples, the weight update unit 16 advances the processing to the next iteration m+1.
By repeating this M times, the generation unit 13a generates M weak learners T(m) (a first weak learner T(1), a second weak learner T(2), . . . , an M-th weak learner T(M)) and reliability αt(m) (first reliability αt(1), second reliability αt(2), . . . , M-th reliability αt(M)) associated to each weak learner.
Then, in S116, the learning model generation unit 17 of the generation unit 13a generates the t-th order learning model D*t by combining each of the generated M pieces of weak learners T(m) weighted by the associated reliability αt(m). Specifically, as illustrated in a paragraph 13 in
As described above, according to the second example embodiment, similarly to the first example embodiment, it is possible to increase training data used for generating a learning model, particularly for generating a weak learner. As a result, it is possible to generate a learning model that estimates, with high accuracy, an action at each time according to each individual subject.
In addition, in a process of generating a plurality of weak learners included in the learning model, the weight ωi indicating a degree of influence is updated in such a way that the T sample has a larger degree of influence on learning than the S sample. Therefore, the estimation accuracy of the learning model is improved.
Next, a third example embodiment of the present disclosure will be described. In the third example embodiment, when generating a plurality of weak learners included in a t-th order learning model D*t, a result of generation of a (t+1)-th order learning model D*t+1 is considered. Specifically, a generation unit 13a determines a weight ωiS indicating a degree of influence of an S sample in learning of a weak learner included in the t-th order learning model D*t, according to an amount by which the S sample is determined to be not optimal by using the (t+1)-th order learning model D*t+1.
Since a flow of a generation method of the t-th order learning model according to the third example embodiment is basically similar to steps illustrated in
First, in step S110, a weak learner generation unit 14 determines an initial value of the weight ωiS, based on an output error τi of the (t+1)-th order learning model D*t+1 with respect to observation data d_SG at time t of the S sample. The above-described output error τi corresponds to the above-described “amount determined to be not optimal”. Specifically, as illustrated in a paragraph 1 in
In addition, a weight update unit 16 also determines a weight reduction amount when the weight ωiS of the S sample is updated in S115, based on the output error τi. Specifically, as illustrated in a paragraph 10 in
In
Next, a fourth example embodiment of the present disclosure will be described. In the fourth example embodiment, when a generation unit 13a generates a t-th order learning model D*t, information acquired by subtracting a predetermined amount from an amount of an effect Y included in observation data at time t+1 is used as an effect at the time t+1 for an S sample. Thus, it is possible to explicitly teach, in generation of the t-th order learning model D*t, that the S sample is a sample being determined not to be optimal in a (t+1)-th order learning model D*t+1.
As illustrated in
λ is an adjustment parameter less than 1. By multiplying the effect Y at the time t+1 of the S sample by λ, an amount of the effect Y at the time t+1 can be reduced for the S sample. Meanwhile, for a T sample, the amount of the observed effect Y is used as is as an amount of an effect at time t+1. As a result, a learning model can be generated in consideration of the S sample.
Note that, the λ applied to the effect at time t+1 of the S sample may be determined for each sample, based on an output error τi of the (t+1)-th order learning model D*t+1 with respect to observation data at time t+1 of the sample. As one example, a weak learner generation unit 14 may assign λ=0.9 to an S sample having a small output error τi, and assign λ=0.5 to an S sample having a large output error τi. By doing so, the weak learner generation unit 14 can subtract a larger amount from the effect Y at the time t+1 for an S sample having a larger output error τi, that is, for an S sample being farther from an optimum.
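The per-sample discounting described above can be sketched in Python as follows. The values λ=0.9 and λ=0.5 come from the example in the text; the error threshold that separates a "small" from a "large" output error, as well as the function names, are hypothetical and introduced only for illustration. The sketch also assumes that the effect Y is non-negative, so that multiplying by λ actually reduces it.

```python
def lambda_for_error(tau: float, threshold: float = 1.0,
                     lam_small: float = 0.9, lam_large: float = 0.5) -> float:
    """Pick the adjustment parameter lambda from the output error tau.

    `threshold` is a hypothetical cut-off between "small" and "large"
    errors; the disclosure gives only the two example lambda values.
    """
    return lam_small if tau <= threshold else lam_large


def effect_at_next_time(y: float, is_source: bool, tau: float = 0.0,
                        threshold: float = 1.0) -> float:
    """Return the effect at time t+1 used when training the t-th order model.

    T samples keep the observed effect Y unchanged; S samples have Y
    scaled by lambda < 1, which is the same as subtracting (1 - lambda) * Y.
    """
    if not is_source:
        return y
    return lambda_for_error(tau, threshold) * y


if __name__ == "__main__":
    print(effect_at_next_time(10.0, is_source=False))           # T sample
    print(effect_at_next_time(10.0, is_source=True, tau=0.2))   # S sample, small error
    print(effect_at_next_time(10.0, is_source=True, tau=3.0))   # S sample, large error
```

A continuous mapping from τi to λ would serve equally well; the two-level rule is used here only because it mirrors the example values given above.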
Next, a fifth example embodiment of the present disclosure will be described. In the fifth example embodiment, a generation unit 13a uses cost-sensitive learning when generating a weak learner included in a learning model.
In the fifth example embodiment, the weak learner T(m) can be derived from the following expression (14) instead of the expression (8).
Herein, D*(Xi) represents an index of a largest element of a vector f(Xi)=β1g1(Xi)+β2g2(Xi)+ . . . +βMgM(Xi). In other words, D*(Xi)=argmax f(M)(Xi).
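The definition of D*(Xi) above can be illustrated with a short Python sketch. It assumes that each gm(Xi) returns a vector with one score per candidate action and that indices are counted from 0; the function name and the type aliases are introduced here for illustration only.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def ensemble_decision(x: object,
                      learners: Sequence[Callable[[object], Vector]],
                      betas: Sequence[float]) -> int:
    """Return the index of the largest element of
    f(x) = beta_1 * g_1(x) + ... + beta_M * g_M(x)."""
    if not learners or len(learners) != len(betas):
        raise ValueError("learners and betas must be non-empty and of equal length")
    total: Vector = []
    for g, beta in zip(learners, betas):
        scores = g(x)
        if not total:
            total = [0.0] * len(scores)
        # Accumulate the weighted score vector element by element.
        for k, s in enumerate(scores):
            total[k] += beta * s
    # argmax over the accumulated vector (0-based index).
    return max(range(len(total)), key=total.__getitem__)


if __name__ == "__main__":
    g1 = lambda x: [1.0, 0.0, 0.0]
    g2 = lambda x: [0.0, 1.0, 0.0]
    g3 = lambda x: [0.0, 1.0, 0.0]
    # Weighted sum is [0.5, 0.6, 0.0], so index 1 is selected.
    print(ensemble_decision(None, [g1, g2, g3], [0.5, 0.3, 0.3]))
```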
The expression (14) is different from the expression (8) in that a cost function C* (block 130 in
For example, the cost function C* when K=5 is given by the following equation (15). Note that, C*(p, q) represents an element of a p-th column and a q-th row of a matrix (cost matrix) indicating the cost function.
A non-diagonal component of the C* takes effect when there is an output error of the weak learner T(m), that is, when the weak learner T(m) makes an erroneous decision. Specifically, the non-diagonal component of the C* is set in such a way that the penalty becomes large when the output error is large.
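As a minimal sketch of a cost matrix with the property described above, the following Python code builds a K×K matrix whose diagonal is zero and whose non-diagonal elements grow with the distance between the two indices. The choice C*(p, q) = |p − q| is an assumption made here for illustration and is not the matrix of equation (15); the helper `weighted_cost` is likewise hypothetical and only shows how such a matrix could penalize erroneous decisions in proportion to sample weights.

```python
from typing import List, Sequence


def build_cost_matrix(k: int, scale: float = 1.0) -> List[List[float]]:
    """Build a K x K cost matrix: zero on the diagonal, and a penalty
    that increases with |p - q| off the diagonal (assumed form)."""
    return [[scale * abs(p - q) for q in range(k)] for p in range(k)]


def weighted_cost(weights: Sequence[float],
                  predictions: Sequence[int],
                  targets: Sequence[int],
                  cost: Sequence[Sequence[float]]) -> float:
    """Sum of per-sample costs, each scaled by the sample weight.

    Correct decisions (prediction == target) contribute nothing because
    the diagonal of the cost matrix is zero.
    """
    return sum(w * cost[pred][tgt]
               for w, pred, tgt in zip(weights, predictions, targets))


if __name__ == "__main__":
    c = build_cost_matrix(5)
    for row in c:
        print(row)
    # One correct decision and one decision that is off by three classes.
    print(weighted_cost([0.5, 0.5], [0, 4], [0, 1], c))
```

Because the assumed matrix is symmetric, the row and column convention for C*(p, q) does not affect the result; an asymmetric matrix would require the indices to follow the convention stated in the text.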
Since a flow of a generation method of the t-th order learning model D*t according to the fifth example embodiment is basically similar to steps illustrated in
In S111, a weak learner generation unit 14 generates the weak learner T(m) by using the expression (14) instead of the expression (8), as illustrated in a paragraph 5 in
Next, physical configurations of learning model generation apparatuses 10 and 10a and an estimation apparatus 30 included in a system 1 will be described.
The communication interface 1050 is an interface for connecting the computer 1000 and a communication network via a wired communication means, a wireless communication means, or the like. The user interface 1060 includes a display unit such as a display. In addition, the user interface 1060 includes an input unit such as a keyboard, a mouse, and a touch panel. Note that, the user interface 1060 is not essential.
The storage unit 1020 is an auxiliary storage apparatus capable of holding various types of data. The storage unit 1020 is not necessarily a part of the computer 1000, and may be an external storage apparatus, or may be a cloud storage connected to the computer 1000 via a network.
The ROM 1030 is a non-volatile storage apparatus. For example, a semiconductor memory apparatus such as a flash memory having a relatively small capacity is used for the ROM 1030. A program executed by the processor 1010 may be stored in the storage unit 1020 or the ROM 1030. The storage unit 1020 or the ROM 1030 stores various programs for achieving a function of each unit in the learning model generation apparatuses 10 and 10a or the estimation apparatus 30, for example.
The program can be stored and provided to the computer 1000 using any type of non-transitory computer readable media. Non-transitory computer readable media include any type of tangible storage media. Examples of non-transitory computer readable media include magnetic storage media (such as floppy disks, magnetic tapes, hard disk drives, etc.), optical magnetic storage media (e.g. magneto-optical disks), CD-ROM (compact disc read only memory), CD-R (compact disc recordable), CD-R/W (compact disc rewritable), and semiconductor memories (such as mask ROM, PROM (programmable ROM), EPROM (erasable PROM), flash ROM, RAM, etc.). The program may be provided to a computer using any type of transitory computer readable media. Examples of transitory computer readable media include electric signals, optical signals, and electromagnetic waves. Transitory computer readable media can provide the program to a computer via a wired communication line (e.g., electric wires and optical fibers) or a wireless communication line.
The RAM 1040 is a volatile storage apparatus. Various types of semiconductor memory apparatuses such as a dynamic random access memory (DRAM) and a static random access memory (SRAM) are used for the RAM 1040. The RAM 1040 may be used as an internal buffer for temporarily storing data and the like. The processor 1010 loads a program stored in the storage unit 1020 or the ROM 1030 into the RAM 1040, and executes the program. The processor 1010 may be a central processing unit (CPU) or a graphics processing unit (GPU). When the processor 1010 executes the program, the function of each unit in the learning model generation apparatuses 10 and 10a or the estimation apparatus 30 can be achieved. The processor 1010 may include an internal buffer capable of temporarily storing data and the like.
Note that, the present disclosure is not limited to the above-described example embodiments, and can be appropriately modified without departing from the spirit of the present disclosure.
It will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure as defined by the claims.
The first to fifth example embodiments can be combined as desired by one of ordinary skill in the art.
An example advantage according to the present disclosure is to provide a learning model generation apparatus, a learning model generation method, and a program that suitably generate a learning model for estimating an action plan tailored to a subject.
Some or all of the above-described example embodiments may be described as the following supplementary notes, but are not limited thereto.
(Supplementary Note 1)
A machine learning model generation apparatus including:
(Supplementary Note 2)
The machine learning model generation apparatus according to supplementary note 1, wherein the movement unit moves, after discarding a sample included in the source sample group, a sample having an output error of a (t+1)-th order machine learning model with respect to observation data at time t+1 being larger than a predetermined amount, to the source sample group.
(Supplementary Note 3)
The machine learning model generation apparatus according to supplementary note 1 or 2, wherein
(Supplementary Note 4)
The machine learning model generation apparatus according to supplementary note 3, wherein the generation unit
(Supplementary Note 5)
The machine learning model generation apparatus according to supplementary note 3 or 4, wherein the generation unit determines, for each sample included in the source sample group after the movement processing, at least one of an initial value of a weight of observation data of the sample, and a reduction amount of a weight when the weight of the sample is updated, based on an output error of a (t+1)-th order machine learning model with respect to observation data at time t+1 of the sample.
(Supplementary Note 6)
The machine learning model generation apparatus according to supplementary note 5, wherein the generation unit
(Supplementary Note 7)
The machine learning model generation apparatus according to any one of supplementary notes 1 to 6, wherein the observation data include an amount of an effect acquired by an action at a specific time in a sample having a state at the specific time until time T.
(Supplementary Note 8)
The machine learning model generation apparatus according to supplementary note 7, wherein the generation unit uses, when the plurality of weak learners is generated, information acquired by subtracting an amount according to an output error of a (t+1)-th order machine learning model from an amount of an effect included in observation data at time t+1, for each sample included in the source sample group after the movement processing, as an effect at time t+1 of the sample.
(Supplementary Note 9)
The machine learning model generation apparatus according to supplementary note 8, wherein the generation unit increases an amount of reduction as a sample has a larger output error of a (t+1)-th order machine learning model at time t+1 when the amount of the effect is reduced.
(Supplementary Note 10)
The machine learning model generation apparatus according to any one of supplementary notes 1 to 9, wherein the generation unit uses cost-sensitive learning when each of the plurality of weak learners is generated.
(Supplementary Note 11)
The machine learning model generation apparatus according to any one of supplementary notes 1 to 10, wherein the generation unit
(Supplementary Note 12)
A machine learning model generation method including:
(Supplementary Note 13)
A program for causing a computer to execute:
(Supplementary Note 14)
The machine learning model generation apparatus according to supplementary note 1, wherein the t-th order learning model outputs an action of a healthcare worker at the time t, the action having been optimized in order to maximize a treatment effect of a patient.
Number | Date | Country | Kind |
---|---|---|---|
2022-099982 | Jun 2022 | JP | national |