Embodiments of the present invention relate to a storage medium, a machine learning method, and an information processing device.
Conventionally, there is known a method of solving a classification task using a machine learning model generated by machine learning. A classification task determines, when input data is given, which category of a predefined set of categories the input data belongs to; examples include part-of-speech estimation, named entity extraction, and word sense determination for each word included in a sentence.
Furthermore, there is known a machine learning method called stacking, in which an output result of a first machine learning model for training data is used as an input for machine learning of a second machine learning model. Stacking is a method of ensemble learning, and it is generally known that the inference accuracy of a plurality of machine learning models stacked in this way is better than the inference accuracy of a single machine learning model.
In stacking, for example, machine learning of the second machine learning model may be executed so as to correct errors in the determination results of the first machine learning model. In one existing technique for generating the training data for the second machine learning model, the training data is divided into k subsets, a first machine learning model is generated using (k−1) of the subsets, and the determination results of that model are added to the one remaining subset. The training data of the second machine learning model is then generated by repeating this operation k times, changing each time the subset to which the determination results are added.
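For comparison, the existing k-fold technique described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular technique: the hypothetical `train_lookup_model` stands in for the first machine learning model by memorizing the majority label of each word, words it has not seen default to "O", and the subsets are formed by taking every k-th sentence. The point to notice is that the loop trains k separate first models.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Sentence = List[Tuple[str, str]]  # (word, correct label) pairs
Annotated = List[Tuple[str, str, str]]  # (word, correct label, first-model label)


def train_lookup_model(sentences: List[Sentence]) -> Dict[str, str]:
    """Hypothetical first model: the majority label of each word in the training data."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        for word, label in sentence:
            counts[word][label] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def kfold_stacking_data(data: List[Sentence], k: int) -> Tuple[List[Annotated], int]:
    """Existing technique: out-of-fold first-model labels for every sentence."""
    folds = [data[j::k] for j in range(k)]  # split the data into k subsets
    annotated: List[Annotated] = []
    models_trained = 0
    for j in range(k):
        # Train on the other (k - 1) subsets ...
        train = [s for m, fold in enumerate(folds) if m != j for s in fold]
        model = train_lookup_model(train)
        models_trained += 1
        # ... and add its determination results to the held-out subset.
        for sentence in folds[j]:
            annotated.append([(w, y, model.get(w, "O")) for w, y in sentence])
    return annotated, models_trained
```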
According to an aspect of the embodiments, a non-transitory computer-readable storage medium stores a machine learning program that causes a computer to execute a process, the process including: selecting a plurality of data from a first training data group based on an appearance frequency of first data to which a first label is attached, the first data being included in the first training data group; generating a first machine learning model by training with the plurality of data; and generating a second training data group obtained by combining the first training data group and an output of the first machine learning model when the first data is input.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
In the above-described existing technique, the processing has to be repeated k times while changing which of the k divided subsets receives the determination results, so k first machine learning models need to be created. This makes it difficult to perform machine learning efficiently.
In one aspect, an object is to provide a machine learning program, a machine learning method, and an information processing device capable of executing efficient machine learning.
According to one aspect, efficient machine learning can be executed.
Hereinafter, a machine learning program, a machine learning method, and an information processing device according to an embodiment will be described with reference to the drawings. Configurations having the same functions in the embodiments are denoted with the same reference numerals, and redundant description will be omitted. Note that the machine learning program, the machine learning method, and the information processing device described in the following embodiment are merely examples, and do not limit the embodiment. Furthermore, each of the embodiments below may be appropriately combined unless otherwise contradicted.
Note that the classification task is not limited to the above-described example, and may be part-of-speech estimation or word sense determination for words. Furthermore, the classification task may be any classification task that is solved using a machine learning model generated by machine learning. In addition to classification regarding words in a document, the classification task may classify the presence or absence of a body abnormality according to biological data such as blood pressure or heart rate, or may classify pass or fail of a target person (examinee) according to performance data such as the evaluation of each subject and the scores of midterm and final exams. Accordingly, the data included in the training data set used to generate the machine learning model (hereinafter referred to as cases) may be any learning targets suited to the classification task. For example, in a case of generating a machine learning model that classifies the presence or absence of a body abnormality, each case includes the biological data of a learning target, the correct answer (the presence or absence of a body abnormality) for that biological data, and the like.
In the training data set D, each case (for example, each word in a sentence) is given a correct label indicating the correct “named entity label” for that case. In the present embodiment, the first machine learning model M1 and the second machine learning model M2, each of which is, for example, a gradient boosting tree (GBT), a neural network, or the like, are generated by supervised learning using the training data set D.
Specifically, in the present embodiment, for each case included in the training data set D, the stability of determination by a machine learning model trained with the training data set D is estimated based on the frequency (appearance frequency) with which cases having the same content and the same correct label appear among all the cases (S1). The frequency may be an absolute frequency, a relative frequency, or a cumulative frequency. Furthermore, the stability of each case may be estimated based on a ratio calculated from the appearance frequency. Here, “cases with the same content” means identical data to which the same label is attached, and in the present embodiment, the stability is assumed to be estimated based on the appearance frequency of each such data.
The stability of determination for each case included in the training data set D means that the case can be determined stably by a machine learning model trained with the training data set D. For example, for a case that can be stably determined, it is estimated that the same determination result is obtained by the resulting machine learning model regardless of how the training data set D is divided for training in k-fold cross-validation. A case that can be stably determined corresponds to a case for which many cases with the same content and the same correct label exist in the training data set D, or a case for which the ambiguity of the classification destination category is low; its stability can therefore be estimated based on the appearance frequency of cases with the same content and the same correct label. Conversely, a case with an unstable determination result is a case for which a different determination result is presumed to be obtained depending on how the data is divided in k-fold cross-validation. Such a case corresponds to a case for which few cases with the same content exist in the training data set D, or a case for which the ambiguity of the classification destination category is high, so it too can be estimated based on the appearance frequency of cases with the same content and the same correct label.
In the present embodiment, based on the estimation result in S1, the training data set D is divided into a training data set D1, consisting of the cases selected as stably determinable, and a training data set D2 consisting of the remaining cases. Next, in the present embodiment, machine learning is performed using the data estimated to be stably determinable (the training data set D1) to generate the first machine learning model M1 (S2). Next, each data included in the training data set D is input to the first machine learning model M1, and the first determination result output by the first machine learning model M1 is added to the training data set D to generate a training data set D3 (S3). Then, machine learning is performed using the training data set D3 to generate the second machine learning model M2.
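The flow of S1 to S3 can be outlined in Python as follows. This is a hedged sketch, not the embodiment itself: the hypothetical `build_d3` keeps a sentence in D1 only if every (case, correct label) pair in it appears at least `f` times, which is a simplification of the score-based selection described later, and a majority-label lookup (`train_lookup_model`, also hypothetical) stands in for M1. Note that M1 is trained only once.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Sentence = List[Tuple[str, str]]  # (word, correct label) pairs
Annotated = List[Tuple[str, str, str]]  # (word, correct label, M1 label)


def train_lookup_model(sentences: List[Sentence]) -> Dict[str, str]:
    """Hypothetical stand-in for M1: the majority label of each word."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        for word, label in sentence:
            counts[word][label] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def build_d3(data: List[Sentence], f: int) -> List[Annotated]:
    # S1: appearance frequency of each (case, correct label) pair over all of D
    freq = Counter(pair for sentence in data for pair in sentence)
    # S1/S2: D1 = sentences whose pairs all appear at least f times (simplified)
    d1 = [s for s in data if all(freq[p] >= f for p in s)]
    m1 = train_lookup_model(d1)  # M1 is trained once
    # S3: add M1's determination result to every sentence of D (not only D1)
    return [[(w, y, m1.get(w, "O")) for w, y in s] for s in data]
```

In this sketch, a case excluded from D1 may receive a first determination result in D3 that differs from its correct label, which is the kind of error M2 is meant to learn to correct.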
When the first machine learning model M1, which is generated by machine learning using data estimated to be stably determinable based on the appearance frequency, infers the training data set D as input data, its first determination result is more likely to differ from the correct label (that is, to contain an error) for cases estimated to be unstable. Therefore, the training data set D3, obtained by adding the first determination result to the training data set D, is suitable for generating the second machine learning model M2, which outputs a final determination result that corrects errors of the first machine learning model M1.
In the existing example, in the process of generating the training data set D101 of the second machine learning model M102, the processing is repeated k times while changing which of the k divided subsets (D1001, . . . , D100k-1, D100k) is used, so that k first machine learning models M101 are created. In contrast, in the present embodiment, the training data set D3 of the second machine learning model M2 can be created without creating a plurality of first machine learning models M1, so machine learning can be executed efficiently. Furthermore, the amount of data to which correct labels have to be given is smaller than in a simple method that prepares a separate training data set for each of the first machine learning model M1 and the second machine learning model M2 in advance, which also makes machine learning efficient.
When noise is added in this way, a case whose determination result hardly changes even if noise is added can be stably determined by the first machine learning model M1, whereas a case whose determination result easily changes when noise is added has an unstable determination result from the first machine learning model M1. Therefore, by adding the first determination result of the first machine learning model M1 to the training data set D, it is possible to generate the training data set D3 for generating the second machine learning model M2 so as to correct errors in the determination result of the first machine learning model M1, and to improve the accuracy of the final determination result of the second machine learning model M2.
The input/output unit 10 serves as an input/output interface when the control unit 30 inputs/outputs various types of information. For example, the input/output unit 10 serves as an input/output interface with an input device such as a keyboard and a microphone connected to the information processing device 1 and a display device such as a liquid crystal display device. Furthermore, the input/output unit 10 also serves as a communication interface for data communication with an external device connected via a communication network such as a local area network (LAN).
For example, the information processing device 1 receives an input such as the training data set D via the input/output unit 10 and stores the input in the storage unit 20. Furthermore, the information processing device 1 reads first machine learning model information 21 and second machine learning model information 22 regarding the generated first machine learning model M1 and second machine learning model M2 from the storage unit 20, and outputs the read information to the outside via the input/output unit 10.
The storage unit 20 is implemented by, for example, a semiconductor memory element such as a random access memory (RAM) or a flash memory, or a storage device such as a hard disk drive (HDD). The storage unit 20 stores the training data set D, appearance frequency data Sf, entropy data Sh, self-information amount data Si, score data Sd, the training data set D3, the first machine learning model information 21, the second machine learning model information 22, and the like.
The training data set D is an aggregation of a plurality of training data, each consisting of pairs of a case as a learning target (for example, each word included in each of a plurality of sentences) and the correct label given to the case (for example, a “named entity label”). Note that each training data is assumed to be data in units of one sentence that includes a plurality of pairs of cases and correct labels.
The “named entity label” is “O”, “General”, or “Molecular”. “O” is a label indicating a word that is not (part of) a named entity. “General” is a label indicating a word that is (part of) a named entity of type “General”. “Molecular” is a label indicating a word that is (part of) a named entity of type “Molecular”. Note that, for “General” and “Molecular”, the first word of a named entity is assumed to be prefixed with “B-”, and the second and subsequent words with “I-”.
For example, in the training data set D in the illustrated example, the named entity of the type “General” is correct for a case of “solvent mixture”. Furthermore, the named entity of the type “Molecular” is correct for a case of “n-propyl bromide”.
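The “B-”/“I-” prefix convention can be illustrated with a small decoder that groups labeled words back into named entities. This is an illustrative sketch; the function name `decode_entities` and the handling of an “I-” label with no matching “B-” (treated here like “O”) are assumptions, not part of the described method.

```python
from typing import List, Optional, Tuple


def decode_entities(words: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group "B-"/"I-" prefixed labels into (entity text, entity type) spans."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Emit the entity collected so far, if any.
        if current and current_type is not None:
            entities.append((" ".join(current), current_type))

    for word, label in zip(words, labels):
        if label.startswith("B-"):        # first word of a named entity
            flush()
            current, current_type = [word], label[2:]
        elif label.startswith("I-") and current and label[2:] == current_type:
            current.append(word)          # continuation of the same entity
        else:                             # "O", or an I- label without a matching B-
            flush()
            current, current_type = [], None
    flush()
    return entities
```

For example, the hypothetical token sequence "n-propyl", "bromide", "in", "a", "solvent", "mixture" labeled "B-Molecular", "I-Molecular", "O", "O", "B-General", "I-General" decodes to the entities "n-propyl bromide" (Molecular) and "solvent mixture" (General).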
The control unit 30 has a first machine learning model generation unit 31, a training data generation unit 32, and a second machine learning model generation unit 33. The control unit 30 can be implemented by a central processing unit (CPU), a micro processing unit (MPU), or the like. Alternatively, the control unit 30 can be realized by hard-wired logic such as an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA).
The first machine learning model generation unit 31 is a processing unit that generates the first machine learning model M1 using the training data set D. Specifically, the first machine learning model generation unit 31 selects a plurality of cases from the training data set D based on the appearance frequency of cases with the same content and the same correct label in the training data set D. In this way, the first machine learning model generation unit 31 obtains the training data set D1, which consists of cases selected from the training data set D as stably determinable. Next, the first machine learning model generation unit 31 generates the first machine learning model M1 by machine learning using the plurality of cases included in the training data set D1. Next, the first machine learning model generation unit 31 stores the first machine learning model information 21 regarding the generated first machine learning model M1 in the storage unit 20.
The training data generation unit 32 is a processing unit that generates the training data set D3 for generating the second machine learning model M2. Specifically, the training data generation unit 32 constructs the first machine learning model M1 based on the first machine learning model information 21. Next, the training data generation unit 32 inputs each case included in the training data set D to the constructed first machine learning model M1, and adds the result output by the first machine learning model M1 to the training data set D to generate the training data set D3.
The second machine learning model generation unit 33 is a processing unit that generates the second machine learning model M2 using the training data set D3. Specifically, the second machine learning model generation unit 33 generates the second machine learning model M2 by machine learning using each case included in the training data set D3 and the determination result of the first machine learning model M1 for the case (the result output by the first machine learning model M1). Next, the second machine learning model generation unit 33 stores the second machine learning model information 22 regarding the generated second machine learning model M2 in the storage unit 20.
Here, the processing of the first machine learning model generation unit 31 and the training data generation unit 32 will be described in detail. First, the first machine learning model generation unit 31 performs training data stability determination processing (S10), in which it calculates, based on the appearance frequency of each case in the training data set D, a score indicating the stability of the determination result of each case, and obtains the training data set D1.
Specifically, the first machine learning model generation unit 31 stores an aggregation of the data IDs in the training data set D in a processing array (I) or the like (S21). Next, the first machine learning model generation unit 31 determines whether the data ID in the array (I) is empty (S22) and repeats processing of S23 to S25 until the data ID is determined to be empty (S22: Yes).
In a case where the data ID in the array (I) is determined not to be empty (S22: No), the first machine learning model generation unit 31 acquires one data ID from the array (I) and stores the acquired data ID in a processing variable (id) (S23). At this time, the first machine learning model generation unit 31 deletes the acquired data ID from the array (I). Next, the first machine learning model generation unit 31 acquires a pair of the case with the same content and the same correct label from the data corresponding to the variable (id) in the training data set D (S24), and updates the appearance frequency data Sf based on the acquired number (appearance frequency) (S25).
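The loop of S21 to S25 can be sketched as follows, assuming (hypothetically) that the training data set D is held as a dictionary from data ID to the list of (case, correct label) pairs of one sentence. `collections.Counter` plays the role of the appearance frequency data Sf; the relative-frequency helper is included only because the embodiment notes that a relative frequency may be used instead of an absolute one.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (case, correct label)


def appearance_frequency(dataset: Dict[int, List[Pair]]) -> Counter:
    """S21-S25: build the appearance frequency data Sf."""
    sf: Counter = Counter()
    ids = list(dataset)                    # S21: processing array (I) of data IDs
    while ids:                             # S22: repeat until (I) is empty
        data_id = ids.pop()                # S23: take one ID and delete it from (I)
        pairs = Counter(dataset[data_id])  # S24: identical pairs in this sentence
        sf.update(pairs)                   # S25: add their counts to Sf
    return sf


def relative_frequency(sf: Counter) -> Dict[Pair, float]:
    """Optional: relative instead of absolute appearance frequency."""
    total = sum(sf.values())
    return {pair: n / total for pair, n in sf.items()}
```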
In a case where the data ID in the array (I) is determined to be empty (S22: Yes), the first machine learning model generation unit 31 performs processing of calculating the entropy of each collected case and the self-information amount of each pair of a case and a correct label (S30).
Specifically, the first machine learning model generation unit 31 stores a case aggregation in the appearance frequency data Sf in a processing array (E) or the like (S31). Next, the first machine learning model generation unit 31 determines whether the case in the array (E) is empty (S32) and repeats processing of S33 to S35 until the case is determined to be empty (S32: Yes).
In a case where the case in the array (E) is determined not to be empty (S32: No), the first machine learning model generation unit 31 selects one case from the array (E) and stores the selected case in a processing variable (ex) (S33). At this time, the first machine learning model generation unit 31 deletes the selected case from the array (E). Next, the first machine learning model generation unit 31 searches the training data set D for cases corresponding to the variable (ex), and totals the number of those cases for each correct label (S34). Next, the first machine learning model generation unit 31 calculates the entropy and the self-information amount, as defined in information theory, for each pair of the case being processed and a correct label based on the totals of S34, and updates the entropy data Sh and the self-information amount data Si with the calculation results (S35).
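S31 to S35 can be sketched as below. The document only says that the entropy and self-information amount are those of information theory; this sketch assumes they are computed from the conditional label distribution of each case, p(y | x) = n(x, y) / n(x), with base-2 logarithms, so that the entropy of case x is H(x) = -Σ_y p(y | x) log2 p(y | x) and the self-information amount of the pair (x, y) is I(x, y) = -log2 p(y | x). The function name is hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (case, correct label)


def entropy_and_self_information(
    sf: Dict[Pair, int],
) -> Tuple[Dict[str, float], Dict[Pair, float]]:
    """S31-S35: entropy Sh per case and self-information Si per (case, label) pair."""
    label_counts: Dict[str, Counter] = defaultdict(Counter)
    for (case, label), n in sf.items():   # S34: totals per correct label
        label_counts[case][label] += n
    sh: Dict[str, float] = {}
    si: Dict[Pair, float] = {}
    for case, counts in label_counts.items():
        total = sum(counts.values())
        h = 0.0
        for label, n in counts.items():
            p = n / total                 # assumed p(label | case)
            si[(case, label)] = -math.log2(p)
            h -= p * math.log2(p)
        sh[case] = h                      # S35: update Sh and Si
    return sh, si
```

Under these assumptions, a case seen once with “B-General” and three times with “O” gets I = 2.0 bits for the minority pair and H ≈ 0.81 bits.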
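In a case where the case in the array (E) is determined to be empty (S32: Yes), the first machine learning model generation unit 31 proceeds to processing (S40) of determining the stability of each pair of a case and a correct label and calculating a score for each sentence.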
Specifically, the first machine learning model generation unit 31 stores the aggregation of data IDs in the training data set D in the processing array (I) or the like (S41). Next, the first machine learning model generation unit 31 determines whether the data ID in the array (I) is empty (S42) and repeats processing of S43 to S46 until the data ID is determined to be empty (S42: Yes).
In a case where the data ID in the array (I) is determined not to be empty (S42: No), the first machine learning model generation unit 31 acquires one data ID from the array (I) and stores the acquired data ID in the processing variable (id) (S43). At this time, the first machine learning model generation unit 31 deletes the acquired data ID from the array (I).
Next, the first machine learning model generation unit 31 acquires the pairs of cases and correct labels from the data corresponding to the variable (id) in the training data set D (S44). In other words, the first machine learning model generation unit 31 acquires, for the sentence of the data ID, each pair of a case with the same content and the same correct label. Next, based on the appearance frequency data Sf, the entropy data Sh, and the self-information amount data Si of each acquired pair, the first machine learning model generation unit 31 determines whether the determination for that pair of a case and a correct label is stable or unstable (S45).
For example, the first machine learning model generation unit 31 treats a pair of a correct label and a rare case whose appearance frequency in the training data set D is less than a threshold (f) as an unstable case. Alternatively, the first machine learning model generation unit 31 treats a pair of a correct label and a highly ambiguous case whose self-information amount is larger than a threshold (i) and whose entropy is less than a threshold (h) as an unstable case. Pairs of cases and correct labels that satisfy neither condition are treated as stable cases. Note that the thresholds (f), (i), and (h) used in this determination may be set arbitrarily by a user, for example.
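The determination rule of S45 can be written as a small predicate. The default thresholds f=4, i=1.0, and h=0.8 are taken from the example in this description; the function names and the representation of Sf, Sh, and Si as dictionaries are assumptions for illustration.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (case, correct label)


def is_unstable(freq: int, self_info: float, entropy: float,
                f: int = 4, i: float = 1.0, h: float = 0.8) -> bool:
    """S45: True if the (case, correct label) pair is treated as unstable."""
    rare = freq < f                          # rare case
    ambiguous = self_info > i and entropy < h  # highly ambiguous case
    return rare or ambiguous


def unstable_pairs(sf: Dict[Pair, int], sh: Dict[str, float], si: Dict[Pair, float],
                   f: int = 4, i: float = 1.0, h: float = 0.8) -> Set[Pair]:
    """Apply the rule to every pair in Sf; the entropy is looked up per case."""
    return {pair for pair, n in sf.items()
            if is_unstable(n, si[pair], sh[pair[0]], f, i, h)}
```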
As an example, when the thresholds are f=4, i=1.0, and h=0.8, the pair of “solvent mixture” and “General” is an unstable case in the illustrated appearance frequency data Sf.
Next, the first machine learning model generation unit 31 calculates the score indicating the stability of the data (sentence) corresponding to the variable (id) based on the stability/instability result determined for each case with the same content regarding the sentence of the data ID and each correct label, and adds the calculation result to the score data Sd (S46). For example, the first machine learning model generation unit 31 uses the number of unstable cases or a ratio of unstable cases to the total number as an index value, and calculates the score by performing weighting according to the index value.
In a case where the data ID in the array (I) is determined to be empty (S42: Yes), the first machine learning model generation unit 31 performs processing of setting a data set of remaining sentences obtained by excluding sentences with low stability as the training data set D1 for generating the first machine learning model M1 based on the score data Sd (S50).
Specifically, the first machine learning model generation unit 31 sorts the score data Sd and excludes unstable data (sentences) with low scores from the training data set D (S51). Next, the first machine learning model generation unit 31 outputs the remaining data set as the training data set D1 (S52) and terminates the processing. Note that, instead of excluding the unstable data (sentences) with low scores, the first machine learning model generation unit 31 may select and exclude only some of the cases included in a sentence (for example, the pairs of a case and a correct label determined to be unstable).
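S46 and S50 can be sketched as follows. The document leaves the weighting open, so this sketch assumes, hypothetically, that the score of a sentence is 1 minus the ratio of unstable pairs in it (higher means more stable), and that a fixed number `n_exclude` of the lowest-scoring sentences is removed.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (case, correct label)


def sentence_score(pairs: List[Pair], unstable: Set[Pair]) -> float:
    """S46: stability score of one sentence (assumed: 1 - ratio of unstable pairs)."""
    if not pairs:
        return 1.0
    n_unstable = sum(1 for p in pairs if p in unstable)
    return 1.0 - n_unstable / len(pairs)


def select_d1(dataset: Dict[int, List[Pair]], unstable: Set[Pair],
              n_exclude: int) -> Dict[int, List[Pair]]:
    """S51-S52: sort the score data Sd and drop the n_exclude least stable sentences."""
    sd = {data_id: sentence_score(pairs, unstable) for data_id, pairs in dataset.items()}
    ranked = sorted(sd, key=sd.get)        # ascending: least stable first
    excluded = set(ranked[:n_exclude])
    return {k: v for k, v in dataset.items() if k not in excluded}
```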
Note that the training data set D1 for generating the first machine learning model M1 may be selected from the training data set D by performing different processing (another selection method) for S30 and S40 described above.
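Specifically, the first machine learning model generation unit 31 sets the self-information amount of each collected pair of a case and a correct label as the initial value of the score representing the stability of that pair, and repeats the following procedure a prespecified number of times: calculating the score of each sentence as the sum of the scores of the pairs it contains, excluding the sentence with the maximum score from the training data set D, and lowering the score of each pair included in the excluded sentence. Next, the first machine learning model generation unit 31 sets the remaining training data set as the training data set D1 for the first machine learning model M1.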
In this another selection method, the first machine learning model generation unit 31 may repeat the processing until the maximum value among the scores of each sentence falls below a prespecified threshold, instead of repeating the processing the prespecified number of times.
In the above-described another selection method, lowering the score of the cases included in an excluded sentence makes other sentences containing the same cases less likely to be excluded. In other words, the same case tends to remain in both excluded and retained sentences. Note that, regarding the score calculation method, the self-information amount is divided by N+1 in the above example, but any calculation method may be used as long as the score decreases each time the case is excluded.
Specifically, the first machine learning model generation unit 31 stores the aggregation of data IDs in the training data set D in the processing array (I) or the like (S41). Next, the first machine learning model generation unit 31 determines whether the data ID in the array (I) is empty (S42) and repeats processing of S43 to S46a until the data ID is determined to be empty (S42: Yes).
In a case where the data ID in the array (I) is determined not to be empty (S42: No), the first machine learning model generation unit 31 acquires one data ID from the array (I) and stores the acquired data ID in the processing variable (id) (S43). At this time, the first machine learning model generation unit 31 deletes the acquired data ID from the array (I).
Next, the first machine learning model generation unit 31 acquires the pairs of cases and correct labels from the data corresponding to the variable (id) in the training data set D (S44). In other words, the first machine learning model generation unit 31 acquires, for the sentence of the data ID, each pair of a case with the same content and the same correct label. Next, the first machine learning model generation unit 31 obtains the score Si of each pair of a case and a correct label using the above-described score calculation method, and adds the sum of these scores to the score data Sd (S46a).
In a case where the data ID in the array (I) is determined to be empty (S42: Yes), the first machine learning model generation unit 31 excludes the data d having the maximum score in the score data Sd from the training data set D (S53). Next, the first machine learning model generation unit 31 updates the score Si corresponding to each pair of a case and a correct label in the excluded data d (S54), and determines whether the end condition of the above-described repetition is satisfied (S55).
In a case where the end condition of repetition (for example, the processing is repeated a prespecified number of times, the maximum value in the scores of the sentence falls below a prespecified threshold, or the like) is not satisfied (S55: No), the first machine learning model generation unit 31 returns the processing to S41. In a case where the end condition of repetition is satisfied (S55: Yes), the first machine learning model generation unit 31 outputs the remaining data set as the training data set D1 (S56) and terminates the processing.
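The alternative selection method (S41 to S56) can be sketched as a greedy loop. Assumptions, all hypothetical: a sentence's score is the sum of the scores of its (case, correct label) pairs, the score of a pair excluded N times becomes its self-information amount divided by N + 1 (one of the update rules the description allows), a pair is counted once per excluded sentence, and both end conditions (iteration count and a maximum-score threshold) are supported.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (case, correct label)


def iterative_select(dataset: Dict[int, List[Pair]], self_info: Dict[Pair, float],
                     max_iter: int, min_score: float = 0.0) -> Dict[int, List[Pair]]:
    """Greedy exclusion of the least stable sentence; returns the remaining D1."""
    remaining = dict(dataset)
    excluded_count: Counter = Counter()
    score = dict(self_info)                   # initial value = self-information amount
    for _ in range(max_iter):                 # end condition 1: iteration count
        if not remaining:
            break
        # S46a: sentence score = sum of its pair scores (score data Sd)
        sd = {d: sum(score[p] for p in pairs) for d, pairs in remaining.items()}
        least_stable = max(sd, key=sd.get)
        if sd[least_stable] < min_score:      # end condition 2: max score below threshold
            break
        for pair in set(remaining.pop(least_stable)):  # S53: exclude sentence d
            excluded_count[pair] += 1                 # S54: lower the pair's score
            score[pair] = self_info[pair] / (excluded_count[pair] + 1)
    return remaining                          # S56: remaining data set is D1
```

Because the score of a pair drops after its first exclusion, a second sentence containing the same pair is less likely to be removed, so the same case can survive in D1.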
Next, the first machine learning model generation unit 31 calculates and sorts the score of each data item of Dk based on the application result (S63). Next, the first machine learning model generation unit 31 compares, for each training data item, the result of each stability determination method (selection method in S10) with the score, and scores the degree of matching (S64). Next, the first machine learning model generation unit 31 adopts the result of the selection method with the highest degree of matching among the plurality of selection methods performed in S10 (S65).
Next, the training data generation unit 32 constructs the first machine learning model M1 based on the first machine learning model information 21, inputs each case included in the training data set D to the constructed first machine learning model M1, and adds the determination result output by the first machine learning model M1 to the training data set D (S13). In this way, the training data generation unit 32 generates the training data set D3.
Here, a case of adding noise when generating the training data set D3 of the second machine learning model M2 will be described.
In a case where the data ID in the array (I) is determined not to be empty (S82: No), the training data generation unit 32 acquires one data ID from the array (I) and stores the acquired data ID in the processing variable (id) (S83). At this time, the training data generation unit 32 deletes the acquired data ID from the array (I).
Next, the training data generation unit 32 applies the first machine learning model M1 to the data corresponding to the variable (id) in the training data set D (S84). Next, the training data generation unit 32 randomly changes the score of each label assigned to each word (case) with respect to the determination result obtained from the first machine learning model M1 (S85). Next, the training data generation unit 32 determines the label to be assigned to each word based on the score after the change (S86).
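S85 and S86 can be sketched as follows. The description says only that the scores are changed randomly; this sketch assumes additive Gaussian noise with standard deviation `sigma` and then picks the highest-scoring label per word. The score format (one dictionary of label scores per word) and the function name are also assumptions.

```python
import random
from typing import Dict, List


def noisy_labels(scores: List[Dict[str, float]], sigma: float,
                 rng: random.Random) -> List[str]:
    """Perturb each word's label scores (S85), then take the best label (S86)."""
    labels = []
    for word_scores in scores:
        # S85: assumed Gaussian noise added to every label score of this word
        noisy = {lab: s + rng.gauss(0.0, sigma) for lab, s in word_scores.items()}
        # S86: assign the label with the highest score after the change
        labels.append(max(noisy, key=noisy.get))
    return labels
```

With `sigma=0.0` the result reduces to the ordinary determination result of M1; larger values flip labels more often, mainly for words whose top two scores are close.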
In a case where the data ID in the array (I) is determined not to be empty (S82: No), the training data generation unit 32 acquires one data ID from the array (I) and stores the acquired data ID in the processing variable (id) (S83). At this time, the training data generation unit 32 deletes the acquired data ID from the array (I).
Next, the training data generation unit 32 applies the first machine learning model M1 to the data corresponding to the variable (id) in the training data set D (S84). Next, the training data generation unit 32 converts the score of each label assigned to each word (case) in the determination result obtained from the first machine learning model M1 into a probability value (S85a). Specifically, each score is converted into a probability value such that a label with a higher score is more likely to be selected. Next, the training data generation unit 32 determines the label to be assigned to each word based on the converted probability values (S86a).
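One common way to turn scores into probabilities so that a higher score is more likely to be selected is a softmax, shown below as an assumption (the description does not name the conversion). The `temperature` parameter and the function names are hypothetical; labels are then drawn with `random.Random.choices`.

```python
import math
import random
from typing import Dict, List


def softmax(word_scores: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """S85a (assumed softmax): higher score -> higher probability."""
    m = max(word_scores.values())  # subtract the max for numerical stability
    exp = {lab: math.exp((s - m) / temperature) for lab, s in word_scores.items()}
    z = sum(exp.values())
    return {lab: e / z for lab, e in exp.items()}


def sampled_labels(scores: List[Dict[str, float]], rng: random.Random,
                   temperature: float = 1.0) -> List[str]:
    """S86a: draw each word's label according to its probability."""
    labels = []
    for word_scores in scores:
        probs = softmax(word_scores, temperature)
        labs = list(probs)
        weights = [probs[lab] for lab in labs]
        labels.append(rng.choices(labs, weights=weights, k=1)[0])
    return labels
```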
In a case where the data ID in the array (I) is determined not to be empty (S82: No), the training data generation unit 32 acquires one data ID from the array (I) and stores the acquired data ID in the processing variable (id) (S83). At this time, the training data generation unit 32 deletes the acquired data ID from the array (I).
Next, the training data generation unit 32 randomly selects some words of the data corresponding to the variable (id) in the training data set D, and replaces the selected words with other words (S84a). Note that the word to be replaced may be selected randomly from the data, or may be selected based on the certainty (score) of the estimation result. Furthermore, the replacement word may be any word. Alternatively, the selected word may be replaced with a synonym or related word found using a synonym/related word dictionary, or with a word selected using a distributed representation of words.
Next, the training data generation unit 32 applies the first machine learning model M1 to the data after replacement (S84b) and determines the label to be assigned to each word based on the determination result obtained from the first machine learning model M1 (S84c).
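S84a to S84c can be sketched as below, assuming a hypothetical synonym dictionary, a replacement rate `rate`, and M1 modeled as any callable that returns one label per word. Because each replacement keeps the number of words unchanged, the labels from M1 still line up with the original words of the sentence.

```python
import random
from typing import Callable, Dict, List

FirstModel = Callable[[List[str]], List[str]]  # hypothetical: one label per word


def replace_words(words: List[str], synonyms: Dict[str, List[str]],
                  rate: float, rng: random.Random) -> List[str]:
    """S84a: randomly replace some words with a synonym/related word."""
    out = []
    for w in words:
        if w in synonyms and rng.random() < rate:
            out.append(rng.choice(synonyms[w]))
        else:
            out.append(w)
    return out


def labels_after_replacement(words: List[str], synonyms: Dict[str, List[str]],
                             rate: float, m1: FirstModel,
                             rng: random.Random) -> List[str]:
    """S84b-S84c: apply M1 to the replaced sentence and keep its labels."""
    replaced = replace_words(words, synonyms, rate, rng)
    return m1(replaced)
```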
As described above, the information processing device 1 has the control unit 30 that executes the processing related to the first machine learning model generation unit 31 and the training data generation unit 32. The first machine learning model generation unit 31 selects a plurality of cases from the training data set D based on the appearance frequency of each case included in the training data set D. Furthermore, the first machine learning model generation unit 31 generates the first machine learning model M1 by machine learning using the plurality of selected cases. The training data generation unit 32 generates the training data set D3 obtained by combining the training data set D and the result output by the first machine learning model M1 when each case included in the training data set D is input. Furthermore, the control unit 30 executes the processing related to the second machine learning model generation unit 33, which generates the second machine learning model M2 using the training data set D3. In a classification task of classifying data to be classified, the control unit 30 inputs the data to be classified to the first machine learning model M1 and obtains the output result of the first machine learning model M1. Next, the control unit 30 inputs the output result of the first machine learning model M1 to the second machine learning model M2 and obtains the classification result from the second machine learning model M2. Therefore, a classification result more accurate than that of a single machine learning model can be obtained.
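The two-stage inference described above can be sketched as follows. Since M2 is trained on each case together with the determination result of M1, this sketch assumes, hypothetically, that M2 receives (word, M1 label) pairs; both models are represented as plain callables rather than any specific library's API, and the name `classify` is illustrative.

```python
from typing import Callable, List, Tuple

FirstModel = Callable[[List[str]], List[str]]                 # words -> labels
SecondModel = Callable[[List[Tuple[str, str]]], List[str]]    # (word, M1 label) -> labels


def classify(words: List[str], m1: FirstModel, m2: SecondModel) -> List[str]:
    """Stacked inference: M1's output is passed, with the input, to M2."""
    first = m1(words)                       # first determination result
    return m2(list(zip(words, first)))      # final determination result
```

In practice M1 and M2 would be the trained models described in the embodiment (for example, GBTs or neural networks); here any callables with matching shapes can be plugged in.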
Thus, because the information processing device 1 generates the first machine learning model M1 by machine learning using the plurality of cases selected based on the appearance frequency of each case included in the training data set D, the first machine learning model M1 does not need to be generated k times, once per subset, when the training data set D3 for training the second machine learning model M2 is generated. Therefore, the information processing device 1 can efficiently generate the training data set D3 for training the second machine learning model M2 and can execute machine learning efficiently.
Furthermore, the first machine learning model generation unit 31 excludes, from the selection targets, the cases in the training data set D whose appearance frequency is less than the threshold. In this way, the information processing device 1 generates the first machine learning model M1 after excluding, from the selection targets, the cases in the training data set D whose appearance frequency is less than the threshold and whose determination result by the first machine learning model M1 is estimated to be unstable. Consequently, when each case included in the training data set D is input, the first machine learning model M1 is more likely to output a result different from the correct label of the training data set D for the cases estimated to be unstable. Therefore, the information processing device 1 can generate the training data set D3 with which the second machine learning model M2 is trained to correct errors in the determination result of the first machine learning model M1, and can improve the accuracy of the final determination result by the second machine learning model M2.
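A minimal sketch of the frequency-threshold selection is shown below. It assumes that a case is a `(word, label)` pair and that its appearance frequency is the number of times that exact pair occurs in D; counting by word alone would be an equally plausible reading. The function name `select_by_frequency` is hypothetical.

```python
from collections import Counter


def select_by_frequency(cases, threshold):
    """Drop cases whose appearance frequency in D is below `threshold`.

    Assumption: a case is a (word, label) pair and its appearance
    frequency is the number of times that exact pair occurs in D.
    """
    freq = Counter(cases)
    return [case for case in cases if freq[case] >= threshold]
```

For example, with `threshold=2`, a `("bank", "VERB")` case that occurs once is left out of M1's training data, so M1's output for it in D3 is more likely to disagree with the correct label.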
Furthermore, the first machine learning model generation unit 31 calculates the entropy and the self-information amount of each case based on the appearance frequency, and excludes, from the selection targets, the cases in the training data set D whose self-information amount is larger than the threshold and whose entropy is less than the threshold. In this way, the information processing device 1 generates the first machine learning model M1 after excluding, from the selection targets, the cases in the training data set D that have a self-information amount larger than the threshold and an entropy less than the threshold and whose determination result by the first machine learning model M1 is estimated to be unstable. Consequently, when each case included in the training data set D is input, the first machine learning model M1 is more likely to output a result different from the correct label of the training data set D for the cases estimated to be unstable. Therefore, the information processing device 1 can generate the training data set D3 with which the second machine learning model M2 is trained to correct errors in the determination result of the first machine learning model M1, and can improve the accuracy of the final determination result by the second machine learning model M2.
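The entropy-and-self-information rule can be sketched as follows. The definitions here are assumptions for illustration, since this passage does not spell them out: the self-information of a word w is taken as I(w) = -log2(count(w) / N), where N is the number of cases, and its entropy as H(w) = -Σ p(y|w) log2 p(y|w) over the labels y observed with w. Separate thresholds are used for the two quantities. The names `information_stats` and `select_by_information` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def information_stats(cases):
    """Return {word: (self_information, entropy)} in bits.

    Assumed definitions (hypothetical):
      I(w) = -log2(count(w) / N), N = number of cases in D
      H(w) = -sum_y p(y|w) * log2 p(y|w) over labels seen with w
    """
    n = len(cases)
    labels = defaultdict(Counter)
    for word, label in cases:
        labels[word][label] += 1
    stats = {}
    for word, counter in labels.items():
        total = sum(counter.values())
        self_info = -math.log2(total / n)
        entropy = -sum((c / total) * math.log2(c / total) for c in counter.values())
        stats[word] = (self_info, entropy)
    return stats


def select_by_information(cases, info_threshold, entropy_threshold):
    """Exclude cases whose word has I(w) > info_threshold and H(w) < entropy_threshold."""
    stats = information_stats(cases)
    return [
        (w, y) for w, y in cases
        if not (stats[w][0] > info_threshold and stats[w][1] < entropy_threshold)
    ]
```

With five cases in which "zyx" appears once with a single label, I("zyx") = log2 5 ≈ 2.32 bits and H("zyx") = 0. Thresholds of 2.0 and 0.5 therefore exclude only that case.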
Furthermore, the training data generation unit 32 generates the training data set D3 for the second machine learning model M2 by combining the training data set D with the result output by the first machine learning model M1 when each case of a data set, obtained by changing part of the content of each case included in the training data set D, is input. Changing part of the content of each case in this way adds noise to the training data set D. As a result, for cases whose determination result by the first machine learning model M1 is likely to change, the first machine learning model M1 is more likely to output a result different from the correct label of the training data set D. Therefore, the information processing device 1 can generate the training data set D3 with which the second machine learning model M2 is trained to correct errors in the determination result of the first machine learning model M1, and can improve the accuracy of the final determination result by the second machine learning model M2.
Furthermore, the training data generation unit 32 adds noise at a specific ratio to the result output by the first machine learning model M1 to generate the training data set D3. In this way, the information processing device 1 can also generate the training data set D3 with which the second machine learning model M2 is trained to correct errors in the determination result of the first machine learning model M1.
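Adding noise at a specific ratio to M1's output can be sketched as label substitution. This is an assumption: the passage states only that noise is added at a specific ratio, not what form the noise takes. Here, `round(len(predicted) * ratio)` positions are chosen at random, and each one receives a different label drawn uniformly from a hypothetical `label_set`.

```python
import random


def add_label_noise(predicted, label_set, ratio, rng):
    """Replace `ratio` of M1's predicted labels with a different label.

    Assumption: noise means substituting a label drawn uniformly from
    `label_set`, excluding the current label at that position.
    """
    noisy = list(predicted)  # leave M1's original output untouched
    k = round(len(noisy) * ratio)
    for i in rng.sample(range(len(noisy)), k):
        noisy[i] = rng.choice([y for y in label_set if y != noisy[i]])
    return noisy
```

Because the positions are sampled without replacement and each substituted label differs from the original, exactly `k` labels change. This keeps the noise rate fixed rather than merely expected.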
Furthermore, the control unit 30 executes the processing regarding the second machine learning model generation unit 33. The second machine learning model generation unit 33 generates the second machine learning model M2 by machine learning based on the generated training data set D3. Therefore, the information processing device 1 can generate the second machine learning model M2 from the generated training data set D3.
Furthermore, each case included in the training data set D is a word included in one of a plurality of supervised sentences. Therefore, the information processing device 1 can efficiently generate the training data set D3 for generating the second machine learning model M2, which outputs as the final result the part of speech, named entity, word sense, or the like of each word contained in a sentence.
Note that each of the illustrated components in each of the devices does not necessarily have to be physically configured as illustrated in the drawings. In other words, specific modes of distribution or integration of the individual devices are not limited to those illustrated, and all or a part of the devices may be functionally or physically distributed or integrated in any unit depending on various loads, use situations, and the like.
Furthermore, all or any part of the various processing functions executed by the information processing device 1 may be executed by a CPU (or a microcomputer such as an MPU or a micro controller unit (MCU)) or a graphics processing unit (GPU). Furthermore, it goes without saying that all or any part of the various processing functions may be executed by a program analyzed and executed by a CPU (or a microcomputer such as an MPU or MCU) or a GPU, or by hardware using wired logic. Furthermore, the various processing functions performed by the information processing device 1 may be executed by a plurality of computers in cooperation through cloud computing.
Meanwhile, the various types of processing described in the above embodiments may be implemented by executing a program prepared in advance on a computer. Thus, an example of a computer (hardware) that executes a program having functions similar to those of the above-described embodiments will be described below.
As illustrated in
The hard disk device 209 stores a program 211 for executing various types of processing in the first machine learning model generation unit 31, the training data generation unit 32, the second machine learning model generation unit 33, and the like in the control unit 30 described in the above-described embodiments. Furthermore, the hard disk device 209 stores various types of data 212 such as the training data set D that the program 211 refers to. The input device 202 accepts, for example, an input of operation information from an operator. The monitor 203 displays, for example, various screens operated by the operator. For example, the interface device 206 is connected to a printing device or the like. The communication device 207 is connected to a communication network such as a local area network (LAN) and exchanges various types of information with an external device via the communication network.
The CPU 201 or GPU 201a performs the various types of processing related to the first machine learning model generation unit 31, the training data generation unit 32, the second machine learning model generation unit 33, and the like by reading the program 211 stored in the hard disk device 209, expanding the program 211 in the RAM 208, and executing the program 211. Note that the program 211 does not have to be stored in the hard disk device 209. For example, the program 211 stored in a storage medium readable by the computer 200 may be read and executed. The storage medium readable by the computer 200 corresponds to, for example, a portable recording medium such as a compact disc read only memory (CD-ROM), a digital versatile disc (DVD), or a universal serial bus (USB) memory, a semiconductor memory such as a flash memory, a hard disk drive, or the like. Alternatively, the program 211 may be stored in a device connected to a public line, the Internet, a LAN, or the like, and the computer 200 may read the program 211 from the device and execute it.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
This application is a continuation application of International Application PCT/JP2020/027411 filed on Jul. 14, 2020 and designated the U.S., the entire contents of which are incorporated herein by reference.
Relationship | Number | Date | Country
---|---|---|---
Parent | PCT/JP2020/027411 | Jul 2020 | US
Child | 18060188 | | US