This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2022-079842, filed on May 13, 2022, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a machine learning program, a machine learning method, and an information processing device.
In recent years, data classification techniques using machine learning have been developed. In an example, a document classification system is known. The document classification system classifies documents into a plurality of fields (classes) according to their content by applying natural language processing based on machine learning.
During training of a classifier (model) in supervised learning, supervised data is created in which target data and a ground truth indicating a class to which the target data belongs are paired. The classifier is trained using the supervised data as training data. During inference, when data to be determined is input, the classifier calculates a probability that the data belongs to each class. The classifier may output, as a determination label, the class with the highest probability.
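The class determination described above can be sketched as follows. This is an illustrative sketch only, not the embodiment's actual implementation, and the function and variable names are hypothetical.

```python
# Illustrative sketch (hypothetical names): a trained classifier yields a
# probability for each class; the class with the highest probability is
# output as the determination label.
def determination_label(class_probs):
    """Return the class whose predicted probability is highest."""
    return max(class_probs, key=class_probs.get)

probs = {"society": 0.62, "science": 0.31, "economy": 0.07}
print(determination_label(probs))  # -> society
```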
There are some cases where the training data becomes obsolete due to changes in the ground truth for the target data due to changes in current affairs or the like. In an example, in a case of classifying a sentence related to “virus mutation”, there are some cases where the ground truth is “science” during creation of existing training data, but the ground truth is “society” during creation of subsequent new training data.
However, recreating all the existing training data into new training data in accordance with the changes in current affairs or the like increases a burden on an operator. Therefore, in the past, retraining has been performed by sequentially adding new training data to the existing training data.
Japanese Laid-open Patent Publication No. 2020-160543 is disclosed as related art.
According to an aspect of the embodiments, a non-transitory computer-readable recording medium stores a machine learning program for causing a computer to execute a process, the process including: determining a similar range for second training data in a case where a determination label, inferred by inputting the second training data to a classifier machine-learned by using a first training data group that includes a plurality of first training data, differs from a ground truth of the second training data; creating a second training data group by removing at least the first training data included in the similar range from the plurality of first training data; and newly performing machine learning of the classifier using the second training data group.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
According to the method of performing retraining by adding new supervised data to existing supervised data, there is a possibility that obsolete existing supervised data temporarily remains. When existing supervised data that is similar to new supervised data remains and the two have different ground truths, classification accuracy decreases. Therefore, as long as obsolete training data remains, it may be difficult to suppress the decrease in the classification accuracy.
Hereinafter, embodiments of techniques capable of suppressing a decrease in data classification accuracy due to obsolescence of training data will be described with reference to the drawings. Note that the embodiments to be described below are merely examples, and there is no intention to exclude application of various modifications and techniques not explicitly described in the embodiments. For example, the present embodiments may be variously modified and performed without departing from the gist thereof. Furthermore, each drawing is not intended to include only the configuration elements illustrated therein, and may include other functions and the like.
As illustrated in
The processor 11 controls the entire information processing device 1. The processor 11 is an example of a control unit. The processor 11 may be a multiprocessor. The processor 11 may be, for example, any one of a central processing unit (CPU), a micro processing unit (MPU), a digital signal processor (DSP), an application specific integrated circuit (ASIC), a programmable logic device (PLD), a field programmable gate array (FPGA), and a graphics processing unit (GPU). Furthermore, the processor 11 may be a combination of two or more types of elements of the CPU, MPU, DSP, ASIC, PLD, FPGA, and GPU.
The processor 11 executes a control program (machine learning program 13a or training data generation program 13b) to implement a function of a training processing unit 100 illustrated in
The information processing device 1 executes the machine learning program 13a, the training data generation program 13b, and an operating system (OS) program that are programs recorded in a computer-readable non-transitory recording medium, for example, to implement the function as the training processing unit 100.
The program in which processing content to be executed by the information processing device 1 is described may be recorded in various recording media. For example, the machine learning program 13a or the training data generation program 13b to be executed by the information processing device 1 can be stored in the storage device 13. The processor 11 loads at least part of the machine learning program 13a or the training data generation program 13b in the storage device 13 into the memory 12 and executes the loaded program.
Furthermore, the machine learning program 13a or the training data generation program 13b to be executed by the information processing device 1 (processor 11) can also be recorded in a non-transitory portable recording medium such as an optical disc 16a, a memory device 17a, or a memory card 17c. The program stored in the portable recording medium becomes executable after being installed in the storage device 13 under the control of the processor 11, for example. Furthermore, the processor 11 can also read and execute the machine learning program 13a or the training data generation program 13b directly from the portable recording medium.
The memory 12 is a storage memory including a read only memory (ROM) and a random access memory (RAM). The RAM of the memory 12 is used as a main storage device of the information processing device 1. The RAM temporarily stores at least a part of the OS program and the control program to be executed by the processor 11. Furthermore, the memory 12 stores various types of data needed for processing by the processor 11.
The storage device 13 is a storage device such as a hard disk drive (HDD), a solid state drive (SSD), or a storage class memory (SCM), and stores various types of data. The storage device 13 is used as an auxiliary storage device of the information processing device 1. The storage device 13 stores the OS program, the control program, and various types of data. The control program includes the machine learning program 13a or the training data generation program 13b.
A semiconductor storage device such as an SCM or a flash memory may be used as the auxiliary storage device. Furthermore, redundant arrays of inexpensive disks (RAID) may be configured using a plurality of the storage devices 13.
Furthermore, the storage device 13 may store various types of training data (supervised data) to be described below and various types of data generated when each processing is executed.
The graphic processing device 14 is connected to a monitor 14a. The graphic processing device 14 displays an image on a screen of the monitor 14a in accordance with an instruction from the processor 11. Examples of the monitor 14a include a display device using a cathode ray tube (CRT), a liquid crystal display device, and the like.
The input interface 15 is connected to a keyboard 15a and a mouse 15b. The input interface 15 transmits signals sent from the keyboard 15a and the mouse 15b to the processor 11. Note that the mouse 15b is an example of a pointing device, and another pointing device may be used. Examples of other pointing devices include a touch panel, a tablet, a touch pad, a trackball, and the like.
The optical drive device 16 reads data recorded in the optical disc 16a by using laser light or the like. The optical disc 16a is a non-transitory portable recording medium having data recorded in a readable manner by reflection of light. Examples of the optical disc 16a include a digital versatile disc (DVD), a DVD-RAM, a compact disc read only memory (CD-ROM), a CD-recordable (R)/rewritable (RW), and the like.
The device connection interface 17 is a communication interface for connecting a peripheral device to the information processing device 1. For example, the device connection interface 17 may be connected to the memory device 17a or a memory reader/writer 17b. The memory device 17a is a non-transitory recording medium equipped with a communication function with the device connection interface 17, for example, a universal serial bus (USB) memory. The memory reader/writer 17b writes data to the memory card 17c or reads data from the memory card 17c. The memory card 17c is a card-type non-transitory recording medium.
The network interface 18 is connected to a network (not illustrated). The network interface 18 may be connected to another information processing device, a communication device, and the like via the network. For example, data such as an input sentence may be input via the network.
The training processing unit 100 implements training processing (training) in machine learning using the training data. For example, the information processing device 1 functions as a training device that trains a machine learning model of a classifier 110 by the training processing unit 100.
The training processing unit 100 includes a training data update unit 120.
A ground-truth-labeled sentence collection unit 20 is a device that acquires training data to be used for training the classifier 110. The training data may be supervised data in which target data and a ground truth indicating classification (class) to which the target data belongs are paired.
In the present example, the training data includes an existing training data group 21. The classifier 110 is machine-learned using the existing training data group 21. Second training data 22 is added to the existing training data group 21 in order to suppress obsolescence of the existing training data group 21 due to changes in current affairs or the like. The second training data 22 is new training data added to the existing training data group 21.
The training data update unit 120 updates the existing training data group 21 by deleting some data of the existing training data group 21. The training data update unit 120 adds the second training data 22 to the existing training data group 21.
The existing training data group 21 before addition of the second training data 22 and before update is referred to as a “first training data group 211”. The existing training data group 21 after addition of the second training data 22 and after update is referred to as a “second training data group 212”. The second training data group 212 includes the added second training data 22.
During inference, the classifier 110 classifies the input data into a plurality of classes according to the content. The training processing unit 100 implements the training (machine learning) of the classifier 110 during training.
The classifier 110 may be a document classifier that classifies input sentence data into a plurality of fields according to the content.
In
The classifier 110 of
The input layer 112 is given by an n × m matrix corresponding to the number n of dimensions (hidden dimensions) of the hidden layer 114 and the number m of word strings (word string direction). The transformer 113 machine-learns weighting factors so as to classify data into a set ground truth 117. The hidden layer 114 outputs a semantic vector of the input data. A semantic vector is an example of a feature map vector.
The output layer 115 calculates a probability that the input data belongs to each classification (class). In the example of
Note that the configuration of the classifier 110 is not limited to that in
The classifier 110 performs machine learning by adjusting the weighting factors of the transformer 113, the hidden layer 114, and the like such that an error between the determination label 116 by the classifier 110 and the ground truth 117 added to the first training data group 211 becomes small.
The training data update unit 120 may include a new data adding unit 121, a comparison unit 122, and an existing data update unit 123.
The new data adding unit 121 adds the second training data 22 as the new training data to the existing training data group 21 such as the first training data group 211. As a result, the existing training data group 21 is updated from the first training data group 211 to the second training data group 212. The number of second training data 22 to be added is N and may be predetermined. By adding the second training data 22, the new data adding unit 121 prevents obsolescence of the existing training data group 21 due to changes in current affairs or the like.
The second training data 22 (#10, #11, and #12 in
A semantic vector 23 and a determination result are obtained as the second training data 22 is input to the classifier 110 trained using the first training data group 211. The semantic vector 23 is not a word-based semantic vector but a sentence semantic vector. The semantic vector 23 may be represented by values of a plurality of components 1 to 4. The number of components may be appropriately determined. In an example, the number of components is several hundred. The determination result includes the determination label 116.
Returning to
In the data #11 illustrated in
Description will be given taking a case where #7 and #11 are sentences related to virus mutation in
The existing data update unit 123 illustrated in
The similar range determination unit 124 determines a similar range for the different data 221. In
The similar range may be a range on a vector space that satisfies a predetermined relationship with a feature map vector (for example, the semantic vector 23) obtained by vectorizing the different data 221. The similar range will be described with reference to
In
An old classification plane means a boundary plane where the label “society” and the label “science” are distinguished by the classifier 110 trained with the first training data group 211. A new classification plane means a boundary plane where the label “society” and the label “science” are distinguished by the classifier 110 trained with the second training data group 212.
In
The equivalent data 222 most similar to the different data 221 (N1) is N3. The similar range 130a for the different data 221 (N1) may be determined to be narrower as the similarity between the different data 221 (N1) and any piece of the plurality of equivalent data 222 (N3 and N4) becomes higher. The closer the distance in the vector space, the higher the similarity.
The similar range 130a may be determined based on a, which is the maximum value of the similarities between the different data 221 (N1) and each piece of the plurality of equivalent data 222 (N3 and N4). Similarly, a similar range 130b may be determined based on a, which is the maximum value of the similarities between the different data 221 (N2) and each piece of the plurality of equivalent data 222 (N3 and N4).
In an example, the similar range may be defined for each piece of the different data 221 according to 1 - ((1 - a)/2), that is, (1 + a)/2. Furthermore, the sizes of the similar ranges 130a and 130b may differ for each piece of the different data 221 (N1 and N2). For example, the similar range 130a for the different data 221 (N1) is a range where the similarity is 0.85 or higher, and the similar range 130b for the different data 221 (N2) is a range where the similarity is 0.80 or higher.
In an example, the similarity is cosine similarity. The cosine similarity is the cosine of the angle made by two vectors A and B, and is given by cos θ = (A · B)/(‖A‖ ‖B‖).
The cosine similarity takes a value of -1 or more and 1 or less. In a case where the cosine similarity is close to 1, the two vectors are close to the same direction. In a case where the cosine similarity is close to -1, the two vectors are close to opposite directions. In a case where the cosine similarity is close to 0, the two vectors are dissimilar. Note that the similarity is not limited to the cosine similarity.
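As a concrete sketch, the cosine similarity between two semantic vectors may be computed as follows. This is plain illustrative Python, independent of the embodiment's implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # same direction: 1.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite directions: -1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # dissimilar: 0.0
```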
Returning to
Note that, in the comparative example illustrated in
In the first embodiment illustrated in
As illustrated in
In the comparative example illustrated in
Therefore, according to the comparative example, even with the new classification plane in the updated classifier 110, there is a possibility that determination target data C1 in which the ground truth 117 is originally “society” is erroneously determined as “science”, or determination target data C2 in which the ground truth 117 is “science” is erroneously determined as “society”.
In the first embodiment illustrated in
Therefore, according to the information processing device 1 of the first embodiment, with the new classification plane in the updated classifier 110, erroneous determination of the determination target data C1, whose ground truth 117 is originally "society", as "science", and erroneous determination of the determination target data C2, whose ground truth 117 is "science", as "society", are suppressed.
A training method for the machine learning model in the information processing device 1 as an example of the embodiment configured as described above will be described with reference to the flowchart illustrated in
During training, the training processing unit 100 trains the classifier 110 using the existing training data group 21 (operation S1). The existing training data group 21 is the first training data group 211, for example.
The training processing unit 100 selects the different data 221 in which the determination label 116 inferred by inputting the second training data 22 (new supervised data) to the machine-learned classifier 110 and the ground truth 117 of the second training data 22 are different (operation S2).
The training processing unit 100 updates the existing training data group 21 (operation S3). The training processing unit 100 may delete some data from the first training data group 211 to create the second training data group 212.
After waiting for a certain period of time to elapse (see YES route in operation S10), the processing proceeds to operation S11. Therefore, the processing of operations S11 to S17 may be executed at every certain period of time.
In operation S11, the training processing unit 100 receives the second training data 22 (new supervised data). The second training data 22 may be acquired via the ground-truth-labeled sentence collection unit 20.
In operation S12, the training processing unit 100 may set a timestamp for each piece of training data. The timestamp is information indicating date and time when the training data has been registered.
In operation S13, the training processing unit 100 inputs the second training data 22 to the classifier 110, and calculates the semantic vector 23 and a label determination result as illustrated in
In operation S14, the comparison unit 122 compares the determination label 116 with the ground truth 117. In a case where the determination label 116 and the ground truth 117 are the same (see YES route in operation S15), the comparison unit 122 registers the second training data 22 in the group of the equivalent data 222 (operation S16). In a case where the determination label 116 and the ground truth 117 are different (see NO route in operation S15), the comparison unit 122 registers the second training data 22 in the group of the different data 221 (operation S17).
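The comparison and grouping of operations S14 to S17 may be sketched as follows. The data layout and the function names are hypothetical, and the classifier is represented as a plain callable.

```python
def partition_new_data(second_training_data, classifier):
    """Split new supervised data into an 'equivalent' group (the inferred
    determination label matches the ground truth) and a 'different' group
    (the label and the ground truth disagree)."""
    equivalent, different = [], []
    for item in second_training_data:
        label = classifier(item["data"])  # inferred determination label
        if label == item["ground_truth"]:
            equivalent.append(item)
        else:
            different.append(item)
    return equivalent, different
```

In an example, a classifier that always infers "society" would place a piece of data whose ground truth is "science" into the different group.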
The new data adding unit 121 waits until the number of second training data 22 exceeds a specified number (see YES route in operation S20), and additionally registers the second training data 22 in the existing training data group 21 (operation S21). The new data adding unit 121 performs processing of adding the second training data 22 to the first training data group 211.
In operation S22, the similar range determination unit 124 may calculate the cosine similarity between each piece of the different data 221 (for example, N1 and N2 in
In operation S23, the similar range determination unit 124 determines the similar range 130 for each piece of the different data 221 (for example, N1 and N2 in
In an example, the similar range determination unit 124 calculates the maximum value a in the cosine similarity between each different data 221 and all the equivalent data 222. The similar range determination unit 124 may determine the similar range 130 for each different data 221 by (1 + a)/2. The similar range determination unit 124 may determine the similar range 130 differently according to each piece of the different data 221. The similar range determination unit 124 may determine, for each different data 221, the similar range to be narrower as the similarity between the different data 221 (for example, N1 or N2 in
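The threshold computation in operation S23 may be sketched as follows, assuming the similarities between one piece of the different data 221 and all the equivalent data 222 have already been computed. The function name is hypothetical.

```python
def similar_range_threshold(similarities_to_equivalent):
    """Given the similarities between one piece of different data and each
    piece of equivalent data, take the maximum value a and define the
    similar range as the set of vectors whose similarity is
    (1 + a) / 2 or higher."""
    a = max(similarities_to_equivalent)
    return (1 + a) / 2

# The higher the maximum similarity a, the narrower the similar range:
print(similar_range_threshold([0.7, 0.4]))  # a = 0.7 -> threshold 0.85
print(similar_range_threshold([0.6, 0.2]))  # a = 0.6 -> threshold 0.80
```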
In operation S24, the removal unit 125 acquires the similarity between the different data 221 and the existing training data group 21. For example, the removal unit 125 calculates the cosine similarity between each piece of the different data 221 and each piece of the first training data included in the first training data group 211.
In operation S25, the removal unit 125 determines whether there is data included within the similar range 130 among the training data of the existing training data group 21. For example, the removal unit 125 determines whether there is data included within the similar range 130 among the plurality of first training data included in the first training data group 211. In a case where there is data included within the similar range 130 among the training data of the existing training data group 21 (see YES route of operation S25), the removal unit 125 removes the data from the existing training data group 21 (operation S26). In a case where there is no data included within the similar range 130 among the training data of the existing training data group 21 (see NO route of operation S25), the processing proceeds to operation S27.
In operation S27, the removal unit 125 may further remove (N - S) pieces of the plurality of first training data in order from the oldest addition time, where N is the number of newly added second training data 22 and S is the number of first training data removed as being included within the similar range 130. As a result, the total number of pieces of training data remains unchanged after the N pieces of second training data 22 are added.
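Operations S25 to S27 may be sketched as follows, assuming each piece of first training data carries its feature map vector and addition timestamp. The names and the data layout are hypothetical.

```python
def update_existing_data(first_group, different_vecs, thresholds, sim, n_added):
    """Remove first training data falling inside any similar range, then
    remove the oldest (N - S) remaining pieces, where N is the number of
    added second training data and S is the number removed as similar."""
    kept, removed = [], []
    for item in first_group:
        in_range = any(sim(item["vec"], d) >= t
                       for d, t in zip(different_vecs, thresholds))
        (removed if in_range else kept).append(item)
    s = len(removed)
    kept.sort(key=lambda item: item["timestamp"])  # oldest first
    return kept[max(0, n_added - s):]              # drop the oldest (N - S)
```

Here, sim is any similarity function (for example, cosine similarity), and thresholds holds the per-piece similar-range threshold for each piece of different data.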
During retraining, the training processing unit 100 retrains the classifier 110 using the updated existing training data group 21 (operation S4). The updated existing training data group 21 is, for example, the second training data group 212 obtained by updating the first training data group 211.
The second training data group 212 may be further re-updated by adding the new second training data 22 to the updated second training data group 212. In this case, the second training data group 212 before re-update is set as the first training data group 211 and the training data group after re-update is set as the second training data group 212. Then, the existing training data group 21 may be sequentially updated by applying the methods illustrated in
An information processing device 1 of a second embodiment will be described. A hardware configuration of the information processing device 1 of the second embodiment is similar to the hardware configuration of the first embodiment illustrated in
In the first embodiment, the processing of determining the similar range 130 by the calculation formula is performed for each piece of different data 221 of the second training data 22. For example, the similar range determination unit 124 changes the size of the similar range 130 according to the different data 221. However, in the second embodiment, the size of a similar range 130 may be fixed for each piece of different data 221. The size of the similar range 130 is represented by a distance R (where R is a constant) from each different data 221 in a feature map vector (semantic vector 23) space. The value of R may be predetermined.
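The fixed similar range of the second embodiment may be sketched as a simple distance check in the feature map vector space. The function name and data layout are hypothetical.

```python
import math

def within_fixed_similar_range(candidate_vec, different_vec, radius):
    """Second-embodiment style check: the similar range is the set of
    points within a fixed distance R of a piece of different data."""
    return math.dist(candidate_vec, different_vec) <= radius
```

Note that math.dist computes the Euclidean distance and is available in Python 3.8 and later.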
Operations during training and retraining by the information processing device 1 of the second embodiment are similar to those of the information processing device 1 of the first embodiment illustrated in
The operation during inference of the information processing device 1 of the second embodiment is common to the operation of the information processing device 1 of the first embodiment illustrated in
In operation S32, a similar range determination unit 124 determines a similar range 130, which is a fixed range for each piece of different data 221 in second training data 22.
According to the information processing device 1 of the second embodiment, calculation using the equivalent data 222 is not needed for determining the similar range 130. Therefore, obsolete data can be deleted with a simplified configuration.
An information processing device 1 of a third embodiment will be described. A hardware configuration of the information processing device 1 of the third embodiment is similar to the hardware configuration of the first embodiment illustrated in
A removal unit 125 notifies the complementing unit 126 of index data.
Second training data 22a may be generated through processing by a training processing unit 100 instead of being obtained from a ground-truth-labeled sentence collection unit 20 as in
In an example, the index data 26 includes components of second training data 22 (N1 in
The index data 26 is generated for each of a plurality of similar ranges 130 (in the case of
Unlike the first and second embodiments, a sentence collection unit 27 may acquire unlabeled new training data candidates 251 to which a ground truth 117 is not added. The unlabeled new training data candidate 251 may be a target data candidate before the ground truth 117 is added in supervised data.
The unlabeled new training data candidate 251 is input to a classifier 110. The classifier 110 infers and outputs a feature map vector (semantic vector 23) corresponding to the unlabeled new training data candidate 251.
The complementing unit 126 selects labeling-waiting data 252 from the unlabeled new training data candidates 251 based on the feature map vector (semantic vector 23) inferred by the classifier 110 and the index data 26. The labeling-waiting data 252 is target data to which the ground truth 117 is to be attached.
The complementing unit 126 refers to an index range 132 (corresponding to the similar range 130 in an example) included in the index data 26. The index range 132 may be defined by, for example, a threshold for the cosine similarity. For example, for the index data 26 (N1), the index range 132 is a range where the similarity is 0.85 or higher, and for the index data 26 (N2), the index range is a range where the similarity is 0.80 or higher.
The complementing unit 126 selects the labeling-waiting data 252 included in the index range 132 from the third table 28 illustrated in
As illustrated in
The ground truth 117 is added to the labeling-waiting data 252 to generate the second training data 22a. The ground truth 117 is added to data registered as the labeling-waiting data 252. The addition of the ground truth 117 may be performed by an operator, in an example.
Operations during training and retraining by the information processing device 1 of the third embodiment are similar to those of the information processing device 1 of the first embodiment illustrated in
After waiting for a certain period of time to elapse (see YES route in operation S40), the processing proceeds to operation S41. Therefore, the processing of operations S41 to S49 may be executed at every certain period of time.
In operation S41, the training processing unit 100 receives the unlabeled new training data candidates 251 (classification target data). The unlabeled new training data candidates 251 may be obtained from the sentence collection unit 27.
In operation S42, the complementing unit 126 determines whether there is the index data 26. In a case where there is no index data 26 (see NO route in operation S42), the processing proceeds to operation S43. In a case where there is index data 26 (see YES route of operation S42), the processing proceeds to operation S44.
In operation S43, the complementing unit 126 randomly selects data for the required number of second training data from the unlabeled new training data candidates 251. The complementing unit 126 registers the selected unlabeled new training data candidate 251 as the labeling-waiting data 252.
In operation S44, the complementing unit 126 acquires information of the index data 26. The index data 26 may include information such as the components of the corresponding second training data 22, the index range, and the number of deleted first training data, as illustrated in
In operation S45, the training processing unit 100 inputs the unlabeled new training data candidate 251 to the classifier 110 and acquires the feature map vector (semantic vector 23).
In operation S46, the complementing unit 126 calculates the similarity between each piece of the index data 26 and the unlabeled new training data candidate 251.
In operation S47, the complementing unit 126 selects and registers the unlabeled new training data candidates 251 within the index range corresponding to the similar range 130 or the like as the labeling-waiting data 252.
In a case where the number of registered labeling-waiting data 252 is a specified number or more (see YES route of operation S48), the processing is completed. In a case where the number of registered labeling-waiting data 252 is not the specified number or more (see NO route in operation S48), the processing proceeds to operation S49.
In operation S49, the complementing unit 126 randomly selects and registers the required number of labeling-waiting data from the remaining unlabeled new training data candidates 251.
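The selection of operations S46 to S49 may be sketched as follows. The structure of the index data and the candidate records, as well as the function name, are hypothetical.

```python
import random

def select_labeling_waiting(candidates, index_data, sim, required):
    """Prefer unlabeled candidates whose similarity to some piece of
    index data falls within that index range; fill any shortfall by
    random selection from the remaining candidates."""
    selected = [c for c in candidates
                if any(sim(c["vec"], idx["vec"]) >= idx["range"]
                       for idx in index_data)]
    rest = [c for c in candidates if c not in selected]
    random.shuffle(rest)
    while len(selected) < required and rest:
        selected.append(rest.pop())
    return selected[:required]
```

When no candidate falls within any index range, the selection degenerates to the random choice of operation S49.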
The ground truth 117 is added to the labeling-waiting data 252 to generate the new second training data 22a. The ground truth 117 may be added by the operator according to content of a sentence.
In a case where the labels are assigned to a specified number or more of labeling-waiting data (see YES route in operation S50), the processing proceeds to operation S51 and subsequent operations.
In operation S51, the training processing unit 100 may set a timestamp for each training data. The timestamp is information indicating date and time when the training data has been registered.
In operation S52, the training processing unit 100 inputs the second training data 22a to the classifier 110, and calculates the label determination result as illustrated in
Processing of operations S53 to S56 is similar to the processing of operations S14 to S17 in
The processing of
In operation S67, the removal unit 125 generates the index data 26 based on the similar range 130 from which the first training data included in the existing training data group 21 (first training data group 211) has been removed or the removed first training data.
A region from which the first training data has been removed becomes a region with sparse training data in the vector space. Therefore, by preferentially collecting new training data based on the index data 26, the training data for the sparse region can be preferentially replenished.
The processing of
In operation S76, the removal unit 125 generates the index data 26 based on the similar range 130 from which the first training data included in the existing training data group 21 (first training data group 211) has been removed or the removed first training data.
Thus, in the methods according to the first to third embodiments, the computer uses the determination label 116 inferred by inputting the second training data 22 to the classifier 110 that has been machine-learned using the first training data group 211 including the plurality of first training data. In the case where the determination label 116 and the ground truth 117 of the second training data 22 are different, the computer executes the processing of determining the similar range 130 for the second training data 22. Then, the computer executes the processing of removing at least the first training data included in the similar range 130 from among the plurality of first training data to create the second training data group 212. Then, the computer executes the processing of newly performing machine learning for the classifier 110 using the second training data group 212.
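The overall flow summarized above can be sketched as the following procedure. Every name here (`create_second_group`, the dict fields `vec` and `label`, the `similarity` and `threshold_fn` callables) is a hypothetical placeholder for the processing units described in the embodiments, assuming each piece of training data carries a feature vector and a ground truth.

```python
def create_second_group(first_group, second_data, classifier, similarity, threshold_fn):
    """Remove first training data inside the similar range of each piece
    of second training data whose determination label disagrees with its
    ground truth, then return the new training data group."""
    # Different data: determination label != ground truth.
    different = [d for d in second_data if classifier(d["vec"]) != d["label"]]
    kept = []
    for f in first_group:
        in_range = any(similarity(f["vec"], d["vec"]) >= threshold_fn(d)
                       for d in different)
        if not in_range:
            kept.append(f)
    # The second training data group further includes the second training data.
    return kept + second_data
```

The classifier 110 would then be retrained on the returned group.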
According to the above method, it is possible to suppress a decrease in the data classification accuracy due to obsolescence of training data. This resolves the situation where the ground truths 117 differ even though the feature map vectors such as the semantic vectors 23 represent data with similar content. Therefore, it is possible to reduce the influence of the first training data with the outdated ground truth 117, thereby suppressing the decrease in the classification accuracy.
The second training data group 212 further includes the second training data 22. Therefore, even in the case where the second training data 22 is added, this resolves the situation where pieces of data with different ground truths 117 coexist even though the existing first training data group 211 and the second training data 22 are similar data. Therefore, it is possible to reduce the influence of the first training data with the outdated ground truth 117, thereby suppressing the decrease in the classification accuracy.
The processing of determining the similar range 130 determines, as the similar range 130 for the second training data 22, the range indicating a similarity equal to or greater than a predetermined value with respect to the feature map vector obtained by vectorizing the second training data 22. Therefore, it is possible to resolve the situation where pieces of data with different ground truths 117 coexist even though the feature map vectors such as the semantic vectors 23 represent data with similar content.
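As a concrete illustration of this membership test, the sketch below uses cosine similarity as the similarity measure; the embodiments do not fix a particular measure, so cosine similarity and both function names are assumptions for the example.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two feature map vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def in_similar_range(first_vec, second_vec, threshold):
    """A first-training-data vector falls within the similar range 130
    of the second training data when its similarity to the second
    training data's vector is at or above the predetermined value."""
    return cosine_similarity(first_vec, second_vec) >= threshold
```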
The second training data 22 includes a plurality of the different data 221 in which the determination label 116 and the ground truth 117 are different, and a plurality of the equivalent data 222 in which the determination label 116 and the ground truth 117 are the same. The similar range 130 is determined to be narrower as the similarity between any data of the plurality of equivalent data 222 and the different data 221 is higher. The similar range 130 is determined for each different data 221.
Therefore, it is possible to remove the first training data within an optimal range for each different data 221.
For each different data 221, the similar range 130 is determined according to (1 + α)/2, where α is the maximum value of the similarity between the different data 221 and each of the plurality of equivalent data 222.
Therefore, it is possible to quantitatively remove the first training data within an optimal range for each different data 221.
In the case where the number of second training data 22 is N and the number of first training data removed as being included in the similar range 130 is S, (N - S) pieces of the plurality of first training data are further removed in order from the oldest addition time.
Therefore, it is possible to suppress obsolescence of training data.
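This timestamp-based pruning (see also the timestamp set in operation S51) can be sketched as follows, assuming each piece of first training data is a record with a `timestamp` field; the function name is hypothetical.

```python
def remove_oldest(first_training, n_new, n_removed_in_range):
    """Further remove (N - S) pieces of first training data in order
    from the oldest addition time, where N is the number of second
    training data and S is the number already removed as being inside
    the similar range."""
    extra = max(n_new - n_removed_in_range, 0)
    return sorted(first_training, key=lambda d: d["timestamp"])[extra:]
```

Removing exactly (N - S) oldest records keeps the total amount of training data roughly constant while favoring newer data.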
The index data 26, which serves as an index for collecting the new second training data, is generated based on either the different data 221 (the second training data 22 that corresponds to the similar range 130 from which the first training data has been removed and for which the determination label 116 and the ground truth 117 are different) or the removed first training data. Then, the new second training data 22 is collected based on the similarity with respect to the index data 26.
Therefore, it is possible to preferentially replenish the training data in the region where the training data has become sparse due to the removal of the first training data. Therefore, it is possible to prevent a decrease in the classification accuracy due to sparse training data.
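The index-based collection can be illustrated as a similarity filter over a pool of unlabeled candidates; the function name, the `vec` field, and the use of a plain threshold are assumptions for this sketch, not details fixed by the embodiments.

```python
def collect_candidates(pool, index_vecs, similarity, threshold):
    """Keep unlabeled candidates that are similar to any vector in the
    index data 26, so the sparse region left by the removal of first
    training data is replenished first."""
    return [c for c in pool
            if any(similarity(c["vec"], i) >= threshold for i in index_vecs)]
```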
The disclosed technique is not limited to the embodiment described above, and various modifications may be made without departing from the gist of the present embodiment. For example, each configuration and each processing of the present embodiment may be selected or omitted as needed or may be appropriately combined.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2022-079842 | May 2022 | JP | national |