This application is based upon and claims the benefit of priority from prior Japanese Patent Application No. 2022-118861, filed Jul. 26, 2022, the entire contents of which are incorporated herein by reference.
Embodiments described herein relate generally to an additional training apparatus, an additional training method, and a storage medium.
In manufacturing industries, a trained model is in some cases put to practical use after being trained to automatically classify defects of products, based on training data including existing data acquired in a manufacturing process of the products and classifications of defects according to the existing data. In this case, if new data with a distribution that does not exist in the training data occurs with the passing of time, the classification accuracy of defects becomes low in the trained model that is in practical use. In this situation, it is necessary to update the trained model such that defects can accurately be classified also in regard to the new data of the products.
In connection with this, there is known a method of updating a trained model by applying additional training (fine tuning) to the trained model by using additional training data in which new data is added to existing data.
According to the study by the present inventor, in this additional training method, in a case where the existing data is randomly down-sampled at the time of preparing the additional training data, the tendencies and features of the existing data are lost due to the random sampling. Consequently, the classification accuracy of the trained model in regard to the existing data may deteriorate.
Accordingly, an additional training method that improves the classification accuracy for new data while maintaining the classification accuracy for existing data is desirable.
In general, according to one embodiment, an additional training apparatus includes processing circuitry. The processing circuitry is configured to store, in a memory, a plurality of pieces of existing training data in which existing data is input data and a classification of defects according to the existing data is output data, and cluster data representing clusters to which the respective pieces of existing training data belong. The processing circuitry is configured to extract, based on the cluster data, a plurality of pieces of first existing training data from the pieces of existing training data in accordance with a size of each of the clusters. The processing circuitry is configured to acquire a plurality of pieces of new training data in which new data is input data and a classification of defects according to the new data is output data. The processing circuitry is configured to store in the memory a plurality of pieces of additional training data that are based on the pieces of first existing training data and the pieces of new training data.
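Although the embodiments below do not prescribe any particular implementation, the following outline, written in Python purely for illustration (the class and method names are assumptions introduced here, not taken from the embodiments), summarizes the four operations that the processing circuitry is configured to perform.

```python
# Illustrative outline only: the class and method names are assumptions
# introduced here, not taken from the embodiments.
class AdditionalTrainingApparatus:
    """Mirrors the four operations the processing circuitry performs."""

    def __init__(self):
        self.memory = {}

    def store_existing(self, existing_pairs, cluster_data):
        # existing training data: (existing data, defect classification)
        # pairs, together with the cluster each pair belongs to
        self.memory["existing"] = existing_pairs
        self.memory["clusters"] = cluster_data

    def extract_first_data(self, n_extract):
        # extract first existing training data in accordance with the
        # size of each cluster (a concrete sketch is given later)
        ...

    def acquire_new(self, new_pairs):
        self.memory["new"] = new_pairs

    def store_additional(self, first_pairs):
        # additional training data based on first data and new training data
        self.memory["additional"] = list(first_pairs) + list(self.memory["new"])
```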
Hereinafter, embodiments are described with reference to the accompanying drawings. In the description below, similar structures are indicated by identical reference signs, and an overlapping description is omitted.
Here, the cluster data storage unit 10 stores a plurality of pieces of existing training data in which existing data is input data and a classification of defects according to the existing data is output data, and cluster data representing clusters to which the respective pieces of existing training data belong. As the output data, for example, use may be made of labels indicative of the presence/absence of a defect, or labels indicative of the presence/absence of a defect and the kind of defect. The cluster data storage unit 10 may include an acquisition unit that acquires a plurality of pieces of existing training data; a computation unit that clusters the pieces of existing training data and computes cluster data representing clusters to which the respective pieces of existing training data belong; and a storage unit that stores the pieces of existing training data and the cluster data. Alternatively, the cluster data storage unit 10 may include only the storage unit among the acquisition unit, the computation unit and the storage unit, with the acquisition unit and the computation unit being configured as separate units. As the existing data, for example, use can be made of, as appropriate, management data collected in the manufacturing process of products, or data relating to quality management, such as inspection images acquired in inspections of products. The cluster data is information relating to the clusters that are generated at the time of clustering the existing data. Specifically, for example, as the cluster data, use can be made of, as appropriate, information on the cluster to which each piece of existing training data belongs, the center coordinates of each cluster within a feature space, and the distance from each piece of existing training data to the center of the cluster to which it belongs. Note that, of the existing training data and the cluster data, the cluster data may be stored in a cluster data computation apparatus (not illustrated) or in a server apparatus connected to a cloud.
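As an aid to understanding, the following is a minimal sketch of computing such cluster data, assuming that the pieces of existing data are represented as fixed-length feature vectors and that scikit-learn's KMeans is used for the clustering; the embodiments do not prescribe a specific clustering algorithm, and the variable names are illustrative.

```python
# A minimal sketch, assuming the existing data are fixed-length feature
# vectors and that scikit-learn's KMeans is used; the embodiments do not
# prescribe a specific clustering algorithm.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X_existing = rng.random((1000, 16))    # placeholder feature vectors
y_existing = rng.integers(0, 2, 1000)  # placeholder defect labels

kmeans = KMeans(n_clusters=3, random_state=0).fit(X_existing)

cluster_data = {
    # cluster to which each piece of existing training data belongs
    "assignments": kmeans.labels_,
    # center coordinates of each cluster within the feature space
    "centers": kmeans.cluster_centers_,
    # distance from each piece of data to the center of its own cluster
    "distances": np.linalg.norm(
        X_existing - kmeans.cluster_centers_[kmeans.labels_], axis=1
    ),
}
```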
The data extraction unit 20 extracts data used for additional training from the pieces of existing training data in the cluster data storage unit 10. Specifically, for example, based on the cluster data, the data extraction unit 20 extracts a plurality of pieces of first existing training data (hereinafter referred to as “first data”) from the pieces of existing training data in accordance with the size of each cluster. At this time, the data extraction unit 20 may randomly select and extract the pieces of first data. In addition, the data extraction unit 20 may select and extract the pieces of first data in accordance with a distance between the pieces of existing training data. Besides, the data extraction unit 20 may select and extract the pieces of first data in accordance with the distribution of each of the pieces of existing training data. Furthermore, the data extraction unit 20 may select and extract the pieces of first data in accordance with a label ratio of the pieces of existing training data in the clusters. In other words, within the range corresponding to the size of each cluster, the first existing training data can be acquired by random extraction, extraction according to the distance between data, extraction according to the distribution of data, or extraction according to the label ratio of data in the clusters. In addition, the data extraction unit 20 may execute a plurality of kinds of algorithms in order to extract the first data.
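The following is a minimal sketch of the random-extraction variant, continuing from the preceding sketch; it allocates a per-cluster quota in proportion to cluster size and samples at random within each cluster. The function name and parameters are assumptions for illustration.

```python
# A minimal sketch of the random-extraction variant, continuing from the
# preceding sketch; the function name and parameters are illustrative.
import numpy as np

def extract_first_data(assignments, n_extract, rng=None):
    """Randomly extract indices of first data, keeping the cluster-size ratio.

    Assumes n_extract does not exceed the number of pieces of existing data.
    """
    rng = rng or np.random.default_rng(0)
    labels, counts = np.unique(assignments, return_counts=True)
    # per-cluster quota proportional to cluster size (rounded down)
    quota = np.floor(n_extract * counts / counts.sum()).astype(int)
    selected = []
    for lab, q in zip(labels, quota):
        members = np.flatnonzero(assignments == lab)
        selected.extend(rng.choice(members, size=q, replace=False))
    return np.array(selected)

# e.g., with the cluster assignments from the preceding sketch
first_idx = extract_first_data(cluster_data["assignments"], n_extract=100)
```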
The new training data acquisition unit 30 acquires a plurality of pieces of new training data in which new data is input data and a classification of defects according to the new data is output data. Here, the new training data is training data that has occurred, with the passing of time from the acquisition of the existing training data, with a distribution not existing in the existing training data. As with the existing data described above, use can be made of, as appropriate, management data collected in the manufacturing process of products, or data relating to quality management, such as inspection images acquired in inspections of products.
The additional training data storage unit 40 stores a plurality of pieces of additional training data that are based on the pieces of first data and the pieces of new training data. Here, the pieces of additional training data may include all of the first data extracted by the data extraction unit 20 and all of the new training data acquired by the new training data acquisition unit 30. Alternatively, the pieces of first existing training data included in the additional training data may be only some of the extracted pieces of first data, and the pieces of new training data included in the additional training data may be only some of the acquired pieces of new training data. To supplement the above, the meaning of “based on” is not limited to a case of storing all pieces of data, but includes a case of storing some pieces of data.
The advance training data storage unit 50 stores, as a plurality of pieces of advance training data, a plurality of pieces of second existing training data (hereinafter referred to as “second data”) that are different from the pieces of first data, from among the pieces of existing training data.
The advance training unit 60 generates an advance training model by applying training to a training model, based on the pieces of advance training data. In the advance training, there is no need to use only a single training method for the training of a training model, and a plurality of kinds of algorithms may be executed. Besides, the training model may be called “classification model”.
The advance training model storage unit 70 stores the advance training model generated by the advance training unit 60.
The additional training unit 80 applies additional training to the advance training model, based on the pieces of additional training data. In the additional training, there is no need to use only a single training method for the training of the advance training model, and a plurality of kinds of algorithms may be executed.
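The following is a minimal sketch of the advance training and additional training flow, using scikit-learn's SGDClassifier with partial_fit as a stand-in classification model; the embodiments do not prescribe a specific model or training algorithm, and the data here are placeholders.

```python
# A minimal sketch of the advance/additional training flow, using
# scikit-learn's SGDClassifier as a stand-in classification model; the
# embodiments do not prescribe a specific model or training algorithm.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
# placeholder feature vectors and defect labels (illustrative only)
X_second, y_second = rng.random((800, 16)), rng.integers(0, 2, 800)
X_first, y_first = rng.random((100, 16)), rng.integers(0, 2, 100)
X_new, y_new = rng.random((200, 16)), rng.integers(0, 2, 200)

model = SGDClassifier(random_state=0)

# advance training: train the training model on the second data
model.partial_fit(X_second, y_second, classes=np.array([0, 1]))

# additional training: continue training the advance training model on the
# additional training data (first data combined with new training data)
model.partial_fit(np.vstack([X_first, X_new]),
                  np.concatenate([y_first, y_new]))
```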
Next, an example of an operation of the additional training apparatus with the above configuration is described with reference to the flowchart in the accompanying drawings.
In step S10, the acquisition unit in the cluster data storage unit 10 acquires a plurality of pieces of existing training data in which existing data is input data and a classification of defects according to the existing data is output data. In addition, as illustrated in the accompanying drawings, the computation unit clusters the acquired pieces of existing training data (for example, into clusters A, B and C), and the storage unit stores the pieces of existing training data and the computed cluster data.
In step S20, based on the cluster data, the data extraction unit 20 extracts a plurality of pieces of first data from the pieces of existing training data in accordance with the size of each of the clusters A, B and C. The pieces of first data are extracted from the pieces of existing training data in a state in which the size ratio between the clusters A, B and C after the clustering is maintained. Since the extracted first data retain the distribution of the original data, robust training for the existing training data can be performed even if the data set becomes smaller. In addition, the data extraction unit 20 extracts, as a plurality of pieces of second data, a plurality of pieces of existing training data that are different from the pieces of first data, from among the pieces of existing training data. Note that the extraction of the second data may be executed in step S50 to be described later.
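As an illustrative example with hypothetical numbers (not taken from the embodiment): if clusters A, B and C contain 500, 300 and 200 pieces of existing training data and 100 pieces of first data are to be extracted, then 50, 30 and 20 pieces are extracted from clusters A, B and C, respectively, so that the 5:3:2 size ratio after the clustering is preserved.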
In step S30, the new training data acquisition unit 30 acquires a plurality of pieces of new training data in which new data is input data and a classification of defects according to the new data is output data.
In step S40, the additional training data storage unit 40 stores a plurality of pieces of additional training data that are based on the pieces of first data and the pieces of new training data.
In step S50, the advance training data storage unit 50 stores, as a plurality of pieces of advance training data, a plurality of pieces of second data that are different from the pieces of first data, from among the pieces of existing training data. Note that the advance training data storage unit 50 may store all pieces of second data, or only some pieces of second data among them. Specifically, the advance training data storage unit 50 stores the advance training data that are based on the second data other than the extracted first data.
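Continuing the earlier sketches (X_existing, y_existing and first_idx are carried over from them), the second data can be obtained simply as the complement of the extracted first data:

```python
# Continuing the earlier sketches (X_existing, y_existing and first_idx
# are carried over): the second data are the existing training data that
# were not extracted as first data.
import numpy as np

mask = np.ones(len(X_existing), dtype=bool)
mask[first_idx] = False  # exclude the pieces extracted as first data
X_second, y_second = X_existing[mask], y_existing[mask]
```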
In step S60, the advance training unit 60 generates an advance training model by applying training to a training model, based on the pieces of advance training data.
In step S70, the advance training model storage unit 70 stores the generated advance training model.
In step S80, the additional training unit 80 applies additional training to the advance training model, based on the pieces of additional training data.
The accompanying drawings referred to in this part of the description illustrate distributions of certainty, including the distributions of certainty illustrated in part (a) of one of the drawings.
As described above, according to the first embodiment, the cluster data storage unit 10 stores a plurality of pieces of existing training data in which existing data is input data and a classification of defects according to the existing data is output data, and cluster data representing clusters to which the respective pieces of existing training data belong. Based on the cluster data, the data extraction unit 20 extracts a plurality of pieces of first data (first existing training data) from the pieces of existing training data in accordance with the size of each cluster. The new training data acquisition unit 30 acquires a plurality of pieces of new training data in which new data is input data and a classification of defects according to the new data is output data. The additional training data storage unit 40 stores a plurality of pieces of additional training data that are based on the pieces of first data and the pieces of new training data.
In this manner, by the configuration that creates the additional training data including the pieces of existing training data extracted according to the size of each cluster and the pieces of new training data, the classification accuracy for the new data can be enhanced while the classification accuracy for the existing data is maintained. To supplement the above, while the classification accuracy of the training model is maintained for the existing data, training is performed with new defects added at the time of additional training, whereby the classification accuracy of the training model can be enhanced for newly occurring data.
In addition, according to the first embodiment, the advance training data storage unit 50 stores, as a plurality of pieces of advance training data, a plurality of pieces of second data (second existing training data) that are different from the pieces of first data, from among the pieces of existing training data. The advance training unit 60 generates an advance training model by applying training to a training model, based on the pieces of advance training data. The additional training unit 80 applies additional training to the advance training model, based on the pieces of additional training data.
Accordingly, the advance training model, which is trained by the second data that are left after the extraction of the first data, is additionally trained by the first data and the new training data, and thereby the classification accuracy of the new data can be enhanced without deteriorating the classification accuracy of the existing data.
Furthermore, for example, compared to the above-described comparative example, the training cost can be reduced. To supplement the above, in the comparative example, a plurality of pieces of sub-training data are created by sampling from the training data, and a model meeting a condition is selected from among models additionally trained with the respective pieces of sub-training data, based on their evaluation results. In the method of the comparative example, although the size of the sub-training data for additional training becomes smaller, the number of rounds of training using the sub-training data becomes larger, and multiple rounds of additional training are necessary, leading to an increase in the cost of additional training.
By contrast, according to the first embodiment, after the clustering, the first data are extracted with the size ratio between the clusters maintained, and a high classification accuracy can be obtained by a single round of additional training.
As another comparative example, a case is discussed in which training data are clustered in order to extract data while retaining the data tendency before the extraction, and additional training data are extracted at a ratio corresponding to the sizes of the clusters after the clustering. In this comparative example, since the clustering is performed after the new data is added to the existing data, there is a possibility that the features of the new data included in the extracted additional training data are not retained. Thus, in this comparative example, no improvement of the classification accuracy for the new data is expected in the training model after the additional training.
By contrast, according to the first embodiment, when the new training data are combined into the additional training data, they are added to the existing training data extracted according to the size of each cluster; therefore, the features of the new data can be retained while the classification accuracy of the existing data is maintained. In other words, according to the first embodiment, when new data occurs, the existing training data are clustered, and the additional training data are created based on the first data, in which the distribution information of the existing training data is maintained, and on the newly acquired new training data; therefore, the features of the new data can be maintained.
Next, a modification of the first embodiment is described. The modification is similarly applicable to each of the embodiments to be described below.
In the first embodiment, the details of the operation in which the data extraction unit 20 extracts a plurality of pieces of first data in step S20 are not specified, but the details of the operation may be specified as follows. For example, the data extraction unit 20 may randomly select and extract the pieces of first existing training data. In addition, the data extraction unit 20 may select and extract the pieces of first existing training data in accordance with the distance between the pieces of existing training data. Further, the data extraction unit 20 may select and extract the pieces of first existing training data in accordance with the distribution of the respective pieces of existing training data. Besides, the data extraction unit 20 may select and extract the pieces of first existing training data in accordance with the label ratio of the pieces of existing training data in the clusters. Such modifications can also obtain the same advantageous effects as the first embodiment.
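As one minimal sketch of such a variant, assuming the cluster centers computed in the earlier clustering sketch, the extraction may take, within each cluster's quota, the pieces of data nearest to the cluster center instead of sampling at random; the function name is an assumption for illustration.

```python
# A minimal sketch of the distance-based variant, assuming the cluster
# centers from the earlier clustering sketch; within each cluster's quota
# it takes the pieces of data nearest to the cluster center.
import numpy as np

def extract_nearest_to_center(X, assignments, centers, n_extract):
    labels, counts = np.unique(assignments, return_counts=True)
    # per-cluster quota proportional to cluster size, as before
    quota = np.floor(n_extract * counts / counts.sum()).astype(int)
    selected = []
    for lab, q in zip(labels, quota):
        members = np.flatnonzero(assignments == lab)
        dists = np.linalg.norm(X[members] - centers[lab], axis=1)
        selected.extend(members[np.argsort(dists)[:q]])  # q nearest pieces
    return np.array(selected)
```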
Next, an additional training apparatus according to a second embodiment is described.
The second embodiment is a modification of the first embodiment, and is configured to increase the accuracy of the labels of the first data.
In the second embodiment, a labeling unit 90 is added to the configuration of the first embodiment.
The labeling unit 90 labels the pieces of first data extracted by the data extraction unit 20 with labels of higher accuracy.
Accordingly, the additional training data storage unit 40 stores a plurality of pieces of additional training data that are based on the pieces of labeled first data, and pieces of new training data.
The other configuration is the same as in the first embodiment.
Next, an operation of the additional training apparatus with the above configuration is described with reference to the flowchart in the accompanying drawings.
In the same manner as described above, steps S10 and S20 are executed, and a plurality of pieces of first data and a plurality of pieces of second data are extracted from a plurality of pieces of existing training data.
After step S20, in step S22, the labeling unit 90 labels the extracted pieces of first data with labels of higher accuracy.
After step S22, step S30 is executed similarly as described above, and a plurality of pieces of new training data are acquired.
After step S30, in step S40, the additional training data storage unit 40 stores a plurality of pieces of additional training data that are based on the pieces of labeled first data and the pieces of new training data.
Subsequently, steps S50 to S80 are executed similarly as described above.
As described above, according to the second embodiment, the labeling unit 90 labels the extracted pieces of first data with labels of higher accuracy. The additional training data storage unit 40 stores the pieces of additional training data that are based on the pieces of labeled first data and the pieces of new training data. Accordingly, in addition to the advantageous effects of the first embodiment, by applying labeling with higher accuracy to the first data before they are added to the additional training data, the creation of a training model with higher robustness to the existing data can be expected.
Next, a modification of the second embodiment is described. The modification is similarly applicable to each of the embodiments to be described below.
In the second embodiment, the labeling unit 90 labels the pieces of first data with labels of higher accuracy, but the second embodiment is not limited to this. For example, a further labeling unit may label the pieces of new training data with labels of higher accuracy. In this case, by applying labeling with higher accuracy to the new training data before they are added to the additional training data, the creation of a training model with higher robustness to the new data can be expected.
Next, an additional training apparatus according to a third embodiment is described.
The third embodiment is a modification of the first embodiment, and is configured to hold down the size of new training data.
Here, the new training data cluster computation unit 100 clusters a plurality of pieces of new training data acquired by the new training data acquisition unit 30, and computes cluster data representing clusters to which the respective pieces of new training data belong. Note that the new training data cluster computation unit 100 is an example of a cluster computation unit.
Based on the cluster data, the new training data extraction unit 110 extracts a plurality of pieces of new data for additional training from the pieces of new training data, while maintaining the features of the pieces of new training data.
Accordingly, the additional training data storage unit 40 stores a plurality of pieces of additional training data by using the extracted pieces of new data for additional training as a plurality of pieces of new training data.
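The following is a minimal sketch of the processing by the new training data cluster computation unit 100 and the new training data extraction unit 110, again assuming feature vectors and scikit-learn's KMeans; the keep ratio and all names are assumptions for illustration.

```python
# A minimal sketch of units 100 and 110, again assuming feature vectors
# and scikit-learn's KMeans; keep_ratio and the names are illustrative.
import numpy as np
from sklearn.cluster import KMeans

def extract_new_data_for_additional_training(X_new, keep_ratio=0.5,
                                             n_clusters=3, seed=0):
    rng = np.random.default_rng(seed)
    km = KMeans(n_clusters=n_clusters, random_state=seed).fit(X_new)
    selected = []
    for lab in range(n_clusters):
        members = np.flatnonzero(km.labels_ == lab)
        if len(members) == 0:
            continue
        # keep a fixed fraction of each cluster so the cluster-size ratio,
        # and hence the features of the new data, are maintained
        q = max(1, int(len(members) * keep_ratio))
        selected.extend(rng.choice(members, size=q, replace=False))
    return np.array(selected)

X_new = np.random.default_rng(1).random((400, 16))  # placeholder new data
new_for_additional = X_new[extract_new_data_for_additional_training(X_new)]
```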
The other configuration is the same as in the first embodiment.
Next, an operation of the additional training apparatus with the above configuration is described with reference to the flowchart in the accompanying drawings.
In the same manner as described above, steps S10 to S30 are executed, and a plurality of pieces of new training data are acquired.
In step S32, the new training data cluster computation unit 100 clusters a plurality of pieces of new training data acquired in step S30, and computes cluster data representing clusters to which the respective pieces of new training data belong.
After step S32, in step S34, based on the cluster data, the new training data extraction unit 110 extracts a plurality of pieces of new data for additional training (new training data) from the pieces of new training data, while maintaining the features of the pieces of new training data.
After step S34, in step S40, the additional training data storage unit 40 stores a plurality of pieces of additional training data by using, as pieces of new training data, the pieces of new data for additional training, which are extracted in step S34. In other words, the additional training data storage unit 40 stores a plurality of pieces of additional training data that are based on the pieces of first data and the pieces of new training data (new data for additional training).
Subsequently, steps S50 to S80 are executed similarly as described above.
As described above, according to the third embodiment, the new training data cluster computation unit 100 clusters a plurality of pieces of new training data, and computes cluster data representing clusters to which the respective pieces of new training data belong. Based on the cluster data, the new training data extraction unit 110 extracts a plurality of pieces of new data for additional training from the pieces of new training data, while maintaining the features of the pieces of new training data. The additional training data storage unit 40 stores a plurality of pieces of additional training data by using, as pieces of new training data, the pieces of new data for additional training. Accordingly, in addition to the advantageous effects of the first embodiment, extracting from the new training data before they are added holds down the size of the data set, and power saving in the additional training can be expected.
Next, a modification of the third embodiment is described. In the third embodiment, changes may be made to the configuration illustrated in the accompanying drawings. Next, a hardware configuration of the additional training apparatus 1 according to the above embodiments is described.
The additional training apparatus 1 includes, as hardware, a CPU (Central Processing Unit) 2, a RAM (Random Access Memory) 3, a program memory 4, an auxiliary storage device 5, and an input/output interface 6. The CPU 2 communicates with the RAM 3, program memory 4, auxiliary storage device 5 and input/output interface 6 via a bus. Specifically, the additional training apparatus 1 of the present embodiment is implemented by a computer with this hardware configuration.
The CPU 2 is an example of a general-purpose processor. The RAM 3 is used by the CPU 2 as a working memory. The RAM 3 includes a volatile memory such as an SDRAM (Synchronous Dynamic Random Access Memory). The program memory 4 stores a program for implementing the respective components according to each embodiment. This program may be, for example, a program for enabling the computer to implement the functions of the respective components illustrated in the first to third embodiments. In addition, as the program memory 4, for example, a ROM (Read-Only Memory), a part of the auxiliary storage device 5, or a combination thereof is used. The auxiliary storage device 5 non-transitorily stores data. The auxiliary storage device 5 includes a nonvolatile memory such as an HDD (hard disk drive) or an SSD (solid state drive). The auxiliary storage device 5 is an example of a memory.
The input/output interface 6 is an interface for connection to other devices. The input/output interface 6 is used, for example, for connection to a keyboard, a mouse and a display.
The program stored in the program memory 4 includes computer executable instructions. When the program (computer executable instructions) is executed by the CPU 2, which is processing circuitry, the program causes the CPU 2 to execute a predetermined process. For example, when the program is executed by the CPU 2, the program causes the CPU 2 to execute the sequential processes described above in connection with the respective components of the first to third embodiments.
The program may be provided to the additional training apparatus 1 that is a computer, in a state in which the program is stored in a computer readable storage medium. In this case, for example, the additional training apparatus 1 further includes a drive (not illustrated) that reads data from the storage medium, and acquires the program from the storage medium. As the storage medium, for example, use can be made of, as appropriate, a magnetic disk, an optical disc (CD-ROM, CD-R, DVD-ROM, DVD-R, or the like), a magneto-optical disc (MO or the like), or a semiconductor memory. The storage medium may be called “non-transitory computer readable storage medium”. In addition, the program may be stored in a server on a communication network, and the additional training apparatus 1 may download the program from the server by using the input/output interface 6.
The processing circuitry that executes the program is not limited to a general-purpose hardware processor such as the CPU 2, and a purpose-specific hardware processor such as an ASIC (Application Specific Integrated Circuit) may be used. The term “processing circuitry (processing unit)” includes at least one general-purpose hardware processor, at least one purpose-specific hardware processor, or a combination of at least one general-purpose hardware processor and at least one purpose-specific hardware processor. In the example described above, the processing circuitry is the CPU 2.
According to at least one of the above-described embodiments, the classification accuracy of new data can be enhanced while the classification accuracy of existing data is maintained.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Number | Date | Country | Kind |
---|---|---|---|
2022-118861 | Jul 2022 | JP | national |