This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2022-035575, filed on Mar. 8, 2022, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a non-transitory storage medium stored with an information processing program, an information processing device, and an information processing method.
In a known technology called open set recognition, items of data that are contained in data input to a classification model for classifying input data into one or another of plural classes, but that were not contained in the training data employed for training, are detected as items of data of an unknown class. One conceivable application of this technology is to return an error in cases in which an item of data of a class not contained in the training data has been input to the classification model, and to interrupt processing before a more serious problem occurs due to misdetection by the classification model. Another conceivable application is to implement sequential learning by dividing data into trained classes and untrained classes, labeling only the items of data in the untrained classes, and generating a dedicated classification model.
As technology related to open set recognition, for example, there is a proposal for an information processing device that determines whether or not a new item of target data is an item of target data of an unknown classification. This information processing device generates feature data by extracting features from new target data items. This information processing device also takes an assembly of target data items built up from already classified target data items and new target data items, and performs clustering thereon based on the feature data of the already classified target data items and the new target data items, so as to cluster into a number of clusters equal to the number of classifications of the already classified target data items plus one. Such an information processing device outputs a query regarding a new classification of target data in cases in which the clustering result includes a cluster that appears only in the new target data items.
According to an aspect of the embodiments, a non-transitory recording medium is stored with a program that causes a computer to execute a process. The process includes: for a classification model for classifying input data into one or another of plural classes that was trained using a first data set, identifying, in a second data set that is different from the first data set, one or more items of data for which a degree of contribution to a change in a classification criterion is greater than a predetermined threshold, the classification criterion being a classification criterion of the classification model during re-training based on the second data set; and detecting, from among the one or more items of data, an item of data for which a loss of the classification model is reduced by the change to the classification criterion resulting from re-training based on the second data set, as an item of data of an unknown class not contained in the plural classes.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
Explanation follows regarding an example of an exemplary embodiment according to technology disclosed herein, with reference to the drawings.
As illustrated in
Explanation follows regarding a general method of open set recognition. As illustrated in
In a general method of open set recognition such as described above, when generating the classification model, input of an unknown class at the time of application must be anticipated, and the classification model needs to be trained using a special method. Moreover, in cases in which data of an unknown class is contained in the application data, conceivably a classification model generated by normal training that did not anticipate the unknown class might be re-used and re-trained. However, sometimes the training data set is not available at the time of application of the classification model, due to the training data set having been returned or the like. In such cases, re-training of the classification model cannot be executed, because it cannot be determined which items of data, from among the target data items contained in the target data set at the time of application, are of an unknown class.
A conceivable approach in cases in which a classification model trained on a training data set is available, but the training data set is not available, might be to detect data of an unknown class in the target data set based on the classification model and on the target data set.
For example as illustrated in
However, in cases in which the target data items are moved so as to eliminate change to the classification model, as illustrated in
To address this issue, in the present exemplary embodiment, items of data of an unknown class are detected based on differences in properties exhibited by the items of target data with respect to change in the classification model during re-training with the target data set. More specifically, as illustrated in
The items of data of an unknown class are items of data having the above property (2). This insight is utilized in the present exemplary embodiment to detect the data of an unknown class with good accuracy. Detailed explanation follows regarding functional sections of an information processing device according to the present exemplary embodiment.
The information processing device 10 includes, from a functional perspective, an identification section 12 and a detection section 14, as illustrated in
The identification section 12 identifies, from a target data set, one or more items of target data having a degree of contribution of a specific value or greater to a change in a weight for identifying a decision plane of the classification model 20 when re-training based on the target data set. Note that the weight is an example of a classification criterion of technology disclosed herein. Specifically, the identification section 12 computes, as the degree of contribution, a movement distance when each item of target data contained in the target data set is moved so as to reduce an update value of the weight of the classification model 20 during re-training based on the target data set.
More specifically, as illustrated in
As illustrated in
Moreover, the identification section 12 computes an update value |Δw| of the weight with respect to the loss sum ΣL as an index expressing change to the classification model 20. In cases in which the classification model 20 is a differentiable model such as a neural network, the identification section 12 may compute, as the update value, a gradient magnitude indicating the effect the loss imparts to the weight of the classification model 20 for the target data. More specifically, as illustrated in
Moreover, as the degree of contribution to change of the classification model 20 during re-training with the target data set, the identification section 12 takes a movement distance |Δx| of each item of target data for a case in which the item of target data (input data x) is moved so as to reduce the update value |Δw| of the weight. The identification section 12 then identifies any items of target data for which the movement distance is a predetermined threshold or greater. In cases in which the classification model 20 is a differentiable model, the identification section 12 may compute, as the movement distance, a magnitude of the gradient, taken for each item of target data, of the magnitude of the gradient of the weight with respect to the loss. Specifically, as illustrated in
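The identification step described above can be sketched numerically. The sketch below is a minimal illustration under hypothetical assumptions: the loss function is the toy exponential loss used in the worked example later in this description, and the weight `p`, data `X`, labels `c`, and `threshold` are illustrative values only. Nested central-difference numerical gradients stand in for the automatic differentiation a differentiable framework would provide.

```python
import numpy as np

# Hypothetical toy loss: mean over items of exp((||p - a_i|| - 1) * c_i),
# where p is the model weight, a_i an item, and c_i its label (+1 / -1).
# Any differentiable loss could be substituted here.
def loss_sum(p, X, c):
    return np.mean(np.exp((np.linalg.norm(X - p, axis=1) - 1.0) * c))

def num_grad(f, v, eps=1e-5):
    # Central-difference numerical gradient of scalar function f at vector v.
    g = np.zeros_like(v, dtype=float)
    for i in range(v.size):
        d = np.zeros_like(v, dtype=float)
        d[i] = eps
        g[i] = (f(v + d) - f(v - d)) / (2.0 * eps)
    return g

def movement_distances(p, X, c):
    # For each item x_i: the magnitude of the gradient, taken with respect
    # to x_i, of the update-value magnitude ||dL/dp|| -- i.e. how strongly
    # moving x_i could reduce the change to the model during re-training.
    dists = []
    for i in range(len(X)):
        def update_norm(xi, i=i):
            Xm = X.copy()
            Xm[i] = xi
            return np.linalg.norm(num_grad(lambda pp: loss_sum(pp, Xm, c), p))
        dists.append(np.linalg.norm(num_grad(update_norm, X[i].astype(float))))
    return np.array(dists)

# Hypothetical weight, target data items, and pseudo-labels.
p = np.array([0.5, 0.5])
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
c = np.array([1.0, -1.0, -1.0])
d = movement_distances(p, X, c)
threshold = 0.1  # hypothetical threshold
candidates = np.where(d >= threshold)[0]
```

Items whose movement distance meets the threshold become the candidates passed on to the detection step.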
The detection section 14 detects, as being data of an unknown class, any items of target data, from among the one or more data items identified by the identification section 12, for which the loss decreases for the classification model 20 by the change in the weight due to re-training based on the target data set. Specifically, the detection section 14 detects, as being data of an unknown class, any item of target data having a positive increase amount of loss when the one or more identified items of target data are moved in directions to suppress change to the classification model 20 from re-training.
More specifically as illustrated in
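The sign test used by the detection section can be checked numerically: a positive inner product between the movement direction that suppresses change to the model and the per-item loss gradient corresponds to the loss increasing along that movement. The per-item loss, weight `p`, candidate item `x`, and direction `move` below are hypothetical stand-ins chosen only for illustration.

```python
import numpy as np

# Hypothetical per-item loss around a fixed weight p (same exponential form
# as the worked example later in this description, with label c = -1).
p = np.array([0.5, 0.5])
def item_loss(x, c=-1.0):
    return np.exp((np.linalg.norm(x - p) - 1.0) * c)

def num_grad(f, v, eps=1e-6):
    # Central-difference numerical gradient of scalar f at vector v.
    g = np.zeros_like(v, dtype=float)
    for i in range(v.size):
        d = np.zeros_like(v, dtype=float)
        d[i] = eps
        g[i] = (f(v + d) - f(v - d)) / (2.0 * eps)
    return g

x = np.array([0.0, 1.0])         # hypothetical candidate item
move = np.array([-0.13, -0.26])  # hypothetical direction suppressing model change
inner = float(np.dot(move, num_grad(item_loss, x)))

# Finite-difference check: stepping along `move` changes the loss with the
# same sign as the inner product, so a positive inner product means the loss
# increases when the item is moved to suppress the change to the model.
step = 1e-4
delta = item_loss(x + step * move) - item_loss(x)
assert (inner > 0.0) == (delta > 0.0)
unknown = inner > 0.0  # flagged as data of an unknown class when positive
```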
The information processing device 10 may, for example, be implemented by a computer 40 as illustrated in
The storage section 43 may, for example, be implemented by a hard disk drive (HDD), solid state drive (SSD), or flash memory. The storage section 43 serves as a storage medium stored with an information processing program 50 that causes the computer 40 to function as the information processing device 10. The information processing program 50 includes an identification process 52 and a detection process 54. The storage section 43 also includes an information storage region 60 stored with information configuring the classification model 20.
The CPU 41 reads the information processing program 50 from the storage section 43, expands the information processing program 50 in the memory 42, and sequentially executes the processes included in the information processing program 50. By executing the identification process 52, the CPU 41 acts as the identification section 12 illustrated in
Note that the functions implemented by the information processing program 50 may also be implemented by, for example, a semiconductor integrated circuit, and more specifically by an application specific integrated circuit (ASIC).
Next, description follows regarding operation of the information processing device 10 according to the present exemplary embodiment. The classification model 20 trained by machine learning using the training data set is stored in the information processing device 10, and the information processing illustrated in
At step S10, the identification section 12 acquires the target data set input to the information processing device 10. Then, at step S12, the identification section 12 labels the target data items based on the output obtained by inputting each of the items of target data contained in the target data set into the classification model 20. The identification section 12 then computes a sum of losses, which are classification errors between the output when each of the items of target data contained in the target data set was input to the classification model 20 and their respective correct labels, and computes an update value of the weight of the classification model 20 with respect to the loss sum.
Next, at step S14, the identification section 12 computes movement distances when each of the items of target data contained in the target data set is moved so as to reduce the computed update value of the weight. Next, at step S16, the identification section 12 identifies any items of target data having a computed movement distance that is a specific threshold or greater. This threshold may be a predetermined value, or may be a value determined dynamically so as to select a specific number of items of target data in order from the greatest movement distance.
Next, at step S18, the detection section 14 computes an increase amount of the loss when each of the identified items of target data has been moved in a direction to suppress change to the classification model 20 by re-training. Next, at step S20, the detection section 14 detects target data for which the increase amount of the computed loss is positive as being data of an unknown class. Next, at step S22, the detection section 14 outputs the detection result and ends the information processing.
Next, a more specific description will be given of the information processing, using a simple example.
As illustrated in
ΣL=Σi exp((∥p−ai∥−1)ci)/N
Wherein ai is the two-dimensional coordinates of an ith item of training data, ci is the label of the ith item of training data (positive example: 1; negative example: −1), and N is the number of items of training data contained in the training data set.
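As a concrete reading of the formula above, the loss sum can be computed as follows. The weight `p` and the labels `c` used here are hypothetical values chosen only for illustration, since the example's actual weight and labels are given in the drawings; the data points are the a1, a2, a3 listed below.

```python
import numpy as np

# Loss sum from the formula above: sum_i exp((||p - a_i|| - 1) * c_i) / N.
def loss_sum(p, A, c):
    return np.sum(np.exp((np.linalg.norm(A - p, axis=1) - 1.0) * c)) / len(A)

A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # a1, a2, a3 from the example
c = np.array([1.0, -1.0, -1.0])                     # hypothetical labels
p = np.array([0.5, 0.5])                            # hypothetical weight
total = loss_sum(p, A, c)
```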
The weight of this classification model 20 is p. As illustrated in
a1=(0.0, 0.0)
a2=(1.0, 0.0)
a3=(0.0, 1.0)
In such cases, as illustrated in
a1: ∥(−0.09, −0.36)∥=0.37
a2: ∥(−0.09, 0.12)∥=0.15
a3: ∥(−0.13, −0.26)∥=0.30
As illustrated in
The detection section 14 computes a gradient of each item of target data with respect to loss L as indicated below.
a1=(0.20, 0.00)
a2=(−0.20, 0.00)
a3=(−0.13, −0.26)
As illustrated in
For each candidate for data of an unknown class, the detection section 14 then computes an inner product between the gradient of each item of target data with respect to the magnitude of gradient of weight p and the gradient of each item of target data with respect to the loss L.
a1=(−0.09, −0.36)·(0.20, 0.00)=−0.0180
a3=(−0.13, −0.26)·(−0.13, −0.26)=0.085>0
In such cases the detection section 14 detects the target data a3 having a positive inner product as being data of an unknown class. This thereby enables items of data that have a large degree of contribution to change in the classification model 20 when re-training with the target data set, and that are also items of data for which the loss is reduced by change to the classification model 20, i.e. data with the above property (2), to be detected as being data of an unknown class.
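The two inner products above can be reproduced directly from the gradient vectors given in the example, and applying the positive-inner-product rule then singles out a3:

```python
import numpy as np

# For each candidate (a1 and a3), the pair of vectors given in the example:
# first the gradient with respect to the magnitude of the gradient of weight p,
# then the gradient with respect to the loss L.
candidates = {
    "a1": (np.array([-0.09, -0.36]), np.array([0.20, 0.00])),
    "a3": (np.array([-0.13, -0.26]), np.array([-0.13, -0.26])),
}

inner = {name: float(np.dot(u, v)) for name, (u, v) in candidates.items()}
# a1: -0.09*0.20 + -0.36*0.00 = -0.018
# a3: (-0.13)**2 + (-0.26)**2 =  0.0845 (0.085 when rounded)
unknown = [name for name, ip in inner.items() if ip > 0.0]  # -> ["a3"]
```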
As described above, the information processing device according to the present exemplary embodiment identifies, in the target data set, one or more items of data having a degree of contribution of the specific value or greater to change when the classification model trained using the training data set is re-trained based on the target data set. The information processing device then detects, from among the one or more identified items of data, any data for which the loss is reduced by the change to the classification model as being data of an unknown class. This thereby enables items of data of an unknown class in the classification model to be detected from the target data set even in cases in which the training data set is no longer available.
Note that although for the above exemplary embodiment a mode has been described in which the information processing program is pre-stored (installed) in the storage section, there is no limitation thereto. The program according to the technology disclosed herein may be provided in a format stored on a storage medium, such as a CD-ROM, DVD-ROM, USB memory, or the like.
Sometimes the training data that was used in machine learning of a classification model is no longer available at the time of application of the trained classification model. For example, in business situations using customer data, there are often cases in which the prolonged retention of data from a given customer, or the re-use of a classification model trained by machine learning using a given customer's data for a task with another customer, is not allowed from contractual and data breach risk perspectives. After machine learning of the classification model has been performed, sometimes the training data is returned and the classification model alone is retained. In such circumstances, data of an unknown class cannot be detected from the data set at the time of application using the method of the related technology.
The technology disclosed herein enables data of an unknown class in the classification model to be detected from the target data even in cases in which the training data set is no longer available.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind
---|---|---|---
2022-035575 | Mar 2022 | JP | national