The present disclosure relates to an image discrimination technique using domain adaptation.
In image recognition and the like, a technique of training a discriminator using domain adaptation is known for cases where sufficient training data cannot be obtained in a target area. Domain adaptation is a technique of training the discriminator of a diversion destination (target domain) using the training data of a diversion source (source domain). Methods for training a discriminator using domain adaptation are described in Patent Document 1 and Non-Patent Document 1.
Patent Document 1: Japanese Laid-open Patent Publication No. 2016-224821
Non-Patent Document 1: Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, Francois Laviolette, Mario Marchand, and Victor Lempitsky, “Domain-adversarial training of neural networks”, J. Mach. Learn. Res. 17, 1 (January 2016), 2096-2030.
The techniques described in the above documents assume that the source domain is a data set, such as a public data set, in which training data are collected sufficiently and evenly. In practice, however, training data may not be available sufficiently and evenly for all of the classes to be discriminated. In particular, for classes classified into a predetermined abnormal class, it may be difficult to collect the images themselves. When there are few sets of training data for the abnormal class, even if training is performed using domain adaptation, the training of the discriminator is concentrated on the normal classes, and the discriminator obtained by the training cannot correctly discriminate the abnormal class.
It is one object of the present disclosure to provide a learning device capable of generating a highly accurate discriminative model using domain adaptation even when the number of samples of some classes of the source domain is small.
According to an example aspect of the present disclosure, there is provided a learning device including:
According to another example aspect of the present disclosure, there is provided a trained model generation method, including:
According to a further example aspect of the present disclosure, there is provided a recording medium storing a program, the program causing a computer to perform a process including:
According to the present disclosure, it becomes possible to generate a highly accurate discriminative model using domain adaptation even when the number of samples of some classes of the source domain is small.
In the following, example embodiments will be described with reference to the accompanying drawings.
First, a learning device according to a first example embodiment will be described.
The training data are data prepared in advance for training the discriminative model, and each entry forms a pair of an input image and a correct label for it. The “input image” is an image obtained in the source domain or the target domain. The “correct label” is a label indicating a correct answer for the input image. In the present example embodiment, the correct label includes a correct class label, a correct normal/abnormal label, and a correct domain label.
Specifically, the correct class label and the correct normal/abnormal label are prepared for each input image obtained from the source domain. The “correct class label” is a label which indicates the correct answer with respect to the class discriminative result by the discriminative model, that is, the correct class of the object or the like appearing in the input image. The “correct normal/abnormal label” is a label which indicates whether the class of the object or the like appearing in the input image is a normal class or an abnormal class. Note that each class to be discriminated by the discriminative model is classified in advance into either the normal class or the abnormal class, and the correct normal/abnormal label indicates to which of the two the class of the object appearing in the input image belongs.
Moreover, the correct domain label is provided for input images obtained from both the source domain and the target domain. The “correct domain label” is a label which indicates in which of the source domain and the target domain the input image was obtained.
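For illustration, one entry of such training data might be represented as in the following minimal sketch in Python; the field names are assumptions introduced for illustration, not terms of the disclosure:

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class TrainingSample:
    """One entry of the training data: an input image paired with its correct labels."""
    image: np.ndarray             # the input image (H x W x C)
    domain_label: int             # correct domain label: 0 = source domain, 1 = target domain
    class_label: Optional[int]    # correct class label (prepared for source-domain images)
    is_abnormal: Optional[bool]   # correct normal/abnormal label (prepared for source-domain images)
```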
Next, examples of the domains and the normal/abnormal classes will be described. As an example, in a case where the discriminative model to be trained is a product discriminative model which discriminates a product class from a product image, product images collected from a shopping site on the Web may be used as the source domain, and product images handled at a real store may be used as the target domain. In this case, a product class which is handled less on the Web has only a small number of product image samples and can therefore be regarded as an abnormal class. Hence, among the plurality of product classes to be discriminated, the product classes which are handled less on the Web are set as abnormal classes, and the other product classes are set as normal classes.
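For instance, such a designation could be made mechanically from the sample counts, as in the sketch below; the class names, counts, and threshold are hypothetical values used only for illustration:

```python
# Hypothetical numbers of product image samples collected from the shopping site.
sample_counts = {"shirt": 5200, "shoes": 4800, "snow_shovel": 37}
ABNORMAL_THRESHOLD = 100  # hypothetical cutoff below which a class is treated as abnormal

abnormal_classes = {c for c, n in sample_counts.items() if n < ABNORMAL_THRESHOLD}
normal_classes = set(sample_counts) - abnormal_classes
print(abnormal_classes)  # {'snow_shovel'}
```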
As another example, in a case of training the discriminative model which recognizes an object or an event from each captured image of a surveillance camera, a camera A installed at a location can be used as the source domain, and a camera B installed at another location can be used as the target domain. Here, in a case where a particular object or a particular event is rare, a class of the object or the event can be regarded as the abnormal class. For instance, in a case of recognizing a person, rare personal attributes such as firefighters and police officers can be set as the abnormal classes, and other personal attributes can be set as the normal classes.
The IF 11 inputs and outputs data from and to an external device. Specifically, the training data stored in the training DB 2 are input to the learning device 100 via the IF 11.
The processor 12 is a computer such as a CPU (Central Processing Unit) and controls the entire learning device 100 by executing programs prepared in advance. Specifically, the processor 12 executes a discriminative model generation process which will be described later.
The memory 13 is formed by a ROM (Read Only Memory), a RAM (Random Access Memory), or the like. The memory 13 is also used as a working memory during executions of various processes by the processor 12.
The recording medium 14 is a non-volatile and non-transitory recording medium such as a disk-shaped recording medium, a semiconductor memory, or the like, and is formed to be detachable from the learning device 100. The recording medium 14 records various programs executed by the processor 12. When the learning device 100 executes various kinds of processes, the programs recorded on the recording medium 14 are loaded into the memory 13 and executed by the processor 12.
The database 15 temporarily stores the training data input through the IF 11. The database 15 also stores parameters of the neural networks or the like which constitute the discriminative models of the respective units of the learning device 100, which will be described later. Note that the learning device 100 may include an input unit such as a keyboard, a mouse, or the like, and a display unit such as a liquid crystal display for a user to make instructions and input data.
Each input image of the training data is input to the feature extraction unit 21. The feature extraction unit 21 extracts image features D1 by a CNN (Convolutional Neural Network) or another method from each input image, and outputs the extracted image features D1 to the class discrimination unit 22, the normal/abnormal discrimination unit 23, and the domain discrimination unit 24.
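A minimal sketch of such a CNN-based feature extractor in PyTorch might look as follows; the layer sizes and the feature dimension are assumptions, not the disclosed architecture:

```python
import torch
import torch.nn as nn


class FeatureExtractor(nn.Module):
    """Extracts image features D1 from an input image (minimal CNN sketch)."""

    def __init__(self, feature_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feature_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.conv(x).flatten(1)  # (batch, 64)
        return self.fc(h)            # image features D1: (batch, feature_dim)
```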
The class discrimination unit 22 discriminates a class of each input image based on the image features D1, and outputs a class discriminative result D2 to the class discriminative loss calculation unit 26. The class discrimination unit 22 discriminates a class of each input image using a class discriminative model which uses various machine learning techniques, neural networks, and the like. The class discriminative result D2 includes a reliability score for each class to be discriminated.
The class discriminative loss calculation unit 26 calculates a class discriminative loss D3 using the class discriminative result D2 and the correct class label for each input image included in the training data, and outputs the class discriminative loss D3 to the class discriminative learning unit 25. For example, the class discriminative loss calculation unit 26 calculates a loss such as a cross entropy between the class discriminative result D2 and the correct class label, and outputs the loss as the class discriminative loss D3.
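Assuming the class discriminative result D2 is given as a vector of logits over the classes, the cross-entropy variant of the class discriminative loss might be computed as in this sketch:

```python
import torch
import torch.nn.functional as F


def class_discriminative_loss(logits: torch.Tensor, class_labels: torch.Tensor) -> torch.Tensor:
    """Class discriminative loss D3: cross entropy between the class
    discriminative result D2 (logits, shape (batch, num_classes)) and
    the correct class labels (shape (batch,))."""
    return F.cross_entropy(logits, class_labels)
```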
Based on the image features D1, the normal/abnormal discrimination unit 23 generates a normal/abnormal discriminative result D5 which indicates whether the input image corresponds to the normal class or the abnormal class, and outputs the normal/abnormal discriminative result D5 to the AUC loss calculation unit 27. Specifically, for each sample x of the input image, the normal/abnormal discrimination unit 23 calculates a normal/abnormal score gP(x) which indicates the normal class likelihood by the following formula, and outputs the calculated score as the normal/abnormal discriminative result D5:

gP(x) = Σ_{c ∈ P} sc(x)

where P denotes the set of normal classes and sc(x) denotes the reliability score of the class c for the sample x.
The normal/abnormal score calculation unit 23b calculates the score of the normal class likelihood of the input image based on the input reliability scores of the respective classes. Specifically, the normal/abnormal score calculation unit 23b sums the reliability scores of the classes A to C, which are the normal classes, and calculates the normal/abnormal score as follows:

gP(x) = sA(x) + sB(x) + sC(x)
After that, the normal/abnormal score calculation unit 23b outputs the obtained normal/abnormal score as the normal/abnormal discriminative result D5.
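As a hypothetical worked example (the concrete values are illustrative, not from the disclosure), suppose five classes A to E have reliability scores 0.5, 0.2, 0.1, 0.15, and 0.05, and classes A to C are the normal classes:

```python
# Hypothetical reliability scores output by the class discrimination unit.
scores = {"A": 0.5, "B": 0.2, "C": 0.1, "D": 0.15, "E": 0.05}
normal_classes = {"A", "B", "C"}

# Normal/abnormal score gP(x): sum of the reliability scores of the normal classes.
g_p = sum(scores[c] for c in normal_classes)
print(round(g_p, 2))  # 0.8 -> output as the normal/abnormal discriminative result D5
```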
Returning to the configuration of the learning device 100, the AUC loss calculation unit 27 calculates the AUC loss Rsp based on the normal/abnormal discriminative result D5, and outputs the AUC loss Rsp to the class discriminative learning unit 25. For example, the AUC loss Rsp is calculated as a pairwise loss between the samples of the normal classes and the samples of the abnormal class:

Rsp = (1 / (|P||N|)) Σ_{x+ ∈ P} Σ_{x− ∈ N} l(gP(x+) − gP(x−))

where P denotes the set of samples of the normal classes and N denotes the set of samples of the abnormal class. In the above equation, “l(z)” denotes a monotonically decreasing function taking a value of 0 or more; for example, the following sigmoid function may be used:

l(z) = 1 / (1 + exp(β·z)), β > 0

Since gP(x) indicates the normal class likelihood, the AUC loss Rsp becomes small when the scores of the normal samples exceed the scores of the abnormal samples.
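Under the formulation above, the AUC loss Rsp can be computed over a mini-batch as in the following PyTorch sketch; the value of β is an illustrative assumption:

```python
import torch


def auc_loss(scores_normal: torch.Tensor, scores_abnormal: torch.Tensor,
             beta: float = 1.0) -> torch.Tensor:
    """Pairwise AUC loss Rsp: small when the normal/abnormal scores gP(x+)
    of normal-class samples exceed the scores gP(x-) of abnormal-class samples.

    scores_normal:   shape (|P|,), scores of normal-class samples
    scores_abnormal: shape (|N|,), scores of abnormal-class samples
    """
    # All pairwise differences gP(x+) - gP(x-): shape (|P|, |N|).
    diff = scores_normal.unsqueeze(1) - scores_abnormal.unsqueeze(0)
    # l(z) = 1 / (1 + exp(beta * z)) = sigmoid(-beta * z): monotonically decreasing, >= 0.
    return torch.sigmoid(-beta * diff).mean()  # mean = 1/(|P||N|) * sum over pairs
```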
The class discriminative learning unit 25 updates the parameters of the models forming the feature extraction unit 21, the class discrimination unit 22, and the normal/abnormal discrimination unit 23 by a control signal D4 based on the class discriminative loss D3 and the AUC loss Rsp. Specifically, the class discriminative learning unit 25 updates the parameters of the feature extraction unit 21, the class discrimination unit 22, and the normal/abnormal discrimination unit 23 so that the class discriminative loss D3 and the AUC loss Rsp both become smaller.
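A minimal sketch of this joint update follows; the optimizer is assumed to cover the parameters of the units 21, 22, and 23, and the weighting coefficient lambda_auc is an assumption (the disclosure only requires that both losses become smaller):

```python
import torch


def class_discriminative_update(optimizer: torch.optim.Optimizer,
                                class_loss: torch.Tensor,
                                auc_loss_value: torch.Tensor,
                                lambda_auc: float = 1.0) -> None:
    """One update by the class discriminative learning unit 25 (control signal D4):
    reduces both the class discriminative loss D3 and the AUC loss Rsp."""
    total_loss = class_loss + lambda_auc * auc_loss_value
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
```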
The domain discrimination unit 24 discriminates a domain of the input image based on the image features D1, and outputs a domain discriminative result D6 to the domain discriminative loss calculation unit 28. The domain discriminative result D6 indicates a score which represents a source domain likelihood or a target domain likelihood of the input image. The domain discriminative loss calculation unit 28 calculates a domain discriminative loss D7 based on the domain discriminative result D6 and the correct domain label of the input image included in the training data, and outputs the calculated loss to the domain discriminative learning unit 29.
The domain discriminative learning unit 29 updates parameters of the feature extraction unit 21 and the domain discrimination unit 24 by a control signal D8 based on the domain discriminative loss D7. Specifically, the domain discriminative learning unit 29 updates the parameters of the feature extraction unit 21 and the domain discrimination unit 24 so that the feature extraction unit 21 extracts image features D1 that make it difficult to discriminate the domain, while the domain discrimination unit 24 discriminates the domain correctly.
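This kind of adversarial update is commonly realized with a gradient reversal layer, as in Non-Patent Document 1; the following is a minimal sketch of that mechanism (the disclosure itself does not prescribe a specific implementation):

```python
import torch


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in the backward pass.
    The domain discrimination unit is thereby trained to discriminate the domain
    correctly, while the feature extraction unit receives reversed gradients and
    learns features that make the domains hard to tell apart."""

    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output


def grad_reverse(features: torch.Tensor) -> torch.Tensor:
    return GradReverse.apply(features)

# Usage: domain_logits = domain_head(grad_reverse(image_features_d1)),
# followed by an ordinary cross entropy as the domain discriminative loss D7.
```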
As described above, in the present example embodiment, in the learning of the class discriminative model using the domain adaptation, the parameters of the feature extraction unit 21, the class discrimination unit 22, and the normal/abnormal discrimination unit 23 are updated using the AUC loss Rsp, so that the adverse effects caused by the imbalance among numbers of samples for respective classes of the input image can be suppressed. Therefore, even in a case where there are few input images of a particular abnormal class, it is possible to generate a class discriminative model capable of highly accurate discrimination.
First, the input image included in the training data is input to the feature extraction unit 21 (step S11), and the feature extraction unit 21 extracts the image features D1 from the input image (step S12). Next, the domain discrimination unit 24 discriminates a domain based on the image features D1, and outputs the domain discriminative result D6 (step S13). After that, the domain discriminative loss calculation unit 28 calculates the domain discriminative loss D7 based on the domain discriminative result D6 and the correct domain label (step S14). Subsequently, the domain discriminative learning unit 29 updates the parameters of the feature extraction unit 21 and the domain discrimination unit 24 based on the domain discriminative loss D7 (step S15). Note that steps S13 to S15 are referred to as a “domain mixing process”.
Next, the class discrimination unit 22 discriminates a class of the input image based on the image features D1, and generates the class discriminative result D2 (step S16). Next, the class discriminative loss calculation unit 26 calculates the class discriminative loss D3 using the class discriminative result D2 and the correct class label (step S17). Note that steps S16 to S17 are referred to as a “class discriminative loss calculation process”.
Next, based on the image features D1, the normal/abnormal discrimination unit 23 discriminates whether the input image is a normal class or an abnormal class, and outputs the normal/abnormal discriminative result D5 (step S18). After that, the AUC loss calculation unit 27 calculates the AUC loss Rsp based on the normal/abnormal discriminative result D5 (step S19). Note that steps S18 to S19 are referred to as an “AUC loss calculation process”.
Subsequently, the class discriminative learning unit 25 updates parameters of the feature extraction unit 21, the class discrimination unit 22, and the normal/abnormal discrimination unit 23 based on the class discriminative loss D3 and the AUC loss Rsp (step S20). Note that steps S16 to S20 are called a “class discriminative learning process”.
Next, the learning device 100 determines whether or not to terminate the learning (step S21). When the class discriminative loss, the AUC loss, and the domain discriminative loss converge to respective predetermined ranges, the learning device 100 determines that the learning is completed. When the learning is not completed (step S21: No), the learning device 100 returns to step S11 and repeats the processes of steps S11 to S20 using another input image. On the other hand, when the learning is completed (step S21: Yes), the discriminative model generation process is terminated.
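The termination check of step S21 might be implemented as in the sketch below; the window size and tolerance are hypothetical, since the disclosure only requires the three losses to converge to predetermined ranges:

```python
def losses_converged(history: list[tuple[float, float, float]],
                     tol: float = 1e-3, window: int = 5) -> bool:
    """Step S21: regard the learning as completed when the class discriminative
    loss, the AUC loss, and the domain discriminative loss have each varied by
    less than tol over the last `window` iterations."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return all(max(col) - min(col) < tol for col in zip(*recent))
```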
In the above-described example embodiment, the class discriminative learning process (steps S16 to S20) is performed after the domain mixing process (steps S13 to S15), but the order of the domain mixing process and the class discriminative learning process may be reversed. Also, in the above example, the AUC loss calculation process (steps S18 to S19) is performed after the class discriminative loss calculation process (steps S16 to S17), but the order of the class discriminative loss calculation process and the AUC loss calculation process may be reversed.
Furthermore, in the above example, the parameter update is performed based on both the class discriminative loss and the AUC loss in step S20; instead, a step of updating the parameters based on the class discriminative loss may be provided after step S17, and the parameter update in step S20 may then be performed based on the AUC loss alone.
Next, a second example embodiment of the present invention will be described.
The feature extraction means 71 extracts image features from the input image. The class discrimination means 72 discriminates the class of the input image based on the image features and generates a class discriminative result. The class discriminative loss calculation means 76 calculates a class discriminative loss based on the class discriminative result. Based on the image features, the normal/abnormal discrimination means 73 discriminates whether the class is the normal class or the abnormal class, and generates a normal/abnormal discriminative result. The AUC loss calculation means 77 calculates an AUC loss based on the normal/abnormal discriminative result. The first learning means 75 updates parameters of the feature extraction means, the class discrimination means, and the normal/abnormal discrimination means based on the class discriminative loss and the AUC loss.
The domain discrimination means 74 discriminates a domain of the input image based on the image features, and generates the domain discriminative result. The domain discriminative loss calculation means 78 calculates the domain discriminative loss based on the domain discriminative result. The second learning means 79 updates parameters of the feature extraction means and the domain discrimination means based on the domain discriminative loss.
A part or all of the example embodiments described above may also be described as the following supplementary notes, but not limited thereto.
1. A learning device comprising:
2. The learning device according to claim 1, wherein
3. The learning device according to claim 1, wherein
4. The learning device according to any one of claims 1 to 3, wherein
5. The learning device according to claim 4, wherein the first learning means updates parameters of the feature extraction means, the class discrimination means, and the normal/abnormal discrimination means so as to reduce the AUC loss.
6. A trained model generation method, comprising:
7. A recording medium storing a program, the program causing a computer to perform a process comprising:
While the disclosure has been described with reference to the example embodiments and examples, the disclosure is not limited to the above example embodiments and examples. It will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure as defined by the claims.
Description of Reference Signs
2 Training DB
21 Feature extraction unit
22 Class discrimination unit
23 Normal/abnormal discrimination unit
24 Domain discrimination unit
25 Class discriminative learning unit
26 Class discriminative loss calculation unit
27 AUC loss calculation unit
28 Domain discriminative loss calculation unit
29 Domain discriminative learning unit
100 Learning device
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2020/021875 | 6/3/2020 | WO |