The present disclosure relates to an augmentation apparatus, an augmentation method, and an augmentation program.
The maintenance of training data for a deep learning model requires a high cost. The maintenance of training data includes not only collection of training data but also addition of annotations, such as labels, to the training data.
In the related art, rule-based data augmentation is known as a technique to reduce such a cost for the maintenance of training data. For example, a method of adding a modification such as inversion, scaling, noise addition, or rotation to an image used as training data according to specific rules to generate another piece of training data is known (e.g., see Non Patent Literature 1 or 2). In addition, in a case in which training data is speech or text, similar rule-based data augmentation may be performed.
However, techniques in the related art have the problem that there are fewer variations in training data obtained from data augmentation, and the accuracy of the model may not be improved. In particular, it is difficult in rule-based data augmentation of the related art to increase variations in attributes of training data, which limits improvement in the accuracy of the model. For example, using the rule-based data augmentation described in Non Patent Literature 1 or 2, it is difficult to generate an image with modified attributes such as “window”, “cat”, and “front” from an image of a cat facing the front at a window.
In order to solve the above-described problem and achieve the objective, an augmentation apparatus includes a learning unit configured to cause a generative model, which is configured to generate data from a label, to learn first data with a first label added and second data with a second label added, a generating unit configured to use the generative model that learned the first data and the second data to generate data for augmentation from the first label added to the first data, and an adding unit configured to add the first label added to the first data to augmented data obtained by integrating the first data and the data for augmentation.
According to the present disclosure, it is possible to increase variations in training data obtained through data augmentation and improve the accuracy of the model.
Hereinafter, an embodiment of an augmentation apparatus, an augmentation method, and an augmentation program according to the present application will be described in detail with reference to the drawings. Note that the present disclosure is not limited to the embodiment which will be described below.
First, a configuration of an augmentation apparatus according to a first embodiment will be described with reference to
The augmentation apparatus 10 uses an outer dataset 40 to perform data augmentation of a target dataset 30 and output an augmented dataset 50. In addition, the learning apparatus 20 has a target model 21 and performs learning by using the augmented dataset 50. The target model 21 may be a known model for performing machine learning. For example, the target model 21 is the MCCNN with Triplet loss described in Non Patent Literature 7.
In addition, each dataset in
Here, an example in which each dataset is a combination of image data and a label will be mainly described. In addition, in the following description, data representing an image in a computer-processible format will be referred to as image data or simply an image.
As illustrated in
The storage unit 12 is a storage device such as a Hard Disk Drive (HDD), a Solid State Drive (SSD), or an optical disc. Note that the storage unit 12 may be a semiconductor memory capable of rewriting data, such as a Random Access Memory (RAM), a flash memory, or a Non Volatile Static Random Access Memory (NVSRAM). The storage unit 12 stores an Operating System (OS) and various programs that are executed in the augmentation apparatus 10. Further, the storage unit 12 stores various types of information used in execution of the programs. In addition, the storage unit 12 stores a generative model 121.
Specifically, the storage unit 12 stores parameters used in each processing operation by the generative model 121. In the present embodiment, the generative model 121 is assumed to be a Conditional Generative Adversarial Network (CGAN) described in Non Patent Literature 6. Here, the generative model 121 will be described using
As illustrated in
The generator 121a generates generative data from the correct label input together with predetermined noise. The distinguisher 121b calculates, as a binary determination error, a degree of deviation between the generative data and the correct data. Then, in the learning of the generative model 121, the parameters of the generator 121a are updated so that the error becomes smaller, while the parameters of the distinguisher 121b are updated so that the error becomes larger. Note that each of these parameters is updated by using a method of backward propagation of errors (backpropagation).
In other words, through learning, the generator 121a becomes able to generate generative data that the distinguisher 121b is likely to judge as the same as the correct data. Conversely, through learning, the distinguisher 121b becomes able to recognize generative data as generative data and correct data as correct data.
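The adversarial objective described above can be illustrated with a small sketch. This is not the implementation of the generative model 121; it is a minimal NumPy sketch of the two opposing losses and of how a correct label conditions the generator's input in a CGAN, and all function names here are hypothetical.

```python
import numpy as np

def bce(p, target):
    # Binary cross-entropy between distinguisher outputs p (probabilities
    # in (0, 1)) and a scalar target: 1.0 for correct data, 0.0 for
    # generative data.
    return float(-(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)).mean())

def distinguisher_loss(d_correct, d_generated):
    # The distinguisher is trained to output 1 for correct data and 0 for
    # generative data; updating its parameters reduces this loss, which
    # makes the generator's error "larger".
    return bce(d_correct, 1.0) + bce(d_generated, 0.0)

def generator_loss(d_generated):
    # The generator is trained so that its outputs are judged as correct
    # data, i.e. so that its own error becomes "smaller".
    return bce(d_generated, 1.0)

def generator_input(noise, label, num_classes):
    # In a CGAN, the correct label conditions generation: a one-hot label
    # vector is concatenated with the noise vector before the generator.
    return np.concatenate([noise, np.eye(num_classes)[label]])
```

In an actual training loop, the two losses would be minimized alternately by backpropagation, one step for the distinguisher and one for the generator.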
The control unit 13 controls the entire augmentation apparatus 10. The control unit 13 may be an electronic circuit such as a Central Processing Unit (CPU) or a Micro Processing Unit (MPU), or an integrated circuit such as an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA). In addition, the control unit 13 includes an internal memory for storing programs defining various processing procedures and control data, and executes each of the processing operations using the internal memory. Further, the control unit 13 functions as various processing units by operating various programs. The control unit 13 includes, for example, a learning unit 131, a generating unit 132, and an adding unit 133.
The learning unit 131 causes the generative model 121 that generates data from a label to learn first data with a first label added and second data with a second label added. The target dataset 30 is an example of a combination of the first data and the first label added to the first data. In addition, the outer dataset 40 is an example of a combination of the second data and the second label added to the second data.
Here, the target dataset 30 is assumed to be a combination of target data and a target label added to the target data. Also, the outer dataset 40 is assumed to be a combination of outer data and an outer label added to the outer data.
The target label is a label to be learned by the target model 21. For example, if the target model 21 is a model for recognizing a person in an image, the target label is an ID for identifying the person reflected in the image of the target data. In addition, if the target model 21 is a model for recognizing text from speech, the target label is text obtained by transcribing speech from the target data.
The outer dataset 40 is a dataset for augmenting the target dataset 30. The outer dataset 40 may be a dataset of a different domain from the target dataset 30. Here, a domain is a unique characteristic of a dataset, represented by its data, its label, and its generative distribution. For example, the domain of a dataset whose data is X0 and whose label is Y0 is represented as (X0, Y0, P(X0, Y0)).
Here, in one example, the target model 21 is assumed to be an image recognition model, and the learning apparatus 20 is assumed to learn the target model 21 such that a person whose ID is “0002” can be recognized from an image. In this case, the target dataset 30 is a combination of a label “ID: 0002” and an image in which that person is known to appear. In addition, the outer dataset 40 is a combination of a label indicating an ID other than “0002” and an image in which the person corresponding to that ID is known to appear.
Furthermore, the outer dataset 40 may not necessarily have an accurate label. That is, a label of the outer dataset 40 may be any label that is distinguishable from the label of the target dataset 30 and may, for example, indicate that the label is unset.
The augmentation apparatus 10 outputs an augmented dataset 50 created by taking, from the outer dataset 40, attributes that the data of the target dataset 30 does not have. Thus, data with variations that could not be obtained from the target dataset 30 alone can be obtained. For example, according to the augmentation apparatus 10, even in a case in which the target dataset 30 includes only an image showing the back of a certain person, it is possible to obtain an image showing the front of that person.
Learning processing by the learning unit 131 will be described using
At this time, a domain of the target dataset 30 is represented as (Xtarget, Ytarget, P(Xtarget, Ytarget)). In addition, a domain of the outer dataset 40 is represented as (Xouter, Youter, P(Xouter, Youter)).
The learning unit 131 first performs pre-processing on each piece of the data. For example, the learning unit 131 changes the size of an image to a uniform size (e.g., 128×128 pixels) as pre-processing. Then, the learning unit 131 combines the datasets Starget and Souter, and generates a dataset St+o. For example, St+o has the data and the label of Starget and Souter stored in the same sequence, respectively.
Then, the learning unit 131 causes the generative model 121 to learn the generated dataset St+o as a correct dataset. A specific learning method is as described above. That is, the learning unit 131 performs learning such that the generator 121a of the generative model 121 can generate data that is proximate to the first data and the second data and the distinguisher 121b of the generative model 121 can distinguish a difference between the data generated by the generator 121a and the first data and a difference between data generated by the generator and the second data.
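The pre-processing and combining steps above might look like the following sketch, assuming each dataset is a sequence of (image, label) pairs. A nearest-neighbour resize stands in for whatever resampling is actually used, and all names are illustrative, not part of the disclosed implementation.

```python
import numpy as np

def resize_nearest(img, size=(128, 128)):
    # Bring every image to a uniform size (here 128x128 pixels) by
    # nearest-neighbour sampling of rows and columns.
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows][:, cols]

def combine_datasets(s_target, s_outer, size=(128, 128)):
    # The combined dataset (St+o) simply stores the data and labels of
    # both input datasets in one sequence after pre-processing.
    combined = [(resize_nearest(x, size), y) for x, y in s_target]
    combined += [(resize_nearest(x, size), y) for x, y in s_outer]
    return combined
```

The combined dataset is then used as the correct dataset when learning the generative model.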
In addition, X′ in
The generating unit 132 generates the data for augmentation from the first label added to the first data using the generative model 121 that learned the first data and the second data. Ytarget is an example of the first label added to the first data.
Generation processing by the generating unit 132 will be described using
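As a sketch, generation of data for augmentation from the target label could look like the following, where the trained generator is represented by an arbitrary callable. This is an illustrative sketch under those assumptions, and every name in it is hypothetical.

```python
import numpy as np

def generate_for_augmentation(generator, target_label, num_classes,
                              n_samples, noise_dim, seed=0):
    # Draw noise, condition it on the target label (one-hot), and collect
    # the generator's outputs as the data for augmentation Xgen.
    rng = np.random.default_rng(seed)
    onehot = np.eye(num_classes)[target_label]
    x_gen = []
    for _ in range(n_samples):
        z = rng.standard_normal(noise_dim)
        x_gen.append(generator(np.concatenate([z, onehot])))
    return x_gen
```

Because the generator has learned both the target data and the outer data, samples conditioned on the target label can carry attributes that originate in the outer dataset.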
The adding unit 133 adds the first label added to the first data to augmented data obtained by integrating the first data and the data for augmentation. The adding unit 133 adds a label to the generative data Xgen generated by the generating unit 132 to generate a dataset S′target that can be used by the learning apparatus 20. In addition, S′target is an example of the augmented dataset 50.
Adding processing by the adding unit 133 will be described with reference to
After that, as illustrated in
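A minimal sketch of this integration step, again assuming datasets are lists of (image, label) pairs, is shown below; the function name is illustrative.

```python
def build_augmented_dataset(target_pairs, x_gen, target_label):
    # Integrate the original target data with the data for augmentation,
    # and add the target label to every generated sample, yielding the
    # augmented dataset S'target.
    augmented = list(target_pairs)
    augmented += [(x, target_label) for x in x_gen]
    return augmented
```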
A specific example of the augmented dataset 50 will be described using
As illustrated in
The image 301a is assumed to show an Asian person with black hair, wearing a red T-shirt and short jeans, and facing backward. In this case, the image 301a has attributes such as “back”, “black hair”, “red T-shirt”, “Asian”, and “short jeans”.
The image 401a is assumed to show a person carrying a bag on the shoulder, wearing a white T-shirt, black short jeans, and shoes, and facing the front. In this case, the image 401a has attributes such as “front”, “bag”, “white T-shirt”, “black short jeans”, and “shoes”.
Note that the attributes mentioned here are information used by the target model 21 in image recognition. However, these attributes are defined as examples for the purpose of description and are not necessarily explicitly treated as individual information in the image recognition processing. For this reason, the target dataset 30a and the outer dataset 40a may have unknown attributes.
The augmentation apparatus 10 inputs the target dataset 30a and the outer dataset 40a and outputs an augmented dataset 50a. An image for augmentation 501a is one of images generated by the augmentation apparatus 10. The augmented dataset 50a is a dataset obtained by integrating the target dataset 30a and the image for augmentation 501a to which the label “ID: 0002” is added.
The image for augmentation 501a is assumed to show an Asian person with black hair, wearing a red T-shirt and short jeans, and facing the front. In this case, the image for augmentation 501a has attributes such as “front”, “black hair”, “red T-shirt”, “Asian”, and “short jeans”.
Here, the attribute “front” is an attribute that cannot be obtained from the target dataset 30a. As described above, the augmentation apparatus 10 can generate an image obtained by combining attributes obtained from the outer dataset 40a with the attributes of the target dataset 30a.
The flow of processing of the augmentation apparatus 10 will be described using
As shown in
Here, the augmentation apparatus 10 specifies a label for the target dataset 30 in the generative model 121 (step S104) and generates an image for augmentation based on the specified label (step S105). Next, the augmentation apparatus 10 integrates the image of the target dataset 30 and the image for augmentation and adds the label of the target dataset 30 to the integrated data (step S106).
The augmentation apparatus 10 outputs the data to which the label is added in step S106 as the augmented dataset 50 (step S107). The learning apparatus 20 performs learning of the target model 21 using the augmented dataset 50.
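Taken together, steps S101 through S107 can be sketched as a single pipeline. The callables below are placeholders for the components described above, not the disclosed implementation, and all names are hypothetical.

```python
def augment(target_pairs, outer_pairs, train_generative_model,
            generate, target_label, n_aug):
    # S101-S103: combine the (pre-processed) target and outer datasets
    # and learn the generative model on the combined dataset.
    s_combined = list(target_pairs) + list(outer_pairs)
    model = train_generative_model(s_combined)
    # S104-S105: specify the target label and generate images for
    # augmentation from it.
    x_gen = generate(model, target_label, n_aug)
    # S106-S107: integrate the target data with the generated data, add
    # the target label, and output the augmented dataset.
    return list(target_pairs) + [(x, target_label) for x in x_gen]
```

The output of this pipeline is what the learning apparatus would consume when training the target model.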
As described so far, the augmentation apparatus 10 causes the generative model that generates data from labels to learn the first data and the second data to which labels have been added. In addition, the augmentation apparatus 10 uses the generative model that learned the first data and the second data to generate data for augmentation from the label added to the first data. In addition, the augmentation apparatus 10 adds the label added to the first data to augmented data obtained by integrating the first data and the data for augmentation. In this way, the augmentation apparatus 10 of the present embodiment can generate training data having attributes not included in the target dataset through the data augmentation. Thus, according to the present embodiment, the variation of the training data obtained by the data augmentation can be increased, and the accuracy of the model can be improved.
The augmentation apparatus 10 performs learning such that the generator of the generative model can generate data that is proximate to the first data and the second data and the distinguisher of the generative model can identify a difference between the data generated by the generator and the first data and a difference between the data generated by the generator and the second data. This enables the data generated using the generative model to be similar to the target data.
Here, an experiment performed to compare a technique in the related art and the embodiment will now be described. In the experiment, the target model 21 is MCCNN with Triplet loss, which performs a task of searching for a particular person in an image using image recognition. In addition, the techniques were compared by the recognition accuracy of the target model 21 when the data before augmentation, i.e., the target dataset 30, was input to it. The generative model 121 is a CGAN.
In addition, the target dataset 30 is “Market-1501”, a dataset for person re-identification. Also, the outer dataset 40 is “CUHK03”, which is likewise a dataset for person re-identification. In addition, the amount of augmented data is three times the amount of the original data.
The results of the experiment are illustrated in
As illustrated in
In the above embodiment, the learning function of the target model 21 is included in the learning apparatus 20, which is different from the augmentation apparatus 10. On the other hand, the augmentation apparatus 10 may include a target model learning unit that causes the target model 21 to learn the augmented dataset 50. This allows the augmentation apparatus 10 to reduce resource consumption resulting from data transfer between apparatuses, and allows data augmentation and learning of the target model to be performed efficiently as a series of processing operations.
System Configuration, and the Like
Further, each illustrated constituent component of each apparatus is a conceptual function and does not necessarily need to be physically configured as illustrated in the drawings. That is, a specific form of distribution and integration of each apparatus is not limited to the form illustrated in the drawings, and all or some of the apparatuses can be distributed or integrated functionally or physically in any units according to various loads and use situations. Further, all or any part of each processing function to be performed by each apparatus can be implemented by a CPU and a program being analyzed and executed by the CPU, or can be implemented as hardware by wired logic.
In addition, among the processing operations described in the present embodiment, all or some of the processing operations described as being performed automatically can be performed manually, or all or some of the processing operations described as being performed manually can be performed automatically by a known method. In addition, information including the processing procedures, the control procedures, the specific names, and various data and parameters described in the above document and drawings can be optionally changed unless otherwise specified.
Program
As one embodiment, the augmentation apparatus 10 can be implemented by installing an augmentation program for executing the data augmentation described above, as packaged software or on-line software, in a desired computer. For example, by causing an information processing apparatus to execute the augmentation program, the information processing apparatus can function as the augmentation apparatus 10. Here, the information processing apparatus includes a desktop or notebook personal computer. In addition, the category of the information processing apparatus includes a mobile communication terminal such as a smartphone, a feature phone, or a Personal Handyphone System (PHS), and a slate terminal such as a Personal Digital Assistant (PDA).
In addition, the augmentation apparatus 10 can be implemented as an augmentation server apparatus that has a terminal apparatus used by a user as a client and provides services regarding the above-described data augmentation to the client. For example, the augmentation server apparatus is implemented as a server apparatus that provides an augmentation service in which target data is input and augmented data is output. In this case, the augmentation server apparatus may be implemented as a web server or may be implemented as a cloud that provides services regarding the data augmentation through outsourcing.
The memory 1010 includes a Read Only Memory (ROM) 1011 and a RAM 1012. The ROM 1011 stores a boot program, for example, a Basic Input Output System (BIOS) or the like. The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100. A detachable storage medium, for example, a magnetic disk, an optical disc, or the like is inserted into the disk drive 1100. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, a display 1130.
Here, the hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, a program defining each processing operation of the augmentation apparatus 10 is implemented as the program module 1093 in which computer-executable code is written. The program module 1093 is stored in, for example, the hard disk drive 1090. For example, the program module 1093 for executing processing similar to that of the functional configurations of the augmentation apparatus 10 is stored in the hard disk drive 1090. Note that the hard disk drive 1090 may be replaced with an SSD.
In addition, setting data used in the processing of the embodiment described above is stored as the program data 1094, for example, in the memory 1010 or the hard disk drive 1090. Then, the CPU 1020 reads the program module 1093 or the program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 as necessary, and executes the processing of the above-described embodiment.
Note that the program module 1093 or the program data 1094 is not limited to being stored in the hard disk drive 1090, and may be stored in, for example, a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (a Local Area Network (LAN), a Wide Area Network (WAN), or the like). Then, the program module 1093 and the program data 1094 may be read by the CPU 1020 from the other computer via the network interface 1070.
Number | Date | Country | Kind |
---|---|---|---|
2018-158400 | Aug 2018 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2019/032863 | 8/22/2019 | WO | 00 |