The present disclosure relates to a data generation method, a data generation apparatus, a model generation method, a model generation apparatus, and a program.
With the progress of deep learning, various neural network architectures and training methods have been proposed and used for various purposes.
For example, in the field of image processing, various research results on image recognition, object detection, image synthesis, and the like have been achieved by using deep learning.
For example, in the field of image synthesis, various image synthesis tools such as GauGAN and Pix2PixHD have been developed. With these tools, for example, a landscape image can be segmented into regions such as sky, mountains, and sea, and image synthesis can be performed using a segmentation map in which each segment is labeled as sky, mountains, sea, or the like.
An object of the present disclosure is to provide a user-friendly data generation technique.
According to one aspect of the present disclosure, a data generation method includes generating, by at least one processor, an output image by using a first image, a first segmentation map, and a first neural network, the first segmentation map being layered.
According to another aspect of the present disclosure, a data displaying method implemented by at least one processor includes displaying a first segmentation map on a display device, displaying information on a plurality of layers to be edited on the display device, obtaining, from a user, an editing instruction relating to a first layer included in the plurality of layers, displaying, on the display device, a second segmentation map generated by editing the first layer of the first segmentation map based on the editing instruction from the user, and displaying, on the display device, an output image generated based on a first image and the second segmentation map.
In the following, embodiments of the present disclosure will be described with reference to the drawings. In the following examples, a data generation apparatus using a segmentation map and a training apparatus for training an encoder and a decoder of the data generation apparatus are disclosed.
As illustrated in
A training apparatus 200 uses training data stored in a database 300 to train the encoder and the decoder to be provided to the data generation apparatus 100 and provides the trained encoder and decoder to the data generation apparatus 100. For example, the training data may include pairs of an image and a layered segmentation map, as described below.
The data generation apparatus 100 according to the embodiment of the present disclosure will be described with reference to
As illustrated in
The encoder 110 generates a feature map of data such as an input image. The encoder 110 is composed of a neural network trained by the training apparatus 200. The neural network may be implemented, for example, as a convolutional neural network.
The segmentation model 120 generates a layered segmentation map of data such as an input image. In the layered segmentation map, for example, one or more labels may be applied to each pixel of the image. For example, with respect to the input image of a character as illustrated in
The segmentation model 120 may be composed of a neural network trained by the training apparatus 200. The neural network may be implemented, for example, as a convolutional neural network such as a U-Net type network, which will be described below. Further, segmentation and layering may be performed by a single model or by different models.
The decoder 130 generates an output image from the layered segmentation map and the feature map. Here, the output image can be generated so that the edited content of the layered segmentation map is reflected in the input image. For example, when the user edits the layered segmentation map of the input image to delete the eyebrows and to replace the deleted portion with the face of the next layer (face skin), the decoder 130 generates an output image in which the eyebrows of the input image are replaced by the face.
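The eyebrow example above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a layered segmentation map is held as a grid in which each pixel stores its labels ordered from the top layer downward; the names `LayeredMap`, `visible_labels`, and `delete_label` are hypothetical and do not appear in the disclosure.

```python
from typing import List

# Assumption: each pixel holds a list of labels ordered from the top layer
# to the bottom layer. This representation is illustrative only.
LayeredMap = List[List[List[str]]]


def visible_labels(seg: LayeredMap) -> List[List[str]]:
    """Return the top-most label of every pixel ("background" if none remain)."""
    return [[px[0] if px else "background" for px in row] for row in seg]


def delete_label(seg: LayeredMap, label: str) -> LayeredMap:
    """Return a copy of the map with `label` removed from every pixel.

    Removing a label exposes the label of the next layer at that pixel.
    """
    return [[[name for name in px if name != label] for px in row] for row in seg]


# A 2 x 2 toy map: the eyebrow pixel lies on top of the face skin layer.
seg_map: LayeredMap = [
    [["hair", "face"], ["eyebrow", "face"]],
    [["face"], ["face"]],
]
edited = delete_label(seg_map, "eyebrow")
```

In this sketch, deleting the eyebrow label leaves the face label of the next layer visible at the same pixel, which corresponds to the edit that the decoder 130 is described as reflecting in the output image.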
In one embodiment, as illustrated in
Specifically, as illustrated in
The decoder 130 is composed of a neural network trained by the training apparatus 200. The neural network may be implemented, for example, as a convolutional neural network.
Next, various modifications of the data generation process of the data generation apparatus 100 according to an embodiment of the present disclosure will be described with reference to
The reference image is an image held in advance by the data generation apparatus 100 for use by the user, and the user can synthesize the input image provided by the user with the reference image. In the illustrated embodiment, the layered segmentation map is not edited, but the layered segmentation map to be synthesized with the reference image may be edited. In this case, the output image may be generated by reflecting the edited content of the edited area of the edited layered segmentation map in the corresponding area of the reference image.
According to this modification, the input image is input into the segmentation model 120 and the layered segmentation map is acquired. The output image is generated by the decoder 130 based on the feature map of the reference image generated by the encoder 110 and either the layered segmentation map or an edited layered segmentation map obtained by editing the layered segmentation map.
According to this modification, the input image and the reference image are input into the segmentation model 120 to acquire their respective layered segmentation maps. The feature map of the reference image generated by the encoder 110 and the layered segmentation maps, and/or an edited layered segmentation map obtained by editing one of them, are input into the decoder 130 to generate the output image.
Here, when the reference image is used, not all of the features extracted from the reference image need to be used to generate an output image; only a part of the features (for example, hair or the like) may be used. Any combination of the feature map of the reference image and the feature map of the input image (for example, a weighted average, a combination of only the features of the right half and the left half of the hair, or the like) may also be used to generate an output image. Multiple reference images may also be used to generate an output image.
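One way to picture the weighted average mentioned above is the following sketch. It assumes that features have already been reduced to one vector per label (for example, "hair" or "face"); the function `blend_features` and the per-label weight dictionary are hypothetical names introduced only for this illustration.

```python
from typing import Dict, List

Vector = List[float]


def blend_features(
    input_feats: Dict[str, Vector],
    ref_feats: Dict[str, Vector],
    ref_weight: Dict[str, float],
) -> Dict[str, Vector]:
    """Blend per-label feature vectors of an input image and a reference image.

    A weight of 0.0 keeps the input feature, 1.0 takes the reference feature,
    and values in between give a weighted average. Labels without a weight,
    or without a reference feature, keep the input feature unchanged.
    """
    blended: Dict[str, Vector] = {}
    for label, vec in input_feats.items():
        w = ref_weight.get(label, 0.0) if label in ref_feats else 0.0
        ref = ref_feats.get(label, vec)
        blended[label] = [(1.0 - w) * a + w * b for a, b in zip(vec, ref)]
    return blended


input_feats = {"hair": [1.0, 0.0], "face": [0.5, 0.5]}
ref_feats = {"hair": [0.0, 1.0], "face": [0.0, 0.0]}

# Use only the hair of the reference image; everything else stays as input.
mixed = blend_features(input_feats, ref_feats, {"hair": 1.0})
```

Setting the weight of a single label to 1.0 corresponds to using only a part of the reference features, while fractional weights correspond to the weighted-average case.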
Although the above-described embodiments have been described with reference to a generation process for an image, the data to be processed according to the present disclosure is not limited thereto, and the data generation apparatus 100 according to the present disclosure may be applied to any other suitable data format.
Next, a data generation process according to an embodiment of the present disclosure will be described with reference to
As illustrated in
In step S102, the data generation apparatus 100 acquires a layered segmentation map from the input image. Specifically, the data generation apparatus 100 inputs the input image into the segmentation model 120 to acquire the layered segmentation map from the segmentation model 120.
In step S103, the data generation apparatus 100 acquires an edited layered segmentation map. For example, when the layered segmentation map generated in step S102 is presented to the user terminal and the user edits the layered segmentation map on the user terminal, the data generation apparatus 100 receives the edited layered segmentation map from the user terminal.
In step S104, the data generation apparatus 100 acquires the output image from the feature map and the edited layered segmentation map. Specifically, the data generation apparatus 100 performs pooling, such as average pooling, with respect to the feature map acquired in step S101 and the layered segmentation map acquired in step S102 to derive a feature vector. The data generation apparatus 100 expands the feature vector by the edited layered segmentation map acquired in step S103, inputs the expanded feature map into the decoder 130, and acquires the output image from the decoder 130.
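The pooling and expansion in step S104 can be sketched as follows. This is a simplified illustration that assumes a single layer with one label per pixel and plain nested lists instead of tensors; the names `average_pool` and `expand` are hypothetical, and a layered map could be handled by repeating the same procedure per layer.

```python
from typing import Dict, List

Vector = List[float]
FeatureMap = List[List[Vector]]  # height x width x channels
LabelMap = List[List[str]]       # height x width, one label per pixel


def average_pool(features: FeatureMap, labels: LabelMap) -> Dict[str, Vector]:
    """Average the feature vectors of all pixels that share a label."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for feat_row, label_row in zip(features, labels):
        for vec, label in zip(feat_row, label_row):
            acc = sums.setdefault(label, [0.0] * len(vec))
            for i, value in enumerate(vec):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def expand(vectors: Dict[str, Vector], edited: LabelMap) -> FeatureMap:
    """Place each label's pooled vector at every pixel carrying that label.

    Raises KeyError if the edited map uses a label with no pooled vector.
    """
    return [[list(vectors[label]) for label in row] for row in edited]


features: FeatureMap = [[[1.0], [3.0]], [[10.0], [20.0]]]
labels: LabelMap = [["eyebrow", "eyebrow"], ["face", "face"]]
edited: LabelMap = [["face", "face"], ["face", "face"]]  # eyebrows removed

pooled = average_pool(features, labels)
expanded = expand(pooled, edited)  # this map would be fed to the decoder
```

Because the expansion is driven by the edited map rather than the original one, the pixels that used to be eyebrows receive the pooled face feature, which is how the edit reaches the decoder input in this sketch.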
In the embodiment described above, pooling is performed with respect to the feature map and the layered segmentation map, but the present disclosure is not limited thereto. For example, the encoder 110 may be any suitable model capable of extracting the feature of each object and/or part of an image. For example, the encoder 110 may be a Pix2PixHD encoder, and maximum pooling, minimum pooling, attention pooling, or the like, rather than average pooling, may be performed per instance on the last feature map. The Pix2PixHD encoder may be used to extract the feature vector by a CNN or the like for each instance in the last feature map.
With reference to
A user interface screen illustrated in
As illustrated in
Further, as illustrated in
Further, as illustrated in
Further, as illustrated in
Here, as illustrated in
With reference to
As illustrated in
Specifically, the training apparatus 200 inputs an image for training into the encoder 210, acquires a feature map, and acquires an output image from the decoder 230 based on the acquired feature map and the layered segmentation map for training. Specifically, as illustrated in
Subsequently, the training apparatus 200 inputs either a pair of the output image generated by the decoder 230 and the layered segmentation map for training, or a pair of the input image and the layered segmentation map for training, into the discriminator 240 and acquires a loss value based on the discrimination result of the discriminator 240. Specifically, if the discriminator 240 correctly discriminates the input pair, the loss value may be set to zero or the like, and if the discriminator 240 incorrectly discriminates the input pair, the loss value may be set to a non-zero positive value. Alternatively, the training apparatus 200 may input either the output image generated by the decoder 230 or the input image into the discriminator 240 and acquire the loss value based on the discrimination result of the discriminator 240.
Meanwhile, the training apparatus 200 acquires the loss value representing the difference in the feature from the feature maps of the output image and the input image. The loss value may be set to be small when the difference in the feature is small, while the loss value may be set to be large when the difference in the feature is large.
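The two loss values described above can be sketched as follows. The disclosure does not fix their exact form, so this sketch assumes a constant penalty for an incorrect discrimination and a mean absolute difference between flattened feature maps; both choices, and the names `discrimination_loss` and `feature_loss`, are assumptions made for illustration.

```python
from typing import List


def discrimination_loss(predicted_real: bool, is_real: bool, penalty: float = 1.0) -> float:
    """Zero when the discriminator is correct, a positive penalty otherwise."""
    return 0.0 if predicted_real == is_real else penalty


def feature_loss(input_feat: List[float], output_feat: List[float]) -> float:
    """Mean absolute difference between two flattened feature maps.

    The value is small when the features are close and grows with their
    difference. The L1 form is an assumption, not taken from the disclosure.
    """
    if len(input_feat) != len(output_feat):
        raise ValueError("feature maps must have the same number of elements")
    if not input_feat:
        return 0.0
    total = sum(abs(a - b) for a, b in zip(input_feat, output_feat))
    return total / len(input_feat)
```

In practice a differentiable adversarial loss would typically replace the constant penalty so that gradients can be propagated, but the zero versus positive behavior matches the description.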
The training apparatus 200 updates the parameters of the encoder 210, the decoder 230, and the discriminator 240 based on the two acquired loss values. Upon satisfying a predetermined termination condition, such as completion of the above-described process for the entire prepared training data, the training apparatus 200 provides the ultimately acquired encoder 210 and decoder 230 to the data generation apparatus 100 as a trained encoder 110 and decoder 130.
Further, the training apparatus 200 trains the segmentation model 220 by using a pair of the image for training and the layered segmentation map. For example, the layered segmentation map for training may be created by manually segmenting each object included in the image and labeling each segment with the object.
For example, the segmentation model 220 may include a U-Net type neural network architecture as illustrated in
Note that one or more of the encoder 210, the segmentation model 220, and the decoder 230 to be trained may be trained in advance. In this case, the encoder 210, the segmentation model 220, and the decoder 230 can be trained with less training data.
Next, a training process according to an embodiment of the present disclosure will be described with reference to
As illustrated in
In step S202, the training apparatus 200 acquires the output image from the acquired feature map and the layered segmentation map for training. Specifically, the training apparatus 200 performs pooling, such as average pooling, with respect to the feature map acquired from the encoder 210 and the layered segmentation map for training to derive a feature vector. Subsequently, the training apparatus 200 expands the derived feature vector by the layered segmentation map for training to derive the feature map. The training apparatus 200 inputs the derived feature map into the decoder 230 to be trained and acquires the output image from the decoder 230.
In step S203, the training apparatus 200 inputs either a pair of the input image and the layered segmentation map for training or a pair of the output image and the layered segmentation map for training into the discriminator 240 to be trained.
Subsequently, the discriminator 240 discriminates whether the input pair is the pair of the input image and the layered segmentation map for training or the pair of the output image and the layered segmentation map for training. The training apparatus 200 determines the loss value of the discriminator 240 according to the correctness of the discrimination result of the discriminator 240 and updates the parameter of the discriminator 240 according to the determined loss value.
In step S204, the training apparatus 200 determines the loss value according to the difference of the feature maps between the input image and the output image and updates the parameters of the encoder 210 and the decoder 230 according to the determined loss value.
In step S205, the training apparatus 200 determines whether the termination condition is satisfied and terminates the training process when the termination condition is satisfied (S205: YES). On the other hand, if the termination condition is not satisfied (S205: NO), the training apparatus 200 performs steps S201 to S205 with respect to the next training data. Here, the termination condition may be, for example, that steps S201 to S205 have been performed with respect to all of the prepared training data.
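The overall flow of steps S201 to S205 can be sketched as a simple loop. The callables `encode`, `decode`, `update_discriminator`, and `update_generator` are hypothetical placeholders for the trained components and their parameter updates, the pooling and expansion of step S202 are assumed to happen inside `decode`, and the termination condition is assumed to be exhaustion of the prepared training data.

```python
from typing import Any, Callable, Iterable, Tuple


def train(
    pairs: Iterable[Tuple[Any, Any]],
    encode: Callable[[Any], Any],
    decode: Callable[[Any, Any], Any],
    update_discriminator: Callable[[Any, Any, Any], None],
    update_generator: Callable[[Any, Any], None],
) -> int:
    """Run one pass over (image, layered segmentation map) training pairs."""
    iterations = 0
    for image, seg_map in pairs:
        feature_map = encode(image)                   # S201: feature map
        output = decode(feature_map, seg_map)         # S202: output image
        update_discriminator(image, output, seg_map)  # S203: discriminator loss
        update_generator(image, output)               # S204: feature loss
        iterations += 1
    # S205: the loop ends when all prepared training data has been processed.
    return iterations
```

A usage example with stub callables, which only records the order of the updates:

```python
calls = []
count = train(
    [("img1", "seg1"), ("img2", "seg2")],
    encode=lambda img: ("feat", img),
    decode=lambda feat, seg: ("out", feat, seg),
    update_discriminator=lambda img, out, seg: calls.append("D"),
    update_generator=lambda img, out: calls.append("G"),
)
```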
A part or all of each apparatus (the data generation apparatus 100 or the training apparatus 200) according to the above-described embodiments may be configured by hardware or may be configured by information processing of software (i.e., a program) executed by a processor, such as a CPU or a graphics processing unit (GPU). If the apparatus is configured by the information processing of software, the information processing of software may be performed by storing the software that achieves at least a portion of a function of each apparatus according to the present embodiment in a non-transitory storage medium (i.e., a non-transitory computer-readable medium), such as a flexible disk, a compact disc-read only memory (CD-ROM), or a universal serial bus (USB) memory, and causing a computer to read the software. The software may also be downloaded through a communication network. Additionally, the information processing may be performed by hardware by implementing the software in a circuit such as an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA).
The type of the storage medium storing the software is not limited. The storage medium is not limited to a removable storage medium, such as a magnetic disk or an optical disk, but may be a fixed storage medium, such as a hard disk or a memory. The storage medium may be provided inside the computer or outside the computer.
The computer 107 of
Various operations of each apparatus (the data generation apparatus 100 or the training apparatus 200) according to the above-described embodiments may be performed in parallel by using one or more processors or using multiple computers through a network. Various operations may be distributed to multiple arithmetic cores in the processor and may be performed in parallel. At least one of a processor or a storage device provided on a cloud that can communicate with the computer 107 through a network may be used to perform some or all of the processes, means, and the like of the present disclosure. As described, each apparatus according to the above-described embodiments may be in the form of a parallel computing system including one or more computers.
The processor 101 may be an electronic circuit including a computer controller and a computing device (such as a processing circuit, a CPU, a GPU, an FPGA, or an ASIC). Further, the processor 101 may be a semiconductor device or the like that includes a dedicated processing circuit. The processor 101 is not limited to an electronic circuit using an electronic logic element, but may be implemented by an optical circuit using optical logic elements. Further, the processor 101 may also include a computing function based on quantum computing.
The processor 101 can perform arithmetic processing based on data or software (i.e., a program) input from each device or the like in the internal configuration of the computer 107 and output an arithmetic result or a control signal to each device. The processor 101 may control respective components constituting the computer 107 by executing an operating system (OS) of the computer 107, an application, or the like.
Each apparatus (the data generation apparatus 100 or the training apparatus 200) according to the above-described embodiments may be implemented by one or more processors 101. Here, the processor 101 may refer to one or more electronic circuits disposed on one chip or may refer to one or more electronic circuits disposed on two or more chips or two or more devices. If multiple electronic circuits are used, the electronic circuits may communicate with each other by wire or wirelessly.
The main storage device 102 is a storage device that stores instructions executed by the processor 101 and various data. The information stored in the main storage device 102 is read by the processor 101. The auxiliary storage device 103 is a storage device other than the main storage device 102. These storage devices refer to any electronic component that can store electronic information and may be semiconductor memories. The semiconductor memory may be either a volatile memory or a non-volatile memory. The storage device for storing various data in each apparatus (the data generation apparatus 100 or the training apparatus 200) according to the above-described embodiments may be implemented by the main storage device 102 or the auxiliary storage device 103, or may be implemented by an internal memory embedded in the processor 101. For example, the storage portion according to the above-described embodiments may be implemented by the main storage device 102 or the auxiliary storage device 103.
To a single storage device (i.e., one memory), multiple processors may be connected (or coupled) or a single processor may be connected. To a single processor, multiple storage devices (i.e., multiple memories) may be connected (or coupled). If each apparatus (the data generation apparatus 100 or the training apparatus 200) according to the above-described embodiments includes at least one storage device (i.e., one memory) and multiple processors connected (or coupled) to the at least one storage device (i.e., one memory), at least one of the multiple processors may be connected to the at least one storage device (i.e., one memory). Further, this configuration may be implemented by storage devices (i.e., memories) and processors included in the plurality of computers. Further, the storage device (i.e., the memory) may be integrated with the processor (e.g., a cache memory including an L1 cache and an L2 cache).
The network interface 104 is an interface for connecting to the communication network 108 wirelessly or by wire. As the network interface 104, any suitable interface, such as an interface conforming to existing communication standards, may be used. The network interface 104 may exchange information with an external device 109A connected through the communication network 108. The communication network 108 may be any one of a wide area network (WAN), a local area network (LAN), a personal area network (PAN), or a combination thereof, in which information is exchanged between the computer 107 and the external device 109A. Examples of the WAN include the Internet, examples of the LAN include IEEE 802.11 and Ethernet (registered trademark), and examples of the PAN include Bluetooth (registered trademark) and near field communication (NFC).
The device interface 105 is an interface, such as a USB, that directly connects to the external device 109B.
The external device 109A is a device connected to the computer 107 through a network. The external device 109B is a device connected directly to the computer 107.
The external device 109A or the external device 109B may be, for example, an input device. The input device may be, for example, a camera, a microphone, a motion capture device, various sensors, a keyboard, a mouse, a touch panel, or the like, and provides obtained information to the computer 107. The input device may also be a device including an input unit, a memory, and a processor, such as a personal computer, a tablet terminal, or a smartphone.
The external device 109A or the external device 109B may be, for example, an output device. The output device may be, for example, a display device, such as a liquid crystal display (LCD), a cathode-ray tube (CRT), a plasma display panel (PDP), or an organic electro luminescence (EL) panel, or may be a speaker or the like that outputs sound. The output device may also be a device including an output unit, a memory, and a processor, such as a personal computer, a tablet terminal, or a smartphone.
The external device 109A or the external device 109B may be a storage device (i.e., a memory). For example, the external device 109A may be a storage such as a network storage, and the external device 109B may be a storage such as an HDD.
The external device 109A or the external device 109B may be a device having functions of some of the components of each apparatus (the data generation apparatus 100 or the training apparatus 200) according to the above-described embodiments. That is, the computer 107 may transmit or receive some or all of processed results of the external device 109A or the external device 109B.
In the present specification (including the claims), if the expression “at least one of a, b, and c” or “at least one of a, b, or c” is used (including similar expressions), any one of a, b, c, a-b, a-c, b-c, or a-b-c is included. Multiple instances may also be included in any of the elements, such as a-a, a-b-b, and a-a-b-b-c-c. Further, the addition of another element other than the listed elements (i.e., a, b, and c), such as adding d as a-b-c-d, is included.
In the present specification (including the claims), if an expression such as “data as an input”, “based on data”, “according to data”, or “in accordance with data” (including similar expressions) is used, unless otherwise noted, a case in which various data itself is used as an input and a case in which data obtained by processing various data (e.g., data obtained by adding noise, normalized data, and intermediate representation of various data) is used as an input are included. If it is described that any result can be obtained “based on data”, “according to data”, or “in accordance with data”, a case in which a result is obtained based on only the data is included, and a case in which a result is obtained under the influence of other data, factors, conditions, and/or states in addition to the data may be included. If it is described that “data is output”, unless otherwise noted, a case in which various data is used as an output is included, and a case in which data processed in some way (e.g., data obtained by adding noise, normalized data, and intermediate representation of various data) is used as an output is included.
In the present specification (including the claims), if the terms “connected” and “coupled” are used, the terms are intended as non-limiting terms that include any of direct, indirect, electrically, communicatively, operatively, and physically connected/coupled. Such terms should be interpreted according to a context in which the terms are used, but a connected/coupled form that is not intentionally or naturally excluded should be interpreted as being included in the terms without being limited.
In the present specification (including the claims), if the expression “A configured to B” is used, a case in which a physical structure of the element A has a configuration that can perform the operation B, and a permanent or temporary setting/configuration of the element A is configured/set to actually perform the operation B may be included. For example, if the element A is a general purpose processor, the processor may have a hardware configuration that can perform the operation B and be configured to actually perform the operation B by setting a permanent or temporary program (i.e., an instruction). If the element A is a dedicated processor or a dedicated arithmetic circuit, a circuit structure of the processor may be implemented so as to actually perform the operation B irrespective of whether the control instruction and the data are actually attached.
In the present specification (including the claims), if a term indicating containing or possessing (e.g., “comprising/including” and “having”) is used, the term is intended as an open-ended term, including an inclusion or possession of an object other than a target object indicated by the object of the term. If the object of the term indicating an inclusion or possession is an expression that does not specify a quantity or that suggests a singular number (i.e., an expression using “a” or “an” as an article), the expression should be interpreted as being not limited to a specified number.
In the present specification (including the claims), even if an expression such as “one or more” or “at least one” is used in a certain description, and an expression that does not specify a quantity or that suggests a singular number is used in another description (i.e., an expression using “a” or “an” as an article), it is not intended that the latter expression indicates “one”. Generally, an expression that does not specify a quantity or that suggests a singular number (i.e., an expression using “a” or “an” as an article) should be interpreted as being not necessarily limited to a particular number.
In the present specification, if it is described that a particular advantage/result is obtained in a particular configuration included in an embodiment, unless there is a particular reason, it should be understood that the advantage/result may be obtained in another embodiment or other embodiments including the configuration. It should be understood, however, that the presence or absence of the advantage/result generally depends on various factors, conditions, states, and/or the like, and that the advantage/result is not necessarily obtained by the configuration. The advantage/result is merely an advantage/result that results from the configuration described in the embodiment when various factors, conditions, states, and/or the like are satisfied, and is not necessarily obtained in the claimed invention that defines the configuration or a similar configuration.
In the present specification (including the claims), if a term such as “maximize” is used, the term should be interpreted as appropriate, according to a context in which the term is used, including obtaining a global maximum value, obtaining an approximate global maximum value, obtaining a local maximum value, and obtaining an approximate local maximum value. It also includes determining approximate values of these maximum values, stochastically or heuristically. Similarly, if a term such as “minimize” is used, the term should be interpreted as appropriate, according to a context in which the term is used, including obtaining a global minimum value, obtaining an approximate global minimum value, obtaining a local minimum value, and obtaining an approximate local minimum value. It also includes determining approximate values of these minimum values, stochastically or heuristically. Similarly, if a term such as “optimize” is used, the term should be interpreted as appropriate, according to a context in which the term is used, including obtaining a global optimum value, obtaining an approximate global optimum value, obtaining a local optimum value, and obtaining an approximate local optimum value. It also includes determining approximate values of these optimum values, stochastically or heuristically.
In the present specification (including the claims), if multiple hardware performs predetermined processes, each of the hardware may cooperate to perform the predetermined processes, or some of the hardware may perform all of the predetermined processes. Additionally, some of the hardware may perform some of the predetermined processes while another hardware may perform the remainder of the predetermined processes. In the present specification (including the claims), if an expression such as “one or more hardware perform a first process and the one or more hardware perform a second process” is used, the hardware that performs the first process may be the same as or different from the hardware that performs the second process. That is, the hardware that performs the first process and the hardware that performs the second process may be included in the one or more hardware. The hardware may include an electronic circuit, a device including an electronic circuit, or the like.
In the present specification (including the claims), if multiple storage devices (memories) store data, each of the multiple storage devices (memories) may store only a portion of the data or may store an entirety of the data.
Although the embodiments of the present disclosure have been described in detail above, the present disclosure is not limited to the individual embodiments described above. Various additions, modifications, substitutions, partial deletions, and the like may be made without departing from the conceptual idea and spirit of the invention derived from the contents defined in the claims and the equivalents thereof. For example, in all of the embodiments described above, if numerical values or mathematical expressions are used for description, they are presented as an example and are not limited thereto. Additionally, the order of respective operations in the embodiment is presented as an example and is not limited thereto.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2019-215846 | Nov 2019 | JP | national |
This application is a continuation application of International Application No. PCT/JP2020/043622 filed on Nov. 24, 2020, and designating the U.S., which is based upon and claims priority to Japanese Patent Application No. 2019-215846, filed on Nov. 28, 2019, the entire contents of which are incorporated herein by reference.
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/JP2020/043622 | Nov 2020 | US |
| Child | 17804359 | | US |