This application is based upon and claims priority to Chinese Patent Application No. 202110347463.5, filed on Mar. 31, 2021, the entire contents of which are incorporated herein by reference.
The present invention relates to the field of image translation, and specifically relates to a method for image shape transformation based on a generative adversarial network.
Adversarial neural network models have achieved remarkable success in many applications, such as image inpainting, semantic segmentation, image captioning, video generation, and style conversion. Image translation is one of the main research topics in computer vision. In recent years, the development of generative adversarial networks has led to significant advancements in the field of image translation. When paired training data from two different domains are given, the model can be trained in a supervised manner by using a conditional generative adversarial network. When there is a lack of paired data sets, image-to-image translation can be realized by using an unsupervised cycle generative adversarial network and a self-consistency loss.
However, technologies in this field mostly focus on conversion tasks between the styles of objects with similar shapes, such as season transfer, selfie-to-anime translation, and style conversion, but lack satisfactory performance in conversion tasks between objects with different shapes.
In view of the above-mentioned shortcomings in the prior art, the present invention provides a method for image shape transformation based on a generative adversarial network to solve the problems identified in the prior art.
In order to achieve the above-mentioned objective, the present invention adopts the following technical solution: a method for image shape transformation based on a generative adversarial network, which includes the following steps:
Further, step S1 specifically includes: generating, for the image to be transformed, a segmentation mask mx in a source domain X and a segmentation mask my in a target domain Y.
Further, in step S2, the generator includes a down-sampling module, a first residual network module, a second residual network module, a third residual network module, a fourth residual network module, a fifth residual network module, a sixth residual network module and an up-sampling module that are connected in sequence.
The down-sampling module includes a first padding layer, a first convolutional layer, a first instance normalization layer, a first activation layer, a second convolutional layer, a second instance normalization layer, a second activation layer, a third convolutional layer, a third instance normalization layer and a third activation layer that are connected in sequence.
Each of the residual network modules includes a second padding layer, a fourth convolutional layer, a fourth instance normalization layer, a fourth activation layer, a third padding layer, a fifth convolutional layer and a fifth instance normalization layer that are connected in sequence.
The up-sampling module includes a first deconvolutional layer, a sixth instance normalization layer, a fifth activation layer, a second deconvolutional layer, a seventh instance normalization layer, a sixth activation layer, a fourth padding layer, a sixth convolutional layer and a seventh activation layer that are connected in sequence.
Further, in step S2, the discriminator includes a seventh convolutional layer, a first switchable normalization layer, a first maximum activation layer, an eighth convolutional layer, a second switchable normalization layer, an eighth instance normalization layer, a second maximum activation layer, a ninth convolutional layer, a third switchable normalization layer, a ninth instance normalization layer, a third maximum activation layer, a third deconvolutional layer, a fourth switchable normalization layer, a tenth instance normalization layer, a fourth maximum activation layer, a fourth deconvolutional layer and a fifth switchable normalization layer that are connected in sequence.
Further, in step S2, a method for constructing the generative adversarial network through the generator and the discriminator specifically includes:
S2.1: constructing a generator GXY for converting a given image in the source domain X to an image in the target domain Y, and constructing a generator GYX for converting a given image in the target domain Y to an image in the source domain X;
S2.2: constructing a discriminator DY for predicting whether the image is an image in the target domain, and constructing a discriminator DX for predicting whether the image is an image in the source domain; and
S2.3: connecting the generator GXY to the generator GYX, connecting the generator GXY to the discriminator DY, and connecting the generator GYX to the discriminator DX to construct the generative adversarial network.
Further, in step S2.3, a one-cycle generation process of the generative adversarial network includes a source domain cycle generation process and a target domain cycle generation process.
The source domain cycle generation process specifically includes:
S2.3.1.1: inputting a source domain image x and its segmentation mask mx into the generator GXY to obtain a first target domain generated image y′ and its segmentation mask m′y, which are denoted as GXY(x,mx);
S2.3.1.2: inputting the first target domain generated image y′ and its segmentation mask m′y into the generator GYX to obtain a first source domain generated image and its segmentation mask, which are denoted as GYX(GXY(x,mx)); and
S2.3.1.3: inputting the first target domain generated image y′ and its mask m′y as well as a target domain image y and its segmentation mask my into the discriminator DY for discrimination, and inputting GYX(GXY(x,mx)) denoting the first source domain generated image and its segmentation mask into the discriminator DX for discrimination, so as to complete the source domain cycle generation process.
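For illustration only, the following is a minimal Python sketch of the source domain cycle generation process in steps S2.3.1.1 to S2.3.1.3. The generators and discriminators are assumed to be callables that take an image together with its segmentation mask; all names are hypothetical and not part of the present invention.

```python
def source_domain_cycle(x, m_x, y, m_y, G_XY, G_YX, D_X, D_Y):
    """Sketch of one source-domain cycle: (x, m_x) -> (y', m'_y) -> reconstruction, then discrimination."""
    # S2.3.1.1: generate the first target domain image and its mask, i.e. G_XY(x, m_x)
    y_fake, m_y_fake = G_XY(x, m_x)
    # S2.3.1.2: feed the generated pair back to obtain the first source domain pair, i.e. G_YX(G_XY(x, m_x))
    x_rec, m_x_rec = G_YX(y_fake, m_y_fake)
    # S2.3.1.3: discriminate the generated target pair against the real target pair with D_Y,
    # and discriminate the reconstructed source pair with D_X
    d_y_fake = D_Y(y_fake, m_y_fake)
    d_y_real = D_Y(y, m_y)
    d_x_rec = D_X(x_rec, m_x_rec)
    return (y_fake, m_y_fake), (x_rec, m_x_rec), (d_y_fake, d_y_real, d_x_rec)
```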
The target domain cycle generation process specifically includes:
Further, in step S3, the loss function ℒtotal is specifically expressed as:
ℒtotal = λadv·ℒadv + λcyc·ℒcyc + λidt·ℒidt + λctx·ℒctx + λfs·ℒfs;
wherein ℒadv represents an adversarial loss function, ℒcyc represents a cycle consistency loss function, ℒidt represents an identity loss function, ℒctx represents a context-preserving loss function, ℒfs represents a feature similarity loss function, and λadv, λcyc, λidt, λctx and λfs represent the weights of ℒadv, ℒcyc, ℒidt, ℒctx and ℒfs in the loss function ℒtotal, respectively.
Further, the adversarial loss function ℒadv is specifically expressed as:
ℒadv=(DX(x,mx)−1)²+DX(GYX(y,my))²+(DY(y,my)−1)²+DY(GXY(x,mx))²;
wherein DX(x,mx) represents a discriminant output of the discriminator DX for the source domain image x and its segmentation mask mx, DX(GYX(y,my)) represents a discriminant output of the discriminator DX for GYX(y,my) denoting the source domain generated image and its segmentation mask, GYX(y,my) represents the source domain generated image and its mask generated by the target domain image y and its segmentation mask my through the generator GYX, DY(y,my) represents a discriminant output of the discriminator DY for the target domain image y and its segmentation mask my, DY(GXY(x,mx)) represents a discriminant output of the discriminator DY for GXY(x,mx) denoting the target domain generated image and its segmentation mask, and GXY(x,mx) represents the target domain generated image and its segmentation mask generated by the source domain image x and its segmentation mask mx through the generator GXY.
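As a worked example, the least-squares form of ℒadv above can be sketched as follows, assuming PyTorch and generators/discriminators that operate on image–mask pairs; averaging over the discriminator outputs is an implementation assumption, not something specified here.

```python
import torch

def adversarial_loss(D_X, D_Y, G_XY, G_YX, x, m_x, y, m_y):
    """Least-squares adversarial loss L_adv as written above (illustrative sketch)."""
    # real image-mask pairs should be scored close to 1 by their own discriminator
    loss_real = torch.mean((D_X(x, m_x) - 1) ** 2) + torch.mean((D_Y(y, m_y) - 1) ** 2)
    # generated pairs should be scored close to 0
    x_fake, m_x_fake = G_YX(y, m_y)   # G_YX(y, m_y)
    y_fake, m_y_fake = G_XY(x, m_x)   # G_XY(x, m_x)
    loss_fake = torch.mean(D_X(x_fake, m_x_fake) ** 2) + torch.mean(D_Y(y_fake, m_y_fake) ** 2)
    return loss_real + loss_fake
```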
The cycle consistency loss function ℒcyc is specifically expressed as:
ℒcyc=∥GYX(GXY(x,mx))−(x,mx)∥1+∥GXY(GYX(y,my))−(y,my)∥1;
wherein GYX(GXY(x,mx)) represents the source domain generated image and its segmentation mask generated by GXY(x,mx) through the generator GYX, GXY(GYX(y,my)) represents the target domain generated image and its segmentation mask generated by GYX(y,my) through the generator GXY, and ∥*∥1 represents the 1-norm.
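A corresponding sketch of ℒcyc, under the same PyTorch and image–mask-pair assumptions (the image and mask parts of each pair are compared separately, which is an implementation choice), is:

```python
import torch.nn.functional as F

def cycle_consistency_loss(G_XY, G_YX, x, m_x, y, m_y):
    """L_cyc: L1 distance between a reconstructed image-mask pair and the original pair (sketch)."""
    y_fake, m_y_fake = G_XY(x, m_x)
    x_rec, m_x_rec = G_YX(y_fake, m_y_fake)      # G_YX(G_XY(x, m_x))
    x_fake, m_x_fake = G_YX(y, m_y)
    y_rec, m_y_rec = G_XY(x_fake, m_x_fake)      # G_XY(G_YX(y, m_y))
    return (F.l1_loss(x_rec, x) + F.l1_loss(m_x_rec, m_x)
            + F.l1_loss(y_rec, y) + F.l1_loss(m_y_rec, m_y))
```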
The identity loss function ℒidt is specifically expressed as:
ℒidt=∥GXY(y,my)−(y,my)∥1+∥GYX(x,mx)−(x,mx)∥1;
wherein GXY(y,my) represents the target domain generated image and its segmentation mask obtained after the target domain image y and its segmentation mask my are input into the generator GXY, and GYX(x,mx) represents the source domain generated image and its segmentation mask obtained after the source domain image x and its segmentation mask mx are input into the generator GYX.
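Similarly, a hedged sketch of ℒidt, where each generator is applied to a pair that is already in its output domain and is expected to return it unchanged, is:

```python
import torch.nn.functional as F

def identity_loss(G_XY, G_YX, x, m_x, y, m_y):
    """L_idt: each generator should leave inputs already in its output domain unchanged (sketch)."""
    y_id, m_y_id = G_XY(y, m_y)     # G_XY applied to a target-domain pair
    x_id, m_x_id = G_YX(x, m_x)     # G_YX applied to a source-domain pair
    return (F.l1_loss(y_id, y) + F.l1_loss(m_y_id, m_y)
            + F.l1_loss(x_id, x) + F.l1_loss(m_x_id, m_x))
```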
The context-preserving loss function ℒctx is specifically expressed as:
ℒctx=∥ω(mx,m′y)⊙(x−y′)∥1+∥ω(my,m′x)⊙(y−x′)∥1;
wherein ⊙ denotes the element-wise product; ω(mx,m′y) represents the weight obtained by subtracting the element-wise minimum of the binary object masks given by the segmentation masks mx and m′y from one, and ω(my,m′x) represents the weight obtained by subtracting the element-wise minimum of the binary object masks given by the segmentation masks my and m′x from one; y′ represents the target domain generated image generated by the source domain image x and its segmentation mask mx through the generator GXY, and x′ represents the source domain generated image generated by the target domain image y and its segmentation mask my through the generator GYX.
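A sketch of ℒctx is shown below, under the assumptions that ω(·,·) is one minus the element-wise minimum of the two binary masks, that ⊙ is the element-wise product, and that the 1-norm is averaged rather than summed (an implementation choice).

```python
import torch

def context_preserving_loss(x, m_x, y, m_y, y_fake, m_y_fake, x_fake, m_x_fake):
    """L_ctx: penalize pixel changes outside the object regions defined by the masks (sketch)."""
    # assumed weight: one minus the element-wise minimum of the two binary masks
    w_x = 1.0 - torch.minimum(m_x, m_y_fake)   # omega(m_x, m'_y)
    w_y = 1.0 - torch.minimum(m_y, m_x_fake)   # omega(m_y, m'_x)
    # weighted L1 differences between each input image and its translated counterpart
    return (w_x * (x - y_fake)).abs().mean() + (w_y * (y - x_fake)).abs().mean()
```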
The feature similarity loss function ℒfs is specifically expressed in terms of the feature similarity FS(y,y′);
wherein FS(y,y′) represents the similarity between the image y and the image y′; maxi FSij selects the feature i of the image y that is most similar to the feature j of the image y′; N represents the total number of features of the image y, which is the same as the total number of features of the image y′; h represents a bandwidth parameter; Wij represents a similarity obtained through an exponential operation on the normalized similarity distance dij; Wij/Σkwik represents a normalized similarity, and wik represents the similarity value of the kth Wij.
The advantages of the present invention are as follows.
(1) The present invention provides a method for image shape transformation based on a generative adversarial network, which realizes the transformation between object images with different shapes.
(2) The present invention uses cycle generators and discriminators to learn cross-domain mappings, generates images whose underlying distribution is closer to that of the target instances, and can effectively learn complex segmentation-guided attributes related to shape and position.
(3) The present invention proposes a feature similarity loss function to clearly establish a similarity comparison between a source image and a target image.
(4) With low complexity but high image conversion efficiency, the present invention can efficiently process specific images in pictures to perform image transformations with large shape differences, and thus can be used in animation production, poster design and other fields to enhance the reality of image transformation, while reducing labor costs and workload.
The following describes specific embodiments of the present invention to help those skilled in the art understand the present invention, but it should be clear that the present invention is not limited to the scope of the specific embodiments. For those of ordinary skill in the art, as long as various changes are within the spirit and scope of the present invention defined and determined by the appended claims, these changes are obvious, and all inventions and creations that utilize the concept of the present invention shall fall within the scope of the present invention.
The embodiments of the present invention will be described in detail below with reference to the drawings.
In the present embodiment, the present invention can be applied to game design, animation design, graphic design, medical imaging, and style transfer. In step S1, the image to be transformed can be a medical image to be transformed, an animated image with a shape to be transformed during the animation design, a game character or architectural image with a shape to be transformed during the game design, or an image to be transformed during the graphic design.
Step S1 specifically includes: generating, for the image to be transformed, a segmentation mask mx in a source domain X and a segmentation mask my in a target domain Y.
In step S2, a method for constructing the generative adversarial network through the generator and the discriminator specifically includes:
S2.1: a generator GXY for converting a given image in the source domain X to an image in the target domain Y is constructed, and a generator GYX for converting a given image in the target domain Y to an image in the source domain X is constructed;
S2.2: a discriminator DY for predicting whether the image is an image in the target domain is constructed, and a discriminator DX for predicting whether the image is an image in the source domain is constructed; and
S2.3: as shown in the accompanying drawings, the generator GXY is connected to the generator GYX, the generator GXY is connected to the discriminator DY, and the generator GYX is connected to the discriminator DX, so as to construct the generative adversarial network.
In the present embodiment, the generator includes three modules: a down-sampling module, a residual network module and an up-sampling module.
The down-sampling module converts the input feature vector (1, 4, 256, 256) into a feature vector (1, 256, 64, 64) through a four-layer convolution operation. The residual network module includes six blocks, where the input and output feature vectors have the same dimension. The up-sampling module converts the input feature vector (1, 512, 64, 64) into a feature vector (1, 3, 256, 256) through a five-layer convolution operation.
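The following PyTorch-style sketch illustrates one way to realize the generator skeleton described above; the kernel sizes, strides and channel widths are assumptions chosen so that a 4-channel 256×256 input maps to a 256-channel 64×64 feature map and back to a 3-channel 256×256 output, and the 512-channel up-sampling input mentioned above (which would arise from an additional feature concatenation) is not modeled here.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block: pad -> conv -> IN -> ReLU -> pad -> conv -> IN, plus a skip connection."""
    def __init__(self, channels=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.ReflectionPad2d(1), nn.Conv2d(channels, channels, 3),
            nn.InstanceNorm2d(channels), nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1), nn.Conv2d(channels, channels, 3),
            nn.InstanceNorm2d(channels))

    def forward(self, x):
        return x + self.body(x)

class Generator(nn.Module):
    """Down-sampling -> six residual blocks -> up-sampling (illustrative layout only)."""
    def __init__(self):
        super().__init__()
        self.down = nn.Sequential(
            nn.ReflectionPad2d(3), nn.Conv2d(4, 64, 7),        # image + mask input, 4 channels
            nn.InstanceNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1),
            nn.InstanceNorm2d(128), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1),
            nn.InstanceNorm2d(256), nn.ReLU(inplace=True))     # -> (256, 64, 64)
        self.blocks = nn.Sequential(*[ResidualBlock(256) for _ in range(6)])
        self.up = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1),
            nn.InstanceNorm2d(128), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1),
            nn.InstanceNorm2d(64), nn.ReLU(inplace=True),
            nn.ReflectionPad2d(3), nn.Conv2d(64, 3, 7), nn.Tanh())  # -> (3, 256, 256)

    def forward(self, x):
        return self.up(self.blocks(self.down(x)))
```

The Tanh output keeps generated pixel values in [−1, 1], matching the usual normalization of training images; this is an assumption rather than a requirement stated above.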
The discriminator includes two modules: a down-sampling module and a classifier module. The down-sampling module converts the input feature vector (1, 3, 256, 256) into a feature vector (1, 256, 32, 32) through a three-layer convolution operation.
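For completeness, a comparable sketch of the discriminator's down-sampling stage with a patch-level classifier head is given below; the use of LeakyReLU for the "maximum activation" layers and of instance normalization in place of the switchable normalization layers is an assumption made for brevity.

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Down-sample to a (256, 32, 32) feature map, then score patches as real or fake (sketch)."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(in_channels, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.InstanceNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, 4, stride=2, padding=1),
            nn.InstanceNorm2d(256), nn.LeakyReLU(0.2, inplace=True))  # -> (256, 32, 32)
        self.classifier = nn.Conv2d(256, 1, 4, padding=1)  # patch-level real/fake scores

    def forward(self, x):
        return self.classifier(self.down(x))
```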
In step S2.3, a one-cycle generation process of the generative adversarial network includes a source domain cycle generation process and a target domain cycle generation process.
The source domain cycle generation process specifically includes:
The target domain cycle generation process specifically includes:
In step S3, the loss function ℒtotal is specifically expressed as:
ℒtotal = λadv·ℒadv + λcyc·ℒcyc + λidt·ℒidt + λctx·ℒctx + λfs·ℒfs;
wherein ℒadv represents an adversarial loss function, ℒcyc represents a cycle consistency loss function, ℒidt represents an identity loss function, ℒctx represents a context-preserving loss function, ℒfs represents a feature similarity loss function, and λadv, λcyc, λidt, λctx and λfs represent the weights of ℒadv, ℒcyc, ℒidt, ℒctx and ℒfs in the loss function ℒtotal, respectively.
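A minimal sketch of how the weighted total loss could be assembled from the individual loss values is shown below; the λ values are placeholders, not weights taken from the present invention.

```python
def total_loss(l_adv, l_cyc, l_idt, l_ctx, l_fs,
               lam_adv=1.0, lam_cyc=10.0, lam_idt=5.0, lam_ctx=10.0, lam_fs=1.0):
    """Weighted sum L_total of the five loss terms; the lambda defaults are placeholders."""
    return (lam_adv * l_adv + lam_cyc * l_cyc + lam_idt * l_idt
            + lam_ctx * l_ctx + lam_fs * l_fs)
```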
The adversarial loss function ℒadv is specifically expressed as:
ℒadv=(DX(x,mx)−1)²+DX(GYX(y,my))²+(DY(y,my)−1)²+DY(GXY(x,mx))²;
wherein DX(x,mx) represents a discriminant output of the discriminator DX for the source domain image x and its segmentation mask mx, DX(GYX(y,my)) represents a discriminant output of the discriminator DX for GYX(y,my) denoting the source domain generated image and its segmentation mask, GYX(y,my) represents the source domain generated image and its mask generated by the target domain image y and its segmentation mask my through the generator GYX, DY(y,my) represents a discriminant output of the discriminator DY for the target domain image y and its segmentation mask my, DY(GXY(x,mx)) represents a discriminant output of the discriminator DY for GXY(x,mx) denoting the target domain generated image and its segmentation mask, and GXY(x,mx) represents the target domain generated image and its segmentation mask generated by the source domain image x and its segmentation mask mx through the generator GXY.
The cycle consistency loss function ℒcyc is specifically expressed as:
ℒcyc=∥GYX(GXY(x,mx))−(x,mx)∥1+∥GXY(GYX(y,my))−(y,my)∥1;
wherein GYX (GXY(x,mx)) represents the source domain generated image and its segmentation mask generated by GXY(x,mx) through the generator GYX, GXY(GYX(y,my)) represents the target domain generated image and its segmentation mask generated by GYX(y,my) through the generator GXY, and ∥*∥1 represents 1-norm.
The identity loss function ℒidt is specifically expressed as:
ℒidt=∥GXY(y,my)−(y,my)∥1+∥GYX(x,mx)−(x,mx)∥1;
wherein GXY(y,my) represents the target domain generated image and its segmentation mask obtained after the target domain image y and its segmentation mask my are input into the generator GXY, and GYX(x,mx) represents the source domain generated image and its segmentation mask obtained after the source domain image x and its segmentation mask mx are input into the generator GYX.
The context-preserving loss function ℒctx is specifically expressed as:
ℒctx=∥ω(mx,m′y)⊙(x−y′)∥1+∥ω(my,m′x)⊙(y−x′)∥1;
wherein ⊙ denotes the element-wise product; ω(mx,m′y) represents the weight obtained by subtracting the element-wise minimum of the binary object masks given by the segmentation masks mx and m′y from one, and ω(my,m′x) represents the weight obtained by subtracting the element-wise minimum of the binary object masks given by the segmentation masks my and m′x from one; y′ represents the target domain generated image generated by the source domain image x and its segmentation mask mx through the generator GXY, and x′ represents the source domain generated image generated by the target domain image y and its segmentation mask my through the generator GYX.
The feature similarity loss function ℒfs is specifically expressed in terms of the feature similarity FS(y,y′);
wherein FS(y,y′) represents the similarity between the image y and the image y′; maxi FSij selects the feature i of the image y that is most similar to the feature j of the image y′; N represents the total number of features of the image y, which is the same as the total number of features of the image y′; h represents a bandwidth parameter; Wij represents a similarity obtained through an exponential operation on the normalized similarity distance dij; Wij/Σkwik represents a normalized similarity, and wik represents the similarity value of the kth Wij.
The similarity FS(y,y′) between the two images is calculated by using these high-level features. Specifically, in a forward pass, each layer generates a feature map; the real image y obtained from the real training data contains the features yi, while the composite image y′ contains the features y′j. The content attribute and style attribute of the features y′j are consistent with those of the real domain data set. It is assumed that the two images have the same number N of features, where N=|R|=|F|. The most similar feature yi of each y′j is found, that is, maxi FSij. Then, all the similarity values of the features y′j are added to calculate a context similarity value between the two images. Finally, the context similarity value is divided by N to obtain the average similarity FS(y,y′).
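A hedged sketch of this feature similarity computation is given below; the cosine-based distance, its row-wise normalization, and the exponential kernel with bandwidth h are assumptions in the spirit of contextual-similarity measures, since the exact expressions are not reproduced above.

```python
import torch
import torch.nn.functional as F

def feature_similarity(feat_y, feat_y_fake, h=0.5, eps=1e-5):
    """FS(y, y'): for each feature of y', take its best normalized similarity to a feature of y,
    then average over the N features (illustrative sketch; distance and kernel are assumptions).

    feat_y, feat_y_fake: (N, C) feature matrices with the same number N of features.
    """
    y = F.normalize(feat_y, dim=1)             # unit-length features of the real image y
    y_fake = F.normalize(feat_y_fake, dim=1)   # unit-length features of the composite image y'
    cos = y_fake @ y.t()                       # (N, N) cosine similarities, rows indexed by y'_j
    d = 1.0 - cos                              # cosine distance d_ij
    d = d / (d.min(dim=1, keepdim=True).values + eps)   # normalized similarity distance
    w = torch.exp((1.0 - d) / h)               # similarity W_ij via an exponential kernel
    fs = w / w.sum(dim=1, keepdim=True)        # normalized similarity W_ij / sum_k w_ik
    return fs.max(dim=1).values.mean()         # max over i for each y'_j, averaged over N
```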
The advantages of the present invention are as follows.
(1) The present invention provides a method for image shape transformation based on a generative adversarial network, which realizes the transformation between object images with different shapes.
(2) The present invention uses cycle generators and discriminators to learn cross-domain mappings, generates images whose underlying distribution is closer to that of the target instances, and can effectively learn complex segmentation-guided attributes related to shape and position.
(3) The present invention proposes a feature similarity loss function to clearly establish a similarity comparison between a source image and a target image.
(4) With low complexity but high image conversion efficiency, the present invention can efficiently process specific images in pictures to perform image transformations with large shape differences, and thus can be used in animation production, poster design and other fields to enhance the reality of image transformation, while reducing labor costs and workload.
Number | Date | Country | Kind |
---|---|---|---|
202110347463.5 | Mar 2021 | CN | national |