This application claims priority to Chinese Patent Application No. 202111151607.6, filed with the China Patent Office on Sep. 29, 2021, the entirety of which is incorporated herein by reference.
Embodiments of the present disclosure relate to the technical field of image processing, for example, to a method, apparatus, device, and storage medium for image generation.
With the development of science and technology, more and more application software, such as short video APPs, has entered users' lives and gradually enriched their leisure time. Users can record their lives by means of videos, photos, etc., and upload them to the short video APPs.
Short video APPs provide many effects based on image algorithms and rendering techniques. Among them, virtual clothing changing refers to the application of image fusion technology to fuse a user's human body image with a clothing image comprising target clothing, to obtain an image of the user wearing the target clothing, so that the user can see the wearing effect of the target clothing without actually trying it on.
At present, in the process of virtual clothing changing, an image fusion model is usually applied to extract features from the human body image and the clothing image respectively, and a new image, i.e., the image of the user wearing the target clothing, is generated based on the two extracted image features. However, since the image fusion model extracts only coarse image features, detail information is prone to being lost in the newly generated image, which distorts the image generation effect and degrades the effect of virtual clothing changing.
The embodiments of the present disclosure provide a method, apparatus, device, and storage medium for image generation, which can improve the fidelity of a generated image.
In a first aspect, embodiments of the present disclosure provide a method for image generation, comprising:
In a second aspect, embodiments of the present disclosure provide an apparatus for image generation, comprising:
In a third aspect, embodiments of the present disclosure provide an electronic device, comprising:
In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium storing a computer program which, when executed by a processing device, implements the method for image generation according to the embodiments of the present disclosure.
It should be understood that a plurality of steps described in the embodiments of the method of the present disclosure may be executed in different sequences and/or in parallel. Furthermore, the embodiments of the method may include additional steps and/or omit execution of the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term “include” as used herein and its variations are open-ended, meaning “including but not limited to”. The term “based on” means “based at least in part on”. The term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; and the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms are given in the following description.
It should be noted that concepts such as “first” and “second” mentioned in the present disclosure are only used to distinguish different apparatuses, modules or units, and are not used to limit the sequence or interdependence of functions performed by these apparatuses, modules or units.
It should be noted that the modifiers “one” and “a plurality of” mentioned in the present disclosure are illustrative rather than restrictive, and those skilled in the art should understand that they are to be read as “one or more” unless the context expressly indicates otherwise.
The names of messages or information exchanged between the plurality of apparatuses in the embodiments of the present disclosure are used for illustrative purposes only and are not used to limit the scope of the messages or information.
At step 110, a first human body image comprising a target human body and a first clothing image comprising target clothing are obtained.
The target human body may be a portrait displayed in a certain pose, and the target clothing may be clothing displayed in a plan pattern.
At step 120, key point extraction, portrait segmentation and human body part segmentation are performed on the first human body image respectively, to obtain a key point feature image, a portrait segmented image and a human body part segmented image.
Human body key point extraction may be understood as human body pose estimation. The human body key points may comprise 17 key points: the nose, left and right eyes, left and right ears, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right knees, and left and right ankles. In these embodiments, any human body key point detection algorithm may be adopted to detect the human body key points in the first human body image (not limited here), or the first human body image may be input into a key point extraction model to obtain the key point feature image.
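As a minimal illustrative sketch (not part of the claimed method), the detected key points may be rendered into a multi-channel key point feature image, one Gaussian heatmap per key point. The 17-point layout below follows the common COCO convention, and the helper function is a hypothetical stand-in for whichever rendering the key point extraction model actually uses:

```python
# Illustrative sketch only: render 17 COCO-style human body key points
# into a key point feature image of shape (17, H, W).
import numpy as np

COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def keypoints_to_feature_image(keypoints, height, width, sigma=4.0):
    """keypoints: iterable of 17 (x, y) pixel coordinates."""
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float32)
    channels = [
        np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        for x, y in keypoints
    ]
    return np.stack(channels, axis=0)  # one heatmap channel per key point
```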
The portrait segmented image may be understood as an image with a portrait segmented from a background. In these embodiments, any portrait segmentation technique may be used to perform the portrait segmentation (not limited here), or the first human body image may be input into a portrait segmentation model to obtain the portrait segmented image.
The human body part segmented image may be understood as an image with a plurality of parts of a human body segmented, for example, an image in which a face, hair, arms, an upper body, legs and the like are segmented. In these embodiments, any human body part segmentation algorithm may be used to perform the human body part segmentation on the first human body image, which is not limited here. Alternatively, the first human body image may be input into a human body part segmentation model to obtain the human body part segmented image.
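As another illustrative sketch under the same caveat, the portrait segmented image and the region the clothing covers may both be derived from a part segmentation label map; the label ids below are assumptions for illustration only:

```python
# Illustrative sketch only: derive the portrait mask and the region the
# clothing covers from a human body part segmentation label map.
import numpy as np

PART_LABELS = {"background": 0, "face": 1, "hair": 2, "arms": 3,
               "upper_body": 4, "legs": 5}  # assumed label ids

def portrait_and_clothing_masks(part_map):
    """part_map: (H, W) integer label map from a part segmentation model."""
    portrait_mask = (part_map != PART_LABELS["background"]).astype(np.uint8)
    # Region the target clothing should cover, e.g. upper body plus arms.
    clothing_mask = np.isin(
        part_map, [PART_LABELS["upper_body"], PART_LABELS["arms"]]
    ).astype(np.uint8)
    return portrait_mask, clothing_mask
```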
In these embodiments, the information on the posture of the human body may be obtained through the key point feature image, the information on the size of the human body may be obtained through the portrait segmented image, and the region that the clothing covers may be obtained through the human body part segmented image. Thus, a posture adjustment can be performed on the clothing image based on the key point feature image, a size adjustment can be performed on the clothing image based on the portrait segmented image, and the clothing image can be cropped based on the human body part segmented image. A transformed clothing image obtained after the posture adjustment, size adjustment and cropping are performed on the plan clothing image fits the current human body better.
For example, the following steps are further included after the key point extraction on the first human body image and before the portrait segmentation on the first human body image: obtaining reference key point distribution information; and adjusting the key points of the first human body image based on the reference key point distribution information, to obtain an adjusted first human body image.
The reference key point distribution information may be understood as information on the distribution of a plurality of human body key points in a reference image. In these embodiments, after the key points of the first human body image are extracted, the extracted key points are aligned with the reference key points, so as to adjust the size of the image and the proportion of the portrait in the image.
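A minimal sketch of such an alignment, assuming a scale-and-translation fit (no rotation) estimated by least squares and an OpenCV warp; both choices are illustrative assumptions rather than the claimed procedure:

```python
# Illustrative sketch: align extracted key points to the reference key
# point distribution, then warp the first human body image accordingly.
import cv2
import numpy as np

def align_to_reference(image, keypoints, reference_keypoints):
    """keypoints, reference_keypoints: (17, 2) arrays of (x, y) coords."""
    src = np.asarray(keypoints, dtype=np.float32)
    dst = np.asarray(reference_keypoints, dtype=np.float32)
    # Least-squares scale and translation mapping src onto dst.
    scale = np.linalg.norm(dst - dst.mean(axis=0)) / np.linalg.norm(src - src.mean(axis=0))
    t = dst.mean(axis=0) - scale * src.mean(axis=0)
    matrix = np.array([[scale, 0.0, t[0]],
                       [0.0, scale, t[1]]], dtype=np.float32)
    h, w = image.shape[:2]
    adjusted_image = cv2.warpAffine(image, matrix, (w, h))
    adjusted_keypoints = src * scale + t
    return adjusted_image, adjusted_keypoints
```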
In this case, the portrait segmentation and the human body part segmentation are performed on the adjusted first human body image respectively.
At step 130, the key point feature image, the portrait segmented image, the human body part segmented image and the first clothing image are input into a transformation model, to obtain a transformed second clothing image.
The transformation model may be obtained by training a configured neural network based on a human body sample image and a clothing sample image, wherein the configured neural network may be a convolutional neural network, etc.
For example, after the key point feature image, the portrait segmented image, the human body part segmented image and the first clothing image are obtained, they are input into the transformation model, to obtain the transformed second clothing image.
For example, inputting the key point feature image, the portrait segmented image, the human body part segmented image and the first clothing image into the transformation model to obtain the transformed second clothing image may include: the transformation model performing a posture adjustment on the first clothing image based on the key point feature image, performing a size adjustment on the posture-adjusted clothing image based on the portrait segmented image, and cropping the size-adjusted clothing image based on a clothing region in the human body part segmented image, to obtain the transformed second clothing image.
The transformed second clothing image is obtained after the posture adjustment, the size adjustment and the cropping are performed on the first clothing image in order based on the key point feature image, the portrait segmented image and the human body part segmented image, which ensures that the transformed second clothing image fits the current human body better.
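For intuition only, the following non-learned stand-in mimics the three steps the transformation model is trained to perform; the torso heuristics and padding factors are invented for illustration and are not the model itself:

```python
# Illustrative, non-learned stand-in for the transformation model:
# posture/size adjustment from key points and the portrait mask,
# then cropping by the clothing region of the part segmented image.
import cv2
import numpy as np

def transform_clothing(clothing, keypoints, portrait_mask, clothing_mask):
    h, w = portrait_mask.shape
    # Torso box from the shoulder (5, 6) and hip (11, 12) key points.
    torso = np.asarray([keypoints[i] for i in (5, 6, 11, 12)], dtype=np.float32)
    x0, y0 = torso.min(axis=0).astype(int)
    x1, y1 = torso.max(axis=0).astype(int)
    # Pad so sleeves can reach the arms; clamp to the portrait extent
    # (the size adjustment based on the portrait segmented image).
    pad_x, pad_y = (x1 - x0) // 2, (y1 - y0) // 4
    ys, xs = np.nonzero(portrait_mask)
    x0, x1 = max(x0 - pad_x, int(xs.min())), min(x1 + pad_x, int(xs.max()))
    y0, y1 = max(y0 - pad_y, int(ys.min())), min(y1 + pad_y, int(ys.max()))
    resized = cv2.resize(clothing, (x1 - x0, y1 - y0))
    canvas = np.zeros((h, w, 3), dtype=clothing.dtype)
    canvas[y0:y1, x0:x1] = resized
    # Crop: keep only pixels inside the clothing region.
    return canvas * clothing_mask[..., None]
```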
In these embodiments, the transformation model may be trained by: obtaining the human body sample image and the clothing sample image, wherein a human body in the human body sample image wears clothing in the clothing sample image; performing key point extraction, portrait segmentation and human body part segmentation on the human body sample image respectively, to obtain a key point feature sample image, a portrait segmented sample image and a human body part segmented sample image; inputting the key point feature sample image, the portrait segmented sample image, the human body part segmented sample image and the clothing sample image into an initial model, to obtain a first transformed clothing image; calculating a loss function based on the first transformed clothing image and the human body sample image; and training the initial model based on the loss function, to obtain the transformation model.
The key point extraction, portrait segmentation and human body part segmentation may also be performed on the human body sample image by inputting it into the key point extraction model, the portrait segmentation model and the human body part segmentation model respectively, to obtain the key point feature sample image, the portrait segmented sample image and the human body part segmented sample image.
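A hedged PyTorch sketch of this training procedure follows; the tensor layout of the data loader, the L1 objective and the optimizer settings are assumptions, since the embodiments only state that a loss function is calculated based on the first transformed clothing image and the human body sample image:

```python
# Illustrative training loop for the transformation model. The ground
# truth is the clothing actually worn in the human body sample image,
# recovered through the clothing region of the part segmented image.
import torch
import torch.nn.functional as F

def train_transformation_model(model, loader, epochs=10, lr=2e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for kpt_img, portrait, parts, clothing, human, cloth_mask in loader:
            inputs = torch.cat([kpt_img, portrait, parts, clothing], dim=1)
            warped = model(inputs)       # first transformed clothing image
            target = human * cloth_mask  # clothing as worn in the sample
            loss = F.l1_loss(warped * cloth_mask, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```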
At step 140, the second clothing image, the first human body image, the key point feature image, the portrait segmented image and the human body part segmented image are input into a merging model, to obtain a second human body image.
The target human body in the second human body image wears the target clothing. The merging model may be obtained by training a generative model in a generative adversarial network based on the human body sample image and the clothing sample image. For example, the second clothing image, the first human body image, the key point feature image, the portrait segmented image and the human body part segmented image are input into the merging model, to obtain the second human body image.
For example, inputting the second clothing image, the first human body image, the key point feature image, the portrait segmented image and the human body part segmented image into the merging model to obtain the second human body image may include: the merging model combining the second clothing image and the first human body image to obtain an initial image, optimizing a clothing posture in the initial image based on the key point feature image, optimizing a clothing size in the initial image based on the portrait segmented image, and cropping the clothing in the initial image based on the human body part segmented image, to obtain the second human body image.
In these embodiments, since the clothing and the human body fit each other poorly in the initial image obtained by combining the second clothing image and the first human body image, the initial image needs to be optimized. After the posture optimization, the size optimization and the cropping optimization are performed on the initial image in order based on the key point feature image, the portrait segmented image and the human body part segmented image, the clothing and the human body in the obtained second human body image fit each other better, achieving higher fidelity.
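For concreteness, the merging model can be viewed as a conditional generator that receives all five inputs concatenated along the channel axis; the tiny convolutional network below is a placeholder sketch, not the architecture of the embodiments:

```python
# Illustrative placeholder for the merging model: a conditional
# generator over the channel-wise concatenation of its five inputs.
import torch
import torch.nn as nn

class MergingModel(nn.Module):
    def __init__(self, in_channels=3 + 3 + 17 + 1 + 1):
        # second clothing image (3) + first human body image (3)
        # + 17 key point heatmaps + portrait mask (1) + part map (1)
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, clothing2, human1, kpt_img, portrait, parts):
        x = torch.cat([clothing2, human1, kpt_img, portrait, parts], dim=1)
        return self.net(x)  # second human body image
```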
In these embodiments, the merging model is trained by: inputting the key point feature sample image, the portrait segmented sample image and the human body part segmented sample image and the clothing sample image into the transformation model, to obtain a second transformed clothing image; inputting the second transformed clothing image, the human body sample image, the key point feature sample image, the portrait segmented sample image, the human body part segmented sample image and the clothing sample image into a generative model, to obtain a generated human body image; inputting the generated human body image into a discriminative model, to obtain a discrimination result; and training the generative model based on the discrimination result, to obtain the merging model.
The merging model is trained based on the transformation model. For example, the accuracy of the final merging model can be improved by performing adversarial training on the generative model and the discriminative model.
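A hedged sketch of this adversarial training follows; the binary cross-entropy losses and optimizer settings are common GAN defaults supplied here for illustration (the embodiments do not specify them), and the clothing sample image input to the generative model is folded into the transformed clothing for brevity:

```python
# Illustrative adversarial training of the merging model on top of the
# already trained (frozen) transformation model.
import torch
import torch.nn.functional as F

def train_merging_model(gen, disc, transform_model, loader, epochs=10):
    g_opt = torch.optim.Adam(gen.parameters(), lr=2e-4)
    d_opt = torch.optim.Adam(disc.parameters(), lr=2e-4)
    for _ in range(epochs):
        for kpt, portrait, parts, clothing, human in loader:
            with torch.no_grad():  # transformation model stays fixed
                warped = transform_model(
                    torch.cat([kpt, portrait, parts, clothing], dim=1))
            fake = gen(warped, human, kpt, portrait, parts)
            # Discriminator step: real sample images vs generated images.
            real_logit, fake_logit = disc(human), disc(fake.detach())
            d_loss = (
                F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit))
                + F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit)))
            d_opt.zero_grad(); d_loss.backward(); d_opt.step()
            # Generator step: train the generator to fool the discriminator.
            logit = disc(fake)
            g_loss = F.binary_cross_entropy_with_logits(logit, torch.ones_like(logit))
            g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return gen  # the trained generator serves as the merging model
```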
According to the technical solution of the embodiments, a first human body image comprising the target human body and a first clothing image comprising the target clothing are obtained; key point extraction, portrait segmentation and human body part segmentation are performed on the first human body image respectively, to obtain a key point feature image, a portrait segmented image and a human body part segmented image; the key point feature image, the portrait segmented image, the human body part segmented image and the first clothing image are input into a transformation model, to obtain a transformed second clothing image; and the second clothing image, the first human body image, the key point feature image, the portrait segmented image and the human body part segmented image are input into a merging model, to obtain the second human body image, wherein the target human body in the second human body image wears the target clothing. According to the method for image generation provided by the embodiments of the present disclosure, the transformed second clothing image is obtained by performing transformation processing on the target clothing in the first clothing image by the transformation model, and the second human body image wearing the target clothing is obtained by merging the transformed target clothing with the target human body by the merging model, so that the fidelity of the generated image can be improved.
For example, the segmented image obtaining module 220 is further configured to:
For example, the second clothing image obtaining module 230 is further configured to:
For example, the second human body image obtaining module 240 is further configured to:
For example, the apparatus for image generation further comprises: a first human body image adjusting module configured to,
For example, the segmented image obtaining module 220 is further configured to:
For example, the apparatus for image generation further comprises: a transformation model training module configured to,
For example, the apparatus for image generation further comprises: a merging model training module configured to,
For example, the clothing image is a clothing plan image.
The above-described apparatus may perform the method provided by the foregoing embodiments of the present disclosure, has corresponding functional modules for executing the above-described method, and achieves the corresponding beneficial effects. For technical details not described in detail in these embodiments, reference may be made to the method provided by the foregoing embodiments of the present disclosure.
Reference is made below to the accompanying drawing, which shows a schematic structural diagram of an electronic device 300 suitable for implementing the embodiments of the present disclosure.
In general, the following apparatuses may be connected to the I/O interface 305: an input device 306, such as a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 307, such as a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 308, such as a magnetic tape, a hard disk, etc.; and a communication device 309. The communication device 309 may allow the electronic device 300 to communicate with other devices in a wireless or wired way to exchange data.
According to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium. The computer program comprises program codes used for executing the method for image generation. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 309, or installed from the storage device 308, or installed from the ROM 302. When the computer program is executed by the processing device 301, the above-mentioned functions defined in the method according to the embodiments of the present disclosure are executed.
It should be noted that the computer-readable storage medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor-based system, apparatus or device, or any combination thereof. More specific examples of the computer-readable storage medium may include but are not limited to: an electrical connection having one or more conducting wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof. In the present disclosure, the computer-readable storage medium may be any tangible medium that comprises or stores a program that may be used by or in combination with an instruction execution system, apparatus or device. In the present disclosure, the computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier, which carries computer-readable program codes. Such a propagated data signal may take multiple forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination thereof. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; and the computer-readable signal medium may send, propagate or transmit a program that is used by or in combination with an instruction execution system, apparatus or device. The program codes that the computer-readable medium comprises may be transmitted by means of any suitable medium, including but not limited to: an electric wire, an optical cable, a radio frequency (RF), etc., or any suitable combination thereof. The computer-readable storage medium may be a non-transitory computer-readable storage medium.
In some embodiments, a client and a server may communicate by means of any network protocol that is known at present or developed in the future, such as the hypertext transfer protocol (HTTP), and may be interconnected by means of digital data communication in any form or medium (e.g., a communication network). Examples of the communication network include a local area network (LAN), a wide area network (WAN), an internetwork (e.g., the Internet) and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any network that is known at present or developed in the future.
The above-mentioned computer-readable medium may be contained in the above-mentioned electronic device, and may also exist independently without being installed in the electronic device.
The above-mentioned computer-readable medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device is enabled to implement the following steps: obtaining a first human body image comprising a target human body and a first clothing image comprising target clothing; performing key point extraction, portrait segmentation and human body part segmentation on the first human body image respectively, to obtain a key point feature image, a portrait segmented image and a human body part segmented image; inputting the key point feature image, the portrait segmented image, the human body part segmented image and the first clothing image into a transformation model, to obtain a transformed second clothing image; and inputting the second clothing image, the first human body image, the key point feature image, the portrait segmented image and the human body part segmented image into a merging model, to obtain a second human body image, wherein the target human body in the second human body image wears the target clothing.
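Read as a program, the steps above chain together as in the following sketch, where every model object is a hypothetical, already trained stand-in for the corresponding model of the embodiments:

```python
# Illustrative end-to-end inference pipeline for the described method.
def generate_second_human_image(human1, clothing1, kpt_model,
                                portrait_model, parts_model,
                                transform_model, merging_model):
    kpt_img = kpt_model(human1)        # key point feature image
    portrait = portrait_model(human1)  # portrait segmented image
    parts = parts_model(human1)        # human body part segmented image
    clothing2 = transform_model(kpt_img, portrait, parts, clothing1)
    return merging_model(clothing2, human1, kpt_img, portrait, parts)
```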
Computer program codes for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, C++, etc., and conventional procedural programming languages such as “C” or similar programming languages. The program codes may be executed completely on a user computer, partially on the user computer, as an independent software package, partially on the user computer and partially on a remote computer, or completely on a remote computer or server. In a case involving the remote computer, the remote computer may be connected to the user computer through any type of network, including a LAN or a WAN, or may be connected to an external computer (for example, through the Internet by using an Internet service provider).
The flowchart and block diagram in the accompanying drawings illustrate architectures, functions, and operations that may be realized in accordance with the systems, methods, and computer program products of various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagram may represent a module, a program segment, or part of the codes, which comprises one or more executable instructions for implementing specified logical functions. It should also be noted that in some alternative implementations, functions indicated in the blocks may also be implemented in an order different from that indicated in the drawings. For example, two blocks represented in succession may be executed basically in parallel in fact, and sometimes they may also be executed in reverse order, depending on the function involved. It should also be noted that each block in the block diagram and/or flowchart, as well as a combination of the blocks in the block diagram and/or flowchart, may be implemented with a dedicated hardware-based system that executes a specified function or operation, or with a combination of dedicated hardware and computer instructions.
Units described in the embodiments of the present disclosure may be implemented by means of software or hardware. The names of the units do not limit the units in some cases.
The functions described herein can be executed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), an application-specific standard product (ASSP), a system-on-chip (SOC), a complex programmable logic device (CPLD), etc.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may comprise or store a program for use by or in combination with an instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor-based system, apparatus or device, or any combination thereof. More specific examples of the machine-readable storage medium may include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, there is disclosed a method for image generation, comprising:
For example, the performing key point extraction, portrait segmentation and human body part segmentation on the first human body image respectively, to obtain the key point feature image, the portrait segmented image and the human body part segmented image comprises:
For example, the inputting the key point feature image, the portrait segmented image, the human body part segmented image and the first clothing image into the transformation model, to obtain the transformed second clothing image comprises:
For example, the inputting the second clothing image, the first human body image, the key point feature image, the portrait segmented image and the human body part segmented image into the merging model, to obtain the second human body image comprises:
For example, the method further comprises, after the key point extraction on the first human body image and before the portrait segmentation on the first human body image,
For example, the performing portrait segmentation and human body part segmentation on the first human body image respectively comprises:
For example, the transformation model is trained by:
For example, the merging model is trained by:
For example, the clothing image is a clothing plan image.
Number | Date | Country | Kind
202111151607.6 | Sep 2021 | CN | national
Filing Document | Filing Date | Country
PCT/CN2022/118670 | 9/14/2022 | WO