The present application claims priority to Chinese Patent Application No. 202410335303.2, filed Mar. 22, 2024, the entire disclosure of which is incorporated herein by reference.
The present application relates to a method for recovering a face image based on semantic features, and belongs to the field of image processing technology.
With the development of science and technology and the improvement of living standards, image and video data grow exponentially, and a large proportion of such data contains human subjects. In addition, applications such as face recognition, face matching, drones, and face tracking in surveillance systems are emerging. However, unstable imaging devices (e.g., focus failure and camera shake), imaging environment factors (e.g., low light, high exposure, and subject movement), and network transmission conditions (e.g., compression, scaling, and coding formats) introduce various types and degrees of image degradation, such as blurring and noise. Such complex degradation not only reduces the visual quality perceived by the human eye, especially for face images, but also poses risks to personal and property safety in applications such as face-based access control, payment, and security monitoring. Face image recovery therefore aims to recover clear face images from degraded face images to support face vision tasks such as face detection and recognition. Compared with natural images, face images contain not only visually perceptible details, but also individual facial features and identity information.
Currently, existing face recovery methods can be mainly categorized into three types: methods based on geometric priors, methods based on references, and methods based on generative priors. Methods based on geometric priors often struggle to capture effective geometric information from low-quality face images; methods based on references generally construct a dictionary of fixed capacity in advance, so their generalization performance on real degraded face images is low; methods based on generative priors ignore the identity information of the face during recovery.
Most of the above methods focus only on recovering the geometric structure and detailed texture of the face, while ignoring the mining and preservation of semantic information, so that the generated face image looks realistic but its basic features are changed.
An object of the present application is to overcome the deficiencies in the related art and provide a face image recovery method based on semantic features, in which a reference image generator and a feature transfer are based on a recovery model. The method alleviates the problem that existing methods ignore semantic information, ensures the consistency of face semantic information while preserving the details and textures of the recovery results, and achieves high evaluation index scores and high-quality visualization effects in real scenarios.
In order to achieve the above purpose, the present application is realized by adopting the following technical solutions.
The present application provides a method for recovering a face image based on semantic features, which includes:
In one embodiment, the encoder includes five residual convolution units and four 2-fold down-sampling units, and the five residual convolution units and the four 2-fold down-sampling units are alternately connected in series.
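As a hedged sketch of the encoder layout described above, the following Python snippet interleaves five residual convolution units with four 2-fold down-sampling units in series. The unit names are illustrative placeholders; the patent does not specify the internals of each unit here.

```python
# Sketch of the encoder layout: five residual convolution units alternated
# with four 2-fold down-sampling units, connected in series.
# Unit names are illustrative, not from the patent.

def build_encoder_layout(num_res_units=5, num_down_units=4):
    """Interleave residual units and down-sampling units in series."""
    layout = []
    for i in range(num_res_units):
        layout.append(f"res_conv_{i + 1}")
        if i < num_down_units:          # a down-sampler follows all but the last unit
            layout.append(f"down2x_{i + 1}")
    return layout

layout = build_encoder_layout()
print(layout)
# With 4 down-samplers, the spatial size of the input is reduced by 2**4 = 16x.
```

The same alternating pattern applies to the decoder described below, with up-sampling units in place of down-sampling units.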
In one embodiment, the reference image generator includes a first generator module, a second generator module and a third generator module connected in series sequentially;
In one embodiment, the semantic feature fusion unit includes one normalization layer and two convolution layers connected in series sequentially, and the semantic feature fusion unit is configured for fusion of the low-quality face semantic features and the random noise.
In one embodiment, the feature transfer includes a dictionary construction module and a feature transfer module;
In one embodiment, the decoder includes five residual convolution units and four 2-fold up-sampling units, and the five residual convolution units and the four 2-fold up-sampling units are alternately connected in series.
In one embodiment, the method for training the recovery model includes:
In one embodiment, obtaining the training set includes:
In one embodiment, the recovery model loss function is expressed as:
L = L_l1 + λ_per L_per + λ_adv L_adv,
In one embodiment, the L1 loss value Ll1 is expressed as:
L_l1 = |I_hq − Î_hq|_1
In the face image recovery method based on semantic features of the present application, the reference image generator of the recovery model first generates a plurality of high-quality face reference images from input random noise under the guidance of semantic information. The feature transfer of the recovery model then constructs a semantically guided face component feature dictionary that is lightweight and easy to search quickly. This alleviates the problem that existing methods ignore semantic information, improves the generalization performance of the face recovery model, ensures the consistency of face semantic information while preserving the details and textures of the recovery results, and achieves high evaluation index scores and high-quality visualization effects in real scenarios.
The present application is further described below in conjunction with the accompanying drawings. The following embodiments are only used to more clearly illustrate the technical solution of the present application, and are not to be used to limit the scope of the present application.
The present application discloses a method for recovering a face image based on semantic features, as shown in
The technical concept of the present application is as follows: a reference image generator of the recovery model generates a plurality of high-quality face reference images from input random noise under the guidance of semantic information; a feature transfer of the recovery model constructs a lightweight face component feature dictionary that facilitates fast searching. This alleviates the problem that existing methods ignore semantic information, improves the generalization performance of the face recovery model, and achieves high evaluation index scores and high-quality visualization effects in real scenarios.
As shown in
As shown in
The reference image generator includes a first generator module, a second generator module and a third generator module connected in series sequentially;
The semantic feature fusion unit includes one normalization layer and two convolution layers connected in series sequentially. The normalization layer uses a Layer Norm (LN).
The semantic feature fusion unit is configured for fusion of low-quality face semantic features and random noise. Specifically, a plurality of sets of Gaussian noise generated based on random seeds are fused with low-quality face semantic features in the semantic feature fusion unit, and this multi-scale multi-stage fusion is utilized to produce a plurality of high-quality face reference images with the same semantic information but with different performances.
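The fusion described above can be sketched as follows. This is a minimal numpy illustration, assuming feature shapes, a layer-norm-then-two-linear-maps simplification of the "LN + two convolution layers" structure, and random weights; none of these specifics come from the patent.

```python
import numpy as np

# Minimal sketch of the semantic feature fusion unit: layer-normalize the
# fused noise + semantic input, then apply two convolution-like linear maps.
# Shapes and the 1x1-convolution simplification are assumptions for clarity.

def layer_norm(x, eps=1e-5):
    mu, var = x.mean(axis=-1, keepdims=True), x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def fuse(semantic, noise, w1, w2):
    x = layer_norm(semantic + noise)        # fuse, then normalize (LN)
    return np.maximum(x @ w1, 0) @ w2       # two "conv" layers as linear maps

rng = np.random.default_rng(0)
c = 8
semantic = rng.normal(size=(16, c))         # low-quality face semantic features
w1, w2 = rng.normal(size=(c, c)), rng.normal(size=(c, c))

# Different random seeds yield reference features sharing the same semantics
# but with different appearances, as in the multi-stage fusion described above.
refs = [fuse(semantic, np.random.default_rng(s).normal(size=(16, c)), w1, w2)
        for s in range(3)]
print(len(refs), refs[0].shape)
```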
As shown in
The dictionary construction module includes a face feature extraction unit, a first face component detection unit and a dictionary construction unit connected in series sequentially.
The first face component detection unit is configured for obtaining high-quality component features of different component categories based on the plurality of high-quality face reference features.
The dictionary construction unit is configured for constructing the face component feature dictionary based on the high-quality component features of different component categories.
A feature transfer module includes a second face component detection unit, a dictionary lookup unit and a feature fusion unit connected in series sequentially.
The second face component detection unit is configured for obtaining a low-quality component feature of a corresponding component category based on the low-quality face semantic features.
The dictionary lookup unit is configured for obtaining the high-quality component features of a corresponding component category based on the low-quality component feature and the constructed face component feature dictionary.
The feature fusion unit is configured for fusing the low-quality face semantic features and the high-quality component features of the corresponding component category to obtain the high-quality face semantic features.
Specifically, the first face component detection unit and the second face component detection unit both use the publicly available dlib library; the dictionary construction unit uses an existing K-means clustering method; the dictionary lookup unit calculates the similarity between each low-quality component feature to be looked up and each item in the face component feature dictionary, and selects the high-quality component feature with the largest similarity value. The similarity calculation uses the Euclidean distance.
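The dictionary lookup step above can be sketched as follows: for each low-quality component feature, select the dictionary item at the smallest Euclidean distance (i.e., the largest similarity). The toy dictionary array and feature values here are illustrative; in the method the dictionary is built per component category by clustering high-quality component features.

```python
import numpy as np

# Sketch of dictionary lookup: return the high-quality component feature
# closest to the query in Euclidean distance (largest similarity).

def lookup(dictionary, query):
    """Return the dictionary entry closest to `query` in Euclidean distance."""
    dists = np.linalg.norm(dictionary - query, axis=1)
    return dictionary[np.argmin(dists)]

dictionary = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])  # toy HQ dictionary
query = np.array([0.9, 1.2])                                 # LQ component feature
best = lookup(dictionary, query)
print(best)  # → [1. 1.]
```

A lightweight dictionary of this form keeps the search a single nearest-neighbor pass per component, which is what makes the lookup fast.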
The face recovery methods based on dictionary learning in the related art usually construct the dictionary in advance, so the redundant dictionary requires a large amount of storage and computational overhead when performing face recovery, and its initial capacity limits the generalization performance of the model. Therefore, the present application constructs a semantically guided lightweight face component feature dictionary, which avoids the need to construct a complete dictionary in advance, facilitates searching, and ensures the consistency of face semantic information while preserving the details and textures of the recovery results. The present application is thereby able to achieve high evaluation index scores and high-quality visualization effects in real scenarios.
As shown in
It should be noted that the residual convolution unit in this embodiment utilizes residual learning to fully mine feature information, as shown in
The 2-fold down-sampling unit is a convolution layer with a convolution kernel size of 2 and a stride of 2, whose function is to reduce the size of the input features by a factor of 2. The 2-fold up-sampling unit expands the size of the input features by a factor of 2, and consists of a convolution layer with a convolution kernel size of 1 and a subpixel layer with 2-fold up-sampling connected in series.
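The size arithmetic of these sampling units can be checked with the following sketch. The no-padding assumption for the down-sampling convolution is mine; the patent only specifies kernel size 2 and stride 2.

```python
# Size arithmetic for the 2-fold sampling units (illustrative only).

def down2x(size):
    # Convolution, kernel 2, stride 2, no padding: floor((size - 2) / 2) + 1
    return (size - 2) // 2 + 1

def up2x(size):
    # The 1x1 convolution keeps the spatial size; the subpixel (pixel-shuffle)
    # layer rearranges channels into a 2x larger spatial grid.
    return size * 2

size = 512
for _ in range(4):
    size = down2x(size)
print(size)        # 512 -> 256 -> 128 -> 64 -> 32
for _ in range(4):
    size = up2x(size)
print(size)        # back to 512
```

Four down-samplers in the encoder and four up-samplers in the decoder thus restore the original spatial resolution.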
The training method of the recovery model in this embodiment includes:
First, obtaining a training set, the training set includes a to-be-recovered face training image and a corresponding real face recovery image.
Specifically, obtaining the training set includes:
Specifically, each high-quality face image is extracted from the FFHQ dataset and resized so that each side is 512 pixels, and is then degraded to obtain the degraded face image. The degradation is expressed as:
I_lq = {JPEG_q((I_hq * k_σ) ↓_s + n_δ)} ↑_s,
The specific degradation parameters can be adjusted according to the actual images and are not limited here.
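The degradation pipeline in the formula above can be sketched as follows: blur with a kernel, down-sample by factor s, add Gaussian noise, apply JPEG compression with quality q, then up-sample back by factor s. This numpy sketch uses a crude shift-average blur, nearest-neighbor resampling, and uniform quantization as a stand-in for JPEG; these are simplifying assumptions, not the patent's exact operations.

```python
import numpy as np

# Hedged sketch of I_lq = {JPEG_q((I_hq * k_sigma) ↓s + n_delta)} ↑s.

def degrade(img, s=2, sigma_noise=0.02, q_step=0.05, seed=0):
    blurred = (img + np.roll(img, 1, 0) + np.roll(img, 1, 1)) / 3.0  # crude k_sigma
    down = blurred[::s, ::s]                                         # ↓ s
    noisy = down + np.random.default_rng(seed).normal(0, sigma_noise, down.shape)
    jpeg_like = np.round(noisy / q_step) * q_step                    # "JPEG_q" stand-in
    return np.repeat(np.repeat(jpeg_like, s, axis=0), s, axis=1)     # ↑ s

hq = np.random.default_rng(1).random((8, 8))
lq = degrade(hq)
print(lq.shape)  # same size as the input, matching the formula
```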
Second, inputting the to-be-recovered face training image into the pre-constructed recovery model to obtain the recovered face training image.
Third, calculating the recovery model loss function based on the recovered face training image and the corresponding real face recovery image.
The recovery model loss function is expressed as:
L = L_l1 + λ_per L_per + λ_adv L_adv,
The L1 loss value is expressed as:
L_l1 = |I_hq − Î_hq|_1,
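The total loss above can be computed as in the following sketch. The perceptual and adversarial terms are placeholders, and the weight values are illustrative, since the passage does not fix them.

```python
import numpy as np

# Sketch of L = L_l1 + λ_per * L_per + λ_adv * L_adv, with
# L_l1 = |I_hq - Î_hq|_1 as an elementwise absolute-difference sum.

def l1_loss(i_hq, i_hat):
    return np.abs(i_hq - i_hat).sum()

def total_loss(i_hq, i_hat, l_per, l_adv, lam_per=0.1, lam_adv=0.01):
    return l1_loss(i_hq, i_hat) + lam_per * l_per + lam_adv * l_adv

i_hq = np.array([[1.0, 0.0], [0.0, 1.0]])   # real face recovery image
i_hat = np.array([[0.5, 0.0], [0.0, 0.5]])  # recovered face training image
print(l1_loss(i_hq, i_hat))                              # → 1.0
print(total_loss(i_hq, i_hat, l_per=2.0, l_adv=5.0))     # 1.0 + 0.2 + 0.05 = 1.25
```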
Fourth, the recovery model is iteratively updated by the gradient descent method, and the model with the smallest value of the recovery model loss function is used as the trained recovery model.
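The training loop above can be sketched as follows: iterate gradient descent and keep the parameters that achieve the smallest loss. The toy quadratic loss, learning rate, and step count are assumptions for illustration only.

```python
# Minimal sketch of gradient-descent training that retains the
# smallest-loss parameters, as described in the fourth step.

def train(theta0, grad, loss, lr=0.1, steps=50):
    theta, best_theta, best_loss = theta0, theta0, loss(theta0)
    for _ in range(steps):
        theta = theta - lr * grad(theta)
        if loss(theta) < best_loss:            # keep the smallest-loss model
            best_theta, best_loss = theta, loss(theta)
    return best_theta, best_loss

loss = lambda t: (t - 3.0) ** 2                # toy loss with minimum at t = 3
grad = lambda t: 2.0 * (t - 3.0)
theta, l = train(0.0, grad, loss)
print(round(theta, 3))  # → 3.0
```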
It should be appreciated by those skilled in the art that embodiments of the present application may be provided as methods, systems, or computer program products. Thus, the present application may take the form of a fully hardware embodiment, a fully software embodiment, or an embodiment that combines software and hardware aspects. Further, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk memory, CD-ROM, optical memory, and the like) that contain computer-usable program code therein.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, may be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data-processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data-processing device produce a device for carrying out the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing the computer or other programmable data-processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction device that implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data-processing device such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
The foregoing is only a preferred embodiment of the present application, and it should be noted that, for those skilled in the art, a number of improvements and modifications may be made without departing from the principles of the present application, and these shall also fall within the scope of protection of the present application.
| Number | Date | Country | Kind |
|---|---|---|---|
| 202410335303.2 | Mar 2024 | CN | national |
| Number | Name | Date | Kind |
|---|---|---|---|
| 11869275 | Joseph | Jan 2024 | B1 |
| 12175641 | Mironica | Dec 2024 | B2 |
| 20160093026 | Choudhury | Mar 2016 | A1 |
| 20180075581 | Shi et al. | Mar 2018 | A1 |
| 20220076459 | Zhang | Mar 2022 | A1 |
| 20230138049 | Zhang | May 2023 | A1 |
| 20230222628 | Zhao | Jul 2023 | A1 |
| 20240087083 | Wang | Mar 2024 | A1 |
| 20240119671 | Liu | Apr 2024 | A1 |
| 20240378921 | Shu | Nov 2024 | A1 |
| Number | Date | Country |
|---|---|---|
| 104008538 | Aug 2014 | CN |
| 113128624 | Jul 2021 | CN |
| 114372937 | Apr 2022 | CN |
| 116543388 | Aug 2023 | CN |
| 117391968 | Jan 2024 | CN |
| 117391995 | Jan 2024 | CN |
| 117475042 | Jan 2024 | CN |
| 117710252 | Mar 2024 | CN |
| 118262395 | Jun 2024 | CN |
| Entry |
|---|
| Ji Ming-Ye; Zhang Deng-Yin; Ji Ying-Tian, "Image Haze Removal Algorithm Based on Haze Thickness Estimation", Acta Automatica Sinica, Sep. 30, 2016, pp. 85-97, vol. 42, No. 9. |
| Zhang Dengyin, et al., "High-Resolution Representations Network for Single Image Dehazing", Sensors, Mar. 15, 2022, pp. 1-14. |
| Li Xiaoming, et al., "Blind Face Restoration via Deep Multi-scale Component Dictionaries", https://arxiv.org/abs/2008.00418, Aug. 2, 2020, pp. 1-16. |
| Yanjiang Yu, et al., "Multiprior Learning via Neural Architecture Search for Blind Face Restoration", IEEE Transactions on Neural Networks and Learning Systems, Dec. 12, 2023, pp. 1-12. |
| Number | Date | Country | |
|---|---|---|---|
| Parent | PCT/CN2024/109800 | Aug 2024 | WO |
| Child | 18812170 | US |