The disclosure relates to the field of artificial intelligence, and more particularly to an image processing method and device, an electronic apparatus, and a storage medium.
The camera is an important function of an intelligent terminal such as a smartphone. As intelligent terminals have become more popular and are replaced more frequently, their camera functions have become more powerful, with higher image resolution and clearer imaging details.
The image is an objective reflection of the real world, and the imaging quality is a core index to evaluate the camera function. Therefore, improving the image imaging quality has become an important goal pursued by many manufacturers.
However, due to the limited physical structure of the intelligent terminal, there is still a gap between the imaging quality of the intelligent terminal and that of a professional camera. Especially under a low-light condition, insufficient illumination causes the image photographed by the intelligent terminal to suffer serious quality degradation (e.g., texture loss), particularly in the portrait part of the image, and this degradation greatly affects the user experience.
In some related art solutions, a certain preset filtering operator may be used to enhance the image texture. However, the recovery of texture details is poor if only filtering is used to address the image quality degradation, so the improvement of the image quality is very limited. In view of this, a better technology for improving or correcting the image quality degradation is needed.
Provided are an image processing method and device, an electronic apparatus and a storage medium, which may address at least the problem that the effect of improving the image quality degradation is poor in the related technology.
According to an aspect of the disclosure, an image processing method includes: acquiring an input image; detecting a target area in the input image; and processing the target area. The processing of the target area includes: obtaining a feature map of the target area; rearranging feature blocks in the feature map in a feature space; and obtaining an output image after the target area is processed based on the rearranged feature blocks and the feature map.
According to an aspect of the disclosure, an image processing method includes: acquiring an input image; detecting a target area in the input image; acquiring at least one of semantic layout information of the target area and quality degradation level information based on quality degradation levels of different areas in the input image; and processing the target area based on the at least one of the semantic layout information and the quality degradation level information to obtain a processed output image.
According to an aspect of the disclosure, an image processing device comprises at least one storage configured to store one or more computer-executable instructions, and at least one processor configured to execute the one or more instructions stored in the storage to: acquire an input image; detect a target area in the input image; obtain a feature map of the target area by extracting an image feature of the target area; rearrange feature blocks in the feature map in a feature space; and obtain an output image after the target area is processed based on the rearranged feature blocks and the feature map.
According to an aspect of the disclosure, a computer-readable storage medium is configured to store instructions which, when executed by at least one processor, cause the at least one processor to execute any one of the image processing methods discussed above.
The technical solutions provided by the embodiments of the present disclosure bring at least the following advantageous effects: according to the image processing method and device of the embodiments of the present disclosure, feature blocks in the feature map are rearranged in a feature space, and an output image after the target area is processed is obtained based on the rearranged feature blocks and the feature map. As a result, details may be effectively restored even for large areas of missing texture in the target area, thereby better correcting the image quality degradation.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and may not limit the present disclosure.
The drawings herein are incorporated into and constitute a part of the specification, show example embodiments conforming to the present disclosure, serve together with the specification to explain the principle of the present disclosure, and do not constitute an improper limitation of the present disclosure.
The above and other aspects, features, and advantages of certain embodiments of the present disclosure will be more apparent from the following description taken in conjunction with the accompanying drawings, in which:
In order to enable those of ordinary skill in the art to better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below in combination with the drawings.
It should be noted that the terms “first” and “second” in the specification and claims of the present disclosure and the above drawings are used to distinguish similar objects, and not necessarily used to describe a specific order or sequence. It should be understood that data used in this way may be interchanged under appropriate circumstances so that the embodiments of the present disclosure described herein may be implemented in an order other than those illustrated or described herein. The implementations described in the following embodiments do not represent all implementations consistent with the present disclosure. On the contrary, they are merely examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
It should be noted here that “at least one of several items” appearing in the present disclosure covers three parallel cases: “any one of the several items”, “a combination of any multiple of the several items”, and “the entirety of the several items”. For example, “including at least one of A and B” covers the following three parallel cases: (1) including A; (2) including B; and (3) including A and B. As another example, “executing at least one of Step 1 and Step 2” covers the following three parallel cases: (1) executing Step 1; (2) executing Step 2; and (3) executing Step 1 and Step 2.
As is traditional in the field, the embodiments are described, and illustrated in the drawings, in terms of functional blocks, units and/or modules. Those skilled in the art will appreciate that these blocks, units and/or modules are physically implemented by electronic (or optical) circuits such as logic circuits, discrete components, microprocessors, hard-wired circuits, memory elements, wiring connections, and the like, which may be formed using semiconductor-based fabrication techniques or other manufacturing technologies. In the case of the blocks, units and/or modules being implemented by microprocessors or similar, they may be programmed using software (e.g., microcode) to perform various functions discussed herein and may optionally be driven by firmware and/or software. Alternatively, each block, unit and/or module may be implemented by dedicated hardware, or as a combination of dedicated hardware to perform some functions and a processor (e.g., one or more programmed microprocessors and associated circuitry) to perform other functions. Also, each block, unit and/or module of the embodiments may be physically separated into two or more interacting and discrete blocks, units and/or modules without departing from the present scope. Further, the blocks, units and/or modules of the embodiments may be physically combined into more complex blocks, units and/or modules without departing from the present scope.
Throughout the Specification, the use of the terms “solve” or “solution” or similar terms with respect to one or more problems or issues may encompass a full or complete solution to the problem or issue, and may also encompass partial or incremental solutions to the problem or issue, and/or solutions which address or mitigate some or all of the effects of the problem or issue.
As discussed above, the recovery of texture details may be poor if only filtering is used to address the image quality degradation, so the improvement to the image quality is very limited.
The performance effect of the related art solutions on the real quality-degraded image is not ideal, mainly due to the following three problems:
Problem 1: For a large area of missing texture, the details may not be recovered.
In a low-light condition, large areas of missing texture are very common. Although implicit semantic features may be extracted, when a large area of texture is missing, the limited receptive field of convolution means that these extracted features have little effect on guiding subsequent texture generation. For example, a hair area with a large area of missing texture still lacks texture after being processed using related art solutions.
Problem 2: The generated texture is unreasonable.
The related art solutions may not extract explicit semantic information, especially the spatial layout information, which may also be called semantic layout, and which may relate to the locations of eyes, hair, and so on. This information is very important for texture generation, and thus the unreasonable texture may be generated without guidance of such prior information. For example, due to lacking the guidance of the semantic layout, the hair may be generated on the face, which is very undesirable.
Problem 3: The lack of the prior information on the image quality degradation leads to over-processing for the target area.
The degradation degrees of input images vary from light to heavy. Generally, the image quality degradation is lighter under sufficient light during the day, and heavier under insufficient light at night. Even in different areas of the same image, the degrees of quality degradation may be different. However, the related art solutions may lack any judgment of the image quality degradation information, and may apply a uniform processing strength to all images and to all areas of one image, so over-processing occurs very easily. For example, due to the lack of such a judgment, the detailed texture on the skin may be excessively processed.
Having identified the above problems in the existing facial processing technology, the present disclosure firstly proposes an image processing method to solve Problem 1 discussed above, namely that the existing facial processing technology may fail to repair details in large areas of missing texture. Secondly, on this basis, the present disclosure proposes an image processing method that may further solve Problem 2 and Problem 3 discussed above.
Embodiments relating to a target area processing solution may be applied to a night shooting mode of taking pictures by the intelligent terminal. For example, when the user selects the night shooting mode of the intelligent terminal to take pictures at night or under a scene such as insufficient indoor light, the camera of the intelligent terminal will automatically detect the person in the image, and automatically improve the quality of the face area, for example including hair, so that the user may capture high-quality face details. As an example, the image processing method that will be described below may either process all the captured images or process the image on a preview interface. Further, since the above problem usually occurs when the light is insufficient, the image processing method described below may be applied to the night shooting mode of the intelligent terminal, for example.
Hereinafter, an image processing method according to various exemplary embodiments of the present disclosure will be described with reference to
Embodiments relating to the image processing method shown in
Referring to
At step S120, a target area in the input image is detected. Here, any known target area detection method may be used to detect the target area in the input image, and the present disclosure is not limited thereto either. As an example, the target area may be a face area, but is not limited to this.
At step S130, the target area is processed. For example, the processing of the target area may include: obtaining a feature map of the target area, rearranging feature blocks in the feature map in a feature space, and obtaining an output image after the target area is processed based on the rearranged feature blocks and the feature map. As an example, the obtaining of the output image after the target area is processed based on the rearranged feature blocks and the feature map may include: weighting and combining the rearranged feature blocks, and obtaining the output image after the target area is processed based on the weighted and combined feature blocks and the feature map. As an example, a machine learning model may be used to process the target area. In addition, the processing of the target area may include recovering and generating of details, in which for example a finer texture is interpolated at a place where the texture is blurred, and a semantically reasonable, real and natural texture is generated at a place where there is no texture, may include reducing image noise, and may include eliminating blur to make the image clear, and the like.
As an example, when the target area is a face area, processing the target area may include redrawing the face area. “Face redrawing” technology, also called “face restoration” or “face hallucination”, is an important branch of image quality enhancement technology, and may be a technology specially aimed at improving the image quality of the face area. Such an improvement may include but is not limited to: recovering and generating details, in which a finer texture is interpolated at a place where the texture is blurred and a semantically reasonable, real and natural texture is generated at a place where there is no texture; reducing image noise; eliminating blur to make the image clear; and the like. In embodiments, there may be differences between the face redrawing technology and the “beauty” function in many intelligent terminals. Firstly, the purposes may be different. The purpose of the beauty function may be to make the face more “good-looking”, with more subjective factors, while the face redrawing may be used to recover the real and natural facial details, which is dominated by objective factors. Secondly, the means may be different. The beauty function may improve the visual effect by changing the characteristics of the face, such as changing the shape of the face to obtain the effect of “bigger eyes” or “thinner face”, while the face redrawing may generate high-frequency detail information by interpolating from the existing texture prior information, or by learning from massive high-definition facial prior knowledge, to generate semantically reasonable, real and natural texture details in untextured areas. Thirdly, the ranges may be different. The beauty function usually only relates to the face area, and generally does not deal with the hair, while the face redrawing may improve the entire head feature including the hair.
Hereinafter, examples of step S130 will be described in detail. Firstly, at step S130, for example, the feature map of the target area may be obtained by extracting an image feature of the target area. Here, the image feature may be a texture feature, a shadow feature, a tone feature, an illumination feature, or a color feature, but it is not limited thereto. The present disclosure does not limit the specific feature extraction method. In addition, a deep neural network may be used to extract the image features of the target area to obtain feature maps of different scales.
Next, the feature blocks in the feature map are rearranged in the feature space.
Subsequently, for example, the importance of at least one rearranged feature block and/or the correlation between different feature blocks may be determined, and based on the importance of the at least one rearranged feature block and/or the correlation between different feature blocks, the rearranged feature blocks are weighted and combined. In embodiments, the importance may be expressed as, for example, a level of importance. In embodiments, this importance or level of importance may include an importance or level of importance of the at least one rearranged feature block with respect to, in comparison with, or relative to one or more other rearranged feature blocks. Specifically, the importance of at least one rearranged feature block may be analyzed first to determine a weight of each feature block in the at least one feature block. For example, an operation similar to a channel attention mechanism may be used to weight different feature blocks. However, the method of determining the importance of each feature block and then determining its weight is not limited thereto. Next, the correlation between different feature blocks may be calculated. For example, the correlation between different feature blocks may be calculated according to the similarity of texture features. However, the method of calculating the correlation between the feature blocks is not limited thereto. Finally, the rearranged feature blocks may be weighted and combined according to the determined weight of each feature block and/or the calculated correlation between the feature blocks.
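As a minimal sketch of the weighting and combining described above, the following Python fragment scores each rearranged feature block, measures the pairwise correlation, and mixes the blocks accordingly. It is an illustration under stated assumptions, not the disclosed model: global average pooling followed by a softmax stands in for the learned operation similar to a channel attention mechanism, cosine similarity stands in for the similarity of texture features, and all function names are hypothetical.

```python
import math


def global_average(block):
    """Global average pooling over a block given as a flat list of values."""
    return sum(block) / len(block)


def softmax(values):
    """Numerically stable softmax, used here as an assumed importance score."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_similarity(p, q):
    """Cosine similarity, an assumed stand-in for texture-feature similarity."""
    dot = sum(x * y for x, y in zip(p, q))
    norm_p = math.sqrt(sum(x * x for x in p))
    norm_q = math.sqrt(sum(y * y for y in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def weight_and_combine(blocks):
    """Return one combined block per input block.

    Each output is a mixture of all input blocks whose coefficients are
    importance * correlation, normalised to sum to 1 (an assumption).
    """
    importance = softmax([global_average(b) for b in blocks])
    combined = []
    for target in blocks:
        # Negative similarities are clamped to zero so they cannot subtract.
        corr = [max(cosine_similarity(target, other), 0.0) for other in blocks]
        coeffs = [w * c for w, c in zip(importance, corr)]
        total = sum(coeffs)
        if total == 0.0:
            # No usable source block: keep the target unchanged.
            combined.append(list(target))
            continue
        coeffs = [c / total for c in coeffs]
        combined.append([
            sum(k * b[i] for k, b in zip(coeffs, blocks))
            for i in range(len(target))
        ])
    return combined
```

In this sketch a block borrows mostly from blocks that are both important and similar to it, which mirrors the role that the weight and the correlation play in the description; a trained model would learn both quantities rather than compute them with fixed formulas.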
Although various methods may be adopted to determine the importance of each feature block and the correlation between different feature blocks, how well the importance of the at least one rearranged feature block and/or the correlation between different feature blocks is determined will directly affect the weighting and combining of the rearranged feature blocks, thereby affecting the level of improvement in the image quality.
After the weighting and combining, the output image after the target area is processed may be obtained based on the weighted and combined feature blocks and the feature map. For example, firstly, a reconstruction feature map may be obtained by recovering the weighted and combined feature blocks to an initial position thereof in the feature map; secondly, the reconstruction feature map may be fused with the feature map; and finally, the output image after the target area is processed may be obtained based on the fused feature map. For example, after the fused feature map is further convolved, the feature reuse of different spatial locations may be realized, so that the output image after the target area is processed may be obtained.
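The restoring and fusing steps above can be sketched as follows. This is a hedged illustration only: the feature map is a single-channel 2-D list, its height and width are assumed to be divisible by the block size, fusion is modelled as channel-wise concatenation (each position keeps the original and the reconstructed value), and the function names are hypothetical.

```python
def split_into_blocks(feature_map, block_h, block_w):
    """Split a 2-D feature map (list of rows) into blocks in row-major order."""
    rows, cols = len(feature_map), len(feature_map[0])
    blocks = []
    for top in range(0, rows, block_h):
        for left in range(0, cols, block_w):
            blocks.append([row[left:left + block_w]
                           for row in feature_map[top:top + block_h]])
    return blocks


def restore_blocks(blocks, rows, cols, block_h, block_w):
    """Write each block back to its initial position in a rows x cols map."""
    restored = [[0.0] * cols for _ in range(rows)]
    index = 0
    for top in range(0, rows, block_h):
        for left in range(0, cols, block_w):
            for r, block_row in enumerate(blocks[index]):
                for c, value in enumerate(block_row):
                    restored[top + r][left + c] = value
            index += 1
    return restored


def fuse(feature_map, reconstruction):
    """Concatenate the two maps position by position (assumed fusion rule)."""
    return [[(f, g) for f, g in zip(row_f, row_g)]
            for row_f, row_g in zip(feature_map, reconstruction)]
```

Because the blocks are written back in the same row-major order in which they were split, restoring an unmodified set of blocks reproduces the original feature map, so any difference in the reconstruction comes only from the weighting and combining step.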
In embodiments, the feature block a in the obtained feature map of the target area may correspond to the hair on the upper left of the head in the target area, the feature block b may correspond to the hair on the upper right of the head (assuming that the texture missing occurs in this area), the feature block c may correspond to the hair and skin on the side face, and the feature block d may correspond to the skin. It should be noted that, in
After the feature blocks in the feature map are rearranged, the obtained feature blocks may be represented using . Next, the function CA(.) is used to determine the weight of the rearranged feature blocks, and the function CA(.) is implemented through an operation similar to the channel attention mechanism. Firstly, the global feature of each feature block is obtained through global pooling, and then the weight of each feature block is learned through a convolution layer (represented as “convolution+relu” in
Subsequently, the weighted feature map may be represented as . Though goes through the activation function sigmoid after going through two parallel convolutions (using different deformation operations) and cross multiplication, the correlation between the feature blocks may be learned. For example, the correlation between the feature blocks a, b, c, and d may be represented as the following matrix:
The more similar the semantics of feature blocks, the higher the correlation, and vice versa. The information compensation between different feature blocks may be realized through a further cross multiplication with . The above operation may be represented as Equation 2 below:
In Equation 2 above, θ() and φ() may represent different convolution operations performed on . Finally, the obtained weighted and combined feature blocks may be represented as:
After calculation, the following may be obtained:
u=0.784*a+0.080*b+0.132*c+0.011*d
v=0.504*a+0.120*b+0.270*c+0.010*d
w=0.210*a+0.011*b+0.395*c+0.122*d
x=0.077*a+0.012*b+0.400*c+0.325*d
Here, u, v, w and x are all weighted combinations of a, b, c and d. Taking v=0.504*a+0.120*b+0.270*c+0.010*d as an example, the feature a has the highest weight, because it has the closest semantic relation (the top of the hair), good texture quality and rich texture details. The feature d has a weight close to 0, because it is the feature of the skin and has no semantic relation with the hair.
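To make the arithmetic concrete, the short Python sketch below applies the coefficients listed above to scalar stand-ins for the features a, b, c and d. The coefficients are taken from the text; the scalar feature values, the dictionary layout and the helper names are assumptions for illustration only.

```python
# Coefficients of (a, b, c, d) for each combined feature, as listed in the text.
WEIGHTS = {
    "u": (0.784, 0.080, 0.132, 0.011),
    "v": (0.504, 0.120, 0.270, 0.010),
    "w": (0.210, 0.011, 0.395, 0.122),
    "x": (0.077, 0.012, 0.40, 0.325),
}


def combine(weights, features):
    """Weighted sum of the source features (scalars in this toy example)."""
    return sum(w * f for w, f in zip(weights, features))


def dominant_source(weights, names=("a", "b", "c", "d")):
    """Name of the source feature that contributes the largest weight."""
    return names[max(range(len(weights)), key=lambda i: weights[i])]


# Hypothetical scalar responses: b (missing texture) is weak, a and c are strong.
features = (1.0, 0.1, 0.8, 0.3)
v_value = combine(WEIGHTS["v"], features)
```

With these coefficients, `dominant_source(WEIGHTS["v"])` selects a, which matches the observation that the block with missing texture is rebuilt mainly from the hair on the upper left of the head.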
Finally, after the weighted and combined features u, v, w, and x are obtained, the weighted and combined feature blocks may be restored to their original locations in the feature map to obtain a reconstruction feature map, the reconstruction feature map is fused with the feature map, and the output image after the target area is processed is obtained based on the fused feature map. For example, features b and v may be spatially aligned and connected, thus, the feature b (with loss of details) may effectively reuse useful features in other locations, for example mainly a and c.
The image processing method according to the exemplary embodiment of the present disclosure has been described above in conjunction with
As mentioned above, although various methods may be adopted to determine the importance of at least one feature block and/or the correlation between different feature blocks, how well the importance of at least one rearranged feature block and/or the correlation between different feature blocks is determined will directly affect the weighting and combining of the rearranged feature blocks, thereby affecting the level of improvement in the image quality. That is to say, how well and how accurately the useful information is found will affect the improvement of the image quality.
Therefore, according to another exemplary embodiment of the present disclosure, the importance of at least one feature block may be determined by performing the following operations: acquiring quality degradation level information and/or texture direction field information of at least one feature block of the feature map, and determining the importance of the at least one feature block, based on the quality degradation level information and/or the texture direction field information. For example, as shown in
The texture direction field information extraction may also be called texture trend extraction. Here, the texture direction field information is used to determine the importance of the feature block and/or the correlation between different feature blocks. Specifically, the texture direction field information may include texture direction field strength information and texture direction field consistency information. The texture direction field strength information may be used to determine the importance of the feature blocks and the texture direction field consistency information may be used to determine the correlation between different feature blocks. In addition, the greater the strength of the texture direction field of the feature block, the higher the importance of the feature block, and the more consistent the texture direction field between the feature blocks, the higher the correlation between the feature blocks. The texture direction field information may be extracted to guide the processing of the target area because when the target area is a face area, the hair, eyebrows, beards and other hair parts in the target area have significant directional features, or the texture thereof has a certain trend, which is shown in
Hereinafter, the acquiring of the texture direction field information of at least one feature block will be discussed in conjunction with
According to another exemplary embodiment, as shown in
As an example, the above machine learning model may adopt a UNet-like structure, but unlike the standard UNet, a feature reconstruction module shown in
As mentioned above, in addition to extracting the texture direction field information to determine the importance and/or correlation of the feature blocks, the semantic layout information and/or the quality degradation level information may also be used for determining the importance and/or the correlation of the feature blocks. Hereafter, it will be described how to obtain the semantic layout information and the quality degradation level information.
As mentioned above, the related art solutions also suffer from Problem 2 and Problem 3. With respect to these problems, embodiments of the present disclosure may relate to using the semantic layout information and/or the quality degradation level information to guide the processing of the target area, on the basis of the image processing method shown in
To this end, the above machine learning model according to an exemplary embodiment of the present disclosure may include a semantic encoding branch and/or a quality degradation estimation branch in addition to a processing branch (hereinafter, also called a redrawing branch) for performing the above described processing of the target area.
According to an exemplary embodiment, in the case where the machine learning model includes the semantic encoding branch, the image processing method shown in
Hereinafter, the acquiring of the semantic layout information will be introduced with reference to
Specifically, the acquiring of the semantic layout information of the target area may include: obtaining absolute semantic layout information by parsing the target area; obtaining relative semantic layout information by detecting key points of the target area; and obtaining the semantic layout information by encoding the obtained absolute semantic layout information and relative semantic layout information. That is to say, the semantic layout information may include the absolute semantic layout information and the relative semantic layout information.
Specifically, as shown in
In addition, as shown in
Step 1: detecting the key points of the target area. For example, a face key point detecting module in the semantic encoding branch may be used to detect the key points of the face area.
Step 2: selecting a first base point and a second base point from the detected key points. When the structure of the target area is relatively fixed, for example, when the target area is a face area, since the facial structure is relatively fixed, the facial parts have relative position invariance, and two base points are selected from the detected face key points: the tip of the nose is selected as the base point O, and the point on the bridge of the nose farthest from the point O is selected as the base point N.
Step 3: obtaining the relative semantic layout information by mapping a vector constituted by at least one point in the target area and the first base point to a reference vector constituted by the first base point and the second base point. Here, at least one point in the target area may be at least one pixel point in the target area. For a pixel point X on the image, its position encoding information is mapped through the function f(X), as shown in Equation 3 below:
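In Equation 3 above, ω is the normalization constant, and a typical value thereof is selected as 0.2; and γ is a projection of the vector formed by the base point O and the pixel point X on the reference vector formed by the base point O and the base point N.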
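The geometric part of Steps 2 and 3 can be sketched in Python as follows. This is an illustration under explicit assumptions: the projected vector is taken to run from the base point O to the pixel point X and the reference vector from O to N, the projection is normalised by the length of the reference vector, and the mapping f(X) with its constant ω is not implemented because Equation 3 is not reproduced here. All function names are hypothetical.

```python
import math


def farthest_point(origin, candidates):
    """Pick the candidate key point farthest from the origin (base point N)."""
    return max(candidates, key=lambda p: math.dist(origin, p))


def projection_ratio(x, o, n):
    """Scalar projection of OX on ON, divided by |ON| (assumed normalisation)."""
    ox = (x[0] - o[0], x[1] - o[1])
    on = (n[0] - o[0], n[1] - o[1])
    on_sq = on[0] ** 2 + on[1] ** 2
    if on_sq == 0.0:
        raise ValueError("base points O and N must be distinct")
    return (ox[0] * on[0] + ox[1] * on[1]) / on_sq


def rotate(point, angle_rad, center):
    """Rotate a 2-D point about a center, used to check rotation invariance."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    return (center[0] + dx * cos_a - dy * sin_a,
            center[1] + dx * sin_a + dy * cos_a)
```

Because the ratio depends only on the positions of X, O and N relative to one another, rotating all three points by the same angle leaves it unchanged up to floating-point error, which is the property the position encoding relies on.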
Because the relative invariance of the structure of the target area is used, the position encoding information has the characteristic of rotation invariance, that is, for any point in the target area, it will not change due to the rotation of the target area, as shown in
The target area parsing information reflects the absolute semantic layout information of the target area (e.g., which area is hair and which area is skin), while the position encoding information reflects the relative semantic layout information of the target area (e.g., whether the hair is at the top of the head or at the hair ends). Combining the two kinds of information may provide richer semantic guide information and make the texture generated by the model more reasonable. For example, for the three points A, B, and C in
As described above, the acquired semantic layout information may be used to determine the correlation between the feature blocks when the feature blocks are weighted and combined. In addition, the semantic layout information may also be used when the output image after the target area is processed is obtained based on the weighted and combined feature blocks and the feature map.
For example, the obtaining of the output image after the target area is processed based on the weighted and combined feature blocks and the feature map mentioned when describing the image processing method of
k2=k1*(1+γ)+β, (Equation 4)
In Equation 4 above, k1 and k2 are values of each pixel point on the feature map in the original feature space and the new feature space respectively, γ and β are a scaling factor and an offset factor respectively.
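Equation 4 is a per-pixel affine modulation and can be written directly in Python. In the sketch below, the function names are hypothetical, and supplying one γ and one β per pixel is an assumption; the text only states that they are a scaling factor and an offset factor.

```python
def modulate(k1, gamma, beta):
    """Equation 4: map a feature value from the original to the new space."""
    return k1 * (1.0 + gamma) + beta


def modulate_map(feature_map, gamma_map, beta_map):
    """Apply Equation 4 at every pixel of a 2-D feature map (lists of rows)."""
    return [
        [modulate(k, g, b) for k, g, b in zip(row_k, row_g, row_b)]
        for row_k, row_g, row_b in zip(feature_map, gamma_map, beta_map)
    ]
```

One consequence of the (1+γ) form is that γ = 0 and β = 0 leave the feature value unchanged, so the modulation acts as a correction around the identity mapping.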
It may be seen that the function of the semantic layout information is reflected in two aspects. The first is that it may guide the redrawing branch to generate more natural texture details, because for any point on the image, the semantic layout information clarifies the semantics of the point, so that a texture conforming to the semantics may be generated. For example, at a point in the hair area, the texture of the hair instead of the texture of the skin should be generated. Therefore, the semantic layout information may effectively overcome the defect of generating unreasonable textures due to the lack of explicit semantic information in the related art solutions. The second is that it may help the feature reconstruction module in the redrawing branch to find more similar texture features. For example, when the hair area on the top right of the head lacks texture, the hair on the top left of the head is semantically closer to it than the hair ends are, so it is more reasonable to use the hair feature of the top left of the head to supplement the hair feature of the top right of the head.
As shown in
An example of the operation of the quality degradation branch will be briefly introduced below with reference to
The degradation level prediction phase takes the output residual of the first stage as an input, and outputs a coarse-scale degradation level map (also called a “quality degradation level map”). Each pixel on the map corresponds to the image quality degradation level of a small area on the original image. The degradation level prediction phase may relate to a pixel-level classification network. The degradation level prediction phase may be understood as a quantitative prediction of the image quality degradation level. The degradation level expresses the degree of the image quality degradation as a number. One feasible method is to discretize the degradation level into a certain number of levels, for example, ten levels (different levels correspond to different degrees of degradation), where level 1 represents the lightest quality degradation (the image quality is very good) and level 10 represents the most serious quality degradation (the image quality is very bad), and to quantize these levels into a certain value range, for example, between 0 and 1, as shown for example in
In the present disclosure, the image quality degradation estimation branch works on a small-scale input image (an image obtained by reducing the original image to a certain scale). The purposes of this design may be: (1) it may increase the forward inference speed; and (2) the degradation level has a regional characteristic, that is, it reflects the image quality of a certain area, and thus pixel-level prediction is unnecessary.
As a result, according to the exemplary embodiment, the above acquiring of the quality degradation level information based on quality degradation levels of different areas in the input image may include: reducing the input image to a predetermined size; predicting quality degradation levels of different areas of the input image reduced to the predetermined size; and quantizing the predicted quality degradation levels to acquire the quality degradation level information.
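The three operations above (reducing, predicting, and quantizing) may be sketched as follows. This is a minimal sketch under stated assumptions: the image is a 2-D list of grayscale values, reduction is done by block averaging, and the linear mapping of levels 1 to 10 onto the range 0 to 1 is one possible quantization. The `predictor` argument stands in for the pixel-level classification network, and `toy_predictor` is a hypothetical placeholder that has no relation to a trained model.

```python
def downscale(image, factor):
    """Reduce a 2-D grayscale image by averaging non-overlapping factor x factor blocks."""
    h, w = len(image) // factor, len(image[0]) // factor
    return [
        [
            sum(image[y * factor + dy][x * factor + dx]
                for dy in range(factor) for dx in range(factor)) / (factor * factor)
            for x in range(w)
        ]
        for y in range(h)
    ]


def degradation_level_map(image, factor, predictor, num_levels=10):
    """Reduce the image, predict an integer level (1..num_levels) per area, quantize to [0, 1]."""
    small = downscale(image, factor)
    levels = [[predictor(value) for value in row] for row in small]
    # Assumed quantization: level 1 -> 0.0 (lightest), level num_levels -> 1.0 (most serious).
    return [[(level - 1) / (num_levels - 1) for level in row] for row in levels]


def toy_predictor(value):
    """Hypothetical stand-in for the classification network: dark areas get level 10."""
    return 10 if value < 0.5 else 1


image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
level_map = degradation_level_map(image, factor=2, predictor=toy_predictor)
print(level_map)
```

Each entry of the resulting map describes an area of the original image rather than a single pixel, which is consistent with the regional characteristic of the degradation level.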
As shown in
Therefore, by introducing a quality degradation estimation branch in the machine learning model, the condition of the image quality degradation is input to the processing branch as known information, which gives the processing branch the ability to perceive the image quality and thus to better control the processing intensity. That is to say, when the image quality is relatively good, a slight improvement is made, and when the image quality is relatively poor, heavier processing is performed.
The image processing method according to various exemplary embodiments of the present disclosure has been described above in conjunction with
In addition, the output image obtained by using the image processing method according to the exemplary embodiment of the present disclosure has a better visual effect compared with the input image and compared with the output image obtained by using the related art solutions.
As mentioned above, Problem 2 and Problem 3 exist in the related art solutions. In the above description, to address Problem 2 and Problem 3, the present disclosure uses the semantic layout information and/or the quality degradation level information to guide the processing of the target area on the basis of the image processing method shown in
That is to say, embodiments of the present disclosure may provide an image processing method that solves any one of the above Problem 1, Problem 2, and Problem 3 alone, an image processing method that solves any two of them at the same time, and an image processing method that solves all three of them at the same time.
Hereinafter, examples of the two kinds of image processing methods will be described with reference to
Subsequently, at step S1730, the semantic layout information of the target area is acquired. For example, at step S1730, the following operations may be performed to extract the semantic layout information: obtaining absolute semantic layout information by parsing the target area; obtaining relative semantic layout information by detecting key points of the target area; and obtaining the semantic layout information by encoding the obtained absolute semantic layout information and relative semantic layout information. For example, a face parsing map may be obtained by parsing the target area, and the absolute semantic layout information is obtained by performing a blur processing on the face parsing map. In addition, for example, the following operations may be performed to extract the relative semantic layout information: detecting the key points of the target area; selecting a first base point and a second base point from the detected key points; and obtaining the relative semantic layout information by mapping a vector constituted by at least one point in the target area and the first base point to a reference vector constituted by the first base point and the second base point. It should be noted that the acquiring of the semantic layout information has been described in detail above; reference may be made to that description, which is not repeated here.
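The mapping of a point vector to the reference vector may be illustrated by the following minimal sketch. It assumes that the mapping is a projection onto the reference vector and onto its perpendicular, normalized by the squared length of the reference vector; this particular encoding, the function name `relative_position`, and the coordinates used are assumptions for illustration and are not stated in the description above.

```python
def relative_position(point, first_base, second_base):
    """Express `point` in the frame of the reference vector first_base -> second_base.

    Returns (along, across): the components of the vector first_base -> point
    parallel and perpendicular to the reference vector, in units of its length.
    """
    rx, ry = second_base[0] - first_base[0], second_base[1] - first_base[1]
    vx, vy = point[0] - first_base[0], point[1] - first_base[1]
    norm_sq = rx * rx + ry * ry
    if norm_sq == 0:
        raise ValueError("the two base points must be distinct")
    along = (vx * rx + vy * ry) / norm_sq    # dot product, normalized
    across = (rx * vy - ry * vx) / norm_sq   # 2-D cross product, normalized
    return along, across


# Hypothetical key points, e.g. two detected facial key points used as base points.
encoding = relative_position(point=(2.0, 2.0), first_base=(0.0, 0.0), second_base=(4.0, 0.0))
# The same layout at twice the scale yields the same encoding.
encoding_scaled = relative_position(point=(4.0, 4.0), first_base=(0.0, 0.0), second_base=(8.0, 0.0))
print(encoding, encoding_scaled)
```

Under this assumed encoding, the result does not change when the target area is uniformly scaled or translated, which is one reason a base-point frame may be convenient for describing relative layout.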
Finally, at step S1740, the target area is processed based on the semantic layout information, so as to obtain a processed output image. In the image processing method shown in
According to the image processing method shown in
Subsequently, at step S1830, the quality degradation level information is acquired based on quality degradation levels of different areas in the input image. Specifically, for example, the following operations may be performed to acquire the quality degradation level information: reducing the input image to a predetermined size; predicting quality degradation levels of different areas of the input image reduced to the predetermined size; and quantizing the predicted quality degradation levels to acquire the quality degradation level information. How to obtain the quality degradation level information has been described in detail above; reference may be made to that description, which is not repeated here.
Finally, at step S1840, the target area is processed based on the quality degradation level information, so as to obtain a processed output image. The quality degradation level information is used as one of the guide information to guide the processing of the target area.
According to the image processing method shown in
In addition, it should be noted that although the image processing method of
That is to say, both the semantic layout information and the quality degradation level information may be used as the guide information to guide the processing of the target area. In summary, in addition to the image processing method shown in
Referring to
In embodiments, the image processing method shown in
In addition, it should be noted that although the image processing device 1900 is divided into units for performing corresponding processing respectively when being introduced above, it is clear to those skilled in the art that the processing performed by the above respective units may also be performed in the case where the image processing device 1900 does not perform any specific unit division or there is no explicit demarcation between the respective units. In addition, the image processing device 1900 may further include other units, for example, an image preprocessing unit, a storing unit and the like.
Referring to
In embodiments, the image processing method shown in
Referring to
In embodiments, the image processing method shown in
In addition, as mentioned above, both the semantic layout information and the quality degradation level information may be acquired, and the target area is processed based on both the semantic layout information and the quality degradation level information, so as to obtain a processed output image. Correspondingly, in addition to the above image processing device, the present disclosure may also provide an image processing device including the following units: an input image acquiring unit configured for acquiring an input image; a target area detecting unit configured for detecting a target area in the input image; a semantic layout information and/or quality degradation level information acquiring unit configured for acquiring semantic layout information of the target area, and/or acquiring quality degradation level information based on quality degradation levels of different areas in the input image; and a target area processing unit configured for processing the target area based on the semantic layout information and/or the quality degradation level information so as to obtain a processed output image.
Referring to
At least one of the above modules may be implemented through an artificial intelligence (AI) model. The functions associated with AI may be performed by a non-volatile memory, a volatile memory, and a processor.
The processor may include one or more processors. The one or more processors may be general-purpose processors, such as central processing units (CPUs) and application processors (APs); graphics-only processors, such as graphics processing units (GPUs) and vision processors (VPUs); and/or AI-dedicated processors, such as neural processing units (NPUs).
The one or more processors control the processing of input data according to a predefined operating rule or an AI model stored in the non-volatile memory and the volatile memory. The predefined operating rule or the AI model may be provided through training or learning. Here, providing through learning means that a predefined operating rule or AI model with desired characteristics is formed by applying a learning algorithm to a plurality of learning data. The learning may be performed in the apparatus itself that performs AI according to the embodiment, and/or may be implemented by a separate server/apparatus/system.
The learning algorithm is a method that uses a plurality of learning data to train a predetermined target apparatus (e.g., a robot) to enable, allow, or control the target apparatus to make a determination or prediction. Examples of the learning algorithm include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning.
According to embodiments, in the image processing method executed by the electronic apparatus, the output image after the target area is processed may be obtained by using the input image as the input data of the artificial intelligence model.
The artificial intelligence model may be obtained through training. Here, “obtained through training” refers to training a basic artificial intelligence model with a plurality of training data through a training algorithm, thereby obtaining a predefined operation rule or an artificial intelligence model, which is configured to perform the required feature (or purpose).
As an example, the artificial intelligence model may include a plurality of neural network layers. Each of the plurality of neural network layers includes a plurality of weight values, and performs its neural network calculation by operating on the calculation result of the previous layer with the plurality of weight values. Examples of the neural network include, but are not limited to, a Convolutional Neural Network (CNN), a Deep Neural Network (DNN), a Recurrent Neural Network (RNN), a Restricted Boltzmann Machine (RBM), a Deep Belief Network (DBN), a Bidirectional Recurrent Deep Neural Network (BRDNN), a Generative Adversarial Network (GAN), and a Deep Q Network.
As an example, the electronic apparatus may be a personal computer (PC), a tablet device, a personal digital assistant, a smart phone, or other devices capable of executing the above set of instructions. Here, the electronic apparatus does not have to be a single electronic apparatus and may also be any device or a collection of circuits that may execute the foregoing instructions (or instruction sets) individually or jointly. The electronic apparatus may also be a part of an integrated control system or a system manager, or may be configured as a portable electronic apparatus that interfaces locally or remotely (e.g., via wireless transmission).
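The layer-by-layer calculation described above may be illustrated with a minimal fully connected example, in which each layer combines the result of the previous layer with its own weight values. The bias terms, the ReLU activation, and all numeric values are assumptions added for illustration; the description above only states that each layer has weight values applied to the previous layer's result.

```python
def dense_layer(inputs, weights, biases):
    """One layer: weighted sum of the previous layer's result plus a bias (bias is assumed)."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values):
    """Assumed activation function, applied element-wise."""
    return [max(0.0, v) for v in values]


def forward(inputs, layers):
    """Feed the result of each layer into the next one."""
    activations = inputs
    for weights, biases in layers:
        activations = relu(dense_layer(activations, weights, biases))
    return activations


# Illustrative two-layer network: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0]),
    ([[2.0, 1.0]], [-1.0]),
]
output = forward([2.0, 1.0], layers)
print(output)
```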
In the electronic apparatus, the processor may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor. As an example and not limitation, the processor may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, and the like.
The processor may run instructions or codes stored in the memory, where the memory may also store data. Instructions and data may also be transmitted and received through a network via a network interface device, wherein the network interface device may use any known transmission protocol.
The memory may be integrated with the processor as a whole, for example, RAM or a flash memory is arranged in an integrated circuit microprocessor or the like. In addition, the memory may include an independent device, such as an external disk drive, a storage array, or any other storage device that may be used by a database system. The memory and the processor may be operatively coupled, or may communicate with each other, for example, through an I/O port, a network connection, or the like, so that the processor may read files stored in the memory.
In addition, the electronic apparatus may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, a mouse, a touch input device, etc.). All components of the electronic apparatus may be connected to each other via a bus and/or a network.
According to an embodiment of the present disclosure, there may also be provided a computer-readable storage medium storing instructions, wherein the instructions, when executed by at least one processor, cause the at least one processor to execute the image processing method according to the exemplary embodiment of the present disclosure. Examples of the computer-readable storage medium here include: Read Only Memory (ROM), Programmable Read Only Memory (PROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc storage, Hard Disk Drive (HDD), Solid State Drive (SSD), card storage (such as a multimedia card, a secure digital (SD) card, or an extreme digital (xD) card), magnetic tape, floppy disk, magneto-optical data storage device, optical data storage device, hard disk, solid state disk, and any other devices which are configured to store computer programs and any associated data, data files, and data structures in a non-transitory manner, and provide the computer programs and any associated data, data files, and data structures to the processor or the computer, so that the processor or the computer may execute the computer programs. The computer programs in the above computer-readable storage media may run in an environment deployed in computer equipment such as a client, a host, an agent device, a server, etc. In addition, in one example, the computer programs and any associated data, data files, and data structures are distributed on networked computer systems, so that the computer programs and any associated data, data files, and data structures are stored, accessed, and executed in a distributed manner through one or more processors or computers.
According to an aspect of the disclosure, an image processing method includes: acquiring an input image; detecting a target area in the input image; and processing the target area. The processing of the target area includes: obtaining a feature map of the target area; rearranging feature blocks in the feature map in a feature space; and obtaining an output image after the target area is processed based on the rearranged feature blocks and the feature map.
The feature map of the target area may be obtained by extracting an image feature of the target area.
The obtaining of the output image may include: weighting and combining the rearranged feature blocks; and obtaining the output image after the target area is processed based on the weighted and combined feature blocks and the feature map.
The weighting and the combining of the rearranged feature blocks may include: determining at least one of a level of importance of at least one rearranged feature block and a correlation between different feature blocks; and weighting and combining the rearranged feature blocks, based on the at least one of the level of importance of the at least one rearranged feature block and the correlation between the different feature blocks.
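One possible way to weight and combine feature blocks using both a level of importance and a correlation is sketched below. The weighting formula (importance multiplied by the exponential of the correlation, then normalized so that the weights for each output block sum to 1) is an assumption chosen for illustration and is not stated above; the function name `combine_blocks` and all values are likewise hypothetical.

```python
import math


def combine_blocks(blocks, importance, correlation):
    """Replace each block by a weighted combination of all blocks.

    blocks:      list of feature vectors, one per feature block
    importance:  one non-negative value per block (not all zero)
    correlation: correlation[i][j] relates block i to block j
    """
    count, dim = len(blocks), len(blocks[0])
    combined = []
    for i in range(count):
        # Assumed weighting: more important and more correlated blocks contribute more.
        raw = [importance[j] * math.exp(correlation[i][j]) for j in range(count)]
        total = sum(raw)
        weights = [r / total for r in raw]
        combined.append(
            [sum(weights[j] * blocks[j][d] for j in range(count)) for d in range(dim)]
        )
    return combined


blocks = [[1.0, 0.0], [0.0, 1.0]]
uniform = combine_blocks(blocks, importance=[1.0, 1.0], correlation=[[0.0, 0.0], [0.0, 0.0]])
# A block with zero importance contributes nothing to any combination.
dominated = combine_blocks(blocks, importance=[1.0, 0.0], correlation=[[0.0, 0.0], [0.0, 0.0]])
print(uniform, dominated)
```

In this sketch a low-importance block (for example, one with serious quality degradation) contributes little to the combination, while a highly correlated block contributes more.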
The determining of the level of importance of the at least one rearranged feature block may include: acquiring at least one of quality degradation level information and texture direction field information of at least one feature block of the feature map; and determining the level of importance of the at least one feature block, based on the at least one of the quality degradation level information and the texture direction field information.
The texture direction field information may include texture direction field strength information.
The acquiring of the quality degradation level information may include: reducing the input image to a predetermined size; predicting quality degradation levels of different areas of the input image reduced to the predetermined size; and quantizing the predicted quality degradation levels to acquire the quality degradation level information.
The determining of the correlation between the different feature blocks may include: acquiring at least one of semantic layout information of the target area and texture direction field information of at least one feature block of the feature map; and determining the correlation between the different feature blocks, based on the at least one of the semantic layout information and the texture direction field information.
The texture direction field information may include texture direction field consistency information.
The acquiring of the texture direction field information may include: acquiring a gradient field corresponding to the at least one feature block of the feature map; and obtaining the texture direction field information by applying expansion convolutions (also known as dilated convolutions) with different expansion rates to the gradient field.
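The two operations above may be sketched as follows, assuming a central-difference gradient and a fixed 3x3 averaging kernel whose taps are spaced by the expansion rate. In practice the convolution kernels would likely be learned; the fixed kernel, the zero padding, the function names, and the ramp-shaped example block are all assumptions made for illustration.

```python
def gradient_field(block):
    """Central-difference gradients (gx, gy) of a 2-D block, with clamped borders."""
    h, w = len(block), len(block[0])
    gx = [[(block[y][min(x + 1, w - 1)] - block[y][max(x - 1, 0)]) / 2.0
           for x in range(w)] for y in range(h)]
    gy = [[(block[min(y + 1, h - 1)][x] - block[max(y - 1, 0)][x]) / 2.0
           for x in range(w)] for y in range(h)]
    return gx, gy


def dilated_mean(field, rate):
    """3x3 averaging convolution whose taps are `rate` pixels apart (zero padding)."""
    h, w = len(field), len(field[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-rate, 0, rate):
                for dx in (-rate, 0, rate):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += field[yy][xx]
            out[y][x] = acc / 9.0
    return out


def texture_direction_features(block, rates=(1, 2)):
    """Aggregate the gradient field at several expansion rates (one (gx, gy) pair per rate)."""
    gx, gy = gradient_field(block)
    return [(dilated_mean(gx, r), dilated_mean(gy, r)) for r in rates]


# Illustrative 5x5 block whose intensity increases from left to right.
ramp = [[float(x) for x in range(5)] for _ in range(5)]
features = texture_direction_features(ramp)
print(features[0][0][2][2], features[0][1][2][2])
```

Using several expansion rates lets the same small kernel aggregate gradient directions over neighborhoods of different sizes, which may help in judging how strong and how consistent the local texture direction is.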
The acquiring of the semantic layout information may include: obtaining absolute semantic layout information by parsing the target area; obtaining relative semantic layout information by detecting key points of the target area; and obtaining the semantic layout information by encoding the obtained absolute semantic layout information and the relative semantic layout information.
The obtaining of the absolute semantic layout information may include: obtaining a face parsing map by parsing the target area; and obtaining the absolute semantic layout information by performing a blur processing on the face parsing map.
The obtaining of the relative semantic layout information may include: detecting the key points of the target area; selecting a first base point and a second base point from among the detected key points; and obtaining the relative semantic layout information by mapping a vector including at least one point in the target area and the first base point, to a reference vector including the first base point and the second base point.
The obtaining of the output image may include: obtaining a reconstruction feature map by recovering the weighted and combined feature blocks to initial positions of the feature blocks in the feature map; fusing the reconstruction feature map with the feature map; and obtaining the output image after the target area is processed based on the fused feature map.
The obtaining of the output image may include: fusing semantic layout information of the target area to the feature map; obtaining a reconstruction feature map by recovering the weighted and combined feature blocks to initial positions of the feature blocks in the feature map; fusing the reconstruction feature map with the feature map fused with the semantic layout information; and obtaining the output image after the target area is processed based on the fused feature map.
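The recovering of feature blocks to their initial positions and the subsequent fusing may be sketched as follows. The sketch assumes that the rearrangement is recorded as a list of original indices, that the weighting and combining is replaced by a simple scaling for brevity, and that fusing is an element-wise addition; these choices and all names are assumptions for illustration only.

```python
def rearrange(blocks, order):
    """Place the block with original index order[k] at position k."""
    return [blocks[i] for i in order]


def recover(rearranged, order):
    """Undo `rearrange`: put every block back at its initial position."""
    restored = [None] * len(order)
    for position, original_index in enumerate(order):
        restored[original_index] = rearranged[position]
    return restored


def fuse(reconstruction, feature_blocks):
    """Assumed fusion: element-wise addition of the reconstruction and the original features."""
    return [[r + f for r, f in zip(rb, fb)] for rb, fb in zip(reconstruction, feature_blocks)]


blocks = [[1.0], [2.0], [3.0]]
order = [2, 0, 1]                        # hypothetical rearrangement in the feature space
rearranged = rearrange(blocks, order)
# Stand-in for the weighting and combining step: scale every block by 0.5.
processed = [[0.5 * v for v in block] for block in rearranged]
reconstruction = recover(processed, order)
fused = fuse(reconstruction, blocks)
print(fused)
```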
According to an aspect of the disclosure, an image processing method includes: acquiring an input image; detecting a target area in the input image; acquiring at least one of semantic layout information of the target area and quality degradation level information based on quality degradation levels of different areas in the input image; and processing the target area based on the at least one of the semantic layout information and the quality degradation level information to obtain a processed output image.
The acquiring of the semantic layout information may include: obtaining absolute semantic layout information by parsing the target area; obtaining relative semantic layout information by detecting key points of the target area; and obtaining the semantic layout information by encoding the obtained absolute semantic layout information and the relative semantic layout information.
The obtaining of the absolute semantic layout information may include: obtaining a face parsing map by parsing the target area, and obtaining the absolute semantic layout information by performing a blur processing on the face parsing map.
The obtaining of the relative semantic layout information may include: detecting the key points of the target area; selecting a first base point and a second base point from the detected key points; and obtaining the relative semantic layout information by mapping a vector including at least one point in the target area and the first base point to a reference vector including the first base point and the second base point.
The acquiring of the quality degradation level information may include: reducing the input image to a predetermined size; predicting the quality degradation levels of the different areas of the reduced input image; and quantizing the predicted quality degradation levels to acquire the quality degradation level information.
According to an aspect of the disclosure, an image processing device comprises at least one storage configured to store one or more computer executable instructions, and at least one processor configured to execute the one or more instructions stored in the storage to: acquire an input image; detect a target area in the input image; obtain a feature map of the target area by extracting an image feature of the target area; rearrange feature blocks in the feature map in a feature space; and obtain an output image after the target area is processed based on the rearranged feature blocks and the feature map.
According to an aspect of the disclosure, an image processing device includes an input image acquiring unit configured to acquire an input image; a target area detecting unit configured to detect a target area in the input image; and a target area processing unit configured to process the target area, wherein the processing of the target area includes obtaining a feature map of the target area, rearranging feature blocks in the feature map in a feature space, and obtaining an output image after the target area is processed based on the rearranged feature blocks and the feature map.
According to an aspect of the disclosure, an image processing device includes an input image acquiring unit configured to acquire an input image; a target area detecting unit configured to detect a target area in the input image; an information acquiring unit configured to acquire at least one of semantic layout information of the target area and quality degradation level information based on quality degradation levels of different areas in the input image; and a target area processing unit configured to process the target area based on the at least one of the semantic layout information and the quality degradation level information to obtain a processed output image.
According to an aspect of the disclosure, an electronic apparatus includes at least one processor; and at least one storage configured to store computer executable instructions, wherein the computer executable instructions, when executed by the at least one processor, cause the at least one processor to execute any one of the image processing methods discussed above.
According to an aspect of the disclosure, a computer-readable storage medium is configured to store instructions which, when executed by at least one processor, cause the at least one processor to execute any one of the image processing methods discussed above.
Those skilled in the art will easily think of other embodiments of the present disclosure after considering the specification and practicing the embodiments disclosed herein. The present application is intended to cover any variations, uses, or adaptive changes of the present disclosure. These variations, uses, or adaptive changes follow the general principles of the present disclosure and include common knowledge or conventional technical means in the technical field that are not disclosed in the present disclosure. The specification and the embodiments are only to be regarded as exemplary, and the true scope and spirit of the present disclosure are defined by the following claims.
Number | Date | Country | Kind |
---|---|---|---|
202111260449.8 | Oct 2021 | CN | national |
This application is a bypass continuation application of PCT International Application No. PCT/KR2022/016169, which was filed on Oct. 21, 2022, in the Korean Intellectual Property Office, and which claims priority to Chinese Patent Application No. 202111260449.8, filed on Oct. 28, 2021, in the China National Intellectual Property Administration, the disclosures of which are incorporated by reference herein in their entirety.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/KR2022/016169 | Oct 2022 | US |
Child | 17982111 | | US |