The present disclosure relates in general to video image processing. In particular, this disclosure relates to face region detection and local reshaping enhancement.
Face detection methods have been used in various applications that identify human faces in images and/or videos. In some of the existing face region detection methods, the face region can be detected by skin tone. Some methods based on graph-cut or graphical models may use the bounding boxes of faces to predict segmentation of faces in images. Based on recently developed techniques, deep convolutional neural networks for semantic and instance segmentation tasks can be used for face region detection.
The disclosed methods and devices provide an efficient framework to detect face regions in images, given bounding boxes of faces, and to apply different adjustments to the face regions during local reshaping. The detection of the face region is based on histogram analysis of the face and can be efficiently extended to consecutive frames in video clips. When the detected face region is applied to local reshaping, the contrast and saturation of faces can be adjusted separately from other image content to avoid over-enhancement of details, such as wrinkles or spots, on faces.
An embodiment of the present invention is a method of face region detection in an input image including one or more faces, the method comprising: providing face bounding boxes and confidence levels for each face of the one or more faces; based on the input image, generating a histogram of all pixels; based on the input image and the face bounding boxes, generating histograms of the one or more faces; based on the histogram of all pixels and the histograms of the one or more faces, generating a probability of face; and, based on the probability of face, generating a face probability map. Another embodiment of the present invention utilizes the face region detection of the previous embodiment to apply local reshaping by applying face saturation adjustment and face contrast adjustment to the face probability map to generate an adjusted face probability map; and generating a reshaped image based on the adjusted face probability map and one or more selected reshaping functions.
A method may be computer-implemented in some embodiments. For example, the method may be implemented, at least in part, via a control system comprising one or more processors and one or more non-transitory storage media.
Some or all of the methods described herein may be performed by one or more devices according to instructions (e.g. software) stored on one or more non-transitory media. Such non-transitory media may include memory devices such as those described herein, including but not limited to random access memory (RAM) devices, read-only memory (ROM) devices, etc. Accordingly, various innovative aspects of the subject matter described in this disclosure may be implemented in a non-transitory medium having software stored thereon. The software may, for example, be executable by one or more components of a control system such as those disclosed herein. The software may, for example, include instructions for performing one or more of the methods disclosed herein.
At least some aspects of the present disclosure may be implemented via an apparatus or apparatuses. For example, one or more devices may be configured for performing, at least in part, the methods disclosed herein. In some implementations, an apparatus may include an interface system and a control system. The interface system may include one or more network interfaces, one or more interfaces between the control system and memory system, one or more interfaces between the control system and another device and/or one or more external device interfaces. The control system may include at least one of a general-purpose single- or multi-chip processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. Accordingly, in some implementations the control system may include one or more processors and one or more non-transitory storage media operatively coupled to one or more processors.
Details of one or more implementations of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages will become apparent from the description, the drawings, and the claims. Note that the relative dimensions of the following figures may not be drawn to scale. Like reference numbers and designations in the various drawings generally indicate like elements, but different reference numbers do not necessarily designate different elements between different drawings.
The previous face detection methods used in image processing have drawbacks for video. For example, skin-tone detection does not generalize well, because skin tone varies between different people and different lighting conditions. Predicting segmentation is computationally expensive for video. And neural networks can create flickering artifacts in downstream operations due to missed detections and temporal inconsistency. The systems and methods provided herein avoid those deficiencies.
As used herein, “face bounding box” refers to an imaginary (non-drawn) rectangle that serves as a point of reference for a face detected by a face detection algorithm.
As used herein, “histogram of a face” refers to grouped data for a detected face image.
As used herein, “face probability map” refers to a pixel mapping of an image to the probabilities of each pixel individually being part of a face.
As used herein, “basic face shape” or “basic face shape model” refers to a shape (e.g. an ellipse) that represents generally the size and shape of a detected face and a “basic face shape map” refers to a pixel mapping of basic face shapes in an image.
As used herein, “probability of face” and “probability of non-face” refer to the calculated probability of a pixel being in a face or not in a face respectively.
As used herein, “soft morphological operation” refers to non-linear operations related to the shape or morphology of features in an image where the maximum and the minimum operations used in standard gray-scale morphology are replaced by weighted order statistics.
As used herein, “face adjustment” refers to applying reshaping operations on the detected face regions of an image.
As shown in the exemplary embodiment of
Local reshaping (100′) processing can then be applied. With the face probability map (15), different local reshaping (17) operations are applied on the face region. The contrast and saturation in the face region are adjusted (16) so that the face looks natural and visually pleasant in the reshaped image (18). In an embodiment, local reshaping methods like those proposed in U.S. Prov. App. Ser. No. 63/086,699, “Adaptive Local Reshaping For SDR-To-HDR Up-Conversion,” filed on Oct. 2, 2020 by the applicant of the present disclosure and incorporated herein by reference in its entirety, can be used. With this method, the contrast and saturation for each pixel can be easily adjusted.
With continued reference to
With further reference to
According to the teachings of the present disclosure, as part of the histogram analysis, a face shape model is used to generate the initial guess of face region for calculating the generic histogram of face. In order to capture the diversity of colors in different faces in the same image, the individual histogram of each face is also calculated.
With further reference to
where operator .* is element-wise multiplication. In order to further clarify the above-disclosed teachings, reference is made to
Referring back to
With continued reference to
In addition to the global generic histogram of all faces, the local individual histogram of each face is also considered to capture the variation of each face. This is illustrated by an exemplary diagram shown in
With further reference to
In addition, the keeping ratio rkeep,k of the trimmed histogram, i.e. the ratio of the total pixel count after trimming to the total pixel count before trimming, may be recorded for future use. Such ratio can be obtained as follows:
To trim the histogram, the contiguous block of bins of size ÑbinY×ÑbinU×ÑbinV inside which the sum of the histogram is maximal may be found. However, searching the 3-D histogram directly may be computationally expensive. Therefore, the histogram may be trimmed one channel at a time, in the order of the Y, U, and V channels. Exemplary parameters are ÑbinY=64 and ÑbinU=ÑbinV=16 for all faces. With such settings, most of the faces may have a keeping ratio larger than, for example, 90%.
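The channel-by-channel trimming described above can be sketched as follows. This is an illustrative implementation, not code from the original: the function and parameter names are assumed, and for each channel the contiguous window of bins with the maximal marginal pixel count is kept.

```python
import numpy as np

def trim_histogram(hist, n_keep=(4, 2, 2)):
    """Trim a 3-D YUV histogram one channel at a time (Y, then U, then V).

    For each channel, keep the contiguous window of n_keep bins whose
    marginal pixel count is maximal, as described in the text.  Returns
    the trimmed histogram, the window start indices, and the keeping
    ratio r_keep (pixel count after trimming / pixel count before).
    """
    total = hist.sum()
    starts = []
    for axis, size in enumerate(n_keep):
        # Marginal histogram of the current channel.
        marginal = hist.sum(axis=tuple(a for a in range(3) if a != axis))
        # Sliding-window sums over contiguous bins of length `size`.
        cumsum = np.concatenate(([0.0], np.cumsum(marginal)))
        window = cumsum[size:] - cumsum[:-size]
        start = int(np.argmax(window))
        starts.append(start)
        hist = np.take(hist, list(range(start, start + size)), axis=axis)
    r_keep = hist.sum() / total if total > 0 else 0.0
    return hist, starts, r_keep
```

Trimming one channel at a time reduces the search from a 3-D window search to three 1-D searches, which is the efficiency point made above.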
Continuing with the trimming process disclosed above, and in view of possible memory limitations, a maximum number of faces, Nface,max, may be set for storing individual histograms. As such, when Nface>Nface,max, only the Nface,max most important faces are kept. Because larger faces in an image usually attract more attention, the size of the bounding boxes may be used as a measure of importance. Additionally, the detection score of the bounding boxes may be considered to avoid false detections. Therefore, the importance of each face may be defined based on its area and detection score as shown in the following equation:
where the area is normalized by W*H/Nface,max and clipped to 1, because a face that is large enough is deemed important regardless of its exact size. The term Nface,max is put in the denominator because the more faces that can be kept, the smaller the faces that can be considered. The top Nface,max faces with the highest importance are selected. An exemplary value is Nface,max=16.
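The face selection step can be sketched as below. The exact importance equation is not reproduced in this text, so one plausible form is assumed here: the bounding-box area normalized by W*H/Nface,max, clipped to 1, and weighted by the detection score.

```python
import numpy as np

def select_important_faces(boxes, scores, W, H, n_face_max=16):
    """Rank faces and keep at most n_face_max of them.

    Assumed importance formula (the original equation is not shown):
        importance = min(area * n_face_max / (W * H), 1) * score
    i.e. area normalized by W*H/n_face_max, clipped to 1, weighted by
    the detection score.  `boxes` is a list of (x0, y0, x1, y1) tuples.
    Returns the kept face indices in ascending order.
    """
    importance = []
    for (x0, y0, x1, y1), score in zip(boxes, scores):
        area = max(0, x1 - x0) * max(0, y1 - y0)
        norm_area = min(area * n_face_max / (W * H), 1.0)
        importance.append(norm_area * score)
    order = np.argsort(importance)[::-1]   # highest importance first
    return sorted(order[:n_face_max].tolist())
```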
With reference to
{tilde over (hist)}face,k = histface,k(bstartY: bstartY + ÑbinY, bstartU: bstartU + ÑbinU, bstartV: bstartV + ÑbinV)
With the generated histograms as previously disclosed, the probability of face for each bin can be defined. Generally, if a color has higher value in a histogram of face, it is more likely to be part of the face. Therefore, the initial probability of face can be estimated directly from the generic histograms of face and all pixels. However, because the histogram of face is estimated from the basic shape map, which is just an initial guess of face region, further refining of the initial probability by adapting it to the histograms locally in YUV color space may be needed. As such, iterative adaptive sorting and probability propagation based on the individual histograms of each face and the generic histogram of non-face may be implemented. Details of initial probability estimation, adaptive sorting, and probability propagation are presented through the exemplary diagrams of
where G(.) denotes 3-D Gaussian filtering (63) with standard deviation σhist, and operator ./ is element-wise division (64). To avoid dividing by zero, rface(b) may be set to 0 for each bin b where G(histall) is 0. The purpose of the Gaussian filtering is to reduce noise in the histogram. The standard deviation σhist may be set to, for example, σhist=0.25 (in bins). Scaling and thresholding (65) is then applied on the ratio to get the initial probability of face (66). The larger the ratio, the larger the probability. For each bin b, the following applies:
where r0 and r1 are thresholds on the ratio of histograms. From the above equation, it can be noticed that when rface<r0, pface,init=0. On the other hand, when rface>r1, pface,init=1. Thresholds r0 and r1 may be set, for example, to r0=0.1 and r1=0.5. Moreover, the histogram of non-face (68) may be defined as the difference (67) between the histograms: histnonface=histall−histface. As will be seen later, the histogram of non-face (68) will be used in the adaptive sorting process, which will be detailed in the next section.
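The initial probability estimation described above can be sketched as follows; the Gaussian pre-filtering of both histograms is omitted here for brevity, and the function name is illustrative rather than taken from the original.

```python
import numpy as np

def initial_face_probability(hist_face, hist_all, r0=0.1, r1=0.5):
    """Initial per-bin probability of face from the two generic histograms.

    Follows the scaling-and-thresholding rule in the text: the ratio
    r_face = hist_face / hist_all is mapped linearly from [r0, r1] to
    [0, 1] and clipped, so r_face < r0 gives 0 and r_face > r1 gives 1.
    Bins where hist_all is 0 get r_face = 0 to avoid division by zero.
    """
    r_face = np.zeros_like(hist_all, dtype=float)
    nonzero = hist_all > 0
    r_face[nonzero] = hist_face[nonzero] / hist_all[nonzero]
    return np.clip((r_face - r0) / (r1 - r0), 0.0, 1.0)
```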
where ℬ0 is the set of bins whose probabilities are to be updated to 0. In other words:
where ℬ0 is the set of the bins with the lowest probabilities. The above-disclosed method is illustrated in
Referring back to
where ℬ1 is the set of bins whose probabilities are to be updated to 1:
The updated probability from all faces (75) can be acquired by considering the updates from all faces:
In practice, only the trimmed histograms {tilde over (hist)}face,k may be available. In addition, in such trimmed histograms only an rkeep,k portion of the pixel counts in histface,k is kept. Therefore, the cumulative pixel count may instead need to reach θface/rkeep,k of the sum of {tilde over (hist)}face,k. Moreover, when θface/rkeep,k>1, the probabilities of all bins in the trimmed histogram may be set to 1. The values for parameters θnonface and θface may be decided empirically. As an example, θnonface=0.9 and θface=0.75.
The pseudocode below shows an example of how the probability from non-face can be calculated:
The pseudocode below shows an example of how the probability from face can be calculated:
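The referenced pseudocode is not reproduced in this text. A sketch of both adaptive-sorting updates, under the description above, might look like the following; details such as tie handling and whether the threshold-crossing bin is included are assumptions noted in the comments.

```python
import numpy as np

def adaptive_sort_update(p_face, hist_nonface, hist_face, r_keep=1.0,
                         theta_nonface=0.9, theta_face=0.75):
    """Adaptive-sorting update of the per-bin face probability (a sketch;
    the original pseudocode is not reproduced here).

    Bins are sorted by current probability.  Starting from the lowest-
    probability bins, probabilities are set to 0 until the accumulated
    non-face pixel count reaches theta_nonface of its total; starting
    from the highest-probability bins, probabilities are set to 1 until
    the accumulated (trimmed) face pixel count reaches
    theta_face / r_keep of the trimmed face histogram's total.
    """
    p = p_face.astype(float).ravel().copy()
    order = np.argsort(p, kind="stable")

    # Update-to-0 pass, driven by the non-face histogram.
    target0 = theta_nonface * hist_nonface.sum()
    acc = 0.0
    for b in order:                      # lowest probability first
        acc += hist_nonface.ravel()[b]
        p[b] = 0.0                       # crossing bin included (assumption)
        if acc >= target0:
            break

    # Update-to-1 pass, driven by the trimmed face histogram.
    frac = min(theta_face / r_keep, 1.0)
    target1 = frac * hist_face.sum()
    acc = 0.0
    for b in order[::-1]:                # highest probability first
        acc += hist_face.ravel()[b]
        p[b] = 1.0
        if acc >= target1:
            break

    return p.reshape(p_face.shape)
```

With multiple faces, the update-to-1 pass would be repeated per face and the results combined, as described for the "updated probability from all faces" above.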
With further reference to
To avoid division by zero, pface′ may be set to 0 at the bins where histall is 0. Moreover, because the probability is updated based on the sort index, it may undergo sharp changes between neighboring bins. As such, Gaussian filtering (78) may be performed in the 3-D bins to make the probability of face (79), pface, smooth, to avoid potential artifacts in later stages of processing. The standard deviation of the Gaussian filter, σprop, may be set, for example, to σprop=0.25.
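The zero-out and 3-D smoothing steps can be sketched as below. The separable Gaussian is implemented with plain numpy to keep the sketch dependency-free, and the truncation radius is an implementation choice not specified in the text.

```python
import numpy as np

def smooth_probability(p_face, hist_all, sigma=0.25, radius=2):
    """Zero-out invalid bins, then smooth the 3-D probability of face
    with a separable Gaussian (sigma in bins), as described in the text.
    """
    p = p_face.astype(float).copy()
    p[hist_all == 0] = 0.0               # bins with no pixels carry no evidence
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    # Apply the 1-D kernel along each of the three bin axes in turn.
    for axis in range(p.ndim):
        p = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="same"), axis, p)
    return p
```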
With continued reference to
With reference to
With further reference to
Referring back to
The output of guided image filter (93) may be clipped between [0,1] because the guided image filter (93) is based on ridge regression and may create noise due to outliers. Also, the probability map of ROI may be applied so that the face region is inside ROI, i.e. Mface(i)≤MROI(i)∀i.
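The clipping and ROI constraint described above amount to a simple post-processing step; a minimal sketch (with assumed names) is:

```python
import numpy as np

def postprocess_face_map(m_face, m_roi):
    """Clip the guided-filter output to [0, 1] (outliers from the ridge
    regression can push values outside this range) and constrain the
    face region to lie inside the ROI, i.e. M_face(i) <= M_ROI(i) for all i.
    """
    return np.minimum(np.clip(m_face, 0.0, 1.0), m_roi)
```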
Referring back to
The soft morphological operation (901) of
Parameters controlling the soft morphological operation (901) include σmorph, the standard deviation for Gaussian filtering (95), and amorph, the scaling factor that decides whether to expand the face region or not. Operator .* is element-wise multiplication. From the above definition, it can be seen that each pixel is multiplied by the weighted average G(Mface) of its surrounding pixels. As part of the scaling and thresholding (97) step, for a pixel at which Mface>0, if G(Mface)>1/amorph, the pixel value will be amplified after the operation. On the other hand, if G(Mface)<1/amorph, the pixel value will be decreased after the operation. In other words, a pixel is preserved only if its surroundings have high values. Additionally, the operation may be repeated for nsoftmorph iterations to gradually refine the probability map (92), as shown in the following:
where the soft morphological operation is repeated nsoftmorph times. Also, the probability map of the ROI may be applied so that the face region is inside the ROI, i.e. Mface(i)≤MROI(i)∀i. Parameters σmorph, amorph, and nsoftmorph may be set as, for example, σmorph=25, amorph=3, and nsoftmorph=2.
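One reading of the soft morphological operation is sketched below: each pixel is scaled by amorph times the weighted average of its neighborhood, then clipped to [0, 1], so isolated pixels are suppressed while pixels with high-valued surroundings survive. A 3×3 box average stands in for the Gaussian filter to keep the sketch dependency-free; all names are illustrative.

```python
import numpy as np

def soft_morph(m_face, m_roi, a_morph=3.0, n_iter=2):
    """Sketch of the iterated soft morphological operation: each pixel is
    multiplied by a_morph times its neighborhood average and clipped to
    [0, 1]; after n_iter iterations the result is constrained to the ROI.
    """
    m = m_face.astype(float).copy()
    for _ in range(n_iter):
        # 3x3 box average as a stand-in for Gaussian filtering G(M_face).
        padded = np.pad(m, 1, mode="edge")
        avg = np.zeros_like(m)
        for dy in range(3):
            for dx in range(3):
                avg += padded[dy:dy + m.shape[0], dx:dx + m.shape[1]]
        avg /= 9.0
        # Pixels with avg > 1/a_morph are amplified, others decay.
        m = np.clip(a_morph * m * avg, 0.0, 1.0)
    return np.minimum(m, m_roi)
```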
B. Local Reshaping with Face Adjustment
When local reshaping is performed, different reshaping functions can be applied on different pixels locally. The reshaping functions can control and enhance image properties such as contrast, saturation, or other visual features; see, e.g., the above-mentioned U.S. Prov. App. Ser. No. 63/086,699, incorporated herein by reference in its entirety. For most image contents, higher contrast and saturation provide a better viewing experience. However, for the faces in images, higher contrast and saturation are not always better. Viewers may not want details, such as wrinkles or spots, on faces to be enhanced. Moreover, less saturated faces may be preferred over faces with over-saturated skin color, which look unnatural, i.e. with a changed skin tone. Local reshaping with face adjustment in accordance with the teachings of the present disclosure can be applied to address this problem. With reference to
With further reference to
where siY, siU, siV, viY, viU, and viV are the i-th pixel in SY, SU, SV, VY, VU and VV, respectively. B, MMRU, and MMRV are the families of reshaping functions for the Y, U, and V channels, respectively, and LiY, LiU, and LiV are the corresponding indices of the selected reshaping functions for the i-th pixel. For simplicity, the indices for all pixels are denoted as index maps LY, LU and LV. Therefore, given an input image and corresponding index maps, the local reshaping operation for each pixel can be performed accordingly.
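The per-pixel selection of a reshaping function via an index map can be sketched as below for the luma channel. The family of reshaping functions is modeled here, as an assumption, as an array of 1-D lookup tables; names are illustrative.

```python
import numpy as np

def local_reshape_luma(s_y, index_map, lut_family):
    """Apply a per-pixel-selected reshaping function to the luma channel.

    lut_family is assumed to be an array of shape (n_funcs, n_codewords)
    of 1-D lookup tables; index_map selects which LUT each pixel uses,
    mirroring v_i^Y = B_{L_i^Y}(s_i^Y) in the text.  Input s_y is
    assumed normalized to [0, 1].
    """
    n_funcs, n_codes = lut_family.shape
    # Quantize each normalized luma value to a codeword index.
    codes = np.clip((s_y * (n_codes - 1)).astype(int), 0, n_codes - 1)
    # Fancy indexing: pick lut_family[index_map[i], codes[i]] per pixel.
    return lut_family[index_map, codes]
```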
With carefully designed families of reshaping functions, the brightness, contrast, saturation, or other visual features in the reshaped images can be changed by adjusting the index maps. For example, as described, e.g. in the above-mentioned U.S. Prov. App. Ser. 63/086,699 incorporated herein by reference in its entirety, the local detail and contrast enhancement can be achieved by using:
Or equivalently
where {tilde over (S)}Y is the Y channel of the normalized input image in the range of, for example, [0,1], and {tilde over (S)}Y,(l) is the corresponding edge-preserving filtered image. α is the map of enhancement strength for each pixel: the larger the α, the stronger the enhancement. ƒSL(.) is a pixelwise non-linear function to further adjust the enhancement based on pixel brightness. L(g) is a constant global index for the whole image, which controls the overall look, such as brightness and saturation, of the reshaped images. Moreover, when α=0, all the pixels use the same reshaping function; this is called global reshaping, which means no local contrast and detail enhancement. As an example, 4096 reshaping functions in the family of reshaping functions can be considered for each channel. The parameter α may use a default setting such as α=3.8*c1 for all pixels, where c1 is a model parameter that can be set as, for example, c1=2687.1.
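The index-map construction for local detail and contrast enhancement can be sketched as below. The pixelwise adjustment ƒSL(.) is omitted (treated as identity) in this sketch, and the clipping to the family size is an assumed implementation detail.

```python
import numpy as np

def contrast_index_map(s_y, s_y_filt, alpha, l_global, n_funcs=4096):
    """Index-map construction for local detail/contrast enhancement,
    following the description: L^Y = L^(g) + alpha * (S~Y - S~Y,(l)),
    where S~Y,(l) is an edge-preserving filtered copy of the luma and
    L^(g) is the constant global index.  f_SL(.) is omitted here, and
    indices are rounded and clipped to the family size.
    """
    l_y = l_global + alpha * (s_y - s_y_filt)
    return np.clip(np.rint(l_y), 0, n_funcs - 1).astype(int)
```

With alpha = 0 every pixel receives the global index L^(g), which matches the global-reshaping case described above.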
With continued reference to
In some applications, enhancing the details, such as wrinkles or spots, on faces in the same way as other image contents may not be desired. As such, there may be a need to reduce the enhancement strength in the face region when performing detail and contrast enhancement. The adjusted index map LY may be defined as:
where rface is the face contrast reduction ratio. It can be seen that for pixel i, if Mface(i)=1, ΔLface,c(i) becomes −rface(i)*α(i)*({tilde over (S)}Y(i)−{tilde over (S)}Y,(l)(i)) and the term ΔL(l)(i)+ΔLface,c(i) in Equation (22) can be written as (1−rface(i))*α(i)*({tilde over (S)}Y(i)−{tilde over (S)}Y,(l)(i)). By comparing with Equations (20) and (21), the enhancement strength drops from α(i) to (1−rface(i))*α(i). Therefore, ΔLface,c reduces the contrast on faces for 0<rface≤1. When rface=0, there is no adjustment. When rface=1, the enhancement strength on the face becomes 0. Empirically, if the enhancement strength on a face is 0, the face may look over-smoothed compared to the surrounding image contents, which are enhanced at the original strength. As an example, rface may be set as rface=0.5.
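The face contrast adjustment term can be sketched as follows; scaling the reduction by the face probability Mface (so the adjustment fades smoothly at the face boundary) is the natural per-pixel form implied by the description, and the names are illustrative.

```python
import numpy as np

def face_contrast_adjust(s_y, s_y_filt, alpha, m_face, r_face=0.5):
    """Face contrast adjustment term as described: for a pixel fully
    inside the face (M_face = 1) the enhancement strength drops from
    alpha to (1 - r_face) * alpha.  Assumed per-pixel form:
        dL_face_c = -r_face * M_face * alpha * (S~Y - S~Y,(l)).
    """
    return -r_face * m_face * alpha * (s_y - s_y_filt)
```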
In general, increasing the color saturation in images improves the viewing experience. However, when it comes to the faces in images, increasing the color saturation in the same way as for other image contents may be undesired. Over-saturated skin color will make the faces look unnatural or unhealthy. With reference to
As described in U.S. Prov. App. Ser. 63/086,699 incorporated herein by reference in its entirety, in general, the smaller the index of a reshaping function, the less saturated the reshaped image. In addition, the darker the input pixel, the more sensitive the reshaped pixel to the index.
In view of the above, based on the acquired LY as disclosed in the previous section, the adjusted index maps LU and LV can be further defined as:
In Equation (23), dface is the face desaturation offset and θsat is the threshold that controls the desaturation. Therefore, ΔLface,s reduces the saturation on faces when dface>0 and θsat>0. The larger the dface, the stronger the desaturation. When dface=0, there is no desaturation. Empirically, parameters dface and θsat may be set as, for example, dface=1024 and θsat=0.5.
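Equation (23) itself is not reproduced in this text; as one plausible reading of the description, sketched below, the offset dface is applied to the chroma index map only where the face probability exceeds the threshold θsat. All names and the exact gating form are assumptions.

```python
import numpy as np

def face_desaturation_offset(l_y, m_face, d_face=1024, theta_sat=0.5):
    """Sketch of the face desaturation adjustment for a chroma index map.

    Assumed reading of the description (Equation (23) is not shown):
        L^U = L^Y + dL_face_s,
        dL_face_s = -d_face * M_face * [M_face > theta_sat]
    so larger d_face means stronger desaturation, and d_face = 0 leaves
    the saturation unchanged.
    """
    delta = -d_face * m_face * (m_face > theta_sat)
    return l_y + delta
```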
A number of embodiments of the disclosure have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the present disclosure. Accordingly, the invention may be embodied in any of the forms described herein, including, but not limited to, the following Enumerated Example Embodiments (EEEs), which describe the structure, features, and functionality of some portions of the present invention:
The present disclosure is directed to certain implementations for the purposes of describing some innovative aspects described herein, as well as examples of contexts in which these innovative aspects may be implemented. However, the teachings herein can be applied in various different ways. Moreover, the described embodiments may be implemented in a variety of hardware, software, firmware, etc. For example, aspects of the present application may be embodied, at least in part, in an apparatus, a system that includes more than one device, a method, a computer program product, etc. Accordingly, aspects of the present application may take the form of a hardware embodiment, a software embodiment (including firmware, resident software, microcodes, etc.) and/or an embodiment combining both software and hardware aspects. Such embodiments may be referred to herein as a “circuit,” a “module”, a “device”, an “apparatus” or “engine.” Some aspects of the present application may take the form of a computer program product embodied in one or more non-transitory media having computer readable program code embodied thereon. Such non-transitory media may, for example, include a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. Accordingly, the teachings of this disclosure are not intended to be limited to the implementations shown in the figures and/or described herein, but instead have wide applicability.
The examples set forth above are provided to those of ordinary skill in the art as a complete disclosure and description of how to make and use the embodiments of the disclosure, and are not intended to limit the scope of what the inventor/inventors regard as their disclosure.
Modifications of the above-described modes for carrying out the methods and systems herein disclosed that are obvious to persons of skill in the art are intended to be within the scope of the following claims. All patents and publications mentioned in the specification are indicative of the levels of skill of those skilled in the art to which the disclosure pertains. All references cited in this disclosure are incorporated by reference to the same extent as if each reference had been incorporated by reference in its entirety individually.
It is to be understood that the disclosure is not limited to particular methods or systems, which can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the content clearly dictates otherwise. The term “plurality” includes two or more referents unless the content clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the disclosure pertains.
Number | Date | Country | Kind |
---|---|---|---|
21188517.3 | Jul 2021 | EP | regional |
This application claims the benefit of priority from U.S. Provisional patent application Ser. No. 63/226,938, filed on 29 Jul. 2021, and European patent application No. 21188517.3, filed on 29 Jul. 2021, both of which are hereby incorporated by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/038249 | 7/25/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63226938 | Jul 2021 | US |