The present disclosure relates in general to video image processing. In particular, this disclosure relates to face region detection and local reshaping enhancement.
Face detection methods have been used in various applications that identify human faces in images and/or videos. In some of the existing face region detection methods, the face region can be detected by skin tone. Some methods based on graph-cut or graphical models may use the bounding boxes of faces to predict segmentation of faces in images. Based on recently developed techniques, deep convolutional neural networks for semantic and instance segmentation tasks can be used for face region detection.
The disclosed methods and devices provide an efficient framework to detect face regions in images, given bounding boxes of faces, and to apply different adjustments to the face regions during local reshaping. The detection of the face region is based on histogram analysis of the face and can be efficiently extended to consecutive frames in video clips. When the detected face region is applied to local reshaping, the contrast and saturation of faces can be adjusted separately from other image content to avoid over-enhancement of details, such as wrinkles or spots, on faces.
An embodiment of the present invention is a method of face region detection in an input image including one or more faces, the method comprising: providing face bounding boxes and confidence levels for each face of the one or more faces; based on the input image, generating a histogram of all pixels; based on the input image and the face bounding boxes, generating histograms of the one or more faces; based on the histogram of all pixels and the histograms of the one or more faces, generating a probability of face; and, based on the probability of face, generating a face probability map. Another embodiment of the present invention utilizes the face region detection of the previous embodiment to apply local reshaping by applying face saturation adjustment and face contrast adjustment to the face probability map to generate an adjusted face probability map; and generating a reshaped image based on the adjusted face probability map and one or more selected reshaping functions.
A method may be computer-implemented in some embodiments. For example, the method may be implemented, at least in part, via a control system comprising one or more processors and one or more non-transitory storage media.
Some or all of the methods described herein may be performed by one or more devices according to instructions (e.g. software) stored on one or more non-transitory media. Such non-transitory media may include memory devices such as those described herein, including but not limited to random access memory (RAM) devices, read-only memory (ROM) devices, etc. Accordingly, various innovative aspects of the subject matter described in this disclosure may be implemented in a non-transitory medium having software stored thereon. The software may, for example, be executable by one or more components of a control system such as those disclosed herein. The software may, for example, include instructions for performing one or more of the methods disclosed herein.
At least some aspects of the present disclosure may be implemented via an apparatus or apparatuses. For example, one or more devices may be configured for performing, at least in part, the methods disclosed herein. In some implementations, an apparatus may include an interface system and a control system. The interface system may include one or more network interfaces, one or more interfaces between the control system and memory system, one or more interfaces between the control system and another device and/or one or more external device interfaces. The control system may include at least one of a general-purpose single- or multi-chip processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. Accordingly, in some implementations the control system may include one or more processors and one or more non-transitory storage media operatively coupled to one or more processors.
Details of one or more implementations of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages will become apparent from the description, the drawings, and the claims. Note that the relative dimensions of the following figures may not be drawn to scale. Like reference numbers and designations in the various drawings generally indicate like elements, but different reference numbers do not necessarily designate different elements between different drawings.
The previous face detection methods used in image processing have drawbacks for video. For example, skin-tone detection does not generalize well, because skin tone varies between different people and different lighting conditions. Predicting segmentation is computationally expensive for video. And neural networks can create flickering artifacts in downstream operations due to missed detections and temporal inconsistency. The systems and methods provided herein avoid those deficiencies.
As used herein, “face bounding box” refers to an imaginary (non-drawn) rectangle that serves as a point of reference for a face detected by a face detection algorithm.
As used herein, “histogram of a face” refers to grouped data for a detected face image.
As used herein, “face probability map” refers to a pixel mapping of an image to the probabilities of each pixel individually being part of a face.
As used herein, “basic face shape” or “basic face shape model” refers to a shape (e.g. an ellipse) that represents generally the size and shape of a detected face and a “basic face shape map” refers to a pixel mapping of basic face shapes in an image.
As used herein, “probability of face” and “probability of non-face” refer to the calculated probability of a pixel being in a face or not in a face respectively.
As used herein, “soft morphological operation” refers to non-linear operations related to the shape or morphology of features in an image where the maximum and the minimum operations used in standard gray-scale morphology are replaced by weighted order statistics.
As used herein, “face adjustment” refers to applying reshaping operations on the detected face regions of an image.
As shown in the exemplary embodiment of
Local reshaping (100′) processing can then be applied. With the face probability map (15), different local reshaping (17) operations are applied on the face region. The contrast and saturation in the face region are adjusted (16) so that the face looks natural and visually pleasant in the reshaped image (18). In an embodiment, local reshaping methods like those proposed in U.S. Prov. App. Ser. No. 63/086,699, “Adaptive Local Reshaping For SDR-To-HDR Up-Conversion,” filed on Oct. 2, 2020 by the applicant of the present disclosure and incorporated herein by reference in its entirety, can be used. With this method, the contrast and saturation for each pixel can be easily adjusted.
With continued reference to
With further reference to
According to the teachings of the present disclosure, as part of the histogram analysis, a face shape model is used to generate the initial guess of face region for calculating the generic histogram of face. In order to capture the diversity of colors in different faces in the same image, the individual histogram of each face is also calculated.
With further reference to
where operator .* is element-wise multiplication. In order to further clarify the above-disclosed teachings, reference is made to
Referring back to
With continued reference to
In addition to the global generic histogram of all faces, the local individual histogram of each face is also considered to capture the variation of each face. This is illustrated by an exemplary diagram shown in
With further reference to
In addition, the keeping ratio rkeep,k of the trimmed histogram, i.e. the ratio of the total pixel count after trimming to the total pixel count before trimming, may be recorded for future use. Such ratio can be obtained as follows:
To trim the histogram, the contiguous block of bins of size ÑbinY×ÑbinU×ÑbinV inside which the sum of the histogram is maximal may be found. However, searching the 3-D histogram directly may be computationally expensive. Therefore, the histogram may be trimmed one channel at a time, in the order of the Y, U, and V channels. Exemplary parameters are ÑbinY=64 and ÑbinU=ÑbinV=16 for all faces. With such settings, most of the faces may have a keeping ratio larger than, for example, 90%.
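The channel-by-channel trimming described above can be sketched as follows. This is an illustrative implementation, not code from the original: the function and parameter names are assumed, and for each channel the contiguous window of bins with the maximal marginal pixel count is kept.

```python
import numpy as np

def trim_histogram(hist, n_keep=(4, 2, 2)):
    """Trim a 3-D YUV histogram one channel at a time (Y, then U, then V).

    For each channel, keep the contiguous window of n_keep bins whose
    marginal pixel count is maximal, as described in the text.  Returns
    the trimmed histogram, the window start indices, and the keeping
    ratio r_keep (pixel count after trimming / pixel count before).
    """
    total = hist.sum()
    starts = []
    for axis, size in enumerate(n_keep):
        # Marginal histogram of the current channel.
        marginal = hist.sum(axis=tuple(a for a in range(3) if a != axis))
        # Sliding-window sums over contiguous bins of length `size`.
        cumsum = np.concatenate(([0.0], np.cumsum(marginal)))
        window = cumsum[size:] - cumsum[:-size]
        start = int(np.argmax(window))
        starts.append(start)
        hist = np.take(hist, list(range(start, start + size)), axis=axis)
    r_keep = hist.sum() / total if total > 0 else 0.0
    return hist, starts, r_keep
```

Trimming one channel at a time reduces the search from a 3-D window search to three 1-D searches, which is the efficiency point made above.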
Continuing with the trimming process disclosed above, and in view of possible memory limitations, a maximum number of faces, Nface,max, may be set for storing individual histograms. As such, when Nface>Nface,max, only the Nface,max most important faces are kept. Because larger faces in an image usually attract more attention, the size of the bounding boxes may be used as a measure of importance. Additionally, the detection score of the bounding boxes may be considered to avoid false detections. Therefore, the importance of each face may be defined based on its area and detection score as shown in the following equation:
where the area is normalized by W*H/Nface,max and clipped to 1, because a face that is large enough is deemed important regardless of its exact size. The term Nface,max is put in the denominator because the more faces that can be kept, the smaller the faces that can be considered. The top Nface,max faces with the highest importance are selected. An exemplary value is Nface,max=16.
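The face selection step can be sketched as below. The exact importance equation is not reproduced in this text, so one plausible form is assumed here: the bounding-box area normalized by W*H/Nface,max, clipped to 1, and weighted by the detection score.

```python
import numpy as np

def select_important_faces(boxes, scores, W, H, n_face_max=16):
    """Rank faces and keep at most n_face_max of them.

    Assumed importance formula (the original equation is not shown):
        importance = min(area * n_face_max / (W * H), 1) * score
    i.e. area normalized by W*H/n_face_max, clipped to 1, weighted by
    the detection score.  `boxes` is a list of (x0, y0, x1, y1) tuples.
    Returns the kept face indices in ascending order.
    """
    importance = []
    for (x0, y0, x1, y1), score in zip(boxes, scores):
        area = max(0, x1 - x0) * max(0, y1 - y0)
        norm_area = min(area * n_face_max / (W * H), 1.0)
        importance.append(norm_area * score)
    order = np.argsort(importance)[::-1]   # highest importance first
    return sorted(order[:n_face_max].tolist())
```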
With reference to
{tilde over (hist)}face,k = histface,k(bstartY: bstartY + ÑbinY, bstartU: bstartU + ÑbinU, bstartV: bstartV + ÑbinV)
With the generated histograms as previously disclosed, the probability of face for each bin can be defined. Generally, if a color has higher value in a histogram of face, it is more likely to be part of the face. Therefore, the initial probability of face can be estimated directly from the generic histograms of face and all pixels. However, because the histogram of face is estimated from the basic shape map, which is just an initial guess of face region, further refining of the initial probability by adapting it to the histograms locally in YUV color space may be needed. As such, iterative adaptive sorting and probability propagation based on the individual histograms of each face and the generic histogram of non-face may be implemented. Details of initial probability estimation, adaptive sorting, and probability propagation are presented through the exemplary diagrams of
where G(.) denotes 3-D Gaussian filtering (63) with standard deviation σhist, and operator ./ is element-wise division (64). To avoid dividing by zero, rface(b) may be set to 0 for each bin b where G(histall) is 0. The purpose of the Gaussian filtering is to reduce noise in the histogram. The standard deviation σhist may be set to, for example, σhist=0.25 (in bins). Scaling and thresholding (65) is then applied on the ratio to get the initial probability of face (66). The larger the ratio, the larger the probability. For each bin b, the following applies:
where r0 and r1 are thresholds on the ratio of histograms. From the above equation, it can be noticed that when rface<r0, pface,init=0. On the other hand, when rface>r1, pface,init=1. Thresholds r0 and r1 may be set, for example, to r0=0.1 and r1=0.5. Moreover, the histogram of non-face (68) may be defined as the difference (67) between the histograms: histnonface=histall−histface. As will be seen later, the histogram of non-face (68) will be used in the adaptive sorting process, which will be detailed in the next section.
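The initial probability estimation described above can be sketched as follows; the Gaussian pre-filtering of both histograms is omitted here for brevity, and the function name is illustrative rather than taken from the original.

```python
import numpy as np

def initial_face_probability(hist_face, hist_all, r0=0.1, r1=0.5):
    """Initial per-bin probability of face from the two generic histograms.

    Follows the scaling-and-thresholding rule in the text: the ratio
    r_face = hist_face / hist_all is mapped linearly from [r0, r1] to
    [0, 1] and clipped, so r_face < r0 gives 0 and r_face > r1 gives 1.
    Bins where hist_all is 0 get r_face = 0 to avoid division by zero.
    """
    r_face = np.zeros_like(hist_all, dtype=float)
    nonzero = hist_all > 0
    r_face[nonzero] = hist_face[nonzero] / hist_all[nonzero]
    return np.clip((r_face - r0) / (r1 - r0), 0.0, 1.0)
```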
where ℬ0 is the set of bins whose probabilities are to be updated to 0. In other words:
where ℬ0 is the set of the bins with the lowest probabilities. The above-disclosed method is illustrated in
Referring back to
where ℬ1 is the set of bins whose probabilities are to be updated to 1:
The updated probability from all faces (75) can be acquired by considering the updates from all faces:
In practice, only the trimmed histograms {tilde over (hist)}face,k may be available. In addition, in such trimmed histograms only an rkeep,k portion of the pixel counts in histface,k is kept. Therefore, the cumulative pixel count may instead need to reach θface/rkeep,k of the sum of {tilde over (hist)}face,k. Moreover, when θface/rkeep,k>1, the probabilities of all bins in the trimmed histogram may be set to 1. The values for parameters θnonface and θface may be decided empirically. As an example, θnonface=0.9 and θface=0.75.
The pseudocode below shows an example of how the probability from non-face can be calculated:
The pseudocode below shows an example of how the probability from face can be calculated:
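The referenced pseudocode is not reproduced in this text. A sketch of both adaptive-sorting updates, under the description above, might look like the following; details such as tie handling and whether the threshold-crossing bin is included are assumptions noted in the comments.

```python
import numpy as np

def adaptive_sort_update(p_face, hist_nonface, hist_face, r_keep=1.0,
                         theta_nonface=0.9, theta_face=0.75):
    """Adaptive-sorting update of the per-bin face probability (a sketch;
    the original pseudocode is not reproduced here).

    Bins are sorted by current probability.  Starting from the lowest-
    probability bins, probabilities are set to 0 until the accumulated
    non-face pixel count reaches theta_nonface of its total; starting
    from the highest-probability bins, probabilities are set to 1 until
    the accumulated (trimmed) face pixel count reaches
    theta_face / r_keep of the trimmed face histogram's total.
    """
    p = p_face.astype(float).ravel().copy()
    order = np.argsort(p, kind="stable")

    # Update-to-0 pass, driven by the non-face histogram.
    target0 = theta_nonface * hist_nonface.sum()
    acc = 0.0
    for b in order:                      # lowest probability first
        acc += hist_nonface.ravel()[b]
        p[b] = 0.0                       # crossing bin included (assumption)
        if acc >= target0:
            break

    # Update-to-1 pass, driven by the trimmed face histogram.
    frac = min(theta_face / r_keep, 1.0)
    target1 = frac * hist_face.sum()
    acc = 0.0
    for b in order[::-1]:                # highest probability first
        acc += hist_face.ravel()[b]
        p[b] = 1.0
        if acc >= target1:
            break

    return p.reshape(p_face.shape)
```

With multiple faces, the update-to-1 pass would be repeated per face and the results combined, as described for the "updated probability from all faces" above.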
With further reference to
To avoid division by zero, pface′ may be set to 0 at the bins where histall is 0. Moreover, because the probability is updated based on the sort index, it may undergo sharp changes between neighboring bins. As such, Gaussian filtering (78) may be performed in the 3-D bins to make the probability of face (79), pface, smooth, to avoid potential artifacts in later stages of processing. The standard deviation of the Gaussian filter, σprop, may be set, for example, to σprop=0.25.
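The zero-out and 3-D smoothing steps can be sketched as below. The separable Gaussian is implemented with plain numpy to keep the sketch dependency-free, and the truncation radius is an implementation choice not specified in the text.

```python
import numpy as np

def smooth_probability(p_face, hist_all, sigma=0.25, radius=2):
    """Zero-out invalid bins, then smooth the 3-D probability of face
    with a separable Gaussian (sigma in bins), as described in the text.
    """
    p = p_face.astype(float).copy()
    p[hist_all == 0] = 0.0               # bins with no pixels carry no evidence
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    # Apply the 1-D kernel along each of the three bin axes in turn.
    for axis in range(p.ndim):
        p = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="same"), axis, p)
    return p
```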
With continued reference to
With reference to
With further reference to
Referring back to
The output of guided image filter (93) may be clipped between [0,1] because the guided image filter (93) is based on ridge regression and may create noise due to outliers. Also, the probability map of ROI may be applied so that the face region is inside ROI, i.e. Mface(i)≤MROI(i)∀i.
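The clipping and ROI constraint described above amount to a simple post-processing step; a minimal sketch (with assumed names) is:

```python
import numpy as np

def postprocess_face_map(m_face, m_roi):
    """Clip the guided-filter output to [0, 1] (outliers from the ridge
    regression can push values outside this range) and constrain the
    face region to lie inside the ROI, i.e. M_face(i) <= M_ROI(i) for all i.
    """
    return np.minimum(np.clip(m_face, 0.0, 1.0), m_roi)
```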
Referring back to
The soft morphological operation (901) of
Parameters controlling the soft morphological operation (901) include σmorph, the standard deviation for Gaussian filtering (95), and amorph, the scaling factor that decides whether to expand the face region or not. Operator .* is element-wise multiplication. From the above definition, it can be seen that each pixel is multiplied by the weighted average G(Mface) of its surrounding pixels. As part of the scaling and thresholding (97) step, for a pixel at which Mface>0, if G(Mface)>1/amorph, the pixel value will be amplified after the operation. On the other hand, if G(Mface)<1/amorph, the pixel value will be decreased after the operation. In other words, a pixel is preserved only if its surroundings have high values. Additionally, the operation may be repeated for nsoftmorph iterations to gradually refine the probability map (92), as shown in the following:
where the soft morphological operation is repeated nsoftmorph times. Also, the probability map of the ROI may be applied so that the face region is inside the ROI, i.e. Mface(i)≤MROI(i)∀i. Parameters σmorph, amorph, and nsoftmorph may be set as, for example, σmorph=25, amorph=3, and nsoftmorph=2.
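One reading of the soft morphological operation is sketched below: each pixel is scaled by amorph times the weighted average of its neighborhood, then clipped to [0, 1], so isolated pixels are suppressed while pixels with high-valued surroundings survive. A 3×3 box average stands in for the Gaussian filter to keep the sketch dependency-free; all names are illustrative.

```python
import numpy as np

def soft_morph(m_face, m_roi, a_morph=3.0, n_iter=2):
    """Sketch of the iterated soft morphological operation: each pixel is
    multiplied by a_morph times its neighborhood average and clipped to
    [0, 1]; after n_iter iterations the result is constrained to the ROI.
    """
    m = m_face.astype(float).copy()
    for _ in range(n_iter):
        # 3x3 box average as a stand-in for Gaussian filtering G(M_face).
        padded = np.pad(m, 1, mode="edge")
        avg = np.zeros_like(m)
        for dy in range(3):
            for dx in range(3):
                avg += padded[dy:dy + m.shape[0], dx:dx + m.shape[1]]
        avg /= 9.0
        # Pixels with avg > 1/a_morph are amplified, others decay.
        m = np.clip(a_morph * m * avg, 0.0, 1.0)
    return np.minimum(m, m_roi)
```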
B. Local Reshaping with Face Adjustment
When local reshaping is performed, different reshaping functions can be applied on different pixels locally. The reshaping functions can control and enhance image properties such as contrast, saturation, or other visual features; see, e.g., the above-mentioned U.S. Prov. App. Ser. No. 63/086,699, incorporated herein by reference in its entirety. For most image contents, higher contrast and saturation provide a better viewing experience. However, for the faces in images, higher contrast and saturation are not always better. Viewers may not want details, such as wrinkles or spots, on faces to be enhanced. Moreover, less saturated faces may be preferred over faces with over-saturated skin color, which look unnatural, i.e. with a changed skin tone. Local reshaping with face adjustment in accordance with the teachings of the present disclosure can be applied to address this problem. With reference to
With further reference to
where siY, siU, siV, viY, viU, and viV are the i-th pixel in SY, SU, SV, VY, VU and VV, respectively. B, MMRU, and MMRV are the families of reshaping functions for the Y, U, and V channels, respectively, and LiY, LiU, and LiV are the corresponding indices of the selected reshaping functions for the i-th pixel. For simplicity, the indices for all pixels are denoted as index maps LY, LU and LV. Therefore, given an input image and corresponding index maps, the local reshaping operation for each pixel can be performed accordingly.
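The per-pixel selection of a reshaping function via an index map can be sketched as below for the luma channel. The family of reshaping functions is modeled here, as an assumption, as an array of 1-D lookup tables; names are illustrative.

```python
import numpy as np

def local_reshape_luma(s_y, index_map, lut_family):
    """Apply a per-pixel-selected reshaping function to the luma channel.

    lut_family is assumed to be an array of shape (n_funcs, n_codewords)
    of 1-D lookup tables; index_map selects which LUT each pixel uses,
    mirroring v_i^Y = B_{L_i^Y}(s_i^Y) in the text.  Input s_y is
    assumed normalized to [0, 1].
    """
    n_funcs, n_codes = lut_family.shape
    # Quantize each normalized luma value to a codeword index.
    codes = np.clip((s_y * (n_codes - 1)).astype(int), 0, n_codes - 1)
    # Fancy indexing: pick lut_family[index_map[i], codes[i]] per pixel.
    return lut_family[index_map, codes]
```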
With carefully designed families of reshaping functions, the brightness, contrast, saturation, or other visual features in the reshaped images can be changed by adjusting the index maps. For example, as described, e.g. in the above-mentioned U.S. Prov. App. Ser. 63/086,699 incorporated herein by reference in its entirety, the local detail and contrast enhancement can be achieved by using:
Or equivalently
where {tilde over (S)}Y is the Y channel of the normalized input image in the range of, for example, [0,1], and {tilde over (S)}Y,(l) is the corresponding edge-preserving filtered image. α is the map of enhancement strength for each pixel: the larger the α, the stronger the enhancement. ƒSL(.) is a pixelwise non-linear function to further adjust the enhancement based on pixel brightness. L(g) is a constant global index for the whole image, which controls the overall look, such as brightness and saturation, of the reshaped images. Moreover, when α=0, all the pixels use the same reshaping function; this is called global reshaping, which means no local contrast and detail enhancement. As an example, 4096 reshaping functions in the family of reshaping functions can be considered for each channel. The parameter α may use a default setting such as α=3.8*c1 for all pixels, where c1 is a model parameter that can be set as, for example, c1=2687.1.
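The index-map construction for local detail and contrast enhancement can be sketched as below. The pixelwise adjustment ƒSL(.) is omitted (treated as identity) in this sketch, and the clipping to the family size is an assumed implementation detail.

```python
import numpy as np

def contrast_index_map(s_y, s_y_filt, alpha, l_global, n_funcs=4096):
    """Index-map construction for local detail/contrast enhancement,
    following the description: L^Y = L^(g) + alpha * (S~Y - S~Y,(l)),
    where S~Y,(l) is an edge-preserving filtered copy of the luma and
    L^(g) is the constant global index.  f_SL(.) is omitted here, and
    indices are rounded and clipped to the family size.
    """
    l_y = l_global + alpha * (s_y - s_y_filt)
    return np.clip(np.rint(l_y), 0, n_funcs - 1).astype(int)
```

With alpha = 0 every pixel receives the global index L^(g), which matches the global-reshaping case described above.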
With continued reference to
In some applications, enhancing the details, such as wrinkles or spots, on faces in the same way as other image contents may not be desired. As such, there may be a need to reduce the enhancement strength in the face region when performing detail and contrast enhancement. The adjusted index map LY may be defined as:
where rface is the face contrast reduction ratio. It can be seen that for pixel i, if Mface(i)=1, ΔLface,c(i) becomes −rface(i)*α(i)*({tilde over (S)}Y(i)−{tilde over (S)}Y,(l)(i)) and the term ΔL(l)(i)+ΔLface,c(i) in Equation (22) can be written as (1−rface(i))*α(i)*({tilde over (S)}Y(i)−{tilde over (S)}Y,(l)(i)). By comparing with Equations (20) and (21), the enhancement strength drops from α(i) to (1−rface(i))*α(i). Therefore, ΔLface,c reduces the contrast on faces for 0<rface≤1. When rface=0, there is no adjustment. When rface=1, the enhancement strength on the face becomes 0. Empirically, if the enhancement strength on a face is 0, the face may look over-smoothed compared to the surrounding image contents, which are enhanced at the original strength. As an example, rface may be set as rface=0.5.
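The face contrast adjustment term can be sketched as follows; scaling the reduction by the face probability Mface (so the adjustment fades smoothly at the face boundary) is the natural per-pixel form implied by the description, and the names are illustrative.

```python
import numpy as np

def face_contrast_adjust(s_y, s_y_filt, alpha, m_face, r_face=0.5):
    """Face contrast adjustment term as described: for a pixel fully
    inside the face (M_face = 1) the enhancement strength drops from
    alpha to (1 - r_face) * alpha.  Assumed per-pixel form:
        dL_face_c = -r_face * M_face * alpha * (S~Y - S~Y,(l)).
    """
    return -r_face * m_face * alpha * (s_y - s_y_filt)
```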
In general, increasing the color saturation in images improves the viewing experience. However, when it comes to the faces in images, increasing the color saturation in the same way as for other image contents may be undesired. Over-saturated skin color will make the faces look unnatural or unhealthy. With reference to
As described in U.S. Prov. App. Ser. 63/086,699 incorporated herein by reference in its entirety, in general, the smaller the index of a reshaping function, the less saturated the reshaped image. In addition, the darker the input pixel, the more sensitive the reshaped pixel to the index.
In view of the above, based on the acquired LY as disclosed in the previous section, the adjusted index maps LU and LV can be further defined as:
In Equation (23), dface is the face desaturation offset and θsat is the threshold that controls the desaturation. Therefore, ΔLface,s reduces the saturation on faces when dface>0 and θsat>0. The larger the dface, the stronger the desaturation. When dface=0, there is no desaturation. Empirically, parameters dface and θsat may be set as, for example, dface=1024 and θsat=0.5.
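Equation (23) itself is not reproduced in this text; as one plausible reading of the description, sketched below, the offset dface is applied to the chroma index map only where the face probability exceeds the threshold θsat. All names and the exact gating form are assumptions.

```python
import numpy as np

def face_desaturation_offset(l_y, m_face, d_face=1024, theta_sat=0.5):
    """Sketch of the face desaturation adjustment for a chroma index map.

    Assumed reading of the description (Equation (23) is not shown):
        L^U = L^Y + dL_face_s,
        dL_face_s = -d_face * M_face * [M_face > theta_sat]
    so larger d_face means stronger desaturation, and d_face = 0 leaves
    the saturation unchanged.
    """
    delta = -d_face * m_face * (m_face > theta_sat)
    return l_y + delta
```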
A number of embodiments of the disclosure have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the present disclosure. Accordingly, the invention may be embodied in any of the forms described herein, including, but not limited to, the following Enumerated Example Embodiments (EEEs), which describe the structure, features, and functionality of some portions of the present invention:
The present disclosure is directed to certain implementations for the purposes of describing some innovative aspects described herein, as well as examples of contexts in which these innovative aspects may be implemented. However, the teachings herein can be applied in various different ways. Moreover, the described embodiments may be implemented in a variety of hardware, software, firmware, etc. For example, aspects of the present application may be embodied, at least in part, in an apparatus, a system that includes more than one device, a method, a computer program product, etc. Accordingly, aspects of the present application may take the form of a hardware embodiment, a software embodiment (including firmware, resident software, microcodes, etc.) and/or an embodiment combining both software and hardware aspects. Such embodiments may be referred to herein as a “circuit,” a “module”, a “device”, an “apparatus” or “engine.” Some aspects of the present application may take the form of a computer program product embodied in one or more non-transitory media having computer readable program code embodied thereon. Such non-transitory media may, for example, include a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. Accordingly, the teachings of this disclosure are not intended to be limited to the implementations shown in the figures and/or described herein, but instead have wide applicability.
The examples set forth above are provided to those of ordinary skill in the art as a complete disclosure and description of how to make and use the embodiments of the disclosure, and are not intended to limit the scope of what the inventor/inventors regard as their disclosure.
Modifications of the above-described modes for carrying out the methods and systems herein disclosed that are obvious to persons of skill in the art are intended to be within the scope of the following claims. All patents and publications mentioned in the specification are indicative of the levels of skill of those skilled in the art to which the disclosure pertains. All references cited in this disclosure are incorporated by reference to the same extent as if each reference had been incorporated by reference in its entirety individually.
It is to be understood that the disclosure is not limited to particular methods or systems, which can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the content clearly dictates otherwise. The term “plurality” includes two or more referents unless the content clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the disclosure pertains.
Number | Date | Country | Kind |
---|---|---|---|
21188517.3 | Jul 2021 | EP | regional |
This application claims the benefit of priority from U.S. Provisional patent application Ser. No. 63/226,938, filed on 29 Jul. 2021, and European patent application No. 21188517.3, filed on 29 Jul. 2021, both of which are hereby incorporated by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/038249 | 7/25/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63226938 | Jul 2021 | US |