The present disclosure relates to the field of image processing, and for example, to a method for training an adaptive rigid prior model, a method for tracking faces, an electronic device, and a storage medium.
Facial pose estimation is a significant topic in tracking faces in videos. Facial poses and facial expressions need to be tracked in real-time face tracking, and the real-time face tracking includes calculating a rigid transformation of a three-dimension face model, for example, rotation, translation, scaling, and the like. Stable facial pose estimation can be used to achieve a vision enhancement effect, for example, trying on hats and glasses, adding beards and tattoos, and the like, and is also applicable to driving virtual characters to change expressions.
Hereinafter is a summary of the subject matter of the present disclosure described in detail, and the summary is not intended to limit the protection scope of the claims.
Embodiments of the present disclosure provide a method for training an adaptive rigid prior model, a method for tracking faces, an electronic device, and a storage medium.
According to some embodiments of the present disclosure, a method for training an adaptive rigid prior model is provided. The method includes:
According to some embodiments of the present disclosure, a method for tracking faces is provided. The method includes:
According to some embodiments of the present disclosure, an electronic device is provided. The electronic device includes:
According to some embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided. The computer-readable storage medium stores one or more computer programs, wherein the one or more programs, when loaded and run by a processor, cause the processor to perform the method for training the adaptive rigid prior model and/or the method for tracking the faces according to any one of the embodiments of the present disclosure.
It should be noted that the examples and embodiments described herein are merely used to explain the present disclosure, and are not intended to limit the present disclosure. It should be further noted that the accompanying drawings merely show structures related to the present disclosure rather than all structures. The embodiments of the present disclosure and the features in the embodiments can be combined with each other without conflict.
In real-time face tracking, the facial pose is estimated based on a motion of the face in the video. However, the motion of the face in the video is not solely determined by the facial pose. Even in the case that the facial pose is fixed, different expressions of the user may cause motion of the face in the video. Thus, to acquire a stable estimation of the facial pose, the effect of the facial expression should be effectively reduced in optimizing the facial pose, that is, the rigidity should be stabilized.
In some practices, to improve calculation efficiency, rigidity stabilization of face tracking is achieved using a prior in a real-time rigidity stabilization method. Common priors are achieved by two relatively simple methods: a heuristic prior for a specific facial region, and a dynamic rigid prior. For the heuristic prior, a fixed weight is used in different facial regions, but the heuristic prior is not applicable to all expressions. For the dynamic rigid prior, the weight is dynamically estimated by estimating displacements of each facial region, and thus, the dynamic rigid prior is applicable to different expressions. However, a trained dynamic rigid prior model is merely applicable to a face at a specific size, and is not adaptive to changes of the facial sizes in the video. Thus, the current rigid prior model fails to adapt to all expressions and different facial sizes when estimating the weights of a plurality of facial regions, resulting in unsatisfactory rigidity stabilization of facial pose estimation and poor accuracy of tracking the faces.
In S101, model parameters of the adaptive rigid prior model are initialized, wherein the model parameters are functions with facial sizes as independent variables, and the adaptive rigid prior model is configured to output weights of a plurality of facial regions of faces at different facial sizes.
In the embodiments of the present disclosure, the face is divided into a plurality of facial regions. In the case that the face evinces an expression, displacements of the plurality of facial regions of the face relative to a neutral face are generated. The neutral face is a face of the user without facial expression, and the neutral face is generated for each user. As a motion of each of the plurality of facial regions affects the face rigid transformation, for a stable rigid transformation, it is desirable in facial pose estimation that a facial region with a greater displacement has a smaller weight in the rigid transformation, and a facial region with a smaller displacement has a greater weight in the rigid transformation.
In the embodiments of the present disclosure, the adaptive rigid prior model is configured to output the weights of the plurality of facial regions of the faces at different facial sizes, and the model parameters are the functions with the facial sizes as the independent variables.
In S102, a plurality of frames of facial data of a same face are acquired by face tracking on training video data using the adaptive rigid prior model.
In the embodiments of the present disclosure, the training video data is video data of a user. Among the faces of the user in the video data, more than half are faces at different sizes without facial expression, and the other faces are faces of the user at different sizes with different facial expressions. The face tracking is a process of extracting facial key points of the same user from the training video data, establishing three-dimension faces, rigidly transforming the three-dimension faces, and acquiring optimal rigid transformation parameters and optimal facial expression parameters.
In some embodiments, in face tracking on the training video data, the facial key points Qƒ and the facial expression parameters δƒ of the faces in the plurality of frames of training video data are extracted. The facial key points Qƒ are two-dimensional points of face key sites. The facial key points are extracted by a pre-trained facial key point detection model, the facial sizes of the faces are determined based on the facial key points of the faces, and the displacements of the plurality of facial regions of the faces relative to the neutral face of the faces are calculated by multiplying the facial expression parameters δƒ by a predetermined blend shape deformer Bexp. The predetermined blend shape deformer Bexp is a tool for producing facial expression animations, and is capable of achieving a smooth and high-precision deformation effect of basic objects by target shape objects. In the embodiments of the present disclosure, the blend shape deformer Bexp is predetermined for a plurality of users, and the blend shape deformer predetermined for a user is acquired in the case that the video data of the user is determined as the training video data.
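The displacement computation described above can be sketched as follows. The array layout of the blend shape deformer (one 3-D offset per vertex per expression basis), the region-to-vertex mapping, and the use of a per-region mean norm are illustrative assumptions, not the disclosure's exact definitions:

```python
import numpy as np

def region_displacements(b_exp, delta, region_vertex_ids):
    """Per-region displacement magnitudes relative to the neutral face.

    b_exp:  (V, 3, E) blend shape deformer, one 3-D offset per vertex
            per expression basis (hypothetical layout).
    delta:  (E,) facial expression parameters.
    region_vertex_ids: dict mapping region id -> list of vertex indices.
    """
    offsets = b_exp @ delta                   # (V, 3) vertex offsets B_exp * delta
    norms = np.linalg.norm(offsets, axis=1)   # per-vertex displacement magnitude
    return {k: float(norms[ids].mean())       # average over each region
            for k, ids in region_vertex_ids.items()}

# toy example: 4 vertices, 2 expression bases, 2 regions
b_exp = np.zeros((4, 3, 2))
b_exp[0, 0, 0] = 1.0                          # basis 0 moves vertex 0 along x
delta = np.array([0.5, 0.0])
d = region_displacements(b_exp, delta, {1: [0, 1], 2: [2, 3]})
```

With the toy data, only vertex 0 is displaced, so region 1 acquires a non-zero displacement while region 2 stays at zero.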
In the embodiments of the present disclosure, in the case that the face tracking is performed on the training video, the facial sizes of the faces in the plurality of frames of the video data are acquired, and the displacements of the plurality of facial regions relative to the neutral face are acquired by optimizing rigid transformation parameters and facial expression parameters and are determined as the facial data.
In S103, the model parameters are updated based on the plurality of frames of facial data.
In the embodiments of the present disclosure, the facial data includes the facial sizes and the displacements of the plurality of facial regions of the faces relative to the neutral face. The faces at different facial sizes extracted from the training video data are organized into a plurality of face containers based on the facial sizes. For each of the plurality of face containers, an average value of the facial sizes of the faces in the face container is calculated, and in the case that the facial sizes are the average values, data points of the model parameters of the adaptive rigid prior model are acquired by analyzing the displacements of the plurality of facial regions of the faces in the plurality of face containers. Then, the data points of the model parameters corresponding to different average values are interpolated, so as to acquire the model parameters, and the model parameters are taken as new model parameters of the adaptive rigid prior model. The model parameters are functions of the facial sizes, that is, each model parameter is a spline curve in which the facial size is the independent variable and the model parameter is the dependent variable.
In S104, whether a condition of stopping updating the model parameters is satisfied is determined.
In the embodiments of the present disclosure, the condition of stopping updating the model parameters is that difference values of the model parameters acquired by two adjacent updates are less than a first predetermined threshold, or that the number of updates of the model parameters reaches a preset number. S105 is performed in the case that the difference values of the model parameters acquired by two adjacent updates are less than the first predetermined threshold. In the case that the difference values of the model parameters acquired by two adjacent updates are greater than or equal to the first predetermined threshold, the process returns to S102, that is, the plurality of frames of facial data of the same face are acquired by face tracking on the training video data using the adaptive rigid prior model with the updated model parameters, and the model parameters are re-updated based on the facial data. In some embodiments, S105 is performed in the case that the number of updates of the model parameters reaches the preset number. In the case that the number of updates of the model parameters does not reach the preset number, the process returns to S102, that is, the plurality of frames of facial data of the same face are acquired by face tracking on the training video data using the adaptive rigid prior model with the updated model parameters, and the model parameters are re-updated based on the facial data.
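The stopping condition can be sketched as a small helper; the threshold value is illustrative, and representing the model parameters as a flat list of numbers is an assumption:

```python
def should_stop(prev_params, new_params, max_updates, update_count, eps=1e-4):
    """Stop when every parameter change between two adjacent updates is
    below eps (first predetermined threshold, illustrative value), or
    when the number of updates reaches the preset number max_updates."""
    converged = all(abs(a - b) < eps for a, b in zip(prev_params, new_params))
    return converged or update_count >= max_updates
```

The caller would loop back to the face-tracking step (S102) whenever this returns `False`.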
In S105, updating of the model parameters of the adaptive rigid prior model is stopped in response to the condition of stopping updating the model parameters being satisfied, and a final adaptive rigid prior model is acquired.
In the case that the condition of stopping updating the model parameters is satisfied, the adaptive rigid prior model with the finally updated model parameters is a trained model, and the trained adaptive rigid prior model is deployed on a client, a server, or the like. In face tracking, the adaptive rigid prior model is capable of assigning different weights to different regions based on real-time facial sizes of the faces and the displacements of the plurality of facial regions of the faces.
In the embodiments of the present disclosure, the model parameters of the adaptive rigid prior model are functions with the facial sizes as independent variables, and the adaptive rigid prior model is configured to output the weights of the plurality of facial regions of the faces at different facial sizes. In the case that the model parameters are initialized, the plurality of frames of facial data of the same face are acquired by face tracking on the training video data using the adaptive rigid prior model, the model parameters are updated based on the plurality of frames of facial data, updating of the model parameters of the adaptive rigid prior model is stopped in response to the condition of stopping updating the model parameters being satisfied, and the final adaptive rigid prior model is acquired. In applying the trained adaptive rigid prior model to face tracking, the adaptive rigid prior model is capable of assigning different weights to different regions based on real-time facial sizes of the faces and the displacements of the plurality of facial regions of the faces, such that rigidity stabilization of face tracking is improved, and a robust result of face tracking is achieved.
In S201, model parameters of the adaptive rigid prior model are initialized, wherein the model parameters are functions with facial sizes as independent variables, and the adaptive rigid prior model is configured to output weights of a plurality of facial regions of faces at different facial sizes.
In the embodiments of the present disclosure, the adaptive rigid prior model is defined as formula (1), wherein s represents the facial size of the face, wk(s) represents a weight of a kth facial region in the face with the facial size being s, αk(s), βk(s), and γk(s) represent the model parameters, αk(s), βk(s), and γk(s) are functions of the two-dimensional facial size s, and dk represents the displacement of the kth facial region.
In S202, facial key points and facial expression parameters of faces in a plurality of frames of the training video data are extracted by face tracking on the training video data.
In the embodiments of the present disclosure, for each frame of video data f in the training video data F, a face in each frame of video data f is determined based on the following parameters (Pƒ, δƒ, Qƒ), wherein Pƒ represents a rigid transformation parameter, δƒ represents a facial expression parameter, and Qƒ represents a facial key point.
The rigid transformation parameter Pƒ and the facial expression parameter δƒ are unknown parameters, and the facial key point Qƒ is a known parameter and is acquired by detecting the facial key points in face tracking.
In S203, the facial sizes of the faces are determined based on the facial key points.
In the case that the facial key points are determined, a face frame is determined based on the facial key points, and a length of a diagonal line of the face frame is used as the facial size of the face in each frame of video data ƒ.
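The facial-size computation (the diagonal of the face frame bounding the key points) can be sketched as follows; treating the face frame as the axis-aligned bounding box of the key points is an assumption:

```python
import numpy as np

def facial_size(keypoints):
    """Facial size = length of the diagonal of the face frame bounding
    the 2-D facial key points (keypoints: (N, 2) array)."""
    mins = keypoints.min(axis=0)   # lower-left corner of the face frame
    maxs = keypoints.max(axis=0)   # upper-right corner of the face frame
    return float(np.linalg.norm(maxs - mins))

pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [2.0, 1.0]])
s = facial_size(pts)   # bounding box is 3 x 4, so the diagonal is 5
```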
In S204, displacements of the plurality of facial regions of the faces relative to a neutral face of the faces are calculated based on the adaptive rigid prior model, the facial expression parameters, and a predetermined blend shape deformer.
In the embodiments of the present disclosure, three-dimension faces are constructed based on the neutral face, the facial expression parameters, and the predetermined blend shape deformer, and the three-dimension face models are rigidly transformed. A rigid transformation optimization function is constructed based on the rigid transformation, the facial key points, and the weights calculated by the adaptive rigid prior model. In the case that optimal rigid transformation parameters are acquired by solving the rigid transformation optimization function, optimal facial expression parameters are acquired based on the optimal rigid transformation parameters, and the displacements of the plurality of facial regions of the faces are acquired based on the optimal facial expression parameters.
In the embodiments of the present disclosure, one three-dimension face Fƒ is determined based on each of the facial expression parameters:

Fƒ = Buser + Bexpδƒ

Buser represents the neutral face of the user without expression, and the face of the user without expression is pre-acquired, that is, the neutral face. Bexp represents the blend shape deformer of the user. That is, the three-dimension face is acquired by adding the facial displacement Bexpδƒ of the face in the video data f, that is, a plurality of expressions, to the neutral face of the user.
In face tracking, the optimal result is acquired by solving the following optimization function:

min over Pƒ and δƒ of Σi wki‖Π(Pƒ(Fƒ))i − Qƒ,i‖²

Pƒ(·) represents the rigid transformation on the three-dimension face Fƒ, and Π(·) represents a two-dimensional projection (an orthogonal projection or a perspective projection) on the three-dimension face. The above optimization function is solved using a coordinate descent:

Pƒ,j = argmin over Pƒ of Σi wki‖Π(Pƒ(Buser + Bexpδƒ,j−1))i − Qƒ,i‖²  (2)

δƒ,j = argmin over δƒ of Σi wki‖Π(Pƒ,j(Buser + Bexpδƒ))i − Qƒ,i‖²  (3)

The function (2) is the rigid transformation optimization function, wherein wki represents a weight of the facial region k to which the facial key point i of the face in the training video data ƒ belongs, and j represents an iteration number. With the initial value δƒ,0 = δƒ-1, the rigid transformation optimization function (2) and the expression optimization function (3) are alternately and iteratively solved. The optimal facial expression parameter δƒ is acquired in the case that the expression optimization function (3) converges, and the optimal facial expression parameter δƒ is the expression parameter of the last iteration. On this basis, the facial displacement Bexpδƒ of the face in the video data f is calculated, and the displacements of the plurality of facial regions relative to the neutral face are acquired based on the division of the plurality of facial regions shown in the drawings.
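The coordinate descent alternation, fixing the expression to solve for the pose and then fixing the pose to solve for the expression, can be sketched on a toy problem. The two solver callbacks stand in for the real pose and expression optimizations, and the toy quadratic objective is purely illustrative:

```python
def coordinate_descent(solve_pose, solve_expr, expr0, iters=20):
    """Alternating scheme: fix the expression and solve for the rigid
    transformation, then fix the pose and solve for the expression
    (sketch; the real solvers minimize the weighted key-point
    reprojection error)."""
    expr = expr0
    for _ in range(iters):
        pose = solve_pose(expr)   # pose step, cf. function (2)
        expr = solve_expr(pose)   # expression step, cf. function (3)
    return pose, expr

# toy objective: minimize (pose - 2*expr)**2 + (expr - 1)**2,
# whose exact coordinate-wise minimizers are the two lambdas below
pose, expr = coordinate_descent(
    solve_pose=lambda e: 2 * e,              # argmin over pose
    solve_expr=lambda p: (2 * p + 1) / 5,    # argmin over expr
    expr0=0.0)
```

The iterates converge geometrically toward the joint minimizer (pose, expr) = (2, 1).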
In S205, the faces are organized into a plurality of face containers based on the facial sizes.
In the embodiments of the present disclosure, in the case that the face tracking is performed on the training video data, the following facial data is extracted from each frame of video data f∈F: (sf, {dfk}k∈K), wherein sf represents the facial size of the face in the video data f, and dfk, k∈K, represents the displacement of the kth facial region of the face in the video data f relative to the neutral face.
In the embodiments of the present disclosure, a histogram analysis is performed on all facial sizes {sf}f∈F of the faces, and the faces extracted from the video data are organized into n face containers Ci. In some embodiments, a maximum facial size and a minimum facial size are first determined, a size range from the minimum facial size to the maximum facial size is evenly divided into a plurality of size ranges, and the plurality of size ranges are used as facial size ranges of the plurality of face containers. Target facial size ranges within which the facial sizes of the faces fall are determined, and the faces are organized into the face containers corresponding to the target facial size ranges.
In some embodiments, the faces are organized into the n face containers Ci based on the following formula:

Ci = {f∈F | smin + (i − 1)·(smax − smin)/n ≤ sf < smin + i·(smax − smin)/n}, i = 1, 2, ..., n

smin = minƒ∈F{sƒ}, that is, the minimum facial size, and smax = maxƒ∈F{sƒ}, that is, the maximum facial size.
A middle value si of the facial sizes in the face container Ci is:

si = smin + (i − 1/2)·(smax − smin)/n
In the case that the number of faces in the face container Ci is less than a threshold, for example, in the case that the number of the faces in the face container Ci is less than 10,000, the face container is discarded. In the case that the histogram analysis is performed on the facial sizes:

F = ∪i Fi, and Fi ∩ Fj = ∅ for any i ≠ j

That is, all faces F are a union of the faces Fi in the plurality of face containers Ci, and an intersection of the faces Fi and the faces Fj in any two face containers Ci and Cj is null.
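The even division of the range from the minimum to the maximum facial size into n face containers can be sketched with NumPy; clipping the maximum size into the last container (so the containers still partition all faces) is an implementation choice:

```python
import numpy as np

def bin_faces(sizes, n):
    """Organize faces into n containers by evenly dividing
    [s_min, s_max]; returns the container index of each face and the
    container edges (sketch)."""
    s_min, s_max = sizes.min(), sizes.max()
    edges = np.linspace(s_min, s_max, n + 1)
    # right-most edge made inclusive so s_max lands in the last container
    idx = np.clip(np.digitize(sizes, edges) - 1, 0, n - 1)
    return idx, edges

sizes = np.array([100.0, 120.0, 180.0, 199.0, 200.0])
idx, edges = bin_faces(sizes, 4)   # containers [100,125), [125,150), ...
```

Every face falls into exactly one container, matching the union/empty-intersection property stated above.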
In S206, average values of the facial sizes of the faces in the plurality of face containers are calculated.
In some embodiments, for the faces in each of the plurality of face containers Ci, si = meanƒ∈Fi{sƒ} is calculated, that is, the average value si of the facial sizes of the faces in each of the plurality of face containers Ci is calculated.
In S207, in the case that the facial sizes are the average values, data points of the model parameters of the adaptive rigid prior model are acquired by analyzing the displacements of the plurality of facial regions of the faces in the plurality of face containers.
In some embodiments, S207 includes following sub-steps.
In S2071, for the plurality of facial regions of the faces in the plurality of face containers, maximum displacements of the plurality of facial regions are determined based on the facial data of the faces.
For the plurality of faces in the plurality of face containers Ci, each face includes a plurality of facial regions. For each facial region k, the maximum displacement of the facial region k is determined as:

Dik = maxf∈Fi{dfk}

In the above formula, Fi represents the faces in the face container Ci, and Dik represents the maximum displacement of the facial region k in the face container Ci.
In S2072, the displacements of the plurality of facial regions are organized into a plurality of displacement containers.

In the embodiments of the present disclosure, for each facial region, a histogram analysis is performed on the displacements of the facial region, and the displacements of the facial region are organized into m displacement containers. For the organization process, reference may be made to the process of performing the histogram analysis on the facial sizes in S205.
In S2073, a displacement container containing a greatest number of displacements in the plurality of displacement containers is determined as a maximum displacement container.
In some embodiments, numbers of the displacements in the plurality of displacement containers are counted, and the displacement container containing a greatest number of displacements is determined as the maximum displacement container.
In S2074, a middle displacement of the plurality of displacements in the maximum displacement container is determined as a data point of a first model parameter.
In some embodiments, the plurality of displacements in the maximum displacement container are ranked in descending order or in ascending order, the displacement in the middle is determined as the data point of the first model parameter with the facial size being the average value, and thus one data point αk(si), i∈I, of the first model parameter is acquired, wherein I = {1, 2, ..., n}, and n is the number of the face containers.
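Steps S2072 to S2074 (histogram the displacements of one facial region, pick the fullest container, take its middle displacement) can be sketched as follows; using the median as the "middle" displacement of the modal container is an assumption:

```python
import numpy as np

def middle_of_modal_bin(displacements, m):
    """Histogram the displacements of one facial region into m
    containers, take the container with the most samples (the maximum
    displacement container), and return the middle displacement in it."""
    counts, edges = np.histogram(displacements, bins=m)
    b = int(np.argmax(counts))                # fullest displacement container
    lo, hi = edges[b], edges[b + 1]
    inside = displacements[(displacements >= lo) & (displacements <= hi)]
    return float(np.median(inside))           # data point of the first parameter

d = np.array([0.1, 0.2, 0.21, 0.22, 0.23, 0.9])
alpha_point = middle_of_modal_bin(d, 4)   # modal bin holds the cluster near 0.2
```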
In S2075, a data point of a second model parameter and a data point of a third model parameter are calculated based on the maximum displacements of the plurality of facial regions, the middle displacement, a predetermined maximum weight, a predetermined minimum weight, and the data point of the first model parameter.
In the case that the facial sizes are the average values, the data points of the model parameters of the adaptive rigid prior model include: the data point of the first model parameter, the data point of the second model parameter, and the data point of the third model parameter.
As shown in the drawings, the face is divided into a plurality of facial regions, and the predetermined maximum weight wkmax and the predetermined minimum weight wkmin are set for each facial region k based on the weight range of the facial region.
In addition, as the facial region 4 is the nose of the face, the displacement of the nose is the smallest, and the nose is the most stable in the case that the user evinces an expression, a greater weight is assigned to the facial region 4. Similarly, the facial region 3 and the facial region 5 include profile points of the face and are relatively stable, and thus, a greater weight is assigned to the facial region 3 and the facial region 5. The facial region 1 and the facial region 2 are eye portions, the facial region 6 and the facial region 7 are mouth portions, and the displacements of the eye portions and the mouth portions are greater in the case that the user evinces an expression, which may cause an unstable rigid transformation. Thus, a smaller weight is assigned to the facial region 1, the facial region 2, the facial region 6, and the facial region 7. The weight ranges of the plurality of facial regions are set as in the following table:
In practice, a person skilled in the art is capable of dividing the face into the facial regions and setting the corresponding weight ranges in other ways, and the means of dividing the facial regions and setting the weight ranges are not limited in the embodiments of the present disclosure.
In the embodiments of the present disclosure, the data point of the second model parameter and the data point of the third model parameter are calculated by formulas based on the maximum displacements of the plurality of facial regions, the middle displacement, the predetermined maximum weight, the predetermined minimum weight, and the data point of the first model parameter. In the formulas, βk(si) represents the data point of the second model parameter with the facial sizes being the average values, γk(si) represents the data point of the third model parameter with the facial sizes being the average values, wkmax represents the maximum weight of the facial region k, and wkmin represents the minimum weight of the facial region k.
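The disclosure's exact formulas for the second and third model parameters are not reproduced in this text. The sketch below assumes one plausible linear form in which the weight falls from the predetermined maximum weight at the middle displacement to the predetermined minimum weight at the maximum displacement, so that the second parameter is the slope and the third is the intercept of that segment; this is an illustrative assumption only:

```python
def beta_gamma_points(alpha, d_max, w_max, w_min):
    """Hypothetical linear segment w(d) = beta * d + gamma that equals
    w_max at d = alpha (middle displacement) and w_min at d = d_max
    (maximum displacement). Assumed form, not the patent's formulas."""
    beta = (w_min - w_max) / (d_max - alpha)   # slope of the segment
    gamma = w_max - beta * alpha               # intercept of the segment
    return beta, gamma

beta, gamma = beta_gamma_points(alpha=0.2, d_max=1.0, w_max=1.0, w_min=0.2)
```

Under this assumption, the segment reproduces the two anchor weights: beta*alpha + gamma equals w_max, and beta*d_max + gamma equals w_min.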
In S208, the model parameters are acquired by interpolating the data points of the model parameters corresponding to different average values, and the model parameters are determined as new model parameters of the adaptive rigid prior model, wherein the model parameters are functions of the facial sizes.
In some embodiments, the first model parameter, the second model parameter, and the third model parameter are acquired by spline interpolating the data point of the first model parameter, the data point of the second model parameter, and the data point of the third model parameter in the case that the facial sizes are the average values, and the first model parameter, the second model parameter, and the third model parameter are determined as the new model parameters of the adaptive rigid prior model. The first model parameter, the second model parameter, and the third model parameter are functions of the facial sizes. In some embodiments, a first-order spline interpolation is performed on the data points of the plurality of model parameters at different facial sizes, that is, the data points of the plurality of model parameters are connected by straight segments, so as to acquire the plurality of model parameters as functions of the facial sizes.
As an average value si is determined and used as the facial size for each face container Ci, and the data points αk(si), βk(si), and γk(si) are determined in the case that the facial sizes are si for each face container Ci, the following data points are determined for each face container Ci: (si, αk(si)), (si, βk(si)), and (si, γk(si)). The model parameters αk(s), βk(s), and γk(s) of the adaptive rigid prior model wk(s) are acquired by interpolating the plurality of data points of the plurality of face containers Ci. The model parameters αk(s), βk(s), and γk(s) are functions of the facial size s, that is, the facial size s is the independent variable. That is, in face tracking, in the case that the facial size s is determined, the model parameters αk(s), βk(s), and γk(s) of the adaptive rigid prior model are determined, and the weights of the plurality of facial regions are determined by the adaptive rigid prior model wk(s).
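The interpolation in S208, with data points connected by straight segments so that each model parameter becomes a function of the facial size, can be sketched as follows (the sample values are illustrative):

```python
import numpy as np

def parameter_spline(avg_sizes, points):
    """Connect the data points of one model parameter at the containers'
    average sizes with straight segments, yielding the parameter as a
    function of the facial size (piecewise-linear sketch)."""
    avg_sizes = np.asarray(avg_sizes, dtype=float)
    points = np.asarray(points, dtype=float)
    return lambda s: float(np.interp(s, avg_sizes, points))

# illustrative data points at three containers' average sizes
alpha_k = parameter_spline([100.0, 200.0, 300.0], [0.2, 0.4, 0.3])
value = alpha_k(150.0)   # halfway between the first two data points
```

Outside the sampled size range, `np.interp` holds the end values constant, which is one reasonable extrapolation choice.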
In S209, whether a condition of stopping updating the model parameters is satisfied is determined.
In the embodiments of the present disclosure, difference values between the model parameters acquired by two adjacent updates are calculated, and whether the difference values are less than a first predetermined threshold is determined, so as to determine whether the condition of stopping updating the model parameters is satisfied. S210 is performed in the case that the condition of stopping updating the model parameters is satisfied. In the case that the condition of stopping updating the model parameters is not satisfied, the face tracking is further performed on the training video data using the adaptive rigid prior model wk(·) with the updated model parameters, and the model parameters are further updated based on the data acquired by the face tracking.
In practice, the number of updates of the model parameters is counted, and whether the number of updates is greater than a second predetermined threshold is determined. In the case that the number of updates is greater than the second predetermined threshold, the condition of stopping updating the model parameters is satisfied, and thus S210 is performed. In the case that the number of updates is less than or equal to the second predetermined threshold, the process returns to S202.
In S210, updating of the model parameters of the adaptive rigid prior model is stopped in response to the condition of stopping updating the model parameters being satisfied, and a final adaptive rigid prior model is acquired.
In the case that the condition of stopping updating the model parameters is satisfied, the adaptive rigid prior model with the finally updated model parameters is a trained model, and the trained adaptive rigid prior model is deployed on a client, a server, or the like. In face tracking, the adaptive rigid prior model is capable of assigning different weights to different regions based on real-time facial sizes of the faces and the displacements of the plurality of facial regions of the faces.
For clearer description of the method for training an adaptive rigid prior model in the embodiments of the present disclosure, illustration is given hereinafter in conjunction with the accompanying drawings.

As shown in the drawings, the training process includes the following steps.
S0, the adaptive rigid prior model wk(·) is initialized to be 1.

That is, the model parameters of the adaptive rigid prior model are initialized, such that the weights of the plurality of facial regions of the faces at different facial sizes are 1.

S1, face tracking is performed on the video data F, and the facial data (sf, {dfk}k∈K) is extracted.

S2, a histogram analysis of n containers is performed on the facial sizes sf.

S3, for each container Ci, S4 to S8 are performed.

S4, for each facial region k∈K, the maximum displacement Dik = maxf∈Fi{dfk} is determined.

S5, a histogram analysis of m containers is performed on the displacements dfk.

S6, αk(si) is set to be equal to a middle displacement of a maximum container.

S7, βk(si) and γk(si) are calculated.

S8, for each facial region k∈K, the discrete points with αk(si), the discrete points with βk(si), and the discrete points with γk(si) are spline interpolated.

S9, the adaptive rigid prior model wk(·) is updated.

S10, whether to stop training is determined, S11 is performed in response to a result of stopping training, and S12 is performed in response to a result of not stopping training.

S11, the model parameters are output.

S12, the face tracking is performed using the updated adaptive rigid prior model wk(·).
For details about the parameters in the above example, reference may be made to S205 to S208, and the details are not described herein.
In the embodiments of the present disclosure, in the case that the model parameters of the adaptive rigid prior model are initialized, the facial key points and the facial expression parameters of the faces in the plurality of frames of the training video data are extracted by face tracking on the training video data, the facial sizes of the faces are determined based on the facial key points, and the displacements of the plurality of facial regions of the faces relative to the neutral face of the faces are calculated using the adaptive rigid prior model, the facial expression parameters, and the predetermined blend shape deformer. The faces are organized into a plurality of face containers based on the facial sizes, and the average values of the facial sizes of the faces are calculated. In the case that the facial sizes are the average values, the data points of the model parameters of the adaptive rigid prior model are acquired by analyzing the displacements of the plurality of facial regions of the faces in the plurality of face containers. The data points of the model parameters corresponding to different average values are interpolated, so as to acquire the model parameters, and the model parameters are determined as new model parameters of the adaptive rigid prior model. Updating of the model parameters is stopped in response to the condition of stopping updating being satisfied; otherwise, the model parameters are further updated by face tracking on the training video data. In applying the trained adaptive rigid prior model to face tracking, the adaptive rigid prior model is capable of assigning different weights to different regions based on real-time facial sizes of the faces and the displacements of the plurality of facial regions of the faces, such that rigidity stabilization of face tracking is improved, and a robust result of face tracking is achieved.
In some embodiments, the adaptive rigid prior model is a piecewise function, such that an excessively great weight caused by a small displacement of a facial region is avoided, and the stability of face tracking is ensured.
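A piecewise weight of the kind described, holding the weight at the maximum below a displacement threshold and clamping it elsewhere, can be sketched as follows; the linear middle segment and all parameter values are illustrative assumptions, not the disclosure's exact function:

```python
def clamped_weight(d, alpha, beta, gamma, w_max, w_min):
    """Piecewise weight (illustrative form): below the threshold alpha
    the weight is held at w_max instead of growing further; elsewhere a
    linear term beta * d + gamma is used, clamped to [w_min, w_max]."""
    if d <= alpha:
        return w_max
    return min(w_max, max(w_min, beta * d + gamma))

# illustrative parameter values for one facial region
w_small = clamped_weight(0.1, alpha=0.2, beta=-1.0, gamma=1.2, w_max=1.0, w_min=0.2)
w_mid   = clamped_weight(0.5, alpha=0.2, beta=-1.0, gamma=1.2, w_max=1.0, w_min=0.2)
w_large = clamped_weight(2.0, alpha=0.2, beta=-1.0, gamma=1.2, w_max=1.0, w_min=0.2)
```

Small displacements can never push the weight above the maximum, which is the stability property the piecewise form provides.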
In S301, a plurality of facial key points of a same face, initial facial expression parameters, and initial rigid transformation parameters in a plurality of frames of video data are extracted by face tracking on the video data.
In some embodiments, for the video data for face tracking, the face tracking is performed on the video data to acquire the face (Pƒ, δƒ, Qƒ) in each frame of video data f, wherein Pƒ represents a rigid transformation parameter, δƒ represents a facial expression parameter, and Qƒ represents a facial key point.

The rigid transformation parameter Pƒ and the facial expression parameter δƒ are unknown parameters, and the facial key point Qƒ is a known parameter and is acquired by detecting the facial key points in face tracking. In the embodiments of the present disclosure, the face is tracked to acquire an optimal result of the facial expression parameter δƒ.
In S302, facial sizes of faces in the plurality of frames of video data are determined based on the plurality of facial key points.
In the case that the facial key points are determined, a face frame is determined based on the facial key points, and a length of a diagonal line of the face frame is determined as the facial size of the face in each frame f of the video data.
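The facial size described above can be sketched as the diagonal length of the axis-aligned bounding box (the face frame) of the detected key points; the function name is hypothetical, and `keypoints` is assumed to be a sequence of (x, y) pixel coordinates:

```python
import math

def facial_size(keypoints):
    # Face frame: axis-aligned bounding box of the 2D facial key points.
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    w = max(xs) - min(xs)
    h = max(ys) - min(ys)
    # The facial size is the length of the diagonal line of the box.
    return math.hypot(w, h)
```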
In S303, weights of a plurality of facial regions of the faces are acquired by inputting the facial sizes to a pre-trained adaptive rigid prior model.
In the embodiments of the present disclosure, the adaptive rigid prior model is configured to output the weights of the plurality of facial regions of the faces at different facial sizes, and the adaptive rigid prior model is trained by the method for training the adaptive rigid prior model according to the above embodiments.
In some embodiments, the adaptive rigid prior model is shown as the formula (1) in the above embodiments.
In S304, optimal facial expression parameters are acquired based on the weights of the plurality of facial regions, the plurality of facial key points, the initial rigid transformation parameters, and the initial facial expression parameters, and the optimal facial expression parameters are determined as results of face tracking on the plurality of frames of video data.
In some embodiments, neutral face data of the faces is acquired, three-dimension face models of the faces in the plurality of frames of video data are generated based on the neutral face data, the initial facial expression parameters, and a predetermined blend shape deformer, and the three-dimension face models are rigidly transformed. Two-dimensional facial data is acquired by performing, based on the initial rigid transformation parameters, two-dimensional projection on the three-dimension face models, and the two-dimensional facial data includes a plurality of projected two-dimensional key points in one-to-one correspondence to the plurality of facial key points. Distances between the plurality of projected two-dimensional key points and the corresponding plurality of facial key points are calculated, and a rigid transformation optimization function and a facial expression optimization function are constructed based on the distances and the weights of the plurality of facial regions to which the plurality of projected two-dimensional key points belong. Optimal rigid transformation parameters are acquired by optimizing the rigid transformation optimization function, optimal facial expression parameters are acquired by optimizing, based on the optimal rigid transformation parameters, the facial expression optimization function, and the optimal facial expression parameters are determined as the results of face tracking on the plurality of frames of video data.
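The optimization functions (2) and (3) are referenced but not reproduced in this excerpt; a minimal sketch of the weighted reprojection cost that such functions may share, under a weak-perspective projection assumption and with a hypothetical function name, is:

```python
import numpy as np

def weighted_reprojection_cost(points_3d, keypoints_2d, weights,
                               scale, rotation, translation):
    # Rigidly transform the three-dimension face model (rotation is a
    # 3x3 matrix, translation a 3-vector), then project to 2D by
    # dropping the depth axis. Weak-perspective projection is an
    # illustrative assumption; the disclosed functions (2) and (3)
    # are not reproduced here.
    transformed = scale * points_3d @ rotation.T + translation
    projected = transformed[:, :2]
    # Distances between projected key points and detected key points,
    # each weighted by the weight of the facial region to which the
    # corresponding key point belongs.
    dists = np.linalg.norm(projected - keypoints_2d, axis=1)
    return float(np.sum(weights * dists ** 2))
```

In this sketch, the rigid transformation step would minimize the cost over `scale`, `rotation`, and `translation` with the expression fixed, and the expression step would then minimize over the facial expression parameters with the rigid transformation fixed.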
Referring to S204 in the above embodiments, in the case that the three-dimension face model is constructed, the weight is calculated by the adaptive rigid prior model, the rigid transformation optimization function (2) and the facial expression optimization function (3) are constructed, and the optimal facial expression parameter δf is acquired by sequentially optimizing function (2) and function (3). For details, reference may be made to the above embodiments, which are thus not repeated herein.
In the case that the facial expression parameter δf is acquired by face tracking, a face vision enhancement special effect is achieved based on the facial expression parameter δf, for example, adding hats, glasses, beards, tattoos, and the like to the face.
In the embodiments of the present disclosure, the weights of the plurality of facial regions are acquired by the adaptive rigid prior model, the optimal results of the facial expression parameters are acquired by constructing the rigid transformation based on the weights, and the optimal results of the facial expression parameters are determined as the results of face tracking. In the embodiments of the present disclosure, the adaptive rigid prior model is capable of assigning different weights to different regions based on real-time facial sizes of the faces and the displacements of the plurality of facial regions of the faces, such that the rigidity stabilization of face tracking is improved, and a robust result of face tracking is achieved.
The apparatus for training the adaptive rigid prior model is capable of performing the method for training the adaptive rigid prior model according to the above embodiments, and includes corresponding function modules and achieves the beneficial effects of performing the method.
The apparatus for tracking the faces according to the embodiments of the present disclosure is capable of performing the method for tracking the faces in the third embodiment, and includes corresponding function modules and achieves the beneficial effects of performing the method.
A computer-readable storage medium is provided in the embodiments of the present disclosure. One or more instructions in the storage medium, when loaded and executed by a processor of an electronic device, cause the electronic device to perform the method for training the adaptive rigid prior model and/or the method for tracking the faces according to the above embodiments of the present disclosure.
It should be noted that, for the embodiments of the apparatus, the electronic device, and the storage medium, descriptions are concise because these embodiments are similar to the embodiments of the method, and reference may be made to the description of the embodiments of the method.
In the description of the specification, the terms “one embodiment,” “some embodiments,” “an example,” “some examples,” and the like indicate that the features, structures, materials, or characteristics described in conjunction with the embodiment or example are included in at least one embodiment or example of the present disclosure. In the specification, the illustrative description of the above terms does not necessarily refer to the same embodiment or example. Furthermore, the described features, structures, materials, or characteristics may be combined in any suitable manner in any one or more embodiments or examples.
It should be noted that the above descriptions are merely some embodiments and technical principles of the present disclosure. A person skilled in the art can understand that the present disclosure is not limited to the specific embodiments herein, and a person skilled in the art can make some apparent changes, re-adjustments, and substitutions without departing from the protection scope of the present disclosure. Thus, although a detailed description of the present disclosure is made in the above embodiments, the present disclosure is not limited to the above embodiments. Other equivalent embodiments are included without departing from the conception of the present disclosure, and the scope of the present disclosure is determined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
202010355935.7 | Apr 2020 | CN | national |
This application is a U.S. National Stage of International Application No. PCT/CN2021/087576, filed on Apr. 15, 2021, which is based on and claims priority to Chinese Patent Application No. 202010355935.7 filed on Apr. 29, 2020, and the disclosures of which are herein incorporated by reference in their entireties.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2021/087576 | 4/15/2021 | WO |