The present disclosure relates to the field of video processing technologies, and in particular, relates to a face tracking method and an electronic device.
Video-based face tracking is used to achieve visual enhancement effects, such as trying on hats and glasses in videos and adding moustaches and tattoos, and is also used to drive the expression motions of virtual figures.
A two-dimensional face in a video is a two-dimensional projection of a three-dimensional face acquired by combining a facial identity and expressions of a user. In a real-time face tracking process, the facial identity and expressions of the user can be accurately calculated based on two-dimensional keypoints on the face in the video. Thus, the facial identity is required to be quickly reconstructed to perform face tracking.
The present disclosure provides a face tracking method and an electronic device.
A face tracking method is provided in the present disclosure. The face tracking method is applicable to a tracking thread, wherein a first keyframe data set and a second keyframe data set are maintained on the tracking thread. The face tracking method includes:
A face tracking method is provided in the present disclosure. The face tracking method is applicable to an optimization thread. The method includes:
A face tracking apparatus is provided in the present disclosure. The face tracking apparatus is applicable to a tracking thread, wherein a first keyframe data set and a second keyframe data set are maintained on the tracking thread. The face tracking apparatus includes: an optimization thread running determining module, a second keyframe data set updating module, a clearing module, a first keyframe data set updating module, and an optimization thread invoking module.
The optimization thread running determining module is configured to determine, in a process of tracking a face in a video frame, whether an optimization thread is running.
The second keyframe data set updating module is configured to, in response to the optimization thread running and the video frame being a keyframe, update the second keyframe data set based on the video frame.
The clearing module is configured to, in response to receiving a clear instruction for clearing a video frame in the second keyframe data set from the optimization thread, clear the video frame in the second keyframe data set, and update the second keyframe data set to the first keyframe data set.
The first keyframe data set updating module is configured to, in response to the optimization thread not running and the video frame being the keyframe, update the first keyframe data set based on the video frame and the second keyframe data set.
The optimization thread invoking module is configured to make the optimization thread optimize a facial identity based on the first keyframe data set by invoking the optimization thread upon updating the first keyframe data set.
An electronic device for tracking a face is provided in the present disclosure. The electronic device includes:
The present disclosure is described hereinafter with reference to the accompanying drawings and embodiments. The specific embodiments described herein are merely illustrative of the present disclosure. For convenience of description, only portions relevant to the present disclosure are shown in the accompanying drawings.
In a real-time facial identity reconstruction process, the facial identity is reconstructed by extracting keyframes from a real-time video of the user, which is very convenient for the user. As video data increases, the facial identity is optimized based on more keyframes, such that errors are reduced. However, in the facial identity reconstruction process, a keyframe cannot be added by a tracking thread during running of a facial identity optimization thread, and is added by the tracking thread only after the running of the facial identity optimization thread ends. As a result, a keyframe detected by the tracking thread during running of the facial identity optimization thread cannot be added to optimize the facial identity. In addition, the facial identity optimization thread is invoked once every time one keyframe is added. In one aspect, the quantity of keyframes increases slowly and keyframes may be omitted, such that the facial identity optimization converges slowly and the facial identity is inaccurate. In another aspect, the number of invocations of the facial identity optimization thread is increased. For these two reasons, the real-time performance of the real-time facial identity reconstruction is poor, such that the real-time facial identity reconstruction is not applicable to real-time face tracking.
In S101, whether an optimization thread is running is determined in a process of tracking a face in a video frame.
In the embodiment of the present disclosure, a face in the video frame is tracked by using the tracking thread and an optimization thread. The tracking thread is a thread configured to acquire face tracking data, such as face posture data, expression data, and a face keypoint, by tracking the face in the video frame of a video. The tracking thread is further configured to detect whether the video frame is a keyframe. The optimization thread is a thread configured to optimize a facial identity vector based on the keyframe detected by the tracking thread. The optimization thread provides an optimized facial identity vector for the tracking thread, and the tracking thread tracks a face in the video frame based on the optimized facial identity vector and detects the keyframe.
Data is exchanged between the tracking thread and the optimization thread, such that the tracking thread is able to determine whether the optimization thread is running. For example, the optimization thread is provided with a state identifier. The tracking thread determines, based on the state identifier, whether the optimization thread is running. In response to the optimization thread running, S102 is performed. In response to the optimization thread not running, S104 is performed.
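The state identifier described above can be illustrated with a minimal Python sketch. The class name OptimizationWorker and its methods are hypothetical and are not taken from the disclosure; a threading.Event is assumed here as one possible form of the state identifier.

```python
import threading


class OptimizationWorker:
    """Hypothetical wrapper whose Event plays the role of the state identifier."""

    def __init__(self):
        self._running = threading.Event()  # set while the optimization job runs
        self._thread = None

    def is_running(self):
        """Queried by the tracking thread to choose between S102 and S104."""
        return self._running.is_set()

    def invoke(self, job):
        """Run `job` on a background thread; the flag is set before the start."""
        self._running.set()
        self._thread = threading.Thread(target=self._run, args=(job,))
        self._thread.start()

    def _run(self, job):
        try:
            job()
        finally:
            # Clear the identifier even if the job raises.
            self._running.clear()

    def join(self):
        if self._thread is not None:
            self._thread.join()
```

In this sketch the flag is set by the invoking thread rather than inside the worker, so that the tracking thread never observes a "not running" state between invocation and the actual start of the job.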
In S102, a second keyframe data set is updated based on the video frame in response to the video frame being a keyframe.
A first keyframe data set ϕ1 and the second keyframe data set ϕ2 are maintained on the tracking thread in the embodiment of the present disclosure. Each keyframe data set includes a keyframe set, a frame vector of a keyframe, and face tracking data of the keyframe.
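As an illustration only, the contents of a keyframe data set listed above can be sketched as a small Python data structure. The names FaceTrackingData and KeyframeDataSet, the field names, and the use of frame indices as keys are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FaceTrackingData:
    keypoints: List[Tuple[float, float]]  # two-dimensional face keypoints
    posture: List[float]                  # posture data
    expression: List[float]               # expression data


@dataclass
class KeyframeDataSet:
    keyframe_set: List[int] = field(default_factory=list)  # keyframe indices
    frame_vectors: Dict[int, List[float]] = field(default_factory=dict)
    tracking_data: Dict[int, FaceTrackingData] = field(default_factory=dict)

    def add(self, frame_index, frame_vector, data):
        """Store a keyframe with its frame vector and tracking data (once)."""
        if frame_index in self.frame_vectors:
            return
        self.keyframe_set.append(frame_index)
        self.frame_vectors[frame_index] = list(frame_vector)
        self.tracking_data[frame_index] = data

    def is_empty(self):
        return not self.keyframe_set
```

Two instances of such a structure would stand for the first keyframe data set ϕ1 and the second keyframe data set ϕ2.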
In some embodiments, prior to tracking the face in the video, the tracking thread initializes various parameters, such as the facial identity vector, the first keyframe data set ϕ1, and the second keyframe data set ϕ2. A current facial identity vector is a facial identity vector currently used for tracking the video frame. The tracking thread acquires the face tracking data upon tracking the face in the video frame based on the current facial identity vector. The face tracking data includes face posture data, face expression data, and a face keypoint. That is, in the process of tracking the face, in a case that the current facial identity vector α is given, the tracking thread acquires the following face tracking data upon tracking the face in an i-th video frame:
$$\{Q_i \mid P_i, \delta_i\}.$$
In the embodiment of the present disclosure, the optimization thread optimizes the facial identity vector based on the face tracking data of the keyframe in the first keyframe data set, and the first keyframe data set is not updated in the running process of the optimization thread.
In some embodiments of the present disclosure, the tracking thread further maintains a first principal component analysis (PCA) subspace and a second PCA subspace. A PCA subspace is a space composed of frame vectors of all keyframes, an average frame vector, and a feature vector matrix of a keyframe set in a keyframe data set. The first PCA subspace is a space composed of the frame vectors of all keyframes, an average frame vector, and a feature vector matrix of a first keyframe set in the first keyframe data set. Before the optimization thread is initiated, the first PCA subspace is assigned to the second PCA subspace. Whether a current video frame is a keyframe is detected based on the face tracking data of the current video frame and the second PCA subspace. In a case that the current video frame Fi is the keyframe, the video frame Fi is added as the keyframe to a second keyframe set of the second keyframe data set, that is, ϕ2 = ϕ2 ∪ {Fi}, and a frame vector of the video frame is added to the second keyframe data set. The second PCA subspace is then updated, and the tracking thread continues to detect whether a next video frame is the keyframe based on the updated second PCA subspace.
In some embodiments, in the process of tracking the face, the tracking thread calculates a frame vector of each video frame based on the expression data δ and a rotation vector in the posture data, and acquires an average frame vector v0 and a feature vector matrix M by performing PCA on the frame vectors of all the keyframes in the first keyframe data set and retaining 95% of the variance. The frame vectors of all the keyframes, the average frame vector v0, and the feature vector matrix M are determined as the first PCA subspace. Upon assigning the first PCA subspace to the second PCA subspace, a distance from a frame vector v of any one video frame to the second PCA subspace is:
$$\mathrm{dis}(v, M) = \lVert v - (v_0 + M M^{T} (v - v_0)) \rVert.$$
In response to the distance dis (v, M) being less than a predetermined threshold ε1, the video frame is determined as the keyframe. The video frame is added to the second keyframe set of the second keyframe data set. The frame vector of the keyframe is added to the second keyframe data set, and the second PCA subspace is updated based on the frame vector of the keyframe. In response to the distance dis (v, M) being not less than the predetermined threshold ε1, a next video frame is tracked.
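The distance to the PCA subspace can be sketched in plain Python as follows. This is a minimal illustration and not the implementation of the disclosure: the function names are hypothetical, the columns of M are assumed to be orthonormal (as PCA eigenvectors are), and the helper for retaining 95% of the variance assumes that the eigenvalues of the frame vector covariance are already available. The comparison against the threshold ε1 is left to the caller.

```python
import math


def subspace_distance(v, v0, basis):
    """Return ||v - (v0 + M M^T (v - v0))||.

    `basis` holds the columns of M as a list of vectors; the columns are
    assumed to be orthonormal.
    """
    centered = [a - b for a, b in zip(v, v0)]
    # Coefficients M^T (v - v0): one dot product per basis column.
    coeffs = [sum(c * e for c, e in zip(centered, column)) for column in basis]
    # Reconstruction M M^T (v - v0) inside the subspace.
    projected = [
        sum(k * column[i] for k, column in zip(coeffs, basis))
        for i in range(len(v))
    ]
    residual = [c - p for c, p in zip(centered, projected)]
    return math.sqrt(sum(r * r for r in residual))


def components_for_variance(eigenvalues, retained=0.95):
    """Smallest number of leading components whose variance share reaches `retained`."""
    total = sum(eigenvalues)
    running = 0.0
    for count, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += value
        if running >= retained * total:
            return count
    return len(eigenvalues)
```

For example, with v0 at the origin and M spanning the first two coordinate axes, the frame vector (3, 4, 5) lies at a distance of 5 from the subspace, because only its third component falls outside the subspace.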
In practical applications, determining whether the video frame is the keyframe is not limited to using a PCA subspace. A person skilled in the art may also determine whether the video frame is the keyframe by using other methods, which is not limited in the embodiments of the present disclosure.
In S103, in response to receiving a clear instruction for clearing the video frame in the second keyframe data set from the optimization thread, the video frame in the second keyframe data set is cleared, and the second keyframe data set is updated to the first keyframe data set.
In a case that the optimization thread performs iterative optimization on the facial identity vector, upon each iterative optimization, the optimization thread determines whether the second keyframe set in the second keyframe data set is a non-empty set. The second keyframe set being a non-empty set indicates that the tracking thread has tracked a keyframe during the iterative optimization performed by the optimization thread. In this case, the optimization thread sends the instruction for clearing the video frame in the second keyframe data set to the tracking thread. Upon receiving the clear instruction, the tracking thread adds the keyframe in the second keyframe set of the second keyframe data set to the first keyframe set of the first keyframe data set, adds the frame vector of the keyframe to the first keyframe data set, and then clears the second keyframe set, such that the running optimization thread performs the iterative optimization on the facial identity vector based on face tracking data of the newly tracked keyframe. In this way, keyframes are added while the optimization thread is running, which avoids omitting a keyframe that has been tracked but cannot be added and increases the keyframe adding speed. In addition, because a plurality of keyframes are added while the optimization thread is running, the number of times of invoking the optimization thread is reduced.
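The handling of the clear instruction can be sketched as follows. The function name apply_clear_instruction is hypothetical, and each keyframe data set is simplified here to a dictionary that maps a frame index to its frame vector.

```python
def apply_clear_instruction(first, second):
    """Move every keyframe buffered in `second` into `first`, then empty `second`.

    Both arguments are dicts mapping a frame index to its frame vector
    (a simplification of the keyframe data sets described in the text).
    Returns the frame indices that were newly added to `first`.
    """
    added = [index for index in second if index not in first]
    for index in added:
        first[index] = second[index]
    second.clear()
    return added
```

In a multi-threaded setting, the merge and the clear would presumably need to be protected by a lock or performed only on the tracking thread, so that the optimization thread never reads a partially merged set; the sketch leaves this out.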
In S104, the first keyframe data set is updated based on the video frame and the second keyframe data set in response to the video frame being the keyframe.
In response to the optimization thread not running, whether the video frame is the keyframe is determined based on the first PCA subspace and the face tracking data. The determination is performed in the same way as the determination based on the second PCA subspace and the face tracking data, and is not described in detail herein.
In response to the video frame being determined as the keyframe, the video frame is added as the keyframe to the first keyframe set of the first keyframe data set, and the frame vector of the video frame is added to the first keyframe data set. At the same time, the frame vector of the video frame is added to the first PCA subspace, and the average frame vector and the feature vector matrix of the first PCA subspace are updated; and a next video frame is tracked. In response to the video frame not being the keyframe, it is determined whether the second keyframe set of the second keyframe data set is the non-empty set. In response to the second keyframe set of the second keyframe data set being the non-empty set, the keyframe in the second keyframe set is added to the first keyframe set, and the frame vector of the keyframe in the second keyframe set is added to the first keyframe data set.
In S105, the optimization thread is made to optimize a facial identity based on the first keyframe data set by invoking the optimization thread upon updating the first keyframe data set.
The update of the first keyframe data set indicates that a new keyframe has been tracked. The second keyframe set of the second keyframe data set being the non-empty set also indicates that a new keyframe has been tracked, in which case the first keyframe data set is also updated. In response to the optimization thread not running and the first keyframe data set being updated, the optimization thread is invoked. The optimization thread optimizes the facial identity vector based on the face tracking data of the keyframe of the first keyframe data set.
According to the face tracking method of the embodiment of the present disclosure, in the running process of the optimization thread, the second keyframe data set is updated based on the detected keyframe; in response to receiving the instruction from the optimization thread for clearing the video frame in the second keyframe data set, the second keyframe data set is updated to the first keyframe data set, and the video frame in the second keyframe data set is cleared. In response to the optimization thread not running and the video frame being the keyframe, the first keyframe data set is updated based on the video frame and the second keyframe data set. That is, the tracking thread adds the keyframe at any time regardless of whether the optimization thread is running. In this way, a large number of keyframes are quickly extracted to optimize the facial identity vector, which increases the convergence speed of the facial identity vector. In addition, it is not necessary to invoke the optimization thread every time one keyframe is detected, such that the number of invocations of the optimization thread is reduced. High real-time performance of the facial identity vector optimization is thus achieved, fewer resources are consumed, and more resources are available for a complex optimization algorithm that improves the accuracy of the facial identity optimization.
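The routing of one tracked video frame through S101 to S105 can be summarized by the following sketch. The function handle_frame and its parameters are hypothetical; the keyframe data sets are reduced to lists of frame indices, and the result of the keyframe detection is passed in as a boolean.

```python
def handle_frame(frame_index, is_keyframe, optimizer_running, first, second):
    """Route one tracked frame; return True when the optimizer should be invoked.

    `first` and `second` are lists of keyframe indices standing in for the
    first and second keyframe data sets.
    """
    if optimizer_running:
        # S102: buffer keyframes while the optimizer works on `first`.
        if is_keyframe:
            second.append(frame_index)
        return False

    # S104: the optimizer is idle, so `first` may be modified directly.
    updated = False
    if is_keyframe:
        first.append(frame_index)
        updated = True
    if second:
        # Keyframes buffered during the previous run are merged as well.
        first.extend(second)
        second.clear()
        updated = True
    # S105: invoke the optimizer only when `first` has changed.
    return updated
```

The sketch omits the clear instruction of S103, which is triggered by the optimization thread rather than by the arrival of a video frame.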
In S201, a received facial identity vector is determined as a current facial identity vector in response to receiving a facial identity vector from the optimization thread.
In the embodiment of the present disclosure, the tracking thread provides the keyframe for the optimization thread. The optimization thread optimizes the facial identity vector based on the keyframe. Upon acquiring a converged facial identity vector by performing iterative optimization on the facial identity vector, the optimization thread sends the facial identity vector to the tracking thread. The tracking thread determines the received facial identity vector as the current facial identity vector, and performs the face tracking on the received video frame based on the current facial identity vector.
In S202, a face keypoint, posture data, and expression data of a face in the video frame are acquired as face tracking data by tracking the video frame based on the current facial identity vector.
In some embodiments, prior to tracking the face in the video, the tracking thread initializes various parameters, such as the facial identity vector. The current facial identity vector is a facial identity vector currently used for tracking the video frame. The tracking thread acquires the face tracking data upon tracking the face in the video frame based on the current facial identity vector. The face tracking data includes the face posture data, the face expression data, and the face keypoint. That is, in the face tracking process, in the case that the current facial identity vector α is given, the tracking thread acquires the following face tracking data upon tracking the face in an i-th video frame:
$$\{Q_i \mid P_i, \delta_i\}.$$
In S203, whether the optimization thread is running is determined.
Data is exchanged between the tracking thread and the optimization thread, such that the tracking thread is able to determine whether the optimization thread is running. For example, the optimization thread is provided with a state identifier. The tracking thread determines, based on the state identifier, whether the optimization thread is running. In response to the optimization thread running, S204 is performed. In response to the optimization thread not running, S212 is performed.
In S204, the first PCA subspace is assigned to the second PCA subspace prior to invoking the optimization thread, wherein the first PCA subspace is a space composed of frame vectors of all keyframes, an average frame vector, and a feature vector matrix of a first keyframe set in the first keyframe data set.
In the embodiment of the present disclosure, each time before the optimization thread starts running, the first PCA subspace is assigned to the second PCA subspace. The first PCA subspace is a space composed of the frame vectors of all the keyframes, the average frame vector, and the feature vector matrix of the first keyframe set in the first keyframe data set.
In S205, whether the video frame is the keyframe is determined based on the second PCA subspace and the posture data and the expression data of the video frame.
In some embodiments, a frame vector of the video frame is calculated based on the posture data and the expression data, and a distance between the frame vector of the video frame and the second PCA subspace is calculated. In response to the distance being less than a predetermined threshold, it is determined that the video frame is the keyframe, and S206 is performed. In response to the distance being not less than the predetermined threshold, it is determined that the video frame is not the keyframe, and S209 is performed.
For example, in the face tracking process, the tracking thread calculates a frame vector v of each video frame based on the expression data δ and the rotation vector in the posture data of the face in the video frame, and acquires an average frame vector v0 and a feature vector matrix M. The frame vectors of all the keyframes, the average frame vector v0, and the feature vector matrix M constitute the first PCA subspace. Upon assigning the first PCA subspace to the second PCA subspace, a distance from a frame vector v of any one video frame to the second PCA subspace is:
$$\mathrm{dis}(v, M) = \lVert v - (v_0 + M M^{T} (v - v_0)) \rVert.$$
In S206, the second keyframe data set is updated based on the video frame.
In some embodiments, the current video frame is added as the keyframe to the second keyframe set of the second keyframe data set. The frame vector of the video frame is calculated based on the posture data and the expression data of the video frame, and the face keypoint, the posture data, the expression data, and the frame vector of the video frame are added to the second keyframe data set. S207 and S210 are then performed.
In some other embodiments, the second keyframe set of the second keyframe data set is further provided with an identifier, and whether the identifier of the second keyframe set is a predetermined first identifier is determined. The predetermined first identifier indicates that the second keyframe set is an empty set, and a predetermined second identifier indicates that the second keyframe set is a non-empty set. For example, the predetermined first identifier is 1 and the predetermined second identifier is 0. In response to the identifier of the second keyframe set being 0, the second keyframe set is a non-empty set, and a detected keyframe is directly added to the second keyframe set without changing the identifier. In response to the identifier of the second keyframe set being 1, the second keyframe set is an empty set, and the identifier is set to the predetermined second identifier, that is, 0, after the keyframe is detected and added to the second keyframe set. Setting the identifier makes it convenient for the optimization thread to determine whether the second keyframe set is the non-empty set, such that the optimization thread, in response to the second keyframe set being the non-empty set, promptly sends a clear instruction to the tracking thread during the optimization process to combine the first keyframe set with the second keyframe set, and optimizes the facial identity vector based on a newly detected keyframe.
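The identifier of the second keyframe set can be sketched as follows, using the example values 1 (empty set) and 0 (non-empty set) from the text. The class name FlaggedKeyframeSet is hypothetical, and resetting the identifier to 1 when the set is cleared is an assumption of this sketch that the text does not state explicitly.

```python
EMPTY = 1      # predetermined first identifier: the set is empty
NON_EMPTY = 0  # predetermined second identifier: the set is non-empty


class FlaggedKeyframeSet:
    """Second keyframe set carrying an identifier the optimizer can poll."""

    def __init__(self):
        self.frames = []
        self.identifier = EMPTY

    def add(self, frame_index):
        self.frames.append(frame_index)
        # The identifier changes only on the transition from empty to non-empty.
        if self.identifier == EMPTY:
            self.identifier = NON_EMPTY

    def clear(self):
        self.frames.clear()
        # Assumption: the identifier is reset when the set is cleared.
        self.identifier = EMPTY
```

With such an identifier, the optimization thread only has to read a single value after each iteration to decide whether to send the clear instruction.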
In S207, the threshold is updated based on a predetermined step size.
A dynamic threshold is used in the embodiment of the present disclosure, that is, the threshold is updated in real time. For example:
$$\varepsilon_1 = \varepsilon_1 + \varepsilon_0,$$
where ε0 is the predetermined step size.
In S208, the second PCA subspace is updated based on the frame vector in the second keyframe data set.
Upon adding the frame vector, the face keypoint, the posture data, and the expression data of the video frame to the second keyframe data set, the frame vector of the video frame is added to the second PCA subspace. The average frame vector and the feature vector matrix of the second PCA subspace are recalculated to update the second PCA subspace. The tracking thread continues to determine, based on the updated second PCA subspace, whether a next video frame is the keyframe. After the running of the optimization thread ends, whether a video frame is the keyframe is determined based on the first PCA subspace.
In S209, whether a facial identity vector is received from the optimization thread is determined.
The tracking thread needs to track a next video frame upon tracking one video frame. Before the tracking of the next video frame, whether a facial identity vector is received from the optimization thread is determined. Receiving the facial identity vector from the optimization thread indicates that the facial identity vector is optimized and updated, and S201 is re-performed to determine the received facial identity vector as the current facial identity vector.
Failing to receive the facial identity vector from the optimization thread indicates that the facial identity vector is not updated and optimized, and S202 is re-performed to continue tracking the next video frame based on the current facial identity vector.
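The check performed before each new video frame can be sketched with a non-blocking queue read. The function latest_identity and the use of queue.Queue as the channel between the two threads are assumptions of this sketch; draining the queue so that only the newest vector is kept is likewise an assumption.

```python
import queue


def latest_identity(inbox, current):
    """Return the newest identity vector waiting in `inbox`, else `current`.

    The call never blocks, so the tracking thread can make it before
    every frame without stalling.
    """
    while True:
        try:
            current = inbox.get_nowait()
        except queue.Empty:
            return current
```

If nothing has been received, the function returns the vector that was passed in, which corresponds to continuing with S202; otherwise the returned vector becomes the current facial identity vector as in S201.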
In S210, in response to receiving a clear instruction for clearing video frames in the second keyframe data set from the optimization thread, the video frame in the second keyframe data set is cleared, and the second keyframe data set is updated to the first keyframe data set.
In the case that the optimization thread performs iterative optimization on the facial identity vector, upon each iterative optimization, the optimization thread determines whether the second keyframe set of the second keyframe data set is the non-empty set. The second keyframe set being the non-empty set indicates that the tracking thread has tracked a keyframe during the iterative optimization performed by the optimization thread. In this case, the optimization thread sends the instruction for clearing the second keyframe set to the tracking thread. Upon receiving the instruction for clearing the second keyframe set, the tracking thread adds the tracked keyframe in the second keyframe set to the first keyframe set, and clears the second keyframe set, such that the running optimization thread performs the iterative optimization on the facial identity vector based on the newly tracked keyframe. In this way, keyframes are added while the optimization thread is running, which avoids omitting a keyframe that has been tracked but cannot be added and increases the keyframe adding speed. In addition, because a plurality of keyframes are added while the optimization thread is running, the number of times of invoking the optimization thread is reduced.
In S211, the first PCA subspace is updated based on the frame vector in the first keyframe data set.
After the frame vector in the second keyframe data set is added to the first keyframe data set, the added frame vector is added to the first PCA subspace, and an average frame vector and a feature vector matrix of the first PCA subspace are recalculated to update the first PCA subspace.
In S212, whether the video frame is the keyframe is determined based on the first PCA subspace and the face tracking data.
In response to the optimization thread not running, the frame vector of the video frame is calculated based on the posture data and the expression data of the face in the video frame, and a distance between the frame vector of the video frame and the first PCA subspace is calculated. In response to the distance being less than the predetermined threshold, it is determined that the video frame is the keyframe, and S213 is performed; and in response to the distance being greater than or equal to the predetermined threshold, it is determined that the video frame is not the keyframe, and S216 is performed.
For details on determining whether the video frame is the keyframe based on the first PCA subspace and the face tracking data, reference may be made to S205, and the details are not repeated herein.
In S213, the first keyframe data set is updated based on the video frame and the second keyframe data set.
In response to the optimization thread not running and the video frame being detected as the keyframe, the video frame is added to the first keyframe set of the first keyframe data set, and the frame vector, the face keypoint, the posture data, and the expression data of the video frame are added to the first keyframe data set. The keyframe in the second keyframe set of the second keyframe data set is added to the first keyframe set, and the frame vector, the face keypoint, the posture data, and the expression data of the keyframe in the second keyframe data set are added to the first keyframe data set.
In S214, the threshold is updated based on the predetermined step size.
For example:
$$\varepsilon_1 = \varepsilon_1 + \varepsilon_0,$$
where ε0 is the predetermined step size.
In S215, the first PCA subspace is updated based on the frame vector in the first keyframe data set.
As the frame vector of the detected keyframe is added to the first keyframe data set or the frame vector of the keyframe in the second keyframe data set is added to the first keyframe data set, the newly added frame vector is added to the first PCA subspace, and the average frame vector and the feature vector matrix are calculated based on the frame vector in the first PCA subspace to update the first PCA subspace.
In S216, whether the second keyframe set is the non-empty set is determined.
In response to the tracking thread detecting that the current video frame is not the keyframe, the tracking thread determines whether the second keyframe set is the non-empty set. In response to the second keyframe set being the non-empty set, S217 is performed. In response to the second keyframe set being the empty set, S209 is re-performed.
In S217, the second keyframe data set is updated to the first keyframe data set, and the first PCA subspace is updated.
The keyframe in the second keyframe set in the second keyframe data set is added to the first keyframe set, and the frame vector, the face keypoint, the posture data, and the expression data of the keyframe in the second keyframe data set are added to the first keyframe data set. The first PCA subspace is updated based on the newly added frame vector, and the tracking thread continues to detect whether a next video frame is the keyframe based on the updated first PCA subspace.
In S218, the optimization thread is invoked upon updating the first keyframe data set.
In the embodiment of the present disclosure, in response to the optimization thread not running and the first keyframe data set being updated, the optimization thread is invoked, and the optimization thread optimizes the facial identity vector based on the posture data and the expression data of the keyframe in the first keyframe data set.
In S219, a face change rate is calculated based on two adjacent facial identity vectors received from the optimization thread.
The optimization thread sends the optimized facial identity vector to the tracking thread upon each invocation. It is assumed that F(α) is a face mesh corresponding to the facial identity vector α, F(α)_j is the j-th vertex of the face mesh, and s is a diagonal length of a minimum circumscribed rectangle of the three-dimensional average face. In a case that the two adjacent facial identity vectors are changed from α1 to α2, the face change rate is calculated as follows:
$$\frac{\max_{j} \lVert F(\alpha_2)_j - F(\alpha_1)_j \rVert}{s}.$$
That is, the face change rate is acquired by dividing the maximum movement amount among all vertices j of the face mesh by the diagonal length of the minimum circumscribed rectangle of the three-dimensional average face.
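The face change rate described above can be sketched as follows. The function name face_change_rate is hypothetical, and each mesh is assumed to be given as a list of vertex coordinates in a fixed vertex order, that is, as the already evaluated meshes F(α1) and F(α2).

```python
import math


def face_change_rate(mesh_before, mesh_after, diagonal):
    """Largest per-vertex displacement between two meshes, divided by `diagonal`."""
    if len(mesh_before) != len(mesh_after):
        raise ValueError("meshes must have the same number of vertices")
    if diagonal <= 0:
        raise ValueError("diagonal must be positive")
    # Euclidean displacement of each vertex j between the two meshes.
    largest = max(
        (math.dist(p, q) for p, q in zip(mesh_before, mesh_after)),
        default=0.0,
    )
    return largest / diagonal
```

For example, if one vertex moves by a distance of 5 while all other vertices stay in place and the diagonal length s is 10, the face change rate is 0.5.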
In S220, whether the face change rate is less than a predetermined change rate threshold is determined.
The face change rate is compared with the predetermined change rate threshold. In response to the face change rate being less than the predetermined change rate threshold, S221 is performed. In response to the face change rate being not less than the predetermined change rate threshold, S202 is re-performed to continue to extract the keyframe in the face tracking process, such that the optimization thread uses more keyframes to optimize the facial identity vector.
In S221, determining whether the video frame is the keyframe is stopped in the process of tracking the video frame based on the current facial identity vector, and the invocation of the optimization thread is skipped.
In response to the face change rate being less than the predetermined change rate threshold, the facial identity vector has already converged, and the tracking thread tracks the video frame based on the current facial identity vector, and keyframes are not detected in the tracking process, that is, the processes from S203 to S221 are not performed.
According to the face tracking method of the embodiment of the present disclosure, in the running process of the optimization thread, the second keyframe data set is updated based on the detected keyframe; in response to receiving the instruction from the optimization thread for clearing the video frame in the second keyframe data set, the second keyframe data set is updated to the first keyframe data set, and the video frame in the second keyframe data set is cleared. In response to the optimization thread not running and the video frame being the keyframe, the first keyframe data set is updated based on the video frame and the second keyframe data set. That is, the tracking thread adds the keyframe at any time regardless of whether the optimization thread is running. In this way, a large number of keyframes are quickly extracted to optimize the facial identity vector, which increases the convergence speed of the facial identity vector. In addition, it is not necessary to invoke the optimization thread every time one keyframe is detected, such that the number of invocations of the optimization thread is reduced. High real-time performance of the facial identity vector optimization is thus achieved, fewer resources are consumed, and more resources are available for a complex optimization algorithm that improves the accuracy of the facial identity optimization.
In response to receiving the instruction for clearing the second keyframe set from the optimization thread, the second keyframe data set is updated to the first keyframe data set, and the second keyframe set in the second keyframe data set is cleared, such that the optimization thread uses the newly detected keyframe to optimize the facial identity vector in the iteration process. The newly detected keyframe is used to optimize the facial identity vector without invoking the optimization thread again upon the end of the optimization performed by the optimization thread, which reduces the number of times of invocations of the optimization thread.
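The keyframe bookkeeping on the tracking thread described above can be sketched as follows. This is a minimal single-threaded illustration, not the disclosed implementation: the class name `KeyframeBuffers`, its method names, and the `invoke_optimizer` callback are hypothetical, keyframes are stood in for by strings, and it is assumed that the second keyframe set is emptied once its content has been merged into the first keyframe set. A real implementation would also need synchronization between the two threads.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class KeyframeBuffers:
    """Hypothetical holder for the two keyframe sets kept by the tracking thread."""

    first: List[str] = field(default_factory=list)   # set read by the optimization thread
    second: List[str] = field(default_factory=list)  # set filled while the optimizer runs

    def on_keyframe(self, frame: str, optimizer_running: bool,
                    invoke_optimizer: Callable[[List[str]], None]) -> None:
        if optimizer_running:
            # Optimizer busy: park the new keyframe in the second set.
            self.second.append(frame)
        else:
            # Optimizer idle: merge pending keyframes and the new one into
            # the first set (assumed to empty the second set), then invoke.
            self.first.extend(self.second)
            self.second.clear()
            self.first.append(frame)
            invoke_optimizer(list(self.first))

    def on_clear_instruction(self) -> None:
        # Clear instruction from the optimizer: move pending keyframes into
        # the first set, but only when the second set is non-empty.
        if self.second:
            self.first.extend(self.second)
            self.second.clear()
```

In this sketch the optimizer is invoked only from the idle branch, which mirrors the stated goal of adding keyframes at any time without invoking the optimization thread once per keyframe.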
The threshold is updated based on the predetermined step size every time a new keyframe is detected. As the number of keyframes increases, the threshold gradually increases, and the video frame is less likely to be detected as the keyframe, such that an updating frequency of the first keyframe set in the first keyframe data set decreases, and the number of times of invocations of the optimization thread decreases.
In S301, a current facial identity vector used by a tracking thread is determined as an initial facial identity vector upon invoking the optimization thread.
Upon invoking the optimization thread, the current facial identity vector used by the tracking thread is acquired, and the current facial identity vector is determined as an optimized initial facial identity vector. In one example, the optimization thread determines a facial identity vector that is output to the tracking thread at the last time as the initial facial identity vector.
In S302, a first keyframe data set is acquired, wherein the first keyframe data set is a data set updated by the tracking thread upon performing face tracking, and the first keyframe data set includes face tracking data.
The tracking thread maintains the first keyframe data set, wherein the first keyframe data set includes face tracking data of all keyframes detected prior to invoking the optimization thread, and the face tracking data includes a face keypoint, posture data, and expression data.
In S303, optimized face tracking data is acquired by optimizing the face tracking data in the first keyframe data set based on the initial facial identity vector.
In some embodiments of the present disclosure, a three-dimensional face model is constructed based on the initial facial identity vector and the expression data, and a face keypoint of the three-dimensional face model is acquired. Optimized posture data and optimized expression data are acquired as the optimized face tracking data by solving optimal posture data and optimal expression data based on the face keypoint of the three-dimensional face model and the face keypoint in the face tracking data.
In S304, an optimized facial identity vector is acquired by performing iterative optimization on the initial facial identity vector based on the optimized face tracking data.
In one example, a face size of a tracked face is calculated based on the face keypoint; an expression weight of each keyframe is calculated based on the expression data of each keyframe; an optimization equation is established based on the face tracking data, the face size, the expression weight, the current facial identity vector, and the initial facial identity vector; and the optimized facial identity vector is acquired by iteratively solving the optimization equation.
In S305, upon each iteration, whether an iteration stop condition is satisfied is determined based on the optimized facial identity vector and the initial facial identity vector.
A face change rate is calculated based on the optimized facial identity vector and the initial facial identity vector. In response to the face change rate being less than a predetermined threshold, the iteration is stopped, and the optimized facial identity vector is determined as a result of optimization of current invocation. S306 is performed in response to the iteration stop condition being satisfied, and S307 is performed in response to the iteration stop condition being not satisfied.
In S306, the optimized facial identity vector is sent to the tracking thread, such that the tracking thread determines a received facial identity vector as a current facial identity vector upon receiving the optimized facial identity vector.
In response to acquiring the optimized facial identity vector upon the stopping of the iterative optimization of the optimization thread in one invocation, the facial identity vector is sent to the tracking thread. The tracking thread receives the facial identity vector and determines the facial identity vector as the current facial identity vector.
In S307, a clear instruction for clearing a video frame in a second keyframe data set is sent to the tracking thread, such that the tracking thread updates, in response to a second keyframe set of the second keyframe data set being a non-empty set, the second keyframe data set to the first keyframe data set upon receiving the clear instruction.
In response to the iteration stop condition being not satisfied and the tracking thread having detected a new keyframe and added the new keyframe to the second keyframe set of the second keyframe data set, the instruction for clearing the second keyframe set is sent to the tracking thread upon ending each iteration, such that the tracking thread adds the keyframe in the second keyframe set to the first keyframe set and clears the second keyframe set, and the optimization thread uses the newly detected keyframe to optimize the facial identity vector in the next iteration.
In S308, the optimized facial identity vector is determined as an initial facial identity vector.
Upon each iterative optimization, in response to the iteration stop condition being not satisfied, the optimization thread determines the optimized facial identity vector acquired in the current optimization iteration as the initial facial identity vector of a next iteration, and S302 is re-performed to continue to perform iterative optimization on the facial identity vector.
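The control flow of S301 to S308 can be sketched as a single loop. The sketch below is illustrative only: the function `run_optimization` and all of its callback parameters are hypothetical names, the actual solvers are passed in as callables rather than implemented, and the `max_iterations` bound is an added safety assumption that the disclosure does not mention.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def run_optimization(alpha_current: Vector,
                     get_first_set: Callable[[], Sequence],
                     refine_tracking: Callable[[Sequence, Vector], Sequence],
                     optimize_identity: Callable[[Sequence, Vector], Vector],
                     has_converged: Callable[[Vector, Vector], bool],
                     send_identity: Callable[[Vector], None],
                     send_clear: Callable[[], None],
                     max_iterations: int = 50) -> Vector:
    """One invocation of the optimization thread (hypothetical sketch)."""
    alpha_init = list(alpha_current)                         # S301
    for _ in range(max_iterations):                          # bound is an assumption
        keyframes = get_first_set()                          # S302
        tracking = refine_tracking(keyframes, alpha_init)    # S303
        alpha_opt = optimize_identity(tracking, alpha_init)  # S304
        if has_converged(alpha_opt, alpha_init):             # S305
            send_identity(alpha_opt)                         # S306
            return alpha_opt
        send_clear()                                         # S307
        alpha_init = alpha_opt                               # S308
    # Iteration budget exhausted: hand over the latest estimate.
    send_identity(alpha_init)
    return alpha_init
```

Because the first keyframe set is re-read at the top of every pass, keyframes merged by the tracking thread after a clear instruction become visible to the next iteration without a new invocation.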
According to the face tracking method of the embodiments of the present disclosure, upon invoking the optimization thread, the current facial identity vector used by the tracking thread is determined as the initial facial identity vector, the first keyframe data set is acquired, and the optimized face tracking data is acquired by optimizing the face tracking data in the first keyframe data set based on the initial facial identity vector. The optimized facial identity vector is acquired by performing the iterative optimization on the initial facial identity vector based on the optimized face tracking data, and upon each iteration, whether the iteration stop condition is satisfied is determined based on the optimized facial identity vector and the initial facial identity vector. In response to the iteration stop condition being satisfied, the optimized facial identity vector is sent to the tracking thread, such that the tracking thread determines the received facial identity vector as the current facial identity vector in response to receiving the optimized facial identity vector. In response to the iteration stop condition being not satisfied, the clear instruction for clearing the video frames in the second keyframe data set is sent to the tracking thread, such that the tracking thread updates, in response to the second keyframe set of the second keyframe data set being the non-empty set, the second keyframe set to the first keyframe set upon receiving the clear instruction. The optimized facial identity vector is determined as the initial facial identity vector to continue to optimize the facial identity vector. Thus, the optimization thread uses the keyframe newly added to the second keyframe set to optimize the facial identity vector in the iterative optimization process, and it is not necessary to add a new keyframe to optimize the facial identity vector until each invocation of the optimization thread is ended. 
In this way, a large number of keyframes are quickly extracted to optimize the facial identity vector, which increases the convergence speed of the facial identity vector. Moreover, the optimization thread does not need to be invoked each time a keyframe is detected, such that the number of invocations of the optimization thread is reduced. The facial identity vector is thus optimized in real time with few resources consumed, and more resources are available for a complex optimization algorithm that improves the accuracy of facial identity optimization.
In S401, a current facial identity vector used by a tracking thread is determined as an initial facial identity vector upon invoking the optimization thread.
For example, the current facial identity vector is αpre. Before one invocation of the optimization thread is ended, the tracking thread tracks a face in a received video frame based on the current facial identity vector αpre.
In S402, a first keyframe data set is acquired, wherein the first keyframe data set is a data set updated by the tracking thread upon performing face tracking, and the first keyframe data set includes face tracking data.
In S403, optimized face tracking data is acquired by optimizing the face tracking data in the first keyframe data set based on the initial facial identity vector.
In some embodiments, a three-dimensional face model is constructed based on the initial facial identity vector and the expression data, and a face keypoint of the three-dimensional face model is acquired. Optimized posture data and optimized expression data are acquired as the optimized face tracking data by solving optimal posture data and optimal expression data based on the face keypoint of the three-dimensional face model and the face keypoint in the face tracking data.
In a case that a facial identity vector α is given, the three-dimensional face model Fi of an ith video frame is expressed as:
$$ F_i = F_i(\alpha, \delta) = C_0 + C_{exp}\,\delta \tag{1} $$
For a PCA three-dimensional face model:
$$ F_i = F_i(\alpha, \delta) = B + B_{ID}\,\alpha + B_{exp}(\alpha)\,\delta \tag{2} $$
It is possible to provide that:
$$ C_0 = B + B_{ID}\,\alpha, \qquad C_{exp} = B_{exp}(\alpha); $$
wherein B represents an average face; BID represents an identity shape fusion deformer of the user; and Bexp represents an expression shape fusion deformer designed for the average face B. B, BID, and Bexp are predetermined.
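As a minimal sketch of formulas (1) and (2), the linear model can be evaluated as below. The function names are hypothetical, the face is flattened into a plain list of coordinates, and each matrix is stored as a list of rows with one row per coordinate. The dependence of Bexp on α is not modeled here; the expression basis is simply assumed to be given for the current α.

```python
from typing import List, Sequence


def linear_face(c0: Sequence[float],
                c_exp: Sequence[Sequence[float]],
                delta: Sequence[float]) -> List[float]:
    """Evaluate F = C0 + C_exp * delta for one frame."""
    return [base + sum(w * d for w, d in zip(row, delta))
            for base, row in zip(c0, c_exp)]


def neutral_face(b: Sequence[float],
                 b_id: Sequence[Sequence[float]],
                 alpha: Sequence[float]) -> List[float]:
    """Evaluate C0 = B + B_ID * alpha, the user's neutral face."""
    return linear_face(b, b_id, alpha)


# Toy data: two coordinates, two identity components, one expression component.
c0 = neutral_face([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [2.0, 3.0])
face = linear_face(c0, [[1.0], [-1.0]], [0.5])
```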
For a bilinear three-dimensional face model:
$$ F_i = F_i(\alpha, \delta) = C \otimes_2 \alpha \otimes_3 \delta \tag{3} $$
wherein C represents the expressionless neutral face of the user, and ⊗2 represents a modal product.
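The modal products in formula (3) can be illustrated with plain nested lists. This sketch assumes a core tensor indexed as `core[v][a][e]`, where `v` runs over flattened vertex coordinates, `a` over identity components, and `e` over expression components; the function name and this index layout are assumptions made for illustration.

```python
from typing import List, Sequence


def bilinear_face(core: Sequence[Sequence[Sequence[float]]],
                  alpha: Sequence[float],
                  delta: Sequence[float]) -> List[float]:
    """Contract mode 2 of the core tensor with alpha and mode 3 with delta."""
    return [
        sum(core[v][a][e] * alpha[a] * delta[e]
            for a in range(len(alpha))
            for e in range(len(delta)))
        for v in range(len(core))
    ]


# Toy tensor: one coordinate, two identity components, two expression components.
toy = bilinear_face([[[1.0, 2.0], [3.0, 4.0]]], [1.0, 1.0], [1.0, 0.0])
```

With α fixed, the contraction is linear in δ, which is why the bilinear model can be written in the form of formula (1).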
Optimal Pi,δi is acquired by solving the following optimization equation (4) based on an input face keypoint Qi:
(Pik,δik)=argmin(Σj∥ΠP
wherein k represents a kth iteration; C0k-1 represents a neutral face used in the kth iteration; Cexpk-1 represents an expression shape fusion deformer used in the kth iteration; ΠP
{Qi|Pi,δi};
wherein Qi represents a face keypoint; Pi represents the posture data; and δi represents the expression data.
In S404, a face size of a tracked face is calculated based on the face keypoint.
A minimum circumscribed rectangular frame of the face is determined based on the face keypoint, and a face size fi of the face is calculated based on the face keypoint on the minimum circumscribed rectangular frame.
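One possible way to compute a face size from the keypoints is sketched below. The disclosure does not state which measure of the rectangular frame is used, so this sketch assumes an axis-aligned bounding rectangle of two-dimensional keypoints and takes its diagonal length as the face size; the function name `face_size` is hypothetical.

```python
import math
from typing import Iterable, Tuple


def face_size(keypoints: Iterable[Tuple[float, float]]) -> float:
    """Diagonal of the axis-aligned bounding rectangle (assumed size measure)."""
    pts = list(keypoints)
    if not pts:
        raise ValueError("at least one keypoint is required")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return math.hypot(max(xs) - min(xs), max(ys) - min(ys))
```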
In S405, an expression weight of each keyframe is calculated based on the expression data of each keyframe.
In some embodiments, the first keyframe set includes a plurality of keyframes. The expression data of each of the keyframes is acquired upon tracking the face in the plurality of keyframes, such that minimum expression data is determined from the expression data of all the keyframes; and the expression weight of each of the keyframes is calculated based on a predetermined constant term, the minimum expression data, and the expression data of the keyframe, wherein the expression weight of the keyframe is negatively correlated with the expression data of the keyframe.
The expression weight of each of the keyframes is calculated according to the following formula:
wherein wik represents an expression weight of a keyframe i in the kth iteration; r represents a constant; δik represents expression data of the keyframe i in the kth iteration; l represents the first keyframe set; and
represents the minimum expression data among all the keyframes.
According to the above formula (5), the larger the facial expression in a keyframe, the smaller wik; conversely, the smaller the facial expression, the larger wik. In this way, the influence of a large-expression face on the facial identity vector is reduced.
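The qualitative behavior of the expression weight can be illustrated with a stand-in formula. The form used below, w = r / (r + ‖δ‖ − min‖δ‖) with a Euclidean norm, is an assumption chosen only because it depends on the constant r, the expression data, and the minimum expression data, and decreases as the expression grows; it should not be read as formula (5) itself. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def expression_weights(deltas: Sequence[Sequence[float]], r: float = 1.0) -> List[float]:
    """Assumed weight: r / (r + |delta_i| - min_i |delta_i|), with r > 0."""
    if r <= 0:
        raise ValueError("r must be positive")
    norms = [math.sqrt(sum(d * d for d in delta)) for delta in deltas]
    smallest = min(norms)
    # The keyframe with the smallest expression gets weight 1; larger
    # expressions get strictly smaller weights.
    return [r / (r + n - smallest) for n in norms]


weights = expression_weights([[0.0, 0.0], [3.0, 4.0]], r=1.0)
```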
In S406, the optimized facial identity vector of the face model in the keyframe is acquired by iterative solving based on the face tracking data, the face size, the expression weight, the current facial identity vector, and the initial facial identity vector.
In some embodiments of the present disclosure, an optimization equation is established as follows:
(α,{Pi,δi}i∈l)=argmin(Σi∈lwiΣj∥ΠP
wherein l represents the first keyframe set; β1, β2, and γ represent parameters that are predetermined; and αpre represents the current facial identity vector used by the tracking thread.
In one embodiment of the present disclosure, for the bilinear three-dimensional face model Fi=Fi(α,δ)=C⊗2α⊗3δ, the formula for iteratively optimizing the facial identity vector αk is as follows:
αk=argmin(Σi∈lwikΣj∥ΠP
In another embodiment, for the PCA three-dimensional face model Fi=Fi (α,δ)=B+BIDα+Bexp (α)δ, the formula for iteratively optimizing the facial identity vector αk is as follows:
αk=argmin(Σi∈lwikΣj∥ΠP
wherein l represents the first keyframe set; β1, β2, and β3 represent parameters that are predetermined; and αpre represents the current facial identity vector used by the tracking thread.
In the above equation (7) and equation (8):
The face size fi is introduced into a regular term β1Σi∈lwikfi∥α∥, such that the parameter β1 is self-adapted to different face sizes; a smooth term β2∥α−αpre∥ of the facial identity vector is introduced, such that the overall convergence speed of the facial identity vector is reduced, but the sudden shaking of the face, upon updating the facial identity vector in the tracking thread, is prevented in a special application scene, such as a face change; β3∥α−αk-1∥ is introduced in the optimization process; and Bexp (αk-1) is used to approximate Bexp (αk), such that the optimization equation (6) is applicable to both the PCA three-dimensional face model and the bilinear three-dimensional face model; and in the optimization process, the weight wi of each of the keyframes is dynamically calculated, such that the weight of the large-expression face is reduced, and the influence of a large expression on the facial identity vector is reduced.
In S407, upon each iteration, a face change rate is calculated based on the facial identity vector acquired by current iteration and the facial identity vector acquired by the previous iteration.
An optimized facial identity vector αk and a facial identity vector αk-1 of the previous iteration are acquired after each iteration is ended, and the face change rate is calculated according to the following formula:
wherein F(α) represents a face mesh corresponding to the facial identity vector α, and s represents a diagonal length of a minimum circumscribed rectangle of the three-dimensional average face, that is, the face change rate is acquired by dividing a maximum movement amount among j vertexes of the face mesh by the diagonal length of the minimum circumscribed rectangle of the three-dimensional average face.
In S408, whether the face change rate is less than a predetermined change rate threshold is determined.
That is:
In response to the above formula being true, it is determined that an iterative solving stop condition is satisfied, and S409 is performed. In response to determining that the iterative solving stop condition is not satisfied, S410 is performed.
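The face change rate and the stop test of S407 and S408 follow directly from the description above: the maximum displacement over the mesh vertices is divided by the diagonal length s. The sketch below assumes that the two meshes F(αk) and F(αk-1) are given as equally ordered lists of three-dimensional vertices; the function names and the threshold argument are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vertex = Tuple[float, float, float]


def face_change_rate(mesh_new: Sequence[Vertex],
                     mesh_prev: Sequence[Vertex],
                     s: float) -> float:
    """Maximum per-vertex movement between two meshes, divided by s."""
    if s <= 0:
        raise ValueError("s must be positive")
    max_move = max(math.dist(p, q) for p, q in zip(mesh_new, mesh_prev))
    return max_move / s


def iteration_should_stop(rate: float, threshold: float) -> bool:
    """Stop condition: the change rate is below the predetermined threshold."""
    return rate < threshold


rate = face_change_rate([(0.0, 0.0, 0.0), (1.0, 1.0, 4.0)],
                        [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
                        s=6.0)
```

Normalizing by s makes the threshold independent of the scale of the mesh coordinates.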
In S409, the optimized facial identity vector is sent to the tracking thread, such that the tracking thread determines, in response to receiving the optimized facial identity vector, the received facial identity vector as a current facial identity vector.
The optimized facial identity vector αk is sent to the tracking thread, such that the tracking thread determines, in response to receiving the optimized facial identity vector αk, the received facial identity vector αk as the current facial identity vector to track a video frame.
In some embodiments, the optimization thread further calculates a new user face C0k and a new user face expression shape fusion deformer Cexpk based on the optimized facial identity vector αk, and tracks a video frame based on the optimized facial identity vector αk, the user face C0k, and the user face expression shape fusion deformer Cexpk.
The frame vector of each of the keyframes is also updated based on the expression data and posture data of the optimized keyframe in the first keyframe data set; and a first PCA subspace is updated based on the frame vector of each of the keyframes, such that the first PCA subspace is more accurate, and the keyframe is more accurately detected based on the first PCA subspace.
In S410, a clear instruction for clearing a video frame in a second keyframe data set is sent to the tracking thread, such that the tracking thread updates, in response to the second keyframe set of the second keyframe data set being a non-empty set, the second keyframe data set to the first keyframe data set upon receiving the clear instruction.
In S411, the optimized facial identity vector is determined as an initial facial identity vector, and S402 is re-performed.
According to the face tracking method of the embodiments of the present disclosure, upon each iteration, whether the iteration stop condition is satisfied is determined based on the optimized facial identity vector and the initial facial identity vector. In response to the iteration stop condition being satisfied, the optimized facial identity vector is sent to the tracking thread, such that the tracking thread determines the received facial identity vector as the current facial identity vector. In response to the iteration stop condition being not satisfied, the clear instruction for clearing the video frame of the second keyframe data set is sent to the tracking thread, such that upon receiving the clear instruction, the tracking thread updates, in response to the second keyframe set being the non-empty set, the second keyframe set to the first keyframe set, and clears the second keyframe set. The optimized facial identity vector is determined as the initial facial identity vector, and the iterative solving of the facial identity vector continues. Thus, the optimization thread uses the keyframe newly added to the second keyframe set to optimize the facial identity vector in the iterative optimization process, and it is not necessary to add a new keyframe to optimize the facial identity vector until each invocation of the optimization thread is ended. In this way, a large number of keyframes are quickly extracted to optimize the facial identity vector, which increases the convergence speed of the facial identity vector. Moreover, the optimization thread does not need to be invoked each time a keyframe is detected, such that the number of invocations of the optimization thread is reduced. The facial identity vector is thus optimized in real time with few resources consumed, and more resources are available for a complex optimization algorithm that improves the accuracy of facial identity optimization.
The face size fi is introduced into the optimization equation, such that self-adaptation to different face sizes is achieved. The smooth term β2∥α−αpre∥ of the facial identity vector is introduced, such that although the overall convergence speed of the facial identity vector is reduced, the sudden shaking of the face upon updating the facial identity vector in the tracking thread is prevented in special application scenes, such as the face change. β3∥α−αk-1∥ is introduced in the optimization process, and Bexp (αk-1) is used to approximate Bexp (αk), such that the optimization equation is applicable to both the PCA three-dimensional face model and the bilinear three-dimensional face model. In the optimization process, the expression weight of each of the keyframes is dynamically calculated, such that the weight of the large-expression face is reduced, and the influence of a large expression on the facial identity vector is reduced.
an optimization thread running determining module 501, configured to determine, in a process of tracking a face in a video frame, whether an optimization thread is running; a second keyframe data set updating module 502, configured to, in response to the optimization thread running and the video frame being a keyframe, update the second keyframe data set based on the video frame; a clearing module 503, configured to, in response to receiving a clear instruction for clearing a video frame in the second keyframe data set from the optimization thread, clear the video frame in the second keyframe data set, and update the second keyframe data set to the first keyframe data set; a first keyframe data set updating module 504, configured to, in response to the optimization thread not running and the video frame being the keyframe, update the first keyframe data set based on the video frame and the second keyframe data set; and an optimization thread invoking module 505, configured to make the optimization thread optimize a facial identity based on the first keyframe data set by invoking the optimization thread upon updating the first keyframe data set.
The face tracking apparatus according to the embodiment of the present disclosure performs the face tracking methods according to the first embodiment and the second embodiment of the present disclosure, and the face tracking apparatus has corresponding functional modules and effects for performing the face tracking method.
a facial identity vector initializing module 601, configured to determine a current facial identity vector used by a tracking thread as an initial facial identity vector upon invoking the optimization thread; a first keyframe data set acquiring module 602, configured to acquire a first keyframe data set, wherein the first keyframe data set is a data set updated by the tracking thread upon performing face tracking, and the first keyframe data set includes face tracking data; a face tracking data optimizing module 603, configured to acquire optimized face tracking data by optimizing the face tracking data in the first keyframe data set based on the initial facial identity vector; a facial identity vector optimizing module 604, configured to acquire an optimized facial identity vector by performing iterative optimization on the initial facial identity vector based on the optimized face tracking data; an iteration stop determining module 605, configured to, upon each iteration, determine, based on the optimized facial identity vector and the initial facial identity vector, whether an iteration stop condition is satisfied; an iteration stopping module 606, configured to, in response to the iteration stop condition being satisfied, make the tracking thread determine, in response to receiving the optimized facial identity vector, a received facial identity vector as the current facial identity vector by sending the optimized facial identity vector to the tracking thread; a clear instruction sending module 607, configured to, in response to the iteration stop condition being not satisfied, make the tracking thread update, in response to a second keyframe set of the second keyframe data set being a non-empty set, the second keyframe data set to the first keyframe data set upon receiving the clear instruction by sending the clear instruction for clearing a video frame in a second keyframe data set to the tracking thread; and an initial facial identity vector updating module 608, configured to determine the optimized facial identity vector as the initial facial identity vector, and return to the first keyframe data set acquiring module 602.
The face tracking apparatus according to the embodiment of the present disclosure performs the face tracking methods according to the third embodiment and the fourth embodiment of the present disclosure, and the face tracking apparatus has corresponding functional modules and effects for performing the face tracking method.
The embodiments of the present disclosure further provide a computer-readable storage medium. One or more instructions are stored in the computer-readable storage medium. The one or more instructions, when executed by a processor of a device, cause the device to perform the face tracking methods as defined in the foregoing method embodiments. The computer-readable storage medium is a non-transitory storage medium.
For the embodiments in regard to the apparatus, the electronic device, and the storage medium, since they are substantially similar to the method embodiments, the descriptions are relatively simple, and for relevant parts, reference is made to the corresponding descriptions of the method embodiments.
In the description of the specification, the description referring to the terms “one embodiment,” “some embodiments,” “an example,” “specific examples,” “some examples,” or the like means that specific features, structures, materials or characteristics described in connection with the embodiments or examples are included in at least one embodiment or example of the present disclosure. In the specification, the schematic representations of the above terms are not necessarily intended to refer to the same embodiment or example. In addition, the specific features, structures, materials or characteristics described are combined in any suitable manner in any one or more embodiments or examples.
Number | Date | Country | Kind |
---|---|---|---|
202110007729.1 | Jan 2021 | CN | national |
This application is a U.S. national stage of international application No. PCT/CN2022/070133, filed on Jan. 4, 2022, which claims priority to Chinese Patent Application No. 202110007729.1, filed on Jan. 5, 2021, the disclosures of which are incorporated herein by reference in their entireties.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2022/070133 | 1/4/2022 | WO |