The following description relates to a facial expression image processing method and apparatus.
Human facial expression identification is widely used in areas such as entertainment, security, human-robot interaction, and social network analysis. Training an expression identification model to identify a human facial expression generally requires training data including a great number of expression images. However, collecting expression images is relatively costly in terms of time, processing, storage, and data transfer resources, and has many limitations. To address these issues, a great number of expression images may be generated using a synthesizing method to train an expression identification model.
However, an expression change is not linear. When the synthesizing method used to generate the expression images acquires a new expression feature by directly enlarging or reducing the magnitude of an expression feature already provided, an expression image based on the new expression feature acquired in this way is relatively awkward, and greater in size than a real expression image. Thus, with typical technological approaches, a synthesized facial expression image may look unnatural, and the accuracy of the ultimate recognition may be significantly degraded.
When an expression is analyzed, an expression space including an expression type (category) and all facial expression conditions may be set in advance. For example, an expression space includes expression types (categories) such as a type C1, a type C2, and a type C3. Here, similar to the above, with typical technological approaches, when an analyzed expression feature α is associated with the type C1 and a control constant k is greater than 1 or close to 0, kα may easily exceed a range of the type C1 such that a corresponding human facial expression may look even more unnatural.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
According to a general aspect, a processor implemented method of processing a facial expression image includes acquiring an expression feature of each of at least two reference facial expression images; generating a new expression feature based on an interpolation value of the expression feature; and adjusting a target facial expression image based on the new expression feature and creating a new facial expression image.
The acquiring of the expression feature may include analyzing facial key point coordinates of the at least two reference facial expression images; acquiring an expression fitting result by fitting the facial key point coordinates; and setting the expression fitting result as the expression feature.
The method may further include setting one of the at least two reference facial expression images as the target facial expression image.
The generating of the new expression feature may include: acquiring an expression feature of the target facial expression image; and generating the new expression feature based on the expression feature of the target facial expression image and the expression feature of each of the at least two reference facial expression images.
The generating of the new expression feature may include: acquiring an interpolation value parameter of the expression feature; and generating the new expression feature based on the interpolation value parameter.
The acquiring of the interpolation value parameter may include substantially randomly generating the interpolation value parameter.
The generating of the new expression feature may include generating the new expression feature based on the following equation:
α_{n+1} = c_1 α_1 + c_2 α_2 + ... + c_i α_i + ... + c_n α_n
wherein α_{n+1} denotes a new expression feature, c_i denotes an interpolation value parameter, 1 ≤ i ≤ n, 0 < c_i < 1, and c_1 + c_2 + ... + c_i + ... + c_n = 1 are satisfied, α_i denotes an expression feature of a reference expression image i, n denotes a number of reference expression images, and n is greater than or equal to 2.
The expression feature of each of the at least two reference facial expression images may be associated with an expression feature of a substantially identical type.
According to another general aspect, a facial expression image processing apparatus includes an expression feature acquirer configured to acquire an expression feature of each of at least two reference facial expression images; an expression feature interpolator configured to generate a new expression feature based on an interpolation value of the expression feature; and an expression image synthesizer configured to adjust a target facial expression image based on the new expression feature and create a new facial expression image.
The expression feature acquirer may be configured to analyze facial key point coordinates of the at least two reference facial expression images, acquire an expression fitting result by fitting the facial key point coordinates, and set the expression fitting result as the expression feature.
The apparatus may further include a target image acquirer configured to set one of the at least two reference facial expression images as the target facial expression image.
The expression feature interpolator may be configured to acquire an expression feature of the target facial expression image and generate the new expression feature based on the expression feature of the target facial expression image and the expression feature of each of the at least two reference facial expression images.
The expression feature interpolator may include: a parameter acquirer configured to acquire an interpolation value parameter of the expression feature; and an interpolator configured to acquire the new expression feature based on the interpolation value parameter.
The parameter acquirer may be configured to substantially randomly acquire the interpolation value parameter.
The interpolator may be configured to acquire the new expression feature based on the following equation:
α_{n+1} = c_1 α_1 + c_2 α_2 + ... + c_i α_i + ... + c_n α_n
wherein α_{n+1} denotes a new expression feature, c_i denotes an interpolation value parameter, 1 ≤ i ≤ n, 0 < c_i < 1, and c_1 + c_2 + ... + c_i + ... + c_n = 1 are satisfied, α_i denotes an expression feature of a reference expression image i, n denotes a number of reference expression images, and n is greater than or equal to 2.
The expression feature of each of the at least two reference facial expression images may be associated with an expression feature of a substantially identical type.
A non-transitory computer readable storage medium may store instructions that, when executed by the processor, cause the processor to perform the method.
According to another general aspect, a processor implemented method of processing a facial expression image to train a facial recognition device includes: acquiring an expression feature of each of at least two reference facial expression images; generating synthesized images, the synthesized images respectively comprising at least a portion of either one or both of the two reference facial expression images and a new expression feature based on an interpolation value of the expression feature; training a neural network with the synthesized images to recognize facial expressions; recognizing a user's facial expression based on the trained neural network and an image of the user captured by a camera; and interacting with the user based upon the recognized user's facial expression.
Other features and aspects will be apparent from the following detailed description, the drawings, and the claims.
Throughout the drawings and the detailed description, the same reference numerals refer to the same or like elements. The drawings may not be to scale, and the relative size, proportions, and depiction of elements in the drawings may be exaggerated for clarity, illustration, and convenience.
The following detailed description is provided to assist the reader in gaining a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, various changes, modifications, and equivalents of the methods, apparatuses, and/or systems described herein will be apparent after an understanding of the disclosure of this application. For example, the sequences of operations described herein are merely examples, and are not limited to those set forth herein, but may be changed as will be apparent after an understanding of the disclosure of this application, with the exception of operations necessarily occurring in a certain order. Also, descriptions of features that are known in the art may be omitted for increased clarity and conciseness.
The features described herein may be embodied in different forms, and are not to be construed as being limited to the examples described herein. Rather, the examples described herein have been provided merely to illustrate some of the many possible ways of implementing the methods, apparatuses, and/or systems described herein that will be apparent after an understanding of the disclosure of this application. Hereinafter, reference will now be made in detail to examples with reference to the accompanying drawings, wherein like reference numerals refer to like elements throughout.
Various alterations and modifications may be made to the examples. Here, the examples are not to be construed as limited to the disclosure and should be understood to include all changes, equivalents, and replacements within the idea and the technical scope of the disclosure.
The terminology used herein is for the purpose of describing particular examples only and is not to be limiting of the examples. As used herein, the singular forms “a”, “an”, and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “include/comprise” and/or “have” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, components, and/or combinations thereof, but do not preclude the presence or addition of one or more other features, numbers, steps, operations, elements, components, and/or groups thereof.
Unless otherwise defined, all terms including technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which examples belong and in view of the present disclosure. It will be further understood that terms, such as those defined in commonly-used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
When describing the examples with reference to the accompanying drawings, like reference numerals refer to like constituent elements and a repeated description related thereto will be omitted. When it is determined that a detailed description of a related known function or configuration may make the purpose of the examples unnecessarily ambiguous, the detailed description will be omitted here for clarity and conciseness.
The expression image synthesizing method may be classified, for example, into two types.
For example, a first type of the expression image synthesizing method may directly control an expression image and acquire a new expression image by either one or both of moving one or more pixel positions and changing one or more pixel luminance values of the expression image. The new expression image acquired based on the first type of the method may be significantly different from an image acquired by photographing a real expression.
A second type of the expression image synthesizing method may separately analyze an expression feature of an expression image, acquire a new expression feature based on the expression feature obtained through a correction analysis, and acquire a new expression image by fusing the new expression feature and an original expression image. The new expression image acquired based on the second type of the method may be similar to a real expression image.
An expression feature α of a human facial expression image may be analyzed, a new expression feature may be acquired by setting a control constant k and calculating an expression feature kα, and a new human facial expression image may be created by adjusting the human facial expression image.
In response to the control constant k being greater than 1, the expression feature kα may be associated with a stronger expression than the expression feature α. For example, a laughing expression is stronger than a smiling expression.
In response to the control constant k being greater than 0 and less than 1, the expression feature kα may be associated with a weaker expression than the expression feature α. In response to k corresponding to 0.5, an example of the new human facial expression image created based on the second type of the expression image synthesizing method is as illustrated in
Referring to
In operation 101, an image processing apparatus acquires an expression feature of each of at least two reference facial expression images. A reference facial expression image may be understood as an image containing a standard facial expression such as a smile, frown, grimace, or other standard facial expression. The expression features of the at least two reference facial expression images may be, depending on the embodiment, associated with a substantially identical type. When the at least two reference facial expression images are associated with at least two types of expression features, unique facial expression images may be synthesized.
In operation 102, the image processing apparatus acquires a new expression feature based on an interpolation value of the expression feature of each of the at least two reference facial expression images.
In an example, an interpolation value parameter is heuristically acquired, for example, by aggregate analysis, by back propagation or deep learning employing a neural network, or by seeding a pseudorandom function, depending on the embodiment, and a new expression feature is acquired or generated based on the interpolation value parameter. Even when the reference facial expression images are substantially identical, different interpolation value parameters are substantially randomly acquired (such as by pseudorandom generation) whenever facial expression images are synthesized, and thus different facial expression images may be synthesized by acquiring different new expression features. An interpolation value parameter may be a value preset by a user, determined by a neural network or deep learning, programmatically acquired, or generated based on heuristic analysis, depending on the embodiment.
In another example, a new expression feature is acquired based on an expression feature interpolation value of each of at least two reference facial expression images using Equation 1, for example.
α_{n+1} = c_1 α_1 + c_2 α_2 + ... + c_i α_i + ... + c_n α_n [Equation 1]
Here, α_{n+1} denotes a new expression feature, c_i denotes an interpolation value parameter, 1 ≤ i ≤ n, 0 < c_i < 1, and c_1 + c_2 + ... + c_i + ... + c_n = 1 are satisfied, α_i denotes an expression feature of the reference expression image i, n denotes a number of reference expression images, and n is greater than or equal to 2.
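As a minimal sketch of Equation 1, and not the implementation of the disclosure, the following Python code draws pseudorandom interpolation value parameters that satisfy the stated constraints and forms the weighted sum of the reference expression features. The function names `random_convex_weights` and `interpolate_features`, the representation of an expression feature as a list of floats, and the normalization used to obtain the weights are assumptions made for illustration only.

```python
import random


def random_convex_weights(n, rng=None):
    """Draw n weights c_i with 0 < c_i < 1 that sum to 1 (requires n >= 2)."""
    if n < 2:
        raise ValueError("at least two reference expression features are required")
    rng = rng or random.Random()
    # Assumption: normalizing strictly positive draws is one simple way to
    # obtain weights that lie strictly inside (0, 1) and sum to 1.
    raw = [rng.uniform(1e-6, 1.0) for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


def interpolate_features(features, weights):
    """Return the weighted sum of equally sized feature vectors (Equation 1)."""
    if len(features) != len(weights):
        raise ValueError("one weight is required per reference feature")
    if any(not 0.0 < c < 1.0 for c in weights):
        raise ValueError("each weight must lie strictly between 0 and 1")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    dim = len(features[0])
    return [sum(c * f[k] for c, f in zip(weights, features)) for k in range(dim)]
```

Passing a seeded `random.Random` instance makes the drawn weights reproducible, while omitting it yields different weights, and therefore different new expression features, on each call.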
In still another example, one of at least two reference facial expression images is set as a target facial expression image. The target facial expression image is closely related to a new expression feature. The facial expression image acquired or generated using the target facial expression image may be significantly more natural and realistic based on objective features and subjective human/deep learning analyses.
At least two reference facial expression images may be independent of the target facial expression image, according to embodiment. For example, the target facial expression image may not include a smiling expression, and the at least two reference facial expression images may include a laughing expression. Here, a new expression feature of the laughing expression may be generated based on a value of expression feature difference between the at least two reference facial expression images, and a smiling expression may be included in the target facial expression image by adjusting the target facial expression image based on the new expression feature, according to embodiment.
Hereinafter, a detailed description of an interpolation value and a new expression feature is further provided.
An expression feature of one target facial expression image and an expression feature of one reference facial expression image may be acquired. That is, an expression feature of each of two reference facial expression images may be acquired, and one of the two reference facial expression images may be set as the target facial expression image. Also, a new expression feature may be acquired based on an interpolation value of the expression feature of the target facial expression image and the expression feature of the reference facial expression image using Equation 2, for example.
α_3 = c α_1 + (1 − c) α_2 [Equation 2]
In Equation 2, c denotes an interpolation value parameter, c is greater than 0 and less than 1, α_3 denotes a new expression feature, α_1 denotes an expression feature of a target facial expression image, and α_2 denotes an expression feature of a reference facial expression image. Here, c may be, for example, 0.1, 0.26, 0.5, 0.72, 0.88, or 0.98.
When the facial expression images are synthesized, new, different (according to embodiment) expression features may be acquired by setting different interpolation value parameters, and the new different facial expression images may be synthesized even when the target facial expression image is identical to the reference facial expression image. A great number of facial expression images may be synthesized by setting the different interpolation value parameters.
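The effect of varying the interpolation value parameter can be sketched as follows. This is an illustrative example only: the function name `interpolate_pair` and the two-element feature vectors are hypothetical stand-ins, while the values of c are the example values listed for Equation 2.

```python
def interpolate_pair(alpha_target, alpha_reference, c):
    """Two-feature interpolation: c * alpha_1 + (1 - c) * alpha_2, with 0 < c < 1."""
    if not 0.0 < c < 1.0:
        raise ValueError("c must be greater than 0 and less than 1")
    return [c * a1 + (1.0 - c) * a2 for a1, a2 in zip(alpha_target, alpha_reference)]


# Hypothetical expression features of a target image and a reference image.
alpha_1 = [0.2, 0.8]
alpha_2 = [0.6, 0.4]

# Each value of c yields a distinct new expression feature from the same pair.
for c in (0.1, 0.26, 0.5, 0.72, 0.88, 0.98):
    print(c, interpolate_pair(alpha_1, alpha_2, c))
```

Because every value of c produces a different point between the two features, a single pair of images can seed many synthesized training samples.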
An expression feature of one target facial expression image, an expression feature of a first reference facial expression image, and an expression feature of a second reference facial expression image may be acquired. That is, an expression feature of each of three reference facial expression images may be acquired, one of the three reference facial expression images may be set as the target facial expression image, and the other two reference facial expression images may be set as the first reference facial expression image and the second reference facial expression image. Also, a new expression feature may be acquired based on an interpolation value of the expression feature of the target facial expression image, the expression feature of the first reference facial expression image, and the expression feature of the second reference facial expression image using Equation 3, for example.
α_4 = c_1 α_1 + c_2 α_2 + c_3 α_3 [Equation 3]
Here, c_1, c_2, and c_3 denote interpolation value parameters, c_1 + c_2 + c_3 = 1, 0 < c_1 < 1, 0 < c_2 < 1, and 0 < c_3 < 1 are satisfied, α_4 denotes a new expression feature, α_1 denotes an expression feature of a target facial expression image, α_2 denotes an expression feature of a first reference facial expression image, and α_3 denotes an expression feature of a second reference facial expression image.
In operation 103, the image processing apparatus creates the new facial expression image by adjusting the target facial expression image based on the acquired new expression feature.
Although a number of expression features of reference facial expression images and interpolation value parameters are predetermined for ease of description, the foregoing is not limited thereto. The number of expression features of reference facial expression images and the interpolation value parameters may vary.
More types of interpolation equations may be easily conceived, and a greater number of example embodiments may be generated based on an interpolation equation, by those skilled in the art by combining the descriptions of the present disclosure, common general knowledge, and published technologies. When such example embodiments acquire a new expression feature based on different expression feature interpolation values, the example embodiments all belong to the scope of protection.
Referring to
In operation 202, the image processing apparatus acquires an expression fitting result α1 by fitting the facial key point coordinates of the target facial expression image in a preset expression space, and sets the expression fitting result α1 as an expression feature of the target facial expression image. Here, the operation of acquiring the expression fitting result α1 by fitting the facial key point coordinates of the target facial expression image in the preset expression space may include the following operations.
In operation 11, a preset angle θ is acquired.
In operation 12, f(θ)=IsBs+αeBe is calculated. Here, Bs={bs1, bs2, . . . , bsi, . . . , bsn}, Is={as1, as2, . . . , asi, . . . , asn}, asi=R(x1; θ)·bsi·cos<R(x1; θ), bsi>, Be={be1, be2, . . . , bei, . . . , ben}, αe={ae1, ae2, . . . , aei, . . . , aen}, and aei=R(x1; θ)·bei·cos<R(x1; θ), bei> are satisfied.
In operation 13, ∥x1−f(θ)∥ is calculated and ∥x1−f(θ)∥=p1 is set.
In operation 14, θ−d*f′(θ) is defined as a new θ.
In operation 15, f(θ)=IsBs+αeBe is calculated again using the new θ. Here, Bs={bs1, bs2, . . . , bsi, . . . , bsn}, Is={as1, as2, . . . , asi, . . . , asn}, asi=R(x1; θ)·bsi·cos<R(x1; θ), bsi>, Be={be1, be2, . . . , bei, . . . , ben}, αe={ae1, ae2, . . . , aei, . . . , aen}, and aei=R(x1; θ)·bei·cos<R(x1; θ), bei> are satisfied.
In operation 16, ∥x1−f(θ)∥ is calculated based on f(θ) calculated in operation 15.
In operation 17, it is verified whether ∥x1−f(θ)∥−p1≤g is satisfied.
Here, in response to ∥x1−f(θ)∥−p1≤g being satisfied, α1={a11, a12, . . . , a1i, . . . , a1n}=αe and αe={ae1, ae2, . . . , aei, . . . , aen} are satisfied, and θ indicates an angle value of rebuilt new coordinates.
In response to ∥x1−f(θ)∥−p1>g being satisfied, p1 is set to ∥x1−f(θ)∥ and the process returns to operation 14.
Here, θ denotes a facial posture parameter, f(θ) denotes a column vector function of facial key point coordinates calculated based on a fitting coefficient, f′(θ) denotes a derivative of the column vector function f(θ), x1 denotes a column vector of the facial key point coordinates of the target facial expression image, Bs denotes a pre-trained facial identification principal component analysis (PCA) model, Is denotes a fitting coefficient of the facial identification PCA model, bsi denotes an i-th dimension element of the facial identification PCA model, n denotes a number of model dimensions, asi denotes an inner product of the two vectors R(x1; θ) and bsi, R(x1; θ) denotes new coordinates obtained in response to the column vector x1 rotating based on the facial posture parameter θ, cos<R(x1; θ), bsi> denotes a cosine value of an angle between R(x1; θ) and bsi, Be denotes a pre-trained facial expression PCA model, αe denotes a fitting coefficient of the facial expression PCA model, bei denotes an i-th dimension element of the facial expression PCA model, aei denotes an inner product of the two vectors R(x1; θ) and bei, cos<R(x1; θ), bei> denotes a cosine value of an angle between R(x1; θ) and bei, ∥x1−f(θ)∥ denotes a norm of a difference between corresponding elements of the column vector x1 and the column vector function f(θ), d denotes a preset control value, g denotes a preset value, and p1 denotes a norm corresponding to θ.
The coordinates of each facial key point are associated with a three-dimensional (3D) column vector, and x1 may be a column vector of 3×68 rows. The preset value g is a value close to 0 and set to be, for example, 0.01, 0.001, or 0.0001, depending on an actual application.
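The iterative fitting of operations 11 through 17 can be sketched in Python as follows. This is a simplified illustration under several assumptions that are not part of the description above: the key points are two-dimensional and stored as a flat list, the facial posture parameter θ is a single in-plane rotation angle, the basis vectors are orthonormal, the update step uses a numerically estimated slope of the residual norm in place of f′(θ), the stopping test compares the absolute change of the norm with g, and `max_iters` is a hypothetical safeguard against non-termination. All function names are hypothetical.

```python
import math


def rotate(points, theta):
    """Rotate flattened 2D points [x0, y0, x1, y1, ...] by the angle theta."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rotated = []
    for i in range(0, len(points), 2):
        x, y = points[i], points[i + 1]
        rotated.extend([cos_t * x - sin_t * y, sin_t * x + cos_t * y])
    return rotated


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def reconstruct(x, theta, basis_id, basis_expr):
    """Compute f(theta) = Is*Bs + alpha_e*Be and return it with alpha_e."""
    r = rotate(x, theta)
    coef_id = [dot(r, b) for b in basis_id]      # fitting coefficients Is
    coef_expr = [dot(r, b) for b in basis_expr]  # fitting coefficients alpha_e
    f = [0.0] * len(x)
    for coef, vec in zip(coef_id + coef_expr, basis_id + basis_expr):
        for k in range(len(f)):
            f[k] += coef * vec[k]
    return f, coef_expr


def residual_norm(x, theta, basis_id, basis_expr):
    f, _ = reconstruct(x, theta, basis_id, basis_expr)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, f)))


def fit_expression(x, basis_id, basis_expr, theta=0.0, d=0.1, g=1e-3,
                   max_iters=1000, eps=1e-6):
    """Iterate theta until the residual norm changes by at most g."""
    p = residual_norm(x, theta, basis_id, basis_expr)
    for _ in range(max_iters):
        # Assumption: central-difference slope of the residual norm.
        slope = (residual_norm(x, theta + eps, basis_id, basis_expr)
                 - residual_norm(x, theta - eps, basis_id, basis_expr)) / (2 * eps)
        theta = theta - d * slope
        new_p = residual_norm(x, theta, basis_id, basis_expr)
        if abs(new_p - p) <= g:
            break
        p = new_p
    _, alpha = reconstruct(x, theta, basis_id, basis_expr)
    return alpha, theta
```

The returned `alpha` plays the role of the expression fitting result, and `theta` is the posture parameter at which the loop stopped.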
An expression feature may be acquired through non-fitting that does not perform fitting on the facial key point coordinates, other than the above-described methods. Also, an expression feature of a facial expression image may be acquired based on other methods, for example, an image scanning method or other related technologies, without using the facial key point coordinates.
Operations 201 and 202 are example processes for acquiring the expression feature of the target facial expression image.
Operations 203 and 204 are processes for acquiring the expression feature of the reference facial expression image. Operations 201, 202, 203, and 204 may have no sequential relation.
In operation 203, the image processing apparatus analyzes the facial key point coordinates of the reference facial expression image to acquire the expression feature of the reference facial expression image.
In operation 204, the image processing apparatus acquires an expression fitting result α2 by fitting the facial key point coordinates of the reference facial expression image in a preset expression space, and sets the expression fitting result α2 as the expression feature of the reference facial expression image. In an example, the facial key point coordinates of the reference facial expression image in a preset expression space are manually set in advance.
Here, an operation of acquiring the expression fitting result α2 by fitting the facial key point coordinates of the reference facial expression image in the preset expression space may include the following operations.
In operation 21, a preset value θ is acquired.
In operation 22, f(θ)=IsBs+αeBe is calculated. Here, Bs={bs1, bs2, . . . , bsi, . . . , bsn}, Is={as1, as2, . . . , asi, . . . , asn}, asi=R(x2; θ)·bsi·cos<R(x2; θ), bsi>, Be={be1, be2, . . . , bei, . . . , ben}, αe={ae1, ae2, . . . , aei, . . . , aen}, and aei=R(x2; θ)·bei·cos<R(x2; θ), bei> are satisfied.
In operation 23, ∥x2−f(θ)∥ is calculated and ∥x2−f(θ)∥=p2 is set.
In operation 24, θ−d*f′(θ) is defined as a new θ.
In operation 25, f(θ)=IsBs+αeBe is calculated again based on the new θ. Here, Bs={bs1, bs2, . . . , bsi, . . . , bsn}, Is={as1, as2, . . . , asi, . . . , asn}, asi=R(x2; θ)·bsi·cos<R(x2; θ), bsi>, Be={be1, be2, . . . , bei, . . . , ben}, αe={ae1, ae2, . . . , aei, . . . , aen}, and aei=R(x2; θ)·bei·cos<R(x2; θ), bei> are satisfied.
In operation 26, ∥x2−f(θ)∥ is calculated based on f(θ) calculated in operation 25.
In operation 27, it is verified whether ∥x2−f(θ)∥−p2≤g is satisfied.
Here, in response to ∥x2−f(θ)∥−p2≤g being satisfied, α2={a21, a22, . . . , a2i, . . . , a2n}=αe and αe={ae1, ae2, . . . , aei, . . . , aen} are satisfied.
In response to ∥x2−f(θ)∥−p2>g being satisfied, p2 is set to ∥x2−f(θ)∥ and the process returns to operation 24.
Here, θ denotes a facial posture parameter, f(θ) denotes a column vector function of facial key point coordinates calculated based on a fitting coefficient, f′(θ) denotes a derivative of the column vector function f(θ), x2 denotes a column vector of the facial key point coordinates of the reference facial expression image, Bs denotes a pre-trained facial identification principal component analysis (PCA) model, Is denotes a fitting coefficient of the facial identification PCA model, bsi denotes an i-th dimension element of the facial identification PCA model, n denotes a number of model dimensions, asi denotes an inner product of the two vectors R(x2; θ) and bsi, R(x2; θ) denotes new coordinates obtained in response to the column vector x2 rotating based on the facial posture parameter θ, cos<R(x2; θ), bsi> denotes a cosine value of an angle between R(x2; θ) and bsi, Be denotes a pre-trained facial expression PCA model, αe denotes a fitting coefficient of the facial expression PCA model, bei denotes an i-th dimension element of the facial expression PCA model, aei denotes an inner product of the two vectors R(x2; θ) and bei, cos<R(x2; θ), bei> denotes a cosine value of an angle between R(x2; θ) and bei, ∥x2−f(θ)∥ denotes a norm of a difference between corresponding elements of the column vector x2 and the column vector function f(θ), d denotes a preset control value, g denotes a preset value, and p2 denotes a norm corresponding to θ.
An expression feature may be acquired through non-fitting that does not perform fitting on the facial key point coordinates, other than the above-described methods. Also, an expression feature of a facial expression image may be acquired based on other methods, for example, an image scanning method or other related technologies, without using the facial key point coordinates.
Referring to
α_3 = c α_1 + (1 − c) α_2 [Equation 4]
In Equation 4, c denotes an interpolation value parameter, c is greater than 0 and less than 1, α_3 denotes the new expression feature, α_1 denotes an expression feature of a target facial expression image, and α_2 denotes an expression feature of a reference facial expression image.
A preset expression space S includes seven types of expressions, including, for example, joy, grief, anger, irritation, fear, and a neutral blank expression, though other types of expressions are also possible and contemplated herein. A space occupied by a portion of expression in the expression space S is defined as a target space of the types of expressions. For example, the expression features α1 and α2 may be associated with a substantially identical type of expression and a substantially identical essential space. Each point on the line connecting the expression features α1 and α2 may be present within a convex space including a minimum target space.
Referring to
Here, an image E2 is generated and acquired by extending the expression feature α1. However, the image E2 is distanced from the essential space associated with an expression.
An image E1 is generated and acquired based on the above-described method. In comparison with the image E2 acquired based on the related technology, the image E1 is closer to the target space within the convex space including the minimum target space, and an effect of a synthesized facial expression image may be relatively natural.
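The contrast between extending a single feature and interpolating between two features can be illustrated with a small Python sketch. The feature values, the function names, and the component-wise range check are assumptions for illustration: the range check is only a coarse stand-in for membership in the convex space, since the bounding box of the reference features contains their convex hull but is larger than it.

```python
def scale_feature(alpha, k):
    """Extend or shrink a single expression feature by a control constant k."""
    return [k * a for a in alpha]


def interpolate_pair(alpha_1, alpha_2, c):
    """Interpolate between two expression features with 0 < c < 1."""
    return [c * a1 + (1.0 - c) * a2 for a1, a2 in zip(alpha_1, alpha_2)]


def inside_reference_box(feature, references):
    """Coarse check: is every component within the range spanned by the references?"""
    for k, value in enumerate(feature):
        low = min(r[k] for r in references)
        high = max(r[k] for r in references)
        if not low <= value <= high:
            return False
    return True


# Hypothetical expression features of the same expression type.
alpha_1 = [0.2, 0.8]
alpha_2 = [0.6, 0.4]
references = [alpha_1, alpha_2]

e1_feature = interpolate_pair(alpha_1, alpha_2, 0.5)  # interpolation
e2_feature = scale_feature(alpha_1, 2.0)              # extension with k = 2

print(inside_reference_box(e1_feature, references))  # True
print(inside_reference_box(e2_feature, references))  # False
```

In this toy case the interpolated feature stays between the two references, whereas the extended feature leaves the region they span.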
Referring to
New facial key point coordinates f(θ′) are as shown, for example, in Equation 5.
f(θ′)=IsBs+α3Be [Equation 5]
In Equation 5, Bs={bs1, bs2, . . . , bsi, . . . , bsn}, Is={as1, as2, . . . , asi, . . . , asn}, asi=R(x1′; θ′)·bsi·cos<R(x1′; θ′), bsi>, Be={be1, be2, . . . , bei, . . . , ben}, α3={a31, a32, . . . , a3i, . . . , a3n}, and a3i=R(x1′; θ′)·bei·cos<R(x1′; θ′), bei> are satisfied, and θ′ indicates an angle value of rebuilt new coordinates.
In operation 207, the image processing apparatus creates a new facial expression image by adjusting a pixel position of a target facial expression image based on a difference between new facial key point coordinates and facial key point coordinates of the target facial expression image.
The new facial expression image may be created by moving a pixel position corresponding to the facial key point coordinates of the target facial expression image to a pixel position corresponding to the new facial key point coordinates. A pixel movement is associated with the related technology and the related description is not provided further herein for clarity and conciseness.
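A deliberately naive sketch of this pixel movement is given below. It only copies the pixel value at each original key point position to the corresponding new key point position, and it assumes an image stored as a list of rows and key points given as integer (row, column) pairs. The function name `move_key_point_pixels` is hypothetical, and a practical implementation would likely also warp the pixels between key points, which this sketch does not attempt.

```python
def move_key_point_pixels(image, old_points, new_points):
    """Copy pixels from old key point positions to new key point positions.

    image: list of rows, each a list of pixel values.
    old_points, new_points: equally long lists of (row, col) integer pairs.
    Returns a new image; the input image is left unchanged.
    """
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for (r0, c0), (r1, c1) in zip(old_points, new_points):
        # Skip key points that would move outside the image.
        if 0 <= r1 < height and 0 <= c1 < width:
            result[r1][c1] = image[r0][c0]
    return result


# Hypothetical 3x3 grayscale image with a single key point moved diagonally.
image = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
moved = move_key_point_pixels(image, [(0, 0)], [(1, 1)])
print(moved[1][1])  # 10
```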
When
In a performance test, new facial expression images may be generated based on the method of processing the image using, for example, the Cohn-Kanade (CK+) human facial expression database. The generated new facial expression images may be used as training data for training an expression identification model in a back propagated neural network or deep learning machine. When expressions are identified using the expression identification model, an identification error rate is reduced by approximately 25% in an example, compared with the related technology.
Hereinafter, description of an image processing apparatus is further provided. The method embodiments and apparatus embodiments may belong to the same design, and are closely related to each other. Detailed descriptions not provided with reference to the apparatus embodiments may refer to the descriptions of the method embodiments provided above for clarity and conciseness purposes.
Referring to
The expression feature acquirer 310 acquires an expression feature of each of at least two reference facial expression images.
The expression feature interpolator 320 acquires or generates a new expression feature based on an interpolation value of an expression feature of each of the at least two reference facial expression images.
The expression feature interpolator 320 includes a parameter acquirer 321 and an interpolator 322. Here, the parameter acquirer 321 acquires an interpolation value parameter. For example, the parameter acquirer 321 randomly acquires the interpolation value parameter. The interpolator 322 may use the interpolation value parameter to acquire a new expression feature as an interpolation value of the expression features of the at least two reference facial expression images.
The expression image synthesizer 330 creates a new facial expression image by adjusting a target facial expression image based on the new expression feature. For example, the expression image synthesizer 330 acquires a new expression feature based on the interpolation and/or extrapolation value of the expression feature of each of the at least two reference facial expression images. The expression image synthesizer 330 may acquire the new expression feature using Equation 6, for example.
αn+1=c1α1+c2α2+ . . . +ciαi+ . . . +cnαn [Equation 6]
In Equation 6, αn+1 denotes a new expression feature, ci denotes an interpolation value parameter, 1≤i≤n, 0<ci<1, and c1+c2+ . . . +ci+ . . . +cn=1 are satisfied, αi denotes an expression feature of a reference expression image i, and n denotes a number of reference expression images, and n is greater than or equal to 2. Here, an expression feature of each of at least two reference facial expression images is associated with an identical type of expression.
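As an illustration of Equation 6, the following minimal Python sketch draws random interpolation value parameters ci that are positive and sum to 1, and then forms the weighted sum of the reference expression features. The function names `random_convex_weights` and `interpolate_features` are hypothetical, and normalizing uniform random draws is only one assumed way of acquiring the parameters randomly.

```python
import random


def random_convex_weights(n, rng=random):
    """Draw n weights c_i with c_i > 0 and c_1 + ... + c_n = 1 (n >= 2)."""
    if n < 2:
        raise ValueError("Equation 6 uses at least two reference features")
    # 1 - random() lies in (0, 1], so every raw value is strictly positive.
    raw = [1.0 - rng.random() for _ in range(n)]
    total = sum(raw)
    return [value / total for value in raw]


def interpolate_features(features, weights):
    """Return the weighted sum c_1*alpha_1 + ... + c_n*alpha_n."""
    if len(features) != len(weights):
        raise ValueError("one weight is required per expression feature")
    dim = len(features[0])
    if any(len(feature) != dim for feature in features):
        raise ValueError("all expression features must have the same length")
    return [sum(c * feature[k] for c, feature in zip(weights, features))
            for k in range(dim)]


# Three reference features of the same expression type (made-up values).
references = [[0.2, 0.1], [0.4, 0.3], [0.6, 0.2]]
weights = random_convex_weights(len(references), random.Random(0))
new_feature = interpolate_features(references, weights)
```

Because the weights are positive and sum to 1, each component of the new feature lies between the smallest and largest corresponding components of the reference features.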
The target image acquirer 340 sets one of the at least two reference facial expression images as a target facial expression image.
The image processing apparatus 300 operates as follows.
The target image acquirer 340 analyzes facial key point coordinates of the target facial expression image, acquires an expression fitting result α1 by fitting the facial key point coordinates of the target facial expression image in a preset expression space, and sets the expression fitting result α1 as an expression feature of the target facial expression image. Also, the target image acquirer 340 may analyze facial key point coordinates of the reference facial expression image and acquire an expression fitting result α2 by fitting the facial key point coordinates of the reference facial expression image in the preset expression space, and set the expression fitting result α2 as the expression feature of the reference facial expression image.
Here, the target image acquirer 340 may acquire the expression fitting result α1 by fitting the facial key point coordinates of the target facial expression image based on the following operations.
In operation 11, a preset angle θ is acquired.
In operation 12, f(θ)=IsBs+αeBe is calculated. Here, Bs={bs1, bs2, . . . , bsi, . . . , bsn}, Is={as1, as2, . . . , asi, . . . , asn}, asi=R(x1; θ)·bsi·cos<R(x1; θ), bsi>, Be={be1, be2, . . . , bei, . . . , ben}, αe={ae1, ae2, . . . , aei, . . . , aen}, and aei=R(x1; θ)·bei·cos<R(x1; θ), bei> are satisfied.
In operation 13, ∥x1−f(θ)∥ is calculated and ∥x1−f(θ)∥=p1 is set.
In operation 14, θ−d*f′(θ) is defined as a new θ.
In operation 15, f(θ)=IsBs+αeBe is calculated. Here, Bs={bs1, bs2, . . . , bsi, . . . , bsn}, Is={as1, as2, . . . , asi, . . . , asn}, asi=R(x1; θ)·bsi·cos<R(x1; θ), bsi>, Be={be1, be2, . . . , bei, . . . , ben}, αe={ae1, ae2, . . . , aei, . . . , aen}, and aei=R(x1; θ)·bei·cos<R(x1; θ), bei> are satisfied.
In operation 16, ∥x1−f(θ)∥ is calculated.
In operation 17, it is verified whether ∥x1−f(θ)∥−p1≤g is satisfied. Here, in response to ∥x1−f(θ)∥−p1≤g being satisfied, α1={a11, a12, . . . , a1i, . . . , a1n}=αe and αe={ae1, ae2, . . . , aei, . . . , aen} are satisfied, and θ indicates an angle value of rebuilt new coordinates.
In response to ∥x1−f(θ)∥−p1>g being satisfied, ∥x1−f(θ)∥=p1 is set to return to operation 14.
Here, θ denotes a facial posture parameter, f(θ) denotes a column vector function of facial key point coordinates calculated based on a fitting coefficient, f′(θ) denotes a derivative of the column vector function f(θ), x1 denotes a column vector of the facial key point coordinates of the target facial expression image, Bs denotes a pre-trained facial identification main component analyzing model, Is denotes a fitting coefficient of the facial identification main component analyzing model, bsi denotes an i-th dimension element of the facial identification main component analyzing model, n denotes a number of model dimensions, asi denotes an inner product of two vectors, R(x1; θ) denotes new coordinates obtained in response to the column vector x1 rotating based on the facial posture parameter θ, cos<R(x1; θ), bsi> denotes a cosine value of an angle between R(x1; θ) and bsi, Be denotes a pre-trained facial expression main component analyzing model, αe denotes a fitting coefficient of the facial expression main component analyzing model, bei denotes an i-th dimension element of the facial expression main component analyzing model, aei denotes an inner product of two vectors, cos<R(x1; θ), bei> denotes a cosine value of an angle between R(x1; θ) and bei, ∥x1−f(θ)∥ denotes a norm of the element-wise difference between the column vector x1 and the column vector function f(θ), d denotes a preset control value, g denotes a preset value, and p1 denotes a norm corresponding to θ.
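The control flow of operations 11 through 17 can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the fitting procedure of this disclosure: θ is a single scalar, the key point model f(θ) is passed in as a hypothetical callable `model`, the update in operation 14 is assumed to follow a numerically estimated slope of the squared residual, and the check in operation 17 is assumed to compare the absolute change of the norm with g. The function name `fit_pose` and the default values are also hypothetical.

```python
import math


def fit_pose(x, model, theta0=0.0, d=0.25, g=1e-6, max_iter=1000, h=1e-6):
    """Iteratively adjust theta until the residual norm stops changing.

    x      -- observed key point coordinates (flat list of floats)
    model  -- callable returning the modeled coordinates f(theta)
    d, g   -- preset control value (step size) and preset tolerance
    """
    def residual(theta):
        # Norm of the element-wise difference between x and f(theta).
        return math.sqrt(sum((xi - fi) ** 2
                             for xi, fi in zip(x, model(theta))))

    theta = theta0                      # operation 11: preset theta
    p = residual(theta)                 # operations 12 and 13
    for _ in range(max_iter):
        # Operation 14 (assumed form): step against the central-difference
        # slope of the squared residual.
        slope = (residual(theta + h) ** 2 - residual(theta - h) ** 2) / (2 * h)
        theta -= d * slope
        new_p = residual(theta)         # operations 15 and 16
        if abs(new_p - p) <= g:         # operation 17 (assumed absolute change)
            return theta, new_p
        p = new_p                       # otherwise update p and repeat
    return theta, p


# Toy model: one key point on the unit circle, rotated by theta.
target = [math.cos(0.7), math.sin(0.7)]
theta_fit, final_residual = fit_pose(
    target, lambda t: [math.cos(t), math.sin(t)])
```

In the full procedure, the expression fitting result would be read from the fitting coefficient αe evaluated at the final θ. The toy model above only recovers the posture parameter.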
The expression fitting result α1 may be acquired by fitting the facial key point coordinates of the target facial expression image.
The target image acquirer 340 may acquire an expression fitting result α2 by fitting the facial key point coordinates of the reference facial expression image based on the following operations.
In operation 21, a preset value θ is acquired.
In operation 22, f(θ)=IsBs+αeBe is calculated. Here, Bs={bs1, bs2, . . . , bsi, . . . , bsn}, Is={as1, as2, . . . , asi, . . . , asn}, asi=R(x2; θ)·bsi·cos<R(x2; θ), bsi>, Be={be1, be2, . . . , bei, . . . , ben}, αe={ae1, ae2, . . . , aei, . . . , aen}, and aei=R(x2; θ)·bei·cos<R(x2; θ), bei> are satisfied.
In operation 23, ∥x2−f(θ)∥ is calculated and ∥x2−f(θ)∥=p2 is set.
In operation 24, θ−d*f′(θ) is defined as a new θ.
In operation 25, f(θ)=IsBs+αeBe is calculated. Here, Bs={bs1, bs2, . . . , bsi, . . . , bsn}, Is={as1, as2, . . . , asi, . . . , asn}, asi=R(x2; θ)·bsi·cos<R(x2; θ), bsi>, Be={be1, be2, . . . , bei, . . . , ben}, αe={ae1, ae2, . . . , aei, . . . , aen}, and aei=R(x2; θ)·bei·cos<R(x2; θ), bei> are satisfied.
In operation 26, ∥x2−f(θ)∥ is calculated.
In operation 27, it is verified whether ∥x2−f(θ)∥−p2≤g is satisfied.
Here, in response to ∥x2−f(θ)∥−p2≤g being satisfied, α2={a21, a22, . . . , a2i, . . . , a2n}=αe and αe={ae1, ae2, . . . , aei, . . . , aen} are satisfied.
In response to ∥x2−f(θ)∥−p2>g being satisfied, ∥x2−f(θ)∥=p2 is set to return to operation 24.
Here, θ denotes a facial posture parameter, f(θ) denotes a column vector function of facial key point coordinates calculated based on a fitting coefficient, f′(θ) denotes a derivative of the column vector function f(θ), x2 denotes a column vector of the facial key point coordinates of the reference facial expression image, Bs denotes a pre-trained facial identification main component analyzing model, Is denotes a fitting coefficient of the facial identification main component analyzing model, bsi denotes an i-th dimension element of the facial identification main component analyzing model, n denotes a number of model dimensions, asi denotes an inner product of two vectors, R(x2; θ) denotes new coordinates obtained in response to the column vector x2 rotating based on the facial posture parameter θ, cos<R(x2; θ), bsi> denotes a cosine value of an angle between R(x2; θ) and bsi, Be denotes a pre-trained facial expression main component analyzing model, αe denotes a fitting coefficient of the facial expression main component analyzing model, bei denotes an i-th dimension element of the facial expression main component analyzing model, aei denotes an inner product of two vectors, cos<R(x2; θ), bei> denotes a cosine value of an angle between R(x2; θ) and bei, ∥x2−f(θ)∥ denotes a norm of the element-wise difference between the column vector x2 and the column vector function f(θ), d denotes a preset control value, g denotes a preset value, and p2 denotes a norm corresponding to θ.
The expression feature interpolator 320 may acquire a new expression feature α3 based on an interpolation value of the expression feature of the target facial expression image and the expression feature of the reference facial expression image using Equation 7, for example.
α3=cα1+(1−c)α2 [Equation 7]
In Equation 7, c denotes an interpolation value parameter, 0<c<1 is satisfied, α3 denotes the new expression feature, α1 denotes the expression feature of the target facial expression image, and α2 denotes the expression feature of the reference facial expression image.
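As a small worked illustration of Equation 7, the Python sketch below interpolates two expression features with a parameter c restricted to 0<c<1. The function name `interpolate_pair` and the feature values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def interpolate_pair(alpha1, alpha2, c):
    """Return c*alpha1 + (1 - c)*alpha2 for an interpolation parameter 0 < c < 1."""
    if not 0.0 < c < 1.0:
        raise ValueError("c must satisfy 0 < c < 1")
    if len(alpha1) != len(alpha2):
        raise ValueError("expression features must have the same length")
    return [c * a + (1.0 - c) * b for a, b in zip(alpha1, alpha2)]


alpha1 = [0.0, 1.0]   # expression feature of the target image (made-up values)
alpha2 = [1.0, 0.0]   # expression feature of the reference image (made-up values)
alpha3 = interpolate_pair(alpha1, alpha2, 0.25)
```

Every component of the result lies between the corresponding components of α1 and α2, which is what keeps the new expression feature within the range spanned by the two existing features instead of scaling a single feature past that range.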
The expression image synthesizer 330 may rebuild new facial key point coordinates based on the new expression feature α3.
New facial key point coordinates are as shown in Equation 8, for example.
f(θ′)=IsBs+α3Be [Equation 8]
In Equation 8, Bs={bs1, bs2, . . . , bsi, . . . , bsn}, Is={as1, as2, . . . , asi, . . . , asn}, asi=R(x1′; θ′)·bsi·cos<R(x1′; θ′), bsi>, Be={be1, be2, . . . , bei, . . . , ben}, α3={a31, a32, . . . , a3i, . . . , a3n}, and a3i=R(x1′; θ′)·bei·cos<R(x1′; θ′), bei> are satisfied, and θ′ indicates an angle value of rebuilt new coordinates.
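The linear combination in Equation 8 can be sketched in Python as follows, assuming that the fitting coefficients Is and the new expression feature α3 are already available as lists of numbers and that each model Bs and Be is given as a list of basis vectors of equal length. The function name `rebuild_keypoints` and the toy bases are hypothetical. The computation of the coefficients from R(x1′; θ′) is not reproduced here.

```python
def rebuild_keypoints(identity_coeffs, identity_basis, expr_coeffs, expr_basis):
    """Return Is*Bs + alpha3*Be as a flat list of key point coordinates."""
    if len(identity_coeffs) != len(identity_basis):
        raise ValueError("one identity coefficient is required per basis vector")
    if len(expr_coeffs) != len(expr_basis):
        raise ValueError("one expression coefficient is required per basis vector")
    dim = len(identity_basis[0])
    coords = [0.0] * dim
    # Accumulate the identity term, then the expression term.
    for coeffs, basis in ((identity_coeffs, identity_basis),
                          (expr_coeffs, expr_basis)):
        for coeff, vector in zip(coeffs, basis):
            if len(vector) != dim:
                raise ValueError("all basis vectors must have the same length")
            for k in range(dim):
                coords[k] += coeff * vector[k]
    return coords


# Toy bases over two key points stored as (x1, y1, x2, y2).
Bs = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
Be = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
new_coords = rebuild_keypoints([2.0, 3.0], Bs, [0.5, -1.0], Be)
```

Only the expression coefficients change between the target image and the rebuilt coordinates, so the identity term carries over unchanged and the difference between the two coordinate sets reflects the new expression alone.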
A new facial expression image may be created by adjusting a pixel position of the target facial expression image based on a difference between the new facial key point coordinates and the facial key point coordinates of the target facial expression image.
Referring to
Referring to
Referring to
The computing apparatus 1300 includes a processor 1310, a memory 1320, a camera 1330, a storage device 1340, an input device 1350, an output device 1360, and a network interface 1370. The processor 1310, the memory 1320, the camera 1330, the storage device 1340, the input device 1350, the output device 1360, and the network interface 1370 may communicate with one another through a communication bus 1380.
The camera 1330 captures a still image, a video image, or both. The processor 1310 may control the camera 1330 to obtain or capture an image, e.g., including a face region, of a user by capturing an image of the face region of the user attempting the facial verification, or may control the camera 1330 to autonomously capture images and automatically verify a user, for example, without user initiation. In addition, as noted above, the camera 1330 may also be controlled by the processor 1310 during other functions of the computing apparatus 1300, such as when operated as a personal camera. The camera 1330 may be representative of plural cameras, such as a color image/video camera and may further include a depth or infrared camera or time of flight (TOF) module, as only non-limiting examples.
The processor 1310 may implement functions and instructions to operate in the computing apparatus 1300 as described herein. For example, the processor 1310 may execute instructions stored in the memory 1320 or the storage device 1340. The processor 1310 may be the same one or more or all above discussed processors, as described above. The processor 1310 is configured to perform one or more, any combination, or all operations described with reference to
The memory 1320 is a non-transitory computer-readable medium or device that stores information to be used for the facial verification. The memory 1320 may be the same one or more memories otherwise discussed herein, though examples are not limited thereto. The memory 1320 includes a computer-readable storage medium or a computer-readable storage device. In addition, the memory 1320 is further representative of multiple such types of memory. The memory 1320 includes, for example, a RAM, a dynamic RAM (DRAM), a static RAM (SRAM), and other types of volatile or nonvolatile memory well-known in the technical field to which the present disclosure pertains. The memory 1320 stores instructions to be implemented or executed by the processor 1310, and stores related information while software or an application is being executed by the computing apparatus 1300.
The storage device 1340 includes a computer-readable storage medium or a computer-readable storage device. The storage device 1340 stores a database (DB) or matrix including registered features or registered images. In one example, the storage device 1340 stores a greater quantity of information compared to the memory 1320, and stores information for a relatively longer period of time. The storage device 1340 includes, for example, a magnetic disk drive, an optical disc, a redundant array of independent disks (RAID), a network attached storage (NAS), a flash memory, an erasable programmable read-only memory (EPROM), a floppy disk, or other types of nonvolatile memories well-known in the technical field to which the present disclosure pertains.
The input device 1350 receives an input from the user through a tactile, video, audio, or touch input. The input device 1350 includes one or more of, for example, a keyboard, a mouse, a touchscreen, a microphone, and other devices configured to detect the input from the user and transmit the detected input to the computing apparatus 1300.
The output device 1360 provides the user with an output of the computing apparatus 1300 through a visual, auditory, or tactile channel. For example, the output device 1360 visualizes information related to the facial verification and provides the user with the visualized information. For example, the visualized information may indicate whether the facial recognition was successful, or may enable access to further functions of the computing apparatus 1300 demonstrated through the visualized information. The output device 1360 includes one or more of, for example, a liquid crystal display (LCD), a light-emitting diode (LED) display, a touchscreen, a speaker, a vibration generator, and other devices configured to provide the output to the user. In one example, while the facial registration or verification is being performed, the computing apparatus 1300 displays or visually feeds back the currently captured face image, or a preview image, obtained by the camera 1330 on a display screen to be viewed by the user, or another guide, or does not display the example face image, preview image, or other guide. The example face image or preview image may provide such visual feedback or encouragement to guide the user to provide the computing apparatus 1300 a full face image. The visual feedback may be a display of the currently captured image frames from which the input image is to be selectively captured for the facial recognition, or may provide the user with illustrated guidelines or overlays for a desirable or preferred positioning and size of the to-be captured image for facial recognition. Such face image or preview image may also not be provided, such as in the example where such facial verification is automatically performed, e.g., without user initiation.
The network interface 1370 communicates with an external device through a wired and/or wireless network. The network interface 1370 includes one or more of, for example, an Ethernet card, an optical transceiver, a radio frequency transceiver, and other network interface cards configured to transmit and receive information. The network interface 1370 wirelessly communicates with the external device using a communication method, such as, for example, Bluetooth, WiFi, or a third generation (3G), fourth generation (4G), or fifth generation (5G) communication method. The network interface 1370 may further include a near field transceiver or the like. For example, through control of the processor 1310 upon verification of a user, the near field transceiver may transmit a payment authorization to an external terminal, such as with an appropriate mobile payment instruction transmitted by the near field transceiver. In addition, the processor 1310 may control the network interface 1370 to routinely check for updates for the registration and/or verification trained recognizer (or respective trained recognizer and verification portions) neural network(s), or other machine learning models, for example, and request, receive, and store parameters or coefficients of the same in the memory 1320 for use in the interpolation, recognition, and/or verification operations herein. For example, when the feature recognizers or extractors and face verifying are implemented through the above example trained neural network feature recognizers or extractors and/or trained face verifying networks, the processor 1310 may request, receive, and store updated trained weighting matrices for any or all of the recognizers or extractors and/or the face verifying neural network portions.
In addition, updated hyper-parameters that can control or alter the configuration or architecture of such neural network(s) may also be requested, received, and stored along with corresponding trained weighting matrices in any of the memory 1320 or storage device 1340.
The facial expression image processing apparatus and components thereof, such as the expression feature acquirer 310, expression feature interpolator 320, expression image synthesizer 330, target image acquirer 340, parameter acquirer 321, interpolator 322 in
The methods illustrated in
Instructions or software to control computing hardware, for example, one or more processors or computers, to implement the hardware components and perform the methods as described above may be written as computer programs, code segments, instructions or any combination thereof, for individually or collectively instructing or configuring the one or more processors or computers to operate as a machine or special-purpose computer to perform the operations that are performed by the hardware components and the methods as described above. In one example, the instructions or software include machine code that is directly executed by the one or more processors or computers, such as machine code produced by a compiler. In another example, the instructions or software include higher-level code that is executed by the one or more processors or computers using an interpreter. The instructions or software may be written using any programming language based on the block diagrams and the flow charts illustrated in the drawings and the corresponding descriptions in the specification, which disclose algorithms for performing the operations that are performed by the hardware components and the methods as described above.
The instructions or software to control computing hardware, for example, one or more processors or computers, to implement the hardware components and perform the methods as described above, and any associated data, data files, and data structures, may be recorded, stored, or fixed in or on one or more non-transitory computer-readable storage media. Examples of a non-transitory computer-readable storage medium include read-only memory (ROM), random-access memory (RAM), flash memory, CD-ROMs, CD-Rs, CD+Rs, CD-RWs, CD+RWs, DVD-ROMs, DVD-Rs, DVD+Rs, DVD-RWs, DVD+RWs, DVD-RAMs, BD-ROMs, BD-Rs, BD-R LTHs, BD-REs, magnetic tapes, floppy disks, magneto-optical data storage devices, optical data storage devices, hard disks, solid-state disks, and any other device that is configured to store the instructions or software and any associated data, data files, and data structures in a non-transitory manner and provide the instructions or software and any associated data, data files, and data structures to one or more processors or computers so that the one or more processors or computers can execute the instructions. In one example, the instructions or software and any associated data, data files, and data structures are distributed over network-coupled computer systems so that the instructions and software and any associated data, data files, and data structures are stored, accessed, and executed in a distributed fashion by the one or more processors or computers.
While this disclosure includes specific examples, it will be apparent after an understanding of the disclosure of this application that various changes in form and details may be made in these examples without departing from the spirit and scope of the claims and their equivalents. The examples described herein are to be considered in a descriptive sense only, and not for purposes of limitation. Descriptions of features or aspects in each example are to be considered as being applicable to similar features or aspects in other examples. Suitable results may be achieved if the described techniques are performed in a different order, and/or if components in a described system, architecture, device, or circuit are combined in a different manner, and/or replaced or supplemented by other components or their equivalents. Therefore, the scope of the disclosure is defined not by the detailed description, but by the claims and their equivalents, and all variations within the scope of the claims and their equivalents are to be construed as being included in the disclosure.
Number | Date | Country | Kind |
---|---|---|---|
201611248661.1 | Dec 2016 | CN | national |
10-2017-0084930 | Jul 2017 | KR | national |
This application is a continuation of U.S. patent application Ser. No. 15/837,877 filed on Dec. 11, 2017, which claims the benefit under 35 U.S.C. § 119(a) of Chinese Patent Application No. 201611248661.1 filed on Dec. 29, 2016, in the State Intellectual Property Office of the People's Republic of China, and Korean Patent Application No. 10-2017-0084930 filed on Jul. 4, 2017, in the Korean Intellectual Property Office, the entire disclosures of which are incorporated herein by reference for all purposes.
Number | Date | Country | |
---|---|---|---|
20210089760 A1 | Mar 2021 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 15837877 | Dec 2017 | US |
Child | 17109762 | US |