Not applicable.
Applying a facial mask to a face in a video is very different from applying a facial mask to a face in an image. In particular, an image is stationary; therefore, once the image and the facial mask are properly aligned with one another, the facial mask can be placed in front of the face (or blended with the face to make the mask semi-transparent). In a video, however, the face and many facial elements are in motion, making the addition of a facial mask a particularly difficult problem (the background, by contrast, is easy to replace, since it can be filmed against a stationary element such as a green screen). For example, the placement of the face, the expressions on the face, the angle of the face relative to the camera, etc. may all change from frame to frame. Even if the user tries to hold still, there are movements that are impossible to suppress for long periods of time, such as blinking or iris movement.
Because of this difficulty, facial masks are generally applied on a frame-by-frame basis. Further, it is difficult to fully automate this process. For example, in one frame an automated process may move the whole facial mask relative to the face because of the movement of part of the face, such as the jaw when the actor is talking. Therefore, this process is typically accomplished by hiring a specialist who adds the facial mask to each frame. This allows the specialist to control elements, such as those discussed above, that would otherwise cause the mask to move and produce a disorienting effect.
Because these changes are made in a very time-intensive process, it is impossible to make them in real time; the process may not be finished until weeks or months after the video is actually recorded.
Accordingly, there is a need in the art for a method that can apply facial masks to a face in real-time. Further, there is a need for the method to apply the facial mask in a manner that allows the facial mask to respond to facial movements of the user.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential characteristics of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
One example embodiment includes a method for applying facial masks to faces in live video. The method includes receiving an image containing a face from a user, wherein the image is a frame of a video, and identifying the coordinates of a face in the image. The method also includes identifying the coordinates of facial elements within the previously identified face and synchronizing a bitmap add-on, wherein synchronizing the bitmap add-on includes aligning the bitmap add-on with the identified facial elements. The method further includes applying the bitmap add-on over the frame containing the identified face.
Another example embodiment includes a method for applying facial masks to faces in live video. The method includes receiving an image containing a face from a user, wherein the image is a frame of a video, and identifying the coordinates of a face in the image. The method also includes identifying the coordinates of facial elements within the previously identified face and synchronizing a bitmap add-on, wherein synchronizing the bitmap add-on includes aligning the bitmap add-on with the identified facial elements. Synchronizing the bitmap add-on includes smoothing the facial element coordinates in the current frame based on previous frames, warping the face in the image, and warping the bitmap add-on. The method further includes applying the bitmap add-on over the frame containing the identified face.
Another example embodiment includes a method for applying facial masks to faces in live video. The method includes receiving an image containing a face from a user, wherein the image is a frame of a video, and identifying the coordinates of a face in the image. The method also includes identifying the coordinates of facial elements within the previously identified face and training a detector, wherein training the detector allows for synchronization of a bitmap add-on. The method also includes synchronizing a bitmap add-on, wherein synchronizing the bitmap add-on includes aligning the bitmap add-on with the identified facial elements. Synchronizing the bitmap add-on includes smoothing the facial element coordinates in the current frame based on previous frames, warping the face in the image, and warping the bitmap add-on. The method additionally includes applying the bitmap add-on over the frame containing the identified face.
These and other objects and features of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.
To further clarify various aspects of some example embodiments of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. It is appreciated that these drawings depict only illustrated embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
Reference will now be made to the figures wherein like structures will be provided with like reference designations. It is understood that the figures are diagrammatic and schematic representations of some embodiments of the invention, and are not limiting of the present invention, nor are they necessarily drawn to scale.
For example, we can take the face of a character and put it over a user's face, so that its facial elements (e.g., eyes, mouth, nose, eyebrows, etc.) are aligned with the user's facial elements. Additionally or alternatively, we can place features like wrinkles or facial paint over the user's face. In such a case, the character's face or the wrinkles placed over the user's face are called a “bitmap add-on”. One of skill in the art will appreciate that the video may include more than one face and that the method 100 is applicable to any number of faces within the video without restriction. For example, a first character's face can be placed over the face of a first user and a second character's face can be placed over the face of a second user.
Alternatively, the coordinates of facial elements can be identified 106 using a “successive steps” method to identify the coordinates of facial elements. One example of this method is disclosed below.
One of skill in the art will appreciate that synchronizing 108 the bitmap add-on can be done in a number of ways. For example, there is third-party software capable of producing the desired result. Therefore, one or more software packages can be used, with the results compared to determine which solution creates the best effect. Additionally or alternatively, a method of synchronizing 108 the bitmap add-on which may be used is disclosed below.
One skilled in the art will appreciate that, for this and other processes and methods disclosed herein, the functions performed in the processes and methods may be implemented in differing order. Furthermore, the outlined steps and operations are only provided as examples, and some of the steps and operations may be optional, combined into fewer steps and operations, or expanded into additional steps and operations without detracting from the essence of the disclosed embodiments.
Within the present invention, for each bk there are M such multi-step choosing procedures, each yielding a number H. Thus, within the present invention bk=(Hk,0, . . . , Hk,M−1). Each multi-step choosing procedure includes choose junctions Jkij, i=[0,M−1], j=[0,Nki−1], one of them labeled as the initial. A choose junction contains a set of parameters. It may also contain a “main link” and an “auxiliary link” to other choose junctions. Let us define Cγ(I,x,y)=μI(x+x1,y+y1)+λI(x+x2,y+y2), where γ=(x1,y1,x2,y2,μ,λ,ϕ) are the parameters of a choose junction, with (x1,y1),(x2,y2) being the displacements, I the image with I(x,y) being the pixel value at (x,y), and λ,μ∈R (which, for example, could take values of 1 or −1). Thus, each choose junction Jkij is associated with its parameters γkij. To get the output of a multi-step choosing procedure at the coordinates (x,y) of an image, Cγ(I,x,y) is repeatedly evaluated starting from the initial choose junction, proceeding to the main link if Cγ(I,x,y)<ϕ, and to the auxiliary link otherwise, until a choose junction that does not contain any links is reached. Such a choose junction stores an integer number Hkij, which is the output of the multi-step choosing procedure. For example, if there are N*ki choose junctions that do not contain any links among Jki, one may enumerate them from 0 to N*ki−1, assigning each Hkij a respective number in [0; N*ki−1].
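As an illustrative, non-limiting sketch, the traversal of one multi-step choosing procedure may be realized as follows; the class and function names are hypothetical, and image pixels are assumed addressable as image[y][x]:

```python
# Hypothetical sketch of evaluating one multi-step choosing procedure
# (a binary-tree traversal over "choose junctions"). Names are
# illustrative, not the patent's implementation.

class ChooseJunction:
    def __init__(self, x1, y1, x2, y2, mu, lam, phi,
                 main=None, auxiliary=None, H=None):
        self.params = (x1, y1, x2, y2, mu, lam, phi)
        self.main = main            # followed when C_gamma < phi
        self.auxiliary = auxiliary  # followed otherwise
        self.H = H                  # output stored at a link-free junction

def response(junction, image, x, y):
    """C_gamma(I, x, y) = mu*I(x+x1, y+y1) + lam*I(x+x2, y+y2)."""
    x1, y1, x2, y2, mu, lam, phi = junction.params
    return mu * image[y + y1][x + x1] + lam * image[y + y2][x + x2]

def run_procedure(initial, image, x, y):
    """Traverse from the initial junction until a link-free junction."""
    j = initial
    while j.main is not None:  # link-free junctions carry the output H
        phi = j.params[6]
        j = j.main if response(j, image, x, y) < phi else j.auxiliary
    return j.H
```

A usage example: a single root junction comparing two neighboring pixels routes to one of two leaves, whose stored H values form the procedure's output.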
Then v will contain the dot product <DS,k,E>.
The displacements (x1,y1),(x2,y2) could undergo a coordinate transformation each time a multi-step choosing procedure is called for some image. Let us have a transformation F(a,b) that transforms (for example, by an affine transform, which may include rotation, scaling, shifting and skewing) a grid b so that it becomes close to a (for example, in the least-squares sense, minimizing Σ|ai−bi|2, or aligning the coordinates of the pupils of a and b instead). For example, the transformation could be represented as (dx,dy,s,α), which are the shift along the X and Y coordinates, the scaling, and the rotation angle, respectively.
Then, for example, if the procedure is called on an image I with a grid fk, we can compute a transformation for the grid fk to the mean grid, F(Mean, fk), receiving the (dx,dy,s,α) representation of the transformation, and then apply this transformation to the displacements before calculating the output of a multi-step choosing procedure.
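One common way to compute such an F(a,b) in the least-squares sense is a closed-form similarity (Procrustes) fit. The following sketch is one possible realization of the (dx,dy,s,α) representation, not the only way to obtain it; the function names are illustrative:

```python
# Least-squares fit of a similarity transform (shift, scale, rotation)
# aligning grid b to grid a. A sketch under the stated assumptions.
import numpy as np

def fit_similarity(a, b):
    """Return (dx, dy, s, alpha) so that applying the transform to b
    brings it close to a, minimizing sum |a_i - T(b_i)|^2."""
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    ca, cb = a.mean(axis=0), b.mean(axis=0)
    a0, b0 = a - ca, b - cb
    denom = (b0 ** 2).sum()
    sc = (a0 * b0).sum() / denom                                    # s*cos(alpha)
    ss = (b0[:, 0] * a0[:, 1] - b0[:, 1] * a0[:, 0]).sum() / denom  # s*sin(alpha)
    s, alpha = np.hypot(sc, ss), np.arctan2(ss, sc)
    R = np.array([[sc, -ss], [ss, sc]])  # scaled rotation matrix
    t = ca - R @ cb                      # translation (dx, dy)
    return t[0], t[1], s, alpha

def apply_similarity(points, dx, dy, s, alpha):
    """Apply the (dx, dy, s, alpha) transform to a set of points."""
    R = s * np.array([[np.cos(alpha), -np.sin(alpha)],
                      [np.sin(alpha),  np.cos(alpha)]])
    return np.asarray(points, float) @ R.T + np.array([dx, dy])
```

The same (dx,dy,s,α) values can then be applied to the displacement parameters of the choose junctions before evaluating them.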
One of skill in the art will appreciate that to get a more precise result, one may run the method 500 several times. In particular, each time the method 500 is run the values within finitial may be displaced by small values (dependent on the size of the facial region), and then the final result can be an average or median of the results for each coordinate of each facial feature.
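This repeated-run refinement can be sketched as follows; run_method stands in for a single pass of the detector, and the jitter magnitude is a hypothetical choice proportional to the facial region size:

```python
# Sketch of jittered repeated detection: displace the initial grid by
# small random values, run the (hypothetical) detector each time, and
# take the per-coordinate median of the results.
import random

def refined_detection(run_method, image, f_initial, face_size,
                      runs=5, rel_jitter=0.02, seed=0):
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        jittered = [(x + rng.uniform(-1, 1) * rel_jitter * face_size,
                     y + rng.uniform(-1, 1) * rel_jitter * face_size)
                    for (x, y) in f_initial]
        results.append(run_method(image, jittered))

    def median(vals):
        vals = sorted(vals)
        n = len(vals)
        return vals[n // 2] if n % 2 else 0.5 * (vals[n // 2 - 1] + vals[n // 2])

    # per-coordinate median over all runs (a mean would also work)
    return [(median([r[k][0] for r in results]),
             median([r[k][1] for r in results]))
            for k in range(len(f_initial))]
```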
There are multiple methods for smoothing 602 which can be used to accomplish the desired result. By way of example, one method is illustrated herein. Assume that there is a sequence of grids ft in time, fkt∈R2, k∈0, . . . , N−1, t=T, T−1, . . . , with t=T being the latest grid. That is, we store a history of the detections of facial element coordinates at some number of previous frames (one of skill in the art will appreciate that the number may be limited such that only the most recent relevant frames are used), with the latest frame being the frame to be smoothed. Let us define C(f)∈R2 that gives the center of a grid, averaging every fk coordinate, for example
One may want to exclude the upper eyelid from this averaging, since people usually blink from time to time, which causes the upper eyelid to move, which in turn causes the center to go down and up on each blink. One may additionally want to exclude the coordinates of the iris, since it also moves. For example, within the given configuration of facial elements, one may exclude the points 28, 35, 36, 32, 39, 40, 23, 26, 37, 38, 27, 31, 41, 42, 0, 1, 29, 30, 33, 34 of
Then, to smooth 602 the coordinates of the facial elements, the coordinates of the facial center are subtracted from the grid on each frame, and a smoothing filter is applied. For example, a temporal Gaussian filter, or a bilateral Gaussian filter (both spatial and temporal), may be applied. After that, the coordinates of the center of the current frame are added. That is, S(ft)=S(ft−C(ft))+C(fT),t=T,T−1, . . . ,T−M, where M is the number of previous frames that we store, and S is the smoothing function that works separately on the x and y coordinates:
where σ0, σ1 are the spatial and temporal smoothing coefficients, and σ0 could be proportional to some overall face size (for example, its interocular distance).
Alternatively, one may also smooth the center coordinates first, to further reduce the oscillation, so the final result is:
S(ft)=S(ft−S(C(ft)))+S(C(fT))
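The center-compensated smoothing described above can be sketched as follows, here with a purely temporal Gaussian filter (σ1 only); the bilateral variant would additionally weight by coordinate differences. The function names are illustrative:

```python
# Sketch of center-compensated temporal smoothing: subtract the grid
# center on each stored frame, apply temporal Gaussian weights at the
# latest frame, and add the current center back.
import numpy as np

def grid_center(grid, exclude=()):
    """C(f): mean of the grid points, optionally skipping movable
    points such as the upper eyelid or iris."""
    pts = [p for k, p in enumerate(grid) if k not in exclude]
    return np.mean(pts, axis=0)

def smooth_latest(grids, sigma1, exclude=()):
    """grids[0] is the latest frame T, grids[1:] the previous frames.
    Returns the smoothed grid for frame T."""
    centers = [grid_center(g, exclude) for g in grids]
    centered = [np.asarray(g, float) - c for g, c in zip(grids, centers)]
    # temporal Gaussian weights w_t = exp(-t^2 / (2*sigma1^2))
    w = np.exp(-np.arange(len(grids)) ** 2 / (2.0 * sigma1 ** 2))
    w /= w.sum()
    smoothed = sum(wi * g for wi, g in zip(w, centered))
    return smoothed + centers[0]
```

Note that a pure translation of the whole face between frames leaves the centered grids identical, so the smoothed output follows the motion without lag, which is the point of the center compensation.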
Given the sets of facial element coordinates fk and the corresponding coordinates gk, |f|=|g|, one partitions these points into triangles such that each point is a vertex of some triangle and no triangle contains points inside it. Usually it is best to do the partition by hand, since it needs to be done only once for a particular configuration of the detected facial elements, choosing the partition that gives the most pleasing effect. Alternatively, one may use automated methods like Delaunay triangulation or a greedy algorithm. As the result of such a partition, one gets a set of triads (p,q,r), each defining a triangle (fp,fq,fr) or (gp,gq,gr). Then one takes a triangle (fp,fq,fr) and its contents at the source image, and redraws the content of this triangle at the destination image at the coordinates (gp,gq,gr), transforming such content accordingly. There are standard procedures in modern 3D frameworks like OpenGL, OpenGL ES or Direct3D on mobile phones and desktops that allow this to be performed. Alternatively, one may code this procedure manually from the geometric relations between the (fp,fq,fr) and (gp,gq,gr) coordinates, or use any other known method for triangle transformation.
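The per-triangle redraw reduces to computing, for each triad, the affine matrix taking (fp,fq,fr) to (gp,gq,gr); a framework such as OpenGL or OpenCV's warpAffine would then apply it to the triangle's pixel content. A minimal sketch of the matrix computation, with illustrative names:

```python
# Sketch: the unique affine map sending the three source triangle
# vertices to the three destination vertices.
import numpy as np

def triangle_affine(f_tri, g_tri):
    """Solve src @ A^T = dst for the 2x3 affine matrix A that sends
    each source vertex (x, y, 1) to its destination vertex (x', y')."""
    src = np.hstack([np.asarray(f_tri, float), np.ones((3, 1))])  # 3x3
    dst = np.asarray(g_tri, float)                                # 3x2
    return np.linalg.solve(src, dst).T                            # 2x3

def warp_points(A, pts):
    """Apply the 2x3 affine matrix to a set of points."""
    pts = np.hstack([np.asarray(pts, float), np.ones((len(pts), 1))])
    return pts @ A.T
```

Because the affine map is exact on the three vertices, interior pixel coordinates (and hence the triangle's content) are carried along consistently, so adjacent triangles in the partition meet without gaps.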
To further improve the aesthetics of the transformation, it may be advisable to add more points to fk and gk. For example, one may add 4 points at the corners of the image and some additional points at the sides. One may also add 4 or more points around the face (with the coordinates of such points based on the size and the position of the face). Further, one may add more points in between facial features, by averaging the coordinates of some points, or by shifting some distance at some angle from certain points, where such distance is proportional to the overall face size (for example, its interocular distance, or the distance between the eyes and the mouth), and the angle is related to the angle by which the face is currently rotated. For example, one may add more points on the cheeks (by averaging the coordinates of the points 52, 50, 45, 23 in
A set of possible parameters Γ={γ} for a choose junction is defined (where γ=(x1,y1,x2,y2,μ,λ,ϕ), as defined above, are the parameters of a choose junction). We could choose μ,λ at random, or set them to (−1,1). ϕ could be chosen at random from some appropriate interval (for example, [−1;1], if the pixel values of I are in the range [0;1]), or from an even partition of the [−1;1] range into a number of intervals (for example, 100 intervals). The displacement parameters (x1,y1),(x2,y2) can be chosen, for example, at random from (−Vmax,Vmax), which is an interval of some appropriate size. For example, the interval could be about 0.5 of the interocular distance of the face. We can also decrease this interval as S increases (starting at S=1 or later, for example, at S=3). We can also choose the displacement parameters to be evenly distributed across some particular grid covering the same interval. The number of possible parameters |Γ| could be of the order of 200000, but this number could be more or less than that.
Then, partition the set of specimens Ω into the main and auxiliary subsets by each γ∈Γ:
Ωmain(γ)={(j,q,k,v)|Cγ(Ij,Gjqk)<ϕ}
Ωauxiliary(γ)=Ω∖Ωmain(γ)
Compute the value of γ giving the smallest standard deviation σ of v in both sets of specimens (here we denote v(Ω)={v|(j,q,k,v)∈Ω}):
γ*=argminγ(σ(v(Ωmain(γ)))+σ(v(Ωauxiliary(γ))))
This defines the corresponding choose junction having the parameters γ*. If the corresponding sum of standard deviations is sufficient, and the current count of choose junctions is below a certain maximum, then link the choose junction with its main and auxiliary choose junctions, and repeat the described calculation procedure for the main link (with the subset Ωmain(γ*)) and for the auxiliary link (with Ωauxiliary(γ*)) until the mentioned condition no longer holds. This finishes the calculation procedure that yields the set of choose junctions Jkij, their links and their parameters γkij for any given facial element k.
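The selection of γ* can be sketched as an exhaustive search over the candidate parameters, partitioning the specimens by the response and scoring each candidate by the summed standard deviation of v. Here response stands in for Cγ(I,x,y), each γ is assumed to carry its threshold ϕ as its last element, and all names are illustrative:

```python
# Sketch of greedy split selection: for each candidate gamma, partition
# the specimens into main/auxiliary subsets by the response, and keep
# the gamma minimizing the summed standard deviation of v.
import math

def std(values):
    """Population standard deviation; 0 for an empty subset."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

def best_split(specimens, candidates, response):
    """specimens: list of (image, x, y, v); candidates: list of gamma
    tuples with the threshold phi stored at gamma[6]."""
    best_gamma, best_score = None, float('inf')
    for gamma in candidates:
        phi = gamma[6]
        main = [v for (I, x, y, v) in specimens
                if response(gamma, I, x, y) < phi]
        aux = [v for (I, x, y, v) in specimens
               if response(gamma, I, x, y) >= phi]
        score = std(main) + std(aux)
        if score < best_score:
            best_gamma, best_score = gamma, score
    return best_gamma, best_score
```

Recursing on the main and auxiliary subsets with the same procedure yields the linked tree of choose junctions described above.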
The result is DS,k=(DS,kx,DS,ky) and D*S,k=(D*S,kx,D*S,ky) which could be (0, 0).
When finding 718 the solution of the minimization problem, the minimization problem could be solved as a linear regression problem or with methods like support vector machines or neural networks. When solving it as a linear regression problem, one may need to add a regularization term λ to the minimized function. Such a term could be calculated as 2zN|E|, where one could find an optimal value of z by trying different real numbers from some set and stopping at the number which gives the best accuracy. Alternatively, one may assign z a fixed value like 3.6. One may solve the linear regression problem with a gradient descent method or calculate the closed-form solution.
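When the closed-form route is taken, the regularized linear regression has the standard ridge solution w = (X^T X + λI)^(-1) X^T y. The sketch below assumes the regularization weight λ has already been fixed (for example, by the scheme described above); it is a generic ridge solver, not the patent's specific implementation:

```python
# Sketch of the closed-form ridge regression solution
# w = (X^T X + lam * I)^(-1) X^T y for a fixed scalar lam.
import numpy as np

def ridge_solve(X, y, lam):
    """Solve the regularized least-squares problem in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

With λ=0 this reduces to ordinary least squares; increasing λ shrinks the solution toward zero, which is the usual effect of the regularization term.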
One of skill in the art will appreciate that the invention may be practiced in network computing environments with many types of computer system configurations, including personal computers, hand-held devices, mobile phones, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by local and remote processing devices that are linked (either by hardwired links, wireless links, or by a combination of hardwired or wireless links) through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
With reference to
The computer 820 may also include a magnetic hard disk drive 827 for reading from and writing to a magnetic hard disk 839, a magnetic disk drive 828 for reading from or writing to a removable magnetic disk 829, and an optical disc drive 830 for reading from or writing to removable optical disc 831 such as a CD-ROM or other optical media. The magnetic hard disk drive 827, magnetic disk drive 828, and optical disc drive 830 are connected to the system bus 823 by a hard disk drive interface 832, a magnetic disk drive-interface 833, and an optical drive interface 834, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer-executable instructions, data structures, program modules and other data for the computer 820. Although the exemplary environment described herein employs a magnetic hard disk 839, a removable magnetic disk 829 and a removable optical disc 831, other types of computer readable media for storing data can be used, including magnetic cassettes, flash memory cards, digital versatile discs, Bernoulli cartridges, RAMs, ROMs, and the like.
Program code means comprising one or more program modules may be stored on the hard disk 839, magnetic disk 829, optical disc 831, ROM 824 or RAM 825, including an operating system 835, one or more application programs 836, other program modules 837, and program data 838. A user may enter commands and information into the computer 820 through keyboard 840, pointing device 842, or other input devices (not shown), such as a microphone, joy stick, game pad, satellite dish, scanner, motion detectors or the like. These and other input devices are often connected to the processing unit 821 through a serial port interface 846 coupled to system bus 823. Alternatively, the input devices may be connected by other interfaces, such as a parallel port, a game port or a universal serial bus (USB). A monitor 847 or another display device is also connected to system bus 823 via an interface, such as video adapter 848. In addition to the monitor, personal computers typically include other peripheral output devices (not shown), such as speakers and printers.
The computer 820 may operate in a networked environment using logical connections to one or more remote computers, such as remote computers 849a and 849b. Remote computers 849a and 849b may each be another personal computer, a server, a router, a network PC, a peer device or other common network node, and typically include many or all of the elements described above relative to the computer 820, although only memory storage devices 850a and 850b and their associated application programs 836a and 836b have been illustrated in
When used in a LAN networking environment, the computer 820 can be connected to the local network 851 through a network interface or adapter 853. When used in a WAN networking environment, the computer 820 may include a modem 854, a wireless link, or other means for establishing communications over the wide area network 852, such as the Internet. The modem 854, which may be internal or external, is connected to the system bus 823 via the serial port interface 846. In a networked environment, program modules depicted relative to the computer 820, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing communications over wide area network 852 may be used.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.
Number | Name | Date | Kind |
---|---|---|---|
20070189627 | Cohen | Aug 2007 | A1 |
20090202114 | Morin | Aug 2009 | A1 |
20130315554 | Ybanez Zepeda | Nov 2013 | A1 |
20150205997 | Ma | Jul 2015 | A1 |
20150286858 | Shaburov | Oct 2015 | A1 |
20160241884 | Messing | Aug 2016 | A1 |
20170278302 | Varanasi | Sep 2017 | A1 |
Entry |
---|
Gudmandsen, Magnus. “Using a robot head with a 3D face mask as a communication medium for telepresence.” (2015). |
Number | Date | Country |
---|---|---|
20180075665 A1 | Mar 2018 | US |