A method and apparatus are disclosed for automatically generating a three-dimensional avatar from an image of a face in a digital photo.
The prior art includes various approaches for performing facial analysis of digital photos of human faces. For example, researchers at Carnegie Mellon University generated the CMU Multi-PIE dataset, which contains hundreds of images of human faces captured under a variety of lighting conditions with ground-truth landmark annotations. The annotations in the CMU Multi-PIE dataset indicate the locations of certain facial characteristics, such as eyebrow position, within a facial image.
The prior art also includes computer-generated avatars. An avatar is a graphical representation of a user. Avatars are sometimes designed to be an accurate and realistic representation of the user, and sometimes they are designed to look like a character that does not resemble the user. Applicant is a pioneer in the area of avatar generation in virtual reality (VR) applications. In these applications, a user can generate an avatar and then interact with a virtual world, including with avatars operated by other users, by directly controlling the avatar.
What is needed is a mechanism for automatically generating an avatar based on a face contained in a digital photo.
A method and apparatus are disclosed for generating an avatar from an image of a face using an avatar generation engine executed by a processing unit of a computing device. The avatar generation engine receives the image, identifies a face in the image, crops the face to generate a cropped face image, determines an ethnicity and a gender based on the cropped face image, detects facial landmarks in the cropped face image, selects a base facial rig from a set of stored facial rigs based on the ethnicity and the gender, alters the base facial rig based on the facial landmarks to generate a customized facial rig, and adds facial attributes to the customized facial rig to generate the avatar.
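The pipeline summarized above can be sketched in code. This is an illustrative sketch only; every function name, data layout, and value below is a hypothetical stand-in chosen for clarity, not part of the disclosure or any actual implementation.

```python
# Hypothetical sketch of the disclosed avatar generation pipeline.
# All helpers below are illustrative placeholders, not an actual API.

def generate_avatar(image, stored_rigs):
    """Run the disclosed pipeline end to end on a decoded image."""
    face_box = identify_face(image)                # identify a face
    cropped = crop_face(image, face_box)           # crop to the face
    ethnicity, gender = classify(cropped)          # ethnicity and gender
    landmarks = detect_landmarks(cropped)          # facial landmarks
    base_rig = stored_rigs[(ethnicity, gender)]    # select base facial rig
    custom_rig = alter_rig(base_rig, landmarks)    # customize the rig
    attributes = detect_attributes(cropped)        # facial attributes
    return add_attributes(custom_rig, attributes)  # final avatar data

# Trivial stand-ins so the control flow can be exercised:
def identify_face(image): return image["face_box"]
def crop_face(image, box): return {"pixels": image["pixels"], "box": box}
def classify(cropped): return ("caucasian", "female")
def detect_landmarks(cropped): return {"eye_distance": 64.0}
def alter_rig(rig, lm): return dict(rig, eye_distance=lm["eye_distance"])
def detect_attributes(cropped): return {"hair_color": "brown"}
def add_attributes(rig, attrs): return {"rig": rig, "attributes": attrs}
```

In a real system each stand-in would be replaced by an image-processing or machine-learning component; the sketch only shows how the stages feed one another.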
Processing unit 301 optionally comprises a microprocessor with one or more processing cores that can execute instructions. Memory 302 optionally comprises DRAM or SRAM volatile memory. Non-volatile storage 303 optionally comprises a hard disk drive or flash memory array. Positioning unit 304 optionally comprises a GPS unit or GNSS unit that communicates with GPS or GNSS satellites to determine latitude and longitude coordinates for client device 300, usually output as latitude data and longitude data. Network interface 305 optionally comprises a wired interface (e.g., Ethernet interface) and/or a wireless interface (e.g., an interface that communicates using the 3G, 4G, 5G, GSM, or 802.11 standards or the wireless protocol known by the trademark BLUETOOTH, etc.). Image capture unit 306 optionally comprises one or more standard cameras (as is currently found on most smartphones and notebook computers). Graphics processing unit 307 optionally comprises a controller or processor for generating graphics for display. Display 308 displays the graphics generated by graphics processing unit 307 and optionally comprises a monitor, touchscreen, or other type of display.
Client application 403 comprises lines of software code executed by processing unit 301 and/or graphics processing unit 307 to perform the functions described below. For example, client device 300 can be a smartphone sold with the trademark “GALAXY” by Samsung or “IPHONE” by Apple, and client application 403 can be a downloadable app installed on the smartphone. Client device 300 also can be a notebook computer, desktop computer, game system, or other computing device, and client application 403 can be a software application running on client device 300. Client application 403 forms an important component of the inventive aspect of the embodiments described herein, and client application 403 is not known in the prior art.
With reference to
Server 500 is a computing device, and it includes the same or similar hardware components as those shown in
Avatar generation engine 600 comprises facial detection and normalization module 601, facial landmark extraction module 602, facial characteristics identification module 603, rig selection and modification module 604, and mesh generation module 605. Facial detection and normalization module 601, facial landmark extraction module 602, facial characteristics identification module 603, rig selection and modification module 604, and mesh generation module 605 each comprise lines of software code executed by processing unit 301 and/or graphics processing unit 307 in client device 300 and/or server 500 to perform the functions described below.
With reference to
Facial Detection and Normalization Module 601 identifies a face 752 (shown in
Facial Detection and Normalization Module 601 detects head pose 754 from cropped face image 753. If head pose 754 is upright and looking at the camera, the method proceeds (step 703). If not, another image is requested and steps 701-703 are repeated with a new image.
Facial Detection and Normalization Module 601 detects eye openness 755 (which can be open or closed), mouth openness 756 (which can be open or closed), and emotion 757 (which can include neutral, happy, angry, and other detectable emotions) from cropped face image 753 (step 704). If eye openness 755 is open, mouth openness 756 is closed, and emotion 757 is neutral, the method proceeds. If not, another image is requested and steps 701-704 are repeated with a new image.
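The acceptance checks of steps 702-704 amount to a single gating predicate: the image is usable only if the face is upright, the eyes open, the mouth closed, and the expression neutral. A minimal sketch, with illustrative string values standing in for the detected states:

```python
# Hypothetical gate for steps 702-704; the literal state values are
# illustrative placeholders, not the actual detector outputs.

def image_acceptable(head_pose, eye_openness, mouth_openness, emotion):
    """Return True only if the face is upright and facing the camera,
    eyes are open, mouth is closed, and the expression is neutral;
    otherwise a new image must be requested and the steps repeated."""
    return (head_pose == "upright"
            and eye_openness == "open"
            and mouth_openness == "closed"
            and emotion == "neutral")
```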
Facial Landmark Extraction Module 602 detects facial landmarks 760 (shown in
Facial Characteristics Identification Module 603 detects ethnicity 758 and gender 759 based on cropped face image 753 and optionally stores ethnicity 758 and gender 759 as data within object 711 (step 706). Facial Characteristics Identification Module 603 optionally utilizes an artificial intelligence engine. Ethnicity 758 can comprise one or more of African, South Asian, East Asian, Latino, and Caucasian with varying degrees of certainty. Gender 759 can comprise the male gender and/or the female gender with varying degrees of certainty. One purpose of Facial Characteristics Identification Module 603 is to identify the most accurate starting point for the avatar from the set of base facial rigs 763. As stated above, optionally, ethnicity 758 and gender 759 are stored in object 711. However, ethnicity 758 and gender 759 need not be stored at all (in object 711 or elsewhere), and they need not be reported to the user or any other person or device.
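One hypothetical way to resolve labels held with "varying degrees of certainty" into a single rig-selection key, and to look up a base facial rig from that key: take the highest-confidence label from each classifier output and use the resulting (ethnicity, gender) pair as a table key, with a neutral fallback. The table entries, score values, and rig names below are all illustrative placeholders.

```python
def most_likely(scores):
    """Pick the label with the highest confidence score."""
    return max(scores, key=scores.get)

# Illustrative table of stored base facial rigs keyed by (ethnicity, gender);
# the entries and the neutral fallback are placeholders only.
BASE_RIGS = {
    ("african", "male"): "rig_am", ("african", "female"): "rig_af",
    ("east_asian", "male"): "rig_eam", ("east_asian", "female"): "rig_eaf",
}

def select_base_rig(ethnicity_scores, gender_scores, default="rig_neutral"):
    """Map confidence-scored labels to a stored base facial rig."""
    key = (most_likely(ethnicity_scores), most_likely(gender_scores))
    return BASE_RIGS.get(key, default)
```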
Facial Characteristics Identification Module 603 further detects facial attributes 761 in cropped face image 753 and stores facial attributes 761 as data within object 711 (step 707). Facial attributes 761 can comprise hairstyle, skin color, hair color, body hair, wearing eyeglasses, wearing a hat, and wearing lipstick.
Rig Selection and Modification Module 604 selects base facial rig 763 (shown in
Rig Selection and Modification Module 604 translates, scales, and rotates joints in base facial rig 763 based on facial landmarks 760 to generate customized facial rig 765 (shown in
For example, if facial landmarks 760 indicate that the distance between the centers of the eyes of the face in image 751 is wider than in base facial rig 763, one or more joints (such as joint 766 in
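The eye-distance example above can be sketched numerically: joints associated with the eyes are scaled horizontally about the face midline so the rig's inter-eye distance matches the distance measured from the landmarks. The joint names and coordinate layout below are hypothetical illustrations, not the disclosed rig format.

```python
# Hypothetical numeric sketch of one rig modification: widening the eyes.
# Joints are named 3-D positions; names and coordinates are placeholders.

def widen_eyes(joints, rig_eye_distance, measured_eye_distance):
    """Scale the x position of eye joints about the face midline (x = 0)
    so the rig's inter-eye distance matches the measured landmark distance."""
    scale = measured_eye_distance / rig_eye_distance
    adjusted = {}
    for name, (x, y, z) in joints.items():
        if "eye" in name:
            adjusted[name] = (x * scale, y, z)  # move eye joints apart
        else:
            adjusted[name] = (x, y, z)          # leave other joints alone
    return adjusted
```

Analogous translations and rotations would be applied for other landmark-driven adjustments (nose position, jaw angle, and so on).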
Mesh Generation Module 605 applies facial attributes 761 to customized facial rig 765 (shown in
Thus, avatar generation method 700 and avatar generation engine 600 are able to generate avatar 766, which closely resembles a person's face as captured in image 751. Optionally, a user can then be allowed to modify avatar 766 to his or her liking using the same types of modification controls known in the prior art. However, unlike in the prior art, the starting point for this process (i.e., avatar 766) will already closely resemble the user and will have been created with no effort or time spent by the user, other than taking or uploading a photo.
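The final mesh generation step, in which detected facial attributes are applied on top of the customized facial rig, can be sketched as a simple merge of attribute data into the avatar record. The dictionary layout below is a hypothetical stand-in for the data of object 711, not the disclosed format.

```python
# Hypothetical sketch of applying facial attributes to a customized rig
# to produce the final avatar record; the data layout is illustrative.

def apply_attributes(custom_rig, attributes):
    """Merge detected attribute data (hair, skin color, accessories)
    into a copy of the customized rig, skipping undetected attributes."""
    avatar = dict(custom_rig)
    avatar["attributes"] = {k: v for k, v in attributes.items()
                            if v is not None}
    return avatar
```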
Thereafter, object 711, which includes data for avatar 766, can be replicated and stored on a plurality of client devices 300 and servers 500. Avatar 766 can be generated locally on each such client device 300 by client application 403 and on server 500 by server application 501 or web server 502. For example, avatar 766 might visually appear in a virtual world depicted on display 308 of client device 300a or on a web site generated by web server 502.
References to the present invention herein are not intended to limit the scope of any claim or claim term, but instead merely make reference to one or more features that may be covered by one or more of the claims. Devices, engines, modules, materials, processes, and numerical examples described above are exemplary only, and should not be deemed to limit the claims. It should be noted that, as used herein, the terms “over” and “on” both inclusively include “directly on” (no intermediate materials, elements, or space disposed therebetween) and “indirectly on” (intermediate materials, elements, or space disposed therebetween).