This application relates generally to facial detection and more particularly to facial tracking with classifiers.
The examination of the human face can provide dynamic, varied, and plentiful information. Facial data conveys the identity of the person under observation and can later be used to confirm that identity. Facial information further conveys a mood and a mental state or mental states of a person. The capture and analysis of facial information data of a person are undertaken for a wide variety of purposes and practical applications, including determination of a range of emotions and mental states, facial recognition, motion capture, eye tracking, lie detection, computer animation, and other applications. The analysis of facial information data can also be used for the tracking of facial motions, gestures, gaze directions, head poses, expressions, and so on. The applications for the analysis of facial information are both varied and wide-ranging, and include product and service market analysis, biometric and other identification, law enforcement applications, social networking connectivity, and healthcare processes, among many others. The analysis is often based on viewing a face, facial expressions, facial features, movements of facial muscles, etc. The results of the analysis can be used to determine emotional and mental states, identity, veracity, and so on, of the person or persons being analyzed. Facial analysis is often used for tracking purposes. The tracking component is often employed to locate a person or persons, and can be used to predict future movement and location of the person or persons. Such geographical tracking has many practical applications, including sporting event coverage, law enforcement applications, disease propagation detection, computer gaming events, social networking connectivity, and so on.
Humans are particularly good at processing facial information data for a variety of purposes. Perhaps foremost among the varied purposes is social interaction. The social interaction can be among strangers, friends, family members, and so on. The facial processing is critical to personal safety and even survival in some cases, and is used for such basic human activities as social interactions including cooperation, locating a mate, etc. The facial processing is used to rapidly identify whether a stranger appears friendly and approachable or appears dangerous and should be avoided. Similarly, the processing can be used to quickly determine a friend's mood, the mental state of a family member, and so on. The processing of facial information data is used to draw attention to important objects or events in one's environment, such as potential sources of physical danger requiring an immediate and appropriate response.
Analysis of facial information data becomes difficult for people and for processors when the desired facial information data is captured along with other undesirable data. Imagine, for example, that one friend is looking for another friend in a crowd at a sporting event, music concert, political convention, or other large group activity. The flood of spurious data that is captured simultaneously with the facial information data of the sought-after friend confounds the facial information data. This saturation of the facial information data complicates the search for the friend in the crowd. The spurious data must be separated from the facial information data in order to obtain the desired outcome, which in this case is the detection of one's friend in the crowd. The detection of one's friend is further complicated if the friend is moving along with the rest of the crowd. In this scenario, the friend may not be visible at all times, as he or she is moving in and out of sight among the crowd.
Videos are collected from a plurality of people. The videos are partitioned into video frames, and the video frames are analyzed to detect locations of facial points or facial landmarks. The locations of the facial points in a first video frame can be used to estimate the locations of facial points in future video frames. An output from a facial detector can be simulated based on the estimations of the locations of the facial points in the future video frames.
A computer-implemented method for facial tracking is disclosed comprising: obtaining a video that includes a face; performing face detection to initialize locations for facial points within a first frame from the video; refining the locations for the facial points based on localized information around the facial points; estimating future locations for the facial points for a future frame from the first frame; and simulating an output for a facial detector based on the estimating of the future locations for the facial points. The simulating can include generating a bounding box for the face.
In embodiments, a computer program product embodied in a non-transitory computer readable medium for facial detection comprises: code for obtaining a video that includes a face; code for performing face detection to initialize locations for facial points within a first frame from the video; code for refining the locations for the facial points based on localized information around the facial points; code for estimating future locations for the facial points for a future frame from the first frame; and code for simulating an output for a facial detector based on the estimating of the future locations for the facial points.
In some embodiments, a computer system for facial detection comprises: a memory which stores instructions; one or more processors attached to the memory wherein the one or more processors, when executing the instructions which are stored, are configured to: obtain a video that includes a face; perform face detection to initialize locations for facial points within a first frame from the video; refine the locations for the facial points based on localized information around the facial points; estimate future locations for the facial points for a future frame from the first frame; and simulate an output for a facial detector based on the estimating of the future locations for the facial points.
Various features, aspects, and advantages of various embodiments will become more apparent from the following further description.
The following detailed description of certain embodiments may be understood by reference to the accompanying figures.
Processing images is a key skill performed by humans in all areas of life. A person must process images such as black and white and color images; videos including slideshows, video clips, and full-length movies; and other electronic images almost constantly in today's modern, highly interactive and media-intensive society. However, the human ability to process visual stimuli stretches back far before the advent of multimedia images. The ability to distinguish between a non-essential and a pertinent image requires the human brain to make a series of evaluations. For example, a movement or flash, briefly viewed in peripheral vision, can trigger instant attention, interest, concern, and so on. Processing systems in the brain unconsciously coordinate a unified and rapid response that allows a person to identify the pertinent visual data and determine whether the stimulus presents physical danger. The ability to quickly locate the source of a movement or another event, to identify it, and to plan a reaction to it is a crucial part of interacting with and functioning in the world.
Facial detection by a computing device is a technique by which a computer mirrors many of the unconscious processes of the human brain to process, evaluate, and categorize a myriad of images and videos. Facial detection can be used for purposes including finding a face in a scene, identifying a face, tracking a face, and so on. Facial detection finds wide-ranging applications in fields including healthcare, law enforcement, social media, gaming, and so on. Detected facial data also can be used to determine the mental and emotional states of the people whose faces have been detected, for example. The determined mental and emotional states can be used for identification and classification purposes, among others.
However, the processing of facial data can be a complex and resource-intensive computational problem. Consider, for example, an image, still or video, of a loved one. The human brain can quickly identify the important face in profile, in a portrait shot, in a crowd, rotated, or even in a decades-old image. Even though human facial detection is by no means foolproof—for example, siblings or even parents and children can be hard to distinguish in photographs taken at the same age—the speed and accuracy of the identification is often remarkable. As a result, automatic facial detection techniques must anticipate and perform many simultaneous tasks, making automated detection complex and not always successful when evaluating similar images.
Certain techniques, however, render automatic facial processing more effective and less computationally intensive. For example, facial tracking can be used to aid in the identification and processing of human faces in videos. In this technique, a given video can be partitioned into frames and all of the frames or a subset of the frames from the video can be analyzed. The analysis can include detecting a face within a first frame. When a face is detected, locations of facial points or landmarks can be initialized. The facial points can include facial features including locations of eyes, ears, a nose, a mouth, a chin, facial hair, and so on. The facial points can also include distinguishing marks and features on a face including a mole, a birthmark, a scar, etc. Based on the locations of the facial points within one video frame, the locations of the same facial features in a later frame from the video can be estimated. The later frame can be the next frame from the video or another frame from a different moment in the video. A facial detector can be simulated based on the estimated locations of the future facial points. The simulating of the facial detector can generate an output, where the output can include a bounding box for the face. The locations of the facial points in subsequent frames and of the bounding box can be adapted based on the actual location of the face in the later frames from the video. The adapted locations of the facial points and the bounding box can be used for other future frames from the video.
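By way of a non-limiting illustration, the following Python sketch outlines one possible per-frame loop of this kind. The OpenCV Haar cascade used as the face detector, the placeholder refinement routine, and the constant-velocity estimate of future locations are assumptions made for the sketch only and are not the particular detectors, refiners, or estimators required by the embodiments.

```python
# Illustrative sketch only: a Haar cascade stands in for the face detector, and the
# landmark refinement step is a placeholder for localized, appearance-based search.
import cv2
import numpy as np

def refine_points(gray, points):
    # Placeholder: a real refiner would examine a small patch around each point
    # and re-center the point on the underlying facial landmark.
    return points

def simulated_detector_output(points, margin=0.1):
    # Simulate the face detector by emitting a bounding box around the facial points.
    x0, y0 = points.min(axis=0)
    x1, y1 = points.max(axis=0)
    pad = margin * max(x1 - x0, y1 - y0)
    return int(x0 - pad), int(y0 - pad), int(x1 - x0 + 2 * pad), int(y1 - y0 + 2 * pad)

def track(path):
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(path)
    prev = None          # facial points from the previous frame
    velocity = None      # per-point frame-to-frame displacement
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is None:
            # First frame: run the full detector and initialize the facial points.
            faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            if len(faces) == 0:
                continue
            x, y, w, h = faces[0]
            prev = np.array([[x, y], [x + w, y], [x, y + h], [x + w, y + h]], float)
            velocity = np.zeros_like(prev)
            continue
        # Later frames: estimate future locations rather than re-running detection.
        estimate = prev + velocity
        refined = refine_points(gray, estimate)
        velocity = refined - prev
        prev = refined
        yield simulated_detector_output(refined)
    cap.release()
```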
The flow 100 includes refining the locations 130 for the first set of facial landmarks based on localized information around the first set of facial landmarks. The refining the locations of facial landmarks can include centering location points on the facial landmarks, where the facial landmarks can include corners of a mouth, corners of eyes, eyebrow corners, tip of nose, nostrils, chin, tips of ears, and so on. The refining of the locations for the facial points can include centering location points on facial attributes including eyes, ears, a nose, a mouth, a chin, etc. The refining can also include detection of the face within a background, for example. The refining can include identification of one face from among a plurality of faces in the frame from the video. The flow 100 includes estimating future locations 140 for landmarks within the first set of facial landmarks for a future frame from the first frame. The estimating future locations can include using the locations of facial points, facial landmarks, facial characteristics, distinguishing marks, etc. in a first frame to estimate the locations of facial points, facial landmarks, facial characteristics, distinguishing marks, etc. in a second frame, for example. The second frame can be a future (subsequent) frame or a past (previous) frame. The future frame can be a next frame in a chronological series of frames from the first frame in the video. The flow 100 includes providing an output for a facial detector 150 based on the estimating of the future locations for the landmarks. The providing an output for a facial detector can include estimating the future locations for the facial landmarks, facial points, facial characteristics, distinguishing marks, etc. The providing an output including the future locations for the facial landmarks, facial points, and so on, can be used to predict the presence and location of a face in a future frame, for example. The future frame can be the next frame in a series of frames, a later frame, and so on. The providing of the output of the facial detector can include generating a bounding box 152 for the face. A first bounding box can be generated for a face that is detected in a first frame. The first bounding box can be a square, a rectangle, and/or any other appropriate geometric shape. The first bounding box can be substantially the same as the bounding box generated by a face detector. The first bounding box can be a minimum-dimension bounding box, where the dimension can include area, volume, hyper-volume, and so on. The first bounding box can be generated based on analysis, estimation, simulation, prediction, and so on.
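As one hedged illustration of refining a single location using localized information, the sketch below re-centers a facial point on the best template match within a small search window around its estimated position. The normalized cross-correlation matching, the 12-pixel search radius, and the assumption that a template patch was cropped around the point when it was first detected are choices made for this example only.

```python
# Illustrative only: re-center one facial point using a small local template search.
import cv2
import numpy as np

def refine_point(gray, point, template, search_radius=12):
    """gray: grayscale frame; point: (x, y); template: small patch from an earlier frame."""
    h, w = template.shape
    x, y = int(round(point[0])), int(round(point[1]))
    x0 = max(x - search_radius - w // 2, 0)
    y0 = max(y - search_radius - h // 2, 0)
    x1 = min(x + search_radius + w // 2 + 1, gray.shape[1])
    y1 = min(y + search_radius + h // 2 + 1, gray.shape[0])
    window = gray[y0:y1, x0:x1]
    if window.shape[0] < h or window.shape[1] < w:
        return np.asarray(point, float)    # too close to the frame edge to refine
    scores = cv2.matchTemplate(window, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, best = cv2.minMaxLoc(scores)  # (x, y) of the best-matching position
    # Convert the match corner back to a landmark center in frame coordinates.
    return np.array([x0 + best[0] + w / 2.0, y0 + best[1] + h / 2.0])
```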
The flow 100 includes performing face detection to initialize a second set of locations 160 for a second set of facial landmarks for a second face within the video. The face detection of the second face can be based on other facial points, as described above. Facial detection of the second face can be accomplished using a variety of techniques including edge detection, color image processing, landmark identification, and so on. The performing face detection on the second face can include performing facial landmark detection 162 within the first frame from the video for the second face. As was the case for the detection of the first set of facial landmarks for the first face within the video, the facial landmarks for the second face can include corners of the mouth, corners of eyes, eyebrow corners, tip of nose, nostrils, chin, tips of ears, distinguishing marks and features, and so on. Other facial landmarks can also be used. The performing face detection on the second face includes estimating a second rough bounding box 164 for the second face based on the facial landmark detection. The second bounding box can be a square, a rectangle, and/or any other appropriate geometric shape. The second bounding box can be a different geometric shape from that of the first bounding box. The second bounding box can be substantially the same as the bounding box generated by a face detector. The second bounding box can be a minimum-dimension bounding box, where the dimension can include area, volume, hyper-volume, and so on. The second bounding box can be generated based on analysis, estimation, simulation, prediction, and other appropriate techniques. The performing face detection on the second face includes refining the second set of locations 166 for the second set of facial landmarks based on localized information around the second set of facial landmarks. The technique for refining of the locations of the second set of facial landmarks can be the same as or different from the refining of the locations of the first set of facial landmarks. The refining of the locations for the second set of facial landmarks can include centering location points on facial attributes such as facial points including eyes, ears, a nose, a mouth, a chin, etc., as well as refining the locations of facial landmarks that can include corners of a mouth, corners of eyes, eyebrow corners, tip of nose, nostrils, chin, tips of ears, and so on. The refining can also include detection of the second face within a background, for example. The performing face detection on the second face includes estimating future locations 168 for the second set of locations for the second set of facial landmarks for the future frame from the first frame. The estimating future locations for the second set of facial landmarks can include using the locations of facial points, facial landmarks, facial characteristics, distinguishing marks, etc. in a first frame to estimate the locations of facial points, facial landmarks, facial characteristics, distinguishing marks, etc. in a second frame, for example. The second frame can be a future (subsequent) frame or a past (previous) frame. The future frame can be a next frame in a chronological series of frames from the first frame in the video. The performing face detection on the second face includes distinguishing facial points 170 from the first face from other facial points. 
The distinguishing facial points can include distinguishing facial points from the second face, distinguishing facial points from a third face, and so on. The distinguishing facial points can include distinguishing the second set of facial points, second set of facial landmarks, second set of facial characteristics, second set of distinguishing marks, etc. from the first set of facial points, first set of facial landmarks, first set of facial characteristics, first set of distinguishing marks, and so on.
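A minimal sketch, assuming that each face already has an estimated bounding box, of one simple way to distinguish the facial points of the first face from other facial points: each point is assigned to the face whose bounding-box center is nearest. The nearest-center rule and the function name are illustrative only.

```python
# Illustrative only: group facial points by the nearest face bounding-box center.
import numpy as np

def assign_points_to_faces(points, boxes):
    """points: (N, 2) array of (x, y); boxes: list of (x, y, w, h), one per face.
    Returns one list of point indices per face."""
    centers = np.array([[x + w / 2.0, y + h / 2.0] for x, y, w, h in boxes])
    groups = [[] for _ in boxes]
    for i, p in enumerate(np.asarray(points, float)):
        nearest = int(np.argmin(np.linalg.norm(centers - p, axis=1)))
        groups[nearest].append(i)
    return groups
```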
The flow 100 includes analyzing the face using a plurality of classifiers 175. The face that is analyzed can be the first face, the second face, the third face, and so on. The face can be analyzed to determine facial landmarks, facial features, facial points, and so on. The classifiers can be used to determine facial landmarks including corners of a mouth, corners of eyes, eyebrow corners, tip of nose, nostrils, chin, tips of ears, and so on. The plurality of classifiers can provide for analysis of gender, ethnicity, or age corresponding to the face. Classifiers can be used to provide for analysis of other demographic data and information.
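The sketch below suggests one way a plurality of classifiers could be applied to a tracked face: a dictionary of separately trained classifiers, one per attribute, is evaluated on a shared feature vector. The attribute names, the feature representation, and the scikit-learn-style predict interface are assumptions made for the example.

```python
# Illustrative only: apply several attribute classifiers to one facial feature vector.
import numpy as np

def analyze_face(face_features, classifiers):
    """face_features: 1-D feature vector (e.g., landmark geometry or appearance features).
    classifiers: dict mapping an attribute name to a fitted scikit-learn classifier."""
    features = np.asarray(face_features, float).reshape(1, -1)
    return {name: clf.predict(features)[0] for name, clf in classifiers.items()}

# Hypothetical usage with previously trained classifiers:
# results = analyze_face(features, {"smile": smile_clf, "gender": gender_clf,
#                                   "age_band": age_clf})
```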
The flow 100 further includes generating a bounding box 180 for the face within the first frame. The bounding box that is generated can be a square, a rectangle, and/or any other appropriate polygon for surrounding a shape with a frame. For example, the bounding box can be generated for a shape that is a face within a frame. The flow 100 includes repeating the refining and the estimating for succeeding frames 185 from the video. The repeating can be accomplished for one succeeding frame, a sequence of succeeding frames, a random selection of succeeding frames, and so on. The repeating can include one or more of the refining and the estimating. The flow 100 includes evaluating the face to determine rotation 190 about a z-axis of the face. The evaluating the face can be used to determine that a face has rotated from a first frame to a second frame, where the second frame can be a past frame, the previous frame, the next frame, a succeeding frame, and so on. The evaluating the face to determine rotation about the z-axis or another axis can determine a view of the face. For example, the view of the face can be a one quarter view, a half (profile) view, a three quarter view, a full view, and so on. The flow 100 includes estimating a quality of the rough bounding box 195 for the future frame. The estimating of the quality of the rough bounding box for future frames can be based on accuracy, percent error, and/or deviation, along with other factors, for the bounding box for a future frame. The estimating of the quality of the bounding box for future frames can be based on a threshold. Various steps in the flow 100 may be changed in order, repeated, omitted, or the like without departing from the disclosed concepts. Various embodiments of the flow 100 may be included in a computer program product embodied in a non-transitory computer readable medium that includes code executable by one or more processors. Various embodiments of the flow 100 may be included in a purpose-built customized processor, computer, and integrated circuit chip.
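As a hedged illustration, rotation about the z-axis (roll) can be approximated from the line joining the outer eye corners, and the quality of an estimated bounding box can be scored against a threshold with an intersection-over-union measure. The choice of landmarks, the IoU measure, and the example threshold of 0.5 are assumptions of the sketch, not requirements of the embodiments.

```python
# Illustrative only: roll angle from eye corners and a bounding-box quality score.
import numpy as np

def roll_degrees(left_eye, right_eye):
    # In-plane (z-axis) rotation estimated from the two outer eye corners.
    dx, dy = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    return float(np.degrees(np.arctan2(dy, dx)))

def bounding_box_quality(box_a, box_b):
    # Intersection over union between an estimated box and a reference box.
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# e.g., re-run full face detection when the quality falls below a threshold such as 0.5
```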
The contextual information can include video origin information from which a sample is extracted, a subject identification (ID), different expression information, and so on. The flow 200 includes generating a mirror image of the face 220. The generating of the mirror image of the face can be accomplished by rotating the image of the face 180 degrees at the centerline of the face, or by using another mirroring technique, for example. The flow 200 includes generating a rotated image 230 of the face. The rotated image of the face can be rotated by a constant amount, by a series of predetermined amounts, by a random amount, and so on. For example, the face can be rotated by 45 degrees; by a series of rotations including 5 degrees, 10 degrees, and 15 degrees; and so on. The face can be rotated by any appropriate amount for training purposes for training the one or more classifiers. The flow 200 includes translating the rough bounding box 240 to a different location. The translating the rough bounding box can be based on a random translation, a fixed translation, a pattern of translations, a predetermined translation, and so on. For example, a pattern of translations of the bounding box could include translating along the x-axis and y-axis (east, west, north, south), and diagonally (northwest, northeast, southeast, southwest) for up to eight other translations. The translation can be by a distance equal to a dimension of the bounding box, or by any other amount. The flow 200 includes generating a scaled image 250 of the face. The generating of the scaled image of the face can include enlarging the face (zooming in), shrinking the face (zooming out), and so on.
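A minimal sketch of the training variations described for the flow 200, assuming OpenCV image operations: a mirrored image, an image rotated by an example amount of 15 degrees, a rough bounding box translated by one box dimension in each of the eight compass directions, and a scaled (zoomed-in) image. The particular amounts are examples only.

```python
# Illustrative only: generate mirrored, rotated, translated, and scaled training variants.
import cv2

def training_variants(face_img, box):
    h, w = face_img.shape[:2]
    x, y, bw, bh = box
    mirrored = cv2.flip(face_img, 1)                     # mirror about the vertical centerline
    rot = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), 15, 1.0)
    rotated = cv2.warpAffine(face_img, rot, (w, h))      # rotate by an example 15 degrees
    shifted_boxes = [(x + dx * bw, y + dy * bh, bw, bh)  # translate the rough bounding box
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    scaled = cv2.resize(face_img, None, fx=1.2, fy=1.2)  # zoom in by an example 20 percent
    return mirrored, rotated, shifted_boxes, scaled
```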
Classification can be based on various types of algorithms, heuristics, codes, procedures, statistics, and so on. Many techniques exist for performing classification. For example, classification of one or more observations into one or more groups can be based on distributions of the data values, probabilities, and so on. Classifiers can be binary, multiclass, linear, and so on. Algorithms for classification can be implemented using a variety of techniques including neural networks, kernel estimation, support vector machines, use of quadratic surfaces, and so on. Classification can be used in many application areas such as computer vision, speech and handwriting recognition, and so on. Classification can be used for biometric identification of one or more people in one or more frames of one or more videos.
A second frame 502 is also shown. The second video frame 502 includes a frame boundary 530, a first face 532, and a second face 534. The second frame 502 also includes a bounding box 540 and the facial landmarks 542, 544, and 546. In other embodiments, any number of facial landmarks can be generated and used for facial tracking of the two or more faces of a video frame such as the shown second video frame 502. Facial points from the first face can be distinguished from other facial points. In embodiments, the other facial points include facial points of one or more other faces. The facial points can correspond to the facial points of the second face. The distinguishing of the facial points of the first face and the facial points of the second face can be used to distinguish between the first face and the second face, to track either or both of the first face and the second face, and so on. Other facial points can correspond to the second face. As mentioned above, any number of facial points can be determined within a frame. One or more of the other facial points that are determined can correspond to a third face. The location of the bounding box 540 can be estimated, where the estimating can be based on the location of the generated bounding box 520 shown in the prior frame 500. The three facial points shown, 542, 544, and 546 might lie within the bounding box 540 or might not lie partially or completely within the bounding box 540. For example, the second face 534 might have moved between the first video frame 500 and the second video frame 502. Based on the accuracy of the estimating of the bounding box 540, a new estimation can be determined for a third, future frame from the video, and so on.
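A short, hedged sketch of the containment question discussed above: given a bounding box estimated for the second frame (for example, the generated bounding box 520 carried forward as the bounding box 540), a boolean mask indicates which facial points, such as the points 542, 544, and 546, lie within it. The helper name and the simple inside test are illustrative only.

```python
# Illustrative only: test which facial points fall inside an estimated bounding box.
import numpy as np

def points_inside(points, box):
    x, y, w, h = box
    pts = np.asarray(points, dtype=float)
    inside = (pts[:, 0] >= x) & (pts[:, 0] <= x + w) & \
             (pts[:, 1] >= y) & (pts[:, 1] <= y + h)
    return inside   # boolean mask; a mostly-False mask can trigger a new estimation
```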
Various techniques can be used to train the tracking of facial landmarks of a face of a person, for example, and to improve the tracking of the facial landmarks. The tracking can include tracking facial points, distinguishing features, and so on. The training can include generating a mirror image of the face. The mirror image of the face can be generated, for example, by finding a centerline in the Z-axis for the face, and then rotating the face about the Z-axis. The training can include generating a scaled image of the face. The face can be enlarged (zoom-in), reduced (zoom-out), and so on. Any appropriate technique can be used for the training. One example of facial training is shown in the example 800. The training can be based on automatic techniques, manual techniques, algorithms, heuristics, and so on. The training can be used to improve several aspects of facial tracking including detecting locations of one or more facial landmarks, refining of the location of the one or more facial landmarks, estimating locations of one or more facial landmarks in one or more future video frames, simulating an output of a facial detector, and so on. The training can begin with a video frame 810 which contains a face. Various adaptations can be made to the face in the video frame 810 including rotating, forming a mirror image, translating, removing, scaling, and so on. The frames 820 and 822 show variations of the frame 810 in which a mirror image is formed of the face in the frame 820, and the face is rotated in the frame 822. Many other adaptations can be made to the frame which contains the face, including translating the face north, south, east, or west within the frame, translating the face diagonally northwest, northeast, southeast, southwest, and so on. Noise can be introduced into the frames to improve training for detection. A bounding box can be determined for frames generated for variations of the face, such as the bounding box generated for a rotated face as shown in the frame 830. The training can include further variations of the video frame containing the face. For example, the frame 840 shows a bounding box determined for a previous frame being applied to the frame containing the rotated face. The bounding box in the frame 840 demonstrates a box translated from an original position for a face. The translation can be accomplished by shifting the bounding box, by shifting the frame, by shifting the face, and so on. The training technique or techniques can continue for various faces, for numbers of faces partially or completely within a frame, for various degrees of rotation, for various distances and directions of translation, and so on. Additional training techniques can be used individually and combined with other training techniques. The translating of the bounding box to a different location as shown in the frame 840 can be based on velocity of one or more facial landmarks that are determined, angular velocity of one or more facial landmarks that are determined, and so on.
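As a non-limiting illustration of translating a bounding box based on landmark velocity, the sketch below uses the mean per-landmark displacement between two consecutive frames as the box displacement. The two-frame constant-velocity assumption is made only for the example.

```python
# Illustrative only: shift a bounding box by the mean velocity of the facial landmarks.
import numpy as np

def translate_box_by_landmark_velocity(box, landmarks_prev, landmarks_curr):
    """box: (x, y, w, h); landmarks_prev, landmarks_curr: (N, 2) arrays from two frames."""
    velocity = np.asarray(landmarks_curr, float) - np.asarray(landmarks_prev, float)
    dx, dy = velocity.mean(axis=0)   # mean per-landmark displacement per frame
    x, y, w, h = box
    return (x + dx, y + dy, w, h)
```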
In embodiments, an X can represent a positive case such as a smile while an O can represent a negative case, such as the lack of a smile. The lack of a smile can be a neutral face, a frown, or various other non-smile expressions. In other embodiments, frowns can be a cluster while neutral faces can be another cluster, for example. A non-linear classifier such as a support vector machine (SVM) can be used to analyze the data. A radial basis function (RBF) kernel can be employed. However, the SVM and RBF usage typically does not scale well as data sets become larger. Thus, in embodiments, a Nystrom method can be used to approximate RBF usage, resulting in analysis of the data that is better than using linear SVM analysis and faster than using RBF analysis.
In embodiments, a very large number of frames are obtained for various videos. A sample can be taken from these frames to approximate RBF-type analysis. The sampling can be random. In other cases, the sample can factor in context. For example, a most significant expression can be selected, such as picking a smile with the highest magnitude. In some situations, a large number of frames that are more relevant to the analysis can be selected from one person while including few or no frames of a video from another person. Based on this frame sampling and using Nystrom approximation, non-linear analysis of facial expressions can be accomplished.
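A hedged sketch, using scikit-learn, of the approximation described above: a Nystroem feature map followed by a linear SVM approximates an RBF-kernel SVM on the sampled frames. The feature matrix X, the labels y, and the hyperparameter values are assumptions of the example rather than values taken from the embodiments.

```python
# Illustrative only: approximate an RBF-kernel SVM with a Nystroem map and a linear SVM.
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def make_expression_classifier(n_components=500, gamma=0.1, C=1.0):
    return make_pipeline(
        Nystroem(kernel="rbf", gamma=gamma, n_components=n_components, random_state=0),
        LinearSVC(C=C),
    )

# Hypothetical usage: X holds one feature vector per sampled frame, y holds labels
# such as smile / no smile.
# clf = make_expression_classifier()
# clf.fit(X, y)
# predictions = clf.predict(X_new)
```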
The analysis server 1630 can comprise one or more processors 1634 coupled to a memory 1636 which can store and retrieve instructions, and a display 1632. The analysis server 1630 can receive video data and can analyze the video data to detect locations of facial points and to simulate facial detection. The analysis of the facial data and the detection of the facial points can be performed by a web service and/or using cloud computing techniques. The analysis server 1630 can receive facial data or video data from the video data collection machine 1620. The analysis server can receive operation data 1654, where the operation data can include facial point detection data. The facial point detection data and other data and information related to facial tracking and analysis of the facial data can be considered video data 1652 and can be transmitted to and from the analysis server 1630 using the internet or another type of network. In some embodiments, the analysis server 1630 receives video data and/or facial data from a plurality of client machines and aggregates the facial data. The analysis server can perform facial tracking using classifiers.
In some embodiments, a displayed rendering of facial data and locations of facial points can occur on a different computer from the video data collection machine 1620 or the analysis server 1630. This computer can be termed a rendering machine 1640 and can receive facial tracking rendering data 1656, facial data, simulated facial detector data, video data, detected facial points data, and graphical display information. In embodiments, the rendering machine 1640 comprises one or more processors 1644 coupled to a memory 1646 which can store and retrieve instructions, and a display 1642. The rendering can be any visual, auditory, tactile, or other communication to one or more individuals. The rendering can include an email message, a text message, a tone, an electrical pulse, a vibration, or the like. The system 1600 can include a computer program product embodied in a non-transitory computer readable medium for mental state analysis comprising: code for obtaining a video that includes a face; code for performing face detection to initialize locations for a first set of facial landmarks within a first frame from the video wherein the face detection comprises: performing facial landmark detection within the first frame from the video; and estimating a rough bounding box for the face based on the facial landmark detection; code for refining the locations for the first set of facial landmarks based on localized information around the first set of facial landmarks; and code for estimating future locations for landmarks within the first set of facial landmarks for a future frame from the first frame.
Each of the above methods may be executed on one or more processors on one or more computer systems. Embodiments may include various forms of distributed computing, client/server computing, and cloud-based computing. Further, it will be understood that the depicted steps or boxes contained in this disclosure's flow charts are solely illustrative and explanatory. The steps may be modified, omitted, repeated, or re-ordered without departing from the scope of this disclosure. Further, each step may contain one or more sub-steps. While the foregoing drawings and description set forth functional aspects of the disclosed systems, no particular implementation or arrangement of software and/or hardware should be inferred from these descriptions unless explicitly stated or otherwise clear from the context. All such arrangements of software and/or hardware are intended to fall within the scope of this disclosure.
The block diagrams and flowchart illustrations depict methods, apparatus, systems, and computer program products. The elements and combinations of elements in the block diagrams and flow diagrams show functions, steps, or groups of steps of the methods, apparatus, systems, computer program products and/or computer-implemented methods. Any and all such functions, generally referred to herein as a “circuit,” “module,” or “system,” may be implemented by computer program instructions, by special-purpose hardware-based computer systems, by combinations of special purpose hardware and computer instructions, by combinations of general purpose hardware and computer instructions, and so on.
A programmable apparatus which executes any of the above mentioned computer program products or computer-implemented methods may include one or more microprocessors, microcontrollers, embedded microcontrollers, programmable digital signal processors, programmable devices, programmable gate arrays, programmable array logic, memory devices, application specific integrated circuits, or the like. Each may be suitably employed or configured to process computer program instructions, execute computer logic, store computer data, and so on.
It will be understood that a computer may include a computer program product from a computer-readable storage medium and that this medium may be internal or external, removable and replaceable, or fixed. In addition, a computer may include a Basic Input/Output System (BIOS), firmware, an operating system, a database, or the like that may include, interface with, or support the software and hardware described herein.
Embodiments of the present invention are neither limited to conventional computer applications nor to the programmable apparatus that runs them. To illustrate: the embodiments of the presently claimed invention could include an optical computer, quantum computer, analog computer, or the like. A computer program may be loaded onto a computer to produce a particular machine that may perform any and all of the depicted functions. This particular machine provides a means for carrying out any and all of the depicted functions.
Any combination of one or more computer readable media may be utilized including but not limited to: a non-transitory computer readable medium for storage; an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor computer readable storage medium or any suitable combination of the foregoing; a portable computer diskette; a hard disk; a random access memory (RAM); a read-only memory (ROM); an erasable programmable read-only memory (EPROM, Flash, MRAM, FeRAM, or phase change memory); an optical fiber; a portable compact disc; an optical storage device; a magnetic storage device; or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
It will be appreciated that computer program instructions may include computer executable code. A variety of languages for expressing computer program instructions may include without limitation C, C++, Java, JavaScript™, ActionScript™, assembly language, Lisp, Perl, Tcl, Python, Ruby, hardware description languages, database programming languages, functional programming languages, imperative programming languages, and so on. In embodiments, computer program instructions may be stored, compiled, or interpreted to run on a computer, a programmable data processing apparatus, a heterogeneous combination of processors or processor architectures, and so on. Without limitation, embodiments of the present invention may take the form of web-based computer software, which includes client/server software, software-as-a-service, peer-to-peer software, or the like.
In embodiments, a computer may enable execution of computer program instructions including multiple programs or threads. The multiple programs or threads may be processed approximately simultaneously to enhance utilization of the processor and to facilitate substantially simultaneous functions. By way of implementation, any and all methods, program codes, program instructions, and the like described herein may be implemented in one or more threads which may in turn spawn other threads, which may themselves have priorities associated with them. In some embodiments, a computer may process these threads based on priority or other order.
Unless explicitly stated or otherwise clear from the context, the verbs “execute” and “process” may be used interchangeably to indicate execute, process, interpret, compile, assemble, link, load, or a combination of the foregoing. Therefore, embodiments that execute or process computer program instructions, computer-executable code, or the like may act upon the instructions or code in any and all of the ways described. Further, the method steps shown are intended to include any suitable method of causing one or more parties or entities to perform the steps. The parties performing a step, or portion of a step, need not be located within a particular geographic location or country boundary. For instance, if an entity located within the United States causes a method step, or portion thereof, to be performed outside of the United States then the method is considered to be performed in the United States by virtue of the causal entity.
While the invention has been disclosed in connection with preferred embodiments shown and described in detail, various modifications and improvements thereon will become apparent to those skilled in the art. Accordingly, the foregoing examples should not limit the spirit and scope of the present invention; rather it should be understood in the broadest sense allowable by law.
This application claims the benefit of U.S. provisional patent applications “Facial Tracking with Classifiers” Ser. No. 62/047,508, filed Sep. 8, 2014, “Semiconductor Based Mental State Analysis” Ser. No. 62/082,579, filed Nov. 20, 2014, and “Viewership Analysis Based On Facial Evaluation” Ser. No. 62/128,974, filed Mar. 5, 2015. This application is also a continuation-in-part of U.S. patent application “Mental State Analysis Using Web Services” Ser. No. 13/153,745, filed Jun. 6, 2011, which claims the benefit of U.S. provisional patent applications “Mental State Analysis Through Web Based Indexing” Ser. No. 61/352,166, filed Jun. 7, 2010, “Measuring Affective Data for Web-Enabled Applications” Ser. No. 61/388,002, filed Sep. 30, 2010, “Sharing Affect Across a Social Network” Ser. No. 61/414,451, filed Nov. 17, 2010, “Using Affect Within a Gaming Context” Ser. No. 61/439,913, filed Feb. 6, 2011, “Recommendation and Visualization of Affect Responses to Videos” Ser. No. 61/447,089, filed Feb. 27, 2011, “Video Ranking Based on Affect” Ser. No. 61/447,464, filed Feb. 28, 2011, and “Baseline Face Analysis” Ser. No. 61/467,209, filed Mar. 24, 2011. This application is also a continuation-in-part of U.S. patent application “Mental State Analysis Using an Application Programming Interface” Ser. No. 14/460,915, filed Aug. 15, 2014, which claims the benefit of U.S. provisional patent applications “Application Programming Interface for Mental State Analysis” Ser. No. 61/867,007, filed Aug. 16, 2013, “Mental State Analysis Using an Application Programming Interface” Ser. No. 61/924,252, filed Jan. 7, 2014, “Heart Rate Variability Evaluation for Mental State Analysis” Ser. No. 61/916,190, filed Dec. 14, 2013, “Mental State Analysis for Norm Generation” Ser. No. 61/927,481, filed Jan. 15, 2014, “Expression Analysis in Response to Mental State Express Request” Ser. No. 61/953,878, filed Mar. 16, 2014, “Background Analysis of Mental State Expressions” Ser. No. 61/972,314, filed Mar. 30, 2014, and “Mental State Event Definition Generation” Ser. No. 62/023,800, filed Jul. 11, 2014; the application is also a continuation-in-part of U.S. patent application “Mental State Analysis Using Web Services” Ser. No. 13/153,745, filed Jun. 6, 2011, which claims the benefit of U.S. provisional patent applications “Mental State Analysis Through Web Based Indexing” Ser. No. 61/352,166, filed Jun. 7, 2010, “Measuring Affective Data for Web-Enabled Applications” Ser. No. 61/388,002, filed Sep. 30, 2010, “Sharing Affect Across a Social Network” Ser. No. 61/414,451, filed Nov. 17, 2010, “Using Affect Within a Gaming Context” Ser. No. 61/439,913, filed Feb. 6, 2011, “Recommendation and Visualization of Affect Responses to Videos” Ser. No. 61/447,089, filed Feb. 27, 2011, “Video Ranking Based on Affect” Ser. No. 61/447,464, filed Feb. 28, 2011, and “Baseline Face Analysis” Ser. No. 61/467,209, filed Mar. 24, 2011. The foregoing applications are each hereby incorporated by reference in their entirety.