The present disclosure relates to systems, methods and apparatuses for detecting muscle activity, and in particular, to systems, methods and apparatuses for detecting facial expression according to muscle activity.
In some known systems, online activities can use user facial expressions to perform actions for an online activity. For example, in some known systems, the systems may estimate a user's facial expressions so as to determine actions to perform within an online activity. Various algorithms can be used to analyze video feeds provided by some known systems (specifically, to perform facial recognition on frames of video feeds so as to estimate user facial expressions). Such algorithms, however, are less effective when a user engages in virtual reality (VR) activities. Specifically, VR hardware (such as VR helmets, headsets, and/or other apparatuses) can obscure portions of a user's face, making it difficult to detect a user's facial expressions while using the VR hardware.
Thus, a need exists for apparatuses, methods and systems that can accurately and efficiently detect user facial expressions even when the user's face is partially obscured.
Apparatuses, methods, and systems herein facilitate a rapid, efficient mechanism for facial expression detection according to electromyography (EMG) signals. In some implementations, apparatuses, methods and systems herein can detect facial expressions according to EMG signals, and can operate without significant latency on mobile devices (including but not limited to tablets, smartphones, and/or the like).
For example, in some implementations, systems, methods and apparatuses herein can detect facial expressions according to EMG signals that are obtained from one or more electrodes placed on a face of the user. In some implementations, the electrodes can be unipolar electrodes. The unipolar electrodes can be situated on a mask that contacts the face of the user, such that a number of locations on the upper face of the user are contacted by the unipolar electrodes.
In some implementations, the EMG signals can be preprocessed to remove noise. The noise removal can be common mode removal (i.e., in which interfering signals from one or more neighboring electrodes, and/or from the facemask itself, are removed). After preprocessing, the EMG signals can be analyzed to determine roughness.
The EMG signals can also be normalized. Normalization can allow facial expressions to be categorized across a number of users. The categorization can subsequently be used to identify facial expressions of new users (e.g., by comparing EMG signals of new users to those categorized from previous users). In some implementations, deterministic and non-deterministic (e.g., probabilistic) classifiers can be used to classify EMG signals representing facial expressions.
In some implementations, a user state can be determined before classification of the signals is performed. For example, if the user is in a neutral state (i.e., a state in which the user has a neutral expression on his/her face), the structure of the EMG signals (even after normalization) is different from the signals from a non-neutral state (i.e., a state in which the user has a non-neutral expression on his or her face). Determining whether or not a user is in a neutral state can increase the accuracy of the user's EMG signal classification.
In some implementations, a number of classification methods may be performed as described herein, including but not limited to: a categorization classifier; discriminant analysis (including but not limited to LDA (linear discriminant analysis), QDA (quadratic discriminant analysis) and variations thereof such as sQDA (time series quadratic discriminant analysis)); Riemannian geometry; a linear classifier; a Naïve Bayes Classifier (including but not limited to Bayesian Network classifier); a k-nearest neighbor classifier; a RBF (radial basis function) classifier; and/or a neural network classifier, including but not limited to a Bagging classifier, a SVM (support vector machine) classifier, a NC (node classifier), a NCS (neural classifier system), SCRLDA (Shrunken Centroid Regularized Linear Discriminant Analysis), a Random Forest, and/or a similar classifier, and/or a combination thereof. Optionally, after classification, the determination of the facial expression of the user is adapted according to one or more adaptation methods (for example, by retraining the classifier on a specific expression of the user and/or applying a categorization (pattern matching) algorithm).
As used herein, the term “EMG” refers to “electromyography,” which measures the electrical impulses of muscles.
As used herein, the term “muscle capabilities” refers to the capability of a user to operate a plurality of muscles in coordination for some type of activity. A non-limiting example of such an activity is a facial expression. It should be noted that a deficit in a muscle capability is not necessarily due to damage to the muscles involved, but may be due to damage in any part of the physiological system required for muscles to be operated in coordination, including but not limited to, central or peripheral nervous system damage, or a combination thereof.
Motion capture data may optionally be obtained as described with regard to published US Patent Application No. US20100315524A1 to Gordon et al, published on Dec. 16, 2010, hereby incorporated by reference as if fully set forth herein. Other suitable methods for obtaining motion capture data may optionally be performed and could be selected by one of ordinary skill in the art.
US Patent Application No. 20070179396 describes a method for detecting facial muscle movements. The facial muscle movements are described as being detectable by using one or more of electroencephalograph (EEG) signals, electrooculograph (EOG) signals and electromyography (EMG) signals.
U.S. Pat. No. 7,554,549 describes a system and method for analyzing EMG (electromyography) signals from muscles on the face to determine a user's facial expression, but by using bipolar electrodes. Such expression determination is then used for computer animation.
U.S. Pat. No. 5,767,941A describes a servo-based eye tracking system that corrects the position of a moving object in a frame of reference with a light emitting diode tracking beam to detect the movements of a reference feature. The system uses tracking beam oscillations and phase sensitive detection of the reflectance variations to generate error signals used in the correction. A dithering device in the optical path of the tracking beam creates the oscillations. The system can be used to track and correct movement of the eye pupil, but it includes cumbersome hardware that cannot be used in the context of filming.
U.S. Pat. No. 4,856,891 describes an eye fundus tracking system that utilizes active servo tracking and correlation. The system includes a laser source that projects a tracking strip of coherent light on the fundus and optics for producing an image of reflected light from the tracking strip onto a detecting element. The system also includes a means for scanning the intensity profile of the image strip and electronics for analyzing the scanned intensity profile and for providing correction signals which direct the optical path of both the tracking laser beam and a diagnostic laser beam to a fixed position on the fundus. The system, however, is relatively complex to implement.
U.S. Publ. No. US20170352183A1 describes a head mounted display for tracking the face and the eyes of a user. The system builds a model of the user's face using landmarks, then lights the facial features with light sources in the head mounted display and captures images of facial frames using image sensors. Motion and position information about different facial frames is derived from pixel brightness. Animation information is then generated from an aggregation of the facial frames as mapped to the facial landmarks. The system combines the image capture of facial frames with eye tracking to produce facial animation information for a virtual avatar.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the subject matter of this disclosure belongs. The materials, methods, and examples provided herein are illustrative only and not intended to be limiting.
Implementation of the apparatuses, methods and systems of the present disclosure involves performing or completing certain selected tasks or steps manually, automatically, or a combination thereof. Specifically, several selected steps can be implemented by hardware or by software on an operating system, by firmware, and/or a combination thereof. For example, as hardware, selected steps of the invention can be implemented as a chip or a circuit. As software, selected steps of the invention can be implemented as a number of software instructions being executed by a computer (e.g., a processor of the computer) using an operating system. In any case, selected steps of the method and system of the invention could be described as being performed by a data processor, such as a computing platform for executing a plurality of instructions.
Although the present invention is described with regard to a “computer” on a “computer network,” it should be noted that any device featuring a data processor and the ability to execute one or more instructions may be described as a computer or as a computational device, including but not limited to a personal computer (PC), a server, a cellular telephone, an IP telephone, a smart phone, a PDA (personal digital assistant), a thin client, a mobile communication device, a smart watch, head mounted display or other wearable that is able to communicate externally, a virtual or cloud based processor, a pager, and/or a similar device. Two or more of such devices in communication with each other may be a “computer network.”
Embodiments herein are described, by way of example only, with reference to the accompanying drawings. It should be understood that the particulars shown in said drawings are by way of example and for purposes of illustrative discussion of some embodiments only.
Generally, each software component described herein can be assumed to be operated by a computational device (e.g., such as an electronic device including at least a memory and/or a processor, and/or the like).
EMG signals generated by the electrodes can then be processed by a signal processing abstraction layer 104 that can prepare the EMG signals for further analysis. Signal processing abstraction layer 104 can be implemented by a computational device (not shown). In some implementations, signal processing abstraction layer 104 can reduce or remove noise from the EMG signals, and/or can perform normalization and/or other processing in the EMG signals to increase the efficiency of EMG signal analysis. The processed EMG signals are also referred to herein as “EMG signal information.”
The processed EMG signals can then be classified by a classifier 108, e.g., according to the underlying muscle activity. In a non-limiting example, the underlying muscle activity can correspond to different facial expressions being made by the user. Other non-limiting examples of classification for the underlying muscle activity can include determining a range of capabilities for the underlying muscles of a user, where capabilities may not correspond to actual expressions being made at a time by the user. Determination of such a range may be used, for example, to determine whether a user is within a normal range of muscle capabilities or whether the user has a deficit in one or more muscle capabilities.
As a non-limiting example, a user can have a medical condition, such as a stroke or other type of brain injury. After a brain injury, the user may not be capable of a full range of facial expressions, and/or may not be capable of fully executing a facial expression. As a non-limiting example, after having a stroke in which one hemisphere of the brain experiences more damage, the user may have a lopsided or crooked smile. Classifier 108 can use the processed EMG signals to determine that the user's smile is abnormal, and to further determine the nature of the abnormality (i.e., that the user is performing a lopsided smile) so as to classify the EMG signals even when the user is not performing a muscle activity in an expected manner.
As described in greater detail below, classifier 108 may operate according to a number of different classification protocols, such as: categorization classifiers; discriminant analysis (including but not limited to LDA (linear discriminant analysis), QDA (quadratic discriminant analysis) and variations thereof such as sQDA (time series quadratic discriminant analysis), and/or similar protocols); Riemannian geometry; any type of linear classifier; Naïve Bayes Classifier (including but not limited to Bayesian Network classifier); k-nearest neighbor classifier; RBF (radial basis function) classifier; neural network and/or machine learning classifiers including but not limited to Bagging classifier, SVM (support vector machine) classifier, NC (node classifier), NCS (neural classifier system), SCRLDA (Shrunken Centroid Regularized Linear Discriminant Analysis), Random Forest; and/or some combination thereof.
The processed signals may also be used by a training system 106 for training classifier 108. Training system 106 can include a computational device (not shown) that implements and/or instantiates training software. For example, in some implementations, training system 106 can train classifier 108 before classifier 108 classifies an EMG signal. In other implementations, training system 106 can train classifier 108 while classifier 108 classifies facial expressions of the user, or a combination thereof. As described in greater detail below, training system 106, in some implementations, can train classifier 108 using known facial expressions and associated EMG signal information.
Training system 106 may also optionally reduce the number of facial expressions for classifier 108 to be trained on, for example to reduce the computational resources required for the operation of classifier 108 or for a particular purpose for the classification process and/or results. Training system 106 may optionally fuse or combine a plurality of facial expressions in order to reduce their overall number. Training system 106 may optionally also receive a predetermined set of facial expressions for training classifier 108 and may then optionally either train classifier 108 on the complete set or a sub-set thereof.
In some implementations, the set of electrodes 113 comprise a set of surface EMG electrodes that measure a voltage difference within the muscles of a user (the voltage difference being caused by a depolarization wave that travels along the surface of a muscle when the muscle flexes). The signals detected by the set of surface EMG electrodes 113 may be in the range of 5 mV and/or similar signal ranges. In some implementations, the set of surface EMG electrodes 113 can be aligned with an expected direction of an electrical impulse within a user's muscle(s), and/or can be aligned perpendicular to impulses that the user wishes to exclude from detection. In some implementations, the set of surface EMG electrodes 113 can be unipolar electrodes (e.g., that can collect EMG signals from a general area). Unipolar electrodes, in some implementations, can allow for more efficient facial expression classification, as the EMG signals collected by unipolar electrodes can be from a more general area of facial muscles, allowing for more generalized information about the user's muscle movement to be collected and analyzed. Optionally, EMG electrodes 113 can be bipolar electrodes. Also, optionally EMG electrodes 113 can be made of any suitable material, including but not limited to foam electrodes, flexible electrodes, direct contact electrodes, electrodes on foam and/or gel electrodes (in which a gel is applied to the skin before contact is made with the electrode) or non-gel electrodes.
In some implementations, the set of surface EMG electrodes 113 can include facemask electrodes 116a, 116b, and/or additional facemask electrodes, each of which can be operatively coupled to an electrode interface 112 through respective electrical conductors 114a, 114b and/or the like. Facemask electrodes 116 may be provided so as to receive EMG signals from muscles in a portion of the face, such as an upper portion of the face for example. In this implementation, facemask electrodes 116 are preferably located around and/or on the upper portion of the face, more preferably including but not limited to one or more of cheek, forehead and eye areas, most preferably on or around at least the cheek and forehead areas.
In some implementations, the set of surface EMG electrodes 113 can also include lower face electrodes 124a, 124b which can be operatively coupled to electrode interface 112 through respective electrical conductors 122a, 122b and/or the like. Lower face electrodes 124 can be positioned on and/or around the areas of the mouth, lower cheeks, chin, and/or the like of a user's face. In some implementations, lower face electrodes 124 can be similar to facemask electrodes 116, and/or can be included in a wearable device as described in greater detail below. In other implementations, the set of surface EMG electrodes 113 may not include lower face electrodes 124. In some implementations, the set of surface EMG electrodes 113 can also include a ground or reference electrode 120 that can be operatively coupled to the electrode interface 112, e.g., through an electrical conductor 118.
In some implementations, EMG signal processor 109 and EMG signal processing database 111 can be located in a separate apparatus or device from the remaining components shown in
For this non-limiting example, 8 electrodes are shown in different locations. The number and/or location of the surface EMG electrodes 113 can be configured according to the electrode plate of an electrode holder of a facemask apparatus, according to at least some embodiments. Electrode 1 may correspond to electrode 304a of
Facemask apparatus 502, in some implementations, can include facemask circuitry 520, which can be operatively coupled to a local board 522. The facemask connector 524 can also be operatively coupled to a first local board connector 526. Local board 522 can be operatively coupled to bus 506 through a second local board connector 528. In some implementations, the facemask circuitry 520 can include a number of electrodes 530. Electrodes 530 can include or correspond to surface EMG electrodes 113 in
The output of electrodes 530 can, in some implementations, be delivered to local board 522, which can include an ADS (analog to digital signal converter) 532 for converting the analog output of electrodes 530 to a digital signal. ADS 532 can be a 24 bit ADS.
In some implementations, the digital signal can then be transmitted from local board 522 through second local board connector 528, and then through bus 506 to main board 504. Local board 522 could also support connection of additional electrodes to measure ECG, EEG or other biological signals (not shown).
Main board 504, in some implementations, can include a first main board connector 540 for receiving the digital signal from bus 506. The digital signal can then be sent from the first main board connector 540 to a microcontroller 542. Microcontroller 542 can receive the digital EMG signals, process the digital EMG signals and/or initiate other components of the main board 504 to process the digital EMG signals, and/or can otherwise control the functions of main board 504. In some implementations, microcontroller 542 can collect recorded data, can synchronize and encapsulate data packets, and can communicate the recorded data to a remote computer (not shown) through some type of communication channel, e.g., via a USB or Bluetooth connection. The preferred amount of memory is at least enough for performing the amount of required processing, which in turn also depends on the speed of the communication bus and the amount of processing being performed by other components.
In some implementations, the main board 504 can also include a GPIO (general purpose input/output) and ADC (analog to digital converter) connector 544 operatively coupled to the microcontroller 542. The GPIO and ADC connector 544 can allow the extension of the device with external TTL (transistor-transistor logic signal) triggers for synchronization and the acquisition of external analog inputs for either data acquisition, or gain control on signals received, such as a potentiometer. In some implementations, the main board 504 can also include a Bluetooth module 546 that can communicate wirelessly with the host system. In some implementations, the Bluetooth module 546 can be operatively coupled to the host system through the UART port (not shown) of microcontroller 542. In some implementations, the main board 504 can also include a micro-USB connector 548 that can act as a main communication port for the main board 504, and which can be operatively coupled to the UART port of the microcontroller. The micro-USB connector 548 can facilitate communication between the main board 504 and the host computer. In some implementations, the micro-USB connector 548 can also be used to update firmware stored and/or implemented on the main board 504. In some implementations, the main board can also include a second main board connector 550 that can be operatively coupled to an additional bus of the microcontroller 542, so as to allow additional extension modules and different sensors to be connected to the microcontroller 542. Microcontroller 542 can then encapsulate and synchronize those external sensors with the EMG signal acquisition. Such extension modules can include, but are not limited to, heart beat sensors, temperature sensors, or galvanic skin response sensors.
In some implementations, multiple power connectors 552 of the main board 504 can provide power and/or power-related connections for the main board 504. A power switch 554 can be operatively coupled to the main board 504 through one of several power connectors 552. Power switch 554 can also, in some implementations, control a status light 556 that can be lit to indicate that the main board 504 is receiving power. A power source 558, such as a battery, can be operatively coupled to a power management component 560, e.g., via another power connector 552. In some implementations, the power management component 560 can communicate with microcontroller 542.
In stage 2, the EMG signals can, in some implementations, be preprocessed to reduce or remove noise from the EMG signals. Preprocessing may also include normalization and/or other types of preprocessing to increase the efficiency and/or efficacy of the classification process, as described in greater detail below in the discussion of
In stage 3, the preprocessed EMG signals can be classified using the classifier 108. The classifier 108 can classify the preprocessed EMG signals using a number of different classification protocols as discussed above with respect to
As described below in more detail,
As described below in more detail,
Turning back to stage 3, the classifier 108, in some implementations, can classify the preprocessed EMG signals to identify facial expressions being made by the user, and/or to otherwise classify the detected underlying muscle activity as described in the discussion of
In stage 4, the classifier 108 can, in some implementations, determine a facial expression of the user based on the classification made by the classifier 108.
With respect to
$x_i^{(\log)} = \log(x_i)$ (8)
Besides log-normalization, other types of normalization for the signals include whitening. Both whitening and log-normalization normalize the signal so that variations in the electrode signals across subjects (for example, due to different mask positioning, user's head size, environment or hardware) are reduced or eliminated. Whitening involves forcing the covariance matrix of the feature signal to be the identity matrix. In an ideal situation, the covariance matrix computed over the roughness samples extracted from a Neutral phase should be an identity matrix, such that all electrodes have the same variance and there is no covariance between any two electrodes. A transformation matrix may be applied to enforce this identity covariance.
Given the covariance matrix C computed from the neutral samples, whitening consists of applying the transformation matrix T which, when applied to the samples, will enforce the covariance matrix to identity. T is given as the (matrix) square root of the inverse of C.
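As a non-limiting illustration, the whitening transformation $T = C^{-1/2}$ could be computed as in the following minimal sketch. The function names, the use of NumPy, and the synthetic three-electrode data are assumptions made here for illustration only and are not part of the disclosure. The inverse matrix square root is obtained from the eigendecomposition of the symmetric matrix C, which assumes C is full rank.

```python
import numpy as np

def whitening_matrix(neutral_samples):
    """Return T = C^(-1/2) for samples of shape (n_samples, n_electrodes)."""
    # Sample covariance C of the neutral-phase roughness features.
    c = np.cov(neutral_samples, rowvar=False)
    # C is symmetric, so its eigendecomposition gives the matrix square root.
    eigvals, eigvecs = np.linalg.eigh(c)
    # Inverse square root: V diag(1 / sqrt(lambda)) V^T (assumes C is full rank).
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T

def whiten(samples, t):
    # T is symmetric, so right-multiplying row samples equals applying T to each sample.
    return samples @ t

# Synthetic, correlated "neutral" data for three hypothetical electrodes.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 0.5]])
neutral = rng.normal(size=(500, 3)) @ mixing
t = whitening_matrix(neutral)
whitened = whiten(neutral, t)
whitened_cov = np.cov(whitened, rowvar=False)
```

After the transformation, the covariance of the whitened neutral samples is the identity matrix up to numerical precision, so every electrode has unit variance and no covariance with any other electrode.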
Stage 2 features the normalization of the variance of the signal for each electrode 113. In stage 2, signal processing abstraction layer 104 can reduce and/or remove noise from the digital EMG signal. Noise removal, in some implementations, includes common mode removal. When multiple electrodes are used during an experiment, the recorded signal of all the electrodes can be aggregated into a single signal of interest, which may have additional noise caused by interactions between the electrodes 113 (e.g., such as power line interference):
$x_{i,e}^{(raw)} = x_{i,e}^{(rcm)} + \xi_i$ (1)
(2)
In the above equation, ξi can be a noise signal that may contaminate the recorded EMG signals on all the electrodes. To clean the signal, a common mode removal method may be used, an example of which is defined as follows:
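As a minimal sketch only, common mode removal could be implemented as follows, assuming that the shared noise $\xi_i$ is estimated as the mean of the raw samples over all electrodes at time step i. That estimator is an assumption made here for illustration and is not necessarily the one used in the disclosure; the function name `remove_common_mode` is likewise hypothetical.

```python
def remove_common_mode(raw):
    """raw[i][e] is sample i on electrode e; returns the cleaned samples."""
    cleaned = []
    for sample in raw:
        # Assumed estimate of the shared noise xi_i: the mean over electrodes.
        xi = sum(sample) / len(sample)
        # Equation (1) rearranged: x(rcm) = x(raw) - xi.
        cleaned.append([x - xi for x in sample])
    return cleaned

# Two time steps recorded on three hypothetical electrodes.
raw = [[1.0, 2.0, 3.0], [10.0, 10.0, 13.0]]
cleaned = remove_common_mode(raw)
```

With this estimator, the cleaned samples at each time step sum to zero across electrodes, so any component that is identical on all electrodes (such as power line interference) is removed.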
In stage 3, the covariance is calculated for each electrode, in some implementations, across a plurality of users. For example, in stage 3, the classifier 108 can, in some implementations, analyze the cleaned signal for one or more features. For example, the classifier 108 can determine the roughness of the cleaned signal.
The roughness can be used to determine a feature that may be used to classify facial expressions. For example, the roughness of the cleaned EMG signal can indicate the amount of high frequency content in the clean signal xi,e(rcm) and is defined as the filtered, squared double derivative of the cleaned EMG signal. For example, to filter the cleaned EMG signal, the classifier 108 can calculate a moving average of the EMG signal based on time windows of ΔT. The roughness ri,e of the cleaned EMG signals from each electrode 113 can then be computed independently such that, for a given electrode e, the following function calculates the roughness of the EMG signals derived from that electrode:
$\Delta x_{i,e} = x_{i,e}^{(rcm)} - x_{i-1,e}^{(rcm)}$ (5)
$\Delta^2 x_{i,e} = x_{i-2,e}^{(rcm)} - 2x_{i-1,e}^{(rcm)} + x_{i,e}^{(rcm)}$ (6)
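As a non-limiting illustration, the roughness of a single electrode's cleaned signal could be sketched as follows. The sketch assumes that the filter is a plain trailing moving average over `window` samples, where `window` is a hypothetical parameter standing in for the time window ΔT expressed in samples; the function name is likewise an assumption.

```python
def roughness(x, window):
    """Moving average of the squared second difference of one electrode's signal."""
    # Second difference (equation 6): x[i-2] - 2*x[i-1] + x[i], defined for i >= 2.
    d2 = [x[i - 2] - 2.0 * x[i - 1] + x[i] for i in range(2, len(x))]
    squared = [v * v for v in d2]
    # Trailing moving average over `window` samples (assumed filter shape).
    out = []
    for i in range(window - 1, len(squared)):
        out.append(sum(squared[i - window + 1 : i + 1]) / window)
    return out

# A linear ramp has no curvature, while an alternating signal is rough.
ramp_roughness = roughness([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], window=2)
zigzag_roughness = roughness([0.0, 1.0, 0.0, 1.0, 0.0, 1.0], window=2)
```

The ramp yields zero roughness throughout, whereas the alternating signal yields a constant, non-zero roughness, consistent with roughness indicating the amount of high frequency content.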
Optionally, in addition to or in place of roughness, exponential averaging (or exponential smoothing) can be used. Exponential averaging may be used to reduce noise in real-time data. It uses a recursive low pass filter as described by the following difference equation:
$y(n) = \alpha \, x(n) + (1 - \alpha) \, y(n-1)$
Exponential averaging in some situations may provide a smoother frequency response, and since it is implemented with an IIR filter it requires less processing power (and memory).
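A minimal sketch of exponential averaging is given below by way of example. The initialization y(0) = x(0) and the function name are assumptions made here for illustration; the filter only needs to keep the previous output value, which is why its memory footprint is small.

```python
def exponential_average(x, alpha):
    """y(n) = alpha * x(n) + (1 - alpha) * y(n - 1), with y(0) = x(0) assumed."""
    if not x:
        return []
    y = [x[0]]
    for sample in x[1:]:
        # Recursive low pass filter: only the previous output is needed.
        y.append(alpha * sample + (1.0 - alpha) * y[-1])
    return y

# Step input smoothed with alpha = 0.5.
smoothed = exponential_average([0.0, 1.0, 1.0], 0.5)
```

A smaller value of α gives stronger smoothing at the cost of a slower response to changes in the input.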
Stages 2 and 3 can therefore process the EMG signals so as to be more efficiently classified using classifiers such as LDA and QDA methods, and their variants such as sQDA. The computation of the covariance in stage 3 is especially important for training discriminant classifiers such as QDA. However, stages 2 and 3 are less critical for classifiers such as Riemannian geometry. The computation of the covariance in stage 3 can also be used for running classifiers based upon Riemannian geometry.
In stage 4, the classifier 108 can also normalize the EMG signal. Normalization may optionally be performed as described in greater detail below with regard to
In stage 1, the log normal of the signal is optionally calculated. The inventors have found, surprisingly, that when the face of a subject has a neutral expression, the roughness diverges less from a multivariate Gaussian distribution, than when the subject has a non-neutral expression. However, when the face of a subject is not neutral and is exhibiting a non-neutral expression, the roughness diverges even more from a multivariate Gaussian distribution. In fact, it is well described by a multivariate log-normal distribution. However, many, if not all, classification methods (especially the most computationally efficient ones) expect the features to be analyzed to follow a multivariate Gaussian distribution. To overcome this problem, one can simply compute the log of the roughness before applying any classification algorithms:
$x_i^{(\log)} = \log(x_i)$ (8)
Stage 2 features the normalization of the variance of the signal for each electrode. In stage 3, the covariance is calculated for each electrode, preferably across a plurality of subjects.
Since the roughness of an EMG signal can be a filter, the roughness can contain one free parameter that can be fixed a priori (e.g., such as a time window ΔT over which the roughness is computed). This free parameter (also referred to herein as a meta-parameter), in some implementations, can have a value of 100 milliseconds. In this manner, the meta-parameter can be used to improve the efficiency and accuracy of the classification of the EMG signal.
With respect to
In stage 1, the state of the user can be determined, in particular with regard to whether the face of the user has a neutral expression or a non-neutral expression. The data is therefore, in some implementations, analyzed to determine whether the face of the user is in a neutral expression state or a non-neutral expression state. Before facial expression determination begins, the user can be asked to maintain a deliberately neutral expression, which is then analyzed. Alternatively, the signal processing abstraction layer 104 can determine the presence of a neutral or non-neutral expression without this additional information, through a type of pre-training calibration.
The determination of a neutral or non-neutral expression can be performed based on a determination that the roughness of EMG signals from a neutral facial expression can follow a multivariate Gaussian distribution. Thus, by performing this process, the signal processing abstraction layer 104 can detect the presence or absence of an expression before the classification occurs.
Assume that in the absence of expression, the roughness r is distributed according to a multivariate Gaussian distribution (possibly after log transformation):
$r \sim \mathcal{N}(\mu_0, \Sigma_0)$
Neutral parameters can be estimated from the recordings using sample mean and sample covariance. Training to achieve these estimations is described with regard to
At each time-step, the signal processing abstraction layer 104 can compute the chi-squared distribution (i.e. the multi-variate Z-score):
zi=(ri−μ0)TΣ0−1(ri−μ0)
If zi>zthreshold, then the signal processing abstraction layer 104 can determine that the calculated roughness significantly differ from that which would be expected if the user's facial muscles were in a neutral state (i.e., that the calculated roughness does not follow a multivariable Gaussian distribution). This determination can inform the signal processing abstraction layer 104 that an expression was detected for the user and can trigger the signal processing abstraction layer 104 to send the roughness value to the classifier 108, such that the classifier 108 can classify the data using one of the classifiers.
If zi<=zthreshold, then the signal processing abstraction layer 104 can determine that the calculated roughness follows a multivariable Gaussian distribution, and can therefore determine that the user's expression is neutral.
In some implementations, the threshold $z_{threshold}$ can be set to a value given in a chi-squared table for p degrees of freedom and α=0.001, and/or to a similar value. In some implementations, this process can improve the accuracy with which neutral states are detected and can increase the efficiency of the system in classifying facial expressions and/or other information from the user.
In stage 2, if the signal processing abstraction layer 104 determines that the user made a non-neutral facial expression, discriminant analysis can be performed on the data to classify the EMG signals from the electrodes 113. Such discriminant analysis may include LDA analysis, QDA analysis, variations such as sQDA, and/or the like.
In a non-limiting example, using a QDA analysis, the classifier can perform the following:
In the linear and quadratic discriminant framework, data xk from a given class k is assumed to come from multivariate Gaussian distribution with mean μk and covariance Σk. Formally one can derive the QDA starting from probability theory.
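Assume p(x|k) follows a multivariate Gaussian distribution, where p denotes the dimension of x:
$$p(x|k) = \frac{1}{(2\pi)^{p/2}\,|\Sigma_k|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k)\right)$$
with class prior distribution $\pi_k$:
$$p(k) = \pi_k$$
and unconditional probability distribution:
$$p(x) = \sum_{k=1}^{K} \pi_k\, p(x|k)$$
Then applying Bayes' rule, the posterior distribution is given by:
$$p(k|x) \propto \pi_k\, p(x|k) \quad (5)$$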
Description of QDA
The goal of the QDA is to find the class k that maximizes the posterior distribution p(k|x) defined by Eq. 5 for a data point xi.
$$\hat{k}_i = \arg\max_k\, p(k|x_i) \quad (6)$$
In other words, for a data point $x_i$, QDA identifies the class k whose distribution p(x|k), weighted by its prior $\pi_k$, is the most probable source of the data point, under the assumption that the data are normally distributed.
Equation 6 can be reformulated to explicitly show why this classifier may be referred to as a quadratic discriminant analysis, in terms of its log-posterior log (πkp(xi|k)), also called log-likelihood.
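Posterior:
The posterior Gaussian distribution is given, up to a class-independent normalization and with p the dimension of the data, by:
$$\pi_k\, p(x_i|k) = \frac{\pi_k}{(2\pi)^{p/2}\,|\Sigma_k|^{1/2}} \exp\left(-\frac{1}{2}(x_i-\mu_k)^T \Sigma_k^{-1} (x_i-\mu_k)\right)$$
Log-Posterior:
Taking the log of the posterior does not change the location of its maximum (since the log-function is monotonic), so the Log-Posterior is:
$$\log(\pi_k\, p(x_i|k)) = \log(\pi_k) - \frac{p}{2}\log(2\pi) - \frac{1}{2}\log(|\Sigma_k|) - \frac{1}{2}(x_i-\mu_k)^T \Sigma_k^{-1} (x_i-\mu_k) \quad (9)$$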
QDA Discriminant Function
Since the class k that maximizes Eq. 9 for a data point xi is of interest, it is possible to discard the terms that are not class-dependent (i.e., log (2π)) and for readability multiply by −2, thereby producing the discriminant function given by:
$$d_k^{(qda)}(x_i) = (x_i-\mu_k)^T \Sigma_k^{-1} (x_i-\mu_k) + \log(|\Sigma_k|) - 2\log(\pi_k) \quad (10)$$
In equation 10, it is possible to see that the discriminant function of the QDA is quadratic in x, and therefore defines quadratic boundaries between classes. The classification problem stated in Eq. 6 can be rewritten as:
$$\hat{k} = \arg\min_k\, d_k^{(qda)}(x_i) \quad (11)$$
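As a minimal sketch of Eqs. 10 and 11, the following Python example evaluates the QDA discriminant for two-dimensional data and picks the class with the smallest value. The restriction to 2x2 covariance matrices (so the inverse can be written in closed form), the function names, and the class labels in the usage note are assumptions made for illustration only.

```python
import math


def qda_discriminant(x, mean, cov, prior):
    """Eq. 10 for a 2-D point: Mahalanobis term + log|cov| - 2*log(prior)."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean), using the closed-form
    # inverse of a 2x2 matrix.
    maha = (cov[1][1] * dx * dx
            - (cov[0][1] + cov[1][0]) * dx * dy
            + cov[0][0] * dy * dy) / det
    return maha + math.log(det) - 2.0 * math.log(prior)


def qda_classify(x, classes):
    """Eq. 11: return the label whose discriminant is smallest.

    `classes` maps a label to a (mean, cov, prior) tuple.
    """
    return min(classes, key=lambda k: qda_discriminant(x, *classes[k]))
```

For example, with two hypothetical classes `{"smile": ((0.0, 0.0), identity, 0.5), "frown": ((4.0, 4.0), identity, 0.5)}`, a point near the origin is assigned to "smile" and a point near (4, 4) to "frown".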
LDA
In the LDA method, there is an additional assumption on the class covariance of the data, such that all of the covariance matrices Σk of each class are supposed to be equal, and classes only differ by their mean μk:
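$$\Sigma_k = \Sigma, \quad \forall k \in \{1, \ldots, K\} \quad (12)$$
Replacing $\Sigma_k$ by Σ in Eq. 10 and dropping all the terms that are not class-dependent (including the term $x_i^T \Sigma^{-1} x_i$, which is now shared by all classes), the discriminant function of the LDA $d_k^{(lda)}(x_i)$, to be minimized as in Eq. 11, is obtained:
$$d_k^{(lda)}(x_i) = -2\mu_k^T \Sigma^{-1} x_i + \mu_k^T \Sigma^{-1} \mu_k - 2\log(\pi_k) \quad (13)$$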
QDA for a Sequence of Data Points
In the previous section, the standard QDA and LDA were derived from probability theory. In some implementations, QDA classifies data point by point; however, in other implementations, the classifier can classify a plurality of n data points at once. In other words, the classifier can determine from which probability distribution the sequence $\tilde{x}$ has been generated. This is a naive generalization of the QDA to time series. This generalization makes it possible to determine (i) whether it performs better than the standard QDA on EMG signal data and (ii) how it compares to the Riemann classifier described with regard to
Assuming that a plurality of N data points is received, characterized as:
$$\tilde{x} = \{x_i, \ldots, x_{i+N}\}$$
then according to Equation 5 one can compute the probability of that sequence to have been generated by the class k, simply by taking the product of the probability of each data point:
As before, to determine the location of the maximum value, it is possible to take the log of the posterior, or the log-likelihood of the time-series:
Plugging in Eq. 1, the log-likelihood $L(\tilde{x}|k)$ of the data is given by:
As for the standard QDA, dropping the terms that are not class-dependent and multiplying by −2 gives the new discriminant function $d_k^{(sQDA)}(\tilde{x})$ of the sequential QDA (sQDA) as follows:
Finally, the decision boundaries between classes lead to the possibility of rewriting the classification problem stated in Eq. 6 as:
$$\hat{k} = \arg\min_k\, d_k^{(sQDA)}(\tilde{x}) \quad (23)$$
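The intermediate steps of this derivation can be sketched as follows. This is a reconstruction under stated assumptions rather than a reproduction of the original equations: the samples $x_n$ of the sequence are treated as conditionally independent given the class, the prior $\pi_k$ is applied once per sample, p denotes the dimension of each sample, and the intermediate equation numbers are omitted.

```latex
\begin{align*}
p(k \mid \tilde{x}) &\propto \prod_{n=i}^{i+N} \pi_k \, p(x_n \mid k) \\
\log p(k \mid \tilde{x}) &= \sum_{n=i}^{i+N} \log\bigl(\pi_k \, p(x_n \mid k)\bigr) + \text{const} \\
L(\tilde{x} \mid k) &= \sum_{n=i}^{i+N} \Bigl[ \log \pi_k - \tfrac{p}{2}\log(2\pi)
    - \tfrac{1}{2}\log|\Sigma_k|
    - \tfrac{1}{2}(x_n-\mu_k)^T \Sigma_k^{-1} (x_n-\mu_k) \Bigr] \\
d_k^{(sQDA)}(\tilde{x}) &= \sum_{n=i}^{i+N} \Bigl[ (x_n-\mu_k)^T \Sigma_k^{-1} (x_n-\mu_k)
    + \log|\Sigma_k| - 2\log \pi_k \Bigr]
    = \sum_{n=i}^{i+N} d_k^{(qda)}(x_n)
\end{align*}
```

Under these assumptions the sQDA discriminant is simply the sum of the point-wise QDA discriminants of Eq. 10 over the sequence.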
Links Between QDA and Time-Series sQDA
In some implementations of the QDA, each data point can be classified according to Eq. 11. Then, to average out transient responses so as to provide a general classification (rather than generating a separate output at each time-step), a majority voting strategy may be used to define output labels every N time-steps.
In the majority voting framework, the output label $\hat{\tilde{k}}$ can be defined as the one with the most occurrences during the last N time-steps. Mathematically it can be defined as:
$$\hat{\tilde{k}} = \arg\max_k \sum_{n=i-N+1}^{i} f(\hat{k}_n, k) \quad (24)$$
For Equation 24, f is equal to one when the two arguments are the same and zero otherwise.
In the case of the sQDA, the output label $\hat{\tilde{k}}$ can be computed according to Equation 22. The two approaches can thus differ in the way they each handle the time-series. Specifically, in the case of the QDA, the time-series can be handled by a majority vote over the last N time samples, whereas for the sQDA, the time-series can be handled by cleanly aggregating probabilities over time.
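The difference between the two strategies can be illustrated with a short Python sketch. It assumes that the per-sample QDA discriminants of Eq. 10 have already been computed, and that the sQDA discriminant is their sum over the window; the function names and the data layout (`scores[n][k]` for time-step n and class k, lower is better) are hypothetical.

```python
from collections import Counter


def majority_vote(scores):
    """Classify each time-step by argmin, then return the most frequent label."""
    labels = [min(range(len(row)), key=row.__getitem__) for row in scores]
    return Counter(labels).most_common(1)[0][0]


def sqda_label(scores):
    """Sum the per-sample discriminants over the window, then take the argmin."""
    n_classes = len(scores[0])
    totals = [sum(row[k] for row in scores) for k in range(n_classes)]
    return min(range(n_classes), key=totals.__getitem__)
```

With the window `[[1.0, 1.2], [1.0, 1.2], [9.0, 1.0]]`, two weak votes for class 0 win the majority vote, whereas the aggregated discriminants favor class 1 because the third sample is strongly inconsistent with class 0.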
Regularized LDA (RDA)
The regularized discriminant analysis classifier is a combination of the LDA and the QDA.
The RDA classifier is useful when we have a limited number of samples per class compared to the number of features (the number of channels in our case) or when class-covariance matrices are singular or close to singularity, i.e., matrix condition number is large.
In this situation, the RDA can reduce the complexity by weighting the class-covariance matrices of the QDA towards the LDA global covariance matrix.
This regularization is done by computing new class-covariance matrices:
$$\hat{\Sigma}_k(\lambda) = (1-\lambda)\Sigma_k + \lambda\Sigma$$
The Σk matrix denotes the class-covariance matrices (QDA) and Σ is the averaged (LDA) covariance matrix. Lambda (0≤λ≤1) is the regularization parameter.
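A second shrinkage parameter γ may also be used. This shrinkage parameter forces the eigenvalues of the class-covariance matrices toward equality:
$$\hat{\Sigma}_k(\lambda, \gamma) = (1-\gamma)\,\hat{\Sigma}_k(\lambda) + \frac{\gamma}{d}\,\mathrm{tr}\bigl(\hat{\Sigma}_k(\lambda)\bigr)\, I$$
where γ (0≤γ≤1) is the regularization parameter, d is the number of channels, tr denotes the trace of the matrix, and I is the identity matrix.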
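A minimal Python sketch of the two regularization steps is given below. It assumes the commonly used form of regularized discriminant analysis, in which the λ-blended matrix is shrunk toward a multiple of the identity whose diagonal value is tr(·)/d; the function name and the nested-list matrix representation are illustrative choices.

```python
def regularize_covariance(class_cov, pooled_cov, lam, gamma):
    """Blend a class covariance with the pooled (LDA) covariance, then shrink.

    class_cov and pooled_cov are d x d nested lists; lam and gamma lie in [0, 1].
    """
    d = len(class_cov)
    # Step 1: (1 - lambda) * Sigma_k + lambda * Sigma
    blended = [[(1.0 - lam) * class_cov[i][j] + lam * pooled_cov[i][j]
                for j in range(d)] for i in range(d)]
    # Step 2: shrink toward (trace / d) * identity, which pulls the
    # eigenvalues toward their mean.
    mean_eig = sum(blended[i][i] for i in range(d)) / d
    return [[(1.0 - gamma) * blended[i][j] + (gamma * mean_eig if i == j else 0.0)
             for j in range(d)] for i in range(d)]
```

Setting λ=1 and γ=0 recovers the LDA covariance, λ=0 and γ=0 recovers the QDA class covariance, and γ=1 yields a spherical covariance.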
Comparison of the QDA and sQDA Classifiers
For example,
From
Third, the confusion matrices
Finally, the models do not perform equivalently on the neutral state (data not shown). In particular, both the sQDA and the QDA methods encounter difficulties staying in the neutral state in between forced (directed) non-neutral expressions. To counterbalance this issue, determining the state of the subject's expression, as neutral or non-neutral, may optionally be performed as described with regard to stage 1.
Turning back to
The Riemannian Framework
Riemann geometry takes advantage of the particular structure of covariance matrices to define distances that can be useful in classifying facial expressions. Mathematically, the Riemannian distance as a way to classify covariance matrices may be described as follows:
Covariance matrices have some special structure that can be seen as constraints in an optimization framework.
Covariance matrices are symmetric positive semi-definite (SPD) matrices.
Since covariance matrices are SPD, the distance between two covariance matrices may not be well measured by the Euclidean distance, since the Euclidean distance may not take into account the special form of the covariance matrix.
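To measure the distance between covariance matrices, one has to use the Riemannian distance $\delta_r$ given by:
$$\delta_r(\Sigma_1, \Sigma_2) = \left\| \log\left(\Sigma_1^{-1/2}\, \Sigma_2\, \Sigma_1^{-1/2}\right) \right\|_F = \left( \sum_{c=1}^{C} \log^2 \lambda_c \right)^{1/2}$$
where $\|\cdot\|_F$ is the Frobenius norm and where $\lambda_c$, $c = 1, \ldots, C$, are the real eigenvalues of $\Sigma_1^{-1/2}\, \Sigma_2\, \Sigma_1^{-1/2}$. The mean covariance matrix $\Sigma_k$ over a set of I covariance matrices may then not be computed as the Euclidean mean, but instead can be calculated as the covariance matrix that minimizes the sum of squared Riemannian distances over the set:
$$\Sigma_k = \arg\min_{\Sigma} \sum_{i=1}^{I} \delta_r^2(\Sigma, \Sigma_i)$$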
Note that the mean covariance Σk computed on a set of I covariance matrices, each of them estimated using t milliseconds of data, may not be equivalent to the covariance estimated on the full data set of size tI. In fact, the covariance estimated on the full data set may be more related to the Euclidean mean of the covariance set.
Calculating the Riemannian Classifier, rCOV
To implement the Riemannian calculations described above as a classifier, the classifier 108 can:
Select the size of the data used to estimate a covariance matrix.
For each class k, compute the set of covariance matrices of the data set.
The class covariance matrix Σk is the Riemannian mean over the set of covariances estimated before.
A new data point, in fact a new sampled covariance matrix Σi, is assigned to the closest class:
$$\hat{k}(i) = \arg\min_k\, \delta_r(\Sigma_k, \Sigma_i)$$
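The nearest-class assignment can be sketched in Python as follows. To stay self-contained the example is limited to 2x2 covariance matrices, and it uses the fact that the eigenvalues of $\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2}$ coincide with those of $\Sigma_1^{-1}\Sigma_2$ (the two matrices are similar). The function names and class labels are hypothetical, and the class means are assumed to have been computed beforehand.

```python
import math


def riemann_distance_2x2(a, b):
    """Riemannian distance between two 2x2 positive-definite matrices."""
    det_a = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    inv_a = [[a[1][1] / det_a, -a[0][1] / det_a],
             [-a[1][0] / det_a, a[0][0] / det_a]]
    # m = inv(a) @ b has the same eigenvalues as a^{-1/2} b a^{-1/2}.
    m = [[sum(inv_a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(max(trace * trace / 4.0 - det, 0.0))
    eigenvalues = (trace / 2.0 + disc, trace / 2.0 - disc)
    return math.sqrt(sum(math.log(lam) ** 2 for lam in eigenvalues))


def rcov_classify(sample_cov, class_means):
    """Assign a sampled covariance to the class with the closest mean covariance."""
    return min(class_means,
               key=lambda k: riemann_distance_2x2(class_means[k], sample_cov))
```

The distance is zero between identical matrices and symmetric in its arguments, which are two properties worth checking on any implementation.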
Relationship Between sQDA and rCOV Classifiers
First, the sQDA discriminant distance can be compared to the Riemannian distance. As explained before in the sQDA framework, the discriminant distance between a new data point xi and a reference class k is given by Eq. 22, and can be the sum of the negative log-likelihood. Conversely, in the Riemannian classifier, the classification can be based on the distance given by Eq. 26. To verify the existence of conceptual links between these different methods, and to be able to bridge the gap between sQDA and rCOV,
Comparison of QDA, sQDA and rCOV Classifiers
To see how each of the QDA, rCOV, and sQDA methods performs, the accuracy of each of these classifiers on different EMG data sets taken from electrodes in contact with the face is presented in Table 1.
Table 1 shows the classification accuracy of each model for 11 subjects (mean and standard deviation of performance across subjects). Note that for sQDA and rCOV, one label is computed using the last 100 ms of data, and featuring an optional 75% overlap (i.e. one output label every 25 ms).
When the previously described stage 1 model of distinguishing between neutral and non-neutral expressions is used, the stability in the neutral state increases for all the models, and overall performance increases (compare columns 2 and 4 in Table 1). However, different versions of this model show similar results across different classifier methods in
Turning back to
Turning now to
In some implementations, adaptation can be used during training, with both neutral and non-neutral expressions. However, after training, the neutral expression (the neutral state) may be used for adaptation. For example, if the classifier employs QDA or a variant thereof, adaptation may reuse what was classified before as neutral, to retrain the parameters of the neutral classes. Next, the process may re-estimate the covariance and mean of the neutral state for adaptation, as these may deviate from the values that were assumed by the global classifier. In some implementations, only a non-neutral expression is used, such as a smile or an angry expression, for example. In that case, a similar process would be followed with one or more non-neutral expressions.
In the non-limiting example shown in
In stage 1, a plurality of test user classifications from a plurality of different users are categorized into various categories or “buckets.” Each category, in some implementations, represents a pattern of a plurality of sets of EMG signals that correspond to a plurality of expressions. In some implementations, data is obtained from a sufficient number of users such that a sufficient number of categories are obtained to permit optional independent classification of a new user's facial expressions according to the categories.
In stage 2, test user classification variability is, in some implementations, normalized for each category. In some implementations, such normalization is performed for a sufficient number of test users such that classification patterns can be compared according to covariance. The variability is, in some implementations, normalized for each set of EMG signals corresponding to each of the plurality of expressions. Therefore, when comparing EMG signals from a new user to each category, an appropriate category may be selected based upon comparison of EMG signals of at least one expression to the corresponding EMG signals for that expression in the category, in some implementations, according to a comparison of the covariance. In some implementations, the neutral expression may be used for this comparison, such that a new user may be asked to assume a neutral expression to determine which category that user's expressions are likely to fall into.
In stage 3, the process of classification can be initialized on at least one actual user expression, displayed by the face of the user who is to have his or her facial expressions classified. As described above, in some implementations, the neutral expression may be used for this comparison, such that the actual user is asked to show the neutral expression on his or her face. The user may be asked to relax his or her face, for example, so as to achieve the neutral expression or state. In some implementations, a plurality of expressions may be used for such initialization, such as a plurality of non-neutral expressions, or a plurality of expressions including the neutral expression and at least one non-neutral expression.
If the process described with regard to this drawing is being used in conjunction with at least one other classification method, optionally for example such another classification method as described with regard to
In stage 4, a similar user expression category is determined by comparison of the covariances for at least one expression, and optionally for a plurality of expressions, after normalization of the variances as previously described. The most similar user expression category is, in some implementations, selected. If the similarity does not at least meet a certain threshold, the process may stop as the user's data may be considered to be an outlier (not shown).
In stage 5, the final user expression category is selected, also according to feedback from performing the process described in this drawing more than once (not shown) or alternatively also from feedback from another source, such as the previous performance of another classification method.
Data collection may be performed as follows. A user is equipped with the previously described facemask to be worn such that the electrodes are in contact with a plurality of facial muscles. The user is asked to perform a set of K expressions with precise timing. While the user is doing this task, the electrodes' activities are recorded as well as the triggers. The trigger clearly encodes the precise timing at which the user is asked to perform a given expression. The trigger is then used to segment data. At the end of the calibration protocol, the trigger time series trig, and the raw electrodes' activities $x_i^{(raw)}$ are ready to be used to calibrate the classifier.
In stage 2, a machine learning classifier is constructed for training, for example, according to any suitable classification method described herein. In stage 3, the classifier is trained. The obtained data is, in some implementations, prepared as described with regard to the preprocessing stage as shown for example in
Therefore, only the segment of the data in which the users were explicitly asked to stay in the neutral state, $x_i$, $i \in$ neutral, is considered first. This subset of the data $X_{neutral}$ is well described by a multivariate Gaussian distribution:
$$X_{neutral} \sim \mathcal{N}(\vec{\mu}_{neutral}, \Sigma_{neutral})$$
The mean vector {right arrow over (μ)}neutral and the covariance matrix Σneutral can be computed as the sample-mean and sample-covariance:
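One standard form of these estimators is sketched below. The symbol $N_{neutral}$ (the number of neutral samples) and the unbiased $1/(N_{neutral}-1)$ normalization of the covariance are assumptions; a $1/N_{neutral}$ normalization would be an equally plausible convention.

```latex
\begin{align*}
\vec{\mu}_{neutral} &= \frac{1}{N_{neutral}} \sum_{i \in neutral} x_i \\
\Sigma_{neutral} &= \frac{1}{N_{neutral}-1} \sum_{i \in neutral}
    (x_i - \vec{\mu}_{neutral})(x_i - \vec{\mu}_{neutral})^T
\end{align*}
```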
Once the parameters have been estimated, it is possible to define a statistical test that tells if a data point xi is significantly different from this distribution, i.e. to detect when a non-neutral expression is performed by the face of the user.
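When the roughness distribution statistically diverges from the neutral distribution, the signal processing abstraction layer 104 can determine that a non-neutral expression is being made by the face of the user. To estimate if the sampled roughness $x_i$ statistically diverges from the neutral state, the signal processing abstraction layer 104 can use the Pearson's chi-squared test given by:
$$z_i = (x_i - \vec{\mu}_{neutral})^T\, \Sigma_{neutral}^{-1}\, (x_i - \vec{\mu}_{neutral}), \qquad \mathrm{state}_i = \begin{cases} \text{neutral} & \text{if } z_i \le z_{th} \\ \text{expression} & \text{otherwise} \end{cases}$$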
For the above equation, note that the state description is shortened to “neutral” for a neutral expression and “expression” for a non-neutral expression, for the sake of brevity.
In the above equation, $z_{th}$ is a threshold value that defines how much the roughness should differ from the neutral expression before triggering detection of a non-neutral expression. The exact value of this threshold depends on the dimension of the features (i.e., the number of electrodes) and the significance of the deviation α. As a non-limiting example, according to the χ2 table for 8 electrodes and a desired α-value of 0.001, $z_{th}$ must be set to 26.13.
In practice but as an example only and without wishing to be limited by a single hypothesis, to limit the number of false positives and so to stabilize the neutral state, a value of zth=50 has been found by the present inventors to give good results. Note that a zth of 50 corresponds to a probability α-value of ≈1e−7, which is, in other words, a larger probability p(xi≠neutral|zi)=0.99999995 of having an expression at this time step.
To adjust the threshold for the state detection, the standard χ2 table is used for 8 degrees of freedom in this example, corresponding to the 8 electrodes in this example non-limiting implementation. Alternatively given a probability threshold, one can use the following Octave/matlab code to set zth:
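As a hedged illustration of this lookup, the following sketch computes the threshold in Python using only the standard library; it is not the Octave/matlab listing referred to above. It relies on the closed-form survival function of the chi-squared distribution, which is valid only for an even number of degrees of freedom (as with the 8 electrodes of this example), and inverts it by bisection. The function names and the upper search bound are illustrative assumptions. In Octave or MATLAB, a comparable value is commonly obtained with `chi2inv(1 - alpha, df)` from the statistics package or toolbox.

```python
import math


def chi2_sf_even(z, df):
    """P(X > z) for a chi-squared variable with an even number of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = z / 2.0
    term, total = 1.0, 1.0
    # exp(-z/2) * sum_{j=0}^{df/2 - 1} (z/2)^j / j!
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def threshold_for_alpha(alpha, df, upper=1000.0):
    """Find z_th such that P(X > z_th) = alpha, by bisection."""
    lower = 0.0
    for _ in range(200):
        mid = (lower + upper) / 2.0
        if chi2_sf_even(mid, df) > alpha:
            lower = mid
        else:
            upper = mid
    return (lower + upper) / 2.0
```

Calling `threshold_for_alpha(0.001, 8)` gives the table-based threshold for 8 electrodes, and `chi2_sf_even(50.0, 8)` gives the α-value associated with the more conservative threshold of 50.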
In some implementations, in stage 4, the plurality of facial expressions is reduced to a set which can be more easily distinguished. For example, the present inventors initially started with a set of 25 expressions, but now use 5 expressions according to at least some embodiments of the present invention. The determination of which expressions to fuse may be performed by comparing their respective covariance matrices. If these matrices are more similar than a threshold similarity, then the expressions may be fused rather than being trained separately. In some implementations, the threshold similarity is set such that classification of a new user's expressions may be performed with retraining. Additionally or alternatively, the threshold similarity may be set according to the application of the expression identification, for example for online social interactions. Therefore, expressions which are less required for such an application, such as a “squint” (in case of difficulty seeing), may be dropped as potentially being confused with other expressions.
Once the subset of data where non-neutral expression occurs is defined, as is the list of expressions to be classified, it is straightforward to extract the subset of data coming from a given expression. The trigger vector contains all theoretical labels. By combining these labels with the estimated state, one can extract what is called the ground-truth label yi, which takes discrete values corresponding to each expression.
$$y_i \in \{1, \ldots, K\} \quad (12)$$
where K is the total number of expressions that are to be classified.
In stage 5, the results are compared between the classification and the actual expressions. If sufficient training has occurred, then the process moves to stage 6. Otherwise, it returns to stages 3 and 4, which are optionally repeated as necessary until sufficient training has occurred. In stage 6, the training process ends and the final classifier is produced.
Turning now to
Stress sensor 1106 may optionally include a galvanic skin monitor, to monitor sweat on the skin of the face which may be used as a proxy for stress. Temperature sensor 1108, in some implementations, measures the temperature of the skin of the face. Pulse oximeter sensor 1110 may optionally be used to measure oxygen concentration in the blood of the skin of the face.
Stress sensor 1106 is, in some implementations, connected to a local stress board 1112, including a galvanic skin response module 1114 and a stress board connector 1116. The measurements from stress sensor 1106 are, in some implementations, processed into a measurement of galvanic skin response by galvanic skin response module 1114. Stress board connector 1116 in turn is in communication with a bus 1118. Bus 1118 is in communication with a main board 1120 (see
Temperature sensor 1108 and pulse oximeter sensor 1110 are, in some implementations, connected to a local pulse oximeter board 1122, which includes a pulse oximeter module 1124 and a pulse oximeter board connector 1126. Pulse oximeter module 1124, in some implementations, processes the measurements from pulse oximeter sensor 1110 into a measurement of blood oxygen level. Pulse oximeter module 1124 also, in some implementations, processes the measurements from temperature sensor 1108 into a measurement of skin temperature. Pulse oximeter board connector 1126 in turn is in communication with bus 1118. A facemask apparatus connector 1128 on facemask apparatus 1102 is coupled to a local board (not shown), which in turn is in communication with main board 1120 in a similar arrangement to that shown in
EMG processing 1212 begins with input raw EMG data from a raw EMG 1218, such as for example from EMG signal acquisition apparatus 102 or any facemask implementation as described herein (not shown). Raw EMG 1218 may for example include 8 channels of data (one for each electrode), sampled at 2000 Hz with 16 bits per sample. Next, EMG processing 1212 processes the raw EMG data to yield eye motion detection in an eye movements process 1220. In addition, EMG processing 1212 performs a blink detection process 1222, to detect blinking. EMG processing 1212 also performs a facial expression recognition process 1224, to detect the facial expression of the subject. All three processes are described in greater detail with regard to a non-limiting implementation in
Audio processing 1214 begins with input raw audio data from a raw audio 1226, for example from a microphone or any type of audio data collection device. Raw audio 1226 may for example include mono, 16 bits, @44100 Hz data.
Raw audio 1226 then feeds into a phoneme classification process 1228 and a voice activity detection process 1230. Both processes are described in greater detail with regard to a non-limiting implementation in
A non-limiting implementation of gating/logic 1216 is described with regard to
Raw EMG 1218 is received by EMG processing 1212 to begin the process. Preprocessing 1300 preferably preprocesses the data. Optionally, preprocessing 1300 may begin with a notch process to remove electrical power line interference, such as for example 50 Hz or 60 Hz, plus its harmonics.
Next a bandpass filter is optionally applied, to remove noise. As described in greater detail below, the bandpass filter preferably comprises a low pass filter between 0.5 and 40 Hz. EMG data are noisy, can exhibit subject-to-subject variability, can exhibit device-to-device variability and, at least in some cases, the informative frequency band or bands are not known.
These properties affect the facemask performances in different ways. It is likely that not all of the frequencies carry useful information. It is highly probable that some frequency bands carry only noise. This noise can be problematic for analysis, for example by altering the performance of the facemask.
As an example, imagine a recording where each electrode is contaminated differently by 50 Hz noise, so that even after common average removal (described in greater detail below), there is still noise in the recordings. This noise is environmental, so that one can assume that all data recorded in the same room will have the same noise content. Now if a global classifier is computed using these data, it will probably give good performance when tested in the same environment. However, if tested elsewhere, the classifier may not give a good performance.
To tackle this problem, one can simply filter the EMG data. However, to do it efficiently, one has to define which frequency band contains useful information. As previously described, the facial expression classification algorithm uses a unique feature: the roughness. The roughness is defined as the filtered (with a moving average) squared second derivative of the input. So it is a non-linear transform of the (preprocessed) EMG data, which means it is difficult to determine to which frequency the roughness is sensitive.
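The definition of roughness given above can be sketched in Python as follows. The sketch assumes a discrete second difference (without scaling by the sampling interval), a causal moving average, and a caller-supplied window length; none of these specifics is stated above, and the function name is hypothetical.

```python
def roughness(signal, window):
    """Moving average of the squared second difference of one EMG channel."""
    # Discrete second derivative: s[n+1] - 2*s[n] + s[n-1]
    second_diff = [signal[n + 1] - 2 * signal[n] + signal[n - 1]
                   for n in range(1, len(signal) - 1)]
    squared = [d * d for d in second_diff]
    out, running = [], 0.0
    for n, value in enumerate(squared):
        running += value
        if n >= window:
            # Drop the sample that just left the averaging window.
            running -= squared[n - window]
        out.append(running / min(n + 1, window))
    return out
```

A linear ramp has zero roughness, while an isolated spike produces a burst of large values; squaring the second difference is what makes the transform non-linear.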
Various experiments were performed (not shown) to determine the frequency or frequency range to which roughness is sensitive. These experiments showed that while roughness has sensitivity in all the frequency bands, it is non-linearly more sensitive to higher frequencies than lower ones. Lower frequency bands contain more information for roughness. Roughness also enhances high-frequency content. Optionally, the sampling rate may create artifacts on the roughness. For example, high frequency content (>~900 Hz) was found to be represented in the 0-200 Hz domain.
After further testing (not shown), it was found that a low pass filter improved the performance of the analysis, due to a good effect on roughness. The optimal cutoff frequency of the low pass filter was found to be between 0.5 and 40 Hz.
After the bandpass filter is applied, optionally CAR (common average removal) is performed, as for the previously described common mode removal.
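A minimal Python sketch of common average removal is shown below, assuming the usual definition in which the instantaneous mean across all electrodes is subtracted from each electrode. The function name and the `samples[t][c]` layout (time-step t, channel c) are illustrative.

```python
def common_average_removal(samples):
    """Subtract, at each time-step, the mean across channels from every channel."""
    cleaned = []
    for row in samples:
        mean = sum(row) / len(row)
        cleaned.append([value - mean for value in row])
    return cleaned
```

Interference that is identical on all electrodes cancels exactly; interference that contaminates each electrode differently is only partly removed.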
The preprocessed data then moves to the three processes of eye movements process 1220 (blue), blink detection process 1222 (green) and facial expression recognition process 1224 (red). Starting with facial expression recognition process 1224, the data first undergoes a feature extraction process 1302, as the start of the real time or “online” process. Feature extraction process 1302 includes determination of roughness as previously described, optionally followed by variance normalization and log normalization also as previously described. Next a classification process 1304 is performed to classify the facial expression, for example by using sQDA as previously described.
Next, a post-classification process 1306 is optionally performed, preferably to perform label filtering, for example according to majority voting, and/or evidence accumulation, also known as serial classification. Majority voting counts the occurrences of each class within a given time window and returns the most frequent label. Serial classification selects the label that has the highest joint probability over a given time window. That is, the output of the serial classification is the class for which the product of the posterior conditional probabilities (or sum of the log-posterior conditional probabilities) over a given time window is the highest. Testing demonstrated that both majority voting and serial classification effectively smoothed the output labels, producing a stable result (data not shown), and may optionally be applied whether singly or as a combination.
An offline training process is preferably performed before the real time classification process is performed, such that the results of the training process may inform the real time classification process. The offline training process preferably includes a segmentation 1308 and a classifier computation 1310.
Segmentation 1308 optionally includes the following steps:
The Chi2-test on the neutral expression is performed to create a detector for the neutral expression. As previously described, separation of neutral and non-neutral expressions may optionally be performed to increase the performance accuracy of the classifier. Next the Kartoffeln Filter is applied to determine outliers. If an expression is determined to be non-neutral, as in step 3, then the segmentation window needs to be longer than the expression to capture it fully. Other statistical tests may optionally be used, to determine the difference between neutral and non-neutral expressions for segmentation. Outliers are then removed from this segmentation as well.
The Kartoffeln filter may optionally be performed as follows. Assume a P-dimensional variable x that follows a P-dimensional Gaussian distribution:
$$x \sim \mathcal{N}(\mu, \Sigma)$$
with μ its P-dimensional mean and Σ its covariance matrix. For any P-dimensional data point $r_t$ at time step t, one can compute the probability that it comes from the aforementioned P-dimensional Gaussian distribution. To do so one can use the generalization of the standard z-score to P dimensions, called the χ2-score, given by:
$$z_t = (r_t - \mu)^T \Sigma^{-1} (r_t - \mu)$$
This score represents the distance between the actual data point $r_t$ and the mean μ of the reference normal distribution, in units of the covariance matrix Σ.
Using $z_t$, one can easily test the probability that a given point $r_t$ comes from a reference normal distribution parametrized by μ and Σ simply by looking at a $\chi^2_{(\alpha, df)}$ distribution table with the correct degrees of freedom df and probability α.
Thus by thresholding the time series z with a threshold, χ(α
The outlier filtering process (also known as the Kartoffeln filter) is simply an iterative application of the aforementioned thresholding method. Assume one has data points r where $r \in \mathbb{R}^{P \times T}$, with P=8 the dimension (i.e., the number of electrodes) and T the total number of data points in the data set.
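One possible reading of this iterative thresholding is sketched below in Python. The loop structure (estimate the mean and covariance on the retained points, drop points whose χ2-score exceeds the threshold, repeat until no point is dropped) is an assumption, since the individual steps are not spelled out above. The function names, the unbiased covariance estimate, and the iteration cap are likewise illustrative.

```python
def _mean_cov(points):
    """Sample mean and unbiased sample covariance of a list of P-dim points."""
    n, p = len(points), len(points[0])
    mu = [sum(pt[j] for pt in points) / n for j in range(p)]
    cov = [[sum((pt[a] - mu[a]) * (pt[b] - mu[b]) for pt in points) / (n - 1)
            for b in range(p)] for a in range(p)]
    return mu, cov


def _solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def chi2_score(point, mu, cov):
    """z = (r - mu)^T cov^{-1} (r - mu), computed without an explicit inverse."""
    diff = [point[j] - mu[j] for j in range(len(mu))]
    return sum(d * s for d, s in zip(diff, _solve(cov, diff)))


def kartoffeln_filter(points, threshold, max_iter=20):
    """Return the indices of the points retained after iterative thresholding."""
    keep = list(range(len(points)))
    for _ in range(max_iter):
        mu, cov = _mean_cov([points[i] for i in keep])
        new_keep = [i for i in keep
                    if chi2_score(points[i], mu, cov) <= threshold]
        if len(new_keep) == len(keep):
            break
        keep = new_keep
    return keep
```

Re-estimating the parameters after each pass matters because a strong outlier inflates the covariance and can mask itself or other outliers on the first pass. The sketch assumes the retained points keep a non-singular covariance.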
Classifier computation 1310 is used to train the classifier and construct its parameters as described herein.
Turning now to eye movements process 1220, a feature extraction 1312 is performed, optionally as described with regard to Toivanen et al (“A probabilistic real-time algorithm for detecting blinks, saccades, and fixations from EOG data”, Journal of Eye Movement Research, 8(2):1,1-14). The process detects eye movements (EOG) from the EMG data, to automatically detect blink, saccade, and fixation events. A saccade is a rapid movement of the eye between fixation points. A fixation event is the fixation of the eye upon a fixation point.
This process optionally includes the following steps (for 1-3, the order is not restricted):
Horizontal bipole and vertical bipole are determined as they relate to the velocity of the eye movements. These signals are then optionally subjected to at least a low pass bandpass filter but may optionally also be subject to a high pass bandpass filter. The signals are then optionally log normalized.
Feature extraction preferably at least includes determination of two features. A first feature, denoted as Dn, is the norm of the derivative of the filtered horizontal and vertical EOG signals:
$$D_n = \sqrt{\left(\frac{dH}{dt}\right)^2 + \left(\frac{dV}{dt}\right)^2}$$
where H and V denote the horizontal and vertical components of the EOG signal. This feature is useful in separating fixations from blinks and saccades.
The second feature, denoted as Dv, is used for separating blinks from saccades. With the positive electrode for the vertical EOG located above the eye (signal level increases when the eyelid closes), the feature is defined as:
Dv=max−min−|max+min|.
Both features may optionally be used for both eye movements process 1220 and blink detection process 1222, which may optionally be performed concurrently.
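Both features can be sketched in Python as follows. The sketch assumes that the derivative is approximated by a first difference scaled by the sampling rate, and that max and min in the definition of Dv denote the largest and smallest values of the vertical EOG derivative within the analysis window, in the manner of the cited Toivanen et al. reference; the function names and the choice of window are hypothetical.

```python
import math


def derivative(signal, fs):
    """First-difference approximation of the time derivative at sampling rate fs."""
    return [(signal[n] - signal[n - 1]) * fs for n in range(1, len(signal))]


def norm_feature(horizontal, vertical, fs):
    """Dn: norm of the derivative of the horizontal and vertical EOG signals."""
    dh, dv = derivative(horizontal, fs), derivative(vertical, fs)
    return [math.hypot(a, b) for a, b in zip(dh, dv)]


def blink_feature(vertical_window, fs):
    """Dv = max - min - |max + min| over the vertical derivative in a window."""
    dv = derivative(vertical_window, fs)
    high, low = max(dv), min(dv)
    return high - low - abs(high + low)
```

A blink-like excursion that rises and falls symmetrically (for example `[0.0, 2.0, 0.0]`) gives a large Dv, whereas a saccade-like step (`[0.0, 2.0, 2.0]`) gives a Dv of zero, which is the property that separates the two event types.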
Next, turning back to eye movements process 1220, a movement reconstruction process 1314 is performed. As previously noted, the vertical and horizontal bipole signals relate to the eye movement velocity. Both bipole signals are integrated to determine the position of the eye. Optionally damping is added for automatic centering.
Next post-processing 1316 is performed, optionally featuring filtering for smoothness and rescaling. Rescaling may optionally be made to fit the points from −1 to 1.
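The reconstruction and rescaling described above can be sketched as follows. This is a hedged illustration only: the leaky-integrator form of the damping, the parameter names, and the min/max rescaling are assumptions, and the smoothing filter is omitted.

```python
def reconstruct_position(velocity, dt, damping=0.0):
    """Integrate a velocity-like signal into position.

    A damping factor in [0, 1) pulls the position back toward zero at each
    step, which is one possible way to obtain automatic centering (assumed).
    """
    position = 0.0
    trace = []
    for v in velocity:
        position = (1.0 - damping) * position + v * dt
        trace.append(position)
    return trace


def rescale_to_unit_range(values):
    """Linearly map values so that the minimum becomes -1 and the maximum 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant trace has no range; place it at the center.
        return [0.0 for _ in values]
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in values]
```

Without damping the integrator accumulates any offset in the bipole signals indefinitely; a small damping value trades some positional accuracy for a trace that drifts back to center.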
Blink detection process 1222 begins with feature extraction 1318, which may optionally be performed as previously described for feature extraction 1312. Next, a classification 1320 is optionally performed, for example by using a GMM (Gaussian mixture model) classifier. GMM classifiers are known in the art; for example, Lotte et al describe the use of a GMM for classifying EEG data (“A review of classification algorithms for EEG-based brain-computer interfaces”, Journal of Neural Engineering 4(2), July 2007). A post-classification process 1322 may optionally be performed for label filtering, for example according to evidence accumulation as previously described.
An offline training process is preferably performed before the real time classification process is performed, such that the results of the training process may inform the real time classification process. The offline training process preferably includes a segmentation 1324 and a classifier computation 1326.
Segmentation 1324 optionally includes segmenting the data into blinks, saccades and fixations, as previously described.
Classifier computation 1326 preferably includes training the GMM. The GMM classifier may optionally be trained with an expectation maximization (EM) algorithm (see for example Patrikar and Baker, “Improving accuracy of Gaussian mixture model classifiers with additional discriminative training”, Neural Networks (IJCNN), 2016 International Joint Conference on). Optionally the GMM is trained to operate according to the mean and/or covariance of the data.
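As a rough illustration of EM training, the following Python sketch fits a one-dimensional Gaussian mixture and assigns a sample to its most likely component. It is a simplified stand-in, not the classifier of the cited references: the single feature dimension, the caller-supplied initial means, the fixed iteration count, and all function names are assumptions made for brevity.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, initial_means, iterations=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by expectation maximization."""
    k = len(initial_means)
    means = list(initial_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)
    return weights, means, variances


def classify(x, weights, means, variances):
    """Index of the component with the highest weighted likelihood for x."""
    scores = [w * gaussian_pdf(x, m, v)
              for w, m, v in zip(weights, means, variances)]
    return scores.index(max(scores))
```

In practice the features would be multi-dimensional (for example Dn and Dv together), in which case each component carries a mean vector and a covariance matrix rather than scalars.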
Raw audio 1226 feeds into a preprocessing process 1400, which optionally includes the following steps:
The pre-emphasis filter and windowing are optionally performed as described with regard to “COMPUTING MEL-FREQUENCY CEPSTRAL COEFFICIENTS ON THE POWER SPECTRUM” (Molau et al, Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP '01). 2001 IEEE International Conference on). The filter involves differentiating the audio signal and may optionally be performed as described in Section 5.2 of “The HTK Book”, by Young et al (Cambridge University Engineering Department, 2009). The differentiated signal is then cut into a number of overlapping segments for windowing, which may for example optionally be each 25 ms long and shifted by 10 ms. The windowing is preferably performed according to a Hamming window, as described in Section 5.2 of “The HTK Book”.
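The preprocessing described above can be sketched in Python as follows. The pre-emphasis coefficient of 0.97, the handling of the first sample, and the function names are assumptions for illustration; the 25 ms frame length, 10 ms shift, and Hamming window follow the description.

```python
import math


def pre_emphasize(signal, k=0.97):
    """First-order difference filter y[n] = x[n] - k * x[n-1].

    The first sample is passed through unchanged (an assumption).
    """
    if not signal:
        return []
    return [signal[0]] + [signal[n] - k * signal[n - 1]
                          for n in range(1, len(signal))]


def hamming_window(length):
    """Hamming window coefficients 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def frame_signal(signal, sample_rate, frame_ms=25.0, shift_ms=10.0):
    """Cut the signal into overlapping frames and apply a Hamming window."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    window = hamming_window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, shift):
        segment = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames
```

Trailing samples that do not fill a complete frame are dropped in this sketch; padding them instead would be an equally reasonable choice.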
Next, the preprocessed data is fed into phoneme classification process 1228, which begins with a phonemes feature extraction 1402. Phonemes feature extraction 1402 may optionally feature the following steps, which may optionally also be performed according to the above reference by Molau et al:
The filtered and windowed signal is then analyzed by FFT (Fast Fourier Transform). The Molau et al reference describes additional steps between the FFT and the DCT (discrete cosine transformation), which may optionally be performed (although the step of VTN warping is preferably not performed). In any case the DCT is applied, followed by performance of the MFCC (Mel-frequency cepstral coefficients; also described in Sections 5.3, 5.4 and 5.6 of “The HTK Book”).
Next liftering is performed as described in Section 5.3 of “The HTK Book”.
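For illustration, the DCT and liftering steps can be sketched in Python using the formulas given in “The HTK Book”. The input is assumed to be a list of log filterbank amplitudes for one frame, and the liftering parameter of 22 is the HTK default; both are assumptions about how the steps would be configured here.

```python
import math


def dct_cepstra(log_amplitudes, num_ceps):
    """Cepstral coefficients c_i = sqrt(2/N) * sum_j m_j * cos(pi*i/N * (j - 0.5)).

    j runs over the N filterbank channels (1-based in the formula) and
    i runs from 1 to num_ceps.
    """
    n = len(log_amplitudes)
    scale = math.sqrt(2.0 / n)
    return [
        scale * sum(m * math.cos(math.pi * i / n * (j + 0.5))
                    for j, m in enumerate(log_amplitudes))
        for i in range(1, num_ceps + 1)
    ]


def lifter(cepstra, lifter_param=22):
    """Liftering c'_n = (1 + L/2 * sin(pi*n/L)) * c_n for n = 1..len(cepstra)."""
    return [
        (1.0 + 0.5 * lifter_param * math.sin(math.pi * n / lifter_param)) * c
        for n, c in enumerate(cepstra, start=1)
    ]
```

Liftering rescales the higher-order coefficients so that all coefficients have comparable magnitudes, which can help classifiers that are sensitive to feature scale.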
The extracted phonemes are then fed into a phonemes classification 1404, which may optionally use any classifier as described herein, for example any facial expression classification method as described herein. Next a phonemes post-classification process 1406 is performed, which may optionally comprise any type of suitable label filtering, such as for example the previously described evidence accumulation process.
An offline training process is preferably performed before the real time classification process is performed, such that the results of the training process may inform the real time classification process. The offline training process preferably includes a segmentation 1408 and a classifier computation 1410. Segmentation 1408 preferably receives the results of voice activity detection process 1230 as a first input to determine whether phonemes can be classified. Given that voice activity is detected, segmentation 1408 then preferably performs a Chi2 test on the detected phonemes. Next, classifier computation 1410 preferably performs a multiclass computation which is determined according to the type of classifier selected.
Turning now to voice activity detection process 1230, raw audio 1226 is fed into a VAD (voice activity detection) feature extraction 1412. VAD feature extraction 1412 optionally performs the following steps:
The LogEnergy step may optionally be performed as described in Section 5.8 of “The HTK Book”.
The rateZeroCrossing step may optionally be performed as described in Section 4.2 of “A large set of audio features for sound description (similarity and classification) in the CUIDADO project”, by G. Peeters, 2004 (https://www.researchgate.net/publication/200688649_A_large_set_of_audio_features_for_sound_description_similarity_and_classification_in_the_CUIDADO_project). This step can help to distinguish between periodic sounds and noise.
The autocorrelation step may optionally be performed as described in Section 4.1 of “A large set of audio features for sound description (similarity and classification) in the CUIDADO project”.
Optionally, time derivatives may also be obtained as part of the feature extraction process, for example as described in Section 5.9 of “The HTK Book”.
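The three VAD features named above can be sketched per audio frame as follows. This is a simplified illustration under assumptions: the energy floor, the sign convention for zero samples, the normalization of the autocorrelation by its lag-0 value, and the function names are not taken from the cited references.

```python
import math


def log_energy(frame):
    """Log of the summed squared samples, floored to avoid log(0)."""
    return math.log(max(sum(s * s for s in frame), 1e-12))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Zero-valued samples are treated as non-negative (an assumption).
    """
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def autocorrelation(frame, lag):
    """Autocorrelation at the given lag, normalized by the lag-0 value."""
    zero_lag = sum(s * s for s in frame)
    if zero_lag == 0:
        return 0.0
    return sum(frame[n] * frame[n + lag]
               for n in range(len(frame) - lag)) / zero_lag
```

A feature vector for one frame could then be assembled as, for example, `[log_energy(f), zero_crossing_rate(f), autocorrelation(f, 1)]` before being passed to the classifier.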
The output of VAD feature extraction 1412 is preferably fed to both a VAD classification 1414 and the previously described phonemes classification 1404. In addition, segmentation 1408 preferably also has access to the output of VAD feature extraction 1412.
Turning now to VAD classification 1414, this process may optionally be performed according to any classifier as described herein, for example any facial expression classification method as described herein.
Next a VAD post-classification process 1416 is performed, which may optionally comprise any type of suitable label filtering, such as for example the previously described evidence accumulation process.
An offline training process is preferably performed before the real time classification process is performed, such that the results of the training process may inform the real time classification process. The offline training process preferably includes a segmentation 1418 and a classifier computation 1420. Segmentation 1418 preferably performs a Chi2 test on silence, which may optionally include background noise, which may for example be performed by asking the subject to be silent. Given that silence is not detected, segmentation 1418 next preferably performs a Chi2 test on the detected phonemes (performed when the subject has been asked to speak the phonemes).
Next, classifier computation 1420 preferably performs a binary computation (on voice activity/not voice activity) which is determined according to the type of classifier selected.
At 1502, it is determined whether voice activity is detected by VAD, for example according to the previously described voice activity detection process (1230). If so, then mouth animation (for animating the mouth of the avatar, if present) is preferably performed in 1504, for example as determined according to the previously described phoneme classification process (1228). The avatar animation features a predetermined set of phonemes, with each phoneme being animated, preferably including morphing between states represented by different phoneme animations. Optionally only a subset of phonemes is animated.
Next, an upper face expression is animated in stage 1506, for example as determined according to the previously described facial expression recognition process (1224). Once voice activity has been detected, preferably expressions involving the lower part of the face are discarded and are not considered.
Turning now back to 1502, if no voice activity is detected, then a full face expression is animated in 1508.
Turning back now to 1500, if no face expression is detected, then it is determined whether a blink is present in 1510. If so, then it is animated in 1512. The blink may optionally be determined according to the previously described blink detection process (1222).
If not, then eye movement is animated in 1514. The eye movement(s) may optionally be determined according to the previously described eye movements process 1220.
After either 1512 or 1514, the process returns to detection of voice activity in 1502, and animation of the mouth if voice activity is detected in 1504.
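The branching described above can be summarized in a short Python sketch. It assumes that decision 1500 tests whether a facial expression is detected and that a detected expression leads to the voice activity check at 1502; the function name and the string labels are hypothetical and serve only to make the flow explicit.

```python
def select_animations(face_expression, voice_active, blink):
    """Return the animation stages selected for one pass through the flow."""
    stages = []
    if face_expression:                          # decision 1500 (assumed)
        if voice_active:                         # decision 1502
            stages.append("mouth_1504")          # mouth driven by phonemes
            stages.append("upper_face_1506")     # lower-face expressions discarded
        else:
            stages.append("full_face_1508")
    else:
        if blink:                                # decision 1510
            stages.append("blink_1512")
        else:
            stages.append("eye_movement_1514")
        # After 1512 or 1514 the flow returns to the voice activity check.
        if voice_active:
            stages.append("mouth_1504")
    return stages
```

For example, `select_animations(False, True, False)` selects eye movement animation followed by mouth animation.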
As shown, a method 1600 begins with receiving EMG facial information in stage 1602. Such EMG facial information may optionally comprise one or more of the following types of information:
Raw EMG data optionally comprises EMG signals from the electrodes, optionally with preprocessing such as noise removal. Processed EMG data before classification optionally comprises data that has undergone preprocessing as described herein but has not been classified. Fully classified EMG data comprises data in which the facial expression classification has been determined as described herein.
In stage 1604, the facial expression to be applied is determined. If raw EMG data is received, then optionally any preprocessing as described above is performed before such a determination. If processed EMG data is received, optionally such data is used “as is” or with additional processing. If classified EMG data is received, then optionally the classification itself is fed into the facial expression determination, optionally with processed but not classified EMG data or any other suitable EMG data. Next, the facial expression is optionally determined as described, for example according to any of the methods described herein and/or according to a vector based or points-based method as described.
In stage 1606, the facial expression data is synchronized to motion capture data. Both the facial expression data and motion capture data may optionally be obtained as described with regard to
In stage 1608, the motion capture data and the facial expression data are integrated. For example, such integration may be performed to create a final media product, such as video data, or as part of the process of creating such a final media product.
System 1700 also features a facial expression processor 1704, for processing EMG electrode data to determine facial expression information as described herein, for example as described with regard to
Motion capture processor 1702 and facial expression processor 1704 are preferably configured as one or more computational devices, including without limitation virtual such devices and “serverless” cloud implementations. Motion capture processor 1702 and facial expression processor 1704 provide the motion capture data and facial expression information, respectively, to a motion capture and facial expression integrator 1706. Motion capture and facial expression integrator 1706 integrates the two types of data so that they are synchronized and can be used for the final media product. Optionally, motion capture and facial expression integrator 1706 creates the final media product.
For obtaining motion capture data, the actor preferably wears a plurality of suitable markers, shown as actor markers 1712. A non-limiting example of configuration of such markers is shown in US20100315524A1, previously incorporated by reference. For obtaining facial expression data, the actor preferably wears a plurality of EMG electrodes, optionally in a mask configuration as described herein, shown as actor facemask 1714.
In stage 1804, mask data is obtained from the EMG electrodes. The mask data is then processed by post-processing software as described herein, in stage 1806, to produce facial motion capture data in stage 1808. Post-processing can include a post-classification process, filtering, and the like as described in connection with
If a face mesh is used, alternatively or additionally, then optionally the EMG electrode data is mapped onto the points of the mesh where muscles are attached to skin. Again, the direct measurement of facial muscle activity provides an advantage.
Optionally, the map or model is adjusted specifically for a particular actor, such that each actor may have a personalized map or model of the facial muscle behavior. Also, optionally, the EMG processor and/or methods thereof, such as the classifier for example and without limitation, are adjusted specifically for a particular actor.
Using data from EMG electrodes has a number of additional advantages. Without wishing to be limited by a closed list, such data can help to overcome “wooden” or unnatural facial expressions. Complex facial expressions remain a challenge for existing approaches to motion capture, because skin stretches, shrinks, and shears much more than other materials such as cloth or paper, and the local motion models typically used in motion capture are not adapted to such deformations and/or may be unable to handle such deformations efficiently.
Facial motion capture data is optionally produced according to musculature-based expressions. These expressions are obtained by mapping motion captured facial motion to deformation of the synthetic facial model, by applying an approximate biomechanical model to both the captured facial data and synthetic facial motion.
In addition, body motion capture data is obtained in stage 1810. Such body motion capture data is then synchronized and combined with the facial motion capture data in stage 1812, for example by 3D rendering software. Such software may also optionally apply musculature-based expressions or other types of facial expression rendering to the facial motion capture data.
Facemask apparatus 502, in some implementations, can include facemask circuitry 520, which can be operatively coupled to a local board 2045. The facemask connector 2030 can be the same connector as facemask connector 524 or a different connector and can also be operatively coupled to a first local board connector 2040 which can be the same or different connector as first local board connector 526 via bus 2035 which can be the same or different bus as the bus connecting connectors 524 and 526. Local board 2045 can be operatively coupled to bus 506 through a second local board connector 2050 which can be the same or a different connector as local board connector 528. Facemask 502 can also include a controller 2020 for synchronizing cameras 2010, 2015 and one or more IR light sources 2025.
Image data from cameras 2010, 2015 can be sent to local board 2045 for preprocessing using image preprocessor 2047, including filtering, generating a Gaussian pyramid, compression, and the like. In some embodiments, local board 2045 can also include a processor configured to generate an optical flow estimation. Local board 2045 can include a memory component 2049 for buffering or otherwise storing eye tracking image data, preprocessed eye tracking image data, or eye tracking optical flow estimation data. In other embodiments, optical flow estimation can be performed at main board 2060. In some implementations, eye tracking image data can be transmitted from local board 2045 through second local board connector 2050 which can be the same or different connector as second local board connector 528, and then through bus 506 to main board 2060 which can be the same or different board as main board 504.
Local board 2045 can also support connection of IR light sources 2025. Local board 2045 can include a controller to turn on, calibrate, and manage light sources 2025. Light sources 2025 can be configured to strobe in phase with image capture by cameras 2010, 2015.
Main board 2060, in some implementations, can include a main board connector 2055 which can be the same or different connector as first main board connector 540 for receiving the digital signal carrying eye tracking image data from bus 506. The digital signal can then be sent from the main board connector 2055 to a microcontroller 2070 which can be the same or different microcontroller as microcontroller 542. Microcontroller 2070 can receive the eye tracking data signals, process the eye tracking data signals and/or initiate other components of the main board 2060 to process the eye tracking data signals, and/or can otherwise control the functions of main board 2060. In some implementations, microcontroller 2070 can collect recorded data, can synchronize and encapsulate data packets, and can communicate the recorded data to a remote computer (not shown) through some type of communication channel, e.g., via a USB or Bluetooth connection. The preferred amount of memory is at least enough for performing the amount of required processing, which in turn also depends on the speed of the communication bus and the amount of processing being performed by other components. In some implementations, the main board 2060 can also include additional connectors 2075 similar to main board 504 for the same or similar peripheral devices or components.
At step 2140, animation data is generated. In some preferred embodiments, animation data is generated here before integration with other motion capture data. In other preferred embodiments, motion capture data from eye tracking can be integrated with other facial motion capture data first. In yet other embodiments, eye tracking motion capture data can be sent to an animation rendering engine alone so that eye animation rendering is done independently. In some embodiments, eye tracking motion capture data and other data is time reconciled as part of generating animation data so that the animation data from the different data sources is synchronized. This is discussed further in connection with
At step 2145, avatar animation data is sent to other devices. In preferred embodiments, animation data is sent to one or more other headsets worn by other subjects so that an avatar of the subject of the animation data can be displayed in real-time. A receiving subject can then interact with that avatar to respond more realistically and genuinely so that the receiving subject's performance or response is enhanced. The animation data can include location and positioning data so that the receiving headset can display an avatar of the subject with the proper perspective and within the virtual environment.
Optionally at step 2150, combined motion capture data is stored for post-processing to create a final film product. Optionally, post-processing includes further combination with additional motion capture data and animation. Animation data can be stored on a memory device coupled to the headset. In other embodiments, animation data can be sent, preferably wirelessly, to another computing device for storage.
At step 2315, facial expression data, including data from EMG electrodes, is received. At step 2320, eye tracking data is received. At step 2325, voice activity data is received. At step 2330, post-processing of mask data is performed as described in connection with
It should be understood that method 2300 can be performed with a subset of steps 2315-2325 such that the selected data are integrated and synchronized. For example, it would be possible to render media that included facial expression data and eye tracking data but no voice audio.
At step 2430, avatar animation data is generated. In some embodiments, the data can be received, and the avatar animation data generated, by a computing device that is a component of or mounted on the headset. In other embodiments, the computing device can be separate from the headset, including a computing device that receives the data wirelessly from the headset or a computing device mounted to the subject wearing the headset and connected to the headset either wirelessly or by wire. The computing device can receive data wirelessly or through a wired connection depending on the source of the data. For example, in embodiments that include a computing device attached to the subject, data from the headset can be received over a wired connection while data from a body motion capture system can be received wirelessly. For either wired or wireless communication, data can be compressed to enhance the speed at which it is sent and received. Generating avatar animation data can include time reconciliation of the facial expressions, voice animation data, eye tracking animation data, and body motion capture data to limit drift of a type of data in relation to one or more of the other types of data.
At step 2430, avatar animation data is sent to another device for display. Steps 2410-2430 are completed essentially in real time so that as the data from the subject wearing the first headset is captured, other headsets display the avatar of the subject wearing the first headset essentially in real time. In preferred embodiments, animation data can also be sent to non-headset devices for rendering and/or display. Non-headset computing devices can receive animation data from one or more headsets so that a real-time preview animation of what a final film project would look like can be rendered. The animation preview at a non-headset computing device may be done essentially in real-time so that there is little or no apparent delay between the actual performance and the animation. Non-headset computing devices can include monitors as used by directors, camera crew, and editors during the filming process. For example, non-headset devices can include the monitor of a film camera, a monitor like a traditional director's monitor, a common computer display monitor, a tablet device, and the like. Animation data from a particular headset can be rendered, or combined animation data from two or more headsets can be rendered within the virtual environment. This allows a director or other third party to view the subjects substantially as they would be rendered in a post-processed animation. It is valuable to a director to have insight into the quality of the final animation before post-processing, to allow for additional takes of a scene, further direction, and the like, rather than leaving directorial changes to the animation or editing process.
In preferred embodiments, the animation displayed in a headset includes the view of the subject wearing the headset such that different headsets will have different views while a non-headset device can have the view of one of the headsets displayed or a neutral view, such as the view based on a camera position.
At step 2435, animation data is stored for later post-processing. In some embodiments, time reconciliation for correlating animation data can be performed during post-processing either alone or in addition to time reconciliation performed in generating animation data for the preview. Time reconciliation can be accomplished during animation data generation through data transmission acceleration or buffering using known techniques so that different sources of data are synchronized. Time reconciliation can be accomplished during a post-processing phase using time shifting of one or more of the data sources. In some embodiments, animation data can be correlated. In some embodiments, EMG data, voice data, image data and the like can be correlated before animation data is generated.
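Time reconciliation by time shifting can be illustrated with a minimal Python sketch. It assumes each data source is a time-sorted list of (timestamp, value) pairs in a common unit such as milliseconds, and that the offset between sources is already known; the function names, the nearest-sample pairing, and the tolerance parameter are hypothetical.

```python
from bisect import bisect_left


def shift_stream(samples, offset):
    """Shift every timestamp of a (timestamp, value) stream by a fixed offset."""
    return [(t + offset, v) for t, v in samples]


def align_nearest(reference, other, tolerance):
    """Pair each reference sample with the nearest-in-time sample of `other`.

    Pairs whose timestamps differ by more than `tolerance` are dropped.
    Both streams are assumed to be sorted by timestamp.
    """
    times = [t for t, _ in other]
    pairs = []
    for t, v in reference:
        i = bisect_left(times, t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(other)]
        if not candidates:
            continue
        j = min(candidates, key=lambda c: abs(times[c] - t))
        if abs(times[j] - t) <= tolerance:
            pairs.append((t, v, other[j][1]))
    return pairs
```

Dropping unmatched samples keeps the sketch simple; interpolating the slower stream would be an alternative when the sources run at different sampling rates.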
While the invention has been described with respect to a limited number of embodiments, it will be appreciated that many variations, modifications and other applications of the invention may be made, including different combinations of various embodiments and sub-embodiments, even if not specifically described herein.
Number | Name | Date | Kind |
---|---|---|---|
4248244 | Charnitski | Feb 1981 | A |
4856891 | Pflibsen et al. | Aug 1989 | A |
5767941 | Ferguson | Jun 1998 | A |
6785574 | Kajitani | Aug 2004 | B2 |
7554549 | Sagar et al. | Jun 2009 | B2 |
8130225 | Sullivan | Mar 2012 | B2 |
8144153 | Sullivan | Mar 2012 | B1 |
8493286 | Agrama | Jul 2013 | B1 |
8908960 | Sullivan | Dec 2014 | B2 |
9037530 | Tan | May 2015 | B2 |
9155487 | Linderman | Oct 2015 | B2 |
9191643 | Sweeney | Nov 2015 | B2 |
9275487 | Van Gelder | Mar 2016 | B1 |
9351658 | Lee | May 2016 | B2 |
9389694 | Ataee | Jul 2016 | B2 |
9552510 | Li | Jan 2017 | B2 |
9622703 | Badower | Apr 2017 | B2 |
9817498 | Song | Nov 2017 | B2 |
9928633 | Cotter | Mar 2018 | B2 |
10120413 | Aimone | Nov 2018 | B2 |
10156949 | Lee | Dec 2018 | B2 |
10235807 | Thomas | Mar 2019 | B2 |
10515474 | Tadi et al. | Dec 2019 | B2 |
10521014 | Tadi et al. | Dec 2019 | B2 |
10943100 | Tadi et al. | Mar 2021 | B2 |
20020097678 | Bisher | Jul 2002 | A1 |
20030109306 | Karmarkar | Jun 2003 | A1 |
20030117651 | Matraszek | Jun 2003 | A1 |
20030167019 | Viertio-Oja | Sep 2003 | A1 |
20040117513 | Scott | Jun 2004 | A1 |
20040229685 | Smith | Nov 2004 | A1 |
20050180613 | Bronstein | Aug 2005 | A1 |
20060071934 | Sagar | Apr 2006 | A1 |
20060235318 | Ordonez Smith | Oct 2006 | A1 |
20070179396 | Le et al. | Aug 2007 | A1 |
20080058668 | Kaveh | Mar 2008 | A1 |
20080065468 | Berg | Mar 2008 | A1 |
20080181507 | Gope | Jul 2008 | A1 |
20080218472 | Breen | Sep 2008 | A1 |
20080292147 | Bronstein | Nov 2008 | A1 |
20090326406 | Tan | Dec 2009 | A1 |
20100156935 | Lim | Jun 2010 | A1 |
20100211397 | Park | Aug 2010 | A1 |
20100315524 | Gordon et al. | Dec 2010 | A1 |
20110181601 | Mumbauer | Jul 2011 | A1 |
20110243380 | Forutanpour | Oct 2011 | A1 |
20120134548 | Rhoads | May 2012 | A1 |
20120172682 | Linderman | Jul 2012 | A1 |
20130021447 | Brisedoux | Jan 2013 | A1 |
20130279577 | Schwarz | Oct 2013 | A1 |
20130314401 | Engle | Nov 2013 | A1 |
20140043434 | Asano | Feb 2014 | A1 |
20140118582 | Artyomov | May 2014 | A1 |
20140153816 | Cohen | Jun 2014 | A1 |
20140164056 | Johnson | Jun 2014 | A1 |
20140267413 | Du | Sep 2014 | A1 |
20140267544 | Li | Sep 2014 | A1 |
20140323148 | Schmalstieg | Oct 2014 | A1 |
20140364703 | Kim | Dec 2014 | A1 |
20150178988 | Montserrat Mora | Jun 2015 | A1 |
20150213646 | Ma | Jul 2015 | A1 |
20150310262 | Do | Oct 2015 | A1 |
20150310263 | Zhang | Oct 2015 | A1 |
20150313498 | Coleman | Nov 2015 | A1 |
20150325004 | Utsunomiya | Nov 2015 | A1 |
20160042548 | Du | Feb 2016 | A1 |
20160077547 | Aimone | Mar 2016 | A1 |
20160119541 | Alvarado-Moya | Apr 2016 | A1 |
20160193732 | Breazeal | Jul 2016 | A1 |
20160300252 | Frank | Oct 2016 | A1 |
20160323565 | Van Baarsen | Nov 2016 | A1 |
20160328875 | Fang | Nov 2016 | A1 |
20160350979 | Zeng | Dec 2016 | A1 |
20160360970 | Tzvieli | Dec 2016 | A1 |
20170060256 | Heck | Mar 2017 | A1 |
20170069094 | Kim | Mar 2017 | A1 |
20170109864 | Ohba | Apr 2017 | A1 |
20170140578 | Xiao | May 2017 | A1 |
20170205903 | Miller | Jul 2017 | A1 |
20170231490 | Toth | Aug 2017 | A1 |
20170352183 | Katz | Dec 2017 | A1 |
20170364374 | Maiden | Dec 2017 | A1 |
20170367590 | Sebe | Dec 2017 | A1 |
20180027222 | Ogasawara | Jan 2018 | A1 |
20180107275 | Chen | Apr 2018 | A1 |
20180120936 | Keller | May 2018 | A1 |
20180139431 | Simek | May 2018 | A1 |
20180204356 | Xia | Jul 2018 | A1 |
20180211102 | Alsmadi | Jul 2018 | A1 |
20180211398 | Schmidt | Jul 2018 | A1 |
20180225866 | Zhang | Aug 2018 | A1 |
20180240261 | Tadi | Aug 2018 | A1 |
20180325447 | Hriso | Nov 2018 | A1 |
20180335930 | Scapel | Nov 2018 | A1 |
20190001129 | Rosenbluth | Jan 2019 | A1 |
20190004639 | Faulkner | Jan 2019 | A1 |
20190025919 | Tadi | Jan 2019 | A1 |
20190029528 | Tzvieli | Jan 2019 | A1 |
20190082990 | Poltorak | Mar 2019 | A1 |
20190155386 | Tadi | May 2019 | A1 |
20190188281 | Bivens | Jun 2019 | A1 |
20190200888 | Poltorak | Jul 2019 | A1 |
20190212752 | Fong | Jul 2019 | A1 |
20190369727 | Li | Dec 2019 | A1 |
20200129855 | Ambinder | Apr 2020 | A1 |
20200129856 | Bond | Apr 2020 | A1 |
20200410702 | Zhang | Dec 2020 | A1 |
Number | Date | Country |
---|---|---|
103810463 | May 2014 | CN |
104460955 | Mar 2015 | CN |
104504366 | Apr 2015 | CN |
102011052836 | Feb 2012 | DE |
1032872 | Sep 2000 | EP |
1433118 | Jun 2004 | EP |
2118757 | Nov 2009 | EP |
3155560 | Apr 2017 | EP |
2016126500 | Jul 2016 | JP |
20120094857 | Aug 2012 | KR |
101307046 | Jul 2013 | KR |
20150057424 | May 2015 | KR |
20150099129 | Aug 2015 | KR |
101585561 | Jan 2016 | KR |
20160053749 | May 2016 | KR |
2008108965 | Sep 2008 | WO |
2015025251 | Feb 2015 | WO |
2015192117 | Dec 2015 | WO |
2016083826 | Jun 2016 | WO |
2016165052 | Oct 2016 | WO |
2017015949 | Feb 2017 | WO |
Entry |
---|
Ahsan, R., Md., et al., EMG Signal Classification for Human Computer Interaction: A Review, European Journal of Scientific Research vol. 33, No. 3 (2009) pp. 1-23 of pdf. |
AlZoubi, O., et al., Classification of EEG for Affect Recognition: An Adaptive Approach, Conference Paper, Dec. 2009, pp. 1-11 pdf. |
AlZoubi, O., et al., Classification of EEG for Affect Recognition: An Adaptive Approach, Conference Paper, Dec. 2009, pp. 52-61. |
Angkoon P., et al., The Relationship between Anthropometric Variables and Features of Electromyography Signal for Human-Computer Interface. Dr. Ganesh Naik. Applications, Challenges, and Advancements in Electromyography Signal Processing, IGI Global, 2014, 9781466660908, pp. 1-28. |
Barreto, A. B. PhD., A practical EMG-based human-computer interface for users with motor disabilities, Journal of Rehabilitation Research and Development, vol. 37, No. 1 Jan./Feb. 2000, pp. 53-64. |
Bartlett, M., et al., Automatic Analysis of Spontaneous Facial Behavior: A Final Project Report, Jan. 2001, pp. 1-40. |
Cristhian Manuel Duran Acevedo et al., Development of an Embedded System for Classification of EMG Signals, 2014, pp. 1-5 of pdf. |
Dubuisson, S., et al., A solution for facial expression representation and recognition, Signal Processing Image Communication 17 (2002) pp. 657-673. |
Gillies, M., et al., Semi-Autonomous Avatars: A New Direction for Expressive User Embodiment, pp. 1-20 of pdf. |
Giuseppina, Carla Gini, et al., New Results on Classifying EMG Signals for Interfacing Patients and Mechanical Devices, New Trends in Medical and Service Robots, Mechanisms and Machines Science, vol. 20, (2014), pp. 1-16pdf. |
Gruebler, A., et al., Design of a Wearable Device for Reading Positive Expressions from Facial EMG Signals, IEEE Transactions on Affective Computing, vol. 5, No. 3, Jul.-Sep. 2014, pp. 1-11 of pdf. |
Hamedi, M., et al., Time-Frequency Facial Gestures EMG Analysis using Bilinear Distribution, Conference Paper Nov. 2015, pp. 1-6 of pdf. |
Hema, C.R. et al., Classification of Human Emotional States using Facial Electromyogram signals, pp. 1-7 pdf. |
Hsieh, C., et al., Effective Semantic features for facial expressions recognition using SVM, International Journal of Multimedia Tools and Applications, Mar. 2015, pp. 1-23pdf. |
Huang, C., The Review of Applications and Measurements in Facial Electromyography, Journal of Medical and Biological Engineering vol. 25, No. 1 (2005), pp. 15-20. |
Krepki, R., et al., Berlin Brain-Computer Interface (BBCI) towards a new communication channel for online control multimedia, Jan. 2003, pp. 1-30pdf. |
Lopez, N., Hybrid Human-Machine Interface to Mouse Control for Severely Disabled People, International Journal of Engineering and Innovative Technology, vol. 4, Iss. 11, May 2015, pp. 1-9 of pdf. |
Mantoro, T., Distributed Support for Intelligent Environments, Apr. 2006, pp. 1-204. |
Martisius, I., Design of a Neural Interface Based System for Control of Robotic Devices, Communications in Computer and Information Science, Sep. 2012, pp. 297-311. |
Mohammad-Rezazadeh, I., et al., A novel human-machine interface based on multi-channel facial bioelectric signals, Australasian Physical & Engineering Sciences in Medicine vol. 34, No. 4, (2011) pp. 497-513. |
Mohd-Yasin, F., et al., Techniques of EMG signal analysis: Detection, processing, classification and applications, Biological Procedures Online, Feb. 1997-2006, pp. 1-22 www.biologicalprocedures.com. |
Notice of Allowance dated Aug. 3, 2021 for U.S. Appl. No. 16/678,182 (pp. 1-10). |
Notice of Allowance dated Aug. 8, 2019 for U.S. Appl. No. 16/261,693 (pp. 1-9). |
Notice of Allowance dated May 7, 2020 for U.S. Appl. No. 15/875,227 (pp. 1-11). |
Novak, D., et al., A survey of methods for data fusion and system adaptation using autonomic nervous system responses in physiological computing, Interacting with Computers vol. 24 (2012) pp. 154-172. |
Office Action dated Apr. 24, 2019 for U.S. Appl. No. 15/875,382 (pp. 1-23). |
Office Action dated Apr. 30, 2019 for U.S. Appl. No. 16/261,693 (pp. 1-15). |
Office Action dated Aug. 16, 2019 for U.S. Appl. No. 15/875,227 (pp. 1-12). |
Office Action dated Jan. 11, 2018 for U.S. Appl. No. 15/875,306 (pp. 1-22). |
Office Action dated Jan. 22, 2021 for U.S. Appl. No. 16/678,182 (pp. 1-6). |
Office Action dated Jul. 31, 2020 for U.S. Appl. No. 16/678,163 (pp. 1-10). |
Office Action dated Jul. 9, 2020 for U.S. Appl. No. 16/678,182 (pp. 1-8). |
Podrug, E., et al., Surface EMG pattern recognition by using DWT feature extraction and SVM classifier, First Conference of Medical and Biological Engineering in Bosnia and Herzegovina, CMBEBIH 2015, pp. 1-4 of pdf. |
Putze, F., et al., Online Recognition of Facial Actions for Natural EEG-Based BCI Applications, Conference Paper Jan. 2011, pp. 1-11 of pdf. |
Quan, W., et al., Facial Expression Biometrics Using Statistical Shape Models, Journal on Advances in Signal Processing, vol. 2009, Article ID 261542, pp. 1-18. |
Rechy-Ramirez, E.J., et al., Bio-signal based control in assistive robots: a survey, Digital Communications and Networks vol. 1 (2015) pp. 1-18 of pdf. |
Sheikh, M., Robust Recognition of Facial Expressions on Noise Degraded Facial Images, Thesis, Feb. 2011, pp. 1-129. |
Sorci, M., et al., Fisher's Discriminant and Relevant Component Analysis for Static Facial Expression Classification, 15th European Signal Processing Conference (EUSIPCO 2007) pp. 115-119. |
Surendran, N.K., et al., Towards Real-Time Facial Expression Recognition, Jan. 2006, pp. 1-9 of pdf. |
Zhang, X., et al., A real-time, practical sensor fault-tolerant module for robust EMG pattern recognition, Journal of NeuroEngineering and Rehabilitation (2015) pp. 1-16 of pdf. |
Number | Date | Country |
---|---|---|
62661848 | Apr 2018 | US |
62615029 | Jan 2018 | US |