The present invention relates generally to the detection and classification of facial muscle movements, such as facial expressions or other types of muscle activity, in human subjects. The invention is suitable for use in electronic entertainment or other platforms in which electroencephalograph (EEG) data is collected and analysed to determine a subject's facial expression so as to provide control signals to that platform, and it will be convenient to describe the invention in relation to that exemplary, non-limiting application.
Facial expression has long been one of the most important aspects of human to human communication. Humans have become accustomed to consciously and unconsciously showing our feelings and attitudes using facial expressions. Furthermore, we have become highly skilled at reading and interpreting facial expressions of others. Facial expressions form a very powerful part of our everyday life, everyday communications and interactions.
As technology progresses, more of our communication is mediated by machines. People now “congregate” in virtual chat rooms to discuss issues with other people. Text messaging is becoming more popular, resulting in new orthographic systems being developed in order to cope with this unhuman world. Currently, facial expressions have not been used in man machine communication interfaces. Interactions with machines are restricted to the use of cumbersome input devices such as keyboards and joysticks. This limits our communication to only premeditated and conscious actions.
There therefore exists a need to provide technology that simplifies man-machine communications. It would moreover be desirable for this technology to be robust, powerful and adaptable to a number of platforms and environments. It would also be desirable for this technology to optimise the use of natural human to human interaction techniques so that the man-machine interface is as natural as possible for a human user.
With this in mind, one aspect of the invention provides a method of detecting and classifying facial muscle movements, including the steps of receiving bio-signals from at least one bio-signal detector, and applying at least one facial muscle movement-detection algorithm to a portion of the bio-signals affected by a predefined type of facial muscle movement in order to detect facial muscle movements of that predefined type.
The step of applying at least one facial movement-detection algorithm to the bio-signals may include:
comparing the bio-signal portion to a signature defining one or more distinctive signal characteristics of the predefined facial muscle movement type.
In a first embodiment of the invention, the step of applying at least one facial muscle movement-detection algorithm to the bio-signals may include:
directly comparing bio-signals from one or more predetermined bio-signal detectors to the signature.
In another embodiment of the invention, the step of applying at least one facial muscle movement-detection algorithm to the bio-signals may include:
projecting bio-signals from a plurality of bio-signal detectors onto one or more predetermined component vectors; and
comparing the projections onto the one or more component vectors to that signature.
The predetermined component vectors may be determined from applying a first component analysis to historically collected bio-signals generated during facial muscle movements of the type corresponding to that first signature. The first component analysis applied to the historically collected bio-signals may be independent component analysis (ICA). Alternatively, the first component analysis applied to the historically collected bio-signals may be principal component analysis (PCA). In this embodiment, the method may further include the steps of:
applying a second component analysis to the detected bio-signals; and
using the results of the second component analysis to update the one or more predetermined component vectors during bio-signal detection.
The second component analysis may be principal component analysis (PCA).
In yet another embodiment of the invention, the step of applying at least one facial muscle movement-detection algorithm to the bio-signals may include:
applying a desired transform to the bio-signals; and
comparing the results of the desired transform to that signature.
The desired transform may be selected from any one or more of a Fourier transform, wavelet transform or other signal transformation method.
The step of applying at least one facial muscle movement-detection algorithm to the bio-signals may further include the step of:
separating the bio-signals resulting from the predefined type of facial muscle movement from one or more sources of noise in the bio-signals.
The sources of noise may include any one or more of electromagnetic interference (EMI), bio-signals not resulting from the predefined type of facial muscle movement and other muscle artefacts.
The step of applying one or more than one facial muscle movement-detection algorithm to the bio-signals may include comparing the sum or difference of bio-signals from one or more pairs of bio-signal detectors to that signature.
The step of applying one or more than one facial muscle movement-detection algorithm to the bio-signals may further include comparing bio-signals from each of the one or more pairs of bio-signal detectors to that signature.
The comparing step may include:
tracking a derivative of one or more than one of the bio-signals from each of the one or more pairs of bio-signal detectors and the sum or difference of bio-signals from the one or more pairs of bio-signal detectors.
The comparing step may further include:
comparing one or both of gradient and amplitude for one or more than one of the bio-signals from each of the one or more pairs of bio-signal detectors and the sum or difference of bio-signals from the one or more pairs of bio-signal detectors; and
determining when one or both of the gradient and amplitude respectively exceeds predetermined gradient and amplitude thresholds.
The comparing step may further include:
computing the correlation between bio-signals from each of the one or more pairs of bio-signal detectors; and
determining when the correlation exceeds a predetermined correlation threshold.
The step of applying one or more than one facial muscle movement-detection algorithm to the bio-signals may include comparing the power of bio-signals from one or more predetermined bio-signal detectors to that signature.
The comparing step may include summing the power of bio-signals from one or more pairs of bio-signal detectors to that signature; and
determining whether the sum exceeds a predetermined threshold indicative of a first facial muscle movement type.
The comparing step may include computing the ratio of the power of bio-signals from a first group of bio-signal detectors to the power of bio-signals from a second group of bio-signal detectors; and
determining whether the ratio exceeds a predetermined threshold indicative of a second facial muscle movement type.
In one or more embodiments of the invention, the bio-signals may include any one or more of electroencephalograph (EEG) signals, electrooculograph (EOG) signals and electromyography (EMG) signals. The method may further include the step of:
generating an output signal representative of the detected facial muscle movement type for input to an electronic entertainment application or other application.
Another aspect of the invention provides an apparatus for detecting and classifying facial muscle movements, including:
a processor and associated memory device for causing the processor to carry out the method described above.
Yet another aspect of the invention provides a computer program product, tangibly stored on machine readable medium, the product comprising instructions operable to cause a processor to carry out the method described above.
A further aspect of the invention provides a computer program product comprising instructions operable to cause a processor to carry out the method described above.
These and other features, aspects and advantages of the present invention will become better understood with regard to the following description, appended claims, and accompanying figures which depict various views and embodiments of the device, and some of the steps in certain embodiments of the method of the present invention, where:
Turning now to
The electrical fluctuations detected over the scalp by the series of scalp sensors are attributed largely to brain tissue located at or near the skull. The source is the electrical activity of the cerebral cortex, a significant portion of which lies on the outer surface of the brain below the scalp. The scalp electrodes pick up electrical signals naturally produced by the brain and make it possible to observe electrical impulses across the surface of the brain. Although in this exemplary embodiment the headset 102 includes several scalp electrodes, in other embodiments only one or more scalp electrodes, e.g. sixteen electrodes, may be used in a headset.
Traditional EEG analysis has focused solely on these signals from the brain. The main applications have been explorative research in which different rhythms (alpha wave, beta wave, etc) have been identified, pathology detection in which onset of dementia or physical injury can be detected, and self improvement devices in which bio-feedback is used to aid in various forms of meditation. Traditional EEG analysis considers signals resulting from facial muscle movement such as eye blinks to be artefacts that mask the real EEG signal desired to be analysed. Various procedures and operations are performed to filter these artefacts out of the EEG signals selected.
The applicants have developed technology that enables the sensing and collecting of electrical signals from the scalp electrodes, and the application of signal processing techniques to analyze these signals in order to detect and classify human facial expressions such as blinking, winking, frowning, smiling, laughing, talking etc. The result of this analysis is able to be used by a variety of other applications, including but not being limited to electronic entertainment applications, computer programs and simulators.
Each of the signals detected by the headset 102 of electrodes is fed through a sensor interface 104, which can include an amplifier to boost signal strength and a filter to remove noise, and then digitized by an analogue-to-digital converter 106. Digitized samples of the signal captured by each of the scalp sensors are stored during operation of the apparatus 100 in a data buffer 108 for subsequent processing.
The apparatus 100 further includes a processing system 109 including a digital signal processor 112, a co-processing device 110 and associated memory device for storing a series of instructions (otherwise known as a computer program or computer control logic) to cause the processing system 109 to perform desired functional steps. Notably, the memory includes a series of instructions defining at least one algorithm 114 to be performed by the digital signal processor 112 for detecting and classifying a predetermined type of facial muscle movement. Upon detection of each predefined type of facial muscle movement, a corresponding control signal is transmitted in this exemplary embodiment to an input/output interface 116 for transmission via a wireless transmission device 118 to a platform 120 for use as a control input by electronic entertainment applications, programs, simulators or the like.
In one embodiment, the algorithms are implemented in software and the series of instructions is stored in the memory of the processing system 109. The series of instructions causes the processing system 109 to perform the functions of the invention as described herein. Prior to being loaded into the memory, the instructions can be tangibly embodied in a machine readable storage device, such as a computer disk or memory card, or in a propagated signal. In another embodiment, the algorithms are implemented primarily in hardware using, for example, hardware components such as application specific integrated circuits (ASICs). Implementation of the hardware state machine so as to perform the functions described herein will be apparent to persons skilled in the relevant art. In yet another embodiment, the algorithms are implemented using a combination of software and hardware.
Other implementations of the apparatus 100 are possible. Instead of a digital signal processor, an FPGA (field programmable gate array) could be used. Rather than a separate digital signal processor and co-processor, the processing functions could be performed by a single processor. The buffer 108 could be eliminated or replaced by a multiplexer (MUX), and the data stored directly in the memory of the processing system. A MUX could be placed before the A/D converter stage so that only a single A/D converter is needed. The connection between the apparatus 100 and the platform 120 can be wired rather than wireless.
Although the apparatus 100 is illustrated in
The headset 102, including scalp electrodes positioned according to the system 200, is placed on the head of a subject in order to detect EEG signals. As seen in
In traditional EEG research, many signals resulting from eye blinks and other facial muscle movements have been considered to be artefacts masking the real EEG signal required for analysis.
In order to generate the mathematical signature for each facial muscle movement, and as shown in
Once the EEG signal recordings are collected, signal processing operations are then performed at step 706 in order to identify one or more distinctive signal characteristics of each predefined facial muscle movement type. Identification of these distinctive signal characteristics in each EEG signal recording enables the facial muscle movement in a subject to be classified at step 708 and an output signal representative of the detected type of facial muscle movement to be output at step 710. Testing and verification of the output signal at step 712 enables a robust data set to be established.
In some embodiments, it may be necessary to develop a mathematical signature for each subject. In other embodiments of the invention, a generic mathematical signature can be developed for each type of facial muscle movement, e.g., using a limited number of subjects, and stored in the memory of the digital signal processor 112 without requiring the aforementioned steps to be carried out by each subject.
In one of the modes of operation, the portion of the bio-signals affected by a predefined type of facial muscle movement is predominantly found in signals from a limited number of scalp electrodes. For example, eye movement and blinking can be detected by using only two electrodes near the eyes, such as the Fp1 and Fp2 channels shown in
It is also possible to combine the signals from one or more electrodes together, and then to compare that combined bio-signal to one or more signatures defining the distinctive signal characteristics of predefined facial muscle movement types. A weighting may be applied to each signal prior to the signal combining operation in order to improve the accuracy of the facial muscle movement detection and classification.
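The weighted combination described above can be sketched as follows. This is a minimal illustration only: the function name `combine_channels`, the sample values and the equal weights are assumptions made for the example and are not taken from the description.

```python
def combine_channels(signals, weights):
    """Return the sample-by-sample weighted sum of equal-length channel signals."""
    if len(signals) != len(weights):
        raise ValueError("one weight per channel is required")
    n = len(signals[0])
    if any(len(s) != n for s in signals):
        raise ValueError("all channels must have the same number of samples")
    return [sum(w * s[i] for w, s in zip(weights, signals)) for i in range(n)]


# Hypothetical samples from two frontal channels, weighted equally.
fp1 = [1.0, 2.0, 3.0]
fp2 = [3.0, 2.0, 1.0]
combined = combine_channels([fp1, fp2], [0.5, 0.5])
```

The combined signal can then be compared to a signature in the same way as a signal from a single electrode.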
In other modes of operation, the apparatus 100 acts to decompose the scalp electrode signals into a series of components and then to compare the projection of the bio-signals from the scalp electrodes onto one or more predetermined component vectors with the mathematical signatures defining the signal characteristics of each type of facial muscle movement.
In this regard, independent component analysis (ICA) has been found to be useful for defining the characteristic forms of the potential function across the entire scalp. Independent component analysis maximizes the degree of statistical independence among outputs using a series of contrast functions. As seen in
Another technique for the decomposition of the bio-signals into components is principal component analysis (PCA) which ensures that output components are uncorrelated. In various embodiments of the invention, either or both of independent component analysis and principal component analysis may be used in order to detect and classify facial muscle movements.
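By way of illustration, the first principal component of a block of multi-channel samples, and the projection of a sample onto it, might be computed as sketched below. Power iteration on the covariance matrix is used purely to keep the example dependency-free; it is an assumed choice, not a technique named in the description, and the function names are hypothetical.

```python
import math


def first_principal_component(samples, iterations=200):
    """Return the unit-length first principal component of `samples`.

    `samples` holds at least two rows, one row per time point and one
    column per channel. Power iteration is an illustrative choice; it
    assumes the starting vector is not orthogonal to the dominant component.
    """
    n, d = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in samples]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance in the data, keep the current vector
        v = [x / norm for x in w]
    return v


def project(sample, component):
    """Project one multi-channel sample onto a component vector."""
    return sum(s * c for s, c in zip(sample, component))


# Hypothetical two-channel data in which channel 0 is twice channel 1.
data = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0], [8.0, 4.0]]
component = first_principal_component(data)
activation = project(data[0], component)
```

The resulting activation is the quantity that would be compared to a signature in place of a raw channel signal.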
In other modes of operation, the apparatus 100 may act to apply a desired Fourier transform to the bio-signals from the scalp electrodes. The transform could alternatively be a wavelet transform or any other suitable signal transformation method. Combinations of one or more different signal transformation methods may also be used. Portions of the bio-signals affected by a predefined type of facial muscle movement may then be identified using a neural network.
Each of the above described techniques for detection and classification of the facial muscle movements may be incorporated into a facial muscle movement detection algorithm stored in the memory of and performed by the digital signal processor 112. Once a particular facial muscle movement detection algorithm has been fully developed, the algorithm may be implemented as a real-time software program or transferred into a digital signal processing or other suitable environment.
As an example of the type of facial muscle movement that can be detected and classified by the apparatus 100, a facial expression algorithm for the detection of an eye blink will now be described. It is to be understood that the general principles described in relation to the algorithm are also applicable to the detection and classification of other types of facial muscle movement, such as winks and eyeball motions. Eye blinks are present in all anterior electrodes but feature most prominently in the two frontal channels Fp1 and Fp2.
In one embodiment of the invention, the predetermined component vectors are identified from historically collected data from a number of subjects and/or across a number of different sessions. As shown in
Independent component analysis is computationally time consuming and is inappropriate in some applications, such as real-time use. Whilst independent component analysis may be used to generate average component vectors for use in the detection and classification of various types of facial muscle movements, the balance of signals across different electrodes varies slightly across different sessions and users.
Accordingly, the average component vectors defined using independent component analysis of historically gathered data may not be optimal during real-time data detection and classification. During real-time operation of the apparatus 100, principal component analysis can be performed on the real-time data and the resulting component vector can be used to update the component vector generated by independent component analysis throughout each session. In this way, the resulting facial muscle movement-detection algorithms can be made robust against electrodes shifting and variances in the strengths of electrode contact.
As can be seen at step 1012, the projection of the historically collected data on the vector component is initially used as a reference in the facial muscle movement detection algorithms 114. However, as data is collected and stored in the data buffer 108 at step 1014, principal component analysis is carried out at step 1016 on the stored data, and the results of the analysis generated at step 1018 are then used to update the component vectors developed during offline independent component analysis.
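The description does not state how the results of the principal component analysis are combined with the offline component vector. One possible update rule, offered only as an assumption, is to blend the stored vector toward the vector obtained from the buffered data and renormalise. The name `update_component_vector` and the blending factor `alpha` are hypothetical.

```python
import math


def update_component_vector(reference, live, alpha=0.1):
    """Blend the stored component vector toward a freshly computed one.

    `reference` is the vector obtained offline, `live` is the vector
    obtained from the buffered real-time data, and `alpha` (assumed,
    between 0 and 1) controls how quickly the reference adapts.
    """
    dot = sum(r * l for r, l in zip(reference, live))
    # The sign of a principal component is arbitrary, so align it first.
    sign = -1.0 if dot < 0 else 1.0
    blended = [(1.0 - alpha) * r + alpha * sign * l
               for r, l in zip(reference, live)]
    norm = math.sqrt(sum(x * x for x in blended))
    if norm == 0.0:
        return list(reference)
    return [x / norm for x in blended]
```

A small `alpha` keeps the vector close to the historically derived one while still following gradual changes such as electrode shift.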
As has been previously described, component vectors can be used in order that a correct weighting is applied to the contribution from the signals of each relevant electrode. An example of an eye-blink component vector is shown in the vector diagram 1100 in
The result of this process for a single eye blink is shown in
Of particular interest are zero-crossing points in the first order derivative signal, which fall into two categories: positive zero-crossing point and negative zero-crossing point. The sign (namely either positive or negative) of the zero-crossing points indicates whether the signal increases or decreases after crossing the axis. For each eye blink, there are two positive zero-crossing points, respectively referenced 1306 and 1308 on
Accordingly, once the zero-crossing points are identified, the algorithm verifies whether there exists a negative zero-crossing point sandwiched between the two positive zero-crossing points, and whether the eye blink peak exceeds an amplitude threshold. A default value of the amplitude threshold is initially set, but to increase the accuracy of the algorithm, the threshold amplitude is optionally adjusted at step 1218 based upon the strength of an individual's eye blink peaks.
In this example, the eye blink “signature” defines the distinctive signal characteristics representative of an eye blink, namely a negative zero crossing sandwiched between two positive zero crossings in the first order derivative of the filtered signal, and a signal amplitude greater than a predetermined threshold in the filtered signal. The signature is optionally updated by changing the threshold forming part of the distinctive signal characteristics of the signature during facial muscle movement detection and classification. In other embodiments, the digital signature may define other amplitudes or signal characteristics that exceed one or more predetermined thresholds. The signature may be updated during facial muscle movement detection and classification by changing one or more of those thresholds. More generally, any one or more distinctive signal characteristics of a predetermined facial muscle movement type that form part of a digital signature can be updated during the course of facial muscle movement detection and classification in order to improve the viability and accuracy of the facial muscle movement detection algorithms implemented by the apparatus 100.
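The eye blink signature just described can be sketched in a few lines. The sketch assumes that the input has already been filtered, that the first order derivative is approximated by the difference between consecutive samples, and that the amplitude test is applied to the sample at the peak. All function names are hypothetical.

```python
def first_derivative(signal):
    """Approximate the first order derivative by consecutive differences."""
    return [b - a for a, b in zip(signal, signal[1:])]


def zero_crossings(derivative):
    """Return (index, sign) pairs for zero crossings of the derivative.

    The sign is +1 where the derivative crosses upward and -1 where it
    crosses downward. A derivative sample equal to zero is treated as
    lying on either side of the axis.
    """
    crossings = []
    for i in range(1, len(derivative)):
        prev, cur = derivative[i - 1], derivative[i]
        if prev <= 0 < cur:
            crossings.append((i, 1))
        elif prev >= 0 > cur:
            crossings.append((i, -1))
    return crossings


def detect_blinks(signal, amplitude_threshold):
    """Return the sample indices of peaks that match the blink signature."""
    crossings = zero_crossings(first_derivative(signal))
    peaks = []
    for (_, s0), (peak, s1), (_, s2) in zip(crossings, crossings[1:], crossings[2:]):
        # Negative crossing sandwiched between two positive crossings,
        # with the signal at the peak above the amplitude threshold.
        if (s0, s1, s2) == (1, -1, 1) and signal[peak] > amplitude_threshold:
            peaks.append(peak)
    return peaks


# Hypothetical filtered signal containing a single peak at index 4.
example = [0, 0, 1, 3, 5, 3, 1, 0, 0, 1]
blink_indices = detect_blinks(example, amplitude_threshold=4)
```

Updating the signature during operation then amounts to passing a different `amplitude_threshold` on later calls.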
The specific channels used to detect and classify various facial expressions may differ according to the particular facial expression in question. In addition to using signals from individual channels or activations of component vectors, the sum or difference of channel pairs may be used in facial muscle movement detection algorithms.
At step 1408, a third order infinite impulse response (IIR) low pass filter is applied at 10 Hz, whilst at step 1410 a first order IIR high pass filter is applied at 0.125 Hz.
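A rough sketch of this filtering stage is given below. Several points are assumptions: the third order low pass filter is built here as a cascade of three identical first order sections (so its response at 10 Hz is not that of, say, a Butterworth design), the sampling rate of 256 Hz is inferred from the statement elsewhere in the description that 64 samples correspond to a quarter of a second, and the function names are hypothetical.

```python
import math


def low_pass_first_order(x, cutoff_hz, fs_hz):
    """First order IIR low pass filter (discretised RC filter)."""
    if not x:
        return []
    dt = 1.0 / fs_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = dt / (rc + dt)
    y, prev = [], x[0]  # start from the first sample to avoid a start-up step
    for sample in x:
        prev = prev + a * (sample - prev)
        y.append(prev)
    return y


def high_pass_first_order(x, cutoff_hz, fs_hz):
    """First order IIR high pass filter (discretised RC filter)."""
    if not x:
        return []
    dt = 1.0 / fs_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = rc / (rc + dt)
    y, prev_x, prev_y = [], x[0], 0.0
    for sample in x:
        prev_y = a * (prev_y + sample - prev_x)
        prev_x = sample
        y.append(prev_y)
    return y


def band_limit(x, fs_hz=256.0):
    """Apply three cascaded low pass sections at 10 Hz, then a high pass at 0.125 Hz."""
    y = x
    for _ in range(3):
        y = low_pass_first_order(y, 10.0, fs_hz)
    return high_pass_first_order(y, 0.125, fs_hz)
```

The low pass stage suppresses fast activity unrelated to the blink waveform, while the high pass stage removes slow baseline drift, so that the derivative taken in the next step reflects the blink itself.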
At step 1412, a first order derivative operation is performed on the sum of bio-signals from channels Fp1 and Fp2. Similar to the aforementioned algorithm 1200, an eye blink peak is tracked by the negative zero-crossing point of the first order derivative. The rise and the fall of an eye blink signal peak are bounded by positive zero-crossing points preceding and following this negative zero-crossing point of the first order derivative, respectively. An assessment is made as to whether the correlation between the filtered signals of channels Fp1 and Fp2 for the window of data bounded by a positive zero-crossing and a negative zero-crossing for the rise of a peak (and vice versa for the fall of a peak) exceeds a first predetermined threshold at step 1414, whether the lesser amplitude of the rise or the fall of an eye blink peak signal from either of the individual channels Fp1 and Fp2 exceeds a second predetermined threshold at step 1416, and whether the maximum gradient determined from the peak or trough of the first order derivative of the summed signals from channels Fp1 and Fp2 exceeds a third predetermined threshold at step 1418. If these three values are above their respective thresholds in all cases, then an eye opening or eye closing event is detected at step 1420 (dependent on whether the maximum gradient is positive or negative).
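The three-way test in steps 1414 to 1418 might be sketched for a single data window as follows. The sketch simplifies the description: amplitude is taken as the peak-to-peak displacement of each channel within the window, correlation is the Pearson coefficient, and the function returns only the sign of the steepest gradient, because the description does not state which sign corresponds to eye opening and which to eye closing. All names and threshold values are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat channel carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def classify_window(fp1, fp2, corr_threshold, amp_threshold, grad_threshold):
    """Return +1 or -1 (sign of the steepest gradient) for an event, else 0."""
    correlation = pearson(fp1, fp2)
    # Lesser of the two per-channel displacements within the window.
    amplitude = min(max(fp1) - min(fp1), max(fp2) - min(fp2))
    summed = [x + y for x, y in zip(fp1, fp2)]
    derivative = [b - a for a, b in zip(summed, summed[1:])]
    steepest = max(derivative, key=abs)
    if (correlation > corr_threshold and amplitude > amp_threshold
            and abs(steepest) > grad_threshold):
        return 1 if steepest > 0 else -1
    return 0
```

Requiring all three conditions at once is what distinguishes a genuine eye event, which affects both frontal channels together, from noise confined to one channel.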
Similar algorithms can be used to identify winks, eyeball motions or other related facial muscle expressions. Other algorithms may use different combinations of signal correlation, amplitude displacement and signal gradient measurement, as well as assessing the sum or difference of bio-signals from one or more different channel pairs to those used in the algorithm illustrated in
Other algorithms used to detect and classify facial muscle movements rely upon the determination of signal power upon particular channels, the sum of signal power on one or more pairs of signal channels, the difference of signal power between one or more pairs of channels and/or the ratio of the signal power on one or more channels or one or more channel pairs to the power of signals on one or more other channels or one or more other channel pairs. By using the exemplary algorithm 1500 shown in
At step 1506, a data window is created for each of these channels in which 64 consecutive samples (corresponding to a quarter of a second) are considered. At step 1508, the power of the signal represented by the 64 consecutive samples in the data window for each channel is calculated, and a single power value computed for each channel at step 1510.
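A minimal sketch of the windowed power computation follows. It assumes that "power" means the mean of the squared samples in the window, which the description does not specify, and the function names are hypothetical.

```python
WINDOW_SIZE = 64  # consecutive samples per window, as stated above


def window_power(samples):
    """Mean squared value of the samples (assumed definition of power)."""
    if not samples:
        raise ValueError("window is empty")
    return sum(x * x for x in samples) / len(samples)


def channel_powers(channels, start, size=WINDOW_SIZE):
    """Return one power value per channel for the window starting at `start`.

    `channels` maps a channel name to its list of digitized samples.
    """
    powers = {}
    for name, data in channels.items():
        window = data[start:start + size]
        if len(window) < size:
            raise ValueError("not enough samples for a full window")
        powers[name] = window_power(window)
    return powers
```

Calling `channel_powers` repeatedly with `start` advanced by 64 yields the per-window power values used in the steps that follow.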
In order to determine whether a smile is present in the EEG data sample in step 1502, at step 1512 the ratio of the power on channel pair T7 and T8 to the average of the power on channel pair AF3 and AF4 and channel pair FC5 and FC6 is computed. At step 1514, a low pass filter is applied to the computation carried out at step 1512. At step 1516, a determination is made as to whether the ratio computed at step 1512 exceeds a predetermined threshold indicative of a smile. If this is the case, then a smile is detected at step 1518, or otherwise step 1506 is performed again with the next 64 data samples for each of the channels extracted at step 1504.
Whilst step 1512 is being computed in order to determine whether a smile is present in the bio-signal data, at step 1522 the average power on channel pair FC5 and FC6 is being computed in order to determine whether a clench is present in the bio-signals. If it is determined at step 1526 that this computed average power exceeds a predetermined threshold indicative of a clench, then a clench is detected at step 1528. If the average power computed in step 1522 does not exceed the predetermined threshold indicative of a clench at step 1526, then the power ratio computed at step 1512 is compared to the threshold indicative of a smile in order to determine whether a smile is present in the bio-signals.
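The smile and clench decisions can be sketched together as shown below, taking one power value per channel as input. The sketch assumes that the power on a channel pair is the mean of the two channel powers and omits the low pass filtering of the ratio at step 1514. The function name and threshold values are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def classify_expression(power, smile_ratio_threshold, clench_power_threshold):
    """Return "clench", "smile" or None from per-channel power values.

    `power` maps the channel names T7, T8, AF3, AF4, FC5 and FC6 to the
    power computed for the current data window.
    """
    clench_power = mean([power["FC5"], power["FC6"]])
    if clench_power > clench_power_threshold:
        return "clench"
    # Average of the AF3/AF4 pair power and the FC5/FC6 pair power.
    reference = mean([mean([power["AF3"], power["AF4"]]), clench_power])
    if reference == 0:
        return None  # ratio undefined, so no smile decision is made
    ratio = mean([power["T7"], power["T8"]]) / reference
    return "smile" if ratio > smile_ratio_threshold else None
```

Testing for a clench first mirrors the order described above, in which the smile ratio is consulted only when the clench threshold is not exceeded.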
In order to improve the efficiency of algorithms relying upon signal power detection, a signal power profile can be created as a 32 channel vector using the signal power on all channels. There are several ways of creating a signal power profile for an expression. For example, a combination of principal component analysis, simple statistics such as mean and median, and manual inspection can be used to create the signal power profile. The profile can then be normalized to a unit vector for scaling simplicity. The dot product of the signal power profile and the normalized signal power on all channels can then be used as the signal to identify when a particular facial expression occurs.
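The profile comparison can be sketched as a dot product of two unit vectors. The function names are hypothetical, and the example uses two channels rather than 32 purely to keep the numbers readable.

```python
import math


def unit(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vector]


def profile_score(profile, powers):
    """Dot product of the normalised profile and the normalised channel powers."""
    return sum(p * q for p, q in zip(unit(profile), unit(powers)))


# Hypothetical profile and live power values with the same shape.
score = profile_score([3.0, 4.0], [6.0, 8.0])
```

Because both vectors are normalised and power values are non-negative, the score lies between 0 and 1 and depends on the distribution of power across channels rather than on its overall level.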
In the algorithm described in relation to
Alternative methods of calculating signal power can be used in other embodiments of the invention. These methods may be based upon signal power present in particular frequency bands on channels, or the ratio of power in two different frequency bands or two different channels. These could be different channels on the same band, different bands on the same channel or different bands on different channels.
The channel correlation performed in relation to the algorithm shown in
The facial expression detection algorithms described above can have multiple threshold values associated therewith. These threshold values may not be intuitive to adjust and it is therefore useful to be able to translate these thresholds into one or more intuitive “sensitivity” parameters.
For example, the effect that an eye blink has on the EEG signal from any one or more of the detectors shown in
threshold=thresh_max+(thresh_min−thresh_max)*s
Additionally, the sensitivity may be inferred from any of the individual thresholds based on:
s=(threshold−thresh_max)/(thresh_min−thresh_max)
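The two expressions above translate directly into code. The sketch assumes that the sensitivity s runs from 0 (threshold at thresh_max, least sensitive) to 1 (threshold at thresh_min, most sensitive). The function names are hypothetical.

```python
def threshold_from_sensitivity(s, thresh_min, thresh_max):
    """Map a sensitivity value to a detection threshold."""
    return thresh_max + (thresh_min - thresh_max) * s


def sensitivity_from_threshold(threshold, thresh_min, thresh_max):
    """Recover the sensitivity that corresponds to a given threshold."""
    if thresh_min == thresh_max:
        raise ValueError("thresh_min and thresh_max must differ")
    return (threshold - thresh_max) / (thresh_min - thresh_max)
```

With a single sensitivity value shared across an algorithm, each of its thresholds can be derived from its own minimum and maximum, so a user adjusts one parameter rather than several.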
Due to the variance in musculature of different people, the expressions detected via noise profiling may have very different values. Automatic calibration of these algorithms can be performed to cater for this variability. Calibration can be performed by recording a “neutral” state, defined as anything but the expressions that are being calibrated. Noise values are calculated for this period, the values obtained are sorted and the lower 50% of the values are discarded. The average of the remaining values is then used as a baseline above which we can assume that there is some small amount of expression present.
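The baseline calculation can be sketched as below. The function name is hypothetical, and the treatment of an odd number of values (the middle value is kept) is an assumption.

```python
def neutral_baseline(noise_values):
    """Average of the upper half of the noise values recorded in the neutral state."""
    if not noise_values:
        raise ValueError("no noise values recorded")
    ordered = sorted(noise_values)
    # Discard the lower 50% of the sorted values; for an odd count the
    # middle value is kept.
    upper_half = ordered[len(ordered) // 2:]
    return sum(upper_half) / len(upper_half)


baseline = neutral_baseline([4.0, 1.0, 3.0, 2.0])
```

Discarding the lower half biases the baseline upward, so that ordinary fluctuations during the neutral state are less likely to be read as an expression.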
Calibration of the expression's maximum threshold can then be done by using a multiple of the lower threshold, for example 2 or 3 times the lower threshold. Alternatively, the subject can be asked to perform the expression, and the values obtained during this period can be used to determine the upper end of the facial expression range. Care should be taken, as the perturbations caused by forced expressions may not be as large as naturally occurring expressions, so the maximum value found during such a calibration should be set to 50% of the range.
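Both ways of setting the upper threshold can be sketched as follows. The second function rests on one reading of the text, namely that the largest value seen during a forced expression is placed halfway between the lower and upper thresholds. Other readings are possible, and the function names are hypothetical.

```python
def upper_from_multiple(lower, multiple=2.0):
    """Upper threshold as a fixed multiple of the lower threshold."""
    return lower * multiple


def upper_from_forced_expression(lower, forced_max, fraction=0.5):
    """Upper threshold such that `forced_max` sits at `fraction` of the range.

    With fraction=0.5 the forced-expression maximum is treated as the
    midpoint between the lower and upper thresholds (assumed reading).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return lower + (forced_max - lower) / fraction
```

For example, with a lower threshold of 2.0 and a forced-expression maximum of 6.0, the second function places the upper threshold at 10.0.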
Although the present invention has been discussed in considerable detail with reference to certain preferred embodiments, other embodiments are possible. Therefore, the scope of the appended claims should not be limited to the description of preferred embodiments contained in this disclosure. All references cited herein are incorporated by reference in their entirety.
This application is a continuation-in-part of and claims priority to U.S. application Ser. No. 11/225,598, filed on Sep. 12, 2005, which is incorporated by reference.
Relation | Number | Date | Country
---|---|---|---
Parent | 11225598 | Sep 2005 | US
Child | 11531117 | Sep 2006 | US