Wearable devices to capture the relative location or interaction between a wearer's tongue and teeth and computer implemented interfaces for tongue location recognition or tongue-on-teeth typing.
Computers and computing systems have affected nearly every aspect of modern living. Computers are generally involved in work, recreation, healthcare, transportation, entertainment, household management, etc. As computers have become increasingly vital to daily life, there has been great interest in exploring new ways of interacting with computer systems. For example, significant interest and research have gone into voice-based computer interactions, which allow users to interact with a computer system using only normal human verbal commands.
While these innovations in computer-to-human interfacing have been valuable, there is still significant work needed in identifying and developing novel ways to interact with a computer. In particular, there is interest in exploring human-to-computer interfacing that allows individuals with disabilities to more fully participate within the modern, computer-based society.
The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.
The present invention can comprise systems, methods, and apparatus configured for identifying tongue movement. The systems, methods, and apparatus detect an electroencephalography (“EEG”) signal from an EEG sensor. The EEG sensor is configured to sense the EEG signal generated by a brain in association with a tongue movement. The systems, methods, and apparatus also detect an electromyography (“EMG”) signal from an EMG sensor. The EMG sensor is configured to sense the EMG signal generated by cranial nerve stimulation of muscles associated with the tongue movement. The systems, methods, and apparatus further identify a skin surface deformation (“SKD”) signal from an SKD sensor. The SKD sensor is configured to sense the skin surface deformation caused by the tongue movement. The systems, methods, and apparatus identify the tongue movement based on the EEG signal, the EMG signal, and the SKD signal. The systems, methods, and apparatus then correlate the tongue movement with one of a plurality of tongue location areas.
Additional features and advantages of exemplary implementations of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of such exemplary implementations. The features and advantages of such implementations may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other features will become more fully apparent from the following description and appended claims, or may be learned by the practice of such exemplary implementations as set forth hereinafter.
In order to describe the manner in which the above recited and other advantages and features of the invention can be obtained, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings.
The present invention extends to systems, methods, and apparatus configured for identifying tongue movement. The systems, methods, and apparatus detect an electroencephalography (“EEG”) signal from an EEG sensor. The EEG sensor is configured to sense the EEG signal generated by a brain in association with a tongue movement. The systems, methods, and apparatus also detect an electromyography (“EMG”) signal from an EMG sensor. The EMG sensor is configured to sense the EMG signal generated by cranial nerve stimulation of muscles associated with the tongue movement. The systems, methods, and apparatus further identify a skin surface deformation (“SKD”) signal from an SKD sensor. The SKD sensor is configured to sense the skin surface deformation caused by the tongue movement. The systems, methods, and apparatus identify the tongue movement based on the EEG signal, the EMG signal, and the SKD signal. The systems, methods, and apparatus then correlate the tongue movement with one of a plurality of tongue location areas.
Now referring primarily to
Now referring primarily to
Again, referring primarily to
Again, referring primarily to
Each different tongue movement, location of the tongue in relation to the teeth, or pressing of the tongue on pressing locations on the teeth can trigger different types of biological signals. These biological signals can include electric potentials emanating from brain or muscle cells, or combinations thereof. Examples of biological signals include the following:
Firstly, the electrical activity in the brain generates an EEG signal. A group of neurons, when active, produces an electrical field potential which can be measured on the skin as an EEG signal. Secondly, the muscles involved in the movement of the tongue (9) include, but are not necessarily limited to, one or more of HG, GG, SG, or TV. The electrical activity in response to a nerve's stimulation of the muscle, or the muscle response alone, can be measured on the skin as an EMG signal. Thirdly, during tongue movement, the skin surface can drastically deform at certain locations on the human face. This skin surface deformation can be used to identify tongue location and direction. A skin surface deformation can be measured on the skin as an SKD signal.
Now referring primarily to
Again, referring primarily to
Now referring primarily to
Again, referring primarily to
Now referring primarily to
In particular embodiments, the six sensors can be placed at four locations. One EEG sensor can be placed behind the top of each outer ear to capture the EEG signal. One EMG sensor and one SKD sensor can be co-located behind the bottom of each outer ear to capture the EMG and SKD signals, respectively.
Again, referring primarily to
The wearable device (17) can further include BLUETOOTH® or WI-FI® communication (25) to transmit the EEG, EMG and SKD signals to a computing device (26). The computing device receives the streaming data from all the sensors, analyzes them, and predicts the tongue movement, tongue location or pressing area on the teeth on which the tongue presses. The computing device can include a processor communicatively coupled to a non-transitory memory element containing an executable computer code having one or more modules or functions: (1) Pre-processing, (2) Typing Detector, and (3) Typing Recognizer.
Pre-processing components (27) can be used to filter out environmental noise using notch and band-pass filters. A decomposition analysis (28) can be applied to extract the main structure of the EEG or EMG signals, or combinations thereof, to obtain EEG and EMG data.
A typing detector (29) can be used to detect the tongue movement, tongue location or tongue typing event. A wavelet analysis (30) can be applied to capture the tongue movement or location events, and a Short-Time Fourier Transform (STFT) (31) can be applied to detect tongue typing events.
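By way of a non-limiting sketch, the notch stage of the pre-processing can be illustrated with a second-order (biquad) filter. The coefficient formulas follow the widely used Audio EQ Cookbook form; the 60 Hz power-line frequency, the quality factor, and all function names are assumptions made for illustration and are not taken from the disclosure. The 250 Hz sampling rate matches the rate given for the EEG/EMG board in Example 1.

```python
import math

def notch_coefficients(f0, fs, q=30.0):
    """Biquad notch coefficients (Audio EQ Cookbook form), normalized so a[0] == 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a

def biquad_filter(b, a, samples):
    """Apply a second-order IIR filter (direct form I) to a list of samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def rms(values):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))

FS = 250.0        # sampling rate in Hz
MAINS_HZ = 60.0   # assumed power-line frequency; 50 Hz in many regions

b, a = notch_coefficients(MAINS_HZ, FS)
t = [n / FS for n in range(1000)]
mains = [math.sin(2 * math.pi * MAINS_HZ * ti) for ti in t]
alpha_band = [math.sin(2 * math.pi * 10.0 * ti) for ti in t]

# Compare steady-state levels over the last second, after the filter settles.
mains_rms = rms(biquad_filter(b, a, mains)[-250:])
alpha_rms = rms(biquad_filter(b, a, alpha_band)[-250:])
```

In this sketch the 60 Hz tone is suppressed while a 10 Hz tone passes essentially unchanged. A band-pass stage could be built from the same biquad structure with different coefficients.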
A typing recognizer (32) can be used to recognize which teeth or area the tongue is tapping on. Then, the recognized input can be mapped into a key map to generate the input key and feedback to a user, or wearer of the wearable device (17). As such, disclosed embodiments provide a user with a means to “type” or send commands to a computing device by tapping on different areas of their teeth or mouth region. Such a system may provide tremendous assistance to individuals who are paralyzed and/or otherwise unable to communicate. For example, as shown in
Again, referring primarily to
Bio-signals can have a compact, condensed representation in some domains, a property called sparsity. Low-rank sparse matrix decomposition (low-rank recovery, rank-sparsity incoherence, or similar application) can be used for signal reconstruction in the presence of low signal-to-noise ratios.
Distorted data (noisy EMG or EEG signals) have a sparse distributed representation among the EEG or EMG signals of interest in embodiments of the tongue typing system. The missing data points can be reconstructed by stacking multiple EMG/EEG samples together. If the number of samples is much smaller than the dimension, the observed matrix M is low-rank. Mathematically, M is formulated as the sum of a low-rank cascaded bio-electrical matrix (Lb) and a sparse noise matrix (Sn). Lb and Sn are derived from the given matrix (M) by solving the following constrained, ill-posed problem:
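$$\text{minimize } \|L_b\|_* + \|S_n\|_1 \quad \text{s.t.} \quad M = L_b + S_n$$
where $\|L_b\|_*$ is the nuclear norm of the matrix $L_b$, which serves as a convex surrogate for its rank, and $\|S_n\|_1$ is the L1 norm of $S_n$. $L_b$ and $S_n$ can be recovered by solving this convex optimization problem.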
Because the low-rank computation occurs on a single dimension, rather than on the multiple dimensions encountered in image processing, it yields millisecond response times even when the algorithm is implemented in compute-intensive software such as MATLAB. Robust Principal Component Analysis (RPCA) can be applied to the input channels, including two EEG sensors, two EMG sensors, and two SKD sensors. The EEG and EMG signals can be very weak and require a dedicated signal processing technique that removes the noise while preserving the brain and muscle bio-electric signal activity. The RPCA technique removes most of the high-frequency noise in the EEG and EMG signals.
To extract the EEG and EMG data, the sparse vector representing the atoms and their associated weights for the optimal EEG or EMG signals can be recovered by solving the optimization problem. The EEG and EMG signals can be extracted based on the characteristics of the recovered sparse vector. The signatures of the bio-signals belonging to the same class can be assumed to approximately lie in a low-dimensional subspace. In other words, the EEG signals related to each movement will lie in one subspace, and the EMG signals will lie in another subspace. Every sequence of bio-signal f(x) can be represented using basis functions with Gabor atoms as follows:
$$f(x) = \sum_{i=1}^{N_D} \delta_i g_i$$
where $N_D$ is the number of atoms in the Gabor dictionary, $g_i$ is one of the atoms in the dictionary, and $\delta_i$ is the coefficient corresponding to $g_i$, computed by the matching pursuit (MP) algorithm. In other words, mixed bio-signals are sparse in the Gabor dictionary. In the MP results, the first component includes the main structure of the data and the remaining components present the details of the data.
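As a minimal, non-limiting sketch of the decomposition just described, the following shows greedy matching pursuit over a small dictionary of Gabor atoms. The atom parameters, the dictionary size, and the synthetic signal are hypothetical and chosen only so that the example is self-contained; a practical dictionary would be far larger.

```python
import math

def gabor_atom(length, center, scale, freq):
    """Unit-norm Gabor atom: a Gaussian envelope multiplied by a cosine."""
    atom = [
        math.exp(-0.5 * ((n - center) / scale) ** 2)
        * math.cos(2 * math.pi * freq * (n - center))
        for n in range(length)
    ]
    norm = math.sqrt(sum(v * v for v in atom))
    return [v / norm for v in atom]

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(signal, dictionary, n_iter):
    """Greedy MP: returns [(atom_index, coefficient), ...] and the final residual."""
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        scores = [dot(residual, atom) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda i: abs(scores[i]))
        coeff = scores[best]
        # Subtract the chosen atom's contribution from the residual.
        residual = [r - coeff * g for r, g in zip(residual, dictionary[best])]
        picks.append((best, coeff))
    return picks, residual

# Tiny illustrative dictionary (centers in samples, frequencies in cycles per sample).
N = 128
dictionary = [
    gabor_atom(N, center=c, scale=12.0, freq=f)
    for c in (32, 64, 96)
    for f in (0.02, 0.06, 0.12)
]

# Synthetic signal: a strong low-frequency atom plus a weaker, higher-frequency detail.
signal = [2.0 * g_main + 0.5 * g_detail
          for g_main, g_detail in zip(dictionary[3], dictionary[8])]
picks, residual = matching_pursuit(signal, dictionary, n_iter=2)
residual_norm = math.sqrt(dot(residual, residual))
```

The first pick recovers the dominant low-frequency atom, which corresponds to the main structure in the terminology above, and the second pick recovers the detail.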
In the above equation, the EEG signal falls into the main structure of the signal, at lower frequency. The EMG signal and the noise remaining after RPCA represent the detail structure of the signal. The signal with the EEG component removed then goes through a second analysis that extracts the EMG signal from the noise. The EEG and EMG signal dictionaries can be constructed to ensure that the signal extracted through the matching pursuit implementation keeps only the low-frequency components (EEG, EMG). These main structures can be used to detect the tongue pressing event and to recognize the tongue pressing region on the teeth.
Now referring primarily to
$$\int_{-\infty}^{+\infty} w(t)\,dt = 0$$
The wavelet transform uses a wavelet that satisfies the condition of the dynamic scaling and shifting function, $w_{s,p}$, shown below:
where $w_{s,p}(t)$ is the integrated and integral transformation signal, s is the scale, and p is the shift parameter, which can also be the central location of the wavelet in the time domain. The wavelet can be stretched and translated with flexible windows by adjusting s and p, respectively.
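For illustration only, the zero-mean condition and the roles of the scale s and shift p can be checked numerically. The sketch below assumes the textbook form w_{s,p}(t) = w((t - p)/s)/sqrt(s) and a Ricker ("Mexican hat") mother wavelet; neither the normalization nor the choice of mother wavelet is stated in the text above, and the burst used as input is synthetic.

```python
import math

def mother_wavelet(t):
    """Ricker ("Mexican hat") wavelet, used here only as an example zero-mean wavelet."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def scaled_shifted(t, s, p):
    """Assumed form: w_{s,p}(t) = w((t - p) / s) / sqrt(s), with scale s > 0 and shift p."""
    return mother_wavelet((t - p) / s) / math.sqrt(s)

def integrate(func, lo, hi, steps=20000):
    """Midpoint-rule numerical integral of func over [lo, hi]."""
    dt = (hi - lo) / steps
    return sum(func(lo + (k + 0.5) * dt) for k in range(steps)) * dt

def wavelet_coefficient(samples, fs, s, p):
    """Riemann-sum approximation of the integral of r(t) * w_{s,p}(t) dt."""
    return sum(r * scaled_shifted(n / fs, s, p) for n, r in enumerate(samples)) / fs

# The mother wavelet should integrate to (numerically) zero.
zero_mean = integrate(mother_wavelet, -20.0, 20.0)

# A short burst centered at t = 1.0 s in an otherwise flat 2 s trace sampled at 250 Hz.
FS = 250.0
trace = [math.exp(-0.5 * ((n / FS - 1.0) / 0.05) ** 2) for n in range(500)]
at_event = wavelet_coefficient(trace, FS, s=0.05, p=1.0)
away_from_event = wavelet_coefficient(trace, FS, s=0.05, p=0.3)
```

The coefficient is large when the shift p lines up with the burst and negligible elsewhere, which is the property that lets a wavelet analysis localize movement events in time.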
The wavelet transform of the wirelessly received samples $\tilde{r}(t)$, with transform coefficient W(s,p), is calculated as follows:
where
As illustrated in
where N is the number of samples.
After f is estimated, it can be used to estimate the amplitudes and phases of different signals using the following:
In this way, the system obtains the desired quantities X, Ø, f. The presence of a brainwave signal is observable through a short-time Fourier analysis. This event can be confirmed when the maximum power distribution of the peak frequency falls within the range of 8-12 Hz.
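The 8-12 Hz check can be sketched, under stated assumptions, as a frame-by-frame search for the strongest frequency bin. The example below uses a naive discrete Fourier transform with a rectangular window for brevity, a 250 Hz sampling rate, and synthetic tones; the function names are hypothetical and a practical implementation would normally use an FFT with a tapered window.

```python
import math

def power_spectrum(window, fs):
    """Naive DFT power at each non-negative frequency bin: [(freq_hz, power), ...]."""
    n = len(window)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(window))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(window))
        spectrum.append((k * fs / n, re * re + im * im))
    return spectrum

def peak_in_band(window, fs, lo_hz=8.0, hi_hz=12.0):
    """True when the strongest non-DC bin of this window falls inside [lo_hz, hi_hz]."""
    spectrum = power_spectrum(window, fs)[1:]  # skip the DC bin
    peak_freq, _ = max(spectrum, key=lambda fp: fp[1])
    return lo_hz <= peak_freq <= hi_hz

def short_time_flags(samples, fs, win_len, hop):
    """Slide a window over the samples and flag each frame whose peak is in band."""
    return [
        peak_in_band(samples[start:start + win_len], fs)
        for start in range(0, len(samples) - win_len + 1, hop)
    ]

FS = 250.0
# Two seconds of a 30 Hz tone followed by two seconds of a 10 Hz tone (synthetic data).
samples = [math.sin(2 * math.pi * 30.0 * n / FS) for n in range(500)]
samples += [math.sin(2 * math.pi * 10.0 * n / FS) for n in range(500)]
flags = short_time_flags(samples, FS, win_len=250, hop=250)
```

Only the frames dominated by the 10 Hz tone are flagged, mirroring the confirmation rule described above.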
In particular embodiments, an algorithm can be used to accurately recognize correct tongue-teeth pressing areas (13), as shown in
The resulting data can be sliced into overlapping windows using a Hamming window. Each window then goes through the feature extraction process, where it is convolved with a filter bank to obtain the coefficients as feature vectors. The Mel filter bank or Mel-frequency cepstral coefficients (MFCC) can be used to process the signals. At this point, a matrix of MFCC features (34) can be constructed, where the number of rows corresponds to the number of windows and the columns correspond to the dimension of the MFCC features. Here, the Mel-coefficients and their first and second order derivatives can be applied to extend the feature space. In particular embodiments, the combination of Mel-coefficients, delta, and double delta can be an improvement over using Mel-coefficients alone.
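The windowing and the delta extension can be sketched as follows. This is an illustrative outline only: the Mel filter bank itself is not implemented, two toy numbers per frame stand in for the cepstral coefficients, and the frame length, hop size, and simple central-difference delta are assumptions rather than parameters from the disclosure.

```python
import math

def hamming(length):
    """Hamming window: 0.54 - 0.46 * cos(2*pi*k / (length - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (length - 1)) for k in range(length)]

def overlapping_frames(samples, frame_len, hop):
    """Slice samples into overlapping frames and taper each with a Hamming window."""
    window = hamming(frame_len)
    return [
        [s * w for s, w in zip(samples[start:start + frame_len], window)]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]

def deltas(feature_rows):
    """Central difference between neighboring frames (edge frames are repeated)."""
    padded = [feature_rows[0]] + feature_rows + [feature_rows[-1]]
    return [
        [(nxt - prv) / 2.0 for prv, nxt in zip(padded[i - 1], padded[i + 1])]
        for i in range(1, len(padded) - 1)
    ]

def extend_features(feature_rows):
    """Concatenate static features with delta and double-delta columns."""
    d1 = deltas(feature_rows)
    d2 = deltas(d1)
    return [row + a + b for row, a, b in zip(feature_rows, d1, d2)]

samples = [float(n) for n in range(100)]
frames = overlapping_frames(samples, frame_len=20, hop=10)  # 50 percent overlap
# Stand-in for per-frame cepstral coefficients: two toy numbers per frame.
toy_features = [[sum(f) / len(f), max(f)] for f in frames]
extended = extend_features(toy_features)
```

Each row of `extended` corresponds to one window, and its columns hold the static, delta, and double-delta values, which matches the row and column layout of the feature matrix described above.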
Each sample can be represented by a set of MFCC features, and the distribution of these feature points can be used to estimate the Gaussian Mixture Model (GMM) and extract the mean vector for the final descriptor representation. Estimating the GMM from a set of data points can be achieved by 1) initializing the parameters of the distribution, and 2) using the Expectation Maximization (EM) algorithm to adjust the model to fit the feature points. In this respect, random initialization may not be effective for the EM algorithm to converge to an optimal point, especially in the context of using bio-electrical signals. In some embodiments, a Universal Background Model (UBM) can be used for the initialization step.
The UBM is itself a GMM, but one trained on the entire dataset. Therefore, using this model for initialization may help to generalize the problem characteristics, and thus help the GMM adaptation to converge quickly. The process of obtaining the GMM based on the UBM can be summarized in two stages: 1) using the EM algorithm on the entire dataset of samples to obtain the UBM, and 2) using Maximum a Posteriori (MAP) estimation to fit the UBM to the feature points of each sample to create the specific GMM for that particular sample. The above procedure can be applied separately to the data resulting from each of the sensors (in the illustrative example six sensors generate six channels of data for one tongue posture). The use of multiple sensors to produce multiple discrete channels can significantly improve performance when the kernels of six discrete channels are averaged; however, this is not intended to preclude embodiments having fewer than six discrete channels.
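The second stage can be sketched with a mean-only MAP adaptation of a one-dimensional GMM. This is a simplified, non-limiting illustration: the disclosure does not state which parameters are adapted, so adapting only the means and using a relevance factor of 16 are assumptions, and the UBM parameters and feature points below are made up for the example.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def map_adapt_means(points, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a 1-D GMM (the UBM) to one sample's feature points."""
    k = len(means)
    soft_counts = [0.0] * k
    soft_sums = [0.0] * k
    for x in points:
        likes = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        for j in range(k):
            resp = likes[j] / total          # posterior of component j given x
            soft_counts[j] += resp
            soft_sums[j] += resp * x
    adapted = []
    for j in range(k):
        if soft_counts[j] == 0.0:
            adapted.append(means[j])         # no data assigned: keep the UBM mean
            continue
        data_mean = soft_sums[j] / soft_counts[j]
        alpha = soft_counts[j] / (soft_counts[j] + relevance)
        # Interpolate between the data mean and the UBM mean.
        adapted.append(alpha * data_mean + (1.0 - alpha) * means[j])
    return adapted

# Hypothetical two-component UBM and one sample whose points sit near component 0.
ubm_weights = [0.5, 0.5]
ubm_means = [0.0, 10.0]
ubm_variances = [1.0, 1.0]
sample_points = [1.0] * 48
adapted_means = map_adapt_means(sample_points, ubm_weights, ubm_means, ubm_variances)
```

The component that explains the sample's points moves toward them, while the other component stays at its UBM value; the list of adapted means plays the role of the mean-vector descriptor mentioned above.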
Finally, the data can be classified by a Support Vector Machine with different kernels. In particular embodiments, three basic kernels can be utilized: linear, cosine and RBF. The above classification algorithm provides identifiable “tongue pressing areas” which can be distinguished from one another. In particular embodiments, a localization algorithm can be applied to continuously track the tongue pressing locations even at untrained areas. Different teeth require different activation muscles as well as brain activity. The localization algorithm builds a regression function that correlates the input bio-signal to an output x,y,z coordinate of each tooth or tongue pressing area.
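The three kernels named above, and the averaging of kernels across channels, can be sketched as follows. The descriptor values, the RBF width `gamma`, and the use of two channels instead of six are placeholders chosen to keep the example short. The averaged Gram matrix is the kind of input that an SVM implementation accepting a precomputed kernel could consume.

```python
import math

def linear_kernel(u, v):
    """Inner product of two descriptors."""
    return sum(a * b for a, b in zip(u, v))

def cosine_kernel(u, v):
    """Inner product normalized by the descriptor lengths."""
    norm = math.sqrt(linear_kernel(u, u) * linear_kernel(v, v))
    return linear_kernel(u, v) / norm if norm else 0.0

def rbf_kernel(u, v, gamma=0.5):
    """Radial basis function kernel; gamma is an assumed width parameter."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def gram_matrix(descriptors, kernel):
    """Pairwise kernel values for one channel's descriptors."""
    return [[kernel(u, v) for v in descriptors] for u in descriptors]

def average_kernels(per_channel_descriptors, kernel):
    """Average the per-channel Gram matrices element by element."""
    grams = [gram_matrix(d, kernel) for d in per_channel_descriptors]
    n = len(grams[0])
    return [[sum(g[i][j] for g in grams) / len(grams) for j in range(n)] for i in range(n)]

# Two hypothetical channels, each holding descriptors for the same three samples.
channels = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[2.0, 1.0], [1.0, 2.0], [0.5, 0.5]],
]
avg_rbf = average_kernels(channels, rbf_kernel)
avg_cos = average_kernels(channels, cosine_kernel)
```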
Now referring primarily to
Based on these relationships, a regression model can approximate the x,y,z coordinate of a certain tongue pressing location based on the input from the featured signal. Because the GMM has 42 dimensions in total, direct mapping to a three-dimensional location can require an intermediate step that transforms the high-dimensional GMM feature into a three-dimensional coordinate. In particular embodiments, the informative level of each feature dimension can be compared to select the best three representative ones. Principal Components Analysis (PCA) can be used to select the coordinates that represent the data; in particular embodiments, the top three coordinates can be utilized.
For example, the full feature set may be used to extract the coefficient matrix of the PCA. Then each feature vector can be multiplied with the first three columns of the coefficient matrix to project it onto the three-dimensional space, constructing a reference projection between feature coordinates and real-world coordinates for regression analysis. A linear regression model is then applied to interpolate the relation between the ground truth data and the mapped feature location. A resulting non-linear regression model can be approximated by a linear model using Taylor's theorem. The regression model can be applied to a small fixed set of features from the original data and not to raw data.
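A minimal sketch of the projection and regression steps is given below. It assumes the PCA coefficient matrix has already been computed and the features are already mean-centered, and it fits each coordinate axis independently with a simple least-squares line. The 4-dimensional features, the coefficient matrix, and the ground-truth coordinates are all synthetic placeholders, not data from the disclosure.

```python
def project(feature, coeff, dims=3):
    """Multiply a feature vector by the first `dims` columns of the coefficient matrix."""
    return [sum(f * row[c] for f, row in zip(feature, coeff)) for c in range(dims)]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical 4x4 coefficient matrix (columns are components) and 4-D features.
COEFF = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
features = [
    [1.0, 2.0, 0.5, 3.0],
    [2.0, 0.0, 1.5, 1.0],
    [0.0, 1.0, 2.5, 2.0],
    [3.0, 4.0, 0.0, 0.5],
]
projected = [project(f, COEFF) for f in features]
# Synthetic ground-truth coordinates, generated here as 2 * projection + 1 on every axis.
ground_truth = [[2.0 * v + 1.0 for v in p] for p in projected]
models = [
    fit_line([p[axis] for p in projected], [g[axis] for g in ground_truth])
    for axis in range(3)
]

def locate(feature):
    """Map a feature vector to an estimated x, y, z coordinate."""
    return [slope * v + intercept
            for v, (slope, intercept) in zip(project(feature, COEFF), models)]

estimate = locate(features[0])
```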
Now referring primarily to
In the illustrative embodiment of
Now referring primarily to
Again, referring primarily to
A capacitance exists whenever two electrodes are separated from each other by a distance d. In the illustrative example, the copper tape can be separated from the human skin by about 1 mm of soft, deformable silicone (Ecoflex 00-10 from Smooth-On). At the bottom electrodes on each ear, there can be another electrode on the other side of the flexible form to capture the skin surface deformation caused by tongue movements. The tongue movement changes the distance between the two sides of the flexible form. In particular embodiments, the movement can be captured using a piezoelectric sensor or an accelerometer. However, in preferred embodiments, capacitive sensing can be used to capture the minuscule movement of the skin caused by the tongue behavior.
In the example, the distance changes between the two electrodes, one on the skin and the other on the wearable device, are measured. At a stable condition, the capacitance created by two metal plates can be calculated as $C = \varepsilon_0 \varepsilon_r A / d$, where C is the capacitance in farads, A is the plate area in square meters, d is the distance between the two plates in meters, and $\varepsilon$ is the permittivity, which is the product of the permittivity of free space, $\varepsilon_0$, and the relative dielectric constant of the material, $\varepsilon_r$.
When the tongue movement occurs, the skin surface deforms, and the flexible material between the two copper plates changes their area and separation. This generates a change in capacitance which can be measured by the capacitive sensor. Now referring primarily to
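As a worked illustration of the parallel-plate relation, the following computes the capacitance at rest and after the gap closes slightly. The 1 mm gap follows the illustrative example; the electrode area, the relative dielectric constant, and the 0.1 mm compression are placeholder values, not measurements from the disclosure.

```python
EPSILON_0 = 8.8541878128e-12  # permittivity of free space, in farads per meter

def plate_capacitance(area_m2, distance_m, relative_permittivity):
    """Parallel-plate capacitance C = eps0 * eps_r * A / d, in farads."""
    return EPSILON_0 * relative_permittivity * area_m2 / distance_m

AREA = 1.0e-4    # hypothetical 1 cm^2 electrode
REL_PERM = 3.0   # placeholder relative dielectric constant, not a measured value

rest = plate_capacitance(AREA, 1.0e-3, REL_PERM)      # 1 mm gap at rest
pressed = plate_capacitance(AREA, 0.9e-3, REL_PERM)   # gap compressed by 0.1 mm
delta_pf = (pressed - rest) * 1e12                    # change in picofarads
```

Under these assumed values the capacitance is on the order of a few picofarads, and a 10 percent reduction in the gap raises it by about 11 percent, since C is inversely proportional to d.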
Turning now to some illustrative, but non-limiting examples:
EXAMPLE 1: A particular embodiment can be practiced using an OpenBCI board for EEG and EMG data collection. To measure the capacitance variation created by the skin surface deformation, an MSP430FR5969 module can be used. Both devices can communicate with a Lenovo ThinkPad T570 laptop through a Bluetooth Low Energy device at a 115200 baud rate. The OpenBCI board can be sampled at a sampling rate of up to about 250 Hz, and the MSP430FR5969 can be sampled at about 10 Hz. The data from the OpenBCI board can be streamed to a laptop computer through the Lab Streaming Layer (LSL) network protocol, written in Python. The pre-processing and algorithms are implemented in MATLAB R2017b. The MATLAB and Python data are exchanged using a basic TCP protocol. The signal de-noising, extraction, classification (SVM, GMM), and localization algorithms can be implemented in MATLAB.
EXAMPLE 2: To evaluate the performance of disclosed systems, an experiment was conducted with fifteen participants in a normal office environment. The participants' demographic is summarized in Table 1.
In the experimental setup, the wearer was fitted with the wearable device shown in
After the wearable device was correctly positioned on the wearer, the user sat in front of a monitor that instructed him on when to perform a gesture and when to rest his tongue. Each gesture took the wearer approximately three seconds to perform. Additionally, each wearer was asked to perform ten gestures, twenty times for each gesture.
The data collected from fifteen participants was used to evaluate the system. Each wearer performed ten typing gestures and one resting gesture. Each gesture was repeated twenty times. As such, there were three thousand three hundred total samples taken. Each epoch contained a matrix of six columns representing the signals from six sensors (two EEG, two EMG, and two SKD). Seventy-five percent of the data was used for training and the remaining twenty-five percent of the data was used for testing. The results depicted are the average accuracy for the whole data set collected.
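The sample count and the split sizes follow directly from the figures above, as the short sketch below confirms. The shuffled, seeded split is an assumption made for illustration; the text does not state how samples were assigned to the training and testing sets.

```python
import random

PARTICIPANTS = 15
GESTURES = 10 + 1      # ten typing gestures plus one resting gesture
REPETITIONS = 20

def split_indices(n_samples, train_fraction=0.75, seed=0):
    """Shuffle sample indices with a fixed seed and cut them into train and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(n_samples * train_fraction)
    return indices[:cut], indices[cut:]

total_samples = PARTICIPANTS * GESTURES * REPETITIONS
train_idx, test_idx = split_indices(total_samples)
```

With these figures, 15 x 11 x 20 gives 3,300 samples, of which 2,475 fall in the training set and 825 in the testing set.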
Now referring again to
Now referring primarily to
Now referring primarily to
Additionally, method 50 includes an act 56 of identifying a tongue movement. Act 56 includes identifying the tongue movement based on the EEG signal and the EMG signal. Further, method 50 includes an act 58 of correlating the tongue movement. Act 58 comprises correlating the tongue movement with one of a plurality of tongue location areas. For example,
In addition, as to each term used it should be understood that unless its utilization in this application is inconsistent with such interpretation, common dictionary definitions should be understood to be included in the description for each term as contained in the Random House Webster's Unabridged Dictionary, second edition, each definition hereby incorporated by reference.
All numeric values herein are assumed to be modified by the term “about”, whether or not explicitly indicated. For the purposes of the present invention, ranges may be expressed as from “about” one particular value to “about” another particular value. When such a range is expressed, another embodiment includes from the one particular value to the other particular value. The recitation of numerical ranges by endpoints includes all the numeric values subsumed within that range. A numerical range of one to five includes for example the numeric values 1, 1.5, 2, 2.75, 3, 3.80, 4, 5, and so forth. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint. When a value is expressed as an approximation by use of the antecedent “about,” it will be understood that the particular value forms another embodiment. The term “about” generally refers to a range of numeric values that one of skill in the art would consider equivalent to the recited numeric value or having the same function or result. Similarly, the antecedent “substantially” means largely, but not wholly, the same form, manner or degree and the particular element will have a range of configurations as a person of ordinary skill in the art would consider as having the same function or result. When a particular element is expressed as an approximation by use of the antecedent “substantially,” it will be understood that the particular element forms another embodiment.
Moreover, for the purposes of the present invention, the term “a” or “an” entity refers to one or more of that entity unless otherwise limited. As such, the terms “a” or “an”, “one or more” and “at least one” can be used interchangeably herein.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the described features or acts described above, or the order of the acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.
The present invention may comprise or utilize a special-purpose or general-purpose computer system that includes computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below. Embodiments within the scope of the present invention also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general-purpose or special-purpose computer system. Computer-readable media that store computer-executable instructions and/or data structures are computer storage media. Computer-readable media that carry computer-executable instructions and/or data structures are transmission media. Thus, by way of example, and not limitation, embodiments of the invention can comprise at least two distinctly different kinds of computer-readable media: computer storage media and transmission media.
Computer storage media are physical storage media that store computer-executable instructions and/or data structures. Physical storage media include computer hardware, such as RAM, ROM, EEPROM, solid state drives (“SSDs”), flash memory, phase-change memory (“PCM”), optical disk storage, magnetic disk storage or other magnetic storage devices, or any other hardware storage device(s) which can be used to store program code in the form of computer-executable instructions or data structures, which can be accessed and executed by a general-purpose or special-purpose computer system to implement the disclosed functionality of the invention.
Transmission media can include a network and/or data links which can be used to carry program code in the form of computer-executable instructions or data structures, and which can be accessed by a general-purpose or special-purpose computer system. A “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer system, the computer system may view the connection as transmission media. Combinations of the above should also be included within the scope of computer-readable media.
Further, upon reaching various computer system components, program code in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to computer storage media (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a “NIC”), and then eventually transferred to computer system RAM and/or to less volatile computer storage media at a computer system. Thus, it should be understood that computer storage media can be included in computer system components that also (or even primarily) utilize transmission media.
Computer-executable instructions comprise, for example, instructions and data which, when executed at one or more processors, cause a general-purpose computer system, special-purpose computer system, or special-purpose processing device to perform a certain function or group of functions. Computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code.
Those skilled in the art will appreciate that the invention may be practiced in network computing environments with many types of computer system configurations, including, personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, and the like. The invention may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. As such, in a distributed system environment, a computer system may include a plurality of constituent computer systems. In a distributed system environment, program modules may be located in both local and remote memory storage devices.
Those skilled in the art will also appreciate that the invention may be practiced in a cloud-computing environment. Cloud computing environments may be distributed, although this is not required. When distributed, cloud computing environments may be distributed internationally within an organization and/or have components possessed across multiple organizations. In this description and the following claims, “cloud computing” is defined as a model for enabling on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services). The definition of “cloud computing” is not limited to any of the other numerous advantages that can be obtained from such a model when properly deployed.
A cloud-computing model can be composed of various characteristics, such as on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, and so forth. A cloud-computing model may also come in the form of various service models such as, for example, Software as a Service (“SaaS”), Platform as a Service (“PaaS”), and Infrastructure as a Service (“IaaS”). The cloud-computing model may also be deployed using different deployment models such as private cloud, community cloud, public cloud, hybrid cloud, and so forth.
Some embodiments, such as a cloud-computing environment, may comprise a system that includes one or more hosts that are each capable of running one or more virtual machines. During operation, virtual machines emulate an operational computing system, supporting an operating system and perhaps one or more other applications as well. In some embodiments, each host includes a hypervisor that emulates virtual resources for the virtual machines using physical resources that are abstracted from view of the virtual machines. The hypervisor also provides proper isolation between the virtual machines. Thus, from the perspective of any given virtual machine, the hypervisor provides the illusion that the virtual machine is interfacing with a physical resource, even though the virtual machine only interfaces with the appearance (e.g., a virtual resource) of a physical resource. Examples of physical resources include processing capacity, memory, disk space, network bandwidth, media drives, and so forth.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.
This application claims the benefit of and priority to U.S. Provisional Patent Application Ser. No. 62/849,325 filed on 17 May 2019 and entitled “Tongue Localization, Teeth Interaction, And Detection System,” which application is expressly incorporated herein by reference in its entirety.
This invention was made with government support under grant number 1602428 awarded by the National Science Foundation. The government has certain rights in the invention.
Number | Date | Country
---|---|---
62849325 | May 2019 | US