1. Field of the Invention
The present invention generally relates to a method and apparatus for classifying sleep recordings, and more particularly to a method and apparatus for classifying electroencephalogram sleep recordings into sleep stages using conditional random fields and subject adaptation.
2. Description of the Related Art
Sleep is indispensable to everybody. About one-third of Americans exhibit some kind of sleep problem. Hence, the study of sleep patterns, much of which is through sleep recordings, has consistently been an important research topic.
A typical sleep recording has one or more channels of electroencephalogram (EEG) waves coming from electrodes. Sleep staging is the pattern recognition task of continuously classifying sleep recordings into sleep stages (e.g., wake, sleep). This task is crucial for the diagnosis and treatment of various sleep disorders. In addition, it relates closely to brain-machine interfaces, where successful classification can help disabled people control computers. Sleep staging is also of special interest to the study of the avian song system and the evolutionary theory of mammalian sleep.
Many statistical pattern recognition methods, such as autoregression and the hidden Markov model (HMM), have been used to build automatic, online sleep stagers. Despite these efforts, existing sleep stagers achieve an average classification accuracy below 80%, which is insufficient for physicians to diagnose sleep disorders correctly. (In brain-computer interfaces, incorrect EEG wave classification can cause computers to receive wrong instructions.)
In view of the foregoing and other exemplary problems, drawbacks, and disadvantages of the conventional methods and structures, an exemplary feature of the present invention is to provide a method and system that can continuously classify electroencephalogram sleep recordings into sleep stages (e.g., wake, sleep) with improved classification accuracy.
In a first aspect of the present invention, a method of subject-adaptive, real-time sleep stage classification to classify electroencephalogram sleep recordings into sleep stages to determine whether a subject exhibits a sleep disorder includes performing subject adaptation to improve classification accuracy for a new subject with limited training data, the performing subject adaptation comprises using linear-chain conditional random fields and potential functions, training the linear-chain conditional random fields using the training data, continuously receiving the electroencephalogram waves, continuously extracting features from the electroencephalogram waves, the extracting features comprising transforming each of the electroencephalogram waves to capture information embedded in the electroencephalogram waves, and continuously classifying the sleep stages according to extracted features and learned parameters from the linear-chain conditional random fields. The performing subject adaptation includes using training data from a plurality of old subjects to obtain a prior distribution of conditional random field parameters, using the prior distribution of conditional random field parameters in combination with training data for at least one new subject to obtain a regulated estimate of conditional random field parameters, and using the regulated estimate of conditional random field parameters to classify the electroencephalogram waves of the at least one new subject to determine whether a subject exhibits a sleep disorder.
The present method (and system) provides an automatic, online sleep stager based on a recently developed statistical pattern recognition method, conditional random field (CRF), and novel potential functions that have explicit physical meanings. The sleep stager's classification accuracy is much higher than that of existing methods.
One challenge for sleep staging is that, in practice, there is often enough training data $D_{\text{old}}$ from several old subjects $s_{\text{old}}$ but very limited training data $D_{\text{new}}$ from a new subject $s_{\text{new}}$, as it often takes several days or weeks to label sufficient $D_{\text{new}}$ for $s_{\text{new}}$ manually. In this case, it is undesirable to train the parameter vector $\Theta$ of the CRF using only $D_{\text{new}}$.
The present invention, however, may perform subject adaptation to improve the classification accuracy on $s_{\text{new}}$. The present invention uses the knowledge of $\Theta$ that is learned from $D_{\text{old}}$ to obtain a regulated estimate of $\Theta$ from $D_{\text{new}}$. In this way, the classification accuracy on $s_{\text{new}}$ increases with the size of $D_{\text{new}}$ and eventually becomes close to the theoretical limit. In particular, even without any $D_{\text{new}}$, the average accuracy on $s_{\text{new}}$ can be quite good.
CRF was originally proposed by the natural language processing community in 2001 and has been successfully applied to pattern recognition tasks in computer vision. In contrast to HMM, CRF directly models the probabilities of possible label sequences given an observation sequence, without making unnecessary independence assumptions on the observation elements. Consequently, CRF overcomes HMM's shortcoming of being unable to represent multiple interacting features or long-range dependencies among the observation elements. Neither the application of CRF nor subject adaptation has been studied before in EEG wave classification.
The foregoing and other exemplary purposes, aspects and advantages will be better understood from the following detailed description of an exemplary embodiment of the invention with reference to the drawings, in which:
Referring now to the drawings, and more particularly to
As indicated above, the present invention uses the concept of CRF. Let X be the observation sequence, and Y be the corresponding label (state) sequence. The CRF is defined (and exemplarily illustrated in
Definition. Let $G=(V, E)$ be a graph such that $Y=(y_v)_{v \in V}$, so that $Y$ is indexed by the vertices of $G$. Then $(X, Y)$ is a conditional random field in case, when conditioned on $X$, the random variables $y_v$ obey the Markov property with respect to the graph: $P(y_v \mid X, y_w, w \neq v) = P(y_v \mid X, y_w, w \sim v)$, where $w \sim v$ means that $w$ and $v$ are neighbors in $G$.
A special case of CRF is the linear-chain CRF (LCRF), where the graph $G$ is a linear chain so that each $y_i$ has exactly two neighbors: $y_{i-1}$ and $y_{i+1}$. In this case, the distribution of the label sequence $Y$ given the observation sequence $X$ has the following form:
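$$p(Y \mid X) \propto \exp\left\{ \sum_{i=1}^{n} \left[ \sum_{j=1}^{k_1} \lambda_j f_j(y_{i-1}, y_i, X, i) + \sum_{j=1}^{k_2} \mu_j g_j(y_i, X, i) \right] \right\}. \tag{1}$$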
Here, $f_j$ and $g_j$ are called potential functions, and $\lambda_j$ and $\mu_j$ are parameters. The selection of appropriate potential functions is both application-dependent and critical to the success of the CRF method.
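As a minimal sketch of how a linear-chain CRF assigns a probability to a label sequence, the following Python example scores a sequence as a weighted sum of potential functions and normalizes by brute-force enumeration. The specific potentials here (one transition weight per pair of adjacent labels, and one weight per label multiplied by a scalar observation) and the names `log_score`, `conditional_prob`, `trans_w`, and `state_w` are illustrative assumptions, not the potential functions of the invention. Enumeration is feasible only for very short sequences.

```python
import itertools
import math


def log_score(labels, obs, trans_w, state_w):
    """Unnormalized log-score of a label sequence under a toy linear-chain CRF.

    trans_w[a][b] plays the role of lambda for the pair (y_{i-1}=a, y_i=b);
    state_w[s] plays the role of mu for label s, multiplied by the scalar
    observation at that time point. Both are assumptions for illustration.
    """
    score = 0.0
    for i, (y, x) in enumerate(zip(labels, obs)):
        if i > 0:
            score += trans_w[labels[i - 1]][y]
        score += state_w[y] * x
    return score


def conditional_prob(labels, obs, trans_w, state_w):
    """p(Y | X), normalized by summing over every possible label sequence."""
    n_states = len(state_w)
    z = sum(
        math.exp(log_score(candidate, obs, trans_w, state_w))
        for candidate in itertools.product(range(n_states), repeat=len(obs))
    )
    return math.exp(log_score(labels, obs, trans_w, state_w)) / z


if __name__ == "__main__":
    obs = [1.0, 0.5, 2.0]
    trans_w = [[0.5, -0.5], [-0.5, 0.5]]  # staying in a stage is favored
    state_w = [-1.0, 1.0]                 # label 1 is favored by positive observations
    print(conditional_prob([1, 1, 1], obs, trans_w, state_w))
```

In practice the normalizer would be computed by dynamic programming rather than enumeration; the enumeration is used here only to make the definition concrete.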
The sleep stager of the present invention uses linear-chain CRFs. In this case, X=(
The sleep stager of the present invention uses the following two kinds of potential functions, the first for $f_j$ and the second for $g_j$:
1y
1y
Here, the indicator function
the number of potential functions is $k=|S|^2+|S|m$. Local features are often the most important ones. Hence, at any time point $i$ ($1 \le i \le n$), the sleep stager focuses on the local observation elements and only considers the first-order term
Given the $k=k_1+k_2$ potential functions, parameter estimation (i.e., learning the $\lambda_j$'s and $\mu_j$'s from a labeled training data set) and inference (i.e., given $X$, computing the most likely $Y$) in the CRF are performed using the forward-backward dynamic programming and Viterbi algorithms.
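The inference step can be illustrated with a short Viterbi sketch in Python. It assumes a toy scoring model (a transition weight for each pair of adjacent labels plus a per-label weight multiplied by a scalar observation); the function name `viterbi` and the parameters `trans_w` and `state_w` are hypothetical and chosen only for illustration.

```python
def viterbi(obs, trans_w, state_w):
    """Return the highest-scoring label sequence for a toy linear-chain model.

    The score of a sequence is the sum of trans_w[y_{i-1}][y_i] over adjacent
    labels plus state_w[y_i] * obs[i] over all time points (an assumed form).
    """
    n_states = len(state_w)
    # delta[s] = best score of any label prefix that ends in state s
    delta = [state_w[s] * obs[0] for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for x in obs[1:]:
        prev = delta
        pointers = []
        delta = []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda r: prev[r] + trans_w[r][s])
            pointers.append(best_prev)
            delta.append(prev[best_prev] + trans_w[best_prev][s] + state_w[s] * x)
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    trans_w = [[0.5, -0.5], [-0.5, 0.5]]  # switching stages is penalized
    state_w = [-1.0, 1.0]
    # The third observation alone would favor label 0, but the transition
    # weights keep the decoded sequence in label 1 throughout.
    print(viterbi([2.0, 1.5, -0.2, 1.0], trans_w, state_w))  # [1, 1, 1, 1]
```

The example shows why sequence-level decoding differs from classifying each time point independently: an isolated, weakly contrary observation does not flip the decoded stage.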
The present invention uses a method of subject adaptation. This technique combines the (usually sufficient) training data sequence $(X_{\text{old}}, Y_{\text{old}})$ from several old subjects $s_{\text{old}}$ with the (possibly insufficient) training data sequence $(X_{\text{new}}, Y_{\text{new}})$ from a new subject $s_{\text{new}}$ to improve the classification accuracy on $s_{\text{new}}$. Let $\Theta$ be the column parameter vector of the CRF that contains the $\lambda_j$'s and $\mu_j$'s. $L_{\text{old}}(\Theta)=\ln p(Y_{\text{old}} \mid X_{\text{old}}, \Theta)$ and $L_{\text{new}}(\Theta)=\ln p(Y_{\text{new}} \mid X_{\text{new}}, \Theta)$ are the log-likelihood functions for $s_{\text{old}}$ and $s_{\text{new}}$, respectively. Let $\hat{\Theta}$ denote the maximum-likelihood estimator (MLE) of $\Theta$ on $s_{\text{old}}$. A theorem about MLE asserts that $\hat{\Theta}$ asymptotically follows a normal distribution, whose mean vector and covariance matrix are $\Theta$ and $\Sigma=-(\nabla^2 L_{\text{old}})^{-1}$, respectively. Here, $\nabla^2 L_{\text{old}}$ is the Hessian matrix of $L_{\text{old}}(\Theta)$. This can be viewed as a prior of $\Theta$ when we fit the same model to $s_{\text{new}}$. The corresponding probability density function is
$$p(\Theta) \propto \exp\left\{-(\Theta-\hat{\Theta})^T \Sigma^{-1} (\Theta-\hat{\Theta})/2\right\} = \exp\left\{(\Theta-\hat{\Theta})^T \nabla^2 L_{\text{old}} (\Theta-\hat{\Theta})/2\right\}. \tag{2}$$
From Bayes' theorem, the posterior distribution of Θ is
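$$p(\Theta \mid X_{\text{new}}, Y_{\text{new}}) \propto p(Y_{\text{new}} \mid X_{\text{new}}, \Theta)\, p(\Theta) \propto \exp\left\{L_{\text{new}}(\Theta) + (\Theta-\hat{\Theta})^T \nabla^2 L_{\text{old}} (\Theta-\hat{\Theta})/2\right\}. \tag{3}$$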
The gradient of $L_{\text{old}}(\Theta)$, $\nabla L_{\text{old}}$, can be efficiently computed using a forward-backward dynamic programming method. $\nabla^2 L_{\text{old}}$ can be computed numerically by taking difference quotients of $\nabla L_{\text{old}}$. One can then obtain the point estimate of $\Theta$ for $s_{\text{new}}$ by maximizing $L_{\text{new}}(\Theta)+(\Theta-\hat{\Theta})^T \nabla^2 L_{\text{old}} (\Theta-\hat{\Theta})/2$ (e.g., using the BFGS method).
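The adaptation procedure can be sketched on a deliberately simplified model. The Python example below replaces the CRF with a one-parameter logistic model (an assumption made purely to keep the example short) and uses Newton's method in place of BFGS. It fits the old-subject estimate, approximates the Hessian by a central difference quotient of the gradient, and then maximizes the new-subject log-likelihood plus the quadratic prior term. All function names and data values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_loglik(theta, xs, ys):
    """Gradient of the toy log-likelihood sum_i [y_i*theta*x_i - log(1 + exp(theta*x_i))]."""
    return sum(x * (y - sigmoid(theta * x)) for x, y in zip(xs, ys))


def hessian_by_differences(theta, xs, ys, h=1e-5):
    """Second derivative approximated by a central difference quotient of the gradient."""
    return (grad_loglik(theta + h, xs, ys) - grad_loglik(theta - h, xs, ys)) / (2.0 * h)


def maximize(grad, hess, theta0, iters=100):
    """Newton iterations for a concave one-dimensional objective."""
    theta = theta0
    for _ in range(iters):
        h = hess(theta)
        if h == 0.0:
            break
        theta -= grad(theta) / h
    return theta


def fit_mle(xs, ys, theta0=0.0):
    return maximize(lambda t: grad_loglik(t, xs, ys),
                    lambda t: hessian_by_differences(t, xs, ys), theta0)


def adapt(theta_old, hess_old, xs_new, ys_new):
    """Maximize L_new(theta) + hess_old * (theta - theta_old)**2 / 2."""
    grad = lambda t: grad_loglik(t, xs_new, ys_new) + hess_old * (t - theta_old)
    hess = lambda t: hessian_by_differences(t, xs_new, ys_new) + hess_old
    return maximize(grad, hess, theta_old)


if __name__ == "__main__":
    # Hypothetical labeled data: plentiful for old subjects, scarce for the new one.
    xs_old = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0, -1.0, 1.0]
    ys_old = [0, 0, 1, 0, 1, 1, 0, 1]
    xs_new = [1.0, -1.0, 0.5, -0.5]
    ys_new = [1, 0, 0, 1]

    theta_old = fit_mle(xs_old, ys_old)
    hess_old = hessian_by_differences(theta_old, xs_old, ys_old)
    print("old-subject MLE:", theta_old)
    print("new-subject MLE (new data only):", fit_mle(xs_new, ys_new))
    print("adapted estimate:", adapt(theta_old, hess_old, xs_new, ys_new))
```

In this one-dimensional setting the adapted estimate falls between the old-subject estimate and the estimate from the new data alone, and with no new data it reduces to the old-subject estimate, which mirrors the behavior described for the sleep stager.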
Each EEG recording is first transformed to capture the embedded, useful information. This process is called feature extraction. The most popular signal processing techniques for feature extraction include wavelet transform, fast Fourier transform, zero-crossing, parametric waveform recognition, etc. The present invention uses an approach based on power spectral properties of the EEG signal. The Thomson multitaper method is applied to a 3-second moving window, with a between-window shift of 2.7 seconds, to obtain the localized power spectral density (PSD). For each frequency $f$ and each time point $i$, the logarithm of the PSD is normalized across time to obtain the Z score $z_{f,i}$, where normalization is performed by first subtracting the mean and then dividing by the standard deviation.
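The normalization step can be sketched as follows, assuming the PSD has already been estimated and is available as a nested list `psd[f][i]` of positive values indexed by frequency and time. The function name `log_psd_z_scores` and the use of the population standard deviation are assumptions; the text does not say which form of standard deviation is used.

```python
import math
import statistics


def log_psd_z_scores(psd):
    """Normalize log-PSD values across time, separately for each frequency.

    psd[f][i] is the (positive) power at frequency index f and time index i.
    Returns z[f][i] = (log psd[f][i] - mean_f) / std_f.
    """
    z = []
    for row in psd:
        logs = [math.log(p) for p in row]
        mean = statistics.fmean(logs)
        std = statistics.pstdev(logs)  # population standard deviation (assumed)
        if std == 0.0:
            # A constant row carries no information; map it to zeros.
            z.append([0.0 for _ in logs])
        else:
            z.append([(v - mean) / std for v in logs])
    return z


if __name__ == "__main__":
    example_psd = [[1.0, 2.0, 4.0], [3.0, 3.0, 3.0]]
    for row in log_psd_z_scores(example_psd):
        print(row)
```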
For human beings, the method uses m=6 disjoint frequency bands: 0.2 Hz-4 Hz, 4.2 Hz-8 Hz, 8.2 Hz-12 Hz, 12.2 Hz-16 Hz, 16.2 Hz-23 Hz, and 23.2 Hz-29 Hz, which jointly contain 99% of the power of EEG waves. (For birds (e.g., zebra finches), we choose m=4 disjoint frequency bands: 1 Hz-5 Hz, 5.5 Hz-10 Hz, 10.5 Hz-20 Hz, and 20.5 Hz-30 Hz.) The justifications for selecting these frequency bands are as follows. First, the PSD curves of various stages are well separated within these bands. Second, human sleep is divided into different stages based on the frequency content of the delta-wave (0 Hz-4 Hz), theta-wave (4 Hz-8 Hz), alpha-wave (8 Hz-13 Hz), beta1-wave (13 Hz-22 Hz), and beta2-wave (22 Hz-35 Hz), which are similar to our frequency bands. Hence, the features contained within these bands should provide enough discrimination power for stage classification.
For the $j$th ($1 \le j \le 6$) band, at time point $i$, let $\tilde{x}_{i,j}$ denote the maximum Z score within this band. That is, $\tilde{x}_{i,j}=\max\{z_{f,i} : f \text{ in the } j\text{th band}\}$. Since the recording occasionally has very large noise caused by movement, we truncate $\tilde{x}_{i,j}$ by $x_{i,j}=\operatorname{sign}(\tilde{x}_{i,j}) \min\{|\tilde{x}_{i,j}|, \Lambda\}$, where $\Lambda=5$.
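A minimal Python sketch of the band feature follows. The band edges are the human bands listed in the description; the function names, the frequency grid, and the example Z scores are hypothetical.

```python
import math

# Human frequency bands in Hz, as listed in the description.
HUMAN_BANDS = [(0.2, 4.0), (4.2, 8.0), (8.2, 12.0),
               (12.2, 16.0), (16.2, 23.0), (23.2, 29.0)]


def band_feature(z_at_time, freqs, band, clip=5.0):
    """Maximum Z score inside one band at one time point, truncated to [-clip, clip].

    z_at_time[k] is the Z score at frequency freqs[k]. Raises ValueError if
    no frequency of the grid falls inside the band.
    """
    lo, hi = band
    x_tilde = max(z for f, z in zip(freqs, z_at_time) if lo <= f <= hi)
    # sign(x_tilde) * min(|x_tilde|, clip)
    return math.copysign(min(abs(x_tilde), clip), x_tilde)


def feature_vector(z_at_time, freqs, bands=HUMAN_BANDS, clip=5.0):
    """One truncated feature per band for a single time point."""
    return [band_feature(z_at_time, freqs, band, clip) for band in bands]


if __name__ == "__main__":
    freqs = [1.0, 2.0, 6.0]   # hypothetical frequency grid
    z = [0.3, 7.2, -1.0]      # hypothetical Z scores, one per frequency
    print(band_feature(z, freqs, HUMAN_BANDS[0]))  # 7.2 is truncated to 5.0
    print(band_feature(z, freqs, HUMAN_BANDS[1]))  # -1.0 is kept as is
```

Truncation bounds the influence of movement artifacts: a Z score of 7.2 and one of 50 both map to 5, so a single noisy window cannot dominate the weighted sum of potential functions.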
Vector
A typical hardware configuration of an information handling/computer system in accordance with the invention preferably has at least one processor or central processing unit (CPU).
The CPUs are interconnected via a system bus to a random access memory (RAM), read-only memory (ROM), input/output (I/O) adapter (for connecting peripheral devices such as disk units and tape drives to the bus), user interface adapter (for connecting a keyboard, mouse, speaker, microphone, and/or other user interface device to the bus), a communication adapter for connecting an information handling system to a data processing network, the Internet, an Intranet, a personal area network (PAN), etc., and a display adapter for connecting the bus to a display device and/or printer (e.g., a digital printer or the like).
In addition to the system and method described above, a different aspect of the invention includes a computer-implemented method for performing the above method. As an example, this method may be implemented in a computer system environment.
Such a method may be implemented, for example, by operating a computer, as embodied by a digital data processing apparatus, to execute a sequence of computer-readable instructions. These instructions may reside in various types of computer-readable media.
Thus, this aspect of the present invention is directed to a programmed product, comprising computer-readable media tangibly embodying a program of computer-readable instructions executable by a digital data processor incorporating the CPU and hardware above, to perform the method of the invention.
These computer-readable media may include, for example, a RAM contained within the CPU, as represented by the fast-access storage for example. Alternatively, the instructions may be contained in another computer-readable medium, such as a magnetic data storage diskette, directly or indirectly accessible by the CPU. Whether contained in the diskette, the computer/CPU, or elsewhere, the instructions may be stored on a variety of computer-readable data storage media, such as DASD storage (e.g., a conventional “hard drive” or a RAID array), magnetic tape, electronic read-only memory (e.g., ROM, EPROM, or EEPROM), an optical storage device (e.g., CD-ROM, WORM, DVD, digital optical tape, etc.), paper “punch” cards, or other suitable computer-readable media including transmission media such as digital and analog communication links and wireless links. In an illustrative embodiment of the invention, the computer-readable instructions may comprise software object code.
While the invention has been described in terms of several exemplary embodiments, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims.
Further, it is noted that Applicants' intent is to encompass equivalents of all claim elements, even if amended later during prosecution.