The present invention generally relates to the field of electroencephalography and specifically to a system and method enabling rapid waveform annotation used to generate a high volume database.
The present invention provides systems and methods comprising: storing an annotated set of confirmed epileptiform discharge (ED) waveforms in a database; receiving, by a computing device, a signal encoding electroencephalograph (EEG) data from a plurality of electrodes each attached to a subject and detecting EEG data; generating a user interface displaying a plurality of waveforms based upon at least a portion of the EEG data; receiving an initial selection of a portion of one of the plurality of waveforms comprising an ED; identifying a list of candidate waveforms including potential EDs by determining an alignment of the initial selection with a portion of one of the plurality of waveforms in the EEG data; displaying the list of candidate waveforms on the user interface; receiving, from the user and via the user interface, an identification of a subset of the list of candidate waveforms; and storing the subset of the list of candidate waveforms as an annotated list of confirmed EDs in the database.
The above features and advantages of the present invention will be better understood from the following detailed description taken in conjunction with the accompanying drawings.
The present invention will now be discussed in detail with regard to the attached figures that were briefly described above. In the following description, numerous specific details are set forth illustrating the Applicant's best mode for practicing the invention and enabling one of ordinary skill in the art to make and use the invention. It will be obvious, however, to one skilled in the art that the present invention may be practiced without many of these specific details. In other instances, well-known machines, structures, and method steps have not been described in particular detail in order to avoid unnecessarily obscuring the present invention. Unless otherwise indicated, like parts and method steps are referred to with like reference numerals.
The capturing of Electroencephalogram (EEG) data involves recording of the brain's electrical activity from sensors placed on a subject's scalp; it is a measure of voltage fluctuations resulting from the ionic current flow within neurons of the brain. For a particular subject, EEG data can provide a continuous measure of cortical function with good temporal resolution. Consequently, EEG is often used in the diagnosis and management of neurological disorders such as epilepsy, sleep disorders, and encephalopathy, which can cause obvious abnormalities (i.e. special EEG waveforms) in EEG readings. For example, EEG recordings are an important source of information about the neurological disorder known as epilepsy. EEG signals in people with epilepsy may contain waveforms known as epileptiform discharges (ED), also commonly referred to as “spikes” or “sharp waves,” which can be indicators of some abnormality or problem within the subject. Hereafter, we will use the terms ED and spikes interchangeably.
Epilepsy refers to a group of chronic brain disorders characterized by recurrent unprovoked seizures. EEG is the primary diagnostic test for epilepsy because EDs are a key diagnostic biomarker for epilepsy. When reviewing EEG data for a subject with epilepsy, for example, EDs show up within EEG signals as morphologically defined events that are paroxysmal, clearly distinguishable from the background with abrupt changes in polarity, and characterized by sharp contours. Because EDs occur almost exclusively in EEG data from epileptic patients, the presence of EDs predicts seizure recurrence, and facilitates the diagnosis of epilepsy, enabling appropriate treatment to be prescribed.
EEG interpretation is therefore conventionally performed by physicians with subspecialty training in neurology and clinical neurophysiology. This expertise requires many years of specialized medical training and exposure to thousands of EEGs from a wide range of patients. When analyzing a patient's EEG data, physicians detect EDs by visually inspecting several EEG waveforms at a time. However, EDs within these various waveforms can be difficult to detect in a consistent manner due to wide patient variability and other factors. The analysis is made more difficult still by the fact that, in many cases, EEG data recordings can last from 30 minutes to several weeks, resulting in a vast amount of data to analyze.
Expert visual inspection and manual annotation are still the gold standard for interpreting EEG data. However, the process can be tedious and ultimately subjective—the agreement rate for the identification of EDs has been found as low as 60% between electroencephalographers for certain cases. In addition, EEGs are frequently misinterpreted by neurologists without specialized neurophysiology training, due to (i) the wide variety of morphologies of EDs and (ii) the similarity of EDs to wave shapes that are part of the normal background activity and to artifacts (e.g., potentials from muscle, eyes, and the heart) that are quite normal. As depicted in
Due to the difficulty of analyzing EEG data, many patients go undiagnosed and untreated, or are misdiagnosed by unqualified practitioners, leading to inappropriate medical interventions and avoidable suffering. Consequently, there is a need for an automated ED detection system. Such a system could potentially provide ED detection in a manner that is more efficient and reliable, and at lower cost, than is currently possible. Additionally, an automated system could potentially be widely deployed, overcoming the problem that qualified EEG experts are in short supply.
A hurdle to achieving a strong algorithm for ED detection is the lack of a sufficient database of annotated EEGs, which would provide a large number of exemplar EDs that may be referenced in identifying new EDs in new data. The primary challenge in obtaining such a database is that detailed manual annotation of waveforms can be slow and labor intensive, which can limit the potential sources of such annotated EEG data.
The brute-force approach to generating such a database is to manually annotate numerous EEG records. However, exhaustive manual annotation of EDs is prohibitively time-consuming, especially for EEG recordings with large numbers of EDs (e.g., up to thousands per hour). The time and labor required severely limit the availability of EEG experts to help establish a large database of annotated EDs.
ED detection could be faster, less expensive, more objective, and potentially more accurate through the use of automated detection and classification schemes. Automated ED detection would enable wider availability of EEG diagnostics and more rapid referral to qualified physicians who can provide further medical investigation and interventions.
To address these issues, the present disclosure provides a system and method for at least partially automated EEG review and rapid waveform annotation. At the system's core lies a waveform analysis engine that performs template waveform matching using matching algorithms such as Euclidean distance (EuD) matching with online machine learning and/or Dynamic Time Warping (DTW), which may substantially accelerate the task of annotating waveforms. These algorithms are described in more detail below.
The disclosed system and method are not limited to the annotation of EDs in scalp EEG recordings, but can also be readily generalized to other waveforms and signal types. The present disclosure, however, provides a number of examples involving the automated analysis and classification of waveforms derived from EEG data for evaluating patients for epilepsy, though it should be understood that the teachings of the present disclosure are equally applicable to other applications involving the analysis of waveforms describing EEG data for other purposes, and for other types of data. Examples of additional applications to EEG include but are not limited to the detection of waveforms characteristic of sleep (e.g. “spindles”, “K-complexes”, and “vertex waves”) and encephalopathy (e.g. “triphasic waves”). Examples of other medical data to which the invention is applicable include but are not limited to electrocardiogram (ECG) data (e.g. abnormal heart beat waveforms), respiratory time-series (e.g. abnormal breathing patterns such as apnea events), and imaging applications (e.g. annotation and detection of anatomical structures in MRI and CT images).
The present system includes a graphical user interface (GUI) designed for EEG review and rapid waveform annotation. To provide rapid annotation, in one embodiment, the system utilizes custom-built algorithms based on a combination of technologies of template matching and online machine learning. Once a user has selected an initial waveform such as an ED, that initial waveform is designated as a template. Template matching techniques can then be utilized to automatically generate a list of waveform candidates that may each contain an ED having a similar shape to that selected by the user. Once a set of waveform candidates is identified, the candidates are displayed back to the user, who can confirm the candidates that do, in fact, contain EDs. Online machine learning can then be used following this initial template matching step to generate a refined set of waveforms, each containing potential EDs. An alternative embodiment relies upon an approach employing DTW-based template matching to identify potential EDs.
The ED-matching approaches underlying the present system are based on the observation that, within the same patient or subject, EDs typically share a similar morphology. Thus, within particular subjects, waveforms such as EDs tend to be fairly stereotyped. Consequently, the identification of one example ED waveform as a template waveform for a particular subject can be utilized to enable rapid and automatic identification and extraction of many more candidate matching waveforms from the subject's EEG data, which can then be further accepted or rejected by an EEG reviewer.
With a suitable choice of similarity measure and ED templates, it therefore may be possible to extract many more similar candidate waveforms from the same EEG record in less time. Rather than annotating one ED at a time, for example, groups of potentially similar EDs (typically 10-100 EDs) can be identified and annotated by template matching, accelerating the review and annotation process.
The present system may also employ a cascade of differentiated classifiers for progressive background rejection in order to overcome the between-patient variability of EDs to achieve expert-level automated ED detection. Thus, the disclosed invention further comprises a fully automated ED detection algorithm that performs analogously to a human expert.
Before the EEG data can be analyzed according to the present methods, the EEG data may be pre-processed.
The preprocessing (Step 310) of the raw EEG file 308, described in the preceding paragraph, may produce clean EEG data 620 used to perform template matching (Step 710, below) within the similarity search 306.
After the raw EEG data has been pre-processed (Step 310) as illustrated in
In selected embodiments, the template matching algorithm used to execute Step 710 of the similarity search can include a EuD algorithm or a DTW algorithm, the use of both being novel and interchangeable with the disclosed system. The EuD algorithm is based on a simple one-to-one alignment of waveforms that can be computed rapidly. This approach may be highly sensitive to small variations in the morphology of the waveforms, and it does not emphasize morphological features of an ED such as the sharp contour.
In some cases, the use of a DTW-based template-matching algorithm 710 may improve upon a EuD-based approach. As seen in
Using EuD algorithms to perform template matching may include generating a list of waveform candidates based on the z-normalized EuD (although other measures could be used as well) computed with respect to the template waveform selected by the user.
When performing EuD matching, the EuD between 2 time series Q=q1, q2, . . . , qn and X=x1, x2, . . . , xn of the same length is defined as:
$$\mathrm{ED}(Q,X)=\sqrt{\sum_{i=1}^{n}\left(q_i-x_i\right)^{2}}$$
The EuD is based on a simple one-to-one alignment of waveforms that can be computed rapidly. In a EuD based approach, template matching may be carried out based on the z-normalized EuD.
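The z-normalized EuD ranking described above can be illustrated with a minimal sketch. The function names (`z_normalize`, `znorm_euclidean`, `rank_candidates`) and the choice to map a flat segment to all zeros are assumptions made for illustration and are not taken from the disclosure.

```python
import math


def z_normalize(series):
    """Rescale a sequence to zero mean and unit standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0.0:
        # Assumption: a flat segment carries no shape, so map it to zeros.
        return [0.0] * n
    return [(v - mean) / std for v in series]


def znorm_euclidean(q, x):
    """Euclidean distance between the z-normalized versions of q and x."""
    if len(q) != len(x):
        raise ValueError("sequences must have the same length")
    qz, xz = z_normalize(q), z_normalize(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(qz, xz)))


def rank_candidates(template, signal):
    """Slide a window of len(template) over signal, one sample at a time.

    Returns (distance, start_index) pairs sorted by ascending distance.
    """
    n = len(template)
    scored = [
        (znorm_euclidean(template, signal[s:s + n]), s)
        for s in range(len(signal) - n + 1)
    ]
    return sorted(scored)
```

Because both waveforms are z-normalized first, a candidate with the same shape as the template but a different amplitude or baseline still receives a distance near zero.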
In the context of the current invention, the EuD approach may be used to measure the similarity between the template ED 700 provided by the user and any matches to the template (Step 710) found within the clean EEG data 620. For each record, a distance lookup table may be computed beforehand with respect to the same randomly selected reference. To reduce computational complexity, the triangle inequality may be applied to reject samples far away from the given template, and narrow down the range of search to a small group of samples. The accepted samples may be further ranked according to the EuD to the given template in ascending order.
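The lookup-table pruning can be sketched as follows. This is an illustration under stated assumptions: the windows are taken to be already z-normalized, and the `radius` parameter (the largest distance still treated as a match) is hypothetical, since the disclosure does not state how the rejection cutoff is chosen.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def build_lut(windows, reference):
    """Precompute the distance from every window to one fixed reference."""
    return [euclidean(w, reference) for w in windows]


def search_with_pruning(template, windows, reference, lut, radius):
    """Return (distance, index) pairs within `radius`, sorted ascending.

    The triangle inequality gives
        d(template, w) >= |d(template, reference) - d(w, reference)|,
    so any window whose bound already exceeds `radius` is skipped
    without computing its full distance to the template.
    """
    d_template_ref = euclidean(template, reference)
    hits = []
    for idx, w in enumerate(windows):
        if abs(d_template_ref - lut[idx]) > radius:
            continue  # cannot be within radius; pruned by the bound
        d = euclidean(template, w)
        if d <= radius:
            hits.append((d, idx))
    return sorted(hits)
```

The table is built once per recording, so each new template costs one distance to the reference plus full distance computations only for the windows that survive the bound.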
For each EEG recording, a distance look-up table (LUT) is computed beforehand with respect to a reference randomly selected. To reduce the computational complexity, triangle inequality is used with the LUT to abandon waveforms that are far away from a given template. In
Although fast, using EuD as the method of template matching has its drawbacks, resulting in occasional bad rankings in the list of waveforms. With feedback from the user (e.g., waveform selections subsequent to the first selected waveforms), the annotation may be cast into an online machine learning task. By continuously learning from previous annotations, the current ranking in the list may be refined by applying online machine learning.
As described below, following the template matching step in the similarity search, a user may select or identify waveforms identified by template matching that do, in fact, match the user's selected template. With feedback from the user, the annotation (e.g., the indication of whether the waveforms identified by template matching matched the user's template) can be cast into an online machine learning task. By continuously learning from the user's previous annotations, the current ranking in the list can be refined by applying online machine learning. Online machine learning can be used afterwards to refine the ranking in the list for further selection.
Online machine learning is a model of induction that learns sequentially. A key defining characteristic of online machine learning is that the true label of the instance is revealed soon after the prediction is made, to refine the prediction hypothesis for future trials. Due to continual feedback from the user confirming whether identified waveforms actually contain EDs, the online learning algorithms are able to adapt and learn in difficult situations.
The goal of the online machine learning algorithm is to minimize some performance criteria which are algorithm specific. As a non-limiting example in the disclosed system, the MATLAB-based toolbox LIBOL may be applied to provide a collection of various online machine learning algorithms.
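As a rough illustration of how user feedback might refine the ranking, the sketch below uses a plain perceptron written from scratch. It is a stand-in chosen for brevity and does not use LIBOL or reproduce any of its algorithms; the feature vectors, the class name `OnlinePerceptron`, and the helper `rerank` are all hypothetical.

```python
class OnlinePerceptron:
    """Linear scorer updated one labeled example at a time."""

    def __init__(self, n_features):
        self.w = [0.0] * n_features
        self.b = 0.0

    def score(self, x):
        """Higher scores mean the waveform looks more like a confirmed ED."""
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x, label):
        """Learn from one user decision.

        label is +1 if the user confirmed the candidate as an ED and
        -1 if the user rejected it. Weights change only on a mistake.
        """
        if label * self.score(x) <= 0:
            self.w = [wi + label * xi for wi, xi in zip(self.w, x)]
            self.b += label


def rerank(model, candidates):
    """Sort candidate feature vectors by descending model score."""
    return sorted(candidates, key=model.score, reverse=True)
```

Each confirmation or rejection is revealed right after the model's implicit prediction, which is the sequential setting described above; the remaining candidates can then be re-sorted before the next batch is shown.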
One aim of the disclosed invention is to achieve faster annotation of EEG data by a variety of strategies, including preprocessing (Step 310, described above), DTW, and clustering of EDs. As noted above, DTW permits non-linear distortion of the time axis to achieve better waveform alignments and provides an alternative to EuD algorithms for template matching. In DTW, segments of a time series (e.g., the template ED) are aligned with segments of another time series (e.g., potential matching EDs within the EEG data 520), effectively allowing for matching similar waveforms in spite of small local dilations and stretches of the time axis.
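The non-linear alignment that DTW performs can be sketched with the textbook dynamic-programming recurrence. This is a generic illustration and not the optimized search used by the disclosed system; the optional `window` argument (a band limiting how far the alignment may drift from the diagonal) is an assumption added for illustration.

```python
import math


def dtw_distance(q, x, window=None):
    """Dynamic Time Warping distance with squared point-wise cost.

    D[i][j] holds the cheapest cumulative cost of aligning the first
    i points of q with the first j points of x.
    """
    n, m = len(q), len(x)
    if window is None:
        window = max(n, m)  # no band constraint
    window = max(window, abs(n - m))  # band must reach the final cell
    inf = math.inf
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            cost = (q[i - 1] - x[j - 1]) ** 2
            # A point may repeat (stretch) or both series may advance.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])
```

A waveform that is merely delayed or locally stretched relative to the template can align at zero cost here, whereas a one-to-one EuD comparison of the same pair would report a nonzero distance.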
Template matching to perform the similarity search (Step 306) using the DTW algorithm may include aligning 2 time series Q=q1, q2, . . . , qn and X=x1, x2, . . . , xn, a warping matrix DεRn
Although DTW often yields good matches, it can be computationally expensive. Returning to
Within this output of matching waveforms, there are typically numerous candidates with some degree of overlap, that is, waveform candidates that include overlapping data points, or data points that occur at the same time or within a threshold time period of one another. In some embodiments, the template matching algorithm may apply a sliding window when extracting potential ED waveform candidates matching the template provided by the user. The sliding window has a length of n and moves 1 data point each time along the time series X=x1, x2, . . . , xN, with n being the length of the template and N the length of the input EEG. For instance, if the candidate waveform Xm=xm, xm+1, . . . , xm+n-1 has a high ranking in the list, finding Xm±1 also in the list with similar rankings is likely (Steps 1100-1110). To remove candidates with large overlaps, the list of waveforms is scanned from top to bottom. Each candidate found to have more than half of the window length overlapping with any candidate possessing a higher ranking is discarded from the list (Step 1120).
The overlap scan stops when there are 60 candidates 1140 with less than half of the window length of overlap (Step 1130). If there are fewer than 60 candidates after scanning the list of all 1,000 waveforms, the disclosed system discards these 1,000 waveforms (Step 1150) from the input EEG (Step 1120). The disclosed system then selects the next 1,000 DTW-based candidates 620 and repeats the overlap scan to identify more candidates (Steps 1100-1110). Ultimately, this process yields at most 60 candidates with less than half of the window length of overlap 1040, ready to be passed to the user for assessment. As seen below, the remaining candidates may be grouped into clusters of 10 waveforms each, according to the characteristic Vpp.
A threshold is applied to each of the 60 candidate waveforms, such that only candidates with Vpp greater than the threshold 1210 are kept. The threshold takes the value of 95% of the minimum peak-to-trough voltage (Vpp)min obtained from existing annotated EDs. Candidates with Vpp less than the threshold are discarded (Step 1220) and the Clean EEG is modified accordingly.
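The overlap scan can be sketched as a single pass over the ranked list. The function name `remove_overlaps` is hypothetical, and the sketch assumes that each candidate is compared only against higher-ranked candidates that were themselves kept, which is one reading of the rule stated above.

```python
def remove_overlaps(ranked_starts, window_len, max_keep=60):
    """Scan candidates from best to worst and drop heavy overlaps.

    ranked_starts: start indices of candidate windows, best match first.
    window_len:    length n of every window (the template length).
    max_keep:      stop once this many candidates have been kept.

    Two windows starting at a and b share window_len - |a - b| samples
    (when that value is positive). A candidate is discarded if it shares
    more than half a window with any previously kept candidate.
    """
    kept = []
    for start in ranked_starts:
        heavy = any(
            window_len - abs(start - k) > window_len / 2 for k in kept
        )
        if heavy:
            continue
        kept.append(start)
        if len(kept) == max_keep:
            break
    return kept
```

For example, with a window length of 100, candidates starting at samples 100, 101, and 99 collapse to the single best-ranked one, while a candidate starting at sample 160 survives because it shares only 40 samples with it.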
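The amplitude threshold can be sketched in a few lines. The sketch assumes that Vpp is simply the maximum minus the minimum sample of a waveform segment, which may differ from how peak-to-trough voltage is measured in practice; the function names are hypothetical.

```python
def peak_to_peak(waveform):
    """Assumed definition of Vpp: largest sample minus smallest sample."""
    return max(waveform) - min(waveform)


def filter_by_vpp(candidates, annotated_eds, fraction=0.95):
    """Keep candidates whose Vpp exceeds a fraction of the smallest
    Vpp seen among already-annotated EDs."""
    threshold = fraction * min(peak_to_peak(w) for w in annotated_eds)
    return [c for c in candidates if peak_to_peak(c) > threshold]
```

With annotated EDs whose smallest Vpp is 80 (in whatever voltage unit the recording uses), the threshold is 76, so a candidate with a Vpp of 70 is dropped and one with a Vpp of 77 is retained.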
In some embodiments, the efficiency of the DTW algorithm may be further improved through use of a modification of the Trillion algorithm from the UCR (University of California, Riverside) suite. The disclosed invention utilizes the most recently annotated ED as the template, and executes the Trillion algorithm for rapid DTW searching for template matching. The UCR suite draws on four ideas to reduce the computational complexity and increase the speed of DTW, namely: early abandoning z-Normalization; reordering early abandoning; reversing the query/data role in lower bound computation; and cascading lower bounds. As a result, DTW can be applied to relatively large datasets, including EEG recordings used for ED detection.
One technique used in the Trillion algorithm to speed up template matching with an expensive distance measure such as DTW is to use a cheap-to-compute lower bound to prune off unpromising candidates. To optimize the normalization step, early abandoning calculations of the lower bound may be interleaved with online z-normalization. In other words, as the z-normalization is incrementally computed, the disclosed system may also incrementally compute the lower bound of the same data point. Thus, if this computation can be abandoned early, not only can the distance calculation be pruned, but also the normalization steps.
In similarity search, each subsequence needs to be normalized first. In the Trillion algorithm, the mean of the subsequence can be obtained by keeping two running sums of the long time series, which have a lag of exactly m values. The sum of squares of the subsequence can be similarly computed. The formulas are given below for clarity:
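A minimal sketch of the running-sum computation follows. It assumes the standard identities: with S and SS denoting cumulative sums of the values and of the squared values, the mean of the length-m subsequence ending at position k is (S_k - S_{k-m}) / m and its variance is (SS_k - SS_{k-m}) / m minus the squared mean. The code keeps one sum that adds the newest sample and subtracts the sample m positions back, which is equivalent to differencing two lagged cumulative sums. The function name is hypothetical.

```python
import math


def sliding_mean_std(series, m):
    """Mean and standard deviation of every length-m subsequence.

    Returns a list of (start_index, mean, std) tuples computed in one
    pass, without re-summing each window from scratch.
    """
    s = 0.0   # running sum of the current window's values
    s2 = 0.0  # running sum of the current window's squared values
    out = []
    for k, v in enumerate(series):
        s += v
        s2 += v * v
        if k >= m:
            old = series[k - m]  # sample that just left the window
            s -= old
            s2 -= old * old
        if k >= m - 1:
            mean = s / m
            # Clamp tiny negative values caused by floating-point error.
            var = max(s2 / m - mean * mean, 0.0)
            out.append((k - m + 1, mean, math.sqrt(var)))
    return out
```

Each window then costs a constant number of operations regardless of m, which is what allows normalization to be interleaved with the distance computation.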
Online normalization in the Trillion algorithm enables early abandoning of the distance computation of the lower bound in addition to the normalization. A high-level outline of the algorithm is shown in Table 1 below.
Instead of the conventional left-to-right ordering to incrementally compute the distance, the disclosed invention when using the Trillion algorithm for DTW calculations may utilize a universal optimal ordering. It is conjectured that the universal optimal ordering is to sort the indices based on the absolute values of the z-normalized Q. The intuition behind this idea is that the value at Qi will be compared to many Xi values during a search. However, for subsequence search with z-normalized candidates, the distribution of many Xi values will be approximately Gaussian, with a mean of zero. Thus, sections of the query that are farthest from the zero mean will tend to have the largest contributions to the distance measure. This universal optimal ordering used in the Trillion algorithm has been empirically validated by comparing it with the empirically determined optimal ordering in a series of numerical experiments, yielding a correlation value of 0.999.
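The reordering idea can be sketched with a squared Euclidean accumulation that visits indices in order of decreasing absolute value of the z-normalized query. The function names and the use of `None` to signal an abandoned computation are assumptions made for illustration.

```python
def abs_value_order(q_znorm):
    """Indices of the query sorted by decreasing |value|."""
    return sorted(
        range(len(q_znorm)), key=lambda i: abs(q_znorm[i]), reverse=True
    )


def early_abandon_sqdist(q_znorm, x_znorm, order, best_so_far):
    """Accumulate squared differences in the given index order.

    Returns the full squared distance, or None as soon as the partial
    sum exceeds best_so_far (the candidate cannot be a better match).
    """
    total = 0.0
    for i in order:
        total += (q_znorm[i] - x_znorm[i]) ** 2
        if total > best_so_far:
            return None  # abandoned early
    return total
```

Visiting the query's most extreme points first tends to push the partial sum past the best-so-far value in fewer steps than a left-to-right pass, so poor candidates are abandoned sooner.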
Usually, lower bounds are applied to build an envelope around the query Q. However, as discussed in the next section, envelopes can also be formed around the candidate X in a “just-in-time” fashion, to handle the scenario where all other bounds fail to further prune the candidate waveform. In the Trillion algorithm, this removes the space overhead and the time overhead pays for itself by pruning more full DTW calculations.
An efficient strategy used in the Trillion algorithm to speed up time series similarity search is to use lower bounds to admissibly prune off unpromising candidates. The Trillion algorithm applies all of these lower bounds in cascade. The algorithm first considers the O(1) lower bound LB_KimFL, which can be a weak but fast-to-compute lower bound that prunes many candidates. If a candidate is not pruned at this stage, the O(n) lower bound LB_KeoghEQ is considered. If this lower bound is completed without exceeding the best-so-far, the Trillion algorithm reverses the query/data role and computes lower bound LB_KeoghEC. If this bound does not allow pruning, the Trillion algorithm starts the early abandoning calculation of DTW.
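The cascade can be sketched as below. This is a simplified illustration and not the UCR suite's implementation: the first bound uses only the first and last points, the DTW step is a plain banded computation without early abandoning, both series are assumed to be z-normalized, of equal length, and at least two points long, and all function names plus the band half-width `r` are hypothetical. All quantities are squared distances.

```python
def lb_kim_fl(q, x):
    """O(1) bound: first and last points must align with each other."""
    return (q[0] - x[0]) ** 2 + (q[-1] - x[-1]) ** 2


def envelope(series, r):
    """Running min and max of series within +/- r positions."""
    n = len(series)
    lower = [min(series[max(0, i - r):i + r + 1]) for i in range(n)]
    upper = [max(series[max(0, i - r):i + r + 1]) for i in range(n)]
    return lower, upper


def lb_keogh(enveloped, other, r):
    """O(n) bound: how far `other` strays outside the envelope."""
    lower, upper = envelope(enveloped, r)
    total = 0.0
    for lo, up, v in zip(lower, upper, other):
        if v > up:
            total += (v - up) ** 2
        elif v < lo:
            total += (lo - v) ** 2
    return total


def dtw_sq(q, x, r):
    """Squared DTW cost restricted to a band of half-width r."""
    n = len(q)
    inf = float("inf")
    D = [[inf] * (n + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            cost = (q[i - 1] - x[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][n]


def cascade_distance(q, x, r, best_so_far):
    """Apply cheap bounds first; return squared DTW or None if pruned."""
    if lb_kim_fl(q, x) > best_so_far:
        return None
    if lb_keogh(q, x, r) > best_so_far:  # envelope around the query
        return None
    if lb_keogh(x, q, r) > best_so_far:  # roles reversed
        return None
    d = dtw_sq(q, x, r)
    return d if d <= best_so_far else None
```

Each stage is more expensive than the one before it, so most candidates are rejected by a constant-time or linear-time test and only a small remainder reaches the quadratic DTW step.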
Returning to
Returning to
The computing device may be any computer or program that provides services to other computers, programs, or users either in the same computer or over a computer network. Such computing devices may include, as non-limiting examples, a desktop computer, a laptop computer, a server computer etc.
The computing device may be communicatively coupled to data storage including any information requested or required by the system and/or described herein. The data storage may be any computer components, devices, and/or recording media that may retain digital data used for computing for some interval of time.
The user interface shown in
The EEG viewer may comprise basic navigation functions such as shifting along time either at different step size or via a swift slider, amplitude scaling up/down, montage swap, and manual annotation. Montage swap buttons enable easy switching among the three most commonly used EEG data display schemes, i.e., mono-polar, common-average, and bipolar montage, catering to the needs of neurophysiologists to display EEG in different formats.
Apart from traditional navigation along the time axis with fixed time step-size or sliders, the disclosed system allows for navigation through the waveforms by annotation. Annotated EDs can be reviewed or jumped to in the EEG data using the buttons “previous spike” and “next spike”, which jump directly to the nearest (±1) waveform marker found in the record. Annotation status in terms of total current waveform count and online machine learning classification rate is shown at the top for the purpose of supervision.
The button “Auto-Template Match” activates the similarity search process using the template matching algorithms (i.e., EuD or DTW) disclosed above for rapid waveform annotation. To execute this function, the user may manually select a waveform template by left clicking the mouse at the waveform (right clicking to un-select) before pressing the button. Pressing the button triggers a similarity search.
Thus, the disclosed system is semi-automated in the sense that users are actively involved in 2 types of tasks: (i) the user needs to provide templates 700 for EuD and/or DTW-based template matching algorithms, as seen in
In addition to selecting individual candidate waveforms matching the template, as seen in
In both EuD and DTW, once all candidates are returned and displayed to the user, the user may select all or just some of the candidates that they confirm are in fact representatives of the candidate signal of interest, or deselect those that are not. The end result is therefore that all EDs are certified by an expert's recognition as being valid. The process may then repeat until the user is satisfied that all the samples in a given EEG have been found and marked. This process may also be repeated for hundreds of EEGs, which are then moved into the database.
Returning now to
To cope with the wide between-patient variability inherent in EEG data, one approach to classifier development may be based on the concept of classifier ensembles and cascades, i.e. building up a sophisticated classifier out of many simpler classifiers. Classifier cascades have the advantage of being able to deal with extreme pattern variability (no single classifier in the cascade is expected to do all the work), and computational efficiency.
Deep machine learning techniques may also be used to train (Step 510) and test (Step 500) the classifiers against EEG data 520. Employing a cascade of differentiated classifiers for progressive background rejection may overcome the between-patient variability of EDs to achieve expert-level ED detection. This method is similar to boosting, where a strong algorithm emerges from an ensemble of weak classifiers. Background waveforms are rejected partially at each step while preserving all or nearly all valid EDs.
Returning now to
Simple classifiers (e.g., generated using extreme learning machine (ELM), support vector machine (SVM), or support vector regression (SVR) methods) are trained using the created database in order to generate a large pool of weak classifiers. Training of the overall ED detection system may include subjecting the EEG to a cascade of simple classifiers, beginning with classifiers that are simple filters that are designed to remove obvious background wave forms, then using progressively more complicated classifiers that may use more features or more intensive computations.
Training of classifiers may occur in a series, beginning with a simple classifier. As a simpler classifier makes mistakes, the incorrectly specified, or otherwise incorrect data may be used to train a second, more complex classifier, which will also make mistakes. These mistakes may be used to train a third, more complex classifier, and so on. Thus, the training scheme for the overall detection system may determine an order to rank how effective the classifiers are. Ranks are assigned using receiver operating characteristic (ROC) curves, derived by changing the discriminant threshold upon classification scores.
The threshold may be determined as the value for the threshold that preserves 99.9% of EDs (sensitivity). The rate of background rejected (specificity) may also be recorded. The classifiers can be sorted according to the specificity values in a descending order, to form a cascaded queue.
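The thresholding and ordering of the cascade can be sketched as follows. This is a hedged illustration in which each classifier is any callable returning a score (higher meaning more ED-like); the function names, the tuple layout of a stage, and the rule that a sample is accepted only if it passes every stage are assumptions and are not taken from the disclosure.

```python
def threshold_for_sensitivity(ed_scores, sensitivity=0.999):
    """Largest threshold that still keeps `sensitivity` of true EDs."""
    ordered = sorted(ed_scores)
    allowed_misses = int(len(ordered) * (1.0 - sensitivity))
    return ordered[allowed_misses]


def specificity_at(threshold, background_scores):
    """Fraction of background samples scoring below the threshold."""
    rejected = sum(1 for s in background_scores if s < threshold)
    return rejected / len(background_scores)


def build_cascade(classifiers, ed_samples, background_samples,
                  sensitivity=0.999):
    """Return (specificity, threshold, classifier) stages, best first."""
    stages = []
    for clf in classifiers:
        t = threshold_for_sensitivity(
            [clf(s) for s in ed_samples], sensitivity
        )
        spec = specificity_at(t, [clf(s) for s in background_samples])
        stages.append((spec, t, clf))
    # Classifiers that reject the most background run first.
    stages.sort(key=lambda stage: stage[0], reverse=True)
    return stages


def cascade_predict(stages, sample):
    """Accept a sample as an ED only if every stage lets it through."""
    return all(clf(sample) >= t for _, t, clf in stages)
```

Because every stage is tuned to keep nearly all EDs, placing the most selective classifiers first removes most background waveforms before the later, potentially costlier stages are evaluated.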
In
Returning to
The steps included in the embodiments illustrated and described in relation to
Other embodiments and uses of the above inventions will be apparent to those having ordinary skill in the art upon consideration of the specification and practice of the invention disclosed herein. The specification and examples given should be considered exemplary only, and it is contemplated that the appended claims will cover any other such embodiments or modifications as fall within the true scope of the invention.
The Abstract accompanying this specification is provided to enable the United States Patent and Trademark Office and the public generally to determine quickly from a cursory inspection the nature and gist of the technical disclosure and is in no way intended for defining, determining, or limiting the present invention or any of its embodiments.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US15/45886 | 8/19/2015 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62039070 | Aug 2014 | US |