The present invention relates to a method and system for automated analysis of distributional data, particularly flow cytometry data, using support vector machines.
Flow cytometry is the measurement of characteristics of minute particles suspended in a flowing liquid stream. A focused beam of laser light illuminates each moving particle and light is scattered in all directions. Detectors placed forward of the intersection point or orthogonal to the laser beam receive the pulses of scattered light, generating signals which are input into a computer analyzer for interpretation. The total amount of forward scattered light detected depends on particle size and refractive index but is closely correlated with cross-sectional area of the particle as seen by the laser, whereas the amount of side scattered light can indicate shape or granularity.
One of the most widely used applications of flow cytometry is that of cellular analysis for medical diagnostics, where the particles of interest are cells suspended in a saline-containing solution. Flow cytometry techniques offer a high-throughput system for collecting large amounts of cell data. Flow cytometry is an effective tool in detecting abnormalities such as MM, CLL, LGL, AML, ALL, MDS, CMML, Lymphoma, MBL, etc. from samples of various types including bone marrow, peripheral blood, and tissue. Further properties of the cell, such as surface molecules or intracellular constituents, can also be accurately quantitated if the cellular marker of interest can be labeled with a fluorescent dye; for example, an antibody-fluorescent dye conjugate may be used to attach to specific surface or intracellular receptors. Immunophenotyping by characterizing cells at different stages of development through the use of fluorescent-labeled monoclonal antibodies against surface markers is one of the most common applications of flow cytometry. Other dyes have been developed which bind to particular structures (e.g., DNA, mitochondria) or are sensitive to the local chemistry (e.g., Ca++ concentration, pH, etc.).
While flow cytometry is widely used in medical diagnostics, it is also useful in non-medical applications, such as water or other liquid analysis. For example, seawater may be analyzed to identify presence of or types of bacteria or other organisms, milk can be analyzed to test for microbes, and fuels may be tested for particulate contaminants or additives.
The laser beam that is used is of a suitable color to excite the fluorochrome or fluorochromes selected. The quantity of fluorescent light emitted can be correlated with the expression of the cellular marker in question. Each flow cytometer is usually able to detect many different fluorochromes simultaneously, depending on its configuration. In some instruments, multiple fluorochromes may be analyzed simultaneously by using multiple lasers emitting at different wavelengths. For example, the FACSCalibur™ flow cytometry system available from Becton Dickinson (Franklin Lakes, N.J.) is a multi-color flow cytometer that is configured for four-color operation. The fluorescence emission from each cell is collected by a series of photomultiplier tubes, and the subsequent electrical events are collected and analyzed on a computer that assigns a fluorescence intensity value to each signal in Flow Cytometry Standard (FCS) data files. Analysis of the data involves identifying intersections or unions of polygonal regions in hyperspace that are used to filter or “gate” data and define a subset or sub-population of events for further analysis or sorting.
The International Society for Analytical Cytology (ISAC) has adopted the FCS Data File Standard for the common representation of flow cytometry (FCM) data. This standard is supported by all of the major analytical instruments to record the measurements from a sample run through a cytometer, allowing researchers and clinicians to choose among a number of commercially available instruments and software packages without encountering major data compatibility issues. However, this standard stops short of describing a protocol for computational post-processing and data analysis.
Due to the large amount of data present in a flow cytometry analysis, it is often difficult to fully utilize the data through a manual process. The high dimensionality of the data also makes it infeasible to use traditional statistical methods and learning techniques such as artificial neural networks. The support vector machine is a kernel-based machine learning technique capable of processing high-dimensional data. It can be an effective tool for handling flow data when used with an appropriately designed kernel.
The flow data of a single case typically consist of multiple tubes. Each tube may contain simultaneous measurements of multiple assays. Each run typically collects over $10^4$ events when all the assays are measured, which can produce on the order of $10^6$ measurements for analysis.
The traditional approach to analyzing flow data typically involves applying a “gating” method to separate certain groups of cells and manually examining a large collection of 2D plots of the data, two parameters at a time. The features of flow cytometry data useful for diagnostics are usually present in the distribution of attribute values in a high-dimensional space. As a result, it is difficult for human readers to perceive the convoluted, high-dimensional patterns within the data.
Modern technological advancements, such as flow cytometry, have created a vast amount of data in many different forms. One of the greatest challenges presented to computer and information scientists by this information explosion is to develop effective methods to process large quantities of data and extract meaningful information. Traditional statistical methods, though effective on low-dimensional data, have proven to be inadequate in processing the “new data”, which are often characterized by high complexity and high dimensionality. In particular, the so-called “curse of dimensionality” is a serious limitation on the classical statistical tools. Machine learning represents a promising new paradigm in data processing and analysis to overcome these limitations. It uses a “data-driven” approach to automatically “learn” a system, which can be used to make classifications or predictions on future data. Support Vector Machine (SVM) is a state-of-the-art machine learning technology that has revolutionized the field of machine learning and has provided real, effective solutions to many difficult data analysis problems.
SVM combines the concepts of an optimal hyperplane in a high-dimensional inner product space (often an infinite-dimensional Hilbert space) and a kernel function defined on the input space to achieve the flexibility of data representations, computational efficiency, and regularization on model capacities. SVM can be used to solve both classification (pattern recognition) and regression (prediction) problems. A typical SVM pattern recognition setting is given below.
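Given a set of training data:
$$(\mathbf{x}_i, y_i), \qquad \mathbf{x}_i \in \mathbb{R}^n, \quad y_i \in \{-1, +1\}, \qquad i = 1, 2, \ldots, m,$$
the SVM training can be formulated as a problem of finding an optimal hyperplane:
$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{m} \xi_i \quad \text{subject to} \quad y_i\left(\mathbf{w} \cdot \phi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0,$$
where $\phi$ maps the input space into the inner product space and $C > 0$ is a regularization parameter. Using Lagrange multipliers $\alpha_i$, it is transformed to the dual problem:
$$\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j) \quad \text{subject to} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{m} \alpha_i y_i = 0,$$
where $K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}_j)$ is the kernel function. Solving the quadratic programming problem, we have the SVM solution:
$$f(\mathbf{x}) = \operatorname{sign}\left( \sum_{i=1}^{m} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b \right)$$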
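The dual solution can be checked numerically. The sketch below assumes scikit-learn (the document names no library), trains an SVM on a precomputed Gram matrix, and rebuilds the decision value f(x) = Σ α_i y_i K(x_i, x) + b from the fitted model: scikit-learn stores α_i y_i for the support vectors in `dual_coef_` and b in `intercept_`. The RBF kernel, C = 1 and the toy data are illustrative only; a precomputed Gram matrix is also the natural entry point for the distributional kernels discussed in this specification.

```python
import numpy as np
from sklearn.svm import SVC

def rbf_gram(A, B, gamma=0.5):
    """Illustrative RBF Gram matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

rng = np.random.default_rng(0)
# Toy training data (x_i, y_i) with y_i in {-1, +1}.
X = np.vstack([rng.normal(-1.0, 0.6, (20, 2)), rng.normal(1.0, 0.6, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

clf = SVC(C=1.0, kernel="precomputed").fit(rbf_gram(X, X), y)

X_new = rng.normal(0.0, 1.2, (5, 2))
K_new = rbf_gram(X_new, X)                       # shape (n_new, m): new points vs training points

# dual_coef_ holds alpha_i * y_i for the support vectors only (0 <= alpha_i <= C).
f_manual = K_new[:, clf.support_] @ clf.dual_coef_[0] + clf.intercept_[0]
f_sklearn = clf.decision_function(K_new)
labels = np.sign(f_manual)                       # the classifier output sign(f(x))
```

The constraint Σ α_i y_i = 0 from the dual shows up as `dual_coef_` summing to (numerically) zero.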
Due to the complexity of the flow cytometry data, it is difficult to explicitly extract necessary features or define patterns that will predict cytogenetic results. The SVM-based system offers the distinctive advantage that it requires only a similarity measure (the kernel) between examples to construct the classifier.
According to the present invention, a computer-assisted flow cytometry data analysis system is provided to automate most of the tedious steps of the analysis process by using advanced machine learning technologies and other mathematical algorithms. Support vector machines (SVMs) with a custom distributional kernel are used to detect abnormal flow distributions. Gaussian mixture models (GMMs) are applied to automatic clustering and gating. A special graph algorithm is developed for automatic gate recognition.
This system retains the traditional features such as gating definition and adjustment, 2D plots, and statistical tables. However, it provides automation at all analysis steps. Furthermore, the SVM method facilitates analyses far beyond the 2D or 3D limitation in the traditional approach.
The inventive system provides automated flow cytometry data analysis, including automatic gate prediction, automatic determination of normal versus abnormal for each plot (each marker), automatic determination of abnormal results based on a summary table, and automated determination of disease type based on a combination of abnormalities (summary table, individual plots, and gate distributions). The system provides a user with the ability to train and customize the designation of normal versus abnormal. In some embodiments, the flow cytometry analysis system distinguishes normal from abnormal results by displaying labeled plots and values with a visually distinctive feature, such as a specified color (e.g., red), highlighting, underlining, bolding, or any other visually detectable indicator, so as to clearly flag abnormal results for the system user. The flagged results will be recorded in the associated patient records for evaluation by a pathologist, physician or other medical personnel.
The inventive system will help pathologists significantly improve the accuracy and efficiency in analyzing flow data. It will also provide a powerful tool in discovery of new patterns in flow cytometry.
Support vector machines, examples of which are generally disclosed in U.S. Pat. No. 6,760,715, U.S. Pat. No. 7,117,188 and U.S. Pat. No. 6,996,549, among others, which are incorporated herein by reference, are utilized to analyze flow cytometry data generated by a conventional commercial flow cytometry set-up. Exemplary systems for practicing flow cytometry measurement are described in U.S. Pat. No. 5,872,627, and U.S. Pat. No. 4,284,412, which are incorporated herein by reference. In the specific examples described herein, the data relates to a medical diagnostic application, specifically for detecting hematological conditions such as myelodysplastic syndrome (MDS). Flow cytometric immunophenotyping has proven to be an accurate and highly sensitive method for detection of quantitative and qualitative abnormalities in hematopoietic cells even when combined morphology and cytogenetics were non-diagnostic. The automated flow cytometry data analysis system disclosed herein provides the ability to automatically analyze the huge volumes of data generated during flow cytometry measurement, enhancing the accuracy, repeatability and versatility of flow cytometric methods. Such a capability enhances not only the diagnostic value of flow cytometry but also expands research applications of the method by enabling collection and analysis of massive amounts of flow cytometry data from many subjects for data mining and pattern recognition that go far beyond current limited approaches.
In one aspect of the invention, a method for analysis and classification of flow cytometry data, wherein the flow cytometry data comprises a plurality of features that describe the data, includes the steps of: downloading an input dataset comprising flow cytometry events for a population of cells into a computer system comprising a processor and a storage device, wherein the processor is programmed to execute at least one support vector machine and performs the steps of: defining a hierarchical structure of analytical elements, each analytical element corresponding to a different gating definition, wherein each analytical element applies a gating algorithm to classify a subpopulation of cells according to predetermined criteria on a combination of parameters, wherein the classification is performed using a support vector machine with a distributional kernel; and generating an output display at a display device with an identification of a flow cytometry data classification. In some embodiments, the method further includes selecting a subpopulation of cells and analyzing the selected subpopulation of cells using a different analytical element that applies a different gating algorithm to further classify the subpopulation. In a preferred embodiment, the distributional kernel comprises a Bhattacharya affinity having the form:
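$$\rho(p,q) = \frac{1}{4}\left(M_p - M_q\right)^{T}\left(\Sigma_p + \Sigma_q\right)^{-1}\left(M_p - M_q\right) + \frac{1}{2}\ln\frac{\left|\frac{1}{2}\left(\Sigma_p + \Sigma_q\right)\right|}{\sqrt{\left|\Sigma_p\right|\,\left|\Sigma_q\right|}}$$
where p and q are input data distributions, M is the mean of a normal distribution and Σ is its covariance matrix, with subscripts p and q identifying the distribution. The hierarchical structure may be a tree having a plurality of branches, and the method may further include a conclusion analysis step for combining results produced by each branch into a diagnostic classification. The diagnostic classification may comprise either presence or absence of a disease. The different gating definition may be selected from the group consisting of sample tube identity, debris vs. non-debris, granulocytes, monocytes, lymphocytes, negative marker intensity and diminished marker intensity.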
In another aspect of the invention, a method for automatically analyzing flow cytometry data includes the steps of detecting side scatter and forward scatter events for a sample; generating a plurality of plots of the side scatter and forward scatter events in two or three dimensions, the plurality of plots comprising flow cytometry data; processing the plurality of plots using a hierarchical structure of analytical elements, each analytical element corresponding to a different gating definition, wherein each analytical element applies a gating algorithm to classify a subpopulation of cells according to predetermined criteria on a combination of parameters, wherein the classification is performed using a distributional kernel; and generating an output at a display device with an identification of one or more flow cytometry data classifications. The method may further comprise selecting a subpopulation of cells and analyzing the selected subpopulation of cells using a different analytical element that applies a different gating algorithm to further classify the subpopulation. In a preferred embodiment, the distributional kernel is a Bhattacharya affinity having the form:
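$$\rho(p,q) = \frac{1}{4}\left(M_p - M_q\right)^{T}\left(\Sigma_p + \Sigma_q\right)^{-1}\left(M_p - M_q\right) + \frac{1}{2}\ln\frac{\left|\frac{1}{2}\left(\Sigma_p + \Sigma_q\right)\right|}{\sqrt{\left|\Sigma_p\right|\,\left|\Sigma_q\right|}}$$
where p and q are input data distributions, M is the mean of a normal distribution and Σ is its covariance matrix, with subscripts p and q identifying the distribution. The hierarchical structure may be a tree having a plurality of branches, and may further include a conclusion analysis step for combining results produced by each branch into a diagnostic classification. The diagnostic classification may be either presence or absence of a disease. The different gating definition is selected from the group consisting of sample tube identity, debris vs. non-debris, granulocytes, monocytes, lymphocytes, negative marker intensity and diminished marker intensity.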
In still another aspect of the invention, a system for automated analysis of flow cytometry data includes a computer processor in communication with a memory having stored therein flow cytometry data comprising a plurality of assays performed on a plurality of samples comprising cells, the flow cytometry data comprising side scatter and forward scatter events; and a computer-program product embodied in a non-transitory computer readable medium, the computer-program product comprising instructions for causing the computer processor to: receive the flow cytometry data; generate a plurality of plots of the side scatter and forward scatter events in two or three dimensions; process the plurality of plots using a hierarchical structure of analytical elements, each analytical element corresponding to a different gating definition, wherein each analytical element applies a gating algorithm to classify a subpopulation of cells within the samples according to predetermined criteria on a combination of parameters, wherein the classification is performed using a distributional kernel; and generate an output at a display device with an identification of one or more flow cytometry data classifications of the cells. The computer-program product may further include instructions for causing the computer processor to select a subpopulation of cells and analyze the selected subpopulation of cells using a different analytical element that applies a different gating algorithm to further classify the subpopulation. In a preferred embodiment, the distributional kernel comprises a Bhattacharya affinity having the form:
$$\rho(p,q) = \frac{1}{4}\left(M_p - M_q\right)^{T}\left(\Sigma_p + \Sigma_q\right)^{-1}\left(M_p - M_q\right) + \frac{1}{2}\ln\frac{\left|\frac{1}{2}\left(\Sigma_p + \Sigma_q\right)\right|}{\sqrt{\left|\Sigma_p\right|\,\left|\Sigma_q\right|}}$$
where p and q are input data distributions, M is the mean of a normal distribution and Σ is its covariance matrix, with subscripts p and q identifying the distribution. The hierarchical structure may be a tree having a plurality of branches, and the system may further include a conclusion analysis step for combining results produced by each branch into a diagnostic classification. In some embodiments, the diagnostic classification comprises either presence or absence of a disease. The different gating definition is selected from the group consisting of sample tube identity, debris vs. non-debris, granulocytes, monocytes, lymphocytes, negative marker intensity and diminished marker intensity. In some embodiments, the memory is associated with a flow cytometry instrument and is specific to an individual subject, while in other embodiments, the memory may be a database configured for storing accumulated flow cytometry data generated from samples collected from multiple subjects.
According to the present invention, a method and system are provided for analysis of flow cytometry data. In particular, the inventive method includes creation of kernels for use in the analysis of data of a distributional nature. An input data set p in a flow cytometry application is a collection of a large number of points in a space. For example, an image can be regarded as a set of points in a 2-dimensional space. After proper normalization, p may be viewed as a probability distribution. To define a kernel on two such input data sets p and q that captures the distributional trends, one must define a function on p and q that measures the similarity between the two entire distributions rather than just between individual points in the distributions.
One way to construct such a “distributional kernel” is to use a distance function (divergence) between the two distributions. If ρ(p, q) is a distance function, then the following is a kernel:
$$k(p,q) = e^{-\rho(p,q)} \qquad (1)$$
There are many distance functions that measure the discrepancy between two probability distributions. Kullback-Leibler divergence, Bhattacharya affinity, Jeffreys divergence, Mahalanobis distance, Kolmogorov variational distance, and expected conditional entropy are all examples of such distances. Given a distance function, a kernel can be constructed using Equation (1).
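A minimal sketch of Equation (1) for event data that are not summarized parametrically, assuming each input is a cloud of 2-D events binned on a shared grid (the bin edges and the 1e-9 floor are arbitrary choices) and using the Jeffreys divergence, i.e. the symmetrized Kullback-Leibler divergence, as ρ. Note that e^{-ρ} is not positive semidefinite for every divergence, so checking the eigenvalues of the resulting Gram matrix is a cheap safeguard before handing it to an SVM. All function names are illustrative.

```python
import numpy as np

def to_distribution(events, edges):
    """Bin an (n, 2) event cloud on a shared grid and normalize it to sum to 1."""
    hist, _, _ = np.histogram2d(events[:, 0], events[:, 1], bins=edges)
    hist = hist + 1e-9                      # floor keeps the log ratios finite
    return hist / hist.sum()

def jeffreys(p, q):
    """Jeffreys divergence KL(p||q) + KL(q||p) between two binned distributions."""
    return float(np.sum((p - q) * np.log(p / q)))

def distributional_gram(dists, rho):
    """Gram matrix K[i, j] = exp(-rho(p_i, p_j)), i.e. Equation (1) for every pair."""
    n = len(dists)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = np.exp(-rho(dists[i], dists[j]))
    return K

rng = np.random.default_rng(1)
edges = [np.linspace(-4.0, 4.0, 17), np.linspace(-4.0, 4.0, 17)]
samples = [rng.normal(m, 1.0, (5000, 2)) for m in (0.0, 0.1, 1.5)]
K = distributional_gram([to_distribution(s, edges) for s in samples], jeffreys)
min_eig = np.linalg.eigvalsh(K).min()       # a clearly negative value signals a non-PSD kernel
```

Nearly identical clouds (means 0.0 and 0.1) get a kernel value close to 1, while the shifted cloud (mean 1.5) gets a value close to 0.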
For example, a special custom kernel can be constructed based on Bhattacharya affinity. For normal distributions with mean M and covariance matrix Σ, Bhattacharya affinity has the form:
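$$\rho(p,q) = \frac{1}{4}\left(M_p - M_q\right)^{T}\left(\Sigma_p + \Sigma_q\right)^{-1}\left(M_p - M_q\right) + \frac{1}{2}\ln\frac{\left|\frac{1}{2}\left(\Sigma_p + \Sigma_q\right)\right|}{\sqrt{\left|\Sigma_p\right|\,\left|\Sigma_q\right|}}$$
From this distance function, a new kernel is defined using Equation (1).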
This distributional kernel is computationally efficient: because each distribution is summarized by its mean and covariance, its complexity is linear in the number of events, so it can handle large quantities of input data. By contrast, a typical density estimation method has a computational complexity of $O(n^2)$, which might be too high for some applications. The inventive distributional kernels can be applied directly in an SVM or other machine learning systems to create classifiers and other predictive systems. The distributional kernels provide some distinctive advantages over the standard kernels that are frequently used in SVMs and other kernel machines. They capture the similarities between the overall distributions of the large data components, which may be crucial in some applications.
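The Bhattacharya kernel can be sketched in a few lines of NumPy. Each event cloud is summarized by its sample mean and covariance, which takes one pass over the n events (O(n d²) for d parameters), consistent with the linear complexity noted above; ρ is the closed-form Bhattacharyya distance between two normal distributions, computed here with log-determinants for numerical stability. A useful property: for normal distributions, e^{-ρ} equals the Bhattacharyya coefficient ∫√(p q), which is an inner product of √p and √q, so the resulting Gram matrix is positive semidefinite. Function names are illustrative, not taken from the document.

```python
import numpy as np

def fit_gaussian(events):
    """Mean and covariance of an (n, d) event cloud: a single O(n d^2) pass."""
    return events.mean(axis=0), np.atleast_2d(np.cov(events, rowvar=False))

def bhattacharya_distance(mp, Sp, mq, Sq):
    """rho(p, q) for normal distributions N(mp, Sp) and N(mq, Sq)."""
    S = Sp + Sq
    diff = mp - mq
    quad = 0.25 * diff @ np.linalg.solve(S, diff)
    # Log-determinants avoid overflow/underflow of det() in higher dimensions.
    logdet_avg = np.linalg.slogdet(0.5 * S)[1]
    logdet_p = np.linalg.slogdet(Sp)[1]
    logdet_q = np.linalg.slogdet(Sq)[1]
    return float(quad + 0.5 * (logdet_avg - 0.5 * (logdet_p + logdet_q)))

def bhattacharya_kernel(events_p, events_q):
    """k(p, q) = exp(-rho(p, q)) with each event cloud summarized as a Gaussian."""
    mp, Sp = fit_gaussian(events_p)
    mq, Sq = fit_gaussian(events_q)
    return float(np.exp(-bhattacharya_distance(mp, Sp, mq, Sq)))

rng = np.random.default_rng(2)
clouds = [rng.normal(rng.uniform(-1, 1, 3), 1.0, (20000, 3)) for _ in range(5)]
K = np.array([[bhattacharya_kernel(a, b) for b in clouds] for a in clouds])
```

In practice the Gaussian summaries would be computed once per sample and cached, so building an N × N Gram matrix costs N fits plus N² small d × d solves.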
The raw data generated by the flow cytometer 106 is input into a computer processing system (step 302) which includes at least a memory and a processor that is programmed to execute one or more support vector machines. A typical personal computer (PC) or APPLE® MAC®-type processor is suitable for such processing. The input data set may be divided into two portions, one for use in training the support vector machine, the other for use in testing the effectiveness of the training. In step 304, feature selection algorithms are run on the training data set by executing one or more feature selection programs within the processor. In step 306, the training data set with the reduced feature set is processed using a support vector machine with a distributional kernel such as the Bhattacharya affinity-based kernel. The effectiveness of the training step is evaluated in step 308 by extracting the data corresponding to the features selected in step 304 in the independent test data set and processing the test data using the trained SVM with the distributional kernel. If the results of the test indicate a less than optimal result, the SVM will be re-trained and retested until an optimal solution is attained. If the training is determined to be satisfactory, live data corresponding to flow cytometry measurements taken on a patient sample is input into the processor in step 310. The features that were selected in step 304 are selected from the patient data and processed through the trained and tested SVM with distributional kernel in step 312, with the result being a classification of the patient sample as normal or abnormal. In step 314, a report summarizing the results is generated which may be displayed on a computer monitor 122, on a printed report 124, and/or transmitted via e-mail or other network file transfer system to a research or clinical laboratory, hospital or physician's office. 
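Steps 302 to 308 can be sketched end to end with scikit-learn's `SVC` on precomputed Bhattacharya Gram matrices. Everything here is an assumption for illustration: the cases are simulated, each case is reduced to a single 3-parameter Gaussian (a real case would have one summary per selected tube/assay), the 20/20 training and 17/20 test sizes merely mirror the example study, feature selection (step 304) is omitted, and C = 100 is arbitrary. The test Gram matrix is computed against the training cases, with shape (n_test, n_train); live patient data (steps 310 and 312) would go through the same `gram(..., train)` call.

```python
import numpy as np
from sklearn.svm import SVC

def summarize(events):
    """Reduce one case's events (n, d) to a Gaussian summary (mean, covariance)."""
    return events.mean(axis=0), np.atleast_2d(np.cov(events, rowvar=False))

def bhattacharya_kernel(a, b):
    """k(p, q) = exp(-rho(p, q)) for two Gaussian summaries."""
    (ma, Sa), (mb, Sb) = a, b
    S = Sa + Sb
    diff = ma - mb
    rho = 0.25 * diff @ np.linalg.solve(S, diff) + 0.5 * (
        np.linalg.slogdet(0.5 * S)[1]
        - 0.5 * (np.linalg.slogdet(Sa)[1] + np.linalg.slogdet(Sb)[1]))
    return float(np.exp(-rho))

def gram(rows, cols):
    """Kernel matrix with shape (len(rows), len(cols))."""
    return np.array([[bhattacharya_kernel(a, b) for b in cols] for a in rows])

def simulate_case(rng, abnormal):
    """Hypothetical case: 5000 events, 3 parameters; abnormal cases shift parameter 0."""
    loc = np.array([1.0, 0.0, 0.0]) if abnormal else np.zeros(3)
    return rng.normal(loc, 1.0, (5000, 3))

rng = np.random.default_rng(3)
y_train = np.array([1] * 20 + [0] * 20)
y_test = np.array([1] * 17 + [0] * 20)
train = [summarize(simulate_case(rng, label == 1)) for label in y_train]   # step 302
test = [summarize(simulate_case(rng, label == 1)) for label in y_test]

clf = SVC(C=100.0, kernel="precomputed").fit(gram(train, train), y_train)  # step 306
pred = clf.predict(gram(test, train))                                       # step 308
sensitivity = np.mean(pred[y_test == 1] == 1)
specificity = np.mean(pred[y_test == 0] == 0)
```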
Histograms with one- and two-dimensional representations of the data groupings may also be displayed and/or printed. The results will also be stored, along with the raw data, histograms and other patient data within the computer memory or a patient database.
An optional additional diagnostic procedure may be combined with the flow cytometry data and results to provide enhanced confidence in an automated analysis system. Using a scheme similar to that disclosed in U.S. Pat. No. 7,383,237, of Zhang et al., which is incorporated herein by reference, the results of the flow cytometry testing may be combined with other types of testing.
In a preferred approach, as described in U.S. Pat. No. 7,383,237, each feature of interest within the image is separately pre-processed (step 322) and processed by an SVM that is optimized for that feature. The results of the analyses of all features of interest are combined in a 2nd level image-processing SVM to generate an output classifying the entire image. The trained SVM(s) is/are tested using pre-processed image test data (step 324). If the solution is optimal, images corresponding to live patient data (the same patient for whom the flow cytometry analysis is performed) are input into the processor (step 326). The patient image data is pre-processed (step 328) to identify the features of interest, and each feature of interest is processed through the trained first level SVMs that are optimized for the specific feature. The results of the analyses of the features of interest are combined and input into the trained 2nd level image-processing SVM to generate an output classifying the entire image (step 330).
The results of step 330 can be communicated for storage in the patient's file in the patient database (step 316) and/or input into a 2nd level SVM for analysis in combination with the flow cytometry data results from step 312. This 2nd level SVM will have already been trained and tested using the training and test data, as indicated by the dotted lines between steps 308, 324 and 340. The results of step 316 and step 330 are combined for processing by the trained 2nd level SVM for combined analysis in step 342. The results of this combined processing will generally be a binary output, e.g., normal or abnormal, diseased or no disease, etc. The combined results may be output for display in step 314 and/or input into a memory or patient database for storage (step 316). Additional optional secondary flow paths may be provided to incorporate other types of data and analysis, such as expert analysis, patient history, etc., which may be combined to produce an ultimate diagnostic or prognostic score or other output that may be used for screening, monitoring and/or treatment.
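The two-level arrangement amounts to stacking: the outputs of the first-level SVMs become the input features of the 2nd level SVM. The sketch below assumes scikit-learn and uses synthetic first-level decision values in place of the step 312 and step 330 outputs; the linear kernel and the helper name `stack_scores` are illustrative choices, not the document's.

```python
import numpy as np
from sklearn.svm import SVC

def stack_scores(*first_level_scores):
    """One row per patient, one column per first-level SVM output (flow, image, ...)."""
    return np.column_stack(first_level_scores)

rng = np.random.default_rng(4)
y = np.array([1] * 30 + [0] * 30)                        # 1 = abnormal, 0 = normal (hypothetical)
signal = np.where(y == 1, 1.0, -1.0)
flow_scores = signal + rng.normal(0.0, 0.8, y.size)      # stand-in for flow cytometry SVM outputs
image_scores = signal + rng.normal(0.0, 0.8, y.size)     # stand-in for image SVM outputs

second_level = SVC(kernel="linear", C=1.0).fit(stack_scores(flow_scores, image_scores), y)
combined = second_level.predict(stack_scores(np.array([1.2, -0.9]), np.array([0.8, -1.1])))
```

In practice the 2nd level SVM should be trained on first-level outputs computed for cases the first-level SVMs did not see (for example, cross-validated predictions); otherwise the first-level scores look optimistically well separated and the combination overfits.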
The object of the present study is to investigate the potential connections between Myelodysplastic Syndrome (MDS)-related chromosome abnormalities in cytogenetics and the patterns in flow cytometry data. This immunophenotyping analysis is one of the most common applications of flow cytometry and the protocols for sample collection and preparation are well known to those in the art. Following the sequence illustrated in
In an exemplary process sequence, the input dataset includes 77 cases (patients) that have both flow cytometry and cytogenetics data. All patients are suspected of having MDS. Among the 77 cases, 37 had chromosome abnormalities as indicated by cytogenetic testing, which involves microscopic examination of whole chromosomes for changes in number or structure. The remaining 40 were found to be negative under cytogenetics.
The aspirated bone marrow samples in suspension were divided among 13 tubes for each patient. In a standard 4-color immunofluorescence protocol, forward light scatter (FSC) and right-angle light scatter (SSC) were collected along with 4-color antibody combinations to perform seven different assays, one of which was blank. Each case typically had 20,000-50,000 events in which all of the assays were measured. The resulting flow cytometry dataset for each case had approximately $10^6$ measurements.
For each of the 13 tubes, FSC and SSC were measured, allowing gating to exclude cellular debris, shown in the lower left corner of
In order to provide data for both training the SVM and for evaluation of the training, the entire dataset for the 77 cases was divided into a training set and an independent test set. Forty cases (20 positive and 20 negative as determined by cytogenetic testing) were used to train the SVM. The remaining 37 cases (17 positive and 20 negative) were used to form an independent test set.
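The case split can be reproduced with a stratified draw. The document does not say how the 40 training cases were chosen, so random selection with a fixed seed is an assumption here; only the counts (20 + 20 for training, 17 + 20 for testing) come from the text.

```python
import numpy as np

def split_cases(labels, n_pos_train, n_neg_train, seed=0):
    """Return (train_idx, test_idx) with fixed numbers of positive/negative training cases."""
    rng = np.random.default_rng(seed)
    pos = rng.permutation(np.flatnonzero(labels == 1))
    neg = rng.permutation(np.flatnonzero(labels == 0))
    train = np.concatenate([pos[:n_pos_train], neg[:n_neg_train]])
    test = np.concatenate([pos[n_pos_train:], neg[n_neg_train:]])
    return np.sort(train), np.sort(test)

labels = np.array([1] * 37 + [0] * 40)   # 37 cytogenetics-positive, 40 negative cases
train_idx, test_idx = split_cases(labels, n_pos_train=20, n_neg_train=20)
```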
The previously-described custom kernel based on the Bhattacharya affinity was used for analysis of the flow cytometry data to measure the discrepancy between two probability distributions.
Including data from all the assays in the classifier does not produce a system with optimal performance. Therefore, feature selection on the assays is conducted based on the training set. Two performance measures were applied in the feature selection step. The first feature selection method, based on the leave-one-out (LOO) error rate for the SVM, involves training the SVM on the initial data set, then updating the scaling parameters by performing a gradient step so that the LOO error decreases. These steps are repeated until a minimum of the LOO error is reached; a stopping criterion can be applied. The second feature selection method was kernel alignment, a technique described in U.S. Pat. No. 7,299,213 of Cristianini, which is incorporated herein by reference. Kernel alignment uses training data only and can be performed before training of the kernel machine takes place.
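Kernel alignment compares a Gram matrix K with the ideal target $yy^\top$ built from the training labels ($y_i \in \{-1, +1\}$): $A(K, yy^\top) = \langle K, yy^\top\rangle_F / (\lVert K\rVert_F \, \lVert yy^\top\rVert_F)$. Because it needs only training data, it can score each assay before any SVM is trained. The sketch ranks assays by the alignment of their individual Gram matrices; ranking assays one at a time (rather than, say, greedily adding assays to a combined kernel) is an assumption, the two synthetic assays are hypothetical, and the LOO-gradient method is not shown.

```python
import numpy as np

def alignment(K1, K2):
    """Empirical alignment <K1, K2>_F / (||K1||_F * ||K2||_F) of two Gram matrices."""
    return float(np.sum(K1 * K2) / np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2)))

def rank_assays_by_alignment(assay_grams, y):
    """Score each assay's training Gram matrix against the ideal kernel y y^T."""
    target = np.outer(y, y)                              # y_i in {-1, +1}
    scores = np.array([alignment(K, target) for K in assay_grams])
    return np.argsort(scores)[::-1], scores              # best-aligned assay first

rng = np.random.default_rng(5)
y = np.array([1] * 20 + [-1] * 20)
uninformative = rng.normal(0.0, 1.0, y.size)             # hypothetical assay 0
informative = y + rng.normal(0.0, 0.3, y.size)           # hypothetical assay 1
grams = [np.exp(-np.subtract.outer(f, f) ** 2) for f in (uninformative, informative)]
order, scores = rank_assays_by_alignment(grams, y)
```

Assays whose score falls below a chosen threshold would be dropped before SVM training; the threshold itself is a tuning choice not specified in the document.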
During the feature selection process, it was determined that a significant number of features did not contribute to the accurate classification of the data. The result of the feature selection procedure is given in Table 2.
A value of “1” in an entry of Table 2 means that a particular assay (tube/assay combination) is selected; “0” means that the assay was not selected. This reduced the number of features considered for each case from the original 91 (13 tubes × 7 assays) to 26. The data from the reduced number of assays was then used to train the SVM with the distributional kernel.
Using the selected assays, the trained SVM was then tested on the 37 independent cases. The results at a decision-value cutoff of 0 were summarized using the conventional statistical measures of the performance of a binary classification test. Sensitivity, or recall rate, is the proportion of positives (as determined by cytogenetic testing) that are correctly classified. Specificity is the proportion of negatives that are correctly identified. The results of analysis of the test data were as follows:
Sensitivity: 15/17 = 88%
Specificity: 19/20 = 95%
This produces an overall error rate of 3/37 = 8%. Using the estimated standard deviation for the binomial distribution, $\sigma = \sqrt{\hat{p}(1-\hat{p})/n} = 0.0449$ with $\hat{p} = 3/37$ and $n = 37$, the test gives 95% confidence that the error rate is below approximately 15%.
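The test statistics above can be recomputed directly from the confusion counts (15 of 17 positives and 19 of 20 negatives correctly classified). The document does not state how the 95% bound was derived; a one-sided normal approximation with z = 1.645 is assumed here, which gives an upper bound of about 0.155, in line with the roughly 15% quoted.

```python
import math

tp, fn = 15, 2    # 17 cytogenetics-positive test cases
tn, fp = 19, 1    # 20 cytogenetics-negative test cases
n = tp + fn + tn + fp

sensitivity = tp / (tp + fn)                       # 15/17, about 0.882
specificity = tn / (tn + fp)                       # 19/20 = 0.95
error = (fn + fp) / n                              # 3/37, about 0.081
sigma = math.sqrt(error * (1 - error) / n)         # binomial standard error, about 0.0449
upper_95 = error + 1.645 * sigma                   # assumed one-sided normal-approximation bound
```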
Exemplary results produced by the inventive system are shown in
The center panel 522 of
The bottom of the screenshot of
One part of the software system facilitates the design of the gating structure, configuration and training of SVM, and the setting of default values. Gating is defined as any process that selects a subpopulation of cells based on specific criteria on observed parameters. Gating is an effective technique for reducing the complexity of the data and focusing the analysis on a specific subpopulation of the data. However, in order to address all aspects of the analysis, there will typically be a large number of gates and the gating structure itself may be complex.
The hierarchical structure of this system facilitates flexible and convenient definitions of very general types of gating.
At each node, in step 502 a 2D gating is defined based on a selection of any two parameters. A 2D plot 506 is the basis for defining the gating.
The gated data 504 at a node is the cumulative result of the chain of gating at the series of nodes preceding the current node. Because each node defines a 2D gating with any combination of parameters, the hierarchical scheme allows for the definition of virtually any gating configuration.
For example, a gating on FS (forward scatter) and SS (side scatter) can filter out debris. On the non-debris events, another gating on FS and the CD45 marker can be defined to separate five subpopulations: CD45-Dim (diminished marker), Monocytes, CD45-Negative (negative marker), Granulocytes, and Lymphocytes. The mononuclear cells can be further gated to feed new nodes.
This process would be repeated for each tube of a patient sample. Additional branches with different gating definitions could be run in parallel, for example, a branch could diverge from node #1 to perform a different set of separations. An optional final step would be to combine the results of each tree branch to generate a diagnostic conclusion taking into consideration the results achieved at the end of each branch. In the preferred embodiment, this final analytical step would be performed by a support vector machine, generating a diagnostic score, a binary, e.g., positive or negative, result, a probability, a prognostic prediction, or other appropriate indicator of the subject's diagnosis or prognosis.
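The cumulative gating in the hierarchy can be sketched as a small tree in which each node holds a 2D polygon gate on two parameters and passes its surviving events to its children. The class and function names, the polygon coordinates and the three hand-made events are hypothetical; the point-in-polygon test is the standard even-odd ray-casting rule, and the per-node SVMs and automatic gate detection are not represented.

```python
from dataclasses import dataclass, field
import numpy as np

def in_polygon(xy, poly):
    """Even-odd ray casting: True for rows of xy (n, 2) inside polygon poly (k, 2)."""
    x, y = xy[:, 0], xy[:, 1]
    inside = np.zeros(len(xy), dtype=bool)
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        crosses = (y1 > y) != (y2 > y)              # edge straddles the horizontal ray
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (x < x_cross)
    return inside

@dataclass
class GateNode:
    name: str
    params: tuple                                    # the two parameters of this node's 2D plot
    polygon: np.ndarray                              # gate vertices in that 2D plane
    children: list = field(default_factory=list)

    def apply(self, events, columns, mask=None, out=None):
        """Gate cumulatively: a node only sees events that passed all of its ancestors."""
        mask = np.ones(len(events), dtype=bool) if mask is None else mask
        out = {} if out is None else out
        cols = [columns.index(p) for p in self.params]
        gated = mask & in_polygon(events[:, cols], self.polygon)
        out[self.name] = gated
        for child in self.children:
            child.apply(events, columns, gated, out)
        return out

lymph = GateNode("lymphocytes", ("FS", "CD45"), np.array([[20, 60], [100, 60], [100, 100], [20, 100]]))
root = GateNode("non-debris", ("FS", "SS"), np.array([[20, 10], [100, 10], [100, 100], [20, 100]]), [lymph])
events = np.array([[5.0, 5.0, 50.0],      # debris: low FS and SS
                   [60.0, 20.0, 80.0],    # non-debris, bright CD45
                   [60.0, 20.0, 10.0]])   # non-debris, dim CD45
out = root.apply(events, ["FS", "SS", "CD45"])
```

Parallel branches are simply additional children of the same node, and the `out` dictionary collects the gated mask for every node so that each node's SVM can be trained on its own subpopulation.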
The following is an exemplary algorithm for automatic gate detection according to an embodiment of the invention:
The system automatically detects gate definitions from user specified points and lines. A pseudo code for the algorithm is given below:
In some situations, the gating may require some adjustments for individual cases. Because of the large number of gates involved in an analysis, this can be a tedious process.
The inventive system provides an automatic gating adjustment function based on clustering. The gates in flow cytometry data are usually associated with clusters of cells. Automated clustering of the actual data provides a natural way to make an appropriate adjustment to the default gating template.
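A Gaussian mixture model (GMM) is a probability distribution that is a weighted sum of Gaussian distributions:
$$p(\mathbf{x}) = \sum_{k=1}^{K} w_k \, \mathcal{N}\left(\mathbf{x} \mid \boldsymbol{\mu}_k, \Sigma_k\right), \qquad w_k \ge 0, \quad \sum_{k=1}^{K} w_k = 1,$$
where $w_k$, $\boldsymbol{\mu}_k$ and $\Sigma_k$ are the weight, mean and covariance matrix of the $k$-th Gaussian component.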
The parameters in the GMM can be determined by a learning algorithm known as the Expectation-Maximization (EM) algorithm. In statistics, the EM algorithm is an iterative method for finding maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models where the model depends on unobserved latent variables. For a GMM, the latent variable is the unobserved component that generated each event.
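A compact NumPy version of EM for a GMM makes the E-step/M-step loop concrete. The farthest-point initialization, the fixed iteration count and the small covariance regularizer `reg` are assumptions for this sketch; a production system might instead use a library implementation such as scikit-learn's `GaussianMixture`.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate normal density N(x | mean, cov) at each row of X."""
    d = X.shape[1]
    diff = X - mean
    mahal = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return np.exp(-0.5 * mahal) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

def fit_gmm(X, k, n_iter=100, seed=0, reg=1e-6):
    """Fit a k-component Gaussian mixture to X (n, d) with the EM algorithm."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Farthest-point initialization of the component means.
    means = [X[rng.integers(n)]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - m) ** 2, axis=1) for m in means], axis=0)
        means.append(X[np.argmax(dist)])
    means = np.array(means, dtype=float)
    covs = np.array([np.cov(X, rowvar=False) + reg * np.eye(d) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = P(component j | x_i).
        dens = np.column_stack([w * gaussian_pdf(X, m, c) for w, m, c in zip(weights, means, covs)])
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: re-estimate weights, means and covariances from the responsibilities.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return weights, means, covs

rng = np.random.default_rng(7)
X = np.vstack([rng.normal([0.0, 0.0], 0.5, (500, 2)), rng.normal([6.0, 6.0], 0.5, (500, 2))])
weights, means, covs = fit_gmm(X, k=2)
```

The fitted means and covariances describe the detected clusters at a node; how they are mapped onto the gating template is a separate design decision.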
The present system applies a GMM to detect clusters in the flow data at a node. The cluster information is then used to adjust the gating templates. Users also have the option to adjust the gating manually.
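The text does not specify how cluster information is turned into a gate adjustment. One simple possibility, shown here purely as a hypothetical rule, is to translate a rectangular default gate so that its centre coincides with the nearest detected cluster mean while keeping its size. The rectangle representation and the name `adjust_gate` are assumptions for illustration.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, x_max, y_min, y_max)


def adjust_gate(gate: Rect, cluster_means: List[Tuple[float, float]]) -> Rect:
    """Shift a rectangular template gate onto the nearest cluster mean.

    Hypothetical rule: the gate keeps its width and height, and only its
    position changes so that its centre matches the closest cluster."""
    x_min, x_max, y_min, y_max = gate
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    # Nearest cluster mean (squared Euclidean distance) to the template centre.
    mx, my = min(cluster_means, key=lambda m: (m[0] - cx) ** 2 + (m[1] - cy) ** 2)
    dx, dy = mx - cx, my - cy
    return (x_min + dx, x_max + dx, y_min + dy, y_max + dy)
```

For example, a template gate `(100, 300, 100, 300)` with detected cluster means `[(220, 190), (700, 800)]` is moved to `(120, 320, 90, 290)`. A production rule would likely also rescale the gate using the cluster covariance, which this sketch deliberately omits.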
After gating, the characteristics (parameters) of each subpopulation are captured for analysis. Each node in the gating tree has an associated SVM, which is defined on the gated data present at that node. The SVM associated with a specific subpopulation is trained to analyze the distribution patterns in that subpopulation's data and to provide a quantitative assessment of its normality or abnormality.
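Before an SVM can assess a distribution pattern, the gated events at a node have to be summarised as a fixed-length feature vector. A common choice, assumed here for illustration rather than taken from the text, is a normalised two-dimensional histogram over the node's plot. The bin count, plot ranges, and the name `histogram_features` are hypothetical.

```python
from typing import List, Sequence, Tuple


def histogram_features(events: Sequence[Tuple[float, float]],
                       x_range: Tuple[float, float],
                       y_range: Tuple[float, float],
                       bins: int = 4) -> List[float]:
    """Normalised bins x bins histogram of 2-D events, flattened row-major.

    Events outside the plot ranges are ignored; the returned fractions sum
    to 1 whenever at least one event falls inside the ranges."""
    x_lo, x_hi = x_range
    y_lo, y_hi = y_range
    counts = [0.0] * (bins * bins)
    n = 0
    for x, y in events:
        if not (x_lo <= x < x_hi and y_lo <= y < y_hi):
            continue
        # Bin indices, clamped in case floating-point rounding reaches `bins`.
        i = min(int((x - x_lo) / (x_hi - x_lo) * bins), bins - 1)
        j = min(int((y - y_lo) / (y_hi - y_lo) * bins), bins - 1)
        counts[i * bins + j] += 1.0
        n += 1
    return [c / n for c in counts] if n else counts
```

Because the histogram is normalised, cases with different event counts at a node produce comparable vectors, which is what a per-node classifier needs.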
The SVM input is not limited to the 2D plot. Any combination of the parameters, as well as the gated populations at each node, can be used for SVM learning and subsequent SVM classification. The system may use different types of SVMs, such as C-SVM, nu-SVM, and single-class SVM.
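As a self-contained stand-in for the C-SVM, nu-SVM, and single-class SVM variants named above, the sketch below trains a linear soft-margin SVM in the primal with Pegasos-style stochastic sub-gradient descent. This is a deliberately swapped-in, simpler technique: it has no kernel, folds the bias into the weights as a regularised constant feature, and its hyperparameters and function names are hypothetical. Labels are assumed to be +1 for abnormal and -1 for normal.

```python
import random
from typing import List, Sequence


def _augment(x: Sequence[float]) -> List[float]:
    # Append a constant 1.0 so the last weight acts as a (regularised) bias.
    return list(x) + [1.0]


def train_linear_svm(X: List[Sequence[float]], y: List[int],
                     lam: float = 0.01, epochs: int = 200,
                     seed: int = 0) -> List[float]:
    """Pegasos-style SGD on the primal linear SVM objective
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i)), with y_i in {+1, -1}."""
    rng = random.Random(seed)
    data = [_augment(x) for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # Shrink w: the gradient step for the regulariser term.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # If the hinge loss is active, step toward y_i * x_i.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def decision_value(w: List[float], x: Sequence[float]) -> float:
    """Linear score; a positive value falls on the 'abnormal' side."""
    return sum(wj * xj for wj, xj in zip(w, _augment(x)))
```

In a per-node setting, `X` would hold one feature vector per training case for that node and the signed `decision_value` could serve as the node's quantitative abnormality score. A kernel SVM library would be the natural replacement once non-linear boundaries or the nu and single-class formulations are needed.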
Additional features of the software system include functions to import data, make gating adjustments, perform SVM analysis, and present results graphically.
The distributed system of SVM-based analysis nodes will provide a quantitative indication of abnormality for an entire case.
In an embodiment of the software system, different visualization methods for displaying data may be included. In addition to traditional 2D plots, 3D plots are available, as illustrated in
A key goal of the automated flow cytometry analysis system is to allow laboratory technicians to more readily identify cases requiring pathologist review. This is achieved in part by displaying abnormal plots and values with a visually distinguishable feature, such as a distinctive font color or highlighting (e.g., red), in the display of the analysis results.
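The highlighting logic can be reduced to comparing each node's abnormality score with a threshold and marking the corresponding displayed value. In this hypothetical text-mode sketch, a score above zero counts as abnormal, `**` markers stand in for red highlighting, and a case is flagged for review if any value is abnormal; the threshold, the marker, and the function names are assumptions, and the scores in the usage example are illustrative only.

```python
from typing import List, Tuple

Row = Tuple[str, float, float]  # (subpopulation, percent, abnormality score)


def render_results(rows: List[Row], threshold: float = 0.0) -> List[str]:
    """Format rows for display, marking abnormal values with '**'."""
    lines = []
    for name, percent, score in rows:
        value = f"{percent:.2f}"
        if score > threshold:
            value = f"**{value}**"  # stand-in for red or highlighted text
        lines.append(f"{name}: {value}")
    return lines


def needs_review(rows: List[Row], threshold: float = 0.0) -> bool:
    """True if any subpopulation in the case is scored as abnormal."""
    return any(score > threshold for _, _, score in rows)
```

With illustrative input `[("Lymphocytes", 42.7, 1.3), ("Granulocytes", 48.1, -0.8)]`, only the lymphocyte value is marked, and the case is flagged for pathologist review.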
Plot 614 illustrates the results of gating on FS INT LIN and SS INT LIN. Because this gating did not produce abnormal results, the plot is not highlighted, as indicated by the clear upper bar 616 of the plot. Table 618 in the display provides the numerical results for each subpopulation. Again, because of the abnormal value for lymphocytes, the displayed value is highlighted to indicate to the user that an abnormal value was measured. On a color display, the number “42.70” might appear in red or some other color to distinguish it from the other values. For purposes of illustration, the value is shown underlined, bolded and in italics. Analysis of the subpopulations shown in plot 610 included further gating of the lymphocytes, the numerical results of which are displayed in table 620 of the display. As described above, each sub-subpopulation is analyzed by a separate node that branches off from the node that performed the initial gating and analysis. In the example, lymphocytes are gated into subpopulations of T-cells (CD2, CD3), B-cells (CD19, CD20), NK-cells (CD16, (CD3-CD56)), and pre-B cells (CD10+CD19). The numerical results are entered into table 620, with the abnormal results relating to B-cells indicated by highlighting of values 622 and 624 in the display. In table 630 of the display, another abnormal value, for CD4-CD8, is highlighted.
As will be apparent from the foregoing examples and accompanying figures, any combination of parameters may be used to automatically analyze flow cytometry data. Each parameter is separately
In some embodiments, the system is configured to maintain a database to collect data from analyzed cases. (See, e.g., database 130 in
The software preferably includes user instructions with reminders to save the data at the conclusion of an analysis. For multiple analyses of the same case, options are available to overwrite the old data or to save both versions of the data.
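The overwrite-or-keep-both choice maps naturally onto a versioned table keyed by case and version. The following sketch uses Python's built-in `sqlite3` module with a hypothetical `analyses` schema: overwriting replaces the most recent stored analysis in place, while keeping both stores the new analysis as the next version. The schema and function names are assumptions, not the database design of the described system.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table holding one row per analysis version."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS analyses ("
        " case_id TEXT NOT NULL,"
        " version INTEGER NOT NULL,"
        " payload TEXT NOT NULL,"
        " PRIMARY KEY (case_id, version))")


def save_case(conn: sqlite3.Connection, case_id: str, payload: str,
              overwrite: bool) -> int:
    """Save an analysis of `case_id`; returns the version number written."""
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM analyses WHERE case_id = ?",
        (case_id,)).fetchone()
    if overwrite and latest:
        # Replace the most recent stored analysis of this case.
        conn.execute(
            "UPDATE analyses SET payload = ? WHERE case_id = ? AND version = ?",
            (payload, case_id, latest))
        version = latest
    else:
        # Keep earlier analyses and store this one as a new version.
        version = latest + 1
        conn.execute(
            "INSERT INTO analyses (case_id, version, payload) VALUES (?, ?, ?)",
            (case_id, version, payload))
    conn.commit()
    return version
```

A save prompt at the end of an analysis would simply call `save_case` with the user's choice for `overwrite`.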
To ensure the integrity and security of the software system, a preferred embodiment of the software system includes a real-time authentication function. An authentication server is established to process the authentication requests. The client software communicates with the server over the Internet through a secure protocol.
In some embodiments, the analysis may be performed on a client machine that is remote from the laboratory in which the flow cytometry instrumentation resides. For example, the raw data may be processed and transmitted via a network to one or more remote locations. The flow cytometry analysis software running on a client machine will be required to complete authentication before it is permitted to begin normal operations.
In one embodiment, the client will transmit an encrypted message to the server containing the following fields:
Nonce
Timestamp
Account
Usage
Software signature
Hardware signature
Upon receiving the authentication request, the server will verify each of the fields. If the authentication is successful, the server will send an encrypted authentication message that matches the request back to the client. This protocol is designed to prevent a “replay attack”. The use of nonce and timestamp will ensure that the messages are unique even for the same client.
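The request/response exchange can be sketched with a pre-shared key, a random nonce, and a timestamp: the server rejects messages that fail integrity checks, fall outside a time window, reuse a nonce, or carry unexpected signatures. This is a simplified, hypothetical illustration. The text calls for encrypted messages, whereas this sketch only authenticates them with HMAC-SHA256 from the standard library (confidentiality would have to come from the secure transport protocol). The field names mirror the list above, but the key handling, the 300-second window, and all function and class names are assumptions, and the usage field is carried but not checked.

```python
import hashlib
import hmac
import json
import secrets
import time
from typing import Dict, Optional, Tuple


def _sign(key: bytes, body: bytes) -> str:
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def make_request(key: bytes, account: str, usage: int, software_signature: str,
                 hardware_signature: str, now: Optional[float] = None) -> dict:
    """Client side: build and authenticate an authentication request."""
    fields = {
        "nonce": secrets.token_hex(16),  # unique per request
        "timestamp": int(time.time() if now is None else now),
        "account": account,
        "usage": usage,
        "software_signature": software_signature,
        "hardware_signature": hardware_signature,
    }
    body = json.dumps(fields, sort_keys=True).encode()
    return {"body": body, "mac": _sign(key, body)}


class AuthServer:
    """Server side: verifies requests and rejects replays (hypothetical)."""

    def __init__(self, key: bytes, accounts: Dict[str, Tuple[str, str]],
                 max_skew: float = 300.0):
        self.key = key
        self.accounts = accounts  # account -> (software sig, hardware sig)
        self.max_skew = max_skew
        self.seen_nonces = set()

    def handle(self, request: dict, now: Optional[float] = None) -> Optional[dict]:
        now = time.time() if now is None else now
        if not hmac.compare_digest(_sign(self.key, request["body"]), request["mac"]):
            return None  # tampered or wrong key
        fields = json.loads(request["body"])
        if abs(now - fields["timestamp"]) > self.max_skew:
            return None  # stale or future-dated message
        if fields["nonce"] in self.seen_nonces:
            return None  # replayed message
        expected = self.accounts.get(fields["account"])
        if expected != (fields["software_signature"], fields["hardware_signature"]):
            return None  # unknown account or altered software/hardware
        self.seen_nonces.add(fields["nonce"])
        reply = json.dumps({"nonce": fields["nonce"], "status": "ok"},
                           sort_keys=True).encode()
        return {"body": reply, "mac": _sign(self.key, reply)}
```

Echoing the client's nonce in the signed reply lets the client confirm that the response matches its own request, which is the property the protocol relies on to defeat replay.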
The authentication function will help provide assurance that the software has not been altered maliciously, the software is properly licensed, the system is configured properly in a conforming environment, and all analyzed cases are accounted for.
Flow cytometric immunophenotyping is an accurate and highly sensitive method for detecting quantitative and qualitative abnormalities in hematopoietic cells, even when combined morphology and cytogenetics are non-diagnostic. The automated flow cytometry data analysis system disclosed herein provides the ability to automatically analyze the huge volumes of data generated during flow cytometry measurement, enhancing the accuracy, repeatability and versatility of flow cytometric methods. The capability provided by the methods disclosed herein not only enhances the diagnostic value of flow cytometry but also expands research applications of the technique by enabling the collection and analysis of massive amounts of flow cytometry data from many subjects for data mining and pattern recognition well beyond the scope of current approaches.
This application claims the benefit of the priority of U.S. Provisional Application No. 62/090,316, filed Dec. 10, 2014, which is incorporated herein by reference in its entirety. This application is also related to the subject matter of U.S. Pat. No. 8,628,810, the disclosure of which is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
62090316 | Dec 2014 | US