This invention relates to diagnosis and evaluation of disease. In particular, this invention relates to systems for supporting medical decisions based on multiple sets of information relating to a patient's condition. More particularly, this invention relates to systems for supporting medical decisions based on genetic information and clinical information.
Medical diagnosis and evaluation of a patient's condition are of great concern to medical professionals. Much time is spent on obtaining information from a patient during visits to practitioners. Medical history is often a major component of a proper diagnosis. Additionally, a practitioner may request specific physiological or pathophysiological measurements be made, to facilitate understanding the patient's condition. Such clinical information has historically been a powerful tool to provide proper diagnosis and evaluation of therapy.
With the increasingly widespread acquisition of genetic information from patients, a practitioner now has an opportunity to combine genetic information with clinical information to further improve the accuracy of diagnosis and evaluation of therapy. Proper diagnosis and monitoring are necessary for a practitioner to make the best-informed decisions about either initiating a course of therapy or altering a therapeutic regimen to best suit a particular patient's needs.
However, there are few systems available that can be used to incorporate genetic information with a patient's clinical information to provide the practitioner with rapid, reliable information to assist in the decision making process.
In order to influence patient management in a clinical environment, medical decision support systems must have a high level of confidence. Examples of such medical systems are:
Many existing medical decision support systems use only one source of information, and thus the confidence of their models is not very high. For example, Shipp et al. [1] described a system that uses machine learning techniques, specifically a weighted voting algorithm and support vector system, for prognostic stratification of patients based on gene expression data taken from 58 samples comprising 32 cured and 26 fatal cases. However, their approach misclassified 23% of the patients in terms of predicting the outcome of their chemotherapy treatment. They achieved 77.6% correct prognosis of both cured and fatal cases of B-cell lymphoma cancer. The models for a similar task presented in Alizadeh et al. [2] are not clinically applicable for classification purposes. Thus, there is a need for improved reliability of medical decision support systems that can overcome the shortcomings in the art.
In another aspect of the problem, there are no existing methods that help discover relationships between gene expression and clinical parameters, and that would thereby make possible a personalised treatment of patients with different gene expression profiles according to their clinical parameters (e.g. age). It is known that some genes change their expression activity in one person over time and in different environments. Thus, there is a need for methods and systems that facilitate the discovery of gene profiles related to clinical data.
Embodiments of this invention include novel methods for increasing the confidence of medical decision support systems by developing independent models based on (1) gene expression data and (2) clinical information, and combining them at a higher level. The method can be extended by adding other sources of information (e.g. demographic). The invention is also concerned with a novel method for the discovery of relationships between gene expression patterns and clinical parameters, thus making a personalised (or clinical-group-specific) treatment possible.
We have improved the prediction accuracy of a medical decision support system with the use of multiple types or classes of information about a subject's or patient's condition. In certain embodiments, gene expression information and the available clinical information are used to diagnose disease and to predict outcomes. In general, we can use different types of classifiers/predictors that relate to different sources of information or different classes of information. Each of the classifiers may obtain good results for part of the overall problem, for example, for a particular class, but their combination provides better accuracy than any of them used individually.
This invention is described with reference to specific embodiments thereof. A more complete understanding of the systems and methods can be appreciated by referring to the Figures, in which:
a depicts schematically a Venn diagram showing how two models, gene information and clinical information, can have different accuracy depending upon grouping of samples.
b depicts a schematic diagram of how gene expression information and clinical information are correlated according to this invention in a hierarchical fashion with each other to produce a medical decision.
Medical Decision Support Systems
A combined model for an improved decision support system consists in general of three modules: one that operates on independent microarray gene expression information, another that operates on independent clinical information, and a third that operates on integrated gene expression and clinical information for each patient. A system may contain one, two, or three of the described modules, but it can also contain more modules if more sources of information are available.
A higher-level integration module combines information from two or more lower modules to produce the final prognosis for the outcome of the disease for a particular patient. Based on the prognosis suggested by the system, the best available treatment is selected.
It can be appreciated that other systems based on multiple-class analysis can be developed using the fundamental methods of this invention. The number of classifier/predictor modules can depend on the number of different classes or types of information available.
An illustration of the method is given in
Evolving Connectionist Systems
In certain embodiments, the practitioner uses adaptive learning, evolving connectionist systems (ECOS), in particular an evolving fuzzy neural network system (EFuNN), together with algorithms for gene expression profile (rule) extraction from ECOS [3, 4], and applies the proposed novel methodology for combining gene expression processing systems with clinical information processing systems as described further herein. The method allows for different modes of combination of gene expression and clinical data, as well as for adding new data and modules over time, thus adjusting and improving the system. With the use of the proposed method, the accuracy of the prognosis increases.
Using one of the modes of integration, namely an evolving connectionist system (“ECOS”) trained on the integrated input vector that combines both gene expression and clinical information, rules can be extracted that relate gene expression to clinical information, so that a personalised treatment can be applied for patients in a certain group of clinical parameters (e.g. age) having a particular pattern of genes expressed in their tissue. Evolving connectionist systems are multi-modular, and can be especially useful as architectures that facilitate modelling of evolving processes and knowledge discovery. They are described further in PCT publication WO 01/78003, incorporated herein fully by reference.
Briefly, an ECOS may consist of many evolving connectionist modules. An ECOS is a neural network system that operates continuously in time and adapts its structure and functionality through a continuous interaction with the environment and with other systems according to: (i) a set of parameters P that are subject to change during the system operation; (ii) an incoming continuous flow of information with unknown distribution; (iii) a goal (rationale) criteria (also subject to modification) that is applied to optimise the performance of the system over time [7].
Evolving connectionist systems can have the following specific characteristics:
An ECOS operates in two distinct phases. During the first phase (learning), data vectors are fed into the system one by one with their known output values. In the second phase (recall), a new vector is presented to the system, which calculates the output values for it.
There are different models of ECOS. One model, evolving fuzzy neural networks (EFuNN), is presented in
While a “neural network module” may refer to any neural network satisfying the requirements of the aspects of the invention, an ECOS neural network is desirably used in certain embodiments. In some of these embodiments, the neural network is as exemplified in PCT publication WO 01/78003 (incorporated herein fully by reference), which describes the algorithm further; the algorithm is set out schematically below.
EFuNN Architecture
EFuNNs useful for embodiments of this invention can have a five-layer structure (
The input layer represents input variables. The second layer of nodes (fuzzy input neurons, or fuzzy inputs) represents fuzzy quantification of each input variable space. For example, two fuzzy input neurons can be used to represent “small” and “large” fuzzy values. Different membership functions (MF) can be attached to these neurons (e.g., triangular, Gaussian, etc. [6, 7]). The number and the type of MF can be dynamically modified. The task of the fuzzy input nodes is to transfer the input values into membership degrees to which they belong to the corresponding MF. The layers that represent fuzzy MF are optional, as a non-fuzzy version of EFuNN can also be evolved with only three layers of neurons and two layers of connections.
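The fuzzification performed by the fuzzy input neurons can be illustrated with a minimal sketch. It assumes triangular membership functions and an input variable normalised to the range 0 to 1, with “small” centred at 0 and “large” centred at 1; the function names `triangular` and `fuzzify` and the choice of centres are illustrative assumptions, not taken from the specification.

```python
def triangular(x, left, centre, right):
    """Triangular membership function with its peak (degree 1) at `centre`."""
    if x <= left or x >= right:
        return 0.0
    if x <= centre:
        return (x - left) / (centre - left)
    return (right - x) / (right - centre)


def fuzzify(x, centres):
    """Membership degree of x in each fuzzy label, one label per centre.

    Assumption: each triangle's feet sit on the neighbouring centres, and the
    outermost triangles are extended symmetrically beyond the first and last centre.
    """
    degrees = []
    for i, c in enumerate(centres):
        left = centres[i - 1] if i > 0 else c - (centres[1] - centres[0])
        right = centres[i + 1] if i < len(centres) - 1 else c + (centres[-1] - centres[-2])
        degrees.append(triangular(x, left, c, right))
    return degrees


# Two fuzzy input neurons, "small" (centre 0.0) and "large" (centre 1.0).
small, large = fuzzify(0.25, [0.0, 1.0])
```

With these assumed centres, an input of 0.25 belongs mostly to “small” and only partly to “large”; the two degrees form the fuzzy input vector passed on to the rule nodes.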
The third layer contains rule (case) nodes that evolve through supervised and/or unsupervised learning. The rule nodes represent prototypes (exemplars, clusters) of input-output data associations that can be graphically represented as associations of hyper-spheres from the fuzzy input and the fuzzy output spaces. Each rule node r is defined by two vectors of connection weights—W1(r) and W2(r), the latter being adjusted through supervised learning based on the output error, and the former being adjusted through unsupervised learning based on similarity measure within a local area of the problem space. A linear activation function, or a Gaussian function, is used for the neurons of this layer.
The fourth layer of neurons represents fuzzy quantization of the output variables, similar to the input fuzzy neuron representation. Here, a weighted sum input function and a saturated linear activation function is used for the neurons to calculate the membership degrees to which the output vector associated with the presented input vector belongs to each of the output MFs. The fifth layer represents the values of the output variables. Here a linear activation function is used to calculate the defuzzified values for the output variables.
A partial case of EFuNN would be a three-layer network without the fuzzy input and the fuzzy output layers. In this case, slightly modified versions of the algorithms described below are applied, mainly in terms of measuring Euclidean distance and using Gaussian activation functions.
Evolving learning in EFuNNs is based on either of the following two assumptions:
Each rule node, e.g. rj, represents an association between a hyper-sphere from the fuzzy input space and a hyper-sphere from the fuzzy output space (see
The pair of fuzzy input-output data vectors (xf, yf) will be allocated to the rule node rj if xf falls into the rj input receptive field (hyper-sphere), and yf falls in the rj output reactive field hyper-sphere. This is ensured through two conditions: that a local normalised fuzzy difference between xf and W1(rj) is smaller than the radius Rj, and that the normalised output error Er=||y−y′||/Nout is smaller than an error threshold E. Nout is the number of outputs and y′ is the output produced by EFuNN. The error parameter E sets the error tolerance of the system.
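The two allocation conditions can be sketched as follows. This is only an illustration under stated assumptions: the local normalised fuzzy difference is taken to be the sum of absolute differences divided by the sum of the two vectors, and the norm in Er is taken to be Euclidean; neither form is spelled out in the text above. The function names are hypothetical.

```python
import math


def fuzzy_difference(xf, w1):
    """Assumed form of the local normalised fuzzy difference: sum|a-b| / sum(a+b)."""
    num = sum(abs(a - b) for a, b in zip(xf, w1))
    den = sum(a + b for a, b in zip(xf, w1))
    return num / den if den else 0.0


def normalised_error(y, y_pred):
    """Er = ||y - y'|| / Nout, with the norm assumed to be Euclidean."""
    norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_pred)))
    return norm / len(y)


def can_allocate(xf, y, y_pred, w1, radius, err_threshold):
    """True if the example satisfies both conditions for rule node rj."""
    inside_field = fuzzy_difference(xf, w1) < radius            # condition on Rj
    small_error = normalised_error(y, y_pred) < err_threshold   # condition on E
    return inside_field and small_error


# Hypothetical example: a fuzzified input close to the node's W1 vector.
ok = can_allocate(xf=[0.75, 0.25], y=[1.0, 0.0], y_pred=[0.9, 0.1],
                  w1=[0.7, 0.3], radius=0.1, err_threshold=0.1)
```

Under these assumptions the example falls inside the node's receptive field and within the error tolerance, so it would be associated with the existing rule node.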
Another example of a neural network module for some aspects of the invention is an evolving classification function (“ECF”), which can be used to classify data. The learning sequence of each iteration of an ECF is described in the following steps:
A recall (classification phase of new input vectors) in ECF is performed in the following way:
The above-described ECF for classification has several parameters that need to be optimized according to the data set used. These are:
These parameters can be optimized with the use of evolutionary computation methods, or other statistical methods, as described in [7].
An important characteristic of ECOS is that they can be used to extract rules that associate input variables (e.g. genes) to output variables (e.g. class categories). Each node in the hidden layer of the ECOS represents the center of a cluster of similar samples and can be expressed semantically as a rule. Each rule relates to the pattern of input feature levels for one or more samples belonging to a particular class from the data set. An example of what a rule might look like when extracted from the EFuNN is shown below:
The rules are then analysed in order to identify a set of variables that are significant in distinguishing between classes. The rule extraction method described above and in reference [3] can be applied to gene expression profiling of disease as described herein and in reference [4], to find patterns of significantly expressed genes in a cluster of diseased tissues. We have unexpectedly found that ECOS described in PCT WO 01/78003 [3], and the profiling method described in PCT/480030 [4], are particularly suited for complex disease profiling based not only on gene expression information, but on a variety of information sources, including gene expression data, protein data, clinical data (for example, the IPI (International Prognostic Index) number; see, e.g., [1]), etc. Here we combine these sources of information through ECOS to create new, more efficient prognostic and classification systems for medical applications. Using these new systems and methods enables one to discover previously unidentifiable hidden relationships between sets of genes and clinical information.
EFuNN Learning Algorithm
To implement an EFuNN learning algorithm in the methods and systems of this invention, set initial values for the system parameters: number of membership functions; initial sensitivity thresholds (default Sj=0.9); error threshold E; aggregation parameter Nagg (the number of consecutive examples after which aggregation is performed); pruning parameters OLD and Pr; a value for m (in m-of-n mode); maximum radius limit Rmax; and thresholds T1 and T2 for rule extraction.
Set the first rule node r0 to memorise the first example (x,y):
W1(r0)=xf, and W2(r0)=yf′
In certain embodiments, EFuNN techniques have certain advantages when compared with the traditional statistical and neural network techniques, including: (i) they can have a flexible structure that reflects the complexity of the data used for their training; (ii) they can perform both clustering and classification/prediction; (iii) models can be adapted on new data without the need to be retrained on old data; (iv) they can be used to extract rules (profiles) of different sub-classes of samples.
a illustrates the method from
b depicts a block diagram of the method from
b depicts a three-layered system of this invention. The two modules from
It can be appreciated that additional layers can be applied depending on the numbers of modules and the types of classifiers or predictors available to be used. For example, in other embodiments, neural networks, support vector machines, rule-based systems, decision trees, and statistical methods can be used to construct a more complex medical decision support system. Thus, it can be appreciated that medical decision support systems can have more than three layers.
A first layer constitutes the modules themselves, each of them trained and tested with parameters optimised on the different data sets available (the different sources of information). It may be desirable to try different classification/prediction models for each of the modules and then choose the best one in terms of smallest residual error. Error minimization methods are well known in the art and need not be described further herein.
A second layer constitutes classes. All the class-elements of the second layer are fully connected with the module-elements of the first layer. A third layer in this example is the final outcome element, a combined output from all class-elements of the second layer. The third layer element is connected fully to the previous layer elements.
Elements from the different layers are connected through connection weights β1, 1-β1, β2, 1-β2, and α as shown in
(1) The first method is based on an exhaustive search in the parameter values space, so for every combination of the parameter values a new system is generated and tested. The parameter values that give the system the highest accuracy are selected for the use in the medical decision support system in a clinical environment.
For the B-Lymphoma case study [1], values for the IPI (International Prognostic Index) are available for each of the 58 samples. Samples were stratified into 5 strata according to their IPI as follows:
Using the IPI stratification as a single prediction factor, a Bayesian classifier gives a prediction accuracy of 73.2%. Using gene expression data, an EFuNN classifier gives an accuracy of 78.5%. The combined model, for β1=β2=0.75 and α=0.4, gives an overall accuracy of 87.5% if 56 examples are used for the first module. If the number of IPI examples increases, the overall accuracy of the first module increases, because the IPI module is independent from the gene expression module (see reference [1]).
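The exhaustive search of the first method can be sketched as a grid search over β1, β2 and α. The sketch assumes a two-module, two-class system in which β1 and 1−β1 weight the two modules' class A outputs, β2 and 1−β2 weight their class B outputs, and α and 1−α weight the two class elements at the output; this wiring is an assumption, since the figure defining the connections is not reproduced here. The sample scores are invented toy values, not the B-cell lymphoma data.

```python
from itertools import product


def combine(m1, m2, beta1, beta2, alpha):
    """m1, m2: (class A score, class B score) from modules 1 and 2. Returns 'A' or 'B'."""
    class_a = beta1 * m1[0] + (1 - beta1) * m2[0]
    class_b = beta2 * m1[1] + (1 - beta2) * m2[1]
    return "A" if alpha * class_a >= (1 - alpha) * class_b else "B"


def accuracy(samples, beta1, beta2, alpha):
    """Fraction of samples whose combined decision equals the known label."""
    hits = sum(combine(m1, m2, beta1, beta2, alpha) == label
               for m1, m2, label in samples)
    return hits / len(samples)


def exhaustive_search(samples, step=0.05):
    """Try every (beta1, beta2, alpha) on a regular grid and keep the most accurate."""
    n = int(round(1 / step))
    grid = [round(i * step, 10) for i in range(n + 1)]
    return max(product(grid, repeat=3), key=lambda p: accuracy(samples, *p))


# Hypothetical toy data: (module 1 scores, module 2 scores, true class).
samples = [
    ((0.9, 0.1), (0.4, 0.6), "A"),
    ((0.2, 0.8), (0.6, 0.4), "B"),
    ((0.7, 0.3), (0.3, 0.7), "A"),
    ((0.4, 0.6), (0.2, 0.8), "B"),
]
best = exhaustive_search(samples)
best_accuracy = accuracy(samples, *best)
```

In practice the accuracy would be estimated on held-out or cross-validated samples rather than on the data used to fit the modules; the sketch omits this for brevity.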
(2) The second method is a statistically based specialization method used for the selection of optimal parameter values. Each class output of each module is weighted with the normalised class accuracy calculated for this module across all the modules in the system. Continuous output values for the class outputs (e.g. 0.8 rather than 1) are multiplied by the weights, and the sum of the weighted output values from all modules constitutes the final output value for the class. The class with the highest output value is chosen. This is similar to the principle of statistically based specialisation [5]. The method is illustrated by the following example.
Step 1: Assume that the combined model consists of three modules:
Step 2: Assume that each of the three modules produces a different prognostic accuracy as follows: module one: 90% (88% and 92% for each class respectively); module two: 70% (65% and 75% for each class respectively); and module three: 80% (75% and 85% for each class respectively). By using a combined output at a higher-level decision module for the final prognostic evaluation, the total accuracy increases to 92% as explained below.
Having quantified the accuracy of each of the three modules for each of the two classes, the following connection weights are calculated for each of the three modules for class A:
The following weights are calculated for each of the three modules for class B:
Based on the above coefficients, one calculates the accuracy of the classification of samples for each of the classes A and B, for example, 0.8 and 0.9.
The final decision in the third-layer output element can be taken either as the maximum value among the calculated class outputs of the class elements, or by applying calculated third-layer coefficients for combining the class outputs, as follows:
If for a new input vector the corresponding output values for each class of each of the three modules are respectively: 0.6 and 0.5; 0.4 and 0.6; 0.5 and 0.5, the output value for class A is calculated as:
(0.6×0.39)+(0.4×0.28)+(0.5×0.33)=0.51.
The output value for class B is calculated as:
(0.5×0.35)+(0.6×0.325)+(0.5×0.325)=0.53.
The predicted outcome according to the maximizing strategy is class B.
According to the weighted output strategy, the output will again be class B, as output class A will get a final evaluation of 0.51×0.47=0.24, and output class B will get an evaluation of 0.53×0.53=0.28.
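The arithmetic of this example can be reproduced with a short sketch. The module weights and module outputs are copied from the calculations above. The third-layer coefficients 0.47 and 0.53 are assumed here to be the example class accuracies 0.8 and 0.9 normalised to sum to one, which matches the figures used but is not stated explicitly; the function names are illustrative.

```python
def normalise(values):
    """Scale values so that they sum to one (normalised class accuracy)."""
    total = sum(values)
    return [v / total for v in values]


def class_output(module_outputs, weights):
    """Weighted sum of the module outputs for one class."""
    return sum(o * w for o, w in zip(module_outputs, weights))


# Normalising the class A accuracies of the three modules (88%, 65%, 75%)
# gives roughly 0.386, 0.285 and 0.329, close to the weights used in the text.
class_a_weights_from_accuracy = normalise([88, 65, 75])

# Weights and module outputs exactly as used in the worked example.
weights_a = [0.39, 0.28, 0.33]
weights_b = [0.35, 0.325, 0.325]
outputs_a = [0.6, 0.4, 0.5]   # class A outputs of modules one to three
outputs_b = [0.5, 0.6, 0.5]   # class B outputs of modules one to three

out_a = class_output(outputs_a, weights_a)
out_b = class_output(outputs_b, weights_b)

# Maximising strategy: pick the class with the larger combined output.
max_decision = "A" if out_a > out_b else "B"

# Weighted output strategy: scale each class output by its third-layer coefficient
# (assumed to be normalise([0.8, 0.9])).
coeff_a, coeff_b = normalise([0.8, 0.9])
final_a = out_a * coeff_a
final_b = out_b * coeff_b
weighted_decision = "A" if final_a > final_b else "B"
```

Both strategies select class B for this input vector, in agreement with the text; the sketch multiplies unrounded intermediate values, so its results agree with the printed figures only after rounding to two decimals.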
(3) In a third method, a combined system is interpreted as a multi-layer perceptron as described below and shown in
Parameter values attached to the connections in a multi-layer perceptron (MLP) neural network structure (see the example shown in
The invention is also concerned with the discovery of patterns of gene expression in clusters of tissues (groups of samples that have similar gene expression profiles) related to a particular pattern of clinical parameters (e.g. low IPI, old age and low blood pressure). This is an important discovery because different clinical groups may have different patterns of gene expression, which makes the process of finding common genes and drug targets for the whole population impossible. The method is described below:
Methods for Discovering Patterns of Genes Related to a Group of Patients Characterized by a Particular Clinical Parameter
One can also use an ECOS module with the combined input vector (N gene expression variables and M clinical variables) to derive patterns of gene expression as they relate to clinical findings. After a module is trained on data as described above, rules that represent clusters of data in the input space (which is the combined gene expression and clinical variables space) can be extracted as described above. As applied to a combined inputs module, these rules will have both gene variables and clinical variables that define the cluster of data samples expressed by this rule, thus uncovering the relationship between the genes and the clinical parameters as captured in the rule. For example:
The above rule is interpreted as follows: for young people with low IPI, if gene 1 has low expression and genes 15 and 32 are highly expressed, then the chances of the person surviving after the treatment are very high, measured as 90%.
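A rule of this kind, with both clinical and gene variables in its antecedent, can be represented and applied as in the following minimal sketch. The data structure, the variable names (`age`, `IPI`, `gene_1`, and so on) and the use of a single dominant fuzzy label per variable are illustrative assumptions; they are not the rule format produced by the ECOS rule extraction algorithm.

```python
# Hypothetical encoding of the rule interpreted above.
rule = {
    "if": {"age": "young", "IPI": "low",
           "gene_1": "low", "gene_15": "high", "gene_32": "high"},
    "then": ("survive", 0.90),
}


def matches(rule, sample):
    """True if every variable in the rule's antecedent has the required fuzzy label."""
    return all(sample.get(var) == label for var, label in rule["if"].items())


def split_antecedent(rule, clinical_vars=("age", "IPI")):
    """Separate the clinical conditions from the gene expression profile of a rule."""
    clinical = {v: l for v, l in rule["if"].items() if v in clinical_vars}
    genes = {v: l for v, l in rule["if"].items() if v not in clinical_vars}
    return clinical, genes


# A hypothetical patient, with each variable already reduced to its dominant fuzzy label.
patient = {"age": "young", "IPI": "low", "gene_1": "low",
           "gene_15": "high", "gene_32": "high", "gene_7": "low"}

applies = matches(rule, patient)
clinical_group, gene_profile = split_antecedent(rule)
```

Splitting the antecedent in this way makes the relationship explicit: the clinical part identifies the patient group, and the gene part is the expression profile associated with that group.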
The descriptions and examples herein are intended to illustrate embodiments of the invention, and are not intended to be limiting to the scope of the invention. Other embodiments based on the descriptions, examples and Figures can be produced and practiced by those of ordinary skill in the art. All of those embodiments are considered to be part of this invention. All references cited herein are incorporated herein by reference in their entirety.
Number | Date | Country | Kind |
---|---|---|---|
60403756 | Aug 2002 | US | national |
This application claims priority to U.S. Provisional Patent Application Ser. No. 60/403,756, filed Aug. 15, 2002, incorporated herein fully by reference.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US03/25563 | 8/15/2003 | WO | | 8/29/2005 |