The subject matter described herein relates to methods for predicting disease risk, prognosis, and best treatment regimens in clinical subjects. The methods involve evaluating a subject's non-invasively obtained imaging features in view of an association map that correlates imaging features with biological data.
Scientists and clinicians routinely use non-invasive imaging to detail the physical and structural composition of living matter. Assessing the genetic and biochemical makeup of living tissue through non-invasive imaging is a desirable goal of current research. Recent developments in genomic and proteomic methods have enabled molecular profiling of biological specimens by simultaneously revealing the expression level of thousands of genes and proteins. For example, gene expression patterns of cancer can reveal its etiology, prognosis, and therapeutic potential (Chung, C. H. et al., Nat. Genet., 32 Suppl.:533-540 (2002); Segal, E. et al., Nat. Genet., 37 Suppl.:S38-45 (2005); Chen, X. et al., Mol Biol Cell, 13:1929-1939 (2002)).
Current methods of molecular profiling often require invasive surgeries for tissue procurement and specialized equipment, thus limiting their routine use. In some cases, current profiling methods provide only a single snapshot in time because they are destructive by nature, in that cells must be disintegrated to extract nucleic acids or proteins for analysis. Another barrier to widespread use of molecular profiling is that human tissues exhibit diverse distinctive features on noninvasive radiographic imaging, many of which currently have no known significance. Because imaging features of tissues reflect the dynamic and physiologic interplay of parenchymal cells, blood vessels, and stroma, it would be desirable if imaging features could be used to predict specific gene expression patterns in human diseases.
The foregoing examples of the related art and limitations related therewith are intended to be illustrative and not exclusive. Other limitations of the related art will become apparent to those of skill in the art upon a reading of the specification and a study of the drawings.
The following aspects and embodiments thereof described and illustrated below are meant to be exemplary and illustrative, not limiting in scope.
In one aspect, a method of constructing an association map between imaging features and biological data is provided, comprising:
identifying one or more imaging features from a plurality of images of a subject;
applying an algorithm to identify relationships between the one or more imaging features and biological data relating to the subject, wherein the identified relationships are used to construct an association map between the one or more imaging features and the biological data;
evaluating the statistical significance of the association map to test its predictive value.
In some embodiments, the features from a plurality of images of a subject are associated with a disease.
In some embodiments, the identifying comprises identifying one or more imaging features based on frequency of the one or more features in the plurality of images.
In some embodiments, the identifying comprises identifying one or more imaging features based on its independence from other features.
In some embodiments, the identifying comprises identifying one or more imaging features from images obtained using an imaging technique selected from the group consisting of computerized tomography imaging, magnetic resonance imaging (MRI), positron emission tomography (PET), ultrasonography (US), optical imaging, infrared imaging, and x-ray radiography. In particular embodiments, the imaging technique comprises the use of an imaging agent or image-enhancing agent.
In some embodiments, the applying comprises applying a module networks algorithm.
In some embodiments, the applying comprises applying an algorithm that applies an iterative Bayesian probabilistic procedure that identifies combinations of imaging features that relate to the biological data.
In some embodiments, the applying comprises applying an algorithm to gene expression data.
In some embodiments, the gene expression data is from a DNA microarray assay. In some embodiments, the gene expression data is from a cDNA microarray assay. In some embodiments, the gene expression data is from an RNA microarray assay.
In some embodiments, the applying comprises applying an algorithm to protein expression data.
In some embodiments, the evaluating the statistical significance of the association map comprises evaluating by comparison of the map with permuted data sets.
In some embodiments, the evaluating the statistical significance of the association map comprises evaluating by testing the prediction using an independent biological data set, independent images, or both.
In a related aspect, a method for predicting a gene or protein expression level in a biological sample is provided, comprising:
providing an image of the biological sample,
comparing the image to an association map as above to predict a gene or protein expression of the biological sample.
In some embodiments, the method further comprises, based on the predicting, providing a treatment prognosis of said patient based on the presence and/or absence of certain imaging features.
In some embodiments, the providing comprises providing a prediction of a patient's response to a drug. In some embodiments, the providing comprises providing a prediction of a patient's probable survival. In particular embodiments, the probable survival is disease free survival.
In some embodiments, the providing comprises providing a likelihood of disease recurrence.
In some embodiments, the providing comprises providing a likelihood of metastasis.
In another aspect, an association map constructed using the above method is provided.
In addition to the exemplary aspects and embodiments described above, further aspects and embodiments will become apparent by reference to the drawings and by study of the following descriptions.
In one aspect, a method is provided wherein an image or one or more imaging features is correlated to an association map of imaging features and biological data. The method finds use in various fields, including medical diagnostics and therapeutics. The methods have use in clinical subject/patient disease screening, diagnosis, characterization, and treatment selection.
The method is based on correlating biological data with associated imaging data, to construct a bidirectional association map, as will be illustrated below in Example 1. The biological data for construction of the association map can be obtained from a database or generated from patient biological samples. Databases of polynucleotide and protein expression data are well known. Such gene expression data can also be obtained, for example, using a DNA microarray that surveys the expression levels of thousands of genes simultaneously. For example, a 21-gene assay, termed Oncotype Dx, is a commercially available DNA microarray to determine prognosis and predict response of primary breast tumors to chemotherapy. A 70-gene signature known as Mammaprint is known for use in determining an adjuvant chemotherapeutic regimen in primary breast cancer. Gene expression signatures have also been identified to predict prognosis or therapeutic response in lung cancer, leukemia, and prostate cancer.
Data from any or all of these sources, preexisting or generated for the purpose of building an association map, are examples of biological data suitable for use in the method described herein. It will be appreciated that the gene expression data can be for any tissue source, such as cancerous tissue, tissue associated with a malignant or benign growth, infected tissue, inflamed tissue, and the like. Gene expression data may relate to expression levels, splicing patterns, gene copy number, chromosomal alterations (e.g., deletions, amplifications, inversions, and translocations), single nucleotide polymorphisms, and the like. Gene expression data include epigenetic data, e.g., relating to DNA methylation and histone modifications (e.g. acetylation, methylation, and ubiquitination). Gene expression data may be based on analyses of DNA, cDNA, mRNA, snRNA, iRNA, or other nucleic acids.
Biological data includes data based on protein-based analyses, including protein expression profiles of different tissues (e.g., cancerous, infected, inflamed, etc.). Also contemplated are biological data from Serial Analysis of Gene Expression (SAGE), nuclear magnetic resonance, protein-interaction screens, chromatin immunoprecipitation-chips, isotope coded affinity tagging, activity based reagents, gel or chromatographic separation, RNAi screens, tissue arrays, or mass spectrometry, in which a large number of genes, proteins, or metabolites are measured in a single experiment or assay. Biological data also include data from serological tests, EKGs, EEGs, urinalysis, and other clinical and forensic analyses.
As noted above, the method combines the association map with imaging data. Such imaging data can be obtained from a wide variety of sources, including but not limited to magnetic resonance imaging (MRI), positron emission tomography (PET), computerized tomography (CT), ultrasonography (US), optical imaging, infrared imaging, and x-ray radiography. Imaging can be coupled with drugs or compounds, contrast agents or other agents or stimuli, or medical devices to elicit additional information from the imaging. Images are obtained using these modalities applied to a tissue sample, a lesion, or an organism imaged in whole or in part.
In a general embodiment, the method of constructing an association map comprises providing a plurality of images of, for example, a tissue or a whole or part of an organism, such as a human subject, and biological data that has some relation to the images. For example, images of a solid tumor would preferably be accompanied by biological data based on the imaged solid tumor or on a like solid tumor. That is, images of tumors in the thyroid or images of infected tissue on a limb would have corresponding biological data from thyroid tumors or infected limb tissue, respectively. In a preferred embodiment, the image and the biological data derive from the same tissue or organism; however, a population of images and a population of biological data need not have a one-to-one correspondence.
An exemplary association map relating to human hepatocellular carcinoma is constructed by inspecting the imaging data and identifying distinctive features in the image. Examples of distinctive image features (or traits) for human hepatocellular carcinomas are shown in
Such imaging features (and representative data) are associated with a unique image, imaging study, examination, subject or population, all of which are data relating to the image. Such image data independently or in combination define elements or components of the image, or the composite imaging appearance itself, which are included in the biological data used to construct an association map.
It will be appreciated that a single imaging feature may be sufficient to add value to an association map; however, more (and more detailed) features/data are generally preferred.
In some embodiments, the method of constructing an association map further includes one or both of (i) using an algorithm to identify relationships between one or more imaging features and the biological data and/or (ii) evaluating the statistical significance of the association map.
With respect to (i), algorithms that identify relationships between the imaging features and the biological data are known in the art, and such identified relationships form the basis for constructing an association map between such imaging features and biological data. For example, a module network algorithm is suitable for use (Segal, E. et al., Nat. Genet., 34:166-176 (2003)) wherein the algorithm identifies groups of genes, termed modules, which demonstrate coherent variation in expression across multiple samples. This algorithm further applies an iterative Bayesian probabilistic analysis to identify combinations of imaging features that can predict the expression levels of gene modules.
As used herein, Bayesian probabilistic analysis refers broadly to a genus of related models and their derivatives. Multiple regression analysis and other analyses are known in the art. Classification algorithms such as neural networks, support vector machines, decision trees, Markov networks, and their derivatives may be applied. An exemplary analysis involves application of the Cox proportional hazard model. Other algorithms that can identify multi-way relationships may also be used.
With respect to (ii), evaluating the statistical significance of the association map ensures that the map is applicable to, and predictive for, images and/or biological data that was not used in the construction of the map. Such statistical analysis thereby provides a means to validate the association map as being generally applicable (i.e., generalizable) to other images and biological data.
For example, when two large biological data sets are compared, many apparent associations will occur by chance alone. These spurious associations are not useful, and in fact interfere with the identification of significant (i.e., “real” or “actual”) associations that have predictive value. Thus, a feature of some embodiments of the present method is confirmation of the statistical significance and predictive value of the association map.
Statistical significance can be evaluated in several ways, for example, by comparing the actual/observed association map with theoretical maps derived from modified/permuted data sets, e.g., where the imaging features and biological data have been scrambled. Observation of the same image feature-biological data association at equal frequency using such scrambled data strongly suggests that the image feature or gene module is noisy and non-specific.
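By way of illustration only, the comparison with scrambled data can be sketched as a permutation test. The sketch below assumes a single binary imaging feature (present/absent) and uses the difference in mean module expression between the two groups as the association statistic; the statistic, the function names, and the default of 1000 permutations are assumptions made for this example and are not taken from the method described herein.

```python
import random
from statistics import mean


def group_difference(feature, expression):
    """Absolute difference in mean expression between feature-present and feature-absent samples."""
    present = [e for f, e in zip(feature, expression) if f]
    absent = [e for f, e in zip(feature, expression) if not f]
    if not present or not absent:
        raise ValueError("feature must be present in some samples and absent in others")
    return abs(mean(present) - mean(absent))


def permutation_p_value(feature, expression, n_permutations=1000, seed=0):
    """Fraction of scrambled data sets whose association is at least as strong as the observed one."""
    rng = random.Random(seed)
    observed = group_difference(feature, expression)
    scrambled = list(feature)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(scrambled)  # breaks any real link between image and expression
        if group_difference(scrambled, expression) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

An association that recurs about as often in the scrambled data as in the observed data yields a p-value near 1 and would be treated as non-specific.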
In addition, statistical significance and predictive value can be evaluated by cross-validation, of which leave-one-out analysis is one form. In this approach, an association map is constructed on some fraction of the subject biological data or image features, and the resulting map is used to predict the outcome in the remaining subjects, i.e., those not used to “train” the algorithm. In practice, half, ten percent, or a single individual can be left out as the test set, and the procedure is iterated until each individual subject in the data set has been used both as the test and for training. Such iterative learning procedures may be a component of the module network algorithm, described above.
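The hold-out procedure described above can be sketched as follows. This is a minimal illustration, assuming that the map-building step and the prediction step are supplied as the hypothetical callables `fit` and `predict`; setting `n_folds` equal to the number of subjects gives leave-one-out analysis, and `n_folds=2` or `n_folds=10` corresponds to leaving out half or about ten percent.

```python
def k_fold_splits(n_subjects, n_folds):
    """Yield (train_indices, test_indices) so that every subject is tested exactly once."""
    if not 1 < n_folds <= n_subjects:
        raise ValueError("n_folds must be between 2 and n_subjects")
    indices = list(range(n_subjects))
    for fold in range(n_folds):
        test = indices[fold::n_folds]
        train = [i for i in indices if i % n_folds != fold]
        yield train, test


def cross_validated_predictions(samples, outcomes, fit, predict, n_folds):
    """Predict each subject's outcome from a model trained without that subject."""
    predictions = [None] * len(samples)
    for train, test in k_fold_splits(len(samples), n_folds):
        model = fit([samples[i] for i in train], [outcomes[i] for i in train])
        for i in test:
            predictions[i] = predict(model, samples[i])
    return predictions
```

The held-out predictions can then be compared with the known outcomes to estimate how well the map generalizes.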
Finally, the most robust method for confirming statistical significance and predictive value is to test the association map against a completely independent set of subjects. Because the association map has not been trained on the new set of patients, the ability of the map to predict the outcomes in the test set provides strong evidence that the association map is generalizable—meaning that the map can be used to give diagnostic and prognostic information on most, if not all, future subjects.
An approach to constructing an association map is illustrated in Example 1 using imaging features on three-phase contrast-enhanced CT and gene expression patterns of 28 human hepatocellular carcinomas (HCC). As will become apparent, global gene expression patterns of human cancers are encoded in their dynamic imaging features. In order to relate gene expression to imaging, distinctive features from qualitative imaging were identified, and coherent patterns of variation from gene expression profiles were defined.
In another aspect, methods for using an association map constructed as described above, and as exemplified in Example 1, are provided. In one embodiment, the association map is used to guide treatment or provide a diagnosis of a subject. For example, an image of a tumor in a subject, such as a brain, breast, lung, or prostate tumor, can be viewed in light of the association map to inform the clinician of the gene or protein expression of the patient. Knowledge of the gene or protein expression profile, i.e., molecular based information, about the patient informs the clinician about a patient's likely response to a drug, probability of relapse, survival rate, disease free survival, and the like. Such information will guide the treatment regimen, including the drug selection, dose, dosing regimen, and whether additional treatments should be considered, such as radiotherapy or tumor resection. Thus, a noninvasive image of a patient informs the clinician of molecular information useful in guiding treatment.
While the methods have been exemplified mainly using disease conditions, the methods can also be used for preventative medicine, in which case the biological data, with indeterminate image data, may suggest further imaging to be performed on a subject, e.g., to watch for likely diseases or conditions. This situation would arise, for example, when a subject was at risk for a disease, based on genetic data, lifestyle data, and laboratory tests but the presence of the disease could not be definitively shown by imaging or other methods.
Association maps are also suited for use in predicting subject outcome. Gene expression data or sequence variation patterns that predict treatment response to particular therapies are reported in the medical literature. For example, subjects with breast cancer that express particular cell surface receptors, such as HER2, are more responsive to certain chemotherapeutic agents than subjects that do not express those receptors. Thus, an image of a tumor or other diseased tissue in a subject, viewed in light of an association map, can be used to predict response to a selected treatment.
It will also be appreciated that association maps can be constructed from images and biological data generated or gathered solely for this purpose, or another particular purpose. For example, images of patients that were not responsive to a particular drug and biological data from the subjects can be used to build an association map.
An association map between imaging and biological data can also be used to design a targeted therapeutic treatment regimen for a patient, providing a personalized care program. Based on an image of a tumor viewed in light of an association map for that tumor type, information about the gene and/or protein expression of the tumor can be determined. Understanding the tumor cell surface receptors permits selection of targeting agents, such as antibody fragments or other agents that have binding specificity for particular cell surface receptors, that can guide or direct a drug to the tumor cell. The targeting agent can be attached directly to the drug, or attached to a carrier for the drug, such as a liposome.
It will be appreciated that the method described herein can be accompanied, if desired, by additional clinical information for a patient, such as a
The following examples are illustrative in nature and are in no way intended to be limiting.
Imaging features/traits. One hundred thirty eight (138) distinct imaging features that were present in at least one tumor sample were defined and were scored across all tumor samples. Features were selected a priori based on intrinsic radiological interest (e.g., internal arteries and hypodense halos).
Features were also filtered based on their frequency and prominence in the data, inter-observer agreement and independence from other features based on Pearson correlation (cut off value of 0.9). Thirty-two (32) imaging features were used as input in the Bayesian model, and 28 of 32 were found to be informative of gene expression (
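The independence filter based on Pearson correlation can be sketched as below. The example assumes a greedy pass in which a feature is kept only if its absolute correlation with every previously kept feature is below the 0.9 cut-off; the greedy order, the use of the absolute value, and the function names are assumptions for illustration, since the exact filtering procedure is not specified above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length score lists (0.0 if either is constant)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def filter_redundant_features(features, cutoff=0.9):
    """features maps a feature name to its scores across tumors; returns the names kept."""
    kept = []
    for name, scores in features.items():
        # Keep the feature only if it is not nearly collinear with one already kept.
        if all(abs(pearson(scores, features[other])) < cutoff for other in kept):
            kept.append(name)
    return kept
```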
Microarray data. Gene expression profiles of imaged HCCs were downloaded from the Stanford Microarray Database, which is available via the Stanford website. Data from array elements that had hybridization signal over background by 1.5 fold in both Cy5 and Cy3 channels and present in 70% of samples were centered by mean across samples. Data from replicate probes representing the same gene (as determined by Locuslink ID) were averaged. 6732 genes met these criteria for data quality and were used for subsequent analysis.
Module network. A previously developed module network procedure (Segal, E. et al., Nat. Genet., 34:166-176 (2003)) was applied to construct an association map between imaging features and gene expression profiles. The module network procedure takes as input a gene expression data set and a set of potential regulatory inputs, and attempts to partition the expression data into distinct and mutually exclusive modules, such that the genes assigned to each module can be well predicted by a small decision tree of regulatory inputs. The regulatory inputs were set to be the real-valued imaging features and were applied to the expression data described above. The 116 imaging networks can be interactively searched (Segal et al. (2007) Nat. Biotechnol. 25:675-80).
Module enrichment in Gene Ontology annotations. Significance of overlap between genes in modules and gene ontology annotations was calculated by comparison to the degree of overlap expected by chance alone using the hypergeometric distribution. Multiple hypothesis testing was accounted for by calculating a false discovery rate, and results with FDR<0.05 are presented.
Mapping venous invasion genes to imaging features. To find imaging features that correspond to the set of 91 genes associated with venous invasion, seven (7) modules that were significantly enriched for these genes were identified using the hypergeometric distribution as described above. The associated imaging feature trees of the 7 modules were analyzed (Table, below), and two features, internal arteries and halos, were found to be overrepresented among the top splits. To identify the consensus threshold of applying these features for this purpose, the p-value weighted average of the splits from the 7 image feature trees was calculated. The consensus thresholds were used for the imaging feature decision tree of
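The data-quality steps for the microarray data can be sketched as follows. The record layout (keys `cy5`, `cy5_bg`, `cy3`, `cy3_bg`, `log_ratio`), the reading of "over background by 1.5 fold" as signal of at least 1.5 times background, and the reading of "present in 70% of samples" as a minimum fraction are assumptions made for this example only.

```python
from collections import defaultdict
from statistics import mean


def well_measured(spot, fold=1.5):
    """True if both channels exceed their background by the required fold."""
    return (spot["cy5"] >= fold * spot["cy5_bg"]
            and spot["cy3"] >= fold * spot["cy3_bg"])


def mean_or_none(values):
    present = [v for v in values if v is not None]
    return mean(present) if present else None


def preprocess(probes, min_present=0.7):
    """probes: list of (gene_id, [spot per sample]).

    Returns {gene_id: [mean-centered value or None per sample]}.
    """
    rows_by_gene = defaultdict(list)
    for gene_id, spots in probes:
        values = [s["log_ratio"] if well_measured(s) else None for s in spots]
        present = [v for v in values if v is not None]
        if not present or len(present) < min_present * len(values):
            continue  # probe not present in enough samples
        center = mean(present)
        rows_by_gene[gene_id].append(
            [None if v is None else v - center for v in values])
    # Average replicate probes that represent the same gene, sample by sample.
    return {gene_id: [mean_or_none(column) for column in zip(*rows)]
            for gene_id, rows in rows_by_gene.items()}
```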
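The enrichment calculation can be sketched as an upper-tail hypergeometric probability followed by a false discovery rate adjustment. The Benjamini-Hochberg procedure is used here as an assumption; the text above states only that a false discovery rate was calculated, not which procedure was used. Function and parameter names are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(n_genes, n_annotated, module_size, overlap):
    """P(overlap or more annotated genes) when module_size genes are drawn
    without replacement from n_genes, of which n_annotated carry the annotation."""
    total = comb(n_genes, module_size)
    upper = min(n_annotated, module_size)
    tail = sum(comb(n_annotated, k) * comb(n_genes - n_annotated, module_size - k)
               for k in range(overlap, upper + 1))
    return tail / total


def benjamini_hochberg(p_values):
    """Return FDR-adjusted values (q-values) in the same order as p_values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values
```

Module-annotation pairs whose adjusted value falls below 0.05 would then be reported.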
The position (node level) of each imaging feature/trait used to construct the decision trees used to predict the 7 venous invasion modules and their frequency of occurrence at this node level are displayed. Internal Arteries, followed by Hypodense Halos, are over-represented in the imaging networks occupying the top node level and frequency and were thus used to construct the venous invasion predictor.
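The consensus-threshold calculation and the resulting two-feature predictor can be sketched as follows. The weighting scheme is an assumption: each split is weighted by the negative base-10 logarithm of its p-value, so that more significant splits count more, because the exact form of the p-value weighting is not stated above. The direction of each comparison (arteries at or above the cut-off, halos below it) is likewise an assumption, chosen to reflect presence of internal arteries and absence of hypodense halos.

```python
from math import log10


def consensus_threshold(splits):
    """splits: list of (threshold, p_value) for one imaging feature across module trees."""
    weights = [-log10(p) for _, p in splits]  # assumed weighting, see lead-in
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one split needs a p-value below 1")
    return sum(w * t for w, (t, _) in zip(weights, splits)) / total


def predicts_venous_invasion(internal_arteries, hypodense_halos,
                             artery_cutoff, halo_cutoff):
    """Two-feature rule: internal arteries present and hypodense halos absent."""
    return internal_arteries >= artery_cutoff and hypodense_halos < halo_cutoff
```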
Clinical data analysis. Microscopic venous invasion status on histologic analysis was available for 30 patients in the training set and 32 patients in the test set. Within each data set, patients were partitioned into two groups based on the two feature decision trees (“internal arteries” and “hypodense halos” on CT scan,
Construction of Association Map. In this example, a three-step strategy was used to create an “association map” between imaging features and gene expression patterns. More particularly, an association map between imaging features on three-phase contrast-enhanced CT and gene expression patterns of 28 human hepatocellular carcinomas (HCC; Chen, X. et al., Mol. Biol. Cell, 13:1929-1939 (2002)) was constructed, as shown in
Next, a module networks algorithm (Segal, E. et al., Nat. Genet., 34:166-176 (2003)) was adopted to systematically search for associations between expression levels of 6732 well-measured genes determined by microarray analysis (Chen, X. et al., Mol Biol Cell, 13:1929-1939 (2002)) and combinations of imaging features. The algorithm identifies groups of genes, termed modules, which demonstrate coherent variation in expression across multiple samples. The algorithm further applies an iterative Bayesian probabilistic procedure to identify combinations of imaging features that can predict the expression levels of gene modules. An end result is identification of specific networks of imaging features that predict the expression level of gene modules. Each network of imaging features predicts the expression level of one gene module.
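To make the idea of imaging features predicting a gene module concrete, the sketch below finds the single imaging feature and threshold that best separate samples by a module's mean expression. It is a deliberately simplified stand-in: it performs one least-squares split of an ordinary regression tree, whereas the module networks algorithm of Segal et al. uses Bayesian scoring and iterates between assigning genes to modules and learning each module's tree. All names are hypothetical.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    center = sum(values) / len(values)
    return sum((v - center) ** 2 for v in values)


def best_split(features, module_expression):
    """features maps a feature name to its scores per sample.

    Returns (cost, feature_name, threshold) for the split that minimizes
    within-group squared error, or None if no feature varies across samples.
    """
    best = None
    for name, scores in features.items():
        levels = sorted(set(scores))
        # Candidate thresholds lie midway between consecutive observed scores.
        for low, high in zip(levels, levels[1:]):
            threshold = (low + high) / 2
            left = [e for s, e in zip(scores, module_expression) if s <= threshold]
            right = [e for s, e in zip(scores, module_expression) if s > threshold]
            cost = sum_squared_error(left) + sum_squared_error(right)
            if best is None or cost < best[0]:
                best = (cost, name, threshold)
    return best
```

Applying the same search recursively to each side of the split would grow a small decision tree of imaging features for the module.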
Next, statistical significance of the association map was validated by comparison with permuted data sets, and also by testing the prediction of the association map in an independent set of tumors.
The association map of imaging features and gene expression revealed that a surprisingly large fraction of the gene expression program can be reconstructed from a small number of imaging features, as seen in
The hierarchical combination of only 28 imaging features was sufficient to predict the variation of all 116 gene modules. As shown in
Using the association map, imaging features predictive of expression level of specific genes are directly revealed, and the potential physiologic significance of many imaging features can be inferred from their associated genes. The distribution of genes into modules defined by imaging features was not random, but was highly enriched for specific and diverse biological functions and processes. Comparison of gene membership in modules versus the published Gene Ontology annotation (Ashburner, M. et al., Nat. Genet., 25:25-29 (2000)) revealed significant overlaps, as shown in
Thus, in one embodiment, the association provides a method for non-invasively delineating a molecularly distinct subset of tumors for a targeted therapeutic strategy. For example, the liver synthetic function of HCC patients is an important guide of disease severity (Thomas, M. B. et al., J. Clin. Oncol., 23:8093-8108 (2005)), and this information is evident in module 595, which details the expression level of albumin, pyruvate kinase, transferrin receptor 2, as well as revealing clotting function (thrombin, factor V, factor X), and detoxification activity (GSTO1, CYP27A1, epoxide hydroxylase), as seen in
It will also be appreciated that identity of genes in a module can reveal the physiologic basis of an imaging feature. The imaging feature “Tumor Margin Score, Minimum” denotes tumors that show an ill-defined transition zone between tumor and surrounding liver tissue. It was found that the presence of this feature was associated with elevated expression of a group of genes associated with extracellular matrix remodeling, such as MMP2, MMP7, COL3A1, COL6A2, and thrombospondin 1 and thrombospondin 2, as seen in
The association map also enables systematic mapping of a predetermined group of genes to their corresponding imaging features. A group of 91 genes whose expression variation is associated with microscopic venous invasion has been identified (Chen, X. et al., Mol Biol Cell, 13:1929-1939 (2002)). Microscopic venous invasion is a well-established sign of poor prognosis (Thomas, M. B. et al., J. Clin. Oncol., 23:8093-8108 (2005)) that is extremely difficult to predict using conventional imaging methods in the absence of gross venous invasion. Here, the 91 genes in the “venous invasion signature” were enriched in 7 modules and associated with two predominant imaging features: the presence of “Internal Arteries” and absence of “Hypodense Halos”, as seen
The predictive value of the two-feature predictor of venous invasion was validated in an independent set of 32 patients that were not used for training the association map (
In summary, the global gene expression profiles of liver cancer are embodied in their imaging features. The systematic association between imaging features and gene expression allowed useful inference from both directions: on one hand, the association map identified biological processes, based on specific gene expression programs, which underlie specific imaging features. On the other hand, the association map enabled the use of imaging features to reconstruct the global gene expression programs of cancer, thereby creating a noninvasive “molecular portrait” of the tumor (
While a number of exemplary aspects and embodiments have been discussed above, those of skill in the art will recognize certain modifications, permutations, additions and sub-combinations thereof. It is therefore intended that the following appended claims and claims hereafter introduced are interpreted to include all such modifications, permutations, additions and sub-combinations as are within their true spirit and scope.
This work was supported in part by grant number 1 K08 AR050007 from the National Institutes of Health. The U.S. Government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US07/22973 | 10/30/2007 | WO | 00 | 3/15/2010 |
Number | Date | Country | |
---|---|---|---|
60856386 | Oct 2006 | US |