The present invention relates to systems and methods in computational biology.
The following publications are listed below for an understanding of the background of the invention.
Personalized medicine is moving us closer to a more precise, predictable and powerful method of treatment, that is customized for the individual patient. In this new era, the tools at hand can probe the very molecular makeup of each patient, thus enabling better diagnoses and a more effective treatment of the disease. It is estimated that already 10% of FDA-approved drugs contain pharmacogenomic information, and these drugs are administered differently to people with different ethnic or genetic backgrounds1. One field of research in which personalized medicine holds great promise is cancer therapy. The use of genetic markers to ‘personalize’ cancer treatment and to differentiate one type of cancer from another will facilitate the use of highly tailored therapies and offers tremendous potential for improved prognoses2. Recently, it has become clearer that advances in personalized medicine will require going beyond the genomics, and aiming to develop large-scale functional physiological models3. In particular, various disciplines in systems biology have been suggested as promising approaches towards personalized and stratified medicine4.
Constraint-Based Modeling (CBM) is a computational framework for studying metabolism on a genome-scale, which has been used for a variety of applications5-16. In recent years, two genome scale models of human metabolism have been published17, 18. Their potential clinical utility was demonstrated in various studies; e.g., identifying reactions causally related to hemolytic anemia, predicting drug targets for hypercholesterolemia17, studying disease co-morbidity19 and pinpointing diagnostic biomarkers for inborn errors of metabolism20. While these human models are genome-wide and are thus not specific to any mature cell or tissue-type, they have also successfully served as a basis for generating context-specific models of tissues11, 21, 22 and for studying cancer metabolism23, 24. Building such context-specific models typically requires identifying those reactions in a genome-wide model that need not be considered due to low gene expression or other omics-data).
A metabolic network consisting of m metabolites and n reactions can be represented by a stoichiometric matrix S, where the entry Sij represents the stoichiometric coefficient of metabolite i in reaction j25. A CBM model imposes mass balance, directionality and flux capacity constraints on the space of possible fluxes in the metabolic network's reactions through the set of linear equations:
S·v=0 (1)
v
min
≦v≦ (2)
where v is the flux vector for all of the reactions in the model (i.e. the flux distribution). The exchange of metabolites with the environment is represented as a set of exchange (transport) reactions, enabling a pre-defined set of metabolites to be either taken up from, or secreted to, the growth medium. The steady-state assumption represented in Equation (1) constrains the production rate of each metabolite to be equal to its consumption rate. Enzymatic directionality and flux capacity constraints define lower and upper bounds on the fluxes and are embedded in Equation (2). Gene knock-outs are simulated by constraining the flux through the corresponding metabolic reaction to be zero.
Embodiments of the present invention provide a system and method for generating a personalized metabolic model. As explained below, embodiments utilize gene expression measurements and phenotypic data to produce personalized metabolic models that differ in the bounds set on the reaction fluxes.
Embodiments can be used to construct distinct metabolic models of cells that are similar in their gene expression signatures to produce differentiated, case-specific models. As explained below, a method of the invention modifies the upper bounds of only a subset of the reactions in a generic model which are associated with a predetermined phenotype (e.g., proliferation rate) in accordance with gene expression levels. This is in contrast, for example, to a prior art method known as “E-Flux”, which changes the bounds of all enzyme-associated reactions in the network26
The method of the invention uses three inputs: (1) for each of p cells, a level of a quantifiable phenotype of the cell under predetermined conditions, (2) a generic metabolic model to be personalized, involving reactions occurring in the p cells, and, (3) for each of the p cells, and for each of one or more enzymes catalyzing a reaction in the generic model, an expression level in the cell of the gene encoding for the enzyme.
The method is generic and can be used with any metabolic model of the cells, and any quantifiable phenotype. The generic metabolic model may be, for example, the human model27 and may be specified by the equations (1) and (2) above. The quantifiable phenotype may be, for example, the growth rate of the cell under predetermined conditions.
In order to understand the invention and to see how it may be carried out in practice, a preferred embodiment will now be described, by way of non-limiting example only, with reference to the accompanying drawings, in which:
a shows a flow chart of an algorithm for generating a personalized metabolic model in accordance with one embodiment of the invention;
b shows a flow chart for a method for calculating minNormVal in accordance with one embodiment of the invention;
c shows a flow chart for a method for calculating maxNormVal.
a shows the Spearman correlation achieved by the different methods in predicting the individualized growth rates measurements across the HapMap and NCI-60 cell-lines;
b shows a comparison between mean predicted and measured growth rates across the four HapMap populations;
c shows a comparison between mean predicted and measured growth rates across the nine tumor types composing the NCI-60 collection;
d shows a distribution of hypergeometric P-values achieved by either the core set of essential genes predicted by the generic model (black); the addition of predictions in accordance with the invention (light blue); or by analyzing the gene expression alone (red);
a shows a comparison between measured and predicted drug response for the HapMap, CEU (Western European ancestry) and NCI-60 datasets;
b shows core metabolic pathways and metabolic enzymes that carry known and/or predicted cancer therapeutic targets, in which metabolic enzymes colored green are a subset of known cytostatic drug targets, metabolic enzymes colored red are those found in current clinical trials, out of which those marked by an asterisk were identified as selective targets by the invention, metabolic enzymes colored blue denote novel selective predictions by the invention;
c shows a schematic description of metabolic changes following Malonyl-CoA Decarboxylase (MLYCD) knockdown;
a shows MLYCD mRNA expression upon nucleofection with Non Targeting Control (NTC) and three independent siRNA constructs in HapMap (left) and RPMI-8226 (right) cells;
b shows cell proliferation of HapMap (left) and RPMI-8226 (right) cells upon MLYCD knockdown (siRNA);
c shows MLYCD mRNA expression upon infection with Non-Targeting Control (NTC) and two independent shRNA constructs in RPMI-8226 cells;
d shows the ratio between reduced (GSH) and oxidized (GSSG) glutathione in RPMI-8226 cells infected with the indicated constructs;
e shows cell proliferation of the indicated cell-lines in the presence or absence of 2 mM N-Acetyl Cysteine;
a shows MLYCD mRNA expression upon nucleofection with Non-Targeting Control (NTC) and three independent siRNA constructs in K562 leukemia cells (left), and cell proliferation of K562 cells (right) upon MLYCD knock-down (siRNA);
b shows different expression of MLYCD among the different cells-lines;
c shows the functional inactivation of MLYCD resulted in a substantial decrease of acyl-carnitines, consistent with the expected inhibition of carnitine-palmitoyl transferase (CPT) by Malonyl-Coa. ** P-value <0.01; *** P-value <0.001.
a is a flowchart of a method for generating a personalized metabolic model in accordance with one embodiment of the invention.
In step 2, a generic metabolic model of an organism is provided in which the forward and backward reaction of each reversible reaction in the generic model appears separately. In step 4, for each of two or more reactions in the metabolic model, an expression level of the reaction in each cell in a population of p predetermined cells is provided. The expression of a reaction in a cell may be defined, for example, as the mean expression level of the genes encoding the catalyzing enzymes of the reaction. A significance threshold is set, for example, by a false discovery rate (FDR) analysis as disclosed, for example, in Benjamini et al.28 with α=0.05.
In step 6, a level of a predetermined quantifiable phenotype is provided for each of the p cells. In step 8, for each reaction Ri in the generic metabolic model, a correlation ρ(i) is calculated between (a) the expression level of the gene or genes encoding for the enzyme catalyzing the reaction and (b) the level of the quantifiable phenotype of the cell. The correlation is calculated over the population of the p cells.
Then, in step 10, a set t of reactions, referred to herein as ‘the highly correlated reactions’, is identified whose correlation with the phenotype level is above a predetermined threshold.
In step 12, a (t×p) expression matrix, referred to herein as the “Exp-matrix”, is calculated. The Exp-matrix has elements Ei,j, given by
where GEi,j is the input expression level of reaction i in the cell j.
The Exp-matrix embeds information relating to the direction and magnitude of change of the upper bound based on the expression levels. Due to the factor ρi/|ρi| for reactions whose expression is positively correlated with growth rate (ρ(i)>0), as the expression GEi,j increases, Eij also increases (becomes more positive). For negatively correlated reactions (ρ(i)<0), as the expression, GEij, increases, Eij decreases (becomes more negative).
In step 14, the values of the Exp-matrix are preferably normalized in order to adapt the values of the Exp-matrix to the actual upper bounds. In the normalization procedure, each reaction i is normalized across the p cells so that (a) the lower bound associated with the cell having the lowest expression value is assigned the minimal value of the normalization range, and (b) the upper bound associated with the cell having the highest expression value is assigned the maximal value of the normalization range.
The process then terminates.
The normalization of the values of the Exp-matrix in step 14 may be performed, for example, by calculating a matrix UBij given by the algebraic expression:
minNormVal may be computed, for example, using a Flux Variability Analysis, for example, as disclosed in Varma et al.29, over the set of essential reactions in the model. In order to define the maximal value of this range (maxNormVal), changes in the biomass production may be examined as a function of the generic model's upper bound. maxNormVal can then be defined as the value beyond which the resulting changes in the biomass production become smaller.
b is a flowchart for a method for calculating minNormVal. In step 24 the set of essential reactions in the generic model is identified, for example, using a Flux Balance Analysis53. This set is composed of those reactions whose knockout reduces the phenotype level by more than a predetermined level, such as 90% of its maximal level.
In step 26, the minimal flux through each essential reaction is calculated, for example using a Flux Variability Analysis, such as described in Varma et al.29. As each of these reactions is necessary for biomass production, reducing the upper bound of a reaction below its minimal flux value would result in cell death. In step 27, minNormVal is calculated, for example, as the maximal value of minimal flux of all of the essential reactions. The process then terminates.
c shows a flow chart for a method for calculating maxNormVal. In step 28, the upper bounds of the reactions highly correlated with the predetermined phenotype are set, and in step 30, a rate of biomass production is calculated. Then, in step 32, the upper bounds of the highly correlated reactions are increased by a small increment, such as of 0.1, and in step 34 the rate of biomass production is recalculated. In step 36 it is determined whether the rate of biomass production decreased as a result of the last increase in the upper bounds. If yes, then in step 40, maxNormVal is calculated as the maximal value of the current values of the upper bounds. If in step 36 it is determined that the rate of biomass production did not decrease as a result of the last increase in the upper bounds, the process returns to step 32 with the upper bounds being again increased by a small increment.
The system has other features. A network interface 148 may provide a direct connection to a remote server via a telephone link or to the Internet. Network interface 148 may also connect to a local area network (LAN) or other network interconnecting many computer systems. Many other devices or subsystems (not shown) may be connected in a similar manner. Furthermore, it is not necessary for all of the devices shown in Supplementary
Although the above has been described generally in terms of specific hardware, it would be readily apparent to one of ordinary skill in the art that many system types, configurations, and combinations of the above devices are suitable for use in light of the present disclosure. Of course, the types of system elements used depend highly upon the application.
The invention was applied to a dataset composed of 224 lymphoblast cell-lines from the HapMap project30, for which both gene expression and growth rate measurements are available31. This dataset is composed of cell-lines taken from healthy human individuals from four different populations, including Caucasian (CEU), African (YRI), Chinese (CHB) and Japanese (JPT) ethnicities. Applying the invention to the generic human model17, the corresponding 224 personalized metabolic models were constructed, one for each cell-line. The correlation between the growth rates predicted by these models and those measured experimentally is highly significant (Spearman R=0.44, P-value=5.87e-12,
Embodiments of the invention was also applied to build individual models and predict the proliferation rates of 60 cancer cell-lines (the NCI-60 dataset33), obtaining a highly significant correlation between the measured and predicted growth rates (Spearman R=0.69, P-value=1.22e-9,
Two contemporary model-building methods failed to achieve highly significant results in these tasks: iMAT, an omics-integration method that defines a subset of active and inactive reactions based on expression data21, resulted in insignificant or even negative correlations between the actual and predicted growth rates for both datasets (HapMap: Spearman R=0.03, P-value=0.66; NCI-60: Spearman R=−0.32, P-value=0.01,
To further test the universality of growth-associated genes and reactions that have been identified across the NCI-60 cancer cell-lines, an embodiment of the invention was used to build a metabolic model for each of the 99 cancer cell-lines found in the Achilles dataset34, 35. Specifically, the set of growth-associated genes identified in the NCI-60 dataset, together with the Achilles cell-lines' gene expression, was used to reconstruct the corresponding cell specific models. Comparing the predicted and measured growth rates, a significant correlation was found (Spearman R=0.65, P-value=0.01). Finally, the ability of the Achilles models and of the generic human model to predict gene essentiality was examined. The core set of essential genes identified by the generic model was predictive for up to 45 cell-lines (hyper-geometric P-value <0.05, FDR corrected with α=0.1,
Predicting Individual Cell Line Drug Response
To study the potential utility of the invention in guiding personalized medicine therapeutic decisions, the HapMap and NCI-60 PRIME models were used to predict the response of each individual cell-line to various metabolic drugs, and compared the predicted response with the response measured in vitro31, 36-39 (Tables 2, 3, and 4). Overall, the analysis was predictive for 12 out of 16 drugs tested in the HapMap and the NCI-60 datasets (
a. shows a comparison between the measured and predicted drug response for the HapMap, CEU (Western European ancestry) and NCI-60 datasets. Overall, significant correlations (Spearman P-value <0.05) were obtained for 12 out of the 16 drugs examined (those marked with an asterisk). The HapMap drugs are 5-fluorouracil (5FU) and 6-mercaptopurine (6MP); the CEU drugs are Ethacrynic acid, Hexachlorophene, Digoxin, Azathioprine, Reserpine and Pyrimethamine; The NCI-60 drugs for dataset 1 include Gemcitabine, Methotrexate and Pyrimethamine; For dataset 2, Trimetrexate and Gemcitabine; For dataset 3, Methotrexate, Quinacrine HCl and Allopurinol.
Predicting Cancer Drug Targets with a Selective Effect on Cell Proliferation
The array of models built for both normal (HapMap) and cancer (NCI-60) proliferating cells allows searching for selective drug targets, i.e., those that would kill cancer cells without disrupting the proliferation of normal cells. This approach differs from most of the existing cytotoxic strategies, which affect both normal and cancer cells. All knockdowns of individual reactions in the normal lymphoblasts and cancer cell models were simulated, and their selective effect on the proliferation of cancerous versus normal cells was quantified. Of interest are potential drug targets affecting both normal and cancer cell proliferation (non-selective) and those affecting mainly cancer cell proliferation (selective).
The set of predicted non-selective targets was highly enriched with known cytostatic drugs23, 40 (mean hypergeometric P-value=7.28e-4,
The prediction that cancer cells are selectively sensitive to MLYCD inhibition was tested experimentally. MLYCD was suppressed in the NCI-60 leukemia cell-line RPMI-8226 and in the control lymphoblast HapMap cell-line using small interfering RNA (siRNA) (
The embodiment was then used to investigate the “metabolic rewiring” that occurs upon MLYCD inactivation. The model predicted that MLYCD suppression would divert the accumulated malonyl-CoA to fatty acid biosynthesis, increasing the demand of cells for reducing power (
a shows MLYCD mRNA expression upon nucleofection with Non Targeting Control (NTC) and three independent siRNA constructs in HapMap (left) and RPMI-8226 (right) cells.
Reconstructing Personalized Metabolic Models of Breast Cancer Clinical Samples and Predicting their Survival
The ability of the model to predict a patient's prognosis based on their biopsy samples was also examined. To this end, gene expression data of tumor biopsies taken from four cohorts of breast cancer patients was utilized. This type of cancer cell was used due to the large availability of clinical data that is associated with the gene expression of each sample, including more than 2000 breast cancer patients44-46. The model was applied to reconstruct personalized metabolic models for each individual sample and to assess its maximal growth rate. A Kaplan-Meier survival analysis47 showed that patients with predicted high growth rate had significantly improved survival than those with a predicted low growth rate, in all four cohorts (log rank P-values are: 4.04e-4, 1.88e-4, 0.03 and 5.92e-7 for Miller et al., Chang et al., Pawitan et al. and Curtis et al. respectively,
The finding that high growth rate is a good prognostic sign may seem counter-intuitive at first. However, two key factors may explain this apparent contradiction:
(1) This finding is in accordance with the prevailing notion that proliferating cells tend to better respond to chemotherapy51-53. Indeed, when performing this analysis separately for chemotherapy-treated and non-treated patients, it was found that the association between growth rate and favored prognosis is more prominent in the treated group (Curtis et al., log rank P-values: 0.01 and 0.07 for treated versus non-treated patients, respectively; Chang et al., log rank P-values: 1e-3 and 0.02 for treated versus non-treated patients, respectively). Moreover, applying the invention to reconstruct metabolic models of chemo-sensitive and chemo-resistant cancer cells, we find that the former have a significantly higher predicted growth rate (one-sided Wilcoxon P-value=1.35e-4,
(2) Another explanation that may underlie this apparent contradiction is the observation that high migratory cells such as late stage metastatic cancer cells tend to have lower growth rates54, probably due to the diversion of energy resources from biomass production to other cellular tasks, such as motility and counteracting rising radical oxygen levels54, 55. Indeed, it was found that higher stage tumors have significantly lower predicted growth rates in all four datasets (one-sided Wilcoxon P-value <0.05, Table 9). Combining these two observations together, it was examined to what extent the predicted growth rate is a prognostic marker when performing the analysis for each cancer stage separately and then considering treated versus non-treated patients42. It was found that the association between predicted growth rates and better survival only holds for the treated patients in each stage (log rank P-value <0.05, Table 10). These results suggest that it is likely that both explanations may underlie the observed association, with drug responsiveness possibly playing a larger role.
Methods
Drug Response Simulations
Each drug is mapped to its corresponding metabolic reaction through its known enzymatic targets according to the DrugBank database40. The drug response data used in this analysis was measured in various ways:
(a) ATP concentrations (HapMap dataset). In this case the in silico drug response was computed in two steps. Firstly, a wild-type flux distribution was obtained via Flux Balance Analysis, in which the corresponding drug target reaction is initially forced to be active in the pre-drug condition. Enforcing the target reaction to be active is necessary in order to get any effect on the resulting flux distribution, following inhibition of the target reaction, to be simulated in the next step. Here a positive flux was enforced through the target reactions that is 50% of the maximal flux rate it is capable of carrying (the results are robust to various activation thresholds. (2) Next, the knockout flux distribution is computed via MOMA (Minimization Of Metabolic Adjustment)56 while constraining the flux through the corresponding reactions to zero. This process is repeated for each personalized model separately and the predicted ATP production is used to estimate the cell response to the simulated drug. A robustness analysis is carried out by using 1000 different wild-type flux distributions.
(b) AC50 values (CEU dataset); AC50 values represent the concentration in which the drug exhibits 50% of its maximum efficacy. In this case, in silico AC50 values are calculated by estimating the maximal flux rate carried by the target reaction when the growth rate is set to 50% of the drug's maximal response (a value that was available in the dataset used36).
(c) IC50 values (NCI-60 dataset); IC50 values represent the concentration of drug needed in order to reduce the growth to 50% of its maximal value. In this case, in silico IC50 values are calculated by estimating the maximal flux rate carried by the target reaction when the growth rate is set to 50% of its maximal value. In all cases of drug response simulations, the empirical significance test is carried out by permuting the measured data 1000 times and estimating the resulting correlation for each permuted vector.
Gene Essentiality Analysis
The effect of reaction deletion on cell proliferation was simulated in a similar manner to that described above in reference to the drug response simulation (part a) with the difference of optimizing for maximal growth rate rather than ATP production and the induction of partial knockouts (i.e., reducing reaction flux level to 10-30% of its maximal flux).
In order to compare the model predictions to the experimental shRNA scores provided per gene, each reaction is assigned the minimal shRNA score of its catalyzing enzymes. Both experimental and predicted data are represented as the log-transformed fold-change in the cell growth. A hyper-geometric test is then applied in order to determine whether there is a significant overlap between experimental and predicted growth reducing genes. For evaluating the performance of the generic model, the model's single set of predicted growth reducing genes is compared with the experimentally determined growth reducing genes in each cell-line considered alone. The predictive value of the personalized models generated is then evaluated by extending the set of growth-reducing genes identified by the generic model with additional specific growth-reducing genes identified by the corresponding generated personalized model, comparing them again to the experimentally determined growth reducing genes in each of the cell-lines. In order to evaluate the predictive power of the gene expression alone, whether or not there is a significant overlap between growth reducing genes and highly expressed genes was examined. In this regard, highly expressed genes are defined as the set of genes whose expression is higher than the mean+half the standard deviation of the expression of the genes in each cell-line. Defining the set of essential genes in each dataset is threshold dependent. The results were obtained by comparing the set of genes that reduce growth rate by at least 25%. Examining alternative thresholds between 10-60%, the invention was found to enhance the prediction of the generic model for 40-80% of the cell-lines.
Drug Selectivity Analysis
The effect of a reaction's deletion on cell proliferation for the identification of selective treatment was simulated in a similar manner to that described above in reference to gene essentiality. The overlap between the set of known cytostatic drug targets and those predicted, was found to be robust to different thresholds that determine the value (in percentage) under which the deletion is considered to effect the cell's proliferation rate (Table 5). The set of selective reaction targets is composed of those that reduce the growth of all normal cells by less than 20% and the growth of all cancer cells by more than 20%. Additionally, this set includes only those reactions that exhibit more than a 20% difference in growth reduction between the normal and cancer proliferating cells (Table 6). The enrichment of selective and non-selective targets within genes whose low expression is associated with an improved survival was calculated by assigning each reaction the minimal log rank p-value of its catalyzing enzymes. The log rank p-values were determined by a Kaplan-Meier survival analysis performed for each gene separately. A hyper-geometric test was then applied to determine the significance level of this analysis.
Constructing Personalized Metabolic Models for the Achilles Cell-Lines and Breast Cancer Patients
The set of growth-associated reactions identified in the NCI-60 dataset was utilized as input in the construction process of both the Achilles cell-line models and of the breast cancer patients' models. The method then proceeds by adjusting the bounds of this set of reactions according to the specific cell expression levels.
The Achilles dataset contains many samples having the exact same growth rate (only 15 unique growth rate values for 99 cell-lines). Therefore, the correlation between these measured values and the predicted growth rates is obtained by aggregating all cell-lines having the same growth rate, and comparing these unique values to the averaged predicted growth rates of the corresponding cell-lines.
Experimental Procedures
Cell Culture
HapMap cells (GM06997, CEPH/UTAH pedigree 13291) were obtained from Coriell Institute and RPMI-8226 and K562 cells were obtained from NCI-Frederick Cancer DCTD Tumor/Cell line Repository. Cells were grown in RPMI 1640 plus 10% fetal bovine serum (FBS) in the presence of 5% carbon dioxide. Cell count was determined using a CASY Cell Counter (Roche Applied Science). Where indicated, cells were incubated with 2 mM N-acetyl-cysteine.
MLYCD Silencing
siRNA
2×106 Cells were nucleofected using Nucleofector I (Amaxa) and Amaxa Cell Line Nucleofector Solution Kit C (Lonza), program A-030 and 1 μM siRNA. The MLYCD-targeting siRNA constructs were purchased from Sigma Aldrich and are as follows:
The non-targeting siRNA is the MISSION siRNA Universal Negative Control #1 (Sigma Aldrich).
shRNA
The viral supernatant for infection was obtained from the filtered growth media of the packaging cells HEK293T transfected with 3 μg psPAX, 1 μg pVSVG, 4 μg of shRNA contructs and 24 μl Lipofectamine 2000 (Life Technology) and the relevant shRNAs. 5×105 cells were then plated on 6-well plates and infected with the viral supernatant in the presence of 4 μg/mL polybrene. After two days, the medium was replaced with selection medium containing 2 μg/mL puromycin.
The expression of the shRNA constructs was induced by incubating cells with 2 μg/mL Doxycyclin.
The shRNA sequences were purchased from Thermoscientific and were as follows:
Glutathione Measurements
Glutathione levels were measured using the GSH-Glo Glutathione Assay (Promega) after 72 hours of Doxyclyclin induction, using 20 μL/well of 2×105 cells/mL in accordance with the manufacturer's instructions.
qPCR Experiments.
mRNA was extracted with RNeasy Kit (Qiagen) and 1 μg of mRNA was retro-transcribed into cDNA using the High Capacity RNA-to-cDNA Kit (Applied Biosystems, Life Technologies, Paisley, UK).
For the qPCR reactions, 0.5 μM primers were used. 1 μL of Fast Sybr green gene expression master mix; 1 μL of each primer and 4 μL of 1:10 dilution of cDNA in a final volume of 20 μL were used. Real-time PCR was performed using the Step One Real-Time PCR System (Life Technologies Corporation Carlsbad, Calif.) using the fast Sybr green program. Expression levels of the indicated genes were calculated using the ΔΔCt method by the appropriate function of the software using actin as calibrator.
Primer sequences are as follows:
LC-MS Metabolomic Analysis.
For liquid chromatography separation, column A was the Sequant Zic-Hilic (150 mm×4.6 mm, internal diameter (i.d.) 5 μm) with a guard column (20 mm×2.1 mm i.d. 5 μm) from HiChrom, Reading, UK. Mobile phase A: 0.1% formic acid v/v in water. Mobile B: 0.1% formic acid v/v in acetonitrile. The flow rate was kept at 300 μL/min and the gradient was as follows: 0 minutes 80% of B, 12 minutes 50% of B, 26 minutes 50% of B, 28 minutes 20% of B, 36 minutes 20% of B, 37-45 minutes 80% of B. Column B was the sequant Zic-pHilic (150 mm×2.1 mm i.d. 5 μm) with the guard column (20 mm×2.1 mm i.d. 5 μm) from HiChrom, Reading, UK. Mobile phase C: 20 mM ammonium carbonate plus 0.1% ammonia hydroxide in water. Mobile phase D: acetonitrile. The flow rate was kept at 100 μL/minute and gradient as follow: 0 minutes 80% of D, 30 minutes 20% of D, 31 minutes 80% of D, 45 minutes 80% of D. The mass spectrometer (Thermo Q-Exactive Orbitrap) was operated in a polarity switching mode.
Metabolomic Extraction of Cell-Lines.
5×105 cells/mL were plated onto 6-well plates and cultured in standard medium for 24 hours. For the intracellular metabolomic analysis, cells were quickly washed three times with phosphate buffer saline solution (PBS) to remove contamination from the media. The PBS was thoroughly aspirated and cells were lysed by adding a pre-cooled Extraction Solution (Methanol: Acetonitrile: Water 50:30:20). The cell number was counted and cells were lysed in 1 ml of Extraction Solution per 2×106 cells. The cell lysates were vortexed for 5 minutes at 4° C. and immediately centrifuged at 16,000 g for 15 minutes at 0° C.
Datasets
Expression data and growth rate measurements for the HapMap dataset were taken from Choy et al.31. The data includes Utah residents with Northern and Western European ancestry (CEU; 56 samples), Han Chinese in Beijing, China (CHB; 43 samples), Japanese in Tokyo, Japan (JPT; 43 samples) and Yoruba from Ibadan, Nigeria (YRI; 82 samples). Expression data for the NCI-60 dataset was taken from Lee et al.33. Doubling times for the NCI-60 cell lines were downloaded from the website of the Developmental Therapeutics Program (DTP) at NCI/NIH (http://dtp.nci.nih.gov/docs/misc/common_files/cell_list.html). Expression data and shRNA screen data for the Achilles cell-lines were downloaded from the Cancer Cell Line Encyclopedia and Achilles project websites.
To control for the indirect usage of growth rate measurement in the application of the method of the invention, a 5-fold cross validation analysis was employed for the HapMap dataset and a 4-fold cross validation analysis for the NCI-60 dataset which was repeated 1000 times for each dataset. To further evaluate the significance of the results, an empirical test was conducted by comparing the results to those obtained by random sets of the HapMap or the NCI-60 models, generated by permuting the gene expression values 1000 times. For both datasets, the original results were found to be highly significant (empiric P-value <9e-4). An additional permutation test was conducted by choosing a random set of genes at the size of the growth-associated genes, and utilizing them in the personalized model building process. Repeating this procedure 1000 times for each dataset resulted again in highly significant results (empiric P-value <9e-4).
Correlation Results for the HapMap and NCI-60 Datasets when Modifying the Bounds on all Enzyme-Associated Reactions
2. Personalized Metabolic Models Predict Individual Cell Lines' Drug Response
Drug response data used in this dataset is given as ATP concentration levels31. For each of the lymphoblast models and for each activation threshold (see Methods) the solution space was sampled and 1000 different flux distributions were obtained based on Bordel et al57. The results shown in Table 2 are the mean Spearman R correlation between measured and predicted drug response obtained by utilizing each of the 1000 sampled flux distributions as the wild-type flux distribution. The partial correlation results, i.e., controlling for the measured growth rate, are also reported in Table 2.
Drug response data used in this dataset is given as AC50 values36. AC50 values represent the concentration in which the drug exhibits 50% of its maximum efficacy. In this case, in silico AC50 values were calculated by estimating the maximal flux rate carried by the target reaction when the growth rate is set to 50% of the drug's maximal response (a value that was available in the dataset used36). The partial correlation results (i.e., controlling for the measured growth rate) are reported in Table 3.
Table 4 shows drug response data from this dataset, given as IC50 values37-39. IC50 values represent the concentration of drug needed in order to reduce the growth to 50% of its maximal value. In this case, in silico IC50 values are calculated by estimating the maximal flux rate carried by the target reaction when growth rate is set to 50% of its maximal value. The partial correlation results (i.e., controlling for the measured growth rate) in this case were insignificant (P-value >0.05).
In cases where multiple reactions are associated with a drug's targets, the analysis was done for each reaction separately and the reported correlation is the mean over the correlation obtained for each reaction alone. In most cases similar correlations were obtained for different reactions targeted by the same drug. Two exceptions were the drugs Ethacrynic acid and Hexachlorophene in which a significant correlation of 0.24 was obtained for only one of the target reactions.
3. Personalized Metabolic Models Predict Drug Targets with a Selective Effect on Cell Proliferation
Enrichment Analysis of Currently Used Cytostatic Drugs:
The effect of a reaction's deletion on cell proliferation for the identification of selective treatment was simulated as described in the Methods section of this document. The overlap between the set of 24 cytostatic drug targets taken from Folger et al.23 (drugs classified as “Metabolic Anticancer Drug Targets”), and the predictions were found to be robust to different thresholds that determine the value (in percentage) under which the deletion is considered to effect the cell's proliferation rate. Table 5 describes the enrichment found for different thresholds. The mean over these values is the one reported herein. Additionally, the hyper-geometric enrichment of the predicted non-selective targets within genes whose low expression is significantly associated with improved survival was examined. The enrichment results are also reported in Table 7.
MLYCD Experiments
a shows MLYCD mRNA expression upon nucleofection with Non Targeting Control (NTC) and three independent siRNA constructs in K562 leukemia cells (left), and cell proliferation of K562 cells (right) upon MLYCD knock-down (siRNA).
Flux Analysis for MLYCD Knockdown
Utilizing the RPMI-8226 model, 1000 wild-type flux distributions were first obtained via Flux Balance Analysis (FBA), in which the MLYCD reaction is forced to be active at a rate that is at least 50% of the maximal flux rate it can carry. Next, the knockout flux distribution was computed via MOMA while constraining the flux through the corresponding reactions to zero. Utilizing the 1000 pre- and post-knockout flux distributions, a one-sided Wilcoxon ranksum test was applied to determine reactions whose flux has been significantly increased or decreased. In Table 7 reactions whose flux was significantly decreased are underlined. The flux of all other reactions was significantly increased. Specifically, an increase in the oxidative-branch of the pentose phosphate pathway that generates NADPH was found. NADPH is necessary in order to meet the increasing demands of the fatty acid synthesis pathway. This in turn results in a decreased flux through the reaction that generates reduced glutathione and then utilizes it to detoxify ROS.
Reconstructing Personalized Metabolic Models from Clinical Samples Obtained from Cancer Patients
Reconstructing Personalized Metabolic Models for Chemo-Sensitive and Chemo-Resistant Cancer Cell Lines:
The system was used to construct personalized metabolic models of 35 chemo-sensitive and chemo-resistant cancer cells, both cell-lines and primary cultures of cancer patients58-61. Since growth rate measurements are unavailable in this case, the reconstruction process was done in a similar manner to that applied for the reconstruction of breast cancer models based on tumor biopsies expression levels. The predicted proliferation rates of the chemo-sensitive cells was found to be significantly higher than those of the chemo-resistant cells (one-sided Wilcoxon P-value=1.35e-4) in line with the observation that highly proliferating cells tend to better respond to chemotherapy51-53.
Reconstructing Personalized Metabolic Models for Breast Cancer Patients:
Clinical predictors of breast cancer survival that were available in each of the datasets are:
The Chang et al. dataset includes the information on whether or not the patient has developed metastasis, and it was found that patients with metastasis have a significantly lower predicted growth rate (one-sided Wilcoxon test=0.02).
Table 10 shows the predicted growth rate as a prognostic marker for patient survival when performing the analysis for each cancer stage separately and then considering treated versus non-treated patients. The results in Table 10 represent the log rank P-value and where performed using the dataset in Curtis et al.42 where sufficient number of samples in each group was available
For stage 1 patients the sample size for treated patients was too small (only 15 patients) and hence was excluded from the analysis.
Estimating the Predictive Power of the Gene Expression Derived Growth Rates in Predicting Patient Prognosis:
To examine the potential added value of personalized metabolic models in predicting patient prognosis, the survival analysis was repeated using the raw gene expression alone. The NCI-60 gene expression values of the set of genes that are most correlated to the growth rate measurements in this dataset (with FDR and α=0.05) were utilized, and multiple linear regression analysis was applied to estimate the model's parameters. The model's parameters were then used to predict the growth rates of the individual samples found in the four cohorts analyzed. Repeating the Kaplan-Meier analysis with the new estimated growth rates, a significant result was obtained only for the Chang et al. dataset (P-value=0.0023), indicating that in this dataset, high growth rate is a sign of good prognosis.
The present application claims the benefit of U.S. Provisional Patent Application No. 61/760,731, entitled “System and Method for Personalized Metabolic Modeling”, filed on Feb. 5, 2013, the disclosure of which is incorporated herein by reference in its entirety for all purposes.
Number | Date | Country | |
---|---|---|---|
61760731 | Feb 2013 | US |